Data Transformation with ETL

by Katherine VasilegaSeptember 13, 2010

Finding the necessary information can turn into a tremendously time-consuming process, when stored data is assigned to different platforms and systems. ETL tools help solve this problem efficiently. Extract-transform-load software enables to keep data of different types and from various sources at one place so that it could be later on accessed with a single click.

Today, I’d like to concentrate on data transformation in the ETL process. Data transformation is a set of rules that provides functionality to remove useless data and produce relevant, complete and standardized data. The data transformation process includes several stages:

  • Sort – the data is sorted by type to be structured.
  • Clean – the data that violates the given rules is changed to fit these rules in the ETL process.
  • Summarize – values are summarized and stored at the multiple levels in fact tables.
  • Derive – new data is created from the source data. For example, customers’ age is calculated based on their date of birth and the current year.
  • Aggregate – data elements are collected from multiple source files and databases (.doc, .exe).
  • Integrate – as a result of this ETL process, each data element is given one standard name with one standard definition. Data integration brings together different names and values for the same data element.

One more thing to mention, data transformation is only a part of the whole data integration process that is first of all aimed at satisfying your business needs. That’s why data integration rules should imply 80% business rules + 20% technical process.

So, it’s a good idea to outline the business rules first, then consider the technical sides, and only then choose the suitable ETL tool.