ETL – what’s in abbreviation?

by Alena SemeshkoMarch 6, 2008

ETL (data Extraction, Transformation and Loading) is an integral part of data warehousing.

In brief, ETL processes are used to extract data from various sources, transform it by cleaning and integrating, and finally load it into data warehouses. The data you get in the end is clean, well-structured, systematic, and ready to use. It sounds quite easy, but in reality there’s much more to it. There’s a catch to each step.

During the first step, data extraction, the challenge you are dealing with is the bulk of data from different sources. It might be different departments, databases, formats, reporting systems, etc. This scattered data needs to somehow be captured and moved into the staging database.

Next — data transformation. This step is the most complicated. It can be broken into four separate steps under the umbrella of transformation:
data verification – comparing the extracted data with the DW quality standards. In case data doesn’t meet the outlined standards, it either gets rejected, or held to be reviewed by the administrator.
data cleaning – the data left from the previous step is made more precise. (The techniques this stage includes are so many that they deserve a different post.)
data mapping – merging data from different sources into a single interface, structuring it into columns and tying it together logically.
data consolidation (or aggregation) – summarizing data from the previous step and performing overall calculations to provide the user with a more complete picture.

Finally, loading. This step simply uploads the data organized during transformation into a warehouse.

During this whole process, the one thing to be careful of is losing data. ETL process does not presuppose changing the initial data, it should only make it better, cleaner, more correct and organized.