Data Map Modeling as an Important Part of ETL Process

by Katherine Vasilega, October 21, 2010

One of the interesting phases in ETL is the data mapping stage. By the time data mapping starts, the target model and the source model should already be defined. Sometimes, however, users pull out a report and require that it serve as the target. Is this the right approach? Absolutely not. Will it work? Probably, yes, provided you model the target first.

Before building an actual data map for ETL, you have to define a data model for the target, even if the target is not an actual database. The model helps you understand which entity each target element belongs to. It is then easier to find the corresponding entity in the source and map it to the target element. This is called building a logical data map model, and you have to do it before you start mapping the data. Data map modeling includes the following stages:

    1. Identifying the data sources. The data modeling session indicates only the major source systems. It is up to the ETL team to dig deeper and find every source that can be used.
    2. Collecting and documenting source systems. You have to create a source systems tracking report that shows who is responsible for each source.
    3. Identifying the system-of-record (SOR). The SOR is the information storage system that serves as the originating source of the data. It has to be identified because in most enterprises data is re-processed to fit new business uses, which is why most legacy systems contain redundant and duplicated data.
    4. Creating and analyzing the entity relationships of the SOR. You have to build diagrams to analyze how two or more entities are related to one another.
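
The stages above can be sketched in a few lines of Python. This is only an illustration, not a prescribed tool: the `SourceSystem` record, the system names, the owners, and the assumption that the system-of-record for an entity is the single source that originates it are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SourceSystem:
    """One row of a (hypothetical) source systems tracking report."""
    name: str
    owner: str                                    # who is responsible for the source
    originates: set = field(default_factory=set)  # entities first captured here
    copies: set = field(default_factory=set)      # entities re-processed from elsewhere


def tracking_report(sources):
    """Return (system, owner) pairs sorted by system name."""
    return [(s.name, s.owner) for s in sorted(sources, key=lambda s: s.name)]


def find_system_of_record(sources, entity):
    """Return the name of the single source that originates the entity.

    Assumption: exactly one source originates each entity; anything else
    is reported as an error so the ETL team can investigate.
    """
    candidates = [s.name for s in sources if entity in s.originates]
    if len(candidates) != 1:
        raise ValueError(f"expected one originating source for {entity!r}, found {candidates}")
    return candidates[0]


# Illustrative data only.
SOURCES = [
    SourceSystem("crm", "sales team", originates={"customer"}),
    SourceSystem("billing", "finance team", originates={"invoice"}, copies={"customer"}),
]

print(tracking_report(SOURCES))
print(find_system_of_record(SOURCES, "customer"))
```

Here the billing system also holds customer data, but only as a re-processed copy, so the lookup points the mapping at the CRM system instead.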

After you have built a data map model for ETL, you have to analyze the data content, complete the business rules for the ETL process, integrate the data sources into a single source, and only then build the actual data map. This way, it will be easier to re-use data map components when you build new data maps for similar purposes.
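
To make the idea of re-usable data map components concrete, here is a minimal sketch in Python. It assumes a data map can be represented as a list of field-level mappings, each with a transformation rule; the `FieldMapping` class and all field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class FieldMapping:
    """One data map component: a source field, a target field, and a rule."""
    source_field: str
    target_field: str
    transform: Callable[[Any], Any]  # business rule applied to the source value


def apply_data_map(record, data_map):
    """Build a target record by applying every mapping to a source record."""
    return {m.target_field: m.transform(record[m.source_field]) for m in data_map}


# Components defined once...
CUSTOMER_ID = FieldMapping("cust_id", "customer_id", int)
CUSTOMER_NAME = FieldMapping("cust_nm", "customer_name", str.strip)
ORDER_TOTAL = FieldMapping("ord_total", "order_total", float)

# ...and re-used across data maps built for similar purposes.
CUSTOMER_MAP = [CUSTOMER_ID, CUSTOMER_NAME]
ORDER_MAP = [CUSTOMER_ID, ORDER_TOTAL]

print(apply_data_map({"cust_id": "42", "cust_nm": " Ada "}, CUSTOMER_MAP))
print(apply_data_map({"cust_id": "42", "ord_total": "19.90"}, ORDER_MAP))
```

Because `CUSTOMER_ID` is defined once and shared, a change to how customer identifiers are converted is picked up by both maps.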