ETL Challenges: Data Formats

by Katherine Vasilega, November 30, 2010

ETL tools need a standard way of handling a wide variety of source and target data formats. This capability is required to implement comprehensive business rules in your data integration solution. Here is how you can set up an ETL tool to avoid issues with multiple data formats:

    • Make data normalization rules descriptive; don’t hide them in procedural code blocks. Enable business people to specify rules in terms that make sense to them.
    • Set clear identification and reporting rules to detect the impact of data format changes, and define maintenance rules to follow when a change occurs.
    • Express rules in terms of context-independent concepts that business people can easily refer to.
    • Do not express business rules in terms of physical field names, as this would require new normalization functions to be created for each new source data format.
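
As a rough illustration of the points above, here is a minimal Python sketch, not tied to any particular ETL tool. It assumes two hypothetical source formats ("crm_csv" and "shop_json") and two hypothetical logical concepts ("customer_name" and "order_total"); every name in it is invented for the example. Rules are written once against the concepts, each source format only supplies a mapping from concepts to its physical field names, and a simple check reports when a source no longer delivers the fields the mapping expects.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical mappings: logical concept -> physical field name, per source format.
# Supporting a new source format means adding a mapping, not new rule code.
SOURCE_MAPPINGS: Dict[str, Dict[str, str]] = {
    "crm_csv": {"customer_name": "CUST_NM", "order_total": "ORD_AMT"},
    "shop_json": {"customer_name": "name", "order_total": "total"},
}


@dataclass(frozen=True)
class Rule:
    concept: str                       # logical concept the rule applies to
    description: str                   # wording a business user can read
    normalize: Callable[[Any], Any]    # how the value is normalized


# Rules are declared as data against concepts, never against physical fields.
RULES: List[Rule] = [
    Rule("customer_name", "Trim whitespace and use title case",
         lambda v: str(v).strip().title()),
    Rule("order_total", "Store amounts as numbers rounded to two decimals",
         lambda v: round(float(v), 2)),
]


def missing_fields(source: str, record: Dict[str, Any]) -> List[str]:
    """Report physical fields the mapping expects but the record lacks."""
    mapping = SOURCE_MAPPINGS[source]
    return sorted(f for f in mapping.values() if f not in record)


def normalize(source: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Apply every rule to a record, returning values keyed by concept."""
    missing = missing_fields(source, record)
    if missing:
        # A changed source format is reported instead of silently ignored.
        raise ValueError(f"{source}: format changed, missing fields {missing}")
    mapping = SOURCE_MAPPINGS[source]
    return {r.concept: r.normalize(record[mapping[r.concept]]) for r in RULES}
```

With this layout, records from both hypothetical sources normalize to the same concept-keyed structure, and the `description` strings can be shown to business users as the readable form of each rule.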

Some ETL tools force developers through a variety of steps and complex procedures to accommodate the variation in source and target data formats. They lack the features to express business rules clearly, and therefore will hardly ever be used by business users.