Quality of Transformed Data in Data Integration
Ensuring data quality after transformation is among the most difficult parts of data integration. Transformation rules are often designed from theoretical data definitions and data models rather than from actual knowledge of the data content. Because that information is usually incomplete, outdated, or incorrect, the converted data can look nothing like what was expected when the data integration project started.
Every system consists of three layers: database, business rules, and user interface. As a result, what users see is not necessarily what is actually stored in the database. This is especially true of legacy systems, which are notorious for elaborate hidden business rules. Even if the data is transformed accurately, the information that comes out of the new system will be incorrect if you are not aware of those rules.
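A minimal sketch of how such a hidden rule causes trouble. The sentinel-date convention, the function names, and the field are all hypothetical, invented for illustration: a legacy application layer silently interprets a magic value, so copying the raw column faithfully still produces wrong information in the new system.

```python
from typing import Optional

# Hypothetical legacy convention: this sentinel date means "no expiry",
# but the rule lives in the application code, not in the database schema.
LEGACY_NO_EXPIRY = "9999-12-31"

def legacy_display_expiry(raw_value: str) -> str:
    """What legacy users actually see: the hidden business rule is applied here."""
    return "No expiry" if raw_value == LEGACY_NO_EXPIRY else raw_value

def naive_transform(raw_value: str) -> str:
    """A transformation written from the schema alone copies the raw value as a date."""
    return raw_value

def informed_transform(raw_value: str) -> Optional[str]:
    """A transformation aware of the rule maps the sentinel to an explicit null."""
    return None if raw_value == LEGACY_NO_EXPIRY else raw_value
```

The naive rule is "accurate" at the byte level, yet the new system would report an expiry date of 9999-12-31 where legacy users saw "No expiry".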
Moreover, the source data itself can be an issue in data integration. Inaccurate data tends to spread like a virus during the transformation process. A data cleansing initiative is typically necessary and must be performed before, rather than after, the transformation.
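One way to keep bad data from spreading is to cleanse and quarantine records ahead of the transformation step. The sketch below assumes a hypothetical phone-number field in ten-digit North American format; the function names and record shape are illustrative, not from any particular tool.

```python
from typing import Optional

def cleanse_phone(raw: str) -> Optional[str]:
    """Normalize a phone number to ten digits; return None if unrecoverable."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code
    return digits if len(digits) == 10 else None

def cleanse_then_transform(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into cleansed ones and rejects before any transformation runs."""
    clean, rejected = [], []
    for rec in records:
        phone = cleanse_phone(rec.get("phone", ""))
        if phone is None:
            rejected.append(rec)  # quarantine instead of propagating bad data
        else:
            clean.append({**rec, "phone": phone})
    return clean, rejected
```

Quarantining rejects up front means the transformation rules only ever see values they were designed for, and the bad records remain visible for remediation.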
To achieve data quality, you have to precede the transformation stage with extensive data profiling and analysis. In fact, data quality after transformation is directly proportional to the amount of knowledge about the actual data you possess. Skipping in-depth analysis guarantees a significant loss of data quality in data integration. In an ideal data integration project, 80 percent of the time would be spent on data analysis and 20 percent on designing transformation rules. In practice, however, this rarely occurs. The initial stage of the data integration process therefore needs the full attention of your team.
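The profiling the paragraph above calls for can start very simply: measure what is actually in each column rather than trusting the schema. The sketch below is an illustrative, assumed approach (null markers, pattern encoding, and function name are all choices made for this example, not a standard).

```python
from collections import Counter

NULL_MARKERS = (None, "", "NULL")  # assumed conventions for missing values

def profile_column(values: list) -> dict:
    """Summarize a column's actual content: null rate, cardinality, value patterns."""
    total = len(values)
    nulls = sum(1 for v in values if v in NULL_MARKERS)
    non_null = [v for v in values if v not in NULL_MARKERS]
    # Encode each value's shape: digits -> 9, letters -> A, punctuation kept as-is.
    patterns = Counter(
        "".join("9" if c.isdigit() else ("A" if c.isalpha() else c) for c in str(v))
        for v in non_null
    )
    return {
        "total": total,
        "null_rate": nulls / total if total else 0.0,
        "distinct": len(set(non_null)),
        "top_patterns": patterns.most_common(3),
    }
```

A profile like this quickly surfaces the surprises that break transformation rules: unexpected null rates, sentinel values masquerading as data, and format variants the data model never mentioned.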