Data Semantics in Data Integration

by Katherine Vasilega, November 19, 2010

The term “data semantics” is mostly used in the context of application integration and database processing, but it matters just as much for data integration. When you understand the semantics of the source and target systems, that is, what each piece of data actually means in each system, data integration can be much more effective.

Data semantics is critical when we want to extract information from one system and place it in another. Selecting the right data attributes is an important part of the data integration process. In some systems, the semantics is simple and obvious (e.g., a customer’s name, address, and e-mail are stored in a single database and have a similar structure). In other systems, data may be redundant and confusing. For example, “customer” may be defined in many places, with the attributes that describe a customer stored in different locations and structured differently in each.

Here are the steps to take to make better use of data semantics in data integration:

    1. Create a data catalog of the source and target systems. Define the meaning of all of the data in one place and use this catalog during the data integration project.

    2. Align the data flows in accordance with the semantics of the source and target systems. Note which data attributes are copied unchanged and which need to be split, combined, transformed, or cleaned.

    3. Incorporate these alignments as business rules into your data integration tool.
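
As a rough illustration of the three steps above, here is a minimal Python sketch. It is not tied to any particular data integration tool, and every name in it (the crm.* and target.* attributes, CATALOG, Rule, RULES, integrate) is hypothetical and chosen only for this example. It assumes a source system that stores a customer's full name in one field and a target system that stores first and last name separately.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Step 1: a (hypothetical) data catalog that defines the meaning of every
# attribute in the source and target systems in one place.
CATALOG: Dict[str, str] = {
    "crm.full_name": "Customer's full name, written as 'First Last'",
    "crm.email": "Primary contact e-mail; may have stray spaces or capitals",
    "target.first_name": "Customer's given name",
    "target.last_name": "Customer's family name",
    "target.email": "Primary contact e-mail, trimmed and lowercase",
}


# Step 2: an alignment between source and target attributes, noting whether
# the data is split, combined, transformed, or cleaned on the way.
@dataclass
class Rule:
    sources: List[str]
    targets: List[str]
    transform: Callable[..., Tuple[str, ...]]


def split_name(full_name: str) -> Tuple[str, str]:
    """Split one source attribute into two target attributes."""
    first, _, last = full_name.strip().partition(" ")
    return first, last.strip()


def clean_email(email: str) -> Tuple[str]:
    """Clean a single attribute: trim whitespace and lowercase it."""
    return (email.strip().lower(),)


# Step 3: the alignments expressed as rules that the integration code applies.
RULES: List[Rule] = [
    Rule(["crm.full_name"], ["target.first_name", "target.last_name"], split_name),
    Rule(["crm.email"], ["target.email"], clean_email),
]


def integrate(record: Dict[str, str]) -> Dict[str, str]:
    """Apply every rule to a source record and return the target record."""
    result: Dict[str, str] = {}
    for rule in RULES:
        # Refuse to move data whose meaning is not documented in the catalog.
        for name in rule.sources + rule.targets:
            if name not in CATALOG:
                raise KeyError(f"attribute {name!r} is missing from the catalog")
        values = rule.transform(*(record[name] for name in rule.sources))
        result.update(zip(rule.targets, values))
    return result


if __name__ == "__main__":
    source_record = {"crm.full_name": " Ada Lovelace ", "crm.email": " Ada@Example.COM "}
    print(integrate(source_record))
```

Keeping the catalog and the rules as plain data means they can be reviewed alongside the business definitions they encode. A real project would normally express the same rules in the configuration of its integration tool rather than in hand-written code.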

A better understanding of the data reduces the risk of rework after the data integration tool has been implemented. In addition, data semantics helps you see where data changes should be made in the future.