Data Federation vs. Data Integration?

by Alena Semeshko, September 29, 2008

Data federation and data integration. What’s the difference between the two?

The algorithm for retrieving data from a federated database


I understand data federation as a way of joining data from different sources distributed around a company without actually moving it from its original location. In other words, data federation software creates a single repository that contains not the data itself but its metadata (information about the actual data and its location). This lets users work with a single, standardized view of the data, presented in one data layer, without having to deal with each of the original data sources.
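To make the idea concrete, here is a minimal Python sketch of a federated layer. It assumes two hypothetical in-memory sources (`CRM_DB`, `BILLING_DB`) standing in for separate systems, and the `FederatedLayer` class and its methods are illustrative names, not any product’s API. The catalog stores only metadata (which system an entity lives in and how to fetch it), and rows are pulled from the sources only when a query runs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]

# Two hypothetical source systems; in practice these would be separate
# databases or services that stay where they are.
CRM_DB: List[Row] = [
    {"customer_id": 1, "name": "Acme Corp"},
    {"customer_id": 2, "name": "Globex"},
]
BILLING_DB: List[Row] = [
    {"customer_id": 1, "balance": 1200.0},
    {"customer_id": 2, "balance": 0.0},
]


@dataclass
class SourceMetadata:
    """Metadata only: which system holds the data and how to reach it."""
    system: str
    fetch: Callable[[], List[Row]]


class FederatedLayer:
    def __init__(self) -> None:
        # The catalog holds metadata, never copies of the rows themselves.
        self.catalog: Dict[str, SourceMetadata] = {}

    def register(self, entity: str, meta: SourceMetadata) -> None:
        self.catalog[entity] = meta

    def query(self, entity: str) -> List[Row]:
        # Data is pulled from the original source at request time.
        return self.catalog[entity].fetch()

    def join(self, left: str, right: str, key: str) -> List[Row]:
        # A single unified view assembled on demand from two sources.
        right_index = {row[key]: row for row in self.query(right)}
        return [
            {**row, **right_index[row[key]]}
            for row in self.query(left)
            if row[key] in right_index
        ]


layer = FederatedLayer()
layer.register("customers", SourceMetadata("crm", lambda: CRM_DB))
layer.register("balances", SourceMetadata("billing", lambda: BILLING_DB))

for row in layer.join("customers", "balances", key="customer_id"):
    print(row)
```

Because `join` calls each source’s `fetch` at query time, a change in `CRM_DB` would show up in the next query without any reload step.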

James Kobielus, in his ZDNet blog, explores the core differences between enterprise data warehousing (EDW) and data federation.

Compared to data warehousing, which at first glance looks like the more reasonable approach, data federation can seem outdated:

Federated environments are not optimized for heavy-hitting data matching, merging, transformation and cleansing, all of which are essential functions to deliver a “single version of the truth” for business intelligence (BI).
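The matching, merging and cleansing mentioned in the quote are typically done in a batch step before data lands in the warehouse. The sketch below is a deliberately simplified Python illustration: the source extracts are hypothetical, and matching customers on a normalized email address is an assumed rule, far cruder than real data-quality tooling.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical source extracts with inconsistent formatting for the same customer.
CRM_EXTRACT: List[Row] = [
    {"email": " Jane.Doe@Example.com ", "name": "jane doe", "phone": ""},
    {"email": "bob@example.com", "name": "Bob Smith", "phone": "555-0101"},
]
SUPPORT_EXTRACT: List[Row] = [
    {"email": "jane.doe@example.com", "name": "Jane Doe", "phone": "555-0199"},
]


def cleanse(row: Row) -> Row:
    # Standardize formatting so records from different systems can be compared.
    return {
        "email": row["email"].strip().lower(),
        "name": row["name"].strip().title(),
        "phone": row["phone"].strip(),
    }


def merge(existing: Row, incoming: Row) -> Row:
    # Keep existing non-empty values and fill any gaps from the incoming record.
    return {k: existing[k] or incoming[k] for k in existing}


def batch_load(*extracts: List[Row]) -> Dict[str, Row]:
    warehouse: Dict[str, Row] = {}
    for extract in extracts:
        for raw in extract:
            row = cleanse(raw)
            key = row["email"]  # assumed matching rule: same email means same customer
            warehouse[key] = merge(warehouse[key], row) if key in warehouse else row
    return warehouse


warehouse = batch_load(CRM_EXTRACT, SUPPORT_EXTRACT)
for key, row in sorted(warehouse.items()):
    print(key, row)
```

The two Jane Doe records collapse into one cleansed row whose missing phone number is filled in from the support system, which is the kind of "single version of the truth" a warehouse load aims for.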

However, James also lists the benefits that data federation may deliver as part of a company’s overall business intelligence strategy:

Data federation is an umbrella term for a wide range of operational BI topologies that provide decentralized, on-demand alternatives to the centralized, batch-oriented architectures characteristic of traditional EDW environments.

In the real world, data federation and EDW are not strictly mutually exclusive and may well target different markets: data federation is better suited to near-real-time BI requirements than the batch-oriented EDWs deployed in many organizations.
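The freshness difference can be illustrated with a toy Python example. It assumes a hypothetical operational `orders` list; `nightly_batch_load` mimics a batch EDW load by copying a snapshot, while `federated_read` reads the live source at query time. Both function names are illustrative only.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical operational source, e.g. an order-entry system.
orders: List[Row] = [{"order_id": 1, "status": "placed"}]


def nightly_batch_load(source: List[Row]) -> List[Row]:
    # EDW-style: copy a snapshot of the source into the warehouse.
    return [dict(row) for row in source]


def federated_read(source: List[Row]) -> List[Row]:
    # Federation-style: read the live source at query time.
    return source


warehouse = nightly_batch_load(orders)

# Later in the day the operational system changes.
orders[0]["status"] = "shipped"
orders.append({"order_id": 2, "status": "placed"})

print("warehouse:", warehouse)               # still last night's snapshot
print("federated:", federated_read(orders))  # reflects the latest state
```

The warehouse only catches up at the next batch load, whereas the federated read sees the change immediately, at the cost of hitting the operational system on every query.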

An example of a data warehouse

So, where does data integration fit into the picture? Certain aspects of data integration intersect with both of the technologies discussed above. On the one hand, data integration may well involve copying and moving data around, which contradicts the definition of data federation yet fits well with the concept of data warehousing. On the other hand, data federation is in many respects just one instance of data integration, since the metadata it uses can also be employed in integration processes.
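One way to picture federation’s metadata being reused for integration: a single catalog records where each entity lives and how to read it, and the same catalog drives both on-demand federated reads and a job that copies aggregated data into a warehouse table. This is a hypothetical Python sketch using the standard `sqlite3` module, with in-memory databases standing in for separate source systems; `CATALOG`, `federated_query` and `integrate_into` are illustrative names, not Apatar’s or any other tool’s API.

```python
import sqlite3
from typing import Dict, List, Tuple

# Two hypothetical source systems, each its own in-memory SQLite database.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme Corp"), (2, "Globex")])

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.executemany(
    "INSERT INTO invoices VALUES (?, ?)", [(1, 500.0), (1, 700.0), (2, 0.0)]
)

# Shared metadata: which system holds each entity and how to read it.
CATALOG: Dict[str, Tuple[sqlite3.Connection, str]] = {
    "customers": (crm, "SELECT id, name FROM customers"),
    "invoices": (billing, "SELECT customer_id, amount FROM invoices"),
}


def federated_query(entity: str) -> List[tuple]:
    # Federation: use the metadata to read the source on demand; nothing is stored.
    conn, sql = CATALOG[entity]
    return conn.execute(sql).fetchall()


def integrate_into(warehouse: sqlite3.Connection) -> None:
    # Integration: the same metadata drives a job that copies data into a target.
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals (id INTEGER, name TEXT, total REAL)"
    )
    totals: Dict[int, float] = {}
    for customer_id, amount in federated_query("invoices"):
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    rows = [(cid, name, totals.get(cid, 0.0)) for cid, name in federated_query("customers")]
    warehouse.executemany("INSERT INTO customer_totals VALUES (?, ?, ?)", rows)
    warehouse.commit()


dw = sqlite3.connect(":memory:")
integrate_into(dw)
print(dw.execute("SELECT * FROM customer_totals ORDER BY id").fetchall())
```

Here `federated_query` alone behaves like federation (no copy is kept), while `integrate_into` physically moves transformed data into `dw`, closer to the warehousing side of the picture.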

Last week Gartner published its Magic Quadrant for Data Integration Tools, which you can access for free. I’m glad to see Apatar mentioned, although I wouldn’t quite call it a data federation tool.