Data Federation vs. Data Warehousing vs. Data Integration

by Alena Semeshko, September 29, 2008
The concepts are very similar, but what’s the high-level difference between the three?



A single view of data

I understand data federation as an approach that joins data from different sources distributed around the company without actually moving it from the original source. That is to say, data federation software creates a single repository that contains not the data itself, but its metadata (information about the actual data and its location). This technology gives users a single standardized view of data, presented in one data layer, without having to deal with the variety of original data sources.
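To make the idea concrete, here is a minimal sketch of a federation layer, not a description of any particular product. It assumes two hypothetical source systems (a CRM and a billing database, simulated with in-memory SQLite) and a hypothetical `CATALOG` that stores only metadata: where each logical entity lives and how to read it. The names `CATALOG`, `fetch`, and `customer_totals` are illustrative assumptions.

```python
import sqlite3

# Two independent source systems (hypothetical), each with its own schema.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE clients (id INTEGER, full_name TEXT)")
crm.executemany("INSERT INTO clients VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (client_id INTEGER, amount REAL)")
billing.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, 120.0), (1, 30.0), (2, 75.5)],
)

# The federation layer holds metadata only: which source owns each
# logical entity and which query retrieves it. No rows are copied here.
CATALOG = {
    "customer": (crm, "SELECT id, full_name FROM clients"),
    "invoice": (billing, "SELECT client_id, amount FROM invoices"),
}


def fetch(entity):
    """Read a logical entity from its original source at query time."""
    conn, sql = CATALOG[entity]
    return conn.execute(sql).fetchall()


def customer_totals():
    """Single view across both sources: invoice total per customer name."""
    names = dict(fetch("customer"))
    totals = {}
    for client_id, amount in fetch("invoice"):
        name = names[client_id]
        totals[name] = totals.get(name, 0.0) + amount
    return totals


print(customer_totals())
```

Because every call goes back to the sources, a new invoice inserted into `billing` shows up in the next `customer_totals()` call with no reload step. The price is that the join runs in the federation layer on every request.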

James Kobielus

In his ZDNet blog, James Kobielus explores the core difference between enterprise data warehousing (EDW) and data federation. At first glance, data federation seems outdated compared to data warehousing, which looks like the more reasonable approach:

Federated environments are not optimized for heavy-hitting data matching, merging, transformation, and cleansing, all of which are essential functions to deliver a “single version of the truth” for business intelligence (BI).

However, James also lists the benefits data federation may deliver in the company’s overall business intelligence strategy:

Data federation is an umbrella term for a wide range of operational BI topologies that provide decentralized, on-demand alternatives to the centralized, batch-oriented architectures characteristic of traditional EDW environments.

In the real world, data federation and EDW are not mutually exclusive and may very well target different markets, as data federation is better suited to near-real-time BI requirements than the batch-oriented EDWs deployed in many organizations.

An example of a data warehouse


Intersection with data integration

So, where does data integration fit into the picture? Certain aspects of data integration intersect with both of the technologies discussed above. On the one hand, data integration may very well involve copying and moving data around, which runs contrary to the definition of data federation, yet fits neatly into the concept of data warehousing. On the other hand, data federation is in many respects just one form of data integration, in that the metadata it relies on can also be employed in integration processes.
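The copy-and-move side of integration can be sketched as a small batch load. This is an illustrative example only: the source and warehouse are in-memory SQLite databases, and the table names and the `run_batch_load` and `warehouse_names` helpers are hypothetical. It shows data being extracted, cleansed, and physically written into a separate store.

```python
import sqlite3

# Hypothetical operational source with inconsistently formatted names.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE clients (id INTEGER, full_name TEXT)")
source.executemany(
    "INSERT INTO clients VALUES (?, ?)",
    [(1, "  ada lovelace "), (2, "GRACE HOPPER")],
)

# Hypothetical warehouse: a separate store that holds its own copy.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT)")


def run_batch_load():
    """Extract from the source, cleanse names, and reload the warehouse."""
    rows = source.execute("SELECT id, full_name FROM clients").fetchall()
    # Transform: collapse stray whitespace and normalize capitalization.
    cleaned = [(cid, " ".join(name.split()).title()) for cid, name in rows]
    # Load: full refresh of the warehouse table.
    warehouse.execute("DELETE FROM dim_customer")
    warehouse.executemany("INSERT INTO dim_customer VALUES (?, ?)", cleaned)
    warehouse.commit()
    return len(cleaned)


def warehouse_names():
    """Read customer names from the warehouse copy, ordered by id."""
    cursor = warehouse.execute("SELECT name FROM dim_customer ORDER BY id")
    return [name for (name,) in cursor]


run_batch_load()
print(warehouse_names())
```

A row added to `source` after the load stays invisible in the warehouse until `run_batch_load()` runs again. That is the batch-oriented trade-off: the copy is cleansed and fast to query, but only as fresh as the last load.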

Last week, Gartner published its Magic Quadrant for Data Integration Tools, which is available for free. I’m glad to see Apatar mentioned, although I wouldn’t quite call it a data federation tool.



The post is written by Alena Semeshko; edited by Alex Khizhniak.