ETL Support for Heterogeneous Technologies
In my previous posts I often mentioned the ‘transform’ function of ETL tools. Today, I’m going to concentrate on ETL’s extract and load processes. These operations can be homogeneous or heterogeneous. A homogeneous operation extracts data from a repository and loads it into a repository of the same technology. If two or more technologies are involved in the ETL process, it is called a heterogeneous operation.
The challenge with data integration is that there are almost always several repository technologies to address. For example, if data is stored in SharePoint, Excel, and Web log sources and has to be consolidated in the Salesforce.com CRM, then the extract must be performed against three different technologies and the load against one.
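To make the idea concrete, here is a minimal sketch of a heterogeneous extract feeding a single load step. It is an illustration only, not how any particular ETL tool works: the function names (extract_csv, extract_web_log, run_etl), the log line layout, and the sample data are all assumptions, CSV text stands in for an Excel export, and a plain Python list stands in for the CRM target.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, str]

def extract_csv(text: str) -> Iterator[Record]:
    """Extract rows from CSV text (a stand-in for an Excel export)."""
    yield from csv.DictReader(io.StringIO(text))

def extract_web_log(text: str) -> Iterator[Record]:
    """Extract records from log lines assumed to look like '<ip> <path> <status>'."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:  # skip lines that do not match the assumed layout
            ip, path, status = parts
            yield {"ip": ip, "path": path, "status": status}

def run_etl(
    sources: Iterable[Tuple[Callable[[str], Iterator[Record]], str]],
    load: Callable[[Record], None],
) -> int:
    """Run every extractor against its payload and pass each record to one loader."""
    count = 0
    for extract, payload in sources:
        for record in extract(payload):
            load(record)
            count += 1
    return count

# Hypothetical sample data; the list plays the role of the single target.
target: List[Record] = []
loaded = run_etl(
    [
        (extract_csv, "name,email\nAda,ada@example.com\n"),
        (extract_web_log, "10.0.0.1 /home 200\nmalformed line\n"),
    ],
    target.append,
)
```

Each source technology gets its own extractor, but all of them hand the loader the same kind of record, so adding a new source does not change the load side.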
To work with heterogeneous data structures, an ETL tool must be able to:
• Work with a variety of formats and databases
• Convert flat files and unstructured data
• Define mappings and transformation rules through a drag-and-drop interface and store these rules independently from the actual implementations
• Have reusable components
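As a rough sketch of what storing mapping rules independently of their implementation can look like, the example below keeps the rules as plain data and looks up reusable transforms by name. The field names, the MAPPING and TRANSFORMS tables, and the apply_mapping function are hypothetical; a real tool would build the rules through its graphical interface rather than by hand.

```python
from typing import Callable, Dict, Tuple

# Mapping rules as data: target field -> (source field, transform name).
# These field names are made up for illustration.
MAPPING: Dict[str, Tuple[str, str]] = {
    "Email": ("email", "lowercase"),
    "LastName": ("surname", "strip"),
}

# Reusable transform components, referenced by name from any mapping.
TRANSFORMS: Dict[str, Callable[[str], str]] = {
    "lowercase": str.lower,
    "strip": str.strip,
}

def apply_mapping(
    record: Dict[str, str],
    mapping: Dict[str, Tuple[str, str]],
    transforms: Dict[str, Callable[[str], str]],
) -> Dict[str, str]:
    """Build a target record by applying each rule's named transform."""
    result = {}
    for target_field, (source_field, transform_name) in mapping.items():
        result[target_field] = transforms[transform_name](record[source_field])
    return result

mapped = apply_mapping(
    {"email": "Ada@Example.COM", "surname": "  Lovelace "},
    MAPPING,
    TRANSFORMS,
)
```

Because the rules are only data, the same mapping can be kept, reviewed, or reused even if the code that executes it is replaced.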
Constructing extract connections to each source repository technology is the most technically challenging part of data integration. Using the pre-built extract connections that some ETL tools provide greatly reduces the effort and risk involved.