Building ETL That Fits Your Business Requirements

by Olga Belokurskaya, December 3, 2009
From a high-level perspective, what matters most when building a successful ETL strategy? Read this opinion and join the discussion.

Business needs above all

Have you ever thought that one of the serious issues that makes data integration initiatives complex is the lack of well-defined user needs for data? No, let’s start differently. Why do companies need data integration, and why do they spend so much effort and so many resources on it? To get a full view and a better understanding of the company’s data. This information, in turn, is what business decision-makers need to make the right decisions.

Back to user needs, or rather, requirements for data. This may come as a surprise, but according to a survey by Aberdeen Group, the “lack of well-defined user needs” ranked third among the reasons data integration initiatives fail to deliver business-critical data to decision-makers.

So why are user requirements so important for successful data integration? In fact, the goal of data integration is not simply to gather all the data from a company’s systems and applications in a single place, but to get the data that matters to the business. Business representatives are the end users of data integration, because they make decisions based on the data they receive. So, to make sure data integration is done correctly, the focus should be placed on business data standards and requirements.

These business requirements should be thoroughly defined and taken into account before data integration starts. In other words, there should be a clear definition of which data is critical for the business.

Things to consider when selecting a data integration vendor


Choosing the right ETL tool

Without clear business requirements, the process comes down to blindly gathering every kind of data available across the enterprise, with no clear purpose. In other words, a data integration initiative may turn into monkey business. Okay, that’s clear.

Well, selecting data integration tools, including ETL (extract, transform, and load) solutions, takes effort, but if done right, it’s worth it. What do I mean by “done right”? The message is simple. When choosing an ETL tool, a company should keep its business requirements for data in mind and base its choice on whether an ETL solution has functionality that meets those requirements, or whether the vendor can add the needed functionality to it.

Look. You’ve defined your data integration strategy, and business users have listed their requirements for the data they need to work with. So now it is clear what data the future ETL tool should gather and what operations it should perform on that data. With all the necessary criteria in hand, you won’t wander blindly among multiple vendors, but will concentrate on those whose ETL solutions meet your criteria.
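To make the idea concrete, here is a minimal Python sketch of requirement-driven shortlisting. It assumes that business requirements and tool capabilities can be expressed as simple feature labels; the tool names, feature labels, and the `EtlTool` and `shortlist` helpers are all hypothetical. Following the rule described above, a tool stays on the shortlist if it already covers every requirement or if its vendor could add whatever is missing.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EtlTool:
    # Hypothetical model of a candidate ETL tool.
    name: str
    features: set[str]                                       # functionality the tool has today
    vendor_can_add: set[str] = field(default_factory=set)    # functionality the vendor could add


def shortlist(tools: list[EtlTool], requirements: set[str]) -> list[tuple[str, list[str]]]:
    """Return (tool name, requirements the vendor would need to add) for qualifying tools."""
    result = []
    for tool in tools:
        missing = requirements - tool.features
        # Keep the tool only if every missing requirement can be added by its vendor.
        if missing <= tool.vendor_can_add:
            result.append((tool.name, sorted(missing)))
    return result


# Illustrative requirements and candidates (not real products).
REQUIREMENTS = {"extract_from_crm", "deduplicate_customers", "load_to_warehouse"}
CANDIDATES = [
    EtlTool("Tool A", {"extract_from_crm", "deduplicate_customers", "load_to_warehouse"}),
    EtlTool("Tool B", {"extract_from_crm", "load_to_warehouse"},
            vendor_can_add={"deduplicate_customers"}),
    EtlTool("Tool C", {"load_to_warehouse"}),
]

if __name__ == "__main__":
    print(shortlist(CANDIDATES, REQUIREMENTS))
    # → [('Tool A', []), ('Tool B', ['deduplicate_customers'])]
```

Tool C drops out because neither it nor its vendor covers the CRM extraction and deduplication requirements. In practice, requirements are rarely this binary, so a real evaluation would likely add weights or priorities.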


Consider open source

It will be no news if I repeat once again that today’s open source solutions are good enough for ETL operations, and that data integration and BI experts expect them to develop into solutions for master data management.

Today, open source ETL provides an alternative to proprietary solutions, which are usually costly and intended for more complex data integration processes beyond mere ETL. For small and mid-sized businesses, which as a rule have smaller budgets, open source ETL solutions are a means to address their data integration needs.

But I was talking about business requirements, or rather about how open source ETL tools can address a company’s business requirements for data. Here I see several ways:

First, if a company happens to have a couple of in-house developers, they can make the necessary customizations to the company’s ETL, thanks to the availability of the source code.

Then, as a rule, open source solutions are supported by developer communities, some of which are really powerful. So, the community behind a company’s open source ETL may help add the needed functionality or customize existing features to meet the company’s business requirements.

And don’t forget about the vendor itself. A company may go directly to the vendor of its open source ETL and request additional functionality that meets the company’s particular needs.

And, as a rule, any of the actions described above will cost less, and the result will take less time to deliver, than with proprietary data integration tools.

Well, though this post sounds optimistic, open source may still have issues, such as vendors that stop supporting their solutions. However, with communities behind these tools and the code open, the chances of overcoming such issues seem higher to me than with proprietary solutions.


The post is written by Olga Belokurskaya; edited by Alex Khizhniak.