Building an ETL Process That Fits Your Business Requirements

by Olga BelokurskayaDecember 3, 2009
From a high-level perspective, what is important for designing a successful business-critical ETL strategy? Read this opinion to learn and discuss.

(Featured image credit)


Business needs above all

Do you know that one of the main issues making data integration complex is the lack of well-defined user needs? No, let’s start differently. Why do companies need data integration and spent so much effort and resources on this initiative? The answer is, to get the full view and a better understanding of corporate data. This information, in its turn, is needed for business decision-makers to make the right decisions.

This may come as a surprise, but the “lack of well-defined user needs” has been named on the third place among the reasons for the failed data integration initiative to deliver business-critical data to decision-makers, according to a survey by Aberdeen Group.

Why are user requirements so important for successful data integration and ETL (extract, transform, and load)? In fact, the goal of data integration is not to simply gather all the data from systems and applications used within a company in a single place, but to get the data that is important for business. It is business representatives who are the end users of data integration, because they make decisions based on the data received. Therefore, to ensure the process of data integration is correct, specific business focus should be placed on data standards and requirements.

Some of the data integration characteristics critical to businesses (source)

These business requirements should be taken into consideration and thoroughly defined before data integration is started. In other words, there should be clear definition of what data is critical for business.


Choosing the right ETL tool

There is a wide variety of ETL tools available on the market. The options vary from minimum functionality to the systems that help to solve complex tasks. There is a choice between proprietary offerings and open-source ETL tools, web-based and desktop products. When choosing an ETL tool, a lot of things should be taken into consideration—including existing data management processes, technologies utilized, IT staff available, etc.

Thus, to evaluate ETL tools, a set of criteria should be reviewed, such as:

  • the operating systems supported
  • the volume of data the tool is able to handle within a given period of time
  • data sources and formats the ETL tool supports
  • the conditions for maintenance and support (paid or free)

Besides, before making a decision and purchasing a tool, your company’s requirements to data integration should be analyzed and compared against the functionality that different ETL tools provide. Thus, a company may avoid paying for the functionality they are not going to use.

The functionality important when choosing a data integration tool (source)

Selecting an ETL tool requires some effort, but if done right, it’s worth it. What do I mean by saying “done right”? The message is simple. When choosing an ETL tool, a company should bear in mind its business requirements for data, and make the choice based on whether the product helps to meet them.

(For more about the technical criteria for selecting and evaluating an ETL tool, check out our detailed post on the topic.)


Ensuring the right expectations

At the same time, when deciding to start a data integration project, many companies consider hand-coding instead of buying prebuilt ETL tools. While many organizations think custom software requires less investment, it depends on the enterprise’s infrastructure, deadlines, and developer availability. On average, a data warehouse gathers information from 12 distinct sources, according to a TDWI survey. So, the need to create a custom connector for each data location may be resource-intensive.

However, the opposite may also be true. Many experts assume that an ETL tool will help to deliver their data integration or BI project faster. Though ETL tools, indeed, accelerate the process of data integration, one should not leave aside the actual time that is to be spent on the product’s evaluation and, of course, implementation. Whether it is an out-of-the-box software or custom code, the time is needed for adoption, anyway—to train personnel to use the tool and be productive with it. According to TDWI, it takes developers about three months to learn a new tool and 12 months (or more) to “become proficient with it.”

The issues to expect when adopting an ETL tool (source)

These hidden issues can eventually evolve into misconceptions about cost savings. While the acquisition cost of an ETL tool can be quite clear from the very beginning, the annual spendings on support and maintenane are often overlooked.


Consider open source

I won’t make any discovery if, again, repeat analyst’s words that today’s open-source solutions are “good enough” for ETL operations, and data integration.

Today, open-source ETL provides an alternative to proprietary solutions, which are usually costly and supposed to be used for more complex data integration processes. However, for mid-sized and small businesses, which, as a rule, have modest budgets, smaller open-source ETL solutions are a means to address their data integration needs.

Things to consider when selecting a data integration vendor (source)

Still, what about business requirements, or how can open-source ETL tools address the company’s business requirements for data? Here, I see several options.

  • First, if a company by chance has a couple of their own developers, they could make necessary customization to the company’s ETL solution, thanks to the availability of the code.
  • Then, as a rule, open-source projects are supported by developer communities, some of which are really powerful. So, the community behind the open-source ETL that a company uses may help with needed functionality or customization of existing ones to meet company’s business requirements.
  • Don’t forget about the vendor itself. A company may contact the vendor of their open-source ETL and ask for additional functionality to meet particular needs.

Any of the actions described above will cost less and the result will take less time to deliver than in case with proprietary data integration tools, which may require $500,000 annually, according to Gartner.

Well, though the recommendations sound so bright, there may still be issues with open source—e.g., vendors can stop supporting their solutions. However, with the communities behind, and thanks to the open code, the chances to overcome these issues seem to me higher than in the case of a deprecated proprietary solution.


Further reading

The post is written by Olga Belokurskaya; edited by Alex Khizhniak.