Open-Source Data Integration and ETL Keep Maturing

by Olga Belokurskaya, December 9, 2009
As the market for open-source data integration continues to grow, learn about 10 best practices for a successful OSS implementation.



Why open-source ETL?

According to a recent survey by Third Nature, open-source BI tools, as well as extract, transform, and load (ETL) solutions, keep maturing. Moreover, they are becoming more accessible to end users with little or no technical background, thanks to enhanced user interfaces that allow ETL operations to be performed without hand coding.

There are several other reasons for this interest in open-source data integration tools. First, open source is viewed as a cost-cutting model, so its appeal from this point of view is obvious.

Second, and this came as a surprise to me, open-source solutions are preferred over proprietary tools for their simplicity. Although they offer fewer functions than proprietary software, open-source tools seem to provide “just enough” functionality for data integration initiatives. While proprietary software vendors tend to overload their tools with functions that are never used, users seem to need “basic software that works.”

Popular tools and scenarios for data integration

While the open-source ecosystem may offer mature solutions, not all open-source tools are equal, notes.


10 steps to smooth open-source implementations

Today, open source has become standard fare for enterprises; now it’s time for them to get smart about open-source implementations. Although open source can reduce costs and boost innovation, that doesn’t mean an enterprise should deploy open-source software without proper planning.

According to Baseline, there are 10 strategies, proposed by Ray Wang and Bernard Golden, that facilitate a successful open-source implementation. Although they were written without a specific type of software in mind, these best practices seem to apply fully to open-source data integration as well.

Ray Wang

  1. Create a governance program to track who uses open source, for what purposes, and how the software performs.
  2. Create an open-source review board to evaluate in-company requests to use open-source products.
  3. Thoroughly test the applications.
  4. Maintain separate environments for testing and production.
  5. Select widely supported platforms: open-source platforms with the greatest support are normally the most reliable and mature ones.

Bernard Golden

  6. Keep abreast of release changes. Open-source applications are updated often, and you need to know about new features and capabilities as soon as they are released.
  7. Upgrade only when needed. It’s not necessary to upgrade with every release; focus on key requirements and upgrade only when they call for it, for example, when security updates appear.
  8. Be active in communities. Open source succeeds because people are improving the software all the time, so an active approach on the users’ part is key to success.
  9. Submit any revisions of open-source code to the community for review so that they can later be included in the mainline code base.
  10. Share successful strategies. Successful adoption of open source builds on the best practices and experiences of others.

In fact, community involvement is a crucial point. Enterprises can get a lot more out of open source if they put more into it. Rather than having thousands of enterprises modify open-source projects in isolation, contributing code back and getting involved in the relevant communities would help enterprises coordinate and pool resources across industries.


Open source to become mainstream?

According to multiple predictions and publications, 2010 is going to be quite an interesting year for open-source data integration. A recent survey by Gartner revealed that about 11% of organizations dealing with data integration have evaluated open-source tools alongside commercial products. Furthermore, the analyst firm predicts that production deployments based on open-source business intelligence tools “will grow five-fold through 2012.”

As you may remember, in October Gartner proclaimed open-source solutions “good enough” for BI, and a bit later it (at last!) mentioned open-source data integration vendors in its Magic Quadrant, thus acknowledging that OSS can be mature enough to meet enterprise requirements.

The two studies, by Third Nature and Gartner, complement each other brilliantly.

The reasons for software evaluation failures

Although there is still some talk about the need to have skilled developers at hand for support and maintenance, open-source data integration tools seem to be moving closer to the mainstream, rather than remaining just a cheap alternative (with limited capabilities) to proprietary data integration solutions.

Proprietary BI and data integration vendors seem to acknowledge this: according to Gartner, some of them have introduced free “starter editions” of their solutions. All this gives us hope that the days when open-source data integration tools were regarded as offerings only for SMBs are passing, and that these tools are earning the right to be deployed in large enterprises alongside commercial BI solutions.


Want more? See the slides!

At LinuxWorld 2008 in San Francisco, Jason Pratt of Autodesk and Renat Khasanshyn of Apatar presented a real-life customer story about adopting open source at a large enterprise. The session called “Case Study: Professional Open Source at Autodesk” focused on how Transocean utilized Buzzsaw and Apatar to solve its integration needs.

This is Mark Madsen’s presentation of the study by Third Nature.



This post was written by Olga Belokurskaya and edited by Alex Khizhniak.