Open-Source Data Warehousing: Pros and Cons

by Olga Belokurskaya, June 24, 2009
Getting the most out of an open-source data warehouse is harder than it seems. Learn about the hidden issues along with the benefits.

(Featured image credit: Oracle)


The reasons to build and maintain a DW

First, let’s remember what a data warehouse (DW) is and why it may be useful for a business. A data warehouse is an electronically stored repository of an organization’s data, designed to facilitate reporting and analysis. In the broader sense, the term covers more than data storage: the means to retrieve and analyze data, to extract, transform, and load it, and to manage the data dictionary are also considered essential components of a data warehousing system.

Today, data warehousing is a popular approach, and many businesses rely on it. However, not every system suits every business setting. So when thinking about implementing a data warehouse, one should weigh its pros and cons.

The major benefits of data warehousing are enhanced access to data and information, and easier reporting and analysis. In addition:

  • Data retrieval is faster within data warehouses.
  • Prior to loading data into the data warehouse, inconsistencies are identified and resolved.
  • Data warehouses can work in conjunction with operational business applications, such as CRM systems, and hence enhance their value.
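
The second point, resolving inconsistencies before loading, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample rows, the `COUNTRY_ALIASES` mapping, the `customer_revenue` table, and the use of an in-memory SQLite database as a stand-in for a warehouse. A real pipeline would typically rely on a dedicated ETL tool.

```python
import sqlite3

# Hypothetical rows exported from an operational system (for example, a CRM).
RAW_ROWS = [
    {"customer_id": "101", "country": "us", "revenue": "1200.50"},
    {"customer_id": "102", "country": "USA", "revenue": "300"},
    {"customer_id": "101", "country": "US", "revenue": "1200.50"},  # duplicate customer
    {"customer_id": "103", "country": "de", "revenue": "n/a"},      # unparsable revenue
]

# Hypothetical mapping that unifies inconsistent country spellings.
COUNTRY_ALIASES = {"usa": "US", "us": "US", "de": "DE"}


def transform(rows):
    """Normalize values, skip duplicates, and set aside rows that cannot be parsed."""
    clean, rejected, seen = [], [], set()
    for row in rows:
        try:
            revenue = float(row["revenue"])
        except ValueError:
            rejected.append(row)  # keep for manual review instead of loading
            continue
        customer_id = int(row["customer_id"])
        if customer_id in seen:
            continue  # the same customer was already accepted
        seen.add(customer_id)
        country = COUNTRY_ALIASES.get(row["country"].lower(), row["country"].upper())
        clean.append((customer_id, country, revenue))
    return clean, rejected


def load(conn, rows):
    """Load the cleaned rows into the stand-in warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_revenue "
        "(customer_id INTEGER PRIMARY KEY, country TEXT, revenue REAL)"
    )
    conn.executemany("INSERT INTO customer_revenue VALUES (?, ?, ?)", rows)
    conn.commit()


clean_rows, rejected_rows = transform(RAW_ROWS)
warehouse = sqlite3.connect(":memory:")
load(warehouse, clean_rows)

# Reporting runs against the cleaned data only.
total_by_country = dict(
    warehouse.execute(
        "SELECT country, SUM(revenue) FROM customer_revenue GROUP BY country"
    )
)
```

Because cleaning happens before loading, the reporting query at the end does not have to deal with duplicates or mixed country spellings.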

Data warehousing architecture (Image credit)

And here are some cons:

  • Preparation is often time-consuming, since effort is needed to create a cohesive, compatible system of data collection, storage, and retrieval. Moreover, because data must be extracted, transformed, and loaded into the warehouse, there is an element of latency in data warehouse information.
  • Compatibility with existing systems. Data warehousing technology may require a company to modify the database system already in place. Given the cost of the computer systems and software needed, this may be the foremost concern for businesses adopting the model.
  • Security flaws that data warehousing technology may contain. If the database contains sensitive information, its use may be restricted to a limited group of people, and precautions will be required to ensure that access is not compromised. Limited data access can also affect the overall utilization of the data strategy.
  • Over their life, data warehouses can have high costs. A data warehouse is usually not static: it gets outdated and needs regular maintenance, which may be quite costly.

So, before any implementation, one should make sure that data warehousing will be a good fit for the business and be prepared to commit to the level of work required to get the system in place. However, once a data warehouse starts working, most companies are glad to have their “corporate memory.”

Some of the ways to improve the performance of a data warehouse (Image credit)


Why an open-source data warehouse?

Open-source data warehouses share the characteristics of any other open-source software: the same licensing model, community development processes, and degree of openness. They may be offered as free downloads or, for a nominal flat fee, as fully supported systems. There may also be no limit to the number of licenses and implementations a company may make with the software.

According to a recent BeyeNetwork article, the benefits of the open-source data warehouses are as follows.

  • Upfront and maintenance costs are lower than those of proprietary software. In addition, companies can customize the products they use to improve their operations, because the original source code is open and may be downloaded.
  • Skill sets that are widely available in the market are employed. As a result, an organization with existing database or data warehouse expertise will not have to look further when a new open-source data warehouse project is put into place.
  • Improved standardization. Transparent, community-supported open-source code ensures that important standards are consistently supported across all versions and implementations, something that proprietary formats cannot and will not offer.
  • Flexibility, which enables enterprises to expand the solutions to an unlimited number of users without the per-user or per-processor charges of proprietary software packages.
  • Community effect. Open-source solutions leverage communities of developers and innovators to advance development. New code and features are contributed back to the community, constantly increasing the range of options available to end users. Moreover, companies may turn to the community to fix bugs or security flaws, which normally takes only days, instead of waiting weeks or months for the next security patch or service pack from a vendor.
  • Incremental implementation. There is no need to launch a mega project at once. Projects can start small and build upon the success of earlier implementations. This curbs the tendency to “overpromise,” which is often a necessary evil for acquiring optimal levels of funding for data warehouse projects.


Things to note

Quite some time has passed since open-source data warehouses evolved and gained popularity. However, an open-source data warehouse is still regarded as a solution for small or mid-sized companies that lack the budget for solid proprietary solutions. Bigger companies may also use open-source solutions as a complement to their proprietary data warehouses.

Getting the most out of an open-source data warehouse implementation is possible. Still, there are some hidden issues, as mentioned by Claudia Imhoff:

  • Open-source data warehouses that complement existing proprietary enterprise solutions may help quickly address a company’s new needs. Proprietary solutions, being more strategic, are slower to react to such changes.
  • Normally, it’s the analysts who work with data warehouses; they are familiar with building massive queries and other technical tasks. But in some cases, there are end users who don’t have special technical knowledge and need as much ease of use as possible.
  • Open-source data warehouses should be compatible with related open-source environments.
  • While open-source data warehouses may seem cheaper than proprietary solutions at first, additional costs, such as transition and training costs, should be taken into account.
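
The last point, on additional costs, can be made concrete with a rough total-cost comparison. The sketch below is only an illustration: the `CostModel` class and every figure in it are made-up placeholders, not real prices, and come neither from the report nor from any vendor.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    """Hypothetical cost structure of a data warehouse (all values are placeholders)."""
    upfront_license: float = 0.0
    per_user_license_per_year: float = 0.0
    support_per_year: float = 0.0
    transition_one_time: float = 0.0
    training_one_time: float = 0.0

    def total(self, users: int, years: int) -> float:
        """Sum one-time costs and recurring costs over the given period."""
        one_time = self.upfront_license + self.transition_one_time + self.training_one_time
        recurring = (self.per_user_license_per_year * users + self.support_per_year) * years
        return one_time + recurring


# Placeholder figures chosen only to show the shape of the calculation.
proprietary = CostModel(
    upfront_license=50_000,
    per_user_license_per_year=500,
    support_per_year=10_000,
)
open_source = CostModel(
    support_per_year=15_000,
    transition_one_time=40_000,
    training_one_time=20_000,
)

proprietary_total = proprietary.total(users=100, years=3)
open_source_total = open_source.total(users=100, years=3)
```

With these placeholder inputs, the open-source option comes out cheaper for 100 users over three years, but more expensive for 10 users over a single year, because the one-time transition and training costs dominate a small, short deployment. The outcome depends entirely on the inputs, so real figures should be substituted before drawing any conclusion.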

For more, check out her report called “Open Sesame: Why Open Source BI, Data Integration, and Data Warehousing Solutions are Gaining in Acceptance.”


The post is written by Olga Belokurskaya; edited by Alex Khizhniak.