5 Things to Watch Out for in Data Warehousing

by Alena SemeshkoAugust 26, 2008
Problems don’t come out of nowhere—usually, there are some reasons behind.

Featured image: Oracle Warehouse Builder (credit)


Why create a data warehouse?

In a recent article, we discussed how business intelligence coupled with data warehousing helps to improve decision-making and creates a unified view of an organization, in case it is done properly. Today, let’s focus deeper on the role of a data warehouse in this process.

The post by Rajan Gupta in BeyeBlogs gets to the core of data warehousing and explores what you need to enhance your ROI. According to Rajan, there are several reasons for using a data warehouse as a single reference information.

1. Maintain consistency.
2. If your production data needs an offline fix (like standardizing customer and product IDs), it’s better to do that data-fix in one place. If you have separate enterprise reporting and analysis platforms, you will need to do that data transformation at two places, instead of one.
3. Data auditability. A single information reference point having detailed data will provide a good audit-trail of your summary transactions/analysis.
4. ETL synergy. If you have diverse systems, and you want to have some level of information integration, its better to do it at one place. Doing ETL for summary data warehouse and a detailed reporting database, will almost double your efforts.
5. Overall platform ease. You maintain only one information infrastructure (administration, scheduling, publishing, performance tuning…).
6. Ease of change management. Any change in your information requirements, or changes in your source systems will be managed and done at one place.

So, then, with all these benefits, why is there so much fuss about granular data in a data warehouse? “Having most granular (or detailed) transaction-level data is core to broad-basing the data warehouse applications,” Rajan says. However, he enlists some of the releated challenges and demands:

1. Brings forth the real issues with transactional data. In summary data warehouses, you can ignore some of the transaction-level data issues and do some patch-work to ensure that aggregated data has a level of acceptable quality. Bringing in granular data, will need more incisive surgery on your data issues. This will extend the time of implementation.
2. ETL efforts go up. This is related to the first point. Your key plumbing task in DW will become larger and more complex.
3. Existing robust and stable reporting and querying platforms. Why fix what ain’t broken? etc.


Things to pay attention to

So, yes, there have been talks of the concept of data warehousing being misleading, failing to deliver efficient solutions at the enterprise level and frequently causing problems upon implementation. That’s why I’ll try to sum up a few things you should definitely try to watch out for when tackling your data warehouses.

1) First and foremost: data quality. When your data is dirty, outdated and/or inconsistent upon entering the warehouse, the results you are gonna get won’t be any better, really. Data warehousing is not supposed to deal with your erroneous data, it’s not supposed to perform data cleansing. These processes need to take place BEFORE your data gets even close to the warehouse, that is, your data integration strategy needs to address low-quality data problem.

A sample architecture for a complex data warehouse (Image credit: Oracle)

2) Come to think of it, data integration is the second thing to watch out for. Do your integration tools live up to your requirements? Can your software handle the data volumes you have? Will it comply with the newly added to your warehouse source systems and subject areas? How high is the level of automation of your integration system? Can you avoid manual intervention? You gotta ask yourself all of these questions before you complain that your warehouse isn’t providing you with the quality of information you expected.

3) Next, dreaming too big. When you build sand castles, you gotta realize they’ll disappear in a matter of days, or even hours. Your can’t have it all and, at the same time, you can’t have your pie and eat it, too. Breaking the project into small segments, giving them enough time to deliver and having patience is the key to having a pleasant experience with your data warehousing solution. What? Did you think you can fix all the mess in your data in a matter of days? =)

4) Then, don’t go rushing into solutions. Don’t panic. Yes, warehouse projects require time and effort on your part. Yes, it’s gonna be complicated at first. But that’s not the reason to stop with one project and rush into another. Stick with your first choice, fix it, work on it. Multiple projects will waste your resources and end up as another silo aimlessly taking up your corporate resources.

5) Finally, make sure you have a scalable architecture that you can redesign according to your increasing needs. Your business grows, sometimes grows quicker than you think (the number of customers increases, they have more information, more data to be processed) and you want your solution to continue to perform on the same level and live up to your expectations.

The list goes on actually, as there are more things to watch out for…but these are the first that come to mind.


Success metrics in data warehousing

Recently, a question was posed to a SearchDataManagement.com expert as to what metrics should be used for a data warehousing project.

William McKnight from Lucidity Consulting recommended the following three as most valuable:

William McKnight

  1. Business return on investment (ROI). Are you getting the bottom line success with your project?
  2. Data usage. Is your data used as intended by the users?
  3. Data gathering and availability. Is your data available to the extent it should be?

He also mentioned up time, cycle end times, successful loads and clean data levels as secondary technical metrics to pay attention to.

“In short, you want to eliminate intolerable defects—as defined by the data stewards. These defects come in 10 different categories: referential integrity, uniqueness/deduplication, cardinality, subtype/supertype constructs, value domains/bounds, formatting errors, contingency conditions, calculations, correctness and conformance to ‘clean’ set of values.” —William McKnight

With all these recomendations, your data warehouse has chances to serve your enterprise needs.


Further reading


The post is written by Alena Semeshko, with contributions from Alex Khizhniak.