Data Quality Metrics in Data Warehousing

by Alena Semeshko, September 2, 2008

An expert was asked which metrics should be used to measure a data warehousing project.

The expert, William McKnight of Lucidity Consulting, named the following three as the most valuable:

1. Business return on investment (ROI): Is the project delivering bottom-line success?
2. Data usage: Are users using the data as intended?
3. Data gathering and availability: Is the data available to the extent it should be?

He also named uptime, cycle end times, successful loads, and clean data levels as secondary technical metrics worth monitoring.
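To make these secondary metrics more concrete, here is a minimal Python sketch that summarizes a series of load cycles. The `LoadRun` record, its fields, the sample runs, and the choice to derive uptime from total downtime minutes are all hypothetical, chosen only for illustration; in practice these figures would come from whatever scheduler and monitoring tools the warehouse uses.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LoadRun:
    """One load cycle (hypothetical record layout, for illustration only)."""
    cycle_end: datetime  # when the cycle finished
    succeeded: bool      # True if the load completed without errors
    rows_loaded: int     # rows written to the warehouse
    rows_clean: int      # loaded rows that passed all data-quality checks


def secondary_metrics(runs: list[LoadRun], downtime_minutes: float,
                      period_minutes: float) -> dict:
    """Summarize successful loads, clean data level, cycle end and uptime."""
    if not runs or period_minutes <= 0:
        raise ValueError("need at least one run and a positive period")
    successful = [r for r in runs if r.succeeded]
    loaded = sum(r.rows_loaded for r in successful)
    clean = sum(r.rows_clean for r in successful)
    return {
        # Share of load cycles that completed successfully.
        "successful_load_rate": len(successful) / len(runs),
        # Share of loaded rows that passed the quality checks.
        "clean_data_level": clean / loaded if loaded else 0.0,
        # Most recent cycle end time, to compare against the load window.
        "latest_cycle_end": max(r.cycle_end for r in runs),
        # Fraction of the reporting period the warehouse was available.
        "uptime": 1 - downtime_minutes / period_minutes,
    }


# Hypothetical sample: three nightly cycles, one of which failed.
runs = [
    LoadRun(datetime(2008, 9, 1, 4, 30), True, 1000, 990),
    LoadRun(datetime(2008, 9, 2, 5, 10), False, 0, 0),
    LoadRun(datetime(2008, 9, 3, 4, 45), True, 1000, 970),
]
m = secondary_metrics(runs, downtime_minutes=72, period_minutes=3 * 24 * 60)
print(m)
```

With these sample runs, two of three cycles succeed and 1,960 of 2,000 loaded rows are clean, giving a clean data level of 0.98. The clean data level is computed only over successful loads, since a failed cycle contributes no rows.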

In short, you want to eliminate intolerable defects, as defined by the data stewards. These defects fall into 10 categories: referential integrity, uniqueness/deduplication, cardinality, subtype/supertype constructs, value domains/bounds, formatting errors, contingency conditions, calculations, correctness, and conformance to a “clean” set of values.
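As a minimal sketch, assuming a hypothetical orders table that references a customer dimension, the Python below checks rows against five of the ten categories (referential integrity, uniqueness, value bounds, formatting, and conformance to a clean value set). It then flags a category as intolerable when its defect count exceeds a steward-defined tolerance. The column names, allowed statuses, quantity bounds, and tolerance numbers are illustrative assumptions, not part of McKnight's recommendation.

```python
import re
from collections import Counter

# Hypothetical dimension keys and fact rows; column names are illustrative.
customer_ids = {"C1", "C2", "C3"}
orders = [
    {"order_id": "O1", "customer_id": "C1", "qty": 5, "order_date": "2008-09-01", "status": "SHIPPED"},
    {"order_id": "O2", "customer_id": "C9", "qty": 2, "order_date": "2008-09-01", "status": "SHIPPED"},
    {"order_id": "O2", "customer_id": "C2", "qty": -1, "order_date": "09/01/2008", "status": "shiped"},
]

VALID_STATUSES = {"PENDING", "SHIPPED", "CANCELLED"}  # assumed "clean" value set
DATE_FORMAT = re.compile(r"\d{4}-\d{2}-\d{2}")        # assumed ISO-style date shape


def count_defects(rows, dim_keys):
    """Count defects per category for five of the ten categories."""
    id_counts = Counter(r["order_id"] for r in rows)
    defects = Counter()
    for r in rows:
        # Referential integrity: foreign key must exist in the dimension.
        if r["customer_id"] not in dim_keys:
            defects["referential_integrity"] += 1
        # Uniqueness: every row sharing a duplicated order_id counts.
        if id_counts[r["order_id"]] > 1:
            defects["uniqueness"] += 1
        # Value domain/bounds: quantity must fall in an assumed range.
        if not 1 <= r["qty"] <= 1000:
            defects["value_bounds"] += 1
        # Formatting: shape check only; does not validate the calendar date.
        if not DATE_FORMAT.fullmatch(r["order_date"]):
            defects["formatting"] += 1
        # Conformance: value must belong to the clean set.
        if r["status"] not in VALID_STATUSES:
            defects["conformance"] += 1
    return defects


# Hypothetical steward tolerances: maximum defects allowed per category.
TOLERANCES = {"referential_integrity": 0, "uniqueness": 0,
              "value_bounds": 1, "formatting": 5, "conformance": 0}

defects = count_defects(orders, customer_ids)
intolerable = {cat: n for cat, n in defects.items() if n > TOLERANCES.get(cat, 0)}
print(dict(defects))
print(intolerable)
```

Here the stewards tolerate a stray bad date or quantity but not orphaned keys, duplicate orders, or off-list statuses. The remaining categories, such as cardinality, contingency conditions, and calculations, depend on business rules and would need their own checks.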