Data Quality

Data Quality (DQ) functionality is often overlooked in data warehouse (DW) system design, which is potentially a serious mistake: the quality of business intelligence (BI) reports depends directly on the quality of the data they present.

DQ is most commonly associated with Data Integration and Data Warehousing, largely because corporate reporting systems are mostly built on data warehouses, for which data quality is one of the key success factors. The DQ module enables data quality control in the DW system, using several types of business and information rules to monitor the data.

The DQ module includes the following components:

  • Metadata repository – base and external tables for data quality evaluation, data rules and dependent data
  • Module for data quality control and logging of results
  • BI graphical interface for monitoring rule execution and reviewing evaluation results
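As an illustration only, the metadata repository could be modelled as a few relational tables: one holding rule definitions (scope, error condition, critical threshold, active flag) and two holding run-level and row-level results. The table and column names below (`dq_rule`, `dq_run_log`, `dq_error_log`) are hypothetical and not taken from the module itself; SQLite is used only to keep the sketch self-contained.

```python
import sqlite3

# Hypothetical repository schema: rule definitions plus run-level and row-level logs.
REPOSITORY_DDL = """
CREATE TABLE dq_rule (
    rule_id         INTEGER PRIMARY KEY,
    target_table    TEXT NOT NULL,     -- table the rule checks (its scope)
    description     TEXT NOT NULL,
    error_predicate TEXT NOT NULL,     -- SQL condition that marks a row as incorrect
    threshold_type  TEXT NOT NULL CHECK (threshold_type IN ('COUNT', 'PERCENT')),
    threshold       REAL NOT NULL,     -- permitted number/percentage of incorrect entries
    is_active       INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE dq_run_log (
    run_id       INTEGER,
    rule_id      INTEGER REFERENCES dq_rule(rule_id),
    checked_rows INTEGER,
    error_rows   INTEGER,
    status       TEXT                  -- e.g. 'OK' or 'FAILED'
);
CREATE TABLE dq_error_log (
    run_id  INTEGER,
    rule_id INTEGER REFERENCES dq_rule(rule_id),
    row_key TEXT                       -- identifies the incorrect entry
);
"""


def create_repository(conn: sqlite3.Connection) -> None:
    """Create the hypothetical repository tables in the given database."""
    conn.executescript(REPOSITORY_DDL)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_repository(conn)
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    print(names)  # ['dq_error_log', 'dq_rule', 'dq_run_log']
```

Keeping the threshold and active flag next to the rule definition is what lets a graphical interface list, document and toggle rules without touching any scripts.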

Compared to the “manual” approach, in which data quality is checked only occasionally (mostly with SQL scripts), the specialized DQ module offers the following advantages:

  • The rules are documented in the repository and can be viewed through the graphical interface
  • Ready-made templates for all rule types make rule entry quicker and easier
  • Each rule has a defined scope and critical threshold (the number or percentage of incorrect entries permitted), which determine its output evaluation status
  • Rules can easily be activated or deactivated
  • DQ evaluation produces output statistics, per-table and per-rule statuses, and detailed logs of incorrect entries
  • Rules are evaluated and logged in bulk rather than sequentially, which results in high performance
  • The monitoring module contains a set of predefined operational reports and interactive monitoring panels with advanced analytical capabilities
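The threshold-based status and the bulk logging described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the module's implementation: each rule is assumed to be stored as a SQL condition that is true for incorrect rows, all names (`Rule`, `evaluate_rule`, `dq_run_log`, `dq_error_log`) are hypothetical, and "in bulk" is read here as one set-based `INSERT ... SELECT` per rule instead of a row-by-row loop.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Minimal hypothetical log tables so the sketch runs on its own.
LOG_DDL = """
CREATE TABLE dq_run_log (run_id INTEGER, rule_id INTEGER,
                         checked_rows INTEGER, error_rows INTEGER, status TEXT);
CREATE TABLE dq_error_log (run_id INTEGER, rule_id INTEGER, row_key TEXT);
"""


@dataclass
class Rule:
    rule_id: int
    target_table: str
    key_column: str
    error_predicate: str   # SQL condition that is true for incorrect rows
    threshold_type: str    # 'COUNT' or 'PERCENT'
    threshold: float       # permitted incorrect entries
    is_active: bool = True


def evaluate_rule(conn: sqlite3.Connection, rule: Rule, run_id: int) -> Tuple[int, int, str]:
    """Evaluate one rule set-wise and return (checked_rows, error_rows, status)."""
    # Rule SQL is interpolated directly; acceptable only because rule
    # definitions are assumed to come from the trusted repository.
    # A single statement logs every incorrect row at once.
    conn.execute(
        f"INSERT INTO dq_error_log (run_id, rule_id, row_key) "
        f"SELECT ?, ?, CAST({rule.key_column} AS TEXT) "
        f"FROM {rule.target_table} WHERE {rule.error_predicate}",
        (run_id, rule.rule_id),
    )
    checked = conn.execute(f"SELECT COUNT(*) FROM {rule.target_table}").fetchone()[0]
    errors = conn.execute(
        "SELECT COUNT(*) FROM dq_error_log WHERE run_id = ? AND rule_id = ?",
        (run_id, rule.rule_id),
    ).fetchone()[0]
    # Compare the error volume with the critical threshold (count or percentage).
    if rule.threshold_type == "PERCENT":
        measured = 100.0 * errors / checked if checked else 0.0
    else:
        measured = errors
    status = "OK" if measured <= rule.threshold else "FAILED"
    conn.execute(
        "INSERT INTO dq_run_log (run_id, rule_id, checked_rows, error_rows, status) "
        "VALUES (?, ?, ?, ?, ?)",
        (run_id, rule.rule_id, checked, errors, status),
    )
    return checked, errors, status


def evaluate_all(conn: sqlite3.Connection, rules: List[Rule],
                 run_id: int) -> Dict[int, Tuple[int, int, str]]:
    """Evaluate only active rules; deactivated rules are skipped entirely."""
    return {r.rule_id: evaluate_rule(conn, r, run_id) for r in rules if r.is_active}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(LOG_DDL)
    conn.executescript("""
        CREATE TABLE customer (id INTEGER, email TEXT, age INTEGER);
        INSERT INTO customer VALUES (1, 'a@x.com', 34), (2, NULL, 41),
                                    (3, 'c@x.com', -5), (4, NULL, 28);
    """)
    rules = [
        Rule(1, "customer", "id", "email IS NULL", "PERCENT", 10.0),
        Rule(2, "customer", "id", "age < 0", "COUNT", 1),
        Rule(3, "customer", "id", "age > 120", "COUNT", 0, is_active=False),
    ]
    print(evaluate_all(conn, rules, run_id=1))
    # {1: (4, 2, 'FAILED'), 2: (4, 1, 'OK')}
```

Rule 1 fails because 2 of 4 rows (50%) exceed its 10% threshold, rule 2 passes because its single error is within the permitted count, and rule 3 is never run because it is deactivated. The per-row keys in `dq_error_log` correspond to the detailed logging of incorrect entries, while `dq_run_log` holds the statistics and statuses.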

Features and benefits overview

  • Business Entities & Rules
    • Defined by Business Users
    • Rules Based on Business Terms
    • Horizontal & Vertical Rules
  • Configs/Mappings
    • Mappings from Business Terms to Physical Data Dictionary
    • Support for Multiple Databases – Reusability
    • Errors/Actions Configurations
    • Checks Executed on Remote Databases
  • Logs & Stats
    • Summary Info in Central Repository
    • Detailed Info in Local Database
    • Row Level Info on Errors/Warnings
  • Integration/Execution
    • Scheduled Data Validation
    • ETL Initiated Rules Check
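To make the idea of rules based on business terms and mapped to a physical data dictionary more concrete, here is a minimal sketch in which one rule definition is reused across two databases. The rule names, business terms, database names and physical names are all hypothetical. The interpretation of "horizontal" (a check within a single row) and "vertical" (a check across rows of the same attribute, such as uniqueness) is an assumption, not a definition from the module.

```python
from string import Template

# Hypothetical rules written against business terms, not physical names.
# Each query returns the keys of rows that violate the rule.
# Assumed meaning: "horizontal" = within one row, "vertical" = across rows.
RULES = {
    "end_after_start": (
        "horizontal",
        "SELECT $customer_id FROM $customer WHERE $contract_end < $contract_start",
    ),
    "unique_customer_id": (
        "vertical",
        "SELECT $customer_id FROM $customer GROUP BY $customer_id HAVING COUNT(*) > 1",
    ),
}

# Hypothetical per-database mappings from business terms to physical names.
MAPPINGS = {
    "crm_db": {"customer": "crm.customers", "customer_id": "cust_no",
               "contract_start": "start_dt", "contract_end": "end_dt"},
    "dwh_db": {"customer": "dwh.dim_customer", "customer_id": "customer_key",
               "contract_start": "valid_from", "contract_end": "valid_to"},
}


def render_rule(rule_name: str, database: str) -> str:
    """Translate one business-level rule into SQL for the given database."""
    _kind, template = RULES[rule_name]
    # substitute() raises KeyError if a business term has no mapping,
    # which surfaces incomplete mappings before any check is executed.
    return Template(template).substitute(MAPPINGS[database])


if __name__ == "__main__":
    for db in MAPPINGS:
        for name in RULES:
            print(db, name, "->", render_rule(name, db))
```

For example, `render_rule("unique_customer_id", "crm_db")` yields `SELECT cust_no FROM crm.customers GROUP BY cust_no HAVING COUNT(*) > 1`. Keeping the rule text free of physical names is what makes a single definition reusable on several (including remote) databases; only the mapping changes per target, and the rendered SQL could then be run on a schedule or triggered from an ETL step.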