Does your company have high-quality data?
- 48% do not have a data quality plan
- 78% need education to maintain data quality
- 44% believe data quality is worse than everyone thinks
Source: Data Warehousing Institute 2001 Survey of 647 companies
Today, companies compete on their ability to absorb and respond to information, not just on their ability to manufacture and distribute products. The world has moved from an industrial economy to an information economy in which information is the new currency. In this environment, data is a critical raw material for success, and poor data quality in one area can significantly impact the entire business.
Data is used to generate the information assets and reports that form the basis for strategic plans and actions. Poor data quality, when not identified and corrected, propagates to all downstream reports, increasing costs and producing imprecise forecasts and poor decisions. The Data Warehousing Institute (TDWI) estimates that poor-quality customer data costs US businesses $611 billion a year. Its report cites the real-life example of an insurance company that processes 2 million claims per month, with 377 data elements per claim. Even at an error rate of 0.001, the claims data contain more than 754,000 errors each month. The cost per error can be conservatively estimated at $10, covering staff time to fix the error downstream, the loss of customer trust, and the cost of incorrect payouts (both over- and underpayments). Even at this conservative per-error estimate, the company's risk exposure exceeds $7.5 million a month, from claims data alone.

Larry English, a leading data quality expert, writes that the business costs of low-quality data, including irrecoverable costs, rework of products and services, workarounds, and lost and missed revenue, may be as high as 10-20% of an organization's revenue or total budget.
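The insurance example above is simple enough to check with back-of-the-envelope arithmetic. The sketch below, in Python, uses only the figures from the text (the $10 cost per error is the report's conservative assumption):

```python
# Risk-exposure estimate from the TDWI insurance example.
# All inputs come from the figures quoted in the text.

claims_per_month = 2_000_000
elements_per_claim = 377
error_rate = 0.001        # one bad value per thousand data elements
cost_per_error = 10       # dollars: rework, lost trust, incorrect payouts

errors_per_month = claims_per_month * elements_per_claim * error_rate
monthly_exposure = errors_per_month * cost_per_error

print(f"errors per month:  {errors_per_month:,.0f}")    # 754,000
print(f"monthly exposure: ${monthly_exposure:,.0f}")    # $7,540,000
```

Note that even a tiny error rate, multiplied across millions of records, produces a large absolute number of defects.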
Data cleansing is one of the first steps in building a high-quality, reliable data warehouse. Each source system may define specific items, such as revenue, in its own way, which causes inconsistencies when the systems are linked. Data cleansing identifies these differences and enforces a consistent definition so that data from different sources can be aligned.
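As a minimal sketch of what "enforcing a consistent definition" can look like, suppose (hypothetically) that one source system reports revenue gross of sales tax while another reports it net. A cleansing step can map both to a single canonical definition before loading; the system names, tax rate, and field names below are illustrative, not from the article:

```python
# Hypothetical: system_a reports revenue gross of sales tax,
# system_b reports it net. Convert both to the canonical (net) form.

TAX_RATE = 0.07  # assumed flat sales-tax rate, for illustration only

def to_net_revenue(record: dict) -> dict:
    """Return a copy of the record with revenue in the canonical (net) form."""
    cleaned = dict(record)
    if record["source"] == "system_a":  # gross of tax: strip the tax out
        cleaned["revenue"] = round(record["revenue"] / (1 + TAX_RATE), 2)
    # system_b already reports net revenue; nothing to convert
    return cleaned

rows = [
    {"source": "system_a", "revenue": 1070.00},
    {"source": "system_b", "revenue": 1000.00},
]
print([to_net_revenue(r)["revenue"] for r in rows])  # [1000.0, 1000.0]
```

After cleansing, a warehouse query that sums revenue across both sources compares like with like.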
Data cleansing is also key to resolving simple inconsistencies such as naming conventions. Customers, parts, or other data types may be recorded incompatibly in different source systems. For example, XYZ Energy Company may be listed as such in one system but as XYZ Co. in another. Seemingly simple disparities like these can degrade the quality of data across the entire warehouse.
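One common cleansing technique for this problem is to normalize names (lowercase, strip punctuation and common corporate suffixes) and then fuzzy-match what remains. The sketch below uses Python's standard-library `difflib`; the suffix list and threshold are illustrative assumptions that would need tuning on real data, and heavier abbreviations (such as "XYZ Co." for "XYZ Energy Company") need richer matching than this:

```python
# Sketch: match entity names across source systems by normalizing
# and then fuzzy-comparing. Suffix list and threshold are illustrative.
import re
from difflib import SequenceMatcher

SUFFIXES = {"co", "company", "inc", "corp", "corporation", "llc"}

def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop common corporate suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", "", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)

def same_entity(a: str, b: str, threshold: float = 0.85) -> bool:
    """True when the normalized names are similar enough to merge."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

print(same_entity("XYZ Energy Company", "XYZ Energy Co."))  # True
print(same_entity("XYZ Energy Company", "ABC Gas Corp"))    # False
```

In practice a cleansing step like this feeds a review queue: confident matches are merged automatically, borderline ones are flagged for a human.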
How is the data quality in your company? Progress Energy asked the same question earlier this year. Student teams in Dr. Payton’s Database Management course spent their spring semester identifying improvement areas in PE’s data warehouse. You can find the results of their research, as well as many other interesting projects, on our student projects page. The Data Cleansing Prototype for Progress Energy project tackles many of the issues discussed above, and offers a method and recommendations for data cleansing.