One common approach to improving data quality begins with system-based logic, typically designed to find and eliminate two types of data quality issues: duplicates and errors. This approach can eliminate both types quickly, but whereas identifying duplicates can be a straightforward comparison, identifying errors is much more intricate. To paraphrase the adage, one group’s trash is another group’s treasure.
Improve the chances of success for the data quality improvement initiative by following the three R’s:
Define the characteristics of good quality data to discover poor quality data.
Organizations often begin their quest to improve data quality by focusing on those areas that can be addressed through information technology. Writing code to find the occurrences of uncommon letter combinations, like “uu”, is quick and effective, and implementing logic to exclude “vacuum” from the results can yield very specific results that can be resolved en masse.
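As an illustration of the kind of code described above, the following is a minimal sketch that flags values containing the letter combination "uu" while skipping words where it is legitimate. The function name `find_suspect_values`, the exclusion list `ALLOWED_WORDS`, and the sample records are assumptions made for this example, not part of any particular product or library.

```python
import re

# Hypothetical exclusion list: words in which "uu" is a legitimate spelling.
ALLOWED_WORDS = frozenset({"vacuum", "continuum"})

def find_suspect_values(values, pattern="uu", allowed=ALLOWED_WORDS):
    """Return (index, value) pairs where some word contains `pattern`
    and that word is not in the `allowed` exclusion list."""
    suspects = []
    for index, value in enumerate(values):
        # Split the value into lowercase alphabetic words.
        words = re.findall(r"[a-z]+", value.lower())
        if any(pattern in word and word not in allowed for word in words):
            suspects.append((index, value))
    return suspects

# Made-up sample data for illustration only.
records = ["Vacuum cleaner", "Bluue widget", "Continuum Ltd", "Suumit Road"]
suspects = find_suspect_values(records)
for index, value in suspects:
    print(index, value)
```

The exclusion list only covers exact words, so a plural such as "vacuums" would still be flagged. Whether that counts as an error is exactly the kind of question that needs agreement beyond the code.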
Issues arise when partner organizations disagree on the definition of poor data quality.
Focusing consensus-building efforts on the characteristics of high-quality data improves the chances of successfully reaching data quality goals. As good data quality is defined and measured across the enterprise, any records that fall outside those parameters become part of the ongoing data quality conversation.
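One way to picture this is to express the agreed characteristics of good data as named rules and to set aside any record that fails at least one of them. The sketch below assumes records are simple dictionaries. The rule names, field names (`customer_id`, `postal_code`, `balance`), and thresholds are hypothetical placeholders for whatever the enterprise actually agrees on.

```python
def _valid_postal_code(record):
    # Assumed rule: a postal code is exactly five digits.
    code = str(record.get("postal_code", ""))
    return len(code) == 5 and code.isdigit()

def _non_negative_balance(record):
    # Assumed rule: balance must be a number that is zero or greater.
    balance = record.get("balance")
    return isinstance(balance, (int, float)) and balance >= 0

# Hypothetical rules: each maps a name to a check that returns True
# when the record meets that characteristic of good quality data.
GOOD_QUALITY_RULES = {
    "has_customer_id": lambda record: bool(record.get("customer_id")),
    "valid_postal_code": _valid_postal_code,
    "non_negative_balance": _non_negative_balance,
}

def failed_rules(record, rules=GOOD_QUALITY_RULES):
    """Return the names of the rules this record does not satisfy."""
    return [name for name, check in rules.items() if not check(record)]

def split_by_quality(records, rules=GOOD_QUALITY_RULES):
    """Separate records that meet every rule from those that fall outside."""
    good, outliers = [], []
    for record in records:
        failures = failed_rules(record, rules)
        if failures:
            outliers.append((record, failures))
        else:
            good.append(record)
    return good, outliers

# Made-up sample data for illustration only.
sample = [
    {"customer_id": "C1", "postal_code": "30301", "balance": 10.0},
    {"customer_id": "", "postal_code": "3030", "balance": 5},
    {"customer_id": "C3", "postal_code": "94105", "balance": -2},
]
good, outliers = split_by_quality(sample)
```

Keeping the failed rule names alongside each outlier gives the later conversations something concrete to discuss: not just that a record is outside the parameters, but which parameter it missed.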
The review process includes a discussion of each set of data that falls outside the parameters of good quality data. Categorize the data quality outliers, then prioritize the cross-organizational conversations to focus on the categories that will have the greatest impact on your company, whether that impact is measured in achievement of organizational objectives, effect on the bottom line, or volume.
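The categorize-then-prioritize step can be sketched as a simple ranking. The example below assumes each outlier has already been assigned a category and that the organization has chosen a relative impact weight per category. The function `prioritize_categories`, the category names, and the weights are all hypothetical, and multiplying volume by weight is only one of many reasonable scoring choices.

```python
from collections import defaultdict

def prioritize_categories(outliers, impact_weights):
    """Rank outlier categories for review.

    outliers: iterable of (record_id, category) pairs.
    impact_weights: dict mapping category to a relative impact weight;
        categories without a weight default to 1.0.
    Returns a list of (category, volume, score), highest score first.
    """
    volume = defaultdict(int)
    for _record_id, category in outliers:
        volume[category] += 1
    ranked = [
        (category, count, count * impact_weights.get(category, 1.0))
        for category, count in volume.items()
    ]
    ranked.sort(key=lambda row: row[2], reverse=True)
    return ranked

# Made-up sample data for illustration only.
categorized = [
    (1, "missing_id"),
    (2, "bad_postal_code"),
    (3, "bad_postal_code"),
    (4, "negative_balance"),
]
weights = {"missing_id": 5.0, "negative_balance": 3.0, "bad_postal_code": 1.0}
ranked = prioritize_categories(categorized, weights)
for category, count, score in ranked:
    print(category, count, score)
```

In this sample, the highest-volume category ranks last because its assumed impact weight is low, which shows why volume alone may not be the right ordering for the review conversations.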
As the review process progresses, some portion of each data quality category may not be resolved definitively. This remaining data becomes the subject of future conversations, prioritized by the same guidelines.
The objective of the review sessions is to resolve the discrepancies represented within the identified categories. Data quality issues are resolved in a variety of ways, but the resolutions fall into two categories:
- Business process changes
- Technology system changes
Chances are, the resolutions from your review sessions will include both types of changes. It is important to understand not only the difference between the two, but also how they work together to support organizational goals and achieve data quality objectives.
Understanding the dependencies between reaching data quality goals and both the business process and technology system changes will help to limit the risk of initiative failure due to incomplete implementation of the resolution activities. Data quality objectives are unlikely to be reached if the technical portion of the resolution is implemented without the accompanying deployment of business process changes. Likewise, rolling out the business process changes without the supporting technology is unlikely to result in successful completion of the data quality initiative.