A DataOps Approach to Data Quality
The Growing Complexity of Data Quality
Data quality issues are widespread, affecting organizations across industries, from manufacturing to healthcare and financial services. According to DataKitchen's 2024 market research, conducted with over three dozen data quality leaders, the complexity of data quality problems stems from the diverse nature of data sources, the increasing scale of data, and the fragmented nature of data systems.
Key statistics highlight the severity of the issue:
- 57% of respondents in a 2024 dbt Labs survey rated data quality as one of the three most challenging aspects of data preparation (up from 41% in 2023).
- 73% of data practitioners do not trust their data (IDC).
- Millions are lost annually to poor data quality, and losses could reach billions as AI is integrated without intervention (Forrester).
The challenge is not simply a technical one. Data quality issues often arise because data that is "good enough" for the immediate needs of source systems is insufficient for downstream analysis and decision-making. This disconnect leads to a scenario where data quality leaders are tasked with improving data that was deemed acceptable at its source.
Data Quality Leadership: Influence Without Power
Data quality leaders often find themselves in a position where they can identify problems but lack the authority or resources to drive necessary changes. DataKitchen's research revealed that many data quality leaders are frustrated by their limited ability to enforce changes. These leaders are expected to influence organizational behavior without direct authority, leading to what DataKitchen CEO Christopher Bergh described as "data nags": individuals who know what's wrong but struggle to get others to act.
Data quality leaders need to determine:
- Where the change should occur (source systems, data lakes, or at the point of analysis).
- Who should make the change (data engineers, system owners, or data quality professionals).
- Why the change is necessary (alignment with business objectives or regulatory compliance).
- How the change should be communicated and implemented.
The core issue is that data quality leaders often have influence but little power. Their role is to highlight problems and propose solutions, but the responsibility for actual changes often lies with data engineers or business units.
Methods to Drive Change for Data Quality Leaders
Empowering Through DataOps
The fundamental challenge for data quality leaders is leveraging their influence to drive meaningful change. The DataOps methodology offers a solution by providing a structured, iterative approach to managing data quality at scale. DataOps emphasizes rapid iteration, continuous improvement, and team collaboration, enabling data quality leaders to address issues proactively and systematically.
Agile and Iterative Approach to Data Quality
Traditional approaches to data quality often resemble waterfall project management: detailed plans, lengthy analysis phases, and slow execution. However, this approach struggles to keep up with the pace of modern data environments. DataOps introduces agility by advocating for:
- Measuring data quality early: Data quality leaders should begin measuring and assessing data quality even before perfect standards are in place. Early measurements provide valuable insights that can guide future improvements (a minimal sketch follows this list).
- Iterating quickly: Data quality initiatives should focus on continuous iteration instead of waiting for perfect solutions. DataOps promotes learning by doingโeach iteration provides insights that drive the next round of improvements.
- Continuous feedback loops: DataOps emphasizes short feedback cycles, allowing teams to test data quality improvements and quickly refine them based on real-world outcomes. This iterative approach is particularly valuable in dynamic environments where data constantly changes.
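To make the "measuring early" point concrete, below is a minimal sketch of a first-pass quality measurement in Python with pandas. The file name, column names, and metrics are illustrative assumptions rather than a prescribed standard; the goal is a baseline to iterate on, not a perfect assessment.

```python
import pandas as pd

def quick_quality_check(df: pd.DataFrame) -> dict:
    """Compute a handful of coarse quality metrics as a starting baseline."""
    total = len(df)
    return {
        "row_count": total,
        # Completeness: share of non-null cells across the whole frame.
        "completeness": float(df.notna().mean().mean()),
        # Uniqueness: share of rows that are not exact duplicates.
        "uniqueness": 1.0 - (df.duplicated().sum() / total if total else 0.0),
        # Validity: share of 'updated_at' values that parse as timestamps.
        "validity_updated_at": float(
            pd.to_datetime(df["updated_at"], errors="coerce").notna().mean()
        ),
    }

df = pd.read_csv("orders.csv")  # illustrative dataset
print(quick_quality_check(df))
```

Even a crude baseline like this gives the next iteration something concrete to measure against.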
The DataOps Data Quality Cycle
One of the key takeaways from DataKitchen's research is the need for a structured cycle that empowers data quality leaders to drive improvements even without direct authority. This cycle includes:
- Understanding Data Quality Issues: Data profiling is a crucial first step, allowing data quality leaders to gain a fact-based understanding of their data’s current state. Profiling tools help identify key issues such as missing values, inconsistency, and schema drift.
- Generating Data Quality Scores: DataKitchen's market research highlighted the importance of having a scoring system to measure data quality. Scoring allows data quality leaders to quantify the state of their data and communicate it effectively to stakeholders. Scoring should be multidimensional and configurable to suit different use cases, whether assessing critical data elements (CDEs), DAMA quality dimensions, or machine learning model data.
- Automating Data Quality Tests: Automation is essential for scaling data quality efforts. DataOps encourages the use of automated tests to continuously monitor data quality across multiple dimensions. Automated tests can validate business rules, check for anomalies, and ensure data conforms to expected standards (see the first sketch after this list).
- Enabling Action: Once data quality issues are identified, it's crucial to enable others, such as data engineers or system owners, to take action. This can be achieved by creating actionable packages of test results, data quality scores, and recommendations. Integrating these packages into existing workflow tools ensures that issues are addressed in a timely and organized manner (see the second sketch after this list).
- Measuring and Refining: DataOps is an iterative process. As improvements are made, data quality scores should be tracked over time. This longitudinal tracking allows data quality leaders to demonstrate progress and refine their approaches based on evolving data and business needs.
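To illustrate the automated-testing step, here is a minimal sketch of rule-based tests written as plain Python functions over a pandas DataFrame. The rules, thresholds, and column names ('amount', 'status') are assumptions for illustration, not a specific product's implementation; in practice the same pattern is typically expressed in whatever testing framework the team already uses.

```python
import pandas as pd

# Each test returns (name, passed, detail) so results can feed scoring
# or be packaged for a workflow tool (see the next sketch).
def test_no_negative_amounts(df: pd.DataFrame):
    bad = int((df["amount"] < 0).sum())
    return ("no_negative_amounts", bad == 0, f"{bad} negative values")

def test_status_in_domain(df: pd.DataFrame):
    allowed = {"open", "shipped", "closed"}  # illustrative business rule
    bad = int((~df["status"].isin(allowed)).sum())
    return ("status_in_domain", bad == 0, f"{bad} out-of-domain values")

def test_amount_within_3_sigma(df: pd.DataFrame):
    # Simple anomaly check: flag amounts more than 3 standard deviations from the mean.
    mean, std = df["amount"].mean(), df["amount"].std()
    bad = int(((df["amount"] - mean).abs() > 3 * std).sum())
    return ("amount_within_3_sigma", bad == 0, f"{bad} outliers")

def run_suite(df: pd.DataFrame):
    tests = [test_no_negative_amounts, test_status_in_domain, test_amount_within_3_sigma]
    return [test(df) for test in tests]

results = run_suite(pd.read_csv("orders.csv"))  # illustrative dataset
for name, passed, detail in results:
    print(f"{'PASS' if passed else 'FAIL'} {name}: {detail}")
```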
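And to illustrate the enabling-action step, here is a sketch of how test results and a score might be bundled into an actionable package for an existing workflow tool. The payload shape, recommendation text, and webhook URL are hypothetical placeholders, not a real integration.

```python
import json
import urllib.request

def build_action_package(results, score, dataset):
    """Bundle failing tests, a quality score, and a recommendation into one payload."""
    failures = [{"test": name, "detail": detail}
                for name, passed, detail in results if not passed]
    return {
        "dataset": dataset,
        "score": score,
        "failures": failures,
        "recommendation": "Review failing rules with the source-system owner.",
    }

def send_to_workflow_tool(package, url):
    # Hypothetical webhook; point this at your team's ticketing or chat integration.
    req = urllib.request.Request(
        url,
        data=json.dumps(package).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Results use the same (name, passed, detail) shape as the test sketch above.
results = [("no_negative_amounts", False, "3 negative values")]
package = build_action_package(results, score=83.7, dataset="orders")
print(json.dumps(package, indent=2))
# send_to_workflow_tool(package, "https://tickets.example.com/api/issues")  # hypothetical endpoint
```

Dropping the package into the tools engineers already watch is what turns a finding into a fix.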
Leveraging Data Quality Scoring for Organizational Change
Data quality scores play a pivotal role in influencing organizational change. DataKitchen's research revealed that scoring systems must be:
- Granular: Users should be able to drill down from overall scores to individual data elements or data sets.
- Multi-dimensional: Different dimensions of data quality (e.g., accuracy, completeness, consistency) should be captured in the scoring model.
- Configurable: Organizations should have the flexibility to define different scoring models based on specific business needs, such as regulatory compliance or machine learning.
By leveraging scoring, data quality leaders can build a compelling case for change. For example, an organization might set a goal to improve data quality scores from 80% to 90%, and data quality leaders can track progress against this goal. Scoring provides a common language that aligns data quality initiatives with broader organizational objectives.
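Below is a minimal sketch of what a granular, multidimensional, configurable scoring model might look like in code. The dimension names follow DAMA-style quality dimensions; the weights and measurements are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class ScoringModel:
    # Per-dimension weights reflect the use case; these defaults are illustrative.
    weights: dict = field(default_factory=lambda: {
        "accuracy": 0.4, "completeness": 0.3, "consistency": 0.3,
    })

    def score(self, measurements: dict) -> float:
        """Weighted average of per-dimension measurements (each 0..1), as a percentage."""
        total_weight = sum(self.weights.values())
        weighted = sum(w * measurements.get(dim, 0.0) for dim, w in self.weights.items())
        return 100.0 * weighted / total_weight

# Different configurations for different needs: a compliance model might weight
# accuracy more heavily than a model used to vet machine learning features.
compliance_model = ScoringModel({"accuracy": 0.6, "completeness": 0.3, "consistency": 0.1})
current = compliance_model.score({"accuracy": 0.82, "completeness": 0.90, "consistency": 0.75})
print(f"Current score: {current:.1f}% (goal: 90%)")
```

Tracking a number like this per data set over time is what gives leaders the longitudinal view, and the common language, described above.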
Conclusion
Improving data quality is a critical challenge for modern organizations, and data quality leaders often find themselves navigating complex environments with limited power. However, by adopting a DataOps approach, these leaders can drive meaningful improvements by leveraging influence, automation, and data-driven insights.
The key to success lies in adopting an agile, iterative process that emphasizes continuous improvement. DataOps empowers data quality leaders to begin improving data quality immediately, even without perfect standards, and to iterate and refine their approaches over time. By incorporating data quality scoring, automated testing, and collaborative workflows, DataOps provides the tools necessary to manage data quality at scale and effect real change within organizations.
DataKitchen's market research and webinar on "Data Quality Power Moves" offer valuable insights into how data quality leaders can navigate their challenges, leverage DataOps principles, and align their efforts with broader organizational goals. With the right tools and processes, data quality leaders can transform their influence into measurable improvements, ensuring that their organizations make better decisions based on high-quality, trusted data.