How TestGen Complements Microsoft Purview for Enterprise Data Quality

DataKitchen's TestGen and Microsoft's Purview complement each other: Purview serves as the governance and catalog “source of truth,” while TestGen is the deep, automated data-quality and testing engine that writes thousands of data quality tests in seconds.

How does DataKitchen’s TestGen complement Microsoft’s Purview? 


Organizations that deploy Microsoft Purview gain a powerful foundation for data governance, cataloging, and lineage across their Microsoft ecosystem. But once teams begin governing data assets, they quickly encounter the challenge of validating data quality at scale and in an automated way. That is where DataKitchen’s TestGen and Purview complement each other: Purview serves as the governance and catalog “source of truth,” while TestGen is the deep, automated data-quality and testing engine that operationalizes trust at scale.

What is TestGen?


TestGen automatically profiles datasets and generates comprehensive data tests based on the structure, distributions, patterns, and irregularities it discovers. It provides detailed issue listings with context that allow data teams to decide whether bad records should be passed, patched, purged, or returned to an upstream system. TestGen’s entire approach is designed to minimize manual configuration, allowing organizations to stand up meaningful data quality assurance in days rather than months.

What Does TestGen Do That Purview Does Not?

TestGen delivers specific capabilities that Purview does not provide. Most importantly, TestGen automatically generates tests from data profiles using AI- and heuristic-driven techniques, rather than requiring stewards or engineers to manually author rules. This makes TestGen compelling because it is comprehensive and automatic — requiring no configuration to get started. It also supports exploratory data-quality discovery across large estates, scanning many tables to surface anomalies, distribution shifts, odd patterns, and other hard-to-anticipate defects with no setup.
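As a rough illustration of what generating tests from a profile means, the sketch below profiles a single column and derives not-null, range, and character-shape checks from what it observes. It is not TestGen's implementation or API: `ColumnProfile`, `GeneratedTest`, `profile_column`, `generate_tests`, `failed_tests`, and the three heuristics are hypothetical names and rules chosen only for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ColumnProfile:
    name: str
    null_rate: float
    minimum: Optional[float]
    maximum: Optional[float]
    shape: Optional[str]  # shared character shape of all string values, if any


@dataclass
class GeneratedTest:
    column: str
    description: str
    check: Callable[[Any], bool]


def shape_of(text: str) -> str:
    """Collapse a string to its character shape: digits -> 9, letters -> A."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in text)


def is_number(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def profile_column(name: str, values: list) -> ColumnProfile:
    """Compute a tiny profile: null rate, numeric range, and string shape."""
    if not values:
        raise ValueError("cannot profile an empty column")
    non_null = [v for v in values if v is not None]
    null_rate = 1 - len(non_null) / len(values)
    minimum = maximum = shape = None
    if non_null and all(is_number(v) for v in non_null):
        minimum, maximum = min(non_null), max(non_null)
    elif non_null and all(isinstance(v, str) for v in non_null):
        shapes = {shape_of(v) for v in non_null}
        if len(shapes) == 1:  # every value looks alike, e.g. a 5-digit code
            shape = shapes.pop()
    return ColumnProfile(name, null_rate, minimum, maximum, shape)


def generate_tests(profile: ColumnProfile) -> list:
    """Turn profile observations into checks (illustrative heuristics only)."""
    tests = []
    if profile.null_rate == 0:
        tests.append(GeneratedTest(profile.name, "value is not null",
                                   lambda v: v is not None))
    if profile.minimum is not None:
        lo, hi = profile.minimum, profile.maximum
        tests.append(GeneratedTest(profile.name, f"value is between {lo} and {hi}",
                                   lambda v: v is None or (is_number(v) and lo <= v <= hi)))
    if profile.shape is not None:
        shape = profile.shape
        tests.append(GeneratedTest(profile.name, f"value has shape {shape}",
                                   lambda v: v is None or shape_of(str(v)) == shape))
    return tests


def failed_tests(tests: list, value: Any) -> list:
    """Return the descriptions of the tests that a new value fails."""
    return [t.description for t in tests if not t.check(value)]
```

Calling `generate_tests(profile_column("age", [34, 51, 27, 45]))` yields a not-null check and a range check without anyone writing a rule by hand. A production tool would use far richer statistics and tolerances than these exact-match heuristics.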

TestGen further supports multi-table and relational data quality through “fill in the blank” test specifications, which help teams express domain logic without manually coding SQL. It offers an opinionated Pass / Patch / Purge / Push Back workflow that not only identifies issues but helps teams decide what to do with them by providing record-level detail. Operationally, TestGen is pipeline-adjacent: it can integrate with CI/CD and orchestration tools, run wherever the data lives (multi-cloud, on-prem, non-Microsoft), and is open source and tool-agnostic, allowing it to work across heterogeneous data estates.
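The Pass / Patch / Purge / Push Back decision can be pictured as a routing step applied to flagged records. The following is a minimal sketch under assumed names, not TestGen's data model: `Disposition`, `apply_dispositions`, the `id` key, and the `patch` callback are all hypothetical, and in practice the decision for each record would come from a person or a policy reviewing the issue listing.

```python
from enum import Enum
from typing import Callable


class Disposition(Enum):
    PASS = "pass"            # let the record through unchanged
    PATCH = "patch"          # fix the record, then let it through
    PURGE = "purge"          # drop the record
    PUSH_BACK = "push_back"  # return the record to the upstream system


def apply_dispositions(records: list, decisions: dict,
                       patch: Callable[[dict], dict]) -> tuple:
    """Route each record by its decision; records with no decision pass."""
    kept, returned = [], []
    for record in records:
        decision = decisions.get(record["id"], Disposition.PASS)
        if decision is Disposition.PASS:
            kept.append(record)
        elif decision is Disposition.PATCH:
            kept.append(patch(record))
        elif decision is Disposition.PUSH_BACK:
            returned.append(record)
        # Disposition.PURGE: the record is intentionally dropped
    return kept, returned


records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": -5},
    {"id": 3, "age": None},
    {"id": 4, "age": 999},
]
decisions = {
    2: Disposition.PATCH,
    3: Disposition.PURGE,
    4: Disposition.PUSH_BACK,
}
kept, returned = apply_dispositions(
    records, decisions, patch=lambda r: {**r, "age": abs(r["age"])}
)
```

Here `kept` holds the records that continue downstream and `returned` holds those sent back to the source. Record-level detail matters because each of the four outcomes acts on individual rows rather than on a whole table.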

What Does Purview Do That TestGen Does Not?

Microsoft Purview brings governance capabilities that TestGen purposely does not address. It provides an enterprise data catalog and search experience that acts as the “system of record” for schemas, datasets, classifications, reports, and data products. It offers end-to-end lineage visualization for data flowing across databases, data lakes, workflows, Fabric, Power BI, and other services — giving governance and BI teams insight into where data originates and how it changes.

Purview also handles broad governance features, including business glossaries, classifications, sensitivity labels, access control, and domain ownership. Because it is tightly integrated with the Microsoft ecosystem (e.g., Azure, Fabric, Power BI, M365, SQL), Purview provides both a consistent user experience and an enforcement model in environments where Microsoft is already the dominant stack. These governance and discovery features make Purview invaluable to stewards, security teams, and leadership.

Where Do TestGen and Purview Overlap?

There are several functional areas where Purview and TestGen both provide capability, but with different approaches and trade-offs. Both can profile data to compute statistics at the dataset and column level, helping teams understand distributions, null patterns, distinct counts, and schema characteristics. Both can define and run data-quality checks such as null checks, value ranges, pattern validation, and referential consistency. The difference is that TestGen automatically generates thousands of checks based on data profiles, while Purview requires manual configuration of rules.

Both tools can produce quality outcomes, including passed and failed checks, and both can support governance conversations by making data quality visible to a wider audience. Each platform helps different stakeholders prioritize fixes: TestGen supports engineers and operators by showing exactly which records are bad, while Purview helps stewards and leaders understand which assets and domains present governance risk.

What is an ROI Example for TestGen?

To understand the operational impact of TestGen’s automatic test generation, consider an illustrative scenario from DataKitchen’s internal benchmarking: covering 20 tables containing 1,000 columns, at an average of 2.5 tests per column, requires about 2,500 tests. With TestGen, a junior operator can complete the task in two steps: profile, then generate tests.

For a Data Engineer to accomplish this task manually, assume it takes 30 minutes to write each test and that the engineer has no meetings, breaks, or vacation. Writing 2,500 tests would consume roughly 1,250 hours, or 156 eight-hour working days, which translates to about 31 weeks or 7.2 months.
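The arithmetic behind these figures can be reproduced in a few lines. The column count, tests per column, and minutes per test come from the scenario above. The 8-hour day, 5-day week, and 52/12 weeks per month are assumptions made here to match the stated totals, and `manual_effort` is a hypothetical helper name.

```python
def manual_effort(columns: int, tests_per_column: float, minutes_per_test: float,
                  hours_per_day: float = 8, days_per_week: float = 5,
                  weeks_per_month: float = 52 / 12) -> dict:
    """Estimate the effort to hand-write every test (defaults are assumptions)."""
    tests = columns * tests_per_column
    hours = tests * minutes_per_test / 60
    days = hours / hours_per_day
    weeks = days / days_per_week
    months = weeks / weeks_per_month
    return {"tests": tests, "hours": hours, "days": days,
            "weeks": weeks, "months": months}


effort = manual_effort(columns=1000, tests_per_column=2.5, minutes_per_test=30)
for unit, amount in effort.items():
    print(f"{unit}: {amount:,.1f}")
```

Changing `minutes_per_test` shows how sensitive the estimate is to that single assumption: at 10 minutes per test the same workload still comes to roughly 417 hours.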

This simple illustration shows that automation in test generation does not merely reduce labor — it enables scale that would otherwise be operationally impossible.

Conclusion

For organizations already using Microsoft Purview, TestGen is a natural complement that fills the operational testing and anomaly-discovery gaps that governance alone cannot address. Together, they share a common interest in making data trustworthy and offer a clear division of responsibility:  Purview for governance and stewardship, TestGen for automated quality. 

As data estates continue to grow in size and complexity, combining TestGen’s automated testing with Purview’s governance and lineage gives data quality, data governance, and data engineering teams a practical path to improving data reliability at enterprise scale.

Frequently Asked Questions

What is a short summary of how DataOps Data Quality TestGen complements Microsoft Purview?

Here is what TestGen does that Purview does not: it generates tests comprehensively and automatically – no configuration required.

  • Automatically generates thousands of tests from data profiles (AI/heuristic-driven checks rather than only hand-authored rules).
  • Exploratory data-quality discovery with no config: quickly scans many tables to surface “unknown unknowns” (weird distributions, anomalies, odd patterns).
  • Provides support for multi-table tests via “fill in the blank” test specifications.
  • Pass / Patch / Purge / Push Back workflow: opinionated framing around what to do with bad records, including record-level issue listings.
  • Pipeline-adjacent usage: designed to plug into diverse orchestration/CI/CD setups (not just Microsoft-native) and run wherever the data lives (multi-cloud / on-prem / non-MS).
  • Open-source, tool-agnostic engine: can be used across many platforms and tech stacks outside the Microsoft ecosystem.

What Purview does that TestGen does not:

  • Enterprise data catalog & search: central “system of record” for datasets, reports, schemas, business terms, and data products.
  • End-to-end lineage visualization: shows how data flows across sources, pipelines, Fabric/Lakehouse, Power BI, etc.
  • Broad governance features: classifications, sensitivity labels, access policies, domains, ownership, stewardship roles.
  • Integrated Microsoft ecosystem experience: deep hooks into Fabric, Azure, Power BI, SQL, M365, etc. for discovery and policy enforcement.

Where they overlap (for these features, you will need to decide which tool to use):

  • Data profiling: compute statistics on datasets/columns to understand shape, nulls, distinct counts, etc.
  • Define & run data-quality checks: e.g., null checks, ranges, pattern/format checks, referential consistency. TestGen creates these automatically, whereas in Purview they must be configured manually.
  • Produce quality scores / pass–fail outcomes for datasets and runs.
  • Support governance and stewardship conversations by giving visibility into data quality and enabling prioritization of fixes.

Here is an ROI calculation using TestGen

With TestGen, a junior operator can generate 2,526 tests in two steps (profile, generate tests).
Writing the same tests by hand would take a trained Data Engineer about 7.2 months, assuming no time for meetings, breaks, or vacations.

What is a quick summary of the blog?

This blog details how Microsoft Purview and DataKitchen’s TestGen work together as complementary tools for managing enterprise data quality and governance. While Purview acts as the primary system of record for data cataloging, lineage, and policy enforcement, TestGen provides an automated engine for deep data profiling and test generation. The post highlights that TestGen significantly reduces manual labor by using AI- and heuristic-driven techniques to create thousands of tests in minutes, a task that would otherwise take a human engineer months to complete. By combining these tools, organizations can pair broad stewardship with operational automation to ensure data reliability across diverse technical environments. This pairing allows teams to identify anomalies and unknown defects while maintaining a consistent governance framework within the Microsoft ecosystem.

Gil Benghiat
Gil Benghiat is one of three founders of DataKitchen, a company on a mission to enable analytic teams to deliver value quickly and with high quality. Gil’s career has always been data oriented and has included positions collecting and displaying network data at AT&T Bell Laboratories (now Alcatel-Lucent), managing data at Sybase (purchased by SAP), collecting and cleaning clinical trial data at PhaseForward (IPO then purchased by Oracle), integrating pharmaceutical sales data at LeapFrogRx (purchased by Model N), and liberating data at Solid Oak Consulting. Gil holds an MS in computer science from Stanford University and a BS in applied mathematics and biology from Brown University.
