DataOps Data Quality TestGen
Open Source
Say goodbye to the complexity of writing data quality validation tests yourself. DataOps TestGen takes care of that for you, using AI to automate the terms and conditions of your data contract through automatically generated data tests, test execution, and data profiling.
DataOps Data Quality TestGen delivers simple, fast data quality test generation and execution through:
Data Profiling
New Dataset Screening And Hygiene Review
AI-Based Generation of Data Quality Validation Tests
Ongoing Production Testing Of New Data Refreshes
Continuous Periodic Anomaly Monitoring Of Datasets
Flexible, Focused Data Scoring (Coming Soon)
Use AI To Generate Dozens Of Data Quality Checks
Data Engineers don’t need detailed knowledge of your enterprise data or customer needs. Auto-test generation means you can start quickly and easily.
Data Profiling And Data Hygiene Detector Tests
Data Engineers get an understanding of the characteristics of every column of data. You can identify prominent problem rows of data before production begins.
Fast ‘Push Down’ In-Database SQL Execution
Pushing down queries to a database is more efficient than copying data because it minimizes data movement. It leverages the database’s processing power to handle filtering, aggregation, and transformations directly on the data. This approach enhances security by limiting data exposure and improves performance by executing operations close to the data.
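As a minimal sketch of the difference, the example below counts null values two ways: by copying every row to the client, and by pushing the aggregation down to the database. It uses Python's built-in `sqlite3` module as a stand-in for a real warehouse; the `orders` table and both function names are hypothetical and are not part of DataOps TestGen.

```python
import sqlite3

# Hypothetical in-memory table standing in for a real warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, None), (3, 25.5), (4, None)],
)


def null_count_copy(conn):
    """Copy every row to the client, then count nulls in Python."""
    rows = conn.execute("SELECT amount FROM orders").fetchall()
    return sum(1 for (amount,) in rows if amount is None)


def null_count_pushdown(conn):
    """Let the database filter and aggregate; only one row is returned."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE amount IS NULL"
    ).fetchone()
    return count
```

Both functions return the same answer, but the push-down version moves a single number instead of the whole column, which is where the savings in data movement and exposure come from.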
Vast Range Of AI-Driven Data Quality Validation Options
Experience comprehensive data quality validation with our robust suite of profiling, data quality, and hygiene checks. With 51 Data Profiling Column Characteristics that deeply analyze your data structure, 32 Auto-Generated Data Tests for rapid quality assessments, 27 Data Hygiene Detector Tests that root out inconsistencies, and 8 Business Rule Data Tests to enforce critical business logic, our platform ensures your data meets the highest standards. Additionally, with 2 User-Created Custom Tests, you can easily tailor validation to unique requirements. Achieve full coverage and confidence in data integrity with the most extensive automated quality checks available.
51 Data Profiling Column Characteristics
Data profiling is the periodic X-ray of tables in a database to gather extensive information about the contents of each column. Results are stored in a standard table in DataOps TestGen. This table is available for direct review and is used for rules derivation downstream. Examples include:
• Averages
• Column & Table Types & Names
• Date Characteristics
• Min/Max Value
• Numeric Counts
• Percentiles
• Positions
• Unique Values
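To make a few of these characteristics concrete, here is a rough sketch that profiles one column with a single SQL query. It is an illustration only, not TestGen's profiling logic: the `products` table, the `profile_column` function, and the output field names are all assumptions, and `sqlite3` stands in for a production database.

```python
import sqlite3

# Hypothetical sample table; the last row has a missing price.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("a", 5.0), ("b", 15.0), ("c", 15.0), ("d", None)],
)


def profile_column(conn, table, column):
    """Gather basic profile statistics for one column in one query.

    The table and column names are interpolated directly into the SQL,
    so this sketch is only safe with trusted identifiers.
    """
    sql = f"""
        SELECT COUNT(*)                 AS record_ct,
               COUNT({column})          AS value_ct,
               COUNT(DISTINCT {column}) AS distinct_ct,
               MIN({column})            AS min_value,
               MAX({column})            AS max_value,
               AVG({column})            AS avg_value
        FROM {table}
    """
    cursor = conn.execute(sql)
    names = [description[0] for description in cursor.description]
    return dict(zip(names, cursor.fetchone()))


stats = profile_column(conn, "products", "price")
```

The difference between `record_ct` and `value_ct` gives the number of nulls, and `distinct_ct` gives the number of unique non-null values.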
32 Auto-Generated Data Tests
The goal of Automatically Generated Data Tests is to cast a wide net for data problems that can’t be predicted by targeted testing devised in advance. It’s the same way you might set up a burglar alarm in your home by deploying sensors at all possible entrances to catch a burglar who would only try one window. Your goal in refining these tests is to maintain maximum sensitivity to real problems while minimizing false positives that are not worth the follow-up. Examples of tests include:
- Alpha Truncation
- Average Shift
- Constant Value Present
- Daily Record Count
- Value present in List-of-Values
- Distinct Value Change
- Future Date
- Incremental Average Shift
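As an illustration of what a test such as Average Shift might check, the sketch below compares the mean of a new batch against a baseline using a standardized difference (Cohen's d). This is one common way to measure a shift and is an assumption here, not a statement of how TestGen computes it; the function name and the `threshold` default are hypothetical.

```python
from math import inf, sqrt
from statistics import mean, variance


def average_shift(baseline, current, threshold=0.5):
    """Return (effect_size, passed) for a shift in the column average.

    effect_size is the absolute difference of the two means divided by
    the pooled sample standard deviation. The test passes when the
    effect size does not exceed the (hypothetical) threshold.
    """
    difference = abs(mean(current) - mean(baseline))
    pooled_sd = sqrt((variance(baseline) + variance(current)) / 2)
    if pooled_sd == 0:
        # No spread in either sample: any difference is an unbounded shift.
        effect_size = 0.0 if difference == 0 else inf
    else:
        effect_size = difference / pooled_sd
    return effect_size, effect_size <= threshold


baseline_values = [10, 12, 11, 13, 9]
same_size, same_passed = average_shift(baseline_values, baseline_values)
shifted_size, shifted_passed = average_shift(baseline_values, [20, 22, 21, 23, 19])
```

Raising `threshold` trades sensitivity for fewer false positives, which is the tuning decision described above.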
27 Data Hygiene Detector Tests
Once data profiling is complete, Data Hygiene Detection Tests automatically confirm how closely data structures and assumptions match the actual contents of each column. Results can be used to assist the Data Engineer in refining data structure definitions and in targeting the addition of data ‘patching’ steps, which help to generate a more usable, analyzable dataset. Examples include:
- Invalid Zip Code Format
- Leading Spaces
- Mostly Dates In String
- Mostly not null, empty, or filled values.
- Multiple Data Types Per Column Name
- No Column Values Present
- Non-standard Blank Values
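Two of these detectors, Leading Spaces and Non-standard Blank Values, can be sketched in a few lines. The code below is an illustrative assumption rather than TestGen's implementation; in particular, the `NON_STANDARD_BLANKS` list and the `detect_hygiene_issues` function are hypothetical.

```python
# Hypothetical list of strings treated as "blank in disguise".
NON_STANDARD_BLANKS = {"", "n/a", "na", "null", "none", "-", "?"}


def detect_hygiene_issues(values):
    """Count leading-space and non-standard-blank values in a column.

    True nulls (None) are skipped; they are a separate concern.
    """
    issues = {"leading_spaces": 0, "non_standard_blanks": 0}
    for value in values:
        if value is None:
            continue
        stripped = value.strip()
        # A leading space only counts when real content follows it.
        if stripped and value != value.lstrip(" "):
            issues["leading_spaces"] += 1
        if stripped.lower() in NON_STANDARD_BLANKS:
            issues["non_standard_blanks"] += 1
    return issues


sample_column = ["  Alice", "Bob", "N/A", " ", None, "null"]
hygiene_report = detect_hygiene_issues(sample_column)
```

Counts like these point to where a patching step, such as trimming whitespace or mapping placeholder strings to null, would make the column easier to analyze.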
8 Business Rule Data Tests
Business Rule Configurable Data Tests allow you to configure data quality validation tests that can’t be gleaned automatically from prior data. It is faster and easier to set up Business Rule Configurable Data Tests than to program custom SQL. Business Rule Data Test logic is already programmed, tested, and verified to work. To support collaboration on rules and documentation, tests can be configured and shared with business users, not just database programmers (coming soon). Examples include:
- Data Match
- Prior Match
- Aggregate Match No Drops
2 User-Created Custom Data Tests
User-created configurable Data Tests allow you to create reusable data quality validation tests unique to your data sets and customers.
Freshness, Volume, Schema, and Data Drift Anomaly Detection
You need help to quickly identify issues in your data before someone else finds them first — before bad data is passed into reports, models or other deliverables. You need to confirm that your data is fresh. You need to be sure that data volume is trending in the right direction. You need to know if a schema has been altered, or if there is any change to the health of your data. You want to get alerted without being bothered with every transient issue. The sooner you find problems with your data, the better!
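A freshness check and a volume check can be sketched as follows. This is a simplified illustration under stated assumptions, not TestGen's anomaly detection: the function names, the 24-hour `max_age`, and the 50% `tolerance` are all hypothetical defaults.

```python
from datetime import datetime, timedelta


def is_fresh(last_loaded_at, now, max_age=timedelta(hours=24)):
    """Return True when the most recent load is within the allowed age."""
    return now - last_loaded_at <= max_age


def volume_anomaly(history, today_count, tolerance=0.5):
    """Return True when today's row count strays too far from the norm.

    The norm is the mean of recent daily counts; tolerance is the
    allowed fractional deviation from that mean.
    """
    baseline = sum(history) / len(history)
    if baseline == 0:
        return today_count != 0
    return abs(today_count - baseline) / baseline > tolerance


last_load = datetime(2024, 1, 1, 0, 0)
fresh_now = is_fresh(last_load, datetime(2024, 1, 1, 12, 0))
fresh_later = is_fresh(last_load, datetime(2024, 1, 3, 0, 0))
big_drop = volume_anomaly([100, 110, 90], 40)
small_dip = volume_anomaly([100, 110, 90], 95)
```

A wider `tolerance` or a longer `max_age` suppresses alerts for transient issues, at the cost of catching real problems later.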
A Data Catalog To Drill Into The Details Of Your Data and Metadata
Get a 360-degree view of the metadata for every one of your data tables in a single view. See table characteristics, profile results, hygiene issues, test results, potential PII issues, and Critical Data Element (CDE) identification. View and edit key metadata for every table and column.
You Don’t Have Time To Write Data Quality Validation Tests – DataOps TestGen Does It Automatically!
The daily grind of data engineering leaves you with a backlog of customer requests and no time to innovate. DataOps TestGen algorithmically generates data quality tests and anomaly detectors and finds data profiling issues, all based on scanning your data — with no coding or massive YAML configuration.
Configurable Data Quality Scoring and Dashboards
DataKitchen DataOps Data Quality TestGen simply, efficiently, and quickly creates data quality dashboards and scorecards. You can focus your score on DAMA categories, on Critical Data Elements (CDEs), on a specific data science model, and/or on current business goals. And drive action with automatically generated data quality issues!
Read More About DataOps TestGen
DataKitchen provides software to observe and automate every data journey in an organization, from source to customer value, in development and production, so that teams can deliver insight to their customers with few errors and a high rate of new insight creation.
Our software allows data and analytic teams to observe, test, and automate the tools, data, processes, and environments in their entire data analytics organization, providing massive increases in quality, cycle time, and team productivity.
Start Improving Your Data Quality Validation and DataOps Today!