Open Source Data Observability
DataOps Data Quality TestGen
Say goodbye to the complexity of writing data quality validation tests yourself. DataOps TestGen takes care of that for you, using AI to enforce the terms and conditions of your data contract through automatically generated data tests, test execution, and data profiling.
DataOps Data Quality TestGen delivers simple, fast data quality test generation and execution through:
Data Profiling
New Dataset Screening And Hygiene Review
AI-Based Generation of Data Quality Validation Tests
Ongoing Production Testing Of New Data Refreshes
Continuous Periodic Anomaly Monitoring Of Datasets
Use AI To Generate Dozens Of Data Quality Checks
Data Engineers don’t need detailed knowledge of your enterprise data or customer needs. Auto-test generation means you can start quickly and easily.
Data Profiling And Data Hygiene Detector Tests
Data Engineers get an understanding of the characteristics of every column of data. You can identify prominent problem rows of data before your production begins.
Efficient, Understandable In-Database SQL Test Execution
51 Data Profiling Column Characteristics
Data profiling is the periodic X-ray of tables in a database to gather extensive information about the contents of each column. Results are stored in a standard table in DataOps TestGen. This table is available for direct review and is used for rules derivation downstream. Examples include:
• Averages
• Column & Table Types & Names
• Date Characteristics
• Min/Max Value
• Numeric Counts
• Percentiles
• Positions
• Unique Values
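As an illustration of what per-column profiling involves, here is a minimal Python sketch that gathers a few of the characteristics listed above. The function name `profile_column` and the layout of the result are assumptions made for this example, not DataOps TestGen's actual profiling table, and the column is assumed to hold values of a single comparable type.

```python
from statistics import mean


def profile_column(name, values):
    """Collect a few per-column characteristics (hypothetical profile layout)."""
    non_null = [v for v in values if v is not None]
    profile = {
        "column_name": name,
        "record_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "min_value": min(non_null) if non_null else None,
        "max_value": max(non_null) if non_null else None,
    }
    # An average is only recorded when every non-null value is numeric.
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if numeric and len(numeric) == len(non_null):
        profile["average"] = mean(numeric)
    return profile


# Illustrative data only.
order_totals = [10.0, 20.0, None, 30.0, 20.0]
profile = profile_column("order_total", order_totals)
print(profile)
```

Storing one such record per column is what makes downstream rule derivation possible: later tests can compare fresh data against these stored values.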
32 Auto-Generated Data Tests
The goal of Automatically Generated Data Tests is to cast a wide net for data problems that can’t be predicted by targeted testing devised in advance. It’s the same way you might set up a burglar alarm in your home by deploying sensors at all possible entrances to catch a burglar who would only try one window. Your goal in refining these tests is to maintain maximum sensitivity to real problems while minimizing false positives that are not worth the follow-up. Examples of tests include:
- Alpha Truncation
- Average Shift
- Constant Value Present
- Daily Record Count
- Value present in List-of-Values
- Distinct Value Change
- Future Date
- Incremental Average Shift
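To make the idea of deriving a test from prior data concrete, the sketch below generates an "Average Shift" style check from a baseline and then runs it against a new refresh. The relative-tolerance rule, the 10% default, and both function names are assumptions chosen for illustration; the source does not describe how DataOps TestGen computes this test.

```python
from statistics import mean


def generate_average_shift_test(column, baseline_values, tolerance=0.10):
    """Derive a test definition from baseline data (hypothetical structure)."""
    return {
        "test": "average_shift",
        "column": column,
        "baseline_avg": mean(baseline_values),
        "tolerance": tolerance,  # assumed relative tolerance, tune per column
    }


def run_average_shift_test(test, new_values):
    """Compare the new average with the baseline; assumes a non-zero baseline."""
    new_avg = mean(new_values)
    shift = abs(new_avg - test["baseline_avg"]) / abs(test["baseline_avg"])
    return {"passed": shift <= test["tolerance"], "shift": shift}


# Illustrative data only.
test = generate_average_shift_test("order_total", [100, 110, 90])
stable_refresh = run_average_shift_test(test, [101, 99, 103])
drifted_refresh = run_average_shift_test(test, [150, 160, 140])
print(stable_refresh, drifted_refresh)
```

Raising or lowering `tolerance` is the kind of refinement that trades sensitivity against false positives.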
27 Data Hygiene Detector Tests
Once data profiling is complete, Data Hygiene Detector Tests automatically check how closely data structures and assumptions match the actual contents of each column. Results help the Data Engineer refine data structure definitions and decide where to add data ‘patching’ steps that produce a more usable, analyzable dataset. Examples include:
- Invalid Zip Code Format
- Leading Spaces
- Mostly Dates In String
- Mostly not null, empty, or filled values.
- Multiple Data Types Per Column Name
- No Column Values Present
- Non-standard Blank Values
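Two of the detectors named above, "Leading Spaces" and "Non-standard Blank Values", can be sketched in a few lines of Python. The set of strings treated as blanks and the function name are assumptions for this example, not the product's actual definitions.

```python
# Assumed list of strings that stand in for "no value"; adjust for your data.
NONSTANDARD_BLANKS = {"", "n/a", "na", "none", "null", "-", "?"}


def detect_hygiene_issues(values):
    """Count string values with leading spaces or blank-like placeholders."""
    issues = {"leading_spaces": 0, "nonstandard_blanks": 0}
    for value in values:
        if not isinstance(value, str):
            continue  # true NULLs and non-strings are handled elsewhere
        stripped = value.strip()
        # Only values with real content count as having leading spaces.
        if stripped and value != value.lstrip():
            issues["leading_spaces"] += 1
        if stripped.lower() in NONSTANDARD_BLANKS:
            issues["nonstandard_blanks"] += 1
    return issues


# Illustrative data only.
cities = [" Boston", "Chicago", "N/A", "  ", "Denver", None]
issues = detect_hygiene_issues(cities)
print(issues)
```

Counts like these point to where a patching step, such as trimming whitespace or mapping placeholders to NULL, would pay off.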
8 Business Rule Data Tests
Business Rule Configurable Data Tests let you configure data quality validation tests that can’t be derived automatically from prior data. They are faster and easier to set up than custom SQL because the Business Rule Data Test logic is already programmed, tested, and verified to work. To support collaboration on rules and documentation, they can be configured and shared with business users, not database programmers (coming soon). Examples include:
- Data Match
- Prior Match
- Aggregate Match No Drops
2 User-Created Custom Data Tests
User-created configurable Data Tests allow you to create reusable data quality validation tests unique to your data sets and customers.
Complete Coverage. No data duplication because tests are run in your production database quickly and with low impact. You can understand the test queries clearly.
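The sketch below shows what in-database test execution can look like: the check is a single readable SQL query that runs where the data lives and returns only a count, so no rows are copied out. It uses Python's built-in `sqlite3` with an in-memory table as a stand-in for a production database; the table, the column names, and the query are assumptions for illustration, not SQL generated by DataOps TestGen.

```python
import sqlite3

# In-memory stand-in for a production database (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2024-01-15"), (2, "2024-02-01"), (3, "2999-12-31")],
)

# A "Future Date" style check expressed as one SQL statement.
# ISO-8601 date strings compare correctly as text.
FUTURE_DATE_TEST = "SELECT COUNT(*) FROM orders WHERE order_date > ?"


def run_future_date_test(connection, as_of_date):
    """Run the check inside the database and return only the summary."""
    (failing_rows,) = connection.execute(FUTURE_DATE_TEST, (as_of_date,)).fetchone()
    return {
        "test": "future_date",
        "failing_rows": failing_rows,
        "passed": failing_rows == 0,
    }


result = run_future_date_test(conn, "2024-06-30")
print(result)
```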
Freshness, Volume, Schema, and Data Drift Anomaly Detection
You need help to quickly identify issues in your data before someone else finds them first — before bad data is passed into reports, models or other deliverables. You need to confirm that your data is fresh. You need to be sure that data volume is trending in the right direction. You need to know if a schema has been altered, or if there is any change to the health of your data. You want to get alerted without being bothered with every transient issue. The sooner you find problems with your data, the better!
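The freshness, volume, and schema checks described above can be sketched as three small functions. The 24-hour freshness window, the 20% volume-drop limit, and all names here are assumptions picked for illustration; real thresholds depend on each dataset's refresh pattern.

```python
from datetime import datetime, timedelta


def check_freshness(last_loaded_at, now, max_age=timedelta(hours=24)):
    """Data is fresh if the latest load falls within the allowed age."""
    return now - last_loaded_at <= max_age


def check_volume(previous_count, current_count, max_drop=0.20):
    """Pass unless the row count fell by more than max_drop (a fraction)."""
    if previous_count == 0:
        return True  # nothing to compare against yet
    return (previous_count - current_count) / previous_count <= max_drop


def check_schema(expected_columns, actual_columns):
    """Report columns that were added or dropped since the last snapshot."""
    expected, actual = set(expected_columns), set(actual_columns)
    return {"added": sorted(actual - expected), "dropped": sorted(expected - actual)}


# Illustrative values only.
now = datetime(2024, 6, 30, 12, 0)
fresh = check_freshness(datetime(2024, 6, 30, 3, 0), now)
volume_ok = check_volume(previous_count=1000, current_count=700)
schema_diff = check_schema(["id", "email", "city"], ["id", "email", "zip"])
print(fresh, volume_ok, schema_diff)
```

To avoid alerting on every transient issue, a monitor built on checks like these might require a failure on consecutive runs before notifying anyone.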
You Don’t Have Time To Write Data Quality Validation Tests – DataOps TestGen Does It Automatically!
The daily grind of data engineering leaves you with a backlog of customer requests and no time to innovate. DataOps TestGen algorithmically generates data quality tests and anomaly detectors and finds data profiling issues, all based on scanning your data — with no coding or massive YAML configuration.
Read More About DataOps TestGen
DataKitchen provides software to observe and automate every data journey in an organization, from source to customer value, in development and production, so that teams can deliver insight to their customers with few errors and a high rate of new insight creation.
Our software allows data and analytic teams to observe, test, and automate the tools, data, processes, and environments in their entire data analytics organization, providing massive increases in quality, cycle time, and team productivity.
Start Improving Your Data Quality Validation and DataOps Today!