Open Source Data Observability

DataOps TestGen

 

Say goodbye to the complexity of writing data quality validation tests yourself. DataOps TestGen does that work for you: it profiles your data, then automatically generates and executes the tests that enforce the terms and conditions of your data contract.

TestGen UI - Test Results

DataOps TestGen delivers simple, fast data quality test generation and execution through:

Data Profiling

New Dataset Screening And Hygiene Review

Algorithmic Generation of Data Quality Validation Tests

Ongoing Production Testing Of New Data Refreshes

Continuous Periodic Anomaly Monitoring Of Datasets

 

Automatically Generate Dozens Of Data Quality Checks

Data Engineers don’t need detailed knowledge of your enterprise data or customer needs to get started. Because the tests are generated automatically, you can begin testing quickly and easily.

Data Profiling And Data Hygiene Detector Tests

TestGen UI - Profiling

Data Engineers get an understanding of the characteristics of every column of data. You can identify problem rows of data before production begins.

Efficient, Understandable In-Database SQL Test Execution


51 Data Profiling Column Characteristics

Data profiling is the periodic X-ray of tables in a database to gather extensive information about the contents of each column. Results are stored in a standard table in DataOps TestGen. This table is available for direct review and is used for rules derivation downstream. Examples include:
• Averages
• Column & Table Types & Names
• Date Characteristics
• Min/Max Value
• Numeric Counts
• Percentiles
• Positions
• Unique Values
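
As an illustration of the kind of column characteristics listed above, here is a minimal sketch that gathers a few of them with a single in-database query. It is not TestGen's implementation: the function name profile_column, the orders table, and the use of SQLite are assumptions made only for this example.

```python
import sqlite3

def profile_column(conn, table, column):
    """Gather a few profile statistics for one column, computed in-database.

    Hypothetical helper for illustration. Identifiers are interpolated
    directly into the SQL, so pass only trusted table and column names.
    """
    sql = f"""
        SELECT COUNT(*)                 AS row_count,
               COUNT({column})          AS non_null_count,
               COUNT(DISTINCT {column}) AS distinct_count,
               MIN({column})            AS min_value,
               MAX({column})            AS max_value,
               AVG({column})            AS avg_value
        FROM {table}
    """
    cur = conn.execute(sql)
    names = [d[0] for d in cur.description]
    return dict(zip(names, cur.fetchone()))

# Sample data (assumed): three amounts and one missing value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?)",
                 [(10.0,), (20.0,), (20.0,), (None,)])

profile = profile_column(conn, "orders", "amount")
print(profile)
```

Storing one such result row per column and per profiling run gives later steps a baseline to derive rules from.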

32 Auto-Generated Data Tests

The goal of Automatically Generated Data Tests is to cast a wide net for data problems that can’t be predicted by targeted testing devised in advance. It works the way a home burglar alarm does: you place sensors at every possible entrance to catch a burglar who will try only one window. Your goal in refining these tests is to maintain maximum sensitivity to real problems while minimizing false positives that are not worth the follow-up. Examples include:

  • Alpha Truncation
  • Average Shift
  • Constant Value Present
  • Daily Record Count
  • Value present in List-of-Values
  • Distinct Value Change
  • Future Date
  • Incremental Average Shift
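
To make two of the test names above concrete, the sketch below shows one plausible reading of a Future Date check and an Average Shift check, each run as SQL inside the database. The function names, the shipments table, the baseline, and the tolerance are all assumptions for illustration; TestGen's own test definitions may differ.

```python
import sqlite3

def future_date_count(conn, table, column, as_of):
    """Count rows whose ISO-8601 date string is later than as_of."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} > ?"
    return conn.execute(sql, (as_of,)).fetchone()[0]

def average_shift(conn, table, column, baseline_avg, tolerance):
    """Return (current_avg, passed).

    The check passes when the current average stays within
    `tolerance` of the baseline captured by an earlier profile.
    """
    current = conn.execute(f"SELECT AVG({column}) FROM {table}").fetchone()[0]
    return current, abs(current - baseline_avg) <= tolerance

# Sample data (assumed): one shipment carries an implausible future date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipments (ship_date TEXT, weight REAL)")
conn.executemany("INSERT INTO shipments VALUES (?, ?)", [
    ("2024-01-05", 10.0),
    ("2024-01-06", 12.0),
    ("2999-01-01", 14.0),
])

future_rows = future_date_count(conn, "shipments", "ship_date", "2024-06-30")
current_avg, avg_ok = average_shift(conn, "shipments", "weight",
                                    baseline_avg=11.5, tolerance=1.0)
print(future_rows, current_avg, avg_ok)
```

Widening or narrowing the tolerance is the kind of refinement that trades sensitivity against false positives.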

 

27 Data Hygiene Detector Tests

Once data profiling is complete, Data Hygiene Detection Tests automatically confirm how closely data structures and assumptions match the actual contents of each column. Results can help the Data Engineer refine data structure definitions and decide where to add data ‘patching’ steps that produce a more usable, analyzable dataset. Examples include:

  • Invalid Zip Code Format
  • Leading Spaces
  • Mostly Dates In String
  • Mostly not null, empty, or filled values
  • Multiple Data Types Per Column Name
  • No Column Values Present
  • Non-standard Blank Values
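
A hygiene detector of this kind can be as small as one aggregate query. The following is a minimal sketch of Leading Spaces and Non-standard Blank Values checks; the BLANK_TOKENS list, the function name, and the customers table are hypothetical choices for this example, not TestGen's definitions.

```python
import sqlite3

# Hypothetical list of placeholder strings treated as non-standard blanks.
BLANK_TOKENS = ("", "n/a", "na", "none", "null", "-", "?")

def hygiene_counts(conn, table, column):
    """Count leading-space values and placeholder blanks in a text column."""
    placeholders = ",".join("?" for _ in BLANK_TOKENS)
    sql = f"""
        SELECT SUM(CASE WHEN {column} <> LTRIM({column})
                        THEN 1 ELSE 0 END) AS leading_spaces,
               SUM(CASE WHEN LOWER(TRIM({column})) IN ({placeholders})
                        THEN 1 ELSE 0 END) AS nonstandard_blanks
        FROM {table}
    """
    row = conn.execute(sql, BLANK_TOKENS).fetchone()
    return {"leading_spaces": row[0], "nonstandard_blanks": row[1]}

# Sample data (assumed): a clean value, a padded value, two placeholder
# blanks, and a true NULL, which neither check counts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?)",
                 [("Boston",), (" Chicago",), ("N/A",), ("",), (None,)])

issues = hygiene_counts(conn, "customers", "city")
print(issues)
```

Counts like these point to where a patching step, such as trimming or mapping placeholders to NULL, would pay off.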

 

8 Business Rule Data Tests

Business Rule Configurable Data Tests allow you to configure data quality validation tests that can’t be gleaned automatically from prior data. They are faster and easier to set up than custom SQL, because the Business Rule Data Test logic is already programmed, tested, and verified to work. To support collaboration on rules and documentation, the tests can be configured and shared with business users, not just database programmers (coming soon). Examples include:

  • Data Match
  • Prior Match
  • Aggregate Match No Drops

2 User-Created Custom Data Tests

User-created configurable Data Tests allow you to create reusable data quality validation tests unique to your data sets and customers.

Complete coverage. Tests run inside your production database, quickly and with low impact, so no data is duplicated. The test queries are SQL that you can read and understand.

Freshness, Volume, Schema, and Data Drift Anomaly Detection

You need to identify issues in your data quickly, before someone else finds them and before bad data is passed into reports, models, or other deliverables. You need to confirm that your data is fresh. You need to be sure that data volume is trending in the right direction. You need to know if a schema has been altered, or if there is any change to the health of your data. You want to get alerted without being bothered by every transient issue. The sooner you find problems with your data, the better!
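
The freshness, volume, and schema questions above can each be reduced to a small comparison against recent history. The sketch below is one simple way to frame them, under assumed thresholds; the function names, the 24-hour freshness window, and the 50% drop ratio are illustrative choices, not TestGen's detection logic.

```python
from datetime import datetime, timedelta

def is_fresh(latest_load, now, max_age):
    """Freshness: the newest load must be no older than max_age."""
    return now - latest_load <= max_age

def volume_ok(history, today_count, max_drop_ratio=0.5):
    """Volume: pass unless today's row count falls more than
    max_drop_ratio below the mean of recent counts."""
    baseline = sum(history) / len(history)
    return today_count >= baseline * (1 - max_drop_ratio)

def schema_drift(expected, actual):
    """Schema: compare two {column: type} mappings."""
    shared = set(expected) & set(actual)
    return {
        "added": sorted(set(actual) - set(expected)),
        "removed": sorted(set(expected) - set(actual)),
        "retyped": sorted(c for c in shared if expected[c] != actual[c]),
    }

# Sample inputs (assumed values).
fresh = is_fresh(datetime(2024, 1, 1, 6), datetime(2024, 1, 1, 12),
                 timedelta(hours=24))
volume = volume_ok([1000, 1100, 900], today_count=400)
drift = schema_drift({"id": "INTEGER", "amount": "REAL"},
                     {"id": "INTEGER", "amount": "TEXT", "note": "TEXT"})
print(fresh, volume, drift)
```

Comparing against a trailing baseline rather than a fixed number is one way to avoid alerting on every transient fluctuation.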

You Don’t Have Time To Write Data Quality Validation Tests


The daily grind of data engineering leaves you with a backlog of customer requests and no time to innovate.  DataOps TestGen algorithmically generates data quality tests and anomaly detectors and finds data profiling issues, all based on scanning your data — with no coding or massive YAML configuration.

 

DataKitchen provides software to observe and automate every data journey in an organization, from source to customer value, in development and production, so that teams can deliver insight to their customers with few errors and a high rate of new insight creation.

Our software allows data and analytic teams to observe, test, and automate the tools, data, processes, and environments in their entire data analytics organization, providing massive increases in quality, cycle time, and team productivity.

Start Improving Your Data Quality Validation and DataOps Today!

 


Data Observability Software

DataOps Observability: Monitor every Data Journey in an enterprise, from source to customer value, and find errors fast! [Open Source, Enterprise]

DataOps TestGen: Simple, Fast Data Quality Test Generation and Execution. Trust, but verify your data! [Open Source, Enterprise]

DataOps Software

DataOps Automation: Orchestrate and automate your data toolchain to deliver insight with few errors and a high rate of change. [Enterprise]
