Install open-source DataOps TestGen

Data profiling is just the start.

Next-generation open source data quality, data observability, and data testing. Generate thousands of data quality tests automatically.

  • Apache 2.0
  • Runs on your infrastructure
  • No data leaves your environment

See it first

Take a tour before you install.

Click through TestGen's profiling, hygiene detection, and auto-generated tests on a real dataset. No signup. No install.

Found a data error?

Here's what TestGen flags on the first profiling pass.

Point it at your warehouse. Twenty-eight minutes later, the issue list is in your browser. Like these:

  • Null-rate spikes
  • Stale timestamps
  • Broken foreign keys
  • Schema drift
  • Uniqueness violations
  • Row-volume spikes
  • Outlier values
  • Missing required values

Plus 19 more. Out-of-the-box Freshness/Volume/Schema Anomaly Detection.

Profile every column. Catch what profilers miss.

Most profilers stop at counts and nulls. TestGen reads patterns, encodings, dates, IDs, and category drift. Each one becomes a running test.

51 profiling characteristics per column

Data types, null rates, cardinality, distributions, length stats, value ranges, top-N values, regex patterns, encoding detection, date semantics, candidate keys. Computed in-database so you don't extract billions of rows.
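
To make "computed in-database" concrete, here is a minimal sketch of the idea. It is not TestGen's code: it uses Python's built-in sqlite3 module as a stand-in for a warehouse, and the `profile_column` helper and `orders` table are hypothetical names chosen for illustration. The point is that one aggregate query returns a handful of statistics, so only a single summary row leaves the database.

```python
import sqlite3


def profile_column(conn, table, column):
    """Compute a few profile statistics for one column with a single aggregate query."""
    # Assumption: table and column come from trusted metadata, because
    # identifiers are interpolated directly into the SQL text.
    sql = f"""
        SELECT COUNT(*) AS row_count,
               SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) AS null_count,
               COUNT(DISTINCT {column}) AS distinct_count,
               MIN({column}) AS min_value,
               MAX({column}) AS max_value
        FROM {table}
    """
    row_count, null_count, distinct_count, min_value, max_value = conn.execute(sql).fetchone()
    return {
        "row_count": row_count,
        # SUM returns NULL on an empty table, so fall back to 0.
        "null_rate": (null_count or 0) / row_count if row_count else 0.0,
        "distinct_count": distinct_count,
        "min_value": min_value,
        "max_value": max_value,
    }


# Hypothetical sample data standing in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, None), (3, 25.5), (4, 10.0)],
)

profile = profile_column(conn, "orders", "amount")
print(profile)
```

The same pattern extends to more statistics per query. What stays constant is that the database does the scanning and the client only receives the summary.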

27 hygiene issues, flagged on the first run

Stale or future-dated timestamps. Broken foreign keys. Suspicious unique constraints. Columns drifting out of their declared domain. The classes of bugs that quietly break downstream dashboards. You see them before your stakeholders do.
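
Two of these hygiene issues are easy to picture as plain SQL. The sketch below is illustrative only and does not reflect how TestGen implements its detectors: it uses sqlite3 as a stand-in database, and the `customers` and `orders` tables and both helper functions are hypothetical.

```python
import sqlite3

# Hypothetical sample data: order 11 points at a customer that does not
# exist, and order 12 carries a date far in the future.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, created_at TEXT);
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO orders VALUES
        (10, 1, '2020-01-15'),
        (11, 3, '2020-02-01'),
        (12, 2, '2999-12-31');
""")


def future_dated(conn, as_of):
    """Return ids of orders dated after as_of (ISO dates compare correctly as text)."""
    rows = conn.execute(
        "SELECT id FROM orders WHERE created_at > ? ORDER BY id", (as_of,)
    )
    return [r[0] for r in rows]


def orphaned_orders(conn):
    """Return ids of orders whose customer_id has no matching customers row."""
    rows = conn.execute("""
        SELECT o.id
        FROM orders o
        LEFT JOIN customers c ON c.id = o.customer_id
        WHERE o.customer_id IS NOT NULL AND c.id IS NULL
        ORDER BY o.id
    """)
    return [r[0] for r in rows]


future_ids = future_dated(conn, "2024-01-01")
orphan_ids = orphaned_orders(conn)
print("future-dated:", future_ids)
print("broken foreign keys:", orphan_ids)
```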

Auto-generated tests, not hand-written assertions

Every profile becomes a running test. Schedule them. Score the results. TestGen flags drift and regressions, not just the boolean pass/fail. No YAML to write, no SQL fixtures to maintain.
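
As a rough illustration of "every profile becomes a running test", the sketch below turns a baseline profile into null-rate tests and reports drift alongside pass/fail. The `NullRateTest` class, `generate_tests` function, tolerance value, and sample numbers are all assumptions made for this example; TestGen's actual test types and scoring are not shown here.

```python
from dataclasses import dataclass


@dataclass
class NullRateTest:
    column: str
    baseline: float   # null rate observed when the column was profiled
    tolerance: float  # allowed absolute increase before the test fails

    def run(self, observed):
        """Compare a fresh null rate to the baseline and report the drift."""
        drift = observed - self.baseline
        return {
            "column": self.column,
            "drift": round(drift, 4),
            "passed": drift <= self.tolerance,
        }


def generate_tests(profiles, tolerance=0.05):
    """Create one null-rate test per profiled column, with no hand-written assertions."""
    return [
        NullRateTest(column, stats["null_rate"], tolerance)
        for column, stats in profiles.items()
    ]


# Hypothetical baseline profile and a later measurement of the same columns.
baseline_profiles = {"email": {"null_rate": 0.01}, "phone": {"null_rate": 0.20}}
latest_null_rates = {"email": 0.12, "phone": 0.21}

tests = generate_tests(baseline_profiles)
results = [t.run(latest_null_rates[t.column]) for t in tests]
for result in results:
    print(result)
```

Keeping the drift value next to the pass/fail flag is the design choice that matters: a column that passes today but drifts a little further each run is visible before it fails.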

A real data catalog, not a sticky note

Every table, every column, every test, every score. Searchable, linkable, linked to history. Stops the "what's in this table again?" Slack thread from happening twice a week.

TestGen Enterprise

Profiling that earns its keep.

A profiler that doesn't generate tests is a one-shot report. TestGen profiles, scores, tests, and watches — so the work you put in on day one keeps paying back on day 90.

See How TestGen Works
  • In-database execution. Profile billions of rows where they sit. Snowflake, Databricks, Postgres, BigQuery, MS SQL, and more.
  • Apache 2.0, no feature gating. Every detector, every characteristic, every test type. In the open source.
  • Self-hosted. Your data never leaves your environment. Compliant by default.
  • Bootstrapped since 2013. Profitable, independent, and not pivoting next quarter.

Honest comparison

We built TestGen because writing data tests sucks.


Great Expectations and Soda Core are excellent frameworks for hand-writing data quality tests. TestGen learns your data and generates the data tests you need in minutes. You also get a UI for sharing tests with your customers, an MCP server, and a library of custom best-practice tests.

| Capability | DataKitchen TestGen (next-generation open source) | Soda Core · Great Expectations (previous-generation open source) |
|---|---|---|
| Web UI | Built-in | CLI + Python or YAML |
| Test authoring | Automatic | Enjoy spending hours writing Python per expectation or YAML per check |
| Data profiling | 51 characteristics per column | Basic stats only |
| Hygiene detection | 27 detectors out of the box | Manual |
| Built-in data catalog | Yes | No |
| Anomaly scoring over time | Yes | No |
| MCP support | Full MCP interface | No |
| Self-hosted, runs inside your firewall | Yes | Yes |

All three projects are Apache 2.0 and run inside your environment. The difference is what you have to build yourself.

Ready? Scroll up and install. Really. Or check out our GitHub repo.