Install open-source DataOps TestGen
Data profiling is just the start.
Next-generation open source for data quality, data observability, and data testing. Generate thousands of data quality tests automatically.
# TestGen install for Mac, Linux.
# Download the latest installer.
$ curl -o dk-installer.py \
'https://raw.githubusercontent.com/DataKitchen/data-observability-installer/main/dk-installer.py'
# Run the install command.
$ python3 dk-installer.py tg install
# Full docs: docs.datakitchen.io/testgen/get-started/install-on-mac-linux/

# TestGen install for Windows.
# Download the latest installer.
PS> Invoke-WebRequest `
-Uri 'https://github.com/DataKitchen/data-observability-installer/releases/download/latest/dk-installer.exe' `
-OutFile dk-installer.exe
# Run the installer.
PS> .\dk-installer.exe
# Full docs: docs.datakitchen.io/testgen/get-started/install-on-windows/

- Apache 2.0
- Runs on your infrastructure
- No data leaves your environment
See it first
Take a tour before you install.
Click through TestGen's profiling, hygiene detection, and auto-generated tests on a real dataset. No signup. No install.
Found a data error?
Here's what TestGen flags on the first profiling pass.
Point it at your warehouse. Twenty-eight minutes later, the issue list is in your browser, with findings like these:
- Null-rate spikes
- Stale timestamps
- Broken foreign keys
- Schema drift
- Uniqueness violations
- Row-volume spikes
- Outlier values
- Missing required values
Plus 19 more, along with out-of-the-box freshness, volume, and schema anomaly detection.
Profile every column. Catch what profilers miss.
Most profilers stop at counts and nulls. TestGen reads patterns, encodings, dates, IDs, and category drift. Each one becomes a running test.
TestGen Enterprise
Profiling that earns its keep.
A profiler that doesn't generate tests is a one-shot report. TestGen profiles, scores, tests, and watches, so the work you put in on day one keeps paying off on day 90.
See How TestGen Works
- In-database execution. Profile billions of rows where they sit: Snowflake, Databricks, Postgres, BigQuery, MS SQL, and more.
- Apache 2.0, no feature gating. Every detector, every characteristic, and every test type is in the open-source release.
- Self-hosted. Your data never leaves your environment. Compliant by default.
- Bootstrapped since 2013. Profitable, independent, and not pivoting next quarter.
Honest comparison
We built TestGen because writing data tests sucks.
Great Expectations and Soda Core are excellent frameworks for hand-writing data quality tests. TestGen learns your data and generates the tests you need in minutes. You also get a UI for sharing tests with your customers, an MCP server, and a library of custom best-practice tests.
| Capability | DataKitchen TestGen (next-generation open source) | Soda Core · Great Expectations (previous-generation open source) |
|---|---|---|
| Web UI | Built-in | CLI + Python or YAML |
| Test authoring | Automatic | Enjoy spending hours writing Python per expectation or YAML per check |
| Data profiling | 51 characteristics per column | Basic stats only |
| Hygiene detection | 27 detectors out of the box | Manual |
| Built-in data catalog | Yes | No |
| Anomaly scoring over time | Yes | No |
| MCP support | Full MCP interface | No |
| Self-hosted, runs inside your firewall | Yes | Yes |
All three projects are Apache 2.0 and run inside your environment. The difference is what you have to build yourself.
Ready? Scroll up and install. Really. Or check out our GitHub repo.