# PRD: Human-in-the-loop data quality fix agent

A two-program pattern that finds bad source data, proposes the exact SQL fix, waits for a human to approve, then applies only what was approved and logs what ran.

## 1. Goal

Let an agent do the boring 90 percent (scan the data, write the SQL, record the result) while a human keeps the one decision that matters (is this fix correct and safe to run). No fix touches a source table without explicit human approval recorded in a ticket.

This document is enough to rebuild the system in your own environment. It names the components, the contract between them, the safety gates, and the config. It does not ship full source.

## 2. The pattern

Two programs and a control plane.

1. **Proposer** (`propose_fixes.py`). Runs a data quality scan, gathers context for each failure, asks an LLM for an exact SQL fix, validates the SQL, and writes each proposed fix into a ticket. It has read-only database access. It applies nothing.
2. **Control plane** (Jira, or any ticket system). One ticket per fix. A human reads each one and sets an approval flag, with an optional comment.
3. **Applier** (`apply_fixes.py`). Reads the tickets, runs only the approved SQL inside a transaction with safety guards, writes the result back to the ticket, and records an audit log entry. It has write access. It never decides anything.

Mapping to the blog: the proposer is the program that "finds problems and proposes fixes" (it uses TestGen for detection and Claude for the proposal). The applier is the program that "applies them, but only the ones you approved."

The separation is the whole point. The program that can write to your tables has no judgment. The program with judgment cannot write.

## 3. Architecture

```
source DB (read-only role)
        |
        v
[ propose_fixes.py ] --(DQ scan)--> failures
        |  for each failure: gather context
        |  call LLM -> proposed SQL + rationale + risk
        |  validate SQL (sqlglot) against live catalog
        v
   Jira tickets  (status: proposed)
        |
        |  <-- human reviews, sets Approved flag + comment
        v
[ apply_fixes.py ] --(read approved tickets)
        |  dry run / EXPLAIN
        |  row-count guard, PII guard, statement-type guard
        |  apply approved SQL in a transaction (write role)
        v
source DB (write role)   +   audit log   +   ticket updated with result
```

## 4. The fix record (the contract)

Both programs share one record shape. The proposer writes it, the applier reads it, the ticket stores it. Store the JSON in the ticket description or a custom field so the applier never has to re-derive anything.

```json
{
  "fix_id": "stable hash of source_table + column + rule",
  "run_id": "uuid of the proposer run that created this",
  "source_table": "<schema>.<table>",
  "column": "<column>",
  "rule": "<name of the rule that failed>",
  "problem": "Plain-language description of what is wrong and how many rows are affected.",
  "evidence": {
    "bad_value_examples": ["<sample bad value>", "<sample bad value>"],
    "affected_rows": 0,
    "total_rows": 0,
    "data_type": "<column data type>"
  },
  "proposed_sql": "UPDATE <schema>.<table> SET <column> = <new value or expression> WHERE <condition>;",
  "rationale": "Why this is the right fix and why it is the least destructive option.",
  "risk": {
    "statement_type": "UPDATE",
    "destructive": true,
    "irreversible": false,
    "touches_pii": false,
    "estimated_rows_affected": 0
  },
  "status": "proposed",
  "approver": null,
  "approval_comment": null,
  "result": { "applied_at": null, "rows_affected": null, "error": null }
}
```

`status` values: `proposed`, `approved`, `rejected`, `applied`, `failed`, `skipped`.

`fix_id` is a stable hash so the same problem on the same column produces the same id across runs. This gives you idempotency and dedup for free.

## 5. Control plane (Jira)

One Jira issue per fix. A parent issue per proposer run, each fix as a sub-task, is the cleanest layout.

Field mapping:

- **Summary**: `[fix_id] <source_table>.<column>: <rule>`
- **Description**: the human-readable problem, evidence, proposed SQL in a code block, and rationale.
- **Stored record**: the full fix-record JSON in a custom field (or a fenced block in the description the applier parses).
- **Approval flag**: a custom checkbox or single-select field named `DQ Approval` with values `Approved` / `Rejected`. Default empty.
- **Approval comment**: the standard Jira comment field. Optional.
- **Result**: the applier appends a comment and transitions status.

A human disposition is: set `DQ Approval` to `Approved` (optionally add a comment), and nothing else. Anything not explicitly `Approved` is treated as not approved.

If you do not use Jira, any system with a per-fix record, a settable approval field, and an API works. GitHub issues, a database table with a UI, a spreadsheet with a column. The applier only needs to read "is this fix approved" and write back a result.

## 6. Functional requirements: proposer

P1. MUST connect to the source database with a **read-only** role. It must not hold credentials that can write.

P2. MUST run a data quality scan and collect failures. TestGen is the reference detector. The minimum a detector must emit per failure: table, column, rule name, a human description, sample bad values, affected row count, total row count, and column data type.

P3. For each failure, MUST gather schema context: the table's columns and types, and any relevant reference or lookup tables the fix might join against. This context goes to the LLM.

P4. MUST call the LLM with the failure plus context and request a structured proposal: SQL, rationale, and a risk assessment. See section 9 for the prompt contract.

P5. MUST validate every proposed SQL before writing it to a ticket (section 8). A proposal that fails validation is discarded, and the failure is logged. The proposer never writes invalid or unparseable SQL into a ticket.

P6. MUST compute `estimated_rows_affected` by running the fix's WHERE clause as a `SELECT count(*)` under the read-only role. This is an estimate written into the record, not a guarantee.

P7. MUST flag PII. If the target table or column is on the PII denylist, set `risk.touches_pii = true` and either skip the fix entirely (default) or propose it with a loud warning, per config.

P8. MUST write one fix record per ticket with `status = proposed`. MUST be idempotent: if a ticket for the same `fix_id` already exists and is open, update it rather than create a duplicate.

P9. MUST NOT apply, execute, or stage any change to the source data. Its only writes are to the ticket system and its own logs.

## 7. Functional requirements: applier

A1. MUST connect to the source database with a **write** role. This is the only program that can write.

A2. MUST read tickets and select only those where `DQ Approval = Approved`. Anything else is skipped and left untouched.

A3. For each approved fix, MUST re-run validation (section 8) against the live catalog before applying. Schemas drift between propose time and apply time. A fix that no longer validates is marked `failed` with the reason, and a comment is posted. It is not applied.

A4. MUST enforce the row-count guard. Re-count affected rows immediately before apply. If the count exceeds `max_rows_affected_per_fix`, do not apply. Mark `skipped`, post the count, and require a human to raise the limit on that ticket or in config.

A5. MUST apply inside a single transaction per fix. On any error, roll back, mark `failed`, record the error, move to the next fix. One bad fix never blocks the rest.

A6. MUST be idempotent. A fix already in `applied` status is never re-applied. Track applied `fix_id`s in the audit log and check before running.

A7. MUST write the result back: `rows_affected`, `applied_at`, and on the ticket a comment with the SQL that ran and the row count. Transition the ticket to a done state.

A8. MUST honor `dry_run`. In dry-run mode it does everything except commit: it runs the guards, runs the SQL inside a transaction, reads the row count, then rolls back and reports what would have happened.

A9. MUST NOT propose, edit, or invent SQL. It runs the exact `proposed_sql` from the approved record, nothing else.

## 8. SQL validation (the hallucination guard)

This is the gate that catches the failure the blog names: the model references a column that does not exist, or nulls a field it should not. Use `sqlglot` to parse and inspect, not string matching.

Both programs run this. The proposer runs it before writing a ticket. The applier runs it again before applying.

```python
def validate(sql, expected_table, live_catalog, config):
    parsed = sqlglot.parse(sql)            # parse, do not trust raw text
    assert len(parsed) == 1                # exactly one statement
    stmt = parsed[0]
    assert stmt.key.upper() in config.allowed_statements   # default: {"UPDATE"}
    assert tables_in(stmt) == {expected_table}             # only the table we scoped
    for col in columns_in(stmt):
        assert col in live_catalog[expected_table]         # no hallucinated columns
    assert no_unbounded_write(stmt)        # an UPDATE with no WHERE is rejected
    return True
```

Requirements:

- V1. Exactly one statement. Reject multi-statement strings.
- V2. Statement type must be on the allowlist. Default allows `UPDATE` only. `DELETE`, `DROP`, `TRUNCATE`, `ALTER` are off by default and require explicit config to enable, per table.
- V3. The only table written must be the table the fix was scoped to.
- V4. Every column referenced must exist in the live catalog. This is the check that stops the hallucinated-column failure.
- V5. No unbounded writes. An `UPDATE` without a `WHERE`, or one whose WHERE matches the whole table, is rejected unless explicitly allowed for that fix.

## 9. LLM prompt contract

The proposer calls the model once per failure. Keep the model's job narrow: given a described problem and the real schema, return a fix as strict JSON. The model does not get to run anything, and its output is always validated downstream, so a bad answer is caught, not applied.

Inputs to include in the prompt:

- The failing table name, column, and data type.
- The rule that failed and a plain description.
- Sample bad values and the affected row count.
- The full column list and types for the table.
- Relevant reference tables (names, key columns, sample rows).
- The PII status of the column.

Required output: a single JSON object with `proposed_sql`, `rationale`, and `risk` (statement_type, destructive, irreversible, touches_pii). Instruct the model to return JSON only, no prose, no code fences. Parse it, and if parsing fails, discard the proposal and log it.

Prompt rules to state explicitly to the model:

- Return one statement only.
- Write against the named table only.
- Use only columns that appear in the provided schema.
- Prefer the least destructive fix that resolves the rule.
- If the right fix is uncertain or needs business judgment, say so in `rationale` and propose the safest option, do not guess at business meaning.

## 10. Configuration

Keep secrets in env vars, keep limits and lists in a checked-in YAML so the gates are reviewable.

```yaml
database:
  read_dsn_env: DQ_READ_DSN      # read-only role, used by proposer
  write_dsn_env: DQ_WRITE_DSN    # write role, used by applier only
ticketing:
  base_url: <your ticket system base url>
  project_key: <your project key>
  token_env: TICKET_TOKEN
  approval_field: <id of your approval field>   # the "Approved" field
llm:
  provider: <your model provider>
  model: <your model id>
  api_key_env: LLM_API_KEY
limits:
  max_rows_affected_per_fix: 5000
  allowed_statements: ["UPDATE"]
  dry_run: false
guards:
  pii_denylist:
    - <schema>.<table>.<column>   # one entry per protected column
    - <schema>.<table>.<column>
  pii_on_match: skip            # skip | warn
```

Least privilege is a hard requirement, not a suggestion. The proposer gets `DQ_READ_DSN` and only that. The applier gets `DQ_WRITE_DSN`. If the proposer is ever compromised or prompted into proposing something destructive, it still cannot write.

## 11. Logging and audit

Every run of both programs appends to an audit log (a table or append-only file). One entry per fix touched.

Record: timestamp, program, run_id, fix_id, source_table, column, status transition, the exact SQL, rows affected, approver, ticket id, and any error. This is the record of what ran and who said yes. It is also how the applier checks idempotency.

The audit log is append-only. Nothing edits or deletes prior entries.

## 12. Non-goals

- Not a scheduler. Cron or your orchestrator runs the two programs. The pattern is the contract between them, not the trigger.
- Not a detector. Detection is pluggable. TestGen is the reference, any scanner that emits the failure shape works.
- Not a fixer of root cause. This patches bad source data after it lands. The partner is still sending you garbage. That is a separate conversation.
- No auto-approve mode. There is no flag that skips the human. If you want that, you want a different design, and you will eventually delete something you should not have.

## 13. Recreate-it checklist

1. Stand up two database roles: one read-only, one write. Confirm the read-only role genuinely cannot write.
2. Create a Jira project and a `DQ Approval` single-select field (`Approved` / `Rejected`). Note its custom field id.
3. Write the fix-record schema (section 4) as a shared module both programs import.
4. Build the validator (section 8) with `sqlglot`. Write tests for the five rules, especially the hallucinated-column case.
5. Build the proposer: scan, gather context, call the LLM, validate, write tickets. Verify it holds no write credentials.
6. Build the applier: read approved tickets, re-validate, run guards, apply in a transaction, write results, log.
7. Run end to end in `dry_run: true`. Confirm nothing commits.
8. Seed one real fix on a low-risk, easily reversible column, approve it, run the applier, and check the audit log and the row count.
9. Turn off dry run for one low-risk rule. Widen scope rule by rule, not all at once.
