
// prototype · public incident feed

Data Watchtower

Catch drift at the producer boundary.

A data-quality monitor that catches schema drift, distribution shifts, and cardinality collapse at the source, before a bad batch reaches the pipelines that trust it. Every alert names the column and the number that moved, and a threshold breach fails the build.

// the failure mode

A column changes shape upstream, and nothing notices until the dashboard is wrong.

Data quality fails quietly. A producer renames a field, a currency column starts arriving in cents, an enum gains a value nobody planned for. The pipeline keeps running because the rows still parse. The damage shows up days later as a revenue chart that dropped by a factor of six, and by then the bad data is in every table downstream.

The honest fix is to check at the boundary, where the data enters, not after it has spread. That means a profile of what good looks like, a comparison that explains exactly what moved, and a gate that refuses to pass a batch that breaks the contract.

// the engine

Profile a baseline. Compare each batch. Gate on severity.

01

Profile the baseline

Fingerprint a trusted snapshot: column types, value distributions, null rates, cardinality. The profile is the contract a later batch is held to.
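A minimal sketch of what a baseline profile could hold, assuming a batch arrives as a list of Python dicts (one per row). ColumnProfile and profile are hypothetical names for illustration, not the watchtower's actual API, and the real engine may record richer distribution data such as quantiles or histograms.

```python
from __future__ import annotations

import statistics
from collections import Counter
from dataclasses import dataclass


@dataclass
class ColumnProfile:
    dtype: str                    # most common Python type name among non-null values
    null_rate: float              # fraction of rows where the value is None or missing
    cardinality: int              # number of distinct non-null values
    mean: float | None = None     # numeric columns only
    median: float | None = None   # numeric columns only


def profile(rows: list[dict]) -> dict[str, ColumnProfile]:
    """Fingerprint a batch (a list of row dicts), producing one profile per column."""
    columns = sorted({key for row in rows for key in row})
    profiles = {}
    for col in columns:
        values = [row.get(col) for row in rows]  # a missing key counts as a null
        present = [v for v in values if v is not None]
        type_counts = Counter(type(v).__name__ for v in present)
        # bool is a subclass of int, so exclude it from numeric summaries
        numeric = [v for v in present
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        profiles[col] = ColumnProfile(
            dtype=type_counts.most_common(1)[0][0] if present else "null",
            null_rate=(len(values) - len(present)) / len(values),
            cardinality=len(set(present)),  # assumes hashable scalar values
            mean=statistics.mean(numeric) if numeric else None,
            median=statistics.median(numeric) if numeric else None,
        )
    return profiles
```

Profiling the trusted snapshot once and storing the result gives the later comparison step a fixed contract to check each batch against.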

02

Compare and explain

Diff a new batch against the baseline. Every finding names the column, the statistic that moved, and by how much, so the alert is a sentence a human can act on, not a red light.
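The comparison can be sketched as a walk over the baseline statistics that emits one finding per statistic that moved, with each finding able to explain itself in a sentence. This assumes profiles are plain dicts of numeric statistics per column and uses a hypothetical relative tolerance of 25%; Finding, compare, and the tolerance are illustrative choices, not the project's real interface. Missing columns (schema drift) are skipped here and would need their own check.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Finding:
    column: str
    statistic: str
    baseline: float
    observed: float

    @property
    def ratio(self) -> float:
        """Observed value as a multiple of the baseline value."""
        if self.baseline == 0:
            return 1.0 if self.observed == 0 else float("inf")
        return self.observed / self.baseline

    def explain(self) -> str:
        """One human-readable sentence: column, statistic, and how far it moved."""
        return (f"{self.column}: {self.statistic} moved from {self.baseline:g} "
                f"to {self.observed:g} ({self.ratio:.2f}x baseline)")


def compare(baseline: dict[str, dict[str, float]],
            batch: dict[str, dict[str, float]],
            tolerance: float = 0.25) -> list[Finding]:
    """Report every shared statistic whose relative change exceeds `tolerance`."""
    findings = []
    for column, stats in sorted(baseline.items()):
        for statistic, expected in sorted(stats.items()):
            observed = batch.get(column, {}).get(statistic)
            if observed is None:
                continue  # missing column or statistic: left to a separate schema check
            if expected == 0:
                moved = observed != 0
            else:
                moved = abs(observed - expected) > tolerance * abs(expected)
            if moved:
                findings.append(Finding(column, statistic, expected, observed))
    return findings
```

For a baseline where amount has a mean of 42.0 and a batch where it has a mean of 0.42, compare returns a single finding whose explanation reads "amount: mean moved from 42 to 0.42 (0.01x baseline)".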

03

Gate on severity

Map findings to a severity, then block the build when the threshold is breached. A warning-level drift fails the gate on purpose: the batch stops before it corrupts everything downstream.
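A sketch of the gate, assuming each finding has been reduced to an observed/baseline ratio keyed by a "column.statistic" name. The fold-change cutoffs (1.5x for a warning, 10x for critical) are hypothetical, but the default fail_at matches the behaviour described above: a warning-level drift already fails the gate.

```python
from __future__ import annotations

from enum import IntEnum


class Severity(IntEnum):
    INFO = 0
    WARNING = 1
    CRITICAL = 2


def classify(ratio: float) -> Severity:
    """Map an observed/baseline ratio to a severity (hypothetical cutoffs)."""
    if ratio <= 0 or ratio == float("inf"):
        # collapsed to zero, flipped sign, or appeared from a zero baseline
        return Severity.CRITICAL
    fold = max(ratio, 1 / ratio)  # a 0.1x drop and a 10x jump are the same size of move
    if fold >= 10:
        return Severity.CRITICAL
    if fold >= 1.5:
        return Severity.WARNING
    return Severity.INFO


def gate(ratios: dict[str, float], fail_at: Severity = Severity.WARNING) -> int:
    """Print one line per finding and return an exit code: 1 blocks the build, 0 passes."""
    worst = Severity.INFO
    for name, ratio in sorted(ratios.items()):
        severity = classify(ratio)
        worst = max(worst, severity)
        print(f"{severity.name:<8} {name}: {ratio:.2f}x baseline")
    return 1 if worst >= fail_at else 0
```

In CI the return value would be passed to sys.exit: gate({"amount.mean": 0.01}) returns 1 and fails the job, while a batch whose only finding is a 1.2x move returns 0. Returning an exit code rather than raising keeps the gate usable from any CI system that treats a non-zero status as failure.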

// the proof

A real drift, caught and explained.

The public scan runs a deterministic drift scenario over a labeled orders fixture: a billing column collapses toward zero, the kind of value shift that looks plausible and is wrong. The engine catches it, names the column and the ratio it moved by, writes a markdown incident report, and fails the gate.

A failing gate here is the point: it means the watchtower stopped a corrupt batch before it spread. The incident figures come straight from /api/incident-latest and are never seeded by hand, and the run reproduces offline with no credentials.
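The markdown incident report could be as small as a header, the gate status, and a table with one row per finding. The function name, batch id, and report layout below are assumptions for illustration; this page does not show the project's actual report format.

```python
from __future__ import annotations


def incident_markdown(batch_id: str, findings: list[dict], gate_failed: bool) -> str:
    """Render drift findings as a markdown incident report (hypothetical layout)."""
    status = "FAILED" if gate_failed else "passed"
    lines = [
        f"# Drift incident: {batch_id}",
        "",
        f"Gate: **{status}**",
        "",
        "| column | statistic | baseline | observed | ratio |",
        "| --- | --- | --- | --- | --- |",
    ]
    for item in findings:
        # observed as a multiple of baseline; inf when the baseline was zero
        ratio = item["observed"] / item["baseline"] if item["baseline"] else float("inf")
        lines.append(
            f"| {item['column']} | {item['statistic']} | {item['baseline']:g} "
            f"| {item['observed']:g} | {ratio:.2f}x |"
        )
    return "\n".join(lines) + "\n"
```

A billing column whose mean falls from 42 to 0.42 would render as the row "| amount | mean | 42 | 0.42 | 0.01x |" under a FAILED gate line.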


// reproduce

Run it yourself in a minute.

git clone https://github.com/IgnazioDS/data-quality-watchtower
cd data-quality-watchtower && pip install -e .
python -m data_quality_watchtower.incident_runner

The runner picks one of five drift scenarios, runs the engine, and rewrites the committed artifact that the endpoint serves. The baseline and drifted batches live in the public fixture, which is dependency-free and works offline. Persistence is JSON committed to the repo: no external store, no secrets.
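One way to rewrite a committed JSON artifact safely is to write a temporary file in the same directory and swap it in with os.replace, which on POSIX replaces the target atomically when both paths are on the same filesystem, so a reader never sees a half-written file. The function name, target path, and payload are hypothetical; this page only states that persistence is repo-committed JSON.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path


def write_artifact(incident: dict, path: Path) -> None:
    """Atomically replace the committed JSON artifact (hypothetical path and schema)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # sort_keys + indent keep the file stable, so git diffs show only real changes
    payload = json.dumps(incident, indent=2, sort_keys=True) + "\n"
    # The temp file must live in the same directory for os.replace to be a rename
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(payload)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # clean up the partial temp file, then re-raise
        raise
```

Sorting keys matters more than it looks for a repo-committed artifact: two runs that produce the same incident write byte-identical files, so the commit history shows only genuine changes.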

// graduation

The path from prototype.

Today the monitor is honest about its stage: a real profiling engine, a public incident feed, and a deterministic scenario it catches every run. The next steps are connectors for live warehouse tables, a wider library of drift checks, and a gate teams can drop into their own ingestion CI. The bar it is working toward is NexusRAG: every claim backed by a public repo, a live deploy, and a number you can check in sixty seconds. AI infrastructure, not AI theater.

Want drift caught before it spreads?

Start a conversation