// flagship · production reference

NexusRAG

Multi-tenant RAG platform.

The shipped reference every other system in this portfolio points to. Multi-tenant, multi-cloud retrieval with the production substrate most RAG demos skip.

Live deploy View on GitHub

// the gap

Demo retrieval is easy. Production retrieval is the hard part.

Most RAG projects clear the first 80 percent in a weekend: embed some documents, wire a vector search, pipe the top chunks into a model. Then real users arrive with messy queries, the corpus grows past a single tenant, and the demo starts returning confident nonsense.

The remaining 20 percent is the part that ships: tenant isolation, access control on every retrieved chunk, cost ceilings, failover across providers, and an audit trail that survives a compliance review. NexusRAG is that substrate, built once so retrieval becomes the dependable part of the stack instead of the risky one.

// architecture

Three layers, one contract.

Multi-cloud retrieval routing

pgvector, Bedrock Knowledge Bases, and Vertex AI Search behind one router. A provider degrades, retrieval keeps serving.

LangGraph agent core

Retrieve, rank, generate, stream as an inspectable graph, not an opaque prompt chain. Every step is traceable.

Auth, cache, rate-limit pipeline

Every request carries a tenant, every chunk an ACL, every caller a token bucket and a cost budget. The boring parts, done once.

// what ships

The production checklist, already checked.

Every capability below maps to a live endpoint in the public API. Open the source and read the route.

Retrieval

Streaming RAG (retrieve, rank, generate, stream)
Multi-cloud routing: pgvector + Bedrock KB + Vertex
Corpora + document management
Idempotent run keys

Identity + access

SSO discovery + callback
SCIM user provisioning
RBAC + ABAC on every chunk
API-key lifecycle
Envelope encryption + key rotation

Governance + cost

Token-bucket rate limits
Tenant quotas + entitlements
Cost budgets + chargeback reports
SOC 2 compliance bundles
DSAR, legal holds, retention

Reliability + ops

Multi-region failover
Kill switches + canary rollouts
Tamper-evident audit log
SLA evaluation engine
Prometheus metrics

// retrieval quality

Measured, not asserted.

Retrieval quality is a number, not an adjective. A nightly public benchmark runs a fixed, labeled question set against the live index and publishes recall, precision, and nDCG at /api/benchmark-latest. The harness is instrumented; the first public run lands shortly.

benchmark instrumented · public runs publishing soon

// telemetry

Live, and auditable.

The numbers below come straight from the production /api/stats endpoint, real workload only, never seeded. When there is no recent traffic the figures read low and honest rather than inflated. The schema is public and versioned.

querying /api/stats (cold start may take a few seconds)…

// the bar

The standard the rest is graduating toward.

NexusRAG is the engineering bar the rest of the portfolio is working up to: every claim backed by a public repo, a live deploy, and a number you can check in sixty seconds. AI infrastructure, not AI theater.

Need this substrate hardened for your team?

Start a conversation