// flagship · production reference
NexusRAG
Multi-tenant RAG platform.
The shipped reference every other system in this portfolio points to. Multi-tenant, multi-cloud retrieval with the production substrate most RAG demos skip.
// the gap
Demo retrieval is easy. Production retrieval is the hard part.
Most RAG projects clear the first 80 percent in a weekend: embed some documents, wire a vector search, pipe the top chunks into a model. Then real users arrive with messy queries, the corpus grows past a single tenant, and the demo starts returning confident nonsense.
The remaining 20 percent is the part that ships: tenant isolation, access control on every retrieved chunk, cost ceilings, failover across providers, and an audit trail that survives a compliance review. NexusRAG is that substrate, built once so retrieval becomes the dependable part of the stack instead of the risky one.
// architecture
Three layers, one contract.
Multi-cloud retrieval routing
pgvector, Bedrock Knowledge Bases, and Vertex AI Search sit behind one router. When a provider degrades, retrieval keeps serving from the others.
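As a rough illustration of routing with failover, the sketch below tries providers in priority order and falls through to the next one on error. The names `RetrievalRouter`, `Chunk`, and the stub backends are hypothetical and are not taken from the NexusRAG source. A real router would also weigh latency, health checks, and cost.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Chunk:
    text: str
    score: float
    provider: str


# Assumption: a backend is any callable taking (query, k) and returning chunks.
Backend = Callable[[str, int], list]


class RetrievalRouter:
    """Tries each provider in priority order; the first success wins."""

    def __init__(self, backends: Sequence[tuple]):
        self.backends = list(backends)  # [(name, backend), ...]

    def retrieve(self, query: str, k: int = 5) -> list:
        errors = []
        for name, backend in self.backends:
            try:
                return backend(query, k)
            except Exception as exc:  # degraded provider: record and move on
                errors.append(f"{name}: {exc}")
        raise RuntimeError("all retrieval providers failed: " + "; ".join(errors))


# Hypothetical stand-ins for real provider clients.
def flaky_pgvector(query: str, k: int) -> list:
    raise TimeoutError("pgvector timed out")


def stub_bedrock(query: str, k: int) -> list:
    return [Chunk(text=f"result for {query}", score=0.9, provider="bedrock")][:k]


router = RetrievalRouter([("pgvector", flaky_pgvector), ("bedrock", stub_bedrock)])
chunks = router.retrieve("refund policy")
print(chunks[0].provider)
```

Here the first provider times out and the second answers, so the caller never sees the failure.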
LangGraph agent core
Retrieve, rank, generate, and stream run as an inspectable graph, not an opaque prompt chain. Every step is traceable.
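To make "traceable" concrete, here is a minimal plain-Python sketch of a step pipeline that records a trace entry per node. It deliberately does not use the LangGraph API. `run_graph` and the stub steps are hypothetical and only illustrate the idea of named, inspectable steps over shared state.

```python
import time
from typing import Callable

State = dict
Step = Callable[[dict], dict]


def run_graph(steps: list, state: State) -> tuple:
    """Runs named steps in order and records one trace entry per step."""
    trace = []
    for name, step in steps:
        started = time.perf_counter()
        state = step(dict(state))  # shallow copy so each step returns a new state
        trace.append({
            "step": name,
            "ms": (time.perf_counter() - started) * 1000.0,
            "keys": sorted(state),  # which state fields exist after this step
        })
    return state, trace


# Hypothetical stub steps standing in for real retrieval and generation.
def retrieve(state: State) -> State:
    state["chunks"] = [("beta", 0.4), ("alpha", 0.9)]
    return state


def rank(state: State) -> State:
    state["chunks"] = sorted(state["chunks"], key=lambda c: c[1], reverse=True)
    return state


def generate(state: State) -> State:
    state["answer"] = "based on: " + state["chunks"][0][0]
    return state


final, trace = run_graph(
    [("retrieve", retrieve), ("rank", rank), ("generate", generate)],
    {"query": "what is the refund window?"},
)
print([t["step"] for t in trace], final["answer"])
```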
Auth, cache, rate-limit pipeline
Every request carries a tenant, every chunk an ACL, every caller a token bucket and a cost budget. The boring parts, done once.
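One way to picture "every chunk an ACL" is a filter applied to retrieved chunks before anything reaches the model. The sketch below assumes a simple group-based ACL. The `Caller` and `Chunk` shapes and the `authorized_chunks` helper are hypothetical, not the project's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    tenant_id: str
    groups: frozenset


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str
    allowed_groups: frozenset  # assumption: ACL expressed as a set of group names


def authorized_chunks(caller: Caller, chunks: list) -> list:
    """Keeps only chunks in the caller's tenant that share a group with the caller."""
    return [
        c for c in chunks
        if c.tenant_id == caller.tenant_id and (c.allowed_groups & caller.groups)
    ]


caller = Caller(tenant_id="acme", groups=frozenset({"support"}))
retrieved = [
    Chunk("acme", "support runbook", frozenset({"support", "eng"})),
    Chunk("acme", "payroll export", frozenset({"finance"})),
    Chunk("globex", "other tenant doc", frozenset({"support"})),
]
visible = authorized_chunks(caller, retrieved)
print([c.text for c in visible])
```

Filtering after retrieval is the simplest form to show. Pushing the same predicate into the search query avoids spending top-k slots on chunks the caller cannot see.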
// what ships
The production checklist, already checked.
Every capability below maps to a live endpoint in the public API. Open the source and read the route.
Retrieval
- Streaming RAG (retrieve, rank, generate, stream)
- Multi-cloud routing: pgvector + Bedrock Knowledge Bases + Vertex AI Search
- Corpora + document management
- Idempotent run keys
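Idempotent run keys are commonly implemented by storing the result of the first execution under a caller-supplied key and replaying it on retries. The sketch below is an in-memory illustration under that assumption. `IdempotentRunner` is a hypothetical name, and a production version would persist keys with an expiry.

```python
import threading
from typing import Callable


class IdempotentRunner:
    """Executes work at most once per (tenant, run key) and replays the result."""

    def __init__(self) -> None:
        self._results: dict = {}
        self._lock = threading.Lock()

    def run(self, tenant_id: str, run_key: str, work: Callable[[], str]) -> str:
        key = (tenant_id, run_key)  # scoped per tenant so keys cannot collide
        # Holding the lock during work() keeps the sketch simple; it serializes runs.
        with self._lock:
            if key not in self._results:
                self._results[key] = work()
            return self._results[key]


calls = []


def expensive_run() -> str:
    calls.append(1)
    return f"answer #{len(calls)}"


runner = IdempotentRunner()
first = runner.run("acme", "run-123", expensive_run)
retry = runner.run("acme", "run-123", expensive_run)  # client retry, same key
print(first, retry, len(calls))
```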
Identity + access
- SSO discovery + callback
- SCIM user provisioning
- RBAC + ABAC on every chunk
- API-key lifecycle
- Envelope encryption + key rotation
Governance + cost
- Token-bucket rate limits
- Tenant quotas + entitlements
- Cost budgets + chargeback reports
- SOC 2 compliance bundles
- DSAR, legal holds, retention
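For readers unfamiliar with token-bucket rate limiting, here is a minimal sketch of the standard algorithm: tokens refill at a steady rate up to a capacity, and each request spends tokens or is rejected. The class name, parameters, and injectable clock are illustrative assumptions, not the platform's implementation.

```python
import time


class TokenBucket:
    def __init__(self, capacity: float, refill_per_sec: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._clock = clock          # injectable so the behavior can be tested
        self._tokens = capacity      # start full
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refills based on elapsed time, then spends `cost` tokens if available."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_sec)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# Fake clock for a deterministic walk-through: burst of 2, refill 1 token/second.
now = [0.0]
bucket = TokenBucket(capacity=2, refill_per_sec=1, clock=lambda: now[0])
burst = [bucket.allow(), bucket.allow(), bucket.allow()]
now[0] = 1.0  # one second later, one token has refilled
after_wait = bucket.allow()
print(burst, after_wait)
```

A per-caller limiter would keep one bucket per tenant or API key. Passing a larger `cost` lets the same mechanism meter expensive requests.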
Reliability + ops
- Multi-region failover
- Kill switches + canary rollouts
- Tamper-evident audit log
- SLA evaluation engine
- Prometheus metrics
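A tamper-evident audit log is often built as a hash chain, where each entry commits to the hash of the one before it, so editing any past entry breaks verification. The sketch below shows that general technique with SHA-256. The entry layout and function names are assumptions, not the format NexusRAG uses.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes to the same value.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list, event: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": _digest(prev, event)})


def verify(log: list) -> bool:
    """Recomputes the chain from the start; any edited entry breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"actor": "alice", "action": "corpus.delete", "target": "c-1"})
append(log, {"actor": "bob", "action": "key.rotate", "target": "k-7"})
intact = verify(log)

log[0]["event"]["actor"] = "mallory"  # simulate tampering with history
tampered = verify(log)
print(intact, tampered)
```

A chain alone only detects edits if the latest hash is anchored somewhere the writer cannot alter, such as a separate store or a periodically published checkpoint.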
// retrieval quality
Measured, not asserted.
Retrieval quality is a number, not an adjective. A nightly public benchmark is set up to run a fixed, labeled question set against the live index and publish recall, precision, and nDCG at /api/benchmark-latest. The harness is instrumented; the first public run lands shortly.
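For reference, the three metrics can be computed per question as in the sketch below, which uses binary relevance labels and a cutoff k. This is the textbook formulation, not the benchmark harness itself, and the document IDs are made up for illustration.

```python
import math


def precision_at_k(ranked: list, relevant: set, k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked: list, relevant: set, k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def ndcg_at_k(ranked: list, relevant: set, k: int) -> float:
    """DCG with binary gains, normalized by the best achievable DCG at k."""
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, d in enumerate(ranked[:k])
        if d in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


# Hypothetical labeled question: three relevant docs, two of them retrieved.
ranked = ["d1", "d7", "d3", "d9"]
relevant = {"d1", "d3", "d5"}
p = precision_at_k(ranked, relevant, 4)
r = recall_at_k(ranked, relevant, 4)
n = ndcg_at_k(ranked, relevant, 4)
print(round(p, 3), round(r, 3), round(n, 3))
```

A benchmark run would average these per-question values over the whole labeled set.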
// telemetry
Live, and auditable.
The numbers below come straight from the production /api/stats endpoint: real workload only, never seeded. When there is no recent traffic, the figures read low rather than inflated. The schema is public and versioned.
// the bar
The standard the rest is graduating toward.
NexusRAG is the engineering bar the rest of the portfolio is working up to: every claim backed by a public repo, a live deploy, and a number you can check in sixty seconds. AI infrastructure, not AI theater.