Retrieval-augmented generation, with the wires showing. Type a question — the corpus panel lights up the retrieved chunks, the answer streams in with inline citations, and every step is inspectable: the embedding call, the similarity scores, the prompt sent to the model with the retrieved context folded in.
Send a question to see telemetry.
A two-call retrieval pipeline, with the math visible:
Browser
├─ corpus.json (77 chunks × 1024-dim embeddings) loaded once on page open
│
├─→ POST /api/lab/embed {text, input_type:"query"}
│     → NVIDIA NIM (nv-embedqa-e5-v5)
│     ← {embedding: [1024 floats], latency_ms, …}
│
├─ cosineSimilarity(query_vec, chunk_vec) for every chunk in corpus.json
├─ sort desc, take top-k → highlight in corpus pane
│
└─→ POST /api/lab/chat {messages: [system+user], …}
      - system: "Answer using ONLY these chunks. Cite as [1], [2]…"
      - user: the question
      - context: top-k chunks injected into the system prompt
      ← SSE stream of tokens with [n] citation markers
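To make the in-browser half concrete, here is a minimal TypeScript sketch of the retrieval step. It assumes corpus.json is an array of {id, text, embedding} records and that /api/lab/embed answers with {embedding: number[]}; any field or helper name beyond what the diagram shows is an assumption, not the exact app code.

```ts
// Sketch of the browser-side retrieval step, under assumed shapes:
//   corpus.json  -> Array<{ id: string; text: string; embedding: number[] }>
//   /api/lab/embed -> { embedding: number[] }   (1024 floats)
type Chunk = { id: string; text: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function retrieve(question: string, corpus: Chunk[], k = 4) {
  // One cheap backend call: embed the query with input_type "query".
  const res = await fetch("/api/lab/embed", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text: question, input_type: "query" }),
  });
  const { embedding } = await res.json();

  // Score every chunk locally, sort descending, keep top-k.
  return corpus
    .map((c) => ({ chunk: c, score: cosineSimilarity(embedding, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```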
Retrieval runs in your browser. The backend never touches the corpus — it only embeds your query (one cheap call) and runs the grounded chat (one ordinary chat call). corpus.json is a static asset on the same CDN edge as the page.
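A matching sketch of the second call: assembling the grounded system prompt and reading the SSE stream. The messages shape, the literal "data: " line format, and the function names here are assumptions; the real endpoint may differ.

```ts
// Sketch: build the grounded prompt and stream the answer.
// Assumes /api/lab/chat streams SSE lines like "data: <token>".
async function askGrounded(
  question: string,
  top: { chunk: { text: string }; score: number }[],
  onToken: (t: string) => void,
) {
  // Inject the top-k chunks into the system prompt, numbered for [n] citations.
  const context = top.map((t, i) => `[${i + 1}] ${t.chunk.text}`).join("\n\n");
  const system = `Answer using ONLY these chunks. Cite as [1], [2]…\n\n${context}`;

  const res = await fetch("/api/lab/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      messages: [
        { role: "system", content: system },
        { role: "user", content: question },
      ],
    }),
  });

  // Read the SSE stream token by token (line buffering across chunk
  // boundaries is omitted here for brevity).
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    for (const line of decoder.decode(value, { stream: true }).split("\n")) {
      if (line.startsWith("data: ")) onToken(line.slice(6));
    }
  }
}
```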
Seven sources, paragraph-aware chunking with a soft 500-char cap. Standards are summarized in original prose, clearly labeled. The corpus is rebuilt via scripts/build_corpus.py any time the source files change.
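The real chunker lives in scripts/build_corpus.py; purely as an illustration of paragraph-aware splitting with a soft 500-char cap, here is the same idea sketched in TypeScript. The merge rule below is a guess at the behavior, not a port of the script.

```ts
// Illustrative sketch of paragraph-aware chunking with a soft 500-char cap.
// The actual pipeline is scripts/build_corpus.py; this only mirrors the idea.
function chunkDocument(text: string, softCap = 500): string[] {
  const paragraphs = text.split(/\n\s*\n/).map((p) => p.trim()).filter(Boolean);
  const chunks: string[] = [];
  let current = "";

  for (const para of paragraphs) {
    // Merge paragraphs until adding the next one would pass the soft cap;
    // a single long paragraph is kept whole rather than split mid-sentence.
    if (current && current.length + para.length + 2 > softCap) {
      chunks.push(current);
      current = para;
    } else {
      current = current ? `${current}\n\n${para}` : para;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```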