Three agents, one model, one shared corpus. The Researcher iterates over the knowledge base — deciding what to look up next based on what it has found — then hands its notes to the Analyst, who weighs trade-offs and risks. The Writer synthesizes the final answer with citations. Everything streams live so you can watch the system think.
A real iterative agent loop, all client-side, with the corpus as the shared knowledge base:
```
Researcher (up to 3 iterations):
  loop:
    POST /api/lab/chat
      "Decide: next search query, or stop?"
      → {next_query: "...", rationale: "...", done: false}
    if done or iteration ≥ 3: break
    POST /api/lab/embed (input_type=query)
      → 1024-dim vector
    cosineSimilarity over corpus.json (client-side)
      → top-5 chunks (deduped vs prior iterations)
    feed retrieved chunks back into next iteration's context
  POST /api/lab/chat (streaming)
    "Synthesize research notes from the chunks gathered, with [n] citations"
    → research notes

Analyst:
  POST /api/lab/chat (streaming)
    "Identify trade-offs, risks, decision factors. Be direct."
    context: question + research notes
    → analysis

Writer:
  POST /api/lab/chat (streaming)
    "Synthesize a concise executive-ready answer with [n] citations."
    context: question + research notes + analyst's analysis
    → final answer
```
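The retrieval step in the middle of that loop is the only part that runs with no API call at all: plain cosine similarity over the pre-embedded chunks in `corpus.json`. A minimal sketch of that step, assuming a chunk shape like the one below (the `Chunk` interface and `topK` helper are illustrative, not the Lab's actual code):

```typescript
interface Chunk {
  id: string;
  text: string;
  embedding: number[]; // 1024-dim, from /api/lab/embed at build time
}

// Cosine similarity between the query vector and a chunk embedding.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Top-k chunks by similarity, skipping ids retrieved in earlier
// iterations — the "deduped vs prior iterations" step.
function topK(query: number[], corpus: Chunk[], seen: Set<string>, k = 5): Chunk[] {
  return corpus
    .filter(c => !seen.has(c.id))
    .map(c => ({ c, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(x => x.c);
}
```

Because the corpus is small and already in memory, a brute-force scan like this is fine; no vector database needed.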
One model (Kimi K2). Three roles, differentiated only by their system prompts and the context they receive. The Researcher's iteration loop is the most "agentic" part — on each pass the model sees what it has already learned and decides whether to dig deeper.
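"Differentiated only by system prompt and context" can be made concrete as a small config table — one entry per role, naming its prompt and which context pieces it gets (the prompt strings paraphrase the pipeline above; this is a sketch, not the Lab's exact wording):

```typescript
// Each role is just a system prompt plus a list of context inputs.
// Prompts here paraphrase the pipeline; the real ones may differ.
const roles = {
  researcher: {
    system: "Decide the next search query, or stop. Then synthesize notes with [n] citations.",
    context: ["question", "retrieved_chunks"],
  },
  analyst: {
    system: "Identify trade-offs, risks, decision factors. Be direct.",
    context: ["question", "research_notes"],
  },
  writer: {
    system: "Synthesize a concise executive-ready answer with [n] citations.",
    context: ["question", "research_notes", "analysis"],
  },
} as const;
```

The orchestrator's job reduces to threading each role's listed context into a single `/api/lab/chat` call — no per-role model, no per-role infrastructure.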
Total: 4–6 LLM calls + 1–3 embed calls per question. ~15–25 s end-to-end. All requests cached (LRU keyed on full request) so re-running the same question is instant and free of upstream cost.
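The cache is simple to sketch: a `Map` used as an LRU (insertion order doubles as recency order), keyed on the serialized request so that an identical question replays entirely from cache. Class and helper names here are illustrative:

```typescript
class LruCache<V> {
  private map = new Map<string, V>();
  constructor(private maxSize = 200) {}

  get(key: string): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      // Re-insert to mark this entry as most recently used.
      this.map.delete(key);
      this.map.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxSize) {
      // Evict the least recently used entry: the first key in
      // insertion order.
      const oldest = this.map.keys().next().value;
      if (oldest !== undefined) this.map.delete(oldest);
    }
  }
}

// Key on endpoint + full request body, so the same question with the
// same parameters hits the cache; any change busts it.
const requestKey = (endpoint: string, body: unknown) =>
  `${endpoint}:${JSON.stringify(body)}`;
```

Keying on the full request body matters: the Researcher's later iterations include earlier retrieved chunks in context, so each iteration is a distinct cache entry and the whole chain replays deterministically.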
A real production agent system would use formal function calling — the model emits structured tool requests, the orchestrator runs them, results flow back. That's how Claude, GPT, Gemini, and Kimi all support tools today.
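The formal pattern reduces to a dispatch table: the model emits a structured call, the orchestrator looks up and runs the matching function, and the result goes back into the conversation. A generic sketch of that shape — not any vendor's actual tool schema, and `search_corpus` is a stub:

```typescript
// Generic shape of a model-emitted tool request.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Registry of tools the orchestrator is willing to run.
const tools: Record<string, (args: Record<string, unknown>) => string> = {
  // Stub: a real implementation would embed the query and search.
  search_corpus: (args) => `results for: ${String(args.query)}`,
};

// Run the requested tool; the return value is fed back to the model.
function dispatch(call: ToolCall): string {
  const fn = tools[call.name];
  if (!fn) throw new Error(`unknown tool: ${call.name}`);
  return fn(call.arguments);
}
```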
This demo simulates the same pattern with plain prompts and JSON outputs. The visible structure is the same: plan a query, get results, decide what's next. The visitor sees the agent reason between calls rather than watching opaque tool-call traces.
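Simulating tool calls with plain prompts means the orchestrator must parse whatever JSON the model emits — and models often wrap JSON in markdown fences or drift from the schema. A defensive sketch (field names match the `{next_query, rationale, done}` shape in the loop above; the fence-stripping heuristic is an assumption about typical model output):

```typescript
interface Decision {
  next_query: string;
  rationale: string;
  done: boolean;
}

// Parse the model's decision output; return null on anything
// malformed so the caller can treat it as "stop" instead of crashing.
function parseDecision(raw: string): Decision | null {
  // Models frequently wrap JSON in ``` fences; strip them first.
  const cleaned = raw
    .replace(/^```(?:json)?\s*/m, "")
    .replace(/```\s*$/m, "")
    .trim();
  try {
    const obj = JSON.parse(cleaned);
    if (typeof obj.done !== "boolean") return null;
    return {
      next_query: typeof obj.next_query === "string" ? obj.next_query : "",
      rationale: typeof obj.rationale === "string" ? obj.rationale : "",
      done: obj.done,
    };
  } catch {
    return null;
  }
}
```

Failing closed (null → stop researching) keeps a malformed model reply from wedging the loop, at worst costing one iteration.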
In production you'd also add: parallel branches (multiple researchers working in parallel), critic agents (one agent reviews another's output), human-in-the-loop gates, and observability via OpenTelemetry. The Lab keeps it focused — three roles in series — to make the choreography legible.
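Of those extensions, parallel branches are the smallest step up from the serial pipeline: fan out several researcher loops with `Promise.all` and merge their notes before the Analyst runs. A sketch under the assumption that each researcher loop resolves to a notes string (`Researcher` here is a stand-in for the loop above):

```typescript
// Hypothetical: each researcher branch resolves to a notes string.
type Researcher = (question: string) => Promise<string>;

// Run all branches concurrently and merge their notes into one
// context block for the downstream Analyst.
async function parallelResearch(
  question: string,
  researchers: Researcher[],
): Promise<string> {
  const notes = await Promise.all(researchers.map(r => r(question)));
  return notes.map((n, i) => `[branch ${i + 1}]\n${n}`).join("\n\n");
}
```

A critic agent slots in the same way: one more call that receives the merged notes and emits a review before the Writer sees anything.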