Document Intelligence

The highest-volume real-world enterprise GenAI use case: turn an unstructured document into structured output. Paste a contract, an insurance claim, or a vendor proposal — one call returns the doc type, an executive summary, color-coded entities (people, organizations, dates, money, obligations), and risk flags with severity and rationale. Hover anything in the right pane to flash its match in the document.

Architecture — what just happened
Browser
  └─→ POST /api/lab/chat
        - system: structured-output prompt with the schema below
        - user: the document text
        - temperature: 0.2 (deterministic-ish for extraction)
        - max_tokens: 2048
        ← single response, parsed as JSON
        ← entity / risk spans located in the original text by exact
           verbatim match (model returns the literal substring, not
           character offsets — offsets are unreliable, substrings are
           always re-locatable)
        ← rendered inline with color-coded backgrounds and underlines
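The diagram above can be sketched as a single round trip. A minimal Python sketch, assuming a plain JSON-over-HTTP endpoint; the payload field names and the system-prompt text are illustrative assumptions, not the Lab's actual wire format:

```python
import json
import urllib.request

# Hypothetical system prompt; the real one embeds the full doc.v1 schema.
SYSTEM_PROMPT = (
    "You are a document-intelligence extractor. "
    "Return ONLY valid JSON matching the doc.v1 schema."
)

def build_payload(document_text: str) -> dict:
    """The exact knobs from the diagram: low temperature, capped tokens."""
    return {
        "system": SYSTEM_PROMPT,
        "user": document_text,
        "temperature": 0.2,   # deterministic-ish for extraction
        "max_tokens": 2048,
    }

def analyze(document_text: str, endpoint: str) -> dict:
    """One LLM call; the single response body is parsed as JSON."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(build_payload(document_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Everything downstream (entity highlighting, risk rendering) hangs off that one parsed response.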

One LLM call. The clever bit isn't the call — it's asking the model for verbatim substrings instead of character offsets. Models hallucinate offsets confidently and constantly; substrings are self-verifying because we re-locate them in the source.
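Re-locating a verbatim substring is just a find plus a defensive fallback. A sketch, assuming the model returns the quoted text in a field of its own (the whitespace-collapsing fallback is an assumption about a common model quirk, not part of the Lab's description):

```python
from typing import Optional

def locate(source: str, span: str) -> Optional[tuple]:
    """Find the literal substring the model returned. None means the model
    paraphrased instead of quoting, and the span is dropped, not guessed."""
    start = source.find(span)
    if start == -1:
        # Tolerate whitespace drift: collapse runs of whitespace and retry.
        collapsed = " ".join(span.split())
        start = source.find(collapsed)
        if start == -1:
            return None
        span = collapsed
    return (start, start + len(span))
```

This is the self-verifying property: a hallucinated span fails `find` and is discarded, whereas a hallucinated character offset silently highlights the wrong text.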

The schema (versioned doc.v1)
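The schema loads at runtime, but its shape follows from the description above: doc type, summary, entities, risk flags. A hypothetical sketch of a conforming doc.v1 payload and a cheap structural check — every field name here is an assumption, not the real schema:

```python
# Hypothetical doc.v1 result; field names are illustrative only.
example = {
    "schema_version": "doc.v1",
    "doc_type": "contract",
    "summary": "Executive summary text…",
    "entities": [
        # "text" carries the verbatim substring, never a character offset
        {"type": "money", "text": "$1.2M", "confidence": 0.93},
    ],
    "risk_flags": [
        {"severity": "high", "text": "auto-renewal clause", "rationale": "…"},
    ],
}

REQUIRED = {"schema_version", "doc_type", "summary", "entities", "risk_flags"}

def is_valid(doc: dict) -> bool:
    """Structural gate before rendering; production would use full JSON Schema."""
    return REQUIRED <= doc.keys() and doc.get("schema_version") == "doc.v1"
```

Versioning the schema name (`doc.v1`) lets the renderer reject payloads from a stale prompt instead of mis-rendering them.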
Telemetry — request, response, parsed JSON


Honest caveat

This is a teaching demo. A production document-intelligence system would add:

  • OCR for scanned PDFs and images (the demo only handles text input)
  • A specialized extraction model fine-tuned on the document domain
  • Confidence calibration via held-out test sets, not just self-reported scores
  • Schema validation against canonical taxonomies (NAICS, contract clauses, ICD-10)
  • Human-in-the-loop review for edge cases and below-threshold confidence
  • Audit trail of every extraction with model version + prompt version
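The last two bullets compose naturally: route each extraction by confidence and stamp it with provenance. A sketch, assuming a per-extraction confidence field; the threshold and version identifiers are placeholders, not real values:

```python
def triage(extraction: dict, threshold: float = 0.85) -> dict:
    """Attach a review route and an audit stub to one extraction."""
    needs_review = extraction.get("confidence", 0.0) < threshold
    return {
        **extraction,
        "route": "human_review" if needs_review else "auto_accept",
        # Illustrative provenance IDs; a real system records actual versions.
        "audit": {"model_version": "model-vX", "prompt_version": "doc.v1"},
    }
```

A missing confidence defaults to 0.0 and therefore routes to review — failing closed is the safer default for extraction pipelines.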

The Lab demonstrates the shape of the capability — the structured output, the inline rendering, the confidence scores. Real enterprise deployments take 3-6 months to harden because the long tail of edge cases is where the value (and the risk) lives.