Thirty seconds for work that used to take a product team a week. Paste 30–80 customer comments — NPS verbatims, claim-experience surveys, return reasons. One structured-output call clusters them into 5–8 themes with sentiment, severity, illustrative verbatim quotes, and a concrete recommended action per theme. The same shape a product manager would produce in a workshop, generated as fast as you can pick a preset.
Browser
└─→ POST /api/lab/chat
- system: clustering prompt with the schema below
(versioned: voc.v1)
- user: the comments, one per line
- temperature: 0.3 (low — clustering should be stable)
- max_tokens: 2048
← single response, parsed as JSON: { summary, themes[] }
← rendered as a theme grid with sentiment-colored spines,
   verbatim quote pull-outs, and recommended actions
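Concretely, the round trip could look like the sketch below. This is a minimal TypeScript sketch: the endpoint, the generation parameters, and the top-level { summary, themes[] } shape come straight from the flow above, while the per-theme field names (label, sentiment, severity, quotes, recommendation) are assumptions that mirror the prose, not the actual voc.v1 schema.

```ts
// Sketch of the single structured-output call. Per-theme field names
// are assumed from the prose; only the endpoint, parameters, and the
// top-level { summary, themes[] } shape come from the flow diagram.

type Sentiment = "positive" | "neutral" | "negative"; // assumed enum

interface Theme {
  label: string;          // short theme name
  sentiment: Sentiment;
  severity: number;       // assumed 1–5 scale
  quotes: string[];       // verbatim substrings of the input comments
  recommendation: string; // one concrete action per theme
}

interface VocResult {
  summary: string;
  themes: Theme[];
}

async function clusterComments(comments: string[]): Promise<VocResult> {
  const res = await fetch("/api/lab/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      system: "voc.v1",          // versioned clustering prompt (assumed to resolve server-side)
      user: comments.join("\n"), // one comment per line
      temperature: 0.3,          // low: clustering should be stable across runs
      max_tokens: 2048,
    }),
  });
  if (!res.ok) throw new Error(`lab/chat failed: ${res.status}`);
  return (await res.json()) as VocResult; // single response, parsed as JSON
}
```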
One LLM call. The clustering, sentiment classification, severity scoring, quote selection, and recommendation drafting all happen in the same pass — the model fits the structure and emits it as JSON. Quotes are required to be verbatim substrings; we don't re-locate them in the input (unlike Document Intelligence) because comments are independent rows, not a continuous document.
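Because quotes must be verbatim substrings, the client can enforce that requirement cheaply after parsing. A sketch, reusing the assumed Theme shape from above; this helper is illustrative, not part of the demo's actual code:

```ts
// Enforce the verbatim-substring requirement after parsing:
// keep a quote only if it appears, character for character,
// inside at least one input comment.
function dropNonVerbatimQuotes(themes: Theme[], comments: string[]): Theme[] {
  return themes.map((theme) => ({
    ...theme,
    quotes: theme.quotes.filter((q) =>
      comments.some((c) => c.includes(q))
    ),
  }));
}
```

Exact substring matching is deliberately strict: a paraphrased quote fails the check, which is the point of the verbatim requirement.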
Every product team has a feedback backlog: NPS comments, support tickets, customer-interview notes, return reasons, claim-experience surveys. Someone is supposed to read them all and pull out themes. Most teams either (a) don't, or (b) hire a researcher for two weeks per analysis cycle.
This demo collapses that into 30 seconds. It doesn't replace the human judgement — the product manager still decides which theme to act on first, and which recommendation is viable. But the clustering, the verbatim quote pulls, and the sentiment scoring all happen up front, so the human starts the conversation already informed.
Honest caveat: the model isn't doing real statistical clustering. It's reading the comments and grouping them by perceived theme — which is exactly what a human researcher does, and like a human, it can be wrong about edge cases. For a real production deployment you'd run this against a sample, have a human review the themes, and use those human-labeled themes as the canonical taxonomy going forward.
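The last step of that pipeline can be mechanical. A sketch under the same assumed types: validate each run's theme labels against the human-approved set, and (not shown here) bake that set into the next schema version as an enum so the model classifies into it rather than inventing labels. The canonical labels below are made-up examples.

```ts
// Hypothetical follow-up: once humans have approved a taxonomy,
// reject any theme label outside it. The labels are examples only.
const CANONICAL_THEMES = [
  "Slow claim turnaround",
  "Confusing policy language",
  "Positive agent interactions",
] as const;

async function classifyIntoTaxonomy(comments: string[]): Promise<VocResult> {
  const result = await clusterComments(comments);
  const allowed = new Set<string>(CANONICAL_THEMES);
  return {
    ...result,
    themes: result.themes.filter((t) => allowed.has(t.label)),
  };
}
```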
Synthesize a set to see telemetry.