The capstone of the AI Lab. Describe a strategic decision in plain prose — build vs buy, market entry, vendor consolidation, M&A. Unlike every prior demo, no structured criteria are given; the model designs the analytical framework itself, weights each criterion with a rationale, scores each option, runs a sensitivity analysis to find which assumptions, if changed, would flip the recommendation, and returns a confidence-bounded recommendation with explicit upper- and lower-bound outcomes. The first demo where the model is the strategy consultant, not the spreadsheet.
Browser
  └─→ POST /api/lab/chat
        - system: option-weighing prompt with the schema below (versioned: option.v1)
        - user: the decision prose, verbatim
        - temperature: 0.3 (slightly higher — the criteria framework benefits
          from some divergent thinking; the math stays stable)
        - max_tokens: 3200
  ← single response, parsed as JSON: {
        decision_summary, category, decision_horizon,
        options[], criteria[], scores[], totals[],
        sensitivity[], recommendation
      }
  ← 6 rendered sections in executive read order:
      summary → recommendation → ranked options → DERIVED criteria framework →
      scored matrix → sensitivity analysis
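For concreteness, a minimal sketch of that single round trip from the browser. The endpoint, temperature, and max_tokens come straight from the diagram above; the request body field names are assumptions for illustration, not the demo's actual contract.

```typescript
// Sketch of the single call described above (browser fetch).
// Endpoint, temperature, and max_tokens are from the flow diagram;
// the body field names are assumed.
async function weighOptions(decisionProse: string): Promise<unknown> {
  const res = await fetch("/api/lab/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      system: "option.v1",   // versioned option-weighing prompt (assumed field name)
      user: decisionProse,   // the decision prose, verbatim
      temperature: 0.3,      // slightly higher: divergent framework design, stable math
      max_tokens: 3200,
    }),
  });
  if (!res.ok) throw new Error(`lab/chat failed: ${res.status}`);
  return res.json();         // single JSON response, rendered as 6 sections
}
```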
One LLM call. The architectural twist: the model
designs the criteria framework rather than receiving
it. Each criterion comes with a weight (1-5) AND a rationale
for that weight, surfaced in the criteria section so the
executive can sanity-check the lens before trusting the
scores. The sensitivity field is the
consultant's discipline: which assumptions, if they shifted,
would change the answer? Items are ranked by criticality. The
recommendation includes both a lower-bound
outcome (what's the worst case if you pick this option?)
and an upper-bound outcome (best case), plus the single
most decisive factor.
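Read as a type, the option.v1 payload might look like the sketch below. The top-level field names come from the response list in the diagram; the nested shapes (the weight range, the bounds and decisive factor on the recommendation) are inferred from the prose and should be treated as an approximation, not the exact schema.

```typescript
// Approximate shape of the option.v1 response, inferred from the field list
// and the surrounding prose. Nested property names are assumptions.
interface Criterion {
  name: string;
  weight: number;       // 1-5, chosen by the model
  rationale: string;    // why this weight, shown in the criteria section
}

interface SensitivityItem {
  assumption: string;   // what would have to shift
  effect: string;       // how the answer would change if it did
  criticality: number;  // used to rank items, most decisive first
}

interface Recommendation {
  option: string;
  decisive_factor: string;     // the single most decisive factor
  lower_bound_outcome: string; // worst case if you pick this option
  upper_bound_outcome: string; // best case
  confidence_band: string;     // how clean the model thinks the answer is
}

interface OptionV1 {
  decision_summary: string;
  category: string;
  decision_horizon: string;
  options: string[];
  criteria: Criterion[];
  scores: number[][];          // assumed: rows = options, columns = criteria
  totals: number[];            // weighted total per option
  sensitivity: SensitivityItem[];
  recommendation: Recommendation;
}
```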
Every strategy team I've worked with builds the same artifact for every major decision: a framework, a weighted score-card, an option ranking, a sensitivity analysis, a recommendation memo. McKinsey, BCG, Bain — their entire consulting playbook is built around standardizing the production of these artifacts. A senior associate spends two weeks on this for each engagement.
This demo collapses the first draft. The model designs the framework, runs the math, drafts the sensitivity, and writes the recommendation. The executive team still does the deciding — but they start from a populated artifact, not a blank one. If the framework is wrong, you know within 30 seconds because the criteria are right there at the top with rationales. Edit the prose, re-run.
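Because the scoring math is simple, a client can also recompute the totals from the model's own weights and scores and flag any drift before the matrix is rendered. A sketch, assuming the scores matrix is options × criteria and the totals are plain weighted sums:

```typescript
// Recompute weighted totals from the model's criteria and scores and compare
// them to the reported totals. Assumes scores[i][j] is option i scored against
// criterion j; a mismatch means the matrix and the totals disagree and the
// artifact should be re-run before anyone trusts the ranking.
function checkTotals(
  criteria: { weight: number }[],
  scores: number[][],
  totals: number[],
  tolerance = 1e-6,
): boolean {
  return scores.every((row, i) => {
    const recomputed = row.reduce((sum, s, j) => sum + s * criteria[j].weight, 0);
    return Math.abs(recomputed - totals[i]) <= tolerance;
  });
}
```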
For internal strategy teams, this is the new "blank PowerPoint" replacement. For board members, it's a way to interrogate a recommendation against an alternative framing. For consultants, it's a tool for accelerating the first 60% of an engagement.
Honest caveat:
the model is a fluent generalist, not a domain expert.
It can produce a polished framework that misses the one
criterion that actually matters in your industry. The
sensitivity analysis is the model's read of which
assumptions are critical — but the model doesn't know
your private context (the COO who hates Salesforce, the
board commitment that wasn't in the prose). Treat the
output as a fast first draft a smart generalist
consultant would produce; use the framework to surface
disagreements, not to settle them. The recommendation's
confidence_band is the model's honest
assessment of how clean the answer is.
Run the demo to see telemetry.