GenAI economics behave differently from cloud-compute economics, and most FinOps frameworks were built for the latter. Drag the sliders and watch token economics, caching, and model routing shape the bill. The numbers update as fast as you can move the controls. When you're done, optionally generate an LLM-written executive summary with action items.
Defaults reflect public Q1 2026 list pricing for OpenAI/Anthropic frontier models, Kimi K2 / Claude Haiku-class workhorses, and self-hosted Llama-3-class open-weight models on a managed inference stack at ~60% utilization.
| | 10% frontier | 30% frontier | 60% frontier |
|---|---|---|---|
| 0% cache | — | — | — |
| 30% cache | — | — | — |
| 60% cache | — | — | — |
For each model tier t in {frontier, workhorse, open}:
tier_calls[t] = monthly_volume × mix[t]
per_call_cost[t] = (input_tokens × rate_in[t] + output_tokens × rate_out[t]) / 1,000,000
tier_monthly_cost[t] = tier_calls[t] × per_call_cost[t] × (1 − cache_hit_rate)
monthly_total = Σ_t tier_monthly_cost[t]
annual_total = monthly_total × 12
cache_savings_per_year = annual_total × cache_hit_rate / (1 − cache_hit_rate)
(i.e. what you'd be paying if cache were disabled, minus what you pay now)
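The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual source: the `TierRate` type, the function names, and every rate in `RATES` are placeholders invented for the example (they are not the tool's defaults). As in the formulas, rates are in dollars per 1M tokens and a cache hit is treated as costing nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierRate:
    rate_in: float   # USD per 1M input tokens
    rate_out: float  # USD per 1M output tokens


# Placeholder rates for illustration only; NOT the calculator's defaults.
RATES = {
    "frontier": TierRate(rate_in=10.0, rate_out=30.0),
    "workhorse": TierRate(rate_in=1.0, rate_out=3.0),
    "open": TierRate(rate_in=0.2, rate_out=0.6),
}


def monthly_total(monthly_volume, mix, input_tokens, output_tokens,
                  cache_hit_rate, rates=RATES):
    """Sum tier_monthly_cost[t] over all tiers, as in the formulas above."""
    total = 0.0
    for tier, share in mix.items():
        r = rates[tier]
        per_call = (input_tokens * r.rate_in + output_tokens * r.rate_out) / 1_000_000
        tier_calls = monthly_volume * share
        total += tier_calls * per_call * (1 - cache_hit_rate)
    return total


def cache_savings_per_year(annual_total, cache_hit_rate):
    """Cost with caching disabled minus the current cost."""
    if cache_hit_rate >= 1:
        raise ValueError("savings are undefined at a 100% cache hit rate")
    return annual_total * cache_hit_rate / (1 - cache_hit_rate)


# Example scenario (all inputs hypothetical).
mix = {"frontier": 0.3, "workhorse": 0.5, "open": 0.2}
monthly = monthly_total(1_000_000, mix, input_tokens=2_000,
                        output_tokens=500, cache_hit_rate=0.3)
annual = monthly * 12
savings = cache_savings_per_year(annual, 0.3)

print(f"monthly: ${monthly:,.0f}")   # monthly: $8,673
print(f"annual:  ${annual:,.0f}")    # annual:  $104,076
print(f"savings: ${savings:,.0f}")   # savings: $44,604
```

With these placeholder inputs, the uncached annual bill would be $148,680, so the $44,604 savings figure is exactly the gap between the uncached and cached totals.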
break_even_volume:
self_host_monthly = $2,400/GPU × gpu_count
(assuming a single A100/H100-class GPU at ~50,000 throughput-tokens/sec × 60% utilization ≈ 78B tokens/month over a 30-day month)
solve for monthly_volume such that cloud_per_call × monthly_volume = self_host_monthly
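The break-even calculation can be sketched as follows. This is a rough illustration under stated assumptions: `cloud_per_call` is taken to be the blended per-call cloud cost in dollars (the text does not define it further), the function names are invented for the example, and a month is assumed to be 30 days. The $2,400 per GPU, 50,000 tokens/sec, and 60% utilization figures come from the text above.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def self_host_monthly(gpu_count, cost_per_gpu=2_400.0):
    """Fixed monthly cost of the self-hosted fleet, in USD."""
    return cost_per_gpu * gpu_count


def gpu_monthly_capacity_tokens(tokens_per_sec=50_000, utilization=0.60):
    """Tokens one GPU can serve per month at the given utilization."""
    return tokens_per_sec * utilization * SECONDS_PER_MONTH


def break_even_volume(cloud_per_call, gpu_count):
    """Monthly call volume at which cloud spend equals self-hosting cost."""
    if cloud_per_call <= 0:
        raise ValueError("cloud_per_call must be positive")
    return self_host_monthly(gpu_count) / cloud_per_call


# Hypothetical blended cloud cost of $0.01 per call, one GPU.
volume = break_even_volume(cloud_per_call=0.01, gpu_count=1)
capacity = gpu_monthly_capacity_tokens()

print(f"break-even: {volume:,.0f} calls/month")          # break-even: 240,000 calls/month
print(f"capacity:   {capacity / 1e9:.1f}B tokens/month")  # capacity:   77.8B tokens/month
```

Below the break-even volume the cloud API is cheaper; above it, self-hosting wins as long as the total token load still fits within the GPU capacity computed above.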
All math runs in the browser; the calculator itself makes no upstream model call. The LLM is called only when you press "Generate FinOps recommendations": it receives the current numbers and writes a paragraph or two.
This is a planning tool, not a billing system. Real bills also depend on factors it does not model: long-context price tiers, output-token rate variations on some providers, dedicated-throughput discounts, prompt-caching credits (Anthropic, OpenAI), batch-API discounts (~50%), egress, and observability and gateway costs.
Use the numbers here to scope conversations with vendors and finance, not to commit to a number. The model rate inputs are editable, so you can override them with your actual contract pricing.