The Pre-IPO Tax
Why I stopped paying the Big 3 — and what’s running on my laptop instead
Last week Anthropic’s pre-IPO valuation crossed a trillion dollars on secondary markets. OpenAI’s CFO walked back the IPO timeline and admitted the org isn’t ready. Google spent the year shoveling Gemini into every product with a cursor in it. Behind all of it is a story most builders aren’t saying out loud to each other:
The line items on your monthly API bill aren’t engineering economics. They’re pre-IPO economics.
Every token you push through Sonnet, GPT-5, or Gemini 2.5 Pro is helping pay down GPU debt and underwriting a public-offering narrative. That isn’t a moral complaint — it’s a pricing reality. And once you see it, you start asking the only question that actually matters:
Does the work require what I’m paying for?
For me, increasingly, the answer is no. Here’s what I’m doing about it.
The uncomfortable truth I owe you up front
Everything has converged.
Sonnet 4.6, GPT-5.5, MiniMax M2.7, Kimi K2.6, Qwen3 Coder, DeepSeek V3 — for the 80% of work hands-on builders do every day, they all feel the same. There are still edges where Claude is genuinely better at nuanced refactor work and Gemini is genuinely better at long-context document reasoning. But the gap that justifies a 10x cost difference? I haven’t found it in my own daily loop.
The frontier labs spent two years convincing us that the model was the product. The market is now telling us the model is the commodity, and the agent harness — the CLI, the skills, the MCPs, the team’s accumulated conventions — is the actual product.
That reframing changes everything about where the money should go.
The math is no longer subtle
The Big 3 today, per million tokens (input / output):
Claude Sonnet 4.6 — $3.00 / $15.00
Claude Opus — $5.00 / $25.00
GPT-5.4 — $2.50 / $15.00
Gemini 2.5 Pro — $1.25 / output varies
Now here’s what the bottom of the market looks like:
MiniMax M2.7 — $0.30 / $1.20
DeepSeek V3 — ~$0.25 / $0.38
Qwen3 Coder Plus — $0.65 / $3.25
Kimi K2.6 — $0.60 / $2.50 (with cache hits at 75% off)
That’s not 20% cheaper. That’s an order of magnitude on output, which is where coding agents actually spend money. A long agent loop that runs me $40 on Sonnet runs about $3 on MiniMax M2.7. The diff between the two outputs is, on most days, taste.
The Big 3 don’t lose this comparison on capability. They lose it on price-per-utility, which is the only number that matters when you have a budget.
The CLI is already portable. Most builders haven’t noticed yet.
The lock-in argument used to be the prompt library, the MCP stack, the painstakingly tuned CLAUDE.md, the sub-agent definitions, the skill folder. That’s the muscle memory Anthropic and OpenAI are betting you won’t migrate.
You don’t have to migrate it. You can keep all of it.
OpenCode is a Go-based terminal coding agent that reads your CLAUDE.md files natively, supports MCPs, runs sub-agents, and connects to any provider — Anthropic, OpenAI, Google, OpenRouter, local Ollama, or the Chinese frontier labs direct. The base tier is free. OpenCode Go is $5 the first month, $10/mo after with generous limits on curated open-source models. OpenCode Zen is pay-as-you-go for builders who want a managed bench of tested coding models.
Pi (the badlogic/pi-mono project by Mario Zechner) is the minimalist twin. Four tools out of the box — read, write, edit, bash — and a deliberately tiny system prompt that makes it cheap to run on small models. It honors AGENTS.md and CLAUDE.md, supports skills, and switches providers mid-session with Ctrl+L. The whole thing is open source. You pay only the model API.
Here’s the part that should land: everything you’ve invested in Claude Code ports. The skills you wrote, the CLAUDE.md conventions you tuned, the MCP servers your team stood up — they all run inside OpenCode and Pi against whatever model you point them at. Your investment isn’t in Anthropic. Your investment is in the agentic CLI pattern, and that pattern is now genuinely portable.
Migration cost: about an evening. The most expensive part is your inertia.
The four models worth your attention
I’m typing this on a stack that’s been live for two months, so I’ll keep this honest.
MiniMax M2.7 (released March 2026) is what I run daily. 205K context, $0.30 / $1.20 pricing, and a temperament that’s good at long agent loops without going off the rails. It’s the closest thing I’ve found to a Sonnet-feel at MiniMax pricing. Most days, it’s all I need.
Kimi K2.6 from Moonshot AI is the long-horizon specialist. $0.60 / $2.50, with automatic context caching that drops repeat-work input costs by 75%. Built for multi-agent orchestration and end-to-end coding across Python, Rust, Go. If your work involves long-running autonomous loops, Kimi is the one to test first. The OpenAI SDK is a drop-in replacement — you literally change the endpoint URL.
Qwen3 Coder from Alibaba is the open-weight workhorse. The 480B variant is free if you self-host. The proprietary Qwen3 Coder Plus runs $0.65 / $3.25. Strongest single-shot code generation in this group, in my testing — particularly on multi-file changes.
DeepSeek V3 continues to be the low-cost frontier reference point. Around $0.25 / $0.38 makes it the cheapest credible model on this list, and the 131K context window covers most real work. R1 (the reasoning variant) at $0.55 / $2.00 is the budget alternative to o1 and Opus extended thinking.
Four labs, four flavors. None of them are charging you to underwrite a public offering.
My stack, and what doesn’t port
I’m running OpenCode with MiniMax M2.7. Same AGENTS.md I’ve been refining for a year. Same MCPs. Same skill folder. Same workflow. The only thing that changed is the invoice.
A few things that don’t port cleanly yet, because I’d rather be useful than evangelical:
Rate-limit personality. The Big 3 have stable, well-documented quotas. The Chinese frontier labs and aggregators are still maturing here. Build in retry logic and a fallback model.
Tool-call fidelity at the edges. Sonnet still has the most polished tool-use behavior under stress. If your agent is making 30+ tool calls per turn on critical paths, you’ll feel the difference.
Long-context document reasoning over 100K tokens. Gemini still wins. If your work lives in giant PDFs, keep a Gemini key warm.
For everything else? My bill went down by roughly 90% and the work didn’t.
What to do Tonight
Three moves, in order:
First, install OpenCode or Pi tonight. Point it at your existing CLAUDE.md and one of your MCP servers. Watch it work. The migration cost is genuinely about an evening.
Second, run a real ticket — not a benchmark, an actual piece of work — through MiniMax M2.7 or DeepSeek V3. Compare what comes out to what Sonnet would have produced. Be honest with yourself about whether the difference is worth 10x.
Third — and this is the move most teams skip — read your last month’s API bill out loud to whoever owns your budget. That conversation tends to be productive.
I’m not telling you to abandon Anthropic. I still run Claude for the hardest 10% of my work and I’ll keep doing it as long as it earns its place. What I’m telling you is that paying frontier prices for every token, when 90% of your tokens are doing routine agentic plumbing, isn’t a technical decision anymore. It’s a tax on inattention.
The Big 3 are going public. Your stack doesn’t have to subscribe to the IPO.
Running something different that’s working well? I’d genuinely like to know — drop it in the comments. The whole point of an unbundled stack is that no two builders need to land in the same place.


