Frontier labs won't build good harnesses. Their incentives won't let them.

Here's the problem nobody at Anthropic or OpenAI wants to say out loud: you can't build a token-efficient harness when your entire business model depends on burning tokens.

Anthropic just committed $50B to data centers with Fluidstack. OpenAI's Stargate is past $400B in committed investment and climbing. Total hyperscaler AI capex is on pace for roughly $700B this year. Dario literally said there's "no hedge on earth" against overbuying compute. That capital has to be fed. Every GPU in Texas and New York needs tokens flowing through it to justify the balance sheet.

Now ask yourself what a "good" harness actually looks like. A good harness uses the fewest tokens possible to complete a task. That's it. That's the whole job. The harness is the scaffolding around the model: context management, tool calls, planning loops, memory. If two harnesses get the same job done and one uses 5x fewer tokens, the efficient one is better. Period.

The harness matters more than the model at this point. Same Sonnet underneath Claude Code, Cursor, Cline, and a dozen no-name CLIs, and they feel like completely different products: different token burn, different success rates, different cost per task. The model is the engine. The harness is the entire car. Once you accept that, the question of who builds the best car becomes a question about who's actually trying to make it fuel-efficient. And that's where the math gets ugly for the labs.

This isn't bad engineering

Anthropic's engineers are excellent. This is a story about which bugs get prioritized and which don't. When an independent harness-maker finds token waste, they ship the fix that night — their entire competitive position depends on it. When a frontier lab finds token waste, it's a P2 that loses every sprint to a feature that drives more API calls.

I'm not accusing anyone of bad faith. I'm saying incentives bend engineering decisions without anyone noticing, and over a year those bends compound into 1.25M-token-per-session overhead that nobody at the top of the org chart has a reason to care about.

A pure harness-maker wakes up every morning trying to cut token usage. A frontier lab wakes up every morning trying to fill a gigawatt of Blackwell. These are opposite jobs.

The subsidy is the trick

Here's the part that makes the whole thing work, for now. None of those wasted tokens show up on any user's bill in a way they can see. Pro is $20/month. Max is $100 or $200/month. You pay flat, you get throttled, you never see the backend cost of any individual run. The inefficiency is invisible because the pricing model hides it.

What's actually happening underneath is that Anthropic and OpenAI are eating their own bloat with margin. Adjusted gross margins at both labs are reportedly in the 30-40% range. Respectable, but that number exists partly because the labs are absorbing token waste a lean harness wouldn't generate. The subsidy runs in two directions at once: it hides the harness inefficiency from the customer, and it quietly pressures the lab's P&L into making the inefficiency worse, because the fastest way to paper over a margin squeeze is to drive more usage onto data centers you already bought.

The Max users hitting their caps in 90 minutes are noticing the symptom without seeing the cause. They think they're paying for a seat. They're actually paying for a lossy abstraction over compute that nobody upstream is motivated to make efficient. In March 2026, $200/month Max subscribers were reporting 5-hour usage windows vanishing in 90 minutes. That's the subsidy leaking through. The margin can't hide a 4x regression.

The day enterprises start pricing coding agents by task instead of by seat, the whole thing breaks. On that day the token-efficient harness wins by default. Not because it's smarter, but because it's cheaper per unit of real work done. And weirdly, that's also the day open-source catches up instantly. Strip the subsidy and GLM or Qwen running inside a well-built harness beats Opus running inside Claude Code on price-performance. I've been saying this for a while.
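The per-task arithmetic is easy to run yourself. A minimal sketch, with the caveat that every number here is an illustrative assumption (token counts per task and per-million-token prices are made up to show the shape of the math, not measured figures):

```python
# Hypothetical per-task cost comparison. All numbers are illustrative
# assumptions: token volume per task and price per million tokens are
# invented to show why per-task pricing flips the economics.

def cost_per_task(tokens_per_task: int, price_per_mtok: float) -> float:
    """Dollar cost of one task at a given price per million tokens."""
    return tokens_per_task / 1_000_000 * price_per_mtok

# A bloated harness on a premium frontier model (assumed numbers).
bloated = cost_per_task(tokens_per_task=2_000_000, price_per_mtok=15.0)

# A lean harness using 5x fewer tokens on a cheaper open model.
lean = cost_per_task(tokens_per_task=400_000, price_per_mtok=3.0)

print(f"bloated harness: ${bloated:.2f}/task")    # $30.00/task
print(f"lean harness:    ${lean:.2f}/task")       # $1.20/task
print(f"advantage:       {bloated / lean:.0f}x")  # 25x
```

Under flat-rate seats, that gap is invisible. Priced per task, it's the whole purchase decision.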

We built our own for a reason

We built our own native harness at pre.dev after Claude Code SDK failed on long-running tasks. The second you're not trying to maximize tokens, you find insane amounts of waste. Pre-processing files before they hit context. Shorthand for recurring entities. Skills loading on demand instead of at session start. Constraints inside the loop. Deduping context injections the default tooling fires off fifty times a session.

None of this is hard engineering. It's the kind of work you only bother to do when your incentives line up with it.

The counterargument the labs will make: "we have the best models, we understand them best, obviously we'll build the best harness." Maybe. But understanding the model is table stakes. Harness engineering is a different discipline: systems work, constraint design, saying no to every extra token. Ryan Lopopolo at OpenAI Frontier clearly gets it, which is why he talks about token-efficient tools and one-minute inner loops. But Frontier is one team inside a company whose revenue line goes up when that one-minute loop becomes a ten-minute loop. Good luck.

Frontier labs are racing into harness-building because they see agent harnesses eating the value chain. Martin Fowler called it "harness engineering." Aakash Gupta wrote "2025 was agents, 2026 is agent harnesses." They know. But they're running uphill against their own P&L. Every good harness decision they make hurts unit economics. Every bad harness decision makes their investors nervous about the $700B capex bill.

You don't have to pick a winner yet. You just have to notice that the guy building the harness and the guy paying for the data center can't be the same guy.
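To make the "none of this is hard" point concrete, here is a minimal sketch of one item on that list, deduping repeated context injections. The class and names are hypothetical, not pre.dev's actual implementation; the point is that a content-hash gate is roughly all it takes:

```python
# Minimal sketch of deduping context injections (hypothetical names;
# not any real harness's API). A content hash gates each injection so
# identical blocks the tooling re-sends never hit the context twice.

import hashlib

class ContextWindow:
    def __init__(self) -> None:
        self.blocks: list[str] = []
        self._seen: set[str] = set()

    def inject(self, block: str) -> bool:
        """Add a context block unless an identical one is already present.

        Returns True if added, False if it was a duplicate.
        """
        digest = hashlib.sha256(block.encode()).hexdigest()
        if digest in self._seen:
            return False  # naive tooling would re-send this every time
        self._seen.add(digest)
        self.blocks.append(block)
        return True

ctx = ContextWindow()
boilerplate = "You are a coding agent. Repo layout: ..."
for _ in range(50):        # tooling fires the same injection 50 times
    ctx.inject(boilerplate)

print(len(ctx.blocks))  # 1 block kept instead of 50
```

One hash set, one if-statement, and a 50x reduction on that one waste source. That's the asymmetry: the fix is trivial, so the only variable is whether anyone is paid to care.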