Nukez
Hidden Token Tax — Part 2 / March 2026

The Gatekeeper's Spec

The companies that authored the Model Context Protocol also control the only data channel through it in consumer apps, bill per byte on that channel, and exempt their own tools from the cost.

nukez.xyz · github.com/nukez

Part 1 documented the empirical findings: a 653KB photograph, already resident on the provider's infrastructure, costs $3.26 in output tokens to transfer to an MCP server — or crashes the session entirely. The provider's own present_files tool moves the identical file to the user's browser at zero token cost. The SHA-256 corruption measurements, sandbox topology mapping, and economic analysis are published and attested on-chain.

This post examines what sits behind those findings. The question is no longer whether the bottleneck exists. The question is who built it, who profits from it, and whether the gap between first-party and third-party tool access is structural or incidental.

The Data Path

In every major LLM web and mobile application, the topology between a user-uploaded file and a connected MCP server is identical:

User's file
  ↓
Provider's servers (file uploaded from device, written to disk)
  ↓
Sandbox filesystem (/mnt/user-data/uploads/)
  ↓
  ╳ No network egress. No DNS. No outbound TCP.
  ↓
  Only exit: the LLM's output token stream
  ↓
LLM generates base64, character by character (billed as output tokens)
  ↓
Provider's MCP proxy forwards generated JSON to MCP server

There is one path. It runs through the LLM's probabilistic prediction engine. Every byte that crosses it is an output token. Every output token is billed at inference pricing.

The sandbox has no network stack. resolv.conf is empty. curl returns HTTP 000 to every domain, including the MCP server's own Cloud Run URL. Direct IP connections time out — TCP itself is blocked, not just DNS. The artifact iframe's Content Security Policy blocks every external domain except the provider's own API. There is no side channel.
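The checks described above can be reproduced with a short probe. This is a sketch of the diagnostic, not the original tooling: run inside the sandbox it reports no DNS and no TCP; run on an ordinary machine, everything passes.

```python
import socket

def probe_egress(host: str = "example.com", port: int = 443,
                 timeout: float = 3.0) -> dict:
    """Report whether DNS resolution and outbound TCP work here."""
    result = {"dns": False, "tcp": False, "resolv_conf_empty": None}
    # Is /etc/resolv.conf empty of nameserver entries?
    try:
        with open("/etc/resolv.conf") as f:
            lines = [ln for ln in f if ln.strip() and not ln.startswith("#")]
        result["resolv_conf_empty"] = len(lines) == 0
    except OSError:
        result["resolv_conf_empty"] = None  # file absent or unreadable
    # DNS: can the host be resolved at all?
    try:
        addr = socket.getaddrinfo(host, port)[0][4][0]
        result["dns"] = True
    except socket.gaierror:
        return result  # no DNS; a TCP check by name is pointless
    # TCP: does a handshake to the resolved address complete?
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            result["tcp"] = True
    except OSError:
        pass  # blocked or timed out: TCP itself is cut, not just DNS
    return result
```

The post's finding corresponds to all three signals failing at once: an empty resolv.conf, no resolution, and no handshake even to a raw IP.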

MCP tool calls function because they route through a separate backend proxy. But the proxy only forwards what the LLM generates as output tokens. It does not read files from the sandbox filesystem. The LLM's token stream is the sole data exit from the compute environment.
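The asymmetry is visible in the JSON the model must emit in each case. The argument shapes below are illustrative guesses, not the actual call signatures; `store_file` is a hypothetical third-party tool. The point is the size of what the LLM has to generate.

```python
import base64
import json
import os

file_bytes = os.urandom(100 * 1024)  # stand-in for a 100KB upload

# A reference-passing call (field names hypothetical): the payload
# carries a path, not the data, so its size is constant.
ref_call = json.dumps({
    "tool": "present_files",
    "arguments": {"paths": ["/mnt/user-data/uploads/photo.jpg"]},
})

# What a third-party MCP call must carry today: every byte of the
# file re-encoded as base64 inside the LLM's generated JSON.
inline_call = json.dumps({
    "tool": "store_file",
    "arguments": {"name": "photo.jpg",
                  "data": base64.b64encode(file_bytes).decode()},
})

print(len(ref_call))     # ~90 bytes, regardless of file size
print(len(inline_call))  # ~136,600 bytes for a 100KB file
```

Every character of the second payload is a prediction the model must make and the user must pay for.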

Who Built the Path

The Model Context Protocol was developed by Anthropic. It was published as an open standard, promoted across the industry, and subsequently adopted by OpenAI, Google, and every major platform. The proposition to developers: build tools once, connect them to any LLM.

The spec assumes a direct relationship between the MCP client and the MCP server. In local environments — Claude Desktop, Claude Code, Cursor — that assumption holds. The client runs on the same machine as the model. It reads files from disk, sends them to the server over the network. The spec works as designed.

In the web and mobile apps — the environment where the majority of users operate — the assumption breaks. A proxy sits between the LLM and the MCP server. The sandbox has no network. The only data that reaches the proxy is what the LLM produces as output tokens.

The spec says your MCP server can receive file data. The deployment environment makes it structurally impossible to send file data without routing every byte through inference-priced token generation.

The company that authored the spec also built the app. The company that built the app also designed the proxy architecture. The company that designed the proxy architecture also sets the per-token billing rate. At no point in this chain did anyone build a mechanism for MCP tool parameters to reference files on disk — despite having built precisely that mechanism for their own present_files tool, on the same platform, operating on the same filesystem.

Three Layers of Control

Developing and deploying an MCP server for consumer LLM apps requires navigating three constraints, each governed by the same set of companies that promote the protocol.

The Bottleneck

Your MCP server is spec-compliant and fully functional. In the web and mobile apps, the only data channel to your server is the LLM's output token stream. You did not design this topology. You cannot modify it. The platform controls it. Every byte that traverses it costs the user money and risks corruption above ~5,400 characters of base64. The platform's own tools bypass it entirely.

The Spec's Silence

The MCP specification does not document this constraint. It does not disclose that in proxied environments, tool parameters are the sole data channel. It does not state that sandbox compute environments lack network egress. It does not warn that binary data above 5KB will silently corrupt when passed through the token stream. A developer reads the spec, builds a tool that accepts file data, tests it in a local client where it works, and discovers in production — the environment where the paying users are — that it does not. The spec was authored by the same organization that operates that environment.
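The failure mode can be made concrete. A tool definition like the one below is valid against the spec's tool schema (`name`, `description`, `inputSchema`); the tool itself is a hypothetical example. Nothing in the definition is wrong, and nothing in the spec warns that in a proxied app every character of `data` will be generated as billed output tokens.

```python
# A spec-compliant tool definition that accepts file bytes as a
# base64 string parameter. It works in a local client. In the web
# app, the "data" field becomes the bottleneck described above.
store_tool = {
    "name": "store_file",
    "description": "Store a file in the user's locker.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "data": {"type": "string",
                     "description": "File contents, base64-encoded."},
        },
        "required": ["name", "data"],
    },
}
```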

The Marketplace Gate

Once you have built your MCP server, discovered the limitations through your own debugging, and engineered workarounds for the bottleneck, a third constraint applies: the MCP connector marketplace. The providers determine which MCP servers are listed, which receive promotion, and which are accessible from their apps. You are building for a platform that controls the distribution channel for the tools that connect to it.

Author the spec. Control the bottleneck. Gate the marketplace. Three layers, one set of operators.

What the User Sees

Nothing.

The user attaches a file. The user asks the model to store it. The operation either succeeds (small file, under 4KB of base64), runs for ten minutes through hundreds of chunked tool calls (the workaround), or crashes the session (anything over ~50KB inline). No interface element discloses:

“This operation will consume approximately 170,000 output tokens ($3.26) to transfer a file that is already on our servers. The transfer may silently corrupt the data. A zero-token path exists for this operation but is not available to third-party MCP tools.”

The user does not know the file is already on the platform's infrastructure. Does not know the model is regenerating every byte as a fresh probabilistic prediction. Does not know that present_files performs the equivalent transfer at zero cost. Does not know the session crashed because output tokens were exhausted on data transfer rather than reasoning. The user sees credits decrease and the operation fail. There is no attribution.

What the Developer Sees

You read the spec. You build your server. You test it in Claude Desktop and everything works — files upload, tools respond, the pipeline is clean. You deploy to production. You connect it to Claude.ai. A user attaches a 653KB image and calls your tool.

Session crash.

You spend a day diagnosing. The sandbox has no network. The proxy only forwards LLM-generated JSON. The fallback hint in your tool response instructed the model to base64-encode the entire file, which exceeded the output token limit. You rewrite the fallback hint. Now it instructs the model to chunk at 1KB. That works, but requires 650 sequential tool calls, each a full inference round-trip. You discover data corruption above ~5,400 characters. You add SHA-256 verification per chunk. You build a prep script. You ship a chunked upload pipeline with retry logic. Ten revisions of your MCP server, multiple deployments broken by a --set-env-vars versus --update-env-vars flag during rapid iteration.
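The chunking workaround described above can be sketched in a few lines. Field names and chunk size follow the post's description; the exact wire format is an assumption.

```python
import base64
import hashlib

def prep_chunks(data: bytes, chunk_bytes: int = 1024) -> list[dict]:
    """Split a file into chunks small enough to survive the token
    stream, each carrying a digest so the server can detect the
    silent corruption that appears on larger transfers."""
    chunks = []
    for i in range(0, len(data), chunk_bytes):
        raw = data[i:i + chunk_bytes]
        chunks.append({
            "index": i // chunk_bytes,
            "data": base64.b64encode(raw).decode(),
            "sha256": hashlib.sha256(raw).hexdigest(),
        })
    return chunks

# A 653KB file at 1KB per chunk yields 653 entries -- and each entry
# becomes one sequential tool call, i.e. one full inference round-trip.
```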

An entire day of engineering to compensate for the absence of one parameter resolution rule in the provider's proxy — a rule that already exists for their own present_files tool.

None of this appears in the spec. None of it was documented anywhere. You discovered it by crashing sessions and reading an empty resolv.conf.

The Differential Pricing

When a user uploads a file and a first-party tool (such as present_files) acts on it, the transfer occurs at the infrastructure level. The LLM generates only the short tool call, on the order of tens of tokens. The file moves without prediction, without base64, without billing on the data itself.

When the same user uploads the same file and a third-party MCP tool acts on it, the LLM must regenerate the entire file content as base64 in its output stream. For a 100KB file: approximately 34,000 output tokens. For a 1MB file: approximately 340,000 output tokens. Billed at the same rate as the model's reasoning output.

A 100KB file via a first-party tool: ~60 tokens (the instruction).

The same 100KB file via your MCP server: ~34,200 tokens (the base64 transfer).

A 570x cost differential for the same operation on the same file, determined entirely by whose tool performs the work.
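The arithmetic checks out under the post's stated assumptions: base64 inflates the byte count by 4/3, and roughly four characters fit in one output token (an approximation; real tokenizers vary on base64).

```python
def inline_transfer_tokens(file_bytes: int,
                           chars_per_token: float = 4.0) -> int:
    """Estimate output tokens to move a file inline through the
    token stream: base64 expansion, then characters per token."""
    b64_chars = -(-file_bytes * 4 // 3)  # ceil(bytes * 4/3)
    return round(b64_chars / chars_per_token)

tokens = inline_transfer_tokens(100 * 1024)
print(tokens)       # ~34,100 tokens for a 100KB file
print(tokens / 60)  # vs the ~60-token first-party call: roughly 570x
```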

The platform's tools receive infrastructure-level file access. Third-party tools receive the token stream. The user absorbs the difference.

Charitable Interpretations

There are several.

“Security decision.” Sandboxing the compute environment is defensible. Proxying MCP traffic is defensible. Neither requires routing file data through the token stream. present_files demonstrates that file data can move at the infrastructure level from the sandbox to an external destination without compromising the security boundary. A file reference resolved in the proxy before forwarding to the MCP server would preserve the air gap while eliminating the token cost. The security model does not require the bottleneck. The bottleneck exists alongside the security model.

“Engineering gap.” The proxy team and the MCP team did not coordinate on file handling. Perhaps. But the MCP spec, the proxy architecture, and present_files were all developed internally at the same organization. The gap between “we can move files without tokens” and “third-party tools can move files without tokens” sits between teams at the same company building components of the same system. At a certain point, the distinction between “we didn't coordinate” and “we didn't prioritize” becomes academic.

“It's early.” The Files API shipped in April 2025 — nearly a year ago. It supports uploads to 500MB with stable file_id references. MCP has been live in consumer apps for over a year. present_files has existed for over a year. The infrastructure to resolve file references at the proxy level has been deployed for nearly a year. The gap between that capability and its extension to MCP tool parameters is not a function of technical maturity. It is a prioritization decision that has been deferred while the token meter runs.

What Cannot Be Claimed

That the constraint is unknown. The empirical data is public. The corruption threshold, the sandbox topology, the present_files comparison, and the economic arithmetic are all reproducible. The investigation reports are stored in a Nukez locker with on-chain attestation via Switchboard oracle. The evidence chain is immutable.

That a fix is technically infeasible. present_files is the fix, pointed at a different destination. The Files API is the fix, with a parameter resolution rule in the proxy. MCP Resources already support binary data in the spec. The components exist. They are deployed. They run in the same process. They have not been connected.

That developers have not raised the issue. This post and its predecessor constitute the documentation. The methodology is published. The data is on a blockchain. And the problem is not unique to Nukez — every MCP server that handles file data in web and mobile apps encounters the same structural constraint.

The Ecosystem Question

The MCP protocol is open-source. The spec is public. Developers are invited to build against it. But the execution environment — the platform where the tools actually run for the majority of users — operates under a different set of rules than the spec describes. The spec says “build tools that handle data.” The platform routes every byte of that data through inference-priced prediction.

I built Nukez because I believe agent autonomy is a design problem that gets solved by infrastructure, not by larger models. This investigation demonstrated something related: the infrastructure that would close the gap between first-party and third-party tool capabilities already exists on the platforms. It has existed for a year. The decision not to connect it is not a technical limitation. It is a business posture.

If you are building an MCP server and encountering the same constraints — sandbox cannot reach the network, base64 corrupts above 5KB, large files crash sessions — the problem is not in your implementation. It is in the path. And the organizations that control the path are the same ones that invited you to build for it.

The diagnostic methodology is published. The present_files comparison is reproducible by anyone with a Claude.ai session and a connected MCP server. Attach a 500KB file. Ask your MCP tool to store it. Observe the result. Then call present_files on the same file and watch it arrive in the browser, instantly, at no token cost. Draw your own conclusions.

The LLM providers' control over ingress and egress paths from mobile and web applications to MCP servers appears trivial. A proxy here, a sandbox restriction there. It is not trivial. This is precisely where monopolies form — at chokepoints so narrow and so technical that they escape scrutiny while determining who can build what, at what cost, for whom. The obscurity is what makes them dangerous. Because the constraint looks like an implementation detail rather than a market structure decision, it attracts no regulatory attention, no developer outcry, and no competitive pressure. It just sits there, quietly taxing every byte.

The significance of this chokepoint compounds when paired with the providers' control of their proprietary MCP connector marketplaces — the app stores that determine which third-party tools users can discover and connect. Write the spec, control the data path, gate the distribution. Individually, each layer looks like a reasonable product decision. Together, they constitute vertical control over an ecosystem that was promoted as open.

And the most corrosive element is this: the MCP specification was published by its founding members as an ideological proposition — an open protocol, a universal standard, an invitation to build. Developers accepted the invitation, invested engineering effort against the spec's promises, and discovered in production that the platforms do not honor those promises in the environments where their users actually are. The spec describes a world of direct connections and unrestricted data transfer. The apps deliver a world of proxied bottlenecks and token-metered file access. The distance between those two descriptions is not a gap. It is a bait and switch. And the longer it goes unaddressed, the harder it becomes to interpret as anything other than deliberate.
