The Hidden Token Tax
LLM platforms force binary data through probabilistic prediction, charge per token on both ends, and already possess the infrastructure to make it cost zero. An empirical investigation with SHA-256 verification, sandbox topology mapping, and on-chain evidence.
A 653-kilobyte photograph was uploaded through the Claude iOS app. The file traveled from my phone to Anthropic's servers and was written to /mnt/user-data/uploads/IMG_7935.png in the compute sandbox. Standard upload. No LLM involvement. The bytes were on their infrastructure, bit-perfect, on disk.
Then I asked the model to store that file in my Nukez locker via the MCP server.
The session crashed.
Not because the file was corrupt. Not because the storage service was unreachable. Because the only available mechanism to move that file from where it already sat to where it needed to go was to force every byte through the LLM's token prediction engine — base64-encoded, character by character, each one an independent probabilistic sample from a 100,000+ token vocabulary. At 653KB, that is approximately 217,000 output tokens of arbitrary base64. The model exceeded its output limit and the conversation died.
I spent the next twelve hours establishing why, measuring exactly where the failure boundary sits, and documenting that the provider already built the fix for a different destination on the same platform.
The Sequence of Events
1. File lands at /mnt/user-data/uploads/IMG_7935.png ← Standard upload, no LLM
2. LLM calls nukez_store({sandbox_path: "..."}) ← ~100 tokens
3. MCP server returns a signed PUT URL + curl command ← Correct behavior
4. LLM executes curl in sandbox → HTTP 000 ← DNS blocked. No network.
5. Fallback hint says: "base64-encode the file" ← The crash instruction
6. LLM attempts to output 678,256 chars of base64 ← ~170,000 OUTPUT TOKENS
7. Output token limit exceeded → SESSION CRASH

Step 6 is where the economics break down. A probabilistic text prediction engine was tasked with regenerating 678,256 characters of base64, character by character, as output tokens billed at the output rate. This is not reasoning. It is the most expensive memcpy in the history of computing. And it fails.
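The arithmetic above is easy to sketch. A rough estimator, assuming the standard 3-bytes-to-4-chars base64 expansion and roughly 4 characters per token for arbitrary base64 under a BPE vocabulary (the ratio varies by tokenizer):

```python
import math

def base64_output_cost(file_bytes: int, chars_per_token: float = 4.0) -> tuple[int, int]:
    """Estimate base64 length and output-token cost of pushing a file
    through the token stream. chars_per_token ~4 is a rough average
    for arbitrary base64; real tokenizers differ."""
    b64_chars = 4 * math.ceil(file_bytes / 3)   # base64: 3 bytes -> 4 chars
    tokens = math.ceil(b64_chars / chars_per_token)
    return b64_chars, tokens

# A 653 KB upload, per this estimate:
chars, tokens = base64_output_cost(653 * 1024)
```

The 512-byte row in the table below reproduces exactly under this model (684 chars, ~170 tokens); the larger figures in the article are measured, not estimated.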
Empirical Corruption Data
SHA-256 verification was built into the Nukez MCP server to measure exactly what happens when binary data passes through the token stream at different payload sizes. The methodology: Python's base64 module encoded each chunk deterministically. The LLM read the encoded output from bash. The LLM reproduced that string in a tool call parameter. The server verified the hash on receipt.
| Chunk Size | Base64 Chars | ~Tokens | Result |
|---|---|---|---|
| 512 B | 684 | 170 | 5/5 SHA-256 verified clean |
| 1 KB | 1,368 | 340 | Accepted, valid base64 |
| 2 KB | 2,732 | 680 | CORRUPTED — 57 bytes lost, hash mismatch |
| 4 KB | 5,464 | 1,400 | CORRUPTED — 247 chars silently dropped |
| 653 KB | 870,000 | 217,500 | SESSION CRASH |
The corruption threshold sits between 1,368 and 2,732 base64 characters of output per inference. Above that boundary, the model silently drops characters from the reproduced string. At 4KB, the output was 5,217 characters instead of 5,464. No error. No warning. The output remains structurally valid base64 — it decodes without exception — but the decoded result is shorter than the input. Without checksums, the corruption is invisible.
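The verification loop described above can be sketched as follows. Function names are illustrative, not the Nukez server's actual API; the point is that only a hash of the raw bytes catches silent truncation, because the truncated string still decodes cleanly:

```python
import base64
import hashlib

def make_chunk(raw: bytes) -> dict:
    """Client side: deterministic base64 plus a hash of the raw bytes."""
    return {
        "data_b64": base64.b64encode(raw).decode(),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }

def verify_chunk(chunk: dict) -> bytes:
    """Server side: decode, then refuse silently-truncated payloads."""
    raw = base64.b64decode(chunk["data_b64"], validate=True)
    if hashlib.sha256(raw).hexdigest() != chunk["sha256"]:
        raise ValueError("hash mismatch: bytes were dropped in transit")
    return raw
```

Dropping whole 4-character groups from the middle of the base64 string leaves it structurally valid, which is exactly the failure mode observed at 2KB and 4KB.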
Sandbox Topology
A reasonable question: why not bypass the LLM entirely and curl the file to the MCP server? The sandbox architecture eliminates that possibility at the TCP level.
$ cat /etc/resolv.conf
(empty — no DNS resolver configured)
$ curl -s https://api.nukez.xyz/health
curl: (6) Could not resolve host: api.nukez.xyz
$ curl --resolve api.nukez.xyz:443:34.120.x.x https://api.nukez.xyz/health
HTTP 000 (TCP itself blocked, not just DNS)
Every outbound network path from the sandbox is dead. Raw TCP to 8.8.8.8 on ports 443 and 53: timeout. Python urllib: “Temporary failure in name resolution.” The proxy environment variables are unset. The sandbox is not merely behind a firewall; it has no usable egress at all.
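The raw-TCP probe used above can be reproduced with the standard library. Passing an IP literal sidesteps DNS entirely, which is what separates "no resolver" from "no network":

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Probe raw TCP connectivity. Pass an IP literal (e.g. '8.8.8.8')
    to test the transport layer independently of DNS."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:          # refused, timed out, or unreachable
        return False
```

In the sandbox described here, both `tcp_reachable("8.8.8.8", 443)` and `tcp_reachable("8.8.8.8", 53)` time out, confirming the block sits below DNS.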
MCP tool calls function because they route through the provider's backend proxy via a separate channel, not the sandbox's network. The proxy bridges the gap. But the proxy only forwards the JSON that the LLM generates as output tokens. It does not read files from the sandbox filesystem. The LLM's token stream is the only exit.
The artifact iframe is similarly constrained. Content Security Policy blocks requests to any domain except the provider's own API. The Files API endpoint is reachable but rejects the request — it requires an API key, and no implicit auth is provided in the sandbox context.
The Proof It Should Cost Zero
present_files moves bytes from sandbox disk to the user's browser. No token prediction. No base64 encoding. The LLM generates a filepath — twenty tokens — and the infrastructure does the rest. Same sandbox. Same files. Same platform. The capability exists. It was not extended to MCP.
present_files is a built-in tool. It accepts a path like /mnt/user-data/outputs/report.pdf. The platform reads bytes from disk and serves them to the browser. I used it throughout this investigation — delivering diagnostic reports, artifacts, screenshots. It never failed. It never corrupted data. It works for any file size.
MCP tool parameters have no equivalent. When the LLM calls nukez_store({data_b64: "iVBOR..."}), it must generate every character of that base64 string as output tokens. The proxy forwards the generated JSON. There is no pass-through mode. There is no copy-from-disk instruction.
The same file, on the same filesystem, sitting at the same path. One destination (the browser) costs twenty tokens. The other destination (the MCP server) costs 170,000 tokens and crashes the session. The difference is twenty lines of proxy code.
import base64
import hashlib

def resolve_file_references(arguments):
    """Inline file contents before forwarding tool arguments to the MCP server."""
    if isinstance(arguments, dict):
        if "file_reference" in arguments:
            path = arguments["file_reference"]
            with open(path, "rb") as f:
                raw = f.read()
            arguments["data_b64"] = base64.b64encode(raw).decode()
            arguments["sha256"] = hashlib.sha256(raw).hexdigest()
            del arguments["file_reference"]
        for v in arguments.values():
            resolve_file_references(v)
    elif isinstance(arguments, list):
        for item in arguments:
            resolve_file_references(item)

Insert this between “parse tool arguments” and “forward to MCP server” in the proxy pipeline. The MCP server receives standard data_b64. It does not need to know the proxy filled it in. Zero protocol changes. Zero model changes.
The Compounding Tax
Binary data transfer is the most dramatic symptom, but the token tax extends across every turn of every conversation. The following costs are imposed by the platform, invisible in the conversation interface, and billed at the input rate.
System prompts: ~4,000 tokens/turn
Behavioral instructions, safety guidelines, formatting rules, product information. Loaded on every turn. The user does not see them, does not consent to them, and cannot disable them. Across a 50-turn conversation, this single category accounts for approximately 200,000 input tokens.
Tool schemas: ~3,000+ tokens/turn
Every connected MCP tool's full schema — name, description, parameter definitions, enums — is loaded into context on every turn. With nine Nukez tools plus a deferred tool registry, this is 3,000 tokens per turn whether or not any tool is invoked. GitHub's MCP server alone, at 91 tools, consumes nearly a quarter of a 200K context window.
Context window compounding: O(n²)
Each turn's input includes all previous turns. Turn 1 costs T tokens. Turn 2 costs 2T. Turn N costs NT. Total cost over a conversation is proportional to the square of the number of turns: a 50-turn conversation costs approximately 1,275 times a single turn, not 50 times. The system prompt, tool schemas, and memory injections are re-processed on every one of those turns.
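The quadratic claim is just the triangular-number sum. A minimal check, with T as the fixed per-turn payload:

```python
def cumulative_turn_cost(turns: int, per_turn: int = 1) -> int:
    """Total input tokens when every turn re-reads all prior turns:
    T + 2T + ... + NT = N*(N+1)/2 * T."""
    return per_turn * turns * (turns + 1) // 2

# A 50-turn conversation costs 1,275x a single turn, not 50x:
ratio = cumulative_turn_cost(50) // cumulative_turn_cost(1)
```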
Image re-tokenization: ~1,000 tokens/image/turn
A screenshot uploaded at turn 5 continues consuming tokens at turns 6, 7, 8, and every subsequent turn it remains in context. The user uploaded it once. They are billed for it every turn.
Compaction overhead: ~10,000 tokens per event
When the context window fills, the platform generates a summary (output tokens, billed at the output rate) and re-reads that summary on subsequent turns (input tokens, billed at the input rate). The user experiences a brief pause. They pay twice.
The Measured Cost
All measurements taken on March 8, 2026 across 10+ Cloud Run deployments of the Nukez MCP server. Token counts derived from session diagnostics and provider usage logs.
| Category | Tokens | Cost (Opus) |
|---|---|---|
| Failed binary uploads | 1,164,279 | $43.66 |
| Diagnostic sessions (compounded) | 10,669,962 | $400.12 |
| Retry loops (curl, path, fallback) | 145,000 | $5.44 |
| Total measurable waste | 11,979,241 | $449.22 |
The data that should have been stored: 5,639 KB across 11 file upload attempts. The data stored: approximately 20 bytes of placeholder strings. Cost of the same operation with file_reference: approximately 330 tokens total, or $0.01.
Reduction factor: 36,301x.
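The reduction factor follows directly from the table: the three waste categories summed, divided by the ~330 tokens a file_reference call would have cost.

```python
# Worked arithmetic for the table above.
wasted = 1_164_279 + 10_669_962 + 145_000   # failed uploads + diagnostics + retries
reference_cost = 330                         # ~330 tokens via file_reference
reduction = round(wasted / reference_cost)
print(wasted, reduction)                     # 11979241 36301
```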
Cross-Provider Pattern
This investigation focused on Anthropic's Claude.ai, but the structural problem is not provider-specific. OpenAI's ChatGPT shares the same sandbox architecture, the same MCP proxy topology, the same egress restrictions. The identical error, "File arg rewrite paths are required when proxied mounts are present," was documented on ChatGPT's MCP implementation on March 7, 2026.
The MCP specification itself has no concept of binary transfer in tool parameters. Resources can contain binary blobs, but resources are application-controlled — the client decides when to fetch them, not the model. Tools are model-controlled. The spec was designed for a world where the MCP client sits on the same machine as the model. In web and mobile apps, a proxy sits between them, and the proxy only passes what the model generates as tokens.
The industry promoted MCP as a universal connector. The current implementations force all data through a probabilistic prediction pipe that corrupts above 1KB and crashes above 650KB. The connector works. The deployment context breaks it. And the deployment context is where the users are.
The Economic Structure
The following observations are factual. The reader may draw their own conclusions about intent.
Providers charge per token, input and output. Providers force file data through the token stream for MCP tool calls. Providers have infrastructure to bypass the token stream for file transfer — present_files, the Files API (up to 500MB, stable file_id references, shipped April 2025). Providers have not extended that infrastructure to MCP tool calls.
Every failed upload attempt burns tokens. The user pays. Every retry burns tokens. The user pays. Every diagnostic session attempting to identify and work around the limitation burns tokens. The user pays. A 653KB file that crashes the session still consumes tokens up to the crash point. The user pays. The session is destroyed. The work is lost.
The fix requires no new infrastructure. Both capabilities — file path resolution and MCP request forwarding — exist on the same platform, operate on the same filesystem, and run in the same process. They have not been connected. One parameter resolution rule in the proxy would eliminate the problem entirely.
The Evidence Is On-Chain
Every measurement, diagnostic finding, and session transcript from this investigation is stored in a Nukez cryptographic locker with on-chain attestation via Switchboard oracle on Solana devnet. The evidence is immutable and publicly verifiable.
| Record | Detail |
|---|---|
| Locker | locker_9f3e94dbf694 |
| Receipt | 0d3336707fccdd30 |
| Solana TX | 2SzibMNeh...GAzUvT |
| Attestation | Merkle root via Switchboard PullFeed + SPL Memo |
| Contents | Investigation evidence, feature request, micro-batch spec, this report |
It cannot be altered after the fact. That is the point of receipt-bound storage: the evidence chain exists independent of the parties who generated it.
If you are building MCP servers and encountering the same constraint in web and mobile deployments, the problem is structural, not specific to your implementation. The binary transfer path does not exist in the proxy. The workaround — chunked upload at 1KB increments with SHA-256 verification per chunk — works, but it costs 400+ sequential tool calls for a 400KB file and takes ten minutes.
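The chunked workaround described above can be sketched as follows. The function name and field layout are illustrative; the constraint that matters is one tool call per 1 KB chunk, each independently verifiable:

```python
import base64
import hashlib

def chunk_file(raw: bytes, chunk_size: int = 1024):
    """Split a payload into 1 KB pieces, each with its own SHA-256,
    so the server can verify every sequential tool call on receipt."""
    for offset in range(0, len(raw), chunk_size):
        piece = raw[offset : offset + chunk_size]
        yield {
            "offset": offset,
            "data_b64": base64.b64encode(piece).decode(),
            "sha256": hashlib.sha256(piece).hexdigest(),
        }

# A 400 KB file becomes 400 sequential tool calls:
chunks = list(chunk_file(bytes(range(256)) * 1600))
```

The chunk size sits deliberately below the measured corruption threshold (1,368 base64 characters per inference), which is why it works and why it is slow.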
The proxy already resolves file paths for the browser. It already forwards JSON to MCP servers. Both run in the same process. Someone just has to connect them.
