§ proof · benchmark

The Benchmark

48 production cloud models across 8 providers. 720 end-to-end runs on Solana mainnet. 614 all-pass. 31 at a perfect 15 / 15. Real receipts, real signatures, byte-perfect verification.

§ 01 — at a glance

The full result, in one frame.

Every cell below traces back to the same source: a single canonical end-to-end run on Solana mainnet. A run counts as a pass only when the agent completes the full workflow — request, pay, confirm, provision, create, upload, download — and the downloaded bytes match the uploaded bytes exactly. No partial credit, no synthetic data, no mocks.
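The pass rule can be sketched as a single predicate. This is an illustration under stated assumptions, not the harness's actual code: the step names come from the workflow listed above, while `RunRecord` and `is_pass` are hypothetical names introduced here.

```python
from dataclasses import dataclass

# Workflow steps, in the order listed above.
WORKFLOW = ("request", "pay", "confirm", "provision", "create", "upload", "download")


@dataclass
class RunRecord:
    """Hypothetical record of one end-to-end run (not the harness's real schema)."""
    completed_steps: tuple
    uploaded: bytes
    downloaded: bytes


def is_pass(run: RunRecord) -> bool:
    """All-or-nothing: every step completed in order, and a byte-identical round-trip."""
    return run.completed_steps == WORKFLOW and run.uploaded == run.downloaded
```

A run that stops one step short, or that returns even one differing byte, scores as a fail under this rule.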

614/720

aggregate all-pass

85.3%

run pass rate

48

production models

31

models at 100%

8

cloud providers

denominator, no rollups

The number to the right of the slash is every attempted run in the corpus. The number to the left is every attempted run that completed the entire workflow and produced a byte-identical round-trip. Of the 48 production models exercised, 31 clear the workflow on every attempt; the remainder land along the distribution shown in §03 below.

§ 02 — by provider

Eight providers, one workflow.

Every provider in the catalog runs the same canonical workflow against the same gateway. Anthropic, Google, Mistral, and Inception models route through their native SDKs; xAI, Cohere, NVIDIA, and OpenAI models are exercised through the OpenAI-compatible surface. The gateway sees no difference between them.

Provider	Models	Runs	All-pass	Rate
anthropic	7	105	104	99.0%
google	8	120	117	97.5%
mistral	5	75	72	96.0%
inception	1	15	14	93.3%
openai	13	195	171	87.7%
xai	8	120	93	77.5%
nvidia	2	30	18	60.0%
cohere	4	60	25	41.7%
aggregate	48	720	614	85.3%
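The aggregate row can be recomputed from the provider rows. The sketch below copies the figures from the table above; the `PROVIDERS` list and the `rate` helper are names introduced here for illustration.

```python
# (provider, models, runs, all_pass), copied from the table above.
PROVIDERS = [
    ("anthropic", 7, 105, 104),
    ("google", 8, 120, 117),
    ("mistral", 5, 75, 72),
    ("inception", 1, 15, 14),
    ("openai", 13, 195, 171),
    ("xai", 8, 120, 93),
    ("nvidia", 2, 30, 18),
    ("cohere", 4, 60, 25),
]


def rate(passed: int, runs: int) -> float:
    """Pass rate as a percentage, rounded to one decimal place."""
    return round(100 * passed / runs, 1)


models = sum(p[1] for p in PROVIDERS)
runs = sum(p[2] for p in PROVIDERS)
passed = sum(p[3] for p in PROVIDERS)
print(models, runs, passed, rate(passed, runs))  # prints: 48 720 614 85.3
```

Each provider's run count is also its model count times 15, so no provider contributes extra or missing attempts to the denominator.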

§ 03 — per-model results

All forty-eight models, one table.

Every model the catalog advertises, every run it attempted, its end-to-end pass rate. The 31 perfect-score models appear first; the long tail follows in descending order. The bar to clear is end-to-end completion plus byte-equal verification — the same bar for every row.

Provider	Model	Runs	Pass rate
anthropic	claude-haiku-4-5-20251001	15/15	100.0%
anthropic	claude-opus-4-1-20250805	15/15	100.0%
anthropic	claude-opus-4-5-20251101	15/15	100.0%
anthropic	claude-opus-4-6	15/15	100.0%
anthropic	claude-sonnet-4-5-20250929	15/15	100.0%
anthropic	claude-sonnet-4-6	15/15	100.0%
google	gemini-2.5-flash	15/15	100.0%
google	gemini-2.5-pro	15/15	100.0%
google	gemini-3-flash-preview	15/15	100.0%
google	gemini-3.1-flash-lite	15/15	100.0%
google	gemini-3.1-flash-lite-preview	15/15	100.0%
google	gemini-3.1-pro-preview	15/15	100.0%
mistral	devstral-latest	15/15	100.0%
mistral	magistral-medium-latest	15/15	100.0%
mistral	mistral-large-latest	15/15	100.0%
mistral	mistral-medium-latest	15/15	100.0%
openai	gpt-4.1	15/15	100.0%
openai	gpt-4.1-mini	15/15	100.0%
openai	gpt-4o	15/15	100.0%
openai	gpt-5-mini	15/15	100.0%
openai	gpt-5.1	15/15	100.0%
openai	gpt-5.4	15/15	100.0%
openai	gpt-5.4-mini	15/15	100.0%
openai	gpt-5.5	15/15	100.0%
openai	o3	15/15	100.0%
openai	o4-mini	15/15	100.0%
xai	grok-3	15/15	100.0%
xai	grok-4-0709	15/15	100.0%
xai	grok-4-1-fast-reasoning	15/15	100.0%
xai	grok-4-fast-reasoning	15/15	100.0%
xai	grok-code-fast-1	15/15	100.0%
anthropic	claude-opus-4-7	14/15	93.3%
cohere	command-a-reasoning-08-2025	14/15	93.3%
google	gemini-2.0-flash	14/15	93.3%
inception	mercury-2	14/15	93.3%
google	gemini-2.5-flash-lite	13/15	86.7%
openai	gpt-5-nano	13/15	86.7%
mistral	mistral-small-latest	12/15	80.0%
nvidia	nvidia/nemotron-3-nano-30b-a3b	12/15	80.0%
xai	grok-3-mini	12/15	80.0%
cohere	command-a-03-2025	11/15	73.3%
openai	gpt-5.4-nano	8/15	53.3%
nvidia	nvidia/llama-3.3-nemotron-super-49b-v1	6/15	40.0%
xai	grok-4.20-0309-non-reasoning	4/15	26.7%
xai	grok-4.20-0309-reasoning	2/15	13.3%
cohere	command-r-plus-08-2024	0/15	0.0%
cohere	command-r7b-12-2024	0/15	0.0%
openai	gpt-4.1-nano	0/15	0.0%
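As a cross-check, the per-model rows can be reconciled with the headline figures. The sketch below copies the pass counts of the 17 models below 100% from the table above and treats the 31 perfect rows as a count; `TAIL` and `pass_rate` are illustrative names, not part of the harness.

```python
RUNS_PER_MODEL = 15
PERFECT_MODELS = 31  # rows shown at 15/15 in the table above

# Passes for the 17 models below 100%, in table order.
TAIL = [
    ("claude-opus-4-7", 14),
    ("command-a-reasoning-08-2025", 14),
    ("gemini-2.0-flash", 14),
    ("mercury-2", 14),
    ("gemini-2.5-flash-lite", 13),
    ("gpt-5-nano", 13),
    ("mistral-small-latest", 12),
    ("nvidia/nemotron-3-nano-30b-a3b", 12),
    ("grok-3-mini", 12),
    ("command-a-03-2025", 11),
    ("gpt-5.4-nano", 8),
    ("nvidia/llama-3.3-nemotron-super-49b-v1", 6),
    ("grok-4.20-0309-non-reasoning", 4),
    ("grok-4.20-0309-reasoning", 2),
    ("command-r-plus-08-2024", 0),
    ("command-r7b-12-2024", 0),
    ("gpt-4.1-nano", 0),
]


def pass_rate(passes: int, runs: int = RUNS_PER_MODEL) -> str:
    """Format a pass rate the way the table does, e.g. 14 -> '93.3%'."""
    return f"{100 * passes / runs:.1f}%"


tail_passes = [p for _, p in TAIL]
total_models = PERFECT_MODELS + len(TAIL)
total_passes = PERFECT_MODELS * RUNS_PER_MODEL + sum(tail_passes)
# The long tail should be listed from highest to lowest pass count.
is_descending = tail_passes == sorted(tail_passes, reverse=True)
```

Under these figures the totals come to 48 models and 614 passing runs, matching §01 and §02.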

§ 04 — end-to-end mainnet proof

A single live session, every artifact verifiable.

The numbers above are the population. This is one member of that population — a single live session on https://api.nukez.xyz on 2026-04-13. Every identifier below is a real on-chain or in-gateway artifact: the transaction is on Solana mainnet; the receipt and locker are records in the production gateway; the uploaded file is byte-equal to the downloaded file.

Network	solana-mainnet
Quote	$20.00 · 0.240211387 SOL @ $83.26 / SOL · 5 payment legs advertised (SOL · USDC · USDT0 · MON · WETH)
Payer pubkey	BhBeSkwKyqysZstzkqdf4qAcYfS9r27wEMmouvSVfp1U
Transaction signature	4QAaRa4LBUT4311HBsSiw88JoDyZCzW4ViFmaSz88MhKD3nzLL2W6KSzhcDUt1VmNHqPUnZExazAUpy6b9W3Sx1g
Receipt ID	ee330fef107cfe66
Locker ID	locker_67670a230a15
Single-file upload	`Anza.pdf` · 5,395,694 bytes uploaded, 5,395,694 bytes downloaded, byte-equal
Batch upload	3-file batch · video / kml / json · all `HTTP 200` on the signed upload URLs

Wire excerpt — POST /v1/storage/confirm response

{
  "ok": true,
  "receipt_id": "ee330fef107cfe66",
  "receipt": {
    "id": "ee330fef107cfe66",
    "resource": "storage",
    "provider": "gcs",
    "tx_hash": "4QAaRa4LBUT4311HBsSiw88JoDyZCzW4ViFmaSz88MhKD3nzLL2W6KSzhcDUt1VmNHqPUnZExazAUpy6b9W3Sx1g",
    "payer_pubkey": "BhBeSkwKyqysZstzkqdf4qAcYfS9r27wEMmouvSVfp1U",
    "pay_to_address": "HqrTLaNk89dHPVR5zyMPtEiRnQfabmqc8B9yR3KVbyVp",
    "network": "solana:5eykt4UsFv8P8NJdTREpY1vzqKqZKvdp",
    "pay_asset": "SOL",
    "units": 1,
    "unit_price_usd": 20.00,
    "paid_amount": "0.240211387",
    "slot": 412988282,
    "receipt_sig_alg": "ed25519",
    "receipt_hash": "01f365df877adccabfcfeeef31edcfd7653a90364cafd74846257b0338708d31"
  }
}
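A client can sanity-check a confirm response against the quote before trusting it. The sketch below uses a subset of the fields shown above; `check_receipt` is a hypothetical helper, and it does not verify the ed25519 signature or recompute `receipt_hash`, whose construction this page does not describe.

```python
from decimal import Decimal

LAMPORTS_PER_SOL = 10**9  # 1 SOL = 1,000,000,000 lamports

# Subset of the confirm response above.
response = {
    "ok": True,
    "receipt_id": "ee330fef107cfe66",
    "receipt": {
        "id": "ee330fef107cfe66",
        "pay_asset": "SOL",
        "units": 1,
        "unit_price_usd": 20.00,
        "paid_amount": "0.240211387",
    },
}


def check_receipt(resp: dict) -> dict:
    """Check internal consistency and derive amounts using exact decimal arithmetic."""
    receipt = resp["receipt"]
    if not resp["ok"] or resp["receipt_id"] != receipt["id"]:
        raise ValueError("receipt envelope is inconsistent")
    paid_sol = Decimal(receipt["paid_amount"])
    total_usd = Decimal(str(receipt["unit_price_usd"])) * receipt["units"]
    return {
        "lamports": int(paid_sol * LAMPORTS_PER_SOL),
        "total_usd": total_usd,
        "implied_usd_per_sol": (total_usd / paid_sol).quantize(Decimal("0.01")),
    }


result = check_receipt(response)
```

With the values above, the implied price rounds to $83.26 per SOL, which agrees with the quote row in the session table.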

Wire excerpt — 5.4 MB download verification

# Download via the signed download URL
DOWNLOAD_URL="https://api.nukez.xyz/f/AhNsb2NrZXJfNjc2NzBhMjMwYTE1..."
curl -sS -o /tmp/Anza_downloaded.pdf -w "DOWNLOAD_HTTP=%{http_code} SIZE=%{size_download}\n" \
  -L "$DOWNLOAD_URL"
# DOWNLOAD_HTTP=200 SIZE=5395694

# Original 5,395,694 bytes  =  Downloaded 5,395,694 bytes  ✓
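The excerpt above shows the size comparison. A stricter byte-equality check also compares content digests; the following is a minimal sketch of that idea, assuming both files are on local disk. The function names are illustrative and the choice of SHA-256 is an assumption, not something the log specifies.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large uploads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def byte_equal(original: Path, downloaded: Path) -> bool:
    """Cheap size check first, then compare SHA-256 digests of the contents."""
    if original.stat().st_size != downloaded.stat().st_size:
        return False
    return sha256_file(original) == sha256_file(downloaded)
```

For example, `byte_equal(Path("Anza.pdf"), Path("/tmp/Anza_downloaded.pdf"))` would compare the original against the file written by the curl command; the first path is a placeholder for wherever the original lives.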

§ 05 — operation latency

Per-operation wall clock, real client to production.

A separate live client session captured per-operation latency end-to-end against the production front-door (2026-03-24). Each row is a real round-trip including any client-side envelope signing. The 1.7 MB image upload completes in under six seconds; warm reads and public verifications return in under one.

Operation	Wall (ms)	Payload	Notes	Status
status	7,790.4	25 B	cold first call	success
store_small_text	4,912.1	270 B	delegate + sign + write	success
store_image_1685kb	5,811	2.30 MB	1.7 MB image, signed write	success
retrieve	988.1	37 B	warm read	success
verify	851.9	39 B	public verification	success
status_warm	2,968.1	25 B	warm follow-up	success

Read-side operations (retrieve, verify) land under a second. Writes take roughly five to six seconds whether the payload is 270 B (4.9 s) or 2.30 MB (5.8 s), including the client-side signed envelope construction.
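The claims about the latency table can be checked mechanically. The sketch below copies the wall-clock values from the table; the read / write / status labels are a grouping added here for illustration and do not appear in the source data.

```python
# (operation, wall_ms, kind). Timings are from the table above; "kind" is an added label.
TIMINGS = [
    ("status", 7790.4, "status"),
    ("store_small_text", 4912.1, "write"),
    ("store_image_1685kb", 5811.0, "write"),
    ("retrieve", 988.1, "read"),
    ("verify", 851.9, "read"),
    ("status_warm", 2968.1, "status"),
]


def wall_ms(kind: str) -> list:
    """Wall-clock times in milliseconds for every operation of the given kind."""
    return [ms for _, ms, k in TIMINGS if k == kind]


reads_sub_second = all(ms < 1000 for ms in wall_ms("read"))
writes_under_six_seconds = all(ms < 6000 for ms in wall_ms("write"))

by_name = {name: ms for name, ms, _ in TIMINGS}
# How much slower the cold first status call was than the warm follow-up.
cold_over_warm = by_name["status"] / by_name["status_warm"]
```

On these figures the cold first call costs about 2.6 times the warm follow-up, which suggests start-up overhead rather than payload size dominates that row.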

§ 06 — hosted multi-agent integration

The protocol composes, inside an agent mesh.

The same Nukez agent slots into a hosted multi-agent orchestration mesh as a delegated specialist. In the canonical AAAP test stack a Claude-Sonnet-led router receives a user request, recognises storage-related keywords, and delegates verbatim to the Nukez agent at http://localhost:8080/v1/chat. The Nukez agent completes the storage operation and returns a structured result; the lead summarises it for the user.

Orchestration mode	standard
Lead model	anthropic / claude-sonnet-4
Delegated agent	`nukez_agent` · service · {base_url}/v1/chat
Delegation triggers	store · save · upload · retrieve · download · file · locker · nukez · storage · memory · remember · report · backup
Signing-request relay	Lead surfaces signing requests verbatim; developer authorises through the standard signed-envelope flow.

Excerpt — canonical AAAP stack (aaap_test.yaml)

name: aaap-integration-test

orchestration:
  mode: standard

lead:
  model: claude-sonnet-4-20250514
  provider: anthropic

  routing_prompt_extra: |
    You have access to a storage agent (nukez_agent) that can
    store, retrieve, and manage files via the Nukez decentralised
    storage network. Delegate any storage-related requests to it.

  agents:
    nukez_agent:
      type: service
      base_url: http://localhost:8080
      endpoint: /v1/chat
      description: >
        Nukez storage specialist agent. Handles storing files,
        retrieving data, managing lockers, and all storage
        operations on the Nukez network.
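The keyword-based delegation described above can be approximated with a simple word match. This is only a sketch: the trigger words come from the delegation-triggers row, but the actual router is model-led (see `routing_prompt_extra`), so `should_delegate` is a hypothetical stand-in rather than the mesh's real logic.

```python
import re

# Trigger keywords from the delegation-triggers row above.
TRIGGERS = frozenset({
    "store", "save", "upload", "retrieve", "download", "file", "locker",
    "nukez", "storage", "memory", "remember", "report", "backup",
})


def should_delegate(message: str) -> bool:
    """Return True if any whole word in the message is a trigger keyword.

    Exact-word matching only: plurals such as "files" are not matched in this sketch.
    """
    words = re.findall(r"[a-z0-9_]+", message.lower())
    return any(word in TRIGGERS for word in words)
```

Under this rule "Please upload this PDF" would be delegated to `nukez_agent`, while "What is the weather today?" would stay with the lead.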

§ 07 — zero-prior-knowledge integration

One discovery URL, one wallet, byte-equal data.

A single autonomous agent run, given only a discovery URL and a payment-enabled Solana wallet. No SDK, no examples, no prior knowledge of Nukez. The agent reads the published documentation, executes the full payment / authentication / storage / verification flow, and reports EXACT_MATCH against the original payload.

Date	Model	Iterations	Duration	Verification	Receipt
2026-01-22	claude-sonnet-4	21	448.74s	EXACT_MATCH	d2c2d507159010f1

For the full thirteen-step walk-through — including the verbatim task prompt, the agent’s final structured output, and the transaction signature — see /proof/methodology §04.

§ 08 — cumulative footprint

What the program covers, at full scope.

Eleven dimensions of the testing program, in one card. Every entry traces back to an artifact published on this page or on /proof/methodology.

Cloud providers exercised	8
Production models tested	48
Models at 100% pass	31
End-to-end runs (this corpus)	720
Aggregate all-pass runs	614
Aggregate run pass rate	85.3%
Mainnet payment chains used	Solana + Monad (EVM)
Payment legs advertised	5 (SOL · USDC · USDT0 · MON · WETH)
Largest upload verified	5.4 MB · byte-perfect
Hosted-service integration	AAAP via multiagentz mesh
Cold-start integration	21 iterations · zero prior knowledge

§ 09 — reproducibility

Run it yourself.

Every datum on this page is reproducible from the public test harness against the production gateway. The wallet pattern, the test scripts, and the payment legs are the same ones every consumer uses.

Gateway. https://api.nukez.xyz — production, no private path.
Network. Solana mainnet plus Monad (EVM) on the payment-leg side. Real lamports / native tokens out of a real wallet on every request — no faucet, no devnet, no simulation.
Models. All 48 are reachable via their providers' public APIs. No private model access.
Sweep harness. The canonical end-to-end sweep that produced §01–§03 lives in the public agent-testing repo. The mainnet curl session in §04 reproduces step-by-step from the published curl log.
Operation latency. The §05 timings come from a separate client (nukezcortex-front-door) hitting production. Same gateway, same wallet pattern.
AAAP. The §06 multi-agent stack is a single YAML file. Stand up the NukezAgent service, run `maz --config aaap_test.yaml`, and the storage delegations land at the same gateway as everything else.

§ 10 — source artifacts

The artifacts this page summarises.

Five underlying artifacts produced the data on this page. Each is a primary source — the page is the synthesis, not the truth-of-record.

Canonical end-to-end sweep · The 720-run, 48-model, 8-provider corpus that produced §01–§03. Every run records its tx_signature, receipt_id, locker_id, trace, and a byte-equality check on the round-trip.
Mainnet curl test log · 2026-04-13 · The §04 step-by-step session against api.nukez.xyz. Records the verbatim curl commands, the 402 payment quote, the on-chain transfer, the receipt, the locker provision, file create + upload + download, and the byte-equality check on a 5.4 MB PDF.
Operation-latency client session · 2026-03-24 · The §05 per-operation timings from a real client against the production front-door.
AAAP integration stack · The §06 yaml that wires the Nukez agent into a hosted multi-agent mesh as a delegated storage specialist.
Cold-start appendix · 2026-01-22 · The §07 single-agent zero-prior-knowledge run. Discovery URL only; the agent figured out everything else from the public documentation.
