The Benchmark
48 production cloud models across 8 providers. 720 end-to-end runs on Solana mainnet. 614 all-pass. 31 at a perfect 15 / 15. Real receipts, real signatures, byte-perfect verification.
§ 01 — at a glance
The full result, in one frame.
Every figure below traces back to one source: a single canonical end-to-end sweep on Solana mainnet. A run counts as a pass only when the agent completes the full workflow (request, pay, confirm, provision, create, upload, download) and the downloaded bytes match the uploaded bytes exactly. No partial credit, no synthetic data, no mocks.
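The pass rule above can be sketched as a single all-or-nothing predicate. This is a minimal illustration, not the harness's code: the `Run` record and its field names are hypothetical; only the step names and the byte-equality requirement come from this page.

```python
from dataclasses import dataclass, field

# Workflow steps, in the order this page lists them.
WORKFLOW = ("request", "pay", "confirm", "provision", "create", "upload", "download")


@dataclass
class Run:
    """Hypothetical per-run record: completed steps plus the round-tripped payload."""
    completed: set[str] = field(default_factory=set)
    uploaded: bytes = b""
    downloaded: bytes = b""


def is_all_pass(run: Run) -> bool:
    """All-or-nothing: every step must complete AND the bytes must match exactly."""
    every_step = all(step in run.completed for step in WORKFLOW)
    return every_step and run.uploaded == run.downloaded


ok = Run(set(WORKFLOW), b"payload", b"payload")
no_download = Run(set(WORKFLOW) - {"download"}, b"payload", b"payload")
corrupted = Run(set(WORKFLOW), b"payload", b"paylaod")
print(is_all_pass(ok), is_all_pass(no_download), is_all_pass(corrupted))  # True False False
```

A run that stalls at any step, or that returns even one differing byte, scores zero; there is no partial credit to average.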
| Figure | Measure |
|---|---|
| 614/720 | aggregate all-pass |
| 85.3% | run pass rate |
| 48 | production models |
| 31 | models at 100% |
| 8 | cloud providers |
| 1 | denominator, no rollups |
The number to the right of the slash counts every attempted run in the corpus; the number to the left counts the runs that completed the entire workflow and produced a byte-identical round-trip. Of the 48 production models exercised, 31 clear the workflow on every attempt; the remaining 17 fall along the distribution shown in §03 below.
§ 02 — by provider
Eight providers, one workflow.
Every provider in the catalog runs the same canonical workflow against the same gateway. Anthropic, Google, Mistral, and Inception models are called through their native SDKs; xAI, Cohere, NVIDIA, and OpenAI models are exercised through the OpenAI-compatible API surface. The gateway sees no difference between them.
| Provider | Models | Runs | All-pass | Rate |
|---|---|---|---|---|
| anthropic | 7 | 105 | 104 | 99.0% |
| google | 8 | 120 | 117 | 97.5% |
| mistral | 5 | 75 | 72 | 96.0% |
| inception | 1 | 15 | 14 | 93.3% |
| openai | 13 | 195 | 171 | 87.7% |
| xai | 8 | 120 | 93 | 77.5% |
| nvidia | 2 | 30 | 18 | 60.0% |
| cohere | 4 | 60 | 25 | 41.7% |
| aggregate | 48 | 720 | 614 | 85.3% |
§ 03 — per-model results
All forty-eight models, one table.
Every model the catalog advertises, every run it attempted, its end-to-end pass rate. The 31 perfect-score models appear first; the long tail follows in descending order. The bar to clear is end-to-end completion plus byte-equal verification — the same bar for every row.
| Provider | Model | Runs | Pass rate |
|---|---|---|---|
| anthropic | claude-haiku-4-5-20251001 | 15/15 | 100.0% |
| anthropic | claude-opus-4-1-20250805 | 15/15 | 100.0% |
| anthropic | claude-opus-4-5-20251101 | 15/15 | 100.0% |
| anthropic | claude-opus-4-6 | 15/15 | 100.0% |
| anthropic | claude-sonnet-4-5-20250929 | 15/15 | 100.0% |
| anthropic | claude-sonnet-4-6 | 15/15 | 100.0% |
| google | gemini-2.5-flash | 15/15 | 100.0% |
| google | gemini-2.5-pro | 15/15 | 100.0% |
| google | gemini-3-flash-preview | 15/15 | 100.0% |
| google | gemini-3.1-flash-lite | 15/15 | 100.0% |
| google | gemini-3.1-flash-lite-preview | 15/15 | 100.0% |
| google | gemini-3.1-pro-preview | 15/15 | 100.0% |
| mistral | devstral-latest | 15/15 | 100.0% |
| mistral | magistral-medium-latest | 15/15 | 100.0% |
| mistral | mistral-large-latest | 15/15 | 100.0% |
| mistral | mistral-medium-latest | 15/15 | 100.0% |
| openai | gpt-4.1 | 15/15 | 100.0% |
| openai | gpt-4.1-mini | 15/15 | 100.0% |
| openai | gpt-4o | 15/15 | 100.0% |
| openai | gpt-5-mini | 15/15 | 100.0% |
| openai | gpt-5.1 | 15/15 | 100.0% |
| openai | gpt-5.4 | 15/15 | 100.0% |
| openai | gpt-5.4-mini | 15/15 | 100.0% |
| openai | gpt-5.5 | 15/15 | 100.0% |
| openai | o3 | 15/15 | 100.0% |
| openai | o4-mini | 15/15 | 100.0% |
| xai | grok-3 | 15/15 | 100.0% |
| xai | grok-4-0709 | 15/15 | 100.0% |
| xai | grok-4-1-fast-reasoning | 15/15 | 100.0% |
| xai | grok-4-fast-reasoning | 15/15 | 100.0% |
| xai | grok-code-fast-1 | 15/15 | 100.0% |
| anthropic | claude-opus-4-7 | 14/15 | 93.3% |
| cohere | command-a-reasoning-08-2025 | 14/15 | 93.3% |
| google | gemini-2.0-flash | 14/15 | 93.3% |
| inception | mercury-2 | 14/15 | 93.3% |
| google | gemini-2.5-flash-lite | 13/15 | 86.7% |
| openai | gpt-5-nano | 13/15 | 86.7% |
| mistral | mistral-small-latest | 12/15 | 80.0% |
| nvidia | nvidia/nemotron-3-nano-30b-a3b | 12/15 | 80.0% |
| xai | grok-3-mini | 12/15 | 80.0% |
| cohere | command-a-03-2025 | 11/15 | 73.3% |
| openai | gpt-5.4-nano | 8/15 | 53.3% |
| nvidia | nvidia/llama-3.3-nemotron-super-49b-v1 | 6/15 | 40.0% |
| xai | grok-4.20-0309-non-reasoning | 4/15 | 26.7% |
| xai | grok-4.20-0309-reasoning | 2/15 | 13.3% |
| cohere | command-r-plus-08-2024 | 0/15 | 0.0% |
| cohere | command-r7b-12-2024 | 0/15 | 0.0% |
| openai | gpt-4.1-nano | 0/15 | 0.0% |
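The provider table in §02 and the headline figures in §01 can be recomputed from the per-model rows above. A minimal sketch: the pass counts are transcribed by hand (the Gemini rows are counted under google), and every rate divides by total runs, the single denominator.

```python
# Passes out of 15 runs per model, transcribed from the per-model table above
# (Gemini rows counted under "google").
RUNS_PER_MODEL = 15
PASSES = {
    "anthropic": [15, 15, 15, 15, 15, 15, 14],
    "google": [15, 15, 15, 15, 15, 15, 14, 13],
    "mistral": [15, 15, 15, 15, 12],
    "inception": [14],
    "openai": [15] * 10 + [13, 8, 0],
    "xai": [15] * 5 + [12, 4, 2],
    "nvidia": [12, 6],
    "cohere": [14, 11, 0, 0],
}


def rollup(passes: list[int]) -> tuple[int, int, int, float]:
    """(models, runs, all-pass runs, pass rate %) over one denominator: total runs."""
    models = len(passes)
    runs = models * RUNS_PER_MODEL
    all_pass = sum(passes)
    return models, runs, all_pass, round(100 * all_pass / runs, 1)


for provider, passes in PASSES.items():
    print(provider, rollup(passes))  # first line: anthropic (7, 105, 104, 99.0)

everything = [p for passes in PASSES.values() for p in passes]
print("aggregate", rollup(everything))  # aggregate (48, 720, 614, 85.3)
print("models at 100%:", sum(p == RUNS_PER_MODEL for p in everything))  # 31
```

Because every model gets the same 15 runs, pooling runs within a provider and averaging that provider's per-model rates give the same figure.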
§ 04 — end-to-end mainnet proof
A single live session, every artifact verifiable.
The numbers above are the population. This is one member of that population, a single live session against https://api.nukez.xyz on 2026-04-13. Every identifier below is a real on-chain or in-gateway artifact: the transaction is on Solana mainnet; the receipt and locker are records in the production gateway; the uploaded file is byte-equal to the downloaded file.
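Matching sizes, as in the download excerpt below, are necessary but not sufficient for byte equality. A minimal sketch of a full comparison, assuming both files are on local disk; the function name is illustrative, and Python's standard `filecmp.cmp(a, b, shallow=False)` performs an equivalent content comparison.

```python
import os


def files_byte_equal(a: str, b: str, chunk_size: int = 1 << 20) -> bool:
    """True only if both files have the same length and identical bytes."""
    if os.path.getsize(a) != os.path.getsize(b):
        return False  # cheap early exit; equal sizes alone prove nothing
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ca, cb = fa.read(chunk_size), fb.read(chunk_size)
            if ca != cb:
                return False  # first differing chunk
            if not ca:
                return True  # both files exhausted together with no difference


# Hypothetical usage: the original's local path is illustrative; the downloaded
# path matches the curl excerpt.
# files_byte_equal("Anza.pdf", "/tmp/Anza_downloaded.pdf")
```

Reading in 1 MiB chunks keeps memory flat regardless of file size, so the same check works for the 5.4 MB PDF or far larger uploads.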
| Field | Value |
|---|---|
| Network | solana-mainnet |
| Quote | $20.00 · 0.240211387 SOL @ $83.26 / SOL · 5 payment legs advertised (SOL · USDC · USDT0 · MON · WETH) |
| Payer pubkey | BhBeSkwKyqysZstzkqdf4qAcYfS9r27wEMmouvSVfp1U |
| Transaction signature | 4QAaRa4LBUT4311HBsSiw88JoDyZCzW4ViFmaSz88MhKD3nzLL2W6KSzhcDUt1VmNHqPUnZExazAUpy6b9W3Sx1g |
| Receipt ID | ee330fef107cfe66 |
| Locker ID | locker_67670a230a15 |
| Single-file upload | Anza.pdf · 5,395,694 bytes uploaded, 5,395,694 bytes downloaded, byte-equal |
| Batch upload | 3-file batch · video / kml / json · all HTTP 200 on the signed upload URLs |
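Two rows of this table can be sanity-checked offline. A Solana public key is 32 bytes and a transaction signature is a 64-byte ed25519 signature, both base58-encoded; and the quoted SOL amount should follow from $20.00 at $83.26 / SOL. The sketch below assumes the gateway rounds up to the next lamport and that $83.26 is the exact rate used; neither is stated on this page, though the result reproduces the quote. These checks confirm well-formedness only, not that the transaction exists on-chain.

```python
# Base58 alphabet used by Solana (the Bitcoin alphabet: no 0, O, I, or l).
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
LAMPORTS_PER_SOL = 1_000_000_000


def b58decode(s: str) -> bytes:
    """Decode a base58 string; raises ValueError on characters outside the alphabet."""
    n = 0
    for ch in s:
        n = n * 58 + B58_ALPHABET.index(ch)
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    leading_ones = len(s) - len(s.lstrip("1"))  # each leading '1' encodes a 0x00 byte
    return b"\x00" * leading_ones + body


def quote_lamports(usd_cents: int, sol_price_cents: int) -> int:
    """USD price -> lamports, rounded up (assumed policy) so the payer never underpays."""
    return -(-usd_cents * LAMPORTS_PER_SOL // sol_price_cents)  # integer ceiling division


def format_sol(lamports: int) -> str:
    """Render lamports as a 9-decimal SOL string with no floating-point error."""
    return f"{lamports // LAMPORTS_PER_SOL}.{lamports % LAMPORTS_PER_SOL:09d}"


payer = "BhBeSkwKyqysZstzkqdf4qAcYfS9r27wEMmouvSVfp1U"
tx_sig = "4QAaRa4LBUT4311HBsSiw88JoDyZCzW4ViFmaSz88MhKD3nzLL2W6KSzhcDUt1VmNHqPUnZExazAUpy6b9W3Sx1g"
print(len(b58decode(payer)), len(b58decode(tx_sig)))  # 32 64
print(format_sol(quote_lamports(2000, 8326)))  # 0.240211387
```

Working in integer cents and lamports avoids float rounding: 240,211,386 lamports would fall just short of $20.00 at $83.26, so the ceiling lands on 240,211,387.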
Wire excerpt — POST /v1/storage/confirm response

```json
{
  "ok": true,
  "receipt_id": "ee330fef107cfe66",
  "receipt": {
    "id": "ee330fef107cfe66",
    "resource": "storage",
    "provider": "gcs",
    "tx_hash": "4QAaRa4LBUT4311HBsSiw88JoDyZCzW4ViFmaSz88MhKD3nzLL2W6KSzhcDUt1VmNHqPUnZExazAUpy6b9W3Sx1g",
    "payer_pubkey": "BhBeSkwKyqysZstzkqdf4qAcYfS9r27wEMmouvSVfp1U",
    "pay_to_address": "HqrTLaNk89dHPVR5zyMPtEiRnQfabmqc8B9yR3KVbyVp",
    "network": "solana:5eykt4UsFv8P8NJdTREpY1vzqKqZKvdp",
    "pay_asset": "SOL",
    "units": 1,
    "unit_price_usd": 20.00,
    "paid_amount": "0.240211387",
    "slot": 412988282,
    "receipt_sig_alg": "ed25519",
    "receipt_hash": "01f365df877adccabfcfeeef31edcfd7653a90364cafd74846257b0338708d31"
  }
}
```

Wire excerpt — 5.4 MB download verification
```bash
# Download via the signed download URL
DOWNLOAD_URL="https://api.nukez.xyz/f/AhNsb2NrZXJfNjc2NzBhMjMwYTE1..."
curl -sS -o /tmp/Anza_downloaded.pdf -w "DOWNLOAD_HTTP=%{http_code} SIZE=%{size_download}\n" \
  -L "$DOWNLOAD_URL"
# DOWNLOAD_HTTP=200 SIZE=5395694
# Original 5,395,694 bytes = Downloaded 5,395,694 bytes ✓
```

§ 05 — operation latency
Per-operation wall clock, real client to production.
A separate live client session captured per-operation latency end-to-end against the production front-door on 2026-03-24. Each row is a real round-trip, including any client-side envelope signing. The 1.7 MB image upload completes in under six seconds; warm reads and public verifications return in under one second.
| Operation | Wall (ms) | Payload | Notes | Status |
|---|---|---|---|---|
| status | 7,790.4 | 25 B | cold first call | success |
| store_small_text | 4,912.1 | 270 B | delegate + sign + write | success |
| store_image_1685kb | 5,811 | 2.30 MB | 1.7 MB image, signed write | success |
| retrieve | 988.1 | 37 B | warm read | success |
| verify | 851.9 | 39 B | public verification | success |
| status_warm | 2,968.1 | 25 B | warm follow-up | success |
Read-side operations (retrieve, verify) land under a second. Writes complete in roughly five to six seconds, including client-side signed-envelope construction: 4.9 s for a 270 B text write and 5.8 s for the 1.7 MB image.
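The page does not say why the 1.7 MB image travels as a 2.30 MB payload. One explanation consistent with the numbers, offered here only as an assumption, is base64 encoding of the image inside the request body, which expands data by a factor of 4/3. The sketch also assumes "1685kb" in the operation name means 1,685 KiB.

```python
import base64


def base64_len(n_bytes: int) -> int:
    """Length of standard, padded base64 output for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


raw = 1685 * 1024  # assumption: "1685kb" = 1,685 KiB
encoded = base64_len(raw)
assert encoded == len(base64.b64encode(bytes(raw)))  # cross-check against the stdlib encoder
print(f"raw {raw / 1e6:.2f} MB -> base64 {encoded / 1e6:.2f} MB")  # raw 1.73 MB -> base64 2.30 MB
```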
§ 06 — hosted multi-agent integration
The protocol composes, inside an agent mesh.
The same Nukez agent slots into a hosted multi-agent orchestration mesh as a delegated specialist. In the canonical AAAP test stack a Claude-Sonnet-led router receives a user request, recognises storage-related keywords, and delegates verbatim to the Nukez agent at http://localhost:8080/v1/chat. The Nukez agent completes the storage operation and returns a structured result; the lead summarises it for the user.
| Field | Value |
|---|---|
| Orchestration mode | standard |
| Lead model | anthropic / claude-sonnet-4 |
| Delegated agent | nukez_agent · service · {base_url}/v1/chat |
| Delegation triggers | store · save · upload · retrieve · download · file · locker · nukez · storage · memory · remember · report · backup |
| Signing-request relay | Lead surfaces signing requests verbatim; developer authorises through the standard signed-envelope flow. |
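In the stack itself the lead model decides when to delegate, guided by `routing_prompt_extra`; the trigger list above describes what it looks for. A crude keyword pre-filter over that list can be sketched as follows. This is a hypothetical illustration, not multiagentz code: `should_delegate` is an invented name, and the plural handling is deliberately naive.

```python
import re

# Delegation triggers, copied from the table above.
TRIGGERS = frozenset({
    "store", "save", "upload", "retrieve", "download", "file", "locker",
    "nukez", "storage", "memory", "remember", "report", "backup",
})


def should_delegate(message: str) -> bool:
    """Hypothetical pre-filter: True if any whole word (naively de-pluralised) is a trigger."""
    for word in re.findall(r"[a-z]+", message.lower()):
        if word in TRIGGERS or word.rstrip("s") in TRIGGERS:
            return True
    return False


print(should_delegate("Please save these files to my locker"))  # True
print(should_delegate("Update my profile"))  # False: 'profile' is not 'file'
```

Whole-word matching avoids substring hits such as "file" inside "profile"; an LLM router, as used in the actual stack, also catches paraphrases ("keep a copy of this") that no keyword list would.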
Excerpt — canonical AAAP stack (aaap_test.yaml)

```yaml
name: aaap-integration-test
orchestration:
  mode: standard
lead:
  model: claude-sonnet-4-20250514
  provider: anthropic
  routing_prompt_extra: |
    You have access to a storage agent (nukez_agent) that can
    store, retrieve, and manage files via the Nukez decentralised
    storage network. Delegate any storage-related requests to it.
agents:
  nukez_agent:
    type: service
    base_url: http://localhost:8080
    endpoint: /v1/chat
    description: >
      Nukez storage specialist agent. Handles storing files,
      retrieving data, managing lockers, and all storage
      operations on the Nukez network.
```

§ 07 — zero-prior-knowledge integration
One discovery URL, one wallet, byte-equal data.
A single autonomous agent run, given only a discovery URL and a payment-enabled Solana wallet. No SDK, no examples, no prior knowledge of Nukez. The agent reads the published documentation, executes the full payment / authentication / storage / verification flow, and reports EXACT_MATCH against the original payload.
| Date | Model | Iterations | Duration | Verification | Receipt |
|---|---|---|---|---|---|
| 2026-01-22 | claude-sonnet-4 | 21 | 448.74s | EXACT_MATCH | d2c2d507159010f1 |
For the full thirteen-step walk-through — including the verbatim task prompt, the agent’s final structured output, and the transaction signature — see /proof/methodology §04.
§ 08 — cumulative footprint
What the program covers, at full scope.
Eleven dimensions of the testing program, in one card. Every entry traces back to an artifact published on this page or on /proof/methodology.
| Dimension | Value |
|---|---|
| Cloud providers exercised | 8 |
| Production models tested | 48 |
| Models at 100% pass | 31 |
| End-to-end runs (this corpus) | 720 |
| Aggregate all-pass runs | 614 |
| Aggregate run pass rate | 85.3% |
| Mainnet payment chains used | Solana + Monad (EVM) |
| Payment legs advertised | 5 (SOL · USDC · USDT0 · MON · WETH) |
| Largest upload verified | 5.4 MB · byte-perfect |
| Hosted-service integration | AAAP via multiagentz mesh |
| Cold-start integration | 21 iterations · zero prior knowledge |
§ 09 — reproducibility
Run it yourself.
Every datum on this page is reproducible from the public test harness against the production gateway. The wallet pattern, the test scripts, and the payment legs are the same ones every consumer uses.
- Gateway. https://api.nukez.xyz (production, no private path).
- Network. Solana mainnet plus Monad (EVM) on the payment-leg side. Real lamports and native tokens leave a real wallet on every request: no faucet, no devnet, no simulation.
- Models. All 48 are reachable via their providers' public APIs. No private model access.
- Sweep harness. The canonical end-to-end sweep that produced §01–§03 lives in the public agent-testing repo. The mainnet curl session in §04 reproduces step-by-step from the published curl log.
- Operation latency. The §05 timings come from a separate client (`nukezcortex-front-door`) hitting production. Same gateway, same wallet pattern.
- AAAP. The §06 multi-agent stack is a single YAML file. Stand up the NukezAgent service, run `maz --config aaap_test.yaml`, and the storage delegations land at the same gateway as everything else.
§ 10 — source artifacts
The artifacts this page summarises.
Five underlying artifacts produced the data on this page. Each is a primary source — the page is the synthesis, not the truth-of-record.
- Canonical end-to-end sweep. The 720-run, 48-model, 8-provider corpus that produced §01–§03. Every run records its tx_signature, receipt_id, locker_id, trace, and a byte-equality check on the round-trip.
- Mainnet curl test log · 2026-04-13 · The §04 step-by-step session against api.nukez.xyz. Records the verbatim curl commands, the HTTP 402 payment quote, the on-chain transfer, the receipt, the locker provision, file create + upload + download, and the byte-equality check on a 5.4 MB PDF.
- Operation-latency client session · 2026-03-24 · The §05 per-operation timings from a real client against the production front-door.
- AAAP integration stack · The §06 yaml that wires the Nukez agent into a hosted multi-agent mesh as a delegated storage specialist.
- Cold-start appendix · 2026-01-22 · The §07 single-agent zero-prior-knowledge run. Discovery URL only; the agent figured out everything else from the public documentation.
