Nukez

The Benchmark

48 production cloud models across 8 providers. 720 end-to-end runs on Solana mainnet. 614 all-pass. 31 at a perfect 15 / 15. Real receipts, real signatures, byte-perfect verification.

§ 01 — at a glance

The full result, in one frame.

Every cell below traces back to the same source: a single canonical end-to-end run on Solana mainnet. A run counts as a pass only when the agent completes the full workflow — request, pay, confirm, provision, create, upload, download — and the downloaded bytes match the uploaded bytes exactly. No partial credit, no synthetic data, no mocks.

614/720   aggregate all-pass
85.3%     run pass rate
48        production models
31        models at 100%
8         cloud providers
1         denominator, no rollups

The number to the right of the slash is every attempted run in the corpus. The number to the left is every attempted run that completed the entire workflow and produced a byte-identical round-trip. Of the 48 production models exercised, 31 clear the workflow on every attempt; the remainder land along the distribution shown in §03 below.
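
The pass rule above can be sketched in a few lines of Python. This is an illustration only, not the harness itself: the `RunRecord` type, its field names, and the helper functions are hypothetical. Only the seven workflow steps and the byte-equality requirement come from this page.

```python
from dataclasses import dataclass

# The seven workflow steps named on this page, in order.
WORKFLOW_STEPS = ("request", "pay", "confirm", "provision",
                  "create", "upload", "download")


@dataclass
class RunRecord:
    """Hypothetical record of one end-to-end run (field names are assumptions)."""
    completed_steps: tuple
    uploaded: bytes
    downloaded: bytes


def run_passes(run: RunRecord) -> bool:
    """All-pass: every step completed in order AND an exact byte round-trip."""
    return (run.completed_steps == WORKFLOW_STEPS
            and run.downloaded == run.uploaded)


def pass_rate(runs: list) -> float:
    """Share of attempted runs that pass; every attempt stays in the denominator."""
    return sum(run_passes(r) for r in runs) / len(runs)


good = RunRecord(WORKFLOW_STEPS, b"payload", b"payload")
stalled = RunRecord(WORKFLOW_STEPS[:3], b"payload", b"")      # stopped after confirm
corrupt = RunRecord(WORKFLOW_STEPS, b"payload", b"payl0ad")   # one byte differs

print([run_passes(r) for r in (good, stalled, corrupt)])  # [True, False, False]
print(f"{614 / 720:.1%}")                                  # 85.3%
```

A stalled run and a corrupted round-trip both count as failures here, which is what "no partial credit" amounts to.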

§ 02 — by provider

Eight providers, one workflow.

Every provider in the catalog runs the same canonical workflow against the same gateway. Anthropic, Google, Mistral, and Inception models route through their native SDKs; xAI, Cohere, NVIDIA, and OpenAI models are exercised through the OpenAI-compatible API surface. The gateway sees no difference between them.

Provider    Models   Runs   All-pass   Rate
anthropic        7    105        104   99.0%
google           8    120        117   97.5%
mistral          5     75         72   96.0%
inception        1     15         14   93.3%
openai          13    195        171   87.7%
xai              8    120         93   77.5%
nvidia           2     30         18   60.0%
cohere           4     60         25   41.7%
aggregate       48    720        614   85.3%
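
The table can be cross-checked mechanically. The sketch below copies the eight provider rows from above and recomputes each rate and the aggregate; the 15-runs-per-model constant is inferred from the per-model results in §03, and the variable names are illustrative.

```python
# (provider, models, runs, all_pass) copied from the provider table.
ROWS = [
    ("anthropic", 7, 105, 104),
    ("google", 8, 120, 117),
    ("mistral", 5, 75, 72),
    ("inception", 1, 15, 14),
    ("openai", 13, 195, 171),
    ("xai", 8, 120, 93),
    ("nvidia", 2, 30, 18),
    ("cohere", 4, 60, 25),
]

RUNS_PER_MODEL = 15  # inferred: every model row in section 03 shows n/15

for provider, models, runs, passed in ROWS:
    # Each provider's run count should be its model count times 15.
    assert runs == models * RUNS_PER_MODEL, provider
    print(f"{provider:<10} {passed:>3}/{runs:<3} {passed / runs:6.1%}")

total_models = sum(row[1] for row in ROWS)
total_runs = sum(row[2] for row in ROWS)
total_pass = sum(row[3] for row in ROWS)
print(f"{'aggregate':<10} {total_pass:>3}/{total_runs:<3} {total_pass / total_runs:6.1%}")
```

The provider rows sum to 48 models, 720 runs, and 614 all-pass runs, matching the aggregate row.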

§ 03 — per-model results

All forty-eight models, one table.

Every model the catalog advertises, every run it attempted, its end-to-end pass rate. The 31 perfect-score models appear first; the long tail follows in descending order. The bar to clear is end-to-end completion plus byte-equal verification — the same bar for every row.

Provider    Model                                    Passed/Runs   Pass rate
anthropic   claude-haiku-4-5-20251001                15/15         100.0%
anthropic   claude-opus-4-1-20250805                 15/15         100.0%
anthropic   claude-opus-4-5-20251101                 15/15         100.0%
anthropic   claude-opus-4-6                          15/15         100.0%
anthropic   claude-sonnet-4-5-20250929               15/15         100.0%
anthropic   claude-sonnet-4-6                        15/15         100.0%
google      gemini-2.5-flash                         15/15         100.0%
google      gemini-2.5-pro                           15/15         100.0%
google      gemini-3-flash-preview                   15/15         100.0%
google      gemini-3.1-flash-lite                    15/15         100.0%
google      gemini-3.1-flash-lite-preview            15/15         100.0%
google      gemini-3.1-pro-preview                   15/15         100.0%
mistral     devstral-latest                          15/15         100.0%
mistral     magistral-medium-latest                  15/15         100.0%
mistral     mistral-large-latest                     15/15         100.0%
mistral     mistral-medium-latest                    15/15         100.0%
openai      gpt-4.1                                  15/15         100.0%
openai      gpt-4.1-mini                             15/15         100.0%
openai      gpt-4o                                   15/15         100.0%
openai      gpt-5-mini                               15/15         100.0%
openai      gpt-5.1                                  15/15         100.0%
openai      gpt-5.4                                  15/15         100.0%
openai      gpt-5.4-mini                             15/15         100.0%
openai      gpt-5.5                                  15/15         100.0%
openai      o3                                       15/15         100.0%
openai      o4-mini                                  15/15         100.0%
xai         grok-3                                   15/15         100.0%
xai         grok-4-0709                              15/15         100.0%
xai         grok-4-1-fast-reasoning                  15/15         100.0%
xai         grok-4-fast-reasoning                    15/15         100.0%
xai         grok-code-fast-1                         15/15         100.0%
anthropic   claude-opus-4-7                          14/15         93.3%
cohere      command-a-reasoning-08-2025              14/15         93.3%
google      gemini-2.0-flash                         14/15         93.3%
inception   mercury-2                                14/15         93.3%
google      gemini-2.5-flash-lite                    13/15         86.7%
openai      gpt-5-nano                               13/15         86.7%
mistral     mistral-small-latest                     12/15         80.0%
nvidia      nvidia/nemotron-3-nano-30b-a3b           12/15         80.0%
xai         grok-3-mini                              12/15         80.0%
cohere      command-a-03-2025                        11/15         73.3%
openai      gpt-5.4-nano                             8/15          53.3%
nvidia      nvidia/llama-3.3-nemotron-super-49b-v1   6/15          40.0%
xai         grok-4.20-0309-non-reasoning             4/15          26.7%
xai         grok-4.20-0309-reasoning                 2/15          13.3%
cohere      command-r-plus-08-2024                   0/15          0.0%
cohere      command-r7b-12-2024                      0/15          0.0%
openai      gpt-4.1-nano                             0/15          0.0%

§ 04 — end-to-end mainnet proof

A single live session, every artifact verifiable.

The numbers above are the population. This is one member of that population — a single live session on https://api.nukez.xyz on 2026-04-13. Every identifier below is a real on-chain or in-gateway artifact: the transaction is on Solana mainnet; the receipt and locker are records in the production gateway; the uploaded file is byte-equal to the downloaded file.

Network                 solana-mainnet
Quote                   $20.00 · 0.240211387 SOL @ $83.26 / SOL · 5 payment legs advertised (SOL · USDC · USDT0 · MON · WETH)
Payer pubkey            BhBeSkwKyqysZstzkqdf4qAcYfS9r27wEMmouvSVfp1U
Transaction signature   4QAaRa4LBUT4311HBsSiw88JoDyZCzW4ViFmaSz88MhKD3nzLL2W6KSzhcDUt1VmNHqPUnZExazAUpy6b9W3Sx1g
Receipt ID              ee330fef107cfe66
Locker ID               locker_67670a230a15
Single-file upload      Anza.pdf · 5,395,694 bytes uploaded, 5,395,694 bytes downloaded, byte-equal
Batch upload            3-file batch · video / kml / json · all HTTP 200 on the signed upload URLs
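
The quote arithmetic can be checked by hand. The sketch below converts the quoted SOL amount to lamports and compares it with $20.00 at $83.26 / SOL. Rounding up to the next whole lamport happens to reproduce the quoted figure exactly; that rounding rule is an assumption, since the page does not state how the gateway rounds or whether the displayed SOL price is itself rounded.

```python
from decimal import Decimal, ROUND_CEILING

LAMPORTS_PER_SOL = 10**9  # 1 SOL = 1,000,000,000 lamports

quote_usd = Decimal("20.00")        # quoted price
sol_usd = Decimal("83.26")          # SOL price shown in the quote
paid_sol = Decimal("0.240211387")   # SOL amount shown in the quote

# Assumption: round UP to a whole lamport so the payment never falls short.
exact = quote_usd / sol_usd
rounded_up = exact.quantize(Decimal("0.000000001"), rounding=ROUND_CEILING)

# Integer lamports, the unit an on-chain transfer is denominated in.
lamports = int(paid_sol * LAMPORTS_PER_SOL)

print(rounded_up)          # 0.240211387
print(lamports)            # 240211387
print(paid_sol * sol_usd)  # 20.00000008162
```

Using `Decimal` rather than floats keeps the nine-decimal amount exact, which matters when the on-chain unit is an integer.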

Wire excerpt — POST /v1/storage/confirm response

{
  "ok": true,
  "receipt_id": "ee330fef107cfe66",
  "receipt": {
    "id": "ee330fef107cfe66",
    "resource": "storage",
    "provider": "gcs",
    "tx_hash": "4QAaRa4LBUT4311HBsSiw88JoDyZCzW4ViFmaSz88MhKD3nzLL2W6KSzhcDUt1VmNHqPUnZExazAUpy6b9W3Sx1g",
    "payer_pubkey": "BhBeSkwKyqysZstzkqdf4qAcYfS9r27wEMmouvSVfp1U",
    "pay_to_address": "HqrTLaNk89dHPVR5zyMPtEiRnQfabmqc8B9yR3KVbyVp",
    "network": "solana:5eykt4UsFv8P8NJdTREpY1vzqKqZKvdp",
    "pay_asset": "SOL",
    "units": 1,
    "unit_price_usd": 20.00,
    "paid_amount": "0.240211387",
    "slot": 412988282,
    "receipt_sig_alg": "ed25519",
    "receipt_hash": "01f365df877adccabfcfeeef31edcfd7653a90364cafd74846257b0338708d31"
  }
}
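
A client can run structural sanity checks on this response before trusting it. The sketch below is a hypothetical helper, not part of any Nukez SDK: it checks that the two receipt IDs agree and that the base58 and hex fields decode to the byte lengths Solana uses. It does not verify the ed25519 receipt signature, because the excerpt includes neither the signature bytes nor the signing key. The 32-byte length check on `receipt_hash` only assumes a 256-bit digest; the hash function is not named here.

```python
import json

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text: str) -> bytes:
    """Decode a Bitcoin-alphabet base58 string (the encoding Solana uses)."""
    value = 0
    for char in text:
        value = value * 58 + B58_ALPHABET.index(char)  # ValueError on a bad character
    body = value.to_bytes((value.bit_length() + 7) // 8, "big")
    leading_zeros = len(text) - len(text.lstrip("1"))  # each leading '1' is a zero byte
    return b"\x00" * leading_zeros + body


def check_confirm_response(raw: str) -> dict:
    """Structural checks only; this does NOT verify the receipt signature."""
    doc = json.loads(raw)
    receipt = doc["receipt"]
    assert doc["ok"] is True
    assert doc["receipt_id"] == receipt["id"]
    assert len(b58decode(receipt["tx_hash"])) == 64        # Solana signatures: 64 bytes
    assert len(b58decode(receipt["payer_pubkey"])) == 32   # Solana public keys: 32 bytes
    assert len(bytes.fromhex(receipt["receipt_hash"])) == 32
    assert receipt["network"].startswith("solana:")
    return receipt


# Subset of the wire excerpt above.
RAW = """
{
  "ok": true,
  "receipt_id": "ee330fef107cfe66",
  "receipt": {
    "id": "ee330fef107cfe66",
    "tx_hash": "4QAaRa4LBUT4311HBsSiw88JoDyZCzW4ViFmaSz88MhKD3nzLL2W6KSzhcDUt1VmNHqPUnZExazAUpy6b9W3Sx1g",
    "payer_pubkey": "BhBeSkwKyqysZstzkqdf4qAcYfS9r27wEMmouvSVfp1U",
    "network": "solana:5eykt4UsFv8P8NJdTREpY1vzqKqZKvdp",
    "paid_amount": "0.240211387",
    "slot": 412988282,
    "receipt_hash": "01f365df877adccabfcfeeef31edcfd7653a90364cafd74846257b0338708d31"
  }
}
"""

receipt = check_confirm_response(RAW)
print(receipt["id"], receipt["slot"])
```

Confirming that the transaction exists on chain at the stated slot is a separate step against a Solana RPC endpoint and is out of scope for this sketch.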

Wire excerpt — 5.4 MB download verification

# Download via the signed download URL
DOWNLOAD_URL="https://api.nukez.xyz/f/AhNsb2NrZXJfNjc2NzBhMjMwYTE1..."
curl -sS -o /tmp/Anza_downloaded.pdf -w "DOWNLOAD_HTTP=%{http_code} SIZE=%{size_download}\n" \
  -L "$DOWNLOAD_URL"
# DOWNLOAD_HTTP=200 SIZE=5395694

# Original 5,395,694 bytes  =  Downloaded 5,395,694 bytes  ✓
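
The curl excerpt confirms the byte counts match. A size match alone does not prove identical content, so a byte-equality check also compares the bytes themselves. The sketch below does this with SHA-256 digests; whether the harness hashes or compares directly is not stated on this page, and the function names are illustrative.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def byte_equal(original: Path, downloaded: Path) -> bool:
    """True only when both files have the same size and the same SHA-256 digest."""
    if original.stat().st_size != downloaded.stat().st_size:
        return False
    return sha256_of(original) == sha256_of(downloaded)


# Demo on throwaway files: an identical copy, and a same-size file with one byte changed.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp, "original.bin")
    copy = Path(tmp, "downloaded.bin")
    tampered = Path(tmp, "tampered.bin")
    original.write_bytes(b"\x00\x01" * 1000)
    copy.write_bytes(b"\x00\x01" * 1000)
    tampered.write_bytes(b"\x00\x01" * 999 + b"\x00\x02")
    same = byte_equal(original, copy)
    differs = byte_equal(original, tampered)

print(same, differs)  # True False
```

For the session above, the call would be `byte_equal(Path("Anza.pdf"), Path("/tmp/Anza_downloaded.pdf"))`, assuming the original file is in the working directory.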

§ 05 — operation latency

Per-operation wall clock, real client to production.

A separate live client session captured per-operation latency end-to-end against the production front-door (2026-03-24). Each row is a real round-trip including any client-side envelope signing. The 1.7 MB image upload completes in under six seconds; warm reads and public verifications return in under one.

Operation            Wall (ms)   Payload   Notes                        Status
status                 7,790.4   25 B      cold first call              success
store_small_text       4,912.1   270 B     delegate + sign + write      success
store_image_1685kb     5,811     2.30 MB   1.7 MB image, signed write   success
retrieve                 988.1   37 B      warm read                    success
verify                   851.9   39 B      public verification          success
status_warm            2,968.1   25 B      warm follow-up               success

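Read-side operations land under a second: retrieve at 988.1 ms, verify at 851.9 ms. Writes complete in roughly five to six seconds, from the 270 B text write to the 1.7 MB image, including client-side construction of the signed envelope. The status call is the slowest read: 7,790.4 ms cold and 2,968.1 ms on the warm follow-up.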
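
Per-operation wall-clock figures like these are typically captured by timing the whole client call, signing included. The sketch below shows one way to do that with `time.perf_counter`. The `timed` helper and the stand-in operation are hypothetical and do not reflect the nukezcortex-front-door client's actual code.

```python
import time
from typing import Callable


def timed(name: str, operation: Callable[[], bytes]) -> dict:
    """Run one operation and report wall-clock milliseconds, like one table row."""
    start = time.perf_counter()
    try:
        payload = operation()
        status = "success"
    except Exception:
        payload, status = b"", "error"
    wall_ms = (time.perf_counter() - start) * 1000.0
    return {"operation": name, "wall_ms": round(wall_ms, 1),
            "payload_bytes": len(payload), "status": status}


def fake_retrieve() -> bytes:
    """Stand-in for a real client call: sleeps briefly and returns 37 bytes."""
    time.sleep(0.01)
    return b"x" * 37


row = timed("retrieve", fake_retrieve)
print(row)
```

Because the timer wraps the entire callable, any client-side envelope signing done inside it is included in the reported figure.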

§ 06 — hosted multi-agent integration

The protocol composes, inside an agent mesh.

The same Nukez agent slots into a hosted multi-agent orchestration mesh as a delegated specialist. In the canonical AAAP test stack, a Claude Sonnet-led router receives a user request, recognises storage-related keywords, and delegates the request verbatim to the Nukez agent at http://localhost:8080/v1/chat. The Nukez agent completes the storage operation and returns a structured result; the lead summarises it for the user.

Orchestration mode      standard
Lead model              anthropic / claude-sonnet-4
Delegated agent         nukez_agent · service · {base_url}/v1/chat
Delegation triggers     store · save · upload · retrieve · download · file · locker · nukez · storage · memory · remember · report · backup
Signing-request relay   Lead surfaces signing requests verbatim; developer authorises through the standard signed-envelope flow.

Excerpt — canonical AAAP stack (aaap_test.yaml)

name: aaap-integration-test

orchestration:
  mode: standard

lead:
  model: claude-sonnet-4-20250514
  provider: anthropic

  routing_prompt_extra: |
    You have access to a storage agent (nukez_agent) that can
    store, retrieve, and manage files via the Nukez decentralised
    storage network. Delegate any storage-related requests to it.

  agents:
    nukez_agent:
      type: service
      base_url: http://localhost:8080
      endpoint: /v1/chat
      description: >
        Nukez storage specialist agent. Handles storing files,
        retrieving data, managing lockers, and all storage
        operations on the Nukez network.
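
The delegation decision can be approximated with a plain keyword match. In the real stack the lead is a language model steered by the routing prompt, so the sketch below is a deterministic stand-in rather than the mesh's actual logic; `should_delegate` is a hypothetical name, and the keyword set is copied from the delegation triggers listed above.

```python
import re

# Trigger keywords from the delegation card in this section.
TRIGGERS = {"store", "save", "upload", "retrieve", "download", "file", "locker",
            "nukez", "storage", "memory", "remember", "report", "backup"}


def should_delegate(message: str) -> bool:
    """Return True when any whole word in the message is a trigger keyword."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return not TRIGGERS.isdisjoint(words)


print(should_delegate("Please upload this PDF to my locker"))  # True
print(should_delegate("What is the capital of France?"))        # False
```

Whole-word matching means inflected forms such as "files" or "stored" do not match in this sketch; a language-model router would be expected to handle those.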

§ 07 — zero-prior-knowledge integration

One discovery URL, one wallet, byte-equal data.

A single autonomous agent run, given only a discovery URL and a payment-enabled Solana wallet. No SDK, no examples, no prior knowledge of Nukez. The agent reads the published documentation, executes the full payment / authentication / storage / verification flow, and reports EXACT_MATCH against the original payload.

Date         Model             Iterations   Duration   Verification   Receipt
2026-01-22   claude-sonnet-4   21           448.74 s   EXACT_MATCH    d2c2d507159010f1

For the full thirteen-step walk-through — including the verbatim task prompt, the agent’s final structured output, and the transaction signature — see /proof/methodology §04.

§ 08 — cumulative footprint

What the program covers, at full scope.

Eleven dimensions of the testing program, in one card. Every entry traces back to an artifact published on this page or on /proof/methodology.

Cloud providers exercised       8
Production models tested        48
Models at 100% pass             31
End-to-end runs (this corpus)   720
Aggregate all-pass runs         614
Aggregate run pass rate         85.3%
Mainnet payment chains used     Solana + Monad (EVM)
Payment legs advertised         5 (SOL · USDC · USDT0 · MON · WETH)
Largest upload verified         5.4 MB · byte-perfect
Hosted-service integration      AAAP via multiagentz mesh
Cold-start integration          21 iterations · zero prior knowledge

§ 09 — reproducibility

Run it yourself.

Every datum on this page is reproducible from the public test harness against the production gateway. The wallet pattern, the test scripts, and the payment legs are the same ones every consumer uses.

  • Gateway. https://api.nukez.xyz — production, no private path.
  • Network. Solana mainnet plus Monad (EVM) on the payment-leg side. Real lamports / native tokens out of a real wallet on every request — no faucet, no devnet, no simulation.
  • Models. All 48 are reachable via their providers' public APIs. No private model access.
  • Sweep harness. The canonical end-to-end sweep that produced §01–§03 lives in the public agent-testing repo. The mainnet curl session in §04 reproduces step-by-step from the published curl log.
  • Operation latency. The §05 timings come from a separate client (nukezcortex-front-door) hitting production. Same gateway, same wallet pattern.
  • AAAP. The §06 multi-agent stack is a single YAML file. Stand up the NukezAgent service, run maz --config aaap_test.yaml, and the storage delegations land at the same gateway as everything else.

§ 10 — source artifacts

The artifacts this page summarises.

Five underlying artifacts produced the data on this page. Each is a primary source — the page is the synthesis, not the truth-of-record.

  • Canonical end-to-end sweep. The 720-run, 48-model, 8-provider corpus that produced §01–§03. Every run records its tx_signature, receipt_id, locker_id, trace, and a byte-equality check on the round-trip.
  • Mainnet curl test log · 2026-04-13 · The §04 step-by-step session against api.nukez.xyz. Records the verbatim curl commands, the 402 payment quote, the on-chain transfer, the receipt, the locker provision, file create + upload + download, and the byte-equality check on a 5.4 MB PDF.
  • Operation-latency client session · 2026-03-24 · The §05 per-operation timings from a real client against the production front-door.
  • AAAP integration stack · The §06 yaml that wires the Nukez agent into a hosted multi-agent mesh as a delegated storage specialist.
  • Cold-start appendix · 2026-01-22 · The §07 single-agent zero-prior-knowledge run. Discovery URL only; the agent figured out everything else from the public documentation.
