How ZK LLM API Works

Full technical breakdown — from credit purchase to ZK proof to LLM response.

Overview

ZK LLM API lets anyone access a private LLM endpoint by paying with the CLAWD token. The server never learns who you are: it only verifies a zero-knowledge proof that you hold a valid, unspent credit in an onchain Merkle tree.

The system is fully open-source and self-hostable. Anyone can fork it, deploy their own contract, point it at any LLM provider, and run the same privacy-preserving access control.

About This Project

This is a first working prototype of the anonymous API credits concept from “ZK API Usage Credits: LLMs and Beyond” by Vitalik Buterin and Davide Crapis. It is MIT licensed and fully open source: fork it and deploy it for your own token, your own provider, your own chain.

It works, but it has a real limitation: you pay a flat rate per credit and get no refund if your actual API call uses fewer tokens than the budget. The next step is variable-cost, session-scoped API keys with a ZK counter — see issue #12 for the full RFC.

🔐 The Privacy Stack

ZK LLM API now combines two independent privacy layers: our zero-knowledge proofs and Venice's new end-to-end encrypted AI inference. Together they ensure that no single party knows both WHO you are and WHAT you're asking.

🛡️ Layer 1: ZK Proofs (hides WHO)

Our zero-knowledge proof breaks the link between your wallet and your API call. The server verifies a proof of valid credit — it never learns your identity, wallet address, or which onchain commitment you used.

🔒 Layer 2: Venice TEE/E2EE (hides WHAT)

Venice runs zai-org-glm-5 inside a hardware-secured Trusted Execution Environment (TEE). The TEE provides strong isolation: Venice and the GPU operator cannot access the enclave memory or computation — inference happens inside a black box verified by cryptographic remote attestation. Your prompts are processed by the enclave; the raw prompt data is not accessible to Venice infrastructure outside the TEE boundary. Each response is cryptographically signed by the enclave.

Combined result:

Layer         | What it hides                                   | Mechanism
------------- | ----------------------------------------------- | ---------
ZK proof (us) | WHO is paying / calling                         | Breaks wallet ↔ API call link on-chain
Venice TEE    | WHAT you're asking (enclave-isolated inference) | Hardware enclave; Venice infrastructure cannot access TEE memory/computation

No one — not us, not Venice, not the GPU operator, not the blockchain — knows both who you are and what you're asking. These are orthogonal privacy guarantees that reinforce each other.

End-to-End Flow (So Far)

1. Buy CLAWD

CLAWD is an ERC-20 token on Base mainnet. Swap ETH or USDC for CLAWD on any Base DEX. Token: 0x9f86dB9fc6f7c9408e8Fda3Ff8ce4e78ac7a6b07

2. Generate commitment locally

Your browser generates a random nullifier and secret. It computes commitment = Poseidon2(nullifier, secret) using Barretenberg's WASM prover. The nullifier and secret never leave your device.
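As a rough illustration of this step, the sketch below draws a nullifier and secret and derives a commitment. It is written in Python rather than the browser's WASM stack, and `stand_in_hash` is a SHA-256 placeholder, not Poseidon2, so its outputs will not match the real contract. `FIELD_MODULUS` is the BN254 scalar field that Noir's `Field` type uses with Barretenberg; `new_credit` is a hypothetical helper name.

```python
import hashlib
import secrets

# BN254 scalar field modulus (the range of a Noir Field element).
FIELD_MODULUS = 21888242871839275222246405745257275088548364400416034343698204186575808495617


def stand_in_hash(*inputs: int) -> int:
    """Placeholder for Poseidon2: SHA-256 over 32-byte inputs, reduced into the field."""
    data = b"".join(value.to_bytes(32, "big") for value in inputs)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % FIELD_MODULUS


def new_credit() -> dict:
    """Generate one credit's private values and its public commitment (hypothetical helper)."""
    nullifier = secrets.randbelow(FIELD_MODULUS)  # stays on the device
    secret = secrets.randbelow(FIELD_MODULUS)     # stays on the device
    return {
        "nullifier": nullifier,
        "secret": secret,
        "commitment": stand_in_hash(nullifier, secret),  # the only value registered onchain
    }


credit = new_credit()
print(hex(credit["commitment"]))
```

Only `credit["commitment"]` would be passed to the contract; the other two fields are what must be backed up to spend the credit later.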

3. Buy Credits in one transaction (calls stakeAndRegister())

You approve CLAWD, then the router purchases N credits by calling stakeAndRegister(amount, commitments[]) on the APICredits contract. The router swaps ETH → CLAWD at market rate and locks N × pricePerCredit CLAWD. The USD cost per credit is fixed (~$0.05 via onchain oracle); the CLAWD amount varies with market price. One transaction, N credits.
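To make the pricing relationship concrete, here is a small arithmetic sketch. The CLAWD/USD prices are invented for illustration, `clawd_to_lock` is a hypothetical name, and the contract's actual oracle, fee, and rounding behaviour are not modeled.

```python
from decimal import Decimal

USD_PER_CREDIT = Decimal("0.05")  # approximate fixed price quoted above


def clawd_to_lock(n_credits: int, clawd_price_usd: Decimal) -> Decimal:
    """CLAWD locked for n_credits at a given CLAWD/USD price (no fees or rounding modeled)."""
    return n_credits * USD_PER_CREDIT / clawd_price_usd


# Hypothetical prices: when CLAWD doubles in price, half as much is locked.
cheap = clawd_to_lock(10, Decimal("0.0001"))
pricey = clawd_to_lock(10, Decimal("0.0002"))
print(f"{cheap:f} CLAWD vs {pricey:f} CLAWD for the same 10 credits")
```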

4. Client fetches the Merkle tree

Your browser fetches the full Merkle tree from the API server's /tree endpoint. It finds your commitment's leaf index locally and computes the sibling path — the server never learns which commitment you're using.
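The local path computation might look roughly like the sketch below. It assumes the leaves have already been downloaded (the `/tree` response format is not specified here), uses a SHA-256 placeholder instead of Poseidon2, and pads odd levels with the zero hash for that level. `merkle_path` and `root_from_path` are hypothetical names.

```python
import hashlib

FIELD_MODULUS = 21888242871839275222246405745257275088548364400416034343698204186575808495617


def stand_in_hash(left: int, right: int) -> int:
    """Placeholder for Poseidon2(left, right); NOT the real hash."""
    data = left.to_bytes(32, "big") + right.to_bytes(32, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % FIELD_MODULUS


def merkle_path(leaves, index, depth):
    """Return (indices, siblings, root) for leaves[index] in a zero-padded tree."""
    level, zero = list(leaves), 0
    indices, siblings = [], []
    for _ in range(depth):
        if len(level) % 2 == 1:
            level.append(zero)              # pad with this level's zero hash
        indices.append(index & 1)           # 0 = left child, 1 = right child
        siblings.append(level[index ^ 1])   # neighbour under the same parent
        level = [stand_in_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        zero = stand_in_hash(zero, zero)
        index >>= 1
    return indices, siblings, level[0]


def root_from_path(leaf, indices, siblings):
    """Recompute the root from a leaf and its path, as a sanity check."""
    node = leaf
    for bit, sibling in zip(indices, siblings):
        node = stand_in_hash(sibling, node) if bit else stand_in_hash(node, sibling)
    return node


leaves = [11, 22, 33]                 # stand-ins for commitments from /tree
my_index = leaves.index(33)           # found locally, never sent to the server
indices, siblings, root = merkle_path(leaves, my_index, depth=2)
assert root_from_path(33, indices, siblings) == root
```

Because the whole tree is downloaded and searched locally, the request itself carries no hint about which leaf belongs to the caller.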

5. Client generates a ZK proof

Using the locally computed Merkle path, your browser runs the Noir circuit via Barretenberg UltraHonk. The proof shows: (a) you know a nullifier and secret whose Poseidon2 commitment is a leaf in the Merkle tree, and (b) the public nullifier_hash equals Poseidon2(nullifier). All private inputs stay on-device.

6. Server verifies and responds

The server verifies the UltraHonk proof against the onchain root, checks the nullifier hasn't been spent, marks it spent, then forwards your message to the Venice LLM API and returns the response.
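The order of checks in this step can be sketched as follows. This is not the project's server code: `CreditGate` is a hypothetical class, the proof verifier is injected as a plain callback standing in for UltraHonk verification, and accepting a set of roots (rather than exactly one) is an assumption.

```python
from typing import Callable, List, Set


class CreditGate:
    """Server-side bookkeeping around proof verification (hypothetical class)."""

    def __init__(self, verify_proof: Callable[[bytes, List[int]], bool], valid_roots: Set[int]):
        self.verify_proof = verify_proof   # stand-in for the UltraHonk verifier
        self.valid_roots = valid_roots     # roots read from the contract (assumed to be a set)
        self.spent: Set[int] = set()       # nullifier hashes already redeemed

    def redeem(self, proof: bytes, nullifier_hash: int, root: int, depth: int) -> bool:
        """Return True and mark the credit spent only if every check passes."""
        if root not in self.valid_roots:
            return False                   # proof targets an unknown tree state
        if nullifier_hash in self.spent:
            return False                   # double-spend attempt
        if not self.verify_proof(proof, [nullifier_hash, root, depth]):
            return False                   # invalid proof
        self.spent.add(nullifier_hash)
        return True                        # caller may now forward the prompt


# Usage with a fake verifier that accepts one fixed proof blob.
gate = CreditGate(lambda proof, public_inputs: proof == b"valid", valid_roots={42})
first = gate.redeem(b"valid", nullifier_hash=7, root=42, depth=3)
second = gate.redeem(b"valid", nullifier_hash=7, root=42, depth=3)
print(first, second)
```

Marking the nullifier hash as spent only after the proof verifies keeps an invalid request from burning someone else's credit.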

Two Ways to Use the API

DIY — Proof in your browser

Your browser generates the ZK proof using Barretenberg WASM. The nullifier and secret never leave your device.

  • ✅ Maximum privacy — server never sees your secret
  • ⚠️ Requires downloading the circuit (~500KB)
  • ⚠️ Proof takes 30–60s on first load

Used by the web chat interface and the proxy.

API key — Proof on the server

Send your nullifier, secret, and commitment to the backend. It generates the proof for you.

  • ✅ No circuit download, no setup
  • ✅ Proof in ~2–3s (server hardware)
  • ⚠️ The backend learns your nullifier and secret

See SKILL.md for the full API reference.

The ZK Circuit

Written in Noir, compiled with Barretenberg (UltraHonk backend). The circuit has:

Public inputs (verifier sees)

  • nullifier_hash — Poseidon2(nullifier)
  • root — onchain Merkle root
  • depth — current tree depth

Private inputs (never leave client)

  • nullifier — random 256-bit value
  • secret — random 256-bit value
  • indices[16] — Merkle path bits
  • siblings[16] — Merkle sibling hashes

// main.nr — the full circuit

use std::hash::poseidon2::Poseidon2;
use binary_merkle_root::binary_merkle_root;

fn main(
    nullifier_hash: pub Field,   // public
    root: pub Field,             // public
    depth: pub u32,              // public

    nullifier: Field,            // private
    secret: Field,               // private
    indices: [u1; 16],           // private
    siblings: [Field; 16],       // private
) {
    // 1. commitment = Poseidon2(nullifier, secret)
    let commitment = Poseidon2::hash([nullifier, secret], 2);

    // 2. commitment is in the Merkle tree
    let computed_root = binary_merkle_root(
        |pair: [Field; 2]| -> Field { Poseidon2::hash(pair, 2) },
        commitment, depth, indices, siblings,
    );
    assert(computed_root == root);

    // 3. nullifier_hash = Poseidon2(nullifier)
    let computed_nullifier_hash = Poseidon2::hash([nullifier], 1);
    assert(computed_nullifier_hash == nullifier_hash);
}

The circuit proves three things simultaneously without revealing the nullifier or secret: the commitment was correctly formed, it exists in the registered set, and the nullifier hash matches — enabling the server to track spent credits without learning which credit belongs to whom.
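For readers who do not write Noir, the same three constraints can be mirrored in ordinary Python. This is only a witness checker, not a proof system: it sees the private inputs directly. The hash is a SHA-256 placeholder rather than Poseidon2, and the left/right convention for `indices` (1 means the current node is the right child) is an assumption about how `binary_merkle_root` orders its inputs.

```python
import hashlib

FIELD_MODULUS = 21888242871839275222246405745257275088548364400416034343698204186575808495617
MAX_DEPTH = 16  # matches indices[16] / siblings[16] in main.nr


def stand_in_hash(*inputs: int) -> int:
    """Placeholder for Poseidon2; NOT the real hash."""
    data = b"".join(value.to_bytes(32, "big") for value in inputs)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % FIELD_MODULUS


def circuit_accepts(nullifier_hash, root, depth, nullifier, secret, indices, siblings) -> bool:
    """Python mirror of the three constraints in main.nr."""
    # 1. commitment = H(nullifier, secret)
    node = stand_in_hash(nullifier, secret)
    # 2. walk up the tree; array slots at or beyond `depth` are unused padding
    for level in range(MAX_DEPTH):
        if level < depth:
            if indices[level]:
                node = stand_in_hash(siblings[level], node)  # node is the right child
            else:
                node = stand_in_hash(node, siblings[level])  # node is the left child
    # 3. nullifier_hash = H(nullifier)
    return node == root and stand_in_hash(nullifier) == nullifier_hash


# Two-leaf tree (depth 1): our commitment on the left, another leaf on the right.
nullifier, secret, other_leaf = 1234, 5678, 999
root = stand_in_hash(stand_in_hash(nullifier, secret), other_leaf)
indices = [0] * MAX_DEPTH
siblings = [other_leaf] + [0] * (MAX_DEPTH - 1)
ok = circuit_accepts(stand_in_hash(nullifier), root, 1, nullifier, secret, indices, siblings)
print(ok)
```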

Poseidon2 Hashing

All hashing uses Poseidon2 — a ZK-friendly hash function designed for efficient in-circuit computation. Critically, this is not the same as the original Poseidon hash used by iden3/Circom.

We use Barretenberg's implementation (@aztec/bb.js v0.82.0), which must match exactly between the circuit, the API server, and the frontend client. Using any other Poseidon implementation will produce different hashes and invalid proofs.

Three hash operations in the system:

  • commitment = Poseidon2(nullifier, secret) — computed client-side, stored onchain
  • node = Poseidon2(left, right) — used at every level of the Merkle tree
  • nullifier_hash = Poseidon2(nullifier) — public, used to track spent credits

Incremental Merkle Tree

The onchain contract maintains a Semaphore-style incremental binary Merkle tree with max depth 16 (up to 65,536 leaves). Each registered commitment is a leaf.

Empty subtrees use precomputed zero hashes: zeros[0] = 0, zeros[i+1] = Poseidon2(zeros[i], zeros[i]). Every level always hashes two children — this matches Noir's binary_merkle_root exactly.

Why not LeanIMT?

LeanIMT promotes odd nodes to the next level without hashing, which doesn't match Noir's standard binary Merkle root algorithm. We use the Semaphore approach instead: every level hashes two children, padding with the zero hash for the current level.
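The zero-hash padding and the LeanIMT difference described in this section can be compared side by side in a short sketch. The hash is again a SHA-256 placeholder for Poseidon2, and `zero_hashes`, `padded_root`, and `lean_root` are hypothetical names; `lean_root` implements only the promotion rule described above, not the full LeanIMT library.

```python
import hashlib

FIELD_MODULUS = 21888242871839275222246405745257275088548364400416034343698204186575808495617


def stand_in_hash(left: int, right: int) -> int:
    """Placeholder for Poseidon2(left, right); NOT the real hash."""
    data = left.to_bytes(32, "big") + right.to_bytes(32, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % FIELD_MODULUS


def zero_hashes(depth: int) -> list:
    """zeros[0] = 0, zeros[i+1] = H(zeros[i], zeros[i])."""
    zeros = [0]
    for _ in range(depth):
        zeros.append(stand_in_hash(zeros[-1], zeros[-1]))
    return zeros


def padded_root(leaves, depth: int) -> int:
    """Semaphore-style root: every level hashes two children, padding with zeros[level]."""
    zeros, level = zero_hashes(depth), list(leaves)
    for d in range(depth):
        if len(level) % 2 == 1:
            level.append(zeros[d])
        level = [stand_in_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0] if level else zeros[depth]


def lean_root(leaves) -> int:
    """LeanIMT-style root: an unpaired node is promoted without hashing."""
    level = list(leaves)
    while len(level) > 1:
        parents = [stand_in_hash(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            parents.append(level[-1])  # promoted as-is
        level = parents
    return level[0]


# With an odd leaf count the two constructions disagree; with a full level they agree.
assert lean_root([1, 2, 3]) != padded_root([1, 2, 3], depth=2)
assert lean_root([1, 2, 3, 4]) == padded_root([1, 2, 3, 4], depth=2)
```

A proof built against one construction would fail against a root produced by the other whenever a level has an unpaired node, which is why the contract and the circuit have to agree on the padding rule.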

🎯 Model Policy

The server runs zai-org-glm-5 for all API calls. Any model field in your request is accepted but ignored.

Privacy Guarantees (Current)

✅ Server never sees your wallet address

The proof is generated client-side. The server receives only the proof, nullifier_hash, and your message.

✅ Server cannot link two API calls

Each credit has a unique nullifier. There's no correlation between calls unless you reuse a credential.

✅ Server cannot identify which leaf you used

The ZK proof proves membership in the set without revealing the index or commitment.

⚠️ Prompt visibility depends on the model

The API server handles LLM routing. When using Venice TEE/E2EE models, your prompt is encrypted end-to-end, so even Venice and the GPU operator can't see it. For non-E2EE models, the server sees your plaintext message; for full privacy with those, self-host the server.

⚠️ Credits are stored in localStorage

If you clear your browser storage, unspent credits are gone: the CLAWD stays locked onchain, but the credentials needed to spend it are lost. Back them up, or better yet, script the purchase and let your bot manage credits automatically via the API described in SKILL.md.

Self-Hosting

Everything is open-source. You can deploy your own instance pointing at any LLM provider.

# Clone both repos
git clone https://github.com/clawdbotatg/zk-api-credits   # contracts + API server
git clone https://github.com/clawdbotatg/zk-llm-frontend   # frontend
cd zk-api-credits

# Configure
cp packages/api-server/.env.example packages/api-server/.env
# Set: CONTRACT_ADDRESS, VENICE_API_KEY (or any OpenAI-compatible key), RPC_URL

# Compile contracts (Foundry)
(cd packages/contracts && forge build)   # subshell: stay in the repo root for the steps below

# Deploy contract (Foundry)
# See packages/contracts/script/Deploy.s.sol for instructions

# Run API server
docker build -f packages/api-server/Dockerfile -t zk-api-server .
docker run -p 3001:3001 --env-file packages/api-server/.env zk-api-server

# Deploy frontend (Vercel)
cd ../zk-llm-frontend
NEXT_PUBLIC_API_URL=https://your-server.com vercel deploy

What Else Is Left to Build

The build order toward the full paper vision, roughly ordered by complexity.

1. Generalized API Support

Complexity: Low

The contract is already generic. Swap the hardcoded Venice routing for a pluggable proxy layer — any OpenAI-compatible endpoint, any fixed-cost API. RPC nodes, image generation, VPNs, data APIs. Makes this a platform, not just an LLM wrapper.

2. Dual Staking (Policy Stake)

Complexity: Low–Medium

Split the deposit into D (RLN stake) and S (policy stake). The server can burn S but never claim it — removing any incentive to falsely ban users. Pure contract change, no circuit modifications.

3. Variable Cost + Refund Tickets

Complexity: Medium

Venice returns token counts on every response. The server signs a refund ticket for unused capacity (C_max - C_actual). The client accumulates these locally. Unlocks efficient per-token pricing instead of fixed credits.

4. Rate-Limit Nullifiers (RLN)

Complexity: Medium–High

Replace single-use nullifiers with RLN. Each request uses a ticket index i; the signal is y = secret + Hash(secret, i) × Hash(message). Reusing the same index with a different message reveals the secret key mathematically. Requires porting the RLN circuit to Noir and updating the contract, server, and frontend.
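The secret-recovery property follows from the signal being a point on a line with intercept `secret`: two distinct points on the same line determine it. The sketch below works through that arithmetic in the BN254 scalar field. The hash functions are SHA-256 placeholders (an assumption; a real deployment would use a circuit-friendly hash), and `rln_share` and `recover_secret` are hypothetical names.

```python
import hashlib

FIELD_MODULUS = 21888242871839275222246405745257275088548364400416034343698204186575808495617


def hash_ints(*inputs: int) -> int:
    """Placeholder field hash over integers; NOT the real in-circuit hash."""
    data = b"".join(value.to_bytes(32, "big") for value in inputs)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % FIELD_MODULUS


def hash_message(message: bytes) -> int:
    """Placeholder hash of the request body into the field."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % FIELD_MODULUS


def rln_share(secret: int, index: int, message: bytes):
    """Return (x, y) with y = secret + Hash(secret, index) * Hash(message)."""
    slope = hash_ints(secret, index)
    x = hash_message(message)
    return x, (secret + slope * x) % FIELD_MODULUS


def recover_secret(share_a, share_b) -> int:
    """Interpolate the line through two shares of the same index and return its intercept."""
    (x1, y1), (x2, y2) = share_a, share_b
    dx = (x1 - x2) % FIELD_MODULUS  # zero only if both messages hash to the same x
    slope = (y1 - y2) * pow(dx, -1, FIELD_MODULUS) % FIELD_MODULUS
    return (y1 - slope * x1) % FIELD_MODULUS


secret = 123456789
honest = rln_share(secret, 0, b"first request")
reused = rln_share(secret, 0, b"second request, same ticket index")
assert recover_secret(honest, reused) == secret
```

A single share reveals nothing about the intercept, since any secret is consistent with one point; only reusing a ticket index exposes it.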

5. RLN Slashing

Complexity: Medium

Once RLN is in place, slashing is a contract function: submit two (nullifier, x, y) pairs for the same index, recover the secret key, verify it matches a tree leaf, burn the deposit. Anyone can slash — no trusted arbiter needed.

6. ZK Solvency Proof

Complexity: Very High

The circuit proves (ticket_index + 1) × C_max ≤ deposit + Σ(refunds), verifying server signatures on refund tickets as private inputs. Requires a ZK-friendly signing scheme and is the most complex circuit change in the roadmap. The full paper vision lives here.
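Outside a circuit, the inequality itself is a one-line check. The sketch below evaluates it on invented numbers to show how refunds extend the number of usable tickets; `is_solvent` is a hypothetical name, and signature verification on the refund tickets is not modeled.

```python
def is_solvent(ticket_index: int, c_max: int, deposit: int, refunds: list) -> bool:
    """Check (ticket_index + 1) * C_max <= deposit + sum(refunds)."""
    return (ticket_index + 1) * c_max <= deposit + sum(refunds)


# Illustrative numbers: a deposit of 100 units and a worst-case cost of 40 per call.
deposit, c_max = 100, 40
without_refunds = [is_solvent(i, c_max, deposit, []) for i in range(3)]
with_refunds = is_solvent(2, c_max, deposit, [15, 10])  # two earlier calls came in under budget
print(without_refunds, with_refunds)
```

With no refunds the third call (index 2) would need 120 units against a deposit of 100; the two refunds totalling 25 bring the available balance to 125, so the same call passes.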

7. Homomorphic Refund Accumulation

Complexity: High

Replace the growing refund ticket list with a single Pedersen Commitment the server updates homomorphically — without learning the user's balance. Constant client-side state regardless of call count. An optimization on top of Step 6.
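The additive property this relies on can be shown with a deliberately tiny Pedersen commitment. The group here is the order-11 subgroup of the integers modulo 23, which is far too small to be secure and is chosen only so the numbers are easy to follow; a real design would use an elliptic-curve group with generators whose relative discrete log is unknown. `commit` and `add_refund` are hypothetical names.

```python
# Toy parameters, NOT secure: subgroup of prime order 11 inside the integers mod 23.
P_MOD = 23
Q_ORDER = 11
G = 4  # generator of the order-11 subgroup
H = 9  # second generator; in a real scheme nobody may know log_G(H)


def commit(value: int, blinding: int) -> int:
    """Pedersen commitment C = G^value * H^blinding (mod P_MOD)."""
    return pow(G, value % Q_ORDER, P_MOD) * pow(H, blinding % Q_ORDER, P_MOD) % P_MOD


def add_refund(commitment: int, refund: int) -> int:
    """Server-side update: fold a refund into the commitment without opening it."""
    return commitment * pow(G, refund % Q_ORDER, P_MOD) % P_MOD


balance, blinding = 3, 7                 # known only to the client
running = commit(balance, blinding)      # what the server holds
running = add_refund(running, 2)         # refund after call 1
running = add_refund(running, 4)         # refund after call 2

# The client can still open the result with its own blinding factor.
assert running == commit(balance + 2 + 4, blinding)
```

The client stores one commitment and one blinding factor no matter how many refunds accumulate, which is the constant-state property described above.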

See the paper for the full concept. The code is MIT licensed; fork it to build it your way.

Links