How ZK LLM API Works
Full technical breakdown — from credit purchase to ZK proof to LLM response.
Overview
ZK LLM API lets anyone access a private LLM endpoint by paying with CLAWD token. The server never knows who you are — it only verifies a zero-knowledge proof that you hold a valid, unspent credit in an onchain Merkle tree.
The system is fully open-source and self-hostable. Anyone can fork it, deploy their own contract, point it at any LLM provider, and run the same privacy-preserving access control.
About This Project
This is a first working prototype of the anonymous API credits concept from “ZK API Usage Credits: LLMs and Beyond” by Vitalik Buterin and Davide Crapis. It is MIT licensed and fully open source: fork it and deploy it for your own token, your own provider, and your own chain.
It works, but it has a real limitation: you pay a flat rate per credit and get no refund if your actual API call uses fewer tokens than the budget. The next step is variable-cost, session-scoped API keys with a ZK counter — see issue #12 for the full RFC.
🔐 The Privacy Stack
ZK LLM API combines two independent, orthogonal privacy layers: our zero-knowledge proofs and Venice's end-to-end encrypted AI inference. Together, they ensure that nobody knows both WHO you are AND WHAT you're asking.
Layer 1: ZK Proofs — Hides WHO
Our zero-knowledge proof breaks the link between your wallet and your API call. The server verifies a proof of valid credit — it never learns your identity, wallet address, or which onchain commitment you used.
Layer 2: Venice TEE/E2EE — Hides WHAT
Venice runs zai-org-glm-5 inside a hardware-secured Trusted Execution Environment (TEE). The TEE provides strong isolation: Venice and the GPU operator cannot access the enclave memory or computation — inference happens inside a black box verified by cryptographic remote attestation. Your prompts are processed by the enclave; the raw prompt data is not accessible to Venice infrastructure outside the TEE boundary. Each response is cryptographically signed by the enclave.
Combined result:
| Layer | What it hides | Mechanism |
|---|---|---|
| ZK proof (us) | WHO is paying / calling | Breaks the link between your onchain wallet and your API call |
| Venice TEE | WHAT you're asking (enclave-isolated inference) | Hardware enclave; Venice infrastructure cannot access TEE memory/computation |
No one — not us, not Venice, not the GPU operator, not the blockchain — knows both who you are and what you're asking. These are orthogonal privacy guarantees that reinforce each other.
End-to-End Flow (So Far)
Buy CLAWD
CLAWD is an ERC-20 token on Base mainnet. Swap ETH or USDC for CLAWD on any Base DEX. Token: 0x9f86dB9fc6f7c9408e8Fda3Ff8ce4e78ac7a6b07
Generate commitment locally
Your browser generates a random nullifier and secret. It computes commitment = Poseidon2(nullifier, secret) using Barretenberg's WASM prover. The nullifier and secret never leave your device.
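A minimal TypeScript sketch of this step, assuming credentials are BN254 field elements (the field Noir uses with Barretenberg). The hash is passed in as a parameter: the real client would wrap Barretenberg's Poseidon2 from @aztec/bb.js, which is not wired up here, and the helper names (`randomFieldElement`, `makeCredential`) are hypothetical.

```typescript
// BN254 scalar field modulus. Noir `Field` values proven with Barretenberg
// live in this field, so every nullifier and secret must be below it.
const BN254_P =
  21888242871839275222246405745257275088548364400416034343698204186575808495617n;

type RandomBytes = (n: number) => Uint8Array;
type Hash2 = (a: bigint, b: bigint) => bigint;

interface Credential {
  nullifier: bigint;
  secret: bigint;
  commitment: bigint;
}

// Rejection sampling: read 32 random bytes as a big-endian integer, keep the
// low 254 bits, and retry if the result is not below the modulus. The modulus
// is a 254-bit number, so about 3 in 4 draws are accepted.
function randomFieldElement(randomBytes: RandomBytes): bigint {
  const mask = (1n << 254n) - 1n;
  while (true) {
    const bytes = randomBytes(32);
    let x = 0n;
    for (let i = 0; i < bytes.length; i++) x = (x << 8n) | BigInt(bytes[i]);
    x &= mask;
    if (x < BN254_P) return x;
  }
}

// `poseidon2` is injected. In the real client it would wrap Barretenberg's
// Poseidon2 (assumption: not shown here); any stand-in is for testing only.
function makeCredential(randomBytes: RandomBytes, poseidon2: Hash2): Credential {
  const nullifier = randomFieldElement(randomBytes);
  const secret = randomFieldElement(randomBytes);
  return { nullifier, secret, commitment: poseidon2(nullifier, secret) };
}
```

In a browser or a recent Node.js, `(n) => crypto.getRandomValues(new Uint8Array(n))` can serve as `randomBytes`. Rejection sampling keeps the distribution uniform over the field; reducing a 256-bit value modulo the field size would instead make some field elements more likely than others.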
Buy Credits — one transaction (calls stakeAndRegister())
You approve CLAWD, then the router purchases N credits by calling stakeAndRegister(amount, commitments[]) on the APICredits contract. The router swaps ETH → CLAWD at market rate and locks N × pricePerCredit CLAWD. The USD cost per credit is fixed (~$0.05 via onchain oracle); the CLAWD amount varies with market price. One transaction, N credits.
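The pricing arithmetic can be sketched in TypeScript. Everything about the oracle here is an assumption: the prices are hypothetical 1e18-scaled integers, CLAWD is assumed to use 18 decimals, and rounding up is a choice made for the sketch, not necessarily what APICredits does.

```typescript
const WAD = 10n ** 18n; // 1e18 fixed-point scale

// Hypothetical inputs, both scaled by 1e18:
//   creditUsdWad: USD price of one credit (about $0.05 per the text)
//   clawdUsdWad:  USD price of one whole CLAWD, as an oracle might report it
// Returns the CLAWD cost of one credit in base units (18 decimals assumed),
// rounded up so the stake never falls short of the USD price.
function clawdPerCredit(creditUsdWad: bigint, clawdUsdWad: bigint): bigint {
  if (clawdUsdWad <= 0n) throw new Error("invalid oracle price");
  return (creditUsdWad * WAD + clawdUsdWad - 1n) / clawdUsdWad;
}

// Total CLAWD locked for n credits: n × pricePerCredit.
function stakeForCredits(n: bigint, creditUsdWad: bigint, clawdUsdWad: bigint): bigint {
  return n * clawdPerCredit(creditUsdWad, clawdUsdWad);
}
```

With a hypothetical CLAWD price of $0.0001, one $0.05 credit costs 500 CLAWD; if CLAWD doubles in price, the same credit costs 250 CLAWD, which is why the CLAWD amount floats while the USD cost stays fixed.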
Client fetches the Merkle tree
Your browser fetches the full Merkle tree from the API server's /tree endpoint. It finds your commitment's leaf index locally and computes the sibling path — the server never learns which commitment you're using.
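A sketch of the client-side path computation in TypeScript, assuming /tree yields the ordered list of leaves (its actual response format is not shown here) and that the tree is zero-padded as described under Incremental Merkle Tree. `hash` stands in for Poseidon2, and padding the unused entries of the 16-element arrays with zeros assumes the circuit ignores levels at or beyond `depth`.

```typescript
type Hash2 = (left: bigint, right: bigint) => bigint;

const MAX_DEPTH = 16; // circuit arrays are indices[16] and siblings[16]

interface MerklePath {
  indices: number[]; // 0 = current node is a left child, 1 = right child
  siblings: bigint[]; // sibling hash at each level
  root: bigint; // root over the first `depth` levels
}

// Everything happens locally: the server only ever sent the full leaf list.
function merklePath(leaves: bigint[], commitment: bigint, depth: number, hash: Hash2): MerklePath {
  if (depth < 1 || depth > MAX_DEPTH || leaves.length > 2 ** depth) {
    throw new Error("bad depth or too many leaves");
  }
  let idx = leaves.indexOf(commitment);
  if (idx < 0) throw new Error("commitment not in tree");
  const indices: number[] = [];
  const siblings: bigint[] = [];
  let level = leaves.slice();
  let zero = 0n; // zeros[0]
  for (let i = 0; i < depth; i++) {
    const isRight = idx % 2;
    const sib = isRight ? idx - 1 : idx + 1;
    indices.push(isRight);
    siblings.push(sib < level.length ? level[sib] : zero);
    // Build the next level, padding a missing right child with zeros[i].
    const next: bigint[] = [];
    for (let j = 0; j < level.length; j += 2) {
      next.push(hash(level[j], j + 1 < level.length ? level[j + 1] : zero));
    }
    level = next;
    idx = Math.floor(idx / 2);
    zero = hash(zero, zero); // zeros[i + 1]
  }
  while (indices.length < MAX_DEPTH) {
    indices.push(0);
    siblings.push(0n);
  }
  return { indices, siblings, root: level[0] };
}
```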
Client generates a ZK proof
Using the locally computed Merkle path, your browser runs the Noir circuit via Barretenberg UltraHonk. The proof shows that (a) you know a nullifier and secret whose Poseidon2 hash is a leaf of the Merkle tree, and (b) the public nullifier hash equals Poseidon2(nullifier). All private inputs stay on-device.
Server verifies and responds
The server verifies the UltraHonk proof against the onchain root, checks the nullifier hasn't been spent, marks it spent, then forwards your message to the Venice LLM API and returns the response.
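The server's checks can be sketched as follows. The verifier is injected (a real server would call Barretenberg's UltraHonk verifier, not shown), the in-memory `Set` stands in for whatever persistent store the server actually uses, and `authorize` and `NullifierStore` are hypothetical names. The point is the ordering: cheap checks first, then verification, then an atomic check-and-mark.

```typescript
interface PublicInputs {
  nullifierHash: bigint;
  root: bigint;
  depth: number;
}

// Hypothetical verifier hook, e.g. a wrapper around UltraHonk verification.
type VerifyFn = (proof: Uint8Array, inputs: PublicInputs) => Promise<boolean>;

// In-memory stand-in for the server's spent-nullifier store.
class NullifierStore {
  private readonly spent = new Set<bigint>();

  isSpent(h: bigint): boolean {
    return this.spent.has(h);
  }

  // Check-and-mark with no await in between, so concurrent requests in one
  // Node.js process cannot both consume the same nullifier.
  tryConsume(h: bigint): boolean {
    if (this.spent.has(h)) return false;
    this.spent.add(h);
    return true;
  }
}

type Verdict = "ok" | "bad root" | "spent" | "bad proof";

async function authorize(
  proof: Uint8Array,
  inputs: PublicInputs,
  onchainRoot: bigint,
  verify: VerifyFn,
  store: NullifierStore,
): Promise<Verdict> {
  if (inputs.root !== onchainRoot) return "bad root"; // cheap checks first
  if (store.isSpent(inputs.nullifierHash)) return "spent";
  if (!(await verify(proof, inputs))) return "bad proof"; // expensive check
  // Re-check after the await: another request may have spent it meanwhile.
  return store.tryConsume(inputs.nullifierHash) ? "ok" : "spent";
}
```

Only an `"ok"` verdict would be followed by forwarding the message to the LLM. A production store must also survive restarts; otherwise a restart would let spent nullifiers be reused.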
Two Ways to Use the API
DIY — Proof in your browser
Your browser generates the ZK proof using Barretenberg WASM. The nullifier and secret never leave your device.
- ✅ Maximum privacy — server never sees your secret
- ⚠️ Requires downloading the circuit (~500KB)
- ⚠️ Proof takes 30–60s on first load
Used by the web chat interface and the proxy.
API key — Proof on the server
Send your nullifier, secret, and commitment to the backend. It generates the proof for you.
- ✅ No circuit download, no setup
- ✅ Proof in ~2–3s (server hardware)
- ⚠️ The backend learns your nullifier and secret
See SKILL.md for the full API reference.
The ZK Circuit
Written in Noir and proven with Barretenberg's UltraHonk backend. The circuit has:
Public inputs (verifier sees)
- nullifier_hash: Poseidon2(nullifier)
- root: onchain Merkle root
- depth: current tree depth
Private inputs (never leave client)
- nullifier: random field element (BN254 scalar field, just under 254 bits)
- secret: random field element (BN254 scalar field, just under 254 bits)
- indices[16]: Merkle path bits
- siblings[16]: Merkle sibling hashes
// main.nr — the full circuit
use std::hash::poseidon2::Poseidon2;
use binary_merkle_root::binary_merkle_root;
fn main(
nullifier_hash: pub Field, // public
root: pub Field, // public
depth: pub u32, // public
nullifier: Field, // private
secret: Field, // private
indices: [u1; 16], // private
siblings: [Field; 16], // private
) {
// 1. commitment = Poseidon2(nullifier, secret)
let commitment = Poseidon2::hash([nullifier, secret], 2);
// 2. commitment is in the Merkle tree
let computed_root = binary_merkle_root(
|pair: [Field; 2]| -> Field { Poseidon2::hash(pair, 2) },
commitment, depth, indices, siblings,
);
assert(computed_root == root);
// 3. nullifier_hash = Poseidon2(nullifier)
let computed_nullifier_hash = Poseidon2::hash([nullifier], 1);
assert(computed_nullifier_hash == nullifier_hash);
}
The circuit proves three things simultaneously without revealing the nullifier or secret: the commitment was correctly formed, it exists in the registered set, and the nullifier hash matches. This lets the server track spent credits without learning which credit belongs to whom.
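A failed witness only shows up as a proving error, so it can help to re-check the three assertions in plain TypeScript before proving. This sketch mirrors the circuit under an assumption about binary_merkle_root's convention (index bit 1 places the current node on the right, and only the first `depth` levels are hashed); `h1` and `h2` stand in for the one- and two-input Poseidon2 calls, and the function names are hypothetical.

```typescript
type Hash1 = (x: bigint) => bigint; // stands in for Poseidon2::hash([x], 1)
type Hash2 = (a: bigint, b: bigint) => bigint; // stands in for Poseidon2::hash([a, b], 2)

interface Witness {
  nullifierHash: bigint; // public
  root: bigint; // public
  depth: number; // public
  nullifier: bigint; // private
  secret: bigint; // private
  indices: number[]; // private, 0/1 path bits
  siblings: bigint[]; // private
}

// Assumed binary_merkle_root semantics: hash upward for `depth` levels;
// an index bit of 1 puts the current node on the right.
function computeRoot(leaf: bigint, depth: number, indices: number[], siblings: bigint[], h2: Hash2): bigint {
  let node = leaf;
  for (let i = 0; i < depth; i++) {
    node = indices[i] === 1 ? h2(siblings[i], node) : h2(node, siblings[i]);
  }
  return node;
}

// Re-checks the circuit's three assertions; returns the first failure or null.
function checkWitness(w: Witness, h1: Hash1, h2: Hash2): string | null {
  const commitment = h2(w.nullifier, w.secret); // 1. commitment
  if (computeRoot(commitment, w.depth, w.indices, w.siblings, h2) !== w.root) {
    return "root mismatch"; // 2. membership
  }
  if (h1(w.nullifier) !== w.nullifierHash) return "nullifier_hash mismatch"; // 3.
  return null;
}
```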
Poseidon2 Hashing
All hashing uses Poseidon2 — a ZK-friendly hash function designed for efficient in-circuit computation. Critically, this is not the same as the original Poseidon hash used by iden3/Circom.
We use Barretenberg's implementation (@aztec/bb.js v0.82.0), which must match exactly between the circuit, the API server, and the frontend client. Using any other Poseidon implementation will produce different hashes and invalid proofs.
Three hash operations in the system:
- commitment = Poseidon2(nullifier, secret): computed client-side, stored onchain
- node = Poseidon2(left, right): used at every level of the Merkle tree
- nullifier_hash = Poseidon2(nullifier): public, used to track spent credits
Incremental Merkle Tree
The onchain contract maintains a Semaphore-style incremental binary Merkle tree with max depth 16 (up to 65,536 leaves). Each registered commitment is a leaf.
Empty subtrees use precomputed zero hashes: zeros[0] = 0, zeros[i+1] = Poseidon2(zeros[i], zeros[i]). Every level always hashes two children — this matches Noir's binary_merkle_root exactly.
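The incremental update can be modeled in TypeScript. This is a sketch of the standard incremental-insert technique (caching the most recent left node per level) under the zero-hash rule above, not the contract's code; `hash` stands in for Poseidon2 and the class name is hypothetical. Each insert touches one node per level, so it costs `depth` hashes instead of rebuilding the tree.

```typescript
type Hash2 = (left: bigint, right: bigint) => bigint;

class IncrementalMerkleTree {
  private readonly depth: number;
  private readonly hash: Hash2;
  readonly zeros: bigint[] = [0n]; // zeros[i] = root of an empty subtree of height i
  private readonly filledSubtrees: bigint[] = []; // latest left node at each level
  private nextIndex = 0;
  root: bigint;

  constructor(depth: number, hash: Hash2) {
    this.depth = depth;
    this.hash = hash;
    for (let i = 0; i < depth; i++) {
      this.filledSubtrees.push(this.zeros[i]);
      this.zeros.push(hash(this.zeros[i], this.zeros[i])); // zeros[i+1]
    }
    this.root = this.zeros[depth]; // root of the empty tree
  }

  // Appends a leaf, updates the root, and returns the leaf index.
  insert(leaf: bigint): number {
    if (this.nextIndex >= 2 ** this.depth) throw new Error("tree is full");
    const index = this.nextIndex++;
    let node = leaf;
    let idx = index;
    for (let i = 0; i < this.depth; i++) {
      if (idx % 2 === 0) {
        // Left child: remember it and pair it with the empty subtree on the right.
        this.filledSubtrees[i] = node;
        node = this.hash(node, this.zeros[i]);
      } else {
        // Right child: pair it with the stored left sibling.
        node = this.hash(this.filledSubtrees[i], node);
      }
      idx = Math.floor(idx / 2);
    }
    this.root = node;
    return index;
  }
}
```

With depth 16 this matches the 65,536-leaf bound above, and because every level hashes two children, the resulting root is the one Noir's binary_merkle_root reproduces.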
Why not LeanIMT?
LeanIMT promotes odd nodes to the next level without hashing, which doesn't match Noir's standard binary Merkle root algorithm. We use the Semaphore approach instead: every level hashes two children, padding with the zero hash for the current level.
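The difference is visible with as few as three leaves. This TypeScript sketch computes both roots with a stand-in hash; `leanRoot` models only the promotion rule described above, not the full LeanIMT library, and both function names are hypothetical.

```typescript
type Hash2 = (left: bigint, right: bigint) => bigint;

// Semaphore-style: every level hashes two children; a missing right child is
// filled with that level's zero hash (zeros[0] = 0, zeros[i+1] = H(zeros[i], zeros[i])).
function paddedRoot(leaves: bigint[], depth: number, hash: Hash2): bigint {
  if (leaves.length === 0 || leaves.length > 2 ** depth) throw new Error("bad leaf count");
  let level = leaves.slice();
  let zero = 0n;
  for (let i = 0; i < depth; i++) {
    const next: bigint[] = [];
    for (let j = 0; j < level.length; j += 2) {
      next.push(hash(level[j], j + 1 < level.length ? level[j + 1] : zero));
    }
    level = next;
    zero = hash(zero, zero);
  }
  return level[0];
}

// LeanIMT-style: a node with no right sibling is promoted to the next level
// unhashed, and hashing stops once a single node remains.
function leanRoot(leaves: bigint[], hash: Hash2): bigint {
  if (leaves.length === 0) throw new Error("empty tree");
  let level = leaves.slice();
  while (level.length > 1) {
    const next: bigint[] = [];
    for (let j = 0; j < level.length; j += 2) {
      next.push(j + 1 < level.length ? hash(level[j], level[j + 1]) : level[j]);
    }
    level = next;
  }
  return level[0];
}
```

For a full tree the two agree; as soon as a level has an odd node, LeanIMT's root differs from the padded root that binary_merkle_root reproduces, so a Noir proof would not verify against it.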
🎯 Model Policy
The server runs zai-org-glm-5 for all API calls. Any model field in your request is accepted but ignored.
Privacy Guarantees (Current)
✅ Server never sees your wallet address
The proof is generated client-side. The server receives only the proof, nullifier_hash, and your message.
✅ Server cannot link two API calls
Each credit has a unique nullifier. There's no correlation between calls unless you reuse a credential.
✅ Server cannot identify which leaf you used
The ZK proof proves membership in the set without revealing the index or commitment.
⚠️ The API server relays your message
The API server handles LLM routing. When using Venice TEE/E2EE models, your prompt is encrypted end-to-end — even Venice and the GPU operator can't see it. For non-E2EE models, the server sees your plaintext message; for full privacy with those, self-host the server.
⚠️ Credits are stored in localStorage
If you clear your browser storage, unspent credits are gone: the CLAWD stays locked onchain, but the credentials needed to spend it are lost. Back them up, or better yet, script the purchase and let your bot manage credits automatically via the API described in SKILL.md.
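One way to back credentials up is to export them as JSON. The format below is hypothetical (not the frontend's localStorage layout); it mainly exists to handle the fact that `JSON.stringify` rejects BigInt values. The exported file holds spendable secrets, so it deserves the same care as a private key.

```typescript
interface Credential {
  nullifier: bigint;
  secret: bigint;
  commitment: bigint;
}

// JSON.stringify throws a TypeError on BigInt values, so field elements are
// written as 0x-prefixed hex strings (hypothetical backup format).
function exportCredentials(creds: Credential[]): string {
  const hex = (x: bigint) => "0x" + x.toString(16);
  return JSON.stringify(
    creds.map((c) => ({ nullifier: hex(c.nullifier), secret: hex(c.secret), commitment: hex(c.commitment) })),
  );
}

// BigInt() parses 0x-prefixed hex strings directly.
function importCredentials(json: string): Credential[] {
  const raw = JSON.parse(json) as { nullifier: string; secret: string; commitment: string }[];
  return raw.map((r) => ({
    nullifier: BigInt(r.nullifier),
    secret: BigInt(r.secret),
    commitment: BigInt(r.commitment),
  }));
}
```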
Self-Hosting
Everything is open-source. You can deploy your own instance pointing at any LLM provider.
# Clone both repos
git clone https://github.com/clawdbotatg/zk-api-credits   # contracts + API server
git clone https://github.com/clawdbotatg/zk-llm-frontend  # frontend
cd zk-api-credits

# Configure
cp packages/api-server/.env.example packages/api-server/.env
# Set: CONTRACT_ADDRESS, VENICE_API_KEY (or any OpenAI-compatible key), RPC_URL

# Compile contracts (Foundry); the subshell keeps you in the repo root
(cd packages/contracts && forge build)

# Deploy contract (Foundry)
# See packages/contracts/script/Deploy.s.sol for instructions

# Run API server (from the zk-api-credits root)
docker build -f packages/api-server/Dockerfile -t zk-api-server .
docker run -p 3001:3001 --env-file packages/api-server/.env zk-api-server

# Deploy frontend (Vercel); pass the API URL to the remote build
cd ../zk-llm-frontend
vercel deploy --build-env NEXT_PUBLIC_API_URL=https://your-server.com
What Else Is Left to Build
The build order toward the full paper vision, roughly ordered by complexity.
Generalized API Support
Complexity: Low. The contract is already generic. Swap the hardcoded Venice routing for a pluggable proxy layer that supports any OpenAI-compatible endpoint or any fixed-cost API: RPC nodes, image generation, VPNs, data APIs. This makes the project a platform, not just an LLM wrapper.
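A pluggable proxy layer could reduce to building an OpenAI-compatible request from per-provider config. This sketch is hypothetical: the config shape and function name are illustrative, the URL is a placeholder, and it pins the model server-side, matching the Model Policy above.

```typescript
// Hypothetical provider config for any OpenAI-compatible endpoint.
interface UpstreamConfig {
  baseUrl: string; // placeholder, e.g. "https://llm.example.com/v1"
  apiKey: string;
  model: string; // pinned server-side; the client's `model` field is ignored
}

interface ChatRequest {
  model?: string;
  messages: { role: string; content: string }[];
}

interface UpstreamRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the request a proxy would send after the ZK proof has been accepted.
// OpenAI-compatible APIs serve chat at POST {baseUrl}/chat/completions with a
// Bearer token.
function buildUpstreamRequest(cfg: UpstreamConfig, req: ChatRequest): UpstreamRequest {
  return {
    url: cfg.baseUrl.replace(/\/+$/, "") + "/chat/completions",
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${cfg.apiKey}` },
    body: JSON.stringify({ ...req, model: cfg.model }),
  };
}
```

A server could then send it with `fetch(r.url, { method: r.method, headers: r.headers, body: r.body })`; supporting a new provider becomes a config entry rather than a code change.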
Dual Staking (Policy Stake)
Complexity: Low–Medium. Split the deposit into D (RLN stake) and S (policy stake). The server can burn S but never claim it, which removes any incentive to falsely ban users. This is a pure contract change with no circuit modifications.
Variable Cost + Refund Tickets
Complexity: Medium. Venice returns token counts on every response. The server signs a refund ticket for the unused capacity (C_max - C_actual), and the client accumulates these tickets locally. This unlocks efficient per-token pricing instead of fixed credits.
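The refund accounting can be sketched in TypeScript. The per-token prices, units, and ticket shape are hypothetical, and the server signature a real ticket would carry is omitted; OpenAI-compatible responses usually report usage as `prompt_tokens` and `completion_tokens`, which would map onto `Usage` here.

```typescript
// Hypothetical units: all costs in the smallest credit unit.
interface Usage {
  promptTokens: bigint;
  completionTokens: bigint;
}

interface Prices {
  perPromptToken: bigint;
  perCompletionToken: bigint;
}

interface RefundTicket {
  ticketIndex: number;
  amount: bigint; // C_max - C_actual
  // A real ticket would also carry the server's signature (omitted).
}

function actualCost(u: Usage, p: Prices): bigint {
  return u.promptTokens * p.perPromptToken + u.completionTokens * p.perCompletionToken;
}

// Refund the unused part of the per-request budget, never going negative.
function makeRefundTicket(ticketIndex: number, cMax: bigint, cActual: bigint): RefundTicket {
  const amount = cActual >= cMax ? 0n : cMax - cActual;
  return { ticketIndex, amount };
}

// The running total the client accumulates locally.
function totalRefunds(tickets: RefundTicket[]): bigint {
  return tickets.reduce((sum, t) => sum + t.amount, 0n);
}
```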
Rate-Limit Nullifiers (RLN)
Complexity: Medium–High. Replace single-use nullifiers with RLN. Each request uses a ticket index i; the signal is y = secret + Hash(secret, i) × Hash(message). For a fixed index, every signal lies on the same line with intercept secret and slope Hash(secret, i), so reusing the index with a different message yields two distinct points on that line and reveals the secret key. Requires porting the RLN circuit to Noir and updating the contract, server, and frontend.
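The recovery argument can be checked numerically. In this TypeScript sketch, field arithmetic is done modulo the BN254 scalar field, and the hash outputs Hash(secret, i) and Hash(message) are replaced by fixed field elements, since only the linear algebra matters; the function names are hypothetical.

```typescript
const P = 21888242871839275222246405745257275088548364400416034343698204186575808495617n;

const mod = (a: bigint): bigint => ((a % P) + P) % P;

// Square-and-multiply exponentiation modulo P.
function modPow(base: bigint, exp: bigint): bigint {
  let result = 1n;
  let b = mod(base);
  let e = exp;
  while (e > 0n) {
    if ((e & 1n) === 1n) result = (result * b) % P;
    b = (b * b) % P;
    e >>= 1n;
  }
  return result;
}

// Fermat inverse; valid because P is prime and a is nonzero mod P.
const inv = (a: bigint): bigint => modPow(a, P - 2n);

// y = secret + a1 * x (mod P), where a1 = Hash(secret, i) and x = Hash(message).
function rlnSignal(secret: bigint, a1: bigint, x: bigint): bigint {
  return mod(secret + a1 * x);
}

// Two signals for the same index share slope a1 and intercept secret;
// two distinct points determine both.
function recoverSecret(x1: bigint, y1: bigint, x2: bigint, y2: bigint): bigint {
  if (mod(x1 - x2) === 0n) throw new Error("same message hash: nothing to recover");
  const a1 = mod((y1 - y2) * inv(x1 - x2));
  return mod(y1 - a1 * x1);
}
```

This is also the computation the slashing step relies on: anyone holding both signals can run `recoverSecret`.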
RLN Slashing
Complexity: Medium. Once RLN is in place, slashing is a contract function: submit two (nullifier, x, y) pairs for the same index, recover the secret key, verify it matches a tree leaf, and burn the deposit. Anyone can slash; no trusted arbiter is needed.
ZK Solvency Proof
Complexity: Very High. The circuit proves (ticket_index + 1) × C_max ≤ deposit + Σ(refunds), verifying the server's signatures on refund tickets as private inputs. This requires a ZK-friendly signing scheme and is the most complex circuit change in the roadmap. The full paper vision lives here.
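Evaluated in the clear, the solvency statement is a one-line comparison. This sketch omits everything that makes the roadmap item hard (keeping inputs private and verifying refund-ticket signatures in-circuit) and uses a hypothetical function name.

```typescript
// The statement the circuit would prove, evaluated without privacy:
//   (ticket_index + 1) × C_max ≤ deposit + Σ(refunds)
// Each request reserves the worst-case cost C_max up front; refunds return
// the unused part.
function isSolvent(ticketIndex: bigint, cMax: bigint, deposit: bigint, refunds: bigint[]): boolean {
  const refunded = refunds.reduce((sum, r) => sum + r, 0n);
  return (ticketIndex + 1n) * cMax <= deposit + refunded;
}
```

With a deposit of 100 and C_max = 10, ticket indices 0 to 9 pass with no refunds; two refunds of 5 make index 10 admissible.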
Homomorphic Refund Accumulation
Complexity: High. Replace the growing list of refund tickets with a single Pedersen commitment that the server updates homomorphically, without learning the user's balance. Client-side state stays constant regardless of call count. This is an optimization on top of the ZK Solvency Proof step above.
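The homomorphic property this step relies on can be illustrated with a toy Pedersen commitment in TypeScript. It swaps in a multiplicative group modulo a prime for the elliptic-curve group a real deployment would use, its parameters are deliberately insecure, and the names are hypothetical.

```typescript
// Toy group: integers modulo the Mersenne prime 2^127 - 1, with small bases.
// NOT secure: a real Pedersen setup needs generators with an unknown
// discrete-log relation. Only the homomorphism is demonstrated.
const P = (1n << 127n) - 1n;
const G = 3n;
const H = 7n;

// Square-and-multiply exponentiation modulo P.
function modPow(base: bigint, exp: bigint): bigint {
  let result = 1n;
  let b = base % P;
  let e = exp;
  while (e > 0n) {
    if ((e & 1n) === 1n) result = (result * b) % P;
    b = (b * b) % P;
    e >>= 1n;
  }
  return result;
}

// Commit(v, r) = G^v · H^r mod P
function commit(value: bigint, blinding: bigint): bigint {
  return (modPow(G, value) * modPow(H, blinding)) % P;
}

// Adds `amount` to the hidden value without opening the commitment:
// C · Commit(a, r') = Commit(v + a, r + r')
function addToCommitment(c: bigint, amount: bigint, extraBlinding: bigint): bigint {
  return (c * commit(amount, extraBlinding)) % P;
}
```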
See the paper for the full concept. MIT licensed, fork to build it your way.