Documentation
Everything you need to integrate Cluster Protocol into your app
Getting Started
Cluster Protocol provides a REST API for decentralized AI inference. All responses are JSON. The base URL for all endpoints is:
Base URL
https://api.clusterprotocol.ai
Quick start in 3 steps
1. Create an account to get your API key
2. Deposit USDC/ETH to your balance
3. Call /v1/chat/completions
# 1. Register and get your API key
curl -X POST https://api.clusterprotocol.ai/api/auth/register \
-H "Content-Type: application/json" \
-d '{"email": "dev@example.com", "password": "securepassword"}'
# Response:
# { "userId": "abc123", "email": "dev@example.com",
# "apiKey": "sk-cluster-...", "balance": "0.00" }
# 2. Run your first inference
curl https://api.clusterprotocol.ai/v1/chat/completions \
-H "Authorization: Bearer sk-cluster-YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.1-70b",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Authentication
Cluster Protocol uses API keys for authentication. Pass your key via the Authorization header on every authenticated request.
Authorization: Bearer sk-cluster-YOUR_API_KEY
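As a rough illustration of the header above, the following Python sketch builds (but does not send) an authenticated request using only the standard library. The helper name `build_authenticated_request` is hypothetical and not part of any Cluster Protocol SDK; the path and key are placeholders.

```python
import json
import urllib.request

BASE_URL = "https://api.clusterprotocol.ai"


def build_authenticated_request(path, api_key, payload=None):
    """Build a request that carries the API key as a Bearer token."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(BASE_URL + path, data=data)
    request.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        # A JSON body turns the request into a POST.
        request.add_header("Content-Type", "application/json")
    return request


request = build_authenticated_request("/api/auth/me", "sk-cluster-YOUR_KEY")
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without network access.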
Register
/api/auth/register
Create a new account. Returns userId, email, apiKey, and starting balance.
email (required): Email address for the account
password (required): Password (min 8 characters)
POST /api/auth/register
Content-Type: application/json
{
"email": "dev@example.com",
"password": "securepassword"
}
Login
/api/auth/login
Log in with existing credentials. Returns the same shape as register.
email (required): Your registered email
password (required): Your password
Get Current User
/api/auth/me
Retrieve the authenticated user's profile and balance.
curl https://api.clusterprotocol.ai/api/auth/me \
-H "Authorization: Bearer sk-cluster-YOUR_KEY"
x402 Payments
Cluster Protocol supports the x402 payment protocol — an open standard by Coinbase that enables pay-per-request access using USDC on Base. No account or API key needed. AI agents and developers can pay directly with their wallet.
How It Works
1. Request
Call any paid endpoint without authentication. Server returns HTTP 402 with payment details.
2. Pay
Sign a USDC payment on Base mainnet using your wallet. Resubmit with X-PAYMENT header.
3. Access
Server verifies payment via facilitator. If valid, serves the response. Settlement is on-chain.
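The three steps above can be sketched as a small retry loop. This is only an outline under stated assumptions: `send` and `sign` are hypothetical callables supplied by the caller (an HTTP transport and a wallet signer), and response headers are assumed to be a dict with lower-case names. It is not the behaviour of any specific x402 client library.

```python
from typing import Callable, Dict, Tuple

# (status code, response headers with lower-case names, body)
Response = Tuple[int, Dict[str, str], str]


def request_with_payment(
    send: Callable[[Dict[str, str]], Response],
    sign: Callable[[str], str],
) -> Response:
    """Run the request -> pay -> access flow with injected transport and signer."""
    # Step 1: call the paid endpoint without authentication.
    status, headers, body = send({})
    if status != 402:
        return status, headers, body
    # Step 2: sign the payment described by the payment-required header.
    payment = sign(headers["payment-required"])
    # Step 3: resubmit the same request with the X-PAYMENT header attached.
    return send({"X-PAYMENT": payment})
```

Injecting `send` and `sign` keeps the flow testable without a wallet or network access; a real client would plug in its HTTP library and signing code.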
402 Response
When you call a paid endpoint without authentication, you receive a 402 response. Payment instructions are in the payment-required header (base64-encoded JSON):
# No auth header → triggers 402
curl -i -X POST https://api.clusterprotocol.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "llama-3.3-70b-instruct", "messages": [{"role": "user", "content": "Hello"}]}'
# HTTP/2 402
# payment-required: <base64-encoded JSON>
Pricing
| Endpoint | Price (USDC) |
|---|---|
| POST /v1/chat/completions | $0.003 |
| POST /v1/embeddings | $0.0005 |
| POST /v1/images/generations | $0.02 |
| POST /v1/audio/transcriptions | $0.006 |
| POST /v1/audio/speech | $0.015 |
| POST /v1/rerank | $0.001 |
Supported Networks
| Network | CAIP-2 ID | Token |
|---|---|---|
| Base Mainnet | eip155:8453 | USDC |
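A client that handles the 402 response by hand might decode the payment-required header and confirm the network before signing. The sketch below assumes the decoded JSON exposes flat `amount` and `network` fields, as the x402 Protocol Flow diagram in this document suggests; the real payload may be structured differently, and both function names are hypothetical.

```python
import base64
import json

# CAIP-2 IDs from the Supported Networks table.
SUPPORTED_NETWORKS = {"eip155:8453"}  # Base Mainnet


def decode_payment_required(header_value: str) -> dict:
    """Decode the base64-encoded JSON carried in the payment-required header."""
    return json.loads(base64.b64decode(header_value))


def is_supported_network(requirements: dict) -> bool:
    # Assumption: the network is exposed as a top-level "network" field.
    return requirements.get("network") in SUPPORTED_NETWORKS


# Illustrative header value built locally, not captured from the API.
sample_header = base64.b64encode(
    json.dumps({"amount": 3000, "network": "eip155:8453"}).encode("utf-8")
).decode("ascii")

requirements = decode_payment_required(sample_header)
print(requirements["amount"], is_supported_network(requirements))
```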
Client Integration
import { fetchWithPayment } from "@x402/fetch";
import { CdpWallet } from "@coinbase/cdp-sdk";
// Create a wallet on Base mainnet
const wallet = await CdpWallet.create({ networkId: "base-mainnet" });
// fetchWithPayment handles 402 → sign → resubmit automatically
const response = await fetchWithPayment(
"https://api.clusterprotocol.ai/v1/chat/completions",
{
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
model: "llama-3.3-70b-instruct",
provider: "venice", // optional: route through Venice
messages: [{ role: "user", content: "Hello!" }]
})
},
wallet // signs USDC payment on Base
);
console.log(await response.json());
Pricing (per request)
| Endpoint | Price | Token |
|---|---|---|
| /v1/chat/completions | $0.003 | USDC on Base |
| /v1/embeddings | $0.001 | USDC on Base |
| /v1/images/generations | $0.02 | USDC on Base |
| /v1/audio/transcriptions | $0.005 | USDC on Base |
| /v1/audio/speech | $0.01 | USDC on Base |
| /v1/rerank | $0.001 | USDC on Base |
Key Benefits
- No signup or API key required: just pay and use
- AI agents can autonomously pay for inference
- Zero protocol fees: you pay only the listed price
- Settlement in USDC on Base (2-second finality)
- Open standard, compatible with any x402 client
Venice AI via x402 (Private + Permissionless)
Combine x402 with Venice provider pinning for fully private, permissionless AI inference. No API key, no account — your wallet pays USDC on Base and you get E2EE private inference through Venice. Prompts and responses are never stored.
import { fetchWithPayment } from "@x402/fetch";
import { ethers } from "ethers";
// Any EVM wallet works (MetaMask, Coinbase, etc.)
const wallet = new ethers.Wallet("YOUR_PRIVATE_KEY");
// Venice private inference via x402 — no API key needed
const response = await fetchWithPayment(
"https://api.clusterprotocol.ai/v1/chat/completions",
{
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
model: "llama-3.3-70b-instruct",
provider: "venice", // ← routes to Venice (E2EE, no logs)
messages: [
{ role: "user", content: "Explain ERC-7710 delegation" }
],
max_tokens: 512
})
},
wallet // signs $0.003 USDC on Base automatically
);
const data = await response.json();
console.log(data.choices[0].message.content);
Venice Models via x402
These models are verified working with x402 + Venice provider pinning:
| Model | Slug | x402 Price | Best For |
|---|---|---|---|
| Llama 3.3 70B | llama-3.3-70b-instruct | $0.003 | General chat, reasoning |
| Qwen 3.5 35B | qwen3.5-35b-a3b | $0.003 | Fast reasoning |
| Qwen 3.5 397B | qwen3.5-397b-a17b | $0.003 | Heavy reasoning, complex tasks |
| Mistral Small 3.2 | mistral-small-3.2-24b | $0.003 | Fast, efficient |
| Gemma 4 31B | gemma-4-31b-it | $0.003 | Instruction following |
Quick Setup (npm)
npm install @x402/fetch ethers
# or with Coinbase CDP:
npm install @x402/fetch @coinbase/cdp-sdk
x402 Protocol Flow
Your Agent/App Cluster Protocol Base Chain (L2)
│ │ │
├── POST /v1/chat ────────►│ │
│ (no auth header) │ │
│ │ │
│◄── HTTP 402 ─────────────┤ │
│ payment-required: │ │
│ {amount: 3000, │ │
│ network: eip155:8453, │ │
│ payTo: 0x6839...} │ │
│ │ │
├── Sign EIP-3009 ─────────────────────────────────► │
│ (USDC transferWithAuth) │ │
│ │ │
├── POST /v1/chat ────────►│ │
│ X-PAYMENT: <signed> │── verify on-chain ─────►│
│ │ │
│◄── HTTP 200 ─────────────┤ │
│ {choices: [...]} │ │
│ (Venice E2EE response) │ │
Inference
The inference endpoint is fully OpenAI-compatible. Use any OpenAI client library by pointing it to Cluster Protocol. Supports both streaming (SSE) and non-streaming responses.
/v1/chat/completions
OpenAI-compatible chat completion. Supports streaming via SSE.
Request Body
model (required): Model ID (e.g. "llama-3.3-70b-instruct", "deepseek-v3.2")
messages (required): Array of message objects with role and content
provider (optional): Provider to pin the request to, one of "venice", "groq", "together", "fireworks", "deepinfra", "sambanova", "phala", "redpill"
stream (optional): Enable SSE streaming (default: false)
temperature (optional): Sampling temperature, 0-2 (default: 1)
max_tokens (optional): Maximum tokens to generate
top_p (optional): Nucleus sampling parameter (default: 1)
stop (optional): Stop sequences
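The parameters above map onto a plain JSON object. The following sketch assembles one with the documented defaults and rejects a temperature outside the 0-2 range; `build_chat_body` is a hypothetical helper, and whether the server enforces the same checks is an assumption.

```python
from typing import Dict, List, Optional


def build_chat_body(
    model: str,
    messages: List[Dict[str, str]],
    *,
    provider: Optional[str] = None,
    stream: bool = False,
    temperature: float = 1.0,
    max_tokens: Optional[int] = None,
    top_p: float = 1.0,
    stop: Optional[List[str]] = None,
) -> dict:
    """Assemble a /v1/chat/completions request body from the documented fields."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    body = {
        "model": model,
        "messages": messages,
        "stream": stream,
        "temperature": temperature,
        "top_p": top_p,
    }
    # Optional fields are included only when the caller sets them.
    if provider is not None:
        body["provider"] = provider
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    if stop is not None:
        body["stop"] = stop
    return body


body = build_chat_body(
    "llama-3.3-70b-instruct",
    [{"role": "user", "content": "Hello!"}],
    max_tokens=256,
)
print(body)
```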
Non-Streaming
curl -X POST https://api.clusterprotocol.ai/v1/chat/completions \
-H "Authorization: Bearer sk-cluster-YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.3-70b-instruct",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing in 2 sentences."}
],
"temperature": 0.7,
"max_tokens": 256
}'
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1714000000,
"model": "llama-3.1-70b",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Quantum computing uses qubits that can exist in superposition..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 28,
"completion_tokens": 42,
"total_tokens": 70
}
}
Streaming (SSE)
Set stream: true to receive tokens as Server-Sent Events. Each event contains a delta with the next token. The stream ends with [DONE].
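Reading such a stream on the client side could look like the sketch below. It assumes each event arrives as a `data:` line carrying a JSON chunk with `choices[].delta.content`, matching the sample events in this section; `iter_stream_content` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text of each delta until the [DONE] sentinel is reached."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


sample = [
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Silicon"}}]}',
    "",
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" dreams"}}]}',
    "",
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" awaken"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_stream_content(sample)))  # Silicon dreams awaken
```

In practice the `lines` argument would be the decoded lines of the HTTP response body rather than a fixed list.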
curl -X POST https://api.clusterprotocol.ai/v1/chat/completions \
-H "Authorization: Bearer sk-cluster-YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.1-70b",
"messages": [{"role": "user", "content": "Write a haiku about AI"}],
"stream": true
}'
# Response (SSE):
# data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Silicon"}}]}
# data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" dreams"}}]}
# data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" awaken"}}]}
# ...
# data: [DONE]
Pricing
Each model has independent per-token pricing for input and output tokens. Balance is deducted atomically per call. Check current pricing via the pricing endpoint.
/api/pricing
Get pricing tiers for all available models.
Provider Routing
Cluster Protocol routes inference across multiple providers for maximum uptime and speed. By default, the system picks the optimal provider automatically. You can override this by passing the provider field in your request body.
Available Providers
| Provider | Slug | Highlight |
|---|---|---|
| Venice AI | venice | Private inference, no data retention, E2EE |
| Groq | groq | Ultra-low latency (LPU hardware) |
| Together AI | together | Widest model selection |
| Fireworks AI | fireworks | High throughput, competitive pricing |
| DeepInfra | deepinfra | Cheapest per-token rates |
| SambaNova | sambanova | Enterprise-grade, high context |
| Phala TEE | phala | Hardware-level privacy (Intel TDX), ~10 TEE-native models |
| Red Pill | redpill | 60+ models via NearAI, Chutes, Tinfoil sub-providers |
| OpenRouter | openrouter | Aggregated access to 200+ models |
Venice AI — Private Inference
Venice AI provides fully private inference with end-to-end encryption. Your prompts and responses are never stored or logged. Pass "provider": "venice" to route through Venice.
curl -X POST https://api.clusterprotocol.ai/v1/chat/completions \
-H "Authorization: Bearer sk-cluster-YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.3-70b-instruct",
"provider": "venice",
"messages": [
{"role": "user", "content": "Explain zero-knowledge proofs simply."}
],
"temperature": 0.7,
"max_tokens": 256
}'
Venice Models
Models available on Venice (use these exact slugs with "provider": "venice"):
| Model | Slug | Use Case |
|---|---|---|
| Llama 3.3 70B | llama-3.3-70b-instruct | General chat, reasoning |
| Qwen 3.5 35B | qwen3.5-35b-a3b | Fast reasoning |
| Qwen 3.5 397B | qwen3.5-397b-a17b | Large-scale reasoning |
| Gemma 4 31B | gemma-4-31b-it | Instruction following |
| Mistral Small 3.2 | mistral-small-3.2-24b | Fast, efficient |
Auto vs Pinned Routing
Auto (default)
Omit the provider field. The system picks the fastest available provider for the model. Best for most use cases.
Pinned
Set "provider": "venice" to force routing to a specific provider. Use for privacy (Venice/Phala), TEE models (Phala/Red Pill), or latency (Groq).
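The difference between the two modes comes down to whether the request body carries a provider field. A minimal sketch, assuming the slugs from the Available Providers table and a hypothetical `with_routing` helper:

```python
from typing import Optional

# Slugs copied from the Available Providers table.
PROVIDER_SLUGS = {
    "venice", "groq", "together", "fireworks", "deepinfra",
    "sambanova", "phala", "redpill", "openrouter",
}


def with_routing(body: dict, provider: Optional[str] = None) -> dict:
    """Return a copy of the body set up for auto or pinned routing."""
    routed = {key: value for key, value in body.items() if key != "provider"}
    if provider is None:
        return routed  # auto: omit the field and let the system choose
    if provider not in PROVIDER_SLUGS:
        raise ValueError(f"unknown provider slug: {provider}")
    routed["provider"] = provider  # pinned: force this provider
    return routed


base = {
    "model": "llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
}
print(with_routing(base))
print(with_routing(base, "venice"))
```

Validating the slug locally is a convenience in this sketch; how the API responds to an unknown slug is not specified here.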
Venice AI — Private Inference
Venice AI provides fully private, end-to-end encrypted inference. Your prompts and responses are never stored, never logged, and never used for training. Venice is our recommended provider when privacy is critical — financial data, personal information, or sensitive prompts.
Full Venice AI Documentation: 75+ models, all endpoints, code examples
End-to-End Encrypted
Prompts encrypted in transit and at rest. Venice cannot read your data.
Zero Data Retention
Nothing stored after response. No logs, no training data collection.
No Account Needed
Use via x402 (wallet pays) or via API key. Both fully private.
Via API Key
curl -X POST https://api.clusterprotocol.ai/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.3-70b-instruct",
"provider": "venice",
"messages": [
{"role": "user", "content": "Analyze this financial data privately..."}
],
"temperature": 0.7,
"max_tokens": 1024
}'
Via x402 (No Account, Wallet Pays)
import { fetchWithPayment } from "@x402/fetch";
import { ethers } from "ethers";
const wallet = new ethers.Wallet(process.env.PRIVATE_KEY!);
// Private Venice inference — no API key, no account, wallet pays $0.003
const response = await fetchWithPayment(
"https://api.clusterprotocol.ai/v1/chat/completions",
{
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
model: "llama-3.3-70b-instruct",
provider: "venice",
messages: [{ role: "user", content: "What are ERC-7710 delegations?" }],
max_tokens: 512
})
},
wallet
);
const data = await response.json();
console.log(data.choices[0].message.content);
Verified Venice Models
| Model | Slug | Best For |
|---|---|---|
| Llama 3.3 70B | llama-3.3-70b-instruct | General chat, reasoning, coding |
| Qwen 3.5 35B | qwen3.5-35b-a3b | Fast reasoning, lightweight |
| Qwen 3.5 397B | qwen3.5-397b-a17b | Heavy reasoning, complex tasks |
| Mistral Small 3.2 | mistral-small-3.2-24b | Fast, efficient, multilingual |
| Gemma 4 31B | gemma-4-31b-it | Instruction following, safe |
Venice Response Format
{
"id": "chatcmpl-RUpKyiqi14G6MrwbPrOk7iJs",
"object": "chat.completion",
"model": "llama-3.3-70b",
"choices": [{
"index": 0,
"message": { "role": "assistant", "content": "..." },
"finish_reason": "stop"
}],
"usage": { "prompt_tokens": 1507, "completion_tokens": 42, "total_tokens": 1549 },
"venice_parameters": {
"enable_e2ee": true,
"include_venice_system_prompt": true,
"strip_thinking_response": false
}
}
Phala TEE — Hardware Privacy
Phala Network provides hardware-level privacy through Trusted Execution Environments (TEE). Unlike software-based encryption, TEE guarantees that even the server operator cannot access your data — it runs inside Intel TDX or NVIDIA Confidential Computing enclaves.
Intel TDX Enclaves
Inference runs inside hardware-isolated enclaves. CPU-level isolation from host OS.
Verifiable Computation
Remote attestation proves your request ran in a genuine TEE. Cryptographic proof.
Zero-Trust Model
Neither Cluster nor Phala can see your prompts. Hardware enforces privacy.
Via API Key
curl -X POST https://api.clusterprotocol.ai/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.3-70b-instruct",
"provider": "phala",
"messages": [
{"role": "user", "content": "Process this sensitive medical record..."}
],
"max_tokens": 1024
}'
Via x402 (No Account, Wallet Pays)
import { fetchWithPayment } from "@x402/fetch";
import { ethers } from "ethers";
const wallet = new ethers.Wallet(process.env.PRIVATE_KEY!);
// Hardware-private inference via x402 — no API key needed
const response = await fetchWithPayment(
"https://api.clusterprotocol.ai/v1/chat/completions",
{
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
model: "llama-3.3-70b-instruct",
provider: "phala",
messages: [{ role: "user", content: "Analyze this contract..." }],
max_tokens: 1024
})
},
wallet
);
const data = await response.json();
console.log(data.choices[0].message.content);
Venice vs Phala — When to Use
| Feature | Venice AI | Phala TEE |
|---|---|---|
| Privacy method | E2EE + no-log policy | Hardware enclave (Intel TDX) |
| Trust model | Trust Venice's no-log claim | Zero-trust (hardware enforced) |
| Verification | Audit-based | Remote attestation (cryptographic) |
| Speed | Fast (optimized GPU clusters) | Slightly slower (enclave overhead) |
| Best for | General privacy, financial data | Regulated industries, compliance |
| x402 support | Yes ($0.003/req) | Yes ($0.003/req) |
Models
Browse public models, upload your own, deploy to GPU nodes, and download model weights.
| Endpoint | Description |
|---|---|
| /api/models/public | List all public models with metadata, pricing, and stats. |
| /api/models/public/:id | Get details for a single model by ID. |
| /api/models/upload | Upload a model file (multipart/form-data). |
| /api/models/:id/deploy | Deploy a model to an available GPU node. |
| /api/models/:id/versions | Get version history for a model. |
| /api/models/:id/download | Get a presigned S3 URL to download model weights. |
List Models
curl https://api.clusterprotocol.ai/api/models/public
Upload a Model
curl -X POST https://api.clusterprotocol.ai/api/models/upload \
-H "Authorization: Bearer sk-cluster-YOUR_KEY" \
-F "file=@my-model.safetensors" \
-F "name=my-custom-model" \
-F "category=chat" \
-F "description=A fine-tuned model for customer support"
Deploy a Model
curl -X POST https://api.clusterprotocol.ai/api/models/llama-3.1-70b/deploy \
-H "Authorization: Bearer sk-cluster-YOUR_KEY"
# Response: { "status": "deploying", "nodeId": "node_xyz" }
Fine-Tuning
Fine-tune models using LoRA adapters. Upload a JSONL training file and configure hyperparameters. Jobs run on the cluster GPU nodes and you can monitor progress in real-time.
| Endpoint | Description |
|---|---|
| /api/finetune/jobs | Create a new fine-tuning job. |
| /api/finetune/jobs | List all your fine-tuning jobs. |
| /api/finetune/jobs/:id | Get status and progress of a specific job. |
| /api/finetune/jobs/:id/cancel | Cancel a running job. |
Create a Fine-Tuning Job
baseModel (required): Model ID to fine-tune (e.g. "llama-3.1-8b")
trainingFile (required): URL or path to JSONL training data
suffix (optional): Custom suffix for the fine-tuned model name
epochs (optional): Number of training epochs (default: 3)
learningRate (optional): Learning rate (default: 2e-5)
loraRank (optional): LoRA rank (default: 16)
loraAlpha (optional): LoRA alpha (default: 32)
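A job payload built from the parameters above might be assembled as in this sketch, which fills in the documented defaults and lets the caller override individual hyperparameters. `build_finetune_job` is a hypothetical helper, and sending the defaults explicitly rather than omitting them is a design choice of this example.

```python
import json
from typing import Optional

# Defaults as listed in the parameter descriptions above.
DEFAULTS = {"epochs": 3, "learningRate": 2e-5, "loraRank": 16, "loraAlpha": 32}


def build_finetune_job(
    base_model: str,
    training_file: str,
    suffix: Optional[str] = None,
    **overrides,
) -> dict:
    """Build the JSON body for creating a fine-tuning job."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown hyperparameters: {sorted(unknown)}")
    job = {
        "baseModel": base_model,
        "trainingFile": training_file,
        **DEFAULTS,
        **overrides,  # caller-supplied values win over the defaults
    }
    if suffix is not None:
        job["suffix"] = suffix
    return job


job = build_finetune_job(
    "llama-3.1-8b", "https://example.com/data.jsonl", epochs=5
)
print(json.dumps(job, indent=2))
```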
curl -X POST https://api.clusterprotocol.ai/api/finetune/jobs \
-H "Authorization: Bearer sk-cluster-YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"baseModel": "llama-3.1-8b",
"trainingFile": "https://example.com/data.jsonl",
"suffix": "customer-support-v1",
"epochs": 3,
"learningRate": 2e-5,
"loraRank": 16,
"loraAlpha": 32
}'
Training Data Format
Training data should be a JSONL file where each line is a conversation in the OpenAI chat format:
{"messages": [{"role": "system", "content": "You are a support agent."}, {"role": "user", "content": "How do I reset my password?"}, {"role": "assistant", "content": "Go to Settings > Security > Reset Password."}]}
{"messages": [{"role": "user", "content": "What are your hours?"}, {"role": "assistant", "content": "We're available 24/7 via chat."}]}
Monitor Progress
curl https://api.clusterprotocol.ai/api/finetune/jobs/ft_job_abc123 \
-H "Authorization: Bearer sk-cluster-YOUR_KEY"
# Response:
# {
# "id": "ft_job_abc123",
# "status": "running",
# "progress": 0.65,
# "currentEpoch": 2,
# "totalEpochs": 3,
# "trainLoss": 0.342
# }
Billing & Usage
Cluster Protocol uses an off-chain billing model: deposit crypto once, then balance is deducted per inference call. No gas fees per request. Check your balance and usage history via these endpoints.
| Endpoint | Description |
|---|---|
| /api/user/stats | Get balance, total calls, models deployed, and total spent. |
| /api/user/transactions | List recent transactions (deposits + usage deductions). |
| /api/pricing | Get pricing tiers for all models. |
| /api/stats | Public platform stats (total models, nodes, inferences, users). |
Dashboard Stats
curl https://api.clusterprotocol.ai/api/user/stats \
-H "Authorization: Bearer sk-cluster-YOUR_KEY"
How Billing Works
1. Send USDC or ETH via the deposit page or Coinbase Commerce. Funds appear in your balance.
2. Each inference deducts (input_tokens * input_price + output_tokens * output_price) atomically.
3. View detailed usage logs and transaction history via the dashboard or API.
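The deduction formula can be worked through with a short sketch. The per-token prices below are made up for illustration (real values come from /api/pricing), and both function names are hypothetical. `Decimal` is used so that very small per-token amounts do not pick up floating-point rounding error.

```python
from decimal import Decimal


def inference_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: Decimal,
    output_price: Decimal,
) -> Decimal:
    """input_tokens * input_price + output_tokens * output_price."""
    return input_tokens * input_price + output_tokens * output_price


def deduct(balance: Decimal, cost: Decimal) -> Decimal:
    """Return the new balance, refusing calls the balance cannot cover."""
    if cost > balance:
        raise ValueError("Insufficient balance")
    return balance - cost


# Hypothetical per-token prices in USD.
cost = inference_cost(28, 42, Decimal("0.0000005"), Decimal("0.0000015"))
print(cost)                           # 0.0000770
print(deduct(Decimal("1.00"), cost))  # 0.9999230
```

The token counts reuse the usage figures from the non-streaming response example (28 prompt tokens, 42 completion tokens).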
WebSocket
Connect to the admin WebSocket endpoint for real-time cluster events including node heartbeats, inference completions, and model deployments.
/ws/admin
Real-time WebSocket event stream for cluster monitoring.
Connection
const ws = new WebSocket("wss://api.clusterprotocol.ai/ws/admin");
ws.onopen = () => {
console.log("Connected to cluster events");
// Authenticate
ws.send(JSON.stringify({
type: "auth",
token: "sk-cluster-YOUR_KEY"
}));
};
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
console.log("Event:", data.type, data);
};
ws.onclose = () => console.log("Disconnected");
Event Types
| Event | Description |
|---|---|
| node:heartbeat | GPU node is alive and reporting stats |
| node:joined | New GPU node connected to the cluster |
| node:left | GPU node disconnected or timed out |
| inference:complete | An inference request completed |
| model:deployed | A model was deployed to a node |
| model:undeployed | A model was undeployed from a node |
Python SDK
The Cluster SDK wraps the REST API with a Pythonic interface. Full documentation on the SDK page.
pip install cluster-sdk
from cluster_sdk import ClusterClient
client = ClusterClient(api_key="sk-cluster-YOUR_KEY")
# List models
models = client.models.list()
for m in models:
    print(f"{m.id}: {m.name}")
# Chat completion
response = client.chat.completions.create(
    model="llama-3.1-70b",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
The SDK page covers installation, async usage, model management, fine-tuning, and more.
Errors
All errors return a JSON body with an error field containing a human-readable message and a code field containing a machine-readable error code, alongside the matching HTTP status code.
{
"error": "Insufficient balance. Current: $0.50, required: $1.20",
"code": "INSUFFICIENT_BALANCE"
}
Error Codes
Missing or invalid parameters. Check the request body.
Missing or invalid API key. Include Authorization: Bearer <key>.
Insufficient balance. Deposit funds before retrying.
Model or resource doesn't exist. Check the ID.
Resource already exists (e.g., duplicate email on register).
Too many requests. Back off and retry with exponential backoff.
Internal error. Retry or contact support.
No GPU nodes available for the requested model. Try again later.
Rate Limiting
Rate limits are applied per API key. When rate limited, implement exponential backoff starting at 1 second. Rate limit headers are included in all responses:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1714000060
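One way to implement the backoff described above is sketched below. The doubling schedule starts at 1 second as stated; the 60-second cap and the reading of X-RateLimit-Reset as a Unix timestamp in seconds are assumptions of this example, and both function names are hypothetical.

```python
from typing import List, Mapping


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 60.0) -> List[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def seconds_until_reset(headers: Mapping[str, str], now: float) -> float:
    """Seconds until the limit resets, assuming the header is a Unix timestamp."""
    reset = int(headers.get("X-RateLimit-Reset", 0))
    return max(0.0, reset - now)


print(backoff_delays(4))  # [1.0, 2.0, 4.0, 8.0]
print(seconds_until_reset({"X-RateLimit-Reset": "1714000060"}, now=1714000000))  # 60.0
```

A client could sleep for the larger of the two values before retrying, so that it neither retries before the reset nor hammers the API after repeated failures.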
/health
Health check. Returns 200 OK if the gateway is running.
curl https://api.clusterprotocol.ai/health
# { "status": "ok", "uptime": 86400 }