Cluster Protocol

Documentation

Everything you need to integrate Cluster Protocol into your app

Getting Started

Cluster Protocol provides a REST API for decentralized AI inference. All responses are JSON. The base URL for all endpoints is:

Base URL

https://api.clusterprotocol.ai

Quick start in 3 steps

1. Register

Create an account to get your API key

2. Add funds

Deposit USDC/ETH to your balance

3. Run inference

Call /v1/chat/completions

# 1. Register and get your API key
curl -X POST https://api.clusterprotocol.ai/api/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "dev@example.com", "password": "securepassword"}'

# Response:
# { "userId": "abc123", "email": "dev@example.com",
#   "apiKey": "sk-cluster-...", "balance": "0.00" }

# 2. Run your first inference
curl https://api.clusterprotocol.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-70b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Authentication

Cluster Protocol uses API keys for authentication. Pass your key via the Authorization header on every authenticated request.

http
Authorization: Bearer sk-cluster-YOUR_API_KEY
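As a minimal sketch of how this header might be attached in practice, the Python snippet below builds an authenticated request using only the standard library. The helper name `build_request` is hypothetical and not part of any Cluster SDK; the base URL and header format are taken from this page.

```python
import json
import urllib.request

BASE_URL = "https://api.clusterprotocol.ai"  # base URL from this page


def build_request(path, api_key, payload=None):
    """Build (but do not send) a request carrying the Bearer API key.

    `build_request` is an illustrative helper, not an official SDK call.
    """
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data)
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# GET /api/auth/me with a placeholder key.
req = build_request("/api/auth/me", "sk-cluster-YOUR_KEY")
```

Passing `req` to `urllib.request.urlopen` would send it; supplying a `payload` turns the request into a JSON POST.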

Register

POST
/api/auth/register

Create a new account. Returns userId, email, apiKey, and starting balance.

email (required)
string

Email address for the account

password (required)
string

Password (min 8 characters)

POST /api/auth/register
Content-Type: application/json

{
  "email": "dev@example.com",
  "password": "securepassword"
}

Login

POST
/api/auth/login

Log in with existing credentials. Returns the same response shape as Register.

email (required)
string

Your registered email

password (required)
string

Your password

Get Current User

GET
/api/auth/me

Retrieve the authenticated user's profile and balance.

Auth

curl https://api.clusterprotocol.ai/api/auth/me \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY"

x402 Payments

Cluster Protocol supports the x402 payment protocol — an open standard by Coinbase that enables pay-per-request access using USDC on Base. No account or API key needed. AI agents and developers can pay directly with their wallet.

How It Works

1. Request

Call any paid endpoint without authentication. The server returns HTTP 402 with payment details.

2. Pay

Sign a USDC payment on Base mainnet with your wallet, then resubmit the request with the X-PAYMENT header.

3. Access

The server verifies the payment via a facilitator and, if it is valid, serves the response. Settlement happens on-chain.

402 Response

When you call a paid endpoint without authentication, you receive a 402 response. Payment instructions are in the payment-required header (base64-encoded JSON):

# No auth header → triggers 402
curl -i -X POST https://api.clusterprotocol.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.3-70b-instruct", "messages": [{"role": "user", "content": "Hello"}]}'

# HTTP/2 402
# payment-required: <base64-encoded JSON>
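A small sketch of decoding that header in Python follows. The field names (`amount`, `network`, `payTo`) are taken from the protocol flow diagram later on this page, and the real payload may be nested differently; treating `amount` as USDC base units (6 decimals) is an assumption, although 3000 base units does match the listed $0.003 price. The `payTo` value is a placeholder.

```python
import base64
import json
from decimal import Decimal

USDC_DECIMALS = 6  # USDC uses 6 decimal places


def decode_payment_required(header_value):
    """Decode the base64-encoded JSON carried in the payment-required header."""
    return json.loads(base64.b64decode(header_value).decode("utf-8"))


def atomic_to_usdc(amount):
    """Convert an amount in USDC base units to a decimal USDC value."""
    return Decimal(int(amount)) / (Decimal(10) ** USDC_DECIMALS)


# Hypothetical payload for illustration; the real structure may differ.
sample = {"amount": 3000, "network": "eip155:8453", "payTo": "0xEXAMPLE"}
header = base64.b64encode(json.dumps(sample).encode("utf-8")).decode("ascii")

details = decode_payment_required(header)
print(details["network"], atomic_to_usdc(details["amount"]))
```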

Pricing

| Endpoint | Price (USDC) |
|---|---|
| POST /v1/chat/completions | $0.003 |
| POST /v1/embeddings | $0.0005 |
| POST /v1/images/generations | $0.02 |
| POST /v1/audio/transcriptions | $0.006 |
| POST /v1/audio/speech | $0.015 |
| POST /v1/rerank | $0.001 |

Supported Networks

| Network | CAIP-2 ID | Token |
|---|---|---|
| Base Mainnet | eip155:8453 | USDC |

Client Integration

import { fetchWithPayment } from "@x402/fetch";
import { CdpWallet } from "@coinbase/cdp-sdk";

// Create a wallet on Base mainnet
const wallet = await CdpWallet.create({ networkId: "base-mainnet" });

// fetchWithPayment handles 402 → sign → resubmit automatically
const response = await fetchWithPayment(
  "https://api.clusterprotocol.ai/v1/chat/completions",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama-3.3-70b-instruct",
      provider: "venice",  // optional: route through Venice
      messages: [{ role: "user", content: "Hello!" }]
    })
  },
  wallet  // signs USDC payment on Base
);

console.log(await response.json());

Pricing (per request)

| Endpoint | Price | Token |
|---|---|---|
| /v1/chat/completions | $0.003 | USDC on Base |
| /v1/embeddings | $0.001 | USDC on Base |
| /v1/images/generations | $0.02 | USDC on Base |
| /v1/audio/transcriptions | $0.005 | USDC on Base |
| /v1/audio/speech | $0.01 | USDC on Base |
| /v1/rerank | $0.001 | USDC on Base |

Key Benefits

  • No signup or API key required — just pay and use
  • AI agents can autonomously pay for inference
  • Zero protocol fees — you pay only the listed price
  • Settlement in USDC on Base (roughly 2-second block times)
  • Open standard — compatible with any x402 client

Venice AI via x402 (Private + Permissionless)

Combine x402 with Venice provider pinning for fully private, permissionless AI inference. No API key, no account — your wallet pays USDC on Base and you get E2EE private inference through Venice. Prompts and responses are never stored.

import { fetchWithPayment } from "@x402/fetch";
import { ethers } from "ethers";

// Any EVM wallet works (MetaMask, Coinbase, etc.)
const wallet = new ethers.Wallet("YOUR_PRIVATE_KEY");

// Venice private inference via x402 — no API key needed
const response = await fetchWithPayment(
  "https://api.clusterprotocol.ai/v1/chat/completions",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama-3.3-70b-instruct",
      provider: "venice",  // ← routes to Venice (E2EE, no logs)
      messages: [
        { role: "user", content: "Explain ERC-7710 delegation" }
      ],
      max_tokens: 512
    })
  },
  wallet  // signs $0.003 USDC on Base automatically
);

const data = await response.json();
console.log(data.choices[0].message.content);

Venice Models via x402

These models are verified working with x402 + Venice provider pinning:

| Model | Slug | x402 Price | Best For |
|---|---|---|---|
| Llama 3.3 70B | llama-3.3-70b-instruct | $0.003 | General chat, reasoning |
| Qwen 3.5 35B | qwen3.5-35b-a3b | $0.003 | Fast reasoning |
| Qwen 3.5 397B | qwen3.5-397b-a17b | $0.003 | Heavy reasoning, complex tasks |
| Mistral Small 3.2 | mistral-small-3.2-24b | $0.003 | Fast, efficient |
| Gemma 4 31B | gemma-4-31b-it | $0.003 | Instruction following |

Quick Setup (npm)

npm install @x402/fetch ethers
# or with Coinbase CDP:
npm install @x402/fetch @coinbase/cdp-sdk

x402 Protocol Flow

Your Agent/App            Cluster Protocol            Base Chain (L2)
      │                           │                          │
      ├── POST /v1/chat ────────►│                          │
      │   (no auth header)        │                          │
      │                           │                          │
      │◄── HTTP 402 ─────────────┤                          │
      │    payment-required:      │                          │
      │    {amount: 3000,         │                          │
      │     network: eip155:8453, │                          │
      │     payTo: 0x6839...}     │                          │
      │                           │                          │
      ├── Sign EIP-3009 ─────────────────────────────────► │
      │   (USDC transferWithAuth) │                          │
      │                           │                          │
      ├── POST /v1/chat ────────►│                          │
      │   X-PAYMENT: <signed>     │── verify on-chain ─────►│
      │                           │                          │
      │◄── HTTP 200 ─────────────┤                          │
      │    {choices: [...]}       │                          │
      │    (Venice E2EE response) │                          │

Inference

The inference endpoint is fully OpenAI-compatible. Use any OpenAI client library by setting its base URL to https://api.clusterprotocol.ai/v1 and supplying your Cluster Protocol API key. Both streaming (SSE) and non-streaming responses are supported.

POST
/v1/chat/completions

OpenAI-compatible chat completion. Supports streaming via SSE.

Auth

Request Body

model (required)
string

Model ID (e.g. "llama-3.3-70b-instruct", "deepseek-v3.2")

messages (required)
array

Array of message objects with role and content

provider
string

Pin to a specific provider: "venice", "groq", "together", "fireworks", "deepinfra", "sambanova", "phala", "redpill", "openrouter"

stream
boolean

Enable SSE streaming (default: false)

temperature
number

Sampling temperature 0-2 (default: 1)

max_tokens
number

Maximum tokens to generate

top_p
number

Nucleus sampling parameter (default: 1)

stop
string[]

Stop sequences
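The parameters above can be assembled into a request body along these lines. This is a sketch only: `build_chat_payload` is a hypothetical helper, and the client-side checks (non-empty messages, temperature within the documented 0-2 range) are assumptions about sensible validation rather than documented server behavior.

```python
def build_chat_payload(model, messages, provider=None, stream=False,
                       temperature=None, max_tokens=None, top_p=None, stop=None):
    """Assemble a /v1/chat/completions body, omitting unset optional fields."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    payload = {"model": model, "messages": messages}
    optional = {
        "provider": provider,        # omit for automatic routing
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
        "stop": stop,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    if stream:
        payload["stream"] = True     # the documented default is false
    return payload


pinned = build_chat_payload(
    "llama-3.3-70b-instruct",
    [{"role": "user", "content": "Hello!"}],
    provider="venice",
    temperature=0.7,
)
```

Leaving `provider` unset keeps the field out of the body entirely, which corresponds to automatic routing.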

Non-Streaming

curl -X POST https://api.clusterprotocol.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in 2 sentences."}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'

Response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "llama-3.1-70b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses qubits that can exist in superposition..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 42,
    "total_tokens": 70
  }
}

Streaming (SSE)

Set stream: true to receive tokens as Server-Sent Events. Each event contains a delta with the next token. The stream ends with [DONE].

curl -X POST https://api.clusterprotocol.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-70b",
    "messages": [{"role": "user", "content": "Write a haiku about AI"}],
    "stream": true
  }'

# Response (SSE):
# data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Silicon"}}]}
# data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" dreams"}}]}
# data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" awaken"}}]}
# ...
# data: [DONE]
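One way to consume such a stream is sketched below: a generator that takes already-decoded text lines and yields the content of each delta until the [DONE] sentinel. The function name `iter_sse_content` is illustrative, and the sample lines mirror the response shown above.

```python
import json


def iter_sse_content(lines):
    """Yield the text of each delta from an iterable of SSE text lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


sample_stream = [
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Silicon"}}]}',
    "",
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" dreams"}}]}',
    "",
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" awaken"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_content(sample_stream)))  # Silicon dreams awaken
```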

Pricing

Each model has independent per-token pricing for input and output tokens. Balance is deducted atomically per call. Check current pricing via the pricing endpoint.

GET
/api/pricing

Get pricing tiers for all available models.

Provider Routing

Cluster Protocol routes inference across multiple providers for maximum uptime and speed. By default, the system picks the optimal provider automatically. You can override this by passing the provider field in your request body.

Available Providers

| Provider | Slug | Highlight |
|---|---|---|
| Venice AI | venice | Private inference, no data retention, E2EE |
| Groq | groq | Ultra-low latency (LPU hardware) |
| Together AI | together | Widest model selection |
| Fireworks AI | fireworks | High throughput, competitive pricing |
| DeepInfra | deepinfra | Cheapest per-token rates |
| SambaNova | sambanova | Enterprise-grade, high context |
| Phala TEE | phala | Hardware-level privacy (Intel TDX), ~10 TEE-native models |
| Red Pill | redpill | 60+ models via NearAI, Chutes, Tinfoil sub-providers |
| OpenRouter | openrouter | Aggregated access to 200+ models |

Venice AI — Private Inference

Venice AI provides fully private inference with end-to-end encryption. Your prompts and responses are never stored or logged. Pass "provider": "venice" to route through Venice.

curl -X POST https://api.clusterprotocol.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-instruct",
    "provider": "venice",
    "messages": [
      {"role": "user", "content": "Explain zero-knowledge proofs simply."}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'

Venice Models

Models available on Venice (use these exact slugs with "provider": "venice"):

| Model | Slug | Use Case |
|---|---|---|
| Llama 3.3 70B | llama-3.3-70b-instruct | General chat, reasoning |
| Qwen 3.5 35B | qwen3.5-35b-a3b | Fast reasoning |
| Qwen 3.5 397B | qwen3.5-397b-a17b | Large-scale reasoning |
| Gemma 4 31B | gemma-4-31b-it | Instruction following |
| Mistral Small 3.2 | mistral-small-3.2-24b | Fast, efficient |

Auto vs Pinned Routing

Auto (default)

Omit the provider field. The system picks the fastest available provider for the model. Best for most use cases.

Pinned

Set the provider field (for example, "provider": "venice") to force routing to a specific provider. Use this for privacy (Venice/Phala), TEE models (Phala/Red Pill), or latency (Groq).

Venice AI — Private Inference

Venice AI provides fully private, end-to-end encrypted inference. Your prompts and responses are never stored, never logged, and never used for training. Venice is our recommended provider when privacy is critical — financial data, personal information, or sensitive prompts.

Full Venice AI Documentation: 75+ models, all endpoints, code examples

End-to-End Encrypted

Prompts encrypted in transit and at rest. Venice cannot read your data.

Zero Data Retention

Nothing stored after response. No logs, no training data collection.

No Account Needed

Use via x402 (wallet pays) or via API key. Both fully private.

Via API Key

curl -X POST https://api.clusterprotocol.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-instruct",
    "provider": "venice",
    "messages": [
      {"role": "user", "content": "Analyze this financial data privately..."}
    ],
    "temperature": 0.7,
    "max_tokens": 1024
  }'

Via x402 (No Account, Wallet Pays)

import { fetchWithPayment } from "@x402/fetch";
import { ethers } from "ethers";

const wallet = new ethers.Wallet(process.env.PRIVATE_KEY!);

// Private Venice inference — no API key, no account, wallet pays $0.003
const response = await fetchWithPayment(
  "https://api.clusterprotocol.ai/v1/chat/completions",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama-3.3-70b-instruct",
      provider: "venice",
      messages: [{ role: "user", content: "What are ERC-7710 delegations?" }],
      max_tokens: 512
    })
  },
  wallet
);

const data = await response.json();
console.log(data.choices[0].message.content);

Verified Venice Models

| Model | Slug | Best For |
|---|---|---|
| Llama 3.3 70B | llama-3.3-70b-instruct | General chat, reasoning, coding |
| Qwen 3.5 35B | qwen3.5-35b-a3b | Fast reasoning, lightweight |
| Qwen 3.5 397B | qwen3.5-397b-a17b | Heavy reasoning, complex tasks |
| Mistral Small 3.2 | mistral-small-3.2-24b | Fast, efficient, multilingual |
| Gemma 4 31B | gemma-4-31b-it | Instruction following, safe |

Venice Response Format

{
  "id": "chatcmpl-RUpKyiqi14G6MrwbPrOk7iJs",
  "object": "chat.completion",
  "model": "llama-3.3-70b",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "..." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 1507, "completion_tokens": 42, "total_tokens": 1549 },
  "venice_parameters": {
    "enable_e2ee": true,
    "include_venice_system_prompt": true,
    "strip_thinking_response": false
  }
}

Phala TEE — Hardware Privacy

Phala Network provides hardware-level privacy through Trusted Execution Environments (TEEs). Unlike software-based encryption, a TEE is designed so that even the server operator cannot access your data: inference runs inside Intel TDX or NVIDIA Confidential Computing enclaves.

Intel TDX Enclaves

Inference runs inside hardware-isolated enclaves. CPU-level isolation from host OS.

Verifiable Computation

Remote attestation proves your request ran in a genuine TEE. Cryptographic proof.

Zero-Trust Model

Neither Cluster nor Phala can see your prompts. Hardware enforces privacy.

Via API Key

curl -X POST https://api.clusterprotocol.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-instruct",
    "provider": "phala",
    "messages": [
      {"role": "user", "content": "Process this sensitive medical record..."}
    ],
    "max_tokens": 1024
  }'

Via x402 (No Account, Wallet Pays)

import { fetchWithPayment } from "@x402/fetch";
import { ethers } from "ethers";

const wallet = new ethers.Wallet(process.env.PRIVATE_KEY!);

// Hardware-private inference via x402 — no API key needed
const response = await fetchWithPayment(
  "https://api.clusterprotocol.ai/v1/chat/completions",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama-3.3-70b-instruct",
      provider: "phala",
      messages: [{ role: "user", content: "Analyze this contract..." }],
      max_tokens: 1024
    })
  },
  wallet
);

const data = await response.json();
console.log(data.choices[0].message.content);

Venice vs Phala — When to Use

| Feature | Venice AI | Phala TEE |
|---|---|---|
| Privacy method | E2EE + no-log policy | Hardware enclave (Intel TDX) |
| Trust model | Trust Venice's no-log claim | Zero-trust (hardware enforced) |
| Verification | Audit-based | Remote attestation (cryptographic) |
| Speed | Fast (optimized GPU clusters) | Slightly slower (enclave overhead) |
| Best for | General privacy, financial data | Regulated industries, compliance |
| x402 support | Yes ($0.003/req) | Yes ($0.003/req) |

Models

Browse public models, upload your own, deploy to GPU nodes, and download model weights.

GET
/api/models/public

List all public models with metadata, pricing, and stats.

GET
/api/models/public/:id

Get details for a single model by ID.

POST
/api/models/upload

Upload a model file (multipart/form-data).

Auth

POST
/api/models/:id/deploy

Deploy a model to an available GPU node.

Auth

GET
/api/models/:id/versions

Get version history for a model.

Auth

GET
/api/models/:id/download

Get a presigned S3 URL to download model weights.

Auth

List Models

curl https://api.clusterprotocol.ai/api/models/public

Upload a Model

curl -X POST https://api.clusterprotocol.ai/api/models/upload \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY" \
  -F "file=@my-model.safetensors" \
  -F "name=my-custom-model" \
  -F "category=chat" \
  -F "description=A fine-tuned model for customer support"

Deploy a Model

bash
curl -X POST https://api.clusterprotocol.ai/api/models/llama-3.1-70b/deploy \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY"

# Response: { "status": "deploying", "nodeId": "node_xyz" }

Fine-Tuning

Fine-tune models using LoRA adapters. Upload a JSONL training file and configure hyperparameters. Jobs run on the cluster's GPU nodes, and you can monitor progress in real time.

POST
/api/finetune/jobs

Create a new fine-tuning job.

Auth

GET
/api/finetune/jobs

List all your fine-tuning jobs.

Auth

GET
/api/finetune/jobs/:id

Get status and progress of a specific job.

Auth

POST
/api/finetune/jobs/:id/cancel

Cancel a running job.

Auth

Create a Fine-Tuning Job

baseModel (required)
string

Model ID to fine-tune (e.g. "llama-3.1-8b")

trainingFile (required)
string

URL or path to JSONL training data

suffix
string

Custom suffix for the fine-tuned model name

epochs
number

Number of training epochs (default: 3)

learningRate
number

Learning rate (default: 2e-5)

loraRank
number

LoRA rank (default: 16)

loraAlpha
number

LoRA alpha (default: 32)

curl -X POST https://api.clusterprotocol.ai/api/finetune/jobs \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "baseModel": "llama-3.1-8b",
    "trainingFile": "https://example.com/data.jsonl",
    "suffix": "customer-support-v1",
    "epochs": 3,
    "learningRate": 2e-5,
    "loraRank": 16,
    "loraAlpha": 32
  }'

Training Data Format

Training data should be a JSONL file where each line is a conversation in the OpenAI chat format:

jsonl
{"messages": [{"role": "system", "content": "You are a support agent."}, {"role": "user", "content": "How do I reset my password?"}, {"role": "assistant", "content": "Go to Settings > Security > Reset Password."}]}
{"messages": [{"role": "user", "content": "What are your hours?"}, {"role": "assistant", "content": "We're available 24/7 via chat."}]}
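Before uploading, it can help to sanity-check the file locally. The sketch below checks only the shape visible in the two example lines (a non-empty `messages` array whose entries have a known role and string content); the server's actual validation rules are not documented here, so treat the role set and checks as assumptions. `validate_training_jsonl` is a hypothetical helper name.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}  # roles used in the examples above


def validate_training_jsonl(text):
    """Return (line_number, problem) tuples; an empty list means no problems found."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, "missing or empty 'messages' array"))
            continue
        for message in messages:
            if not isinstance(message, dict) or message.get("role") not in ALLOWED_ROLES:
                problems.append((number, "message has an unexpected role"))
            elif not isinstance(message.get("content"), str):
                problems.append((number, "message content must be a string"))
    return problems


good = '{"messages": [{"role": "user", "content": "What are your hours?"}, {"role": "assistant", "content": "We\'re available 24/7 via chat."}]}'
print(validate_training_jsonl(good))  # []
```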

Monitor Progress

bash
curl https://api.clusterprotocol.ai/api/finetune/jobs/ft_job_abc123 \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY"

# Response:
# {
#   "id": "ft_job_abc123",
#   "status": "running",
#   "progress": 0.65,
#   "currentEpoch": 2,
#   "totalEpochs": 3,
#   "trainLoss": 0.342
# }
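For a polling loop or log line, the status object above could be condensed as follows. This is an illustrative sketch: `summarize_job` is a hypothetical helper, and it relies only on the fields shown in the example response.

```python
def summarize_job(job):
    """Format a fine-tuning job status object as a one-line progress summary."""
    percent = f"{job['progress']:.0%}"
    return (f"{job['id']}: {job['status']} {percent} "
            f"(epoch {job['currentEpoch']}/{job['totalEpochs']}, "
            f"loss {job['trainLoss']:.3f})")


job = {"id": "ft_job_abc123", "status": "running", "progress": 0.65,
       "currentEpoch": 2, "totalEpochs": 3, "trainLoss": 0.342}
print(summarize_job(job))
# ft_job_abc123: running 65% (epoch 2/3, loss 0.342)
```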

Billing & Usage

Cluster Protocol uses an off-chain billing model: deposit crypto once, then balance is deducted per inference call. No gas fees per request. Check your balance and usage history via these endpoints.

GET
/api/user/stats

Get balance, total calls, models deployed, and total spent.

Auth

GET
/api/user/transactions

List recent transactions (deposits + usage deductions).

Auth

GET
/api/pricing

Get pricing tiers for all models.

GET
/api/stats

Public platform stats (total models, nodes, inferences, users).

Dashboard Stats

curl https://api.clusterprotocol.ai/api/user/stats \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY"

How Billing Works

Deposit

Send USDC or ETH via the deposit page or Coinbase Commerce. Funds appear in your balance.

Per-call deduction

Each inference deducts (input_tokens * input_price + output_tokens * output_price) atomically.

Track usage

View detailed usage logs and transaction history via the dashboard or API.
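The per-call deduction formula can be worked through in a few lines. The prices below are hypothetical placeholders (actual per-token prices come from GET /api/pricing), and the token counts are borrowed from the example completion response earlier on this page. `Decimal` is used to avoid floating-point rounding in currency amounts.

```python
from decimal import Decimal


def inference_cost(input_tokens, output_tokens, input_price, output_price):
    """Apply input_tokens * input_price + output_tokens * output_price."""
    return input_tokens * Decimal(input_price) + output_tokens * Decimal(output_price)


# Hypothetical per-token prices in USD, for illustration only.
cost = inference_cost(28, 42, "0.0000005", "0.000001")
print(cost)
```

With these placeholder prices, 28 input tokens and 42 output tokens cost 0.000014 + 0.000042 = 0.000056 USD.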

WebSocket

Connect to the admin WebSocket endpoint for real-time cluster events including node heartbeats, inference completions, and model deployments.

WS
/ws/admin

Real-time WebSocket event stream for cluster monitoring.

Auth

Connection

const ws = new WebSocket("wss://api.clusterprotocol.ai/ws/admin");

ws.onopen = () => {
  console.log("Connected to cluster events");
  // Authenticate
  ws.send(JSON.stringify({
    type: "auth",
    token: "sk-cluster-YOUR_KEY"
  }));
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log("Event:", data.type, data);
};

ws.onclose = () => console.log("Disconnected");

Event Types

| Event | Description |
|---|---|
| node:heartbeat | GPU node is alive and reporting stats |
| node:joined | New GPU node connected to the cluster |
| node:left | GPU node disconnected or timed out |
| inference:complete | An inference request completed |
| model:deployed | A model was deployed to a node |
| model:undeployed | A model was undeployed from a node |
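A consumer of this stream will typically branch on the event type. The sketch below tallies incoming messages by type; it assumes only that each message is a JSON object with a `type` field, since the rest of the payload is not documented here. The sample messages and the `record_event` helper are hypothetical.

```python
import json
from collections import Counter

KNOWN_EVENTS = {
    "node:heartbeat", "node:joined", "node:left",
    "inference:complete", "model:deployed", "model:undeployed",
}


def record_event(raw_message, counts):
    """Parse one WebSocket message and tally it by event type."""
    event = json.loads(raw_message)
    event_type = event.get("type")
    if event_type in KNOWN_EVENTS:
        counts[event_type] += 1
    else:
        counts["unknown"] += 1  # anything not listed in the event table
    return event_type


counts = Counter()
for message in ('{"type": "node:joined"}', '{"type": "node:heartbeat"}',
                '{"type": "node:heartbeat"}'):
    record_event(message, counts)
print(counts["node:heartbeat"])  # 2
```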

Python SDK

The Cluster SDK wraps the REST API with a Pythonic interface. Full documentation is available on the SDK page.

bash
pip install cluster-sdk

python
from cluster_sdk import ClusterClient

client = ClusterClient(api_key="sk-cluster-YOUR_KEY")

# List models
models = client.models.list()
for m in models:
    print(f"{m.id}: {m.name}")

# Chat completion
response = client.chat.completions.create(
    model="llama-3.1-70b",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

Full SDK Documentation

Installation, async usage, model management, fine-tuning, and more.

Errors

All errors return a JSON body with an error field containing a human-readable message and a code field containing a machine-readable identifier, along with the corresponding HTTP status code.

json
{
  "error": "Insufficient balance. Current: $0.50, required: $1.20",
  "code": "INSUFFICIENT_BALANCE"
}

Error Codes

400
Bad Request

Missing or invalid parameters. Check the request body.

401
Unauthorized

Missing or invalid API key. Include Authorization: Bearer <key>.

402
Payment Required

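Insufficient balance. Deposit funds before retrying. Also returned, with payment instructions, when a paid endpoint is called without authentication (see x402 Payments).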

404
Not Found

Model or resource doesn't exist. Check the ID.

409
Conflict

Resource already exists (e.g., duplicate email on register).

429
Rate Limited

Too many requests. Back off and retry with exponential backoff.

500
Server Error

Internal error. Retry or contact support.

503
Unavailable

No GPU nodes available for the requested model. Try again later.
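Client code might wrap these responses in an exception type that records whether a retry is worthwhile. In the sketch below, `ClusterAPIError` and `parse_error` are hypothetical names, and the retryable set (429, 500, 503) is simply a reading of the guidance in the list above rather than an official policy.

```python
import json

RETRYABLE_STATUSES = {429, 500, 503}  # statuses this page suggests retrying


class ClusterAPIError(Exception):
    """Hypothetical exception type wrapping an error response."""

    def __init__(self, status, message, code=None):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message
        self.code = code

    @property
    def retryable(self):
        return self.status in RETRYABLE_STATUSES


def parse_error(status, body):
    """Turn an HTTP status and JSON error body into a ClusterAPIError."""
    try:
        data = json.loads(body)
    except (TypeError, ValueError):
        data = {}  # body was missing or not valid JSON
    if not isinstance(data, dict):
        data = {}
    return ClusterAPIError(status, data.get("error", "unknown error"), data.get("code"))


err = parse_error(402, '{"error": "Insufficient balance.", "code": "INSUFFICIENT_BALANCE"}')
print(err.code, err.retryable)  # INSUFFICIENT_BALANCE False
```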

Rate Limiting

Rate limits are applied per API key. When rate limited, implement exponential backoff starting at 1 second. Rate limit headers are included in all responses:

http
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1714000060
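The backoff schedule and the reset header can be combined roughly as follows. This sketch assumes X-RateLimit-Reset is a Unix timestamp in seconds, which the example value suggests but this page does not state; both helper names are illustrative.

```python
import time


def backoff_delays(max_retries, base=1.0):
    """Yield exponentially growing delays: base, 2*base, 4*base, ..."""
    for attempt in range(max_retries):
        yield base * (2 ** attempt)


def seconds_until_reset(headers, now=None):
    """Seconds until X-RateLimit-Reset, assuming it is a Unix timestamp."""
    now = time.time() if now is None else now
    return max(0.0, float(headers["X-RateLimit-Reset"]) - now)


print(list(backoff_delays(4)))  # [1.0, 2.0, 4.0, 8.0]
```

A client could sleep for the larger of the current backoff delay and `seconds_until_reset(...)` before retrying a 429 response.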

GET
/health

Health check. Returns 200 OK if the gateway is running.

bash
curl https://api.clusterprotocol.ai/health
# { "status": "ok", "uptime": 86400 }