Documentation

Everything you need to integrate Cluster Protocol into your app

Getting Started

Cluster Protocol provides a REST API for decentralized AI inference. All responses are JSON. The base URL for all endpoints is:

Base URL

https://api.clusterprotocol.ai

Quick start in 3 steps

1. Register

Create an account to get your API key

2. Add funds

Deposit USDC/ETH to your balance

3. Run inference

Call /v1/chat/completions

# 1. Register and get your API key
curl -X POST https://api.clusterprotocol.ai/api/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "dev@example.com", "password": "securepassword"}'

# Response:
# { "userId": "abc123", "email": "dev@example.com",
#   "apiKey": "sk-cluster-...", "balance": "0.00" }

# 3. Run your first inference (after adding funds in step 2)
curl https://api.clusterprotocol.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-70b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Authentication

Cluster Protocol uses API keys for authentication. Pass your key via the Authorization header on every authenticated request.

Authorization: Bearer sk-cluster-YOUR_API_KEY

Register

POST /api/auth/register

Create a new account. Returns userId, email, apiKey, and starting balance.

Parameter	Type	Required	Description
`email`	string	Yes	Email address for the account
`password`	string	Yes	Password (min 8 characters)

POST /api/auth/register
Content-Type: application/json

{
  "email": "dev@example.com",
  "password": "securepassword"
}

Login

POST /api/auth/login

Parameter	Type	Required	Description
`email`	string	Yes	Your registered email
`password`	string	Yes	Your password

Get Current User

GET /api/auth/me

Retrieve the authenticated user's profile and balance. Requires authentication.

curl https://api.clusterprotocol.ai/api/auth/me \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY"

x402 Payments

Cluster Protocol supports the x402 payment protocol — an open standard by Coinbase that enables pay-per-request access using USDC on Base. No account or API key needed. AI agents and developers can pay directly with their wallet.

How It Works

1. Request

Call any paid endpoint without authentication. Server returns HTTP 402 with payment details.

2. Pay

Sign a USDC payment on Base mainnet using your wallet. Resubmit with X-PAYMENT header.

3. Access

Server verifies payment via facilitator. If valid, serves the response. Settlement is on-chain.

402 Response

When you call a paid endpoint without authentication, you receive a 402 response. Payment instructions are in the payment-required header (base64-encoded JSON):

# No auth header → triggers 402
curl -i -X POST https://api.clusterprotocol.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.3-70b-instruct", "messages": [{"role": "user", "content": "Hello"}]}'

# HTTP/2 402
# payment-required: <base64-encoded JSON>
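
The header value can be decoded with a base64 and JSON round trip. The following is a minimal Python sketch, not a reference client: the field names (`amount`, `network`, `payTo`) are taken from the protocol flow diagram in this document, the sample values are placeholders, and the reading of `amount` as USDC's smallest unit (6 decimals, so 3000 would correspond to $0.003) is an assumption inferred from the listed price.

```python
import base64
import json
from decimal import Decimal

USDC_DECIMALS = 6  # USDC uses 6 decimal places on-chain


def decode_payment_required(header_value: str) -> dict:
    """Decode a base64-encoded JSON header value into a dict."""
    return json.loads(base64.b64decode(header_value))


def atomic_to_usdc(amount: int) -> Decimal:
    """Convert an amount in USDC's smallest unit to whole USDC."""
    return Decimal(amount).scaleb(-USDC_DECIMALS)


# Placeholder payload standing in for a real 402 response header.
sample = {"amount": 3000, "network": "eip155:8453", "payTo": "0xPAYEE_ADDRESS"}
header = base64.b64encode(json.dumps(sample).encode()).decode()

details = decode_payment_required(header)
print(details["network"], atomic_to_usdc(details["amount"]))
```

Inspect a real 402 response before relying on any of these field names.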

Pricing

Endpoint	Price (USDC)
`POST /v1/chat/completions`	$0.003
`POST /v1/embeddings`	$0.0005
`POST /v1/images/generations`	$0.02
`POST /v1/audio/transcriptions`	$0.006
`POST /v1/audio/speech`	$0.015
`POST /v1/rerank`	$0.001

Supported Networks

Network	CAIP-2 ID	Token
Base Mainnet	`eip155:8453`	USDC

Client Integration

import { fetchWithPayment } from "@x402/fetch";
import { CdpWallet } from "@coinbase/cdp-sdk";

// Create a wallet on Base mainnet
const wallet = await CdpWallet.create({ networkId: "base-mainnet" });

// fetchWithPayment handles 402 → sign → resubmit automatically
const response = await fetchWithPayment(
  "https://api.clusterprotocol.ai/v1/chat/completions",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama-3.3-70b-instruct",
      provider: "venice",  // optional: route through Venice
      messages: [{ role: "user", content: "Hello!" }]
    })
  },
  wallet  // signs USDC payment on Base
);

console.log(await response.json());

Pricing (per request)

Endpoint	Price	Token
/v1/chat/completions	$0.003	USDC on Base
/v1/embeddings	$0.001	USDC on Base
/v1/images/generations	$0.02	USDC on Base
/v1/audio/transcriptions	$0.005	USDC on Base
/v1/audio/speech	$0.01	USDC on Base
/v1/rerank	$0.001	USDC on Base

Key Benefits

• No signup or API key required: just pay and use
• AI agents can autonomously pay for inference
• Zero protocol fees: you pay only the listed price
• Settlement in USDC on Base (~2-second block time)
• Open standard: compatible with any x402 client

Venice AI via x402 (Private + Permissionless)

Combine x402 with Venice provider pinning for fully private, permissionless AI inference. No API key, no account — your wallet pays USDC on Base and you get E2EE private inference through Venice. Prompts and responses are never stored.

import { fetchWithPayment } from "@x402/fetch";
import { ethers } from "ethers";

// Any EVM wallet key works; load it from the environment rather than hardcoding it
const wallet = new ethers.Wallet(process.env.PRIVATE_KEY!);

// Venice private inference via x402 — no API key needed
const response = await fetchWithPayment(
  "https://api.clusterprotocol.ai/v1/chat/completions",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama-3.3-70b-instruct",
      provider: "venice",  // ← routes to Venice (E2EE, no logs)
      messages: [
        { role: "user", content: "Explain ERC-7710 delegation" }
      ],
      max_tokens: 512
    })
  },
  wallet  // signs $0.003 USDC on Base automatically
);

const data = await response.json();
console.log(data.choices[0].message.content);

Venice Models via x402

These models are verified working with x402 + Venice provider pinning:

Model	Slug	x402 Price	Best For
Llama 3.3 70B	`llama-3.3-70b-instruct`	$0.003	General chat, reasoning
Qwen 3.5 35B	`qwen3.5-35b-a3b`	$0.003	Fast reasoning
Qwen 3.5 397B	`qwen3.5-397b-a17b`	$0.003	Heavy reasoning, complex tasks
Mistral Small 3.2	`mistral-small-3.2-24b`	$0.003	Fast, efficient
Gemma 4 31B	`gemma-4-31b-it`	$0.003	Instruction following

Quick Setup (npm)

npm install @x402/fetch ethers
# or with Coinbase CDP:
npm install @x402/fetch @coinbase/cdp-sdk

x402 Protocol Flow

Your Agent/App            Cluster Protocol            Base Chain (L2)
      │                           │                          │
      ├── POST /v1/chat ────────►│                          │
      │   (no auth header)        │                          │
      │                           │                          │
      │◄── HTTP 402 ─────────────┤                          │
      │    payment-required:      │                          │
      │    {amount: 3000,         │                          │
      │     network: eip155:8453, │                          │
      │     payTo: 0x6839...}     │                          │
      │                           │                          │
      ├── Sign EIP-3009 ─────────────────────────────────► │
      │   (USDC transferWithAuth) │                          │
      │                           │                          │
      ├── POST /v1/chat ────────►│                          │
      │   X-PAYMENT: <signed>     │── verify on-chain ─────►│
      │                           │                          │
      │◄── HTTP 200 ─────────────┤                          │
      │    {choices: [...]}       │                          │
      │    (Venice E2EE response) │                          │
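
The request, 402, sign, resubmit loop in the diagram can be sketched in a few lines of Python. This is an illustration of the control flow only: `send` and `sign` are hypothetical callables injected by the caller, `fake_send` is a stand-in server, and real EIP-3009 signing is not implemented here.

```python
from typing import Callable, Dict, Tuple

# (status_code, headers, body) as returned by the injected transport.
Response = Tuple[int, Dict[str, str], str]
Send = Callable[[Dict[str, str]], Response]


def request_with_payment(send: Send, sign: Callable[[str], str]) -> Response:
    """Try unauthenticated; on 402, sign the payment details and retry once."""
    status, headers, body = send({})
    if status != 402:
        return status, headers, body
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    payment = sign(lowered["payment-required"])
    return send({"X-PAYMENT": payment})


# Stand-in server: demands payment unless an X-PAYMENT header is present.
def fake_send(extra_headers: Dict[str, str]) -> Response:
    if "X-PAYMENT" in extra_headers:
        return 200, {}, '{"choices": []}'
    return 402, {"payment-required": "BASE64_DETAILS"}, ""


result = request_with_payment(fake_send, sign=lambda details: "signed:" + details)
print(result[0])
```

Retrying only once keeps a rejected payment from turning into a loop of repeated signing attempts.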

Inference

The inference endpoint is fully OpenAI-compatible. Use any OpenAI client library by pointing it to Cluster Protocol. Supports both streaming (SSE) and non-streaming responses.

POST /v1/chat/completions

OpenAI-compatible chat completion. Supports streaming via SSE. Requires authentication.

Request Body

Parameter	Type	Required	Description
`model`	string	Yes	Model ID (e.g. "llama-3.3-70b-instruct", "deepseek-v3.2")
`messages`	array	Yes	Array of message objects with role and content
`provider`	string	No	Pin to a specific provider: "venice", "groq", "together", "fireworks", "deepinfra", "sambanova", "phala", "redpill"
`stream`	boolean	No	Enable SSE streaming (default: false)
`temperature`	number	No	Sampling temperature, 0 to 2 (default: 1)
`max_tokens`	number	No	Maximum tokens to generate
`top_p`	number	No	Nucleus sampling parameter (default: 1)
`stop`	string[]	No	Stop sequences
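
As an illustration of how these fields fit together, here is a small Python sketch that assembles a request body and checks the documented temperature range on the client side. The helper name `build_chat_request` is hypothetical and not part of any SDK; the server remains the authority on validation.

```python
from typing import Any, Dict, List, Optional


def build_chat_request(
    model: str,
    messages: List[Dict[str, str]],
    provider: Optional[str] = None,
    stream: bool = False,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a /v1/chat/completions body, omitting unset optional fields."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must contain at least one message")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    body: Dict[str, Any] = {"model": model, "messages": messages}
    if provider is not None:
        body["provider"] = provider
    if stream:
        body["stream"] = True
    if temperature is not None:
        body["temperature"] = temperature
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    return body


body = build_chat_request(
    "llama-3.3-70b-instruct",
    [{"role": "user", "content": "Hello!"}],
    temperature=0.7,
    max_tokens=256,
)
print(body)
```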

Non-Streaming

curl -X POST https://api.clusterprotocol.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in 2 sentences."}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "llama-3.1-70b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses qubits that can exist in superposition..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 42,
    "total_tokens": 70
  }
}

Streaming (SSE)

Set stream: true to receive tokens as Server-Sent Events. Each event contains a delta with the next token. The stream ends with [DONE].

curl -X POST https://api.clusterprotocol.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-70b",
    "messages": [{"role": "user", "content": "Write a haiku about AI"}],
    "stream": true
  }'

# Response (SSE):
# data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Silicon"}}]}
# data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" dreams"}}]}
# data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" awaken"}}]}
# ...
# data: [DONE]
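
A client consuming this stream needs to strip the `data:` prefix, stop at `[DONE]`, and concatenate the `delta.content` fragments. The sketch below shows one way to do that in Python over an iterable of already-received lines; it assumes each event fits on a single `data:` line, as in the sample above, and does not handle multi-line SSE events.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from 'data:' lines until [DONE] is seen."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


stream = [
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Silicon"}}]}',
    "",
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" dreams"}}]}',
    "data: [DONE]",
]
text = "".join(iter_deltas(stream))
print(text)
```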

Pricing

Each model has independent per-token pricing for input and output tokens. Balance is deducted atomically per call. Check current pricing via the pricing endpoint.

GET /api/pricing

Get pricing tiers for all available models.

Provider Routing

Cluster Protocol routes inference across multiple providers for maximum uptime and speed. By default, the system picks the optimal provider automatically. You can override this by passing the provider field in your request body.

Available Providers

Provider	Slug	Highlight
Venice AI	`venice`	Private inference, no data retention, E2EE
Groq	`groq`	Ultra-low latency (LPU hardware)
Together AI	`together`	Widest model selection
Fireworks AI	`fireworks`	High throughput, competitive pricing
DeepInfra	`deepinfra`	Cheapest per-token rates
SambaNova	`sambanova`	Enterprise-grade, high context
Phala TEE	`phala`	Hardware-level privacy (Intel TDX), ~10 TEE-native models
Red Pill	`redpill`	60+ models via NearAI, Chutes, Tinfoil sub-providers
OpenRouter	`openrouter`	Aggregated access to 200+ models

Venice AI — Private Inference

Venice AI provides fully private inference with end-to-end encryption. Your prompts and responses are never stored or logged. Pass "provider": "venice" to route through Venice.

curl -X POST https://api.clusterprotocol.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-instruct",
    "provider": "venice",
    "messages": [
      {"role": "user", "content": "Explain zero-knowledge proofs simply."}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'

Venice Models

Models available on Venice (use these exact slugs with "provider": "venice"):

Model	Slug	Use Case
Llama 3.3 70B	`llama-3.3-70b-instruct`	General chat, reasoning
Qwen 3.5 35B	`qwen3.5-35b-a3b`	Fast reasoning
Qwen 3.5 397B	`qwen3.5-397b-a17b`	Large-scale reasoning
Gemma 4 31B	`gemma-4-31b-it`	Instruction following
Mistral Small 3.2	`mistral-small-3.2-24b`	Fast, efficient

Auto vs Pinned Routing

Auto (default)

Omit the provider field. The system picks the fastest available provider for the model. Best for most use cases.

Pinned

Set the provider field (for example, "provider": "venice") to force routing to a specific provider. Use it for privacy (Venice/Phala), TEE models (Phala/Red Pill), or latency (Groq).
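
The difference between the two modes comes down to whether the request body carries a provider field. A minimal Python sketch, assuming a hypothetical `with_routing` helper and a mapping that picks one of the providers suggested above for each need:

```python
from typing import Any, Dict, Optional

# One suggested provider per need, taken from the routing guidance above.
PROVIDER_FOR_NEED = {
    "privacy": "venice",
    "tee": "phala",
    "latency": "groq",
}


def with_routing(body: Dict[str, Any], need: Optional[str] = None) -> Dict[str, Any]:
    """Return a copy of body, pinned to a provider only when a need is given."""
    routed = dict(body)
    routed.pop("provider", None)  # auto routing means no provider field
    if need is not None:
        routed["provider"] = PROVIDER_FOR_NEED[need]
    return routed


base = {
    "model": "llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
}
auto = with_routing(base)
pinned = with_routing(base, need="privacy")
print(auto.get("provider"), pinned["provider"])
```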

Venice AI — Private Inference

Venice AI provides fully private, end-to-end encrypted inference. Your prompts and responses are never stored, never logged, and never used for training. Venice is our recommended provider when privacy is critical — financial data, personal information, or sensitive prompts.

Full Venice AI Documentation: 75+ models, all endpoints, code examples

End-to-End Encrypted

Prompts encrypted in transit and at rest. Venice cannot read your data.

Zero Data Retention

Nothing stored after response. No logs, no training data collection.

No Account Needed

Use via x402 (wallet pays) or via API key. Both fully private.

Via API Key

curl -X POST https://api.clusterprotocol.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-instruct",
    "provider": "venice",
    "messages": [
      {"role": "user", "content": "Analyze this financial data privately..."}
    ],
    "temperature": 0.7,
    "max_tokens": 1024
  }'

Via x402 (No Account, Wallet Pays)

import { fetchWithPayment } from "@x402/fetch";
import { ethers } from "ethers";

const wallet = new ethers.Wallet(process.env.PRIVATE_KEY!);

// Private Venice inference — no API key, no account, wallet pays $0.003
const response = await fetchWithPayment(
  "https://api.clusterprotocol.ai/v1/chat/completions",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama-3.3-70b-instruct",
      provider: "venice",
      messages: [{ role: "user", content: "What are ERC-7710 delegations?" }],
      max_tokens: 512
    })
  },
  wallet
);

const data = await response.json();
console.log(data.choices[0].message.content);

Verified Venice Models

Model	Slug	Best For
Llama 3.3 70B	`llama-3.3-70b-instruct`	General chat, reasoning, coding
Qwen 3.5 35B	`qwen3.5-35b-a3b`	Fast reasoning, lightweight
Qwen 3.5 397B	`qwen3.5-397b-a17b`	Heavy reasoning, complex tasks
Mistral Small 3.2	`mistral-small-3.2-24b`	Fast, efficient, multilingual
Gemma 4 31B	`gemma-4-31b-it`	Instruction following, safe

Venice Response Format

{
  "id": "chatcmpl-RUpKyiqi14G6MrwbPrOk7iJs",
  "object": "chat.completion",
  "model": "llama-3.3-70b",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "..." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 1507, "completion_tokens": 42, "total_tokens": 1549 },
  "venice_parameters": {
    "enable_e2ee": true,
    "include_venice_system_prompt": true,
    "strip_thinking_response": false
  }
}

Phala TEE — Hardware Privacy

Phala Network provides hardware-level privacy through Trusted Execution Environments (TEEs). Unlike software-only encryption, a TEE is designed so that even the server operator cannot access your data: inference runs inside Intel TDX or NVIDIA Confidential Computing enclaves.

Intel TDX Enclaves

Inference runs inside hardware-isolated enclaves. CPU-level isolation from host OS.

Verifiable Computation

Remote attestation proves your request ran in a genuine TEE. Cryptographic proof.

Zero-Trust Model

Neither Cluster nor Phala can see your prompts. Hardware enforces privacy.

Via API Key

curl -X POST https://api.clusterprotocol.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-instruct",
    "provider": "phala",
    "messages": [
      {"role": "user", "content": "Process this sensitive medical record..."}
    ],
    "max_tokens": 1024
  }'

Via x402 (No Account, Wallet Pays)

import { fetchWithPayment } from "@x402/fetch";
import { ethers } from "ethers";

const wallet = new ethers.Wallet(process.env.PRIVATE_KEY!);

// Hardware-private inference via x402 — no API key needed
const response = await fetchWithPayment(
  "https://api.clusterprotocol.ai/v1/chat/completions",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama-3.3-70b-instruct",
      provider: "phala",
      messages: [{ role: "user", content: "Analyze this contract..." }],
      max_tokens: 1024
    })
  },
  wallet
);

const data = await response.json();
console.log(data.choices[0].message.content);

Venice vs Phala — When to Use

Feature	Venice AI	Phala TEE
Privacy method	E2EE + no-log policy	Hardware enclave (Intel TDX)
Trust model	Trust Venice's no-log claim	Zero-trust (hardware enforced)
Verification	Audit-based	Remote attestation (cryptographic)
Speed	Fast (optimized GPU clusters)	Slightly slower (enclave overhead)
Best for	General privacy, financial data	Regulated industries, compliance
x402 support	Yes ($0.003/req)	Yes ($0.003/req)

Models

Browse public models, upload your own, deploy to GPU nodes, and download model weights.

Method	Endpoint	Auth	Description
GET	`/api/models/public`	No	List all public models with metadata, pricing, and stats.
GET	`/api/models/public/:id`	No	Get details for a single model by ID.
POST	`/api/models/upload`	Yes	Upload a model file (multipart/form-data).
POST	`/api/models/:id/deploy`	Yes	Deploy a model to an available GPU node.
GET	`/api/models/:id/versions`	Yes	Get version history for a model.
GET	`/api/models/:id/download`	Yes	Get a presigned S3 URL to download model weights.

List Models

curl https://api.clusterprotocol.ai/api/models/public

Upload a Model

curl -X POST https://api.clusterprotocol.ai/api/models/upload \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY" \
  -F "file=@my-model.safetensors" \
  -F "name=my-custom-model" \
  -F "category=chat" \
  -F "description=A fine-tuned model for customer support"

Deploy a Model

curl -X POST https://api.clusterprotocol.ai/api/models/llama-3.1-70b/deploy \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY"

# Response: { "status": "deploying", "nodeId": "node_xyz" }

Fine-Tuning

Fine-tune models using LoRA adapters. Upload a JSONL training file and configure hyperparameters. Jobs run on the cluster's GPU nodes, and you can monitor progress in real time.

Method	Endpoint	Auth	Description
POST	`/api/finetune/jobs`	Yes	Create a new fine-tuning job.
GET	`/api/finetune/jobs`	Yes	List all your fine-tuning jobs.
GET	`/api/finetune/jobs/:id`	Yes	Get status and progress of a specific job.
POST	`/api/finetune/jobs/:id/cancel`	Yes	Cancel a running job.

Create a Fine-Tuning Job

Parameter	Type	Required	Description
`baseModel`	string	Yes	Model ID to fine-tune (e.g. "llama-3.1-8b")
`trainingFile`	string	Yes	URL or path to JSONL training data
`suffix`	string	No	Custom suffix for the fine-tuned model name
`epochs`	number	No	Number of training epochs (default: 3)
`learningRate`	number	No	Learning rate (default: 2e-5)
`loraRank`	number	No	LoRA rank (default: 16)
`loraAlpha`	number	No	LoRA alpha (default: 32)

curl -X POST https://api.clusterprotocol.ai/api/finetune/jobs \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "baseModel": "llama-3.1-8b",
    "trainingFile": "https://example.com/data.jsonl",
    "suffix": "customer-support-v1",
    "epochs": 3,
    "learningRate": 2e-5,
    "loraRank": 16,
    "loraAlpha": 32
  }'
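
The same request body can be modelled in Python with the documented defaults filled in. This is a sketch only: the `FineTuneJob` class is a hypothetical helper, not part of the SDK, and its field names simply mirror the JSON keys above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class FineTuneJob:
    """Request body for POST /api/finetune/jobs with the documented defaults."""

    baseModel: str
    trainingFile: str
    suffix: Optional[str] = None
    epochs: int = 3
    learningRate: float = 2e-5
    loraRank: int = 16
    loraAlpha: int = 32

    def to_json(self) -> str:
        # Omit suffix when it was not provided.
        fields = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(fields)


job = FineTuneJob(
    baseModel="llama-3.1-8b",
    trainingFile="https://example.com/data.jsonl",
)
print(job.to_json())
```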

Training Data Format

Training data should be a JSONL file where each line is a conversation in the OpenAI chat format:

{"messages": [{"role": "system", "content": "You are a support agent."}, {"role": "user", "content": "How do I reset my password?"}, {"role": "assistant", "content": "Go to Settings > Security > Reset Password."}]}
{"messages": [{"role": "user", "content": "What are your hours?"}, {"role": "assistant", "content": "We're available 24/7 via chat."}]}
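
Checking a training file locally before uploading can catch malformed lines early. The Python sketch below validates the structure shown above; the set of allowed roles is an assumption based on the roles used in these examples, and the server may accept or reject more than this check does.

```python
import json
from typing import List

# Roles seen in the examples above; assumed to be the full set.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_training_lines(lines: List[str]) -> List[str]:
    """Return a list of problems found; an empty list means no issues were detected."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {number}: missing 'messages' array")
            continue
        for message in messages:
            if not isinstance(message, dict) or message.get("role") not in ALLOWED_ROLES:
                problems.append(f"line {number}: unexpected role")
            elif not isinstance(message.get("content"), str):
                problems.append(f"line {number}: content must be a string")
    return problems


good = '{"messages": [{"role": "user", "content": "What are your hours?"}, {"role": "assistant", "content": "We are available 24/7 via chat."}]}'
problems = validate_training_lines([good, '{"messages": []}', "not json"])
print(problems)
```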

Monitor Progress

curl https://api.clusterprotocol.ai/api/finetune/jobs/ft_job_abc123 \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY"

# Response:
# {
#   "id": "ft_job_abc123",
#   "status": "running",
#   "progress": 0.65,
#   "currentEpoch": 2,
#   "totalEpochs": 3,
#   "trainLoss": 0.342
# }
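
To follow a job to completion, a client can poll this endpoint at a fixed interval. The sketch below is one possible polling loop in Python; `fetch_job` is a hypothetical callable that returns the parsed JSON response, and the terminal status names are assumptions, since only "running" appears in the response above.

```python
import time
from typing import Callable, Dict

# Assumed terminal states; only "running" appears in the documented response.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_job(
    fetch_job: Callable[[], Dict],
    interval: float = 10.0,
    max_polls: int = 360,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll fetch_job until the job reaches a terminal status or polls run out."""
    for _ in range(max_polls):
        job = fetch_job()
        print(f"epoch {job.get('currentEpoch')}/{job.get('totalEpochs')} "
              f"progress {job.get('progress', 0):.0%}")
        if job.get("status") in TERMINAL_STATUSES:
            return job
        sleep(interval)
    raise TimeoutError("job did not finish within the polling budget")


# Canned responses stand in for real API calls; sleeping is disabled.
responses = iter([
    {"status": "running", "progress": 0.65, "currentEpoch": 2, "totalEpochs": 3},
    {"status": "completed", "progress": 1.0, "currentEpoch": 3, "totalEpochs": 3},
])
final = wait_for_job(lambda: next(responses), sleep=lambda seconds: None)
```

Injecting `sleep` keeps the loop testable without waiting in real time.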

Billing & Usage

Cluster Protocol uses an off-chain billing model: deposit crypto once, then balance is deducted per inference call. No gas fees per request. Check your balance and usage history via these endpoints.

Method	Endpoint	Auth	Description
GET	`/api/user/stats`	Yes	Get balance, total calls, models deployed, and total spent.
GET	`/api/user/transactions`	Yes	List recent transactions (deposits + usage deductions).
GET	`/api/pricing`	No	Get pricing tiers for all models.
GET	`/api/stats`	No	Public platform stats (total models, nodes, inferences, users).

Dashboard Stats

curl https://api.clusterprotocol.ai/api/user/stats \
  -H "Authorization: Bearer sk-cluster-YOUR_KEY"

How Billing Works

Deposit

Send USDC or ETH via the deposit page or Coinbase Commerce. Funds appear in your balance.

Per-call deduction

Each inference deducts (input_tokens * input_price + output_tokens * output_price) atomically.

Track usage

View detailed usage logs and transaction history via the dashboard or API.
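
The per-call deduction formula can be worked through with a short Python sketch. The prices below are hypothetical placeholders chosen for the arithmetic; real values come from GET /api/pricing. `Decimal` is used to avoid floating-point rounding in currency math.

```python
from decimal import Decimal


def inference_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: Decimal,
    output_price: Decimal,
) -> Decimal:
    """Cost of one call: input_tokens * input_price + output_tokens * output_price."""
    return input_tokens * input_price + output_tokens * output_price


# Hypothetical per-token prices; fetch real values from GET /api/pricing.
INPUT_PRICE = Decimal("0.0000005")
OUTPUT_PRICE = Decimal("0.0000010")

# Token counts from the non-streaming example response (28 prompt, 42 completion).
cost = inference_cost(28, 42, INPUT_PRICE, OUTPUT_PRICE)
print(cost)
```

With these placeholder prices the call costs 28 × 0.0000005 + 42 × 0.000001 = 0.000056 in balance units.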

WebSocket

Connect to the admin WebSocket endpoint for real-time cluster events including node heartbeats, inference completions, and model deployments.

/ws/admin

Real-time WebSocket event stream for cluster monitoring. Requires authentication.

Connection

const ws = new WebSocket("wss://api.clusterprotocol.ai/ws/admin");

ws.onopen = () => {
  console.log("Connected to cluster events");
  // Authenticate
  ws.send(JSON.stringify({
    type: "auth",
    token: "sk-cluster-YOUR_KEY"
  }));
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log("Event:", data.type, data);
};

ws.onclose = () => console.log("Disconnected");

Event Types

Event	Description
`node:heartbeat`	GPU node is alive and reporting stats
`node:joined`	New GPU node connected to the cluster
`node:left`	GPU node disconnected or timed out
`inference:complete`	An inference request completed
`model:deployed`	A model was deployed to a node
`model:undeployed`	A model was undeployed from a node
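
A consumer of this stream typically routes each message by its type. The following Python sketch shows one way to do that; the `EventRouter` class is hypothetical, and it assumes only that every message is a JSON object with a `type` field, as the connection example implies. Other payload fields are not documented here.

```python
import json
from collections import Counter
from typing import Callable, Dict

Handler = Callable[[dict], None]


class EventRouter:
    """Dispatch decoded WebSocket messages to handlers keyed by event type."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self.unhandled = Counter()

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type] = handler

    def dispatch(self, raw_message: str) -> None:
        event = json.loads(raw_message)
        handler = self._handlers.get(event.get("type"))
        if handler is None:
            # Count unknown event types instead of failing on them.
            self.unhandled[event.get("type")] += 1
            return
        handler(event)


router = EventRouter()
seen = []
router.on("node:joined", lambda event: seen.append(event["type"]))
router.on("inference:complete", lambda event: seen.append(event["type"]))

router.dispatch('{"type": "node:joined"}')
router.dispatch('{"type": "node:heartbeat"}')
print(seen, dict(router.unhandled))
```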

Python SDK

The Cluster SDK wraps the REST API with a Pythonic interface. Full documentation is available on the SDK page.

pip install cluster-sdk

from cluster_sdk import ClusterClient

client = ClusterClient(api_key="sk-cluster-YOUR_KEY")

# List models
models = client.models.list()
for m in models:
    print(f"{m.id}: {m.name}")

# Chat completion
response = client.chat.completions.create(
    model="llama-3.1-70b",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

Full SDK Documentation

Installation, async usage, model management, fine-tuning, and more.

Errors

All errors return an appropriate HTTP status code and a JSON body with an error field containing a human-readable message and a code field containing a machine-readable identifier.

{
  "error": "Insufficient balance. Current: $0.50, required: $1.20",
  "code": "INSUFFICIENT_BALANCE"
}

Error Codes

Status	Name	Description
400	Bad Request	Missing or invalid parameters. Check the request body.
401	Unauthorized	Missing or invalid API key. Include Authorization: Bearer <key>.
402	Payment Required	Insufficient balance (deposit funds before retrying), or a paid endpoint called without authentication (see x402 Payments).
404	Not Found	Model or resource doesn't exist. Check the ID.
409	Conflict	Resource already exists (e.g., duplicate email on register).
429	Rate Limited	Too many requests. Retry with exponential backoff.
500	Server Error	Internal error. Retry or contact support.
503	Unavailable	No GPU nodes available for the requested model. Try again later.
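
Client code can turn these responses into a single exception type and decide from the status whether a retry makes sense. A minimal Python sketch, assuming a hypothetical `ClusterAPIError` class and treating 429, 500, and 503 as retryable based on the error codes above:

```python
import json

# Statuses the error codes above suggest retrying; others need a fix first.
RETRYABLE_STATUSES = {429, 500, 503}


class ClusterAPIError(Exception):
    """An error response, carrying the HTTP status and the body's fields."""

    def __init__(self, status: int, message: str, code: str = "") -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message
        self.code = code

    @property
    def retryable(self) -> bool:
        return self.status in RETRYABLE_STATUSES


def error_from_response(status: int, body: str) -> ClusterAPIError:
    """Build an exception from a status code and a JSON error body."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return ClusterAPIError(status, data.get("error", "unknown error"), data.get("code", ""))


err = error_from_response(
    402,
    '{"error": "Insufficient balance. Current: $0.50, required: $1.20", "code": "INSUFFICIENT_BALANCE"}',
)
print(err.code, err.retryable)
```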

Rate Limiting

Rate limits are applied per API key. When rate limited, implement exponential backoff starting at 1 second. Rate limit headers are included in all responses:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1714000060
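
The backoff schedule described above can be combined with the rate limit headers. The Python sketch below computes a wait time per retry; it assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which the sample value suggests but this document does not state, and the 60-second cap is an arbitrary choice.

```python
import time
from typing import Dict, Optional


def retry_delay(
    attempt: int,
    headers: Optional[Dict[str, str]] = None,
    now: Optional[float] = None,
    base: float = 1.0,
    cap: float = 60.0,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    headers = headers or {}
    now = time.time() if now is None else now
    # Exponential backoff: 1s, 2s, 4s, ... capped at `cap`.
    delay = min(cap, base * (2 ** attempt))
    reset = headers.get("X-RateLimit-Reset")
    if headers.get("X-RateLimit-Remaining") == "0" and reset is not None:
        # Assumes the reset value is a Unix timestamp in seconds.
        delay = max(delay, min(cap, float(reset) - now))
    return delay


limited = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1714000060"}
print(retry_delay(0), retry_delay(3), retry_delay(0, limited, now=1714000030))
```

Waiting until the reset time, when it is later than the backoff delay, avoids spending retries on requests that would be rejected anyway.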

GET /health

Health check. Returns 200 OK if the gateway is running.

curl https://api.clusterprotocol.ai/health
# { "status": "ok", "uptime": 86400 }