Cluster Protocol

Venice AI

Access 75+ Venice AI models through the Cluster Protocol unified API

75+ AI Models: Frontier, open-source, and private
E2EE Private Inference: End-to-end encrypted models
USDC x402 Payments: Pay-per-request on Base

Quick Start

Make your first Venice AI request through Cluster Protocol in seconds. The API is OpenAI-compatible, so you can use any existing OpenAI SDK by changing the base URL.

curl -X POST https://api.clusterprotocol.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "llama-3.3-70b",
    "provider": "venice",
    "messages": [
      {"role": "user", "content": "Hello from Venice AI!"}
    ]
  }'
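
The same request can be prepared from TypeScript. This is a minimal sketch rather than an official client: `CLUSTER_BASE_URL` and `buildChatRequest` are illustrative names, and the endpoint, headers, and body fields are copied from the curl example above.

```typescript
// Base URL taken from the curl example above.
const CLUSTER_BASE_URL = "https://api.clusterprotocol.ai/v1";

interface QuickStartMessage {
  role: string; // "user" and "assistant" are the roles shown in this document
  content: string;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// buildChatRequest is a hypothetical helper, not part of any SDK.
// It only assembles the request; nothing is sent until you call fetch.
function buildChatRequest(
  apiKey: string,
  model: string,
  messages: QuickStartMessage[],
): PreparedRequest {
  return {
    url: `${CLUSTER_BASE_URL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, provider: "venice", messages }),
    },
  };
}

// Usage (performs a network call, so it is left commented out):
// const { url, init } = buildChatRequest("YOUR_API_KEY", "llama-3.3-70b", [
//   { role: "user", content: "Hello from Venice AI!" },
// ]);
// const res = await fetch(url, init);
// console.log((await res.json()).choices[0].message.content);
```

Keeping request construction separate from sending makes the body easy to inspect or log before any network traffic happens.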

Authentication

Two authentication methods are available. Use an API key for standard access, or leverage x402 for permissionless, pay-per-request access with USDC.

Method 1: API Key

Include your API key in the Authorization header.

bash
Authorization: Bearer YOUR_API_KEY

Method 2: x402 (No API Key)

Send requests without an API key. The server responds with HTTP 402 and a payment-required header containing payment details. Pay with USDC on Base (chain 8453) and re-submit with the payment proof.

Chat Completion: $0.003 / request
Image Generation: $0.02 / request
Network: Base (Chain 8453)
Currency: USDC
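
For budgeting, the per-request x402 prices above can be turned into a simple estimate. The sketch below is illustrative only: `estimateCostMicroUsdc` and `formatUsdc` are made-up helper names, and amounts are kept in integer micro-USDC (USDC has 6 decimal places) to avoid floating-point rounding.

```typescript
// Prices copied from the x402 pricing listed above, in micro-USDC
// (1 USDC = 1000000 micro-USDC).
const PRICE_MICRO_USDC = {
  chat: 3000, // $0.003 per chat completion request
  image: 20000, // $0.02 per image generation request
};

// Hypothetical helper: total cost for a given number of requests.
function estimateCostMicroUsdc(chatRequests: number, imageRequests: number): number {
  return chatRequests * PRICE_MICRO_USDC.chat + imageRequests * PRICE_MICRO_USDC.image;
}

// Render a micro-USDC amount as a decimal USDC string.
function formatUsdc(microUsdc: number): string {
  return (microUsdc / 1000000).toFixed(6);
}

// 100 chat requests and 10 image requests.
console.log(formatUsdc(estimateCostMicroUsdc(100, 10))); // prints "0.500000"
```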

Chat Completions

Generate text responses from Venice AI models. Compatible with the OpenAI chat completions format.

POST /v1/chat/completions

Parameters

| Parameter | Required | Type | Description |
| --- | --- | --- | --- |
| model | yes | string | Model ID from the Venice catalog |
| provider | yes | string | Set to "venice" |
| messages | yes | array | Array of message objects with role and content |
| stream | no | boolean | Enable SSE streaming (default: false) |
| max_tokens | no | integer | Maximum tokens to generate |
| temperature | no | number | Sampling temperature (0-2, default: 1) |
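
The parameters above map naturally onto a typed request body. The following is a rough sketch of a client-side check, assuming the rules listed in the table; `ChatCompletionRequest` and `validateChatRequest` are illustrative names, and the non-empty `messages` and positive `max_tokens` rules are assumptions not stated in this document.

```typescript
// Shape of the request body, following the documented parameters.
interface ChatCompletionRequest {
  model: string;
  provider: string;
  messages: { role: string; content: string }[];
  stream?: boolean;
  max_tokens?: number;
  temperature?: number;
}

// Hypothetical pre-flight check. The server remains the source of truth
// and may apply rules that are not listed in this document.
function validateChatRequest(req: ChatCompletionRequest): string[] {
  const problems: string[] = [];
  if (!req.model) problems.push("model is required");
  if (req.provider !== "venice") problems.push('provider must be "venice"');
  // Assumption: an empty messages array is treated as invalid.
  if (!Array.isArray(req.messages) || req.messages.length === 0) {
    problems.push("messages must be a non-empty array");
  }
  if (req.temperature !== undefined && (req.temperature < 0 || req.temperature > 2)) {
    problems.push("temperature must be between 0 and 2");
  }
  // Assumption: max_tokens must be a positive integer.
  if (req.max_tokens !== undefined && (!Number.isInteger(req.max_tokens) || req.max_tokens < 1)) {
    problems.push("max_tokens must be a positive integer");
  }
  return problems;
}
```

Returning a list of problems instead of throwing on the first one lets a caller report every issue in a single pass.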

Example Response

json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "llama-3.3-70b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  }
}
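
Reading the reply out of this structure takes only a few lines. The sketch below types the response after the example above and extracts the first choice; `ChatCompletionResponse` and `firstReply` are illustrative names, and fields beyond those shown in the example are not assumed.

```typescript
// Response shape modeled on the example response in this document.
interface ChatCompletionResponse {
  id: string;
  object: string;
  model: string;
  choices: {
    index: number;
    message: { role: string; content: string };
    finish_reason: string | null;
  }[];
  usage?: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
}

// Hypothetical helper: text of the first choice, or null if there are no choices.
function firstReply(res: ChatCompletionResponse): string | null {
  const choice = res.choices[0];
  return choice ? choice.message.content : null;
}

// The example response from this document, parsed as a sample input.
const sampleResponse: ChatCompletionResponse = JSON.parse(`{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "llama-3.3-70b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help you today?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21 }
}`);

console.log(firstReply(sampleResponse)); // prints "Hello! How can I help you today?"
```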

Image Generation

Generate images using Venice AI image models through Cluster Protocol.

POST /v1/images/generations

Parameters

| Parameter | Required | Type | Description |
| --- | --- | --- | --- |
| model | yes | string | Image model ID |
| provider | yes | string | Set to "venice" |
| prompt | yes | string | Text description of the desired image |
| n | no | integer | Number of images to generate (1-4) |
| size | no | string | Image dimensions (e.g. "1024x1024") |

curl -X POST https://api.clusterprotocol.ai/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "fluently-xl",
    "provider": "venice",
    "prompt": "A futuristic city skyline at sunset",
    "n": 1,
    "size": "1024x1024"
  }'
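
A request body for this endpoint can be checked on the client before it is sent. The sketch below is illustrative: `buildImageRequest` is a made-up helper, its default values for `n` and `size` are choices of this example rather than documented server defaults, and the WIDTHxHEIGHT pattern is inferred from the "1024x1024" example.

```typescript
interface ImageGenerationRequest {
  model: string;
  provider: "venice";
  prompt: string;
  n?: number;
  size?: string;
}

// Hypothetical helper that enforces the documented 1-4 range for n.
// The size pattern is an assumption and may be stricter or looser than the server.
function buildImageRequest(
  model: string,
  prompt: string,
  n: number = 1,
  size: string = "1024x1024",
): ImageGenerationRequest {
  if (!Number.isInteger(n) || n < 1 || n > 4) {
    throw new RangeError("n must be an integer between 1 and 4");
  }
  if (!/^\d+x\d+$/.test(size)) {
    throw new RangeError('size must look like "1024x1024"');
  }
  return { model, provider: "venice", prompt, n, size };
}

// Same values as the curl example above.
const imageBody = buildImageRequest("fluently-xl", "A futuristic city skyline at sunset");
console.log(JSON.stringify(imageBody));
```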

Model Catalog

Venice AI provides access to 75+ models across multiple categories. Set the model parameter to any model ID below.

Frontier (22 models)
claude-opus-4-5, claude-opus-4-6, claude-opus-4-6-fast, claude-opus-4-7, claude-opus-4-7-fast, claude-sonnet-4-5, claude-sonnet-4-6, openai-gpt-52, openai-gpt-52-codex, openai-gpt-53-codex, openai-gpt-54, openai-gpt-54-mini, openai-gpt-54-pro, openai-gpt-55, openai-gpt-55-pro, openai-gpt-4o-2024-11-20, openai-gpt-4o-mini-2024-07-18, gemini-3-1-pro-preview, gemini-3-flash-preview, grok-4-20, grok-4-20-multi-agent, grok-4-3

Open Source (31 models)
llama-3.2-3b, llama-3.3-70b, hermes-3-llama-3.1-405b, deepseek-v3.2, deepseek-v4-flash, deepseek-v4-pro, qwen-3-6-plus, qwen3-235b-a22b-instruct-2507, qwen3-235b-a22b-thinking-2507, qwen3-5-35b-a3b, qwen3-5-397b-a17b, qwen3-5-9b, qwen3-6-27b, qwen3-coder-480b-a35b-instruct-turbo, qwen3-next-80b, qwen3-vl-235b-a22b, mistral-small-2603, mistral-small-3-2-24b-instruct, google-gemma-3-27b-it, google-gemma-4-26b-a4b-it, google-gemma-4-31b-it, nvidia-nemotron-3-nano-30b-a3b, nvidia-nemotron-cascade-2-30b-a3b, minimax-m25, minimax-m27, kimi-k2-5, kimi-k2-6, mercury-2, aion-labs-aion-2-0, arcee-trinity-large-thinking, openai-gpt-oss-120b

E2EE Private (11 models)
e2ee-gemma-3-27b-p, e2ee-glm-4-7-flash-p, e2ee-glm-4-7-p, e2ee-glm-5-1, e2ee-gpt-oss-120b-p, e2ee-gpt-oss-20b-p, e2ee-qwen-2-5-7b-p, e2ee-qwen3-30b-a3b-p, e2ee-qwen3-5-122b-a10b, e2ee-qwen3-vl-30b-a3b-p, e2ee-venice-uncensored-24b-p

Uncensored (4 models)
venice-uncensored-1-2, venice-uncensored-role-play, gemma-4-uncensored, olafangensan-glm-4.7-flash-heretic

GLM Series (7 models)
z-ai-glm-5-turbo, z-ai-glm-5v-turbo, zai-org-glm-4.6, zai-org-glm-4.7, zai-org-glm-4.7-flash, zai-org-glm-5, zai-org-glm-5-1

x402 Payment Integration

The x402 protocol enables permissionless API access without pre-registration. Send a request without an API key, receive payment instructions, pay with USDC, and re-submit with proof.

Flow

1. Send the request without an Authorization header
2. The server responds with HTTP 402 and a payment-required header (base64 JSON)
3. Decode the payment info and execute a USDC transfer on Base (chain 8453)
4. Re-submit the original request with the payment proof in a header

Example: Handling 402 Response

typescript
const response = await fetch("https://api.clusterprotocol.ai/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama-3.3-70b",
    provider: "venice",
    messages: [{ role: "user", content: "Hello!" }],
  }),
});

if (response.status === 402) {
  const paymentHeader = response.headers.get("payment-required");
  if (!paymentHeader) {
    throw new Error("402 response is missing the payment-required header");
  }
  // The header value is base64-encoded JSON
  const paymentInfo = JSON.parse(atob(paymentHeader));

  // paymentInfo contains:
  // - recipient: USDC recipient address
  // - amount: amount in USDC (e.g. "0.003")
  // - chain: 8453 (Base)
  // - token: USDC contract address

  // Execute payment on Base, then re-submit with proof
}
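
Before paying, it is worth checking that the decoded details match what you expect. The sketch below decodes the header and applies a basic sanity check. `PaymentInfo`, `decodePaymentRequired`, and `isAcceptablePayment` are illustrative names; the field names follow the comments in the example above, but the exact schema the server returns is not specified in this document and should be treated as an assumption.

```typescript
// Field names follow the comments in the example above (assumed schema).
interface PaymentInfo {
  recipient: string;
  amount: string; // decimal USDC amount, e.g. "0.003"
  chain: number | string;
  token: string;
}

const BASE_CHAIN_ID = 8453;

// Turn the base64 JSON value of the payment-required header into an object.
function decodePaymentRequired(headerValue: string | null): PaymentInfo {
  if (!headerValue) throw new Error("missing payment-required header");
  return JSON.parse(atob(headerValue)) as PaymentInfo;
}

// Hypothetical safety check to run before paying: the chain must be Base
// and the amount must be positive and no larger than the caller's cap.
function isAcceptablePayment(info: PaymentInfo, maxUsdc: number): boolean {
  const amount = Number(info.amount);
  return (
    Number(info.chain) === BASE_CHAIN_ID &&
    Number.isFinite(amount) &&
    amount > 0 &&
    amount <= maxUsdc
  );
}

// Round trip with a made-up payload (the addresses are placeholders).
const demoHeader = btoa(
  JSON.stringify({ recipient: "0xRECIPIENT", amount: "0.003", chain: 8453, token: "0xUSDC" }),
);
const demoInfo = decodePaymentRequired(demoHeader);
console.log(isAcceptablePayment(demoInfo, 0.01)); // prints true
```

A spending cap like `maxUsdc` guards against paying an unexpectedly large amount if the server response is wrong or has been tampered with.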

Streaming

Enable real-time token streaming via Server-Sent Events (SSE) by setting stream: true. Each chunk contains a delta of the response.

curl -X POST https://api.clusterprotocol.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "llama-3.3-70b",
    "provider": "venice",
    "messages": [{"role": "user", "content": "Write a haiku"}],
    "stream": true
  }'

SSE Format

text
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
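
A client has to reassemble these events into text. The sketch below is one way to do that, assuming the chunk shape shown above; `SseTextAccumulator` is an illustrative name, not part of any SDK. It buffers partial lines because a network read can end in the middle of a `data:` line.

```typescript
// Chunk shape modeled on the SSE examples in this document.
interface StreamChunk {
  choices: { index: number; delta: { content?: string }; finish_reason: string | null }[];
}

class SseTextAccumulator {
  private buffer = "";
  text = "";
  done = false;

  // Feed decoded text from the response stream, in whatever pieces it arrives.
  push(chunk: string): void {
    if (this.done) return;
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or ""), so keep it for the next push.
    this.buffer = lines.pop() ?? "";
    for (const raw of lines) {
      const line = raw.trim();
      if (!line.startsWith("data:")) continue; // skip blank separator lines
      const payload = line.slice("data:".length).trim();
      if (payload === "[DONE]") {
        this.done = true;
        return;
      }
      const parsed = JSON.parse(payload) as StreamChunk;
      const first = parsed.choices[0];
      this.text += first && first.delta.content ? first.delta.content : "";
    }
  }
}

// Replay the documented events, split mid-line to mimic network chunking.
const sseStream =
  'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}\n\n' +
  'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}\n\n' +
  'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}\n\n' +
  "data: [DONE]\n\n";

const accumulator = new SseTextAccumulator();
accumulator.push(sseStream.slice(0, 25));
accumulator.push(sseStream.slice(25));
console.log(accumulator.text); // prints "Hello world"
```

With a live response, the same `push` method could be fed from a `TextDecoder` applied to each chunk read from `response.body`.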

Error Handling

The API returns standard HTTP status codes. Error responses include a JSON body with details about what went wrong.

| Status | Meaning | Action |
| --- | --- | --- |
| 400 | Bad Request | Check request body and parameters |
| 401 | Unauthorized | Verify your API key |
| 402 | Payment Required | Complete x402 payment or add funds |
| 404 | Not Found | Verify model ID and endpoint path |
| 429 | Rate Limited | Back off and retry with exponential delay |
| 500 | Server Error | Retry after a brief delay |
| 503 | Model Unavailable | The model is temporarily offline; try another |
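
The status codes above can be mapped to client behavior in code. The sketch below is one possible mapping plus a capped exponential backoff schedule for 429 responses. `actionForStatus` and `backoffDelayMs` are illustrative names, and the 500 ms base and 8000 ms cap are arbitrary example values, since this document does not specify retry timings.

```typescript
type ErrorAction =
  | "fix_request"
  | "check_api_key"
  | "pay"
  | "check_model_or_path"
  | "retry_with_backoff"
  | "retry"
  | "switch_model"
  | "unknown";

// Map each documented status code to the action suggested for it.
function actionForStatus(httpStatus: number): ErrorAction {
  switch (httpStatus) {
    case 400: return "fix_request";
    case 401: return "check_api_key";
    case 402: return "pay";
    case 404: return "check_model_or_path";
    case 429: return "retry_with_backoff";
    case 500: return "retry";
    case 503: return "switch_model";
    default: return "unknown";
  }
}

// Exponential backoff: baseMs * 2^attempt, capped at maxMs.
// The default timings are assumptions chosen for illustration.
function backoffDelayMs(attempt: number, baseMs: number = 500, maxMs: number = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

console.log([0, 1, 2, 3, 4, 5].map((attempt) => backoffDelayMs(attempt)));
// prints [ 500, 1000, 2000, 4000, 8000, 8000 ]
```

In production, adding random jitter to each delay helps prevent many clients from retrying at the same instant.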

Error Response Format

json
{
  "error": {
    "message": "Invalid model ID: unknown-model",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}
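
Client code should not assume every failed response carries this shape, since a proxy or gateway may return something else. The sketch below uses a type guard to check the body before reading it; `ApiErrorBody`, `isApiErrorBody`, and `describeError` are illustrative names, and the three fields are the ones shown in the example above.

```typescript
// Error body shape modeled on the example in this document.
interface ApiErrorBody {
  error: { message: string; type: string; code: string };
}

// Type guard: true only when the body has the documented error shape.
function isApiErrorBody(body: unknown): body is ApiErrorBody {
  if (typeof body !== "object" || body === null) return false;
  const err = (body as { error?: unknown }).error;
  if (typeof err !== "object" || err === null) return false;
  const fields = err as Record<string, unknown>;
  return (
    typeof fields["message"] === "string" &&
    typeof fields["type"] === "string" &&
    typeof fields["code"] === "string"
  );
}

// Hypothetical helper that builds a log-friendly description of a failure.
function describeError(httpStatus: number, body: unknown): string {
  return isApiErrorBody(body)
    ? `HTTP ${httpStatus} ${body.error.code}: ${body.error.message}`
    : `HTTP ${httpStatus}: unrecognized error body`;
}

const exampleErrorBody: unknown = JSON.parse(
  '{"error":{"message":"Invalid model ID: unknown-model","type":"invalid_request_error","code":"model_not_found"}}',
);
console.log(describeError(404, exampleErrorBody));
// prints "HTTP 404 model_not_found: Invalid model ID: unknown-model"
```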