
API Reference

Base URL

https://api.pureai-api.com

Authentication

All requests must include an API key in the x-api-key header:
x-api-key: {your-api-key}
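For example, in Python (a minimal sketch using only the standard library; the `build_request` helper name is illustrative):

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.pureai-api.com"

def build_request(path: str, api_key: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    # Every request must carry the API key in the x-api-key header.
    data = json.dumps(payload).encode() if payload is not None else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST" if data is not None else "GET",
    )

req = build_request("/v1/models", "YOUR_API_KEY")
# urllib.request.urlopen(req) would send the authenticated request.
```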

Endpoints

POST /v1/chat/completions

Create a chat completion.
cURL Example:
curl -X POST "https://api.pureai-api.com/v1/chat/completions" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are helpful."},
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100,
    "temperature": 0.7,
    "stream": false
  }'
Response:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699123456,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21,
    "input_cost_usd": 0.000018,
    "output_cost_usd": 0.000027,
    "total_cost_usd": 0.000045,
    "latency_ms": 523.4,
    "ttft_ms": 215.2
  }
}
Parameters:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model identifier |
| messages | array | Yes | Conversation messages |
| max_tokens | integer | No | Maximum tokens to generate |
| temperature | float | No | Randomness (0-2) |
| top_p | float | No | Nucleus sampling |
| stream | boolean | No | Enable streaming (default: false) |
| stop | array | No | Stop sequences |
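The same request can be issued from Python; a minimal sketch using only the standard library (helper names are illustrative, fields as documented for this endpoint):

```python
import json
import urllib.request

CHAT_URL = "https://api.pureai-api.com/v1/chat/completions"

def build_chat_payload(model: str, messages: list, **options) -> dict:
    # Only "model" and "messages" are required; max_tokens, temperature,
    # top_p, stream, and stop are optional and passed through as-is.
    return {"model": model, "messages": messages, **options}

def chat_completion(api_key: str, payload: dict) -> dict:
    req = urllib.request.Request(
        CHAT_URL,
        data=json.dumps(payload).encode(),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_payload(
    "gpt-4o-mini",
    [{"role": "system", "content": "You are helpful."},
     {"role": "user", "content": "Hello!"}],
    max_tokens=100,
    temperature=0.7,
)
```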

POST /v1/completions

Create a text completion.
cURL Example:
curl -X POST "https://api.pureai-api.com/v1/completions" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "prompt": "The capital of France is",
    "max_tokens": 50,
    "temperature": 0.7,
    "stream": false
  }'
Response:
{
  "id": "cmpl-abc123",
  "object": "text_completion",
  "created": 1699123456,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "text": " Paris",
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 1,
    "total_tokens": 6,
    "input_cost_usd": 0.000008,
    "output_cost_usd": 0.000003,
    "total_cost_usd": 0.000011,
    "latency_ms": 312.1
  }
}
Parameters:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model identifier |
| prompt | string | Yes | Text prompt |
| max_tokens | integer | No | Maximum tokens |
| temperature | float | No | Randomness (0-2) |
| top_p | float | No | Nucleus sampling |
| stream | boolean | No | Enable streaming (default: false) |
| stop | array | No | Stop sequences |
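The optional fields ride alongside the required ones; for example, stop sequences end generation at the first match (payload shown as a plain dict, illustrative values):

```python
import json

# A /v1/completions payload using stop sequences: generation halts the
# first time any listed sequence would be produced.
payload = {
    "model": "gpt-4o-mini",
    "prompt": "List three colors:\n1.",
    "max_tokens": 50,
    "stop": ["\n4.", "\n\n"],
}
body = json.dumps(payload)  # serialized request body
```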

GET /v1/models

List available models.
Response:
{
  "object": "list",
  "data": [
    {
      "id": "gpt-4o-mini",
      "object": "model",
      "created": 1699000000,
      "owned_by": "openai"
    },
    {
      "id": "claude-3-haiku",
      "object": "model",
      "created": 1699000000,
      "owned_by": "anthropic"
    }
  ]
}
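The data array can be filtered client-side, for example by owned_by (using the sample response above; the helper name is illustrative):

```python
models_response = {
    "object": "list",
    "data": [
        {"id": "gpt-4o-mini", "object": "model",
         "created": 1699000000, "owned_by": "openai"},
        {"id": "claude-3-haiku", "object": "model",
         "created": 1699000000, "owned_by": "anthropic"},
    ],
}

def model_ids(response, owned_by=None):
    # Collect model IDs, optionally restricted to a single owner.
    return [m["id"] for m in response["data"]
            if owned_by is None or m["owned_by"] == owned_by]
```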

GET /v1/providers

List providers for a model.
Query Parameters:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model identifier |
Response:
{
  "providers": [
    {
      "id": "openai",
      "type": "primary",
      "enabled": true,
      "params": {}
    },
    {
      "id": "groq",
      "type": "backup",
      "enabled": true,
      "params": {}
    }
  ]
}
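Given the response above, a client can prefer the enabled primary provider and fall back to an enabled backup (helper name illustrative):

```python
providers_response = {
    "providers": [
        {"id": "openai", "type": "primary", "enabled": True, "params": {}},
        {"id": "groq", "type": "backup", "enabled": True, "params": {}},
    ],
}

def pick_provider(response):
    # Prefer an enabled primary provider; otherwise take the first
    # enabled backup; return None if nothing is enabled.
    enabled = [p for p in response["providers"] if p["enabled"]]
    for p in enabled:
        if p["type"] == "primary":
            return p["id"]
    return enabled[0]["id"] if enabled else None
```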

POST /v1/router

Intelligent routing endpoint that automatically selects the best model for your request based on task complexity, cost, and performance requirements.
Uses all available models from your tenant with intelligent profile selection:
curl -X POST "https://api.pureai-api.com/v1/router" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Write a haiku about programming"}
    ],
    "execute": true
  }'
Parameters:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| messages | array | Yes | Conversation messages |
| execute | boolean | No | Execute the request after routing (default: true) |
| models | array | No | Restrict selection to specific models |
| cost_weight | float | No | Cost optimization weight 0-1 (0 = quality, 1 = cost) |
| stream | boolean | No | Enable streaming response |
| max_tokens | integer | No | Maximum tokens to generate |
| temperature | float | No | Randomness (0-2) |
Response (with execute: true):
{
  "id": "router-abc123",
  "object": "chat.completion",
  "model": "gpt-4o-mini",
  "routing": {
    "selected_model": "gpt-4o-mini",
    "reasoning": "Simple conversational request",
    "confidence": 0.92
  },
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here's a programming haiku..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 25,
    "total_tokens": 40,
    "total_cost_usd": 0.00003
  }
}
The router analyzes your prompt to determine complexity, required capabilities (code, math, reasoning, etc.), and selects the most appropriate model balancing quality and cost.
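For example, a cost-leaning routed request restricted to two candidate models (payload only; field names as documented for this endpoint, values illustrative):

```python
# A /v1/router payload: cost_weight near 1 biases selection toward
# cheaper models, and "models" restricts the candidate pool.
router_payload = {
    "messages": [{"role": "user", "content": "Summarize this paragraph."}],
    "execute": True,
    "models": ["gpt-4o-mini", "claude-3-haiku"],
    "cost_weight": 0.8,
}

def routed_model(response):
    # The chosen model and the router's reasoning live under "routing".
    return response["routing"]["selected_model"]
```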

Message Object

| Field | Type | Description |
| --- | --- | --- |
| role | string | system, user, or assistant |
| content | string | Message content |

Usage Object

| Field | Type | Description |
| --- | --- | --- |
| prompt_tokens | integer | Input token count |
| completion_tokens | integer | Output token count |
| total_tokens | integer | Total tokens |
| input_cost_usd | float | Input cost (USD) |
| output_cost_usd | float | Output cost (USD) |
| cache_input_cost_usd | float | Cached input cost (USD) |
| total_cost_usd | float | Total cost (USD) |
| latency_ms | float | Request latency (ms) |
| ttft_ms | float | Time to first token (ms) |
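The sample responses above are consistent with total_cost_usd being the sum of the component costs; a sketch of a client-side check, assuming cache_input_cost_usd (when present) also adds to the total:

```python
# Usage block from the chat completion example above.
usage = {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21,
    "input_cost_usd": 0.000018,
    "output_cost_usd": 0.000027,
    "total_cost_usd": 0.000045,
}

def total_cost(usage):
    # cache_input_cost_usd only appears when cached input was billed,
    # so default it to zero (assumption, not confirmed by the samples).
    return (usage["input_cost_usd"] + usage["output_cost_usd"]
            + usage.get("cache_input_cost_usd", 0.0))
```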

Streaming

Set stream: true to receive Server-Sent Events:
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"}}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"}}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
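A minimal parser for this stream, assuming an iterable of raw `data:` lines as shown above (function name illustrative):

```python
import json

def iter_content(sse_lines):
    # Yield content deltas from chat.completion.chunk events, stopping
    # at the [DONE] sentinel.
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

sample = [
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"}}]}',
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"}}]}',
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
text = "".join(iter_content(sample))  # "Hello!"
```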

Rate Limits

Rate limits are applied per API key. When a limit is exceeded, you’ll receive a 429 response with a retry_after header indicating how long to wait before retrying.
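A client can honor the retry_after hint with a simple retry loop; a sketch with an injectable request function so the transport is not hard-coded (names illustrative):

```python
import time

def with_retries(send, max_attempts=3, sleep=time.sleep):
    # "send" returns (status_code, headers, body); on 429 we wait for
    # the server-advertised retry_after seconds and try again.
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(float(headers.get("retry_after", 1)))
    return status, body

# Example: the first call is rate-limited, the second succeeds.
responses = iter([(429, {"retry_after": "0"}, ""), (200, {}, "ok")])
status, body = with_retries(lambda: next(responses), sleep=lambda s: None)
```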

Custom Deployments

Use your own deployed models on PureAI’s GPU infrastructure. When you create a deployment in the PureAI Console, you get a unique model ID that you can use with the pureai/ prefix.

How to Use

  1. Deploy your model on the PureAI Console
  2. Get the model ID from the deployment (e.g., gemma-3-4b-it)
  3. Use pureai/{model-id} in your API requests
curl -X POST "https://api.pureai-api.com/v1/completions" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "pureai/gemma-3-4b-it",
    "prompt": "Hello",
    "max_tokens": 1024,
    "temperature": 0.7,
    "stream": false
  }'

Python SDK

from lunar import Lunar

client = Lunar()

# Using your deployed model
response = client.chat.completions.create(
    model="pureai/gemma-3-4b-it",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Text completions with streaming
for chunk in client.completions.create(
    model="pureai/DeepSeek-R1-Distill-Llama-8B",
    prompt="Explain quantum computing",
    max_tokens=1024,
    stream=True
):
    print(chunk.choices[0].text, end="")
The model ID after the pureai/ prefix is the exact name of your deployed model. You can find it in the PureAI Console after creating a deployment.

SDK Usage

from lunar import Lunar

client = Lunar(api_key="your-key")

# Chat completions
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Text completions
response = client.completions.create(
    model="gpt-4o-mini",
    prompt="Hello"
)

# List models
models = client.models.list()

# List providers
providers = client.providers.list(model="gpt-4o-mini")