API Reference
Base URL
`https://api.pureai-api.com`
Authentication
All requests require an API key, sent in the `x-api-key` header:

`x-api-key: {your-api-key}`
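For reference, here is a minimal stdlib-only sketch of attaching the key when calling the API directly (the payload mirrors the chat example below; `build_request` is an illustrative helper, not part of any SDK):

```python
import json
import urllib.request

API_BASE = "https://api.pureai-api.com"

def build_request(api_key: str, path: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated POST request; the key travels in the x-api-key header."""
    return urllib.request.Request(
        url=API_BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

req = build_request(
    "YOUR_API_KEY",
    "/v1/chat/completions",
    {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}]},
)
```

Sending the request is then a single `urllib.request.urlopen(req)` call.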
Endpoints
POST /v1/chat/completions
Create a chat completion.
Without Streaming
With Streaming
cURL Example:

```bash
curl -X POST "https://api.pureai-api.com/v1/chat/completions" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are helpful."},
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100,
    "temperature": 0.7,
    "stream": false
  }'
```
Response:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699123456,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21,
    "input_cost_usd": 0.000018,
    "output_cost_usd": 0.000027,
    "total_cost_usd": 0.000045,
    "latency_ms": 523.4,
    "ttft_ms": 215.2
  }
}
```
cURL Example:

```bash
curl -X POST "https://api.pureai-api.com/v1/chat/completions" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100,
    "temperature": 0.7,
    "stream": true
  }'
```
Response (Server-Sent Events):

```
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"}}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"}}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"}}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
```
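The chunk stream above can be reassembled client-side by concatenating the `content` deltas. A minimal sketch, fed with the sample lines above (the helper name is illustrative, not part of the API):

```python
import json

SAMPLE = [
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"}}]}',
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"}}]}',
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    'data: [DONE]',
]

def accumulate(lines):
    """Join the content deltas from chat.completion.chunk SSE lines."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":  # sentinel terminating the stream
            break
        delta = json.loads(payload)["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

print(accumulate(SAMPLE))  # "Hello!"
```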
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier |
| messages | array | Yes | Conversation messages |
| max_tokens | integer | No | Maximum tokens to generate |
| temperature | float | No | Randomness (0-2) |
| top_p | float | No | Nucleus sampling |
| stream | boolean | No | Enable streaming (default: false) |
| stop | array | No | Stop sequences |
POST /v1/completions
Create a text completion.
Without Streaming
With Streaming
cURL Example:

```bash
curl -X POST "https://api.pureai-api.com/v1/completions" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "prompt": "The capital of France is",
    "max_tokens": 50,
    "temperature": 0.7,
    "stream": false
  }'
```
Response:

```json
{
  "id": "cmpl-abc123",
  "object": "text_completion",
  "created": 1699123456,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "text": " Paris",
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 1,
    "total_tokens": 6,
    "input_cost_usd": 0.000008,
    "output_cost_usd": 0.000003,
    "total_cost_usd": 0.000011,
    "latency_ms": 312.1
  }
}
```
cURL Example:

```bash
curl -X POST "https://api.pureai-api.com/v1/completions" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "prompt": "Hello",
    "max_tokens": 1024,
    "temperature": 0.7,
    "stream": true
  }'
```
Response (Server-Sent Events):

```
data: {"id":"cmpl-abc","object":"text_completion.chunk","choices":[{"index":0,"text":"Hello"}]}
data: {"id":"cmpl-abc","object":"text_completion.chunk","choices":[{"index":0,"text":" there"}]}
data: {"id":"cmpl-abc","object":"text_completion.chunk","choices":[{"index":0,"text":"!","finish_reason":"stop"}]}
data: [DONE]
```
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier |
| prompt | string | Yes | Text prompt |
| max_tokens | integer | No | Maximum tokens |
| temperature | float | No | Randomness (0-2) |
| top_p | float | No | Nucleus sampling |
| stream | boolean | No | Enable streaming (default: false) |
| stop | array | No | Stop sequences |
GET /v1/models
List available models.
Response:
```json
{
  "object": "list",
  "data": [
    {
      "id": "gpt-4o-mini",
      "object": "model",
      "created": 1699000000,
      "owned_by": "openai"
    },
    {
      "id": "claude-3-haiku",
      "object": "model",
      "created": 1699000000,
      "owned_by": "anthropic"
    }
  ]
}
```
GET /v1/providers
List providers for a model.
Query Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier |
Response:
```json
{
  "providers": [
    {
      "id": "openai",
      "type": "primary",
      "enabled": true,
      "params": {}
    },
    {
      "id": "groq",
      "type": "backup",
      "enabled": true,
      "params": {}
    }
  ]
}
```
POST /v1/router
An intelligent routing endpoint that automatically selects the best model for your request based on task complexity, cost optimization, and performance requirements.
Auto Mode
Restricted Mode
Decision Only
With Streaming
Uses all available models from your tenant with intelligent profile selection:

```bash
curl -X POST "https://api.pureai-api.com/v1/router" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Write a haiku about programming"}
    ],
    "execute": true
  }'
```
Choose from specific models only:

```bash
curl -X POST "https://api.pureai-api.com/v1/router" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Write Python code to sort a list"}
    ],
    "models": ["gpt-4o-mini", "codestral-latest"],
    "cost_weight": 0.5,
    "execute": true
  }'
```
Get the routing decision without executing the request:

```bash
curl -X POST "https://api.pureai-api.com/v1/router" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of Brazil?"}
    ],
    "execute": false
  }'
```
Response:

```json
{
  "selected_model": "gpt-4o-mini",
  "reasoning": "Simple factual question, low complexity",
  "confidence": 0.95
}
```
Stream the routed response as Server-Sent Events:

```bash
curl -X POST "https://api.pureai-api.com/v1/router" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Tell me a joke"}
    ],
    "stream": true,
    "execute": true
  }'
```
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
| messages | array | Yes | Conversation messages |
| execute | boolean | No | Execute the request after routing (default: true) |
| models | array | No | Restrict selection to specific models |
| cost_weight | float | No | Cost optimization weight 0-1 (0 = quality, 1 = cost) |
| stream | boolean | No | Enable streaming response |
| max_tokens | integer | No | Maximum tokens to generate |
| temperature | float | No | Randomness (0-2) |
Response (with execute: true):
```json
{
  "id": "router-abc123",
  "object": "chat.completion",
  "model": "gpt-4o-mini",
  "routing": {
    "selected_model": "gpt-4o-mini",
    "reasoning": "Simple conversational request",
    "confidence": 0.92
  },
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here's a programming haiku..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 25,
    "total_tokens": 40,
    "total_cost_usd": 0.00003
  }
}
```
The router analyzes your prompt to determine complexity, required capabilities (code, math, reasoning, etc.), and selects the most appropriate model balancing quality and cost.
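A request body for the router can be assembled like so; a minimal sketch in which `routing_payload` is an illustrative helper, with the field names taken from the parameters table above:

```python
def routing_payload(messages, models=None, cost_weight=None, execute=False):
    """Build a /v1/router request body; execute=False asks for the decision only."""
    body = {"messages": messages, "execute": execute}
    if models is not None:
        body["models"] = models            # restrict selection to these models
    if cost_weight is not None:
        body["cost_weight"] = cost_weight  # 0 = favor quality, 1 = favor cost
    return body

body = routing_payload(
    [{"role": "user", "content": "Write Python code to sort a list"}],
    models=["gpt-4o-mini", "codestral-latest"],
    cost_weight=0.5,
)
```

POST the body with `execute: false` first if you want to inspect the `selected_model` and `reasoning` before paying for a completion.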
Message Object
| Field | Type | Description |
|---|---|---|
| role | string | `system`, `user`, or `assistant` |
| content | string | Message content |
Usage Object
| Field | Type | Description |
|---|---|---|
| prompt_tokens | integer | Input token count |
| completion_tokens | integer | Output token count |
| total_tokens | integer | Total tokens |
| input_cost_usd | float | Input cost (USD) |
| output_cost_usd | float | Output cost (USD) |
| cache_input_cost_usd | float | Cached input cost (USD) |
| total_cost_usd | float | Total cost (USD) |
| latency_ms | float | Request latency (ms) |
| ttft_ms | float | Time to first token (ms) |
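Judging from the example responses above, `total_cost_usd` is the sum of the per-direction cost fields; that relationship is an assumption inferred from the sample values, not a documented guarantee. A quick sanity check against the chat example's usage block:

```python
def summed_cost(usage):
    """Sum the per-direction costs; cache_input_cost_usd may be absent when no cache was hit."""
    return (usage.get("input_cost_usd", 0.0)
            + usage.get("output_cost_usd", 0.0)
            + usage.get("cache_input_cost_usd", 0.0))

# Usage block from the chat completions example above.
usage = {"input_cost_usd": 0.000018, "output_cost_usd": 0.000027, "total_cost_usd": 0.000045}
```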
Streaming
Set stream: true to receive Server-Sent Events:
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"}}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"}}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"}}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
Rate Limits
Rate limits are applied per API key. When a limit is exceeded, the API returns a `429` response with a `retry_after` header.
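One way to honor the header is a simple retry loop; the `retry_after` name comes from the note above, while treating its value as seconds and the `send` callable interface are illustrative assumptions:

```python
import time

def post_with_retry(send, max_retries=3):
    """Call `send` (a zero-arg callable returning (status, headers, body));
    on a 429, sleep for retry_after seconds and try again."""
    for _ in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        time.sleep(float(headers.get("retry_after", 1.0)))
    raise RuntimeError("rate limited: retries exhausted")

# Stubbed transport: rate-limited once, then succeeds.
responses = iter([
    (429, {"retry_after": "0"}, ""),
    (200, {}, '{"ok": true}'),
])
status, body = post_with_retry(lambda: next(responses))
```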
Custom Deployments
Use your own deployed models on PureAI's GPU infrastructure. When you create a deployment in the PureAI Console, you get a unique model ID that you can use with the `pureai/` prefix.
How to Use
- Deploy your model on the PureAI Console
- Get the model ID from the deployment (e.g., `gemma-3-4b-it`)
- Use `pureai/{model-id}` in your API requests
Without Streaming
With Streaming
Chat Completions
```bash
curl -X POST "https://api.pureai-api.com/v1/completions" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "pureai/gemma-3-4b-it",
    "prompt": "Hello",
    "max_tokens": 1024,
    "temperature": 0.7,
    "stream": false
  }'
```
```bash
curl -X POST "https://api.pureai-api.com/v1/completions" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "pureai/gemma-3-4b-it",
    "prompt": "Hello",
    "max_tokens": 1024,
    "temperature": 0.7,
    "stream": true
  }'
```
```bash
curl -X POST "https://api.pureai-api.com/v1/chat/completions" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "pureai/gemma-3-4b-it",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 1024,
    "temperature": 0.7
  }'
```
Python SDK
```python
from lunar import Lunar

client = Lunar()

# Using your deployed model
response = client.chat.completions.create(
    model="pureai/gemma-3-4b-it",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Text completions with streaming
for chunk in client.completions.create(
    model="pureai/DeepSeek-R1-Distill-Llama-8B",
    prompt="Explain quantum computing",
    max_tokens=1024,
    stream=True
):
    print(chunk.choices[0].text, end="")
```
The model ID after the `pureai/` prefix is the exact name of your deployed model. You can find it in the PureAI Console after creating a deployment.
SDK Usage
```python
from lunar import Lunar

client = Lunar(api_key="your-key")

# Chat completions
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Text completions
response = client.completions.create(
    model="gpt-4o-mini",
    prompt="Hello"
)

# List models
models = client.models.list()

# List providers
providers = client.providers.list(model="gpt-4o-mini")
```