API Reference
Base URL
`https://api.pureai-api.com`
Authentication
All requests require an API key, sent in the `x-api-key` header:

`x-api-key: {your-api-key}`
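For reference, here is a minimal stdlib-only sketch of attaching the key when calling the API directly (the payload mirrors the chat example below; `build_request` is an illustrative helper, not part of any SDK):

```python
import json
import urllib.request

API_BASE = "https://api.pureai-api.com"

def build_request(api_key: str, path: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated POST request; the key travels in the x-api-key header."""
    return urllib.request.Request(
        url=API_BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

req = build_request(
    "YOUR_API_KEY",
    "/v1/chat/completions",
    {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}]},
)
```

Sending the request is then a single `urllib.request.urlopen(req)` call.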
Endpoints
POST /v1/chat/completions
Create a chat completion.
Without Streaming
With Streaming
cURL Example:

```bash
curl -X POST "https://api.pureai-api.com/v1/chat/completions" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are helpful."},
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100,
    "temperature": 0.7,
    "stream": false
  }'
```
Response:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699123456,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21,
    "input_cost_usd": 0.000018,
    "output_cost_usd": 0.000027,
    "total_cost_usd": 0.000045,
    "latency_ms": 523.4,
    "ttft_ms": 215.2
  }
}
```
cURL Example:

```bash
curl -X POST "https://api.pureai-api.com/v1/chat/completions" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100,
    "temperature": 0.7,
    "stream": true
  }'
```
Response (Server-Sent Events):

```
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"}}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"}}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"}}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
```
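The chunk stream above can be reassembled client-side by concatenating the `content` deltas. A minimal sketch, fed with the sample lines above (the helper name is illustrative, not part of the API):

```python
import json

SAMPLE = [
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"}}]}',
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"}}]}',
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    'data: [DONE]',
]

def accumulate(lines):
    """Join the content deltas from chat.completion.chunk SSE lines."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":  # sentinel terminating the stream
            break
        delta = json.loads(payload)["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

print(accumulate(SAMPLE))  # "Hello!"
```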
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier |
| messages | array | Yes | Conversation messages |
| max_tokens | integer | No | Maximum tokens to generate |
| temperature | float | No | Randomness (0-2) |
| top_p | float | No | Nucleus sampling |
| stream | boolean | No | Enable streaming (default: false) |
| stop | array | No | Stop sequences |
POST /v1/completions
Create a text completion.
Without Streaming
With Streaming
cURL Example:

```bash
curl -X POST "https://api.pureai-api.com/v1/completions" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "prompt": "The capital of France is",
    "max_tokens": 50,
    "temperature": 0.7,
    "stream": false
  }'
```
Response:

```json
{
  "id": "cmpl-abc123",
  "object": "text_completion",
  "created": 1699123456,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "text": " Paris",
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 1,
    "total_tokens": 6,
    "input_cost_usd": 0.000008,
    "output_cost_usd": 0.000003,
    "total_cost_usd": 0.000011,
    "latency_ms": 312.1
  }
}
```
cURL Example:

```bash
curl -X POST "https://api.pureai-api.com/v1/completions" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "prompt": "Hello",
    "max_tokens": 1024,
    "temperature": 0.7,
    "stream": true
  }'
```
Response (Server-Sent Events):

```
data: {"id":"cmpl-abc","object":"text_completion.chunk","choices":[{"index":0,"text":"Hello"}]}
data: {"id":"cmpl-abc","object":"text_completion.chunk","choices":[{"index":0,"text":" there"}]}
data: {"id":"cmpl-abc","object":"text_completion.chunk","choices":[{"index":0,"text":"!","finish_reason":"stop"}]}
data: [DONE]
```
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier |
| prompt | string | Yes | Text prompt |
| max_tokens | integer | No | Maximum tokens |
| temperature | float | No | Randomness (0-2) |
| top_p | float | No | Nucleus sampling |
| stream | boolean | No | Enable streaming (default: false) |
| stop | array | No | Stop sequences |
GET /v1/models
List available models.
Response:
```json
{
  "object": "list",
  "data": [
    {
      "id": "gpt-4o-mini",
      "object": "model",
      "created": 1699000000,
      "owned_by": "openai"
    },
    {
      "id": "claude-3-haiku",
      "object": "model",
      "created": 1699000000,
      "owned_by": "anthropic"
    }
  ]
}
```
GET /v1/providers
List providers for a model.
Query Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier |
Response:
```json
{
  "providers": [
    {
      "id": "openai",
      "type": "primary",
      "enabled": true,
      "params": {}
    },
    {
      "id": "groq",
      "type": "backup",
      "enabled": true,
      "params": {}
    }
  ]
}
```
POST /v1/router
An intelligent routing endpoint that automatically selects the best model for your request based on task complexity, cost optimization, and performance requirements.
Auto Mode
Restricted Mode
Decision Only
With Streaming
Uses all available models from your tenant with intelligent profile selection:

```bash
curl -X POST "https://api.pureai-api.com/v1/router" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Write a haiku about programming"}
    ],
    "execute": true
  }'
```
Choose from specific models only:

```bash
curl -X POST "https://api.pureai-api.com/v1/router" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Write Python code to sort a list"}
    ],
    "models": ["gpt-4o-mini", "codestral-latest"],
    "cost_weight": 0.5,
    "execute": true
  }'
```
Get the routing decision without executing the request:

```bash
curl -X POST "https://api.pureai-api.com/v1/router" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of Brazil?"}
    ],
    "execute": false
  }'
```
Response:

```json
{
  "selected_model": "gpt-4o-mini",
  "reasoning": "Simple factual question, low complexity",
  "confidence": 0.95
}
```
Stream the routed response as Server-Sent Events:

```bash
curl -X POST "https://api.pureai-api.com/v1/router" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Tell me a joke"}
    ],
    "stream": true,
    "execute": true
  }'
```
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
| messages | array | Yes | Conversation messages |
| execute | boolean | No | Execute the request after routing (default: true) |
| models | array | No | Restrict selection to specific models |
| cost_weight | float | No | Cost optimization weight 0-1 (0 = quality, 1 = cost) |
| stream | boolean | No | Enable streaming response |
| max_tokens | integer | No | Maximum tokens to generate |
| temperature | float | No | Randomness (0-2) |
Response (with execute: true):
```json
{
  "id": "router-abc123",
  "object": "chat.completion",
  "model": "gpt-4o-mini",
  "routing": {
    "selected_model": "gpt-4o-mini",
    "reasoning": "Simple conversational request",
    "confidence": 0.92
  },
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here's a programming haiku..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 25,
    "total_tokens": 40,
    "total_cost_usd": 0.00003
  }
}
```
The router analyzes your prompt to determine complexity, required capabilities (code, math, reasoning, etc.), and selects the most appropriate model balancing quality and cost.
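A request body for the router can be assembled like so; a minimal sketch in which `routing_payload` is an illustrative helper, with the field names taken from the parameters table above:

```python
def routing_payload(messages, models=None, cost_weight=None, execute=False):
    """Build a /v1/router request body; execute=False asks for the decision only."""
    body = {"messages": messages, "execute": execute}
    if models is not None:
        body["models"] = models            # restrict selection to these models
    if cost_weight is not None:
        body["cost_weight"] = cost_weight  # 0 = favor quality, 1 = favor cost
    return body

body = routing_payload(
    [{"role": "user", "content": "Write Python code to sort a list"}],
    models=["gpt-4o-mini", "codestral-latest"],
    cost_weight=0.5,
)
```

POST the body with `execute: false` first if you want to inspect the `selected_model` and `reasoning` before paying for a completion.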
Message Object
| Field | Type | Description |
|---|---|---|
| role | string | `system`, `user`, or `assistant` |
| content | string | Message content |
Usage Object
| Field | Type | Description |
|---|---|---|
| prompt_tokens | integer | Input token count |
| completion_tokens | integer | Output token count |
| total_tokens | integer | Total tokens |
| input_cost_usd | float | Input cost (USD) |
| output_cost_usd | float | Output cost (USD) |
| cache_input_cost_usd | float | Cached input cost (USD) |
| total_cost_usd | float | Total cost (USD) |
| latency_ms | float | Request latency (ms) |
| ttft_ms | float | Time to first token (ms) |
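Judging from the example responses above, `total_cost_usd` is the sum of the per-direction cost fields; that relationship is an assumption inferred from the sample values, not a documented guarantee. A quick sanity check against the chat example's usage block:

```python
def summed_cost(usage):
    """Sum the per-direction costs; cache_input_cost_usd may be absent when no cache was hit."""
    return (usage.get("input_cost_usd", 0.0)
            + usage.get("output_cost_usd", 0.0)
            + usage.get("cache_input_cost_usd", 0.0))

# Usage block from the chat completions example above.
usage = {"input_cost_usd": 0.000018, "output_cost_usd": 0.000027, "total_cost_usd": 0.000045}
```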
Streaming
Set stream: true to receive Server-Sent Events:
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"}}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"}}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"}}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
Rate Limits
Rate limits are applied per API key. When a limit is exceeded, the API returns a `429` response with a `retry_after` header.
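One way to honor the header is a simple retry loop; the `retry_after` name comes from the note above, while treating its value as seconds and the `send` callable interface are illustrative assumptions:

```python
import time

def post_with_retry(send, max_retries=3):
    """Call `send` (a zero-arg callable returning (status, headers, body));
    on a 429, sleep for retry_after seconds and try again."""
    for _ in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        time.sleep(float(headers.get("retry_after", 1.0)))
    raise RuntimeError("rate limited: retries exhausted")

# Stubbed transport: rate-limited once, then succeeds.
responses = iter([
    (429, {"retry_after": "0"}, ""),
    (200, {}, '{"ok": true}'),
])
status, body = post_with_retry(lambda: next(responses))
```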
Custom Deployments
Use your own deployed models on PureAI's GPU infrastructure. When you create a deployment in the PureAI Console, you get a unique model ID that you can use with the `pureai/` prefix.
How to Use
- Deploy your model on the PureAI Console
- Get the model ID from the deployment (e.g., `gemma-3-4b-it`)
- Use `pureai/{model-id}` in your API requests
Without Streaming
With Streaming
Chat Completions
```bash
curl -X POST "https://api.pureai-api.com/v1/completions" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "pureai/gemma-3-4b-it",
    "prompt": "Hello",
    "max_tokens": 1024,
    "temperature": 0.7,
    "stream": false
  }'
```
```bash
curl -X POST "https://api.pureai-api.com/v1/completions" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "pureai/gemma-3-4b-it",
    "prompt": "Hello",
    "max_tokens": 1024,
    "temperature": 0.7,
    "stream": true
  }'
```
```bash
curl -X POST "https://api.pureai-api.com/v1/chat/completions" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "pureai/gemma-3-4b-it",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 1024,
    "temperature": 0.7
  }'
```
Python SDK
```python
from lunar import Lunar

client = Lunar()

# Using your deployed model
response = client.chat.completions.create(
    model="pureai/gemma-3-4b-it",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Text completions with streaming
for chunk in client.completions.create(
    model="pureai/DeepSeek-R1-Distill-Llama-8B",
    prompt="Explain quantum computing",
    max_tokens=1024,
    stream=True
):
    print(chunk.choices[0].text, end="")
```
The model ID after the `pureai/` prefix is the exact name of your deployed model. You can find it in the PureAI Console after creating a deployment.
SDK Usage
```python
from lunar import Lunar

client = Lunar(api_key="your-key")

# Chat completions
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Text completions
response = client.completions.create(
    model="gpt-4o-mini",
    prompt="Hello"
)

# List models
models = client.models.list()

# List providers
providers = client.providers.list(model="gpt-4o-mini")
```