Inference API

Inference Gates

OpenAI-compatible API, billed in credits. Point your existing SDK at our endpoint, query models, and pay per token from your razorBridge balance.


Getting Started

Three Steps to Inference

Create Key

rb gate keys create "Lab key"

Generate an API key from the CLI or dashboard.

Point SDK

client = OpenAI(
    base_url="https://razorbridge.eu/api/v1/compute/gate",
    api_key="rb-gate-...",
)

Use the standard OpenAI SDK with our endpoint.

Query Models

response = client.chat.completions.create(
    model="llama-3",
    messages=[...],
)

Credits are deducted from your balance per token, automatically.
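Per-token billing means the cost of a request follows directly from its token counts. A minimal sketch, assuming the response carries an OpenAI-style usage object with prompt_tokens and completion_tokens, and assuming separate input and output rates. The rates below are placeholders, not razorBridge pricing; TokenRates and estimate_credits are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRates:
    """Credits charged per 1,000 tokens. Placeholder values, not real pricing."""
    input_per_1k: float
    output_per_1k: float


def estimate_credits(usage: dict, rates: TokenRates) -> float:
    """Estimate the credits one response consumed.

    Assumes an OpenAI-style "usage" dict with prompt_tokens and
    completion_tokens; missing counts are treated as zero.
    """
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    # Input and output tokens are priced separately (an assumption here).
    return (prompt * rates.input_per_1k + completion * rates.output_per_1k) / 1000
```

For example, estimate_credits({"prompt_tokens": 1200, "completion_tokens": 300}, TokenRates(0.5, 1.5)) returns 1.05. With the OpenAI SDK, response.usage.model_dump() gives the dict; with raw HTTP, use data["usage"].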

Integration

Works With Your Stack

Drop-in replacement for any OpenAI-compatible client.

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://razorbridge.eu/api/v1/compute/gate",
    api_key="rb-gate-YOUR_KEY",
)

response = client.chat.completions.create(
    model="llama-3",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)

print(response.choices[0].message.content)

cURL

curl https://razorbridge.eu/api/v1/compute/gate/chat/completions \
  -H "Authorization: Bearer rb-gate-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

JavaScript

const response = await fetch(
  "https://razorbridge.eu/api/v1/compute/gate/chat/completions",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer rb-gate-YOUR_KEY",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "llama-3",
      messages: [
        { role: "user", content: "Hello!" },
      ],
    }),
  }
);

const data = await response.json();
console.log(data.choices[0].message.content);
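If you would rather not add a dependency, the same request can be made with Python's standard library alone. A minimal sketch that mirrors the curl example; build_chat_request, extract_reply, and chat are hypothetical helper names, and extract_reply assumes the OpenAI-style choices[0].message.content response shape used in the examples above.

```python
import json
import urllib.request

CHAT_URL = "https://razorbridge.eu/api/v1/compute/gate/chat/completions"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the same POST the curl example makes."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        CHAT_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(payload: dict) -> str:
    """Return the text of the first choice in a chat completion response."""
    return payload["choices"][0]["message"]["content"]


def chat(api_key: str, model: str, prompt: str, timeout: float = 30.0) -> str:
    """Send the request and return the reply text (raises on HTTP errors)."""
    req = build_chat_request(api_key, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_reply(json.load(resp))
```

Usage: print(chat("rb-gate-YOUR_KEY", "klusai/fast", "Hello!")), where the model can be any alias from the model table. Splitting request construction from sending keeps the request easy to inspect without network access.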

Models

Available Models

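Each alias routes to a local model on Outpost or to a cloud model through OpenRouter, depending on availability. Aliases ending in :local run only on Outpost; a dash means the alias has no model on that route.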

Alias	Local (Outpost)	Cloud (OpenRouter)
klusai/coder	qwen3-coder-next	deepseek/deepseek-v4-pro
klusai/coder:local	qwen3-coder-next	—
klusai/fast	qwen2.5:7b	qwen/qwen-2.5-7b-instruct
klusai/fast:local	qwen2.5:7b	—
klusai/ocr	glm-ocr	qwen/qwen3-vl-32b-instruct
klusai/ocr:local	glm-ocr	—
klusai/reasoning	—	anthropic/claude-sonnet-4-thinking
klusai/smart	—	anthropic/claude-sonnet-4
klusai/vision	—	openai/gpt-4o
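One way to read the table, sketched in Python. This is not the gateway's routing code: preferring the local model whenever Outpost is reachable is an assumption the page does not state, and ALIASES and resolve are hypothetical names. What the sketch does encode from the table is that :local aliases have no cloud fallback and that klusai/reasoning, klusai/smart, and klusai/vision are cloud-only.

```python
from typing import Optional

# Alias -> (local Outpost model, cloud OpenRouter model), copied from the table.
# None marks a route the table lists as unavailable.
ALIASES: dict[str, tuple[Optional[str], Optional[str]]] = {
    "klusai/coder": ("qwen3-coder-next", "deepseek/deepseek-v4-pro"),
    "klusai/coder:local": ("qwen3-coder-next", None),
    "klusai/fast": ("qwen2.5:7b", "qwen/qwen-2.5-7b-instruct"),
    "klusai/fast:local": ("qwen2.5:7b", None),
    "klusai/ocr": ("glm-ocr", "qwen/qwen3-vl-32b-instruct"),
    "klusai/ocr:local": ("glm-ocr", None),
    "klusai/reasoning": (None, "anthropic/claude-sonnet-4-thinking"),
    "klusai/smart": (None, "anthropic/claude-sonnet-4"),
    "klusai/vision": (None, "openai/gpt-4o"),
}


def resolve(alias: str, local_available: bool) -> str:
    """Pick a backing model for an alias.

    Assumption: use the local model when Outpost is up, otherwise fall
    back to the cloud model if the table lists one.
    """
    try:
        local, cloud = ALIASES[alias]
    except KeyError:
        raise ValueError(f"unknown alias: {alias}") from None
    if local is not None and local_available:
        return local
    if cloud is not None:
        return cloud
    raise LookupError(f"{alias} has no available route")
```

For instance, resolve("klusai/fast", local_available=False) falls back to qwen/qwen-2.5-7b-instruct, while resolve("klusai/fast:local", local_available=False) raises LookupError.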

Complete Workflow

Train and Serve From One Account

One credit balance covers both GPU Blades and Inference Gates.

Train

SSH into a GPU Blade with PyTorch, CUDA, and your datasets available, then fine-tune an existing model or train one from scratch.

rb blade ssh
nvidia-smi
python train.py --model llama-3

Serve

Create an API key and query any available model through the OpenAI SDK, billed in credits per token.

rb gate keys create "Lab key"
client = OpenAI(
    base_url="https://razorbridge.eu/api/v1/compute/gate",
    api_key="rb-gate-YOUR_KEY",
)
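Rather than pasting the key into source code, you can read it from the environment when constructing the client. A minimal sketch, assuming a hypothetical environment variable named RB_GATE_KEY and a hypothetical helper load_gate_config; the rb-gate- prefix check is based only on the example keys on this page, not on a documented key format.

```python
import os

# Base URL from the examples on this page.
GATE_BASE_URL = "https://razorbridge.eu/api/v1/compute/gate"


def load_gate_config(env_var: str = "RB_GATE_KEY") -> dict:
    """Return keyword arguments for an OpenAI-compatible client.

    RB_GATE_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"set {env_var} to your Inference Gate API key")
    # Assumption: gate keys start with "rb-gate-", as in the examples.
    if not key.startswith("rb-gate-"):
        raise ValueError(f"{env_var} does not look like an Inference Gate key")
    return {"base_url": GATE_BASE_URL, "api_key": key}
```

With the SDK installed, client = OpenAI(**load_gate_config()) passes both values through, and a missing or mistyped key fails before any request is sent.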

Start querying models in minutes

Create an API key and use the OpenAI SDK you already know.
