Inference Gates
OpenAI-compatible HTTP APIs for AI model inference, billed against your rb credits. No deployment, no infrastructure — create a key, point the OpenAI SDK at razorbridge.eu, and start making requests.
What are Inference Gates?
Inference Gates give you instant access to AI models through a standard API. They work with any tool that supports the OpenAI API format — Python SDKs, curl, LangChain, or your own code.
- OpenAI-compatible chat completions endpoint
- Billed per-token against your rb credit balance
- No infrastructure to manage — just an API key
- Same account and credits as GPU Blades
Create an API key
Generate a key from the CLI:
$ rb gate keys create "Lab notebook"
✓ API key created
Key: rb-gate-a1b2c3d4e5f6g7h8
Name: Lab notebook
⚠ Store this key securely — it will not be shown again.
API keys are shown only once at creation. Store the key in a secure location (e.g., environment variable or secrets manager).
Use with OpenAI SDK
Point the OpenAI Python SDK at the razorBridge endpoint:
from openai import OpenAI

client = OpenAI(
    base_url="https://razorbridge.eu/api/v1/rb/gate",
    api_key="rb-gate-a1b2c3d4e5f6g7h8",
)

response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[{"role": "user", "content": "Explain backpropagation in simple terms"}],
)
print(response.choices[0].message.content)
Set your key as an environment variable to avoid hardcoding it:
export RB_GATE_KEY="rb-gate-a1b2c3d4e5f6g7h8"

Then read it from the environment in Python:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://razorbridge.eu/api/v1/rb/gate",
    api_key=os.environ["RB_GATE_KEY"],
)
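If RB_GATE_KEY is unset, os.environ["RB_GATE_KEY"] fails with a bare KeyError. A small helper can give a clearer message instead. This is only a sketch: load_gate_key is a hypothetical name, and the rb-gate- prefix check is an assumption inferred from the sample key above, not a documented guarantee.

```python
import os

ENV_VAR = "RB_GATE_KEY"   # variable name used in the example above
KEY_PREFIX = "rb-gate-"   # assumption: inferred from the sample key format


def load_gate_key(env=None):
    """Return the gate key from the environment or raise a descriptive error."""
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set; export it before running")
    if not key.startswith(KEY_PREFIX):
        # Assumed format check; remove it if your keys look different.
        raise RuntimeError(f"{ENV_VAR} does not look like a gate key")
    return key
```

With this in place, the client can be created with api_key=load_gate_key().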
Use with curl
Test the endpoint directly:
curl https://razorbridge.eu/api/v1/rb/gate/chat/completions \
  -H "Authorization: Bearer rb-gate-a1b2c3d4e5f6g7h8" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
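For environments without the OpenAI SDK, the same request can be made with the Python standard library. The sketch below mirrors the curl command: same URL, headers, and JSON body. The function names build_chat_request and send are hypothetical, and reading the reply assumes the response follows the OpenAI chat completions schema.

```python
import json
import urllib.request

BASE_URL = "https://razorbridge.eu/api/v1/rb/gate"  # endpoint from the examples above


def build_chat_request(api_key, model, user_message):
    """Build (but do not send) the same POST request the curl command makes."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A call such as send(build_chat_request(key, "llama-3.1-8b", "Hello!")) should return a dictionary whose reply text sits at ["choices"][0]["message"]["content"], assuming the OpenAI-compatible response shape.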
Available models
List available models from the CLI:
$ rb gate models
MODEL          CONTEXT  PRICING (per 1M tokens)
llama-3.1-8b   128k     €0.18 input / €0.18 output
llama-3.1-70b  128k     €0.88 input / €0.88 output
mistral-large  128k     €2.00 input / €6.00 output
qwen-2.5-72b   128k     €0.90 input / €0.90 output
Model availability and pricing may change. Use rb gate models for the current list. All pricing is in EUR credits.
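To see how per-million-token prices translate into the cost of a single request, here is a rough estimator. It hardcodes a snapshot of the prices listed above, so treat the numbers as illustrative. The name estimate_cost is hypothetical, and the exact rounding applied by the billing system is not specified here, so the result is an unrounded estimate.

```python
from decimal import Decimal

# Snapshot of the table above, in EUR per 1M tokens: (input, output).
# Prices may change; `rb gate models` is the source of truth.
PRICES_PER_MILLION = {
    "llama-3.1-8b": (Decimal("0.18"), Decimal("0.18")),
    "llama-3.1-70b": (Decimal("0.88"), Decimal("0.88")),
    "mistral-large": (Decimal("2.00"), Decimal("6.00")),
    "qwen-2.5-72b": (Decimal("0.90"), Decimal("0.90")),
}
MILLION = Decimal(1_000_000)


def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the EUR cost of one request from its token counts."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / MILLION
```

For example, a mistral-large request with 10,000 input tokens and 2,000 output tokens comes to (10,000 × 2.00 + 2,000 × 6.00) / 1,000,000 = €0.032.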
Check usage
See your Inference Gates cost breakdown:
$ rb gate usage
Inference Gates usage (last 30 days):
MODEL          REQUESTS  TOKENS  COST
llama-3.1-8b        142    1.2M  €0.22
llama-3.1-70b        28    340K  €0.30
                                 ──────
Total                            €0.52
Budget control
Professors and organizers can create API keys with restrictions via the Hub:
- Model restrictions: limit which models a key can access (e.g., only allow llama-3.1-8b for coursework)
- Budget caps (coming soon): set a maximum spend per key (e.g., €5 per student)
- Rate limits (coming soon): control requests per minute to prevent accidental cost spikes
Today, account-level spend caps protect every account from runaway cost, and model restrictions are enforced per key. Per-key budget caps and rate limits are on the roadmap.
Organizers: go to Hub > Inference Gates > Keys to manage API keys for your students.
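Until per-key budget caps are available, a script can keep its own running total as a soft guard. The sketch below is purely client-side and advisory: LocalBudget and BudgetExceeded are hypothetical names, not part of the rb tooling, and it assumes each response reports token counts in the OpenAI-style usage field (prompt_tokens and completion_tokens).

```python
from decimal import Decimal


class BudgetExceeded(RuntimeError):
    """Raised once local spend has reached the configured cap."""


class LocalBudget:
    """Client-side spend tracker (hypothetical helper, advisory only)."""

    def __init__(self, cap_eur, input_price_per_million, output_price_per_million):
        # Values are passed through str() so floats do not carry binary noise.
        self.cap = Decimal(str(cap_eur))
        self.input_price = Decimal(str(input_price_per_million))
        self.output_price = Decimal(str(output_price_per_million))
        self.spent = Decimal("0")

    def check(self):
        """Call before each request; raises once the cap is reached."""
        if self.spent >= self.cap:
            raise BudgetExceeded(f"local cap of EUR {self.cap} reached")

    def record(self, prompt_tokens, completion_tokens):
        """Call after each response with the token counts it reports."""
        cost = (
            prompt_tokens * self.input_price
            + completion_tokens * self.output_price
        ) / Decimal(1_000_000)
        self.spent += cost
        return cost
```

In a request loop, call budget.check() before client.chat.completions.create(...) and budget.record(response.usage.prompt_tokens, response.usage.completion_tokens) afterwards. Because the total lives only in the running process, this does not replace the account-level spend caps.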
Roadmap — multi-modal endpoints
Inference Gates today serve OpenAI-compatible chat completions (LLMs). We are extending the gate so you can run audio, image, and video models through the same key, billing, and EU-routed infrastructure — no deployment on your side:
- Audio — speech-to-text (transcription) and text-to-speech, OpenAI-compatible (/audio/transcriptions, /audio/speech)
- Images — text-to-image generation (/images/generations)
- Video — text-to-video and image-to-video generation, billed per second of output
- Streaming — server-sent-events streaming for chat completions (stream: true)
These are on the roadmap and not yet live. The chat-completions endpoint above is available today. Want early access? Email [email protected].
Billing
Inference Gates costs are deducted from the same credit balance as GPU Blades. One account, one ledger — no separate billing for inference.
- Costs are calculated per-token based on the model used
- Charges appear instantly in your transaction history
- Check your balance anytime with rb credits