Inference API

Run any model, pay per token.

Access 158+ open-source and frontier models through a single unified API. Pay per token, per image, or per second of video.

158+ models available
9 categories
Model                    Provider       Category
MiniMax M2.7 (New)       MiniMax AI     CHAT
MiniMax M2.5             MiniMax AI     CHAT
Kimi K2.5                Moonshot AI    CHAT
gpt-oss-120B             OpenAI         CHAT
DeepSeek-V3.2            DeepSeek AI    CHAT
Qwen3-Max                Alibaba Cloud  CHAT
Qwen3-Max-Thinking       Alibaba Cloud  REASONING
Gemini 2.5 Flash         Google         CHAT
Gemini 2.5 Pro           Google         CHAT
GLM-5                    Zhipu AI       CHAT
Claude 4 Sonnet          Anthropic      CHAT
Claude 4 Opus            Anthropic      CHAT
Claude 3.7 Sonnet        Anthropic      CHAT
DeepSeek-R1              DeepSeek AI    REASONING
DeepSeek-R1-Turbo        DeepSeek AI    REASONING
Llama 4 Scout            Meta           CHAT
Phi-4                    Microsoft      CHAT
Phi-4 Reasoning Plus     Microsoft      REASONING
ByteDance Seed-2.0 Mini  ByteDance      CHAT
Kimi K2.5 Turbo          Moonshot AI    CHAT

Start building on Runcrate.

One API, every model. Deploy your first endpoint in seconds.

Pay-per-token: no minimum spend
One API: every provider, one endpoint
200+ models: chat, code, image, video, audio