MODAL ALTERNATIVE
Modal is excellent for serverless Python execution with GPUs. Runcrate complements it: when you outgrow Modal's per-function model, Runcrate gives you full GPU instances (H100, H200, B200, MI300X), 200+ inference models behind an OpenAI-compatible API, and per-second billing, with no Modal SDK lock-in.
COMPARISON
| Feature | Runcrate | Modal |
|---|---|---|
| Runtime model | Full GPU instances + serverless inference | Serverless Python functions |
| H100 GPU access | Dedicated, $1.50/hr | Function-level, ephemeral |
| Inference API | 200+ models, OpenAI format | Bring-your-own-model functions |
| Cold start | Always-on or auto-shutdown | Function cold starts |
| Long-running jobs | Native (no time limits) | Function timeout limits |
| SDK lock-in | Standard OpenAI / Docker | Modal-specific decorators |
INFERENCE PRICING
| Model | Provider | Price | Detail |
|---|---|---|---|
| deepseek-ai/DeepSeek-V3.2 | DeepSeek | $0.27 / 1M | Reasoning, code, 128K ctx |
| anthropic/claude-4-sonnet | Anthropic | $3 / 1M in, $15 / 1M out | Top-tier reasoning |
| meta-llama/Llama-4-Scout | Meta | $0.20 / 1M | Open weights, multilingual |
| Qwen/Qwen3-Max | Alibaba | $0.30 / 1M | 30+ languages, 128K ctx |
| openai/whisper-large-v3 | OpenAI | $0.02 / min | Speech-to-text, 100+ langs |
| black-forest-labs/FLUX.1-pro | Black Forest Labs | $0.04 / image | Photorealistic |
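To make the per-token prices concrete, here is a small arithmetic sketch using the rates from the table above; the helper function name is ours, not part of any SDK. Prices are per 1M tokens, and Claude is priced separately for input and output.

```typescript
// Estimate cost from per-1M-token prices (rates taken from the table above).
function tokenCost(tokens: number, pricePerMillion: number): number {
  return (tokens / 1_000_000) * pricePerMillion;
}

// 2M input tokens through DeepSeek-V3.2 at $0.27 / 1M:
const deepseek = tokenCost(2_000_000, 0.27); // $0.54

// A Claude-4-Sonnet call with 10K input ($3 / 1M) and 2K output ($15 / 1M):
const claude = tokenCost(10_000, 3) + tokenCost(2_000, 15); // $0.03 + $0.03 = $0.06

console.log(deepseek.toFixed(2), claude.toFixed(2)); // "0.54 0.06"
```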
WHY SWITCH
Chat, code, image, video, audio, embeddings, vision — all under a single OpenAI-compatible endpoint with per-token / per-image / per-second billing.
Swap the base URL and your existing OpenAI SDK code keeps working. No custom client library, no rewrite, no lock-in.
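As a sketch of what the base-URL swap looks like, the snippet below builds a standard OpenAI-format chat-completions request pointed at Runcrate. The base URL and the `buildChatRequest` helper are illustrative assumptions; the JSON body is the standard OpenAI chat-completions shape.

```typescript
// Assumed endpoint for illustration; check your dashboard for the real one.
const BASE_URL = "https://api.runcrate.ai/v1";

// Build a request in the standard OpenAI chat-completions format.
// (Hypothetical helper, not part of any SDK.)
function buildChatRequest(apiKey: string, model: string, prompt: string) {
  return {
    url: `${BASE_URL}/chat/completions`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
    }),
  };
}

const req = buildChatRequest("rc_live_YOUR_API_KEY", "deepseek-ai/DeepSeek-V3.2", "Hello!");
// const res = await fetch(req.url, req); // uncomment to actually send it
```

The same body works with the official OpenAI SDKs: pass `baseURL` (or `base_url` in Python) to the client constructor and leave the rest of your code unchanged.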
When the API isn't enough, rent a dedicated H100, H200, or B200 from the same account — same billing, same dashboard, no separate vendor.
Pay only for what you use. No hourly bucketing, no commitment, no idle charges. Prepaid credits never expire.
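A quick sketch of what per-second billing means in practice, using the $1.50/hr H100 rate from the comparison table (the helper function is ours, for illustration):

```typescript
// A 37-minute H100 run at $1.50/hr is billed for exactly 2220 seconds,
// not rounded up to a full hour.
const HOURLY_RATE = 1.5; // $/hr for a dedicated H100 (from the table above)

function perSecondCost(seconds: number, hourlyRate: number): number {
  return (seconds / 3600) * hourlyRate;
}

const perSecond = perSecondCost(37 * 60, HOURLY_RATE);        // $0.925
const hourlyBucket = Math.ceil(37 / 60) * HOURLY_RATE;        // $1.50 with hourly rounding

console.log(perSecond.toFixed(3), hourlyBucket.toFixed(2));   // "0.925 1.50"
```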
GET STARTED
import Runcrate from "@runcrate/sdk";
const rc = new Runcrate({ apiKey: "rc_live_YOUR_API_KEY" });
// Spin up a dedicated H100 SXM in 60 seconds
const instance = await rc.instances.create({
gpu: "h100-sxm",
region: "auto",
image: "runcrate/vllm:latest",
});
console.log(`SSH: ssh root@${instance.host}`);