Solutions · RAG

Build RAG pipelines that stay fast and private.

Embed with text-embedding-3 or BGE via API. Store vectors in Qdrant, Weaviate, or Chroma on dedicated instances. Generate answers with Claude 4, DeepSeek-V3.2, or any of 200+ models. Zero egress fees -- your data never leaves the platform.

$0 · Egress fees
200+ · Models via API
Per-token · Embedding pricing

The RAG Stack

Embed, store, retrieve, generate. All here.

Embedding models via API

text-embedding-3-large, BGE-M3, E5-Mistral, and more via a single endpoint. Generate dense or sparse embeddings for text, code, and images.

Vector databases on instances

Run Qdrant, Weaviate, Chroma, or Milvus on dedicated instances with persistent storage. Full root access for custom configuration.

Generation models via API

Claude 4 Sonnet, DeepSeek-V3.2, Llama 4 Scout, Gemini 2.5 Flash -- use any model for the generation step. Switch models without changing your pipeline.

Zero egress, zero latency tax

Embeddings, vector DB, and generation all run on the same platform. No cross-provider network hops, no data transfer fees eating your margins.

Chunking and preprocessing

Use dedicated instances to run custom chunking pipelines -- LangChain, LlamaIndex, or your own scripts. Full Python/Node environment with Docker support.
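A minimal sketch of what a custom chunking step can look like in plain Python: a sliding character window with overlap, so adjacent chunks share context. The `chunk_text` helper and its defaults are illustrative, not a platform API; LangChain and LlamaIndex ship more sophisticated, token-aware splitters.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows for embedding.

    Each chunk shares its first `overlap` characters with the tail of the
    previous chunk, so sentences cut at a boundary still appear whole
    in at least one chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Character windows are the simplest baseline; on a dedicated instance you can swap in sentence- or token-based splitting without touching the rest of the pipeline.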

Scale each layer independently

More embedding throughput? Call the API. More vector storage? Expand your instance disk. More generation capacity? Add models. No over-provisioning.

Models & Tools

Popular choices for RAG pipelines.

Mix and match embedding models, vector stores, and generation models to fit your use case.

text-embedding-3-large (Embedding · API): High-accuracy dense embeddings
BGE-M3 / E5-Mistral (Embedding · API): Multilingual, hybrid retrieval
Qdrant / Weaviate / Chroma (Vector DB · Instance): Persistent vector storage
Claude 4 / DeepSeek-V3.2 (Generation · API): Context synthesis and answers
Llama 4 Scout / Qwen3 (Generation · API or self-hosted): Open-source generation

How It Works

Three steps to a RAG pipeline.

01

Embed your documents

Call the Inference API with your text to generate embeddings. Use text-embedding-3, BGE, E5, or any supported embedding model. Per-token pricing, no minimums.
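A hedged sketch of that call, assuming the inference endpoint follows the common OpenAI-style embeddings shape (`model` + `input` request, `data[i].embedding` response). The base URL `api.runcrate.example` is a placeholder; use the endpoint and key from your dashboard.

```python
import json
import urllib.request

API_BASE = "https://api.runcrate.example/v1"  # placeholder; use your real endpoint
API_KEY = "YOUR_API_KEY"

def build_embedding_request(texts: list[str], model: str = "text-embedding-3-large") -> dict:
    """Build an OpenAI-style embeddings payload, batching texts into one call."""
    return {"model": model, "input": texts}

def embed(texts: list[str]) -> list[list[float]]:
    """POST the payload to the embeddings endpoint; return one vector per input."""
    req = urllib.request.Request(
        f"{API_BASE}/embeddings",
        data=json.dumps(build_embedding_request(texts)).encode(),
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return [item["embedding"] for item in body["data"]]
```

Because pricing is per token, batching several chunks into a single `input` list costs the same as separate calls but saves round trips.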

02

Store in a vector database

Deploy Qdrant, Weaviate, or Chroma on a dedicated instance. Index your embeddings with persistent storage. Zero egress means retrieval stays fast and free.
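Under the hood, retrieval is nearest-neighbor search over your stored vectors. A toy in-process version of what a vector database computes is sketched below; real deployments like Qdrant or Weaviate use approximate indexes (e.g. HNSW) rather than this brute-force scan, but the scoring is the same idea.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], index: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the ids of the k vectors in `index` most similar to `query`."""
    scored = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]
```

Running this lookup on the same platform as the embedding and generation calls is what keeps the retrieval hop free of egress fees.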

03

Retrieve and generate

Query your vector DB, pass context to Claude 4, DeepSeek-V3.2, or any generation model via the same API. Entire pipeline runs on-platform with no data transfer fees.
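The final step is packing the retrieved chunks into the generation request. A minimal sketch, assuming an OpenAI-style chat `messages` format; the exact request shape and model slug depend on the endpoint and model you choose.

```python
def build_rag_prompt(question: str, contexts: list[str]) -> list[dict]:
    """Assemble retrieved chunks and a user question into a chat messages list.

    Contexts are numbered so the model can cite them as [1], [2], ...
    """
    context_block = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    system = (
        "Answer using only the context below. Cite sources as [n]. "
        "If the context is insufficient, say so.\n\n" + context_block
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

Because only the prompt changes, swapping Claude 4 for DeepSeek-V3.2 (or any other generation model) means changing the model name in the API call, not the pipeline.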

Build your RAG pipeline on Runcrate.

Embeddings, vector storage, and generation on one platform. Zero egress fees, per-token pricing, no credit card required to start.

Zero egress fees · Data stays on-platform
Per-token pricing · Embeddings and generation
Cancel anytime · No lock-in, no penalties