EMBEDDING API

Vector embeddings, one API call.

Generate high-quality vector embeddings for retrieval-augmented generation, semantic search, recommendation systems, and clustering. OpenAI-compatible embeddings endpoint with multiple model options. Drop into your existing RAG pipeline with a base URL change.

Endpoint: /v1/embeddings
Format: OpenAI-compatible
Use cases: RAG, search, clustering

QUICK START

Integrate in minutes.

from openai import OpenAI

# Point the standard OpenAI client at the RunCrate base URL.
client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

# Embed a single query; `input` also accepts a list of strings.
response = client.embeddings.create(
    model="BAAI/bge-large-en-v1.5",
    input=["What is retrieval-augmented generation?"],
)

# Each embedding is a list of floats (1024 dimensions for bge-large-en-v1.5).
embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")

AVAILABLE MODELS

Models you can use today.

Model                    Provider  Pricing    Notes
BAAI/bge-large-en-v1.5   BAAI      Per-token  1024 dims, strong retrieval
BAAI/bge-base-en-v1.5    BAAI      Per-token  768 dims, balanced speed/quality

WHY RUNCRATE

Built for production.

RAG-Optimized

BGE models are trained specifically for retrieval tasks. High recall on document retrieval benchmarks like MTEB and BEIR.
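
Below is a minimal retrieval sketch built on the quick-start client: it embeds a query and a few illustrative documents with BAAI/bge-large-en-v1.5, then ranks the documents by cosine similarity. The document strings and API key are placeholders, and numpy is assumed to be installed.

import numpy as np
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

MODEL = "BAAI/bge-large-en-v1.5"
documents = [
    "RAG augments a language model with retrieved context.",
    "BGE embeddings map text to 1024-dimensional vectors.",
    "Bananas are a good source of potassium.",
]
query = "How does retrieval-augmented generation work?"

# Embed the documents and the query.
doc_vectors = np.array([
    item.embedding
    for item in client.embeddings.create(model=MODEL, input=documents).data
])
query_vector = np.array(
    client.embeddings.create(model=MODEL, input=[query]).data[0].embedding
)

# Cosine similarity is the dot product of L2-normalized vectors.
doc_vectors /= np.linalg.norm(doc_vectors, axis=1, keepdims=True)
query_vector /= np.linalg.norm(query_vector)
scores = doc_vectors @ query_vector

# Print the highest-scoring document first.
for score, doc in sorted(zip(scores, documents), key=lambda t: t[0], reverse=True):
    print(f"{score:.3f}  {doc}")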

Batch Processing

Send multiple texts in a single request. Embed entire document collections efficiently with batch support.
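
As a sketch of batching, the input field accepts a list of strings and results come back in input order. The chunk texts and the batch size of 64 below are illustrative assumptions, not documented limits.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

# Placeholder chunks standing in for a pre-chunked document collection.
chunks = [f"Document chunk {i}" for i in range(256)]

batch_size = 64  # illustrative; pick a size that fits your payload limits
vectors = []
for start in range(0, len(chunks), batch_size):
    batch = chunks[start:start + batch_size]
    response = client.embeddings.create(model="BAAI/bge-base-en-v1.5", input=batch)
    # response.data preserves the order of the input list.
    vectors.extend(item.embedding for item in response.data)

print(f"Embedded {len(vectors)} chunks, {len(vectors[0])} dimensions each")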

OpenAI-Compatible

Standard /v1/embeddings endpoint. Works with LangChain, LlamaIndex, and any OpenAI-compatible vector pipeline.
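
For example, here is a minimal LangChain sketch, assuming the langchain-openai package is installed: point OpenAIEmbeddings at the RunCrate base URL and pass the model name. Setting check_embedding_ctx_length=False makes the client send raw strings rather than tiktoken token IDs, which is usually needed with non-OpenAI backends.

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="BAAI/bge-large-en-v1.5",
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
    # Send raw strings instead of tiktoken token IDs; typically required
    # when using OpenAIEmbeddings against a non-OpenAI backend.
    check_embedding_ctx_length=False,
)

doc_vectors = embeddings.embed_documents(["First chunk", "Second chunk"])
query_vector = embeddings.embed_query("What is in the first chunk?")
print(len(doc_vectors), len(query_vector))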

Low Latency

Embedding models are lightweight and fast. Sub-50ms latency per batch, enabling real-time search and retrieval.
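
To spot-check latency in your own environment, here is a simple timing sketch using the quick-start client; observed numbers depend on network round trip, batch size, and region, so treat any single measurement as indicative only.

import time
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

# Time a single small embedding request end to end.
start = time.perf_counter()
client.embeddings.create(
    model="BAAI/bge-base-en-v1.5",
    input=["realtime search query"],
)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"embedding request latency: {elapsed_ms:.1f} ms")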


Start building RAG pipelines.