EMBEDDING API
Generate high-quality vector embeddings for retrieval-augmented generation, semantic search, recommendation systems, and clustering. OpenAI-compatible embeddings endpoint with multiple model options. Drop into your existing RAG pipeline with a base URL change.
QUICK START
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

response = client.embeddings.create(
    model="BAAI/bge-large-en-v1.5",
    input=["What is retrieval-augmented generation?"],
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")  # 1024 for bge-large-en-v1.5
AVAILABLE MODELS
| Model | Provider | Price | Detail |
|---|---|---|---|
| BAAI/bge-large-en-v1.5 | BAAI | Per-token | 1024 dims, strong retrieval |
| BAAI/bge-base-en-v1.5 | BAAI | Per-token | 768 dims, balanced speed/quality |
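When you create a vector index, its dimension must match the model's output size, or inserts will fail. A minimal sketch mapping the models above to the dimensions listed in the table (`MODEL_DIMS` and `index_dim` are illustrative names, not part of the API):

```python
# Output dimensions per model, taken from the table above.
MODEL_DIMS = {
    "BAAI/bge-large-en-v1.5": 1024,
    "BAAI/bge-base-en-v1.5": 768,
}

def index_dim(model: str) -> int:
    """Return the vector size to use when creating an index for `model`."""
    try:
        return MODEL_DIMS[model]
    except KeyError:
        raise ValueError(f"Unknown embedding model: {model}")
```

Pinning dimensions in one place avoids a silent mismatch if you later switch between the large and base models.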
WHY RUNCRATE
BGE models are trained specifically for retrieval tasks and achieve strong recall on document retrieval benchmarks such as MTEB and BEIR.
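BGE embeddings are typically compared with cosine similarity. A minimal retrieval sketch over pre-computed vectors, using only the standard library (`cosine` and `top_k` are illustrative helpers, not part of this API):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k documents most similar to the query."""
    scored = sorted(
        enumerate(doc_vecs),
        key=lambda iv: cosine(query_vec, iv[1]),
        reverse=True,
    )
    return [i for i, _ in scored[:k]]
```

In production you would hand the vectors to a vector database rather than scanning in Python, but the ranking logic is the same.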
Batch support lets you send multiple texts in a single request and embed entire document collections efficiently.
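For large collections you will typically split the texts into fixed-size batches, one request per batch. A sketch, assuming a per-request limit of 64 inputs (the actual limit is not stated above; check your plan), with `chunked` and `embed_all` as illustrative helper names:

```python
def chunked(items, size):
    """Yield successive slices of `items` with at most `size` elements each."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def embed_all(client, texts, model="BAAI/bge-base-en-v1.5", batch_size=64):
    """Embed `texts` in batches via the OpenAI-compatible client.

    batch_size=64 is an assumed per-request limit, not a documented one.
    """
    vectors = []
    for batch in chunked(texts, batch_size):
        resp = client.embeddings.create(model=model, input=batch)
        # Each returned item carries an `index` field; sort on it so the
        # output vectors line up with the input order.
        vectors.extend(
            item.embedding for item in sorted(resp.data, key=lambda d: d.index)
        )
    return vectors
```

`client` is the same `OpenAI` instance from the quick start.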
The standard /v1/embeddings endpoint works with LangChain, LlamaIndex, and any other OpenAI-compatible vector pipeline.
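Because the endpoint follows the OpenAI embeddings request shape, any HTTP client can call it directly, not just the official SDK. A sketch of the request body (`embeddings_request_body` is an illustrative helper; the URL and key format come from the quick start above):

```python
import json

def embeddings_request_body(model, texts):
    """Build the JSON body for a POST to /v1/embeddings: model name plus inputs."""
    return json.dumps({"model": model, "input": texts})

body = embeddings_request_body(
    "BAAI/bge-base-en-v1.5",
    ["first document", "second document"],
)
# POST this body to https://api.runcrate.ai/v1/embeddings with
# Content-Type: application/json and an Authorization: Bearer <api key> header.
```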
Embedding models are lightweight and fast, with sub-50 ms latency per batch, enabling real-time search and retrieval.