EMBEDDING API

Vector embeddings, one API call.

Generate high-quality vector embeddings for retrieval-augmented generation, semantic search, recommendation systems, and clustering. OpenAI-compatible embeddings endpoint with multiple model options. Drop into your existing RAG pipeline with a base URL change.

Endpoint: /v1/embeddings
Format: OpenAI-compatible
Use cases: RAG, search, clustering

QUICK START

Integrate in minutes.

from openai import OpenAI

# Point the standard OpenAI client at the RunCrate base URL.
client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

# Embed a single query; `input` also accepts a list of strings.
response = client.embeddings.create(
    model="BAAI/bge-large-en-v1.5",
    input=["What is retrieval-augmented generation?"],
)

# Each embedding is a list of floats (1024 dimensions for bge-large-en-v1.5).
embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")

AVAILABLE MODELS

Models you can use today.

Model                    Provider  Pricing    Notes
BAAI/bge-large-en-v1.5   BAAI      Per-token  1024 dims, strong retrieval
BAAI/bge-base-en-v1.5    BAAI      Per-token  768 dims, balanced speed/quality

WHY RUNCRATE

Built for production.

RAG-Optimized

BGE models are trained specifically for retrieval tasks. High recall on document retrieval benchmarks like MTEB and BEIR.
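
Below is a minimal retrieval sketch built on the quick-start client: it embeds a query and a few illustrative documents with BAAI/bge-large-en-v1.5, then ranks the documents by cosine similarity. The document strings and API key are placeholders, and numpy is assumed to be installed.

import numpy as np
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

MODEL = "BAAI/bge-large-en-v1.5"
documents = [
    "RAG augments a language model with retrieved context.",
    "BGE embeddings map text to 1024-dimensional vectors.",
    "Bananas are a good source of potassium.",
]
query = "How does retrieval-augmented generation work?"

# Embed the documents and the query.
doc_vectors = np.array([
    item.embedding
    for item in client.embeddings.create(model=MODEL, input=documents).data
])
query_vector = np.array(
    client.embeddings.create(model=MODEL, input=[query]).data[0].embedding
)

# Cosine similarity is the dot product of L2-normalized vectors.
doc_vectors /= np.linalg.norm(doc_vectors, axis=1, keepdims=True)
query_vector /= np.linalg.norm(query_vector)
scores = doc_vectors @ query_vector

# Print the highest-scoring document first.
for score, doc in sorted(zip(scores, documents), key=lambda t: t[0], reverse=True):
    print(f"{score:.3f}  {doc}")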

Batch Processing

Send multiple texts in a single request. Embed entire document collections efficiently with batch support.
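
As a sketch of batching, the input field accepts a list of strings and results come back in input order. The chunk texts and the batch size of 64 below are illustrative assumptions, not documented limits.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

# Placeholder chunks standing in for a pre-chunked document collection.
chunks = [f"Document chunk {i}" for i in range(256)]

batch_size = 64  # illustrative; pick a size that fits your payload limits
vectors = []
for start in range(0, len(chunks), batch_size):
    batch = chunks[start:start + batch_size]
    response = client.embeddings.create(model="BAAI/bge-base-en-v1.5", input=batch)
    # response.data preserves the order of the input list.
    vectors.extend(item.embedding for item in response.data)

print(f"Embedded {len(vectors)} chunks, {len(vectors[0])} dimensions each")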

OpenAI-Compatible

Standard /v1/embeddings endpoint. Works with LangChain, LlamaIndex, and any OpenAI-compatible vector pipeline.
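
For example, here is a minimal LangChain sketch, assuming the langchain-openai package is installed: point OpenAIEmbeddings at the RunCrate base URL and pass the model name. Setting check_embedding_ctx_length=False makes the client send raw strings rather than tiktoken token IDs, which is usually needed with non-OpenAI backends.

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="BAAI/bge-large-en-v1.5",
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
    # Send raw strings instead of tiktoken token IDs; typically required
    # when using OpenAIEmbeddings against a non-OpenAI backend.
    check_embedding_ctx_length=False,
)

doc_vectors = embeddings.embed_documents(["First chunk", "Second chunk"])
query_vector = embeddings.embed_query("What is in the first chunk?")
print(len(doc_vectors), len(query_vector))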

Low Latency

Embedding models are lightweight and fast. Sub-50ms latency per batch, enabling real-time search and retrieval.
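
To spot-check latency in your own environment, here is a simple timing sketch using the quick-start client; observed numbers depend on network round trip, batch size, and region, so treat any single measurement as indicative only.

import time
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

# Time a single small embedding request end to end.
start = time.perf_counter()
client.embeddings.create(
    model="BAAI/bge-base-en-v1.5",
    input=["realtime search query"],
)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"embedding request latency: {elapsed_ms:.1f} ms")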


Start building RAG pipelines.