Access Google’s Gemini models through the Runcrate API. Same models, OpenAI-compatible format, no Google Cloud project required.

Available Gemini models

| Model | Context | Strengths |
| --- | --- | --- |
| Gemini 2.5 Pro | 1M tokens | Strongest reasoning, long-context analysis |
| Gemini 2.5 Flash | 1M tokens | Fast inference, cost-effective |

Basic usage

from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

response = client.models.chat_completion(
    model="google/gemini-2.5-pro",
    messages=[
        {"role": "user", "content": "Explain how self-attention works in transformers. Include the math."},
    ],
    max_tokens=1024,
)

print(response.choices[0].message.content)

Long-context analysis (1M tokens)

Gemini’s 1M-token context window can hold an entire codebase or a book in a single request:

from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

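# Read the whole codebase dump; the context manager closes the file afterwards.
with open("full-codebase.txt", encoding="utf-8") as f:
    codebase = f.read()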

response = client.models.chat_completion(
    model="google/gemini-2.5-pro",
    messages=[
        {"role": "system", "content": "You are a senior engineer performing a code review."},
        {"role": "user", "content": f"Review this codebase for security and performance issues:\n\n{codebase}"},
    ],
    max_tokens=4096,
)

print(response.choices[0].message.content)
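
Before sending a very large file, it can help to check roughly whether it fits in the context window. The sketch below is only an estimate under stated assumptions: it uses a rule of thumb of about four characters per token, which is not Gemini's actual tokenizer, and the names `estimate_tokens` and `fits_in_context` as well as the safety margin are hypothetical helpers, not part of the Runcrate SDK.

```python
# Rough pre-flight size check for long-context requests.
# ASSUMPTION: ~4 characters per token is a rule of thumb, not Gemini's real tokenizer.
CONTEXT_LIMIT_TOKENS = 1_000_000  # 1M-token context listed for both Gemini models
CHARS_PER_TOKEN = 4               # hypothetical heuristic; real counts vary by content


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate: ceiling of len(text) / CHARS_PER_TOKEN."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, safety_margin_tokens: int = 4096) -> bool:
    """Check that the estimate plus a (hypothetical) safety margin stays within the limit."""
    return estimate_tokens(text) + safety_margin_tokens <= CONTEXT_LIMIT_TOKENS


if __name__ == "__main__":
    sample = "def add(a, b):\n    return a + b\n" * 1000
    print(estimate_tokens(sample), fits_in_context(sample))
```

If the check fails, splitting the input into several requests is one option; the estimate is deliberately coarse, so treat a borderline result as a reason to measure more carefully rather than as a guarantee.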

Vision — image analysis

from runcrate import Runcrate
import base64

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

with open("diagram.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.models.chat_completion(
    model="google/gemini-2.5-flash",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this architecture diagram. List all services and connections."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)

print(response.choices[0].message.content)
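
The example above hard-codes `image/png` in the data URL. For other image files, a small helper can derive the MIME type from the file extension. This is a minimal sketch using only the Python standard library; `image_to_data_url` is a hypothetical name, and which image formats the endpoint accepts beyond PNG is not stated here, so treat that as an assumption to verify.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode an image file as a base64 data URL, guessing the MIME type from the extension."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        # Fail early instead of sending a payload with a wrong or missing type.
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be used as the `url` value in the `image_url` content part shown in the request above.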

Runcrate vs. direct Google API

| | Direct Google API | Runcrate |
| --- | --- | --- |
| Auth | Google Cloud project + service account | Single API key |
| Format | Google-specific SDK | OpenAI-compatible |
| Other models | Gemini only | 140+ models, same key |

Pro vs. Flash

| Scenario | Model | Why |
| --- | --- | --- |
| Complex reasoning | Gemini 2.5 Pro | Stronger reasoning |
| Bulk processing | Gemini 2.5 Flash | Faster, cheaper |
| Real-time chat | Gemini 2.5 Flash | Lower latency |
| Vision / image analysis | Either | Both support multimodal |
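
The scenario guidance above could be encoded as a small lookup so that application code picks a model string in one place. This is an illustrative sketch: the scenario keys and the `pick_model` helper are hypothetical, and defaulting to Flash for unlisted scenarios (including vision, where either model works) is an assumption, not a Runcrate recommendation.

```python
# Map the scenarios from the comparison above to the model strings used on this page.
# The scenario keys are hypothetical labels chosen for this sketch.
MODEL_BY_SCENARIO = {
    "complex_reasoning": "google/gemini-2.5-pro",
    "bulk_processing": "google/gemini-2.5-flash",
    "real_time_chat": "google/gemini-2.5-flash",
}


def pick_model(scenario: str, default: str = "google/gemini-2.5-flash") -> str:
    """Return the model string for a scenario, falling back to the default."""
    return MODEL_BY_SCENARIO.get(scenario, default)


if __name__ == "__main__":
    print(pick_model("complex_reasoning"))
    print(pick_model("vision"))
```

The result can be passed directly as the `model` argument in the requests shown earlier.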

Tips

  • The 1M-token context window is large enough to take an entire repository or a book-length text in one request.
  • Gemini 2.5 Flash is the cost-effective choice for high-volume tasks.
  • The API format is the same across models: to switch from DeepSeek or Llama, change only the model string.
