Vision AI

Understand images
and video with AI.

Vision-language models that read, analyze, and reason about visual content. Qwen3-VL, Llama Vision, and Nemotron are available via the inference API -- plus bare-metal GPU instances for custom training.

VLMs
Vision-language models
235B
Up to 235B parameters
API
Inference API access
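As a sketch of what a first request might look like -- assuming an OpenAI-compatible chat-completions schema, which many inference APIs use; the endpoint URL and model ID below are placeholders, so check the API docs for the real values:

```python
import json

# Hypothetical endpoint -- substitute the real base URL from the API docs.
API_URL = "https://api.runcrate.example/v1/chat/completions"

def build_vision_request(model: str, image_url: str, question: str) -> dict:
    """Assemble an OpenAI-style chat payload pairing one image with one question."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_vision_request(
    "qwen3-vl-235b",  # assumed model ID -- see the models list below
    "https://example.com/chart.png",
    "What trend does this chart show?",
)
print(json.dumps(payload, indent=2))
# Then POST it, e.g.:
# requests.post(API_URL, json=payload, headers={"Authorization": f"Bearer {API_KEY}"})
```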

Capabilities

See and reason, not just detect.

Visual understanding

Ask questions about images and get detailed, reasoned answers. Describe scenes, identify objects, interpret charts, and understand spatial relationships.

Document analysis

Extract structured data from invoices, receipts, forms, and contracts. OCR with semantic understanding -- not just text extraction, but comprehension.
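In practice, structured extraction means prompting the model to answer in JSON and then parsing its reply. A minimal, model-agnostic helper for that last step (the reply text below is an illustrative example, not real API output):

```python
import json
import re

def parse_structured_reply(reply: str) -> dict:
    """Pull the first JSON object out of a free-text model reply."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in reply")
    return json.loads(match.group(0))

# Example reply a VLM might return when asked to extract invoice fields:
reply = 'Extracted fields: {"vendor": "Acme Corp", "total": 1249.0, "due_date": "2025-07-01"}'
fields = parse_structured_reply(reply)
print(fields["vendor"], fields["total"])  # Acme Corp 1249.0
```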

Video comprehension

Analyze video content frame-by-frame or holistically. Summarize meetings, extract key moments, and answer questions about video sequences.

OCR and text extraction

Read text from images, screenshots, handwritten notes, and scanned documents. Multilingual OCR with context-aware formatting preservation.

Multimodal reasoning

Combine visual and textual inputs for complex tasks. Code from screenshots, math from diagrams, data extraction from charts -- all via the same API.

Custom vision training

Need specialized detection or classification? Deploy bare-metal GPU instances for fine-tuning vision models on your own datasets with full root access.

Models

Vision-language models.
Ready via API.

Frontier VLMs available through the inference API. For custom training, use bare-metal instances.

Qwen3-VL-235B · Vision-language · Best-in-class visual reasoning
Llama 3.2 90B Vision · Vision-language · Complex visual QA
Llama 3.2 11B Vision · Vision-language · Fast, efficient vision tasks
Nemotron Nano 12B VL · Vision-language · Lightweight multimodal

How It Works

Three steps to vision AI.

01

Choose your approach

Use the inference API for instant access to Qwen3-VL, Llama Vision, and Nemotron. Or deploy a bare-metal instance for custom model training.

02

Send images or video

Pass images, screenshots, documents, or video frames alongside text prompts. The model sees and reasons about your visual content.
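Local files can be inlined as base64 data URLs, a format OpenAI-compatible APIs commonly accept in image fields -- assuming the inference API does the same; verify against the API docs:

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URL for inline submission."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

# In real use the bytes would come from open("photo.png", "rb").read()
url = to_data_url(b"\x89PNG\r\n\x1a\n")
print(url[:22])  # data:image/png;base64,
```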

03

Extract insights at scale

Process documents in bulk, analyze video feeds, or integrate visual understanding into your product. Pay per token via the inference API.
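Per-token billing makes cost estimates simple arithmetic. A sketch with hypothetical prices -- the rates below are illustrative placeholders, not published pricing:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost in dollars, given prices per million input/output tokens."""
    return (input_tokens / 1e6 * in_price_per_m
            + output_tokens / 1e6 * out_price_per_m)

# e.g. 1,500 image+prompt tokens in, 400 tokens out,
# at an assumed $0.90 / $3.60 per million tokens:
print(round(estimate_cost(1500, 400, 0.90, 3.60), 5))  # 0.00279
```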

Start seeing with Runcrate.

Analyze your first image in seconds with the inference API. No GPU setup, no commitments, no credit card required to explore.

Per-token pricing
No upfront commitments
Up to 235B params
Frontier vision models
Cancel anytime
No lock-in, no penalties