
# Analyze Long Documents with AI

> Process entire codebases, legal contracts, and research papers with 1M-context models. DeepSeek V4 Pro, Gemini 2.5 Flash, and Claude 4 Sonnet compared.

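Models with a 1M-token context window can accept an entire book, codebase, or document collection in a single request, provided the text fits within that window. There is no chunking step and no RAG pipeline to maintain, and nothing is dropped by a retrieval step before the model sees it: send the full text and ask questions directly.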
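Whether a document actually fits is worth checking before sending it. The sketch below is a rough pre-flight estimate, assuming about four characters per token for English prose. That ratio is a common rule of thumb, not something the API guarantees, and code or non-English text often tokenizes less efficiently. `estimate_tokens` and `fits_in_context` are hypothetical helper names, not part of any SDK.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; chars_per_token is an assumed heuristic, not a real tokenizer."""
    return int(len(text) / chars_per_token) + 1


def fits_in_context(
    text: str,
    context_window: int = 1_000_000,   # window size listed for the models on this page
    reserved_for_output: int = 4_096,  # assumption: the reply counts against the same window
) -> bool:
    """Return True if the estimated prompt size still leaves room for the reply."""
    return estimate_tokens(text) + reserved_for_output <= context_window


sample = "word " * 1000  # 5,000 characters of placeholder text
print(estimate_tokens(sample), fits_in_context(sample))
```

For a precise count, use the tokenizer of the specific model; this estimate is only meant to catch inputs that are clearly too large.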

## Models with 1M+ context

| Model                         | Context window | Strengths                                         |
| ----------------------------- | -------------- | ------------------------------------------------- |
| `deepseek-ai/DeepSeek-V4-Pro` | 1M tokens      | Strong reasoning, good at code and legal analysis |
| `google/gemini-2.5-flash`     | 1M tokens      | Fast, cost-effective for bulk processing          |
| `anthropic/claude-4-sonnet`   | 1M tokens      | Precise instruction following, nuanced writing    |

All three models are available through the same API — switch between them by changing the model string.
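As a minimal sketch of that switch, the helper below sends the same document and question to each model in turn and collects the replies. `ask_each_model` is a hypothetical name, and `client` is assumed to be an `OpenAI` client pointed at the Runcrate base URL, configured as in the examples later on this page.

```python
MODELS = [
    "deepseek-ai/DeepSeek-V4-Pro",
    "google/gemini-2.5-flash",
    "anthropic/claude-4-sonnet",
]


def ask_each_model(client, document: str, question: str, models=MODELS, max_tokens: int = 1024) -> dict[str, str]:
    """Send the same document and question to each model and collect the replies."""
    answers = {}
    for model in models:
        response = client.chat.completions.create(
            model=model,  # the only field that changes between requests
            messages=[
                {"role": "user", "content": f"{document}\n\n{question}"},
            ],
            max_tokens=max_tokens,
        )
        answers[model] = response.choices[0].message.content
    return answers
```

Each call sends the full document again, so comparing three models means three full-length requests.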

***

## Analyze an entire codebase

Load every source file with a matching extension into a single prompt, labeling each file with its relative path:

```python theme={"theme":"github-dark"}
from openai import OpenAI
from pathlib import Path

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

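def load_codebase(root: str, extensions: tuple[str, ...] = (".py", ".ts", ".tsx")) -> str:
    """Concatenate all matching source files under root into a single string."""
    skip_dirs = {"node_modules", ".git"}
    parts = []
    for ext in extensions:
        for path in sorted(Path(root).rglob(f"*{ext}")):
            relative = path.relative_to(root)
            # Compare whole path components so ".github" or ".gitignore" are not skipped by accident.
            if not path.is_file() or skip_dirs.intersection(relative.parts):
                continue
            content = path.read_text(encoding="utf-8", errors="ignore")
            # Label each file with its relative path so the model can cite it.
            parts.append(f"--- {relative} ---\n{content}")
    return "\n\n".join(parts)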

codebase = load_codebase("./my-project")
print(f"Loaded {len(codebase):,} characters")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro",
    messages=[
        {"role": "system", "content": "You are a senior software architect reviewing a codebase."},
        {"role": "user", "content": f"Here is the full codebase:\n\n{codebase}\n\nIdentify the top 5 architectural issues, security vulnerabilities, and performance bottlenecks. For each, cite the exact file and line range."},
    ],
    max_tokens=4096,
)
print(response.choices[0].message.content)
```
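The prompt above asks for line ranges, but the concatenated text carries no line numbers, so the model has to count lines itself. One option, sketched here as an assumption rather than part of the original example, is to prefix every line with its number before building the prompt. `number_lines` is a hypothetical helper.

```python
def number_lines(content: str) -> str:
    """Prefix each line with its 1-based line number, e.g. '1 | import os'."""
    lines = content.splitlines()
    width = len(str(len(lines)))  # pad so the numbers stay right-aligned
    return "\n".join(f"{i:>{width}} | {line}" for i, line in enumerate(lines, start=1))


print(number_lines("import os\nprint(os.getcwd())"))
```

In `load_codebase`, appending `number_lines(content)` instead of `content` would give each file its own numbering, at the cost of a few extra tokens per line.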

***

## Legal contract review

```python theme={"theme":"github-dark"}
from openai import OpenAI
from pathlib import Path

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

contract = Path("master-services-agreement.txt").read_text(encoding="utf-8")

response = client.chat.completions.create(
    model="anthropic/claude-4-sonnet",
    messages=[
        {"role": "system", "content": "You are a corporate attorney. Analyze contracts precisely, citing specific sections."},
        {"role": "user", "content": f"Full contract:\n\n{contract}\n\nCreate a risk summary: list every clause that creates financial liability, termination risk, or IP assignment. For each, provide the section number, a one-sentence summary, and a risk rating (low/medium/high)."},
    ],
    max_tokens=4096,
)
print(response.choices[0].message.content)
```
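If the risk summary needs to feed another tool, one approach is to ask for JSON and parse the reply. The sketch below assumes the user prompt was extended to request a JSON array of objects with `section`, `summary`, and `risk` keys. That output format and the helper name `parse_risk_summary` are assumptions for illustration, and models do not always return valid JSON.

```python
import json

RISK_ORDER = {"high": 0, "medium": 1, "low": 2}


def parse_risk_summary(reply: str) -> list[dict]:
    """Extract a JSON array from a model reply and sort it with the highest risk first."""
    # Models sometimes wrap JSON in prose or code fences, so keep only the outermost [...] span.
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in reply")
    items = json.loads(reply[start : end + 1])
    # Unknown or missing ratings sort last instead of raising.
    return sorted(
        items,
        key=lambda item: RISK_ORDER.get(str(item.get("risk", "")).lower(), len(RISK_ORDER)),
    )


# Made-up reply for illustration only; not real model output.
reply = 'Summary: [{"section": "9.1", "summary": "Short notice period.", "risk": "low"}, {"section": "4.2", "summary": "Uncapped indemnity.", "risk": "High"}]'
for item in parse_risk_summary(reply):
    print(item["section"], item["risk"])
```

`json.loads` raises `json.JSONDecodeError` on malformed output, so callers may want to retry the request or fall back to the plain-text summary.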

***

## Research paper synthesis

The same pattern works for research: load multiple papers into a single prompt and ask `google/gemini-2.5-flash` to synthesize findings, map agreements and contradictions, and identify gaps. Gemini 2.5 Flash is the fast, cost-effective option of the three, which suits bulk processing of very long inputs.
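A minimal sketch of that pattern follows. It assumes the papers are already available as plain text; the `=== Paper: ... ===` delimiter, the prompt wording, and the helper names `build_synthesis_messages` and `load_papers` are illustrative choices, not a prescribed format.

```python
from pathlib import Path


def build_synthesis_messages(papers: dict[str, str]) -> list[dict[str, str]]:
    """Combine several papers into one chat request, labeling each so the model can cite it."""
    sections = [f"=== Paper: {title} ===\n{text}" for title, text in papers.items()]
    corpus = "\n\n".join(sections)
    return [
        {"role": "system", "content": "You are a research analyst. Cite papers by title."},
        {
            "role": "user",
            "content": (
                f"Here are {len(papers)} papers:\n\n{corpus}\n\n"
                "Synthesize the main findings, list where the papers agree and where they "
                "contradict each other, and identify open gaps."
            ),
        },
    ]


def load_papers(folder: str) -> dict[str, str]:
    """Read every .txt file in a folder, keyed by file name (hypothetical layout)."""
    return {
        path.stem: path.read_text(encoding="utf-8", errors="ignore")
        for path in sorted(Path(folder).glob("*.txt"))
    }
```

Pass the result as `messages` to `client.chat.completions.create(model="google/gemini-2.5-flash", ...)`, using the same client setup as the earlier examples.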

***

## Next steps

* [Use the DeepSeek V4 API](/examples/deepseek-v4-api) — detailed guide for DeepSeek V4 Pro with streaming and advanced usage.
* [Extract structured data](/examples/structured-output) — combine long-context analysis with schema-based extraction.
* [Model catalog](/models/model-catalog) — compare all models by context window, price, and speed.