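Run DeepSeek V3 (685B MoE) or DeepSeek R1 on your own GPUs. DeepSeek models use mixture-of-experts: only ~37B parameters are active per token, so per-token compute is much lower than the total parameter count suggests. All expert weights still have to be loaded into GPU memory, so VRAM needs follow the total parameter count and the weight precision, not the active count.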
GPU requirements
| Model | GPU | VRAM needed | Approx. cost |
|---|---|---|---|
| DeepSeek R1 Distill 8B | RTX 4090 (24 GB) | ~16 GB | ~$0.35/hr |
| DeepSeek R1 Distill 70B | A100 80 GB | ~70 GB | ~$1.60/hr |
| DeepSeek V3 / R1 full (FP8) | 4x H100 80 GB | ~50 GB each | ~$10.00/hr |
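As a rough cross-check of the VRAM column, weight memory can be approximated as parameter count times bytes per parameter. The sketch below is a back-of-the-envelope estimate only: the function names and the 10% reserve are assumptions made for illustration, and it ignores KV cache, activations, and framework overhead. The table does not state which precision each row assumes; the arithmetic lands on ~16 GB for an 8B model at 16-bit and ~70 GB for a 70B model at 8-bit.

```python
# Bytes stored per parameter for common weight formats.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, dtype: str) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[dtype]


def fits_on_gpu(weights_gb: float, vram_gb: float, reserve_fraction: float = 0.10) -> bool:
    """True if the weights fit after holding back a share of VRAM.

    reserve_fraction is a hypothetical safety margin, not a measured value.
    """
    return weights_gb <= vram_gb * (1.0 - reserve_fraction)


print(weight_memory_gb(8, "bf16"))   # 16.0
print(weight_memory_gb(70, "fp8"))   # 70.0
print(fits_on_gpu(weight_memory_gb(8, "bf16"), vram_gb=24))  # True
```

Whatever VRAM is left after the weights is what the server can use for KV cache, so a model that barely fits will serve only short contexts or few concurrent requests.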
Deploy DeepSeek R1 Distill 8B (RTX 4090)
runcrate instances create --name deepseek-8b --gpu RTX4090
runcrate instances status deepseek-8b
runcrate ssh deepseek-8b -- "pip install vllm"
runcrate ssh deepseek-8b -- "nohup python -m vllm.entrypoints.openai.api_server \
--model deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
--max-model-len 8192 \
--port 8000 --host 0.0.0.0 \
> /root/vllm.log 2>&1 &"
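Because the server is started in the background, the command returns before the model has finished downloading and loading. One way to wait for it is to poll the server's `/health` route until it answers with HTTP 200. This is a minimal sketch: the helper names, retry count, and delay are illustrative assumptions, and `base_url` stands for `http://<INSTANCE_IP>:8000`.

```python
import time
import urllib.error
import urllib.request


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(base_url, attempts=60, delay=10.0,
                     probe=is_healthy, sleep=time.sleep) -> bool:
    """Poll until the probe succeeds or the attempts run out.

    probe and sleep are injectable so the loop can be tested offline.
    """
    for _ in range(attempts):
        if probe(base_url):
            return True
        sleep(delay)
    return False
```

Calling `wait_until_ready("http://<INSTANCE_IP>:8000")` blocks for up to about ten minutes with these defaults. If it returns `False`, `/root/vllm.log` on the instance is the place to look for the reason.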
Deploy DeepSeek R1 Distill 70B (A100)
runcrate instances create --name deepseek-70b --gpu A100
runcrate ssh deepseek-70b -- "pip install vllm"
runcrate ssh deepseek-70b -- "nohup python -m vllm.entrypoints.openai.api_server \
--model deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
--max-model-len 8192 \
--port 8000 --host 0.0.0.0 \
> /root/vllm.log 2>&1 &"
Deploy DeepSeek V3 full (4x H100)
runcrate instances create --name deepseek-v3 --gpu H100 --gpu-count 4
runcrate ssh deepseek-v3 -- "pip install vllm"
runcrate ssh deepseek-v3 -- "nohup python -m vllm.entrypoints.openai.api_server \
--model deepseek-ai/DeepSeek-V3 \
--tensor-parallel-size 4 \
--max-model-len 16384 \
--trust-remote-code \
--port 8000 --host 0.0.0.0 \
> /root/vllm.log 2>&1 &"
Test the endpoint
runcrate instances info deepseek-8b
curl http://<INSTANCE_IP>:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
"messages": [{"role": "user", "content": "Explain mixture-of-experts in two sentences."}],
"max_tokens": 256
}'
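The same request can be sent from Python using only the standard library. The sketch below mirrors the curl call above. The helper names are hypothetical, and it assumes the response follows the OpenAI chat completions shape, with the reply text under `choices[0].message.content`.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str,
                       max_tokens: int = 256) -> urllib.request.Request:
    """Build the POST request for /v1/chat/completions without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: bytes) -> str:
    """Pull the assistant text out of a chat completions response body."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]


def chat(base_url: str, model: str, prompt: str,
         max_tokens: int = 256, timeout: float = 120.0) -> str:
    """Send one user message and return the model's reply."""
    request = build_chat_request(base_url, model, prompt, max_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_reply(resp.read())
```

For example, `chat("http://<INSTANCE_IP>:8000", "deepseek-ai/DeepSeek-R1-Distill-Llama-8B", "Explain mixture-of-experts in two sentences.")` sends the same payload as the curl command.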
Monitoring
runcrate ssh deepseek-8b -- nvidia-smi
runcrate ssh deepseek-8b -- "tail -50 /root/vllm.log"
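For scripted monitoring, `nvidia-smi` can emit machine-readable output with `--query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits`, which prints one line per GPU with values in MiB. The parser below is a small sketch for that output. The function name is hypothetical, and it assumes the query is run on the instance in the same way as the plain `nvidia-smi` call above.

```python
def parse_gpu_memory(csv_text: str) -> list:
    """Parse 'index, used, total' lines (MiB) into one dict per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        used_mib, total_mib = int(used), int(total)
        rows.append({
            "index": int(index),
            "used_mib": used_mib,
            "total_mib": total_mib,
            "used_fraction": used_mib / total_mib,
        })
    return rows
```

A used fraction that stays near the top of the range while the server is idle is expected with vLLM, since it reserves most of the GPU memory for weights and KV cache at startup.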
Cleanup
runcrate instances delete deepseek-8b
runcrate instances delete deepseek-70b
runcrate instances delete deepseek-v3