AI Content Moderation API

Use Llama Guard 4 to classify user-generated content for safety before it reaches your product. The model returns structured safety labels — no manual prompt engineering, no regex rules, no third-party moderation service.

Model

Model	Parameters	Output
`meta-llama/Llama-Guard-4-12B`	12B	`safe` or `unsafe` + category labels

Llama Guard 4 evaluates text against a predefined taxonomy of unsafe content categories: violence, sexual content, hate speech, self-harm, criminal activity, and more. It returns a structured verdict you can use for automated decisions.
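Violations are reported as short codes (S1, S2, and so on) rather than names. The lookup below is a minimal sketch for turning those codes into readable labels. The names follow the hazard list in the Llama Guard model card and are assumed to apply unchanged to the hosted model, so verify them against the version you call. `CATEGORY_NAMES` and `describe` are illustrative names, not part of any API.

```python
# Hazard codes as listed in the Llama Guard model card.
# Assumption: the hosted model uses the same codes; verify before relying on them.
CATEGORY_NAMES = {
    "S1": "Violent Crimes",
    "S2": "Non-Violent Crimes",
    "S3": "Sex-Related Crimes",
    "S4": "Child Sexual Exploitation",
    "S5": "Defamation",
    "S6": "Specialized Advice",
    "S7": "Privacy",
    "S8": "Intellectual Property",
    "S9": "Indiscriminate Weapons",
    "S10": "Hate",
    "S11": "Suicide & Self-Harm",
    "S12": "Sexual Content",
    "S13": "Elections",
    "S14": "Code Interpreter Abuse",
}


def describe(code: str) -> str:
    """Return a readable name for a category code, with a fallback for unknown codes."""
    return CATEGORY_NAMES.get(code.strip().upper(), f"Unknown category ({code})")


print(describe("S1"))
print(describe("S99"))
```

Keeping the fallback means an unrecognized code is still surfaced in logs instead of raising an error in the request path.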

Classify a single message

curl https://api.runcrate.ai/v1/chat/completions \
  -H "Authorization: Bearer rc_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-Guard-4-12B",
    "messages": [
      {"role": "user", "content": "How do I make a birthday cake for my daughter?"}
    ],
    "max_tokens": 64
  }'

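The model responds with `safe`, or with `unsafe` followed on the next line by the codes of the violated categories, separated by commas when there is more than one (for example, `unsafe\nS1` for content involving violent crimes).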
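If you call the endpoint without an SDK, you need to pull the verdict out of the JSON body yourself. The sketch below assumes the response follows the usual OpenAI-compatible chat-completions shape (`choices[0].message.content`). The sample body is hand-written for illustration, not captured API output, and `parse_guard_response` is a hypothetical helper name.

```python
import json


def parse_guard_response(body: str) -> tuple[bool, list[str]]:
    """Parse a chat-completions JSON body into (is_safe, category_codes)."""
    data = json.loads(body)
    text = (data["choices"][0]["message"]["content"] or "").strip()
    lines = text.splitlines()
    # Only an explicit "safe" on the first line counts as safe.
    is_safe = bool(lines) and lines[0].strip().lower() == "safe"
    codes: list[str] = []
    if not is_safe:
        # Codes may arrive comma-separated on one line, e.g. "S1,S10".
        for line in lines[1:]:
            codes.extend(c.strip() for c in line.split(",") if c.strip())
    return is_safe, codes


# Hand-written sample body (assumed shape), not real API output.
sample = json.dumps(
    {"choices": [{"message": {"role": "assistant", "content": "unsafe\nS1,S10"}}]}
)
print(parse_guard_response(sample))
```

An empty or unexpected reply is treated as unsafe here, so a malformed response never lets content through by accident.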

Moderation middleware

Build a reusable moderation function for your app:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

def moderate(text: str) -> dict:
    """Check if text is safe. Returns {"safe": bool, "categories": list}."""
    response = client.chat.completions.create(
        model="meta-llama/Llama-Guard-4-12B",
        messages=[{"role": "user", "content": text}],
        max_tokens=64,
    )
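    # content can be None on an empty reply; treat that as an empty string
    result = (response.choices[0].message.content or "").strip()
    lines = result.splitlines()
    # Only an explicit "safe" verdict passes; anything else is treated as unsafe
    is_safe = bool(lines) and lines[0].strip().lower() == "safe"
    # Category codes may arrive comma-separated on one line, e.g. "S1,S10"
    categories = [] if is_safe else [
        code.strip() for line in lines[1:] for code in line.split(",") if code.strip()
    ]
    return {"safe": is_safe, "categories": categories}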

# Use it as middleware before processing user input
user_message = "Tell me how to bake sourdough bread."
check = moderate(user_message)

if check["safe"]:
    # Proceed with normal processing
    print("Content is safe — processing request.")
else:
    print(f"Content blocked. Categories: {check['categories']}")
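In production the moderation call itself can fail (timeouts, rate limits, network errors), and the middleware has to decide what happens then. The sketch below shows one possible approach, failing closed and mapping the result to an action. The names `safe_moderate` and `decide`, the `error` key, and the review-only category set are illustrative assumptions, not part of the API. The moderation function is passed in as a parameter so the logic can be exercised with stubs.

```python
from typing import Callable


def safe_moderate(text: str, moderate_fn: Callable[[str], dict]) -> dict:
    """Call moderate_fn and fail closed (safe=False) if it raises."""
    try:
        result = moderate_fn(text)
    except Exception as exc:  # timeouts, network errors, malformed responses
        return {"safe": False, "categories": [], "error": str(exc)}
    return {**result, "error": None}


def decide(check: dict, review_only: frozenset = frozenset({"S5", "S6"})) -> str:
    """Map a moderation result to 'allow', 'review', or 'block'.

    Hypothetical policy: errors and review-only categories go to a human
    queue; every other unsafe verdict is blocked.
    """
    if check.get("error"):
        return "review"
    if check["safe"]:
        return "allow"
    categories = set(check["categories"])
    if categories and categories <= review_only:
        return "review"
    return "block"


# Stubs standing in for the real moderate() call.
def failing(_text: str) -> dict:
    raise TimeoutError("moderation backend unavailable")


print(decide(safe_moderate("hello", failing)))
print(decide(safe_moderate("hello", lambda t: {"safe": True, "categories": []})))
print(decide(safe_moderate("hello", lambda t: {"safe": False, "categories": ["S1"]})))
```

To use it with the function defined above, pass `moderate` as `moderate_fn`. Whether a failed check should queue content for review or reject it outright is a product decision; the point is that content is never passed through unchecked.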

Next steps

Route between AI models — run moderation with Llama Guard, then route safe content to a generation model.
Build an AI SaaS backend — full production backend with moderation, chat, and structured output.
Model catalog — browse all available models.
