This model is quantized to W4A16 (4-bit weights, 16-bit activations, group size 128) using llmcompressor. Serve it with vLLM:
vllm serve leon-se/gemma-3-27b-it-qat-W4A16-G128 --max-model-len 4096 --max-num-seqs 1
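To make the W4A16-G128 scheme concrete, here is an illustrative pure-Python sketch of the underlying math: weights are split into groups of 128 values, each group gets one floating-point scale, and the values are rounded to signed 4-bit integers while activations stay in 16-bit. This mirrors the arithmetic of the scheme, not llmcompressor's actual implementation.

```python
def quantize_group(values, n_bits=4):
    """Symmetric quantization of one group: returns (int codes, scale)."""
    qmax = 2 ** (n_bits - 1) - 1              # 7 for 4-bit
    scale = max(abs(v) for v in values) / qmax or 1.0
    # Round to the nearest integer step and clamp to the 4-bit range [-8, 7].
    return [max(-qmax - 1, min(qmax, round(v / scale))) for v in values], scale

def quantize_w4_g128(weights, group_size=128):
    """Quantize a flat weight list in groups of `group_size` (the G128 part)."""
    return [quantize_group(weights[i:i + group_size])
            for i in range(0, len(weights), group_size)]

def dequantize(packed):
    """Recover approximate FP weights: integer code times its group's scale."""
    return [q * scale for codes, scale in packed for q in codes]
```

Storing one scale per 128 weights is the trade-off the group size controls: smaller groups track local weight magnitudes more closely (lower error) at the cost of more scale overhead per parameter.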