leon-se/gemma-3-27b-it-qat-W4A16-G128

Base model: google/gemma-3-27b-it-qat-q4_0-unquantized (quantized)
Tags: image-text-to-text, conversational, safetensors, gemma3
Runnable with vLLM

W4A16 quantization (4-bit weights, 16-bit activations, group size 128) of google/gemma-3-27b-it-qat-q4_0-unquantized, produced with llmcompressor. Serve it with vLLM:

```shell
vllm serve leon-se/gemma-3-27b-it-qat-W4A16-G128 --max-model-len 4096 --max-num-seqs 1
```
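Once the server is running, you can query it through vLLM's OpenAI-compatible HTTP API. A minimal sketch, assuming the default address `http://localhost:8000` (adjust the base URL if you changed it):

```python
# Build a chat-completions request for the served quantized model.
# Assumes the default vllm serve address http://localhost:8000.
import json
import urllib.request

MODEL = "leon-se/gemma-3-27b-it-qat-W4A16-G128"

def build_request(prompt, base_url="http://localhost:8000/v1"):
    """Build an OpenAI-style chat-completions request for the served model."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        # --max-model-len 4096 caps prompt + completion; keep outputs modest.
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_request("Explain W4A16 quantization in one sentence.")
# With the server up, send it and print the reply:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

The same endpoint also works with the official `openai` Python client by pointing its `base_url` at the server.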