hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4


Load model

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Path to the original FP16 checkpoint to be quantized
model_path = "meta-llama/Meta-Llama-3.1-8B-Instruct"

model = AutoAWQForCausalLM.from_pretrained(
    model_path,
    low_cpu_mem_usage=True,
    use_cache=False,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
```

Quantize

```python
model.quantize(tokenizer, quant_config=quant_config)
```
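The `quant_config` passed to `model.quantize` follows the AutoAWQ convention. The exact values below are an assumption for a typical INT4 checkpoint (they are not read from this repository), and the memory estimate is a rough back-of-envelope sketch:

```python
# AWQ quantization settings (assumed typical INT4 values, autoawq field names)
quant_config = {
    "zero_point": True,    # asymmetric quantization with zero points
    "q_group_size": 128,   # weights share one scale per group of 128
    "w_bit": 4,            # 4-bit weight precision (INT4)
    "version": "GEMM",     # use the AWQ GEMM kernels
}

# Rough weight-memory estimate for an 8B-parameter model
# (ignores scales, zero points, and activations)
params = 8e9
fp16_gb = params * 2 / 2**30    # 2 bytes per weight at FP16
int4_gb = params * 0.5 / 2**30  # 0.5 bytes per weight at INT4
print(f"FP16 ≈ {fp16_gb:.1f} GiB, INT4 ≈ {int4_gb:.1f} GiB")
```

This is the main motivation for the INT4 variant: roughly a 4× reduction in weight memory, which lets the 8B model fit comfortably on a single consumer GPU.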

Save quantized model

```python
# Directory to write the quantized checkpoint to
quant_path = "Meta-Llama-3.1-8B-Instruct-AWQ-INT4"

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

```python
print(f'Model is quantized and saved at "{quant_path}"')
```
