Original model: https://huggingface.co/5CD-AI/Vintern-1B-v3_5
Install llama.cpp
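If llama.cpp is not installed yet, one common route is Homebrew on macOS or Linux; building from source with CMake also works. This is only a sketch of two options, not an exhaustive list; see the llama.cpp documentation for your platform.

```sh
# Option 1: prebuilt formula (macOS/Linux with Homebrew)
brew install llama.cpp

# Option 2: build from source (assumes git and CMake are available)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release
```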
Then start the server:
```sh
llama-server -hf ngxson/Vintern-1B-v3_5-GGUF --chat-template vicuna
```
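Once running, llama-server exposes an OpenAI-compatible HTTP API. The sketch below assumes the server's default address of http://127.0.0.1:8080; adjust the host and port if you started it with different options.

```sh
# Send a chat completion request to the local llama-server instance
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello! Describe what you can do."}
    ]
  }'
```

The response is returned as a standard chat-completion JSON object.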