# jayn7/Z-Image-Turbo-GGUF

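The snippets below assume a loaded pipeline plus `prompt`, `height`, `width`, and `seed` variables. A minimal setup sketch, following the upstream Z-Image Turbo card's diffusers usage (the `ZImagePipeline` class name and all placeholder values are assumptions; verify them against your diffusers version and the files in this repo):

```python
import torch
from diffusers import ZImagePipeline  # assumed class name, per the upstream model card

# Load the base pipeline; see diffusers' GGUF docs for swapping in a quantized
# transformer from this repo's .gguf files.
pipeline = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16
).to("cuda")

prompt = "A cinematic photo of a red fox in fresh snow"  # placeholder prompt
height, width = 1024, 1024                               # placeholder resolution
seed = 42                                                # placeholder seed
```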

#### [Optional] Attention Backend

Diffusers uses SDPA by default. If your hardware supports it, switch to a custom attention backend for better efficiency:

```python
# pipeline.transformer.set_attention_backend("_sage_qk_int8_pv_fp16_triton")  # Enable Sage Attention
# pipeline.transformer.set_attention_backend("flash")                         # Enable FlashAttention-2
# pipeline.transformer.set_attention_backend("_flash_3")                      # Enable FlashAttention-3
```
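If you are unsure which kernels are installed, a best-effort loop can try the faster backends and silently keep SDPA when none load. This is a sketch; catching a broad `Exception` here is an assumption about how diffusers reports a missing kernel:

```python
# Try the fastest backends first; keep the default SDPA if none are available.
for backend in ("_flash_3", "flash", "_sage_qk_int8_pv_fp16_triton"):
    try:
        pipeline.transformer.set_attention_backend(backend)
        print(f"Attention backend set to {backend}")
        break
    except Exception:
        continue  # kernel missing or unsupported on this GPU; try the next one
```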

#### [Optional] Model Compilation

Compiling the DiT model speeds up inference, but the first run takes longer while compilation happens.

```python
# pipeline.transformer.compile()
```
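Because the compile cost is paid on the first forward pass, a common pattern is one throwaway generation before timing or serving real requests (the warm-up arguments below are placeholders; matching your production resolution helps avoid shape-triggered recompiles):

```python
pipeline.transformer.compile()

# Throwaway warm-up run: triggers compilation so later calls run at full speed.
_ = pipeline(
    prompt="warm-up",
    num_inference_steps=9,
    guidance_scale=0.0,
    height=1024,  # placeholder; use the resolution you actually plan to serve
    width=1024,
).images[0]
```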

#### [Optional] CPU Offloading

Enable CPU offloading for memory-constrained devices:

```python
# pipeline.enable_model_cpu_offload()
```
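Diffusers also offers sequential offloading, which is slower but frees even more VRAM. A sketch of choosing between the two based on GPU memory (the 16 GiB threshold is an arbitrary assumption):

```python
import torch

vram_bytes = torch.cuda.get_device_properties(0).total_memory
if vram_bytes >= 16 * 1024**3:
    pipeline.enable_model_cpu_offload()       # moves whole sub-models on demand
else:
    pipeline.enable_sequential_cpu_offload()  # layer-by-layer; slowest, least VRAM
```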

```python
image = pipeline(
    prompt=prompt,
    num_inference_steps=9,  # This actually results in 8 DiT forward passes
    guidance_scale=0.0,     # Guidance should be 0 for the Turbo models
    height=height,
    width=width,
    generator=torch.Generator("cuda").manual_seed(seed),
).images[0]

image.save("zimage.png")
```
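To produce several variations of one prompt, sweep the seed and save each result under a distinct (hypothetical) filename:

```python
# Generate a few variations by changing only the seed.
for s in (0, 1, 2):
    img = pipeline(
        prompt=prompt,
        num_inference_steps=9,
        guidance_scale=0.0,
        height=height,
        width=width,
        generator=torch.Generator("cuda").manual_seed(s),
    ).images[0]
    img.save(f"zimage_seed{s}.png")  # hypothetical output names
```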




### Credits

- **Original Model**: [Z-Image Turbo by Tongyi-MAI](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo)
- **Quantization Tools & Guide**: [llama.cpp](https://github.com/ggml-org/llama.cpp) & [city96](https://github.com/city96/ComfyUI-GGUF/blob/main/tools/README.md)

### License
This repository is distributed under the same license as the original [Z-Image Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) model (Apache-2.0).