ollama/x/imagegen/models
Jeffrey Morgan 9667c2282f
x/imagegen: add naive TeaCache and FP8 quantization support (#13683)
TeaCache:
- Timestep embedding similarity caching for diffusion models
- Polynomial rescaling with configurable thresholds
- Reduces transformer forward passes by ~30-50%
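The caching idea above can be sketched roughly as follows. This is an illustrative reconstruction, not Ollama's implementation: the type, field names, and polynomial coefficients are assumptions; the core idea is to accumulate a polynomial-rescaled relative distance between successive timestep embeddings and reuse the cached transformer residual while that accumulator stays under a threshold.

```go
package main

import (
	"fmt"
	"math"
)

// teaCache decides per diffusion step whether the transformer forward
// pass can be skipped and a cached output residual reused instead.
// All names and coefficients here are illustrative assumptions.
type teaCache struct {
	threshold float64   // skip while accumulated distance stays below this
	coeffs    []float64 // rescaling polynomial, highest-degree coefficient first
	accum     float64   // accumulated rescaled distance since last full pass
	prevEmb   []float64 // timestep embedding from the previous step
	residual  []float64 // cached transformer output residual
}

// relL1 is the relative L1 distance between two embeddings.
func relL1(a, b []float64) float64 {
	var num, den float64
	for i := range a {
		num += math.Abs(a[i] - b[i])
		den += math.Abs(b[i])
	}
	if den == 0 {
		return 0
	}
	return num / den
}

// poly evaluates the rescaling polynomial at x via Horner's method.
func (c *teaCache) poly(x float64) float64 {
	y := 0.0
	for _, k := range c.coeffs {
		y = y*x + k
	}
	return y
}

// shouldSkip reports whether the cached residual may be reused for the
// step whose timestep embedding is emb.
func (c *teaCache) shouldSkip(emb []float64) bool {
	if c.prevEmb == nil || c.residual == nil {
		c.prevEmb = append([]float64(nil), emb...)
		return false // first step always runs the transformer
	}
	c.accum += c.poly(relL1(emb, c.prevEmb))
	c.prevEmb = append(c.prevEmb[:0], emb...)
	if c.accum < c.threshold {
		return true // embedding barely moved: reuse cached residual
	}
	c.accum = 0 // drifted too far: run a full forward pass and reset
	return false
}

func main() {
	c := &teaCache{threshold: 0.15, coeffs: []float64{1, 0}} // identity rescale
	fmt.Println(c.shouldSkip([]float64{1, 1})) // first step: must run
	c.residual = []float64{0.5}                // pretend we cached a residual
	fmt.Println(c.shouldSkip([]float64{1.0, 1.05})) // tiny drift: skip
	fmt.Println(c.shouldSkip([]float64{2, 2}))      // big drift: run
}
```

In practice the polynomial is fitted per model so the embedding distance better predicts the change in transformer output, and the threshold trades speed against fidelity.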

FP8 quantization:
- Support for FP8 quantized models (8-bit weights with scales)
- QuantizedMatmul on Metal, Dequantize on CUDA
- Client-side quantization via ollama create --quantize fp8

Other fixes and improvements:
- Fix `/api/show` for image generation models: the server now returns model info (architecture, parameters, quantization)
- Memory allocation optimizations
- CLI improvements for image generation
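The `/api/show` fix can be exercised against a running server. This is a hedged usage sketch: the endpoint and request shape follow Ollama's standard API, but the model name is a placeholder for whatever image generation model you have pulled locally.

```shell
# Ask a local Ollama server for model metadata; substitute a model
# you actually have. The response should now include architecture,
# parameter count, and quantization level for imagegen models too.
curl -s http://localhost:11434/api/show -d '{"model": "qwen-image"}'
```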
2026-01-12 13:45:22 -08:00
gemma3 Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
gpt_oss Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
llama Add experimental MLX backend and engine with imagegen support (#13648) 2026-01-08 16:18:59 -08:00
qwen_image x/imagegen: add naive TeaCache and FP8 quantization support (#13683) 2026-01-12 13:45:22 -08:00
qwen_image_edit x/imagegen: add naive TeaCache and FP8 quantization support (#13683) 2026-01-12 13:45:22 -08:00
zimage x/imagegen: add naive TeaCache and FP8 quantization support (#13683) 2026-01-12 13:45:22 -08:00