Mirror of https://github.com/ollama/ollama, synced 2026-04-24 09:14:37 +00:00
TeaCache:
- Timestep embedding similarity caching for diffusion models
- Polynomial rescaling with configurable thresholds
- Reduces transformer forward passes by ~30-50%

FP8 quantization:
- Support for FP8-quantized models (8-bit weights with scales)
- QuantizedMatmul on Metal, Dequantize on CUDA
- Client-side quantization via `ollama create --quantize fp8`

Other bug fixes:
- Fix the `/api/show` API for image generation models: the server now returns model info (architecture, parameters, quantization)
- Memory allocation optimizations
- CLI improvements for image generation
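The TeaCache idea above — cache the previous step's timestep embedding, accumulate a rescaled distance between successive embeddings, and skip the transformer forward pass while the accumulated drift stays under a threshold — can be sketched in Go. All names, the polynomial coefficients, and the threshold here are illustrative assumptions, not ollama's actual implementation.

```go
package main

import (
	"fmt"
	"math"
)

// teaCache decides, per diffusion step, whether the transformer forward
// pass can be skipped by reusing the previous step's cached output.
type teaCache struct {
	threshold float64   // skip while accumulated distance stays below this
	accum     float64   // accumulated rescaled distance since last real pass
	prevEmb   []float64 // timestep embedding from the previous step
	poly      []float64 // rescaling polynomial coefficients, lowest degree first
}

// relL1 is the relative L1 distance between two embeddings.
func relL1(a, b []float64) float64 {
	var num, den float64
	for i := range a {
		num += math.Abs(a[i] - b[i])
		den += math.Abs(b[i])
	}
	if den == 0 {
		return 0
	}
	return num / den
}

// polyEval evaluates the rescaling polynomial at x.
func polyEval(coeffs []float64, x float64) float64 {
	y, p := 0.0, 1.0
	for _, c := range coeffs {
		y += c * p
		p *= x
	}
	return y
}

// shouldSkip accumulates the rescaled embedding distance and reports
// whether the cached transformer output may be reused this step.
func (t *teaCache) shouldSkip(emb []float64) bool {
	if t.prevEmb == nil {
		t.prevEmb = append([]float64(nil), emb...)
		return false // first step always runs the transformer
	}
	t.accum += polyEval(t.poly, relL1(emb, t.prevEmb))
	t.prevEmb = append([]float64(nil), emb...)
	if t.accum < t.threshold {
		return true // embeddings barely drifted: reuse the cache
	}
	t.accum = 0 // drift too large: run a real forward pass and reset
	return false
}

func main() {
	// Identity rescale (poly = x) and an illustrative threshold of 0.1.
	tc := &teaCache{threshold: 0.1, poly: []float64{0, 1}}
	embs := [][]float64{{1, 1}, {1.01, 1.0}, {1.02, 1.0}, {2.0, 2.0}}
	for i, e := range embs {
		fmt.Printf("step %d skip=%v\n", i, tc.shouldSkip(e))
	}
	// prints: step 0 skip=false, step 1 skip=true,
	//         step 2 skip=true,  step 3 skip=false
}
```

The polynomial rescaling exists because raw embedding distance is not proportional to how much the transformer's output actually changes; a fitted polynomial corrects that mapping before accumulation.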
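The "8-bit weights with scales" storage mentioned above can be illustrated with a minimal round-trip: each block of full-precision weights is stored as 8-bit integers plus one per-block scale, and dequantized by multiplying back. This is a generic sketch of that scheme, not ollama's FP8 format or tensor layout.

```go
package main

import (
	"fmt"
	"math"
)

// quantize maps a block of float32 weights to int8 values plus a
// per-block scale chosen so the largest weight maps to +/-127.
func quantize(w []float32) (q []int8, scale float32) {
	var amax float32
	for _, v := range w {
		if a := abs32(v); a > amax {
			amax = a
		}
	}
	q = make([]int8, len(w))
	if amax == 0 {
		return q, 0
	}
	scale = amax / 127
	for i, v := range w {
		// round-to-nearest keeps the reconstruction error symmetric
		q[i] = int8(math.Round(float64(v / scale)))
	}
	return q, scale
}

// dequantize reconstructs approximate float32 weights from the
// quantized block and its scale.
func dequantize(q []int8, scale float32) []float32 {
	w := make([]float32, len(q))
	for i, v := range q {
		w[i] = float32(v) * scale
	}
	return w
}

func abs32(v float32) float32 {
	if v < 0 {
		return -v
	}
	return v
}

func main() {
	w := []float32{0.5, -1.0, 0.25, 0.0}
	q, s := quantize(w)
	fmt.Println(q, s)
	fmt.Println(dequantize(q, s)) // close to the original weights
}
```

A "QuantizedMatmul" kernel multiplies directly against the int8 blocks and folds the scale into the accumulator, while a "Dequantize" path (as on CUDA here) expands blocks back to floats before the matmul.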
Directory listing:
- examples/
- client.go
- client_test.go
- types.go
- types_test.go
- types_typescript_test.go