Mirror of https://github.com/ollama/ollama, synced 2026-04-23 08:45:14 +00:00
* model: add MLA absorption for glm4moelite

  Split the combined KV_B tensor into separate K_B and V_B tensors during conversion, enabling MLA (Multi-head Latent Attention) absorption, which compresses the KV cache for improved efficiency.

* ggml: enable MLA flash attention for GLM-4.7-flash

  Add support for gqa_ratio 4 in MLA flash attention kernels. GLM-4.7-flash uses head size 576 with gqa_ratio 4, which was previously supported only for gqa_ratio 16 (DeepSeek).

  Metal changes:
  - Enable head size 576 for flash attention
  - Increase simdgroups to 8 for large heads (>= 512)
  - Add case 8 kernel dispatch for 8 simdgroups

  CUDA changes:
  - Add gqa_ratio 4 support for head 576/512
  - Add tile configs for (576, 512, 4) and (576, 512, 8)
  - Add MMA config cases for ncols 4
  - Add template instances for ncols2=4

* model: add compatibility validation for glm4moelite architecture
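The first change in the commit message splits a fused KV_B projection into separate K_B and V_B tensors at conversion time. As a rough illustration only (this is not ollama's actual converter code; the function name, the row-major layout, and the per-head concatenation of K rows followed by V rows are all assumptions), splitting such a fused buffer might look like:

```go
package main

import "fmt"

// splitKVB splits a fused KV_B weight, assumed stored row-major with
// heads*(dK+dV) rows of rank columns, into separate K_B and V_B
// buffers. For each head, the first dK rows are taken as K_B rows and
// the following dV rows as V_B rows. Layout and naming are
// illustrative assumptions, not ollama's converter internals.
func splitKVB(kvb []float32, heads, dK, dV, rank int) (kb, vb []float32) {
	kb = make([]float32, 0, heads*dK*rank)
	vb = make([]float32, 0, heads*dV*rank)
	rowsPerHead := dK + dV
	for h := 0; h < heads; h++ {
		base := h * rowsPerHead * rank
		// K half of this head's block, then the V half.
		kb = append(kb, kvb[base:base+dK*rank]...)
		vb = append(vb, kvb[base+dK*rank:base+rowsPerHead*rank]...)
	}
	return kb, vb
}

func main() {
	// Tiny example: 2 heads, dK=2 K rows, dV=1 V row, rank=2 columns.
	kvb := []float32{
		1, 2, 3, 4, // head 0: K rows
		5, 6, // head 0: V row
		7, 8, 9, 10, // head 1: K rows
		11, 12, // head 1: V row
	}
	kb, vb := splitKVB(kvb, 2, 2, 1, 2)
	fmt.Println(kb) // [1 2 3 4 7 8 9 10]
	fmt.Println(vb) // [5 6 11 12]
}
```

Once the two tensors are separate, the K_B and V_B projections can be applied (absorbed) independently against the compressed latent cache rather than materializing full per-head keys and values.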
| Name |
|---|
| imageproc |
| input |
| models |
| parsers |
| renderers |
| testdata |
| bytepairencoding.go |
| bytepairencoding_test.go |
| model.go |
| model_test.go |
| sentencepiece.go |
| sentencepiece_test.go |
| textprocessor.go |
| vocabulary.go |
| vocabulary_test.go |
| wordpiece.go |
| wordpiece_test.go |
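The flash-attention part of the commit message above extends the MLA kernels from gqa_ratio 16 (DeepSeek) to also accept gqa_ratio 4 at head size 576 (value head size 512). A minimal sketch of such a support check (the function name and structure are assumptions for illustration, not the actual ggml dispatch code):

```go
package main

import "fmt"

// supportsMLAFlashAttn reports whether a hypothetical MLA flash
// attention path handles the given key/value head sizes and GQA
// ratio, mirroring the constraint described in the commit message:
// head size 576 with value head size 512 accepts gqa_ratio 4
// (GLM-4.7-flash) in addition to 16 (DeepSeek).
func supportsMLAFlashAttn(headK, headV, gqaRatio int) bool {
	if headK == 576 && headV == 512 {
		return gqaRatio == 4 || gqaRatio == 16
	}
	return false
}

func main() {
	fmt.Println(supportsMLAFlashAttn(576, 512, 4))  // true
	fmt.Println(supportsMLAFlashAttn(576, 512, 16)) // true
	fmt.Println(supportsMLAFlashAttn(576, 512, 8))  // false
}
```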