Mirror of https://github.com/ollama/ollama (synced 2026-04-26 10:14:33 +00:00)
Use the original key dimension (qkNopeHeadDim + qkRopeHeadDim = 256) for the attention scale instead of the MLA absorbed dimension (kvLoraRank + qkRopeHeadDim = 576). MLA absorption is a mathematically equivalent reorganization of the attention computation, so it should not change the effective attention scale. The scale should match training, which uses 1/sqrt(256). This improves tool calling and reduces model looping issues.
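Below is a minimal Go sketch of the scale change, assuming per-head dimensions of qkNopeHeadDim = 192, qkRopeHeadDim = 64, and kvLoraRank = 512 (values chosen to reproduce the 256 and 576 sums quoted above); the struct and function names are illustrative, not the actual deepseek2 implementation.

```go
package main

import (
	"fmt"
	"math"
)

// mlaOptions holds the per-head dimensions referenced in the commit message.
// Field names follow the message; concrete values below are assumptions.
type mlaOptions struct {
	qkNopeHeadDim int // non-RoPE portion of the query/key head
	qkRopeHeadDim int // RoPE portion of the query/key head
	kvLoraRank    int // low-rank latent dimension used by MLA absorption
}

// attnScale returns 1/sqrt(d), where d is the original per-head key
// dimension. Using the absorbed dimension (kvLoraRank + qkRopeHeadDim)
// instead would not match the scale the model was trained with.
func attnScale(o mlaOptions) float64 {
	d := o.qkNopeHeadDim + o.qkRopeHeadDim
	return 1.0 / math.Sqrt(float64(d))
}

func main() {
	o := mlaOptions{qkNopeHeadDim: 192, qkRopeHeadDim: 64, kvLoraRank: 512}

	fmt.Printf("training scale: 1/sqrt(%d) = %.6f\n",
		o.qkNopeHeadDim+o.qkRopeHeadDim, attnScale(o))

	absorbed := o.kvLoraRank + o.qkRopeHeadDim
	fmt.Printf("absorbed scale: 1/sqrt(%d) = %.6f (does not match training)\n",
		absorbed, 1.0/math.Sqrt(float64(absorbed)))
}
```

Running this prints 1/sqrt(256) = 0.0625 versus 1/sqrt(576) ≈ 0.041667, illustrating how large the mismatch in attention scaling would be if the absorbed dimension were used.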
| Name |
|---|
| bert |
| deepseek2 |
| deepseekocr |
| gemma2 |
| gemma3 |
| gemma3n |
| glm4moelite |
| gptoss |
| lfm2 |
| llama |
| llama4 |
| mistral3 |
| mllama |
| nomicbert |
| olmo3 |
| qwen2 |
| qwen3 |
| qwen3vl |
| qwen25vl |
| models.go |