ollama/model/models
Jeffrey Morgan a1ca428c90
glm4moelite: fix attention scale calculation (#13893)
Use the original key dimension (qkNopeHeadDim + qkRopeHeadDim = 256) for
the attention scale instead of the MLA absorbed dimension (kvLoraRank +
qkRopeHeadDim = 576).

MLA absorption is a mathematically equivalent reorganization of the
attention computation; it should not change the effective attention
scale. The scale should match training, which uses 1/sqrt(256).

This improves tool calling and reduces model looping issues (see the sketch below).
2026-01-24 17:48:09 -08:00
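The fix amounts to deriving the softmax scale from the original key head dimension rather than from the MLA absorbed dimension. Below is a minimal Go sketch of that calculation; the struct and field names are illustrative rather than the actual glm4moelite code, and the concrete dimensions (192 + 64 = 256, 512 + 64 = 576) are assumptions chosen only to match the sums quoted in the commit message.

```go
package main

import (
	"fmt"
	"math"
)

// options mirrors the dimensions named in the commit message; the real
// glm4moelite options struct may be organized differently.
type options struct {
	qkNopeHeadDim int // non-RoPE portion of the query/key head
	qkRopeHeadDim int // RoPE portion of the query/key head
	kvLoraRank    int // MLA low-rank KV dimension (absorbed path)
}

// attentionScale returns the softmax scale for attention. It must match
// training, so it uses the original key dimension
// (qkNopeHeadDim + qkRopeHeadDim = 256), not the MLA absorbed dimension
// (kvLoraRank + qkRopeHeadDim = 576).
func attentionScale(o options) float64 {
	return 1.0 / math.Sqrt(float64(o.qkNopeHeadDim+o.qkRopeHeadDim))
}

func main() {
	// Illustrative values consistent with the sums in the commit message.
	o := options{qkNopeHeadDim: 192, qkRopeHeadDim: 64, kvLoraRank: 512}

	correct := attentionScale(o)                                        // 1/sqrt(256)
	absorbed := 1.0 / math.Sqrt(float64(o.kvLoraRank+o.qkRopeHeadDim)) // 1/sqrt(576), the previous (incorrect) scale

	fmt.Printf("correct scale:  %.6f (1/sqrt(%d))\n", correct, o.qkNopeHeadDim+o.qkRopeHeadDim)
	fmt.Printf("absorbed scale: %.6f (1/sqrt(%d))\n", absorbed, o.kvLoraRank+o.qkRopeHeadDim)
}
```

Because MLA absorption only reorganizes the matrix products, the attention logits are mathematically unchanged; the scale therefore stays tied to the 256-dimensional key used at training time rather than the 576-dimensional absorbed representation.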
bert revert granite-embedding (#13505) 2025-12-16 15:44:52 -08:00
deepseek2 refactor rope 2025-12-08 14:42:22 -08:00
deepseekocr refactor rope 2025-12-08 14:42:22 -08:00
gemma2 refactor rope 2025-12-08 14:42:22 -08:00
gemma3 model: default gemma 3 rope scale to 1.0, apply corrections based on layer counts (#13453) 2025-12-12 17:51:56 -08:00
gemma3n refactor rope 2025-12-08 14:42:22 -08:00
glm4moelite glm4moelite: fix attention scale calculation (#13893) 2026-01-24 17:48:09 -08:00
gptoss refactor rope 2025-12-08 14:42:22 -08:00
lfm2 model: add lfm2 architecture and LFM2.5-1.2B-Thinking support (#13792) 2026-01-20 12:20:53 -08:00
llama refactor rope 2025-12-08 14:42:22 -08:00
llama4 refactor rope 2025-12-08 14:42:22 -08:00
mistral3 model: fix rotary embeddings for ministral 3 (#13432) 2025-12-11 16:02:05 -08:00
mllama refactor rope 2025-12-08 14:42:22 -08:00
nomicbert nomic-embed-text:v2: model implementation (#13162) 2025-12-09 14:24:51 -08:00
olmo3 model: add olmo3 and olmo3.1 (#13415) 2025-12-15 15:20:04 -08:00
qwen2 refactor rope 2025-12-08 14:42:22 -08:00
qwen25vl fix: qwen2.5 vl rope (#13486) 2025-12-15 17:30:33 -08:00
qwen3 refactor rope 2025-12-08 14:42:22 -08:00
qwen3vl refactor rope 2025-12-08 14:42:22 -08:00
models.go model: add lfm2 architecture and LFM2.5-1.2B-Thinking support (#13792) 2026-01-20 12:20:53 -08:00