ollama

mirror of https://github.com/ollama/ollama synced 2026-05-01 20:55:48 +00:00

Daniel Hiltgen 06ae6367bd mlx: fix RotatingKVCache.concat() dropping context on mid-rotation (#15591)	2026-04-14 18:29:06 -07:00

After the rotating buffer has wrapped (c.offset > c.maxSize) a subsequent L>1 Update() went through a slice-to-[0, c.idx) path that discarded all slots in [c.idx, Dim), losing the older-but-still-in-window tokens the first Q of the new batch needs for its sliding-window attention.

Linearize the circular buffer to logical order in that wrapped case so the existing trim + concat preserves the last (maxSize - 1) old tokens. When the buffer has not yet wrapped (c.offset <= c.maxSize), slots [c.idx, Dim) are grow padding or stale post-rewind data, so keep dropping them.
cache.go	mlx: fix RotatingKVCache.concat() dropping context on mid-rotation (#15591)	2026-04-14 18:29:06 -07:00
cache_test.go	mlxrunner: share KV cache across conversations with common prefixes	2026-03-18 16:06:33 -07:00
recurrent.go	mlxrunner: combine setStateRaw and setStateDetached into setState	2026-03-26 13:32:11 -07:00
recurrent_test.go	mlxrunner: support partial match on pure transformer caches	2026-03-23 17:44:19 -07:00
rotating_multiturn_test.go	mlx: fix RotatingKVCache.concat() dropping context on mid-rotation (#15591)	2026-04-14 18:29:06 -07:00