Mirror of https://github.com/ollama/ollama (synced 2026-05-01 20:55:48 +00:00)
After the rotating buffer has wrapped (c.offset > c.maxSize), a subsequent Update() with L > 1 took a path that sliced the buffer to [0, c.idx). That discarded every slot in [c.idx, Dim), and those slots hold the older tokens that are still inside the window, which the first query (Q) of the new batch needs for its sliding-window attention. In the wrapped case, linearize the circular buffer into logical order so that the existing trim + concat preserves the last (maxSize - 1) old tokens. When the buffer has not yet wrapped (c.offset <= c.maxSize), slots [c.idx, Dim) contain only grow padding or stale post-rewind data, so keep dropping them.
| Name |
|---|
| cache |
| mlx |
| model |
| sample |
| cache.go |
| cache_test.go |
| cache_trie.go |
| cache_trie_test.go |
| client.go |
| imports.go |
| pipeline.go |
| runner.go |
| server.go |
| utf8_buffer.go |
| utf8_buffer_test.go |