Mirror of https://github.com/ollama/ollama, synced 2026-04-23 08:45:14 +00:00
mlx.Copy shares the backing buffer with its source (via copy_shared_buffer) rather than allocating independent storage. When used to snapshot a slice of the KV cache, the snapshot array holds the entire original cache buffer alive through the shared data pointer, even after eval detaches the computation graph.

Replace Copy with Contiguous in Snapshot and Split. Contiguous allocates a compact buffer when the source buffer is significantly larger than the logical slice (Contiguous::eval checks buffer_size > nbytes + 16384), which is always the case for KV cache slices.
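
The retention problem described in the commit message can be illustrated with a small, self-contained Go model. This is only a sketch and does not call the real MLX bindings: the `array` type and the `slice`, `copyShared`, `contiguous`, and `retained` helpers are hypothetical names invented for illustration. The only detail taken from the commit message is the `buffer_size > nbytes + 16384` check; sharing the buffer when that check fails is an assumption of this toy model.

```go
package main

import "fmt"

// array is a hypothetical stand-in for an MLX array: a logical view of
// nbytes bytes starting at offset inside a (possibly larger) backing buffer.
type array struct {
	buf    []byte
	offset int
	nbytes int
}

// slackBytes is the margin quoted in the commit message (16384 bytes).
const slackBytes = 16384

// slice returns a view of n bytes at byte offset off. It shares a.buf.
func slice(a array, off, n int) array {
	return array{buf: a.buf, offset: a.offset + off, nbytes: n}
}

// copyShared models the behaviour the commit message attributes to Copy:
// the result points at the same backing buffer as its source.
func copyShared(a array) array {
	return a // copying the struct copies the slice header, not the storage
}

// contiguous models the quoted check: when the backing buffer is much
// larger than the logical view, allocate a compact buffer and copy into it.
func contiguous(a array) array {
	if len(a.buf) > a.nbytes+slackBytes {
		compact := make([]byte, a.nbytes)
		copy(compact, a.buf[a.offset:a.offset+a.nbytes])
		return array{buf: compact, offset: 0, nbytes: a.nbytes}
	}
	// Assumption of this model: otherwise the buffer is simply shared.
	return a
}

// retained reports how many bytes of storage the array keeps alive.
func retained(a array) int {
	return len(a.buf)
}

func main() {
	// A 1 MiB "KV cache" and a 4 KiB slice of it.
	cache := array{buf: make([]byte, 1<<20), nbytes: 1 << 20}
	view := slice(cache, 4096, 4096)

	fmt.Println(retained(copyShared(view))) // 1048576: whole cache stays alive
	fmt.Println(retained(contiguous(view))) // 4096: only the slice is kept
}
```

In this model the snapshot made with `copyShared` pins the full 1 MiB buffer for as long as the snapshot lives, while the one made with `contiguous` pins only the 4 KiB it actually needs, which is the effect the commit aims for in Snapshot and Split.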

| Name |
|---|
| agent |
| cmd |
| create |
| imagegen |
| mlxrunner |
| models |
| server |
| tokenizer |
| tools |