ollama/x
Jesse Gross d1151e18a1 mlx: fix KV cache snapshot memory leak
mlx.Copy shares the backing buffer with its source (via
copy_shared_buffer) rather than allocating independent storage.
When used to snapshot a slice of the KV cache, the snapshot array
keeps the entire original cache buffer alive through the shared
data pointer, even after eval detaches the computation graph.

Replace Copy with Contiguous in Snapshot and Split. Contiguous
allocates a compact buffer when the source buffer is significantly
larger than the logical slice (Contiguous::eval checks
buffer_size > nbytes + 16384), which is always the case for KV
cache slices.
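
The retention problem described above has a close analogue in Go slices, which can serve as a rough illustration. The sketch below is not the MLX or mlxrunner code: snapshotShared, snapshotCompact, and needsCompactCopy are hypothetical names, and the only detail taken from the commit message is the buffer_size > nbytes + 16384 condition.

```go
package main

import "fmt"

// compactThreshold is the 16384-byte slack quoted in the commit message.
const compactThreshold = 16384

// needsCompactCopy is a hypothetical helper that restates the quoted
// condition: copy into a compact buffer only when the backing buffer is
// significantly larger than the logical slice.
func needsCompactCopy(bufferSize, nbytes int) bool {
	return bufferSize > nbytes+compactThreshold
}

// snapshotShared returns a sub-slice that aliases the cache's backing
// array, so the whole array stays reachable while the snapshot lives.
// This is the analogue of the buffer sharing described for Copy.
func snapshotShared(cache []float32, start, end int) []float32 {
	return cache[start:end]
}

// snapshotCompact copies the requested range into fresh storage sized
// to the slice, so the snapshot no longer references the original array.
// This is the analogue of the behavior described for Contiguous.
func snapshotCompact(cache []float32, start, end int) []float32 {
	out := make([]float32, end-start)
	copy(out, cache[start:end])
	return out
}

func main() {
	cache := make([]float32, 1<<20) // stand-in for a large KV cache buffer

	shared := snapshotShared(cache, 0, 1024)
	compact := snapshotCompact(cache, 0, 1024)

	// Writing to the cache is visible through the shared snapshot only.
	cache[0] = 42
	fmt.Println(shared[0], compact[0])

	// The shared snapshot's capacity still spans the full cache.
	fmt.Println(cap(shared), cap(compact))

	// float32 values are 4 bytes each.
	fmt.Println(needsCompactCopy(len(cache)*4, len(shared)*4))
}
```

In this analogy, dropping every other reference to cache frees nothing while shared is still reachable, whereas compact holds only the 1024 elements it needs.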
2026-03-25 17:26:34 -07:00
agent x/cmd: enable web search and web fetch with flag (#13690) 2026-01-12 13:59:40 -08:00
cmd Reapply "don't require pulling stubs for cloud models" again (#14608) 2026-03-06 14:27:47 -08:00
create mlx: add mxfp4/mxfp8/nvfp4 importing (#15015) 2026-03-24 13:45:44 -07:00
imagegen ci: fix windows cgo compiler error (#15046) 2026-03-24 16:45:36 -07:00
mlxrunner mlx: fix KV cache snapshot memory leak 2026-03-25 17:26:34 -07:00
models mlx: add mxfp4/mxfp8/nvfp4 importing (#15015) 2026-03-24 13:45:44 -07:00
server bugfix: display the parameter count correctly in mlx for ollama show (#14285) 2026-02-16 13:03:34 -08:00
tokenizer mlx: quantized embeddings, fast SwiGLU, and runtime fixes (#14884) 2026-03-17 11:21:38 -07:00
tools add ability to disable cloud (#14221) 2026-02-12 15:47:00 -08:00