ollama

mirror of https://github.com/ollama/ollama synced 2026-04-23 08:45:14 +00:00


Jesse Gross	96e36c0d90	2026-03-18 16:06:33 -07:00
mlxrunner: share KV cache across conversations with common prefixes

Enable multiple conversations to reuse cached computations when they share token prefixes (e.g. the same system prompt). A prefix trie tracks shared regions so switching between conversations only recomputes tokens that diverge. Inactive conversation state is paged from active GPU memory to other memory and restored on demand, with LRU eviction to keep memory usage bounded.
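The prefix-matching idea in the commit message above can be illustrated with a small sketch. It assumes prompts are plain slices of integer token IDs. The types and functions (`prefixTrie`, `insert`, `longestPrefix`) are hypothetical and are not the mlxrunner implementation, which is not shown in this listing. Only the trie lookup is modeled; paging inactive state out of GPU memory and LRU eviction are left out.

```go
package main

import "fmt"

// trieNode is one token position in a hypothetical prefix trie. Each edge is
// labeled with a token ID, so a path from the root spells out a token prefix.
type trieNode struct {
	children map[int]*trieNode
}

// prefixTrie records token sequences whose KV state is assumed to be cached.
type prefixTrie struct {
	root *trieNode
}

func newPrefixTrie() *prefixTrie {
	return &prefixTrie{root: &trieNode{children: map[int]*trieNode{}}}
}

// insert records tokens (and therefore every prefix of tokens) as cached.
func (t *prefixTrie) insert(tokens []int) {
	n := t.root
	for _, tok := range tokens {
		child, ok := n.children[tok]
		if !ok {
			child = &trieNode{children: map[int]*trieNode{}}
			n.children[tok] = child
		}
		n = child
	}
}

// longestPrefix returns how many leading tokens of a prompt are already in
// the trie, i.e. how many positions could reuse cached KV state. Everything
// after that point diverges and would have to be recomputed.
func (t *prefixTrie) longestPrefix(tokens []int) int {
	n := t.root
	matched := 0
	for _, tok := range tokens {
		child, ok := n.children[tok]
		if !ok {
			break
		}
		n = child
		matched++
	}
	return matched
}

func main() {
	trie := newPrefixTrie()
	system := []int{101, 7, 42, 9} // shared system-prompt tokens (made-up IDs)

	// Conversation A: system prompt followed by its own turn.
	convA := append(append([]int{}, system...), 5, 6)
	trie.insert(convA)

	// Conversation B shares the system prompt but diverges afterwards.
	convB := append(append([]int{}, system...), 8, 3)
	reuse := trie.longestPrefix(convB)
	fmt.Printf("reuse %d tokens, recompute %d\n", reuse, len(convB)-reuse)
	// prints "reuse 4 tokens, recompute 2"
}
```

Switching from conversation A to B in this sketch keeps the four shared system-prompt positions and recomputes only the two tokens that differ.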
agent	x/cmd: enable web search and web fetch with flag (#13690)	2026-01-12 13:59:40 -08:00
cmd	Reapply "don't require pulling stubs for cloud models" again (#14608)	2026-03-06 14:27:47 -08:00
create	mlx: add prequantized tensor packing + changes for qwen35 (#14878)	2026-03-17 11:21:18 -07:00
imagegen	mlx: add prequantized tensor packing + changes for qwen35 (#14878)	2026-03-17 11:21:18 -07:00
mlxrunner	mlxrunner: share KV cache across conversations with common prefixes	2026-03-18 16:06:33 -07:00
models	mlx: quantized embeddings, fast SwiGLU, and runtime fixes (#14884)	2026-03-17 11:21:38 -07:00
server	bugfix: display the parameter count correctly in mlx for ollama show (#14285)	2026-02-16 13:03:34 -08:00
tokenizer	mlx: quantized embeddings, fast SwiGLU, and runtime fixes (#14884)	2026-03-17 11:21:38 -07:00
tools	add ability to disable cloud (#14221)	2026-02-12 15:47:00 -08:00