ollama/x
Jesse Gross bbbad97686 sched: Model eviction for MLX
MLX runners (image generation and LLM) previously bypassed the
scheduler's standard load path via a separate loadMLX method. This meant
they skipped VRAM fitting checks and couldn't participate in model
eviction.

Now all model types flow through the same load function. Model eviction
for MLX is based on weight size only, because the KV cache and compute
graph are dynamic. As a result, eviction does not account for worst-case
memory use and models can still compete for memory, but this is a
significant improvement.
2026-03-16 17:40:29 -07:00
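
The weight-based eviction described in the commit message can be illustrated with a minimal sketch. This is not the scheduler's actual code: `loadedModel`, `planEviction`, the least-recently-used ordering, and the single memory budget are all assumptions made for illustration. It only shows the idea that the fit check counts weights and ignores the KV cache and compute graph.

```go
package main

import (
	"fmt"
	"sort"
)

// loadedModel is a hypothetical record of a resident model.
// Only the weight size is tracked; KV cache and compute graph
// are dynamic and deliberately left out of the accounting.
type loadedModel struct {
	name        string
	weightBytes uint64
	lastUsed    int64 // logical clock; larger means more recently used
}

// planEviction (hypothetical) returns the models to evict, least recently
// used first, so that a new model's weights fit within totalBytes.
// ok is false when the new model cannot fit even with everything evicted.
func planEviction(loaded []loadedModel, totalBytes, newWeightBytes uint64) (evict []string, ok bool) {
	if newWeightBytes > totalBytes {
		return nil, false
	}

	// Sum the weights of everything currently resident.
	var used uint64
	for _, m := range loaded {
		used += m.weightBytes
	}

	// Work on a copy so the caller's slice order is untouched.
	candidates := append([]loadedModel(nil), loaded...)
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].lastUsed < candidates[j].lastUsed
	})

	// Evict the oldest models until the new weights fit.
	for _, m := range candidates {
		if used+newWeightBytes <= totalBytes {
			break
		}
		used -= m.weightBytes
		evict = append(evict, m.name)
	}
	return evict, used+newWeightBytes <= totalBytes
}

func main() {
	const GiB = uint64(1) << 30
	loaded := []loadedModel{
		{name: "a", weightBytes: 8 * GiB, lastUsed: 1},
		{name: "b", weightBytes: 4 * GiB, lastUsed: 2},
	}
	evict, ok := planEviction(loaded, 16*GiB, 6*GiB)
	fmt.Println(evict, ok) // prints "[a] true"
}
```

Because only weights are counted, a plan that reports `ok == true` can still leave two resident models competing for memory once their KV caches grow, which is the limitation the commit message acknowledges.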
..
agent x/cmd: enable web search and web fetch with flag (#13690) 2026-01-12 13:59:40 -08:00
cmd Reapply "don't require pulling stubs for cloud models" again (#14608) 2026-03-06 14:27:47 -08:00
create MLX: add header vendoring and remove go build tag (#14642) 2026-03-09 17:24:45 -07:00
imagegen sched: Model eviction for MLX 2026-03-16 17:40:29 -07:00
mlxrunner sched: Model eviction for MLX 2026-03-16 17:40:29 -07:00
models mlx: perf improvements (#14768) 2026-03-12 12:01:28 -07:00
server bugfix: display the parameter count correctly in mlx for ollama show (#14285) 2026-02-16 13:03:34 -08:00
tokenizer MLX: add header vendoring and remove go build tag (#14642) 2026-03-09 17:24:45 -07:00
tools add ability to disable cloud (#14221) 2026-02-12 15:47:00 -08:00