ollama/x/mlxrunner
Jesse Gross 22d6c817f8 mlxrunner: fuse top-P and top-K into a single sort pass
When both filters are active, avoid paying for a full sort in top-P
and a partial sort in top-K. Single-filter paths are unchanged.
Improves generation throughput on gemma4:e4b by 1.5%.
2026-04-20 17:43:00 -07:00
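The fusion described in this commit works because top-K's partial sort and top-P's full sort both produce a descending prefix of the same ordering, so one full descending sort can serve both filters. The Go sketch below operates on a plain probability slice rather than MLX arrays. `Candidate` and `fusedTopKTopP` are hypothetical names, not the mlxrunner API. It assumes the input is already a normalized distribution, that top-K is applied before top-P, and that the top-P mass is measured over the renormalized top-K survivors.

```go
package main

import "sort"

// Candidate is a hypothetical (token ID, probability) pair for illustration.
type Candidate struct {
	ID   int
	Prob float64
}

// fusedTopKTopP (hypothetical) applies top-K and then top-P using a single
// descending sort. Assumes probs is a normalized distribution.
// k <= 0 disables top-K; p >= 1 disables top-P.
func fusedTopKTopP(probs []float64, k int, p float64) []Candidate {
	cands := make([]Candidate, len(probs))
	for i, pr := range probs {
		cands[i] = Candidate{ID: i, Prob: pr}
	}

	// The one sort: highest probability first, ties broken by ID so the
	// result is deterministic.
	sort.Slice(cands, func(i, j int) bool {
		if cands[i].Prob != cands[j].Prob {
			return cands[i].Prob > cands[j].Prob
		}
		return cands[i].ID < cands[j].ID
	})

	// Top-K: the k most likely tokens are simply the sorted prefix of length k.
	if k > 0 && k < len(cands) {
		cands = cands[:k]
	}

	// Renormalize over the top-K survivors (assumption: top-P is measured
	// against this renormalized distribution).
	var total float64
	for _, c := range cands {
		total += c.Prob
	}
	for i := range cands {
		cands[i].Prob /= total
	}

	// Top-P: the list is already in the descending order nucleus sampling
	// needs, so walk the prefix until the cumulative mass reaches p.
	if p < 1 {
		var cum float64
		for i, c := range cands {
			cum += c.Prob
			if cum >= p {
				cands = cands[:i+1]
				break
			}
		}
	}
	// The caller would renormalize the survivors again before drawing a token.
	return cands
}
```

For example, with probabilities `{0.05, 0.40, 0.10, 0.25, 0.20}`, `k = 3` and `p = 0.7`, top-K keeps tokens 1, 3 and 4 (combined mass 0.85). After renormalization the first two cover about 0.76 of that mass, so only tokens 1 and 3 survive. Applying the filters separately would sort the same data twice; here the top-P walk reuses the order the top-K step already produced.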
cache mlx: additional gemma4 cache fixes (#15607) 2026-04-16 13:07:19 -07:00
mlx mlxrunner: use MaxAxis in the min-P sampler 2026-04-20 17:43:00 -07:00
model mlx: mixed-precision quant and capability detection improvements (#15409) 2026-04-13 11:43:07 -07:00
sample mlxrunner: fuse top-P and top-K into a single sort pass 2026-04-20 17:43:00 -07:00
cache.go mlxrunner: schedule periodic snapshots during prefill 2026-03-26 13:32:11 -07:00
cache_test.go mlxrunner: schedule periodic snapshots during prefill 2026-03-26 13:32:11 -07:00
cache_trie.go mlxrunner: share KV cache across conversations with common prefixes 2026-03-18 16:06:33 -07:00
cache_trie_test.go mlxrunner: share KV cache across conversations with common prefixes 2026-03-18 16:06:33 -07:00
client.go mlxrunner: add logprobs support 2026-04-20 17:43:00 -07:00
imports.go Gemma4 on MLX (#15244) 2026-04-13 16:36:51 -07:00
pipeline.go mlxrunner: add logprobs support 2026-04-20 17:43:00 -07:00
runner.go mlxrunner: add logprobs support 2026-04-20 17:43:00 -07:00
server.go mlxrunner: add logprobs support 2026-04-20 17:43:00 -07:00
utf8_buffer.go consolidate the tokenizer (#14327) 2026-02-19 15:55:45 -08:00
utf8_buffer_test.go consolidate the tokenizer (#14327) 2026-02-19 15:55:45 -08:00