ollama/x/mlxrunner
Jesse Gross ce99f24731 mlxrunner: tokenize prompts in request handler goroutines
Move tokenization out of the single GPU processing goroutine and
into each request's HTTP handler goroutine. This allows the next
request's prompt to be tokenized on the CPU while the current
request is executing on the GPU.
2026-04-21 14:38:49 -07:00
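The commit above splits the work so that CPU-bound tokenization happens in each request's own goroutine and only already-tokenized input reaches the single GPU goroutine. The sketch below shows that handoff pattern under stated assumptions. `tokenize`, `job`, `gpuLoop`, `handle` and `serve` are hypothetical names, not the runner's actual API. The tokenizer is a toy that maps each word to its length, and the "GPU" step is simulated by summing token IDs.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// tokenize is a hypothetical stand-in for the real tokenizer: it maps each
// whitespace-separated word to its length. It runs on the CPU in the
// calling (request handler) goroutine.
func tokenize(prompt string) []int {
	words := strings.Fields(prompt)
	ids := make([]int, len(words))
	for i, w := range words {
		ids[i] = len(w)
	}
	return ids
}

// job carries an already-tokenized prompt to the GPU goroutine, plus a
// buffered channel on which the result is returned.
type job struct {
	tokens []int
	result chan int
}

// gpuLoop is the single goroutine that owns the (simulated) GPU. It only
// executes jobs and never tokenizes, so CPU-side preprocessing of the next
// request does not delay it.
func gpuLoop(jobs <-chan job) {
	for j := range jobs {
		sum := 0 // simulated GPU work: sum the token IDs
		for _, t := range j.tokens {
			sum += t
		}
		j.result <- sum
	}
}

// handle mimics an HTTP handler goroutine: tokenize first (overlapping with
// whatever the GPU goroutine is running), then submit and wait.
func handle(jobs chan<- job, prompt string) int {
	tokens := tokenize(prompt)
	res := make(chan int, 1)
	jobs <- job{tokens: tokens, result: res}
	return <-res
}

// serve runs one goroutine per prompt against a single GPU goroutine and
// returns the results in prompt order.
func serve(prompts []string) []int {
	jobs := make(chan job)
	go gpuLoop(jobs)
	defer close(jobs) // runs after wg.Wait, so every job has been sent

	out := make([]int, len(prompts))
	var wg sync.WaitGroup
	for i, p := range prompts {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			out[i] = handle(jobs, p) // each goroutine writes a distinct index
		}(i, p)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(serve([]string{"hello world", "tokenize me please"}))
}
```

Because the GPU goroutine receives only token slices, its loop stays short and serial, while tokenization scales with the number of concurrent handler goroutines.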
cache                mlx: additional gemma4 cache fixes (#15607)                                2026-04-16 13:07:19 -07:00
mlx                  mlx: improve thread safety of array management                             2026-04-21 14:38:49 -07:00
model                mlx: mixed-precision quant and capability detection improvements (#15409)  2026-04-13 11:43:07 -07:00
sample               mlxrunner: fuse top-P and top-K into a single sort pass                    2026-04-20 17:43:00 -07:00
cache.go             mlxrunner: schedule periodic snapshots during prefill                      2026-03-26 13:32:11 -07:00
cache_test.go        mlxrunner: schedule periodic snapshots during prefill                      2026-03-26 13:32:11 -07:00
cache_trie.go        mlxrunner: share KV cache across conversations with common prefixes        2026-03-18 16:06:33 -07:00
cache_trie_test.go   mlxrunner: share KV cache across conversations with common prefixes        2026-03-18 16:06:33 -07:00
client.go            mlxrunner: tokenize prompts in request handler goroutines                  2026-04-21 14:38:49 -07:00
imports.go           Gemma4 on MLX (#15244)                                                     2026-04-13 16:36:51 -07:00
pipeline.go          mlxrunner: tokenize prompts in request handler goroutines                  2026-04-21 14:38:49 -07:00
runner.go            mlxrunner: tokenize prompts in request handler goroutines                  2026-04-21 14:38:49 -07:00
server.go            mlxrunner: tokenize prompts in request handler goroutines                  2026-04-21 14:38:49 -07:00
utf8_buffer.go       consolidate the tokenizer (#14327)                                         2026-02-19 15:55:45 -08:00
utf8_buffer_test.go  consolidate the tokenizer (#14327)                                         2026-02-19 15:55:45 -08:00