ollama

mirror of https://github.com/ollama/ollama synced 2026-04-23 08:45:14 +00:00

Jesse Gross ce99f24731	mlxrunner: tokenize prompts in request handler goroutines	2026-04-21 14:38:49 -07:00

Move tokenization out of the single GPU processing goroutine and into each request's HTTP handler goroutine. This allows the next request's prompt to be tokenized on the CPU while the current request is executing on the GPU.

cache	mlx: additional gemma4 cache fixes (#15607)	2026-04-16 13:07:19 -07:00
mlx	mlx: improve thread safety of array management	2026-04-21 14:38:49 -07:00
model	mlx: mixed-precision quant and capability detection improvements (#15409)	2026-04-13 11:43:07 -07:00
sample	mlxrunner: fuse top-P and top-K into a single sort pass	2026-04-20 17:43:00 -07:00
cache.go	mlxrunner: schedule periodic snapshots during prefill	2026-03-26 13:32:11 -07:00
cache_test.go	mlxrunner: schedule periodic snapshots during prefill	2026-03-26 13:32:11 -07:00
cache_trie.go	mlxrunner: share KV cache across conversations with common prefixes	2026-03-18 16:06:33 -07:00
cache_trie_test.go	mlxrunner: share KV cache across conversations with common prefixes	2026-03-18 16:06:33 -07:00
client.go	mlxrunner: tokenize prompts in request handler goroutines	2026-04-21 14:38:49 -07:00
imports.go	Gemma4 on MLX (#15244)	2026-04-13 16:36:51 -07:00
pipeline.go	mlxrunner: tokenize prompts in request handler goroutines	2026-04-21 14:38:49 -07:00
runner.go	mlxrunner: tokenize prompts in request handler goroutines	2026-04-21 14:38:49 -07:00
server.go	mlxrunner: tokenize prompts in request handler goroutines	2026-04-21 14:38:49 -07:00
utf8_buffer.go	consolidate the tokenizer (#14327)	2026-02-19 15:55:45 -08:00
utf8_buffer_test.go	consolidate the tokenizer (#14327)	2026-02-19 15:55:45 -08:00
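
The latest commit describes moving tokenization out of the single GPU processing goroutine and into each request's handler goroutine. The pattern can be sketched roughly as below. This is a minimal illustration, not the code in `server.go` or `runner.go`: the `request` type, `tokenize`, and `runPipeline` are hypothetical names, whitespace splitting stands in for the real tokenizer, and counting tokens stands in for GPU work.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// request carries a prompt that its handler has already tokenized.
// (Hypothetical type, for illustration only.)
type request struct {
	id     int
	tokens []string
}

// tokenize is a stand-in for the real tokenizer: CPU-bound work that
// does not need the GPU goroutine.
func tokenize(prompt string) []string {
	return strings.Fields(prompt)
}

// runPipeline starts one handler goroutine per prompt and a single
// processing goroutine that consumes tokenized requests one at a time.
// It returns the token count the processing goroutine saw per prompt.
func runPipeline(prompts []string) []int {
	queue := make(chan request)
	counts := make([]int, len(prompts))
	done := make(chan struct{})

	// Single processing goroutine (the "GPU" side). It never tokenizes.
	go func() {
		defer close(done)
		for req := range queue {
			counts[req.id] = len(req.tokens)
		}
	}()

	// Handler goroutines tokenize concurrently, then hand off the result.
	var wg sync.WaitGroup
	for i, p := range prompts {
		wg.Add(1)
		go func(id int, prompt string) {
			defer wg.Done()
			toks := tokenize(prompt)
			queue <- request{id: id, tokens: toks}
		}(i, p)
	}

	wg.Wait()
	close(queue)
	<-done // counts is only read after the processing goroutine exits
	return counts
}

func main() {
	fmt.Println(runPipeline([]string{"hello world", "why is the sky blue"}))
}
```

Because each handler tokenizes before sending on the channel, a second request's tokenization can overlap with the processing goroutine's work on the first; only the handoff itself is serialized.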