ollama

mirror of https://github.com/ollama/ollama synced 2026-04-23 08:45:14 +00:00

Jesse Gross	ce99f24731	mlxrunner: tokenize prompts in request handler goroutines	2026-04-21 14:38:49 -07:00

Move tokenization out of the single GPU processing goroutine and into each request's HTTP handler goroutine. This allows the next request's prompt to be tokenized on the CPU while the current request is executing on the GPU.

agent	x/cmd: enable web search and web fetch with flag (#13690)	2026-01-12 13:59:40 -08:00
cmd	Reapply "don't require pulling stubs for cloud models" again (#14608)	2026-03-06 14:27:47 -08:00
create	Keep Gemma4 router projection in source precision (#15613)	2026-04-15 15:04:23 -07:00
imagegen	mlx: fix imagegen lookup (#15588)	2026-04-16 10:39:00 -07:00
mlxrunner	mlxrunner: tokenize prompts in request handler goroutines	2026-04-21 14:38:49 -07:00
models	mlxrunner: add logprobs support	2026-04-20 17:43:00 -07:00
safetensors	create: Clean up experimental paths, fix create from existing safetensor model (#14679)	2026-04-07 08:12:57 -07:00
server	mlx: fix vision capability + min version (#15106)	2026-03-27 17:09:28 -07:00
tokenizer	mlx: quantized embeddings, fast SwiGLU, and runtime fixes (#14884)	2026-03-17 11:21:38 -07:00
tools	add ability to disable cloud (#14221)	2026-02-12 15:47:00 -08:00