Commit graph

5339 commits

Jesse Gross bbbad97686 sched: Model eviction for MLX
MLX runners (image generation and LLM) previously bypassed the
scheduler's standard load path via a separate loadMLX method. This meant
they skipped VRAM fitting checks and couldn't participate in model
eviction.

Now all model types flow through the same load function. Eviction sizing
for MLX is based on the weights alone, since the KV cache and compute
graph are allocated dynamically. This means eviction does not account for
worst-case memory, and models can still compete for memory, but it is a
significant improvement.
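
A minimal sketch of the sizing rule this implies (type and field names
are assumptions, not the actual scheduler code):

    type Backend int

    const (
        BackendGGML Backend = iota
        BackendMLX
    )

    type Model struct {
        Backend                             Backend
        WeightsSize, KVCacheSize, GraphSize uint64
    }

    // For MLX, only the weights are counted when fitting/evicting,
    // because KV cache and compute graph memory are dynamic.
    func requiredVRAM(m Model) uint64 {
        if m.Backend == BackendMLX {
            return m.WeightsSize
        }
        // GGML path: worst-case estimate includes KV cache and graph.
        return m.WeightsSize + m.KVCacheSize + m.GraphSize
    }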
2026-03-16 17:40:29 -07:00
Parth Sareen bcf6d55b54
launch: fix web search, add web fetch, and enable both for local (#14886) 2026-03-16 16:26:19 -07:00
easonysliu 810d4f9c22 runner: fix swallowed error in allocModel graph reservation
In allocModel(), the first call to reserveWorstCaseGraph(true) had its
error silently discarded — `return nil` was used instead of `return err`.

This meant that if the prompt-sized graph reservation failed (e.g. due
to insufficient memory), the error was swallowed, allocModel reported
success, and the model appeared to load correctly. Subsequent inference
would then fail in unexpected ways because the worst-case graph was
never properly reserved.

Fix: return the actual error so the caller can handle the failure
(retry with reduced parallelism, report OOM, etc.).
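
The shape of the fix, as a before/after fragment (surrounding code
elided; receiver name assumed):

    // Before: the error from the prompt-sized reservation was discarded.
    if err := m.reserveWorstCaseGraph(true); err != nil {
        return nil // bug: reports success despite the failed reservation
    }

    // After: propagate the error so the caller can retry with reduced
    // parallelism or report the OOM.
    if err := m.reserveWorstCaseGraph(true); err != nil {
        return err
    }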

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
2026-03-16 15:48:45 -07:00
Bruce MacDonald 856c047a6c
cmd/launch: skip --install-daemon when systemd is unavailable (#14883)
In container environments without systemd, `openclaw onboard
--install-daemon` exits non-zero because it cannot create a systemd
user service. This causes `ollama launch openclaw` to abort even
though the gateway can be started as a foreground child process.

Only pass --install-daemon on Linux when systemd user services are
reachable (/run/systemd/system present and XDG_RUNTIME_DIR set). On
all other platforms the flag is still included by default.
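
A sketch of that gating logic (helper name assumed):

    package main

    import (
        "os"
        "runtime"
    )

    // includeInstallDaemonFlag reports whether to pass --install-daemon.
    // Non-Linux platforms keep the flag by default; Linux additionally
    // requires that systemd user services look reachable.
    func includeInstallDaemonFlag() bool {
        if runtime.GOOS != "linux" {
            return true
        }
        if _, err := os.Stat("/run/systemd/system"); err != nil {
            return false
        }
        return os.Getenv("XDG_RUNTIME_DIR") != ""
    }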
2026-03-16 13:50:04 -07:00
Daniel Hiltgen 79c1e93c00
bench: improve benchmarking tool (#14240)
New features:
- Warmup phase to eliminate cold-start outliers
- Time-to-first-token measured in each epoch
- VRAM/memory tracking to identify CPU spillover
- Controlled prompt length
- Defaults to 6 epochs and 200 tokens max

Benchstat fixes:
- Report ns/op rather than ns/request: benchstat treated the non-standard unit as a separate group instead of grouping it with the timing metrics
- Stop reporting the token count as the N field: benchstat interprets N as the iteration count for statistical weighting, not as a token count
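
For context, benchstat groups results by their unit string and uses the
N column as the iteration count for statistical weighting, so a result
line in the standard format looks like this (illustrative numbers):

    BenchmarkGenerate-8   6   52341876 ns/op   31.2 token/s

Here N is 6 (the epoch count), ns/op groups with the other timing
metrics, and token throughput rides along as a secondary value rather
than as N.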
2026-03-15 11:47:31 -07:00
Parth Sareen f8b657c967
cmd/launch: add guards for headless mode (#14837) 2026-03-14 00:10:02 -07:00
Bruce MacDonald 10fefe0d57
config: use native OpenClaw Ollama onboarding (#14829)
OpenClaw now accepts the Ollama onboarding flags directly upstream, so rely on its wizard state instead of the legacy integration onboarding flag.

Update first-run setup to pass the Ollama auth and model flags during onboarding, perform a best-effort update before onboarding when needed, and drop the stale test that asserted persistence of the old onboarding flag.
2026-03-13 16:28:40 -07:00
Daniel Hiltgen 2f9a68f9e9
rocm: doc driver constraints (#14833) 2026-03-13 15:53:35 -07:00
Bruce MacDonald 3980c0217d
server: decompress zstd request bodies in cloud passthrough middleware (#14827)
When a zstd-compressed request (e.g. from Codex CLI) hit /v1/responses
with a cloud model, the request failed.

Fix by decompressing zstd bodies before model extraction, so cloud
models are detected and proxied directly without the writer being
wrapped.
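
A sketch of the decompression step (middleware shape and library choice
are assumptions; the commit does not name them):

    import (
        "io"
        "net/http"

        "github.com/klauspost/compress/zstd"
    )

    func decompressZstd(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            if r.Header.Get("Content-Encoding") == "zstd" {
                zr, err := zstd.NewReader(r.Body)
                if err != nil {
                    http.Error(w, err.Error(), http.StatusBadRequest)
                    return
                }
                defer zr.Close()
                r.Body = io.NopCloser(zr)
                r.Header.Del("Content-Encoding") // body is now plain JSON
            }
            next.ServeHTTP(w, r) // model extraction sees the decoded body
        })
    }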
2026-03-13 15:06:47 -07:00
Parth Sareen 870599f5da
launch: remove warning for default policy (#14830) 2026-03-13 15:01:38 -07:00
Bruce MacDonald abf8e8e9c8
middleware: handle non-JSON error responses gracefully (#14828)
writeError in both OpenAI and Anthropic middleware writers would return
a raw json.SyntaxError when the error payload wasn't valid JSON (e.g.
"invalid character 'e' looking for beginning of value"). Fall back to
using the raw bytes as the error message instead.

Also use the actual HTTP status code rather than hardcoding 500, so
error types map correctly.
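
A fragment of the fallback (payload shape assumed to be OpenAI-style;
`body` and `code` come from the surrounding writer):

    // Prefer the structured message, but fall back to the raw bytes
    // when the payload isn't valid JSON.
    var payload struct {
        Error struct {
            Message string `json:"message"`
        } `json:"error"`
    }
    msg := string(body)
    if err := json.Unmarshal(body, &payload); err == nil && payload.Error.Message != "" {
        msg = payload.Error.Message
    }
    w.WriteHeader(code) // upstream status, not a hardcoded 500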
2026-03-13 14:50:49 -07:00
Shivam Tiwari f3f31a8192
anthropic: close thinking block before tool_use when no text in between (#14825)
Root cause: StreamConverter.Process() only incremented contentIndex when
closing a thinking block if text content was present. When a model emitted
thinking followed directly by a tool_use block (no text in between),
thinkingDone was never set and contentIndex was not incremented, causing the
tool_use content_block_start to reuse index 0. Clients expecting sequential
indices would then fail to find the tool content block.

Fix: In the tool call loop, close any open thinking block (thinkingStarted &&
!thinkingDone) and increment contentIndex before opening the tool_use block,
mirroring the existing logic that closes an open text block.
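
A fragment showing the shape of the fix (field names follow the
description above; the event helpers are assumptions):

    for _, call := range resp.ToolCalls {
        // Close a thinking block that no text block closed earlier.
        if s.thinkingStarted && !s.thinkingDone {
            events = append(events, contentBlockStop(s.contentIndex))
            s.thinkingDone = true
            s.contentIndex++ // tool_use must not reuse index 0
        }
        events = append(events, contentBlockStart(s.contentIndex, "tool_use", call))
    }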

Fixes #14816
2026-03-13 13:12:05 -07:00
Devon Rifkin 9e7ba835da
cmd: still populate ollama ls when using ollama run <model:cloud> (#14824)
This is temporary until `api/tags` supports cloud natively
2026-03-13 12:24:45 -07:00
Parth Sareen 347f17b8d1
launch: add compact window for claude code (#14823) 2026-03-13 12:09:23 -07:00
Devon Rifkin 081b9eb423
api/create: always propagate :cloud source for cloud models (#14822)
Otherwise, using `/save` would try to run the local model instead
2026-03-13 11:58:00 -07:00
Parth Sareen bb867c6fdb
launch: fix headless --yes integration flow and policy scoping (#14815) 2026-03-13 11:45:36 -07:00
Cadu 81f4506a61
docs: document reasoning_effort support in OpenAI-compatible API (#14821)
Add reasoning_effort and reasoning to the supported features and
request fields for /v1/chat/completions. These fields control
thinking on thinking-capable models but were previously undocumented.
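
An illustrative request body (model name arbitrary):

    {
      "model": "qwen3",
      "messages": [{"role": "user", "content": "Why is the sky blue?"}],
      "reasoning_effort": "low"
    }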

Closes #14820
2026-03-13 10:57:14 -07:00
Parth Sareen 76925f1284
cmd: TUI model ordering (#14814) 2026-03-13 10:19:22 -07:00
Devon Rifkin f676231de9
server: remove experimental aliases support (#14810) 2026-03-12 20:27:24 -07:00
Parth Sareen af5f7c0a9e
cmd: refactor tui and launch (#14609) 2026-03-12 18:39:06 -07:00
Daniel Hiltgen a6b27d776b
ci: fix missing windows zip file (#14807)
Use 7z compression (better compression rate) if found in path.  That
alone isn't sufficient to get us under 2G, so MLX is now split out as a
discrete download.  Fix CI so it will fail if artifacts fail to upload.
2026-03-12 16:14:00 -07:00
Daniel Hiltgen 539741199e
mlx: perf improvements (#14768)
* mlx: perf improvements

Fix nn.go to call mlx_fast_layer_norm instead of manually implementing (mean,
subtract, variance, rsqrt, multiply, add — 6 ops)

Fix llama.go and gemma3.go to remove RepeatKV, which tiled the K/V
tensors to match the Q head count; scaled_dot_product_attention natively
handles GQA (it only requires n_q_heads % n_kv_heads == 0)
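
An illustrative before/after for the layer-norm part (the Go wrapper
names are assumptions; mlx_fast_layer_norm is the mlx-c call named
above):

    // Before: normalization composed from primitive ops each call.
    mean := Mean(x, -1, true)
    diff := Sub(x, mean)
    variance := Mean(Mul(diff, diff), -1, true)
    out := Add(Mul(Mul(diff, Rsqrt(AddScalar(variance, eps))), weight), bias)

    // After: one fused kernel, wrapping mlx_fast_layer_norm.
    out = FastLayerNorm(x, weight, bias, eps)

The GQA change is analogous: pass K/V with fewer heads straight to the
attention call instead of tiling them up to the Q head count.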

* review comments
2026-03-12 12:01:28 -07:00
Eva H 8f45236d09
middleware: enable local tool model for web search (#14787) 2026-03-11 17:51:39 -04:00
Parth Sareen 97013a190c
openai: split mixed thinking stream chunks via ToChunks (#14648) 2026-03-11 14:21:29 -07:00
Daniel Hiltgen c222735c02
mlx: only log load errors when MLX is needed (#14764)
This suppresses irrelevant/noisy errors in the GGML runner.
2026-03-11 10:31:31 -07:00
Daniel Hiltgen 87d21c7fc0
MLX: harden for init failures (#14777)
The CLI now links to the lazy-load MLX code, but that still happens in
init functions.  On internal MLX errors, the CLI exits before it has a
chance to start.  This change re-wires the MLX error handling so it
doesn't exit by default.  The MLX based runners currently expect exits
on failure, so they re-initialize the default error handling.  We can
refine error handling for better go stack traces in the future.
2026-03-10 22:52:23 -07:00
Jeffrey Morgan 54e05172a0
Revert "runner: add token history sampling parameters to ollama runner (#14537)" (#14776)
This reverts commit 86513cb697.
2026-03-10 21:07:52 -07:00
Parth Sareen 464186e995
config: qwen3.5 recommendations (#14758) 2026-03-10 18:04:57 -07:00
Devon Rifkin 8c4d5d6c2f
cloud_proxy: send ollama client version (#14769)
This was previously included in the user agent, and we've made use of it
in the past to hotpatch bugs server-side for particular Ollama versions.
2026-03-10 15:53:25 -07:00
Parth Sareen bc72b14016
docs: update claude code docs (#14770) 2026-03-10 15:52:41 -07:00
Parth Sareen 61086083eb
server: add experimental web search and web fetch routes (#14753) 2026-03-09 21:52:12 -07:00
Daniel Hiltgen 62d1f01ab4
ci: Fix windows build (#14754)
Instead of relying on sh for wildcards, do it in Go for better Windows
compatibility.
2026-03-09 19:27:59 -07:00
Daniel Hiltgen 10e51c5177
MLX: add header vendoring and remove go build tag (#14642)
* prefer rocm v6 on windows

Avoid building with v7 - more changes are needed

* MLX: add header vendoring and remove go build tag

This switches to using a vendoring approach for the mlx-c headers so that Go
can build without requiring a cmake first.  This enables building the new MLX
based code by default.  Every time cmake runs, the headers are refreshed, so we
can easily keep them in sync when we bump mlx versions.  Basic Windows
and Linux support are verified.

* ci: harden for flaky choco repo servers

CI sometimes fails because choco does not actually install ccache.  Since it just speeds up the build, we can proceed without it.

* review comments
2026-03-09 17:24:45 -07:00
Patrick Devine 3e06bde643
mlx: get parameters from modelfile during model creation (#14747) 2026-03-09 15:33:24 -07:00
Eva H 6be2de8214
app: auto update should be enabled when reset to defaults (#14741) 2026-03-09 15:02:36 -04:00
Daniel Hiltgen ebb1b9ec14
rocm: update linux to v7.2 (#14391)
* rocm: update linux to v7.2

* review comments
2026-03-09 08:26:55 -07:00
Patrick Devine d126467d5d
x/mlxrunner: replace sampler interface chain with single stateful Sampler (#14652)
- Collapse MLX sampling state into a single sample.Sampler struct (options + history).
- Replace interface-based sampler chain (TopP, TopK, penalty, etc.) with function-based transforms.
- Update request/pipeline wiring to use *sample.Sampler, seed history from prompt tokens, and append generated tokens each step.
- Implement top_p, min_p, repeat_penalty, and frequency_penalty
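
A sketch of the resulting shape (field and helper names are assumptions;
the top_k/top_p/min_p transforms are elided):

    // One stateful sampler: options plus token history.
    type Sampler struct {
        TopK                      int
        TopP, MinP, RepeatPenalty float32
        history                   []int32 // seeded from prompt tokens
    }

    // Transforms are plain functions over logits, not an interface chain.
    type transform func(logits []float32, history []int32) []float32

    // repeatPenalty dampens logits of tokens already in the history.
    func (s *Sampler) repeatPenalty(logits []float32, history []int32) []float32 {
        for _, tok := range history {
            if logits[tok] > 0 {
                logits[tok] /= s.RepeatPenalty
            } else {
                logits[tok] *= s.RepeatPenalty
            }
        }
        return logits
    }

    func (s *Sampler) Sample(logits []float32) int32 {
        for _, t := range []transform{s.repeatPenalty /*, top_k, top_p, min_p */} {
            logits = t(logits, s.history)
        }
        tok := pickToken(logits) // assumed categorical draw
        s.history = append(s.history, tok) // appended each step
        return tok
    }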
2026-03-07 17:50:57 -08:00
Devon Rifkin afb4c62fbf
cloud_proxy: handle stream disconnects gracefully (#14685)
Previously we were printing out bad errors for expected cases like
clients disconnecting. Now we only debug-log when that happens (which
still might help when we're figuring out why an integration isn't
working). For other errors, we now print a proper warning.
2026-03-06 19:18:52 -08:00
Patrick Devine e790dc435b
mlx: int4 groupsize 64 (#14682)
Change affine 4bit integers to use groupsize 64
2026-03-06 16:39:47 -08:00
Daniel Hiltgen 288077c3a3
build: smarter docker parallelism (#14653)
Our Dockerfile leverages parallel stages for more efficient builds.  However,
our old parallel settings were naive and led to under- or over-utilization
depending on the capabilities of your build system.

This change switches to using Ninja for all our docker cmake builds to leverage
its smarter parallel logic.  We tell Ninja to target a load of nproc so each of
the build stages shares the load on the system, aiming for full CPU use
without oversaturation.

The GPU parallelism settings are also adjusted to 4 to avoid a long tail for
the last few GPU targets as they work through the long list of GPU
architectures.

This also fixes the Dockerfile to move Vulkan install to just the stage that
needs it instead of blocking most other GPU installs.  This should speed up CI
which always has a clean build cache.
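
Illustratively, the load targeting uses Ninja's -l flag, which stops new
jobs from starting above the given load average (exact Dockerfile line
assumed):

    RUN cmake -S . -B build -G Ninja && cmake --build build -- -l $(nproc)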
2026-03-06 16:36:22 -08:00
Daniel Hiltgen 4425c54eda
create: fix localhost handling (#14681) 2026-03-06 16:35:58 -08:00
Michael Yang 778899a5d2
docs: format compat docs (#14678) 2026-03-06 14:53:17 -08:00
Jeffrey Morgan 4eab60c1e2
Reapply "don't require pulling stubs for cloud models" again (#14608)
* Revert "Revert "Reapply "don't require pulling stubs for cloud models"" (#14606)"

This reverts commit 39982a954e.

* fix test + do cloud lookup only when seeing cloud models

---------

Co-authored-by: ParthSareen <parth.sareen@ollama.com>
2026-03-06 14:27:47 -08:00
Bruce MacDonald 1af850e6e3
parsers: repair unclosed arg_value tags in GLM tool calls (#14656)
GLM models sometimes omit </arg_value> closing tags in tool call XML, causing xml.Unmarshal to fail with "element <arg_value> closed by </tool_call>".

This is a known issue across the GLM family.

Sanitize the input to insert the missing closing arg_value tags so encoding/xml can handle it.
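
A minimal sketch of the repair for the simplest case (the real sanitizer
likely handles more variants):

    package parsers

    import "regexp"

    // Close an arg_value whose content runs straight into </tool_call>.
    var unclosedArgValue = regexp.MustCompile(`(<arg_value>[^<]*)(</tool_call>)`)

    func sanitizeToolCallXML(s string) string {
        return unclosedArgValue.ReplaceAllString(s, `$1</arg_value>$2`)
    }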
2026-03-06 14:08:34 -08:00
Parth Sareen 9b0c7cc7b9
cmd: override stale entries for context window pi (#14655) 2026-03-05 16:30:24 -08:00
Daniel Hiltgen 6928630601
mlx: prevent remote creation mismatch (#14651)
If the user is pointing at a remote OLLAMA_HOST, fail experimental safetensor-based
create operations, as we only support local creation currently.
2026-03-05 14:59:00 -08:00
Parth Sareen 9896e3627f
cmd/config: fix cloud model limit lookups in integrations (#14650) 2026-03-05 13:57:28 -08:00
Bruce MacDonald 15732f0ea7
cmd: use native Ollama API endpoint for OpenClaw (#14649)
Remove the /v1 suffix from the OpenClaw provider baseUrl so it uses
the native Ollama API instead of the OpenAI-compatible endpoint. The
/v1 endpoint may break tool calling in OpenClaw.
2026-03-05 13:29:17 -08:00
Parth Sareen 562c76d7cc
cmd: add qwen3.5 context length for launch (#14626) 2026-03-04 14:10:52 -08:00
Parth Sareen 122c68c151
server: loosen thinking level constraint (#14625) 2026-03-04 13:42:18 -08:00