Commit graph

5217 commits

Author SHA1 Message Date
SamareshSingh f8dc7c9f54
docs: fix openapi schema for /api/ps and /api/tags endpoints (#14210) 2026-02-11 17:37:40 -08:00
Patrick Devine 4a3741129d
bug: fix loading non-mlx models when ollama is built with mlx support (#14211)
This change fixes an issue where GGML-based models (for either the Ollama runner or
the legacy llama.cpp runner) would try to load the MLX library, which would panic
and cause the model to fail to start.
2026-02-11 14:48:33 -08:00
Parth Sareen 77ba9404ac
cmd/tui: improve model picker UX (#14209) 2026-02-11 14:36:54 -08:00
Patrick Devine 0aaf6119ec
feature: add ctrl-g to allow users to use an editor to edit their prompt (#14197) 2026-02-11 13:04:41 -08:00
Parth Sareen f08427c138
cmd: TUI UX improvements (#14198) 2026-02-11 10:18:41 -08:00
Maternion 2dbb000908 update context length format. 2026-02-10 17:06:05 -08:00
Maternion c980e19995 Fix formatting of context length notes in documentation 2026-02-10 17:06:05 -08:00
Maternion 6162374ca9 Update context-length.mdx 2026-02-10 17:06:05 -08:00
Patrick Devine 44bdd9a2ef
Add MLX runner with GLM4-MoE-Lite model support (#14185)
This change adds a new MLX based runner which includes:

  * Method-based MLX bindings
  * Subprocess-based MLX runner (x/mlxrunner)
  * KV cache with tree management
  * A basic sampler

The GLM4-MoE-Lite model has been ported to use the new bindings.

---------

Co-authored-by: Michael Yang <git@mxy.ng>
2026-02-10 14:57:57 -08:00
Michael db493d6e5e
docs: update broken links on FAQ and quick cleanup (#14194)
2026-02-10 16:52:20 -05:00
Bruce MacDonald 75695f16a5
docs: integration overview (#13831)
Group integrations into high-level types
2026-02-10 11:41:09 -08:00
Patrick Devine a0407d07fa
safetensors quantization for mlx (#14184)
This change includes:
  - changes to the safetensors metadata format
  - changes to the create command to properly create the blobs with the new format
  - changes to load the new format
  - fixes ollama show to properly display each tensor
2026-02-10 11:29:17 -08:00
Jeffrey Morgan 9ec733e527
cmd: make 'ollama login' and 'ollama logout' aliases for 'ollama signin' and 'ollama signout' respectively (#14144) 2026-02-09 19:12:42 -08:00
Parth Sareen 5ef04dab52
cmd: ollama launch pi (#14084) 2026-02-09 19:07:41 -08:00
Daniel Hiltgen aea316f1e9
win: add curl-style install script (#14178)
This adds a new PowerShell install script suitable for running via

  irm https://ollama.com/install.ps1 | iex

If you download the script and run it with '-?', it reports basic usage
information, as well as usage examples for common customization
options. The script is signed as part of the release process
to ensure it can run on a typically configured Windows system.

This does not include doc updates - we can merge those after a release
ships to avoid user confusion.
2026-02-09 15:28:11 -08:00
Patrick Devine 235ba3df5c
cmd: ollama menu and launch improvements (#14038) 2026-02-09 11:30:16 -08:00
Jeffrey Morgan 099a0f18ef
build: fix Dockerfile mlx directory (#14131) 2026-02-06 17:08:53 -08:00
Richard Lyons fff696ee31 docs: increased RAM requirement for parallelism 2026-02-06 15:49:39 -08:00
Jeffrey Morgan 2e3ce6eab3
anthropic: do not count image tokens for now (#14127) 2026-02-06 15:33:18 -08:00
Parth Sareen 9e2003f88a
cmd/config: offer to pull missing models instead of erroring (#14113) 2026-02-06 10:19:47 -08:00
Parth Sareen 42e1d49fbe
cmd: fix context limits for droid and add qwen3-coder-next ctx (#14112) 2026-02-05 22:29:53 -08:00
Michael Yang 814630ca60
Revert "move tokenizers to separate package (#13825)" (#14111) 2026-02-05 20:49:08 -08:00
Parth Sareen 87cf187774
cmd: set claude code env vars on launch (#14109)
Set ANTHROPIC_DEFAULT_OPUS_MODEL, ANTHROPIC_DEFAULT_SONNET_MODEL,
ANTHROPIC_DEFAULT_HAIKU_MODEL, and CLAUDE_CODE_SUBAGENT_MODEL when
launching Claude Code so all model tiers route through Ollama.
2026-02-05 19:04:53 -08:00
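A minimal Go sketch of what the launch wiring above could look like (the helper and its plumbing are illustrative; only the environment variable names come from the commit message):

```go
package main

import (
	"os"
	"os/exec"
)

// launchClaudeCode is a hypothetical helper: it points every Claude Code
// model tier at the same Ollama-served model via the variables named in
// the commit message, then hands the terminal over to the subprocess.
func launchClaudeCode(model string) error {
	cmd := exec.Command("claude")
	cmd.Env = append(os.Environ(),
		"ANTHROPIC_DEFAULT_OPUS_MODEL="+model,
		"ANTHROPIC_DEFAULT_SONNET_MODEL="+model,
		"ANTHROPIC_DEFAULT_HAIKU_MODEL="+model,
		"CLAUDE_CODE_SUBAGENT_MODEL="+model,
	)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	// "some-model" is a placeholder; any locally served model name works.
	if err := launchClaudeCode("some-model"); err != nil {
		os.Exit(1)
	}
}
```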
Michael Yang 6ddd8862cd
chore: move x/mlxrunner into x/imagegen (#14100) 2026-02-05 18:25:56 -08:00
Michael Yang f1373193dc
move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
Parth Sareen 8a4b77f9da
cmd: set context limits for cloud models in opencode (#14107) 2026-02-05 16:36:46 -08:00
Parth Sareen 5f53fe7884
cmd: ollama launch improvements (#14099) 2026-02-05 15:08:17 -08:00
Bruce MacDonald 7ab4ca0e7f
scripts: add macOS support to install.sh (#14060)
Allow installing Ollama on macOS directly from the command line. This is in line with other CLI tools and gives a more streamlined experience for users who specifically want the CLI.
2026-02-05 14:59:01 -08:00
Jeffrey Morgan e36f389e82
scheduler: default parallel=1 for qwen3next/lfm (#14103) 2026-02-05 12:48:25 -08:00
Jesse Gross c61023f554 ollamarunner: Fix off by one error with numPredict
When numPredict is set, the user will receive one fewer token
than the requested limit. In addition, the stats will incorrectly
report the number of tokens returned as equal to the limit. In cases where
numPredict is not set, the number of tokens is reported correctly.

This occurs because numPredict is checked when setting up the next
batch, but hitting the limit also terminates the current batch.
Instead, it is better to check the limit as tokens are actually predicted.
2026-02-04 17:14:24 -08:00
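A minimal sketch of the fix described above, with illustrative types rather than the real ollamarunner code: counting the token first and checking the limit afterward yields exactly numPredict tokens and accurate stats.

```go
package main

// sequence is an illustrative stand-in for the runner's per-request state.
type sequence struct {
	numPredict   int // requested limit; <= 0 means unlimited
	numPredicted int
	doneReason   string
}

// acceptToken records one predicted token and reports whether the sequence
// is finished. Because the check runs after counting the token just
// produced, the caller receives exactly numPredict tokens; checking when
// setting up the next batch instead would stop one token early.
func (s *sequence) acceptToken() (done bool) {
	s.numPredicted++
	if s.numPredict > 0 && s.numPredicted >= s.numPredict {
		s.doneReason = "limit"
		return true
	}
	return false
}

func main() {
	s := sequence{numPredict: 3}
	for !s.acceptToken() {
	}
	println(s.numPredicted, s.doneReason) // 3 limit
}
```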
Jeffrey Morgan d25535c3f3
qwen3next: avoid inplace sigmoid for shared gate (#14077) 2026-02-04 15:50:02 -08:00
Bruce MacDonald c323161f24
cmd: helpful error message for remote models (#14057)
When trying to use a cloud model with OLLAMA_HOST="ollama.com" while not signed in, a helpful error message tells the user they must sign in to use cloud models. The same experience applies to models which specify a remote instance.
2026-02-04 14:55:11 -08:00
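A sketch of the guard described above (names and message text are assumptions, not the actual cmd code):

```go
package cmd

import (
	"errors"
	"strings"
)

// checkCloudAccess is an illustrative guard: cloud-hosted or remote models
// require a signed-in user, so fail early with an actionable message
// rather than a generic connection error.
func checkCloudAccess(host string, remoteModel, signedIn bool) error {
	if (strings.Contains(host, "ollama.com") || remoteModel) && !signedIn {
		return errors.New("you must be signed in to use cloud models; run 'ollama signin'")
	}
	return nil
}
```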
Jeffrey Morgan 255579aaa7
qwen3next: fix issue in delta net (#14075)
gDiffExp was being broadcast across the wrong axis when multiplied with k. This fix reshapes gDiffExp to [1, chunkSize, nChunks, ...]
2026-02-04 13:40:38 -08:00
Jeffrey Morgan f7102ba826
runner: discard compute results if sequence replaced mid-batch (#14072)
If a sequence is replaced in s.seqs while a batch is computing, the old logits can be decoded into the new sequence. This change rechecks the sequence pointer after compute and skips decoding for replaced entries, preventing stale results from being applied.
2026-02-04 13:19:48 -08:00
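A sketch of the recheck, with stand-in types: the sequence pointer captured when the batch was built is compared against the runner's slot after compute, and mismatched entries are skipped.

```go
package runner

// Sequence and runner are illustrative stand-ins for the real types.
type Sequence struct{ id int }

type runner struct {
	seqs []*Sequence // slots may be handed to new sequences mid-batch
}

// decodeBatch applies logits only to sequences that still occupy the slot
// they held when the batch was built; entries replaced during compute are
// skipped so stale logits are never decoded into the new sequence.
func (r *runner) decodeBatch(batchSeqs []*Sequence, logits [][]float32) {
	for i, seq := range batchSeqs {
		if r.seqs[i] != seq {
			continue // sequence was replaced mid-batch: discard this result
		}
		_ = logits[i] // decode into seq as usual (omitted)
	}
}
```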
Jeffrey Morgan cefabd79a8
Revert "cmd: claude launch improvements (#14064)" (#14071)
This reverts commit ee25219edd.
2026-02-04 09:10:37 -08:00
Jeffrey Morgan df70249520
server: optimize chatPrompt to reduce tokenization calls (#14040)
Change the truncation algorithm to start with all messages and remove
from the front until it fits, rather than adding messages one at a time
from the back. This reduces tokenization calls from O(n) to O(1) in the
common case where all messages fit in context.
2026-02-04 01:21:31 -08:00
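A sketch of the revised truncation strategy (the Message type and tokenizer callback are simplified stand-ins):

```go
package chatprompt

// Message is a simplified stand-in for the server's chat message type.
type Message struct {
	Role    string
	Content string
}

// truncate returns the largest suffix of msgs that fits in numCtx tokens.
// It starts with all messages and removes from the front, so when
// everything fits (the common case) countTokens runs exactly once,
// versus once per message when growing the window from the back.
func truncate(msgs []Message, numCtx int, countTokens func([]Message) int) []Message {
	for start := 0; start < len(msgs); start++ {
		if countTokens(msgs[start:]) <= numCtx {
			return msgs[start:]
		}
	}
	return nil // even the last message alone does not fit
}
```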
Jeffrey Morgan 77eb2ca619
model: add qwen3-next architecture (#14051) 2026-02-03 23:27:21 -08:00
Parth Sareen ee25219edd
cmd: claude launch improvements (#14064) 2026-02-03 19:33:58 -08:00
Jeffrey Morgan b1fccabb34
Revert "Update vendored llama.cpp to b7847" (#14061) 2026-02-03 18:39:36 -08:00
Bruce MacDonald a6355329bf
cmd: open browser on ollama signin when available (#14055)
When a browser is available, it is opened automatically to the connect URL when running the `ollama signin` command. The browser is not opened in any other unauthorized scenario.
2026-02-03 16:42:09 -08:00
Parth Sareen 0398b24b42
cmd: launch defaults (#14035) 2026-02-02 23:19:11 -08:00
Parth Sareen 75b1dddf91
cmd: launch extra params (#14039) 2026-02-03 02:03:33 -05:00
Parth Sareen e1e80ffc3e
cmd/config: move config location (#14034) 2026-02-02 22:48:51 -05:00
Aleksandr Vukmirovich 71896485fd
anthropic: add InputTokens to streaming response (#13934)
---------

Co-authored-by: ParthSareen <parth.sareen@ollama.com>
2026-02-02 18:29:37 -08:00
Jeffrey Morgan ef00199fb4
Update vendor ggml code to a5bb8ba4 (#13832)
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
Co-authored-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: Shalini Salomi Bodapati <Shalini.Salomi.Bodapati@ibm.com>
2026-02-02 17:31:59 -08:00
Jeffrey Morgan 8f4a008139
Add GLM-OCR vision model support (#14024) 2026-02-02 15:39:18 -08:00
Patrick Devine d8cc798c2b
glm 4.7 flash support on experimental engine (#13838) 2026-02-02 15:22:11 -08:00
Richard Lyons 6582f6da5c llm: Make "do load request" error message more informative 2026-02-02 11:13:21 -08:00
Jesse Gross 0334ffa625 server: use tiered VRAM-based default context length
Replace binary low VRAM mode with tiered VRAM thresholds that set
default context lengths for all models:

- < 24 GiB VRAM: 4,096 context
- 24-48 GiB VRAM: 32,768 context
- >= 48 GiB VRAM: 262,144 context
2026-02-02 10:47:09 -08:00
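The tiers above translate into a lookup along these lines (function name is illustrative):

```go
package server

// defaultContextLength maps available VRAM to the tiered default context
// lengths described in the commit message.
func defaultContextLength(vramBytes uint64) int {
	const GiB = 1 << 30
	switch {
	case vramBytes < 24*GiB:
		return 4096
	case vramBytes < 48*GiB:
		return 32768
	default: // >= 48 GiB
		return 262144
	}
}
```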
Jesse Gross d11fbd2c60 server: fix ollama ps showing configured instead of actual context length
When context length is clamped to the model's trained context length,
ollama ps now shows the actual clamped value instead of the originally
configured value.
2026-02-02 10:47:09 -08:00
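A one-liner captures the value ollama ps now reports (names illustrative): the configured context length clamped to the model's trained limit.

```go
package server

// effectiveContextLength is what ollama ps should report: the configured
// context length, clamped to the model's trained context length.
func effectiveContextLength(configured, trained int) int {
	return min(configured, trained)
}
```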