Commit graph

5217 commits

Author SHA1 Message Date
SamareshSingh f8dc7c9f54
docs: fix openapi schema for /api/ps and /api/tags endpoints (#14210) 2026-02-11 17:37:40 -08:00
Patrick Devine 4a3741129d
bug: fix loading non-mlx models when ollama is built with mlx support (#14211)
This change fixes an issue where GGML-based models (for either the Ollama runner or
the legacy llama.cpp runner) would try to load the MLX library, which would panic
and cause the model to fail to start.
2026-02-11 14:48:33 -08:00
Parth Sareen 77ba9404ac
cmd/tui: improve model picker UX (#14209) 2026-02-11 14:36:54 -08:00
Patrick Devine 0aaf6119ec
feature: add ctrl-g to allow users to use an editor to edit their prompt (#14197) 2026-02-11 13:04:41 -08:00
Parth Sareen f08427c138
cmd: TUI UX improvements (#14198) 2026-02-11 10:18:41 -08:00
Maternion 2dbb000908 update context length format. 2026-02-10 17:06:05 -08:00
Maternion c980e19995 Fix formatting of context length notes in documentation 2026-02-10 17:06:05 -08:00
Maternion 6162374ca9 Update context-length.mdx 2026-02-10 17:06:05 -08:00
Patrick Devine 44bdd9a2ef
Add MLX runner with GLM4-MoE-Lite model support (#14185)
This change adds a new MLX based runner which includes:

  * Method-based MLX bindings
  * Subprocess-based MLX runner (x/mlxrunner)
  * KV cache with tree management
  * A basic sampler

The GLM4-MoE-Lite model has been ported to use the new bindings.

---------

Co-authored-by: Michael Yang <git@mxy.ng>
2026-02-10 14:57:57 -08:00
Michael db493d6e5e
docs: update broken links on FAQ and quick cleanup (#14194)
2026-02-10 16:52:20 -05:00
Bruce MacDonald 75695f16a5
docs: integration overview (#13831)
Group integrations into high-level types
2026-02-10 11:41:09 -08:00
Patrick Devine a0407d07fa
safetensors quantization for mlx (#14184)
This change includes:
  - changes to the safetensors metadata format
  - changes to the create command to properly create the blobs with the new format
  - changes to load the new format
  - fixes ollama show to properly display each tensor
2026-02-10 11:29:17 -08:00
Jeffrey Morgan 9ec733e527
cmd: make 'ollama login' and 'ollama logout' aliases for 'ollama signin' and 'ollama signout' respectively (#14144) 2026-02-09 19:12:42 -08:00
Parth Sareen 5ef04dab52
cmd: ollama launch pi (#14084) 2026-02-09 19:07:41 -08:00
Daniel Hiltgen aea316f1e9
win: add curl-style install script (#14178)
This adds a new PowerShell install script suitable for running via

  irm https://ollama.com/install.ps1 | iex

If you download the script and run it with '-?', it reports basic usage
information, as well as usage examples for common customization
options. The script is signed as part of the release process
to ensure it can run on a typically configured Windows system.

This does not include doc updates - we can merge those after a release
ships to avoid user confusion.
2026-02-09 15:28:11 -08:00
Patrick Devine 235ba3df5c
cmd: ollama menu and launch improvements (#14038) 2026-02-09 11:30:16 -08:00
Jeffrey Morgan 099a0f18ef
build: fix Dockerfile mlx directory (#14131) 2026-02-06 17:08:53 -08:00
Richard Lyons fff696ee31 docs: increased RAM requirement for parallelism 2026-02-06 15:49:39 -08:00
Jeffrey Morgan 2e3ce6eab3
anthropic: do not count image tokens for now (#14127) 2026-02-06 15:33:18 -08:00
Parth Sareen 9e2003f88a
cmd/config: offer to pull missing models instead of erroring (#14113) 2026-02-06 10:19:47 -08:00
Parth Sareen 42e1d49fbe
cmd: fix context limits for droid and add qwen3-coder-next ctx (#14112) 2026-02-05 22:29:53 -08:00
Michael Yang 814630ca60
Revert "move tokenizers to separate package (#13825)" (#14111) 2026-02-05 20:49:08 -08:00
Parth Sareen 87cf187774
cmd: set claude code env vars on launch (#14109)
Set ANTHROPIC_DEFAULT_OPUS_MODEL, ANTHROPIC_DEFAULT_SONNET_MODEL,
ANTHROPIC_DEFAULT_HAIKU_MODEL, and CLAUDE_CODE_SUBAGENT_MODEL when
launching Claude Code so all model tiers route through Ollama.
2026-02-05 19:04:53 -08:00
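A minimal Go sketch of what the launch wiring above could look like (the helper and its plumbing are illustrative; only the environment variable names come from the commit message):

```go
package main

import (
	"os"
	"os/exec"
)

// launchClaudeCode is a hypothetical helper: it points every Claude Code
// model tier at the same Ollama-served model via the variables named in
// the commit message, then hands the terminal over to the subprocess.
func launchClaudeCode(model string) error {
	cmd := exec.Command("claude")
	cmd.Env = append(os.Environ(),
		"ANTHROPIC_DEFAULT_OPUS_MODEL="+model,
		"ANTHROPIC_DEFAULT_SONNET_MODEL="+model,
		"ANTHROPIC_DEFAULT_HAIKU_MODEL="+model,
		"CLAUDE_CODE_SUBAGENT_MODEL="+model,
	)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	// "some-model" is a placeholder; any locally served model name works.
	if err := launchClaudeCode("some-model"); err != nil {
		os.Exit(1)
	}
}
```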
Michael Yang 6ddd8862cd
chore: move x/mlxrunner into x/imagegen (#14100) 2026-02-05 18:25:56 -08:00
Michael Yang f1373193dc
move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
Parth Sareen 8a4b77f9da
cmd: set context limits for cloud models in opencode (#14107) 2026-02-05 16:36:46 -08:00
Parth Sareen 5f53fe7884
cmd: ollama launch improvements (#14099) 2026-02-05 15:08:17 -08:00
Bruce MacDonald 7ab4ca0e7f
scripts: add macOS support to install.sh (#14060)
Allow installing Ollama on macOS directly from the command line. This is in line with other CLI tools and gives a more streamlined experience for users who specifically want the CLI.
2026-02-05 14:59:01 -08:00
Jeffrey Morgan e36f389e82
scheduler: default parallel=1 for qwen3next/lfm (#14103) 2026-02-05 12:48:25 -08:00
Jesse Gross c61023f554 ollamarunner: Fix off by one error with numPredict
When numPredict is set, the user will receive one fewer token
than the requested limit. In addition, the stats will incorrectly
report the number of tokens returned as equal to the limit. In cases where
numPredict is not set, the number of tokens is reported correctly.

This occurs because numPredict is checked when setting up the next
batch, but hitting the limit also terminates the current batch.
Instead, it is better to check the limit as tokens are actually predicted.
2026-02-04 17:14:24 -08:00
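A minimal sketch of the fix described above, with illustrative types rather than the real ollamarunner code: counting the token first and checking the limit afterward yields exactly numPredict tokens and accurate stats.

```go
package main

// sequence is an illustrative stand-in for the runner's per-request state.
type sequence struct {
	numPredict   int // requested limit; <= 0 means unlimited
	numPredicted int
	doneReason   string
}

// acceptToken records one predicted token and reports whether the sequence
// is finished. Because the check runs after counting the token just
// produced, the caller receives exactly numPredict tokens; checking when
// setting up the next batch instead would stop one token early.
func (s *sequence) acceptToken() (done bool) {
	s.numPredicted++
	if s.numPredict > 0 && s.numPredicted >= s.numPredict {
		s.doneReason = "limit"
		return true
	}
	return false
}

func main() {
	s := sequence{numPredict: 3}
	for !s.acceptToken() {
	}
	println(s.numPredicted, s.doneReason) // 3 limit
}
```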
Jeffrey Morgan d25535c3f3
qwen3next: avoid inplace sigmoid for shared gate (#14077) 2026-02-04 15:50:02 -08:00
Bruce MacDonald c323161f24
cmd: helpful error message for remote models (#14057)
When trying to use a cloud model with OLLAMA_HOST="ollama.com" while not signed in, a helpful error message tells the user they must sign in to use cloud models. The same experience applies to models which specify a remote instance.
2026-02-04 14:55:11 -08:00
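A sketch of the guard described above (names and message text are assumptions, not the actual cmd code):

```go
package cmd

import (
	"errors"
	"strings"
)

// checkCloudAccess is an illustrative guard: cloud-hosted or remote models
// require a signed-in user, so fail early with an actionable message
// rather than a generic connection error.
func checkCloudAccess(host string, remoteModel, signedIn bool) error {
	if (strings.Contains(host, "ollama.com") || remoteModel) && !signedIn {
		return errors.New("you must be signed in to use cloud models; run 'ollama signin'")
	}
	return nil
}
```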
Jeffrey Morgan 255579aaa7
qwen3next: fix issue in delta net (#14075)
gDiffExp was being broadcast across the wrong axis when multiplied with k. This fix reshapes gDiffExp to [1, chunkSize, nChunks, ...]
2026-02-04 13:40:38 -08:00
Jeffrey Morgan f7102ba826
runner: discard compute results if sequence replaced mid-batch (#14072)
If a sequence is replaced in s.seqs while a batch is computing, the old logits can be decoded into the new sequence. This change rechecks the sequence pointer after compute and skips decoding for replaced entries, preventing stale results from being applied.
2026-02-04 13:19:48 -08:00
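A sketch of the recheck, with stand-in types: the sequence pointer captured when the batch was built is compared against the runner's slot after compute, and mismatched entries are skipped.

```go
package runner

// Sequence and runner are illustrative stand-ins for the real types.
type Sequence struct{ id int }

type runner struct {
	seqs []*Sequence // slots may be handed to new sequences mid-batch
}

// decodeBatch applies logits only to sequences that still occupy the slot
// they held when the batch was built; entries replaced during compute are
// skipped so stale logits are never decoded into the new sequence.
func (r *runner) decodeBatch(batchSeqs []*Sequence, logits [][]float32) {
	for i, seq := range batchSeqs {
		if r.seqs[i] != seq {
			continue // sequence was replaced mid-batch: discard this result
		}
		_ = logits[i] // decode into seq as usual (omitted)
	}
}
```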
Jeffrey Morgan cefabd79a8
Revert "cmd: claude launch improvements (#14064)" (#14071)
This reverts commit ee25219edd.
2026-02-04 09:10:37 -08:00
Jeffrey Morgan df70249520
server: optimize chatPrompt to reduce tokenization calls (#14040)
Change the truncation algorithm to start with all messages and remove
from the front until it fits, rather than adding messages one at a time
from the back. This reduces tokenization calls from O(n) to O(1) in the
common case where all messages fit in context.
2026-02-04 01:21:31 -08:00
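A sketch of the revised truncation strategy (the Message type and tokenizer callback are simplified stand-ins):

```go
package chatprompt

// Message is a simplified stand-in for the server's chat message type.
type Message struct {
	Role    string
	Content string
}

// truncate returns the largest suffix of msgs that fits in numCtx tokens.
// It starts with all messages and removes from the front, so when
// everything fits (the common case) countTokens runs exactly once,
// versus once per message when growing the window from the back.
func truncate(msgs []Message, numCtx int, countTokens func([]Message) int) []Message {
	for start := 0; start < len(msgs); start++ {
		if countTokens(msgs[start:]) <= numCtx {
			return msgs[start:]
		}
	}
	return nil // even the last message alone does not fit
}
```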
Jeffrey Morgan 77eb2ca619
model: add qwen3-next architecture (#14051) 2026-02-03 23:27:21 -08:00
Parth Sareen ee25219edd
cmd: claude launch improvements (#14064) 2026-02-03 19:33:58 -08:00
Jeffrey Morgan b1fccabb34
Revert "Update vendored llama.cpp to b7847" (#14061) 2026-02-03 18:39:36 -08:00
Bruce MacDonald a6355329bf
cmd: open browser on ollama signin when available (#14055)
When a browser is available, it is opened automatically to the connect URL when running the `ollama signin` command. The browser is not opened in any other unauthorized scenario.
2026-02-03 16:42:09 -08:00
Parth Sareen 0398b24b42
cmd: launch defaults (#14035) 2026-02-02 23:19:11 -08:00
Parth Sareen 75b1dddf91
cmd: launch extra params (#14039) 2026-02-03 02:03:33 -05:00
Parth Sareen e1e80ffc3e
cmd/config: move config location (#14034) 2026-02-02 22:48:51 -05:00
Aleksandr Vukmirovich 71896485fd
anthropic: add InputTokens to streaming response (#13934)
---------

Co-authored-by: ParthSareen <parth.sareen@ollama.com>
2026-02-02 18:29:37 -08:00
Jeffrey Morgan ef00199fb4
Update vendor ggml code to a5bb8ba4 (#13832)
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
Co-authored-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: Shalini Salomi Bodapati <Shalini.Salomi.Bodapati@ibm.com>
2026-02-02 17:31:59 -08:00
Jeffrey Morgan 8f4a008139
Add GLM-OCR vision model support (#14024) 2026-02-02 15:39:18 -08:00
Patrick Devine d8cc798c2b
glm 4.7 flash support on experimental engine (#13838) 2026-02-02 15:22:11 -08:00
Richard Lyons 6582f6da5c llm: Make "do load request" error message more informative 2026-02-02 11:13:21 -08:00
Jesse Gross 0334ffa625 server: use tiered VRAM-based default context length
Replace binary low VRAM mode with tiered VRAM thresholds that set
default context lengths for all models:

- < 24 GiB VRAM: 4,096 context
- 24-48 GiB VRAM: 32,768 context
- >= 48 GiB VRAM: 262,144 context
2026-02-02 10:47:09 -08:00
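The tiers above translate into a lookup along these lines (function name is illustrative):

```go
package server

// defaultContextLength maps available VRAM to the tiered default context
// lengths described in the commit message.
func defaultContextLength(vramBytes uint64) int {
	const GiB = 1 << 30
	switch {
	case vramBytes < 24*GiB:
		return 4096
	case vramBytes < 48*GiB:
		return 32768
	default: // >= 48 GiB
		return 262144
	}
}
```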
Jesse Gross d11fbd2c60 server: fix ollama ps showing configured instead of actual context length
When context length is clamped to the model's trained context length,
ollama ps now shows the actual clamped value instead of the originally
configured value.
2026-02-02 10:47:09 -08:00
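A one-liner captures the value ollama ps now reports (names illustrative): the configured context length clamped to the model's trained limit.

```go
package server

// effectiveContextLength is what ollama ps should report: the configured
// context length, clamped to the model's trained context length.
func effectiveContextLength(configured, trained int) int {
	return min(configured, trained)
}
```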