Commit graph

470 commits

Author SHA1 Message Date
Parth Sareen 0398b24b42
cmd: launch defaults (#14035) 2026-02-02 23:19:11 -08:00
Parth Sareen 75b1dddf91
cmd: launch extra params (#14039) 2026-02-03 02:03:33 -05:00
Parth Sareen e1e80ffc3e
cmd/config: move config location (#14034) 2026-02-02 22:48:51 -05:00
Patrick Devine d8cc798c2b
glm 4.7 flash support on experimental engine (#13838) 2026-02-02 15:22:11 -08:00
Jeffrey Morgan 6a7c3f188e
openclaw: run onboarding for fresh installs (#14006)
When launching OpenClaw without prior onboarding, run the onboarding
wizard instead of going straight to gateway. This ensures proper
gateway configuration (mode, token, etc.) before first use.

- Add onboarded() to check for wizard.lastRunAt marker in config
- Run onboard with --auth-choice skip --gateway-token ollama for fresh installs
- Existing installs (onboarding completed) run gateway directly
2026-02-01 13:46:45 -08:00
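
A minimal sketch of the onboarded() check described in the commit above, assuming the marker lives in a JSON config file; the config path and JSON layout are assumptions, and only the wizard.lastRunAt name comes from the commit.

    package cmd

    import (
        "encoding/json"
        "os"
    )

    // onboarded reports whether the onboarding wizard has already run,
    // using the wizard.lastRunAt marker as the signal.
    func onboarded(configPath string) bool {
        data, err := os.ReadFile(configPath)
        if err != nil {
            return false // no config yet: treat as a fresh install
        }
        var cfg struct {
            Wizard struct {
                LastRunAt string `json:"lastRunAt"`
            } `json:"wizard"`
        }
        if err := json.Unmarshal(data, &cfg); err != nil {
            return false
        }
        return cfg.Wizard.LastRunAt != ""
    }
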
Thanh Nguyen 27db7f806f
cmd/config: rename integration to openclaw (#13979)
---------

Co-authored-by: ParthSareen <parth.sareen@ollama.com>
2026-01-31 18:31:13 -05:00
Parth Sareen a0923cbdd0
cmd: ollama launch add placeholder text for selector (#13966) 2026-01-29 09:48:49 -08:00
Seokrin Taron Sung f92e362b2e
cmd: capitalize Ollama in serve command help text (#13965) 2026-01-29 09:47:53 -08:00
Gabe Goodhart 7b62c41060
cmd/config: use envconfig.Host() for base API in launch config packages (#13937) 2026-01-27 13:30:00 -08:00
Parth Sareen 3ab842b0f5
cmd: clawdbot config fixes (#13922) 2026-01-26 14:34:29 -08:00
Parth Sareen b8e8ef8929
cmd: ollama launch clawdbot (#13921) 2026-01-26 13:40:59 -08:00
Parth Sareen 465d124183
cmd: fix opencode config (#13894) 2026-01-24 18:42:56 -08:00
Parth Sareen d310e56fa3
cmd: add fallback for claude (#13892) 2026-01-24 18:26:01 -08:00
Parth Sareen aae6ecbaff
cmd: rename ollama config to ollama launch (#13871) 2026-01-23 18:40:40 -08:00
Jeffrey Morgan 66831dcf70
x/imagegen: fix image editing support (#13866)
- Fix panic in ollama show for image gen models (safe type assertion)
- Add vision capability for Flux2KleinPipeline models at create time
- Flatten transparent PNG images onto white background for better results
2026-01-23 15:37:17 -08:00
Parth Sareen 771d9280ec
cmd: ollama config fix droid model name configuration (#13856) 2026-01-23 11:44:22 -08:00
Parth Sareen 199c41e16e
cmd: ollama config command to help configure integrations to use Ollama (#13712) 2026-01-22 20:17:11 -08:00
Parth Sareen f52c21f457
fix: handle Enter key pressed during model loading (#13839) 2026-01-22 18:32:02 -08:00
Jeffrey Morgan 68e00c7c36
fix: prevent image generation models from loading during deletion (#13781)
Move the unload check (empty prompt + KeepAlive=0) before the image
generation model dispatch in GenerateHandler. This prevents models like
flux from being loaded into memory just to be immediately unloaded when
running `ollama rm`.

Also fix a bug in DeleteHandler where `args[0]` was used instead of
`arg` in the delete loop, causing only the first model to be unloaded
when deleting multiple models.
2026-01-19 12:48:34 -08:00
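
A minimal sketch of the DeleteHandler fix described above; the unload callback is an illustrative stand-in for the real stop/unload client call.

    package cmd

    // unloadBeforeDelete unloads every model being removed. The bug was
    // referencing args[0] inside the loop, so only the first model was
    // ever unloaded when deleting several at once.
    func unloadBeforeDelete(args []string, unload func(string) error) error {
        for _, arg := range args {
            if err := unload(arg); err != nil { // was: unload(args[0])
                return err
            }
        }
        return nil
    }
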
Jeffrey Morgan 634c416645
Add experimental image generation fields to /api/generate (#13753)
Request fields (experimental):
- width: image width (max 4096)
- height: image height (max 4096)
- steps: denoising steps
- seed: random seed

Response fields (experimental):
- images: base64-encoded generated images
- completed: current step progress
- total: total steps

Other changes:
- Fix lifecycle bug where image models wouldn't unload (refCount issue)
- Fix "headers already written" error on Ctrl+C during streaming
- Add gin middleware for OpenAI /v1/images/generations compatibility
- Update CLI to use /api/generate with progress bar
- Add preload support in interactive mode
2026-01-17 18:27:41 -08:00
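
A request shape consistent with the experimental fields listed above (the endpoint is from the commit; the model name and values are illustrative):

    curl http://localhost:11434/api/generate -d '{"model": "z-image", "prompt": "a lighthouse at dusk", "width": 1024, "height": 1024, "steps": 20, "seed": 42}'

While streaming, completed and total report per-step progress; the final response carries the base64-encoded images.
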
Patrick Devine a077d996e3
Fix create and show commands for experimental models (#13741)
* x: make `ollama create --experimental` import from safetensors

This change allows pulling in safetensors models into the new experimental model format, and also
fixes the `ollama show` command to be able to correctly display the model information.

* gofumpt the linter

* gofumpt the linter again

* validate the model name
2026-01-16 14:31:55 -08:00
Parth Sareen 75d7b5f926
cmd: enable multi-line input and shift enter (#13694) 2026-01-14 17:52:46 -08:00
Parth Sareen d06acbcb19
x/cmd: enable web search and web fetch with flag (#13690) 2026-01-12 13:59:40 -08:00
Jeffrey Morgan 9667c2282f
x/imagegen: add naive TeaCache and FP8 quantization support (#13683)
TeaCache:
- Timestep embedding similarity caching for diffusion models
- Polynomial rescaling with configurable thresholds
- Reduces transformer forward passes by ~30-50%

FP8 quantization:
- Support for FP8 quantized models (8-bit weights with scales)
- QuantizedMatmul on Metal, Dequantize on CUDA
- Client-side quantization via ollama create --quantize fp8

Other bug fixes:
- Fix `/api/show` API for image generation models
- Server properly returns model info (architecture, parameters, quantization)
- Memory allocation optimizations
- CLI improvements for image generation
2026-01-12 13:45:22 -08:00
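
A rough sketch of the naive TeaCache policy described above, assuming a scalar threshold; the polynomial coefficients here are illustrative, not the shipped values.

    package teacache

    import "math"

    // reuseCached accumulates the (rescaled) relative change between
    // successive timestep embeddings and reports whether the cached
    // transformer output can be reused for this step.
    func reuseCached(prev, cur []float32, acc *float64, threshold float64) bool {
        var num, den float64
        for i := range cur {
            num += math.Abs(float64(cur[i] - prev[i]))
            den += math.Abs(float64(prev[i]))
        }
        d := num / (den + 1e-9) // relative L1 change in the embedding
        *acc += 0.1*d + 2.5*d*d // polynomial rescaling (illustrative)
        if *acc < threshold {
            return true // change still small: skip the forward pass
        }
        *acc = 0 // run the real forward pass and reset
        return false
    }
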
Jeffrey Morgan 2584940016
Add z-image image generation prototype (#13659) 2026-01-09 21:09:46 -08:00
Parth Sareen 1ef4241727
x: request access for all commands, add welcome message (#13662) 2026-01-09 18:20:39 -08:00
Parth Sareen 12e2b3514a
x: agent loop ux improvements (#13635) 2026-01-07 01:27:15 -08:00
Parth Sareen 76912c062a
x: add experimental agent loop (#13628) 2026-01-05 23:38:40 -08:00
Jeffrey Morgan 8852220f59
add REQUIRES command to Modelfile (#13361) 2025-12-18 13:21:29 -08:00
Eloi Torrents dac4f17fea
cmd/bench: fix binary name in README (#13276) 2025-12-10 14:16:58 -08:00
Julia Scheaffer 56b8fb024c
cmd/bench: fix options table in cmd/bench/README.md (#13216) 2025-12-10 14:07:48 -08:00
Eloi Torrents a03223b86f
cmd/bench: support writing benchmark output to file (#13263)
* cmd/bench: support writing benchmark output to file

This changes Ollama to allow the bench command to write benchmark
results to a user-specified output file instead of stdout when the
--output flag is provided.

---------

Co-authored-by: Patrick Devine <patrick@infrahq.com>
2025-12-04 13:22:41 -08:00
Patrick Devine d3e0a0dee4
model: ministral w/ llama4 scaling (#13292)
This change:

* fixes rope scaling in the mistral converter
* updates ministral to include llama4 scaling
* includes a new ministral parser for parsing reasoning and tool calling

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
2025-12-01 23:20:14 -08:00
Patrick Devine d7fd72193f
tests: basic benchmarking test framework (#12964)
This change adds a basic benchmarking test framework for Ollama which can
be used to determine the prefill, eval, load duration, and total duration
for running a given model or models.
2025-11-15 18:17:40 -08:00
nicole pardal 1ca608bcd1
embeddings: added embedding command for CLI (#12795)
Co-authored-by: A-Akhil <akhilrahul70@gmail.com>

This PR introduces a new ollama embed command that allows users to generate embeddings directly from the command line.

- Added `ollama embed MODEL [TEXT...]` command for generating text embeddings
- Supports both direct text arguments and stdin piping for scripted workflows
- Outputs embeddings as JSON arrays (one per line)
2025-11-05 11:58:03 -08:00
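
Usage, following the syntax described above (the model name is illustrative):

    ollama embed nomic-embed-text "The sky is blue"
    cat sentences.txt | ollama embed nomic-embed-text
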
Patrick Devine f89fc1cadd
bugfix: show connection string for interactive cli usage (#12930) 2025-11-05 11:55:04 -08:00
Michael Yang ed78e127d0
fix(cmd): unload model before removal (#12832)
this change fixes two bugs with `ollama rm`:

1. before a model is removed, it will first be stopped. this only
   happens for the first argument and is skipped for all other models
2. models are unloaded indiscriminately. this errors for cloud models
   and the unload should be skipped for them
2025-10-30 10:41:49 -07:00
Patrick Devine b04e46da3e
bugfix: restore the current runOptions if loading fails in the CLI (#12402)
There are two bugs when using `/load <model>` for a model that doesn't exist, namely:
  1. it will not restore the current model settings if the current model is a thinking model; and
  2. it will crash if the current model is a non-thinking model

This bug fix saves the current runOptions and then restores them if the model load
doesn't happen. It also fixes the crash happening for non-thinking models.
2025-09-25 18:30:45 -07:00
Patrick Devine 5a56ff3cf0
cli: add device signin flow when doing ollama push (#12405) 2025-09-25 15:04:43 -07:00
Patrick Devine 64883e3c4c
auth: fix problems with the ollama keypairs (#12373)
* auth: fix problems with the ollama keypairs

This change adds several fixes including:
  - reading in the pubkey files correctly
  - fixing the push unit test to create a keypair file in a temp directory
  - not returning 500 errors for normal status errors
2025-09-22 23:20:20 -07:00
Patrick Devine 8b894933a7
engine: add remote proxy (#12307) 2025-09-17 14:40:53 -07:00
fengyuchuanshen 8a7e2055d2
cmd: use slices.Contains to simplify code (#12249) 2025-09-11 09:57:31 -07:00
Patrick Devine 026bc29237
cli: show the default context length env setting in online help (#11928) 2025-08-15 14:59:52 -07:00
Michael Yang fa7776fd24
gpt-oss (#11672)
* bf16

* tests

* gpt-oss

* enable gptoss for engine

* rough estimate

* convert to mxfp4

* handle safetensors U8

* clamp glu/linear

* update tokenizer

* MXFP4 support

This implements the Open Compute Microscaling (MX) FP4 format
as a tensor type with backend implementations focusing
on mulmat and mulmatid on CPU, CUDA, and Metal.

* Unit tests for MXFP4 support

This exercises various operations and shapes on both CPU and GPU (if detected
on the system)

* cuda graph

* unit test adjustments

* cuda: optimize memory access

Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4

* mac: fix crash on old macos versions

cblas_sgemm is only supported on v13.3 and up, however bf16 is
only supported on v14+ so we were falling back to ggml-blas and
crashing on bf16 tensors.  Checking for the function being null
seems to be the simplest way to conditionally avoid registering the
backend.

* server: Minimum context length for gptoss

This model requires a minimum context length of 8192 to function
effectively. Users can set higher values through all normal mechanisms
but lower values will be silently reset.

* ggml: Multiply by numParallel for gptoss sliding window

When computing the graph size estimate, the context size is already
multiplied by numParallel so estimates reflect that. However, since
sliding window models use a smaller, fixed context size, they need
to manually take numParallel into account.

* gpt-oss integration

includes harmony parser and thinking levels, etc.

* fix sync

* fix tests

* fix lint

---------

Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
Co-authored-by: Jesse Gross <jesse@ollama.com>
Co-authored-by: Devon Rifkin <drifkin@drifkin.net>
2025-08-05 12:21:16 -07:00
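
A minimal sketch of the gpt-oss minimum-context rule described above: values below 8192 are silently raised, larger user settings pass through.

    package server

    func minContextForGptoss(numCtx int) int {
        if numCtx < 8192 {
            return 8192
        }
        return numCtx
    }
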
Patrick Devine 80b538e312
cli: catch upstream errors gracefully (#11512) 2025-07-23 22:16:55 -07:00
Patrick Devine 3bac5cba60
Fix GetModelInfo (#11496)
---------

Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-07-22 13:40:47 -07:00
frob 802ad16ce4
docs: add the no-Modelfile function of ollama create (#9077) 2025-07-16 22:16:10 -07:00
Parth Sareen d73f8aa8c3
cmd: add default assistant role to message construction (#11431) 2025-07-16 11:18:16 -07:00
Daniel Hiltgen 34088dbcfb
API/CLI context enhancements (#11331)
* API: expose context size of loaded models

* CLI: add context UX

This adds a column in the ps output to show the model's context size.
2025-07-08 11:59:06 -07:00
Daniel Hiltgen 82ad1dbc07
mac: handle "keep" named apps (#11031)
When a user elects to keep the existing app, the
new Ollama is named `Ollama 2.app`.
This fixes the app startup flow to handle this naming pattern.
2025-06-09 16:29:57 -07:00
Daniel Hiltgen feeabdadd2
spawn desktop quickly (#11011)
Give the desktop app a hint to start fast.
2025-06-08 09:34:52 -07:00
Daniel Hiltgen a8ed68bd93
launch app hidden (#10962)
When starting the app in the background, start it hidden.
2025-06-06 14:06:29 -07:00
Daniel Hiltgen 2ae65ae471
win: handle more than 2048 processes (#10997)
Fix an array out of bounds crash
2025-06-06 14:06:09 -07:00
Devon Rifkin 5f57b0ef42
add thinking support to the api and cli (#10584)
- Both `/api/generate` and `/api/chat` now accept a `"think"`
  option that allows specifying whether thinking mode should be on or
  not
- Templates get passed this new option so, e.g., qwen3's template can
  put `/think` or `/no_think` in the system prompt depending on the
  value of the setting
- Models' thinking support is inferred by inspecting model templates.
  The prefix and suffix the parser uses to identify thinking support is
  also automatically inferred from templates
- Thinking control & parsing is opt-in via the API to prevent breaking
  existing API consumers. If the `"think"` option is not specified, the
  behavior is unchanged from previous versions of ollama
- Add parsing for thinking blocks in both streaming/non-streaming mode
  in both `/generate` and `/chat`
- Update the CLI to make use of these changes. Users can pass `--think`
  or `--think=false` to control thinking, or during an interactive
  session they can use the commands `/set think` or `/set nothink`
- A `--hidethinking` option has also been added to the CLI. This makes
  it easy to use thinking in scripting scenarios like
  `ollama run qwen3 --think --hidethinking "my question here"` where you
  just want to see the answer but still want the benefits of thinking
  models
2025-05-28 19:38:52 -07:00
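
A request consistent with the new option (qwen3 is the model the commit itself references; the message content is illustrative):

    curl http://localhost:11434/api/chat -d '{"model": "qwen3", "messages": [{"role": "user", "content": "why is the sky blue?"}], "think": true}'
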
Daniel Hiltgen 7359b02707
win: detect background upgrade in progress (#10785)
Give the user a helpful error instead of showing
connection refused errors.
2025-05-21 10:46:56 -07:00
Daniel Hiltgen 27da2cddc5
Fix lingering Q4_0 help reference (#10720) 2025-05-15 16:33:23 -07:00
Bruce MacDonald feb8923ada
cmd: add ellipses to truncated show metadata (#10717)
When a piece of information has been truncated in the show output, an ellipsis is added to indicate that more data has not been displayed
2025-05-15 15:45:52 -07:00
Jeffrey Morgan c7f4ae7b9c
server: add webp image input support (#10653) 2025-05-12 20:41:42 -07:00
Bruce MacDonald 3fa78598a1
cmd: strip single quotes from image page (#10636) 2025-05-09 18:05:43 -07:00
Michael Yang 6e9a7a2568
lint: enable usetesting, disable tenv (#10594) 2025-05-08 11:42:14 -07:00
Daniel Hiltgen 424810450f
Move quantization to new backend (#10363)
* Move quantization logic to GGML via new backend

This moves the model-aware logic to Go code and calls GGML's quantization code for model creation.

* Remove "add model quantizations"

This is no longer needed now that quantization is implemented in Go+GGML code directly.
2025-05-06 11:20:48 -07:00
Michael Yang d931ee8f22
create blobs in parallel (#10135)
* default max term height
* error on out of tree files
2025-05-05 11:59:26 -07:00
Devon Rifkin dd93e1af85
Revert "increase default context length to 4096 (#10364)"
This reverts commit 424f648632.
2025-04-28 16:54:11 -07:00
Devon Rifkin 424f648632
increase default context length to 4096 (#10364)
* increase default context length to 4096

We lower the default numParallel from 4 to 2 and use these "savings" to
double the default context length from 2048 to 4096.

We're memory neutral in cases when we previously would've used
numParallel == 4, but we add the following mitigation to handle some
cases where we would have previously fallen back to 1x2048 due to low
VRAM: we decide between 2048 and 4096 using a runtime check, choosing
2048 if we're on a one GPU system with total VRAM of <= 4 GB. We
purposefully don't check the available VRAM because we don't want the
context window size to change unexpectedly based on the available VRAM.

We plan on making the default even larger, but this is a relatively
low-risk change we can make to quickly double it.

* fix tests

add an explicit context length so they don't get truncated. The code
that converts -1 from being a signal for doing a runtime check isn't
running as part of these tests.

* tweak small gpu message

* clarify context length default

also make it actually show up in `ollama serve --help`
2025-04-22 16:33:24 -07:00
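
A sketch of that runtime check, assuming GPU discovery exposes per-device total VRAM; the gpu type is illustrative.

    package server

    type gpu struct{ TotalVRAM uint64 }

    // defaultContextLength picks 2048 only on a single-GPU system with
    // at most 4 GiB of total VRAM; everything else gets 4096.
    func defaultContextLength(gpus []gpu) int {
        if len(gpus) == 1 && gpus[0].TotalVRAM <= 4<<30 {
            return 2048
        }
        return 4096
    }
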
greengrass821 0806521642
cmd: add support for escaping ~ in filepath (#10339)
Co-authored-by: tooth paste <tooth_paste91@Poorneshwars-MacBook-Pro.local>
2025-04-20 15:21:48 -07:00
Blake Mizerany 1e7f62cb42
cmd: add retry/backoff (#10069)
This commit adds retry/backoff to the registry client for pull requests.

Also, revert progress indication to match original client's until we can
"get it right."

Also, make WithTrace wrap existing traces instead of clobbering them.
This allows clients to compose traces.
2025-04-15 23:24:44 -07:00
CYJiang 64a9cc8f05
cmd: add missing file close in tests (#10179) 2025-04-14 07:49:41 -04:00
frob ccc8c6777b
cleanup: remove OLLAMA_TMPDIR and references to temporary executables (#10182)
* cleanup: remove OLLAMA_TMPDIR
* cleanup: ollama doesn't use temporary executables anymore

---------

Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-04-08 15:01:39 -07:00
Bruce MacDonald 9876c9faa4
chore(all): replace instances of interface with any (#10067)
Both interface{} and any (which is just an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.
2025-04-02 09:44:27 -07:00
Bruce MacDonald e172f095ba
api: return model capabilities from the show endpoint (#10066)
With support for multimodal models becoming more varied and common, it is important for clients to be able to easily see what capabilities a model has. Returning these from the show endpoint will allow clients to easily see what a model can do.
2025-04-01 15:21:46 -07:00
Patrick Devine 6d1103048e
fix: show correct bool value for kv in verbose show information (#9928) 2025-03-21 11:13:54 -07:00
Patrick Devine 2c8b484643
fix: correctly save in interactive mode (#9788)
This fixes the case where a FROM line in a previous modelfile points to a
file which may or may not be present in a different ollama instance. We
shouldn't rely on the filename, though; instead, check whether the FROM
line is a valid model name and point to that.
2025-03-15 12:09:02 -07:00
Patrick Devine 4bed739259
add verbose mode to the show command (#9640)
Add metadata and tensor information to the show command to be able to
see more information about a model. This outputs the same data as
shown on the model details page on ollama.com
2025-03-13 14:24:27 -07:00
frob b3af953a55
cli: don't exit for invalid model during /load. (#9576)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-03-11 23:42:53 -07:00
Michael Yang 05a01fdecb ml/backend/ggml: consolidate system info logging
- output backend system info when initializing the backend. this ensures
  this information is always present without needing to be called
  explicitly
- convert to structured logging
- enumerate devices rather than backends since devices are ordered
- track device indices grouped by device name
2025-03-04 15:14:31 -08:00
Daniel Hiltgen 1fdb351c37
New engine: vision models and auto-fallback (#9113)
* Include unified vision layers in memory prediction

For newer vision models with a single gguf, include
the projection estimates.

* Adjust CLI to handle both styles of vision model metadata

* Wire up new tokenizers for new engine

If we're loading the new engine, utilize the new model
text processor instead of calling into cgo wrappers for
llama.cpp.  This also cleans up some tech debt from the
older tokenization flow for the C++ server which was
no longer used.

This also adjusts the grammar handling logic to pass
through to the new engine instead of utilizing the cgo
schema to grammar call.

* Lay foundation for auto selection of new engine
2025-03-04 09:03:46 -08:00
CYJiang d25efe3954
cmd: add default err return for stop (#9458) 2025-03-03 12:13:41 -08:00
yuiseki d721a02e7d
test: add test cases for ListHandler (#9146) 2025-02-19 13:24:27 -08:00
Jesse Gross ed443a0393 Runner for Ollama engine
This provides integration with the new Ollama engine
(5824541 next ollama runner (#7913)) and the rest of the Ollama
infrastructure such as the runner and Ollama server.

In addition, it also builds out the KV cache infrastructure to
support requirements of how Ollama runs models such as:
 - Parallel processing
 - Memory management for defragmentation and shifting
 - Multi-modal models

Both old and new engines continue to be supported. By default, only
the old engine is used. To enable the new engine:

Start the server with the OLLAMA_NEW_ENGINE environment variable set:
OLLAMA_NEW_ENGINE=1 ./ollama serve

Start a model that is supported by the Ollama engine. This one is Llama 3.1 8b Q4_K_M:
./ollama run jessegross/llama3.1
2025-02-13 17:09:26 -08:00
Patrick Devine a420a453b4
fix default modelfile for create (#8452) 2025-01-16 01:14:04 -08:00
Patrick Devine 32bd37adf8
make the modelfile path relative for ollama create (#8380) 2025-01-10 16:14:08 -08:00
Patrick Devine 8bccae4f92
show a more descriptive error in the client if it is newer than the server (#8351) 2025-01-09 10:12:30 -08:00
Patrick Devine 86a622cbdc
Update the /api/create endpoint to use JSON (#7935)
Replaces `POST /api/create` to use JSON instead of a Modelfile.

This is a breaking change.
2024-12-31 18:02:30 -08:00
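
A hedged example of the JSON form; the field names follow the current /api/create documentation and the values are illustrative.

    curl http://localhost:11434/api/create -d '{"model": "mario", "from": "llama3.2", "system": "You are Mario from Super Mario Bros."}'
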
Patrick Devine dd352ab27f
fix crash bug with /save when quotes are used (#8208) 2024-12-21 22:31:37 -08:00
Blake Mizerany b1fd7fef86
server: more support for mixed-case model names (#8017)
Fixes #7944
2024-12-11 15:29:59 -08:00
Daniel Hiltgen 4879a234c4
build: Make target improvements (#7499)
* llama: wire up builtin runner

This adds a new entrypoint into the ollama CLI to run the cgo built runner.
On Mac arm64, this will have GPU support, but on all other platforms it will
be the lowest common denominator CPU build.  After we fully transition
to the new Go runners more tech-debt can be removed and we can stop building
the "default" runner via make and rely on the builtin always.

* build: Make target improvements

Add a few new targets and help for building locally.
This also adjusts the runner lookup to favor local builds, then
runners relative to the executable, and finally payloads.

* Support customized CPU flags for runners

This implements a simplified custom CPU flags pattern for the runners.
When built without overrides, the runner name contains the vector flag
we check for (AVX) to ensure we don't try to run on unsupported systems
and crash.  If the user builds a customized set, we omit the naming
scheme and don't check for compatibility.  This avoids checking
requirements at runtime, so that logic has been removed as well.  This
can be used to build GPU runners with no vector flags, or CPU/GPU
runners with additional flags (e.g. AVX512) enabled.

* Use relative paths

If the user checks out the repo in a path that contains spaces, make gets
really confused so use relative paths for everything in-repo to avoid breakage.

* Remove payloads from main binary

* install: clean up prior libraries

This removes support for v0.3.6 and older versions (before the tar bundle)
and ensures we clean up prior libraries before extracting the bundle(s).
Without this change, runners and dependent libraries could leak when we
update and lead to subtle runtime errors.
2024-12-10 09:47:19 -08:00
Parth Sareen de52b6c2f9
bugfix: "null" value json mode (#7979) 2024-12-06 14:13:15 -08:00
Parth Sareen c6c526275d
api: add generate endpoint for structured outputs (#7939) 2024-12-04 17:37:12 -08:00
Parth Sareen 630e7dc6ff
api: structured outputs - chat endpoint (#7900)
Adds structured outputs to chat endpoint
---------

Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Hieu Nguyen <hieunguyen1053@outlook.com>
2024-12-04 16:31:19 -08:00
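
A request shape consistent with structured outputs on the chat endpoint, assuming the format field takes a JSON schema as in current docs (the model and schema are illustrative):

    curl http://localhost:11434/api/chat -d '{"model": "llama3", "messages": [{"role": "user", "content": "Name a color."}], "format": {"type": "object", "properties": {"color": {"type": "string"}}, "required": ["color"]}}'
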
Sam 1bdab9fdb1
llm: introduce k/v context quantization (vRAM improvements) (#6279) 2024-12-03 15:57:19 -08:00
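
Typical usage as this change is commonly documented; the env var names are not stated in this log and are an assumption, and quantized K/V generally requires flash attention to be enabled:

    OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=q8_0 ollama serve
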
Jeffrey Morgan ff6c2d6dc8
cmd: don't rely on reading repo file for test (#7898) 2024-11-30 14:12:53 -08:00
frob 30e88d7f31
cmd: don't submit svg files as images for now (#7830) 2024-11-25 16:43:29 -08:00
Bruce MacDonald a210ec74d2
cmd: print location of model after pushing (#7695)
After a user pushes their model it is not clear what to do next. Add a link
to the output of `ollama push` that tells the user where their model can now
be found.
2024-11-25 09:40:16 -08:00
Bruce MacDonald 7b5585b9cb
server: remove out of date anonymous access check (#7785)
In the past the ollama.com server would return a JWT that contained
information about the user being authenticated. This was used to return
different error messages to the user. This is no longer possible since the
token used to authenticate does not contain information about the user
anymore. Removing this code that no longer works.

Follow up changes will improve the error messages returned here, but good to
clean up first.
2024-11-22 11:57:35 -08:00
Daniel Hiltgen d88972ea48
Be quiet when redirecting output (#7360)
This avoids emitting the progress indicators to stderr, and the interactive
prompts to the output file or pipe.  Running "ollama run model > out.txt"
now exits immediately, and "echo hello | ollama run model > out.txt"
produces zero stderr output and a typical response in out.txt
2024-11-22 08:04:54 -08:00
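
A minimal sketch of the terminal check behind this behavior, assuming golang.org/x/term for the TTY test:

    package cmd

    import (
        "os"

        "golang.org/x/term"
    )

    // isTTY reports whether f is attached to a terminal; progress
    // indicators and interactive prompts are emitted only when it is.
    func isTTY(f *os.File) bool {
        return term.IsTerminal(int(f.Fd()))
    }
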
湛露先生 eaaf5d309d
cmd: delete duplicated call to sb.Reset() (#7308)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2024-11-21 11:20:48 -08:00
Blake Mizerany 67691e410d
cmd: preserve exact bytes when displaying template/system layers (#7586) 2024-11-13 23:53:30 -08:00
Daniel Hiltgen 35ec7f079f
Fix unicode output on windows with redirect to file (#7358)
If we're not writing out to a terminal, avoid setting the console mode
on windows, which corrupts the output file.
2024-10-25 13:43:16 -07:00
Patrick Devine d78fb62056
default to "FROM ." if a Modelfile isn't present (#7250) 2024-10-22 13:32:24 -07:00
Patrick Devine c7cb0f0602
image processing for llama3.2 (#6963)
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
2024-10-18 16:12:35 -07:00
Jesse Gross 7fe3902552 cli: Send all images in conversation history
Currently the CLI only sends images from the most recent image-
containing message. This prevents doing things like sending
one message with an image and then a follow-up message with a
second image and asking for a comparison based on additional
information not present in any text that was output.

It's possible that some models have a problem with this but the
CLI is not the right place to do this since any adjustments are
model-specific and should affect all clients.

Both llava:34b and minicpm-v do reasonable things with multiple
images in the history.
2024-10-10 11:21:51 -07:00
Alex Mavrogiannis f40bb398f6
Stop model before deletion if loaded (fixed #6957) (#7050) 2024-10-01 15:45:43 -07:00
Patrick Devine abed273de3
add "stop" command (#6739) 2024-09-11 16:36:21 -07:00
Michael Yang ecab6f1cc5 refactor show output
fixes line wrapping on long texts
2024-09-11 14:23:09 -07:00
Daniel Hiltgen 6719097649
llm: make load time stall duration configurable via OLLAMA_LOAD_TIMEOUT
With the new very large parameter models, some users are willing to wait for
a very long time for models to load.
2024-09-05 14:00:08 -07:00
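
Usage sketch, assuming the variable accepts a Go-style duration (the exact value format is not stated in this log):

    OLLAMA_LOAD_TIMEOUT=10m ollama serve
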
Daniel Hiltgen b05c9e83d9
Introduce GPU Overhead env var (#5922)
Provide a mechanism for users to set aside an amount of VRAM on each GPU
to make room for other applications they want to start after Ollama, or to
work around memory prediction bugs
2024-09-05 13:46:35 -07:00
Vimal Kumar 5f7b4a5e30
fix(cmd): show info may have nil ModelInfo (#6579) 2024-08-31 21:12:17 -07:00
Patrick Devine 0c819e167b
convert safetensor adapters into GGUF (#6327) 2024-08-23 11:29:56 -07:00
Michael Yang beb49eef65 create bert models from cli 2024-08-20 17:27:34 -07:00
longtao 0a8d6ea86d
Fix typo and improve readability (#5964)
* Fix typo and improve readability

Summary:
* Rename updatAvailableMenuID to updateAvailableMenuID
* Replace unused cmd parameter with _ in RunServer function
* Fix typos in comments

(cherry picked from commit 5b8715f0b04773369e8eb1f9e6737995a0ab3ba7)

* Update api/client.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-08-13 17:54:19 -07:00
Josh f7e3b9190f
cmd: spinner progress for transfer model data (#6100) 2024-08-12 11:46:32 -07:00
Michael Yang b732beba6a lint 2024-08-01 17:06:06 -07:00
Michael Yang c4c84b7a0d
Merge pull request #5196 from ollama/mxyng/messages-2
include modelfile messages
2024-07-31 10:18:17 -07:00
Michael Yang 5c1912769e
Merge pull request #5473 from ollama/mxyng/environ
fix: environ lookup
2024-07-31 10:18:05 -07:00
Daniel Hiltgen 1a83581a8e
Merge pull request #5895 from dhiltgen/sched_faq
Better explain multi-gpu behavior
2024-07-29 14:25:41 -07:00
Michael Yang 38d9036b59
Merge pull request #5992 from ollama/mxyng/save
fix: model save
2024-07-29 09:53:19 -07:00
Tibor Schmidt f3d7a481b7
feat: add support for min_p (resolve #1142) (#1825) 2024-07-27 14:37:40 -07:00
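
Usage via request options (the model name and cutoff value are illustrative):

    curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Tell me a joke", "options": {"min_p": 0.05}}'
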
Michael Yang a250c2cb13 display messages 2024-07-26 13:39:57 -07:00
Michael Yang 3d9de805b7 fix: model save
stop parameter is saved as a slice which is incompatible with modelfile
parsing
2024-07-26 13:23:06 -07:00
Michael Yang 15af558423 include modelfile messages 2024-07-26 11:40:11 -07:00
Daniel Hiltgen 830fdd2715 Better explain multi-gpu behavior 2024-07-23 15:16:38 -07:00
Michael Yang 55cd3ddcca bool 2024-07-22 11:27:21 -07:00
Michael Yang 4f1afd575d host 2024-07-22 11:25:30 -07:00
Daniel Hiltgen cc269ba094 Remove no longer supported max vram var
The OLLAMA_MAX_VRAM env var was a temporary workaround for OOM
scenarios.  With Concurrency this was no longer wired up, and the simplistic
value doesn't map to multi-GPU setups.  Users can still set `num_gpu`
to limit memory usage to avoid OOM if we get our predictions wrong.
2024-07-22 09:08:11 -07:00
Patrick Devine 057d31861e
remove template (#5655) 2024-07-13 20:56:24 -07:00
Patrick Devine 23ebbaa46e Revert "remove template from tests"
This reverts commit 9ac0a7a50b.
2024-07-12 15:47:17 -07:00
Patrick Devine 9ac0a7a50b remove template from tests 2024-07-12 15:41:31 -07:00
royjhan 5f034f5b63
Include Show Info in Interactive (#5342) 2024-06-28 13:15:52 -07:00
royjhan b910fa9010
Ollama Show: Check for Projector Type (#5307)
* Check exists projtype

* Maintain Ordering
2024-06-28 11:30:16 -07:00
Michael Yang 123a722a6f
zip: prevent extracting files into parent dirs (#5314) 2024-06-26 21:38:21 -07:00
Blake Mizerany 2aa91a937b
cmd: defer stating model info until necessary (#5248)
This commit changes the 'ollama run' command to defer fetching model
information until it really needs it. That is, when in interactive mode.

It also removes one such case where the model information is fetched in
duplicate, just before calling generateInteractive and then again, first
thing, in generateInteractive.

This positively impacts the performance of the command:

    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.168 total
    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.220 total
    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.217 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 4% cpu 0.652 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.01s user 0.01s system 5% cpu 0.498 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with or would you like to chat?

    ./after run llama3 'hi'  0.01s user 0.01s system 3% cpu 0.479 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
2024-06-24 20:14:03 -07:00
royjhan fedf71635e
Extend api/show and ollama show to return more model info (#4881)
* API Show Extended

* Initial Draft of Information

Co-Authored-By: Patrick Devine <pdevine@sonic.net>

* Clean Up

* Descriptive arg error messages and other fixes

* Second Draft of Show with Projectors Included

* Remove Chat Template

* Touches

* Prevent wrapping from files

* Verbose functionality

* Docs

* Address Feedback

* Lint

* Resolve Conflicts

* Function Name

* Tests for api/show model info

* Show Test File

* Add Projector Test

* Clean routes

* Projector Check

* Move Show Test

* Touches

* Doc update

---------

Co-authored-by: Patrick Devine <pdevine@sonic.net>
2024-06-19 14:19:02 -07:00
Patrick Devine c69bc19e46
move OLLAMA_HOST to envconfig (#5009) 2024-06-12 18:48:16 -04:00
Michael Yang 201d853fdf nolintlint 2024-06-04 11:13:30 -07:00
Michael Yang e40145a39d lint 2024-06-04 11:13:30 -07:00
Michael Yang 8ffb51749f nolintlint 2024-06-04 11:13:30 -07:00
Michael Yang 04f3c12bb7 replace x/exp/slices with slices 2024-06-04 11:13:30 -07:00
Josh Yan 914f68f021 replaced duplicate call with variable 2024-05-30 10:38:07 -07:00
Josh Yan bd1d119ba9 fixed japanese characters deleted at end of line 2024-05-30 10:24:21 -07:00
Lei Jitang a03be18189
Fix OLLAMA_LLM_LIBRARY with wrong map name and add more env vars to help message (#4663)
* envconfig/config.go: Fix wrong description of OLLAMA_LLM_LIBRARY

Signed-off-by: Lei Jitang <leijitang@outlook.com>

* serve: Add more env to help message of ollama serve

Add more environment variables to `ollama serve --help`
to let users know what can be configured.

Signed-off-by: Lei Jitang <leijitang@outlook.com>

---------

Signed-off-by: Lei Jitang <leijitang@outlook.com>
2024-05-30 09:36:51 -07:00
Patrick Devine 4cc3be3035
Move envconfig and consolidate env vars (#4608) 2024-05-24 14:57:15 -07:00
Josh 9f18b88a06
Merge pull request #4566 from ollama/jyan/shortcuts
add Ctrl + W shortcut
2024-05-21 22:49:36 -07:00
Josh Yan 353f83a9c7 add Ctrl + W shortcut 2024-05-21 16:55:09 -07:00
Patrick Devine d355d2020f add fixes for llama 2024-05-20 16:13:57 -07:00
Patrick Devine ccdf0b2a44
Move the parser back + handle utf16 files (#4533) 2024-05-20 11:26:45 -07:00
Patrick Devine 105186aa17
add OLLAMA_NOHISTORY to turn off history in interactive mode (#4508) 2024-05-18 11:51:57 -07:00
Josh Yan 3d90156e99 removed comment 2024-05-16 14:12:03 -07:00
Josh Yan 26bfc1c443 go fmt'd cmd.go 2024-05-15 17:26:39 -07:00
Josh Yan 799aa9883c go fmt'd cmd.go 2024-05-15 17:24:17 -07:00
Josh Yan c9e584fb90 updated double-width display 2024-05-15 16:45:24 -07:00