
Integration Tests

This directory contains integration tests that exercise Ollama end-to-end to verify its behavior.

By default, these tests are disabled, so go test ./... will exercise only unit tests. To run the integration tests, you must pass the integration tag: go test -tags=integration ./... Some tests require additional build tags to enable, which keeps scoped test runs to a reasonable duration. For example, testing a broad set of models requires -tags=integration,models and a longer timeout (~60m or more, depending on the speed of your GPU). To view the current set of tag combinations, use find integration -type f | xargs grep "go:build"
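Putting the commands above together, a typical local run might look like the following sketch (the 60m timeout is the suggested value from the note above; adjust it for your GPU):

```shell
# Unit tests only -- integration tests are excluded by default
go test ./...

# Integration tests
go test -tags=integration ./...

# Broad model coverage; needs a longer timeout depending on GPU speed
go test -tags=integration,models -timeout 60m ./...

# List which build-tag combinations the integration tests currently use
find integration -type f | xargs grep "go:build"
```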

The integration tests have two modes of operation.

  1. By default, on Unix systems, they will start the server on a random port, run the tests, and then shut down the server. On Windows, you must ALWAYS run the server yourself on OLLAMA_HOST for the tests to work.
  2. If OLLAMA_TEST_EXISTING is set to a non-empty string, the tests will run against an existing running server, which can be remote, based on your OLLAMA_HOST environment variable.
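For the second mode, a run against an already-running server might look like this (the host address here is only an example; OLLAMA_HOST may point at a remote server):

```shell
# Point the tests at an existing server instead of starting one
export OLLAMA_HOST=127.0.0.1:11434   # example address
export OLLAMA_TEST_EXISTING=1        # any non-empty value enables this mode
go test -tags=integration ./...
```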

Important

Before running the tests locally without the "test existing" setting, compile ollama from the top of the source tree with go build ., and build GPU support with cmake if applicable on your platform. The integration tests expect to find an ollama binary at the top of the tree.
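A minimal build sketch, run from the top of the source tree (the cmake step for GPU support is platform-specific and not shown; see the platform build docs for its exact invocation):

```shell
# Build the ollama binary at the top of the tree, where the tests look for it
go build .

# Optionally confirm the binary is present and executable before running tests
test -x ./ollama && echo "ollama binary ready"
```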

Many tests use a default small model suitable for running on many systems. You can override this default by setting OLLAMA_TEST_DEFAULT_MODEL.
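For example, to override the default model for a run (the model name below is hypothetical; substitute any model available to your server):

```shell
# Use a specific default model for this test run (example name)
OLLAMA_TEST_DEFAULT_MODEL=llama3.2:1b go test -tags=integration ./...
```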