mirror of https://github.com/ollama/ollama synced 2026-04-23 08:45:14 +00:00


Daniel Hiltgen 96b202d34b Add support for gemma4 (#15214)

* bench: add prompt calibration, context size flag, and NumCtx reporting

  Add --num-ctx flag to set context size, and report NumCtx in model info header. Calibrate tokens-per-word ratio during warmup using actual tokenization metrics from the model, replacing the fixed 1.3 heuristic. This produces more accurate prompt token counts for --prompt-tokens. Also add fetchContextLength() to query running model context via /api/ps.

* integration: improve vision test robustness and add thinking tests

  Add skipIfNoVisionOverride() to skip vision tests when OLLAMA_TEST_MODEL is set to a non-vision model. Add Think:false to context exhaustion test to prevent thinking models from using all context before the test can measure it. Add third test image (ollama homepage) and replace OCR test with ImageDescription test using it. Relax match strings for broader model compatibility. Add TestThinkingEnabled and TestThinkingSuppressed to verify thinking output and channel tag handling.

* gemma4: add Gemma 4 GGML model support

  Add full Gemma 4 model family support (E2B, E4B, 26B MoE, 31B Dense) for the GGML backend including text, vision, converter, parser, and renderer.

  Text model features:
  - Sliding window + full attention with per-layer patterns
  - KV sharing across layers with donor map
  - Per-layer embeddings (PLE) with learned projections
  - MoE routing with RMSNorm + learned scale
  - Proportional RoPE with freq_factors for global attention
  - Final logit softcapping

  Vision model features:
  - SigLIP vision encoder with 2D RoPE
  - ClippableLinear with input/output clamping via packed v.clamp_data
  - Adaptive average pooling with nMerge kernel
  - Multi-modal projection with unweighted RMSNorm

  Converter:
  - Safetensors to GGUF with vision tensor renaming
  - Fused MoE gate_up_proj splitting
  - Vision patch embedding reshape (HF to Conv2D layout)
  - Packed clamp data tensor for ClippableLinear bounds
  - Proportional RoPE freq_factors generation

  Also includes:
  - BackendGet() on ml.Tensor for reading weight tensor data
  - Q6_K CUDA get_rows kernel support
  - MoE-aware ffn_down quantization layer counting
  - Gemma4 parser with tool calling and thinking support
  - Gemma4 renderer with structured tool format
  - Architecture-based auto-detection of renderer/parser/stop tokens
  - Integration test gemma4 model list additions

* gemma4: add audio support with USM conformer encoder

  Add audio encoding for Gemma 4 using the USM conformer architecture:
  - Converter: audio tensor mapping, SSCP/conformer/embedder name replacements, softplus repacker for per_dim_scale, F32 enforcement for conv weights
  - GGML backend: Conv1DDW and PadExt tensor ops
  - Audio encoder: SSCP Conv2D, 12 conformer blocks (FFW + block-local attention with relative position embeddings + LightConv1d + FFW), output projection, audio-to-text embedding projector
  - Audio preprocessing: WAV decode, mel spectrogram, FFT (pure Go)
  - Model wiring: WAV detection, audio token handling, unified PostTokenize

  Correctly transcribes "why is the sky blue" from test audio.

* integration: add gemma4 audio tests including OpenAI API coverage

  Test audio transcription and response via the Ollama native API, plus two new tests exercising the OpenAI-compatible endpoints:
  - /v1/audio/transcriptions (multipart form upload)
  - /v1/chat/completions with input_audio content type

  All tests use capability checks and skip models without audio support.

* gemma4: add OpenAI audio API support and capability detection
  - Add CapabilityAudio and detect from audio.block_count in GGUF
  - Add /v1/audio/transcriptions endpoint with TranscriptionMiddleware
  - Add input_audio content type support in /v1/chat/completions
  - Add TranscriptionRequest/Response types in openai package

* gemma4: add audio input support for run command
  - /audio toggle in interactive mode for voice chat
  - Platform-specific microphone recording (AVFoundation on macOS, PulseAudio/ALSA on Linux, WASAPI on Windows)
  - Space to start/stop recording, automatic chunking for long audio

* gemma4: add transcribe command (ollama transcribe MODEL)
  - Interactive mode with readline prompt and slash commands
  - Non-interactive mode for piped audio or record-until-Ctrl+C
  - Chunked streaming transcription for long recordings
  - Word-wrapped output matching run command style

* gemma4: add parser, renderer, and integration test plumbing
* gemma4: fix renderer to emit BOS token
* gemma4: add OpenAI audio transcription API and input_audio support
* gemma4: update converter for new weight drop naming
* gemma4: add per_expert_scale to MoE router and fix moe_intermediate_size config
* gemma4: rewrite renderer to match HF Jinja2 template exactly

  Fix 8 bugs found by building 55 reference tests verified against the HF Jinja2 chat template (VERIFY_JINJA2=1 shells out to Python):
  - Tool responses use separate <|turn>tool turns (not inline tags)
  - Tool calls emitted before content in assistant messages
  - Thinking content stripped from assistant history (strip_thinking)
  - User, tool, and system content trimmed (template does | trim)
  - Empty system message still emits system turn (check role, not content)
  - Nested object properties rendered recursively with required field
  - Array items specification rendered for array-type properties
  - OBJECT/ARRAY type-specific rendering comma logic matches template

  Also adds Required field to api.ToolProperty for nested object schemas, replaces old gemma4_test.go with comprehensive gemma4_reference_test.go, and commits the Jinja2 template as testdata for verification.

* gemma4: fix MoE fused gate_up split and multiline tool-call arg parsing
  - Text MoE: split `ffn_gate_up_exps` into contiguous `[gate|up]` halves instead of stride-2 slices.
  - Parser: escape control characters in `<|"|>...<|"|>` string literals when converting tool-call args to JSON.
  - Fixes warnings like `invalid character '\n' in string literal` for multiline tool arguments.
  - Add Gemma4 parser regressions for multiline tool-call args and `gemma4ArgsToJSON`.

* cmd: simplify audio input to dropped file attachments
* gemma4: use full SWA memory for better cache reuse
* gemma4: initialize clamps after backend load
* convert: align gemma4 audio tensor renames with llama.cpp
* Remove redundant comments in gemma4 vision model
* Format Gemma4 MoE block field alignment
* use 4096 kvcache.NewSWAMemCache
* convert: support new Gemma4 audio_tower tensor naming (#15221)

  Co-authored-by: jmorganca <jmorganca@gmail.com>

* fix integration test defaults for audio
* review comments and lint fixes
* remove unused audio/video files

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>

2026-04-02 11:33:33 -07:00
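The MoE fix in the commit message above changes how the fused `ffn_gate_up_exps` tensor is split. The sketch below illustrates why the two strategies disagree, in plain Python on a single row. It assumes the fused dimension is laid out as the full gate projection followed by the full up projection (`[gate|up]`); the function names are hypothetical and this is not the converter's actual code.

```python
def split_contiguous(fused_row):
    """Split a fused [gate | up] row into its two contiguous halves."""
    if len(fused_row) % 2 != 0:
        raise ValueError("fused row length must be even")
    half = len(fused_row) // 2
    return fused_row[:half], fused_row[half:]


def split_stride2(fused_row):
    """Interleaved split: only correct for a [g0, u0, g1, u1, ...] layout."""
    return fused_row[0::2], fused_row[1::2]


# A row laid out as [gate | up] with an intermediate size of 3.
row = ["g0", "g1", "g2", "u0", "u1", "u2"]

gate, up = split_contiguous(row)
print(gate, up)            # ['g0', 'g1', 'g2'] ['u0', 'u1', 'u2']

# Applying the stride-2 split to the same layout mixes gate and up weights.
print(split_stride2(row))  # (['g0', 'g2', 'u1'], ['g1', 'u0', 'u2'])
```

On a `[gate|up]` layout, only the contiguous split recovers the original projections; the stride-2 split silently scrambles them, which would corrupt the expert FFN outputs without raising any error.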
.github	ci: include mlx jit headers on linux (#15083)	2026-03-26 23:10:07 -07:00
anthropic	anthropic: fix empty inputs in content blocks (#15105)	2026-03-27 15:41:27 -07:00
api	Add support for gemma4 (#15214)	2026-04-02 11:33:33 -07:00
app	app: use the same client for inference and other requests (#15204)	2026-04-02 11:07:50 -07:00
auth	auth: fix problems with the ollama keypairs (#12373)	2025-09-22 23:20:20 -07:00
cmd	Add support for gemma4 (#15214)	2026-04-02 11:33:33 -07:00
convert	Add support for gemma4 (#15214)	2026-04-02 11:33:33 -07:00
discover	CUDA: filter devices on secondary discovery (#13317)	2025-12-03 12:58:16 -08:00
docs	launch: replace deprecated OPENAI_BASE_URL with config.toml profile for codex (#15041)	2026-04-01 11:43:23 -04:00
envconfig	add ability to turn on debug request logging (#14106)	2026-03-19 17:08:17 -07:00
format	chore(all): replace instances of interface with any (#10067)	2025-04-02 09:44:27 -07:00
fs	Add support for gemma4 (#15214)	2026-04-02 11:33:33 -07:00
harmony	Parser for Cogito v2 (#13145)	2025-11-19 17:21:07 -08:00
integration	Add support for gemma4 (#15214)	2026-04-02 11:33:33 -07:00
internal	Reapply "don't require pulling stubs for cloud models" again (#14608)	2026-03-06 14:27:47 -08:00
kvcache	model: support for qwen3.5 architecture (#14378)	2026-02-24 20:08:05 -08:00
llama	Add support for gemma4 (#15214)	2026-04-02 11:33:33 -07:00
llm	llm, mlxrunner: fix done channel value consumed by first receiver	2026-03-19 17:44:28 -07:00
logutil	logutil: fix source field (#12279)	2025-09-16 16:18:07 -07:00
manifest	Clean up the manifest and modelpath (#13807)	2026-01-21 11:46:17 -08:00
middleware	Add support for gemma4 (#15214)	2026-04-02 11:33:33 -07:00
ml	Add support for gemma4 (#15214)	2026-04-02 11:33:33 -07:00
model	Add support for gemma4 (#15214)	2026-04-02 11:33:33 -07:00
openai	Add support for gemma4 (#15214)	2026-04-02 11:33:33 -07:00
parser	MLX: add header vendoring and remove go build tag (#14642)	2026-03-09 17:24:45 -07:00
progress	Add z-image image generation prototype (#13659)	2026-01-09 21:09:46 -08:00
readline	Add support for gemma4 (#15214)	2026-04-02 11:33:33 -07:00
runner	Add support for gemma4 (#15214)	2026-04-02 11:33:33 -07:00
sample	Revert "runner: add token history sampling parameters to ollama runner (#14537)" (#14776)	2026-03-10 21:07:52 -07:00
scripts	ci: fix missing windows zip file (#14807)	2026-03-12 16:14:00 -07:00
server	Add support for gemma4 (#15214)	2026-04-02 11:33:33 -07:00
template	template: fix args-as-json rendering (#13636)	2026-01-06 18:33:57 -08:00
thinking	thinking: fix double emit when no opening tag	2025-08-21 21:03:12 -07:00
tokenizer	tokenizer: add SentencePiece-style BPE support (#15162)	2026-03-31 17:00:36 -07:00
tools	preserve tool definition and call JSON ordering (#13525)	2026-01-05 18:03:36 -08:00
types	Add support for gemma4 (#15214)	2026-04-02 11:33:33 -07:00
version	add version	2023-08-22 09:40:58 -07:00
x	mlx: respect tokenizer add_bos_token setting in pipeline (#15185)	2026-03-31 16:46:30 -07:00
.dockerignore	next build (#8539)	2025-01-29 15:03:38 -08:00
.gitattributes	.gitattributes: add app/webview to linguist-vendored (#13274)	2025-11-29 23:46:10 -05:00
.gitignore	harmony: remove special casing in routes.go	2025-09-18 14:55:59 -07:00
.golangci.yaml	ci: restore previous linter rules (#13322)	2025-12-03 18:55:02 -08:00
CMakeLists.txt	ci: harden cuda include path handling (#15093)	2026-03-27 07:57:07 -07:00
CMakePresets.json	MLX: add header vendoring and remove go build tag (#14642)	2026-03-09 17:24:45 -07:00
CONTRIBUTING.md	docs: fix typos in repository documentation (#10683)	2025-11-15 20:22:29 -08:00
Dockerfile	mlx: update as of 3/23 (#14789)	2026-03-23 11:28:44 -07:00
go.mod	cmd: set codex env vars on launch and handle zstd request bodies (#14122)	2026-02-18 17:19:36 -08:00
go.sum	cmd: set codex env vars on launch and handle zstd request bodies (#14122)	2026-02-18 17:19:36 -08:00
LICENSE	`proto` -> `ollama`	2023-06-26 15:57:13 -04:00
main.go	lint	2024-08-01 17:06:06 -07:00
Makefile.sync	Revert "Update vendored llama.cpp to b7847" (#14061)	2026-02-03 18:39:36 -08:00
MLX_C_VERSION	mlx: update as of 3/23 (#14789)	2026-03-23 11:28:44 -07:00
MLX_VERSION	mlx: update as of 3/23 (#14789)	2026-03-23 11:28:44 -07:00
README.md	readme: update download link for macOS (#1) (#14271)	2026-02-15 15:25:15 -08:00
SECURITY.md	docs: fix typos in repository documentation (#10683)	2025-11-15 20:22:29 -08:00

README.md

Ollama

Start building with open models.

Download

macOS

curl -fsSL https://ollama.com/install.sh | sh

or download manually

Windows

irm https://ollama.com/install.ps1 | iex

or download manually

Linux

curl -fsSL https://ollama.com/install.sh | sh

Manual install instructions

Docker

The official Ollama Docker image ollama/ollama is available on Docker Hub.

Libraries

Community

Get started

ollama

You'll be prompted to run a model or connect Ollama to your existing agents or applications such as claude, codex, openclaw and more.

Coding

To launch a specific integration:

ollama launch claude

Supported integrations include Claude Code, Codex, Droid, and OpenCode.

AI assistant

Use OpenClaw to turn Ollama into a personal AI assistant across WhatsApp, Telegram, Slack, Discord, and more:

ollama launch openclaw

Chat with a model

Run and chat with Gemma 3:

ollama run gemma3

See ollama.com/library for the full list.

See the quickstart guide for more details.

REST API

Ollama has a REST API for running and managing models.

curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [{
    "role": "user",
    "content": "Why is the sky blue?"
  }],
  "stream": false
}'

See the API documentation for all endpoints.
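The curl request above sets "stream": false, so the server returns a single JSON object. When streaming is left at its default (enabled), /api/chat instead returns newline-delimited JSON objects: each carries a text fragment in message.content, and the last one has "done": true. The following standard-library Python sketch reassembles those fragments. It assumes a server on the default localhost:11434; `join_stream` and `chat_stream` are hypothetical helper names, not part of any Ollama library.

```python
import json
import urllib.request


def join_stream(lines):
    """Concatenate message.content from NDJSON chunks until one reports done."""
    parts = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between chunks
        chunk = json.loads(raw)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


def chat_stream(model, prompt, host="http://localhost:11434"):
    """POST to /api/chat with streaming enabled and return the full reply text."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    req = urllib.request.Request(
        host + "/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Iterating the HTTP response yields one line (bytes) per JSON chunk.
        return join_stream(line.decode("utf-8") for line in resp)
```

With a model pulled locally, `print(chat_stream("gemma3", "Why is the sky blue?"))` prints the assembled answer. Keeping the parsing in `join_stream` separate from the HTTP call lets the chunk handling be checked without a running server.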

Python

pip install ollama

from ollama import chat

response = chat(model='gemma3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response.message.content)
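Each chat call is stateless. To hold a multi-turn conversation, the caller resends the earlier user and assistant messages on every request. The following sketch shows that bookkeeping. It assumes the ollama package's `chat` function and the `response.message.content` access shown above. The `Conversation` class and `fake_chat` stand-in are hypothetical, and the chat function is injectable so the history logic can be exercised without a running server.

```python
from types import SimpleNamespace


class Conversation:
    """Keeps the running message history and resends it on every turn."""

    def __init__(self, model, chat_fn=None):
        if chat_fn is None:
            # Imported lazily so the class can be used with a stand-in chat_fn.
            from ollama import chat as chat_fn
        self.model = model
        self.chat_fn = chat_fn
        self.messages = []

    def ask(self, content):
        self.messages.append({"role": "user", "content": content})
        response = self.chat_fn(model=self.model, messages=self.messages)
        reply = response.message.content
        # Store the assistant turn so the next request includes it as context.
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_chat(model, messages):
    # Hypothetical stand-in for ollama.chat: reports how many messages it received.
    return SimpleNamespace(message=SimpleNamespace(content=f"{len(messages)} messages"))
```

Against a real server, `Conversation("gemma3").ask("Why is the sky blue?")` followed by a second `ask` sends both turns as context. With `fake_chat`, the first `ask` sees 1 message and the second sees 3 (user, assistant, user).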

JavaScript

npm i ollama

import ollama from "ollama";

const response = await ollama.chat({
  model: "gemma3",
  messages: [{ role: "user", content: "Why is the sky blue?" }],
});
console.log(response.message.content);

Supported backends

llama.cpp project founded by Georgi Gerganov.

Documentation

Community Integrations

Want to add your project? Open a pull request.

Chat Interfaces

Web

Open WebUI - Extensible, self-hosted AI interface
Onyx - Connected AI workspace
LibreChat - Enhanced ChatGPT clone with multi-provider support
Lobe Chat - Modern chat framework with plugin ecosystem (docs)
NextChat - Cross-platform ChatGPT UI (docs)
Perplexica - AI-powered search engine, open-source Perplexity alternative
big-AGI - AI suite for professionals
Lollms WebUI - Multi-model web interface
ChatOllama - Chatbot with knowledge bases
Bionic GPT - On-premise AI platform
Chatbot UI - ChatGPT-style web interface
Hollama - Minimal web interface
Chatbox - Desktop and web AI client
chat - Chat web app for teams
Ollama RAG Chatbot - Chat with multiple PDFs using RAG
Tkinter-based client - Python desktop client

Desktop

Dify.AI - LLM app development platform
AnythingLLM - All-in-one AI app for Mac, Windows, and Linux
Maid - Cross-platform mobile and desktop client
Witsy - AI desktop app for Mac, Windows, and Linux
Cherry Studio - Multi-provider desktop client
Ollama App - Multi-platform client for desktop and mobile
PyGPT - AI desktop assistant for Linux, Windows, and Mac
Alpaca - GTK4 client for Linux and macOS
SwiftChat - Cross-platform including iOS, Android, and Apple Vision Pro
Enchanted - Native macOS and iOS client
RWKV-Runner - Multi-model desktop runner
Ollama Grid Search - Evaluate and compare models
macai - macOS client for Ollama and ChatGPT
AI Studio - Multi-provider desktop IDE
Reins - Parameter tuning and reasoning model support
ConfiChat - Privacy-focused with optional encryption
LLocal.in - Electron desktop client
MindMac - AI chat client for Mac
Msty - Multi-model desktop client
BoltAI for Mac - AI chat client for Mac
IntelliBar - AI-powered assistant for macOS
Kerlig AI - AI writing assistant for macOS
Hillnote - Markdown-first AI workspace
Perfect Memory AI - Productivity AI personalized by screen and meeting history

Mobile

Ollama Android Chat - One-click Ollama on Android

SwiftChat, Enchanted, Maid, Ollama App, Reins, and ConfiChat listed above also support mobile platforms.

Code Editors & Development

Cline - VS Code extension for multi-file/whole-repo coding
Continue - Open-source AI code assistant for any IDE
Void - Open source AI code editor, Cursor alternative
Copilot for Obsidian - AI assistant for Obsidian
twinny - Copilot and Copilot chat alternative
gptel Emacs client - LLM client for Emacs
Ollama Copilot - Use Ollama as GitHub Copilot
Obsidian Local GPT - Local AI for Obsidian
Ellama Emacs client - LLM tool for Emacs
orbiton - Config-free text editor with Ollama tab completion
AI ST Completion - Sublime Text 4 AI assistant
VT Code - Rust-based terminal coding agent with Tree-sitter
QodeAssist - AI coding assistant for Qt Creator
AI Toolkit for VS Code - Microsoft-official VS Code extension
Open Interpreter - Natural language interface for computers

Libraries & SDKs

LiteLLM - Unified API for 100+ LLM providers
Semantic Kernel - Microsoft AI orchestration SDK
LangChain4j - Java LangChain (example)
LangChainGo - Go LangChain (example)
Spring AI - Spring framework AI support (docs)
LangChain and LangChain.js with example
Ollama for Ruby - Ruby LLM library
any-llm - Unified LLM interface by Mozilla
OllamaSharp for .NET - .NET SDK
LangChainRust - Rust LangChain (example)
Agents-Flex for Java - Java agent framework (example)
Elixir LangChain - Elixir LangChain
Ollama-rs for Rust - Rust SDK
LangChain for .NET - .NET LangChain (example)
chromem-go - Go vector database with Ollama embeddings (example)
LangChainDart - Dart LangChain
LlmTornado - Unified C# interface for multiple inference APIs
Ollama4j for Java - Java SDK
Ollama for Laravel - Laravel integration
Ollama for Swift - Swift SDK
LlamaIndex and LlamaIndexTS - Data framework for LLM apps
Haystack - AI pipeline framework
Firebase Genkit - Google AI framework
Ollama-hpp for C++ - C++ SDK
PromptingTools.jl - Julia LLM toolkit (example)
Ollama for R - rollama - R SDK
Portkey - AI gateway
Testcontainers - Container-based testing
LLPhant - PHP AI framework

Frameworks & Agents

AutoGPT - Autonomous AI agent platform
crewAI - Multi-agent orchestration framework
Strands Agents - Model-driven agent building by AWS
Cheshire Cat - AI assistant framework
any-agent - Unified agent framework interface by Mozilla
Stakpak - Open source DevOps agent
Hexabot - Conversational AI builder
Neuro SAN - Multi-agent orchestration (docs)

RAG & Knowledge Bases

RAGFlow - RAG engine based on deep document understanding
R2R - Open-source RAG engine
MaxKB - Ready-to-use RAG chatbot
Minima - On-premises or fully local RAG
Chipper - AI interface with Haystack RAG
ARGO - RAG and deep research on Mac/Windows/Linux
Archyve - RAG-enabling document library
Casibase - AI knowledge base with RAG and SSO
BrainSoup - Native client with RAG and multi-agent automation

Bots & Messaging

LangBot - Multi-platform messaging bots with agents and RAG
AstrBot - Multi-platform chatbot with RAG and plugins
Discord-Ollama Chat Bot - TypeScript Discord bot
Ollama Telegram Bot - Telegram bot
LLM Telegram Bot - Telegram bot for roleplay

Terminal & CLI

aichat - All-in-one LLM CLI with Shell Assistant, RAG, and AI tools
oterm - Terminal client for Ollama
gollama - Go-based model manager for Ollama
tlm - Local shell copilot
tenere - TUI for LLMs
ParLlama - TUI for Ollama
llm-ollama - Plugin for Datasette's LLM CLI
ShellOracle - Shell command suggestions
LLM-X - Progressive web app for LLMs
cmdh - Natural language to shell commands
VT - Minimal multimodal AI chat app

Productivity & Apps

AppFlowy - AI collaborative workspace, self-hostable Notion alternative
Screenpipe - 24/7 screen and mic recording with AI-powered search
Vibe - Transcribe and analyze meetings
Page Assist - Chrome extension for AI-powered browsing
NativeMind - Private, on-device browser AI assistant
Ollama Fortress - Security proxy for Ollama
1Panel - Web-based Linux server management
Writeopia - Text editor with Ollama integration
QA-Pilot - GitHub code repository understanding
Raycast extension - Ollama in Raycast
Painting Droid - Painting app with AI integrations
Serene Pub - AI roleplaying app
Mayan EDMS - Document management with Ollama workflows
TagSpaces - File management with AI tagging

Observability & Monitoring

Opik - Debug, evaluate, and monitor LLM applications
OpenLIT - OpenTelemetry-native monitoring for Ollama and GPUs
Lunary - LLM observability with analytics and PII masking
Langfuse - Open source LLM observability
HoneyHive - AI observability and evaluation for agents
MLflow Tracing - Open source LLM observability
