ollama/model/models
Daniel Hiltgen de9673ac3f
tokenizer: add byte fallback for SentencePiece BPE encoding (#15232)

When BPE merging produces tokens not in the vocabulary, fall back to
encoding each UTF-8 byte as <0xHH> byte tokens instead of silently
dropping the character. Also teach Decode to convert <0xHH> tokens
back to raw bytes.
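
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function names (`byteToken`, `encodePiece`, `parseByteToken`, `decode`) and the `map[string]bool` vocabulary are hypothetical, and the sketch assumes every `<0xHH>` byte token is present in the vocabulary.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// byteToken returns the SentencePiece-style byte token for b, e.g. <0xC3>.
func byteToken(b byte) string {
	return fmt.Sprintf("<0x%02X>", b)
}

// encodePiece returns the piece unchanged when it is in the vocabulary.
// Otherwise it emits one <0xHH> token per UTF-8 byte instead of dropping
// the piece. (Hypothetical helper; assumes byte tokens exist in the vocab.)
func encodePiece(piece string, vocab map[string]bool) []string {
	if vocab[piece] {
		return []string{piece}
	}
	out := make([]string, 0, len(piece))
	for i := 0; i < len(piece); i++ {
		out = append(out, byteToken(piece[i]))
	}
	return out
}

// parseByteToken reports the byte encoded by a token of the form <0xHH>.
func parseByteToken(tok string) (byte, bool) {
	if len(tok) != 6 || !strings.HasPrefix(tok, "<0x") || !strings.HasSuffix(tok, ">") {
		return 0, false
	}
	v, err := strconv.ParseUint(tok[3:5], 16, 8)
	if err != nil {
		return 0, false
	}
	return byte(v), true
}

// decode concatenates tokens, turning <0xHH> tokens back into raw bytes.
func decode(tokens []string) string {
	var sb strings.Builder
	for _, t := range tokens {
		if b, ok := parseByteToken(t); ok {
			sb.WriteByte(b)
			continue
		}
		sb.WriteString(t)
	}
	return sb.String()
}

func main() {
	vocab := map[string]bool{"hi": true} // toy vocabulary for illustration
	toks := append(encodePiece("hi", vocab), encodePiece("é", vocab)...)
	fmt.Println(toks)         // [hi <0xC3> <0xA9>]
	fmt.Println(decode(toks)) // hié
}
```

Writing raw bytes into a builder during decode lets a multi-byte character that was split across several byte tokens reassemble into valid UTF-8 once all of its bytes have been appended.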

Fixes #15229, fixes #15231

* tokenizer fixes
2026-04-02 13:04:45 -07:00
bert move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
deepseek2 move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
deepseekocr move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
gemma2 move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
gemma3 move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
gemma3n move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
gemma4 tokenizer: add byte fallback for SentencePiece BPE encoding (#15232) 2026-04-02 13:04:45 -07:00
glm4moelite move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
glmocr move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
gptoss move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
lfm2 model: improvements to LFM architectures (#14368) 2026-02-23 14:38:10 -08:00
llama move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
llama4 move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
mistral3 move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
mllama move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
nemotronh models: add nemotronh architecture support (#14356) 2026-02-22 15:09:14 -08:00
nomicbert move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
olmo3 move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
qwen2 move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
qwen3 move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
qwen3next model: add qwen3-next compatibility for legacy ssm_in projections (#15133) 2026-03-29 11:50:47 -07:00
qwen3vl model: support for qwen3.5 architecture (#14378) 2026-02-24 20:08:05 -08:00
qwen25vl move tokenizers to separate package (#13825) 2026-02-05 17:44:11 -08:00
models.go Add support for gemma4 (#15214) 2026-04-02 11:33:33 -07:00