ollama/model/models/gemma4
Daniel Hiltgen de9673ac3f
tokenizer: add byte fallback for SentencePiece BPE encoding (#15232)
* tokenizer: add byte fallback for SentencePiece BPE encoding

When BPE merging produces tokens not in the vocabulary, fall back to
encoding each UTF-8 byte as <0xHH> byte tokens instead of silently
dropping the character. Also teach Decode to convert <0xHH> tokens
back to raw bytes.

Fixes #15229, fixes #15231

* tokenizer fixes
2026-04-02 13:04:45 -07:00
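
The byte fallback described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the toy vocabulary, the token IDs, and the names newToyVocab, encodeWithByteFallback, parseByteToken, and decode are all hypothetical, and the input is assumed to be already split into pieces by the BPE merge step. The only detail taken from the commit message is the <0xHH> byte token format.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// newToyVocab builds a hypothetical vocabulary: IDs 0..255 are the byte
// tokens <0x00>..<0xFF>, and ID 256 is the single word piece "▁hello".
func newToyVocab() (map[string]int, map[int]string) {
	vocab := make(map[string]int)
	inv := make(map[int]string)
	for b := 0; b < 256; b++ {
		tok := fmt.Sprintf("<0x%02X>", b)
		vocab[tok] = b
		inv[b] = tok
	}
	vocab["\u2581hello"] = 256
	inv[256] = "\u2581hello"
	return vocab, inv
}

// encodeWithByteFallback maps each piece to its ID. A piece missing from
// the vocabulary is emitted as one <0xHH> token per UTF-8 byte instead of
// being dropped.
func encodeWithByteFallback(pieces []string, vocab map[string]int) []int {
	var ids []int
	for _, p := range pieces {
		if id, ok := vocab[p]; ok {
			ids = append(ids, id)
			continue
		}
		for i := 0; i < len(p); i++ { // indexing a string yields raw bytes
			if id, ok := vocab[fmt.Sprintf("<0x%02X>", p[i])]; ok {
				ids = append(ids, id)
			}
		}
	}
	return ids
}

// parseByteToken reports whether tok has the form <0xHH> and, if so,
// returns the byte it encodes.
func parseByteToken(tok string) (byte, bool) {
	if len(tok) != 6 || !strings.HasPrefix(tok, "<0x") || tok[5] != '>' {
		return 0, false
	}
	v, err := strconv.ParseUint(tok[3:5], 16, 8)
	if err != nil {
		return 0, false
	}
	return byte(v), true
}

// decode turns IDs back into text, converting <0xHH> tokens to raw bytes
// and copying every other piece verbatim.
func decode(ids []int, inv map[int]string) string {
	var sb strings.Builder
	for _, id := range ids {
		tok := inv[id]
		if b, ok := parseByteToken(tok); ok {
			sb.WriteByte(b)
			continue
		}
		sb.WriteString(tok)
	}
	return sb.String()
}

func main() {
	vocab, inv := newToyVocab()
	// "é" (U+00E9) is not in the toy vocabulary, so it falls back to bytes.
	ids := encodeWithByteFallback([]string{"\u2581hello", "\u00e9"}, vocab)
	fmt.Println(ids)
	fmt.Println(decode(ids, inv))
}
```

In this sketch a piece whose byte token is also missing from the vocabulary is skipped; a real tokenizer would more likely emit an unknown token at that point. Decoding concatenates the raw bytes before interpreting them as text, so a multi-byte character split across several <0xHH> tokens is reassembled intact.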
model.go Add support for gemma4 (#15214) 2026-04-02 11:33:33 -07:00
model_audio.go Add support for gemma4 (#15214) 2026-04-02 11:33:33 -07:00
model_text.go Add support for gemma4 (#15214) 2026-04-02 11:33:33 -07:00
model_vision.go Add support for gemma4 (#15214) 2026-04-02 11:33:33 -07:00
process_audio.go Add support for gemma4 (#15214) 2026-04-02 11:33:33 -07:00
process_image.go Add support for gemma4 (#15214) 2026-04-02 11:33:33 -07:00
tokenizer_compare_test.go Add support for gemma4 (#15214) 2026-04-02 11:33:33 -07:00
tokenizer_reference_test.go tokenizer: add byte fallback for SentencePiece BPE encoding (#15232) 2026-04-02 13:04:45 -07:00