Mirror of https://github.com/ollama/ollama (synced 2026-04-23 08:45:14 +00:00)
* tokenizer: add SentencePiece-style BPE support

  Add WithSentencePieceNormalizer option to BytePairEncoding for models that use BPE with SentencePiece-style space markers (space to/from U+2581). NewBytePairEncoding is unchanged; the new NewBytePairEncodingWithOptions constructor accepts BPEOption functions. Decoding handles the reverse mapping of U+2581 back to spaces.

* review comments

| Name |
|---|
| testdata |
| bytepairencoding.go |
| bytepairencoding_test.go |
| sentencepiece.go |
| sentencepiece_test.go |
| tokenizer.go |
| vocabulary.go |
| vocabulary_test.go |
| wordpiece.go |
| wordpiece_test.go |
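
The commit message above describes mapping spaces to U+2581 before BPE encoding and mapping them back on decode. The following is a minimal Go sketch of that round trip only. The helper names `encodeSpaces` and `decodeSpaces` are hypothetical and are not taken from this repository; the actual `WithSentencePieceNormalizer` option may differ in detail (for example, in how it treats leading spaces).

```go
package main

import (
	"fmt"
	"strings"
)

// spaceMarker is U+2581 (LOWER ONE EIGHTH BLOCK), the character
// SentencePiece-style vocabularies use in place of a space.
const spaceMarker = "\u2581"

// encodeSpaces is a hypothetical helper (not the repository's API):
// it replaces every ASCII space with U+2581, as a normalizer might do
// before BPE merges are applied.
func encodeSpaces(s string) string {
	return strings.ReplaceAll(s, " ", spaceMarker)
}

// decodeSpaces is the hypothetical reverse mapping applied to decoded
// text: every U+2581 becomes an ASCII space again.
func decodeSpaces(s string) string {
	return strings.ReplaceAll(s, spaceMarker, " ")
}

func main() {
	in := "hello world"
	marked := encodeSpaces(in)
	fmt.Println(marked)                     // hello▁world
	fmt.Println(decodeSpaces(marked) == in) // true
}
```

The round trip is lossless only when the input does not already contain U+2581; a real tokenizer would need to decide how to handle that case.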