Mirror of https://github.com/ollama/ollama
The nvidia_fp32 config for (576, 512) head sizes had nbatch_fa=32, which caused zero-sized arrays when computing array dimensions:

    nbatch_fa / (np * warp_size) = 32 / (2 * 32) = 0

This resulted in CUDA compilation failures on CUDA 12 (Windows and Linux arm64):

- "static assertion failed with nbatch_fa % (np*warp_size) != 0"
- "the size of an array must be greater than zero"

Fix by changing nbatch_fa from 32 to 64 for all (576, 512) configs in the nvidia_fp32 function, matching the nvidia_fp16 and AMD configs.
| Name |
|---|
| imageproc |
| input |
| models |
| parsers |
| renderers |
| testdata |
| bytepairencoding.go |
| bytepairencoding_test.go |
| model.go |
| model_test.go |
| sentencepiece.go |
| sentencepiece_test.go |
| textprocessor.go |
| vocabulary.go |
| vocabulary_test.go |
| wordpiece.go |
| wordpiece_test.go |