ollama/ml
Latest commit: e823bff873 by Daniel Hiltgen (2026-04-07 08:12:36 -07:00)
gemma4: enable flash attention (#15378)
Backport GGML kernels so we can enable flash attention for the gemma 4 model on Metal and CUDA.
backend     gemma4: enable flash attention (#15378)              2026-04-07 08:12:36 -07:00
nn          fix: qwen2.5 vl rope (#13486)                        2025-12-15 17:30:33 -08:00
backend.go  Add support for gemma4 (#15214)                      2026-04-02 11:33:33 -07:00
device.go   flash attn: add auto mode for llama engine (#13052)  2025-12-12 13:27:19 -08:00
path.go     cpu: always ensure LibOllamaPath included (#12890)   2025-10-31 14:37:29 -07:00