Mirror of https://github.com/ollama/ollama (synced 2026-04-23 08:45:14 +00:00)
* gemma4: implement Gemma 4 model for MLX (text-only runtime)
* gemma4: two MoE + SWA prefill perf fixes

  Two performance optimizations in the gemma4 forward pass:
  1. Memoize the sliding-window prefill mask across layers.
  2. Apply softmax only over the selected experts in Router.Forward.
* review comments
Model directories:

- gemma3
- gemma4
- glm4_moe_lite
- llama
- nn
- qwen3
- qwen3_5
- qwen3_5_moe