ihateninjas/ollama

Mirror of https://github.com/ollama/ollama, synced 2026-04-23 08:45:14 +00:00

Commit graph

Author	SHA1	Message	Date
Jesse Gross	24e038d56a	mlxrunner: add logprobs support. Match the ollamarunner and OpenAI semantics: raw, full-vocab log-softmax with the top-K ranked by probability. Skipped on the GPU when the request doesn't ask for logprobs, so decode doesn't pay for it otherwise.	2026-04-20 17:43:00 -07:00
Patrick Devine	de5cb7311f	mlx: add mxfp4/mxfp8/nvfp4 importing (#15015). This change allows importing bf16 and converting to mxfp4/mxfp8/nvfp4, and also importing fp8 and converting directly to mxfp8.	2026-03-24 13:45:44 -07:00
Daniel Hiltgen	539741199e	mlx: perf improvements (#14768). Fix nn.go to call mlx_fast_layer_norm instead of implementing layer norm manually (mean, subtract, variance, rsqrt, multiply, add: 6 ops). Fix llama.go and gemma3.go to remove RepeatKV, which tiled K/V tensors to match the Q head count, since scaled_dot_product_attention natively handles GQA (it just requires n_q_heads % n_kv_heads == 0). Also addresses review comments.	2026-03-12 12:01:28 -07:00
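
The perf commit (539741199e) removes RepeatKV because the attention kernel handles GQA on its own. The following is a minimal sketch in plain Python, not the Go or MLX code from the repository, of why the tiling is redundant: query head h can read K/V head h // (n_q_heads // n_kv_heads) directly. The names `attend`, `repeat_kv`, and `gqa_attention` are hypothetical and exist only for this illustration.

```python
import math


def attend(q, k, v):
    """Scaled dot-product attention for one query vector of one head.

    q: [d], k: [T][d], v: [T][d_v]  ->  [d_v]
    """
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, kt)) / math.sqrt(d) for kt in k]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [
        sum(weights[t] / z * v[t][j] for t in range(len(v)))
        for j in range(len(v[0]))
    ]


def repeat_kv(heads, n_rep):
    """Tile each K/V head n_rep times: [h0, h1] -> [h0, h0, h1, h1]."""
    return [head for head in heads for _ in range(n_rep)]


def gqa_attention(q_heads, k_heads, v_heads):
    """Grouped-query attention without materializing tiled K/V copies."""
    n_q, n_kv = len(q_heads), len(k_heads)
    if n_q % n_kv != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group = n_q // n_kv
    return [
        attend(q, k_heads[h // group], v_heads[h // group])
        for h, q in enumerate(q_heads)
    ]
```

Indexing the shared K/V head gives the same result as tiling first and attending head by head, while skipping the extra copies that tiling creates.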


3 commits
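
The logprobs commit (24e038d56a) describes a full-vocabulary log-softmax with the top-K entries ranked by probability, skipped when the request does not ask for logprobs. The following is a minimal sketch of that idea in plain Python, not the mlxrunner implementation. The names `log_softmax` and `top_logprobs` and the convention that `top_k=0` means "not requested" are assumptions made for illustration, as is the reading of "raw" as logits before any sampling transforms.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over the full vocabulary."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def top_logprobs(logits, token_id, top_k=0):
    """Return (logprob of the chosen token, top-K (token_id, logprob) pairs).

    top_k <= 0 models a request without logprobs (an assumed convention):
    the log-softmax is not computed at all.
    """
    if top_k <= 0:
        return None, []
    lp = log_softmax(logits)
    ranked = sorted(range(len(lp)), key=lambda i: lp[i], reverse=True)[:top_k]
    return lp[token_id], [(i, lp[i]) for i in ranked]
```

For example, `top_logprobs([2.0, 1.0, 0.1, -1.0], token_id=1, top_k=2)` ranks token 0 first and token 1 second, and the returned logprob for the chosen token matches its entry in the ranked list.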