ollama/x/mlxrunner
Daniel Hiltgen 8968740836
mlx: Improve M5 performance with NAX (#15345)
* mlx: Improve M5 performance with NAX

This modifies the Mac release to ship 2 builds of MLX for broader
compatibility while supporting the latest M5 hardware features.  NAX requires
building with Xcode 26.2 and targeting only OS v26 and up.  Since
we want to support older macOS versions as well, we now need 2 different MLX
builds and runtime detection logic to select the optimal version.  The newer
build detects at runtime when NAX is missing, so it is safe to run on pre-M5 Macs.
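
The runtime selection described above could look roughly like the Go sketch below. This is not the code from the commit: the helper `selectMLXBuild` and the build names `mlx_nax` and `mlx_compat` are hypothetical, and the only fact taken from the message is that the NAX-capable build targets OS v26 and up. Choosing by OS version alone is enough here because the newer build checks for NAX itself at runtime.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minNAXBuildOS is the lowest macOS major version the newer build targets
// (taken from the commit message: "OS v26 and up").
const minNAXBuildOS = 26

// selectMLXBuild is a hypothetical helper, not the function in this repo.
// It takes a macOS product version such as "26.2" and returns an assumed
// build name: "mlx_nax" for the newer build, "mlx_compat" for the older one.
func selectMLXBuild(osVersion string) string {
	// Only the major component matters for this decision.
	majorStr := strings.SplitN(strings.TrimSpace(osVersion), ".", 2)[0]
	major, err := strconv.Atoi(majorStr)
	if err != nil || major < minNAXBuildOS {
		// Unknown or older OS: fall back to the broadly compatible build.
		return "mlx_compat"
	}
	return "mlx_nax"
}

func main() {
	for _, v := range []string{"15.6.1", "26.2", ""} {
		fmt.Printf("macOS %q -> %s\n", v, selectMLXBuild(v))
	}
}
```

Falling back to the compatible build on a parse error keeps the sketch conservative: an unrecognized version string never selects a library that the OS might be unable to load.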

* mac: prevent generate on cross-compiles

For some versions of Xcode, cmake builds fail during the generate phase when
cross-compiling, due to header problems.  Since generate produces
arch-independent output, we can skip it when cross-compiling.
2026-04-07 08:12:24 -07:00
..
cache mlxrunner: combine setStateRaw and setStateDetached into setState 2026-03-26 13:32:11 -07:00
mlx mlx: Improve M5 performance with NAX (#15345) 2026-04-07 08:12:24 -07:00
model mlx: add mxfp4/mxfp8/nvfp4 importing (#15015) 2026-03-24 13:45:44 -07:00
sample mlxrunner: fix Slice(0, 0) returning full dimension instead of empty 2026-03-18 16:06:33 -07:00
cache.go mlxrunner: schedule periodic snapshots during prefill 2026-03-26 13:32:11 -07:00
cache_test.go mlxrunner: schedule periodic snapshots during prefill 2026-03-26 13:32:11 -07:00
cache_trie.go mlxrunner: share KV cache across conversations with common prefixes 2026-03-18 16:06:33 -07:00
cache_trie_test.go mlxrunner: share KV cache across conversations with common prefixes 2026-03-18 16:06:33 -07:00
client.go ci: include mlx jit headers on linux (#15083) 2026-03-26 23:10:07 -07:00
imports.go MLX: add header vendoring and remove go build tag (#14642) 2026-03-09 17:24:45 -07:00
pipeline.go mlx: respect tokenizer add_bos_token setting in pipeline (#15185) 2026-03-31 16:46:30 -07:00
runner.go MLX: add header vendoring and remove go build tag (#14642) 2026-03-09 17:24:45 -07:00
server.go mlx: quantized embeddings, fast SwiGLU, and runtime fixes (#14884) 2026-03-17 11:21:38 -07:00
utf8_buffer.go consolidate the tokenizer (#14327) 2026-02-19 15:55:45 -08:00
utf8_buffer_test.go consolidate the tokenizer (#14327) 2026-02-19 15:55:45 -08:00