mirror of https://github.com/ollama/ollama synced 2026-04-23 08:45:14 +00:00

Daniel Hiltgen 8968740836 mlx: Improve M5 performance with NAX (#15345)		2026-04-07 08:12:24 -07:00

* mlx: Improve M5 performance with NAX
  This modifies the Mac release to now have 2 builds of MLX for broader compatibility while supporting the latest M5 hardware features. NAX requires building with Xcode 26.2 and targeting support only for OS v26 and up. Since we want to support older macOS versions as well, we now need 2 different MLX builds and runtime detection logic to select the optimal version. The newer build will detect NAX missing at runtime, so it is safe to run on pre-M5 Macs.
* mac: prevent generate on cross-compiles
  For some versions of Xcode, cmake builds are failing due to header problems in cross-compiling during the generate phase. Since generate is producing arch-independent generated output, we can skip this during cross-compiling.
CMakeLists.txt	mlx: Improve M5 performance with NAX (#15345)	2026-04-07 08:12:24 -07:00
compile.go	MLX: add header vendoring and remove go build tag (#14642)	2026-03-09 17:24:45 -07:00
doc.go	MLX: add header vendoring and remove go build tag (#14642)	2026-03-09 17:24:45 -07:00
generate_wrappers.go	MLX: add header vendoring and remove go build tag (#14642)	2026-03-09 17:24:45 -07:00
mlx.c	mlx: update as of 3/23 (#14789)	2026-03-23 11:28:44 -07:00
mlx.go	mlx: update as of 3/23 (#14789)	2026-03-23 11:28:44 -07:00
mlx.h	mlx: update as of 3/23 (#14789)	2026-03-23 11:28:44 -07:00
mlx_dynamic.c	MLX: add header vendoring and remove go build tag (#14642)	2026-03-09 17:24:45 -07:00
mlx_dynamic.h	MLX: add header vendoring and remove go build tag (#14642)	2026-03-09 17:24:45 -07:00
mlx_error_handler.c	MLX: harden for init failures (#14777)	2026-03-10 22:52:23 -07:00
mlx_error_handler.h	MLX: harden for init failures (#14777)	2026-03-10 22:52:23 -07:00
mlx_test.go	MLX: add header vendoring and remove go build tag (#14642)	2026-03-09 17:24:45 -07:00
README.md	Add experimental MLX backend and engine with imagegen support (#13648)	2026-01-08 16:18:59 -08:00

README.md

MLX Memory Management

Note: This package will be consolidated with x/ml/backend/mlx in the future.

Automatic Tracking

All arrays are automatically tracked when created. On Eval(), every tracked array that was not marked with Keep() or passed as an output is freed.

API

result := mlx.Matmul(x, w) // arrays automatically tracked
mlx.Eval(result)           // free non-kept, eval result (auto-kept)
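
To make the tracking rule concrete, here is a toy model in plain Go. It is only a sketch of the semantics described above, not the package's implementation: the `Tracker` and `Array` types and their fields are hypothetical stand-ins, and no real evaluation or memory release takes place.

```go
package main

import "fmt"

// Array is a hypothetical stand-in for an MLX array handle.
type Array struct {
	name  string
	kept  bool // survives cleanup
	freed bool // set once the tracker has dropped the array
}

// Tracker is a toy registry that records every array it creates.
type Tracker struct {
	live []*Array
}

// New creates an array and registers it, mimicking automatic tracking.
func (t *Tracker) New(name string) *Array {
	a := &Array{name: name}
	t.live = append(t.live, a)
	return a
}

// Eval auto-keeps its outputs, then frees every tracked array that is
// not kept. It returns the number of arrays freed.
func (t *Tracker) Eval(outputs ...*Array) int {
	for _, o := range outputs {
		o.kept = true
	}
	freed := 0
	survivors := t.live[:0]
	for _, a := range t.live {
		if a.kept {
			survivors = append(survivors, a)
			continue
		}
		a.freed = true
		freed++
	}
	t.live = survivors
	return freed
}

func main() {
	t := &Tracker{}
	x := t.New("x")
	w := t.New("w")
	result := t.New("matmul(x, w)")

	n := t.Eval(result)
	fmt.Println(n, x.freed, w.freed, result.freed) // 2 true true false
}
```

In this model the inputs `x` and `w` are dropped because nothing kept them, which is why long-lived inputs such as weights need an explicit Keep.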

Key Functions

mlx.Eval(outputs...) - free non-kept arrays, then evaluate (outputs auto-kept)
mlx.AsyncEval(outputs...) - async version of Eval (outputs auto-kept)
mlx.Keep(arrays...) - mark arrays to survive cleanup (for weights, caches)
array.Free() - mark array for cleanup on next Eval
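
How these four calls interact can be sketched with a small simulation. The code below does not use the real package: `pool`, `array`, and the lowercase methods are hypothetical names that only mimic the behaviour listed above, under the assumption that Free() simply clears the kept mark.

```go
package main

import "fmt"

// array is a toy stand-in for an MLX array handle (hypothetical type).
type array struct {
	kept     bool // survives cleanup
	released bool // memory would have been released
}

// pool tracks every array created through newArray.
type pool struct {
	tracked []*array
}

func (p *pool) newArray() *array {
	a := &array{}
	p.tracked = append(p.tracked, a)
	return a
}

// keep marks arrays to survive cleanup, in the spirit of mlx.Keep.
func (p *pool) keep(arrays ...*array) {
	for _, a := range arrays {
		a.kept = true
	}
}

// free only clears the kept mark; nothing is released until the next eval.
func (a *array) free() { a.kept = false }

// eval auto-keeps its outputs, then releases every non-kept array.
func (p *pool) eval(outputs ...*array) {
	p.keep(outputs...)
	remaining := p.tracked[:0]
	for _, a := range p.tracked {
		if a.kept {
			remaining = append(remaining, a)
		} else {
			a.released = true
		}
	}
	p.tracked = remaining
}

func main() {
	p := &pool{}
	weights := p.newArray()
	p.keep(weights) // must survive many eval cycles

	out := p.newArray()
	p.eval(out) // out is auto-kept as an output

	out.free()
	fmt.Println(out.released) // false: free only marks the array

	next := p.newArray()
	p.eval(next)
	fmt.Println(out.released)     // true: released by the following eval
	fmt.Println(weights.released) // false: kept arrays survive
}
```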

Loop Pattern

for step := 0; step < maxTokens; step++ {
    logits := model.Forward(token, caches)
    oldToken := token
    token = sample(logits)

    // Keep cache state across iterations
    for _, c := range caches {
        mlx.Keep(c.State()...)
    }

    oldToken.Free()       // mark for cleanup
    mlx.AsyncEval(token)  // frees old, evals new
}
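
The reason for calling oldToken.Free() can be seen in a toy simulation of the loop above. This is a sketch under stated assumptions: `tracker`, `array`, and `run` are hypothetical, the model call and sampling are replaced by plain array creation, and the cache is reduced to a single array. Because each evaluated token is auto-kept, skipping the Free() call leaves one extra live array per step.

```go
package main

import "fmt"

// array is a toy stand-in for an MLX array handle (hypothetical type).
type array struct{ kept bool }

// tracker records every array created through newArray.
type tracker struct{ live []*array }

func (t *tracker) newArray() *array {
	a := &array{}
	t.live = append(t.live, a)
	return a
}

func (t *tracker) keep(arrays ...*array) {
	for _, a := range arrays {
		a.kept = true
	}
}

// free marks the array for cleanup on the next eval.
func (a *array) free() { a.kept = false }

// eval auto-keeps its outputs and drops every non-kept array.
func (t *tracker) eval(outputs ...*array) {
	t.keep(outputs...)
	remaining := t.live[:0]
	for _, a := range t.live {
		if a.kept {
			remaining = append(remaining, a)
		}
	}
	t.live = remaining
}

// run simulates the decode loop and returns how many arrays are still
// tracked after the final eval.
func run(steps int, freeOld bool) int {
	t := &tracker{}
	cache := t.newArray() // simplified: one array of cache state
	t.keep(cache)
	token := t.newArray()
	t.eval(token)

	for i := 0; i < steps; i++ {
		t.newArray() // stand-in for logits: created, never kept
		oldToken := token
		token = t.newArray() // stand-in for sample(logits)
		t.keep(cache)        // keep cache state across iterations
		if freeOld {
			oldToken.free() // mark for cleanup
		}
		t.eval(token) // drops old arrays, keeps the new token
	}
	return len(t.live)
}

func main() {
	fmt.Println(run(100, true))  // 2: the cache and the current token
	fmt.Println(run(100, false)) // 102: the cache plus every token so far
}
```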

Notes

Eval() and AsyncEval() auto-keep their outputs
Free() only marks an array for cleanup; the actual free happens during the next Eval()
Use Keep() for weights and cache state that must survive multiple Eval() cycles
Arrays created inside compiled closures are managed by MLX and are not tracked by this package