mirror of
https://github.com/ollama/ollama
synced 2026-04-23 08:45:14 +00:00
MLX Memory Management
This package will get consolidated with x/ml/backend/mlx in the future.
Automatic Tracking
All arrays are automatically tracked when created. On Eval(), non-kept arrays are freed.
API
result := mlx.Matmul(x, w) // arrays automatically tracked
mlx.Eval(result) // free non-kept, eval result (auto-kept)
Key Functions
- mlx.Eval(outputs...) - free non-kept arrays, then evaluate (outputs auto-kept)
- mlx.AsyncEval(outputs...) - async version of Eval (outputs auto-kept)
- mlx.Keep(arrays...) - mark arrays to survive cleanup (for weights, caches)
- array.Free() - mark array for cleanup on next Eval
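The rules above can be sketched with a small stand-alone model. This is a toy illustration built only from the behavior described in this README, not the package's implementation: the `Tracker` and `Array` types and the `New` method are hypothetical names, and no real evaluation takes place.

```go
package main

import "fmt"

// Array is a hypothetical stand-in for an MLX array handle.
// It records lifecycle state only; it holds no data.
type Array struct {
	kept  bool
	freed bool
}

// Free marks the array for cleanup on the next Eval.
func (a *Array) Free() { a.kept = false }

// Tracker is a hypothetical model of the automatic tracking.
type Tracker struct {
	live []*Array
}

// New creates an array and registers it, so every array is tracked on creation.
func (t *Tracker) New() *Array {
	a := &Array{}
	t.live = append(t.live, a)
	return a
}

// Keep marks arrays to survive cleanup.
func (t *Tracker) Keep(arrays ...*Array) {
	for _, a := range arrays {
		a.kept = true
	}
}

// Eval auto-keeps its outputs, then frees every tracked array that is not kept.
// The toy skips the actual evaluation step.
func (t *Tracker) Eval(outputs ...*Array) {
	t.Keep(outputs...)
	var survivors []*Array
	for _, a := range t.live {
		if a.kept {
			survivors = append(survivors, a)
		} else {
			a.freed = true
		}
	}
	t.live = survivors
}

func main() {
	t := &Tracker{}
	w := t.New() // plays the role of a weight
	t.Keep(w)
	x := t.New() // intermediate, never kept
	y := t.New() // result
	t.Eval(y)
	fmt.Println(w.freed, x.freed, y.freed) // false true false

	y.Free() // only marks y; nothing is freed yet
	t.Eval()
	fmt.Println(y.freed, len(t.live)) // true 1
}
```

In this model an output stays alive until it is explicitly marked with `Free()`, which is why the intermediate `x` disappears on the first `Eval` while `y` survives until the second.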
Loop Pattern
for step := 0; step < maxTokens; step++ {
	logits := model.Forward(token, caches)
	oldToken := token
	token = sample(logits)

	// Keep cache state across iterations
	for _, c := range caches {
		mlx.Keep(c.State()...)
	}

	oldToken.Free()      // mark for cleanup
	mlx.AsyncEval(token) // frees old, evals new
}
Notes
- Eval() and AsyncEval() auto-keep their outputs
- Free() marks for cleanup - actual free happens during next Eval
- Use Keep() for weights and cache state that must survive multiple Eval cycles
- Arrays created inside compiled closures are managed by MLX, not tracked