Apple M2 16GB

Practical deployment guidance for this machine, using conservative fit estimates instead of marketing-style claims.

Memory

  • Unified memory: 16 GB (shared between CPU and GPU)
  • Bandwidth: 100 GB/s
  • Preferred backend: MLX

Hardware notes

  • Unified memory is shared, so usable model memory is lower than the advertised total.
  • Great for 7B and 8B Q4-style local experiments.
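Because unified memory is shared, the model never sees the full 16 GB. A minimal sketch of a conservative usable-memory estimate is below; the 75% GPU wired-memory fraction and the 2 GB OS/app reservation are assumptions (the commonly reported default Metal working-set limit, which varies by macOS version), not figures from this profile.

```python
def usable_model_gb(total_gb: float, gpu_fraction: float = 0.75,
                    os_and_apps_gb: float = 2.0) -> float:
    """Conservative estimate of memory actually available to a model.

    Takes the smaller of the assumed GPU wired-memory limit and
    whatever is left after the OS and background apps.
    """
    return min(total_gb * gpu_fraction, total_gb - os_and_apps_gb)

print(usable_model_gb(16))  # 12.0 GB on this machine, under these assumptions
```

Under these assumptions, roughly 12 GB of the 16 GB is realistically available for model weights plus KV cache.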

Comfortable fits

Llama 3.1 8B Instruct (comfortable)

Q4 at 8k context is estimated at 6.69 GB total and 6-11 tok/s.
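A figure like 6.69 GB is weights plus KV cache plus runtime overhead. The sketch below shows that arithmetic; the 4.5 bits-per-weight value (typical of Q4_K_M-class quants) and the 0.5 GB overhead are assumptions, not the profile's actual formula, so it lands near but not exactly on 6.69 GB.

```python
def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Quantized weight size in GB (1 GB = 1e9 bytes here)."""
    return params_b * bits_per_weight / 8

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                ctx: int, bytes_per_elem: int = 2) -> float:
    """FP16 KV cache: keys and values for every layer at full context."""
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem / 1e9

# Llama 3.1 8B public config: 32 layers, 8 KV heads, head_dim 128
w = weights_gb(8.03, 4.5)
kv = kv_cache_gb(32, 8, 128, 8192)
total = w + kv + 0.5  # 0.5 GB assumed runtime/compute-buffer overhead
print(f"weights={w:.2f} GB  kv={kv:.2f} GB  total~{total:.2f} GB")
```

The KV cache grows linearly with context length, which is why stretching context is the fastest way to break a fit estimate.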

Borderline fits

Phi-4 (borderline)

This should fit, but headroom is limited. Keep background concurrency low and avoid stretching context further without re-checking the estimate.

Qwen2.5 Coder 14B Instruct (barely)

This is a tight fit: the estimate leaves roughly zero headroom, so expect CPU offload or memory pressure. Consider a lighter quant or a shorter context.

Backend support

These are the runtimes currently associated with this hardware profile.

MLX, Ollama, llama.cpp

Use the live calculator

The calculator lets you change context length, runtime, quantization, and concurrency instead of relying on a fixed profile.
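A minimal sketch of the kind of fit check such a calculator performs is below. The headroom thresholds and the 12 GB usable-memory figure are illustrative assumptions chosen to mirror this page's labels, not the site's actual logic.

```python
def classify_fit(model_gb: float, usable_gb: float = 12.0) -> str:
    """Map an estimated total footprint to a fit label."""
    headroom = usable_gb - model_gb
    if headroom >= 3.0:
        return "comfortable"
    if headroom >= 1.0:
        return "borderline"
    if headroom >= 0.0:
        return "barely"
    return "does not fit"

print(classify_fit(6.69))   # Llama 3.1 8B Q4 at 8k context -> comfortable
print(classify_fit(11.5))   # a 14B-class Q4 footprint -> barely
```

Raising context length or concurrency pushes the footprint up and can move a model from one band to the next, which is what re-checking in the live calculator catches.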
