Apple M2 16GB

Practical deployment guidance for this machine, using conservative fit estimates instead of marketing-style claims.

Memory

  • Unified memory: 16 GB (shared between CPU and GPU)
  • Bandwidth: 100 GB/s
  • Preferred backend: MLX

Hardware notes

  • Unified memory is shared, so usable model memory is lower than the advertised total.
  • Great for 7B and 8B Q4-style local experiments.
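Because unified memory is shared, the model never sees the full 16 GB. A minimal sketch of a conservative usable-memory estimate is below; the 75% GPU wired-memory fraction and the 2 GB OS/app reservation are assumptions (the commonly reported default Metal working-set limit, which varies by macOS version), not figures from this profile.

```python
def usable_model_gb(total_gb: float, gpu_fraction: float = 0.75,
                    os_and_apps_gb: float = 2.0) -> float:
    """Conservative estimate of memory actually available to a model.

    Takes the smaller of the assumed GPU wired-memory limit and
    whatever is left after the OS and background apps.
    """
    return min(total_gb * gpu_fraction, total_gb - os_and_apps_gb)

print(usable_model_gb(16))  # 12.0 GB on this machine, under these assumptions
```

Under these assumptions, roughly 12 GB of the 16 GB is realistically available for model weights plus KV cache.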

Comfortable fits

Llama 3.1 8B Instruct (comfortable)

Q4 at 8k context is estimated at 6.69 GB total and 6-11 tok/s.
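A figure like 6.69 GB is weights plus KV cache plus runtime overhead. The sketch below shows that arithmetic; the 4.5 bits-per-weight value (typical of Q4_K_M-class quants) and the 0.5 GB overhead are assumptions, not the profile's actual formula, so it lands near but not exactly on 6.69 GB.

```python
def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Quantized weight size in GB (1 GB = 1e9 bytes here)."""
    return params_b * bits_per_weight / 8

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                ctx: int, bytes_per_elem: int = 2) -> float:
    """FP16 KV cache: keys and values for every layer at full context."""
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem / 1e9

# Llama 3.1 8B public config: 32 layers, 8 KV heads, head_dim 128
w = weights_gb(8.03, 4.5)
kv = kv_cache_gb(32, 8, 128, 8192)
total = w + kv + 0.5  # 0.5 GB assumed runtime/compute-buffer overhead
print(f"weights={w:.2f} GB  kv={kv:.2f} GB  total~{total:.2f} GB")
```

The KV cache grows linearly with context length, which is why stretching context is the fastest way to break a fit estimate.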

Borderline fits

Phi-4 (borderline)

This should fit, but headroom is limited. Keep background concurrency low and avoid stretching context further without re-checking the estimate.

Qwen2.5 Coder 14B Instruct (barely)

This is a tight fit: the estimate leaves roughly zero headroom, so expect CPU offload or memory pressure. Consider a lighter quant or a shorter context.

Backend support

These are the runtimes currently associated with this hardware profile.

MLX, Ollama, llama.cpp

Use the live calculator

The calculator lets you change context length, runtime, quantization, and concurrency instead of relying on a fixed profile.
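A minimal sketch of the kind of fit check such a calculator performs is below. The headroom thresholds and the 12 GB usable-memory figure are illustrative assumptions chosen to mirror this page's labels, not the site's actual logic.

```python
def classify_fit(model_gb: float, usable_gb: float = 12.0) -> str:
    """Map an estimated total footprint to a fit label."""
    headroom = usable_gb - model_gb
    if headroom >= 3.0:
        return "comfortable"
    if headroom >= 1.0:
        return "borderline"
    if headroom >= 0.0:
        return "barely"
    return "does not fit"

print(classify_fit(6.69))   # Llama 3.1 8B Q4 at 8k context -> comfortable
print(classify_fit(11.5))   # a 14B-class Q4 footprint -> barely
```

Raising context length or concurrency pushes the footprint up and can move a model from one band to the next, which is what re-checking in the live calculator catches.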
