Apple M3 Max 64GB

Practical deployment guidance for this machine, using conservative fit estimates instead of marketing-style claims.

Memory: 64 GB unified
System RAM: 64 GB
Bandwidth: 400 GB/s
Preferred backend: MLX

Hardware notes

  • A serious local Apple setup with room for larger quantized models.
  • Still not equivalent to a discrete server GPU with 64 GB of VRAM, chiefly because of the lower memory bandwidth.
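The "conservative fit estimates" below boil down to two terms: quantized weight size plus KV cache. A minimal sketch of that arithmetic follows; the formulas, the bits-per-weight value, and the 8 GB OS headroom are illustrative assumptions, not this page's exact calculator logic.

```python
# Conservative fit estimate: quantized weights + KV cache + OS headroom.
# All constants here are illustrative assumptions, not the site's formula.

def weights_gb(params_b: float, bits_per_weight: float) -> float:
    # 1e9 params at (bits / 8) bytes each ~= params_b * bits / 8 in GB
    return params_b * bits_per_weight / 8

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: int = 2) -> float:
    # K and V tensors per layer, fp16 elements by default
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 1e9

def fit(total_gb: float, budget_gb: float = 64.0, headroom_gb: float = 8.0) -> str:
    # Leave fixed headroom for the OS and other apps before calling it comfortable
    if total_gb + headroom_gb <= budget_gb:
        return "comfortable"
    return "barely" if total_gb <= budget_gb else "no fit"

# Llama 3.1 8B at ~4.5 bits/weight, 8k context
# (GQA config: 32 layers, 8 KV heads, head dim 128)
total = weights_gb(8.0, 4.5) + kv_cache_gb(32, 8, 128, 8192)
print(f"{total:.2f} GB -> {fit(total)}")  # 5.57 GB -> comfortable
```

The result lands a little under the 6.69 GB quoted below because real runtimes add compute buffers and allocator overhead on top of weights and cache.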

Comfortable fits

Llama 3.1 8B Instruct (comfortable)
Q4 at 8k context: an estimated 6.69 GB total, 24-41 tok/s.

Phi-4 (comfortable)
Q4 at 8k context: an estimated 10.77 GB total, 13-24 tok/s.

Qwen2.5 Coder 14B Instruct (comfortable)
Q4 at 8k context: an estimated 11.12 GB total, 13-22 tok/s.

Gemma 3 27B IT (comfortable)
Q4 at 8k context: an estimated 19.96 GB total, 7-12 tok/s.

Borderline fits

Llama 3.1 70B Instruct (barely)
A tight fit: expect roughly 1.98 GB of CPU offload or memory pressure, and consider a lighter quant or a shorter context.
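The "lighter quant or shorter context" advice can be quantified with the same back-of-the-envelope math. The bits-per-weight values and fp16 KV cache below are assumptions for illustration, so the totals will not reproduce the profile's 1.98 GB figure exactly.

```python
# Weight + KV-cache totals for a 70B model under two configurations.
# Bits-per-weight values and fp16 KV cache are illustrative assumptions;
# layer/head counts follow Llama 3.1 70B's published config.

def total_gb(params_b, bits, layers, kv_heads, head_dim, ctx):
    weights = params_b * bits / 8                           # quantized weights, GB
    kv = 2 * layers * kv_heads * head_dim * ctx * 2 / 1e9   # fp16 K+V, GB
    return weights + kv

# Q4-class quant at 8k context vs a lighter Q3-class quant at 4k context
for bits, ctx in [(4.5, 8192), (3.5, 4096)]:
    print(f"{bits} bpw, {ctx} ctx: {total_gb(70, bits, 80, 8, 128, ctx):.1f} GB")
```

Under these assumptions, dropping from ~4.5 to ~3.5 bits per weight saves close to 9 GB of weights on a 70B model, and halving the context saves a further ~1.3 GB of KV cache.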

Backend support

These are the runtimes currently associated with this hardware profile.

  • MLX
  • Ollama
  • llama.cpp

Use the live calculator

The calculator lets you change context length, runtime, quantization, and concurrency instead of relying on a fixed profile.
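One knob worth singling out is concurrency: each simultaneous sequence carries its own KV cache, so serving several requests at once multiplies that term. A minimal sketch, using illustrative GQA numbers rather than the calculator's internals:

```python
# KV-cache memory grows linearly with the number of concurrent sequences.
def kv_gb(layers, kv_heads, head_dim, ctx, n_seqs, bytes_per_elem=2):
    return 2 * layers * kv_heads * head_dim * ctx * n_seqs * bytes_per_elem / 1e9

# An 8B-class GQA model (32 layers, 8 KV heads, head dim 128) at 8k context
for n in (1, 4):
    print(f"{n} concurrent seq(s): {kv_gb(32, 8, 128, 8192, n):.2f} GB")
```

At four concurrent 8k-context sequences the cache alone approaches the size of the quantized 8B weights, which is why the concurrency setting matters even on a 64 GB machine.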