Generic x86 CPU Workstation (64 GB RAM)

Practical deployment guidance for this machine, using conservative fit estimates instead of marketing-style claims.

Specifications

  • System RAM: 64 GB
  • Memory bandwidth: 70 GB/s
  • Preferred backend: Ollama

Hardware notes

  • Useful for experiments and low-concurrency workloads.
  • CPU-only inference is usually much slower than GPU or unified-memory deployments.
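The speed ranges quoted below follow from memory bandwidth: on CPU, generating each token streams every active weight from RAM, so bandwidth divided by model size bounds tokens per second. A minimal sketch; the 25% sustained-bandwidth efficiency figure is an assumption for illustration, not a measurement:

```python
# Rough decode-speed ceiling for CPU inference. Each generated token must
# read all active weight bytes from RAM, so memory bandwidth bounds tok/s.

def decode_tps_ceiling(model_bytes_gb: float, bandwidth_gbs: float,
                       efficiency: float = 0.25) -> float:
    """Upper-bound tokens/sec = usable bandwidth / bytes read per token.

    efficiency is an assumed fraction of peak RAM bandwidth that a CPU
    runtime actually sustains (cache misses, threading overhead, etc.).
    """
    return bandwidth_gbs * efficiency / model_bytes_gb

# Llama 3.1 8B at Q4 (~6.7 GB in memory) on this machine's 70 GB/s RAM:
print(round(decode_tps_ceiling(6.7, 70.0), 1))  # → 2.6
```

At an assumed 25% efficiency this lands inside the 1-3 tok/s range quoted for the 8B model; larger models scale the ceiling down proportionally.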

Comfortable fits

  • Llama 3.1 8B Instruct: Q4 at 8k context is estimated at 6.69 GB total and 1-3 tok/s.
  • Phi-4: Q4 at 8k context is estimated at 10.92 GB total and 1-2 tok/s.
  • Qwen2.5 Coder 14B Instruct: Q4 at 8k context is estimated at 11.33 GB total and 1-2 tok/s.
  • Gemma 3 27B IT: Q4 at 8k context is estimated at 20.6 GB total and 1-2 tok/s.
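Totals like these combine the quantized weights with a KV cache sized for the chosen context. A back-of-envelope sketch, using Llama 3.1 8B's published architecture numbers; the ~4.9 GB Q4 weight size and 10% runtime overhead are assumptions, so the result only approximates the figure above:

```python
# Back-of-envelope total-memory estimate: quantized weights + KV cache.
# Layer count, KV-head count, and head dim for Llama 3.1 8B are public
# config values; the 10% runtime overhead is an assumed fudge factor.

def fit_estimate_gb(weights_gb: float, n_layers: int, n_kv_heads: int,
                    head_dim: int, context: int, kv_bytes: int = 2,
                    overhead: float = 0.10) -> float:
    # KV cache: 2 tensors (K and V) per layer, fp16 (2 bytes) by default.
    kv_gb = 2 * n_layers * n_kv_heads * head_dim * context * kv_bytes / 1e9
    return (weights_gb + kv_gb) * (1 + overhead)

# Llama 3.1 8B: ~4.9 GB Q4 weights, 32 layers, 8 KV heads, head dim 128, 8k ctx
print(round(fit_estimate_gb(4.9, 32, 8, 128, 8192), 2))
```

The KV cache alone is about 1 GB at 8k context here, which is why doubling the context window noticeably moves the totals for small models.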

Borderline fits

  • Llama 3.1 70B Instruct: technically possible at Q4, but expect very slow inference and little memory headroom.
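A quick worked estimate shows why the 70B model is borderline on both axes. The bytes-per-parameter and sustained-bandwidth figures below are rough assumptions for illustration:

```python
# Why a 70B model is only a borderline fit here: Q4 weights alone take
# roughly 0.56 bytes/parameter (~4.5 bits incl. quantization scales), and
# decode speed is bandwidth-bound. All figures are rough assumptions.

q4_bytes_per_param = 0.56
weights_gb = 70e9 * q4_bytes_per_param / 1e9   # ~39 GB of weights alone

usable_bw_gbs = 70.0 * 0.25    # assume ~25% of peak RAM bandwidth sustained
tps_ceiling = usable_bw_gbs / weights_gb       # well under 1 tok/s

print(round(weights_gb, 1), round(tps_ceiling, 2))  # → 39.2 0.45
```

Roughly 39 GB of weights plus the KV cache, OS, and runtime overhead leaves limited slack in 64 GB, and the bandwidth ceiling sits below one token per second.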

Backend support

These are the runtimes currently associated with this hardware profile.

  • llama.cpp
  • Ollama

Use the live calculator

The calculator lets you change context length, runtime, quantization, and concurrency instead of relying on a fixed profile.
