Generic x86 CPU Workstation (64 GB RAM)

Practical deployment guidance for this machine, using conservative fit estimates instead of marketing-style claims.

Specifications

  • System RAM: 64 GB
  • Memory bandwidth: 70 GB/s
  • Preferred backend: Ollama

Hardware notes

  • Useful for experiments and low-concurrency workloads.
  • CPU-only inference is usually much slower than GPU or unified-memory deployments.
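The speed ranges quoted below follow from memory bandwidth: on CPU, generating each token streams every active weight from RAM, so bandwidth divided by model size bounds tokens per second. A minimal sketch; the 25% sustained-bandwidth efficiency figure is an assumption for illustration, not a measurement:

```python
# Rough decode-speed ceiling for CPU inference. Each generated token must
# read all active weight bytes from RAM, so memory bandwidth bounds tok/s.

def decode_tps_ceiling(model_bytes_gb: float, bandwidth_gbs: float,
                       efficiency: float = 0.25) -> float:
    """Upper-bound tokens/sec = usable bandwidth / bytes read per token.

    efficiency is an assumed fraction of peak RAM bandwidth that a CPU
    runtime actually sustains (cache misses, threading overhead, etc.).
    """
    return bandwidth_gbs * efficiency / model_bytes_gb

# Llama 3.1 8B at Q4 (~6.7 GB in memory) on this machine's 70 GB/s RAM:
print(round(decode_tps_ceiling(6.7, 70.0), 1))  # → 2.6
```

At an assumed 25% efficiency this lands inside the 1-3 tok/s range quoted for the 8B model; larger models scale the ceiling down proportionally.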

Comfortable fits

  • Llama 3.1 8B Instruct: Q4 at 8k context is estimated at 6.69 GB total and 1-3 tok/s.
  • Phi-4: Q4 at 8k context is estimated at 10.92 GB total and 1-2 tok/s.
  • Qwen2.5 Coder 14B Instruct: Q4 at 8k context is estimated at 11.33 GB total and 1-2 tok/s.
  • Gemma 3 27B IT: Q4 at 8k context is estimated at 20.6 GB total and 1-2 tok/s.
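Totals like these combine the quantized weights with a KV cache sized for the chosen context. A back-of-envelope sketch, using Llama 3.1 8B's published architecture numbers; the ~4.9 GB Q4 weight size and 10% runtime overhead are assumptions, so the result only approximates the figure above:

```python
# Back-of-envelope total-memory estimate: quantized weights + KV cache.
# Layer count, KV-head count, and head dim for Llama 3.1 8B are public
# config values; the 10% runtime overhead is an assumed fudge factor.

def fit_estimate_gb(weights_gb: float, n_layers: int, n_kv_heads: int,
                    head_dim: int, context: int, kv_bytes: int = 2,
                    overhead: float = 0.10) -> float:
    # KV cache: 2 tensors (K and V) per layer, fp16 (2 bytes) by default.
    kv_gb = 2 * n_layers * n_kv_heads * head_dim * context * kv_bytes / 1e9
    return (weights_gb + kv_gb) * (1 + overhead)

# Llama 3.1 8B: ~4.9 GB Q4 weights, 32 layers, 8 KV heads, head dim 128, 8k ctx
print(round(fit_estimate_gb(4.9, 32, 8, 128, 8192), 2))
```

The KV cache alone is about 1 GB at 8k context here, which is why doubling the context window noticeably moves the totals for small models.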

Borderline fits

  • Llama 3.1 70B Instruct: technically possible at Q4, but expect very slow inference and little memory headroom.
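A quick worked estimate shows why the 70B model is borderline on both axes. The bytes-per-parameter and sustained-bandwidth figures below are rough assumptions for illustration:

```python
# Why a 70B model is only a borderline fit here: Q4 weights alone take
# roughly 0.56 bytes/parameter (~4.5 bits incl. quantization scales), and
# decode speed is bandwidth-bound. All figures are rough assumptions.

q4_bytes_per_param = 0.56
weights_gb = 70e9 * q4_bytes_per_param / 1e9   # ~39 GB of weights alone

usable_bw_gbs = 70.0 * 0.25    # assume ~25% of peak RAM bandwidth sustained
tps_ceiling = usable_bw_gbs / weights_gb       # well under 1 tok/s

print(round(weights_gb, 1), round(tps_ceiling, 2))  # → 39.2 0.45
```

Roughly 39 GB of weights plus the KV cache, OS, and runtime overhead leaves limited slack in 64 GB, and the bandwidth ceiling sits below one token per second.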

Backend support

These are the runtimes currently associated with this hardware profile.

  • llama.cpp
  • Ollama

Use the live calculator

The calculator lets you change context length, runtime, quantization, and concurrency instead of relying on a fixed profile.
