Generic x86 CPU Workstation, 64 GB RAM
Practical deployment guidance for this machine, using conservative fit estimates instead of marketing-style claims.
- System RAM
  - 64 GB
- Memory bandwidth
  - ~70 GB/s
- Preferred backend
  - Ollama
Hardware notes
- Useful for experiments and low-concurrency workloads.
- CPU-only inference is usually much slower than GPU or unified-memory deployments.
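The throughput gap comes from memory bandwidth: during decoding, every generated token must stream the full set of weights from RAM once, so bandwidth divided by model size gives a hard ceiling on tokens per second. The numbers below are illustrative assumptions matching this profile, not measurements:

```python
# Back-of-envelope decode-speed ceiling for bandwidth-bound CPU inference.
# Both numbers are assumptions taken from this profile, not benchmarks.

bandwidth_gb_s = 70.0   # approximate system memory bandwidth for this machine
model_size_gb = 6.69    # e.g. an ~8B model quantized to Q4

# Each decoded token reads all weights once, so this is an upper bound.
ceiling_tok_s = bandwidth_gb_s / model_size_gb
print(f"theoretical ceiling: {ceiling_tok_s:.1f} tok/s")  # about 10.5 tok/s
```

Real-world throughput lands well below this ceiling (thread scheduling, cache behavior, prompt processing), which is why the estimates in this profile sit in the 1-3 tok/s range rather than at the theoretical bound.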
Comfortable fits
Llama 3.1 8B Instruct
Q4 at 8k context is estimated at 6.69 GB total and 1-3 tok/s.
Phi-4
Q4 at 8k context is estimated at 10.92 GB total and 1-2 tok/s.
Qwen2.5 Coder 14B Instruct
Q4 at 8k context is estimated at 11.33 GB total and 1-2 tok/s.
Gemma 3 27B IT
Q4 at 8k context is estimated at 20.6 GB total and 1-2 tok/s.
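The totals above combine quantized weights with the KV cache for the chosen context length. A minimal sketch of that arithmetic, using an assumed Llama-3.1-8B-like shape (32 layers, 8 grouped-query KV heads, head dimension 128) and roughly 4.5 effective bits per weight for Q4; none of these numbers are read from an actual model file:

```python
# Rough memory estimate for a quantized decoder-only model:
# quantized weights + KV cache + a fixed runtime overhead allowance.

def estimate_fit_gb(params_billion, bits_per_weight, n_layers, n_kv_heads,
                    head_dim, ctx_len, kv_bytes_per_elem=2, overhead_gb=0.5):
    """Approximate total memory footprint in GB."""
    # 1e9 params * (bits / 8) bytes, expressed directly in GB.
    weights_gb = params_billion * bits_per_weight / 8
    # KV cache: keys + values, one entry per layer per KV head per position,
    # stored at kv_bytes_per_elem (2 bytes ~= fp16).
    kv_gb = 2 * n_layers * n_kv_heads * head_dim * kv_bytes_per_elem * ctx_len / 1e9
    return weights_gb + kv_gb + overhead_gb

# Assumed 8B-class shape at Q4 with an 8k context window.
total = estimate_fit_gb(8.0, 4.5, 32, 8, 128, 8192)
print(f"{total:.2f} GB")  # prints 6.07 GB, in the same ballpark as the table
```

A simple estimator like this cannot match any runtime exactly (quantization mixes, activation buffers, and allocator behavior all shift the total), which is why the profile's figures should be read as conservative planning numbers rather than precise requirements.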
Borderline fits
Llama 3.1 70B Instruct
This is technically possible, but expect slow inference and little safety margin.
Backend support
These are the runtimes currently associated with this hardware profile.
- llama.cpp
- Ollama
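With Ollama as the preferred backend, getting one of the comfortable-fit models running is a two-command affair. The model tag below is illustrative; check the Ollama model library for the exact tag and quantization you want:

```shell
# Assumes Ollama is installed and its daemon is running on this machine.
ollama pull llama3.1:8b   # default builds are typically Q4-quantized
ollama run llama3.1:8b "Summarize the tradeoffs of CPU-only inference."
```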
Use the live calculator
The calculator lets you change context length, runtime, quantization, and concurrency instead of relying on a fixed profile.