NVIDIA

RTX 3060 12GB

Practical deployment guidance for this machine, using conservative fit estimates instead of marketing-style claims.

  • Memory: 12 GB VRAM
  • System RAM: 32 GB
  • Bandwidth: 360 GB/s
  • Preferred backend: Ollama

Hardware notes

  • Good entry point for 7B and 8B local inference.
  • 14B can work with Q4 and short context, but headroom is limited.
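Why the bandwidth figure matters: single-stream decoding is memory-bound, so generating each token streams the full weight set from VRAM, and 360 GB/s puts a hard ceiling on token rate. The sketch below illustrates this; the Q4 weight footprints it uses are assumptions for illustration, not measurements from this profile:

```python
# Rough decode-speed ceiling from memory bandwidth.
# Decoding one token reads every weight once, so:
#   tok/s <= bandwidth / weight_bytes
BANDWIDTH = 360e9  # bytes/s, RTX 3060 12GB

def ceiling_tok_s(weight_gb: float) -> float:
    """Bandwidth-bound upper limit on tokens per second."""
    return BANDWIDTH / (weight_gb * 1e9)

# Assumed Q4 weight footprints (illustrative, not from this profile):
for name, gb in [("8B @ Q4", 4.8), ("14B @ Q4", 8.4)]:
    print(f"{name}: <= {ceiling_tok_s(gb):.0f} tok/s")
```

The 38-65 tok/s quoted below for an 8B Q4 model sits under the roughly 75 tok/s ceiling this gives, which is the expected relationship: kernel overhead and scheduling keep real throughput below the bandwidth bound.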

Comfortable fits

Llama 3.1 8B Instruct (comfortable)

Q4 at 8k context is estimated at 6.69 GB total and 38-65 tok/s.
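A conservative fit estimate like the 6.69 GB figure above can be sketched as quantized weights plus KV cache plus runtime overhead. The constants below (Q4_K_M at roughly 4.8 bits per weight; Llama 3.1 8B with 32 layers, 8 KV heads, head dim 128; 0.8 GiB of runtime buffers) are assumptions for illustration, not the site's actual formula:

```python
# Conservative VRAM estimate: quantized weights + fp16 KV cache + overhead.
GIB = 1024 ** 3

def q4_weight_bytes(params: float, bits_per_weight: float = 4.8) -> float:
    """Quantized weight footprint in bytes (bpw is an assumption)."""
    return params * bits_per_weight / 8

def kv_cache_bytes(layers, kv_heads, head_dim, context, bytes_per_elem=2):
    """fp16 KV cache: two tensors (K and V) per layer."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem

weights = q4_weight_bytes(8.03e9)       # ~4.5 GiB of weights
kv = kv_cache_bytes(32, 8, 128, 8192)   # ~1.0 GiB of KV cache at 8k
overhead = 0.8 * GIB                    # runtime buffers (assumed)

total = weights + kv + overhead
print(f"estimated total: {total / GIB:.2f} GiB")
```

Under these assumed constants the sketch lands in the mid-6 GiB range, the same ballpark as the profile's 6.69 GB estimate; exact agreement would require the calculator's own overhead and bits-per-weight figures.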

Borderline fits

Phi-4 (barely)

This is a tight fit: expect roughly 0.12 GB of CPU offload or memory pressure, and consider a lighter quant or a shorter context.

Qwen2.5 Coder 14B Instruct (barely)

This is a tight fit: expect roughly 0.53 GB of CPU offload or memory pressure, and consider a lighter quant or a shorter context.
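The overflow figures in the two borderline entries above follow a simple shortfall rule: whatever the estimated total exceeds the 12 GB of VRAM by must spill to system RAM as CPU offload. The `required_gb` inputs below are back-computed from the stated overflows and are assumptions, not the calculator's real totals:

```python
def overflow_gb(required_gb: float, vram_gb: float = 12.0) -> float:
    """VRAM shortfall that spills to system RAM (CPU offload), in GB."""
    return max(0.0, round(required_gb - vram_gb, 2))

# Illustrative totals back-computed from the stated overflows:
print(overflow_gb(12.12))  # 0.12 GB shortfall
print(overflow_gb(12.53))  # 0.53 GB shortfall
print(overflow_gb(11.50))  # 0.0 -> fits entirely in VRAM
```

Even a fraction of a gigabyte offloaded to system RAM can cut decode speed sharply, since the offloaded layers run at PCIe and DDR speeds rather than 360 GB/s, which is why a lighter quant or shorter context is usually the better trade.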

Backend support

These are the runtimes currently associated with this hardware profile.

Ollama, llama.cpp, LM Studio, ExLlama

Use the live calculator

The calculator lets you change context length, runtime, quantization, and concurrency instead of relying on a fixed profile.