The Strix Halo platform finally makes mini-PC local inference viable for 70B-class models.
Why unified memory changes everything
Discrete-GPU inference has to shuttle weights and activations across the PCIe bus. Strix Halo's unified memory pool of up to 128 GB, shared between CPU and GPU, removes that transfer entirely. With 96 GB allocated to the GPU, Llama 3 70B at Q4 (roughly 40 GB of weights) fits fully in memory with room to spare for context.
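A back-of-the-envelope check makes the "fits with context to spare" claim concrete. This sketch uses assumed figures, not measurements: roughly 0.57 bytes per weight for a Q4_K_M-style quantization (4-bit weights plus scales and metadata), an fp16 KV cache, an 8K-token context budget, and Llama 3 70B's published geometry (80 layers, 8 KV heads under grouped-query attention, head dimension 128).

```python
# Back-of-the-envelope memory estimate for Llama 3 70B at Q4.
# All constants are assumptions for illustration, not measured values.

PARAMS = 70e9                    # ~70B parameters
BYTES_PER_WEIGHT = 0.57          # ~4.5 bits effective for Q4_K_M-style quant
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128   # Llama 3 70B geometry (GQA)
CONTEXT = 8192                   # tokens of KV cache to budget for
KV_BYTES = 2                     # fp16 cache entries

weights_gb = PARAMS * BYTES_PER_WEIGHT / 1e9
# K and V tensors, per layer, per KV head, per cached token
kv_cache_gb = LAYERS * KV_HEADS * HEAD_DIM * 2 * KV_BYTES * CONTEXT / 1e9

total_gb = weights_gb + kv_cache_gb
print(f"weights ~ {weights_gb:.1f} GB, KV cache ~ {kv_cache_gb:.1f} GB, "
      f"total ~ {total_gb:.1f} GB")
```

Under these assumptions the total lands around 43 GB, comfortably under a 96 GB GPU allocation even with a generous context window.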
LLM throughput
ASUS NUC Pro 14, tokens/sec (Ollama):

Llama 3 8B (Q4):   68.4 tok/s
Llama 3 70B (Q4):  13.2 tok/s
Phi-3 Mini:       142.1 tok/s
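To put the rates above in practical terms, this sketch converts each decode rate into wall-clock generation time for a hypothetical 500-token response (the response length is an assumption, and prompt-processing time is ignored).

```python
# Convert measured decode rates (tok/s) into generation time for one answer.
rates = {
    "Llama 3 8B (Q4)": 68.4,
    "Llama 3 70B (Q4)": 13.2,
    "Phi-3 Mini": 142.1,
}
RESPONSE_TOKENS = 500  # assumed length of a typical answer

for model, tok_s in rates.items():
    print(f"{model}: {RESPONSE_TOKENS / tok_s:.1f} s per 500-token answer")
```

At 13.2 tok/s, a 500-token answer from the 70B model takes about 38 seconds: usable for batch or background work, but noticeably slower than the smaller models for interactive chat.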
Verdict
If you want to run 70B-class models locally without building a dual-RTX rig, the Strix Halo platform is the most pragmatic answer available in 2026.