Who needs this over an RTX 5090?
The RTX 5090 has 32 GB of VRAM. The RTX Pro 6000 has 96 GB — ECC-protected, dual-slot, NVLink-capable. For fine-tuning 70B+ models or running multi-model inference in parallel, the VRAM gap isn’t an inconvenience; it’s a hard constraint.
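A quick back-of-envelope calculation shows why 32 GB is a hard ceiling for a 70B model. This is a rough sketch, not a precise memory model: it assumes ~0.5 bytes per parameter for 4-bit weights and an illustrative 20% overhead for the KV cache, activations, and runtime buffers (actual overhead varies by context length and runtime).

```python
def q4_vram_gb(params_billion: float, overhead: float = 0.20) -> float:
    """Approximate VRAM (GB) to load a 4-bit quantized model.

    Assumes 0.5 bytes/parameter; `overhead` is a rough allowance for
    KV cache, activations, and runtime buffers (illustrative, not measured).
    """
    weight_bytes = params_billion * 1e9 * 0.5  # 4 bits = 0.5 bytes per param
    return weight_bytes * (1 + overhead) / 1e9

print(f"70B Q4: ~{q4_vram_gb(70):.0f} GB")  # well above the 5090's 32 GB
print(f"8B Q4:  ~{q4_vram_gb(8):.1f} GB")   # fits easily on either card
```

Even before any KV cache, 70B parameters at 4 bits is ~35 GB of weights alone, so the model simply cannot load on a 32 GB card without offloading.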
Inference benchmarks
LLM inference throughput, tokens/second (higher is better):

Model            RTX Pro 6000 Blackwell    RTX 5090
Llama 3 70B Q4   38.7                      21.3
Llama 3 8B Q4    118.4                     not reported
The cost reality
At approximately $6,000–$8,000 street price, the RTX Pro 6000 is for research labs, ML engineers, and studios — not enthusiast builds. If 32 GB is enough, the RTX 5090 at $1,599 remains the consumer inference recommendation.