The Bleeding Edge

// Article · May 9, 2026

What it actually costs to build a local LLM workstation in 2026

The RTX 5090, the gotchas, and the math against $300/month in cloud subscriptions

From 2026-W19 · hardware · local-ai · rtx-5090 · practitioner-craft

The question that keeps coming up: "Could I just run my own LLM at home instead of paying $200/month for ChatGPT Pro and another $100/month for Claude Max?" The honest answer is yes, you can — and it's gone from "specialist hobbyist" to "reasonable mid-range PC build" this year. But the math only works for two specific kinds of buyer, and the gotchas are real.

The Card That Matters: NVIDIA RTX 5090

32GB GDDR7 VRAM. MSRP is $1,999, but street pricing has sat at $2,500–$3,800 for most of the year, with custom AIB models hitting $4,500–$4,800 on Newegg and Amazon as of April. The 32GB of VRAM is the entire game: every consumer card below it (5080 at 16GB, 5070 at 12GB, 5060 at 8GB) is a meaningfully worse fit for local LLMs, because none of them can hold a 30B-class model even at 4-bit quantisation.
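Why capacity matters more than anything else comes down to simple arithmetic: parameters times bytes per parameter. The sketch below is a back-of-envelope estimate, not a measurement. It assumes weights dominate memory use, approximates quantised formats with llama.cpp-style bits-per-weight figures (16 for FP16, 8.5 for Q8_0, 4.5 for Q4_0), and reserves an assumed 1.5GB for the runtime. The Llama 3.3 70B shape used for the KV-cache line (80 layers, 8 KV heads, head dimension 128) comes from its published config; the overhead constant is a guess worth tuning for your runtime.

```python
"""Back-of-envelope VRAM check: does a model fit on one card?

Assumptions, not measurements: weights dominate memory use, quantised formats
are approximated by llama.cpp-style bits-per-weight figures, and a fixed
allowance covers the CUDA context and activation buffers.
"""

# Approximate storage cost per weight, in bits.
BITS_PER_WEIGHT = {
    "fp16": 16.0,
    "q8_0": 8.5,  # 8-bit weights + one fp16 scale per 32-weight block
    "q4_0": 4.5,  # 4-bit weights + one fp16 scale per 32-weight block
}

RUNTIME_OVERHEAD_GB = 1.5  # assumed allowance; varies by runtime and settings


def weights_gb(params_billion: float, fmt: str) -> float:
    """Weight size in GB (10**9 bytes): billions of params * bytes per param."""
    return params_billion * BITS_PER_WEIGHT[fmt] / 8


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """Keys and values for every layer and every token, fp16 by default."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value / 1e9


def fits(vram_gb: float, *parts_gb: float) -> bool:
    """True if the memory parts plus the runtime allowance fit in VRAM."""
    return sum(parts_gb) + RUNTIME_OVERHEAD_GB <= vram_gb


if __name__ == "__main__":
    for params, fmt in [(8, "fp16"), (14, "fp16"), (30, "fp16"),
                        (30, "q4_0"), (70, "q4_0")]:
        w = weights_gb(params, fmt)
        print(f"{params}B {fmt}: {w:5.1f} GB | "
              f"16GB card: {fits(16, w)} | 32GB card: {fits(32, w)}")

    # Llama 3.3 70B shape: 80 layers, 8 KV heads, head_dim 128; 8K context.
    kv = kv_cache_gb(layers=80, kv_heads=8, head_dim=128, context_tokens=8192)
    print(f"70B q4_0 weights + 8K KV cache: {weights_gb(70, 'q4_0') + kv:.1f} GB")
```

Run as-is, it reports that a 14B model in FP16 (28GB) fits on a 32GB card but not a 16GB one, that 30B in FP16 needs 60GB, and that 70B at Q4_0 is about 39GB before any KV cache is added.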

What 32GB Actually Buys You

  • Comfortable: Models up to about 14B parameters in FP16 (2 bytes per parameter, so ~28GB of weights), or 30B-class models at 4- to 6-bit quantisation.
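  • Workable only with compromises: Llama 3.3 70B at Q4 needs roughly 35–40GB for its weights alone, so on one card it has to drop to about 2–3-bit quantisation or spill layers into system RAM, and throughput falls sharply once it spills.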
  • Out of reach: Claude Opus 4.7 is closed-weight, so no amount of local hardware will run it, and DeepSeek V4-class frontier models are far too large for any single consumer card. Even Llama 3.3 70B in full FP16 needs ~140GB.
  • Speed reference: ~213 tokens/sec on 8B models, ~61 t/s on 32B, ~$0.06 per million tokens at home if you amortize the build.
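How low the per-token figure comes out depends on what you fold into it. The sketch below makes the inputs explicit: electricity per hour, plus an optional share of the build cost spread over an assumed lifetime. The wall draw, electricity price, build cost, lifetime, and daily hours in the example are assumptions for illustration, not figures from the article. To measure your own speed, Ollama's non-streaming `/api/generate` response reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds), which is all `tokens_per_second` reads; `measure_with_ollama` is a hypothetical helper that assumes a default local install listening on port 11434.

```python
import json
import urllib.request


def tokens_per_second(ollama_response: dict) -> float:
    """Generation speed from Ollama's eval_count and eval_duration (ns) fields."""
    return ollama_response["eval_count"] / (ollama_response["eval_duration"] / 1e9)


def measure_with_ollama(model: str, prompt: str,
                        host: str = "http://localhost:11434") -> float:
    """Run one non-streaming generation on a local Ollama server; return tokens/sec."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return tokens_per_second(json.load(resp))


def hardware_cost_per_hour(build_cost: float, years: float, hours_per_day: float) -> float:
    """Straight-line amortisation of the build over the hours it actually runs."""
    return build_cost / (years * 365 * hours_per_day)


def cost_per_million_tokens(tokens_per_sec: float, usd_per_hour: float) -> float:
    """Dollars per million generated tokens at a given all-in hourly cost."""
    return usd_per_hour / (tokens_per_sec * 3600) * 1e6


if __name__ == "__main__":
    speed = 213.0                 # the article's 8B-model figure, tokens/sec
    power_per_hour = 0.8 * 0.15   # assumed: 0.8 kW at the wall, $0.15/kWh
    hw_per_hour = hardware_cost_per_hour(6_500, years=3, hours_per_day=8)  # assumed

    print(f"electricity only: ${cost_per_million_tokens(speed, power_per_hour):.2f}/M tokens")
    print(f"with hardware:    ${cost_per_million_tokens(speed, power_per_hour + hw_per_hour):.2f}/M tokens")
```

The duty cycle moves the hardware term the most: the fewer hours a day the box works, the larger the share of the purchase price each token carries.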

Total Build Cost — May 2026

Component | Range | Notes
RTX 5090 (32GB) | $2,000–$4,800 | The single biggest line item
AMD Ryzen 9 9950X / Threadripper or Intel Core Ultra 9 | $700–$1,500 | More cores help for fine-tuning, less so for inference
64–128GB DDR5 system RAM | $300–$700 | 64GB is fine for inference; 128GB if you fine-tune
2TB NVMe Gen5 SSD | $200–$400 | Models are big, so buy storage
1200W+ Platinum PSU | $250–$400 | The 5090 alone draws 575W under load
Mid/full tower with airflow | $150–$300 | Heat is the real constraint
Motherboard (X870E or Z890) | $400–$700 | Needs PCIe 5.0 and memory headroom
Total (single 5090) | $5,000–$8,000 |
Multi-GPU 70B-capable build | $6,000–$10,000 | Adds a second card or workstation chassis
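Summing the line items is a quick sanity check on the quoted total. This sketch only adds up the table's own low and high figures; the one substitution is pricing the 5090 at the $2,500–$3,800 street range quoted earlier instead of its full $2,000–$4,800 spread.

```python
# Low/high USD for each line item in the single-5090 table above.
PARTS = {
    "RTX 5090 (32GB)": (2_000, 4_800),
    "CPU": (700, 1_500),
    "DDR5 system RAM": (300, 700),
    "2TB NVMe Gen5 SSD": (200, 400),
    "1200W+ Platinum PSU": (250, 400),
    "Case": (150, 300),
    "Motherboard": (400, 700),
}


def total_range(parts: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low ends and the high ends separately."""
    low = sum(lo for lo, _ in parts.values())
    high = sum(hi for _, hi in parts.values())
    return low, high


def with_price(parts: dict[str, tuple[int, int]], item: str,
               price_range: tuple[int, int]) -> dict[str, tuple[int, int]]:
    """Return a copy of parts with one item's price range replaced."""
    updated = dict(parts)
    updated[item] = price_range
    return updated


if __name__ == "__main__":
    print("table ranges:", total_range(PARTS))
    # Street pricing for the 5090 for most of the year, per the article.
    street = with_price(PARTS, "RTX 5090 (32GB)", (2_500, 3_800))
    print("5090 at street price:", total_range(street))
```

The raw line items span $4,000–$8,800; pricing the card at its street range narrows that to $4,500–$7,800, close to the quoted $5,000–$8,000 total.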

Cost vs. The Cloud

Buying a single H100 outright costs $25,000–$40,000. For inference on models that fit in its 32GB, the 5090 delivers roughly 60–80% of H100 performance for about 5–20% of the price ($2,000–$4,800 against $25,000–$40,000). Renting cloud GPUs for the kind of work a 70B-capable home build handles runs $15,000–$50,000/year, so if you do more than 3–4 hours of GPU-bound work per day, the workstation pays for itself within months.
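"Pays for itself within months" is a break-even calculation: the up-front cost divided by the monthly spend it replaces, less the electricity the box burns. A minimal sketch, assuming a 30-day month, roughly 0.8 kW at the wall under load, $0.15/kWh, and four GPU-bound hours a day. The $8,000 and $6,500 build costs are illustrative picks from within the article's multi-GPU and single-card ranges.

```python
import math

DAYS_PER_MONTH = 30  # simplifying assumption


def monthly_power_cost(hours_per_day: float, system_kw: float, usd_per_kwh: float) -> float:
    """Electricity for a month of use at a given wall draw and tariff."""
    return hours_per_day * DAYS_PER_MONTH * system_kw * usd_per_kwh


def break_even_months(build_cost: float, monthly_cloud_cost: float,
                      monthly_power: float) -> float:
    """Months until avoided cloud spend, net of electricity, repays the build.

    Returns math.inf when electricity costs as much as the cloud bill it replaces.
    """
    monthly_saving = monthly_cloud_cost - monthly_power
    if monthly_saving <= 0:
        return math.inf
    return build_cost / monthly_saving


if __name__ == "__main__":
    # Assumed: 4 GPU-bound hours/day, ~0.8 kW at the wall, $0.15/kWh.
    power = monthly_power_cost(hours_per_day=4, system_kw=0.8, usd_per_kwh=0.15)

    # Low end of the article's cloud GPU rental range: $15,000/year.
    print(f"vs cloud rental:  {break_even_months(8_000, 15_000 / 12, power):.1f} months")
    # The $300/month subscription bundle.
    print(f"vs subscriptions: {break_even_months(6_500, 300, power):.1f} months")
```

Under those assumptions the box repays itself in about 6.5 months against the low end of cloud GPU rental, but takes about 23 months to replace $300/month of subscriptions.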

The Gotchas You Only Learn After You've Spent the Money

  • Power. 575W from the GPU alone means a 1200W PSU is the floor, and a 15A outlet starts to look tight if you're running this 24/7 with the rest of your setup. Some pro builds run 240V dedicated circuits.
  • Heat. The 5090's stock cooler is good; your case airflow probably isn't. Plan for two intake fans + one rear exhaust minimum.
  • VRAM is the wall. 70B models in FP16 don't fit, and Q4 quantisation pushes 35–40GB — over the 32GB limit. Buying for 70B means buying a second card.
  • Driver and tooling friction. Ollama and LM Studio are smooth; serious work in vLLM, SGLang, or fine-tuning still requires comfort with CUDA versions and Python environments.
  • Resale risk. RTX 50-series prices are inflated by AI demand. If a 6090 or a competitive AMD card lands next year at a lower price per GB of VRAM, your $4,500 5090 could quickly become a $2,500 5090 on eBay.
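The power gotcha is easy to check before you buy. This sketch converts component loads into wall draw and compares it with a circuit's continuous-load budget. The 575W GPU figure and the 15A circuit come from the article; the 230W CPU load, 100W for everything else, 90% PSU efficiency, and the common rule of thumb that a continuous load should stay under 80% of the breaker rating are assumptions to replace with your own numbers and local electrical code.

```python
"""Will the workstation fit on one household circuit?

Assumptions, not measurements: the component DC loads below, a PSU efficiency
of about 90% at this load, and the rule of thumb that a continuous load should
stay under 80% of the breaker rating.
"""

CONTINUOUS_FACTOR = 0.8


def wall_draw_watts(dc_loads_w: list[float], psu_efficiency: float) -> float:
    """AC power pulled from the outlet for a given total DC load."""
    return sum(dc_loads_w) / psu_efficiency


def circuit_headroom_w(volts: float, amps: float, wall_w: float) -> float:
    """Continuous-load budget left on the circuit; negative means over budget."""
    return volts * amps * CONTINUOUS_FACTOR - wall_w


if __name__ == "__main__":
    gpu = 575.0   # RTX 5090 under load, from the article
    cpu = 230.0   # assumed: high-end desktop CPU at its package power limit
    rest = 100.0  # assumed: motherboard, RAM, SSD, fans

    for label, loads in [("one 5090", [gpu, cpu, rest]),
                         ("two 5090s", [gpu, gpu, cpu, rest])]:
        wall = wall_draw_watts(loads, psu_efficiency=0.90)
        print(f"{label}: {wall:.0f} W at the wall | "
              f"120V/15A headroom {circuit_headroom_w(120, 15, wall):.0f} W | "
              f"240V/15A headroom {circuit_headroom_w(240, 15, wall):.0f} W")
```

With these assumed loads, one card leaves roughly 430W of continuous headroom on a 120V/15A circuit before monitors and anything else sharing it, while a second card pushes the circuit over budget, which is where a dedicated 240V circuit comes in.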

The Honest Take

The "build instead of subscribe" math works for two specific people:

  1. Developers running tokens through APIs all day for production work, where the cloud bill has crossed $300/month.
  2. Privacy-sensitive users whose work absolutely cannot leave the house — therapists, lawyers, journalists with sources, healthcare.

For everyone else, $200/month for ChatGPT Pro plus $100/month for Claude Max is still the better deal in 2026, because frontier models (Opus 4.7, GPT-5.5) remain meaningfully ahead of anything that fits in 32GB. The local box runs Llama and the smaller DeepSeek models beautifully. It does not run frontier-class models, period.
