// Article · May 9, 2026
What it actually costs to build a local LLM workstation in 2026
The RTX 5090, the gotchas, and the math against $300/month in cloud subscriptions
The question that keeps coming up: "Could I just run my own LLM at home instead of paying $200/month for ChatGPT Pro and another $100/month for Claude Max?" The honest answer is yes, you can — and it's gone from "specialist hobbyist" to "reasonable mid-range PC build" this year. But the math only works for two specific kinds of buyer, and the gotchas are real.
The Card That Matters: NVIDIA RTX 5090
32GB GDDR7 VRAM. MSRP $1,999, but actual street pricing has been $2,500–$3,800 most of the year, with custom AIB models hitting $4,500–$4,800 on Newegg/Amazon as of April. The 32GB VRAM is the entire game — every consumer card below this (5080 at 16GB, 5070 at 12GB, 5060 at 8GB) is a meaningfully worse fit for local LLMs because it can't hold a useful model.
What 32GB Actually Buys You
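- Comfortable: Models up to roughly 13 billion parameters in FP16 with no quantisation tricks, or around 30 billion at 4- to 6-bit quantisation. FP16 costs 2 bytes per parameter, so a 30B model in full FP16 would need ~60GB.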
- Workable with quantisation (Q4): Models up to about 70B. Llama 3.3 70B runs at 15–20 tokens/second, but at Q4 it needs 35–40GB, so part of the model spills to system RAM unless you quantise further.
- Out of reach without multiple GPUs: Claude Opus 4.7-class or DeepSeek V4-class frontier models. Llama 3.3 70B in full FP16 needs ~140GB.
- Speed reference: ~213 tokens/sec on 8B models, ~61 t/s on 32B, ~$0.06 per million tokens at home if you amortize the build.
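
The sizing rule behind the list above is parameter count times bytes per parameter. The following is a minimal sketch of that arithmetic, not a precise memory model: the function names are made up for illustration, and the 2GB `overhead_gb` default is an assumed placeholder for KV cache and runtime buffers, which in practice grow with context length.

```python
def weights_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_param / 8


def fits(params_billions: float, bits_per_param: float,
         vram_gb: float = 32.0, overhead_gb: float = 2.0) -> bool:
    """True if the weights plus an ASSUMED fixed overhead fit in VRAM."""
    return weights_gb(params_billions, bits_per_param) + overhead_gb <= vram_gb


# Model sizes from the article; bit widths are nominal (real Q4 formats
# use slightly more than 4 bits per weight).
for name, size_b, bits in [("8B FP16", 8, 16), ("32B 4-bit", 32, 4),
                           ("70B 4-bit", 70, 4), ("70B FP16", 70, 16)]:
    print(f"{name}: {weights_gb(size_b, bits):.0f} GB weights, "
          f"fits in 32GB: {fits(size_b, bits)}")
```

This reproduces the ~140GB figure for Llama 3.3 70B in FP16 and shows why a 70B model at a nominal 4 bits (35GB of weights) already exceeds a single 32GB card.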
Total Build Cost — May 2026
| Component | Range | Notes |
|---|---|---|
| RTX 5090 (32GB) | $2,000–$4,800 | The single biggest line item |
| AMD Ryzen 9 9950X / Threadripper or Intel Core Ultra 9 | $700–$1,500 | More cores help for fine-tuning, less so for inference |
| 64–128GB DDR5 system RAM | $300–$700 | 64GB is fine for inference; 128GB if you fine-tune |
| 2TB NVMe Gen5 SSD | $200–$400 | Models are big — buy storage |
| 1200W+ Platinum PSU | $250–$400 | The 5090 alone draws 575W under load |
| Mid/full tower with airflow | $150–$300 | Heat is the real constraint |
| Motherboard (X870E or Z890) | $400–$700 | Need PCIe 5.0 + memory headroom |
| Total (single 5090) | $5,000–$8,000 | Typical range; the line items sum to $4,000–$8,800 at the extremes |
| Multi-GPU 70B-capable build | $6,000–$10,000 | Adds a second card or workstation chassis |
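
To sanity-check the single-5090 total, the component ranges in the table can be summed directly. This is a rough sketch using only the table's own figures (the dictionary keys are shortened labels). It gives the all-cheapest and all-priciest extremes, which are wider than the quoted total because few builds pick the bottom or top option on every line.

```python
# (low, high) USD ranges copied from the table above
components = {
    "RTX 5090 (32GB)": (2000, 4800),
    "CPU": (700, 1500),
    "64-128GB DDR5": (300, 700),
    "2TB NVMe Gen5 SSD": (200, 400),
    "1200W+ Platinum PSU": (250, 400),
    "Case": (150, 300),
    "Motherboard": (400, 700),
}

low_total = sum(low for low, _ in components.values())
high_total = sum(high for _, high in components.values())

print(f"All-cheapest build: ${low_total:,}")
print(f"All-priciest build: ${high_total:,}")
```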
Cost vs. The Cloud
A single H100 costs $25,000–$40,000 to buy outright. The 5090 delivers roughly 60–80% of H100 performance for about 5–20% of the price, depending on which street prices you compare. If you do more than 3–4 hours of GPU-bound work per day, the workstation pays for itself within months: renting cloud GPUs for the kind of workload a 70B-capable home build can handle runs $15,000–$50,000/year.
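
The payback claim is simple division, and it is worth running with your own numbers. Below is a minimal sketch, assuming a flat monthly cloud spend and an optional monthly electricity cost; `breakeven_months` is a hypothetical helper, and it ignores resale value, downtime, and your own setup time.

```python
def breakeven_months(build_cost: float, monthly_cloud_spend: float,
                     monthly_power_cost: float = 0.0) -> float:
    """Months until avoided cloud spend covers the build cost.

    Returns infinity when running the box costs as much as the cloud bill.
    """
    saving = monthly_cloud_spend - monthly_power_cost
    if saving <= 0:
        return float("inf")
    return build_cost / saving


# Against the $300/month subscription stack
print(f"{breakeven_months(5000, 300):.1f} months")
# Against the low end of cloud GPU rental ($15,000/year)
print(f"{breakeven_months(8000, 15000 / 12):.1f} months")
# Against the high end of cloud GPU rental ($50,000/year)
print(f"{breakeven_months(5000, 50000 / 12):.1f} months")
```

Against GPU rental the payback lands within months, as stated above. Against $300/month in subscriptions, a $5,000 build takes well over a year, which is why the subscription comparison is the harder case.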
The Gotchas You Only Learn After You've Spent the Money
- Power. 575W from the GPU alone means a 1200W PSU is the floor, and a 15A outlet starts to look tight if you're running this 24/7 with the rest of your setup. Some pro builds run 240V dedicated circuits.
- Heat. The 5090's stock cooler is good; your case airflow probably isn't. Plan for two intake fans + one rear exhaust minimum.
- VRAM is the wall. 70B models in FP16 don't fit, and even at Q4 a 70B model needs 35–40GB, which is over the 32GB limit. Buying for 70B held entirely in VRAM means buying a second card.
- Driver and tooling friction. Ollama and LM Studio are smooth; serious work in vLLM, SGLang, or fine-tuning still requires comfort with CUDA versions and Python environments.
- Resale risk. RTX 50-series prices are inflated by AI demand. If a 6090 or a competitive AMD card lands cheaper-per-VRAM-GB next year, your $4,500 5090 is suddenly a $2,500 5090 on eBay.
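
The power gotcha above can be checked with a quick budget. This is a rough sketch under stated assumptions: a North American 120V/15A circuit, the commonly cited guideline of keeping continuous loads at or below 80% of the breaker rating, and a hypothetical 300W for the rest of the PC. Only the 575W GPU figure comes from the article. PSU conversion losses are ignored, so real wall draw will be somewhat higher.

```python
def circuit_headroom_w(gpu_w: float, rest_of_system_w: float,
                       other_loads_w: float = 0.0,
                       volts: float = 120.0, amps: float = 15.0,
                       continuous_factor: float = 0.8) -> float:
    """Watts left on the circuit after the PC and anything else sharing it.

    continuous_factor is the ASSUMED 80% guideline for loads that run for hours.
    """
    budget_w = volts * amps * continuous_factor
    return budget_w - (gpu_w + rest_of_system_w + other_loads_w)


# 575W GPU (from the article) plus an assumed 300W for CPU, drives and fans
print(f"PC alone: {circuit_headroom_w(575, 300):.0f} W spare")
# Same PC sharing the circuit with an assumed 600W of monitors and other gear
print(f"Shared circuit: {circuit_headroom_w(575, 300, 600):.0f} W spare")
```

A negative result means the circuit is oversubscribed for continuous use, which is the situation where a dedicated circuit starts to make sense.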
The Honest Take
The "build instead of subscribe" math works for two specific kinds of buyer:
- Developers running tokens through APIs all day for production work, where the cloud bill has crossed $300/month.
- Privacy-sensitive users whose work absolutely cannot leave the house — therapists, lawyers, journalists with sources, healthcare.
For everyone else, $200/month for ChatGPT Pro + $100/month for Claude Max is still the better deal in 2026, because the frontier-model gap (Opus 4.7 / GPT-5.5) over what fits in 32GB is still meaningful. The local box runs Llama and DeepSeek beautifully. It does not run frontier-class models, period.