Open Model Delusion Calculator

The weights are free. The cluster to serve them is not. Enter an open-weights LLM's parameters: total and active (MoE) parameter counts, precision (FP4, FP8, BF16, or FP32), context length, and concurrency. The calculator then estimates the NVIDIA GPU cluster needed to self-host the model: GPU counts for H100/H200 PCIe, B200, or GB200 NVL72; node and rack topology; host CPU, RAM, and storage; and an order-of-magnitude hardware cost.
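
To give a feel for the arithmetic behind an estimate like this, here is a minimal Python sketch. It is not the calculator's actual formula, which is not documented here. It assumes memory is the binding constraint and sums weight memory and KV-cache memory. The function name `estimate_gpus`, the `kv_bytes_per_token` input, the `usable_fraction` headroom factor, and the per-GPU memory figures are all illustrative assumptions; check the capacities against vendor datasheets. For MoE models the sketch sizes memory from total parameters, since all experts normally have to be resident, while the active parameter count mainly affects compute per token.

```python
import math

# Bytes per parameter for each precision named above.
BYTES_PER_PARAM = {"FP4": 0.5, "FP8": 1.0, "BF16": 2.0, "FP32": 4.0}

# Nominal memory per GPU in GB. Assumed values for illustration only;
# verify against the vendor datasheet for the exact SKU.
GPU_MEMORY_GB = {"H100": 80, "H200": 141, "B200": 192}


def estimate_gpus(total_params_b, precision, context_len, concurrency,
                  kv_bytes_per_token, gpu="H100", usable_fraction=0.9):
    """Rough memory-bound GPU count (hypothetical helper).

    total_params_b      total parameters in billions (all experts for MoE)
    kv_bytes_per_token  KV-cache bytes per token, which depends on the
                        model architecture (layers, KV heads, head size)
    usable_fraction     share of GPU memory left after runtime overhead
                        (an assumed headroom factor)
    """
    # 1e9 params * bytes per param / 1e9 bytes per GB
    weights_gb = total_params_b * BYTES_PER_PARAM[precision]
    # Worst case: every concurrent sequence fills the full context.
    kv_cache_gb = context_len * concurrency * kv_bytes_per_token / 1e9
    total_gb = weights_gb + kv_cache_gb
    usable_gb = GPU_MEMORY_GB[gpu] * usable_fraction
    return {
        "weights_gb": weights_gb,
        "kv_cache_gb": kv_cache_gb,
        "total_gb": total_gb,
        "gpus": math.ceil(total_gb / usable_gb),
    }


if __name__ == "__main__":
    # Hypothetical 70B dense model, BF16, 8K context, 16 concurrent
    # sequences, and an assumed 320 kB of KV cache per token.
    print(estimate_gpus(70, "BF16", 8192, 16, 320_000, gpu="H100"))
```

A real deployment also has to round the count up to a valid tensor-parallel or node size and leave room for activations, so treat the result as a lower bound.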

Presets cover current open models, including Llama 3.1, Llama 4 Maverick, DeepSeek V3/V4, GLM-5.2, Kimi K2.7, MiniMax M3, Mixtral, and Gemma 4.