Predict Peak VRAM Before Downloading a Model (Weights + KV Cache + Quantization)
Inference | Jan 26, 2026