For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real images. We address this challenge with Hypersim, a photorealistic synthetic dataset for holistic indoor scene understanding.