The Open Community for the Creation and Use of Synthetic Data in AI

A hub for synthetic datasets, papers, code, and people pioneering their use in machine learning.

Featured Synthetic Datasets

Browse published synthetic datasets

SURREAL (Synthetic hUmans foR REAL tasks) is a large-scale person dataset that generates photorealistic synthetic images with labeling for human part segmentation and depth estimation, producing 6.5M frames in 67.5K short clips (about 100 frames each) of 2.6K action sequences with 145 different synthetic subjects. To ensure realism, the synthetic bodies are created using the SMPL body model, whose parameters are fit by the MoSh method given raw 3D MoCap marker data.

We present a new dataset, called Falling Things (FAT), for advancing the state-of-the-art in object detection and 3D pose estimation in the context of robotics. By synthetically combining object models and backgrounds of complex composition and high graphical quality, we are able to generate photorealistic images with accurate 3D pose annotations for all objects in all images. Our dataset contains 60k annotated photos of 21 household objects taken from the YCB dataset. For each image, we provide the 3D poses, per-pixel class segmentation, and 2D/3D bounding box coordinates for all objects. To facilitate testing different input modalities, we provide mono and stereo RGB images, along with registered dense depth images. We describe in detail the generation process and statistical analysis of the data.

For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real images. We address this challenge with Hypersim, a photorealistic synthetic dataset for holistic indoor scene understanding.

Featured Papers & Code

Browse papers and/or code about data generation and training techniques

In this paper, we propose a new assumption, generalized label shift (GLS), to improve robustness against mismatched label distributions.

A survey of domain adaptation with ~80 references

Unsupervised domain adaptation with gradient reversal