The Face Synthetics Dataset by Microsoft

A dataset of 100k synthetic faces with 2D landmark and per-pixel segmentation labels, rendered from a procedurally generated parametric 3D face model combined with a comprehensive library of hand-crafted assets.

Released in: Fake It Till You Make It: Face analysis in the wild using synthetic data alone

Source: Fake It Till You Make It

Summary

The authors demonstrate that face-related computer vision in the wild can be performed using synthetic data alone. Although the community has long enjoyed the benefits of synthesizing training data with graphics, the domain gap between real and synthetic data has remained a problem, especially for human faces.

Researchers have tried to bridge this gap with data mixing, domain adaptation, and domain-adversarial training, but the authors show that it is possible to synthesize data with minimal domain gap, so that models trained on synthetic data generalize to real in-the-wild datasets.

The authors describe how to combine a procedurally generated parametric 3D face model with a comprehensive library of hand-crafted assets to render training images with unprecedented realism and diversity. They train machine learning systems for face-related tasks such as landmark localization and face parsing, showing that synthetic data can both match real data in accuracy and open up new approaches where manual labeling would be impossible.

The authors have released a dataset of 100,000 synthetic faces with 2D landmark and per-pixel segmentation labels, available for non-commercial research purposes.
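As a rough illustration of working with the landmark labels, the sketch below builds per-sample file paths and parses a 2D landmark file. The file naming scheme (a zero-padded frame ID with `_seg.png` and `_ldmks.txt` suffixes) and the landmark text layout (one "x y" pair per line) are assumptions that this page does not confirm, so check them against the microsoft/FaceSynthetics repository before relying on them. The helper names `sample_paths`, `parse_landmarks`, and `landmark_bbox` are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def sample_paths(root: Path, frame_id: int) -> Dict[str, Path]:
    """Build the per-sample paths under an ASSUMED naming scheme."""
    stem = f"{frame_id:06d}"  # assumption: six-digit zero-padded frame ID
    return {
        "image": root / f"{stem}.png",
        "segmentation": root / f"{stem}_seg.png",  # assumed suffix
        "landmarks": root / f"{stem}_ldmks.txt",   # assumed suffix
    }


def parse_landmarks(text: str) -> List[Point]:
    """Parse 2D landmarks, ASSUMING one whitespace-separated 'x y' pair per line."""
    points: List[Point] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 2 values, got {len(parts)}")
        points.append((float(parts[0]), float(parts[1])))
    return points


def landmark_bbox(points: List[Point]) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) enclosing all landmarks."""
    if not points:
        raise ValueError("no landmarks given")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)
```

A typical use would be `parse_landmarks(sample_paths(root, 0)["landmarks"].read_text())`, followed by `landmark_bbox` to get a face crop region for the matching image and segmentation map.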

Images in dataset: 100k

Year released: 2021

Key Links & Stats

microsoft/FaceSynthetics

Fake It Till You Make It

Research Only License

@misc{wood2021fake,
  title={Fake It Till You Make It: Face analysis in the wild using synthetic data alone},
  author={Erroll Wood and Tadas Baltru\v{s}aitis and Charlie Hewitt and Sebastian Dziadzio and Matthew Johnson and Virginia Estellers and Thomas J. Cashman and Jamie Shotton},
  year={2021},
  eprint={2109.15102},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

scenebox

Modalities

  1. Still Image

Verticals

  1. Facial

ML Task

  1. Semantic Segmentation
  2. Instance Segmentation
  3. Human Pose Estimation
  4. Facial Modeling

Related organizations

Microsoft