Kubric: a scalable dataset generator by Google Research

A data generation pipeline for creating semi-realistic synthetic multi-object videos for ML tasks using PyBullet and Blender.

Released in: Kubric: A scalable dataset generator

Source: Kubric: A scalable dataset generator

Contributor:

Summary

The authors state that the amount and quality of training data often matter more for a system's performance than architecture and training details. Collecting, processing, and annotating real data at scale is difficult and expensive, and frequently raises additional privacy, fairness, and legal concerns. Synthetic data is a powerful tool with the potential to overcome these shortcomings. Unfortunately, software tools for effective data generation are less mature than those for architecture design and training, which leads to fragmented generation efforts.

To address these problems, the authors introduce Kubric, an open-source Python framework that interfaces with PyBullet and Blender to generate photo-realistic scenes with rich annotations, and that scales seamlessly to large jobs distributed over thousands of machines, generating TBs of data. They demonstrate its effectiveness by presenting a series of 13 generated datasets for tasks ranging from studying 3D NeRF models to optical flow estimation.

The authors have released Kubric, the assets used, all of the generation code, and the rendered datasets for reuse and modification.

Year Released

2022

Key Links & Stats

google-research/kubric

Kubric: A scalable dataset generator

Apache License 2.0

Kubric: A scalable dataset generator

@article{greff2021kubric,
  title = {Kubric: a scalable dataset generator},
  author = {Klaus Greff and Francois Belletti and Lucas Beyer and Carl Doersch and Yilun Du and Daniel Duckworth and David J Fleet and Dan Gnanapragasam and Florian Golemo and Charles Herrmann and Thomas Kipf and Abhijit Kundu and Dmitry Lagun and Issam Laradji and Hsueh-Ti (Derek) Liu and Henning Meyer and Yishu Miao and Derek Nowrouzezahrai and Cengiz Oztireli and Etienne Pot and Noha Radwan and Daniel Rebain and Sara Sabour and Mehdi S. M. Sajjadi and Matan Sela and Vincent Sitzmann and Austin Stone and Deqing Sun and Suhani Vora and Ziyu Wang and Tianhao Wu and Kwang Moo Yi and Fangcheng Zhong and Andrea Tagliasacchi},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2022},
}

ML Tasks

  1. Depth Estimation
  2. Image Generation
  3. Instance Segmentation

ML Platform

  1. Not Applicable

Modalities

  1. Video
  2. RGB-D
  3. 3D Asset

Verticals

  1. Synthetic Media & Art

CG Platform

  1. Blender

Related organizations

Google Research

University of Toronto

McGill University, Mila, MIT, DeepMind, UBC, University of Cambridge, ServiceNow, Haiper, Simon Fraser University