Speech-Driven Tongue Animation

Automated tongue animations that match recorded speech

Released in: Speech-Driven Tongue Animation

Contributor:

Summary

Advances in speech-driven animation techniques now allow creating convincing animations of virtual characters solely from audio data. While many approaches focus on facial and lip motion, they often do not provide realistic animation of the inner mouth. Performance or motion capture of the tongue and jaw from video alone is difficult because the inner mouth is only partially observable during speech. In this work, the authors collected a large-scale speech-to-tongue mocap dataset that captures tongue, jaw, and lip motion during speech. This dataset enables research on data-driven techniques for realistic inner-mouth animation. The work presents a method that leverages recent deep-learning-based audio feature representations to build a robust and generalizable speech-to-animation pipeline. The authors find that self-supervised, deep-learning-based audio feature encoders are robust and generalize well to unseen speakers and content. To demonstrate the practical application of their approach, they show animations on a high-quality parametric 3D face model driven by the landmarks generated by the speech-to-tongue animation method.
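The core mapping described above, from per-frame audio feature vectors to tongue, jaw, and lip landmark positions, can be illustrated with a deliberately simple stand-in. The sketch below is not the authors' method: it swaps their deep network for ridge regression over a sliding context window of features. It assumes the audio has already been encoded into one feature vector per animation frame (for example by a self-supervised encoder) and that landmarks are stored as flattened x, y, z coordinates. All function names, window sizes, and the synthetic data are hypothetical.

```python
import numpy as np


def stack_context(features: np.ndarray, radius: int) -> np.ndarray:
    """Concatenate each frame's features with its +/- radius neighbours.

    Hypothetical helper: edge frames are padded by repeating the first/last frame.
    Input shape (T, D) -> output shape (T, D * (2 * radius + 1)).
    """
    padded = np.pad(features, ((radius, radius), (0, 0)), mode="edge")
    num_frames = features.shape[0]
    # Window i holds the features at time offset (i - radius) for every frame.
    windows = [padded[i : i + num_frames] for i in range(2 * radius + 1)]
    return np.concatenate(windows, axis=1)


def fit_ridge(x: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form ridge regression with an unregularized bias term."""
    xb = np.hstack([x, np.ones((x.shape[0], 1))])
    reg = lam * np.eye(xb.shape[1])
    reg[-1, -1] = 0.0  # do not penalize the bias column
    return np.linalg.solve(xb.T @ xb + reg, xb.T @ y)


def predict(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Apply the fitted linear map (with bias) to context-stacked features."""
    xb = np.hstack([x, np.ones((x.shape[0], 1))])
    return xb @ weights


# Synthetic stand-in data: 200 frames of 16-D "audio features" and
# 10 landmarks x 3 coordinates generated by a known linear map.
rng = np.random.default_rng(0)
feats = rng.standard_normal((200, 16))
ctx = stack_context(feats, radius=2)  # shape (200, 80)
true_w = rng.standard_normal((ctx.shape[1], 30))
landmarks = ctx @ true_w

w = fit_ridge(ctx, landmarks, lam=1e-6)
pred = predict(ctx, w)
print(pred.shape, np.abs(pred - landmarks).max() < 1e-3)
```

The context window lets each predicted frame depend on neighbouring audio, which matters because articulators move ahead of and behind the sound they produce. A learned nonlinear model, as used in the paper, would replace the linear map while keeping the same frame-aligned input/output layout.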

Year Released

2022

Key Links & Stats

SpeechDrivenTongueAnimation

@inproceedings{medina2022speechtongue,
  title={Speech Driven Tongue Animation},
  author={Medina, Salvador and Tomé, Denis and Stoll, Carsten and Tiede, Mark and Munhall, Kevin and Hauptmann, Alex and Matthews, Iain},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022},
  organization={IEEE/CVF}
}

ML Tasks

Face Animation
Facial Modeling

ML Platform

PyTorch

Modalities

Video
3D Asset

Verticals

Facial
Digital Human

CG Platform

Not Applicable

Related organizations

Carnegie Mellon University

Epic Games