The authors aim to reconstruct hand-held objects from a single RGB image. In contrast to prior works that typically assume a known 3D template and reduce the problem to 3D pose estimation, their approach reconstructs generic hand-held objects without knowing their 3D templates. The key insight is that hand articulation is highly predictive of object shape, so they propose an approach that reconstructs the object conditioned on both the articulation and the visual input.
Given an image depicting a hand-held object, they first use off-the-shelf systems to estimate the underlying hand pose and then infer the object shape in a normalized hand-centric coordinate frame. The object is parameterized by a signed distance field, inferred by an implicit network that combines visual features with articulation-aware coordinates to process each query point. Experiments across three datasets show that their method consistently outperforms baselines and reconstructs a diverse set of objects. The authors also analyze the benefits and robustness of explicit articulation conditioning, and show that it allows the hand pose estimate to be further improved via test-time optimization.
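A minimal sketch of the conditioning idea, not the authors' implementation: an implicit network predicts the signed distance of a 3D query point from a pixel-aligned visual feature and from articulation-aware coordinates, i.e. the query point expressed in the local frames of the estimated hand joints. All module names, shapes, and dimensions below are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ArticulationConditionedSDF(nn.Module):
    """Illustrative sketch: SDF conditioned on visual features and hand articulation."""

    def __init__(self, num_joints=16, feat_dim=256, hidden=512):
        super().__init__()
        # Query point (3) plus its coordinates in every joint's local frame (3 per joint).
        self.point_dim = 3 + 3 * num_joints
        self.mlp = nn.Sequential(
            nn.Linear(self.point_dim + feat_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),  # signed distance at the query point
        )

    def forward(self, query_pts, joint_transforms, pixel_feats):
        """
        query_pts:        (B, N, 3) points in the normalized hand-centric frame
        joint_transforms: (B, J, 4, 4) rigid transforms into each joint's local frame,
                          derived from the estimated hand pose (e.g. MANO joints)
        pixel_feats:      (B, N, feat_dim) visual features sampled at each query
                          point's image projection (pixel-aligned features)
        """
        B, N, _ = query_pts.shape
        J = joint_transforms.shape[1]

        # Articulation-aware coordinates: express each query point in the
        # local coordinate frame of every hand joint.
        homo = torch.cat([query_pts, torch.ones_like(query_pts[..., :1])], dim=-1)  # (B, N, 4)
        local = torch.einsum('bjkl,bnl->bnjk', joint_transforms, homo)              # (B, N, J, 4)
        local = local[..., :3].reshape(B, N, J * 3)

        x = torch.cat([query_pts, local, pixel_feats], dim=-1)
        return self.mlp(x).squeeze(-1)  # (B, N) signed distances
```

Because the signed distance depends differentiably on the hand-pose-derived transforms, the same conditioning is what allows the hand pose to be refined by test-time optimization, as discussed above.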
Title: What's in your hands? 3D Reconstruction of Generic Objects in Hands
Year released: 2022
Code: JudyYe/ihoi (GitHub)
@inproceedings{ye2022hand,
  author    = {Ye, Yufei and Gupta, Abhinav and Tulsiani, Shubham},
  title     = {What's in your hands? 3D Reconstruction of Generic Objects in Hands},
  booktitle = {CVPR},
  year      = {2022}
}