¹University of Science and Technology of China, ²Zhejiang University
In this work, we propose StereoPIFu, which integrates the geometric constraints of stereo vision with the implicit function representation of PIFu to recover the 3D shape of a clothed human from a pair of low-cost rectified images. First, we introduce effective voxel-aligned features from a stereo vision-based network to enable depth-aware reconstruction. In addition, a novel relative z-offset is employed to associate the predicted high-fidelity human depth with occupancy inference, which helps restore fine-level surface details. Second, we design a network structure that fully utilizes the geometry information from the stereo images to improve the quality of human body reconstruction. Consequently, StereoPIFu can naturally infer the human body's spatial location in camera space and maintain the correct relative position of different body parts, which enables our method to capture human performance. Extensive experimental results demonstrate that, compared to previous works, StereoPIFu significantly improves the robustness, completeness, and accuracy of clothed human reconstruction.
Overview of our StereoPIFu pipeline. Given a stereo pair, for a query point P, its pixel-aligned feature, voxel-aligned features, and relative z-offset are constructed. These features encode the information about whether P is inside the underlying surface or not and are used for inferring the occupancy of P by the MLP.
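To make the relative z-offset concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a pinhole camera with focal length `focal_px` and principal point `(cx, cy)`, converts a rectified-stereo disparity map to depth with the standard relation z = f·B/d, and takes the offset to be the query point's z-coordinate minus the predicted depth at the pixel it projects to. The function names, the nearest-pixel lookup, and the sign convention are all assumptions introduced for illustration.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline):
    """Convert a rectified-stereo disparity map (pixels) to depth: z = f * B / d."""
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0  # zero or negative disparity has no finite depth
    depth[valid] = focal_px * baseline / disparity[valid]
    return depth


def relative_z_offset(point_cam, depth_map, focal_px, cx, cy):
    """Hypothetical relative z-offset for a query point in camera space.

    Assumption: offset = z of the query point minus the predicted depth at
    the nearest pixel to its projection. Under this convention a positive
    value means the point lies behind the predicted visible surface.
    Returns None if the point projects outside the depth map.
    """
    x, y, z = point_cam
    # Pinhole projection of the query point onto the image plane.
    u = int(round(focal_px * x / z + cx))
    v = int(round(focal_px * y / z + cy))
    h, w = depth_map.shape
    if not (0 <= u < w and 0 <= v < h):
        return None
    return z - depth_map[v, u]


# Toy example: constant disparity of 10 px, f = 500 px, baseline = 0.1 m.
depth_map = disparity_to_depth(np.full((4, 4), 10.0), focal_px=500.0, baseline=0.1)
offset = relative_z_offset((0.0, 0.0, 5.2), depth_map, focal_px=500.0, cx=2.0, cy=2.0)
```

In a pipeline like the one in the overview, a scalar of this kind would be passed to the MLP alongside the pixel-aligned and voxel-aligned features of P; how the features are combined is not specified in this text.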
@inproceedings{yang2021stereopifu,
author = {Yang Hong and Juyong Zhang and Boyi Jiang and Yudong Guo and Ligang Liu and Hujun Bao},
title = {StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision},
booktitle = {{IEEE/CVF} Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2021}
}