Innovative domain-adaptive method enables 3D face reconstruction from single depth images

Feb 29, 2024 - 18:00

Reconstructing a 3D face from images is crucial for digital face modeling and manipulation. Traditional methods depend predominantly on RGB images, which are sensitive to lighting variations and carry only 2D information. Depth images, in contrast, are resistant to lighting changes and capture 3D data directly, which makes them a promising basis for robust reconstruction. Recent studies have turned to deep learning for more robust reconstruction from depth data; however, the scarcity of real depth images with accurate 3D facial labels has hindered training. Attempts to train on auto-synthesized data instead have generalized poorly to real-world scenarios because of the gap between the synthetic and real domains.

The main pipeline of the proposed 3D face reconstruction method.

Credit: Xiaoxu CAI, Jianwen LOU, Jiajun BU, Junyu DONG, Haishuai WANG, Hui YU

To address these challenges, a research team led by Xiaoxu Cai published their findings on 15 Feb 2024 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.

The team proposes a domain-adaptive reconstruction method that trains deep networks on a combination of auto-labeled synthetic data and unlabeled real data, so that a 3D face can be reconstructed from a single depth image captured in the real world. The method uses two domain-adaptive neural networks, one predicting head pose and the other predicting facial shape, and each is trained with a strategy suited to its task: the head pose network is trained with straightforward fine-tuning, whereas the facial shape network is trained with a more robust adversarial domain adaptation approach.

As a preprocessing step, the pixel values of the depth image are converted into 3D point coordinates in camera space, which allows the reconstruction network to process 3D geometric information with 2D convolutions. The network outputs 3D vertex offsets, which gives a more focused target distribution and makes learning easier.
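The preprocessing step, which turns depth pixels into camera-space points, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a standard pinhole camera model, and the function name `depth_to_point_map` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for illustration. The result keeps the image grid layout, with three channels per pixel, which is what would let ordinary 2D convolutions operate on 3D geometry.

```python
import numpy as np


def depth_to_point_map(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth image into an (H, W, 3) map of camera-space points.

    Assumption: pinhole camera. fx, fy (focal lengths) and cx, cy (principal
    point) are hypothetical intrinsics in pixels, not values from the paper.
    Pixels with depth 0 (no measurement) map to the point (0, 0, 0).
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    # u indexes columns (x direction), v indexes rows (y direction).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    # Stack into a 3-channel "image" that preserves the pixel grid.
    return np.stack([x, y, depth], axis=-1)


# Toy 2x2 depth image in millimetres; the top-left pixel has no measurement.
depth = np.array([[0.0, 500.0],
                  [500.0, 1000.0]])
points = depth_to_point_map(depth, fx=500.0, fy=500.0, cx=0.5, cy=0.5)
print(points.shape)  # (2, 2, 3)
```

Because the output has the same height and width as the input depth image, it can be passed to a convolutional network in the same way as a three-channel RGB image, with the channels holding x, y and z instead of color.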

Evaluated on challenging real-world datasets, the method performs competitively with state-of-the-art techniques.

DOI: 10.1007/s11704-023-3541-7