Speech-aided facial video super resolution with accurate lip motion and enhanced frequency details
ORCID
- Vivek Singh: 0000-0003-1728-1198
Abstract
Despite recent breakthroughs in face hallucination, video face hallucination remains challenging because the output must stay consistent across frames. The temporal dimension makes it difficult to learn facial motion and to maintain color uniformity throughout the sequence. To address these challenges, we propose a novel video face hallucination network supported by cross-modal audio-visual cues. By exploiting the correlation between the movement of the facial structure and the associated speech signal, the framework learns fine spatiotemporal motion patterns. Another significant challenge common to face hallucination in general is blurriness around key facial regions such as the mouth and lips. These regions undergo larger spatial displacements, which makes their recovery from low-resolution images particularly difficult. The proposed approach therefore defines an explicit lip-reading loss to learn the fine-grained motion in these regions. Further, GANs tend during training to overfit to small frequency bands, so hard-to-synthesize frequencies go missing from the output. As a remedy, we introduce a frequency-based loss function that compels the model to capture salient frequency features. Visual and quantitative comparisons with state-of-the-art methods demonstrate significant improvements in visual quality as well as higher coherence of the generated outputs across successive frames.
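The abstract does not give the form of the frequency-based loss. As a rough illustration only, the sketch below compares the 2D discrete Fourier transforms of a generated frame and its ground truth and averages the magnitude of the complex difference over all frequencies. It is not the authors' formulation: the names `dft2` and `frequency_loss` are hypothetical, inputs are assumed to be small single-channel frames given as nested lists, and a practical version would use an FFT on batched tensors.

```python
import cmath
import math
from typing import List

Image = List[List[float]]  # hypothetical: one grayscale frame, rows of pixel values


def dft2(img: Image) -> List[List[complex]]:
    """Naive 2D discrete Fourier transform; O(H^2 * W^2), fine for tiny patches."""
    h, w = len(img), len(img[0])
    spectrum = []
    for u in range(h):
        row = []
        for v in range(w):
            acc = 0j
            for x in range(h):
                for y in range(w):
                    angle = -2.0 * math.pi * (u * x / h + v * y / w)
                    acc += img[x][y] * cmath.exp(1j * angle)
            row.append(acc)
        spectrum.append(row)
    return spectrum


def frequency_loss(pred: Image, target: Image) -> float:
    """Mean |F_pred(u, v) - F_target(u, v)| over all frequencies (u, v).

    Comparing complex coefficients keeps phase, so a spatial shift is penalized;
    a magnitude-only comparison would ignore circular shifts entirely.
    """
    fp, ft = dft2(pred), dft2(target)
    h, w = len(pred), len(pred[0])
    total = sum(abs(fp[u][v] - ft[u][v]) for u in range(h) for v in range(w))
    return total / (h * w)
```

For example, `frequency_loss([[1.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [0.0, 0.0]])` is approximately 1.0, because the shifted impulse changes the phase at half of the frequencies. By Parseval's theorem, an unweighted *squared* spectral distance would reduce to a scaled pixel-wise MSE, which is why this sketch uses absolute differences; frequency losses aimed at hard-to-synthesize frequencies typically also add per-frequency weights that emphasize the coefficients the generator reproduces worst.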
Publication Date
2025-01-01
Publication Title
Machine Vision and Applications
Volume
36
Issue
4
ISSN
0932-8092
Embargo Period
2026-05-09
Keywords
Cross-modality, Face hallucination, Fourier transform, Generative adversarial networks, Speech recognition
Recommended Citation
Sharma, S., Singh, V., Dhall, A., & Kumar, V. (2025) 'Speech-aided facial video super resolution with accurate lip motion and enhanced frequency details', Machine Vision and Applications, 36(4). Available at: https://doi.org/10.1007/s00138-025-01699-4
This item is under embargo until 09 May 2026