Show simple item record

dc.contributor.author  Otto, Christopher  en_US
dc.contributor.author  Naruniec, Jacek  en_US
dc.contributor.author  Helminger, Leonhard  en_US
dc.contributor.author  Etterlin, Thomas  en_US
dc.contributor.author  Mignone, Graziana  en_US
dc.contributor.author  Chandran, Prashanth  en_US
dc.contributor.author  Zoss, Gaspard  en_US
dc.contributor.author  Schroers, Christopher  en_US
dc.contributor.author  Gross, Markus  en_US
dc.contributor.author  Gotardo, Paulo  en_US
dc.contributor.author  Bradley, Derek  en_US
dc.contributor.author  Weber, Romann  en_US
dc.contributor.editor  Umetani, Nobuyuki  en_US
dc.contributor.editor  Wojtan, Chris  en_US
dc.contributor.editor  Vouga, Etienne  en_US
dc.date.accessioned  2022-10-04T06:42:04Z
dc.date.available  2022-10-04T06:42:04Z
dc.date.issued  2022
dc.identifier.issn  1467-8659
dc.identifier.uri  https://doi.org/10.1111/cgf.14705
dc.identifier.uri  https://diglib.eg.org:443/handle/10.1111/cgf14705
dc.description.abstract  Face swapping is the process of applying a source actor's appearance to a target actor's performance in a video. This is a challenging visual effect that has seen increasing demand in film and television production. Recent work has shown that data-driven methods based on deep learning can produce compelling effects at production quality in a fraction of the time required for a traditional 3D pipeline. However, the dominant approach operates only on 2D imagery without reference to the underlying facial geometry or texture, resulting in poor generalization under novel viewpoints and little artistic control. Methods that do incorporate geometry rely on pre-learned facial priors that do not adapt well to particular geometric features of the source and target faces. We approach the problem of face swapping from the perspective of learning simultaneous convolutional facial autoencoders for the source and target identities, using a shared encoder network with identity-specific decoders. The key novelty in our approach is that each decoder first lifts the latent code into a 3D representation, comprising a dynamic face texture and a deformable 3D face shape, before projecting this 3D face back onto the input image using a differentiable renderer. The coupled autoencoders are trained only on videos of the source and target identities, without requiring 3D supervision. By leveraging the learned 3D geometry and texture, our method achieves face swapping with higher quality than when using off-the-shelf monocular 3D face reconstruction, and overall lower FID score than state-of-the-art 2D methods. Furthermore, our 3D representation allows for efficient artistic control over the result, which can be hard to achieve with existing 2D approaches.  en_US
dc.publisher  The Eurographics Association and John Wiley & Sons Ltd.  en_US
dc.subject  CCS Concepts: Computing methodologies → Image manipulation; Rendering; Neural Networks
dc.subject  Computing methodologies → Image manipulation
dc.subject  Rendering
dc.subject  Neural Networks
dc.title  Learning Dynamic 3D Geometry and Texture for Video Face Swapping  en_US
dc.description.seriesinformation  Computer Graphics Forum
dc.description.sectionheaders  Digital Human
dc.description.volume  41
dc.description.number  7
dc.identifier.doi  10.1111/cgf.14705
dc.identifier.pages  611-622
dc.identifier.pages  12 pages
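
The abstract describes a coupled-autoencoder design: a shared convolutional encoder feeds identity-specific decoders, and each decoder lifts the latent code into a dynamic face texture plus a deformable 3D face shape before a differentiable renderer projects the result back onto the input frame; swapping then amounts to encoding the target performance and decoding with the source identity's decoder. Below is a minimal PyTorch-style sketch of that layout only; every layer size, module name, and the vertex count are assumptions for illustration, not details from the paper, and the differentiable-rendering step is left as a comment.

import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Maps a face crop to a latent code shared by both identities."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, image):
        return self.net(image)

class IdentityDecoder(nn.Module):
    """Lifts a latent code into a dynamic texture and 3D vertex offsets.
    The texture resolution and vertex count are arbitrary stand-ins."""
    def __init__(self, latent_dim=256, n_vertices=5023, tex_res=64):
        super().__init__()
        self.tex_res = tex_res
        self.texture_head = nn.Linear(latent_dim, 3 * tex_res * tex_res)
        self.shape_head = nn.Linear(latent_dim, 3 * n_vertices)

    def forward(self, z):
        b = z.shape[0]
        texture = self.texture_head(z).view(b, 3, self.tex_res, self.tex_res)
        vertices = self.shape_head(z).view(b, -1, 3)
        return texture, vertices

# Face swap at inference: encode a frame of the target performance with the
# shared encoder, then decode with the *source* identity's decoder.
encoder = SharedEncoder()
decoders = {"source": IdentityDecoder(), "target": IdentityDecoder()}
target_frame = torch.randn(1, 3, 256, 256)  # stand-in for a video frame
z = encoder(target_frame)
texture, shape = decoders["source"](z)
# A differentiable renderer (e.g. PyTorch3D's rasterizer) would project
# (texture, shape) back onto the frame; that step is omitted here.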



This item appears in the following Collection(s)

  • 41-Issue 7
    Pacific Graphics 2022 - Symposium Proceedings
