FISTNet: FusIon of STyle-path generative Networks for facial style transfer
Fortino, Giancarlo;
2024-01-01
Abstract
With the surge in emerging technologies such as the Metaverse, spatial computing, and generative AI, facial style transfer has gained considerable interest from researchers and startup enthusiasts alike. StyleGAN methods have paved the way for transfer-learning strategies that reduce the dependency on large volumes of training data. However, StyleGAN methods tend to be imbalanced, which results in the introduction of artifacts in the facial images. Studies such as DualStyleGAN proposed multipath networks, but these require the network to be trained for a specific style rather than generating a fusion of facial styles simultaneously. In this paper, we propose a Fusion of STyles (FIST) network for facial images that leverages pretrained multipath style transfer networks to address the lack of large training data volumes and to enable the fusion of multiple styles at the output. We leverage pretrained StyleGAN networks with an external style pass that uses a residual modulation block instead of a transform coding block. The method also preserves facial structure, identity, and details via a gated mapping unit introduced in this study. These components enable us to train the network with minimal data while generating high-quality stylized images, opening up new possibilities for facial style transfer in emerging technologies. Our training process adopts a curriculum learning strategy to perform efficient and flexible style and model fusion in the generative space. We perform extensive experiments to demonstrate the superiority of the proposed FISTNet over existing state-of-the-art methods.
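The abstract names two architectural components without detailing them: a residual modulation block that injects an external style into the pretrained generator path, and a gated mapping unit that preserves facial structure and identity. The sketch below is an illustrative reading of those ideas, not the paper's published code; all layer names, dimensions, and the gating scheme are assumptions chosen for demonstration.

```python
# Hypothetical sketch of the two components named in the abstract.
# Assumptions (not from the paper): AdaIN-style per-channel modulation,
# a sigmoid gate over concatenated latents, 512-d style/latent codes.
import torch
import torch.nn as nn


class ResidualModulationBlock(nn.Module):
    """Modulates intermediate generator features with an external style code
    and adds the result back as a residual, leaving the pretrained StyleGAN
    path intact when the block's contribution is small."""

    def __init__(self, channels: int, style_dim: int):
        super().__init__()
        # Per-channel scale and shift predicted from the style code.
        self.to_scale = nn.Linear(style_dim, channels)
        self.to_shift = nn.Linear(style_dim, channels)
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        scale = self.to_scale(style).unsqueeze(-1).unsqueeze(-1)
        shift = self.to_shift(style).unsqueeze(-1).unsqueeze(-1)
        modulated = self.conv(x) * (1 + scale) + shift
        return x + modulated  # residual connection preserves the base path


class GatedMappingUnit(nn.Module):
    """Learns a gate in [0, 1] deciding, per latent dimension, how much of
    the external style replaces the identity-bearing content latent."""

    def __init__(self, latent_dim: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(latent_dim * 2, latent_dim),
            nn.Sigmoid(),
        )

    def forward(self, content_latent: torch.Tensor,
                style_latent: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([content_latent, style_latent], dim=-1))
        # g -> 0 keeps identity/structure, g -> 1 adopts the external style.
        return (1 - g) * content_latent + g * style_latent


if __name__ == "__main__":
    block = ResidualModulationBlock(channels=64, style_dim=512)
    gate = GatedMappingUnit(latent_dim=512)
    feats = torch.randn(2, 64, 32, 32)    # intermediate generator features
    style = torch.randn(2, 512)           # external style code
    content = torch.randn(2, 512)         # content/identity latent
    print(block(feats, style).shape)      # torch.Size([2, 64, 32, 32])
    print(gate(content, style).shape)     # torch.Size([2, 512])
```

Under this reading, fusing multiple styles would amount to blending several style codes (or gating against several style latents) before the external pass, which is consistent with the abstract's claim of flexible style and model fusion in the generative space.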