A Novel Double-Tail Generative Adversarial Network for Fast Photo Animation

Gang LIU; Xin CHEN; Zhixiang GAO

doi:10.1587/transinf.2023EDP7061

IEICE TRANSACTIONS on Information

Summary :

Photo animation transforms photos of real-world scenes into anime-style images and is a challenging task in AIGC (AI-Generated Content). Although previous methods have achieved promising results, they often introduce noticeable artifacts or distortions. In this paper, we propose a novel double-tail generative adversarial network (DTGAN) for fast photo animation. DTGAN is the third version of the AnimeGAN series and is therefore also called AnimeGANv3. The generator of DTGAN has two output tails: a support tail that outputs coarse-grained anime-style images and a main tail that refines them. We also propose a novel learnable normalization technique, termed linearly adaptive denormalization (LADE), to prevent artifacts in the generated images. To improve the visual quality of the generated anime-style images, we propose two novel loss functions suited to photo animation: 1) the region smoothing loss, which weakens the texture details of the generated images to achieve anime effects with abstract details; and 2) the fine-grained revision loss, which eliminates artifacts and noise in the generated anime-style images while preserving clear edges. Furthermore, the generator of DTGAN is lightweight, with only 1.02 million parameters in the inference phase. DTGAN can be trained end-to-end with unpaired training data. Extensive qualitative and quantitative experiments demonstrate that our method produces high-quality anime-style images from real-world photos and performs better than state-of-the-art models.
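
The summary names LADE but does not define it. The sketch below shows one plausible reading, offered only as an illustration and not as the authors' implementation: each feature channel is instance-normalized, then rescaled and shifted using the mean and standard deviation of a pointwise linear (1x1 convolution style) transform of the same features. The function names, the `EPS` value, the flat-list feature layout, and the choice of a bias-free channel-mixing matrix are all assumptions made for this example.

```python
import math

EPS = 1e-5  # small stability constant (assumed value, not from the paper)


def channel_stats(values):
    """Return (mean, std) of one channel, with EPS added under the square root."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var + EPS)


def pointwise_linear(features, weights):
    """Mix channels at every pixel, like a 1x1 convolution without bias.

    features: list of C channels, each a flat list of N pixel values.
    weights:  C x C matrix; weights[o][i] maps input channel i to output o.
    """
    n_pixels = len(features[0])
    return [
        [sum(w * features[i][p] for i, w in enumerate(row)) for p in range(n_pixels)]
        for row in weights
    ]


def lade_sketch(features, weights):
    """Hypothetical LADE-style layer: normalize each channel, then
    denormalize with statistics of the linearly transformed features."""
    transformed = pointwise_linear(features, weights)
    out = []
    for x, tx in zip(features, transformed):
        in_mean, in_std = channel_stats(x)   # statistics of the raw channel
        t_mean, t_std = channel_stats(tx)    # statistics used as shift and scale
        out.append([(v - in_mean) / in_std * t_std + t_mean for v in x])
    return out


if __name__ == "__main__":
    feats = [[1.0, 2.0, 3.0, 4.0], [0.0, 10.0, 20.0, 30.0]]
    mix = [[0.5, 0.1], [0.0, 2.0]]  # illustrative weights; learned in practice
    for channel in lade_sketch(feats, mix):
        print([round(v, 3) for v in channel])
```

Under this reading, the scale and shift are computed from the input itself, so no external style code is needed; with an identity mixing matrix the layer returns its input unchanged.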

Publication: IEICE TRANSACTIONS on Information Vol.E107-D No.1 pp.72-82

Publication Date: 2024/01/01

Publicized: 2023/09/28

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2023EDP7061

Type of Manuscript: PAPER

Category: Artificial Intelligence, Data Mining

Authors

Gang LIU
  Hubei University of Technology
Xin CHEN
  Wuhan TianYu Information Industry CO., LTD.
Zhixiang GAO
  Wuhan College

Keywords

AIGC, generative adversarial networks, photo animation, linearly adaptive denormalization, double-tail

Cite this

Gang LIU, Xin CHEN, Zhixiang GAO, "A Novel Double-Tail Generative Adversarial Network for Fast Photo Animation" in IEICE TRANSACTIONS on Information, vol. E107-D, no. 1, pp. 72-82, January 2024, doi: 10.1587/transinf.2023EDP7061.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2023EDP7061/_p

@ARTICLE{e107-d_1_72,
author={Gang LIU and Xin CHEN and Zhixiang GAO},
journal={IEICE TRANSACTIONS on Information},
title={A Novel Double-Tail Generative Adversarial Network for Fast Photo Animation},
year={2024},
volume={E107-D},
number={1},
pages={72--82},
keywords={AIGC, generative adversarial networks, photo animation, linearly adaptive denormalization, double-tail},
doi={10.1587/transinf.2023EDP7061},
ISSN={1745-1361},
month={January}
}

TY - JOUR
TI - A Novel Double-Tail Generative Adversarial Network for Fast Photo Animation
T2 - IEICE TRANSACTIONS on Information
SP - 72
EP - 82
AU - Gang LIU
AU - Xin CHEN
AU - Zhixiang GAO
PY - 2024
DO - 10.1587/transinf.2023EDP7061
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E107-D
IS - 1
JA - IEICE TRANSACTIONS on Information
Y1 - January 2024
ER -