The image-to-image translation aims to learn a mapping between the source and target domains. For improving visual quality, the majority of previous works adopt multi-stage techniques to refine coarse results in a progressive manner. In this work, we present a novel approach for generating plausible details by only introducing a group of intermediate supervisions without cascading multiple stages. Specifically, we propose a Laplacian Pyramid Transformation Generative Adversarial Network (LapTransGAN) to simultaneously transform components in different frequencies from the source domain to the target domain within only one stage. Hierarchical perceptual and gradient penalization are utilized for learning consistent semantic structures and details at each pyramid level. The proposed model is evaluated based on various metrics, including the similarity in feature maps, reconstruction quality, segmentation accuracy, similarity in details, and qualitative appearances. Our experiments show that LapTransGAN can achieve a much better quantitative performance than both the supervised pix2pix model and the unsupervised CycleGAN model. Comprehensive ablation experiments are conducted to study the contribution of each component.
Jianbo WANG
The University of Tokyo
Haozhi HUANG
Tencent AI Lab
Li SHEN
Tencent AI Lab
Xuan WANG
Tencent AI Lab
Toshihiko YAMASAKI
The University of Tokyo
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Jianbo WANG, Haozhi HUANG, Li SHEN, Xuan WANG, Toshihiko YAMASAKI, "Hierarchical Detailed Intermediate Supervision for Image-to-Image Translation" in IEICE TRANSACTIONS on Information,
vol. E106-D, no. 12, pp. 2085-2096, December 2023, doi: 10.1587/transinf.2023EDP7025.
Abstract: The image-to-image translation aims to learn a mapping between the source and target domains. For improving visual quality, the majority of previous works adopt multi-stage techniques to refine coarse results in a progressive manner. In this work, we present a novel approach for generating plausible details by only introducing a group of intermediate supervisions without cascading multiple stages. Specifically, we propose a Laplacian Pyramid Transformation Generative Adversarial Network (LapTransGAN) to simultaneously transform components in different frequencies from the source domain to the target domain within only one stage. Hierarchical perceptual and gradient penalization are utilized for learning consistent semantic structures and details at each pyramid level. The proposed model is evaluated based on various metrics, including the similarity in feature maps, reconstruction quality, segmentation accuracy, similarity in details, and qualitative appearances. Our experiments show that LapTransGAN can achieve a much better quantitative performance than both the supervised pix2pix model and the unsupervised CycleGAN model. Comprehensive ablation experiments are conducted to study the contribution of each component.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2023EDP7025/_p
Copy
@ARTICLE{e106-d_12_2085,
author={Jianbo WANG, Haozhi HUANG, Li SHEN, Xuan WANG, Toshihiko YAMASAKI, },
journal={IEICE TRANSACTIONS on Information},
title={Hierarchical Detailed Intermediate Supervision for Image-to-Image Translation},
year={2023},
volume={E106-D},
number={12},
pages={2085-2096},
abstract={The image-to-image translation aims to learn a mapping between the source and target domains. For improving visual quality, the majority of previous works adopt multi-stage techniques to refine coarse results in a progressive manner. In this work, we present a novel approach for generating plausible details by only introducing a group of intermediate supervisions without cascading multiple stages. Specifically, we propose a Laplacian Pyramid Transformation Generative Adversarial Network (LapTransGAN) to simultaneously transform components in different frequencies from the source domain to the target domain within only one stage. Hierarchical perceptual and gradient penalization are utilized for learning consistent semantic structures and details at each pyramid level. The proposed model is evaluated based on various metrics, including the similarity in feature maps, reconstruction quality, segmentation accuracy, similarity in details, and qualitative appearances. Our experiments show that LapTransGAN can achieve a much better quantitative performance than both the supervised pix2pix model and the unsupervised CycleGAN model. Comprehensive ablation experiments are conducted to study the contribution of each component.},
keywords={},
doi={10.1587/transinf.2023EDP7025},
ISSN={1745-1361},
month={December},}
Copy
TY - JOUR
TI - Hierarchical Detailed Intermediate Supervision for Image-to-Image Translation
T2 - IEICE TRANSACTIONS on Information
SP - 2085
EP - 2096
AU - Jianbo WANG
AU - Haozhi HUANG
AU - Li SHEN
AU - Xuan WANG
AU - Toshihiko YAMASAKI
PY - 2023
DO - 10.1587/transinf.2023EDP7025
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E106-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2023
AB - The image-to-image translation aims to learn a mapping between the source and target domains. For improving visual quality, the majority of previous works adopt multi-stage techniques to refine coarse results in a progressive manner. In this work, we present a novel approach for generating plausible details by only introducing a group of intermediate supervisions without cascading multiple stages. Specifically, we propose a Laplacian Pyramid Transformation Generative Adversarial Network (LapTransGAN) to simultaneously transform components in different frequencies from the source domain to the target domain within only one stage. Hierarchical perceptual and gradient penalization are utilized for learning consistent semantic structures and details at each pyramid level. The proposed model is evaluated based on various metrics, including the similarity in feature maps, reconstruction quality, segmentation accuracy, similarity in details, and qualitative appearances. Our experiments show that LapTransGAN can achieve a much better quantitative performance than both the supervised pix2pix model and the unsupervised CycleGAN model. Comprehensive ablation experiments are conducted to study the contribution of each component.
ER -