Video inpainting is the task of filling in missing regions of a video. In this task, it is important to efficiently use information from other frames and to generate plausible results with sufficient temporal consistency. In this paper, we present a video inpainting method that jointly uses affine transformation and deformable convolutions for frame alignment: the former performs frame-scale rough alignment, and the latter performs pixel-level fine alignment. Our model depends neither on 3D convolutions, which limit the temporal window, nor on troublesome flow estimation. The proposed method achieves improved object-removal results and better PSNR and SSIM values than previous learning-based methods.
Yusuke HARA
The University of Tokyo
Xueting WANG
The University of Tokyo
Toshihiko YAMASAKI
The University of Tokyo
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yusuke HARA, Xueting WANG, Toshihiko YAMASAKI, "Video Inpainting by Frame Alignment with Deformable Convolution" in IEICE Transactions on Information and Systems,
vol. E104-D, no. 8, pp. 1349-1358, August 2021, doi: 10.1587/transinf.2020EDP7194.
Abstract: Video inpainting is the task of filling in missing regions of a video. In this task, it is important to efficiently use information from other frames and to generate plausible results with sufficient temporal consistency. In this paper, we present a video inpainting method that jointly uses affine transformation and deformable convolutions for frame alignment: the former performs frame-scale rough alignment, and the latter performs pixel-level fine alignment. Our model depends neither on 3D convolutions, which limit the temporal window, nor on troublesome flow estimation. The proposed method achieves improved object-removal results and better PSNR and SSIM values than previous learning-based methods.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020EDP7194/_p
@ARTICLE{e104-d_8_1349,
author={Yusuke HARA and Xueting WANG and Toshihiko YAMASAKI},
journal={IEICE Transactions on Information and Systems},
title={Video Inpainting by Frame Alignment with Deformable Convolution},
year={2021},
volume={E104-D},
number={8},
pages={1349-1358},
abstract={Video inpainting is the task of filling in missing regions of a video. In this task, it is important to efficiently use information from other frames and to generate plausible results with sufficient temporal consistency. In this paper, we present a video inpainting method that jointly uses affine transformation and deformable convolutions for frame alignment: the former performs frame-scale rough alignment, and the latter performs pixel-level fine alignment. Our model depends neither on 3D convolutions, which limit the temporal window, nor on troublesome flow estimation. The proposed method achieves improved object-removal results and better PSNR and SSIM values than previous learning-based methods.},
keywords={},
doi={10.1587/transinf.2020EDP7194},
ISSN={1745-1361},
month={August},
}
TY - JOUR
TI - Video Inpainting by Frame Alignment with Deformable Convolution
T2 - IEICE Transactions on Information and Systems
SP - 1349
EP - 1358
AU - Yusuke HARA
AU - Xueting WANG
AU - Toshihiko YAMASAKI
PY - 2021
DO - 10.1587/transinf.2020EDP7194
JO - IEICE Transactions on Information and Systems
SN - 1745-1361
VL - E104-D
IS - 8
JA - IEICE Transactions on Information and Systems
Y1 - August 2021
AB - Video inpainting is the task of filling in missing regions of a video. In this task, it is important to efficiently use information from other frames and to generate plausible results with sufficient temporal consistency. In this paper, we present a video inpainting method that jointly uses affine transformation and deformable convolutions for frame alignment: the former performs frame-scale rough alignment, and the latter performs pixel-level fine alignment. Our model depends neither on 3D convolutions, which limit the temporal window, nor on troublesome flow estimation. The proposed method achieves improved object-removal results and better PSNR and SSIM values than previous learning-based methods.
ER -