Video inpainting is the task of filling in missing regions of a video. In this task, it is important to efficiently use information from other frames and to generate plausible results with sufficient temporal consistency. In this paper, we present a video inpainting method that jointly uses affine transformation and deformable convolutions for frame alignment: the former performs frame-scale rough alignment, and the latter performs pixel-level fine alignment. Our model depends neither on 3D convolutions, which limit the temporal window, nor on troublesome flow estimation. The proposed method achieves improved object-removal results and better PSNR and SSIM values than previous learning-based methods.
Yusuke HARA
The University of Tokyo
Xueting WANG
The University of Tokyo
Toshihiko YAMASAKI
The University of Tokyo
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yusuke HARA, Xueting WANG, Toshihiko YAMASAKI, "Video Inpainting by Frame Alignment with Deformable Convolution" in IEICE Transactions on Information and Systems,
vol. E104-D, no. 8, pp. 1349-1358, August 2021, doi: 10.1587/transinf.2020EDP7194.
Abstract: Video inpainting is the task of filling in missing regions of a video. In this task, it is important to efficiently use information from other frames and to generate plausible results with sufficient temporal consistency. In this paper, we present a video inpainting method that jointly uses affine transformation and deformable convolutions for frame alignment: the former performs frame-scale rough alignment, and the latter performs pixel-level fine alignment. Our model depends neither on 3D convolutions, which limit the temporal window, nor on troublesome flow estimation. The proposed method achieves improved object-removal results and better PSNR and SSIM values than previous learning-based methods.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020EDP7194/_p
@ARTICLE{e104-d_8_1349,
author={Yusuke HARA and Xueting WANG and Toshihiko YAMASAKI},
journal={IEICE Transactions on Information and Systems},
title={Video Inpainting by Frame Alignment with Deformable Convolution},
year={2021},
volume={E104-D},
number={8},
pages={1349-1358},
abstract={Video inpainting is the task of filling in missing regions of a video. In this task, it is important to efficiently use information from other frames and to generate plausible results with sufficient temporal consistency. In this paper, we present a video inpainting method that jointly uses affine transformation and deformable convolutions for frame alignment: the former performs frame-scale rough alignment, and the latter performs pixel-level fine alignment. Our model depends neither on 3D convolutions, which limit the temporal window, nor on troublesome flow estimation. The proposed method achieves improved object-removal results and better PSNR and SSIM values than previous learning-based methods.},
keywords={},
doi={10.1587/transinf.2020EDP7194},
ISSN={1745-1361},
month={August},
}
TY - JOUR
TI - Video Inpainting by Frame Alignment with Deformable Convolution
T2 - IEICE Transactions on Information and Systems
SP - 1349
EP - 1358
AU - Yusuke HARA
AU - Xueting WANG
AU - Toshihiko YAMASAKI
PY - 2021
DO - 10.1587/transinf.2020EDP7194
JO - IEICE Transactions on Information and Systems
SN - 1745-1361
VL - E104-D
IS - 8
JA - IEICE Transactions on Information and Systems
Y1 - August 2021
AB - Video inpainting is the task of filling in missing regions of a video. In this task, it is important to efficiently use information from other frames and to generate plausible results with sufficient temporal consistency. In this paper, we present a video inpainting method that jointly uses affine transformation and deformable convolutions for frame alignment: the former performs frame-scale rough alignment, and the latter performs pixel-level fine alignment. Our model depends neither on 3D convolutions, which limit the temporal window, nor on troublesome flow estimation. The proposed method achieves improved object-removal results and better PSNR and SSIM values than previous learning-based methods.
ER -