
Author Search Result

[Author] Toshihiko YAMASAKI (12 hits)

  • Video Inpainting by Frame Alignment with Deformable Convolution

    Yusuke HARA  Xueting WANG  Toshihiko YAMASAKI  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2021/04/22
      Vol:
    E104-D No:8
      Page(s):
    1349-1358

    Video inpainting is the task of filling missing regions in videos. In this task, it is important to efficiently use information from other frames and generate plausible results with sufficient temporal consistency. In this paper, we present a video inpainting method jointly using affine transformation and deformable convolutions for frame alignment. The former is responsible for frame-scale rough alignment and the latter performs pixel-level fine alignment. Our model does not depend on 3D convolutions, which limit the temporal window, or on troublesome flow estimation. The proposed method achieves improved object removal results and better PSNR and SSIM values compared with previous learning-based methods.
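The abstract above reports PSNR as one of its evaluation metrics. As a reminder of what that number measures, here is a minimal sketch of the standard PSNR definition. The function name `psnr`, the nested-list image representation, and the 8-bit peak value of 255 are illustrative assumptions and are not taken from the paper.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D pixel grids."""
    # Flatten both images so they can be compared pixel by pixel.
    flat_ref = [p for row in reference for p in row]
    flat_out = [p for row in restored for p in row]
    if len(flat_ref) != len(flat_out):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_out)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the restored frame is closer to the reference; identical frames give infinity.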

  • Attractiveness Computing in Image Media

    Toshihiko YAMASAKI  

     
    INVITED PAPER-Vision

      Publicized:
    2023/06/16
      Vol:
    E106-A No:9
      Page(s):
    1196-1201

    Our research group has been working on attractiveness prediction, reasoning, and even enhancement for multimedia content, which we call “attractiveness computing.” Attractiveness includes impressiveness, instagrammability, memorability, clickability, and so on. Analyzing such attractiveness was usually done by experienced professionals but we have experimentally revealed that artificial intelligence (AI) based on big multimedia data can imitate or reproduce professionals' skills in some cases. In this paper, we introduce some of the representative works and possible real-life applications of our attractiveness computing for image media.

  • Hierarchical Detailed Intermediate Supervision for Image-to-Image Translation

    Jianbo WANG  Haozhi HUANG  Li SHEN  Xuan WANG  Toshihiko YAMASAKI  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2023/09/14
      Vol:
    E106-D No:12
      Page(s):
    2085-2096

    Image-to-image translation aims to learn a mapping between the source and target domains. To improve visual quality, the majority of previous works adopt multi-stage techniques to refine coarse results in a progressive manner. In this work, we present a novel approach for generating plausible details by introducing only a group of intermediate supervisions, without cascading multiple stages. Specifically, we propose a Laplacian Pyramid Transformation Generative Adversarial Network (LapTransGAN) to simultaneously transform components in different frequencies from the source domain to the target domain within only one stage. Hierarchical perceptual and gradient penalization are utilized for learning consistent semantic structures and details at each pyramid level. The proposed model is evaluated based on various metrics, including the similarity in feature maps, reconstruction quality, segmentation accuracy, similarity in details, and qualitative appearances. Our experiments show that LapTransGAN can achieve much better quantitative performance than both the supervised pix2pix model and the unsupervised CycleGAN model. Comprehensive ablation experiments are conducted to study the contribution of each component.

  • Human Attribute Analysis Using a Top-View Camera Based on Two-Stage Classification

    Toshihiko YAMASAKI  Tomoaki MATSUNAMI  Tuhan CHEN  

     
    LETTER-Image Recognition, Computer Vision

      Vol:
    E96-D No:4
      Page(s):
    993-996

    This paper presents a technique that analyzes pedestrians' attributes such as gender and bag-possession status from surveillance video. One of the technically challenging issues is that we use only top-view camera images to protect privacy. The shape features over the frames are extracted by bag-of-features (BoF) using histogram of oriented gradients (HoG) vectors. In order to enhance the classification accuracy, a two-stage classification framework is presented. Multiple classifiers are trained by changing the parameters in the first stage. The outputs from the first stage are further trained and classified in the second-stage classifier. The experiments using a 60-minute video captured at Haneda Airport, Japan, show that the accuracies for the gender classification and the bag-possession classification were 95.8% and 97.2%, respectively, which is a significant improvement over our previous work.

  • Retrieval of Images Captured by Car Cameras Using Its Front and Side Views and GPS Data

    Toshihiko YAMASAKI  Takayuki ISHIKAWA  Kiyoharu AIZAWA  

     
    PAPER

      Vol:
    E90-D No:1
      Page(s):
    217-223

    Recently, cars have been equipped with many sensors for safe driving. We have been trying to store driving-scene video together with such sensor data and to detect changes in street scenery. Detection results can be used for building a historical database of town scenery, automatic landmark updating of maps, and so forth. In order to compare images to detect changes, retrieval of images taken at nearly identical locations is required as the first step. Since Global Positioning System (GPS) data inherently contain some noise, we cannot rely only on GPS data for our image retrieval. Therefore, we have developed an image retrieval algorithm employing edge-histogram-based image features in conjunction with hierarchical search. By using edge histograms projected onto the vertical and horizontal axes, the retrieval has been made robust to image variation due to weather change, clouds, obstacles, and so on. In addition, the matching cost has been reduced by limiting the matching candidates through the hierarchical search. Experimental results have demonstrated that the mean retrieval accuracy has been improved from 65% to 76% for the front-view images and from 34% to 53% for the side-view images.

  • Assessment System of Presentation Slide Design Using Visual and Structural Features

    Shengzhou YI  Junichiro MATSUGAMI  Toshihiko YAMASAKI  

     
    PAPER

      Publicized:
    2021/12/01
      Vol:
    E105-D No:3
      Page(s):
    587-596

    Developing well-designed presentation slides is challenging for many people, especially novices. The ability to build high-quality slideshows is becoming more important in society. In this study, a neural network was used to distinguish novice from well-designed presentation slides based on visual and structural features. For this purpose, a dataset containing 1,080 slide pairs was newly constructed. One slide of each pair was created by a novice, and the other was a version improved by the same person according to experts' advice. Ten checkpoints frequently pointed out by professional consultants were extracted and set as prediction targets. The intrinsic problem was that the label distribution was imbalanced, because only a part of the samples had corresponding design problems. Therefore, re-sampling methods for addressing class imbalance were applied to improve the accuracy of the proposed model. Furthermore, we combined the target task with an assistant task for transfer and multi-task learning, which helped the proposed model achieve better performance. After the optimal settings were used for each checkpoint, the average accuracy of the proposed model rose to 81.79%. With the advice provided by our assessment system, the novices significantly improved their slide design.

  • Summarization of 3D Video by Rate-Distortion Trade-off

    Jianfeng XU  Toshihiko YAMASAKI  Kiyoharu AIZAWA  

     
    PAPER-Image Processing and Video Processing

      Vol:
    E90-D No:9
      Page(s):
    1430-1438

    3D video, which consists of a sequence of mesh models, can reproduce dynamic scenes containing 3D information. To summarize 3D video, a key frame extraction method is developed using rate-distortion (R-D) trade-off. For this purpose, an effective feature vector is extracted for each frame. Shot detection is performed using the feature vectors as a preprocessing step, followed by key frame extraction. Simple but reasonable definitions of rate and distortion are presented. Based on an assumption of linearity, an R-D curve is generated in each shot, where the locations of the key frames are optimized. Finally, R-D trade-off can be achieved by optimizing a cost function using a Lagrange multiplier, where the number of key frames is optimized in each shot. Therefore, our system will automatically determine the best locations and the number of key frames in the sense of R-D trade-off. Our experimental results show that the extracted key frames are compact and faithful to the original 3D video.
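The Lagrangian step mentioned in the abstract above can be illustrated with a generic rate-distortion sketch. The abstract does not give the paper's definitions of rate and distortion, so this example assumes, purely for illustration, that the rate is the number of key frames in a shot and that a distortion value is already known for each candidate count. The function name `best_key_frame_count` and the sample numbers are hypothetical.

```python
def best_key_frame_count(distortion_by_count, lam):
    """Pick the key-frame count k that minimizes J(k) = D(k) + lam * R(k).

    distortion_by_count[k - 1] is the (hypothetical) distortion when a shot is
    summarized with k key frames; the rate R(k) is assumed to be k itself.
    """
    costs = [d + lam * k for k, d in enumerate(distortion_by_count, start=1)]
    best_index = min(range(len(costs)), key=costs.__getitem__)
    return best_index + 1, costs[best_index]


# Hypothetical distortions for 1..4 key frames in one shot.
shot_distortion = [10.0, 6.0, 4.0, 3.5]
print(best_key_frame_count(shot_distortion, lam=1.0))
```

A larger multiplier penalizes rate more heavily and so favors fewer key frames; a multiplier of zero simply picks the count with the lowest distortion.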

  • Robust Object-Based Watermarking Using Feature Matching

    Viet-Quoc PHAM  Takashi MIYAKI  Toshihiko YAMASAKI  Kiyoharu AIZAWA  

     
    PAPER-Application Information Security

      Vol:
    E91-D No:7
      Page(s):
    2027-2034

    We present a robust object-based watermarking algorithm using the scale-invariant feature transform (SIFT) in conjunction with a data embedding method based on Discrete Cosine Transform (DCT). The message is embedded in the DCT domain of randomly generated blocks in the selected object region. To recognize the object region after being distorted, its SIFT features are registered in advance. In the detection scheme, we extract SIFT features from the distorted image and match them with the registered ones. Then we recover the distorted object region based on the transformation parameters obtained from the matching result using SIFT, and the watermarked message can be detected. Experimental results demonstrated that our proposed algorithm is very robust to distortions such as JPEG compression, scaling, rotation, shearing, aspect ratio change, and image filtering.

  • SIFT-Based Non-blind Watermarking Robust to Non-linear Geometrical Distortions

    Toshihiko YAMASAKI  Kiyoharu AIZAWA  

     
    PAPER-Image Processing and Video Processing

      Vol:
    E96-D No:6
      Page(s):
    1368-1375

    This paper presents a non-blind watermarking technique that is robust to non-linear geometric distortion attacks. This is one of the most challenging problems for copyright protection of digital content because it is difficult to estimate the distortion parameters for the embedded blocks. In our proposed scheme, the locations of the blocks are recorded by the translation parameters from multiple Scale Invariant Feature Transform (SIFT) feature points. This method is based on two assumptions: SIFT features are robust to non-linear geometric distortion, and even such non-linear distortion can be regarded as “linear” distortion in local regions. We conducted experiments using 149,800 images (7 standard images and 100 images downloaded from Flickr, 10 different messages, 10 different embedding block patterns, and 14 attacks). The results show that the watermark detection performance is drastically improved, while the baseline method can achieve only chance-level accuracy.
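The figure of 149,800 test images in the abstract above is consistent with a full combination of the listed factors. The short check below assumes that every source image was combined with every message, block pattern, and attack; the variable names are illustrative.

```python
# Factors listed in the abstract.
source_images = 7 + 100   # standard images + Flickr images
messages = 10
block_patterns = 10
attacks = 14

# Assumption: every combination of the four factors yields one test image.
total_images = source_images * messages * block_patterns * attacks
print(total_images)
```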

  • Users' Preference Prediction of Real Estate Properties Based on Floor Plan Analysis

    Naoki KATO  Toshihiko YAMASAKI  Kiyoharu AIZAWA  Takemi OHAMA  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2019/11/20
      Vol:
    E103-D No:2
      Page(s):
    398-405

    With the recent advances in e-commerce, it has become important to recommend not only mass-produced daily items, such as books, but also items that are not mass-produced. In this study, we present an algorithm for real estate recommendations. Automatic property recommendations are a highly difficult task because no identical properties exist in the world, occupied properties cannot be recommended, and users rent or buy properties only a few times in their lives. For the first step of property recommendation, we predict users' preferences for properties by combining content-based filtering and Multi-Layer Perceptron (MLP). In the MLP, we use not only attribute data of users and properties, but also deep features extracted from property floor plan images. As a result, we successfully predict users' preference with a Matthews Correlation Coefficient (MCC) of 0.166.
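The abstract above reports its result as a Matthews Correlation Coefficient. For readers unfamiliar with the metric, here is a minimal sketch of the standard binary MCC computed from confusion-matrix counts. The function name `mcc_from_counts` and the convention of returning 0.0 when the denominator vanishes are assumptions made for this example, not details from the paper.

```python
import math


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews Correlation Coefficient for a binary confusion matrix."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention assumed here: an undefined MCC is reported as 0.0.
        return 0.0
    return numerator / denominator
```

The value ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which is why it is often preferred over plain accuracy on imbalanced data.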

  • Ubiquitous Home: Retrieval of Experiences in a Home Environment

    Gamhewage C. DE SILVA  Toshihiko YAMASAKI  Kiyoharu AIZAWA  

     
    PAPER-Image Processing and Video Processing

      Vol:
    E91-D No:2
      Page(s):
    330-340

    Automated capture and retrieval of experiences at home is interesting due to the wide variety and personal significance of such experiences. We present a system for retrieval and summarization of continuously captured multimedia data from Ubiquitous Home, a two-room house equipped with a large number of cameras and microphones. Data from pressure-based sensors on the floor are analyzed to segment footsteps of different persons. Video and audio handover are implemented to retrieve continuous video streams corresponding to moving persons. An adaptive algorithm based on the rate of footsteps summarizes these video streams. A novel method for audio segmentation using multiple microphones is used for highly accurate sound-based video retrieval. An experiment, in which a family lived in this house for twelve days, was conducted. The system was evaluated by the residents who used the system for retrieving their own experiences; we report and discuss the results.

  • Robustness of Deep Learning Models in Dermatological Evaluation: A Critical Assessment

    Sourav MISHRA  Subhajit CHAUDHURY  Hideaki IMAIZUMI  Toshihiko YAMASAKI  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2020/12/22
      Vol:
    E104-D No:3
      Page(s):
    419-429

    Our paper attempts to critically assess the robustness of deep learning methods in dermatological evaluation. Although deep learning is increasingly sought as a means to improve dermatological diagnostics, the performance of models and methods has rarely been investigated beyond studies done under ideal settings. We aim to look beyond results obtained on curated and ideal data corpora by investigating resilience and performance on user-submitted data. Assessing via a few imitated conditions, we have found that the overall accuracy drops and individual predictions change significantly in many cases despite robust training.