
Author Search Result

[Author] Keiji YANAI (8 hits)

  • A Multi-Resolution Image Understanding System Based on Multi-Agent Architecture for High-Resolution Images

    Keiji YANAI  Koichiro DEGUCHI  

     
    PAPER

    Vol: E84-D No:12  Page(s): 1642-1650

    High-resolution images with more than one million pixels have recently become easily available. However, such images demand considerable processing time and memory from an image understanding system. In this paper, we propose an image understanding system that integrates multi-resolution analysis with a multi-agent-based architecture for high-resolution images. The proposed system can handle a high-resolution image effectively without much extra cost. We implemented an experimental system for images of indoor scenes.
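
    To make the coarse-to-fine idea concrete, below is a minimal Python sketch of multi-resolution processing, assuming a simple variance test stands in for the system's agents: candidate regions are found on a low-resolution level of an image pyramid, and only those regions are promoted to finer levels. The tile size and threshold are illustrative assumptions, not values from the paper.

    import numpy as np

    def build_pyramid(image, levels=3):
        """Halve the resolution repeatedly by 2x2 block averaging."""
        pyramid = [image]
        for _ in range(levels - 1):
            h, w = pyramid[-1].shape
            cropped = pyramid[-1][:h // 2 * 2, :w // 2 * 2]
            pyramid.append(cropped.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
        return pyramid[::-1]  # coarsest level first

    def spawn_agents(coarse, tile=8, thresh=100.0):
        """Spawn one agent per coarse tile whose variance suggests structure."""
        return [(y, x, tile)
                for y in range(0, coarse.shape[0] - tile + 1, tile)
                for x in range(0, coarse.shape[1] - tile + 1, tile)
                if coarse[y:y + tile, x:x + tile].var() > thresh]

    image = np.random.rand(1024, 1024) * 255      # stand-in high-resolution image
    pyramid = build_pyramid(image)
    agents = spawn_agents(pyramid[0])             # analyze the low-resolution level first
    for _ in pyramid[1:]:                         # promote surviving regions to finer levels
        agents = [(y * 2, x * 2, t * 2) for y, x, t in agents]
    print(f"{len(agents)} regions selected for full-resolution analysis")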

  • Image Collector II: A System to Gather a Large Number of Images from the Web

    Keiji YANAI  

     
    LETTER-Image Processing and Video Processing

    Vol: E88-D No:10  Page(s): 2432-2436

    We propose a system that enables us to gather hundreds of images related to one set of keywords provided by a user from the World Wide Web. The system is called Image Collector II. The Image Collector, which we proposed previously, can gather only one or two hundred images. We propose the following two improvements on our previous system in terms of the number of gathered images and their precision: (1) We extract words that appear with high frequency from all HTML files in which the output images of an initial gathering are embedded, and, using them as keywords, we carry out a second image gathering. Through this process, we can obtain hundreds of images for one set of keywords. (2) The more images we gather, the lower the precision of the gathered images becomes. To improve the precision, we introduce word vectors of the HTML files embedding the images into the image selection process, in addition to image feature vectors.
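
    The two improvements lend themselves to a short sketch. The following Python fragment is a minimal illustration, under an assumed whitespace tokenization and an assumed mixing weight alpha, of (1) extracting frequent co-occurring words for a second gathering round and (2) blending word-vector and image-feature similarity when selecting images; it is not the paper's implementation.

    from collections import Counter

    def expand_keywords(html_texts, seed_keywords, top_n=5):
        """Pick frequent co-occurring words to drive a second gathering round."""
        counts = Counter()
        for text in html_texts:
            counts.update(w for w in text.lower().split() if w not in seed_keywords)
        return [w for w, _ in counts.most_common(top_n)]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na, nb = sum(x * x for x in a) ** 0.5, sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    def score(image_vec, word_vec, proto_image, proto_words, alpha=0.5):
        """Blend image-feature and HTML word-vector similarity (alpha is a guess)."""
        return alpha * cosine(image_vec, proto_image) + (1 - alpha) * cosine(word_vec, proto_words)

    pages = ["sunset beach surf wave ocean", "surf board wave sport ocean"]
    print(expand_keywords(pages, {"surf"}))   # candidate keywords for round two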

  • Automatic Retrieval of Action Video Shots from the Web Using Density-Based Cluster Analysis and Outlier Detection

    Nga Hang DO  Keiji YANAI  

     
    PAPER-Image Processing and Video Processing

    Publicized: 2016/07/21  Vol: E99-D No:11  Page(s): 2788-2795

    In this paper, we introduce a fully automatic approach to constructing action datasets from noisy Web video search results. The idea is based on combining cluster structure analysis with density-based outlier detection. For a specific action concept, we first download its top Web search videos and segment them into video shots. We then organize these shots into subsets using density-based hierarchical clustering. For each subset, we rank its shots by their outlier degrees, which are determined by their isolatedness with respect to their surroundings. Finally, we collect the highly ranked shots as training data for the action concept. We demonstrate that action models trained on our data obtain promising precision rates in the task of action classification while offering the advantage of fully automatic, scalable learning. Experimental results on UCF11, a challenging action dataset, show the effectiveness of our method.
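
    As a concrete illustration of the outlier-degree ranking, the following numpy sketch approximates a shot's isolatedness by its mean distance to its k nearest neighbors in feature space; the paper's exact density-based hierarchical clustering and outlier definition are not reproduced here.

    import numpy as np

    def outlier_degrees(features, k=5):
        """Lower mean k-NN distance = denser neighborhood = more typical shot."""
        d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=2)
        knn = np.sort(d, axis=1)[:, 1:k + 1]   # column 0 is the zero self-distance
        return knn.mean(axis=1)

    rng = np.random.default_rng(0)
    relevant = rng.normal(0.0, 0.1, size=(40, 64))   # dense cluster of on-topic shots
    noise = rng.normal(0.0, 1.0, size=(10, 64))      # scattered irrelevant shots
    degree = outlier_degrees(np.vstack([relevant, noise]))
    ranked = np.argsort(degree)                      # lowest degree first = training candidates
    print("top-5 most typical shots:", ranked[:5])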

  • Image-Based Food Calorie Estimation Using Recipe Information

    Takumi EGE  Keiji YANAI  

     
    PAPER-Machine Vision and its Applications

    Publicized: 2018/02/16  Vol: E101-D No:5  Page(s): 1333-1341

    Recently, mobile applications for recording everyday meals have drawn much attention for self-dietary management. However, most of these applications simply return the food calorie values associated with the estimated food categories, or require users to indicate the rough amount of food manually. In fact, estimating food calories from a food photo with practical accuracy has not yet been achieved and remains an unsolved problem. In this paper, we therefore propose estimating food calories from a food photo by simultaneous learning of food calories, categories, ingredients, and cooking directions using deep learning. Since food calories generally correlate strongly with food categories, ingredients, and cooking directions, we expect that training on them simultaneously boosts performance compared to independent single-task training. To this end, we use a multi-task CNN. In addition, we construct two kinds of datasets: a calorie-annotated recipe dataset collected from Japanese recipe sites on the Web, and a dataset collected from an American recipe site. In the experiments, we trained both multi-task and single-task CNNs and compared them. As a result, the multi-task CNN achieved better performance on both food category estimation and food calorie estimation than the single-task CNNs. For the Japanese recipe dataset, introducing the multi-task CNN improved the correlation coefficient by 0.039, while for the American recipe dataset it rose by 0.090 compared to the single-task CNN. In addition, we showed that the proposed multi-task CNN-based method outperformed previously proposed search-based methods.
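
    A minimal PyTorch sketch of the simultaneous-learning setup is given below: a shared backbone feeds four task heads, and one summed loss backpropagates into the shared layers. The tiny backbone, head sizes, bag-of-words treatment of ingredients and cooking directions, and equal loss weights are assumptions for illustration, not the paper's architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiTaskFoodNet(nn.Module):
        def __init__(self, n_cat=100, n_ing=500, n_dir=300):
            super().__init__()
            self.backbone = nn.Sequential(        # stand-in for a deep CNN backbone
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.calorie = nn.Linear(64, 1)          # calorie regression head
            self.category = nn.Linear(64, n_cat)     # category classification head
            self.ingredients = nn.Linear(64, n_ing)  # multi-label ingredient head
            self.directions = nn.Linear(64, n_dir)   # multi-label direction-word head

        def forward(self, x):
            h = self.backbone(x)
            return self.calorie(h), self.category(h), self.ingredients(h), self.directions(h)

    model = MultiTaskFoodNet()
    cal, cat, ing, dirs = model(torch.randn(4, 3, 224, 224))
    loss = (F.mse_loss(cal.squeeze(1), torch.rand(4) * 800)
            + F.cross_entropy(cat, torch.randint(0, 100, (4,)))
            + F.binary_cross_entropy_with_logits(ing, torch.rand(4, 500).round())
            + F.binary_cross_entropy_with_logits(dirs, torch.rand(4, 300).round()))
    loss.backward()   # all four tasks update the shared backbone together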

  • Simultaneous Estimation of Dish Locations and Calories with Multi-Task Learning Open Access

    Takumi EGE  Keiji YANAI  

     
    PAPER

    Publicized: 2019/04/25  Vol: E102-D No:7  Page(s): 1240-1246

    In recent years, a rise in healthy eating has led to various food management applications that have an image recognition function to record everyday meals automatically. However, most of the image recognition functions in existing applications are not directly applicable to multiple-dish food photos and cannot estimate food calories automatically. Meanwhile, image recognition methodologies have advanced greatly with the advent of the Convolutional Neural Network (CNN), which has improved the accuracy of various image recognition tasks such as classification and object detection. We therefore propose CNN-based food calorie estimation for multiple-dish food photos. Our method estimates dish locations and food calories simultaneously through multi-task learning of food dish detection and food calorie estimation with a single CNN, which is expected to achieve high speed and a small network size. Because there is currently no dataset of multiple-dish food photos annotated with both bounding boxes and food calories, we alternately train a single CNN on two types of datasets: multiple-dish food photos annotated with bounding boxes, and single-dish food photos annotated with food calories. Our results showed that the multi-task method achieved higher accuracy, higher speed, and a smaller network size than a sequential model of food detection followed by food calorie estimation.
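
    The alternating-dataset idea can be sketched briefly. In the PyTorch fragment below, a shared backbone carries a detection head and a calorie head, and each step updates only the loss its batch can supervise; the simplified single-box regressor and synthetic batches are placeholders, not the paper's detector.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    backbone = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                             nn.AdaptiveAvgPool2d(1), nn.Flatten())
    box_head = nn.Linear(32, 4)        # stand-in box regressor (cx, cy, w, h)
    calorie_head = nn.Linear(32, 1)    # calorie regression head
    params = (list(backbone.parameters()) + list(box_head.parameters())
              + list(calorie_head.parameters()))
    opt = torch.optim.SGD(params, lr=1e-3)

    def detection_batch():   # multiple-dish photos labeled with bounding boxes only
        return torch.randn(8, 3, 224, 224), torch.rand(8, 4)

    def calorie_batch():     # single-dish photos labeled with calories only
        return torch.randn(8, 3, 224, 224), torch.rand(8, 1) * 800

    for step in range(4):
        if step % 2 == 0:                         # alternate the two supervision types
            imgs, boxes = detection_batch()
            loss = F.smooth_l1_loss(box_head(backbone(imgs)), boxes)
        else:
            imgs, kcal = calorie_batch()
            loss = F.mse_loss(calorie_head(backbone(imgs)), kcal)
        opt.zero_grad(); loss.backward(); opt.step()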

  • VisualTextualRank: An Extension of VisualRank to Large-Scale Video Shot Extraction Exploiting Tag Co-occurrence

    Nga H. DO  Keiji YANAI  

     
    PAPER-Image Processing and Video Processing

    Vol: E98-D No:1  Page(s): 166-172

    In this paper, we propose a novel ranking method called VisualTextualRank, which ranks media data according to the relevance between the data and specified keywords. We apply our method to a video shot ranking system that aims to automatically obtain video shots corresponding to given action keywords from Web videos. The keywords can be any type of action, such as “surfing wave” (sport action) or “brushing teeth” (daily activity). Top-ranked video shots are expected to be relevant to the keywords. While our baseline exploits only the visual features of the data, the proposed method employs both textual information (tags) and visual features. Our method is based on random walks over a bipartite graph to effectively integrate the visual information of video shots and the tag information of Web videos. Note that instead of treating the textual information as an additional feature for shot ranking, we exploit the mutual reinforcement between shots and the textual information of their corresponding videos to improve shot ranking. We validated our framework on the database used by the baseline. Experiments showed that our proposed ranking method, VisualTextualRank, significantly improved the performance of the video shot extraction system over the baseline.
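
    To make the bipartite random walk concrete, the numpy sketch below circulates relevance mass from shots to tags and back, smoothed by visual similarity, with a restart distribution; the matrix construction, restart weight, and normalization are illustrative assumptions rather than the paper's exact formulation.

    import numpy as np

    def visual_textual_rank(shot_tag, shot_sim, restart, alpha=0.85, iters=50):
        """shot_tag: shots x tags co-occurrence; shot_sim: shots x shots similarity."""
        S2T = shot_tag / shot_tag.sum(axis=1, keepdims=True)      # shot -> tag step
        T2S = (shot_tag / shot_tag.sum(axis=0, keepdims=True)).T  # tag -> shot step
        V = shot_sim / shot_sim.sum(axis=1, keepdims=True)        # visual smoothing step
        r = np.full(shot_tag.shape[0], 1.0 / shot_tag.shape[0])
        for _ in range(iters):
            r = alpha * (V.T @ (T2S.T @ (S2T.T @ r))) + (1 - alpha) * restart
            r /= r.sum()
        return r

    rng = np.random.default_rng(1)
    co = rng.random((6, 4)) + 0.01             # 6 shots, 4 tags
    sim = rng.random((6, 6)) + 0.01
    restart = np.full(6, 1.0 / 6)              # uniform preference over shots
    print(np.argsort(-visual_textual_rank(co, sim, restart)))  # shots, best first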

  • Multi-Style Shape Matching GAN for Text Images Open Access

    Honghui YUAN  Keiji YANAI  

     
    PAPER

    Publicized: 2023/12/27  Vol: E107-D No:4  Page(s): 505-514

    Deep learning techniques are used to transform the style of images and produce diverse images. In the field of text style transformation, many previous studies have attempted to generate stylized text using deep learning networks. However, to achieve multiple style transformations for text images, the methods proposed in previous studies either require training multiple networks or cannot be guided by style images. In this study, we therefore focus on multi-style transformation of text images, using style images to guide the generation of results. We propose a multiple-style transformation network for text style transfer, which we refer to as the Multi-Style Shape Matching GAN (Multi-Style SMGAN). The proposed method generates multiple styles of text images with a single model trained only once, and allows users to control the text style according to style images. The method introduces conditions into the network such that all styles can be distinguished effectively, and the generation of each styled text can be controlled according to these conditions. The network is optimized so that the conditional information is transmitted effectively throughout the network. The proposed method was evaluated experimentally on a large number of text images, and the results show that the trained model can generate multiple-style text in real time according to the style image. In addition, the results of a user survey indicate that the proposed method produces higher-quality results than existing methods.
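
    As a rough illustration of the conditioning mechanism, the PyTorch sketch below injects a learned per-style code into a single generator so one trained model can switch styles at inference time; the layer sizes and the broadcast-concatenation scheme are assumptions, not the Multi-Style SMGAN architecture.

    import torch
    import torch.nn as nn

    class ConditionedGenerator(nn.Module):
        def __init__(self, n_styles=4, cond_dim=16):
            super().__init__()
            self.embed = nn.Embedding(n_styles, cond_dim)  # one learned code per style
            self.net = nn.Sequential(
                nn.Conv2d(3 + cond_dim, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())

        def forward(self, text_image, style_id):
            cond = self.embed(style_id)[:, :, None, None]      # (B, C, 1, 1)
            cond = cond.expand(-1, -1, *text_image.shape[2:])  # broadcast over all pixels
            return self.net(torch.cat([text_image, cond], dim=1))

    g = ConditionedGenerator()
    text = torch.randn(2, 3, 64, 64)          # stand-in text images
    out = g(text, torch.tensor([0, 3]))       # two different styles in one batch
    print(out.shape)                          # torch.Size([2, 3, 64, 64])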

  • Webly-Supervised Food Detection with Foodness Proposal Open Access

    Wataru SHIMODA  Keiji YANAI  

     
    PAPER

    Publicized: 2019/04/25  Vol: E102-D No:7  Page(s): 1230-1239

    To minimize the annotation costs associated with training semantic segmentation and object detection models, weakly supervised detection and segmentation approaches have been studied extensively. However, most of these approaches assume that the training and testing domains are the same, which at times causes considerable performance drops. For example, if we train an object detection network using only web images showing a large object at the center, it can be difficult for the network to detect multiple small objects. In this paper, we focus on training a CNN with only web images and achieve object detection in the wild. A proposal-based approach can address the domain-difference problem because web images resemble proposal regions: in both, the target object is located at the center of the image and occupies a large fraction of it. Several proposal methods have been proposed to detect regions with high “object-ness.” However, many of them generate a large number of candidates to increase the recall rate, and, given the recent advent of deep CNNs, such methods are problematic in terms of processing time for practical use. Therefore, we propose a CNN-based “food-ness” proposal method that requires neither pixel-wise annotation nor bounding box annotation. Our method generates proposals through backpropagation, and most of these proposals focus only on food objects. In addition, we can easily control the number of proposals. In experiments, we trained a network model using only web images and tested it on the UEC FOOD 100 dataset. We demonstrate that the proposed method achieves high performance compared to traditional proposal methods in terms of the trade-off between accuracy and computational cost. In short, we propose an intermediate approach between traditional proposal approaches and the fully convolutional approach: a novel proposal method that generates high “food-ness” regions using fully convolutional networks in a backward manner, trained on food images gathered from the web.
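
    The proposal-by-backpropagation step admits a compact sketch. Below, a gradient saliency map stands in for the backward “food-ness” signal: a class score is backpropagated to the input, the saliency is thresholded, and a bounding box is read off the surviving pixels. The untrained stand-in classifier and the mean-plus-std threshold are assumptions for illustration, not the paper's method.

    import torch
    import torch.nn as nn

    classifier = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                               nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2))

    image = torch.randn(1, 3, 224, 224, requires_grad=True)
    score = classifier(image)[0, 1]   # treat index 1 as the "food" class (an assumption)
    score.backward()                  # saliency via backpropagation to the input

    saliency = image.grad.abs().max(dim=1)[0][0]           # (H, W) per-pixel importance
    mask = saliency > saliency.mean() + saliency.std()     # keep strongly contributing pixels
    ys, xs = torch.nonzero(mask, as_tuple=True)
    if len(ys) > 0:
        box = (xs.min().item(), ys.min().item(), xs.max().item(), ys.max().item())
        print("food-ness proposal (x1, y1, x2, y2):", box)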