Author Search Result

[Author] Kenji MASE (9 hits)

  • People Re-Identification with Local Distance Comparison Using Learned Metric

    Guanwen ZHANG  Jien KATO  Yu WANG  Kenji MASE  

     
    PAPER-Image Processing and Video Processing

    Vol: E97-D No:9  Page(s): 2461-2472

    In this paper, we propose a novel approach for multiple-shot people re-identification. Due to large variations in camera view, illumination, non-rigid deformation of posture, and so on, there exists a crucial inter-/intra-class variance issue: the same person may look considerably different, whereas different people may look extremely similar. This issue leads to an intractable, multimodal distribution of people's appearance in feature space. To deal with such multimodal properties of the data, we solve the re-identification problem under a local distance comparison framework, which significantly alleviates the difficulty induced by the varying appearance of each individual. Furthermore, we build an energy-based loss function to measure the similarity between appearance instances by calculating the distance between corresponding subsets in feature space. This loss function not only favors small distances that indicate high similarity between appearances of the same person, but also penalizes small distances or undesirable overlaps between subsets that reflect high similarity between appearances of different people. In this way, effective people re-identification can be achieved in a manner robust against the inter-/intra-class variance issue. The performance of our approach has been evaluated on the public benchmark datasets ETHZ and CAVIAR4REID. Experimental results show significant improvements over previously reported results.
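
    As a rough illustration of the local distance comparison idea (not the paper's exact energy-based formulation), the sketch below summarizes each multimodal appearance set by local cluster modes and matches sets by their closest modes. All function names and the clustering choice are hypothetical.

```python
# Illustrative sketch of local (subset-to-subset) distance comparison for
# multiple-shot re-identification. NOT the paper's energy-based loss; it only
# shows the idea of matching local modes instead of whole appearance sets.
import numpy as np
from sklearn.cluster import KMeans

def local_modes(features, k=3):
    """Summarize a multimodal appearance set by k local cluster centers."""
    k = min(k, len(features))
    return KMeans(n_clusters=k, n_init=10).fit(features).cluster_centers_

def local_set_distance(probe, gallery, k=3):
    """Distance between two appearance sets = closest pair of local modes."""
    pm, gm = local_modes(probe, k), local_modes(gallery, k)
    d = np.linalg.norm(pm[:, None, :] - gm[None, :, :], axis=-1)
    return d.min()

def reidentify(probe, gallery_sets):
    """Pick the gallery person whose appearance set is closest to the probe."""
    return min(gallery_sets, key=lambda pid: local_set_distance(probe, gallery_sets[pid]))
```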

  • Multiple-Shot People Re-Identification by Patch-Wise Learning

    Guanwen ZHANG  Jien KATO  Yu WANG  Kenji MASE  

     
    PAPER-Pattern Recognition

    Publicized: 2015/08/31
    Vol: E98-D No:12  Page(s): 2257-2270

    In this paper, we propose a patch-wise learning based approach to the multiple-shot people re-identification task. In the proposed approach, re-identification is formulated as a patch-wise set-to-set matching problem, with each patch set being matched using a specifically learned Mahalanobis distance metric. The proposed approach has two advantages: (1) a patch-wise representation that reduces the ambiguity of a non-rigid matching problem (of the human body) by turning it into approximately rigid ones (of body parts); (2) a patch-wise learning algorithm that enables more constraints to be included in the learning process and yields distance metrics of high quality. We evaluate the proposed approach on popular benchmark datasets and confirm its competitive performance compared with state-of-the-art methods.
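
    A minimal sketch of patch-wise set-to-set matching with per-patch Mahalanobis metrics, assuming the metric matrices have already been learned somewhere; the learning algorithm itself and the min-min matching rule used here are illustrative stand-ins, not the paper's method.

```python
# Patch-wise set-to-set matching: each body patch has its own Mahalanobis
# metric; the set-to-set distance per patch is the closest cross-set pair,
# and patch distances are summed. metrics[p] are placeholder learned PSD
# matrices, not the paper's specifically learned metrics.
import numpy as np

def mahalanobis(x, y, M):
    d = x - y
    return float(d @ M @ d)

def patch_set_distance(probe_patches, gallery_patches, metrics):
    """probe_patches[p], gallery_patches[p]: feature vectors for patch p."""
    total = 0.0
    for p, M in enumerate(metrics):
        total += min(mahalanobis(x, y, M)
                     for x in probe_patches[p] for y in gallery_patches[p])
    return total
```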

  • Recursive Multi-Scale Channel-Spatial Attention for Fine-Grained Image Classification

    Dichao LIU  Yu WANG  Kenji MASE  Jien KATO  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2021/12/22
    Vol: E105-D No:3  Page(s): 713-726

    Fine-grained image classification is a difficult problem, and previous studies mainly address it by locating multiple discriminative regions at different scales and then aggregating the complementary information explored from the located regions. However, locating discriminative regions introduces heavy overhead and is not suitable for real-world applications. In this paper, we propose the recursive multi-scale channel-spatial attention module (RMCSAM) to address this problem. Following the experience of previous research on fine-grained image classification, RMCSAM explores multi-scale attentional information. However, the attentional information is explored by recursively refining the deep feature maps of a convolutional neural network (CNN) to better correspond to multi-scale channel-wise and spatial-wise attention, instead of localizing attention regions. In this way, RMCSAM provides a lightweight module that can be inserted into standard CNNs. Experimental results show that RMCSAM improves classification accuracy and attention-capturing ability over baselines. RMCSAM also performs better than other state-of-the-art attention modules in fine-grained image classification, and is complementary to some state-of-the-art approaches for fine-grained image classification. Code is available at https://github.com/Dichao-Liu/Recursive-Multi-Scale-Channel-Spatial-Attention-Module.
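
    To illustrate the kind of lightweight insertable module RMCSAM belongs to, here is a generic channel-then-spatial attention block in PyTorch. This is a simplified sketch (without the recursion and multi-scale refinement that define RMCSAM); the authors' actual implementation is in the repository linked above.

```python
# Generic channel-then-spatial attention block that can be inserted into a
# standard CNN. Shown only to illustrate the module family; see the authors'
# repo for the actual RMCSAM.
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                       # x: (B, C, H, W)
        b, c, _, _ = x.shape
        # Channel attention from globally average-pooled features.
        w = torch.sigmoid(self.channel_mlp(x.mean(dim=(2, 3)))).view(b, c, 1, 1)
        x = x * w
        # Spatial attention from channel-wise mean and max maps.
        s = torch.cat([x.mean(1, keepdim=True), x.max(1, keepdim=True).values], 1)
        return x * torch.sigmoid(self.spatial_conv(s))
```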

  • Vote Distribution Model for Hough-Based Action Detection

    Kensho HARA  Takatsugu HIRAYAMA  Kenji MASE  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2016/08/18
    Vol: E99-D No:11  Page(s): 2796-2808

    Hough-based voting approaches have been widely used to solve many detection problems, such as object and action detection. For action detection, these approaches cast votes for action classes and positions based on the local spatio-temporal features of given videos. The voting process of each local feature is performed independently of the other local features. This independence makes the method robust to occlusions, because votes based on visible local features are not influenced by occluded local features. However, it also makes discriminating similar motions of different classes difficult and causes the method to cast many false votes. We propose a novel Hough-based action detection method to overcome this problem of false votes. The false votes do not occur randomly; rather, they depend on the relevant action classes. We introduce vote distributions, which represent the number of votes cast for each action class. We assume that the distribution of false votes includes important information for improving action detection. These distributions are used to build a model that represents the characteristics of Hough voting, including the false votes. The method estimates the likelihood using this model and reduces the influence of false votes. In experiments on the IXMAS and UT-Interaction datasets, we confirmed that the proposed method reduces false positive detections and improves action detection accuracy.
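
    A toy sketch of the vote-distribution idea: rather than trusting raw per-class vote counts (which contain class-dependent false votes), compare the observed vote distribution with distributions learned per action class. The distance-based likelihood proxy below is a simplified placeholder for the paper's model.

```python
# Hough voting with vote distributions: classify by comparing the observed
# per-class vote histogram against learned class-conditional vote
# distributions that include each class's characteristic false votes.
import numpy as np

def vote_distribution(votes, n_classes):
    """votes: list of class indices cast by local spatio-temporal features."""
    hist = np.bincount(votes, minlength=n_classes).astype(float)
    return hist / max(hist.sum(), 1.0)

def classify(votes, class_models, n_classes):
    """class_models[c]: mean vote distribution observed when action c occurs."""
    d = vote_distribution(votes, n_classes)
    # Likelihood proxy: negative distance to each class's vote-distribution model.
    scores = {c: -np.linalg.norm(d - m) for c, m in class_models.items()}
    return max(scores, key=scores.get)
```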

  • Human Spine Posture Estimation from 2D Frontal and Lateral Views Using 3D Physically Accurate Spine Model

    Daisuke FURUKAWA  Kensaku MORI  Takayuki KITASAKA  Yasuhito SUENAGA  Kenji MASE  Tomoichi TAKAHASHI  

     
    PAPER-ME and Human Body

    Vol: E87-D No:1  Page(s): 146-154

    This paper proposes the design of a physically accurate spine model and its application to estimating the three-dimensional spine posture from the frontal and lateral views of a human body taken by two conventional video cameras. The accurate spine model proposed here is composed of rigid body parts approximating the vertebral bodies and elastic body parts representing the intervertebral disks. In the estimation process, we obtain the neck and waist positions by fitting the Connected Vertebra Spheres Model to frontal and lateral silhouette images. Then the virtual forces acting on the top and bottom vertebrae of the accurate spine model are computed based on the obtained neck and waist positions. The accurate model is deformed by the virtual forces, the gravitational force, and the forces of repulsion; the model thus deformed is regarded as the current posture. In preliminary experiments based on a single real MR image data set of one subject, we confirmed that the proposed deformation method estimates the positions of the vertebrae within positional shifts of 3.2-6.8 mm. The 3D posture of the spine could also be estimated reasonably well by applying the estimation method to actual human images taken by video cameras.
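
    As a highly simplified analogue of the deformation step, the sketch below relaxes a chain of vertebra positions connected by elastic "disk" springs under virtual end forces and gravity. All parameters and the quasi-static update are illustrative assumptions, not the paper's physical model.

```python
# Simplified chain analogue of the physically based spine deformation:
# rigid vertebra positions joined by elastic springs (intervertebral disks),
# relaxed under virtual forces at the top/bottom vertebrae plus gravity.
import numpy as np

def relax_spine(p, rest_len, f_top, f_bottom, stiffness=50.0,
                steps=200, dt=0.01):
    """p: (N, 3) initial vertebra positions, top to bottom."""
    g = np.array([0.0, 0.0, -9.8])
    p = p.copy()
    for _ in range(steps):
        f = np.tile(g, (len(p), 1))        # gravity on every vertebra
        f[0] += f_top                      # virtual force from neck position
        f[-1] += f_bottom                  # virtual force from waist position
        for i in range(len(p) - 1):        # elastic disk between i and i+1
            d = p[i + 1] - p[i]
            L = np.linalg.norm(d)
            spring = stiffness * (L - rest_len[i]) * d / max(L, 1e-9)
            f[i] += spring                 # spring acts on both end vertebrae
            f[i + 1] -= spring
        p += dt * f                        # quasi-static relaxation step
    return p
```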

  • Top-Down Visual Attention Estimation Using Spatially Localized Activation Based on Linear Separability of Visual Features

    Takatsugu HIRAYAMA  Toshiya OHIRA  Kenji MASE  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2015/09/10
    Vol: E98-D No:12  Page(s): 2308-2316

    Intelligent information systems captivate people's attention. Examples of such systems include driving support vehicles capable of sensing driver state and communication robots capable of interacting with humans. Modeling how people search for visual information is indispensable for designing these kinds of systems. In this paper, we focus on human visual attention, which is closely related to visual search behavior. We propose a computational model to estimate human visual attention during a visual target search task. Existing models estimate visual attention using the ratio between a representative value of a visual feature of the target stimulus and that of the distractors or background. These models, however, often cannot perform well on difficult search tasks that require a sequential spotlighting process. For such tasks, the linear separability effect of a visual feature distribution should be considered. Hence, we introduce this effect into spatially localized activation. Concretely, our top-down model estimates target-specific visual attention using Fisher's variance ratio between the visual feature distribution of a local region in the field of view and that of the target stimulus. We confirm the effectiveness of our computational model through a visual search experiment.
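
    A small sketch of the core measure: Fisher's variance ratio between the feature distribution of a local region and that of the target. Computing the ratio per feature dimension and averaging is an illustrative choice, not necessarily the paper's exact formulation.

```python
# Fisher's variance ratio between a local region's visual feature
# distribution and the target's: high ratio = target linearly separable
# from the local surround = strong top-down attention at that region.
import numpy as np

def fisher_variance_ratio(region_feats, target_feats):
    """region_feats, target_feats: (n, d) arrays of visual features."""
    ratios = []
    for j in range(region_feats.shape[1]):
        a, b = region_feats[:, j], target_feats[:, j]
        between = (a.mean() - b.mean()) ** 2      # between-class scatter
        within = a.var() + b.var()                # within-class scatter
        ratios.append(between / max(within, 1e-9))
    return float(np.mean(ratios))
```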

  • Personal Viewpoint Navigation Based on Object Trajectory Distribution for Multi-View Videos

    Xueting WANG  Kensho HARA  Yu ENOKIBORI  Takatsugu HIRAYAMA  Kenji MASE  

     
    PAPER-Human-computer Interaction

    Publicized: 2017/10/12
    Vol: E101-D No:1  Page(s): 193-204

    Multi-camera videos, with their abundant information and high flexibility, are useful in a wide range of applications, such as surveillance systems, web lectures, news broadcasting, concerts, and sports viewing. Viewers can enjoy an enhanced viewing experience by choosing their own viewpoint through viewing interfaces. However, some viewers may feel annoyed by the need for continual manual viewpoint selection, especially when the number of selectable viewpoints is relatively large. To solve this issue, we propose an automatic viewpoint navigation method designed especially for sports. This method focuses on a viewer's personal preference in viewpoint selection, instead of common and professional editing rules. We assume that different trajectory distributions of the viewed objects lead to different viewpoint selections according to personal preference. We learn the relationship between the viewer's personal viewpoint-selection tendency and the spatio-temporal game context represented by the objects' trajectories. We compare three methods, based on a Gaussian mixture model, an SVM with a general histogram, and an SVM with a bag-of-words representation, to seek the best learning scheme for this relationship. The performance of the proposed methods is evaluated by assessing the degree of similarity between the selected viewpoints and the viewers' edited records.
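
    A sketch of one of the three compared schemes: an SVM over bag-of-words descriptors of object trajectories, predicting which viewpoint a given viewer would select. The feature construction and function names here are schematic assumptions, not the paper's exact pipeline.

```python
# SVM with bag-of-words over trajectory segments: quantize short segments
# against a learned codebook, histogram them per time window, and train an
# SVM on the viewpoints the viewer actually chose for those windows.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def bag_of_words(trajectory_segments, codebook):
    """Quantize trajectory segments against the codebook and histogram them."""
    words = codebook.predict(trajectory_segments)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def train(all_segments, windows, chosen_viewpoints, n_words=64):
    """windows: per-time-window trajectory segments; chosen_viewpoints: the
    viewpoint the viewer selected in each window."""
    codebook = KMeans(n_clusters=n_words, n_init=10).fit(all_segments)
    X = np.stack([bag_of_words(w, codebook) for w in windows])
    clf = SVC(kernel="rbf").fit(X, chosen_viewpoints)
    return codebook, clf
```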

  • Adaptive Metric Learning for People Re-Identification

    Guanwen ZHANG  Jien KATO  Yu WANG  Kenji MASE  

     
    PAPER-Image Processing and Video Processing

    Vol: E97-D No:11  Page(s): 2888-2902

    There are two intrinsic issues in multiple-shot person re-identification: (1) large differences in camera view, illumination, and non-rigid deformation of posture, which make the intra-class variance even larger than the inter-class variance; (2) only a small amount of training data is available for learning tasks in a realistic re-identification scenario. In our previous work, we proposed a local distance comparison framework to deal with the first issue. In this paper, to deal with the second issue (i.e., to derive a reliable distance metric from limited training data), we propose an adaptive learning method that learns an adaptive distance metric, integrating prior knowledge learned from a large existing auxiliary dataset with task-specific information extracted from a much smaller training dataset. Experimental results on several public benchmark datasets show that, combined with the local distance comparison framework, our adaptive learning method is superior to conventional approaches.
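
    A sketch of the adaptation idea: blend a prior metric learned on a large auxiliary dataset with a metric estimated from the small task-specific set. The convex combination and the KISS-style task metric below are illustrative stand-ins for the paper's adaptive learning method.

```python
# Adaptive metric = prior metric from auxiliary data blended with a
# task-specific metric estimated from limited training pairs.
import numpy as np

def adapted_metric(M_prior, task_feats, same_pairs, alpha=0.7):
    """M_prior: (d, d) PSD metric learned on the auxiliary dataset.
    same_pairs: index pairs (i, j) of same-person samples in task_feats."""
    diffs = np.stack([task_feats[i] - task_feats[j] for i, j in same_pairs])
    # Task-specific metric: regularized inverse of the intra-class
    # difference covariance (a simple KISS-style estimate).
    cov = diffs.T @ diffs / len(diffs) + 1e-3 * np.eye(task_feats.shape[1])
    M_task = np.linalg.inv(cov)
    # Convex combination of prior knowledge and task-specific information.
    return alpha * M_prior + (1.0 - alpha) * M_task
```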

  • Recognition of Facial Expression from Optical Flow

    Kenji MASE  

     
    PAPER

    Vol: E74-D No:10  Page(s): 3474-3483

    We present a method that uses optical flow to estimate facial muscle actions, which can then be recognized as facial expressions. Facial expressions are the result of facial muscle actions, which are triggered by the nerve impulses generated by emotions. The muscle actions cause the movement and deformation of the facial skin and facial features such as the eyes, mouth, and nose. Since facial skin has the texture of a fine-grained organ, which helps in extracting optical flow, we can extract muscle actions from external appearance. We are thus able to construct a facial expression recognition system based on optical flow data. We investigate the recognition method in two ways. First, the optical-flow fields of skin movement are evaluated in muscle windows, each of which defines one primary direction of muscle contraction, to correctly extract muscle movement. Second, a fifteen-dimensional feature vector is used to represent the most active points in terms of the flow variance through time and local spatial areas. The expression recognition system uses the feature vector to categorize the image sequences into several classes of facial expression. Preliminary experiments indicate an accuracy of approximately 80% when recognizing four types of expressions: happiness, anger, disgust, and surprise.
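
    A minimal sketch of the front end: dense optical flow over the face, with the mean flow in each predefined muscle window projected onto that window's primary contraction direction. The window layout is a hypothetical input, and the downstream 15-D feature vector and classifier stage are omitted.

```python
# Muscle-window activations from dense optical flow: project the mean flow
# inside each window onto the window's primary contraction direction.
import cv2
import numpy as np

def muscle_activations(prev_gray, curr_gray, windows):
    """windows: list of (x, y, w, h, direction), where `direction` is a unit
    2-vector along the muscle's contraction axis (hypothetical layout)."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    acts = []
    for (x, y, w, h, direction) in windows:
        mean_flow = flow[y:y + h, x:x + w].reshape(-1, 2).mean(axis=0)
        acts.append(float(mean_flow @ np.asarray(direction)))
    return acts
```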