Tsubasa MIYAUCHI Ayato ONO Hiroki YOSHIMURA Masashi NISHIYAMA Yoshio IWAI
We propose a method for embedding an awareness state and a response state in an image-based avatar so that it can smoothly and automatically start an interaction with a user. When these states are not embedded, the image-based avatar can become non-responsive or slow to respond. To study the beginning of an interaction, we observed the behaviors exchanged between users and a receptionist in an information center. Our method replays the receptionist's behaviors at appropriate times in each state of the image-based avatar. Experimental results demonstrate that, at the beginning of an interaction, embedding the awareness and response states yields higher subjective scores than not embedding them.
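The state progression described above can be sketched as a small state machine. This is a hypothetical illustration: the state names, trigger events, and replayed behaviors below are assumptions for the sketch and do not come from the paper.

```python
# Hypothetical sketch: an avatar that passes through awareness and
# response states before entering full interaction. State names and
# events are illustrative assumptions, not the paper's design.
class AvatarStateMachine:
    def __init__(self):
        self.state = "idle"

    def on_event(self, event):
        # idle -> awareness when a user is detected nearby
        if self.state == "idle" and event == "user_detected":
            self.state = "awareness"   # e.g., replay a glance behavior
        # awareness -> response when the user approaches
        elif self.state == "awareness" and event == "user_approaches":
            self.state = "response"    # e.g., replay a nod or greeting
        # response -> interaction when the user starts speaking
        elif self.state == "response" and event == "user_speaks":
            self.state = "interaction"
        return self.state

avatar = AvatarStateMachine()
avatar.on_event("user_detected")
avatar.on_event("user_approaches")
avatar.on_event("user_speaks")
```

Without the two intermediate states, the machine would jump straight from "idle" to "interaction", which corresponds to the non-responsive or abrupt behavior the abstract warns about.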
Alparslan YILDIZ Noriko TAKEMURA Maiya HORI Yoshio IWAI Kosuke SATO
In this study, we introduce a system for tracking multiple people using multiple active cameras. Our main objective is to surveil as many targets as possible, at all times, using a limited number of active cameras. In our context, an active camera is a statically located pan-tilt-zoom camera. We aim to optimize the camera configuration to achieve maximum coverage of the targets. We first devise a method for efficiently tracking targets and estimating their locations in the environment. Our tracking method can track an unknown number of targets and readily estimate multiple future time-steps, which is a requirement for active cameras. Next, we present a variable time-step optimization of the camera configuration that is optimal given the estimated target likelihoods over multiple future frames. We validated our approach on simulated and real videos, and show that, without introducing significant computational complexity, active cameras can be used to track and observe multiple targets very effectively.
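The core selection problem, choosing one setting per camera so that as many targets as possible are covered, can be illustrated with a brute-force stand-in. The camera settings, the `covers` predicate, and the toy coverage sets below are all hypothetical; the paper's optimization over estimated target likelihoods for future frames is far more elaborate than this sketch.

```python
import itertools

def best_configuration(cameras, targets, covers):
    """Pick one setting per camera to maximize the number of covered targets.

    cameras: {camera_id: [candidate pan-tilt-zoom settings]}.
    covers(cam, setting, target) -> bool.
    Exhaustive search over all setting combinations (toy-scale only).
    """
    best, best_count = None, -1
    for combo in itertools.product(*cameras.values()):
        covered = {t for t in targets
                   for cam, s in zip(cameras, combo) if covers(cam, s, t)}
        if len(covered) > best_count:
            best, best_count = dict(zip(cameras, combo)), len(covered)
    return best, best_count

# Toy coverage sets for two cameras with two settings each (hypothetical).
cover_sets = {("c1", 0): {0, 1}, ("c1", 1): {2},
              ("c2", 0): {2},    ("c2", 1): {3}}
config, n_covered = best_configuration(
    {"c1": [0, 1], "c2": [0, 1]}, [0, 1, 2, 3],
    lambda cam, s, t: t in cover_sets[(cam, s)])
```

With four targets and two settings per camera, no configuration covers everything; the search settles for three covered targets, which mirrors the abstract's framing of covering as many targets as possible with limited cameras.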
Tomohiro MASHITA Yoshio IWAI Masahiko YACHIDA
This paper proposes a calibration method for catadioptric camera systems, typified by HyperOmni Vision, that consist of a perspective camera and a mirror whose reflecting surface is a surface of revolution. The proposed method is based on conventional camera calibration and mirror posture estimation. Many camera calibration methods have been proposed, and during the last decade methods for catadioptric camera calibration have also appeared. The main problem with existing catadioptric calibration is that either the degrees of freedom of the mirror posture are limited or the accuracy of the estimated parameters is inadequate because of nonlinear optimization. In contrast, our method estimates all five degrees of freedom of the mirror posture and is free from the instability of nonlinear optimization. The mirror posture has five degrees of freedom because the mirror surface is a surface of revolution. Our method uses the mirror boundary and can estimate up to four candidate mirror postures; we apply an extrinsic parameter calibration method based on conic fitting for this estimation. Because the estimated mirror posture is not unique, we also propose a selection method for finding the best candidate. By using the conic-based analytical method, we avoid the initial-value problem that arises in nonlinear optimization. We conducted experiments on synthesized and real images to evaluate the performance of our method, and we discuss its accuracy.
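The conic-fitting building block can be shown in minimal form: fitting an implicit conic to boundary points by taking the null-space direction of the design matrix via SVD. This is a generic least-squares conic fit, assumed here only to illustrate why the approach is analytical and needs no initial guess; it is not the paper's full extrinsic calibration.

```python
import numpy as np

def fit_conic(points):
    """Least-squares conic fit: a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0.

    The parameter vector is the right-singular vector of the design
    matrix with the smallest singular value; it is defined up to scale,
    and no iterative optimization (hence no initial value) is needed.
    """
    x, y = points[:, 0], points[:, 1]
    D = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])
    _, _, vt = np.linalg.svd(D)
    return vt[-1]  # unit-norm conic parameters (a, b, c, d, e, f)

# Points on the unit circle, whose conic is x^2 + y^2 - 1 = 0.
t = np.linspace(0, 2 * np.pi, 50, endpoint=False)
params = fit_conic(np.column_stack([np.cos(t), np.sin(t)]))
```

For the unit circle the recovered parameters satisfy a = c, b = 0, and f = -a up to the overall scale ambiguity, which is the closed-form behavior that lets a conic-based method sidestep nonlinear optimization.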
Ayaka YAMAMOTO Yoshio IWAI Hiroshi ISHIGURO
Background subtraction is widely used for detecting moving objects; however, changing illumination conditions, color similarity, and real-time performance remain important problems. In this paper, we introduce a sequential method that adaptively estimates background components using Kalman filters, together with a novel object detection method based on margined sign correlation (MSC). By applying MSC to our adaptive background model, the proposed system detects objects robustly and accurately. The proposed method is well suited to implementation on a graphics processing unit (GPU), allowing the system to achieve real-time performance efficiently. Experimental results demonstrate the performance of the proposed system.
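The adaptive background estimation can be sketched as one scalar Kalman filter per pixel under a constant-background model with slow illumination drift. The noise parameters and the per-pixel independence assumption below are illustrative choices for the sketch, not the paper's actual model.

```python
import numpy as np

def update_background(bg, frame, P, Q=0.01, R=4.0):
    """One Kalman predict/correct step per pixel (vectorized over the image).

    bg: current background estimate; P: estimate variance per pixel;
    Q: process noise (slow illumination drift); R: observation noise.
    Q and R are illustrative values, not the paper's parameters.
    """
    P = P + Q                      # predict: background may drift slightly
    K = P / (P + R)                # Kalman gain
    bg = bg + K * (frame - bg)     # correct toward the observed frame
    P = (1.0 - K) * P
    return bg, P

# Feed a constant scene; the estimate converges to the true background.
bg = np.full((4, 4), 100.0)
P = np.ones((4, 4))
for _ in range(200):
    bg, P = update_background(bg, np.full((4, 4), 120.0), P)
```

Because every pixel runs the same independent arithmetic, this update is embarrassingly parallel, which is why the abstract notes the method maps well onto a GPU.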
Takuya KAMITANI Hiroki YOSHIMURA Masashi NISHIYAMA Yoshio IWAI
We propose a method for accurately identifying people using temporal and spatial changes in local movements measured from video sequences of body sway. Existing methods identify people using gait features that mainly represent the large swinging motions of the limbs. This introduces a problem: identification performance decreases when people stop walking and maintain an upright posture. To extract informative features, our method instead measures the small swaying motions of the body, referred to as body sway. We divide the body into regions and extract the power spectral density of the local body sway movements as a feature. To evaluate the identification performance of our method, we collected three original video datasets of body sway sequences. The first dataset contains a large number of participants in an upright posture; the second includes variation over the long term; the third represents body sway in different postures. The results on these datasets confirm that our method, using local movements measured from body sway, extracts informative features for identification.
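The feature extraction step can be illustrated with a plain periodogram estimate of the power spectral density for one body region's movement signal. The sampling rate, bin count, and the simulated 0.5 Hz sway below are assumptions for the sketch; the paper's regions, signals, and PSD estimator may differ.

```python
import numpy as np

def psd_feature(displacement, fs=30.0, n_bins=8):
    """Periodogram-based PSD feature for one body region's sway signal.

    displacement: 1-D local-movement signal over time (e.g., vertical
    displacement of a region). Returns the power in the first n_bins
    frequency bins above DC as a simple identification feature.
    """
    x = displacement - np.mean(displacement)       # remove the DC offset
    spectrum = np.fft.rfft(x)
    psd = (np.abs(spectrum) ** 2) / (fs * len(x))  # periodogram estimate
    return psd[1:n_bins + 1]                       # skip the DC bin

# Simulated body sway: a 0.5 Hz oscillation sampled for 10 s at 30 fps.
t = np.arange(0, 10, 1 / 30.0)
sway = 0.5 * np.sin(2 * np.pi * 0.5 * t)
feat = psd_feature(sway)
```

With 300 samples at 30 fps the bin spacing is 0.1 Hz, so the 0.5 Hz oscillation concentrates its power in the fifth bin of the feature vector; differences in how people distribute such power across bins and regions are what make the feature discriminative.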
Masashi NISHIYAMA Michiko INOUE Yoshio IWAI
We propose an attention mechanism for gender recognition in deep learning networks that uses the gaze distribution of human observers judging the gender of people in pedestrian images. Prevalent attention mechanisms compute attention weights from the spatial correlation among all cells of an input feature map. If the pedestrian images contain a large background bias (e.g., test samples and training samples with different backgrounds), the attention weights learned by these mechanisms are affected by the bias, which in turn reduces the accuracy of gender recognition. To avoid this problem, we incorporate an attention mechanism called gaze-guided self-attention (GSA), inspired by human visual attention. Our method assigns spatially suitable attention weights to each input feature map using the gaze distribution of human observers. In particular, GSA yields promising results even when the training samples contain background bias. Experiments on publicly available datasets confirm that GSA, using the gaze distribution, achieves more accurate gender recognition than currently available attention-based methods when a background bias exists between training and test samples.
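The key idea, weighting feature maps by where humans look rather than by learned feature correlations, can be shown in a minimal form. The normalization and broadcasting below are assumptions for the sketch and are not the paper's exact GSA formulation.

```python
import numpy as np

def gaze_attention(feature_map, gaze_map):
    """Weight a feature map by a human gaze distribution.

    feature_map: (C, H, W) activations; gaze_map: (H, W) non-negative
    gaze density. The gaze map is normalized to [0, 1] and broadcast
    over channels, replacing correlation-derived attention weights.
    Hedged sketch only, not the paper's GSA definition.
    """
    w = gaze_map / (gaze_map.max() + 1e-8)
    return feature_map * w[None, :, :]

# Observers fixate on the body region; background cells get zero weight.
features = np.ones((2, 4, 4))
gaze = np.zeros((4, 4))
gaze[1:3, 1:3] = 1.0
out = gaze_attention(features, gaze)
```

Because the weights come from an external gaze distribution rather than from correlations within the feature map itself, background cells are suppressed regardless of how biased the training backgrounds are, which is the robustness property the abstract claims.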
Kiyotaka WATANABE Yoshio IWAI Hajime NAGAHARA Masahiko YACHIDA Toshiya SUZUKI
We propose a novel strategy for obtaining high spatio-temporal resolution video. To this end, we introduce a dual-sensor camera that simultaneously captures two video sequences with the same field of view: one with high resolution at a low frame rate, and the other with low resolution at a high frame rate. This paper presents an algorithm that synthesizes a high spatio-temporal resolution video from these two sequences using motion compensation and spectral fusion. We confirm that the proposed method improves both the resolution and the frame rate of the synthesized video.
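The spectral-fusion idea, trusting the low frequencies of one source and the high frequencies of the other, can be illustrated in one dimension. The cutoff, signal lengths, and test signals below are assumptions; the actual method fuses motion-compensated 2-D frames rather than toy sinusoids.

```python
import numpy as np

def spectral_fusion(high_freq_src, low_freq_src, cutoff):
    """Keep frequency bins below `cutoff` from low_freq_src, the rest
    from high_freq_src, and return the inverse transform."""
    F_lo = np.fft.rfft(low_freq_src)
    F_hi = np.fft.rfft(high_freq_src)
    fused = np.where(np.arange(F_lo.size) < cutoff, F_lo, F_hi)
    return np.fft.irfft(fused, n=low_freq_src.size)

t = np.arange(256) / 256.0
truth = np.sin(2 * np.pi * 2 * t) + np.sin(2 * np.pi * 20 * t)
# Upsampled low-resolution frame: correct low frequencies, no detail.
low_up = np.sin(2 * np.pi * 2 * t)
# Motion-compensated high-resolution frame: correct detail, slightly
# wrong low frequencies (e.g., compensation error).
high_ref = 0.9 * np.sin(2 * np.pi * 2 * t) + np.sin(2 * np.pi * 20 * t)
fused = spectral_fusion(high_ref, low_up, cutoff=10)
```

Each source contributes only the band where it is reliable, so the fused signal recovers the ground truth exactly in this toy case, combining the frame rate of one sequence with the detail of the other.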
Dadet PRAMADIHANTO Yoshio IWAI Masahiko YACHIDA
In this paper, we propose an integration of face identification and facial expression recognition. A face is modeled as a graph whose nodes represent facial feature points. This model is used for automatic detection of the face and its feature points, and the feature points are tracked by applying flexible feature matching. Face identification is performed by comparing the graph representing the input face image with individual face models. Facial expression is modeled by finding the relationship between the motion of facial feature points and expression change. Individual and average expression models are generated and then used to classify facial expressions into appropriate categories and to estimate the degree of expression change. The expression model used for facial expression recognition is chosen based on the result of face identification.
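The identification step, comparing an input face graph against stored individual models, can be sketched with a deliberately simple matching cost. Representing each graph as an aligned array of feature-point positions and using mean node displacement as the cost are assumptions for the sketch; the paper's flexible feature matching is more sophisticated.

```python
import numpy as np

def identify_face(input_graph, model_graphs):
    """Match an input face graph against individual face models.

    Each graph is an (N, 2) array of facial feature point positions
    with corresponding node order. The identity is the model with the
    smallest mean node displacement. A simplified stand-in for the
    flexible feature matching used in the paper.
    """
    costs = {name: np.mean(np.linalg.norm(input_graph - g, axis=1))
             for name, g in model_graphs.items()}
    return min(costs, key=costs.get)

# Hypothetical three-node face models and a noisy observation.
models = {
    "alice": np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]]),
    "bob":   np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]]),
}
observed = np.array([[0.05, 0.0], [1.0, 0.05], [0.5, 0.95]])
identity = identify_face(observed, models)
```

In the integrated system described above, the identity returned here would then select which individual expression model to use for recognizing the person's facial expression.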