Tsuyoshi HIGASHIGUCHI Norimichi UKITA Masayuki KANBARA Norihiro HAGITA
This paper proposes a method for predicting individuality-preserving gait patterns. Physical rehabilitation can be performed using visual and/or physical instructions by physiotherapists or exoskeletal robots. However, template-based rehabilitation may produce discomfort and pain in a patient because of deviations from the natural gait of each patient. Our work addresses this problem by predicting an individuality-preserving gait pattern for each patient. In this prediction, the transition of the gait patterns is modeled by associating the sequence of a 3D skeleton in gait with its continuous-value gait features (e.g., walking speed or step width). In the space of the prediction model, the arrangement of the gait patterns is optimized so that (1) similar gait patterns are close to each other and (2) the gait features change smoothly between neighboring gait patterns. This model allows us to predict individuality-preserving gait patterns for each patient even if his/her various gait patterns are not available for prediction. The effectiveness of the proposed method is demonstrated quantitatively with two datasets.
Tatsuya NOBUNAGA Toshiaki WATANABE Hiroya TANAKA
Individuals can be identified by features extracted from an electrocardiogram (ECG). However, irregular palpitations due to stress or exercise decrease the identification accuracy because of distortion of the ECG waveforms. In this letter, we propose a human identification scheme based on the frequency spectra of an ECG, which can successfully extract features and thus identify individuals even while they are exercising. For the proposed scheme, we demonstrate an accuracy rate of 99.8% in a controlled experiment with exercising subjects. This level of accuracy is achieved by determining the significant features of individuals with a random forest classifier. In addition, the effectiveness of the proposed scheme is verified using a publicly available ECG database. We show that the proposed scheme also achieves high accuracy with this public database.
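As a rough illustration of the pipeline outlined above, the following sketch (not the authors' implementation) extracts normalized magnitude spectra from fixed-length ECG segments and classifies them with a random forest; the segment length, normalization, and toy data are illustrative assumptions.

```python
# Minimal sketch: identify subjects from ECG frequency-spectrum features with a
# random forest, as the abstract outlines. Segment length, sampling rate, and
# the toy data below are illustrative assumptions, not the authors' setup.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def spectrum_features(ecg_segments):
    """Magnitude spectra of fixed-length ECG segments (one row per segment)."""
    spectra = np.abs(np.fft.rfft(ecg_segments, axis=1))
    return spectra / (spectra.sum(axis=1, keepdims=True) + 1e-12)  # normalize per segment

# Toy data: 5 "subjects", 40 segments each, 512 samples per segment.
rng = np.random.default_rng(0)
segments = rng.standard_normal((200, 512))
labels = np.repeat(np.arange(5), 40)

X = spectrum_features(segments)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr, y_tr)
print("toy accuracy:", clf.score(X_te, y_te))
# clf.feature_importances_ indicates which spectral bins the forest found significant.
```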
Kazuaki KONDO Genki MIZUNO Yuichi NAKAMURA
This study proposes a mathematical model of a gesture-based pointing interface system for simulating pointing behaviors in various situations. We regard the interaction between a pointing interface and a user as a human-in-the-loop system and describe it using feedback control theory. The model is formulated as a hybrid of a target value follow-up component and a disturbance compensation component. These are induced from the same feedback loop but with different parameter sets to describe human pointing characteristics well. The two optimal parameter sets were determined individually to represent actual pointing behaviors accurately for step input signals and random walk disturbance sequences, respectively. The calibrated model is used to simulate pointing behaviors for arbitrary input signals expected in practical situations. Through experimental evaluations, we quantitatively analyzed how accurately the proposed hybrid model can simulate actual pointing behaviors and also discuss its advantage over the basic non-hybrid model. Model refinements for further accuracy are also suggested based on the evaluation results.
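The sketch below illustrates, under assumed dynamics, the hybrid idea of a target follow-up term and a disturbance compensation term sharing one feedback loop but using different gains; the control law, gains, and delay are illustrative assumptions rather than the paper's calibrated model.

```python
# Minimal sketch (assumed form, not the paper's exact model): a discrete-time
# feedback loop in which the hand follows a delayed step target and counteracts
# a random-walk disturbance, with separate gains for the two terms.
import numpy as np

def simulate(target, disturbance, k_follow=0.15, k_comp=0.4, delay=5):
    """Hand position updated by delayed feedback; the cursor shown is hand + disturbance."""
    hand = np.zeros(len(target))
    cursor = np.zeros(len(target))
    for t in range(1, len(target)):
        follow_err = target[max(t - delay, 0)] - hand[t - 1]   # target follow-up term
        comp_err = hand[t - 1] - cursor[t - 1]                 # disturbance compensation term
        hand[t] = hand[t - 1] + k_follow * follow_err + k_comp * comp_err
        cursor[t] = hand[t] + disturbance[t]                   # what the user actually sees
    return cursor

rng = np.random.default_rng(1)
T = 300
target = np.where(np.arange(T) >= 50, 1.0, 0.0)               # step input signal
disturbance = np.cumsum(0.01 * rng.standard_normal(T))        # random-walk disturbance
cursor = simulate(target, disturbance)
print("mean absolute cursor error after the step:", np.abs(cursor[100:] - 1.0).mean())
```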
Many vision systems are embedded in the devices around us, such as mobile phones, vehicles, and UAVs. Many of them still require interactive operations by human users. However, specifying accurate object information can be challenging because of video jitter caused by camera shake and target motion. In this paper, we first collect practical hand-drawn bounding boxes on real-life videos captured by hand-held cameras and UAV-based cameras. We take a close look at human-computer interactive operations on unstable images. The collected data show that human input suffers from large deviations that harm interaction accuracy. To achieve robust interactions on unstable platforms, we propose a target-focused video stabilization method that utilizes a proposal-based object detector and a tracking-based motion estimation component. This method starts with a single manual click and outputs a stabilized video stream in which the specified target stays almost stationary. Our method removes not only camera jitter but also target motion, thereby offering a comfortable environment for users to perform further interactive operations. The experiments demonstrate that the proposed method effectively eliminates image vibrations and significantly increases human input accuracy.
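A simplified sketch of the target-focused idea follows: each frame is translated so that a user-clicked patch stays near a fixed image location. Here plain template matching stands in for the paper's proposal-based detector and tracker, which is an assumption for illustration only.

```python
# Minimal sketch (assumption: template matching stands in for the paper's
# detector + tracker): keep a user-clicked target near a fixed image location
# by translating each frame so the tracked patch stays put.
import cv2
import numpy as np

def stabilize_on_target(frames, click_xy, patch=40):
    """frames: list of grayscale uint8 images; click_xy: (x, y) of the clicked target."""
    x0, y0 = click_xy
    template = frames[0][y0 - patch:y0 + patch, x0 - patch:x0 + patch]
    out = [frames[0]]
    for frame in frames[1:]:
        res = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
        _, _, _, (tx, ty) = cv2.minMaxLoc(res)                # best-match top-left corner
        dx, dy = (x0 - patch) - tx, (y0 - patch) - ty         # shift back to the click position
        M = np.float32([[1, 0, dx], [0, 1, dy]])
        out.append(cv2.warpAffine(frame, M, (frame.shape[1], frame.shape[0])))
    return out

# Toy usage on random frames, just to show the call pattern.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 255, (240, 320), dtype=np.uint8) for _ in range(3)]
stabilized = stabilize_on_target(frames, click_xy=(160, 120))
print("frames stabilized:", len(stabilized))
```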
Ngochao TRAN Tetsuro IMAI Koshiro KITAO Yukihiko OKUMURA Takehiro NAKAMURA Hiroshi TOKUDA Takao MIYAKE Robin WANG Zhu WEN Hajime KITANO Roger NICHOLS
The fifth generation (5G) system using millimeter waves is considered for application to high-traffic areas with a dense population of pedestrians. In such an environment, the effects of shadowing and scattering of radio waves by human bodies (HBs) on propagation channels cannot be ignored. In this paper, we clarify, based on measurements, the characteristics of waves scattered by the HB for typical non-line-of-sight scenarios in street canyon environments. In these scenarios, there are street intersections with pedestrians, and the angles formed by the transmission point, HB, and reception point are nearly equal to 90 degrees. We use a wide-band channel sounder for the 67-GHz band with a 1-GHz bandwidth and horn antennas in the measurements. The distance between the antennas and the HB is varied in the measurements. Moreover, the direction of the HB is changed from 0 to 360 degrees. The evaluation results show that the radar cross section (RCS) of the HB fluctuates randomly over a range of approximately 20 dB. Moreover, the distribution of the RCS of the HB is a Gaussian distribution with a mean value of -9.4 dBsm and a standard deviation of 4.2 dBsm.
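For readers who want to reuse the reported statistics, the sketch below draws HB RCS samples from the Gaussian distribution given in the abstract (mean -9.4 dBsm, standard deviation 4.2 dBsm) and plugs them into a textbook bistatic radar-equation form; the link geometry and the radar-equation form are assumptions, not taken from the paper.

```python
# Minimal sketch: randomize the scattered-path gain using the human-body RCS
# distribution reported in the abstract. Distances, antenna gains (omitted),
# and the bistatic radar-equation form are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)
rcs_dbsm = rng.normal(loc=-9.4, scale=4.2, size=10_000)   # RCS samples in dBsm
rcs_m2 = 10 ** (rcs_dbsm / 10)                            # convert to square meters

wavelength = 3e8 / 67e9                                    # 67-GHz band
d_tx_hb, d_hb_rx = 10.0, 10.0                              # Tx-HB and HB-Rx distances [m]
# Bistatic radar equation (antenna gains omitted): scattered-path power gain.
path_gain = wavelength**2 * rcs_m2 / ((4 * np.pi)**3 * d_tx_hb**2 * d_hb_rx**2)
print("median scattered-path gain [dB]:", 10 * np.log10(np.median(path_gain)))
```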
The present study considers an action-based person identification problem, in which an input action sequence consists of 3D skeletal data from multiple frames. Unlike previous approaches, the type of action is not pre-defined in this work, which requires the subject classifier to possess cross-action generalization capabilities. To achieve that, we present a novel pose-based Hough forest framework, in which each per-frame pose feature casts a probabilistic vote to the Hough space. Pose distribution is estimated from training data and then used to compute the reliability of the vote to deal with the unseen poses in the test action sequence. Experimental results with various real datasets demonstrate that the proposed method provides effective person identification results especially for the challenging cross-action person identification setting.
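The voting idea can be sketched as follows: each per-frame pose feature casts a vote into an identity space, weighted by a reliability derived from a pose-density estimate over the training data. A 1-nearest-neighbor lookup stands in for the Hough forest, and the exact reliability weight is an assumption made for illustration.

```python
# Minimal sketch of reliability-weighted voting (a 1-NN lookup stands in for
# the pose-based Hough forest; the kernel-density reliability is an assumption).
import numpy as np
from sklearn.neighbors import KernelDensity, NearestNeighbors

def identify(train_poses, train_ids, test_sequence, n_subjects):
    density = KernelDensity(bandwidth=0.5).fit(train_poses)   # pose distribution of training data
    nn = NearestNeighbors(n_neighbors=1).fit(train_poses)
    votes = np.zeros(n_subjects)
    for pose in test_sequence:                                # one pose feature per frame
        pose = pose.reshape(1, -1)
        weight = np.exp(density.score_samples(pose))[0]       # reliability: how "seen" this pose is
        _, idx = nn.kneighbors(pose)
        votes[train_ids[idx[0, 0]]] += weight                 # probabilistic vote into identity space
    return int(np.argmax(votes))

rng = np.random.default_rng(0)
train_poses = rng.standard_normal((300, 30))                  # 30-D per-frame pose features (toy)
train_ids = rng.integers(0, 5, size=300)
test_sequence = rng.standard_normal((50, 30))
print("predicted subject:", identify(train_poses, train_ids, test_sequence, 5))
```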
Shilei CHENG Song GU Maoquan YE Mei XIE
Human action recognition in videos has drawn considerable research interest in computer vision. The Bag-of-Words (BoW) model is commonly used to obtain video-level representations; however, the BoW model roughly assigns each feature vector to its nearest visual word, and the collection of unordered words ignores the spatial information of interest points, inevitably causing nontrivial quantization errors and impairing improvements in classification rates. To address these drawbacks, we propose an approach for action recognition by encoding spatio-temporal log-Euclidean covariance matrix (ST-LECM) features within the low-rank and sparse representation framework. Motivated by low-rank matrix recovery, local descriptors in a spatio-temporal neighborhood have similar representations and should be approximately low rank. The learned coefficients can not only capture the global data structures, but also preserve local consistency. Experimental results showed that the proposed approach yields excellent recognition performance on synthetic video datasets and is robust to action variability, view variations, and partial occlusion.
This paper proposes an iterative scheme between human action classification and pose estimation in still images. Initial action classification is achieved only by global image features that consist of the responses of various object filters. The classification likelihood of each action weights human poses estimated by the pose models of multiple sub-action classes. Such fine-grained action-specific pose models allow us to robustly identify the pose of a target person under the assumption that similar poses are observed in each action. From the estimated pose, pose features are extracted and used with global image features for action re-classification. This iterative scheme can mutually improve action classification and pose estimation. Experimental results with a public dataset demonstrate the effectiveness of the proposed method both for action classification and pose estimation.
Tomoki HAYASHI Masafumi NISHIDA Norihide KITAOKA Tomoki TODA Kazuya TAKEDA
In this study, toward the development of a smartphone-based monitoring system for life logging, we collect over 1,400 hours of data by recording both the outdoor and indoor daily activities of 19 subjects under practical conditions with a smartphone and a small camera. We then construct a huge human activity database which consists of environmental sound signals, triaxial acceleration signals, and manually annotated activity tags. Using our constructed database, we evaluate the activity recognition performance of deep neural networks (DNNs), which have achieved great performance in various fields, and apply DNN-based adaptation techniques to improve the performance with only a small amount of subject-specific training data. We experimentally demonstrate that: 1) the use of multi-modal signals, including environmental sound and triaxial acceleration signals, with a DNN is effective for improving activity recognition performance, 2) the DNN can discriminate specified activities from a mixture of ambiguous activities, and 3) DNN-based adaptation methods are effective even if only a small amount of subject-specific training data is available.
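A simplified view of the multi-modal recognition and adaptation steps is sketched below; an sklearn MLP stands in for the paper's DNN, continued training with warm_start stands in for DNN-based adaptation, and the features and labels are toy assumptions.

```python
# Minimal sketch (assumptions: an sklearn MLP stands in for the paper's DNN,
# and continued training with warm_start stands in for DNN-based adaptation):
# fuse environmental-sound and triaxial-acceleration features by concatenation,
# train an activity classifier, then adapt it on a small subject-specific set.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
sound_feat = rng.standard_normal((1000, 40))          # e.g., log-mel statistics (illustrative)
accel_feat = rng.standard_normal((1000, 12))          # triaxial-acceleration statistics
X = np.hstack([sound_feat, accel_feat])               # multi-modal fusion by concatenation
y = np.arange(1000) % 6                               # 6 activity tags (toy labels)

model = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=100,
                      warm_start=True, random_state=0)
model.fit(X, y)                                       # subject-independent training

X_subj, y_subj = X[:60], y[:60]                       # small subject-specific set (all classes present)
model.fit(X_subj, y_subj)                             # adaptation: continue training on it
print("adapted-model accuracy on the subject set:", model.score(X_subj, y_subj))
```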
Mitsuhiro YOKOTA Yoshichika OHTA Teruya FUJII
The radio wave shadowing by a two-dimensional human body is examined numerically as a scattering problem by using the Method of Moments (MoM) in order to verify the equivalent human body diameter. Three human body models are examined: (1) a circular cylinder, (2) an elliptical cylinder, and (3) an elliptical cylinder with two circular cylinders. The scattered fields yielded by the circular cylinder are compared with measured data. Since the angle of the model to an incident wave affects the scattered fields for models other than the circular cylinder, the elliptical cylinder and the elliptical cylinder with two circular cylinders are converted into circular cylinders of equivalent diameter. The frequency characteristics of the models are calculated by using the equivalent diameter.
Md. Golam RASHED Ryota SUZUKI Takuya YONEZAWA Antony LAM Yoshinori KOBAYASHI Yoshinori KUNO
This paper introduces a method which uses LIDAR to identify humans and track their positions, body orientations, and movement trajectories in any public space in order to read their various behavioral responses to the surroundings. We use a network of LIDAR poles installed at the shoulder level of typical adults to reduce potential occlusion between persons and/or objects even in large-scale social environments. With this arrangement, a simple but effective human tracking method is proposed that works by combining data from multiple sensors so that large-scale areas can be covered. The effectiveness of this method is evaluated in an art gallery of a real museum. The results revealed good tracking performance and provided valuable behavioral information related to the art gallery.
Yuta OGUMA Takayuki NISHIO Koji YAMAMOTO Masahiro MORIKURA
A joint deployment of base stations (BSs) and RGB-depth (RGB-D) cameras for camera-assisted millimeter-wave (mmWave) access networks is discussed in this paper. For the deployment of a wide variety of devices in heterogeneous networks, it is crucial to consider the synergistic effects among the different types of nodes. A synergy between mmWave networks and cameras reduces the power consumption of mmWave BSs through sleep control. The purpose of this work is to optimize the number of nodes of each type so as to maximize the average achievable rate within a predefined total power budget. A stochastic deployment problem is formulated as a submodular optimization problem by assuming that the deployments of BSs and cameras form two independent Poisson point processes. An approximate algorithm is presented to solve the deployment problem, and it is proved that a (1-e^-1)/2-approximate solution can be obtained for the submodular optimization using a modified greedy algorithm. The numerical results reveal the deployment conditions under which the average achievable rate of the camera-assisted mmWave system is higher than that of a conventional system that does not employ RGB-D cameras.
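The flavor of a budgeted greedy deployment is sketched below: at each step, add the node type (BS or camera) with the largest marginal rate gain per unit power until the budget is exhausted. The concave "rate" surrogate and the per-node powers are toy assumptions; the paper's guarantee applies to its own modified greedy on the true submodular objective.

```python
# Minimal sketch of a budget-constrained greedy allocation over two node types.
# The avg_rate surrogate and POWER values are illustrative assumptions only.
import math

POWER = {"bs": 10.0, "camera": 2.0}          # assumed per-node power costs

def avg_rate(n_bs, n_cam):
    """Toy monotone surrogate with diminishing returns in both node types."""
    return math.log1p(n_bs) * (1.0 + 0.5 * math.log1p(n_cam))

def greedy_deploy(budget):
    n_bs = n_cam = 0
    spent = 0.0
    while True:
        best = None
        for kind in ("bs", "camera"):
            if spent + POWER[kind] > budget:
                continue
            gain = avg_rate(n_bs + (kind == "bs"), n_cam + (kind == "camera")) - avg_rate(n_bs, n_cam)
            ratio = gain / POWER[kind]       # marginal gain per unit power
            if best is None or ratio > best[0]:
                best = (ratio, kind)
        if best is None:
            return n_bs, n_cam
        spent += POWER[best[1]]
        n_bs += best[1] == "bs"
        n_cam += best[1] == "camera"

print("deployment (BSs, cameras) under a 60-unit power budget:", greedy_deploy(60.0))
```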
This paper proposes a method for human pose estimation in still images. The proposed method achieves occlusion-aware appearance modeling. Appearance modeling with less accurate appearance data is problematic because it adversely affects the entire training process. The proposed method mitigates the influence of occluded body parts in training sample images by evaluating their occlusion. In order to improve occlusion evaluation by a discriminatively-trained model, occlusion images are synthesized and employed with non-occlusion images for discriminative modeling. The score of this discriminative model is used for weighting each sample in the training process. Experimental results demonstrate that our approach improves the performance of human pose estimation compared with base models.
Goshiro YAMAMOTO Luiz SAMPAIO Takafumi TAKETOMI Christian SANDOR Hirokazu KATO Tomohiro KURODA
We present a novel method to enable users to experience mobile interaction with digital content on external displays by embedding markers imperceptibly on the screen. Our method consists of two parts: marker embedding on external displays and marker detection. To embed markers, similar to previous work, we display complementary colors in alternating frames, which are selected by considering the L*a*b* color space in order to make the markers harder for humans to detect. Our marker detection process does not require mobile devices to be synchronized with the display, although certain constraints on the relation between the camera and display update rates need to be fulfilled. In this paper, we have conducted three experiments. The results show that 1) selecting complementary colors in the a*b* color plane maximizes imperceptibility, 2) our method is extremely robust when used with static content and can handle animated content up to certain optical flow levels, and 3) our method works well in the case of small movements, but large movements can lead to loss of tracking.
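The embedding side can be sketched as follows: on alternating frames, the a* and b* channels are offset in complementary directions inside the marker region, so the temporal average stays near the original content. The offset magnitude and marker pattern here are illustrative assumptions, not the values used in the paper.

```python
# Minimal sketch (offset magnitude and marker pattern are assumptions): embed a
# marker by shifting a* and b* in complementary directions on alternating frames.
import numpy as np
from skimage import color

def embed_marker(rgb_frame, marker_mask, frame_index, delta=4.0):
    """rgb_frame: float image in [0, 1]; marker_mask: bool array marking marker pixels."""
    lab = color.rgb2lab(rgb_frame)
    sign = 1.0 if frame_index % 2 == 0 else -1.0            # complementary on alternating frames
    lab[..., 1] += sign * delta * marker_mask               # a* channel
    lab[..., 2] -= sign * delta * marker_mask               # b* channel (opposite direction)
    return np.clip(color.lab2rgb(lab), 0.0, 1.0)

frame = np.full((240, 320, 3), 0.5)                          # toy mid-gray display content
mask = np.zeros((240, 320), dtype=bool)
mask[100:140, 140:180] = True                                # a square marker region
even, odd = embed_marker(frame, mask, 0), embed_marker(frame, mask, 1)
print("max deviation of the two-frame average from the original:",
      np.abs((even + odd) / 2 - frame).max())
```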
Sumaru NIIDA Sho TSUGAWA Mutsumi SUGANUMA Naoki WAKAMIYA
The Technical Committee on Communication Behavior Engineering addresses the research question “How do we construct a communication network system that includes users?”. The growth in highly functional networks and terminals has brought about greater diversity in users' lifestyles and freed people from the restrictions of time and place. In this situation, the similarities of human behavior cause traffic aggregation and generate new problems in terms of the stabilization of network service quality. This paper summarizes previous studies relevant to communication behavior from a multidisciplinary perspective and discusses the research approach adopted by the Technical Committee on Communication Behavior Engineering.
Tao YU Yusuke KUKI Gento MATSUSHITA Daiki MAEHARA Seiichi SAMPEI Kei SAKAGUCHI
Artificial lighting is responsible for a large portion of total energy consumption and has great potential for energy saving. This paper designs an LED light control algorithm based on user localization using multiple battery-less binary human detection sensors. The proposed lighting control system focuses on reducing office lighting energy consumption while satisfying users' illumination requirements. Most current lighting control systems use infrared human detection sensors, but their poor detection probability, especially for a static user, makes it difficult to realize comfortable and effective lighting control. To improve the detection probability of each sensor, we propose to place sensors as close to each user as possible by using a battery-less wireless sensor network, in which all sensors can be placed freely in the space with high energy stability. We also propose a multi-sensor-based user localization algorithm to capture each user's position more accurately and realize fine lighting control that works even with static users. The system is implemented in an indoor office environment in a pilot project, and a verification experiment is conducted by measuring the practical illumination and power consumption. The performance agrees with design expectations: the proposed LED lighting control system reduces energy consumption significantly, by 57% compared to the batch control scheme, and satisfies the users' illumination requirements with 100% probability.
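A very rough sketch of the multi-sensor localization and dimming idea follows: the user position is estimated as the centroid of the binary sensors that fired, and fixtures near that position are driven to full brightness while the rest stay at a minimum level. The sensor layout, weighting, and dimming rule are assumptions for illustration.

```python
# Minimal sketch (sensor layout, centroid localization, and dimming rule are
# illustrative assumptions): locate a user from binary detections, then dim LEDs.
import numpy as np

sensor_xy = np.array([[x, y] for x in range(0, 10, 2) for y in range(0, 6, 2)], dtype=float)
led_xy = np.array([[1.0, 1.0], [5.0, 1.0], [9.0, 1.0], [1.0, 5.0], [5.0, 5.0], [9.0, 5.0]])

def localize(detections):
    """detections: bool array, one flag per sensor; returns an estimated user position."""
    fired = sensor_xy[detections]
    return fired.mean(axis=0) if len(fired) else None

def dimming_levels(user_pos, full_radius=2.0, min_level=0.2):
    """Full brightness near the user, minimum level elsewhere."""
    if user_pos is None:
        return np.full(len(led_xy), min_level)
    dist = np.linalg.norm(led_xy - user_pos, axis=1)
    return np.where(dist <= full_radius, 1.0, min_level)

detections = np.zeros(len(sensor_xy), dtype=bool)
detections[[7, 8]] = True                          # two sensors near the user fired
print("dimming levels:", dimming_levels(localize(detections)))
```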
Tsuyoshi HIGASHIGUCHI Toma SHIMOYAMA Norimichi UKITA Masayuki KANBARA Norihiro HAGITA
This paper proposes a method for evaluating a physical gait motion based on a 3D human skeleton measured by a depth sensor. While similar methods measure and evaluate the motion of only a part of interest (e.g., the knee), the proposed method comprehensively evaluates the motion of the full body. Gait motions with a variety of physical disabilities due to lesioned body parts are recorded and modeled in advance for gait anomaly detection. This detection is achieved by finding lesioned parts from a set of pose features extracted from gait sequences. In experiments, the proposed features extracted from the full body allowed us to identify where a subject was injured with 83.1% accuracy by using the model optimized for the individual. The superiority of the full-body features was validated in contrast to local features extracted from only a body part of interest (77.1% by lower-body features and 65% by upper-body features). Furthermore, the effectiveness of the proposed full-body features was also validated with a single universal model used for all subjects: 55.2%, 44.7%, and 35.5% by the full-body, lower-body, and upper-body features, respectively.
In this letter, a novel and highly efficient algorithm is proposed for haze removal from only a single input image. The proposed algorithm is built on the atmospheric scattering model. Firstly, the global atmospheric light is estimated and a coarse atmospheric veil is inferred based on the statistics of the dark channel prior. Secondly, the coarse atmospheric veil is refined by using a fast Tri-Gaussian filter based on a property of the human retina. To avoid halo artefacts, we then redefine the scene albedo. Finally, the haze-free image is derived by inverting the atmospheric scattering model. Results on several challenging foggy images demonstrate that the proposed method can not only improve the contrast and visibility of the restored image but also expedite the process.
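For orientation, the dark-channel-prior pipeline the letter builds on can be sketched as below: estimate the atmospheric light, infer a coarse veil from the dark channel, refine it, and invert the scattering model. A plain Gaussian blur stands in for the letter's fast Tri-Gaussian filter, and the scene-albedo redefinition is omitted; the window sizes and omega are standard assumptions rather than the paper's settings.

```python
# Minimal sketch of the dark-channel-prior pipeline (a Gaussian blur stands in
# for the Tri-Gaussian refinement; patch size, omega, and t0 are assumptions).
import numpy as np
from scipy.ndimage import minimum_filter, gaussian_filter

def dehaze(img, patch=15, omega=0.95, t0=0.1):
    """img: float RGB image in [0, 1]. Returns an estimated haze-free image."""
    dark = minimum_filter(img.min(axis=2), size=patch)                 # dark channel prior
    bright = dark.ravel().argsort()[-int(dark.size * 0.001):]          # brightest dark-channel pixels
    A = img.reshape(-1, 3)[bright].max(axis=0)                         # global atmospheric light
    veil = omega * minimum_filter((img / A).min(axis=2), size=patch)   # coarse atmospheric veil
    t = np.clip(1.0 - gaussian_filter(veil, sigma=8), t0, 1.0)         # refined transmission
    J = (img - A) / t[..., None] + A                                   # invert scattering model
    return np.clip(J, 0.0, 1.0)

hazy = np.clip(np.random.default_rng(0).random((120, 160, 3)) * 0.3 + 0.6, 0, 1)  # toy "hazy" input
out = dehaze(hazy)
print("dehazed range:", out.min(), out.max())
```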
Chanho JUNG Sanghyun JOO Do-Won NAM Wonjun KIM
In this paper, we aim to investigate the potential usefulness of machine learning in image quality assessment (IQA). Most previous studies have focused on designing effective image quality metrics (IQMs), and significant advances have been made in the development of IQMs over the last decade. Here, our goal is to improve prediction outcomes of “any” given image quality metric. We call this the “IQM's Outcome Improvement” problem, in order to distinguish the proposed approach from the existing IQA approaches. We propose a method that focuses on the underlying IQM and improves its prediction results by using machine learning techniques. Extensive experiments have been conducted on three different publicly available image databases. Particularly, through both 1) in-database and 2) cross-database validations, the generality and technological feasibility (in real-world applications) of our machine-learning-based algorithm have been evaluated. Our results demonstrate that the proposed framework improves prediction outcomes of various existing commonly used IQMs (e.g., MSE, PSNR, SSIM-based IQMs, etc.) in terms of not only prediction accuracy, but also prediction monotonicity.
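The "outcome improvement" idea can be sketched as learning a regression from an existing metric's raw scores to subjective quality, so that the corrected predictions track the subjective scores more accurately. The regressor choice, the toy PSNR-like scores, and the synthetic subjective data below are assumptions, not the paper's setup.

```python
# Minimal sketch of improving an existing IQM's outcomes with machine learning
# (regressor choice and toy data are assumptions): map raw metric scores to
# subjective scores and compare prediction accuracy (PLCC) before and after.
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
iqm_score = rng.uniform(10, 45, size=500)                      # e.g., PSNR values (toy)
mos = 1 / (1 + np.exp(-(iqm_score - 28) / 4)) * 4 + 1          # toy subjective scores (1-5 scale)
mos += rng.normal(scale=0.3, size=500)                         # rater noise

X, y = iqm_score.reshape(-1, 1), mos
train, test = np.arange(400), np.arange(400, 500)
model = GradientBoostingRegressor(random_state=0).fit(X[train], y[train])

print("PLCC of raw IQM scores:    ", pearsonr(iqm_score[test], y[test])[0])
print("PLCC after the learned map:", pearsonr(model.predict(X[test]), y[test])[0])
```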
In this paper, we propose a single-channel speech enhancement method for a push-to-talk enabled wireless communication device. The proposed method is based on adaptive weighted β-order spectral amplitude estimation under speech presence uncertainty and enhanced instantaneous phase estimation in order to achieve flexible and effective noise reduction while limiting the speech distortion due to different noise conditions. Experimental results confirm that the proposed method delivers higher voice quality and intelligibility than the reference methods in various noise environments.