To understand human emotion, it is necessary to be aware of the surrounding situation and of individual personalities. Most previous studies, however, have not considered these important aspects and have treated emotion recognition simply as a classification problem. In this paper, we attempt new approaches that utilize a person's situational information and personality for understanding emotion. We propose a method of extracting situational information and of building a personalized emotion model that reflects the personality of each character in a text. To extract and utilize situational information, we propose a situation model using lexical and syntactic information. In addition, to reflect the personality of an individual, we propose a personalized emotion model using a knowledge-based artificial neural network (KBANN). Our proposed system retains the advantages of a traditional keyword-spotting algorithm, and also reflects the fact that the strength of an emotion decreases over time. Experimental results show that the proposed system recognizes a person's emotion more accurately and intelligently than previous methods.
Kenshi SAHO Takuya SAKAMOTO Toru SATO Kenichi INOUE Takeshi FUKUDA
The classification of human motion is an important aspect of monitoring pedestrian traffic and is required in the development of advanced surveillance and monitoring systems. Methods to achieve this using micro-Doppler radars have been proposed. However, these conventional methods need reliable long-term data and/or complicated procedures to classify motion accurately, because their accuracy and real-time capabilities are otherwise inadequate. This paper proposes an accurate, real-time method for classifying the movements of pedestrians using ultra-wideband (UWB) Doppler radar to overcome these problems. Various movements are classified by extracting feature parameters based on UWB Doppler radar images and their radial velocity distributions. Experiments were carried out assuming six types of pedestrian movement (pedestrians swinging both arms, swinging only one arm, swinging no arms, on crutches, pushing wheelchairs, and seated in wheelchairs). We found they could be classified using the proposed feature parameters and a k-nearest neighbor algorithm. A classification accuracy of 96% was achieved with a mean calculation time of 0.55 s. Moreover, the classification accuracy was 99% when our method classified three groups of pedestrian movements (normal walkers, those on crutches, and those in wheelchairs).
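The final classification step described above, a k-nearest neighbor vote over extracted feature parameters, can be sketched in a few lines. This is a minimal pure-Python illustration; the feature values and class names below are hypothetical, not the paper's actual radar-image parameters:

```python
import math
from collections import Counter

def knn_classify(train, labels, query, k=3):
    """Classify a feature vector by majority vote among its k nearest
    training samples (Euclidean distance)."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train, labels))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors (e.g., image extent vs. velocity spread)
train = [(0.2, 1.1), (0.3, 1.0), (0.8, 0.2), (0.9, 0.3), (0.1, 1.2), (0.85, 0.25)]
labels = ["walking", "walking", "wheelchair", "wheelchair", "walking", "wheelchair"]

print(knn_classify(train, labels, (0.25, 1.05), k=3))  # → walking
```

In practice the feature vectors would be the parameters extracted from the UWB Doppler radar images, and k would be tuned on held-out data.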
Linfeng XU Liaoyuan ZENG Zhengning WANG
In this letter, we use the saliency maps obtained by several bottom-up methods to learn a model that generates a bottom-up saliency map. To take top-down image semantics into account, we use the high-level features of objectness and background probability to learn a top-down saliency map. The bottom-up and top-down maps are combined through a two-layer structure. Quantitative experiments demonstrate that the proposed method and features are effective in predicting human fixations.
Hongbo ZHANG Shaozi LI Songzhi SU Shu-Yuan CHEN
Many successful methods for recognizing human actions are based on spatio-temporal interest points (STIPs). Given a test video sequence, a matching-based method with a voting mechanism has each test STIP cast a vote for each action class based on its mutual information with respect to that class, measured in terms of class likelihood probability. Two issues should therefore be addressed to improve the accuracy of action recognition. First, effective STIPs in the training set must be selected as references for accurately estimating probability. Second, discriminative STIPs in the test set must be selected for voting. This work uses ε-nearest neighbors as effective STIPs for estimating the class probability and a variance filter for selecting discriminative STIPs. Experimental results verify that the proposed method is more accurate than existing action recognition methods.
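The voting scheme described above can be illustrated roughly as follows: each test STIP votes for classes in proportion to the class frequencies among its ε-neighbors in the training set, and STIPs whose neighborhood class distribution is too flat (low variance, hence not discriminative) are discarded. This is a simplified sketch with assumed toy descriptors, not the paper's exact estimator:

```python
import math
from collections import defaultdict

def eps_neighbors(x, refs, eps):
    """Training STIPs (descriptor, class) within distance eps of x."""
    return [(d, c) for d, c in refs if math.dist(d, x) <= eps]

def classify_video(test_stips, refs, eps=1.0, var_thresh=0.01):
    """Accumulate per-class votes from each test STIP's eps-neighborhood."""
    votes = defaultdict(float)
    for x in test_stips:
        neigh = eps_neighbors(x, refs, eps)
        if not neigh:
            continue  # no effective reference STIPs for this descriptor
        counts = defaultdict(int)
        for _, c in neigh:
            counts[c] += 1
        probs = [n / len(neigh) for n in counts.values()]
        mean = sum(probs) / len(probs)
        var = sum((p - mean) ** 2 for p in probs) / len(probs)
        if len(probs) > 1 and var < var_thresh:
            continue  # variance filter: this STIP is not discriminative
        for c, n in counts.items():
            votes[c] += n / len(neigh)
    return max(votes, key=votes.get)

refs = [((0.0,), "wave"), ((0.1,), "wave"), ((1.0,), "run"), ((1.1,), "run")]
print(classify_video([(0.05,), (0.0,), (1.05,)], refs, eps=0.3))  # → wave
```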
Keisuke DOHI Kazuhiro NEGI Yuichiro SHIBATA Kiyoshi OGURI
We present an external-memory-free, deeply pipelined FPGA implementation of human detection, including HOG feature extraction and AdaBoost classification. To fit the design onto a compact FPGA, we introduce some algorithmic simplifications and make aggressive use of stream-oriented architectures. We compare our simplified fixed-point scheme with the original floating-point scheme in terms of quality of results; the comparison suggests that the negative impact of the simplified scheme on the hardware implementation is limited. We empirically show that our system can detect humans in 640×480 VGA images at up to 112 FPS on a Xilinx Virtex-5 XC5VLX50 FPGA.
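The abstract does not spell out the specific simplifications used. A common hardware-friendly simplification of this kind is replacing the Euclidean gradient magnitude in HOG with the L1 norm, which removes the multipliers and square root from the pipeline; the sketch below (an assumption for illustration, not necessarily the paper's change) bounds the error this particular approximation introduces:

```python
import math

def grad_mag_float(gx, gy):
    """Reference floating-point gradient magnitude."""
    return math.hypot(gx, gy)

def grad_mag_fixed(gx, gy):
    """L1 approximation: |gx| + |gy| needs no multiplier or square root."""
    return abs(gx) + abs(gy)

# The L1 norm overestimates the magnitude by at most a factor of sqrt(2),
# reached when |gx| == |gy|.
worst = max(grad_mag_fixed(gx, gy) / grad_mag_float(gx, gy)
            for gx in range(1, 16) for gy in range(1, 16))
print(round(worst, 3))  # → 1.414
```

Because HOG bins are normalized afterwards, a bounded multiplicative error of this kind tends to have limited effect on detection quality, consistent with the comparison reported above.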
A specification for digital cinema systems, which handle movies digitally from production and delivery through to projection on screen, has been recommended by DCI (Digital Cinema Initiatives), and systems based on this specification have already been developed and installed in theaters. The system parameters that play an important role in determining image quality include image resolution, quantization bit depth, color space, gamma characteristics, and data compression method. This paper comparatively discusses the relation between required bit depth and gamma quantization using a human visual system model for grayscale images and two color-difference models for color images. The required bit depth obtained from a contrast sensitivity function for grayscale images monotonically decreases as the gamma value increases, whereas it has a minimum when the gamma is 2.9 to 3.0 according to both the CIE 1976 L*a*b* and CIEDE2000 color-difference models. It is also shown that the bit depth derived from the contrast sensitivity function is one bit greater than that derived from the color-difference models at a gamma value of 2.6. Moreover, a comparison between the color differences computed with CIE 1976 L*a*b* and CIEDE2000 leads to the same result from the viewpoint of the required bit depth for digital cinema systems.
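The dependence of required bit depth on gamma for grayscale images can be illustrated with a toy Weber-fraction criterion: find the smallest bit depth such that the relative luminance step between adjacent gamma-encoded code values stays below a visibility threshold across the displayed luminance range. The threshold (0.5%) and black level (0.001 of peak luminance) below are illustrative assumptions, not the paper's contrast sensitivity data:

```python
import math

def required_bits(gamma, weber=0.005, l_min=1e-3):
    """Smallest bit depth for which the relative luminance step between
    adjacent gamma-encoded code values stays below `weber` for all
    luminances above l_min (peak luminance normalized to 1)."""
    for bits in range(8, 17):
        n = 2 ** bits - 1
        # lowest code value whose luminance (k/n)**gamma is still >= l_min
        k_min = max(1, math.floor(n * l_min ** (1.0 / gamma)))
        step = (k_min + 1) ** gamma - k_min ** gamma
        if step / (k_min ** gamma) <= weber:
            return bits
    return None

for g in (2.2, 2.6, 3.0):
    print(g, required_bits(g))
```

Under these assumptions the required bit depth decreases as gamma grows, matching the monotonic trend the contrast-sensitivity analysis reports for grayscale images.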
Sumaru NIIDA Satoshi UEMURA Etsuko T. HARADA
As mobile multimedia services expand, user behavior will become more diverse, and controlling service quality from the user's perspective will become more important in service design. The quality of the network is one of the critical factors determining mobile service quality. However, it has mainly been evaluated in objective physical terms, such as delay reduction and bandwidth expansion; it is less common to take a human-centered design viewpoint when improving network performance. In this paper, we discuss ways to improve the quality of web services using time-fillers that actively address human factors so as to improve the subjective quality of a mobile network. A field experiment was conducted using a prototype system. The results show that time-fillers can significantly decrease user dissatisfaction with waiting, but that this effect is strongly influenced by user preferences concerning content. Based on these results, we discuss the design requirements for effective use of time-fillers.
Akihiro MAEHIGASHI Kazuhisa MIWA Hitoshi TERAI Kazuaki KOJIMA Junya MORITA
This study investigated the relationship between human use of automation and sensitivity to changes in automation and manual performance. In the real world, automation and manual performance change dynamically with the environment. However, few studies have investigated whether changes in automation performance or changes in manual performance have a greater effect on whether users choose to use automation. We used two types of experimental tracking task in which participants had to choose between using automation and operating manually while monitoring the variable performance of both. We found a mutual relationship between human use of automation and sensitivity to changes in automation and manual performance. Moreover, although users use automation adequately, they do not react equally to automation and manual performance changes.
Toshihiko YAMASAKI Tomoaki MATSUNAMI Tuhan CHEN
This paper presents a technique for analyzing pedestrians' attributes, such as gender and bag-possession status, from surveillance video. One technically challenging issue is that we use only top-view camera images in order to protect privacy. Shape features over the frames are extracted by bag-of-features (BoF) using histogram-of-oriented-gradients (HOG) vectors. To enhance classification accuracy, a two-stage classification framework is presented: multiple classifiers are trained with different parameters in the first stage, and their outputs are further trained and classified by the second-stage classifier. Experiments using a 60-minute video captured at Haneda Airport, Japan, show that the accuracies of gender classification and bag-possession classification were 95.8% and 97.2%, respectively, a significant improvement over our previous work.
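The two-stage framework described above can be sketched as a simple stacking scheme: several first-stage classifiers are trained with different parameters, and the second stage learns how to combine their outputs. In this minimal sketch the second stage simply weights each first-stage classifier by its training accuracy; the stub learners and data are hypothetical stand-ins for the paper's trained classifiers:

```python
def make_stub(param):
    """Hypothetical first-stage learner: thresholds a single feature."""
    return lambda x: 1 if x[0] > param else 0

def two_stage(train_x, train_y, params):
    # Stage 1: train several classifiers with different parameters.
    stage1 = [make_stub(p) for p in params]
    # Stage 2: learn a combiner from stage-1 outputs on the training set
    # (here, weight each classifier by its training accuracy).
    weights = [sum(clf(x) == y for x, y in zip(train_x, train_y)) / len(train_y)
               for clf in stage1]
    def predict(x):
        score = sum(w * clf(x) for w, clf in zip(weights, stage1))
        return 1 if score > sum(weights) / 2 else 0
    return predict

train_x = [(0.2,), (0.4,), (0.6,), (0.8,)]
train_y = [0, 0, 1, 1]  # e.g., 0 = no bag, 1 = bag
predict = two_stage(train_x, train_y, params=[0.3, 0.5, 0.7])
print(predict((0.6,)), predict((0.2,)))  # → 1 0
```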
Akisato KIMURA Ryo YONETANI Takatsugu HIRAYAMA
We humans can easily and instantaneously detect the regions in a visual scene that are most likely to contain something of interest. Exploiting this pre-selection mechanism, called visual attention, in image and video processing systems would make them more sophisticated and therefore more useful. This paper briefly describes various computational models of human visual attention and their development, as well as related psychophysical findings. In particular, our objective is to carefully distinguish several types of studies related to human visual attention and saliency as a measure of attentiveness, and to provide a taxonomy from several viewpoints, such as the main objective, the use of additional cues, and the underlying mathematical principles. The survey concludes by discussing possible future directions for research into human visual attention and saliency computation.
Luong Pham VAN Hoyoung LEE Jaehwan KIM Byeungwoo JEON
Blocking artifacts are introduced by many block-based coding systems, and their reduction can significantly improve the subjective quality of compressed video. H.264/AVC uses an in-loop deblocking filter to remove blocking artifacts. The filter considers several coding conditions, such as the coded block pattern (CBP), motion vectors, and macroblock type, in its adaptive filtering of inter-predicted blocks, but considers little for intra-coded blocks. In this paper, we utilize human visual system (HVS) characteristics and the local characteristics of image blocks to modify the boundary strength (BS) of the intra deblocking filter, improving subjective quality and reducing the complexity of filtering intra-coded slices. In addition, we propose a low-complexity deblocking method that utilizes the correlation between the vertical and horizontal boundaries of a block in inter-coded slices. Experimental results show that our proposed method achieves not only a significant gain in subjective quality but also some PSNR gain, and reduces the computational complexity of the deblocking filter by 36.23% on average.
Xue ZHANG Anhong WANG Bing ZENG Lei LIU Zhuo LIU
Numerous examples in image processing have demonstrated that human visual perception can be exploited to improve processing performance. This paper presents another showcase, in which visual information is employed to guide adaptive block-wise compressive sensing (ABCS) of image data, i.e., a varying CS sampling rate is applied to different blocks according to the visual content of each block. To this end, we propose a visual analysis based on the discrete cosine transform (DCT) coefficients of each block reconstructed at the decoder side. The analysis result is sent back to the CS encoder stage-by-stage via a feedback channel, so that we can decide which blocks should be further CS-sampled and what the extra sampling rate should be. In this way, we can perform multiple passes of reconstruction to improve quality progressively. Simulation results show that our scheme yields a significant improvement over existing schemes with a fixed sampling rate.
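The feedback-driven rate allocation can be sketched as follows: the decoder scores each reconstructed block with a visual-activity measure (here, pixel variance, which for an orthonormal DCT equals the block's total AC energy), and the encoder distributes an extra sampling budget in proportion to those scores. A minimal sketch under assumed parameters, not the paper's actual analysis:

```python
def block_activity(block):
    """Rough visual-activity proxy: AC energy (pixel variance) of a block.
    (The paper analyzes DCT coefficients; for an orthonormal DCT the
    pixel variance equals the mean AC-coefficient energy.)"""
    n = len(block)
    mean = sum(block) / n
    return sum((p - mean) ** 2 for p in block) / n

def extra_rate_map(blocks, base_rate=0.1, extra_budget=0.2):
    """Distribute an extra sampling budget over blocks in proportion to
    their activity (this map would be fed back to the CS encoder)."""
    acts = [block_activity(b) for b in blocks]
    total = sum(acts) or 1.0  # avoid dividing by zero for flat images
    return [base_rate + extra_budget * a / total for a in acts]

flat = [128] * 16           # smooth block: no extra samples needed
edge = [0] * 8 + [255] * 8  # high-activity block: gets the extra budget
rates = extra_rate_map([flat, edge])
print(rates)
```

Repeating this score-and-allocate loop over several feedback stages gives the progressive, multi-pass reconstruction described above.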
Kenshi SAHO Takuya SAKAMOTO Toru SATO Kenichi INOUE Takeshi FUKUDA
The imaging of humans using radar is promising for surveillance systems. Although conventional radar systems detect the presence or position of intruders, it is difficult to acquire shape and motion details because the resolution is insufficient. This paper presents a high-resolution human imaging algorithm for an ultra-wideband (UWB) Doppler radar. The proposed algorithm estimates three-dimensional human images using interferometry and, using velocity information, rejects false images created by the interference of body parts. Experiments verify that our proposed algorithm achieves adequate pedestrian imaging. In addition, accurate shape and motion parameters are extracted from the estimated images.
Shogo MORI Gosuke OHASHI Yoshifumi SHIMODAIRA
This study examines the robustness of image quality factors under various types of environmental illumination, using a parameter design from the field of quality engineering. Experimental results revealed that image quality factors are influenced by environmental illumination in the following order: minimum luminance, maximum luminance, and gamma.
Bobo ZENG Guijin WANG Xinggang LIN Chunxiao LIU
This work presents a real-time human detection system for VGA (Video Graphics Array, 640×480) video that is well suited to visual surveillance applications. To achieve high running speed and accuracy, we first design multiple fast scalar feature types on the gradient channels and experimentally identify that the NOGCF (normalized oriented gradient channel feature) performs best with Gentle AdaBoost in cascaded classifiers. A confidence measure for cascaded classifiers is developed and utilized in the subsequent tracking stage. Second, we employ speedup techniques, including a detector pyramid for multi-scale detection and channel compression for integral channel calculation. Third, by integrating the detector's discrete detections and its continuous detection confidence map, we employ a two-layer tracking-by-detection algorithm for further speedup and improved accuracy. Experiments show that, compared with other methods, the system is significantly faster, running at 20 fps on VGA video, and is also more accurate.
Xin LIAO Qiaoyan WEN Jie ZHANG
In this letter, a novel steganographic method based on four-pixel differencing and the exploiting-modification-direction (EMD) technique is proposed. Secret data are embedded into each four-pixel block by adaptively applying the EMD technique. The difference value of the four-pixel block is used to judge whether the block lies in an edge area, where pixels can tolerate larger changes than in smooth areas. A readjustment step guarantees that the secret data can be extracted exactly and that the embedding distortion is minimized. Since the proposed method processes non-overlapping 2×2 pixel blocks instead of two consecutive pixels, edge features can be exploited more fully. Experimental results show that, compared with the previous method, the proposed method provides better performance, i.e., larger embedding capacity and better image quality.
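The edge/smooth decision can be sketched as follows: a difference value is computed for each 2×2 block and compared with a threshold, and blocks with a large difference are treated as edge blocks that can carry more hidden data. The difference measure and threshold here are illustrative assumptions, not necessarily the paper's definitions:

```python
def block_difference(block):
    """Difference value of a 2x2 block: average deviation of the pixels
    from the block minimum (one plausible edge measure)."""
    m = min(block)
    return sum(p - m for p in block) / (len(block) - 1)

def embedding_level(block, threshold=15):
    """Edge blocks (large difference) tolerate larger changes, so they
    carry more secret digits per pixel than smooth blocks."""
    if block_difference(block) > threshold:
        return "edge: high capacity"
    return "smooth: low capacity"

print(embedding_level([120, 122, 119, 121]))  # → smooth: low capacity
print(embedding_level([40, 200, 45, 190]))    # → edge: high capacity
```

In the full scheme, the chosen capacity level selects the EMD parameters for that block, and the readjustment step keeps the modified difference value on the same side of the threshold so the extractor makes the same decision.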
Jegoon RYU Sei-ichiro KAMATA Alireza AHRARY
In this paper, we propose a novel gait recognition framework, the Spherical Space Model with Human Point Clouds (SSM-HPC), to recognize the front view of human gait. A new gait representation, Marching-in-Place (MIP) gait, is also introduced, which preserves the spatiotemporal characteristics of an individual's gait. In contrast to previous studies on gait recognition, which usually use human silhouette images from image sequences, this research applies three-dimensional (3D) point-cloud data of the human body obtained from a stereo camera. The proposed framework exhibits gait recognition rates superior to those of other gait recognition methods.
Kazuki MATSUDA Norimichi UKITA
This paper proposes a method for reconstructing a smooth and accurate 3D surface. Recent machine vision techniques can reconstruct accurate 3D points and normals of an object, and the reconstructed point cloud is then used to generate the object's 3D surface by surface reconstruction. The more accurate the point cloud, the more correct the surface becomes. To improve the surface, we propose a way to integrate the advantages of existing point-reconstruction techniques. Specifically, robust and dense reconstruction by shape-from-silhouettes (SfS) is integrated with accurate stereo reconstruction. Unlike the gradual shape shrinking of space carving, our method obtains 3D points by SfS and stereo independently and accepts only the correctly reconstructed points. Experimental results show the improvement achieved by our method.
Masaki WAKI Shigenori URUNO Hiroyuki OHASHI Tetsuya MANABE Yuji AZUMA
We propose an optical fiber connection navigation system that uses visible light communication for an integrated distribution module in a central office. The system maintains an accurate database, requires less-skilled labor to operate, and eliminates human error. It reduces the working time for connecting/removing optical fiber cords by up to 88.0% compared with conventional work, while eliminating human error, and is economical as regards installation and operation.
Recent advances in 3-D technologies have drawn interest to the just noticeable difference in depth (JNDD), a perceptual threshold on depth differences. In this letter, we address a new application of the JNDD to depth image enhancement. In the proposed algorithm, a depth image is first segmented into multiple layers, and the depth range of a layer is expanded if the depth difference between adjacent layers is smaller than the JNDD. Viewers can then effectively perceive the depth differences between layers, and human depth perception is thus improved. The proposed algorithm can be applied to any depth-based 3-D display application.
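The layer-expansion idea can be sketched as follows: given the representative depths of the segmented layers in ascending order, any adjacent gap smaller than the JNDD is widened to exactly the JNDD. This is a simplified sketch; the algorithm above expands layer depth ranges rather than single representative depths, and the JNDD value here is an arbitrary assumption:

```python
def expand_layers(layer_depths, jndd=5.0):
    """Shift depth layers apart so that every adjacent-layer gap is at
    least the JNDD (input: sorted representative depths per layer)."""
    out = [layer_depths[0]]
    for d in layer_depths[1:]:
        gap = d - out[-1]            # gap after shifting earlier layers
        out.append(out[-1] + max(gap, jndd))
    return out

# Layers at 10 and 12 are closer than the JNDD, so they are pushed apart.
print(expand_layers([10.0, 12.0, 30.0, 33.0]))  # → [10.0, 15.0, 30.0, 35.0]
```

Gaps already larger than the JNDD are left unchanged, so the overall depth ordering and the perceptible layout of the scene are preserved.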