Akira KUBOTA Kazuya KODAMA Asami ITO
The pupil function of the aperture in an image capturing system is theoretically derived such that one can perfectly reconstruct an all-in-focus image through linear filtering of the focal stack. Perfect reconstruction filters are then designed based on the derived pupil function. The designed filters are space-invariant; hence the presented method does not require region segmentation. Simulation results using synthetic scenes show the effectiveness of the derived pupil function and the filters.
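As a minimal sketch of the reconstruction step: each slice of the focal stack is filtered with a space-invariant 2D kernel and the results are summed. The kernels below are placeholders for the perfect reconstruction filters derived from the pupil function in the paper.

```python
import numpy as np
from scipy.signal import fftconvolve

def reconstruct_all_in_focus(focal_stack, filters):
    """Linearly filter each focal-stack slice and sum the results.

    focal_stack : list of 2D arrays (one image per focus setting)
    filters     : list of 2D space-invariant kernels, one per slice
                  (placeholders for the perfect reconstruction filters
                  derived in the paper)
    """
    out = np.zeros_like(focal_stack[0], dtype=np.float64)
    for slice_k, h_k in zip(focal_stack, filters):
        out += fftconvolve(slice_k, h_k, mode="same")
    return out
```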
Hanxing XUE Jiali YOU Jinlin WANG
Smart routers have developed rapidly in recent years as one of the representative products of the IoT and the smart home. Unlike traditional routers, they have storage and processing capacity. Moreover, smart routers in the same location or ISP enjoy better link conditions and can provide high-quality service to each other. Therefore, for content-oriented services, how to construct the overlay network and how to efficiently deploy replications of popular content in a smart-router network are critical problems. The performance of existing centralized models is limited by the bottleneck of a single point. To improve the stability and scalability of the system through the capability of smart routers, we propose a novel intelligent and decentralized content diffusion system for smart-router networks. In the system, content is quickly and autonomously diffused through the network until a specified coverage-rate requirement among neighbors is satisfied. Furthermore, we design a heuristic node selection algorithm (MIG) and a replacement algorithm (MCL) to assist the diffusion of content. Specifically, with MIG the system selects the neighbor with the maximum information gain to cache a replication; with MCL it replaces the replication whose removal causes the least loss of coverage-rate gain. In simulation experiments, under the same coverage-rate requirement, MIG reduces the number of replications by at least 20.2% compared with other algorithms. Compared with other replacement algorithms, MCL achieves the best successful service rate, i.e., the ratio of services that can be provided by neighbors. The system based on MIG and MCL provides stable service with the lowest bandwidth and storage cost.
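The abstract does not give the exact information-gain measure, so the following sketch only illustrates the greedy selection pattern of MIG; `information_gain` is a hypothetical caller-supplied function.

```python
def select_neighbor_mig(neighbors, cached, information_gain):
    """Greedy MIG-style selection: among neighbors that do not yet hold
    a replication, return the one with the maximum information gain.
    `information_gain` is hypothetical; the paper defines its own measure."""
    candidates = [n for n in neighbors if n not in cached]
    if not candidates:
        return None
    return max(candidates, key=information_gain)
```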
Minghao TANG Yuan ZONG Wenming ZHENG Jisheng DAI Jingang SHI Peng SONG
A micro-expression is a special type of facial expression that usually occurs when people try to hide their true emotions. Recognizing micro-expressions therefore has potential value in many applications, e.g., lie detection. In this letter, we focus on this meaningful topic and investigate how to take full advantage of the color information provided by micro-expression samples to deal with the micro-expression recognition (MER) problem. To this end, we propose a novel method called the color space fusion learning (CSFL) model, which fuses the spatiotemporal features extracted in different color spaces such that the fused spatiotemporal features describe micro-expressions better. To verify the effectiveness of the proposed CSFL method, extensive MER experiments are conducted on the widely used spatiotemporal micro-expression database SMIC. The experimental results show that CSFL can significantly improve the performance of spatiotemporal features in coping with MER tasks.
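A heavily simplified sketch of the color-space fusion idea; the extractor, color spaces, and fixed weighting are assumptions (CSFL learns the fusion rather than taking supplied weights):

```python
import cv2
import numpy as np

def csfl_style_features(clip_bgr, extract, weights):
    """Extract the same spatiotemporal descriptor in several color spaces
    and fuse the resulting vectors. `extract` stands in for a descriptor
    such as LBP-TOP; the color-space set is an assumption."""
    conversions = [cv2.COLOR_BGR2RGB, cv2.COLOR_BGR2YCrCb, cv2.COLOR_BGR2Lab]
    fused = []
    for code, w in zip(conversions, weights):
        frames = [cv2.cvtColor(f, code) for f in clip_bgr]
        fused.append(w * extract(frames))
    return np.concatenate(fused)
```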
Zheng FANG Tieyong CAO Jibin YANG Meng SUN
Salient region detection is a fundamental problem in computer vision and image processing. Deep learning models perform better than traditional approaches but suffer from huge parameter counts and slow speed. To handle these problems, in this paper we propose the multi-feature fusion network (MFFN), an efficient salient region detection architecture based on a Convolutional Neural Network (CNN). A novel feature extraction structure is designed to obtain feature maps from the CNN. A fusion dense block is used to fuse all low-level and high-level feature maps to derive salient region results. MFFN is an end-to-end architecture that does not need any post-processing procedures. Experiments on benchmark datasets demonstrate that MFFN achieves state-of-the-art performance on salient region detection while requiring far fewer parameters and much less computation time. Ablation experiments demonstrate the effectiveness of each module in MFFN.
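A sketch of the multi-level fusion idea, with PyTorch standing in for the paper's implementation and all channel sizes assumed: low- and high-level maps are upsampled to a common size, concatenated, and mapped to a one-channel saliency prediction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionBlock(nn.Module):
    """Fuse low- and high-level CNN feature maps: upsample every map to
    the largest spatial size, concatenate along channels, and predict a
    one-channel saliency map. Channel widths are assumptions."""
    def __init__(self, in_channels=(64, 128, 256)):
        super().__init__()
        self.fuse = nn.Conv2d(sum(in_channels), 64, kernel_size=3, padding=1)
        self.predict = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, feature_maps):
        size = feature_maps[0].shape[2:]  # shallowest (largest) map
        ups = [F.interpolate(f, size=size, mode="bilinear",
                             align_corners=False) for f in feature_maps]
        x = F.relu(self.fuse(torch.cat(ups, dim=1)))
        return torch.sigmoid(self.predict(x))
```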
Zheng FANG Tieyong CAO Jibin YANG Meng SUN
Saliency detection is widely used in many vision tasks such as image retrieval, compression and person re-identification. Deep-learning methods have achieved great results, but most of them focus on performance while ignoring model efficiency, which makes them hard to transplant into other applications. How to design an efficient model has therefore become the main problem. In this letter, we propose the parallel feature network, a saliency model built on a convolutional neural network (CNN) in a parallel manner. Parallel dilation blocks are first used to extract features from different layers of the CNN; then a parallel upsampling structure is adopted to upsample the feature maps. Finally, saliency maps are obtained by fusing summations and concatenations of the feature maps. Our final model, built on VGG-16, is much smaller and faster than existing saliency models and also achieves state-of-the-art performance.
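A sketch of a parallel dilation block under assumed rates and widths: the same input is convolved with several dilation rates in parallel and the branch outputs are concatenated, enlarging the receptive field without deepening the network.

```python
import torch
import torch.nn as nn

class ParallelDilationBlock(nn.Module):
    """Parallel 3x3 convolutions with different dilation rates over the
    same input; padding equals the dilation so all branches keep the
    input's spatial size. Rates and branch width are assumptions."""
    def __init__(self, in_ch, branch_ch=32, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, kernel_size=3,
                      padding=r, dilation=r) for r in rates])

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)
```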
Yuma KINOSHITA Sayaka SHIOTA Hitoshi KIYA
This paper proposes a novel pseudo multi-exposure image fusion method based on a single image. Multi-exposure image fusion is used to produce images without saturated regions by using photos taken with different exposures. However, it is difficult to take photos suited to multi-exposure image fusion when shooting dynamic scenes or recording video. In addition, multi-exposure image fusion cannot be applied to existing single-exposure images or videos. The proposed method enables us to produce pseudo multi-exposure images from a single image. To produce the multi-exposure images, the proposed method utilizes the relationship between exposure values and pixel values, which is obtained by assuming that the digital camera has a linear response function. Moreover, it is shown that the use of a local contrast enhancement method allows us to produce pseudo multi-exposure images with higher quality. Most conventional multi-exposure image fusion methods are also applicable to the proposed multi-exposure images. Experimental results show the effectiveness of the proposed method by comparing it with conventional ones.
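Under the stated linear-response assumption, an exposure shift of ΔEV stops scales linear pixel values by 2^ΔEV, which already yields a minimal sketch of pseudo multi-exposure generation (the paper's local contrast enhancement step is omitted):

```python
import numpy as np

def pseudo_exposures(img, evs=(-2.0, 0.0, 2.0)):
    """Generate pseudo multi-exposure images from a single image,
    assuming a linear camera response: shifting exposure by `ev` stops
    multiplies linear pixel values by 2**ev. `img` is float in [0, 1]."""
    return [np.clip(img * (2.0 ** ev), 0.0, 1.0) for ev in evs]
```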
Jingjie YAN Guanming LU Xiaodong BAI Haibo LI Ning SUN Ruiyu LIANG
In this letter, we propose a supervised bimodal emotion recognition approach based on two important human emotion modalities: facial expression and body gesture. An effective supervised feature fusion algorithm named supervised multiset canonical correlation analysis (SMCCA) is presented to establish the linear connection between three sets of matrices, which contain the feature matrices of the two modalities and their concurrent category matrix. Test results on bimodal emotion recognition with the FABO database show that the SMCCA algorithm achieves better or comparable performance relative to unsupervised feature fusion algorithms including canonical correlation analysis (CCA), sparse canonical correlation analysis (SCCA), and multiset canonical correlation analysis (MCCA).
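For reference, one common formulation of the MCCA objective that SMCCA extends, over m = 3 sets (the two modality feature matrices and the category matrix); the supervised coupling specific to SMCCA is not reproduced here.

```latex
% One common MCCA formulation over m sets with cross-covariances C_{ij}:
% maximize the sum of pairwise correlations of the projected data.
\max_{w_1,\dots,w_m} \sum_{i \neq j} w_i^{\top} C_{ij} w_j
\quad \text{s.t.} \quad w_i^{\top} C_{ii} w_i = 1,\ i = 1,\dots,m
```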
A fusion framework between a CNN and an RNN is proposed specifically for air-writing recognition. By modeling air-writing with both spatial and temporal features, the proposed network can learn more information than existing techniques. The performance of the proposed network is evaluated using the alphabet and numeric datasets in the public 6DMG database. The average accuracy of the proposed fusion network outperforms other techniques: 99.25% and 99.83% are observed for alphabet gestures and numeric gestures, respectively. A simplified RNN structure is also proposed, which attains roughly a two-fold speed-up over an ordinary BLSTM network. It is also confirmed that the distance between consecutive sampling points alone is enough to attain high recognition performance.
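The last finding corresponds to a one-dimensional per-sample feature; a minimal sketch of computing it from a 6DMG-style position trace (the array layout is an assumption):

```python
import numpy as np

def consecutive_distances(points):
    """points: (T, 3) array of 3D positions sampled along a gesture.
    Returns the (T-1,) sequence of distances between consecutive
    samples, the single feature reported as sufficient above."""
    return np.linalg.norm(np.diff(points, axis=0), axis=1)
```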
Wei LI Yi WU Chunlin SHEN Huajun GONG
We present a system that improves the robustness of real-time 3D surface reconstruction by utilizing a non-inertial localization sensor. Benefiting from such a sensor, our easy-to-build system can effectively avoid tracking drift and tracking loss compared with conventional dense tracking and mapping systems. To best fuse the sensor data, we first perform a hand-eye calibration and performance analysis for our setup, and we then propose a novel optimization framework based on an adaptive criterion function to improve robustness as well as accuracy. We apply our system to several challenging reconstruction tasks, which show significant improvement in scanning robustness and reconstruction quality.
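For context, hand-eye calibration is classically posed as recovering the fixed rigid transform X between the two devices from paired relative motions; a standard formulation (not specific to this paper) is:

```latex
% A_i: relative motions of the camera; B_i: relative motions of the
% localization sensor (homogeneous transforms); X: the unknown fixed
% camera-to-sensor transform.
A_i X = X B_i, \quad i = 1,\dots,n
```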
Ruicong ZHI Ghada ZAMZMI Dmitry GOLDGOF Terri ASHMEADE Tingting LI Yu SUN
The accurate assessment of infants' pain is important for understanding their medical conditions and developing suitable treatment. Pediatric studies have reported that inadequate treatment of infants' pain may cause various neuroanatomical and psychological problems. The fact that infants cannot communicate verbally motivates increasing interest in developing automatic pain assessment systems that provide continuous and accurate pain assessment. In this paper, we propose a new set of pain facial activity features to describe infants' facial expressions of pain. Both dynamic facial texture features and dynamic geometric features are extracted from video sequences and utilized to classify the facial expression of each infant as pain or no pain. For the dynamic analysis of facial expression, we construct a spatiotemporal domain representation for the texture features and a time series representation (i.e., time series of frame-level features) for the geometric features. Multiple facial features are combined through both feature fusion and decision fusion schemes to evaluate their effectiveness in infants' pain assessment. Experiments are conducted on video acquired from NICU infants, and the best accuracy of the proposed pain assessment approaches is 95.6%. Moreover, we find that although decision fusion does not achieve higher accuracy than feature fusion, its False Negative Rate (6.2%) is much lower than that of feature fusion (25%).
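A minimal sketch of the two fusion schemes being compared, with generic scikit-learn-style classifiers standing in for the paper's models; the averaging rule shown for decision fusion is one common choice, not necessarily the paper's.

```python
import numpy as np

def feature_fusion_predict(clf, texture_feats, geometric_feats):
    """Feature-level fusion: concatenate the modalities, then classify."""
    return clf.predict(np.hstack([texture_feats, geometric_feats]))

def decision_fusion_predict(clf_tex, clf_geo, texture_feats, geometric_feats):
    """Decision-level fusion: classify each modality separately and
    average the class probabilities. The classifiers must expose
    predict_proba (e.g., sklearn's SVC(probability=True))."""
    p = (clf_tex.predict_proba(texture_feats)
         + clf_geo.predict_proba(geometric_feats)) / 2.0
    return np.argmax(p, axis=1)
```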
Automatically recognizing pain and estimating pain intensity is an emerging research area with promising applications in the medical and healthcare field. The task plays a crucial role in the diagnosis and treatment of patients who have a limited ability to communicate verbally, and it remains a challenge in pattern recognition. Recently, deep learning has achieved impressive results in many domains. However, deep architectures require a significant amount of labeled data for training, and they may fail to outperform conventional handcrafted features when data are insufficient, which is also the problem faced by pain detection. Furthermore, recent studies show that handcrafted features may provide information complementary to deep-learned features; hence, combining these features may result in improved performance. Motivated by these considerations, in this paper we propose an innovative method based on the combination of deep spatiotemporal and handcrafted features for pain intensity estimation. We use C3D, a deep 3-dimensional convolutional network that takes a continuous sequence of video frames as input, to extract spatiotemporal facial features; C3D models the appearance and motion of videos simultaneously. For the handcrafted features, we extract geometric information by computing the distance between the normalized facial landmarks of each frame and those of the mean face shape, and we extract appearance information using histogram of oriented gradients (HOG) features around the normalized facial landmarks of each frame. Two levels of SVRs are trained using the spatiotemporal, geometric and appearance features to obtain the estimation results. We tested the proposed method on the UNBC-McMaster shoulder pain expression archive database and obtained experimental results that outperform the current state-of-the-art.
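A minimal sketch of the geometric feature as described: the per-frame distance between each normalized landmark and the corresponding landmark of the mean face shape.

```python
import numpy as np

def geometric_features(landmarks, mean_shape):
    """Per-frame geometric feature: Euclidean distance between each
    normalized facial landmark and the corresponding landmark of the
    mean face shape.

    landmarks  : (N, 2) normalized landmark coordinates for one frame
    mean_shape : (N, 2) landmarks of the mean face shape
    """
    return np.linalg.norm(landmarks - mean_shape, axis=1)
```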
Naomi YAMASHITA Yuya OTA Faiz SALLEH Mani NAVANEETHAN Masaru SHIMOMURA Kenji MURAKAMI Hiroya IKEDA
With the aim of characterizing the thermal conductivity of nanometer-scale thermoelectric materials, we have constructed a new measurement system based on ac calorimetry. Analysis of the obtained data requires the time evolution of the temperature distribution in a nanometer-scale material under periodic heating. In this study, we developed a C# simulation of the time-dependent temperature distribution, based on the two-dimensional heat-diffusion equation including the influence of heat emission from the material edges. The simulation was applied to AlN with millimeter-scale dimensions to confirm its validity and accuracy. The simulated thermal diffusivity for a 10×75-mm²-area AlN sample was 1.3×10⁻⁴ m²/s, which was larger than the value set in the heat-diffusion equation. The same overestimation was also observed in experiments. Therefore, our simulation can reproduce the unsteady heat conduction and be used to analyze ac calorimetry experiments.
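A minimal explicit finite-difference sketch of the simulated model, in Python for brevity: two-dimensional heat diffusion with a heating term and a simple linear edge-emission loss (the paper's exact emission and boundary treatment are not specified here). Periodic heating enters through the `heating` array, which the caller can modulate sinusoidally in time.

```python
import numpy as np

def step(T, D, dt, dx, heating, emissivity):
    """One explicit Euler step of dT/dt = D * laplacian(T) + heating,
    with a linear heat-emission loss applied at the boundary cells
    (corners are relaxed twice in this simple sketch).
    Stability requires dt <= dx**2 / (4 * D)."""
    Tn = T.copy()
    lap = (T[:-2, 1:-1] + T[2:, 1:-1] + T[1:-1, :-2] + T[1:-1, 2:]
           - 4.0 * T[1:-1, 1:-1]) / dx**2
    Tn[1:-1, 1:-1] += dt * (D * lap + heating[1:-1, 1:-1])
    for edge in (Tn[0, :], Tn[-1, :], Tn[:, 0], Tn[:, -1]):
        edge -= dt * emissivity * edge  # relax edges toward ambient (0)
    return Tn
```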
Hainan ZHANG Yanjing SUN Song LI Wenjuan SHI Chenglong FENG
Correlation filter-based trackers whose appearance model is built from a single feature have poor robustness in challenging video environments with factors such as occlusion, fast motion and out-of-view targets. In this paper, a long-term tracking algorithm based on multi-feature adaptive fusion is presented for video targets. We design a robust appearance model by fusing powerful features, including the histogram of oriented gradients, local binary patterns and color-naming, at the response map level to overcome interference in the video. In addition, a random fern classifier is trained as a re-detector to find the target when tracking failure occurs, so that long-term tracking is achieved. We evaluate our algorithm on large-scale benchmark datasets, and the results show that the proposed algorithm is more accurate and more robust in complex video environments.
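A sketch of response-map-level fusion; the PSR-based adaptive weighting below is a common choice assumed for illustration, as the abstract does not specify the paper's weighting.

```python
import numpy as np

def psr(response):
    """Peak-to-sidelobe ratio: a standard confidence measure for a
    correlation response map."""
    peak = response.max()
    side = response[response < peak]
    return (peak - side.mean()) / (side.std() + 1e-8)

def fuse_responses(responses):
    """Adaptively fuse per-feature response maps (e.g., HOG, LBP,
    color-naming) with weights proportional to each map's confidence."""
    weights = np.array([psr(r) for r in responses])
    weights /= weights.sum()
    return sum(w * r for w, r in zip(weights, responses))
```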
Motofumi NAKANISHI Shintaro IZUMI Mio TSUKAHARA Hiroshi KAWAGUCHI Hiromitsu KIMURA Kyoji MARUMOTO Takaaki FUCHIKAMI Yoshikazu FUJIMORI Masahiko YOSHIMOTO
This paper presents an algorithm for physical activity (PA) classification and metabolic equivalents (METs) monitoring, and its System-on-a-Chip (SoC) implementation, realizing both low power consumption and high estimation accuracy. Long-term PA monitoring is an effective means of preventing lifestyle-related diseases, and low power consumption and long battery life are key features for the wider dissemination of such monitoring systems. As described herein, an adaptive sampling method is implemented to extend battery life by minimizing the active rate of the accelerometer without decreasing accuracy. Furthermore, advanced PA classification using both the heart rate and acceleration is introduced. The proposed algorithms are evaluated in experiments with eight subjects under actual conditions. Evaluation results show that the root mean square error with respect to the result of processing with a fixed sampling rate is less than 0.22 METs, and the mean absolute error is less than 0.06 METs. Furthermore, to minimize system-level power dissipation, a dedicated SoC is implemented in a 130-nm CMOS process with FeRAM. A non-volatile CPU using non-volatile memory and flip-flops is used to reduce the standby power. The proposed algorithm, implemented in dedicated hardware, reduces the active rate of the CPU and the accelerometer. The current consumption of the SoC is less than 3 µA, and the evaluation system using the test chip achieves a 74% system-level power reduction. The total current consumption including the accelerometer is 11.3 µA on average.
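The adaptive sampling idea can be illustrated with a deliberately simple sketch; the decision rule and all constants below are assumptions, since the abstract does not specify the paper's criterion.

```python
import statistics

def next_sampling_rate(acc_window, low_hz=5, high_hz=50, var_threshold=0.05):
    """Keep the accelerometer at a low rate while the recent signal is
    steady; switch to a high rate when the variance of the last window
    exceeds a threshold. Rule and constants are assumptions."""
    return high_hz if statistics.variance(acc_window) > var_threshold else low_hz
```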
Ying-Yao TING Chi-Wei HSIAO Huan-Sheng WANG
To prevent malfunctions caused by the constraints or defects of a single sensor, this paper proposes a fire detection system based on the Dempster-Shafer theory with multi-sensor technology. The proposed system operates in three stages: measurement, data reception and alarm activation, where an Arduino is tasked with measuring and interpreting the readings from three types of sensors covering smoke, light and temperature detection. All the measured data are wirelessly transmitted to a backend Raspberry Pi for subsequent processing. Within the system, the Raspberry Pi determines the probability of fire events using the Dempster-Shafer theory. We investigate moderate settings of the conflict coefficient and how it plays an essential role in ensuring the plausibility of the system's deduced results. Furthermore, a MySQL database with a web server is deployed on the Raspberry Pi for backlog and data analysis purposes. In addition, the system provides three notification services: web browsing, a smartphone app, and short message service. For validation, we collected statistics from field tests conducted in a controllable and safe environment by emulating fire events during both daytime and nighttime. Each experiment passes through No-fire, On-fire and Post-fire phases. Experimental results show an accuracy of up to 98% in both the No-fire and On-fire phases during the daytime and an accuracy of 97% during the nighttime under reasonable conditions. When all three phases are taken into account, the accuracy in the daytime and nighttime becomes 97% and 89%, respectively. The field tests validate the efficiency and accuracy of the proposed system.
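As an illustration, a minimal sketch of Dempster's rule of combination for two sensors over the frame {fire, nofire}; `K` is the conflict coefficient whose setting the paper investigates. The example mass values are made up.

```python
def combine(m1, m2):
    """Dempster's rule for two mass functions over subsets of a frame of
    discernment. Masses are dicts mapping frozensets to mass values.
    Returns the normalized combined masses and the conflict K."""
    K, fused = 0.0, {}
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                fused[inter] = fused.get(inter, 0.0) + ma * mb
            else:
                K += ma * mb  # mass assigned to conflicting hypotheses
    return {s: v / (1.0 - K) for s, v in fused.items()}, K

# Example: smoke sensor vs. temperature sensor (made-up masses)
FIRE, NOFIRE = frozenset({"fire"}), frozenset({"nofire"})
THETA = FIRE | NOFIRE  # ignorance
m_smoke = {FIRE: 0.7, NOFIRE: 0.1, THETA: 0.2}
m_temp = {FIRE: 0.6, NOFIRE: 0.2, THETA: 0.2}
print(combine(m_smoke, m_temp))  # K = 0.2 here
```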
Yinghui ZHANG Hongjun WANG Hengxue ZHOU Ping DENG
Image boundary detection or image segmentation is an important step in image analysis. However, choosing appropriate parameters for boundary detection algorithms is necessary to achieve good results. Image boundary detection fusion with unsupervised parameters can output a final consensus boundary, which is generally better than that of individual unsupervised or supervised image boundary detection algorithms. In this study, we theoretically examine why image boundary detection fusion can work well, and we propose a mixture model for image boundary detection fusion (MMIBDF) to achieve good consensus segmentation in an unsupervised manner. All the segmentation algorithms are treated as new features, and the segmentation results obtained by these algorithms are the values of the new features. MMIBDF is designed to sample the boundary according to a discrete distribution. We present an inference method for MMIBDF and describe the corresponding algorithm in detail. Extensive empirical results demonstrate that MMIBDF significantly outperforms other image boundary detection fusion algorithms and the base image boundary detection algorithms according to most performance indices.
Jinhua WANG Weiqiang WANG Guangmei XU Hongzhe LIU
In this paper, we describe the direct learning of an end-to-end mapping between under-/over-exposed images and well-exposed images. The mapping is represented as a deep convolutional neural network (CNN) that takes multiple-exposure images as input and outputs a high-quality image. Our CNN has a lightweight structure, yet gives state-of-the-art fusion quality. Furthermore, for a given pixel, the influence of the surrounding pixels gradually increases as their distance decreases. If the only pixels considered are those in the convolution kernel neighborhood, the final result suffers. To overcome this problem, the size of the convolution kernel is often increased; however, this also increases the complexity of the network (too many parameters) and the training time. In this paper, we present a method in which a number of sub-images are obtained from the source image and processed using the same CNN model, providing more neighborhood information for the convolution operation. Experimental results demonstrate that the proposed method achieves better performance in terms of both objective evaluation and visual quality.
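A sketch of one plausible way to form the sub-images, via pixel-shifted crops; the abstract does not specify the construction, so the shift set and cropping below are assumptions.

```python
import numpy as np

def shifted_subimages(img, shifts=((0, 0), (0, 1), (1, 0), (1, 1))):
    """Crop several pixel-shifted versions of the source image so that
    each convolution over a sub-image sees a slightly different
    neighborhood. All crops share the same shape."""
    h, w = img.shape[:2]
    dmax = max(max(dy, dx) for dy, dx in shifts)
    return [img[dy:h - dmax + dy, dx:w - dmax + dx] for dy, dx in shifts]
```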
Takao MURAKAMI Yosuke KAGA Kenta TAKAHASHI
The likelihood-ratio based score level fusion (LR fusion) scheme is known as one of the most promising multibiometric fusion schemes. This scheme verifies a user by computing a log-likelihood ratio (LLR) for each modality and comparing the total LLR to a threshold. In practice, genuine LLRs can tend to be less than 0 for some modalities (e.g., the user is a “goat”, who is inherently difficult to recognize, for some modalities; or the user suffers from temporary physical conditions such as injuries and illness). The LR fusion scheme can handle such cases by allowing the user to select a subset of modalities at the authentication phase and setting the LLRs corresponding to missing query samples to 0. A recent study, however, proposed a modality selection attack, in which an impostor inputs only query samples whose LLRs are greater than 0 (i.e., takes an optimal strategy), and proved that this attack degrades the overall accuracy even if the genuine user also takes this optimal strategy. In this paper, we investigate the impact of the modality selection attack in more detail. Specifically, we investigate whether the overall accuracy is improved by eliminating “goat” templates, whose LLRs tend to be less than 0 for genuine users, from the database (i.e., restricting modality selection). As an overall performance measure, we use the KL (Kullback-Leibler) divergence between the genuine score distribution and the impostor score distribution. We first prove that the modality restriction hardly increases the KL divergence when a user can select a subset of modalities (i.e., selective LR fusion). We then prove that the modality restriction increases the KL divergence when a user needs to input all biometric samples (i.e., non-selective LR fusion). We conduct experiments using three real datasets (NIST BSSR1 Set1, Biosecure DS2, and CASIA-Iris-Thousand), and discuss future directions for multibiometric fusion systems.
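For reference, the decision rule that the analysis builds on, as described above: the total LLR over the selected modalities S is compared with a threshold θ, and a missing modality contributes 0.

```latex
% Accept if the total LLR over the selected modalities S exceeds the
% threshold \theta; a missing modality contributes 0 to the sum.
\sum_{i \in S} \log \frac{p(s_i \mid \mathrm{genuine})}{p(s_i \mid \mathrm{impostor})} \ \ge\ \theta
```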
Zijie WANG Qin LIU Takeshi IKENAGA
High-dynamic-range imaging (HDRI) technologies aim to extend the dynamic range of luminance beyond the limitation of camera sensors. The irradiance information of a scene can be reconstructed by fusing multiple low-dynamic-range (LDR) images with different exposures. The key issue is removing the ghost artifacts caused by moving objects and handheld cameras. This paper proposes a robust ghost-free HDRI algorithm based on visual-salience-based bilateral motion detection and stack-extension-based exposure fusion. For ghost area detection, visual salience is introduced to measure the differences between the multiple images, and bilateral motion detection is employed to improve the accuracy of labeling motion areas. For exposure fusion, the proposed algorithm reduces brightness discontinuities through stack extension and rejects the information in ghost areas via fusion masks to avoid artifacts. Experimental results show that the proposed algorithm removes ghost artifacts accurately for both static and handheld cameras, remains robust in scenes with complex motion, and keeps complexity low relative to recent advances, with average time savings of 63.6% over a rank-minimization based method and 20.4% over a patch based method.
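A minimal sketch of mask-based exposure fusion: per-pixel weights favor well-exposed values (a Gaussian around mid-gray, a common choice assumed here) and are zeroed inside detected ghost areas; the paper's salience and stack-extension steps are not reproduced.

```python
import numpy as np

def fuse_ldr_stack(images, ghost_masks, sigma=0.2):
    """Fuse LDR exposures with ghost rejection.
    images      : float arrays in [0, 1], one per exposure
    ghost_masks : boolean arrays, True where motion was detected"""
    acc = np.zeros_like(images[0], dtype=np.float64)
    wsum = np.zeros_like(acc)
    for img, ghost in zip(images, ghost_masks):
        w = np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2))
        w[ghost] = 0.0  # reject ghost areas from the blend
        acc += w * img
        wsum += w
    return acc / np.maximum(wsum, 1e-8)
```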
The history of optical fiber and optical transmission technologies has been described in many publications. However, the history of the other technologies that support the physical layer of optical transmission has not been described in much detail, and I would like to highlight those technologies in addition to optical fibers. This paper therefore describes the history of the development of optical-fiber-related technologies such as fusion splicers, optical fiber connectors, ribbon fiber, and passive components, based on the changes in optical fibers and optical fiber cables. Moreover, I describe technologies designed to support multi-core fibers, such as fan-in/fan-out devices.