Shumpei YAMASAKI Daiki NOBAYASHI Kazuya TSUKAMOTO Takeshi IKENAGA Myung J. LEE
With the development and spread of Internet of Things (IoT) technologies, various types of data for IoT applications can be generated anywhere and at any time. Among such data, some depend heavily on their generation time and location; we define these as spatio-temporal data (STD). In previous studies, we proposed an STD retention system using vehicular networks to achieve the “Local production and consumption of STD” paradigm. The system can quickly provide STD to users within a specific location by retaining the STD within that area. However, it does not take into account that each type of STD has different retention requirements. In particular, the lifetime of STD and the time needed to diffuse it across the entire area directly influence the performance of STD retention. We therefore propose an efficient diffusion and elimination control method for retention based on the requirements of each type of STD. Simulation results demonstrate that the proposed method satisfies the requirements of STD while maintaining a high coverage rate in the area.
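As a rough illustration of the control rule described above, the following is a minimal sketch, not the authors' implementation: it assumes hypothetical per-item fields for lifetime and diffusion deadline (these names are ours) and returns one of three retention actions for an item held by a vehicle.

```python
# Minimal sketch (not the authors' implementation): a vehicle decides whether
# to keep rebroadcasting a piece of spatio-temporal data (STD) based on two
# assumed per-item requirements: its lifetime and its diffusion deadline.
import time

class STDItem:
    def __init__(self, data, origin, lifetime_s, diffusion_deadline_s):
        self.data = data
        self.origin = origin                  # (x, y) where the data was generated
        self.created = time.time()
        self.lifetime_s = lifetime_s          # how long the data stays valid
        self.diffusion_deadline_s = diffusion_deadline_s  # target time to cover the area

def retention_action(item, now=None):
    """Return 'eliminate', 'diffuse', or 'hold' for one retained item."""
    now = now or time.time()
    age = now - item.created
    if age >= item.lifetime_s:
        return "eliminate"        # expired data is removed from the retention area
    if age < item.diffusion_deadline_s:
        return "diffuse"          # still within the diffusion phase: rebroadcast
    return "hold"                 # area assumed covered: keep but stop rebroadcasting
```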
Jinhua WANG Xuewei LI Hongzhe LIU
At present, generative adversarial networks (GANs) play an important role in learning tasks. The basic idea of a GAN is to train the discriminator and generator simultaneously. A GAN-based inverse tone mapping method can generate a high dynamic range (HDR) image of a scene from multiple images of the scene captured with different exposures. However, subsequent tone-mapping is needed before the result can be displayed on a general device. This paper proposes an end-to-end multi-exposure image fusion algorithm based on a relativistic GAN (called RaGAN-EF), which can directly fuse multiple images with different exposures into a high-quality image that can be displayed on a general device without further processing. The RaGAN is used to design the loss function, which retains more details from the source images. In addition, the number of input images in multi-exposure image fusion is often variable, which limits the applicability of many existing GANs. This paper proposes a convolutional layer with weights shared between channels, which solves the problem of variable input length. Experimental results demonstrate that the proposed method performs better in terms of both objective evaluation and visual quality.
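The channel-shared convolution is the key to accepting a variable number of exposures. Below is a minimal sketch under our assumptions (the pooling across channels and the layer sizes are ours, not the paper's): the same single-channel kernel bank is applied to every input channel, so the layer works for any channel count.

```python
# Minimal sketch (assumptions, not the paper's code): a convolution whose
# weights are shared across input channels, so one layer accepts any number
# of exposure images stacked along the channel axis.
import torch
import torch.nn as nn

class ChannelSharedConv(nn.Module):
    def __init__(self, out_per_channel=16, kernel_size=3):
        super().__init__()
        # One single-channel kernel bank, reused for every input channel.
        self.conv = nn.Conv2d(1, out_per_channel, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                     # x: (N, C, H, W), C is variable
        n, c, h, w = x.shape
        y = self.conv(x.reshape(n * c, 1, h, w))   # same weights for each channel
        y = y.reshape(n, c, -1, h, w)
        return y.max(dim=1).values            # channel-wise max pooling -> fixed size

# Works for 3 or 5 exposures alike:
layer = ChannelSharedConv()
print(layer(torch.randn(2, 3, 64, 64)).shape)   # torch.Size([2, 16, 64, 64])
print(layer(torch.randn(2, 5, 64, 64)).shape)   # torch.Size([2, 16, 64, 64])
```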
Yuanbo FANG Hongliang FU Huawei TAO Ruiyu LIANG Li ZHAO
Speech-based deception detection using deep learning is one of the technologies for realizing a future deception detection system with a high recognition rate. Multi-network feature extraction can effectively improve the recognition performance of such a system, but limited labeled data and the lack of effective feature fusion methods restrict network performance. We therefore propose a novel hybrid network model based on attentional multi-feature fusion (HN-AMFF). First, the static features of large amounts of unlabeled speech data are fed into a DAE for unsupervised training. Second, the frame-level features and static features of a small amount of labeled speech data are simultaneously fed into an LSTM network and the encoder output of the DAE for joint supervised training. Finally, a feature fusion algorithm based on an attention mechanism is proposed, which obtains an optimal feature set during training. Simulation results show that the proposed feature fusion method significantly outperforms traditional feature fusion methods, and the model achieves strong performance with only a small amount of labeled data.
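A minimal sketch of attention-based fusion of two feature streams follows. It is our reading of the idea, not the authors' code; the scoring network and dimensions are assumptions.

```python
# Minimal sketch (our assumption of the idea, not the authors' code): fuse a
# DAE static-feature vector and an LSTM frame-level feature vector with
# learned attention weights, so the network emphasizes the more useful stream.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)        # one scalar score per feature stream

    def forward(self, feat_dae, feat_lstm):   # both: (batch, dim)
        feats = torch.stack([feat_dae, feat_lstm], dim=1)   # (batch, 2, dim)
        alpha = torch.softmax(self.score(feats), dim=1)     # (batch, 2, 1)
        return (alpha * feats).sum(dim=1)                   # weighted sum -> (batch, dim)
```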
Kazuya TSUKAMOTO Hitomi TAMURA Yuzo TAENAKA Daiki NOBAYASHI Hiroshi YAMAMOTO Takeshi IKENAGA Myung LEE
In the IoT era, the growth of data variety is driven by cross-domain data fusion. In this paper, we advocate that the “local production for local consumption (LPLC) paradigm” can be an innovative approach to cross-domain data fusion, and we propose a new framework, the geolocation-centric information platform (GCIP), that can produce and deliver diverse spatio-temporal content (STC). In the GCIP, (1) an infrastructure-based geographic hierarchy edge network and (2) an ad hoc-based STC retention system interplay to provide both geolocation awareness and resiliency. We then discuss the concepts and technical challenges of the GCIP. Finally, we implement a proof of concept of the GCIP and demonstrate its efficacy through practical experiments on a campus IPv6 network and through simulation experiments.
Wentao LYU Qiqi LIN Lipeng GUO Chengqun WANG Zhenyi YANG Weiqiang XU
In this paper, we present a novel method for vehicle detection based on the Faster R-CNN framework, integrating MobileNet into the Faster R-CNN structure. First, MobileNet is used as the base network to generate the feature map. To retain more information about vehicle objects, a fusion strategy is applied to multi-layer features to generate a fused feature map, which is then shared by the region proposal network (RPN) and Fast R-CNN. In the RPN, we employ a novel dimension clustering method to predict anchor sizes instead of choosing anchor properties manually. Our detection method improves detection accuracy and saves computational resources. The results show that the proposed method achieves a mean average precision (mAP) of 85.21% on the DIOR dataset and 91.16% on the UA-DETRAC dataset, improvements of 1.32% and 1.49%, respectively, over Faster R-CNN (ResNet152). Moreover, since the base network requires fewer operations and parameters, our model occupies only 42.52 MB of storage, far less than the 214.89 MB of Faster R-CNN (ResNet50).
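Dimension clustering of anchors is commonly done YOLOv2-style, as in the minimal sketch below. This is an assumption about the step, not the paper's code; the IoU distance and the number of clusters are ours.

```python
# Minimal sketch (assumption: the dimension clustering step follows YOLOv2-style
# k-means over ground-truth box sizes with a 1 - IoU distance).
import numpy as np

def iou_wh(boxes, centers):
    """IoU between (w, h) pairs, assuming boxes share the same top-left corner."""
    inter = np.minimum(boxes[:, None, 0], centers[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], centers[None, :, 1])
    union = boxes[:, 0] * boxes[:, 1]
    union = union[:, None] + centers[None, :, 0] * centers[None, :, 1] - inter
    return inter / union

def anchor_kmeans(boxes, k=9, iters=100, seed=0):
    """boxes: (N, 2) array of ground-truth (width, height) pairs."""
    rng = np.random.default_rng(seed)
    centers = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, centers), axis=1)  # nearest = highest IoU
        new_centers = []
        for i in range(k):
            members = boxes[assign == i]
            # Keep the old center if a cluster happens to be empty.
            new_centers.append(members.mean(axis=0) if len(members) else centers[i])
        centers = np.array(new_centers)
    return centers  # k anchor (width, height) pairs
```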
Kouki SEO Chihiro GO Yuma KINOSHITA Hitoshi KIYA
We propose a novel hue-correction scheme for multi-exposure image fusion (MEF). Various MEF methods have been studied to generate higher-quality images; however, unlike in other fields of image processing, few MEF methods consider hue distortion, owing to the lack of a reference image with correct hue. In the proposed scheme, we generate an HDR image from the input multi-exposure images as a reference for hue correction. The hue distortion in images fused by an MEF method is then removed by using the hue information of the HDR image, on the basis of the constant-hue plane in the RGB color space. Simulations demonstrate that the proposed scheme effectively corrects the hue distortion caused by conventional MEF methods. Experimental results also show that the proposed scheme can generate high-quality images regardless of the exposure conditions of the input multi-exposure images.
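One common formulation of constant-hue-plane correction is sketched below; the paper's exact procedure may differ. Under this assumption, hue within a sector is characterized by the channel order and the ratio (median - min) / (max - min), and correction imposes the reference's order and ratio while keeping the fused pixel's max and min values.

```python
# Minimal sketch (an assumed formulation of constant-hue-plane correction, not
# necessarily the paper's): transfer the hue of the HDR reference to the fused
# pixel by matching the channel order and the (median - min) / (max - min) ratio.
import numpy as np

def correct_hue(fused, reference):
    """fused, reference: (..., 3) RGB arrays in [0, 1]."""
    out = np.empty_like(fused)
    order = np.argsort(reference, axis=-1)          # channel order defines the hue sector
    ref_sorted = np.take_along_axis(reference, order, axis=-1)
    fus_max = fused.max(axis=-1)
    fus_min = fused.min(axis=-1)
    denom = np.maximum(ref_sorted[..., 2] - ref_sorted[..., 0], 1e-8)
    ratio = (ref_sorted[..., 1] - ref_sorted[..., 0]) / denom   # hue within the sector
    med = fus_min + ratio * (fus_max - fus_min)
    sorted_fused = np.stack([fus_min, med, fus_max], axis=-1)
    np.put_along_axis(out, order, sorted_fused, axis=-1)        # restore channel order
    return out
```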
Xue NI Huali WANG Ying ZHU Fan MENG
Low probability of intercept (LPI) radar waveforms have complex and diverse modulation schemes that cannot easily be identified by traditional methods, and research on intrapulse-modulated LPI radar waveform recognition has received increasing attention. In this paper, we propose an automatic LPI radar waveform recognition algorithm that uses a multi-resolution fusion convolutional neural network. First, signals embedded in noise are processed using the Choi-Williams distribution (CWD) to obtain time-frequency feature images. The images are then resized by interpolation and fed to the proposed network for training and identification. The network adopts a dual-channel CNN structure to obtain features at different resolutions and fuses them using concatenation and an Inception module. Extensive simulations are carried out on twelve types of LPI radar waveforms, including BPSK, Costas, Frank, LFM, P1~P4, and T1~T4, corrupted with additive white Gaussian noise at SNRs from 10dB to -8dB. The results show that the overall recognition rate of the proposed algorithm reaches 95.1% at an SNR of -6dB. We also examine various sample selection methods for the recognition task; the conclusion is that reducing the number of samples with SNRs above 2dB or below -8dB effectively improves the training speed of the network while maintaining recognition accuracy.
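The dual-channel, multi-resolution structure can be sketched as follows. This is our reading of the architecture with assumed layer sizes and resolutions, not the paper's configuration.

```python
# Minimal sketch (assumed layer sizes, not the paper's configuration): two CNN
# branches take the same CWD time-frequency image at two resolutions, and their
# features are fused by concatenation before classification.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualResolutionNet(nn.Module):
    def __init__(self, n_classes=12):           # 12 LPI waveform types
        super().__init__()
        self.branch_lo = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                       nn.AdaptiveAvgPool2d(4))
        self.branch_hi = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                       nn.AdaptiveAvgPool2d(4))
        self.fc = nn.Linear(2 * 16 * 4 * 4, n_classes)

    def forward(self, img):                      # img: (N, 1, H, W) CWD image
        lo = F.interpolate(img, size=(32, 32))   # low-resolution view
        hi = F.interpolate(img, size=(64, 64))   # high-resolution view
        feats = torch.cat([self.branch_lo(lo).flatten(1),
                           self.branch_hi(hi).flatten(1)], dim=1)
        return self.fc(feats)                    # class logits
```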
Sung-Woon JUNG Hyuk-Ju KWON Dong-Min SON Sung-Hak LEE
High dynamic range (HDR) imaging refers to digital image processing that modifies the range of color and contrast to enhance image visibility. To create an HDR image, two or more images containing different information are needed. To convert low dynamic range (LDR) images into HDR images, we consider a generative adversarial network (GAN) as an appropriate deep neural network. Deep learning requires a great deal of data to build a model, but once the model is trained, it is convenient to use. In this paper, we propose a learning-based weight map for local luminance to reconstruct locally tone-mapped images.
Junxing ZHANG Shuo YANG Chunjuan BO Huimin LU
Vehicle logo detection is one of the research directions in intelligent transportation systems and an important extension of detection based on license plates and vehicle types. A vehicle logo is characterized by uniqueness, conspicuousness, and diversity, so thorough research is important in both theory and application. Although there are related works on object detection, most of them cannot achieve real-time detection in different scenes, and some single-stage real-time methods perform poorly on small objects. To address the scarcity of training samples, our work improves detection by constructing a vehicle logo dataset (VLD-45-S) and by applying multi-stage pre-training, multi-scale prediction, feature fusion between deep and shallow layers, dimension clustering of bounding boxes, and multi-scale detection training. While maintaining speed, our method improves the detection precision of vehicle logos, and data enrichment optimizes the generalization of the detection model and its anti-interference capability in real scenes. Experimental results show that both the accuracy and speed of the detection algorithm are improved for small objects.
In this paper, we propose a deep visual recognition model based on a hybrid KPCA network (H-KPCANet), which combines a one-stage KPCANet and a two-stage KPCANet. The proposed model consists of four basic components: the input layer, the one-stage KPCANet, the two-stage KPCANet, and the fusion layer. The one-stage KPCANet calculates the KPCA filters for its convolution layer, while the two-stage KPCANet learns PCA filters in the first stage and KPCA filters in the second stage. After binary quantization mapping and block-wise histogramming, the features from the two types of KPCANets are fused in the fusion layer, and the final feature of the input image is obtained by a weighted serial combination of the two types of features. The proposed algorithm is tested on digit recognition and object classification, and the experimental results on the MNIST and CIFAR-10 visual recognition benchmarks validate the performance of H-KPCANet.
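A weighted serial combination, under our assumption of a single scalar weight, is simply a scaled concatenation of the two histogram feature vectors:

```python
# Minimal sketch (assumed weighting scheme, not the paper's exact formula):
# concatenate the two KPCANet histogram features after scaling each stream.
import numpy as np

def weighted_serial_combination(feat_one_stage, feat_two_stage, w=0.5):
    """Both inputs are 1-D feature vectors; w balances the two streams."""
    return np.concatenate([w * feat_one_stage, (1 - w) * feat_two_stage])
```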
Jianmei ZHANG Pengyu WANG Feiyang GONG Hongqing ZHU Ning CHEN
Finding the correspondence between two images of the same object or scene is an active research field in computer vision. This paper develops a rapid and effective Content-based Superpixel Image matching and Stitching (CSIS) scheme, which utilizes the content of superpixels through a multi-feature fusion technique. Unlike popular keypoint-based matching methods, our approach implements image matching based on features internal to superpixels. First, we introduce a novel superpixel generation algorithm based on content-based feature representation, the Content-based Superpixel Segmentation (CSS) algorithm, in which superpixels are generated according to a new distance metric using color, spatial, and gradient feature information; the metric is designed to balance the compactness and boundary adherence of the resulting superpixels. Then, we calculate the entropy of each superpixel to select superpixels with significant characteristics. Next, for each selected superpixel, a multi-feature descriptor is generated by extracting and fusing the local features of the superpixel itself. Finally, we compare the matching features of candidate superpixels and their neighborhoods to estimate the correspondence between the two images. We evaluated superpixel matching and image stitching on complex and deformable surfaces using our superpixel region descriptors, and the results show that the new method is effective in terms of matching accuracy and execution speed.
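A SLIC-style pixel-to-centroid distance that combines the three feature terms can be sketched as follows; the weights and normalization are our assumptions, not the paper's values.

```python
# Minimal sketch (assumed form of the CSS distance; the paper's weights and
# normalization may differ): a pixel-to-centroid distance combining color,
# spatial, and gradient terms for superpixel assignment.
import numpy as np

def css_distance(pixel_lab, pixel_xy, pixel_grad,
                 center_lab, center_xy, center_grad,
                 S, m_color=10.0, m_grad=5.0):
    d_color = np.linalg.norm(pixel_lab - center_lab)   # CIELAB color difference
    d_space = np.linalg.norm(pixel_xy - center_xy)     # spatial distance
    d_grad = abs(pixel_grad - center_grad)             # gradient-magnitude difference
    # S is the seed grid interval; m_* trade compactness vs. boundary adherence.
    return d_color / m_color + d_space / S + d_grad / m_grad
```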
Yuki KUROSAWA Shinya MOCHIDUKI Yuko HOSHINO Mitsuho YAMADA
We measured eye movements at gaze points while subjects performed calculation tasks and examined the relationship between these eye movements and the subjects' fatigue and/or internal state in each task. The results suggest that fatigue and/or internal state affects eye movements at gaze points, and that both can be measured from eye movements at gaze points in real time.
Xinxin HAN Jian YE Jia LUO Haiying ZHOU
The triaxial accelerometer is one of the most important sensors for human activity recognition (HAR). It has been observed that the relations between the axes of a triaxial accelerometer play a significant role in improving the accuracy of activity recognition. However, existing research rarely focuses on these relations, concentrating instead on the fusion of multiple sensors. In this paper, we propose a data fusion-based convolutional neural network (CNN) approach that effectively uses the relations between the axes. We design a single-channel data fusion method and a multichannel data fusion method to accommodate the diversified formats of sensor data. After the fused data are obtained, a CNN is used to extract features and perform classification. The experiments show that the proposed approach outperforms a plain CNN in accuracy. Moreover, the single-channel model achieves an accuracy of 98.83% on the WISDM dataset, which is higher than that of state-of-the-art methods.
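One way to expose inter-axis relations to a 2D CNN in a single channel is sketched below; the fusion layout (raw axes plus pairwise products) is our assumption, not the paper's exact construction.

```python
# Minimal sketch (assumed fusion layout, not the paper's exact construction):
# turn a window of triaxial samples into a single-channel 2D "image" whose rows
# include the raw axes and their pairwise products, so the 2D convolutions of
# the CNN can see inter-axis relations directly.
import numpy as np

def single_channel_fusion(window):
    """window: (T, 3) accelerometer samples -> (1, 6, T) fused CNN input."""
    x, y, z = window[:, 0], window[:, 1], window[:, 2]
    rows = np.stack([x, y, z, x * y, y * z, x * z])  # (6, T)
    return rows[None, :, :]                          # add the channel axis
```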
Cheng XU Wei HAN Dongzhen WANG Daqing HUANG
In this paper, we propose a salient region detection method with multi-feature fusion and an edge constraint. First, an image feature extraction and fusion network based on a dense connection structure and multi-channel convolution is designed. Then, a multi-scale atrous convolution block is applied to enlarge the receptive field. Finally, to increase accuracy, a combined loss function including a classification loss and an edge loss is built for multi-task training. Experimental results verify the effectiveness of the proposed method.
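A minimal sketch of such a combined objective follows; the weighting and the use of binary cross-entropy for both terms are assumptions, not the paper's exact loss.

```python
# Minimal sketch (assumed loss terms and weighting): a combined objective for
# multi-task training of saliency prediction, adding an edge-consistency term
# to the per-pixel classification loss. All tensors hold values in [0, 1].
import torch.nn.functional as F

def combined_loss(pred_saliency, gt_saliency, pred_edge, gt_edge, edge_weight=0.5):
    cls_loss = F.binary_cross_entropy(pred_saliency, gt_saliency)
    edge_loss = F.binary_cross_entropy(pred_edge, gt_edge)
    return cls_loss + edge_weight * edge_loss
```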
Guizhong ZHANG Baoxian WANG Zhaobo YAN Yiqiang LI Huaizhi YANG
In this work, we present a novel rust detection method based on one-class classification and L2 sparse representation (SR) with decision fusion. First, a new color contrast descriptor is proposed for extracting rust features from steel structure images. Considering that the patterns of rust features are simpler than those of non-rust ones, a one-class support vector machine (SVM) classifier and an L2 SR classifier are designed with these rust image features. A multiplicative fusion rule is then advocated for combining the one-class SVM and L2 SR modules, thereby achieving more accurate rust detection results. In extensive experiments, the presented method offers better rust detection performance than other developed rust detectors.
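The multiplicative rule itself is compact, as in the sketch below; the [0, 1] score range and the threshold value are our assumptions.

```python
# Minimal sketch (assuming both modules output a rust-likelihood score in
# [0, 1]): the multiplicative rule fuses the one-class SVM and L2 SR decisions,
# so a region is flagged as rust only when both modules agree.
import numpy as np

def multiplicative_fusion(score_svm, score_sr, threshold=0.25):
    fused = np.asarray(score_svm) * np.asarray(score_sr)
    return fused, fused > threshold   # fused score and binary rust mask
```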
Yuhuan WANG Hang YIN Zhanxin YANG Yansong LV Lu SI Xinle YU
In this paper, we propose an adaptive fusion successive cancellation list decoder (ADF-SCL) for polar codes with a single cyclic redundancy check. The proposed ADF-SCL decoder avoids unnecessary calculations by selecting either the successive cancellation (SC) decoder or the adaptive successive cancellation list (AD-SCL) decoder, depending on a log-likelihood ratio (LLR) threshold, during the decoding process. Simulation results show that, compared to the AD-SCL decoder, the proposed decoder achieves a significant reduction in average complexity in the low signal-to-noise ratio (SNR) region without performance degradation. When Lmax=32 and Eb/N0=0.5dB, the average complexity of the proposed decoder is 14.23% lower than that of the AD-SCL decoder.
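The selection logic can be sketched as follows. The decoder interfaces and the reliability test (minimum absolute LLR against the threshold) are hypothetical; the paper's exact criterion may differ.

```python
# Minimal sketch (hypothetical interfaces sc_decode / ad_scl_decode, and an
# assumed reliability test): run the cheap SC decoder only when the channel
# LLRs look reliable, and fall back to AD-SCL otherwise.
def adf_scl_decode(llrs, crc_check, sc_decode, ad_scl_decode,
                   llr_threshold, l_max=32):
    reliable = min(abs(l) for l in llrs) >= llr_threshold
    if reliable:
        candidate = sc_decode(llrs)
        if crc_check(candidate):     # SC result passes the CRC: done cheaply
            return candidate
    # Unreliable LLRs or CRC failure: list decoding with adaptive list size.
    return ad_scl_decode(llrs, l_max)
```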
Abraham MONRROY CANO Eijiro TAKEUCHI Shinpei KATO Masato EDAHIRO
We present an accurate and easy-to-use multi-sensor fusion toolbox for autonomous vehicles. It includes ‘target-less’ multi-LiDAR (light detection and ranging) and camera-LiDAR calibration, sensor fusion, and a fast and accurate point-cloud ground classifier. Our calibration methods do not require complex setup procedures, and once the sensors are calibrated, our framework eases the fusion of multiple point clouds and cameras. In addition, we present an original real-time ground-obstacle classifier that runs on the CPU and is designed to be used with any type and number of LiDARs. Evaluation results on the KITTI dataset confirm that our calibration method has accuracy comparable to other state-of-the-art contenders in the benchmark.
Chihiro GO Yuma KINOSHITA Sayaka SHIOTA Hitoshi KIYA
This paper proposes a novel multi-exposure image fusion (MEF) scheme for single-shot high dynamic range imaging with spatially varying exposures (SVE). Single-shot imaging with SVE enables us not only to produce images without color-saturated regions from a single shot, but also to avoid ghost artifacts in the produced images. However, the number of exposures is generally limited to two, and it is difficult to decide the optimum exposure values before shooting. In the proposed scheme, a scene segmentation method is applied to the input multi-exposure images, and the luminance of the input images is then adjusted according to both the number of scenes and the relationship between exposure values and pixel values. The proposed luminance adjustment allows us to mitigate the above two issues. In this paper, we focus on dual-ISO imaging as one form of single-shot imaging. Experiments demonstrate that the proposed scheme is effective for single-shot high dynamic range imaging with SVE compared with conventional MEF schemes with exposure compensation.
Kota ANDO Kodai UEYOSHI Yuka OBA Kazutoshi HIROSE Ryota UEMATSU Takumi KUDO Masayuki IKEBE Tetsuya ASAI Shinya TAKAMAEDA-YAMAZAKI Masato MOTOMURA
Deep neural networks (NNs) have been widely accepted as enablers of various AI applications; however, limited computational and memory resources are a major problem on mobile devices. A quantized NN with reduced bit precision is an effective solution that relaxes the resource requirements, but the accuracy degradation caused by its numerical approximation is another problem. We propose a novel quantized NN model employing the “dithering” technique to improve accuracy with minimal additional hardware, from the viewpoint of hardware-algorithm co-design. Dithering spatially distributes the quantization error occurring at each pixel (neuron) so that the total information loss of the plane is minimized. Experiments using software-based accuracy evaluation and FPGA-based hardware resource estimation prove the effectiveness and efficiency of the proposed dithered NN model.
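Error-diffusion dithering on a 2D plane can be sketched as below. We assume a Floyd-Steinberg-style kernel for illustration; the paper's dithering scheme and its hardware mapping may differ.

```python
# Minimal sketch (assuming Floyd-Steinberg-style error diffusion; the paper's
# dithering kernel may differ): quantize an activation plane while spreading
# each pixel's quantization error to its not-yet-processed neighbors.
import numpy as np

def dithered_quantize(plane, step):
    """plane: (H, W) float array; step: quantization step size."""
    out = plane.astype(np.float64).copy()
    h, w = out.shape
    for i in range(h):
        for j in range(w):
            q = np.round(out[i, j] / step) * step   # quantize the current pixel
            err = out[i, j] - q
            out[i, j] = q
            # Diffuse the error with Floyd-Steinberg weights 7/16, 3/16, 5/16, 1/16.
            if j + 1 < w:               out[i, j + 1] += err * 7 / 16
            if i + 1 < h and j > 0:     out[i + 1, j - 1] += err * 3 / 16
            if i + 1 < h:               out[i + 1, j] += err * 5 / 16
            if i + 1 < h and j + 1 < w: out[i + 1, j + 1] += err * 1 / 16
    return out
```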
Akira KUBOTA Kazuya KODAMA Asami ITO
A pupil function of the aperture in an image capturing system is theoretically derived such that one can perfectly reconstruct the all-in-focus image through linear filtering of the focal stack. Perfect reconstruction filters are also designed based on the derived pupil function. The designed filters are space-invariant; hence, the presented method does not require region segmentation. Simulation results using synthetic scenes show the effectiveness of the derived pupil function and the filters.