Yutian CHEN Wenyan GAN Shanshan JIAO Youwei XU Yuntian FENG
Recent research on mobile robots shows that convolutional neural networks (CNNs) achieve impressive performance in visual place recognition, especially in large-scale dynamic environments. However, CNNs produce large image representations that cannot meet the real-time demands of robot navigation. To address this problem, we evaluate the effectiveness of the feature maps obtained from a CNN layer by their variance, and propose a novel method that retains the salient feature maps and binarizes them adaptively. Experimental results demonstrate the effectiveness and efficiency of our method. Compared with state-of-the-art methods for visual place recognition, our method shows no significant loss in precision while greatly reducing the size of the image representation.
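The variance-based selection and binarization idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the top-k selection rule, the per-map mean used as the adaptive threshold, and the function names `select_and_binarize` and `hamming` are all hypothetical.

```python
from statistics import mean, pvariance


def select_and_binarize(feature_maps, keep):
    """Keep the `keep` highest-variance maps and binarize each at its own mean.

    feature_maps: list of flattened feature maps (lists of floats).
    The top-k rule and the per-map mean threshold are illustrative assumptions.
    """
    # Rank map indices by the variance of their activations, largest first.
    ranked = sorted(range(len(feature_maps)),
                    key=lambda i: pvariance(feature_maps[i]),
                    reverse=True)
    kept = sorted(ranked[:keep])  # restore the original channel order
    codes = []
    for i in kept:
        threshold = mean(feature_maps[i])  # adaptive: depends on the map itself
        codes.append([1 if v > threshold else 0 for v in feature_maps[i]])
    return kept, codes


def hamming(a, b):
    """Number of differing bits between two binary descriptors of equal shape."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
```

Binary descriptors of this kind can be compared with a Hamming distance, which is one way a compact representation could speed up place matching.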
Taito MANABE Yuichiro SHIBATA Kiyoshi OGURI
Super-resolution technology is one of the solutions for filling the gap between high-resolution displays and lower-resolution images. There are various algorithms for interpolating the lost information, one of which uses a convolutional neural network (CNN). This paper presents an FPGA implementation and a performance evaluation of a novel CNN-based super-resolution system that can process moving images in real time. We apply horizontal and vertical flips to input images instead of enlargement. This flip method prevents information loss and enables the network to make the best use of its patch size. In addition, we adopted the residue number system (RNS) in the network to reduce FPGA resource utilization. Efficient multiplication and addition with LUTs increased the network scale that can be implemented on the same FPGA by approximately 54% compared to an implementation with fixed-point operations. The proposed system can perform super-resolution from 960×540 to 1920×1080 at 60fps with a latency of less than 1ms. Despite the resource restrictions of the FPGA, the system can generate clear super-resolution images with smooth edges. The evaluation results also revealed superior quality in terms of the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) index, compared to systems with other methods.
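To illustrate why a residue number system makes multiplication and addition cheap, the sketch below does the arithmetic independently on small residues and recovers the result with the Chinese remainder theorem. The moduli are a hypothetical choice for illustration; the abstract does not state which moduli the system uses, and this software model says nothing about the actual LUT design.

```python
from math import prod

MODULI = (7, 11, 13, 15)  # pairwise coprime; hypothetical choice for illustration
M = prod(MODULI)          # dynamic range: integers in [0, 15015)


def to_rns(x):
    """Represent x by its residue modulo each modulus."""
    return tuple(x % m for m in MODULI)


def rns_add(a, b):
    """Add channel by channel; no carries cross between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def rns_mul(a, b):
    """Multiply channel by channel; each product stays smaller than m * m."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(residues):
    """Recover the integer in [0, M) with the Chinese remainder theorem."""
    total = 0
    for r, m in zip(residues, MODULI):
        partial = M // m
        total += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return total % M
```

Because every channel only handles values below its modulus, each operation can in principle be a small table lookup. Results are correct only while the true value stays inside the dynamic range `M`.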
Recently, mobile applications for recording everyday meals have drawn much attention for dietary self-management. However, most of these applications either return calorie values that are simply associated with the estimated food category, or require users to indicate the rough amount of food manually. Estimating food calories from a food photo with practical accuracy has not yet been achieved and remains an unsolved problem. In this paper, we propose estimating food calories from a food photo by simultaneously learning food calories, categories, ingredients and cooking directions using deep learning. Since food calories are generally strongly correlated with food categories, ingredients and cooking directions, we expect that training on them simultaneously boosts performance compared with independent single-task training. To this end, we use a multi-task CNN. In addition, we construct two datasets: a dataset of calorie-annotated recipes collected from Japanese recipe sites on the Web, and a dataset collected from an American recipe site. In the experiments, we trained both multi-task and single-task CNNs and compared them. The multi-task CNN achieved better performance than the single-task CNNs on both food category estimation and food calorie estimation. Compared with the single-task CNN, the multi-task CNN improved the correlation coefficient by 0.039 on the Japanese recipe dataset and by 0.090 on the American recipe dataset. In addition, we showed that the proposed multi-task CNN-based method outperformed previously proposed search-based methods.
This paper introduces a filter-level pruning method based on similar feature extraction, which compresses and accelerates convolutional neural networks using the k-means++ algorithm. In contrast to other pruning methods, the proposed method analyzes the similarities among the features that filters recognize, rather than evaluating the importance of filters, in order to prune the redundant ones. This strategy is more reasonable and effective. Furthermore, our method does not result in an unstructured network. As a result, it needs no extra sparse representation and can be efficiently supported by any off-the-shelf deep learning library. Experimental results show that our filter pruning method can reduce the number of parameters and the computational cost of LeNet-5 by a factor of 17.9× with only 0.3% accuracy loss.
Yande XIANG Jiahui LUO Taotao ZHU Sheng WANG Xiaoyan XIANG Jianyi MENG
Arrhythmia classification based on the electrocardiogram (ECG) is crucial in automatic cardiovascular disease diagnosis. The classification methods used in current practice largely depend on hand-crafted features. However, extracting hand-crafted features may introduce significant computational complexity, especially in the transform domains. In this study, an accurate method for patient-specific ECG beat classification is proposed, which adopts morphological features and timing information. For the morphological features of a heartbeat, an attention-based two-level 1-D CNN is incorporated in the proposed method to extract features of different granularity automatically by focusing on various parts of the heartbeat. For the timing information, the difference between the previous and post RR intervals is computed as a dynamic feature. Both the extracted morphological features and the interval difference are used by a multi-layer perceptron (MLP) for classifying ECG signals. In addition, to reduce the memory needed to store ECG data and to denoise the signal to some extent, an adaptive heartbeat normalization technique is adopted, which includes amplitude unification, resolution modification, and signal difference. On the MIT-BIH arrhythmia database, the proposed classification method achieved sensitivity Sen=93.4% and positive predictivity Ppr=94.9% in ventricular ectopic beat (VEB) detection, sensitivity Sen=86.3% and positive predictivity Ppr=80.0% in supraventricular ectopic beat (SVEB) detection, and overall accuracy OA=97.8% under 6-bit ECG signal resolution. Compared with state-of-the-art automatic ECG classification methods, these results show that the proposed method achieves comparable heartbeat classification accuracy even though the ECG signals are represented at lower resolution.
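The timing feature described above can be sketched in a few lines. The function name `rr_difference`, the sign convention (previous minus post), and the handling of the first and last beats (skipped, because they lack one neighbor) are assumptions made for illustration; the abstract does not specify them.

```python
def rr_difference(r_peaks, fs):
    """Return previous-RR minus post-RR, in seconds, for every interior beat.

    r_peaks: R-peak positions as sample indices, in ascending order.
    fs: sampling rate in Hz.
    The first and last beats are skipped because they lack one neighbor.
    """
    features = []
    for i in range(1, len(r_peaks) - 1):
        pre_rr = (r_peaks[i] - r_peaks[i - 1]) / fs    # interval before the beat
        post_rr = (r_peaks[i + 1] - r_peaks[i]) / fs   # interval after the beat
        features.append(pre_rr - post_rr)
    return features
```

With this sign convention, a beat that arrives early and is followed by a long pause yields a negative value, while a regular rhythm yields values close to zero.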
Huimin CAI Eryun LIU Hongxia LIU Shulong WANG
A real-time road-direction point detection model that can adapt to complex environments is developed based on a convolutional neural network architecture. First, the concept of a road-direction point is defined for both single roads and crossroads. For a single road, the predicted road-direction point can serve as a guiding point for a self-driving vehicle to go ahead. At a crossroad, multiple road-direction points can be detected, which helps the vehicle choose among the possible directions. In addition, the model can classify different types of road surface, covering both paved and unpaved roads. This information helps a self-driving vehicle speed up or slow down according to road conditions. Finally, the performance of the model is evaluated on different platforms including Jetson TX1. The processing speed reaches 12 FPS on this portable embedded system, so the model provides an effective and economical solution for road-direction estimation in autonomous navigation applications.
Zhixian MA Jie ZHU Weitian LI Haiguang XU
Detection of cavities in X-ray astronomical images has become a field of interest with the flourishing of studies on black holes and Active Galactic Nuclei (AGN). In this paper, an approach is proposed to detect cavities in X-ray astronomical images using our newly designed Granular Convolutional Neural Network (GCNN) based classifiers. The raw data are first preprocessed to obtain images of the observed objects, i.e., galaxies or galaxy clusters. In each image, pixels are classified into three categories: (1) the faint backgrounds (BKG), (2) the cavity regions (CAV), and (3) the bright central gas regions (CNT). The sample sets are then generated by dividing large images into subimages with a window size chosen according to the scale of the cavities. Since the number of BKG samples is far larger than that of the other types, samples from the major class are split into subsets, i.e., granules, to achieve balanced training sets. Then a group of three-convolutional-layer granular CNNs without subsampling layers are designed as the classifiers and trained with the labeled granular sample sets. Finally, the trained GCNN classifiers are applied to new observations to estimate the cavity regions with a voting strategy and locate them with elliptical profiles on the raw observation images. Experiments and applications of our approach are demonstrated on 40 X-ray astronomical observations retrieved from the Chandra Data Archive (CDA). Comparisons among our approach, β-model fitting and the Unsharp Masking (UM) method were also performed, and show that our approach is more accurate and robust.
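The granule idea can be sketched as below: the majority class is split into disjoint subsets, each subset is paired with the minority samples to form one balanced training set, and the classifiers trained on those sets vote at prediction time. The helper names, the interleaved split, and the simple plurality vote are assumptions for illustration; the abstract does not describe the exact procedure.

```python
from collections import Counter


def make_granular_sets(majority, minority, n_granules):
    """Split the majority class into disjoint granules, each paired with all minority samples.

    Interleaved slicing keeps the sketch deterministic; shuffling the
    majority samples first would be a natural alternative.
    """
    granules = [majority[i::n_granules] for i in range(n_granules)]
    return [(granule, list(minority)) for granule in granules]


def vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]
```

Choosing `n_granules` close to the ratio of majority to minority samples keeps each training set roughly balanced.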
Hongjun ZHANG Yuntian FENG Wenning HAO Gang CHEN Dawei JIN
In recent years, deep learning has been widely applied to the relation extraction task. Such methods use only word embeddings as network input and can model relations between target named entity pairs. However, they treat every relation mention equally, so they cannot effectively extract relations from a corpus with an enormous number of non-relations, which is the main reason why the performance of relation extraction is significantly lower than that of relation classification. This paper designs a deep reinforcement learning framework for relation extraction, which treats the relation extraction task as a two-step decision-making game. The method models relation mentions with a CNN and a Tree-LSTM, which calculate the initial state and the transition state of the game, respectively. In addition, we tackle the problem of the unbalanced corpus by designing a penalty function that increases the penalties for first-step decision-making errors. Finally, we use the Q-Learning algorithm with value function approximation to learn the control policy π for the game. A series of experiments on the ACE2005 corpus shows that the deep reinforcement learning framework achieves state-of-the-art performance in the relation extraction task.
Dengchao HE Hongjun ZHANG Wenning HAO Rui ZHANG Huan HAO
The purpose of document modeling is to learn accurate low-dimensional semantic representations of text for Natural Language Processing tasks. In this paper, we propose a novel attention-based hybrid neural network model that extracts semantic features of text hierarchically. Concretely, our model adopts a bidirectional LSTM module with word-level attention to extract semantic information for each sentence in the text, and subsequently learns high-level features via a dynamic convolutional neural network module. Experimental results demonstrate that our proposed approach is effective and achieves better performance than conventional methods.
Shinya OHTANI Yu KATO Nobutaka KUROKI Tetsuya HIROSE Masahiro NUMA
This paper proposes image super-resolution techniques with multi-channel convolutional neural networks. In the proposed method, output pixels are classified into K×K groups depending on their coordinates. Those groups are generated from separate channels of a convolutional neural network (CNN). Finally, they are synthesized into a K×K magnified image. This architecture can enlarge images directly without bicubic interpolation. Experimental results of 2×2, 3×3, and 4×4 magnifications have shown that the average PSNR for the proposed method is about 0.2dB higher than that for the conventional SRCNN.
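The synthesis step, in which K×K channel outputs are interleaved into one K×K magnified image, can be sketched as follows. The channel ordering (`channels[dy * k + dx]` supplies the pixel at offset `(dy, dx)` inside each K×K output cell) and the function name `synthesize` are assumptions for illustration; the abstract does not fix the ordering.

```python
def synthesize(channels, k):
    """Interleave k*k low-resolution planes into one image magnified k times.

    channels[dy * k + dx][y][x] becomes output[y * k + dy][x * k + dx].
    """
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0] * (w * k) for _ in range(h * k)]
    for dy in range(k):
        for dx in range(k):
            plane = channels[dy * k + dx]
            for y in range(h):
                for x in range(w):
                    out[y * k + dy][x * k + dx] = plane[y][x]
    return out
```

Since each output pixel is read from exactly one channel, the enlargement needs no separate interpolation pass before or after the network.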
Hai DAI NGUYEN Anh DUC LE Masaki NAKAGAWA
This paper presents deep learning methods for recognizing online handwritten mathematical symbols. Recently, various deep learning architectures such as convolutional neural networks (CNNs), deep neural networks (DNNs), recurrent neural networks (RNNs) and long short-term memory (LSTM) RNNs have been applied to fields such as computer vision, speech recognition and natural language processing, where they have shown superior performance to state-of-the-art methods on various tasks. In this paper, max-out-based CNNs and bidirectional LSTM (BLSTM) networks are applied to image patterns created from online patterns and to the original online patterns, respectively, and then combined. They are compared with traditional recognition methods, namely MRFs and MQDFs, through recognition experiments on the CROHME database, along with analysis and explanation.
Mohammad HOSNTALAB Reza AGHAEIZADEH ZOROOFI Ali ABBASPOUR TEHRANI-FARD Gholamreza SHIRANI Mohammad REZA ASHARIF
Teeth segmentation in computed tomography (CT) images is a major and challenging task for various computer-assisted procedures. In this paper, we introduce a hybrid method for the quantification of teeth in CT volumetric datasets, inspired by our previous experience and anatomical knowledge of teeth and jaws. In this regard, we propose a novel segmentation technique using adaptive thresholding, morphological operations, panoramic re-sampling and a variational level set algorithm. The proposed method consists of several steps. First, we determine the operation region in the CT slices. Second, the bony tissues are separated from other tissues by an adaptive thresholding technique based on 3D pulse coupled neural networks (PCNN). Third, teeth tissue is distinguished from other bony tissues by employing panorex lines and anatomical knowledge of teeth in the jaws. Here, the panorex lines are estimated using Otsu thresholding and mathematical morphology operators. Next, the orthogonal lines corresponding to the panorex lines are calculated and the dataset is panoramically re-sampled. Separation of the upper and lower jaws and initial segmentation of the teeth are performed by employing the integral projections of the panoramic dataset. Based on the above-mentioned procedures, an initial mask for each tooth is obtained. Finally, we use the initial teeth masks and apply a variational level set to refine the initial teeth boundaries into final contours. In the last step, a surface rendering algorithm known as marching cubes (MC) is applied for volumetric visualization. The proposed algorithm was evaluated on 30 cases. Segmented images were compared with manually outlined contours. Using ROC analysis, we compared the segmentation performance of the proposed method with that of thresholding, watershed and our previous works. The proposed method performed best. In addition, our algorithm is faster than our previous works.
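Otsu thresholding, which the abstract names as one ingredient of the panorex line estimation, can be sketched as below. This is the textbook method applied to a flat list of integer gray levels, not the authors' pipeline; the function name and the 256-level default are assumptions for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that best separates pixels <= t from pixels > t."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of their gray levels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so there is no split to score
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance, which Otsu's method maximizes.
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

A binary mask then follows from `[p > t for p in pixels]`. When several thresholds tie, this sketch returns the lowest one.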
Tetsuo NISHI Norikazu TAKAHASHI Hajime HARA
We give the necessary and sufficient conditions for a one-dimensional discrete-time autonomous binary cellular neural network to be stable in the case of a fixed boundary. The results are a complete generalization of our previous ones [16], in which symmetrical connections were assumed. The conditions are compared with previously known stability conditions.
Tetsuo NISHI Hajime HARA Norikazu TAKAHASHI
We give necessary and sufficient conditions, in terms of connection coefficients, for a 1-D DBCNN (1-dimensional discrete-time binary cellular neural network) with an external input to be stable. The results are a generalization of our previous ones [18],[19], in which the input was assumed to be zero.
Information processing with only locally connected networks such as cellular neural networks is advantageous for integrated circuit implementations. Adding long range connections can often enhance considerably their performance. It is sufficient to activate these connections randomly from time to time (blinking connections). This can be realized by sending packets on a communication network underlying the information processing network that is needed anyway for bringing information in and out of the locally connected network. We prove for the case of multi-stable networks that if the long-range connections are switched on and off sufficiently fast, the behavior of the blinking network is with high probability the same as the behavior of the time-averaged network. In the averaged network the blinking connections are replaced by fixed connections with low (average) coupling strength.
In the CNN problem, a "scene" appears on the two-dimensional plane, at different positions sequentially, and a "camera crew" has to shoot the scene whenever it appears. If a scene appears at some position, the camera crew does not have to move to the position exactly, but has only to move to a point that lies in the same horizontal or vertical line with the scene. Namely it is enough to move either to the same row or to the same column. The goal is to minimize the total moving distance of the camera crew. This problem has been quite popular in the last decade but it is still open whether or not there is a competitive algorithm, i.e., an algorithm with competitive ratio bounded by a constant. In this paper we study this problem under a natural restriction that the server can move only along the X-axis and the Y-axis. It is shown that there exists a competitive algorithm for this restricted version, namely there is an online algorithm for this "axis-bound CNN" with competitive ratio 9.0.
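The serving rule of the CNN problem can be made concrete with a small sketch. The greedy strategy below is only a baseline for showing how cost accumulates; it is not the algorithm of the paper, it ignores the axis-bound restriction, and no claim is made about its competitive ratio. All function names are hypothetical.

```python
def serves(pos, request):
    """A request is served when the server shares its column or its row."""
    return pos[0] == request[0] or pos[1] == request[1]


def greedy_move(pos, request):
    """Align whichever coordinate is cheaper; return (new_position, distance)."""
    dx = abs(pos[0] - request[0])
    dy = abs(pos[1] - request[1])
    if dx <= dy:
        return (request[0], pos[1]), dx  # move horizontally into the request's column
    return (pos[0], request[1]), dy      # move vertically into the request's row


def total_cost(start, requests):
    """Total distance travelled when every request is served greedily, in order."""
    pos, cost = start, 0
    for request in requests:
        pos, distance = greedy_move(pos, request)
        assert serves(pos, request)
        cost += distance
    return cost
```

For example, starting at `(0, 0)` with requests `(3, 1)`, `(5, 5)` and `(3, 9)`, the greedy server moves 1, then 4, then 3 units, for a total of 8.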
Zonghuang YANG Yoshifumi NISHIO Akio USHIDA
The paper discusses the spatio-temporal phenomena in autonomous two-layer Cellular Neural Networks (CNNs) with mutually coupled templates between two layers. By computer calculations, we show how pattern formations, autowaves and classical waves can be regenerated in the networks, and describe the properties of these phenomena in detail. In particular, we focus our discussion on the necessary conditions for generating these spatio-temporal phenomena. In addition, the influences of the template parameters and initial state conditions of CNNs on the spatio-temporal phenomena are investigated.
Csaba REKECZKY Akio USHIDA Tamás ROSKA
Cellular Neural Networks (CNNs) are nonlinear dynamic array processors with mainly local interconnections. In most applications, the local interconnection pattern, called the cloning template, is translation invariant. In this paper, an optimal ring-coding method for the rotation-invariant description of a given set of objects is introduced. The design methodology of the templates based on the ring-codes, and the synthesis of CNN analogic algorithms to detect standing and moving objects in a rotationally invariant way, are discussed in detail. It is shown that the algorithms can be implemented using the CNN Universal Machine, the recently invented analogic visual microprocessor. The estimated time performance and the parallel detecting capability are emphasized, and the limitations are also thoroughly investigated.