Zhixian MA Jie ZHU Weitian LI Haiguang XU
Detection of cavities in X-ray astronomical images has become a field of interest, owing to the flourishing studies on black holes and active galactic nuclei (AGN). In this paper, an approach is proposed to detect cavities in X-ray astronomical images using our newly designed Granular Convolutional Neural Network (GCNN) based classifiers. The raw data are first preprocessed to obtain images of the observed objects, i.e., galaxies or galaxy clusters. In each image, pixels are classified into three categories: (1) the faint background (BKG), (2) the cavity regions (CAV), and (3) the bright central gas regions (CNT). Sample sets are then generated by dividing large images into subimages with a window size matched to the cavities' scale. Since the number of BKG samples far exceeds that of the other types, samples from the majority class are split into subsets, i.e., granules, to achieve balanced training sets. A group of three-convolutional-layer granular CNNs without subsampling layers is then designed as the classifiers and trained with the labeled granular sample sets. Finally, the trained GCNN classifiers are applied to new observations, so as to estimate the cavity regions with a voting strategy and locate them with elliptical profiles on the raw observation images. Experiments and applications of our approach are demonstrated on 40 X-ray astronomical observations retrieved from the Chandra Data Archive (CDA). Comparisons among our approach, β-model fitting, and the unsharp masking (UM) method show that our approach is more accurate and robust.
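The windowing and voting steps described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the window size, stride, and the assumption of overlapping windows with per-pixel majority voting are all hypothetical.

```python
import numpy as np

def extract_windows(image, win=16, stride=8):
    """Slide a win x win window over the image and collect subimages.
    The window and stride values here are illustrative only."""
    patches, coords = [], []
    h, w = image.shape
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patches.append(image[y:y + win, x:x + win])
            coords.append((y, x))
    return np.stack(patches), coords

def vote_pixel_labels(shape, coords, patch_labels, win=16):
    """Accumulate per-patch class votes (0=BKG, 1=CAV, 2=CNT) onto the
    pixels each patch covers, then take the majority class per pixel."""
    votes = np.zeros(shape + (3,), dtype=int)
    for (y, x), lbl in zip(coords, patch_labels):
        votes[y:y + win, x:x + win, lbl] += 1
    return votes.argmax(axis=-1)

img = np.random.rand(64, 64)
patches, coords = extract_windows(img)
labels = np.zeros(len(patches), dtype=int)  # stand-in for GCNN predictions
label_map = vote_pixel_labels(img.shape, coords, labels)
```

In practice the per-patch labels would come from the trained GCNN classifiers rather than the placeholder array used here.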
Yulong XU Yang LI Jiabao WANG Zhuang MIAO Hang LI Yafei ZHANG
A feature extractor plays an important role in visual tracking, but most state-of-the-art methods employ the same feature representation in all scenes. Taking this diversity into account, a tracker should choose different features according to the video. In this work, we propose a novel feature-adaptive correlation tracker, which decomposes the tracking task into translation and scale estimation. According to the luminance of the target, our approach automatically selects either hierarchical convolutional features or histogram of oriented gradient (HOG) features for translation estimation in varied scenarios. Furthermore, we employ a discriminative correlation filter to handle scale variations. Extensive experiments are performed on a large-scale challenging benchmark dataset, and the results show that the proposed algorithm outperforms state-of-the-art trackers in accuracy and robustness.
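The luminance-driven feature switch can be sketched as below. The decision rule and threshold are hypothetical stand-ins for whatever criterion the tracker actually uses; the abstract only states that the choice depends on target luminance.

```python
import numpy as np

def select_feature(target_patch, threshold=0.3):
    """Choose a feature type from the target's mean luminance
    (hypothetical rule and threshold, for illustration only):
    dark targets fall back to HOG, bright ones use deep CNN features."""
    luminance = float(target_patch.mean())
    return "hog" if luminance < threshold else "cnn"

dark = np.full((32, 32), 0.1)
bright = np.full((32, 32), 0.8)
```

A rule of this form keeps the expensive convolutional features for well-lit targets, where they are most discriminative, and uses cheaper gradient features otherwise.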
Shinya OHTANI Yu KATO Nobutaka KUROKI Tetsuya HIROSE Masahiro NUMA
This paper proposes image super-resolution techniques with multi-channel convolutional neural networks. In the proposed method, output pixels are classified into K×K groups depending on their coordinates. Those groups are generated from separate channels of a convolutional neural network (CNN). Finally, they are synthesized into a K×K magnified image. This architecture can enlarge images directly, without bicubic interpolation. Experimental results for 2×2, 3×3, and 4×4 magnification have shown that the average PSNR of the proposed method is about 0.2 dB higher than that of the conventional SRCNN.
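The final synthesis step, interleaving the K×K channel outputs into one magnified image, amounts to a depth-to-space rearrangement. A minimal sketch, assuming the channels are ordered row-major by output-pixel coordinate (an assumption, not stated in the abstract):

```python
import numpy as np

def synthesize(channels, K):
    """Interleave K*K low-resolution channel outputs into a single
    K-times magnified image (depth-to-space). channels: (K*K, h, w),
    where channel idx covers output pixels at offset divmod(idx, K)."""
    kk, h, w = channels.shape
    assert kk == K * K
    out = np.zeros((h * K, w * K), dtype=channels.dtype)
    for idx in range(kk):
        dy, dx = divmod(idx, K)          # sub-pixel offset of this group
        out[dy::K, dx::K] = channels[idx]
    return out

chans = np.arange(36, dtype=float).reshape(4, 3, 3)  # K=2, 3x3 inputs
hr = synthesize(chans, 2)                            # 6x6 output
```

Because each output pixel is produced by exactly one channel, the network never needs a bicubic upsampling stage.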
Guohao LYU Hui YIN Xinyan YU Siwei LUO
In this letter, an image restoration method based on local characteristics and a convolutional neural network is proposed. In this method, image restoration is treated as a classification problem, and images are divided into several sub-blocks. The convolutional neural network is used to extract and classify the local characteristics of the image sub-blocks, and different forms of regularization constraints are adopted for the different local characteristics. Experiments show that the restoration results of the regularization method based on local characteristics are superior to those of traditional regularization methods, and the method also has a lower computing cost.
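The per-class regularization idea can be sketched as a dispatcher from a sub-block's predicted class to a penalty functional. The pairing below (quadratic smoothness for smooth blocks, total variation for edge/texture blocks) is a hypothetical example of "different constraints for different characteristics", not the letter's actual choice.

```python
import numpy as np

def block_regularizer(block_class):
    """Map a sub-block's local-characteristic class (as predicted by the
    CNN) to a regularization penalty. The pairing is illustrative."""
    def tikhonov(u):
        # Quadratic smoothness: penalizes all gradients strongly.
        gy, gx = np.gradient(u)
        return float((gy ** 2 + gx ** 2).sum())
    def total_variation(u):
        # Edge-preserving: penalizes gradient magnitude linearly.
        gy, gx = np.gradient(u)
        return float(np.sqrt(gy ** 2 + gx ** 2).sum())
    return tikhonov if block_class == "smooth" else total_variation

flat = np.ones((4, 4))          # a constant sub-block
r_smooth = block_regularizer("smooth")(flat)
r_edge = block_regularizer("edge")(flat)
```

Each sub-block would then be restored by minimizing a data-fidelity term plus its class-specific penalty.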
Recent studies have obtained superior performance in image recognition tasks by using, as an image representation, the fully connected layer activations of convolutional neural networks (CNNs) trained on various kinds of images. However, the CNN representation is not very suitable for fine-grained image recognition tasks such as food image recognition. To improve the performance of the CNN representation in food image recognition, we propose a novel image representation comprised of the covariances of convolutional layer feature maps. In experiments on the ETHZ Food-101 dataset, our method achieved 58.65% average accuracy, which outperforms previous methods such as the Bag-of-Visual-Words histogram, the Improved Fisher Vector, and CNN-SVM.
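The covariance descriptor of convolutional feature maps can be computed as below. This is a generic sketch of the covariance-of-feature-maps idea; pooling details and any normalization used in the paper are not specified in the abstract.

```python
import numpy as np

def covariance_descriptor(feature_maps):
    """Covariance matrix of CNN convolutional feature maps.
    feature_maps: (C, H, W) activations for one image; each spatial
    position is treated as one C-dimensional observation.
    Returns a (C, C) symmetric descriptor."""
    C = feature_maps.shape[0]
    X = feature_maps.reshape(C, -1)            # rows: channels, cols: positions
    X = X - X.mean(axis=1, keepdims=True)      # center each channel
    return (X @ X.T) / (X.shape[1] - 1)        # sample covariance

fm = np.random.rand(8, 5, 5)   # toy activations: 8 channels, 5x5 maps
desc = covariance_descriptor(fm)
```

The descriptor captures pairwise channel correlations, which encode texture-like second-order statistics that fully connected activations discard.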
Recently, the ratio of probability density functions was demonstrated to be useful in solving various machine learning tasks such as outlier detection, non-stationarity adaptation, feature selection, and clustering. The key idea of this density ratio approach is that the ratio is estimated directly, so that difficult density estimation is avoided. So far, parametric and non-parametric direct density ratio estimators with various loss functions have been developed, and the kernel least-squares method has been demonstrated to be highly useful in terms of both accuracy and computational efficiency. On the other hand, recent studies in pattern recognition have shown that deep architectures such as convolutional neural networks can significantly outperform kernel methods. In this paper, we propose to use a convolutional neural network for density ratio estimation, and experimentally show that the proposed method tends to outperform the kernel-based method in outlying image detection.
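The kernel least-squares baseline mentioned above (uLSIF-style direct estimation) can be sketched as follows; the proposed method replaces the fixed kernel model with a CNN. Basis-center choice, bandwidth, and regularization strength here are illustrative assumptions.

```python
import numpy as np

def ulsif_ratio(x_de, x_nu, centers, sigma=1.0, lam=0.1):
    """Least-squares direct density-ratio estimation (uLSIF-style sketch).
    Models r(x) = sum_l a_l * K(x, c_l) with Gaussian kernels and fits
    the coefficients without ever estimating the two densities."""
    def K(X, C):
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    Phi_de = K(x_de, centers)                      # denominator samples
    Phi_nu = K(x_nu, centers)                      # numerator samples
    H = Phi_de.T @ Phi_de / len(x_de)
    h = Phi_nu.mean(axis=0)
    alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return lambda X: K(X, centers) @ alpha         # estimated ratio r(x)

rng = np.random.RandomState(0)
x_de = rng.randn(50, 2)
x_nu = rng.randn(50, 2)
r = ulsif_ratio(x_de, x_nu, x_nu[:10])
vals = r(x_de)
```

For outlier detection, samples with a small estimated ratio (rare under the numerator distribution) are flagged as outliers.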
Osamu NOMURA Takashi MORIE Keisuke KOREKADO Teppei NAKANO Masakazu MATSUGU Atsushi IWATA
Real-time object detection and recognition technology is becoming increasingly important for various intelligent vision systems. Processing models for object detection or recognition from natural images should tolerate pattern deformations and pattern position shifts. Hierarchical convolutional neural networks are considered a promising model for robust object detection/recognition. This model requires huge computational power for a large number of multiply-and-accumulate operations. In order to apply this model to robot vision or various intelligent real-time vision systems, its LSI implementation is essential. This paper proposes a new algorithm for reducing multiply-and-accumulate operations by sorting neuron outputs by magnitude. We also propose an LSI architecture based on this algorithm. As a proof of concept for our LSI architecture, we have designed, fabricated, and tested two test LSIs: a sorting LSI and an image-filtering LSI. The sorting LSI is designed based on content-addressable memory (CAM) circuit technology. The image-filtering LSI is designed for parallel processing by an analog circuit array based on the merged/mixed analog-digital approach. We have verified the validity of our LSI architecture by measuring the LSIs.
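The operation-reduction idea can be illustrated in software: sort the neuron outputs by magnitude and accumulate only the largest contributors, truncating the rest. The truncation count and the simple dot-product setting are illustrative; the paper's algorithm and its hardware realization are more involved.

```python
import numpy as np

def sorted_partial_mac(inputs, weights, keep):
    """Approximate a multiply-and-accumulate sum using only the `keep`
    inputs with the largest magnitude, selected by sorting. Small-magnitude
    terms are skipped, saving multiply-accumulate operations."""
    order = np.argsort(-np.abs(inputs))[:keep]
    return float(np.dot(inputs[order], weights[order]))

x = np.array([0.9, -0.05, 0.8, 0.01])
w = np.array([1.0, 1.0, 1.0, 1.0])
approx = sorted_partial_mac(x, w, keep=2)   # accumulates only 0.9 and 0.8
```

In hardware, the sorting itself is what the CAM-based LSI accelerates, so the truncated accumulation becomes cheap overall.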
Patrick LE CALLET Christian VIARD-GAUDIN Stephane PECHARD Emilie CAILLAULT
This paper describes an objective measurement method designed to assess the perceived quality of digital videos. The proposed approach can be used either in the context of reduced-reference quality assessment or in the more challenging situation where no reference is available. It can therefore be deployed in a QoS monitoring strategy to control the end-user perceived quality. The originality of the approach lies in the very limited computational resources involved; such a system could be integrated quite easily into a real-time application. It uses a convolutional neural network (CNN) that allows continuous-time scoring of the video. Experiments conducted on different MPEG-2 videos, with bit rates ranging from 2 to 6 Mbit/s, show the effectiveness of the proposed approach. More specifically, a linear correlation criterion between objective and subjective scoring ranging from 0.90 up to 0.95 has been obtained on a set of typical TV videos in the case of reduced-reference assessment. Without any reference to the original video, the correlation criterion remains quite satisfactory, still lying between 0.85 and 0.90, which is quite high with respect to the difficulty of the task and equal to, or in some cases better than, the traditional PSNR, which is a full-reference measurement.