1-3hit |
Chunxiao FAN Xiaopeng HONG Lei TIAN Yue MING Matti PIETIKÄINEN Guoying ZHAO
PCANet, as one noticeable shallow network, employs the histogram representation for feature pooling. However, there are three main problems about this kind of pooling method. First, the histogram-based pooling method binarizes the feature maps and leads to inevitable discriminative information loss. Second, it is difficult to effectively combine other visual cues into a compact representation, because the simple concatenation of various visual cues leads to feature representation inefficiency. Third, the dimensionality of histogram-based output grows exponentially with the number of feature maps used. In order to overcome these problems, we propose a novel shallow network model, named as PCANet-II. Compared with the histogram-based output, the second order pooling not only provides more discriminative information by preserving both the magnitude and sign of convolutional responses, but also dramatically reduces the size of output features. Thus we combine the second order statistical pooling method with the shallow network, i.e., PCANet. Moreover, it is easy to combine other discriminative and robust cues by using the second order pooling. So we introduce the binary feature difference encoding scheme into our PCANet-II to further improve robustness. Experiments demonstrate the effectiveness and robustness of our proposed PCANet-II method.
Yazhong ZHANG Jinjian WU Guangming SHI Xuemei XIE Yi NIU Chunxiao FAN
Reduced-reference (RR) image quality assessment (IQA) algorithm aims to automatically evaluate the distorted image quality with partial reference data. The goal of RR IQA metric is to achieve higher quality prediction accuracy using less reference information. In this paper, we introduce a new RR IQA metric by quantifying the difference of discrete cosine transform (DCT) entropy features between the reference and distorted images. Neurophysiological evidences indicate that the human visual system presents different sensitivities to different frequency bands. Moreover, distortions on different bands result in individual quality degradations. Therefore, we suggest to calculate the information degradation on each band separately for quality assessment. The information degradations are firstly measured by the entropy difference of reorganized DCT coefficients. Then, the entropy differences on all bands are pooled to obtain the quality score. Experimental results on LIVE, CSIQ, TID2008, Toyama and IVC databases show that the proposed method performs highly consistent with human perception with limited reference data (8 values).
Chunxiao FAN Yang LI Lei TIAN Yong LI
This letter proposes a representation learning framework of convolutional neural networks (Convnets) that aims to rectify and improve the feature representations learned by existing transformation-invariant methods. The existing methods usually encode feature representations invariant to a wide range of spatial transformations by augmenting input images or transforming intermediate layers. Unfortunately, simply transforming the intermediate feature maps may lead to unpredictable representations that are ineffective in describing the transformed features of the inputs. The reason is that the operations of convolution and geometric transformation are not exchangeable in most cases and so exchanging the two operations will yield the transformation error. The error may potentially harm the performance of the classification networks. Motivated by the fractal statistics of natural images, this letter proposes a rectifying transformation operator to minimize the error. The proposed method is differentiable and can be inserted into the convolutional architecture without making any modification to the optimization algorithm. We show that the rectified feature representations result in better classification performance on two benchmarks.