In this letter, we first study the impact of basic reference frame jitter on digital image stabilization. Next, we propose a method for stabilizing a digital image sequence based on correcting the basic reference frame jitter. The experimental results show that the proposed method effectively reduces the excessive undefined areas that basic reference frame jitter causes in the stabilized image sequence.
Hao GE Feng YANG Xiaoguang TU Mei XIE Zheng MA
Recently, numerous methods have been proposed to tackle the problem of fine-grained image classification. However, few of them focus on the pre-processing step of image alignment. In this paper, we propose a new pre-processing method that aims to reduce the variance of objects within the same class. As a result, the variance of objects between different classes becomes more significant. The proposed approach consists of four procedures. The “parts” of the objects are first located. After that, the rotation angle and the bounding box are obtained from the spatial relationship of the “parts”. Finally, all the images are resized to similar sizes. After being processed by the proposed method, the objects in the images possess the properties of translation, scale and rotation invariance. Experiments on the CUB-200-2011 and CUB-200-2010 datasets demonstrate that the proposed method boosts recognition performance when used as a pre-processing step for several popular classification algorithms.
In this letter, we analyze the influence of motion and out-of-focus blur on both the frequency spectrum and the cepstrum of an iris image. Based on their characteristics, we define two new discriminative blur features, represented by the Energy Spectral Density Distribution (ESDD) and the Singular Cepstrum Histogram (SCH). To merge the two features for blur detection, we propose a merging kernel, a linear combination of two kernels, for use with a Support Vector Machine. Extensive experiments demonstrate the validity of our method by showing improved blur detection performance on both synthetic and real datasets.
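The merging-kernel idea can be illustrated with a minimal sketch. The abstract only states that the kernel is a linear combination of two kernels; the choice of Gaussian RBF base kernels, the convex weighting, and the names `rbf_kernel`, `merged_kernel`, `gram_matrix`, `weight`, `gamma_esdd` and `gamma_sch` are assumptions made here for illustration, and the feature vectors are toy values rather than real ESDD or SCH features.

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def merged_kernel(esdd_a, sch_a, esdd_b, sch_b, weight=0.5,
                  gamma_esdd=1.0, gamma_sch=1.0):
    """Convex combination of one kernel per feature type (assumed form)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    k_esdd = rbf_kernel(esdd_a, esdd_b, gamma_esdd)
    k_sch = rbf_kernel(sch_a, sch_b, gamma_sch)
    return weight * k_esdd + (1.0 - weight) * k_sch

def gram_matrix(samples, **kwargs):
    """Gram matrix over (esdd, sch) pairs, usable as a precomputed SVM kernel."""
    return [[merged_kernel(ea, sa, eb, sb, **kwargs) for (eb, sb) in samples]
            for (ea, sa) in samples]

# Toy (ESDD, SCH) feature pairs; the values are made up for illustration.
samples = [([0.1, 0.9], [1.0, 0.0, 0.0]),
           ([0.2, 0.8], [0.9, 0.1, 0.0]),
           ([0.9, 0.1], [0.0, 0.2, 0.8])]
K = gram_matrix(samples, weight=0.7)
```

A convex combination of two positive semi-definite kernels is itself positive semi-definite, so a Gram matrix built this way can be passed to any SVM solver that accepts a precomputed kernel.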
In this paper, we propose a deep model for visual recognition, the hybrid KPCA Network (H-KPCANet), which combines a one-stage KPCANet and a two-stage KPCANet. The proposed model consists of four types of basic components: the input layer, the one-stage KPCANet, the two-stage KPCANet and the fusion layer. The role of the one-stage KPCANet is to calculate the KPCA filters for the convolution layer, and that of the two-stage KPCANet is to learn PCA filters in the first stage and KPCA filters in the second stage. After binary quantization mapping and block-wise histogramming, the features from the two types of KPCANets are fused in the fusion layer. The final feature of the input image is obtained by a weighted serial combination of the two types of features. The proposed algorithm is tested on digit recognition and object classification, and the experimental results on the MNIST and CIFAR-10 visual recognition benchmarks validate the performance of the proposed H-KPCANet.
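The output stage described above (binary quantization, block-wise histograms, weighted serial combination) can be sketched as follows. This follows the hashing step commonly used in PCANet-style networks, not necessarily the exact H-KPCANet procedure; the function names, the non-overlapping blocks and the single fusion weight `w` are assumptions for illustration.

```python
def binary_hash(feature_maps):
    """Binarize L same-sized response maps and pack them into one integer map."""
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    num_maps = len(feature_maps)
    hashed = [[0] * cols for _ in range(rows)]
    for index, fmap in enumerate(feature_maps):
        bit_weight = 2 ** (num_maps - 1 - index)
        for r in range(rows):
            for c in range(cols):
                if fmap[r][c] > 0:          # step function: positive -> bit set
                    hashed[r][c] += bit_weight
    return hashed

def block_histograms(hashed, block, num_bins):
    """Concatenate histograms of non-overlapping block x block tiles."""
    feature = []
    for r0 in range(0, len(hashed) - block + 1, block):
        for c0 in range(0, len(hashed[0]) - block + 1, block):
            hist = [0] * num_bins
            for r in range(r0, r0 + block):
                for c in range(c0, c0 + block):
                    hist[hashed[r][c]] += 1
            feature.extend(hist)
    return feature

def weighted_concat(feat_a, feat_b, w):
    """Weighted serial (concatenated) fusion; the weighting form is assumed."""
    return [w * x for x in feat_a] + [(1.0 - w) * x for x in feat_b]

# Two toy 2 x 4 filter responses (made-up values).
maps = [[[0.5, -0.2, 0.1, -0.3], [-0.1, 0.4, 0.2, -0.6]],
        [[-0.3, 0.7, 0.2, -0.1], [0.6, -0.5, 0.3, -0.2]]]
hashed = binary_hash(maps)                     # [[2, 1, 3, 0], [1, 2, 3, 0]]
feature = block_histograms(hashed, block=2, num_bins=4)
# feature == [0, 2, 2, 0, 2, 0, 0, 2]
```

With L response maps each hashed pixel takes a value in 0 to 2^L - 1, so `num_bins` should be 2^L.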
Lili PAN Qiangsen HE Yali ZHENG Mei XIE
Facial age estimation requires accurately capturing the mapping between facial features and the corresponding ages, so that ages can be precisely estimated for new input facial images. Previous works usually use a one-layer regression model to learn this complex mapping, resulting in low estimation accuracy. In this letter, we propose a new gender-specific regression model with a two-layer structure for more accurate age estimation. Unlike recent two-layer models that use a global regressor to calculate cumulative attributes (CA) and then use the CA to estimate age, we use gender-specific regressors to calculate the CA with more flexibility and precision. Extensive experimental results on the FG-NET and Morph 2 datasets demonstrate the superiority of our method over other state-of-the-art age estimation methods.
Xiaoguang TU Feng YANG Mei XIE Zheng MA
Numerous methods have been developed to handle lighting variations in the preprocessing step of face recognition. However, most of them use only the high-frequency information (edges, lines, corners, etc.) for recognition, as pixels lying in these areas have higher local variance values and are thus insensitive to illumination variations. In this case, low-frequency information may be discarded and some features that are helpful for recognition may be ignored. In this paper, we present a new and efficient method for illumination normalization using an energy minimization framework. The proposed method aims to remove the illumination field of the observed face images while simultaneously preserving the intrinsic facial features. The normalized face image and the illumination field are obtained by a reciprocal iteration scheme. Experiments on the CMU-PIE and Extended Yale B databases show that the proposed method preserves very good visual quality even on images with deep shadows and high-brightness regions, and obtains promising illumination normalization results for better face recognition performance.
The quality of the codebook is very important in visual image classification. To boost classification performance, this paper presents a codebook generation scheme for scene image recognition based on parallel key SIFT analysis (PKSA). The method iteratively applies the classical k-means clustering algorithm and similarity analysis to evaluate key SIFT descriptors (KSDs) from the input images, and generates the codebook from the set of KSDs with a relaxed k-means algorithm. To evaluate the performance of the PKSA scheme, the image feature vector is calculated by sparse coding with Spatial Pyramid Matching (ScSPM) after the codebook is constructed. The PKSA-based ScSPM method is tested and compared on three public scene image datasets. The experimental results show that the proposed PKSA scheme significantly reduces computation time and improves the categorization rate.
Shilei CHENG Song GU Maoquan YE Mei XIE
Human action recognition in videos draws great research interest in computer vision. The Bag-of-Words (BoW) model is commonly used to obtain video-level representations. However, the BoW model roughly assigns each feature vector to its nearest visual word, and the collection of unordered words ignores the spatial information of the interest points, which inevitably causes nontrivial quantization errors and limits improvements in classification rates. To address these drawbacks, we propose an approach to action recognition that encodes spatio-temporal log-Euclidean covariance matrix (ST-LECM) features within a low-rank and sparse representation framework. Motivated by low-rank matrix recovery, local descriptors in a spatio-temporal neighborhood have similar representations and should be approximately low rank. The learned coefficients not only capture the global data structures but also preserve consistency. Experimental results show that the proposed approach yields excellent recognition performance on synthetic video datasets and is robust to action variability, view variations and partial occlusion.
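A log-Euclidean covariance feature of the kind named above can be sketched in a few lines. This is the generic construction (sample covariance, matrix logarithm, vectorized upper triangle), assuming NumPy is available; how the paper selects spatio-temporal neighborhoods and which raw descriptors it uses is not stated in the abstract, and the name `log_euclidean_covariance` and the regularizer `eps` are assumptions.

```python
import numpy as np

def log_euclidean_covariance(descriptors, eps=1e-6):
    """Map a set of local descriptors (n x d) to a log-Euclidean covariance vector."""
    X = np.asarray(descriptors, dtype=float)
    cov = np.atleast_2d(np.cov(X, rowvar=False))        # d x d sample covariance
    cov = cov + eps * np.eye(X.shape[1])                # regularize so the log exists
    eigvals, eigvecs = np.linalg.eigh(cov)              # symmetric eigendecomposition
    log_cov = (eigvecs * np.log(eigvals)) @ eigvecs.T   # matrix logarithm V log(L) V^T
    rows, cols = np.triu_indices(log_cov.shape[0])
    # Scale off-diagonal entries by sqrt(2) so the vector norm equals the Frobenius norm.
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return log_cov[rows, cols] * scale

# Toy neighborhood of four 2-D descriptors (made-up values).
toy = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
vec = log_euclidean_covariance(toy)
```

The matrix logarithm maps covariance matrices into a flat vector space, so the resulting vectors can be compared with ordinary Euclidean distances and fed to a subsequent sparse or low-rank coding step.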
For face recognition with a single training image per person, Collaborative Representation based Classification (CRC) has significantly lower complexity than Extended Sparse Representation based Classification (ESRC). However, CRC achieves lower recognition rates than ESRC. To combine the advantages of CRC and ESRC, we propose Extended Collaborative Representation based Classification (ECRC) for face recognition with a single training image per person. ECRC constructs an auxiliary intraclass variant dictionary to represent the possible variation between the testing and training images. Experimental results show that ECRC outperforms the compared methods in terms of both recognition rate and computational complexity.
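The combination described above can be sketched as ridge-regularized coding over the gallery plus an auxiliary variant dictionary, followed by class-wise residuals. This is a minimal sketch assuming NumPy and the usual CRC closed form; the exact ECRC objective, any residual normalization, and the way the variant dictionary is built are not given in the abstract, so `ecrc_classify`, `lam` and the toy data are assumptions.

```python
import numpy as np

def ecrc_classify(gallery, labels, variants, probe, lam=0.01):
    """Classify `probe` by class-wise residuals after l2-regularized coding.

    gallery  : d x n matrix, one training image per column
    labels   : length-n class labels for the gallery columns
    variants : d x m auxiliary dictionary of intraclass variations
    """
    labels = np.asarray(labels)
    dictionary = np.hstack([gallery, variants])
    # Closed-form coding: (D^T D + lam I)^-1 D^T y
    gram = dictionary.T @ dictionary + lam * np.eye(dictionary.shape[1])
    coeffs = np.linalg.solve(gram, dictionary.T @ probe)
    n_gallery = gallery.shape[1]
    alpha, beta = coeffs[:n_gallery], coeffs[n_gallery:]
    variation = variants @ beta                 # variation shared by all classes
    residuals = {}
    for cls in np.unique(labels):
        mask = labels == cls
        recon = gallery[:, mask] @ alpha[mask] + variation
        residuals[cls] = np.linalg.norm(probe - recon)
    return min(residuals, key=residuals.get), residuals

# Toy data: two subjects with one image each and one variation atom.
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
variants = np.array([[0.0], [0.0], [1.0], [0.0]])
probe = np.array([1.0, 0.0, 0.5, 0.0])          # subject 0 plus some variation
predicted, residuals = ecrc_classify(gallery, [0, 1], variants, probe)
```

Because the l2-regularized code has a closed form, the projection matrix can be precomputed once for all probes, which is where the complexity advantage over l1-based coding comes from.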
k-NN classification has been applied to classify normal tissues in MR images. However, the intensity inhomogeneity of MR images causes conventional k-NN classification to make significant misclassification errors. This letter proposes a new interleaved method that combines k-NN classification and bias field estimation in an energy minimization framework, in order to overcome the misclassifications of conventional k-NN classification and simultaneously correct the bias field of the observed images. Experiments demonstrate the effectiveness and advantages of the proposed algorithm.
Yazhong ZHANG Jinjian WU Guangming SHI Xuemei XIE Yi NIU Chunxiao FAN
Reduced-reference (RR) image quality assessment (IQA) algorithms aim to automatically evaluate the quality of a distorted image with partial reference data. The goal of an RR IQA metric is to achieve higher quality prediction accuracy using less reference information. In this paper, we introduce a new RR IQA metric that quantifies the difference in discrete cosine transform (DCT) entropy features between the reference and distorted images. Neurophysiological evidence indicates that the human visual system has different sensitivities to different frequency bands. Moreover, distortions in different bands result in individual quality degradations. Therefore, we suggest calculating the information degradation in each band separately for quality assessment. The information degradations are first measured by the entropy difference of the reorganized DCT coefficients. Then, the entropy differences over all bands are pooled to obtain the quality score. Experimental results on the LIVE, CSIQ, TID2008, Toyama and IVC databases show that the proposed method is highly consistent with human perception while using limited reference data (8 values).
Template tracking has been extensively studied in computer vision and has a wide range of applications. A general framework is to construct a parametric model to predict movement and to track the target. The difference in intensity between the pixels belonging to the current region and the pixels of the selected target allows a straightforward prediction of the region position in the current image. Traditional methods track the object based on the assumption that the relationship between the intensity difference and the region position is either linear or non-linear. Tracking performance suffers when just one such model is adopted. This paper proposes a method called Mixture Hyperplanes Approximation, which is based on a finite mixture of generalized linear regression models, to perform robust tracking. Moreover, a fast learning strategy is discussed, which improves the robustness against noise. Experiments demonstrate the performance and stability of Mixture Hyperplanes Approximation.
This paper proposes a new theory and design method for a class of recombination nonuniform filter banks (RNFBs) with linear phase (LP) filters. In a uniform filter bank (FB), consecutive channels are merged by sets of transmultiplexers (TMUXs) to realize a nonuniform FB. RNFBs with LP analysis/synthesis filters are of great interest because the analysis filters for the partially reconstructed signals, obtained through merging, are LP, and hence less phase distortion is introduced into the desired signals. We analyze the spectrum supports of the analysis filters of these LP RNFBs. The conditions on the uniform FB and the recombination TMUXs for an LP RNFB with good frequency characteristics are determined. These conditions are relatively simple to satisfy, and the uniform FB and recombination TMUXs can be designed separately without much degradation in performance. This allows dynamic recombination of different numbers of channels in the original uniform FB to give a flexible and time-varying frequency partitioning. Using these results, we propose a method for designing a class of near-perfect-reconstruction (NPR) LP RNFBs with a cosine roll-off transition band using the REMEZ algorithm. A design example shows that LP RNFBs with good frequency responses and reasonably low reconstruction errors can be achieved.
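The band-wise entropy comparison can be illustrated with a rough sketch, assuming NumPy. The abstract does not specify how the DCT coefficients are reorganized into bands, how the entropies are estimated, or how the differences are pooled into the 8 reference values; here coefficients of 8 x 8 block DCTs are grouped by the sum of their frequency indices, entropies come from fixed-range histograms, and pooling is a plain sum of absolute differences. All of these choices and the function names are assumptions.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] = np.sqrt(1.0 / n)
    return basis

def band_entropies(image, block=8, bins=256, value_range=(-1024.0, 1024.0)):
    """Histogram entropy of block-DCT coefficients, grouped into bands by u + v."""
    img = np.asarray(image, dtype=float)
    h = (img.shape[0] // block) * block
    w = (img.shape[1] // block) * block
    tiles = img[:h, :w].reshape(h // block, block, w // block, block).swapaxes(1, 2)
    C = dct_matrix(block)
    coeffs = C @ tiles @ C.T                    # 2-D DCT of every tile at once
    u, v = np.indices((block, block))
    entropies = []
    for band in range(1, 2 * block - 1):        # skip the DC term (band 0)
        values = coeffs[..., (u + v) == band].ravel()
        hist, _ = np.histogram(values, bins=bins, range=value_range)
        p = hist[hist > 0] / hist.sum()
        entropies.append(float(-(p * np.log2(p)).sum()))
    return np.array(entropies)

def rr_quality_score(reference, distorted):
    """Sum of per-band entropy differences; 0 means identical band statistics."""
    return float(np.abs(band_entropies(reference) - band_entropies(distorted)).sum())

# Toy example: a horizontal ramp and a noisy copy (synthetic data).
rng = np.random.default_rng(0)
reference = np.tile(np.linspace(0.0, 255.0, 64), (64, 1))
distorted = reference + rng.normal(scale=20.0, size=reference.shape)
score_same = rr_quality_score(reference, reference)
score_noisy = rr_quality_score(reference, distorted)
```

In a reduced-reference setting only the output of `band_entropies(reference)` would be transmitted alongside the distorted image, which is what keeps the reference data small.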
Shilei CHENG Mei XIE Zheng MA Siqi LI Song GU Feng YANG
Characterizing videos simultaneously from spatial and temporal cues has been shown to be crucial for video processing. Because its soft assignment lacks temporal information, the vector of locally aggregated descriptors (VLAD) should be considered a suboptimal framework for learning spatio-temporal video representations. Motivated by the development of attention mechanisms in natural language processing, we present a novel model in which VLAD follows spatio-temporal self-attention operations, named spatio-temporal self-attention weighted VLAD (ST-SAWVLAD). In particular, sequential convolutional feature maps extracted from two modalities, i.e., RGB and Flow, are respectively fed into the self-attention module to learn soft spatio-temporal assignment parameters, which enables aggregating not only detailed spatial information but also fine motion information from successive video frames. In experiments, we evaluate ST-SAWVLAD on two competitive action recognition datasets, UCF101 and HMDB51, and the results show outstanding performance. The source code is available at: https://github.com/badstones/st-sawvlad.
Haoqi XIONG Jingjing GAO Chongjin ZHU Yanling LI Shu ZHANG Mei XIE
MR image segmentation is always a challenging problem because of intensity inhomogeneity. Many existing methods do not reach their expected segmentations; moreover, their implementations are usually complicated. Therefore, we interleave the extended Otsu segmentation with bias field estimation in an energy minimization framework. With the proposed method, the optimal segmentation and the bias field estimation are achieved simultaneously through reciprocal iteration. Applied to synthetic and real images, our method not only achieves the required classification but is also shown to be superior to the baseline methods by a performance analysis using JS metrics.