Wentao LYU Di ZHOU Chengqun WANG Lu ZHANG
In this paper, we present a novel discriminative dictionary learning (DDL) method for image classification. The local structural relationship between samples is first built by the Laplacian eigenmaps (LE), and then integrated into the basic DDL frame to suppress inter-class ambiguity in the feature space. Moreover, in order to improve the discriminative ability of the dictionary, the category label information of training samples is formulated into the objective function of dictionary learning by considering the discriminative promotion term. Thus, the data points of original samples are transformed into a new feature space, in which the points from different categories are expected to be far apart. The test results based on the real dataset indicate the effectiveness of this method.
Recently, the performances of discriminative correlation filter (CF) trackers are getting better and better in visual tracking. In this paper, we propose spatial-temporal regularization with precise state estimation based on discriminative correlation filter (STPSE) in order to achieve more significant tracking performance. First, we consider the continuous change of the object state, using the information from the previous two filters for training the correlation filter model. Here, we train the correlation filter model with the hand-crafted features. Second, we introduce update control in which average peak-to-correlation energy (APCE) and the distance between the object locations obtained by HOG features and hand-crafted features are utilized to detect abnormality of the state around the object. APCE and the distance indicate the reliability of the filter response, thus if abnormality is detected, the proposed method does not update the scale and the object location estimated by the filter response. In the experiment, our tracker (STPSE) achieves significant and real-time performance with only CPU for the challenging benchmark sequence (OTB2013, OTB2015, and TC128).
Pedestrian detection is a significant task in computer vision. In recent years, it is widely used in applications such as intelligent surveillance systems and automated driving systems. Although it has been exhaustively studied in the last decade, the occlusion handling issue still remains unsolved. One convincing idea is to first detect human body parts, and then utilize the parts information to estimate the pedestrians' existence. Many parts-based pedestrian detection approaches have been proposed based on this idea. However, in most of these approaches, the low-quality parts mining and the clumsy part detector combination is a bottleneck that limits the detection performance. To eliminate the bottleneck, we propose Discriminative Part CNN (DP-CNN). Our approach has two main contributions: (1) We propose a high-quality body parts mining method based on both convolutional layer features and body part subclasses. The mined part clusters are not only discriminative but also representative, and can help to construct powerful pedestrian detectors. (2) We propose a novel method to combine multiple part detectors. We convert the part detectors to a middle layer of a CNN and optimize the whole detection pipeline by fine-tuning that CNN. In experiments, it shows astonishing effectiveness of optimization and robustness of occlusion handling.
Danlei XING Fei WU Ying SUN Xiao-Yuan JING
Cross-project defect prediction (CPDP) is a feasible solution to build an accurate prediction model without enough historical data. Although existing methods for CPDP that use only labeled data to build the prediction model achieve great results, there are much room left to further improve on prediction performance. In this paper we propose a Semi-Supervised Discriminative Feature Learning (SSDFL) approach for CPDP. SSDFL first transfers knowledge of source and target data into the common space by using a fully-connected neural network to mine potential similarities of source and target data. Next, we reduce the differences of both marginal distributions and conditional distributions between mapped source and target data. We also introduce the discriminative feature learning to make full use of label information, which is that the instances from the same class are close to each other and the instances from different classes are distant from each other. Extensive experiments are conducted on 10 projects from AEEEM and NASA datasets, and the experimental results indicate that our approach obtains better prediction performance than baselines.
Peng GAO Yipeng MA Chao LI Ke SONG Yan ZHANG Fei WANG Liyi XIAO
Most state-of-the-art discriminative tracking approaches are based on either template appearance models or statistical appearance models. Despite template appearance models have shown excellent performance, they perform poorly when the target appearance changes rapidly. In contrast, statistic appearance models are insensitive to fast target state changes, but they yield inferior tracking results in challenging scenarios such as illumination variations and background clutters. In this paper, we propose an adaptive object tracking approach with complementary models based on template and statistical appearance models. Both of these models are unified via our novel combination strategy. In addition, we introduce an efficient update scheme to improve the performance of our approach. Experimental results demonstrate that our approach achieves superior performance at speeds that far exceed the frame-rate requirement on recent tracking benchmarks.
Wenming YANG Riqiang GAO Qingmin LIAO
This paper presents a strategy, Weighted Voting of Discriminative Regions (WVDR), to improve the face recognition performance, especially in Small Sample Size (SSS) and occlusion situations. In WVDR, we extract the discriminative regions according to facial key points and abandon the rest parts. Considering different regions of face make different contributions to recognition, we assign weights to regions for weighted voting. We construct a decision dictionary according to the recognition results of selected regions in the training phase, and this dictionary is used in a self-defined loss function to obtain weights. The final identity of test sample is the weighted voting of selected regions. In this paper, we combine the WVDR strategy with CRC and SRC separately, and extensive experiments show that our method outperforms the baseline and some representative algorithms.
Kisoo KWON Jong Won SHIN Nam Soo KIM
Nonnegative matrix factorization (NMF) is an unsupervised technique to represent nonnegative data as linear combinations of nonnegative bases, which has shown impressive performance for source separation. However, its source separation performance degrades when one signal can also be described well with the bases for the interfering source signals. In this paper, we propose a discriminative NMF (DNMF) algorithm which exploits the reconstruction error for the interfering signals as well as the target signal based on target bases. The objective function for training the bases is constructed so as to yield high reconstruction error for the interfering source signals while guaranteeing low reconstruction error for the target source signals. Experiments show that the proposed method outperformed the standard NMF and another DNMF method in terms of both the perceptual evaluation of speech quality score and signal-to-distortion ratio in various noisy environments.
Middle-level parts have attracted great attention in the computer vision community, acting as discriminative elements for objects. In this paper we propose an unsupervised approach to mine discriminative parts for object detection. This work features three aspects. First, we introduce an unsupervised, exemplar-based training process for part detection. We generate initial parts by selective search and then train part detectors by exemplar SVM. Second, a part selection model based on consistency and distinctiveness is constructed to select effective parts from the candidate pool. Third, we combine discriminative part mining with the deformable part model (DPM) for object detection. The proposed method is evaluated on the PASCAL VOC2007 and VOC2010 datasets. The experimental results demons-trate the effectiveness of our method for object detection.
Shuang LIU Zhong ZHANG Baihua XIAO Xiaozhong CAO
Texture feature descriptors such as local binary patterns (LBP) have proven effective for ground-based cloud classification. Traditionally, these texture feature descriptors are predefined in a handcrafted way. In this paper, we propose a novel method which automatically learns discriminative features from labeled samples for ground-based cloud classification. Our key idea is to learn these features through mutual information maximization which learns a transformation matrix for local difference vectors of LBP. The experimental results show that our learned features greatly improves the performance of ground-based cloud classification when compared to the other state-of-the-art methods.
An LIU Maoyin CHEN Donghua ZHOU
Robust crater recognition is a research focus on deep space exploration mission, and sparse representation methods can achieve desirable robustness and accuracy. Due to destruction and noise incurred by complex topography and varied illumination in planetary images, a robust crater recognition approach is proposed based on dictionary learning with a low-rank error correction model in a sparse representation framework. In this approach, all the training images are learned as a compact and discriminative dictionary. A low-rank error correction term is introduced into the dictionary learning to deal with gross error and corruption. Experimental results on crater images show that the proposed method achieves competitive performance in both recognition accuracy and efficiency.
For robust visual tracking, the main challenges of a subspace representation model can be attributed to the difficulty in handling various appearances of the target object. Traditional subspace learning tracking algorithms neglected the discriminative correlation between different multi-view target samples and the effectiveness of sparse subspace learning. For learning a better subspace representation model, we designed a discriminative graph to model both the labeled target samples with various appearances and the updated foreground and background samples, which are selected using an incremental updating scheme. The proposed discriminative graph structure not only can explicitly capture multi-modal intraclass correlations within labeled samples but also can obtain a balance between within-class local manifold and global discriminative information from foreground and background samples. Based on the discriminative graph, we achieved a sparse embedding by using L2,1-norm, which is incorporated to select relevant features and learn transformation in a unified framework. In a tracking procedure, the subspace learning is embedded into a Bayesian inference framework using compound motion estimation and a discriminative observation model, which significantly makes localization effective and accurate. Experiments on several videos have demonstrated that the proposed algorithm is robust for dealing with various appearances, especially in dynamically changing and clutter situations, and has better performance than alternatives reported in the recent literature.
Meixu SONG Jielin PAN Qingwei ZHAO Yonghong YAN
Introducing pronunciation models into decoding has been proven to be benefit to LVCSR. In this paper, a discriminative pronunciation modeling method is presented, within the framework of the Minimum Phone Error (MPE) training for HMM/GMM. In order to bring the pronunciation models into the MPE training, the auxiliary function is rewritten at word level and decomposes into two parts. One is for co-training the acoustic models, and the other is for discriminatively training the pronunciation models. On Mandarin conversational telephone speech recognition task, compared to the baseline using a canonical lexicon, the discriminative pronunciation models reduced the absolute Character Error Rate (CER) by 0.7% on LDC test set, and with the acoustic model co-training, 0.8% additional CER decrease had been achieved.
A discriminative reference-based method for scene image categorization is presented in this letter. Reference-based image classification approach combined with K-SVD is approved to be a simple, efficient, and effective method for scene image categorization. It learns a subspace as a means of randomly selecting a reference-set and uses it to represent images. A good reference-set should be both representative and discriminative. More specifically, the reference-set subspace should well span the data space while maintaining low redundancy. To automatically select reference images, we adapt affinity propagation algorithm based on data similarity to gather a reference-set that is both representative and discriminative. We apply the discriminative reference-based method to the task of scene categorization on some benchmark datasets. Extensive experiment results demonstrate that the proposed scene categorization method with selected reference set achieves better performance and higher efficiency compared to the state-of-the-art methods.
Yaohui QI Fuping PAN Fengpei GE Qingwei ZHAO Yonghong YAN
A smoothing method for minimum phone error linear regression (MPELR) is proposed in this paper. We show that the objective function for minimum phone error (MPE) can be combined with a prior mean distribution. When the prior mean distribution is based on maximum likelihood (ML) estimates, the proposed method is the same as the previous smoothing technique for MPELR. Instead of ML estimates, maximum a posteriori (MAP) parameter estimate is used to define the mode of prior mean distribution to improve the performance of MPELR. Experiments on a large vocabulary speech recognition task show that the proposed method can obtain 8.4% relative reduction in word error rate when the amount of data is limited, while retaining the same asymptotic performance as conventional MPELR. When compared with discriminative maximum a posteriori linear regression (DMAPLR), the proposed method shows improvement except for the case of limited adaptation data for supervised adaptation.
Keigo KUBO Sakriani SAKTI Graham NEUBIG Tomoki TODA Satoshi NAKAMURA
Grapheme-to-phoneme (g2p) conversion, used to estimate the pronunciations of out-of-vocabulary (OOV) words, is a highly important part of recognition systems, as well as text-to-speech systems. The current state-of-the-art approach in g2p conversion is structured learning based on the Margin Infused Relaxed Algorithm (MIRA), which is an online discriminative training method for multiclass classification. However, it is known that the aggressive weight update method of MIRA is prone to overfitting, even if the current example is an outlier or noisy. Adaptive Regularization of Weight Vectors (AROW) has been proposed to resolve this problem for binary classification. In addition, AROW's update rule is simpler and more efficient than that of MIRA, allowing for more efficient training. Although AROW has these advantages, it has not been applied to g2p conversion yet. In this paper, we first apply AROW on g2p conversion task which is structured learning problem. In an evaluation that employed a dataset generated from the collective knowledge on the Web, our proposed approach achieves a 6.8% error reduction rate compared to MIRA in terms of phoneme error rate. Also the learning time of our proposed approach was shorter than that of MIRA in almost datasets.
Xin LI Jielin PAN Qingwei ZHAO Yonghong YAN
Morphemes, which are obtained from morphological parsing, and statistical sub-words, which are derived from data-driven splitting, are commonly used as the recognition units for speech recognition of agglutinative languages. In this letter, we propose a discriminative approach to select the splitting result, which is more likely to improve the recognizer's performance, for each distinct word type. An objective function which involves the unigram language model (LM) probability and the count of misrecognized phones on the acoustic training data is defined and minimized. After determining the splitting result for each word in the text corpus, we select the frequent units to build a hybrid vocabulary including morphemes and statistical sub-words. Compared to a statistical sub-word based system, the hybrid system achieves 0.8% letter error rates (LERs) reduction on the test set.
Hongbo ZHANG Shaozi LI Songzhi SU Shu-Yuan CHEN
Many successful methods for recognizing human action are spatio-temporal interest point (STIP) based methods. Given a test video sequence, for a matching-based method using a voting mechanism, each test STIP casts a vote for each action class based on its mutual information with respect to the respective class, which is measured in terms of class likelihood probability. Therefore, two issues should be addressed to improve the accuracy of action recognition. First, effective STIPs in the training set must be selected as references for accurately estimating probability. Second, discriminative STIPs in the test set must be selected for voting. This work uses ε-nearest neighbors as effective STIPs for estimating the class probability and uses a variance filter for selecting discriminative STIPs. Experimental results verify that the proposed method is more accurate than existing action recognition methods.
Hiroto SAIGO Hisashi KASHIMA Koji TSUDA
Apriori-based mining algorithms enumerate frequent patterns efficiently, but the resulting large number of patterns makes it difficult to directly apply subsequent learning tasks. Recently, efficient iterative methods are proposed for mining discriminative patterns for classification and regression. These methods iteratively execute discriminative pattern mining algorithm and update example weights to emphasize on examples which received large errors in the previous iteration. In this paper, we study a family of loss functions that induces sparsity on example weights. Most of the resulting example weights become zeros, so we can eliminate those examples from discriminative pattern mining, leading to a significant decrease in search space and time. In computational experiments we compare and evaluate various loss functions in terms of the amount of sparsity induced and resulting speed-up obtained.
Akio KOBAYASHI Takahiro OKU Toru IMAI Seiichi NAKAGAWA
This paper describes a new method for semi-supervised discriminative language modeling, which is designed to improve the robustness of a discriminative language model (LM) obtained from manually transcribed (labeled) data. The discriminative LM is implemented as a log-linear model, which employs a set of linguistic features derived from word or phoneme sequences. The proposed semi-supervised discriminative modeling is formulated as a multi-objective optimization programming problem (MOP), which consists of two objective functions defined on both labeled lattices and automatic speech recognition (ASR) lattices as unlabeled data. The objectives are coherently designed based on the expected risks that reflect information about word errors for the training data. The model is trained in a discriminative manner and acquired as a solution to the MOP problem. In transcribing Japanese broadcast programs, the proposed method reduced relatively a word error rate by 6.3% compared with that achieved by a conventional trigram LM.
Takanobu OBA Takaaki HORI Atsushi NAKAMURA Akinori ITO
This paper describes a technique for overcoming the model shrinkage problem in automatic speech recognition (ASR), which allows application developers and users to control the model size with less degradation of accuracy. Recently, models for ASR systems tend to be large and this can constitute a bottleneck for developers and users without special knowledge of ASR with respect to introducing the ASR function. Specifically, discriminative language models (DLMs) are usually designed in a high-dimensional parameter space, although DLMs have gained increasing attention as an approach for improving recognition accuracy. Our proposed method can be applied to linear models including DLMs, in which the score of an input sample is given by the inner product of its features and the model parameters, but our proposed method can shrink models in an easy computation by obtaining simple statistics, which are square sums of feature values appearing in a data set. Our experimental results show that our proposed method can shrink a DLM with little degradation in accuracy and perform properly whether or not the data for obtaining the statistics are the same as the data for training the model.