Karolina NURZYNSKA Mamoru KUBO Ken-ichiro MURAMOTO
This study presents three image processing systems for classifying snow particles as snowflakes or graupel. All of them are based on feature classification; the novelty is that each exploits multiple features. Each system is also characterized by a different data flow. To compare their performance, we consider not only various features but also different classifiers. The best results are obtained when the snowflake discrimination method is applied before the statistical classifier, where the correct classification ratio reaches 94%. In the other cases the best results are around 88%.
This paper presents a combined feature extraction method to improve the performance of bag-of-features image classification. We apply 10 relevant operations to global/local statistics of visual words. Because the number of pairwise combinations of visual words is large, we apply feature selection methods, including the Fisher discriminant criterion and L1-SVM. The effectiveness of the proposed method is confirmed experimentally.
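As a rough illustration of feature selection by a Fisher discriminant criterion, the sketch below ranks feature columns by a two-class Fisher score (squared difference of class means over the sum of class variances) and keeps the top k. The function names, the two-class setting, and the toy data are assumptions made for this example; the abstract does not specify the exact criterion or how it is combined with L1-SVM.

```python
from statistics import mean, pvariance

def fisher_score(pos, neg):
    """Two-class Fisher score of a single feature (assumed form)."""
    between = (mean(pos) - mean(neg)) ** 2
    within = pvariance(pos) + pvariance(neg)
    if within == 0:
        # Degenerate case: no within-class spread at all.
        return float("inf") if between > 0 else 0.0
    return between / within

def select_top_features(X, y, k):
    """Rank feature columns by Fisher score and return the k best indices."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((fisher_score(pos, neg), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]

# Toy data (hypothetical): column 0 separates the classes far better
# than columns 1 and 2.
X = [[0.1, 5.0, 1.0], [0.2, 4.0, 1.1], [0.9, 5.5, 1.0], [1.0, 4.5, 0.9]]
y = [0, 0, 1, 1]
best = select_top_features(X, y, 1)
```

A filter criterion of this kind scores each feature independently, which keeps the cost linear in the number of candidate features.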
Walaa ALY Seiichi UCHIDA Masakazu SUZUKI
Machine recognition of mathematical expressions in printed documents is not trivial even when all the individual characters and symbols in an expression can be recognized correctly. This paper presents a method for automatically classifying the spatial relationship between a pair of adjacent symbols. This classification is important for realizing an accurate structure analysis module for math OCR. Experimental results on very large databases showed that the classification worked well, with an accuracy of 99.525%, by using distribution maps defined by two geometric features, relative size and relative position, with careful treatment of document-dependent characteristics.
Mohammad Nurul HUDA Hiroaki KAWASHIMA Tsuneo NITTA
This paper describes a distinctive phonetic feature (DPF) extraction method for use in a phoneme recognition system; our method has a low computation cost. This method comprises three stages. The first stage uses two multilayer neural networks (MLNs): MLNLF-DPF, which maps continuous acoustic features, or local features (LFs), onto discrete DPF features, and MLNDyn, which constrains the DPF context at the phoneme boundaries. The second stage incorporates inhibition/enhancement (In/En) functionalities to discriminate whether the DPF dynamic patterns of trajectories are convex or concave, where convex patterns are enhanced and concave patterns are inhibited. The third stage decorrelates the DPF vectors using the Gram-Schmidt orthogonalization procedure before feeding them into a hidden Markov model (HMM)-based classifier. In an experiment on Japanese Newspaper Article Sentences (JNAS) utterances, the proposed feature extractor, which incorporates two MLNs and an In/En network, was found to provide a higher phoneme correct rate with fewer mixture components in the HMMs.
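The third stage relies on Gram-Schmidt orthogonalization. The sketch below shows the generic procedure (in its modified, numerically more stable form) on plain Python lists. How the procedure is applied to DPF vectors is not described in the abstract, so the function name, the tolerance `eps`, and the example vectors are assumptions for illustration only.

```python
import math

def gram_schmidt(vectors, eps=1e-12):
    """Return an orthonormal basis spanning the inputs (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            # Remove the component of w along each basis vector found so far.
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > eps:
            # Vectors that depend linearly on earlier ones are skipped.
            basis.append([wi / norm for wi in w])
    return basis

# Hypothetical input: the third vector is the sum of the first two,
# so only two basis vectors are produced.
basis = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [2.0, 1.0, 1.0]])
```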
Abdul JALIL Anwar MANZAR Tanweer A. CHEEMA Ijaz M. QURESHI
A rotation-invariant texture analysis technique is proposed that combines the Radon Transform (RT) and Hidden Markov Models (HMMs) in a novel way. Texture features are extracted by the RT, which inherently captures the directional properties of a texture. HMMs are used for classification. One HMM is trained for each texture on its feature vector, which preserves the rotational invariance of the feature vector in a more compact and useful form. Once all the HMMs have been trained, testing is done by presenting any of these textures at an arbitrary orientation. The best percentage of correct classification (PCC) is above 98%, obtained on sixty textures from the Brodatz album.
Akinori HIDAKA Kenji NISHIDA Takio KURITA
In this paper, we propose a novel classifier-based object tracker. Our tracker combines a Rectangle Feature (RF) based detector [17],[18] with an optical-flow based tracking method [1]. We show that the gradient of extended RFs can be calculated rapidly by using the Integral Image method. The proposed tracker was tested on real video sequences in face tracking and car tracking experiments. It ran at over 100 fps while maintaining accuracy comparable to that of the RF based detector. The tracking routine alone, excluding image I/O processing, runs at about 500 to 2,500 fps with sufficient tracking accuracy.
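The speed of rectangle features comes from the integral image (summed-area table), which turns any rectangle sum into four lookups. The sketch below shows only that building block and a simple two-rectangle feature; it does not reproduce the gradient computation for extended RFs described in the abstract. All function names and the toy image are assumptions for this example.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def two_rect_feature(ii, top, left, height, width):
    """Left-half sum minus right-half sum (a Haar-like two-rectangle feature)."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum

# Hypothetical 3x4 image.
img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
ii = integral_image(img)
```

Once the table is built, the cost of a rectangle sum no longer depends on the rectangle's size, which is what makes evaluating many features per frame feasible.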
Three features for classifying images into natural objects and artifacts are investigated: 'line length ratio', 'line direction distribution', and 'edge coverage'. Among the three, 'line length ratio' shows superior classification accuracy (above 90%), exceeding the performance of conventional features in experiments on digital camera images. Because the development of this feature was motivated by the fact that the edge sharpening magnitude in image-quality improvement must be controlled according to the image content, this classification algorithm should be especially suitable for image-quality improvement applications.
In this paper we propose an efficient line feature-based 2D object recognition algorithm using a novel entropy correspondence measure (ECM) that encodes the probabilistic similarity between two line feature sets. Since the proposed ECM-based method uses the whole structural information of objects simultaneously for matching, it overcomes the common drawbacks of conventional techniques that are based on feature-to-feature correspondence. Moreover, since ECM is probabilistic in nature, it performs robustly in noisy environments. To enhance recognition performance and speed, line features are pre-clustered into several groups according to their inclination by an eigen analysis, and ECM is then applied to each corresponding group individually. Experimental results on real images demonstrate that, in noisy environments, the proposed algorithm outperforms conventional algorithms in both accuracy and computational efficiency.
In this paper, we propose a learning classifier based on maximum entropy (ME) for resolving zero-anaphora in Chinese text. Besides regular grammatical, lexical, positional and semantic features motivated by previous research on anaphora resolution, we develop two innovative Web-based features for extracting additional semantic information from the Web. The values of the two features can be obtained easily by querying the Web using some patterns. Our study shows that our machine learning approach is able to achieve an accuracy comparable to that of state-of-the-art systems. The Web as a knowledge source can be incorporated effectively into the ME learning framework and significantly improves the performance of our approach.
This paper presents an object tracking method that is robust to pose variation and partial occlusion. In practical environments, the appearance of objects changes dynamically because of pose variation or partial occlusion, so robustness to these changes is required for practical applications. However, it is difficult for a single tracking model to be robust to all such changes. Therefore, slight robustness to variations and ease of model update are required. For this purpose, Kernel Principal Component Analysis (KPCA) of local parts is used. KPCA of local parts was originally proposed for pose-independent object recognition. Training is performed using local parts cropped from only one or two object images. This is a good property for tracking because only one target image is given in practical applications. In addition, the model (subspace) can be updated easily by solving an eigenvalue problem. The performance of the proposed method is evaluated on a test face sequence captured under pose, partial occlusion, scaling, and illumination variations. The effectiveness and robustness of the proposed method are demonstrated by comparison with a template-matching-based tracker. An adaptive update rule that uses similarity with the current subspace is also proposed, and its effectiveness is shown experimentally.
Kazuya HARAGUCHI Toshihide IBARAKI
We consider the classification problem to construct a classifier c:{0,1}^n
Zhibin PAN Koji KOTANI Tadahiro OHMI
The encoding process of vector quantization (VQ) is a time bottleneck that prevents its practical application. To speed up VQ encoding, it is very effective to use lower-dimensional features of a vector to estimate how large the Euclidean distance between the input vector and a candidate codeword could be, so that most unlikely codewords can be rejected. Three popular statistical features, the mean, the variance, and the L2 norm of a vector, have already been adopted individually in previous works. Recently, these three features were combined to derive a sequential EEENNS search method in [6], which is very efficient but still has obvious computational redundancy. This Letter gives a further mathematical analysis of the results of the EEENNS method and points out that it is actually unnecessary to use the L2 norm feature in fast VQ encoding if the mean and the variance are used simultaneously, as proposed in the IEENNS method. In other words, the L2 norm feature is redundant for a rejection test in fast VQ encoding. Experimental results demonstrated an approximately 10-20% reduction of the total computational cost for various detailed images when the L2 norm feature is not used, which confirms the correctness of the mathematical analysis.
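A rejection test that uses the mean and variance together can be sketched as follows. For k-dimensional vectors x and y with means m_x, m_y and mean-removed norms v_x, v_y, the squared Euclidean distance satisfies d²(x, y) ≥ k(m_x − m_y)² + (v_x − v_y)², so a codeword whose bound already exceeds the best distance found so far can be skipped. This is a minimal sketch under assumptions: "variance" is taken here to mean the norm of the mean-removed vector, the function names and toy codebook are hypothetical, and the code is not the exact EEENNS or IEENNS procedure.

```python
import math

def stats(v):
    """Mean and 'variance' feature (norm of the mean-removed vector)."""
    m = sum(v) / len(v)
    dev = math.sqrt(sum((x - m) ** 2 for x in v))
    return m, dev

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fast_nearest(x, codebook, cb_stats):
    """Nearest-codeword search with a joint mean/variance rejection test."""
    k = len(x)
    mx, vx = stats(x)
    best_idx, best_d2, full_computations = -1, float("inf"), 0
    for i, (y, (my, vy)) in enumerate(zip(codebook, cb_stats)):
        # Lower bound on d^2(x, y) from the two precomputed features.
        bound = k * (mx - my) ** 2 + (vx - vy) ** 2
        if bound >= best_d2:
            continue  # cannot beat the current best; skip the full distance
        full_computations += 1
        d2 = sq_dist(x, y)
        if d2 < best_d2:
            best_idx, best_d2 = i, d2
    return best_idx, full_computations

# Hypothetical codebook; codeword statistics are computed once, offline.
codebook = [[0, 0, 0, 0], [10, 10, 10, 10], [1, 2, 3, 4], [100, 90, 110, 100]]
cb_stats = [stats(y) for y in codebook]
x = [1, 2, 3, 5]
idx, n_full = fast_nearest(x, codebook, cb_stats)
```

In this toy run only two of the four codewords require a full distance computation, and the winner matches an exhaustive search.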
Shingo YOSHIZAWA Noboru HAYASAKA Naoya WADA Yoshikazu MIYANAGA
This paper describes a noise robustness technique that normalizes the cepstral amplitude range in order to remove the influence of additive noise. Additive noise causes speech feature mismatches between testing and training environments and it degrades recognition accuracy in noisy environments. We presume an approximate model that expresses the influence by changing the amplitude range and the DC component in the log-spectra. According to this model, we propose a cepstral amplitude range normalization (CARN) that normalizes the cepstral distance between maximum and minimum values. It can estimate noise robust features without prior knowledge or adaptation. We evaluated its performance in an isolated word recognition task by using the Noisex92 database. Compared with the combinations of conventional methods, the CARN could improve recognition accuracy under various SNR conditions.
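The idea of normalizing the range between the maximum and minimum cepstral values can be illustrated loosely as below. This is only a sketch under strong assumptions: it rescales each cepstral coefficient track over an utterance to span [0, 1], and it models noise as a range compression plus a constant offset. The actual CARN formulation is not given in the abstract, and the function name and data are hypothetical.

```python
def range_normalize(frames):
    """Scale each cepstral coefficient track to span [0, 1] over the utterance."""
    n_coeffs = len(frames[0])
    lows = [min(f[j] for f in frames) for j in range(n_coeffs)]
    highs = [max(f[j] for f in frames) for j in range(n_coeffs)]
    out = []
    for f in frames:
        out.append([
            (f[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_coeffs)
        ])
    return out

# Hypothetical clean features: 3 frames, 2 cepstral coefficients each.
clean = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]
# Assumed distortion: compressed amplitude range plus a constant offset.
noisy = [[0.5 * c + 4.0 for c in frame] for frame in clean]

norm_clean = range_normalize(clean)
norm_noisy = range_normalize(noisy)
```

Under this simplified distortion model the two normalized sequences coincide, which is the kind of mismatch reduction the abstract describes; real additive noise will not be removed this cleanly.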
In speech enhancement with an adaptive microphone array, voice activity detection (VAD) is indispensable for adaptation control. Although many VAD methods have been proposed as pre-processors for speech recognition and compression, they can hardly discriminate the nonstationary interferences that frequently exist in real environments. In this research, we propose a novel VAD method with array signal processing in the wavelet domain. In that domain we can integrate temporal, spectral, and spatial information to achieve robust voice activity discrimination against a nonstationary interference arriving from a direction close to that of the speech. The signals acquired by the microphone array are first decomposed into appropriate subbands using wavelet packets to extract their temporal and spectral features. A directionality check and direction estimation are then executed on each subband to perform VAD with respect to the spatial information. Computer simulation results for sound data demonstrate that the proposed method retains its discriminability even for interference arriving from a direction close to that of the speech.
Synergies in processing requirements and knowledge of human speech production and perception have led to a similarity of the speech signal representations used for the tasks of recognition, coding, and modification. The representations are generally composed of a description of the vocal-tract transfer function and, in the case of coding and modification, a description of the excitation signal. This paper provides an overview of commonly used representations. For coding and modification, autoregressive models represented by line spectral frequencies perform well for the vocal tract, and pitch-synchronous filter banks and modulation-domain filters perform well for the excitation. For recognition, good representations are based on a smoothed magnitude response of the vocal tract.
In the last three decades of the 20th Century, research in speech recognition has been intensively carried out worldwide, spurred on by advances in signal processing, algorithms, architectures, and hardware. Recognition systems have been developed for a wide variety of applications, ranging from small vocabulary keyword recognition over dial-up telephone lines, to medium size vocabulary voice interactive command and control systems for business automation, to large vocabulary speech dictation, spontaneous speech understanding, and limited-domain speech translation. Although we have witnessed many new technological promises, we have also encountered a number of practical limitations that hinder a widespread deployment of applications and services. On one hand, fast progress was observed in statistical speech and language modeling. On the other hand only spotty successes have been reported in applying knowledge sources in acoustics, speech and language science to improving speech recognition performance and robustness to adverse conditions. In this paper we review some key advances in several areas of speech recognition. A bottom-up detection framework is also proposed to facilitate worldwide research collaboration for incorporating technology advances in both statistical modeling and knowledge integration into going beyond the current speech recognition limitations and benefiting the society in the 21st century.
Konstantin MARKOV Tomoko MATSUI Rainer GRUHN Jinsong ZHANG Satoshi NAKAMURA
This paper presents the ATR speech recognition system designed for the DARPA SPINE2 evaluation task. The system is capable of dealing with speech from highly variable, real-world noisy conditions and communication channels. A number of robust techniques are implemented, such as differential spectrum mel-scale cepstrum features, on-line MLLR adaptation, and word-level hypothesis combination, which led to a significant reduction in the word error rate.
Tin Lay NWE Say Wei FOO Liyanage C. DE SILVA
In research to determine reliable acoustic indicators of the type of stress present in speech, the majority of systems have concentrated on statistics extracted from the pitch contour, energy contour, wavelet-based subband features, and Teager-Energy-Operator (TEO) based feature parameters. These systems work mostly on pair-wise distinction between stressed and neutral speech. Their performance decreases substantially when tested in multi-style detection among many stress categories. In this paper, a novel system is proposed using linear short-time Log Frequency Power Coefficients (LFPC) and TEO based nonlinear LFPC features in both the time and frequency domains. A five-state Hidden Markov Model (HMM) with continuous Gaussian mixture distributions is used. The stress classification ability of the system is tested using data from the SUSAS (Speech Under Simulated and Actual Stress) database to categorize five stress conditions individually. The linear acoustic LFPC features are found to perform better than the nonlinear TEO based LFPC feature parameters. Results show that with the linear LFPC features, an average accuracy of 84% and a best accuracy of 95% can be achieved in the classification of the five categories. Tests of the system under different signal-to-noise conditions show that its performance does not degrade drastically as noise increases. It is also observed that classification using nonlinear frequency-domain LFPC features gives relatively higher accuracy than classification using nonlinear time-domain LFPC features.
Kazuhito KOISHIDA Keiichi TOKUDA Takashi MASUKO Takao KOBAYASHI
This paper proposes a vector quantization scheme which makes it possible to consider the dynamics of input vectors. In the proposed scheme, a linear transformation is applied to the consecutive input vectors and the resulting vector is quantized with a distortion measure defined by the statistics. At the decoder side, the output vector sequence is determined using the statistics associated with the transmitted indices in such a way that a likelihood is maximized. To solve the maximization problem, a computationally efficient algorithm is derived. The performance of the proposed method is evaluated in LSP parameter quantization. It is found that the LSP trajectories and the corresponding spectra change quite smoothly in the proposed method. It is also shown that the use of the proposed method results in a significant improvement of subjective quality.
Kazuhiro HOTTA Taketoshi MISHIMA Takio KURITA
This paper presents a scale invariant face detection and classification method which uses shift invariant features extracted from a Log-Polar image. Scale changes of a face in an image are represented as shift along the horizontal axis in the Log-Polar image. In order to obtain scale invariant features, shift invariant features are extracted from each row of the Log-Polar image. Autocorrelations, Fourier spectrum, and PARCOR coefficients are used as shift invariant features. These features are then combined with simple classification methods based on Linear Discriminant Analysis to realize scale invariant face detection and classification. The effectiveness of the proposed face detection method is confirmed by experiments using face images captured under different scales, backgrounds, illuminations, and dates. To evaluate the proposed face classification method, we performed experiments using 2,800 face images with 7 scales under 2 different backgrounds and face images of 52 persons.
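The link between shift and scale invariance can be illustrated with one of the listed features, the Fourier spectrum: the magnitude of the discrete Fourier transform of a row does not change when the row is shifted. The sketch below checks this for a circular shift, which is an assumption; a scale change in a real Log-Polar image produces a non-circular shift, so the invariance holds only approximately there. Function names and the sample row are hypothetical.

```python
import cmath

def dft_magnitude(row):
    """Magnitude of the discrete Fourier transform of a 1D sequence."""
    n = len(row)
    return [
        abs(sum(row[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]

def circular_shift(row, s):
    """Shift a sequence to the right by s positions with wrap-around."""
    s %= len(row)
    return row[-s:] + row[:-s] if s else list(row)

# Hypothetical row of a Log-Polar image and a shifted copy of it.
row = [0.0, 1.0, 4.0, 2.0, 0.0, 0.0, 3.0, 1.0]
shifted = circular_shift(row, 3)

spectrum = dft_magnitude(row)
spectrum_shifted = dft_magnitude(shifted)
```

A shift only changes the phase of each Fourier coefficient, so the two magnitude spectra agree up to rounding error.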