
Keyword Search Result

[Keyword] features(84hit)

61-80hit(84hit)

  • 2D Feature Space for Snow Particle Classification into Snowflake and Graupel

    Karolina NURZYNSKA  Mamoru KUBO  Ken-ichiro MURAMOTO  

     
    PAPER-Pattern Recognition

      Vol:
    E93-D No:12
      Page(s):
    3344-3351

    This study presents three image processing systems for snow particle classification into snowflake and graupel. All of them are based on feature classification, yet as a novelty in all cases multiple features are exploited. Additionally, each of them is characterized by a different data flow. In order to compare the performances, we not only consider various features, but also suggest different classifiers. The best achieved results are for the snowflake discrimination method applied before statistical classifier, as the correct classification ratio in this case reaches 94%. In other cases the best results are around 88%.

  • Extraction of Combined Features from Global/Local Statistics of Visual Words Using Relevant Operations

    Tetsu MATSUKAWA  Takio KURITA  

     
    LETTER-Image Recognition, Computer Vision

      Vol:
    E93-D No:10
      Page(s):
    2870-2874

    This paper presents a combined feature extraction method to improve the performance of bag-of-features image classification. We apply 10 relevant operations to global/local statistics of visual words. Because the number of pairwise combinations of visual words is large, we apply feature selection methods including the Fisher discriminant criterion and L1-SVM. The effectiveness of the proposed method is confirmed experimentally.

  • Automatic Classification of Spatial Relationships among Mathematical Symbols Using Geometric Features

    Walaa ALY  Seiichi UCHIDA  Masakazu SUZUKI  

     
    PAPER-Pattern Recognition

      Vol:
    E92-D No:11
      Page(s):
    2235-2243

    Machine recognition of mathematical expressions on printed documents is not trivial even when all the individual characters and symbols in an expression can be recognized correctly. In this paper, an automatic classification method of spatial relationships between the adjacent symbols in a pair is presented. This classification is important to realize an accurate structure analysis module of math OCR. Experimental results on very large databases showed that this classification worked well with an accuracy of 99.525% by using distribution maps which are defined by two geometric features, relative size and relative position, with careful treatment on document-dependent characteristics.

  • Distinctive Phonetic Feature (DPF) Extraction Based on MLNs and Inhibition/Enhancement Network

    Mohammad Nurul HUDA  Hiroaki KAWASHIMA  Tsuneo NITTA  

     
    PAPER-Speech and Hearing

      Vol:
    E92-D No:4
      Page(s):
    671-680

    This paper describes a distinctive phonetic feature (DPF) extraction method for use in a phoneme recognition system; our method has a low computation cost. This method comprises three stages. The first stage uses two multilayer neural networks (MLNs): MLN_{LF-DPF}, which maps continuous acoustic features, or local features (LFs), onto discrete DPF features, and MLN_{Dyn}, which constrains the DPF context at the phoneme boundaries. The second stage incorporates inhibition/enhancement (In/En) functionalities to discriminate whether the DPF dynamic patterns of trajectories are convex or concave, where convex patterns are enhanced and concave patterns are inhibited. The third stage decorrelates the DPF vectors using the Gram-Schmidt orthogonalization procedure before feeding them into a hidden Markov model (HMM)-based classifier. In an experiment on Japanese Newspaper Article Sentences (JNAS) utterances, the proposed feature extractor, which incorporates two MLNs and an In/En network, was found to provide a higher phoneme correct rate with fewer mixture components in the HMMs.
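The third stage above relies on Gram-Schmidt orthogonalization. The abstract does not say how the procedure is applied to the DPF vectors, so the following is only a generic sketch of the textbook procedure (in its modified, numerically more stable form) on plain Python lists. The function name `gram_schmidt` and the tolerance `eps` are assumptions made for illustration, not the authors' implementation.

```python
import math

def gram_schmidt(vectors, eps=1e-12):
    """Return an orthonormal basis spanning the input vectors (modified Gram-Schmidt).

    Hypothetical helper for illustration; not taken from the paper.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            # Remove the component of w that lies along an already accepted basis vector.
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > eps:
            # Keep only directions that are not (numerically) linearly dependent.
            basis.append([wi / norm for wi in w])
    return basis

basis = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [2.0, 1.0, 1.0]])
print(len(basis))  # the third input is the sum of the first two, so it adds no new direction
```

The resulting vectors are mutually orthogonal and of unit length, which is the property a decorrelation step would depend on.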

  • New Rotation-Invariant Texture Analysis Technique Using Radon Transform and Hidden Markov Models

    Abdul JALIL  Anwar MANZAR  Tanweer A. CHEEMA  Ijaz M. QURESHI  

     
    LETTER-Computer Graphics

      Vol:
    E91-D No:12
      Page(s):
    2906-2909

    A rotation-invariant texture analysis technique is proposed that combines the Radon Transform (RT) and Hidden Markov Models (HMMs) in a novel way. Texture features are extracted by the RT, which by its inherent properties captures all the directional properties of a texture. HMMs are used for classification. One HMM is trained for each texture on its feature vector, which preserves the rotational invariance of the feature vector in a more compact and useful form. Once all the HMMs have been trained, testing is done by picking any of these textures at an arbitrary orientation. The best percentage of correct classification (PCC) is above 98%, obtained on sixty textures from the Brodatz album.

  • Object Tracking by Maximizing Classification Score of Detector Based on Rectangle Features

    Akinori HIDAKA  Kenji NISHIDA  Takio KURITA  

     
    PAPER-Image Recognition, Computer Vision

      Vol:
    E91-D No:8
      Page(s):
    2163-2170

    In this paper, we propose a novel classifier-based object tracker. Our tracker combines a Rectangle Feature (RF) based detector [17],[18] with an optical-flow based tracking method [1]. We show that the gradient of extended RFs can be calculated rapidly by using the Integral Image method. The proposed tracker was tested on real video sequences in face tracking and car tracking experiments. It ran at over 100 fps while maintaining accuracy comparable to that of the RF based detector. The tracking routine alone, excluding image I/O processing, runs at about 500 to 2,500 fps with sufficient tracking accuracy.
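The Integral Image method mentioned above makes the sum over any axis-aligned rectangle a four-lookup operation, which is what keeps rectangle features cheap. The sketch below shows only that basic mechanism on a small list-of-lists image; it does not reproduce the gradient computation for extended RFs described in the paper. All function names are hypothetical.

```python
def integral_image(img):
    """Build a zero-padded table where ii[y][x] is the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def two_rect_feature(ii, top, left, height, width):
    """Left half minus right half: a simple horizontal two-rectangle feature."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))          # 6 + 7 + 10 + 11 = 34
print(two_rect_feature(ii, 0, 0, 3, 4))  # 33 - 45 = -12
```

The table is built once per frame; after that, the cost of each feature is independent of the rectangle size.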

  • Natural Object/Artifact Image Classification Based on Line Features

    Johji TAJIMA  Hironori KONO  

     
    LETTER-Image Recognition, Computer Vision

      Vol:
    E91-D No:8
      Page(s):
    2207-2211

    Three features for image classification into natural objects and artifacts are investigated. They are 'line length ratio', 'line direction distribution,' and 'edge coverage'. Among the three, the feature 'line length ratio' shows superior classification accuracy (above 90%) that exceeds the performance of conventional features, according to experimental results in application to digital camera images. As the development of this feature was motivated by the fact that the edge sharpening magnitude in image-quality improvement must be controlled based on the image content, this classification algorithm should be especially suitable for the image-quality improvement applications.

  • Structural Object Recognition Using Entropy Correspondence Measure of Line Features

    San KO  Kyoung Mu LEE  

     
    PAPER-Image Recognition, Computer Vision

      Vol:
    E91-D No:1
      Page(s):
    78-85

    In this paper we propose an efficient line feature-based 2D object recognition algorithm using a novel entropy correspondence measure (ECM) that encodes the probabilistic similarity between two line feature sets. Since the proposed ECM-based method uses the whole structural information of objects simultaneously for matching, it overcomes the common drawbacks of the conventional techniques that are based on feature to feature correspondence. Moreover, since ECM is endowed with probabilistic attribute, it shows quite robust performance in the noisy environment. In order to enhance the recognition performance and speed, line features are pre-clustered into several groups according to their inclination by an eigen analysis, and then ECM is applied to each corresponding group individually. Experimental results on real images demonstrate that the proposed algorithm has superior performance to those of the conventional algorithms in both the accuracy and the computational efficiency, in the noisy environment.

  • Zero-Anaphora Resolution in Chinese Using Maximum Entropy

    Jing PENG  Kenji ARAKI  

     
    PAPER-Natural Language Processing

      Vol:
    E90-D No:7
      Page(s):
    1092-1102

    In this paper, we propose a learning classifier based on maximum entropy (ME) for resolving zero-anaphora in Chinese text. Besides regular grammatical, lexical, positional and semantic features motivated by previous research on anaphora resolution, we develop two innovative Web-based features for extracting additional semantic information from the Web. The values of the two features can be obtained easily by querying the Web using some patterns. Our study shows that our machine learning approach is able to achieve an accuracy comparable to that of state-of-the-art systems. The Web as a knowledge source can be incorporated effectively into the ME learning framework and significantly improves the performance of our approach.

  • A Robust Object Tracking Method under Pose Variation and Partial Occlusion

    Kazuhiro HOTTA  

     
    PAPER-Tracking

      Vol:
    E89-D No:7
      Page(s):
    2132-2141

    This paper presents a robust object tracking method under pose variation and partial occlusion. In practical environments, the appearance of objects changes dynamically because of pose variation or partial occlusion, so robustness to these changes is required for practical applications. However, it is difficult to be robust to various changes with only one tracking model. Therefore, slight robustness to variations and ease of model update are required. For this purpose, Kernel Principal Component Analysis (KPCA) of local parts is used. KPCA of local parts was originally proposed for pose-independent object recognition. Training of this method is performed by using local parts cropped from only one or two object images. This is a good property for tracking because only one target image is given in practical applications. In addition, the model (subspace) of this method can be updated easily by solving an eigenvalue problem. The performance of the proposed method is evaluated by using a test face sequence captured under pose, partial occlusion, scaling, and illumination variations. The effectiveness and robustness of the proposed method are demonstrated by comparison with a template matching based tracker. In addition, an adaptive update rule using similarity with the current subspace is also proposed. The effectiveness of the adaptive update rule is shown by experiment.

  • Construction of Classifiers by Iterative Compositions of Features with Partial Knowledge

    Kazuya HARAGUCHI  Toshihide IBARAKI  

     
    PAPER

      Vol:
    E89-A No:5
      Page(s):
    1284-1291

    We consider the classification problem to construct a classifier $c:\{0,1\}^n \to \{0,1\}$ from a given set of examples (training set), which (approximately) realizes the hidden oracle $y:\{0,1\}^n \to \{0,1\}$ describing the phenomenon under consideration. For this problem, a number of approaches are already known in computational learning theory; e.g., decision trees, support vector machines (SVM), and iteratively composed features (ICF). The last one, ICF, was proposed in our previous work (Haraguchi et al., (2004)). A feature, composed of a nonempty subset $S$ of other features (including the original data attributes), is a Boolean function $f_S:\{0,1\}^S \to \{0,1\}$ and is constructed according to the proposed rule. The ICF algorithm iterates generation and selection processes of features, and finally adopts one of the generated features as the classifier, where the generation process may be considered as embodying the idea of boosting, since new features are generated from the available features. In this paper, we generalize a feature to an extended Boolean function $f_S:\{0,1,*\}^S \to \{0,1,*\}$ to allow partial knowledge, where $*$ denotes the state of uncertainty. We then propose the algorithm ICF* to generate such generalized features. The selection process of ICF* is also different from that of ICF, in that features are selected so as to cover the entire training set. Our computational experiments indicate that ICF* is better than ICF in terms of both classification performance and computation time. Also, it is competitive with other representative learning algorithms such as decision trees and SVM.

  • Performance Comparison between Equal-Average Equal-Variance Equal-Norm Nearest Neighbor Search (EEENNS) Method and Improved Equal-Average Equal-Variance Nearest Neighbor Search (IEENNS) Method for Fast Encoding of Vector Quantization

    Zhibin PAN  Koji KOTANI  Tadahiro OHMI  

     
    LETTER-Image Processing and Video Processing

      Vol:
    E88-D No:9
      Page(s):
    2218-2222

    The encoding process of vector quantization (VQ) is a time bottleneck that prevents its practical application. To speed up VQ encoding, it is very effective to use lower dimensional features of a vector to estimate how large the Euclidean distance between the input vector and a candidate codeword could be, so as to reject most unlikely codewords. Three popular statistical features of a vector, namely the average (mean), the variance, and the L2 norm, have already been adopted individually in previous works. Recently, these three statistical features were combined to derive a sequential EEENNS search method in [6], which is very efficient but still has obvious computational redundancy. This Letter gives a further mathematical analysis of the results of the EEENNS method and points out that it is actually unnecessary to use the L2 norm feature in fast VQ encoding if the mean and the variance are used simultaneously, as proposed in the IEENNS method. In other words, the L2 norm feature is redundant for a rejection test in fast VQ encoding. Experimental results demonstrated an approximate 10-20% reduction of the total computational cost for various detailed images when the L2 norm feature is not used, which confirms the correctness of the mathematical analysis.
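The rejection idea above can be sketched with a single lower bound built from the mean and the variance. For k-dimensional vectors x and c with means m_x, m_c and deviations v = sqrt(sum((x_i - m)^2)), splitting each vector into its mean and its mean-removed part and applying the triangle inequality gives d^2(x, c) >= k(m_x - m_c)^2 + (v_x - v_c)^2. The code below uses this bound in a plain linear scan. It is a simplified illustration, not the sequential EEENNS or IEENNS procedure from the Letter, and the function names are hypothetical.

```python
import math

def stats(v):
    """Return (mean, deviation), where deviation = sqrt(sum((v_i - mean)^2))."""
    m = sum(v) / len(v)
    dev = math.sqrt(sum((x - m) ** 2 for x in v))
    return m, dev

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def encode(x, codebook):
    """Nearest-codeword search that skips codewords whose lower bound cannot beat the best so far."""
    k = len(x)
    mx, vx = stats(x)
    cb_stats = [stats(c) for c in codebook]  # would be precomputed offline in practice
    best_i, best_d, skipped = -1, float("inf"), 0
    for i, c in enumerate(codebook):
        mc, vc = cb_stats[i]
        # Lower bound on the squared distance from the mean and deviation only.
        if k * (mx - mc) ** 2 + (vx - vc) ** 2 >= best_d:
            skipped += 1
            continue
        d = sq_dist(x, c)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d, skipped

codebook = [[0, 0, 0, 0], [10, 10, 10, 10], [1, 2, 3, 4], [100, 90, 80, 70]]
print(encode([1, 2, 3, 5], codebook))
```

Because the bound never exceeds the true squared distance, the search returns the same winner as a full search while skipping the full distance computation for rejected codewords.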

  • Cepstral Amplitude Range Normalization for Noise Robust Speech Recognition

    Shingo YOSHIZAWA  Noboru HAYASAKA  Naoya WADA  Yoshikazu MIYANAGA  

     
    PAPER-Speech and Hearing

      Vol:
    E87-D No:8
      Page(s):
    2130-2137

    This paper describes a noise robustness technique that normalizes the cepstral amplitude range in order to remove the influence of additive noise. Additive noise causes speech feature mismatches between testing and training environments and it degrades recognition accuracy in noisy environments. We presume an approximate model that expresses the influence by changing the amplitude range and the DC component in the log-spectra. According to this model, we propose a cepstral amplitude range normalization (CARN) that normalizes the cepstral distance between maximum and minimum values. It can estimate noise robust features without prior knowledge or adaptation. We evaluated its performance in an isolated word recognition task by using the Noisex92 database. Compared with the combinations of conventional methods, the CARN could improve recognition accuracy under various SNR conditions.
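The abstract does not give the CARN formula, so the following is only one plausible reading of "normalizing the distance between maximum and minimum values": each cepstral dimension is shifted and scaled so that its range over the utterance maps to 0..1. The function name `range_normalize` and the choice of target range are assumptions for illustration, not the authors' formulation.

```python
def range_normalize(frames, eps=1e-12):
    """Map each cepstral dimension's min..max range over the utterance to 0..1.

    frames: list of per-frame cepstral vectors (lists of floats of equal length).
    Hypothetical sketch; the paper's actual normalization may differ.
    """
    dims = len(frames[0])
    lo = [min(f[d] for f in frames) for d in range(dims)]
    hi = [max(f[d] for f in frames) for d in range(dims)]
    return [[(f[d] - lo[d]) / max(hi[d] - lo[d], eps) for d in range(dims)]
            for f in frames]

clean = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]
# Simulate a per-dimension range compression plus a constant offset.
distorted = [[0.5 * a + 4.0, 0.25 * b - 1.0] for a, b in clean]
print(range_normalize(clean))
print(range_normalize(distorted))
```

Under this toy distortion model (a positive per-dimension scale and an offset), both inputs normalize to the same values, which illustrates why removing range and offset differences can reduce a mismatch between training and testing conditions.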

  • Voice Activity Detection with Array Signal Processing in the Wavelet Domain

    Yusuke HIOKA  Nozomu HAMADA  

     
    PAPER-Engineering Acoustics

      Vol:
    E86-A No:11
      Page(s):
    2802-2811

    In speech enhancement with an adaptive microphone array, voice activity detection (VAD) is indispensable for adaptation control. Even though many VAD methods have been proposed as pre-processors for speech recognition and compression, they can hardly discriminate nonstationary interferences, which frequently exist in real environments. In this research, we propose a novel VAD method with array signal processing in the wavelet domain. In that domain we can integrate temporal, spectral, and spatial information to achieve robust voice activity discriminability for a nonstationary interference arriving from a direction close to that of the speech. The signals acquired by the microphone array are first decomposed into appropriate subbands using a wavelet packet to extract their temporal and spectral features. Then a directionality check and direction estimation are executed on each subband to perform VAD with respect to the spatial information. Computer simulation results for sound data demonstrate that the proposed method keeps its discriminability even for interference arriving from a direction close to that of the speech.

  • Signal Processing Representations of Speech

    W. Bastiaan KLEIJN  

     
    INVITED SURVEY PAPER

      Vol:
    E86-D No:3
      Page(s):
    359-376

    Synergies in processing requirements and knowledge of human speech production and perception have led to a similarity of the speech signal representations used for the tasks of recognition, coding, and modification. The representations are generally composed of a description of the vocal-tract transfer function and, in the case of coding and modification, a description of the excitation signal. This paper provides an overview of commonly used representations. For coding and modification, autoregressive models represented by line spectral frequencies perform well for the vocal tract, and pitch-synchronous filter banks and modulation-domain filters perform well for the excitation. For recognition, good representations are based on a smoothed magnitude response of the vocal tract.

  • On Automatic Speech Recognition at the Dawn of the 21st Century

    Chin-Hui LEE  

     
    INVITED SURVEY PAPER

      Vol:
    E86-D No:3
      Page(s):
    377-396

    In the last three decades of the 20th Century, research in speech recognition has been intensively carried out worldwide, spurred on by advances in signal processing, algorithms, architectures, and hardware. Recognition systems have been developed for a wide variety of applications, ranging from small vocabulary keyword recognition over dial-up telephone lines, to medium size vocabulary voice interactive command and control systems for business automation, to large vocabulary speech dictation, spontaneous speech understanding, and limited-domain speech translation. Although we have witnessed many new technological promises, we have also encountered a number of practical limitations that hinder a widespread deployment of applications and services. On one hand, fast progress was observed in statistical speech and language modeling. On the other hand only spotty successes have been reported in applying knowledge sources in acoustics, speech and language science to improving speech recognition performance and robustness to adverse conditions. In this paper we review some key advances in several areas of speech recognition. A bottom-up detection framework is also proposed to facilitate worldwide research collaboration for incorporating technology advances in both statistical modeling and knowledge integration into going beyond the current speech recognition limitations and benefiting the society in the 21st century.

  • Noise and Channel Distortion Robust ASR System for DARPA SPINE2 Task

    Konstantin MARKOV  Tomoko MATSUI  Rainer GRUHN  Jinsong ZHANG  Satoshi NAKAMURA  

     
    PAPER-Robust Speech Recognition and Enhancement

      Vol:
    E86-D No:3
      Page(s):
    497-504

    This paper presents the ATR speech recognition system designed for the DARPA SPINE2 evaluation task. The system is capable of dealing with speech from highly variable, real-world noisy conditions and communication channels. A number of robust techniques are implemented, such as differential spectrum mel-scale cepstrum features, on-line MLLR adaptation, and word-level hypothesis combination, which led to a significant reduction in the word error rate.

  • Stress Classification Using Subband Based Features

    Tin Lay NWE  Say Wei FOO  Liyanage C. DE SILVA  

     
    PAPER-Speech Synthesis and Prosody

      Vol:
    E86-D No:3
      Page(s):
    565-573

    In research to determine reliable acoustic indicators for the type of stress present in speech, the majority of systems have concentrated on statistics extracted from the pitch contour, the energy contour, wavelet based subband features, and Teager-Energy-Operator (TEO) based feature parameters. These systems work mostly on pair-wise distinction between stressed and neutral speech. Their performance decreases substantially when tested in multi-style detection among many stress categories. In this paper, a novel system is proposed using linear short time Log Frequency Power Coefficients (LFPC) and TEO based nonlinear LFPC features in both the time and frequency domains. A five-state Hidden Markov Model (HMM) with continuous Gaussian mixture distribution is used. The stress classification ability of the system is tested using data from the SUSAS (Speech Under Simulated and Actual Stress) database to categorize five stress conditions individually. It is found that the performance of the linear acoustic LFPC features is better than that of the nonlinear TEO based LFPC feature parameters. Results show that with the linear acoustic LFPC features, an average accuracy of 84% and a best accuracy of 95% can be achieved in the classification of the five categories. Tests of the system under different signal-to-noise conditions show that its performance does not degrade drastically as noise increases. It is also observed that classification using nonlinear frequency domain LFPC features gives relatively higher accuracy than classification using nonlinear time domain LFPC features.

  • Vector Quantization of Speech Spectral Parameters Using Statistics of Static and Dynamic Features

    Kazuhito KOISHIDA  Keiichi TOKUDA  Takashi MASUKO  Takao KOBAYASHI  

     
    PAPER-Speech and Hearing

      Vol:
    E84-D No:10
      Page(s):
    1427-1434

    This paper proposes a vector quantization scheme which makes it possible to consider the dynamics of input vectors. In the proposed scheme, a linear transformation is applied to the consecutive input vectors and the resulting vector is quantized with a distortion measure defined by the statistics. At the decoder side, the output vector sequence is determined using the statistics associated with the transmitted indices in such a way that a likelihood is maximized. To solve the maximization problem, a computationally efficient algorithm is derived. The performance of the proposed method is evaluated in LSP parameter quantization. It is found that the LSP trajectories and the corresponding spectra change quite smoothly in the proposed method. It is also shown that the use of the proposed method results in a significant improvement of subjective quality.

  • Scale Invariant Face Detection and Classification Method Using Shift Invariant Features Extracted from Log-Polar Image

    Kazuhiro HOTTA  Taketoshi MISHIMA  Takio KURITA  

     
    PAPER

      Vol:
    E84-D No:7
      Page(s):
    867-878

    This paper presents a scale invariant face detection and classification method which uses shift invariant features extracted from a Log-Polar image. Scale changes of a face in an image are represented as shift along the horizontal axis in the Log-Polar image. In order to obtain scale invariant features, shift invariant features are extracted from each row of the Log-Polar image. Autocorrelations, Fourier spectrum, and PARCOR coefficients are used as shift invariant features. These features are then combined with simple classification methods based on Linear Discriminant Analysis to realize scale invariant face detection and classification. The effectiveness of the proposed face detection method is confirmed by experiments using face images captured under different scales, backgrounds, illuminations, and dates. To evaluate the proposed face classification method, we performed experiments using 2,800 face images with 7 scales under 2 different backgrounds and face images of 52 persons.
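The reason a Fourier spectrum works as a shift invariant feature can be checked directly: shifting a sequence changes only the phase of its discrete Fourier transform, not the magnitude. The sketch below verifies this for a circular shift of a single row. It is an illustration under simplifying assumptions, since a real Log-Polar row is shifted without wrap-around and the paper's exact feature extraction is not given in the abstract; the function names are hypothetical.

```python
import cmath

def dft_magnitude(row):
    """Magnitude of the discrete Fourier transform of a 1-D sequence."""
    n = len(row)
    return [abs(sum(row[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def circular_shift(row, s):
    """Shift a sequence right by s positions with wrap-around."""
    s %= len(row)
    return row[-s:] + row[:-s]

row = [1.0, 4.0, 2.0, 0.0, 3.0, 5.0, 0.0, 1.0]
original = dft_magnitude(row)
shifted = dft_magnitude(circular_shift(row, 3))
# The largest difference between the two spectra is at floating-point noise level.
print(max(abs(a - b) for a, b in zip(original, shifted)))
```

Applied row by row, a feature of this kind stays the same when the row content slides along the horizontal axis, which is how a scale change appears in the Log-Polar image according to the abstract.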
