IEICE global.ieice.org Site

Keyword Search Result

[Keyword] features(84hit)

61-80hit(84hit)

2D Feature Space for Snow Particle Classification into Snowflake and Graupel
Karolina NURZYNSKA Mamoru KUBO Ken-ichiro MURAMOTO

PAPER-Pattern Recognition

Vol: E93-D No: 12, Page(s): 3344-3351
This study presents three image processing systems for snow particle classification into snowflake and graupel. All of them are based on feature classification, yet as a novelty in all cases multiple features are exploited. Additionally, each of them is characterized by a different data flow. In order to compare the performances, we not only consider various features, but also suggest different classifiers. The best achieved results are for the snowflake discrimination method applied before statistical classifier, as the correct classification ratio in this case reaches 94%. In other cases the best results are around 88%.
Extraction of Combined Features from Global/Local Statistics of Visual Words Using Relevant Operations
Tetsu MATSUKAWA Takio KURITA

LETTER-Image Recognition, Computer Vision

Vol: E93-D No: 10, Page(s): 2870-2874
This paper presents a combined feature extraction method to improve the performance of bag-of-features image classification. We apply 10 relevant operations to global/local statistics of visual words. Because the number of pairwise combinations of visual words is large, we apply feature selection methods including the Fisher discriminant criterion and L1-SVM. The effectiveness of the proposed method is confirmed through experiments.
Automatic Classification of Spatial Relationships among Mathematical Symbols Using Geometric Features
Walaa ALY Seiichi UCHIDA Masakazu SUZUKI

PAPER-Pattern Recognition

Vol: E92-D No: 11, Page(s): 2235-2243
Machine recognition of mathematical expressions on printed documents is not trivial even when all the individual characters and symbols in an expression can be recognized correctly. In this paper, an automatic classification method of spatial relationships between the adjacent symbols in a pair is presented. This classification is important to realize an accurate structure analysis module of math OCR. Experimental results on very large databases showed that this classification worked well with an accuracy of 99.525% by using distribution maps which are defined by two geometric features, relative size and relative position, with careful treatment on document-dependent characteristics.
Distinctive Phonetic Feature (DPF) Extraction Based on MLNs and Inhibition/Enhancement Network
Mohammad Nurul HUDA Hiroaki KAWASHIMA Tsuneo NITTA

PAPER-Speech and Hearing

Vol: E92-D No: 4, Page(s): 671-680
This paper describes a distinctive phonetic feature (DPF) extraction method for use in a phoneme recognition system; our method has a low computation cost. This method comprises three stages. The first stage uses two multilayer neural networks (MLNs): MLN_LF-DPF, which maps continuous acoustic features, or local features (LFs), onto discrete DPF features, and MLN_Dyn, which constrains the DPF context at the phoneme boundaries. The second stage incorporates inhibition/enhancement (In/En) functionalities to discriminate whether the DPF dynamic patterns of trajectories are convex or concave, where convex patterns are enhanced and concave patterns are inhibited. The third stage decorrelates the DPF vectors using the Gram-Schmidt orthogonalization procedure before feeding them into a hidden Markov model (HMM)-based classifier. In an experiment on Japanese Newspaper Article Sentences (JNAS) utterances, the proposed feature extractor, which incorporates two MLNs and an In/En network, was found to provide a higher phoneme correct rate with fewer mixture components in the HMMs.
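The third stage mentioned above relies on Gram-Schmidt orthogonalization. The abstract does not say how the procedure is applied to DPF vectors, so the following is only a minimal sketch of the generic procedure on plain lists of numbers; the function name `gram_schmidt` and the tolerance `eps` are illustrative assumptions, not part of the paper.

```python
def gram_schmidt(vectors, eps=1e-12):
    """Return an orthonormal basis for the span of `vectors` (generic sketch)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            # Remove the component of w that lies along the basis vector b.
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = sum(wi * wi for wi in w) ** 0.5
        # Skip vectors that are (numerically) linearly dependent on the basis.
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


if __name__ == "__main__":
    for b in gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]):
        print(b)
```

The output vectors have unit length and are mutually orthogonal, which is the sense in which the input components are decorrelated here.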
New Rotation-Invariant Texture Analysis Technique Using Radon Transform and Hidden Markov Models
Abdul JALIL Anwar MANZAR Tanweer A. CHEEMA Ijaz M. QURESHI

LETTER-Computer Graphics

Vol: E91-D No: 12, Page(s): 2906-2909
A rotation invariant texture analysis technique is proposed with a novel combination of Radon Transform (RT) and Hidden Markov Models (HMM). Features of any texture are extracted during RT, which, due to its inherent property, captures all the directional properties of a certain texture. HMMs are used for classification. One HMM is trained for each texture on its feature vector, which preserves the rotational invariance of the feature vector in a more compact and useful form. Once all the HMMs have been trained, testing is done by picking any of these textures at any arbitrary orientation. The best percentage of correct classification (PCC) is above 98%, obtained on sixty textures of the Brodatz album.
Object Tracking by Maximizing Classification Score of Detector Based on Rectangle Features
Akinori HIDAKA Kenji NISHIDA Takio KURITA

PAPER-Image Recognition, Computer Vision

Vol: E91-D No: 8, Page(s): 2163-2170
In this paper, we propose a novel classifier-based object tracker. Our tracker is the combination of Rectangle Feature (RF) based detector [17],[18] and optical-flow based tracking method [1]. We show that the gradient of extended RFs can be calculated rapidly by using Integral Image method. The proposed tracker was tested on real video sequences. We applied our tracker for face tracking and car tracking experiments. Our tracker worked over 100 fps while maintaining comparable accuracy to RF based detector. Our tracking routine that does not contain image I/O processing can be performed about 500 to 2,500 fps with sufficient tracking accuracy.
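The Integral Image method mentioned above makes any rectangle sum available in four table lookups. The sketch below shows only that standard building block and a simple two-rectangle feature; it does not reproduce the paper's extended RFs or their gradient, and the names `integral_image`, `rect_sum`, and `two_rect_feature` are hypothetical.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of pixels in the rectangle, using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Left half minus right half of the window (width assumed even)."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    table = integral_image(image)
    print(rect_sum(table, 1, 1, 2, 2))
    print(two_rect_feature(table, 0, 0, 3, 2))
```

Because the cost of `rect_sum` does not depend on the rectangle size, features of this kind can be evaluated at many positions and scales cheaply.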
Natural Object/Artifact Image Classification Based on Line Features
Johji TAJIMA Hironori KONO

LETTER-Image Recognition, Computer Vision

Vol: E91-D No: 8, Page(s): 2207-2211
Three features for image classification into natural objects and artifacts are investigated. They are 'line length ratio', 'line direction distribution,' and 'edge coverage'. Among the three, the feature 'line length ratio' shows superior classification accuracy (above 90%) that exceeds the performance of conventional features, according to experimental results in application to digital camera images. As the development of this feature was motivated by the fact that the edge sharpening magnitude in image-quality improvement must be controlled based on the image content, this classification algorithm should be especially suitable for the image-quality improvement applications.
Structural Object Recognition Using Entropy Correspondence Measure of Line Features
San KO Kyoung Mu LEE

PAPER-Image Recognition, Computer Vision

Vol: E91-D No: 1, Page(s): 78-85
In this paper we propose an efficient line feature-based 2D object recognition algorithm using a novel entropy correspondence measure (ECM) that encodes the probabilistic similarity between two line feature sets. Since the proposed ECM-based method uses the whole structural information of objects simultaneously for matching, it overcomes the common drawbacks of the conventional techniques that are based on feature to feature correspondence. Moreover, since ECM is endowed with probabilistic attribute, it shows quite robust performance in the noisy environment. In order to enhance the recognition performance and speed, line features are pre-clustered into several groups according to their inclination by an eigen analysis, and then ECM is applied to each corresponding group individually. Experimental results on real images demonstrate that the proposed algorithm has superior performance to those of the conventional algorithms in both the accuracy and the computational efficiency, in the noisy environment.
Zero-Anaphora Resolution in Chinese Using Maximum Entropy
Jing PENG Kenji ARAKI

PAPER-Natural Language Processing

Vol: E90-D No: 7, Page(s): 1092-1102
In this paper, we propose a learning classifier based on maximum entropy (ME) for resolving zero-anaphora in Chinese text. Besides regular grammatical, lexical, positional and semantic features motivated by previous research on anaphora resolution, we develop two innovative Web-based features for extracting additional semantic information from the Web. The values of the two features can be obtained easily by querying the Web using some patterns. Our study shows that our machine learning approach is able to achieve an accuracy comparable to that of state-of-the-art systems. The Web as a knowledge source can be incorporated effectively into the ME learning framework and significantly improves the performance of our approach.
A Robust Object Tracking Method under Pose Variation and Partial Occlusion
Kazuhiro HOTTA

PAPER-Tracking

Vol: E89-D No: 7, Page(s): 2132-2141
This paper presents a robust object tracking method under pose variation and partial occlusion. In practical environments, the appearance of objects is changed dynamically by pose variation or partial occlusion. Therefore, robustness to these changes is required for practical applications. However, it is difficult to be robust to various changes with only one tracking model. Therefore, slight robustness to variations and ease of model update are required. For this purpose, Kernel Principal Component Analysis (KPCA) of local parts is used. KPCA of local parts was originally proposed for pose independent object recognition. Training of this method is performed by using local parts cropped from only one or two object images. This is a good property for tracking because only one target image is given in practical applications. In addition, the model (subspace) of this method can be updated easily by solving an eigenvalue problem. Performance of the proposed method is evaluated by using a test face sequence captured under pose, partial occlusion, scaling and illumination variations. Effectiveness and robustness of the proposed method are demonstrated by comparison with a template matching based tracker. In addition, an adaptive update rule using similarity with the current subspace is also proposed. Effectiveness of the adaptive update rule is shown by experiment.
Construction of Classifiers by Iterative Compositions of Features with Partial Knowledge
Kazuya HARAGUCHI Toshihide IBARAKI

PAPER

Vol: E89-A No: 5, Page(s): 1284-1291
We consider the classification problem to construct a classifier $c:\{0,1\}^n \to \{0,1\}$ from a given set of examples (training set), which (approximately) realizes the hidden oracle $y:\{0,1\}^n \to \{0,1\}$ describing the phenomenon under consideration. For this problem, a number of approaches are already known in computational learning theory; e.g., decision trees, support vector machines (SVM), and iteratively composed features (ICF). The last one, ICF, was proposed in our previous work (Haraguchi et al., 2004). A feature, composed of a nonempty subset $S$ of other features (including the original data attributes), is a Boolean function $f_S:\{0,1\}^S \to \{0,1\}$ and is constructed according to the proposed rule. The ICF algorithm iterates generation and selection processes of features, and finally adopts one of the generated features as the classifier, where the generation process may be considered as embodying the idea of boosting, since new features are generated from the available features. In this paper, we generalize a feature to an extended Boolean function $f_S:\{0,1,*\}^S \to \{0,1,*\}$ to allow partial knowledge, where $*$ denotes the state of uncertainty. We then propose the algorithm ICF* to generate such generalized features. The selection process of ICF* is also different from that of ICF, in that features are selected so as to cover the entire training set. Our computational experiments indicate that ICF* is better than ICF in terms of both classification performance and computation time. Also, it is competitive with other representative learning algorithms such as decision trees and SVM.
Performance Comparison between Equal-Average Equal-Variance Equal-Norm Nearest Neighbor Search (EEENNS) Method and Improved Equal-Average Equal-Variance Nearest Neighbor Search (IEENNS) Method for Fast Encoding of Vector Quantization
Zhibin PAN Koji KOTANI Tadahiro OHMI

LETTER-Image Processing and Video Processing

Vol: E88-D No: 9, Page(s): 2218-2222
The encoding process of vector quantization (VQ) is a time bottleneck preventing its practical applications. In order to speed up VQ encoding, it is very effective to use lower dimensional features of a vector to estimate how large the Euclidean distance between the input vector and a candidate codeword could be so as to reject most unlikely codewords. The three popular statistical features of the average or the mean, the variance, and L2 norm of a vector have already been adopted in the previous works individually. Recently, these three statistical features were combined together to derive a sequential EEENNS search method in [6], which is very efficient but still has obvious computational redundancy. This Letter aims at giving a mathematical analysis on the results of EEENNS method further and pointing out that it is actually unnecessary to use L2 norm feature anymore in fast VQ encoding if the mean and the variance are used simultaneously as proposed in IEENNS method. In other words, L2 norm feature is redundant for a rejection test in fast VQ encoding. Experimental results demonstrated an approximate 10-20% reduction of the total computational cost for various detailed images in the case of not using L2 norm feature so that it confirmed the correctness of the mathematical analysis.
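The idea of rejecting codewords from low-dimensional statistics can be sketched as follows. The abstract does not spell out the exact test, so this example assumes a standard lower bound on the squared Euclidean distance: for k-dimensional vectors with means m and deviation norms V (the square root of the summed squared deviations from the mean), d² ≥ k(m_x − m_c)² + (V_x − V_c)². A codeword whose bound is not below the best distance found so far cannot win and is skipped. All function names are hypothetical.

```python
import math


def stats(v):
    """Mean and deviation norm sqrt(sum((x - mean)^2)) of a vector."""
    mean = sum(v) / len(v)
    dev = math.sqrt(sum((x - mean) ** 2 for x in v))
    return mean, dev


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(x, codebook):
    """Return (index, squared distance, number of full distance computations)."""
    k = len(x)
    mx, vx = stats(x)
    # In a real encoder these statistics would be precomputed offline.
    cb_stats = [stats(c) for c in codebook]
    best_i, best_d, full = -1, float("inf"), 0
    for i, c in enumerate(codebook):
        mc, vc = cb_stats[i]
        lower = k * (mx - mc) ** 2 + (vx - vc) ** 2
        if lower >= best_d:
            continue  # rejected without computing the full distance
        full += 1
        d = sq_dist(x, c)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d, full


if __name__ == "__main__":
    codebook = [[0, 0, 0, 0], [10, 10, 10, 10], [1, 2, 3, 4], [4, 3, 2, 1]]
    print(nearest_codeword([1, 2, 3, 5], codebook))
```

The bound uses only the mean and the deviation norm, with no separate L2 norm test, which is in the spirit of the redundancy argument made in the abstract.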
Cepstral Amplitude Range Normalization for Noise Robust Speech Recognition
Shingo YOSHIZAWA Noboru HAYASAKA Naoya WADA Yoshikazu MIYANAGA

PAPER-Speech and Hearing

Vol: E87-D No: 8, Page(s): 2130-2137
This paper describes a noise robustness technique that normalizes the cepstral amplitude range in order to remove the influence of additive noise. Additive noise causes speech feature mismatches between testing and training environments and it degrades recognition accuracy in noisy environments. We presume an approximate model that expresses the influence by changing the amplitude range and the DC component in the log-spectra. According to this model, we propose a cepstral amplitude range normalization (CARN) that normalizes the cepstral distance between maximum and minimum values. It can estimate noise robust features without prior knowledge or adaptation. We evaluated its performance in an isolated word recognition task by using the Noisex92 database. Compared with the combinations of conventional methods, the CARN could improve recognition accuracy under various SNR conditions.
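The abstract models noise as a change in amplitude range plus a DC shift. One simple way to cancel both, shown here purely as an assumption and not as the paper's CARN formula, is to rescale each cepstral coefficient track to the interval [0, 1] using its minimum and maximum over the utterance. The function name `normalize_amplitude_range` is hypothetical.

```python
def normalize_amplitude_range(frames):
    """Min-max normalize each coefficient track over all frames (sketch only)."""
    n_coef = len(frames[0])
    lows = [min(f[j] for f in frames) for j in range(n_coef)]
    highs = [max(f[j] for f in frames) for j in range(n_coef)]
    out = []
    for f in frames:
        row = []
        for j in range(n_coef):
            span = highs[j] - lows[j]
            # A constant track carries no range information; map it to 0.
            row.append((f[j] - lows[j]) / span if span > 0 else 0.0)
        out.append(row)
    return out


if __name__ == "__main__":
    clean = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]
    # Simulated distortion: range compressed by 0.5 and offset by 4.
    distorted = [[0.5 * c + 4.0 for c in f] for f in clean]
    print(normalize_amplitude_range(clean))
    print(normalize_amplitude_range(distorted))
```

Both calls print the same normalized tracks, illustrating that a positive scaling and an offset applied to a coefficient track are removed by this kind of normalization.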
Voice Activity Detection with Array Signal Processing in the Wavelet Domain
Yusuke HIOKA Nozomu HAMADA

PAPER-Engineering Acoustics

Vol: E86-A No: 11, Page(s): 2802-2811
In speech enhancement with an adaptive microphone array, voice activity detection (VAD) is indispensable for adaptation control. Even though many VAD methods have been proposed as pre-processors for speech recognition and compression, they can hardly discriminate nonstationary interferences, which frequently exist in real environments. In this research, we propose a novel VAD method with array signal processing in the wavelet domain. In that domain we can integrate the temporal, spectral and spatial information to achieve robust voice activity discriminability for a nonstationary interference arriving from a direction close to that of the speech. The signals acquired by the microphone array are first decomposed into appropriate subbands using a wavelet packet to extract their temporal and spectral features. Then a directionality check and direction estimation on each subband are executed to perform VAD with respect to the spatial information. Computer simulation results for sound data demonstrate that the proposed method keeps its discriminability even for interference arriving from a direction close to that of the speech.
Signal Processing Representations of Speech
W. Bastiaan KLEIJN

INVITED SURVEY PAPER

Vol: E86-D No: 3, Page(s): 359-376
Synergies in processing requirements and knowledge of human speech production and perception have led to a similarity of the speech signal representations used for the tasks of recognition, coding, and modification. The representations are generally composed of a description of the vocal-tract transfer function and, in the case of coding and modification, a description of the excitation signal. This paper provides an overview of commonly used representations. For coding and modification, autoregressive models represented by line spectral frequencies perform well for the vocal tract, and pitch-synchronous filter banks and modulation-domain filters perform well for the excitation. For recognition, good representations are based on a smoothed magnitude response of the vocal tract.
On Automatic Speech Recognition at the Dawn of the 21st Century
Chin-Hui LEE

INVITED SURVEY PAPER

Vol: E86-D No: 3, Page(s): 377-396
In the last three decades of the 20th Century, research in speech recognition has been intensively carried out worldwide, spurred on by advances in signal processing, algorithms, architectures, and hardware. Recognition systems have been developed for a wide variety of applications, ranging from small vocabulary keyword recognition over dial-up telephone lines, to medium size vocabulary voice interactive command and control systems for business automation, to large vocabulary speech dictation, spontaneous speech understanding, and limited-domain speech translation. Although we have witnessed many new technological promises, we have also encountered a number of practical limitations that hinder a widespread deployment of applications and services. On one hand, fast progress was observed in statistical speech and language modeling. On the other hand only spotty successes have been reported in applying knowledge sources in acoustics, speech and language science to improving speech recognition performance and robustness to adverse conditions. In this paper we review some key advances in several areas of speech recognition. A bottom-up detection framework is also proposed to facilitate worldwide research collaboration for incorporating technology advances in both statistical modeling and knowledge integration into going beyond the current speech recognition limitations and benefiting the society in the 21st century.
Noise and Channel Distortion Robust ASR System for DARPA SPINE2 Task
Konstantin MARKOV Tomoko MATSUI Rainer GRUHN Jinsong ZHANG Satoshi NAKAMURA

PAPER-Robust Speech Recognition and Enhancement

Vol: E86-D No: 3, Page(s): 497-504
This paper presents the ATR speech recognition system designed for the DARPA SPINE2 evaluation task. The system is capable of dealing with speech from highly variable, real-world noisy conditions and communication channels. A number of robust techniques are implemented, such as differential spectrum mel-scale cepstrum features, on-line MLLR adaptation, and word-level hypothesis combination, which led to a significant reduction in the word error rate.
Stress Classification Using Subband Based Features
Tin Lay NWE Say Wei FOO Liyanage C. DE SILVA

PAPER-Speech Synthesis and Prosody

Vol: E86-D No: 3, Page(s): 565-573
In research to determine reliable acoustic indicators for the type of stress present in speech, the majority of systems have concentrated on the statistics extracted from pitch contour, energy contour, wavelet based subband features and Teager-Energy-Operator (TEO) based feature parameters. These systems work mostly on pair-wise distinction between stress and neutral speech. Their performance decreases substantially when tested in multi-style detection among many stress categories. In this paper, a novel system is proposed using linear short time Log Frequency Power Coefficients (LFPC) and TEO based nonlinear LFPC features in both time and frequency domain. A five-state Hidden Markov Model (HMM) with continuous Gaussian mixture distribution is used. The stress classification ability of the system is tested using data from the SUSAS (Speech Under Simulated and Actual Stress) database to categorize five stress conditions individually. It is found that the performance of linear acoustic features LFPC is better than that of nonlinear TEO based LFPC feature parameters. Results show that with the linear acoustic feature LFPC, an average accuracy of 84% and a best accuracy of 95% can be achieved in the classification of the five categories. Tests of the system under different signal-to-noise conditions show that its performance does not degrade drastically as noise increases. It is also observed that classification using nonlinear frequency domain LFPC features gives relatively higher accuracy than that using nonlinear time domain LFPC features.
Vector Quantization of Speech Spectral Parameters Using Statistics of Static and Dynamic Features
Kazuhito KOISHIDA Keiichi TOKUDA Takashi MASUKO Takao KOBAYASHI

PAPER-Speech and Hearing

Vol: E84-D No: 10, Page(s): 1427-1434
This paper proposes a vector quantization scheme which makes it possible to consider the dynamics of input vectors. In the proposed scheme, a linear transformation is applied to the consecutive input vectors and the resulting vector is quantized with a distortion measure defined by the statistics. At the decoder side, the output vector sequence is determined using the statistics associated with the transmitted indices in such a way that a likelihood is maximized. To solve the maximization problem, a computationally efficient algorithm is derived. The performance of the proposed method is evaluated in LSP parameter quantization. It is found that the LSP trajectories and the corresponding spectra change quite smoothly in the proposed method. It is also shown that the use of the proposed method results in a significant improvement of subjective quality.
Scale Invariant Face Detection and Classification Method Using Shift Invariant Features Extracted from Log-Polar Image
Kazuhiro HOTTA Taketoshi MISHIMA Takio KURITA

PAPER

Vol: E84-D No: 7, Page(s): 867-878
This paper presents a scale invariant face detection and classification method which uses shift invariant features extracted from a Log-Polar image. Scale changes of a face in an image are represented as shift along the horizontal axis in the Log-Polar image. In order to obtain scale invariant features, shift invariant features are extracted from each row of the Log-Polar image. Autocorrelations, Fourier spectrum, and PARCOR coefficients are used as shift invariant features. These features are then combined with simple classification methods based on Linear Discriminant Analysis to realize scale invariant face detection and classification. The effectiveness of the proposed face detection method is confirmed by experiments using face images captured under different scales, backgrounds, illuminations, and dates. To evaluate the proposed face classification method, we performed experiments using 2,800 face images with 7 scales under 2 different backgrounds and face images of 52 persons.
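The claim that a scale change becomes a shift in the Log-Polar image can be checked on a single point. This is a minimal sketch of the coordinate mapping only, not of the paper's feature extraction; the function name `to_log_polar` and the choice of the origin as the centre are assumptions.

```python
import math


def to_log_polar(x, y, cx=0.0, cy=0.0):
    """Map (x, y) to (log radius, angle) about the centre (cx, cy).

    The centre itself has radius 0 and is not representable (log(0) raises).
    """
    dx, dy = x - cx, y - cy
    return math.log(math.hypot(dx, dy)), math.atan2(dy, dx)


if __name__ == "__main__":
    rho1, theta1 = to_log_polar(3.0, 4.0)
    rho2, theta2 = to_log_polar(6.0, 8.0)  # same point after scaling by 2
    print(rho2 - rho1, math.log(2.0))      # shift along the log-radius axis
    print(theta2 - theta1)                 # angle is unchanged
```

Scaling the point by a factor s adds log(s) to the log-radius coordinate and leaves the angle unchanged, which is why features that are invariant to shifts along that axis are also invariant to scale.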
