
Keyword Search Result

[Keyword] ATI(18690hit)

10861-10880hit(18690hit)

  • Comparative Study of Speaker Identification Methods: dPLRM, SVM and GMM

    Tomoko MATSUI  Kunio TANABE  

     
    PAPER-Speaker Recognition

      Vol:
    E89-D No:3
      Page(s):
    1066-1073

    Three text-independent speaker identification methods, based on the dual Penalized Logistic Regression Machine (dPLRM), the Support Vector Machine (SVM) and the Gaussian Mixture Model (GMM), are compared in experiments with 10 male speakers. The methods are compared on speech data collected over a period of 13 months in 6 utterance sessions, of which the earlier 3 sessions were used to obtain training data of 12 seconds' utterances. Comparisons are made between Mel-frequency cepstrum (MFC) data and log-power spectrum data, and between training on a single session and training on multiple sessions. It is shown that dPLRM with the log-power spectrum data is competitive with the SVM and GMM methods with MFC data when trained on the combined data collected in the earlier three sessions. dPLRM outperforms the GMM method, especially as the amount of training data becomes smaller. Some of these findings have already been reported in [1]-[3].

  • PS-ZCPA Based Feature Extraction with Auditory Masking, Modulation Enhancement and Noise Reduction for Robust ASR

    Muhammad GHULAM  Takashi FUKUDA  Kouichi KATSURADA  Junsei HORIKAWA  Tsuneo NITTA  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    1015-1023

    A pitch-synchronous (PS) auditory feature extraction method based on ZCPA (Zero-Crossings Peak-Amplitudes) was proposed previously and was shown to be more robust than conventional ZCPA- and MFCC-based features. In this paper, firstly, a non-linear adaptive threshold adjustment procedure is introduced into the PS-ZCPA method to obtain optimal results in noisy conditions with different signal-to-noise ratios (SNRs). Next, auditory masking, a well-known auditory perception phenomenon, and modulation enhancement, which simulates the strong relationship between modulation spectra and speech intelligibility, are embedded into the PS-ZCPA method. Finally, a Wiener filter based noise reduction procedure is integrated into the method to make it more noise-robust, and the performance is evaluated against ETSI ES202 (WI008), which is a standard front-end for distributed speech recognition. All the experiments were carried out on the Aurora-2J database. The experimental results demonstrated improved performance of the PS-ZCPA method when auditory masking was embedded into it, and slightly improved performance when modulation enhancement was used. The PS-ZCPA method with Wiener filter based noise reduction also showed better performance than ETSI ES202 (WI008).

  • Iterative Power Allocation Scheme for MIMO Systems

    Hui SHI  Tetsushi ABE  Hirohito SUDA  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E89-B No:3
      Page(s):
    791-800

    In closed-loop multiple-input and multiple-output space-division multiplexing (MIMO-SDM) systems, allocating power among multiple transmit data streams improves the channel capacity. However, the optimum power allocation values are not always available in closed-form. For instance, when we use transmission and reception schemes that do not transfer the MIMO channel into parallel orthogonal channels (e.g., eigen-mode SDM), the signal to interference plus noise ratio (SINR) of each data stream at the output of the receiver is not proportional to its corresponding transmit power. This feature makes it difficult to obtain the optimal closed-form power allocation value for each data stream. Thus, in this paper, we propose an iterative power allocation scheme for MIMO-SDM systems where the SINR is not proportional to the transmit power. Furthermore, we incorporate a transmit antenna selection scheme into the proposed power allocation scheme in order to attain further capacity enhancement. Computer simulation results are provided to show the effectiveness of the proposed power allocation schemes.

  • A Hybrid Fine-Tuned Multi-Objective Memetic Algorithm

    Xiuping GUO  Genke YANG  Zhiming WU  Zhonghua HUANG  

     
    PAPER-Numerical Analysis and Optimization

      Vol:
    E89-A No:3
      Page(s):
    790-797

    In this paper, we propose a hybrid fine-tuned multi-objective memetic algorithm that hybridizes different solution fitness evaluation methods for global exploitation and exploration. To search across all regions in objective space, the algorithm uses a widely diversified set of weights at each generation, and employs simulated annealing to optimize each utility function. For broader exploration, a grid-based technique is adopted to discover the missing nondominated regions on the existing tradeoff surface, and a Pareto-based local perturbation is performed to produce additional solutions that fill in the discontinuous areas. An additional advanced feature is that the procedure is made dynamic and adaptive to the online optimization conditions, based on a function of the improvement ratio, to obtain better stability and convergence of the algorithm. The effectiveness of our approach is shown by applying it to the multi-objective 0/1 knapsack problem (MOKP).

  • Design of Equiripple Minimum Phase FIR Filters with Ripple Ratio Control

    Masahiro OKUDA  Masaaki IKEHARA  Shin-ichi TAKAHASHI  

     
    PAPER-Digital Signal Processing

      Vol:
    E89-A No:3
      Page(s):
    751-756

    In this paper, we present a numerical method for the equiripple approximation of minimum phase FIR digital filters. Many methods have been proposed for the design of such filters. Many of them first design a linear phase filter whose length is twice as long, and then factorize the filter to obtain the minimum phase filter. Although these methods theoretically guarantee optimality, it is difficult to control the ratio of ripples between different bands. In conventional lowpass filter design, for example, when different weights are given for the passband and stopband, one needs to design the filter iteratively by trial and error to achieve the ratio of the weights exactly. To address this problem, we modify the well-known Parks-McClellan algorithm and make it possible to directly control the ripple ratios. The method iteratively solves a set of linear equations while controlling the ratio of ripples. Using this method, the equiripple solutions are obtained quickly.

  • Effect of the Power Ramping under Retransmission in an ARQ for the WCDMA Downlink in One Path Rayleigh Fading Channel

    Seung-Hoon HWANG  

     
    LETTER-Terrestrial Radio Communications

      Vol:
    E89-B No:3
      Page(s):
    1024-1026

    In this paper, we propose and analyze a scheme for wireless channels, which is the combination of an automatic repeat request (ARQ) scheme and power ramping. The power ramping is used for more reliable downlink data transmission. This technique gradually increases the transmission power at each retransmission attempt. Simulation results demonstrate that when the power step size is 0.5 dB, the average throughput gain may be as high as 2.3% to 5.4% with properly selected parameters.
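The power ramping idea in the abstract above (raise the transmission power by a fixed step at each retransmission attempt) can be illustrated with a minimal sketch. The abstract does not give the exact ramping rule, so a linear increase in dB is assumed here; the function names and the 0 dB starting level are hypothetical, and only the 0.5 dB step size comes from the abstract.

```python
def ramped_power_db(initial_db, step_db, attempt):
    """Transmit power in dB for a given attempt (attempt 0 is the first transmission).

    Assumes the power grows by a fixed step in dB per retransmission.
    """
    return initial_db + step_db * attempt


def db_to_linear(power_db):
    """Convert a power level in dB to a linear power ratio."""
    return 10 ** (power_db / 10)


# Power levels for the first transmission and three retransmissions,
# relative to a hypothetical 0 dB starting level, with a 0.5 dB step.
powers_db = [ramped_power_db(0.0, 0.5, attempt) for attempt in range(4)]
powers_linear = [db_to_linear(p) for p in powers_db]
```

A linear-in-dB ramp means each retransmission multiplies the linear power by the same factor, which keeps the increase modest for small step sizes.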

  • Design and Performance of an LDPC-Coded FH-OFDMA System in the Uplink Cellular Environments

    Yun Hee KIM  Kwang Soon KIM  Sang Hyun LEE  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E89-B No:3
      Page(s):
    828-836

    An LDPC-coded FH-OFDMA system is proposed for the uplink of a packet-based cellular system, where the frequency hopping (FH) is based on a resource block (RB) for coherent demodulation. For the system, different RB types are employed either for better intercell interference (ICI) averaging capability or for better channel estimation performance. For the receiver, practical iterative channel estimation and decoding methods are proposed to improve the channel estimation performance without boosting the pilot power and to mitigate the adverse effects of the ICI. Extensive simulation results are provided to show the effect of the RB size on the channel estimation and ICI averaging performance as well as possible application of the proposed receiver in harsh mobile environments with dynamic packet allocation.

  • Nonparametric Speaker Recognition Method Using Earth Mover's Distance

    Shingo KUROIWA  Yoshiyuki UMEDA  Satoru TSUGE  Fuji REN  

     
    PAPER-Speaker Recognition

      Vol:
    E89-D No:3
      Page(s):
    1074-1081

    In this paper, we propose a distributed speaker recognition method using a nonparametric speaker model and Earth Mover's Distance (EMD). In distributed speaker recognition, the quantized feature vectors are sent to a server. The Gaussian mixture model (GMM), the traditional method used for speaker recognition, is trained using the maximum likelihood approach. However, it is difficult to fit continuous density functions to quantized data. To overcome this problem, the proposed method represents each speaker model with a speaker-dependent VQ code histogram designed by registered feature vectors and directly calculates the distance between the histograms of speaker models and testing quantized feature vectors. To measure the distance between each speaker model and testing data, we use EMD which can calculate the distance between histograms with different bins. We conducted text-independent speaker identification experiments using the proposed method. Compared to results using the traditional GMM, the proposed method yielded relative error reductions of 32% for quantized data.
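To make the notion of a distance between histograms with different bins concrete, the following is a minimal sketch of the Earth Mover's Distance in the simplified one-dimensional case. It is not the method of the paper above, which works on VQ code histograms of multi-dimensional feature vectors and would need a general transportation solver; the function name and the restriction to scalar bin positions are assumptions made for illustration.

```python
def emd_1d(positions_a, weights_a, positions_b, weights_b):
    """EMD between two 1-D histograms whose bins may sit at different positions.

    Both histograms are normalised to unit mass. In one dimension the EMD
    equals the area between the two cumulative distributions.
    """
    total_a = sum(weights_a)
    total_b = sum(weights_b)
    # Signed mass at each bin position: positive for A, negative for B.
    events = [(p, w / total_a) for p, w in zip(positions_a, weights_a)]
    events += [(p, -w / total_b) for p, w in zip(positions_b, weights_b)]
    events.sort(key=lambda event: event[0])

    distance = 0.0
    cdf_gap = 0.0  # running difference between the two cumulative distributions
    previous = events[0][0]
    for position, mass in events:
        distance += abs(cdf_gap) * (position - previous)
        cdf_gap += mass
        previous = position
    return distance
```

For example, `emd_1d([0, 2], [1, 1], [1], [2])` compares a histogram with bins at 0 and 2 against one with a single bin at 1, even though the two share no bin positions.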

  • Teeth Image Recognition for Biometrics

    Tae-Woo KIM  Tae-Kyung CHO  

     
    LETTER-Image Recognition, Computer Vision

      Vol:
    E89-D No:3
      Page(s):
    1309-1313

    This paper presents a personal identification method based on BMME and LDA for images acquired at the anterior and posterior occlusion states of teeth. The method consists of teeth region extraction, BMME, and pattern recognition for the images acquired at the anterior and posterior occlusion states of teeth. The two occlusions can provide a consistent teeth appearance in images, and BMME can reduce the matching error in pattern recognition. Using teeth images can be beneficial in recognition because teeth, being rigid objects, cannot be deformed at the moment of image acquisition. In the experiments, the algorithm was successful in teeth recognition for personal identification of 20 people, which suggests that our method can contribute to multi-modal authentication systems.

  • Training Augmented Models Using SVMs

    Mark J.F. GALES  Martin I. LAYTON  

     
    INVITED PAPER

      Vol:
    E89-D No:3
      Page(s):
    892-899

    There has been significant interest in developing new forms of acoustic model, in particular models which allow additional dependencies to be represented than those contained within a standard hidden Markov model (HMM). This paper discusses one such class of models, augmented statistical models. Here, a local exponential approximation is made about some point on a base model. This allows additional dependencies within the data to be modelled than are represented in the base distribution. Augmented models based on Gaussian mixture models (GMMs) and HMMs are briefly described. These augmented models are then related to generative kernels, one approach used for allowing support vector machines (SVMs) to be applied to variable length data. The training of augmented statistical models within an SVM, generative kernel, framework is then discussed. This may be viewed as using maximum margin training to estimate statistical models. Augmented Gaussian mixture models are then evaluated using rescoring on a large vocabulary speech recognition task.

  • Separation of Mixed Audio Signals by Decomposing Hilbert Spectrum with Modified EMD

    Md. Khademul Islam MOLLA  Keikichi HIROSE  Nobuaki MINEMATSU  

     
    PAPER-Speech/Audio Processing

      Vol:
    E89-A No:3
      Page(s):
    727-734

    The Hilbert transformation together with empirical mode decomposition (EMD) produces the Hilbert spectrum (HS), which is a fine-resolution time-frequency representation of any nonlinear and non-stationary signal. The EMD decomposes the mixture signal into oscillatory components, each of which is called an intrinsic mode function (IMF). A modification of the conventional EMD is proposed here. The instantaneous frequency of every real-valued IMF component is computed with the Hilbert transformation. The HS is constructed by arranging the instantaneous frequency spectra of the IMF components. The HS of the mixture signal is decomposed into subspaces corresponding to the component sources. The decomposition is performed by applying independent component analysis (ICA) and Kullback-Leibler divergence based K-means clustering to the selected number of bases derived from the HS of the mixture. The time domain source signals are assembled by applying post-processing to the subspaces. We have produced experimental results using the proposed separation technique.

  • Minimum Bayes Risk Estimation and Decoding in Large Vocabulary Continuous Speech Recognition

    William BYRNE  

     
    INVITED PAPER

      Vol:
    E89-D No:3
      Page(s):
    900-907

    Minimum Bayes risk estimation and decoding strategies based on lattice segmentation techniques can be used to refine large vocabulary continuous speech recognition systems through the estimation of the parameters of the underlying hidden Markov models and through the identification of smaller recognition tasks which provides the opportunity to incorporate novel modeling and decoding procedures in LVCSR. These techniques are discussed in the context of going 'beyond HMMs', showing in particular that this process of subproblem identification makes it possible to train and apply small-domain binary pattern classifiers, such as Support Vector Machines, to large vocabulary continuous speech recognition.

  • Zero-Knowledge Hierarchical Authentication in MANETs

    Pino CABALLERO-GIL  Candelaria HERNANDEZ-GOYA  

     
    LETTER-Application Information Security

      Vol:
    E89-D No:3
      Page(s):
    1288-1289

    This work addresses the critical problem of authentication in mobile ad hoc networks. It includes a new approach based on the Zero-Knowledge cryptographic paradigm where two different security levels are defined. The first level is characterized by the use of an NP-complete graph problem to describe an Access Control Protocol, while the highest level corresponds to a Group Authentication Protocol based on a hard-on-average graph problem. The main goal of the proposal is to balance security strength and network performance. Therefore, both protocols are scalable and decentralized, and their requirements of communication, storage and computation are limited.

  • Grayscale Image Segmentation Using Color Space

    Takahiko HORIUCHI  

     
    PAPER-Image Recognition, Computer Vision

      Vol:
    E89-D No:3
      Page(s):
    1231-1237

    A novel approach for the segmentation of grayscale images that were originally color scenes is proposed. Many algorithms have been developed for grayscale image segmentation. All those approaches have been discussed in a luminance space, because it has been considered that grayscale images do not have any color information. However, a luminance value carries color information as a set of corresponding colors. In this paper, an inverse mapping of luminance values to the CIELAB color space is carried out, and the segmentation of grayscale images is performed based on a distance in the color space. The proposed scheme is applied to region growing segmentation and its performance is verified.

  • Multi-Species Particle Swarm Optimizer for Multimodal Function Optimization

    Masao IWAMATSU  

     
    PAPER-Artificial Intelligence and Cognitive Science

      Vol:
    E89-D No:3
      Page(s):
    1181-1187

    This paper introduces a modified particle swarm optimizer (PSO) called the Multi-Species Particle Swarm Optimizer (MSPSO) for locating all the global minima of multi-modal functions. MSPSO extends the original PSO by dividing the particle swarm spatially into multiple clusters, called species, in a multi-dimensional search space. Each species explores a different area of the search space and tries to find the global or local optima of that area. We test our MSPSO on several multi-modal functions with multiple global optima. Our MSPSO can successfully locate all the global optima of all the test functions, and in particular, can locate all 18 global optima of the two-dimensional Shubert function. We also examined how the performance of MSPSO depends on various algorithm parameters.
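For readers unfamiliar with the benchmark mentioned above, here is a minimal sketch of the two-dimensional Shubert function in its commonly used form, a product of two sums of five cosine terms, usually searched over the square from -10 to 10 in each coordinate. The abstract does not state which variant or search range the paper uses, so this definition is an assumption, and the function name is illustrative. The sketch shows only the test function, not MSPSO itself.

```python
import math


def shubert_2d(x, y):
    """Two-dimensional Shubert function in its commonly used form.

    f(x, y) = g(x) * g(y), with g(t) = sum over j = 1..5 of j * cos((j + 1) * t + j).
    """
    def g(t):
        return sum(j * math.cos((j + 1) * t + j) for j in range(1, 6))

    return g(x) * g(y)


# Value near one of the global minima reported in the optimization literature.
value_near_minimum = shubert_2d(-1.42513, -0.80032)
```

Because the function is a product of the same one-dimensional term in each coordinate, it is symmetric in its arguments, which is one reason its global minima come in groups of equal depth.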

  • Genetic Algorithm Based Optimization of Partly-Hidden Markov Model Structure Using Discriminative Criterion

    Tetsuji OGAWA  Tetsunori KOBAYASHI  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    939-945

    Discriminative modeling is applied to optimize the structure of a Partly-Hidden Markov Model (PHMM). PHMM was proposed in our previous work to deal with the complicated temporal changes of acoustic features. It can represent observation-dependent behaviors in both observations and state transitions. In the formulation of the previous PHMM, we used a common structure for all models. However, the optimal structure, which gives the best performance, is expected to differ from category to category. In this paper, we designed a new structure optimization method in which the dependence of the states and the observations of PHMM is optimally defined for each model using the weighted likelihood-ratio maximization (WLRM) criterion. The WLRM criterion gives high discriminability between the correct category and the incorrect categories. Therefore, it gives model structures with good discriminative performance. We define the optimal structures as the model structure combination that satisfies the WLRM criterion among all possible structure combinations. A genetic algorithm is also applied as an adequate approximation of a full search. Results of continuous lecture talk speech recognition show the effectiveness of the proposed structure optimization: it reduced the word errors compared to HMM and to PHMM with a common structure for all models.

  • A Non-stationary Noise Suppression Method Based on Particle Filtering and Polyak Averaging

    Masakiyo FUJIMOTO  Satoshi NAKAMURA  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    922-930

    This paper addresses a speech recognition problem in non-stationary noise environments: the estimation of noise sequences. To solve this problem, we present a particle filter-based sequential noise estimation method for front-end processing of speech recognition in noise. In the proposed method, a noise sequence is estimated in three stages: a sequential importance sampling step, a residual resampling step, and finally a Markov chain Monte Carlo step with Metropolis-Hastings sampling. The estimated noise sequence is used in the MMSE-based clean speech estimation. We also introduce Polyak averaging and feedback into the state transition process for particle filtering. In the evaluation, we observed that the proposed method improves speech recognition accuracy in non-stationary noise environments compared with a noise compensation method that assumes stationary noise.

  • Pilot-Aided ICI Self-Cancellation Scheme for OFDM Systems

    Chih-Peng LI  Wei-Wen HU  

     
    LETTER-Transmission Systems and Transmission Equipment for Communications

      Vol:
    E89-B No:3
      Page(s):
    955-958

    In this letter, a novel pilot-aided inter-carrier interference (ICI) self-cancellation scheme is proposed for use in orthogonal frequency division multiplexing (OFDM) systems. The proposed scheme maps both modulated data symbols and pre-defined pilot symbols onto non-neighboring sub-carriers with weighting coefficients of +1 and -1. With the aid of pilot symbols, a more accurate estimation of frequency offsets can be obtained, and the ICI self-cancellation demodulation can be operated properly.

  • An Automatic Bi-Directional Bus Repeater Control Scheme Using Dynamic Collaborative Driving Techniques

    Masahiro NOMURA  Taku OHSAWA  Koichi TAKEDA  Yoetsu NAKAZAWA  Yoshinori HIROTA  Yasuhiko HAGIHARA  Naoki NISHI  

     
    PAPER-Interface and Interconnect Techniques

      Vol:
    E89-C No:3
      Page(s):
    334-341

    This paper describes a newly developed automatic direction control scheme for bi-directional bus repeaters that uses dynamic collaborative driving techniques. Repeater directions are rapidly determined by detecting the direction of control signal propagation through an additional control signal line that is driven by dynamic collaborative drivers. Application to an on-chip peripheral bus reduces control circuit transistor counts by about 75% and the number of control signal lines by about 50% without loss of speed. Experimental results for a 0.18-µm CMOS implementation indicate that the proposed scheme is four times faster than a conventional scheme with no bi-directional bus repeaters.

  • Low Dynamic Power and Low Leakage Power Techniques for CMOS Motion Estimation Circuits

    Nobuaki KOBAYASHI  Tomomi EI  Tadayoshi ENOMOTO  

     
    PAPER-Low Power Techniques

      Vol:
    E89-C No:3
      Page(s):
    271-279

    To drastically reduce the dynamic power (PAT) and the leakage power (PST) of CMOS MPEG4/H.264 motion estimation (ME) circuits, several power reduction techniques were developed: circuit architectures that reduce the supply voltages (VDD) and the numbers of logic gates, not only in the whole circuit but also in the critical path; a fast motion estimation algorithm; and a leakage current reduction circuit. A 0.18-µm CMOS ME circuit has been fabricated using these techniques. At a clock frequency of 160 MHz and a VDD of 1.25 V, PAT decreased to 75.9 µW, which was 5.35% of that of a conventional ME circuit. PST also decreased to 0.82 nW, which was 3.93% of that of the conventional ME circuit.
