
Keyword Search Result

[Keyword] acoustic (178 hits)

Hits 121-140 (of 178)

  • Nonlinear Wave Propagation for a Parametric Loudspeaker

    Jun YANG  Kan SHA  Woon-Seng GAN  Jing TIAN  

     
    PAPER

      Vol:
    E87-A No:9
      Page(s):
    2395-2400

    A directional audible sound can be generated from a parametric array by amplitude-modulating (AM) an ultrasonic carrier wave with the audio signal. To synthesize the audio signals produced by the self-demodulation effect of the AM sound wave, a quasi-linear analytical solution describing the nonlinear wave propagation is developed for fast numerical evaluation. The radiated sound field is expressed as a superposition of Gaussian beams. Numerical results are presented for a rectangular parametric loudspeaker and are in good agreement with previously published experimental data.
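    As a rough illustration of self-demodulation only, the sketch below uses Berktay's far-field approximation, in which the demodulated audio is proportional to the second time derivative of the squared AM envelope. This is a much simpler model than the quasi-linear Gaussian-beam solution described in the abstract, and the modulation index `m`, tone frequency, and frame length are arbitrary assumptions. For a pure-tone envelope 1 + m sin(wt), the model predicts a second-harmonic component whose amplitude, relative to the fundamental, equals m.

```python
import math

def berktay_demodulated(envelope, dt):
    """Second time derivative of envelope**2 (Berktay's far-field model, up to
    a constant factor), using a central difference on one periodic frame."""
    e2 = [e * e for e in envelope]
    n = len(e2)
    # e2[i - 1] wraps to e2[-1] at i = 0, matching the periodic frame.
    return [(e2[(i + 1) % n] - 2.0 * e2[i] + e2[i - 1]) / dt ** 2 for i in range(n)]

def harmonic_amplitude(signal, k):
    """Amplitude of the k-th harmonic of a frame holding exactly one period."""
    n = len(signal)
    c = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(signal))
    s = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(signal))
    return 2.0 * math.hypot(c, s) / n

m = 0.5              # modulation index (assumed)
f_audio = 1000.0     # audio tone in Hz (assumed)
n = 2000             # samples in exactly one audio period
dt = 1.0 / (f_audio * n)
# Envelope 1 + m*g(t) with g a pure tone; the ultrasonic carrier is omitted
# because this simplified model depends only on the envelope.
envelope = [1.0 + m * math.sin(2 * math.pi * i / n) for i in range(n)]
p = berktay_demodulated(envelope, dt)
ratio = harmonic_amplitude(p, 2) / harmonic_amplitude(p, 1)
print(f"2nd/1st harmonic amplitude ratio: {ratio:.3f}")  # close to m
```

    Under this simplified model, deeper modulation raises the audio level and the relative second-harmonic distortion together.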

  • Automatic Generation of Non-uniform HMM Topologies Based on the MDL Criterion

    Takatoshi JITSUHIRO  Tomoko MATSUI  Satoshi NAKAMURA  

     
    PAPER-Speech and Hearing

      Vol:
    E87-D No:8
      Page(s):
    2121-2129

    We propose a new method that introduces the Minimum Description Length (MDL) criterion into the automatic generation of non-uniform, context-dependent HMM topologies. Phonetic decision tree clustering, which is based on the Maximum Likelihood (ML) criterion, is widely used but creates only contextual variations. Moreover, the ML criterion requires control parameters, such as the total number of states, to be determined empirically in advance for use as stopping criteria. Information criteria have been applied to solve this problem for decision tree clustering. However, decision tree clustering cannot automatically create topologies with various state lengths. We therefore propose a method that applies the MDL criterion as the split and stop criteria of the Successive State Splitting (SSS) algorithm, as a means of generating both contextual and temporal variations. The proposed method, the MDL-SSS algorithm, can automatically create adequate topologies without such predetermined parameters. Experimental results for travel arrangement dialogs and lecture speech show that MDL-SSS can automatically stop splitting and obtains more appropriate HMM topologies than the original SSS algorithm.
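    A minimal sketch of how an MDL-style criterion can act as both a split rule and a stop rule: a candidate split is accepted only if the log-likelihood gain exceeds the description-length penalty (k/2) log N for the k added parameters. Single 1-D Gaussian states, the toy data, and the function names are assumptions for illustration; the MDL-SSS algorithm itself splits HMM states in the contextual and temporal domains, which is not reproduced here.

```python
import math

def gaussian_loglik(xs):
    """Maximum-likelihood log-likelihood of xs under one 1-D Gaussian
    (assumes the sample variance is non-zero)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def mdl_accepts_split(parent, left, right, n_total, added_params=2):
    """MDL-style split/stop rule: split only if the log-likelihood gain
    exceeds (added_params / 2) * log(n_total). A 1-D Gaussian split adds
    one mean and one variance, hence added_params=2."""
    gain = gaussian_loglik(left) + gaussian_loglik(right) - gaussian_loglik(parent)
    penalty = 0.5 * added_params * math.log(n_total)
    return gain > penalty

a = [-0.2, -0.1, 0.0, 0.1, 0.2]
b = [4.8, 4.9, 5.0, 5.1, 5.2]
print(mdl_accepts_split(a + b, a, b, n_total=10))                      # True: distinct clusters
print(mdl_accepts_split(a, [-0.2, 0.0, 0.2], [-0.1, 0.1], n_total=5))  # False: gain too small
```

    Because the penalty grows with log N rather than being a hand-set threshold, splitting stops on its own once further splits no longer pay for their extra parameters.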

  • Alternative Learning Algorithm for Stereophonic Acoustic Echo Canceller without Pre-Processing

    Akihiro HIRANO  Kenji NAKAYAMA  Daisuke SOMEDA  Masahiko TANAKA  

     
    PAPER-Speech/Acoustic Signal Processing

      Vol:
    E87-A No:8
      Page(s):
    1958-1964

    This paper proposes an alternative learning algorithm for a stereophonic acoustic echo canceller without pre-processing that can identify the correct echo paths. By dividing the filter coefficients into former and latter parts and updating them alternately, the conditions for both a unique solution and perfect echo cancellation are satisfied. Learning switches from one part to the other when the part being updated converges. Convergence analysis clarifies the condition for correct echo-path identification. For fast and stable convergence, convergence detection and an adaptive step size are introduced; the magnitude of the filter-coefficient updates determines both the convergence state and the step size. Computer simulations show a filter-coefficient error 10 dB smaller than that of conventional algorithms without pre-processing.
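    The alternating-update idea can be sketched with NLMS on a single-channel toy problem: the coefficient vector is split into a former and a latter half, the error is always computed with the full filter, and only one half is adapted at a time. For brevity this sketch switches halves on a fixed schedule (`switch_every`), which is an assumption; the paper instead switches when the active half is detected to have converged, based on the size of the coefficient updates, and also adapts the step size. The stereophonic two-channel setting, the echo path `h`, and the noise level are not from the paper.

```python
import math
import random

def alternating_nlms(x, d, taps, mu=0.5, switch_every=200, eps=1e-6):
    """NLMS system identification that adapts only one half of the filter
    coefficients at a time, alternating between the former and latter halves."""
    w = [0.0] * taps
    half = taps // 2
    parts = [(0, half), (half, taps)]  # former part, latter part
    active = 0                         # start with the former part
    buf = [0.0] * taps                 # buf[i] holds x[n - i]
    for n, (xn, dn) in enumerate(zip(x, d)):
        buf = [xn] + buf[:-1]
        e = dn - sum(wi * bi for wi, bi in zip(w, buf))  # error uses the full filter
        lo, hi = parts[active]
        power = sum(b * b for b in buf[lo:hi]) + eps     # normalize by the active part only
        for i in range(lo, hi):
            w[i] += mu * e * buf[i] / power
        if (n + 1) % switch_every == 0:  # fixed switching schedule (assumption)
            active = 1 - active
    return w

rng = random.Random(0)
h = [0.8, -0.4, 0.3, 0.2, -0.1, 0.05, 0.02, -0.01]  # toy echo path (assumed)
x = [rng.gauss(0.0, 1.0) for _ in range(8000)]
# Echo plus small near-end noise (assumed level).
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k) + rng.gauss(0.0, 1e-3)
     for n in range(len(x))]
w = alternating_nlms(x, d, taps=len(h))
misalignment_db = 10 * math.log10(
    sum((wk - hk) ** 2 for wk, hk in zip(w, h)) / sum(hk * hk for hk in h))
print(f"coefficient error: {misalignment_db:.1f} dB")
```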

  • A Robust Watermarking System Based on the Properties of Low Frequency in Perceptual Audio Coding

    Ching-Te WANG  Tung-Shou CHEN  Zhen-Ming XU  

     
    PAPER-Multimedia Environment Technology

      Vol:
    E87-A No:8
      Page(s):
    2152-2159

    In this paper, we propose a robust watermarking system for digital audio to protect copyright and support claims of ownership. The proposed scheme embeds the watermark in the frequency range from 1 Hz to 20 Hz, which cannot be heard by the unaided human ear, so the original audio quality is not affected by the watermark. Current perceptual audio coders include MPEG-1, MPEG-2, MPEG-2.5, MPEG-2 AAC, MPEG-4 AAC, and Windows Media Audio. Experimental results show that the proposed watermarking system can resist attacks by these audio coders and by low-bit-rate compression; the watermark is extracted with 100% accuracy after these encoder attacks. Furthermore, to authenticate the audio signal, the system can quickly extract the watermark without knowledge of the original audio signal.

  • Compensation of Speech Coding Distortion for Wireless Speech Recognition

    Hong Kook KIM  

     
    LETTER-Speech and Hearing

      Vol:
    E87-D No:6
      Page(s):
    1596-1600

    In this paper, we show experimentally that the quantization noise caused by low-bit-rate speech coding can be characterized as a white noise process. The signal-to-quantization-noise ratio of the decoded speech at a given bit rate is then estimated by finding the artificially generated noisy speech, obtained by adding white Gaussian noise, whose perceptual speech quality is equivalent. This information is incorporated into the parameter tuning of a noise-robust compensation algorithm for speech recognition so that the algorithm performs better over the range of estimated SNRs. Finally, we apply the compensation algorithm to a connected digit string recognition system that uses speech decoded by the GSM adaptive multi-rate (AMR) speech coder. The noise-robust compensation algorithm reduces word error rates by 15% or more in the low-bit-rate modes of the AMR speech coder.
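    The artificially noisy reference speech described above can be produced by scaling white Gaussian noise to a target SNR. A minimal sketch, assuming SNR is measured over the whole utterance (global SNR) and using a synthetic tone as a stand-in for speech; the function names are hypothetical.

```python
import math
import random

def add_white_noise(signal, snr_db, rng):
    """Add white Gaussian noise scaled so that the global SNR equals snr_db."""
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(v * v for v in noise) / len(noise)
    # Rescale this noise realization to power p_signal / 10**(snr_db / 10).
    scale = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * v for s, v in zip(signal, noise)]

def measured_snr_db(clean, noisy):
    """Global SNR of noisy relative to clean, in dB."""
    p_signal = sum(s * s for s in clean)
    p_noise = sum((y - s) ** 2 for s, y in zip(clean, noisy))
    return 10 * math.log10(p_signal / p_noise)

fs = 8000  # sample rate in Hz (assumed)
clean = [0.5 * math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]  # stand-in for speech
noisy = add_white_noise(clean, 10.0, random.Random(1))
print(f"{measured_snr_db(clean, noisy):.2f} dB")  # 10.00 dB
```

    Rescaling the realized noise, rather than relying on its nominal variance, makes the achieved SNR exact for each generated utterance.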

  • Exploring Human Speech Production Mechanisms by MRI

    Kiyoshi HONDA  Hironori TAKEMOTO  Tatsuya KITAMURA  Satoru FUJITA  Sayoko TAKANO  

     
    INVITED PAPER

      Vol:
    E87-D No:5
      Page(s):
    1050-1058

    Recent investigations of human speech organs using magnetic resonance imaging (MRI) have opened new avenues of research. Visualization of the speech production system provides abundant information on the physiological and acoustic realization of human speech. This article summarizes the current status of MRI applications in speech research, as well as our own experience in discovering and re-evaluating acoustic events originating in the vocal tract and the physiological mechanisms behind them.

  • A Study on Acoustic Modeling for Speech Recognition of Predominantly Monosyllabic Languages

    Ekkarit MANEENOI  Visarut AHKUPUTRA  Sudaporn LUKSANEEYANAWIN  Somchai JITAPUNKUL  

     
    PAPER

      Vol:
    E87-D No:5
      Page(s):
    1146-1163

    This paper presents a study on acoustic modeling for speech recognition of predominantly monosyllabic languages. Various speech units used in speech recognition systems are investigated. To evaluate the effectiveness of these acoustic models, Thai is selected, since it is a predominantly monosyllabic language with a complex vowel system. Several experiments were carried out to find a speech unit that yields an accurate acoustic model and a higher recognition rate, and the recognition rates obtained with different acoustic models are compared. In addition, this paper proposes a new speech unit for speech recognition, the onset-rhyme unit. Two models are proposed: the Phonotactic Onset-Rhyme Model (PORM) and the Contextual Onset-Rhyme Model (CORM). In both models, a syllable is made up of a pair of units, an onset and a rhyme. An onset comprises an initial consonant and its transition toward the following vowel; the rhyme consists of a steady vowel segment and a final consonant. Experimental results show that the onset-rhyme model outperforms other speech units. It improves on the accuracy of the inter-syllable triphone model by nearly 9.3% and of the context-dependent Initial-Final model by nearly 4.7% for speaker-dependent systems using only an acoustic model, and by 5.6% and 4.5%, respectively, for speaker-dependent systems using both acoustic and language models. The onset-rhyme models attain a high recognition rate and are also more efficient in terms of system complexity.

  • Speaker Adaptation Method for Acoustic-to-Articulatory Inversion using an HMM-Based Speech Production Model

    Sadao HIROYA  Masaaki HONDA  

     
    PAPER

      Vol:
    E87-D No:5
      Page(s):
    1071-1078

    We present a speaker adaptation method that makes it possible to determine articulatory parameters from an unknown speaker's speech spectrum using an HMM (Hidden Markov Model)-based speech production model. The model consists of HMMs of articulatory parameters for each phoneme and an articulatory-to-acoustic mapping that transforms the articulatory parameters into a speech spectrum for each HMM state. The model is constructed statistically from actual articulatory-acoustic data. In the adaptation method, both the geometrical differences in the vocal tract and the articulatory behavior in the reference model are statistically adjusted to the unknown speaker. First, the articulatory parameters are estimated from the unknown speaker's speech spectrum using the reference model. Second, the articulatory-to-acoustic mapping is adjusted by maximizing the output probability of the acoustic parameters given the estimated articulatory parameters of the unknown speaker. With the adaptation method, the RMS error between the estimated and observed articulatory parameters is 1.65 mm, an improvement rate of 56.1% over the speaker-independent model.

  • Two Methodology-Trials Using Higher Order Correlation for Reverberation Measurement of Noisy Acoustic Room

    Kiminobu NISHIMURA  Mitsuo OHTA  

     
    PAPER-Audio/Speech Coding

      Vol:
    E87-A No:3
      Page(s):
    598-604

    In this paper, we first consider how background noise of an arbitrary distribution type affects the measurement of room acoustics. Two estimation methods are proposed for evaluating the proper reverberation time of a room from raw observed decay curves, which often cannot show a smooth, sufficient 60 dB decay in the low-frequency region, especially when contaminated by background noise. In the first method, an observation equation is derived from a stochastic model based on the well-known Sabine differential equation, which is approximated by a matched difference equation that preserves its original physical meaning and its linearity in the reverberation parameter. The effect of background noise is then eliminated by a generalized state estimation algorithm based on Bayes' theorem. In the second method, the effect of background noise is incorporated into the observation equation of the measurement model, and a mutual information criterion is introduced to estimate the reverberation time, based on the statistical independence of the signal and the background noise. Finally, the effectiveness of the proposed methods is confirmed experimentally by applying them to the measurement of reverberation time in a room under actual living conditions with background noise. However, because the proposed methods actively exploit higher-order correlations beyond linear correlation, they are methodological trials that should coexist with other techniques.
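    For comparison, a conventional way to obtain a reverberation time from a decay curve can be sketched: because Sabine's differential equation implies an exponential energy decay, the level in dB falls linearly with time, so a straight line fitted over a usable range can be extrapolated to a 60 dB drop. The fit range (-5 dB to -25 dB), the noise-free synthetic decay, and the helper name are assumptions. This is neither of the proposed methods; it is the kind of straightforward estimate that becomes unreliable when background noise cuts the usable decay short.

```python
import math

def t60_from_decay(levels_db, dt, start_db=-5.0, end_db=-25.0):
    """Estimate reverberation time by least-squares fitting a line to the part
    of a decay curve (in dB, starting at 0 dB) between start_db and end_db,
    then extrapolating the fitted slope to a 60 dB decay."""
    pts = [(i * dt, lv) for i, lv in enumerate(levels_db) if end_db <= lv <= start_db]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_l = sum(lv for _, lv in pts) / n
    slope = (sum((t - mean_t) * (lv - mean_l) for t, lv in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))  # dB per second
    return -60.0 / slope

# Synthetic exponential energy decay with T60 = 0.8 s (assumed).
t60_true, dt = 0.8, 0.001
levels = [-60.0 * (i * dt) / t60_true for i in range(2000)]
print(round(t60_from_decay(levels, dt), 3))  # 0.8
```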

  • Evaluation of a Novel Signal Processing Strategy for Cochlear Implant Speech Processors

    Erdenebat DASHTSEREN  Shigeyoshi KITAZAWA  Satoshi IWASAKI  Shinya KIRIYAMA  

     
    PAPER-Medical Engineering

      Vol:
    E87-D No:2
      Page(s):
    463-471

    Our study evaluates a novel speech processing strategy for multi-channel cochlear implant speech processors. Stimulation pulse trains for the Nucleus 24CI speech processor were generated differently from the speech processing strategies implemented in this processor. The distinctive features of the novel strategy are: 1) an electrode stimulation order driven by the location of the maximum instantaneous frequency amplitude; 2) variable stimulation rates on electrodes; and 3) a variable number of selected channels within each signal processing cycle. Within-subject tests on Japanese initial, medial, and final consonants in CV, VCV, and CV/N context tokens were carried out with cochlear implant patients using the Cochlear ACE™ strategy, and the results were compared with those of normal-hearing listeners. For initial and medial consonants, performance was significantly better with the novel strategy than with the ACE strategy for both the cochlear implant and normal-hearing groups. For final consonants, cochlear implant listeners performed slightly better with the ACE strategy, whereas normal-hearing listeners performed slightly better with the novel strategy.

  • A Variable Step-Size Adaptive Cross-Spectral Algorithm for Acoustic Echo Cancellation

    Xiaojian LU  Benoit CHAMPAGNE  

     
    PAPER-Digital Signal Processing

      Vol:
    E86-A No:11
      Page(s):
    2812-2821

    The adaptive cross-spectral (ACS) technique recently introduced by Okuno et al. provides an attractive solution to acoustic echo cancellation (AEC), as it does not require double-talk (DT) detection. In this paper, we first introduce a generalized ACS (GACS) technique in which a step-size parameter controls the magnitude of the incremental correction applied to the coefficient vector of the adaptive filter. Based on a study of the effects of the step size on GACS convergence behaviour, a new variable step-size ACS (VSS-ACS) algorithm is proposed, in which the step size is controlled dynamically by a dedicated finite state machine. The proposed algorithm also includes a new adaptation scheme that improves the initial convergence rate when the network connection is established. Experimental results show that the new VSS-ACS algorithm outperforms the original ACS, providing higher acoustic echo attenuation during DT periods and faster convergence.

  • A Hybrid HMM/BN Acoustic Model for Automatic Speech Recognition

    Konstantin MARKOV  Satoshi NAKAMURA  

     
    PAPER-Speech and Speaker Recognition

      Vol:
    E86-D No:3
      Page(s):
    438-445

    In current HMM-based speech recognition systems, it is difficult to supplement acoustic spectrum features with additional information such as pitch, gender, and articulator positions. Bayesian Networks (BN), on the other hand, allow different continuous and discrete features to be combined easily by exploiting the conditional dependencies between them. However, the lack of efficient algorithms has limited their application to continuous speech recognition. In this paper, we propose a new acoustic model in which HMMs model the temporal characteristics of speech and the state probability model is represented by a BN. In our experimental system based on the HMM/BN model, the state BN has, in addition to the speech observation variable, two hidden variables representing the noise type and the SNR value. Evaluation on the AURORA2 database showed a 36.4% word error rate reduction in the closed-noise test, which is comparable to much more complex systems that use effective adaptation and noise-robust methods.
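    One way to read the state model above: the observation likelihood of an HMM state is obtained by marginalizing the state BN over its hidden noise-type and SNR variables. The sketch assumes the two hidden variables are independent with discrete priors and that each (noise, SNR) combination has its own 1-D Gaussian; this structure, the parameter values, and the names are illustrative assumptions, not the paper's network.

```python
import math

def gauss_pdf(x, mean, var):
    """1-D Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def state_likelihood(obs, noise_prior, snr_prior, params):
    """p(obs | state) = sum_n sum_s P(n) P(s) p(obs | state, n, s),
    marginalizing the hidden noise-type (n) and SNR (s) variables.
    params[(n, s)] = (mean, variance) of the observation density."""
    total = 0.0
    for noise, p_n in noise_prior.items():
        for snr, p_s in snr_prior.items():
            mean, var = params[(noise, snr)]
            total += p_n * p_s * gauss_pdf(obs, mean, var)
    return total

# Hypothetical parameters for a single HMM state.
noise_prior = {"babble": 0.5, "car": 0.5}
snr_prior = {"clean": 0.4, "10dB": 0.6}
params = {("babble", "clean"): (0.0, 1.0), ("babble", "10dB"): (0.5, 1.5),
          ("car", "clean"): (0.1, 1.0), ("car", "10dB"): (0.8, 2.0)}
print(state_likelihood(0.3, noise_prior, snr_prior, params))
```

    In decoding, this marginal would take the place of the usual per-state Gaussian mixture density.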

  • On Automatic Speech Recognition at the Dawn of the 21st Century

    Chin-Hui LEE  

     
    INVITED SURVEY PAPER

      Vol:
    E86-D No:3
      Page(s):
    377-396

    During the last three decades of the 20th century, speech recognition research was carried out intensively worldwide, spurred on by advances in signal processing, algorithms, architectures, and hardware. Recognition systems have been developed for a wide variety of applications, ranging from small-vocabulary keyword recognition over dial-up telephone lines, to medium-vocabulary voice-interactive command and control systems for business automation, to large-vocabulary speech dictation, spontaneous speech understanding, and limited-domain speech translation. Although we have witnessed many new technological promises, we have also encountered a number of practical limitations that hinder widespread deployment of applications and services. On one hand, fast progress has been observed in statistical speech and language modeling. On the other hand, only spotty successes have been reported in applying knowledge sources from acoustics, speech, and language science to improve speech recognition performance and robustness to adverse conditions. In this paper, we review some key advances in several areas of speech recognition. We also propose a bottom-up detection framework to facilitate worldwide research collaboration in incorporating advances in both statistical modeling and knowledge integration, so as to go beyond current speech recognition limitations and benefit society in the 21st century.

  • Effects of Nonuniform Acoustic Fields in Vessels and Blood Velocity Profiles on Doppler Power Spectrum and Mean Blood Velocity

    Dali ZHANG  Yoji HIRAO  Yohsuke KINOUCHI  Hisao YAMAGUCHI  Kazuo YOSHIZAKI  

     
    PAPER-Medical Engineering

      Vol:
    E85-D No:9
      Page(s):
    1443-1451

    This paper presents a detailed simulation method for estimating the Doppler power spectrum and mean blood velocity using real CW Doppler transducers with a twin-crystal arrangement. The method divides the sample volume into small cells and groups the cells by Doppler shift frequency to obtain the statistics of the Doppler power spectrum, from which the mean blood velocity is predicted. The acoustic fields of semicircular transducers across blood vessels were calculated, and the effects of acoustical and physiological factors on the Doppler power spectrum and mean blood velocity were analyzed. The results show that the nonuniformity of the ultrasonic beam's acoustic field within the blood vessel and the blood velocity profile significantly affect the Doppler power spectrum and mean blood velocity, whereas these quantities are not sensitive to Doppler angle, vessel depth, or sample volume length. Simulation and experimental results showed good agreement for a parabolic flow profile. These results contribute to a better understanding of the Doppler power spectrum and mean blood velocity in medical ultrasound diagnostics.
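    The link between a Doppler power spectrum and mean blood velocity can be illustrated with the standard CW Doppler equation f_d = 2 f0 v cos(theta) / c: the power-weighted mean Doppler frequency is converted to a mean velocity. The transmit frequency, the speed of sound (1540 m/s, a common soft-tissue value), the angle, and the toy spectrum are assumed values; the paper's cell-based simulation of nonuniform acoustic fields is not reproduced.

```python
import math

def mean_velocity(freqs_hz, power, f0_hz, angle_deg, c=1540.0):
    """Mean blood velocity (m/s) from a Doppler power spectrum, using the
    power-weighted mean frequency and f_d = 2 * f0 * v * cos(theta) / c."""
    f_mean = sum(f * p for f, p in zip(freqs_hz, power)) / sum(power)
    return c * f_mean / (2.0 * f0_hz * math.cos(math.radians(angle_deg)))

# Toy spectrum (assumed): equal power at 1, 2 and 3 kHz, so the mean is 2 kHz.
v = mean_velocity([1000.0, 2000.0, 3000.0], [1.0, 1.0, 1.0], f0_hz=5e6, angle_deg=60.0)
print(f"{v:.4f} m/s")  # 0.6160 m/s
```

    A nonuniform beam weights the spectrum unevenly across the vessel, which is how it can bias this power-weighted mean.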

  • Novel Formulation for the Scalar-Field Approach of IE-MEI Method to Solve the Three-Dimensional Scattering Problem

    N. M. Alam CHOWDHURY  Jun-ichi TAKADA  Masanobu HIROSE  

     
    PAPER-Ultrasonics

      Vol:
    E85-A No:8
      Page(s):
    1905-1912

    A novel formulation for the scalar-field approach of the Integral Equation formulation of the Measured Equation of Invariance (SIE-MEI) is derived from the scalar reciprocity relation to solve the scalar Helmholtz equation. The basics of this formulation are similar to those of the IE-MEI method for electromagnetic (EM) problems: the surface integral equation is derived from the reciprocity relation, and on-surface MEI postulates are used. As a result, the method generates a sparse linear system with the same number of unknowns as the Boundary Element Method (BEM) while keeping the advantages of minimal memory storage and low CPU time for computing the final matrix. The IE-MEI method has been proposed for two-dimensional (2D) electromagnetic problems, but it is very difficult to extend to three-dimensional (3D) problems. The scalar-field approach is identical to the electromagnetic formulation in 2D but, unlike the EM case, extends easily to 3D scalar-field scattering problems. Numerical results for a sphere and a cube are verified against rigorous or numerical reference solutions and show excellent agreement.

  • SS-CDMA Flexible Wireless Network: Implementation of Approximately Synchronized CDMA Modem for Uplink

    Suguru KAMEDA  Kouichi TAKAHASHI  Hiroyuki NAKASE  Kazuo TSUBOUCHI  

     
    PAPER-Spread Spectrum Technologies and Applications

      Vol:
    E85-A No:3
      Page(s):
    694-702

    We have proposed an intracell uplink for a spread-spectrum code-division multiple-access (SS-CDMA) flexible wireless network based on approximately synchronized (AS) CDMA. Since AS-CDMA has no co-channel interference, complicated transmission power control (TPC) is not required. An AS-CDMA modem has been designed and implemented for the Japanese 2.4 GHz industrial, scientific and medical (ISM) band. With the implemented modem, the degradation of Eb/N0 from the theoretical limit is 1.0 dB at a bit error rate (BER) of 10^-3. In a 2-user environment, the degradation of carrier-to-noise ratio (CNR) is 0.5 dB at a BER of 10^-3 when the desired-to-undesired signal ratio (DUR) is -20.3 dB. We evaluated BER performance by computer simulation for varying carrier frequency offsets and median DURs. In an 8-user environment with a carrier frequency offset of 0.3 ppm, the BER at a DUR of -16 dB is 10^-3. Using AS-CDMA with a 4-step open-loop TPC technique, an intracell uplink design is feasible.
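    To put the 1.0 dB degradation in context, the theoretical limit can be computed if one assumes coherent BPSK (or Gray-coded QPSK) in AWGN, where BER = 0.5 erfc(sqrt(Eb/N0)). The abstract does not state the modulation, so this is an assumption; the sketch inverts the formula by bisection.

```python
import math

def ber_bpsk(ebn0_db):
    """Theoretical BER of coherent BPSK in AWGN."""
    return 0.5 * math.erfc(math.sqrt(10 ** (ebn0_db / 10)))

def required_ebn0_db(target_ber, lo=0.0, hi=20.0, iters=60):
    """Eb/N0 (dB) at which ber_bpsk equals target_ber. BER decreases
    monotonically with Eb/N0, so bisection applies on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ber_bpsk(mid) > target_ber:
            lo = mid  # BER still too high: more Eb/N0 is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)

limit = required_ebn0_db(1e-3)
print(f"theoretical: {limit:.2f} dB, with 1.0 dB degradation: {limit + 1.0:.2f} dB")
```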

  • A Survey on Automatic Speech Recognition

    Seiichi NAKAGAWA  

     
    INVITED SURVEY PAPER-Speech and Hearing

      Vol:
    E85-D No:3
      Page(s):
    465-486

    In this paper, we describe recent trends in automatic speech recognition. First, we point out that the current state of the art in machine speech recognition is admittedly inferior to human ability; in particular, we assert that acoustic models must be improved. Second, we describe robust feature parameters for noisy environments, which are important in practical use, and show that large amounts of training data collected in the same environment as the recognition stage are useful from the viewpoints of information theory and pattern recognition. Third, we discuss acoustic models and language models, which are central issues in speech recognition, including the principles and limitations of the hidden Markov model (HMM) and recent extended models. The role of language models is to eliminate improbable candidate words, that is, to reduce the search space; in other words, language models with smaller entropy are preferable. From this standpoint, we survey stochastic language models. Finally, we note some points that deserve attention when constructing speech recognition systems.
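    The preference for language models with smaller entropy is usually made concrete through test-set perplexity, 2 raised to the per-word cross-entropy in bits; lower perplexity means a smaller effective search space. A minimal sketch with a hypothetical add-one-smoothed bigram model trained on a toy corpus; the corpus and function names are illustrative assumptions.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Add-one (Laplace) smoothed bigram model; returns a function P(w | prev)."""
    vocab = {w for s in sentences for w in s} | {"</s>"}
    history_counts, pair_counts = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s + ["</s>"]
        for prev, w in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            pair_counts[(prev, w)] += 1
    v = len(vocab)
    return lambda prev, w: (pair_counts[(prev, w)] + 1) / (history_counts[prev] + v)

def perplexity(prob, sentences):
    """2 ** (average negative log2 probability per predicted token)."""
    neg_log2, count = 0.0, 0
    for s in sentences:
        tokens = ["<s>"] + s + ["</s>"]
        for prev, w in zip(tokens, tokens[1:]):
            neg_log2 -= math.log2(prob(prev, w))
            count += 1
    return 2.0 ** (neg_log2 / count)

def uniform(prev, w):
    """Uniform model over the same 4-word vocabulary (call, home, office, </s>)."""
    return 0.25

train = [["call", "home"], ["call", "office"], ["call", "home"]]
model = train_bigram(train)
test = [["call", "home"]]
print(perplexity(model, test), perplexity(uniform, test))
```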

  • Tone Enhancement in Mandarin Speech for Listeners with Hearing Impairment

    Jian LU  Norihiro UEMI  Gang LI  Tohru IFUKUBE  

     
    PAPER-Speech and Hearing

      Vol:
    E84-D No:5
      Page(s):
    651-661

    In this paper, a digital processing method is described for modifying tone contrast, defined as the greatest frequency difference between the peaks and valleys of pitch curves in monosyllabic utterances. Modified Mandarin tone words were presented, against quiet and noisy backgrounds, to hearing-impaired Chinese listeners with moderate to severe sensorineural hearing loss. The listeners were asked to identify which of four alternative monosyllabic words, carrying tones 1, 2, 3, and 4 respectively, they had heard. With this method, speech with enhanced tone contrast yielded moderate gains in the percentage of correctly identified tones compared with unmodified speech processed with compression amplification only. Conversely, reducing tone contrast generally lowered correct tone identification. These findings support the assertion that a hearing aid with tone modification is effective for hearing-impaired Chinese listeners.
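    Tone contrast as defined above is the range between the highest peak and the lowest valley of the F0 contour. A minimal sketch of one possible way to enhance or reduce it, by linearly scaling the contour about its mean F0; the scaling rule, the factor `k`, and the sample contour are assumptions, not the paper's processing method, and resynthesizing speech with the modified contour is not shown.

```python
def tone_contrast(f0_hz):
    """Greatest difference between the peaks and valleys of an F0 contour (Hz)."""
    return max(f0_hz) - min(f0_hz)

def scale_tone_contrast(f0_hz, k):
    """Scale an F0 contour about its mean by factor k: k > 1 enhances and
    k < 1 reduces tone contrast, while the mean F0 is preserved.
    (Large k can push values toward zero; no clamping is done here.)"""
    mean = sum(f0_hz) / len(f0_hz)
    return [mean + k * (f - mean) for f in f0_hz]

# Hypothetical dipping contour, roughly like Mandarin tone 3 (Hz).
contour = [180.0, 160.0, 140.0, 130.0, 150.0, 190.0]
enhanced = scale_tone_contrast(contour, 1.5)
reduced = scale_tone_contrast(contour, 0.5)
print(tone_contrast(contour), tone_contrast(enhanced), tone_contrast(reduced))
```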

  • Sharp Directivity Function Based on Fourier Series Expansion and Its Directional System Realization with Small Number of Microphones

    Masataka NAKAMURA  Toshitaka YAMATO  Katsuhito KOUNO  Atsuyuki TAKASHIMA  

     
    PAPER

      Vol:
    E84-A No:4
      Page(s):
    975-983

    For a speech recognition system to achieve a high recognition rate in a noisy environment, a wide-band, sharply directional microphone system is required at the input to secure a high S/N ratio. The authors have already reported a wide-band unidirectional microphone system realized by a three-microphone integration method. In this paper, we describe the derivation of a sharp directivity function and the realization of a corresponding microphone system. First, we derive a new directivity function by choosing the shape of a characteristic function that yields a sharp directional pattern and expanding it into a Fourier series. Next, based on this directivity function, we present a sharply directional microphone system that uses only three omnidirectional microphones followed by analog signal processing. The directional pattern obtained by the proposed method and the effect of sensitivity dispersion among the constituent microphones on the directivity are also discussed in detail.
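    The Fourier-series step can be illustrated with a hypothetical target: a pattern equal to 1 within a half-width theta0 of the front direction and 0 elsewhere, expanded in a cosine series and truncated at a low order. The rectangular target shape, theta0, and the truncation order are assumptions; the abstract does not give the paper's characteristic function or how many terms a three-microphone system realizes.

```python
import math

def directivity_coeffs(theta0, order):
    """Fourier cosine coefficients of a target pattern equal to 1 for
    |theta| <= theta0 and 0 elsewhere on (-pi, pi]."""
    coeffs = [theta0 / math.pi]
    coeffs += [2.0 * math.sin(n * theta0) / (n * math.pi) for n in range(1, order + 1)]
    return coeffs

def directivity(theta, coeffs):
    """Truncated series D(theta) = sum_n a_n cos(n * theta), normalized so D(0) = 1."""
    raw = sum(a * math.cos(n * theta) for n, a in enumerate(coeffs))
    return raw / sum(coeffs)

coeffs = directivity_coeffs(math.pi / 4, order=2)  # 45-degree half-width (assumed)
for deg in (0, 45, 90, 180):
    print(deg, round(directivity(math.radians(deg), coeffs), 3))
```

    Raising the truncation order sharpens the main lobe toward the target shape, at the cost of more terms to realize in hardware.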

  • Active Noise Control System in a Duct with Partial Feedback Canceller

    Takuya AOKI  Tatsuya MORISHITA  Toshiyuki TANAKA  Masao TAKI  

     
    PAPER-Active Noise Control

      Vol:
    E84-A No:2
      Page(s):
    400-405

    The application of an active noise control system in a finite-length duct is studied. Previously proposed single-input-single-output systems are inappropriate in this case, because reflection at the terminals degrades the performance, and/or infinite-impulse-response filters are required for perfect noise cancellation. In this paper, we propose a single-input-single-output system applicable to finite-length ducts, which theoretically achieves perfect noise cancellation while using finite-impulse-response filters only. The tap lengths of the filters are as short as the delays between the reference sensor and the secondary source. A useful implementation of the proposed system is also discussed.
