Author Search Result

[Author] Dae-hee YOUN(20hit)

1-20hit
  • Efficient FFT Algorithm for Psychoacoustic Model of the MPEG-4 AAC

    Jae-Seong LEE  Chang-Joon LEE  Young-Cheol PARK  Dae-Hee YOUN  

     
    LETTER-Speech and Hearing

    Vol: E92-D  No: 12  Page(s): 2535-2539

    This paper proposes an efficient FFT algorithm for the Psycho-Acoustic Model (PAM) of MPEG-4 AAC. The proposed algorithm synthesizes FFT coefficients using MDCT and MDST coefficients through circular convolution. The complexity of computing the MDCT and MDST coefficients is approximately half that of the original FFT. We also design a new PAM based on the proposed FFT algorithm, which has 15% lower computational complexity than the original PAM without degradation of sound quality. Subjective as well as objective test results are presented to confirm the efficiency of the proposed FFT computation algorithm and the PAM.

  • Approximated Virtual Source Imaging System for a Pair of Closely Spaced Loudspeakers

    Jae-woong JEONG  Young-cheol PARK  Dae-hee YOUN  

     
    LETTER-Speech and Hearing

    Vol: E97-D  No: 9  Page(s): 2526-2529

    This paper presents an approximated virtual source imaging system based on crosstalk cancellation with a pair of closely spaced loudspeakers. Utilizing the frequency-dependent relative importance of sound localization cues, the proposed system provides separate approximations for the low- and high-frequency bands. Experimental results show that the system provides good approximations within ±55° in the stereo dipole setup with natural sound quality.

  • On Improving the Performance of a Speech Model-Based Blind Reverberation Time Estimation in Noisy Environments

    Tung-chin LEE  Young-cheol PARK  Dae-hee YOUN  

     
    LETTER-Measurement Technology

    Vol: E97-A  No: 12  Page(s): 2688-2692

    This paper proposes a method of improving the performance of blind reverberation time (RT) estimation in noisy environments. RT estimation is conducted using a maximum likelihood (ML) method based on the autocorrelation function of the linear predictive residual signal. To reduce the effect of environmental noise, a noise reduction technique is applied to the noisy speech signal. In addition, a frequency coefficient selection is performed to eliminate signal components with low signal-to-noise ratio (SNR). Experimental results confirm that the proposed method improves the accuracy of RT measures, particularly when the speech signal is corrupted by a colored noise with a narrow bandwidth.

  • LP/WLP Hybrid Scheme for Quality Improvement of TCX Coders Operating at Low Bit Rates

    Tung-chin LEE  Young-cheol PARK  Dae-hee YOUN  

     
    LETTER-Speech and Hearing

    Vol: E95-D  No: 7  Page(s): 2017-2020

    In this paper, we propose a switchable linear prediction (LP)/warped linear prediction (WLP) hybrid scheme for the transform coded excitation (TCX) coder, which is adopted as a core codec in AMR-WB+ and USAC. The proposed algorithm selects either an LP or WLP filter on a per-frame basis. To provide smooth transitions between LP and WLP frames, a window switching scheme is developed using sine and rectangular windows. In addition, a Gaussian Mixture Model (GMM)-based classification module is used to determine the prediction mode. A subjective listening test confirmed that the proposed LP/WLP switching scheme offers improved sound quality.

  • Adaptive Microphone Array System with Two-Stage Adaptation Mode Controller

    Yang-Won JUNG  Hong-Goo KANG  Chungyong LEE  Dae-Hee YOUN  Changkyu CHOI  Jaywoo KIM  

     
    PAPER-Digital Signal Processing

    Vol: E88-A  No: 4  Page(s): 972-977

    In this paper, an adaptive microphone array system with a two-stage adaptation mode controller (AMC) is proposed for high-quality speech acquisition in real environments. The proposed system includes an adaptive array algorithm, a time-delay estimator and a newly proposed AMC. To ensure proper adaptation of the adaptive array algorithm, the proposed AMC uses not only temporal information, but also spatial information. The proposed AMC is constructed with two processing stages: an initialization stage and a running stage. In the initialization stage, a sound source localization technique is adopted, and a signal correlation characteristic is used in the running stage. For the adaptive array algorithm, a generalized sidelobe canceller with an adaptive blocking matrix is used. The proposed algorithm is implemented as a real-time man-machine interface module of a home-agent robot. Simulation results show a 13 dB SINR improvement with the speaker sitting 2 m from the home-agent robot. The speech recognition rate is also enhanced by 32% compared to the single-channel acquisition system.

  • An Efficient Active Noise Control Algorithm Based on the Lattice-Transversal Joint (LTJ) Filter Structure

    Jeong-Hyeon YUN  Young-Cheol PARK  Dae-Hee YOUN  Il-Whan CHA  

     
    LETTER-Digital Signal Processing

    Vol: E81-A  No: 8  Page(s): 1755-1757

    An efficient active noise control algorithm based on the lattice-transversal joint (LTJ) filter structure is presented and applied to the active control of broadband noise in a 3-dimensional enclosure. The presented algorithm implements the filtered-x LMS within the LTJ structure obtained by cascading the lattice and transversal structures. Simulation results show that the LTJ-based noise control algorithm has a fast convergence speed comparable to that of the lattice-based algorithm, while its computational complexity is lower.

  • Design of Time-Varying Reverberators for Low Memory Applications

    Tacksung CHOI  Young-Cheol PARK  Dae-Hee YOUN  

     
    LETTER-Music Information Processing

    Vol: E91-D  No: 2  Page(s): 379-382

    Development of an artificial reverberator for low-memory requirements is an issue of importance in applications such as mobile multimedia devices. One possibility is to use an All-Pass Filter (APF), which is embedded in the feedback loop of the comb filter network. In this paper, we propose a reverberator employing time-varying APFs to increase the reverberation performance. By changing the gain of the APF, we can increase the number of frequency peaks perceptually. Thus, the resulting reverberation sounds much more natural, even with less memory, than the conventional approach. In this paper, we perform theoretical and perceptual analyses of artificial reverberators employing time-varying APF. Through the analyses, we derive the degree of phase variation of the APF that is perceptually acceptable. Based on the analyses, we propose a method of designing artificial reverberators associated with the time-varying APFs. Through subjective tests, it is shown that the proposed method is capable of providing perceptually comparable sound quality to the conventional methods even though it uses less memory.
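
The all-pass section mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a Schroeder-style all-pass filter, y[n] = -g x[n] + x[n-D] + g y[n-D], and a sinusoidal gain modulation. The function name, the modulation law, and all parameter values are hypothetical and are not taken from the paper, which derives its own limits on the variation.

```python
import math

def time_varying_allpass(x, delay, base_gain, depth=0.0, mod_period=1000):
    """Schroeder all-pass section with a slowly modulated gain.

    With depth=0 this is an ordinary fixed all-pass filter. The sinusoidal
    modulation is an illustrative assumption, not the paper's design.
    """
    y = []
    for n, xn in enumerate(x):
        # Gain at time n; keep |g| < 1 so the feedback loop stays stable.
        g = base_gain + depth * math.sin(2.0 * math.pi * n / mod_period)
        x_del = x[n - delay] if n >= delay else 0.0
        y_del = y[n - delay] if n >= delay else 0.0
        y.append(-g * xn + x_del + g * y_del)
    return y

# Impulse responses of a fixed and a modulated section (hypothetical values).
impulse = [1.0] + [0.0] * 4999
fixed = time_varying_allpass(impulse, delay=7, base_gain=0.6)
varying = time_varying_allpass(impulse, delay=7, base_gain=0.6,
                               depth=0.1, mod_period=500)

# A fixed all-pass filter preserves the energy of its input.
fixed_energy = sum(v * v for v in fixed)
print(round(fixed_energy, 6))
```

In the fixed case the impulse response has unit energy, which is the all-pass property. Once the gain varies over time the filter is no longer exactly all-pass, which is why the amount of variation has to be bounded perceptually.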

  • A New Method for Calibration of NLOS Error in Positioning Systems

    Yangseok JEONG  Heungryeol YOU  Dae-Hee YOUN  Chungyong LEE  

     
    LETTER-Sensing

    Vol: E85-B  No: 5  Page(s): 1056-1058

    In positioning systems, NLOS (Non-Line of Sight) errors always cause a remarkable positive bias and directly increase range measurement errors. In this paper, a new method is proposed to calibrate NLOS errors in positioning systems by using the relationship between the mean excess delay and the delay spread measured at a mobile station. Computer simulations showed that the proposed calibration technique effectively reduces the positioning error caused by urban NLOS environments.

  • Efficient Windowing Scheme for MDCT-Based TCX in AMR-WB+

    Jae-seong LEE  Young-cheol PARK  Dae-hee YOUN  Kyung-ok KANG  

     
    LETTER-Speech and Hearing

    Vol: E94-D  No: 6  Page(s): 1341-1344

    Although the AMR-WB+ coder provides excellent quality for speech signals, its coding model for music signals is not as good as that of HE-AAC v2. The main causes of the poor quality of the AMR-WB+ TCX are non-critical sampling and blocking artifacts. The new TCX windowing scheme proposed in this paper uses an MDCT with a 50% frame overlap, so that the problems of non-critical sampling and blocking artifacts are significantly mitigated. Due to the long overlaps, the proposed scheme involves an additional codec delay. It is, however, moderate for audio services. The results of objective and subjective tests indicate that the proposed scheme achieves noticeable quality improvements for music signals over the previous TCX schemes.

  • Improving the Performance of the Minimum Statistics Noise Estimator for Single Channel Speech Enhancement

    Seung-Kyun RYU  Hong-Goo KANG  Sung-Kyo JUNG  Dae-Hee YOUN  

     
    LETTER-Speech and Hearing

    Vol: E88-A  No: 2  Page(s): 582-585

    This paper proposes an algorithm to improve the performance of noise power spectrum estimation using minimum statistics (MS). The minimum statistics noise estimator (MSNE), which is most efficient for speech enhancement, often underestimates the noise power when the signal characteristics change abruptly. The proposed algorithm improves the accuracy of noise estimation by removing harmonic components of the speech signal. Simulation results verify that the proposed algorithm performs better than the conventional algorithm in terms of the segmental SNR (SegSNR) and the spectral distance (SD).
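
The core idea of minimum statistics noise estimation can be sketched as follows: smooth the power in one frequency bin over time, then take the minimum of the smoothed values over a sliding window as the noise estimate. This is a simplified illustration only. The function name, the smoothing constant, and the window length are assumptions, and the sketch omits further steps of the full method, such as bias compensation, as well as the harmonic removal this paper proposes.

```python
from collections import deque

def minimum_statistics_noise(power_frames, smoothing=0.85, window=8):
    """Noise power estimate for one frequency bin.

    power_frames: per-frame power values |Y(k)|^2 for a single bin.
    Returns, for each frame, the minimum of the recursively smoothed
    power over the last `window` frames.
    """
    smoothed = None
    recent = deque(maxlen=window)  # holds the last `window` smoothed values
    estimates = []
    for p in power_frames:
        if smoothed is None:
            smoothed = p
        else:
            smoothed = smoothing * smoothed + (1.0 - smoothing) * p
        recent.append(smoothed)
        estimates.append(min(recent))
    return estimates

# Hypothetical data: a noise floor of 1.0 with a short, louder speech burst.
frames = [1.0] * 10 + [10.0] * 4 + [1.0] * 10
estimates = minimum_statistics_noise(frames)
print(estimates[13])
```

Because the burst is shorter than the window, the minimum still comes from noise-only frames and the estimate stays at the floor during speech.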

  • Bark Coherence Function for Speech Quality Evaluation over CDMA System

    Sang-Wook PARK  Seung-Kyun RYU  Dae-Hee YOUN  

     
    LETTER-Speech and Hearing

    Vol: E85-D  No: 1  Page(s): 283-285

    A new objective speech quality measure, the Bark Coherence Function, is presented. The Coherence Function was used for evaluating the non-linear distortion of low-to-medium rate speech coders. However, it is not well suited for quality estimation in modern speech transmission, especially in CDMA mobile communication systems. In the proposed method, the Coherence Function is newly defined in the psycho-acoustic domain as the cognition module of a perceptual speech quality measure and evaluates the perceptual non-linear distortion of the mobile system. The experimental results showed that the proposed method performs well over CDMA PCS and digital cellular systems.

  • Duration Modeling Using Cumulative Duration Probability

    Tae-Young YANG  Chungyong LEE  Dae-Hee YOUN  

     
    LETTER-Speech and Hearing

    Vol: E85-D  No: 9  Page(s): 1452-1454

    A duration modeling technique is proposed for the HMM-based connected digit recognizer. The proposed duration modeling technique uses a cumulative duration probability. The cumulative duration probability is defined as the partial sum of the duration probabilities, which can be estimated from the training speech data. Two approaches to using it are presented. First, the cumulative duration probability is used as a weighting factor on the state transition probability of the HMM. Second, it replaces the conventional state transition probability. In both approaches, the cumulative duration probability is incorporated directly into the Viterbi decoding procedure. A modified Viterbi decoding procedure is also presented. One of the advantages of the proposed duration modeling technique is that the cumulative duration probability rules the transitions of states and words at each frame. Therefore, an additional post-procedure is not required. The proposed technique was examined by recognition experiments on Korean connected digits. Experimental results showed that the two approaches achieved almost the same performance and that the average recognition accuracy was enhanced from 83.60% to 93.12%.
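
As a rough illustration of the definition above, the sketch below estimates duration probabilities from a list of observed state durations and forms their partial sums. The function names and the training data are hypothetical. The direction of the partial sum (durations up to and including d) is an assumption, and the way the result enters the Viterbi recursion is not shown because the abstract does not specify it.

```python
from collections import Counter

def duration_probabilities(durations):
    """Relative frequency of each duration 1..max(durations), in frames."""
    counts = Counter(durations)
    total = len(durations)
    return [counts.get(d, 0) / total for d in range(1, max(durations) + 1)]

def cumulative_duration_probability(probs):
    """Partial sums of the duration probabilities (assumed: sum over tau <= d)."""
    cumulative = []
    running = 0.0
    for p in probs:
        running += p
        cumulative.append(running)
    return cumulative

# Hypothetical state durations (in frames) collected from training data.
training_durations = [2, 3, 3, 4, 4, 4, 5, 5, 6, 3]
probs = duration_probabilities(training_durations)
cdp = cumulative_duration_probability(probs)
print([round(c, 2) for c in cdp])
```

Index d-1 of `cdp` holds the estimated probability that the state lasts at most d frames, so the values rise monotonically to 1.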

  • Voice Conversion Using Low Dimensional Vector Mapping

    Ki-Seung LEE  Won DOH  Dae-Hee YOUN  

     
    PAPER-Speech and Hearing

    Vol: E85-D  No: 8  Page(s): 1297-1305

    In this paper, a new voice personality transformation algorithm which uses the vocal tract characteristics and pitch period as feature parameters is proposed. The vocal tract transfer function is divided into time-invariant and time-varying parts. Conversion rules for the time-varying part are constructed by the classified-linear transformation matrix based on soft-clustering techniques for LPC cepstrum expressed in KL (Karhunen-Loève) coefficients. An excitation signal containing prosodic information is transformed by average pitch ratio. In order to improve the naturalness, transformation on the excitation signal is separately applied to voiced and unvoiced bands to preserve the overall spectral structure. Objective tests show that the distance between the LPC cepstrum of a target speaker and that of the speech synthesized using the proposed method is reduced by about 70% compared with the distance between the target speaker's LPC cepstrum and the source speaker's. Also, subjective listening tests show that 60-70% of listeners identify the transformed speech as the target speaker's.

  • Low Complexity Reverselink Beamforming Based on Simplex Downhill Optimization Method for CDMA Systems

    Joonsung LEE  Changheon OH  Chungyong LEE  Dae-Hee YOUN  

     
    LETTER-Antenna and Propagation

    Vol: E86-B  No: 8  Page(s): 2541-2544

    A new beamforming method based on the simplex downhill optimization process is presented for reverse-link CDMA systems. The proposed system performs code-filtering at each antenna for each user. The new beamforming method requires less computation and converges faster than existing algorithms. The simulation results show that the proposed algorithm has a better BER performance in the case of a time-varying channel.

  • Performance Comparison of Single and Multi-Stage Algebraic Codebooks

    Sung-Kyo JUNG  Hong-Goo KANG  Dae-Hee YOUN  

     
    LETTER-Speech and Hearing

    Vol: E86-A  No: 12  Page(s): 3288-3290

    This letter presents the advantages of a cascaded algebraic codebook structure at relatively high bit-rates. The cascaded structure that consists of two stages provides flexible pulse combinations due to an additional gain term in the second stage. The perceptual quality of the cascaded structure can be further improved by using a gain re-estimation scheme. Experiments confirm that the cascaded structure has a big advantage in terms of quality and complexity as the bit-rate becomes higher.

  • A Grammatical Structure of the FSN for the Recognition of Korean Price Sentences

    Jeong-Pyo HAM  Tae-Young YANG  Chungyong LEE  Dae-Hee YOUN  

     
    LETTER-Speech and Hearing

    Vol: E84-D  No: 11  Page(s): 1577-1579

    In this letter, we propose a grammatical structure of the finite state network (FSN) for the recognition of Korean price sentences. It is implemented by arranging the nodes and the arcs of the FSN. Two kinds of grammatical structure are presented. Both are designed according to the grammar constraints of Korean price sentences. The grammar constraints of Korean price sentences are similar to those of English price sentences: the unit is placed after the digit; several digits form a basic group; the basic group appears recursively followed by meta-units; etc. Speaker-independent recognition experiments were conducted, and the results of the FSNs with the proposed grammatical structures were compared with those of the FSN without a grammatical structure.

  • A GMM-Based Feature Selection Algorithm for Multi-Class Classification

    Tacksung CHOI  Sunkuk MOON  Young-cheol PARK  Dae-hee YOUN  Seokpil LEE  

     
    LETTER-Pattern Recognition

    Vol: E92-D  No: 8  Page(s): 1584-1587

    In this paper, we propose a new feature selection algorithm for multi-class classification. The proposed algorithm is based on Gaussian mixture models (GMMs) of the features, and it uses the distance between the two least separable classes as a metric for feature selection. The proposed system was tested with a support vector machine (SVM) for multi-class classification of music. Results show that the proposed feature selection scheme is superior to conventional schemes.

  • Speaker Adaptation Based on a Maximum Observation Probability Criterion

    Tae-Young YANG  Chungyong LEE  Dae-Hee YOUN  

     
    LETTER-Speech and Hearing

    Vol: E84-D  No: 2  Page(s): 286-288

    A speaker adaptation technique that maximizes the observation probability of the input speech is proposed. It is applied to semi-continuous hidden Markov model (SCHMM) speech recognizers. The proposed algorithm adapts the mean µ and the covariance Σ iteratively by a gradient search technique so that the features of the adaptation speech data achieve maximum observation probabilities. The mixture coefficients and the state transition probabilities are adapted by a model interpolation scheme. The main advantage of this scheme is that the means and the variances, which are common to all states in the SCHMM, are adapted independently of the other parameters of the SCHMM. It allows fast and precise adaptation, especially when there is a large acoustic mismatch between the reference model and a new speaker. This scheme could also be adopted in other areas that use a codebook. The proposed adaptation algorithm was evaluated with male speaker-dependent, female speaker-dependent, and speaker-independent recognizers. The experimental results on isolated word recognition showed that the proposed adaptation algorithm achieved a 46.03% average enhancement in the male speaker-dependent recognizer, 52.18% in the female speaker-dependent recognizer, and 9.84% in the speaker-independent recognizer.

  • A Very Low Complexity VSELP Speech Coder Using Regular Pulse Basis Vectors

    Yong-Soo CHOI  Hong-Goo KANG  Jae-Ha YOO  Il-Whan CHA  Dae-Hee YOUN  

     
    PAPER

    Vol: E80-A  No: 6  Page(s): 996-1001

    This paper describes a new Vector Sum Excited Linear Prediction (VSELP) coder with very low complexity. The method, called regular pulse VSELP (RP-VSELP), is based on regular pulse basis vectors with a mutually orthonormal property. In this approach, a very efficient vector-sum codebook is constructed from a set of mutually orthonormal regular pulse basis vectors, which enables us to simplify the codebook search without additional degradation of the synthesized speech compared with that of the conventional VSELP. The regular pulse basis vectors are explicitly orthonormalized by means of the Gram-Schmidt procedure. To enhance the speech quality of the RP-VSELP speech coder, a perceptually weighted distortion measure between the input and the synthesized speech is utilized in an iterative closed-loop training process of the regular pulse basis vectors. It is shown that speech quality is improved by the training process. Experimental results demonstrate that the proposed method produces synthesized speech quality comparable to that of the VSELP scheme at a bit rate of 4.8 kbps.
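
The following minimal sketch suggests why an orthonormal basis can simplify a vector-sum codebook search. It orthonormalizes a few pulse-pattern vectors with the Gram-Schmidt procedure, then compares a sign-by-sign search with an exhaustive search over all sign combinations. The pulse patterns, the target, and the function names are hypothetical. The sketch also ignores the codebook gain and the perceptually weighted synthesis filter used in a real coder, so it is not the paper's search procedure.

```python
import itertools
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for q in basis:
            c = dot(w, q)  # component of w along an earlier basis vector
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis

def vector_sum(basis, signs):
    """Codevector formed as the signed sum of the basis vectors."""
    return [sum(s * q[i] for s, q in zip(signs, basis))
            for i in range(len(basis[0]))]

def squared_error(target, candidate):
    return sum((t - c) ** 2 for t, c in zip(target, candidate))

def fast_sign_search(target, basis):
    """With an orthonormal basis every codevector has the same energy, so
    each sign can be chosen independently from one correlation."""
    return [1 if dot(target, q) >= 0 else -1 for q in basis]

# Hypothetical pulse patterns and target vector.
pulse_patterns = [
    [1, 0, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 0, 1, 0, 1, 0, 1, 0],
]
basis = gram_schmidt(pulse_patterns)
target = [0.9, -0.4, 0.3, 0.7, -1.1, 0.2, 0.5, -0.6]

fast_signs = fast_sign_search(target, basis)
best_signs = min(itertools.product([1, -1], repeat=len(basis)),
                 key=lambda s: squared_error(target, vector_sum(basis, s)))
print(fast_signs, list(best_signs))
```

The fast search needs one correlation per basis vector, whereas the exhaustive search evaluates all 2^M sign combinations for M basis vectors. Under the stated simplifications both reach the same minimum error.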

  • A Robust Room Inverse Filtering Algorithm for Speech Dereverberation Based on a Kurtosis Maximization

    Jae-woong JEONG  Young-cheol PARK  Dae-hee YOUN  Seok-Pil LEE  

     
    LETTER-Speech and Hearing

    Vol: E93-D  No: 5  Page(s): 1309-1312

    In this paper, we propose a robust room inverse filtering algorithm for speech dereverberation based on a kurtosis maximization. The proposed algorithm utilizes a new normalized kurtosis function that nonlinearly maps the input kurtosis onto a finite range from zero to one, which results in a kurtosis warping. Due to the kurtosis warping, the proposed algorithm provides more stable convergence and, in turn, better performance than the conventional algorithm. Experimental results are presented to confirm the robustness of the proposed algorithm.