
Author Search Result

[Author] Hong-Goo KANG (7 hits)

1-7hit
  • Refinement of Landmark Detection and Extraction of Articulator-Free Features for Knowledge-Based Speech Recognition

    Jung-In LEE  Jeung-Yoon CHOI  Hong-Goo KANG  

     
    LETTER-Speech and Hearing
    Vol: E96-D  No: 3  Page(s): 746-749

    Refinement methods for landmark detection and extraction of articulator-free features for a knowledge-based speech recognition system are described. Sub-band energy difference profiles are used to detect landmarks, with additional parameters used to improve accuracy. For articulator-free feature extraction, duration, relative energy, and silence detection are additionally used to find [continuant] and [strident] features. Vowel, obstruent and sonorant consonant landmarks, and locations of voicing onsets and offsets are detected within a unified framework with 85% accuracy overall. Additionally, 75% and 79% of [continuant] and [strident] features, respectively, are detected from landmarks.
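As a rough illustration of how an energy difference profile can expose landmarks, the sketch below takes one sub-band energy contour (in dB, one value per frame), differences it across a short lag, and marks frames where the change is abrupt. The function names, the lag of 2 frames, and the 9 dB threshold are assumptions chosen for the example; the abstract does not give the letter's actual parameters or refinement steps.

```python
def energy_difference_profile(energies_db, lag=2):
    """Difference E[t+lag] - E[t-lag] of one sub-band energy contour in dB."""
    n = len(energies_db)
    profile = [0.0] * n
    for t in range(lag, n - lag):
        profile[t] = energies_db[t + lag] - energies_db[t - lag]
    return profile


def find_landmarks(profile, threshold_db=9.0):
    """Return (frame, sign) pairs where the profile has an abrupt rise or fall."""
    landmarks = []
    for t in range(1, len(profile) - 1):
        value = profile[t]
        magnitude = abs(value)
        if magnitude < threshold_db:
            continue
        # Local extremum test; on a flat-topped peak this keeps the
        # last frame of the plateau.
        if magnitude >= abs(profile[t - 1]) and magnitude > abs(profile[t + 1]):
            landmarks.append((t, "+" if value > 0 else "-"))
    return landmarks


# Toy contour (assumed data): 20 dB background with a 40 dB segment.
band_energy_db = [20.0] * 10 + [40.0] * 10 + [20.0] * 10
profile = energy_difference_profile(band_energy_db, lag=2)
landmarks = find_landmarks(profile, threshold_db=9.0)
print(landmarks)
```

A "+" entry marks an abrupt energy rise and a "-" entry an abrupt fall. A practical detector would combine several sub-bands and further cues, as the abstract indicates.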

  • Knowledge-Based Manner Class Segmentation Based on the Acoustic Event and Landmark Detection Algorithm

    Jung-In LEE  Jeung-Yoon CHOI  Hong-Goo KANG  

     
    LETTER-Speech and Hearing
    Vol: E97-D  No: 6  Page(s): 1682-1685

    There has been steady demand for speech segmentation methods that can serve various speech applications. Conventional segmentation algorithms show reliable performance, but they require a sufficient training database. This letter proposes a manner class segmentation method based on the acoustic event and landmark detection used in the knowledge-based speech recognition system. Measurements of sub-band abruptness and additional parameters are used to detect the acoustic events. Candidate manner classes are segmented from the acoustic events and determined based on the knowledge of acoustic phonetics and acoustic parameters. Manners of vowel/glide, nasal, fricative, stop burst, stop closure, and silence are segmented in this system. In total, 71% of manner classes are correctly segmented with 20-ms error boundaries.

  • Adaptive Microphone Array System with Two-Stage Adaptation Mode Controller

    Yang-Won JUNG  Hong-Goo KANG  Chungyong LEE  Dae-Hee YOUN  Changkyu CHOI  Jaywoo KIM  

     
    PAPER-Digital Signal Processing
    Vol: E88-A  No: 4  Page(s): 972-977

    In this paper, an adaptive microphone array system with a two-stage adaptation mode controller (AMC) is proposed for high-quality speech acquisition in real environments. The proposed system includes an adaptive array algorithm, a time-delay estimator and a newly proposed AMC. To ensure proper adaptation of the adaptive array algorithm, the proposed AMC uses not only temporal information, but also spatial information. The proposed AMC is constructed with two processing stages: an initialization stage and a running stage. A sound source localization technique is adopted in the initialization stage, and a signal correlation characteristic is used in the running stage. For the adaptive array algorithm, a generalized sidelobe canceller with an adaptive blocking matrix is used. The proposed algorithm is implemented as a real-time man-machine interface module of a home-agent robot. Simulation results show a 13 dB SINR improvement with the speaker sitting at a distance of 2 m from the home-agent robot. The speech recognition rate is also enhanced by 32% when compared to the single channel acquisition system.

  • Improving the Performance of the Minimum Statistics Noise Estimator for Single Channel Speech Enhancement

    Seung-Kyun RYU  Hong-Goo KANG  Sung-Kyo JUNG  Dae-Hee YOUN  

     
    LETTER-Speech and Hearing
    Vol: E88-A  No: 2  Page(s): 582-585

    This paper proposes an algorithm to improve the performance of noise power spectrum estimation using minimum statistics (MS). The minimum statistics noise estimator (MSNE), which is most efficient for speech enhancement, often underestimates noise power when the signal characteristics change abruptly. The proposed algorithm improves the accuracy of noise estimation by removing harmonic components of the speech signal. Simulation results verify that the performance of the proposed algorithm is better than that of the conventional algorithm in terms of the segmental SNR (SegSNR) and the spectral distance (SD).
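For readers unfamiliar with the baseline, the following is a minimal sketch of the core minimum statistics idea for a single frequency bin: smooth the frame power recursively, then take the minimum over a sliding window as the noise estimate. It is a simplified illustration only. It omits the bias compensation and time-varying smoothing of full minimum statistics estimators, and it does not implement the harmonic removal proposed in the letter. The function name, the smoothing constant 0.85, and the 20-frame window are assumptions made for the example.

```python
def minimum_statistics(power_frames, alpha=0.85, window=20):
    """Track the noise power of one frequency bin.

    Returns (smoothed, noise): the recursively smoothed power and its
    minimum over the last `window` frames.
    """
    smoothed = []
    noise = []
    previous = power_frames[0]
    for power in power_frames:
        # First-order recursive smoothing of the frame power.
        previous = alpha * previous + (1.0 - alpha) * power
        smoothed.append(previous)
        # Noise estimate: minimum of the smoothed power in the window.
        noise.append(min(smoothed[-window:]))
    return smoothed, noise


# Toy bin (assumed data): noise power 1.0 with a short burst of power 50.0.
bin_power = [1.0] * 10 + [50.0] * 5 + [1.0] * 10
smoothed, noise_estimate = minimum_statistics(bin_power, alpha=0.85, window=20)
print(max(smoothed), max(noise_estimate))
```

The window has to be longer than the longest speech burst, otherwise the minimum follows the speech power instead of the noise floor. That same long window is why the estimate reacts slowly when the noise level itself changes.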

  • Performance Comparison of Single and Multi-Stage Algebraic Codebooks

    Sung-Kyo JUNG  Hong-Goo KANG  Dae-Hee YOUN  

     
    LETTER-Speech and Hearing
    Vol: E86-A  No: 12  Page(s): 3288-3290

    This letter presents the advantages of a cascaded algebraic codebook structure at relatively high bit-rates. The cascaded structure, which consists of two stages, provides flexible pulse combinations due to an additional gain term in the second stage. The perceptual quality of the cascaded structure can be further improved by using a gain re-estimation scheme. Experiments confirm that the cascaded structure has a significant advantage in terms of quality and complexity as the bit-rate becomes higher.

  • A Very Low Complexity VSELP Speech Coder Using Regular Pulse Basis Vectors

    Yong-Soo CHOI  Hong-Goo KANG  Jae-Ha YOO  Il-Whan CHA  Dae-Hee YOUN  

     
    PAPER
    Vol: E80-A  No: 6  Page(s): 996-1001

    This paper describes a new Vector Sum Excited Linear Prediction (VSELP) coder with very low complexity. The method, called regular pulse VSELP (RP-VSELP), is based on regular pulse basis vectors with a mutually orthonormal property. In this approach, a very efficient vector-sum codebook is constructed from a set of mutually orthonormal regular pulse basis vectors, which simplifies the codebook search without additional degradation of synthesized speech compared with that of the conventional VSELP. The regular pulse basis vectors are explicitly orthonormalized by means of the Gram-Schmidt procedure. To enhance the speech quality of the RP-VSELP speech coder, a perceptually weighted distortion measure between the input and the synthesized speech is utilized in an iterative closed-loop training process of the regular pulse basis vectors. It is shown that speech quality is improved by the training process. Experimental results demonstrate that the proposed method produces synthesized speech quality comparable to that of the VSELP scheme at a bit-rate of 4.8 kbps.
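The sketch below illustrates two ingredients named in the abstract: Gram-Schmidt orthonormalization of basis vectors, and a vector-sum codebook whose codevectors are all ±1 combinations of those vectors. The abstract does not define the regular pulse vectors, so the layout used here (nonzero samples on an equally spaced grid) and all names and values are assumptions. The weighted synthesis filter of a real coder is ignored.

```python
import math
from itertools import product


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def regular_pulse_vector(length, spacing, phase, amplitudes):
    """Place amplitudes every `spacing` samples starting at `phase` (assumed layout)."""
    vector = [0.0] * length
    for k, amplitude in enumerate(amplitudes):
        vector[phase + k * spacing] = amplitude
    return vector


def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            projection = dot(w, b)
            w = [wi - projection * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis


def vector_sum_codebook(basis):
    """All 2^M codevectors sum_m theta_m * v_m with theta_m in {-1, +1}."""
    codebook = []
    for signs in product((-1.0, 1.0), repeat=len(basis)):
        codebook.append([
            sum(s * b[i] for s, b in zip(signs, basis))
            for i in range(len(basis[0]))
        ])
    return codebook


def best_signs(target, basis):
    """With an orthonormal basis, the closest codevector takes the sign of each correlation."""
    return [1.0 if dot(target, b) >= 0.0 else -1.0 for b in basis]


# Three hypothetical pulse vectors on the same grid (every 2nd sample of 8).
raw = [
    regular_pulse_vector(8, 2, 0, [1.0, 1.0, 1.0, 1.0]),
    regular_pulse_vector(8, 2, 0, [1.0, 0.0, -1.0, 1.0]),
    regular_pulse_vector(8, 2, 0, [0.0, 1.0, 1.0, -1.0]),
]
basis = gram_schmidt(raw)
codebook = vector_sum_codebook(basis)
signs = best_signs(codebook[5], basis)
print(len(codebook), signs)
```

With an orthonormal basis every codevector has the same energy (equal to the number of basis vectors), so in this simplified setting the sign of each basis vector can be chosen independently from its correlation with the target, with no search over all 2^M combinations. Whether the paper's simplification takes exactly this form cannot be determined from the abstract.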

  • On the Importance of Transition Regions for Automatic Speaker Recognition

    Bong-Jin LEE  Chi-Sang JUNG  Jeung-Yoon CHOI  Hong-Goo KANG  

     
    LETTER-Speech and Hearing
    Vol: E93-D  No: 1  Page(s): 197-200

    This letter describes the importance of transition regions, e.g., at phoneme boundaries, for automatic speaker recognition compared with using steady-state regions. Experimental results of automatic speaker identification tasks confirm that transition regions include the most speaker-distinctive features. A possible reason for obtaining such results is described in view of articulation, in particular, the degree of freedom of articulators. These results are expected to provide useful information in designing an efficient automatic speaker recognition system.