
Author Search Result

[Author] Hochong PARK (8 hits)

Results 1-8 of 8
  • Predominant Melody Extraction from Polyphonic Music Signals Based on Harmonic Structure

    Jea-Yul YOON  Chai-Jong SONG  Hochong PARK  

     
    LETTER-Music Information Processing
    Vol: E96-D No:11, Page(s): 2504-2507

    A new method for predominant melody extraction from polyphonic music signals based on harmonic structure is proposed. The proposed method first extracts a set of fundamental-frequency candidates by analyzing the distances between spectral peaks. The predominant fundamental frequency is then selected by pitch tracking according to the harmonic strength of the candidates. Finally, the method applies pitch smoothing on a large temporal scale to eliminate pitch-doubling errors and performs voicing detection for each frame. The proposed method shows the best overall performance for the ADC 2004 DB in the MIREX 2011 audio melody extraction task.
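    A minimal illustrative sketch of the candidate-extraction and harmonic-strength ideas described above, not the authors' implementation; all function names, thresholds, and parameter values are assumptions.

    # Hypothetical sketch: peaks of a magnitude spectrum are found, the spacings
    # between adjacent peaks are taken as fundamental-frequency candidates, and
    # each candidate is scored by how much spectral energy falls near its
    # harmonics. Names and thresholds are illustrative, not from the paper.
    import numpy as np
    from scipy.signal import find_peaks

    def f0_candidates(frame, sr, n_fft=4096, fmin=80.0, fmax=1000.0):
        spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n_fft))
        freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
        peaks, _ = find_peaks(spec, height=spec.max() * 0.05)
        gaps = np.diff(freqs[peaks])                 # distances between adjacent peaks
        cands = gaps[(gaps >= fmin) & (gaps <= fmax)]
        return np.unique(np.round(cands, 1)), spec, freqs

    def harmonic_strength(f0, spec, freqs, n_harm=8, tol=0.03):
        # Sum of the largest magnitudes found near the first n_harm harmonics of f0.
        score = 0.0
        for k in range(1, n_harm + 1):
            band = np.abs(freqs - k * f0) <= tol * k * f0
            if band.any():
                score += spec[band].max()
        return score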

  • Performance Enhancement of Cross-Talk Canceller for Four-Speaker System by Selective Speaker Operation

    Su-Jin CHOI  Jeong-Yong BOO  Ki-Jun KIM  Hochong PARK  

     
    LETTER-Speech and Hearing
    Publicized: 2015/08/25
    Vol: E98-D No:12, Page(s): 2341-2344

    We propose a method of enhancing the performance of a cross-talk canceller for a four-speaker system with respect to sweet-spot size and ringing effect. For a large sweet spot, the speaker layout needs to be symmetrical with respect to the listener's position. In addition, the ringing effect of the cross-talk canceller is reduced when many speakers are located close to each other. Based on these properties, the proposed method first selects the two speakers in a four-speaker system that are most symmetrical with respect to the target listener's position and then adds the remaining speakers located between these two to the final selection. By operating only the selected speakers, the proposed method enlarges the sweet spot and reduces the ringing effect. We conducted objective and subjective evaluations and verified that the proposed method improves the performance of the cross-talk canceller compared to the conventional method.
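    An illustrative sketch of the speaker-selection logic only (not the cross-talk cancellation filters themselves): speakers are described by their azimuth angles relative to the listener, the pair whose angles are closest to mirror images is chosen, and any remaining speaker lying between that pair is added. All names and the angle-based criterion are assumptions.

    from itertools import combinations

    def select_speakers(azimuths_deg):
        """azimuths_deg: dict {speaker_id: azimuth in degrees, 0 = straight ahead}."""
        # 1. Most symmetrical pair: minimize |angle_left + angle_right|.
        pair = min(combinations(azimuths_deg, 2),
                   key=lambda p: abs(azimuths_deg[p[0]] + azimuths_deg[p[1]]))
        lo, hi = sorted(azimuths_deg[s] for s in pair)
        # 2. Add the remaining speakers located between the selected pair.
        extra = [s for s in azimuths_deg
                 if s not in pair and lo < azimuths_deg[s] < hi]
        return list(pair) + extra

    # Example: four speakers at -60, -20, +30 and +60 degrees.
    print(select_speakers({"FL": -60, "CL": -20, "CR": 30, "FR": 60}))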

  • Speech Quality Enhancement for In-Ear Microphone Based on Neural Network

    Hochong PARK  Yong-Shik SHIN  Seong-Hyeon SHIN  

     
    LETTER-Speech and Hearing
    Publicized: 2019/05/15
    Vol: E102-D No:8, Page(s): 1594-1597

    Speech captured by an in-ear microphone placed inside an occluded ear has a high signal-to-noise ratio; however, it has different sound characteristics compared to normal speech captured through air conduction. In this study, a method for blind speech quality enhancement is proposed that can convert speech captured by an in-ear microphone to one that resembles normal speech. The proposed method estimates an input-dependent enhancement function by using a neural network in the feature domain and enhances the captured speech via time-domain filtering. Subjective and objective evaluations confirm that the speech enhanced using our proposed method sounds more similar to normal speech than that enhanced using conventional equalizer-based methods.
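    A minimal sketch of the processing flow described above, with a placeholder standing in for the trained network: per-band gains are predicted from input features, turned into an FIR filter, and applied in the time domain. The feature set, the filter design, and the predict_gains callable are assumptions, not the authors' system.

    import numpy as np
    from scipy.signal import lfilter, firwin2

    def enhance(in_ear, sr, predict_gains, n_bands=32, n_taps=129):
        """predict_gains: callable mapping the feature vector to n_bands
        non-negative gains (stands in for the trained neural network)."""
        # Feature: log magnitude spectrum averaged into n_bands bands.
        spec = np.abs(np.fft.rfft(in_ear))
        bands = np.array_split(spec, n_bands)
        feat = np.log(np.array([b.mean() for b in bands]) + 1e-9)

        gains = predict_gains(feat)             # placeholder for the neural network
        freqs = np.linspace(0.0, 1.0, n_bands)  # normalized band-center frequencies
        fir = firwin2(n_taps, freqs, gains)     # gain curve -> time-domain FIR filter
        return lfilter(fir, [1.0], in_ear)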

  • Delay-Reduced MDCT for Scalable Speech Codec with Cascaded Transforms

    Hochong PARK  Ho-Sang SUNG  

     
    LETTER-Speech and Hearing
    Vol: E93-D No:2, Page(s): 388-391

    A scalable speech codec consisting of a harmonic codec as the core layer and an MDCT-based transform codec as the enhancement layer is often required to provide both very low-rate core communication and fine granular scalability. This structure, however, has a serious drawback for practical use because the time delay caused by the transform in each layer accumulates, resulting in a long overall codec delay. In this letter, a new MDCT structure is proposed that reduces the overall codec delay by eliminating the accumulation of the per-transform delay. In the proposed structure, the time delay is first reduced by forcing the two transforms to share a common look-ahead. The MDCT error components caused by the look-ahead sharing are then analyzed and compensated in the decoder, resulting in perfect reconstruction. The proposed structure reduces the codec delay by the frame size with equivalent coding efficiency.
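    A background sketch only, not the paper's delay-reduced structure: a standard MDCT/IMDCT pair with a sine window and 50% overlap. The one-frame look-ahead visible here is what accumulates when two such transforms are cascaded, which is the delay the letter removes by sharing a common look-ahead and compensating the resulting error.

    import numpy as np

    def mdct(x, N):                      # x has length 2*N (current frame + look-ahead)
        w = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))
        n = np.arange(2 * N)
        k = np.arange(N)
        basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
        return basis @ (w * x)

    def imdct(X, N):
        w = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))
        n = np.arange(2 * N)
        k = np.arange(N)
        basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
        return (2.0 / N) * w * (basis @ X)

    # Overlap-adding consecutive imdct() outputs reconstructs the input, but each
    # frame needs N "future" samples, i.e. one frame of look-ahead per transform.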

  • Multi-Task Learning for Improved Recognition of Multiple Types of Acoustic Information

    Jae-Won KIM  Hochong PARK  

     
    LETTER-Speech and Hearing
    Publicized: 2021/07/14
    Vol: E104-D No:10, Page(s): 1762-1765

    We propose a new method for improving the recognition performance of phonemes, speech emotions, and music genres using multi-task learning. When tasks are closely related, multi-task learning can improve the performance of each task by learning common feature representation for all the tasks. However, the recognition tasks considered in this study demand different input signals of speech and music at different time scales, resulting in input features with different characteristics. In addition, a training dataset with multiple labels for all information sources is not available. Considering these issues, we conduct multi-task learning in a sequential training process using input features with a single label for one information source. A comparative evaluation confirms that the proposed method for multi-task learning provides higher performance for all recognition tasks than individual learning for each task as in conventional methods.
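    A hypothetical sketch of sequential multi-task training with single-label data: a shared encoder feeds three task heads (phoneme, emotion, genre), and each mini-batch updates the shared encoder plus only the head whose label it carries. The architecture sizes and training schedule are assumptions, not the paper's configuration.

    import torch
    import torch.nn as nn

    class MultiTaskNet(nn.Module):
        def __init__(self, feat_dim, n_phonemes, n_emotions, n_genres):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                         nn.Linear(256, 128), nn.ReLU())
            self.heads = nn.ModuleDict({
                "phoneme": nn.Linear(128, n_phonemes),
                "emotion": nn.Linear(128, n_emotions),
                "genre":   nn.Linear(128, n_genres),
            })

        def forward(self, x, task):
            return self.heads[task](self.encoder(x))

    def train_step(model, optimizer, x, y, task):
        # One update using a batch labeled for a single information source only.
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(x, task), y)
        loss.backward()
        optimizer.step()
        return loss.item()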

  • Emulation Circuit for Superconducting Quantum Interference Device (SQUID) Sensor in Magnetocardiography System

    Chang-Beom AHN  Dong-Hoon LEE  Hochong PARK  Seoung-Jun OH  

     
    LETTER
    Vol: E89-A No:6, Page(s): 1688-1689

    The superconducting quantum interference device (SQUID) is a transducer that converts magnetic flux into voltage; however, its range of linear conversion is very restricted. To overcome this narrow dynamic range, a flux-locked loop (FLL) is used to feed back the output field and cancel the input field, which prevents the operating point of the SQUID from moving far from the null point. In this paper, an emulator for the SQUID sensor and the feedback coil is proposed. The magnetic coupling between the original field and the field generated by the feedback coil is emulated by electronic circuits. Using the emulator, FLL circuits can be analyzed and optimized without SQUID sensors, which is especially useful in the early stage of development of the MCG system, when a magnetically shielded room or real SQUID sensors may not yet be available. The emulator may also be used as a test-signal generator for multi-channel gain calibration and for system maintenance.
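    A simulation sketch of the principle only (the paper's emulator is an analog circuit): the SQUID's periodic flux-to-voltage curve is modeled as a sinusoid, and an integrating flux-locked loop feeds flux back so the net flux stays near the null point. The sinusoidal model and all constants are illustrative assumptions.

    import numpy as np

    V0 = 1.0      # peak of the V-Phi curve (arbitrary units), illustrative
    K = 0.1       # per-step feedback gain (flux quanta per volt), illustrative

    def flux_locked_loop(phi_in):
        """phi_in: applied flux in units of the flux quantum Phi0.
        Returns the feedback flux, which tracks phi_in (linearized readout)."""
        phi_fb = 0.0
        out = np.empty_like(phi_in)
        for i, phi in enumerate(phi_in):
            v = V0 * np.sin(2 * np.pi * (phi - phi_fb))   # periodic SQUID V-Phi curve
            phi_fb += K * v                               # integrating feedback
            out[i] = phi_fb
        return out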

  • Efficient Codebook Search Method for AMR Wideband Speech Codecs

    Hochong PARK  Younhee KIM  Jisang YOO  

     
    PAPER-Speech and Hearing
    Vol: E87-D No:8, Page(s): 2114-2120

    The AMR wideband speech codec was recently developed for high-quality wideband speech communications. Although it achieves excellent performance owing to the expanded bandwidth of the speech signal, it requires a huge amount of computation, especially in the codebook search. To solve this problem, this paper proposes an efficient codebook search method for the AMR wideband codec. Starting from a poorly performing initial codevector, the proposed method iteratively enhances the codevector by evaluating the role of each pulse and exchanging the worst pulse with a better one. Simulations show that the AMR wideband codec adopting the proposed codebook search method provides better performance with much less computational load than that using the standard method.
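    A simplified sketch of the pulse-exchange idea in an ACELP-style search, not the exact AMR-WB track structure or sign handling: a fixed number of unit pulses is scored with the usual correlation-squared over energy criterion, and the pulse whose removal hurts the score least is repeatedly swapped for the position that improves it most. All names and the search schedule are assumptions.

    import numpy as np

    def search_score(positions, target):
        code = np.zeros_like(target)
        code[list(positions)] = 1.0               # unit pulses, signs omitted
        corr = float(code @ target)
        energy = float(code @ code)
        return corr * corr / max(energy, 1e-12)

    def pulse_exchange_search(target, n_pulses=4, n_iter=20):
        positions = list(range(n_pulses))          # poorly performing initial codevector
        for _ in range(n_iter):
            base = search_score(positions, target)
            # 1. Find the pulse whose removal costs the least (the "worst" pulse).
            worst = max(range(n_pulses),
                        key=lambda i: search_score(positions[:i] + positions[i + 1:], target))
            # 2. Find the best replacement position for that pulse.
            best_pos, best = positions[worst], base
            for cand in range(len(target)):
                if cand in positions:
                    continue
                trial = positions[:worst] + [cand] + positions[worst + 1:]
                s = search_score(trial, target)
                if s > best:
                    best_pos, best = cand, s
            if best_pos == positions[worst]:
                break                              # no exchange improves the score
            positions[worst] = best_pos
        return sorted(positions)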

  • Encoding Detection and Bit Rate Classification of AMR-Coded Speech Based on Deep Neural Network

    Seong-Hyeon SHIN  Woo-Jin JANG  Ho-Won YUN  Hochong PARK  

     
    LETTER-Speech and Hearing
    Publicized: 2017/10/20
    Vol: E101-D No:1, Page(s): 269-272

    A method for encoding detection and bit rate classification of AMR-coded speech is proposed. For each texture frame, 184 features consisting of short-term and long-term temporal statistics of speech parameters are extracted; these features effectively measure the amount of distortion caused by AMR coding. A deep neural network then classifies the bit rate of the speech after analyzing the extracted features. It is confirmed that the proposed features provide better performance than conventional spectral features designed for bit rate classification of coded audio.
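    A minimal sketch of the pipeline shape only: frame-wise speech parameters are summarized by short-term and long-term statistics and fed to a small neural-network classifier. The real system uses 184 hand-designed features; the statistics, the classifier settings, and the variable names in the usage comment below are placeholders.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def temporal_stats(params, short_win=10):
        """params: (n_frames, n_params) array of per-frame speech parameters."""
        deltas = np.diff(params, axis=0)
        short = params[:short_win]
        return np.concatenate([
            params.mean(axis=0), params.std(axis=0),    # long-term statistics
            deltas.mean(axis=0), deltas.std(axis=0),    # long-term dynamics
            short.mean(axis=0), short.std(axis=0),      # short-term statistics
        ])

    # Training on labeled examples (hypothetical variables):
    # X = np.stack([temporal_stats(p) for p in param_sequences]); y = bitrate_labels
    # clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500).fit(X, y)
    # pred = clf.predict(X_test)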