Keyword Search Result

[Keyword] moment matching (4 hits)

1-4 of 4 hits
  • DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching

    Satoshi MIZOGUCHI  Yuki SAITO  Shinnosuke TAKAMICHI  Hiroshi SARUWATARI  

     
    PAPER-Speech and Hearing

    Publicized: 2021/07/30
    Vol: E104-D No:11
    Page(s): 1971-1980

    We propose deep neural network (DNN)-based speech enhancement that reduces musical noise and achieves better auditory impressions. Musical noise is an artifact generated by nonlinear signal processing and negatively affects the auditory impressions. We aim to develop musical-noise-free speech enhancement methods that suppress musical-noise generation and produce perceptually comfortable enhanced speech. DNN-based speech enhancement using a soft mask achieves high noise reduction but generates musical noise in non-speech regions. Therefore, we first define kurtosis matching for DNN-based low-musical-noise speech enhancement. Kurtosis is the fourth-order standardized moment and is known to correlate with the amount of musical noise. Kurtosis matching is a penalty term added to the DNN training objective that works to reduce the amount of musical noise. We further extend this scheme to standardized-moment matching. The extended scheme uses moments of order higher than kurtosis and generalizes the conventional musical-noise-free method based on kurtosis matching. We formulate standardized-moment matching and explore how effectively the higher-order moments reduce the amount of musical noise. Experimental evaluation results 1) demonstrate that kurtosis matching can reduce musical noise without negatively affecting noise suppression and 2) newly reveal that sixth-order-moment matching also achieves low-musical-noise speech enhancement, comparably to kurtosis matching.
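
    As a rough illustration of the idea (not the paper's implementation), the NumPy sketch below computes a standardized-moment-matching penalty; the squared-difference form, the signals it is applied to, and all names are our assumptions. In practice such a penalty would be computed on spectral components of non-speech segments inside an automatic-differentiation framework and added to the DNN loss with a weighting factor.

        import numpy as np

        def standardized_moment(x, order):
            # Order-p standardized moment of a 1-D signal; order=4 is kurtosis.
            z = (x - x.mean()) / x.std()
            return np.mean(z ** order)

        def moment_matching_penalty(enhanced, reference, order=4):
            # Squared mismatch between standardized moments of the enhanced
            # and reference signals, intended to discourage musical noise.
            return (standardized_moment(enhanced, order)
                    - standardized_moment(reference, order)) ** 2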

  • Generative Moment Matching Network-Based Neural Double-Tracking for Synthesized and Natural Singing Voices

    Hiroki TAMARU  Yuki SAITO  Shinnosuke TAKAMICHI  Tomoki KORIYAMA  Hiroshi SARUWATARI  

     
    PAPER-Speech and Hearing

    Publicized: 2019/12/23
    Vol: E103-D No:3
    Page(s): 639-647

    This paper proposes a generative moment matching network (GMMN)-based post-filtering method for providing inter-utterance pitch variation to singing voices and discusses its application to our mixing method called neural double-tracking (NDT). When a human singer sings and records the same song twice, there is a difference between the two recordings. This difference, called inter-utterance variation, enriches the performer's musical expression and the audience's experience; for example, it makes every concert special because it never recurs in exactly the same manner. Inter-utterance variation enables a mixing method called double-tracking (DT), in which the same phrase is recorded twice and the two recordings are then mixed to give richness to singing voices. However, synthesized singing voices, which are commonly used to create music, have no inter-utterance variation because the synthesis process is deterministic, and neither does a single recorded voice. Although there is a signal-processing-based method called artificial DT (ADT) for layering singing voices, the signal processing introduces unnatural sound artifacts. To solve these problems, we propose a post-filtering method that randomly modulates synthesized or natural singing voices as if the singer had sung the phrase again. The post-filter built with our method models the inter-utterance pitch variation of human singing voices using a conditional GMMN. Evaluation results indicate that 1) the proposed method provides perceptible and natural inter-utterance variation to synthesized singing voices and that 2) our NDT exhibits higher double-trackedness than ADT when applied to both synthesized and natural singing voices.
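
    For readers unfamiliar with GMMNs, the minimal sketch below shows the kernel-based maximum mean discrepancy (MMD) criterion such a network minimizes, here with a single Gaussian kernel in NumPy; applying it to pitch-modulation features and the kernel bandwidth are assumptions, not details taken from the paper.

        import numpy as np

        def gaussian_kernel(a, b, sigma=1.0):
            # Pairwise Gaussian kernel values between the rows of a and b.
            d = a[:, None, :] - b[None, :, :]
            return np.exp(-np.sum(d ** 2, axis=-1) / (2.0 * sigma ** 2))

        def mmd_squared(generated, real, sigma=1.0):
            # Biased estimate of the squared maximum mean discrepancy, the
            # training criterion of a generative moment matching network.
            return (gaussian_kernel(generated, generated, sigma).mean()
                    + gaussian_kernel(real, real, sigma).mean()
                    - 2.0 * gaussian_kernel(generated, real, sigma).mean())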

  • Multichannel Speech Enhancement Based on Generalized Gamma Prior Distribution with Its Online Adaptive Estimation

    Tran HUY DAT  Kazuya TAKEDA  Fumitada ITAKURA  

     
    PAPER-Speech Enhancement

    Vol: E91-D No:3
    Page(s): 439-447

    We present a multichannel speech enhancement method based on MAP speech spectral magnitude estimation using a generalized gamma model of the speech prior distribution, where the model parameters are adapted from the actual noisy speech in a frame-by-frame manner. The use of a more general prior distribution with online adaptive estimation is shown to be effective for speech spectral estimation in noisy environments. Furthermore, multi-channel information in the form of cross-channel statistics is shown to be useful for better adapting the prior distribution parameters to the actual observation, resulting in better performance of the speech enhancement algorithm. We tested the proposed algorithm on an in-car speech database and obtained significant improvements in speech recognition performance, particularly under non-stationary noise conditions such as music, an air conditioner, and an open window.
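
    For reference, one common parameterization of a generalized gamma prior on the spectral magnitude is sketched below in LaTeX; the paper's exact parameterization and the resulting MAP estimator may differ, so treat this only as background.

        % Generalized gamma prior on the speech spectral magnitude x >= 0,
        % with shape \nu, scale \beta and power \gamma:
        p(x) = \frac{\gamma\,\beta^{\nu}}{\Gamma(\nu)}\,
               x^{\gamma\nu - 1}\exp\!\bigl(-\beta x^{\gamma}\bigr),
               \qquad x \ge 0 .
        % \gamma = 2, \nu = 1 recovers the Rayleigh prior and \gamma = 1 the
        % ordinary gamma prior, so the family generalizes both.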

  • The Multiple Point Global Lanczos Method for Multiple-Inputs Multiple-Outputs Interconnect Order Reductions

    Chia-Chi CHU  Ming-Hong LAI  Wu-Shiung FENG  

     
    PAPER-Modelling, Systems and Simulation

    Vol: E89-A No:10
    Page(s): 2706-2716

    A global Lanczos algorithm for model-order reduction of RLCG interconnect circuits is presented in this paper. This algorithm extends the standard Lanczos algorithm to multiple-input multiple-output (MIMO) systems. A new matrix Krylov subspace is developed first. By employing a congruence transformation with the matrix Krylov subspace, a two-sided oblique-projection-based method can be used to construct a reduced-order system. It is shown that the system moments are still matched, and the error of the 2q-th-order system moment is derived analytically. Furthermore, two novel model-order reduction techniques, the multiple point global Lanczos (MPGL) method and the adaptive-order global Lanczos (AOGL) method, both based on multiple-point moment matching, are proposed. The frequency responses obtained with the multiple-point moment matching method agree more closely with those of the original system than do those obtained with the single-point expansion method. Finally, frequency-domain simulation results illustrate the feasibility and efficiency of the proposed methods.
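
    To make the moment-matching idea concrete, the NumPy sketch below builds a single-point, one-sided block-Krylov projection for a descriptor system H(s) = L^T (G + sC)^{-1} B and checks numerically that the first q block moments are preserved; this is a simplified stand-in for illustration only, not the two-sided global Lanczos method of the paper (which matches 2q moments).

        import numpy as np

        def krylov_basis(G, C, B, q):
            # Orthonormal basis of the block Krylov subspace spanned by
            # G^{-1}B, (G^{-1}C)G^{-1}B, ..., used for a one-sided projection.
            R = np.linalg.solve(G, B)
            blocks = [R]
            for _ in range(q - 1):
                R = np.linalg.solve(G, C @ R)
                blocks.append(R)
            V, _ = np.linalg.qr(np.hstack(blocks))
            return V

        def moments(G, C, B, L, num):
            # Block moments m_k = L^T (-G^{-1}C)^k G^{-1}B of
            # H(s) = L^T (G + sC)^{-1} B expanded around s = 0.
            R = np.linalg.solve(G, B)
            out = []
            for _ in range(num):
                out.append(L.T @ R)
                R = -np.linalg.solve(G, C @ R)
            return out

        # Reduced model by congruence transformation with the Krylov basis;
        # its first q block moments coincide with those of the original.
        rng = np.random.default_rng(0)
        n, p, q = 30, 2, 3
        G = np.eye(n) + 0.1 * rng.standard_normal((n, n))
        C = np.eye(n) + 0.1 * rng.standard_normal((n, n))
        B = rng.standard_normal((n, p))
        L = rng.standard_normal((n, p))
        V = krylov_basis(G, C, B, q)
        Gr, Cr, Br, Lr = V.T @ G @ V, V.T @ C @ V, V.T @ B, V.T @ L
        print(np.allclose(moments(G, C, B, L, q), moments(Gr, Cr, Br, Lr, q)))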