The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] voice quality(5hit)

1-5hit
  • Similar Speaker Selection Technique Based on Distance Metric Learning Using Highly Correlated Acoustic Features with Perceptual Voice Quality Similarity

    Yusuke IJIMA  Hideyuki MIZUNO  

     
    PAPER-Speech and Hearing

      Pubricized:
    2014/10/15
      Vol:
    E98-D No:1
      Page(s):
    157-165

    This paper analyzes the correlation between various acoustic features and perceptual voice quality similarity, and proposes a perceptually similar speaker selection technique based on distance metric learning. To analyze the relationship between acoustic features and voice quality similarity, we first conduct a large-scale subjective experiment using the voices of 62 female speakers and perceptual voice quality similarity scores between all pairs of speakers are acquired. Next, multiple linear regression analysis is carried out; it shows that four acoustic features are highly correlated to voice quality similarity. The proposed speaker selection technique first trains a transform matrix based on distance metric learning using the perceptual voice quality similarity acquired in the subjective experiment. Given an input speech, acoustic features of the input speech are transformed using the trained transform matrix, after which speaker selection is performed based on the Euclidean distance on the transformed acoustic feature space. We perform speaker selection experiments and evaluate the performance of the proposed technique by comparing it to speaker selection without feature space transformation. The results indicate that transformation based on distance metric learning reduces the error rate by 53.9%.

  • Spectral Features for Perceptually Natural Phoneme Replacement by Another Speaker's Speech

    Reiko TAKOU  Hiroyuki SEGI  Tohru TAKAGI  Nobumasa SEIYAMA  

     
    PAPER-Speech and Hearing

      Vol:
    E95-A No:4
      Page(s):
    751-759

    The frequency regions and spectral features that can be used to measure the perceived similarity and continuity of voice quality are reported here. A perceptual evaluation test was conducted to assess the naturalness of spoken sentences in which either a vowel or a long vowel of the original speaker was replaced by that of another. Correlation analysis between the evaluation score and the spectral feature distance was conducted to select the spectral features that were expected to be effective in measuring the voice quality and to identify the appropriate speech segment of another speaker. The mel-frequency cepstrum coefficient (MFCC) and the spectral center of gravity (COG) in the low-, middle-, and high-frequency regions were selected. A perceptual paired comparison test was carried out to confirm the effectiveness of the spectral features. The results showed that the MFCC was effective for spectra across a wide range of frequency regions, the COG was effective in the low- and high-frequency regions, and the effective spectral features differed among the original speakers.

  • Tense-Lax Vowel Classification with Energy Trajectory and Voice Quality Measurements

    Suk-Myung LEE  Jeung-Yoon CHOI  

     
    LETTER-Speech and Hearing

      Vol:
    E95-D No:3
      Page(s):
    884-887

    This work examines energy trajectory and voice quality measurements, in addition to conventional formant and duration properties, to classify tense and lax vowels in English. Tense and lax vowels are produced with differing articulatory configurations which can be identified by measuring acoustic cues such as energy peak location, energy convexity, open quotient and spectral tilt. An analysis of variance (ANOVA) is conducted, and dialect effects are observed. An overall 85.2% classification rate is obtained using the proposed features on the TIMIT database, resulting in improvement over using only conventional acoustic features. Adding the proposed features to widely used cepstral features also results in improved classification.

  • Objective Pathological Voice Quality Assessment Based on HOS Features

    Ji-Yeoun LEE  Sangbae JEONG  Hong-Shik CHOI  Minsoo HAHN  

     
    LETTER-Speech and Hearing

      Vol:
    E91-D No:12
      Page(s):
    2888-2891

    This work proposes new features to improve the pathological voice quality classification performance. They are the means, the variances, and the perturbations of the higher-order statistics (HOS) such as the skewness and the kurtosis. The HOS-based features show meaningful differences among normal, grade 1, grade 2, and grade 3 voices classified in the GRBAS scale. The jitter, the shimmer, the harmonic-to-noise ratio (HNR), and the variance of the short-time energy are utilized as the conventional features. The performances are measured by the classification and regression tree (CART) method. Specifically, the CART-based method by utilizing both the conventional features and the HOS-based ones shows its effectiveness in the pathological voice quality measurement, with the classification accuracy of 87.8%.

  • Objective Speech Quality Assessment Based on Payload Discrimination of Lost Packets for Cellular Phones in NGN Environment

    Satoshi UEMURA  Norihiro FUKUMOTO  Hideaki YAMADA  Hajime NAKAMURA  

     
    PAPER-Network Management/Operation

      Vol:
    E91-B No:11
      Page(s):
    3667-3676

    A feature of services provided in a Next Generation Network (NGN) is that the end-to-end quality is guaranteed. This is quite a challenging issue, given the considerable fluctuation in network conditions within a Fixed Mobile Convergence (FMC) network. Therefore, a novel approach, whereby a network node and a mobile terminal such as a cellular phone cooperate with each other to control service quality is essential. In order to achieve such cooperation, the mobile terminal needs to become more intelligent so it can estimate the service quality, including the user's perceptual quality, and notify the measurement result to the network node. Subsequently, the network node implements some kind of service control function, such as a resource and admission control function, based on the notification from the mobile terminal. In this paper, the role of the mobile terminal in such collaborative system is focused on. As a part of a QoS/QoE measurement system, we describe an objective speech quality assessment with payload discrimination of lost packets to measure the user's perceptual quality of VoIP. The proposed assessment is so simple that it can be implemented on a cellular phone. We therefore did this as part of the QoS/QoE measurement system. By using the implemented system, we can measure the user's perceptual quality of VoIP as well as the network QoS metrics, in terms of criteria such as packet loss rate, jitter and burstiness in real time.