IEICE global.ieice.org Site

Keyword Search Result

[Keyword] restricted Boltzmann machine (4 hits)


Speech Chain VC: Linking Linguistic and Acoustic Levels via Latent Distinctive Features for RBM-Based Voice Conversion
Takuya KISHIDA Toru NAKASHIKA

PAPER-Speech and Hearing

Publicized: 2020/08/06
Vol: E103-D No: 11
Page(s): 2340-2350
This paper proposes a voice conversion (VC) method based on a model that links linguistic and acoustic representations via latent phonological distinctive features. Our method, called speech chain VC, is inspired by the concept of the speech chain, in which speech communication consists of a chain of events linking the speaker's brain with the listener's brain. We assume that speaker identity information, which appears at the acoustic level, is embedded in two steps: when phonological information is encoded into articulatory movements (linguistic to physiological) and when articulatory movements generate sound waves (physiological to acoustic). Speech chain VC represents these event links with an adaptive restricted Boltzmann machine (ARBM) that introduces phoneme labels and acoustic features as two classes of visible units, and latent phonological distinctive features associated with articulatory movements as hidden units. Subjective evaluation experiments showed that the intelligibility of the converted speech improved significantly compared with the conventional ARBM-based method. The speaker-identity conversion quality of the proposed method was comparable to that of a Gaussian mixture model (GMM)-based method. Analyses of the hidden-layer representations of the speech chain VC model indicated that some of the hidden units actually correspond to phonological distinctive features. The final part of this paper proposes approaches to achieve one-shot VC by using the speech chain VC model. Subjective evaluation experiments showed that when the target speaker is the same gender as the source speaker, the proposed methods can achieve one-shot VC based on a single utterance from each of the source and target speakers.
Forecasting Service Performance on the Basis of Temporal Information by the Conditional Restricted Boltzmann Machine
Jiali YOU Hanxing XUE Yu ZHUO Xin ZHANG Jinlin WANG

PAPER-Network

Publicized: 2017/11/10
Vol: E101-B No: 5
Page(s): 1210-1221
Predicting the service performance of Internet applications is important in service selection, especially for video services. In order to design a predictor for forecasting video service performance in third-party applications, two famous service providers in China, Iqiyi and Letv, are monitored and analyzed. The study highlights that the performance measured in the observation period is time-series data with strong autocorrelation, which means it is predictable. In order to combine the temporal information and map the measured data to a proper feature space, the authors propose a predictor based on a Conditional Restricted Boltzmann Machine (CRBM), which can capture the potential temporal relationships in the historical information. Meanwhile, the measured data from different sources are combined to enhance the training process, which enlarges the training set and avoids overfitting. Experiments show that combining the measured results from different resolutions of a video can raise prediction performance, and that the CRBM algorithm shows better prediction ability and more stable performance than the baseline algorithms.
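The link the abstract draws between strong autocorrelation and predictability can be illustrated with a minimal sketch. The function name `autocorrelation` and the two example series are hypothetical and are not taken from the paper; the code only computes the standard sample autocorrelation at a given lag and is not the CRBM predictor itself.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Covariance between the series and a copy of itself shifted by `lag`.
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return cov / var


# Hypothetical measurements (assumed for illustration only).
smooth = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]   # slowly drifting trace
alternating = [1, -1, 1, -1, 1, -1, 1, -1]          # flips sign every step

print(autocorrelation(smooth, 1))
print(autocorrelation(alternating, 1))
```

A value near 1 at small lags suggests that recent history carries information about the next measurement, which is the property a history-conditioned model would exploit.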
Deep Nonlinear Metric Learning for Speaker Verification in the I-Vector Space
Yong FENG Qingyu XIONG Weiren SHI

LETTER-Speech and Hearing

Publicized: 2016/10/04
Vol: E100-D No: 1
Page(s): 215-219
Speaker verification is the task of determining whether two utterances come from the same person. Once the utterances are represented in the i-vector space, the crucial remaining problem is how to compute the similarity of two i-vectors. Metric learning has provided a viable solution to this problem. Many metric learning algorithms have been proposed so far, but they are usually limited to learning a linear transformation. In this paper, we propose a nonlinear metric learning method, which learns an explicit mapping from the original space to an optimal subspace using a deep Restricted Boltzmann Machine network. The proposed method is evaluated on the NIST SRE 2008 dataset. The evaluation results show that the proposed method, with its deep learning architecture, outperforms some state-of-the-art methods.
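The similarity problem described above can be sketched as follows. Cosine scoring is assumed here as a stand-in similarity measure; the abstract does not say which score the authors use. The names `cosine_similarity`, `same_speaker`, the `mapping` argument, and the threshold value are all hypothetical. The optional `mapping` marks where a learned transformation (linear or nonlinear) would be applied before scoring.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def same_speaker(u, v, threshold=0.5, mapping=None):
    """Accept the trial if the (optionally mapped) vectors are similar enough.

    `mapping` is a placeholder for a learned transformation; the threshold
    is an arbitrary illustrative value, not a tuned operating point.
    """
    if mapping is not None:
        u, v = mapping(u), mapping(v)
    return cosine_similarity(u, v) >= threshold


# Toy 2-dimensional "i-vectors" (real i-vectors have hundreds of dimensions).
print(same_speaker([1.0, 2.0], [2.0, 4.0]))   # same direction
print(same_speaker([1.0, 0.0], [0.0, 1.0]))   # orthogonal
```

In this framing, metric learning amounts to choosing `mapping` so that same-speaker pairs score higher than different-speaker pairs after the transformation.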
Voice Conversion Based on Speaker-Dependent Restricted Boltzmann Machines
Toru NAKASHIKA Tetsuya TAKIGUCHI Yasuo ARIKI

PAPER-Voice Conversion and Speech Enhancement

Vol: E97-D No: 6
Page(s): 1403-1410
This paper presents a voice conversion technique using speaker-dependent Restricted Boltzmann Machines (RBMs) to build high-order eigen spaces of the source and target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space. We build a deep conversion architecture that concatenates the two speaker-dependent RBMs with neural networks, expecting that they automatically discover abstractions to express the original input features. Under this concept, if we train an RBM using only the speech of an individual speaker, which includes various phonemes while the speaker individuality stays unchanged, the output features of the hidden layer can be considered to contain less phonemic information and relatively more speaker individuality than the original acoustic features. After training the RBMs for a source speaker and a target speaker, we can then connect and convert the speaker individuality abstractions using Neural Networks (NN). The converted abstraction of the source speaker is then back-propagated into the acoustic space (e.g., MFCC) using the RBM of the target speaker. We conducted speaker-voice conversion experiments and confirmed the efficacy of our method with respect to subjective and objective criteria, comparing it with the conventional Gaussian Mixture Model-based method and an ordinary NN.
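The encode, map, and decode pipeline described in this abstract can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the class `TinyRBM`, the function `convert`, the unit-variance real-valued visible units, and all weight values are hypothetical, and the identity `mapping` merely stands in for the neural network that connects the two hidden spaces. Training is omitted entirely.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyRBM:
    """Toy RBM: real-valued visible units (unit variance assumed), binary hidden units."""

    def __init__(self, weights, visible_bias, hidden_bias):
        self.W = weights        # W[i][j] couples visible unit i and hidden unit j
        self.b = visible_bias
        self.c = hidden_bias

    def encode(self, v):
        """Hidden activation probabilities p(h_j = 1 | v)."""
        return [sigmoid(self.c[j] + sum(self.W[i][j] * v[i] for i in range(len(v))))
                for j in range(len(self.c))]

    def decode(self, h):
        """Expected visible values given hidden activations h."""
        return [self.b[i] + sum(self.W[i][j] * h[j] for j in range(len(h)))
                for i in range(len(self.b))]


def convert(v, source_rbm, target_rbm, mapping=lambda h: h):
    """Source features -> source hidden space -> mapped -> target acoustic space."""
    return target_rbm.decode(mapping(source_rbm.encode(v)))


# Hand-picked toy parameters (assumed; real models would be trained on speech).
source = TinyRBM([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [0.0, 0.0])
target = TinyRBM([[2.0, 0.0], [0.0, 2.0]], [1.0, -1.0], [0.0, 0.0])

print(convert([0.0, 0.0], source, target))
```

Keeping the two RBMs speaker-specific and placing the only cross-speaker component in `mapping` mirrors the structure the abstract describes, where the conversion happens between hidden-layer abstractions rather than directly between acoustic features.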