Daisuke SAITO Nobuaki MINEMATSU Keikichi HIROSE
This paper describes a novel approach to flexible control of speaker characteristics using a tensor representation of multiple Gaussian mixture models (GMMs). In voice conversion studies, realizing conversion from or to an arbitrary speaker's voice is an important objective. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice GMM (EV-GMM) was proposed. In EVC, a speaker space is constructed from GMM supervectors, which are high-dimensional vectors derived by concatenating the mean vectors of each speaker GMM. In this speaker space, each speaker is represented by a small number of weight parameters of eigen-supervectors. In this paper, we revisit the construction of the speaker space by introducing tensor factor analysis of the training data set. In our approach, each speaker is represented as a matrix whose rows and columns correspond to the dimensions of the mean vector and the Gaussian components, respectively. The speaker space is derived by tensor factor analysis of the set of these matrices. Our approach can solve an inherent problem of the supervector representation, and it improves the performance of voice conversion. In addition, the effects of speaker adaptive training before factorization are investigated. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
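The two speaker representations contrasted in this abstract can be illustrated with a minimal sketch. The function names and the toy mean values below are hypothetical and are not taken from the paper; the sketch only shows how the same set of GMM mean vectors can be laid out as a supervector or as a matrix (rows for feature dimensions, columns for Gaussian components). It does not perform the tensor factor analysis itself.

```python
from typing import List

Matrix = List[List[float]]


def to_supervector(means: Matrix) -> List[float]:
    """Concatenate the mean vectors of all Gaussian components into one vector."""
    return [value for mean in means for value in mean]


def to_speaker_matrix(means: Matrix) -> Matrix:
    """Arrange the means so that rows index the feature dimension
    and columns index the Gaussian component."""
    return [list(column) for column in zip(*means)]


# Hypothetical speaker GMM: 3 Gaussian components with 2-dimensional means.
means = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

supervector = to_supervector(means)        # 6 values in one flat vector
speaker_matrix = to_speaker_matrix(means)  # 2 rows (dimensions) x 3 columns (components)
```

In the flat layout the component and dimension indices are merged into a single axis, whereas the matrix layout keeps them as separate axes that a tensor-based analysis can treat independently.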
Xingyu ZHANG Xia ZOU Meng SUN Penglong WU Yimin WANG Jun HE
To improve the noise robustness of automatic speaker recognition, many speech and feature enhancement techniques based on deep neural networks (DNNs) have been explored. In this work, DNN multi-level enhancement (DNN-ME), which consists of signal enhancement, cepstrum enhancement, and i-vector enhancement stages, is proposed for text-independent speaker recognition. Because these enhancement methods are applied at different stages of the speaker recognition pipeline, it is worth exploring their complementary roles, which helps clarify the pros and cons of enhancement at each stage. To exploit the capabilities of DNN-ME as fully as possible, two methods, called cascaded DNN-ME and joint input of DNNs, are studied. Weighted Gaussian mixture models (WGMMs), proposed in our previous work, are also applied to further improve the model's performance. Experiments conducted on the Speakers in the Wild (SITW) database show that DNN-ME is significantly superior to systems with only a single enhancement for noise-robust speaker recognition. Compared with the i-vector baseline, the equal error rate (EER) was reduced from 5.75 to 4.01.
Somchai PHATTHANACHUANCHOM Rawesak TANAWONGSUWAN
Color transfer is a simple process that changes the color tone of one image (the source) to look like that of another image (the target). When transferring colors between images, several issues need to be considered, including partial color transfer, trial and error, and multiple-target color transfer. Our approach enables users to transfer colors partially and locally by letting them select regions of interest from an image segmentation. Since there are many ways to transfer colors from a set of target regions to a set of source regions, we introduce a region exploration and navigation approach in which users choose their preferred color tones to transfer one region at a time and gradually customize the image towards their desired result. The preferred color tones can sometimes come from more than one image; therefore, our method is extended to allow users to select their preferred color tones from multiple images. Our experimental results show the flexibility of our approach in generating reasonable segmented regions of interest and in enabling users to explore the possible results more conveniently.
Kosei KURISU Nobuo SUEMATSU Kazunori IWATA Akira HAYASHI
In image segmentation, finite mixture modeling has been widely used. In its simplest form, the spatial correlation among neighboring pixels is not taken into account, and the segmentation results can be severely degraded by noise in images. We propose a spatially correlated mixture model in which the mixing proportions of finite mixture models are governed by a set of underlying functions defined on the image space. The spatial correlation among pixels is introduced by placing a Gaussian process prior on the underlying functions. The spatial correlation can be set rather directly and flexibly by choosing the covariance function of the Gaussian process prior. The effectiveness of our model is demonstrated by experiments with synthetic and real images.
The Gaussian mixture model (GMM) has recently been applied to image registration because of its robustness and efficiency. However, in previous GMM methods, all feature points are treated identically. By incorporating local class features, this letter proposes a multiple Gaussian mixture models (M-GMM) method for image registration. The proposed method can achieve higher accuracy with less registration time. Experiments on real image pairs further demonstrate the superiority of the proposed method.
Shiho HAGIWARA Takanori DATE Kazuya MASU Takashi SATO
This paper proposes a novel and efficient method, termed hypersphere sampling, to estimate the yield of circuits with low failure probability and a large number of variable sources. Importance sampling using a mean-shift Gaussian mixture distribution as an alternative distribution is used for yield estimation, and the proposed method is used to determine the shift locations of the Gaussian distributions. The method involves the bisection of cones whose bases are part of the hyperspheres in order to locate probabilistically important regions of failure; determining these regions accelerates the convergence of importance sampling. Clustering of the failure samples determines the required number of Gaussian distributions. Successful static random access memory (SRAM) yield estimations of 6- to 24-dimensional problems are presented. The number of Monte Carlo trials is reduced by 2-5 orders of magnitude compared with conventional Monte Carlo simulation methods.
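The role of a mean-shifted alternative distribution in importance sampling can be sketched in one dimension. This is a simplified illustration under stated assumptions, not the hypersphere sampling method of the paper: the "failure" event is assumed to be x > threshold for a standard normal variable, a single shifted Gaussian stands in for the Gaussian mixture, and the function names are hypothetical.

```python
import math
import random


def estimate_failure_probability(threshold: float, shift: float,
                                 n_samples: int, seed: int = 0) -> float:
    """Estimate P(X > threshold) for X ~ N(0, 1) by sampling from N(shift, 1).

    Each failing sample is weighted by the density ratio
    p(x) / q(x) = exp(-shift * x + shift**2 / 2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(shift, 1.0)  # draw from the shifted (alternative) distribution
        if x > threshold:          # assumed failure criterion
            total += math.exp(-shift * x + 0.5 * shift * shift)
    return total / n_samples


def exact_tail(threshold: float) -> float:
    """Closed-form P(X > threshold) for a standard normal, used as a reference."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


estimate = estimate_failure_probability(threshold=4.0, shift=4.0, n_samples=20000)
reference = exact_tail(4.0)
```

Placing the shift at the failure boundary means roughly half of the samples land in the failure region, whereas plain Monte Carlo with the same sample count would be expected to observe fewer than one failure at this probability level.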
Makoto YAMADA Masashi SUGIYAMA
The ratio of two probability densities is called the importance, and its estimation has recently attracted a great deal of attention because the importance can be used for various data processing purposes. In this paper, we propose a new importance estimation method using Gaussian mixture models (GMMs). Our method is an extension of the Kullback-Leibler importance estimation procedure (KLIEP), an importance estimation method using linear or kernel models. An advantage of GMMs is that covariance matrices can also be learned through an expectation-maximization procedure, so the proposed method, which we call the Gaussian mixture KLIEP (GM-KLIEP), is expected to work well when the true importance function has high correlation. Through experiments, we show the validity of the proposed approach.
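The definition of the importance as a density ratio can be made concrete with a small sketch. It assumes both densities are known one-dimensional Gaussians, which is exactly the situation that KLIEP-style estimators do not require; the sketch only illustrates the quantity being estimated, not GM-KLIEP. The distribution parameters and function names are hypothetical.

```python
import math
import random


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a one-dimensional Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance(x: float, test_mean: float = 1.0,
               train_mean: float = 0.0, std: float = 1.0) -> float:
    """Importance w(x) = p_test(x) / p_train(x) for two assumed Gaussians."""
    return gaussian_pdf(x, test_mean, std) / gaussian_pdf(x, train_mean, std)


# Averaged over training samples, the true importance has expectation 1,
# because it integrates p_test over the whole space.
rng = random.Random(0)
train_samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]
mean_weight = sum(importance(x) for x in train_samples) / len(train_samples)
```

Here `mean_weight` should be close to 1, which is the normalization property that density-ratio estimators typically enforce as a constraint on the training samples.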
The maximum likelihood estimate of a mixture model is usually found by using the EM algorithm. However, the EM algorithm suffers from the local optima problem, and therefore the potential performance of mixture models cannot be obtained in practice. In the case of mixture models, local maxima often have too many components in one part of the space and too few in another, widely separated part of the space. To escape from such configurations, we propose a new variant of the EM algorithm in which simultaneous split and merge operations are repeatedly performed by using a new criterion for efficiently selecting the split and merge candidates. We apply the proposed algorithm to the training of Gaussian mixtures and to dimensionality reduction based on a mixture of factor analyzers, using synthetic and real data, and show that the proposed algorithm can markedly improve the ML estimates.
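For reference, the baseline that this abstract builds on can be sketched as plain EM for a one-dimensional Gaussian mixture. This is a minimal illustration under stated assumptions: it contains no split or merge operations and no candidate selection criterion, and the function names, initial values, and synthetic data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a one-dimensional Gaussian with the given variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data: Sequence[float], means: Sequence[float],
              variances: Sequence[float], weights: Sequence[float],
              n_iter: int = 50) -> Tuple[List[float], List[float], List[float]]:
    """Run plain EM from the given initial parameters and return the final ones."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means, and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsing component
    return means, variances, weights


# Synthetic data from two well-separated clusters (assumed for illustration).
rng = random.Random(1)
data = [rng.gauss(-3.0, 1.0) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(200)]
means, variances, weights = em_gmm_1d(data, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
```

With well-separated clusters and a symmetric initialization, plain EM is expected to recover both components; the local optima problem described above arises with less favorable initializations, for example when several components start in the same cluster.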