Yamato OHTANI Tomoki TODA Hiroshi SARUWATARI Kiyohiro SHIKANO
In this paper, we describe a novel model training method for one-to-many eigenvoice conversion (EVC). One-to-many EVC is a technique for converting a specific source speaker's voice into an arbitrary target speaker's voice. An eigenvoice Gaussian mixture model (EV-GMM) is trained in advance using multiple parallel data sets consisting of utterance pairs of the source speaker and many pre-stored target speakers. The EV-GMM can be adapted to a new target speaker using only a few arbitrary utterances of that speaker by estimating a small number of adaptive parameters. In the adaptation process, the EV-GMM parameters that are kept fixed across target speakers strongly affect the conversion performance of the adapted model. To improve the conversion performance of one-to-many EVC, we propose an adaptive training method for the EV-GMM. In the proposed training method, both the fixed parameters and the adaptive parameters are optimized by maximizing the total likelihood of the EV-GMMs adapted to the individual pre-stored target speakers. We conducted objective and subjective evaluations to demonstrate the effectiveness of the proposed training method. The experimental results show that the proposed adaptive training yields significant quality improvements in the converted speech.
Bag-of-Visual-Words representation has recently become popular for scene classification. However, learning the visual words in an unsupervised manner runs into problems when patches with similar appearances correspond to distinct semantic concepts. This paper proposes a novel supervised learning framework that takes full advantage of label information to address this problem. Specifically, Gaussian mixture modeling (GMM) is first applied to obtain a "semantic interpretation" of patches using scene labels: each scene induces a probability density on the low-level visual feature space, and patches are represented as vectors of posterior probabilities of the scene semantic concepts. The Information Bottleneck (IB) algorithm is then introduced to cluster the patches into "visual words" in a supervised manner, based on their semantic interpretations. This operation is designed to maximize the semantic information carried by the visual words. Once the visual words are obtained, the frequencies with which they appear in a given image form a histogram, which is subsequently used for scene categorization with a Support Vector Machine (SVM) classifier. Experiments on a challenging dataset show that the proposed visual words outperform most existing methods on the scene classification task.
Suk Tae SEO In Keun LEE Hye Cheun JEONG Soon Hak KWON
Histogram equalization is the most popular method for image enhancement. However, it has two drawbacks: (i) it can cause undesirable artifacts, and (ii) it can degrade visual quality. To overcome these drawbacks, this letter proposes multi-histogram equalization applied to a histogram smoothed with a Gaussian kernel. To demonstrate its effectiveness, the method is tested on several images and compared with conventional methods.
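A minimal NumPy sketch of the idea, under assumptions the abstract does not state: the 256-bin histogram is smoothed by convolution with a Gaussian kernel of width `sigma`, split at the valleys (local minima) of the smoothed histogram, and each segment is equalized only within its own gray-level range. The function names and the partition rule are hypothetical illustrations, not the letter's exact algorithm.

```python
import numpy as np


def gaussian_smooth_hist(hist, sigma=3.0):
    """Convolve a histogram with a normalized, truncated Gaussian kernel."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    return np.convolve(hist, kernel, mode="same")


def multi_hist_equalize(img, sigma=3.0):
    """Equalize an 8-bit grayscale image segment by segment.

    Assumption: segments are bounded by the valleys (local minima) of the
    Gaussian-smoothed histogram, and each segment is equalized only within
    its own gray-level range.
    """
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    smooth = gaussian_smooth_hist(hist, sigma)
    valleys = [k for k in range(1, 255)
               if smooth[k] < smooth[k - 1] and smooth[k] <= smooth[k + 1]]
    bounds = [0] + valleys + [256]

    lut = np.arange(256, dtype=float)  # identity mapping for empty segments
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        seg = hist[lo:hi]
        if seg.sum() == 0:
            continue
        cdf = np.cumsum(seg) / seg.sum()
        # Map gray levels lo..hi-1 onto the same range via the segment CDF.
        lut[lo:hi] = lo + cdf * (hi - 1 - lo)
    return np.round(lut).astype(np.uint8)[img]
```

Equalizing within each segment preserves the gray-level order between segments, which limits the large brightness shifts that a single global equalization can cause. Smoothing first keeps small random fluctuations in the histogram from creating spurious segment boundaries.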
Nonlinear distortions in power amplifiers (PAs) generate spectral regrowth at the output, which causes interference in adjacent channels and errors in digitally modulated signals. This paper presents a novel method to evaluate the adjacent channel leakage power ratio (ACPR) and error vector magnitude (EVM) from the amplitude-to-amplitude (AM/AM) and amplitude-to-phase (AM/PM) characteristics. The transmitted signal in orthogonal frequency-division multiplexing (OFDM) systems is modeled as complex Gaussian distributed. We use the Mehler formula to derive closed-form expressions for the PA output power spectral density (PSD), ACPR, and EVM for memoryless PAs and PAs with memory, respectively. We verify the derived relationships using an OFDM signal of the IEEE 802.11a WLAN standard. Simulation results show that the proposed method is suitable for predicting the ACPR and EVM of the nonlinear PA output in OFDM systems when the AM/AM and AM/PM characteristics are known.
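The abstract derives ACPR and EVM in closed form via the Mehler formula; the sketch below instead estimates EVM by Monte Carlo simulation, a common way to cross-check such expressions. It assumes a memoryless PA described by the Saleh AM/AM and AM/PM model with the classic traveling-wave-tube parameters, and a circular complex Gaussian input standing in for the OFDM signal. Distortion is measured after removing the best linear gain (Bussgang decomposition). The function names are hypothetical.

```python
import numpy as np


def saleh_pa(x, aa=2.1587, ba=1.1517, ap=4.0033, bp=9.1040):
    """Memoryless PA: Saleh AM/AM and AM/PM curves (assumed parameters)."""
    r = np.abs(x)
    amp = aa * r / (1 + ba * r**2)        # AM/AM
    dphi = ap * r**2 / (1 + bp * r**2)    # AM/PM, radians
    return amp * np.exp(1j * (np.angle(x) + dphi))


def evm_monte_carlo(pa, input_power, n=200_000, seed=0):
    """Estimate EVM for a circular complex Gaussian input of given power."""
    rng = np.random.default_rng(seed)
    x = np.sqrt(input_power / 2) * (rng.standard_normal(n)
                                    + 1j * rng.standard_normal(n))
    y = pa(x)
    # Bussgang decomposition: y = G*x + d with d uncorrelated with x.
    G = np.vdot(x, y) / np.vdot(x, x)
    d = y - G * x
    return float(np.sqrt(np.mean(np.abs(d)**2) / np.mean(np.abs(G * x)**2)))
```

For example, `evm_monte_carlo(saleh_pa, 0.01)` should be smaller than `evm_monte_carlo(saleh_pa, 0.1)`, since a lower input power (more back-off) keeps the signal in the more linear part of the AM/AM and AM/PM curves.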
Takafumi KANAMORI Taiji SUZUKI Masashi SUGIYAMA
Density ratio estimation has gathered a great deal of attention recently since it can be used for various data processing tasks. In this paper, we consider three methods of density ratio estimation: (A) the numerator and denominator densities are separately estimated and then the ratio of the estimated densities is computed, (B) a logistic regression classifier discriminating denominator samples from numerator samples is learned and then the ratio of the posterior probabilities is computed, and (C) the density ratio function is directly modeled and learned by minimizing the empirical Kullback-Leibler divergence. We first prove that when the numerator and denominator densities are known to be members of the exponential family, (A) is better than (B) and (B) is better than (C). Then we show that once the model assumption is violated, (C) is better than (A) and (B). Thus in practical situations where no exact model is available, (C) would be the most promising approach to density ratio estimation.
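Approach (A) is simple to write down when the model family is known. A minimal sketch, assuming one-dimensional Gaussian models for both densities (a member of the exponential family, as in the first result above); the function names are illustrative.

```python
import numpy as np


def fit_gaussian(samples):
    """Maximum-likelihood mean and variance of 1-D samples."""
    return samples.mean(), samples.var()


def gaussian_pdf(x, mu, var):
    return np.exp(-(x - mu)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)


def ratio_by_separate_estimation(x, num_samples, den_samples):
    """Approach (A): fit each density separately, then divide the fits."""
    mu_n, var_n = fit_gaussian(num_samples)
    mu_d, var_d = fit_gaussian(den_samples)
    return gaussian_pdf(x, mu_n, var_n) / gaussian_pdf(x, mu_d, var_d)
```

With numerator samples from N(0, 1) and denominator samples from N(0.5, 1), the true ratio at x = 0 is exp(0.125), and the estimate approaches it as the sample size grows. If the Gaussian assumption is wrong, both fits and therefore their ratio are biased; that misspecified case is where the abstract reports direct ratio modeling (C) performing better.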
We consider the use of the additive white Gaussian noise channel to achieve information-theoretically secure oblivious transfer. A protocol for this primitive that ensures correctness and privacy for the players is presented together with the signal design. We also study the information-theoretic efficiency of the protocol, as well as more practical settings in which the channel parameter is unknown to the players.
Masahiko NISHIMOTO Kohichi OGATA
Gaussian rough surfaces can be characterized by two roughness parameters: the root-mean-square height and the correlation length. Accurate estimation of these parameters from a measured surface height profile requires data samples with a sufficiently long record length. In this letter, an expression for the correlation length in terms of a surface slope function is introduced for estimating the correlation length, and an analytical expression for the data record length required for accurate estimation is derived. The result shows that the method using the slope function can reduce the required data record length by approximately 60% compared with the commonly employed method using the correlation function. To check this result, a Monte Carlo simulation is also carried out, and it confirms the validity of the result.
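As an illustration of how the correlation length can be tied to surface slope, consider the standard Gaussian autocorrelation model with rms height $h$ and correlation length $\ell$; whether the letter's slope function takes exactly this form is an assumption. Differentiating the autocorrelation twice at zero lag gives the slope variance, from which $\ell$ follows:

```latex
C(\tau) = h^2 \exp\!\left(-\frac{\tau^2}{\ell^2}\right), \qquad
s_{\mathrm{rms}}^2 = \left\langle \left(\frac{dz}{dx}\right)^{2} \right\rangle
  = -\left.\frac{d^2 C}{d\tau^2}\right|_{\tau=0} = \frac{2h^2}{\ell^2}
\quad\Longrightarrow\quad
\ell = \frac{\sqrt{2}\,h}{s_{\mathrm{rms}}}
```

Under this model, $\ell$ can be obtained from the rms height and rms slope rather than from locating the $1/e$ point of an estimated correlation function.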
Nobumoto YAMANE Motohiro TABUCHI Yoshitaka MORIKAWA
In this paper, an image restoration method using the Wiener filter is proposed. To make the Wiener filter consistent with images that have spatially varying statistics, the proposed method adopts the locally adaptive Wiener filter (AWF) based on the universal Gaussian mixture distribution model (UNI-GMM), which was previously proposed for denoising. To apply the UNI-GMM-AWF to the deconvolution problem, the proposed method employs the stationary Wiener filter (SWF) as a pre-filter. The SWF, applied in the discrete cosine transform domain, shrinks the blur point spread function and facilitates the modeling and filtering in the subsequent AWF. The SWF and UNI-GMM are learned from a generic training image set, and the proposed method is tuned to that image set. Simulation results are presented to demonstrate the effectiveness of the proposed method.
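For reference, a stationary Wiener deconvolution filter can be sketched in a few lines. The letter applies its SWF in the discrete cosine transform domain; this sketch swaps in the FFT domain and assumes circular (periodic) blur and a constant, user-supplied noise-to-signal power ratio `nsr`. The function names are hypothetical.

```python
import numpy as np


def psf_to_otf(psf, shape):
    """Zero-pad a centered PSF to `shape` and return its 2-D DFT."""
    pad = np.zeros(shape)
    kh, kw = psf.shape
    pad[:kh, :kw] = psf
    # Move the PSF center to index (0, 0) so the filter adds no shift.
    pad = np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(pad)


def wiener_deconvolve(blurred, psf, nsr):
    """Stationary Wiener deconvolution, assuming circular blur and a
    frequency-independent noise-to-signal power ratio `nsr`."""
    H = psf_to_otf(psf, blurred.shape)
    W = np.conj(H) / (np.abs(H)**2 + nsr)
    return np.real(np.fft.ifft2(W * np.fft.fft2(blurred)))
```

A larger `nsr` suppresses noise amplification at frequencies where the blur transfer function is small, at the cost of less sharpening; a single global filter like this cannot adapt to spatially varying image statistics, which is the gap the locally adaptive stage is meant to fill.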
Makoto YAMADA Masashi SUGIYAMA
The ratio of two probability densities is called the importance, and its estimation has recently gathered a great deal of attention since the importance can be used for various data processing purposes. In this paper, we propose a new importance estimation method using Gaussian mixture models (GMMs). Our method is an extension of the Kullback-Leibler importance estimation procedure (KLIEP), an importance estimation method using linear or kernel models. An advantage of GMMs is that their covariance matrices can also be learned through an expectation-maximization procedure, so the proposed method, which we call the Gaussian mixture KLIEP (GM-KLIEP), is expected to work well when the true importance function has high correlation. Through experiments, we show the validity of the proposed approach.
In this letter, we propose an efficient method to improve voiced/unvoiced (V/UV) sound decision for the selectable mode vocoder (SMV) of 3GPP2 using a Gaussian mixture model (GMM). We first present an analysis of the features and the classification method adopted in the SMV. Feature vectors for the GMM are then selected from relevant SMV parameters for efficient V/UV classification. The performance of the proposed algorithm is evaluated under various conditions, and it yields better results than the conventional SMV method.
In this letter, an acoustic environment classification algorithm based on the 3GPP2 selectable mode vocoder (SMV) is proposed for context-aware mobile phones. Classification of the acoustic environment is performed based on a Gaussian mixture model (GMM) using coding parameters of the SMV extracted directly from the encoding process of the acoustic input data in the mobile phone. Experimental results show that the proposed environment classification algorithm provides superior performance over a conventional method in various acoustic environments.
Liang SHA Guijin WANG Anbang YAO Xinggang LIN
The particle filter has attracted increasing attention from object tracking researchers because it can handle nonlinear and non-Gaussian systems. In this paper, we mainly explore the problem of precisely estimating the observation likelihoods of particles in the joint feature-spatial space. For this purpose, a similarity based on a mixture Gaussian kernel function is presented to evaluate the discrepancy between the target region and the particle region. This similarity can be interpreted as the expectation of the spatially weighted feature distribution over the target region. To adapt to abrupt object motion, we also present a method for adjusting the state transition model using priors on motion speed and object size. Compared with the standard particle filter tracker, our tracking algorithm performs better on challenging video sequences.
Behrooz SAFARINEJADIAN Mohammad B. MENHAJ Mehdi KARRARI
In this paper, the problem of density estimation and clustering in sensor networks is considered. It is assumed that measurements of the sensors can be statistically modeled by a common Gaussian mixture model. This paper develops a distributed variational Bayesian algorithm (DVBA) to estimate the parameters of this model. This algorithm produces an estimate of the density of the sensor data without requiring the data to be transmitted to and processed at a central location. Alternatively, DVBA can be viewed as a distributed processing approach for clustering the sensor data into components corresponding to predominant environmental features sensed by the network. The convergence of the proposed DVBA is then investigated. Finally, to verify the performance of DVBA, we perform several simulations of sensor networks. Simulation results are very promising.
Xiang ZHANG Hongbin SUO Qingwei ZHAO Yonghong YAN
In this letter, we propose a new approach to SVM-based speaker recognition that uses a novel kind of phonotactic information as the feature for SVM modeling. Gaussian mixture models (GMMs) have proven extremely successful for text-independent speaker recognition. The GMM universal background model (UBM) is a speaker-independent model, each component of which can be considered to model some underlying phonetic sound class. We assume that utterances from different speakers yield different average posterior probabilities on the same Gaussian component of the UBM, so that the supervector composed of the average posterior probabilities on all UBM components for each utterance should be discriminative. We use these supervectors as features for SVM-based speaker recognition. Experimental results on a NIST SRE 2006 task show that the proposed approach achieves performance comparable to that of commonly used systems. Fusion results are also presented.
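The supervector described above can be computed directly from a diagonal-covariance UBM. A minimal NumPy sketch, assuming the UBM parameters (`weights`, `means`, `variances`) are already trained and an utterance is given as a matrix of acoustic feature frames; the function names are hypothetical.

```python
import numpy as np


def log_gaussian_diag(X, means, variances):
    """log N(x_t | mu_c, diag(var_c)) for every frame t and component c.

    X: (T, D) frames; means, variances: (C, D). Returns shape (T, C).
    """
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff**2 / variances[None], axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :])


def posterior_supervector(X, weights, means, variances):
    """Average over frames of the UBM component posteriors, shape (C,)."""
    log_p = np.log(weights)[None, :] + log_gaussian_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)    # per-frame posteriors
    return post.mean(axis=0)                   # average over frames
```

Each entry is the average frame-level occupancy of one UBM component, so the vector sums to 1 and has one dimension per Gaussian component; in the described approach, such vectors are the SVM inputs.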
Hiroaki TEZUKA Takao NISHITANI
This paper describes a multiresolutional Gaussian mixture model (GMM) for precise and stable foreground segmentation. A GMM with multiple block sizes and a computationally efficient fine-to-coarse strategy, both carried out in the Walsh transform (WT) domain, are newly introduced into the GMM scheme. Using a set of variable-size block-based GMMs achieves precise and stable processing. The fine-to-coarse strategy exploits the spectral nature of the WT, which drastically reduces the number of computational steps. In addition, the proposed approach requires less than 10% of the total computation of the original pixel-based GMM approach. Experimental results show that our approach performs stably under many conditions, including dark foreground objects against light, global lighting changes, and scenery in heavy snow.
Shingo TAKAHASHI Shuji TSUKIYAMA
To improve the performance of existing statistical timing analysis, slew distributions must be taken into account, and a mechanism for propagating them together with delay distributions along signal paths is necessary. This paper introduces Gaussian mixture models to represent the slew and delay distributions and proposes a novel algorithm for statistical timing analysis. The algorithm propagates pairs of delay and slew through a given circuit graph and dynamically changes the delay distributions of circuit elements according to the propagated slews. The proposed model and algorithm are evaluated by comparison with Monte Carlo simulation. The experimental results show that the accuracy of the µ+3σ value of the maximum delay improves by up to 4.5 points over current statistical timing analysis using Gaussian distributions.
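One reason Gaussian mixtures suit timing analysis is that the sum of two independent mixture-distributed delays is again a mixture. A minimal sketch of that sum operation and of the µ+3σ statistic, assuming independence; the paper's max operation and slew-dependent delay update are not shown, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class GaussMix:
    """1-D Gaussian mixture: weights, component means, component variances."""
    w: np.ndarray
    mu: np.ndarray
    var: np.ndarray

    def mean(self):
        return float(np.sum(self.w * self.mu))

    def std(self):
        second_moment = np.sum(self.w * (self.var + self.mu**2))
        return float(np.sqrt(second_moment - self.mean()**2))

    def mu_plus_3sigma(self):
        return self.mean() + 3 * self.std()


def add_independent(a, b):
    """Distribution of X + Y for independent mixtures X ~ a and Y ~ b:
    every component pair combines (weights multiply, means and variances add)."""
    w = np.outer(a.w, b.w).ravel()
    mu = np.add.outer(a.mu, b.mu).ravel()
    var = np.add.outer(a.var, b.var).ravel()
    return GaussMix(w, mu, var)
```

The component count grows multiplicatively with each addition, so a practical implementation would need to merge or prune components; how the paper handles this is not stated in the abstract.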
In electromagnetic theory, the vacuum impedance Z0 is a universal constant that is as important as the velocity of light c0 in vacuum. Unfortunately, however, its significance is not well appreciated, and sometimes its very presence is ignored. This is partly because in the Gaussian system of units, which has long been widely used, Z0 is a dimensionless constant of unit magnitude. In this paper, we clarify that Z0 is a fundamental parameter in electromagnetism and plays major roles in the following contexts: reorganizing the structure of electromagnetic formulas with reference to relativity; renormalizing quantities toward natural unit systems starting from the SI unit system; and defining the magnitudes of electromagnetic units.
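In the SI, Z0 follows from the vacuum constants through Z0 = µ0 c0 = 1/(ε0 c0) = sqrt(µ0/ε0). A short check in Python, using the pre-2019 defined value µ0 = 4π × 10⁻⁷ H/m; since the 2019 SI revision, µ0 is a measured quantity, but its value agrees with 4π × 10⁻⁷ H/m to within about 1 part in 10⁹.

```python
import math

C0 = 299_792_458.0        # speed of light in vacuum, m/s (exact in the SI)
MU0 = 4 * math.pi * 1e-7  # magnetic constant, H/m (pre-2019 defined value)
EPS0 = 1 / (MU0 * C0**2)  # electric constant, F/m, from c0 = 1/sqrt(mu0*eps0)

Z0 = MU0 * C0             # vacuum impedance, ohms
print(f"Z0 = {Z0:.6f} ohm")  # prints: Z0 = 376.730313 ohm
```

In SI units Z0 thus has the dimension of resistance and a magnitude of roughly 377 Ω, in contrast to its dimensionless unit value in the Gaussian system.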
An extension of the traditional color-based visual tracker, namely the continuously adaptive mean shift tracker, is presented to improve the convenience and generality of color-based tracking. This is achieved by introducing a probability density function for pixels based on the hue histogram of the object. As merits of this approach, the direction and size of the tracked object are easily derived by principal component analysis (PCA), and its extension to the three-dimensional case becomes straightforward.
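The PCA step mentioned above can be sketched as follows: treat the per-pixel probability (for example, a hue-histogram back-projection) as weights, compute the weighted covariance of pixel coordinates, and read the orientation and spread from its eigenvectors and eigenvalues. A minimal NumPy sketch; the function name is hypothetical, and image coordinates are used (y grows downward).

```python
import numpy as np


def blob_orientation(prob):
    """Centroid, orientation, and principal spreads of a 2-D probability image.

    `prob` holds nonnegative per-pixel weights (e.g. a hue back-projection).
    Returns ((cx, cy), angle, (s_major, s_minor)), where angle is in radians
    from the +x axis, modulo pi, and s_* are standard deviations in pixels.
    """
    ys, xs = np.indices(prob.shape)
    w = prob / prob.sum()
    cx, cy = np.sum(w * xs), np.sum(w * ys)
    dx, dy = xs - cx, ys - cy
    cov = np.array([[np.sum(w * dx * dx), np.sum(w * dx * dy)],
                    [np.sum(w * dx * dy), np.sum(w * dy * dy)]])
    evals, evecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    major = evecs[:, 1]                  # principal axis (largest variance)
    angle = np.arctan2(major[1], major[0]) % np.pi
    return (cx, cy), angle, tuple(np.sqrt(evals[::-1]))
```

The three-dimensional case replaces the 2×2 covariance with a 3×3 one built from three coordinate grids; how the spreads are scaled into a search-window size is a design choice not specified in the abstract.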
Youngsoo KIM Sangbae JEONG Daeyoung KIM
In this paper, an efficient node-level target classification scheme for wireless sensor networks (WSNs) is proposed. It uses acoustic and seismic information, and its performance is verified by the classification accuracy of vehicles in a WSN. Because of the severe resource limitations of WSN systems, parametric classifiers are preferable to non-parametric ones. As a parametric classifier, the Gaussian mixture model (GMM) algorithm not only classifies targets in WSNs well but also requires few resources, making it suitable for a sensor node. In addition, our sensor fusion method uses a decision tree, generated by the classification and regression tree (CART) algorithm, to improve the accuracy, so that the algorithm achieves a considerable increase in classification rate while using fewer resources. Experimental results on a real WSN dataset show that the proposed scheme achieves a 94.10% classification rate and outperforms the k-nearest neighbors and support vector machine classifiers.
Yuuji MUKAI Hideki NODA Michiharu NIIMI Takashi OSANAI
This paper presents a text-independent speaker verification method using Gaussian mixture models (GMMs) that requires only utterances of enrolled speakers. Artificial cohorts are used instead of cohorts drawn from speaker databases, and GMMs for the artificial cohorts are generated by changing the model parameters of the GMM for a claimed speaker. The equal error rates of the proposed method are about 60% lower than those of a conventional method that also uses only utterances of enrolled speakers.