Yosuke IIJIMA Atsunori OKADA Yasushi YUMINAKA
In high-speed data communication systems, it is important to evaluate the quality of the transmitted signal at the receiver. At high data rates, the transmission line characteristics act as a high-frequency attenuator and contribute to intersymbol interference (ISI) at the receiver. To evaluate ISI conditions, eye diagrams are widely used to analyze signal quality and visualize the ISI effect as an eye-opening rate. Various types of on-chip eye-opening monitors (EOMs) have been proposed to adjust waveform-shaping circuits. However, eye diagram evaluation of multi-valued signaling is more difficult than that of binary transmission because of the complicated signal transition patterns. Moreover, in severe ISI situations where the eye is completely closed, eye diagram evaluation does not work well. This paper presents a novel evaluation method using two-dimensional (2D) symbol mapping and a linear mixture model (LMM) for multi-valued data transmission. In the proposed method, ISI evaluation is realized by 2D symbol mapping, and an efficient quantitative analysis is realized using the LMM. An experimental demonstration of four-level pulse amplitude modulation (PAM-4) data transmission over a 100-m Cat5e cable is presented. The experimental results show that the proposed method can extract features of the ISI effect even when the eye is completely closed under severe conditions.
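As a minimal sketch of the 2D symbol mapping idea (not the authors' implementation; the PAM-4 levels, noise model, and sample count below are illustrative assumptions), consecutive received samples can be paired into 2D points whose cluster spread reflects the ISI:

```python
import numpy as np

def symbol_map_2d(samples):
    """Map a received sample sequence to 2D points (previous, current).

    Under ISI, each current sample depends on the preceding symbol, so the
    4 x 4 = 16 PAM-4 transition patterns appear as 2D clusters whose spread
    reflects the severity of the interference.
    """
    samples = np.asarray(samples, dtype=float)
    return np.column_stack([samples[:-1], samples[1:]])

# Ideal PAM-4 levels with mild additive noise (illustrative only)
rng = np.random.default_rng(0)
levels = np.array([-3, -1, 1, 3])
symbols = rng.choice(levels, size=1000)
received = symbols + 0.05 * rng.standard_normal(1000)
points = symbol_map_2d(received)
print(points.shape)  # (999, 2)
```

With low noise the points concentrate near the 16 ideal transition coordinates; a mixture model fitted to these points can then quantify how far the clusters have spread.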
Hiroki NISHIMOTO Renyuan ZHANG Yasuhiko NAKASHIMA
An efficient implementation strategy for speeding up high-quality clustering algorithms is developed on the basis of general-purpose graphics processing units (GPGPUs) in this work. Among various clustering algorithms, a sophisticated Gaussian mixture model (GMM), whose parameters are estimated through a variational Bayesian (VB) mechanism, is adopted for its superior performance. Since the VB-GMM methodology is computation-hungry, the GPGPU is employed to carry out the massive matrix computations. To efficiently migrate the conventional CPU-oriented schemes of VB-GMM onto GPGPU platforms, an entire migration flow with thirteen stages is presented in detail. A CPU-GPGPU co-operation scheme, execution reordering, and memory access optimization are proposed for optimizing GPGPU utilization and maximizing the clustering speed. Five types of real-world applications along with relevant data sets are introduced for cross-validation. The experimental results verify the feasibility of implementing the VB-GMM algorithm on a GPGPU with practical benefits. The proposed GPGPU migration achieves a maximum speedup of 192x. Furthermore, it succeeds in identifying the proper number of clusters, which is hard to achieve with the EM algorithm.
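The matrix computations that dominate such a workload can be illustrated with a plain maximum-likelihood EM step for a one-dimensional mixture (a simplified stand-in: the paper's variational Bayesian updates additionally place priors over the parameters; all data and initial values below are illustrative):

```python
import numpy as np

def em_step(X, pi, mu, var):
    """One EM iteration for a 1D Gaussian mixture (maximum likelihood).

    The E-step and M-step are dense array operations over all N samples and
    K components, which is the kind of work a GPGPU accelerates.
    """
    # E-step: responsibilities r[n, k] proportional to pi_k * N(x_n | mu_k, var_k)
    X = np.asarray(X, dtype=float)[:, None]          # shape (N, 1)
    log_p = -0.5 * ((X - mu) ** 2 / var + np.log(2 * np.pi * var))
    log_r = np.log(pi) + log_p
    log_r -= log_r.max(axis=1, keepdims=True)        # numerical stability
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: weighted re-estimation of the parameters
    Nk = r.sum(axis=0)
    pi_new = Nk / len(X)
    mu_new = (r * X).sum(axis=0) / Nk
    var_new = (r * (X - mu_new) ** 2).sum(axis=0) / Nk
    return pi_new, mu_new, var_new

rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(-2, 0.3, 500), rng.normal(2, 0.3, 500)])
pi, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(30):
    pi, mu, var = em_step(X, pi, mu, var)
print(np.sort(mu))  # means approach -2 and 2
```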
Naohiro TAWARA Atsunori OGAWA Tomoharu IWATA Hiroto ASHIKAWA Tetsunori KOBAYASHI Tetsuji OGAWA
Most conventional multi-source domain adaptation techniques for recurrent neural network language models (RNNLMs) are domain-centric. In these approaches, each domain is considered independently, which makes it difficult to apply the models to completely unseen target domains that are unobservable during training. Instead, our study exploits domain attributes, which represent knowledge common to such different domains as dialects, types of wording, styles, and topics, to achieve domain generalization that can robustly represent unseen target domains by combining the domain attributes. To build an attribute-based domain generalization system for language modeling, we introduce domain attribute-based experts into a multi-stream RNNLM called the recurrent adaptive mixture model (RADMM), instead of domain-based experts. In the proposed system, a long short-term memory network is independently trained on each domain attribute as an expert model. Then, by integrating the outputs from all the experts according to context-dependent weights over the domain attributes of the current input, we predict the subsequent words in the unseen target domain and exploit the specific knowledge of each domain attribute. To demonstrate the effectiveness of our proposed domain attribute-centric language model, we experimentally compared it with a conventional domain-centric language model using texts taken from multiple domains, including different writing styles, topics, dialects, and types of wording. The experimental results demonstrated that lower perplexity can be achieved using domain attributes.
Zhenyu ZHANG Shaoli KANG Bin REN Xiang ZHANG
Time of arrival (TOA) is a widely used ranging technology in wireless cellular networks. How to perform accurate TOA estimation in multi-path and non-line-of-sight (NLOS) environments, and how to then accurately calculate mobile terminal locations, are two critical issues in positioning research. NLOS identification can be performed in both the TOA measurement part and the position calculation part. In this paper, two schemes for mitigating NLOS errors are proposed, one for each of these two steps. First, a TOA ranging method based on clustering theory is proposed to solve the problem of line-of-sight (LOS) path estimation in multi-path channels. We model the TOA range as a Gaussian mixture model and illustrate how LOS and NLOS components can be measured and identified with non-parametric Bayesian methods when the wireless transmission environment is unknown. Moreover, for NLOS propagation channels, this paper proposes a user location estimator based on the maximum a posteriori criterion. Combining the proposed TOA estimation and user location computation schemes improves the terminal's positioning accuracy. Experiments showed that the TOA measurement and localization algorithms presented in this paper are robust in complex wireless environments.
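The position-calculation step can be sketched generically as follows (not the paper's estimator: under a uniform position prior and Gaussian range noise, the maximum a posteriori estimate reduces to maximizing the range likelihood, here by a coarse grid search; the anchor layout, noise level, and grid are illustrative assumptions):

```python
import numpy as np

# Three anchors (base stations) at known positions, in meters (assumed layout)
anchors = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])
true_pos = np.array([40.0, 60.0])
rng = np.random.default_rng(2)
sigma = 0.5  # range noise standard deviation (illustrative)
ranges = np.linalg.norm(anchors - true_pos, axis=1) + rng.normal(0, sigma, 3)

# With a uniform prior, MAP = maximum likelihood over candidate positions;
# a coarse grid search keeps the sketch dependency-free.
xs = np.linspace(0, 100, 201)
grid = np.array([[x, y] for x in xs for y in xs])
d = np.linalg.norm(grid[:, None, :] - anchors[None, :, :], axis=2)
log_lik = -0.5 * ((d - ranges) ** 2 / sigma**2).sum(axis=1)
est = grid[np.argmax(log_lik)]
print(est)
```

An NLOS-aware estimator would replace the plain Gaussian likelihood with one that models the positive NLOS bias, which is where the paper's GMM-based identification feeds in.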
Yosuke IIJIMA Keigo TAYA Yasushi YUMINAKA
To meet the increasing demand for high-speed communication in VLSI (very large-scale integration) systems, next-generation high-speed data transmission standards (e.g., IEEE 802.3bs and PCIe 6.0) will adopt four-level pulse amplitude modulation (PAM-4) for data coding. Although PAM-4 is spectrally efficient and mitigates the inter-symbol interference caused by bandwidth-limited wired channels, it is more sensitive to noise than conventional non-return-to-zero (NRZ) line coding. To evaluate the received signal quality under adaptive coefficient settings for a PAM-4 equalizer during data transmission, we propose an eye-opening monitor technique based on machine learning. The proposed technique uses a Gaussian mixture model to classify the received PAM-4 symbols. Simulation and experimental results demonstrate the feasibility of adaptive equalization for PAM-4 coding.
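Once per-level Gaussian components have been fitted, classifying received PAM-4 samples amounts to picking the component with the highest log density. A minimal sketch (the means, variances, and samples below are illustrative assumptions, not fitted values from the paper):

```python
import numpy as np

def classify_pam4(samples, means, variances):
    """Assign each received sample to the most likely PAM-4 level.

    With one 1D Gaussian component per level, maximum-likelihood
    classification picks the component with the highest log density.
    """
    samples = np.asarray(samples, dtype=float)[:, None]
    log_p = -0.5 * ((samples - means) ** 2 / variances
                    + np.log(2 * np.pi * variances))
    return np.argmax(log_p, axis=1)

means = np.array([-3.0, -1.0, 1.0, 3.0])   # ideal PAM-4 levels (assumed)
variances = np.full(4, 0.1)
labels = classify_pam4([-2.9, -1.2, 0.8, 3.1], means, variances)
print(labels)  # [0 1 2 3]
```

With equal variances this reduces to nearest-level slicing; unequal fitted variances shift the decision thresholds, which is one benefit of the GMM view.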
In this paper, we propose a method that enables us to control the variance of the coefficients of LMS-type adaptive filters. In the method, each coefficient of the adaptive filter is modeled as a random variable with a Gaussian distribution, and its value is estimated as the mean of the distribution. In addition, at each time step, we check whether the updated value lies within a predefined range of the distribution; the update of a coefficient is canceled when its updated value exceeds that range. We also propose an implementation whose formulation is similar to that of the Gaussian mixture model (GMM) widely used in signal processing and machine learning. The effectiveness of the proposed method is evaluated through computer simulations.
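The update-cancellation idea can be sketched as follows (an illustrative stand-in, not the paper's exact formulation: here simple exponentially weighted mean/variance statistics per coefficient play the role of the Gaussian model, and an update is cancelled when the new value leaves a +/- k-sigma range):

```python
import numpy as np

def lms_with_range_check(x, d, n_taps=4, mu=0.01, k=3.0):
    """LMS identification where each coefficient update is cancelled if the
    new value leaves a +/- k*sigma range around that coefficient's running
    mean (sketch of the range-check idea, with assumed statistics updates).
    """
    w = np.zeros(n_taps)
    mean = np.zeros(n_taps)
    var = np.ones(n_taps)
    alpha = 0.99  # forgetting factor for the per-coefficient statistics
    for n in range(n_taps, len(x)):
        u = x[n - n_taps + 1:n + 1][::-1]     # [x[n], x[n-1], ...]
        e = d[n] - w @ u
        w_new = w + mu * e * u
        inside = np.abs(w_new - mean) <= k * np.sqrt(var)
        w = np.where(inside, w_new, w)        # cancel out-of-range updates
        mean = alpha * mean + (1 - alpha) * w
        var = alpha * var + (1 - alpha) * (w - mean) ** 2
    return w

rng = np.random.default_rng(3)
h = np.array([0.8, -0.4, 0.2, 0.1])           # unknown system (assumed)
x = rng.standard_normal(5000)
d = np.convolve(x, h)[:len(x)] + 0.01 * rng.standard_normal(len(x))
w = lms_with_range_check(x, d)
print(np.round(w, 2))
```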
Kazuki SESHIMO Akira OTA Daichi NISHIO Satoshi YAMANE
In recent years, the use of big data has attracted increasing attention, and many techniques for data analysis have been proposed. Big data analysis is difficult, however, because such data varies greatly in its regularity. Heterogeneous mixture machine learning is one algorithm for analyzing such data efficiently. In this study, we propose online heterogeneous learning based on an online EM algorithm. Experiments show that this algorithm achieves higher learning accuracy than a conventional method and is practical. The online learning approach will make this algorithm useful in the field of data analysis.
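The online-EM ingredient can be sketched with a stepwise update of sufficient statistics for a one-dimensional Gaussian mixture (a generic online-EM sketch under an assumed constant step size, not the paper's heterogeneous-mixture algorithm):

```python
import numpy as np

def online_em_1d(stream, mu, var, pi, eta=0.05):
    """Stepwise online EM for a 1D Gaussian mixture.

    The sufficient statistics (s0, s1, s2) are updated per sample with step
    size eta, so no pass over the full data set is required.
    """
    s0, s1, s2 = pi.copy(), pi * mu, pi * (var + mu**2)
    for x in stream:
        # E-step for one sample: responsibilities under current parameters
        log_r = np.log(pi) - 0.5 * ((x - mu)**2 / var + np.log(2*np.pi*var))
        r = np.exp(log_r - log_r.max())
        r /= r.sum()
        # Stochastic update of the sufficient statistics
        s0 = (1 - eta) * s0 + eta * r
        s1 = (1 - eta) * s1 + eta * r * x
        s2 = (1 - eta) * s2 + eta * r * x**2
        # M-step: parameters from the current statistics
        pi, mu = s0 / s0.sum(), s1 / s0
        var = np.maximum(s2 / s0 - mu**2, 1e-6)
    return pi, mu, var

rng = np.random.default_rng(4)
stream = rng.choice([-2.0, 2.0], 4000) + 0.3 * rng.standard_normal(4000)
pi, mu, var = online_em_1d(stream, np.array([-1.0, 1.0]),
                           np.array([1.0, 1.0]), np.array([0.5, 0.5]))
print(np.sort(mu))
```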
Junya KOGUCHI Shinnosuke TAKAMICHI Masanori MORISE Hiroshi SARUWATARI Shigeki SAGAYAMA
We propose a speech analysis-synthesis and deep neural network (DNN)-based text-to-speech (TTS) synthesis framework using a Gaussian mixture model (GMM)-based approximation of full-band spectral envelopes. GMMs have excellent properties as acoustic features in statistical parametric speech synthesis. Each Gaussian function of a GMM fits a local resonance of the spectrum. The GMM retains the fine spectral envelope and achieves high controllability of its structure. However, since conventional speech analysis methods (i.e., GMM parameter estimation) have been formulated for narrow-band speech, they degrade the quality of synthetic speech. Moreover, a DNN-based TTS synthesis method using GMM-based approximation has not been formulated, despite its excellent expressive ability. Therefore, we employ peak-picking-based initialization for full-band speech analysis to provide better starting points for the iterative estimation of the GMM parameters. We introduce not only the prediction error of the GMM parameters but also the reconstruction error of the spectral envelopes as objective criteria for training the DNN, and we propose a multi-task learning method that minimizes these errors simultaneously. We also propose a post-filter based on variance scaling of the GMM to enhance synthetic speech. Experimental evaluations of our framework indicated that 1) the initialization method of our framework outperformed the conventional one in the quality of analysis-synthesized speech; 2) introducing the reconstruction error in DNN training significantly improved the synthetic speech; and 3) our variance-scaling-based post-filter further improved the synthetic speech.
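The peak-picking initialization can be sketched as locating the strongest local maxima of a spectral envelope and placing initial Gaussian means there (a minimal sketch; the synthetic envelope and peak count below are illustrative assumptions, and the paper's full-band analysis adds further refinements):

```python
import numpy as np

def peak_pick_init(log_spectrum, freqs, n_peaks):
    """Initialize GMM means at the strongest local maxima of a spectral
    envelope, approximating the local resonances the Gaussians should fit.
    """
    s = np.asarray(log_spectrum, dtype=float)
    # Interior local maxima: strictly above both neighbors
    idx = np.where((s[1:-1] > s[:-2]) & (s[1:-1] > s[2:]))[0] + 1
    idx = idx[np.argsort(s[idx])[::-1]][:n_peaks]  # keep the strongest
    return np.sort(freqs[idx])

# Synthetic envelope with three resonances (illustrative)
freqs = np.linspace(0, 8000, 801)
env = (np.exp(-((freqs - 500) / 150) ** 2)
       + 0.8 * np.exp(-((freqs - 1500) / 200) ** 2)
       + 0.6 * np.exp(-((freqs - 3000) / 300) ** 2))
means = peak_pick_init(env, freqs, 3)
print(means)  # approximately [500, 1500, 3000] Hz
```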
The test of homogeneity for normal mixtures has been used in various fields, but its theoretical understanding is limited because the parameter set of the null hypothesis corresponds to singular points in the parameter space. In this paper, we shed light on this issue from a new perspective, variational Bayes, and offer a theory for testing homogeneity based on it. Under conventional theory, the stochastic behavior of the variational free energy, which is necessary for constructing a hypothesis test, has remained unknown. We clarify it for the first time and construct a new test based on it. Numerical experiments show the validity of our results.
Daisuke SAITO Nobuaki MINEMATSU Keikichi HIROSE
This paper describes a novel approach to flexible control of speaker characteristics using a tensor representation of multiple Gaussian mixture models (GMMs). In voice conversion studies, realizing conversion from/to an arbitrary speaker's voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice GMM (EV-GMM) was proposed. In EVC, a speaker space is constructed from GMM supervectors, which are high-dimensional vectors derived by concatenating the mean vectors of each speaker GMM. In the speaker space, each speaker is represented by a small number of weight parameters over eigen-supervectors. In this paper, we revisit the construction of the speaker space by introducing tensor factor analysis of the training data set. In our approach, each speaker is represented as a matrix whose rows and columns correspond to the dimensions of the mean vectors and the Gaussian components, respectively. The speaker space is derived by tensor factor analysis of the set of these matrices. Our approach solves an inherent problem of the supervector representation and improves the performance of voice conversion. In addition, the effects of speaker adaptive training before factorization are also investigated. Experimental results on one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
Pengyu WANG Hongqing ZHU Ning CHEN
A novel superpixel segmentation approach driven by a spatially constrained uniform mixture model (UMMS) is proposed. In this algorithm, each observation, i.e., each pixel, is first represented as a five-dimensional vector consisting of its colour in CIELAB space and its position. We then define a new uniform distribution that incorporates pixel position, so that the distribution can describe each pixel in the input image. A weighted 1-norm of the difference between pixels and the mean is applied to control the compactness of the superpixels. In addition, an effective parameter estimation scheme is introduced to reduce computational complexity. Specifically, the invariant prior probability and parameter ranges restrict the locality of superpixels, and the robust mean optimization technique ensures the accuracy of superpixel boundaries. Finally, each estimated uniform distribution is associated with a superpixel, and the proposed UMMS thereby implements superpixel segmentation. Experiments on the BSDS500 dataset verify that UMMS outperforms most state-of-the-art approaches in terms of segmentation accuracy, regularity, and speed.
Xingyu ZHANG Xia ZOU Meng SUN Penglong WU Yimin WANG Jun HE
In order to improve the noise robustness of automatic speaker recognition, many techniques for speech/feature enhancement using deep neural networks (DNNs) have been explored. In this work, DNN multi-level enhancement (DNN-ME), which consists of signal enhancement, cepstrum enhancement, and i-vector enhancement stages, is proposed for text-independent speaker recognition. Given that these enhancement methods are applied at different stages of the speaker recognition pipeline, it is worth exploring their complementary roles, which helps clarify the pros and cons of enhancement at each stage. To exploit the capabilities of DNN-ME as fully as possible, two methods, cascaded DNN-ME and joint input of DNNs, are studied. Weighted Gaussian mixture models (WGMMs), proposed in our previous work, are also applied to further improve the model's performance. Experiments conducted on the Speakers in the Wild (SITW) database show that DNN-ME is significantly superior to systems with only a single enhancement stage for noise-robust speaker recognition. Compared with the i-vector baseline, the equal error rate (EER) was reduced from 5.75 to 4.01.
Qi ZHANG Hiroaki SASAKI Kazushi IKEDA
Estimation of the gradient of the logarithm of a probability density function is a versatile tool in statistical data analysis. A recent method for mode-seeking clustering called least-squares log-density gradient clustering (LSLDGC) [Sasaki et al., 2014] employs a sophisticated gradient estimator, which directly estimates the log-density gradients without going through density estimation. However, the typical implementation of LSLDGC is based on a spherical Gaussian function, which may not work well when the probability density function for the data has highly correlated local structures. To cope with this problem, we propose a new estimator for log-density gradients based on Gaussian mixture models (GMMs). The covariance matrices in the GMMs enable the new estimator to capture highly correlated structures. Through applications of the new gradient estimator to mode-seeking clustering and hierarchical clustering, we experimentally demonstrate the usefulness of our clustering methods over existing methods.
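For a known GMM the log-density gradient has a closed form, grad log p(x) = sum_k r_k(x) C_k^{-1} (mu_k - x), where r_k are posterior responsibilities; following it uphill is a mode-seeking step. A minimal sketch with an assumed two-component mixture (this illustrates the role of full covariances, not the paper's estimator, which learns the gradient from data):

```python
import numpy as np

def gmm_log_density_grad(x, pis, mus, covs):
    """Analytic gradient of log p(x) for a Gaussian mixture.

    Full covariance matrices let the gradient follow highly correlated
    local structure instead of assuming spherical components.
    """
    x = np.asarray(x, dtype=float)
    log_w, dirs = [], []
    for pi, mu, C in zip(pis, mus, covs):
        Cinv = np.linalg.inv(C)
        diff = mu - x
        log_w.append(np.log(pi) - 0.5 * (diff @ Cinv @ diff)
                     - 0.5 * np.log(np.linalg.det(2 * np.pi * C)))
        dirs.append(Cinv @ diff)
    log_w = np.array(log_w)
    r = np.exp(log_w - log_w.max())
    r /= r.sum()                      # responsibilities r_k(x)
    return sum(rk * d for rk, d in zip(r, dirs))

# Gradient ascent (mode seeking) on a correlated two-component mixture
pis = [0.5, 0.5]
mus = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
covs = [np.array([[1.0, 0.8], [0.8, 1.0]])] * 2
x = np.array([1.0, 0.5])
for _ in range(200):
    x = x + 0.1 * gmm_log_density_grad(x, pis, mus, covs)
print(x)  # converges to the nearby mode near (0, 0)
```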
Hui BI Yibo JIANG Hui LI Xuan SHA Yi WANG
Ultrasound image segmentation is a crucial task in many clinical applications. However, ultrasound images are difficult to segment because of the image inhomogeneity caused by the ultrasound imaging technique. In this paper, to deal with image inhomogeneity while accounting for ultrasound image properties, a Local Rayleigh Distribution Fitting (LRDF) energy term is newly introduced into the traditional level set method. The curve evolution equation is derived for energy minimization, and a self-driven uterus contour is obtained on the ultrasound images. Experimental segmentation results on synthetic images and in-vivo ultrasound images show that the proposed approach is effective and accurate, with a Dice score coefficient (DSC) of 0.95 ± 0.02.
Ryo MASUMURA Taichi ASAMI Takanobu OBA Hirokazu MASATAKI Sumitaka SAKAUCHI Akinori ITO
This paper proposes a novel domain adaptation method that can utilize out-of-domain text resources and partially domain-matched text resources in language modeling. A major problem in domain adaptation is that it is hard to obtain adequate adaptation effects from out-of-domain text resources. To tackle this problem, our idea is to carry out model merging in a latent variable space created from latent words language models (LWLMs). The latent variables in LWLMs are represented as specific words selected from the observed word space, so LWLMs can share a common latent variable space. This enables flexible mixture modeling that takes the latent variable space into consideration. This paper presents two types of mixture modeling, i.e., LWLM mixture models and LWLM cross-mixture models. The LWLM mixture models perform latent-word-space mixture modeling to mitigate the domain mismatch problem. Furthermore, in the LWLM cross-mixture models, LMs individually constructed from partially matched text resources are split into two element models, each of which can be subjected to mixture modeling. For both approaches, this paper also describes methods to optimize the mixture weights using a validation data set. Experiments show that mixture modeling in the latent word space achieves performance improvements for both the target domain and out-of-domain data compared with mixture modeling in the observed word space.
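Optimizing linear-interpolation weights of component LMs on a validation set is commonly done with EM; a minimal sketch (the toy probabilities below are illustrative assumptions, and this shows the generic weight-optimization step, not the paper's latent-space merging itself):

```python
def optimize_mixture_weights(val_probs, iters=50):
    """EM for linear-interpolation weights of component language models.

    val_probs[n][m] is the probability that LM m assigns to validation
    word n; EM maximizes the interpolated validation likelihood
    sum_n log(sum_m lambda_m * p_m(w_n)) over the weights.
    """
    M = len(val_probs[0])
    lam = [1.0 / M] * M
    for _ in range(iters):
        counts = [0.0] * M
        for probs in val_probs:
            mix = sum(l * p for l, p in zip(lam, probs))
            for m in range(M):
                # Posterior that LM m generated this word
                counts[m] += lam[m] * probs[m] / mix
        lam = [c / len(val_probs) for c in counts]
    return lam

# Toy validation set: LM 0 matches the data better than LM 1 (assumed values)
val_probs = [(0.2, 0.01), (0.15, 0.02), (0.3, 0.05), (0.1, 0.02)]
lam = optimize_mixture_weights(val_probs)
print([round(l, 3) for l in lam])
```

Each EM iteration provably does not decrease the validation likelihood, so the weights move toward the better-matched component model.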
Yinghui ZHANG Hongjun WANG Hengxue ZHOU Ping DENG
Image boundary detection or image segmentation is an important step in image analysis. However, choosing appropriate parameters for boundary detection algorithms is necessary to achieve good boundary detection results. Image boundary detection fusion with unsupervised parameters can output a final consensus boundary, which is generally better than using unsupervised or supervised image boundary detection algorithms. In this study, we theoretically examine why image boundary detection fusion can work well and we propose a mixture model for image boundary detection fusion (MMIBDF) to achieve good consensus segmentation in an unsupervised manner. All of the segmentation algorithms are treated as new features and the segmentation results obtained by the algorithms are the values of the new features. The MMIBDF is designed to sample the boundary according to a discrete distribution. We present an inference method for MMIBDF and describe the corresponding algorithm in detail. Extensive empirical results demonstrate that MMIBDF significantly outperforms other image boundary detection fusion algorithms and the base image boundary detection algorithms according to most performance indices.
Li WANG Xiaoan TANG Junda ZHANG Dongdong GUAN
Feature visualization is of great significance in volume visualization, and feature extraction has become increasingly popular in feature visualization. However, a precise definition of the features is usually absent, which makes extraction difficult. This paper employs the probability density function (PDF) as a statistical property and proposes a statistical-property-guided approach to extract features from volume data. Based on feature matching, it combines simple linear iterative clustering (SLIC) with a Gaussian mixture model (GMM) and can perform extraction without an accurate feature definition. Further, the GMM is paired with a normality test to reduce time cost and storage requirements. We demonstrate the applicability and superiority of the approach by successfully applying it to homogeneous and non-homogeneous features.
Xueting WANG Kensho HARA Yu ENOKIBORI Takatsugu HIRAYAMA Kenji MASE
Multi-camera videos with abundant information and high flexibility are useful in a wide range of applications, such as surveillance systems, web lectures, news broadcasting, concerts, and sports viewing. Viewers can enjoy an enhanced viewing experience by choosing their own viewpoint through viewing interfaces. However, some viewers may feel annoyed by the need for continual manual viewpoint selection, especially when the number of selectable viewpoints is relatively large. To solve this issue, we propose an automatic viewpoint navigation method designed especially for sports. This method focuses on a viewer's personal preference in viewpoint selection, instead of common and professional editing rules. We assume that different trajectory distributions of the viewed objects cause differences in viewpoint selection according to personal preference. We learn the relationship between the viewer's personal viewpoint-selection tendency and the spatio-temporal game context represented by the objects' trajectories. We compare three methods, based on a Gaussian mixture model, an SVM with a general histogram, and an SVM with a bag-of-words representation, to seek the best learning scheme for this relationship. The performance of the proposed methods is evaluated by assessing the degree of similarity between the selected viewpoints and the viewers' edited records.
In statistical approaches such as statistical static timing analysis, the distribution of the maximum of plural distributions is computed by repeating the maximum operation on two distributions. Moreover, since each distribution is represented by a linear combination of several explanatory random variables so as to handle correlations efficiently, the sensitivity of the maximum of two distributions to each explanatory random variable, that is, the covariance between the maximum and an explanatory random variable, must be calculated in every maximum operation. Since the distribution of the maximum of two Gaussian distributions is not Gaussian, a Gaussian mixture model is used to represent a distribution. However, if Gaussian mixture models are used, it is not always possible to make both the variance and the covariance of the maximum correct simultaneously. We propose a new algorithm that determines the covariance without deteriorating the accuracy of the variance of the maximum, and we show experimental results evaluating its performance.
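The first two moments of the maximum of two jointly Gaussian variables have classical closed forms (Clark's formulas); a sketch of the moment computation that underlies such maximum operations (this illustrates the standard formulas, not the paper's proposed covariance algorithm):

```python
import math

def max_of_two_gaussians(mu1, var1, mu2, var2, cov):
    """Clark's moment formulas for Z = max(X, Y), (X, Y) jointly Gaussian.

    Returns (mean, variance) of Z. Z itself is not Gaussian, so matching
    these moments (or using a Gaussian mixture) is an approximation.
    """
    a = math.sqrt(var1 + var2 - 2.0 * cov)
    alpha = (mu1 - mu2) / a
    Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))  # normal CDF
    phi = lambda t: math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    m = mu1 * Phi(alpha) + mu2 * Phi(-alpha) + a * phi(alpha)
    m2 = ((mu1**2 + var1) * Phi(alpha) + (mu2**2 + var2) * Phi(-alpha)
          + (mu1 + mu2) * a * phi(alpha))
    return m, m2 - m * m

mean, var = max_of_two_gaussians(0.0, 1.0, 0.5, 2.0, 0.3)
print(round(mean, 3), round(var, 3))
```

In the symmetric independent case (both standard normal) these formulas give the known values E[Z] = 1/sqrt(pi) and Var[Z] = 1 - 1/pi.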
Somchai PHATTHANACHUANCHOM Rawesak TANAWONGSUWAN
Color transfer is a simple process for changing the color tone of one image (the source) to look like that of another image (the target). In transferring colors between images, several issues need to be considered, including partial color transfer, trial-and-error adjustment, and multiple-target color transfer. Our approach enables users to transfer colors partially and locally by letting them select regions of interest from an image segmentation. Since there are many ways to transfer colors from a set of target regions to a set of source regions, we introduce a region exploration and navigation approach in which users choose their preferred color tones, transfer them one region at a time, and gradually customize the result toward the desired output. The preferred color tones sometimes come from more than one image; our method is therefore extended to allow users to select preferred color tones from multiple images. Our experimental results show the flexibility of our approach in generating reasonable segmented regions of interest and in enabling users to explore the possible results more conveniently.
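The basic per-region operation can be sketched as Reinhard-style statistics matching, shifting the source pixels' per-channel mean and standard deviation to those of the target region (a generic sketch under assumed synthetic pixel data; the paper's region navigation builds on such transfers rather than being defined by this one):

```python
import numpy as np

def transfer_color_stats(source, target):
    """Per-channel mean/std matching between two pixel sets.

    source, target: float arrays shaped (N, 3), e.g. pixels of a source
    region and a target region in some color space.
    """
    s_mu, s_sd = source.mean(axis=0), source.std(axis=0) + 1e-12
    t_mu, t_sd = target.mean(axis=0), target.std(axis=0)
    # Normalize the source statistics, then impose the target statistics
    return (source - s_mu) / s_sd * t_sd + t_mu

# Synthetic "regions": two clouds of pixels with different color statistics
rng = np.random.default_rng(5)
source = rng.normal([0.2, 0.5, 0.7], 0.05, size=(1000, 3))
target = rng.normal([0.6, 0.3, 0.2], 0.10, size=(1000, 3))
out = transfer_color_stats(source, target)
print(out.mean(axis=0).round(2), out.std(axis=0).round(2))
```

After the transfer, the output region has the target region's mean and standard deviation per channel while keeping the source region's relative pixel structure.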