Contamination of water resources with pathogenic microorganisms excreted in human feces is a worldwide public health concern. Surveillance of fecal contamination is commonly performed by routine monitoring for a single type or a few types of microorganism(s). To design a feasible routine for periodic monitoring and to control risks of exposure to pathogens, reliable statistical algorithms for inferring correlations between concentrations of microorganisms in water need to be established. Moreover, because pathogens are often present in low concentrations, some contaminations are likely to be under a detection limit. This yields a pairwise left-censored dataset and complicates computation of correlation coefficients. Errors of correlation estimation can be smaller if undetected values are imputed better. To obtain better imputations, we utilize side information and develop a new technique, the asymmetric Tobit model which is an extension of the Tobit model so that domain knowledge can be exploited effectively when fitting the model to a censored dataset. The empirical results demonstrate that imputation with domain knowledge is effective for this task.
Kazuki SESHIMO Akira OTA Daichi NISHIO Satoshi YAMANE
In recent years, the use of big data has attracted more attention, and many techniques for data analysis have been proposed. Big data analysis is difficult, however, because such data varies greatly in its regularity. Heterogeneous mixture machine learning is one algorithm for analyzing such data efficiently. In this study, we propose online heterogeneous learning based on an online EM algorithm. Experiments show that this algorithm has higher learning accuracy than that of a conventional method and is practical. The online learning approach will make this algorithm useful in the field of data analysis.
Muneki YASUDA Junpei WATANABE Shun KATAOKA Kazuyuki TANAKA
In this paper, we consider Bayesian image denoising based on a Gaussian Markov random field (GMRF) model, for which we propose an new algorithm. Our method can solve Bayesian image denoising problems, including hyperparameter estimation, in O(n)-time, where n is the number of pixels in a given image. From the perspective of the order of the computational time, this is a state-of-the-art algorithm for the present problem setting. Moreover, the results of our numerical experiments we show our method is in fact effective in practice.
Rachelle RIVERO Richard LEMENCE Tsuyoshi KATO
With the huge influx of various data nowadays, extracting knowledge from them has become an interesting but tedious task among data scientists, particularly when the data come in heterogeneous form and have missing information. Many data completion techniques had been introduced, especially in the advent of kernel methods — a way in which one can represent heterogeneous data sets into a single form: as kernel matrices. However, among the many data completion techniques available in the literature, studies about mutually completing several incomplete kernel matrices have not been given much attention yet. In this paper, we present a new method, called Mutual Kernel Matrix Completion (MKMC) algorithm, that tackles this problem of mutually inferring the missing entries of multiple kernel matrices by combining the notions of data fusion and kernel matrix completion, applied on biological data sets to be used for classification task. We first introduced an objective function that will be minimized by exploiting the EM algorithm, which in turn results to an estimate of the missing entries of the kernel matrices involved. The completed kernel matrices are then combined to produce a model matrix that can be used to further improve the obtained estimates. An interesting result of our study is that the E-step and the M-step are given in closed form, which makes our algorithm efficient in terms of time and memory. After completion, the (completed) kernel matrices are then used to train an SVM classifier to test how well the relationships among the entries are preserved. Our empirical results show that the proposed algorithm bested the traditional completion techniques in preserving the relationships among the data points, and in accurately recovering the missing kernel matrix entries. By far, MKMC offers a promising solution to the problem of mutual estimation of a number of relevant incomplete kernel matrices.
Yin ZHU Fanman MENG Jian XIONG Guan GUI
Multiple image group cosegmentation (MGC) aims at segmenting common object from multiple group of images, which is a new cosegmentation research topic. The existing MGC methods formulate MGC as label assignment problem (Markov Random Field framework), which is observed to be sensitive to parameter setting. Meanwhile, it is also observed that large object variations and complicated backgrounds dramatically decrease the existing MGC performance. To this end, we propose a new object proposal based MGC model, with the aim of avoiding tedious parameter setting, and improving MGC performance. Our main idea is to formulate MGC as new region proposal selection task. A new energy function in term of proposal is proposed. Two aspects such as the foreground consistency within each single image group, and the group consistency among image groups are considered. The energy minimization method is designed in EM framework. Two steps such as the loop belief propagation and foreground propagation are iteratively implemented for the minimization. We verify our method on ICoseg dataset. Six existing cosegmentation methods are used for the comparison. The experimental results demonstrate that the proposed method can not only improve MGC performance in terms of larger IOU values, but is also robust to the parameter setting.
Hirokazu ABE Masahiro FUJII Takanori IWAMATSU Hiroyuki HATANO Atsushi ITO Yu WATANABE
It is necessary to estimate channel state information coherently to equalize the received signal in wireless communication systems. The pilot symbol, known at the receiver, aided channel estimator degrades the transmission efficiency because it requires the signal spaces and the energy for the transmission. In this paper, we assume a fixed wireless communication system in line of sight slowly varying channel and propose a new blind channel estimation method without help from the pilot symbol for Orthogonal Frequency Division Multiplexing systems. The proposed estimator makes use of the Expectation-Maximization algorithm and the correlation property among the channel frequency responses by considering the assumed channel environment. By computer simulations, we show that the proposed estimator can asymptotically achieve bit error rate performance by using the ideal channel estimation.
Yousuke NARUSE Jun-ichi TAKADA
We introduce a MIMO channel estimation method that exploits the channel's spatiotemporal correlation without the aid of a priori channel statistical information. A simplified Gauss-Markov model that has fewer parameters to be estimated is presented for the Kalman filter. In order to obtain statistical parameters on the time evolution of the channel, considering that the time evolution is a latent statistical variable, the expectation-maximization (EM) algorithm is applied for accurate estimation. Numerical simulations reveal that the proposed method is able to enhance estimation capability by exploiting spatiotemporal correlations, and the method works well even if the forgetting factor is small.
Kazushi MURAOKA Kazuhiko FUKAWA Hiroshi SUZUKI Satoshi SUYAMA
Joint signal detection and channel estimation based on the expectation-maximization (EM) algorithm has been investigated for multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) mobile communications over fast-fading channels. The previous work in [20] developed a channel estimation method suitable for the EM-based iterative receiver. However, it remained possible for unreliable received signals to be repetitively used during the iterative process. In order to improve the EM-based iterative receiver further, this paper proposes spatial removal from the perspective of a message-passing algorithm on factor graphs. The spatial removal performs the channel estimation of a targeted antenna by using detected signals that are obtained from the received signals of all antennas other than the targeted antenna. It can avoid the repetitive use of unreliable received signals for consecutive signal detection and channel estimation. Appropriate applications of the spatial removal are also discussed to exploit both the removal effect and the spatial diversity. Computer simulations under fast-fading conditions demonstrate that the appropriate applications of the spatial removal can improve the packet error rate (PER) of the EM-based receiver thanks to both the removal effect and the spatial diversity.
Hidenori WATANABE Shogo MURAMATSU
This work proposes an exponential computation with low-computational complexity and applies this technique to the expectation-maximization (EM) algorithm for Gaussian mixture model (GMM). For certain machine-learning techniques, such as the EM algorithm for the GMM, fast and low-cost implementations are preferred over high precision ones. Since the exponential function is frequently used in machine-learning algorithms, this work proposes reducing computational complexity by transforming the function into powers of two and introducing a look-up table. Moreover, to improve efficiency the look-up table is scaled. To verify the validity of the proposed technique, this work obtains simulation results for the EM algorithm used for parameter estimation and evaluates the performances of the results in terms of the mean absolute error and computational time. This work compares our proposed method against the Taylor expansion and the exp( ) function in a standard C library, and shows that the computational time of the EM algorithm is reduced while maintaining comparable precision in the estimation results.
Kazushi MURAOKA Kazuhiko FUKAWA Hiroshi SUZUKI Satoshi SUYAMA
This paper proposes an iterative maximum a posteriori (MAP) receiver for orthogonal frequency division multiplexing (OFDM) mobile communications under fast-fading conditions. The previous work in [21] developed a MAP receiver based on the expectation-maximization (EM) algorithm employing the differential model, which can allow correlated time-variation of channel impulse responses. In order to make such a MAP receiver more robust against time-variant channels, this paper proposes two new message-passing algorithms derived from factor graphs; subcarrier removal and partial turbo processing. The subcarrier removal estimates the channel frequency response by using all subcarriers other than the targeted subcarrier. Such channel estimate can be efficiently calculated by removing information on the targeted subcarrier from the estimate of the original EM algorithm that uses all the subcarriers. This modification can avoid the repetitive use of incorrectly detected signals for the channel estimation. On the other hand, the partial turbo processing performs symbol-by-symbol channel decoding by using a symbol interleaver. Owing to this process, the current channel estimate, which is more accurate due to the decoding gain, can be used as the initial channel estimate for the next symbol. Computer simulations under fast multipath fading conditions demonstrate that the subcarrier removal and the partial turbo processing can improve the error floor and the convergence speed, respectively, compared to the conventional MAP receiver.
Xiao XIAO Hiroyuki OKAMURA Tadashi DOHI
Non-homogeneous Poisson processes (NHPPs) have gained much popularity in actual software testing phases to estimate the software reliability, the number of remaining faults in software and the software release timing. In this paper, we propose a new modeling approach for the NHPP-based software reliability models (SRMs) to describe the stochastic behavior of software fault-detection processes. The fundamental idea is to apply the equilibrium distribution to the fault-detection time distribution in NHPP-based modeling. We also develop efficient parameter estimation procedures for the proposed NHPP-based SRMs. Through numerical experiments, it can be concluded that the proposed NHPP-based SRMs outperform the existing ones in many data sets from the perspective of goodness-of-fit and prediction performance.
Kazushi MURAOKA Kazuhiko FUKAWA Hiroshi SUZUKI Satoshi SUYAMA
This paper proposes a new approach for the joint processing of signal detection and channel estimation based on the expectation-maximization (EM) algorithm in orthogonal frequency division multiplexing (OFDM) mobile communications. Conventional schemes based on the EM algorithm estimate a channel impulse response using Kalman filter, and employ the random walk model or the first-order autoregressive (AR) model to derive the process equation for the filter. Since these models assume that the time-variation of the impulse response is white noise without considering any autocorrelation property, the accuracy of the channel estimation deteriorates under fast-fading conditions, resulting in an increased packet error rate (PER). To improve the accuracy of the estimation of fast-fading channels, the proposed scheme employs a differential model that allows the correlated time-variation to be considered by introducing the first- and higher-order time differentials of the channel impulse response. In addition, this paper derives a forward recursive form of the channel estimation along both the frequency and time axes in order to reduce the computational complexity. Computer simulations of channels under fast multipath fading conditions demonstrate that the proposed method is superior in PER to the conventional schemes that employ the random walk model.
Makoto YAMADA Masashi SUGIYAMA
The ratio of two probability densities is called the importance and its estimation has gathered a great deal of attention these days since the importance can be used for various data processing purposes. In this paper, we propose a new importance estimation method using Gaussian mixture models (GMMs). Our method is an extention of the Kullback-Leibler importance estimation procedure (KLIEP), an importance estimation method using linear or kernel models. An advantage of GMMs is that covariance matrices can also be learned through an expectation-maximization procedure, so the proposed method--which we call the Gaussian mixture KLIEP (GM-KLIEP)--is expected to work well when the true importance function has high correlation. Through experiments, we show the validity of the proposed approach.
Masahiro FUJII Atsushi MINAKAWA Yu WATANABE Makoto ITAMI Kohji ITOH
In this paper, we propose a new algorithm to detect the presence of narrow band interference signals on the band of an Ultra Wide-Band (UWB) system when the UWB spectrum overlaps the bands of other narrow band wireless services. In our proposed algorithm for an UWB Multi-Band Orthogonal Frequency Division Multiplexing (MB-OFDM) system, an appropriate model is selected from the assumed interference models based on the Akaike Information Criterion (AIC) which is an explicit theoretic criterion and a measure of fit of the model. The proposed algorithm does not need a priori information on the interference signals except that we can reduce a computational complexity to implement the algorithm if we have knowledge of the bands of the interference signals. Furthermore, we introduce the Expectation Maximization (EM) algorithm to our algorithm in order to estimate the transmitted signals and the interference signals simultaneously. The proposed algorithm may not require the pilot symbols in the assumed UWB system to detect the presence of other systems. By computer simulations, we show that the proposed algorithm validly detects the presence of interference signals on the UWB band.
Feng YANG Yu ZHANG Jian SONG Changyong PAN Zhixing YANG
Based on the expectation-maximization (EM) algorithm, an iterative time-domain channel estimation approach capable of using a priori information is proposed for orthogonal frequency division multiplexing (OFDM) systems in this letter: it outperforms its noniterative counterpart in terms of estimation accuracy as well as bit error rate (BER) performance. Numerical simulations demonstrate that an SNR gain of 1 dB at BER=10-4 with only one iteration and estimation mean square error (MSE) which nearly coincides with the Cramer-Rao bound (CRB) in the low SNR region can be obtained, thanks to the efficient use of a priori information.
Mousa SHAMSI Reza Aghaiezadeh ZOROOFI Caro LUCAS Mohammad Sadeghi HASANABADI Mohammad Reza ALSHARIF
Facial skin detection is an important step in facial surgical planning like as many other applications. There are many problems in facial skin detection. One of them is that the image features can be severely corrupted due to illumination, noise, and occlusion, where, shadows can cause numerous strong edges. Hence, in this paper, we present an automatic Expectation-Maximization (EM) algorithm for facial skin color segmentation that uses knowledge of chromatic space and varying illumination conditions to correct and segment frontal and lateral facial color images, simultaneously. The proposed EM algorithm leads to a method that allows for more robust and accurate segmentation of facial images. The initialization of the model parameters is very important in convergence of algorithm. For this purpose, we use a method for robust parameter estimation of Gaussian mixture components. Also, we use an additional class, which includes all pixels not modeled explicitly by Gaussian with small variance, by a uniform probability density, and amending the EM algorithm appropriately, in order to obtain significantly better results. Experimental results on facial color images show that accurate estimates of the Gaussian mixture parameters are computed. Also, other results on images presenting a wide range of variations in lighting conditions, demonstrate the efficiency of the proposed color skin segmentation algorithm compared to conventional EM algorithm.
Learning for boltzmann machines deals with each state individually. If given data is categorized, the probabilities have to be distributed to each state, not to each catetory. We propose boltzmann machines identifying the states in the same categories. Boltzmann machines with hidden units are the special cases. Boltzmann learning and em algorithm are effective learning methods for boltzmann machines. We solve boltzmann learning and em algorithm for the proposed models.
Yasuhiro SUZUKI Hiroya TAKAMURA Manabu OKUMURA
In this paper, we present a method to automatically acquire a large-scale vocabulary of evaluative expressions from a large corpus of blogs. For the purpose, this paper presents a semi-supervised method for classifying evaluative expressions, that is, tuples of subjects, their attributes, and evaluative words, that indicate either favorable or unfavorable opinions towards a specific subject. Due to its characteristics, our semi-supervised method can classify evaluative expressions in a corpus by their polarities, starting from a very small set of seed training examples and using contextual information in the sentences the expressions belong to. Our experimental results with real Weblog data as our corpus show that this bootstrapping approach can improve the accuracy of methods for classifying favorable and unfavorable opinions. We also show that a reasonable amount of evaluative expressions can be really acquired.
Thatsanee CHAROENPORN Canasai KRUENGKRAI Thanaruk THEERAMUNKONG Virach SORNLERTLAMVANICH
Manually collecting contexts of a target word and grouping them based on their meanings yields a set of word senses but the task is quite tedious. Towards automated lexicography, this paper proposes a word-sense discrimination method based on two modern techniques; EM algorithm and principal component analysis (PCA). The spherical Gaussian EM algorithm enhanced with PCA for robust initialization is proposed to cluster word senses of a target word automatically. Three variants of the algorithm, namely PCA, sGEM, and PCA-sGEM, are investigated using a gold standard dataset of two polysemous words. The clustering result is evaluated using the measures of purity and entropy as well as a more recent measure called normalized mutual information (NMI). The experimental result indicates that the proposed algorithms gain promising performance with regard to discriminate word senses and the PCA-sGEM outperforms the other two methods to some extent.
Tsuyoshi KASHIMA Kazuhiko FUKAWA Hiroshi SUZUKI
This paper proposes an iterative maximum a posteriori probability (MAP) receiver for multiple-input-multiple-output (MIMO) and orthogonal frequency-division multiplexing (OFDM) mobile communications. For exploiting the space, time, and frequency diversity, the low-density parity-check code (LDPC) is used as a channel coding with a built-in interleaver. The receiver employs the expectation maximization (EM) algorithm so as to perform the MAP symbol detection with reasonable computational complexity. The minimum mean square error (MMSE), recursive least squares (RLS), and least mean square (LMS) algorithms are theoretically derived for the channel estimation within this framework. Furthermore, the proposed receiver performs a new scheme called backward symbol detection (BSD), in which the signal detection uses the channel impulse response that is estimated one OFDM symbol later. The advantage of BSD, which is explained from the viewpoint of the message passing algorithm, is that BSD can exploit information on the both precedent and subsequent OFDM symbols, similarly to RLS with smoothing and removing (SR-RLS) [25]. In comparison with SR-RLS, BSD reduces the complexity at the cost of packet error rate (PER) performance. Computer simulations show that the receiver employing RLS for the channel estimation outperforms the ones employing MMSE or LMS, and that BSD can improve the PER performance of the ones employing RLS or LMS.