We consider the problem of finding the best subset of sensors in wireless sensor networks where linear Bayesian parameter estimation is conducted from the selected measurements corrupted by correlated noise. We aim to directly minimize the estimation error which is manipulated by using the QR and LU factorizations. We derive an analytic result which expedites the sensor selection in a greedy manner. We also provide the complexity of the proposed algorithm in comparison with previous selection methods. We evaluate the performance through numerical experiments using random measurements under correlated noise and demonstrate a competitive estimation accuracy of the proposed algorithm with a reasonable increase in complexity as compared with the previous selection methods.
Hiroki TANJI Takahiro MURAKAMI
The design and adjustment of the divergence in audio applications using nonnegative matrix factorization (NMF) is still an open problem. In this study, to deal with this problem, we explore a representation of the divergence using neural networks (NNs). Instead of representing the divergence directly, our approach uses NNs to extend the multiplicative update algorithm (MUA), which estimates the NMF parameters. The extended MUA incorporates NNs, and the new algorithm is referred to as the deep MUA (DeMUA) for NMF. While the DeMUA represents the algorithm for NMF, interestingly, the divergence is obtained from the incorporated NN. In addition, we propose theoretical guidelines for designing the incorporated NN such that it can be interpreted as a divergence. By appropriately designing the NN, MUAs based on existing divergences with a single hyper-parameter can be represented by the DeMUA. To train the DeMUA, we applied it to audio denoising and supervised signal separation. Our experimental results show that the proposed architecture can learn the MUA and the divergences in sparse denoising and speech separation tasks and that the MUA based on generalized divergences with multiple parameters shows favorable performance on these tasks.
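For orientation, the classical multiplicative update algorithm that the DeMUA extends can be sketched as follows. This is only a minimal illustration assuming the squared Euclidean distance as the divergence (the standard Lee-Seung updates), not the DeMUA itself; the helper names `matmul`, `transpose`, `frob_error`, `mu_step`, and `nmf` are hypothetical and chosen for this sketch.

```python
import random

def matmul(a, b):
    # Plain list-of-lists matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frob_error(v, w, h):
    # Squared Frobenius norm of V - WH.
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def mu_step(v, w, h, eps=1e-12):
    # H <- H * (W^T V) / (W^T W H), element-wise.
    wt = transpose(w)
    num, den = matmul(wt, v), matmul(matmul(wt, w), h)
    h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(len(h[0]))]
         for k in range(len(h))]
    # W <- W * (V H^T) / (W H H^T), element-wise, using the updated H.
    ht = transpose(h)
    num, den = matmul(v, ht), matmul(w, matmul(h, ht))
    w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(len(w[0]))]
         for i in range(len(w))]
    return w, h

def nmf(v, rank, n_iter=200, seed=0):
    # Strictly positive random initialization keeps every factor nonnegative.
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(len(v))]
    h = [[rng.random() + 0.1 for _ in range(len(v[0]))] for _ in range(rank)]
    errors = [frob_error(v, w, h)]
    for _ in range(n_iter):
        w, h = mu_step(v, w, h)
        errors.append(frob_error(v, w, h))
    return w, h, errors

# Small nonnegative matrix of rank 2 (assumed toy data).
V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 1.0, 1.0], [3.0, 5.0, 7.0]]
W, H, errors = nmf(V, rank=2)
```

Because each update only multiplies by nonnegative ratios, the factors stay nonnegative without any projection step, which is the property that makes the multiplicative form attractive as a template for learned variants.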
Stance prediction on social media aims to infer the stances of users towards a specific topic or event, which are not expressed explicitly. Extracting and determining users' stances from user-generated content on social media is of great significance for public opinion analysis. Existing research makes use of various signals, ranging from text content to the online network connections of users on these platforms. However, joint modeling of this heterogeneous information for stance prediction is lacking. In this paper, we propose a self-supervised heterogeneous graph contrastive learning framework for stance prediction in online debate forums. First, we perform data augmentation on the original heterogeneous information network to generate an augmented view. The original view and the augmented view are each learned with a meta-path based graph encoder. Then, contrastive learning between the two views is conducted to obtain high-quality representations of users and issues. Finally, stance prediction is accomplished by matrix factorization between users and issues. The experimental results on an online debate forum dataset show that our model significantly outperforms other competitive baseline methods.
Hitoshi SUDA Gaku KOTANI Daisuke SAITO
In this paper, we propose a new training framework named the INmfCA algorithm for nonparallel voice conversion (VC) systems. To train conversion models, traditional VC frameworks require parallel corpora, in which source and target speakers utter the same linguistic contents. Although the frameworks have achieved high-quality VC, they are not applicable in situations where parallel corpora are unavailable. To acquire conversion models without parallel corpora, nonparallel methods are widely studied. Although the frameworks achieve VC under nonparallel conditions, they tend to require huge background knowledge or many training utterances. This is because of difficulty in disentangling linguistic and speaker information without a large amount of data. In this work, we tackle this problem by exploiting NMF, which can factorize acoustic features into time-variant and time-invariant components in an unsupervised manner. The method acquires alignment between the acoustic features of a source speaker's utterances and a target dictionary and uses the obtained alignment as activation of NMF to train the source speaker's dictionary without parallel corpora. The acquisition method is based on the INCA algorithm, which obtains the alignment of nonparallel corpora. In contrast to the INCA algorithm, the alignment is not restricted to observed samples, and thus the proposed method can efficiently utilize small nonparallel corpora. The results of subjective experiments show that the combination of the proposed algorithm and the INCA algorithm outperformed not only an INCA-based nonparallel framework but also CycleGAN-VC, which performs nonparallel VC without any additional training data. The results also indicate that a one-shot VC framework, which does not need to train source speakers, can be constructed on the basis of the proposed method.
In this study, we aim to improve the performance of audio source separation for monaural mixture signals. For monaural audio source separation, semisupervised nonnegative matrix factorization (SNMF) can achieve higher separation performance by employing small supervised signals. In particular, penalized SNMF (PSNMF) with orthogonality penalty is an effective method. PSNMF forces two basis matrices for target and nontarget sources to be orthogonal to each other and improves the separation accuracy. However, the conventional orthogonality penalty is based on an inner product and does not affect the estimation of the basis matrix properly because of the scale indeterminacy between the basis and activation matrices in NMF. To cope with this problem, a new PSNMF with cosine similarity between the basis matrices is proposed. The experimental comparison shows the efficacy of the proposed cosine similarity penalty in supervised audio source separation.
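The scale issue described above can be illustrated with a small sketch. It assumes the inner-product penalty is the sum of squared inner products between target and nontarget basis vectors, and that the cosine penalty normalizes each inner product by the vector norms; the exact penalty forms in the paper may differ, and the function names and toy vectors are hypothetical.

```python
import math

def dot(f, g):
    return sum(x * y for x, y in zip(f, g))

def inner_penalty(target_bases, other_bases):
    # Sum of squared inner products over all pairs of basis vectors.
    return sum(dot(f, g) ** 2 for f in target_bases for g in other_bases)

def cosine_penalty(target_bases, other_bases):
    # Same sum, but each inner product is normalized by the vector norms.
    return sum((dot(f, g) / (math.sqrt(dot(f, f)) * math.sqrt(dot(g, g)))) ** 2
               for f in target_bases for g in other_bases)

# Toy basis vectors (each inner list is one basis vector).
F = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
G = [[1.0, 1.0, 0.0]]

# Rescale the target bases; in NMF the activations can absorb the inverse scale,
# so the reconstruction WH is unchanged by this operation.
F_scaled = [[10.0 * x for x in f] for f in F]

p_inner, p_inner_scaled = inner_penalty(F, G), inner_penalty(F_scaled, G)
p_cos, p_cos_scaled = cosine_penalty(F, G), cosine_penalty(F_scaled, G)
```

In this sketch the inner-product penalty grows by a factor of 100 under the rescaling even though the directions of the basis vectors are unchanged, whereas the cosine penalty is identical before and after, so it cannot be reduced merely by shrinking the bases.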
Computing the Lempel-Ziv factorization (LZ77) of a string is one of the most important problems in computer science. It is widely used in applications such as data compression, text indexing, and pattern discovery, and has become the heart of many file compressors such as gzip and 7zip. In this paper, we present a linear-time algorithm called Xone for computing the LZ77 factorization, which has the same space requirement as BGone, the previous most space-efficient linear-time LZ77 factorization algorithm. Xone greatly improves on the efficiency of BGone. Experiments show that the two versions of Xone, XoneT and XoneSA, are about 27% and 31% faster than BGoneT and BGoneSA, respectively.
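To make the object being computed concrete, the following is a naive quadratic-time sketch of one common definition of the LZ77 factorization: each factor is either a character that has not occurred before, or the longest prefix of the remaining text that also starts at an earlier position (overlaps allowed). It is not Xone or BGone, which achieve linear time; the function name `lz77_factorize` is hypothetical.

```python
def lz77_factorize(s):
    """Return the list of LZ77 factors of s (naive, for illustration only)."""
    factors = []
    i, n = 0, len(s)
    while i < n:
        best_len = 0
        # Try every earlier starting position j and extend the match greedily.
        for j in range(i):
            length = 0
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            best_len = max(best_len, length)
        if best_len == 0:
            # First occurrence of this character: emit it as a literal factor.
            factors.append(s[i])
            i += 1
        else:
            factors.append(s[i:i + best_len])
            i += best_len
    return factors

factors_a = lz77_factorize("abababc")
factors_b = lz77_factorize("aaaa")
```

The concatenation of the factors always reproduces the input, and a self-overlapping match such as the one in `"aaaa"` shows why a factor may be longer than the text that precedes it.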
Xueqing ZHANG Xiaoxia LIU Jun GUO Wenlei BAI Daguang GAN
As scientific and technological resources are experiencing information overload, it is quite expensive to find exactly the resources that users are interested in. A personalized recommendation system is a good candidate for solving this problem, but data sparseness and the cold start problem still hinder the application of recommendation systems. Sparse data affects the quality of the similarity measurement and consequently the quality of the recommender system. In this paper, we propose a matrix factorization recommendation algorithm based on similarity calculation (SCMF), which introduces potential similarity relationships to solve the problem of data sparseness. Furthermore, a penalty factor is adopted in the latent item similarity matrix calculation to capture more of the real relationships. We compared our approach with six other recommendation algorithms and conducted experiments on five public data sets. According to the experimental results, the recommendation precision improves by 2% to 9% over the best traditional algorithm. For sparse data sets, the prediction accuracy also improves by 0.17% to 18%. In addition, our approach was applied to patent resource exploitation provided by the Wanfang patents retrieval system. Experimental results show that our method performs better than commonly used algorithms, especially under the cold start condition.
Akihito AIBA Minoru YOSHIDA Daichi KITAMURA Shinnosuke TAKAMICHI Hiroshi SARUWATARI
We studied an acoustic anomaly detection system for equipment, in which an outlier detection method based on recorded sounds is used. In a real environment, the SNR of the target sound against background noise is low, so it is necessary to catch slight changes in sound buried in noise. In this paper, we propose a system in which a sound source extraction process is provided as a preliminary stage of the outlier detection process. In the proposed system, nonnegative matrix factorization based on the generalized Gaussian distribution (GGD-NMF) is used as the sound source extraction process. We evaluated the improvement of the anomaly detection performance in a low-SNR environment. In this experiment, the SNR at which an anomaly can be detected was greatly improved by using GGD-NMF for preprocessing.
Fully homomorphic encryption (FHE) would be an important cryptosystem as a basic scheme for cloud computing. Since Gentry discovered the first fully homomorphic encryption scheme in 2009, several fully homomorphic encryption schemes have been proposed. In the systems proposed so far, the bootstrapping process is the main bottleneck, and computing the ciphertext requires high complexity. In 2011, Zvika Brakerski et al. proposed a leveled FHE without bootstrapping. However, circuits of arbitrary level cannot be evaluated in their scheme, whereas in our scheme circuits of any level can be evaluated. The existence of an efficient fully homomorphic cryptosystem would have great practical implications for the outsourcing of private computations, for instance, in the field of cloud computing. In this paper, we propose an IND-CCA1-secure FHE based on the difficulty of prime factorization that does not need bootstrapping, and we believe that our scheme is more efficient than the previous schemes. In particular, the computational overhead for homomorphic evaluation is O(1).
Mengce ZHENG Noboru KUNIHIRO Honggang HU
We address the security issue of RSA with implicitly related keys in this paper. Informally, we investigate under what conditions it is possible to factorize RSA moduli in polynomial time given an implicit relation between the related private keys, namely that certain portions of their bit patterns are the same. We formulate concrete attack scenarios and propose lattice-based cryptanalysis using lattice reduction algorithms. A subtle lattice technique is adapted to represent an unknown private key with the help of the known implicit relation. We analyze a simple case in which two RSA instances are given with a known amount of shared most significant bits (MSBs) and least significant bits (LSBs) of the private keys. We further extend this to a generic lattice-based attack given more RSA instances with implicitly related keys. Our theoretical results indicate that RSA with implicitly related keys is more insecure and that better asymptotic results can be achieved as the number of RSA instances increases. Furthermore, we conduct numerical experiments to verify the validity of the proposed attacks.
Kyohei ATARASHI Satoshi OYAMA Masahito KURIHARA
Link prediction, the computational problem of determining whether there is a link between two objects, is important in machine learning and data mining. Feature-based link prediction, in which the feature vectors of the two objects are given, is of particular interest because it can also be used for various identification-related problems. Although the factorization machine and the higher-order factorization machine (HOFM) are widely used for feature-based link prediction, they use feature combinations not only across the two objects but also from the same object. Feature combinations from the same object are irrelevant to major link prediction problems such as predicting identity, and using them increases computational cost and degrades accuracy. In this paper, we present novel models that use higher-order feature combinations only across the two objects. Since there were no algorithms for efficiently computing higher-order feature combinations only across two objects, we derive one by leveraging reported and newly obtained results on calculating the ANOVA kernel. We present an efficient coordinate descent algorithm for the proposed models. We also improve the effectiveness of the existing one for the HOFM. Furthermore, we extend the proposed models to a deep neural network. Experimental results demonstrate the effectiveness of our proposed models.
Tomoaki MIMOTO Seira HIDANO Shinsaku KIYOMOTO Atsuko MIYAJI
Time-sequence data is high dimensional and contains a lot of information, which can be utilized in various fields, such as insurance, finance, and advertising. Personal data including time-sequence data is converted to anonymized datasets, which need to strike a balance between privacy and utility. In this paper, we consider low-rank matrix factorization as an anonymization method and evaluate its efficiency. We convert time-sequence datasets to matrices and evaluate both privacy and utility. The record IDs in time-sequence data are changed at regular intervals to reduce re-identification risk. However, since individuals tend to behave in a similar fashion over periods of time, there remains a risk of record linkage even if record IDs are different. Hence, we evaluate the re-identification and linkage risks as privacy risks of time-sequence data. Our experimental results show that matrix factorization is a viable anonymization method and that it can achieve better utility than existing anonymization methods.
We consider a property about a result of non-negative matrix factorization under a parallel moving of data points. The shape of a cloud of original data points and that of data points moving parallel to a vector are identical. Thus it is sometimes required that the coefficients to basis vectors of both data points are also identical from the viewpoint of classification. We show a necessary and sufficient condition for such an invariance property under a translation of the data points.
Chihiro WATANABE Kaoru HIRAMATSU Kunio KASHINO
Interpretability has become an important issue in the machine learning field, along with the success of layered neural networks in various practical tasks. Since a trained layered neural network consists of a complex nonlinear relationship among a large number of parameters, it has been difficult to understand how such networks achieve input-output mappings for a given data set. In this paper, we propose the non-negative task matrix decomposition method, which applies non-negative matrix factorization to a trained layered neural network. This enables us to decompose the inference mechanism of a trained layered neural network into multiple principal tasks of input-output mapping and to reveal the roles of hidden units in terms of their contribution to each principal task.
Junjie SUN Chenyi ZHUANG Qiang MA
A travel route recommendation service that recommends a sequence of points of interest for tourists traveling in an unfamiliar city is a very useful tool in the field of location-based social networks. Although there are many web services and mobile applications that can help tourists to plan their trips by providing information about sightseeing attractions, travel route recommendation services are still not widely applied. One reason could be that most of the previous studies that addressed this task were based on the orienteering problem model, which mainly focuses on the estimation of a user-location relation (for example, a user preference). This assumes that a user receives a reward by visiting a point of interest and the travel route is recommended by maximizing the total rewards from visiting those locations. However, a location-location relation, which we introduce as a transition pattern in this paper, implies useful information such as visiting order and can help to improve the quality of travel route recommendations. To this end, we propose a travel route recommendation method by combining location and transition knowledge, which assigns rewards for both locations and transitions.
Yu PAN Guyu HU Zhisong PAN Shuaihui WANG Dongsheng SHAO
Detecting community structures and analyzing temporal evolution in dynamic networks are challenging tasks in exploring the inherent characteristics of complex networks. In this paper, we propose a semi-supervised evolutionary clustering model based on symmetric nonnegative matrix factorization, named sEC-SNMF, to detect communities in dynamic networks. We use the results of the community partition at the previous time step as prior information to modify the current network topology, and then smooth out the evolution of the communities and reduce the impact of noise. Furthermore, we introduce a community transition probability matrix to track and analyze the temporal evolution. Unlike previous algorithms, our approach does not need to know the number of communities in advance and can deal with situations in which the number of communities and nodes varies over time. Extensive experiments on synthetic datasets demonstrate that the proposed method is competitive and has superior performance.
The estimation of the matrix rank of harmonic components of a music spectrogram provides some useful information, e.g., the determination of the number of basis vectors of the matrix-factorization-based algorithms, which is required for the automatic music transcription or in post-processing. In this work, we develop an algorithm based on Stein's unbiased risk estimator (SURE) algorithm with the matrix factorization model. The noise variance required for the SURE algorithm is estimated by suppressing the harmonic component via median filtering. An evaluation performed using the MIDI-aligned piano sounds (MAPS) database revealed an average estimation error of -0.26 (standard deviation: 4.4) for the proposed algorithm.
Meng Ting XIONG Yong FENG Ting WU Jia Xing SHANG Bao Hua QIANG Ya Nan WANG
The traditional recommendation system (RS) can learn the potential personal preferences of users and the potential attribute characteristics of items from the rating records between users and items to make recommendations. However, for new items with no historical rating records, the traditional RS usually suffers from the typical cold start problem. Additional auxiliary information has usually been used in item cold start recommendation; we further bring temporal dynamics, text, and relevance into our models to relieve item cold start. Two new cold start recommendation models, TmTx (Time, Text) and TmTI (Time, Text, Item correlation), are proposed to solve the item cold start problem for different cold start scenarios. Well-known methods such as TimeSVD++ and CoFactor partially take temporal dynamics, comments, and item correlations into consideration to solve the cold start problem, but none of them combines all of this information. The two models proposed in this paper fuse features such as time, text, and relevance, and can effectively improve performance under item cold start. We select the convolutional neural network (CNN) to extract features from item description text, which gives the model the ability to deal with cold start items. Experimental results on three real-world data sets show that our proposed models lead to significant improvement compared with the baseline methods.
Hiroyoshi ITO Takahiro KOMAMIZU Toshiyuki AMAGASA Hiroyuki KITAGAWA
Multi-attributed graphs, in which each node is characterized by multiple types of attributes, are ubiquitous in the real world. Detection and characterization of communities of nodes could have a significant impact on various applications. Although previous studies have attempted to tackle this task, it is still challenging because of difficulties in integrating graph structures with multiple attributes and the presence of noise in the graphs. Therefore, in this study, we focus on clusters of attribute values and strong correlations between communities and attribute-value clusters. The graph clustering methodology adopted in this study involves Community detection, Attribute-value clustering, and deriving Relationships between communities and attribute-value clusters (CAR for short). Based on these concepts, the proposed multi-attributed graph clustering is modeled as CAR-clustering. To achieve CAR-clustering, a novel algorithm named CARNMF is developed based on non-negative matrix factorization (NMF) that can detect CAR in a cooperative manner. Results obtained from experiments using real-world datasets show that CARNMF can detect communities and attribute-value clusters more accurately than existing comparable methods. Furthermore, clustering results obtained using CARNMF indicate that it can successfully detect informative communities with meaningful semantic descriptions through correlations between communities and attribute-value clusters.
Masahiro KOHJIMA Tatsushi MATSUBAYASHI Hiroshi SAWADA
Due to the need to protect personal information and the impracticality of exhaustive data collection, there is an increasing need to deal with datasets with various levels of granularity, such as user-individual data and user-group data. In this study, we propose a new method for jointly analyzing multiple datasets with different granularity. The proposed method is a probabilistic model based on nonnegative matrix factorization, which is derived by introducing latent variables that indicate the high-resolution data underlying the low-resolution data. Experiments on purchase logs show that the proposed method performs better than the existing methods. Furthermore, by deriving an extension of the proposed method, we show that the proposed method is a new fundamental approach for analyzing datasets with different granularity.