Keyword Search Result

[Keyword] non-negative matrix factorization(22hit)

1-20hit(22hit)

  • INmfCA Algorithm for Training of Nonparallel Voice Conversion Systems Based on Non-Negative Matrix Factorization

    Hitoshi SUDA  Gaku KOTANI  Daisuke SAITO  

     
    PAPER-Speech and Hearing

      Publicized:
    2022/03/03
      Vol:
    E105-D No:6
      Page(s):
    1196-1210

    In this paper, we propose a new training framework named the INmfCA algorithm for nonparallel voice conversion (VC) systems. To train conversion models, traditional VC frameworks require parallel corpora, in which source and target speakers utter the same linguistic contents. Although the frameworks have achieved high-quality VC, they are not applicable in situations where parallel corpora are unavailable. To acquire conversion models without parallel corpora, nonparallel methods are widely studied. Although the frameworks achieve VC under nonparallel conditions, they tend to require huge background knowledge or many training utterances. This is because of difficulty in disentangling linguistic and speaker information without a large amount of data. In this work, we tackle this problem by exploiting NMF, which can factorize acoustic features into time-variant and time-invariant components in an unsupervised manner. The method acquires alignment between the acoustic features of a source speaker's utterances and a target dictionary and uses the obtained alignment as activation of NMF to train the source speaker's dictionary without parallel corpora. The acquisition method is based on the INCA algorithm, which obtains the alignment of nonparallel corpora. In contrast to the INCA algorithm, the alignment is not restricted to observed samples, and thus the proposed method can efficiently utilize small nonparallel corpora. The results of subjective experiments show that the combination of the proposed algorithm and the INCA algorithm outperformed not only an INCA-based nonparallel framework but also CycleGAN-VC, which performs nonparallel VC without any additional training data. The results also indicate that a one-shot VC framework, which does not need to train source speakers, can be constructed on the basis of the proposed method.

  • Shift Invariance Property of a Non-Negative Matrix Factorization

    Hideyuki IMAI  

     
    LETTER-General Fundamentals and Boundaries

      Vol:
    E103-A No:2
      Page(s):
    580-581

    We consider a property of the result of non-negative matrix factorization under a parallel translation of the data points. The shape of a cloud of original data points and that of the same data points translated along a vector are identical. Thus, from the viewpoint of classification, it is sometimes required that the coefficients of the basis vectors be identical for both sets of data points. We show a necessary and sufficient condition for such an invariance property under a translation of the data points.

  • Knowledge Discovery from Layered Neural Networks Based on Non-negative Task Matrix Decomposition

    Chihiro WATANABE  Kaoru HIRAMATSU  Kunio KASHINO  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2019/10/23
      Vol:
    E103-D No:2
      Page(s):
    390-397

    Interpretability has become an important issue in the machine learning field, along with the success of layered neural networks in various practical tasks. Since a trained layered neural network consists of a complex nonlinear relationship between a large number of parameters, it is difficult to understand how such a network achieves input-output mappings for a given data set. In this paper, we propose the non-negative task matrix decomposition method, which applies non-negative matrix factorization to a trained layered neural network. This enables us to decompose the inference mechanism of a trained layered neural network into multiple principal tasks of input-output mapping, and to reveal the roles of hidden units in terms of their contribution to each principal task.

  • Detecting Communities and Correlated Attribute Clusters on Multi-Attributed Graphs

    Hiroyoshi ITO  Takahiro KOMAMIZU  Toshiyuki AMAGASA  Hiroyuki KITAGAWA  

     
    PAPER

      Publicized:
    2019/02/04
      Vol:
    E102-D No:4
      Page(s):
    810-820

    Multi-attributed graphs, in which each node is characterized by multiple types of attributes, are ubiquitous in the real world. Detection and characterization of communities of nodes could have a significant impact on various applications. Although previous studies have attempted to tackle this task, it is still challenging due to difficulties in the integration of graph structures with multiple attributes and the presence of noises in the graphs. Therefore, in this study, we have focused on clusters of attribute values and strong correlations between communities and attribute-value clusters. The graph clustering methodology adopted in the proposed study involves Community detection, Attribute-value clustering, and deriving Relationships between communities and attribute-value clusters (CAR for short). Based on these concepts, the proposed multi-attributed graph clustering is modeled as CAR-clustering. To achieve CAR-clustering, a novel algorithm named CARNMF is developed based on non-negative matrix factorization (NMF) that can detect CAR in a cooperative manner. Results obtained from experiments using real-world datasets show that the CARNMF can detect communities and attribute-value clusters more accurately than existing comparable methods. Furthermore, clustering results obtained using the CARNMF indicate that CARNMF can successfully detect informative communities with meaningful semantic descriptions through correlations between communities and attribute-value clusters.

  • Designing Coded Aperture Camera Based on PCA and NMF for Light Field Acquisition

    Yusuke YAGI  Keita TAKAHASHI  Toshiaki FUJII  Toshiki SONODA  Hajime NAGAHARA  

     
    PAPER

      Publicized:
    2018/06/20
      Vol:
    E101-D No:9
      Page(s):
    2190-2200

    A light field, which is often understood as a set of dense multi-view images, has been utilized in various 2D/3D applications. Efficient light field acquisition using a coded aperture camera is the target problem considered in this paper. Specifically, the entire light field, which consists of many images, should be reconstructed from only a few images that are captured through different aperture patterns. In previous work, this problem has often been discussed from the context of compressed sensing (CS), where sparse representations on a pre-trained dictionary or basis are explored to reconstruct the light field. In contrast, we formulated this problem from the perspective of principal component analysis (PCA) and non-negative matrix factorization (NMF), where only a small number of basis vectors are selected in advance based on the analysis of the training dataset. From this formulation, we derived optimal non-negative aperture patterns and a straight-forward reconstruction algorithm. Even though our method is based on conventional techniques, it has proven to be more accurate and much faster than a state-of-the-art CS-based method.

  • Tighter Generalization Bounds for Matrix Completion Via Factorization Into Constrained Matrices

    Ken-ichiro MORIDOMI  Kohei HATANO  Eiji TAKIMOTO  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2018/05/18
      Vol:
    E101-D No:8
      Page(s):
    1997-2004

    We prove generalization error bounds of classes of low-rank matrices with some norm constraints for collaborative filtering tasks. Our bounds are tighter, compared to known bounds using rank or the related quantity only, by taking the additional L1 and L∞ constraints into account. Also, we show that our bounds on the Rademacher complexity of the classes are optimal.

  • Semi-Supervised Speech Enhancement Combining Nonnegative Matrix Factorization and Robust Principal Component Analysis

    Yonggang HU  Xiongwei ZHANG  Xia ZOU  Meng SUN  Yunfei ZHENG  Gang MIN  

     
    LETTER-Speech and Hearing

      Vol:
    E100-A No:8
      Page(s):
    1714-1719

    Nonnegative matrix factorization (NMF) is one of the most popular machine learning tools for speech enhancement. Supervised NMF-based speech enhancement is accomplished by updating iteratively with prior knowledge of the clean speech and noise spectral bases. However, in many real-world scenarios, it is not always possible to conduct any prior training. The traditional semi-supervised NMF (SNMF) version overcomes this shortcoming, but its performance degrades. In this letter, without any prior knowledge of the speech and noise, we present an improved semi-supervised NMF-based speech enhancement algorithm combining techniques of NMF and robust principal component analysis (RPCA). In this approach, fixed speech bases are obtained offline from training samples chosen from a public dataset. The noise samples used for training the noise bases, instead of being characterized a priori as usual, are obtained on the fly via the RPCA algorithm. This letter also studies whether the time length of the estimated noise samples affects the performance of the algorithm. Three metrics, PESQ, SDR and SNR, are applied to evaluate the performance of the algorithms through experiments on TIMIT with 20 noise types at various signal-to-noise ratio levels. Extensive experimental results demonstrate the superiority of the proposed algorithm over the competing speech enhancement algorithms.

  • Improve the Prediction of Student Performance with Hint's Assistance Based on an Efficient Non-Negative Factorization

    Ke XU  Rujun LIU  Yuan SUN  Keju ZOU  Yan HUANG  Xinfang ZHANG  

     
    PAPER

      Publicized:
    2017/01/17
      Vol:
    E100-D No:4
      Page(s):
    768-775

    In tutoring systems, students are more likely to use hints to assist their decisions about difficult or confusing problems. Meanwhile, students with weaker knowledge mastery tend to choose more hints than those with stronger knowledge mastery. Hints are important aids that help students deal with questions; students can learn from hints and enhance their knowledge about the questions. In this paper, we first use hints alone to build a model named Hints-Model to predict student performance. In addition, matrix factorization (MF) has become prevalent in educational fields for predicting student performance, owing to its success in collaborative filtering (CF) for recommender systems (RS). Another factorization method, non-negative matrix factorization (NMF), has been developed for over a decade and imposes additional non-negativity constraints on the factor matrices. Considering the sparseness of the original matrix and efficiency, we utilize an element-based matrix factorization called regularized single-element-based NMF (RSNMF). We compared the results of different factorization methods with those of their combinations with Hints-Model. The experimental results on two datasets show that the combination of RSNMF with Hints-Model achieves a significant improvement and obtains the best result. We also compared Hints-Model with the pioneering approach, performance factor analysis (PFA), and the outcomes show that the former outperforms the latter.

  • Automatic Model Order Selection for Convolutive Non-Negative Matrix Factorization

    Yinan LI  Xiongwei ZHANG  Meng SUN  Chong JIA  Xia ZOU  

     
    LETTER-Speech and Hearing

      Vol:
    E99-A No:10
      Page(s):
    1867-1870

    Exploring a parsimonious model that is just enough to represent the temporal dependency of time-series signals such as audio or speech is a practical requirement for many signal processing applications. A well-suited method for intuitively and efficiently representing magnitude spectra is to use convolutive non-negative matrix factorization (CNMF) to discover the temporal relationship among nearby frames. However, the model order selection problem in CNMF, i.e., the choice of the number of convolutive bases, has seldom been investigated. In this paper, we propose a novel Bayesian framework that can automatically learn the optimal model order through maximum a posteriori (MAP) estimation. The proposed method yields a parsimonious and low-rank approximation by removing the redundant bases iteratively. We conducted intuitive experiments to show that the proposed algorithm is very effective in automatically determining the correct model order.

  • Transfer Semi-Supervised Non-Negative Matrix Factorization for Speech Emotion Recognition

    Peng SONG  Shifeng OU  Xinran ZHANG  Yun JIN  Wenming ZHENG  Jinglei LIU  Yanwei YU  

     
    LETTER-Speech and Hearing

      Publicized:
    2016/07/01
      Vol:
    E99-D No:10
      Page(s):
    2647-2650

    In practice, emotional speech utterances are often collected from different devices or under different conditions, which leads to a discrepancy between the training and testing data and results in a sharp decrease in recognition rates. To solve this problem, in this letter, a novel transfer semi-supervised non-negative matrix factorization (TSNMF) method is presented. A semi-supervised non-negative matrix factorization algorithm, utilizing both labeled source and unlabeled target data, is adopted to learn common feature representations. Meanwhile, the maximum mean discrepancy (MMD) is employed as a similarity measure to reduce the distance between the feature distributions of the two databases. Finally, the TSNMF algorithm, which optimizes the SNMF and MMD functions together, is proposed to obtain robust feature representations across databases. Extensive experiments demonstrate that, in comparison to the state-of-the-art approaches, our proposed method can significantly improve the cross-corpus recognition rates.
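
    As an illustration of the MMD term mentioned in this abstract, the following is a minimal sketch that assumes a linear kernel, in which case the squared MMD reduces to the squared distance between the mean feature vectors of the two databases. The abstract does not state which kernel the authors use, and the function name `linear_mmd2` is hypothetical.

```python
import numpy as np

def linear_mmd2(X, Y):
    """Squared MMD under a linear kernel (an assumption, not the paper's stated choice).

    X and Y hold one sample per row. With a linear kernel, the (biased)
    squared MMD equals the squared Euclidean distance between the two
    sample means.
    """
    diff = X.mean(axis=0) - Y.mean(axis=0)
    return float(diff @ diff)

# Toy features standing in for a source and a target database.
source = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
target = source + 1.0  # same shape of distribution, shifted mean
print(linear_mmd2(source, target))
```

    A value of zero means the two feature sets share the same mean; a transfer method that minimizes this quantity pulls the two feature distributions toward each other in that sense.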

  • Improved Semi-Supervised NMF Based Real-Time Capable Speech Enhancement

    Yonggang HU  Xiongwei ZHANG  Xia ZOU  Meng SUN  Gang MIN  Yinan LI  

     
    LETTER-Speech and Hearing

      Vol:
    E99-A No:1
      Page(s):
    402-406

    Nonnegative matrix factorization (NMF) is one of the most popular tools for speech enhancement. In this letter, we present an improved semi-supervised NMF (ISNMF)-based speech enhancement algorithm combining techniques of noise estimation and Incremental NMF (INMF). In this approach, fixed speech bases are obtained from training samples offline in advance, while noise bases are trained on the fly whenever a new noisy frame arrives. The INMF algorithm is adopted for learning the noise bases because it can overcome the difficulties that conventional NMF confronts in online processing. The proposed algorithm is real-time capable in the sense that it processes the time frames of the noisy speech one by one and the computational complexity is feasible. Four different objective evaluation measures at various signal-to-noise ratio (SNR) levels demonstrate the superiority of the proposed method over the traditional semi-supervised NMF (SNMF) and the well-known robust principal component analysis (RPCA) algorithms.

  • Speech Enhancement Combining NMF Weighted by Speech Presence Probability and Statistical Model

    Yonggang HU  Xiongwei ZHANG  Xia ZOU  Gang MIN  Meng SUN  Yunfei ZHENG  

     
    LETTER-Speech and Hearing

      Vol:
    E98-A No:12
      Page(s):
    2701-2704

    The conventional non-negative matrix factorization (NMF)-based speech enhancement is accomplished by updating iteratively with the prior knowledge of the clean speech and noise spectra bases. With the probabilistic estimation of whether the speech is present or not in a certain frame, this letter proposes a speech enhancement algorithm incorporating the speech presence probability (SPP) obtained via noise estimation to the NMF process. To take advantage of both the NMF-based and statistical model-based approaches, the final enhanced speech is achieved by applying a statistical model-based filter to the output of the SPP weighted NMF. Objective evaluations using perceptual evaluation of speech quality (PESQ) on TIMIT with 20 noise types at various signal-to-noise ratio (SNR) levels demonstrate the superiority of the proposed algorithm over the conventional NMF and statistical model-based baselines.

  • Separation of Mass Spectra Based on Probabilistic Latent Component Analysis for Explosives Detection

    Yohei KAWAGUCHI  Masahito TOGAMI  Hisashi NAGANO  Yuichiro HASHIMOTO  Masuyuki SUGIYAMA  Yasuaki TAKADA  

     
    PAPER

      Vol:
    E98-A No:9
      Page(s):
    1888-1897

    A new algorithm for separating mass spectra into individual substances for explosives detection is proposed. In the field of mass spectrometry, separation methods, such as principal-component analysis (PCA) and independent-component analysis (ICA), are widely used. All components, however, have no negative values, and the orthogonality condition imposed on components also does not necessarily hold in the case of mass spectra. Because these methods allow negative values and PCA imposes an orthogonality condition, they are not suitable for separation of mass spectra. The proposed algorithm is based on probabilistic latent-component analysis (PLCA). PLCA is a statistical formulation of non-negative matrix factorization (NMF) using KL divergence. Because PLCA imposes the constraint of non-negativity but not orthogonality, the algorithm is effective for separating components of mass spectra. In addition, to estimate the components more accurately, a sparsity constraint is applied to PLCA for explosives detection. The main contribution is industrial application of the algorithm into an explosives-detection system. Results of an experimental evaluation of the algorithm with data obtained in a real railway station demonstrate that the proposed algorithm outperforms PCA and ICA. Also, results of calculation time demonstrate that the algorithm can work in real time.

  • Mass Spectra Separation for Explosives Detection by Using an Attenuation Model

    Yohei KAWAGUCHI  Masahito TOGAMI  Hisashi NAGANO  Yuichiro HASHIMOTO  Masuyuki SUGIYAMA  Yasuaki TAKADA  

     
    PAPER

      Vol:
    E98-A No:9
      Page(s):
    1898-1905

    A new algorithm for separating mass spectra into individual substances is proposed for explosives detection. The conventional algorithm based on probabilistic latent component analysis (PLCA) is effective in many cases because it makes use of the fact that non-negativity and sparsity hold for mass spectra in explosives detection. The algorithm, however, fails to separate mass spectra in some cases because the uncertainty cannot be resolved by non-negativity and sparsity constraints alone. To resolve the uncertainty, an algorithm based on shift-invariant PLCA (SIPLCA) utilizing temporal correlation of mass spectra is proposed in this paper. In addition, to prevent overfitting, the temporal correlation is modeled with a function representing attenuation by focusing on the fact that the amount of a substance is attenuated continuously and slowly with time. Results of an experimental evaluation of the algorithm with data obtained in a real railway station demonstrate that the proposed algorithm outperforms the PLCA-based conventional algorithm and the simple SIPLCA-based one. The main novelty of this paper is that an evaluation of the detection performance of explosives detection is demonstrated. Results of the evaluation indicate that the proposed separation algorithm can improve the detection performance.

  • Noise-Robust Voice Conversion Based on Sparse Spectral Mapping Using Non-negative Matrix Factorization

    Ryo AIHARA  Ryoichi TAKASHIMA  Tetsuya TAKIGUCHI  Yasuo ARIKI  

     
    PAPER-Voice Conversion and Speech Enhancement

      Vol:
    E97-D No:6
      Page(s):
    1411-1418

    This paper presents a voice conversion (VC) technique for noisy environments based on a sparse representation of speech. Sparse representation-based VC using non-negative matrix factorization (NMF) is employed for noise-added spectral conversion between different speakers. In our previous exemplar-based VC method, source exemplars and target exemplars are extracted from parallel training data, having the same texts uttered by the source and target speakers. The input source signal is represented using the source exemplars and their weights. Then, the converted speech is constructed from the target exemplars and the weights related to the source exemplars. However, this exemplar-based approach needs to hold all training exemplars (frames), and it requires high computation times to obtain the weights of the source exemplars. In this paper, we propose a framework to train the basis matrices of the source and target exemplars so that they have a common weight matrix. By using the basis matrices instead of the exemplars, the VC is performed with lower computation times than with the exemplar-based method. The effectiveness of this method was confirmed by comparing it (in speaker conversion experiments using noise-added speech data) with an exemplar-based method and a conventional Gaussian mixture model (GMM)-based method.

  • Exemplar-Based Voice Conversion Using Sparse Representation in Noisy Environments

    Ryoichi TAKASHIMA  Tetsuya TAKIGUCHI  Yasuo ARIKI  

     
    PAPER

      Vol:
    E96-A No:10
      Page(s):
    1946-1953

    This paper presents a voice conversion (VC) technique for noisy environments, where parallel exemplars are introduced to encode the source speech signal and synthesize the target speech signal. The parallel exemplars (dictionary) consist of the source exemplars and target exemplars, having the same texts uttered by the source and target speakers. The input source signal is decomposed into the source exemplars, noise exemplars and their weights (activities). Then, by using the weights of the source exemplars, the converted signal is constructed from the target exemplars. We carried out speaker conversion tasks using clean speech data and noise-added speech data. The effectiveness of this method was confirmed by comparing it with a conventional Gaussian Mixture Model (GMM)-based method.
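
    The decompose-then-reconstruct idea in this abstract can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a clean input (no noise exemplars), no sparsity penalty, and the standard multiplicative update for the generalized KL divergence with the dictionary held fixed. The names `estimate_activations`, `convert`, `D_src`, and `D_tgt` are hypothetical.

```python
import numpy as np

def estimate_activations(X, D, n_iter=300, eps=1e-12):
    """Find non-negative weights H with X approximately equal to D @ H.

    The dictionary D (one exemplar per column) stays fixed; only H is
    updated, using the multiplicative rule for the generalized KL divergence.
    """
    H = np.ones((D.shape[1], X.shape[1]))
    col_sums = D.sum(axis=0)[:, None] + eps
    for _ in range(n_iter):
        DH = D @ H + eps
        H *= (D.T @ (X / DH)) / col_sums
    return H

def convert(X_src, D_src, D_tgt):
    """Encode with the source exemplars, then rebuild with the paired target exemplars."""
    H = estimate_activations(X_src, D_src)
    return D_tgt @ H, H

# Toy parallel dictionaries: column k of D_src is paired with column k of D_tgt.
rng = np.random.default_rng(0)
D_src = rng.random((20, 4)) + 0.1
D_tgt = rng.random((20, 4)) + 0.1
X_src = D_src @ rng.random((4, 6))   # synthetic "source spectrogram"
Y, H = convert(X_src, D_src, D_tgt)
print(Y.shape, H.shape)
```

    The conversion relies on the pairing of dictionary columns: because each source exemplar has a target exemplar for the same content, weights estimated on the source side can be reused unchanged on the target side.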

  • Statistical Approaches to Excitation Modeling in HMM-Based Speech Synthesis

    June Sig SUNG  Doo Hwa HONG  Hyun Woo KOO  Nam Soo KIM  

     
    LETTER-Speech and Hearing

      Vol:
    E96-D No:2
      Page(s):
    379-382

    In our previous study, we proposed the waveform interpolation (WI) approach to model the excitation signals for hidden Markov model (HMM)-based speech synthesis. This letter presents several techniques to improve excitation modeling within the WI framework. We propose both the time domain and frequency domain zero padding techniques to reduce the spectral distortion inherent in the synthesized excitation signal. Furthermore, we apply non-negative matrix factorization (NMF) to obtain a low-dimensional representation of the excitation signals. From a number of experiments, including a subjective listening test, the proposed method has been found to enhance the performance of the conventional excitation modeling techniques.

  • Polyphonic Music Transcription by Nonnegative Matrix Factorization with Harmonicity and Temporality Criteria

    Sang Ha PARK  Seokjin LEE  Koeng-Mo SUNG  

     
    LETTER-Engineering Acoustics

      Vol:
    E95-A No:9
      Page(s):
    1610-1614

    Non-negative matrix factorization (NMF) is widely used for music transcription because of its efficiency. However, the conventional NMF-based music transcription algorithm often causes harmonic confusion errors or time split-up errors, because the NMF decomposes the time-frequency data according to the activated frequency in its time. To solve these problems, we proposed an NMF with temporal continuity and harmonicity constraints. The temporal continuity constraint prevented the time split-up of the continuous time components, and the harmonicity constraint helped to bind the fundamental with harmonic frequencies by reducing the additional octave errors. The transcription performance of the proposed algorithm was compared with that of the conventional algorithms, which showed that the proposed method helped to reduce additional false errors and increased the overall transcription performance.

  • Clustering Algorithm for Unsupervised Monaural Musical Sound Separation Based on Non-negative Matrix Factorization

    Sang Ha PARK  Seokjin LEE  Koeng-Mo SUNG  

     
    LETTER-Engineering Acoustics

      Vol:
    E95-A No:4
      Page(s):
    818-823

    Non-negative matrix factorization (NMF) is widely used for monaural musical sound source separation because of its efficiency and good performance. However, an additional clustering process is required because the musical sound mixture is separated into more signals than the number of musical tracks during NMF separation. In the conventional method, manual clustering or training-based clustering is performed with an additional learning process. Recently, a clustering algorithm based on the mel-frequency cepstrum coefficient (MFCC) was proposed for unsupervised clustering. However, MFCC clustering supplies limited information for clustering. In this paper, we propose various timbre features for unsupervised clustering and a clustering algorithm with these features. Simulation experiments are carried out using various musical sound mixtures. The results indicate that the proposed method improves clustering performance, as compared to conventional MFCC-based clustering.

  • Dimensionality Reduction for Histogram Features Based on Supervised Non-negative Matrix Factorization

    Mitsuru AMBAI  Nugraha P. UTAMA  Yuichi YOSHIDA  

     
    PAPER

      Vol:
    E94-D No:10
      Page(s):
    1870-1879

    Histogram-based image features such as HoG, SIFT and histogram of visual words are generally represented as high-dimensional, non-negative vectors. We propose a supervised method of reducing the dimensionality of histogram-based features by using non-negative matrix factorization (NMF). We define a cost function for supervised NMF that consists of two terms. The first term is the generalized divergence term between an input matrix and a product of factorized matrices. The second term is the penalty term that reflects prior knowledge on a training set by assigning predefined constants to cannot-links and must-links in pairs of training data. A multiplicative update rule for minimizing the newly-defined cost function is also proposed. We tested our method on a task of scene classification using histograms of visual words. The experimental results revealed that each of the low-dimensional basis vectors obtained from the proposed method only appeared in a single specific category in most cases. This interesting characteristic not only makes it easy to interpret the meaning of each basis but also improves the power of classification.
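
    For reference, the unsupervised part of such a cost function can be minimized with the standard multiplicative updates for the generalized Kullback-Leibler divergence. The sketch below covers only that first term; the supervised penalty on must-links and cannot-links described in the abstract is omitted, and the names `nmf_kl` and `kl_divergence` are hypothetical.

```python
import numpy as np

def kl_divergence(V, WH, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(V || WH)."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))

def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factorize a non-negative matrix V (features x samples) as W @ H.

    Uses the standard multiplicative update rules for the generalized KL
    divergence; no supervised penalty term is included.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    history = []
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        history.append(kl_divergence(V, W @ H))
    return W, H, history

# Toy "histogram" data: 6-bin histograms for 8 samples, reduced to 3 bases.
V = np.random.default_rng(1).random((6, 8))
W, H, history = nmf_kl(V, rank=3)
print(history[0], history[-1])
```

    Because the updates only multiply by non-negative ratios, W and H remain non-negative throughout; each column of H is then the low-dimensional representation of the corresponding input histogram.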
