The search functionality is under construction.

Keyword Search Result

[Keyword] models(163hit)

1-20hit(163hit)

  • MuSRGM: A Genetic Algorithm-Based Dynamic Combinatorial Deep Learning Model for Software Reliability Engineering Open Access

    Ning FU  Duksan RYU  Suntae KIM  

     
    PAPER-Software Engineering

      Publicized:
    2024/02/06
      Vol:
    E107-D No:6
      Page(s):
    761-771

    In the software testing phase, software reliability growth models (SRGMs) are commonly used to evaluate the reliability of software systems. Traditional SRGMs are restricted by their assumption of a continuous growth pattern for the failure detection rate (FDR) throughout the testing phase. However, this assumption is violated by change-point phenomena, where FDR fluctuations stem from changes in testing personnel or procedural modifications, leading to reduced prediction accuracy and compromised software reliability assessments. Therefore, the objective of this study is to improve software reliability prediction using a novel approach that combines a genetic algorithm (GA) and deep learning-based SRGMs to account for the change-point phenomenon. The proposed approach uses a GA to dynamically combine activation functions from various deep learning-based SRGMs into a new mutated SRGM called MuSRGM. MuSRGM captures the advantages of both concave and S-shaped SRGMs, is better suited to capturing the change-point phenomenon during testing, and more accurately reflects actual testing situations. Additionally, failure data are treated as a time series and analyzed using a combination of Long Short-Term Memory (LSTM) and attention mechanisms. To assess the performance of MuSRGM, we conducted experiments on three distinct failure datasets. The results indicate that MuSRGM outperformed the baseline method, exhibiting low prediction error (mean squared error, MSE) on all three datasets. Furthermore, MuSRGM demonstrated strong generalization ability on these datasets, remaining unaffected by uneven data distribution. Therefore, MuSRGM is a promising solution that can provide increased accuracy and applicability for software reliability assessment during the testing phase.
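The concave and S-shaped behaviours mentioned above are usually illustrated with the classical Goel-Okumoto and delayed S-shaped SRGMs. The Python sketch below is an illustration under that assumption, not MuSRGM itself: it shows the two textbook mean value functions, their failure detection rates, and the MSE criterion, making visible that the concave model fixes a constant FDR while the S-shaped model's FDR grows over time.

```python
import math

# Two classical NHPP software reliability growth models (SRGMs).
# m(t): expected cumulative number of failures detected by time t.
# a: expected total number of faults; b: rate parameter.

def goel_okumoto(t: float, a: float, b: float) -> float:
    """Concave (exponential) SRGM: m(t) = a(1 - e^{-bt})."""
    return a * (1.0 - math.exp(-b * t))

def delayed_s_shaped(t: float, a: float, b: float) -> float:
    """Yamada delayed S-shaped SRGM: m(t) = a(1 - (1 + bt)e^{-bt})."""
    return a * (1.0 - (1.0 + b * t) * math.exp(-b * t))

def fdr_goel_okumoto(t: float, b: float) -> float:
    """FDR m'(t) / (a - m(t)) of the concave model: constant b, independent of t."""
    return b

def fdr_delayed_s_shaped(t: float, b: float) -> float:
    """FDR of the delayed S-shaped model: b^2 t / (1 + bt), increasing in t."""
    return b * b * t / (1.0 + b * t)

def mse(observed, predicted) -> float:
    """Mean squared error between observed and predicted cumulative failure counts."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

if __name__ == "__main__":
    times = [1, 2, 3, 4, 5]
    observed = [4, 9, 15, 19, 22]  # hypothetical cumulative failure counts
    print(mse(observed, [goel_okumoto(t, 30, 0.3) for t in times]))
    print(mse(observed, [delayed_s_shaped(t, 30, 0.3) for t in times]))
```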

  • Spherical Style Deformation on Single Component Models

    Xuemei FENG  Qing FANG  Kouichi KONNO  Zhiyi ZHANG  Katsutsugu MATSUYAMA  

     
    PAPER-Computer Graphics

      Publicized:
    2023/08/22
      Vol:
    E106-D No:11
      Page(s):
    1891-1905

    In this study, we present a spherical style deformation algorithm for single component models that deforms the models into a spherical style while preserving the local details of the original models. Because 3D models have complex skeleton structures consisting of many components, deformation around the connections between components is complicated, especially with respect to preventing mesh self-intersections. To the best of our knowledge, there exist neither methods that achieve a spherical style in a 3D model consisting of multiple components nor methods suited to a single component. In this study, we therefore focus on spherical style deformation of single component models. Accordingly, we propose a deformation method that transforms the input model into the spherical style while preserving its local details. Specifically, we define an energy function that combines the as-rigid-as-possible (ARAP) method and spherical features. The spherical term is defined as ℓ2-regularization on a linear feature, so the corresponding optimization can be solved efficiently. We also observed that the results of our deformation depend on the quality of the input mesh. For instance, when the input mesh consists of many obtuse triangles, the spherical style deformation method fails. To address this problem, we propose an optional deformation method based on a convex hull proxy model as a complementary method. Our proxy method constructs a proxy model of the input model, applies our deformation method to the proxy, and deforms the input model by projection and interpolation. We have applied our proposed method to simple and complex shapes, compared our experimental results with the 3D geometric stylization method of normal-driven spherical shape analogies, and confirmed that our method successfully deforms models that are smooth, round, and curved. We also discuss the limitations and problems of our algorithm based on the experimental results.

  • Inverse Heat Dissipation Model for Medical Image Segmentation

    Yu KASHIHARA  Takashi MATSUBARA  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2023/08/22
      Vol:
    E106-D No:11
      Page(s):
    1930-1934

    Diffusion models have achieved success in generating and editing high-quality images because of their ability to produce fine details. This superior generation ability has the potential to facilitate more detailed segmentation. This study presents a novel approach to segmentation tasks using an inverse heat dissipation model, a kind of diffusion-based model. The proposed method involves generating a mask that gradually shrinks to fit the shape of the desired segmentation region. We comprehensively evaluated the proposed method using multiple datasets under varying conditions. The results show that the proposed method outperforms existing methods and provides more detailed segmentation.
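An inverse heat dissipation model learns to undo a forward process that progressively blurs its input by simulating the heat equation. The NumPy sketch below shows only that forward process applied to a binary mask; the learned reverse network is omitted, and the explicit finite-difference scheme, step size, and function names are assumptions for illustration (a closed-form frequency-domain solution is also possible), not the authors' implementation.

```python
import numpy as np

def heat_step(u: np.ndarray, dt: float = 0.2) -> np.ndarray:
    """One explicit Euler step of the 2D heat equation du/dt = Laplacian(u).

    Edge padding gives zero-flux (Neumann) boundaries, so the total 'heat'
    (sum of pixel values) is conserved; dt <= 0.25 keeps the scheme stable.
    """
    p = np.pad(u, 1, mode="edge")
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u
    return u + dt * lap

def dissipate(u0: np.ndarray, steps: int, dt: float = 0.2) -> list:
    """Forward trajectory u_0, u_1, ..., u_steps, each progressively blurrier."""
    traj = [u0.astype(float)]
    for _ in range(steps):
        traj.append(heat_step(traj[-1], dt))
    return traj

if __name__ == "__main__":
    mask = np.zeros((16, 16))
    mask[4:12, 4:12] = 1.0  # hypothetical square segmentation mask
    traj = dissipate(mask, 50)
    print(traj[0].std(), traj[-1].std())  # contrast drops as the mask diffuses
```

A segmentation model in this family would be trained to run the trajectory backwards, sharpening a diffuse mask step by step.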

  • Metadata-Based Quality-Estimation Model for Tile-Based Omnidirectional Video Streaming Open Access

    Yuichiro URATA  Masanori KOIKE  Kazuhisa YAMAGISHI  Noritsugu EGI  

     
    PAPER-Multimedia Systems for Communications

      Publicized:
    2022/11/15
      Vol:
    E106-B No:5
      Page(s):
    478-488

    In this paper, a metadata-based quality-estimation model is proposed for tile-based omnidirectional video streaming services, aiming to realize quality monitoring during service provision. In tile-based omnidirectional video (ODV) streaming services, the ODV is divided into tiles, and high-quality and low-quality tiles are distributed in accordance with the user's viewing direction. When the user changes the viewing direction, the user temporarily watches video with the low-quality tiles. In addition, the longer the time (delay time) until the high-quality tiles for the new viewing direction are downloaded, the longer the user views the low-quality tiles; thus, the delay time affects quality. Therefore, the video quality of the low-quality tiles and the delay time significantly impact quality, and these factors need to be considered in the quality-estimation model. We develop quality-estimation models by extending conventional quality-estimation models for 2D adaptive streaming. Based on the results of subjective quality evaluation experiments, we also show that the quality-estimation model using the bitrate, resolution, and frame rate of the high- and low-quality tiles and the delay time has sufficient estimation accuracy.

  • Multi-Scale Correspondence Learning for Person Image Generation

    Shi-Long SHEN  Ai-Guo WU  Yong XU  

     
    PAPER-Person Image Generation

      Publicized:
    2022/04/15
      Vol:
    E106-D No:5
      Page(s):
    804-812

    A generative model is presented in this paper for two types of person image generation. First, the model is applied to pose-guided person image generation, i.e., converting the pose of a source person image to the target pose while preserving the texture of that source image. Second, the model is used for clothing-guided person image generation, i.e., changing the clothing texture of a source person image to the desired clothing texture. The core idea of the proposed model is to establish multi-scale correspondence, which effectively addresses the misalignment introduced by transferring pose and thereby preserves richer appearance information. Specifically, the proposed model consists of two stages: 1) it first generates the target semantic map imposed on the target pose to provide more accurate guidance during the generation process; 2) after the encoder obtains the multi-scale feature maps, the multi-scale correspondence is established, which is useful for fine-grained generation. Experimental results show that the proposed method is superior to state-of-the-art methods in pose-guided person image generation and demonstrate its effectiveness in clothing-guided person image generation.

  • Surrogate-Based EM Optimization Using Neural Networks for Microwave Filter Design Open Access

    Masataka OHIRA  Zhewang MA  

     
    INVITED PAPER

      Publicized:
    2022/03/15
      Vol:
    E105-C No:10
      Page(s):
    466-473

    A surrogate-based electromagnetic (EM) optimization using neural networks (NNs) is presented for computationally efficient microwave bandpass filter (BPF) design. This paper first describes the forward problem (EM analysis) and the inverse problem (EM design), as well as two fundamental issues in BPF design. The first issue is that EM analysis is time-consuming, and the second is that EM design depends heavily on structural optimization performed with the help of EM analysis. To accelerate the optimization design, two surrogate models, a forward model and an inverse model, are built with NNs. As a result, the inverse model can instantaneously estimate initial structural parameters with high accuracy when synthesized coupling-matrix elements are simply input into the NN. Then, the forward model, in conjunction with an optimization algorithm, enables designers to rapidly find optimal structural parameters starting from the initial ones. The effectiveness of the surrogate-based EM optimization is verified through the structural design of a typical fifth-order microstrip BPF with multiple couplings.

  • Path Loss Prediction Method Merged Conventional Models Effectively in Machine Learning for Mobile Communications

    Hiroaki NAKABAYASHI  Kiyoaki ITOI  

     
    PAPER-Propagation

      Publicized:
    2021/12/14
      Vol:
    E105-B No:6
      Page(s):
    737-747

    Basic characteristics for relating design and base station layout design in land mobile communications are provided through a propagation model for path loss prediction. Owing to the rapid annual increase in traffic data, the number of base stations has increased accordingly. Therefore, propagation models for various scenarios and frequency bands are needed. To address the problems of optimizing and creating such propagation models, a path loss prediction method that merges multiple models using machine learning is proposed herein. The method is discussed based on measurement values from Kitakyushu-shi. In machine learning, the selection of input parameters and the suppression of overlearning (overfitting) are important for achieving highly accurate predictions. Therefore, we propose acquiring conventional models based on the propagation environment and using input parameters of high importance. The proposed method achieves a root mean square error (RMSE) of 3.68 dB for Kitakyushu-shi. In addition, predictions are performed in Narashino-shi to confirm the effectiveness of the method in other urban scenarios. The results confirm the effectiveness of the proposed method for the urban scenario in Narashino-shi, with an RMSE of 4.39 dB.
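As an illustration of how a conventional model can serve as a machine-learning input parameter, and of how the reported RMSE is computed, the sketch below implements the standard Okumura-Hata urban formula (small/medium city correction) and free-space path loss. Which conventional models and features the authors actually merge is not stated in this abstract, so `conventional_features` is a hypothetical helper.

```python
import math

def free_space_loss_db(d_km: float, f_mhz: float) -> float:
    """Free-space path loss in dB for distance in km and frequency in MHz."""
    return 20 * math.log10(d_km) + 20 * math.log10(f_mhz) + 32.44

def hata_urban_db(d_km: float, f_mhz: float, hb_m: float, hm_m: float) -> float:
    """Okumura-Hata median path loss (urban, small/medium city), nominally valid for
    150-1500 MHz, base antenna 30-200 m, mobile antenna 1-10 m, distance 1-20 km."""
    log_f = math.log10(f_mhz)
    a_hm = (1.1 * log_f - 0.7) * hm_m - (1.56 * log_f - 0.8)  # mobile antenna correction
    return (69.55 + 26.16 * log_f - 13.82 * math.log10(hb_m) - a_hm
            + (44.9 - 6.55 * math.log10(hb_m)) * math.log10(d_km))

def conventional_features(d_km: float, f_mhz: float, hb_m: float, hm_m: float) -> list:
    """Hypothetical input vector for a regressor: geometry plus conventional-model predictions."""
    return [math.log10(d_km),
            free_space_loss_db(d_km, f_mhz),
            hata_urban_db(d_km, f_mhz, hb_m, hm_m)]

def rmse(measured_db, predicted_db) -> float:
    """Root mean square error (dB) between measured and predicted path loss."""
    n = len(measured_db)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured_db, predicted_db)) / n)
```

A learned model would then map such feature vectors to measured path loss, and its accuracy would be summarized with `rmse` on held-out measurements.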

  • Does Student-Submission Allocation Affect Peer Assessment Accuracy?

    Hideaki OHASHI  Toshiyuki SHIMIZU  Masatoshi YOSHIKAWA  

     
    PAPER

      Publicized:
    2022/01/05
      Vol:
    E105-D No:5
      Page(s):
    888-897

    Peer assessment in education has pedagogical benefits and is a promising method for grading a large number of submissions. At the same time, student reliability has been regarded as a problem; consequently, various methods of estimating highly reliable grades from scores given by multiple students have been proposed. Most existing methods assume a nonadaptive allocation pattern, in which allocation is performed in advance. In this study, we analyze the effect of student-submission allocation on score estimation in peer assessment under a nonadaptive allocation setting. We examine three types of nonadaptive allocation methods, namely random allocation, circular allocation, and group allocation, which are the approaches commonly used in existing nonadaptive peer assessment methods. Through simulation experiments, we show that circular allocation and group allocation tend to yield lower accuracy than random allocation. We then use this result to improve an existing adaptive allocation method, which performs allocation and assessment in parallel and tends to produce allocations similar to circular allocation. We propose a method that replaces part of the allocation with random allocation and show through experiments that it is effective.
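The three nonadaptive schemes compared above can be sketched as follows. Each function maps a student index to the list of submission indices that student grades, with k reviews per student. The exact constructions (consecutive groups for group allocation, one random permutation per round for random allocation) are assumptions for illustration, not the paper's definitions.

```python
import random
from typing import Dict, List

Allocation = Dict[int, List[int]]

def circular_allocation(n: int, k: int) -> Allocation:
    """Student i grades the submissions of students i+1, ..., i+k (indices mod n)."""
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    return {i: [(i + j) % n for j in range(1, k + 1)] for i in range(n)}

def group_allocation(n: int, k: int) -> Allocation:
    """Students form consecutive groups of size k + 1; each grades every other member."""
    if k < 1 or n % (k + 1) != 0:
        raise ValueError("this sketch assumes k >= 1 and n divisible by k + 1")
    alloc: Allocation = {}
    for start in range(0, n, k + 1):
        group = range(start, start + k + 1)
        for i in group:
            alloc[i] = [j for j in group if j != i]
    return alloc

def random_allocation(n: int, k: int, rng: random.Random) -> Allocation:
    """Random allocation built from k random permutations (one review per submission
    per round), so each student grades k distinct others and each submission gets k grades."""
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    alloc: Allocation = {i: [] for i in range(n)}
    for _ in range(k):
        while True:  # rejection sampling: no self-review, no repeated pair
            perm = list(range(n))
            rng.shuffle(perm)
            if all(perm[i] != i and perm[i] not in alloc[i] for i in range(n)):
                break
        for i in range(n):
            alloc[i].append(perm[i])
    return alloc
```

Rejection sampling keeps the code short but slows down as k approaches n; it is intended for small simulation settings.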

  • Self-Channel Attention Weighted Part for Person Re-Identification

    Lin DU  Chang TIAN  Mingyong ZENG  Jiabao WANG  Shanshan JIAO  Qing SHEN  Wei BAI  Aihong LU  

     
    LETTER-Image

      Publicized:
    2020/09/01
      Vol:
    E104-A No:3
      Page(s):
    665-670

    Part-based models have proven beneficial for person re-identification (Re-ID) in recent years. Existing models usually use fixed horizontal stripes or rely on human keypoints to obtain each part, which is not consistent with the human visual mechanism. In this paper, we propose a Self-Channel Attention Weighted Part model (SCAWP) for Re-ID. In SCAWP, we first learn a feature map from ResNet50 and use a 1x1 convolution to reduce its dimension, which aggregates the channel information. Then, we learn an attention weight map within each channel and multiply it with the feature map to obtain each part. Finally, each part is used for its own identification task to build the whole model. To verify the performance of SCAWP, we conduct experiments on three benchmark datasets: CUHK03-NP, Market-1501, and DukeMTMC-ReID. SCAWP achieves rank-1/mAP accuracies of 70.4%/68.3%, 94.6%/86.4%, and 87.6%/76.8% on the three datasets, respectively.
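A minimal NumPy sketch of the part-extraction step described above follows. It assumes that the per-channel attention weight map is a spatial softmax over each reduced channel and that each part descriptor is the attention-weighted sum of the backbone features; the shapes, the softmax choice, and the name `part_features` are assumptions, and the paper's exact layers are not reproduced.

```python
import numpy as np

def part_features(feat: np.ndarray, w_reduce: np.ndarray):
    """Attention-weighted part pooling (illustrative sketch).

    feat:     (C, H, W) backbone feature map, e.g. the output of a ResNet50 stage.
    w_reduce: (P, C) weights of a 1x1 convolution reducing C channels to P part channels.
    Returns (parts, attn): parts is (P, C), one descriptor per part; attn is (P, H*W).
    """
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)                         # (C, HW)
    logits = w_reduce @ flat                              # a 1x1 conv is a per-pixel linear map: (P, HW)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)         # spatial softmax: one weight map per channel
    parts = attn @ flat.T                                 # weighted feature sums: (P, HW) @ (HW, C)
    return parts, attn
```

In a full model each row of `parts` would feed its own classifier head, matching the per-part identification tasks mentioned in the abstract.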

  • Multiple Subspace Model and Image-Inpainting Algorithm Based on Multiple Matrix Rank Minimization

    Tomohiro TAKAHASHI  Katsumi KONISHI  Kazunori URUMA  Toshihiro FURUKAWA  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2020/08/31
      Vol:
    E103-D No:12
      Page(s):
    2682-2692

    This paper proposes an image inpainting algorithm based on multiple linear models and matrix rank minimization. Several inpainting algorithms have been previously proposed based on the assumption that an image can be modeled using autoregressive (AR) models. However, these algorithms perform poorly when applied to natural photographs because they assume that an image is modeled by a position-invariant linear model with a fixed model order. In order to improve inpainting quality, this work introduces a multiple AR model and proposes an image inpainting algorithm based on multiple matrix rank minimization with sparse regularization. In doing so, a practical algorithm is provided based on the iterative partial matrix shrinkage algorithm, with numerical examples showing the effectiveness of the proposed algorithm.
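Matrix rank minimization for filling in missing entries is often illustrated with nuclear-norm regularized completion. The sketch below uses the generic soft-impute iteration (singular value soft-thresholding) on a matrix with a mask of observed entries. It is a swapped-in, simpler technique, not the paper's multiple AR model or its iterative partial matrix shrinkage algorithm, and the defaults for `lam` and `iters` are illustrative.

```python
import numpy as np

def soft_impute(x: np.ndarray, mask: np.ndarray, lam: float = 0.1, iters: int = 500) -> np.ndarray:
    """Estimate the entries of x where mask is False by nuclear-norm regularized completion.

    Each iteration keeps the observed entries of x, fills the missing ones with the
    current estimate, and soft-thresholds the singular values by lam, which pushes
    the estimate towards low rank.
    """
    z = np.where(mask, x, 0.0)
    for _ in range(iters):
        filled = np.where(mask, x, z)
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        z = (u * np.maximum(s - lam, 0.0)) @ vt  # scale column j of u by the shrunk s[j]
    return z
```

For inpainting, `x` would be a matrix assembled from image patches under some linear-model assumption and `mask` would mark known pixels; how the paper builds its matrices is not shown here.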

  • Knowledge Integration by Probabilistic Argumentation

    Saung Hnin Pwint OO  Nguyen Duy HUNG  Thanaruk THEERAMUNKONG  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2020/05/01
      Vol:
    E103-D No:8
      Page(s):
    1843-1855

    While existing inference engines have solved real-world problems using probabilistic knowledge representation, one challenging task is to use that representation efficiently under uncertainty during conflict resolution. This paper presents a new approach that straightforwardly combines a rule-based system (RB) with a probabilistic graphical inference framework, i.e., a naïve Bayesian network (BN), towards probabilistic argumentation via a so-called probabilistic assumption-based argumentation (PABA) framework. The RB formalizes its rules into defeasible logic under the assumption-based argumentation (ABA) framework, while the BN provides probabilistic reasoning. Through this knowledge integration, the former provides a solid testbed for inference, and the latter helps the former resolve persistent conflicts by setting an acceptance threshold. Experiments on an example of liver disorder diagnosis show the effectiveness of this approach for conflict resolution.

  • Tensor Factor Analysis for Arbitrary Speaker Conversion

    Daisuke SAITO  Nobuaki MINEMATSU  Keikichi HIROSE  

     
    PAPER-Speech and Hearing

      Publicized:
    2020/03/13
      Vol:
    E103-D No:6
      Page(s):
    1395-1405

    This paper describes a novel approach to flexible control of speaker characteristics using a tensor representation of multiple Gaussian mixture models (GMMs). In voice conversion studies, realizing conversion from/to an arbitrary speaker's voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice GMM (EV-GMM) was proposed. In EVC, a speaker space is constructed from GMM supervectors, which are high-dimensional vectors derived by concatenating the mean vectors of each speaker GMM. In the speaker space, each speaker is represented by a small number of weight parameters of eigen-supervectors. In this paper, we revisit the construction of the speaker space by introducing tensor factor analysis of the training data set. In our approach, each speaker is represented as a matrix whose rows and columns correspond to the dimensions of the mean vector and the Gaussian components, respectively. The speaker space is derived by tensor factor analysis of the set of these matrices. Our approach solves an inherent problem of the supervector representation and improves voice conversion performance. In addition, the effects of speaker adaptive training before factorization are investigated. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.

  • Instance Segmentation by Semi-Supervised Learning and Image Synthesis

    Takeru OBA  Norimichi UKITA  

     
    PAPER

      Publicized:
    2020/03/18
      Vol:
    E103-D No:6
      Page(s):
    1247-1256

    This paper proposes a method to create various training images for instance segmentation in a semi-supervised manner. In our proposed learning scheme, a few 3D CG models of target objects and a large number of images retrieved from the Internet by keywords are used for initial model training and model updating, respectively. Instance segmentation requires pixel-level annotations as well as object class labels for all training images. One possible way to reduce the huge annotation cost is to use synthesized images as training images. While image synthesis using a 3D CG simulator can generate annotations automatically, it is difficult to prepare a variety of 3D object models for the simulator. Another possible solution is semi-supervised learning. Semi-supervised learning such as self-training uses a small set of supervised data and a large amount of unsupervised data. In our method, the supervised images are provided by the 3D CG simulator. From the unsupervised images, only correctly detected annotations must be selected. To select them, we propose quantifying the reliability of each detected annotation based on its silhouette as well as its textures. Experimental results demonstrate that the proposed method can generate a greater variety of images for improving instance segmentation.

  • Deep State-Space Model for Noise Tolerant Skeleton-Based Action Recognition

    Kazuki KAWAMURA  Takashi MATSUBARA  Kuniaki UEHARA  

     
    PAPER

      Publicized:
    2020/03/18
      Vol:
    E103-D No:6
      Page(s):
    1217-1225

    Action recognition using skeleton data (3D coordinates of human joints) is an attractive topic due to its robustness to the actor's appearance, camera's viewpoint, illumination, and other environmental conditions. However, skeleton data must be measured by a depth sensor or extracted from video data using an estimation algorithm, and doing so risks extraction errors and noise. In this work, for robust skeleton-based action recognition, we propose a deep state-space model (DSSM). The DSSM is a deep generative model of the underlying dynamics of an observable sequence. We applied the proposed DSSM to skeleton data, and the results demonstrate that it improves the classification performance of a baseline method. Moreover, we confirm that feature extraction with the proposed DSSM renders subsequent classifications robust to noise and missing values. In such experimental settings, the proposed DSSM outperforms a state-of-the-art method.
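The DSSM itself is a deep, nonlinear generative model. As a much simpler point of reference for what a state-space model does with noisy or missing skeleton coordinates, the following sketch runs a classical linear-Gaussian (Kalman) filter with a constant-velocity assumption on a single joint coordinate. The model, the noise levels `q` and `r`, and the use of `None` for missing frames are assumptions for illustration, not part of the proposed method.

```python
def kalman_cv(observations, q: float = 1e-3, r: float = 0.05) -> list:
    """Filter a noisy 1D joint-coordinate sequence with a constant-velocity model.

    observations: list of floats, with None marking missing frames.
    State x = [position, velocity]; transition F = [[1, 1], [0, 1]];
    observation H = [1, 0]; process noise Q = q * I; measurement noise variance r.
    Returns the filtered position for every frame (missing frames are predicted).
    """
    first = observations[0]
    x = [first if first is not None else 0.0, 0.0]
    P = [[1.0, 0.0], [0.0, 1.0]]
    out = []
    for z in observations:
        # Predict: x = F x, P = F P F^T + Q.
        x = [x[0] + x[1], x[1]]
        P = [[P[0][0] + P[0][1] + P[1][0] + P[1][1] + q, P[0][1] + P[1][1]],
             [P[1][0] + P[1][1], P[1][1] + q]]
        if z is not None:
            # Update with the scalar position measurement z.
            s = P[0][0] + r
            k0, k1 = P[0][0] / s, P[1][0] / s
            innov = z - x[0]
            x = [x[0] + k0 * innov, x[1] + k1 * innov]
            P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                 [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        out.append(x[0])
    return out
```

Missing frames are handled by skipping the update step, so the estimate simply follows the model's prediction until observations resume.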

  • Interactive Goal Model Construction Based on a Flow of Questions

    Hiroyuki NAKAGAWA  Hironori SHIMADA  Tatsuhiro TSUCHIYA  

     
    PAPER

      Publicized:
    2020/03/06
      Vol:
    E103-D No:6
      Page(s):
    1309-1318

    Goal modeling is a method that describes requirements structurally. It mainly consists of two tasks: extraction of goals and organization of the extracted goals. Generally, goal modeling requires intensive manual intervention and higher modeling skills than usual requirements description. To mitigate this problem, we propose a method that provides systematic support for constructing goal models. In the method, the requirements analyst answers questions, and a goal model is semi-automatically constructed from the answers. We develop a prototype tool that implements the proposed method and apply it to two systems. The results demonstrate the feasibility of the method.

  • Leveraging Entity-Type Properties in the Relational Context for Knowledge Graph Embedding

    Md Mostafizur RAHMAN  Atsuhiro TAKASU  

     
    PAPER

      Publicized:
    2020/02/03
      Vol:
    E103-D No:5
      Page(s):
    958-968

    Knowledge graph embedding aims to embed the entities and relations of multi-relational data in low-dimensional vector spaces. Knowledge graphs (KGs) are useful for numerous artificial intelligence (AI) applications. However, KGs are far from complete, and hence KG embedding models have quickly gained massive attention. Nevertheless, state-of-the-art KG embedding models ignore the category-specific projection of entities and the impact of entity types in the relational aspect. For example, the entity “Washington” could belong to the person or location category depending on its appearance in a specific relation. In a KG, an entity usually holds many type properties. This leads to an interesting question: are all the type properties of an entity meaningful for a specific relation? In this paper, we propose a KG embedding model, TPRC, that leverages entity-type properties in the relational context. To show the effectiveness of our model, we apply our idea to TransE, TransR, and TransD. Our approach outperforms state-of-the-art approaches such as TransE, TransD, DistMult, and ComplEx. Another important observation is that introducing entity-type properties in the relational context can improve the performance of the original translation-distance-based models.
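The translation-distance models named above score a triple (h, r, t) by how well the relation vector translates the head embedding onto the tail embedding. A minimal NumPy sketch of the TransE and TransR scoring functions follows; the entity-type-aware projection of TPRC is not shown, and the column-vector convention for the projection matrix `m_r` is a choice made for this sketch.

```python
import numpy as np

def transe_score(h: np.ndarray, r: np.ndarray, t: np.ndarray, p: int = 2) -> float:
    """TransE dissimilarity ||h + r - t||_p; lower means (h, r, t) is more plausible."""
    return float(np.linalg.norm(h + r - t, ord=p))

def transr_score(h: np.ndarray, r: np.ndarray, t: np.ndarray, m_r: np.ndarray, p: int = 2) -> float:
    """TransR: project head and tail into the relation space with m_r, then translate.

    m_r has shape (relation_dim, entity_dim); r has shape (relation_dim,).
    """
    return float(np.linalg.norm(m_r @ h + r - m_r @ t, ord=p))
```

Such scores are typically trained with a margin-based ranking loss that separates observed triples from corrupted ones.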

  • On the Complementary Role of DNN Multi-Level Enhancement for Noisy Robust Speaker Recognition in an I-Vector Framework

    Xingyu ZHANG  Xia ZOU  Meng SUN  Penglong WU  Yimin WANG  Jun HE  

     
    LETTER-Speech and Hearing

      Vol:
    E103-A No:1
      Page(s):
    356-360

    To improve the noise robustness of automatic speaker recognition, many speech/feature enhancement techniques using deep neural networks (DNNs) have been explored. In this work, a DNN multi-level enhancement (DNN-ME), which consists of signal enhancement, cepstrum enhancement, and i-vector enhancement stages, is proposed for text-independent speaker recognition. Because these enhancement methods are applied at different stages of the speaker recognition pipeline, it is worth exploring their complementary roles, which helps clarify the pros and cons of enhancement at each stage. To exploit the capabilities of DNN-ME as fully as possible, two approaches, Cascaded DNN-ME and joint input of DNNs, are studied. Weighted Gaussian mixture models (WGMMs) proposed in our previous work are also applied to further improve performance. Experiments conducted on the Speakers in the Wild (SITW) database show that DNN-ME is significantly superior to systems with only a single enhancement for noise-robust speaker recognition. Compared with the i-vector baseline, the equal error rate (EER) was reduced from 5.75 to 4.01.
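The equal error rate quoted above is the operating point at which the false acceptance rate and the false rejection rate coincide. A simple threshold-sweep sketch follows; it returns a fraction (multiply by 100 for a percentage), and the interpolation between thresholds that some toolkits use is omitted.

```python
def equal_error_rate(target_scores, nontarget_scores) -> float:
    """Approximate EER by trying every observed score as the decision threshold.

    A trial is accepted when score >= threshold.
    FRR = rejected target trials / target trials;
    FAR = accepted non-target trials / non-target trials.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for th in thresholds:
        frr = sum(s < th for s in target_scores) / len(target_scores)
        far = sum(s >= th for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]
```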

  • Transferring Adaptive Bit Rate Streaming Quality Models from H.264/HD to H.265/4K-UHD Open Access

    Pierre LEBRETON  Kazuhisa YAMAGISHI  

     
    PAPER-Network

      Publicized:
    2019/06/25
      Vol:
    E102-B No:12
      Page(s):
    2226-2242

    In this paper, the quality of adaptive bit rate video streaming is investigated, and two state-of-the-art models, i.e., the NTT audiovisual quality-estimation model and the ITU-T P.1203 model, are considered. This paper shows how these models can be applied to new conditions, e.g., 4K ultra high definition (4K-UHD) videos encoded using H.265, considering that they were originally designed and trained for HD videos encoded with H.264. Six subjective evaluations involving up to 192 participants and a large variety of test conditions, e.g., durations from 10 s to 3 min, coding-quality variation, and stalling events, were conducted on both TV and mobile devices. Using the subjective data, this paper addresses how models and coefficients can be transferred to new conditions. A comparison between state-of-the-art models is conducted, showing the performance of transferred and retrained models. It is found that other video-quality-estimation models, such as VMAF, can be used as input to the NTT and ITU-T P.1203 long-term pooling modules, allowing these models to account for the specificities of adaptive bit rate streaming scenarios. Finally, all retrained coefficients are detailed in this paper, allowing future work to directly reuse the results of this study.

  • Latent Words Recurrent Neural Network Language Models for Automatic Speech Recognition

    Ryo MASUMURA  Taichi ASAMI  Takanobu OBA  Sumitaka SAKAUCHI  Akinori ITO  

     
    PAPER-Speech and Hearing

      Publicized:
    2019/09/25
      Vol:
    E102-D No:12
      Page(s):
    2557-2567

    This paper presents latent words recurrent neural network language models (LW-RNN-LMs) for enhancing automatic speech recognition (ASR). LW-RNN-LMs are constructed to combine the advantages of recurrent neural network language models (RNN-LMs) and latent words language models (LW-LMs). RNN-LMs can capture long-range context information and offer strong performance, and LW-LMs are robust for out-of-domain tasks owing to their latent word space modeling. However, RNN-LMs cannot explicitly capture hidden relationships behind observed words because they lack the concept of a latent variable space. In addition, LW-LMs cannot take into account long-range relationships between latent words. Our idea is to combine RNN-LM and LW-LM so as to compensate for their individual disadvantages. LW-RNN-LMs simultaneously support latent variable space modeling, as LW-LMs do, and long-range relationship modeling, as RNN-LMs do. From the viewpoint of RNN-LMs, an LW-RNN-LM can be considered a soft-class RNN-LM with a vast latent variable space. In contrast, from the viewpoint of LW-LMs, an LW-RNN-LM can be considered an LW-LM that uses an RNN structure instead of an n-gram structure for latent variable modeling. This paper also details a parameter inference method and two implementation methods for introducing the LW-LM to ASR, an n-gram approximation and a Viterbi approximation. Our experiments show the effectiveness of LW-RNN-LMs in a perplexity evaluation on the Penn Treebank corpus and an ASR evaluation on Japanese spontaneous speech tasks.
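The perplexity evaluation mentioned above exponentiates the average negative log-probability that a language model assigns to each token of a test text. A minimal sketch, assuming the per-token probabilities have already been computed by some language model:

```python
import math

def perplexity(token_probs) -> float:
    """PPL = exp(-(1/N) * sum(log p_i)) over the model probabilities of the N observed tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))
```

A model that spreads probability uniformly over a vocabulary of size V has perplexity V, so lower values indicate a model that predicts the test text more sharply.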

  • Pre-Training of DNN-Based Speech Synthesis Based on Bidirectional Conversion between Text and Speech

    Kentaro SONE  Toru NAKASHIKA  

     
    PAPER-Speech and Hearing

      Publicized:
    2019/05/15
      Vol:
    E102-D No:8
      Page(s):
    1546-1553

    Conventional approaches to statistical parametric speech synthesis use context-dependent hidden Markov models (HMMs) clustered using decision trees to generate speech parameters from linguistic features. However, decision trees are not always appropriate for efficiently modeling complex context dependencies of linguistic features. An alternative scheme that replaces decision trees with deep neural networks (DNNs) was presented as a possible way to overcome this difficulty. By training the network to represent high-dimensional feedforward dependencies from linguistic features to acoustic features, DNN-based speech synthesis systems convert text into speech. To improve the naturalness of the synthesized speech, this paper presents a novel pre-training method for DNN-based statistical parametric speech synthesis systems. In our method, a deep relational model (DRM), which represents a joint probability of two visible variables, is applied to describe the joint distribution of acoustic and linguistic features. As with DNNs, a DRM consists of several hidden layers and two visible layers. Whereas DNNs represent feedforward dependencies from one visible variable (inputs) to another visible variable (outputs), a DRM can represent bidirectional dependencies between two visible variables. During maximum-likelihood (ML)-based training, the model optimizes the parameters of a deep architecture (connection weights between adjacent layers, and biases) considering the bidirectional conversion between 1) acoustic features given linguistic features and 2) linguistic features given the acoustic features generated by the model itself. Because it considers whether the generated acoustic features are recognizable, our method can obtain reasonable parameters for speech synthesis. Experimental results on a speech synthesis task show that pre-trained DNN-based systems using our proposed method outperformed randomly initialized DNN-based systems, especially when the amount of training data is limited. Additionally, speaker-dependent speech recognition experiments also show that our method outperformed DNN-based systems when the initial parameters of our method were set to the same values as in the synthesis experiments.
