
Keyword Search Result

[Keyword] LSTM (32 hits)

Results 21-32 of 32

  • Attention-Based Dense LSTM for Speech Emotion Recognition Open Access

    Yue XIE  Ruiyu LIANG  Zhenlin LIANG  Li ZHAO  

     
    LETTER-Pattern Recognition

    Publicized: 2019/04/17
    Vol: E102-D No:7
    Page(s): 1426-1429

    Despite the widespread use of deep learning for speech emotion recognition, such models are severely restricted by information loss in the higher layers of deep neural networks and by the degradation problem. To utilize information efficiently and mitigate degradation, an attention-based dense long short-term memory (LSTM) network is proposed for speech emotion recognition. LSTM networks, suited to time series such as speech, are constructed with attention-based dense connections: weight coefficients are added to the skip-connections of each layer to distinguish the emotional information carried by different layers and to keep redundant information from the bottom layers from interfering with the effective information of the top layers. Experiments demonstrate that the proposed method improves recognition performance by 12% and 7% on the eNTERFACE and IEMOCAP corpora, respectively.
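
    A minimal sketch of the weighted dense skip-connection idea described above, assuming one learnable scalar gate per connection (the paper's exact attention formulation may differ):

    ```python
    import torch
    import torch.nn as nn

    class DenseAttentionLSTM(nn.Module):
        def __init__(self, input_dim, hidden_dim, num_layers=3):
            super().__init__()
            self.layers = nn.ModuleList()
            for i in range(num_layers):
                in_dim = input_dim if i == 0 else hidden_dim
                self.layers.append(nn.LSTM(in_dim, hidden_dim, batch_first=True))
            # One learnable weight per skip-connection (layer j -> layer i, j < i).
            self.skip = nn.ParameterDict({
                f"{j}to{i}": nn.Parameter(torch.zeros(1))
                for i in range(num_layers) for j in range(i)
            })

        def forward(self, x):
            outputs, h = [], x
            for i, lstm in enumerate(self.layers):
                h, _ = lstm(h)
                for j in range(i):  # weighted dense connections from earlier layers
                    h = h + torch.sigmoid(self.skip[f"{j}to{i}"]) * outputs[j]
                outputs.append(h)
            return outputs[-1]
    ```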

  • Quantitative Analyses on Effects from Constraints in Air-Writing Open Access

    Songbin XU  Yang XUE  Yuqing CHEN  

     
    LETTER-Artificial Intelligence, Data Mining

    Publicized: 2019/01/28
    Vol: E102-D No:4
    Page(s): 867-870

    Few existing works on inertial-sensor-based air-writing have examined how writing constraints affect recognition performance. We propose an LSTM-based system and present quantitative analyses under different constraint settings, comparing against CHMM, DTW-AP, and CNN baselines. The proposed system shows advantages in accuracy, real-time performance, and flexibility.
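
    For orientation, a generic LSTM classifier over fixed-length inertial windows might look as follows (the 6-axis input, window length, and layer sizes are illustrative assumptions, not the paper's configuration):

    ```python
    import torch
    import torch.nn as nn

    class AirWritingLSTM(nn.Module):
        """Classify air-written characters from 6-axis IMU sequences."""
        def __init__(self, num_classes, input_dim=6, hidden_dim=64):
            super().__init__()
            self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers=2, batch_first=True)
            self.fc = nn.Linear(hidden_dim, num_classes)

        def forward(self, x):           # x: (batch, time, 6) accel + gyro
            out, _ = self.lstm(x)
            return self.fc(out[:, -1])  # classify from the last time step

    model = AirWritingLSTM(num_classes=26)
    logits = model(torch.randn(8, 100, 6))  # 8 windows of 100 samples each
    ```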

  • Feature Based Domain Adaptation for Neural Network Language Models with Factorised Hidden Layers

    Michael HENTSCHEL  Marc DELCROIX  Atsunori OGAWA  Tomoharu IWATA  Tomohiro NAKATANI  

     
    PAPER-Speech and Hearing

    Publicized: 2018/12/04
    Vol: E102-D No:3
    Page(s): 598-608

    Language models are a key technology in tasks such as speech recognition and machine translation. They are usually applied to texts covering various domains, so domain adaptation has long been a challenge in language model research. With the rising popularity of neural network based language models, many adaptation methods have been proposed in recent years. These methods fall into two categories: model-based and feature-based adaptation. Compared with model-based adaptation, feature-based domain adaptation has the advantage that it does not require domain labels in the corpus. Most existing feature-based methods adapt only a bias term. We propose a novel feature-based domain adaptation technique using hidden layer factorisation. It differs fundamentally from existing methods in that the domain features are used to compute a linear combination of linear layers, which can capture both domain-specific information and information shared across domains. In the experiments, we compare the proposed method with existing adaptation techniques based on two different ideas: bias-based adaptation and gating of hidden units. All language models in the comparison use state-of-the-art long short-term memory based recurrent neural networks. We demonstrate the effectiveness of the proposed method with perplexity results on the well-known Penn Treebank and speech recognition results on a corpus of TED talks.
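
    The core idea, a hidden layer whose weight matrix is a domain-feature-dependent mixture of basis matrices, can be sketched as follows (the number of bases and the softmax mixing are assumptions):

    ```python
    import torch
    import torch.nn as nn

    class FactorisedLinear(nn.Module):
        """Weights are a domain-weighted combination of K basis linear layers."""
        def __init__(self, in_dim, out_dim, domain_dim, num_bases=4):
            super().__init__()
            self.bases = nn.Parameter(torch.randn(num_bases, out_dim, in_dim) * 0.01)
            self.mix = nn.Linear(domain_dim, num_bases)  # domain features -> mixing weights

        def forward(self, x, domain_feat):
            alpha = torch.softmax(self.mix(domain_feat), dim=-1)   # (batch, K)
            W = torch.einsum('bk,koi->boi', alpha, self.bases)     # per-example weights
            return torch.einsum('boi,bi->bo', W, x)

    layer = FactorisedLinear(64, 64, domain_dim=10)
    y = layer(torch.randn(8, 64), torch.randn(8, 10))
    ```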

  • Automatic Speech Recognition System with Output-Gate Projected Gated Recurrent Unit

    Gaofeng CHENG  Pengyuan ZHANG  Ji XU  

     
    PAPER-Speech and Hearing

    Publicized: 2018/11/19
    Vol: E102-D No:2
    Page(s): 355-363

    The long short-term memory recurrent neural network (LSTM) has achieved tremendous success in automatic speech recognition (ASR). However, the complicated gating mechanism of the LSTM introduces a massive computational cost and limits its application in some scenarios. In this paper, we describe our work on accelerating decoding speed and improving decoding accuracy. First, we propose an architecture called the Projected Gated Recurrent Unit (PGRU) for ASR tasks, and show that the PGRU consistently outperforms the standard GRU. Second, to improve the generalization of the PGRU, particularly on large-scale ASR tasks, we propose the Output-gate PGRU (OPGRU). In addition, the time delay neural network (TDNN) and normalization methods are found beneficial for the OPGRU. We apply the OPGRU to both the acoustic model and the recurrent neural network language model (RNN-LM). Finally, we evaluate the proposed models on the full Eval2000 / RT03 test sets, where the OPGRU single ASR system achieves 0.9% / 0.9% absolute (8.2% / 8.6% relative) reductions in word error rate (WER) compared with our previous best LSTM single ASR system. Furthermore, the OPGRU ASR system achieves a significant speed-up in both acoustic model computation and language model rescoring.
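
    A rough sketch of a projected GRU cell, by analogy with the projected LSTM (this is an assumed formulation for illustration; the paper's PGRU/OPGRU equations are not reproduced here):

    ```python
    import torch
    import torch.nn as nn

    class ProjectedGRUCell(nn.Module):
        """GRU cell with a low-rank recurrent projection."""
        def __init__(self, input_dim, hidden_dim, proj_dim):
            super().__init__()
            self.cell = nn.GRUCell(input_dim, hidden_dim)
            self.proj = nn.Linear(hidden_dim, proj_dim, bias=False)
            self.up = nn.Linear(proj_dim, hidden_dim, bias=False)

        def forward(self, x_t, r_t):
            # r_t is the projected recurrent state carried between steps.
            h_t = self.cell(x_t, self.up(r_t))  # expand projection back to hidden size
            return self.proj(h_t)               # project down for the next step / output

    cell = ProjectedGRUCell(40, 512, 128)
    r = torch.zeros(8, 128)
    for t in range(10):
        r = cell(torch.randn(8, 40), r)
    ```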

  • Multi Long-Short Term Memory Models for Short Term Traffic Flow Prediction

    Zelong XUE  Yang XUE  

     
    LETTER-Biocybernetics, Neurocomputing

    Publicized: 2018/09/18
    Vol: E101-D No:12
    Page(s): 3272-3275

    Many single-model methods have been applied to real-time short-term traffic flow prediction. However, because traffic flow data mixes a variety of components, the performance of a single model is limited. We therefore propose Multi Long-Short Term Memory Models, which improve traffic flow prediction accuracy compared with state-of-the-art models.
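
    One plausible reading of a multi-LSTM combination, sketched with simple prediction averaging (the combination scheme and sizes are assumptions, not the paper's method):

    ```python
    import torch
    import torch.nn as nn

    class LSTMForecaster(nn.Module):
        def __init__(self, hidden_dim=32):
            super().__init__()
            self.lstm = nn.LSTM(1, hidden_dim, batch_first=True)
            self.fc = nn.Linear(hidden_dim, 1)

        def forward(self, x):           # x: (batch, time, 1) flow counts
            out, _ = self.lstm(x)
            return self.fc(out[:, -1])  # next-interval flow

    class MultiLSTM(nn.Module):
        """Average the forecasts of several independently trained LSTMs."""
        def __init__(self, n_models=3):
            super().__init__()
            self.models = nn.ModuleList(LSTMForecaster() for _ in range(n_models))

        def forward(self, x):
            return torch.stack([m(x) for m in self.models]).mean(dim=0)

    pred = MultiLSTM()(torch.randn(8, 12, 1))  # 12 past intervals -> next interval
    ```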

  • Multi-Channels LSTM Networks for Fence Activity Classification

    Kelu HU  Chunlei ZHENG  Wei HE  Xinghe BAO  Yingguan WANG  

     
    LETTER-Biocybernetics, Neurocomputing

    Publicized: 2018/04/23
    Vol: E101-D No:8
    Page(s): 2173-2177

    We propose a novel LSTM-based neural network model for classifying data from inertial sensors attached to a fence, with the goal of detecting security-relevant incidents. To evaluate it, we deployed an experimental fence surveillance system. Comparing experimental results across approaches, we find that the neural network outperforms the baseline approach.
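
    A sketch of a multi-channel arrangement, one LSTM per sensor channel with the final states fused for classification (channel count and fusion strategy are assumptions):

    ```python
    import torch
    import torch.nn as nn

    class MultiChannelLSTM(nn.Module):
        def __init__(self, num_channels=3, hidden_dim=32, num_classes=4):
            super().__init__()
            self.channels = nn.ModuleList(
                nn.LSTM(1, hidden_dim, batch_first=True) for _ in range(num_channels))
            self.fc = nn.Linear(num_channels * hidden_dim, num_classes)

        def forward(self, x):  # x: (batch, time, num_channels)
            feats = []
            for c, lstm in enumerate(self.channels):
                out, _ = lstm(x[:, :, c:c + 1])  # run each channel separately
                feats.append(out[:, -1])
            return self.fc(torch.cat(feats, dim=-1))

    logits = MultiChannelLSTM()(torch.randn(8, 200, 3))
    ```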

  • A Deep Learning-Based Approach to Non-Intrusive Objective Speech Intelligibility Estimation

    Deokgyu YUN  Hannah LEE  Seung Ho CHOI  

     
    LETTER-Speech and Hearing

    Publicized: 2018/01/09
    Vol: E101-D No:4
    Page(s): 1207-1208

    This paper proposes a deep learning-based non-intrusive objective speech intelligibility estimation method based on a recurrent neural network (RNN) with a long short-term memory (LSTM) structure. Conventional non-intrusive estimation methods, such as the standard P.563, have poor estimation performance and lack consistency, especially in varied noise and reverberation environments. The proposed method trains the LSTM RNN parameters using STOI, the standard intrusive intelligibility measure computed against a reference speech signal. The input and output of the LSTM RNN are the MFCC vector and the frame-wise STOI value, respectively. Experimental results show that the proposed objective intelligibility estimation method outperforms the conventional standard P.563 in various noisy and reverberant environments.
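
    A minimal sketch of the training setup, mapping per-frame MFCCs to frame-wise STOI targets with an MSE loss (dimensions are assumptions; the STOI labels would come from the intrusive reference-based measure):

    ```python
    import torch
    import torch.nn as nn

    class StoiEstimator(nn.Module):
        def __init__(self, n_mfcc=13, hidden_dim=64):
            super().__init__()
            self.lstm = nn.LSTM(n_mfcc, hidden_dim, batch_first=True)
            self.fc = nn.Linear(hidden_dim, 1)

        def forward(self, mfcc):                # (batch, frames, n_mfcc)
            out, _ = self.lstm(mfcc)
            return torch.sigmoid(self.fc(out))  # STOI lies roughly in [0, 1]

    model = StoiEstimator()
    mfcc = torch.randn(4, 200, 13)
    target = torch.rand(4, 200, 1)              # frame-wise STOI labels
    loss = nn.functional.mse_loss(model(mfcc), target)
    loss.backward()
    ```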

  • Next-Activity Set Prediction Based on Sequence Partitioning to Reduce Activity Pattern Complexity in the Multi-User Smart Space

    Younggi KIM  Younghee LEE  

     
    PAPER-Pattern Recognition

    Publicized: 2017/07/18
    Vol: E100-D No:10
    Page(s): 2587-2596

    Human activity prediction has become a prerequisite for service recommendation and anomaly detection systems in smart spaces, including ambient assisted living (AAL) and activities of daily living (ADL). In this paper, we present a novel approach to predicting the next-activity set in a multi-user smart space. Unlike the majority of previous studies, which consider single-user activity patterns, our study considers multi-user activities, which occur in a large variety of patterns whose complexity increases exponentially with the number of users. In a multi-user smart space, there are inevitably multiple next-activity candidates after multi-user activities occur. To solve the next-activity problem in a multi-user situation, we propose predicting an activity set rather than a single activity. We also propose activity sequence partitioning to reduce the complexity of multi-user activity patterns: an activity sequence is divided into start, ongoing, and finish zones based on tendencies in activity occurrence. The majority of activities in a multi-user environment occur at the beginning or end, rather than the middle, of an activity sequence, and the types of activities typically occurring in each zone are sufficiently distinguishable. Exploiting these characteristics, we suggest a two-step procedure to predict the next-activity set using a long short-term memory (LSTM) model. The first step identifies the zone to which the current activities belong. In the second step, three different LSTM models, one per zone, predict the next-activity set. To evaluate the proposed approach, we experimented with a real dataset generated from our campus testbed. The experiments confirmed the complexity reduction and the high accuracy of next-activity set prediction, so the approach can be effectively utilized for various context-aware applications in a multi-user smart space.
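
    The two-step procedure might be sketched as follows, with zone-specific LSTMs producing a multi-label next-activity set (vocabulary size, thresholding, and model details are assumptions):

    ```python
    import torch
    import torch.nn as nn

    class ZoneLSTM(nn.Module):
        """Predict the next-activity set for one zone as a multi-label output."""
        def __init__(self, n_activities, hidden_dim=64):
            super().__init__()
            self.embed = nn.Embedding(n_activities, 32)
            self.lstm = nn.LSTM(32, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, n_activities)

        def forward(self, seq):  # seq: (batch, time) activity ids
            out, _ = self.lstm(self.embed(seq))
            return torch.sigmoid(self.head(out[:, -1]))  # per-activity probabilities

    # Step 1 (zone identification) is assumed done; step 2 picks the zone model.
    zones = {z: ZoneLSTM(n_activities=50) for z in ("start", "ongoing", "finish")}

    def predict_next_set(seq, zone, threshold=0.5):
        probs = zones[zone](seq)                              # (1, n_activities)
        return (probs > threshold).nonzero(as_tuple=True)[1]  # predicted activity ids

    next_set = predict_next_set(torch.tensor([[3, 7, 1]]), "start")
    ```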

  • Relation Extraction with Deep Reinforcement Learning

    Hongjun ZHANG  Yuntian FENG  Wenning HAO  Gang CHEN  Dawei JIN  

     
    PAPER-Natural Language Processing

    Publicized: 2017/05/17
    Vol: E100-D No:8
    Page(s): 1893-1902

    In recent years, deep learning has been widely applied to the relation extraction task. Such methods use only word embeddings as network input and can model relations between target named entity pairs. However, they treat every relation mention equally, so they cannot effectively extract relations from a corpus containing an enormous number of non-relations, which is the main reason the performance of relation extraction is significantly lower than that of relation classification. This paper designs a deep reinforcement learning framework for relation extraction that casts the task as a two-step decision-making game. The method models relation mentions with a CNN and a Tree-LSTM, which compute the initial state and the transition states of the game, respectively. In addition, the problem of an unbalanced corpus is tackled by designing a penalty function that increases the penalty for first-step decision-making errors. Finally, we use the Q-learning algorithm with value function approximation to learn the control policy π for the game. A series of experiments on the ACE2005 corpus shows that the deep reinforcement learning framework achieves state-of-the-art performance on the relation extraction task.
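
    A toy sketch of Q-learning with value function approximation and a heavier penalty on first-step errors (the state encoding, reward scheme, and sizes are all illustrative assumptions):

    ```python
    import torch
    import torch.nn as nn

    q_net = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
    gamma, first_step_penalty = 0.9, 2.0

    def q_update(state, action, reward, next_state, is_first_step, done):
        if reward < 0 and is_first_step:
            reward *= first_step_penalty  # punish first-step mistakes harder
        with torch.no_grad():
            target = reward + (0.0 if done else gamma * q_net(next_state).max())
        loss = (q_net(state)[action] - target) ** 2  # TD error on Q(s, a)
        opt.zero_grad(); loss.backward(); opt.step()

    q_update(torch.randn(128), 1, -1.0, torch.randn(128), True, False)
    ```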

  • An Attention-Based Hybrid Neural Network for Document Modeling

    Dengchao HE  Hongjun ZHANG  Wenning HAO  Rui ZHANG  Huan HAO  

     
    LETTER-Artificial Intelligence, Data Mining

    Publicized: 2017/03/21
    Vol: E100-D No:6
    Page(s): 1372-1375

    The purpose of document modeling is to learn accurate low-dimensional semantic representations of text for natural language processing tasks. In this paper, we propose a novel attention-based hybrid neural network model that extracts semantic features of text hierarchically. Concretely, the model adopts a bidirectional LSTM module with word-level attention to extract semantic information from each sentence and subsequently learns higher-level features via a dynamic convolutional neural network module. Experimental results demonstrate that the proposed approach is effective and achieves better performance than conventional methods.
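
    The word-level attention over BiLSTM outputs can be sketched in a common formulation (assumed here; the paper may differ in detail):

    ```python
    import torch
    import torch.nn as nn

    class AttentiveBiLSTMEncoder(nn.Module):
        """BiLSTM sentence encoder with word-level attention pooling."""
        def __init__(self, emb_dim=100, hidden_dim=64):
            super().__init__()
            self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                                bidirectional=True)
            self.attn = nn.Linear(2 * hidden_dim, 1)

        def forward(self, words):                      # (batch, words, emb_dim)
            h, _ = self.lstm(words)                    # (batch, words, 2*hidden)
            scores = torch.softmax(self.attn(h), dim=1)
            return (scores * h).sum(dim=1)             # attention-weighted sentence vector

    sent_vec = AttentiveBiLSTMEncoder()(torch.randn(8, 20, 100))  # -> (8, 128)
    ```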

  • LSTM-CRF Models for Named Entity Recognition

    Changki LEE  

     
    PAPER-Natural Language Processing

    Publicized: 2017/01/20
    Vol: E100-D No:4
    Page(s): 882-887

    Recurrent neural networks (RNNs) are a powerful model for sequential data. RNNs that use long short-term memory (LSTM) cells have proven effective in handwriting recognition, language modeling, speech recognition, and language comprehension tasks. In this study, we propose LSTM conditional random fields (LSTM-CRF), an LSTM-based RNN model that uses output-label dependencies with transition features and a CRF-like sequence-level objective function. We also propose variations of the LSTM-CRF model using a gated recurrent unit (GRU) and a structurally constrained recurrent network (SCRN). Empirical results reveal that our proposed models attain state-of-the-art performance for named entity recognition.
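
    The CRF-style decoding over LSTM emission scores and a label-transition matrix can be sketched with standard Viterbi decoding (a textbook formulation, assumed here):

    ```python
    import torch

    def viterbi_decode(emissions, transitions):
        # emissions: (time, n_tags); transitions[i, j]: score of tag i -> tag j
        T, n_tags = emissions.shape
        score = emissions[0]
        backptr = []
        for t in range(1, T):
            # best previous tag for each current tag
            total = score.unsqueeze(1) + transitions + emissions[t].unsqueeze(0)
            score, idx = total.max(dim=0)
            backptr.append(idx)
        best = [int(score.argmax())]
        for idx in reversed(backptr):  # trace back the best path
            best.append(int(idx[best[-1]]))
        return list(reversed(best))

    tags = viterbi_decode(torch.randn(6, 5), torch.randn(5, 5))  # 6 words, 5 tags
    ```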

  • Recognition of Online Handwritten Math Symbols Using Deep Neural Networks

    Hai DAI NGUYEN  Anh DUC LE  Masaki NAKAGAWA  

     
    PAPER-Pattern Recognition

    Publicized: 2016/08/30
    Vol: E99-D No:12
    Page(s): 3110-3118

    This paper presents deep learning for recognizing online handwritten mathematical symbols. Recently, deep learning architectures such as convolutional neural networks (CNNs), deep neural networks (DNNs), recurrent neural networks (RNNs), and long short-term memory (LSTM) RNNs have been applied to computer vision, speech recognition, and natural language processing, where they have shown performance superior to state-of-the-art methods on various tasks. In this paper, max-out-based CNNs are applied to image patterns rendered from online patterns, bidirectional LSTM (BLSTM) networks are applied to the original online patterns, and the two are then combined. They are compared with traditional recognition methods, namely MRFs and MQDFs, in recognition experiments on the CROHME database, along with analysis and explanation.
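
    A generic max-out convolution block, the kind of component named above, can be sketched as follows (kernel size, piece count, and dimensions are assumptions; the paper's exact architecture is not reproduced here):

    ```python
    import torch
    import torch.nn as nn

    class MaxoutConv2d(nn.Module):
        """Convolution producing k feature-map pieces per output channel,
        followed by an element-wise maximum over the pieces."""
        def __init__(self, in_ch, out_ch, k=2, kernel_size=3):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch * k, kernel_size, padding=1)
            self.out_ch, self.k = out_ch, k

        def forward(self, x):
            y = self.conv(x)  # (batch, out_ch*k, H, W)
            b, _, h, w = y.shape
            return y.view(b, self.out_ch, self.k, h, w).max(dim=2).values

    feat = MaxoutConv2d(1, 16)(torch.randn(4, 1, 32, 32))  # -> (4, 16, 32, 32)
    ```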
