IEICE global.ieice.org Site

Keyword Search Result

[Keyword] recurrent neural network(39hit)

1-20hit(39hit)

Intelligent Tool Condition Monitoring Based on Multi-Scale Convolutional Recurrent Neural Network
Xincheng CAO Bin YAO Binqiang CHEN Wangpeng HE Suqin GUO Kun CHEN

PAPER-Smart Industry

Publicized:
2022/06/16
Vol:
E106-D No:5
Page(s):
644-652
Tool condition monitoring is one of the core tasks of intelligent manufacturing in the digital workshop. This paper presents an intelligent method for recognizing tool condition based on deep learning. First, an industrial microphone is used to collect the acoustic signal during machining; then, a central fractal decomposition algorithm is proposed to extract sensitive information; finally, a multi-scale convolutional recurrent neural network is used for deep feature extraction and pattern recognition. Multi-process milling experiments showed that the proposed method is superior to existing methods, with a recognition accuracy of 88%.
Analysis on Norms of Word Embedding and Hidden Vectors in Neural Conversational Model Based on Encoder-Decoder RNN
Manaya TOMIOKA Tsuneo KATO Akihiro TAMURA

PAPER-Natural Language Processing

Publicized:
2022/06/30
Vol:
E105-D No:10
Page(s):
1780-1789
A neural conversational model (NCM) based on an encoder-decoder recurrent neural network (RNN) with an attention mechanism learns different sequence-to-sequence mappings from what neural machine translation (NMT) learns even when based on the same technique. In the NCM, we confirmed that target-word-to-source-word mappings captured by the attention mechanism are not as clear and stationary as those for NMT. Considering that vector norms indicate the magnitude of information in the processing, we analyzed the inner workings of an encoder-decoder GRU-based NCM focusing on the norms of word embedding vectors and hidden vectors. First, we conducted correlation analyses on the norms of word embedding vectors with frequencies in the training set and with conditional entropies of a bi-gram language model to understand what is correlated with the norms in the encoder and decoder. Second, we conducted correlation analyses on norms of change in the hidden vector of the recurrent layer with their input vectors for the encoder and decoder, respectively. These analyses were done to understand how the magnitude of information propagates through the network. The analytical results suggested that the norms of the word embedding vectors are associated with their semantic information in the encoder, whereas in the decoder they are associated with predictability as a language model. The analytical results further revealed how the norms propagate through the recurrent layer in the encoder and decoder.
A Trade-Off between Memory Stability and Connection Sparsity in Simple Binary Associative Memories
Kento SAKA Toshimichi SAITO

LETTER-Nonlinear Problems

Publicized:
2022/03/29
Vol:
E105-A No:9
Page(s):
1377-1380
This letter studies a biobjective optimization problem in binary associative memories characterized by ternary connection parameters. First, we introduce a condition on the parameters that guarantees storage of any desired memories and suppression of oscillatory behavior. Second, we define a biobjective problem based on two objectives that evaluate the uniform stability of the desired memories and the sparsity of the connection parameters. Performing precise numerical analysis for typical examples, we clarify the existence of a trade-off between the two objectives.
Fast Gated Recurrent Network for Speech Synthesis
Bima PRIHASTO Tzu-Chiang TAI Pao-Chi CHANG Jia-Ching WANG

LETTER-Speech and Hearing

Publicized:
2022/06/10
Vol:
E105-D No:9
Page(s):
1634-1638
The recurrent neural network (RNN) has been used in audio and speech processing, such as language translation and speech recognition. Although RNN-based architectures can be applied to speech synthesis, the long computing time is still the primary concern. This research proposes a fast gated recurrent neural network, a fast RNN-based architecture, for speech synthesis based on the minimal gated unit (MGU). Our architecture removes the unit state history from some equations in the MGU. Our MGU-based architecture is about twice as fast as the other MGU-based architectures, with equally good sound quality.
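For orientation, the following is a minimal sketch of one step of a standard minimal gated unit, which uses a single forget gate. It does not reproduce the letter's fast variant, because the abstract does not say which equations drop the state history. The parameter names (`W_f`, `U_f`, `b_f`, `W_h`, `U_h`, `b_h`) and the helper functions are assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic sigmoid for a scalar."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def mgu_step(x, h_prev, params):
    """One step of a standard MGU; `params` is a hypothetical dict of weights."""
    # Forget gate: f = sigmoid(W_f x + U_f h_prev + b_f)
    f = [sigmoid(a + b + c) for a, b, c in zip(
        matvec(params["W_f"], x),
        matvec(params["U_f"], h_prev),
        params["b_f"])]
    # Candidate state: h_tilde = tanh(W_h x + U_h (f * h_prev) + b_h)
    gated = [fi * hi for fi, hi in zip(f, h_prev)]
    h_tilde = [math.tanh(a + b + c) for a, b, c in zip(
        matvec(params["W_h"], x),
        matvec(params["U_h"], gated),
        params["b_h"])]
    # New state: h = (1 - f) * h_prev + f * h_tilde
    return [(1.0 - fi) * hi + fi * ti
            for fi, hi, ti in zip(f, h_prev, h_tilde)]
```

Both the gate and the candidate state read `h_prev`, so a variant that removes the state history from some equations would presumably drop one or more of the `U_f` or `U_h` terms; which ones the letter removes is not stated here.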
Polarity Classification of Social Media Feeds Using Incremental Learning — A Deep Learning Approach
Suresh JAGANATHAN Sathya MADHUSUDHANAN

PAPER-Neural Networks and Bioengineering

Publicized:
2021/09/15
Vol:
E105-A No:3
Page(s):
584-593
Online feeds are streamed continuously in batches with varied polarities at varying times. The system handling the online feeds must be trained to classify all the varying polarities occurring dynamically. The polarity classification system designed for the online feeds must address two significant challenges: i) stability-plasticity, ii) category-proliferation. The challenges faced in the polarity classification of online feeds can be addressed using the technique of incremental learning, which serves to learn new classes dynamically while retaining the previously learned knowledge. This paper proposes a new incremental learning methodology, ILOF (Incremental Learning of Online Feeds), to classify the feeds by adopting deep learning techniques such as RNN (Recurrent Neural Networks) and LSTM (Long Short Term Memory), and also ELM (Extreme Learning Machine), to address the above-stated problems. The proposed method creates a separate model for each batch using ELM and incrementally learns from the trained batches. The training of each batch avoids the retraining of old feeds, thus saving training time and memory space. The trained feeds can be discarded when a new batch of feeds arrives. Experiments are carried out using standard datasets comprising long feeds (IMDB, Sentiment140) and short feeds (Twitter, WhatsApp, and Twitter airline sentiment), and the proposed method showed positive results in terms of better performance and accuracy.
Joint Analysis of Sound Events and Acoustic Scenes Using Multitask Learning
Noriyuki TONAMI Keisuke IMOTO Ryosuke YAMANISHI Yoichi YAMASHITA

PAPER-Speech and Hearing

Publicized:
2020/11/19
Vol:
E104-D No:2
Page(s):
294-301
Sound event detection (SED) and acoustic scene classification (ASC) are important research topics in environmental sound analysis. Many research groups have addressed SED and ASC using neural-network-based methods, such as the convolutional neural network (CNN), recurrent neural network (RNN), and convolutional recurrent neural network (CRNN). The conventional methods address SED and ASC separately even though sound events and acoustic scenes are closely related to each other. For example, in the acoustic scene “office,” the sound events “mouse clicking” and “keyboard typing” are likely to occur. Therefore, it is expected that information on sound events and acoustic scenes will be of mutual aid for SED and ASC. In this paper, we propose multitask learning for joint analysis of sound events and acoustic scenes, in which the parts of the networks holding information on sound events and acoustic scenes in common are shared. Experimental results obtained using the TUT Sound Events 2016/2017 and TUT Acoustic Scenes 2016 datasets indicate that the proposed method improves the performance of SED and ASC by 1.31 and 1.80 percentage points in terms of the F-score, respectively, compared with the conventional CRNN-based method.
FiC-RNN: A Multi-FPGA Acceleration Framework for Deep Recurrent Neural Networks
Yuxi SUN Hideharu AMANO

PAPER-Computer System

Publicized:
2020/09/24
Vol:
E103-D No:12
Page(s):
2457-2462
Recurrent neural networks (RNNs) have been proven effective for sequence-based tasks thanks to their capability to process temporal information. In real-world systems, deep RNNs are more widely used to solve complicated tasks such as large-scale speech recognition and machine translation. However, the implementation of deep RNNs on traditional hardware platforms is inefficient due to long-range temporal dependence and irregular computation patterns within RNNs. This inefficiency manifests itself in the proportional increase in the latency of RNN inference with respect to the number of layers of deep RNNs on CPUs and GPUs. Previous work has focused mostly on optimizing and accelerating individual RNN cells. To make deep RNN inference fast and efficient, we propose an accelerator based on a multi-FPGA platform called Flow-in-Cloud (FiC). In this work, we show that the parallelism provided by the multi-FPGA system can be taken advantage of to scale up the inference of deep RNNs, by partitioning a large model onto several FPGAs, so that the latency stays close to constant with respect to increasing number of RNN layers. For single-layer and four-layer RNNs, our implementation achieves 31x and 61x speedup compared with an Intel CPU.
Sound Event Detection Utilizing Graph Laplacian Regularization with Event Co-Occurrence
Keisuke IMOTO Seisuke KYOCHI

PAPER-Speech and Hearing

Publicized:
2020/06/08
Vol:
E103-D No:9
Page(s):
1971-1977
A limited number of types of sound event occur in an acoustic scene and some sound events tend to co-occur in the scene; for example, the sound events “dishes” and “glass jingling” are likely to co-occur in the acoustic scene “cooking.” In this paper, we propose a method of sound event detection using graph Laplacian regularization with sound event co-occurrence taken into account. In the proposed method, the occurrences of sound events are expressed as a graph whose nodes indicate the frequencies of event occurrence and whose edges indicate the sound event co-occurrences. This graph representation is then utilized for the model training of sound event detection, which is optimized under an objective function with a regularization term considering the graph structure of sound event occurrence and co-occurrence. Evaluation experiments using the TUT Sound Events 2016 and 2017 datasets and the TUT Acoustic Scenes 2016 dataset show that the proposed method improves the performance of sound event detection by 7.9 percentage points compared with the conventional CNN-BiGRU-based detection method in terms of the segment-based F1 score. In particular, the experimental results indicate that the proposed method enables the detection of co-occurring sound events more accurately than the conventional method.
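As a rough illustration of graph Laplacian regularization in general, the sketch below computes the quadratic penalty y^T L y for a vector of per-event outputs y and a symmetric co-occurrence matrix A, with L = D - A. The exact regularizer and graph weights used in the paper are not given in the abstract; the function names and the toy matrix are assumptions.

```python
def graph_laplacian(adjacency):
    """Return L = D - A for a symmetric adjacency (co-occurrence) matrix."""
    n = len(adjacency)
    return [[(sum(adjacency[i]) if i == j else 0.0) - adjacency[i][j]
             for j in range(n)]
            for i in range(n)]


def laplacian_penalty(outputs, adjacency):
    """Quadratic form y^T L y, which equals
    0.5 * sum_ij A_ij * (y_i - y_j)^2 for symmetric A."""
    lap = graph_laplacian(adjacency)
    n = len(outputs)
    return sum(outputs[i] * lap[i][j] * outputs[j]
               for i in range(n) for j in range(n))


# Hypothetical example: events 0 and 1 co-occur, event 2 is unrelated.
co_occurrence = [[0.0, 1.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0]]
penalty = laplacian_penalty([1.0, 0.0, 5.0], co_occurrence)
```

The penalty grows when events joined by a heavy edge receive different outputs, so adding it to a training loss nudges the model toward predicting co-occurring events together.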
A Highly Configurable 7.62GOP/s Hardware Implementation for LSTM
Yibo FAN Leilei HUANG Kewei CHEN Xiaoyang ZENG

PAPER-Integrated Electronics

Publicized:
2019/11/27
Vol:
E103-C No:5
Page(s):
263-273
The neural network has been one of the most useful techniques in the areas of speech recognition, language translation and image analysis in recent years. Long Short-Term Memory (LSTM), a popular type of recurrent neural network (RNN), has been widely implemented on CPUs and GPUs. However, those software implementations offer poor parallelism, while the existing hardware implementations lack configurability. To fill this gap, a highly configurable 7.62 GOP/s hardware implementation for LSTM is proposed in this paper. To achieve this goal, the work flow is carefully arranged to make the design compact and high-throughput; the structure is carefully organized to make the design configurable; the data buffering and compression strategy is carefully chosen to lower the bandwidth without increasing the complexity of the structure; and the data type, logistic sigmoid (σ) function and hyperbolic tangent (tanh) function are carefully optimized to balance hardware cost and accuracy. This work achieves a performance of 7.62 GOP/s @ 238 MHz on an XCZU6EG FPGA, which takes only 3K look-up tables (LUTs). Compared with the implementation on an Intel Xeon E5-2620 CPU @ 2.10 GHz, this work achieves about 90× speedup for small networks and 25× speedup for large ones. The consumption of resources is also much less than that of the state-of-the-art works.
Software Development Effort Estimation from Unstructured Software Project Description by Sequence Models
Tachanun KANGWANTRAKOOL Kobkrit VIRIYAYUDHAKORN Thanaruk THEERAMUNKONG

PAPER

Publicized:
2020/01/14
Vol:
E103-D No:4
Page(s):
739-747
Most existing methods of effort estimation in software development are manual, labor-intensive and subjective, resulting in overestimation that leads to failed bids and underestimation that leads to financial loss. This paper investigates the effectiveness of sequence models for estimating development effort, in man-months, from software project data. Four architectures are compared in terms of man-month difference: (1) average word-vector with Multi-layer Perceptron (MLP), (2) average word-vector with Support Vector Regression (SVR), (3) Gated Recurrent Unit (GRU) sequence model, and (4) Long short-term memory (LSTM) sequence model. The approach is evaluated using two datasets, ISEM (1,573 English software project descriptions) and ISBSG (9,100 software project records), where the former is raw text and the latter is a structured data table describing the characteristics of a software project. The LSTM sequence model achieves the lowest mean absolute error on ISEM (0.705 man-months) and the second lowest on ISBSG (14.077 man-months). The MLP model achieves the lowest mean absolute error on ISBSG, which is 14.069.
Real-Time Generic Object Tracking via Recurrent Regression Network
Rui CHEN Ying TONG Ruiyu LIANG

PAPER-Artificial Intelligence, Data Mining

Publicized:
2019/12/20
Vol:
E103-D No:3
Page(s):
602-611
Deep neural networks have achieved great success in visual tracking by learning a generic representation and leveraging large amounts of training data to improve performance. Most generic object trackers are trained from scratch online and do not benefit from the large number of videos available for offline training. We present a real-time generic object tracker capable of incorporating temporal information into its model, learning from many examples offline and quickly updating online. During the training process, the update of the pre-trained convolution-layer weights lags behind, and the input video sequence length is gradually increased for fast convergence. Furthermore, only the hidden states in the recurrent network are updated to guarantee real-time tracking speed. The experimental results show that the proposed tracking method is capable of tracking objects at 150 fps with a higher predicted overlap rate, and is more robust on multiple benchmarks than state-of-the-art trackers.
Recurrent Neural Network Compression Based on Low-Rank Tensor Representation
Andros TJANDRA Sakriani SAKTI Satoshi NAKAMURA

PAPER-Music Information Processing

Publicized:
2019/10/17
Vol:
E103-D No:2
Page(s):
435-449
Recurrent Neural Networks (RNNs) have achieved state-of-the-art performance on various complex tasks involving temporal and sequential data. However, most of these RNNs require much computational power and a huge number of parameters for both the training and inference stages. Several tensor decomposition methods, such as CANDECOMP/PARAFAC (CP), Tucker decomposition and Tensor Train (TT), are used to re-parameterize the Gated Recurrent Unit (GRU) RNN. First, we evaluate the performance of all tensor-based RNNs on sequence modeling tasks with various numbers of parameters. Based on our experimental results, TT-GRU achieved the best results across the various numbers of parameters compared with the other decomposition methods. Later, we evaluate our proposed TT-GRU on a speech recognition task. We compressed the bidirectional GRU layers inside the DeepSpeech2 architecture. Based on our experimental results, our proposed TT-format GRU is able to preserve the performance while significantly reducing the number of GRU parameters compared with the uncompressed GRU.
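To give a feel for why a Tensor Train re-parameterization reduces the parameter count, here is a small sketch that counts the parameters of a TT-format weight matrix versus its dense equivalent. The factor shapes and TT ranks in the example are hypothetical and are not the settings used in the paper.

```python
import math


def dense_param_count(in_factors, out_factors):
    """Parameters of a dense matrix of size prod(in) x prod(out)."""
    return math.prod(in_factors) * math.prod(out_factors)


def tt_param_count(in_factors, out_factors, ranks):
    """Parameters of a TT-format matrix.

    Core k has shape (ranks[k], in_factors[k], out_factors[k], ranks[k + 1]),
    and the boundary ranks must both be 1.
    """
    if not (len(in_factors) == len(out_factors) == len(ranks) - 1):
        raise ValueError("need one more rank than factors")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary TT ranks must be 1")
    return sum(ranks[k] * in_factors[k] * out_factors[k] * ranks[k + 1]
               for k in range(len(in_factors)))


# Hypothetical 256 x 256 weight matrix, factored as (4*4*4*4) x (4*4*4*4)
# with all inner TT ranks set to 4.
dense = dense_param_count([4, 4, 4, 4], [4, 4, 4, 4])
tt = tt_param_count([4, 4, 4, 4], [4, 4, 4, 4], [1, 4, 4, 4, 1])
```

In this toy setting the dense matrix has 65,536 parameters and the TT format has 640. Larger ranks trade compression for expressiveness.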
Latent Words Recurrent Neural Network Language Models for Automatic Speech Recognition
Ryo MASUMURA Taichi ASAMI Takanobu OBA Sumitaka SAKAUCHI Akinori ITO

PAPER-Speech and Hearing

Publicized:
2019/09/25
Vol:
E102-D No:12
Page(s):
2557-2567
This paper demonstrates latent word recurrent neural network language models (LW-RNN-LMs) for enhancing automatic speech recognition (ASR). LW-RNN-LMs are constructed so as to combine the advantages of both recurrent neural network language models (RNN-LMs) and latent word language models (LW-LMs). The RNN-LMs can capture long-range context information and offer strong performance, and the LW-LMs are robust for out-of-domain tasks based on the latent word space modeling. However, the RNN-LMs cannot explicitly capture hidden relationships behind observed words since a concept of a latent variable space is not present. In addition, the LW-LMs cannot take into account long-range relationships between latent words. Our idea is to combine RNN-LM and LW-LM so as to compensate for their individual disadvantages. The LW-RNN-LMs can support latent variable space modeling as LW-LMs do and long-range relationship modeling as RNN-LMs do, at the same time. From the viewpoint of RNN-LMs, LW-RNN-LM can be considered as a soft class RNN-LM with a vast latent variable space. In contrast, from the viewpoint of LW-LMs, LW-RNN-LM can be considered as an LW-LM that uses the RNN structure for latent variable modeling instead of an n-gram structure. This paper also details a parameter inference method and two kinds of implementation methods, an n-gram approximation and a Viterbi approximation, for introducing the LW-LM to ASR. Our experiments show the effectiveness of LW-RNN-LMs on a perplexity evaluation for the Penn Treebank corpus and an ASR evaluation for Japanese spontaneous speech tasks.
Multi Model-Based Distillation for Sound Event Detection (Open Access)
Yingwei FU Kele XU Haibo MI Qiuqiang KONG Dezhi WANG Huaimin WANG Tie HONG

LETTER-Artificial Intelligence, Data Mining

Publicized:
2019/07/08
Vol:
E102-D No:10
Page(s):
2055-2058
Sound event detection is intended to identify the sound events in audio recordings, and has widespread applications in real life. Recently, convolutional recurrent neural network (CRNN) models have achieved state-of-the-art performance in this task due to their capability to learn representative features. However, the CRNN models are highly complex, with millions of parameters to be trained, which limits their usage on mobile and embedded devices with limited computational resources. Model distillation is effective for distilling the knowledge of a complex model into a smaller one, which can be deployed on devices with limited computational power. In this letter, we propose a novel multi model-based distillation approach for sound event detection that makes use of the knowledge from multiple teacher models which are complementary in detecting sound events. Extensive experimental results demonstrated that our approach achieves a compression ratio of about 50. In addition, better performance is obtained for the sound event detection task.
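The general idea of distilling from several teachers can be sketched as follows: soften each teacher's output with a temperature, average the soft targets, and train the student against that average. This is only an illustrative sketch. It assumes a single-label softmax output and a plain average over teachers, neither of which is confirmed by the abstract, and all function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def multi_teacher_soft_targets(teacher_logits, temperature):
    """Average the softened class distributions of all teachers."""
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    count = len(probs)
    return [sum(p[k] for p in probs) / count for k in range(len(probs[0]))]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between averaged teacher targets and the student."""
    targets = multi_teacher_soft_targets(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(targets, student))
```

In practice this soft-target term is usually combined with the ordinary loss on ground-truth labels; how the letter weights or combines its teachers is not described here.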
Low-Cost Method for Recognizing Table Tennis Activity
Se-Min LIM Jooyoung PARK Hyeong-Cheol OH

LETTER-Artificial Intelligence, Data Mining

Publicized:
2019/06/18
Vol:
E102-D No:10
Page(s):
2051-2054
This study designs a low-cost portable device that functions as a coaching assistant system which can support table tennis practice. Although deep learning technology is a promising solution to realizing human activity recognition, we propose using cosine similarity in making inferences. Our experiments show that the cosine similarity based inference can be a good alternative to the deep learning based inference for the assistant system when resources are limited.
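Cosine-similarity inference can be sketched as a nearest-template classifier: compare a feature vector against one stored reference vector per activity and pick the most similar. The feature vectors, the activity labels, and the use of one template per class are assumptions for illustration; the abstract does not describe the device's actual features.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def classify(sample, templates):
    """Return the label whose template is most similar to `sample`."""
    return max(templates,
               key=lambda label: cosine_similarity(sample, templates[label]))


# Hypothetical per-activity reference vectors (e.g. averaged sensor features).
templates = {
    "forehand": [1.0, 0.2, 0.0],
    "backhand": [0.0, 0.3, 1.0],
}
label = classify([0.9, 0.1, 0.1], templates)
```

Inference then costs one dot product and two norms per template, which is why it suits a resource-limited device.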
A Top-N-Balanced Sequential Recommendation Based on Recurrent Network
Zhenyu ZHAO Ming ZHU Yiqiang SHENG Jinlin WANG

PAPER

Publicized:
2019/01/10
Vol:
E102-D No:4
Page(s):
737-744
To solve the low-accuracy problem of recommender systems for long-term users, in this paper, we propose a top-N-balanced sequential recommendation based on a recurrent neural network. We postulated and verified that the interactions between users and items are time-dependent in the long term, but time-independent in the short term. We balance the top-N recommendation and the sequential recommendation to generate a better recommendation list by improving the loss function and the generation method. The experimental results demonstrate the effectiveness of our method. Compared with a state-of-the-art recommender algorithm, our method clearly improves the performance of the recommendation on hit rate. Besides the improvement of the basic performance, our method can also handle the cold start problem and supply new users with the same quality of service as the old users.
Air-Writing Recognition Based on Fusion Network for Learning Spatial and Temporal Features
Buntueng YANA Takao ONOYE

PAPER-Neural Networks and Bioengineering

Vol:
E101-A No:11
Page(s):
1737-1744
A fusion framework between CNN and RNN is proposed specifically for air-writing recognition. By modeling the air-writing using both spatial and temporal features, the proposed network can learn more information than existing techniques. The performance of the proposed network is evaluated using the alphabet and numeric datasets in the public 6DMG database. The proposed fusion network outperforms other techniques in average accuracy: 99.25% and 99.83% are observed for the alphabet gestures and the numeric gestures, respectively. A simplified RNN structure is also proposed, which attains about a twofold speed-up over an ordinary BLSTM network. It is also confirmed that the distance between consecutive sampling points alone is enough to attain high recognition performance.
A Unified Neural Network for Quality Estimation of Machine Translation
Maoxi LI Qingyu XIANG Zhiming CHEN Mingwen WANG

LETTER-Natural Language Processing

Publicized:
2018/06/18
Vol:
E101-D No:9
Page(s):
2417-2421
The state-of-the-art neural model for quality estimation (QE) of machine translation consists of two sub-networks that are tuned separately: a bidirectional recurrent neural network (RNN) encoder-decoder trained for neural machine translation, called the predictor, and an RNN trained for sentence-level QE tasks, called the estimator. We propose to combine the two sub-networks into a whole neural network, called the unified neural network. During training, the bidirectional RNN encoder-decoder is initialized and pre-trained with the bilingual parallel corpus, and then the networks are trained jointly to minimize the mean absolute error over the QE training samples. Compared with the predictor and estimator approach, the use of a unified neural network helps to train neural network parameters that are more suitable for the QE task. Experimental results on the benchmark data set of the WMT17 sentence-level QE shared task show that the proposed unified neural network approach consistently outperforms the predictor and estimator approach and significantly outperforms the other baseline QE approaches.
Transform Electric Power Curve into Dynamometer Diagram Image Using Deep Recurrent Neural Network
Junfeng SHI Wenming MA Peng SONG

LETTER-Artificial Intelligence, Data Mining

Publicized:
2018/05/09
Vol:
E101-D No:8
Page(s):
2154-2158
To learn the underground working condition of rod-pumped wells, we always need to analyze dynamometer diagrams, which are generated by the load sensor and the displacement sensor. Rod-pumped wells are usually located in places with extreme weather, and these sensors are installed on special oil equipment in the open air. As time goes by, the sensors are prone to generating unstable and incorrect data. Unfortunately, load sensors are too expensive to reinstall frequently. Therefore, the resulting dynamometer diagrams sometimes cannot support an accurate diagnosis. In contrast, the electric motor, an indispensable piece of equipment in a rod-pumped well, has a much longer life and is not easily affected by the weather. The electric power curve during a swabbing period can also reflect the underground working condition, but is much harder to interpret than the dynamometer diagram. This letter presents a novel deep learning architecture that can transform the electric power curve into a dimensionless dynamometer diagram image. We conduct our experiments on a real-world dataset, and the results show that our method achieves an impressive transformation accuracy.
Predicting Taxi Destination by Regularized RNN with SDZ
Lei ZHANG Guoxing ZHANG Zhizheng LIANG Qingfu FAN Yadong LI

LETTER-Data Engineering, Web Information Systems

Publicized:
2018/05/02
Vol:
E101-D No:8
Page(s):
2141-2144
The traditional Markov prediction methods for taxi destinations rely only on the previous 2 to 3 GPS points. They neglect long-term dependencies within a taxi trajectory. We adopt a Recurrent Neural Network (RNN) to explore the long-term dependencies to predict the taxi destination, as the multiple hidden layers of an RNN can store these dependencies. However, the hidden layers of an RNN are very sensitive to small perturbations, which reduces the prediction accuracy as the number of taxi trajectories increases. In order to improve the prediction accuracy of the taxi destination and reduce the training time, we embed surprisal-driven zoneout (SDZ) into the RNN, hence a taxi destination prediction method by regularized RNN with SDZ (TDPRS). SDZ can not only improve the robustness of TDPRS, but also reduce the training time by adopting a partial update of parameters instead of a full update. Experiments with Porto taxi trajectory data show that TDPRS improves the prediction accuracy by 12% compared to the RNN prediction method in the literature [4]. At the same time, the prediction time is reduced by 7%.
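The partial-update idea behind zoneout can be sketched as follows: each hidden unit either keeps its previous value or takes the newly computed one, according to a per-unit zoneout rate. In surprisal-driven zoneout the rate is derived from the model's surprisal; that rule is not given in the abstract, so in this sketch the rates are simply passed in as an assumed input, and the function name is hypothetical.

```python
import random


def zoneout_update(h_prev, h_candidate, zoneout_rates, rng):
    """Per-unit stochastic state update.

    With probability `rate` a unit keeps its previous value ("zones out");
    otherwise it takes the candidate value computed by the RNN cell.
    """
    return [prev if rng.random() < rate else cand
            for prev, cand, rate in zip(h_prev, h_candidate, zoneout_rates)]


# Hypothetical usage with fixed per-unit rates.
rng = random.Random(0)
h_next = zoneout_update([0.1, 0.2, 0.3], [0.9, 0.8, 0.7],
                        [1.0, 0.0, 0.5], rng)
```

With a rate of 1.0 a unit always keeps its old state and with 0.0 it always updates, so the first two entries of `h_next` are deterministic while the third depends on the random draw.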
