Jiaxin WU Bing LI Li ZHAO Xinzhou XU
Maaki SAKAI Kanon HOKAZONO Yoshiko HANADA
Xuecheng SUN Zheming LU
Yuanhe WANG Chao ZHANG
Jinfeng CHONG Niu JIANG Zepeng ZHUO Weiyu ZHANG
Xiangrun LI Qiyu SHENG Guangda ZHOU Jialong WEI Yanmin SHI Zhen ZHAO Yongwei LI Xingfeng LI Yang LIU
Meiting XUE Wenqi WU Jinfeng LUO Yixuan ZHANG Bei ZHAO
Rong WANG Changjun YU Zhe LYU Aijun LIU
Huijuan ZHOU Zepeng ZHUO Guolong CHEN
Feifei YAN Pinhui KE Zuling CHANG
Manabu HAGIWARA
Ziqin FENG Hong WAN Guan GUI
Sungryul LEE
Feng WANG Xiangyu WEN Lisheng LI Yan WEN Shidong ZHANG Yang LIU
Yanjun LI Jinjie GAO Haibin KAN Jie PENG Lijing ZHENG Changhui CHEN
Ho-Lim CHOI
Feng WEN Haixin HUANG Xiangyang YIN Junguang MA Xiaojie HU
Shi BAO Xiaoyan SONG Xufei ZHUANG Min LU Gao LE
Chen ZHONG Chengyu WU Xiangyang LI Ao ZHAN Zhengqiang WANG
Izumi TSUNOKUNI Gen SATO Yusuke IKEDA Yasuhiro OIKAWA
Feng LIU Helin WANG Conggai LI Yanli XU
Hongtian ZHAO Hua YANG Shibao ZHENG
Kento TSUJI Tetsu IWATA
Yueying LOU Qichun WANG
Menglong WU Jianwen ZHANG Yongfa XIE Yongchao SHI Tianao YAO
Jiao DU Ziwei ZHAO Shaojing FU Longjiang QU Chao LI
Yun JIANG Huiyang LIU Xiaopeng JIAO Ji WANG Qiaoqiao XIA
Qi QI Liuyi MENG Ming XU Bing BAI
Nihad A. A. ELHAG Liang LIU Ping WEI Hongshu LIAO Lin GAO
Dong Jae LEE Deukjo HONG Jaechul SUNG Seokhie HONG
Tetsuya ARAKI Shin-ichi NAKANO
Shoichi HIROSE Hidenori KUWAKADO
Yumeng ZHANG
Jun-Feng LIU Yuan FENG Zeng-Hui LI Jing-Wei TANG
Keita EMURA Kaisei KAJITA Go OHTAKE
Xiuping PENG Yinna LIU Hongbin LIN
Yang XIAO Zhongyuan ZHOU Mingjie SHENG Qi ZHOU
Kazuyuki MIURA
Yusaku HIRAI Toshimasa MATSUOKA Takatsugu KAMATA Sadahiro TANI Takao ONOYE
Ryuta TAMURA Yuichi TAKANO Ryuhei MIYASHIRO
Nobuyuki TAKEUCHI Kosei SAKAMOTO Takuro SHIRAYA Takanori ISOBE
Shion UTSUMI Kosei SAKAMOTO Takanori ISOBE
You GAO Ming-Yue XIE Gang WANG Lin-Zhi SHEN
Zhimin SHAO Chunxiu LIU Cong WANG Longtan LI Yimin LIU Zaiyan ZHOU
Xiaolong ZHENG Bangjie LI Daqiao ZHANG Di YAO Xuguang YANG
Takahiro IINUMA Yudai EBATO Sou NOBUKAWA Nobuhiko WAGATSUMA Keiichiro INAGAKI Hirotaka DOHO Teruya YAMANISHI Haruhiko NISHIMURA
Takeru INOUE Norihito YASUDA Hidetomo NABESHIMA Masaaki NISHINO Shuhei DENZUMI Shin-ichi MINATO
Zhan SHI
Hakan BERCAG Osman KUKRER Aykut HOCANIN
Ryoto KOIZUMI Xiaoyan WANG Masahiro UMEHIRA Ran SUN Shigeki TAKEDA
Hiroya HACHIYAMA Takamichi NAKAMOTO
Chuzo IWAMOTO Takeru TOKUNAGA
Changhui CHEN Haibin KAN Jie PENG Li WANG
Pingping JI Lingge JIANG Chen HE Di HE Zhuxian LIAN
Ho-Lim CHOI
Akira KITAYAMA Goichi ONO Hiroaki ITO
Koji NUIDA Tomoko ADACHI
Yingcai WAN Lijin FANG
Yuta MINAMIKAWA Kazumasa SHINAGAWA
Sota MORIYAMA Koichi ICHIGE Yuichi HORI Masayuki TACHI
Sendren Sheng-Dong XU Albertus Andrie CHRISTIAN Chien-Peng HO Shun-Long WENG
Zhikui DUAN Xinmei YU Yi DING
Hongbo LI Aijun LIU Qiang YANG Zhe LYU Di YAO
Yi XIONG Senanayake THILAK Yu YONEZAWA Jun IMAOKA Masayoshi YAMAMOTO
Feng LIU Qian XI Yanli XU
Yuling LI Aihuang GUO
Mamoru SHIBATA Ryutaroh MATSUMOTO
Haiyang LIU Xiaopeng JIAO Lianrong MA
Ruixiao LI Hayato YAMANA
Riaz-ul-haque MIAN Tomoki NAKAMURA Masuo KAJIYAMA Makoto EIKI Michihiro SHINTANI
Kundan LAL DAS Munehisa SEKIKAWA Tadashi TSUBONE Naohiko INABA Hideaki OKAZAKI
Katsuhiko SHIRAI Sadaoki FURUI
Tatsuo MATSUOKA Kiyohiro SHIKANO
In a practical continuous speech recognition system, the target speech is often spoken in a different speaking style (e.g., speed or loudness) from the training speech. It is difficult to cope with such speaking-style variations because the amount of training speech is limited. Therefore, acoustic modeling should be robust against different styles of speech in order to obtain high recognition performance from the limited training speech. This paper describes the robustness of six types of phoneme-based HMMs against speaking-style variations. The six types of model were VQ- and FVQ-based discrete HMMs, and single-Gaussian and mixture-Gaussian HMMs with either diagonal or full covariance matrices. They were investigated using isolated word utterances, phrase-by-phrase utterances and fluently spoken utterances, with different utterance types for training and testing. The experimental results show that the mixture-Gaussian HMM with diagonal covariance matrices is the most promising choice. The FVQ-based HMM and the single-Gaussian HMM with full covariance matrices also achieved good results. The mixture-Gaussian HMM with full covariance matrices sometimes achieved very high accuracies, but often suffered from "overtuning" or a lack of training data. Finally, this paper proposes a new model-adaptation technique that combines multiple models with appropriate weighting factors. Each model has different characteristics (e.g., coverage of speaking styles and sensitivity to data), and the weighting factors can be estimated using "deleted interpolation". When the mixture-Gaussian diagonal-covariance models were used as baseline models, this technique achieved better recognition accuracy than a model trained using all three utterance types at a time. The advantage of this technique is that estimating the weighting factors is stable even from a limited amount of training speech, because there are few free parameters to be estimated.
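The "deleted interpolation" weighting described above can be sketched as a small EM loop over held-out data. This is a minimal illustration, not the paper's implementation: the per-model probabilities below are invented, and only the mixture weights are free parameters, which is why the estimate remains stable on limited training speech.

```python
# Sketch of deleted-interpolation weight estimation. Given the probability each
# of M models assigns to every held-out sample, EM re-estimates the mixture
# weights; these are the only free parameters being trained here.
def estimate_weights(model_probs, iterations=50):
    """model_probs: list of samples, each a list of M per-model probabilities."""
    m = len(model_probs[0])
    weights = [1.0 / m] * m                      # start from uniform weights
    for _ in range(iterations):
        counts = [0.0] * m
        for probs in model_probs:
            mix = sum(w * p for w, p in zip(weights, probs))
            for i in range(m):                   # posterior responsibility of model i
                counts[i] += weights[i] * probs[i] / mix
        total = sum(counts)
        weights = [c / total for c in counts]    # normalized re-estimate
    return weights

# Toy example (invented numbers): model 0 fits the held-out data better on
# every sample, so EM shifts the weight toward it.
probs = [[0.9, 0.2], [0.8, 0.1], [0.7, 0.4], [0.6, 0.3]]
w = estimate_weights(probs)
```

The combined model then scores a sample as the weighted sum of the individual model probabilities.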
In a practical continuous speech recognition system, input speech includes many extraneous words. Furthermore, detecting the beginning point of the target word is very difficult. Under those circumstances, word-spotting is useful for extracting and recognizing the target speech from such input speech. On the other hand, a phoneme-based HMM is useful for large-vocabulary word recognition. Training a phoneme-based HMM is easier and more stable than training a word-based HMM when training speech is limited, because there are several times more phoneme tokens than word tokens in the training speech. For these reasons, we use word-spotting with phoneme-based HMMs. Furthermore, for more precise modeling, we chose context-dependent phoneme modeling. This paper proposes a new clustering method for context-dependent phoneme HMMs. This clustering method uses triphone contexts when training samples are sufficient, and automatically selects biphone and uniphone contexts if only a few training samples are given. Using this clustering method, context-dependent models were created and tested in phoneme recognition and word-spotting experiments. The context-dependent models achieved 90.0% phoneme recognition accuracy, which is 7.6% higher than that of the context-independent models, and 69.2% word-spotting accuracy, which is 7.0% higher.
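The count-based context selection described above can be sketched as a simple backoff rule. The threshold and the count table below are purely illustrative assumptions; the paper's actual clustering method is more elaborate.

```python
# Sketch of triphone -> biphone -> uniphone backoff: use a triphone model when
# enough training tokens exist, otherwise fall back to coarser contexts.
def select_context(left, phone, right, counts, threshold=10):
    """counts maps context tuples to numbers of training tokens (assumed data)."""
    if counts.get((left, phone, right), 0) >= threshold:
        return (left, phone, right)          # triphone: full left+right context
    if counts.get((phone, right), 0) >= threshold:
        return (phone, right)                # biphone: right context only
    return (phone,)                          # uniphone: context-independent

# Invented training counts for illustration.
counts = {("k", "a", "d"): 25, ("a", "d"): 4, ("i", "t"): 12}
```

With these counts, "a" between "k" and "d" gets a triphone model, while "a" in an unseen context falls all the way back to a uniphone.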
Shozo MAKINO Akinori ITO Mitsuru ENDO Ken'iti KIDO
This paper gives an overview of a Japanese text dictation system composed of an acoustic processor and a linguistic processor. The system deals with 843 conceptual words and 431 functional words. Phoneme recognition is carried out using a modified LVQ2 method that we propose. The phoneme recognition score was 86.1% for 226 sentences uttered by two male speakers. The linguistic processor is composed of a processor for spotting Bunsetsu-units and a syntactic processor. The structure of the Bunsetsu-unit is effectively described by a finite-state automaton, whose test-set perplexity is 230. The processor for spotting Bunsetsu-units uses a syntax-driven continuous-DP matching algorithm to spot Bunsetsu-units in a recognized phoneme sequence and generate a Bunsetsu-unit lattice. The syntactic processor parses the Bunsetsu-unit lattice based on a dependency grammar, expressed as the correspondence between a FEATURE marker in a modifier-Bunsetsu and a SLOT-FILLER marker in a head-Bunsetsu. The recognition scores for Bunsetsu-units and conceptual words were 73.2% and 85.7%, respectively, for 226 sentences uttered by the two male speakers.
Takeshi KAWABATA Toshiyuki HANAZAWA Katsunobu ITOH Kiyohiro SHIKANO
A phonetic typewriter is an unlimited-vocabulary continuous speech recognition system that recognizes each phone in speech without the need for lexical information. This paper describes a Japanese phonetic typewriter system based on HMM phone recognition and syllable-based stochastic phone sequence modeling. Even though HMM methods have considerable capacity for recognizing speech, it is difficult to recognize individual phones in continuous speech without lexical information. HMM phone recognition is improved by incorporating syllable trigrams for phone sequence modeling. HMM phone units are trained using an isolated word database, and their duration parameters are modified according to speaking rate. Syllable trigram tables are made from a text database of over 300,000 syllables, and phone sequence probabilities calculated from the trigrams are combined with HMM probabilities. Using these probabilities to limit the number of intermediate candidates leads to an accurate phonetic typewriter system without requiring excessive computation time. An interpolated n-gram approach to phone sequence modeling is shown to be more effective than a simple trigram method.
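The combination of HMM probabilities with trigram-based phone sequence probabilities can be illustrated with a log-linear score for ranking candidates. The weight and the probabilities below are assumptions for illustration, not values from the paper.

```python
import math

# Sketch of combining an acoustic (HMM) score with a language (syllable-trigram)
# score; alpha balances the two sources of evidence.
def combined_score(acoustic_prob, trigram_prob, alpha=0.7):
    """Log-linear combination used to rank candidate phone sequences."""
    return alpha * math.log(acoustic_prob) + (1 - alpha) * math.log(trigram_prob)

# A candidate with a plausible phone sequence can overtake one with a slightly
# better acoustic match but an unlikely sequence (numbers invented).
s1 = combined_score(0.30, 0.20)
s2 = combined_score(0.35, 0.01)
```

Pruning intermediate candidates by such a combined score is what keeps the search accurate without excessive computation.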
This paper reports on a new application of the Markov model to an automatic speech recognition system, in which the feature vectors of speech are regarded as representing the states and output symbols of the Markov model. The state-transition probability and the symbol-output probability are assumed to be represented by multidimensional normal density functions of the feature vector. The DP-matching algorithm is used to calculate the optimum time sequence of observed feature vectors. To confirm the efficiency of this system, we experimentally compared its performance with that of other approaches, such as those using the Mahalanobis distance or the Euclidean distance. In speaker-independent experiments on a vocabulary of Japanese single-digit and four-digit numerals, the proposed system is shown to be more effective than the others.
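The distance measures being compared can be sketched as follows; the diagonal-covariance form of the Mahalanobis distance is shown for simplicity, and the vectors and variances are invented for illustration.

```python
# Euclidean distance ignores feature covariance; Mahalanobis distance whitens
# each dimension by its variance (diagonal-covariance case shown).
def euclidean_sq(x, mean):
    """Squared Euclidean distance between a feature vector and a template mean."""
    return sum((a - b) ** 2 for a, b in zip(x, mean))

def mahalanobis_sq_diag(x, mean, variances):
    """Squared Mahalanobis distance with a diagonal covariance matrix."""
    return sum((a - b) ** 2 / v for a, b, v in zip(x, mean, variances))

# A high-variance dimension dominates the Euclidean distance but is correctly
# discounted by the Mahalanobis distance.
x, mean, var = [1.0, 4.0], [0.0, 0.0], [1.0, 16.0]
```

This discounting of unreliable dimensions is the same effect that the normal-density (probabilistic) formulation provides.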
Yasuhiro KOMORI Kaichiro HATAZAKI
In this paper, a speech recognition system aimed at a phoneme typewriter without a language model is proposed. The system is realized as an integration of spectrogram reading knowledge and Time-Delay Neural Networks (TDNNs). The system mainly consists of two parts. In the consonant recognition part, a sophisticated integration of knowledge and a TDNN is proposed; this not only improves recognition performance and segmentation accuracy, but also drastically reduces insertion errors. In the vowel recognition part, a TDNN is used for detection and rough segmentation, exploiting its time-shift tolerance, while the knowledge part is mainly used for verification of categories and boundaries. A phoneme recognition experiment on 2,620 Japanese words uttered by one male speaker showed a 91.4% (11,612/12,710) recognition rate, a 3.6% deletion error rate, a 5.0% substitution error rate and a 20.7% insertion error rate over all Japanese phonemes. This good result was obtained without any language model.
Kenji KITA Toshiyuki TAKEZAWA Tsuyoshi MORIMOTO
This paper describes a continuous speech recognition system using two-level LR parsing and phone-based HMMs. ATR has already implemented a predictive LR parsing algorithm in an HMM-based speech recognition system for Japanese. However, up to now, this system has used only intra-phrase grammatical constraints. In Japanese, a sentence is composed of several phrases, and thus two kinds of grammars, namely an intra-phrase grammar and an inter-phrase grammar, are sufficient for recognizing sentences. Two-level LR parsing makes it possible to use not only intra-phrase but also inter-phrase grammatical constraints during speech recognition. The system is applied to recognition of Japanese sentences uttered phrase by phrase, and attains a word accuracy of 95.9% and a sentence accuracy of 84.7%.
Kenji KITA Terumasa EHARA Tsuyoshi MORIMOTO
Current continuous speech recognition systems essentially ignore unknown words, since systems are designed to recognize only the words in the lexicon. However, for using speech recognition systems in a real application such as spoken-language processing, it is very important to process unknown words. This paper proposes a continuous speech recognition method which accepts any utterance that might include unknown words. In this method, words not in the lexicon are transcribed as phone sequences, while words in the lexicon are recognized correctly. The HMM-LR speech recognition system, an integration of Hidden Markov Models and generalized LR parsing, is used as the baseline system, and is enhanced with a trigram model of syllables to take the stochastic characteristics of the language into account. In our approach, two kinds of grammars, a task grammar which describes the task and a phonetic grammar which describes constraints between phones, are merged and used in the HMM-LR system. The system can output a phonetic transcription for an unknown word by using the phonetic grammar. Experimental results indicate that our approach is very promising.
Minoru SHIGENAGA Yoshihiro SEKIGUCHI Takehiro YAMAGUCHI Ryouta MASUDA
A large-vocabulary (1019 words and 1382 kinds of inflectional endings) continuous speech recognition system that offers high predictability, is applicable to any task, and has an unsupervised speaker adaptation capability is described. Phoneme identification is based on various features, and speaker adaptation is performed using reliably identified phonemes. Phrase boundaries are detected using prosodic information. The syntactic analyzer uses a syntactic state transition network and outputs syntactic interpretations. The semantic analyzer deals with the meaning of each word, the dependency relationships between words, the extended case structures of predicates, and an associative function, all in universally applicable forms. The extended case grammar, with a four-item case structure, and the dependency relationships between words are based on the semantic attributes of the related words; together with the associative function, they realize a universally applicable, highly predictive capability.
Sho-ichi MATSUNAGA Shigeru HOMMA Shigeki SAGAYAMA Sadaoki FURUI
This paper describes two Japanese continuous speech recognition systems (system-1 and system-2) based on phoneme-based HMMs and a two-level grammar approach. The two grammars are an intra-phrase transition network grammar for phrase recognition and an inter-phrase dependency grammar for sentence recognition. A joint score combining acoustic likelihood and linguistic certainty factors, derived from the phoneme-based HMMs and the dependency rules, is maximized to obtain the best sentence recognition results. To keep the amount of computation practical, system-1 is tuned for sentences uttered phrase by phrase and system-2 is tuned for sentence utterances. In system-1, two efficient parsing algorithms are used, one for each grammar: a bi-directional network parser and a breadth-first dependency parser. With the phrase-network parser, input phrase utterances are parsed bi-directionally, both left-to-right and right-to-left, and optimal Viterbi paths are found along which the accumulated phonetic likelihood is maximized. The dependency parser utilizes efficient breadth-first search and beam search algorithms. For system-2, we have extended the dependency analysis algorithm to sentence utterances, using a technique for detecting the most likely multi-phrase candidates based on Viterbi phrase alignment. Where the perplexity of the phrase syntax is 40, system-1 and system-2 increase phrase recognition performance in the sentence by approximately 6% and 14%, respectively, showing the effectiveness of semantic dependency analysis.
Hidefumi SAWAI Yasuhiro MINAMI Masanori MIYATAKE Alex WAIBEL Kiyohiro SHIKANO
This paper describes recent progress in a connectionist large-vocabulary continuous speech recognition system integrating speech recognition and language processing. The speech recognition part consists of Large Phonemic Time-Delay Neural Networks (TDNNs) which can automatically spot all 24 Japanese phonemes (i.e., 18 consonants /b/, /d/, /g/, /p/, /t/, /k/, /m/, /n/, /N/, /s/, /sh/ ([
Akio KOMATSU Eiji OOHIRA Akira ICHIKAWA
Natural spontaneous speech is so ambiguous that a system for understanding it requires the cooperation of many knowledge sources. Thus, in order to integrate speech processing and language processing, it is necessary to provide a system with a mechanism for supporting such cooperation. We propose here a general framework for cooperative problem solving, based on the blackboard model and a TMS (truth maintenance system) with an enhanced proving function. In this framework, a reasonably consistent interpretation is automatically maintained on the blackboard, while each knowledge source performs its own inference and puts the results on the blackboard. Based on this framework, a model has been established for a system which can understand spontaneous speech through the cooperation of independent knowledge sources. Most notably, prosodic information is used as suprasegmental cues to infer the structure of spontaneous speech. This allows robust parsing of spoken sentences. The feasibility and validity of our basic framework have been confirmed by computer simulation experiments on spontaneous speech.
Seiichi NAKAGAWA Yoshimitsu HIRATA Isao MURASE Tomohiro TANOUE
This paper describes and compares syntax- and semantics-oriented spoken Japanese understanding systems named "SPOJUS-SYNO" and "SPOJUS-SEMO". First, these systems automatically build word-based Hidden Markov Models (HMMs) by concatenating syllable models. A word lattice is then hypothesized for an input utterance by using a word-spotting algorithm with the word-based HMMs. In SPOJUS-SYNO, a time-synchronous left-to-right parsing algorithm finds the best word sequence in the word lattice according to syntactic and semantic knowledge represented by a context-free semantic grammar. In SPOJUS-SEMO, syntactic and semantic knowledge is represented by a dependency and case grammar. These systems were implemented for the "UNIX-QA" task with a vocabulary of 521 words. Experimental results show that the sentence recognition/understanding rate was about 80/87% over six male speakers for SPOJUS-SYNO, but performance was very low for SPOJUS-SEMO.
Yutaka KOBAYASHI Masanori OMOTE Hidenori ENDO Yasuhisa NIIMI
This paper describes an overview of our speech understanding system and reports on the recent results of the sentence recognition experiments. The system, we call SUSKIT-
Shingo NISHIOKA Osamu KAKUSHO Riichiro MIZOGUCHI
A speech understanding system confronts ambiguities caused by acoustic-phonetic errors and the multiple meanings of words, so an effective framework is required to resolve them. The speech understanding system described in this paper deals with two different kinds of phrases to avoid combinatorial explosion, and is constructed on an ATMS-based problem-solving system to extract maximum performance. Experimental results show that the time consumed by the speech understanding system is reduced to about one tenth. Furthermore, to evaluate the generality and effectiveness of the ATMS-based problem-solving system, the results of another experiment are also presented in this paper.
Tetsuya YAMAMOTO Yoshikazu OHTA Yoichi YAMASHITA Osamu KAKUSHO Riichiro MIZOGUCHI
This paper describes a dialog management system called MASCOTS which manages a dialog between a user and a problem solving system through spoken Japanese and helps the speech understanding system in its language processing. MASCOTS tries to predict the next user utterance based on an architecture for managing dialog with two stacks and plan information. MASCOTS not only contributes to making language processing efficient, but also works for the problem solving system: it identifies the kind of utterance and standardizes its representation form in place of the problem solving system. In this paper, the architecture of MASCOTS is discussed, focusing on the characteristics of dialog and on two ways of predicting the next user utterance while exchanging information with the language processing system.
Tsuyoshi MORIMOTO Kiyohiro SHIKANO Kiyoshi KOGURE Hitoshi IIDA Akira KUREMATSU
The experimental spoken language translation system (SL-TRANS) has been implemented. It can recognize Japanese speech, translate it to English, and output synthesized English speech. One of the most important problems in realizing such a system is how to integrate, or connect, speech recognition and language processing. In this paper, a new method realized in the system is described. The method is composed of three processes: grammar-driven predictive speech recognition, Kakariuke-dependency-based candidate filtering, and HPSG-based lattice parsing supplemented with a sentence preference mechanism. Input speech is uttered phrase by phrase. The speech recognizer takes an input phrase utterance and outputs several candidates with recognition scores for each phrase. A Japanese phrasal grammar is used in recognition; it contributes to the output of grammatically well-formed phrase candidates, as well as to the reduction of phone perplexity. The candidate filter takes a phrase lattice, which is a sequence of multiple candidates for a phrase, and outputs a reduced phrase lattice. It removes semantically inappropriate phrase candidates by applying the Kakariuke dependency relationship between phrases. Finally, the HPSG-based lattice parser takes a phrase lattice and chooses the most plausible sentence by checking syntactic and semantic legitimacy and evaluating sentential preference. Experimental results for the system are also reported, and the usefulness of the method is confirmed.
In this paper, we investigate language models using a context-free grammar, a bigram and a quasi/simplified-trigram. To calculate the statistics of the bigram and quasi/simplified-trigram, we used a set of sentences generated randomly from the CFG that are semantically legal. We compared the models on perplexity and sentence recognition accuracy. Sentence recognition was tested on the "UNIX-QA" task with a vocabulary of 521 words. The perplexities of the bigram and quasi-trigram were about 1.5-1.7 times and 1.2-1.3 times larger, respectively, than the perplexity of the CFG corresponding to the most restricted grammar (perplexity = 10.0), and we found that the quasi-trigram has almost the same modeling ability as the restricted CFG when the set of plausible sentences in the task is given.
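Test-set perplexity, the figure of merit used above, can be computed from the per-word probabilities a model assigns to a test corpus; the probabilities below are illustrative. A uniform model over 10 equally likely words has perplexity exactly 10, matching the branching-factor reading of "perplexity = 10.0".

```python
import math

# Sketch of test-set perplexity: the geometric mean of the inverse per-word
# probabilities assigned by the language model over the whole test set.
def perplexity(word_probs):
    """word_probs: the model's probability for each word in the test set."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))

# A model that assigns probability 0.1 to every word behaves like a uniform
# choice among 10 alternatives.
pp = perplexity([0.1] * 20)
```

Lower perplexity means the grammar constrains the recognizer's search more tightly, which is why the CFG outperforms the n-gram approximations here.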
This paper describes recent speech database efforts in Japan in which the author has been involved. The JEIDA Japanese Common Speech Data Corpus was first reported in 1986 and has recently been converted to DAT. The JEIDA Noise Database has been released to the public recently; it contains various kinds of environmental noise and standard noise for sound level calibration. The 'Spoken Language' project collected speech data including continuous speech spoken by 10 males and 10 females. The 'Spoken Japanese' project, started in 1989, attempts to collect various dialectal speech from all over Japan and create speech databases; a compact disc containing a fairy tale and a weather forecast spoken by 20 dialect speakers has been produced. This paper also describes the Continuous Speech Database Committee, which was established recently by the Acoustical Society of Japan.
The stability of convex combinations of polynomials and the stability margin of stable polynomials are studied using Hermite matrices for continuous-time systems. Available results are found to impose a heavy computational burden, especially in checking the stability of a polytope of polynomials by means of "the edge theorem". We propose alternative stability conditions and a margin that reduce the computational burden. In our approach, the stability condition reported by Bialas and Garloff can be derived readily.
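The edge-theorem burden mentioned above can be illustrated with a naive numerical check: sampling a polytope edge and testing each sampled polynomial for Hurwitz stability with a Routh array. This is a sketch for intuition only; sampling is not a proof of edge stability, and the Hermite-matrix conditions in the paper avoid exactly this kind of brute-force test.

```python
# Routh-array test for Hurwitz stability (coefficients from highest degree down).
def routh_stable(coeffs):
    """True iff all roots of the polynomial lie in the open left half-plane."""
    n = len(coeffs) - 1
    rows = [list(coeffs[0::2]), list(coeffs[1::2]) or [0.0]]
    first_col = [rows[0][0], rows[1][0]]
    for _ in range(n - 1):
        prev, cur = rows[-2], rows[-1]
        if abs(cur[0]) < 1e-12:
            return False                     # singular array: not strictly stable
        nxt = []
        for i in range(max(len(prev) - 1, 1)):
            a = prev[i + 1] if i + 1 < len(prev) else 0.0
            b = cur[i + 1] if i + 1 < len(cur) else 0.0
            nxt.append((cur[0] * a - prev[0] * b) / cur[0])
        rows.append(nxt)
        first_col.append(nxt[0])
    return all(c > 0 for c in first_col)     # no sign change in first column

def edge_stable(p, q, samples=101):
    """Sample the convex combination (1-t)p + t q along an edge and test each
    sample. A numerical check only, not a stability proof for the whole edge."""
    return all(
        routh_stable([(1 - t) * a + t * b for a, b in zip(p, q)])
        for t in (k / (samples - 1) for k in range(samples))
    )
```

Checking every edge of a polytope this way is what makes the edge-theorem approach expensive; conditions that certify a whole edge at once remove the sampling loop entirely.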
Shuichi UENO Katsufumi TSUJI Yoji KAJITANI
Given a plane graph G, a trail of G is said to be dual if it is also a trail in the geometric dual of G. We show that the problem of partitioning the edges of G into the minimum number of dual trails is NP-hard.