IEICE global.ieice.org Site

Keyword Search Result

[Keyword] confusion(8hit)

1-8hit

A Note on the Confusion Coefficient of Boolean Functions
Yu ZHOU Jianyong HU Xudong MIAO Xiaoni DU

PAPER-Cryptography and Information Security

Publicized:
2023/05/24
Vol:
E106-A No:12
Page(s):
1525-1530
Low confusion coefficient values can make side-channel attacks harder for vector Boolean functions in block ciphers. In this paper, we give new results on the confusion coefficient of f ⊞ g, f ⊡ g, f ⊕ g and fg for different Boolean functions f and g. We also deduce a relationship on the sum-of-squares of the confusion coefficient between one n-variable function and two (n - 1)-variable decomposition functions. Finally, we find that the confusion coefficient of vector Boolean functions is affine invariant.
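As a rough illustration of the quantity discussed in this abstract, the sketch below computes a confusion coefficient for a single Boolean function. It assumes the commonly used definition κ(a) = Pr_x[f(x) ≠ f(x ⊕ a)] over all n-bit inputs x; the paper's exact definition and normalization may differ. The names `confusion_coefficient`, `f_and`, and `f_xor` are hypothetical.

```python
def confusion_coefficient(f, n, a):
    """Fraction of n-bit inputs x for which f(x) != f(x XOR a).

    Assumed definition (not taken from the paper): kappa(a) = Pr_x[f(x) != f(x ^ a)].
    """
    differing = sum(f(x) != f(x ^ a) for x in range(2 ** n))
    return differing / 2 ** n


def f_and(x):
    # Toy 2-variable function f(x1, x2) = x1 AND x2, with x encoded as a 2-bit integer.
    return (x & 1) & ((x >> 1) & 1)


def f_xor(x):
    # Toy 2-variable function f(x1, x2) = x1 XOR x2.
    return (x & 1) ^ ((x >> 1) & 1)


if __name__ == "__main__":
    for a in range(1, 4):
        print(a, confusion_coefficient(f_and, 2, a), confusion_coefficient(f_xor, 2, a))
```

For the linear function `f_xor`, the value is either 0 or 1 depending on the shift `a`, while the nonlinear `f_and` gives 0.5 for every nonzero shift in this toy case.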
Re-Ranking Approach of Spoken Term Detection Using Conditional Random Fields-Based Triphone Detection
Naoki SAWADA Hiromitsu NISHIZAKI

PAPER-Spoken term detection

Publicized:
2016/07/19
Vol:
E99-D No:10
Page(s):
2518-2527
This study proposes a two-pass spoken term detection (STD) method. The first pass uses a phoneme-based dynamic time warping (DTW)-based STD, and the second pass recomputes detection scores produced by the first pass using conditional random fields (CRF)-based triphone detectors. In the second pass, we treat STD as a sequence labeling problem. We use CRF-based triphone detection models based on features generated from multiple types of phoneme-based transcriptions. The models learn recognition error patterns such as phoneme-to-phoneme confusions in the CRF framework. Consequently, the models can detect a triphone comprising a query term with a detection probability. In the experimental evaluation of two types of test collections, the CRF-based approach worked well in the re-ranking process for the DTW-based detections. CRF-based re-ranking showed 2.1% and 2.0% absolute improvements in F-measure for each of the two test collections.
Error Correction Using Long Context Match for Smartphone Speech Recognition
Yuan LIANG Koji IWANO Koichi SHINODA

PAPER-Speech and Hearing

Publicized:
2015/07/31
Vol:
E98-D No:11
Page(s):
1932-1942
Most error correction interfaces for speech recognition applications on smartphones require the user to first mark an error region and choose the correct word from a candidate list. We propose a simple multimodal interface to make the process more efficient. We develop Long Context Match (LCM) to get candidates that complement the conventional word confusion network (WCN). Assuming that not only the preceding words but also the succeeding words of the error region are validated by users, we use such contexts to search higher-order n-gram corpora for matching word sequences. For this purpose, we also utilize Web text data. Furthermore, we propose a combination of LCM and WCN (“LCM + WCN”) to provide users with candidate lists that are more relevant than those yielded by WCN alone. We compare our interface with the WCN-based interface on the Corpus of Spontaneous Japanese (CSJ). Our proposed “LCM + WCN” method improved the 1-best accuracy by 23%, improved the Mean Reciprocal Rank (MRR) by 28%, and our interface reduced the user's load by 12%.
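The Mean Reciprocal Rank mentioned in this abstract can be sketched as follows. This is the standard textbook form of MRR, not code from the paper; the function name and the toy candidate lists are hypothetical, and a reference word missing from a list is assumed to contribute 0.

```python
def mean_reciprocal_rank(candidate_lists, references):
    """Average of 1/rank of the correct word over all error regions.

    candidate_lists: one ranked candidate list per error region (best first).
    references: the correct word for each error region.
    A reference absent from its list contributes 0 (an assumption of this sketch).
    """
    total = 0.0
    for candidates, reference in zip(candidate_lists, references):
        if reference in candidates:
            total += 1.0 / (candidates.index(reference) + 1)
    return total / len(references)


if __name__ == "__main__":
    lists = [["a", "b"], ["x", "y", "z"], ["q"]]
    refs = ["a", "z", "k"]
    print(mean_reciprocal_rank(lists, refs))
```

A candidate list that places the correct word higher raises the score, which is why MRR suits comparing candidate lists from different methods.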
Robust and Fast Phonetic String Matching Method for Lyric Searching Based on Acoustic Distance
Xin XU Tsuneo KATO

PAPER-Music Information Processing

Vol:
E97-D No:9
Page(s):
2501-2509
This paper proposes a robust and fast lyric search method for music information retrieval (MIR). The effectiveness of lyric search systems based on full-text retrieval engines or web search engines is highly compromised when the queries of lyric phrases contain incorrect parts due to mishearing. To improve the robustness of the system, the authors introduce acoustic distance, which is computed based on a confusion matrix of an automatic speech recognition experiment, into Dynamic-Programming (DP)-based phonetic string matching to identify the songs that the misheard lyric phrases refer to. An evaluation experiment verified that the search accuracy is increased by 4.4% compared with the conventional method. Furthermore, in this paper a two-pass search algorithm is proposed to realize real-time execution. The algorithm pre-selects the probable candidates using a rapid index-based search in the first pass and executes a DP-based search process with an adaptive termination strategy in the second pass. Experimental results show that the proposed search method reduced processing time by more than 86.2% compared with the conventional methods for the same search accuracy.
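The idea of DP-based phonetic string matching with a confusion-derived distance can be sketched as a weighted edit distance. This is a minimal sketch under stated assumptions, not the authors' algorithm: the substitution costs in `sub_cost` are hypothetical stand-ins for an acoustic distance derived from a confusion matrix, and insertions and deletions use a fixed cost.

```python
def phonetic_distance(query, target, sub_cost, indel_cost=1.0):
    """Weighted edit distance between two phoneme sequences.

    sub_cost maps (query_phoneme, target_phoneme) to a substitution cost;
    pairs not listed fall back to 1.0. Easily confused phonemes should get
    a small cost so that misheard queries stay close to the true lyrics.
    """
    rows, cols = len(query) + 1, len(target) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * indel_cost
    for j in range(1, cols):
        d[0][j] = j * indel_cost
    for i in range(1, rows):
        for j in range(1, cols):
            p, q = query[i - 1], target[j - 1]
            substitution = 0.0 if p == q else sub_cost.get((p, q), 1.0)
            d[i][j] = min(
                d[i - 1][j] + indel_cost,        # delete a query phoneme
                d[i][j - 1] + indel_cost,        # insert a target phoneme
                d[i - 1][j - 1] + substitution,  # match or substitute
            )
    return d[-1][-1]


if __name__ == "__main__":
    # Hypothetical cost: "b" misheard as "p" is cheap.
    costs = {("b", "p"): 0.2}
    print(phonetic_distance(["b", "a"], ["p", "a"], costs))
    print(phonetic_distance(["b", "a"], ["k", "a"], costs))
```

In a two-pass setup such as the one described above, a function like this would only be run on the candidates that survive the faster first-pass index search.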
Improving the Readability of ASR Results for Lectures Using Multiple Hypotheses and Sentence-Level Knowledge
Yasuhisa FUJII Kazumasa YAMAMOTO Seiichi NAKAGAWA

PAPER-Speech and Hearing

Vol:
E95-D No:4
Page(s):
1101-1111
This paper presents a novel method for improving the readability of automatic speech recognition (ASR) results for classroom lectures. Because speech in a classroom is spontaneous and contains many ill-formed utterances with various disfluencies, the ASR result should be edited to improve the readability before presenting it to users, by applying operations such as removing disfluencies, determining sentence boundaries, inserting punctuation marks and repairing dropped words. Owing to the presence of many kinds of domain-dependent words and casual styles, even state-of-the-art recognizers can only achieve a 30-50% word error rate for speech in classroom lectures. Therefore, a method for improving the readability of ASR results needs to be robust to recognition errors. We can use multiple hypotheses instead of the single-best hypothesis as a method to achieve a robust response to recognition errors. However, if the multiple hypotheses are represented by a lattice (or a confusion network), it is difficult to utilize sentence-level knowledge, such as chunking and dependency parsing, which is imperative for determining the discourse structure and therefore for improving readability. In this paper, we propose a novel algorithm that infers clean, readable transcripts from spontaneous multiple hypotheses represented by a confusion network while integrating sentence-level knowledge. Automatic and manual evaluations showed that using multiple hypotheses and sentence-level knowledge is effective in improving the readability of ASR results, while preserving the understandability.
Formal Detection of Three Automation Surprises in Human-Machine Interaction
Yoshitaka UKAWA Toshimitsu USHIO Masakazu ADACHI Shigemasa TAKAI

PAPER-Concurrent Systems

Vol:
E87-A No:11
Page(s):
2878-2884
In this paper, we propose a formal method for detection of three automation surprises in human-machine interaction: a mode confusion, a refusal state, and a blocking state. The mode confusion arises when a machine is in a different mode from that anticipated by the user, and is the most famous automation surprise. The refusal state is a situation in which the machine does not respond to a command the user executes. The blocking state is a situation where an internal event occurs, leading to a change of the interface that the user is not aware of. In order to detect these phenomena, we propose a composite model in which a machine and a user model evolve concurrently. We show that the detection of these phenomena in human-machine interaction can be reduced to a reachability problem in the composite model.
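The reduction to reachability can be illustrated with a breadth-first search over composite states. This is a toy sketch, not the paper's model: the two-mode machine, the `step` function, and the pairing of machine mode with the user's assumed mode are all hypothetical, and only mode confusion is checked.

```python
from collections import deque


def find_reachable(initial, step, is_bad):
    """Breadth-first search; returns the first reachable bad state, or None."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if is_bad(state):
            return state
        for successor in step(state):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return None


def step(state):
    # Composite state: (actual machine mode, mode the user believes it is in).
    machine, user = state
    flip = {"A": "B", "B": "A"}
    # User command "toggle": the machine and the user's mental model both change.
    successors = [(flip[machine], flip[user])]
    # Internal event: in mode B the machine silently falls back to mode A,
    # and the user model is not updated.
    if machine == "B":
        successors.append(("A", user))
    return successors


def mode_confusion(state):
    machine, user = state
    return machine != user


if __name__ == "__main__":
    print(find_reachable(("A", "A"), step, mode_confusion))
```

Here the search reports `("A", "B")`: the machine has returned to mode A while the user still believes it is in mode B.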
Necessary and Sufficient Condition for Liveness of Asymmetric Choice Petri Nets
Tadashi MATSUMOTO Yasuhiko TSURUTA

PAPER

Vol:
E80-A No:3
Page(s):
521-533
A Petri net is a graphical and mathematical tool for modelling, analysis, verification, and evaluation of discrete event systems. Liveness is one of the most important problems of Petri net analysis. This is concerned with a capability for firing of transitions and can be interpreted as a problem to decide whether the system under consideration is always able to reach a stationary behavior, or to decide whether the system is free from any redundant elements. An asymmetric choice (AC) net is a superclass of useful subclasses such as EFCs, FCs, SMs, and MGs, where SMs admit no synchronization, MGs admit no conflicts, FCs as well as EFCs admit no confusion, and ACs allow asymmetric confusion but disallow symmetric confusion. It is known that an AC net N is live iff it is place-live, but this is not the "initial-marking-based" condition and place-liveness is in general hard to test. For the initial-marking-based liveness for AC nets, it is only known that an AC net N is live if (but not only if) every deadlock in N contains a marked structural trap.
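The sufficient condition quoted at the end of this abstract can be checked by brute force on small nets. The sketch below assumes the usual structural definitions: a place set D is a deadlock (siphon) if every transition that puts tokens into D also takes tokens from D, and a trap if every transition that takes tokens from D also puts tokens into D. The dictionary-based net encoding and all function names are hypothetical, and the subset enumeration is exponential in the number of places.

```python
from itertools import combinations


def input_transitions(places, post):
    # Transitions that produce a token in at least one place of the set.
    return {t for t, outputs in post.items() if outputs & places}


def output_transitions(places, pre):
    # Transitions that consume a token from at least one place of the set.
    return {t for t, inputs in pre.items() if inputs & places}


def is_deadlock(places, pre, post):
    return input_transitions(places, post) <= output_transitions(places, pre)


def is_trap(places, pre, post):
    return output_transitions(places, pre) <= input_transitions(places, post)


def nonempty_subsets(places):
    ordered = sorted(places)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


def every_deadlock_has_marked_trap(places, pre, post, marking):
    """True if each structural deadlock contains a trap holding a token."""
    for candidate in nonempty_subsets(places):
        if not is_deadlock(candidate, pre, post):
            continue
        has_marked_trap = any(
            is_trap(sub, pre, post) and any(marking.get(p, 0) > 0 for p in sub)
            for sub in nonempty_subsets(candidate)
        )
        if not has_marked_trap:
            return False
    return True


if __name__ == "__main__":
    # Toy cycle: p1 -> t1 -> p2 -> t2 -> p1.
    pre = {"t1": {"p1"}, "t2": {"p2"}}
    post = {"t1": {"p2"}, "t2": {"p1"}}
    print(every_deadlock_has_marked_trap({"p1", "p2"}, pre, post, {"p1": 1}))
    print(every_deadlock_has_marked_trap({"p1", "p2"}, pre, post, {}))
```

With one token on `p1` the only deadlock `{p1, p2}` is itself a marked trap, so the check succeeds; with an empty marking it fails.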
A Study on Speaker Adaptation for Mandarin Syllable Recognition with Minimum Error Discriminative Training
Chih-Heng LIN Chien-Hsing WU Pao-Chung CHANG

PAPER

Vol:
E78-D No:6
Page(s):
712-718
This paper investigates a different method of speaker adaptation for Mandarin syllable recognition. Based on the minimum classification error (MCE) criterion, we use the generalized probabilistic descent (GPD) algorithm to iteratively adjust the parameters of the hidden Markov models (HMM). The experiments on the multi-speaker Mandarin syllable database of Telecommunication Laboratories (T.L.) yield the following results: 1) Efficient speaker adaptation can be achieved through discriminative training using the MCE criterion and the GPD algorithm. 2) The computations required can be reduced through the use of the confusion sets in Mandarin base syllables. 3) For the discriminative training, the adjustment on the mean values of the Gaussian mixtures has the most prominent effect on speaker adaptation. 4) The discriminative training approach can be used to enhance the speaker adaptation capability of the maximum a posteriori (MAP) approach.
