Kazunori KOMATANI, Mikio NAKANO, Masaki KATSUMARU, Kotaro FUNAKOSHI, Tetsuya OGATA, Hiroshi G. OKUNO
The optimal way to build speech understanding modules depends on the amount of training data available. When only a small amount of training data is available, effective allocation of the data is crucial to preventing overfitting of statistical methods. We have developed a method for allocating a limited amount of training data in accordance with the amount available. Our speech understanding framework is based on combining multiple models, i.e., multiple automatic speech recognition (ASR) modules and multiple language understanding (LU) modules; our method exploits the rule-based methods included in this framework when the amount of data is small, and allocates training data preferentially to the modules that dominate the overall performance of speech understanding. Experimental evaluation showed that our allocation method consistently outperforms baseline methods that use a single ASR module and a single LU module as the amount of training data increases.
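The preferential allocation idea can be sketched as a greedy loop: each training batch is handed to whichever module most improves an overall held-out score. This is only a minimal illustration of the allocation strategy; the module names and the scoring function below are assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: greedily assign each training batch to the module
# whose addition most improves an overall evaluation score.
def allocate_batches(batches, modules, score):
    allocation = {m: [] for m in modules}
    for batch in batches:
        def gain(module):
            # score the allocation as if this module received the batch
            trial = {k: list(v) for k, v in allocation.items()}
            trial[module].append(batch)
            return score(trial)
        best = max(modules, key=gain)
        allocation[best].append(batch)
    return allocation

# Toy score (an assumption): the LU module dominates overall performance,
# with diminishing returns modeled by a square root, so it should end up
# with more of the data.
def toy_score(alloc):
    return 2.0 * len(alloc["lu"]) ** 0.5 + 1.0 * len(alloc["asr"]) ** 0.5

result = allocate_batches(range(8), ["asr", "lu"], toy_score)
```

With this toy score, the dominant LU module receives the larger share of the eight batches, which mirrors the preference described in the abstract.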
Kazunori KOMATANI, Naoki HOTTA, Satoshi SATO, Mikio NAKANO
Appropriate turn-taking is as important in spoken dialogue systems as generating correct responses. In particular, when the dialogue features quick responses, voice activity detection (VAD) often incorrectly segments a user utterance at short pauses within it. Incorrectly segmented utterances cause problems both in the automatic speech recognition (ASR) results and in turn-taking: an incorrect VAD result leads to ASR errors and causes the system to start responding while the user is still speaking. We developed a method that performs a posteriori restoration of incorrectly segmented utterances and implemented it as a plug-in for the open-source software MMDAgent. A crucial part of the method is classifying whether restoration is required. We cast this as a binary classification problem of detecting originally single utterances from pairs of utterance fragments, using various features that represent timing, prosody, and ASR result information. Experiments show that the proposed method outperformed a baseline with manually selected features by 4.8% and 3.9% in cross-domain evaluations with two domains. More detailed analysis revealed that the dominant, domain-independent features were utterance intervals and results from the Gaussian mixture model (GMM).
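The restoration decision above can be illustrated as a binary classifier over a fragment pair. The sketch below is an assumption-laden toy: the feature set, weights, and the half-second duration threshold are illustrative stand-ins, whereas the paper learns the classifier from data using much richer timing, prosody, and ASR features.

```python
import math

# Hypothetical sketch of the restore-or-not decision as binary classification.
# Features and weights are illustrative assumptions, not the paper's model.

def extract_features(frag_a, frag_b):
    """frag_* are assumed dicts with 'start' and 'end' times in seconds."""
    return {
        # pause between the end of the first fragment and start of the second
        "interval": frag_b["start"] - frag_a["end"],
        # very short first fragments are often cut-off utterances
        "short_first": 1.0 if (frag_a["end"] - frag_a["start"]) < 0.5 else 0.0,
    }

def should_restore(frag_a, frag_b, weights, bias):
    feats = extract_features(frag_a, frag_b)
    z = bias + sum(weights[k] * v for k, v in feats.items())
    return 1.0 / (1.0 + math.exp(-z)) > 0.5  # logistic decision rule

# Illustrative weights: a short inter-fragment pause favors restoration.
W = {"interval": -5.0, "short_first": 1.0}
merge = should_restore({"start": 0.2, "end": 1.0}, {"start": 1.1, "end": 2.0}, W, bias=1.0)
keep = should_restore({"start": 0.2, "end": 1.0}, {"start": 2.0, "end": 3.0}, W, bias=1.0)
```

The first pair, separated by a 0.1 s pause, is merged back into a single utterance, while the second pair, separated by a full second, is kept as two utterances.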
Yuko OZASA, Mikio NAKANO, Yasuo ARIKI, Naoto IWAHASHI
This paper deals with the problem of a robot identifying an object that a human asks it, by voice, to bring when there is a set of objects that both the human and the robot can see. When the robot knows the requested object, it must identify it; when it does not know the object, it must say so. This paper presents a new method for discriminating unknown objects from known objects using object images and human speech. It uses a confidence measure that integrates image recognition confidences and speech recognition confidences based on logistic regression.
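A minimal sketch of such an integrated confidence measure is shown below. The logistic-regression weights, bias, and threshold are illustrative assumptions; in the paper they would be learned from data.

```python
import math

# Sketch of a logistic-regression combination of image and speech
# recognition confidences. Parameter values are illustrative assumptions.

def integrated_confidence(image_conf, speech_conf, w_img=4.0, w_sp=4.0, bias=-4.0):
    z = bias + w_img * image_conf + w_sp * speech_conf
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function

def is_known_object(image_conf, speech_conf, threshold=0.5):
    # below the threshold, the robot should say it does not know the object
    return integrated_confidence(image_conf, speech_conf) > threshold

known = is_known_object(0.9, 0.9)    # both recognizers are confident
unknown = is_known_object(0.2, 0.2)  # both recognizers are unsure
```

With these toy weights, an object is judged known only when the combined evidence from both modalities is strong, which is the behavior the abstract describes.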
Ryuichiro HIGASHINAKA, Mikio NAKANO
This paper discusses the discourse understanding process in spoken dialogue systems. This process enables a system to understand user utterances from the context of a dialogue. Ambiguity in user utterances caused by multiple speech recognition hypotheses and parsing results sometimes makes it difficult for a system to decide on a single interpretation of a user intention. As a solution, the idea of retaining possible interpretations as multiple dialogue states and resolving the ambiguity using succeeding user utterances has been proposed. Although this approach has proven to improve discourse understanding accuracy, carefully created hand-crafted rules are necessary in order to accurately rank the dialogue states. This paper proposes automatically ranking multiple dialogue states using statistical information obtained from dialogue corpora. The experimental results in the train ticket reservation and weather information service domains show that the statistical information can significantly improve the ranking accuracy of dialogue states as well as the slot accuracy and the concept error rate of the top-ranked dialogue states.
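The ranking step can be illustrated with a simple scoring rule that weights each candidate dialogue state by its ASR confidence and a corpus-derived frequency of its interpretation. This is a hypothetical stand-in for the statistical information the paper uses; the state fields, counts, and scoring formula are all illustrative assumptions.

```python
# Hypothetical sketch of ranking multiple dialogue states with corpus
# statistics: score = ASR confidence * corpus frequency of interpretation.

def rank_states(states, corpus_counts, corpus_total):
    def score(state):
        prior = corpus_counts.get(state["interpretation"], 0) / corpus_total
        return state["asr_confidence"] * prior
    return sorted(states, key=score, reverse=True)

# Two competing interpretations of an ambiguous utterance in the train
# ticket reservation domain (toy example).
states = [
    {"interpretation": "from=Tokyo", "asr_confidence": 0.6},
    {"interpretation": "to=Tokyo", "asr_confidence": 0.7},
]
# Assumed corpus statistics: "from=Tokyo" is far more frequent here.
ranked = rank_states(states, {"from=Tokyo": 90, "to=Tokyo": 10}, 100)
```

Even though the second hypothesis has slightly higher ASR confidence, the corpus prior promotes the first interpretation to the top rank.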
Ryo NAGATA, Kotaro FUNAKOSHI, Tatsuya KITAMURA, Mikio NAKANO
To acquire a second language, one must develop an ear and tongue for the correct stress and intonation patterns of that language. In English language teaching, there is an effective method called Jazz Chants for working on the sound system. In this paper, we propose a method for predicting stressed words, which play a crucial role in Jazz Chants. The proposed method is specially designed for stress prediction in Jazz Chants. It exploits several sources of information, including words, POSs, sentence types, and the constraint on the number of stressed words in a chant text. Experiments show that the proposed method achieves an F-measure of 0.939 and outperforms the other methods implemented for comparison. The proposed method is expected to be useful in supporting non-native teachers of English when they teach chants to students and create chant texts with stress marks from arbitrary texts.
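The interaction between word-level scores and the stressed-word constraint can be sketched as follows. The POS tagset, the scores, and the tie-breaking rule are illustrative assumptions; the actual method combines richer information, including sentence types.

```python
# Hypothetical sketch of constrained stress prediction: score each word
# (content words higher than function words) and keep the top-k positions
# under the per-chant limit on the number of stressed words.

CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}  # assumed content-word tags

def predict_stressed_positions(tagged_words, max_stressed):
    # (score, -index, index): the negated index breaks score ties in favor
    # of earlier words
    scored = [(1.0 if pos in CONTENT_POS else 0.1, -i, i)
              for i, (word, pos) in enumerate(tagged_words)]
    chosen = sorted(scored, reverse=True)[:max_stressed]
    return sorted(i for _, _, i in chosen)

chant = [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB"),
         ("on", "ADP"), ("the", "DET"), ("mat", "NOUN")]
stressed = predict_stressed_positions(chant, 3)  # positions of stressed words
```

For the toy line above, the three content words ("cat", "sat", "mat") are selected, respecting the constraint of three stressed words per line.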