IEICE global.ieice.org Site

Keyword Search Result

[Keyword] speech translation (8 hits)

Neural Incremental Speech Recognition Toward Real-Time Machine Speech Translation
Sashi NOVITASARI Sakriani SAKTI Satoshi NAKAMURA

PAPER-Speech and Hearing

Publicized: 2021/08/27
Vol: E104-D No:12
Page(s): 2195-2208
Real-time machine speech translation systems mimic human interpreters, translating incoming speech from a source language into a target language in real time. Such systems can be achieved by performing low-latency processing in the ASR (automatic speech recognition) module before passing its output to the MT (machine translation) and TTS (text-to-speech synthesis) modules. Although several recent studies have proposed sequence mechanisms for neural incremental ASR (ISR), these frameworks have a more complicated training mechanism than standard attention-based ASR because they must decide the incremental step and learn the alignment between speech and text. In this paper, we propose attention-transfer ISR (AT-ISR), which learns from an attention-based non-incremental ASR to achieve low-delay end-to-end speech recognition. ISR involves a trade-off between delay and performance, so we investigate how to reduce AT-ISR delay without a significant performance drop. Our experiment shows that AT-ISR achieves performance comparable to that of the non-incremental ASR when incremental recognition begins after 25% of the complete utterance length has been received. We also perform additional experiments to investigate the effect of ISR on translation tasks, focusing on finding the optimum granularity of the output unit. The results reveal that our end-to-end subword-level ISR yields the best translation quality, with the lowest WER and the lowest uncovered-word rate.
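The abstract compares output units by WER. As a reminder of how that metric is conventionally defined, here is a minimal sketch: word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. It is not the authors' scoring script, the function name `word_error_rate` is hypothetical, and the uncovered-word rate is not sketched because the abstract does not define it.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dist[i][j] holds the edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    # Guard against an empty reference
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)


print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

In the usage line, "sit" is one substitution and the missing "the" is one deletion, so the result is 2/6.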
Development of the “VoiceTra” Multi-Lingual Speech Translation System Open Access
Shigeki MATSUDA Teruaki HAYASHI Yutaka ASHIKARI Yoshinori SHIGA Hidenori KASHIOKA Keiji YASUDA Hideo OKUMA Masao UCHIYAMA Eiichiro SUMITA Hisashi KAWAI Satoshi NAKAMURA

INVITED PAPER

Publicized: 2017/01/13
Vol: E100-D No:4
Page(s): 621-632
This study introduces large-scale field experiments with VoiceTra, the world's first multilingual speech-to-speech translation application for smartphones. Approximately 10 million input utterances have been collected since the experiments commenced, and the usage of the collected data is analyzed and discussed. The study makes several important contributions. First, it explains the system configuration, the communication protocol between clients and servers, and the details of the multilingual automatic speech recognition, machine translation, and speech synthesis subsystems. Second, it demonstrates the effects of mid-term system updates that use the collected data to improve the acoustic model, the language model, and the dictionary. Third, it analyzes system usage.
Consolidation-Based Speech Translation and Evaluation Approach
Chiori HORI Bing ZHAO Stephan VOGEL Alex WAIBEL Hideki KASHIOKA Satoshi NAKAMURA

PAPER-Speech and Hearing

Vol: E92-D No:3
Page(s): 477-488
The performance of speech translation systems that combine automatic speech recognition (ASR) and machine translation (MT) is degraded by redundant and irrelevant information caused by speaker disfluencies and recognition errors. This paper proposes a new approach to translating speech recognition results through speech consolidation, which removes ASR errors and disfluencies and extracts meaningful phrases. The consolidation approach is derived from speech summarization by word extraction from the ASR 1-best hypothesis. We extended the consolidation approach to confusion networks (CNs), tested its performance on TED speech, and confirmed that the consolidation results preserved more meaningful phrases than the original ASR results. We then applied the consolidation technique to speech translation. To test the performance of consolidation-based speech translation, Chinese broadcast news (BN) speech from RT04 was recognized, consolidated, and then translated. Speech translation results obtained via consolidation cannot be directly compared with gold standards in which all words in the speech are translated, because consolidation-based translations are partial translations. We therefore propose a new evaluation framework for partial translation that compares each result with the most similar set of words extracted from a word network created by merging gradual summarizations of the gold-standard translation. The performance of the consolidation-based MT results was evaluated using BLEU. We also propose Information Preservation Accuracy (IPAccy) and Meaning Preservation Accuracy (MPAccy) to evaluate consolidation and consolidation-based MT. We confirmed that consolidation contributed to the performance of speech translation.
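BLEU, used above to score the consolidation-based MT output, can be sketched as follows. This is a simplified, assumed variant (single reference, whitespace tokenization, uniform weights up to 4-grams, no smoothing), not the configuration used in the paper, and `sentence_bleu` is a hypothetical name. Its brevity penalty illustrates one mechanical reason a partial translation scores poorly against a full reference translation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference: str, hypothesis: str, max_n: int = 4) -> float:
    """Unsmoothed single-reference BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n) multiplied by a brevity penalty."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clipping: a hypothesis n-gram is credited at most as often as it occurs in the reference
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        if matches == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precision_sum += math.log(matches / sum(hyp_counts.values()))
    # Brevity penalty: hypotheses shorter than the reference are penalized
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_precision_sum / max_n)


print(sentence_bleu("the cat sat on the mat", "the cat sat on"))
```

Every n-gram in "the cat sat on" is correct, yet the score is only exp(-0.5) because the hypothesis covers four of the six reference words. Corpus-level BLEU, which is what is usually reported, aggregates the counts over all sentences before taking the ratios.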
Training Set Selection for Building Compact and Efficient Language Models
Keiji YASUDA Hirofumi YAMAMOTO Eiichiro SUMITA

PAPER-Natural Language Processing

Vol: E92-D No:3
Page(s): 506-511
Statistical language model training requires corpora that match the target domain. However, training corpora sometimes include both sentences that match the target domain and sentences that do not. In such cases, training set selection is effective both for reducing model size and for improving model performance. In this paper, a training set selection method for statistical language model training is described. The method provides two advantages: it can improve language model performance, and it can reduce the computational load of the language model. The method has four steps: 1) sentence clustering is applied to all available corpora; 2) a language model is trained on each cluster; 3) the perplexity of the development set is calculated with each of these language models; and 4) the clusters whose language models yield low perplexities are used for the final language model training. The experimental results indicate that the language model trained on the data selected by our method gives lower perplexity on an open test set than a language model trained on all available corpora.
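Steps 2 to 4 of the selection procedure can be sketched as below, assuming step 1 (sentence clustering) has already produced a mapping from cluster names to sentences. An add-one-smoothed unigram model stands in for whatever statistical language model the authors used, and the `keep` parameter (how many low-perplexity clusters to retain) is a hypothetical simplification of "the clusters whose language models yield low perplexities". All function names are illustrative.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Add-one-smoothed unigram LM; all unseen words share one <unk> slot."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves probability mass for <unk>
    return lambda w: (counts[w] + 1) / (total + vocab_size)


def perplexity(prob, sentences):
    """Per-word perplexity of `sentences` under the unigram model `prob`."""
    words = [w for s in sentences for w in s.split()]
    log_sum = sum(math.log(prob(w)) for w in words)
    return math.exp(-log_sum / len(words))


def select_training_set(clusters, dev_set, keep=1):
    """Step 2: train one LM per cluster. Step 3: score the dev set.
    Step 4: keep the `keep` clusters with the lowest dev-set perplexity."""
    ranked = sorted(
        clusters.items(),
        key=lambda item: perplexity(train_unigram(item[1]), dev_set),
    )
    selected = [name for name, _ in ranked[:keep]]
    training_data = [s for name in selected for s in clusters[name]]
    return selected, training_data


clusters = {
    "travel": ["where is the station", "how much is the ticket",
               "is the hotel near the station"],
    "news": ["the minister announced new policy", "stocks fell sharply today"],
}
dev_set = ["where is the hotel", "how much is the station ticket"]
selected, training_data = select_training_set(clusters, dev_set)
print(selected)
```

With a travel-domain development set, the travel cluster's model assigns higher probability to the dev words, so it has the lower perplexity and is the one selected for final training.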
An Objective Method for Evaluating Speech Translation System: Using a Second Language Learner's Corpus
Keiji YASUDA Fumiaki SUGAYA Toshiyuki TAKEZAWA Genichiro KIKUI Seiichi YAMAMOTO Masuzo YANAGIDA

PAPER-Speech Corpora and Related Topics

Vol: E88-D No:3
Page(s): 569-577
In this paper, we propose an objective method for assessing the capability of a speech translation system. It automates the translation paired comparison method proposed by Sugaya et al., which succinctly expresses a speech translation system's capability as a simple, easy-to-understand TOEIC score. The original method requires a large manual effort and is therefore expensive; the new method avoids this cost by automating the procedure with objective metrics such as BLEU and a DP-based measure. The evaluation results obtained by the proposed method are similar to those of the original method. The proposed method is also used to evaluate the usefulness of a speech translation system, and we find that our speech translation system is useful in general, even for users with higher TOEIC scores than the system's.
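Our understanding of the underlying translation paired comparison method, stated here as an assumption rather than the paper's exact procedure: the system's per-sentence win rate against examinees of known TOEIC scores is regressed on those scores, and the system's TOEIC-equivalent score is where the regression line crosses a 50% win rate. The sketch below shows that regression, with each win/lose decision made by comparing automatic per-sentence scores (e.g. BLEU), which is the step the proposed method automates. The function names and the half-win rule for ties are hypothetical.

```python
def win_rate(system_scores, examinee_scores):
    """Fraction of sentences on which the system's automatic score beats
    the examinee's; ties count as half a win (an assumption)."""
    wins = 0.0
    for sys_score, exam_score in zip(system_scores, examinee_scores):
        if sys_score > exam_score:
            wins += 1.0
        elif sys_score == exam_score:
            wins += 0.5
    return wins / len(system_scores)


def estimate_system_toeic(toeic_scores, win_rates):
    """Fit win_rate = intercept + slope * toeic by least squares and return
    the TOEIC score at which the fitted win rate equals 0.5."""
    n = len(toeic_scores)
    mean_x = sum(toeic_scores) / n
    mean_y = sum(win_rates) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(toeic_scores, win_rates))
    sxx = sum((x - mean_x) ** 2 for x in toeic_scores)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return (0.5 - intercept) / slope


# Hypothetical examinees: TOEIC scores and the system's win rate against each
toeic = [300, 500, 700, 900]
rates = [0.8, 0.6, 0.4, 0.2]
print(estimate_system_toeic(toeic, rates))
```

The slope is expected to be negative, since the system wins less often against stronger examinees; the crossing point marks the TOEIC level at which system and examinee are evenly matched (about 600 for the made-up numbers above).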
A Speech Translation System Applied to a Real-World Task/Domain and Its Evaluation Using Real-World Speech Data
Atsushi NAKAMURA Masaki NAITO Hajime TSUKADA Rainer GRUHN Eiichiro SUMITA Hideki KASHIOKA Hideharu NAKAJIMA Tohru SHIMIZU Yoshinori SAGISAKA

PAPER-Speech and Hearing

Vol: E84-D No:1
Page(s): 142-154
This paper describes the application of a speech translation system to another real-world task/domain by using development data collected from real-world interactions. The total cost of this task alteration was calculated to be 9 person-months. The newly applied system was also evaluated using speech data collected from real-world interactions. For real-world speech with a machine-friendly speaking style, the newly applied system recognized typical sentences with a word accuracy of 90% or better. We also found that, in terms of overall speech translation performance, the system could translate about 80% of the input Japanese speech into acceptable English sentences.
A Unification-Based Japanese Parser for Speech-to-Speech Translation
Masaaki NAGATA Tsuyoshi MORIMOTO

PAPER

Vol: E76-D No:1
Page(s): 51-61
A unification-based Japanese parser has been implemented for an experimental Japanese-to-English spoken language translation system (SL-TRANS). The parser consists of a unification-based spoken-style Japanese grammar and an active chart parser. The grammar handles syntactic, semantic, and pragmatic constraints in an integrated fashion using an HPSG-based framework in order to cope with speech recognition errors. The parser takes multiple sentential candidates from the HMM-LR speech recognizer and produces a semantic representation associated with the best-scoring parse based on acoustic and linguistic plausibility. The unification-based parser was tested using 12 dialogues in the conference registration domain, comprising 261 sentences uttered by one male speaker. The sentence recognition accuracy of the underlying speech recognizer is 73.6% for the top candidate and 83.5% for the top three candidates, where the test-set perplexity of the CFG grammar is 65. By ruling out erroneous speech recognition results using various linguistic constraints, the parser improves sentence recognition accuracy to 81.6% for the top candidate and 85.8% for the top three candidates. From the experimental results, we found that the combination of syntactic restriction, selectional restriction, and coordinate structure restriction can provide sufficient constraint to rule out recognition errors between case-marking particles with the same vowel, which are the most likely type of error. However, we also found that pragmatic information, such as topic, presupposition, and discourse structure, is necessary to rule out recognition errors involving topicalizing particles and sentence-final particles.
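Unification, the core operation of the parser's grammar, can be illustrated with a minimal sketch: feature structures as nested dicts with atomic string values, merged recursively, with failure whenever a shared feature has conflicting atomic values. Real HPSG implementations also need structure sharing (reentrancy) and typed features, which this sketch omits, and the feature names in the example are hypothetical.

```python
from typing import Any, Optional


def unify(a: Any, b: Any) -> Optional[Any]:
    """Unify two feature structures (nested dicts with string leaves).
    Returns a new merged structure, or None if any values conflict."""
    if isinstance(a, str) or isinstance(b, str):
        # Atomic values unify only if they are identical
        return a if a == b else None
    result = dict(a)  # shallow copy so the inputs are not modified
    for feature, b_value in b.items():
        if feature in result:
            merged = unify(result[feature], b_value)
            if merged is None:
                return None  # conflict anywhere makes the whole unification fail
            result[feature] = merged
        else:
            result[feature] = b_value  # feature only in b: just add it
    return result


# Hypothetical verb frame expecting an object marked with the particle "wo"
verb_frame = {"head": {"pos": "verb"}, "subcat": {"obj": {"case": "wo"}}}
candidate_ok = {"subcat": {"obj": {"case": "wo", "sem": "thing"}}}
candidate_bad = {"subcat": {"obj": {"case": "ni"}}}

print(unify(verb_frame, candidate_ok))
# {'head': {'pos': 'verb'}, 'subcat': {'obj': {'case': 'wo', 'sem': 'thing'}}}
print(unify(verb_frame, candidate_bad))  # None: conflicting "case" values
```

A recognition candidate whose particle yields a case value that conflicts with the verb's frame fails to unify, which is the sense in which grammatical constraints can rule out erroneous recognizer hypotheses.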
Future Perspective of Automatic Telephone Interpretation
Akira KUREMATSU

INVITED PAPER

Vol: E75-B No:1
Page(s): 14-19
This paper describes the future perspective of automatic telephone interpretation using a multimedia intelligent communication network. The need for language interpretation over telecommunication systems creates a strong drive toward integrating voice, image, data, computation, and conferencing modalities into modern systems with language interpretation capability. An automatic telephone interpretation system will solve the problems of language differences in international human-to-human communication. The future prospects of advanced multimedia language communication are presented as versatile applications of an integrated intelligent network.