IEICE global.ieice.org Site

Keyword Search Result

[Keyword] spoken language translation(2hit)

1-2hit

Neural End-To-End Speech Translation Leveraged by ASR Posterior Distribution (Open Access)
Yuka KO Katsuhito SUDOH Sakriani SAKTI Satoshi NAKAMURA

PAPER-Speech and Hearing

Publicized: 2024/05/24
Vol: E107-D No: 10
Page(s): 1322-1331
End-to-end speech translation (ST) translates source-language speech directly into the target language, without the intermediate automatic speech recognition (ASR) output used in a cascade approach, and therefore avoids error propagation from intermediate ASR results. Although recent work has applied multi-task learning with an auxiliary ASR task to improve ST performance, the ASR task is trained with cross-entropy (CE) loss against one-hot references, so the trained ST models do not consider possible ASR confusion. In this study, we propose a novel multi-task learning framework for end-to-end ST that uses an ASR-based loss computed against posterior distributions obtained from a pre-trained ASR model; we call this loss the ASR posterior-based loss (ASR-PBL). The ASR-PBL method enables an ST model to reflect possible ASR confusion among competing hypotheses with similar pronunciations, and it can be applied to a strong multi-task ST baseline with a Hybrid CTC/Attention ASR task loss. In our experiments on the Fisher Spanish-to-English corpus, the proposed method achieved better BLEU results than the baseline that used standard CE loss.
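The difference between a CE loss against one-hot references and a loss against ASR posterior distributions can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the three-token vocabulary, and the toy posterior values are assumptions made for illustration, and the abstract does not specify how the posterior-based loss is combined with the other task losses.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def cross_entropy(logits, target_dist):
    """Mean over time steps of -sum_v target[v] * log p[v].

    With a one-hot target this is the standard CE loss; with a soft
    target it penalises the model against a full distribution.
    """
    return float(-(target_dist * log_softmax(logits)).sum(axis=-1).mean())

# Toy setup (hypothetical): 2 time steps, vocabulary of 3 tokens.
reference_ids = np.array([0, 2])
one_hot = np.eye(3)[reference_ids]

# Assumed stand-in for posteriors from a pre-trained ASR model:
# most mass on the reference token, some on confusable tokens.
posterior = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.1, 0.8]])

# Logits of the auxiliary ASR branch (random values for the sketch).
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 3))

hard_loss = cross_entropy(logits, one_hot)    # CE against one-hot references
soft_loss = cross_entropy(logits, posterior)  # CE against posterior distributions
```

With a one-hot target, only the probability assigned to the reference token affects the loss. With a posterior target, probability assigned to competing tokens also contributes, which is one way a model could be made sensitive to ASR confusion.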
Multimedia Spoken Language Translation
Jae-Woo YANG Youngjik LEE Jin-H. KIM

INVITED PAPER

Vol: E79-D No: 6
Page(s): 653-658
This paper is concerned with spoken language translation in a multimedia communication environment. We summarize current research activities by describing various spoken language translation systems and the multimodal technology related to spoken language translation. We propose a spoken language translation system that exploits the multimedia communication environment to overcome the limits caused by imperfect speech recognition. Our approach contrasts with that of most conventional speech translation systems, which restrict their dialogue domains to obtain better speech recognition. We also propose a performance measure for spoken language translation systems, defined as the ratio of the information quantities at each end of the communication. Using this measure, we show that multimedia enhances spoken language translation systems.

