
Keyword Search Result

[Keyword] spoken language translation(2hit)

1-2hit
  • Neural End-To-End Speech Translation Leveraged by ASR Posterior Distribution Open Access

    Yuka KO  Katsuhito SUDOH  Sakriani SAKTI  Satoshi NAKAMURA  

     
    PAPER-Speech and Hearing

      Publicized:
    2024/05/24
      Vol:
    E107-D No:10
      Page(s):
    1322-1331

    End-to-end speech translation (ST) directly renders source language speech to the target language without intermediate automatic speech recognition (ASR) output as in a cascade approach. End-to-end ST avoids error propagation from intermediate ASR results. Although recent attempts have applied multi-task learning using an auxiliary task of ASR to improve ST performance, they use cross-entropy loss to one-hot references in the ASR task, and the trained ST models do not consider possible ASR confusion. In this study, we propose a novel multi-task learning framework for end-to-end ST that uses an ASR-based loss, called ASR posterior-based loss (ASR-PBL), computed against posterior distributions obtained from a pre-trained ASR model. The ASR-PBL method, which enables an ST model to reflect possible ASR confusion among competing hypotheses with similar pronunciations, can be applied to one of the strong multi-task ST baseline models with Hybrid CTC/Attention ASR task loss. In our experiments on the Fisher Spanish-to-English corpus, the proposed method demonstrated better BLEU results than the baseline that used standard CE loss.

  • Multimedia Spoken Language Translation

    Jae-Woo YANG  Youngjik LEE  Jin-H. KIM  

     
    INVITED PAPER

      Vol:
    E79-D No:6
      Page(s):
    653-658

    This paper is concerned with spoken language translation in a multimedia communication environment. We summarize the current research activities by describing various spoken language translation systems and the multimodal technology related to spoken language translation. We propose a spoken language translation system that exploits the multimedia communication environment in order to overcome the limits caused by imperfect speech recognition. Our approach is in contrast to that of most conventional speech translation systems, which limit their dialogue domains to obtain better speech recognition. We also propose a performance measure for spoken language translation systems. Our measure is defined as the ratio of the information quantities at each end of communication. Using this measure, we show that multimedia enhances spoken language translation systems.
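
The first abstract above contrasts cross-entropy against one-hot ASR references with a loss computed against the posterior distribution of a pre-trained ASR model. The following is a minimal sketch of that contrast for a single token position, not the authors' formulation: the three-word vocabulary, the logit values, and all function names are hypothetical and chosen only for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def softmax(logits):
    """Probabilities obtained by exponentiating the log-softmax."""
    return [math.exp(v) for v in log_softmax(logits)]

def cross_entropy(target_probs, student_logits):
    """Cross-entropy between a target distribution and the student's softmax."""
    log_q = log_softmax(student_logits)
    return -sum(p * lq for p, lq in zip(target_probs, log_q))

def one_hot(index, size):
    """One-hot distribution with all mass on `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

# Hypothetical toy vocabulary: two similar-sounding words and one unrelated word.
vocab = ["there", "their", "cat"]

# Hypothetical logits of the ASR branch of the model being trained.
student_logits = [2.0, 1.5, -1.0]

# Hypothetical logits of a pre-trained ASR model (assumed values, not real data).
teacher_logits = [1.8, 1.6, -2.0]

# Standard CE: the reference "there" is the only accepted answer.
hard_loss = cross_entropy(one_hot(0, len(vocab)), student_logits)

# Posterior-based target: mass is shared between the confusable words.
teacher_posterior = softmax(teacher_logits)
soft_loss = cross_entropy(teacher_posterior, student_logits)

print("hard-target loss:", round(hard_loss, 4))
print("soft-target loss:", round(soft_loss, 4))
```

With the one-hot target, only the log-probability of the reference word contributes to the loss. With the posterior target, the student is also rewarded for assigning probability to "their", which is how a soft target can expose confusion between hypotheses with similar pronunciations. How such a term is combined with the ST loss and the Hybrid CTC/Attention ASR loss is not specified in the abstract and is left out here.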