
A Lightweight End-to-End Speech Recognition System on Embedded Devices

Yu WANG, Hiromitsu NISHIZAKI

Summary:

In industry, automatic speech recognition has become a competitive feature for embedded products with limited hardware resources. In this work, we propose a tiny end-to-end speech recognition model that is lightweight and easily deployable on edge platforms. First, instead of sophisticated network structures such as recurrent neural networks or transformers, the proposed model mainly uses convolutional neural networks as its backbone. This ensures that our model is supported by most software development kits for embedded devices. Second, we adopt the basic unit of MobileNet-v3, which performs well in computer vision tasks, and integrate hidden-layer features at different scales, compressing the model to fewer than 1 M parameters while achieving accuracy greater than that of some traditional models. Third, to further reduce CPU computation, we extract acoustic representations directly from 1-dimensional speech waveforms and use a self-supervised learning approach to encourage model convergence. Finally, to cope with relatively weak hardware resources, we use a prefix beam search decoder that dynamically extends the search path with an optimized pruning strategy and an additional initialism language model that captures between-word probabilities in advance, thus avoiding premature pruning of correct words. In our experiments, across a number of evaluation categories, our end-to-end model outperformed several tiny speech recognition models for embedded devices reported in related work.
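As a concrete illustration of the backbone described in the summary, here is a minimal sketch, in PyTorch, of a MobileNet-v3-style inverted-residual block adapted to 1-dimensional speech features. This is not the authors' released code; the class names, channel sizes, expansion ratio, and kernel length are illustrative assumptions.

# Illustrative sketch; not the authors' implementation.
import torch
import torch.nn as nn

class SqueezeExcite1d(nn.Module):
    """Squeeze-and-excitation gate, as used in MobileNet-v3 blocks."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.fc = nn.Sequential(
            nn.Conv1d(channels, hidden, 1),
            nn.ReLU(inplace=True),
            nn.Conv1d(hidden, channels, 1),
            nn.Hardsigmoid(),
        )

    def forward(self, x):
        # Reweight each channel by its globally pooled activation.
        return x * self.fc(self.pool(x))

class InvertedResidual1d(nn.Module):
    """Expand -> depthwise conv -> SE -> project, with a residual skip."""
    def __init__(self, in_ch: int, out_ch: int, expand: int = 4,
                 kernel: int = 5, stride: int = 1):
        super().__init__()
        mid = in_ch * expand
        self.use_skip = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv1d(in_ch, mid, 1, bias=False),          # expansion
            nn.BatchNorm1d(mid),
            nn.Hardswish(),
            nn.Conv1d(mid, mid, kernel, stride=stride,     # depthwise
                      padding=kernel // 2, groups=mid, bias=False),
            nn.BatchNorm1d(mid),
            SqueezeExcite1d(mid),
            nn.Hardswish(),
            nn.Conv1d(mid, out_ch, 1, bias=False),         # projection
            nn.BatchNorm1d(out_ch),
        )

    def forward(self, x):
        y = self.block(x)
        return x + y if self.use_skip else y

The prefix beam search decoder can likewise be sketched as below. This is the standard CTC prefix beam search with simple width- and emission-threshold pruning; the paper's optimized pruning strategy and its initialism language model are not reproduced here, and the function name and parameters are assumptions.

# Illustrative sketch of standard CTC prefix beam search.
from collections import defaultdict

def prefix_beam_search(probs, beam_width=8, blank=0):
    # Each prefix keeps (p_blank, p_non_blank): probability mass of
    # paths ending in a blank vs. ending in the prefix's last symbol.
    beams = {(): (1.0, 0.0)}
    for t in range(len(probs)):
        next_beams = defaultdict(lambda: (0.0, 0.0))
        for prefix, (p_b, p_nb) in beams.items():
            for s, p in enumerate(probs[t]):
                if p < 1e-4:        # cheap emission pruning
                    continue
                if s == blank:      # blank extends both path types
                    nb_b, nb_nb = next_beams[prefix]
                    next_beams[prefix] = (nb_b + (p_b + p_nb) * p, nb_nb)
                elif prefix and s == prefix[-1]:
                    # Repeated symbol: collapses into the same prefix
                    # unless the previous path ended in a blank.
                    nb_b, nb_nb = next_beams[prefix]
                    next_beams[prefix] = (nb_b, nb_nb + p_nb * p)
                    ext = prefix + (s,)
                    eb_b, eb_nb = next_beams[ext]
                    next_beams[ext] = (eb_b, eb_nb + p_b * p)
                else:
                    ext = prefix + (s,)
                    eb_b, eb_nb = next_beams[ext]
                    next_beams[ext] = (eb_b, eb_nb + (p_b + p_nb) * p)
        # Keep only the `beam_width` most probable prefixes.
        beams = dict(sorted(next_beams.items(),
                            key=lambda kv: sum(kv[1]),
                            reverse=True)[:beam_width])
    return max(beams.items(), key=lambda kv: sum(kv[1]))[0]

A typical call would be prefix_beam_search(posteriors, beam_width=8), where posteriors is the (frames x vocabulary) probability matrix produced by the acoustic model, with index 0 reserved for the CTC blank.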

Publication
IEICE TRANSACTIONS on Information and Systems Vol.E106-D No.7 pp.1230-1239
Publication Date
2023/07/01
Publicized
2023/04/13
Online ISSN
1745-1361
DOI
10.1587/transinf.2022EDP7221
Type of Manuscript
PAPER
Category
Speech and Hearing

Authors

Yu WANG
  Streamax Technology Co., Ltd., University of Yamanashi
Hiromitsu NISHIZAKI
  University of Yamanashi

Keyword