Label-Adversarial Jointly Trained Acoustic Word Embedding

Zhaoqi LI; Ta LI; Qingwei ZHAO; Pengyuan ZHANG

doi:10.1587/transinf.2022EDL8012

Summary:

Query-by-example spoken term detection (QbE-STD) is a task of using speech queries to match utterances, and the acoustic word embedding (AWE) method of generating fixed-length representations for speech segments has shown high performance and efficiency in recent work. We propose an AWE training method using a label-adversarial network to reduce the interference information learned during AWE training. Experiments demonstrate that our method achieves significant improvements on multilingual and zero-resource test sets.
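To make the matching step concrete, here is a minimal sketch of how fixed-length embeddings could be compared in a QbE-STD setting. It assumes the embeddings have already been computed by some AWE model. The vectors, segment names, and the choice of cosine similarity are illustrative assumptions, not details taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_segments(query_embedding, segment_embeddings):
    """Return (segment_id, score) pairs sorted by descending similarity."""
    scores = [
        (seg_id, cosine_similarity(query_embedding, emb))
        for seg_id, emb in segment_embeddings.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical 3-dimensional embeddings; real AWEs are much larger.
query = [1.0, 0.0, 1.0]
segments = {
    "seg_a": [0.9, 0.1, 1.1],
    "seg_b": [-1.0, 0.5, 0.0],
    "seg_c": [0.0, 1.0, 0.0],
}
ranking = rank_segments(query, segments)
```

Because every segment is reduced to one vector of the same size, matching a query costs a single vector comparison per candidate segment, which is where the efficiency mentioned in the summary comes from.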

Publication: IEICE TRANSACTIONS on Information Vol.E105-D No.8 pp.1501-1505

Publication Date: 2022/08/01

Publicized: 2022/05/20

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2022EDL8012

Type of Manuscript: LETTER

Category: Speech and Hearing

Authors

Zhaoqi LI
  Chinese Academy of Sciences, University of Chinese Academy of Sciences
Ta LI
  Chinese Academy of Sciences, University of Chinese Academy of Sciences
Qingwei ZHAO
  Chinese Academy of Sciences, University of Chinese Academy of Sciences
Pengyuan ZHANG
  Chinese Academy of Sciences, University of Chinese Academy of Sciences

Keyword

query-by-example, spoken term detection, acoustic word embeddings, gradient reversal layer

Cite this

Zhaoqi LI, Ta LI, Qingwei ZHAO, Pengyuan ZHANG, "Label-Adversarial Jointly Trained Acoustic Word Embedding" in IEICE TRANSACTIONS on Information, vol. E105-D, no. 8, pp. 1501-1505, August 2022, doi: 10.1587/transinf.2022EDL8012.
Abstract: Query-by-example spoken term detection (QbE-STD) is a task of using speech queries to match utterances, and the acoustic word embedding (AWE) method of generating fixed-length representations for speech segments has shown high performance and efficiency in recent work. We propose an AWE training method using a label-adversarial network to reduce the interference information learned during AWE training. Experiments demonstrate that our method achieves significant improvements on multilingual and zero-resource test sets.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022EDL8012/_p

@ARTICLE{e105-d_8_1501,
author={Zhaoqi LI and Ta LI and Qingwei ZHAO and Pengyuan ZHANG},
journal={IEICE TRANSACTIONS on Information},
title={Label-Adversarial Jointly Trained Acoustic Word Embedding},
year={2022},
volume={E105-D},
number={8},
pages={1501--1505},
abstract={Query-by-example spoken term detection (QbE-STD) is a task of using speech queries to match utterances, and the acoustic word embedding (AWE) method of generating fixed-length representations for speech segments has shown high performance and efficiency in recent work. We propose an AWE training method using a label-adversarial network to reduce the interference information learned during AWE training. Experiments demonstrate that our method achieves significant improvements on multilingual and zero-resource test sets.},
keywords={query-by-example, spoken term detection, acoustic word embeddings, gradient reversal layer},
doi={10.1587/transinf.2022EDL8012},
ISSN={1745-1361},
month={August},}

TY - JOUR
TI - Label-Adversarial Jointly Trained Acoustic Word Embedding
T2 - IEICE TRANSACTIONS on Information
SP - 1501
EP - 1505
AU - Zhaoqi LI
AU - Ta LI
AU - Qingwei ZHAO
AU - Pengyuan ZHANG
PY - 2022
DO - 10.1587/transinf.2022EDL8012
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E105-D
IS - 8
JA - IEICE TRANSACTIONS on Information
Y1 - 2022/08/01
AB - Query-by-example spoken term detection (QbE-STD) is a task of using speech queries to match utterances, and the acoustic word embedding (AWE) method of generating fixed-length representations for speech segments has shown high performance and efficiency in recent work. We propose an AWE training method using a label-adversarial network to reduce the interference information learned during AWE training. Experiments demonstrate that our method achieves significant improvements on multilingual and zero-resource test sets.
ER -