Query-by-example spoken term detection (QbE-STD) is the task of using a spoken query to find matching utterances. Acoustic word embedding (AWE) methods, which generate fixed-length representations of speech segments, have shown high performance and efficiency in recent work. We propose an AWE training method that uses a label-adversarial network to reduce the interference information learned during AWE training. Experiments demonstrate that our method achieves significant improvements on multilingual and zero-resource test sets.
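To illustrate why fixed-length representations make QbE-STD efficient, the following is a minimal sketch, not the authors' method. It assumes a hypothetical `embed` function that simply mean-pools frame-level features (a stand-in for a trained AWE encoder) and ranks candidate segments by cosine similarity to the query. The function names and toy data are illustrative assumptions.

```python
import math


def embed(frames):
    """Mean-pool variable-length frame features into one fixed-length vector.

    Assumption: this stands in for a trained AWE encoder, for illustration only.
    """
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_segments(query_frames, segments):
    """Return (segment_id, score) pairs sorted by descending similarity to the query."""
    query_vec = embed(query_frames)
    scored = [
        (seg_id, cosine(query_vec, embed(frames)))
        for seg_id, frames in segments.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: a 2-frame query and two candidate segments of different lengths.
query = [[1.0, 0.0], [0.8, 0.2]]
segments = {
    "seg_a": [[0.9, 0.1], [1.0, 0.0], [0.8, 0.1]],
    "seg_b": [[0.0, 1.0], [0.1, 0.9]],
}
ranking = rank_segments(query, segments)
```

Because every segment maps to a vector of the same size regardless of its duration, matching reduces to one vector comparison per candidate, with no frame-level alignment between query and segment.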
Zhaoqi LI
Chinese Academy of Sciences, University of Chinese Academy of Sciences
Ta LI
Chinese Academy of Sciences, University of Chinese Academy of Sciences
Qingwei ZHAO
Chinese Academy of Sciences, University of Chinese Academy of Sciences
Pengyuan ZHANG
Chinese Academy of Sciences, University of Chinese Academy of Sciences
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Zhaoqi LI, Ta LI, Qingwei ZHAO, Pengyuan ZHANG, "Label-Adversarial Jointly Trained Acoustic Word Embedding" in IEICE TRANSACTIONS on Information,
vol. E105-D, no. 8, pp. 1501-1505, August 2022, doi: 10.1587/transinf.2022EDL8012.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022EDL8012/_p
@ARTICLE{e105-d_8_1501,
author={Zhaoqi LI and Ta LI and Qingwei ZHAO and Pengyuan ZHANG},
journal={IEICE TRANSACTIONS on Information},
title={Label-Adversarial Jointly Trained Acoustic Word Embedding},
year={2022},
volume={E105-D},
number={8},
pages={1501-1505},
keywords={},
doi={10.1587/transinf.2022EDL8012},
ISSN={1745-1361},
month={August},}
TY - JOUR
TI - Label-Adversarial Jointly Trained Acoustic Word Embedding
T2 - IEICE TRANSACTIONS on Information
SP - 1501
EP - 1505
AU - Zhaoqi LI
AU - Ta LI
AU - Qingwei ZHAO
AU - Pengyuan ZHANG
PY - 2022
DO - 10.1587/transinf.2022EDL8012
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E105-D
IS - 8
JA - IEICE TRANSACTIONS on Information
Y1 - August 2022
ER -