In the European Telecommunications Standards Institute (ETSI) Distributed Speech Recognition (DSR) front-end, the distortion introduced by feature compression on the front-end side increases the variance flooring effect, which in turn increases the identification error rate. Reducing the bit rate therefore comes at the cost of degraded speaker recognition performance. In this paper, we present a nontraditional solution to this problem. To reduce the bit rate, the speech signal is segmented at the client, and the phonemes most effective for speaker recognition (determined according to their type and frequency) are selected and sent to the server, where speaker recognition takes place. Applying this approach to the YOHO corpus, we achieved an identification error rate (ER) of 0.05% using an average segment of 20.4% of a testing utterance in a speaker identification task. We also achieved an equal error rate (EER) of 0.42% using an average segment of 15.1% of a testing utterance in a speaker verification task.
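The client-side selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Segment` type, the `select_segments` function, the greedy strategy, and the per-phoneme scores are all hypothetical, chosen only to show how a client might keep the most useful phoneme segments within a fixed share of the utterance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    phoneme: str  # label from a client-side segmenter (assumed to exist)
    start: int    # first frame index, inclusive
    end: int      # last frame index, exclusive

    @property
    def length(self) -> int:
        return self.end - self.start


def select_segments(segments, scores, budget):
    """Greedily keep the highest-scoring phoneme segments within a frame budget.

    segments: list of Segment covering the utterance
    scores:   hypothetical per-phoneme usefulness for speaker recognition
    budget:   maximum fraction of the utterance's frames to transmit (0..1)
    """
    total = sum(s.length for s in segments)
    limit = budget * total
    kept, used = [], 0
    # Visit segments from most to least useful phoneme; unknown phonemes score 0.
    ranked = sorted(segments, key=lambda s: scores.get(s.phoneme, 0.0), reverse=True)
    for seg in ranked:
        if used + seg.length <= limit:
            kept.append(seg)
            used += seg.length
    kept.sort(key=lambda s: s.start)  # restore temporal order before sending
    return kept, (used / total if total else 0.0)


# Illustrative values only; they are not taken from the paper.
utterance = [
    Segment("s", 0, 10), Segment("iy", 10, 30), Segment("k", 30, 35),
    Segment("ae", 35, 60), Segment("n", 60, 80), Segment("sil", 80, 100),
]
scores = {"ae": 0.9, "iy": 0.8, "n": 0.7, "s": 0.3, "k": 0.2}
kept, fraction = select_segments(utterance, scores, budget=0.5)
print([s.phoneme for s in kept], fraction)
```

In this sketch only the frames of the kept segments would be sent to the server, so the transmitted share of the utterance never exceeds `budget`.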
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Mohamed Abdel FATTAH, Fuji REN, Shingo KUROIWA, "Effects of Phoneme Type and Frequency on Distributed Speaker Identification and Verification," in IEICE TRANSACTIONS on Information, vol. E89-D, no. 5, pp. 1712-1719, May 2006, doi: 10.1093/ietisy/e89-d.5.1712.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e89-d.5.1712/_p
@ARTICLE{e89-d_5_1712,
author={Mohamed Abdel FATTAH and Fuji REN and Shingo KUROIWA},
journal={IEICE TRANSACTIONS on Information},
title={Effects of Phoneme Type and Frequency on Distributed Speaker Identification and Verification},
year={2006},
volume={E89-D},
number={5},
pages={1712-1719},
abstract={In the European Telecommunications Standards Institute (ETSI) Distributed Speech Recognition (DSR) front-end, the distortion introduced by feature compression on the front-end side increases the variance flooring effect, which in turn increases the identification error rate. Reducing the bit rate therefore comes at the cost of degraded speaker recognition performance. In this paper, we present a nontraditional solution to this problem. To reduce the bit rate, the speech signal is segmented at the client, and the phonemes most effective for speaker recognition (determined according to their type and frequency) are selected and sent to the server, where speaker recognition takes place. Applying this approach to the YOHO corpus, we achieved an identification error rate (ER) of 0.05\% using an average segment of 20.4\% of a testing utterance in a speaker identification task. We also achieved an equal error rate (EER) of 0.42\% using an average segment of 15.1\% of a testing utterance in a speaker verification task.},
keywords={},
doi={10.1093/ietisy/e89-d.5.1712},
ISSN={1745-1361},
month={May},}
TY - JOUR
TI - Effects of Phoneme Type and Frequency on Distributed Speaker Identification and Verification
T2 - IEICE TRANSACTIONS on Information
SP - 1712
EP - 1719
AU - Mohamed Abdel FATTAH
AU - Fuji REN
AU - Shingo KUROIWA
PY - 2006
DO - 10.1093/ietisy/e89-d.5.1712
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E89-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2006
AB - In the European Telecommunications Standards Institute (ETSI) Distributed Speech Recognition (DSR) front-end, the distortion introduced by feature compression on the front-end side increases the variance flooring effect, which in turn increases the identification error rate. Reducing the bit rate therefore comes at the cost of degraded speaker recognition performance. In this paper, we present a nontraditional solution to this problem. To reduce the bit rate, the speech signal is segmented at the client, and the phonemes most effective for speaker recognition (determined according to their type and frequency) are selected and sent to the server, where speaker recognition takes place. Applying this approach to the YOHO corpus, we achieved an identification error rate (ER) of 0.05% using an average segment of 20.4% of a testing utterance in a speaker identification task. We also achieved an equal error rate (EER) of 0.42% using an average segment of 15.1% of a testing utterance in a speaker verification task.
ER -