Synergies in processing requirements and knowledge of human speech production and perception have led to a similarity of the speech signal representations used for the tasks of recognition, coding, and modification. The representations are generally composed of a description of the vocal-tract transfer function and, in the case of coding and modification, a description of the excitation signal. This paper provides an overview of commonly used representations. For coding and modification, autoregressive models represented by line spectral frequencies perform well for the vocal tract, and pitch-synchronous filter banks and modulation-domain filters perform well for the excitation. For recognition, good representations are based on a smoothed magnitude response of the vocal tract.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
W. Bastiaan KLEIJN, "Signal Processing Representations of Speech" in IEICE TRANSACTIONS on Information,
vol. E86-D, no. 3, pp. 359-376, March 2003, doi: .
Abstract: Synergies in processing requirements and knowledge of human speech production and perception have led to a similarity of the speech signal representations used for the tasks of recognition, coding, and modification. The representations are generally composed of a description of the vocal-tract transfer function and, in the case of coding and modification, a description of the excitation signal. This paper provides an overview of commonly used representations. For coding and modification, autoregressive models represented by line spectral frequencies perform well for the vocal tract, and pitch-synchronous filter banks and modulation-domain filters perform well for the excitation. For recognition, good representations are based on a smoothed magnitude response of the vocal tract.
URL: https://global.ieice.org/en_transactions/information/10.1587/e86-d_3_359/_p
Copy
@ARTICLE{e86-d_3_359,
author={W. Bastiaan KLEIJN, },
journal={IEICE TRANSACTIONS on Information},
title={Signal Processing Representations of Speech},
year={2003},
volume={E86-D},
number={3},
pages={359-376},
abstract={Synergies in processing requirements and knowledge of human speech production and perception have led to a similarity of the speech signal representations used for the tasks of recognition, coding, and modification. The representations are generally composed of a description of the vocal-tract transfer function and, in the case of coding and modification, a description of the excitation signal. This paper provides an overview of commonly used representations. For coding and modification, autoregressive models represented by line spectral frequencies perform well for the vocal tract, and pitch-synchronous filter banks and modulation-domain filters perform well for the excitation. For recognition, good representations are based on a smoothed magnitude response of the vocal tract.},
keywords={},
doi={},
ISSN={},
month={March},}
Copy
TY - JOUR
TI - Signal Processing Representations of Speech
T2 - IEICE TRANSACTIONS on Information
SP - 359
EP - 376
AU - W. Bastiaan KLEIJN
PY - 2003
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E86-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2003
AB - Synergies in processing requirements and knowledge of human speech production and perception have led to a similarity of the speech signal representations used for the tasks of recognition, coding, and modification. The representations are generally composed of a description of the vocal-tract transfer function and, in the case of coding and modification, a description of the excitation signal. This paper provides an overview of commonly used representations. For coding and modification, autoregressive models represented by line spectral frequencies perform well for the vocal tract, and pitch-synchronous filter banks and modulation-domain filters perform well for the excitation. For recognition, good representations are based on a smoothed magnitude response of the vocal tract.
ER -