We have developed the LITHAN (LIsten-THink-ANswer) speech understanding system, which automatically recognizes continuously uttered speech by utilizing higher-level linguistic information such as syntactic, semantic, and pragmatic information. The system uses linguistic information to predict possible words at the unrecognized portion of the input utterance, and identifies each predicted word with an optimum matching algorithm between the recognized phoneme string and the representative string in the word dictionary. We propose an effective tree search method for parsing when the results of phoneme recognition and word identification are not error-free. LITHAN uses many types of a priori information: the statistics of each phoneme, the similarity matrix between phonemes, the word dictionary, the spoken grammar together with additional information regarding that grammar, and the semantic and pragmatic information. We have applied this efficient, flexible system to restricted utterances with a vocabulary of about 100 words, concerning operational commands and status queries for a computer network. In a test on a sample of 200 sentences spoken by 10 male speakers at normal speed, 64% of the sentences and 93% of the output words were correctly recognized.
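The abstract does not spell out the optimum matching algorithm, so the following is only a minimal sketch of one common way to match a recognized phoneme string against dictionary entries with a phoneme similarity matrix: a dynamic-programming alignment. The function names, the gap penalty, the similarity values, and the two dictionary entries are all hypothetical and are not taken from LITHAN.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical similarity values in [0, 1]; unlisted pairs default to 0.0.
Similarity = Dict[Tuple[str, str], float]


def phoneme_similarity(a: str, b: str, sim: Similarity) -> float:
    """Return 1.0 for identical phonemes, else a symmetric table lookup."""
    if a == b:
        return 1.0
    return sim.get((a, b), sim.get((b, a), 0.0))


def match_score(recognized: Sequence[str], reference: Sequence[str],
                sim: Similarity, gap: float = -0.5) -> float:
    """Best global alignment score between two phoneme strings.

    `gap` is an assumed penalty for an inserted or omitted phoneme.
    """
    n, m = len(recognized), len(reference)
    # score[i][j]: best score aligning recognized[:i] with reference[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                # substitute or match the two phonemes
                score[i - 1][j - 1]
                + phoneme_similarity(recognized[i - 1], reference[j - 1], sim),
                # extra phoneme in the recognized string
                score[i - 1][j] + gap,
                # phoneme of the reference missing from the recognized string
                score[i][j - 1] + gap,
            )
    return score[n][m]


def best_word(recognized: Sequence[str],
              candidates: Dict[str, Sequence[str]],
              sim: Similarity) -> str:
    """Pick the predicted word whose reference string aligns best."""
    return max(candidates,
               key=lambda w: match_score(recognized, candidates[w], sim))


# Hypothetical data: "t" and "d" are treated as confusable phonemes.
SIM: Similarity = {("t", "d"): 0.6}
CANDIDATES = {"kato": ["k", "a", "t", "o"], "sato": ["s", "a", "t", "o"]}
```

With these illustrative values, `best_word(["k", "a", "d", "o"], CANDIDATES, SIM)` selects `"kato"`, because the confusable pair t/d still contributes a partial score while the mismatched initial phoneme of `"sato"` contributes none. In a system like the one described, the candidate set would come from the words predicted by the linguistic component rather than from the whole dictionary.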
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Toshiyuki SAKAI, Sei-ichi NAKAGAWA, "A Speech Understanding System of Simple Japanese Sentences in a Task Domain" in IEICE TRANSACTIONS on transactions,
vol. E60-E, no. 1, pp. 13-20, January 1977.
URL: https://global.ieice.org/en_transactions/transactions/10.1587/e60-e_1_13/_p
@ARTICLE{e60-e_1_13,
author={Toshiyuki SAKAI and Sei-ichi NAKAGAWA},
journal={IEICE TRANSACTIONS on transactions},
title={A Speech Understanding System of Simple Japanese Sentences in a Task Domain},
year={1977},
volume={E60-E},
number={1},
pages={13--20},
abstract={We have developed the LITHAN (LIsten-THink-ANswer) speech understanding system, which automatically recognizes continuously uttered speech by utilizing higher-level linguistic information such as syntactic, semantic, and pragmatic information. The system uses linguistic information to predict possible words at the unrecognized portion of the input utterance, and identifies each predicted word with an optimum matching algorithm between the recognized phoneme string and the representative string in the word dictionary. We propose an effective tree search method for parsing when the results of phoneme recognition and word identification are not error-free. LITHAN uses many types of a priori information: the statistics of each phoneme, the similarity matrix between phonemes, the word dictionary, the spoken grammar together with additional information regarding that grammar, and the semantic and pragmatic information. We have applied this efficient, flexible system to restricted utterances with a vocabulary of about 100 words, concerning operational commands and status queries for a computer network. In a test on a sample of 200 sentences spoken by 10 male speakers at normal speed, 64% of the sentences and 93% of the output words were correctly recognized.},
month={January}
}
TY  - JOUR
TI  - A Speech Understanding System of Simple Japanese Sentences in a Task Domain
T2  - IEICE TRANSACTIONS on transactions
SP  - 13
EP  - 20
AU  - Toshiyuki SAKAI
AU  - Sei-ichi NAKAGAWA
PY  - 1977
JO  - IEICE TRANSACTIONS on transactions
VL  - E60-E
IS  - 1
JA  - IEICE TRANSACTIONS on transactions
Y1  - January 1977
AB  - We have developed the LITHAN (LIsten-THink-ANswer) speech understanding system, which automatically recognizes continuously uttered speech by utilizing higher-level linguistic information such as syntactic, semantic, and pragmatic information. The system uses linguistic information to predict possible words at the unrecognized portion of the input utterance, and identifies each predicted word with an optimum matching algorithm between the recognized phoneme string and the representative string in the word dictionary. We propose an effective tree search method for parsing when the results of phoneme recognition and word identification are not error-free. LITHAN uses many types of a priori information: the statistics of each phoneme, the similarity matrix between phonemes, the word dictionary, the spoken grammar together with additional information regarding that grammar, and the semantic and pragmatic information. We have applied this efficient, flexible system to restricted utterances with a vocabulary of about 100 words, concerning operational commands and status queries for a computer network. In a test on a sample of 200 sentences spoken by 10 male speakers at normal speed, 64% of the sentences and 93% of the output words were correctly recognized.
ER  - 