In this paper, we investigate language models based on a context-free grammar (CFG), a bigram and a quasi/simplified-trigram. To calculate the bigram and quasi/simplified-trigram statistics, we used a set of sentences generated randomly from the CFG that are legal in terms of semantics. We compared the models on their perplexities and sentence recognition accuracies. Sentence recognition was tested on the "UNIX-QA" task with a vocabulary size of 521 words. The perplexities of the bigram and quasi-trigram were about 1.5-1.7 times and 1.2-1.3 times larger, respectively, than the perplexity of the CFG, which corresponds to the most restricted grammar (perplexity = 10.0); from this we conclude that the quasi-trigram has almost the same modeling ability as the restricted CFG when the set of plausible sentences in the task is given.
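The bigram comparison in the abstract rests on the standard definition of perplexity as the exponentiated average negative log-probability per word. As a minimal sketch (not the authors' implementation), the following estimates a bigram model from a toy corpus with add-one smoothing and computes its per-word perplexity; the corpus, smoothing choice, and sentence markers are illustrative assumptions, not details from the paper.

```python
import math
from collections import Counter

def bigram_perplexity(train, test):
    """Estimate an add-one-smoothed bigram model from `train` and
    return the per-word perplexity exp(-mean log p) over `test`.
    Sentences are padded with <s> / </s> boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sent in train:
        toks = ["<s>"] + sent + ["</s>"]
        unigrams.update(toks[:-1])          # history counts
        bigrams.update(zip(toks[:-1], toks[1:]))
    # Vocabulary for add-one smoothing: all training words plus </s>.
    vocab_size = len({w for s in train for w in s} | {"</s>"})

    log_prob, n_words = 0.0, 0
    for sent in test:
        toks = ["<s>"] + sent + ["</s>"]
        for prev, cur in zip(toks[:-1], toks[1:]):
            p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
            log_prob += math.log(p)
            n_words += 1
    return math.exp(-log_prob / n_words)

# Hypothetical command-like sentences, loosely in the spirit of a
# UNIX question-answering task (not the actual UNIX-QA data).
corpus = [["list", "the", "files"],
          ["remove", "the", "files"],
          ["list", "the", "directory"]]
print(round(bigram_perplexity(corpus, corpus), 2))
```

The same perplexity definition applies to the trigram and CFG models compared in the paper; only the conditional probability assigned to each word changes.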
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Seiichi NAKAGAWA, Isao MURASE, "Comparison of Language Models by Context-Free Grammar, Bigram and Quasi/Simplified-Trigram" in IEICE TRANSACTIONS on Fundamentals,
vol. E74-A, no. 7, pp. 1897-1905, July 1991.
Abstract: In this paper, we investigate language models based on a context-free grammar (CFG), a bigram and a quasi/simplified-trigram. To calculate the bigram and quasi/simplified-trigram statistics, we used a set of sentences generated randomly from the CFG that are legal in terms of semantics. We compared the models on their perplexities and sentence recognition accuracies. Sentence recognition was tested on the "UNIX-QA" task with a vocabulary size of 521 words. The perplexities of the bigram and quasi-trigram were about 1.5-1.7 times and 1.2-1.3 times larger, respectively, than the perplexity of the CFG, which corresponds to the most restricted grammar (perplexity = 10.0); from this we conclude that the quasi-trigram has almost the same modeling ability as the restricted CFG when the set of plausible sentences in the task is given.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/e74-a_7_1897/_p
@ARTICLE{e74-a_7_1897,
author={Seiichi NAKAGAWA and Isao MURASE},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Comparison of Language Models by Context-Free Grammar, Bigram and Quasi/Simplified-Trigram},
year={1991},
volume={E74-A},
number={7},
pages={1897--1905},
abstract={In this paper, we investigate language models based on a context-free grammar (CFG), a bigram and a quasi/simplified-trigram. To calculate the bigram and quasi/simplified-trigram statistics, we used a set of sentences generated randomly from the CFG that are legal in terms of semantics. We compared the models on their perplexities and sentence recognition accuracies. Sentence recognition was tested on the "UNIX-QA" task with a vocabulary size of 521 words. The perplexities of the bigram and quasi-trigram were about 1.5-1.7 times and 1.2-1.3 times larger, respectively, than the perplexity of the CFG, which corresponds to the most restricted grammar (perplexity = 10.0); from this we conclude that the quasi-trigram has almost the same modeling ability as the restricted CFG when the set of plausible sentences in the task is given.},
keywords={},
doi={},
ISSN={},
month={July},}
TY - JOUR
TI - Comparison of Language Models by Context-Free Grammar, Bigram and Quasi/Simplified-Trigram
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1897
EP - 1905
AU - Seiichi NAKAGAWA
AU - Isao MURASE
PY - 1991
DO -
JO - IEICE TRANSACTIONS on Fundamentals
SN -
VL - E74-A
IS - 7
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - July 1991
AB - In this paper, we investigate language models based on a context-free grammar (CFG), a bigram and a quasi/simplified-trigram. To calculate the bigram and quasi/simplified-trigram statistics, we used a set of sentences generated randomly from the CFG that are legal in terms of semantics. We compared the models on their perplexities and sentence recognition accuracies. Sentence recognition was tested on the "UNIX-QA" task with a vocabulary size of 521 words. The perplexities of the bigram and quasi-trigram were about 1.5-1.7 times and 1.2-1.3 times larger, respectively, than the perplexity of the CFG, which corresponds to the most restricted grammar (perplexity = 10.0); from this we conclude that the quasi-trigram has almost the same modeling ability as the restricted CFG when the set of plausible sentences in the task is given.
ER -