A lexicon is an important linguistic resource needed for both shallow and deep language processing. Currently, few machine-readable Thai dictionaries are available, and most of them do not satisfy computational requirements. This paper presents the design of a Thai lexicon named the TCL's Computational Lexicon (TCLLEX) and proposes a method to construct a large-scale Thai lexicon by reusing two existing dictionaries and a large number of texts on the Internet. In addition to the morphological, syntactic, semantic case role, and logical information in the existing dictionaries, a kind of semantic constraint called selectional preference is automatically acquired by analyzing Thai texts on the web and then added to the lexicon. In the acquisition of selectional preferences, the Bayesian Information Criterion (BIC) is applied as the measure in a tree cut model. Experiments are conducted to verify the feasibility and effectiveness of the obtained selectional preferences.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Thatsanee CHAROENPORN, Canasai KRUENGKRAI, Thanaruk THEERAMUNKONG, Virach SORNLERTLAMVANICH, "Construction of Thai Lexicon from Existing Dictionaries and Texts on the Web" in IEICE TRANSACTIONS on Information,
vol. E89-D, no. 7, pp. 2286-2293, July 2006, doi: 10.1093/ietisy/e89-d.7.2286.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e89-d.7.2286/_p
@ARTICLE{e89-d_7_2286,
author={Thatsanee CHAROENPORN and Canasai KRUENGKRAI and Thanaruk THEERAMUNKONG and Virach SORNLERTLAMVANICH},
journal={IEICE TRANSACTIONS on Information},
title={Construction of Thai Lexicon from Existing Dictionaries and Texts on the Web},
year={2006},
volume={E89-D},
number={7},
pages={2286-2293},
abstract={A lexicon is an important linguistic resource needed for both shallow and deep language processing. Currently, few machine-readable Thai dictionaries are available, and most of them do not satisfy computational requirements. This paper presents the design of a Thai lexicon named the TCL's Computational Lexicon (TCLLEX) and proposes a method to construct a large-scale Thai lexicon by reusing two existing dictionaries and a large number of texts on the Internet. In addition to the morphological, syntactic, semantic case role, and logical information in the existing dictionaries, a kind of semantic constraint called selectional preference is automatically acquired by analyzing Thai texts on the web and then added to the lexicon. In the acquisition of selectional preferences, the Bayesian Information Criterion (BIC) is applied as the measure in a tree cut model. Experiments are conducted to verify the feasibility and effectiveness of the obtained selectional preferences.},
doi={10.1093/ietisy/e89-d.7.2286},
ISSN={1745-1361},
month={July}
}
TY - JOUR
TI - Construction of Thai Lexicon from Existing Dictionaries and Texts on the Web
T2 - IEICE TRANSACTIONS on Information
SP - 2286
EP - 2293
AU - Thatsanee CHAROENPORN
AU - Canasai KRUENGKRAI
AU - Thanaruk THEERAMUNKONG
AU - Virach SORNLERTLAMVANICH
PY - 2006
DO - 10.1093/ietisy/e89-d.7.2286
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E89-D
IS - 7
JA - IEICE TRANSACTIONS on Information
Y1 - July 2006
AB - A lexicon is an important linguistic resource needed for both shallow and deep language processing. Currently, few machine-readable Thai dictionaries are available, and most of them do not satisfy computational requirements. This paper presents the design of a Thai lexicon named the TCL's Computational Lexicon (TCLLEX) and proposes a method to construct a large-scale Thai lexicon by reusing two existing dictionaries and a large number of texts on the Internet. In addition to the morphological, syntactic, semantic case role, and logical information in the existing dictionaries, a kind of semantic constraint called selectional preference is automatically acquired by analyzing Thai texts on the web and then added to the lexicon. In the acquisition of selectional preferences, the Bayesian Information Criterion (BIC) is applied as the measure in a tree cut model. Experiments are conducted to verify the feasibility and effectiveness of the obtained selectional preferences.
ER -