A Survey of Thai Knowledge Extraction for the Semantic Web Research and Tools

Ponrudee NETISOPAKUL; Gerhard WOHLGENANNT

doi:10.1587/transinf.2017DAR0001

IEICE TRANSACTIONS on Information

Open Access

Summary:

Because the manual creation of domain models and of linked data is very costly, the extraction of knowledge from structured and unstructured data has been one of the central research areas in the Semantic Web field over the last two decades. Here, we look specifically at the extraction of formalized knowledge from natural language text, which is the most abundant source of human knowledge available. Many tools are available for information and knowledge extraction from English text; for written Thai, the situation is different. The goal of this work is to assess the state of the art of research on formal knowledge extraction specifically from Thai language text, and then to give suggestions and practical research ideas on how to improve it. To address this goal, we first distinguish nine Semantic Web knowledge extraction tasks defined in the literature on knowledge extraction from English text, for example taxonomy extraction, relation extraction, and named entity recognition. For each of the nine tasks, we analyze the publications and tools available for Thai text in a comprehensive literature survey. In addition to our own assessment, we measure the self-assessment of the Thai research community with a questionnaire-based survey on each of the tasks. Furthermore, the structure and size of the Thai community are analyzed using complex literature database queries. Combining all the collected information, we finally identify research gaps in knowledge extraction from Thai language text. We present an extensive list of practical research ideas, focusing on concrete suggestions for every knowledge extraction task that can be implemented and evaluated with reasonable effort. Besides the task-specific hints for improving the state of the art, we also include general recommendations on how to raise the efficiency of the respective research community.

Publication: IEICE TRANSACTIONS on Information Vol.E101-D No.4 pp.986-1002

Publication Date: 2018/04/01

Publicized: 2018/01/18

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2017DAR0001

Type of Manuscript: Special Section SURVEY PAPER (Special Section on Data Engineering and Information Management)

Authors

Ponrudee NETISOPAKUL
King Mongkut's Institute of Technology Ladkrabang (KMITL)
Gerhard WOHLGENANNT
ITMO University

Keywords

knowledge extraction, Thai language text, landscape analysis, semantic web

Cite this

Ponrudee NETISOPAKUL, Gerhard WOHLGENANNT, "A Survey of Thai Knowledge Extraction for the Semantic Web Research and Tools" in IEICE TRANSACTIONS on Information, vol. E101-D, no. 4, pp. 986-1002, April 2018, doi: 10.1587/transinf.2017DAR0001.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2017DAR0001/_p

@ARTICLE{e101-d_4_986,
author={Ponrudee NETISOPAKUL and Gerhard WOHLGENANNT},
journal={IEICE TRANSACTIONS on Information},
title={A Survey of Thai Knowledge Extraction for the Semantic Web Research and Tools},
year={2018},
volume={E101-D},
number={4},
pages={986-1002},
keywords={knowledge extraction, Thai language text, landscape analysis, semantic web},
doi={10.1587/transinf.2017DAR0001},
ISSN={1745-1361},
month={April},}

TY - JOUR
TI - A Survey of Thai Knowledge Extraction for the Semantic Web Research and Tools
T2 - IEICE TRANSACTIONS on Information
SP - 986
EP - 1002
AU - Ponrudee NETISOPAKUL
AU - Gerhard WOHLGENANNT
PY - 2018
DO - 10.1587/transinf.2017DAR0001
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E101-D
IS - 4
JA - IEICE TRANSACTIONS on Information
Y1 - April 2018
ER -
