Author Search Result

[Author] Thanaruk THEERAMUNKONG(26hit)

1-20hit(26hit)

  • Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy

    Prakasith KAYASITH  Thanaruk THEERAMUNKONG  

     
    PAPER-Speech and Hearing

      Vol:
    E92-D No:3
      Page(s):
    460-468

    It is a tedious and subjective task to measure the severity of dysarthria by manually evaluating a speaker's speech using available standard assessment methods based on human perception. This paper presents an automated approach to assess the speech quality of a dysarthric speaker with cerebral palsy. Considering two complementary factors, speech consistency and speech distinction, a speech quality indicator called the speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce a consistent speech signal for a given word and distinguishable speech signals for different words. As an application, it can be used to assess speech quality and forecast the speech recognition rate for an individual dysarthric speaker before the actual exhaustive implementation of an automatic speech recognition system for that speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations were done by comparing its predicted recognition rates with those predicted by the standard methods, the articulatory and intelligibility tests, based on two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting the recognition rate of dysarthric speech. All experiments were done on a speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.

  • Multi-Stage Automatic NE and PoS Annotation Using Pattern-Based and Statistical-Based Techniques for Thai Corpus Construction

    Nattapong TONGTEP  Thanaruk THEERAMUNKONG  

     
    PAPER-Natural Language Processing

      Vol:
    E96-D No:10
      Page(s):
    2245-2256

    Automated or semi-automated annotation is a practical solution for large-scale corpus construction. However, the special characteristics of the Thai language, such as the lack of word-boundary and sentence-boundary markers, trigger several issues in automatic corpus annotation. This paper presents a multi-stage annotation framework, containing two stages of chunking and three stages of tagging. The two chunking stages are pattern matching-based named entity (NE) extraction and dictionary-based word segmentation, while the three succeeding tagging stages are dictionary-, pattern- and statistical-based tagging. Applying heuristics of ambiguity priority, NE extraction is performed first on an original text using a set of patterns, in the order of pattern ambiguity. Next, the remaining text is segmented into words with a dictionary. The obtained chunks are then tagged with types of named entities or parts-of-speech (PoS) using dictionaries, patterns and statistics. Focusing on the reduction of human intervention in corpus construction, our experimental results show that the dictionary-based tagging process can assign unique tags to 64.92% of the words, leaving 24.14% unknown words and 10.94% ambiguously tagged words. Subsequently, the pattern-based tagging reduces the unknown words to only 13.34%, while the statistical-based tagging reduces the ambiguously tagged words to only 3.01%.

  • FOREWORD Open Access

    Thanaruk THEERAMUNKONG  

     
    FOREWORD

      Vol:
    E94-D No:3
      Page(s):
    403-403

  • Kernel Trees for Support Vector Machines

    Ithipan METHASATE  Thanaruk THEERAMUNKONG  

     
    PAPER

      Vol:
    E90-D No:10
      Page(s):
    1550-1556

    Support vector machines (SVMs) are one of the most effective classification techniques in several knowledge discovery and data mining applications. However, an SVM requires the user to set the form of its kernel function and the parameters of the function, both of which directly affect the performance of the classifier. This paper proposes a novel method, named a kernel tree, the function of which is composed of multiple kernels in the form of a tree structure. The optimal kernel tree structure and its parameters are determined by genetic programming (GP). To perform a fine setting of kernel parameters, the gradient descent method is used. To evaluate the proposed method, benchmark datasets from UCI and a text classification dataset are used. The results indicate that the method can find a better solution than grid search and gradient search.

  • Discovery of Predicate-Oriented Relations among Named Entities Extracted from Thai Texts

    Nattapong TONGTEP  Thanaruk THEERAMUNKONG  

     
    PAPER-Artificial Intelligence, Data Mining

      Vol:
    E95-D No:7
      Page(s):
    1932-1946

    Extracting named entities (NEs) and their relations is more difficult in Thai than in other languages due to several Thai-specific characteristics, including no explicit boundaries for words, phrases and sentences; few case markers and modifier clues; high ambiguity in compound words and serial verbs; and flexible word order. Unlike most previous work, which focused on NE relations of specific actions, such as work_for, live_in, located_in, and kill, this paper proposes a more general type of NE relation, called predicate-oriented relation (PoR), where an extracted action part (verb) is used as a core component to associate related named entities extracted from Thai texts. Lacking a practical parser for the Thai language, we present three types of surface features, i.e. punctuation marks (such as token spaces), entity types and the number of entities, and then apply five alternative commonly used learning schemes to investigate their performance on predicate-oriented relation extraction. The experimental results show that our approach achieves F-measures of 97.76%, 99.19%, 95.00% and 93.50% on four different types of predicate-oriented relation (action-location, location-action, action-person and person-action) in crime-related news documents using a data set of 1,736 entity pairs. The effects of NE extraction techniques, feature sets and class imbalance on the performance of relation extraction are explored.

  • News Relation Discovery Based on Association Rule Mining with Combining Factors

    Nichnan KITTIPHATTANABAWON  Thanaruk THEERAMUNKONG  Ekawit NANTAJEEWARAWAT  

     
    PAPER

      Vol:
    E94-D No:3
      Page(s):
    404-415

    Recently, to track and relate news documents from several sources, association rule mining has been applied due to its performance and scalability. This paper presents an empirical investigation of how term representation basis, term weighting, and association measure affect the quality of relations discovered among news documents. Twenty-four combinations formed from two term representation bases, four term weightings, and three association measures are explored, with their results compared to human judgment of three-level relations: completely related, somehow related, and unrelated. The performance evaluation is conducted by comparing the top-k results of each combination to those of the others using the so-called rank-order mismatch (ROM). The experimental results indicate that a combination of bigram (BG), term frequency with inverse document frequency (TFIDF) and confidence (CONF), as well as a combination of BG, TFIDF and conviction (CONV), achieves the best performance in finding the related documents by placing them in upper ranks, with 0.41% ROM on the top-50 mined relations. However, a combination of unigram (UG), TFIDF and lift (LIFT) performs best at locating irrelevant relations in lower ranks (top-1100), with 9.63% ROM. A detailed analysis of the number of the three-level relations with regard to their rankings is also performed in order to examine the characteristics of the resultant relations. Finally, a discussion and an error analysis are given.

  • Extracting Semantic Frames from Thai Medical-Symptom Unstructured Text with Unknown Target-Phrase Boundaries

    Peerasak INTARAPAIBOON  Ekawit NANTAJEEWARAWAT  Thanaruk THEERAMUNKONG  

     
    PAPER

      Vol:
    E94-D No:3
      Page(s):
    465-478

    Due to the limitations of language-processing tools for the Thai language, pattern-based information extraction from Thai documents requires supplementary techniques. Based on sliding-window rule application and extraction filtering, we present a framework for extracting semantic information from medical-symptom phrases with unknown boundaries in Thai unstructured-text information entries. A supervised rule learning algorithm is employed for automatic construction of information extraction rules from hand-tagged training symptom phrases. Two filtering components are introduced: one uses a classification model to predict rule application across a symptom-phrase boundary based on instantiation features of rule internal wildcards, the other uses weighted classification confidence to resolve conflicts arising from overlapping extractions. In our experimental study, we focus our attention on two basic types of symptom phrasal descriptions: one is concerned with abnormal characteristics of some observable entities and the other with human-body locations at which primitive symptoms appear. The experimental results show that the filtering components improve precision while preserving recall satisfactorily.

  • Knowledge Integration by Probabilistic Argumentation

    Saung Hnin Pwint OO  Nguyen Duy HUNG  Thanaruk THEERAMUNKONG  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2020/05/01
      Vol:
    E103-D No:8
      Page(s):
    1843-1855

    While existing inference engines have solved real-world problems using probabilistic knowledge representation, one challenging task is to efficiently utilize the representation under uncertainty during conflict resolution. This paper presents a new approach to straightforwardly combine a rule-based system (RB) with a probabilistic graphical inference framework, i.e., a naïve Bayesian network (BN), towards probabilistic argumentation via a so-called probabilistic assumption-based argumentation (PABA) framework. The rule-based system formalizes its rules into defeasible logic under the assumption-based argumentation (ABA) framework, while the Bayesian network provides probabilistic reasoning. Through knowledge integration, the former provides a solid testbed for inference, while the latter helps the former to solve persistent conflicts by setting an acceptance threshold. Experiments show the effectiveness of this approach for conflict resolution via an example of liver disorder diagnosis.

  • FOREWORD Open Access

    Masaki NAKAGAWA  Thanaruk THEERAMUNKONG  

     
    FOREWORD

      Vol:
    E91-D No:11
      Page(s):
    2543-2544

  • Pattern-Based Features vs. Statistical-Based Features in Decision Trees for Word Segmentation

    Thanaruk THEERAMUNKONG  Thanasan TANHERMHONG  

     
    PAPER-Natural Language Processing

      Vol:
    E87-D No:5
      Page(s):
    1254-1260

    This paper proposes two alternative approaches that do not make use of a dictionary but instead utilize different types of learned features to segment words in a language that has no explicit word boundaries. Both methods utilize decision trees as the knowledge representation acquired from a training corpus in the segmentation process. The first method, a language-dependent technique, applies a set of constructed feature patterns based on character types to generate a set of heuristic segmentation rules. It separates a running text into a sequence of small chunks based on the given patterns, and constructs a decision tree for word segmentation. The second method extracts statistics of character sequences from a training corpus and uses them as features for constructing a set of rules by decision tree induction. The latter needs no linguistic knowledge. In experiments on the Thai language, both methods achieve relatively high accuracy, but the latter performs much better.

  • Fast Algorithms for Mining Generalized Frequent Patterns of Generalized Association Rules

    Kritsada SRIPHAEW  Thanaruk THEERAMUNKONG  

     
    PAPER-Databases

      Vol:
    E87-D No:3
      Page(s):
    761-770

    Mining generalized frequent patterns of generalized association rules is an important process in knowledge discovery systems. In this paper, we propose a new approach for efficiently mining all frequent patterns using a novel set enumeration algorithm with two types of constraints on two generalized itemset relationships, called subset-superset and ancestor-descendant constraints. We also show a method to mine a smaller set of generalized closed frequent itemsets instead of mining a large set of conventional generalized frequent itemsets. To this end, we develop two algorithms called SET and cSET for mining generalized frequent itemsets and generalized closed frequent itemsets, respectively. In a number of experiments, the proposed algorithms outperform the previous well-known algorithms in both computational time and memory utilization. Furthermore, the experiments with real datasets indicate that mining generalized closed frequent itemsets is more advantageous in computational cost, since the number of generalized closed frequent itemsets is much smaller than the number of generalized frequent itemsets.

  • A Corpus-Based Approach for Automatic Thai Unknown Word Recognition Using Boosting Techniques

    Jakkrit TECHO  Cholwich NATTEE  Thanaruk THEERAMUNKONG  

     
    PAPER-Unknown Word Processing

      Vol:
    E92-D No:12
      Page(s):
    2321-2333

    While classification techniques can be applied for automatic unknown word recognition in a language without word boundaries, they face the problem of unbalanced datasets, where the number of positive unknown word candidates is far smaller than that of negative candidates. To solve this problem, this paper presents a corpus-based approach that introduces a so-called group-based ranking evaluation technique into ensemble learning in order to generate a sequence of classification models that later collaborate to select the most probable unknown word from multiple candidates. Given a classification model, the group-based ranking evaluation (GRE) is applied to construct a training dataset for learning the succeeding model, by weighting each of its candidates according to their ranks and correctness when the candidates of an unknown word are considered as one group. A number of experiments have been conducted on a large Thai medical text to evaluate the performance of the proposed group-based ranking evaluation approach, namely V-GRE, compared to the conventional naive Bayes classifier and our vanilla version without ensemble learning. As a result, the proposed method achieves an accuracy of 90.93±0.50% when the first rank is selected, while it gains 97.26±0.26% when the top-ten candidates are considered, that is, 8.45% and 6.79% improvement over the conventional record-based naive Bayes classifier and the vanilla version. Another result, obtained by applying only the best features, shows 93.93±0.22% and up to 98.85±0.15% accuracy for top-1 and top-10, respectively. These are 3.97% and 9.78% improvements over naive Bayes and the vanilla version. Finally, an error analysis is given.

  • An Application of Intuitionistic Fuzzy Sets to Improve Information Extraction from Thai Unstructured Text

    Peerasak INTARAPAIBOON  Thanaruk THEERAMUNKONG  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2018/05/23
      Vol:
    E101-D No:9
      Page(s):
    2334-2345

    Multi-slot information extraction, also known as frame extraction, is a task that identifies several related entities simultaneously. Most research on this task is concerned with applying IE patterns (rules) to extract related entities from unstructured documents. An important obstacle to success in this task is not knowing where the text portions containing the information of interest are. This problem is more complicated for languages with sentence boundary ambiguity, e.g. the Thai language. Applying IE rules to all reasonable text portions can reduce the effect of this obstacle, but it raises another problem: incorrect (unwanted) extractions. This paper presents a method for removing these incorrect extractions. In the method, extractions are represented as intuitionistic fuzzy sets (IFSs), and a similarity measure for IFSs is used to calculate the distance between the IFS of an unclassified extraction and that of each already-classified extraction. The concept of k nearest neighbors is adopted to decide whether the unclassified extraction is correct or not. In experiments on various domains, the proposed technique improves extraction precision while satisfactorily preserving recall.

  • An EM-Based Approach for Mining Word Senses from Corpora

    Thatsanee CHAROENPORN  Canasai KRUENGKRAI  Thanaruk THEERAMUNKONG  Virach SORNLERTLAMVANICH  

     
    PAPER-Natural Language Processing

      Vol:
    E90-D No:4
      Page(s):
    775-782

    Manually collecting contexts of a target word and grouping them based on their meanings yields a set of word senses, but the task is quite tedious. Towards automated lexicography, this paper proposes a word-sense discrimination method based on two techniques: the EM algorithm and principal component analysis (PCA). The spherical Gaussian EM algorithm enhanced with PCA for robust initialization is proposed to cluster word senses of a target word automatically. Three variants of the algorithm, namely PCA, sGEM, and PCA-sGEM, are investigated using a gold standard dataset of two polysemous words. The clustering result is evaluated using the measures of purity and entropy as well as a more recent measure called normalized mutual information (NMI). The experimental result indicates that the proposed algorithms achieve promising performance in discriminating word senses and that PCA-sGEM outperforms the other two methods to some extent.

  • Construction of Thai Lexicon from Existing Dictionaries and Texts on the Web

    Thatsanee CHAROENPORN  Canasai KRUENGKRAI  Thanaruk THEERAMUNKONG  Virach SORNLERTLAMVANICH  

     
    PAPER-Natural Language Processing

      Vol:
    E89-D No:7
      Page(s):
    2286-2293

    A lexicon is an important linguistic resource needed for both shallow and deep language processing. Currently, there are few machine-readable Thai dictionaries available, and most of them do not satisfy the computational requirements. This paper presents the design of a Thai lexicon named the TCL's Computational Lexicon (TCLLEX) and proposes a method to construct a large-scale Thai lexicon by re-using two existing dictionaries and a large number of texts on the Internet. In addition to morphological, syntactic, semantic case role and logical information in the existing dictionaries, a kind of semantic constraint called selectional preference is automatically acquired by analyzing Thai texts on the web and then added to the lexicon. In the acquisition process of the selectional preferences, the so-called Bayesian Information Criterion (BIC) is applied as the measure in a tree cut model. Experiments are conducted to verify the feasibility and effectiveness of the obtained selectional preferences.

  • Effects of Term Distributions on Binary Classification

    Verayuth LERTNATTEE  Thanaruk THEERAMUNKONG  

     
    PAPER

      Vol:
    E90-D No:10
      Page(s):
    1592-1600

    Text classification is an important tool for supporting decision making. Recently, in addition to term frequency and inverse document frequency, term distributions have been shown to be useful for improving classification accuracy in multi-class classification. This paper investigates the performance of these term distributions on binary classification using a centroid-based approach. In such a one-against-the-rest setting, there are only two classes, the positive (focused) class and the negative class. To improve the performance, a so-called hierarchical EM method is applied to cluster the negative class, which is usually much larger and more diverse than the positive one, into several homogeneous groups. The experimental results on two collections of web pages, namely Drug Information (DI) and WebKB, show the merits of term distributions and clustering on binary classification. The performance of the proposed method is also investigated using the Thai Herbal collection, where the texts are written in the Thai language.

  • Quality Evaluation for Document Relation Discovery Using Citation Information

    Kritsada SRIPHAEW  Thanaruk THEERAMUNKONG  

     
    PAPER-Data Mining

      Vol:
    E90-D No:8
      Page(s):
    1225-1234

    Assessment of discovered patterns is an important issue in the field of knowledge discovery. This paper presents an evaluation method that utilizes citation (reference) information to assess the quality of discovered document relations. With the concept of transitivity as direct/indirect citations, a series of evaluation criteria is introduced to define the validity of discovered relations. Two kinds of validity, called soft validity and hard validity, are proposed to express the quality of the discovered relations. For the purpose of impartial comparison, the expected validity is statistically estimated based on the generative probability of each relation pattern. The proposed evaluation is investigated using more than 10,000 documents obtained from a research publication database. With frequent itemset mining as a process to discover document relations, the proposed method was shown to be a powerful way to evaluate the relations in four aspects: soft/hard scoring, direct/indirect citation, relative quality over the expected value, and comparison to human judgment.

  • Extracting Chemical Reactions from Thai Text for Semantics-Based Information Retrieval

    Peerasak INTARAPAIBOON  Ekawit NANTAJEEWARAWAT  Thanaruk THEERAMUNKONG  

     
    PAPER

      Vol:
    E94-D No:3
      Page(s):
    479-486

    Based on sliding-window rule application and extraction filtering, we present a framework for extracting multi-slot frames describing chemical reactions from Thai free text with unknown target-phrase boundaries. A supervised rule learning algorithm is employed for automatic construction of pattern-based extraction rules from hand-tagged training phrases. A filtering method is devised for removal of incorrect extraction results based on features observed from text portions appearing between adjacent slot fillers in source documents. Extracted reaction frames are represented as concept expressions in description logics and are used as metadata for document indexing. A document knowledge base supporting semantics-based information retrieval is constructed by integrating document metadata with domain-specific ontologies.

  • A Family-Based Evolutional Approach for Kernel Tree Selection in SVMs

    Ithipan METHASATE  Thanaruk THEERAMUNKONG  

     
    PAPER-Biocybernetics, Neurocomputing

      Vol:
    E93-D No:4
      Page(s):
    909-921

    Finding a kernel mapping function for support vector machines (SVMs) is a key step towards the construction of a high-performance SVM-based classifier. While some recent methods exploited an evolutional approach to construct a suitable multifunction kernel, most of them searched randomly and diversely. In this paper, the concept of a family of identical-structured kernel trees is proposed to enable exploration of the structure space using genetic programming, while the parameter space of a given tree is investigated using an evolution strategy. To control the balance between structure and parameter search towards an optimal kernel, simulated annealing is introduced. In experiments on a number of benchmark datasets from UCI and a text classification collection, the proposed method is shown to find a better solution than other search methods, including grid search and gradient search.

  • FOREWORD Open Access

    Takayuki ITO  Thanaruk THEERAMUNKONG  Susumu KUNIFUJI  

     
    FOREWORD

      Vol:
    E106-D No:4
      Page(s):
    431-432