The search functionality is under construction.

Keyword Search Result

[Keyword] lexicon(6hit)

1-6hit
  • Incorporation of Target Specific Knowledge for Sentiment Analysis on Microblogging

    Yongyos KAEWPITAKKUN  Kiyoaki SHIRAI  

     
    PAPER

      Publicized:
    2016/01/14
      Vol:
    E99-D No:4
      Page(s):
    959-968

    Sentiment analysis of microblogging has become an important classification task because a large amount of user-generated content is published on the Internet. In Twitter, it is common that a user expresses several sentiments in one tweet. Therefore, it is important to classify the polarity not of the whole tweet but of a specific target about which people express their opinions. Moreover, the performance of the machine learning approach greatly depends on the domain of the training data, and it is very time-consuming to manually annotate a large set of tweets for a specific domain. In this paper, we propose a method for sentiment classification at the target level by incorporating the on-target sentiment features and user-aware features into the classifier trained automatically from the data created for the specific target. An add-on lexicon, extended target list, and competitor list are also constructed as knowledge sources for the sentiment analysis. None of the processes in the proposed framework require manual annotation. The results of our experiment show that our method is effective and improves on the performance of sentiment classification compared to the baselines.

  • Morpheme-Based Modeling of Pronunciation Variation for Large Vocabulary Continuous Speech Recognition in Korean

    Kyong-Nim LEE  Minhwa CHUNG  

     
    PAPER-Speech and Hearing

      Vol:
    E90-D No:7
      Page(s):
    1063-1072

    This paper describes a morpheme-based pronunciation model that is especially useful for developing the pronunciation lexicon for Large Vocabulary Continuous Speech Recognition (LVCSR) in Korean. To address pronunciation variation in Korean, we analyze phonological rules based on phonemic contexts together with morphological category and morpheme boundary information. Since the same phoneme sequence can be pronounced in different ways across a morpheme boundary, pronunciation variation modeling needs to incorporate the morphological environment. We implement a rule-based pronunciation variant generator to produce a pronunciation lexicon with context-dependent multiple variants. At the lexical level, we apply explicit modeling of pronunciation variation to add pronunciation variants across morphemes as well as within morphemes to the pronunciation lexicon. At the acoustic level, we train the phone models with re-labeled transcriptions through forced alignment using the context-dependent pronunciation lexicon. The proposed pronunciation lexicon offers a potential benefit for both training and decoding of an LVCSR system. We then perform a speech recognition experiment on a read speech task with a 34K-morpheme vocabulary. The experiment confirms that improved performance is achieved by pronunciation variation modeling based on morpho-phonological analysis.

  • Automatic Affect Recognition Using Natural Language Processing Techniques and Manually Built Affect Lexicon

    Young Hwan CHO  Kong Joo LEE  

     
    PAPER-Natural Language Processing

      Vol:
    E89-D No:12
      Page(s):
    2964-2971

    In this paper, we present preliminary work on recognizing affect from a Korean textual document by using a manually built affect lexicon and adopting natural language processing tools. A manually built affect lexicon is constructed in order to be able to detect various emotional expressions, and its entries consist of emotion vectors. The natural language processing tools analyze an input document to enhance the accuracy of our affect recognizer. The performance of our affect recognizer is evaluated through automatic classification of song lyrics according to moods.

  • Construction of Thai Lexicon from Existing Dictionaries and Texts on the Web

    Thatsanee CHAROENPORN  Canasai KRUENGKRAI  Thanaruk THEERAMUNKONG  Virach SORNLERTLAMVANICH  

     
    PAPER-Natural Language Processing

      Vol:
    E89-D No:7
      Page(s):
    2286-2293

    A lexicon is an important linguistic resource needed for both shallow and deep language processing. Currently, there are few machine-readable Thai dictionaries available, and most of them do not satisfy the computational requirements. This paper presents the design of a Thai lexicon named the TCL's Computational Lexicon (TCLLEX) and proposes a method to construct a large-scale Thai lexicon by re-using two existing dictionaries and a large number of texts on the Internet. In addition to morphological, syntactic, semantic case role and logical information in the existing dictionaries, a kind of semantic constraint called selectional preference is automatically acquired by analyzing Thai texts on the web and then added to the lexicon. In the acquisition process of the selectional preferences, the Bayesian Information Criterion (BIC) is applied as the measure in a tree cut model. Experiments are conducted to verify the feasibility and effectiveness of the obtained selectional preferences.

  • A Character-Based Postprocessing System for Handwritten Japanese Address Recognition

    Keiji YAMANAKA  Susumu KUROYANAGI  Akira IWATA  

     
    PAPER-Image Processing,Computer Graphics and Pattern Recognition

      Vol:
    E82-D No:2
      Page(s):
    468-474

    Based on previous work on handwritten Japanese kanji character recognition, a postprocessing system for handwritten Japanese address recognition is proposed. Basically, the recognition system is composed of CombNET-II, a general-purpose large-scale character recognizer, and MMVA, a modified majority voting system. Beginning with a lexicon and a set of character candidates produced by a character recognizer for each character of the input word, an interpretation of the input word is generated. MMVA is used in the postprocessing stage to select the interpretation that accumulates the highest score. In the case of more than one possible interpretation, the Conflict Analyzing System calls the character recognizer again to generate scores for each character that composes each interpretation to determine the final output word. The proposed word recognition system was tested with 2 sets of handwritten Japanese city names, and recognition rates higher than 99% were achieved, demonstrating the effectiveness of the method.

  • A Lexicon Directed Algorithm for Recognition of Unconstrained Handwritten Words

    Fumitaka KIMURA  Shinji TSURUOKA  Yasuji MIYAKE  Malayappan SHRIDHAR  

     
    PAPER

      Vol:
    E77-D No:7
      Page(s):
    785-793

    In this paper, the authors discuss a lexicon-directed algorithm for recognition of unconstrained handwritten words (cursive, discrete, or mixed) such as those encountered in mail pieces. The procedure consists of binarization, presegmentation, intermediate feature extraction, segmentation recognition, and post-processing. The segmentation recognition and the post-processing are repeated for all lexicon words, while the steps from binarization through intermediate feature extraction are applied once per input word. This algorithm is essentially non-hierarchical in character segmentation and recognition, which are performed in a single segmentation recognition process. The results of a performance evaluation using a large handwritten address block database are described, and algorithm improvements for achieving higher recognition accuracy and speed are discussed. Experimental studies with about 3000 word images indicate that an overall accuracy in the range of 91% to 98%, depending on the size of the lexicon (assumed to contain the correct word), is achievable at a processing speed of 20 to 30 words per minute on a typical workstation.
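
The two handwriting-recognition abstracts above share one idea: each lexicon word is scored against the recognizer's output and the best-scoring word is kept. The sketch below is only a rough illustration of that idea, not the method of either paper. It assumes, for simplicity, that the input has already been split into one slot per character and that each slot carries a dictionary of candidate characters with scores. The function name best_lexicon_word and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def best_lexicon_word(
    candidates: List[Dict[str, float]],
    lexicon: List[str],
) -> Optional[Tuple[str, float]]:
    """Return the lexicon word with the highest accumulated score.

    candidates[i] maps each candidate character for position i to a
    recognizer score (hypothetical values; higher is better).
    Words whose length differs from the number of positions are skipped,
    which is a simplifying assumption of this sketch.
    """
    best: Optional[Tuple[str, float]] = None
    for word in lexicon:
        if len(word) != len(candidates):
            continue
        # Accumulate the per-position score of each character in the word;
        # a character that is not among the candidates contributes 0.
        score = sum(slot.get(ch, 0.0) for slot, ch in zip(candidates, word))
        if best is None or score > best[1]:
            best = (word, score)
    return best


# Hypothetical recognizer output for a three-character input word.
candidates = [
    {"c": 0.6, "e": 0.3},
    {"a": 0.7, "o": 0.2},
    {"t": 0.5, "r": 0.4},
]
lexicon = ["cat", "car", "cot", "dog", "cart"]

result = best_lexicon_word(candidates, lexicon)
print(result)  # the selected word is "cat"
```

Restricting the search to lexicon words is what makes the approach robust: a wrong top candidate at one position can still be overruled when the remaining positions support a valid word.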