
Author Search Result

[Author] Li HE (3 hits)

  • Chinese Lexical Sememe Prediction Using CilinE Knowledge

    Hao WANG  Sirui LIU  Jianyong DUAN  Li HE  Xin LI  

     
    PAPER-Language, Thought, Knowledge and Intelligence
    Publicized: 2022/08/18
    Vol: E106-A No:2
    Page(s): 146-153

    Sememes are the smallest semantic units of human languages, and their composition can represent the meaning of words. Sememes have been successfully applied to many downstream applications in the natural language processing (NLP) field. Annotating a word's sememes depends on language experts, which is both time-consuming and labor-intensive, limiting the large-scale application of sememes. Researchers have proposed sememe prediction methods to automatically predict sememes for words. However, existing sememe prediction methods focus on information about the word itself, ignoring expert-annotated knowledge bases that encode relations between words and should be valuable for sememe prediction. We therefore aim to incorporate expert-annotated knowledge bases into the sememe prediction process. To achieve this, we propose a CilinE-guided sememe prediction model, which employs an existing word knowledge base, CilinE, to remodel sememe prediction from a relational perspective. Experiments on HowNet, a widely used Chinese sememe knowledge base, show that CilinE has a clear positive effect on sememe prediction. Furthermore, our proposed method can be integrated into existing methods and significantly improves their prediction performance. We will release the data and code to the public.
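
    The abstract suggests that words related in CilinE tend to share sememes. As a rough illustration only (not the authors' released code), a minimal relational baseline could let a word's CilinE neighbors vote with their HowNet annotations; the lookups `ciline_neighbors` and `hownet_sememes` below are hypothetical placeholders.

    ```python
    # Minimal sketch: predict sememes for a word by letting its CilinE-related
    # words vote with their annotated HowNet sememes. `ciline_neighbors` and
    # `hownet_sememes` are hypothetical lookup functions, not a real API.
    from collections import Counter

    def predict_sememes(word, ciline_neighbors, hownet_sememes, top_k=5):
        scores = Counter()
        for neighbor in ciline_neighbors(word):
            for sememe in hownet_sememes(neighbor):
                scores[sememe] += 1  # each related word votes for its sememes
        return [sememe for sememe, _ in scores.most_common(top_k)]
    ```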

  • Conceptual Knowledge Enhanced Model for Multi-Intent Detection and Slot Filling Open Access

    Li HE  Jingxuan ZHAO  Jianyong DUAN  Hao WANG  Xin LI  

     
    PAPER
    Publicized: 2023/10/25
    Vol: E107-D No:4
    Page(s): 468-476

    In natural language understanding, intent detection and slot filling are widely used to interpret user queries. However, current methods tend to rely on individual words and sentences to understand complex semantic concepts and can consider only local information within a sentence. As a result, they usually cannot capture long-distance dependencies well and struggle to recognize complex intents in a sentence. To address the long-distance dependency problem, this paper uses ConceptNet as an external knowledge source and introduces its extensive semantic information into a multi-intent detection and slot filling model. Specifically, for a given sentence, the most relevant conceptual knowledge is selected based on confidence scores and semantic relationships to enrich the sentence, and an information-rich concept context map is constructed. A multi-head graph attention mechanism is then used to strengthen contextual correlation and improve the model's semantic understanding. Experimental results on the MixATIS and MixSNIPS multi-intent datasets show that the model significantly outperforms other models.
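
    As a toy sketch of the mechanism the abstract names (not the paper's implementation): multi-head attention restricted to the edges of a concept context graph whose nodes are sentence tokens plus retrieved ConceptNet concepts. The shapes, adjacency construction, and weight matrices are illustrative assumptions.

    ```python
    # Toy multi-head graph attention over a concept context graph.
    # adj is a 0/1 edge mask; self-loops keep every softmax row well-defined.
    import torch
    import torch.nn.functional as F

    def multi_head_graph_attention(x, adj, w_q, w_k, w_v, num_heads=4):
        n, d = x.shape
        dh = d // num_heads
        q = (x @ w_q).view(n, num_heads, dh)
        k = (x @ w_k).view(n, num_heads, dh)
        v = (x @ w_v).view(n, num_heads, dh)
        scores = torch.einsum("nhd,mhd->hnm", q, k) / dh ** 0.5
        scores = scores.masked_fill(adj.unsqueeze(0) == 0, float("-inf"))
        attn = F.softmax(scores, dim=-1)  # attend only along graph edges
        return torch.einsum("hnm,mhd->nhd", attn, v).reshape(n, d)

    # Toy usage: 4 token nodes + 2 concept nodes, feature dim 8.
    x = torch.randn(6, 8)
    adj = torch.eye(6)
    adj[0, 4] = adj[4, 0] = 1  # link token 0 to a retrieved concept
    w_q, w_k, w_v = (torch.randn(8, 8) for _ in range(3))
    out = multi_head_graph_attention(x, adj, w_q, w_k, w_v, num_heads=2)
    ```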

  • PSDSpell: Pre-Training with Self-Distillation Learning for Chinese Spelling Correction Open Access

    Li HE  Xiaowu ZHANG  Jianyong DUAN  Hao WANG  Xin LI  Liang ZHAO  

     
    PAPER
    Publicized: 2023/10/25
    Vol: E107-D No:4
    Page(s): 495-504

    Chinese spelling correction (CSC) models detect and correct typos based on the misspelled character and its context. Recently, BERT-based models have dominated CSC research. However, these methods focus only on the semantic information of the text during the pretraining stage and neglect learning to correct spelling errors. Moreover, when a text contains multiple incorrect characters, the context introduces noise, making it difficult for the model to accurately locate the incorrect characters and leading to false corrections. To address these limitations, we apply the multimodal pre-trained language model ChineseBERT to the spelling correction task. We propose a self-distillation learning-based pretraining strategy in which a confusion set is used to construct text containing erroneous characters, allowing the model to jointly learn to understand language and correct spelling errors. Additionally, we introduce a single-channel masking mechanism to mitigate the noise caused by incorrect characters: it masks the semantic encoding channel while preserving the phonetic and glyph encoding channels, reducing the noise that incorrect characters introduce during prediction. Finally, experiments on widely used benchmarks show that our model outperforms state-of-the-art methods by a remarkable margin.
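
    A minimal sketch of the confusion-set corruption idea from the abstract (not the authors' code): characters are swapped for confusable variants so the model can learn from (noisy input, clean target) pairs during pretraining. The `confusion_set` mapping and the corruption rate are illustrative assumptions.

    ```python
    # Build a (noisy, clean) pretraining pair by substituting characters with
    # confusable variants (e.g. homophones or similar glyphs). The confusion
    # set and the 15% corruption rate are assumptions for illustration.
    import random

    def corrupt_with_confusion_set(text, confusion_set, corrupt_rate=0.15):
        noisy = list(text)
        for i, ch in enumerate(text):
            if ch in confusion_set and random.random() < corrupt_rate:
                noisy[i] = random.choice(confusion_set[ch])
        return "".join(noisy), text  # (erroneous input, clean target)
    ```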