Author Search Result

[Author] JinAn XU (6 hits)

  • An Approach for Chinese-Japanese Named Entity Equivalents Extraction Using Inductive Learning and Hanzi-Kanji Mapping Table

    JinAn XU  Yufeng CHEN  Kuang RU  Yujie ZHANG  Kenji ARAKI  

     
    PAPER-Natural Language Processing

    Publicized: 2017/05/02
    Vol: E100-D  No: 8
    Page(s): 1882-1892

    Named entity (NE) translation equivalent extraction plays a critical role in machine translation (MT) and cross-language information retrieval (CLIR). Traditional methods are often based on large-scale parallel or comparable corpora. However, their applicability is constrained, mainly because parallel corpora of the required scale are scarce, especially for the Chinese-Japanese language pair. In this paper, we propose a method that considers the characteristics of Chinese and Japanese to automatically extract Chinese-Japanese NE translation equivalents from monolingual corpora based on inductive learning (IL). The method adopts the Chinese Hanzi and Japanese Kanji Mapping Table (HKMT) to calculate the similarity of NE instances between Japanese and Chinese. Then, we use IL to obtain partial translation rules for NEs by extracting the differing parts from high-similarity NE instances in Chinese and Japanese. Finally, feedback processing updates the Chinese and Japanese NE similarity and rule sets. Experimental results show that our simple, efficient method overcomes the insufficiency of traditional methods, which depend heavily on bilingual resources. Compared with other methods, our method combines the language features of Chinese and Japanese with IL to extract NE pairs automatically. Our use of weakly correlated bilingual text sets and minimal additional knowledge to extract NE pairs effectively reduces the cost of building the corpus and the need for additional knowledge. Our method may help to build a large-scale Chinese-Japanese NE translation dictionary using monolingual corpora.
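    The HKMT-based similarity step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything here is an assumption made for illustration: the tiny `HKMT` excerpt, the helper names `to_kanji` and `ne_similarity`, and the choice of a Dice coefficient over characters. It is not the authors' implementation.

```python
from collections import Counter

# Hypothetical excerpt of a Hanzi-Kanji mapping table (assumption):
# simplified Chinese character -> corresponding Japanese kanji.
HKMT = {
    "东": "東",
    "广": "広",
    "岛": "島",
}


def to_kanji(chinese_ne, table=HKMT):
    """Map each Hanzi to its kanji form; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in chinese_ne)


def ne_similarity(chinese_ne, japanese_ne, table=HKMT):
    """Dice coefficient over characters after Hanzi-to-kanji mapping.

    The Dice score is an illustrative stand-in for the paper's measure.
    """
    mapped = Counter(to_kanji(chinese_ne, table))
    target = Counter(japanese_ne)
    shared = sum((mapped & target).values())  # multiset intersection size
    total = sum(mapped.values()) + sum(target.values())
    return 2 * shared / total if total else 0.0


if __name__ == "__main__":
    print(ne_similarity("东京大学", "東京大学"))
    print(ne_similarity("东京", "京都"))
```

    Pairs that score highly under such a measure would be the candidates from which differing parts are extracted as partial translation rules.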

  • Exploring Hypotactic Structure for Chinese-English Machine Translation with a Structure-Aware Encoder-Decoder Neural Model

    Guoyi MIAO  Yufeng CHEN  Mingtong LIU  Jinan XU  Yujie ZHANG  Wenhe FENG  

     
    PAPER-Natural Language Processing

    Publicized: 2022/01/11
    Vol: E105-D  No: 4
    Page(s): 797-806

    Translation of long and complex sentences has always been a challenge for machine translation. In recent years, neural machine translation (NMT) has achieved substantial progress in modeling the semantic connection between words in a sentence, but it is still insufficient in capturing discourse structure information between clauses within complex sentences, which often leads to poor discourse coherence when translating long and complex sentences. On the other hand, the hypotactic structure, a main component of the discourse structure, plays an important role in the coherence of discourse translation, but it has not been specifically studied. To tackle this problem, we propose a novel Chinese-English NMT approach that incorporates the hypotactic structure knowledge of complex sentences. Specifically, we first annotate and build a hypotactic-structure-aligned parallel corpus to provide explicit hypotactic structure knowledge of complex sentences for NMT. Then we propose three hypotactic structure-aware NMT models with three different fusion strategies (source-side fusion, target-side fusion, and both-side fusion) to integrate the annotated structure knowledge into NMT. Experimental results on the WMT17, WMT18 and WMT19 Chinese-English translation tasks demonstrate that the proposed method can significantly improve translation performance and enhance the discourse coherence of machine translation.

  • Character-Level Dependency Model for Joint Word Segmentation, POS Tagging, and Dependency Parsing in Chinese

    Zhen GUO  Yujie ZHANG  Chen SU  Jinan XU  Hitoshi ISAHARA  

     
    PAPER-Natural Language Processing

    Publicized: 2015/10/06
    Vol: E99-D  No: 1
    Page(s): 257-264

    Recent work on joint word segmentation, POS (Part Of Speech) tagging, and dependency parsing in Chinese has two key problems: first, character-based word segmentation and word-based dependency parsing were not combined well in the transition-based framework; second, the joint model suffers from a shortage of annotated corpora. To resolve the first problem, we propose to transform the traditional word-based dependency tree into a character-based dependency tree by using the internal structure of words, and then propose a novel character-level joint model for the three tasks. To resolve the second problem, we propose a novel semi-supervised joint model that exploits n-gram features and dependency subtree features from partially annotated corpora. Experimental results on the Chinese Treebank show that our joint model achieved 98.31%, 94.84% and 81.71% for Chinese word segmentation, POS tagging, and dependency parsing, respectively. Our model outperforms the pipeline model of the three tasks by 0.92%, 1.77% and 3.95%, respectively. In particular, the F1 values for word segmentation and POS tagging are the best among those reported to date.
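    The word-to-character tree transformation mentioned in this abstract can be sketched as follows. The paper relies on the internal structure of words, which the abstract does not specify; this sketch substitutes a simplifying assumption that every word is a right-headed chain (each character depends on the next one, and the last character heads the word). The function name `to_char_tree` and the input format are hypothetical.

```python
def to_char_tree(words, heads):
    """Convert a word-level dependency tree to a character-level one.

    words: list of word strings.
    heads: head word index for each word, -1 for the root.
    Returns (chars, char_heads), with character indices and -1 for the root.

    Assumption (not from the paper): inside a word, each character depends
    on the following character, and the last character is the word's head.
    """
    # Character index of the last (head) character of every word.
    last_char = []
    offset = 0
    for word in words:
        offset += len(word)
        last_char.append(offset - 1)

    chars, char_heads = [], []
    for w_idx, word in enumerate(words):
        start = last_char[w_idx] - len(word) + 1
        for k, ch in enumerate(word):
            chars.append(ch)
            if k < len(word) - 1:
                # Intra-word arc: attach to the next character.
                char_heads.append(start + k + 1)
            else:
                # Inter-word arc: attach to the head word's head character.
                h = heads[w_idx]
                char_heads.append(-1 if h == -1 else last_char[h])
    return chars, char_heads


if __name__ == "__main__":
    chars, char_heads = to_char_tree(["我", "喜欢", "音乐"], [1, -1, 1])
    for i, (ch, h) in enumerate(zip(chars, char_heads)):
        print(i, ch, h)
```

    With a character-level tree of this kind, segmentation boundaries, POS tags, and dependency arcs can all be predicted over a single sequence of characters, which is what allows the three tasks to share one transition system.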

  • MKGN: A Multi-Dimensional Knowledge Enhanced Graph Network for Multi-Hop Question and Answering

    Ying ZHANG  Fandong MENG  Jinchao ZHANG  Yufeng CHEN  Jinan XU  Jie ZHOU  

     
    PAPER-Natural Language Processing

    Publicized: 2021/12/29
    Vol: E105-D  No: 4
    Page(s): 807-819

    Machine reading comprehension with multi-hop reasoning always suffers from broken reasoning paths due to the lack of world knowledge, which always results in wrong answer detection. In this paper, we analyze what knowledge previous work lacks, e.g., dependency relations and commonsense. Based on our analysis, we propose a Multi-dimensional Knowledge enhanced Graph Network, named MKGN, which exploits specific knowledge to repair the knowledge gap in the reasoning process. Specifically, our approach incorporates not only entities and dependency relations through various graph neural networks, but also commonsense knowledge through a bidirectional attention mechanism, which aims to enhance the representations of both the question and the contexts. In addition, to make the most of multi-dimensional knowledge, we investigate two kinds of fusion architectures: sequential and parallel. Experimental results on the HotpotQA dataset demonstrate the effectiveness of our approach and verify that using multi-dimensional knowledge, especially dependency relations and commonsense, can indeed improve the reasoning process and contribute to correct answer detection.

  • Gated Convolutional Neural Networks with Sentence-Related Selection for Distantly Supervised Relation Extraction

    Yufeng CHEN  Siqi LI  Xingya LI  Jinan XU  Jian LIU  

     
    PAPER-Natural Language Processing

    Publicized: 2021/06/01
    Vol: E104-D  No: 9
    Page(s): 1486-1495

    Relation extraction is one of the key basic tasks in natural language processing, in which distant supervision is widely used to obtain large-scale labeled data without expensive labor costs. However, the automatically generated data contains massive noise because of the wrong-labeling problem in distant supervision. To address this problem, existing research mainly focuses on removing sentence-level noise with various sentence selection strategies, which may be inadequate for handling word-level noise. In this paper, we propose a novel neural framework that considers both intra-sentence and inter-sentence relevance to deal with word-level and sentence-level noise from distant supervision, denoted as Sentence-Related Gated Piecewise Convolutional Neural Networks (SR-GPCNN). Specifically, 1) a gate mechanism with multi-head self-attention is adopted to reduce word-level noise inside sentences; 2) a soft-label strategy is utilized to alleviate the wrong-labeling propagation problem; and 3) a sentence-related selection model is designed to further filter sentence-level noise. Extensive experimental results on the NYT dataset demonstrate that our approach filters word-level and sentence-level noise effectively and thus significantly outperforms all the baseline models in terms of both AUC and top-n precision metrics.

  • A Hybrid Topic Model for Multi-Document Summarization

    JinAn XU  JiangMing LIU  Kenji ARAKI  

     
    PAPER-Natural Language Processing

    Publicized: 2015/02/09
    Vol: E98-D  No: 5
    Page(s): 1089-1094

    Topic features are useful in improving text summarization. However, independence among topics is a strong restriction in most topic models, and relaxing this restriction allows text structure to be captured more deeply. This paper proposes a hybrid topic model to generate multi-document summaries using a combination of the Hidden Topic Markov Model (HTMM), the surface texture model and the topic transition model. Based on the topic transition model, regular topic transition probability is used during summary generation. This approach eliminates the topic independence assumption of the Latent Dirichlet Allocation (LDA) model. Experimental results show the advantage of combining the three kinds of models. The contributions of this paper include relaxing topic independence and integrating the surface texture and shallow semantics of documents to improve summarization. In short, this paper attempts to realize an advanced summarization system.
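    To make the idea of a topic transition probability concrete, here is a minimal sketch that estimates first-order transition probabilities by counting consecutive sentence-level topic labels. The abstract does not describe how the paper estimates these probabilities, so the counting scheme, the input format (one list of topic ids per document), and the name `transition_probs` are all assumptions for illustration.

```python
from collections import defaultdict


def transition_probs(topic_sequences):
    """Estimate P(next topic | previous topic) from labeled sequences.

    topic_sequences: iterable of lists, each holding the topic id assigned
    to consecutive sentences of one document (hypothetical input format).
    Returns a nested dict: probs[prev][next] -> relative frequency.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in topic_sequences:
        # Count every adjacent (previous, next) topic pair.
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1

    probs = {}
    for prev, row in counts.items():
        total = sum(row.values())
        probs[prev] = {nxt: c / total for nxt, c in row.items()}
    return probs


if __name__ == "__main__":
    docs = [[0, 0, 1, 2], [0, 1, 1, 2]]
    for prev, row in sorted(transition_probs(docs).items()):
        print(prev, row)
```

    A model that uses such probabilities no longer treats the topic of each sentence as independent of its predecessor, which is the restriction the abstract says the hybrid model relaxes.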