
Keyword Search Result

[Keyword] Chinese(55hit)

1-20hit(55hit)

  • Sense-Aware Decoder for Character Based Japanese-Chinese NMT Open Access

    Zezhong LI  Fuji REN  

     
    LETTER-Natural Language Processing

      Publicized:
    2023/12/11
      Vol:
    E107-D No:4
      Page(s):
    584-587

    Compared to subword-based Neural Machine Translation (NMT), character-based NMT eschews linguistically motivated segmentation and operates directly on the raw character sequence, in a more fully end-to-end manner. This property is especially attractive for machine translation (MT) between Japanese and Chinese, both of which use consecutive logographic characters without explicit word boundaries. However, one disadvantage remains to be addressed: a character is a less meaning-bearing unit than a subword, which requires character models to be capable of sense discrimination. Specifically, two types of sense ambiguity exist, in the source language and in the target language respectively. The former has been partially solved by deep encoders and several existing works. The latter, the ambiguity on the target side, is rarely discussed. To address this problem, we propose two simple yet effective methods: a non-parametric pre-clustering for sense induction, and a joint model that performs sense discrimination and NMT training simultaneously. Extensive experiments on Japanese⟷Chinese MT show that our proposed methods consistently outperform strong baselines, and verify the effectiveness of using sense-discriminated representations for character-based NMT.

  • PSDSpell: Pre-Training with Self-Distillation Learning for Chinese Spelling Correction Open Access

    Li HE  Xiaowu ZHANG  Jianyong DUAN  Hao WANG  Xin LI  Liang ZHAO  

     
    PAPER

      Publicized:
    2023/10/25
      Vol:
    E107-D No:4
      Page(s):
    495-504

    Chinese spelling correction (CSC) models detect and correct a text typo based on the misspelled character and its context. Recently, BERT-based models have dominated research on Chinese spelling correction. However, these methods focus only on the semantic information of the text during the pretraining stage, neglecting to learn how to correct spelling errors. Moreover, when the text contains multiple incorrect characters, the context introduces noisy information, making it difficult for the model to accurately detect the positions of the incorrect characters and leading to false corrections. To address these limitations, we apply the multimodal pre-trained language model ChineseBERT to the task of spelling correction. We propose a pretraining strategy based on self-distillation learning, in which a confusion set is used to construct text containing erroneous characters, allowing the model to jointly learn how to understand language and correct spelling errors. Additionally, we introduce a single-channel masking mechanism to mitigate the noise caused by the incorrect characters. This mechanism masks the semantic encoding channel while preserving the phonetic and glyph encoding channels, reducing the noise introduced by incorrect characters during prediction. Finally, experiments are conducted on widely used benchmarks. Our model outperforms state-of-the-art methods by a remarkable margin.

  • Multi-Agent Surveillance Based on Travel Cost Minimization

    Kyohei MURAKATA  Koichi KOBAYASHI  Yuh YAMASHITA  

     
    PAPER

      Publicized:
    2023/07/19
      Vol:
    E107-A No:1
      Page(s):
    25-30

    The multi-agent surveillance problem is to find optimal trajectories of multiple agents that patrol a given area as evenly as possible. In this paper, we consider the multi-agent surveillance problem based on travel cost minimization. The surveillance area is given by an undirected graph. The penalty for each agent is introduced to evaluate the surveillance performance. Through a mixed logical dynamical system model, the multi-agent surveillance problem is reduced to a mixed integer linear programming (MILP) problem. In model predictive control, trajectories of agents are generated by solving the MILP problem at each discrete time. Furthermore, a condition that the MILP problem is always feasible is derived based on the Chinese postman problem. Finally, the proposed method is demonstrated by a numerical example.

  • U-Net Architecture for Ancient Handwritten Chinese Character Detection in Han Dynasty Wooden Slips

    Hojun SHIMOYAMA  Soh YOSHIDA  Takao FUJITA  Mitsuji MUNEYASU  

     
    PAPER-Image

      Publicized:
    2023/05/15
      Vol:
    E106-A No:11
      Page(s):
    1406-1415

    Recent character detectors have been modeled using deep neural networks and have achieved high performance in various tasks, such as text detection in natural scenes and character detection in historical documents. However, existing methods cannot achieve high detection accuracy for wooden slips because of their multi-scale character sizes and aspect ratios, high character density, and close character-to-character distance. In this study, we propose a new U-Net-based character detection and localization framework that learns character regions and boundaries between characters. The proposed method enhances the learning performance of character regions by simultaneously learning the vertical and horizontal boundaries between characters. Furthermore, by adding simple and low-cost post-processing using the learned regions of character boundaries, it is possible to more accurately detect the location of a group of characters in a close neighborhood. In this study, we construct a wooden slip dataset. Experiments demonstrated that the proposed method outperformed existing character detection methods, including state-of-the-art character detection methods for historical documents.

  • Chinese Named Entity Recognition Method Based on Dictionary Semantic Knowledge Enhancement

    Tianbin WANG  Ruiyang HUANG  Nan HU  Huansha WANG  Guanghan CHU  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2023/02/15
      Vol:
    E106-D No:5
      Page(s):
    1010-1017

    Chinese Named Entity Recognition (NER) is a fundamental technology in the field of Chinese natural language processing. It is extensively applied to information extraction, intelligent question answering, and knowledge graphs. Nevertheless, due to the diversity and complexity of Chinese, most Chinese NER methods fail to sufficiently capture character-granularity semantics, which degrades the performance of Chinese NER. In this work, we propose DSKE-Chinese NER: Chinese Named Entity Recognition based on Dictionary Semantic Knowledge Enhancement. We integrate character-granularity semantic information into the vector space of characters and obtain vector representations containing semantic information through the attention mechanism. In addition, we verify the appropriate number of semantic layers through comparative experiments. Experiments on public Chinese datasets such as Weibo, Resume and MSRA show that the model outperforms character-based LSTM baselines.

  • Chinese Lexical Sememe Prediction Using CilinE Knowledge

    Hao WANG  Sirui LIU  Jianyong DUAN  Li HE  Xin LI  

     
    PAPER-Language, Thought, Knowledge and Intelligence

      Publicized:
    2022/08/18
      Vol:
    E106-A No:2
      Page(s):
    146-153

    Sememes are the smallest semantic units of human languages, the composition of which can represent the meaning of words. Sememes have been successfully applied to many downstream applications in the natural language processing (NLP) field. Annotation of a word's sememes depends on language experts, which is both time-consuming and labor-intensive, limiting the large-scale application of sememes. Researchers have proposed some sememe prediction methods to automatically predict sememes for words. However, existing sememe prediction methods focus on information about the word itself, ignoring the expert-annotated knowledge bases that indicate the relations between words and should be valuable in sememe prediction. Therefore, we aim to incorporate expert-annotated knowledge bases into the sememe prediction process. To achieve that, we propose a CilinE-guided sememe prediction model which employs an existing word knowledge base, CilinE, to remodel sememe prediction from a relational perspective. Experiments on HowNet, a widely used Chinese sememe knowledge base, have shown that CilinE has an obvious positive effect on sememe prediction. Furthermore, our proposed method can be integrated into existing methods and significantly improves the prediction performance. We will release the data and code to the public.

  • Research on Mongolian-Chinese Translation Model Based on Transformer with Soft Context Data Augmentation Technique

    Qing-dao-er-ji REN  Yuan LI  Shi BAO  Yong-chao LIU  Xiu-hong CHEN  

     
    PAPER-Neural Networks and Bioengineering

      Publicized:
    2021/11/19
      Vol:
    E105-A No:5
      Page(s):
    871-876

    As the mainstream approach in the field of machine translation, neural machine translation (NMT) has achieved great improvements on many resource-rich languages, but the performance of NMT for low-resource languages is not yet very good. This paper uses data augmentation technology to construct a Mongolian-Chinese pseudo-parallel corpus, so as to improve the translation ability of the Mongolian-Chinese translation model. Experiments show that the above methods can improve the translation ability of the translation model. Finally, a translation model trained with a large-scale pseudo-parallel corpus and integrated with the soft context data augmentation technique is obtained, and its BLEU value is 39.3.

  • A Method of K-Means Clustering Based on TF-IDF for Software Requirements Documents Written in Chinese Language

    Jing ZHU  Song HUANG  Yaqing SHI  Kaishun WU  Yanqiu WANG  

     
    PAPER-Software Engineering

      Publicized:
    2021/12/28
      Vol:
    E105-D No:4
      Page(s):
    736-754

    At present there is no way to automatically obtain function points when using the function point analysis (FPA) method, especially for requirement documents written in Chinese. Given the characteristics of Chinese grammar with respect to word segmentation, it is necessary to segment Chinese words accurately, so that the subsequent entity recognition and disambiguation can be carried out in a smaller range, which lays a solid foundation for the efficient automatic extraction of function points. Therefore, this paper proposes a method of K-Means clustering based on TF-IDF, and conducts experiments with 24 software requirement documents written in Chinese. The results show that the best clustering effect is achieved when 55% to 75% of the extracted information is retained and the number of clusters takes the middle value of the total number of clusters. The method and conclusions of this paper apply not only to Chinese but also provide an important reference for the automatic extraction of function points from software requirements documents written in other Oriental languages, and fill a gap in data preprocessing at the early stage of automatic function point calculation.

  • A Modulus Factorization Algorithm for Self-Orthogonal and Self-Dual Quasi-Cyclic Codes via Polynomial Matrices Open Access

    Hajime MATSUI  

     
    LETTER-Coding Theory

      Publicized:
    2021/05/21
      Vol:
    E104-A No:11
      Page(s):
    1649-1653

    This study presents a construction method for self-orthogonal and self-dual quasi-cyclic codes that relies on factorization of the modulus polynomials for cyclicity. Smaller generator polynomial matrices are used instead of the generator matrices of the codes as linear codes. An algorithm based on the Chinese remainder theorem finds the generator polynomial matrix on the original modulus from the ones constructed on each factor. This method enables us to efficiently construct and search for these codes when factoring modulus polynomials into reciprocal polynomials.

  • Constructions of Binary Sequence Pairs of Length 5q with Optimal Three-Level Correlation

    Xiumin SHEN  Xiaofei SONG  Yanguo JIA  Yubo LI  

     
    LETTER-Coding Theory

      Publicized:
    2021/04/14
      Vol:
    E104-A No:10
      Page(s):
    1435-1439

    Binary sequence pairs with optimal periodic correlation have important applications in many fields of communication systems. In this letter, four new families of binary sequence pairs are presented based on the generalized cyclotomy over $\mathbb{Z}_{5q}$, where $q \neq 5$ is an odd prime. All these binary sequence pairs have optimal three-level correlation values $\{-1, 3\}$.
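
As a hedged illustration of the quantity this abstract refers to, the sketch below computes the periodic cross-correlation of a binary sequence pair, using the usual definition in which entry tau is the sum over t of (-1)^(a(t) + b(t + tau)). The function names and the input are assumptions made for illustration only: the example is a length-7 m-sequence paired with itself, not one of the constructions from the letter.

```python
def periodic_correlation(a, b):
    """Periodic cross-correlation of two 0/1 sequences of equal length.

    Entry tau is the sum over t of (-1)**(a[t] XOR b[(t + tau) mod n]),
    i.e. +1 for every agreement and -1 for every disagreement.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same period")
    n = len(a)
    return [
        sum(1 if a[t] == b[(t + tau) % n] else -1 for t in range(n))
        for tau in range(n)
    ]


def out_of_phase_values(a, b):
    """Set of distinct correlation values at nonzero shifts."""
    return set(periodic_correlation(a, b)[1:])


# Illustrative input only (hypothetical, not taken from the letter):
# a length-7 m-sequence paired with itself.
seq = [1, 1, 1, 0, 1, 0, 0]
print(periodic_correlation(seq, seq))
print(out_of_phase_values(seq, seq))
```

A check of this kind could be used to inspect which correlation levels a candidate pair actually takes.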

  • Efficient Algorithms for Sign Detection in RNS Using Approximate Reciprocals Open Access

    Shinichi KAWAMURA  Yuichi KOMANO  Hideo SHIMIZU  Saki OSUKA  Daisuke FUJIMOTO  Yuichi HAYASHI  Kentaro IMAFUKU  

     
    PAPER

      Vol:
    E104-A No:1
      Page(s):
    121-134

    The residue number system (RNS) is a method for representing an integer x as an n-tuple of its residues with respect to a given set of moduli. In RNS, addition, subtraction, and multiplication can be carried out by independent operations with respect to each modulus. Therefore, an n-fold speedup can be achieved by parallel processing. The main disadvantage of RNS is that we cannot efficiently compare the magnitudes of two integers or determine the sign of an integer. Two general methods of comparison are to transform a number in RNS to a mixed-radix system or to a radix representation using the Chinese remainder theorem (CRT). We use the CRT to derive an equation approximating the value of x relative to M, the product of the moduli. Then, we propose two algorithms that efficiently evaluate the equation and output a sign bit. The expected number of steps of these algorithms is of order n. The algorithms use a lookup table that is (n+3) times as large as M, which is reasonably small for most applications including cryptography.
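
To make the RNS setting concrete, here is a minimal sketch of residue representation, component-wise multiplication, and the naive CRT-based sign test. It is not the approximate-reciprocal algorithm of the paper: it reconstructs x in full, which is the costly step that paper aims to avoid. The function names, the moduli (3, 5, 7), and the convention that residues in the upper half of [0, M) encode negative integers are all assumptions made for illustration.

```python
from math import prod


def to_rns(x, moduli):
    """Represent integer x as the tuple of its residues."""
    return tuple(x % m for m in moduli)


def rns_mul(a, b, moduli):
    """Multiply two RNS numbers independently in each modulus."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, moduli))


def crt_reconstruct(residues, moduli):
    """Recover x in [0, M) from its residues (moduli pairwise coprime)."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse (Python 3.8+).
        x += r * Mi * pow(Mi, -1, m)
    return x % M


def sign_bit(residues, moduli):
    """Naive sign test. Assumed convention: values with 2x >= M are negative."""
    M = prod(moduli)
    return 1 if 2 * crt_reconstruct(residues, moduli) >= M else 0


moduli = (3, 5, 7)  # hypothetical small moduli, M = 105
product = rns_mul(to_rns(6, moduli), to_rns(-4, moduli), moduli)
print(product, crt_reconstruct(product, moduli), sign_bit(product, moduli))
```

The product 6 × (−4) = −24 is carried out residue by residue, and the sign bit is recovered only after a full reconstruction modulo M.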

  • Improving Faster R-CNN Framework for Multiscale Chinese Character Detection and Localization

    Minseong KIM  Hyun-Chul CHOI  

     
    LETTER-Pattern Recognition

      Publicized:
    2020/04/06
      Vol:
    E103-D No:7
      Page(s):
    1777-1781

    Faster R-CNN uses a region proposal network which consists of a single-scale convolution filter and fully connected networks to localize detected regions. However, a single-scale filter is not enough to detect the full regions of characters. In this letter, we propose a simple but effective approach, namely using convolution filters of various sizes, to accurately detect Chinese characters of multiple scales in documents. We experimentally verified that our method improved IoU by 4% and detection rate by 3% compared with the previous single-scale Faster R-CNN method.
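
For readers unfamiliar with the IoU metric cited above, the following sketch shows the standard intersection-over-union computation for two axis-aligned boxes. The box format (x1, y1, x2, y2) and the function name are assumptions for illustration; the letter's own evaluation code is not given in the abstract.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).

    Assumes x1 <= x2 and y1 <= y2 for both boxes.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```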

  • A Double Adversarial Network Model for Multi-Domain and Multi-Task Chinese Named Entity Recognition

    Yun HU  Changwen ZHENG  

     
    PAPER-Natural Language Processing

      Publicized:
    2020/04/01
      Vol:
    E103-D No:7
      Page(s):
    1744-1752

    Named Entity Recognition (NER) systems are often realized by supervised methods such as CRF and neural network methods, which require large amounts of annotated data. In domains where only a small amount of annotated training data is available, multi-domain or multi-task learning methods are often used. In this paper, we explore methods that use the news domain and the Chinese Word Segmentation (CWS) task to improve the performance of Chinese named entity recognition in the Weibo domain. We first propose two baseline models combining multi-domain and multi-task information. The two baseline models share information between different domains and tasks simply by sharing parameters. Then, we propose a Double ADVersarial model (DoubADV model). The model uses two adversarial networks that consider the shared and private features in different domains and tasks. Experimental results show that our DoubADV model outperforms the other baseline models and achieves state-of-the-art performance compared with previous works in the multi-domain and multi-task setting.

  • Prosody Correction Preserving Speaker Individuality for Chinese-Accented Japanese HMM-Based Text-to-Speech Synthesis Open Access

    Daiki SEKIZAWA  Shinnosuke TAKAMICHI  Hiroshi SARUWATARI  

     
    LETTER-Speech and Hearing

      Publicized:
    2019/03/11
      Vol:
    E102-D No:6
      Page(s):
    1218-1221

    This article proposes a prosody correction method based on partial model adaptation for Chinese-accented Japanese hidden Markov model (HMM)-based text-to-speech synthesis. Although text-to-speech synthesis built from non-native speech accurately reproduces the speaker's individuality in synthetic speech, the naturalness of the synthetic speech is strongly degraded. In the proposed model, to improve the naturalness while preserving the speaker individuality of Chinese-accented Japanese text-to-speech synthesis, we partially utilize HMM parameters of native Japanese speech to synthesize prosody-corrected synthetic speech. Results of an experimental evaluation demonstrate that duration and F0 correction are significantly effective for improving naturalness.

  • Preordering for Chinese-Vietnamese Statistical Machine Translation

    Huu-Anh TRAN  Heyan HUANG  Phuoc TRAN  Shumin SHI  Huu NGUYEN  

     
    PAPER-Natural Language Processing

      Publicized:
    2018/11/12
      Vol:
    E102-D No:2
      Page(s):
    375-382

    Word order is one of the most significant differences between Chinese and Vietnamese. In phrase-based statistical machine translation, the reordering model learns reordering rules from bilingual corpora. If the bilingual corpora are large and good enough, the reordering rules are accurate and have good coverage. However, Chinese-Vietnamese is a low-resource language pair, so the extraction of reordering rules is limited. As a result, the quality of reordering in Chinese-Vietnamese machine translation is not high. In this paper, we combine Chinese dependency relations and Chinese-Vietnamese word alignment results in order to pre-order Chinese sentences so that their word order matches that of Vietnamese. The experimental results show that our methodology improves machine translation performance compared to a translation system using only the reordering models of phrase-based statistical machine translation.

  • New Families of Quaternary Sequences of Period 2p with Low Autocorrelation

    Xiaofei SONG  Yanguo JIA  Xiumin SHEN  Yubo LI  Xiuping PENG  

     
    LETTER-Coding Theory

      Vol:
    E101-A No:11
      Page(s):
    1964-1969

    In this letter, two new families of quaternary sequences with low four-level or five-level autocorrelation are constructed based on generalized cyclotomy over $\mathbb{Z}_{2p}$. These quaternary sequences are balanced and the maximal absolute value of the out-of-phase autocorrelation is 4.

  • A Modulus Factorization Algorithm for Self-Orthogonal and Self-Dual Integer Codes

    Hajime MATSUI  

     
    LETTER-Coding Theory

      Vol:
    E101-A No:11
      Page(s):
    1952-1956

    Integer codes are defined as error-correcting codes over integers modulo a fixed positive integer. In this paper, we show that the construction of integer codes can be reduced to the cases of prime-power moduli. We can efficiently search for integer codes with small prime-power moduli and can construct target integer codes with a large composite-number modulus. Moreover, we also show that this prime-factorization reduction is useful for the construction of self-orthogonal and self-dual integer codes, i.e., these properties in the prime-power moduli are preserved in the composite-number modulus. Numerical examples of integer codes and generator matrices demonstrate these facts and processes.
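
The reduction to prime-power moduli can be illustrated with a small sketch based on the Chinese remainder theorem. It is only a toy under stated assumptions, not the paper's algorithm: it lifts a single vector that is self-orthogonal modulo 4 and another that is self-orthogonal modulo 9 to one vector modulo 36, and checks that self-orthogonality carries over. The function names and the example vectors are hypothetical.

```python
def crt_pair(r1, m1, r2, m2):
    """Unique x in [0, m1*m2) with x = r1 (mod m1) and x = r2 (mod m2).

    Requires m1 and m2 to be coprime; pow(m1, -1, m2) needs Python 3.8+.
    """
    inv = pow(m1, -1, m2)
    return (r1 + m1 * ((r2 - r1) * inv % m2)) % (m1 * m2)


def lift_vector(u, m1, v, m2):
    """Combine vectors over Z_m1 and Z_m2 coordinate-wise into Z_(m1*m2)."""
    return [crt_pair(a, m1, b, m2) for a, b in zip(u, v)]


def is_self_orthogonal(w, modulus):
    """True when the inner product of w with itself is 0 modulo `modulus`."""
    return sum(x * x for x in w) % modulus == 0


u = [1, 1, 1, 1]  # hypothetical vector, self-orthogonal mod 4
v = [1, 2, 2, 0]  # hypothetical vector, self-orthogonal mod 9
w = lift_vector(u, 4, v, 9)
print(w, is_self_orthogonal(w, 36))
```

Because w agrees with u modulo 4 and with v modulo 9, its inner product with itself vanishes modulo both factors and hence modulo 36.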

  • New Constructions of Zero-Difference Balanced Functions

    Zhibao LIN  Zhengqian LI  Pinhui KE  

     
    LETTER-Coding Theory

      Vol:
    E101-A No:10
      Page(s):
    1719-1723

    Zero-difference balanced (ZDB) functions, which have many applications in coding theory and sequence design, have received a lot of attention in recent years. In this letter, based on two known classes of ZDB functions, a new class of ZDB functions defined on the group $(\mathbb{Z}_{2^e-1} \times \mathbb{Z}_n, +)$ is presented, where $e$ is a prime, $n = p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k}$, and each $p_i$ is an odd prime satisfying $e \mid (p_i - 1)$ for $1 \le i \le k$. In the case of $\gcd(2^e-1, n) = 1$, the newly constructed ZDB functions are cyclic.

  • New Construction Methods for Binary Sequence Pairs of Period pq with Ideal Two-Level Correlation

    Xiumin SHEN  Yanguo JIA  Xiaofei SONG  Yubo LI  

     
    PAPER-Coding Theory

      Vol:
    E101-A No:4
      Page(s):
    704-712

    In this paper, a new generalized cyclotomy over $\mathbb{Z}_{pq}$ is presented based on cyclotomy and the Chinese remainder theorem, where p and q are distinct odd primes. Several new construction methods for binary sequence pairs of period pq with ideal two-level correlation are given by utilizing these generalized cyclotomic classes. All the binary sequence pairs from our constructions have both the ideal out-of-phase correlation value -1 and the optimum balance property.

  • Corpus Expansion for Neural CWS on Microblog-Oriented Data with λ-Active Learning Approach

    Jing ZHANG  Degen HUANG  Kaiyu HUANG  Zhuang LIU  Fuji REN  

     
    PAPER-Natural Language Processing

      Publicized:
    2017/12/08
      Vol:
    E101-D No:3
      Page(s):
    778-785

    Microblog data contains rich information about real-world events with great commercial value, so microblog-oriented natural language processing (NLP) tasks have attracted considerable attention from researchers. However, the performance of microblog-oriented Chinese Word Segmentation (CWS) based on deep neural networks (DNNs) is still unsatisfactory. One critical reason is that the existing microblog-oriented training corpus is inadequate for training effective weight matrices for DNNs. In this paper, we propose a novel active learning method to extend the scale of the training corpus for DNNs. However, due to the large number of partially overlapping sentences in microblogs, it is difficult to select samples with high annotation value from raw microblogs during the active learning procedure. To select samples with higher annotation value, a parameter λ is introduced to control the number of repeatedly selected samples. Meanwhile, various strategies are adopted to measure the overall annotation value of a sample during the active learning procedure. Experiments on the benchmark datasets of NLPCC 2015 show that our λ-active learning method outperforms the baseline system and the state-of-the-art method. In addition, the results demonstrate that the performance of DNNs trained on the extended corpus is significantly improved.
