JinAn XU Yufeng CHEN Kuang RU Yujie ZHANG Kenji ARAKI
Extraction of named entity (NE) translation equivalents plays a critical role in machine translation (MT) and cross-language information retrieval (CLIR). Traditional methods are often based on large-scale parallel or comparable corpora. However, their applicability is constrained, mainly because parallel corpora of the required scale are scarce, especially for the Chinese-Japanese language pair. In this paper, we propose a method that exploits the characteristics of Chinese and Japanese to automatically extract Chinese-Japanese NE translation equivalents from monolingual corpora using inductive learning (IL). The method adopts the Chinese Hanzi and Japanese Kanji Mapping Table (HKMT) to calculate the similarity between Japanese and Chinese NE instances. Then, we use IL to obtain partial translation rules for NEs by extracting the differing parts from high-similarity NE instances in Chinese and Japanese. Finally, feedback processing updates the Chinese-Japanese NE similarities and the rule sets. Experimental results show that our simple, efficient method overcomes the main weakness of traditional methods, which depend heavily on bilingual resources. Unlike other methods, ours combines the language features of Chinese and Japanese with IL to extract NE pairs automatically. By using weakly correlated bilingual text sets and minimal additional knowledge to extract NE pairs, it reduces both the cost of building the corpus and the need for additional knowledge. Our method may help to build a large-scale Chinese-Japanese NE translation dictionary from monolingual corpora.
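The similarity and rule-extraction steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the three-entry HKMT dictionary, the function names, and the use of Python's difflib ratio as the similarity score are all assumptions made for illustration, since the abstract does not give the actual table or similarity formula.

```python
from difflib import SequenceMatcher

# Hypothetical, tiny stand-in for the Hanzi-Kanji Mapping Table (HKMT).
# Each Japanese Kanji maps to exactly one simplified Chinese Hanzi, so the
# mapped string keeps the same length and character positions as the input.
HKMT = {"東": "东", "京": "京", "駅": "驿"}


def to_hanzi(ja: str) -> str:
    """Map each Kanji to its Hanzi counterpart; unknown characters pass through."""
    return "".join(HKMT.get(ch, ch) for ch in ja)


def similarity(ja: str, zh: str) -> float:
    """Assumed similarity: difflib's matching ratio after HKMT mapping."""
    return SequenceMatcher(None, to_hanzi(ja), zh).ratio()


def differing_parts(ja: str, zh: str) -> list[tuple[str, str]]:
    """Return (Japanese, Chinese) substrings that differ after HKMT mapping.

    These pairs play the role of candidate partial translation rules.
    """
    matcher = SequenceMatcher(None, to_hanzi(ja), zh)
    rules = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # Indices into the mapped string are also valid for `ja`
            # because the mapping is one character to one character.
            rules.append((ja[i1:i2], zh[j1:j2]))
    return rules


if __name__ == "__main__":
    ja_ne, zh_ne = "東京駅", "东京站"
    print(similarity(ja_ne, zh_ne))
    print(differing_parts(ja_ne, zh_ne))
```

In this toy pair, the shared prefix maps directly through the table, while the final character does not match its mapped form, so it surfaces as a candidate rule. A feedback step in the spirit of the abstract could then reapply such rules before recomputing similarities.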
A simple and efficient semi-supervised classification method is presented. An unsupervised spectral mapping method is extended to the semi-supervised setting by multiplicative modulation of the similarities between data points. Our proposed algorithm is derived by linearizing this nonlinear semi-supervised mapping method. Experiments on public benchmark data and color image data show that our method outperforms both a supervised algorithm based on linear discriminant analysis and a previous semi-supervised classification method.
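One way to picture multiplicative modulation of similarities is the sketch below. The abstract does not specify the form of the modulation, so everything here is an assumption: the Gaussian base similarity, the `boost` and `damp` factors, and the function names are hypothetical. The sketch only builds the modulated similarity matrix; the spectral mapping and its linearization are not shown.

```python
import math


def gaussian_similarity(x, y, sigma=1.0):
    """Base (unsupervised) similarity between two points."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def modulated_similarities(points, labels, boost=2.0, damp=0.5, sigma=1.0):
    """Build a similarity matrix whose entries are scaled by label agreement.

    labels[i] is a class label, or None for an unlabeled point.
    Assumed rule: multiply by `boost` when both points are labeled and agree,
    by `damp` when both are labeled and disagree, and leave the base
    similarity unchanged when either point is unlabeled.
    """
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-similarity
            w = gaussian_similarity(points[i], points[j], sigma)
            li, lj = labels[i], labels[j]
            if li is not None and lj is not None:
                w *= boost if li == lj else damp
            W[i][j] = w
    return W


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    lbl = [0, 0, 1, None]
    for row in modulated_similarities(pts, lbl):
        print([round(v, 3) for v in row])
```

With all labels set to None the matrix reduces to the plain Gaussian similarities, which matches the idea that the semi-supervised method extends an unsupervised one.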
A semi-supervised classification method is presented. A robust unsupervised spectral mapping method is extended to the semi-supervised setting. Our proposed algorithm is derived by linearizing this nonlinear semi-supervised mapping method. Experiments on public benchmark data show that our method outperforms a supervised algorithm based on linear discriminant analysis on the iris and wine data, and is also more accurate than the logistic GRF semi-supervised algorithm on the ionosphere dataset.