Application of a Word-Based Text Compression Method to Japanese and Chinese Texts

Shigeru YOSHIDA, Takashi MORIHARA, Hironori YAHAGI, Noriko ITANI

Summary:

Conventional text compression schemes that sample 8 bits at a time cannot compress 16-bit Asian language codes well. Previously, we reported a word-based text compression method that uses 16-bit sampling to compress Japanese texts. This paper describes our further efforts in applying a word-based method with a static canonical Huffman encoder to both Japanese and Chinese texts. The method is designed to support a multilingual environment: the word dictionary and the canonical Huffman code table are replaced with those for the respective language. A computer simulation showed that the method is effective for both languages. The compression ratio obtained was slightly below 0.5 when the Markov context was not taken into account, and around 0.4 when the first-order Markov context was used.
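To make the idea of a static canonical Huffman code over word tokens concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes the text has already been segmented into dictionary words, and the function names (`code_lengths`, `canonical_codes`, `encode`, `decode`) and the sample token list are hypothetical. It covers only the zeroth-order case, with no Markov context.

```python
import heapq
from collections import Counter


def code_lengths(freqs):
    """Return the Huffman code length of each symbol in a {symbol: count} map."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # Heap entries are (weight, unique tie breaker, symbols in the subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, tie, s1 + s2))
        tie += 1
    return lengths


def canonical_codes(lengths):
    """Assign canonical codes: sort by (length, symbol) and count upward."""
    codes = {}
    code = 0
    prev_len = 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes


def encode(tokens, codes):
    """Concatenate the code of each token into a bit string."""
    return "".join(codes[t] for t in tokens)


def decode(bits, codes):
    """Decode a bit string by matching prefix-free codewords."""
    inverse = {c: s for s, c in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return out


# Hypothetical, already segmented Japanese tokens (illustration only).
tokens = ["データ", "圧縮", "は", "データ", "の", "圧縮", "データ"]
codes = canonical_codes(code_lengths(Counter(tokens)))
bits = encode(tokens, codes)
```

Because a canonical code is fully determined by the code lengths and a fixed symbol order, a static table of this kind can be stored compactly and swapped per language together with the word dictionary, which is the property the summary relies on.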

Publication
IEICE TRANSACTIONS on Fundamentals Vol.E85-A No.12 pp.2933-2938
Publication Date
2002/12/01
Type of Manuscript
PAPER
Category
Information Theory