IEICE TRANSACTIONS on Fundamentals

Modification of LZSS by Using Structures of Hangul Characters for Hangul Text Compression

Jae Young LEE, Keong Mo SUNG

Summary:

This paper proposes a modified LZSS suited to compressing Hangul data. It introduces a Hangul character token and a smaller string token, both based on properties of Hangul. Two properties are used. 1) A Hangul character consists of 3 letters: the first sound letter, the middle sound letter, and the last sound letter, called Cho-seong, Jung-seong, and Jong-seong, respectively. 2) A Hangul code is represented by 2 bytes. The first property is used to define a character token for Hangul characters, which account for most of the unmatched characters. That is, each unmatched Hangul character is replaced with one Hangul character token, represented by the Huffman codes of its Cho-seong, Jung-seong, and Jong-seong in that order, instead of 2 character tokens. The second property is used to shorten the string token that encodes a matched string. Since more than 75% of Hangul data consists of Hangul characters and Hangul codes occupy 2 bytes, the addresses of the LZSS window can be assigned in 2-byte units. As a result, the distance field and the length field of the string token can each be shortened by one bit. Compressing Hangul data with these tokens improved the compression ratio by about 3%.
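
The two properties in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes modern Unicode precomposed Hangul syllables (U+AC00 to U+D7A3), whereas the summary only says the code is 2 bytes and does not name the encoding. The function names `decompose`, `compose`, and `field_bits`, and the 4096-byte window in the usage lines, are hypothetical choices made for illustration.

```python
# Assumption: Unicode precomposed Hangul syllables, not the paper's 2-byte code.
HANGUL_BASE = 0xAC00
CHO_COUNT, JUNG_COUNT, JONG_COUNT = 19, 21, 28  # jong index 0 means "no final"
SYLLABLE_COUNT = CHO_COUNT * JUNG_COUNT * JONG_COUNT  # 11172


def decompose(ch: str) -> tuple[int, int, int]:
    """Split one Hangul syllable into (cho, jung, jong) indices.

    In the scheme the summary describes, each of the three indices would
    be coded separately (the paper uses Huffman codes) to form a single
    Hangul character token.
    """
    index = ord(ch) - HANGUL_BASE
    if not 0 <= index < SYLLABLE_COUNT:
        raise ValueError(f"{ch!r} is not a precomposed Hangul syllable")
    cho, rest = divmod(index, JUNG_COUNT * JONG_COUNT)
    jung, jong = divmod(rest, JONG_COUNT)
    return cho, jung, jong


def compose(cho: int, jung: int, jong: int) -> str:
    """Inverse of decompose: rebuild the syllable from its three indices."""
    return chr(HANGUL_BASE + (cho * JUNG_COUNT + jung) * JONG_COUNT + jong)


def field_bits(positions: int) -> int:
    """Bits needed to address `positions` distinct values (ceil(log2))."""
    return (positions - 1).bit_length()


if __name__ == "__main__":
    print(decompose("한"))  # indices of the initial, medial, and final letters

    # Hypothetical 4096-byte window: addressing in 2-byte units halves the
    # number of addressable positions, which saves one bit in the field.
    window_bytes = 4096
    print(field_bits(window_bytes), field_bits(window_bytes // 2))
```

The same halving argument applies to the length field: if match lengths are counted in 2-byte units, half as many distinct values need to be represented, so the field can be one bit shorter.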

Publication
IEICE TRANSACTIONS on Fundamentals Vol.E79-A No.11 pp.1904-1910
Publication Date
1996/11/25
Type of Manuscript
PAPER
Category
Information Theory and Coding Theory