Highly Efficient Universal Coding with Classifying to Subdictionaries for Text Compression

Yasuhiko NAKANO; Hironori YAHAGI; Yoshiyuki OKADA; Shigeru YOSHIDA



Summary:

We developed a simple, practical, adaptive data compression algorithm of the LZ78 class. Under Lempel-Ziv greedy parsing, a string boundary is not related to the statistical history modeled by finite-state sources. We have already reported an algorithm that classifies data into subdictionaries (CSD), which uses multiple subdictionaries and conditions the current string on the previous one to obtain a higher compression ratio. In this paper, we present a practical implementation of this method suitable for any kind of data, and show that CSD is more efficient than LZC, the method used by the compress program available on UNIX systems. On test data from the Calgary Compression Corpus, with a practical dictionary size of 8k entries, the compression performance of CSD was about 10% better than that of LZC. With hashing, CSD ran as fast as LZC, even though the CSD algorithm is more complicated.
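
The general idea of conditioning an LZ78-style parse on the previous string can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not say how strings are classified, so the choice of context here (the last byte of the previous phrase) is an assumption, and the names `csd_like_encode` and `csd_like_decode` are hypothetical. The sketch also omits the 8k-entry dictionary limit, the hashing, and the bit-level code output mentioned in the summary.

```python
from collections import defaultdict


def csd_like_encode(data: bytes):
    """LZ78-style greedy parse with one subdictionary per context.

    Assumption (not from the paper): the context is the last byte of the
    previous phrase, or None before the first phrase.
    """
    # Each subdictionary maps a phrase to its index; index 0 is the empty phrase.
    subdicts = defaultdict(lambda: {b"": 0})
    context = None
    tokens = []
    i = 0
    while i < len(data):
        d = subdicts[context]
        phrase = b""
        # Greedy step: extend the match while it is still in this subdictionary.
        while i < len(data) and phrase + data[i:i + 1] in d:
            phrase += data[i:i + 1]
            i += 1
        if i < len(data):
            sym = data[i]
            i += 1
            tokens.append((d[phrase], sym))
            d[phrase + bytes([sym])] = len(d)  # register the new phrase
            context = sym                      # condition the next phrase on it
        else:
            tokens.append((d[phrase], None))   # input ended inside a known phrase
    return tokens


def csd_like_decode(tokens):
    """Inverse of csd_like_encode; rebuilds the same subdictionaries."""
    subdicts = defaultdict(lambda: [b""])
    context = None
    out = bytearray()
    for index, sym in tokens:
        d = subdicts[context]
        phrase = d[index]
        if sym is None:
            out += phrase
            break
        new_phrase = phrase + bytes([sym])
        d.append(new_phrase)
        out += new_phrase
        context = sym
    return bytes(out)
```

Because the decoder derives the context from the phrases it has already reconstructed, the subdictionary choice never has to be transmitted; each token only carries an index into the currently selected subdictionary plus one literal symbol.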

Publication: IEICE TRANSACTIONS on Fundamentals Vol.E77-A No.9 pp.1520-1526

Publication Date: 1994/09/25


Type of Manuscript: PAPER

Category: Algorithms, Data Structures and Computational Complexity


Yasuhiko NAKANO, Hironori YAHAGI, Yoshiyuki OKADA, Shigeru YOSHIDA, "Highly Efficient Universal Coding with Classifying to Subdictionaries for Text Compression" in IEICE TRANSACTIONS on Fundamentals, vol. E77-A, no. 9, pp. 1520-1526, September 1994.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/e77-a_9_1520/_p


@ARTICLE{e77-a_9_1520,
author={Yasuhiko NAKANO and Hironori YAHAGI and Yoshiyuki OKADA and Shigeru YOSHIDA},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Highly Efficient Universal Coding with Classifying to Subdictionaries for Text Compression},
year={1994},
volume={E77-A},
number={9},
pages={1520-1526},
abstract={We developed a simple, practical, adaptive data compression algorithm of the LZ78 class. Under Lempel-Ziv greedy parsing, a string boundary is not related to the statistical history modeled by finite-state sources. We have already reported an algorithm that classifies data into subdictionaries (CSD), which uses multiple subdictionaries and conditions the current string on the previous one to obtain a higher compression ratio. In this paper, we present a practical implementation of this method suitable for any kind of data, and show that CSD is more efficient than LZC, the method used by the compress program available on UNIX systems. On test data from the Calgary Compression Corpus, with a practical dictionary size of 8k entries, the compression performance of CSD was about 10% better than that of LZC. With hashing, CSD ran as fast as LZC, even though the CSD algorithm is more complicated.},
month={September}
}


TY - JOUR
TI - Highly Efficient Universal Coding with Classifying to Subdictionaries for Text Compression
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1520
EP - 1526
AU - Yasuhiko NAKANO
AU - Hironori YAHAGI
AU - Yoshiyuki OKADA
AU - Shigeru YOSHIDA
PY - 1994
JO - IEICE TRANSACTIONS on Fundamentals
VL - E77-A
IS - 9
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - September 1994
AB - We developed a simple, practical, adaptive data compression algorithm of the LZ78 class. Under Lempel-Ziv greedy parsing, a string boundary is not related to the statistical history modeled by finite-state sources. We have already reported an algorithm that classifies data into subdictionaries (CSD), which uses multiple subdictionaries and conditions the current string on the previous one to obtain a higher compression ratio. In this paper, we present a practical implementation of this method suitable for any kind of data, and show that CSD is more efficient than LZC, the method used by the compress program available on UNIX systems. On test data from the Calgary Compression Corpus, with a practical dictionary size of 8k entries, the compression performance of CSD was about 10% better than that of LZC. With hashing, CSD ran as fast as LZC, even though the CSD algorithm is more complicated.
ER -