Shigeaki KUZUOKA Tomohiko UYEMATSU
This paper investigates relations among four complexity measures of a sequence over a countably infinite alphabet. It shows that two kinds of empirical entropies and the self-entropy rate with respect to a Markov source are asymptotically equal and are lower bounded by the maximum number of phrases in a distinct parsing of the sequence. Some connections with source coding theorems are also investigated.
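As a rough illustration of what an empirical entropy of a sequence is, the sketch below computes two common textbook variants in Python. The abstract does not say which two empirical entropies the paper uses, so the zeroth-order and first-order (one-symbol context) definitions here are assumptions, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def empirical_entropy(seq):
    """Zeroth-order empirical entropy of seq, in bits per symbol.

    Assumed textbook definition; not necessarily the one used in the paper.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq)
    # Entropy of the relative-frequency distribution of the symbols.
    return -sum(c / n * log2(c / n) for c in counts.values())


def conditional_empirical_entropy(seq):
    """First-order conditional empirical entropy, in bits per symbol.

    Each symbol is conditioned on the single symbol before it
    (an assumed Markov order of 1).
    """
    n = len(seq) - 1  # number of (context, symbol) pairs
    if n <= 0:
        return 0.0
    pair_counts = Counter(zip(seq, seq[1:]))
    context_counts = Counter(seq[:-1])
    return -sum(
        c / n * log2(c / context_counts[a])
        for (a, _), c in pair_counts.items()
    )
```

Because both functions rely only on symbol counts, they accept any sequence of hashable symbols, so the alphabet does not have to be fixed in advance.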
Junya KIYOHARA Tsutomu KAWABATA
We study the Lempel-Ziv-Yokoo algorithm [1, Algorithm 4] for universal data compression. We give an implementation of the algorithm that is simpler than the original one [1, Algorithm 4] and show its asymptotic optimality for a stationary ergodic source.
Yasuhiko NAKANO Hironori YAHAGI Yoshiyuki OKADA Shigeru YOSHIDA
We developed a simple, practical, adaptive data compression algorithm of the LZ78 class. Under Lempel-Ziv greedy parsing, string boundaries are unrelated to the statistical history modeled by finite-state sources. We have already reported an algorithm classifying data into subdictionaries (CSD), which uses multiple subdictionaries and conditions the current string on the previous one to obtain a higher compression ratio. In this paper, we present a practical implementation of this method suitable for any kind of data, and show that CSD is more efficient than LZC, the method used by the program compress available on UNIX systems. On test data from the Calgary Compression Corpus, the CSD compression performance was about 10% better than that of LZC with a practical dictionary size of 8k entries. With hashing, CSD ran as fast as LZC, even though the CSD algorithm is more complicated than LZC.
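For readers unfamiliar with the greedy parsing mentioned above, here is a minimal Python sketch of plain LZ78 parsing with a single unbounded dictionary. It is neither CSD nor LZC: it has no subdictionaries, no conditioning on the previous string, and no 8k-entry limit. The function names and the (index, symbol) output format are assumptions made for illustration.

```python
def lz78_parse(data):
    """Greedy LZ78 parsing of a string.

    Returns a list of (prefix_index, symbol) pairs, where prefix_index
    refers to an earlier phrase (0 is the empty phrase).
    """
    dictionary = {"": 0}  # phrase -> index
    output = []
    phrase = ""
    for ch in data:
        candidate = phrase + ch
        if candidate in dictionary:
            # Keep extending the current match (the greedy step).
            phrase = candidate
        else:
            # Longest match ended: emit it plus the new symbol,
            # then register the new phrase.
            output.append((dictionary[phrase], ch))
            dictionary[candidate] = len(dictionary)
            phrase = ""
    if phrase:
        # Input ended inside a known phrase; emit it with no new symbol.
        output.append((dictionary[phrase], ""))
    return output


def lz78_decode(pairs):
    """Invert lz78_parse by rebuilding the phrase list."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)
```

In this sketch a phrase boundary falls wherever the longest dictionary match happens to end, regardless of any statistical context, which is the property the abstract points to as the motivation for conditioning on the previous string.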