1-4hit |
Takahiro OTA Hiroyoshi MORITA Akiko MANADA
The technique of lossless compression via substring enumeration (CSE) is a kind of enumerative code and uses a probabilistic model built from the circular string of an input source for encoding a one-dimensional (1D) source. CSE is applicable to two-dimensional (2D) sources, such as images, by dealing with a line of pixels of a 2D source as a symbol of an extended alphabet. At the initial step of CSE encoding process, we need to output the number of occurrences of all symbols of the extended alphabet, so that the time complexity increases exponentially when the size of source becomes large. To reduce computational time, we can rearrange pixels of a 2D source into a 1D source string along a space-filling curve like a Hilbert curve. However, information on adjacent cells in a 2D source may be lost in the conversion. To reduce the time complexity and compress a 2D source without converting to a 1D source, we propose a new CSE which can encode a 2D source in a block-by-block fashion instead of in a line-by-line fashion. The proposed algorithm uses the flat torus of an input 2D source as a probabilistic model instead of the circular string of the source. Moreover, we prove the asymptotic optimality of the proposed algorithm for 2D general sources.
In both theoretical analysis and practical use for an antidictionary coding algorithm, an important problem is how to encode an antidictionary of an input source. This paper presents a proposal for a compact tree representation of an antidictionary built from a circular string for an input source. We use a technique for encoding a tree in the compression via substring enumeration to encode a tree representation of the antidictionary. Moreover, we propose a new two-pass universal antidictionary coding algorithm by means of the proposal tree representation. We prove that the proposed algorithm is asymptotic optimal for a stationary ergodic source.
Ken-ichi IWATA Mitsuharu ARIMURA
A generalization of compression via substring enumeration (CSE) for k-th order Markov sources with a finite alphabet is proposed, and an upper bound of the codeword length of the proposed method is presented. We analyze the worst case maximum redundancy of CSE for k-th order Markov sources with a finite alphabet. The compression ratio of the proposed method asymptotically converges to the optimal one for k-th order Markov sources with a finite alphabet if the length n of a source string tends to infinity.
Ken-ichi IWATA Mitsuharu ARIMURA Yuki SHIMA
Dubé and Beaudoin proposed a lossless data compression called compression via substring enumeration (CSE) in 2010. We evaluate an upper bound of the number of bits used by the CSE technique to encode any binary string from an unknown member of a known class of k-th order Markov processes. We compare the worst case maximum redundancy obtained by the CSE technique for any binary string with the least possible value of the worst case maximum redundancy obtained by the best fixed-to-variable length code that satisfies the Kraft inequality.