1-10hit |
Koshi SHIMADA Shota SAITO Toshiyasu MATSUSHIMA
The context tree model has the property that the occurrence probability of symbols is determined from a finite past sequence and is a broader class of sources that includes i.i.d. or Markov sources. This paper proposes a non-stationary source with context tree models that change from interval to interval. The Bayes code for this source requires weighting of the posterior probabilities of the context tree models and change points, so the computational complexity of it usually increases to exponential order. Therefore, the challenge is how to reduce the computational complexity. In this paper, we propose a special class of prior probability distribution of context tree models and change points and develop an efficient Bayes coding algorithm by combining two existing Bayes coding algorithms. The algorithm minimizes the Bayes risk function of the proposed source in this paper, and the computational complexity of the proposed algorithm is polynomial order. We investigate the behavior and performance of the proposed algorithm by conducting experiments.
Takahiro OTA Hiroyoshi MORITA Akiko MANADA
The technique of lossless compression via substring enumeration (CSE) is a kind of enumerative code and uses a probabilistic model built from the circular string of an input source for encoding a one-dimensional (1D) source. CSE is applicable to two-dimensional (2D) sources, such as images, by dealing with a line of pixels of a 2D source as a symbol of an extended alphabet. At the initial step of CSE encoding process, we need to output the number of occurrences of all symbols of the extended alphabet, so that the time complexity increases exponentially when the size of source becomes large. To reduce computational time, we can rearrange pixels of a 2D source into a 1D source string along a space-filling curve like a Hilbert curve. However, information on adjacent cells in a 2D source may be lost in the conversion. To reduce the time complexity and compress a 2D source without converting to a 1D source, we propose a new CSE which can encode a 2D source in a block-by-block fashion instead of in a line-by-line fashion. The proposed algorithm uses the flat torus of an input 2D source as a probabilistic model instead of the circular string of the source. Moreover, we prove the asymptotic optimality of the proposed algorithm for 2D general sources.
In both theoretical analysis and practical use for an antidictionary coding algorithm, an important problem is how to encode an antidictionary of an input source. This paper presents a proposal for a compact tree representation of an antidictionary built from a circular string for an input source. We use a technique for encoding a tree in the compression via substring enumeration to encode a tree representation of the antidictionary. Moreover, we propose a new two-pass universal antidictionary coding algorithm by means of the proposal tree representation. We prove that the proposed algorithm is asymptotic optimal for a stationary ergodic source.
Ken-ichi IWATA Mitsuharu ARIMURA
A generalization of compression via substring enumeration (CSE) for k-th order Markov sources with a finite alphabet is proposed, and an upper bound of the codeword length of the proposed method is presented. We analyze the worst case maximum redundancy of CSE for k-th order Markov sources with a finite alphabet. The compression ratio of the proposed method asymptotically converges to the optimal one for k-th order Markov sources with a finite alphabet if the length n of a source string tends to infinity.
Shota SAITO Nozomi MIYA Toshiyasu MATSUSHIMA
This paper considers universal lossless variable-length source coding problem and investigates the Bayes code from viewpoints of the distribution of its codeword lengths. First, we show that the codeword lengths of the Bayes code satisfy the asymptotic normality. This study can be seen as the investigation on the asymptotic shape of the distribution of codeword lengths. Second, we show that the codeword lengths of the Bayes code satisfy the law of the iterated logarithm. This study can be seen as the investigation on the asymptotic end points of the distribution of codeword lengths. Moreover, the overflow probability, which represents the bottom of the distribution of codeword lengths, is studied for the Bayes code. We derive upper and lower bounds of the infimum of a threshold on the overflow probability under the condition that the overflow probability does not exceed ε∈(0,1). We also analyze the necessary and sufficient condition on a threshold for the overflow probability of the Bayes code to approach zero asymptotically.
Ken-ichi IWATA Mitsuharu ARIMURA Yuki SHIMA
Dubé and Beaudoin proposed a lossless data compression called compression via substring enumeration (CSE) in 2010. We evaluate an upper bound of the number of bits used by the CSE technique to encode any binary string from an unknown member of a known class of k-th order Markov processes. We compare the worst case maximum redundancy obtained by the CSE technique for any binary string with the least possible value of the worst case maximum redundancy obtained by the best fixed-to-variable length code that satisfies the Kraft inequality.
Shigeaki KUZUOKA Tomohiko UYEMATSU
This paper investigates some relations among four complexities of sequence over countably infinite alphabet, and shows that two kinds of empirical entropies and the self-entropy rate regarding a Markov source are asymptotically equal and lower bounded by the maximum number of phrases in distinct parsing of the sequence. Some connections with source coding theorems are also investigated.
Mohammad M. RASHID Tsutomu KAWABATA
Prediction of actual symbol probability is crucial for statistical data compression that uses arithmetic coder. Krichevsky-Trofimov (KT) estimator has been a standard predictor and applied in CTW or FWCTW methods. However, KT-estimator performs poorly when non occurring symbols appear. To rectify this we proposed a zero-redundancy estimator, especially with a finite window(Rashid and Kawabata, ISIT2003) for non stationary source. In this paper, we analyze the zero-redundancy estimators in the case of Markovian source and give an asymptotic evaluation of the redundancy. We show that one of the estimators has the per symbol redundancy given by one half of the dimension of positive parameters divided by the window size when the window size is large.
Tomohiko UYEMATSU Shigeaki KUZUOKA
This paper proposes the conditional LZ complexity and analyzes its property. Especially, we show an inequality corresponding to Ziv's inequality concerning a distinct parsing of a pair of sequences. Further, as a byproduct of the result, we show a simple proof of the asymptotical optimality of Ziv's universal source coding algorithm with side information.
Tsutomu KAWABATA Frans M. J. WILLEMS
We propose a variation of the Context Tree Weighting algorithm for tree source modified such that the growth of the context resembles Lempel-Ziv parsing. We analyze this algorithm, give a concise upper bound to the individual redundancy for any tree source, and prove the asymptotic optimality of the data compression rate for any stationary and ergodic source.