Tomohiko UYEMATSU Tetsunao MATSUTA
This paper proposes three new information measures for individual sequences and clarifies their properties. Our new information measures are called the non-overlapping max-entropy, the overlapping smooth max-entropy, and the non-overlapping smooth max-entropy, respectively. These measures are related to the fixed-length coding of individual sequences. We investigate these measures and show the following three properties: (1) The non-overlapping max-entropy coincides with the topological entropy. (2) The overlapping smooth max-entropy and the non-overlapping smooth max-entropy coincide with the Ziv-entropy. (3) When an individual sequence is drawn from an ergodic source, the overlapping smooth max-entropy and the non-overlapping smooth max-entropy coincide with the entropy rate of the source. Further, we apply these information measures to the fixed-length coding of individual sequences and propose some new universal coding schemes which are asymptotically optimal.
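For orientation, the topological entropy referred to in property (1) is usually defined through the subword complexity of the sequence; the following is the standard definition (our addition for context, not a quotation from the paper):

```latex
% Topological entropy of an individual sequence x = x_1 x_2 ...
% N_x(l) counts the distinct substrings of length l occurring in x;
% the limit exists because log N_x(l) is subadditive in l.
\[
  h_{\mathrm{top}}(x) \;=\; \lim_{l \to \infty} \frac{1}{l} \log N_x(l).
\]
```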
Universal Slepian-Wolf coding for parametric general sources is considered. Our main result shows that, under mild conditions on the family of sources, there exists a universal decoder that asymptotically attains the same random-coding error exponent as the maximum-likelihood decoder.
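For reference, when the source pair $P_{XY}$ is memoryless and known, the random-coding error exponent of Slepian-Wolf coding with full side information at the decoder takes the classical Gallager form below; we include it only for context, since the paper itself works with parametric general sources:

```latex
% Gallager-form random-coding exponent for source coding with
% decoder side information, at rate R for the X-encoder.
\[
  E_r(R) \;=\; \max_{0 \le \rho \le 1}
  \Bigl[\, \rho R \;-\;
  \log \sum_{y} \Bigl( \sum_{x} P_{XY}(x,y)^{\frac{1}{1+\rho}} \Bigr)^{1+\rho} \Bigr].
\]
```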
Shigeaki KUZUOKA Tomohiko UYEMATSU
This paper investigates the fixed-slope lossy coding of individual sequences and nonstationary sources. We clarify that, for a given individual sequence, the optimal cost attainable by blockwise lossy encoders is equal to the optimal average cost with respect to the empirical distribution of the given sequence. Moreover, we show that, for a given nonstationary source, the optimal cost attainable by blockwise encoders is equal to the supremum of the optimal average cost over all stationary sources in the stationary hull of the given source. In addition, we show that a universal lossy coding algorithm based on the Lempel-Ziv 78 (LZ78) code attains the optimal cost for any individual sequence and any nonstationary source.
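Since the last claim rests on LZ78 parsing, a minimal sketch of that parsing may help; the function below is our own illustration (the paper's fixed-slope algorithm builds further machinery on top of this parsing):

```python
def lz78_parse(sequence):
    """Split `sequence` into LZ78 phrases: each new phrase is the
    shortest prefix of the remaining input not yet in the dictionary."""
    dictionary = {"": 0}        # phrase -> index; empty phrase is index 0
    phrases = []
    current = ""
    for symbol in sequence:
        candidate = current + symbol
        if candidate in dictionary:
            current = candidate          # keep extending the match
        else:
            # emit (index of longest matched prefix, innovation symbol)
            phrases.append((dictionary[current], symbol))
            dictionary[candidate] = len(dictionary)
            current = ""
    if current:                          # flush a trailing partial phrase
        phrases.append((dictionary[current], ""))
    return phrases

# Example: parse a small string into LZ78 phrases.
print(lz78_parse("aababbbababaa"))
```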
Akisato KIMURA Tomohiko UYEMATSU Shigeaki KUZUOKA
This paper deals with a universal coding problem for a certain kind of multiterminal source coding system that we call the complementary delivery coding system. In this system, messages from two correlated sources are jointly encoded, and each decoder has access to one of the two messages, which it uses to reproduce the other. Both fixed-to-fixed length and fixed-to-variable length lossless coding schemes are considered. Explicit constructions of universal codes and bounds on the error probabilities are clarified via type-theoretical and graph-theoretical analyses.
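For a memoryless pair $(X, Y)$, the rate characterization known for lossless complementary delivery takes the following single-letter form; we state it here for orientation only (see the paper for the precise universal-coding statements):

```latex
% A single encoder broadcasts one message; the decoder holding Y
% must recover X, and the decoder holding X must recover Y.
\[
  R \;\ge\; \max\bigl\{\, H(X \mid Y),\; H(Y \mid X) \,\bigr\}.
\]
```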
This paper deals with the random number generation problem under the framework of the separate coding system for correlated memoryless sources posed and investigated by Slepian and Wolf. Two correlated data sequences of length n are separately encoded into nR1- and nR2-bit messages and sent to an information processing center, where the decoder wishes to generate an approximation of a sequence of independent, uniformly distributed random variables of length nR3 from the two received random messages. The admissible rate region is defined as the set of all triples (R1, R2, R3) for which the approximation error goes to zero as n tends to infinity. In this paper we examine the asymptotic behavior of the approximation error inside and outside the admissible rate region. We derive an explicit lower bound on the optimal exponent for the approximation error to vanish and show that it can be attained by universal codes. Furthermore, we derive an explicit lower bound on the optimal exponent for the approximation error to tend to 2 as n goes to infinity outside the admissible rate region.
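The constant 2 in the last statement is consistent with measuring the approximation error by variational distance, whose maximum value is 2 (our reading of the setup): for distributions $P$ and $Q$ on a common alphabet,

```latex
% Variational distance between the generated distribution and the
% target uniform distribution; its maximum possible value is 2.
\[
  d(P, Q) \;=\; \sum_{u} \bigl| P(u) - Q(u) \bigr| \;\le\; 2,
\]
```

so outside the admissible region the generated distribution becomes maximally far from the target.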
Shigeaki KUZUOKA Tomohiko UYEMATSU
This paper investigates the fixed-rate and fixed-distortion lossy coding problems for individual sequences subject to a subadditive distortion measure. Universal fixed-rate and fixed-distortion lossy coding schemes based on the complexity of the sequence are proposed. The obtained coding theorems reveal that the optimal distortion (resp. rate) attainable by fixed-rate (resp. fixed-distortion) lossy coding is equal to the optimal average distortion (resp. rate) with respect to the overlapping empirical distribution of the given sequence. Some connections with the lossy coding problem for ergodic sources are also investigated.
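For clarity, subadditivity of the distortion measure means that concatenating blocks cannot increase the total distortion; the standard definition, stated under the usual conventions, is:

```latex
% rho_n denotes the unnormalized distortion between length-n strings.
\[
  \rho_{n+m}\bigl(x^{n+m},\, y^{n+m}\bigr)
  \;\le\;
  \rho_{n}\bigl(x^{n},\, y^{n}\bigr)
  \;+\;
  \rho_{m}\bigl(x_{n+1}^{n+m},\, y_{n+1}^{n+m}\bigr).
\]
```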
Masayuki GOTOH Toshiyasu MATSUSHIMA Shigeichi HIRASAWA
We generalize B. S. Clarke and A. R. Barron's analysis of the Bayes method for FSMX sources. The FSMX source considered here is specified by the set of all states and its parameter value. First, we show the asymptotic codelengths of the Bayes codes for individual sequences from FSMX sources. Second, we show the asymptotic expected codelengths. The key to our analysis is that the Bayesian posterior density and the maximum likelihood estimator satisfy asymptotic normality for finite ergodic Markov sources.
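The Clarke-Barron analysis being generalized gives, for a smooth k-parameter i.i.d. family with prior density $w$, the following asymptotic expected codelength of the Bayes (mixture) code; we quote the classical form for orientation (the paper extends it to FSMX sources):

```latex
% Clarke-Barron asymptotics: I(theta) is the Fisher information matrix
% and H(theta) the per-symbol source entropy at parameter theta.
\[
  \mathbb{E}_{\theta}\bigl[\ell(X^n)\bigr] - n H(\theta)
  \;=\;
  \frac{k}{2} \log \frac{n}{2\pi e}
  \;+\; \log \frac{\sqrt{\det I(\theta)}}{w(\theta)}
  \;+\; o(1).
\]
```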
Hidetoshi YOKOO Masaharu TAKAHASHI
This paper proposes a new lossless data compression method that utilizes a context sorting algorithm. Every symbol in the data can be predicted by taking its immediately preceding characters, or context, into account. The context sorting algorithm sorts the set of all previous contexts to find the one most similar to the current context. It then predicts the next symbol by sorting previous symbol-context pairs in order of context similarity. The codeword for the next symbol represents the rank of that symbol in this sorted sequence. The compression performance is evaluated both analytically and empirically. Although the proposed method operates character by character, with no probability distribution used to make a prediction, its compression performance is comparable to that of the best-known data compression utilities.
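A minimal sketch of the ranking idea follows; it is our own illustration (the actual algorithm uses a dedicated context-similarity order and an efficient sorted data structure, and the similarity notion here, common-suffix length, is an assumption):

```python
def rank_next_symbol(history, context_len=4):
    """Rank candidate next symbols by how similar their past contexts
    are to the current context (similarity = common-suffix length)."""
    context = history[-context_len:]

    def suffix_match(i):
        # length of the common suffix between the context that preceded
        # position i and the current context
        past = history[max(0, i - context_len):i]
        n = 0
        while (n < min(len(past), len(context))
               and past[len(past) - 1 - n] == context[len(context) - 1 - n]):
            n += 1
        return n

    # positions sorted from most to least similar preceding context
    candidates = sorted(range(len(history)), key=suffix_match, reverse=True)
    ranking = []
    for i in candidates:
        if history[i] not in ranking:
            ranking.append(history[i])   # first occurrence fixes the rank
    return ranking   # codeword for the true next symbol = its index here

# 'c' ranks first: the current context "abra" previously preceded 'c'.
print(rank_next_symbol("abracadabra"))
```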
Yasuhiko NAKANO Hironori YAHAGI Yoshiyuki OKADA Shigeru YOSHIDA
We developed a simple, practical, adaptive data compression algorithm of the LZ78 class. Under Lempel-Ziv greedy parsing, string boundaries are unrelated to the statistical history modeled by finite-state sources. We have previously reported an algorithm that classifies data into subdictionaries (CSD); it uses multiple subdictionaries and conditions the current string on the previous one to obtain a higher compression ratio. In this paper, we present a practical implementation of this method suitable for any kind of data, and show that CSD is more efficient than LZC, the method used by the compress program available on UNIX systems. The compression performance of CSD was about 10% better than that of LZC with a practical dictionary size (an 8K-entry dictionary) on test data from the Calgary Compression Corpus. With hashing, CSD became as fast as LZC, although the CSD algorithm is more complicated.
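A hedged sketch of the conditioning idea: keep one LZ78-style dictionary per class of the previous phrase, and parse the current phrase in the subdictionary selected by that class. The classifier below (first symbol of the phrase) is a stand-in of our own; CSD's actual classification rule and dictionary management differ.

```python
from collections import defaultdict

def csd_parse(sequence, classify=lambda phrase: phrase[:1]):
    """LZ78-style parsing with subdictionaries selected by a class of
    the previous phrase (illustrative stand-in for CSD's classifier)."""
    subdicts = defaultdict(lambda: {"": 0})   # class -> phrase dictionary
    prev_class, current, phrases = "", "", []
    d = subdicts[prev_class]
    for symbol in sequence:
        if current + symbol in d:
            current += symbol                  # keep extending the match
        else:
            # emit (class used, index of matched prefix, innovation symbol)
            phrases.append((prev_class, d[current], symbol))
            d[current + symbol] = len(d)
            prev_class = classify(current + symbol)
            d = subdicts[prev_class]           # switch subdictionary
            current = ""
    return phrases

print(csd_parse("abababbaab"))
```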
This paper's main objective is to clearly describe the construction of a universal code that minimizes Davisson's minimax redundancy when both the true model and its stochastic parameters are unknown. Minimax redundancy is defined as the maximum, over the source class, of the difference between the expected per-symbol code length and the per-symbol source entropy. A universal coding scheme is formulated here in terms of a weight function, i.e., a method is presented for determining a weight function which minimizes the minimax redundancy even when the true model is unknown. It is subsequently shown that the minimax redundancy achieved through the presented coding method is upper-bounded by the minimax redundancy of Rissanen's semi-predictive coding method.
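In symbols (standard definitions, stated for orientation): for a class $\{P_\theta\}$ of sources and the coding distribution induced by a weight function $w$,

```latex
% Per-symbol minimax redundancy of the mixture code Q_n built from w;
% the goal is to choose w so as to minimize R_n(Q).
\[
  R_n(Q) \;=\; \max_{\theta} \frac{1}{n}
  D\!\left(P_\theta^{\,n} \,\middle\|\, Q_n\right),
  \qquad
  Q_n(x^n) \;=\; \int w(\theta)\, P_\theta^{\,n}(x^n)\, d\theta.
\]
```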