The search functionality is under construction.

Author Search Result

[Author] Mitsuharu ARIMURA(10hit)

1-10hit
  • Asymptotic Optimality of the Block Sorting Data Compression Algorithm

    Mitsuharu ARIMURA  Hirosuke YAMAMOTO  

     
    PAPER-Source Coding

      Vol:
    E81-A No:10
      Page(s):
    2117-2122

    In this paper the performance of the Block Sorting algorithm proposed by Burrows and Wheeler is evaluated theoretically. It is proved that the Block Sorting algorithm is asymptotically optimal for stationary ergodic finite order Markov sources. Our proof is based on the facts that symbols with the same Markov state (or context) in an original data sequence are grouped together in the output sequence obtained by Burrows-Wheeler transform, and the codeword length of each group can be bounded by a function described with the frequencies of symbols included in the group.

  • Average Coding Rate of a Multi-Shot Tunstall Code with an Arbitrary Parsing Tree Sequence

    Mitsuharu ARIMURA  

     
    LETTER-Source Coding and Data Compression

      Vol:
    E99-A No:12
      Page(s):
    2281-2285

    Average coding rate of a multi-shot Tunstall code, which is a variation of variable-to-fixed length (VF) lossless source codes, for stationary memoryless sources is investigated. A multi-shot VF code parses a given source sequence to variable-length blocks and encodes them to fixed-length codewords. If we consider the situation that the parsing count is fixed, overall multi-shot VF code can be treated as a one-shot VF code. For this setting of Tunstall code, the compression performance is evaluated using two criterions. The first one is the average coding rate which is defined as the codeword length divided by the average block length. The second one is the expectation of the pointwise coding rate. It is proved that both of the above average coding rate converge to the entropy of a stationary memoryless source under the assumption that the geometric mean of the leaf counts of the multi-shot Tunstall parsing trees goes to infinity.

  • A Characterization of Optimal FF Coding Rate Using a New Optimistically Optimal Code

    Mitsuharu ARIMURA  Hiroki KOGA  Ken-ichi IWATA  

     
    LETTER-Source Coding

      Vol:
    E96-A No:12
      Page(s):
    2443-2446

    In this letter, we first introduce a stronger notion of the optimistic achievable coding rate and discuss a coding theorem. Next, we give a necessary and sufficient condition under which the coding rates of all the optimal FF codes asymptotically converge to a constant.

  • Lossless Data Compression via Substring Enumeration for k-th Order Markov Sources with a Finite Alphabet

    Ken-ichi IWATA  Mitsuharu ARIMURA  

     
    PAPER-Source Coding and Data Compression

      Vol:
    E99-A No:12
      Page(s):
    2130-2135

    A generalization of compression via substring enumeration (CSE) for k-th order Markov sources with a finite alphabet is proposed, and an upper bound of the codeword length of the proposed method is presented. We analyze the worst case maximum redundancy of CSE for k-th order Markov sources with a finite alphabet. The compression ratio of the proposed method asymptotically converges to the optimal one for k-th order Markov sources with a finite alphabet if the length n of a source string tends to infinity.

  • A Variable-to-Fixed Length Lossless Source Code Attaining Better Performance than Tunstall Code in Several Criterions

    Mitsuharu ARIMURA  

     
    PAPER-Information Theory

      Vol:
    E101-A No:1
      Page(s):
    249-258

    Tunstall code is known as an optimal variable-to-fixed length (VF) lossless source code under the criterion of average coding rate, which is defined as the codeword length divided by the average phrase length. In this paper we define the average coding rate of a VF code as the expectation of the pointwise coding rate defined by the codeword length divided by the phrase length. We call this type of average coding rate the average pointwise coding rate. In this paper, a new VF code is proposed. An incremental parsing tree construction algorithm like the one that builds Tunstall parsing tree is presented. It is proved that this code is optimal under the criterion of the average pointwise coding rate, and that the average pointwise coding rate of this code converges asymptotically to the entropy of the stationary memoryless source emitting the data to be encoded. Moreover, it is proved that the proposed code attains better worst-case coding rate than Tunstall code.

  • A Bitplane Tree Weighting Method for Lossless Compression of Gray Scale Images

    Mitsuharu ARIMURA  Hirosuke YAMAMOTO  Suguru ARIMOTO  

     
    LETTER-Source Coding/Channel Capacity

      Vol:
    E80-A No:11
      Page(s):
    2268-2271

    A Bitplane Tree Weighting (BTW) method with arithmetic coding is proposed for lossless coding of gray scale images, which are represented with multiple bitplanes. A bitplane tree, in the same way as the context tree in the CTW method, is used to derive a weighted coding probability distribution for arithmetic coding with the first order Markov model. It is shown that the proposed method can attain better compression ratio than known schemes with MDL criterion. Furthermore, the BTW method can be extended to a high order Markov model by combining the BTW with the CTW or with prediction. The performance of these modified methods is also evaluated. It is shown that they attain better compression ratio than the original BTW method without increasing memory size and coding time, and they can beat the lossless JPEG coding.

  • On the Average Coding Rate of the Tunstall Code for Stationary and Memoryless Sources

    Mitsuharu ARIMURA  

     
    PAPER-Source Coding

      Vol:
    E93-A No:11
      Page(s):
    1904-1911

    The coding rate of a one-shot Tunstall code for stationary and memoryless sources is investigated in non-universal situations so that the probability distribution of the source is known to the encoder and the decoder. When studying the variable-to-fixed length code, the average coding rate has been defined as (i) the codeword length divided by the average block length. We define the average coding rate as (ii) the expectation of the pointwise coding rate, and prove that (ii) converges to the same value as (i).

  • Evaluation of Maximum Redundancy of Data Compression via Substring Enumeration for k-th Order Markov Sources

    Ken-ichi IWATA  Mitsuharu ARIMURA  Yuki SHIMA  

     
    PAPER-Information Theory

      Vol:
    E97-A No:8
      Page(s):
    1754-1760

    Dubé and Beaudoin proposed a lossless data compression called compression via substring enumeration (CSE) in 2010. We evaluate an upper bound of the number of bits used by the CSE technique to encode any binary string from an unknown member of a known class of k-th order Markov processes. We compare the worst case maximum redundancy obtained by the CSE technique for any binary string with the least possible value of the worst case maximum redundancy obtained by the best fixed-to-variable length code that satisfies the Kraft inequality.

  • Almost Sure Convergence Coding Theorems of One-Shot and Multi-Shot Tunstall Codes for Stationary Memoryless Sources

    Mitsuharu ARIMURA  

     
    PAPER-Source Coding

      Vol:
    E98-A No:12
      Page(s):
    2393-2406

    Almost sure convergence coding theorems of one-shot and multi-shot Tunstall codes are proved for stationary memoryless sources. Coding theorem of one-shot Tunstall code is proved in the case that the leaf count of Tunstall tree increases. On the other hand, coding theorem is proved for multi-shot Tunstall code with increasing parsing count, under the assumption that the Tunstall tree grows as the parsing proceeds. In this result, it is clarified that the theorem for the one-shot Tunstall code is not a corollary of the theorem for the multi-shot Tunstall code. In the case of the multi-shot Tunstall code, it can be regarded that the coding theorem is proved for the sequential algorithm such that parsing and coding are processed repeatedly. Cartesian concatenation of trees and geometric mean of the leaf counts of trees are newly introduced, which play crucial roles in the analyses of multi-shot Tunstall code.

  • Redundancy-Optimal FF Codes for a General Source and Its Relationships to the Rate-Optimal FF Codes

    Mitsuharu ARIMURA  Hiroki KOGA  Ken-ichi IWATA  

     
    PAPER-Source Coding

      Vol:
    E96-A No:12
      Page(s):
    2332-2342

    In this paper we consider fixed-to-fixed length (FF) coding of a general source X with vanishing error probability and define two kinds of optimalities with respect to the coding rate and the redundancy, where the redundancy is defined as the difference between the coding rate and the symbolwise ideal codeword length. We first show that the infimum achievable redundancy coincides with the asymptotic width W(X) of the entropy spectrum. Next, we consider the two sets $mCH(X)$ and $mCW(X)$ and investigate relationships between them, where $mCH(X)$ and $mCW(X)$ denote the sets of all the optimal FF codes with respect to the coding rate and the redundancy, respectively. We give two necessary and sufficient conditions corresponding to $mCH(X) subseteq mCW(X)$ and $mCW(X) subseteq mCH(X)$, respectively. We can also show the existence of an FF code that is optimal with respect to both the redundancy and the coding rate.