Keyword Search Result

[Keyword] lossless data compression (9 hits)

Results 1-9 of 9
  • Compression by Substring Enumeration Using Sorted Contingency Tables

    Takahiro OTA  Hiroyoshi MORITA  Akiko MANADA

    PAPER-Information Theory, Vol. E103-A, No. 6, pp. 829-835

    This paper proposes two improved variants of Compression by Substring Enumeration (CSE) with a finite alphabet. In previous studies on CSE, the encoder uses inequalities that bound the number of occurrences of the substring or minimal forbidden word (MFW) to be encoded; the inequalities are derived from a contingency table that contains the occurrence counts of substrings and MFWs. The codeword length of a substring or an MFW grows with the gap between the upper and lower bounds deduced from the inequalities, but the lower bound is not tight. We therefore derive a new, tight lower bound from the contingency table and propose a CSE algorithm that uses the resulting inequality. We also propose a new encoding order for substrings and MFWs based on a sorted contingency table, in which both the row and column marginal totals are sorted in descending order instead of the lexicographic order used in previous studies, and we propose the first CSE algorithm that uses this encoding order. Experimental results show that, on every file of the Calgary corpus, the compression ratios of the proposed algorithms are better than those of a previous study on CSE with a finite alphabet. Moreover, the second proposed algorithm achieves compression ratios better than or equal to those of a well-known compressor on 11 of the 14 files in the corpus.
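
    As a rough illustration of the data structure involved, the following sketch builds a contingency table of occurrence counts N(awb) for a fixed substring w and reorders its rows and columns so that both marginal totals are descending, in the spirit of the sorted encoding order described above. This is a toy sketch under assumed names, not the authors' encoder: MFW bookkeeping, cyclic occurrence counting, and the coding step itself are all omitted.

      def contingency_table(x, w, alphabet):
          """Count occurrences N(awb): substring w extended by a left
          symbol a and a right symbol b (interior occurrences only)."""
          table = {a: {b: 0 for b in alphabet} for a in alphabet}
          m = len(w)
          for i in range(len(x) - m - 1):
              if x[i + 1:i + 1 + m] == w:
                  table[x[i]][x[i + 1 + m]] += 1
          return table

      def sorted_table(table):
          """Reorder rows and columns by descending marginal totals,
          mimicking the sorted contingency table of the abstract."""
          rows = sorted(table, key=lambda a: -sum(table[a].values()))
          cols = sorted(next(iter(table.values())),
                        key=lambda b: -sum(table[a][b] for a in table))
          return rows, cols, [[table[a][b] for b in cols] for a in rows]

      print(sorted_table(contingency_table("mississippi", "s", "imps")))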

  • A Variable-to-Fixed Length Lossless Source Code Attaining Better Performance than Tunstall Code in Several Criterions

    Mitsuharu ARIMURA

    PAPER-Information Theory, Vol. E101-A, No. 1, pp. 249-258

    The Tunstall code is known to be an optimal variable-to-fixed length (VF) lossless source code under the criterion of average coding rate, defined as the codeword length divided by the average phrase length. In this paper we define the average coding rate of a VF code as the expectation of the pointwise coding rate, i.e., the codeword length divided by the phrase length, and call it the average pointwise coding rate. We propose a new VF code, built by an incremental parsing-tree construction algorithm similar to the one that builds the Tunstall parsing tree. We prove that the code is optimal under the average pointwise coding rate criterion, that its average pointwise coding rate converges asymptotically to the entropy of the stationary memoryless source emitting the data, and that it attains a better worst-case coding rate than the Tunstall code.
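
    For reference, the classical greedy Tunstall construction that the abstract's incremental algorithm resembles can be sketched in a few lines: starting from the single-symbol phrases, the most probable leaf is repeatedly split into its children until the leaf budget is exhausted. This is the textbook algorithm, not the paper's proposed variant; names and parameters are our own.

      import heapq

      def tunstall_tree(probs, max_leaves):
          """Greedy Tunstall parsing tree for a memoryless source.
          probs maps each symbol to its probability; returns the
          complete set of (phrase, probability) leaves."""
          # Max-heap via negated probabilities, seeded with one-symbol phrases.
          heap = [(-p, s) for s, p in probs.items()]
          heapq.heapify(heap)
          # Splitting a leaf removes it and adds len(probs) children.
          while len(heap) + len(probs) - 1 <= max_leaves:
              neg_p, phrase = heapq.heappop(heap)
              for s, p in probs.items():
                  heapq.heappush(heap, (neg_p * p, phrase + s))
          return [(phrase, -neg_p) for neg_p, phrase in heap]

      # 8 leaves for a Bernoulli(0.7) source, hence 3-bit codewords.
      print(tunstall_tree({"a": 0.7, "b": 0.3}, 8))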

  • Average Coding Rate of a Multi-Shot Tunstall Code with an Arbitrary Parsing Tree Sequence

    Mitsuharu ARIMURA

    LETTER-Source Coding and Data Compression, Vol. E99-A, No. 12, pp. 2281-2285

    The average coding rate of a multi-shot Tunstall code, a variation of variable-to-fixed length (VF) lossless source codes, is investigated for stationary memoryless sources. A multi-shot VF code parses a given source sequence into variable-length blocks and encodes them into fixed-length codewords. When the parsing count is fixed, the overall multi-shot VF code can be treated as a one-shot VF code. In this setting, the compression performance of the Tunstall code is evaluated under two criteria: the average coding rate, defined as the codeword length divided by the average block length, and the expectation of the pointwise coding rate. It is proved that both average coding rates converge to the entropy of a stationary memoryless source under the assumption that the geometric mean of the leaf counts of the multi-shot Tunstall parsing trees tends to infinity.
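
    The two criteria are easy to compare numerically. Under assumed notation (fixed codeword length of ceil(log2 m) bits for m leaves), the sketch below evaluates both rates for a complete parsing tree given as (phrase, probability) pairs; by Jensen's inequality the pointwise version is never smaller.

      from math import ceil, log2

      def rate_criteria(leaves):
          """leaves: (phrase, probability) pairs of a complete parsing
          tree. Returns (average coding rate, expected pointwise rate)."""
          L = ceil(log2(len(leaves)))                 # fixed codeword length, bits
          avg_phrase = sum(p * len(w) for w, p in leaves)
          rate_avg = L / avg_phrase                          # criterion 1
          rate_pw = sum(p * L / len(w) for w, p in leaves)   # criterion 2
          return rate_avg, rate_pw

      # Complete 3-leaf parsing tree for a Bernoulli(0.7) source.
      print(rate_criteria([("aa", 0.49), ("ab", 0.21), ("b", 0.3)]))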

  • Lossless Data Compression via Substring Enumeration for k-th Order Markov Sources with a Finite Alphabet

    Ken-ichi IWATA  Mitsuharu ARIMURA

    PAPER-Source Coding and Data Compression, Vol. E99-A, No. 12, pp. 2130-2135

    A generalization of compression via substring enumeration (CSE) to k-th order Markov sources with a finite alphabet is proposed, together with an upper bound on the codeword length of the proposed method. We analyze the worst-case maximum redundancy of CSE for k-th order Markov sources with a finite alphabet. The compression ratio of the proposed method converges asymptotically to the optimum for such sources as the length n of the source string tends to infinity.
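
    As a concrete benchmark for the optimality claim, the sketch below estimates the k-th order conditional empirical entropy of a string, the per-symbol rate that an optimal code for a k-th order Markov source approaches. Boundary effects are handled naively and the function name is our own.

      from collections import Counter
      from math import log2

      def empirical_entropy(x, k):
          """Naive k-th order conditional empirical entropy of x,
          in bits per symbol (the first k positions are ignored)."""
          ctx = Counter(x[i:i + k] for i in range(len(x) - k))
          ext = Counter(x[i:i + k + 1] for i in range(len(x) - k))
          n = len(x) - k
          return sum(c * log2(ctx[w[:k]] / c) for w, c in ext.items()) / n

      print(empirical_entropy("abababababab", 1))  # 0.0: deterministic given context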

  • Almost Sure Convergence Coding Theorems of One-Shot and Multi-Shot Tunstall Codes for Stationary Memoryless Sources

    Mitsuharu ARIMURA

    PAPER-Source Coding, Vol. E98-A, No. 12, pp. 2393-2406

    Almost-sure convergence coding theorems for one-shot and multi-shot Tunstall codes are proved for stationary memoryless sources. The theorem for the one-shot Tunstall code is proved for the case in which the leaf count of the Tunstall tree increases, whereas the theorem for the multi-shot Tunstall code is proved for an increasing parsing count, under the assumption that the Tunstall tree grows as parsing proceeds. It is clarified that the one-shot theorem is not a corollary of the multi-shot theorem. In the multi-shot case, the coding theorem can be regarded as covering the sequential algorithm in which parsing and coding are carried out repeatedly. The Cartesian concatenation of trees and the geometric mean of the leaf counts of trees are newly introduced, and both play crucial roles in the analysis of the multi-shot Tunstall code.
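
    For concreteness, if the successive parsing trees have leaf counts m_1, ..., m_t, the geometric mean referred to above is (in assumed notation, not necessarily the paper's)

      \bar{m}_t = \Bigl(\prod_{i=1}^{t} m_i\Bigr)^{1/t},

    and the companion letter above states its convergence result under a growth condition of exactly this type, namely \bar{m}_t \to \infty as the parsing count t \to \infty.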

  • Evaluation of Maximum Redundancy of Data Compression via Substring Enumeration for k-th Order Markov Sources

    Ken-ichi IWATA  Mitsuharu ARIMURA  Yuki SHIMA

    PAPER-Information Theory, Vol. E97-A, No. 8, pp. 1754-1760

    Dubé and Beaudoin proposed a lossless data compression technique called compression via substring enumeration (CSE) in 2010. We derive an upper bound on the number of bits the CSE technique uses to encode any binary string emitted by an unknown member of a known class of k-th order Markov processes. We then compare the worst-case maximum redundancy achieved by the CSE technique on any binary string with the least possible worst-case maximum redundancy achieved by the best fixed-to-variable length code satisfying the Kraft inequality.
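
    For orientation, a common formalization of the quantity being compared (our notation, not necessarily the paper's): for a code c with length function \ell_c and the class \mathcal{M}_k of k-th order Markov sources, the worst-case maximum redundancy on binary strings of length n is

      R_n(c) = \max_{P \in \mathcal{M}_k} \max_{x \in \{0,1\}^n} \bigl( \ell_c(x) - \log_2 \tfrac{1}{P(x)} \bigr),

    and the benchmark is \min_c R_n(c) over fixed-to-variable length codes whose lengths satisfy the Kraft inequality \sum_x 2^{-\ell_c(x)} \le 1.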

  • On the Average Coding Rate of the Tunstall Code for Stationary and Memoryless Sources

    Mitsuharu ARIMURA

    PAPER-Source Coding, Vol. E93-A, No. 11, pp. 1904-1911

    The coding rate of a one-shot Tunstall code for stationary and memoryless sources is investigated in the non-universal setting, in which the probability distribution of the source is known to both the encoder and the decoder. In studies of variable-to-fixed length codes, the average coding rate has been defined as (i) the codeword length divided by the average block length. We instead define the average coding rate as (ii) the expectation of the pointwise coding rate, and prove that (ii) converges to the same value as (i).
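
    In symbols (assumed notation, not necessarily the paper's): for a parsing tree with m leaves, fixed codeword length \lceil \log_2 m \rceil, and random phrase W, the two definitions read

      (i) \; r = \frac{\lceil \log_2 m \rceil}{E[|W|]}, \qquad (ii) \; r' = E\!\left[ \frac{\lceil \log_2 m \rceil}{|W|} \right],

    and the paper proves that (ii) converges to the same value as (i).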

  • Unequal Error Protection in Ziv-Lempel Coding

    Eiji FUJIWARA  Masato KITAKAMI

    PAPER-Dependable Communication, Vol. E86-D, No. 12, pp. 2595-2600

    Data compression is widely used in computer and communication systems, and lossless compression in particular is applied to text. Since compressed data are highly sensitive to errors, several error-control methods have been proposed for compression schemes based on probability models, such as arithmetic coding. This paper proposes applying an unequal error protection (UEP) scheme to LZ77 coding and LZW coding. It investigates the structure of the compressed data and, through theoretical analysis and computer simulation, identifies the part that is more sensitive to errors than the rest. The UEP scheme then protects this error-sensitive part more strongly than the others. Computer simulation shows that the proposed scheme recovers from errors in the compressed data more effectively than conventional methods.
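
    The asymmetry in error sensitivity is easy to reproduce with a toy LZW codec: corrupting an early code poisons the decoder's dictionary and garbles everything downstream, while a late error stays local. The sketch below is our own minimal illustration, not the paper's simulation setup.

      def lzw_encode(text):
          """Minimal LZW encoder over single characters (toy sketch)."""
          dic = {chr(i): i for i in range(256)}
          w, out = "", []
          for c in text:
              if w + c in dic:
                  w += c
              else:
                  out.append(dic[w])
                  dic[w + c] = len(dic)
                  w = c
          if w:
              out.append(dic[w])
          return out

      def lzw_decode(codes):
          """Matching decoder; rebuilds the dictionary on the fly."""
          dic = {i: chr(i) for i in range(256)}
          w = dic[codes[0]]
          out = [w]
          for k in codes[1:]:
              entry = dic[k] if k in dic else w + w[0]  # LZW special case
              out.append(entry)
              dic[len(dic)] = w + entry[0]
              w = entry
          return "".join(out)

      codes = lzw_encode("abababababababab")
      early = codes[:]; early[1] ^= 1     # flip a bit in an early code
      late = codes[:]; late[-1] ^= 1      # flip a bit in a late code
      print(lzw_decode(codes))            # original text
      print(lzw_decode(early))            # dictionary poisoned: widespread damage
      print(lzw_decode(late))             # damage confined to the tail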

  • Asymptotic Optimality of the Block Sorting Data Compression Algorithm

    Mitsuharu ARIMURA  Hirosuke YAMAMOTO

    PAPER-Source Coding, Vol. E81-A, No. 10, pp. 2117-2122

    In this paper the performance of the block sorting algorithm proposed by Burrows and Wheeler is evaluated theoretically. It is proved that the block sorting algorithm is asymptotically optimal for stationary ergodic finite-order Markov sources. Our proof rests on two facts: symbols that share the same Markov state (or context) in the original data sequence are grouped together in the output of the Burrows-Wheeler transform, and the codeword length of each group can be bounded by a function of the frequencies of the symbols in the group.
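
    The grouping property the proof relies on is visible even in a naive implementation of the transform: rotations beginning with the same context sort next to each other, so their preceding symbols, which form the output, cluster into runs. A minimal sketch (educational O(n^2 log n) construction; production implementations use suffix arrays):

      def bwt(s, eos="\x00"):
          """Burrows-Wheeler transform via sorted rotations. A unique
          end-of-string sentinel keeps the transform invertible."""
          s += eos
          rot = sorted(s[i:] + s[:i] for i in range(len(s)))
          return "".join(r[-1] for r in rot)

      print(repr(bwt("abracadabra")))  # equal-context symbols end up adjacent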