
Keyword Search Result

[Keyword] data compression (80 hits)

1-20 hits (of 80)

  • Input Data Format for Sparse Matrix in Quantum Annealing Emulator

    Sohei SHIMOMAI  Kei UEDA  Shinji KIMURA  

     
    PAPER-Algorithms and Data Structures

    Publicized: 2023/09/25 / Vol: E107-A No:3 / Page(s): 557-565

    Recently, Quantum Annealing (QA) has attracted attention as an efficient approach to combinatorial optimization problems. In QA, the input data become large, and reducing them is important for accelerating hardware emulation because the usable memory size and bandwidth are limited. This paper proposes a compression method for the input sparse matrices of a QA emulator. The proposed method exploits the sparseness of the coefficient matrix and the reappearance of identical values. An independent value table is introduced, and data are compressed by searching for and registering pairs of consecutive values in the table. The method is applied to the Traveling Salesman Problem (TSP) with 32, 64 and 96 cities and to the Nurse Scheduling Problem (NSP). It reduces the amount of data to 1/40 for the 96-city TSP, which makes the 96-city TSP fit on the hardware emulator, and achieves compression ratios between 1/4 and 1/11.8 on NSP instances. The data reduction also benefits simulation/emulation performance when the compressed data are used directly: the 96-city TSP runs 1.9 times faster than with the CSR-based method.
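
    As a rough illustration of the value-table idea described in this abstract (a minimal sketch under our own assumptions, not the authors' actual input format), the following snippet replaces repeated coefficient values of a COO-form sparse matrix with indices into a small table of unique values; the function name compress_coo is hypothetical.

    ```python
    # Illustrative sketch only: exploit the reappearance of identical coefficients
    # by storing each nonzero as an index into a value table instead of a full float.

    def compress_coo(rows, cols, vals):
        """Return (rows, cols, value_ids, value_table) for a COO sparse matrix."""
        value_table = []      # unique coefficient values, in first-seen order
        index_of = {}         # value -> position in value_table
        value_ids = []
        for v in vals:
            if v not in index_of:
                index_of[v] = len(value_table)
                value_table.append(v)
            value_ids.append(index_of[v])
        return rows, cols, value_ids, value_table

    # A QUBO-like matrix in which the same few coefficients recur many times.
    rows = [0, 0, 1, 2, 2, 3]
    cols = [1, 3, 2, 0, 3, 1]
    vals = [2.5, -1.0, 2.5, 2.5, -1.0, 2.5]
    print(compress_coo(rows, cols, vals))  # value_ids point into a 2-entry table
    ```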

  • An Efficient Bayes Coding Algorithm for Changing Context Tree Model

    Koshi SHIMADA  Shota SAITO  Toshiyasu MATSUSHIMA  

     
    PAPER-Source Coding and Data Compression

    Publicized: 2023/08/24 / Vol: E107-A No:3 / Page(s): 448-457

    The context tree model has the property that the occurrence probability of each symbol is determined by a finite past sequence, and it is a broader class of sources that includes i.i.d. and Markov sources. This paper proposes a non-stationary source built from context tree models that change from interval to interval. The Bayes code for this source requires weighting the posterior probabilities of the context tree models and change points, so its computational complexity is usually of exponential order. The challenge is therefore how to reduce this complexity. We propose a special class of prior probability distributions over context tree models and change points, and develop an efficient Bayes coding algorithm by combining two existing Bayes coding algorithms. The algorithm minimizes the Bayes risk function for the proposed source, and its computational complexity is of polynomial order. We investigate the behavior and performance of the proposed algorithm through experiments.

  • Properties of k-Bit Delay Decodable Codes

    Kengo HASHIMOTO  Ken-ichi IWATA  

     
    PAPER-Source Coding and Data Compression

    Publicized: 2023/09/07 / Vol: E107-A No:3 / Page(s): 417-447

    The class of k-bit delay decodable codes, i.e., source codes that allow a decoding delay of at most k bits for k≥0, can attain a shorter average codeword length than Huffman codes. This paper discusses general properties of the class of k-bit delay decodable codes with a finite number of code tables and proves two theorems that limit the scope of codes that need to be considered when discussing optimal k-bit delay decodable codes.

  • Adaptive Lossy Data Compression Extended Architecture for Memory Bandwidth Conservation in SpMV

    Siyi HU  Makiko ITO  Takahide YOSHIKAWA  Yuan HE  Hiroshi NAKAMURA  Masaaki KONDO  

     
    PAPER

    Publicized: 2023/07/20 / Vol: E106-D No:12 / Page(s): 2015-2025

    Sparse matrix-vector multiplication (SpMV), widely adopted by machine learning and graph processing applications, is a very popular kernel in linear algebra. This is especially true for fully-connected MLP layers, which account for many SpMV computations and play a substantial role in diverse services. As a consequence, a large fraction of data center cycles is spent on SpMV kernels. Meanwhile, despite efficient sparse storage formats such as CSR or CSC, SpMV kernels still suffer from limited memory bandwidth during data transfer because of the memory hierarchy of modern computing systems. In particular, we find that both the integer and floating-point data used in SpMV kernels are handled plainly, without any pre-processing. We therefore believe that bandwidth conservation techniques, such as data compression, can dramatically help SpMV kernels when data is transferred between the main memory and the Last Level Cache (LLC). Furthermore, we observe that the convergence conditions of some typical scientific computing benchmarks (based on SpMV kernels) are not degraded when lower-precision floating-point data is adopted. Based on these findings, we propose a simple yet effective data compression scheme that can also be extended to general-purpose computing architectures and HPC systems. With the scheme adopted, a best-case speedup of 1.92x is achieved. In addition, evaluations with both the CG kernel and the PageRank algorithm indicate that our proposal introduces negligible overhead in both convergence speed and the accuracy of the final results.
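
    For reference, the CSR format mentioned above drives the memory-bound access pattern the paper targets; the standard textbook SpMV over CSR (background only, not the proposed compression scheme) looks like this.

    ```python
    # Standard CSR sparse matrix-vector multiplication, shown only to illustrate
    # the bandwidth-bound streaming of values and indices discussed above.

    def spmv_csr(values, col_idx, row_ptr, x):
        """Compute y = A @ x for A stored in CSR form (values, col_idx, row_ptr)."""
        n_rows = len(row_ptr) - 1
        y = [0.0] * n_rows
        for i in range(n_rows):
            acc = 0.0
            for k in range(row_ptr[i], row_ptr[i + 1]):
                acc += values[k] * x[col_idx[k]]   # streams matrix data from memory
            y[i] = acc
        return y

    # 3x3 example: [[10, 0, 2], [0, 3, 0], [0, 0, 5]]
    values, col_idx, row_ptr = [10.0, 2.0, 3.0, 5.0], [0, 2, 1, 2], [0, 2, 3, 4]
    print(spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [12.0, 3.0, 5.0]
    ```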

  • A Compression Router for Low-Latency Network-on-Chip

    Naoya NIWA  Yoshiya SHIKAMA  Hideharu AMANO  Michihiro KOIBUCHI  

     
    PAPER-Computer System

    Publicized: 2022/11/08 / Vol: E106-D No:2 / Page(s): 170-180

    Network-on-Chips (NoCs) are important components of scalable many-core processors. Because the performance of parallel applications is usually sensitive to NoC latency, reducing it is a primary requirement. In this study, a compression router that hides the (de)compression delay is proposed. The compression router (de)compresses the contents of an incoming packet before switch arbitration completes, thus shortening the packet without a latency penalty and reducing the network injection-and-ejection latency. Evaluation results show that the compression router improves parallel application performance by up to 33% (conjugate gradients (CG), fast Fourier transform (FT), integer sort (IS), and the traveling salesman problem (TSP)) and effective network throughput by up to 63% at a compression ratio of 1.8 on the NoC. The cost is an increase of 0.22 mm² in router area and a 1.6-fold increase in energy consumption compared with a conventional virtual-channel router. Another finding is that off-loading the decompressor onto the network interface decreases the compression-router area by 57% at the expense of a moderate increase in communication latency.

  • Boosting the Performance of Interconnection Networks by Selective Data Compression

    Naoya NIWA  Hideharu AMANO  Michihiro KOIBUCHI  

     
    PAPER

    Publicized: 2022/07/12 / Vol: E105-D No:12 / Page(s): 2057-2065

    This study presents a selective data-compression interconnection network to boost performance. Data compression virtually increases the effective network bandwidth. One drawback of data compression is the long latency of (de)compression operations at a compute node. In terms of communication latency, we explore the trade-off between the compression latency overhead and the injection latency saved by shortening packets with compression algorithms. As a result, we propose selectively applying compression to packets: compression is performed on long packets, and it is also applied when network congestion is detected at the source compute node. Through a cycle-accurate network simulation, the selective compression method using the above compression algorithms improves network throughput by up to 39% with a moderate increase in the communication latency of short packets.
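
    A minimal sketch of the selection policy as summarized above (compress long packets, and compress when the source detects congestion); the threshold value and the compress routine below are placeholders, not the study's actual parameters.

    ```python
    # Illustrative selective-compression policy only; threshold and compress() are placeholders.

    LONG_PACKET_FLITS = 8          # hypothetical length threshold

    def should_compress(packet_len_flits, congestion_detected):
        return packet_len_flits >= LONG_PACKET_FLITS or congestion_detected

    def inject(packet, congestion_detected, compress):
        if should_compress(len(packet), congestion_detected):
            return compress(packet)    # shorter packet -> lower injection latency
        return packet                  # short, uncongested packets avoid the (de)compression delay
    ```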

  • Lempel-Ziv Factorization in Linear-Time O(1)-Workspace for Constant Alphabets

    Weijun LIU  

     
    PAPER-Fundamentals of Information Systems

    Publicized: 2021/08/30 / Vol: E104-D No:12 / Page(s): 2145-2153

    Computing the Lempel-Ziv factorization (LZ77) of a string is one of the most important problems in computer science. It is widely used in applications such as data compression, text indexing and pattern discovery, and has already become the heart of many file compressors such as gzip and 7zip. In this paper, we present a linear-time algorithm called Xone for computing the LZ77 factorization, which has the same space requirement as BGone, the previous best linear-time LZ77 factorization algorithm in terms of working space. Xone greatly improves the efficiency of BGone. Experiments show that the two versions of Xone, XoneT and XoneSA, are about 27% and 31% faster than BGoneT and BGoneSA, respectively.
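
    As background (the textbook definition of the factorization, not the Xone algorithm), LZ77 splits a string into factors where each factor is the longest substring that already occurred earlier, or a single fresh character; a simple quadratic-time sketch follows.

    ```python
    # Naive quadratic LZ77 factorization, shown only to define the factorization the
    # paper computes in linear time; each factor is (position, length) of the longest
    # earlier occurrence, or a literal character if no earlier occurrence exists.

    def lz77_factorize(s):
        factors, i = [], 0
        while i < len(s):
            best_pos, best_len = -1, 0
            for j in range(i):                         # candidate earlier start
                l = 0
                while i + l < len(s) and s[j + l] == s[i + l]:
                    l += 1
                if l > best_len:
                    best_pos, best_len = j, l
            if best_len == 0:
                factors.append(s[i])                   # literal factor
                i += 1
            else:
                factors.append((best_pos, best_len))   # reference factor
                i += best_len
        return factors

    print(lz77_factorize("abababb"))  # ['a', 'b', (0, 4), (1, 1)]
    ```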

  • Extended-Domain Golomb Code and Symmetry of Relative Redundancy

    Ryosuke SUGIURA  Yutaka KAMAMOTO  Takehiro MORIYA  

     
    PAPER-Coding Theory

    Publicized: 2021/02/08 / Vol: E104-A No:8 / Page(s): 1033-1042

    This paper presents the extended-domain Golomb (XDG) code, an extension of Golomb code for sparse geometric sources as well as a generalization of the extended-domain Golomb-Rice (XDGR) code, based on the idea of almost instantaneous fixed-to-variable length (AIFV) codes. Showing that XDGR encoding can be interpreted as extended usage of the code proposed in previous works, this paper establishes two facts: the proposed XDG code can be constructed as an AIFV code related to Golomb code in the same way that XDGR code is related to Rice code; and XDG and Golomb codes are symmetric in the sense of relative redundancy. The proposed XDG code can be used to efficiently and losslessly compress geometric sources too sparse for the conventional Golomb and Rice codes. By the symmetry, its relative redundancy is guaranteed to be as low as that of Golomb code compressing non-sparse geometric sources. Owing to this fact, the parameter of the proposed XDG code, which is more finely tunable than that of the conventional XDGR code, can be optimized for given inputs using conventional techniques. It is therefore expected to be more useful for many coding applications that deal with geometric sources at low bit rates.
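
    For reference, the conventional Golomb and Rice codes mentioned above encode a non-negative integer n with parameter m as a unary quotient followed by the remainder; a minimal sketch of the Rice special case (m = 2^k), which is background only and not the proposed XDG code, follows.

    ```python
    # Minimal Rice coding (Golomb code with m = 2**k); background for the conventional
    # codes discussed above, not the proposed XDG code.

    def rice_encode(n, k):
        """Encode n >= 0 as a unary quotient, a '0' terminator, and a k-bit remainder."""
        q, r = n >> k, n & ((1 << k) - 1)
        return "1" * q + "0" + (format(r, f"0{k}b") if k > 0 else "")

    def rice_decode(bits, k):
        q = 0
        while bits[q] == "1":
            q += 1
        r = int(bits[q + 1:q + 1 + k], 2) if k > 0 else 0
        return (q << k) | r

    print(rice_encode(13, 2))                      # '111001': quotient 3, remainder '01'
    assert rice_decode(rice_encode(13, 2), 2) == 13
    ```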

  • Compression by Substring Enumeration Using Sorted Contingency Tables

    Takahiro OTA  Hiroyoshi MORITA  Akiko MANADA  

     
    PAPER-Information Theory

    Vol: E103-A No:6 / Page(s): 829-835

    This paper proposes two variants of improved Compression by Substring Enumeration (CSE) for a finite alphabet. In previous studies on CSE, the encoder uses inequalities that bound the number of occurrences of the substring or minimal forbidden word (MFW) to be encoded. The inequalities are derived from a contingency table containing the numbers of occurrences of substrings or MFWs. The codeword length of a substring or an MFW grows with the difference between the upper and lower bounds deduced from the inequalities; however, the lower bound is not tight. We therefore derive a new, tight lower bound based on the contingency table and propose a new CSE algorithm using the new inequality. We also propose a new encoding order of substrings and MFWs based on a sorted contingency table, in which both the row and column marginal totals are sorted in descending order instead of the lexicographical order used in previous studies, and we propose a new CSE algorithm that is the first to use this encoding order. Experimental results show that, for all files of the Calgary corpus, the compression ratios of the proposed algorithms are better than those of a previous study on CSE with a finite alphabet. Moreover, the compression ratios of the second proposed algorithm are better than or equal to those of a well-known compressor for 11 of the 14 files in the corpus.

  • A Variable-to-Fixed Length Lossless Source Code Attaining Better Performance than Tunstall Code in Several Criterions

    Mitsuharu ARIMURA  

     
    PAPER-Information Theory

    Vol: E101-A No:1 / Page(s): 249-258

    Tunstall code is known to be an optimal variable-to-fixed length (VF) lossless source code under the criterion of average coding rate, defined as the codeword length divided by the average phrase length. In this paper we define the average coding rate of a VF code as the expectation of the pointwise coding rate, which is the codeword length divided by the phrase length; we call this quantity the average pointwise coding rate. A new VF code is proposed, together with an incremental parsing-tree construction algorithm similar to the one that builds the Tunstall parsing tree. It is proved that this code is optimal under the criterion of the average pointwise coding rate, and that its average pointwise coding rate converges asymptotically to the entropy of the stationary memoryless source emitting the data to be encoded. Moreover, it is proved that the proposed code attains a better worst-case coding rate than Tunstall code.
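
    To make the two criteria above concrete (the notation is ours, not the paper's), let X be the random phrase produced by the parsing tree, |X| its length, and l the fixed codeword length:

    ```latex
    % Two average-rate criteria for a VF code (notation ours): l is the fixed
    % codeword length and |X| the random phrase length.
    \[
      R_{\mathrm{Tunstall}} = \frac{l}{\mathbb{E}[\,|X|\,]}
      \qquad \text{(codeword length over average phrase length)}
    \]
    \[
      R_{\mathrm{pointwise}} = \mathbb{E}\!\left[\frac{l}{|X|}\right]
      \qquad \text{(expectation of the pointwise coding rate)}
    \]
    % Since 1/x is convex, Jensen's inequality gives R_pointwise >= R_Tunstall.
    ```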

  • A 197mW 70ms-Latency Full-HD 12-Channel Video-Processing SoC in 16nm CMOS for In-Vehicle Information Systems

    Seiji MOCHIZUKI  Katsushige MATSUBARA  Keisuke MATSUMOTO  Chi Lan Phuong NGUYEN  Tetsuya SHIBAYAMA  Kenichi IWATA  Katsuya MIZUMOTO  Takahiro IRITA  Hirotaka HARA  Toshihiro HATTORI  

     
    PAPER

    Vol: E100-A No:12 / Page(s): 2878-2887

    A 197mW 70ms-latency Full-HD 12-channel video-processing SoC for in-vehicle information systems has been implemented in 16nm CMOS. The SoC integrates 17 video processors of 6 types so that video processing runs independently of other processing on the CPU/GPU. A synchronous scheme between the video processors achieves a low latency of 70ms for driver assistance. The optimized implementation of lossy and lossless video-data compression reduces memory access data by half and power consumption by 20%.

  • A Novel Two-Stage Compression Scheme Combining Polar Coding and Linear Prediction Coding for Fronthaul Links in Cloud-RAN

    Fangliao YANG  Kai NIU  Chao DONG  Baoyu TIAN  Zhihui LIU  

     
    PAPER-Fundamental Theories for Communications

    Publicized: 2016/11/29 / Vol: E100-B No:5 / Page(s): 691-701

    Transmission on fronthaul links in the cloud radio access network has become a bottleneck as data rates increase. In this paper, we propose a novel two-stage compression scheme for fronthaul links. In the first stage, commonly used techniques such as cyclic-prefix stripping and sampling-rate adaptation are applied. In the second stage, a structure called linear prediction coding with decision threshold (LPC-DT) is proposed to remove signal redundancy. Because the linear prediction outputs have a large dynamic range, a two-piecewise quantization with an optimized decision threshold is applied to enhance the quantization performance. To further lower the transmission rate, a multi-level successive structure of lossless polar source coding (LPSC) is proposed to compress the quantization output with low encoding and decoding complexity. Simulation results demonstrate that the proposed scheme with LPC-DT and LPSC offers not only significantly better compression ratios but also more flexibility in bandwidth settings than traditional schemes.
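
    As a rough illustration of the linear-prediction stage only (a first-order predictor of our own choosing, not the paper's LPC-DT design or its quantizer), the sketch below transmits residuals, which typically have a smaller dynamic range than the raw samples.

    ```python
    # Illustrative first-order linear prediction; the coefficient a = 0.95 is a
    # hypothetical choice, and no quantization or polar coding is modeled here.

    def lpc_residuals(samples, a=0.95):
        """Residuals e[n] = x[n] - a * x[n-1]."""
        prev, residuals = 0.0, []
        for x in samples:
            residuals.append(x - a * prev)
            prev = x
        return residuals

    def lpc_reconstruct(residuals, a=0.95):
        prev, samples = 0.0, []
        for e in residuals:
            x = e + a * prev
            samples.append(x)
            prev = x
        return samples

    sig = [1.0, 1.1, 1.2, 1.0, 0.9]
    assert all(abs(x - y) < 1e-9 for x, y in zip(sig, lpc_reconstruct(lpc_residuals(sig))))
    ```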

  • A High Performance FPGA-Based Sorting Accelerator with a Data Compression Mechanism

    Ryohei KOBAYASHI  Kenji KISE  

     
    PAPER-Computer System

    Publicized: 2017/01/30 / Vol: E100-D No:5 / Page(s): 1003-1015

    Sorting is an extremely important computation kernel that has been accelerated in many fields such as databases, image processing, and genome analysis. Given the advent of the Internet of Things (IoT) era driven by advances in mobile technology, future sorting methods must be usable in any environment, not only in high-performance systems such as servers but also in machines with low computational performance such as embedded systems. In this paper, we present an FPGA-based sorting accelerator combining a Sorting Network and a Merge Sorter Tree, which is customizable by tuning design parameters. The proposed FPGA accelerator sorts data sent from a host PC via the PCIe bus and sends the fully sorted data sequence back. We also present a detailed analytical model that accurately estimates the sorting performance. Thanks to these characteristics, designers can know in advance how fast the developed sorting hardware will be and can implement the design that best fulfills their cost and performance constraints. Our experiments show that the proposed hardware achieves up to 19.5x the sorting performance of an Intel Core i7-3770K operating at 3.50GHz when sorting 256M 32-bit integer elements. However, this result is limited by insufficient memory bandwidth. To overcome this problem, we propose a data compression mechanism; the experimental results show that the sorting hardware with it achieves almost 90% of the estimated performance, whereas the hardware without it achieves about 60%. To allow every designer to easily and freely use this accelerator, the RTL source code is released as open-source hardware.
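
    As rough background on the two building blocks named above (a software analogue, not the authors' RTL), a sorting network sorts small fixed-size batches with a fixed sequence of compare-exchange operations, and a merge sorter tree then merges the sorted runs.

    ```python
    # Software analogue of the two hardware blocks, for illustration only.
    import heapq

    def sort4(batch):
        """Five-comparator sorting network for exactly 4 elements (compare-exchange pairs)."""
        a = list(batch)
        for i, j in [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]:
            if a[i] > a[j]:
                a[i], a[j] = a[j], a[i]
        return a

    def merge_runs(runs):
        """k-way merge of sorted runs, as a merge sorter tree would do in hardware."""
        return list(heapq.merge(*runs))

    data = [7, 2, 9, 4, 1, 8, 3, 6]
    runs = [sort4(data[i:i + 4]) for i in range(0, len(data), 4)]
    print(merge_runs(runs))  # [1, 2, 3, 4, 6, 7, 8, 9]
    ```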

  • Average Coding Rate of a Multi-Shot Tunstall Code with an Arbitrary Parsing Tree Sequence

    Mitsuharu ARIMURA  

     
    LETTER-Source Coding and Data Compression

    Vol: E99-A No:12 / Page(s): 2281-2285

    The average coding rate of a multi-shot Tunstall code, a variation of variable-to-fixed length (VF) lossless source codes, is investigated for stationary memoryless sources. A multi-shot VF code parses a given source sequence into variable-length blocks and encodes them into fixed-length codewords. If the parsing count is fixed, the overall multi-shot VF code can be treated as a one-shot VF code. For this setting of the Tunstall code, the compression performance is evaluated using two criteria. The first is the average coding rate, defined as the codeword length divided by the average block length. The second is the expectation of the pointwise coding rate. It is proved that both average coding rates converge to the entropy of a stationary memoryless source under the assumption that the geometric mean of the leaf counts of the multi-shot Tunstall parsing trees goes to infinity.
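
    In our own notation (not the paper's), with t parsings using Tunstall trees of leaf counts L_1, ..., L_t, block lengths |X_1|, ..., |X_t|, and fixed codeword lengths ⌈log2 L_i⌉, the quantities involved in the statement above can be written as follows.

    ```latex
    % Notation ours: t parsings with trees of leaf counts L_1,...,L_t, block lengths
    % |X_1|,...,|X_t|, and fixed codeword lengths \lceil \log_2 L_i \rceil.
    \[
      R_t = \frac{\sum_{i=1}^{t} \lceil \log_2 L_i \rceil}{\sum_{i=1}^{t} |X_i|}
      \qquad \text{(overall coding rate of the multi-shot code)}
    \]
    \[
      \Bigl(\prod_{i=1}^{t} L_i\Bigr)^{1/t} \longrightarrow \infty
      \qquad \text{(geometric-mean condition under which } R_t \text{ converges to the entropy)}
    \]
    ```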

  • Lossless Data Compression via Substring Enumeration for k-th Order Markov Sources with a Finite Alphabet

    Ken-ichi IWATA  Mitsuharu ARIMURA  

     
    PAPER-Source Coding and Data Compression

    Vol: E99-A No:12 / Page(s): 2130-2135

    A generalization of compression via substring enumeration (CSE) for k-th order Markov sources with a finite alphabet is proposed, and an upper bound on the codeword length of the proposed method is presented. We analyze the worst-case maximum redundancy of CSE for k-th order Markov sources with a finite alphabet. The compression ratio of the proposed method asymptotically converges to the optimal one for k-th order Markov sources with a finite alphabet as the length n of the source string tends to infinity.

  • Fully Parallelized LZW Decompression for CUDA-Enabled GPUs

    Shunji FUNASAKA  Koji NAKANO  Yasuaki ITO  

     
    PAPER-GPU computing

    Publicized: 2016/08/25 / Vol: E99-D No:12 / Page(s): 2986-2994

    The main contribution of this paper is to present a work-optimal parallel algorithm for LZW decompression and to implement it on a CUDA-enabled GPU. Since sequential LZW decompression creates a dictionary table by reading codes in a compressed file one by one, it is not easy to parallelize. We first present a work-optimal parallel LZW decompression algorithm on the CREW-PRAM (Concurrent-Read Exclusive-Write Parallel Random Access Machine), a standard theoretical parallel computing model with a shared memory. We then present an efficient implementation of this parallel algorithm on a GPU. The experimental results show that our GPU implementation performs LZW decompression in 1.15 milliseconds for a grayscale TIFF image with 4096×3072 pixels stored in the global memory of a GeForce GTX 980. In contrast, sequential LZW decompression for the same image stored in the main memory of an Intel Core i7 CPU takes 50.1 milliseconds. Thus, our parallel LZW decompression on the global memory of the GPU is 43.6 times faster than sequential LZW decompression on the main memory of the CPU for this image. To show the applicability of our GPU implementation of LZW decompression, we evaluated the SSD-to-GPU data loading time for three scenarios. The experimental results show that the scenario using our LZW decompression on the GPU is faster than the others.
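
    For contrast with the parallel algorithm, the standard sequential LZW decompression loop (textbook form, not the GPU implementation) shows the dictionary dependence that makes parallelization hard: every new dictionary entry depends on the previously decoded code.

    ```python
    # Standard sequential LZW decompression; each new table entry depends on the
    # previous code, which is the serial dependence discussed above.

    def lzw_decompress(codes, alphabet_size=256):
        table = {i: bytes([i]) for i in range(alphabet_size)}
        prev = table[codes[0]]
        out = [prev]
        for code in codes[1:]:
            entry = table[code] if code in table else prev + prev[:1]  # cScSc special case
            out.append(entry)
            table[len(table)] = prev + entry[:1]
            prev = entry
        return b"".join(out)

    print(lzw_decompress([65, 66, 256, 258]))  # b'ABABABA'
    ```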

  • Reseeding-Oriented Test Power Reduction for Linear-Decompression-Based Test Compression Architectures

    Tian CHEN  Dandan SHEN  Xin YI  Huaguo LIANG  Xiaoqing WEN  Wei WANG  

     
    PAPER-Computer System

    Publicized: 2016/07/25 / Vol: E99-D No:11 / Page(s): 2672-2681

    Linear feedback shift register (LFSR) reseeding is an effective method for test data reduction. However, the test patterns generated by LFSR reseeding generally have a high toggle rate and thus cause high test power. It is therefore beneficial to properly fill the X bits in deterministic test cubes with 0 or 1 before encoding the seed, in order to reduce the toggle rate. However, X-filling increases the number of specified bits, which makes seed encoding more difficult and also increases the size of the LFSR. This paper presents a test framework that considers both compression ratio and power consumption simultaneously. In the first stage, the proposed reseeding-oriented X-filling is performed to reduce shift power (shift filling) and capture power (capture filling). The filled test cubes are then encoded using the proposed Compatible Block Code (CBC). The CBC can X-ize specified bits, i.e., turn specified bits back into X bits, and can resolve the conflict between low-power filling and seed encoding. Experiments on ISCAS'89 benchmark circuits show that our scheme attains a compression ratio of 94.1% and reduces capture power by at least 15% and scan-in power by more than 79.5%.
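
    For background on reseeding (a generic sketch, not the paper's CBC encoding), an LFSR expands a short seed into a long test pattern; the register width and feedback taps below are arbitrary examples.

    ```python
    # Generic Fibonacci LFSR sketch; the 4-bit width and taps (0, 3) are hypothetical.

    def lfsr_expand(seed_bits, taps, length):
        """Expand a seed into `length` pattern bits; `taps` are XORed feedback positions."""
        state = list(seed_bits)
        out = []
        for _ in range(length):
            out.append(state[-1])               # shift out the last stage
            feedback = 0
            for t in taps:
                feedback ^= state[t]
            state = [feedback] + state[:-1]     # shift and insert the feedback bit
        return out

    print(lfsr_expand([1, 0, 0, 1], taps=(0, 3), length=8))  # 8 expanded pattern bits
    ```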

  • A Novel Dictionary-Based Method for Test Data Compression Using Heuristic Algorithm

    Diancheng WU  Jiarui LI  Leiou WANG  Donghui WANG  Chengpeng HAO  

     
    BRIEF PAPER-Semiconductor Materials and Devices

    Vol: E99-C No:6 / Page(s): 730-733

    This paper presents a novel data compression method for testing integrated circuits within the selective dictionary coding framework. By making use of the inverse values of dictionary indices in the compatibility analysis, together with a heuristic algorithm for solving the maximum clique problem, the method obtains a higher compression ratio than existing ones.

  • Efficient Implementation and Empirical Evaluation of Compression by Substring Enumeration

    Sho KANAI  Hidetoshi YOKOO  Kosumo YAMAZAKI  Hideaki KANEYASU  

     
    PAPER-Information Theory

    Vol: E99-A No:2 / Page(s): 601-611

    This paper gives an array-based practical encoder for the lossless data compression algorithm known as Compression by Substring Enumeration (CSE). The encoder makes use of the relation between CSE and the Burrows-Wheeler transform. We also modify the decoding algorithm to accommodate the proposed encoder. Thanks to the proposed encoder and decoder, we can apply CSE to data longer than tens of megabytes. We show compression results from experiments on such long data; the results empirically validate theoretical predictions about CSE.

  • Almost Sure Convergence Coding Theorems of One-Shot and Multi-Shot Tunstall Codes for Stationary Memoryless Sources

    Mitsuharu ARIMURA  

     
    PAPER-Source Coding

    Vol: E98-A No:12 / Page(s): 2393-2406

    Almost sure convergence coding theorems of one-shot and multi-shot Tunstall codes are proved for stationary memoryless sources. The coding theorem for the one-shot Tunstall code is proved in the case where the leaf count of the Tunstall tree increases. On the other hand, the coding theorem for the multi-shot Tunstall code is proved with an increasing parsing count, under the assumption that the Tunstall tree grows as parsing proceeds. In this result, it is clarified that the theorem for the one-shot Tunstall code is not a corollary of the theorem for the multi-shot Tunstall code. For the multi-shot Tunstall code, the coding theorem can be regarded as being proved for a sequential algorithm in which parsing and coding are processed repeatedly. The Cartesian concatenation of trees and the geometric mean of the leaf counts of trees are newly introduced; they play crucial roles in the analysis of the multi-shot Tunstall code.

1-20 hits (of 80)