Keyword Search Result

[Keyword] (42807hit)

10521-10540hit(42807hit)

  • Random-Coding Exponential Error Bounds for Channels with Action-Dependent States

    Tetsunao MATSUTA  Tomohiko UYEMATSU  

     
    PAPER-Shannon Theory

      Vol:
    E96-A No:12
      Page(s):
    2324-2331

    Weissman introduced a coding problem for channels with action-dependent states. In this coding problem, there are two encoders and a decoder. An encoder outputs an action that affects the state of the channel. Then, the other encoder outputs a codeword of the message into the channel by using the channel state. The decoder receives a noisy observation of the codeword, and reconstructs the message. In this paper, we show an exponential error bound for channels with action-dependent states based on the random coding argument.

  • On the Dependence of Error Performance of Spatially Coupled LDPC Codes on Their Design Parameters

    Hiroyuki IHARA  Tomoharu SHIBUYA  

     
    LETTER-Coding Theory

      Vol:
    E96-A No:12
      Page(s):
    2447-2451

    Spatially coupled (SC) low-density parity-check (LDPC) codes are defined by bipartite graphs that are obtained by assembling prototype graphs. The combination and connection of prototype graphs are designated by specifying some parameters, and Kudekar et al. showed that the BP threshold of the ensemble of SC LDPC codes agrees with the MAP threshold of the ensemble of regular LDPC codes as those parameters grow so that the code length tends to infinity. When we design SC LDPC codes with practical code length, however, it is not clear how to set those parameters to enhance the performance of SC LDPC codes. In this paper, we provide the results of numerical experiments that suggest how the error performance of SC LDPC codes over the BEC depends on their design parameters.

  • Recursive Construction of (k+1)-Ary Error-Correcting Signature Code for Multiple-Access Adder Channel

    Shan LU  Jun CHENG  Yoichiro WATANABE  

     
    PAPER-Coding Theory

      Vol:
    E96-A No:12
      Page(s):
    2368-2373

    A recursive construction of a (k+1)-ary error-correcting signature code is proposed to identify users for the multiple-access adder channel (MAAC), even in the presence of channel noise. The recursion starts from a trivial signature code. In the (j-1)-th recursion, from a signature code with minimum distance of $2^{j-2}$, a longer and larger signature code with minimum distance of $2^{j-1}$ is obtained. The decoding procedure of the signature code is given, which consists of error correction and user identification.

  • A Rectangular Weighting Function Approximating Local Phase Error for Designing Equiripple All-Pass IIR Filters

    Taisaku ISHIWATA  Yoshinao SHIRAKI  

     
    PAPER-Signal Processing

      Vol:
    E96-A No:12
      Page(s):
    2398-2404

    In this paper, we propose a rectangular weighting function that can be used in the method of iteratively reweighted least squares (IRWLS) for designing equiripple all-pass IIR filters. The purpose of introducing this weighting function is to improve the convergence performance in the solution of the IRWLS. The height of each rectangle is designed to be equal to the local maximum of each ripple, and the width of each rectangle is designed so that the area of each rectangle becomes equal to the area of each ripple. Here, the ripple is the absolute value of the phase error. We show experimentally that the convergence performance in the solution of the IRWLS can be improved by using the proposed weighting function.

  • Evaluation of an FPGA-Based Heterogeneous Multicore Platform with SIMD/MIMD Custom Accelerators

    Yasuhiro TAKEI  Hasitha Muthumala WAIDYASOORIYA  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E96-A No:12
      Page(s):
    2576-2586

    Heterogeneous multi-core architectures with CPUs and accelerators attract much attention since they can achieve power-efficient computing in various areas, from low-power embedded processing to high-performance computing. Since the optimal architecture differs from application to application, finding the most suitable accelerator is very important. In this paper, we propose an FPGA-based heterogeneous multi-core platform with custom accelerators for power-efficient computing. Using the proposed platform, we evaluate several applications and accelerators to identify key requirements of the applications and properties of the accelerators. Such an evaluation is very important for selecting and optimizing the most suitable accelerator according to the requirements of an application to achieve the best performance.

  • A 5.83pJ/bit/iteration High-Parallel Performance-Aware LDPC Decoder IP Core Design for WiMAX in 65nm CMOS

    Xiongxin ZHAO  Zhixiang CHEN  Xiao PENG  Dajiang ZHOU  Satoshi GOTO  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E96-A No:12
      Page(s):
    2623-2632

    In this paper, we propose a synthesizable LDPC decoder IP core for the WiMAX system with high parallelism and enhanced error-correcting performance. By taking advantage of both layered scheduling and a fully-parallel architecture, the decoder can fully support the multi-mode decoding specified in WiMAX with parallelism much higher than that of the commonly used partial-parallel layered LDPC decoder architecture. The 6-bit quantized messages are split in bit-serial style, and 2-bit-wide serial processing lines work concurrently so that only 3 cycles are required to decode one layer. As a result, 12∼24 cycles are enough to process one iteration for all the code rates specified in WiMAX. Compared to our previous bit-serial decoder, it doubles the parallelism and solves the message saturation problem of bit-serial arithmetic, with a minor gate count increase. The power synthesis result shows that the proposed decoder achieves an energy efficiency of 5.83pJ/bit/iteration, which is a 46.8% improvement over state-of-the-art work. Furthermore, an advanced dynamic quantization (ADQ) technique is proposed to enhance the error-correcting performance in the layered decoder architecture. With about 2% area overhead, 6-bit ADQ can achieve error-correcting performance close to that of 7-bit fixed quantization, with improved error floor performance.

  • High Performance NAND Flash Memory System with a Data Buffer

    Jung-Hoon LEE  Bo-Sung JUNG  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E96-A No:12
      Page(s):
    2645-2651

    The objective of this research is to design a high-performance NAND flash memory system with a data buffer. The proposed buffer system in the NAND flash memory consists of two parts, i.e., a fully associative temporal buffer for temporal locality and a fully associative spatial buffer for spatial locality. We propose a new operating mechanism for reducing the overhead of flash memory, that is, erase and write operations. According to our simulation results, the proposed buffer system can reduce the write and erase operations by about 73% and 79% for the spec application, respectively, compared with a fully associative buffer with twice the space. Furthermore, the average memory access time can improve by about 60% compared with other large buffer systems.

  • Clique-Based Architectural Synthesis of Flow-Based Microfluidic Biochips

    Trung Anh DINH  Shigeru YAMASHITA  Tsung-Yi HO  Yuko HARA-AZUMI  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E96-A No:12
      Page(s):
    2668-2679

    Microfluidic biochips, also referred to as “lab-on-a-chip,” have been recently proposed to integrate all the necessary functions for biochemical analyses. This technology starts a new era of biological science, in which electronics and biology are combined for the first time. There are several types of microfluidic biochips; among them, there has been great interest in flow-based microfluidic biochips, in which the flows of liquid are manipulated using integrated microvalves. By combining several microvalves, more complex resource units such as micropumps, switches and mixers can be built. For efficient execution, the flows of liquid in microfluidic biochips need to be scheduled under some resource constraints and routing constraints. The execution time of a biochemical application depends strongly on the binding and scheduling result. Most previously developed binding and scheduling algorithms are based on heuristics, and there has been no method to obtain optimal results. Considering the above, we propose an optimal method by casting the problem as a clique problem. Moreover, this paper also presents some heuristic techniques for computational time reduction. Experiments demonstrate that the proposed method is able to reduce the execution time of biochemical applications by more than 15% compared with the previous approach. Moreover, the proposed heuristic method is able to produce results at little or no cost in optimality, in significantly shorter time than the optimal method.

  • Retrieval and Localization of Multiple Specific Objects with Hough Voting Based Ranking and A Contrario Decision

    Pradit MITTRAPIYANURUK  Pakorn KAEWTRAKULPONG  

     
    PAPER-Vision

      Vol:
    E96-A No:12
      Page(s):
    2717-2727

    We present an algorithm for simultaneously recognizing and localizing planar textured objects in an image. The algorithm scales efficiently with respect to a large number of objects added to the database. In contrast to the current state of the art in large-scale image search, our algorithm can work accurately with query images consisting of several specific objects and/or multiple instances of the same object. Our proposed algorithm consists of two major steps. The first step is to generate a set of hypotheses that provides information about the identities and the locations of objects in the image. To serve this purpose, we extend Bag-Of-Visual-Word (BOVW) image retrieval by incorporating a re-ranking scheme based on the Hough voting technique. Subsequently, in the second step, we propose a geometric verification algorithm based on the A Contrario decision framework to draw out the final detection results from the generated hypotheses. We demonstrate the performance of the algorithm on the scenario of recognizing CD covers with a database consisting of more than ten thousand images of different CD covers. Our algorithm yields detection results with more than 90% precision and recall within a few seconds of processing time per image.

  • Improving the Adaptive Steganographic Methods Based on Modulus Function

    Xin LIAO  Qiaoyan WEN  Jie ZHANG  

     
    LETTER-Cryptography and Information Security

      Vol:
    E96-A No:12
      Page(s):
    2731-2734

    This letter improves two adaptive steganographic methods in Refs. [5], [6], which utilize the remainders of two consecutive pixels to record the information of secret data. Through analysis, we point out that they behave incorrectly under some conditions, so that the recipient cannot extract the secret data exactly. We correct this by enlarging the adjusting range of the remainders of two consecutive pixels within the block in the embedding procedure. Furthermore, the readjusting phase in Ref. [6] is improved by allowing every two-pixel block to be fully modified, so that the sender can select the choice that introduces the smallest embedding distortion. Experimental results show that the improved method not only extracts secret data exactly but also reduces the embedding distortion.

  • An Interference-Aware Clustering Based on Genetic Algorithm for Cell Broadcasting Service

    Kyungho JUN  Sekchin CHANG  

     
    LETTER-Communication Theory and Signals

      Vol:
    E96-A No:12
      Page(s):
    2740-2744

    In this letter, we present a novel interference-aware clustering scheme for cell broadcasting service. The proposed approach is based on a genetic algorithm for re-clustering. Using the genetic algorithm, the suggested method efficiently re-clusters the user nodes when the relays fail to receive the cell broadcasting message from the base station. The simulation results show that the proposed clustering scheme can maintain much higher capacity than the conventional clustering scheme in cases of relay outage. The re-clustering method based on the genetic algorithm also shows lower complexity than the re-clustering approach based on exhaustive search.

  • On the Irreducibility of Certain Shifts of Finite Type

    Tetsuya KOBAYASHI  Akiko MANADA  Takahiro OTA  Hiroyoshi MORITA  

     
    PAPER-Sequence

      Vol:
    E96-A No:12
      Page(s):
    2415-2421

    A shift of finite type (SFT) is a set of all bi-infinite sequences over some alphabet which is characterized by a finite set of forbidden words. It is a typical example of sofic shifts and has been used in storage media such as CDs and DVDs. The study of sofic shifts is based on graph theory, and the irreducibility of shifts is an important property to be considered in that study. In this paper, we provide some sufficient conditions for an SFT to be irreducible from the perspective of the antidictionary of a word and the number of forbidden words. We also present a necessary and sufficient condition for an SFT to be irreducible when the number of forbidden words is one less than the alphabet size.

  • FOREWORD Open Access

    Shigeki NAKAGAWA  

     
    FOREWORD

      Vol:
    E96-C No:12
      Page(s):
    1451-1451
  • Depth Perception Control during Car Vibration by Hidden Images on Monocular Head-Up Display

    Tsuyoshi TASAKI  Akihisa MORIYA  Aira HOTTA  Takashi SASAKI  Haruhiko OKUMURA  

     
    PAPER-Multimedia Pattern Processing

      Vol:
    E96-D No:12
      Page(s):
    2850-2856

    A novel depth perception control method for a monocular head-up display (HUD) in a car has been developed, which is called the dynamic perspective method. The method changes the size and position of a HUD image, such as an arrow, for depth perception and achieves a depth perception position of 120 [m] within an error of 30% in a simulation. However, it is difficult to achieve accurate depth perception in the real world because of car vibration. To solve this problem, we focus on the property that people complement hidden images using previous continuously observed images. We hide the image on the HUD when the car vibrates strongly. We aim to point at the accurate depth position by using see-through HUD images while having users complement the hidden image positions based on the continuous images seen before the car vibration. We developed a car that detects large vibrations with an acceleration sensor and is equipped with our monocular HUD. Our new method pointed at the depth position more accurately than the previous method, which was confirmed by a t-test.

  • A GPU Implementation of Dynamic Programming for the Optimal Polygon Triangulation

    Yasuaki ITO  Koji NAKANO  

     
    PAPER

      Vol:
    E96-D No:12
      Page(s):
    2596-2603

    This paper presents a GPU (Graphics Processing Unit) implementation of dynamic programming for the optimal polygon triangulation. GPUs can now be used for general-purpose parallel computation. Users can develop parallel programs running on GPUs using the programming architecture called CUDA (Compute Unified Device Architecture) provided by NVIDIA. The optimal polygon triangulation problem for a convex polygon is an optimization problem to find a triangulation with minimum total weight. It is known that this problem for a convex n-gon can be solved using the dynamic programming technique in $O(n^3)$ time using a work space of size $O(n^2)$. In this paper, we propose an efficient parallel implementation of this $O(n^3)$-time algorithm on the GPU. In our implementation, we have used two new ideas to accelerate the dynamic programming. The first idea (adaptive granularity) is to partition the dynamic programming algorithm into many sequential kernel calls of CUDA, and to select the best parameters for the size and the number of blocks for each kernel call. The second idea (sliding and mirroring arrangements) is to arrange the working data for coalesced access of the global memory in the GPU to minimize the memory access overhead. Our implementation using these two ideas solves the optimal polygon triangulation problem for a convex 8192-gon in 5.57 seconds on the NVIDIA GeForce GTX 680, while a conventional CPU implementation runs in 1939.02 seconds. Thus, our GPU implementation attains a speedup factor of 348.02.
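The dynamic programming recurrence mentioned in the abstract above can be illustrated with a minimal sequential sketch. This is not the authors' GPU implementation: the function name `optimal_triangulation_weight`, the `chord_weight` callback, and the assumption that the total weight is the sum of the weights of the chords used (polygon edges cost nothing) are all hypothetical choices made for illustration. It runs in cubic time with a quadratic-size table, matching the bounds quoted in the abstract.

```python
def optimal_triangulation_weight(n, chord_weight):
    """Minimum total chord weight over all triangulations of a convex n-gon.

    Vertices are numbered 0..n-1 around the polygon. chord_weight(a, b)
    (hypothetical callback, a < b) returns the weight of the chord joining
    vertices a and b. Polygon edges are assumed to have weight 0.
    """
    def cost(a, b):
        # Adjacent vertices form a polygon edge, which is free in this sketch.
        return 0 if b - a == 1 else chord_weight(a, b)

    # m[i][j]: best weight for the sub-polygon on vertices i..j,
    # not counting the segment (i, j) itself.
    m = [[0] * n for _ in range(n)]
    for span in range(2, n):            # sub-polygon sizes, smallest first
        for i in range(n - span):
            j = i + span
            # Choose the apex k of the triangle (i, k, j).
            m[i][j] = min(
                m[i][k] + m[k][j] + cost(i, k) + cost(k, j)
                for k in range(i + 1, j)
            )
    return m[0][n - 1]


if __name__ == "__main__":
    # A square with two possible chords: (0, 2) weighs 5, (1, 3) weighs 3.
    weights = {(0, 2): 5, (1, 3): 3}
    print(optimal_triangulation_weight(4, lambda a, b: weights[(a, b)]))
```

In this sketch every entry on a given diagonal (fixed `span`) depends only on shorter spans, which is the kind of independence a parallel implementation could exploit; how the paper actually partitions the work is described only at the level of the abstract.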

  • HiCrypt: A Specialized Translator for Symmetric Block Cipher and GPGPU

    Keisuke IWAI  Naoki NISHIKAWA  Takakazu KUROKAWA  

     
    PAPER

      Vol:
    E96-D No:12
      Page(s):
    2575-2586

    Many-core computer systems with GPUs are coming into mainstream use, from high-end computing, including supercomputers, to embedded processors. Consequently, the implementation of cryptographic methods on GPGPU is also becoming popular because of such systems' performance. However, many factors affect the performance of GPUs. To cope with this problem, we developed a new translator, HiCrypt, which can generate an optimized GPGPU program written in both CUDA and OpenCL from a cipher program written in the standard C language with directives. Users need to annotate only variables and an encoding/decoding function, which are characteristic of cipher programs, with directives. To evaluate the translator, five representative cipher programs are translated into CUDA and OpenCL programs by the translator. The generated programs achieve high throughput almost identical to that of hand-optimized programs for all five cipher programs. HiCrypt will contribute to the development and evaluation of new and various symmetric block ciphers using GPGPU.

  • Network Interface Architecture with Scalable Low-Latency Message Receiving Mechanism

    Noboru TANABE  Atsushi OHTA  

     
    PAPER

      Vol:
    E96-D No:12
      Page(s):
    2536-2544

    Most scientists other than computer scientists do not want to spend effort on performance tuning by rewriting their MPI applications. In addition, the number of processing elements available to them is increasing year by year. On large-scale parallel systems, the number of messages accumulated in a message buffer tends to increase in some of their applications. Since searching the message queue in MPI is time-consuming, scalable system-side acceleration is needed for those systems. In this paper, a support function named LHS (Limited-length Head Separation) is proposed. Its performance in searching the message buffer and its hardware cost are evaluated. LHS accelerates searching of the message buffer by switching the location in which limited-length heads of messages are stored. It exploits effects such as an increased cache hit rate on the host, with partial off-loading to hardware. The searching speed of the message buffer when the order of message reception differs from the receiver's expectation is accelerated 14.3 times with LHS on an FPGA-based network interface card (NIC) named DIMMnet-2. This absolute performance is 38.5 times higher than that of IBM BlueGene/P, although the frequency is 8.5 times lower than that of BlueGene/P. LHS has higher scalability than ALPU in performance per frequency. Since these results were obtained with partially on-loaded linear searching on an old Pentium® 4, the performance gap will increase with a state-of-the-art CPU. Therefore, LHS is more suitable for larger parallel systems. Discussions on adopting the proposed method in state-of-the-art processors and systems are also presented.

  • A Fully Optical Ring Network-on-Chip with Static and Dynamic Wavelength Allocation

    Ahmadou Dit Adi CISSE  Michihiro KOIBUCHI  Masato YOSHIMI  Hidetsugu IRIE  Tsutomu YOSHINAGA  

     
    PAPER

      Vol:
    E96-D No:12
      Page(s):
    2545-2554

    Silicon photonics Networks-on-Chip (NoCs) have emerged as an attractive solution to alleviate the high power consumption of traditional electronic interconnects. In this paper, we propose a fully optical ring NoC that combines static and dynamic wavelength allocation communication mechanisms. A different wavelength-channel is statically allocated to each destination node for lightweight communication. Contention among simultaneous communication requests from multiple source nodes to the same destination is resolved by token-based arbitration for the particular wavelength-channel. For heavy-load communication, a multiwavelength-channel is available; a source node requests it at execution time from a special node that manages dynamic allocation of the shared multiwavelength-channel among all nodes. We combine these static and dynamic communication mechanisms in the same network, which introduces selection techniques based on message size and congestion information. Using a photonic NoC simulator based on Phoenixsim, we evaluate our architecture under uniform random, neighbor, and hotspot traffic patterns. Simulation results show that our proposed fully optical ring NoC achieves good performance by utilizing adequate static and dynamic channels based on the selection techniques. We also show that our architecture can reduce the energy consumption necessary for arbitration by more than half compared to hybrid photonic ring and mesh NoCs. A comparison with several previous works in terms of architecture hardware cost shows that our architecture can be an attractive, cost-performance-efficient interconnection infrastructure for future SoCs and CMPs.

  • Optimal Parallel Algorithms for Computing the Sum, the Prefix-Sums, and the Summed Area Table on the Memory Machine Models

    Koji NAKANO  

     
    PAPER

      Vol:
    E96-D No:12
      Page(s):
    2626-2634

    The main contribution of this paper is to show optimal parallel algorithms to compute the sum, the prefix-sums, and the summed area table on two memory machine models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). The DMM and the UMM are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs. These models have three parameters, the number p of threads, and the width w of the memory, and the memory access latency l. We first show that the sum of n numbers can be computed in $O({n\over w}+{nl\over p}+l\log n)$ time units on the DMM and the UMM. We then go on to show that $\Omega({n\over w}+{nl\over p}+l\log n)$ time units are necessary to compute the sum. We also present a parallel algorithm that computes the prefix-sums of n numbers in $O({n\over w}+{nl\over p}+l\log n)$ time units on the DMM and the UMM. Finally, we show that the summed area table of size $\sqrt{n}\times\sqrt{n}$ can be computed in $O({n\over w}+{nl\over p}+l\log n)$ time units on the DMM and the UMM. Since the computation of the prefix-sums and the summed area table is at least as hard as the sum computation, these parallel algorithms are also optimal.
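For readers unfamiliar with the two quantities named in the abstract above, the following is a minimal sequential sketch of what prefix-sums and a summed area table are. It is not the paper's parallel algorithm for the DMM or UMM, and the function names `prefix_sums` and `summed_area_table` are hypothetical. The sketch assumes the usual definition in which entry (r, c) of the table holds the sum of all input values in rows 0..r and columns 0..c.

```python
def prefix_sums(values):
    """Return the running totals: out[i] = values[0] + ... + values[i]."""
    out = []
    total = 0
    for x in values:
        total += x
        out.append(total)
    return out


def summed_area_table(grid):
    """Return the summed area table of a rectangular list of lists.

    The table is built as row-wise prefix sums followed by column-wise
    prefix sums of the intermediate result.
    """
    row_pass = [prefix_sums(row) for row in grid]
    # zip(*rows) transposes, so each column can reuse prefix_sums.
    col_pass = [prefix_sums(col) for col in zip(*row_pass)]
    return [list(row) for row in zip(*col_pass)]


if __name__ == "__main__":
    print(prefix_sums([1, 2, 3, 4]))
    print(summed_area_table([[1, 2], [3, 4]]))
```

The two-pass structure makes it visible why the summed area table is at least as hard as the prefix-sums, which in turn is at least as hard as the plain sum: the last entry of each result is the total of the input.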

  • An Efficiency-Aware Scheduling for Data-Intensive Computations on MapReduce Clusters

    Hui ZHAO  Shuqiang YANG  Hua FAN  Zhikun CHEN  Jinghu XU  

     
    PAPER

      Vol:
    E96-D No:12
      Page(s):
    2654-2662

    Scheduling plays a key role in MapReduce systems. In this paper, we explore the efficiency of a MapReduce cluster running many independent and continuously arriving MapReduce jobs. Data locality and load balancing are two important factors for improving computation efficiency in MapReduce systems for data-intensive computations. Traditional cluster scheduling technologies are not well suited to the MapReduce environment. There are some schedulers in use for the popular open-source Hadoop MapReduce implementation; however, they cannot optimize both factors well. Our main objective is to minimize the total flowtime of all jobs. Given that this is a strongly NP-hard problem, we adopt some effective heuristics to seek a satisfactory solution. In this paper, we formalize the scheduling problem as a job selection problem and propose a load-balance-aware job selection algorithm. At the task level, we design a strict data locality task scheduling algorithm for map tasks on map machines and a load-balance-aware scheduling algorithm for reduce tasks on reduce machines. Comprehensive experiments have been conducted to compare our scheduling strategy with well-known Hadoop scheduling strategies. The experimental results validate the efficiency of our proposed scheduling strategy.
