Author Search Result

[Author] Chao LU(7hit)

1-7hit
  • A New Method for Chromatic Dispersion Measurement of WDM Components Using Photonic Microwave Technique

    Xiaoke YI  Chao LU  Fang WEI  Wen De ZHONG  Yixin WANG  

     
    PAPER-Measurements Techniques

      Vol:
    E86-C No:7
      Page(s):
    1359-1365

    In this paper, we propose a new method for measuring the chromatic dispersion of WDM components in both transmission and reflection, employing photonic microwave technology. The dispersion is determined by measuring the change in the frequency spectrum range of the microwave notch filter. The method offers the advantages of low cost and simplicity. Experimental results demonstrate that our setup can measure relative group delay with a time resolution better than 1 ps, and the measurement results agree well with those obtained by the conventional phase-shift technique.

  • Efficient and Precise Profiling, Modeling and Management on Power and Performance for Power Constrained HPC Systems

    Yuan HE  Yasutaka WADA  Wenchao LUO  Ryuichi SAKAMOTO  Guanqin PAN  Thang CAO  Masaaki KONDO  

     
    PAPER

      Publicized:
    2020/12/01
      Vol:
    E104-C No:6
      Page(s):
    237-246

    Due to the slowdown of Moore's Law, power limitation has become one of the most critical issues for current and future HPC systems. To utilize HPC systems more efficiently when power budgets or deadlines are given, it is highly desirable to accurately estimate the performance or power consumption of applications before conducting their tuned production runs on any specific system. To ease such estimations, we showcase a straightforward yet effective method, based on the enhanced power management framework and DSL we developed, that helps HPC users clarify the performance and power relationships of their applications. This method demonstrates an easy process for profiling, modeling and managing both the performance and power of HPC systems and applications. In our evaluations, only a few (up to 3) profiled runs are needed to obtain very precise models of HPC applications through this method (and algorithm), which dramatically improves the efficiency of utilizing HPC systems under limited power budgets and lowers the difficulty of doing so.

  • Hybrid Message-Passing Algorithm and Architecture for Decoding Cyclic Non-binary LDPC Codes

    Yichao LU  Gang HE  Guifen TIAN  Satoshi GOTO  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E96-A No:12
      Page(s):
    2652-2659

    Recently, non-binary low-density parity-check (NB-LDPC) codes have started to show their superiority in achieving significant coding gains when moderate codeword lengths are adopted. However, their overwhelming decoding complexity keeps NB-LDPC codes from being widely employed in modern communication devices. This paper proposes a hybrid message-passing decoding algorithm with very low computational complexity. It achieves competitive error performance compared with the conventional Min-max algorithm. Simulation results on a (255,174) cyclic code show that this algorithm obtains at least 0.5 dB coding gain over other state-of-the-art low-complexity NB-LDPC decoding algorithms. A partial-parallel decoder architecture for cyclic NB-LDPC codes is also developed based on this algorithm. Optimization schemes are employed to remove hard-decision symbols from the RAMs and to store only part of the reliability messages. In addition, the variable node units are redesigned specifically for the proposed algorithm. Synthesis results demonstrate that about 24.3% of the gates and 12% of the memory can be saved compared with previous works.

  • Block Randomized Singular Value Decomposition on GPUs

    Yuechao LU  Yasuyuki MATSUSHITA  Fumihiko INO  

     
    PAPER-Dependable Computing

      Publicized:
    2020/06/08
      Vol:
    E103-D No:9
      Page(s):
    1949-1959

    Fast computation of singular value decomposition (SVD) is of great interest in various machine learning tasks. Recently, SVD methods based on randomized linear algebra have shown significant speedup in this regime. For processing large-scale data, computing systems with accelerators such as GPUs have become the mainstream approach. In those systems, access to the input data dominates the overall processing time; therefore, an out-of-core algorithm is needed to dispatch the computation to the accelerators. This paper proposes an accurate two-pass randomized SVD, named block randomized SVD (BRSVD), designed for matrices with a slowly decaying singular spectrum, which is often observed in image data. BRSVD fully utilizes the power of modern computing system architectures and efficiently processes large-scale data in a parallel and out-of-core fashion. Our experiments show that BRSVD effectively moves the performance bottleneck from data transfer to computation, so that it outperforms existing randomized SVD methods in terms of speed while retaining similar accuracy.
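
For readers unfamiliar with randomized SVD, the following is a minimal sketch of the generic two-pass scheme (random range finding followed by a small exact SVD). It is not the BRSVD algorithm from the paper, whose block structure and out-of-core GPU scheduling are not described in the abstract. The function name `randomized_svd` and the `oversample` parameter are assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def randomized_svd(A, rank, oversample=10, seed=0):
    """Approximate the top-`rank` SVD of A with two passes over A.

    Illustrative generic scheme only; names and defaults are hypothetical.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sketch width: target rank plus a few extra columns, capped by the matrix size.
    k = min(rank + oversample, m, n)

    # Pass 1 over A: project onto a random subspace to capture its range.
    omega = rng.standard_normal((n, k))
    Y = A @ omega

    # Orthonormal basis Q (m x k) whose span contains the span of the sketch Y.
    Q, _ = np.linalg.qr(Y)

    # Pass 2 over A: compress A into the small k x n matrix B = Q^T A.
    B = Q.T @ A

    # Exact SVD of the small matrix, then lift the left vectors back to m dimensions.
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic test matrix of exact rank 8 (hypothetical data).
    A = rng.standard_normal((200, 8)) @ rng.standard_normal((8, 100))
    U, s, Vt = randomized_svd(A, rank=8)
    rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
    print("relative reconstruction error:", rel_err)
```

In this sketch, the matrix `A` is read only in the two products `A @ omega` and `Q.T @ A`, which is why such schemes suit settings where data access dominates the running time. For matrices whose singular values decay slowly, the plain scheme above loses accuracy, which is the case the abstract says BRSVD is designed to address.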

  • Quantifying Resiliency of Virtualized System with Software Rejuvenation

    Hiroyuki OKAMURA  Jungang GUAN  Chao LUO  Tadashi DOHI  

     
    PAPER

      Vol:
    E98-A No:10
      Page(s):
    2051-2059

    This paper considers how to evaluate the resiliency of a virtualized system with software rejuvenation. Software rejuvenation is a proactive technique for preventing failures caused by aging phenomena such as resource exhaustion. In particular, following Ghosh et al. (2010), we compute a quantitative criterion for evaluating the resiliency of a system by using continuous-time Markov chains (CTMCs). In addition, to convert general state-based models to CTMCs, we employ the PH (phase-type) expansion technique. In numerical examples, we investigate the resiliency of a virtualized system with software rejuvenation under two different rejuvenation policies.
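
As background for the CTMC-based evaluation mentioned above, here is a minimal sketch of the basic computation on a CTMC: solving for its steady-state distribution from the generator matrix. It does not reproduce the paper's resiliency criterion or the PH expansion. The three-state model (robust, aged, down) and all rates are hypothetical values chosen for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def steady_state(Q):
    """Return the stationary distribution pi of a CTMC with generator Q.

    Solves pi @ Q = 0 together with sum(pi) = 1 as one least-squares system.
    Assumes the chain is irreducible, so the solution is unique.
    """
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    # Stack the balance equations (Q^T pi = 0) on top of the normalization row.
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


# Hypothetical software-aging model; all rates are illustrative, per hour.
AGING = 0.1         # robust -> aged
REJUVENATION = 0.2  # aged -> robust (proactive rejuvenation)
FAILURE = 0.05      # aged -> down
REPAIR = 1.0        # down -> robust

# Generator matrix: off-diagonal entries are rates, each row sums to zero.
Q_MODEL = np.array([
    [-AGING, AGING, 0.0],
    [REJUVENATION, -(REJUVENATION + FAILURE), FAILURE],
    [REPAIR, 0.0, -REPAIR],
])

if __name__ == "__main__":
    pi = steady_state(Q_MODEL)
    print("steady-state probabilities (robust, aged, down):", pi)
    print("steady-state availability:", pi[0] + pi[1])
```

A resiliency analysis would typically go further and study how such probabilities evolve over time after a change in the system; the steady-state solve above is only the simplest building block.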

  • Cache-Aware GPU Optimization for Out-of-Core Cone Beam CT Reconstruction of High-Resolution Volumes

    Yuechao LU  Fumihiko INO  Kenichi HAGIHARA  

     
    PAPER-Computer System

      Publicized:
    2016/09/05
      Vol:
    E99-D No:12
      Page(s):
    3060-3071

    This paper proposes a cache-aware optimization method to accelerate out-of-core cone beam computed tomography reconstruction on a graphics processing unit (GPU) device. Our proposed method extends a previous method by increasing the cache hit rate so as to speed up the reconstruction of high-resolution volumes that exceed the capacity of device memory. More specifically, our approach accelerates the well-known Feldkamp-Davis-Kress algorithm by utilizing the following three strategies: (1) a loop organization strategy that identifies the best tradeoff point between the cache hit rate and the number of off-chip memory accesses; (2) a data structure that exploits high locality within a layered texture; and (3) a fully pipelined strategy for hiding file input/output (I/O) time with GPU execution and data transfer times. We implement our proposed method on NVIDIA's latest Maxwell architecture and provide guidelines for tuning the execution parameters so as to maximize reconstruction performance; these parameters include the granularity and shape of thread blocks as well as the granularity of I/O data to be streamed through the pipeline. Our experimental results show that it took less than three minutes to reconstruct a 2048³-voxel volume from 1200 2048²-pixel projection images on a single GPU; this translates to a speedup of approximately 1.47 as compared to the previous method.

  • Dynamic Check Message Majority-Logic Decoding Algorithm for Non-binary LDPC Codes

    Yichao LU  Xiao PENG  Guifen TIAN  Satoshi GOTO  

     
    PAPER

      Vol:
    E97-A No:6
      Page(s):
    1356-1364

    Majority-logic algorithms are devised for decoding non-binary LDPC codes in order to reduce computational complexity. However, compared with conventional belief propagation algorithms, majority-logic algorithms suffer from severe degradation in bit error performance. This paper presents a low-complexity reliability-based algorithm that aims to improve the error-correcting ability of majority-logic algorithms. Reliability measures for check nodes are newly introduced to realize mutual updates between variable messages and check messages, so that more efficient reliability propagation can be achieved, similar to the belief-propagation algorithm. Simulation results on NB-LDPC codes with different characteristics demonstrate that, compared with both the ISRB-MLGD and IISRB-MLGD algorithms, our algorithm can reduce the bit error ratio by more than one order of magnitude, and the coding gain enhancement over ISRB-MLGD can reach 0.2-2.0 dB. Moreover, simulations on typical LDPC codes show that the computational complexity of the proposed algorithm is close to that of the ISRB-MLGD algorithm and is less than 10% of that of the Min-max algorithm. As a result, the proposed algorithm achieves a more efficient trade-off between decoding computational complexity and error performance.