
Author Search Result

[Author] Kouji KURIHARA (2 hits)

  • Accelerating LiNGAM Causal Discovery with Massive Parallel Execution on Supercomputer Fugaku

    Kazuhito MATSUDA, Kouji KURIHARA, Kentaro KAWAKAMI, Masafumi YAMAZAKI, Fuyuka YAMADA, Tsuguchika TABARU, Ken YOKOYAMA

    PAPER
    Publicized: 2022/06/09
    Vol: E105-D No:12
    Page(s): 2032-2039

    Statistical causal discovery is an approach to inferring the causal relationships between observed variables whose causal structure is not known in advance. LiNGAM (Linear Non-Gaussian Acyclic Model), an algorithm for causal discovery, can identify the causal relationships uniquely if the independent components of the variables are assumed to be non-Gaussian. However, LiNGAM's use cases are limited by its O(d_x^3) computational complexity, where d_x is the number of variables. This paper presents two approaches to accelerating LiNGAM causal discovery: SIMD utilization for LiNGAM's matrix operations and MPI parallelization. We evaluate the implementation on the supercomputer Fugaku. Using 96 nodes of Fugaku, our improved version runs 17,531 times faster than the original OSS implementation (completed in 17.7 hours).

  • A Binary Translator to Accelerate Development of Deep Learning Processing Library for AArch64 CPU (Open Access)

    Kentaro KAWAKAMI, Kouji KURIHARA, Masafumi YAMAZAKI, Takumi HONDA, Naoto FUKUMOTO

    PAPER
    Publicized: 2021/12/03
    Vol: E105-C No:6
    Page(s): 222-231

    To accelerate deep learning (DL) processes on the supercomputer Fugaku, the authors have ported and optimized oneDNN for Fugaku's CPU, the Fujitsu A64FX. oneDNN is an open-source DL processing library developed by Intel for the x86_64 architecture, whereas the A64FX CPU is based on the Armv8-A architecture. oneDNN dynamically generates the execution code for its computation kernels, which are written at the granularity of x86_64 instructions using Xbyak, the Just-In-Time (JIT) assembler for the x86_64 architecture. Porting oneDNN to A64FX therefore requires rewriting these kernels in Armv8-A instructions using Xbyak_aarch64, the JIT assembler for the Armv8-A architecture. This is challenging because the code to be rewritten amounts to several tens of thousands of lines. This study presents Xbyak_translator_aarch64, a binary translator that converts, at runtime, dynamically generated executable code for the x86_64 architecture into executable code for the Armv8-A architecture. Xbyak_translator_aarch64 eliminates the need to rewrite the source code when porting oneDNN to A64FX and allows oneDNN to be ported to A64FX quickly.
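For the LiNGAM entry above, the following is a minimal Python sketch of where an O(d_x^3) cost can come from and how the work could be split across processes. It assumes a DirectLiNGAM-style causal ordering (the abstract does not say which LiNGAM variant or OSS package was used): at each of the d_x steps, every ordered pair of still-unordered variables is scored once for independence, and one variable is then removed. The names `pair_evaluations` and `split_pairs` are hypothetical, and the round-robin split only stands in for an MPI decomposition; it is not the paper's scheme. Each pair evaluation also scans all samples, so the per-pair cost is omitted here.

```python
from __future__ import annotations


def pair_evaluations(d: int) -> int:
    """Count pairwise independence evaluations in a DirectLiNGAM-style ordering.

    Assumption: each ordering step scores every ordered pair (i, j), i != j,
    among the variables that are still unordered, then removes one variable.
    """
    total = 0
    for remaining in range(d, 1, -1):  # d, d-1, ..., 2 unordered variables
        total += remaining * (remaining - 1)
    return total  # equals (d**3 - d) // 3, i.e. O(d^3) evaluations


def split_pairs(remaining: list[int], rank: int, size: int) -> list[tuple[int, int]]:
    """Round-robin share of one step's ordered pairs for worker `rank` of `size`.

    Hypothetical stand-in for an MPI decomposition: each rank would score its
    slice, and the per-variable scores would then be summed across ranks
    (for example with an allreduce) before the next variable is chosen.
    """
    pairs = [(i, j) for i in remaining for j in remaining if i != j]
    return pairs[rank::size]


if __name__ == "__main__":
    print(pair_evaluations(10))  # 330
    print(split_pairs([0, 1, 2], rank=0, size=2))  # [(0, 1), (1, 0), (2, 0)]
```

Under this assumption, doubling d_x multiplies the pair count by roughly eight, which is why distributing the pairs of each step across nodes, as the abstract's MPI approach does in some form, pays off for large d_x.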
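For the Xbyak_translator_aarch64 entry above, the following toy Python sketch illustrates the kind of mapping a binary translator has to perform when x86_64 kernel code is retargeted to Armv8-A with SVE. It is an assumption-laden illustration, not the tool's design: the real translator works at runtime on code emitted through Xbyak, whereas this sketch rewrites Intel-syntax text for a handful of instructions, and the register map (`GPR_MAP`), the use of `p0` as a scratch predicate, and the function names are all hypothetical. It shows two recurring issues: x86's two-operand destructive form becoming AArch64's three-operand form, and one x86 instruction expanding into several Armv8-A instructions.

```python
from __future__ import annotations

# Hypothetical register assignment; not the mapping used by Xbyak_translator_aarch64.
GPR_MAP = {"rax": "x0", "rcx": "x1", "rdx": "x2", "rbx": "x3", "rsi": "x4", "rdi": "x5"}

# AVX-512 packed single-precision ops -> unpredicated SVE ops (512-bit vectors on A64FX).
VEC_BINOPS = {"vaddps": "fadd", "vsubps": "fsub", "vmulps": "fmul"}


def gpr(name: str) -> str:
    if name not in GPR_MAP:
        raise ValueError(f"unsupported general-purpose register: {name}")
    return GPR_MAP[name]


def zreg(name: str) -> str:
    # zmm0..zmm31 -> z0.s..z31.s (32-bit float lanes)
    if not name.startswith("zmm") or not name[3:].isdigit() or int(name[3:]) > 31:
        raise ValueError(f"unsupported vector register: {name}")
    return f"z{int(name[3:])}.s"


def translate(insn: str) -> list[str]:
    """Translate one x86_64 instruction (Intel-syntax text) into Armv8-A/SVE lines."""
    mnemonic, _, rest = insn.strip().partition(" ")
    ops = [op.strip() for op in rest.split(",")] if rest.strip() else []
    if mnemonic in ("add", "sub") and len(ops) == 2:
        # x86 is two-operand (dst = dst op src); AArch64 is three-operand.
        # Simplification: x86 flag updates are ignored (no ADDS/SUBS emitted).
        dst, src = gpr(ops[0]), gpr(ops[1])
        return [f"{mnemonic} {dst}, {dst}, {src}"]
    if mnemonic == "mov" and len(ops) == 2:
        return [f"mov {gpr(ops[0])}, {gpr(ops[1])}"]
    if mnemonic in VEC_BINOPS and len(ops) == 3:
        d, a, b = (zreg(op) for op in ops)
        return [f"{VEC_BINOPS[mnemonic]} {d}, {a}, {b}"]
    if mnemonic == "vfmadd231ps" and len(ops) == 3:
        # dst = src1 * src2 + dst maps to predicated FMLA, which needs an
        # all-true predicate, so one x86 instruction becomes two here.
        # Assumption: p0 is free to use as a scratch predicate register.
        d, a, b = (zreg(op) for op in ops)
        return ["ptrue p0.s", f"fmla {d}, p0/m, {a}, {b}"]
    raise ValueError(f"unsupported instruction: {insn!r}")


if __name__ == "__main__":
    for line in translate("vfmadd231ps zmm0, zmm1, zmm2"):
        print(line)  # "ptrue p0.s", then "fmla z0.s, p0/m, z1.s, z2.s"
```

A production translator must also preserve details this sketch ignores, such as condition flags, memory operands, and register pressure, which is part of what makes translating tens of thousands of lines of kernel code nontrivial.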