Kazuhito MATSUDA Kouji KURIHARA Kentaro KAWAKAMI Masafumi YAMAZAKI Fuyuka YAMADA Tsuguchika TABARU Ken YOKOYAMA
Statistical causal discovery is an approach for inferring causal relationships among observed variables whose causal structure is unknown. LiNGAM (Linear Non-Gaussian Acyclic Model), an algorithm for causal discovery, can identify the causal relationships uniquely if the independent components of the variables are assumed to be non-Gaussian. However, use cases of LiNGAM are limited by its $O(d_x^3)$ computational complexity, where $d_x$ is the number of variables. This paper presents two approaches to accelerating LiNGAM causal discovery: SIMD utilization for LiNGAM's matrix operations and MPI parallelization. We evaluate the implementation on the supercomputer Fugaku. Using 96 nodes of Fugaku, our improved version runs 17,531 times faster than the original OSS implementation (completed in 17.7 hours).
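To illustrate where the cubic cost comes from and why it parallelizes, the following is a minimal sketch, not the authors' implementation. It assumes a DirectLiNGAM-style search (the abstract does not say which LiNGAM variant is used): each round compares every ordered pair of remaining variables, then removes one variable. The function names `pair_evaluations` and `split_pairs` are hypothetical, and the worker split only stands in for MPI ranks.

```python
from itertools import permutations


def pair_evaluations(d):
    """Count ordered-pair comparisons over all rounds for d variables.

    Assumption: each round compares every ordered pair of the remaining
    variables, then removes one variable (DirectLiNGAM-style search).
    """
    total = 0
    remaining = d
    while remaining > 1:
        total += remaining * (remaining - 1)
        remaining -= 1
    return total  # equals (d**3 - d) // 3, i.e. cubic in d


def split_pairs(variables, n_workers):
    """Distribute one round's ordered pairs round-robin over workers.

    Hypothetical stand-in for MPI ranks: the pairs within a round are
    independent of each other, so each worker can process its share alone.
    """
    pairs = list(permutations(variables, 2))
    return [pairs[rank::n_workers] for rank in range(n_workers)]


if __name__ == "__main__":
    for d in (10, 100, 1000):
        print(d, pair_evaluations(d))
    shares = split_pairs(range(6), 4)
    print([len(share) for share in shares])
```

Under these assumptions, growing the number of variables tenfold increases the pair count about a thousandfold, which is why distributing the pairs of each round across many nodes pays off.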
Kentaro KAWAKAMI Kouji KURIHARA Masafumi YAMAZAKI Takumi HONDA Naoto FUKUMOTO
To accelerate deep learning (DL) processing on the supercomputer Fugaku, the authors have ported and optimized oneDNN for Fugaku's CPU, the Fujitsu A64FX. oneDNN is an open-source DL processing library developed by Intel for the x86_64 architecture, whereas the A64FX CPU is based on the Armv8-A architecture. oneDNN dynamically generates the executable code for its computation kernels, which are implemented at the granularity of x86_64 instructions using Xbyak, a Just-In-Time (JIT) assembler for the x86_64 architecture. To port oneDNN to A64FX, these kernels must be rewritten as Armv8-A instructions using Xbyak_aarch64, a JIT assembler for the Armv8-A architecture. This is challenging because the source code to be rewritten runs to several tens of thousands of lines. This study presents Xbyak_translator_aarch64, a binary translator that converts dynamically generated executable code for the x86_64 architecture into executable code for the Armv8-A architecture at runtime. Xbyak_translator_aarch64 eliminates the need to rewrite the source code and allows oneDNN to be ported to A64FX quickly.
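The translation idea can be illustrated with a toy sketch. This is not Xbyak_translator_aarch64 itself: the real tool works on generated machine code at runtime, while this example works on symbolic instruction tuples. The register mapping `REG_MAP` and the functions `translate` and `translate_block` are hypothetical. The sketch only shows one characteristic step, rewriting two-operand x86_64 arithmetic (destination doubles as a source) into the three-operand AArch64 form.

```python
# Hypothetical register mapping; the abstract does not describe how the
# real translator assigns x86_64 registers to AArch64 registers.
REG_MAP = {"rax": "x0", "rbx": "x1", "rcx": "x2", "rdx": "x3"}


def translate(instr):
    """Translate one symbolic x86_64 instruction (op, dst, src) to AArch64."""
    op, dst, src = instr
    d, s = REG_MAP[dst], REG_MAP[src]
    if op == "mov":
        # Register-to-register move keeps the two-operand shape.
        return ("mov", d, s)
    if op in ("add", "sub"):
        # x86_64: dst = dst op src. AArch64 names the first source explicitly.
        return (op, d, d, s)
    raise NotImplementedError(f"unsupported instruction in this toy: {op}")


def translate_block(block):
    """Translate a sequence of instructions, preserving their order."""
    return [translate(instr) for instr in block]


if __name__ == "__main__":
    x86_block = [("mov", "rax", "rbx"), ("add", "rax", "rcx")]
    for out in translate_block(x86_block):
        print(out[0], ", ".join(out[1:]))
```

Because the translation is applied to the generated instruction stream rather than to the generator's source, the code that emits the x86_64 instructions does not need to change, which is the property the abstract highlights.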