Training deep learning (DL) networks is a computationally intensive process; as a result, training time can become so long that it impedes the development of DL. High-performance computing clusters, especially supercomputers, are equipped with abundant computing resources, storage resources, and efficient interconnects, which allow DL networks to be trained better and faster. In this paper, we propose a method for training DL networks in a distributed manner with high efficiency. First, we propose a hierarchical synchronous Stochastic Gradient Descent (SGD) strategy, which makes full use of hardware resources and greatly increases computational efficiency. Second, we present a two-level parameter synchronization scheme that reduces communication overhead by transmitting the parameters of the first-level models through shared memory. Third, we optimize parallel I/O by having each reader read data as contiguously as possible, avoiding the high overhead of discontinuous reads. Finally, we integrate the LARS algorithm into our system. The experimental results demonstrate that our approach has substantial performance advantages over unoptimized methods. Compared with the native distributed strategy, our hierarchical synchronous SGD strategy (HSGD) increases computing efficiency by about 20 times.
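The hierarchical strategy and the two-level synchronization described above can be pictured as an intra-node gradient exchange that stays in shared memory, followed by a much smaller inter-node exchange among one leader per node. Below is a minimal, hypothetical sketch of that idea using mpi4py; it is not the paper's implementation, and the communicator layout and the helper name `hierarchical_allreduce` are illustrative assumptions (one MPI process per worker, several workers per node).

```python
# Hypothetical two-level gradient averaging sketch -- not the authors' code.
import numpy as np
from mpi4py import MPI

world = MPI.COMM_WORLD

# Level 1: one communicator per physical node (ranks that can share memory).
node_comm = world.Split_type(MPI.COMM_TYPE_SHARED)
# Level 2: a communicator holding one "leader" rank per node.
is_leader = node_comm.Get_rank() == 0
leader_comm = world.Split(color=0 if is_leader else MPI.UNDEFINED,
                          key=world.Get_rank())

def hierarchical_allreduce(local_grad: np.ndarray) -> np.ndarray:
    """Average a gradient tensor: first inside the node, then across node leaders."""
    summed = np.zeros_like(local_grad)
    # First level: intra-node reduction; this traffic stays in shared memory.
    node_comm.Reduce(local_grad, summed, op=MPI.SUM, root=0)
    if is_leader:
        # Second level: only node leaders communicate over the interconnect.
        leader_comm.Allreduce(MPI.IN_PLACE, summed, op=MPI.SUM)
    # Leaders hand the global sum back to the workers on their node.
    node_comm.Bcast(summed, root=0)
    return summed / world.Get_size()
```

In this layout the first-level reduction never leaves the node, and only one rank per node takes part in the more expensive inter-node step, which is the intent of the two-level scheme summarized in the abstract.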
Chenxu WANG
National University of Defense Technology
Yutong LU
National Supercomputer Center in Guangzhou, Sun Yat-sen University
Zhiguang CHEN
National Supercomputer Center in Guangzhou, Sun Yat-sen University
Junnan LI
National University of Defense Technology
Chenxu WANG, Yutong LU, Zhiguang CHEN, Junnan LI, "An Efficient Method for Training Deep Learning Networks Distributed" in IEICE TRANSACTIONS on Information,
vol. E103-D, no. 12, pp. 2444-2456, December 2020, doi: 10.1587/transinf.2020PAP0007.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020PAP0007/_p
@ARTICLE{e103-d_12_2444,
author={Chenxu WANG and Yutong LU and Zhiguang CHEN and Junnan LI},
journal={IEICE TRANSACTIONS on Information},
title={An Efficient Method for Training Deep Learning Networks Distributed},
year={2020},
volume={E103-D},
number={12},
pages={2444-2456},
doi={10.1587/transinf.2020PAP0007},
ISSN={1745-1361},
month={December}
}
TY - JOUR
TI - An Efficient Method for Training Deep Learning Networks Distributed
T2 - IEICE TRANSACTIONS on Information
SP - 2444
EP - 2456
AU - Chenxu WANG
AU - Yutong LU
AU - Zhiguang CHEN
AU - Junnan LI
PY - 2020
DO - 10.1587/transinf.2020PAP0007
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E103-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2020
ER -