The search functionality is under construction.

Keyword Search Result

[Keyword] PU(3318hit)

201-220hit(3318hit)

  • LTL Model Checking for Register Pushdown Systems

    Ryoma SENDA  Yoshiaki TAKATA  Hiroyuki SEKI  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2021/08/31
      Vol:
    E104-D No:12
      Page(s):
    2131-2144

    A pushdown system (PDS) is known as an abstract model of recursive programs. Model checking methods for PDS have been studied and applied to various software verification tasks such as interprocedural data flow analysis and malware detection. However, a PDS cannot manipulate data values from an infinite domain. A register PDS (RPDS) extends PDS with registers to deal with data values in a restricted way. This paper proposes algorithms for LTL model checking problems for RPDS with simple and regular valuations, which are labelings of atomic propositions to configurations under reasonable restrictions. First, we introduce RPDS and related models, and then define the LTL model checking problems for RPDS. Second, we give algorithms for solving these problems and also show that the problems are EXPTIME-complete. As practical examples, we show how malware detection and XML schema checking can be solved in the proposed framework.

  • An FPGA-Based Optimizer Design for Distributed Deep Learning with Multiple GPUs

    Tomoya ITSUBO  Michihiro KOIBUCHI  Hideharu AMANO  Hiroki MATSUTANI  

     
    PAPER

      Publicized:
    2021/07/01
      Vol:
    E104-D No:12
      Page(s):
    2057-2067

    Since deep learning workloads perform a large number of matrix operations on training data, GPUs (Graphics Processing Units) are efficient, especially for the training phase. A cluster of computers, each equipped with multiple GPUs, can significantly accelerate deep learning workloads. More specifically, a back-propagation algorithm following a gradient descent approach is used for the training. Although the gradient computation is still a major bottleneck of the training, gradient aggregation and optimization impose both communication and computation overheads, which should also be reduced to further shorten the training time. To address this issue, in this paper, multiple GPUs are interconnected with a PCI Express (PCIe) over 10Gbit Ethernet (10GbE) technology. Since these remote GPUs are interconnected with network switches, gradient aggregation and optimizers (e.g., SGD, AdaGrad, Adam, and SMORMS3) are offloaded to FPGA-based 10GbE switches between remote GPUs; thus, the gradient aggregation and parameter optimization are completed in the network. The proposed FPGA-based 10GbE switches with the four optimizers are implemented on a NetFPGA-SUME board. The PEs for the optimizers increase the resource utilization, and the switches consume up to 56% of the resources. Evaluation results using four remote GPUs connected via the proposed FPGA-based switch demonstrate that these optimizers are accelerated by up to 3.0x and 1.25x compared to CPU and GPU implementations, respectively. Also, the gradient aggregation throughput of the FPGA-based switch reaches up to 98.3% of the 10GbE line rate.
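
    The aggregate-then-optimize step that the abstract describes offloading to the switch can be sketched in software as follows. This is a minimal illustration only, not the authors' FPGA design: the function names, the choice of averaging (rather than summing) the gradients, and the learning-rate and epsilon defaults are all assumptions made for the example.

```python
import math
from typing import List, Tuple


def aggregate_gradients(worker_grads: List[List[float]]) -> List[float]:
    """Element-wise average of the gradient vectors sent by all workers.

    Averaging is an assumption here; a sum is an equally common choice.
    """
    n_workers = len(worker_grads)
    return [sum(column) / n_workers for column in zip(*worker_grads)]


def sgd_step(params: List[float], grad: List[float], lr: float = 0.1) -> List[float]:
    """Plain SGD update: p <- p - lr * g."""
    return [p - lr * g for p, g in zip(params, grad)]


def adagrad_step(
    params: List[float],
    grad: List[float],
    accum: List[float],
    lr: float = 0.1,
    eps: float = 1e-8,
) -> Tuple[List[float], List[float]]:
    """AdaGrad update: accumulate squared gradients, then scale each step."""
    new_accum = [a + g * g for a, g in zip(accum, grad)]
    new_params = [
        p - lr * g / (math.sqrt(a) + eps)
        for p, g, a in zip(params, grad, new_accum)
    ]
    return new_params, new_accum


if __name__ == "__main__":
    grads = aggregate_gradients([[1.0, 2.0], [3.0, 4.0]])
    print(sgd_step([1.0, 1.0], grads, lr=0.5))
```

    In this sketch the workers only send gradients and receive updated parameters, which mirrors the idea of completing aggregation and optimization in the network.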

  • Fogcached: A DRAM/NVMM Hybrid KVS Server for Edge Computing

    Kouki OZAWA  Takahiro HIROFUCHI  Ryousei TAKANO  Midori SUGAYA  

     
    PAPER

      Publicized:
    2021/08/18
      Vol:
    E104-D No:12
      Page(s):
    2089-2096

    With the development of IoT devices and sensors, edge computing is leading towards new services like autonomous cars and smart cities. Low-latency data access is an essential requirement for such services, and a large-capacity cache server is needed on the edge side. However, it is not realistic to build a large-capacity cache server using only DRAM because DRAM is expensive and consumes a substantial amount of power. A hybrid main memory system, in which main memory consists of DRAM and non-volatile memory, is a promising way to address this issue. It achieves a large main memory capacity within the power supply capabilities of current servers. In this paper, we propose Fogcached, an extension of a widely used KVS (Key-Value Store) server program (i.e., Memcached) that exploits both DRAM and non-volatile main memory (NVMM). We used Intel Optane DCPM as NVMM for its prototype. Fogcached implements a Dual-LRU (Least Recently Used) mechanism that seamlessly extends the memory management of Memcached to hybrid main memory. Fogcached reuses the segmented LRU of Memcached to manage cached objects in DRAM, adds another segmented LRU for those in DCPM, and bridges the two LRUs with a mechanism that automatically moves cached objects between DRAM and DCPM. Cached objects are autonomously moved between the two memory devices according to their access frequencies. Through experiments, we confirmed that Fogcached improved the peak value of a latency distribution by about 40% compared to Memcached.
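
    The idea of two LRU lists bridged by automatic promotion and demotion can be illustrated with a small sketch. It is not Fogcached's implementation: the class name `TwoTierLRU`, the capacities, and the policy of promoting an object on a single hit (the abstract speaks of access frequencies) are simplifying assumptions, and plain LRU lists stand in for Memcached's segmented LRU.

```python
from collections import OrderedDict


class TwoTierLRU:
    """Toy two-tier cache: a small fast tier (standing in for DRAM) and a
    larger slow tier (standing in for NVMM), each kept in LRU order."""

    def __init__(self, fast_capacity: int, slow_capacity: int):
        self.fast = OrderedDict()  # least recently used entry is first
        self.slow = OrderedDict()
        self.fast_capacity = fast_capacity
        self.slow_capacity = slow_capacity

    def _insert_fast(self, key, value):
        self.fast[key] = value
        self.fast.move_to_end(key)
        if len(self.fast) > self.fast_capacity:
            # Demote the coldest fast-tier object to the slow tier.
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value
            self.slow.move_to_end(old_key)
            if len(self.slow) > self.slow_capacity:
                # Evict the coldest slow-tier object from the cache.
                self.slow.popitem(last=False)

    def set(self, key, value):
        self.slow.pop(key, None)  # avoid keeping a stale copy
        self._insert_fast(key, value)

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key]
        if key in self.slow:
            # Promote on hit (assumed policy for this sketch).
            value = self.slow.pop(key)
            self._insert_fast(key, value)
            return value
        return None


if __name__ == "__main__":
    cache = TwoTierLRU(fast_capacity=2, slow_capacity=2)
    for i, k in enumerate("abc"):
        cache.set(k, i)
    print(list(cache.fast), list(cache.slow))
```

    A frequency-based policy would replace the promote-on-hit rule with a per-object counter and a threshold.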

  • Fogcached-Ros: DRAM/NVMM Hybrid KVS Server with ROS Based Extension for ROS Application and SLAM Evaluation

    Koki HIGASHI  Yoichi ISHIWATA  Takeshi OHKAWA  Midori SUGAYA  

     
    PAPER

      Publicized:
    2021/08/18
      Vol:
    E104-D No:12
      Page(s):
    2097-2108

    Recently, edge servers, which are located closer to devices than the cloud, have been expected to process the large amount of sensor data generated by IoT devices such as robots. Research has proposed applying a KVS (Key-Value Store) to the edge as a cache server in order to obtain high responsiveness. Above all, a hybrid-KVS server that uses both DRAM and NVMM (Non-Volatile Main Memory) devices is expected to achieve both responsiveness and reliability. However, its effectiveness has not been verified in actual applications, and is not clear in terms of its relationship with the cloud. The purpose of this study is to evaluate the effectiveness of hybrid-KVS servers using SLAM (Simultaneous Localization and Mapping), which is widely used in robots and autonomous driving. SLAM is well suited to an edge server and requires responsiveness and reliability. SLAM is generally implemented on ROS (Robot Operating System) middleware and communicates with the server through ROS middleware. However, if we use hybrid-KVS on the edge with SLAM and ROS, communication cannot be achieved because the message objects differ from the format expected by the KVS. Therefore, in this research, we propose a mechanism to apply ROS memory objects to hybrid-KVS by designing and implementing a data serialization function that extends ROS. Through evaluation of the proposed fogcached-ros, we confirm its low API overhead, its support for data used by SLAM, and a low latency difference between the edge and cloud.

  • Experimental Demonstration of a Hard-Type Oscillator Using a Resonant Tunneling Diode and Its Comparison with a Soft-Type Oscillator

    Koichi MAEZAWA  Tatsuo ITO  Masayuki MORI  

     
    BRIEF PAPER-Semiconductor Materials and Devices

      Publicized:
    2021/06/07
      Vol:
    E104-C No:12
      Page(s):
    685-688

    A hard-type oscillator is defined as an oscillator having stable fixed points within a stable limit cycle. For resonant tunneling diode (RTD) oscillators, a hard-type configuration has the significant advantage that it can suppress spurious oscillations in a bias line. We have fabricated hard-type oscillators using an InGaAs-based RTD and demonstrated proper operation. Furthermore, the oscillating properties have been compared with those of a soft-type oscillator having the same parameters. It has been demonstrated that the same level of phase noise can be obtained with a much smaller power consumption of approximately 1/20.

  • Time-Optimal Self-Stabilizing Leader Election on Rings in Population Protocols Open Access

    Daisuke YOKOTA  Yuichi SUDO  Toshimitsu MASUZAWA  

     
    PAPER-Algorithms and Data Structures

      Publicized:
    2021/06/03
      Vol:
    E104-A No:12
      Page(s):
    1675-1684

    We propose a self-stabilizing leader election protocol on directed rings in the model of population protocols. Given an upper bound N on the population size n, the proposed protocol elects a unique leader within O(nN) expected steps starting from any configuration and uses O(N) states. This convergence time is optimal if a given upper bound N is asymptotically tight, i.e., N=O(n).

  • Joint Wireless and Computational Resource Allocation Based on Hierarchical Game for Mobile Edge Computing

    Weiwei XIA  Zhuorui LAN  Lianfeng SHEN  

     
    PAPER-Network

      Publicized:
    2021/05/14
      Vol:
    E104-B No:11
      Page(s):
    1395-1407

    In this paper, we propose a hierarchical Stackelberg game based resource allocation algorithm (HGRAA) to jointly allocate the wireless and computational resources of a mobile edge computing (MEC) system. The proposed HGRAA is composed of two levels: the lower-level evolutionary game (LEG) minimizes the cost of mobile terminals (MTs), and the upper-level exact potential game (UEPG) maximizes the utility of MEC servers. At the lower level, the MTs are divided into delay-sensitive MTs (DSMTs) and non-delay-sensitive MTs (NDSMTs) according to their different quality of service (QoS) requirements. The competition among DSMTs and NDSMTs in different service areas to share the limited available wireless and computational resources is formulated as a dynamic evolutionary game. The dynamic replicator is applied to obtain the evolutionary equilibrium so as to minimize the costs imposed on MTs. At the upper level, the exact potential game is formulated to solve the resource sharing problem among MEC servers, and the resource sharing problem is transformed into a nonlinear complementarity problem. The existence of a Nash equilibrium (NE) is proved, and the NE is obtained through the Karush-Kuhn-Tucker (KKT) conditions. Simulations illustrate that substantial improvements in the average utility and resource utilization of MEC servers can be achieved by applying the proposed HGRAA. Moreover, the cost of MTs is significantly lower than with other existing algorithms as the size of input data increases, and the QoS requirements of different kinds of MTs are well guaranteed in terms of average delay and transmission data rate.

  • MPTCP-meLearning: A Multi-Expert Learning-Based MPTCP Extension to Enhance Multipathing Robustness against Network Attacks

    Yuanlong CAO  Ruiwen JI  Lejun JI  Xun SHAO  Gang LEI  Hao WANG  

     
    PAPER

      Publicized:
    2021/07/08
      Vol:
    E104-D No:11
      Page(s):
    1795-1804

    With multiple network interfaces now widely equipped in modern mobile devices, Multipath TCP (MPTCP) is increasingly becoming the preferred transport technique, since it can use multiple network interfaces simultaneously to spread data across multiple network paths for throughput improvement. However, MPTCP performance can be seriously affected by the use of a poorly performing path in multipath transmission, especially in the presence of network attacks, under which an MPTCP path can abruptly and frequently become underperforming. In this paper, we propose a multi-expert learning-based MPTCP variant, called MPTCP-meLearning, to enhance the robustness of MPTCP performance against network attacks. MPTCP-meLearning introduces a new kind of predictor that aims to achieve better quality prediction accuracy for each of the multiple paths by leveraging a group of representative formula-based predictors. MPTCP-meLearning also includes a novel mechanism to intelligently manage multiple paths in order to help mitigate the out-of-order reception and receive buffer blocking problems. Experimental results demonstrate that MPTCP-meLearning can achieve better transmission performance and quality of service than the baseline MPTCP scheme.

  • An Efficient Public Verifiable Certificateless Multi-Receiver Signcryption Scheme for IoT Environments

    Dae-Hwi LEE  Won-Bin KIM  Deahee SEO  Im-Yeong LEE  

     
    PAPER

      Publicized:
    2021/07/14
      Vol:
    E104-D No:11
      Page(s):
    1869-1879

    Lightweight cryptographic systems for services delivered by the recently developed Internet of Things (IoT) are being continuously researched. However, existing Public Key Infrastructure (PKI)-based cryptographic algorithms are difficult to apply to IoT services delivered using lightweight devices. Therefore, encryption, authentication, and signature systems based on Certificateless Public Key Cryptography (CL-PKC), which are lightweight because they do not use the certificates of existing PKI-based cryptographic algorithms, are being studied. Of the various public key cryptosystems, signcryption is efficient, and ensures integrity and confidentiality. Recently, CL-based signcryption (CL-SC) schemes have been intensively studied, and a multi-receiver signcryption (MRSC) protocol for environments with multiple receivers, i.e., not involving end-to-end communication, has been proposed. However, when using signcryption, confidentiality and integrity may be violated by public key replacement attacks. In this paper, we develop an efficient CL-based MRSC (CL-MRSC) scheme using CL-PKC for IoT environments. Existing signcryption schemes do not offer public verifiability, which is required if digital signatures are used, because only the receiver can verify the validity of the message; sender authenticity is not guaranteed by a third party. Therefore, we propose a CL-MRSC scheme in which communication participants (such as the gateways through which messages are transmitted) can efficiently and publicly verify the validity of encrypted messages.

  • A Hybrid Retinex-Based Algorithm for UAV-Taken Image Enhancement

    Xinran LIU  Zhongju WANG  Long WANG  Chao HUANG  Xiong LUO  

     
    LETTER-Image Processing and Video Processing

      Publicized:
    2021/08/05
      Vol:
    E104-D No:11
      Page(s):
    2024-2027

    In this paper, a hybrid Retinex-based image enhancement algorithm is proposed to improve the quality of images captured by unmanned aerial vehicles (UAVs). Hyperparameters of the employed multi-scale Retinex with chromaticity preservation (MSRCP) model are automatically tuned via a two-phase evolutionary computing algorithm. In the two-phase optimization algorithm, the Rao-2 algorithm is first applied to perform the global search, and a solution is obtained by maximizing the objective function. Next, the Nelder-Mead simplex method is used to improve the solution via local search. Real low-quality UAV-taken images are collected to verify the performance of the proposed algorithm. Four well-known image enhancement algorithms, Multi-Scale Retinex, Multi-Scale Retinex with Color Restoration, Automated Multi-Scale Retinex, and MSRCP, are used as benchmarking methods. In addition, two commonly used evolutionary computing algorithms, particle swarm optimization and the flower pollination algorithm, are considered to verify the efficiency of the proposed method in tuning parameters of the MSRCP model. Experimental results demonstrate that the proposed method achieves the best performance among the benchmarks and is thus applicable to real UAV-based applications.

  • An Optimistic Synchronization Based Optimal Server Selection Scheme for Delay Sensitive Communication Services Open Access

    Akio KAWABATA  Bijoy Chand CHATTERJEE  Eiji OKI  

     
    PAPER-Network System

      Publicized:
    2021/04/09
      Vol:
    E104-B No:10
      Page(s):
    1277-1287

    In distributed processing for communication services, a proper server selection scheme is required to reduce delay by ensuring the event occurrence order. Although a conservative synchronization algorithm (CSA) has been used to achieve this goal, an optimistic synchronization algorithm (OSA) can be feasible for synchronizing distributed systems. In comparison with CSA, which reproduces events in occurrence order before processing applications, OSA can be feasible to realize low delay communication as the processing events arrive sequentially. This paper proposes an optimal server selection scheme that uses OSA for distributed processing systems to minimize end-to-end delay under the condition that maximum status holding time is limited. In other words, the end-to-end delay is minimized based on the allowed rollback time, which is given according to the application designing aspects and availability of computing resources. Numerical results indicate that the proposed scheme reduces the delay compared to the conventional scheme.

  • Highly Efficient Sensing Methods of Primary Radio Transmission Systems toward Dynamic Spectrum Sharing-Based 5G Systems Open Access

    Atomu SAKAI  Keiichi MIZUTANI  Takeshi MATSUMURA  Hiroshi HARADA  

     
    PAPER

      Publicized:
    2021/03/30
      Vol:
    E104-B No:10
      Page(s):
    1227-1236

    The Dynamic Spectrum Sharing (DSS) system, which uses the frequency band allocated to incumbent systems (i.e., primary users), has attracted attention as a way to expand the available bandwidth of the fifth-generation mobile communication (5G) systems in the sub-6GHz band. In Japan, a DSS system in the 2.3GHz band, in which the ARIB STD-B57-based Field Pickup Unit (FPU) is assigned as an incumbent system, has been studied for the secondary use of 5G systems. In this case, the incumbent FPU is a mobile system, and thus the DSS system needs to use not only a spectrum sharing database but also radio sensors to detect primary signals with high accuracy, protect the primary system from interference, and achieve more secure spectrum sharing. This paper proposes highly efficient sensing methods for detecting the ARIB STD-B57-based FPU signals in the 2.3GHz band. The proposed methods can be applied to two types of FPU signal: those that apply the Continuous Pilot (CP) mode pilot and those that apply the Scattered Pilot (SP) mode pilot. Moreover, we apply a sample addition method and a symbol addition method to improve the detection performance. Even in the 3GPP EVA channel environment, the proposed method can detect the FPU signal at an SNR of -10dB with a probability of more than 99%. In addition, we propose a quantized reference signal for reducing the implementation complexity of the complex cross-correlation circuit. The proposed reference signal can reduce the number of quantization bits of the reference signal to 2 bits for the in-phase component and 3 bits for the orthogonal component.

  • A Study on Highly Efficient Dual-Input Power Amplifiers for Large PAPR Signals Open Access

    Atsushi YAMAOKA  Thomas M. HONE  Yoshimasa EGASHIRA  Keiichi YAMAGUCHI  

     
    INVITED PAPER

      Publicized:
    2021/03/23
      Vol:
    E104-C No:10
      Page(s):
    506-515

    With the advent of 5G and external pressure to reduce greenhouse gas emissions, wireless transceivers with low power consumption are strongly desired for future cellular systems. At the same time, increased modulation order due to the evolution of cellular systems will force power amplifiers to operate at much larger output power back-off to prevent EVM degradation. This paper begins with an analysis of load modulation and asymmetrical Doherty amplifiers. Measurement results will show an apparent 60% efficiency plateau for modulated signals with a large peak-to-average power ratio (PAPR). To exceed this efficiency limitation, the second part of this paper focuses on a new amplification topology based on the amalgamation between Doherty and outphasing. Measurement results of the proposed Doherty-outphasing power amplifier (DOPA) will confirm the feasibility of the approach with a modulated efficiency greater than 70% measured at 10 dB output power back-off.

  • A Noise-Canceling Charge Pump for Area Efficient PLL Design Open Access

    Go URAKAWA  Hiroyuki KOBAYASHI  Jun DEGUCHI  Ryuichi FUJIMOTO  

     
    PAPER

      Publicized:
    2021/04/20
      Vol:
    E104-C No:10
      Page(s):
    625-634

    In general, since the in-band noise of phase-locked loops (PLLs) is mainly caused by charge pumps (CPs), large-size transistors that occupy a large area are used to improve in-band noise of CPs. With the high demand for low phase noise in recent high-performance communication systems, the issue of the trade-off between occupied area and noise in conventional CPs has become significant. A noise-canceling CP circuit is presented in this paper to mitigate the trade-off between occupied area and noise. The proposed CP can achieve lower noise performance than conventional CPs by performing additional noise cancelation. According to the simulation results, the proposed CP can reduce the current noise to 57% with the same occupied area, or can reduce the occupied area to 22% compared with that of the conventional CPs at the same noise performance. We fabricated a prototype of the proposed CP embedded in a 28-GHz LC-PLL using a 16-nm FinFET process, and 1.2-dB improvement in single sideband integrated phase noise is achieved.

  • Analysis and Design of Continuous-Time Comparator Open Access

    Takahiro MIKI  

     
    INVITED PAPER

      Publicized:
    2021/10/02
      Vol:
    E104-C No:10
      Page(s):
    635-642

    Applications of continuous-time (CT) comparators include relaxation oscillators, pulse width modulators, and so on. A CT comparator receives a differential input and outputs a strobe, ideally when the differential input crosses zero. Unlike discrete-time (DT) comparators with a positive feedback circuit, CT comparators must employ amplifiers that consume static power to amplify the input signal. Therefore, minimization of comparator delay under the constraint of power consumption often becomes an issue. This paper analyzes the transient behavior of a CT comparator. Using a “constant delay approximation”, the comparator delay is derived as a function of input slew rate, number of stages of the preamplifier, and device parameters in each block. This paper also discusses the optimum design of the CT comparator. The condition for minimum comparator delay is derived while keeping power consumption constant. The results show that the optimum DC gain of the preamplifier is between e and e^3 per stage, depending on the element that dominates the load capacitance of the preamplifier.

  • ZigZag Antenna Configuration for MmWave V2V with Relay in Typical Road Scenarios: Design, Analysis and Experiment

    Yue YIN  Haoze CHEN  Zongdian LI  Tao YU  Kei SAKAGUCHI  

     
    PAPER-Antennas and Propagation

      Publicized:
    2021/04/09
      Vol:
    E104-B No:10
      Page(s):
    1307-1317

    Communication systems operating in the millimeter-wave (mmWave) band have the potential to realize ultra-high throughput and ultra-low latency vehicle-to-vehicle (V2V) communications in 5G and beyond wireless networks. Moreover, because of the weak penetration of mmWave, one mmWave channel can be reused in all V2V links, which improves the spectrum efficiency. Although the outstanding performance of mmWave has been widely acknowledged, there are still some shortcomings. One unavoidable defect is multipath interference. Even though the direct interference link cannot penetrate vehicle bodies, other interference degrades the throughput of mmWave V2V communication. In this paper, we focus on the multipath interference caused by signal reflections from roads and surroundings, where the interference strength varies with the road scenario. First, we analyze the multipath channel models of mmWave V2V with relay in three typical road scenarios (single straight roads, horizontal curves, and slopes), and clarify the differences in their interference. Based on the analysis, a novel ZigZag antenna configuration method is proposed to guarantee the required data rate. Second, the performance of the proposed method is evaluated by simulation. The results show that, in contrast to the conventional antenna configuration, the ZigZag antenna configuration with an optimal antenna height can significantly suppress the destructive interference and ensure a throughput of over 1Gbps in the 60GHz band. Furthermore, the effectiveness of the ZigZag antenna configuration is demonstrated on a single straight road by outdoor experiments.

  • Receiver Selective Opening CCA Secure Public Key Encryption from Various Assumptions

    Yi LU  Keisuke HARA  Keisuke TANAKA  

     
    PAPER-Cryptography and Information Security

      Publicized:
    2021/03/16
      Vol:
    E104-A No:9
      Page(s):
    1206-1218

    Receiver selective opening (RSO) attack for public key encryption (PKE) captures a situation where one sender sends messages to multiple receivers, and an adversary can corrupt a set of receivers and obtain their messages and secret keys. Security against RSO attack for a PKE scheme ensures confidentiality of the uncorrupted receivers' ciphertexts. Among all of the RSO security notions, simulation-based RSO security against chosen ciphertext attack (SIM-RSO-CCA security) is the strongest notion. In this paper, we explore constructions of SIM-RSO-CCA secure PKE from various computational assumptions. Toward this goal, we show that a SIM-RSO-CCA secure PKE scheme can be constructed based on an IND-CPA secure PKE scheme and a designated-verifier non-interactive zero-knowledge (DV-NIZK) argument satisfying one-time simulation soundness. Moreover, we give the first construction of a DV-NIZK argument satisfying one-time simulation soundness. Consequently, through our generic construction, we obtain the first SIM-RSO-CCA secure PKE scheme under the computational Diffie-Hellman (CDH) or learning parity with noise (LPN) assumption.

  • Counting Convex and Non-Convex 4-Holes in a Point Set

    Young-Hun SUNG  Sang Won BAE  

     
    PAPER-Algorithms and Data Structures

      Publicized:
    2021/03/18
      Vol:
    E104-A No:9
      Page(s):
    1094-1100

    In this paper, we present an algorithm that counts the number of empty quadrilaterals whose corners are chosen from a given set S of n points in general position. Our algorithm can separately count the number of convex or non-convex empty quadrilaterals in O(T) time, where T denotes the number of empty triangles in S. Note that T ranges between Ω(n^2) and O(n^3), and the expected value of T is known to be Θ(n^2) when the n points in S are chosen uniformly and independently at random from a convex and bounded body in the plane. We also show how to enumerate all convex and/or non-convex empty quadrilaterals in S in time proportional to the number of reported quadrilaterals, after O(T)-time preprocessing.
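
    To make the quantity T concrete, the following brute-force sketch counts the empty triangles of a small point set. It is only an illustration of the definition and runs in O(n^4) time; it is not the O(T)-time algorithm of the paper. The function names are hypothetical, and the points are assumed to be in general position (no three collinear).

```python
from itertools import combinations


def cross(o, a, b):
    """Signed area (times two) of the triangle o, a, b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def strictly_inside(p, a, b, c):
    """True if point p lies strictly inside triangle a, b, c."""
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)


def count_empty_triangles(points):
    """Count triangles with corners in `points` that contain no other point."""
    count = 0
    for a, b, c in combinations(points, 3):
        others = (p for p in points if p not in (a, b, c))
        if not any(strictly_inside(p, a, b, c) for p in others):
            count += 1
    return count


if __name__ == "__main__":
    square = [(0, 0), (4, 0), (4, 4), (0, 4)]
    print(count_empty_triangles(square))
```

    For four points in convex position every triangle is empty, whereas a point placed inside a triangle makes the enclosing triangle non-empty.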

  • Character Design Generation System Using Multiple Users' Gaze Information

    Hiroshi TAKENOUCHI  Masataka TOKUMARU  

     
    PAPER-Human-computer Interaction

      Publicized:
    2021/05/25
      Vol:
    E104-D No:9
      Page(s):
    1459-1466

    We investigate interactive evolutionary computation (IEC) using multiple users' gaze information when users only partially participate in each design evaluation. Many previous IEC systems have the problem that the user evaluation load is too large. Hence, we proposed employing user gaze information to evaluate designs generated by IEC systems in order to solve this problem. In the proposed system, users just view the presented designs without assessing them, and the system then automatically creates the users' favorite designs. With the users' gaze information, the proposed system generates coordination that can satisfy many users. In our previous study, we verified the effectiveness of the proposed system from a real system operation viewpoint. However, we did not consider fluctuation of the users during solution candidate evaluation. In the actual operation of the proposed system, users may change during the process as users come and go. Therefore, in this study, we verify the effectiveness of the proposed system when varying the users participating in the evaluation for each generation. In the experiment, we employ two types of situations assumed in real environments. The first situation changes the number of users evaluating the designs for each generation. The second situation employs various users from a predefined population to evaluate the designs for each generation. From the experimental results in the first situation, we confirm that, despite the change in the number of users during the solution candidate evaluation, the proposed system can generate coordination that satisfies many users. Also, from the results in the second situation, we verify that the proposed system can generate coordination that is more satisfying to the users who participate in the coordination evaluation.

  • Hybrid Electrical/Optical Switch Architectures for Training Distributed Deep Learning in Large-Scale

    Thao-Nguyen TRUONG  Ryousei TAKANO  

     
    PAPER-Information Network

      Publicized:
    2021/04/23
      Vol:
    E104-D No:8
      Page(s):
    1332-1339

    Data parallelism is the dominant method used to train deep learning (DL) models on High-Performance Computing systems such as large-scale GPU clusters. When training a DL model on a large number of nodes, inter-node communication becomes a bottleneck due to its relatively higher latency and lower link bandwidth (compared with intra-node communication). Although some communication techniques have been proposed to cope with this problem, all of these approaches aim to deal with the large message size issue while diminishing the effect of the limitation of the inter-node network. In this study, we investigate the benefit of increasing inter-node link bandwidth by using hybrid switching systems, i.e., Electrical Packet Switching and Optical Circuit Switching. We found that the typical data transfer of synchronous data-parallel training is long-lived and rarely changes, so it can be sped up with optical switching. Simulation results on the SimGrid simulator show that our approach speeds up the training of deep learning applications, especially at large scale.
