Ryo TAKAHASHI Hidenori MATSUO Fumiyuki ADACHI
Ultra-densification of radio access network (RAN) is essential to efficiently handle the ever-increasing mobile data traffic. In this paper, a joint multi-layered user clustering and scheduling is proposed as an inter-cluster interference coordination scheme for ultra-dense RAN using cluster-wise distributed MIMO transmission/reception. The proposed joint multi-layered user clustering and scheduling consists of user clustering using the K-means algorithm, user-cluster layering (called multi-layering) based on the interference-offset-distance (IOD), cluster-antenna association on each layer, and layer-wise round-robin-type scheduling. The user capacity, the sum capacity, and the fairness are evaluated by computer simulations to show the effectiveness of the proposed joint multi-layered user clustering and scheduling. Also shown are uplink and downlink capacity comparisons and optimal IOD setting considering the trade-off between inter-cluster interference mitigation and transmission opportunity.
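The layering and scheduling steps can be illustrated with a minimal sketch. It assumes a greedy rule in which a cluster joins the first layer whose members are all at least the IOD away from it; this rule, the function names, and the example centroids are illustrative assumptions, not the procedure of the paper. The K-means step is omitted and the cluster centroids are taken as given.

```python
import math
from itertools import cycle, islice

def assign_layers(centroids, iod):
    """Greedily place each cluster in the first layer whose members are all
    at least `iod` away from it (hypothetical rule, for illustration only)."""
    layers = []  # each layer is a list of cluster indices
    for idx, c in enumerate(centroids):
        for layer in layers:
            if all(math.dist(c, centroids[j]) >= iod for j in layer):
                layer.append(idx)
                break
        else:
            # No existing layer keeps this cluster far enough from the others.
            layers.append([idx])
    return layers

def round_robin(layers, num_slots):
    """Serve one layer per time slot, cycling through the layers in order."""
    return list(islice(cycle(layers), num_slots))

# Hypothetical cluster centroids on a line.
centroids = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
layers = assign_layers(centroids, iod=3.0)
schedule = round_robin(layers, num_slots=4)
```

Under this sketch, a larger IOD separates co-scheduled clusters further but produces more layers, so each cluster is served less often, which mirrors the trade-off mentioned in the abstract.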
Thao-Nguyen TRUONG Ryousei TAKANO
Data parallelism is the dominant method used to train deep learning (DL) models on High-Performance Computing systems such as large-scale GPU clusters. When training a DL model on a large number of nodes, inter-node communication becomes a bottleneck because of its relatively higher latency and lower link bandwidth compared with intra-node communication. Although some communication techniques have been proposed to cope with this problem, all of these approaches aim to handle the large message size while diminishing the effect of the limitations of the inter-node network. In this study, we investigate the benefit of increasing inter-node link bandwidth by using hybrid switching systems, i.e., Electrical Packet Switching and Optical Circuit Switching. We found that the typical data transfer of synchronous data-parallel training is long-lived and rarely changes, so it can be sped up with optical switching. Simulation results on the SimGrid simulator show that our approach speeds up the training of deep learning applications, especially at large scale.
Pan TAN Zhengchun ZHOU Haode YAN Yong WANG
Locally repairable codes (LRCs) with availability have received considerable attention in recent years since they are able to solve many problems in distributed storage systems such as repairing multiple node failures and managing hot data. Constructing LRCs with locality r and availability t (also called (r, t)-LRCs) with new parameters has become an interesting research subject in coding theory. The objective of this paper is to propose two generic constructions of cyclic (r, t)-LRCs via linearized polynomials over finite fields. These two constructions include two earlier constructions of cyclic LRCs from trace functions and truncated trace functions as special cases and lead to LRCs with new parameters that cannot be produced by the earlier ones.
Koichi SHIRAHATA Amir HADERBACHE Naoto FUKUMOTO Kohta NAKASHIMA
Scalability of distributed DNN training can be limited by slowdown of specific processes due to unexpected hardware failures. We propose a dynamic process exclusion technique so that training throughput is maximized. Our evaluation using 32 processes with ResNet-50 shows that our proposed technique reduces slowdown by 12.5% to 50% without accuracy loss through excluding the slow processes.
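A minimal sketch of why excluding slow processes can raise throughput, assuming a simple model in which every synchronous step waits for the slowest remaining process. The throughput model, the function name `best_exclusion`, and the step times are illustrative assumptions, not the criterion used in the paper.

```python
def best_exclusion(step_times, samples_per_process=1):
    """Return (kept process ids, throughput) maximizing samples per second.

    Assumed model: with a kept set S, one step processes
    len(S) * samples_per_process samples and takes max(step_times[S]).
    """
    # Sort process ids from fastest to slowest.
    order = sorted(range(len(step_times)), key=lambda i: step_times[i])
    best_kept, best_tp = [], 0.0
    for k in range(1, len(order) + 1):
        kept = order[:k]  # the k fastest processes
        tp = k * samples_per_process / step_times[kept[-1]]
        if tp > best_tp:
            best_kept, best_tp = kept, tp
    return sorted(best_kept), best_tp

# Hypothetical per-step times in seconds; process 3 is degraded.
step_times = [1.0, 1.0, 1.0, 2.0]
kept, throughput = best_exclusion(step_times)
```

In this toy case, dropping the one slow process raises the modeled throughput from 2 to 3 samples per second, while a process that is only slightly slower than the rest would be kept.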
Chikako TAKASAKI Atsuko TAKEFUSA Hidemoto NAKADA Masato OGUCHI
With the development of cameras and sensors and the spread of cloud computing, life logs can be easily acquired and stored in general households for the various services that utilize the logs. However, it is difficult to analyze moving images that are acquired by home sensors in real time using machine learning because the data size is too large and the computational complexity is too high. Moreover, collecting and accumulating in the cloud moving images that are captured at home and can be used to identify individuals may invade the privacy of application users. We propose a method of distributed processing over the edge and cloud that addresses the processing latency and the privacy concerns. On the edge (sensor) side, we extract feature vectors of human key points from moving images using OpenPose, which is a pose estimation library. On the cloud side, we recognize actions by machine learning using only the feature vectors. In this study, we compare the action recognition accuracies of multiple machine learning methods. In addition, we measure the analysis processing time at the sensor and the cloud to investigate the feasibility of recognizing actions in real time. Then, we evaluate the proposed system by comparing it with the 3D ResNet model in recognition experiments. The experimental results demonstrate that the action recognition accuracy is the highest when using LSTM and that the introduction of dropout in action recognition using 100 categories alleviates overfitting because the models can learn more generic human actions by increasing the variety of actions. In addition, it is demonstrated that preprocessing using OpenPose on the sensor side can substantially reduce the transfer quantity from the sensor to the cloud.
Yuh YAMASHITA Haruka SUMITA Ryosuke ADACHI Koichi KOBAYASHI
This paper proposes a distributed observer on a sensor network in which communication on the network is performed randomly. This work is a natural extension of the Kalman consensus filter approach to cases involving random communication. For both bidirectional and unidirectional communication, we obtain gain conditions that guarantee improved convergence of the estimation error compared to the case with no communication. The obtained conditions are more practical than those of previous studies and give appropriate cooperative gains for a given communication probability. The effectiveness of the proposed method is confirmed by computer simulations.
Young-Woo KWON Sung-Mun PARK Joon-Young CHOI
We propose a system time synchronization method between ARM-based embedded Linux systems. The master Linux system, which holds the reference clock, sends its own system time to the slave Linux system via Transmission Control Protocol communication along with a general-purpose input/output (GPIO) signal, and the slave then corrects its own system time by the difference between its own system time on receiving the GPIO signal and the received reference time. The synchronization performance is significantly improved by compensating for the GPIO signal detection latency and the system time acquisition and setting latencies in Linux. These latencies are precisely measured by exploiting the Cycle Counter register of the ARM coprocessor. Extensive experiments are performed with two ARM-based embedded Linux systems, and the results demonstrate the validity and performance of the proposed synchronization method.
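The correction arithmetic can be sketched as follows, under an assumed latency model: the slave reads its clock one detection latency plus one acquisition latency after the true GPIO edge, and a newly set time takes effect one setting latency after it is computed. The function names, the model, and the nanosecond values are illustrative assumptions, not the implementation of the paper.

```python
def clock_offset_ns(t_ref_ns, t_slave_read_ns, gpio_latency_ns, get_latency_ns):
    """Offset to add to the slave clock so that it matches the master.

    Assumed model: the slave clock reading is taken gpio_latency_ns +
    get_latency_ns after the instant the master stamped t_ref_ns.
    """
    slave_at_edge_ns = t_slave_read_ns - gpio_latency_ns - get_latency_ns
    return t_ref_ns - slave_at_edge_ns

def corrected_time_ns(t_slave_now_ns, offset_ns, set_latency_ns):
    """Value to write to the slave clock, compensating for the time the
    write itself takes (assumed to be set_latency_ns)."""
    return t_slave_now_ns + offset_ns + set_latency_ns

# Hypothetical values in nanoseconds.
offset = clock_offset_ns(1_000_000_000, 1_000_050_000, 20_000, 5_000)
new_time = corrected_time_ns(1_000_060_000, offset, 3_000)
```

Integer nanoseconds are used so that the arithmetic is exact; without the two latency terms, the slave in this example would be judged 50 µs ahead instead of 25 µs.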
Daiki OGAWA Koichi KOBAYASHI Yuh YAMASHITA
A blockchain, which is well known as one of the distributed ledgers, has attracted attention in many research fields. In this paper, we discuss the effectiveness and limitations of a blockchain in distributed optimization. In distributed optimization, the original problem is decomposed, and the local problems are solved by multiple agents. In this paper, ADMM (Alternating Direction Method of Multipliers) is utilized as one of the powerful methods in distributed optimization. In ADMM, an aggregator is basically required for collecting the computation result of each agent. Using blockchains, the function of an aggregator can be contained in a distributed ledger, and an aggregator may not be required. As a result, tampering by attackers can be prevented. As an application, we consider energy management systems (EMSs). By numerical experiments, the effectiveness and limitations of blockchain-based distributed optimization are clarified.
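To show where the aggregator enters ADMM, here is a minimal sketch of scaled-form consensus ADMM on a toy problem. The quadratic local costs f_i(x) = 0.5 * a_i * (x - b_i)^2, the parameter values, and the function name are assumptions chosen for illustration; they are not the EMS problem of the paper, and no blockchain is modeled.

```python
def consensus_admm(a, b, rho=1.0, iters=300):
    """Scaled-form consensus ADMM for f_i(x) = 0.5 * a[i] * (x - b[i])**2."""
    n = len(a)
    x = [0.0] * n  # local copies held by the agents
    u = [0.0] * n  # scaled dual variables, one per agent
    z = 0.0        # shared consensus value
    for _ in range(iters):
        # Local step: agent i minimizes f_i(x) + (rho/2) * (x - z + u_i)^2.
        x = [(a[i] * b[i] + rho * (z - u[i])) / (a[i] + rho) for i in range(n)]
        # Aggregation step: the only step that needs every agent's result.
        z = sum(x[i] + u[i] for i in range(n)) / n
        # Dual step: each agent updates its own multiplier locally.
        u = [u[i] + x[i] - z for i in range(n)]
    return z

# Toy data: the minimizer of the summed cost is sum(a*b) / sum(a) = 14/6.
a = [1.0, 2.0, 3.0]
b = [1.0, 2.0, 3.0]
z_star = consensus_admm(a, b)
```

Only the averaging line requires information from all agents; that is the role an aggregator plays, and the role the abstract proposes to place in a distributed ledger.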
Tomoki MURAKAMI Koichi ISHIHARA Hirantha ABEYSEKERA Yasushi TAKATORI
Dense deployments of wireless local area network (WLAN) access points (APs) are accelerating to accommodate the massive wireless traffic from various mobile devices. AP densification improves the received power at mobile devices; however, total throughput in a target area is saturated by inter-cell interference (ICI) because of the limited number of frequency channels available for WLANs. To substantially mitigate ICI, this paper describes a distributed smart antenna system (D-SAS) that we developed for dense WLAN AP deployment. We also describe a system configuration based on our D-SAS approach. In this approach, the distributed antennas externally attached to each AP can be switched so as to make the transmit power match the mobile device's conditions (received power and packet type). The gains obtained by the antenna switching effectively minimize the transmission power required of each AP. We also describe experimental measurements taken in a stadium using a system prototype; the results show that D-SAS offers double the total throughput attained by a centralized smart antenna system (C-SAS).
Chenxu WANG Yutong LU Zhiguang CHEN Junnan LI
Training deep learning (DL) networks is a computationally intensive process; as a result, training time can become so long that it impedes the development of DL. High performance computing clusters, especially supercomputers, are equipped with a large amount of computing resources, storage resources, and efficient interconnection ability, which can train DL networks better and faster. In this paper, we propose a method to train DL networks in a distributed manner with high efficiency. First, we propose a hierarchical synchronous Stochastic Gradient Descent (SGD) strategy, which can make full use of hardware resources and greatly increase computational efficiency. Second, we present a two-level parameter synchronization scheme which can reduce communication overhead by transmitting parameters of the first-layer models in shared memory. Third, we optimize the parallel I/O by making each reader read data as contiguously as possible to avoid the high overhead of non-contiguous data reading. Finally, we integrate the LARS algorithm into our system. The experimental results demonstrate that our approach has substantial performance advantages over unoptimized methods. Compared with the native distributed strategy, our hierarchical synchronous SGD strategy (HSGD) can increase computing efficiency by about 20 times.
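The idea of two-level synchronization can be sketched as averaging gradients first inside each node and then across nodes. This is a minimal illustration under the assumption that every node hosts the same number of workers; the function names and gradient values are hypothetical, and the sketch does not reproduce the HSGD implementation.

```python
def mean_vectors(vectors):
    """Element-wise mean of equally sized lists of floats."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def hierarchical_average(grads_by_node):
    """Average within each node first (e.g. via shared memory), then average
    the per-node results across nodes (e.g. over the interconnect)."""
    node_means = [mean_vectors(worker_grads) for worker_grads in grads_by_node]
    return mean_vectors(node_means)

# Hypothetical gradients: 2 nodes, 2 workers per node, 2 parameters each.
grads_by_node = [
    [[1.0, 2.0], [3.0, 4.0]],  # node 0
    [[5.0, 6.0], [7.0, 8.0]],  # node 1
]
global_grad = hierarchical_average(grads_by_node)
flat_grad = mean_vectors([g for node in grads_by_node for g in node])
```

With equal workers per node, the two-level result equals the flat average, but only one vector per node has to cross the network instead of one per worker.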
How to restore a virtual network after a substrate network failure (e.g., a link cut) is one of the key challenges of network virtualization. Traditional virtual network recovery (VNR) methods are mostly based on the idea of centralized control. However, if multiple virtual networks fail at the same time, their recovery processes are usually queued according to a specific priority, which may increase the average waiting time of users. In this letter, we study a distributed virtual network recovery (DVNR) method to improve the virtual network recovery efficiency. We establish an exclusive virtual machine (VM) for each virtual network and process the recovery requests of multiple virtual networks in parallel. Simulation results show that the proposed DVNR method can achieve a recovery success rate close to that of the centralized VNR method while yielding ~70% less average recovery time.
Norihide KITAOKA Eichi SETO Ryota NISHIMURA
We have developed an adaptation method which allows the customization of example-based dialog systems for individual users by applying “plus” and “minus” operations to the distributed representations obtained using the word2vec method. After retrieving user-related profile information from the Web, named entity extraction is applied to the retrieval results. Words with a high term frequency-inverse document frequency (TF-IDF) score are then adopted as user-related words. Next, we calculate the similarity between the distributed representations of selected user-related words and nouns in the existing example phrases, using word2vec embedding. We then generate phrases adapted to the user by substituting user-related words for highly similar words in the original example phrases. Word2vec also has a special property which allows the arithmetic operations “plus” and “minus” to be applied to distributed word representations. By applying these operations to words used in the original phrases, we are able to determine which user-related words can be used to replace the original words. The user-related words are then substituted to create customized example phrases. We evaluated the naturalness of the generated phrases and found that the system could generate natural phrases.
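The “plus” and “minus” operations can be illustrated with hand-made toy vectors and cosine similarity. The three-dimensional vectors and the words below are invented for illustration and are not real word2vec embeddings; the helper names are also hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(p * q for p, q in zip(u, v))
    norm_u = math.sqrt(sum(p * p for p in u))
    norm_v = math.sqrt(sum(q * q for q in v))
    return dot / (norm_u * norm_v)

def nearest(target, embeddings, exclude=()):
    """Word whose vector is most similar to `target`, skipping `exclude`."""
    candidates = {w: v for w, v in embeddings.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

# Toy embeddings (invented): dimensions are (royal, male, female).
toy = {
    "king": (1.0, 1.0, 0.0),
    "queen": (1.0, 0.0, 1.0),
    "man": (0.0, 1.0, 0.0),
    "woman": (0.0, 0.0, 1.0),
    "apple": (0.1, 0.1, 0.1),
}

# "king" minus "man" plus "woman", computed element-wise.
target = tuple(k - m + w for k, m, w in zip(toy["king"], toy["man"], toy["woman"]))
answer = nearest(target, toy, exclude={"king", "man", "woman"})
```

The same `nearest` lookup, applied to a user-related word and the nouns of an example phrase, is one way the similarity-based substitution described above could be prototyped.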
Takaaki SAWA Fujun HE Akio KAWABATA Eiji OKI
This paper proposes two algorithms, namely Server-User Matching (SUM) algorithm and Extended Server-User Matching (ESUM) algorithm, for the distributed server allocation problem. The server allocation problem is to determine the matching between servers and users to minimize the maximum delay, which is the maximum time to complete user synchronization. We analyze the computational time complexity. We prove that the SUM algorithm obtains the optimal solutions in polynomial time for the special case that all server-server delay values are the same and constant. We provide the upper and lower bounds when the SUM algorithm is applied to the general server allocation problem. We show that the ESUM algorithm is a fixed-parameter tractable algorithm that can attain the optimal solution for the server allocation problem parameterized by the number of servers. Numerical results show that the computation time of ESUM follows the analyzed complexity while the ESUM algorithm outperforms the approach of integer linear programming solved by our examined solver.
Ryo IGARASHI Masamichi FUJIWARA Takuya KANAI Hiro SUZUKI Jun-ichi KANI Jun TERADA
Effective user accommodation will become increasingly important in passive optical networks (PONs) in the next decade, since the number of subscribers has been leveling off and it is becoming more difficult for network operators to retain sufficient numbers of maintenance workers. Drastically reducing the number of small-scale communication buildings while keeping the number of accommodated users is one of the most attractive solutions to this situation. To achieve this, we propose two types of long-reach repeater-free upstream transmission configurations for PON systems: (i) one utilizes a semiconductor optical amplifier (SOA) as a pre-amplifier and (ii) the other utilizes distributed Raman amplification (DRA) in addition to the SOA. Our simulations assuming 10G-EPON specifications and transmission experiments on a 10G-EPON prototype confirm that configuration (i) can add a 17 km trunk fiber to a normal PON system with 10 km access reach and a 1:64 split (total 27 km reach), while configuration (ii) can further expand the trunk fiber distance to 37 km (total 47 km reach). Network operators can select between these configurations depending on their service areas.
One of the key technologies in fifth-generation mobile communications is the distributed antenna system (DAS). As DAS creates tightly packed antenna arrangements, inter-user interference degrades its spectrum efficiency. Round-robin (RR) scheduling is known as a scheme that achieves a good trade-off between computational complexity and spectrum efficiency. This paper proposes a user equipment (UE) allocation scheme for RR scheduling. The proposed scheme offers low complexity because the phases of the UE allocation sequences are predetermined. Four different phase selection criteria are compared in this paper. Numerical results obtained through computer simulation show that maximum selection, which sequentially searches for the phase with the maximum tentative throughput, realizes the best spectrum efficiency next to full search. There is an optimum number of UEs that yields the largest throughput in single-user allocation, while the system throughput improves as the number of UEs increases in 2-user RR scheduling.
Ryuta SHINGAI Yuria HIRAGA Hisakazu FUKUOKA Takamasa MITANI Takashi NAKADA Yasuhiko NAKASHIMA
Modern deep learning has significantly improved performance and has been used in a wide variety of applications. Since the amount of computation required for neural network inference is large, inference is performed not at the data acquisition location, such as a surveillance camera, but on a server with abundant computing power installed in a data center. Edge computing is attracting considerable attention as a solution to this problem. However, edge computing can provide only limited computation resources. Therefore, we assumed a divided/distributed neural network model that uses both the edge device and the server. By processing part of the convolution layers on the edge, the amount of communication becomes smaller than that of the sensor data. In this paper, we have evaluated AlexNet and eight other models in the distributed environment and estimated FPS values with Wi-Fi, 3G, and 5G communication. To reduce communication costs, we also introduced a compression process before communication. This compression may degrade the object recognition accuracy. As necessary conditions, we set FPS to 30 or faster and object recognition accuracy to 69.7% or higher. This value is determined based on that of an approximation model that binarizes the activations of the neural network. We constructed performance and energy models to find the optimal configuration that consumes minimum energy while satisfying the necessary conditions. Through the comprehensive evaluation, we found the optimal configurations of all nine models. For small models, such as AlexNet, processing the entire model on the edge was the best. On the other hand, for huge models, such as VGG16, processing the entire model on the server was the best. For medium-size models, the distributed models were good candidates.
We confirmed that our model found the most energy-efficient configuration while satisfying the FPS and accuracy requirements, and that the distributed models reduced the energy consumption by up to 48.6%, and by 6.6% on average. We also found that HEVC compression is important before transferring the input data or the feature data between the distributed inference processes.
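The selection of a minimum-energy configuration under the two necessary conditions (FPS of at least 30 and accuracy of at least 69.7%) can be sketched as a filter followed by a minimum. The configuration names and all numeric values below are hypothetical placeholders, not measurements from the paper, and the paper's performance and energy models are not reproduced.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass(frozen=True)
class Config:
    name: str
    fps: float       # estimated frames per second
    accuracy: float  # object recognition accuracy in percent
    energy: float    # energy per frame, arbitrary unit (hypothetical)

def pick_config(configs: Iterable[Config],
                min_fps: float = 30.0,
                min_accuracy: float = 69.7) -> Optional[Config]:
    """Return the feasible configuration with the lowest energy, or None."""
    feasible = [c for c in configs
                if c.fps >= min_fps and c.accuracy >= min_accuracy]
    return min(feasible, key=lambda c: c.energy) if feasible else None

# Hypothetical candidates for one model; the numbers are made up.
candidates = [
    Config("edge_only", fps=12.0, accuracy=71.0, energy=2.0),
    Config("split_after_conv2", fps=33.0, accuracy=70.1, energy=1.4),
    Config("split_compressed_hard", fps=40.0, accuracy=68.9, energy=0.9),
    Config("server_only", fps=45.0, accuracy=71.0, energy=2.6),
]
best = pick_config(candidates)
```

In this made-up example the cheapest candidate is rejected because aggressive compression pushes its accuracy below the threshold, so a split configuration is selected.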
Rui TENG Kazuto YANO Yoshinori SUZUKI
A multi-band wireless local area network (WLAN) enables flexible use of multiple frequency bands. To efficiently monitor radio resources in multi-band WLANs, a distributed-sensing system that employs a number of stations (STAs) is considered to alleviate sensing constraints at access points (APs). This paper examines distributed sensing that expands the sensing coverage area and monitors multiple object channels by employing STA-based sensing. To avoid issuing unnecessary reports, each STA autonomously judges whether it should make a report by comparing the importance of its own sensing result with that of the overheard report. We address how to efficiently collect the necessary sensing information from a large number of STAs. We propose a reactive reporting scheme that scales well with the number of STAs for collecting sensing results such as the channel occupancy ratio. Evaluation results show that the proposed scheme keeps the number of reports low even if the number of STAs increases. Our proposed sensing scheme provides large sensing coverage.
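The report-suppression rule can be sketched as follows, under strong simplifying assumptions: every STA overhears every earlier report, STAs get their reporting opportunity in list order, and the importance of a sensing result is a single number (for instance, a measured channel occupancy ratio). These assumptions and the function name are illustrative, not the scheme's actual design.

```python
def reactive_reports(importances):
    """Return the ids of STAs that actually send a report.

    An STA reports only if nothing has been overheard yet or if its own
    result is strictly more important than the best overheard report.
    """
    reports = []
    best_overheard = None
    for sta_id, value in enumerate(importances):
        if best_overheard is None or value > best_overheard:
            reports.append(sta_id)
            best_overheard = value
        # Otherwise the STA stays silent: its report would add nothing new.
    return reports

# Hypothetical importance values in contention order.
sent = reactive_reports([0.2, 0.5, 0.4, 0.9, 0.9])
```

Here only three of the five STAs transmit, and adding further STAs whose results do not exceed the current maximum adds no reports, which is the scaling behavior the abstract describes.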
Akio KAWABATA Bijoy CHAND CHATTERJEE Eiji OKI
This paper proposes an efficient server selection scheme for a successive participation scenario with participating-domain segmentation. The scheme is utilized by distributed processing systems for real-time interactive communication to suppress the communication latency of a wide-area network. In the proposed scheme, users participate in server selection one after another. The proposed scheme determines a recommended server, and a new user selects the recommended server first. Before each user participates, the recommended servers are determined assuming that users exist in the considered regions. A recommended server is determined for each divided region to minimize the latency. The new user selects the available recommended server of the region in which the user is located. We formulate an integer linear programming problem to determine the recommended servers. Numerical results indicate that, at the cost of additional computation, the proposed scheme offers smaller latency than the conventional scheme. We investigate different policies for dividing the users' participation in the recommended-server finding process of the proposed scheme.
Sanghun CHOI Shuichiro HARUTA Yichen AN Iwao SASASE
Since the owner's data might be leaked from centralized server storage, distributed storage schemes that include server storage have been investigated. To protect the owner's data, those schemes use Reed-Solomon codes. However, those schemes incur a data capacity burden, since the amount of parity data grows with the amount of disconnected data that can be restored. Moreover, the calculation time for restoration increases, since much parity data is needed to restore the disconnected data. To reduce the data capacity burden and the calculation time, we propose server-based distributed storage using Secret Sharing with AES-256 for lightweight and safe restoration. Although we use Secret Sharing, the owner's data are safely kept in the distributed storage, since all of the divided data are split into two pieces with AES-256 and stored in the peer storage and the server storage. Even though the server storage keeps the divided data, and the server and the peer storages might learn the pair of divided data via Secret Sharing, the owner's data remain secure in the proposed scheme against inner attacks on Secret Sharing. Furthermore, the owner's data can be restored from a small amount of parity data. The evaluations show that our proposed scheme offers improved lightness, stability, and safety.
Skip Graph is a promising distributed data structure for large-scale systems and is known for its support of range queries. Although several methods of routing range queries in Skip Graph have been proposed, they have inefficiencies such as a long path length or a large number of messages. In this paper, we propose a novel routing method for range queries named Split-Forward Broadcasting (SFB). SFB introduces a divide-and-conquer approach, enabling nodes to make full use of their routing tables to forward a range query. It brings about a shorter average path length than existing methods, as well as a smaller number of messages by avoiding duplicate transmission. We clarify the characteristics and effectiveness of SFB through both analytical and experimental comparisons. The results show that SFB can reduce the average path length by roughly 30% or more compared with a state-of-the-art method.