Koya MITSUZUKA Michihiro KOIBUCHI Hideharu AMANO Hiroki MATSUTANI
In parallel processing applications, a few worker nodes called “stragglers”, which execute their tasks significantly slower than the other nodes, increase the execution time of the job. In this paper, we propose a network-switch-based straggler handling system to mitigate the burden on the compute nodes. We also propose a method to offload straggler detection and the computation of their results to the network switch with no additional communication between worker nodes. We introduce several approximation techniques for the proxy computation and response at the switch; thus our switch is called “ApproxSW.” In simulation experiments, the proposed approximation based on task similarity achieves the best accuracy in terms of the quality of generated Map outputs. We also analyze how to suppress unnecessary proxy computation by the ApproxSW. We implement ApproxSW on a NetFPGA-SUME board that has four 10Gbit Ethernet (10GbE) interfaces and a Virtex-7 FPGA. Experimental results show that the ApproxSW functions do not degrade the original 10GbE switch performance.
David W. McKEE Xue OUYANG Jie XU
With the evolution of autonomous distributed systems such as smart cities, autonomous vehicles, and smart control and scheduling systems, there is an increased need for approaches to manage the execution of services to deliver real-time performance. As Cloud-hosted services are increasingly used to provide intelligence and analytic functionality to Internet of Things (IoT) systems, Quality of Service (QoS) techniques must be used to guarantee timely service delivery. This paper reviews state-of-the-art QoS and Cloud techniques for real-time service delivery and data analysis. A review of straggler mitigation and a classification of real-time QoS techniques are provided. Then a mathematical framework is presented that captures the relationship between the host execution environment and the executing service, allowing response times to be predicted throughout execution. The framework is shown experimentally to reduce the number of QoS violations by 21% and to provide alerts during the first 14ms for 94% of future violations.