Shuai MU Dongdong LI Yubei CHEN Yangdong DENG Zhihua WANG
By exploiting data-level parallelism, Graphics Processing Units (GPUs) have become a high-throughput, general-purpose computing platform. Many real-world applications, especially those following a stream processing pattern, however, feature interleaved task-pipelined and data parallelism. Current GPUs are ill equipped for such applications because of insufficient usage of computing resources and/or excessive off-chip memory traffic. In this paper, we focus on microarchitectural enhancements to enable task-pipelined execution of data-parallel kernels on GPUs. We propose an efficient adaptive dynamic scheduling mechanism and a moderately modified L2 design. With minor hardware overhead, our techniques orchestrate both task-pipeline and data parallelism in a unified manner. Results from a cycle-accurate simulator on real-world applications show that the proposed GPU microarchitecture improves the computing throughput by 18% and reduces the overall accesses to off-chip GPU memory by 13%.
Nagao OGINO Hideyuki KOTO Hajime NAKAMURA Shigehiro ANO
As a network evolves following initial deployment, its service functions remain diversified through the openness of the network functions. This indicates that appropriate simplification of the service functions is essential if the evolving network is to achieve the required scalability of service processing and service management. While the screening of service functions is basically performed by network users and the market, several service functions will be automatically simplified based on the growth of the evolving network. This paper verifies the simplification of service functions resulting from the evolution of the network itself. First, the principles that serve as the basis for simplifying the service functions are explained using several practical examples. Next, a simulation model is proposed to verify the simplification of service functions in terms of the priority control function for path routing and load balancing among multiple paths. From the results of the simulation, this study clarifies that the anticipated simplification of service functions is actually realizable and the service performance requirements can be reduced as the network evolves after deployment. When the simplification of service functions can improve network quality, it accelerates the evolution of the network and increases the operator's revenue.
Khamphao SISAAT Hiroaki KIKUCHI Shunji MATSUO Masato TERADA Masashi FUJIWARA Surin KITTITORNKUN
A botnet attacks victim hosts via multiple Command and Control (C&C) servers, which are controlled by a botmaster. This makes botnet attacks difficult to detect, and the lack of logged data about the attacks makes it hard to trace the source country of the botmaster. To locate the C&C servers during the malware/bot downloading phase, we analyzed the source IP addresses of downloads to more than 90 independent honeypots in Japan in the CCC (Cyber Clean Center) dataset 2010, which comprises over 1 million data records and almost 1 thousand malware names. Based on GeoIP services, we propose a Time Zone Correlation model to determine the correlation coefficient between bot downloads from Japan and those from other source countries. We found a strong correlation between active malware/bot downloads and the time zone of the C&C servers. As a result, our model confirms that malware/bot downloads are synchronized with the time zone (country) of the corresponding C&C servers, so that the botmaster can possibly be traced.
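The abstract does not say how the correlation coefficient is computed. As a minimal sketch, assuming it is a Pearson coefficient over 24-bin hourly download histograms, the comparison between two source countries could look like the following. The function names `pearson` and `shift_hours` are hypothetical, and any counts passed in would be illustrative rather than taken from the CCC dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def shift_hours(counts, offset):
    """Rotate an hourly histogram so the value at hour i moves to hour i + offset.

    Assumption: a time zone difference is modeled as a cyclic shift of the bins.
    """
    offset %= len(counts)
    if offset == 0:
        return list(counts)
    return list(counts[-offset:]) + list(counts[:-offset])
```

Under this assumption, one would shift one country's histogram by the time zone difference before calling `pearson`, and read a coefficient near 1 as downloads following the same daily rhythm.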
Hae-Chang JEONG Kyung-Whan YEOM
In this paper, a systematic design of an X-band (9–10 GHz) 40 W pulse-driven GaN HEMT power amplifier is presented. The design includes device evaluation, verification of the designed matching circuits, and measurements of the designed power amplifier. First, the optimum input and output impedances for the selected GaN HEMT chip from TriQuint Semiconductor Inc. are evaluated using load-pull measurement. The selected GaN HEMT shows extremely low optimum impedances, which are obtained using a pre-match load-pull method because of the limited tuning impedance range of conventional impedance tuners. We propose a novel extraction of the optimum impedances with general pre-match circuits. The extracted optimum impedances are found to be close to those computed using the large signal model supplied by TriQuint Semiconductor. Using the optimum impedances, the matching circuits of the power amplifier are designed employing the combined impedance transformer type based on EM co-simulation. The fabricated power amplifier has a size of 15×17.8 mm², an efficiency above 45%, power gain of 7.7–9.9 dB and output power of 47–44.8 dBm at 9–10 GHz with a pulse width of 10 µs and a duty of 10%.
Kazuki SUGENO Shinpei NOGUCHI Mamiko INAMORI Yukitoshi SANADA
Wireless power transfer research has recently been attracting a great deal of attention. To transfer power efficiently and safely in a wireless power transfer system, information such as frequency, required power, and element values needs to be transmitted reliably. However, the bandwidth used for exchanging information is affected by the change of load at the receiver while it is charging. This paper investigates the effect of load fluctuation on data communication using orthogonal frequency division multiplexing (OFDM) modulation in resonant-type wireless power transfer systems. The equivalent circuit used in the transmitting and receiving antennas is a band pass filter (BPF), and its bandwidth is evaluated through circuit simulations. Numerical results obtained through computer simulation show that the bit error rate (BER) performance is affected by the load fluctuation and the efficiency of power transfer.
Shunichi TSUNODA Abu Hena Al MUKTADIR Eiji OKI
Smart OSPF (S-OSPF), a load balancing, shortest-path-based routing scheme, was introduced to improve the routing performances of networks running on OSPF assuming that exact traffic demands are known. S-OSPF distributes traffic from a source node to neighbor nodes, and after reaching the neighbor nodes, traffic is routed according to the OSPF protocol. However, in practice, exact traffic demands are difficult to obtain, and the distribution of unequal traffic to multiple neighbor nodes requires complex functionalities at the source. This paper investigates non-split S-OSPF with the hose model, in which only the total amount of traffic that each node injects into the network and the total amount of traffic each node receives from the network are known, for the first time, with the goal of minimizing the network congestion ratio (maximum link utilization over all links). In non-split S-OSPF, traffic from a source node to a destination node is not split over multiple routes, in other words, it goes via only one neighbor node to the destination node. The routing decision with the hose model is formulated as an integer linear programming (ILP) problem. Since the ILP problem is difficult to solve in a practical time, this paper proposes a heuristic algorithm. In the routing decision process, the proposed algorithm gives the highest priority to the node pair that has the highest product of the total amount of injected traffic by one node and total amount of received traffic by the other node in the pair, where both traffic volumes are specified in the hose model, and enables a source node to select the neighbor node that minimizes network congestion ratio for the worst case traffic condition specified by the hose model. The non-split S-OSPF scheme's network congestion ratios are compared with those of the split S-OSPF and classical shortest path routing (SPR) schemes. 
Numerical results show that the non-split S-OSPF scheme offers lower network congestion ratios than the classical SPR scheme, and achieves network congestion ratios comparable to those of the split S-OSPF scheme for larger networks. To validate the non-split S-OSPF scheme experimentally on a testbed network, we develop prototypes of the non-split S-OSPF path computation server and the non-split S-OSPF router. The functionalities of these prototypes are demonstrated in a non-split S-OSPF network.
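The prioritization step of the heuristic described in this abstract can be sketched as follows. This covers only the ordering of node pairs by the product of injected and received traffic. The subsequent neighbor selection, which needs the worst-case congestion ratio under the hose model, is omitted. The function name `prioritized_pairs` and the dictionary inputs are assumptions made for illustration, not the authors' implementation.

```python
from itertools import permutations


def prioritized_pairs(injected, received):
    """Order (source, destination) pairs by injected[source] * received[destination].

    injected: dict mapping node -> total traffic the node injects (hose model)
    received: dict mapping node -> total traffic the node receives (hose model)
    Both dicts are assumed to share the same set of nodes.
    Returns the ordered pairs of distinct nodes, highest product first.
    """
    if set(injected) != set(received):
        raise ValueError("injected and received must cover the same nodes")
    pairs = permutations(injected, 2)  # all ordered pairs of distinct nodes
    return sorted(
        pairs,
        key=lambda pair: injected[pair[0]] * received[pair[1]],
        reverse=True,
    )
```

A routing loop would then walk this list and, for each pair, pick the single neighbor of the source that keeps the worst-case congestion ratio lowest.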
Xiangxu MENG Xiaodong WANG Xinye LIN
GPS trajectory databases serve as the basis for many intelligent applications that need to extract trajectories for future processing or mining. For such tasks, methods based on spatio-temporal range queries, which find all sub-trajectories within a given spatial extent and time interval, are commonly used. However, the historical trajectory indexes of such methods suffer from two problems. First, temporal and spatial factors are not considered simultaneously, resulting in low performance when processing spatio-temporal queries. Second, the efficiency of the indexes is sensitive to query size: the query performance changes dramatically as the query size changes. This paper proposes the workload-aware Adaptive OcTree based Trajectory clustering Index (ATTI), which aims at optimizing trajectory storage and index performance. The contributions are threefold. First, the distribution and time delay of the trajectory storage are introduced into the cost model of the spatio-temporal range query. Second, the distribution of spatial division is dynamically adjusted based on the GPS update workload. Third, a query workload adaptive mechanism is proposed based on a virtual OcTree forest. A wide range of experiments are carried out on the Microsoft GeoLife project dataset, and the results show that the query delay of ATTI can be about 50% shorter than that of the nested index.
Hathaithip NINSONTI Weerasak CHOMKITICHAI Akira BABA Wiyong KANGWANSUPAMONKON Sukon PHANICHPHANT Kazunari SHINBO Keizo KATO Futao KANEKO
We report enhanced photocurrent properties of dye/Au-loaded titanium dioxide (TiO2) films on Au gratings. Au-loaded TiO2 nanopowders were first synthesized by a modified sol-gel method and then prepared by the impregnation method. We also fabricated dye-sensitized solar cells, which were composed of Au grating/Au-TiO2/TMPyP-SCC LbL (20 bilayers)/electrolyte/ITO substrates. Short-circuit photocurrent measurements showed that Au-loaded TiO2 with grating-coupled surface plasmon excitation can enhance the short-circuit photocurrent of the fabricated cells.
Bo GU Kyoko YAMORI Sugang XU Yoshiaki TANAKA
Recent studies have shown that the traffic load is often distributed unevenly among the access points. Such load imbalance results in an ineffective bandwidth utilization. The load imbalance and the consequent ineffective bandwidth utilization could be alleviated via intelligently selecting user-AP associations. In this paper, the diversity in users' utilities is sufficiently taken into account, and a Stackelberg leader-follower game is formulated to obtain the optimal user-AP association. The effectiveness of the proposed algorithm on improving the degree of load balance is evaluated via simulations. Simulation results show that the performance of the proposed algorithm is superior to or at least comparable with the best existing algorithms.
Jun'ichi SHIMADA Hitomi TAMURA Masato UCHIDA Yuji OIE
Congestion inherently occurs on the Internet due to traffic concentration on certain nodes or links of networks. The traffic concentration is caused by inefficient use of the topological information of networks in existing routing protocols, which results in inefficient mapping between traffic demands and network resources. In fact, the minimum-cost route, i.e., the route with the fewest hops, selected as a transmission route by existing routing protocols tends to pass through specific nodes with common topological characteristics that contribute to a large reduction in the cost. However, this results in traffic concentration on such specific nodes. Therefore, we propose a measure of the distance between two nodes that is suitable for reducing traffic concentration on specific nodes. To consider the topological characteristics of the congestion points of networks, we define the node-to-node distance by using a generalized norm, the p-norm, of a vector whose elements are the degrees of the intermediate nodes of the route. Simulation results show that both the maximum Stress Centrality (SC) and the coefficient of variation of the SC are minimized in some network topologies by selecting transmission routes based on the proposed measure of node-to-node distance.
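The distance measure described in this abstract can be illustrated with a short sketch. It assumes that "intermediate nodes" means every node on the route except the two endpoints, and that a route with no intermediate node has distance 0. Both are assumptions, as is the function name `p_norm_distance`.

```python
def p_norm_distance(route, degree, p):
    """p-norm of the degrees of the intermediate nodes on a route.

    route:  list of nodes from source to destination, endpoints included
    degree: dict mapping node -> node degree
    p:      norm order, p > 0
    """
    if p <= 0:
        raise ValueError("p must be positive")
    intermediate = route[1:-1]  # assumption: endpoints are excluded
    if not intermediate:
        return 0.0  # assumption: adjacent nodes are at distance 0
    return sum(degree[node] ** p for node in intermediate) ** (1.0 / p)
```

With p = 1 the measure is the sum of the intermediate degrees, and as p grows it approaches the largest intermediate degree, so larger values of p increasingly penalize routes that cross a single high-degree hub.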
Ken TANAKA Kenji MIKAWA Manabu HIKIN
Network devices, such as routers or L3 switches, have a feature called packet-filtering for network security. They determine whether or not to pass arriving packets by applying filtering rules to them. If the number of comparisons of packets with rules increases, the time required for a determination will increase, which will result in greater communication delay. Various algorithms for optimizing filtering tables to minimize the load of packet filtering, which directly impacts the communication delay, have been proposed. In this paper, first we introduce an adaptive packet filter based on an algorithm that reconstructs the filtering table according to the frequency distribution of arrival packets. Next, we propose a new reconstruction algorithm based on grouping of dependent rules. Grouping dependent rules makes it possible to sort the rules in the table by the frequency of matching. Finally, we show the effectiveness of our algorithm by comparing it against previously reported algorithms.
Yousic LEE Jae-Dong LEE Taekeun PARK
In this letter, for offloading traffic to Wireless Local Area Network (WLAN) with transport layer mobility where WLAN service is intermittently available, we propose a novel scheme to freeze and melt the timeout handling procedure of SCTP. Simulation results show that the proposed scheme significantly improves the performance in terms of file transfer completion time.
Terutaka TAMAI Shigeru SAWADA Yasuhiro HATTORI
Tin (Sn) is widely applied to connector contacts. The surface of a plated tin layer is covered with an oxide film that results in high contact resistance. However, low contact resistance can be obtained by using a high contact load. Current downsizing trends often make it difficult to obtain high contact loads. Therefore, it is important to conduct basic studies of contact resistance characteristics under low contact load conditions. In this study, relationships between contact resistance and changes of contact traces were examined. When a platinum (Pt) hemisphere was brought into contact with a tin-plated flat coupon, the hemisphere surface sank into the softer tin-plated flat surface during loading, resulting in tin crystal grains piling up along the periphery of the contact trace. During this process, a sudden decrease in contact resistance was observed. To clarify the phenomenon, morphology changes of contact traces were observed by AFM, SEM and EBSD. FEM analysis was also used to analyze the mechanical stress distribution in the tin-plated layer. Due to the peculiar distribution of stress, the crystal grains are separated and pushed out of the contact area. This phenomenon is very different from the commonly observed decrease in contact resistance due to elastic and plastic deformation inducing mechanical fracture of the surface oxide film.
Paulo GONÇALVES Shubhabrata ROY Thomas BEGIN Patrick LOISEAU
Dynamic resource management has become an active area of research in the Cloud Computing paradigm. The cost of resources varies significantly depending on the configuration used. Hence, efficient management of resources is of prime interest to both Cloud Providers and Cloud Users. In this work we suggest a probabilistic resource provisioning approach that can be exploited as the input of a dynamic resource management scheme. Using a Video on Demand (VoD) use case to justify our claims, we propose an analytical model, inspired by standard models developed for epidemic spreading, to represent sudden and intense workload variations. We show that the resulting model satisfies a Large Deviation Principle that statistically characterizes extreme rare events, such as the ones produced by “buzz/flash crowd effects” that may cause workload overflow in the VoD context. This analysis provides valuable insight into the abnormal behaviors that can be expected of systems. We exploit the information obtained using the Large Deviation Principle for the proposed VoD use case to define policies (Service Level Agreements). We believe these policies for elastic resource provisioning and usage may be of some interest to all stakeholders in the emerging context of cloud networking.
This paper presents the basic characteristics of a beam tilting slot antenna element whose forced resonance is realized by reactance loading; its structure complements that of a dipole antenna element. The radiation pattern is tilted using a properly determined driving point position; a single loading reactance is used to obtain the forced resonance without great changes in the tilt angle. Numerical results show that the reactance element needs to be loaded near the driving point in order to obtain the forced resonance of the antenna and the minimum changes in the beam tilt angle at the same time. When the proposed forced resonant beam tilting slot antenna with a 0.8 λ length is driven at -0.2 λ from the center, the main beam tilt angle of 57.7 degrees and the highest power gain of 3.8 dB are obtained. This slot element has a broad bandwidth, unlike the complementary dipole element.
This paper proposes a new three-mode resonator, which consists of a parallel-coupled microstrip line resonator embedded with a slotline resonator, and develops a compact low-loss bandpass filter (BPF) with a sharp roll-off response because of four transmission zeros (TZ) located very near the passband. Resonance mechanism and properties of the three modes are first analyzed by using an eigen-mode analysis, and then an equivalent circuit model is established for expressing a novel coupling scheme of the developed BPF. It is made clear from the results of circuit analysis that the four TZs are produced because of multiple paths between the input/output stub lines formed by the three resonant modes and the direct source/load coupling. The validity of the proposed resonator and filter is supported by the comparison between simulated and measured results.
Haoliang SUN Xiaohui HU Lixiang LIU
Existing routing protocols for the interplanetary backbone network do not consider future link connection and link congestion. This letter proposes a novel routing protocol named CAMARP for the interplanetary backbone network. We use wait delay to account for future link connection and make the best next hop selection. A load balancing mechanism is used to avoid congestion. The proposed method leads to a better and more efficient distribution of traffic, as well as lower packet drop rates and higher throughput. CAMARP demonstrates good performance in the experiment.
Tuan Anh LE Rim HAW Choong Seon HONG Sungwon LEE
Cubic TCP, one of the transport protocols designed for high bandwidth-delay product (BDP) networks, has been successfully deployed in the Internet. Multi-homed computers with multiple interfaces to access the Internet via high speed links will become more popular. In this work, we introduce an extended version of Cubic TCP for multiple paths, called MPCubic. The extension is derived from an analysis model of Cubic by using coordinated congestion control between paths. MPCubic can spread its traffic across paths in a load-balancing manner, while preserving fair sharing with regular TCP, Cubic, and MPTCP at common bottlenecks. Moreover, to improve resilience to link failure, we propose a multipath fast recovery algorithm. The algorithm can significantly reduce the recovery time of data rate after restoration of failed links. These techniques can be useful for resilient high-bandwidth applications (for example, tele-health conference) in disaster-affected areas. Our simulation results show that MPCubic can achieve stability, throughput improvement, fairness, load-balancing, and quick data rate recovery from link failure under a variety of network conditions.
Lazy-task creation is an efficient method of overcoming the overhead of the grain-size problem in parallel computing. Work stealing is an effective load balancing strategy for parallel computing. In this paper, we present dynamic work stealing strategies in a lazy-task creation technique for efficient fine-grain task scheduling. The basic idea is to control load balancing granularity depending on the number of task parents in a stack. The dynamic-length strategy of work stealing uses run-time information, which is information on the load of the victim, to determine the number of tasks that a thief is allowed to steal. We compare it with the bottommost first work stealing strategy used in StackThread/MP, and the fixed-length strategy of work stealing, where a thief requests to steal a fixed number of tasks, as well as other multithreaded frameworks such as Cilk and OpenMP task implementations. The experiments show that the dynamic-length strategy of work stealing performs well in irregular workloads such as in UTS benchmarks, as well as in regular workloads such as Fibonacci, Strassen's matrix multiplication, FFT, and Sparse-LU factorization. The dynamic-length strategy works better than the fixed-length strategy because it is more flexible than the latter; this strategy can avoid load imbalance due to overstealing.
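The abstract does not give the rule that maps the victim's load to the number of stolen tasks. As a rough sketch of a dynamic-length steal, the following swaps in a simple proportional policy (steal a fixed fraction of whatever the victim currently holds, half by default). The `steal` function, the `fraction` parameter, and the use of a deque as the task stack are all assumptions for illustration and do not reproduce the StackThread/MP implementation.

```python
from collections import deque


def steal(victim, thief, fraction=0.5):
    """Move the oldest tasks from the victim's stack to the thief's stack.

    The victim's owner is assumed to push and pop at the right end, so the
    left end holds the oldest (bottommost) tasks. The number of tasks taken
    depends on the victim's current load, which is run-time information.
    Returns the number of tasks moved.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    count = int(len(victim) * fraction)  # assumption: proportional policy
    for _ in range(count):
        thief.append(victim.popleft())
    return count
```

Because the count scales with the victim's load, a victim holding a single task loses nothing under the default fraction, which is one way to avoid the overstealing that a fixed-length request can cause.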
Tuan Anh LE Choong Seon HONG Sungwon LEE
Nowadays, portable devices with multiple wireless interfaces that use multimedia services are becoming more popular on the Internet. This paper describes a family of multipath binomial congestion control algorithms for audio/video streaming, where a low variance of transmission rate is important. We extend the fluid model of binomial algorithms for single-path transmission to support the concurrent transmission of packets across multiple paths. We focus on the extension of two particular algorithms, SQRT and IIAD, for multiple paths, called MPSQRT and MPIIAD, respectively. Additionally, we apply the design technique (using the multipath fluid model) for multipath TCP (MPTCP) to the extension of SQRT and IIAD, called fbMPSQRT and fbMPIIAD, respectively. Both approaches ensure that multipath binomial congestion control algorithms achieve load-balancing, throughput improvement, and fairness to single-path binomial algorithms at shared bottlenecks. Through simulations and comparison with the uncoordinated protocols MPSQRT/MPIIAD, fbMPSQRT/fbMPIIAD and MPTCP, we find that our extended multipath transport protocols can preserve lower latency and transmission rate variance than MPTCP, fairly share with single-path SQRT/IIAD, MPTCP and TCP, and also can achieve throughput improvements and load-balancing equivalent to those of MPTCP under various scenarios and network conditions.
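For background, the single-path binomial window rules that this abstract builds on can be sketched as below. This follows the standard definition of the binomial family (increase by α/w^k per round trip, decrease by β·w^l on loss), with SQRT using k = l = 0.5 and IIAD using k = 1, l = 0. It is not the multipath extension proposed in the paper, and the function names, default constants, and the one-packet window floor are assumptions.

```python
def binomial_increase(window, k, alpha=1.0):
    """Window growth applied once per round trip without loss: w + alpha / w**k."""
    return window + alpha / (window ** k)


def binomial_decrease(window, l, beta=0.5, floor=1.0):
    """Window reduction applied on a loss event: w - beta * w**l.

    The floor of one packet is an assumption added to keep the window positive.
    """
    return max(floor, window - beta * (window ** l))


# Exponents (k, l) for members of the binomial family.
SQRT = {"k": 0.5, "l": 0.5}
IIAD = {"k": 1.0, "l": 0.0}
AIMD = {"k": 0.0, "l": 1.0}  # TCP-style additive increase, multiplicative decrease
```

With a window of 16 packets, a loss cuts AIMD to 8 but SQRT only to 14, which shows why the gentler members of the family give the smoother transmission rate that streaming applications prefer.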