Duhu MAN Koji NAKANO Yasuaki ITO
The Hierarchical Memory Machine (HMM) is a theoretical parallel computing model that captures the essence of computing on CUDA-enabled GPUs. The approximate string matching (ASM) for two strings X and Y of length m and n is a task to find a substring of Y most similar to X. The main contribution of this paper is to show an optimal parallel algorithm for the approximate string matching on the HMM and implement it on GeForce GTX 580 GPU. Our algorithm runs in $O({nover w}+{mnover dw}+{nLover p}+{mnlover p})$ time units on the HMM with p threads, d streaming processors, memory band width w, global memory access latency L, and shared memory access latency l. We also show that the lower bound of the computing time is $Omega({nover w}+{mnover dw}+{nLover p}+{mnlover p})$ time units. Thus, our algorithm for the approximate string matching is time optimal. Further, we implemented our algorithm on GeForce GTX 580 GPU and evaluated the performance. The experimental results show that the ASM of two strings of 1024 and 4M (=222) characters can be done in 419.6ms, while the sequential algorithm can compute it in 27720ms. Thus, our implementation on the GPU attains a speedup factor of 66.1 over the single CPU implementation.
Saran TARNOI Wuttipong KUMWILAISAK Yusheng JI
This paper presents an optimal cooperative routing protocol (OCRP) aiming to improve the in-network cache utilization of the Content-Centric Networking (CCN). The objective of OCRP is to selectively aggregate the multiple flows of interest messages onto the same path in order to improve the cache utilization while mitigating the cache contention of the Content Stores (CSs) of CCN routers on the routing path. The proposed routing protocol consists of three processes: (1) Prefix Popularity Observation; (2) Prefix Group (Un)Subscription; and (3) Forwarding Information Base (FIB) Reconstruction. Prefix Popularity Observation observes the popularly cited prefixes to activate a prefix group (un)subscription function, which lets the Designated Router (DR) know which requester router wants to either join or leave a prefix group. Prefix Group (Un)Subscription lets the DR know which requester router is demanding to join or leave which prefix group. FIB Reconstruction reconstructs the FIB entries of the CCN routers involved in the newly computed optimal cooperative path of all prefix groups. The optimal routing path is obtained by binary linear optimization under a flow conservation constraint, cache contention mitigating constraint, and path length constraint. Two metrics of server load and round-trip hop distance are used to measure the performance of the proposed routing protocol. Simulation results from various network scenarios and various settings show advantages over the shortest path routing and our previously proposed cooperative routing schemes.
Hiroaki KONOURA Dawood ALNAJJAR Yukio MITSUYAMA Hajime SHIMADA Kazutoshi KOBAYASHI Hiroyuki KANBARA Hiroyuki OCHI Takashi IMAGAWA Kazutoshi WAKABAYASHI Masanori HASHIMOTO Takao ONOYE Hidetoshi ONODERA
This paper proposes a mixed-grained reconfigurable architecture consisting of fine-grained and coarse-grained fabrics, each of which can be configured for different levels of reliability depending on the reliability requirement of target applications, e.g. mission-critical applications to consumer products. Thanks to the fine-grained fabrics, the architecture can accommodate a state machine, which is indispensable for exploiting C-based behavioral synthesis to trade latency with resource usage through multi-step processing using dynamic reconfiguration. In implementing the architecture, the strategy of dynamic reconfiguration, the assignment of configuration storage and the number of implementable states are key factors that determine the achievable trade-off between used silicon area and latency. We thus split the configuration bits into two classes; state-wise configuration bits and state-invariant configuration bits for minimizing area overhead of configuration bit storage. Through a case study, we experimentally explore the appropriate number of implementable states. A proof-of-concept VLSI chip was fabricated in 65nm process. Measurement results show that applications on the chip can be working in a harsh radiation environment. Irradiation tests also show the correlation between the number of sensitive bits and the mean time to failure. Furthermore, the temporal error rate of an example application due to soft errors in the datapath was measured and demonstrated for reliability-aware mapping.
Shunzhi ZHU Ying MA Weiwei PAN Xiatian ZHU Guangchun LUO
A Balanced Neighborhood Classifier (BNEC) is proposed for class imbalanced data. This method is not only well positioned to capture the class distribution information, but also has the good merits of high-fitting-performance and simplicity. Experiments on both synthetic and real data sets show its effectiveness.
Daisuke FUKUDA Kenichi WATANABE Naoki IDANI Yuji KANAZAWA Masanori HASHIMOTO
As VLSI process node continue to shrink, chemical mechanical planarization (CMP) process for copper interconnect has become an essential technique for enabling many-layer interconnection. Recently, Edge-over-Erosion error (EoE-error), which originates from overpolishing and could cause yield loss, is observed in various CMP processes, while its mechanism is still unclear. To predict these errors, we propose an EoE-error prediction method that exploits machine learning algorithms. The proposed method consists of (1) error analysis stage, (2) layout parameter extraction stage, (3) model construction stage and (4) prediction stage. In the error analysis and parameter extraction stages, we analyze test chips and identify layout parameters which have an impact on EoE phenomenon. In the model construction stage, we construct a prediction model using the proposed multi-level machine learning method, and do predictions for designed layouts in the prediction stage. Experimental results show that the proposed method attained 2.7∼19.2% accuracy improvement of EoE-error prediction and 0.8∼10.1% improvement of non-EoE-error prediction compared with general machine learning methods. The proposed method makes it possible to prevent unexpected yield loss by recognizing EoE-errors before manufacturing.
Fumio TERAOKA Sho KANEMARU Kazuma YONEMURA Motoki IDE Shinji KAWAGUCHI Kunitake KANEKO
Using “clean-slate approach” to redesign the Internet has attracted considerable attention. ZNA (Z Network Architecture) is one of clean-slate network architectures based on the layered model. The major features of ZNA are as follows: (1) introducing the session layer to provide the applications with sophisticated communication services, (2) employing inter-node cross-layer cooperation to adapt to the dynamically changing network conditions, (3) splitting the node identifier and the node locator for mobility, multi-homing, and heterogeneity of network layer protocols, (4) splitting the data plane and the control plane for high manageability, and (5) introducing a recursive layered model to support network virtualization. This paper focuses on the first three topics as well as the basic design of ZNA.
Qian HU Muqing WU Hailong HAN Ning WANG Chaoyi ZHANG
As a promising future network architecture, Information-centric networking (ICN) has attracted much attention, its ubiquitous in-network caching is one of the key technologies to optimize the dissemination of information. However, considering the diversity of contents and the limitation of cache resources in the Internet, it is usually difficult to find a one-fit-all caching strategy. How to manage the ubiquitous in-network cache in ICN has become an important problem. In this paper, we explore ways to improve cache performance from the three perspectives of spatiality, temporality and availability, based on which we further propose an in-network cache management strategy to support differentiated service. We divide contents requested in the network into different levels and the selection of caching strategies depends on the content level. Furthermore, the corresponding models of utilizing cache resources in spatiality, temporality and availability are also derived for comparison and analysis. Simulation verifies that our differentiated service based cache management strategy can optimize the utilization of cache resources and get higher overall cache performance.
Akihiko KASAGI Koji NAKANO Yasuaki ITO
The Hierarchical Memory Machine (HMM) is a theoretical parallel computing model that captures the essence of computation on CUDA-enabled GPUs. The offline permutation is a task to copy numbers stored in an array a of size n to an array b of the same size along a permutation P given in advance. A conventional algorithm can complete the offline permutation by executing b[p[i]] ← a[i] for all i in parallel, where an array p stores the permutation P. We first present that the conventional algorithm runs $D_w(P)+2{nover w}+3L-3$ time units using n threads on the HMM with width w and latency L, where Dw(P) is the distribution of P. We next show that important regular permutations including transpose, shuffle, and bit-reversal permutations run $2{nover w}+2{nover kw}+2L-2$ time units on the HMM with k DMMs. We have implemented permutation algorithms for these regular permutations on GeForce GTX 680 GPU. The experimental results show that these algorithms run much faster than the conventional algorithm. We also present an offline permutation algorithm for any permutation running in $16{nover w}+16{nover kw}+16L-16$ time units on the HMM with k DMMs. Quite surprisingly, our offline permutation algorithm on the GPU achieves better performance than the conventional algorithm in random permutation, although the running time has a large constant factor. We can say that the experimental results provide a good example of GPU computation showing that a complicated but ingenious implementation with a larger constant factor in computing time can outperform a much simpler conventional algorithm.
As a prominent feature of Named Data Networking (NDN), in-network caching plays an important role in improving the performance of content delivery. However, if each NDN router indiscriminately caches every data packet passing by (i.e., Caching Everything Everywhere (CEE)), the result can be unnecessarily frequent cache replacement and cache redundancy in en-route routers and thus in-network caches are not utilized in an efficient way [1], [2]. Moreover, managing these in-network caches in a centralized way may lead to excessive resource consumption since the number of these caches is considerable. This work proposes a distributed and opportunistic on-path caching scheme. To be specific, each en-route router independently picks content items to cache in such a way that popular content is more likely to be cached by routers, especially routers near users, and cache redundancy is reduced. Extensive simulations including trace-driven ones in a PoP-level ISP topology suggest that the proposed scheme improves the average cache hit ratio of users' requests and reduces the average hop count as compared to CEE and the other on-path caching algorithms considered herein.
Ye GAO Masayuki SATO Ryusuke EGAWA Hiroyuki TAKIZAWA Hiroaki KOBAYASHI
Vector processors have significant advantages for next generation multimedia applications (MMAs). One of the advantages is that vector processors can achieve high data transfer performance by using a high bandwidth memory sub-system, resulting in a high sustained computing performance. However, the high bandwidth memory sub-system usually leads to enormous costs in terms of chip area, power and energy consumption. These costs are too expensive for commodity computer systems, which are the main execution platform of MMAs. This paper proposes a new multi-banked cache memory for commodity computer systems called MVP-cache in order to expand the potential of vector architectures on MMAs. Unlike conventional multi-banked cache memories, which employ one tag array and one data array in a sub-cache, MVP-cache associates one tag array with multiple independent data arrays of small-sized cache lines. In this way, MVP-cache realizes less static power consumption on its tag arrays. MVP-cache can also achieve high efficiency on short vector data transfers because the flexibility of data transfers can be improved by independently controlling the data transfers of each data array.
Xiong LUO Xiaohui CHANG Hong LIU
More recently, there has been a growing interest in the study of wireless sensor network (WSN) technologies for Interest of Things (IoT). To improve the positioning accuracy of mobile station under the non-line-of-sight (NLOS) environment, a localization algorithm based on the single-hidden layer feedforward network (SLFN) using extreme learning machine (ELM) for WSN is proposed in this paper. Optimal reduction in the time difference of arrival (TDOA) measurement error is achieved using SLFN optimized by ELM. Compared with those traditional learning algorithms, ELM has its unique feature of a higher generalization capability at a much faster learning speed. After utilizing the ELM by randomly assigning the parameters of hidden nodes in the SLFN, the competitive performance can be obtained on the optimization task for TDOA measurement error. Then, based on that result, Taylor algorithm is implemented to deal with the position problem of mobile station. Experimental results show that the effect of NLOS propagation is reduced based on our proposed algorithm by introducing the ELM into Taylor algorithm. Moreover, in the simulation, the proposed approach, called Taylor-ELM, provides better performance compared with some traditional algorithms, such as least squares, Taylor, backpropagation neural network based Taylor, and Chan positioning methods.
Qian HU Muqing WU Song GUO Hailong HAN Chaoyi ZHANG
Information-centric networking (ICN) is a promising architecture and has attracted much attention in the area of future Internet architectures. As one of the key technologies in ICN, in-network caching can enhance content retrieval at a global scale without requiring any special infrastructure. In this paper, we propose a workload-aware caching policy, LRU-GT, which allows cache nodes to protect newly cached contents for a period of time (guard time) during which contents are protected from being replaced. LRU-GT can utilize the temporal locality and distinguish contents of different popularity, which are both the characteristics of the workload. Cache replacement is modeled as a semi-Markov process under the Independent Reference Model (IRM) assumption and a theoretical analysis proves that popular contents have longer sojourn time in the cache compared with unpopular ones in LRU-GT and the value of guard time can affect the cache hit ratio. We also propose a dynamic guard time adjustment algorithm to optimize the performance. Simulation results show that LRU-GT can reduce the average hops to get contents and improve cache hit ratio.
Cloud computing, a novel distributed paradigm to provide powerful computing capabilities, is usually adopted by developers and researchers to execute complicated IoT applications such as complex workflows. In this scenario, it is fundamentally important to make an effective and efficient workflow application scheduling and execution by fully utilizing the advantages of the cloud (as virtualization and elastic services). However, in the current stage, there is relatively few research for workflow scheduling in cloud environment, where they usually just bring the traditional methods directly into cloud. Without considering the features of cloud, it may raise two kinds of problems: (1) The traditional methods mainly focus on static resource provision, which will cause the waste of resources; (2) They usually ignore the performance fluctuation of virtual machines on the physical machines, therefore it will lead to the estimation error of task execution time. To address these problems, a novel mechanism which can estimate the probability distribution of subtask execution time based on background VM load series over physical machines is proposed. An elastic performance fluctuations-aware stochastic scheduling algorithm is introduced in this paper. The experiments show that our proposed algorithm can outperform the existing algorithms in several metrics and can relieve the influence of performance fluctuations brought by the dynamic nature of cloud.
Operating system (OS) reboots are an essential part of updating kernels and applications on laptops and desktop PCs. Long downtime during OS reboots severely disrupts users' computational activities. This long disruption discourages the users from conducting OS reboots, failing to enforce them to conduct software updates. Although the dynamic updatable techniques have been widely studied, making the system “reboot-free” is still difficult due to their several limitations. As a result, users cannot benefit from new functionality or better performance, and even worse, unfixed vulnerabilities can be exploited by attackers. This paper presents ShadowReboot, a virtual machine monitor (VMM)-based approach that shortens downtime of OS reboots in software updates. ShadowReboot conceals OS reboot activities from user's applications by spawning a VM dedicated to an OS reboot and systematically producing the rebooted state where the updated kernel and applications are ready for use. ShadowReboot provides an illusion to the users that the guest OS travels forward in time to the rebooted state. ShadowReboot offers the following advantages. It can be used to apply patches to the kernels and even system configuration updates. Next, it does not require any special patch requiring detailed knowledge about the target kernels. Lastly, it does not require any target kernel modification. We implemented a prototype in VirtualBox 4.0.10 OSE. Our experimental results show that ShadowReboot successfully updated software on unmodified commodity OS kernels and shortened the downtime of commodity OS reboots on five Linux distributions (Fedora, Ubuntu, Gentoo, Cent, and SUSE) by 91 to 98%.
Masahiro NISHI Koichi SHIN Teruaki YOSHIDA
In the digital terrestrial TV broadcasting system, it is important to evaluate both quantitative levels and sources of overreach interference, because it can degrade the TV service quality. This paper newly proposes an overreach measurement method that simultaneously monitors RSSI (Received Signal Strength Indicator) and CNR (Carrier to Noise power Ratio) of the TV waves and RSSI of FM waves. The results of measurements conducted in Hiroshima prefecture show that our proposed method can evaluate the level of overreach interference in the TV waves and also identify the source of the interference. Total 43 overreach interference events were found in the proposed method from one-year measurement in 2012. Based on M profile data, this paper also shows that the main factor of the overreach interference in this measurement is duct propagation due to meteorological condition.
Ju Hee CHOI Jong Wook KWAK Chu Shik JHON
Non-Volatile Memories (NVMs) are considered as promising memory technologies for Last-Level Cache (LLC) due to their low leakage and high density. However, NVMs have some drawbacks such as high dynamic energy in modifying NVM cells, long latency for write operation, and limited write endurance. A number of approaches have been proposed to overcome these drawbacks. But very little attention is paid to consider the cache coherency issue. In this letter, we suggest a new cache coherence protocol to reduce the write operations of the LLC. In our protocol, the block data of the LLC is updated only if the cache block is written-back from a private cache, which leads to avoiding useless write operations in the LLC. The simulation results show that our protocol provides 27.1% energy savings and 26.3% lifetime improvements in STT-RAM at maximum.
Kriangkrai LIMTHONG Kensuke FUKUDA Yusheng JI Shigeki YAMADA
Detecting a variety of anomalies caused by attacks or accidents in computer networks has been one of the real challenges for both researchers and network operators. An effective technique that could quickly and accurately detect a wide range of anomalies would be able to prevent serious consequences for system security or reliability. In this article, we characterize detection techniques on the basis of learning models and propose an unsupervised learning model for real-time anomaly detection in computer networks. We also conducted a series of experiments to examine capabilities of the proposed model by employing three well-known machine learning algorithms, namely multivariate normal distribution, k-nearest neighbor, and one-class support vector machine. The results of these experiments on real network traffic suggest that the proposed model is a promising solution and has a number of flexible capabilities to detect several types of anomalies in real time.
Han-Peng JIANG Ming-Lung WENG Wei-Mei CHEN
Now that the subject of green computing is receiving a lot of attention, the energy consumption of datacenters has emerged as a significant issue. Consolidation of Virtual Machines (VMs) reduces the energy consumption since VM live migration not only optimizes VM placement, but also switches idle nodes to sleep mode. However, VM migration may negatively impact the performance of the system and lead to violations in SLA (Service Level Agreement) requirements between end users and cloud providers. In this study, we propose a VM consolidation mechanism that reduces the energy consumption of datacenters, eliminates unnecessary migrations, and minimizes the SLA violations. Compared to previous studies, the proposed policy shows a reduction of 2% to 3% in energy consumption, 13% to 41% in VM migration frequency, and 15% to 50% in SLA violations.
Raissa RELATOR Yoshihiro HIROHASHI Eisuke ITO Tsuyoshi KATO
Classification tasks in computer vision and brain-computer interface research have presented several applications such as biometrics and cognitive training. However, like in any other discipline, determining suitable representation of data has been challenging, and recent approaches have deviated from the familiar form of one vector for each data sample. This paper considers a kernel between vector sets, the mean polynomial kernel, motivated by recent studies where data are approximated by linear subspaces, in particular, methods that were formulated on Grassmann manifolds. This kernel takes a more general approach given that it can also support input data that can be modeled as a vector sequence, and not necessarily requiring it to be a linear subspace. We discuss how the kernel can be associated with the Projection kernel, a Grassmann kernel. Experimental results using face image sequences and physiological signal data show that the mean polynomial kernel surpasses existing subspace-based methods on Grassmann manifolds in terms of predictive performance and efficiency.
Caching is considered widely as an efficient way to reduce access latency and network bandwidth consumption. En-route caching, where caches are associated with routing nodes in the network, is proposed in the context of Web cache to exploit fully the potential of caching. To make sensible replacement and placement decision for en-route caching, traditional caching schemes either engage computation-intensive algorithm like dynamic programming or suffer from inferior performance in terms of average access latency. In this article, we propose a new caching scheme with cost-based replacement and random placement, which is named CRRP. The cost-based replacement of CRRP introduces probing request to timely perceive cost change and the random placement is independent of current caching state, of O(1) computational complexity of placement decision. Through extensive simulations, we show that CRRP outperforms a wide range of caching schemes and is very close to the traditional dynamic-programming-based algorithm, in terms of average access delay.