The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] ACH(1072hit)

381-400hit(1072hit)

  • An Optimal Implementation of the Approximate String Matching on the Hierarchical Memory Machine, with Performance Evaluation on the GPU

    Duhu MAN  Koji NAKANO  Yasuaki ITO  

     
    PAPER-GPU

      Vol:
    E97-D No:12
      Page(s):
    3063-3071

    The Hierarchical Memory Machine (HMM) is a theoretical parallel computing model that captures the essence of computing on CUDA-enabled GPUs. The approximate string matching (ASM) for two strings X and Y of length m and n is a task to find a substring of Y most similar to X. The main contribution of this paper is to show an optimal parallel algorithm for the approximate string matching on the HMM and implement it on GeForce GTX 580 GPU. Our algorithm runs in $O({nover w}+{mnover dw}+{nLover p}+{mnlover p})$ time units on the HMM with p threads, d streaming processors, memory band width w, global memory access latency L, and shared memory access latency l. We also show that the lower bound of the computing time is $Omega({nover w}+{mnover dw}+{nLover p}+{mnlover p})$ time units. Thus, our algorithm for the approximate string matching is time optimal. Further, we implemented our algorithm on GeForce GTX 580 GPU and evaluated the performance. The experimental results show that the ASM of two strings of 1024 and 4M (=222) characters can be done in 419.6ms, while the sequential algorithm can compute it in 27720ms. Thus, our implementation on the GPU attains a speedup factor of 66.1 over the single CPU implementation.

  • Optimal Cooperative Routing Protocol for Efficient In-Network Cache Management in Content-Centric Networks

    Saran TARNOI  Wuttipong KUMWILAISAK  Yusheng JI  

     
    PAPER

      Vol:
    E97-B No:12
      Page(s):
    2627-2640

    This paper presents an optimal cooperative routing protocol (OCRP) aiming to improve the in-network cache utilization of the Content-Centric Networking (CCN). The objective of OCRP is to selectively aggregate the multiple flows of interest messages onto the same path in order to improve the cache utilization while mitigating the cache contention of the Content Stores (CSs) of CCN routers on the routing path. The proposed routing protocol consists of three processes: (1) Prefix Popularity Observation; (2) Prefix Group (Un)Subscription; and (3) Forwarding Information Base (FIB) Reconstruction. Prefix Popularity Observation observes the popularly cited prefixes to activate a prefix group (un)subscription function, which lets the Designated Router (DR) know which requester router wants to either join or leave a prefix group. Prefix Group (Un)Subscription lets the DR know which requester router is demanding to join or leave which prefix group. FIB Reconstruction reconstructs the FIB entries of the CCN routers involved in the newly computed optimal cooperative path of all prefix groups. The optimal routing path is obtained by binary linear optimization under a flow conservation constraint, cache contention mitigating constraint, and path length constraint. Two metrics of server load and round-trip hop distance are used to measure the performance of the proposed routing protocol. Simulation results from various network scenarios and various settings show advantages over the shortest path routing and our previously proposed cooperative routing schemes.

  • Reliability-Configurable Mixed-Grained Reconfigurable Array Supporting C-Based Design and Its Irradiation Testing

    Hiroaki KONOURA  Dawood ALNAJJAR  Yukio MITSUYAMA  Hajime SHIMADA  Kazutoshi KOBAYASHI  Hiroyuki KANBARA  Hiroyuki OCHI  Takashi IMAGAWA  Kazutoshi WAKABAYASHI  Masanori HASHIMOTO  Takao ONOYE  Hidetoshi ONODERA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E97-A No:12
      Page(s):
    2518-2529

    This paper proposes a mixed-grained reconfigurable architecture consisting of fine-grained and coarse-grained fabrics, each of which can be configured for different levels of reliability depending on the reliability requirement of target applications, e.g. mission-critical applications to consumer products. Thanks to the fine-grained fabrics, the architecture can accommodate a state machine, which is indispensable for exploiting C-based behavioral synthesis to trade latency with resource usage through multi-step processing using dynamic reconfiguration. In implementing the architecture, the strategy of dynamic reconfiguration, the assignment of configuration storage and the number of implementable states are key factors that determine the achievable trade-off between used silicon area and latency. We thus split the configuration bits into two classes; state-wise configuration bits and state-invariant configuration bits for minimizing area overhead of configuration bit storage. Through a case study, we experimentally explore the appropriate number of implementable states. A proof-of-concept VLSI chip was fabricated in 65nm process. Measurement results show that applications on the chip can be working in a harsh radiation environment. Irradiation tests also show the correlation between the number of sensitive bits and the mean time to failure. Furthermore, the temporal error rate of an example application due to soft errors in the datapath was measured and demonstrated for reliability-aware mapping.

  • Balanced Neighborhood Classifiers for Imbalanced Data Sets

    Shunzhi ZHU  Ying MA  Weiwei PAN  Xiatian ZHU  Guangchun LUO  

     
    LETTER-Pattern Recognition

      Vol:
    E97-D No:12
      Page(s):
    3226-3229

    A Balanced Neighborhood Classifier (BNEC) is proposed for class imbalanced data. This method is not only well positioned to capture the class distribution information, but also has the good merits of high-fitting-performance and simplicity. Experiments on both synthetic and real data sets show its effectiveness.

  • Edge-over-Erosion Error Prediction Method Based on Multi-Level Machine Learning Algorithm

    Daisuke FUKUDA  Kenichi WATANABE  Naoki IDANI  Yuji KANAZAWA  Masanori HASHIMOTO  

     
    PAPER-Device and Circuit Modeling and Analysis

      Vol:
    E97-A No:12
      Page(s):
    2373-2382

    As VLSI process node continue to shrink, chemical mechanical planarization (CMP) process for copper interconnect has become an essential technique for enabling many-layer interconnection. Recently, Edge-over-Erosion error (EoE-error), which originates from overpolishing and could cause yield loss, is observed in various CMP processes, while its mechanism is still unclear. To predict these errors, we propose an EoE-error prediction method that exploits machine learning algorithms. The proposed method consists of (1) error analysis stage, (2) layout parameter extraction stage, (3) model construction stage and (4) prediction stage. In the error analysis and parameter extraction stages, we analyze test chips and identify layout parameters which have an impact on EoE phenomenon. In the model construction stage, we construct a prediction model using the proposed multi-level machine learning method, and do predictions for designed layouts in the prediction stage. Experimental results show that the proposed method attained 2.7∼19.2% accuracy improvement of EoE-error prediction and 0.8∼10.1% improvement of non-EoE-error prediction compared with general machine learning methods. The proposed method makes it possible to prevent unexpected yield loss by recognizing EoE-errors before manufacturing.

  • ZNA: A Six-Layer Network Architecture for New Generation Networks —— Focusing on the Session Layer, the Network Layer, and Cross-Layer Cooperation —— Open Access

    Fumio TERAOKA  Sho KANEMARU  Kazuma YONEMURA  Motoki IDE  Shinji KAWAGUCHI  Kunitake KANEKO  

     
    INVITED PAPER

      Vol:
    E97-B No:12
      Page(s):
    2583-2595

    Using “clean-slate approach” to redesign the Internet has attracted considerable attention. ZNA (Z Network Architecture) is one of clean-slate network architectures based on the layered model. The major features of ZNA are as follows: (1) introducing the session layer to provide the applications with sophisticated communication services, (2) employing inter-node cross-layer cooperation to adapt to the dynamically changing network conditions, (3) splitting the node identifier and the node locator for mobility, multi-homing, and heterogeneity of network layer protocols, (4) splitting the data plane and the control plane for high manageability, and (5) introducing a recursive layered model to support network virtualization. This paper focuses on the first three topics as well as the basic design of ZNA.

  • In-Network Cache Management Based on Differentiated Service for Information-Centric Networking

    Qian HU  Muqing WU  Hailong HAN  Ning WANG  Chaoyi ZHANG  

     
    PAPER

      Vol:
    E97-B No:12
      Page(s):
    2616-2626

    As a promising future network architecture, Information-centric networking (ICN) has attracted much attention, its ubiquitous in-network caching is one of the key technologies to optimize the dissemination of information. However, considering the diversity of contents and the limitation of cache resources in the Internet, it is usually difficult to find a one-fit-all caching strategy. How to manage the ubiquitous in-network cache in ICN has become an important problem. In this paper, we explore ways to improve cache performance from the three perspectives of spatiality, temporality and availability, based on which we further propose an in-network cache management strategy to support differentiated service. We divide contents requested in the network into different levels and the selection of caching strategies depends on the content level. Furthermore, the corresponding models of utilizing cache resources in spatiality, temporality and availability are also derived for comparison and analysis. Simulation verifies that our differentiated service based cache management strategy can optimize the utilization of cache resources and get higher overall cache performance.

  • Offline Permutation on the CUDA-enabled GPU

    Akihiko KASAGI  Koji NAKANO  Yasuaki ITO  

     
    PAPER-GPU

      Vol:
    E97-D No:12
      Page(s):
    3052-3062

    The Hierarchical Memory Machine (HMM) is a theoretical parallel computing model that captures the essence of computation on CUDA-enabled GPUs. The offline permutation is a task to copy numbers stored in an array a of size n to an array b of the same size along a permutation P given in advance. A conventional algorithm can complete the offline permutation by executing b[p[i]] ← a[i] for all i in parallel, where an array p stores the permutation P. We first present that the conventional algorithm runs $D_w(P)+2{nover w}+3L-3$ time units using n threads on the HMM with width w and latency L, where Dw(P) is the distribution of P. We next show that important regular permutations including transpose, shuffle, and bit-reversal permutations run $2{nover w}+2{nover kw}+2L-2$ time units on the HMM with k DMMs. We have implemented permutation algorithms for these regular permutations on GeForce GTX 680 GPU. The experimental results show that these algorithms run much faster than the conventional algorithm. We also present an offline permutation algorithm for any permutation running in $16{nover w}+16{nover kw}+16L-16$ time units on the HMM with k DMMs. Quite surprisingly, our offline permutation algorithm on the GPU achieves better performance than the conventional algorithm in random permutation, although the running time has a large constant factor. We can say that the experimental results provide a good example of GPU computation showing that a complicated but ingenious implementation with a larger constant factor in computing time can outperform a much simpler conventional algorithm.

  • Opportunistic On-Path Caching for Named Data Networking

    Xiaoyan HU  Jian GONG  

     
    PAPER-Network

      Vol:
    E97-B No:11
      Page(s):
    2360-2367

    As a prominent feature of Named Data Networking (NDN), in-network caching plays an important role in improving the performance of content delivery. However, if each NDN router indiscriminately caches every data packet passing by (i.e., Caching Everything Everywhere (CEE)), the result can be unnecessarily frequent cache replacement and cache redundancy in en-route routers and thus in-network caches are not utilized in an efficient way [1], [2]. Moreover, managing these in-network caches in a centralized way may lead to excessive resource consumption since the number of these caches is considerable. This work proposes a distributed and opportunistic on-path caching scheme. To be specific, each en-route router independently picks content items to cache in such a way that popular content is more likely to be cached by routers, especially routers near users, and cache redundancy is reduced. Extensive simulations including trace-driven ones in a PoP-level ISP topology suggest that the proposed scheme improves the average cache hit ratio of users' requests and reduces the average hop count as compared to CEE and the other on-path caching algorithms considered herein.

  • MVP-Cache: A Multi-Banked Cache Memory for Energy-Efficient Vector Processing of Multimedia Applications

    Ye GAO  Masayuki SATO  Ryusuke EGAWA  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  

     
    PAPER-Computer System

      Pubricized:
    2014/08/22
      Vol:
    E97-D No:11
      Page(s):
    2835-2843

    Vector processors have significant advantages for next generation multimedia applications (MMAs). One of the advantages is that vector processors can achieve high data transfer performance by using a high bandwidth memory sub-system, resulting in a high sustained computing performance. However, the high bandwidth memory sub-system usually leads to enormous costs in terms of chip area, power and energy consumption. These costs are too expensive for commodity computer systems, which are the main execution platform of MMAs. This paper proposes a new multi-banked cache memory for commodity computer systems called MVP-cache in order to expand the potential of vector architectures on MMAs. Unlike conventional multi-banked cache memories, which employ one tag array and one data array in a sub-cache, MVP-cache associates one tag array with multiple independent data arrays of small-sized cache lines. In this way, MVP-cache realizes less static power consumption on its tag arrays. MVP-cache can also achieve high efficiency on short vector data transfers because the flexibility of data transfers can be improved by independently controlling the data transfers of each data array.

  • A Taylor Based Localization Algorithm for Wireless Sensor Network Using Extreme Learning Machine

    Xiong LUO  Xiaohui CHANG  Hong LIU  

     
    PAPER

      Vol:
    E97-D No:10
      Page(s):
    2652-2659

    More recently, there has been a growing interest in the study of wireless sensor network (WSN) technologies for Interest of Things (IoT). To improve the positioning accuracy of mobile station under the non-line-of-sight (NLOS) environment, a localization algorithm based on the single-hidden layer feedforward network (SLFN) using extreme learning machine (ELM) for WSN is proposed in this paper. Optimal reduction in the time difference of arrival (TDOA) measurement error is achieved using SLFN optimized by ELM. Compared with those traditional learning algorithms, ELM has its unique feature of a higher generalization capability at a much faster learning speed. After utilizing the ELM by randomly assigning the parameters of hidden nodes in the SLFN, the competitive performance can be obtained on the optimization task for TDOA measurement error. Then, based on that result, Taylor algorithm is implemented to deal with the position problem of mobile station. Experimental results show that the effect of NLOS propagation is reduced based on our proposed algorithm by introducing the ELM into Taylor algorithm. Moreover, in the simulation, the proposed approach, called Taylor-ELM, provides better performance compared with some traditional algorithms, such as least squares, Taylor, backpropagation neural network based Taylor, and Chan positioning methods.

  • Workload-Aware Caching Policy for Information-Centric Networking

    Qian HU  Muqing WU  Song GUO  Hailong HAN  Chaoyi ZHANG  

     
    PAPER-Network

      Vol:
    E97-B No:10
      Page(s):
    2157-2166

    Information-centric networking (ICN) is a promising architecture and has attracted much attention in the area of future Internet architectures. As one of the key technologies in ICN, in-network caching can enhance content retrieval at a global scale without requiring any special infrastructure. In this paper, we propose a workload-aware caching policy, LRU-GT, which allows cache nodes to protect newly cached contents for a period of time (guard time) during which contents are protected from being replaced. LRU-GT can utilize the temporal locality and distinguish contents of different popularity, which are both the characteristics of the workload. Cache replacement is modeled as a semi-Markov process under the Independent Reference Model (IRM) assumption and a theoretical analysis proves that popular contents have longer sojourn time in the cache compared with unpopular ones in LRU-GT and the value of guard time can affect the cache hit ratio. We also propose a dynamic guard time adjustment algorithm to optimize the performance. Simulation results show that LRU-GT can reduce the average hops to get contents and improve cache hit ratio.

  • A Performance Fluctuation-Aware Stochastic Scheduling Mechanism for Workflow Applications in Cloud Environment

    Fang DONG  Junzhou LUO  Bo LIU  

     
    PAPER

      Vol:
    E97-D No:10
      Page(s):
    2641-2651

    Cloud computing, a novel distributed paradigm to provide powerful computing capabilities, is usually adopted by developers and researchers to execute complicated IoT applications such as complex workflows. In this scenario, it is fundamentally important to make an effective and efficient workflow application scheduling and execution by fully utilizing the advantages of the cloud (as virtualization and elastic services). However, in the current stage, there is relatively few research for workflow scheduling in cloud environment, where they usually just bring the traditional methods directly into cloud. Without considering the features of cloud, it may raise two kinds of problems: (1) The traditional methods mainly focus on static resource provision, which will cause the waste of resources; (2) They usually ignore the performance fluctuation of virtual machines on the physical machines, therefore it will lead to the estimation error of task execution time. To address these problems, a novel mechanism which can estimate the probability distribution of subtask execution time based on background VM load series over physical machines is proposed. An elastic performance fluctuations-aware stochastic scheduling algorithm is introduced in this paper. The experiments show that our proposed algorithm can outperform the existing algorithms in several metrics and can relieve the influence of performance fluctuations brought by the dynamic nature of cloud.

  • A VMM-Level Approach to Shortening Downtime of Operating Systems Reboots in Software Updates

    Hiroshi YAMADA  Kenji KONO  

     
    PAPER-Software System

      Vol:
    E97-D No:10
      Page(s):
    2663-2675

    Operating system (OS) reboots are an essential part of updating kernels and applications on laptops and desktop PCs. Long downtime during OS reboots severely disrupts users' computational activities. This long disruption discourages the users from conducting OS reboots, failing to enforce them to conduct software updates. Although the dynamic updatable techniques have been widely studied, making the system “reboot-free” is still difficult due to their several limitations. As a result, users cannot benefit from new functionality or better performance, and even worse, unfixed vulnerabilities can be exploited by attackers. This paper presents ShadowReboot, a virtual machine monitor (VMM)-based approach that shortens downtime of OS reboots in software updates. ShadowReboot conceals OS reboot activities from user's applications by spawning a VM dedicated to an OS reboot and systematically producing the rebooted state where the updated kernel and applications are ready for use. ShadowReboot provides an illusion to the users that the guest OS travels forward in time to the rebooted state. ShadowReboot offers the following advantages. It can be used to apply patches to the kernels and even system configuration updates. Next, it does not require any special patch requiring detailed knowledge about the target kernels. Lastly, it does not require any target kernel modification. We implemented a prototype in VirtualBox 4.0.10 OSE. Our experimental results show that ShadowReboot successfully updated software on unmodified commodity OS kernels and shortened the downtime of commodity OS reboots on five Linux distributions (Fedora, Ubuntu, Gentoo, Cent, and SUSE) by 91 to 98%.

  • Proposal of an Overreach Measurement Method for Digital Terrestrial TV Service Using FM Broadcasting Waves

    Masahiro NISHI  Koichi SHIN  Teruaki YOSHIDA  

     
    PAPER-Antennas and Propagation

      Vol:
    E97-B No:10
      Page(s):
    2167-2174

    In the digital terrestrial TV broadcasting system, it is important to evaluate both quantitative levels and sources of overreach interference, because it can degrade the TV service quality. This paper newly proposes an overreach measurement method that simultaneously monitors RSSI (Received Signal Strength Indicator) and CNR (Carrier to Noise power Ratio) of the TV waves and RSSI of FM waves. The results of measurements conducted in Hiroshima prefecture show that our proposed method can evaluate the level of overreach interference in the TV waves and also identify the source of the interference. Total 43 overreach interference events were found in the proposed method from one-year measurement in 2012. Based on M profile data, this paper also shows that the main factor of the overreach interference in this measurement is duct propagation due to meteorological condition.

  • Write Avoidance Cache Coherence Protocol for Non-volatile Memory as Last-Level Cache in Chip-Multiprocessor

    Ju Hee CHOI  Jong Wook KWAK  Chu Shik JHON  

     
    LETTER-Computer System

      Vol:
    E97-D No:8
      Page(s):
    2166-2169

    Non-Volatile Memories (NVMs) are considered as promising memory technologies for Last-Level Cache (LLC) due to their low leakage and high density. However, NVMs have some drawbacks such as high dynamic energy in modifying NVM cells, long latency for write operation, and limited write endurance. A number of approaches have been proposed to overcome these drawbacks. But very little attention is paid to consider the cache coherency issue. In this letter, we suggest a new cache coherence protocol to reduce the write operations of the LLC. In our protocol, the block data of the LLC is updated only if the cache block is written-back from a private cache, which leads to avoiding useless write operations in the LLC. The simulation results show that our protocol provides 27.1% energy savings and 26.3% lifetime improvements in STT-RAM at maximum.

  • Unsupervised Learning Model for Real-Time Anomaly Detection in Computer Networks

    Kriangkrai LIMTHONG  Kensuke FUKUDA  Yusheng JI  Shigeki YAMADA  

     
    PAPER-Information Network

      Vol:
    E97-D No:8
      Page(s):
    2084-2094

    Detecting a variety of anomalies caused by attacks or accidents in computer networks has been one of the real challenges for both researchers and network operators. An effective technique that could quickly and accurately detect a wide range of anomalies would be able to prevent serious consequences for system security or reliability. In this article, we characterize detection techniques on the basis of learning models and propose an unsupervised learning model for real-time anomaly detection in computer networks. We also conducted a series of experiments to examine capabilities of the proposed model by employing three well-known machine learning algorithms, namely multivariate normal distribution, k-nearest neighbor, and one-class support vector machine. The results of these experiments on real network traffic suggest that the proposed model is a promising solution and has a number of flexible capabilities to detect several types of anomalies in real time.

  • Dynamic Consolidation of Virtual Machines in Cloud Datacenters

    Han-Peng JIANG  Ming-Lung WENG  Wei-Mei CHEN  

     
    LETTER

      Vol:
    E97-D No:7
      Page(s):
    1727-1730

    Now that the subject of green computing is receiving a lot of attention, the energy consumption of datacenters has emerged as a significant issue. Consolidation of Virtual Machines (VMs) reduces the energy consumption since VM live migration not only optimizes VM placement, but also switches idle nodes to sleep mode. However, VM migration may negatively impact the performance of the system and lead to violations in SLA (Service Level Agreement) requirements between end users and cloud providers. In this study, we propose a VM consolidation mechanism that reduces the energy consumption of datacenters, eliminates unnecessary migrations, and minimizes the SLA violations. Compared to previous studies, the proposed policy shows a reduction of 2% to 3% in energy consumption, 13% to 41% in VM migration frequency, and 15% to 50% in SLA violations.

  • Mean Polynomial Kernel and Its Application to Vector Sequence Recognition

    Raissa RELATOR  Yoshihiro HIROHASHI  Eisuke ITO  Tsuyoshi KATO  

     
    PAPER-Pattern Recognition

      Vol:
    E97-D No:7
      Page(s):
    1855-1863

    Classification tasks in computer vision and brain-computer interface research have presented several applications such as biometrics and cognitive training. However, like in any other discipline, determining suitable representation of data has been challenging, and recent approaches have deviated from the familiar form of one vector for each data sample. This paper considers a kernel between vector sets, the mean polynomial kernel, motivated by recent studies where data are approximated by linear subspaces, in particular, methods that were formulated on Grassmann manifolds. This kernel takes a more general approach given that it can also support input data that can be modeled as a vector sequence, and not necessarily requiring it to be a linear subspace. We discuss how the kernel can be associated with the Projection kernel, a Grassmann kernel. Experimental results using face image sequences and physiological signal data show that the mean polynomial kernel surpasses existing subspace-based methods on Grassmann manifolds in terms of predictive performance and efficiency.

  • CRRP: Cost-Based Replacement with Random Placement for En-Route Caching

    Sen WANG  Jun BI  Jianping WU  

     
    LETTER-Information Network

      Vol:
    E97-D No:7
      Page(s):
    1914-1917

    Caching is considered widely as an efficient way to reduce access latency and network bandwidth consumption. En-route caching, where caches are associated with routing nodes in the network, is proposed in the context of Web cache to exploit fully the potential of caching. To make sensible replacement and placement decision for en-route caching, traditional caching schemes either engage computation-intensive algorithm like dynamic programming or suffer from inferior performance in terms of average access latency. In this article, we propose a new caching scheme with cost-based replacement and random placement, which is named CRRP. The cost-based replacement of CRRP introduces probing request to timely perceive cost change and the random placement is independent of current caching state, of O(1) computational complexity of placement decision. Through extensive simulations, we show that CRRP outperforms a wide range of caching schemes and is very close to the traditional dynamic-programming-based algorithm, in terms of average access delay.

381-400hit(1072hit)