
Keyword Search Result

[Keyword] CAC (297 hits)

Results 181-200 of 297

  • Utilization of the On-Chip L2 Cache Area in CC-NUMA Multiprocessors for Applications with a Small Working Set

    Sung Woo CHUNG  Hyong-Shik KIM  Chu Shik JHON  

     
    PAPER-Networking and System Architectures

    Vol: E87-D No:7  Page(s): 1617-1624

    In CC-NUMA multiprocessor systems, it is important to reduce the remote memory access time. Based on the observation that increasing the size of an LRU second-level (L2) cache beyond a certain value does not significantly reduce the cache miss rate, in this paper we propose two split L2 cache organizations that utilize the surplus L2 cache area. Each split L2 cache is composed of a traditional LRU cache and a second cache that reduces the remote memory access time. The two work together to reduce the total L2 cache miss time by keeping remote (or long-distance) blocks as well as recently used blocks. For the second cache, we propose two alternatives: an L2-RVC (Level 2 - Remote Victim Cache) and an L2-DAVC (Level 2 - Distance-Aware Victim Cache). The proposed split L2 caches reduce total execution time by up to 27%. It is also found that they outperform a traditional single LRU cache of double the size.
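
    As a rough illustration of the split-cache idea, the sketch below (hypothetical Python with made-up class and parameter names; the paper targets hardware, not software) pairs an LRU partition with a victim partition that retains only evicted remote blocks, which are the expensive ones to re-fetch:

      from collections import OrderedDict

      class SplitL2WithRemoteVictimCache:
          """Minimal sketch of an LRU L2 paired with a remote victim cache
          (in the spirit of L2-RVC): evicted blocks that came from remote
          memory are kept, since re-fetching them costs more than local ones."""

          def __init__(self, lru_size, rvc_size):
              self.lru = OrderedDict()   # block -> is_remote
              self.rvc = OrderedDict()   # holds only evicted remote blocks
              self.lru_size = lru_size
              self.rvc_size = rvc_size

          def access(self, block, is_remote):
              if block in self.lru:                 # hit in main LRU partition
                  self.lru.move_to_end(block)
                  return "lru_hit"
              if block in self.rvc:                 # hit in remote victim cache
                  self.rvc.pop(block)
                  self._insert(block, True)
                  return "rvc_hit"
              self._insert(block, is_remote)        # miss: fetch and insert
              return "miss"

          def _insert(self, block, is_remote):
              self.lru[block] = is_remote
              if len(self.lru) > self.lru_size:
                  victim, victim_remote = self.lru.popitem(last=False)
                  if victim_remote:                 # only remote blocks are kept
                      self.rvc[victim] = True
                      if len(self.rvc) > self.rvc_size:
                          self.rvc.popitem(last=False)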

  • Enhancing ICP with P2P Technology: Cost, Availability, and Reconfiguration

    Ping-Jer YEH  Yu-Chen CHUANG  Shyan-Ming YUAN  

     
    PAPER-Networking and System Architectures

    Vol: E87-D No:7  Page(s): 1641-1648

    Traditional Web cache servers based on HTTP and ICP infrastructure tend to have high hardware and management costs, have difficulty with availability and with automatic, dynamic reconfiguration, and may have slow links to some users. We find that peer-to-peer technology can help solve these problems. The peer cache service (PCS) we propose here leverages each peer's local cache, similar access patterns, fully distributed coordination, and fast communication channels to enhance response time, the scale of cacheable objects, and availability. Moreover, incorporating goals and strategies such as making the protocol lightweight and compatible with the existing cache infrastructure, supporting mobile devices, undertaking dynamic three-level caching, and exchanging cache meta-information further improves effectiveness and differentiates our work from other similar-at-first-glance P2P Web cache systems.

  • A Proposal of Effective Cooperative Caching System Based on Random Access Assumption

    Mitsuru ISHII  Shimmi HATTORI  

     
    LETTER-Network

    Vol: E87-B No:6  Page(s): 1741-1745

    In this letter, we propose an effective cooperative caching system under the assumption that each web object is accessed randomly. Under this assumption, the access frequency per unit time follows a Poisson distribution, and the probability that a web object will be accessed in the future can be derived. Based on this probability distribution, one obtains a criterion for allocating the web objects with the most expected accesses to the cache servers closest to clients. It is also shown that there is a tradeoff between the precision of object allocation and the efficiency of caching.
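
    For a Poisson arrival process with rate λ, the probability that an object is requested at least once within a horizon t is 1 - e^(-λt); objects can then be ranked by this probability and the highest-ranked ones placed nearest the clients. A minimal sketch (function names and the greedy placement are illustrative assumptions, not the letter's exact criterion):

      import math

      def access_probability(rate_per_sec, horizon_sec):
          """P(object is accessed at least once within the horizon),
          assuming accesses form a Poisson process with the given rate."""
          return 1.0 - math.exp(-rate_per_sec * horizon_sec)

      def allocate(objects, tiers):
          """Greedy sketch: fill cache tiers (nearest to clients first)
          with the objects most likely to be accessed.  `objects` maps
          name -> access rate; `tiers` lists capacities nearest-first."""
          ranked = sorted(objects, key=lambda o: objects[o], reverse=True)
          placement, i = {}, 0
          for tier, capacity in enumerate(tiers):
              for _ in range(capacity):
                  if i < len(ranked):
                      placement[ranked[i]] = tier
                      i += 1
          return placement

      # An object requested ~10 times/hour is almost certain to be
      # requested again within a day:
      print(access_probability(10 / 3600, 86400))   # ~1.0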

  • ILP-Based Program Path Analysis for Bounding Worst-Case Inter-Task Cache Conflicts

    Hiroyuki TOMIYAMA  Nikil DUTT  

     
    LETTER-System Programs

    Vol: E87-D No:6  Page(s): 1582-1587

    The unpredictable behavior of cache memory makes it difficult to statically analyze the worst-case performance of real-time systems. This problem is further exacerbated in preemptive multitask systems because of inter-task cache interference, called Cache-Related Preemption Delay (CRPD). This paper proposes an approach to computing a tight upper bound on the CRPD that a task may impose on lower-priority tasks. Our method finds the program execution path that requires the maximum number of cache blocks, using an integer linear programming technique. Experimental results show that our approach provides up to 69% tighter bounds on CRPD than a conservative approach.
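
    The core quantity, the maximum number of distinct cache blocks used along any execution path, can be illustrated on a toy control-flow graph; the brute-force recursion below stands in for the paper's ILP formulation (the CFG and all names are made up):

      # Toy control-flow graph: each basic block maps to the set of cache
      # blocks (line indices) its instructions occupy.
      CFG = {
          "entry": ["a", "b"],      # entry branches to a or b
          "a": ["exit"],
          "b": ["exit"],
          "exit": [],
      }
      CACHE_BLOCKS = {"entry": {0, 1}, "a": {2, 3, 4}, "b": {2, 5}, "exit": {6}}

      def max_blocks(node="entry", used=frozenset()):
          """Maximum number of distinct cache blocks touched on any path
          from `node` to the end: the quantity the ILP maximizes.  CRPD
          imposed on a preempted task is then bounded by this count times
          the per-block miss penalty."""
          used = used | CACHE_BLOCKS[node]
          succs = CFG[node]
          if not succs:
              return len(used)
          return max(max_blocks(s, used) for s in succs)

      print(max_blocks())   # path entry->a->exit touches 6 distinct blocks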

  • One-Pass Semi-Dynamic Network Decoding Using a Subnetwork Caching Model for Large Vocabulary Continuous Speech Recognition

    Dong-Hoon AHN  Minhwa CHUNG  

     
    PAPER

    Vol: E87-D No:5  Page(s): 1164-1174

    This paper presents a new decoding framework for large vocabulary continuous speech recognition that can handle a static search network dynamically. Generally, a static network decoder can use a search space that is globally optimized in advance, and therefore it can run at high speed during decoding. However, its large memory requirement due to the large network size or the spatial complexity of the optimization algorithm often makes it impractical. Our new one-pass semi-dynamic network decoding scheme aims at incorporating such an optimized search network with memory efficiency, but without losing speed. In this framework, a complete search network is organized on the basis of self-structuring subnetworks and is nearly minimized using a modified tail-sharing algorithm. While the decoder runs, it caches subnetworks needed for decoding in memory, whereas static network decoders keep the complete network in memory. The subnetwork caching model is controlled by two levels of caches: local cache obtained by subnetwork caching operations and global cache obtained by subnetwork preloading operations. The model can also be controlled adaptively by using subnetwork profiling operations. Furthermore, it is made simple and fast with compactly designed self-structuring subnetworks. Experimental results on a 25 k-word Korean broadcast news transcription task show that the semi-dynamic decoder can run almost as fast as an equivalent static network decoder under various memory configurations by using the subnetwork caching model.

  • Selective-Sets Resizable Cache Memory Design for High-Performance and Low-Power CPU Core

    Takashi KURAFUJI  Yasunobu NAKASE  Hidehiro TAKATA  Yukinaga IMAMURA  Rei AKIYAMA  Tadao YAMANAKA  Atsushi IWABU  Shutarou YASUDA  Toshitsugu MIWA  Yasuhiro NUNOMURA  Niichi ITOH  Tetsuya KAGEMOTO  Nobuharu YOSHIOKA  Takeshi SHIBAGAKI  Hiroyuki KONDO  Masayuki KOYAMA  Takahiko ARAKAWA  Shuhei IWADE  

     
    PAPER

    Vol: E87-C No:4  Page(s): 535-542

    We apply a selective-sets resizable cache and a complete-hierarchy SRAM to a high-performance, low-power RISC CPU core. The selective-sets resizable cache can change the cache memory size by varying the number of cache sets. It reduces leakage current by 23% with only slight degradation of the worst-case operating speed, from 213 MHz to 210 MHz. The complete-hierarchy SRAM enables partial-swing operation not only in the bit lines but also in the global signal lines. It reduces the current consumption of the memory by 4.6% and attains a high-speed access time of 1.4 ns in the typical case.
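
    Resizing by selecting sets amounts to masking off index bits so that only a power-of-two subset of the sets stays active; the disabled sets can then be put into a low-leakage state. A minimal sketch of that indexing (class name and block size are illustrative assumptions):

      class SelectiveSetsCache:
          """Sketch of selective-sets resizing: shrinking the cache masks
          off index bits, so only the first `active_sets` sets are used."""

          def __init__(self, num_sets, block_bytes=32):
              assert num_sets & (num_sets - 1) == 0, "power of two"
              self.active_sets = num_sets
              self.offset_bits = block_bytes.bit_length() - 1

          def resize(self, active_sets):
              assert active_sets & (active_sets - 1) == 0
              self.active_sets = active_sets    # e.g. halve to cut leakage

          def set_index(self, addr):
              return (addr >> self.offset_bits) & (self.active_sets - 1)

      cache = SelectiveSetsCache(num_sets=256)
      print(cache.set_index(0x1FE0))        # uses all 256 sets -> 255
      cache.resize(64)                      # disable 3/4 of the sets
      print(cache.set_index(0x1FE0))        # index folds into 64 sets -> 63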

  • A 100 MHz 7.84 mm² 31.7 msec 439 mW 512-Point 2-Dimensional FFT Single-Chip Processor

    Naoto MIYAMOTO  Leo KARNAN  Kazuyuki MARUO  Koji KOTANI  Tadahiro OHMI  

     
    PAPER

    Vol: E87-C No:4  Page(s): 502-509

    A single-chip 512-point FFT processor is presented. This processor is based on the cached-memory architecture (CMA) with a resource-saving multi-datapath radix-2³ computation element. A 2-stage CMA, including a pair of single-port SRAMs, is also introduced to speed up the execution of 2-dimensional FFTs. Using these techniques, we have designed an FFT processor core that integrates 552,000 transistors within an area of 2.8 × 2.8 mm² in a 0.35 µm triple-layer-metal CMOS process. This processor can execute a 512-point 1-dimensional FFT on 36-bit complex fixed-point data in 23.2 µsec, and a 2-dimensional one in only 23.8 msec at 133 MHz operation. The power consumption of this processor is 439.6 mW at 3.3 V, 100 MHz operation.

  • A Novel Static Prediction Scheme for Filter Cache Structures

    Kugan VIVEKANANDARAJAH  Thambipillai SRIKANTHAN  Christopher T. CLARKE  Saurav BHATTACHARYYA  

     
    PAPER

    Vol: E87-C No:4  Page(s): 543-548

    Energy dissipation in cache memories is becoming a major design issue for embedded microprocessors. A predictive filter-cache-based instruction cache hierarchy has been shown to effectively reduce the energy-delay product. In this paper, a simplified pattern prediction algorithm is proposed for the filter cache hierarchy. The prediction scheme relies on the static nature of the hit/miss patterns of instruction access streams. The static patterns are maintained in a small 32 × 1-bit Static Pattern Table (SPT). Our investigations show that the proposed prediction algorithm is superior to one based on a Next Fetch Prediction Table (NFPT) for all the benchmarks simulated. With the proposed approach, an energy-delay product reduction of up to 6.79% was evident compared with the NFPT. Moreover, since the prediction scheme is based on static assignment of patterns, it lends itself better to area- and power-efficient implementation than dynamic pattern prediction, although it is marginally inferior (by 0.69%) in terms of energy-delay product.
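
    A rough sketch of how a 32 × 1-bit pattern table might steer fetches; the indexing and update rule here are assumptions for illustration, not the paper's exact design:

      class StaticPatternTable:
          """Sketch of a 32 x 1-bit SPT for a filter cache: each entry
          records whether fetches in that region tended to hit the tiny
          filter cache.  A set bit routes the next fetch to the filter
          cache; a clear bit bypasses it and goes straight to L1,
          avoiding the filter-cache miss penalty."""

          SIZE = 32

          def __init__(self):
              self.bits = [1] * self.SIZE   # optimistically predict hits

          def _index(self, fetch_addr, line_bytes=16):
              return (fetch_addr // line_bytes) % self.SIZE

          def predict_filter_hit(self, fetch_addr):
              return self.bits[self._index(fetch_addr)] == 1

          def record(self, fetch_addr, hit):
              # "Static" pattern: once a region is marked as missing, keep
              # bypassing it (a dynamic predictor would flip it back).
              if not hit:
                  self.bits[self._index(fetch_addr)] = 0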

  • Design and Evaluation of a High Speed Routing Lookup Architecture

    Jun ZHANG  JeoungChill SHIM  Hiroyuki KURINO  Mitsumasa KOYANAGI  

     
    PAPER-Implementation and Operation

    Vol: E87-B No:3  Page(s): 406-412

    The IP routing lookup problem is equivalent to finding the longest prefix of a packet's destination address in a routing table. Designing a high-performance IP routing lookup architecture is challenging because of increasing traffic, higher link speeds, frequent updates and growing routing table size. First, increasing traffic and higher link speeds require that IP routing lookups be executed at wire speed. Second, frequent routing table updates require that insertion and deletion operations be simple and low-delay. Finally, growing routing table size demands that less memory be used in order to reduce cost. Although many schemes achieve fast lookup, less attention has been paid to the latter two factors. This paper proposes a novel pipelined IP routing lookup architecture using selective binary search on hash tables organized by prefix length. Evaluation results show that it can perform IP lookups at a maximum rate of one lookup per cycle, that the hash operation ratio for one lookup can be reduced to about 1%, that fewer than two hash operations are needed for one table update, and that only 512 kbytes of SRAM are needed for a routing table with about 43,000 prefixes. It proves to have higher performance than existing schemes.
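
    The underlying technique, binary search over hash tables organized by prefix length with marker entries, can be sketched as follows (a simplified software illustration on bit strings, not the paper's pipelined hardware):

      def best_real(prefixes, bits):
          """Next hop of the longest real prefix of `bits`, or None."""
          best = None
          for p in prefixes:
              if bits.startswith(p) and (best is None or len(p) > len(best)):
                  best = p
          return prefixes[best] if best else None

      def build_tables(prefixes):
          """One hash table per prefix length.  Markers added at shorter
          lengths tell the search a longer match may exist, and carry the
          best match known so far to avoid backtracking."""
          lengths = sorted({len(p) for p in prefixes})
          tables = {L: {} for L in lengths}
          for p, hop in prefixes.items():
              tables[len(p)][p] = hop
          for p in prefixes:
              for L in lengths:
                  if L < len(p) and p[:L] not in tables[L]:
                      tables[L][p[:L]] = best_real(prefixes, p[:L])  # marker
          return lengths, tables

      def lookup(addr_bits, lengths, tables):
          """Binary search on prefix lengths: ~log2(#lengths) hash probes."""
          lo, hi, best = 0, len(lengths) - 1, None
          while lo <= hi:
              mid = (lo + hi) // 2
              entry = tables[lengths[mid]].get(addr_bits[:lengths[mid]], "MISS")
              if entry == "MISS":
                  hi = mid - 1              # nothing this long: go shorter
              else:
                  if entry is not None:
                      best = entry          # real hop or marker's inherited hop
                  lo = mid + 1              # a longer match may still exist
          return best

      lengths, tables = build_tables({"10": "A", "1011": "B", "101100": "C"})
      print(lookup("10110111", lengths, tables))   # longest match 1011 -> B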

  • A Cache Replacement Policy for Transcoding Proxy Servers

    Kai-Hau YEUNG  Chun-Cheong WONG  Kin-Yeung WONG  Suk-Yu HUI  

     
    LETTER-Multimedia Systems

    Vol: E87-B No:1  Page(s): 209-211

    A cache replacement policy for the emerging transcoding proxy servers is proposed, which takes transcoding time into account when making replacement decisions. Simulation results show that the proposed policy outperforms conventional LRU in both cache hit rate and average object transcoding time.
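
    A minimal sketch of such a policy (the weighting below is illustrative, not the letter's formula): eviction preference goes to objects that are cheap to re-transcode, large, and cold.

      import time

      class TranscodingAwareCache:
          """Like LRU, but an object's eviction cost also counts the time
          to re-transcode it, so expensive-to-transcode objects survive
          longer in the cache."""

          def __init__(self, capacity_bytes):
              self.capacity = capacity_bytes
              self.used = 0
              self.objs = {}   # key -> (size, transcode_sec, last_access)

          def put(self, key, size, transcode_sec):
              while self.used + size > self.capacity and self.objs:
                  victim = min(self.objs,
                               key=lambda k: self._value(*self.objs[k]))
                  self.used -= self.objs.pop(victim)[0]
              self.objs[key] = (size, transcode_sec, time.monotonic())
              self.used += size

          def _value(self, size, transcode_sec, last_access):
              age = time.monotonic() - last_access + 1e-9
              return transcode_sec / (size * age)   # cheap, big, cold: evict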

  • An Efficient Resource Reservation Method over RSVP Using Moving History of a Mobile Host

    SeongGon CHOI  JunKyun CHOI  

     
    LETTER-Internet

    Vol: E86-B No:12  Page(s): 3655-3657

    The aim of this paper is to improve resource utilization and the handover success rate at handover time. The proposed method takes advantage of the moving history of a mobile host at connection admission control time. We demonstrate the benefit of the proposed method over previously proposed reservation schemes by simulation.

  • A Self-Adjusting Destage Algorithm with High-Low Water Mark in Cached RAID5

    Young Jin NAM  Chanik PARK  

     
    PAPER-Dependable Systems

    Vol: E86-D No:12  Page(s): 2527-2535

    The High-Low Water Mark destage (HLWM) algorithm is widely used to let a cached RAID5 flush dirty data from its write cache to disks, owing to the simplicity of its operations. It starts and stops a destaging process based on two thresholds that are configured at initialization time with the best available knowledge of the underlying storage performance and the workload pattern, including traffic intensity and access patterns. However, whenever the current workload deviates from the original, the thresholds must be re-configured for the changed workload. This paper proposes an efficient destage algorithm, called adaptive thresholding, which automatically re-configures its thresholds according to changes in traffic intensity and access patterns. The core of adaptive thresholding is to define the two thresholds as the product of the observed increasing and decreasing rates of the write cache occupancy level and the times required to fill and empty the write cache. We implement the proposed algorithm on an actual RAID system and verify its auto-reconfiguration ability with synthetic workloads having different levels of traffic intensity and access patterns. Performance evaluations under well-known traced workloads reveal that the proposed algorithm reduces disk IO traffic by about 12% with a 6% increase in the overwrite ratio compared with the HLWM algorithm.
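
    A toy sketch of rate-based thresholding, illustrative only: the paper's exact formulas are not reproduced here, and `reaction_sec` is a made-up tuning knob. The intent is to keep enough headroom above the high mark to absorb incoming writes while a destage ramps up, and to stop destaging while some drain budget remains:

      def adaptive_thresholds(cache_size, fill_rate, drain_rate,
                              reaction_sec=2.0):
          """Recompute the water marks from the observed occupancy rates.

          fill_rate  -- observed increase of cache occupancy (bytes/sec)
          drain_rate -- observed decrease while destaging (bytes/sec)
          """
          high = cache_size - fill_rate * reaction_sec   # headroom for bursts
          low = drain_rate * reaction_sec                # don't over-drain
          high = max(0.0, min(high, cache_size))
          low = max(0.0, min(low, high))
          return high, low

      # 1 MB write cache filling at 40 kB/s, draining at 200 kB/s:
      print(adaptive_thresholds(1 << 20, fill_rate=40e3, drain_rate=200e3))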

  • Randomized Caches for Power-Efficiency

    Hans VANDIERENDONCK  Koen De BOSSCHERE  

     
    PAPER-Integrated Electronics

    Vol: E86-C No:10  Page(s): 2137-2144

    Embedded processors are used in numerous devices executing dedicated applications. This setting makes it worthwhile to optimize the processor for the application it executes, in order to increase its power-efficiency. This paper proposes to enhance direct-mapped data caches with automatically tuned randomized set index functions to achieve that goal. We show how randomization functions can be automatically generated and compare them to traditional set-associative caches in terms of performance and energy consumption. A 16 kB randomized direct-mapped cache consumes 22% less energy than a 2-way set-associative cache, while being less than 3% slower. When the randomization function is made configurable (i.e., it can be adapted to the program), the additional reduction of conflicts outweighs the added complexity of the hardware, provided there is a sufficient number of conflict misses.
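
    Randomized set index functions of this kind are commonly built as XOR folds, where each index bit is the parity of a chosen subset of address bits; making the bit mapping configurable is what allows per-program tuning. A small sketch (the particular mapping below is hypothetical):

      def xor_index(addr, bit_rows, offset_bits=5):
          """Randomized set index as an XOR-folding hash: each index bit
          is the XOR (parity) of the address-bit positions listed in its
          row of `bit_rows`.  Storing that table in registers makes the
          function configurable per application."""
          a = addr >> offset_bits
          index = 0
          for i, row in enumerate(bit_rows):
              parity = 0
              for b in row:
                  parity ^= (a >> b) & 1
              index |= parity << i
          return index

      # Hypothetical mapping for a 256-set cache (8 index bits): each
      # index bit XORs one low and one high address bit, which breaks up
      # the power-of-two strides that plague plain direct mapping.
      rows = [[i, i + 8] for i in range(8)]
      print(xor_index(0x12345678, rows))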

  • Total Cost-Aware Proxy Caching with Cooperative Removal Policy

    Tian-Cheng HU  Yasushi IKEDA  Minoru NAKAZAWA  Shimmi HATTORI  

     
    PAPER-Network

    Vol: E86-B No:10  Page(s): 3050-3062

    Proxy caches have long been used to enhance the performance of web access. Along with the recent development of CDNs (Content Distribution Networks), web proxy caching has also been adopted in many of their core techniques. This paper presents a new viewpoint on a possible improvement to cooperative proxy caching, which can reduce outbound traffic and therefore ideally result in better response times. We focus on the regional total cost of cache objects for optimizing content distribution. In contrast to regular removal policies based on a single proxy server, we evaluate a retrieved web object based on metrics gathered regionally from multiple proxy caches. We particularly introduce a concept called post-removal analysis, which is used to measure the value of removed objects. Finally, we use the real proxy cache Squid to implement our proposal and modify the well-known cache benchmarking tool Web Polygraph to test this cooperative prototype. The test results show that the proposed scheme brings a noticeable improvement in proxy caching performance.

  • Concurrency Control for Read-Only Client Transactions in Broadcast Disks

    Haengrae CHO  

     
    PAPER-Broadcast Systems

    Vol: E86-B No:10  Page(s): 3114-3122

    Broadcast disks are suited for disseminating information to a large number of clients in mobile computing environments. In broadcast disks, the server continuously and repeatedly broadcasts all data items in the database to clients without specific requests. The clients monitor the broadcast channel and read data items as they arrive; the broadcast channel thus becomes a disk from which clients can read data items. In this paper, we propose a cache-conscious concurrency control (C4) algorithm to preserve the consistency of read-only client transactions when the values of broadcast data items are updated at the server. The C4 algorithm is novel in the sense that it can reduce the response time of client transactions with minimal control information broadcast from the server. This is achieved by a judicious client caching strategy and by adjusting the currency of data items read by the client.

  • Integrated Pre-Fetching and Replacing Algorithm for Graceful Image Caching

    Zhou SU  Teruyoshi WASHIZAWA  Jiro KATTO  Yasuhiko YASUDA  

     
    PAPER-Multimedia Systems

    Vol: E86-B No:9  Page(s): 2753-2763

    The efficient distribution of stored information has become a major concern in the Internet. Since web workload characteristics show that more than 60% of network traffic is caused by image documents, how to efficiently distribute image documents from servers to end clients is an important issue. Proxy caching is an efficient solution to reduce network traffic, and in recent years it has been shown that an image caching method (Graceful Caching) based on a hierarchical coding format performs better than conventional caching schemes. However, as cache capacity is limited, how to efficiently allocate the cache memory to achieve minimum expected delay time remains an open problem. This paper presents an integrated caching algorithm to deal with the above problem for image databases, web browsers, proxies and other similar applications in the Internet. By analyzing the web request distribution of the Graceful Caching, both replacing and pre-fetching algorithms are proposed. We also show that our proposal can be carried out based on information readily available in the proxy server; it flexibly adapts its parameters to the hit rates and access patterns of users' requested documents in the Graceful Caching. Finally, we verify the performance of this algorithm by simulations.

  • Efficient and Scalable Client Clustering for Web Proxy Cache

    Kyungbaek KIM  Daeyeon PARK  

     
    PAPER-Software Systems and Technologies

    Vol: E86-D No:9  Page(s): 1577-1585

    Many cooperative web cache systems and protocols have been proposed. These systems, however, require expensive resources, such as external bandwidth and the CPU power or storage of a proxy, while incurring hefty administrative costs as the client population grows. Moreover, a scalability problem in cache server management still exists. This paper suggests peer-to-peer client clustering. The client-cluster provides a proxy cache with backup storage composed of the residual resources of the clients. We use a DHT-based peer-to-peer lookup protocol to manage the client-cluster. With the natural characteristics of this protocol, the client-cluster is self-organizing, fault-tolerant, well-balanced and scalable. Additionally, we propose Backward ICP, which is used for communication between the proxy cache and the client-cluster, to reduce the overhead of object replication and to use resources more efficiently. We examine the performance of the client-cluster via a trace-driven simulation and demonstrate effective enhancement of proxy cache performance.
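
    A DHT-style lookup of this kind can be sketched with a consistent-hashing ring, where each client owns the keys between its predecessor's ID and its own; the actual routing protocol and the Backward ICP messages are not modeled here, and all names are illustrative:

      import bisect, hashlib

      def _h(key):
          return int(hashlib.sha1(key.encode()).hexdigest(), 16)

      class ClientCluster:
          """Sketch of DHT-style placement for the backup store: any peer
          can locate the holder of an evicted object without central
          state, and a join or leave only moves neighbouring keys."""

          def __init__(self, clients):
              self.ring = sorted((_h(c), c) for c in clients)

          def owner(self, object_url):
              ids = [node_id for node_id, _ in self.ring]
              i = bisect.bisect(ids, _h(object_url)) % len(self.ring)
              return self.ring[i][1]

      cluster = ClientCluster([f"client-{i}" for i in range(8)])
      # The proxy forwards an evicted object (or a query for it) to the
      # responsible peer:
      print(cluster.owner("http://example.com/big.jpg"))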

  • A File System Support for Streaming Media Caching

    Hojung CHA  Jaehak OH  

     
    LETTER-Software Systems

    Vol: E86-D No:7  Page(s): 1310-1313

    This letter presents the implementation results of an application-level cache file system, MCFS, which is specifically designed to provide efficient caching and transmission mechanisms for streaming media. The file system is built on a virtual file disk which is constructed as a single large file on a general-purpose file system. MCFS suits the access requirement of continuous media caching and provides an efficient I/O mechanism for cache servers. The experimental results show that MCFS outperforms the comparison model and provides a consistent I/O bandwidth.

  • Stream Caching Using Hierarchically Distributed Proxies with Adaptive Segments Assignment

    Zhou SU  Jiro KATTO  Takayuki NISHIKAWA  Munetsugu MURAKAMI  Yasuhiko YASUDA  

     
    PAPER-Proxy Caching

    Vol: E86-B No:6  Page(s): 1859-1869

    With the advance of high-speed network technologies, the availability and popularity of streaming media content over the Internet has grown rapidly in recent years. Because of their distinct statistical properties and user viewing patterns, traditional delivery and caching schemes for normal web objects such as HTML files or images cannot be efficiently applied to streaming media such as audio and video. In this paper, we therefore propose an integrated caching scheme for streaming media with segment-based caching and hierarchically distributed proxies. First, each stream is divided into segments, and caching algorithms determine how to distribute the segments efficiently across proxies at different levels. Second, by introducing two kinds of segment priorities, segment replacement algorithms determine which stream and which segments should be replaced when the cache is full. Finally, a Web-friendly caching scheme is proposed to integrate streaming caching with the conventional caching of normal web objects. The performance of the proposed algorithms is verified by simulations.
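
    A toy sketch of two-level segment priorities (the scoring is illustrative, not the paper's): replace segments of unpopular streams first, and within a stream keep the earliest segments longest, since playback always starts at the beginning.

      def replacement_order(streams):
          """Return cached (stream, segment) pairs in eviction order.
          Lower score = evicted earlier: unpopular streams go first, and
          within a stream the tail segments go before the prefix."""
          victims = []
          for name, info in streams.items():
              for seg in info["cached_segments"]:
                  score = info["popularity"] / (1 + seg)
                  victims.append((score, name, seg))
          return [v[1:] for v in sorted(victims)]

      streams = {
          "news":  {"popularity": 0.9, "cached_segments": [0, 1, 2, 3]},
          "movie": {"popularity": 0.2, "cached_segments": [0, 1]},
      }
      print(replacement_order(streams)[:3])   # movie's tail goes first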

  • Proxy Caching Mechanisms with Quality Adjustment for Video Streaming Services

    Masahiro SASABE  Yoshiaki TANIGUCHI  Naoki WAKAMIYA  Masayuki MURATA  Hideo MIYAHARA  

     
    PAPER-Proxy Caching

    Vol: E86-B No:6  Page(s): 1849-1858

    The proxy mechanism widely used in WWW systems offers low-delay data delivery by means of a "proxy server." By applying proxy mechanisms to video streaming systems, we expect that high-quality and low-delay video distribution can be accomplished without introducing extra load on the system. In addition, it is effective to adapt the quality of cached video data appropriately in the proxy if user requests are diverse due to heterogeneity in available bandwidth, end-system performance, and users' preferences regarding perceived video quality. In this paper, we propose proxy caching mechanisms to accomplish high-quality and low-delay video streaming services. In our proposed system, a video stream is divided into blocks for efficient use of the cache buffer. A proxy cache server is assumed to be able to adjust the quality of cached or retrieved video blocks to requests through video filters. We evaluate our proposed mechanisms in terms of required buffer size, play-out delay and video quality through simulation experiments. Furthermore, to verify the practicality of our mechanisms, we implemented them on a real system and conducted experiments. Through evaluations of several performance aspects, it is shown that our proposed mechanisms can provide users with a low-latency and high-quality video streaming service in a heterogeneous environment.
