The search functionality is under construction.

Keyword Search Result

[Keyword] cache(201hit)

101-120hit(201hit)

  • Adopting the Drowsy Technique for Instruction Caches: A Soft Error Perspective

    Soong Hyun SHIN  Sung Woo CHUNG  Eui-Young CHUNG  Chu Shik JHON  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E91-A No:7
      Page(s):
    1772-1779

    As technology scales down, leakage energy accounts for a greater proportion of total energy. Applying the drowsy technique to a cache, is regarded as one of the most efficient techniques for reducing leakage energy. However, it increases the Soft Error Rate (SER), thus, many researchers doubt the reliability of the drowsy technique. In this paper, we show several reasons why the instruction cache can adopt the drowsy technique without reliability problems. First, an instruction cache always stores read-only data, leading to soft error recovery by re-fetching the instructions from lower level memory. Second, the effect of the re-fetching caused by soft errors on performance is negligible. Additionally, a considerable percentage of soft errors can occur without harming the performance. Lastly, unrecoverable soft errors can be controlled by the scrubbing method. The simulation results show that the drowsy instruction cache rarely increases the rate of unrecoverable errors and negligibly degrades the performance.

  • Reliable Cache Architectures and Task Scheduling for Multiprocessor Systems

    Makoto SUGIHARA  Tohru ISHIHARA  Kazuaki MURAKAMI  

     
    PAPER

      Vol:
    E91-C No:4
      Page(s):
    410-417

    This paper proposes a task scheduling approach for reliable cache architectures (RCAs) of multiprocessor systems. The RCAs dynamically switch their operation modes for reducing the usage of vulnerable SRAMs under real-time constraints. A mixed integer programming model has been built for minimizing vulnerability under real-time constraints. Experimental results have shown that our task scheduling approach achieved 47.7-99.9% less vulnerability than a conventional one.

  • A New Caching Technique to Support Conjunctive Queries in P2P DHT

    Koji KOBATAKE  Shigeaki TAGASHIRA  Satoshi FUJITA  

     
    PAPER-Computer Systems

      Vol:
    E91-D No:4
      Page(s):
    1023-1031

    P2P DHT (Peer-to-Peer Distributed Hash Table) is one of typical techniques for realizing an efficient management of shared resources distributed over a network and a keyword search over such networks in a fully distributed manner. In this paper, we propose a new method for supporting conjunctive queries in P2P DHT. The basic idea of the proposed technique is to share a global information on past trials by conducting a local caching of search results for conjunctive queries and by registering the fact to the global DHT. Such a result caching is expected to significantly reduce the amount of transmitted data compared with conventional schemes. The effect of the proposed method is experimentally evaluated by simulation. The result of experiments indicates that by using the proposed method, the amount of returned data is reduced by 60% compared with conventional P2P DHT which does not support conjunctive queries.

  • Temperature-Aware Configurable Cache to Reduce Energy in Embedded Systems

    Hamid NOORI  Maziar GOUDARZI  Koji INOUE  Kazuaki MURAKAMI  

     
    PAPER

      Vol:
    E91-C No:4
      Page(s):
    418-431

    Energy consumption is a major concern in embedded computing systems. Several studies have shown that cache memories account for 40% or more of the total energy consumed in these systems. Active power used to be the primary contributor to total power dissipation of CMOS designs, but with the technology scaling, the share of leakage in total power consumption of digital systems continues to grow. Moreover, temperature is another factor that exponentially increases the leakage current. In this paper, we show the effect of temperature on the optimal (minimum-energy-consuming) cache configuration for low energy embedded systems. Our results show that for a given application and technology, the optimal cache size moves toward smaller caches at higher temperatures, due to the larger leakage. Consequently, a Temperature-Aware Configurable Cache (TACC) is an effective way to save energy in finer technologies when the embedded system is used in different temperatures. Our results show that using a TACC, up to 61% energy can be saved for instruction cache and 77% for data cache compared to a configurable cache that has been configured for only the corner-case temperature (100). Furthermore, the TACC also enhances the performance by up to 28% for the instruction cache and up to 17% for the data cache.

  • The Optimization of In-Memory Space Partitioning Trees for Cache Utilization

    Myung Ho YEO  Young Soo MIN  Kyoung Soo BOK  Jae Soo YOO  

     
    PAPER-Database

      Vol:
    E91-D No:2
      Page(s):
    243-250

    In this paper, a novel cache conscious indexing technique based on space partitioning trees is proposed. Many researchers investigated efficient cache conscious indexing techniques which improve retrieval performance of in-memory database management system recently. However, most studies considered data partitioning and targeted fast information retrieval. Existing data partitioning-based index structures significantly degrade performance due to the redundant accesses of overlapped spaces. Specially, R-tree-based index structures suffer from the propagation of MBR (Minimum Bounding Rectangle) information by updating data frequently. In this paper, we propose an in-memory space partitioning index structure for optimal cache utilization. The proposed index structure is compared with the existing index structures in terms of update performance, insertion performance and cache-utilization rate in a variety of environments. The results demonstrate that the proposed index structure offers better performance than existing index structures.

  • 4-Port Unified Data/Instruction Cache Design with Distributed Crossbar and Interleaved Cache-Line Words

    Koh JOHGUCHI  Hans Jurgen MATTAUSCH  Tetsushi KOIDE  Tetsuo HIRONAKA  

     
    LETTER-Integrated Electronics

      Vol:
    E90-C No:11
      Page(s):
    2157-2160

    The presented unified data/instruction cache design uses multiple banks and features 4 ports, distributed crossbar, different word-length for data and instruction ports, interleaved cache-line words and synchronous access with hidden precharge. A 20.5 KByte storage capacity is integrated in 5-metal-layer CMOS logic technology with 200 nm minimum gate length and a 3.4 ns access-cycle time is achieved. The access bandwidth corresponds to 10 ports with standard word-length, while the cost in increased Si-area is only 25% in comparison to a 1-port cache.

  • A Next-Generation Enterprise Server System with Advanced Cache Coherence Chips

    Mariko SAKAMOTO  Akira KATSUNO  Go SUGIZAKI  Toshio YOSHIDA  Aiichiro INOUE  Koji INOUE  Kazuaki MURAKAMI  

     
    PAPER-VLSI Architecture for Communication/Server Systems

      Vol:
    E90-C No:10
      Page(s):
    1972-1982

    Broadcast and synchronization techniques are used for cache coherence control in conventional larger scale snoop-based SMP systems. The penalty for synchronization is directly proportional to system size. Meanwhile, advances in LSI technology now enable placing a memory controller on a CPU die. The latency to access directly linked memory is drastically reduced by an on-die controller. Developing an enterprise server system with these CPUs allows us an opportunity to achieve higher performance. Though the penalty of synchronization is counted whenever a cache miss occurs, it is necessary to improve the coherence method to receive the full benefit of this effect. In this paper, we demonstrate a coherence directory organization that fits into DSM enterprise server systems. Originally, a directory-based method was adopted in high performance computing systems because of its huge scalability in comparison with snoop-based method. Though directory capacity miss and long directory access latency are the major problems of this method, the relaxed scalability requirement of enterprise servers is advantageous to us to solve these problems along with an advanced LSI technology. Our proposed directory solves both problems by implementing a full bit vector level map of the coherence directory on an LSI chip. Our experimental results validate that a system controlled by our proposed directory can surpass a snoop-based system in performance even without applying data localization optimization to an online transaction processing (OLTP) workload.

  • Selective Update Approach to Maintain Strong Web Consistency in Dynamic Content Delivery

    Zhou SU  Masato OGURO  Jiro KATTO  Yasuhiko YASUDA  

     
    PAPER

      Vol:
    E90-B No:10
      Page(s):
    2729-2737

    Content delivery network improves end-user performance by replicating Web contents on a group of geographically distributed sites interconnected over the Internet. However, with the development whereby content distribution systems can manage dynamically changing files, an important issue to be resolved is consistency management, which means the cached replicas on different sites must be updated if the originals change. In this paper, based on the analytical formulation of object freshness, web access distribution and network topology, we derive a novel algorithm as follows: (1) For a given content which has been changed on its original server, only a limited number of its replicas instead of all replicas are updated. (2) After a replica has been selected for update, the latest version will be sent from an algorithm-decided site instead of from its original server. Simulation results verify that the proposed algorithm provides better consistency management than conventional methods with the reduced the old hit ratio and network traffic.

  • An Efficient Cache Invalidation Method in Mobile Client/Server Environment

    Hakjoo LEE  Jonghyun SUH  Sungwon JUNG  

     
    PAPER-Database

      Vol:
    E90-D No:10
      Page(s):
    1672-1677

    In mobile computing environments, cache invalidation techiniques are widely used. However, theses techniques require a large-sized invalidation report and show low cache utilization under high server update rate. In this paper, we propose a new cache-level cache invalidation technique called TTCI (Timestamp Tree-based Cache Invalidation technique) to overcome the above two problems. TTCI also supports selective tuning for a cache-level cache invalidation. We show in our experiment that our technique requires much smaller size of cache invalidation report and improves cache utilization.

  • Cooperative Cache System: A Low Power Cache System for Embedded Processors

    Gi-Ho PARK  Kil-Whan LEE  Tack-Don HAN  Shin-Dug KIM  

     
    PAPER-Digital

      Vol:
    E90-C No:4
      Page(s):
    708-717

    This paper presents a dual data cache system structure, called a cooperative cache system, that is designed as a low power cache structure for embedded processors. The cooperative cache system consists of two caches, i.e., a direct-mapped temporal oriented cache (TOC) and a four-way set-associative spatial oriented cache (SOC). The cooperative cache system achieves improvement in performance and reduction in power consumption by virtue of the structural characteristics of the two caches designed inherently to help each other. An evaluation chip of an embedded processor having the cooperative cache system is manufactured by Samsung Electronics Co. with 0.25 µm 4-metal process technology.

  • How Scalable is Cache-and-Relay Scheme in P2P on-Demand Streaming?

    Yun TANG  Lifeng SUN  Jianguang LUO  Shiqiang YANG  Yuzhuo ZHONG  

     
    LETTER-Network

      Vol:
    E90-B No:4
      Page(s):
    987-989

    In recent years, the inherent effectiveness of Peer-to-Peer (P2P) networks has been advocated to address scalability issues in large scale Internet-based on-Demand streaming services. Most of existing works adopt Cache-and-Relay (CR) scheme to exploit a cooperative paradigm among peers. In this paper, we mainly present our practical evaluation study of the scalability of the CR scheme by taking into account of more than 20,000,000 collected real traces. Based on trace-driven simulations, we conclude that the CR scheme is not as effective as previously reported in terms of saving server bandwidth.

  • Dynamic Reconfiguration of Cache Indexing in Embedded Processors

    Junhee KIM  Sung-Soo LIM  Jihong KIM  

     
    PAPER-VLSI Systems

      Vol:
    E90-D No:3
      Page(s):
    637-647

    Cache performance optimization is an important design consideration in building high-performance embedded processors. Unlike general-purpose microprocessors, embedded processors can take advantages of application-specific information in optimizing the cache performance. One of such examples is to use modified cache index bits (over conventional index bits) based on memory access traces from key target embedded applications so that the number of conflict misses can be reduced. In this paper, we present a novel fine-grained cache reconfiguration technique which allows an intra-program reconfiguration of cache index bits, thus better reflecting the changing characteristics of a program execution. The proposed technique, called dynamic reconfiguration of index bits (DRIB), dynamically changes cache index bits in the function level. This compiler-directed and fine-grained approach allows each function to be executed using its own optimal index bits with no additional hardware support. In order to avoid potential performance degradation by frequent cache invalidations from reconfiguring cache index bits, we describe an efficient algorithm for selecting target functions whose cache index bits are reconfigured. Our algorithm ensures that the number of cache misses reduced by DRIB outnumbers the number of cache misses increased from cache invalidations. We also propose a new cache architecture, Two-Level Indexing (TLI) cache, which further reduces the number of conflict misses by intelligently dividing indexing steps into two stages. Our experimental results show that the DRIP approach combined with the TLI cache reduces the number of cache misses by 35% over the conventional cache indexing technique.

  • Cache Efficient Radix Sort for String Sorting

    Waihong NG  Katsuhiko KAKEHI  

     
    PAPER-Algorithms and Data Structures

      Vol:
    E90-A No:2
      Page(s):
    457-466

    In this paper, we propose CRadix sort, a new string sorting algorithm based on MSD radix sort. CRadix sort causes fewer cache misses than MSD radix sort by uniquely associating a small block of main memory called the key buffer to each key and temporarily storing a portion of each key into its corresponding key buffer. Experimental results in running time comparisons with other string sorting algorithms are provided for showing the effectiveness of CRadix sort.

  • Reliable Parallel File System with Parity Cache Table Support

    Sheng-Kai HUNG  Yarsun HSU  

     
    PAPER-Parallel Processing System

      Vol:
    E90-D No:1
      Page(s):
    22-29

    Providing data availability in a high performance computing environment is very important, especially in this data-intensive world. Most clusters either equip with RAID (Redundant Array of Independent Disks) devices or use redundant nodes to protect data from loss. However, neither of these can really solve the reliability problem incurred in a striped file system. Striping provides an efficient way to increase I/O throughput both in the distributed and parallel paradigms. But it also reduces the overall reliability of a disk system by N fold, where N is the number of independent disks in the system. Parallel Virtual File System (PVFS) is an open source parallel file system which has been widely used in the Linux environment. Its striping structure is good for performance but provides no fault tolerance. We implement Reliable Parallel File System (RPFS) based on PVFS but with reliability support. Our quantitative analysis shows that MTTF (Mean Time To Failure) of our RPFS is better than that of PVFS. Besides, we propose a parity cache table (PCT) to alleviate the penalty of parity updating. The evaluation of our RPFS shows that its read performance is almost the same as that of PVFS (2% to 13% degradation). As to the write performance, 28% to 45% improvement can be achieved depending on the behavior of the operations.

  • Return Address Protection on Cache Memories

    Koji INOUE  

     
    PAPER-Integrated Electronics

      Vol:
    E89-C No:12
      Page(s):
    1937-1947

    The present paper proposes a novel cache architecture, called SCache, to detect buffer overflow attacks at run time. In addition, we evaluate the energy-security efficiency of the proposed architecture. On a return-address store, SCache generates one or more copies of the return address value and saves them as read only in the cache area. The number of copies generated strongly affects both energy consumption and vulnerability. When the return address is loaded (or popped), the cache compares the value loaded from the memory stack with the corresponding copy existing in the cache. If they are not the same, then return-address corruption has occurred. In the present study, the proposed approach is shown to protect more than 99.5% of return-address loads from the threat of buffer overflow attacks, while increasing the total cache-energy consumption by, at worst, approximately 23%, compared to a well-known low-power cache. Furthermore, we explore the tradeoff between energy consumption and security, and our experimental results show that an energy-aware SCache model provides relatively higher security with only a 10% increase in energy consumption.

  • Cache-Based Network Processor Architecture: Evaluation with Real Network Traffic

    Michitaka OKUNO  Shinji NISHIMURA  Shin-ichi ISHIDA  Hiroaki NISHI  

     
    PAPER

      Vol:
    E89-C No:11
      Page(s):
    1620-1628

    A novel cache-based network processor (NP) architecture that can catch up with next generation 100-Gbps packet-processing throughput by exploiting a nature of network traffic is proposed, and the prototype is evaluated with real network traffic traces. This architecture consists of several small processing units (PUs) and a bit-stream manipulation hardware called a burst-stream path (BSP) that has a special cache mechanism called a process-learning cache (PLC) and a cache-miss handler (CMH). The PLC memorizes a packet-processing method with all table-lookup results, and applies it to subsequent packets that have the same information in their header. To avoid packet-processing blocking, the CMH handles cache-miss packets while registration processing is performed at the PLC. The combination of the PLC and CMH enables most packets to skip the execution at the PUs, which dissipate huge power in conventional NPs. We evaluated an FPGA-based prototype with real core network traffic traces of a WIDE backbone router. From the experimental results, we observed a special case where the packet of minimum size appeared in large quantities, and the cache-based NP was able to achieve 100% throughput with only the 10%-throughput PUs due to the existence of very high temporal locality of network traffic. From the whole results, the cache-based NP would be able to achieve 100-Gbps throughput by using 10- to 40-Gbps throughput PUs. The power consumption of the cache-based NP, which consists of 40-Gbps throughput PUs, is estimated to be only 44.7% that of a conventional NP.

  • An Energy-Efficient Partitioned Instruction Cache Architecture for Embedded Processors

    CheolHong KIM  SungWoo CHUNG  ChuShik JHON  

     
    PAPER-Computer Systems

      Vol:
    E89-D No:4
      Page(s):
    1450-1458

    Energy efficiency of cache memories is crucial in designing embedded processors. Reducing energy consumption in the instruction cache is especially important, since the instruction cache consumes a significant portion of total processor energy. This paper proposes a new instruction cache architecture, named Partitioned Instruction Cache (PI-Cache), for reducing dynamic energy consumption in the instruction cache by partitioning it to smaller (less power-consuming) sub-caches. When the proposed PI-Cache is accessed, only one sub-cache is accessed by utilizing the temporal/spatial locality of applications. In the meantime, other sub-caches are not accessed, leading to dynamic energy reduction. The PI-Cache also reduces dynamic energy consumption by eliminating the energy consumed in tag lookup and comparison. Moreover, the performance gap between the conventional instruction cache and the proposed PI-Cache becomes little when the physical cache access time is considered. We evaluated the energy efficiency by running a cycle accurate simulator, SimpleScalar, with power parameters obtained from CACTI. Simulation results show that the PI-Cache improves the energy-delay product by 20%-54% compared to the conventional direct-mapped instruction cache.

  • Adaptive Mode Control for Low-Power Caches Based on Way-Prediction Accuracy

    Hidekazu TANAKA  Koji INOUE  

     
    PAPER-Low Power Methodology

      Vol:
    E88-A No:12
      Page(s):
    3274-3281

    This paper proposes a novel cache architecture for low power consumption, called "Adaptive Way-Predicting Cache (AWP cache)." The AWP cache has multi-operation modes and dynamically adapts the operation mode based on the accuracy of way-prediction results. A confidence counter for way prediction is implemented to each cache set. In order to analyze the effectiveness of the AWP cache, we perform a SRAM design using 0.18 µm CMOS technology and cycle-accurate processor simulations. As the results, for a benchmark program (179.art), it is observed that a performance-aware AWP cache reduces the 49% of performance overhead caused by an original way-predicting cache to 17%. Furthermore, a energy-aware AWP cache achieves 73% of energy reduction, whereas that obtained from the original way-predicting scheme is only 38%, compared to an non-optimized conventional cache. For the consideration of energy-performance efficiency, we see that the energy-aware AWP cache produces better results; the energy-delay product of conventional organization is reduced to only 35% in average which is 6% better than the original way-predicting scheme.

  • CIGMA: Active Inventory Service in Global E-Market Based on Efficient Catalog Management

    Su Myeon KIM  Seungwoo KANG  Heung-Kyu LEE  Junehwa SONG  

     
    PAPER

      Vol:
    E88-D No:12
      Page(s):
    2651-2663

    A fully Internet-connected business environment is subject to frequent changes. To ordinary customers, online shopping under such a dynamic environment can be frustrating. We propose a new E-commerce service called the CIGMA to assist online customers under such an environment. The CIGMA provides catalog comparison and purchase mediation services over multiple shopping sites for ordinary online customers. The service is based on up-to-date information by reflecting the frequent changes in catalog information instantaneously. It also matches the desire of online customers for fast response. This paper describes the CIGMA along with its architecture and the implementation of a working prototype.

  • Minimizing the Directory Size for Large-Scale Shared-Memory Multiprocessors

    Jinseok KONG  Pen-Chung YEW  Gyungho LEE  

     
    PAPER-Computer Systems

      Vol:
    E88-D No:11
      Page(s):
    2533-2543

    Directory-based cache coherence schemes are commonly used in large-scale shared-memory multiprocessors, but most of them rely on heuristics to avoid large hardware requirements. We proposed using physical address mapping on directories to significantly reduce directory size needed. This approach allows the size of directory to grow as O(cn log2 n) as in optimal pointer-based directory schemes [11], where n is the number of nodes in the system and c is the number of cache lines in each cache memory. Performance aspects of the proposed scheme are studied in detail using simulation.

101-120hit(201hit)