
Keyword Search Result

[Keyword] cache (201 hits)

141-160 of 201 hits

  • Randomized Caches for Power-Efficiency

    Hans VANDIERENDONCK, Koen De BOSSCHERE
    PAPER-Integrated Electronics
    Vol: E86-C No:10  Page(s): 2137-2144

    Embedded processors are used in numerous devices executing dedicated applications. This setting makes it worthwhile to optimize the processor to the application it executes, in order to increase its power-efficiency. This paper proposes to enhance direct-mapped data caches with automatically tuned randomized set index functions to achieve that goal. We show how randomization functions can be generated automatically and compare them to traditional set-associative caches in terms of performance and energy consumption. A 16 kB randomized direct-mapped cache consumes 22% less energy than a 2-way set-associative cache, while it is less than 3% slower. When the randomization function is made configurable (i.e., it can be adapted to the program), the additional reduction of conflicts outweighs the added complexity of the hardware, provided there is a sufficient number of conflict misses.
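
    A minimal sketch of the idea, assuming an XOR-folding hash as the randomization function (the paper generates and tunes its own functions per application; the cache sizes below are illustrative):

      # Illustrative randomized set index for a direct-mapped cache (Python).
      NUM_SETS = 256       # e.g. 16 kB cache with 64-byte lines (assumed)
      LINE_BITS = 6        # log2(line size in bytes)
      INDEX_BITS = 8       # log2(NUM_SETS)

      def conventional_index(addr: int) -> int:
          # Plain modulo indexing: the low-order bits above the line offset.
          return (addr >> LINE_BITS) & (NUM_SETS - 1)

      def randomized_index(addr: int) -> int:
          # XOR-fold successive index-sized slices of the block address so that
          # addresses colliding under modulo indexing are spread over the sets.
          block = addr >> LINE_BITS
          idx = 0
          while block:
              idx ^= block & (NUM_SETS - 1)
              block >>= INDEX_BITS
          return idx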

  • Reducing Memory System Energy by Software-Controlled On-Chip Memory

    Masaaki KONDO, Hiroshi NAKAMURA
    PAPER-Architecture and Algorithms
    Vol: E86-C No:4  Page(s): 580-588

    In recent computer systems, a large portion of energy is consumed by on-chip cache accesses and by data movement between the cache and off-chip main memory. Reducing this memory system energy is indispensable for future microprocessors, because power and thermal issues will certainly become a key factor limiting processor performance. In this paper, we discuss and evaluate how our architecture, called SCIMA, contributes to energy saving. SCIMA integrates software-controllable memory (SCM) into the processor chip. SCIMA can save total memory system energy by using the SCM with compiler support. The evaluation results reveal that SCIMA can reduce memory system energy by 5-50% while still being faster than a conventional cache-based architecture.

  • A Randomized Online Algorithm for the File Caching Problem

    Seiichiro TANI, Toshiaki MIYAZAKI
    PAPER-Algorithms
    Vol: E86-D No:4  Page(s): 686-697

    Caching web files reduces user response time as well as network traffic. When implementing caches, the file caching problem must be addressed: the problem is how to determine which files should be evicted from a cache when there is insufficient space for storing a new file, so that the sum of the mis-hit (fault) file costs is minimized. Greedy-Dual-Size (GDS) is the best online algorithm in terms of competitiveness, i.e., it is k/(k-h+1)-competitive, where k and h are the storage spaces of GDS and of an optimal offline algorithm, respectively. GDS also performs very well in trace-driven simulations. The worst-case time taken to service a request is another important measure for online file caching algorithms, since slow response times render caching meaningless from the client's viewpoint. This paper proposes a fast randomized k/(k-h+1)-competitive algorithm that runs in O(2^(log* k)) time per file eviction or insertion, whereas GDS takes O(log k) time; 2^(log* k) grows much more slowly than log k. To confirm its practicality, we conduct trace-driven simulations. Experimental results show that our algorithm attains only slightly worse byte hit rates and a sufficiently large latency reduction in comparison with GDS, making it a good candidate for caches requiring high-speed processing, such as second-level caches in large networks.
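
    For reference, a minimal sketch of the Greedy-Dual-Size baseline that the proposed algorithm is compared with; the heap gives the O(log k) per-eviction cost mentioned above (file names, sizes, and costs are hypothetical inputs):

      import heapq

      class GreedyDualSize:
          """Sketch of the Greedy-Dual-Size (GDS) replacement policy."""

          def __init__(self, capacity: int):
              self.capacity = capacity   # total cache space k
              self.used = 0
              self.L = 0.0               # global inflation value
              self.H = {}                # file -> current priority
              self.size = {}
              self.heap = []             # (priority, file); stale entries skipped lazily

          def access(self, f: str, size: int, cost: float) -> None:
              if f not in self.H:
                  if size > self.capacity:
                      return             # never cache a file larger than the cache
                  while self.used + size > self.capacity:
                      self._evict()
                  self.size[f] = size
                  self.used += size
              # On every hit or insertion the priority is reset to L + cost/size.
              self.H[f] = self.L + cost / size
              heapq.heappush(self.heap, (self.H[f], f))

          def _evict(self) -> None:
              while self.heap:
                  h, f = heapq.heappop(self.heap)
                  if f in self.H and self.H[f] == h:   # ignore stale heap entries
                      self.L = h                       # inflate the global clock
                      self.used -= self.size.pop(f)
                      del self.H[f]
                      return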

  • Improving the Performance of Linux Operating System via Buffer Cache Partitioning and Prefetching

    Heung Seok JEON, Sam H. NOH
    PAPER-Software Systems
    Vol: E86-D No:3  Page(s): 616-622

    Buffer caching is an integral part of the operating system. In this paper, we propose a scheme that integrates buffer cache management and prefetching via cache partitioning. The scheme, which we call SA-W2R, is simple to implement, making it a feasible solution for real systems. In its basic form it uses the LRU policy for buffer replacement, but its modular design allows any replacement policy to be incorporated into the scheme. For prefetching, it uses the LRU-One Block Lookahead (LRU-OBL) approach, eliminating the extra burden that is generally necessary in other prefetching approaches. Implementation studies based on the GNU/Linux kernel version 2.2.14 show that SA-W2R performs better than the scheme currently used, with a maximum improvement of 23% for the workloads considered.
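
    A rough sketch of the kind of mechanism described above, assuming the buffer cache is split into a demand partition and a prefetch partition, each managed by LRU, with one-block lookahead feeding the prefetch partition (the actual SA-W2R partition layout is more elaborate):

      from collections import OrderedDict

      class LruPartition:
          """A fixed-size LRU-managed partition of the buffer cache."""
          def __init__(self, capacity: int):
              self.capacity = capacity
              self.blocks = OrderedDict()          # block id -> None, in LRU order

          def touch(self, blk: int) -> bool:
              hit = blk in self.blocks
              if hit:
                  self.blocks.move_to_end(blk)     # refresh LRU position
              else:
                  if len(self.blocks) >= self.capacity:
                      self.blocks.popitem(last=False)   # evict least-recently-used block
                  self.blocks[blk] = None
              return hit

      def read_block(demand: LruPartition, prefetch: LruPartition, blk: int) -> None:
          demand.touch(blk)          # the demand reference goes to its own partition
          prefetch.touch(blk + 1)    # one-block lookahead: stage the next sequential block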

  • Caching and Concurrency Control in a Wireless Mobile Computing Environment

    SangKeun LEE
    PAPER-Databases
    Vol: E85-D No:8  Page(s): 1284-1296

    Caching of frequently accessed data has been shown to be a useful technique for reducing congestion on the narrow bandwidth of wireless channels. However, traditional client/server strategies for supporting transactional cache consistency, which require extensive communication between a client and a server, are not appropriate in a wireless mobile database. This paper proposes two simple but effective transactional cache consistency protocols for mobile read-only transactions, utilizing broadcast-based solutions to the cache invalidation problem. The novelty of our approach is that the consistency check on accessed data and the commitment protocol are implemented in a truly distributed fashion as an integral part of the cache invalidation process. The applicability of the proposed techniques is also examined by an analytical study.

  • Asynchronous Cache Invalidation Strategy to Support Read-Only Transaction in Mobile Environments

    SungHun NAM, IlYoung CHUNG, SungHo CHO, ChongSun HWANG
    PAPER-Databases
    Vol: E85-D No:2  Page(s): 373-385

    Stateless cache invalidation schemes for wireless environments can be categorized as either asynchronous or synchronous, according to how invalidation reports are broadcast. However, if an asynchronous cache invalidation scheme attempts to support local processing of read-only transactions, a critical problem may occur: the asynchronous invalidation reports provide no guarantee on the waiting time of mobile transactions requesting commit. To solve this problem, the server in our approach broadcasts two kinds of messages: asynchronous invalidation reports to reduce transaction latency, and periodic guide messages to remove the uncertainty about when the next invalidation report will arrive. This paper presents a simulation-based analysis of the performance of the suggested algorithms. The simulation experiments show that the local processing algorithms for read-only transactions based on the asynchronous cache invalidation scheme achieve better response times than the algorithm based on the synchronous cache invalidation scheme.
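
    A rough sketch of the server side of this idea, with assumed message contents and an assumed guide-message period; the point is that invalidation reports go out immediately on update, while guide messages periodically tell clients how current they are:

      import time

      GUIDE_PERIOD = 5.0        # seconds between guide messages (assumed value)

      class InvalidationServer:
          def __init__(self, broadcast):
              self.broadcast = broadcast        # callable taking a message dict
              self.last_report_ts = 0.0

          def on_update(self, item_id: str) -> None:
              # Asynchronous: broadcast the invalidation report as soon as an item
              # changes, so clients learn about staleness without a fixed delay.
              self.last_report_ts = time.time()
              self.broadcast({"type": "invalidation", "item": item_id,
                              "ts": self.last_report_ts})

          def guide_loop(self) -> None:
              # Periodic guide messages carry the timestamp of the latest report,
              # bounding how long a client requesting commit must wait before it
              # knows whether it has missed an invalidation.
              while True:
                  time.sleep(GUIDE_PERIOD)
                  self.broadcast({"type": "guide",
                                  "last_report_ts": self.last_report_ts})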

  • Designs of Building Blocks for High-Speed, Low-Power Processors

    Tadayoshi ENOMOTO
    PAPER-High-Performance Technologies
    Vol: E85-C No:2  Page(s): 331-338

    A fast, low-power 16-bit adder, a 32-word register file and a 512-bit cache SRAM have been developed using 0.25-µm GaAs HEMT technology for future multi-GHz processors. The 16-bit adder, which uses a negative-logic binary look-ahead carry structure based on NOR gates, operates at a maximum clock frequency of 1.67 GHz and consumes 134.4 mW at a supply voltage of 0.6 V. The active area is 1.6 mm2 and there are about 1,230 FETs. A new DC/DC level converter has been developed for use in high-speed, low-power storage circuits such as SRAMs and register files. The level converter can raise the DC voltage supplied to an active-load circuit on request, or supply a minimal DC voltage to a load circuit in stand-by mode. The power dissipation (P) of the 32-word register file with on-chip DC/DC level converters is 459 mW, a reduction to 25.2% of that of an equivalent conventional register file, while its operating frequency (fc) is 5.17 GHz, 74.8% of the fc of the conventional register file. P for the 512-bit cache SRAM with the new DC/DC level converters is 34.3 mW, 89.7% of the value for an equivalent conventional cache SRAM, with a read-access time of 455 ps, only 1.1% longer than that of the conventional cache SRAM.

  • Trends in High-Performance, Low-Power Cache Memory Architectures

    Koji INOUE, Vasily G. MOSHNYAGA, Kazuaki MURAKAMI
    PAPER-High-Performance Technologies
    Vol: E85-C No:2  Page(s): 304-314

    One of the uncompromising requirements of portable computing is energy efficiency, because it directly affects battery life. At the same time, portable computing will target more demanding applications, such as moving pictures, so higher performance is still required. Cache memories have long been one of the most important components of computer systems. In this paper, we briefly survey architectural techniques for high-performance, low-power cache memories.

  • Omitting Cache Look-up for High-Performance, Low-Power Microprocessors

    Koji INOUE, Vasily G. MOSHNYAGA, Kazuaki MURAKAMI
    PAPER-Low-Power Technologies
    Vol: E85-C No:2  Page(s): 279-287

    In this paper, we propose a novel architecture for low-power direct-mapped instruction caches, called the "history-based tag-comparison (HBTC) cache." The cache attempts to reuse tag-comparison results to avoid unnecessary tag checks. Execution footprints are recorded in an extended BTB (Branch Target Buffer). In our evaluation, we observe that the energy for tag comparison can be reduced by more than 90% in many applications.
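
    A minimal behavioural sketch of the history-based tag-comparison idea, with the footprint kept per branch target; the real design stores these bits in the extended BTB, and the clearing condition used here is an assumption:

      class HbtcModel:
          def __init__(self):
              self.footprint = {}            # branch target address -> bool

          def needs_tag_check(self, target: int) -> bool:
              """Return True if this instruction fetch must perform a tag comparison."""
              if self.footprint.get(target):
                  return False               # history says the block is still resident
              self.footprint[target] = True  # remember that this target has been checked
              return True

          def on_fill_or_invalidate(self) -> None:
              # Any event that may change cache contents clears the recorded
              # footprints, forcing tag checks to resume.
              self.footprint.clear()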

  • Performance Evaluation of a Load Balancing Routing Algorithm for Clustered Multiple Cache Servers

    Hiroyoshi MIWA, Kazunori KUMAGAI, Shinya NOGAMI, Takeo ABE, Hisao YAMAMOTO
    PAPER
    Vol: E85-B No:1  Page(s): 147-156

    The explosive growth of World Wide Web usage is causing a number of performance problems, including slow response times, network congestion, and denial of service. A Web site that receives a huge number of accesses and requires a high quality of service, such as a site offering hosting or content delivery services, usually uses a cache server to reduce the load on the original server offering the original content. To increase the throughput of the caching process and to improve service availability, multiple cache servers are often positioned in front of the original server. This requires a switch to direct incoming requests to one of the multiple cache servers. In this paper, we propose a routing algorithm for such a switch in front of clustered multiple cache servers and evaluate its performance by simulation. The results show that our routing algorithm is effective when content has request locality and a short period of validity, for example news, map data, road traffic data, or weather information. We also identify points to consider when the proposed algorithm is applied to a real system.

  • Reducing Cache Energy Dissipation by Using Dual Voltage Supply

    Vasily G. MOSHNYAGA, Hiroshi TSUJI
    PAPER-Optimization of Power and Timing
    Vol: E84-A No:11  Page(s): 2762-2768

    Due to their large capacitance and enormous access rates, caches dissipate about a third of the total energy consumed by today's processors. In this paper we present a new architectural technique to reduce energy consumption in caches. Unlike previous approaches, which have focused on lowering cache capacitance and the number of accesses, our method exploits a new degree of freedom in cache design, namely the voltage per access. Since in modern block-buffered caches the capacitance switched on a block hit is much smaller than the capacitance switched on a miss, the given clock cycle time is inefficiently utilized during a hit. We propose to trade off this unused time against the supply voltage, lowering the voltage level on a hit and raising it on a miss. Experiments show that the approach can halve the cache energy dissipation without large performance or area overheads.
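
    A back-of-the-envelope sketch of why lowering the voltage on hits pays off, assuming dynamic energy roughly proportional to C*V^2; all numbers below are made up for illustration, not taken from the paper:

      C_HIT, C_MISS = 1.0, 4.0      # relative capacitance switched on a hit vs. a miss
      V_NOMINAL = 1.8               # single-supply design (volts)
      V_LOW, V_HIGH = 1.2, 1.8      # dual-supply design: low on hits, high on misses
      HIT_RATE = 0.95

      def average_energy(v_hit: float, v_miss: float) -> float:
          # Dynamic energy per access is modelled as C * V^2.
          return HIT_RATE * C_HIT * v_hit**2 + (1 - HIT_RATE) * C_MISS * v_miss**2

      baseline = average_energy(V_NOMINAL, V_NOMINAL)
      dual     = average_energy(V_LOW, V_HIGH)
      print(f"relative energy with dual supply: {dual / baseline:.2f}")   # about 0.54 here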

  • Effective Reference Probability Incorporating the Effect of Expiration Time in Web Cache

    Jeong-Joon LEE, Kyu-Young WHANG, Yang-Sae MOON, Eui-Kyung HONG
    PAPER-Databases
    Vol: E84-D No:9  Page(s): 1184-1197

    Web caching has become an important problem when addressing the performance issues in Web applications. The expiration time of a Web data item is a useful piece of information for performance enhancement in Web caching. In this paper, we introduce the notion of the effective reference probability, which incorporates the effect of expiration time into Web caching. For a formal approach, we propose the continuous independent reference model, extending the existing independent reference model. Based on this model, we formally define the effective reference probability and derive it theoretically. By simply replacing the reference probability in existing cache replacement algorithms with the effective reference probability, we can take the effect of expiration time into account. The results of performance experiments show that replacement algorithms using the effective reference probability always outperform the existing ones. In particular, when the cache fraction is 0.05 and data updates are comparatively frequent (i.e., the update frequency is more than 1/10 of the reference frequency), performance is enhanced by more than 30% in LRU-2 and 13% in Aggarwal's method. The results show that the effective reference probability significantly enhances the performance of Web caching when the expiration time is given.
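
    The abstract does not give the derived formula, but one plausible illustrative form (an assumption, not the paper's result) discounts an object's reference probability by the chance that it is referenced again before its cached copy expires:

      import math

      def effective_reference_probability(p_ref: float, ref_rate: float,
                                          time_to_expire: float) -> float:
          """Illustrative form only: the reference probability discounted by the
          probability (under an assumed Poisson reference model) that the object
          is referenced again while its cached copy is still valid."""
          p_before_expiry = 1.0 - math.exp(-ref_rate * time_to_expire)
          return p_ref * p_before_expiry

      # A replacement algorithm such as LRU-2 would then evict the object with
      # the smallest effective reference probability instead of the smallest
      # plain reference probability.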

  • A Design Framework for Online Algorithms Solving the Object Replacement Problem

    Seiichiro TANI, Toshiaki MIYAZAKI
    PAPER-Algorithms
    Vol: E84-D No:9  Page(s): 1135-1143

    Network caches reduce network traffic as well as user response time. When implementing network caches, the object replacement problem is one of the core problems; the problem is to determine which objects should be evicted from a cache when there is insufficient space. This paper first formalizes the problem and gives a simple but sufficient condition for deterministic online algorithms to be competitive. Based on this condition, a general framework for making a non-competitive algorithm competitive is constructed. As an application of the framework, an online algorithm called Competitive_SIZE is proposed. Both event-driven and trace-driven simulations show that Competitive_SIZE is better than previously proposed algorithms such as LRU (Least Recently Used).

  • Cost Sensitive Space Management in Shared Web Caches

    Hyokyung BAHN, Yong Hyeon SHIN, Kern KOH
    LETTER-Internet
    Vol: E84-B No:7  Page(s): 1965-1969

    A new Web cache sharing scheme is presented. Our method reduces duplicated copies of the same objects in globally shared Web caches, while the hot working set of each local cache may still be duplicated. Experimental results show that the proposed scheme outperforms existing sharing schemes.

  • Analytical Models and Performance Analyses of Instruction Fetch on Superscalar Processors

    Sun-Mo KIM, Jung-Woo LEE, Soo-Haeng LEE, Sang-Bang CHOI
    PAPER
    Vol: E84-A No:6  Page(s): 1442-1453

    Cache memories are small, fast memories used to temporarily hold the contents of main memory that are likely to be referenced by processors, so as to reduce instruction and data access time. Most previous studies of cache performance have employed simulation-based methods. However, such studies cannot precisely explain the obtained results. Moreover, whenever a new processor is designed, huge simulations must be performed again with several different parameters. This research classifies cache structures for superscalar processors into four types and then presents an analytical model of the instruction fetch process for each cache type, considering architectural parameters such as the frequency of branch instructions in the program, the cache miss rate, the cache miss penalty, the branch misprediction frequency, and the branch misprediction penalty. To prove the correctness of the proposed models, we performed extensive simulations and compared the results with the analytical models. Simulation results showed that the proposed model can estimate the expected instruction fetch rate accurately, within 10% error in most cases. This paper shows that an increase in cache misses reduces the instruction fetch rate more severely than an increase in branch mispredictions does. The model is also able to provide the exact relationship between cache misses and branch mispredictions for instruction fetch analysis. The proposed model can explain causes of performance degradation that cannot be uncovered by simulation alone.
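
    A deliberately simplified model of the same flavour (not the paper's equations) shows how such parameters combine into an expected fetch rate; the functional form and the example numbers are assumptions:

      def expected_fetch_rate(width: float, miss_rate: float, miss_penalty: float,
                              branch_freq: float, mispredict_rate: float,
                              mispredict_penalty: float) -> float:
          """Instructions fetched per cycle under a simple stall-accounting model.

          width              : instructions fetched per cycle when nothing stalls
          miss_rate          : I-cache misses per fetched instruction
          miss_penalty       : stall cycles per I-cache miss
          branch_freq        : branches per fetched instruction
          mispredict_rate    : mispredictions per branch
          mispredict_penalty : stall cycles per misprediction
          """
          stalls_per_insn = (miss_rate * miss_penalty
                             + branch_freq * mispredict_rate * mispredict_penalty)
          return width / (1.0 + width * stalls_per_insn)

      # Example: a 4-wide fetch unit, 2% miss rate with a 10-cycle penalty,
      # 20% branches mispredicted 5% of the time with a 6-cycle penalty.
      print(expected_fetch_rate(4, 0.02, 10, 0.20, 0.05, 6))   # about 1.96 insns/cycle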

  • Exploiting Metadata of Absent Objects for Proxy Cache Consistency

    Jooyong KIM, Hyokyung BAHN, Kern KOH
    PAPER-Network
    Vol: E84-B No:5  Page(s): 1406-1412

    Caching at the Web proxy server plays an important role in reducing the response time, the network traffic, and the load of Web servers. Many recent studies have proposed and examined replacement and consistency policies for the proxy cache, which play a central role in the performance of caching components. For better performance, they exploit various metadata of Web objects, such as the reference count, reference time, and modification time of past behavior, to estimate the re-reference likelihood and freshness of the objects. However, all of the policies known to the authors use the metadata only when the actual object is in the cache. We observed from various proxy traces that about 20-30% of clients' requests incurred only validity checks of cached objects, without transferring the actual objects from the proxy server. In this case, only the metadata are necessary at the proxy server. This paper proposes a proxy cache consistency policy that uses the metadata even for absent objects, including the time information of objects evicted from the cache and of objects known only from header-only replies from Web servers. Trace-driven simulations with public proxy cache traces show that our policy significantly reduces the response time and the number of connections to Web servers.
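
    A minimal sketch of how metadata kept for absent objects can serve validation requests; the freshness rule below (a fixed TTL on the metadata) is an assumption for illustration, not the paper's policy:

      import time

      class AbsentObjectMetadata:
          def __init__(self, metadata_ttl: float):
              self.metadata_ttl = metadata_ttl
              self.meta = {}                 # url -> (last_modified, recorded_at)

          def record(self, url: str, last_modified: float) -> None:
              # Called when an object body is evicted, or when the origin server
              # sends a header-only reply that carries time information.
              self.meta[url] = (last_modified, time.time())

          def can_validate_locally(self, url: str, if_modified_since: float) -> bool:
              """True if the proxy can answer '304 Not Modified' from metadata alone."""
              if url not in self.meta:
                  return False
              last_modified, recorded_at = self.meta[url]
              fresh = (time.time() - recorded_at) < self.metadata_ttl
              return fresh and last_modified <= if_modified_since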

  • Area-Efficient Multi-Port SRAMs for On-Chip Data-Storage with High Random-Access Bandwidth and Large Storage Capacity

    Hans Jurgen MATTAUSCH, Koji KISHI, Takayuki GYOHTEN
    PAPER-Integrated Electronics
    Vol: E84-C No:3  Page(s): 410-417

    The recent trend towards highly parallel on-chip data processing, as e.g. in single-chip processors with parallel execution capability of multiple instructions, leads to the requirement of on-chip data storage with high random-access bandwidth, parallel access capability and large capacity. The first two requirements call for the application of multi-ported memories. However, the conventional architecture, based on multi-port storage cells for each bit, cannot efficiently realize the large storage capacity, because cell area explodes due to a quadratic increase with port number (N). A promising method for obtaining area efficiency is to increase the size of the smallest unit with N-port capability, e.g. by introducing N-port capability on the level of blocks of 1-port cells and not for each cell. We report a quantitative analysis of this method for the SRAM case, which is based on design data in a 0.5 µm, 2-metal CMOS technology. Achievable area-reduction magnitudes in comparison to the conventional architecture are found to be enormous and to accelerate as a function of N. Reduction factors to areas < 1/2, < 1/5, < 1/14 and < 1/30 are estimated for 4, 8, 16 and 32 ports, respectively. Since the demerit of the proposed approach is an increased access-rejection probability, a trade-off between area reduction and allowable access-rejection probability is always necessary for practical applications. This is discussed for the application of multi-port cache memories.

  • Effective Caching for NetNews Servers

    Junichi FUNASAKA, Keizo SAISHO, Akira FUKUDA
    PAPER-Databases
    Vol: E84-D No:3  Page(s): 348-354

    Since the traffic of NetNews is increasing, keeping all articles is becoming a serious problem in terms of wasted network bandwidth and disk usage. In addition, users do not read all incoming articles. We have proposed several caching algorithms to overcome this problem and have shown that a selective prefetch scheme gives the best system performance among them. However, since the selective prefetch scheme employed a simple selection policy, it gave a low hit ratio in some cases. This paper therefore aims to improve the selective prefetch scheme in terms of disk usage as well as hit ratio. We decompose the scheme into three factors: the reference span, the criterion, and the threshold on the criterion. Through simulation experiments using actual NetNews logs, we investigate the influence of the reference span and the threshold on system performance. The results show that the reference span is a more significant factor than the threshold, and that the selective prefetch scheme with a reference span of around seven days keeps a high hit ratio while reducing disk usage.
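
    A sketch of the selective prefetch decision using the factors named above; the threshold value and the per-newsgroup granularity are assumptions (the paper examines the reference span and threshold empirically):

      import time
      from collections import defaultdict

      REFERENCE_SPAN_DAYS = 7     # the span the experiments above favour
      THRESHOLD = 5               # assumed threshold on the criterion (reads per span)

      class SelectivePrefetcher:
          def __init__(self):
              self.reads = defaultdict(list)      # newsgroup -> read timestamps

          def record_read(self, group: str) -> None:
              self.reads[group].append(time.time())

          def should_prefetch(self, group: str) -> bool:
              # Criterion: number of reads of this newsgroup within the reference
              # span; prefetch incoming articles only if it reaches the threshold.
              cutoff = time.time() - REFERENCE_SPAN_DAYS * 86400
              recent = [t for t in self.reads[group] if t >= cutoff]
              self.reads[group] = recent
              return len(recent) >= THRESHOLD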

  • A High-Performance/Low-Power On-Chip Memory-Path Architecture with Variable Cache-Line Size

    Koji INOUE, Koji KAI, Kazuaki MURAKAMI
    PAPER
    Vol: E83-C No:11  Page(s): 1716-1723

    This paper proposes an on-chip memory-path architecture employing the dynamically variable line-size (D-VLS) cache for high performance and low energy consumption. The D-VLS cache exploits the high on-chip memory bandwidth attainable on merged DRAM/logic LSIs by replacing a whole large cache line in one cycle. At the same time, it attempts to avoid frequent evictions by decreasing the cache-line size when programs have poor spatial locality. Activating only on-chip DRAM subarrays corresponding to a replaced cache-line size produces a significant energy reduction. In our simulation, it is observed that our proposed on-chip memory-path architecture, which employs a direct-mapped D-VLS cache, improves the ED (Energy Delay) product by more than 75% over a conventional memory-path model.
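
    A behavioural sketch of line-size selection; the grow/shrink heuristic and the selectable sizes are assumptions, whereas the real D-VLS hardware decides from reference flags kept per subline:

      class LineSizeSelector:
          SIZES = [32, 64, 128]          # selectable line sizes in bytes (assumed)

          def __init__(self):
              self.size_index = 1        # start from the middle line size

          def on_line_eviction(self, sublines_used: int, sublines_total: int) -> None:
              used_fraction = sublines_used / sublines_total
              if used_fraction > 0.75 and self.size_index < len(self.SIZES) - 1:
                  self.size_index += 1   # good spatial locality: replace larger lines
              elif used_fraction < 0.25 and self.size_index > 0:
                  self.size_index -= 1   # poor spatial locality: replace smaller lines

          @property
          def line_size(self) -> int:
              return self.SIZES[self.size_index]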

  • An Optimistic Cache Consistency Protocol Using Preemptive Approach

    SungHo CHO, Jeong-Hyon HWANG, Kyoung Yul BAE, Chong-Sun HWANG
    PAPER-Databases
    Vol: E83-D No:9  Page(s): 1772-1780

    In Optimistic Two-Phase Locking (O2PL), when a transaction requests a commit, the transaction cannot be committed until all requested locks are obtained. For this reason, O2PL leads to unnecessary waits and operations even though it adopts an optimistic approach. This paper suggests an efficient optimistic cache consistency protocol that provides serializability of committed transactions. Our cache consistency scheme, called PCP (Preemptive Cache Protocol), decides whether to commit or abort without waiting when transactions request commits. In PCP, some transactions that read stale data items need not be aborted, because the protocol adopts a re-ordering scheme to enhance performance. In addition, PCP stores only one version of each data item for re-ordering. This paper presents a simulation-based analysis of the performance of PCP against other protocols such as O2PL, Optimistic Concurrency Control, and Caching Two-Phase Locking. The simulation experiments show that PCP performs as well as or better than the other schemes, with low overhead.
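
    A very rough sketch of the preemptive commit decision with re-ordering, built only from what the abstract states; the conflict rule used here is an assumption, not the paper's exact condition:

      class PreemptiveCommitSketch:
          def __init__(self):
              self.committed_writes = set()       # items written by committed transactions

          def request_commit(self, read_set: set, write_set: set) -> str:
              # Decide immediately, without waiting for locks.
              stale_reads = read_set & self.committed_writes
              if not stale_reads:
                  self.committed_writes |= write_set
                  return "commit"
              if not write_set:
                  # Re-ordering (assumed rule): a transaction that only read stale
                  # items can be serialized before the conflicting updates.
                  return "commit-reordered"
              return "abort"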

141-160 of 201 hits