
Keyword Search Result

[Keyword] cache (201 hits)

Results 121-140 of 201 hits

  • Load Balancing Routing Algorithm for Reverse Proxy Servers

    Satosi KATO  Hidetosi OKAMOTO  Toyofumi TAKENAKA  

     
    PAPER-Internet

    Vol: E88-B No:9  Page(s): 3693-3700

We propose a novel routing algorithm for reverse proxy servers, called load balancing content address hashing (LB-CAH), and evaluate its performance against the content address hashing (CAH) and hash and slide (HAS) routing algorithms. The LB-CAH algorithm estimates page popularity in the load balancer using an LFU caching technique and periodically builds a popularity list. Using this list, the algorithm selects a reverse proxy server as follows: when the requested page appears in the popularity list, the request is routed by the round-robin method; otherwise, it is routed by the content address hashing method. We evaluate and compare the LB-CAH, CAH, and HAS routing algorithms through simulation experiments from the viewpoints of load balancing, consumed cache space, and cache hit rate. The simulations show that LB-CAH achieves almost the same degree of load balancing as the HAS algorithm and the same cache hit rate as the CAH algorithm in a variety of web site environments.
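
    A minimal Python sketch of the routing decision described in the abstract above, assuming MD5 content hashing and a pre-built popularity list; the class name, key choice, and bookkeeping are illustrative, not taken from the paper.

    ```python
    import hashlib

    class LBCAHRouter:
        """Illustrative sketch of load balancing content address hashing."""

        def __init__(self, servers, popularity_list):
            self.servers = servers
            self.popular = set(popularity_list)  # rebuilt periodically from LFU stats
            self.rr_index = 0

        def route(self, url):
            if url in self.popular:
                # Popular page: round robin spreads the hot traffic evenly.
                server = self.servers[self.rr_index % len(self.servers)]
                self.rr_index += 1
            else:
                # Unpopular page: content address hashing preserves cache affinity.
                digest = int(hashlib.md5(url.encode()).hexdigest(), 16)
                server = self.servers[digest % len(self.servers)]
            return server
    ```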

  • A Cache Optimized Multidimensional Index in Disk-Based Environments

    Myungsun PARK  Sukho LEE  

     
    PAPER-Database

    Vol: E88-D No:8  Page(s): 1932-1939

R-trees have traditionally been optimized for I/O performance with disk pages as tree nodes. Recently, researchers have proposed cache-conscious variations of R-trees optimized for CPU cache performance in main-memory environments, where the node size is several cache lines wide and more entries are packed into a node by compressing MBR keys. However, because the node sizes of the two types of R-trees differ greatly, disk-optimized R-trees show poor cache performance while cache-optimized R-trees exhibit poor disk performance. In this paper, we propose a cache- and disk-optimized R-tree called the PR-tree (Prefetching R-tree). For cache performance, the node size of the PR-tree is wider than a cache line, and the prefetch instruction is used to reduce the number of cache misses. For I/O performance, the nodes of the PR-tree are fitted into one disk page. We present a detailed analysis of cache misses for range queries and enumerate all the reasonable in-page leaf and nonleaf node sizes and in-page tree heights to determine the tree parameters that give the best cache and I/O performance. The PR-tree achieves better cache performance than the disk-optimized R-tree: improvement factors of 3.5-15.1 for one-by-one insertions, 6.5-15.1 for deletions, 1.3-1.9 for range queries, and 2.7-9.7 for k-nearest-neighbor queries. None of the experimental results shows a notable decline in I/O performance.

  • Exploiting Versions for Transactional Cache Consistency

    Heum-Geun KANG  

     
    PAPER-Database

    Vol: E88-D No:6  Page(s): 1191-1198

    The efficiency of algorithms managing data caches has a major impact on the performance of systems that utilize client-side data caching. In these systems, two versions of data can be maintained without additional overhead by exploiting the replication of data in the server's buffer and clients' caches. In this paper, we present a new cache consistency algorithm employing versions: Two Versions-Callback Locking (2V-CBL). Our experimental results indicate that 2V-CBL provides good performance, and in particular outperforms a leading cache consistency algorithm, Asynchronous Avoidance-based Cache Consistency, when some clients run only read-only transactions.

  • Low-Power Network-Packet-Processing Architecture Using Process-Learning Cache for High-End Backbone Router

    Michitaka OKUNO  Shin-ichi ISHIDA  Hiroaki NISHI  

     
    PAPER-Digital

    Vol: E88-C No:4  Page(s): 536-543

A novel cache-based packet-processing-engine (PPE) architecture that achieves low power consumption and high packet-processing throughput by exploiting the nature of network traffic is proposed. The architecture consists of a processing-unit array and a bit-stream manipulation path called a burst stream path (BSP) that has a special cache mechanism called a process-learning cache (PLC). Network packets that carry the same header information appear repeatedly over a short time. Exploiting this property, the PLC memorizes the packet-processing method together with all of its results (i.e., table lookups) and applies it to subsequent packets. The PLC lets most packets skip execution on the processing-unit array, which consumes considerable power. As a practical implementation of the cache-based PPE architecture, P-Gear was designed and compared with a conventional PPE in terms of silicon die size and power consumption. With current 0.13-µm CMOS process technology, P-Gear can achieve 100-Gbps (gigabits per second) packet-processing throughput with only 36.5% of the die size and 32.8% of the power consumption required by the conventional PPE. Configurations of both architectures for the 1- to 100-Gbps throughput range were also analyzed. At throughputs of 10 Gbps or more, P-Gear achieves the target throughput in a smaller die size than the conventional PPE, and over the whole throughput range it does so at lower power.
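
    A rough Python sketch of the process-learning idea described above, assuming a flow is identified by a (src, dst, proto) header key; the key choice and the slow-path callback are illustrative assumptions.

    ```python
    class ProcessLearningCache:
        """Illustrative sketch: memoize per-flow packet-processing results."""

        def __init__(self):
            self.learned = {}

        def process(self, packet, slow_path):
            # Assumed key: the header fields that determine the processing outcome.
            key = (packet["src"], packet["dst"], packet["proto"])
            if key in self.learned:
                return self.learned[key]   # fast path: skip the processing-unit array
            result = slow_path(packet)     # full table lookups on the PU array
            self.learned[key] = result     # learn the method + results for the flow
            return result
    ```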

  • Quantitative Evaluation of State-Preserving Leakage Reduction Algorithm for L1 Data Caches

    Reiko KOMIYA  Koji INOUE  Vasily G. MOSHNYAGA  Kazuaki MURAKAMI  

     
    PAPER

    Vol: E88-A No:4  Page(s): 862-868

As transistor feature sizes and threshold voltages shrink, leakage energy consumption has become an unavoidable issue in high-performance microprocessor design. Since on-chip caches are major contributors to leakage, a number of researchers have proposed efficient leakage reduction techniques. However, it is still not clear 1) what kinds of algorithms can be considered and 2) how much impact they have on energy and performance. To answer these questions, we explore run-time cache management algorithms and evaluate the energy-performance efficiency of several alternatives.

  • Optimal Replication Algorithm for Scalable Streaming Media in Content Delivery Networks

    Zhou SU  Jiro KATTO  Yasuhiko YASUDA  

     
    PAPER-Internet Systems

    Vol: E87-D No:12  Page(s): 2723-2732

A CDN (Content Delivery Network) improves end-user performance by replicating web content on a group of geographically distributed servers. However, keeping a complete replica of the original objects on many content servers consumes too much server capacity. This problem becomes more serious for large objects such as streaming media, e.g., high-quality video. In this paper, we therefore propose an efficient replication method for layered video streams in CDNs that reduces user response delays and storage costs simultaneously. Based on an analytical formulation of the cooperative replication of the layers and segments of each video stream, we derive a replication algorithm that answers the following three questions quantitatively. (1) How many servers should be selected to replicate a given video stream? (2) For a single video stream, how many layers and segments should be stored on a given server? (3) After selecting a group of servers for each video stream, how do we allocate the replication priority (i.e., order) to each server? Simulation results verify that the proposed algorithm efficiently resolves these problems and provides much better performance than conventional methods.

  • Density-Based Spam Detector

    Kenichi YOSHIDA  Fuminori ADACHI  Takashi WASHIO  Hiroshi MOTODA  Teruaki HOMMA  Akihiro NAKASHIMA  Hiromitsu FUJIKAWA  Katsuyuki YAMAZAKI  

     
    PAPER-Internet Systems

    Vol: E87-D No:12  Page(s): 2678-2688

The volume of mass unsolicited electronic mail, often known as spam, has recently increased enormously and has become a serious threat not only to the Internet but also to society. This paper proposes a new spam detection method that uses document-space density information. Although the proposed method requires extensive e-mail traffic to acquire the necessary information, it can achieve perfect detection (i.e., both recall and precision are 100%) under practical conditions. A direct-mapped cache method contributes to the handling of over 13,000 e-mail messages per second. Experimental results obtained from over 50 million actual e-mail messages are also reported.
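
    A minimal sketch of a direct-mapped cache of the kind the abstract credits for the high message rate, assuming overwrite-on-conflict semantics; the slot count and key type are illustrative.

    ```python
    class DirectMappedCache:
        """Illustrative direct-mapped cache: one (tag, value) pair per slot."""

        def __init__(self, num_slots=1 << 16):
            self.slots = [None] * num_slots

        def lookup(self, key):
            tag, value = self.slots[hash(key) % len(self.slots)] or (None, None)
            return value if tag == key else None

        def insert(self, key, value):
            # A conflicting entry is simply overwritten, keeping insertion O(1).
            self.slots[hash(key) % len(self.slots)] = (key, value)
    ```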

  • Caching Policy and Cache Placement for Active Reliable Multicast

    Gang FENG  Chee Kheong SIEW  Kek Wee LOK  Kwan Lawrence YEUNG  

     
    PAPER-Network

    Vol: E87-B No:11  Page(s): 3230-3241

Active Reliable Multicast (ARM) is a novel loss recovery scheme for large-scale reliable multicast that employs active routers to protect the sender and network bandwidth from unnecessary feedback and repair traffic. Active routers suppress NACKs, cache multicast data for local loss recovery, and use scoped retransmission to avoid exposure. The limited active resources at routers need to be optimized to achieve low loss-recovery latency and/or high network throughput. In this paper, we study cache placement strategies and caching policies for ARM. Several cache allocation heuristics are proposed, namely uniform allocation, proportional allocation, max-min fair share, and weighted allocation. To further improve loss-recovery performance, caching policies can be employed in conjunction with the cache allocation strategies; we propose complete caching, random caching, and deterministic caching. Extensive simulation experiments are conducted to evaluate and compare the performance of the proposed strategies and policies. Numerical results reveal that significant performance gains can be achieved when a proper cache placement strategy and caching policy are used for a given cache resource. Another interesting finding is that the contributions of the cache placement scheme and the caching policy to recovery latency are roughly independent. The insights obtained in this study provide design guidelines for optimal active resource allocation and caching policies for reliable multicast communications.
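
    A sketch of one of the named heuristics, proportional allocation, assuming demand is measured per active router; the demand metric is an illustrative assumption.

    ```python
    def proportional_allocation(total_cache, demands):
        """Split a router cache budget in proportion to observed repair demand.

        demands: dict mapping router id -> loss-recovery demand (illustrative metric).
        """
        total_demand = sum(demands.values())
        return {router: total_cache * demand / total_demand
                for router, demand in demands.items()}
    ```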

  • TLB Update-Hint: A Scalable TLB Consistency Algorithm for Cache-Coherent Non-uniform Memory Access Multiprocessors

    Byeonghag SEONG  Donggook KIM  Yangwoo ROH  Kyuho PARK  Daeyeon PARK  

     
    PAPER-Networking and System Architectures

    Vol: E87-D No:7  Page(s): 1682-1692

Shared-memory multiprocessors in which each processor has its own TLB must manage consistency between the TLBs and the page table. As large-scale CC-NUMA (cache-coherent non-uniform memory access) shared-memory multiprocessors become popular, it is important for TLB consistency management algorithms to be highly scalable. In this paper, we propose the TLB update-hint algorithm as a scalable TLB consistency solution for CC-NUMA multiprocessors. By using a lazy TLB invalidation approach, we reduce the number of unnecessary processor interruptions and the idle-waiting time, and achieve a high level of scalability. Using a shared-memory simulator, we evaluated the TLB update-hint algorithm against the TLB shootdown algorithm, one of the most popular TLB consistency algorithms. The simulations demonstrate that the update-hint algorithm scales well in systems with a large number of processors; in 64-node systems, it shows 47-87% better performance than TLB shootdown.
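
    A very rough sketch of lazy invalidation with update hints, assuming a per-translation version counter serves as the hint; the actual mechanism in the paper may differ.

    ```python
    class UpdateHintTLB:
        """Illustrative lazy-invalidation sketch: hints instead of shootdown IPIs."""

        def __init__(self, page_table, hints):
            self.page_table = page_table  # shared vpn -> pte mapping
            self.hints = hints            # shared vpn -> version ("update hint")
            self.tlb = {}                 # this processor's vpn -> (pte, version)

        def translate(self, vpn):
            version = self.hints.get(vpn, 0)
            cached = self.tlb.get(vpn)
            if cached is None or cached[1] != version:
                # Stale or missing: reload lazily, with no cross-processor interrupt.
                cached = (self.page_table[vpn], version)
                self.tlb[vpn] = cached
            return cached[0]

    def update_mapping(page_table, hints, vpn, new_pte):
        page_table[vpn] = new_pte
        hints[vpn] = hints.get(vpn, 0) + 1  # leave a hint; interrupt no one
    ```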

  • Utilization of the On-Chip L2 Cache Area in CC-NUMA Multiprocessors for Applications with a Small Working Set

    Sung Woo CHUNG  Hyong-Shik KIM  Chu Shik JHON  

     
    PAPER-Networking and System Architectures

    Vol: E87-D No:7  Page(s): 1617-1624

In CC-NUMA multiprocessor systems, it is important to reduce remote memory access time. Based on the observation that enlarging an LRU second-level (L2) cache beyond a certain size does not significantly reduce the cache miss rate, we propose two split L2 cache designs that exploit this surplus capacity. A split L2 cache combines a traditional LRU cache with a second cache that reduces remote memory access time; the two work together to reduce total L2 miss time by keeping remote (or long-distance) blocks as well as recently used blocks. For the second cache, we propose two alternatives: an L2-RVC (Level 2 - Remote Victim Cache) and an L2-DAVC (Level 2 - Distance-Aware Victim Cache). The proposed split L2 caches reduce total execution time by up to 27% and outperform a traditional single LRU cache of double the size.
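
    A sketch of the split-L2 idea in the L2-RVC flavor, assuming victims are kept only when their home memory is remote; the sizes and the is_remote predicate are illustrative assumptions.

    ```python
    from collections import OrderedDict

    class SplitL2:
        """Illustrative LRU L2 paired with a remote victim cache (L2-RVC)."""

        def __init__(self, lru_size, victim_size, is_remote):
            self.lru = OrderedDict()
            self.victim = OrderedDict()
            self.lru_size, self.victim_size = lru_size, victim_size
            self.is_remote = is_remote

        def access(self, addr, fetch):
            if addr in self.lru:
                self.lru.move_to_end(addr)       # LRU hit
                return self.lru[addr]
            if addr in self.victim:
                data = self.victim.pop(addr)     # victim hit: promote, no refetch
            else:
                data = fetch(addr)               # miss: fetch (possibly remote)
            self.lru[addr] = data
            if len(self.lru) > self.lru_size:
                old_addr, old_data = self.lru.popitem(last=False)
                if self.is_remote(old_addr):     # keep only costly remote victims
                    self.victim[old_addr] = old_data
                    if len(self.victim) > self.victim_size:
                        self.victim.popitem(last=False)
            return data
    ```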

  • Enhancing ICP with P2P Technology: Cost, Availability, and Reconfiguration

    Ping-Jer YEH  Yu-Chen CHUANG  Shyan-Ming YUAN  

     
    PAPER-Networking and System Architectures

    Vol: E87-D No:7  Page(s): 1641-1648

Traditional Web cache servers based on the HTTP and ICP infrastructure tend to have high hardware and management costs, suffer from limited availability and difficulty with automatic, dynamic reconfiguration, and may have slow links to some users. We find that peer-to-peer technology can help solve these problems. The peer cache service (PCS) proposed here leverages each peer's local cache, similar access patterns, fully distributed coordination, and fast communication channels to improve response time, the range of cacheable objects, and availability. Moreover, design goals and strategies such as keeping the protocol lightweight and compatible with the existing cache infrastructure, supporting mobile devices, undertaking dynamic three-level caching, and exchanging cache meta-information further improve the effectiveness and differentiate our work from other similar-at-first-glance P2P Web cache systems.

  • Dynamic Code Repositioning for Java

    Shinji TANAKA  Tetsuyasu YAMADA  Satoshi SHIRAISHI  

     
    PAPER-Software Support and Optimization Techniques

    Vol: E87-D No:7  Page(s): 1737-1742

The sizes of recent Java-based server-side applications, such as J2EE containers, have been increasing continuously. Past techniques for improving the performance of Java applications have targeted relatively small applications whose methods, when invoked, are not usually spread over the entire memory space; consequently, these techniques cannot be applied efficiently to today's large applications. We propose a dynamic code repositioning approach to improve the hit rates of instruction caches and translation look-aside buffers. Profiles of method invocations are collected while the application runs under its heaviest processor load, and the code is repositioned based on these profiles. We also discuss a method-splitting technique that significantly reduces method sizes. Our evaluation of a prototype implementing these techniques showed a 5% improvement in application throughput.

  • A Proposal of Effective Cooperative Caching System Based on Random Access Assumption

    Mitsuru ISHII  Shimmi HATTORI  

     
    LETTER-Network

    Vol: E87-B No:6  Page(s): 1741-1745

In this letter, we propose an effective cooperative caching system under the assumption that each web object is accessed randomly. Under this assumption, the number of accesses per unit time follows a Poisson distribution, and the probability that a web object will be accessed in the future can be derived. Based on this probability distribution, one obtains a criterion for allocating the objects with the most expected accesses to the cache servers closest to clients. It is also shown that there is a tradeoff between the precision of object allocation and the efficiency of caching.
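
    As a sketch of the probability model the random access assumption implies, with λ standing for an object's mean access rate (the symbol choice is ours, not the letter's):

    ```latex
    P(k \text{ accesses in } [0,t]) = \frac{(\lambda t)^k}{k!}\, e^{-\lambda t},
    \qquad
    P(\text{at least one access in } [0,t]) = 1 - e^{-\lambda t}.
    ```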

  • ILP-Based Program Path Analysis for Bounding Worst-Case Inter-Task Cache Conflicts

    Hiroyuki TOMIYAMA  Nikil DUTT  

     
    LETTER-System Programs

    Vol: E87-D No:6  Page(s): 1582-1587

The unpredictable behavior of cache memory makes it difficult to statically analyze the worst-case performance of real-time systems. This problem is further exacerbated in preemptive multitask systems because of inter-task cache interference, called Cache-Related Preemption Delay (CRPD). This paper proposes an approach to deriving a tight upper bound on the CRPD that a task may impose on lower-priority tasks. Our method finds the program execution path that requires the maximum number of cache blocks, using an integer linear programming technique. Experimental results show that our approach provides up to 69% tighter bounds on CRPD than a conservative approach.
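
    One plausible shape for such an ILP, not taken from the paper: binary variables $x_v$ select the basic blocks of one execution path through the control-flow graph, $f_e$ are edge flows enforcing path structure, and $y_b$ marks each cache block $b$ touched along that path:

    ```latex
    \max \sum_{b \in B} y_b
    \quad \text{s.t.} \quad
    y_b \le \sum_{v:\, b \in \mathrm{blocks}(v)} x_v \quad \forall b \in B,
    \qquad
    \sum_{e \in \mathrm{in}(v)} f_e = \sum_{e \in \mathrm{out}(v)} f_e = x_v \quad \forall v,
    \qquad
    x_v, y_b \in \{0,1\}.
    ```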

  • A 100 MHz 7.84 mm² 31.7 msec 439 mW 512-Point 2-Dimensional FFT Single-Chip Processor

    Naoto MIYAMOTO  Leo KARNAN  Kazuyuki MARUO  Koji KOTANI  Tadahiro OHMI  

     
    PAPER

    Vol: E87-C No:4  Page(s): 502-509

A single-chip 512-point FFT processor is presented. The processor is based on a cached-memory architecture (CMA) with a resource-saving multi-datapath radix-2³ computation element. A 2-stage CMA, including a pair of single-port SRAMs, is also introduced to speed up the execution of 2-dimensional FFTs. Using these techniques, we designed an FFT processor core that integrates 552,000 transistors within an area of 2.8 × 2.8 mm² in a 0.35-µm triple-layer-metal CMOS process. The processor can execute a 512-point 1-dimensional FFT on 36-bit complex fixed-point data in 23.2 µsec, and a 2-dimensional one in only 23.8 msec, at 133 MHz operation. The power consumption is 439.6 mW at 3.3 V, 100 MHz operation.

  • A Novel Static Prediction Scheme for Filter Cache Structures

    Kugan VIVEKANANDARAJAH  Thambipillai SRIKANTHAN  Christopher T. CLARKE  Saurav BHATTACHARYYA  

     
    PAPER

    Vol: E87-C No:4  Page(s): 543-548

Energy dissipation in cache memories is becoming a major design issue for embedded microprocessors. A predictive filter-cache-based instruction cache hierarchy has been shown to reduce the energy-delay product effectively. In this paper, a simplified pattern prediction algorithm is proposed for the filter cache hierarchy. The prediction scheme relies on the static nature of the hit/miss patterns of instruction access streams. The static patterns are maintained in a small 32 × 1-bit Static Pattern Table (SPT). Our investigations show that the proposed prediction algorithm is superior to one based on a Next Fetch Prediction Table (NFPT) for all the benchmarks simulated, yielding an energy-delay product reduction of up to 6.79% compared with the NFPT. Moreover, since the prediction scheme is based on the static assignment of patterns, it lends itself to a more area- and power-efficient implementation than dynamic pattern prediction, although it is marginally inferior (by 0.69%) in terms of energy-delay product.
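
    A minimal sketch of prediction with a 32 × 1-bit table as described above, assuming the SPT is indexed by low-order fetch-address bits; the indexing and update rule are illustrative assumptions.

    ```python
    class StaticPatternPredictor:
        """Illustrative sketch of a 32 x 1-bit Static Pattern Table (SPT)."""

        def __init__(self):
            self.spt = [1] * 32  # 1 = predict the next fetch hits the filter cache

        def index(self, fetch_addr):
            return (fetch_addr >> 2) & 31    # assumed: low-order word-address bits

        def predict_hit(self, fetch_addr):
            # A predicted miss bypasses the filter cache and fetches from L1.
            return self.spt[self.index(fetch_addr)] == 1

        def record(self, fetch_addr, was_hit):
            # Patterns are largely static, so each bit quickly settles.
            self.spt[self.index(fetch_addr)] = 1 if was_hit else 0
    ```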

  • Selective-Sets Resizable Cache Memory Design for High-Performance and Low-Power CPU Core

    Takashi KURAFUJI  Yasunobu NAKASE  Hidehiro TAKATA  Yukinaga IMAMURA  Rei AKIYAMA  Tadao YAMANAKA  Atsushi IWABU  Shutarou YASUDA  Toshitsugu MIWA  Yasuhiro NUNOMURA  Niichi ITOH  Tetsuya KAGEMOTO  Nobuharu YOSHIOKA  Takeshi SHIBAGAKI  Hiroyuki KONDO  Masayuki KOYAMA  Takahiko ARAKAWA  Shuhei IWADE  

     
    PAPER

    Vol: E87-C No:4  Page(s): 535-542

We apply a selective-sets resizable cache and a complete-hierarchy SRAM to a high-performance, low-power RISC CPU core. The selective-sets resizable cache changes the cache memory size by varying the number of cache sets. It reduces the leakage current by 23% with only a slight degradation of the worst-case operating speed, from 213 MHz to 210 MHz. The complete-hierarchy SRAM enables partial-swing operation not only in the bit lines but also in the global signal lines. It reduces the current consumption of the memory by 4.6% and attains a high-speed access time of 1.4 ns in the typical case.

  • Design and Evaluation of a High Speed Routing Lookup Architecture

    Jun ZHANG  JeoungChill SHIM  Hiroyuki KURINO  Mitsumasa KOYANAGI  

     
    PAPER-Implementation and Operation

    Vol: E87-B No:3  Page(s): 406-412

The IP routing lookup problem is equivalent to finding the longest prefix of a packet's destination address in a routing table. Designing a high-performance IP routing lookup architecture is challenging because of increasing traffic, higher link speeds, frequent updates, and growing routing table sizes. First, increasing traffic and higher link speeds require that IP routing lookups be executed at wire speed. Second, frequent routing table updates require insertion and deletion operations that are simple and low-delay. Finally, growing routing tables call for less memory in order to reduce cost. Although many schemes achieve fast lookup, little attention has been paid to the latter two factors. This paper proposes a novel pipelined IP routing lookup architecture using selective binary search on hash tables organized by prefix length. The evaluation results show that it can perform IP lookups at a maximum rate of one lookup per cycle, the hash-operation ratio per lookup can be reduced to about 1%, fewer than two hash operations are needed per table update, and only 512 Kbytes of SRAM are needed for a routing table with about 43,000 prefixes. It proves to have higher performance than the existing schemes.
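
    A sketch of binary search over per-prefix-length hash tables in the Waldvogel style, assuming marker entries guide the search toward longer matching prefixes; the data layout is an illustrative assumption, not the paper's.

    ```python
    def lookup_longest_prefix(dst, tables, lengths):
        """Binary search over hash tables organized by prefix length.

        dst:     32-bit destination address as an int.
        tables:  {length: {prefix_bits: {"route": ..., "marker": bool}}}
        lengths: sorted list of prefix lengths present in the table.
        """
        lo, hi = 0, len(lengths) - 1
        best = None
        while lo <= hi:
            mid = (lo + hi) // 2
            length = lengths[mid]
            entry = tables[length].get(dst >> (32 - length))
            if entry is None:
                hi = mid - 1                  # nothing here: try shorter prefixes
                continue
            if entry.get("route") is not None:
                best = entry["route"]         # best match found so far
            if entry.get("marker"):
                lo = mid + 1                  # a longer matching prefix may exist
            else:
                break                         # no marker: no longer match exists
        return best
    ```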

  • A Cache Replacement Policy for Transcoding Proxy Servers

    Kai-Hau YEUNG  Chun-Cheong WONG  Kin-Yeung WONG  Suk-Yu HUI  

     
    LETTER-Multimedia Systems

    Vol: E87-B No:1  Page(s): 209-211

A cache replacement policy for the emerging transcoding proxy servers is proposed that takes transcoding time into account when making replacement decisions. Simulation results show that the proposed policy outperforms conventional LRU in both cache hit rate and average object transcoding time.
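
    A sketch of a replacement decision that weighs transcoding time, assuming a simple profit ratio of transcoding cost to idle time; the letter does not specify the exact function, so this is only one plausible instantiation.

    ```python
    import time

    def evict_candidate(cache):
        """Pick the object whose loss costs least: cheap to re-transcode and idle.

        cache: dict mapping object id -> {"transcode_time": s, "last_used": epoch s}.
        """
        now = time.time()

        def profit(oid):
            meta = cache[oid]
            return meta["transcode_time"] / (now - meta["last_used"] + 1.0)

        return min(cache, key=profit)
    ```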

  • A Self-Adjusting Destage Algorithm with High-Low Water Mark in Cached RAID5

    Young Jin NAM  Chanik PARK  

     
    PAPER-Dependable Systems

    Vol: E86-D No:12  Page(s): 2527-2535

The High-Low Water Mark destage (HLWM) algorithm is widely used to let a cached RAID5 flush dirty data from its write cache to disks, owing to the simplicity of its operations. It starts and stops the destaging process based on two thresholds that are configured at initialization time with the best available knowledge of the underlying storage performance and the workload pattern, which includes traffic intensity, access patterns, etc. However, each time the workload drifts from the original, the thresholds must be re-configured to match. This paper proposes an efficient destage algorithm, called adaptive thresholding, which automatically re-configures its thresholds according to the changing traffic intensity and access patterns. The core of adaptive thresholding is to define the two thresholds as the product of the observed rates at which the write-cache occupancy rises and falls and the time required to fill or empty the write cache. We implemented the proposed algorithm on an actual RAID system and verified its auto-reconfiguration capability with synthetic workloads of differing traffic intensity and access patterns. Performance evaluations under well-known traced workloads reveal that the proposed algorithm reduces disk I/O traffic by about 12% with a 6% increase in the overwrite ratio compared with the HLWM algorithm.
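
    A sketch of the adaptive-thresholding rule as the abstract states it (threshold = occupancy-change rate × fill/empty time); the exact pairing of rates with times and the clamping are our assumptions.

    ```python
    def adaptive_thresholds(fill_rate, drain_rate, cache_size):
        """Thresholds as rate x fill/empty time, per the stated definition.

        fill_rate / drain_rate: observed write-cache occupancy change (blocks/s).
        cache_size: write-cache capacity in blocks.
        """
        time_to_fill = cache_size / max(fill_rate, 1e-9)
        time_to_empty = cache_size / max(drain_rate, 1e-9)
        high_mark = min(cache_size, fill_rate * time_to_empty)       # start destaging
        low_mark = max(0.0, cache_size - drain_rate * time_to_fill)  # stop destaging
        return high_mark, low_mark
    ```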
