
Keyword Search Result

[Keyword] CAC (297 hits)

Results 181-200 of 297

  • Utilization of the On-Chip L2 Cache Area in CC-NUMA Multiprocessors for Applications with a Small Working Set

    Sung Woo CHUNG  Hyong-Shik KIM  Chu Shik JHON  

     
    PAPER-Networking and System Architectures

    Vol: E87-D No:7  Page(s): 1617-1624

    In CC-NUMA multiprocessor systems, it is important to reduce the remote memory access time. Based on the observation that increasing the size of an LRU second-level (L2) cache beyond a certain value does not significantly reduce the cache miss rate, in this paper we propose two split L2 cache organizations that utilize the surplus L2 cache area. Each split L2 cache is composed of a traditional LRU cache and a second cache that reduces the remote memory access time. The two work together to reduce the total L2 cache miss time by keeping remote (or long-distance) blocks as well as recently used blocks. For the second cache, we propose two alternatives: an L2-RVC (Level 2 - Remote Victim Cache) and an L2-DAVC (Level 2 - Distance-Aware Victim Cache). The proposed split L2 caches reduce total execution time by up to 27%. It is also found that they outperform a traditional single LRU cache of double the size.
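
    As a rough illustration of the split-cache idea, the sketch below (hypothetical Python with made-up class and parameter names; the paper targets hardware, not software) pairs an LRU partition with a victim partition that retains only evicted remote blocks, which are the expensive ones to re-fetch:

      from collections import OrderedDict

      class SplitL2WithRemoteVictimCache:
          """Minimal sketch of an LRU L2 paired with a remote victim cache
          (in the spirit of L2-RVC): evicted blocks that came from remote
          memory are kept, since re-fetching them costs more than local ones."""

          def __init__(self, lru_size, rvc_size):
              self.lru = OrderedDict()   # block -> is_remote
              self.rvc = OrderedDict()   # holds only evicted remote blocks
              self.lru_size = lru_size
              self.rvc_size = rvc_size

          def access(self, block, is_remote):
              if block in self.lru:                 # hit in main LRU partition
                  self.lru.move_to_end(block)
                  return "lru_hit"
              if block in self.rvc:                 # hit in remote victim cache
                  self.rvc.pop(block)
                  self._insert(block, True)
                  return "rvc_hit"
              self._insert(block, is_remote)        # miss: fetch and insert
              return "miss"

          def _insert(self, block, is_remote):
              self.lru[block] = is_remote
              if len(self.lru) > self.lru_size:
                  victim, victim_remote = self.lru.popitem(last=False)
                  if victim_remote:                 # only remote blocks are kept
                      self.rvc[victim] = True
                      if len(self.rvc) > self.rvc_size:
                          self.rvc.popitem(last=False)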

  • Enhancing ICP with P2P Technology: Cost, Availability, and Reconfiguration

    Ping-Jer YEH  Yu-Chen CHUANG  Shyan-Ming YUAN  

     
    PAPER-Networking and System Architectures

    Vol: E87-D No:7  Page(s): 1641-1648

    Traditional Web cache servers based on HTTP and ICP infrastructure tend to have high hardware and management costs, have difficulty with availability and with automatic, dynamic reconfiguration, and may have slow links to some users. We find that peer-to-peer technology can help solve these problems. The peer cache service (PCS) we propose here leverages each peer's local cache, similar access patterns, fully distributed coordination, and fast communication channels to enhance response time, the scale of cacheable objects, and availability. Moreover, incorporating goals and strategies such as making the protocol lightweight and compatible with the existing cache infrastructure, supporting mobile devices, undertaking dynamic three-level caching, and exchanging cache meta-information further improves effectiveness and differentiates our work from other similar-at-first-glance P2P Web cache systems.

  • A Proposal of Effective Cooperative Caching System Based on Random Access Assumption

    Mitsuru ISHII  Shimmi HATTORI  

     
    LETTER-Network

    Vol: E87-B No:6  Page(s): 1741-1745

    In this letter, we propose an effective cooperative caching system under the assumption that each web object is accessed randomly. Under this assumption, the access frequency per unit time follows a Poisson distribution, and the probability that a web object will be accessed in the future can be derived. Based on this probability distribution, one obtains a criterion for allocating the web objects with the most expected accesses to the cache servers closest to clients. It is also shown that there is a tradeoff between the precision of object allocation and the efficiency of caching.
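
    For a Poisson arrival process with rate λ, the probability that an object is requested at least once within a horizon t is 1 - e^(-λt); objects can then be ranked by this probability and the highest-ranked ones placed nearest the clients. A minimal sketch (function names and the greedy placement are illustrative assumptions, not the letter's exact criterion):

      import math

      def access_probability(rate_per_sec, horizon_sec):
          """P(object is accessed at least once within the horizon),
          assuming accesses form a Poisson process with the given rate."""
          return 1.0 - math.exp(-rate_per_sec * horizon_sec)

      def allocate(objects, tiers):
          """Greedy sketch: fill cache tiers (nearest to clients first)
          with the objects most likely to be accessed.  `objects` maps
          name -> access rate; `tiers` lists capacities nearest-first."""
          ranked = sorted(objects, key=lambda o: objects[o], reverse=True)
          placement, i = {}, 0
          for tier, capacity in enumerate(tiers):
              for _ in range(capacity):
                  if i < len(ranked):
                      placement[ranked[i]] = tier
                      i += 1
          return placement

      # An object requested ~10 times/hour is almost certain to be
      # requested again within a day:
      print(access_probability(10 / 3600, 86400))   # ~1.0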

  • ILP-Based Program Path Analysis for Bounding Worst-Case Inter-Task Cache Conflicts

    Hiroyuki TOMIYAMA  Nikil DUTT  

     
    LETTER-System Programs

    Vol: E87-D No:6  Page(s): 1582-1587

    The unpredictable behavior of cache memory makes it difficult to statically analyze the worst-case performance of real-time systems. This problem is further exacerbated in preemptive multitask systems because of inter-task cache interference, called Cache-Related Preemption Delay (CRPD). This paper proposes an approach to computing a tight upper bound on the CRPD that a task may impose on lower-priority tasks. Our method finds the program execution path that requires the maximum number of cache blocks, using an integer linear programming technique. Experimental results show that our approach provides up to 69% tighter bounds on CRPD than a conservative approach.
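
    The core quantity, the maximum number of distinct cache blocks used along any execution path, can be illustrated on a toy control-flow graph; the brute-force recursion below stands in for the paper's ILP formulation (the CFG and all names are made up):

      # Toy control-flow graph: each basic block maps to the set of cache
      # blocks (line indices) its instructions occupy.
      CFG = {
          "entry": ["a", "b"],      # entry branches to a or b
          "a": ["exit"],
          "b": ["exit"],
          "exit": [],
      }
      CACHE_BLOCKS = {"entry": {0, 1}, "a": {2, 3, 4}, "b": {2, 5}, "exit": {6}}

      def max_blocks(node="entry", used=frozenset()):
          """Maximum number of distinct cache blocks touched on any path
          from `node` to the end: the quantity the ILP maximizes.  CRPD
          imposed on a preempted task is then bounded by this count times
          the per-block miss penalty."""
          used = used | CACHE_BLOCKS[node]
          succs = CFG[node]
          if not succs:
              return len(used)
          return max(max_blocks(s, used) for s in succs)

      print(max_blocks())   # path entry->a->exit touches 6 distinct blocks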

  • One-Pass Semi-Dynamic Network Decoding Using a Subnetwork Caching Model for Large Vocabulary Continuous Speech Recognition

    Dong-Hoon AHN  Minhwa CHUNG  

     
    PAPER

    Vol: E87-D No:5  Page(s): 1164-1174

    This paper presents a new decoding framework for large vocabulary continuous speech recognition that can handle a static search network dynamically. Generally, a static network decoder can use a search space that is globally optimized in advance, and therefore it can run at high speed during decoding. However, its large memory requirement due to the large network size or the spatial complexity of the optimization algorithm often makes it impractical. Our new one-pass semi-dynamic network decoding scheme aims at incorporating such an optimized search network with memory efficiency, but without losing speed. In this framework, a complete search network is organized on the basis of self-structuring subnetworks and is nearly minimized using a modified tail-sharing algorithm. While the decoder runs, it caches subnetworks needed for decoding in memory, whereas static network decoders keep the complete network in memory. The subnetwork caching model is controlled by two levels of caches: local cache obtained by subnetwork caching operations and global cache obtained by subnetwork preloading operations. The model can also be controlled adaptively by using subnetwork profiling operations. Furthermore, it is made simple and fast with compactly designed self-structuring subnetworks. Experimental results on a 25 k-word Korean broadcast news transcription task show that the semi-dynamic decoder can run almost as fast as an equivalent static network decoder under various memory configurations by using the subnetwork caching model.

  • Selective-Sets Resizable Cache Memory Design for High-Performance and Low-Power CPU Core

    Takashi KURAFUJI  Yasunobu NAKASE  Hidehiro TAKATA  Yukinaga IMAMURA  Rei AKIYAMA  Tadao YAMANAKA  Atsushi IWABU  Shutarou YASUDA  Toshitsugu MIWA  Yasuhiro NUNOMURA  Niichi ITOH  Tetsuya KAGEMOTO  Nobuharu YOSHIOKA  Takeshi SHIBAGAKI  Hiroyuki KONDO  Masayuki KOYAMA  Takahiko ARAKAWA  Shuhei IWADE  

     
    PAPER

    Vol: E87-C No:4  Page(s): 535-542

    We apply a selective-sets resizable cache and a complete-hierarchy SRAM to a high-performance, low-power RISC CPU core. The selective-sets resizable cache can change the cache memory size by varying the number of cache sets. It reduces leakage current by 23% with only slight degradation of the worst-case operating speed, from 213 MHz to 210 MHz. The complete-hierarchy SRAM enables partial-swing operation not only in the bit lines but also in the global signal lines. It reduces the current consumption of the memory by 4.6% and attains a high-speed access time of 1.4 ns in the typical case.
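
    Resizing by selecting sets amounts to masking off index bits so that only a power-of-two subset of the sets stays active; the disabled sets can then be put into a low-leakage state. A minimal sketch of that indexing (class name and block size are illustrative assumptions):

      class SelectiveSetsCache:
          """Sketch of selective-sets resizing: shrinking the cache masks
          off index bits, so only the first `active_sets` sets are used."""

          def __init__(self, num_sets, block_bytes=32):
              assert num_sets & (num_sets - 1) == 0, "power of two"
              self.active_sets = num_sets
              self.offset_bits = block_bytes.bit_length() - 1

          def resize(self, active_sets):
              assert active_sets & (active_sets - 1) == 0
              self.active_sets = active_sets    # e.g. halve to cut leakage

          def set_index(self, addr):
              return (addr >> self.offset_bits) & (self.active_sets - 1)

      cache = SelectiveSetsCache(num_sets=256)
      print(cache.set_index(0x1FE0))        # uses all 256 sets -> 255
      cache.resize(64)                      # disable 3/4 of the sets
      print(cache.set_index(0x1FE0))        # index folds into 64 sets -> 63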

  • A 100 MHz 7.84 mm² 31.7 msec 439 mW 512-Point 2-Dimensional FFT Single-Chip Processor

    Naoto MIYAMOTO  Leo KARNAN  Kazuyuki MARUO  Koji KOTANI  Tadahiro OHMI  

     
    PAPER

    Vol: E87-C No:4  Page(s): 502-509

    A single-chip 512-point FFT processor is presented. This processor is based on the cached-memory architecture (CMA) with a resource-saving multi-datapath radix-2³ computation element. A 2-stage CMA, including a pair of single-port SRAMs, is also introduced to speed up the execution of 2-dimensional FFTs. Using these techniques, we have designed an FFT processor core that integrates 552,000 transistors within an area of 2.8 × 2.8 mm² in a 0.35 µm triple-layer-metal CMOS process. This processor can execute a 512-point 1-dimensional FFT on 36-bit complex fixed-point data in 23.2 µsec, and a 2-dimensional one in only 23.8 msec at 133 MHz operation. The power consumption of this processor is 439.6 mW at 3.3 V, 100 MHz operation.

  • A Novel Static Prediction Scheme for Filter Cache Structures

    Kugan VIVEKANANDARAJAH  Thambipillai SRIKANTHAN  Christopher T. CLARKE  Saurav BHATTACHARYYA  

     
    PAPER

    Vol: E87-C No:4  Page(s): 543-548

    Energy dissipation in cache memories is becoming a major design issue for embedded microprocessors. A predictive filter-cache-based instruction cache hierarchy has been shown to effectively reduce the energy-delay product. In this paper, a simplified pattern prediction algorithm is proposed for the filter cache hierarchy. The prediction scheme relies on the static nature of the hit/miss patterns of instruction access streams. The static patterns are maintained in a small 32 × 1-bit Static Pattern Table (SPT). Our investigations show that the proposed prediction algorithm is superior to one based on a Next Fetch Prediction Table (NFPT) for all the benchmarks simulated. With the proposed approach, an energy-delay product reduction of up to 6.79% was evident compared with the NFPT. Moreover, since the prediction scheme is based on static assignment of patterns, it lends itself better to area- and power-efficient implementation than dynamic pattern prediction, although it is marginally inferior (by 0.69%) in terms of energy-delay product.
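
    A rough sketch of how a 32 × 1-bit pattern table might steer fetches; the indexing and update rule here are assumptions for illustration, not the paper's exact design:

      class StaticPatternTable:
          """Sketch of a 32 x 1-bit SPT for a filter cache: each entry
          records whether fetches in that region tended to hit the tiny
          filter cache.  A set bit routes the next fetch to the filter
          cache; a clear bit bypasses it and goes straight to L1,
          avoiding the filter-cache miss penalty."""

          SIZE = 32

          def __init__(self):
              self.bits = [1] * self.SIZE   # optimistically predict hits

          def _index(self, fetch_addr, line_bytes=16):
              return (fetch_addr // line_bytes) % self.SIZE

          def predict_filter_hit(self, fetch_addr):
              return self.bits[self._index(fetch_addr)] == 1

          def record(self, fetch_addr, hit):
              # "Static" pattern: once a region is marked as missing, keep
              # bypassing it (a dynamic predictor would flip it back).
              if not hit:
                  self.bits[self._index(fetch_addr)] = 0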

  • Design and Evaluation of a High Speed Routing Lookup Architecture

    Jun ZHANG  JeoungChill SHIM  Hiroyuki KURINO  Mitsumasa KOYANAGI  

     
    PAPER-Implementation and Operation

    Vol: E87-B No:3  Page(s): 406-412

    The IP routing lookup problem is equivalent to finding the longest prefix of a packet's destination address in a routing table. Designing a high-performance IP routing lookup architecture is challenging because of increasing traffic, higher link speeds, frequent updates and growing routing table size. First, increasing traffic and higher link speeds require that IP routing lookups be executed at wire speed. Second, frequent routing table updates require that insertion and deletion operations be simple and low-delay. Finally, growing routing table size demands that less memory be used in order to reduce cost. Although many schemes achieve fast lookup, less attention has been paid to the latter two factors. This paper proposes a novel pipelined IP routing lookup architecture using selective binary search on hash tables organized by prefix length. Evaluation results show that it can perform IP lookups at a maximum rate of one lookup per cycle, that the hash operation ratio for one lookup can be reduced to about 1%, that fewer than two hash operations are needed for one table update, and that only 512 kbytes of SRAM are needed for a routing table with about 43,000 prefixes. It proves to have higher performance than existing schemes.
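
    The underlying technique, binary search over hash tables organized by prefix length with marker entries, can be sketched as follows (a simplified software illustration on bit strings, not the paper's pipelined hardware):

      def best_real(prefixes, bits):
          """Next hop of the longest real prefix of `bits`, or None."""
          best = None
          for p in prefixes:
              if bits.startswith(p) and (best is None or len(p) > len(best)):
                  best = p
          return prefixes[best] if best else None

      def build_tables(prefixes):
          """One hash table per prefix length.  Markers added at shorter
          lengths tell the search a longer match may exist, and carry the
          best match known so far to avoid backtracking."""
          lengths = sorted({len(p) for p in prefixes})
          tables = {L: {} for L in lengths}
          for p, hop in prefixes.items():
              tables[len(p)][p] = hop
          for p in prefixes:
              for L in lengths:
                  if L < len(p) and p[:L] not in tables[L]:
                      tables[L][p[:L]] = best_real(prefixes, p[:L])  # marker
          return lengths, tables

      def lookup(addr_bits, lengths, tables):
          """Binary search on prefix lengths: ~log2(#lengths) hash probes."""
          lo, hi, best = 0, len(lengths) - 1, None
          while lo <= hi:
              mid = (lo + hi) // 2
              entry = tables[lengths[mid]].get(addr_bits[:lengths[mid]], "MISS")
              if entry == "MISS":
                  hi = mid - 1              # nothing this long: go shorter
              else:
                  if entry is not None:
                      best = entry          # real hop or marker's inherited hop
                  lo = mid + 1              # a longer match may still exist
          return best

      lengths, tables = build_tables({"10": "A", "1011": "B", "101100": "C"})
      print(lookup("10110111", lengths, tables))   # longest match 1011 -> B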

  • A Cache Replacement Policy for Transcoding Proxy Servers

    Kai-Hau YEUNG  Chun-Cheong WONG  Kin-Yeung WONG  Suk-Yu HUI  

     
    LETTER-Multimedia Systems

    Vol: E87-B No:1  Page(s): 209-211

    A cache replacement policy for the emerging transcoding proxy servers is proposed, which takes transcoding time into account when making replacement decisions. Simulation results show that the proposed policy outperforms conventional LRU in both cache hit rate and average object transcoding time.
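
    A minimal sketch of such a policy (the weighting below is illustrative, not the letter's formula): eviction preference goes to objects that are cheap to re-transcode, large, and cold.

      import time

      class TranscodingAwareCache:
          """Like LRU, but an object's eviction cost also counts the time
          to re-transcode it, so expensive-to-transcode objects survive
          longer in the cache."""

          def __init__(self, capacity_bytes):
              self.capacity = capacity_bytes
              self.used = 0
              self.objs = {}   # key -> (size, transcode_sec, last_access)

          def put(self, key, size, transcode_sec):
              while self.used + size > self.capacity and self.objs:
                  victim = min(self.objs,
                               key=lambda k: self._value(*self.objs[k]))
                  self.used -= self.objs.pop(victim)[0]
              self.objs[key] = (size, transcode_sec, time.monotonic())
              self.used += size

          def _value(self, size, transcode_sec, last_access):
              age = time.monotonic() - last_access + 1e-9
              return transcode_sec / (size * age)   # cheap, big, cold: evict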

  • An Efficient Resource Reservation Method over RSVP Using Moving History of a Mobile Host

    SeongGon CHOI  JunKyun CHOI  

     
    LETTER-Internet

    Vol: E86-B No:12  Page(s): 3655-3657

    The aim of this paper is to improve resource utilization and the handover success rate at handover time. The proposed method takes advantage of the moving history of a mobile host at connection admission control time. We demonstrate the benefit of the proposed method over previously proposed reservation schemes by simulation.

  • A Self-Adjusting Destage Algorithm with High-Low Water Mark in Cached RAID5

    Young Jin NAM  Chanik PARK  

     
    PAPER-Dependable Systems

    Vol: E86-D No:12  Page(s): 2527-2535

    The High-Low Water Mark destage (HLWM) algorithm is widely used to let a cached RAID5 flush dirty data from its write cache to disks, owing to the simplicity of its operations. It starts and stops a destaging process based on two thresholds that are configured at initialization time with the best available knowledge of the underlying storage performance and the workload pattern, including traffic intensity and access patterns. However, whenever the current workload deviates from the original, the thresholds must be re-configured for the changed workload. This paper proposes an efficient destage algorithm, called adaptive thresholding, which automatically re-configures its thresholds according to changes in traffic intensity and access patterns. The core of adaptive thresholding is to define the two thresholds as the product of the observed increasing and decreasing rates of the write cache occupancy level and the times required to fill and empty the write cache. We implement the proposed algorithm on an actual RAID system and verify its auto-reconfiguration ability with synthetic workloads having different levels of traffic intensity and access patterns. Performance evaluations under well-known traced workloads reveal that the proposed algorithm reduces disk IO traffic by about 12% with a 6% increase in the overwrite ratio compared with the HLWM algorithm.
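
    A toy sketch of rate-based thresholding, illustrative only: the paper's exact formulas are not reproduced here, and `reaction_sec` is a made-up tuning knob. The intent is to keep enough headroom above the high mark to absorb incoming writes while a destage ramps up, and to stop destaging while some drain budget remains:

      def adaptive_thresholds(cache_size, fill_rate, drain_rate,
                              reaction_sec=2.0):
          """Recompute the water marks from the observed occupancy rates.

          fill_rate  -- observed increase of cache occupancy (bytes/sec)
          drain_rate -- observed decrease while destaging (bytes/sec)
          """
          high = cache_size - fill_rate * reaction_sec   # headroom for bursts
          low = drain_rate * reaction_sec                # don't over-drain
          high = max(0.0, min(high, cache_size))
          low = max(0.0, min(low, high))
          return high, low

      # 1 MB write cache filling at 40 kB/s, draining at 200 kB/s:
      print(adaptive_thresholds(1 << 20, fill_rate=40e3, drain_rate=200e3))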

  • Randomized Caches for Power-Efficiency

    Hans VANDIERENDONCK  Koen De BOSSCHERE  

     
    PAPER-Integrated Electronics

    Vol: E86-C No:10  Page(s): 2137-2144

    Embedded processors are used in numerous devices executing dedicated applications. This setting makes it worthwhile to optimize the processor for the application it executes, in order to increase its power-efficiency. This paper proposes to enhance direct-mapped data caches with automatically tuned randomized set index functions to achieve that goal. We show how randomization functions can be automatically generated and compare them to traditional set-associative caches in terms of performance and energy consumption. A 16 kB randomized direct-mapped cache consumes 22% less energy than a 2-way set-associative cache, while being less than 3% slower. When the randomization function is made configurable (i.e., it can be adapted to the program), the additional reduction of conflicts outweighs the added complexity of the hardware, provided there is a sufficient number of conflict misses.
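
    Randomized set index functions of this kind are commonly built as XOR folds, where each index bit is the parity of a chosen subset of address bits; making the bit mapping configurable is what allows per-program tuning. A small sketch (the particular mapping below is hypothetical):

      def xor_index(addr, bit_rows, offset_bits=5):
          """Randomized set index as an XOR-folding hash: each index bit
          is the XOR (parity) of the address-bit positions listed in its
          row of `bit_rows`.  Storing that table in registers makes the
          function configurable per application."""
          a = addr >> offset_bits
          index = 0
          for i, row in enumerate(bit_rows):
              parity = 0
              for b in row:
                  parity ^= (a >> b) & 1
              index |= parity << i
          return index

      # Hypothetical mapping for a 256-set cache (8 index bits): each
      # index bit XORs one low and one high address bit, which breaks up
      # the power-of-two strides that plague plain direct mapping.
      rows = [[i, i + 8] for i in range(8)]
      print(xor_index(0x12345678, rows))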

  • Total Cost-Aware Proxy Caching with Cooperative Removal Policy

    Tian-Cheng HU  Yasushi IKEDA  Minoru NAKAZAWA  Shimmi HATTORI  

     
    PAPER-Network

    Vol: E86-B No:10  Page(s): 3050-3062

    Proxy caches have long been used to enhance the performance of web access. Along with the recent development of CDNs (Content Distribution Networks), web proxy caching has also been adopted in many of their core techniques. This paper presents a new viewpoint on a possible improvement to cooperative proxy caching, which can reduce outbound traffic and therefore ideally result in better response times. We focus on the regional total cost of cache objects for optimizing content distribution. In contrast to regular removal policies based on a single proxy server, we evaluate a retrieved web object based on metrics gathered regionally from multiple proxy caches. We particularly introduce a concept called post-removal analysis, which is used to measure the value of removed objects. Finally, we use the real proxy cache Squid to implement our proposal and modify the well-known cache benchmarking tool Web Polygraph to test this cooperative prototype. The test results show that the proposed scheme brings a noticeable improvement in proxy caching performance.

  • Concurrency Control for Read-Only Client Transactions in Broadcast Disks

    Haengrae CHO  

     
    PAPER-Broadcast Systems

    Vol: E86-B No:10  Page(s): 3114-3122

    Broadcast disks are suited for disseminating information to a large number of clients in mobile computing environments. In broadcast disks, the server continuously and repeatedly broadcasts all data items in the database to clients without specific requests. The clients monitor the broadcast channel and read data items as they arrive; the broadcast channel thus becomes a disk from which clients can read data items. In this paper, we propose a cache-conscious concurrency control (C4) algorithm to preserve the consistency of read-only client transactions when the values of broadcast data items are updated at the server. The C4 algorithm is novel in the sense that it can reduce the response time of client transactions with minimal control information broadcast from the server. This is achieved by a judicious client caching strategy and by adjusting the currency of data items read by the client.

  • Integrated Pre-Fetching and Replacing Algorithm for Graceful Image Caching

    Zhou SU  Teruyoshi WASHIZAWA  Jiro KATTO  Yasuhiko YASUDA  

     
    PAPER-Multimedia Systems

    Vol: E86-B No:9  Page(s): 2753-2763

    The efficient distribution of stored information has become a major concern in the Internet. Since web workload characteristics show that more than 60% of network traffic is caused by image documents, how to efficiently distribute image documents from servers to end clients is an important issue. Proxy caching is an efficient solution to reduce network traffic, and in recent years it has been shown that an image caching method (Graceful Caching) based on a hierarchical coding format performs better than conventional caching schemes. However, as cache capacity is limited, how to efficiently allocate the cache memory to achieve minimum expected delay time remains an open problem. This paper presents an integrated caching algorithm to deal with the above problem for image databases, web browsers, proxies and other similar applications in the Internet. By analyzing the web request distribution of the Graceful Caching, both replacing and pre-fetching algorithms are proposed. We also show that our proposal can be carried out based on information readily available in the proxy server; it flexibly adapts its parameters to the hit rates and access patterns of users' requested documents in the Graceful Caching. Finally, we verify the performance of this algorithm by simulations.

  • Efficient and Scalable Client Clustering for Web Proxy Cache

    Kyungbaek KIM  Daeyeon PARK  

     
    PAPER-Software Systems and Technologies

    Vol: E86-D No:9  Page(s): 1577-1585

    Many cooperative web cache systems and protocols have been proposed. These systems, however, require expensive resources, such as external bandwidth and the CPU power or storage of a proxy, while incurring hefty administrative costs as the client population grows. Moreover, a scalability problem in cache server management still exists. This paper suggests peer-to-peer client clustering. The client-cluster provides a proxy cache with backup storage composed of the residual resources of the clients. We use a DHT-based peer-to-peer lookup protocol to manage the client-cluster. With the natural characteristics of this protocol, the client-cluster is self-organizing, fault-tolerant, well-balanced and scalable. Additionally, we propose Backward ICP, which is used for communication between the proxy cache and the client-cluster, to reduce the overhead of object replication and to use resources more efficiently. We examine the performance of the client-cluster via a trace-driven simulation and demonstrate effective enhancement of proxy cache performance.
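
    A DHT-style lookup of this kind can be sketched with a consistent-hashing ring, where each client owns the keys between its predecessor's ID and its own; the actual routing protocol and the Backward ICP messages are not modeled here, and all names are illustrative:

      import bisect, hashlib

      def _h(key):
          return int(hashlib.sha1(key.encode()).hexdigest(), 16)

      class ClientCluster:
          """Sketch of DHT-style placement for the backup store: any peer
          can locate the holder of an evicted object without central
          state, and a join or leave only moves neighbouring keys."""

          def __init__(self, clients):
              self.ring = sorted((_h(c), c) for c in clients)

          def owner(self, object_url):
              ids = [node_id for node_id, _ in self.ring]
              i = bisect.bisect(ids, _h(object_url)) % len(self.ring)
              return self.ring[i][1]

      cluster = ClientCluster([f"client-{i}" for i in range(8)])
      # The proxy forwards an evicted object (or a query for it) to the
      # responsible peer:
      print(cluster.owner("http://example.com/big.jpg"))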

  • A File System Support for Streaming Media Caching

    Hojung CHA  Jaehak OH  

     
    LETTER-Software Systems

    Vol: E86-D No:7  Page(s): 1310-1313

    This letter presents the implementation results of an application-level cache file system, MCFS, which is specifically designed to provide efficient caching and transmission mechanisms for streaming media. The file system is built on a virtual file disk which is constructed as a single large file on a general-purpose file system. MCFS suits the access requirement of continuous media caching and provides an efficient I/O mechanism for cache servers. The experimental results show that MCFS outperforms the comparison model and provides a consistent I/O bandwidth.

  • Stream Caching Using Hierarchically Distributed Proxies with Adaptive Segments Assignment

    Zhou SU  Jiro KATTO  Takayuki NISHIKAWA  Munetsugu MURAKAMI  Yasuhiko YASUDA  

     
    PAPER-Proxy Caching

    Vol: E86-B No:6  Page(s): 1859-1869

    With the advance of high-speed network technologies, the availability and popularity of streaming media content over the Internet has grown rapidly in recent years. Because of their distinct statistical properties and user viewing patterns, traditional delivery and caching schemes for normal web objects such as HTML files or images cannot be efficiently applied to streaming media such as audio and video. In this paper, we therefore propose an integrated caching scheme for streaming media with segment-based caching and hierarchically distributed proxies. First, each stream is divided into segments, and caching algorithms determine how to distribute the segments efficiently across proxies at different levels. Second, by introducing two kinds of segment priorities, segment replacement algorithms determine which stream and which segments should be replaced when the cache is full. Finally, a Web-friendly caching scheme is proposed to integrate streaming caching with the conventional caching of normal web objects. The performance of the proposed algorithms is verified by simulations.
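
    A toy sketch of two-level segment priorities (the scoring is illustrative, not the paper's): replace segments of unpopular streams first, and within a stream keep the earliest segments longest, since playback always starts at the beginning.

      def replacement_order(streams):
          """Return cached (stream, segment) pairs in eviction order.
          Lower score = evicted earlier: unpopular streams go first, and
          within a stream the tail segments go before the prefix."""
          victims = []
          for name, info in streams.items():
              for seg in info["cached_segments"]:
                  score = info["popularity"] / (1 + seg)
                  victims.append((score, name, seg))
          return [v[1:] for v in sorted(victims)]

      streams = {
          "news":  {"popularity": 0.9, "cached_segments": [0, 1, 2, 3]},
          "movie": {"popularity": 0.2, "cached_segments": [0, 1]},
      }
      print(replacement_order(streams)[:3])   # movie's tail goes first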

  • Proxy Caching Mechanisms with Quality Adjustment for Video Streaming Services

    Masahiro SASABE  Yoshiaki TANIGUCHI  Naoki WAKAMIYA  Masayuki MURATA  Hideo MIYAHARA  

     
    PAPER-Proxy Caching

    Vol: E86-B No:6  Page(s): 1849-1858

    The proxy mechanism widely used in WWW systems offers low-delay data delivery by means of a "proxy server." By applying proxy mechanisms to video streaming systems, we expect that high-quality and low-delay video distribution can be accomplished without introducing extra load on the system. In addition, it is effective to adapt the quality of cached video data appropriately in the proxy if user requests are diverse due to heterogeneity in available bandwidth, end-system performance, and users' preferences regarding perceived video quality. In this paper, we propose proxy caching mechanisms to accomplish high-quality and low-delay video streaming services. In our proposed system, a video stream is divided into blocks for efficient use of the cache buffer. A proxy cache server is assumed to be able to adjust the quality of cached or retrieved video blocks to requests through video filters. We evaluate our proposed mechanisms in terms of required buffer size, play-out delay and video quality through simulation experiments. Furthermore, to verify the practicality of our mechanisms, we implemented them on a real system and conducted experiments. Through evaluations of several performance aspects, it is shown that our proposed mechanisms can provide users with a low-latency and high-quality video streaming service in a heterogeneous environment.
