
Keyword Search Result

[Keyword] cache (201 hits)

Results 61-80 of 201 hits

  • Exploiting the Task-Pipelined Parallelism of Stream Programs on Many-Core GPUs

    Shuai MU  Dongdong LI  Yubei CHEN  Yangdong DENG  Zhihua WANG  

     
    PAPER-Computer System
    Vol: E96-D No:10  Page(s): 2194-2207

    By exploiting data-level parallelism, Graphics Processing Units (GPUs) have become a high-throughput, general-purpose computing platform. Many real-world applications, however, especially those following a stream processing pattern, feature interleaved task-pipelined and data parallelism. Current GPUs are ill-equipped for such applications because of insufficient use of computing resources and/or excessive off-chip memory traffic. In this paper, we focus on microarchitectural enhancements that enable task-pipelined execution of data-parallel kernels on GPUs. We propose an efficient adaptive dynamic scheduling mechanism and a moderately modified L2 design. With minor hardware overhead, our techniques orchestrate task-pipelined and data parallelism in a unified manner. Simulation results obtained with a cycle-accurate simulator on real-world applications show that the proposed GPU microarchitecture improves computing throughput by 18% and reduces overall accesses to off-chip GPU memory by 13%.

  • A Capacity-Aware Thread Scheduling Method Combined with Cache Partitioning to Reduce Inter-Thread Cache Conflicts

    Masayuki SATO  Ryusuke EGAWA  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  

     
    PAPER-Computer System
    Vol: E96-D No:9  Page(s): 2047-2054

    Chip multiprocessors (CMPs) improve performance by simultaneously executing multiple threads on multiple integrated cores. However, since these cores commonly share one cache, inter-thread cache conflicts often limit the performance improvement gained from multi-threading. This paper focuses on two causes of inter-thread cache conflicts. In shared caches of CMPs, cached data fetched by one thread are frequently evicted by another thread. Such an eviction, called an inter-thread kickout (ITKO), is one of the major causes of inter-thread cache conflicts. The other cause is capacity shortage, which occurs when one cache is shared by threads demanding large cache capacities. If the total capacity demanded by the threads exceeds the actual cache capacity, the threads compete for the limited cache capacity, resulting in capacity shortage. To address inter-thread cache conflicts, we must take both ITKOs and capacity shortage into account. Therefore, this paper proposes a capacity-aware thread scheduling method combined with cache partitioning. In the proposed method, inter-thread cache conflicts due to ITKOs and capacity shortage are decreased by cache partitioning and thread scheduling, respectively. The proposed scheduling method estimates the capacity demand of each thread with the estimation mechanism used for cache partitioning, and based on this estimate the thread scheduler decides which threads share each cache so as to avoid capacity shortage. Evaluation results suggest that the proposed method can improve overall performance by up to 8.1% and the performance of individual threads by up to 12%. The results also show that both cache partitioning and thread scheduling are indispensable for avoiding ITKOs and capacity shortage simultaneously. Accordingly, the proposed method can significantly reduce inter-thread cache conflicts and hence improve performance.
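
    The pairing step can be pictured as a simple bin-packing problem. The sketch below is only an illustration of that idea, not the authors' scheduler: it greedily places threads, ordered by their estimated capacity demands, onto the least-loaded shared cache and reports any remaining capacity shortage. The function name, demand figures, and cache size are all hypothetical.

        # Hypothetical sketch: greedily group threads onto shared caches so that the
        # sum of estimated capacity demands per cache stays within the cache size.
        def schedule_threads(demands_kb, num_caches, cache_size_kb):
            """demands_kb: {thread_id: estimated capacity demand in KB}."""
            groups = [[] for _ in range(num_caches)]
            loads = [0] * num_caches
            # Place the most demanding threads first, each onto the least-loaded cache.
            for tid, demand in sorted(demands_kb.items(), key=lambda kv: -kv[1]):
                target = min(range(num_caches), key=lambda c: loads[c])
                groups[target].append(tid)
                loads[target] += demand
            shortage = [max(0, load - cache_size_kb) for load in loads]
            return groups, shortage

        groups, shortage = schedule_threads(
            {"t0": 1536, "t1": 512, "t2": 1024, "t3": 256},
            num_caches=2, cache_size_kb=2048)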

  • Revisiting Shared Cache Contention Problems: A Practical Hardware-Software Cooperative Approach

    Eunji PAK  Sang-Hoon KIM  Jaehyuk HUH  Seungryoul MAENG  

     
    PAPER-Computer System
    Vol: E96-D No:7  Page(s): 1457-1466

    Although shared caches allow the dynamic allocation of limited cache capacity among cores, traditional LRU replacement policies often cannot prevent negative interference among cores. To address the contention problem in shared caches, cache partitioning and application scheduling techniques have been extensively studied. Partitioning explicitly determines the cache capacity for each core to maximize overall throughput. Application scheduling by the operating system, on the other hand, groups the least interfering applications on each shared cache when multiple shared caches exist in a system. Although application scheduling can mitigate the contention problem without any extra hardware support, its effect can be limited under severe contention. This paper proposes a low-cost solution based on application scheduling with a simple cache insertion control. Instead of using a full hardware-based cache partitioning mechanism, the proposed technique mostly relies on application scheduling. It selectively applies LRU insertion to the shared caches, which can be added to current commercial processor designs with negligible hardware changes. For the completeness of the cache interference evaluation, this paper examines all possible mixes from a set of applications, instead of just a few selected mixes. The evaluation shows that the proposed technique can mitigate the cache contention problem effectively, approaching ideal scheduling and partitioning.
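
    The insertion control itself is small enough to sketch. The following toy model, which assumes an OrderedDict-based LRU set and a per-application lru_insert flag (both illustrative assumptions, not the paper's interface), shows how a missed line can be placed at the LRU position instead of the MRU position so that a polluting application keeps only a small footprint in the shared cache.

        # Hypothetical sketch: a set-associative cache set in which selected
        # applications insert missed lines at the LRU end instead of the MRU end,
        # limiting how much shared-cache space they can occupy.
        from collections import OrderedDict

        class CacheSet:
            def __init__(self, ways):
                self.ways = ways
                self.lines = OrderedDict()      # ordered from LRU (front) to MRU (back)

            def access(self, tag, lru_insert=False):
                if tag in self.lines:           # hit: promote to MRU as usual
                    self.lines.move_to_end(tag)
                    return True
                if len(self.lines) >= self.ways:
                    self.lines.popitem(last=False)           # evict the LRU line
                self.lines[tag] = None                       # insert at MRU ...
                if lru_insert:                               # ... or force LRU position
                    self.lines.move_to_end(tag, last=False)  # for polluting applications
                return False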

  • Active Breadcrumbs: Adaptive Distribution of In-Network Guidance Information for Content-Oriented Networks

    Masayuki KAKIDA  Yosuke TANIGAWA  Hideki TODE  

     
    PAPER
    Vol: E96-B No:7  Page(s): 1670-1679

    Lately, access loads on servers have been increasing due to larger content sizes and higher request frequencies in content distribution networks. Breadcrumbs (BC), an architecture that uses guidance information for locating content caches, is designed to reduce server load and to form a content-oriented network autonomously, in cooperation with cached contents, over IP networks. We previously proposed Breadcrumbs+, which solves BC's endless routing loop problem. However, Breadcrumbs takes only a passive approach: BC entries are created only when content is downloaded and only at routers on the download path, not at any other routers. We expect that active and adaptive control of guidance information, at low complexity, improves performance while keeping scalability. In this paper, we propose Active Breadcrumbs, which achieves efficient content retrieval and load balancing through active and adaptive control of guidance information by the cache nodes themselves. In addition, we show the effectiveness of Active Breadcrumbs through extensive computer simulations.

  • Bayesian Theory Based Adaptive Proximity Data Accessing for CMP Caches

    Guohong LI  Zhenyu LIU  Sanchuan GUO  Dongsheng WANG  

     
    PAPER
    Vol: E96-A No:6  Page(s): 1293-1305

    As the number of cores and the working sets of parallel workloads increase, shared L2 caches exhibit fewer misses than private L2 caches by making better use of the total available cache capacity, but they incur higher overall L1 miss latencies because of the longer average distance between the requestor and the home node and the potential congestion at certain nodes. We observe that there is a high probability that the target data of an L1 miss resides in the L1 cache of a neighboring node; in such cases, these long-distance accesses to the home nodes can potentially be avoided. To leverage this property, we propose Bayesian Theory based Adaptive Proximity Data Accessing (APDA). In our proposal, we organize the multi-core into clusters of 2x2 nodes and introduce the Proximity Data Prober (PDP) to detect whether an L1 miss can be served by one of the cluster's L1 caches. Furthermore, we devise the Bayesian Decision Classifier (BDC) to adaptively select the remote L2 cache or the neighboring L1 node as the server according to the minimum miss cost. We evaluate this approach on a 64-node multi-core using the SPLASH-2 and PARSEC benchmarks and find that APDA can reduce the execution time by 20% and the energy by 14% compared to a standard multi-core with a shared L2. The experimental results demonstrate that our proposal outperforms state-of-the-art mechanisms such as ASR, DCC and RNUCA.

  • A High-Speed Trace-Driven Cache Configuration Simulator for Dual-Core Processor L1 Caches

    Masashi TAWADA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER
    Vol: E96-A No:6  Page(s): 1283-1292

    Recently, multi-core processors have been used in embedded systems very often. Since the application programs running on an embedded system are quite limited, there must exist an optimal cache memory configuration in terms of power and area. Simulating the application programs on various cache configurations is one of the best ways to determine the optimal one. Multi-core cache configuration simulation, however, is much more complicated and takes much more time than single-core cache configuration simulation. In this paper, we propose a very fast dual-core L1 cache configuration simulation algorithm. We first propose a new data structure in which a single instance represents two or more multi-core cache configurations with different cache associativities. We then propose a new multi-core cache configuration simulation algorithm that uses this data structure together with new theorems. Experimental results demonstrate that our algorithm obtains exact simulation results while running 20 times faster than a conventional approach.
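
    For intuition, the slow baseline that such a simulator accelerates can be written as a brute-force loop that replays one address trace against each candidate configuration separately; the paper's contribution is a single-pass data structure that avoids exactly this repetition. The trace, configurations, and line size below are made up for illustration.

        # Hypothetical sketch: the brute-force baseline that replays one address trace
        # against each candidate L1 configuration separately and counts misses.
        from collections import OrderedDict

        def simulate(trace, sets, ways, line_size=64):
            cache = [OrderedDict() for _ in range(sets)]
            misses = 0
            for addr in trace:
                block = addr // line_size
                s = cache[block % sets]
                if block in s:
                    s.move_to_end(block)        # LRU update on a hit
                else:
                    misses += 1
                    if len(s) >= ways:
                        s.popitem(last=False)   # evict the LRU block
                    s[block] = None
            return misses

        trace = [0x1000, 0x1040, 0x1000, 0x2000, 0x1040]
        for sets, ways in [(64, 1), (32, 2), (16, 4)]:   # candidate configurations
            print(sets, ways, simulate(trace, sets, ways))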

  • Reconfiguring Cache Associativity: Adaptive Cache Design for Wide-Range Reliable Low-Voltage Operation Using 7T/14T SRAM

    Jinwook JUNG  Yohei NAKATA  Shunsuke OKUMURA  Hiroshi KAWAGUCHI  Masahiko YOSHIMOTO  

     
    PAPER
    Vol: E96-C No:4  Page(s): 528-537

    This paper presents an adaptive cache architecture for wide-range, reliable low-voltage operation. The proposed associativity-reconfigurable cache consists of pairs of cache ways so that it can exploit the recovery feature of the novel 7T/14T SRAM cell. Each pair has two operating modes that can be selected according to the voltage level required by the current operating conditions: a normal mode for high performance and a dependable mode for reliable low-voltage operation. Reliable low-voltage operation is obtained by applying the dependable mode to weaker pairs that cannot operate reliably at low voltages, while leaving stronger pairs in the normal mode minimizes performance losses. Our chip measurement results show that the proposed cache can trade off its associativity against the minimum operating voltage. Moreover, it can decrease the minimum operating voltage by 140 mV, achieving 67.48% and 26.70% reductions in power dissipation and energy per instruction, respectively. Processor simulation results show that designing the on-chip caches with the proposed scheme incurs at most a 2.95% IPC loss, while allowing various performance levels to be chosen. Area estimation results show that the proposed cache adds area overheads of 1.61% and 5.49% for 32-KB and 256-KB caches, respectively.

  • Bypass Extended Stack Processing for Anti-Thrashing Replacement in Shared Last Level Cache of Chip Multiprocessors

    Young-Sik EOM  Jong Wook KWAK  Seong-Tae JHANG  Chu-Shik JHON  

     
    LETTER-Computer System
    Vol: E96-D No:2  Page(s): 370-374

    Chip Multiprocessors (CMPs) allow different applications to share the LLC (Last Level Cache). Since each application has a different cache capacity demand, LLC capacity should be partitioned in accordance with those demands. Existing partitioning algorithms estimate the capacity demand of each core by stack processing, considering only the LRU (Least Recently Used) replacement policy. However, anti-thrashing replacement algorithms such as BIP (Bimodal Insertion Policy) and BIP-Bypass emerged to overcome the thrashing problem of the LRU replacement policy when the working set is larger than the available cache size. Since existing stack processing cannot estimate capacity demand under an anti-thrashing replacement policy, partitioning algorithms likewise cannot partition cache space under such a policy. In this letter, we prove that the BIP replacement policy is not amenable to stack processing but that BIP-Bypass is, and we modify stack processing to accommodate BIP-Bypass. In addition, we propose pipelined hardware for the modified stack processing. With this hardware, we can obtain the success function for various capacities under an anti-thrashing replacement policy and assess, in real time, the shared-cache capacity adequate for each core.
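
    BIP-Bypass is commonly described as inserting a missed line only with a small probability and otherwise leaving the cache set untouched. The toy sketch below illustrates that insertion rule only; the probability value and the OrderedDict-based set are illustrative assumptions, not the letter's hardware.

        # Hypothetical sketch of the BIP-Bypass insertion rule: on a miss the new line
        # is inserted (at MRU) only with a small probability; otherwise it bypasses the
        # cache entirely, so nothing is evicted or inserted.
        import random
        from collections import OrderedDict

        EPSILON = 1 / 32                          # illustrative insertion probability

        def access(cache_set, ways, tag):
            if tag in cache_set:                  # hit: normal LRU promotion
                cache_set.move_to_end(tag)
                return True
            if random.random() < EPSILON:         # rare insertion retains some new lines
                if len(cache_set) >= ways:
                    cache_set.popitem(last=False)
                cache_set[tag] = None
            return False                          # common case: bypass, set unchanged

        cache_set = OrderedDict()
        access(cache_set, ways=4, tag=0x1A)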

  • Adaptive Insertion and Promotion Policies Based on Least Recently Used Replacement

    Wenbing JIN  Xuanya LI  Yanyong YU  Yongzhi WANG  

     
    LETTER-Computer System
    Vol: E96-D No:1  Page(s): 124-128

    To improve Last-Level Cache (LLC) management, numerous approaches have been proposed that require an additional hardware budget and increased overhead; a number of them even change the organization of the existing cache design. In this letter, we propose Adaptive Insertion and Promotion (AIP) policies based on Least Recently Used (LRU) replacement. AIP dynamically inserts a missed line in the middle of the recency list and promotes a reused line several positions toward the MRU end, deliberately combining LRU and LFU policies under a single unified scheme. As a result, it benefits workloads with high locality as well as those with many frequently reused lines. Most importantly, AIP requires no additional hardware beyond a typical LRU list, so it can be easily integrated into existing hardware with minimal changes. Other LLC issues such as scans, thrashing, and dead lines are also explored in our study. Experimental results on the gem5 simulator with the SPEC CPU2006 benchmarks indicate that AIP outperforms the LRU replacement policy by an average of 5.8% on the misses-per-thousand-instructions metric.
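
    The insertion and promotion rules described above are simple enough to model on a single set's recency list. The sketch below is an illustration under assumed parameters (the promotion step size and the exact middle position are placeholders), not the authors' implementation.

        # Hypothetical sketch of AIP on one set: missed lines enter at the middle of the
        # recency list, and a reused line is promoted a few positions toward the MRU end
        # rather than jumping straight to MRU.
        PROMOTE_STEPS = 2                                 # placeholder value

        def access(recency, ways, tag):
            """recency: list ordered from LRU (index 0) to MRU (last index)."""
            if tag in recency:                            # hit: promote a few steps
                i = recency.index(tag)
                j = min(i + PROMOTE_STEPS, len(recency) - 1)
                recency.insert(j, recency.pop(i))
                return True
            if len(recency) >= ways:                      # miss: evict the LRU line
                recency.pop(0)
            recency.insert(len(recency) // 2, tag)        # insert at the middle
            return False

        recency = []
        for t in [1, 2, 3, 1, 4]:
            access(recency, ways=4, tag=t)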

  • Using Cacheline Reuse Characteristics for Prefetcher Throttling

    Hidetsugu IRIE  Takefumi MIYOSHI  Goki HONJO  Kei HIRAKI  Tsutomu YOSHINAGA  

     
    PAPER-Computer Architecture
    Vol: E95-D No:12  Page(s): 2928-2938

    One of the significant issues in processor architecture is overcoming memory latency. Prefetching can greatly improve cache performance, but it has the drawback of cache pollution unless its aggressiveness is set properly. Several techniques proposed for prefetcher throttling use accuracy as a metric, but their robustness was not sufficient because of variations in programs' working-set sizes and cache capacities. In this study, we revisit prefetcher throttling from the viewpoint of data lifetime. Exploiting the characteristics of cache line reuse, we propose Cache-Convection-Control-based Prefetch Optimization Plus (CCCPO+), which enhances the feedback algorithm of our previous CCCPO. Evaluation results showed that this novel approach achieved a 30% improvement over no prefetching in the geometric mean of the SPEC CPU 2006 benchmark suite with a 256 KB LLC, 1.8% over the latest prefetcher throttling technique, and 0.5% over our previous CCCPO. Moreover, it showed superior stability compared to related work, while lowering the hardware cost.
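
    The general feedback idea behind prefetcher throttling (not CCCPO+ itself) can be pictured as a controller that periodically raises or lowers the prefetch degree from sampled accuracy and pollution counters. The thresholds and degree levels below are invented for illustration.

        # Hypothetical sketch of feedback-directed prefetcher throttling: every sampling
        # interval, adjust the prefetch degree from measured accuracy and pollution.
        DEGREES = [0, 1, 2, 4, 8]               # allowed aggressiveness levels

        def next_level(level, useful, issued, pollution, demand_misses):
            accuracy = useful / issued if issued else 0.0
            pollution_rate = pollution / demand_misses if demand_misses else 0.0
            if accuracy > 0.75 and pollution_rate < 0.05:
                return min(level + 1, len(DEGREES) - 1)   # prefetches help: ramp up
            if accuracy < 0.40 or pollution_rate > 0.25:
                return max(level - 1, 0)                  # polluting the cache: back off
            return level                                  # otherwise keep current degree

        level = 2
        level = next_level(level, useful=120, issued=150, pollution=3, demand_misses=200)
        print(DEGREES[level])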

  • Evaluation of a New Power-Gating Scheme Utilizing Data Retentiveness on Caches

    Kyundong KIM  Seidai TAKEDA  Shinobu MIWA  Hiroshi NAKAMURA  

     
    PAPER-Logic Synthesis, Test and Verification
    Vol: E95-A No:12  Page(s): 2301-2308

    Caches are among the most leakage-consuming components in modern processors because of their massive number of transistors. To reduce the leakage power of caches, several techniques using power gating (PG) have been proposed. Despite its high leakage savings, a side effect of PG for caches is the loss of data during a sleep. If useful data is lost in sleep mode, it must be fetched again from a lower-level memory. This consumes a considerable amount of energy, which unfortunately offsets the leakage savings. This paper proposes a new PG scheme that considers the data retentiveness of SRAM. After entering sleep mode, the data in an SRAM cell is not lost immediately and remains usable once its validity has been checked. We therefore exploit the data retentiveness of SRAM to avoid the energy overhead of data recovery, which yields a further chance of leakage saving. To check data validity, we introduce simple hardware whose overhead is negligible. Our experimental results show that exploiting data retentiveness saves up to 32.42% more leakage than conventional PG.

  • Scalable Cache-Optimized Concurrent FIFO Queue for Multicore Architectures

    Changwoo MIN  Hyung Kook JUN  Won Tae KIM  Young Ik EOM  

     
    LETTER
    Vol: E95-D No:12  Page(s): 2956-2957

    A concurrent FIFO queue is a widely used fundamental data structure for parallelizing software. In this letter, we introduce a novel concurrent FIFO queue algorithm for multicore architectures. We achieve better scalability by reducing contention among concurrent threads and improve performance by optimizing cache-line usage. Experimental results on a server with eight cores show that our algorithm outperforms state-of-the-art algorithms by a factor of two.
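
    As a rough illustration of how contention between producers and consumers can be reduced (this is not the authors' algorithm, which also optimizes cache-line usage), a classic two-lock linked-list queue lets one enqueuer and one dequeuer proceed in parallel; in a C implementation the head and tail would additionally be padded onto separate cache lines.

        # Hypothetical sketch: a two-lock linked-list FIFO queue in the spirit of
        # Michael & Scott. Separate head/tail locks let an enqueue and a dequeue run
        # concurrently, one simple way to cut producer/consumer contention.
        import threading

        class Node:
            __slots__ = ("value", "next")
            def __init__(self, value=None):
                self.value, self.next = value, None

        class TwoLockQueue:
            def __init__(self):
                dummy = Node()
                self.head = self.tail = dummy
                self.head_lock = threading.Lock()   # taken only by dequeuers
                self.tail_lock = threading.Lock()   # taken only by enqueuers

            def enqueue(self, value):
                node = Node(value)
                with self.tail_lock:
                    self.tail.next = node
                    self.tail = node

            def dequeue(self):
                with self.head_lock:
                    first = self.head.next
                    if first is None:
                        return None                 # queue is empty
                    self.head = first
                    return first.value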

  • Cache-Aware Virtual Machine Scheduling on Multi-Core Architecture

    Cheol-Ho HONG  Young-Pil KIM  Seehwan YOO  Chi-Young LEE  Chuck YOO  

     
    PAPER-Software System
    Vol: E95-D No:10  Page(s): 2377-2392

    Facing practical limits on increasing processor frequencies, manufacturers have resorted to multi-core designs in their commercial products. In multi-core implementations, cores in a physical package share the last-level caches to improve inter-core communication. To exploit this facility efficiently, operating systems must employ cache-aware schedulers. Unfortunately, virtualization software, a foundation technology of cloud computing, is not yet cache-aware or does not fully exploit the locality of the last-level caches. In this paper, we propose a cache-aware virtual machine scheduler for multi-core architectures. The proposed scheduler exploits the locality of the last-level caches to improve the performance of concurrent applications running on virtual machines. First, we provide a space-partitioning algorithm that migrates and clusters communicating virtual CPUs (VCPUs) into the same cache domain. Second, we provide a time-partitioning algorithm that co-schedules, or schedules in sequence, the clustered VCPUs. Finally, we present a theoretical analysis proving that our scheduling algorithm is more efficient in supporting concurrent applications than the default credit scheduler in Xen. We implemented our virtual machine scheduler in a recent Xen hypervisor with para-virtualized Linux-based operating systems, and we show that our approach can improve the performance of concurrent virtual machines by up to 19% compared to the credit scheduler.
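
    A toy version of the space-partitioning step could greedily co-locate the most heavily communicating VCPU pairs in the same last-level-cache domain. The sketch below is only an approximation of that idea; the communication matrix, domain layout, and function names are fabricated for the example.

        # Hypothetical sketch: greedily co-locate heavily communicating VCPU pairs in
        # the same last-level-cache domain, subject to per-domain core counts.
        def cluster_vcpus(comm, num_domains, cores_per_domain):
            """comm: {(vcpu_a, vcpu_b): communication intensity}."""
            domains = [[] for _ in range(num_domains)]
            placed = {}
            for (a, b), _ in sorted(comm.items(), key=lambda kv: -kv[1]):
                for v in (a, b):
                    if v in placed:
                        continue
                    partner = b if v == a else a
                    # Prefer the partner's domain if it still has a free core.
                    if partner in placed and len(domains[placed[partner]]) < cores_per_domain:
                        d = placed[partner]
                    else:
                        d = min(range(num_domains), key=lambda i: len(domains[i]))
                    domains[d].append(v)
                    placed[v] = d
            return domains

        print(cluster_vcpus({("v0", "v1"): 900, ("v2", "v3"): 500, ("v0", "v2"): 100},
                            num_domains=2, cores_per_domain=2))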

  • Dynamic Allocation of SPM Based on Time-Slotted Cache Conflict Graph for System Optimization

    Jianping WU  Ming LING  Yang ZHANG  Chen MEI  Huan WANG  

     
    PAPER-Computer System
    Vol: E95-D No:8  Page(s): 2039-2052

    This paper proposes a novel dynamic Scratch-Pad Memory (SPM) allocation strategy to optimize the energy consumption of the memory subsystem. First, the whole program execution is sliced into several time slots along the temporal dimension; a Time-Slotted Cache Conflict Graph (TSCCG) is then introduced to model the behavior of Data Cache (D-Cache) conflicts within each time slot. Next, Integer Nonlinear Programming (INP), which avoids a time-consuming linearization process, is applied to select the most profitable data pages. The Virtual Memory System (VMS) is used to remap to SPM those data pages that would cause severe cache conflicts within a time slot. To minimize the swapping overhead of dynamic SPM allocation, a novel SPM controller with a tightly coupled DMA is introduced to issue swapping operations without the CPU's intervention. Finally, this paper quantitatively discusses how the system energy profit fluctuates with the MMU page size and the time-slot duration. According to our design-space exploration, the proposed method can optimize all of the data segments, including global, heap, and stack data, and reduces total energy consumption by 27.28% on average, and by up to 55.22%, with a marginal performance improvement. Compared with the conventional static CCG (Cache Conflict Graph), our approach obtains a 24.7% energy profit on average, and up to 30.5%, with a slight boost in performance.

  • Throttling Capacity Sharing Using Life Time and Reuse Time Prediction in Private L2 Caches of Chip Multiprocessors

    Young-Sik EOM  Jong Wook KWAK  Seong Tae JHANG  Chu Shik JHON  

     
    LETTER-Computer System
    Vol: E95-D No:6  Page(s): 1676-1679

    Private L2 caches have potential benefits for future Chip Multi-Processors (CMPs), e.g., small access latency, performance isolation, a tile-friendly architecture, and a simple, low-bandwidth on-chip interconnect. However, the major weakness of private caches is the higher cache miss rate caused by their small capacity. To deal with this problem, private caches can share capacity by spilling replaced blocks to other private caches. Indiscriminate spilling, however, can make the capacity problem worse and affect performance negatively. This letter proposes Throttling Capacity Sharing (TCS) for effective capacity sharing among private L2 caches. TCS determines whether to spill a replaced block by predicting its reuse possibility based on life time and reuse time. In our performance evaluation, TCS improves weighted speedup by 48.79%, 6.37%, and 5.44% compared to non-spilling, Cooperative Caching with the best spill probability (CC), and Dynamic Spill-Receive (DSR), respectively.
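
    The spill test can be pictured as a single predicate: spill an evicted block to a peer cache only if it is predicted to be reused, and to be reused before the block would be evicted from the receiving cache. The sketch below is a schematic of that decision with made-up inputs, not the letter's predictor.

        # Hypothetical sketch of a throttled spill decision: an evicted block is spilled
        # to another private L2 only if it is predicted to be reused, and to be reused
        # soon enough to still be alive in the receiving cache.
        def should_spill(reuse_predicted, predicted_reuse_time, expected_life_in_receiver):
            if not reuse_predicted:
                return False          # dead block: spilling would only pollute a peer
            return predicted_reuse_time <= expected_life_in_receiver

        # Illustrative use with made-up cycle counts.
        print(should_spill(True, predicted_reuse_time=40_000,
                           expected_life_in_receiver=100_000))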

  • Accurate and Simplified Prediction of L2 Cache Vulnerability for Cost-Efficient Soft Error Protection

    Yu CHENG  Anguo MA  Minxuan ZHANG  

     
    PAPER-Trust
    Vol: E95-D No:1  Page(s): 56-66

    Soft errors caused by energetic particle strikes in on-chip cache memories have become a critical challenge for microprocessor design. The architectural vulnerability factor (AVF), defined as the probability that a transient fault in a structure results in a visible error in the final output of a program, has been widely employed for accurate soft error rate estimation. Recent studies have found that designing soft error protection techniques with awareness of AVF greatly helps achieve a tradeoff between performance and reliability for several structures (e.g., the issue queue and reorder buffer). For large on-chip L2 caches, redundancy-based protection techniques (such as ECC) have been widely employed to ensure data integrity, at high cost. Protecting caches without accurate knowledge of their vulnerability characteristics may lead to over-protection and thus incur high overheads. Therefore, designing AVF-aware protection techniques is attractive for achieving cost-efficient protection of caches, especially at an early design stage. In this paper, we propose an improved AVF estimation framework for conducting a comprehensive characterization of the dynamic behavior and predictability of L2 cache vulnerability. We employ the Bayesian Additive Regression Trees (BART) method to accurately model the variation of L2 cache AVF and to quantitatively explain the important effects of several key performance metrics on L2 cache AVF. We then employ a bump hunting technique to extract simple selection rules based on those key performance metrics, yielding a simplified and fast estimation of L2 cache AVF. Using the simplified L2 cache AVF estimator, we develop an AVF-aware ECC technique as an example to demonstrate the cost-efficiency advantages of AVF-prediction-based dynamic fault-tolerance techniques. Experimental results show that, compared with a traditional full ECC technique, the AVF-aware ECC technique reduces L2 cache access latency by 16.5% and power consumption by 11.4% on average for the SPEC2K benchmarks.

  • A New Recovery Mechanism in Superscalar Microprocessors by Recovering Critical Misprediction

    Jiongyao YE  Yu WAN  Takahiro WATANABE  

     
    PAPER-High-Level Synthesis and System-Level Design
    Vol: E94-A No:12  Page(s): 2639-2648

    Current trends in modern out-of-order processors involve deeper pipelines and large instruction windows to achieve high performance, which makes the penalty of branch misprediction recovery a critical factor in overall processor performance. Multi-path execution has been proposed to reduce this penalty by simultaneously executing both paths following a branch. However, this mechanism has drawbacks, such as the design complexity of processing both paths after a branch and the performance degradation caused by hardware resource competition between the two paths. In this paper, we propose a new recovery mechanism, called Recovery Critical Misprediction (RCM), to reduce the penalty of branch misprediction recovery. The mechanism uses a small trace cache to save the decoded instructions from the alternative path following a branch. During subsequent predictions, the trace cache is accessed; on a hit, the processor forks the second path of the branch at the rename stage, so that design complexity in the fetch and decode stages is alleviated. The main contribution of this paper is that the proposed mechanism employs critical path prediction to identify the branches that will be most harmful if mispredicted. Only a critical branch can save its alternative path into the trace cache, which not only increases the usefulness of the limited trace cache size but also avoids the performance degradation caused by forking non-critical branches. Experimental results with the SPECint 2000 benchmarks show that a processor with our proposed RCM improves the IPC value by 10.05% compared with a conventional processor.

  • A 98 GMACs/W 32-Core Vector Processor in 65 nm CMOS

    Xun HE  Xin JIN  Minghui WANG  Dajiang ZHOU  Satoshi GOTO  

     
    PAPER-High-Level Synthesis and System-Level Design
    Vol: E94-A No:12  Page(s): 2609-2618

    This paper presents a high-performance dual-issue 32-core SIMD platform for image and video processing. The SIMD cores support 8/16-bit SIMD MAC instructions and vertical vector access. Eight cores with a 4-port L2 cache are connected by a CIB bus to form a cluster, and four clusters are connected by a mesh network. This hierarchical network provides more than 192 GB/s of low-latency inter-core bandwidth on average. The 4-port L2 cache architecture is also designed to provide 192 GB/s of L2 cache bandwidth. To reduce coherence operations in a large-scale SMP, an application-specific protocol is proposed; compared with MOESI, it saves 67.8% of L1 cache energy in the 32-core case. The whole system, including 32 vector cores, a 256 KB L2 cache, a 64-bit DDRII PHY, and two PLL units, occupies 25 mm2 in 65 nm CMOS. It achieves a peak performance of 375 GMACs and 98 GMACs/W at 1.2 V.

  • Web Cache Design and Implementation for Efficient SNMP Monitoring towards Internet-Scale Network Management

    Ahmad Kamil ABDUL HAMID  Yoshihiro KAWAHARA  Tohru ASAMI  

     
    PAPER-Network Management/Operation
    Vol: E94-B No:10  Page(s): 2817-2827

    In this paper, we propose an SNMP-aware web cache design with two main objectives: (1) to avoid overloading network devices with SNMP requests, and (2) to guarantee the monitoring time granularity of SNMP Object Identifiers (OIDs) for a large-scale network such as the Internet. To meet these objectives, a cache is built into a RESTful active proxy, called Tambourine, which serves as the gateway for accessing management information over the Internet. Tambourine changes the landscape of traditional SNMP monitoring by allowing Internet users to monitor closed-domain network devices by translating HTTP requests into SNMP. However, a typical web cache algorithm cannot be used in Tambourine for two main reasons: (1) SNMP is not a cache-aware protocol and therefore cannot provide Tambourine with the caching rules to apply, and (2) the cache in Tambourine needs to accommodate two SNMP monitoring patterns, periodic and on-demand polling. For efficient periodic polling, SNMP traffic is reduced by a multi-TTL cache and user (or Manager)-side aggregation. For efficient on-demand polling, a four-state transition is used to categorize OIDs into dynamic and static objects, each of which is allocated an optimum TTL. To provide users with a proper time stamp, the cache time stamp is included in the response to each user request. Our experiments show that our cache design achieves a staleness of 0 and a bounded number of SNMP requests even when the number of user requests goes to infinity.
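
    One way to picture the multi-TTL idea is a per-OID cache that assigns a long TTL to identifiers whose values rarely change between polls and a short TTL to frequently changing ones. The class, change threshold, and TTL values below are assumptions for illustration, not Tambourine's actual design.

        # Hypothetical sketch: a per-OID cache that assigns a long TTL to OIDs whose
        # values rarely change between polls and a short TTL to frequently changing ones.
        import time

        STATIC_TTL, DYNAMIC_TTL = 300.0, 5.0     # seconds, illustrative values

        class OidCache:
            def __init__(self, snmp_get):
                self.snmp_get = snmp_get         # function: oid -> value (device poll)
                self.entries = {}                # oid -> (value, fetched_at, ttl, changes)

            def get(self, oid):
                now = time.time()
                entry = self.entries.get(oid)
                if entry and now - entry[1] < entry[2]:
                    return entry[0]              # fresh: answer from the cache
                value = self.snmp_get(oid)       # stale or missing: poll the device
                changes = (entry[3] + (entry[0] != value)) if entry else 0
                ttl = DYNAMIC_TTL if changes >= 2 else STATIC_TTL
                self.entries[oid] = (value, now, ttl, changes)
                return value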

  • Probabilistic Broadcast-Based Cache Invalidation Scheme for Location Dependent Data in Mobile Environments

    Shigeaki TAGASHIRA  Yutaka KAMINISHI  Yutaka ARAKAWA  Teruaki KITASUKA  Akira FUKUDA  

     
    PAPER-Data Engineering, Web Information Systems
    Vol: E94-D No:8  Page(s): 1590-1601

    Data caching is widely known as an effective power-saving technique in which mobile devices use local caches instead of the original data placed on a server, in order to reduce the power consumption required for network access. In such data caching, a cache invalidation mechanism is important to prevent devices from unintentionally accessing invalid data. In this paper, we propose a broadcast-based protocol for cache invalidation in a location-aware system. The proposed protocol is designed to reduce the access time required for obtaining the necessary invalidation reports through broadcast media and to avoid client-side sleep fragmentation while retrieving the reports. In the proposed protocol, a Bloom filter is used as the data structure of an invalidation report, in order to probabilistically check the invalidation of caches. Furthermore, we propose three broadcast scheduling methods intended to achieve flexible broadcasting structured by the Bloom filter: the fragmentation avoidance scheduling method (FASM), the metrics balancing scheduling method (MBSM), and the minimizing access time scheduling method (MASM). The broadcast schedule is arranged for consecutive accesses to geographically neighboring invalidation reports. The effectiveness of the proposed methods is evaluated by simulation. The results indicate that the MBSM and MASM achieve high scheduling performance: compared to the FASM, the MBSM reduces the access time by 34% while increasing the fragmentations in the resultant schedule by 40%, and the MASM reduces the access time by 40% with an 85% increase in the number of fragmentations.
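
    A minimal Bloom filter sketch shows why it suits an invalidation report: the server inserts the identifiers of invalidated items, clients test their cached identifiers, and a false positive only causes an unnecessary refetch, never a stale read. The filter size, hash construction, and item names below are illustrative.

        # Hypothetical sketch: a Bloom filter used as a cache-invalidation report.
        # Membership tests may yield false positives (harmless extra refetches) but
        # never false negatives, so clients never keep data the server invalidated.
        import hashlib

        class BloomFilter:
            def __init__(self, num_bits=1024, num_hashes=4):
                self.m, self.k = num_bits, num_hashes
                self.bits = bytearray(num_bits)          # one byte per bit, for clarity

            def _positions(self, item):
                for i in range(self.k):
                    digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
                    yield int.from_bytes(digest[:8], "big") % self.m

            def add(self, item):                 # server side: mark item as invalidated
                for p in self._positions(item):
                    self.bits[p] = 1

            def might_contain(self, item):       # client side: should I drop my copy?
                return all(self.bits[p] for p in self._positions(item))

        report = BloomFilter()
        report.add("item:42")
        print(report.might_contain("item:42"), report.might_contain("item:7"))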

Results 61-80 of 201 hits