Yasuyuki KAWAMURA Akira MATSUBAYASHI
We study the online file allocation problem on ring networks. We present a 7-competitive randomized algorithm against an adaptive online adversary on uniform cactus graphs; the algorithm is deterministic if the file size is 1. Moreover, we obtain lower bounds of 4.25 and 3.833 on the competitive ratios of deterministic algorithms and of randomized algorithms against an adaptive online adversary, respectively, on ring networks.
Nobuaki TOJO Nozomu TOGAWA Masao YANAGISAWA Tatsuo OHTSUKI
Recently, a two-level cache consisting of an L1 cache and an L2 cache has become common in processors. Particularly in an embedded system in which a single application or a class of applications is repeatedly executed on a processor, the cache configuration can be customized so that an optimal one is achieved. An optimal two-level cache configuration, minimizing overall memory access time or memory energy consumption, can be obtained by varying three cache parameters for each of the L1 and L2 caches: the number of sets, the line size, and the associativity. In this paper, we first extend an L1 cache simulation algorithm so that it can explore two-level cache configurations. Second, we propose two two-level cache design space exploration algorithms, CRCB-T1 and CRCB-T2, each of which applies the Cache Inclusion Property to two-level cache configurations. Each of the proposed algorithms realizes exact cache simulation but decreases the number of cache hit/miss judgments by a factor of several thousand. Experimental results show that, with our approach, the number of cache hit/miss judgments required to optimize a cache configuration is reduced to 1/50-1/5500 of that of the exhaustive approach. As a result, our proposed approach runs on average 1398.25 times faster than the exhaustive approach, achieving the fastest two-level cache design space exploration reported to date.
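[Editorial note] The Cache Inclusion Property underlying CRCB-T1 and CRCB-T2 is not spelled out in the abstract; the classic instance for set-associative LRU caches is the stack-distance property: with the number of sets and line size fixed, an access that hits in an A-way cache also hits in every cache of larger associativity. One trace pass then resolves hit/miss for all associativities at once, which is the kind of saving in hit/miss judgments the abstract describes. A minimal single-level sketch (the trace format and function names are ours, not the paper's; the two-level algorithms compose this kind of pass per cache level):

    # One-pass LRU simulation that yields hit counts for every associativity,
    # using the stack (inclusion) property of LRU: a hit at stack depth d is a
    # hit in any cache with the same set count and associativity >= d.
    from collections import defaultdict

    def stack_distances(addresses, num_sets, line_size):
        """Return a histogram: hist[d] = number of accesses at stack distance d."""
        stacks = defaultdict(list)          # per-set LRU stack of tags (MRU first)
        hist = defaultdict(int)
        for addr in addresses:
            block = addr // line_size
            s, tag = block % num_sets, block // num_sets
            stack = stacks[s]
            if tag in stack:
                d = stack.index(tag) + 1    # 1-based stack distance
                stack.remove(tag)
            else:
                d = float('inf')            # miss at any associativity
            hist[d] += 1
            stack.insert(0, tag)            # accessed tag becomes MRU
        return hist

    def hits_for_associativity(hist, assoc):
        return sum(n for d, n in hist.items() if d <= assoc)

    trace = [0, 64, 128, 0, 192, 64, 0]
    hist = stack_distances(trace, num_sets=2, line_size=32)
    for a in (1, 2, 4):
        print(a, 'ways:', hits_for_associativity(hist, a), 'hits')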
Bloom filters are widely used in various network applications. Because of the limited size of on-chip memory and the large volume of network traffic, Bloom filters are often required to update their contents incrementally. Two techniques have been used for this purpose: cold cache and double buffering. Cold cache outperforms double buffering in terms of the average cache hit ratio, whereas double buffering works much better than cold cache for the worst-case cache hit ratio. In this paper, we propose a new updating scheme for Bloom filters that updates the contents of a Bloom filter incrementally while fully utilizing the assigned memory space. The proposed scheme works better than cold cache in terms of the average cache hit ratio and, at the same time, outperforms double buffering for the worst-case cache hit ratio.
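[Editorial note] For readers unfamiliar with the double-buffering baseline mentioned above: it keeps two half-size Bloom filters, inserts into the active one (and also into the warm-up one once the active filter is past a fill threshold), and swaps the pair when the active filter saturates. A minimal sketch, with the sizes, threshold, and hash construction chosen arbitrarily for illustration:

    # Double buffering for incremental Bloom filter updates: two half-size
    # filters; the active one answers queries, the warm-up one is pre-filled
    # so that a swap does not start from an empty (cold) filter.
    import hashlib

    class BloomFilter:
        def __init__(self, m, k):
            self.m, self.k, self.count = m, k, 0
            self.bits = bytearray(m)
        def _positions(self, item):
            for i in range(self.k):
                h = hashlib.sha256(f'{i}:{item}'.encode()).digest()
                yield int.from_bytes(h[:8], 'big') % self.m
        def add(self, item):
            for p in self._positions(item):
                self.bits[p] = 1
            self.count += 1
        def __contains__(self, item):
            return all(self.bits[p] for p in self._positions(item))
        def clear(self):
            self.bits = bytearray(self.m)
            self.count = 0

    class DoubleBufferedBloom:
        def __init__(self, m, k, capacity):
            self.active = BloomFilter(m // 2, k)
            self.warmup = BloomFilter(m // 2, k)
            self.capacity = capacity
        def add(self, item):
            self.active.add(item)
            if self.active.count > self.capacity // 2:   # past fill threshold:
                self.warmup.add(item)                    # start warming the spare
            if self.active.count >= self.capacity:       # saturated: swap buffers
                self.active, self.warmup = self.warmup, self.active
                self.warmup.clear()
        def __contains__(self, item):
            return item in self.active

    bf = DoubleBufferedBloom(m=8192, k=4, capacity=100)
    for flow in range(250):
        bf.add(f'flow-{flow}')
    print('flow-249' in bf, 'flow-0' in bf)   # recent item kept, old one aged out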
Yi WAN Takuya ASAKA Tatsuro TAKAHASHI
User Generated Content (UGC) VoD services such as YouTube are becoming more and more popular, and their maintenance costs are growing as well. Many P2P solutions have been proposed to reduce the server load in such systems, but almost all of them take the single-video approach, which has only a limited effect in systems serving short videos such as UGC. The purpose of this paper is to investigate the potential of an alternative, the multi-video approach. Using a very simple method called collaborative caching, we show that methods based on the multi-video approach are generally more suitable for current UGC VoD systems. We also study the influence of the major design factors through simulations and provide guidelines for efficiently building systems with this method.
Nobuaki TOJO Nozomu TOGAWA Masao YANAGISAWA Tatsuo OHTSUKI
In an embedded system where a single application or a class of applications is repeatedly executed on a processor, the cache configuration can be customized so that an optimal one is achieved. An optimal cache configuration minimizing overall memory access time can be obtained by varying three cache parameters: the number of sets, the line size, and the associativity. In this paper, we first propose two cache simulation algorithms, CRCB1 and CRCB2, based on the Cache Inclusion Property. They realize exact cache simulation but dramatically decrease the number of cache hit/miss judgments. We further propose three cache design space exploration algorithms, CRMF1, CRMF2, and CRMF3, based on our experimental observations. They can find an almost optimal cache configuration from the viewpoint of access time. With our approach, the number of cache hit/miss judgments required to optimize cache configurations is reduced to 1/10-1/50 of that of conventional approaches. As a result, our proposed approach runs on average 3.2 times and at most 5.3 times faster than the fastest approach proposed so far, achieving the fastest cache design space exploration reported to date for optimizing total memory access time.
Ngo Anh VIEN Nguyen Hoang VIET SeungGwan LEE TaeChoong CHUNG
In this paper, we solve the call admission control (CAC) and routing problem in an integrated network that handles several classes of calls of different values and with different resource requirements. The problem of maximizing the average reward (or cost) of admitted calls per unit time is naturally formulated as a semi-Markov decision process (SMDP), but is too complex to allow for an exact solution. We therefore propose a policy gradient algorithm, together with a decomposition approach, to find a dynamic (state-dependent) optimal CAC and routing policy within a parameterized policy space. To implement this gradient algorithm, we approximate the gradient of the average reward and present a simulation-based algorithm, called the GSMDP algorithm, that estimates this approximate gradient using only a single sample path of the underlying Markov chain of the SMDP. The algorithm enhances performance in terms of convergence speed, rejection probability, robustness to changing arrival statistics, and overall received average revenue. Simulation experiments compare our method's performance with that of existing methods and demonstrate its robustness.
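[Editorial note] The GSMDP estimator itself is not reproduced in the abstract; the object such policy-gradient methods compute from a single sample path is a score-function (REINFORCE-style) estimate of the gradient of the reward with respect to the policy parameters. A toy single-link sketch under our own assumptions (a sigmoid admission policy, two call classes, and invented reward values, not the paper's model):

    # Score-function gradient estimate for a parameterized admission policy
    # pi_theta(admit | s) = sigmoid(theta . phi(s)), accumulated along sample
    # paths of a toy single-link CAC environment.
    import math, random

    CAPACITY = 10

    def step(load, admit, demand):
        """Admit a call or not; admitting into congestion is penalized."""
        if not admit:
            return load, 0.0
        if load + demand > CAPACITY:
            return load, -5.0            # congestion costs revenue
        return load + demand, 1.0        # each carried call earns unit revenue

    def features(load, demand):
        return [1.0, load / CAPACITY, demand]

    def policy_prob(theta, phi):
        s = sum(t * f for t, f in zip(theta, phi))
        s = max(min(s, 30.0), -30.0)     # clip logit to avoid overflow
        return 1.0 / (1.0 + math.exp(-s))

    def estimate_gradient(theta, paths=200, horizon=50):
        grad = [0.0] * len(theta)
        for _ in range(paths):
            load, ret, scores = 0, 0.0, []
            for _ in range(horizon):     # one truncated sample path
                demand = random.choice([1, 2])
                phi = features(load, demand)
                p = policy_prob(theta, phi)
                admit = random.random() < p
                coeff = (1 - p) if admit else -p   # d log pi / d theta = coeff*phi
                scores.append([coeff * f for f in phi])
                if random.random() < 0.3 and load > 0:
                    load -= 1            # a carried call departs
                load, r = step(load, admit, demand)
                ret += r
            for sc in scores:            # REINFORCE: score * path return
                for i in range(len(grad)):
                    grad[i] += sc[i] * ret / paths
        return grad

    theta = [0.0, 0.0, 0.0]
    for _ in range(20):                  # simple gradient ascent
        g = estimate_gradient(theta)
        theta = [t + 0.01 * gi for t, gi in zip(theta, g)]
    print('learned admission policy parameters:', theta)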
Takayuki FUJIWARA Eiji OKI Kohei SHIOMOTO
A call admission control (CAC) scheme based on statistical information, called the statistical CAC scheme, is proposed. A conventional scheme needs to manage session information for each link to update the residual bandwidth of the network in real time, and therefore has a scalability problem in terms of network size. The statistical CAC rejects session setup requests in accordance with a pre-computed ratio, called the rejection ratio. The rejection ratio is computed from statistical information about the bandwidth requested on each link so that the congestion probability is less than an upper bound specified by the network operator. The statistical CAC is more scalable in terms of network size than the conventional scheme because it does not need to keep per-session state information. Numerical results show that the statistical CAC, even without exact session state information, only slightly degrades network utilization compared with the conventional scheme.
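[Editorial note] The abstract does not give the formula for the rejection ratio. As an illustration of the idea, under a Gaussian approximation of a link's aggregate requested bandwidth (our modeling assumption, not necessarily the paper's), the smallest ratio keeping the congestion probability below the operator's bound can be pre-computed as follows:

    # Pre-compute a rejection ratio p per link so that
    # Pr[admitted bandwidth > capacity] <= epsilon, using a normal
    # approximation of the aggregate demand built from per-link statistics.
    import math

    def congestion_prob(mean, std, capacity, p):
        """Pr[(1-p) * demand > capacity] under N(mean, std^2) demand."""
        m, s = (1 - p) * mean, (1 - p) * std
        if s == 0:
            return float(m > capacity)
        z = (capacity - m) / s
        return 0.5 * math.erfc(z / math.sqrt(2))     # Gaussian upper tail

    def rejection_ratio(mean, std, capacity, epsilon, step=1e-4):
        p = 0.0
        while p < 1.0 and congestion_prob(mean, std, capacity, p) > epsilon:
            p += step                    # smallest p meeting the bound
        return p

    p = rejection_ratio(mean=95.0, std=12.0, capacity=100.0, epsilon=0.01)
    print(f'reject {p:.2%} of setup requests')
    # At run time, reject a request iff random.random() < p; no per-session state.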
Chul-Woong YANG Ki Yong LEE Yon Dohn CHUNG Myoung Ho KIM Yoon-Joon LEE
In this paper, we propose an effective Web cache admission control algorithm. By selectively admitting objects into the cache, the proposed algorithm can significantly reduce the amount of disk I/O on a Web cache while maintaining a high hit ratio. The proposed algorithm adaptively adjusts its own admission control parameter, requiring no user-supplied parameters. Through extensive experiments, we show the effectiveness of the proposed algorithm.
Gi-Ho PARK Jung-Wook PARK Hoi-Jin LEE Gunok JUNG Sung-Bae PARK Shin-Dug KIM
This paper presents a cache way enabling mechanism that uses branch target addresses. The mechanism uses branch prediction information to avoid the power consumed by unnecessary cache way accesses, enabling only the cache way(s) that actually need to be accessed. The proposed mechanism reduces the power consumption of the instruction cache by 63% without any performance degradation of the processor. An ARM1136 processor simulator and Synopsys PrimeTime are used to perform the performance/power simulation and the static timing analysis of the proposed mechanism, respectively.
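[Editorial note] The abstract keeps the mechanism at a high level; one plausible reading (ours, not necessarily the paper's exact design) is that the branch target buffer memoizes which way held the target's line, so a predicted branch enables a single way while unhinted fetches enable all of them:

    # Toy model of branch-target-based way enabling: the BTB memoizes the way
    # that held the branch target's line, so a predicted branch powers up one
    # way; without the hint, all WAYS ways are enabled (conventional access).
    WAYS, SETS, LINE = 4, 64, 32

    cache = [[None] * WAYS for _ in range(SETS)]     # tag per (set, way)
    btb_way = {}                                     # branch PC -> memoized way
    ways_enabled = 0

    def lookup(addr, hint=None):
        global ways_enabled
        s, tag = (addr // LINE) % SETS, addr // (LINE * SETS)
        candidates = [hint] if hint is not None else range(WAYS)
        ways_enabled += 1 if hint is not None else WAYS
        for w in candidates:
            if cache[s][w] == tag:
                return w                 # hit in an enabled way
        # (a real design would re-probe all ways on a hinted miss)
        w = hash(tag) % WAYS             # simplistic fill/replacement
        cache[s][w] = tag
        return w

    # Fetch loop: a branch at 0x100 repeatedly jumps to 0x400.
    for _ in range(100):
        way = lookup(0x400, hint=btb_way.get(0x100))
        btb_way[0x100] = way             # remember the target line's way
    print('ways enabled per access:', ways_enabled / 100)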
Jae-Ho CHOI Sang-Hyun PARK Myong-Soo LEE SangKeun LEE
With the growth of wireless computing and the popularity of the eXtensible Markup Language (XML), wireless XML data management is emerging as an important research area. In this paper, cache invalidation under XML updates is addressed in wireless computing environments. A family of XML cache invalidation strategies, called S-XIR, D-XIR, and E-XIR, is suggested. With S-XIR and D-XIR, the part of the XML data whose content is unchanged, and only whose structure changes, can be effectively reused in client caching. E-XIR, which uses prefetching, can further improve access time. Simulations are carried out to evaluate the proposed methodology; they show that the proposed strategies improve both tuning time and access time significantly. In particular, the proposed strategies are on average about 4 to 12 times better than the previous approach in terms of tuning time.
In recent years, heterogeneous devices have been employed frequently in mobile storage systems, because a combination of such devices can provide a synergistic storage solution by exploiting the advantages of each device. One important design constraint in heterogeneous storage systems is mitigating the I/O performance degradation stemming from the different access times of the devices, yet little work has been done on buffer cache management algorithms designed for this setting. This paper presents a novel buffer cache management algorithm that considers both the I/O cost of each device and workload patterns in mobile computing systems with a heterogeneous storage pair of a hard disk and a NAND flash memory. In order to minimize the total I/O cost under varying workload patterns, the proposed algorithm employs dynamic cache partitioning across the devices and manages each partition according to request patterns and I/O types along with temporal locality. Trace-driven simulations show that the proposed algorithm significantly reduces the total I/O cost and the flash write count compared with existing buffer cache algorithms on typical mobile traces.
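[Editorial note] The partitioning rule is not specified in the abstract; a common way to realize cost-aware dynamic partitioning is ghost-list resizing in the style of adaptive caches, with each ghost hit weighted by the miss penalty of the backing device. A sketch under those assumptions (the costs, sizes, and resizing rule are ours, not the paper's):

    # Cost-weighted dynamic partitioning of a buffer cache shared by two
    # devices: a ghost hit for device d means "more cache would have saved a
    # miss costing COST[d]", so that partition's target share grows.
    from collections import OrderedDict

    COST = {'disk': 10.0, 'flash': 1.0}      # relative read-miss penalties
    TOTAL = 8                                # total buffer frames

    class Partition:
        def __init__(self):
            self.cache = OrderedDict()       # resident blocks, LRU order
            self.ghost = OrderedDict()       # recently evicted block ids

    target = {'disk': TOTAL / 2, 'flash': TOTAL / 2}
    parts = {'disk': Partition(), 'flash': Partition()}

    def access(dev, block):
        p = parts[dev]
        if block in p.cache:
            p.cache.move_to_end(block)       # hit: refresh LRU position
            return 0.0
        if block in p.ghost:                 # ghost hit: partition too small
            other = 'flash' if dev == 'disk' else 'disk'
            shift = min(COST[dev] / COST[other], target[other] - 1)
            target[dev] += shift
            target[other] -= shift
            del p.ghost[block]
        p.cache[block] = True
        while sum(len(q.cache) for q in parts.values()) > TOTAL:
            victim = max(parts, key=lambda d: len(parts[d].cache) - target[d])
            vb, _ = parts[victim].cache.popitem(last=False)
            parts[victim].ghost[vb] = True   # remember eviction in ghost list
            while len(parts[victim].ghost) > TOTAL:
                parts[victim].ghost.popitem(last=False)
        return COST[dev]                     # miss: pay the device's I/O cost

    cost = sum(access('disk', b % 6) for b in range(30)) + \
           sum(access('flash', b % 12) for b in range(30))
    print('total I/O cost:', cost, 'targets:', target)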
In this letter, we propose a cache organization that substantially reduces the memory bandwidth of motion compensation (MC) in H.264/AVC decoders. To reduce duplicated memory accesses to P and B pictures, we employ a four-way set-associative cache whose index bits are composed of horizontal and vertical address bits of the frame buffer and whose lines each store 8×2 pixels of the reference frames. Moreover, we alleviate the data fragmentation problem by choosing a line size that equals the minimum access size of the DDR SDRAM. Averaged over five QCIF IBBP image sequences, the bandwidth of the optimized cache is only 129% of the essential bandwidth of H.264/AVC MC.
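[Editorial note] The 2-D indexing can be made concrete. Assuming three horizontal and three vertical block-address bits form a 64-set index over 8×2-pixel lines (the exact bit split is our assumption, not stated in the abstract), the set index and tag for a pixel (x, y) are computed as:

    # 2-D cache index for H.264/AVC motion compensation: each line holds an
    # 8x2 pixel block, and the set index combines low bits of the block's
    # horizontal and vertical coordinates so that the lines of a reference
    # area map to distinct sets.
    H_BITS, V_BITS = 3, 3                   # 8 x 8 = 64 sets (our assumption)

    def mc_cache_index(x, y):
        bx, by = x >> 3, y >> 1             # 8x2-pixel line granularity
        set_idx = ((by & ((1 << V_BITS) - 1)) << H_BITS) | (bx & ((1 << H_BITS) - 1))
        tag = (by >> V_BITS, bx >> H_BITS)
        return set_idx, tag

    # A 9x9 interpolation window for one 8x8 partition touches a 2x5 block
    # area; with 2-D indexing all ten lines land in distinct sets:
    sets = {mc_cache_index(x, y)[0] for x in (16, 24) for y in (8, 10, 12, 14, 16)}
    print(len(sets), 'distinct sets for 10 lines')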
Akihiro NAKAO Kengo SASAKI Shu YAMAMOTO
We observe that P2P traffic has peculiar characteristics compared with other types of traffic such as web browsing and file transfer. Since P2P applications exploit the swarm effect -- a multitude of end points downloading the same content piece by piece at nearly the same time, which increases the effectiveness of caching -- the same pieces of data end up traversing the network over and over again within a mostly short time window. In light of this observation, we propose network-layer packet-level caching for reducing the volume of emerging P2P traffic transparently to the P2P applications -- without affecting their operation at all -- rather than banning it, restricting it, or modifying the P2P systems themselves. Unlike other caching techniques, we aim to provide as generic a caching mechanism as possible at the network layer -- without relying on the details of P2P application protocols -- to extend applicability to arbitrary P2P protocols. Our preliminary evaluation shows that our approach can be expected to reduce a significant amount of P2P traffic transparently to P2P applications.
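[Editorial note] A generic packet-level cache of this kind (a sketch of the general technique, not the authors' exact design) can be built from fingerprint-indexed chunk substitution between two cooperating nodes: the upstream node replaces payload chunks it has already forwarded by short fingerprints, and the downstream node reinflates them from its own cache:

    # Protocol-agnostic packet-level caching: split payloads into fixed-size
    # chunks, fingerprint each chunk, and send a short fingerprint instead of
    # any chunk the downstream cache already holds.
    import hashlib

    CHUNK = 256

    def fingerprint(chunk):
        return hashlib.sha1(chunk).digest()[:8]

    def encode(payload, cache):
        out = []
        for i in range(0, len(payload), CHUNK):
            chunk = payload[i:i + CHUNK]
            fp = fingerprint(chunk)
            if fp in cache:
                out.append(('ref', fp))      # downstream can reconstruct this
            else:
                cache[fp] = chunk
                out.append(('raw', chunk))
        return out

    def decode(tokens, cache):
        payload = b''
        for kind, v in tokens:
            chunk = v if kind == 'raw' else cache[v]
            cache[fingerprint(chunk)] = chunk
            payload += chunk
        return payload

    up, down = {}, {}
    piece = bytes(range(256)) * 4            # a 1 KB piece shared by many peers
    first = decode(encode(piece, up), down)  # first transfer populates caches
    again = encode(piece, up)                # same piece pulled by another peer
    print(again[0][0], decode(again, down) == piece)   # 'ref' True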
Maziar GOUDARZI Tadayuki MATSUMURA Tohru ISHIHARA
The share of leakage in cache power consumption increases with technology scaling. Choosing a higher threshold voltage (Vth) and/or gate-oxide thickness (Tox) for cache transistors reduces leakage, but impacts cell delay. We show that, due to uncorrelated random within-die delay variation, only some (not all) of the cells actually violate the cache delay after this change. We propose to add a spare cache way to replace the delay-violating cache lines separately in each cache set. By SPICE and gate-level simulations in a commercial 90 nm process, we show that choosing higher Vth and Tox and adding one spare way to a 4-way 16 KB cache reduces leakage power by 42%, which, depending on the share of leakage in total cache power, gives up to 22.59% and 41.37% reductions of total energy in the L1 instruction cache and the L2 unified cache, respectively, with a negligible delay penalty and without sacrificing cache capacity or timing yield.
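[Editorial note] The repair scheme can be sketched as a per-set remapping: after a post-silicon delay test marks the ways whose cells violate the cycle time under the raised Vth/Tox, each set substitutes the spare way for at most one violating way. A toy model (the violation rate and sizes are invented for illustration):

    # Yield repair with one spare way per set: ways that violate the cycle
    # time after raising Vth/Tox are disabled, and the spare way stands in,
    # so most sets keep their full associativity despite slow cells.
    import random

    ASSOC, SETS = 4, 16
    random.seed(1)
    # A delay test marks violating (set, way) pairs; here drawn at random:
    violating = {(s, w) for s in range(SETS) for w in range(ASSOC)
                 if random.random() < 0.1}

    usable = []                              # per set: the ways actually used
    for s in range(SETS):
        ways = [w for w in range(ASSOC) if (s, w) not in violating]
        if len(ways) < ASSOC:
            ways.append(ASSOC)               # way index ASSOC is the spare way
        usable.append(ways[:ASSOC])

    affected = sum(1 for s in range(SETS)
                   if any((s, w) in violating for w in range(ASSOC)))
    full = sum(1 for u in usable if len(u) == ASSOC)
    print(f'{len(violating)} slow ways in {affected} sets;',
          f'{full}/{SETS} sets keep full associativity via the spare way')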
Soong Hyun SHIN Sung Woo CHUNG Eui-Young CHUNG Chu Shik JHON
As technology scales down, leakage energy accounts for a greater proportion of total energy. Applying the drowsy technique to a cache is regarded as one of the most efficient techniques for reducing leakage energy. However, it increases the soft error rate (SER), so many researchers doubt the reliability of the drowsy technique. In this paper, we give several reasons why an instruction cache can adopt the drowsy technique without reliability problems. First, an instruction cache always stores read-only data, so a soft error can be recovered by re-fetching the instructions from lower-level memory. Second, the effect on performance of the re-fetching caused by soft errors is negligible. Additionally, a considerable percentage of soft errors occur without harming performance. Lastly, unrecoverable soft errors can be controlled by the scrubbing method. Simulation results show that a drowsy instruction cache rarely increases the rate of unrecoverable errors and only negligibly degrades performance.
Hamid NOORI Maziar GOUDARZI Koji INOUE Kazuaki MURAKAMI
Energy consumption is a major concern in embedded computing systems. Several studies have shown that cache memories account for 40% or more of the total energy consumed in these systems. Active power used to be the primary contributor to the total power dissipation of CMOS designs, but with technology scaling, the share of leakage in the total power consumption of digital systems continues to grow. Moreover, temperature is another factor that exponentially increases the leakage current. In this paper, we show the effect of temperature on the optimal (minimum-energy-consuming) cache configuration for low-energy embedded systems. Our results show that for a given application and technology, the optimal cache size moves toward smaller caches at higher temperatures, due to the larger leakage. Consequently, a Temperature-Aware Configurable Cache (TACC) is an effective way to save energy in finer technologies when the embedded system is used at different temperatures. Our results show that using a TACC, up to 61% of the energy can be saved for the instruction cache and 77% for the data cache, compared to a configurable cache that has been configured only for the corner-case temperature (100°C). Furthermore, the TACC also enhances performance by up to 28% for the instruction cache and up to 17% for the data cache.
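[Editorial note] At its core, a TACC reduces to a table lookup: each temperature band maps to the minimum-energy cache configuration for that band, and the runtime reconfigures when the sensed band changes. A schematic control loop (the bands and configurations below are invented for illustration, not taken from the paper):

    # Temperature-Aware Configurable Cache (TACC) control loop: pick the
    # minimum-energy (sets, ways, line) configuration for the current
    # temperature band; hotter bands favor smaller caches (less leakage).
    TACC_TABLE = [                      # (upper bound degC, (sets, ways, line))
        (40,  (256, 4, 32)),            # cool: larger cache, misses dominate
        (70,  (128, 4, 32)),
        (100, (64,  2, 32)),            # hot: leakage pushes the optimum smaller
    ]

    current = None

    def on_temperature_sample(temp_c):
        global current
        for bound, config in TACC_TABLE:
            if temp_c <= bound:
                break
        if config != current:
            current = config
            print(f'{temp_c:5.1f} degC -> reconfigure to sets/ways/line = {config}')

    for t in (25.0, 38.0, 55.0, 82.0, 96.0):
        on_temperature_sample(t)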
Koji KOBATAKE Shigeaki TAGASHIRA Satoshi FUJITA
P2P DHT (Peer-to-Peer Distributed Hash Table) is a typical technique for managing shared resources distributed over a network and for keyword search over such networks in a fully distributed manner. In this paper, we propose a new method for supporting conjunctive queries in P2P DHT. The basic idea of the proposed technique is to share global information on past trials by locally caching search results for conjunctive queries and registering the fact in the global DHT. Such result caching is expected to significantly reduce the amount of transmitted data compared with conventional schemes. The effect of the proposed method is evaluated experimentally by simulation. The results indicate that, with the proposed method, the amount of returned data is reduced by 60% compared with a conventional P2P DHT that does not support conjunctive queries.
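[Editorial note] The result-caching idea can be sketched directly: the result of a conjunctive query is stored back into the DHT under a key derived from the keyword set, so a later identical query skips the posting-list transfer and intersection. A minimal sketch with a dictionary standing in for the DHT (the key construction is our assumption):

    # Conjunctive-query result caching over a DHT: the result of a multi-
    # keyword query is registered under a key derived from the keyword set,
    # so repeated queries avoid transferring and intersecting posting lists.
    import hashlib

    dht = {}                                           # stand-in for the global DHT

    def dht_key(*keywords):
        return hashlib.sha1('&'.join(sorted(keywords)).encode()).hexdigest()

    def publish(keyword, resources):
        dht.setdefault(dht_key(keyword), set()).update(resources)

    def conjunctive_search(*keywords):
        key = dht_key(*keywords)
        if key in dht:                                 # cached result of a past trial
            return dht[key], 'cache'
        lists = [dht.get(dht_key(k), set()) for k in keywords]
        result = set.intersection(*lists)              # transfer + intersect lists
        dht[key] = result                              # register result for reuse
        return result, 'intersection'

    publish('p2p', {'a.pdf', 'b.pdf', 'c.pdf'})
    publish('dht', {'b.pdf', 'c.pdf', 'd.pdf'})
    print(conjunctive_search('p2p', 'dht'))            # computed, then cached
    print(conjunctive_search('dht', 'p2p'))            # served from the cached entry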
Makoto SUGIHARA Tohru ISHIHARA Kazuaki MURAKAMI
This paper proposes a task scheduling approach for reliable cache architectures (RCAs) of multiprocessor systems. RCAs dynamically switch their operation modes to reduce the usage of vulnerable SRAMs under real-time constraints. We build a mixed integer programming model for minimizing vulnerability under real-time constraints. Experimental results show that our task scheduling approach achieves 47.7-99.9% less vulnerability than a conventional one.
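[Editorial note] The abstract does not reproduce the model; schematically, a vulnerability-minimizing scheduling MIP of this kind (our notation, not the paper's) chooses an RCA mode and a start time for each task:

    \begin{align}
      \min\ & \textstyle\sum_{i}\sum_{m} v_{i,m}\, x_{i,m}
        && \text{(total vulnerability)} \\
      \text{s.t.}\ & \textstyle\sum_{m} x_{i,m} = 1 && \forall i \\
      & s_i + \textstyle\sum_{m} e_{i,m}\, x_{i,m} \le d_i
        && \forall i \quad \text{(deadline)} \\
      & s_j \ge s_i + \textstyle\sum_{m} e_{i,m}\, x_{i,m}
        && \forall (i,j) \in P \quad \text{(precedence)} \\
      & x_{i,m} \in \{0,1\}, \quad s_i \ge 0,
    \end{align}

where $x_{i,m}=1$ iff task $i$ runs in RCA mode $m$, $e_{i,m}$ and $v_{i,m}$ are its execution time and vulnerability in that mode, $d_i$ is its deadline, and $P$ is the precedence relation among tasks.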
Sungwook KIM Myungwhan CHOI Sungchun KIM
New multimedia services over cellular/WLAN overlay networks require different Quality of Service (QoS) levels. Therefore, an efficient network management system is necessary to realize QoS-sensitive multimedia services while enhancing network performance. In this paper, we propose a new online network management framework for overlay networks. Our online approach to network management exhibits dynamic adaptability, flexibility, and responsiveness to the traffic conditions in multimedia networks. Simulation results indicate that our proposed framework strikes an appropriate balance between performance criteria under widely varying traffic loads.
Myung Ho YEO Young Soo MIN Kyoung Soo BOK Jae Soo YOO
In this paper, a novel cache-conscious indexing technique based on space partitioning trees is proposed. Recently, many researchers have investigated efficient cache-conscious indexing techniques that improve the retrieval performance of in-memory database management systems. However, most studies considered data partitioning and targeted fast information retrieval only. Existing data-partitioning-based index structures significantly degrade performance due to redundant accesses of overlapped spaces. In particular, R-tree-based index structures suffer from the propagation of MBR (Minimum Bounding Rectangle) updates when data are updated frequently. In this paper, we propose an in-memory space partitioning index structure for optimal cache utilization. The proposed index structure is compared with existing index structures in terms of update performance, insertion performance, and cache utilization rate in a variety of environments. The results demonstrate that the proposed index structure offers better performance than existing index structures.