In Information-Centric Networking (ICN), various routing and caching schemes have been proposed to efficiently utilize in-network caches and reduce network traffic. Most of them assume that the popularity distribution of user-requested content is homogeneous. However, the popularity distributions measured on the Internet are reported to exhibit spatial and temporal locality, which can heavily affect caching performance in ICN. Breadcrumbs (BC) routing is a key solution to mitigate the performance degradation caused by spatial locality because it can flexibly discover cached content located off the delivery path. In this paper, we investigate the spatial effects of BC in depth by revealing where the utilized cached content is located, how BC discovers this content, what kind of content is found, and how BC fills in the locality gap of content popularity. We also focus on the time dimension, i.e., the temporal locality of content popularity, and conduct a comprehensive study of how BC routing can be adapted to the spatiotemporal locality of content popularity in ICN.
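To make the off-path discovery idea concrete, the following is a minimal sketch of a Breadcrumbs-style router, assuming a simplified model in which each node leaves a per-content pointer ("breadcrumb") toward the neighbor it last forwarded that content to, and later requests may be deflected along that pointer instead of following the shortest path to the origin. The class and parameter names are illustrative, not taken from the paper.

```python
import time

class BreadcrumbRouter:
    """Toy Breadcrumbs (BC) router: forwarding a content item leaves a
    breadcrumb pointing to the neighbor it was sent to, so later requests
    can be deflected off the shortest path toward a nearby cached copy."""

    def __init__(self, name, bc_ttl=60.0):
        self.name = name
        self.cache = set()                  # content names cached locally
        self.breadcrumbs = {}               # content name -> (next_hop, timestamp)
        self.bc_ttl = bc_ttl

    def on_data_forwarded(self, content, next_hop):
        # Leave a breadcrumb each time a content item passes through.
        self.breadcrumbs[content] = (next_hop, time.time())

    def route_interest(self, content, default_next_hop):
        """Return the node to forward a request to: follow a fresh breadcrumb
        trail if one exists, otherwise use the default (shortest-path) hop."""
        if content in self.cache:
            return self.name                # served locally
        crumb = self.breadcrumbs.get(content)
        if crumb is not None:
            next_hop, stamp = crumb
            if time.time() - stamp <= self.bc_ttl:
                return next_hop             # deflect off-path toward a cache
            del self.breadcrumbs[content]   # stale breadcrumb, discard it
        return default_next_hop
```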
Mikiya YOSHIDA Yusuke ITO Yurino SATO Hiroyuki KOGA
Information-centric networking (ICN) provides low-latency content delivery with in-network caching, but delivery latency depends on the cache distance from consumers. To reduce delivery latency, a scheme has been proposed that clusters domains and retains mainly popular content in each cluster according to a cache distribution range, enabling consumers to retrieve content from neighboring clusters/caches. However, when the distribution of content popularity changes, content caches may no longer be distributed adequately within a cluster, so consumers cannot retrieve them from nearby caches. We therefore propose a dynamic clustering scheme that adjusts the cache distribution range in accordance with changes in content popularity and evaluate its effectiveness through simulation.
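As a rough illustration of adjusting a cache distribution range to popularity changes, the sketch below assigns each content a distribution range (in hops) from its current request share and re-computes the ranges when the measured popularity drifts from the one used at the last assignment. The thresholds and the mapping from popularity to range are assumptions for illustration, not the proposed scheme itself.

```python
def assign_ranges(popularity, max_range=3):
    """Map each content's request share to a cache distribution range in hops:
    more popular content is spread over a wider range of the cluster."""
    total = sum(popularity.values()) or 1
    ranges = {}
    for content, count in popularity.items():
        share = count / total
        if share >= 0.10:
            ranges[content] = max_range      # distribute cluster-wide
        elif share >= 0.01:
            ranges[content] = 2
        else:
            ranges[content] = 1              # keep near the producer only
    return ranges

def popularity_drift(old, new):
    """L1 distance between two normalized popularity distributions."""
    keys = set(old) | set(new)
    so, sn = sum(old.values()) or 1, sum(new.values()) or 1
    return sum(abs(old.get(k, 0) / so - new.get(k, 0) / sn) for k in keys)

# Re-cluster only when popularity has drifted enough since the last assignment.
last_popularity = {"a": 50, "b": 30, "c": 20}
ranges = assign_ranges(last_popularity)
current_popularity = {"a": 10, "b": 60, "c": 30}
if popularity_drift(last_popularity, current_popularity) > 0.3:
    ranges = assign_ranges(current_popularity)
    last_popularity = current_popularity
```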
Zhaolin MA Jiali YOU Haojiang DENG
Due to the increase in data volume and intensified concurrent requests, distributed caching is commonly used to handle high-concurrency requests and alleviate pressure on databases. However, there is limited research on distributed caching of record mappings, and traditional caching algorithms perform suboptimally when resolving mapping records, which typically follow a long-tail distribution. To address this issue, we propose a recommendation-based adaptive auxiliary caching method, AC-REC, which delivers the primary cache record along with a list of additional cache records. The method uses request correlations as the basis for recommendations, customizes the number of additional cache entries provided, and dynamically adjusts the time-to-live. We conducted evaluations to compare the performance of our method against various benchmark strategies. The results show that our method increases the cache hit ratio by an average of 20% compared to the conventional LCE method, and this improvement is achieved while using the cache space effectively. We believe that our strategy will provide an effective solution for related studies both in traditional network architectures and in caching paradigms like ICN.
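A minimal sketch of the recommendation-based auxiliary caching idea described above: on each request, the resolver returns the primary mapping record plus a bounded list of correlated records, and each inserted entry gets a TTL scaled by how often it has been requested. The correlation store, the list-length rule, and the TTL formula below are illustrative assumptions rather than the AC-REC design.

```python
import time
from collections import defaultdict

class AuxiliaryCache:
    """Toy recommendation-based cache: a lookup of one record also pre-loads
    records that are frequently requested together with it."""

    def __init__(self, base_ttl=30.0, max_extra=4):
        self.store = {}                      # key -> (value, expire_at)
        self.correlation = defaultdict(set)  # key -> keys requested close in time
        self.hits = defaultdict(int)
        self.base_ttl = base_ttl
        self.max_extra = max_extra

    def relate(self, key_a, key_b):
        # Record that two keys are often requested together (e.g., from logs).
        self.correlation[key_a].add(key_b)
        self.correlation[key_b].add(key_a)

    def _ttl(self, key):
        # More frequently requested records live longer (capped at 4x base TTL).
        return self.base_ttl * min(1 + self.hits[key] / 10, 4)

    def put(self, key, value):
        self.store[key] = (value, time.time() + self._ttl(key))

    def resolve(self, key, backend):
        """Return the record for `key`; on a miss, fetch it from `backend`
        together with up to `max_extra` correlated (recommended) records."""
        self.hits[key] += 1
        entry = self.store.get(key)
        if entry and entry[1] > time.time():
            return entry[0]
        value = backend(key)
        self.put(key, value)
        extras = sorted(self.correlation[key], key=lambda k: -self.hits[k])
        for extra in extras[: self.max_extra]:
            self.put(extra, backend(extra))   # pre-load recommended records
        return value
```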
Ken NAKAMURA Yuya OMORI Daisuke KOBAYASHI Koyo NITTA Kimikazu SANO Masayuki SATO Hiroe IWASAKI Hiroaki KOBAYASHI
This paper proposes an efficient reference image sharing method for image-division parallel video encoding architectures. The method reduces the amount of data transfer by combining pre-transfer with area prediction and on-demand transfer with a transfer management table. Experimental results show that data transfer can be reduced to 19.8-35.3% of the conventional method on average without major degradation of coding performance, which in turn reduces the required bandwidth of the inter-chip transfer interface.
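The sketch below illustrates, in a block-grid abstraction, how pre-transfer with area prediction and on-demand transfer with a transfer management table can work together. The prediction rule (re-using a predicted search area expanded by a margin) and the table layout are assumptions for illustration, not the paper's exact design.

```python
class ReferenceSharer:
    """Toy model of reference-image sharing between image-division encoders:
    blocks inside the predicted search area are pre-transferred, anything
    else is fetched on demand, and a transfer table avoids duplicate sends."""

    def __init__(self):
        self.transferred = set()   # transfer management table: (frame, bx, by)
        self.bytes_sent = 0

    def _send(self, frame, bx, by, block_bytes=4096):
        if (frame, bx, by) not in self.transferred:
            self.transferred.add((frame, bx, by))
            self.bytes_sent += block_bytes

    def pre_transfer(self, frame, predicted_area, margin=1):
        """Pre-transfer the predicted reference area (e.g., the previous
        frame's motion-vector footprint) expanded by a safety margin."""
        (x0, y0), (x1, y1) = predicted_area
        for bx in range(x0 - margin, x1 + margin + 1):
            for by in range(y0 - margin, y1 + margin + 1):
                self._send(frame, bx, by)

    def fetch(self, frame, bx, by):
        """On-demand transfer for reference blocks the prediction missed."""
        if (frame, bx, by) not in self.transferred:
            self._send(frame, bx, by)
        return (frame, bx, by)

# Example: pre-transfer a 3x3 predicted area, then one out-of-area block.
sharer = ReferenceSharer()
sharer.pre_transfer(frame=7, predicted_area=((4, 4), (6, 6)))
sharer.fetch(frame=7, bx=10, by=2)
```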
The ARM TrustZone architecture, which provides hardware-assisted isolation, is widely adopted in mobile and IoT devices. The security of ARM TrustZone relies on splitting system-on-chip hardware and software into two worlds, namely the normal world and the secure world. There are legitimate hardware-level channels that the normal world and the secure world can use to communicate with each other. To protect these channels from being abused, research efforts have been invested in restricting access to them from normal-world components, so that only predefined and legitimate normal-world components can use cross-world communication channels. In this work, we present a study on data covert channels that can bypass such protection mechanisms and smuggle sensitive information. We first analyze the causes of noise in the covert channel between the two worlds. Then, we evaluate the accuracy and bandwidth of covert channels built with our PRIME+COUNT method against one built with the PRIME+PROBE method. Our results demonstrate that PRIME+COUNT is an effective technique for enabling cross-world covert channels in ARM TrustZone.
The shared last-level cache (SLLC) in tiled chip multiprocessors (TCMP) provides a low off-chip miss rate but incurs a long on-chip access latency. In a two-level cache hierarchy, data replication stores replicas of L1 victims in the local LLC (L2 cache) to obtain a short local LLC access latency on subsequent accesses. Many data replication mechanisms have been proposed, but they do not consider both L1 victim reuse behavior and LLC replica reception capability; they either produce many useless replicas or increase LLC pressure, which limits the improvement in system performance. In this paper, we propose a two-level cache-aware adaptive data replication mechanism (TCDR), which controls replication based on both prediction of L1 victim reuse behavior and monitoring of LLC replica reception capability. TCDR not only increases the accuracy of L1 replica selection but also avoids the pressure of replication on the LLC. The results show that TCDR improves system performance with reasonable hardware overhead.
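A minimal sketch of a replication decision in the spirit of TCDR: an L1 victim is replicated into the local LLC slice only if a reuse predictor expects it to be re-referenced and an LLC pressure monitor indicates capacity headroom. The predictor (a saturating counter per address signature) and the occupancy-based pressure metric are assumptions, not the paper's exact hardware design.

```python
from collections import defaultdict

class ReplicationController:
    """Toy TCDR-style controller combining L1-victim reuse prediction with
    LLC reception-capability (pressure) monitoring."""

    def __init__(self, pressure_limit=0.9):
        self.reuse_counter = defaultdict(int)  # signature -> saturating counter
        self.pressure_limit = pressure_limit

    def train(self, signature, reused):
        # Strengthen the prediction when a past victim was actually reused.
        c = self.reuse_counter[signature]
        self.reuse_counter[signature] = min(c + 1, 3) if reused else max(c - 1, 0)

    def should_replicate(self, signature, llc_occupancy, llc_capacity):
        predicted_reuse = self.reuse_counter[signature] >= 2
        has_headroom = llc_occupancy / llc_capacity < self.pressure_limit
        return predicted_reuse and has_headroom

# Example: train on observed reuse, then decide for a new L1 victim.
ctrl = ReplicationController()
ctrl.train(signature=0x1A, reused=True)
ctrl.train(signature=0x1A, reused=True)
print(ctrl.should_replicate(0x1A, llc_occupancy=700, llc_capacity=1024))  # True
```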
Jiaheng LIU Ryusuke EGAWA Hiroyuki TAKIZAWA
As the number of cores on a processor increases, cache hierarchies contain more cache levels and a larger last-level cache (LLC). Thus, the power and energy consumption of the cache hierarchy becomes non-negligible. Meanwhile, because the cache usage behaviors of individual applications differ, higher energy efficiency can be achieved by determining the appropriate cache configuration for each application. This paper proposes a cache control mechanism that improves energy efficiency by adjusting the cache hierarchy to each application. Our mechanism first bypasses and disables a less-significant cache level, then partially disables the LLC, and finally adjusts the associativity if the application suffers from a large number of conflict misses. The mechanism achieves significant energy savings at the cost of a small performance degradation. The evaluation results show that our mechanism improves energy efficiency by 23.9% and 7.0% on average over the baseline and the cache-level bypassing mechanism, respectively. In addition, even when LLC resource contention occurs, the proposed mechanism remains effective in improving energy efficiency.
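The staged decision described above (bypass a less-significant level, then partially disable the LLC, then adjust associativity) can be summarized as in the sketch below; the profiling counters, thresholds, and default geometry are illustrative assumptions, not the mechanism's actual control logic.

```python
def choose_cache_config(stats, llc_ways=16):
    """Derive a per-application cache configuration from simple profile
    counters: hit ratios per level and an estimate of conflict misses."""
    config = {"bypass_l2": False,
              "enabled_llc_ways": llc_ways,
              "associativity": llc_ways}

    # 1) Bypass/disable a level that barely hits (its energy is wasted).
    if stats["l2_hit_ratio"] < 0.05:
        config["bypass_l2"] = True

    # 2) Partially disable the LLC if the working set leaves it underused.
    if stats["llc_occupancy_ratio"] < 0.5:
        config["enabled_llc_ways"] = max(llc_ways // 2, 2)

    # 3) Keep associativity high only if conflict misses dominate.
    if stats["conflict_miss_ratio"] < 0.1:
        config["associativity"] = max(config["enabled_llc_ways"] // 2, 2)

    return config

# Example profile of a streaming-like application with little cache reuse.
print(choose_cache_config({
    "l2_hit_ratio": 0.02,
    "llc_occupancy_ratio": 0.3,
    "conflict_miss_ratio": 0.04,
}))
```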
Saifeng HOU Yuxiang HU Le TIAN Zhiguang DANG
This work proposes NFD.P4, a cache implementation scheme for Named Data Networking (NDN), to solve the problem of insufficient cache space in programmable switches and to enable the practical application of NDN. We transplant the cache function of NDN.P4 to an NDN Forwarding Daemon (NFD) cache server, which replaces the limited memory space of the programmable switch.
Hiroki OKADA Masato YOSHIMI Celimuge WU Tsutomu YOSHINAGA
In this study, we propose a mechanism called adaptive failsoft control to address peak traffic in mobile live streaming using a chasing playback function. Although a cache system is available in base stations and device-to-device communication to support chasing playback for live streaming, request concentration caused by highlight scenes increases the traffic load owing to data unavailability. To avoid data unavailability, we exploit two features of live streaming: (1) streaming data while switching the video quality, and (2) the time variability of the number of requests. The second feature enables a fallback mechanism for the cache system that prioritizes cache eviction and terminates the transfer of cache-missed requests. This paper discusses simulation results of the proposed mechanism, which adopts a request model appropriate for (a) avoiding peak traffic and (b) maintaining continuity of service.
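The fallback behaviour sketched above, prioritized cache eviction and terminating transfers that miss the cache when the request rate peaks, can be illustrated roughly as below, assuming a per-quality segment cache in which lower-quality variants are preferred under load. The overload threshold and the eviction order are assumptions, not the paper's parameters.

```python
class FailsoftCache:
    """Toy failsoft cache for chasing playback: under overload, evict
    high-bitrate segments first and refuse (terminate) cache-missed
    requests instead of forwarding them upstream."""

    def __init__(self, capacity=100, overload_rps=500):
        self.capacity = capacity
        self.overload_rps = overload_rps
        self.segments = {}   # (segment_id, quality_level) -> size

    def overloaded(self, current_rps):
        return current_rps > self.overload_rps

    def admit(self, segment_id, quality, size, current_rps):
        # Evict the highest-quality (largest) cached segments first.
        while sum(self.segments.values()) + size > self.capacity and self.segments:
            victim = max(self.segments, key=lambda k: k[1])
            del self.segments[victim]
        # Under overload, only the lowest-quality variant is admitted.
        if not self.overloaded(current_rps) or quality == 0:
            self.segments[(segment_id, quality)] = size

    def serve(self, segment_id, quality, current_rps):
        if (segment_id, quality) in self.segments:
            return "HIT"
        # Under overload, terminate cache-missed requests to protect the link.
        return "REJECT" if self.overloaded(current_rps) else "FORWARD"
```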
Kouki OZAWA Takahiro HIROFUCHI Ryousei TAKANO Midori SUGAYA
With the development of IoT devices and sensors, edge computing is enabling new services such as autonomous cars and smart cities. Low-latency data access is an essential requirement for such services, and a large-capacity cache server is needed on the edge side. However, it is not realistic to build a large-capacity cache server using only DRAM, because DRAM is expensive and consumes substantial power. A hybrid main memory system, in which main memory consists of DRAM and non-volatile memory, is promising to address this issue: it achieves a large main memory capacity within the power supply capabilities of current servers. In this paper, we propose Fogcached, an extension of a widely used KVS (Key-Value Store) server program (i.e., Memcached) that exploits both DRAM and non-volatile main memory (NVMM). We used Intel Optane DCPM as the NVMM in its prototype. Fogcached implements a Dual-LRU (Least Recently Used) mechanism that seamlessly extends the memory management of Memcached to hybrid main memory. Fogcached reuses the segmented LRU of Memcached to manage cached objects in DRAM, adds another segmented LRU for those in DCPM, and bridges the two LRUs with a mechanism that automatically replaces cached objects between DRAM and DCPM. Cached objects are autonomously moved between the two memory devices according to their access frequencies. Through experiments, we confirmed that Fogcached improved the peak value of the latency distribution by about 40% compared to Memcached.
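A minimal sketch of the Dual-LRU idea described above, assuming two ordered maps standing in for the DRAM-side and DCPM-side LRUs: new and hot objects live in the DRAM tier, cold DRAM objects are demoted to the DCPM tier, and frequently accessed DCPM objects are promoted back. The promotion threshold and the use of plain LRUs instead of segmented LRUs are simplifications, not Fogcached's implementation.

```python
from collections import OrderedDict

class DualLRU:
    """Toy two-tier cache: a small, fast DRAM tier and a larger DCPM tier,
    with objects moved between them according to access frequency."""

    def __init__(self, dram_items=2, dcpm_items=8, promote_after=2):
        self.dram, self.dcpm = OrderedDict(), OrderedDict()
        self.dram_items, self.dcpm_items = dram_items, dcpm_items
        self.promote_after = promote_after
        self.freq = {}

    def _evict(self, tier, limit, spill_to=None):
        while len(tier) > limit:
            key, value = tier.popitem(last=False)   # LRU end
            if spill_to is not None:
                spill_to[key] = value               # demote instead of dropping

    def put(self, key, value):
        self.dram[key] = value                      # new objects start in DRAM
        self.freq[key] = 0
        self._evict(self.dram, self.dram_items, spill_to=self.dcpm)
        self._evict(self.dcpm, self.dcpm_items)     # drop the coldest overall

    def get(self, key):
        if key in self.dram:
            self.dram.move_to_end(key)
            self.freq[key] += 1
            return self.dram[key]
        if key in self.dcpm:
            self.dcpm.move_to_end(key)
            self.freq[key] += 1
            value = self.dcpm[key]
            if self.freq[key] >= self.promote_after:
                # Promote hot DCPM objects; demote DRAM overflow back to DCPM.
                self.dram[key] = self.dcpm.pop(key)
                self._evict(self.dram, self.dram_items, spill_to=self.dcpm)
                self._evict(self.dcpm, self.dcpm_items)
            return value
        return None
```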
Koki HIGASHI Yoichi ISHIWATA Takeshi OHKAWA Midori SUGAYA
Recently, edge servers, which are located closer to clients than the cloud, have come to be expected to process the large amount of sensor data generated by IoT devices such as robots. Applying KVS (Key-Value Store) to the edge as a cache server has been proposed as a method for obtaining high responsiveness. In particular, a hybrid-KVS server that uses both DRAM and NVMM (Non-Volatile Main Memory) devices is expected to achieve both responsiveness and reliability. However, its effectiveness has not been verified in actual applications, and its relationship with the cloud remains unclear. The purpose of this study is to evaluate the effectiveness of hybrid-KVS servers using SLAM (Simultaneous Localization and Mapping), an application widely used in robots and autonomous driving that is well suited to edge servers and requires both responsiveness and reliability. SLAM is generally implemented on ROS (Robot Operating System) middleware and communicates with the server through it. However, when a hybrid-KVS is used at the edge with SLAM and ROS, communication cannot be achieved because the message objects differ from the format expected by the KVS. Therefore, in this research, we propose a mechanism to store ROS memory objects in a hybrid-KVS by designing and implementing a data serialization function that extends ROS. Through the evaluation of the proposed fogcached-ros, we confirm its effectiveness in terms of low API overhead, support for the data used by SLAM, and a small latency difference between the edge and the cloud.
In the cloud radio access network (C-RAN) architecture, the Hybrid Automatic Repeat Request (HARQ) protocol imposes a strict limit on the latency between the baseband unit (BBU) pool and the remote radio head (RRH), which is a key challenge in the adoption of C-RANs. In this letter, we propose a joint edge caching and network coding strategy (ENC) for C-RANs with multicast fronthaul to improve the performance of HARQ and thus achieve ultra-low latency in 5G cellular systems. We formulate the edge caching design as an optimization problem that maximizes caching utility in order to obtain the optimal caching time. Then, for real-time data flows with different latency constraints, we propose a scheduling policy based on network coding groups (NCG) to maximize coding opportunities and thus improve the overall latency performance of multicast fronthaul transmission. We evaluate the performance of ENC through simulation experiments based on NS-3. Numerical results show that ENC can efficiently reduce the delivery delay.
Rei NAKAGAWA Satoshi OHZAHATA Ryo YAMAMOTO Toshihiko KATO
Recently, information-centric networking (ICN) has attracted attention because delivering cached content from routers' cache storage improves quality of service (QoS) by reducing redundant traffic, and adaptive video streaming has been applied to ICN to improve clients' quality of experience (QoE). However, in previous cache control approaches, a router implicitly caches the content requested by one user for other users who may request the same content later. As a result, these approaches cannot use the cache effectively to improve QoE because the cached content is not always requested by other users. In addition, since previous cache control does not consider the network congestion state, the adaptive bitrate (ABR) algorithm works incorrectly and causes congestion, and QoE degrades unnecessarily. In this paper, we propose explicit cache placement notification for congestion-aware adaptive video streaming over ICN (CASwECPN) to mitigate congestion. CASwECPN provides explicit feedback based on congestion detection at routers on the communication path. While congestion is detected, the router caches the requested content in its cache storage and explicitly notifies the client that the requested content has been cached (explicit cache placement and notification) to mitigate congestion quickly. The client then retrieves the explicitly cached content from the congested router according to the general procedures of ICN. Simulation experiments show that CASwECPN improves both QoS and clients' QoE in adaptive video streaming that adjusts the bitrate for every video segment download. As a result, CASwECPN uses routers' cache storage more effectively than conventional cache control policies.
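A rough sketch of the explicit cache placement and notification behaviour described above: a router on the path caches a segment only while it detects congestion on its outgoing link, and marks the returned Data so the client knows an explicitly placed copy exists. The congestion test (queue occupancy against a threshold) and the notification field are illustrative assumptions, not the CASwECPN packet format.

```python
class CASRouter:
    """Toy router for congestion-aware explicit cache placement: content is
    cached only when the outgoing link is congested, and the Data packet
    carries an explicit 'cached here' notification back to the client."""

    def __init__(self, name, queue_capacity=100, threshold=0.8):
        self.name = name
        self.cache = {}
        self.queue_capacity = queue_capacity
        self.threshold = threshold

    def congested(self, queue_len):
        return queue_len / self.queue_capacity >= self.threshold

    def on_data(self, content_name, payload, queue_len):
        """Process a returning Data packet; returns (payload, notification)."""
        if content_name in self.cache:
            return self.cache[content_name], {"cached_at": self.name}
        if self.congested(queue_len):
            self.cache[content_name] = payload   # explicit cache placement
            return payload, {"cached_at": self.name}
        return payload, {}                       # implicit behaviour: no notification

# A client that receives {"cached_at": "r2"} can direct subsequent Interests
# for the same content toward router r2 instead of the origin server.
```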
Chaoran ZHOU Jianping ZHAO Tai MA Xin ZHOU
In Internet applications, when users search for information, search engines invariably return some invalid webpages that contain no valid information. These invalid webpages interfere with users' access to useful information, reduce the efficiency of information queries, and occupy Internet resources. Accurate and fast filtering of invalid webpages can purify the Internet environment and provide convenience for users. This paper proposes an invalid webpage filtering model (HAIF) based on deep learning and a hierarchical attention mechanism. HAIF improves the semantic and sequence information representation of webpage text by concatenating lexical-level embeddings and paragraph-level embeddings. It introduces a hierarchical attention mechanism to optimize the extraction of text sequence features and webpage tag features. The local-level attention layer optimizes the local information in the plain text; by concatenating the input embeddings with the feature matrix produced by local-level attention, it enriches the information representation. The tag-level attention layer incorporates webpage structural features into the attention calculation over different HTML tags, making HAIF better suited to the Internet resource field. To evaluate the effectiveness of HAIF in filtering invalid pages, we conducted various experiments. Experimental results demonstrate that, compared with other baseline models, HAIF improves on all evaluation criteria to varying degrees.
Jianli CAO Zhikui CHEN Yuxin WANG He GUO Pengcheng WANG
Like many processors, GPGPUs suffer from the memory wall. The traditional solutions are to use efficient schedulers to hide long memory access latency or to use data prefetch mechanisms to reduce the latency caused by data transfer. In this paper, we study the instruction fetch stage of the GPU pipeline and analyze the relationship between the size of a GPU kernel and its instruction miss rate. We adapt the next-line prefetch mechanism to the SIMT model of the GPU and determine the optimal parameters of the prefetch mechanism through experiments. The experimental results show that the prefetch mechanism achieves a 12.17% performance improvement on average. Compared with enlarging the I-cache, the prefetch mechanism benefits more applications at a lower cost.
Tomohiro KORIKAWA Akio KAWABATA Fujun HE Eiji OKI
The performance of packet processing applications depends on the memory access speed of network systems. Table lookup requires fast memory access and is one of the most common operations in various packet processing applications, where it can be a dominant performance bottleneck. Therefore, in Network Function Virtualization (NFV)-aware environments, the on-chip fast cache memories of a general-purpose CPU become critical to achieving packet processing speeds of over tens of Gbps. In addition, multiple types of applications and complex applications are executed simultaneously in carrier network systems, which also require adequate cache capacities. In this paper, we propose a packet processing architecture that utilizes interleaved 3-Dimensional (3D)-stacked Dynamic Random Access Memory (DRAM) devices as an off-chip Last Level Cache (LLC) in addition to several levels of dedicated cache memories in each CPU core. Entries of a lookup table are distributed over every bank and vault to exploit both bank interleaving and vault-level memory parallelism. Frequently accessed entries in the 3D-stacked DRAM are also cached in the on-chip dedicated cache memories of each CPU core. The evaluation results show that the proposed architecture reduces the memory access latency by 57% and increases the throughput by 100%, while reducing the blocking probability by about 10%, compared to the architecture with a shared on-chip LLC. These results indicate that 3D-stacked DRAM is practical as an off-chip LLC in parallel packet processing systems.
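A minimal sketch of how lookup-table entries might be spread over the vaults and banks of a 3D-stacked DRAM so that both vault-level parallelism and bank interleaving are exercised. The hash-based placement and the device geometry (16 vaults x 8 banks) are assumptions for illustration, not the evaluated configuration.

```python
import hashlib

NUM_VAULTS = 16   # assumed device geometry
NUM_BANKS = 8

def place_entry(key: str):
    """Map a lookup-table key to a (vault, bank) pair so that consecutive and
    unrelated keys spread across vaults (parallelism) and banks (interleaving)."""
    digest = hashlib.sha1(key.encode()).digest()
    vault = digest[0] % NUM_VAULTS
    bank = digest[1] % NUM_BANKS
    return vault, bank

# Example: prefix-like keys land on different vault/bank pairs, so
# independent lookups can proceed in parallel inside the stacked DRAM.
for prefix in ("10.0.0.0/8", "192.168.1.0/24", "172.16.0.0/12"):
    print(prefix, "->", place_entry(prefix))
```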
Hayato YAMAKI Hiroaki NISHI Shinobu MIWA Hiroki HONDA
We propose a technique to reduce compulsory misses in the packet processing cache (PPC), which largely affects both the throughput and energy consumption of core routers. Rather than prefetching data, our technique, called response prediction cache (RPC), speculatively stores predicted data in the PPC without additional accesses to the low-throughput and power-hungry memory (i.e., TCAM). RPC predicts the data related to a response flow at the arrival of the corresponding request flow, based on the request-response model of Internet communications. Our experimental results with 11 real-network traces show that RPC reduces the PPC miss rate by 13.4% upstream and 47.6% downstream on average, assuming a three-layer PPC. Moreover, we extend RPC to adaptive RPC (A-RPC), which selects whether to use RPC in each direction within a core router for a further reduction in PPC misses. Finally, we show that A-RPC achieves 1.38x table-lookup throughput with 74% of the energy consumption per packet compared to the conventional PPC.
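The core of the response prediction idea can be illustrated as below: when a request flow's lookup result is installed in the packet processing cache, an entry for the expected response flow (the same 5-tuple with source and destination swapped) is speculatively installed as well, without touching the TCAM. The flow-key representation and the cached fields are assumptions for illustration.

```python
from typing import NamedTuple

class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str

def reverse(key: FlowKey) -> FlowKey:
    """The response flow of a request flow: endpoints swapped."""
    return FlowKey(key.dst_ip, key.src_ip, key.dst_port, key.src_port, key.proto)

ppc = {}  # packet processing cache: FlowKey -> lookup result

def install_request(key: FlowKey, result: dict):
    """Install the request flow's result and speculatively predict the
    response flow's entry, avoiding a compulsory miss on its arrival."""
    ppc[key] = result
    predicted = dict(result, direction="downstream")   # adjust direction-specific fields
    ppc.setdefault(reverse(key), predicted)

# Example: a request toward a server pre-populates the response-flow entry.
req = FlowKey("10.0.0.5", "93.184.216.34", 51514, 443, "tcp")
install_request(req, {"action": "forward", "direction": "upstream"})
print(reverse(req) in ppc)   # True: the response flow will hit the PPC
```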
Makoto TAKITA Masanori HIROTOMO Masakatu MORII
The network load is increasing due to the spread of content distribution services. Caching is recognized as a technique to reduce the peak network load by storing popular content in users' memories. Coded caching is a new caching approach based on a carefully designed content placement that creates coded multicasting opportunities. Coded caching schemes in single-rate networks are evaluated by the tradeoff between the memory size and the amount of delivered data. When considering networks with multiple transmission rates, how to operate multicast is crucial: in multicast delivery, a sender must communicate with the intended receivers at a rate available to all of them. Multicast scheduling methods that determine the delivery rates are evaluated by throughput and delay in multi-rate wireless networks. In this paper, we discuss coded caching in multi-rate wireless networks. We newly define a measure for evaluating coded caching schemes, called coded caching delay, and propose a new coded caching scheme. We also compare the proposed scheme with conventional coded caching schemes and show that it is suitable for multi-rate wireless networks.
Junyao RAN Youhua FU Hairong WANG Chen LIU
We propose clustered interference alignment for the situation where the backhaul link capacity is limited and the base stations are cache-enabled, given MIMO interference channels in which the number of Tx-Rx pairs exceeds the feasibility constraint of interference alignment. We optimize clustering with the soft cluster size constraint algorithm by adding a cluster-size balancing process. In addition, the CSI overhead is quantified as a system performance indicator along with the average throughput. Simulation results show that the cluster-size balancing algorithm generates more balanced clusters and attains higher long-term throughput than the soft cluster size constraint algorithm. The long-term throughput is further improved under high SNR by reallocating the capacity of the backhaul links based on the clustering results.
The static deployment of Virtualized Network Functions (VNFs) introduces 1) significant degradation of Quality of Service (QoS), 2) inefficient utilization of network and computing resources, and 3) Network Function Virtualization (NFV)-based services with insufficient scalability, optimality, and flexibility. Caching VNFs is a promising solution to satisfy the dynamic demand for deploying a variety of VNFs and to maximize performance as well as cost effectiveness. Although the concept of a Content Delivery Network (CDN) is popular for efficiently caching and distributing content, VNF deployment has not yet benefited from CDN-style caching approaches. The challenges in caching VNFs are 1) covering the large variety of VNFs and their properties, including the need for service chaining, and 2) achieving a high acceptance ratio given the limited availability of resources. This paper proposes the Function Delivery Network (FDN), a cluster of distributed edge hypervisors that cache VNFs over a Software-Defined Network (SDN). The deployment and quality of network functions can be significantly improved by serving them closer to end-users from the cached VNFs. FDN introduces a new strategy called value-based caching, which considers 1) the locality of reference together with the performance parameters of the network and edge hypervisors, and 2) partial deployment of service chains across multiple edge hypervisors for more efficient utilization of hypervisor resources. Evaluations on different patterns of input requests confirm that value-based caching significantly improves both QoS and resource utilization in NFV.
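A rough sketch of a value-based caching decision in the spirit described above: each candidate VNF receives a score combining its locality of reference (recent request rate) with network and hypervisor performance parameters, and the hypervisor keeps the highest-value images that fit its capacity. The weights, parameters, and greedy selection are illustrative assumptions, not the paper's exact value function.

```python
def vnf_value(vnf, w_pop=0.5, w_lat=0.3, w_load=0.2):
    """Higher value for VNFs that are requested often, are slow to fetch from
    their origin, and would run on a lightly loaded hypervisor."""
    return (w_pop * vnf["requests_per_min"]
            + w_lat * vnf["fetch_latency_ms"]
            - w_load * vnf["hypervisor_load"] * 100)

def select_cached_vnfs(candidates, capacity):
    """Greedily keep the highest-value VNF images that fit in the hypervisor's
    cache capacity (sizes in arbitrary units)."""
    chosen, used = [], 0
    for vnf in sorted(candidates, key=vnf_value, reverse=True):
        if used + vnf["size"] <= capacity:
            chosen.append(vnf["name"])
            used += vnf["size"]
    return chosen

candidates = [
    {"name": "firewall", "size": 2, "requests_per_min": 40,
     "fetch_latency_ms": 80, "hypervisor_load": 0.3},
    {"name": "nat", "size": 1, "requests_per_min": 25,
     "fetch_latency_ms": 60, "hypervisor_load": 0.3},
    {"name": "dpi", "size": 3, "requests_per_min": 5,
     "fetch_latency_ms": 90, "hypervisor_load": 0.3},
]
print(select_cached_vnfs(candidates, capacity=3))   # e.g. ['firewall', 'nat']
```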