The shared last level cache (SLLC) in tile chip multiprocessors (TCMP) provides a low off-chip miss rate, but it causes a long on-chip access latency. In the two-level cache hierarchy, data replication stores replicas of L1 victims in the local LLC (L2 cache) to obtain a short local LLC access latency on the next accesses. Many data replication mechanisms have been proposed, but they do not consider both L1 victim reuse behaviors and LLC replica reception capability. They either produce many useless replicas or increase LLC pressure, which limits the improvement of system performance. In this paper, we propose a two-level cache aware adaptive data replication mechanism (TCDR), which controls replication based on both L1 victim reuse behaviors prediction and LLC replica reception capability monitoring. TCDR not only increases the accuracy of L1 replica selection, but also avoids the pressure of replication on LLC. The results show that TCDR improves the system performance with reasonable hardware overhead.
Kazutaka KIKUTA Li YI Lilong ZOU Motoyuki SATO
In this paper, we propose a cross-correlation method applied to multistatic ground penetrating radar (GPR) data sets to detect road pavement damage. Pavement cracks and delamination cause variations in electromagnetic wave propagation. The proposed method can detect velocity change using cross-correlation of data traces at different times. An artificially damaged airport taxiway model was measured, and the method captures the positions of damaged parts.
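The abstract above does not spell out how the cross-correlation is computed, so the following is only a minimal sketch of the general idea: two traces recorded at the same position at different times are compared, and the lag at which their cross-correlation peaks indicates a change in travel time, and hence in propagation velocity. The function names and the sample traces are hypothetical and are not taken from the paper.

```python
def cross_correlation(reference, trace, lag):
    """Sum of reference[i] * trace[i + lag] over the overlapping samples."""
    total = 0.0
    for i, r in enumerate(reference):
        j = i + lag
        if 0 <= j < len(trace):
            total += r * trace[j]
    return total


def estimate_lag(reference, trace, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation."""
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: cross_correlation(reference, trace, lag))


# Hypothetical traces: the same pulse, arriving 3 samples later in the
# second measurement (for example, after the pavement has been damaged).
pulse = [0.5, 1.0, 0.5]
earlier = [0.0] * 9 + pulse + [0.0] * 18
later = [0.0] * 12 + pulse + [0.0] * 15
lag = estimate_lag(earlier, later, max_lag=8)
```

A nonzero lag at a given antenna position would flag that position as a candidate for damage; how the multistatic data sets are actually combined is not described in the abstract.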
Kimitoshi TAKAHASHI Kento AIDA Tomoya TANJO Jingtao SUN Kazushige SAGA
Linux container technology and clusters of containers are expected to make web services consisting of multiple web servers and a load balancer portable, and thus enable easy migration of web services across different cloud providers and on-premise datacenters. This prevents a service from being locked in to a single cloud provider or a single location and enables users to meet their business needs, e.g., preparing for a natural disaster. However, existing container management systems lack a generic implementation to route traffic from the Internet into a web service consisting of container clusters. For example, Kubernetes, which is one of the most popular container management systems, is heavily dependent on cloud load balancers. If users use unsupported cloud providers or on-premise datacenters, it is up to users to route the traffic into their cluster while keeping the redundancy and scalability. This means that users could easily be locked in to the major cloud providers, including GCP, AWS, and Azure. In this paper, we propose an architecture for a group of containerized load balancers with ECMP redundancy. We containerize Linux ipvs and exabgp, and then implement an experimental system using standard Linux boxes and open source software. We also show that our proposed system properly routes traffic with redundancy. Our proposed load balancers are usable even if the infrastructure does not have load balancers supported by Kubernetes, and thus free users from lock-in.
Zhi LIU Cai XU Mengmeng ZHANG Wen YUE
Virtual Reality (VR) 360 degree video has ultra-high definition, so reducing the coding complexity becomes a key consideration in coding algorithm design. In this paper, a novel candidate mode pruning process is introduced between Rough Mode Decision and Most Probable Mode based on the statistical analysis of the intra-coding parameters used in VR 360 degree video coding under Cubemap projection (CMP) format. In addition, updated coding bits thresholds for VR 360 degree video are designed in the proposed algorithm. The experimental results show that the proposed algorithm brings 38.73% and 23.70% saving in average coding time at the cost of only 1.4% and 2.1% Bjontegaard delta rate increase in All-Intra mode and Random-Access mode, respectively.
Daisuke FUKUDA Kenichi WATANABE Naoki IDANI Yuji KANAZAWA Masanori HASHIMOTO
As VLSI process nodes continue to shrink, the chemical mechanical planarization (CMP) process for copper interconnect has become an essential technique for enabling many-layer interconnection. Recently, Edge-over-Erosion error (EoE-error), which originates from overpolishing and could cause yield loss, has been observed in various CMP processes, while its mechanism is still unclear. To predict these errors, we propose an EoE-error prediction method that exploits machine learning algorithms. The proposed method consists of (1) an error analysis stage, (2) a layout parameter extraction stage, (3) a model construction stage and (4) a prediction stage. In the error analysis and parameter extraction stages, we analyze test chips and identify layout parameters which have an impact on the EoE phenomenon. In the model construction stage, we construct a prediction model using the proposed multi-level machine learning method, and in the prediction stage we make predictions for designed layouts. Experimental results show that the proposed method attained 2.7∼19.2% accuracy improvement of EoE-error prediction and 0.8∼10.1% improvement of non-EoE-error prediction compared with general machine learning methods. The proposed method makes it possible to prevent unexpected yield loss by recognizing EoE-errors before manufacturing.
This letter proposes a practical scheme that can estimate ADSL link rates. The proposed scheme allows us to estimate ADSL link rates from measurements made at the NOC using existing communications protocols and network node facilities; it imposes no heavy traffic overhead. The proposed scheme consists of two major steps. The first step is to collect measured data of round trip times (RTT) for both long and short packets to find their minimum values of RTTs by sending Internet Control Message Protocol (ICMP) echo request messages. The second step is to estimate the ADSL down- and up-link rates by using the difference in RTT between long and short packets and the experimentally-obtained correlated relationships between ADSL down- and up-link rates. RTTs are experimentally measured for an IP network, and it is shown that the down- and up-link rates can be obtained in a simple manner.
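The two steps described above can be sketched as follows. This is only an illustration under strong assumptions that the abstract does not state: the echo reply is assumed to have the same size as the request, the ADSL link is assumed to dominate the serialization delay, and the experimentally obtained relationship between the up- and down-link rates is replaced by a hypothetical fixed ratio `up_down_ratio`. Under these assumptions the difference of the minimum RTTs equals the extra bits divided by the up-link rate plus the extra bits divided by the down-link rate.

```python
def estimate_adsl_rates(rtts_short, rtts_long, size_short, size_long, up_down_ratio):
    """Estimate (down_rate, up_rate) in bit/s from ICMP echo RTT samples.

    rtts_short, rtts_long: RTT samples in seconds for short and long packets.
    size_short, size_long: packet sizes in bytes.
    up_down_ratio: assumed up-link rate divided by down-link rate
                   (hypothetical stand-in for the measured correlation).
    """
    # Step 1: the minimum RTT filters out queueing delay.
    delta_rtt = min(rtts_long) - min(rtts_short)
    if delta_rtt <= 0:
        raise ValueError("long packets must show a larger minimum RTT")

    # Step 2: delta_rtt = delta_bits / up + delta_bits / down, with up = ratio * down.
    delta_bits = 8 * (size_long - size_short)
    down_rate = delta_bits * (1.0 + 1.0 / up_down_ratio) / delta_rtt
    up_rate = up_down_ratio * down_rate
    return down_rate, up_rate


# Hypothetical measurements for a link with 8 Mbit/s down and 1 Mbit/s up.
short_samples = [0.0231, 0.0200, 0.0254]
long_samples = [0.0410, 0.0326, 0.0349]
down, up = estimate_adsl_rates(short_samples, long_samples, 64, 1464, up_down_ratio=0.125)
```

In practice the ratio would be replaced by whatever relationship between down- and up-link rates the measurements support, which the abstract does not specify.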
Xiaomin JIA Pingjing LU Caixia SUN Minxuan ZHANG
Chip Multi-Processors (CMPs) have emerged as a mainstream architectural design alternative for high performance parallel and distributed computing. Last Level Cache (LLC) management is critical to CMPs because off-chip accesses often require a long latency. Due to its short access latency, good performance isolation and easy scalability, private cache is an attractive design alternative for the LLC of CMPs. This paper proposes program Behavior Identification-based Cache Sharing (BICS) for LLC management. BICS is based on a private cache organization for the shorter access latency. Meanwhile, BICS tries to simulate a shared cache organization by allowing evicted blocks of one private LLC to be saved at peer LLCs. This technique is called spilling. BICS identifies cache behavior types of applications at runtime. When a cache block is evicted from a private LLC, cache behavior characteristics of the local application are evaluated so as to determine whether the block is to be spilled. Spilled blocks are allowed to replace some valid blocks of the peer LLCs as long as the interference is within a reasonable level. Experimental results using a full system CMP simulator show that BICS improves the overall throughput by as much as 14.5%, 12.6%, 11.0% and 11.7% (on average 8.8%, 4.8%, 4.0% and 6.8%) over private cache, shared cache, the Utility-based Cache Partitioning (UCP) scheme and the baseline spilling-based organization Cooperative Caching (CC), respectively, on a 4-core CMP for SPEC CPU2006 benchmarks.
Seong-Hee PARK Seong-Hee LEE Il-Soon JANG Sang-Sung CHOI Je-Hoon LEE Younggap YOU
This paper presents a new method to transfer isochronous data through an IEEE 1394 over UWB (ultra wideband) network. The goal of this research is to implement a complete heterogeneous system without commercial IEEE 1394 link chips supporting the bridge-aware function. To work around the lack of such dedicated chips, the method employs a new bridge adopting a pseudo connection management protocol (CMP). This approach makes a wired 1394 device act as an IEEE 1394 over UWB device, allowing IEEE 1394 equipment to transfer isochronous data over a UWB wireless communication network. The approach was demonstrated successfully via an IEEE 1394 over UWB bridge module. The proposed CMP and IEEE 1394 over UWB bridge module can exchange isochronous data through an IEEE 1394 over UWB network.
Atsushi KUROKAWA Toshiki KANAMOTO Tetsuya IBE Akira KASEBE Wei Fong CHANG Tetsuro KAGE Yasuaki INOUE Hiroo MASUDA
Floating dummy metal fills inserted for planarization of multi-dielectric layers have created serious problems because of increased interconnect capacitance and the enormous number of fills. We present new dummy filling methods to reduce the interconnect capacitance and the number of dummy metal fills needed. These techniques include three ways of filling: 1) improved floating square fills, 2) floating parallel lines, and 3) floating perpendicular lines (with spacing between dummy metal fills above and below signal lines). We also present efficient formulas for estimating the appropriate spacing and number of fills. In our experiments, the capacitance increase using the conventional regular square method was 13.1%, while that using the methods of improved square fills, extended parallel lines, and perpendicular lines were 2.7%, 2.4%, and 1.0%, respectively. Moreover, the number of necessary dummy metal fills can be reduced by two orders of magnitude through use of the parallel line method.
Hiroyuki TSUJIKAWA Kenji SHIMAZAKI Shozo HIRANO Kazuhiro SATO Masanori HIROFUJI Junichi SHIMADA Mitsumi ITO Kiyohito MUKAI
In the move toward higher clock rates and advanced process technologies, designers of the latest electronic products are finding increasing silicon failure with respect to noise. On the other hand, the minimum dimension of patterns on LSIs is much smaller than the wavelength of exposure, making it difficult for LSI manufacturers to obtain high yield. In this paper, we present a solution to reduce power-supply noise in LSI microchips. The proposed design methodology also considers design for manufacturability (DFM) at the same time as power integrity. The method was successfully applied to the design of a system-on-chip (SOC), achieving a 13.1-13.2% noise reduction in power-supply voltage and uniformity of pattern density for chemical mechanical polishing (CMP).
Masataka SUZUKI Tsutomu MATSUMOTO
We describe a scheme of secret communication over the Internet utilizing the potentiality of the TCP/IP protocol suite in a non-standard way. Apart from the sender and the receiver of the secret communication, it does not need any entities with special software installed. Moreover, it does not require them to share any key beforehand. Such features of the scheme stem from the use of IP datagrams with spoofed source addresses and their related Internet Control Message Protocol (ICMP) error messages induced by artificial faults. Countermeasures against IP spoofing are deployed in various places since it is often used together with attacks such as distributed denial of service (DDoS) and SPAM mailing. Thus we examine the environment where the scheme works as intended and also clarify the conditions that render the scheme obsolete. Furthermore, we estimate the amount of data secretly communicated by the scheme and the storage requirements for the receivers and for the observers who monitor the traffic to detect the very existence of such a secret communication. We also discuss various issues including the sender anonymity achieved by the scheme.
Jung-Sik JEONG Kei SAKAGUCHI Jun-ichi TAKADA Kiyomichi ARAKI
This paper presents the performance of the Directionally Constrained Minimization of Power (DCMP) and the Zero-Forcing (ZF) in the Angular Spread (AS) environment. To obtain the optimal weights for both methods, the Extended Array Mode Vector (EAMV) is employed. It is known that the EAMV represents the instantaneous AS as well as the instantaneous DOA in the slow fading channel. As a result, it is shown that the DCMP and the ZF using the EAMV estimates can improve the Signal-to-Interference-plus-Noise Ratio (SINR) considerably, as compared with those using the Direction of Arrival (DOA) information only. At the same time, the intrinsic problems causing the performance loss in the DCMP and the ZF are revisited. From this, the reasons for the performance deterioration are analyzed, in relation with the AS, the number of samples, the number of antenna elements, and the spatial correlation coefficient of the signals. It follows that the optimal signal combining techniques using the EAMV estimates can diminish such effects.
Hideo KASAMI Shuichi OBAYASHI Hiroki SHOKI
Space division multiple access (SDMA) is an attractive technique to increase the channel capacity of wireless communication systems. In this paper, we first propose a new process to accomplish SDMA using an adaptive array at a base station receiver of broadband fixed wireless access (FWA) systems. Unlike other methods, the proposed process does not need highly accurate direction-of-arrival (DOA) estimation and is suitable for the Directionally Constrained Minimization of Power (DCMP) algorithm in order to serve multiple fixed terminals. A newly modified DCMP with phase-only control is proposed as well. The algorithm to control phase weights uses only the array output power and does not require the complex baseband signals from individual array elements. The pattern measurement results in an anechoic chamber show that the proposed algorithm can direct a null to an interference while maintaining the gain to the desired signal.
Takumi MORI Kohei OHTA Nei KATO Hideaki SONE Glenn MANSFIELD Yoshiaki NEMOTO
Network traffic contains many symptoms of various network faults. Symptoms of faults aggregate and are manifested in the aggregate traffic characteristics generally observed by a traffic monitor. It is very difficult for a manager or an NMS (Network Management Station) to isolate the symptoms manifested in the aggregate traffic characteristics. In particular, transit networks, such as backbone networks, deal with many types of traffic, so symptom isolation must be efficient. In this paper, we propose a powerful algorithm for symptom isolation. This algorithm is based on the popular SNMP-based RMON technology. Using dynamically constructed aggregates, fresh symptoms can be isolated efficiently. We apply the algorithm to two operational transit networks which connect some LANs and WANs, and evaluate it using trace data collected from these networks. The results show a significant improvement in the fault management capability and accuracy. Furthermore, the characteristics of fault symptoms and the various factors for effective system configuration are discussed.
Tadayuki SAKAKIBARA Katsuyoshi KITAI Tadaaki ISOBE Shigeko YAZAWA Teruo TANAKA Yoshiko TAMAKI Yasuhiro INAGAMI
We propose an instruction-based variable priority scheme (IBVPS) which achieves high sustained memory throughput on a TCMP type vector supercomputer. Generally, there are two approaches to arbitrating interprocessor memory access conflict: request level priority control and fixed priority control. Each approach, however, affects performance in its own way: in the case of request level priority control, mutual obstruction causes a performance degradation, and in the case of fixed priority control, memory bank monopoly causes a performance degradation. Mutual obstruction refers to the interference among access requests coming from different instructions; memory bank monopoly refers to the uninterrupted accessing of the same memory bank by a series of higher priority instructions. The strategy of the instruction-based variable priority scheme consists in: (a) generally changing the priority assignment of all load/store pipelines at the end of any instruction running in the system, and (b) changing the priority assignment of all load/store pipelines more than once in the middle of an access instruction with a stride greater than 1 or an indirect access instruction which may monopolize some memory banks for an extended period of time. This strategy reduces mutual obstruction because the priority assignment is reshuffled for the entire group of load/store pipelines at a time. It also reduces memory bank monopoly because the opportunity for memory access is made equal among different instructions by changing the priority assignment at the end of an instruction. Moreover, it prevents memory bank monopoly by a memory access instruction with a stride greater than 1 or an indirect access instruction, by changing the priority assignment more frequently. Consequently, high sustained memory throughput is achieved on TCMP type vector supercomputers.
Tsutomu TASHIRO Takasuke HASHIMOTO Fumihiko SATO Yoshihiro HAYASHI Toru TATSUMI
A 7-mask self-aligned SiGe base bipolar transistor has been newly developed. This transistor offers several advancements to a super self-aligned selectively grown SiGe base (SSSB) transistor which has a selectively grown SiGe-base layer formed by a cold-wall ultra high vacuum (UHV)/CVD system. The advancements are as follows: (1) a BPSG-filled arbitrary-width trench isolation on an SOI is formed by a high-uniformity CMP with a hydro-chuck for reducing the number of isolation fabrication steps, (2) polysilicon-plug emitter and collector electrodes are made simultaneously using an in-situ phosphorus-doped polysilicon film to decrease the distance between emitter and collector electrodes and also to reduce the fabrication steps of the electrodes, (3) an n+-buried collector layer is made by a high-energy phosphorus ion-implantation technique to eliminate collector epitaxial growth, and (4) a germanium profile in the neutral base region is optimized to increase the fT value without increasing leakage current at the base-collector junction. In the developed transistor, a high performance of 80-GHz fT and mask-steps reduction are simultaneously achieved.
Optimal static load balancing problems in open BCMP queueing networks with state-independent arrival and service rates are studied. Examples include optimal static load balancing in distributed computer systems and static routing in communication networks. We refer to the load balancing policy of minimizing the overall mean response (or sojourn) time of a job as the overall optimal policy. We show the conditions that the solutions of the overall optimal policy satisfy and show that the policy uniquely determines the utilization of each service center, the mean delay for each class and each path class, etc., although the solution, the utilization for each class, the mean delay for all classes at each service center, etc., may not be unique. Then we give the linear relations that characterize the set whose elements are the optimal solutions, and discuss the condition wherein the overall optimal policy has a unique solution. In parametric analysis and numerical calculation of optimal values of performance variables, we must check whether they can be uniquely determined.
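As a much simpler illustration of what an overall optimal policy looks like, the sketch below treats a single job class split across parallel M/M/1 servers, which is far narrower than the multi-class BCMP setting of the abstract and is not the paper's method. Minimizing the mean response time subject to the total arrival rate gives the well-known square-root rule: every server that receives traffic has the same marginal delay, and servers that are too slow receive none. The function name `optimal_split` is hypothetical.

```python
import math


def optimal_split(service_rates, total_arrival_rate):
    """Split one job stream across parallel M/M/1 servers to minimize the
    overall mean response time. Returns the arrival rate for each server."""
    if not 0 < total_arrival_rate < sum(service_rates):
        raise ValueError("need 0 < total arrival rate < total service rate")

    # Candidate servers, fastest first; the slowest are dropped first.
    active = sorted(range(len(service_rates)),
                    key=lambda i: service_rates[i], reverse=True)

    def scale_for(servers):
        spare = sum(service_rates[i] for i in servers) - total_arrival_rate
        return spare / sum(math.sqrt(service_rates[i]) for i in servers)

    # A server gets a non-negative share only if sqrt(mu) >= scale.
    while len(active) > 1 and math.sqrt(service_rates[active[-1]]) < scale_for(active):
        active.pop()

    scale = scale_for(active)
    rates = [0.0] * len(service_rates)
    for i in active:
        mu = service_rates[i]
        rates[i] = mu - math.sqrt(mu) * scale
    return rates


balanced = optimal_split([4.0, 1.0], 3.5)   # both servers are used
light = optimal_split([4.0, 1.0], 1.0)      # the slow server is left idle
```

In this single-class case the optimal split is unique; the non-uniqueness discussed in the abstract arises when several job classes share the service centers.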