Takafumi TANAKA Masahiko JINNO
Many detailed studies, ranging from networking to hardware, together with standardization activities over the last few years, have advanced the performance of the elastic optical network. Thanks to this intensive work, the elastic optical network is becoming feasible. This paper reviews recent advances in the elastic optical network from the aspects of networking technology and hardware design. For the former, we focus on efficient elastic network design technology related to routing and spectrum assignment (RSA) of elastic optical paths, including network optimization and standardization activities; for the latter, two key enabling technologies are discussed: elastic transponders/regenerators and gridless optical switches. Making closely dependent networking and hardware technologies work synergistically is the key factor in implementing truly effective elastic optical networks.
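As an illustration of the spectrum assignment half of RSA, the following minimal sketch applies a first-fit rule: a demand receives the lowest-indexed block of contiguous frequency slots that is free on every link of its route. The abstract does not say which RSA algorithms the paper reviews, so the first-fit policy, the function name `first_fit_rsa`, and the slot-occupancy layout are all assumptions made for this example, and the route is taken as already chosen.

```python
from typing import Dict, List, Optional


def first_fit_rsa(
    path_links: List[str],
    demand_slots: int,
    occupancy: Dict[str, List[bool]],
    num_slots: int,
) -> Optional[int]:
    """Assign the lowest-indexed free block of contiguous slots on a fixed route.

    Hypothetical helper: occupancy[link][s] is True when slot s is in use on
    that link. The same slots must be free on every link of the path
    (continuity) and must be adjacent (contiguity).
    Returns the start slot, or None if the demand is blocked.
    """
    for start in range(num_slots - demand_slots + 1):
        block = range(start, start + demand_slots)
        if all(not occupancy[link][s] for link in path_links for s in block):
            # Reserve the block on every link of the route.
            for link in path_links:
                for s in block:
                    occupancy[link][s] = True
            return start
    return None


if __name__ == "__main__":
    occ = {"A-B": [False] * 8, "B-C": [False] * 8}
    print(first_fit_rsa(["A-B"], 3, occ, 8))         # 3 slots on a one-hop path
    print(first_fit_rsa(["A-B", "B-C"], 2, occ, 8))  # 2 slots on a two-hop path
```

First fit is only one possible policy; it is used here because it makes the continuity and contiguity constraints easy to see.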
Younchan JUNG Marnel PERADILLA J. William ATWOOD
Currently, a correspondent host has difficulty establishing a direct session path to a mobile host because MIPv6-aware mobile hosts are only partially deployed. Even MIPv6-aware hosts spend up to several seconds obtaining the new location of the mobile host during a Layer 3 (L3) handover. This paper proposes an application-level mobility management scheme that addresses the resulting increase in end-to-end delay of Internet traffic in the current situation, in which most mobile devices are not MIPv6-aware. The proposed Secure Mobility Management Application (SMMA) makes care-of address updates faster and more reliable even when L3 handovers occur frequently. SMMA uses a cross-layer approach for session mobility management, supported by Binding Updates sent to the home agent via IPSec tunnels. The main feature of SMMA is session-related mobility management, which starts just after the completion of name resolution, a pre-call mobility management step that operates in conjunction with the DNS. Our session-related mobility management introduces three new signaling messages: SS-Create for session state creation, SS-Refresh for session state extension, and SS-Renewal for updating the care-of address in mid-session. Finally, this paper analyzes the workload imposed on a mobile host to create a session state and the security strength of the SS-Renewal message, which depends on the key size used.
Hirotoshi HONMA Yoko NAKAJIMA Yuta IGARASHI Shigeru MASUYAMA
Consider a simple undirected graph G = (V, E) with vertex set V and edge set E. Let G-u be the subgraph induced by the vertex set V-{u}. The distance δ_G(x, y) is defined as the length of the shortest path between vertices x and y in G. A vertex u ∈ V is a hinge vertex if there exist two vertices x, y ∈ V-{u} such that δ_{G-u}(x, y) > δ_G(x, y). Let U be the set of all hinge vertices of G. The neighborhood of u is the set of all vertices adjacent to u and is denoted by N(u). We define d(u) = max{δ_{G-u}(x, y) | δ_{G-u}(x, y) > δ_G(x, y), x, y ∈ N(u)} for u ∈ U as the detour degree of u. The maximum detour hinge vertex problem is to find a hinge vertex u with maximum d(u) in G. In this paper, we propose an algorithm that finds the maximum detour hinge vertex on an interval graph in O(n^2) time, where n is the number of vertices in the graph.
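The definitions above can be checked on small graphs with a brute-force sketch. It is not the O(n^2) interval-graph algorithm of the paper: it simply runs breadth-first search for every pair of neighbors of each vertex and works on any graph. The function names and the adjacency-dictionary representation are assumptions for this illustration, and a pair that becomes disconnected in G-u is treated as having infinite distance.

```python
from collections import deque
from itertools import combinations


def bfs_dist(adj, src, removed=None):
    """Shortest-path lengths from src, optionally ignoring one removed vertex."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w != removed and w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def detour_degrees(adj):
    """Map each vertex u that has a qualifying neighbor pair to d(u).

    adj maps every vertex to the set of its neighbors (undirected graph).
    Only pairs x, y in N(u) are examined, as in the definition of d(u).
    """
    inf = float("inf")
    result = {}
    for u in adj:
        best = None
        for x, y in combinations(adj[u], 2):
            d_g = bfs_dist(adj, x).get(y, inf)              # distance in G
            d_gu = bfs_dist(adj, x, removed=u).get(y, inf)  # distance in G-u
            if d_gu > d_g:
                best = d_gu if best is None else max(best, d_gu)
        if best is not None:
            result[u] = best
    return result


if __name__ == "__main__":
    # 5-cycle: removing any vertex lengthens the path between its two neighbors.
    c5 = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
    degrees = detour_degrees(c5)
    print(degrees)
    print(max(degrees, key=degrees.get))  # one maximum detour hinge vertex
```

On a 4-cycle the same routine reports no hinge vertices, because the two neighbors of any vertex remain at distance 2 after that vertex is removed.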
We consider the problem of optimizing the quantizer design for distributed estimation systems in which nodes located at different sites collect measurements and transmit quantized data to a fusion node, which then produces an estimate of the parameter of interest. For this problem, the goal is to minimize the amount of information that the nodes have to transmit in order to attain a given application accuracy. We propose an iterative quantizer design algorithm that seeks a non-regular mapping between quantization partitions and their codewords so as to minimize a global distortion measure such as the estimation error. We apply the proposed algorithm to a system in which an acoustic amplitude sensor model is employed at each node for source localization. Our experiments demonstrate that our technique achieves a significant performance gain compared with typical standard designs and even with recently published novel distributed designs.
An-Sheng CHAO Cheng-Wu LIN Hsin-Wen TING Soon-Jyh CHANG
The proposed stimulus design for linearity testing is embedded in a differential successive approximation register analog-to-digital converter (SAR ADC) as a design for testability (DFT). The proposed DFT is compatible with the pattern generator (PG) and output response analyzer (ORA), at an area cost of 12.4% of the SAR ADC. The 10-bit SAR ADC prototype is verified in a 0.18-µm CMOS technology, and the measured differential nonlinearity (DNL) error is between -0.386 and 0.281 LSB at 1 MS/s.
Chang-shuai WANG Jong-wha CHONG
In this paper, a novel White-RGB (WRGB) color filter array-based imaging system for cell phones is presented to reduce noise and reproduce color under low illumination. The core process is based on adaptive diagonal color separation, which recovers color components from a white signal using diagonal reference blocks and location-based color ratio estimation in the luminance space. The experiments, which compare our system with RGB and state-of-the-art WRGB approaches, show that it performs well for images of various spatial frequencies and for color restoration in low-light environments.
Shinobu MIWA Takara INOUE Hiroshi NAKAMURA
Turbo mode, which accelerates many applications without major changes to existing systems, is widely used in commercial processors. Since the duration or strength of turbo mode depends on the peak temperature of a processor chip, reducing the peak temperature can reinforce turbo mode. This paper shows that adding a small amount of hardware allows microprocessors to reduce the peak temperature drastically and thereby reinforce turbo mode. Our approach is to identify a few small units that become heat sources in a processor and to duplicate them appropriately to reduce their power density. By duplicating only these units and using the copies evenly, the processor can achieve significant performance improvement while remaining area-efficient. The experimental results show that the proposed method achieves up to 14.5% performance improvement in exchange for a 2.8% area increase.
Amir Masoud GHAREHBAGHI Masahiro FUJITA
This paper presents a method for automatic rectification of design bugs in processors. Given a golden sequential instruction-set architecture model of a processor and its erroneous detailed cycle-accurate model at the micro-architecture level, we perform symbolic simulation and property checking combined with concrete simulation iteratively to detect the buggy location and its corresponding fix. We have used the truth-table model of the function that is required for correction, which is a very general model. Moreover, we do not represent the truth-table explicitly in the design. We use, instead, only the required minterms, which are obtained from the output of our backend formal engine. This way, we avoid adding any new variable for representing the truth-table. Therefore, our correction model is scalable to the number of inputs of the truth-table that could grow exponentially. We have shown the effectiveness of our method on a complex out-of-order superscalar processor supporting atomic execution of instructions. Our method reduces the model size for correction by 6.0x and total correction time by 12.6x, on average, compared to our previous work.
Shiho HAGIWARA Takanori DATE Kazuya MASU Takashi SATO
This paper proposes a novel and efficient method, termed hypersphere sampling, to estimate the yield of circuits with low failure probability and a large number of variable sources. Importance sampling using a mean-shift Gaussian mixture distribution as an alternative distribution is used for yield estimation. The proposed method is used to determine the shift locations of the Gaussian distributions. This method involves the bisection of cones whose bases are part of the hyperspheres, in order to locate probabilistically important regions of failure; determining these regions accelerates the convergence of importance sampling. Clustering of the failure samples determines the required number of Gaussian distributions. Successful static random access memory (SRAM) yield estimations of 6- to 24-dimensional problems are presented. The number of Monte Carlo trials is reduced by 2-5 orders of magnitude compared with conventional Monte Carlo simulation methods.
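The role of the mean shift in importance sampling can be seen in a minimal sketch. It uses a single unit-variance Gaussian shifted to a hand-picked location instead of a mixture, and it does not implement hypersphere sampling or the clustering step. The function name, the toy failure criterion, and the shift vector are assumptions made for this example. Samples are drawn from q = N(shift, I) and each failing sample is weighted by p(x)/q(x) = exp(|shift|^2/2 - shift·x), where p = N(0, I).

```python
import math
import random


def mean_shift_is_estimate(fails, shift, n_samples, seed=0):
    """Estimate P[fails(x)] for x ~ N(0, I) by mean-shifted importance sampling.

    Hypothetical helper: 'fails' is an indicator function on a list of floats,
    'shift' is the mean of the alternative distribution N(shift, I).
    """
    rng = random.Random(seed)
    half_norm_sq = 0.5 * sum(m * m for m in shift)
    total = 0.0
    for _ in range(n_samples):
        x = [rng.gauss(m, 1.0) for m in shift]  # draw from the shifted Gaussian
        if fails(x):
            dot = sum(m * xi for m, xi in zip(shift, x))
            total += math.exp(half_norm_sq - dot)  # likelihood ratio p(x)/q(x)
    return total / n_samples


if __name__ == "__main__":
    dim, threshold = 6, 4.0

    def fails(x):
        # Toy criterion (assumption): failure when the normalized sum exceeds 4.
        return sum(x) / math.sqrt(len(x)) > threshold

    # Shift the sampling mean onto the failure boundary along the all-ones direction.
    shift = [threshold / math.sqrt(dim)] * dim
    estimate = mean_shift_is_estimate(fails, shift, 20000)
    exact = 0.5 * math.erfc(threshold / math.sqrt(2.0))
    print(estimate, exact)
```

For this toy criterion the exact failure probability is the one-dimensional Gaussian tail beyond 4, about 3e-5, so plain Monte Carlo with the same 20000 trials would typically observe no more than one or two failures.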
Yohei NAKATA Yuta KIMI Shunsuke OKUMURA Jinwook JUNG Takuya SAWADA Taku TOSHIKAWA Makoto NAGATA Hirofumi NAKANO Makoto YABUUCHI Hidehiro FUJIWARA Koji NII Hiroyuki KAWAI Hiroshi KAWAGUCHI Masahiko YOSHIMOTO
This paper presents a resilient cache memory for dynamic variation tolerance in 40-nm CMOS. The cache can sustain operation under a large-amplitude voltage droop. To realize sustained operation, the resilient cache exploits a 7T/14T bit-enhancing SRAM and an on-chip voltage/temperature monitoring circuit. The 7T/14T bit-enhancing SRAM can reconfigure itself dynamically to a reliable bit-enhancing mode. The on-chip voltage/temperature monitoring circuit can precisely sense the supply voltage level of a power rail of the cache. The proposed cache can dynamically change its operation mode using the voltage/temperature monitoring result and can operate reliably under a large-amplitude voltage droop. Experimental results show that it does not fail with 25% and 30% droops of Vdd and that it provides a 91 times better failure rate with a 35% droop of Vdd compared with the conventional design.
SinNyoung KIM Akira TSUCHIYA Hidetoshi ONODERA
This paper proposes a radiation-hardened phase-locked loop (RH-PLL) with a switchable dual modular redundancy (DMR) structure. After radiation strikes, unhardened PLLs suffer clock perturbations. Conventional RH-PLLs have been proposed to reduce recovery time after perturbation. However, this recovery still requires tens of clock cycles. Our proposal involves ‘detecting’ and ‘switching’, rather than ‘recovering’ from clock perturbation. Detection speed is crucial for robust perturbation-immunity. We identify types of clock perturbation and then propose a set of detectors to detect each type. With this method, the detectors guarantee high-speed detection that leads to perturbation-immune switching from a radiated clock to an undistorted clock. The proposed RH-PLL was fabricated and then verified with a radiation test on real silicon.
Kumpei YOSHIKAWA Kouji ICHIKAWA Makoto NAGATA
An LSI chip-package-board integrated power noise simulation model and its validity are discussed in this paper. A unified power delivery network (PDN) of the LSI chip, package, and printed circuit board (PCB) is connected with on-chip power supply current models that use a capacitor charging expression. The proposed modeling flow is demonstrated for a 32-bit microprocessor in a 1.0 V, 90 nm CMOS technology. The PDN of the system, which includes a chip, bonding wires, and a printed circuit board, is modeled as an equivalent circuit. An on-chip power supply noise monitoring technique and the magnetic probe method are applied to validate the simulation results. Simulations and measurements explore how power supply noise generation depends on operating frequency over a wide range from 10 MHz to 300 MHz, under dynamic frequency scaling, and during long-term operation with various operation codes. It is confirmed that the proposed power supply noise simulation model is helpful for noise estimation throughout the design phase of the LSI system.
Christian Henry Wijaya OEY Sangman MOH
One of the most important requirements for a routing protocol in wireless body area networks (WBANs) is to lower the network's temperature increase. The temperature of a node is closely related to its activities. The proactive routing approach, which is used by existing routing protocols for WBANs, tends to produce a higher temperature increase due to more frequent activities, compared to the on-demand reactive routing approach. In this paper, therefore, we propose a reactive routing protocol for WBANs called priority-based temperature-aware routing (PTR). In addition to lowering the temperature increase, the protocol also recognizes vital nodes and prioritizes them so they are able to achieve higher throughput. Simulation results show that the PTR protocol achieves a 50% lower temperature increase compared to the conventional temperature-aware routing protocol and is able to improve throughput of vital nodes by 35% when the priority mode is enabled.
Akira FUJIMAKI Masamitsu TANAKA Ryo KASAGI Katsumi TAKAGI Masakazu OKADA Yuhi HAYAKAWA Kensuke TAKATA Hiroyuki AKAIKE Nobuyuki YOSHIKAWA Shuichi NAGASAWA Kazuyoshi TAKAGI Naofumi TAKAGI
We describe a large-scale integrated circuit (LSI) design of rapid single-flux-quantum (RSFQ) circuits and demonstrate several reconfigurable data-path (RDP) processor prototypes based on the ISTEC Advanced Process (ADP2). The ADP2 LSIs are made up of nine Nb layers and Nb/AlOx/Nb Josephson junctions with a critical current density of 10 kA/cm^2, allowing higher operating frequencies and integration. To realize truly large-scale RSFQ circuits, careful design is necessary, with several compromises in the device structure, logic gates, and interconnects, balancing the competing demands of integration density, design flexibility, and fabrication yield. We summarize numerical and experimental results related to the development of a cell-based design in the ADP2, which features a unit cell size reduced to 30-µm square and up to four strip line tracks in the unit cell underneath the logic gates. The ADP LSIs can achieve ∼10 times the device density and double the operating frequency with the same power consumption per junction as conventional LSIs fabricated using the Nb four-layer process. We report the design and test results of RDP processor prototypes using the ADP2 cell library. The RDP processors are composed of many arrays of floating-point units (FPUs) and switch networks, and serve as accelerators in a high-performance computing system. The prototypes are composed of two-dimensional arrays of several arithmetic logic units instead of FPUs. The experimental results include a successful demonstration of full operation and reconfiguration in a 2×2 RDP prototype made up of 11.5k junctions at 45 GHz after precise timing design. Partial operation of a 4×4 RDP prototype made up of 28.5k junctions is also demonstrated, indicating the scalability of our timing design.
Kazuyoshi TAKAGI Nobutaka KITO Naofumi TAKAGI
Superconducting single-flux-quantum (SFQ) devices have attracted much attention as alternative devices for digital circuits because of their high switching speed and low power consumption. For large-scale circuit design, a computer-aided design environment plays a significant role. Because the characteristics of SFQ devices differ from those of conventional devices, a new design environment is required. In this paper, we propose a new timing-aware circuit description method that can be used for SFQ circuit design. Based on this description and the dedicated algorithms we have been developing for SFQ logic circuit design, we propose an integrated design flow for SFQ logic circuits. We have designed a circuit using our design tools in accordance with the design flow and demonstrated its correct operation.
Tsang-Chi KAN Ying-Jung CHEN Hung-Ming HONG Shanq-Jang RUAN
Well-designed redundant via-aware standard cells (SCs) can increase the redundant via1 insertion rate in cell-based designs. However, in conventional methods, manual and visual checks are required to locate pins and tune the geometries of layouts. These tasks can be very time-consuming and unreliable. In this work, an O(N log N) redundant via-aware standard cell optimization scheme is developed. The proposed method is an efficient layout check and optimization scheme that considers various redundant via configurations, including the double-via and rectangle-via, to shorten the design time for standard cells. The optimized SCs effectively increase the redundant via insertion rate, and in particular the insertion rate of via1, for both concurrent routing and post-layout optimization. Furthermore, the automatic layout checker and optimizer identify expandable metal 1 pins in libraries that contain numerous cells more efficiently than conventional visual checks and manual optimization. Therefore, the proposed scheme not only solves the problem of a low via1 insertion rate in nanometer regimes, but also provides an efficient layout optimizer for designing standard cells. Experimental results indicate that the optimized standard cells increase the double-via1 insertion rates by 21.9%.
Hiroshi NINOMIYA Manabu KOBAYASHI Yasuyuki MIURA Shigeyoshi WATANABE
This letter describes a design methodology for an arithmetic logic unit (ALU) incorporating reconfigurability based on double-gate carbon nanotube field-effect transistors (DG-CNTFETs). The design of a DG-CNTFET with an ambipolar-property-based reconfigurable static logic circuit is simple and straightforward using an ambipolar binary decision diagram (Am-BDD), which represents the cornerstone for the automatic pass transistor logic (PTL) synthesis flows of ambipolar devices. In this work, an ALU with 16 functions is synthesized by the design methodology of a DG-CNTFET-based reconfigurable static logic circuit. Furthermore, it is shown that the proposed ALU is much more flexible and practical than a conventional DG-CNTFET-based reconfigurable ALU.
Ittetsu TANIGUCHI Kohei AOKI Hiroyuki TOMIYAMA Praveen RAGHAVAN Francky CATTHOOR Masahiro FUKUI
A fast and accurate architecture exploration method for high-performance, low-energy VLIW data-paths is proposed. The main contribution is a method to find Pareto-optimal functional unit (FU) structures, i.e., the optimal number of FUs and the best instruction assignment for each FU. The proposed architecture exploration method is based on a genetic algorithm (GA) and enables effective exploration of a vast solution space. Experimental results show that the proposed method achieves fast and accurate architecture exploration. In most cases, the estimation error was less than 1%.
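Pareto optimality over two cost metrics can be made concrete with a short sketch. It only filters a given list of candidates; it is not the GA-based exploration of the paper. The function name and the (cycles, energy) tuples are hypothetical, with lower values taken as better for both metrics.

```python
from typing import List, Tuple


def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep the candidates that no other candidate dominates.

    Each point is a hypothetical (execution_cycles, energy) pair. A point q
    dominates p when q is no worse on both metrics and differs from p.
    """
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1] for q in points
        )
        if not dominated:
            front.append(p)
    return front


if __name__ == "__main__":
    candidates = [(10, 5), (8, 7), (12, 4), (9, 9), (8, 6)]
    print(pareto_front(candidates))
```

The quadratic pairwise comparison is adequate for small candidate sets such as a final GA population; sorting-based methods scale better for larger sets.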
Ha-Nguyen TRAN Yohannes D. ALEMSEGED Hiroshi HARADA
Spectrum sensing is one of the methods specified by regulators for identifying available white spaces for secondary usage. However, the quality of the signal to be sensed can drop to a very low signal-to-noise ratio because of signal propagation, and hence readings from individual sensors become unreliable. Distributed sensing through the cooperation of multiple sensors is one way to cope with this problem, because the diversity gain obtained by combining data captured at different positions helps detect signals that a single sensor might miss. In effect, the probability of detection can be improved. We have implemented a distributed sensing system to evaluate the performance of different cooperative sensing algorithms. In this paper we describe our implementation and measurement experience, including the system design, system specification, measurement method, and the issues encountered and their solutions. This paper also confirms the performance enhancement offered by distributed sensing algorithms and describes several ideas for further enhancing sensing quality.
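The detection gain from cooperation can be illustrated with the textbook k-out-of-n hard-decision fusion rule. The abstract does not name the cooperative algorithms that were evaluated, so this rule is an assumption chosen for illustration, as are the function name and the simplification that all n sensors decide independently with the same per-sensor probability p. Under these assumptions, the fused probability is the binomial tail sum of C(n, j) p^j (1-p)^(n-j) for j from k to n.

```python
from math import comb


def k_out_of_n(p: float, n: int, k: int) -> float:
    """Probability that at least k of n independent sensors report a detection.

    p is the per-sensor probability (detection or false alarm, depending on
    the hypothesis being evaluated). k=1 is the OR rule, k=n the AND rule.
    """
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


if __name__ == "__main__":
    p_detect = 0.5  # assumed per-sensor detection probability at low SNR
    for n in (1, 2, 4, 8):
        print(n, k_out_of_n(p_detect, n, 1))  # OR-rule detection probability
```

The same formula applied to the per-sensor false-alarm probability shows the cost of the OR rule: false alarms also rise with n, which is why majority or other k-out-of-n rules are often considered.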
Zhen ZHANG Shouyi YIN Leibo LIU Shaojun WEI
TSV-interconnected 3D chips face problems such as high cost, low yield, and large power dissipation. We propose a wireless 3D on-chip-network architecture for application-specific SoC design that uses inductive-coupling interconnects instead of TSVs for inter-layer communication. The primary design challenge of an inductive-coupling 3D SoC is allocating wireless links in the 3D on-chip network effectively. We develop a design flow that fully exploits the design space opened up by wireless links while offering users flexible tradeoffs. Experimental results show that our design improves on a uniform design and the Sunfloor algorithm in latency (by 5% to 20%) and power consumption (by 10% to 45%).