
Keyword Search Result

[Keyword] on-chip(144hit)

41-60hit(144hit)

  • The Organization of On-Chip Data Memory in One Coarse-Grained Reconfigurable Architecture

    Yansheng WANG  Leibo LIU  Shouyi YIN  Min ZHU  Peng CAO  Jun YANG  Shaojun WEI  

     
    PAPER-VLSI Design Technology and CAD

    Vol: E96-A  No: 11  Page(s): 2218-2229

    RCP (Reconfigurable Computing Processor) is intended to fill the gap between ASIC and GPP (General Purpose Processor): it achieves much higher energy efficiency than a GPP while being much more flexible than an ASIC. In this paper, an organization of on-chip data memory called LIBODM (LIfetime Based On-chip Data Memory) is proposed to reduce both the data reference delay and the on-chip data memory size in RCP. In LIBODM, data allocation is based on data dependency. Data with low data dependency are stored off-chip to save storage cost, while data with high data dependency are stored on-chip to reduce the reference delay. In addition, on-chip data are classified into two types according to their lifetime. Short-lifetime data are preferably stored in a FIFO, which naturally increases the reuse ratio of memory space. Long-lifetime data are preferably stored in RAM so that they can be referenced several times. LIBODM has been verified in a CGRA (Coarse-Grained Reconfigurable Architecture) called RPU (Reconfigurable Processing Unit), and two RPUs have been integrated in an RCP, REMUS_HP (High Performance version of Reconfigurable MUlti-media System), targeted at video decoding. Thanks to LIBODM, REMUS_HP achieves high performance even though its on-chip data memory is small. Compared with XPP and ADRES, the on-chip data memory size of REMUS_HP at the same performance level is only 23.9% and 14.8%, respectively. REMUS_HP is implemented on 48.9 mm² of silicon in TSMC 65 nm technology. Simulation shows that H.264 high-profile decoding at 1920×1088 @ 30 fps can be achieved at a working frequency of 200 MHz. Compared with the high-performance version of XPP, performance is boosted by 150% and energy efficiency by 17.59x.
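The placement rule described in the abstract above (low-dependency data off-chip, short-lifetime data in a FIFO, long-lifetime data in RAM) can be sketched as a small decision function. This is only an illustration: the `DataBlock` fields, the numeric thresholds, and the way dependency and lifetime are measured are assumptions made here, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class DataBlock:
    name: str
    dependency: int  # hypothetical measure: number of later operations that read the data
    lifetime: int    # hypothetical measure: cycles between first write and last read


def place(block: DataBlock, dep_threshold: int = 2, lifetime_threshold: int = 64) -> str:
    """Return 'off-chip', 'fifo' or 'ram' for a block (thresholds are illustrative)."""
    if block.dependency < dep_threshold:
        return "off-chip"  # low dependency: save on-chip storage
    if block.lifetime <= lifetime_threshold:
        return "fifo"      # short lifetime: space is reused as entries drain
    return "ram"           # long lifetime: kept for repeated references


blocks = [
    DataBlock("bitstream_chunk", dependency=1, lifetime=10),
    DataBlock("residual_coeffs", dependency=4, lifetime=16),
    DataBlock("reference_pixels", dependency=8, lifetime=500),
]
placement = {b.name: place(b) for b in blocks}
```

The block names are made up for the example; a real allocator would derive dependency and lifetime from the application's data-flow graph.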

  • Fault Diagnosis and Reconfiguration Method for Network-on-Chip Based Multiple Processor Systems with Restricted Private Memories

    Masashi IMAI  Tomohiro YONEDA  

     
    PAPER

    Vol: E96-D  No: 9  Page(s): 1914-1925

    We propose a fault diagnosis and reconfiguration method based on the Pair and Swap scheme to improve the reliability and the MTTF (Mean Time To Failure) of network-on-chip based multiple processor systems in which each processor core has its own private memory. In the proposed scheme, two identical copies of a given task are executed on a pair of processor cores and the results are compared repeatedly in order to detect processor faults. If a mismatch reveals a fault, the faulty core is identified and isolated using TMR (Triple Modular Redundancy), and the system is reconfigured using the redundant processor cores. We propose that each task be quadruplicated and statically assigned to private memories so that each memory holds only two different tasks. We evaluate the reliability of the proposed quadruplicated task allocation scheme from the viewpoint of MTTF. As a result, the MTTF of the proposed scheme is over 4.3 times longer than that of the duplicated task allocation scheme.
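The detect-then-identify flow in the abstract above (compare a pair of results, then fall back to a three-way vote on mismatch) can be illustrated with a minimal sketch. The function names and the dictionary-based interface are hypothetical, and the sketch leaves out task swapping and memory allocation, which the paper covers and this example does not attempt to reproduce.

```python
from collections import Counter


def pair_check(result_a, result_b) -> bool:
    """Duplicated execution: a mismatch signals that one of the two cores is faulty."""
    return result_a == result_b


def tmr_identify(results: dict):
    """Majority vote over three cores' results.

    Returns (majority_value, faulty_core_ids). If no two results agree,
    the value is None and every core is reported as suspect.
    """
    value, votes = Counter(results.values()).most_common(1)[0]
    if votes < 2:
        return None, sorted(results)
    faulty = sorted(core for core, r in results.items() if r != value)
    return value, faulty


# A mismatch on the pair triggers a third execution and a vote.
mismatch_detected = not pair_check(7, 9)
value, faulty = tmr_identify({"core0": 7, "core1": 9, "core2": 7})
```

Here `core1` is outvoted and would be the one to isolate; which spare core replaces it is outside the scope of this sketch.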

  • Potential of Fault-Detection Coverage by means of On-Chip Redundancy - IEC61508: Are There Royal Roads to SIL 4?

    Nobuyasu KANEKAWA  

     
    PAPER

    Vol: E96-D  No: 9  Page(s): 1907-1913

    This paper investigates the potential to improve fault-detection coverage by means of on-chip redundancy. The international standard on functional safety, IEC61508 Ed. 2.0 Part 2 Annex E.3, prescribes that the upper bound of βIC (the ratio of common cause failures (CCF) to all failures) is 0.25 in order to satisfy the upper bound on the frequency of dangerous failure of the safety function for SIL (Safety Integrity Level) 3. This paper argues, however, that βIC does not necessarily have to be less than 0.25 for SIL 3, and that the upper bound of βIC can be determined from the failure rate λ and the CCF detection coverage. In other words, the upper bound on the frequency of dangerous failure for SIL 3 can also be satisfied with βIC higher than 0.25 if the failure rate λ is lower than 400 FIT. Moreover, the paper shows that on-chip redundancy has the potential to satisfy the SIL 4 requirement: the upper bound on the frequency of dangerous failure for SIL 4 can be satisfied with feasible ranges of βIC, λ and CCF coverage, which can be realized by redundant code.

  • Measurements and Simulation of Sensitivity of Differential-Pair Transistors against Substrate Voltage Variation

    Satoshi TAKAYA  Yoji BANDO  Toru OHKAWA  Toshiharu TAKARAMOTO  Toshio YAMADA  Masaaki SOUDA  Shigetaka KUMASHIRO  Tohru MOGAMI  Makoto NAGATA  

     
    PAPER

    Vol: E96-C  No: 6  Page(s): 884-893

    The response of differential pairs to low-frequency substrate voltage variation is captured in combined transistor and substrate network models. The model generation is regularized for variation of transistor geometries, including channel sizes, fingering and folding, and the placement of guard bands. The expansion of the models for full-chip substrate noise analysis is also discussed. The substrate sensitivity of differential pairs is evaluated through on-chip substrate coupling measurements in a 90 nm CMOS technology with more than 64 different geometries and operating conditions. The trends and strengths of substrate sensitivity are shown to be consistent between simulation and measurements.

  • 60 GHz Millimeter-Wave CMOS Integrated On-Chip Open Loop Resonator Bandpass Filters on Patterned Ground Shields

    Ramesh K. POKHAREL  Xin LIU  Dayang A.A. MAT  Ruibing DONG  Haruichi KANAYA  Keiji YOSHIDA  

     
    PAPER-Microwaves, Millimeter-Waves

    Vol: E96-C  No: 2  Page(s): 270-276

    This paper presents the design of a second-order and a fourth-order bandpass filter (BPF) for 60 GHz millimeter-wave applications in 0.18 µm CMOS technology. The proposed on-chip BPFs employ a folded open loop structure designed on patterned ground shields. The adoption of a folded structure and the use of multiple transmission zeros in the stopband give the BPFs a compact size and high selectivity. Moreover, the patterned ground shields slow down the guided waves, which enables a further reduction in the physical length of the resonator, and this, in turn, improves the insertion loss. Very good agreement between the electromagnetic (EM) simulations and measurement results has been achieved. As a result, the second-order BPF has a center frequency of 57.5 GHz, insertion loss of 2.77 dB, bandwidth of 14 GHz, return loss less than 27.5 dB and chip size of 650 µm × 810 µm (including bonding pads), while the fourth-order BPF has a center frequency of 57 GHz, insertion loss of 3.06 dB, bandwidth of 12 GHz, return loss less than 30 dB and chip size of 905 µm × 810 µm (including bonding pads).

  • Novel Fuse Scheme with a Short Repair Time to Maximize Good Chips per Wafer in Advanced SoCs

    Chizu MATSUMOTO  Yuichi HAMAMURA  Michinobu NAKAO  Kaname YAMASAKI  Yoshikazu SAITO  Shun'ichi KANEKO  

     
    PAPER-Semiconductor Materials and Devices

    Vol: E96-C  No: 1  Page(s): 108-114

    Repairing embedded memories (e-memories) on an advanced system-on-chip (SoC) product is a key technique used to improve product yield. However, the increasing die area of SoC products equipped with various types of e-memories is an issue, and a fuse scheme can be used to resolve it. Several fuse schemes that have been proposed to decrease the die area, however, result in an increased repair time. Therefore, in this paper, we propose a novel fuse scheme that decreases both die area and repair time. Moreover, our approach is applied to a 65 nm SoC product. The results indicate that the proposed fuse scheme effectively decreases the die area and repair time of advanced SoC products.

  • Power Gating Implementation for Supply Noise Mitigation with Body-Tied Triple-Well Structure

    Yasumichi TAKAI  Masanori HASHIMOTO  Takao ONOYE  

     
    PAPER-Circuit Design

    Vol: E95-A  No: 12  Page(s): 2220-2225

    This paper investigates power gating implementations that mitigate power supply noise. We focus on the body connection of power-gated circuits, and examine the amount of power supply noise induced by power-on rush current and the contribution of a power-gated circuit as a decoupling capacitance during the sleep mode. To determine the best implementation, we designed and fabricated a test chip in a 65 nm process. Experimental results from measurement and simulation reveal that the power-gated circuit with a body-tied structure in triple-well is the best implementation in terms of the following three points: power supply noise due to rush current, the contribution of decoupling capacitance during the sleep mode, and the leakage reduction achieved by power gating.

  • Performance Improvement and Congestion Reduction of Large FPGAs Using On-Chip Microwave Interconnects

    Mohammad Taghi TEIMOORI  Ali JAHANIAN  Adel DOKHANCHI  

     
    PAPER

    Vol: E95-C  No: 10  Page(s): 1610-1619

    Microwave interconnects have been proposed recently to break down long wires in large integrated circuits. In this paper, the use of coplanar waveguide RF interconnects in FPGAs is explored to improve performance and reduce routing congestion. We propose a new FPGA architecture consisting of both metal wires and RF receivers/transmitters, together with an algorithm to route the proposed FPGA. Experimental results show that the number of routing tracks used and the routing congestion are reduced by 23.8% and 7.06%, respectively, and that the performance of the attempted benchmarks is improved by about 33% using this technique. These benefits come at a reasonable cost in area and power consumption, which is negligible for large and complex circuits.

  • A Locality-Aware Hybrid NoC Configuration Algorithm Utilizing the Communication Volume among IP Cores

    Seungju LEE  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER-VLSI Design Technology and CAD

    Vol: E95-A  No: 9  Page(s): 1538-1549

    Network-on-chip (NoC) architectures have emerged as a promising solution to the lack of scalability in multi-processor systems-on-chips (MPSoCs). With the explosive growth in the usage of multimedia applications, NoCs are expected to serve as multimedia servers supporting multi-class services. In this paper, we propose a configuration algorithm for a hybrid bus-NoC architecture together with simulation results. Our target architecture is a hybrid bus-NoC architecture, called busmesh NoC (BMNoC), which is a generalized version of a hybrid NoC with local buses. In our BMNoC configuration algorithm, cores that have a heavy communication volume between them are mapped to a cluster node (CN) and connected by a local bus. CNs communicate with each other via edge switches (ESes) and mesh routers (MRs). With this hierarchical communication network, our proposed algorithm can improve the latency as compared with conventional methods. Applying our algorithm to several realistic applications demonstrates its feasibility and its better performance than earlier studies.
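The idea in the abstract above of grouping heavily communicating cores into the same cluster node can be illustrated with a simple greedy merge. This is a generic heuristic sketched under assumptions chosen here (pairwise traffic volumes as input, a hypothetical `max_size` limit on cores per cluster); it is not the BMNoC configuration algorithm itself.

```python
def cluster_cores(volume: dict, max_size: int) -> list:
    """Greedily merge clusters along the heaviest communication edges.

    volume maps (core_a, core_b) pairs to a traffic volume; max_size is an
    assumed cap on how many cores may share one local bus.
    """
    cluster_of = {}
    for a, b in volume:
        cluster_of.setdefault(a, {a})
        cluster_of.setdefault(b, {b})

    # Visit edges from heaviest to lightest traffic.
    for (a, b), _ in sorted(volume.items(), key=lambda kv: -kv[1]):
        ca, cb = cluster_of[a], cluster_of[b]
        if ca is cb or len(ca) + len(cb) > max_size:
            continue  # already together, or the merged cluster would be too large
        merged = ca | cb
        for core in merged:
            cluster_of[core] = merged

    unique = {id(c): c for c in cluster_of.values()}
    return sorted(sorted(c) for c in unique.values())


traffic = {("a", "b"): 100, ("b", "c"): 10, ("c", "d"): 90, ("a", "d"): 5}
clusters = cluster_cores(traffic, max_size=2)
```

With a cap of two cores per cluster, the heavy pairs `a`-`b` and `c`-`d` each end up on their own local bus, and the lighter `b`-`c` and `a`-`d` traffic would cross the mesh.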

  • A Low-Cost and Energy-Efficient Multiprocessor System-on-Chip for UWB MAC Layer

    Hao XIAO  Tsuyoshi ISSHIKI  Arif Ullah KHAN  Dongju LI  Hiroaki KUNIEDA  Yuko NAKASE  Sadahiro KIMURA  

     
    PAPER-Computer System

    Vol: E95-D  No: 8  Page(s): 2027-2038

    Ultra-wideband (UWB) technology has attracted much attention recently due to its high data rate and low emission power. Its media access control (MAC) protocol, WiMedia MAC, promises many facilities for high-speed and high-quality wireless communication. However, these benefits in turn involve a large computational load, which makes it difficult for the traditional uniprocessor-based implementation method to provide the required performance. On the other hand, the constrained cost and power budget makes commercial multiprocessor solutions unrealistic. In this paper, a low-cost and energy-efficient multiprocessor system-on-chip (MPSoC), which addresses system design, software migration and hardware architecture together, is presented for the implementation of the UWB MAC layer. Experimental results show that the proposed MPSoC, based on four simple RISC processors and a shared-memory infrastructure, achieves up to 45% performance improvement and 65% power saving while taking 15% less area than the uniprocessor implementation.

  • SOBR: A High-Performance Shared Output Buffered Router for Networks-on-Chip

    Yancang CHEN  Lunguo XIE  

     
    LETTER-Computer System

    Vol: E95-D  No: 7  Page(s): 2002-2005

    This paper presents a single-cycle shared output buffered router for Networks-on-Chip. In the output ports, each input port always has an output virtual channel (VC), which can be exchanged by a VC swapper. Its critical path is only 24 logic gates, and it reduces area overhead by 9.4% compared with the classical router.

  • EMI Camera LSI (EMcam) with On-Chip Loop Antenna Matrix to Measure EMI Noise Spectrum and Distribution

    Naoki MASUNAGA  Koichi ISHIDA  Takayasu SAKURAI  Makoto TAKAMIYA  

     
    PAPER

    Vol: E95-C  No: 6  Page(s): 1059-1066

    This paper presents a new type of electromagnetic interference (EMI) measurement system. An EMI Camera LSI (EMcam) with a 12 × 4 on-chip 250 × 50 µm² loop antenna matrix in 65 nm CMOS is developed. EMcam achieves both 2D electric scanning and 60 µm-level spatial precision. The down-conversion architecture increases the bandwidth of EMcam and enables the measurement of the EMI spectrum up to 3.3 GHz. The shared IF-block scheme is proposed to relax the increase in both power and area penalty, which are inherent issues of the matrix measurement. The power and the area are reduced by 74% and 73%, respectively. EMI measurement with the smallest antenna to date, 32 × 12 µm², is also demonstrated.

  • Long-Range Asynchronous On-Chip Link Based on Multiple-Valued Single-Track Signaling

    Naoya ONIZAWA  Atsushi MATSUMOTO  Takahiro HANYU  

     
    PAPER-Circuit Theory

    Vol: E95-A  No: 6  Page(s): 1018-1029

    We have developed a long-range asynchronous on-chip data-transmission link based on multiple-valued single-track signaling for a highly reliable asynchronous Network-on-Chip. In the proposed signaling, 1-bit data with control information is represented by using a one-digit multi-level signal, so serial data can be transmitted asynchronously using only a single wire. The small number of wires alleviates the routing complexity of wiring long-range interconnects. The use of current-mode signaling makes it possible to transmit data at high speed without buffers or repeaters over a long interconnect wire because of the low-voltage swing of signaling, and it leads to low-latency data transmission. We achieve a latency of 0.45 ns, a throughput of 1.25 Gbps, and energy dissipation of 0.58 pJ/bit with a 10-mm interconnect wire under a 0.13 µm CMOS technology. This represents an 85% decrease in latency, a 150% increase in throughput, and a 90% decrease in energy dissipation compared to a conventional serial asynchronous data-transmission link.
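The relative improvements quoted in the abstract above can be turned back into the figures they imply for the conventional link. The short calculation below assumes each percentage is measured relative to the conventional link's value and treats the rounded numbers in the abstract as exact, so the outputs are back-of-the-envelope estimates rather than values reported by the paper.

```python
def implied_baseline(value: float, percent_change: float) -> float:
    """Baseline b such that value = b * (1 + percent_change / 100)."""
    return value / (1 + percent_change / 100)


# Figures from the abstract: 0.45 ns, 1.25 Gbps, 0.58 pJ/bit.
baseline_latency_ns = implied_baseline(0.45, -85)        # 85% decrease in latency
baseline_throughput_gbps = implied_baseline(1.25, +150)  # 150% increase in throughput
baseline_energy_pj = implied_baseline(0.58, -90)         # 90% decrease in energy

print(f"latency ~{baseline_latency_ns:.2f} ns, "
      f"throughput ~{baseline_throughput_gbps:.2f} Gbps, "
      f"energy ~{baseline_energy_pj:.2f} pJ/bit")
```

Under these assumptions the conventional link would sit at roughly 3 ns, 0.5 Gbps and 5.8 pJ/bit.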

  • A Process-Variation-Adaptive Network-on-Chip with Variable-Cycle Routers and Variable-Cycle Pipeline Adaptive Routing

    Yohei NAKATA  Hiroshi KAWAGUCHI  Masahiko YOSHIMOTO  

     
    PAPER

    Vol: E95-C  No: 4  Page(s): 523-533

    As process technology is scaled down, a typical system on a chip (SoC) becomes denser. In scaled process technology, process variation becomes greater and increasingly affects the SoC circuits. Moreover, the process variation strongly affects network-on-chips (NoCs) that have a synchronous network across the chip. Therefore, its network frequency is degraded. We propose a process-variation-adaptive NoC with a variation-adaptive variable-cycle router (VAVCR). The proposed VAVCR can configure its cycle latency adaptively on a processor core basis, corresponding to the process variation. It can increase the network frequency, which is limited by the process variation in a conventional router. Furthermore, we propose a variable-cycle pipeline adaptive routing (VCPAR) method with VAVCR; the proposed VCPAR can reduce packet latency and has tolerance to network congestion. The total execution time reduction of the proposed VAVCR with VCPAR is 15.7%, on average, for five task graphs.

  • Hybrid Wired/Wireless On-Chip Network Design for Application-Specific SoC

    Shouyi YIN  Yang HU  Zhen ZHANG  Leibo LIU  Shaojun WEI  

     
    PAPER

    Vol: E95-C  No: 4  Page(s): 495-505

    A hybrid wired/wireless on-chip network is a promising communication architecture for multi-/many-core SoCs. For application-specific SoC design, it is important to design a dedicated on-chip network architecture that matches the characteristics of the application. In this paper, we propose a heuristic wireless link allocation algorithm for creating a hybrid on-chip network architecture. The algorithm can eliminate the performance bottleneck by replacing multi-hop wired paths with high-bandwidth single-hop long-range wireless links. The simulation results show that the hybrid on-chip network designed by our algorithm significantly improves performance in terms of both communication delay and energy consumption.
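Replacing multi-hop wired paths with single-hop wireless links, as the abstract above describes, can be illustrated with a very simple greedy ranking. The sketch assumes a 2D mesh with Manhattan-distance routing, a fixed budget of wireless links, and a gain metric of traffic volume times hops saved; all of these are assumptions made for the example and not the heuristic proposed in the paper.

```python
def manhattan(a: tuple, b: tuple) -> int:
    """Hop count between two mesh coordinates under dimension-ordered routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def allocate_wireless_links(flows: dict, budget: int) -> list:
    """Pick up to `budget` (src, dst) pairs with the largest volume * hops-saved.

    flows maps ((x, y), (x, y)) source/destination pairs to a traffic volume.
    A wireless link is assumed to turn an h-hop path into a single hop.
    """
    gains = []
    for (src, dst), vol in flows.items():
        hops = manhattan(src, dst)
        if hops > 1:  # neighbours gain nothing from a wireless shortcut
            gains.append((vol * (hops - 1), src, dst))
    gains.sort(reverse=True)
    return [(src, dst) for _, src, dst in gains[:budget]]


flows = {
    ((0, 0), (3, 3)): 10,   # far apart, light traffic: gain 10 * 5 = 50
    ((0, 0), (0, 1)): 100,  # neighbours: no gain
    ((1, 1), (1, 3)): 40,   # two hops, heavier traffic: gain 40 * 1 = 40
}
links = allocate_wireless_links(flows, budget=1)
```

A practical allocator would also need to account for shared links, wireless channel contention and placement constraints, none of which are modelled here.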

  • Support Efficient and Fault-Tolerant Multicast in Bufferless Network-on-Chip

    Chaochao FENG  Zhonghai LU  Axel JANTSCH  Minxuan ZHANG  Xianju YANG  

     
    PAPER-Computer System

    Vol: E95-D  No: 4  Page(s): 1052-1061

    In this paper, we propose three Deflection-Routing-based Multicast (DRM) schemes for a bufferless NoC. The DRM scheme without packet replication (DRM_noPR) sends a multicast packet along a non-deterministic path. The DRM schemes with adaptive packet replication (DRM_PR_src and DRM_PR_all) replicate multicast packets at the source or at intermediate nodes according to the destination position and the state of the output ports in order to reduce the average multicast latency. We also provide fault-tolerance support in these schemes through a reinforcement-learning-based method that reconfigures the routing table to tolerate permanently faulty links in the network. Simulation results illustrate that the DRM_PR_all scheme achieves 41%, 43% and 37% less latency on average than the DRM_noPR scheme and 27%, 29% and 25% less latency on average than the DRM_PR_src scheme under three synthetic traffic patterns, respectively. In addition, all three fault-tolerant DRM schemes show acceptable performance degradation at various link fault rates without any packet loss.

  • All-Digital PMOS and NMOS Process Variability Monitor Utilizing Shared Buffer Ring and Ring Oscillator

    Tetsuya IIZUKA  Kunihiro ASADA  

     
    PAPER

    Vol: E95-C  No: 4  Page(s): 627-634

    This paper proposes an all-digital process variability monitor based on a shared structure of a buffer ring and a ring oscillator. The proposed circuit monitors the PMOS and NMOS process variabilities independently according to the count number of a single pulse that propagates on the ring during the buffer ring mode, and the oscillation period during the ring oscillator mode. Using this shared-ring structure, we reduce the occupied area by about 40% without loss of process variability monitoring properties compared with the conventional circuit. The proposed shared-ring circuit has been fabricated in a 65 nm CMOS process, and the measurement results with two different wafer lots show the feasibility of the proposed process variability monitoring scheme.

  • On-Chip In-Place Measurements of Vth and Signal/Substrate Response of Differential Pair Transistors

    Yoji BANDO  Satoshi TAKAYA  Toru OHKAWA  Toshiharu TAKARAMOTO  Toshio YAMADA  Masaaki SOUDA  Shigetaka KUMASHIRO  Tohru MOGAMI  Makoto NAGATA  

     
    PAPER-Electronic Circuits

    Vol: E95-C  No: 1  Page(s): 137-145

    In-place AC measurements of the signal gain and substrate sensitivity of differential pair transistors of an analog amplifier are combined with DC characterization of the threshold voltage (Vth) of the same transistors. An on-chip continuous time waveform monitoring technique enables in-place matrix measurements of differential pair transistors with a variety of channel sizes and geometry, allowing the wide coverage of experiments about the transistor-level physical layout dependency of substrate noise response. A prototype test structure uses a 90-nm CMOS technology and demonstrates the geometry-dependent variation of substrate sensitivity of transistors in operation.

  • A 65-nm CMOS Fully Integrated Shock-Wave Antenna Array with On-Chip Jitter and Pulse-Delay Adjustment for Millimeter-Wave Active Imaging Application

    Nguyen Ngoc MAI KHANH  Masahiro SASAKI  Kunihiro ASADA  

     
    PAPER-Device and Circuit Modeling and Analysis

    Vol: E94-A  No: 12  Page(s): 2554-2562

    This paper presents a 65-nm CMOS 8-antenna array transmitter operating in the 117–130-GHz range for short-range and portable millimeter-wave (mm-wave) active imaging applications. Each antenna element is a new on-chip antenna located on the top metal. Using an on-chip transformer, the pulse output of each resistor-less mm-wave pulse generator (PG) is sent to its integrated antenna. To adjust pulse delays for the purpose of pulse beam-forming, a 7-bit digitally programmable delay circuit (DPDC) is added to each PG. Moreover, in order to dynamically adjust pulse delays among the eight SW outputs, we implemented an on-chip jitter and relative skew measuring circuit with 20-bit digital output to obtain cumulative distribution (CDF) and probability density (PDF) functions, from which the DPDC input codes are decided to align the eight antennas' output pulses. Two measured radiation peaks after relative skew alignment are obtained at (θ; φ) angles of (-56; 0) and (+57; 0). Measurement results show that the beam-forming angles of the fully integrated antenna array can be adjusted by digital input codes and by the on-chip skew adjustment circuit for active imaging applications.

  • Flexible Test Scheduling for an Asynchronous On-Chip Interconnect through Special Data Transfer

    Tsuyoshi IWAGAKI  Eiri TAKEDA  Mineo KANEKO  

     
    PAPER-Logic Synthesis, Test and Verification

    Vol: E94-A  No: 12  Page(s): 2563-2570

    This paper proposes a test scheduling method for stuck-at faults in a CHAIN interconnect, which is an asynchronous on-chip interconnect architecture, with scan ability. Special data transfer, which is permitted only during test, is exploited to realize a more flexible test schedule than that of a conventional approach. Integer linear programming (ILP) models that take such special data transfer into account are developed according to the types of modules under test in a CHAIN interconnect. The resulting models are solved with an ILP solver. This framework can not only obtain optimal test schedules but also easily accommodate additional constraints such as a test power budget. Experimental results using benchmark circuits show that the proposed method can reduce test application time compared to that achieved by the conventional method.
