Keyword Search Result

[Keyword] on chip(40hit)

1-20hit(40hit)

  • Passage of Faulty Nodes: A Novel Approach for Fault-Tolerant Routing on NoCs

    Yota KUROKAWA  Masaru FUKUSHI  

     
    PAPER

      Vol:
    E102-A No:12
      Page(s):
    1702-1710

    This paper addresses the problem of developing an efficient fault-tolerant routing method for 2D mesh Network-on-Chips (NoCs) to realize dependable and high-performance many-core systems. Existing fault-tolerant routing methods have two critical problems: high communication latency and low node utilization. Unlike almost all existing methods, in which packets always detour around faulty nodes, we propose a novel approach in which packets can pass through faulty nodes. For this approach, we enhance the common NoC architecture by adding switches and links around each node and propose a fault-tolerant routing method with no virtual channels based on the well-known simple XY routing method. Simulation results show that the proposed method reduces average communication latency by about 97.1% compared with the existing method, without sacrificing fault-free nodes.
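
For orientation, the plain XY routing that the abstract names as its basis can be sketched as follows. This is a minimal illustration of the fault-free baseline only, not the proposed pass-through method; the `(x, y)` coordinate convention and the function name `xy_route` are assumptions made for this example.

```python
def xy_route(src, dst):
    """Return the nodes a packet visits after leaving src, under plain XY routing.

    Assumption: nodes are addressed as (x, y) tuples on a fault-free 2D mesh.
    """
    x, y = src
    dst_x, dst_y = dst
    path = []
    # Phase 1: travel along the X dimension until the column matches.
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    # Phase 2: travel along the Y dimension until the row matches.
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
```

Because every packet finishes its X movement before starting its Y movement, the route is deterministic and needs no routing tables, which is presumably why it is a common starting point for mesh NoC routing schemes.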

  • Hardware-Based Principal Component Analysis for Hybrid Neural Network Trained by Particle Swarm Optimization on a Chip

    Tuan Linh DANG  Yukinobu HOSHINO  

     
    PAPER-Neural Networks and Bioengineering

      Vol:
    E102-A No:10
      Page(s):
    1374-1382

    This paper presents a hybrid architecture for a neural network (NN) trained by a particle swarm optimization (PSO) algorithm. The NN is implemented on the hardware side while the PSO is executed by a processor on the software side. In addition, principal component analysis (PCA) is applied to reduce correlated information. The PCA module is implemented in hardware using SystemVerilog to increase operating speed. Experimental results showed that the proposed architecture had been successfully implemented. In addition, the hardware-based NN trained by PSO (NN-PSO) was faster than the software-based NN trained by PSO. The proposed NN-PSO with PCA also obtained better recognition rates than the NN-PSO without PCA.

  • Waffle: A New Photonic Plasmonic Router for Optical Network on Chip

    Chao TANG  Huaxi GU  Kun WANG  

     
    LETTER-Computer System

      Publicized:
    2018/05/29
      Vol:
    E101-D No:9
      Page(s):
    2401-2403

    Optical interconnect is a promising candidate for network on chip. As the key element in the network on chip, the routers greatly affect the performance of the whole system. In this letter, we propose a new router architecture, Waffle, based on compact 2×2 hybrid photonic-plasmonic switching elements. An optimized architecture, Waffle-XY, is also designed for networks employing the XY routing algorithm. Both Waffle and Waffle-XY are strictly non-blocking architectures and can be employed in the popular mesh-like networks. Theoretical analysis shows that Waffle and Waffle-XY perform better than several representative routers.

  • An 18 µW Spur Cancelled Clock Generator for Recovering Receiver Sensitivity in Wireless SoCs

    Yosuke OGASAWARA  Ryuichi FUJIMOTO  Tsuneo SUZUKI  Kenichi SAMI  

     
    PAPER

      Vol:
    E100-C No:6
      Page(s):
    529-538

    A novel spur cancelled clock generator (SCCG) capable of recovering RX sensitivity degradations caused by digital clocks in wireless SoCs is presented. Clock spurs that degrade RX sensitivities are canceled by applying the SCCG to digital circuits or ADCs. The SCCG is integrated into a Bluetooth Low Energy (BLE) SoC fabricated in a 65 nm CMOS process. A measured clock spur reduction of 34 dB and an RX sensitivity recovery of 5 dB are achieved by the proposed SCCG. The power consumption and occupied area of the SCCG are only 18 µW and 40 µm × 120 µm, respectively.

  • Novel Chip Stacking Methods to Extend Both Horizontally and Vertically for Many-Core Architectures with ThruChip Interface

    Hiroshi NAKAHARA  Tomoya OZAKI  Hiroki MATSUTANI  Michihiro KOIBUCHI  Hideharu AMANO  

     
    PAPER-Architecture

      Publicized:
    2016/08/24
      Vol:
    E99-D No:12
      Page(s):
    2871-2880

    The recent increase in non-recurring engineering costs (design, mask and test costs) has made large Systems-on-Chip (SoCs) difficult to develop, especially with advanced technology. We explore an approach to cheap and flexible chip stacking using the inductive-coupling ThruChip Interface (TCI). In order to connect a large number of small chips for building a large-scale system, novel chip stacking methods called linear stacking and staggered stacking are proposed. They enable the system to be extended in the x and/or y dimensions, not only in the z dimension. Here, a novel chip stacking layout and its deadlock-free routing design are shown for both single-core chips and multi-core chips. The network with 256 nodes formed by the proposed stacking improves latency by 13.8% and the performance of the NAS Parallel Benchmarks by 5.4% on average compared with a 2D mesh.

  • Vertical Link On/Off Regulations for Inductive-Coupling Based Wireless 3-D NoCs

    Hao ZHANG  Hiroki MATSUTANI  Yasuhiro TAKE  Tadahiro KURODA  Hideharu AMANO  

     
    PAPER-Computer System

      Vol:
    E96-D No:12
      Page(s):
    2753-2764

    We propose low-power techniques for wireless three-dimensional Network-on-Chips (wireless 3-D NoCs), in which the connections among routers on the same chip are wired while the routers on different chips are connected wirelessly using inductive coupling. The proposed low-power techniques stop the clock and power supplies to the transmitter of the wireless vertical links only when their utilizations are higher than the threshold. Meanwhile, the whole wireless vertical link will be shut down when the utilization is lower than the threshold in order to reduce the power consumption of wireless 3-D NoCs. This paper uses an on-demand method, in which the dormant data transmitter or the whole vertical link is activated as soon as a flit arrives. Full-system many-core simulations using power parameters derived from a real chip implementation show that the proposed low-power techniques reduce the power consumption by 23.4%-29.3%, while the performance overhead is less than 2.4%.

  • A Standard-Cell Based On-Chip NMOS and PMOS Performance Monitor for Process Variability Compensation

    Toshiyuki YAMAGISHI  Tatsuo SHIOZAWA  Koji HORISAKI  Hiroyuki HARA  Yasuo UNEKAWA  

     
    PAPER

      Vol:
    E96-C No:6
      Page(s):
    894-902

    A completely digital, on-chip performance monitor is proposed in this paper. In addition to a traditional ring oscillator, the proposed monitor has a special buffer chain whose output duty ratio is emphasized by the difference between NMOS and PMOS performances. Thus the performances of NMOS and PMOS transistors can be accurately estimated independently. By using only standard cells, the monitor achieves a small occupied area and process portability. To demonstrate the accuracy of performance estimation and the usability of the monitor, we have fabricated the proposed monitor using a 90 nm CMOS process. The estimation errors of the drain saturation current of NMOS and PMOS transistors are 2.0% and 3.4%, respectively. A D/A converter has also been fabricated to verify the usability of the proposed monitor. The output amplitude variation of the D/A converter is successfully reduced to 50.0% by the calibration using the proposed monitor.

  • Floorplanning and Topology Synthesis for Application-Specific Network-on-Chips

    Wei ZHONG  Song CHEN  Bo HUANG  Takeshi YOSHIMURA  Satoshi GOTO  

     
    PAPER

      Vol:
    E96-A No:6
      Page(s):
    1174-1184

    Application-Specific Network-on-Chips (ASNoCs) have been proposed as a more promising solution than regular NoCs to the global communication challenges for particular applications in nanoscale System-on-Chip (SoC) designs. In ASNoC Design, one of the key challenges is to generate the most suitable and power efficient NoC topology under the constraints of the application specification. In this work, we present a two-step floorplanning (TSF) algorithm, integrating topology synthesis into floorplanning phase, to automate the synthesis of such ASNoC topologies. At the first-step floorplanning, during the simulated annealing, we explore the optimal positions and clustering of cores and implement an incremental path allocation algorithm to predictively evaluate the power consumption of the generated NoC topology. At the second-step floorplanning, we explore the optimal positions of switches and network interfaces on the floorplan. A power and timing aware path allocation algorithm is also integrated into this step to determine the connectivity across different switches. Experimental results on a variety of benchmarks show that our algorithm can produce greatly improved solutions over the latest works.

  • Energy- and Traffic-Balance-Aware Mapping Algorithm for Network-on-Chip

    Zhi DENG  Huaxi GU  Yingtang YANG  Hua YOU  

     
    LETTER-Computer System

      Vol:
    E96-D No:3
      Page(s):
    719-722

    In this paper, an energy- and traffic-balance-aware mapping algorithm from IP cores to nodes in a network is proposed for application-specific Network-on-Chip (NoC). The multi-objective optimization model is set up by considering the NoC architecture, and addressed by the proposed mapping algorithm, which decomposes the mapping optimization into a number of scalar subproblems that are handled simultaneously. To demonstrate the performance of the proposed algorithm, an application-specific benchmark is used in the simulation. The experimental results demonstrate that the algorithm has advantages in energy consumption and traffic balance over other algorithms.
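
To make the two objectives concrete, the sketch below scores a candidate core-to-node mapping. It is an illustration under stated assumptions, not the letter's model: energy is approximated by traffic volume times hop count, traffic balance by the load on the busiest link, and packets are assumed to follow XY routing on a 2D mesh. The names `xy_path` and `evaluate_mapping` are hypothetical.

```python
from collections import defaultdict


def xy_path(src, dst):
    """Nodes visited from src to dst, X dimension first, then Y (assumed routing)."""
    x, y = src
    nodes = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        nodes.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        nodes.append((x, y))
    return nodes


def evaluate_mapping(traffic, mapping):
    """Return (energy proxy, busiest-link load) for one mapping.

    traffic: {(src_core, dst_core): volume}
    mapping: {core: (x, y) mesh node}
    """
    total_cost = 0
    link_load = defaultdict(int)
    for (src_core, dst_core), volume in traffic.items():
        nodes = xy_path(mapping[src_core], mapping[dst_core])
        # Energy proxy (assumption): volume multiplied by the number of hops.
        total_cost += volume * (len(nodes) - 1)
        # Accumulate the volume on every directed link along the route.
        for link in zip(nodes, nodes[1:]):
            link_load[link] += volume
    return total_cost, max(link_load.values(), default=0)


if __name__ == "__main__":
    traffic = {("a", "b"): 10, ("b", "c"): 5, ("a", "c"): 2}
    mapping = {"a": (0, 0), "b": (1, 0), "c": (1, 1)}
    print(evaluate_mapping(traffic, mapping))
```

A multi-objective mapper would compare many candidate mappings on both numbers rather than collapsing them into a single score up front.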

  • 60 GHz Millimeter-Wave CMOS Integrated On-Chip Open Loop Resonator Bandpass Filters on Patterned Ground Shields

    Ramesh K. POKHAREL  Xin LIU  Dayang A.A. MAT  Ruibing DONG  Haruichi KANAYA  Keiji YOSHIDA  

     
    PAPER-Microwaves, Millimeter-Waves

      Vol:
    E96-C No:2
      Page(s):
    270-276

    This paper presents the design of a second-order and a fourth-order bandpass filter (BPF) for 60 GHz millimeter-wave applications in 0.18 µm CMOS technology. The proposed on-chip BPFs employ the folded open loop structure designed on pattern ground shields. The adoption of a folded structure and utilization of multiple transmission zeros in the stopband permit the compact size and high selectivity for the BPF. Moreover, the pattern ground shields obviously slow down the guided waves which enable further reduction in the physical length of the resonator, and this, in turn, results in improvement of the insertion losses. A very good agreement between the electromagnetic (EM) simulations and measurement results has been achieved. As a result, the second-order BPF has the center frequency of 57.5 GHz, insertion loss of 2.77 dB, bandwidth of 14 GHz, return loss less than 27.5 dB and chip size of 650 µm × 810 µm (including bonding pads) while the fourth-order BPF has the center frequency of 57 GHz, insertion loss of 3.06 dB, bandwidth of 12 GHz, return loss less than 30 dB with chip size of 905 µm × 810 µm (including bonding pads).

  • A Hybrid Photonic Burst-Switched Interconnection Network for Large-Scale Manycore System

    Quanyou FENG  Huanzhong LI  Wenhua DOU  

     
    PAPER-Computer Architecture

      Vol:
    E95-D No:12
      Page(s):
    2908-2918

    With the trend towards increasing number of cores, for example, 1000 cores, interconnection network in manycore chips has become the critical bottleneck for providing communication infrastructures among on-chip cores as well as to off-chip memory. However, conventional on-chip mesh topologies do not scale up well because remote cores are generally separated by too many hops due to the small-radix routers within these networks. Moreover, projected scaling of electrical processor-memory network appears unlikely to meet the enormous demand for memory bandwidth while satisfying stringent power budget. Fortunately, recent advances in 3D integration technology and silicon photonics have provided potential solutions to these challenges. In this paper, we propose a hybrid photonic burst-switched interconnection network for large-scale manycore processors. We embed an electric low-diameter flattened butterfly into 3D stacking layers using integer linear programming, which results in a scalable low-latency network for inter-core packets exchange. Furthermore, we use photonic burst switching (PBS) for processor-memory network. PBS is an adaptation of optical burst switching for chip-scale communication, which can significantly improve the power efficiency by leveraging sub-wavelength, bandwidth-efficient optical switching. Using our physically-accurate network-level simulation environment, we examined the system feasibility and performances. Simulation results show that our hybrid network achieves up to 25% of network latency reduction and up to 6 times energy savings, compared to conventional on-chip mesh network and optical circuit-switched memory access scheme.

  • Cluster Generation and Network Component Insertion for Topology Synthesis of Application-Specific Network-on-Chips

    Wei ZHONG  Takeshi YOSHIMURA  Bei YU  Song CHEN  Sheqin DONG  Satoshi GOTO  

     
    PAPER

      Vol:
    E95-C No:4
      Page(s):
    534-545

    Network-on-Chips (NoCs) have been proposed as a solution for addressing the global communication challenges in System-on-Chip (SoC) architectures that are implemented in nanoscale technologies. For the use of NoCs to be feasible in today's industrial designs, a custom-tailored, power-efficient NoC topology that satisfies the application characteristics is required. In this work, we present a design methodology that automates the synthesis of such application-specific NoC topologies. We present a method which integrates partitioning into the floorplanning phase to explore optimal clustering of cores during floorplanning with minimized link and switch power consumption. Based on the size of applications, we also present an Integer Linear Programming method and a heuristic method to place switches and network interfaces on the floorplan. Then, a power- and timing-aware path allocation algorithm is carried out to determine the connectivity across different switches. We perform experiments on several SoC benchmarks and present a comparison with the latest work. For small applications, the NoC topologies synthesized by our method show large improvements in power consumption (27.54%), hop count (4%) and running time (66%) on average. For large applications, the synthesized topologies show large improvements in power consumption (31.77%), hop count (29%) and running time (94.18%) on average.

  • Evaluation of SRAM-Core Susceptibility against Power Supply Voltage Variation

    Takuya SAWADA  Taku TOSHIKAWA  Kumpei YOSHIKAWA  Hidehiro TAKATA  Koji NII  Makoto NAGATA  

     
    PAPER

      Vol:
    E95-C No:4
      Page(s):
    586-593

    The susceptibility of a static random access memory (SRAM) core against static and dynamic variation of power supply voltage is evaluated, by using on-chip diagnosis structures of memory built-in self testing (MBIST) and on-chip voltage waveform monitoring (OCM). The SRAM core of interest in this paper is a synthesizable version applicable to general systems-on-a-chip (SoC) design, and fabricated in a 90 nm CMOS technology. RF power injection to power supply networks is quantified by OCM. The number of resultant erroneous bits as well as their distribution in the cell array is given by MBIST. The frequency-dependent sensitivity reflects the highly capacitive nature of densely integrated SRAM cells.

  • Estimation of Nb Junction Temperature Raised Due to Thermal Heat from Bias Resistor

    Keisuke KUROIWA  Masaki KADOWAKI  Masataka MORIYA  Hiroshi SHIMADA  Yoshinao MIZUGAKI  

     
    PAPER

      Vol:
    E95-C No:3
      Page(s):
    355-359

    Superconducting integrated circuits should be operated at temperatures below half of their critical temperatures. Heat from a bias resistor could raise the temperature in Josephson junctions and would reduce their critical currents. In this study, we estimate the temperature in a Josephson junction heated by a bias resistor at the bath temperature of 4.2 K, and introduce a parameter β that connects the heat from a bias resistor and the temperature elevation of a Josephson junction. By using β, the temperature in the Josephson junction can be estimated as a function of the current through the resistor.

  • Performance-Aware Hybrid Algorithm for Mapping IPs onto Mesh-Based Network on Chip

    Guang SUN  Shijun LIN  Depeng JIN  Yong LI  Li SU  Yuanyuan ZHANG  Lieguang ZENG  

     
    PAPER-Computer System

      Vol:
    E94-D No:5
      Page(s):
    1000-1007

    Network on Chip (NoC) is proposed as a new intra-chip communication infrastructure. In current NoC design, one related problem is mapping IP cores onto NoC architectures. In this paper, we propose a performance-aware hybrid algorithm (PHA) for mesh-based NoC to optimize performance indexes such as latency, energy consumption and maximal link bandwidth. The PHA is a hybrid algorithm, which integrates the advantages of Greedy Algorithm, Genetic Algorithm and Simulated Annealing Algorithm. In the PHA, there are three features. First, it generates a fine initial population efficiently in a greedy swap way. Second, effective global parallel search is implemented by genetic operations such as crossover and mutation, which are implemented with adaptive probabilities according to the diversity of population. Third, probabilistic acceptance of a worse solution using simulated annealing method greatly improves the performance of local search. Compared with several previous mapping algorithms such as MOGA and TGA, simulation results show that our algorithm enhances the performance by 30.7%, 23.1% and 25.2% in energy consumption, latency and maximal link bandwidth respectively. Moreover, simulation results demonstrate that our PHA approach has the highest convergence speed among the three algorithms. These results show that our proposed mapping algorithm is more effective and efficient.
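
The "probabilistic acceptance of a worse solution" mentioned in the abstract is commonly realized with the Metropolis criterion. The sketch below shows that standard rule as an assumption; the abstract does not state which acceptance function or cooling schedule PHA uses, and the function names here are hypothetical.

```python
import math
import random


def acceptance_probability(delta, temperature):
    """Metropolis rule: delta is (new cost - current cost), lower cost is better."""
    if delta <= 0:
        # An equal or better solution is always accepted.
        return 1.0
    # A worse solution is accepted with a probability that shrinks as the
    # cost increase grows or as the temperature falls.
    return math.exp(-delta / temperature)


def accept(delta, temperature, rng=random):
    """Decide whether to move to the candidate solution."""
    return rng.random() < acceptance_probability(delta, temperature)


if __name__ == "__main__":
    rng = random.Random(0)
    for temperature in (10.0, 1.0, 0.1):
        p = acceptance_probability(1.0, temperature)
        print(temperature, round(p, 4), accept(1.0, temperature, rng))
```

At high temperature the search accepts most worse moves and can leave local optima; as the temperature falls it behaves increasingly like a greedy local search.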

  • Power Minimization for Dual- and Triple-Supply Digital Circuits via Integer Linear Programming

    Ki-Yong AHN  Chong-Min KYUNG  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E92-A No:9
      Page(s):
    2318-2325

    This paper proposes an Integer Linear Programming (ILP)-based power minimization method that partitions a circuit into regions, first with three different VDDs (PM3V) and second with two different VDDs (PM2V). To reduce the solving time of the triple-VDD case (PM3V), we also propose a partitioned ILP method (p-PM3V). The proposed method provides 29% power saving on average in the triple-VDD case compared to the single-VDD case. The power reduction of PM3V compared to Clustered Voltage Scaling (CVS) was about 18%. Compared to the unpartitioned ILP formulation (PM3V), the partitioned ILP method (p-PM3V) reduced the total solution time by 46% at the cost of additional power consumption within 1.3%.

  • Design of an Area-Efficient and Low-Power Hierarchical NoC Architecture Based on Circuit Switching

    Woo Joo KIM  Sung Hee LEE  Sun Young HWANG  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E92-A No:3
      Page(s):
    890-899

    This paper presents a hierarchical NoC architecture to support GT (Guaranteed Throughput) signals to process multimedia data in embedded systems. The architecture provides a communication environment that meets the diverse conditions of communication constraints among IPs in power and area. With a system based on packet switching, which requires storage/control circuits to support GT signals, it is hard to satisfy design constraints in area, scalability and power consumption. This paper proposes a hierarchical 4×4×4 mesh-type NoC architecture based on circuit switching, which is capable of processing GT signals requiring high throughput. The proposed NoC architecture shows reduction in area by 50.2% and in power consumption by 57.4% compared with the conventional NoC architecture based on circuit switching. These reductions amount to 72.4% and 86.1%, respectively, when compared with an NoC architecture based on packet switching. The proposed NoC architecture operates at a maximum throughput of 19.2 Gb/s.

  • Design of an Area-Efficient and Low-Power NoC Architecture Using a Hybrid Network Topology

    Woo Joo KIM  Sun Young HWANG  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E91-A No:11
      Page(s):
    3297-3303

    This paper proposes a novel hybrid NoC structure and a dynamic job distribution algorithm which can reduce system area and power consumption by reducing packet drop rate for various multimedia applications. The proposed NoC adopts different network structures between sub-clusters. Network structure is determined by profiling application program so that packet drop rate can be minimized. The proposed job distribution algorithm assigns every job to the sub-cluster where packet drop rate can be minimized for each multimedia application program. The proposed scheme targets multimedia applications frequently used in modern embedded systems, such as MPEG4 and MP3 decoders, GPS positioning systems, and OFDM demodulators. Experimental results show that packet drop rate was reduced by 31.6% on the average, when compared to complex network structure topologies consisting of sub-clusters of same topology. Chip area and power consumption were reduced by 16.0% and 34.0%, respectively.

  • An Image-Moment Sensor with Variable-Length Pipeline Structure

    Atsushi IWASHITA  Takashi KOMURO  Masatoshi ISHIKAWA  

     
    PAPER-Image Sensor/Vision Chip

      Vol:
    E90-C No:10
      Page(s):
    1876-1883

    A 128×128 pixel functional image sensor was implemented. The sensor was able to capture images at 1,000 frame/s and extract the sizes and positions of 10 objects/frame when clocked at 8 MHz. The size of each pixel was 18 µm × 18 µm and the fill factor was 28%. The chip, 3.24 mm × 3.48 mm in size, was implemented with a 0.35 µm CMOS sensor process; the power consumption was 29.7 mW at 8 MHz.
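
As background on how image moments yield an object's size and position, the sketch below computes the zeroth and first moments of a binary image in software. It uses the standard moment definitions and is not a description of the sensor's pipeline; the function names and the list-of-rows image format are assumptions for this example.

```python
def image_moments(image):
    """Return (m00, m10, m01) for an image given as a list of pixel rows."""
    m00 = m10 = m01 = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            m00 += value        # zeroth moment: total mass (area if binary)
            m10 += x * value    # first moment along x
            m01 += y * value    # first moment along y
    return m00, m10, m01


def size_and_centroid(image):
    """Size is m00; position is the centroid (m10 / m00, m01 / m00)."""
    m00, m10, m01 = image_moments(image)
    if m00 == 0:
        # No object pixels, so the position is undefined.
        return 0, None
    return m00, (m10 / m00, m01 / m00)


if __name__ == "__main__":
    blob = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    print(size_and_centroid(blob))
```

Each moment is a plain sum over pixels, so the accumulation can be distributed across pixels or pipeline stages, which makes moments a natural fit for on-sensor computation.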

  • Column-Parallel Vision Chip Architecture for High-Resolution Line-of-Sight Detection Including Saccade

    Junichi AKITA  Hiroaki TAKAGI  Keisuke DOUMAE  Akio KITAGAWA  Masashi TODA  Takeshi NAGASAKI  Toshio KAWASHIMA  

     
    PAPER-Image Sensor/Vision Chip

      Vol:
    E90-C No:10
      Page(s):
    1869-1875

    Although the line-of-sight (LoS) is expected to be useful as an input methodology for computer systems, the application area of the conventional LoS detection system, composed of a video camera and an image processor, is restricted to specialized areas such as academic research because of its large size and high cost. Our eye motion includes a rapid movement, the so-called 'saccade', which is expected to be useful for various applications. Because of the saccade's very high speed, it is impossible to track the saccade without using a high-speed camera. The authors have been proposing a high-speed vision chip for LoS detection including saccade based on the pixel-parallel processing architecture; however, its resolution is very low because of the large size of its pixels. In this paper, we propose and discuss an architecture of the vision chip for LoS detection including saccade based on column-parallel processing, which increases the resolution while keeping a high processing speed.
