The search functionality is under construction.

Keyword Search Result

[Keyword] network-on-chip(36hit)

1-20hit(36hit)

  • Hybrid, Asymmetric and Reconfigurable Input Unit Designs for Energy-Efficient On-Chip Networks

    Xiaoman LIU  Yujie GAO  Yuan HE  Xiaohan YUE  Haiyan JIANG  Xibo WANG  

     
    PAPER

      Pubricized:
    2023/04/10
      Vol:
    E106-C No:10
      Page(s):
    570-579

    The complexity and scale of Networks-on-Chip (NoCs) are growing as more processing elements and memory devices are implemented on chips. However, under strict power budgets, it is also critical to lower the power consumption of NoCs for the sake of energy efficiency. In this paper, we therefore present three novel input unit designs for on-chip routers attempting to shrink their power consumption while still conserving the network performance. The key idea behind our designs is to organize buffers in the input units with characteristics of the network traffic in mind; as in our observations, only a small portion of the network traffic are long packets (composed of multiple flits), which means, it is fair to implement hybrid, asymmetric and reconfigurable buffers so that they are mainly targeting at short packets (only having a single flit), hence the smaller power consumption and area overhead. Evaluations show that our hybrid, asymmetric and reconfigurable input unit designs can achieve an average reduction of energy consumption per flit by 45%, 52.3% and 56.2% under 93.6% (for hybrid designs) and 66.3% (for asymmetric and reconfigurable designs) of the original router area, respectively. Meanwhile, we only observe minor degradation in network latency (ranging from 18.4% to 1.5%, on average) with our proposals.

  • A Compression Router for Low-Latency Network-on-Chip

    Naoya NIWA  Yoshiya SHIKAMA  Hideharu AMANO  Michihiro KOIBUCHI  

     
    PAPER-Computer System

      Pubricized:
    2022/11/08
      Vol:
    E106-D No:2
      Page(s):
    170-180

    Network-on-Chips (NoCs) are important components for scalable many-core processors. Because the performance of parallel applications is usually sensitive to the latency of NoCs, reducing it is a primary requirement. In this study, a compression router that hides the (de)compression-operation delay is proposed. The compression router (de)compresses the contents of the incoming packet before the switch arbitration is completed, thus shortening the packet length without latency penalty and reducing the network injection-and-ejection latency. Evaluation results show that the compression router improves up to 33% of the parallel application performance (conjugate gradients (CG), fast Fourier transform (FT), integer sort (IS), and traveling salesman problem (TSP)) and 63% of the effective network throughput by 1.8 compression ratio on NoC. The cost is an increase in router area and its energy consumption by 0.22mm2 and 1.6 times compared to the conventional virtual-channel router. Another finding is that off-loading the decompressor onto a network interface decreases the compression-router area by 57% at the expense of the moderate increase in communication latency.

  • Simple Oblivious Routing Method to Balance Load in Network-on-Chip

    Jiao GUAN  Jueping CAI  Ruilian XIE  Yequn WANG  Jinzhi LAI  

     
    LETTER-Computer System

      Pubricized:
    2021/06/30
      Vol:
    E104-D No:10
      Page(s):
    1749-1752

    This letter presents an oblivious and load-balanced routing (OLBR) method without virtual channels for 2D mesh Network-on-chip (NoC). To balance the traffic load of network and avoid deadlock, OLBR divides network nodes into two regions, one region contains the nodes of east and west sides of NoC, in which packets are routed by odd-even turn rule with Y direction preference (OE-YX), and the remaining nodes are divided to the other region, in which packets are routed by odd-even turn rule with alterable priority arbitration (OE-APA). Simulation results show that OLBR's saturation throughput can be improved than related works by 11.73% and OLBR balances the traffic load over entire network.

  • DORR: A DOR-Based Non-Blocking Optical Router for 3D Photonic Network-on-Chips

    Meaad FADHEL  Huaxi GU  Wenting WEI  

     
    PAPER-Computer System

      Pubricized:
    2021/01/27
      Vol:
    E104-D No:5
      Page(s):
    688-696

    Recently, researchers paid more attention on designing optical routers, since they are essential building blocks of all photonic interconnection architectures. Thus, improving them could lead to a spontaneous improvement in the overall performance of the network. Optical routers suffer from the dilemma of increased insertion loss and crosstalk, which upraises the power consumed as the network scales. In this paper, we propose a new 7×7 non-blocking optical router based on the Dimension Order Routing (DOR) algorithm. Moreover, we develop a method that can ensure the least number of MicroRing Resonators (MRRs) in an optical router. Therefore, by reducing these optical devices, the optical router proposed can decrease the crosstalk and insertion loss of the network. This optical router is evaluated and compared to Ye's router and the optimized crossbar for 3D Mesh network that uses XYZ routing algorithm. Unlike many other proposed routers, this paper evaluates optical routers not only from router level prospective yet also consider the overall network level condition. The appraisals show that our optical router can reduce the worst-case network insertion loss by almost 8.7%, 46.39%, 39.3%, and 41.4% compared to Ye's router, optimized crossbar, optimized universal OR, and Optimized VOTEX, respectively. Moreover, it decreases the Optical Signal-to-Noise Ratio (OSNR) worst-case by almost 27.92%, 88%, 77%, and 69.6% compared to Ye's router, optimized crossbar, optimized universal OR, and Optimized VOTEX, respectively. It also reduces the power consumption by 3.22%, 23.99%, 19.12%, and 20.18% compared to Ye's router, optimized crossbar, optimized universal OR, and Optimized VOTEX, respectively.

  • LEF: An Effective Routing Algorithm for Two-Dimensional Meshes

    Thiem Van CHU  Kenji KISE  

     
    PAPER-Computer System

      Pubricized:
    2019/07/09
      Vol:
    E102-D No:10
      Page(s):
    1925-1941

    We design a new oblivious routing algorithm for two-dimensional mesh-based Networks-on-Chip (NoCs) called LEF (Long Edge First) which offers high throughput with low design complexity. LEF's basic idea comes from conventional wisdom in choosing the appropriate dimension-order routing (DOR) algorithm for supercomputers with asymmetric mesh or torus interconnects: routing longest dimensions first provides better performance than other strategies. In LEF, we combine the XY DOR and the YX DOR. When routing a packet, which DOR algorithm is chosen depends on the relative position between the source node and the destination node. Decisions of selecting the appropriate DOR algorithm are not fixed to the network shape but instead made on a per-packet basis. We also propose an efficient deadlock avoidance method for LEF in which the use of virtual channels is more flexible than in the conventional method. We evaluate LEF against O1TURN, another effective oblivious routing algorithm, and a minimal adaptive routing algorithm based on the odd-even turn model. The evaluation results show that LEF is particularly effective when the communication is within an asymmetric mesh. In a 16×8 NoC, LEF even outperforms the adaptive routing algorithm in some cases and delivers from around 4% up to around 64.5% higher throughput than O1TURN. Our results also show that the proposed deadlock avoidance method helps to improve LEF's performance significantly and can be used to improve O1TURN's performance. We also examine LEF in large-scale NoCs with thousands of nodes. Our results show that, as the NoC size increases, the performance of the routing algorithms becomes more strongly influenced by the resource allocation policy in the network and the effect is different for each algorithm. This is evident in that results of middle-scale NoCs with around 100 nodes cannot be applied directly to large-scale NoCs.

  • An Optimized Low-Power Optical Memory Access Network for Kilocore Systems

    Tao LIU  Huaxi GU  Yue WANG  Wei ZOU  

     
    LETTER-Computer System

      Pubricized:
    2019/02/04
      Vol:
    E102-D No:5
      Page(s):
    1085-1088

    An optimized low-power optical memory access network is proposed to alleviate the cost of microring resonators (MRs) in kilocore systems, such as the pass-by loss and integration difficulty. Compared with traditional electronic bus interconnect, the proposed network reduces power consumption and latency by 80% to 89% and 21% to 24%. Moreover, the new network decreases the number of MRs by 90.6% without an increase in power consumption and latency when making a comparison with Optical Ring Network-on-Chip (ORNoC).

  • BMM: A Binary Metaheuristic Mapping Algorithm for Mesh-Based Network-on-Chip

    Xilu WANG  Yongjun SUN  Huaxi GU  

     
    LETTER-Fundamentals of Information Systems

      Pubricized:
    2018/11/26
      Vol:
    E102-D No:3
      Page(s):
    628-631

    The mapping optimization problem in Network-on-Chip (NoC) is constraint and NP-hard, and the deterministic algorithms require considerable computation time to find an exact optimal mapping solution. Therefore, the metaheuristic algorithms (MAs) have attracted great interests of researchers. However, most MAs are designed for continuous problems and suffer from premature convergence. In this letter, a binary metaheuristic mapping algorithm (BMM) with a better exploration-exploitation balance is proposed to solve the mapping problem. The binary encoding is used to extend the MAs to the constraint problem and an adaptive strategy is introduced to combine Sine Cosine Algorithm (SCA) and Particle Swarm Algorithm (PSO). SCA is modified to explore the search space effectively, while the powerful exploitation ability of PSO is employed for the global optimum. A set of well-known applications and large-scale synthetic cores-graphs are used to test the performance of BMM. The results demonstrate that the proposed algorithm can improve the energy consumption more significantly than some other heuristic algorithms.

  • The Performance Evaluation of a 3D Torus Network Using Partial Link-Sharing Method in NoC Router Buffer

    Naohisa FUKASE  Yasuyuki MIURA  Shigeyoshi WATANABE  M.M. HAFIZUR RAHMAN  

     
    PAPER-Computer System

      Pubricized:
    2017/06/30
      Vol:
    E100-D No:10
      Page(s):
    2478-2492

    The high performance network-on-chip (NoC) router using minimal hardware resources to minimize the layout area is very essential for NoC design. In this paper, we have proposed a memory sharing method of a wormhole routed NoC architecture to alleviate the area overhead of a NoC router. In the proposed method, a memory is shared by multiple physical links by using a multi-port memory. In this paper, we have proposed a partial link-sharing method and evaluated the communication performance using the proposed method. It is revealed that the resulted communication performance by the proposed methods is higher than that of the conventional method, and the progress ratio of the 3D-torus network is higher than that of 2D-torus network. It is shown that the improvement of communication performance using partial link sharing method is achieved with slightly increase of hardware cost.

  • AXI-NoC: High-Performance Adaptation Unit for ARM Processors in Network-on-Chip Architectures

    Xuan-Tu TRAN  Tung NGUYEN  Hai-Phong PHAN  Duy-Hieu BUI  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E100-A No:8
      Page(s):
    1650-1660

    The increasing demand on scalability and reusability of system-on-chip design as well as the decoupling between computation and communication has motivated the growth of the Network-on-Chip (NoC) paradigm in the last decade. In NoC-based systems, the computational resources (i.e. IPs) communicate with each other using a network infrastructure. Many works have focused on the development of NoC architectures and routing mechanisms, while the interfacing between network and associated IPs also needs to be considered. In this paper, we present a novel efficient AXI (AMBA eXtensible Interface) compliant network adapter for NoC architectures, which is named an AXI-NoC adapter. The proposed network adapter achieves high communication throughput of 20.8Gbits/s and consumes 4.14mW at the operating frequency of 650MHz. It has a low area footprint (952 gates, approximate to 2,793µm2 with CMOS 45nm technology) thanks to its effective hybrid micro-architectures and with zero latency thanks to the proposed mux-selection method.

  • Low-Cost Adaptive and Fault-Tolerant Routing Method for 2D Network-on-Chip

    Ruilian XIE  Jueping CAI  Xin XIN  Bo YANG  

     
    LETTER-Computer System

      Pubricized:
    2017/01/20
      Vol:
    E100-D No:4
      Page(s):
    910-913

    This letter presents a Preferable Mad-y (PMad-y) turn model and Low-cost Adaptive and Fault-tolerant Routing (LAFR) method that use one and two virtual channels along the X and Y dimensions for 2D mesh Network-on-Chip (NoC). Applying PMad-y rules and using the link status of neighbor routers within 2-hops, LAFR can tolerate multiple faulty links and routers in more complicated faulty situations and impose the reliability of network without losing the performance of network. Simulation results show that LAFR achieves better saturation throughput (0.98% on average) than those of other fault-tolerant routing methods and maintains high reliability of more than 99.56% on average. For achieving 100% reliability of network, a Preferable LAFR (PLAFR) is proposed.

  • Multi-Voltage Variable Pipeline Routers with the Same Clock Frequency for Low-Power Network-on-Chips Systems

    Akram BEN AHMED  Hiroki MATSUTANI  Michihiro KOIBUCHI  Kimiyoshi USAMI  Hideharu AMANO  

     
    PAPER

      Vol:
    E99-C No:8
      Page(s):
    909-917

    In this paper, the Multi-voltage (multi-Vdd) variable pipeline router is proposed to reduce the power consumption of Network-on-Chips (NoCs) designed for Chip Multi-processors (CMPs). The multi-Vdd variable pipeline router adjusts its pipeline depth (i.e., communication latency) and supply voltage level in response to the applied workload. Unlike Dynamic Voltage and Frequency Scaling (DVFS) routers, the operating frequency remains the same for all routers throughout the CMP; thus, omitting the need to synchronize neighboring routers working at different frequencies. Two types of router architectures are presented: a Coarse-Grained Variable Pipeline (CG-VP) router that changes the voltage supplied to the entire router, and a Fine-Grained Variable Pipeline (FG-VP) router that uses a finer power partition. The evaluation results showed that the CG-VP and FG-VP routers achieve a 22.9% and 35.3% power reduction on average with 14% and 23% area overhead in comparison with a baseline router without variable pipelines, respectively. Thanks to the adopted look-ahead mechanism to switch the supply voltage, the performance overhead is only 4.4%.

  • An Efficient Highly Adaptive and Deadlock-Free Routing Algorithm for 3D Network-on-Chip

    Lian ZENG  Tieyuan PAN  Xin JIANG  Takahiro WATANABE  

     
    PAPER

      Vol:
    E99-A No:7
      Page(s):
    1334-1344

    As the semiconductor technology continues to develop, hundreds of cores will be deployed on a single die in the future Chip-Multiprocessors (CMPs) design. Three-Dimensional Network-on-Chips (3D NoCs) has become an attractive solution which can provide impressive high performance. An efficient and deadlock-free routing algorithm is a critical to achieve the high performance of network-on-chip. Traditional methods based on deterministic and turn model are deadlock-free, but they are unable to distribute the traffic loads over the network. In this paper, we propose an efficient, adaptive and deadlock-free algorithm (EAR) based on a novel routing selection strategy in 3D NoC, which can distribute the traffic loads not only in intra-layers but also in inter-layers according to congestion information and path diversity. Simulation results show that the proposed method achieves the significant performance improvement compared with others.

  • A Fast Hierarchical Arbitration in Optical Network-on-Chip Based on Multi-Level Priority QoS

    Jie JIAN  Mingche LAI  Liquan XIAO  

     
    PAPER-Fiber-Optic Transmission for Communications

      Vol:
    E99-B No:4
      Page(s):
    875-884

    With the development of silicon-based Nano-photonics, Optical Network on Chip (ONoC) is, due to its high bandwidth and low latency, becoming an important choice for future multi-core networks. As a key ONoC technology, the arbitration scheme should provide differential arbitration service with high throughput and low latency for various types and priorities of traffic in CMPs. In this work, we propose a fast hierarchical arbitration scheme based on multi-level priority QoS. First, given multi-priority data buffer queue, arbiters provide differential transmissions with fair service for all nodes and guarantee the max-transmit-delay and min-communication-bandwidth for all queues. Second, arbiter adopts the transmit bound resource reservation scheme to reserve time slots for all nodes fairly, thereby achieving a throughput of 100%. Third, we propose fast arbitration with a layout of fast optical arbitration channels (FOACs) to reduce the arbitration period, thereby reducing packet transmitting delay. Simulation results show that with our hierarchical arbitration scheme, all nodes are allocated almost equal service access probability under various traffic patterns; thus, the min-communication-bandwidth and max-transmit-delay is guaranteed to be 5% and 80 cycles, respectively, under the overload demands. This scheme improves throughput by 17% compared to FeatherWeight under a self-similar traffic pattern and decreases arbitration delay by 15% compare to 2-pass arbitration, incurring a total power overhead of 5%.

  • Design of an Energy-Efficient Ternary Current-Mode Intra-Chip Communication Link for an Asynchronous Network-on-Chip

    Akira MOCHIZUKI  Hirokatsu SHIRAHAMA  Yuma WATANABE  Takahiro HANYU  

     
    PAPER-Communication for VLSI

      Vol:
    E97-D No:9
      Page(s):
    2304-2311

    An energy-efficient intra-chip communication link circuit with ternary current signaling is proposed for an asynchronous Network-on-Chip. The data signal encoded by an asynchronous three-state protocol is represented by a small-voltage-swing three-level intermediate signal, which results in the reduction of transition delay and achieving energy-efficient data transfer. The three-level voltage is generated by using a combination of dynamically controlled current sources with feedback loop mechanism. Moreover, the proposed circuit contains a power-saving scheme where the dynamically controlled transistors also are utilized. By cutting off the current paths when the data transfer on the communication link is inactive, the power dissipation can be greatly reduced. It is demonstrated that the average data-transfer speed is about 1.5 times faster than that of a binary CMOS implementation using a 130nm CMOS technology at the supply voltage of 1.2V.

  • Scalable Connection-Based Time Division Multiple Access Architecture for Wireless Network-on-Chip

    Shijun LIN  Zhaoshan LIU  Jianghong SHI  Xiaofang WU  

     
    BRIEF PAPER-Integrated Electronics

      Vol:
    E97-C No:9
      Page(s):
    918-921

    In this paper, we propose a scalable connection-based time division multiple access architecture for wireless NoC. In this architecture, only one-hop transmission is needed when a packet is transmitted from one wired subnet to another wired subnet, which improves the communication performance and cuts down the energy consumption. Furthermore, by carefully designing the central arbiter, the bandwidth of the wireless channel can be fully used. Simulation results show that compared with the traditional WCube wireless NoC architecture, the proposed architecture can greatly improve the network throughput, and cut down the transmission latency and energy consumption with a reasonable area overhead.

  • High-Throughput Partially Parallel Inter-Chip Link Architecture for Asynchronous Multi-Chip NoCs

    Naoya ONIZAWA  Akira MOCHIZUKI  Hirokatsu SHIRAHAMA  Masashi IMAI  Tomohiro YONEDA  Takahiro HANYU  

     
    PAPER-Dependable Computing

      Vol:
    E97-D No:6
      Page(s):
    1546-1556

    This paper introduces a partially parallel inter-chip link architecture for asynchronous multi-chip Network-on-Chips (NoCs). The multi-chip NoCs that operate as a large NoC have been recently proposed for very large systems, such as automotive applications. Inter-chip links are key elements to realize high-performance multi-chip NoCs using a limited number of I/Os. The proposed asynchronous link based on level-encoded dual-rail (LEDR) encoding transmits several bits in parallel that are received by detecting the phase information of the LEDR signals at each serial link. It employs a burst-mode data transmission that eliminates a per-bit handshake for a high-speed operation, but the elimination may cause data-transmission errors due to cross-talk and power-supply noises. For triggering data retransmission, errors are detected from the embedded phase information; error-detection codes are not used. The throughput is theoretically modelled and is optimized by considering the bit-error rate (BER) of the link. Using delay parameters estimated for a 0.13 µm CMOS technology, the throughput of 8.82 Gbps is achieved by using 10 I/Os, which is 90.5% higher than that of a link using 9 I/Os without an error-detection method operating under negligible low BER (<10-20).

  • Architecture and Evaluation of Low Power Many-Core SoC with Two 32-Core Clusters

    Takashi MIYAMORI  Hui XU  Hiroyuki USUI  Soichiro HOSODA  Toru SANO  Kazumasa YAMAMOTO  Takeshi KODAKA  Nobuhiro NONOGAKI  Nau OZAKI  Jun TANABE  

     
    PAPER

      Vol:
    E97-C No:4
      Page(s):
    360-368

    New media processing applications such as image recognition and AR (Augment Reality) have become into practical on embedded systems for automotive, digital-consumer and mobile products. Many-core processors have been proposed to realize much higher performance than multi-core processors. We have developed a low-power many-core SoC for multimedia applications in 40nm CMOS technology. Within a 210mm2 die, two 32-core clusters are integrated with dynamically reconfigurable processors, hardware accelerators, 2-channel DDR3 I/Fs, and other peripherals. Processor cores in the cluster share a 2MB L2 cache connected through a tree-based Network-on-Chip (NoC). Its total peak performance exceeds 1.5TOPS (Tera Operations Per Second). The high scalability and low power consumption are accomplished by parallelized software for multimedia applications. In case of face detection, the performance scales up to 64 cores and the SoC consumes only 2.21W. Moreover, it can execute the 1080p 48fps H.264 decoding about 520mW by 28 cores and the 4K2K 15fps super resolution about 770mW by 32 cores in one cluster. Exploiting parallelism by low power processor cores, the many-core SoC provides several tens of times better energy efficiency than that of a high performance desk-top quad-core processor.

  • RONoC: A Reconfigurable Architecture for Application-Specific Optical Network-on-Chip

    Huaxi GU  Zheng CHEN  Yintang YANG  Hui DING  

     
    LETTER-Computer System

      Vol:
    E97-D No:1
      Page(s):
    142-145

    Optical Network-on-Chip (ONoC) is a promising emerging technology, which can solve the bottlenecks faced by electrical on-chip interconnection. However, the existing proposals of ONoC are mostly built on fixed topologies, which are not flexible enough to support various applications. To make full use of the limited resource and provide a more efficient approach for resource allocation, RONoC (Reconfigurable Optical Network-on-Chip) is proposed in this letter. The topology can be reconfigured to meet the requirement of different applications. An 8×8 nonblocking router is also designed, together with the communication mechanism. The simulation results show that the saturation load of RONoC is 2 times better than mesh, and the energy consumption is 25% lower than mesh.

  • A Fully Optical Ring Network-on-Chip with Static and Dynamic Wavelength Allocation

    Ahmadou Dit Adi CISSE  Michihiro KOIBUCHI  Masato YOSHIMI  Hidetsugu IRIE  Tsutomu YOSHINAGA  

     
    PAPER

      Vol:
    E96-D No:12
      Page(s):
    2545-2554

    Silicon photonics Network-on-Chips (NoCs) have emerged as an attractive solution to alleviate the high power consumption of traditional electronic interconnects. In this paper, we propose a fully optical ring NoC that combines static and dynamic wavelength allocation communication mechanisms. A different wavelength-channel is statically allocated to each destination node for light weight communication. Contention of simultaneous communication requests from multiple source nodes to the destination is solved by a token based arbitration for the particular wavelength-channel. For heavy load communication, a multiwavelength-channel is available by requesting it in execution time from source node to a special node that manages dynamic allocation of the shared multiwavelength-channel among all nodes. We combine these static and dynamic communication mechanisms in a same network that introduces selection techniques based on message size and congestion information. Using a photonic NoC simulator based on Phoenixsim, we evaluate our architecture under uniform random, neighbor, and hotspot traffic patterns. Simulation results show that our proposed fully optical ring NoC presents a good performance by utilizing adequate static and dynamic channels based on the selection techniques. We also show that our architecture can reduce by more than half, the energy consumption necessary for arbitration compared to hybrid photonic ring and mesh NoCs. A comparison with several previous works in term of architecture hardware cost shows that our architecture can be an attractive cost-performance efficient interconnection infrastructure for future SoCs and CMPs.

  • Fault Diagnosis and Reconfiguration Method for Network-on-Chip Based Multiple Processor Systems with Restricted Private Memories

    Masashi IMAI  Tomohiro YONEDA  

     
    PAPER

      Vol:
    E96-D No:9
      Page(s):
    1914-1925

    We propose a fault diagnosis and reconfiguration method based on the Pair and Swap scheme to improve the reliability and the MTTF (Mean Time To Failure) of network-on-chip based multiple processor systems where each processor core has its private memory. In the proposed scheme, two identical copies of a given task are executed on a pair of processor cores and the results are compared repeatedly in order to detect processor faults. If a fault is detected by mismatches, the fault is identified and isolated using a TMR (Triple Module Redundancy) and the system is reconfigured by the redundant processor cores. We propose that each task is quadruplicated and statically assigned to private memories so that each memory has only two different tasks. We evaluate the reliability of the proposed quadruplicated task allocation scheme in the viewpoint of MTTF. As a result, the MTTF of the proposed scheme is over 4.3 times longer than that of the duplicated task allocation scheme.

1-20hit(36hit)