Bing XU Shouyi YIN Leibo LIU Shaojun WEI
Coarse-grained reconfigurable architectures (CGRAs) are a promising platform owing to their high performance and low cost. Researchers have developed efficient compilers that map compute-intensive applications onto CGRAs using modulo scheduling. To generate a loop kernel, every stage of the kernel is forced to have the same execution time, which is determined by the critical processing element (PE). Hence, non-critical PEs can lower their supply voltage according to their slack time. The variable Dual-VDD CGRA incorporates this feature to reduce power consumption. Previous work mainly focuses on calculating a single global optimal VDDL using an overall optimization method, which does not fully exploit the flexibility of the architecture. In this brief, we adopt a variable optimal VDDL for each stage of the kernel, chosen according to that stage's pattern, instead of a fixed, simulated global optimal VDDL. Experiments show that our proposed heuristic approach reduces power by 27.6% on average without decreasing performance. The compilation time is also acceptable.
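The per-stage idea can be illustrated with a minimal sketch. It is not the heuristic from the brief: the alpha-power-law delay model, the constants `VDDH`, `VTH`, and `ALPHA`, the V-squared power proxy, and the function names are all assumptions made for illustration. Each stage is given as a list of PE delays at the high supply voltage; a PE is moved to VDDL only if its slowed-down delay still fits within the stage time set by the critical PE.

```python
VDDH = 1.0   # assumed high supply voltage (V)
VTH = 0.3    # assumed threshold voltage (V)
ALPHA = 1.3  # assumed alpha-power-law exponent


def delay_scale(v):
    """Delay at supply voltage v relative to the delay at VDDH (assumed model)."""
    return (v / (v - VTH) ** ALPHA) / (VDDH / (VDDH - VTH) ** ALPHA)


def stage_power(delays, vddl):
    """Power proxy (sum of V^2) for one stage when slack PEs run at vddl."""
    critical = max(delays)  # stage time is fixed by the critical PE
    scale = delay_scale(vddl)
    power = 0.0
    for d in delays:
        # A PE may use VDDL only if it still finishes within the stage time.
        v = vddl if d * scale <= critical else VDDH
        power += v * v
    return power


def best_vddl_per_stage(stages, candidates):
    """Pick, for each stage independently, the candidate VDDL with the lowest power proxy."""
    return [min(candidates, key=lambda v: stage_power(delays, v)) for delays in stages]


stages = [[10, 5, 5], [10, 7, 7]]  # hypothetical PE delays per kernel stage
chosen = best_vddl_per_stage(stages, [0.6, 0.8])
```

With these hypothetical delays the two stages end up with different VDDL values, which is the kind of flexibility a single global VDDL cannot exploit.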
Yan CHEN Jing ZHANG Yuebing XU Yingjie ZHANG Renyuan ZHANG Yasuhiko NAKASHIMA
An efficient resistive random access memory (ReRAM) structure is developed for accelerating convolutional neural networks (CNNs) through in-memory computation. A novel ReRAM cell circuit is designed with two-directional (2-D) accessibility. The entire memory system is organized as a 2-D array in which specific memory cells can be accessed identically by either column or row locality. For the in-memory computation of CNNs, only the relevant cells in an identical sub-array are accessed by 2-D read-out operations, which is difficult to implement with conventional ReRAM cells. In this manner, the redundant access (column or row) of conventional ReRAM structures is avoided, eliminating unnecessary data movement when CNNs are processed in memory. According to the simulation results, the energy and bandwidth efficiency of the proposed memory structure are 1.4x and 5x those of a state-of-the-art ReRAM architecture, respectively.
Bo LIU Hui HU Chao HU Bo XU Bing XU
Maximizing the profit of datacenter networks (DCNs) requires satisfying the requirements of more flows simultaneously, but existing schemes always allocate resources based on a single flow attribute, which prevents accurate resource allocation and causes many flows to fail. In this letter, we propose Highest Priority Flow First (HPFF) to maximize DCN profit by allocating resources to flows according to their priority. HPFF employs a utility function that considers multiple flow attributes, including flow size, deadline, and demanded bandwidth, to calculate the priority of each flow. Experiments on the testbed show that HPFF improves network profit by 6.75%-19.7% and decreases the number of failed flows by 26.3%-83.3% compared with existing schemes under real DCN workloads.
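A priority-first allocation of this kind might look like the sketch below. The abstract does not give the utility function, so `priority` here is a hypothetical stand-in that simply combines the three named attributes; the single-link capacity model, the `Flow` class, and the `allocate` function are likewise assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    size: float       # data volume, e.g. in MB
    deadline: float   # time allowed, e.g. in seconds
    bandwidth: float  # demanded bandwidth, e.g. in MB/s


def priority(flow):
    """Hypothetical utility: favours small flows, tight deadlines, modest demands."""
    return 1.0 / (flow.size * flow.deadline * flow.bandwidth)


def allocate(flows, capacity):
    """Serve flows in descending priority on one link; a flow fails if its demand does not fit."""
    admitted, failed = [], []
    remaining = capacity
    for flow in sorted(flows, key=priority, reverse=True):
        if flow.bandwidth <= remaining:
            remaining -= flow.bandwidth
            admitted.append(flow.name)
        else:
            failed.append(flow.name)
    return admitted, failed


flows = [Flow("a", 10, 2, 5), Flow("b", 100, 10, 8), Flow("c", 1, 1, 2)]
admitted, failed = allocate(flows, capacity=8)
```

Ranking on a combined score rather than on one attribute is what lets two smaller flows be admitted here, where serving the largest demand first would have consumed the whole link for a single flow.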