
Author Search Result

[Author] Yuan HE(6hit)

1-6hit
  • A Runtime Optimization Selection Framework to Realize Energy Efficient Networks-on-Chip

    Yuan HE  Masaaki KONDO  Takashi NAKADA  Hiroshi SASAKI  Shinobu MIWA  Hiroshi NAKAMURA  

     
    PAPER-Architecture

    Publicized: 2016/08/24
    Vol: E99-D No:12
    Page(s): 2881-2890

    Networks-on-Chip (NoCs) play important roles in modern and future multi-core processors because they strongly affect both the performance and the power consumption of the entire chip. To date, many optimization techniques have been developed to improve NoC bandwidth, latency and power consumption. However, how these techniques affect energy efficiency remains unclear, since each comes with its own benefits and overheads and there are many of them. This raises the problem of when and how such optimization techniques should be applied. To solve this problem, we build a runtime framework that throttles these optimization techniques based on concise performance and energy models. With the help of this framework, we successfully establish adaptive selection among multiple optimization techniques to further improve the performance or energy efficiency of the network at runtime.

  • Efficient and Precise Profiling, Modeling and Management on Power and Performance for Power Constrained HPC Systems

    Yuan HE  Yasutaka WADA  Wenchao LUO  Ryuichi SAKAMOTO  Guanqin PAN  Thang CAO  Masaaki KONDO  

     
    PAPER

    Publicized: 2020/12/01
    Vol: E104-C No:6
    Page(s): 237-246

    Due to the slowdown of Moore's Law, power limitation has become one of the most critical issues for current and future HPC systems. To utilize HPC systems more efficiently when power budgets or deadlines are given, it is highly desirable to accurately estimate the performance or power consumption of applications before conducting their tuned production runs on a specific system. To ease such estimation, we showcase a straightforward yet effective method, based on the enhanced power management framework and DSL we developed, that helps HPC users clarify the performance and power relationships of their applications. This method demonstrates an easy process of profiling, modeling and managing both the performance and power of HPC systems and applications. In our evaluations, only a few (up to 3) profiled runs are necessary before very precise models of HPC applications can be obtained through this method (and algorithm), which dramatically improves the efficiency of, and lowers the difficulty of, utilizing HPC systems under limited power budgets.

  • Energy Efficiency Based Multi Service Heterogeneous Access Network Selection Algorithm

    Meng-Yuan HE  Ling-Yun JIANG  

     
    PAPER-Network System

    Publicized: 2023/04/24
    Vol: E106-B No:10
    Page(s): 881-890

    In current heterogeneous wireless communication systems, the sharp rise in energy consumption and the emergence of new service types pose great challenges to existing radio access network selection algorithms, which do not account for these trends. The proposed energy-efficiency-based multi-service heterogeneous access network selection algorithm, ESRS (Energy Saving Radio access network Selection), is intended to reduce the energy consumption caused by traffic in a mobile network system composed of Base Stations (BSs) and Access Points (APs). The algorithm models access network selection as a Multiple-Attribute Decision-Making (MADM) problem. To solve this problem, several methods are combined, including the Analytic Hierarchy Process (AHP), weighted grey relational analysis (GRA), entropy theory, simple additive weighting (SAW), and utility function theory. The algorithm has two main steps. In the first step, it computes the user QoS of each network from the related QoS parameters: entropy theory and AHP determine the comprehensive QoS weights, and SAW yields each network's QoS. In addition to user QoS, parameters including user throughput, energy consumption utility and cost utility are also calculated in this step. In the second step, fuzzy theory is used to define the weights of the decision attributes, and weighted GRA is used to calculate the network score, which determines the final choice. Because the fuzzy weights favor low energy consumption, traffic energy consumption is reduced by choosing the network with the least energy consumption whenever possible. The simulations compare the performance of the ESRS, ABE and MSNS algorithms. The numerical results show that ESRS can select the appropriate network based on service demands and network parameters. Moreover, compared with the other two algorithms, it effectively reduces system energy consumption and overall cost while maintaining a high overall QoS value and high system throughput.
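The building blocks named in this abstract (entropy weighting, SAW and weighted GRA) can be illustrated with a small sketch. This is not the ESRS algorithm itself: the three networks, their attribute values, the min-max normalization, the distinguishing coefficient `rho = 0.5` and the energy-favoring weights `gra_weights` are all hypothetical, and the AHP and fuzzy-weight steps are omitted.

```python
import math

def normalize(matrix, benefit):
    """Min-max normalize each column to [0, 1]; cost columns are inverted."""
    cols = list(zip(*matrix))
    out = []
    for row in matrix:
        new_row = []
        for j, v in enumerate(row):
            lo, hi = min(cols[j]), max(cols[j])
            r = (v - lo) / (hi - lo) if hi > lo else 1.0
            new_row.append(r if benefit[j] else 1.0 - r)
        out.append(new_row)
    return out

def entropy_weights(matrix):
    """Entropy weight method: columns with more spread get larger weights."""
    n = len(matrix)
    diversities = []
    for col in zip(*matrix):
        total = sum(col)
        probs = [v / total for v in col if v > 0]
        entropy = -sum(p * math.log(p) for p in probs) / math.log(n)
        diversities.append(1.0 - entropy)
    s = sum(diversities)
    return [d / s for d in diversities]

def saw_scores(norm, weights):
    """Simple additive weighting: weighted sum of normalized attributes."""
    return [sum(w * r for w, r in zip(weights, row)) for row in norm]

def gra_scores(norm, weights, rho=0.5):
    """Weighted grey relational grade against the ideal sequence (all ones)."""
    deltas = [[abs(1.0 - r) for r in row] for row in norm]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    return [
        sum(w * (d_min + rho * d_max) / (d + rho * d_max)
            for w, d in zip(weights, row))
        for row in deltas
    ]

# Hypothetical data: [throughput (benefit), energy (cost), price (cost)]
names = ["BS", "AP-1", "AP-2"]
raw = [[20.0, 8.0, 5.0], [30.0, 3.0, 2.0], [25.0, 5.0, 3.0]]
benefit = [True, False, False]

norm = normalize(raw, benefit)
weights = entropy_weights(raw)          # objective weights from the data
saw = saw_scores(norm, weights)
gra_weights = [0.3, 0.5, 0.2]           # assumed weights favoring low energy
gra = gra_scores(norm, gra_weights)
best = names[gra.index(max(gra))]
print(best, saw, gra)
```

In this toy data set AP-1 is best on every attribute, so both SAW and GRA rank it first; with conflicting attributes the choice of weights decides the outcome.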

  • Hybrid, Asymmetric and Reconfigurable Input Unit Designs for Energy-Efficient On-Chip Networks

    Xiaoman LIU  Yujie GAO  Yuan HE  Xiaohan YUE  Haiyan JIANG  Xibo WANG  

     
    PAPER

    Publicized: 2023/04/10
    Vol: E106-C No:10
    Page(s): 570-579

    The complexity and scale of Networks-on-Chip (NoCs) are growing as more processing elements and memory devices are implemented on chips. However, under strict power budgets, it is also critical to lower the power consumption of NoCs for the sake of energy efficiency. In this paper, we therefore present three novel input unit designs for on-chip routers that aim to reduce their power consumption while preserving network performance. The key idea behind our designs is to organize the buffers in the input units with the characteristics of the network traffic in mind. In our observations, only a small portion of the network traffic consists of long packets (composed of multiple flits), so it is reasonable to implement hybrid, asymmetric and reconfigurable buffers that mainly target short packets (having only a single flit), which results in lower power consumption and area overhead. Evaluations show that our hybrid, asymmetric and reconfigurable input unit designs achieve average reductions in energy consumption per flit of 45%, 52.3% and 56.2%, respectively, with 93.6% (for hybrid designs) and 66.3% (for asymmetric and reconfigurable designs) of the original router area. Meanwhile, we observe only minor degradation in network latency (ranging from 18.4% to 1.5%, on average) with our proposals.

  • Adaptive Lossy Data Compression Extended Architecture for Memory Bandwidth Conservation in SpMV

    Siyi HU  Makiko ITO  Takahide YOSHIKAWA  Yuan HE  Hiroshi NAKAMURA  Masaaki KONDO  

     
    PAPER

    Publicized: 2023/07/20
    Vol: E106-D No:12
    Page(s): 2015-2025

    Widely adopted by machine learning and graph processing applications, sparse matrix-vector multiplication (SpMV) is a very popular algorithm in linear algebra. This is especially the case for fully-connected MLP layers, which dominate many SpMV computations and play a substantial role in diverse services. As a consequence, a large fraction of data center cycles is spent on SpMV kernels. Meanwhile, despite efficient storage formats for sparse data (such as CSR or CSC), SpMV kernels still suffer from limited memory bandwidth during data transfer because of the memory hierarchy of modern computing systems. In more detail, we find that both the integer and floating-point data used in SpMV kernels are handled plainly, without any pre-processing. Therefore, we believe bandwidth conservation techniques, such as data compression, may dramatically help SpMV kernels when data is transferred between the main memory and the Last Level Cache (LLC). Furthermore, we observe that convergence conditions in some typical scientific computation benchmarks (based on SpMV kernels) are not degraded when lower precision floating-point data is adopted. Based on these findings, we propose a simple yet effective data compression scheme that can be extended to general purpose computing architectures or, preferably, HPC systems. When it is adopted, a best-case speedup of 1.92x is achieved. In addition, evaluations with both the CG kernel and the PageRank algorithm indicate that our proposal introduces negligible overhead on both the convergence speed and the accuracy of the final results.
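As a rough illustration of the two ideas mentioned in this abstract, CSR storage and lower-precision floating-point values, the sketch below runs SpMV on a small hand-made matrix twice: once with float64 values and once with the values rounded to float32. It does not reproduce the paper's compression scheme or hardware extension; the matrix, the vector and the helper names `spmv_csr` and `to_float32` are hypothetical.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def spmv_csr(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        # Nonzeros of this row live in data[indptr[row]:indptr[row + 1]].
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y

# Hypothetical 3x3 matrix:
# [[4.0, 0.0, 0.1],
#  [0.0, 3.0, 0.0],
#  [0.2, 0.0, 5.0]]
indptr = [0, 2, 3, 5]          # row start offsets into indices/data
indices = [0, 2, 1, 0, 2]      # column index of each nonzero
data = [4.0, 0.1, 3.0, 0.2, 5.0]
x = [1.0, 2.0, 3.0]

y_full = spmv_csr(indptr, indices, data, x)
y_low = spmv_csr(indptr, indices, [to_float32(v) for v in data], x)
max_err = max(abs(a - b) for a, b in zip(y_full, y_low))
print(y_full, y_low, max_err)
```

Halving the width of the stored values halves the bytes moved for `data`, at the cost of a small rounding error in the result; whether that error is acceptable depends on the application.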

  • Correlation Distributions between an m-Sequence and Its Niho Decimation Sequences of Short Period

    Yongbo XIA  Shiyuan HE  Shaoping CHEN  

     
    PAPER-Information Theory

    Vol: E102-A No:2
    Page(s): 450-457

    Let $d=2p^m-1$ be the Niho decimation over $\mathbb{F}_{p^{2m}}$ satisfying $\gcd(d,p^{2m}-1)=3$, where $m$ is an odd positive integer and $p$ is a prime with $p \equiv 2 \pmod 3$. The cross-correlation function between the $p$-ary m-sequence of period $p^{2m}-1$ and its every $d$-decimation sequence with short period $\frac{p^{2m}-1}{3}$ is investigated. It is proved that for each $d$-decimation sequence, the cross-correlation function takes four values and the corresponding correlation distribution is completely determined. This extends the results of Niho and Helleseth for the case $\gcd(d, p^{2m}-1)=1$.
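The gcd condition stated in this abstract can be checked numerically for small parameters. The sketch below only verifies that gcd(d, p^(2m) - 1) = 3 for d = 2p^m - 1 and computes the resulting short period; it does not compute any correlation values, and the helper names `niho_gcd` and `short_period` are hypothetical.

```python
from math import gcd

def niho_gcd(p: int, m: int) -> int:
    """Return gcd(d, p^(2m) - 1) for the Niho decimation d = 2*p^m - 1."""
    d = 2 * p**m - 1
    return gcd(d, p**(2 * m) - 1)

def short_period(p: int, m: int) -> int:
    """Period of the d-decimated sequence: (p^(2m) - 1) / gcd(d, p^(2m) - 1)."""
    return (p**(2 * m) - 1) // niho_gcd(p, m)

# Primes p with p = 2 (mod 3) and odd m, as required in the abstract.
for p in (2, 5, 11):
    for m in (1, 3, 5):
        print(p, m, niho_gcd(p, m), short_period(p, m))
```

The value 3 follows because d = 2(p^m - 1) + 1 is coprime to p^m - 1, while d = 2(p^m + 1) - 3 shares exactly the factor 3 with p^m + 1 when p ≡ 2 (mod 3) and m is odd.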