Author Search Result

[Author] Takatsugu ONO(7hit)

1-7hit
  • Evaluating Energy-Efficiency of DRAM Channel Interleaving Schemes for Multithreaded Programs

    Satoshi IMAMURA  Yuichiro YASUI  Koji INOUE  Takatsugu ONO  Hiroshi SASAKI  Katsuki FUJISAWA  

     
    PAPER-Computer System

      Publicized:
    2018/06/08
      Vol:
    E101-D No:9
      Page(s):
    2247-2257

    The power consumption of server platforms has been increasing as the amount of hardware resources they carry has grown. In particular, DRAM capacity continues to grow, and it is not rare for DRAM to consume more power than the processors on modern servers. Reducing DRAM energy consumption is therefore a critical challenge in reducing system-level energy consumption. Although it is well known that improving row buffer locality (RBL) and bank-level parallelism (BLP) is effective in reducing DRAM energy consumption, our preliminary evaluation on a real server demonstrates that RBL is generally low across 15 multithreaded benchmarks. In this paper, we investigate the memory access patterns of these benchmarks using a simulator and observe that cache line-grained channel interleaving schemes, which are widely applied to modern servers with multiple memory channels, hurt the RBL that each benchmark potentially possesses. To address this problem, we focus on a row-grained channel interleaving scheme and compare it with three cache line-grained schemes. Our evaluation shows that it reduces the DRAM energy consumption by 16.7%, 12.3%, and 5.5% on average (up to 34.7%, 28.2%, and 12.0%) compared to the three cache line-grained schemes, respectively.

  • A Flexible Direct Attached Storage for a Data Intensive Application

    Takatsugu ONO  Yotaro KONISHI  Teruo TANIMOTO  Noboru IWAMATSU  Takashi MIYOSHI  Jun TANAKA  

     
    PAPER-Storage System

      Publicized:
    2015/09/15
      Vol:
    E98-D No:12
      Page(s):
    2168-2177

    Big data analysis and data storage applications require a huge volume of storage and high I/O performance. Applications can achieve a high level of performance and cost efficiency by exploiting the high I/O performance of direct attached storage (DAS) such as internal HDDs. As the size of stored data keeps increasing, it will become difficult to replace servers because their internal HDDs contain huge amounts of data. Generally, the data is copied via Ethernet when it is transferred from the internal HDDs to a new server. However, the amount of data will continue to increase rapidly, so such transfers over Ethernet will take a long time. A storage area network such as iSCSI can be used to avoid this problem because the data can be shared among the servers. However, this decreases performance and increases costs. Improving the DAS architecture therefore requires greater flexibility without I/O performance degradation. In response to this issue, we propose FlexDAS, which improves the flexibility of direct attached storage by using a disk area network (DAN) without degrading I/O performance. A resource manager connects or disconnects the computation nodes to the HDDs via the FlexDAS switch, which supports the SAS or SATA protocols. This function enables servers to be replaced in a short period of time. We developed a prototype FlexDAS switch and quantitatively evaluated the architecture. Results show that the FlexDAS switch can disconnect and connect an HDD to a server in just 1.16 seconds. We also confirmed that FlexDAS improves the performance of data intensive applications by up to 2.84 times compared with iSCSI.

  • Reducing On-Chip DRAM Energy via Data Transfer Size Optimization

    Takatsugu ONO  Koji INOUE  Kazuaki MURAKAMI  Kenji YOSHIDA  

     
    PAPER

      Vol:
    E92-C No:4
      Page(s):
    433-443

    This paper proposes a software-controllable variable line-size (SC-VLS) cache architecture for low-power embedded systems. High bandwidth between logic and a DRAM is realized by means of advanced integrated technology. System-in-Silicon is one of the architectural frameworks to realize the high bandwidth. An ASIC and a specific SRAM are mounted onto a silicon interposer. Each chip is connected to the silicon interposer by eutectic solder bumps. In this framework, it is important to reduce the DRAM energy consumption. The specific DRAM needs a small cache memory to improve performance. We exploit the cache to reduce the DRAM energy consumption. During program execution, the adequate cache line size, i.e., the one that produces the lowest cache miss ratio, varies because the amount of spatial locality in memory references changes. If we employ a large cache line size, we can expect a prefetching effect. However, the DRAM energy consumption is larger than with a small line size because a large number of banks are accessed. The SC-VLS cache is able to change the line size to an adequate one at runtime with small area and power overheads. We analyze the adequate line size and insert line size change instructions at the beginning of each function of a target program before executing the program. In our evaluation, the SC-VLS cache reduces the DRAM energy consumption by up to 88% compared to a conventional cache with fixed 256 B lines.

  • Critical Path Based Microarchitectural Bottleneck Analysis for Out-of-Order Execution

    Teruo TANIMOTO  Takatsugu ONO  Koji INOUE  

     
    PAPER

      Vol:
    E102-A No:6
      Page(s):
    758-766

    Correctly understanding microarchitectural bottlenecks is important for optimizing the performance and energy of OoO (Out-of-Order) processors. Although the CPI (Cycles Per Instruction) stack has been utilized for this purpose, it stacks architectural events heuristically by counting how many times the events occur, and the order of stacking affects the result, which may be misleading. This is because the CPI stack does not consider the execution path of dynamic instructions. Critical path analysis (CPA) is a well-known method to identify the critical execution path of dynamic instruction execution on OoO processors. The critical path consists of the sequence of events that determines the execution time of a program on a certain processor. We develop a novel representation, the CPCI stack (Cycles Per Critical Instruction stack), which is a CPI stack based on CPA. The main challenge in constructing the CPCI stack is how to analyze a large number of paths, because CPA often results in numerous critical paths. In this paper, we show that there are more than ten to the tenth power critical paths in the execution of only one thousand instructions in 35 benchmarks out of 48 from SPEC CPU2006. We then propose a statistical method to analyze all the critical paths and show a case study using the benchmarks.

  • Real-Time Frame-Rate Control for Energy-Efficient On-Line Object Tracking

    Yusuke INOUE  Takatsugu ONO  Koji INOUE  

     
    PAPER

      Vol:
    E101-A No:12
      Page(s):
    2297-2307

    On-line object tracking (OLOT) has been a core technology in computer vision, and its importance has been increasing rapidly. Because this technology is utilized in battery-operated products, energy consumption must be minimized. This paper describes a method of adaptive frame-rate optimization to satisfy that requirement. An energy trade-off occurs between image capturing and object tracking. The method therefore optimizes the frame rate based on the continuously changing object speed to minimize the total energy while taking the trade-off into account. Simulation results show a maximum energy reduction of 50.0% and an average reduction of 35.9% without serious degradation of tracking accuracy.

  • Parallel Precomputation with Input Value Prediction for Model Predictive Control Systems

    Satoshi KAWAKAMI  Takatsugu ONO  Toshiyuki OHTSUKA  Koji INOUE  

     
    PAPER-Real-time Systems

      Publicized:
    2018/09/18
      Vol:
    E101-D No:12
      Page(s):
    2864-2877

    We propose a parallel precomputation method for real-time model predictive control. The key idea is to use predicted input values produced by model predictive control to solve an optimal control problem in advance. It is well known that control systems are not suitable for multi- or many-core processors because feedback-loop control systems are inherently based on sequential operations. However, since the proposed method does not rely on conventional thread-/data-level parallelism, it can be easily applied to such control systems without changing the algorithm in applications. A practical evaluation using three real-world model predictive control system simulation programs demonstrates that the proposed method drastically improves performance without degrading control quality.

  • Towards Ultra-High-Speed Cryogenic Single-Flux-Quantum Computing [Open Access]

    Koki ISHIDA  Masamitsu TANAKA  Takatsugu ONO  Koji INOUE  

     
    INVITED PAPER

      Vol:
    E101-C No:5
      Page(s):
    359-369

    CMOS microprocessors are limited in their capacity for clock speed improvement because of increasing power consumption, i.e., they face a power-wall problem. Single-flux-quantum (SFQ) circuits offer a solution with their ultra-high-speed and ultra-low-power nature. This paper introduces our contributions towards ultra-high-speed cryogenic SFQ computing. The first step is to design SFQ microprocessors. By qualitatively and quantitatively evaluating previously designed SFQ microprocessors, we have found that revisiting the architecture of SFQ microprocessors and on-chip caches is the first critical challenge. On the basis of cross-layer discussions and analysis, we came to the conclusion that a bit-parallel gate-level pipeline architecture is the best solution for SFQ designs. This paper summarizes our current research results targeting SFQ microprocessors and on-chip cache architectures.
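
As a rough illustration of the interleaving granularity discussed in the DRAM channel interleaving abstract (the first entry in this list), the sketch below contrasts a cache line-grained channel mapping with a row-grained one. The line size, row size, channel count, and all function names are assumptions chosen for illustration and are not taken from the paper. Real memory controllers use more elaborate mappings, and banks are not modeled here.

```python
# Assumed parameters for illustration only (not from the paper).
LINE_BYTES = 64          # assumed cache line size
ROW_BYTES = 8 * 1024     # assumed DRAM row (page) size
NUM_CHANNELS = 4         # assumed number of memory channels


def channel_line_grained(addr: int) -> int:
    """Consecutive cache lines rotate across channels."""
    return (addr // LINE_BYTES) % NUM_CHANNELS


def channel_row_grained(addr: int) -> int:
    """A whole row-sized region stays on one channel before rotating."""
    return (addr // ROW_BYTES) % NUM_CHANNELS


def lines_per_channel(mapping, start: int, num_lines: int) -> dict:
    """Count how many cache lines of a sequential stream land on each channel."""
    counts = {}
    for i in range(num_lines):
        ch = mapping(start + i * LINE_BYTES)
        counts[ch] = counts.get(ch, 0) + 1
    return counts


# A sequential stream covering exactly one row-sized region (128 lines).
stream_lines = ROW_BYTES // LINE_BYTES
line_counts = lines_per_channel(channel_line_grained, 0, stream_lines)
row_counts = lines_per_channel(channel_row_grained, 0, stream_lines)
print("line-grained:", line_counts)
print("row-grained: ", row_counts)
```

With these assumed parameters, the sequential stream is spread evenly over all four channels under the line-grained mapping (32 lines each), while under the row-grained mapping all 128 lines stay on a single channel. Keeping a stream on one channel is the kind of access pattern that can preserve row buffer locality.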