The search functionality is under construction.

Author Search Result

[Author] Hiroyuki TOMIYAMA(30hit)

1-20hit(30hit)

  • Static Mapping with Dynamic Switching of Multiple Data-Parallel Applications on Embedded Many-Core SoCs

    Ittetsu TANIGUCHI  Junya KAIDA  Takuji HIEDA  Yuko HARA-AZUMI  Hiroyuki TOMIYAMA  

     
    PAPER-Fundamentals of Information Systems

      Vol:
    E97-D No:11
      Page(s):
    2827-2834

    This paper studies mapping techniques of multiple applications on embedded many-core SoCs. The mapping techniques proposed in this paper are static which means the mapping is decided at design time. The mapping techniques take into account both inter-application and intra-application parallelism in order to fully utilize the potential parallelism of the many-core architecture. Additionally, the proposed static mapping supports dynamic application switching, which means the applications mapped onto the same cores are switched to each other at runtime. Two approaches are proposed for static mapping: one approach is based on integer linear programming and the other is based on a greedy algorithm. Experimental results show the effectiveness of the proposed techniques.

  • Function Call Optimization for Efficient Behavioral Synthesis

    Yuko HARA  Hiroyuki TOMIYAMA  Shinya HONDA  Hiroaki TAKADA  

     
    LETTER-VLSI Design Technology and CAD

      Vol:
    E90-A No:9
      Page(s):
    2032-2036

    Behavioral synthesis, which automatically synthesizes an RTL circuit from a sequential program, is one of promising technologies to improve the design productivity. This paper proposes a function call optimization method in behavioral synthesis from large sequential programs with a number of functions. We formulate the optimization problem using integer linear programming. Our experimental results show the reduction in the circuit area by up to 44.6%, compared with a traditional method.

  • Module Selection Using Manufacturing Information

    Hiroyuki TOMIYAMA  Hiroto YASUURA  

     
    PAPER-High-level Synthesis

      Vol:
    E81-A No:12
      Page(s):
    2576-2584

    Since manufacturing processes inherently fluctuate, LSI chips which are produced from the same design have different propagation delays. However, the difference in delays caused by the process fluctuation has rarely been considered in most of existing high-level synthesis systems. This paper presents a new approach to module selection in high-level synthesis, which exploits the difference in functional unit delays. First, a module library model which assumes the probabilistic nature of functional unit delays is presented. Then, we propose a module selection problem and an algorithm which minimizes the cost per faultless chip. Experimental results demonstrate that the proposed algorithm finds optimal module selections which would not have been explored without manufacturing information.

  • Static Mapping of Multiple Data-Parallel Applications on Embedded Many-Core SoCs

    Junya KAIDA  Yuko HARA-AZUMI  Takuji HIEDA  Ittetsu TANIGUCHI  Hiroyuki TOMIYAMA  Koji INOUE  

     
    LETTER-Computer System

      Vol:
    E96-D No:10
      Page(s):
    2268-2271

    This paper studies the static mapping of multiple applications on embedded many-core SoCs. The mapping techniques proposed in this paper take into account both inter-application and intra-application parallelism in order to fully utilize the potential parallelism of the many-core architecture. Two approaches are proposed for static mapping: one approach is based on integer linear programming and the other is based on a greedy algorithm. Experiments show the effectiveness of the proposed techniques.

  • Soft-Core Processor Architecture for Embedded System Design

    Eko Fajar NURPRASETYO  Akihiko INOUE  Hiroyuki TOMIYAMA  Hiroto YASUURA  

     
    PAPER

      Vol:
    E81-C No:9
      Page(s):
    1416-1423

    In the design of an embedded system, an architecture of core processor strongly affects the performance and cost of the total system. This paper discusses a scalable processor architecture, called soft-core processor, which can be tuned for a target system. System designers can optimize several design parameters such as the datapath width and instruction set, and generate customized processors for their application. Design of Bung-DLX as a prototype of soft-core processor is presented in this paper. An experiment of system design using our processor has shown that the optimized processor chip area halves when the critical path delay is reduced to one third of the original one.

  • An Effective GA-Based Scheduling Algorithm for FlexRay Systems

    Shan DING  Hiroyuki TOMIYAMA  Hiroaki TAKADA  

     
    PAPER-System Programs

      Vol:
    E91-D No:8
      Page(s):
    2115-2123

    An advanced communication system, the FlexRay system, has been developed for future automotive applications. It consists of time-triggered clusters, such as drive-by-wire in cars, in order to meet different requirements and constraints between various sensors, processors, and actuators. In this paper, an approach to static scheduling for FlexRay systems is proposed. Our experimental results show that the proposed scheduling method significantly reduces up to 36.3% of the network traffic compared with a past approach.

  • Impacts of Compiler Optimizations on Address Bus Energy: An Empirical Study

    Hiroyuki TOMIYAMA  

     
    LETTER-VLSI Design Technology and CAD

      Vol:
    E87-A No:10
      Page(s):
    2815-2820

    Energy consumption is one of the most critical constraints in the design of portable embedded systems. This paper describes an empirical study about the impacts of compiler optimizations on the energy consumption of the address bus between processor and instruction memory. Experiments using a number of real-world applications are presented, and the results show that transitions on the instruction address bus can be significantly reduced (by 85% on the average) by the compiler optimizations together with bus encoding.

  • Static Mapping of Parallelizable Tasks under Deadline Constraints

    Yining XU  Ittetsu TANIGUCHI  Hiroyuki TOMIYAMA  

     
    LETTER

      Vol:
    E100-A No:7
      Page(s):
    1500-1502

    Task mapping is one of the most important design processes in embedded manycore systems. This paper proposes a static task mapping technique for manycore real-time systems. The technique minimizes the number of cores while satisfying deadline constraints of individual tasks.

  • Fast and Accurate Architecture Exploration for High Performance and Low Energy VLIW Data-Path

    Ittetsu TANIGUCHI  Kohei AOKI  Hiroyuki TOMIYAMA  Praveen RAGHAVAN  Francky CATTHOOR  Masahiro FUKUI  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E97-A No:2
      Page(s):
    606-615

    A fast and accurate architecture exploration for high performance and low energy VLIW data-path is proposed. The main contribution is a method to find Pareto optimal FU structures, i.e., the optimal number of FUs and the best instruction assignment for each FU. The proposed architecture exploration method is based on GA and enables the effective exploration of vast solution space. Experimental results showed that proposed method was able to achieve fast and accurate architecture exploration. For most cases, the estimation error was less than 1%.

  • Memory Data Organization for Low-Energy Address Buses

    Hiroyuki TOMIYAMA  Hiroaki TAKADA  Nikil D. DUTT  

     
    PAPER

      Vol:
    E87-C No:4
      Page(s):
    606-612

    Energy consumption has become one of the most critical constraints in the design of portable multimedia systems. For media applications, address buses between processor and data memory consume a considerable amount of energy due to their large capacitance and frequent accesses. This paper studies impacts of memory data organization on the address bus energy. Our experiments show that the address bus activity is significantly reduced by 50% through exploring memory data organization and encoding address buses.

  • Partitioning of Behavioral Descriptions with Exploiting Function-Level Parallelism

    Yuko HARA  Hiroyuki TOMIYAMA  Shinya HONDA  Hiroaki TAKADA  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E93-A No:2
      Page(s):
    488-499

    A novel method to efficiently synthesize hardware from a large behavioral description in behavioral synthesis is proposed. For a program with functions executable in parallel, this proposed method determines a behavioral partitioning which simultaneously minimizes the overall datapath area and the complexity of the controller while maximizing performance of a synthesized circuit by fully exploiting function-level parallelism of a behavioral description. This method is formulated as an integer programming problem. Experimental results demonstrate that this method leads to a shift of the explorable design space so that superior solutions which could not be explored by earlier work are included, showing the effectiveness of our proposed method.

  • Function-Level Partitioning of Sequential Programs for Efficient Behavioral Synthesis

    Yuko HARA  Hiroyuki TOMIYAMA  Shinya HONDA  Hiroaki TAKADA  Katsuya ISHII  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E90-A No:12
      Page(s):
    2853-2862

    This paper proposes a behavioral level partitioning method for efficient behavioral synthesis from a large sequential program consisting of a set of functions. Our method optimally determines functions to be inlined into the main module and the other functions to be synthesized into sub modules in such a way that the overall datapath is minimized while the complexity of individual modules is lower than a certain level. The partitioning problem is formulated as an integer programming problem. Experimental results show the effectiveness of the proposed method.

  • Simultaneous Scheduling and Core-Type Optimization for Moldable Fork-Join Tasks on Heterogeneous Multicores

    Hiroki NISHIKAWA  Kana SHIMADA  Ittetsu TANIGUCHI  Hiroyuki TOMIYAMA  

     
    PAPER

      Pubricized:
    2021/09/01
      Vol:
    E105-A No:3
      Page(s):
    540-548

    With the demand for energy-efficient and high- performance computing, multicore architecture has become more appealing than ever. Multicore task scheduling is one of domains in parallel computing which exploits the parallelism of multicore. Unlike traditional scheduling, multicore task scheduling has recently been studied on the assumption that tasks have inherent parallelism and can be split into multiple sub-tasks in data parallel fashion. However, it is still challenging to properly determine the degree of parallelism of tasks and mapping on multicores. Our proposed scheduling techniques determine the degree of parallelism of tasks, and sub-tasks are decided which type of cores to be assigned to heterogeneous multicores. In addition, two approaches to hardware/software codesign for heterogeneous multicore systems are proposed. The works optimize the types of cores organized in the architecture simultaneously with scheduling of the tasks such that the overall energy consumption is minimized under a deadline constraint, a warm start approach is also presented to effectively solve the problem. The experimental results show the simultaneous scheduling and core-type optimization technique remarkably reduces the energy consumption.

  • Satsuki: An Integrated Processor Synthesis and Compiler Generation System

    Barry SHACKLEFORD  Mitsuhiro YASUDA  Etsuko OKUSHI  Hisao KOIZUMI  Hiroyuki TOMIYAMA  Hiroto YASUURA  

     
    PAPER-Hardware-Software Codesign

      Vol:
    E79-D No:10
      Page(s):
    1373-1381

    Entire systems on a chip (SOCs) embodying a processor, memory, and system-specific peripheral hardware are now an everyday reality. The current generation of SOC designers are driven more than ever by the need to lower chip cost, while at the same time being faced with demands to get designs to market more quickly. It was to support this new community of designers that we developed Satsuki-an integrated processor synthesis and compiler generation system. By allowing the designer to tune the processor design to the bitwidth and performance required by the application, minimum cost designs are achieved. Using synthesis to implement the processor in the same technology as the rest of the chip, allows for global chip optimization from the perspective of the system as a whole and assures design portability. The integral compiler generator, driven by the same parameters used for processor synthesis, promotes high-level expression of application algorithms while at the same time isolating the application software from the processor implementation. Synthesis experiments incorporating a 0.8 micron CMOS gate array have produced designs ranging from a 45 MHz, 1,500 gate, 8-bit processor with a 4-word register file to a 31 MHz, 9,800 gate, 32-bit processor with a 16-word register file.

  • Static Mapping of Multiple Parallel Applications on Non-Hierarchical Manycore Embedded Systems

    Yining XU  Yang LIU  Junya KAIDA  Ittetsu TANIGUCHI  Hiroyuki TOMIYAMA  

     
    LETTER

      Vol:
    E99-A No:7
      Page(s):
    1417-1419

    This paper proposes a static application mapping technique, based on integer linear programming, for non-hierarchical manycore embedded systems. Unlike previous work which was designed for hierarchical manycore SoCs, this work allows more flexible application mapping to achieve higher performance. The experimental results show the effectiveness of this work.

  • Static Task Scheduling Algorithms Based on Greedy Heuristics for Battery-Powered DVS Systems

    Tetsuo YOKOYAMA  Gang ZENG  Hiroyuki TOMIYAMA  Hiroaki TAKADA  

     
    PAPER-Software System

      Vol:
    E93-D No:10
      Page(s):
    2737-2746

    The principles for good design of battery-aware voltage scheduling algorithms for both aperiodic and periodic task sets on dynamic voltage scaling (DVS) systems are presented. The proposed algorithms are based on greedy heuristics suggested by several battery characteristics and Lagrange multipliers. To construct the proposed algorithms, we use the battery characteristics in the early stage of scheduling more properly. As a consequence, the proposed algorithms show superior results on synthetic examples of periodic and aperiodic tasks from the task sets which are excerpted from the comparative work, on uni- and multi-processor platforms, respectively. In particular, for some large task sets, the proposed algorithms enable previously unschedulable task sets due to battery exhaustion to be schedulable.

  • An Integrated Framework for Energy Optimization of Embedded Real-Time Applications

    Hideki TAKASE  Gang ZENG  Lovic GAUTHIER  Hirotaka KAWASHIMA  Noritoshi ATSUMI  Tomohiro TATEMATSU  Yoshitake KOBAYASHI  Takenori KOSHIRO  Tohru ISHIHARA  Hiroyuki TOMIYAMA  Hiroaki TAKADA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E97-A No:12
      Page(s):
    2477-2487

    This paper presents a framework for reducing the energy consumption of embedded real-time systems. We implemented the presented framework as both an optimization toolchain and an energy-aware real-time operating system. The framework consists of the integration of multiple techniques to optimize the energy consumption. The main idea behind our approach is to utilize trade-offs between the energy consumption and the performance of different processor configurations during task checkpoints, and to maintain memory allocation during task context switches. In our framework, a target application is statically analyzed at both intra-task and inter-task levels. Based on these analyzed results, runtime optimization is performed in response to the behavior of the application. A case study shows that our toolchain and real-time operating systems have achieved energy reduction while satisfying the real-time performance. The toolchain has also been successfully applied to a practical application.

  • ILP-Based Program Path Analysis for Bounding Worst-Case Inter-Task Cache Conflicts

    Hiroyuki TOMIYAMA  Nikil DUTT  

     
    LETTER-System Programs

      Vol:
    E87-D No:6
      Page(s):
    1582-1587

    The unpredictable behavior of cache memory makes it difficult to statically analyze the worst-case performance of real-time systems. This problem is further exacerbated in the case of preemptive multitask systems because of inter-task cache interference, called Cache-Related Preemption Delay (CRPD). This paper proposes an approach to analyzing the tight upper bound on CRPD which a task might impose on lower-priority tasks. Our method finds the program execution path which requires the maximum number of cache blocks using an integer linear programming technique. Experimental results show that our approach provides up to 69% tighter bounds on CRPD than a conservative approach.

  • ILP-Based Scheduling for Parallelizable Tasks

    Kana SHIMADA  Shogo KITANO  Ittetsu TANIGUCHI  Hiroyuki TOMIYAMA  

     
    LETTER

      Vol:
    E100-A No:7
      Page(s):
    1503-1505

    Task scheduling is one of the most important processes in the design of multicore computing systems. This paper presents a technique for scheduling of malleable tasks. Our scheduling technique decides not only the execution order of the tasks but also the number of cores assigned to the individual tasks, simultaneously. We formulate the scheduling problem as an integer linear programming (ILP) problem, and the optimal schedule can be obtained by solving the ILP problem. Experiments using a standard task-set suite clarify the strength of this work.

  • High-Level Synthesis of Software Function Calls

    Masanari NISHIMURA  Nagisa ISHIURA  Yoshiyuki ISHIMORI  Hiroyuki KANBARA  Hiroyuki TOMIYAMA  

     
    LETTER-High-Level Synthesis and System-Level Design

      Vol:
    E91-A No:12
      Page(s):
    3556-3558

    This letter presents a novel framework in high-level synthesis where hardware modules synthesized from functions in a given ANSI-C program can call the other software functions in the program. This enables high-level synthesis from C programs that contains calls to hard-to-synthesize functions, such as dynamic memory management, I/O request, or very large and complex functions. A single-thread implementation scheme is shown, whose correctness has been verified through register transfer level simulation.

1-20hit(30hit)