Ittetsu TANIGUCHI Junya KAIDA Takuji HIEDA Yuko HARA-AZUMI Hiroyuki TOMIYAMA
This paper studies techniques for mapping multiple applications onto embedded many-core SoCs. The proposed mapping techniques are static, meaning that the mapping is decided at design time. They take into account both inter-application and intra-application parallelism in order to fully utilize the potential parallelism of the many-core architecture. Additionally, the proposed static mapping supports dynamic application switching, in which applications mapped onto the same cores are switched with one another at runtime. Two approaches are proposed for static mapping: one is based on integer linear programming and the other on a greedy algorithm. Experimental results show the effectiveness of the proposed techniques.
Yuko HARA Hiroyuki TOMIYAMA Shinya HONDA Hiroaki TAKADA
Behavioral synthesis, which automatically synthesizes an RTL circuit from a sequential program, is one of the promising technologies for improving design productivity. This paper proposes a function call optimization method for behavioral synthesis from large sequential programs with a number of functions. We formulate the optimization problem using integer linear programming. Our experimental results show a reduction in circuit area of up to 44.6% compared with a traditional method.
Hiroyuki TOMIYAMA Hiroto YASUURA
Since manufacturing processes inherently fluctuate, LSI chips produced from the same design have different propagation delays. However, the difference in delays caused by process fluctuation has rarely been considered in existing high-level synthesis systems. This paper presents a new approach to module selection in high-level synthesis, which exploits the difference in functional unit delays. First, a module library model which assumes the probabilistic nature of functional unit delays is presented. Then, we propose a module selection problem and an algorithm which minimizes the cost per faultless chip. Experimental results demonstrate that the proposed algorithm finds optimal module selections which would not have been explored without manufacturing information.
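The "cost per faultless chip" objective can be illustrated with a minimal sketch. It is not the algorithm of the paper: the library entries, the cost figures, the probabilities, and the assumption that functional units meet the clock independently are all hypothetical, and the search is a plain exhaustive enumeration.

```python
from itertools import product

# Hypothetical module library: for each functional unit type, candidate
# modules with an area cost and the probability that a manufactured
# instance meets the target clock period. All numbers are made up.
LIBRARY = {
    "adder": [
        {"name": "fast_adder", "cost": 4.0, "p_meets_clock": 0.99},
        {"name": "small_adder", "cost": 2.0, "p_meets_clock": 0.80},
    ],
    "multiplier": [
        {"name": "fast_mult", "cost": 10.0, "p_meets_clock": 0.95},
        {"name": "small_mult", "cost": 6.0, "p_meets_clock": 0.60},
    ],
}


def cost_per_faultless_chip(selection):
    """Total chip cost divided by the fraction of chips that meet timing.

    Assumes (for illustration only) that the modules fail independently,
    so the chip yield is the product of the per-module probabilities.
    """
    total_cost = sum(module["cost"] for module in selection)
    chip_yield = 1.0
    for module in selection:
        chip_yield *= module["p_meets_clock"]
    return total_cost / chip_yield


def select_modules(library):
    """Enumerate one module per unit type and keep the cheapest per good chip."""
    return min(product(*library.values()), key=cost_per_faultless_chip)


best = select_modules(LIBRARY)
print([module["name"] for module in best], cost_per_faultless_chip(best))
```

With these made-up numbers the selection with the largest area wins, because the smaller modules lose more in yield than they save in cost. A selection based on area alone would not have picked it.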
Junya KAIDA Yuko HARA-AZUMI Takuji HIEDA Ittetsu TANIGUCHI Hiroyuki TOMIYAMA Koji INOUE
This paper studies the static mapping of multiple applications on embedded many-core SoCs. The mapping techniques proposed in this paper take into account both inter-application and intra-application parallelism in order to fully utilize the potential parallelism of the many-core architecture. Two approaches are proposed for static mapping: one approach is based on integer linear programming and the other is based on a greedy algorithm. Experiments show the effectiveness of the proposed techniques.
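A greedy allocation that combines inter-application and intra-application parallelism can be sketched as follows. This is an illustrative heuristic under stated assumptions, not the greedy algorithm of the paper: every application runs concurrently on its own disjoint set of cores, `exec_time[app][k - 1]` is a hypothetical profile giving the execution time of `app` on `k` cores, and the goal is taken to be the overall completion time.

```python
def greedy_core_allocation(exec_time, total_cores):
    """Give spare cores one at a time to the currently slowest application.

    exec_time: dict mapping an application name to a list where entry k - 1
               is its execution time on k cores (hypothetical profile data).
    Returns (allocation, makespan).
    """
    if total_cores < len(exec_time):
        raise ValueError("need at least one core per application")

    alloc = {app: 1 for app in exec_time}  # start with one core each
    spare = total_cores - len(alloc)

    while spare > 0:
        # The application that finishes last determines the makespan.
        bottleneck = max(alloc, key=lambda a: exec_time[a][alloc[a] - 1])
        k = alloc[bottleneck]
        profile = exec_time[bottleneck]
        # Stop when one more core cannot speed up the bottleneck.
        if k >= len(profile) or profile[k] >= profile[k - 1]:
            break
        alloc[bottleneck] = k + 1
        spare -= 1

    makespan = max(exec_time[a][alloc[a] - 1] for a in alloc)
    return alloc, makespan


profiles = {"A": [8, 5, 4, 4], "B": [6, 3, 2, 2], "C": [3, 2, 2, 2]}
allocation, makespan = greedy_core_allocation(profiles, total_cores=6)
print(allocation, makespan)
```

The sketch is fast but carries no optimality guarantee. An integer linear programming formulation explores the same allocation choices exhaustively through a solver.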
Eko Fajar NURPRASETYO Akihiko INOUE Hiroyuki TOMIYAMA Hiroto YASUURA
In the design of an embedded system, the architecture of the core processor strongly affects the performance and cost of the total system. This paper discusses a scalable processor architecture, called a soft-core processor, which can be tuned for a target system. System designers can optimize several design parameters, such as the datapath width and instruction set, and generate customized processors for their application. The design of Bung-DLX as a prototype soft-core processor is presented in this paper. A system design experiment using our processor has shown that the chip area of the optimized processor is halved when the critical path delay is reduced to one third of the original.
Shan DING Hiroyuki TOMIYAMA Hiroaki TAKADA
An advanced communication system, the FlexRay system, has been developed for future automotive applications. It consists of time-triggered clusters, such as drive-by-wire in cars, in order to meet different requirements and constraints between various sensors, processors, and actuators. In this paper, an approach to static scheduling for FlexRay systems is proposed. Our experimental results show that the proposed scheduling method reduces network traffic by up to 36.3% compared with a previous approach.
Energy consumption is one of the most critical constraints in the design of portable embedded systems. This paper describes an empirical study of the impact of compiler optimizations on the energy consumption of the address bus between the processor and instruction memory. Experiments using a number of real-world applications are presented, and the results show that transitions on the instruction address bus can be significantly reduced (by 85% on average) by compiler optimizations together with bus encoding.
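The quantity being reduced, bit transitions on the address bus, can be illustrated with a short sketch. The abstract does not name the encoding that was used; Gray code is chosen here only as a well-known example of bus encoding, and the address trace is hypothetical.

```python
def to_gray(value):
    """Standard binary-reflected Gray code of a non-negative integer."""
    return value ^ (value >> 1)


def count_transitions(addresses, encode=lambda address: address):
    """Count bus lines that toggle between consecutive (encoded) addresses."""
    words = [encode(address) for address in addresses]
    return sum(bin(prev ^ curr).count("1") for prev, curr in zip(words, words[1:]))


# Hypothetical trace: eight sequential instruction fetches.
sequential = list(range(8))
binary_toggles = count_transitions(sequential)
gray_toggles = count_transitions(sequential, to_gray)
print(binary_toggles, gray_toggles)
```

Sequential fetches toggle exactly one line per step under Gray code, which is why code layouts that keep execution sequential combine well with this kind of encoding.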
Yining XU Ittetsu TANIGUCHI Hiroyuki TOMIYAMA
Task mapping is one of the most important design processes in embedded manycore systems. This paper proposes a static task mapping technique for manycore real-time systems. The technique minimizes the number of cores while satisfying deadline constraints of individual tasks.
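The objective of using as few cores as possible while meeting every deadline can be sketched with a first-fit heuristic. This is not the technique of the paper, whose formulation is not given in the abstract. The sketch assumes independent tasks that are all released at time zero, run to completion without preemption, and are described by a hypothetical `(execution_time, deadline)` pair.

```python
def meets_deadlines(tasks):
    """Check one core: run tasks back to back in earliest-deadline-first order."""
    finish = 0
    for execution_time, deadline in sorted(tasks, key=lambda task: task[1]):
        finish += execution_time
        if finish > deadline:
            return False
    return True


def first_fit_mapping(tasks):
    """Place each task on the first core that stays feasible; open a core if none does.

    A heuristic sketch: it gives a valid mapping, not necessarily the minimum
    number of cores.
    """
    cores = []
    for task in sorted(tasks, key=lambda task: task[1]):
        for core in cores:
            if meets_deadlines(core + [task]):
                core.append(task)
                break
        else:
            if not meets_deadlines([task]):
                raise ValueError(f"task {task} cannot meet its deadline on any core")
            cores.append([task])
    return cores


task_set = [(2, 4), (3, 5), (2, 6), (4, 10), (1, 3)]
mapping = first_fit_mapping(task_set)
print(len(mapping), mapping)
```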
Ittetsu TANIGUCHI Kohei AOKI Hiroyuki TOMIYAMA Praveen RAGHAVAN Francky CATTHOOR Masahiro FUKUI
A fast and accurate architecture exploration method for high-performance, low-energy VLIW datapaths is proposed. The main contribution is a method to find Pareto-optimal FU structures, i.e., the optimal number of FUs and the best instruction assignment for each FU. The proposed architecture exploration method is based on a GA and enables effective exploration of a vast solution space. Experimental results showed that the proposed method was able to achieve fast and accurate architecture exploration. In most cases, the estimation error was less than 1%.
Hiroyuki TOMIYAMA Hiroaki TAKADA Nikil D. DUTT
Energy consumption has become one of the most critical constraints in the design of portable multimedia systems. For media applications, address buses between the processor and data memory consume a considerable amount of energy due to their large capacitance and frequent accesses. This paper studies the impact of memory data organization on address bus energy. Our experiments show that address bus activity is reduced by 50% through exploring memory data organization and encoding address buses.
Yuko HARA Hiroyuki TOMIYAMA Shinya HONDA Hiroaki TAKADA
A novel method to efficiently synthesize hardware from a large behavioral description in behavioral synthesis is proposed. For a program with functions executable in parallel, this proposed method determines a behavioral partitioning which simultaneously minimizes the overall datapath area and the complexity of the controller while maximizing performance of a synthesized circuit by fully exploiting function-level parallelism of a behavioral description. This method is formulated as an integer programming problem. Experimental results demonstrate that this method leads to a shift of the explorable design space so that superior solutions which could not be explored by earlier work are included, showing the effectiveness of our proposed method.
Yuko HARA Hiroyuki TOMIYAMA Shinya HONDA Hiroaki TAKADA Katsuya ISHII
This paper proposes a behavioral level partitioning method for efficient behavioral synthesis from a large sequential program consisting of a set of functions. Our method optimally determines functions to be inlined into the main module and the other functions to be synthesized into sub modules in such a way that the overall datapath is minimized while the complexity of individual modules is lower than a certain level. The partitioning problem is formulated as an integer programming problem. Experimental results show the effectiveness of the proposed method.
Hiroki NISHIKAWA Kana SHIMADA Ittetsu TANIGUCHI Hiroyuki TOMIYAMA
With the demand for energy-efficient and high-performance computing, multicore architecture has become more appealing than ever. Multicore task scheduling is one of the domains in parallel computing which exploits the parallelism of multicores. Unlike traditional scheduling, multicore task scheduling has recently been studied on the assumption that tasks have inherent parallelism and can be split into multiple sub-tasks in a data-parallel fashion. However, it is still challenging to properly determine the degree of parallelism of tasks and their mapping onto multicores. Our proposed scheduling techniques determine the degree of parallelism of tasks and decide which type of core each sub-task is assigned to on heterogeneous multicores. In addition, two approaches to hardware/software codesign for heterogeneous multicore systems are proposed. These approaches optimize the types of cores organized in the architecture simultaneously with the scheduling of the tasks, such that the overall energy consumption is minimized under a deadline constraint. A warm start approach is also presented to solve the problem effectively. The experimental results show that the simultaneous scheduling and core-type optimization technique remarkably reduces energy consumption.
Barry SHACKLEFORD Mitsuhiro YASUDA Etsuko OKUSHI Hisao KOIZUMI Hiroyuki TOMIYAMA Hiroto YASUURA
Entire systems on a chip (SOCs) embodying a processor, memory, and system-specific peripheral hardware are now an everyday reality. The current generation of SOC designers is driven more than ever by the need to lower chip cost, while at the same time being faced with demands to get designs to market more quickly. It was to support this new community of designers that we developed Satsuki, an integrated processor synthesis and compiler generation system. By allowing the designer to tune the processor design to the bitwidth and performance required by the application, minimum-cost designs are achieved. Using synthesis to implement the processor in the same technology as the rest of the chip allows for global chip optimization from the perspective of the system as a whole and assures design portability. The integral compiler generator, driven by the same parameters used for processor synthesis, promotes high-level expression of application algorithms while at the same time isolating the application software from the processor implementation. Synthesis experiments incorporating a 0.8 micron CMOS gate array have produced designs ranging from a 45 MHz, 1,500 gate, 8-bit processor with a 4-word register file to a 31 MHz, 9,800 gate, 32-bit processor with a 16-word register file.
Yining XU Yang LIU Junya KAIDA Ittetsu TANIGUCHI Hiroyuki TOMIYAMA
This paper proposes a static application mapping technique, based on integer linear programming, for non-hierarchical manycore embedded systems. Unlike previous work which was designed for hierarchical manycore SoCs, this work allows more flexible application mapping to achieve higher performance. The experimental results show the effectiveness of this work.
Tetsuo YOKOYAMA Gang ZENG Hiroyuki TOMIYAMA Hiroaki TAKADA
Principles for the good design of battery-aware voltage scheduling algorithms for both aperiodic and periodic task sets on dynamic voltage scaling (DVS) systems are presented. The proposed algorithms are based on greedy heuristics suggested by several battery characteristics and Lagrange multipliers. In constructing the proposed algorithms, we make better use of the battery characteristics in the early stage of scheduling. As a consequence, the proposed algorithms show superior results on synthetic examples of periodic and aperiodic tasks, taken from the task sets of the comparative work, on uni- and multi-processor platforms, respectively. In particular, for some large task sets, the proposed algorithms make schedulable task sets that were previously unschedulable due to battery exhaustion.
Hideki TAKASE Gang ZENG Lovic GAUTHIER Hirotaka KAWASHIMA Noritoshi ATSUMI Tomohiro TATEMATSU Yoshitake KOBAYASHI Takenori KOSHIRO Tohru ISHIHARA Hiroyuki TOMIYAMA Hiroaki TAKADA
This paper presents a framework for reducing the energy consumption of embedded real-time systems. We implemented the presented framework as both an optimization toolchain and an energy-aware real-time operating system. The framework consists of the integration of multiple techniques to optimize the energy consumption. The main idea behind our approach is to utilize trade-offs between the energy consumption and the performance of different processor configurations during task checkpoints, and to maintain memory allocation during task context switches. In our framework, a target application is statically analyzed at both intra-task and inter-task levels. Based on these analyzed results, runtime optimization is performed in response to the behavior of the application. A case study shows that our toolchain and real-time operating systems have achieved energy reduction while satisfying the real-time performance. The toolchain has also been successfully applied to a practical application.
The unpredictable behavior of cache memory makes it difficult to statically analyze the worst-case performance of real-time systems. This problem is further exacerbated in the case of preemptive multitask systems because of inter-task cache interference, called Cache-Related Preemption Delay (CRPD). This paper proposes an approach to analyzing the tight upper bound on CRPD which a task might impose on lower-priority tasks. Our method finds the program execution path which requires the maximum number of cache blocks using an integer linear programming technique. Experimental results show that our approach provides up to 69% tighter bounds on CRPD than a conservative approach.
Kana SHIMADA Shogo KITANO Ittetsu TANIGUCHI Hiroyuki TOMIYAMA
Task scheduling is one of the most important processes in the design of multicore computing systems. This paper presents a technique for scheduling malleable tasks. Our scheduling technique simultaneously decides not only the execution order of the tasks but also the number of cores assigned to the individual tasks. We formulate the scheduling problem as an integer linear programming (ILP) problem, and the optimal schedule can be obtained by solving the ILP problem. Experiments using a standard task-set suite demonstrate the strength of this work.
Masanari NISHIMURA Nagisa ISHIURA Yoshiyuki ISHIMORI Hiroyuki KANBARA Hiroyuki TOMIYAMA
This letter presents a novel framework in high-level synthesis where hardware modules synthesized from functions in a given ANSI-C program can call the other software functions in the program. This enables high-level synthesis from C programs that contain calls to hard-to-synthesize functions, such as dynamic memory management, I/O requests, or very large and complex functions. A single-thread implementation scheme is shown, whose correctness has been verified through register transfer level simulation.