1-5hit |
Takayuki AKAMINE Mohamad Sofian ABU TALIP Yasunori OSANA Naoyuki FUJITA Hideharu AMANO
Computational fluid dynamics (CFD) is an important tool for designing aircraft components. FaSTAR (Fast Aerodynamics Routines) is one of the most recent CFD packages and has various subroutines. However, its irregular and complicated data structure makes it difficult to execute FaSTAR on parallel machines due to memory access problem. The use of a reconfigurable platform based on field programmable gate arrays (FPGAs) is a promising approach to accelerating memory-bottlenecked applications like FaSTAR. However, even with hardware execution, a large number of pipeline stalls can occur due to read-after-write (RAW) data hazards. Moreover, it is difficult to predict when such stalls will occur because of the unstructured mesh used in FaSTAR. To eliminate this problem, we developed an out-of-order mechanism for permuting the data order so as to prevent RAW hazards. It uses an execution monitor and a wait buffer. The former identifies the state of the computation units, and the latter temporarily stores data to be processed in the computation units. This out-of-order mechanism can be applied to various types of computations with data dependency by changing the number of execution monitors and wait buffers in accordance with the equations used in the target computation. An out-of-order system can be reconfigured by automatic changing of the parameters. Application of the proposed mechanism to five subroutines in FaSTAR showed that its use reduces the number of stalls to less than 1% compared to without the mechanism. In-order execution was speeded up 2.6-fold and software execution was speeded up 2.9-fold using an Intel Core 2 Duo processor with a reasonable amount of overhead.
Mohamad Sofian ABU TALIP Takayuki AKAMINE Yasunori OSANA Naoyuki FUJITA Hideharu AMANO
Computational Fluid Dynamics (CFD) is used as a common design tool in the aerospace industry. UPACS, a package for CFD, is convenient for users, since a customized simulator can be built just by selecting desired functions. The problem is its computation speed, which is difficult to enhance by using the clusters due to its complex memory access patterns. As an economical solution, accelerators using FPGAs are hopeful candidate. However, the total scale of UPACS is too large to be implemented on small numbers of FPGAs. For cost efficient implementation, partial reconfiguration which dynamically loads only required functions is proposed in this paper. Here, the MUSCL scheme, which is used frequently in UPACS, is selected as a target. Partial reconfiguration is applied to the flux limiter functions (FLF) in MUSCL. Four FLFs are implemented for Turbulence MUSCL (TMUSCL) and eight FLFs are for Convection MUSCL (CMUSCL). All FLFs are developed independently and separated from the top MUSCL module. At start-up, only required FLFs are selected and deployed in the system without interfering the other modules. This implementation has successfully reduced the resource utilization by 44% to 63%. Total power consumption also reduced by 33%. Configuration speed is improved by 34-times faster as compared to full reconfiguration method. All implemented functions achieved at least 17 times speed-up performance compared with the software implementation.
Hideyuki ITO Ryusuke KONISHI Hiroshi NAKADA Hideyuki TSUBOI Yuichi OKUYAMA Akira NAGOYA
Design points and the results seen in the development of a dynamically reconfigurable logic LSI, PCA-2, are described. PCA-2 enables the realization of flexible parallel processing based on the autonomous reconfiguration of logic circuits. To realize this feature, we introduce an asynchronous circuit design and a homogeneous cell array structure. PCA-2 represents an advance on the earlier LSI, PCA-1. Cutting edge CMOS technology is used to realize the structural merits of PCA hardware. Compared to PCA-1, PCA-2 offers 16 times greater integration level for programmable logic. Due to miniaturization and design refinement, PCA-2 provides a 6-fold increase in the circuit frequency of the configuration controller and a 3-fold increase in the operating frequency of the programmable logic. The results gained confirm the effects of refinement and the suitability of our architecture for device miniaturization.
Isamu KAJITANI Masaya IWATA Nobuyuki OTSU Tetsuya HIGUCHI
This paper presents a new reconfigurable hardware paradigm, called evolvable hardware (EHW), and its application to the biomedical engineering problem of an artificial hand controller. Evolvable hardware is based on the idea of combining a reconfigurable hardware device with an artificial intelligence robust search technique called genetic algorithms (GAs) to execute reconfiguration autonomously. The first version of the EHW chip was designed in 1998, and this paper describes the latest improvements to the EHW chip, as well as outlining its architecture and the hardware implementation of the GA operations. Execution speed for genetic operations is shown to be about 38.7 times faster with the hardware implementation than with software program running on an AMD Athlon processor (1.2GHz). As an application of the EHW chip, this paper introduces a controller for a multi-functional prosthetic-hand, and presents experimental data in which a practical myoelectric pattern classification rate of 97.8% was achieved through the application of the EHW chip.
Hidehisa NAGANO Akihiro MATSUURA Akira NAGOYA
This paper proposes a method for implementing a metric computation accelerator for fractal image compression using reconfigurable hardware. The most time-consuming part in the encoding of this compression is computation of metrics among image blocks. In our method, each processing element (PE) configured for an image block accelerates these computations by pipeline processing. Furthermore, by configuring the PE for a specific image block, we can reduce the number of adders, which are the main computing elements, by a half even in the worst case.