IEICE global.ieice.org Site

Keyword Search Result

[Keyword] field programmable gate array (FPGA)(13hit)

1-13hit

Resource Efficient Top-K Sorter on FPGA
Binhao HE Meiting XUE Shubiao LIU Feng YU Weijie CHEN

LETTER-Digital Signal Processing

Pubricized:
2022/03/02
Vol:
E105-A No:9
Page(s):
1372-1376
The top-K sorting is a variant of sorting used heavily in applications such as database management systems. Recently, the use of field programmable gate arrays (FPGAs) to accelerate sorting operation has attracted the interest of researchers. However, existing hardware top-K sorting algorithms are either resource-intensive or of low throughput. In this paper, we present a resource-efficient top-K sorting architecture that is composed of L cascading sorting units, and each sorting unit is composed of P sorting cells. K=PL largest elements are produced when a variable length input sequence is processed. This architecture can operate at a high frequency while consuming fewer resources. The experimental results show that our architecture achieved a maximum 1.2x throughput-to-resource improvement compared to previous studies.
Reconfigurable Out-of-Order System for Fluid Dynamics Computation Using Unstructured Mesh
Takayuki AKAMINE Mohamad Sofian ABU TALIP Yasunori OSANA Naoyuki FUJITA Hideharu AMANO

PAPER-Computer System

Vol:
E97-D No:5
Page(s):
1225-1234
Computational fluid dynamics (CFD) is an important tool for designing aircraft components. FaSTAR (Fast Aerodynamics Routines) is one of the most recent CFD packages and has various subroutines. However, its irregular and complicated data structure makes it difficult to execute FaSTAR on parallel machines due to memory access problem. The use of a reconfigurable platform based on field programmable gate arrays (FPGAs) is a promising approach to accelerating memory-bottlenecked applications like FaSTAR. However, even with hardware execution, a large number of pipeline stalls can occur due to read-after-write (RAW) data hazards. Moreover, it is difficult to predict when such stalls will occur because of the unstructured mesh used in FaSTAR. To eliminate this problem, we developed an out-of-order mechanism for permuting the data order so as to prevent RAW hazards. It uses an execution monitor and a wait buffer. The former identifies the state of the computation units, and the latter temporarily stores data to be processed in the computation units. This out-of-order mechanism can be applied to various types of computations with data dependency by changing the number of execution monitors and wait buffers in accordance with the equations used in the target computation. An out-of-order system can be reconfigured by automatic changing of the parameters. Application of the proposed mechanism to five subroutines in FaSTAR showed that its use reduces the number of stalls to less than 1% compared to without the mechanism. In-order execution was speeded up 2.6-fold and software execution was speeded up 2.9-fold using an Intel Core 2 Duo processor with a reasonable amount of overhead.
Partial Reconfiguration of Flux Limiter Functions in MUSCL Scheme Using FPGA
Mohamad Sofian ABU TALIP Takayuki AKAMINE Yasunori OSANA Naoyuki FUJITA Hideharu AMANO

PAPER-Computer System

Vol:
E95-D No:10
Page(s):
2369-2376
Computational Fluid Dynamics (CFD) is used as a common design tool in the aerospace industry. UPACS, a package for CFD, is convenient for users, since a customized simulator can be built just by selecting desired functions. The problem is its computation speed, which is difficult to enhance by using the clusters due to its complex memory access patterns. As an economical solution, accelerators using FPGAs are hopeful candidate. However, the total scale of UPACS is too large to be implemented on small numbers of FPGAs. For cost efficient implementation, partial reconfiguration which dynamically loads only required functions is proposed in this paper. Here, the MUSCL scheme, which is used frequently in UPACS, is selected as a target. Partial reconfiguration is applied to the flux limiter functions (FLF) in MUSCL. Four FLFs are implemented for Turbulence MUSCL (TMUSCL) and eight FLFs are for Convection MUSCL (CMUSCL). All FLFs are developed independently and separated from the top MUSCL module. At start-up, only required FLFs are selected and deployed in the system without interfering the other modules. This implementation has successfully reduced the resource utilization by 44% to 63%. Total power consumption also reduced by 33%. Configuration speed is improved by 34-times faster as compared to full reconfiguration method. All implemented functions achieved at least 17 times speed-up performance compared with the software implementation.
FPS-RAM: Fast Prefix Search RAM-Based Hardware for Forwarding Engine
Kazuya ZAITSU Koji YAMAMOTO Yasuto KURODA Kazunari INOUE Shingo ATA Ikuo OKA

PAPER-Network System

Vol:
E95-B No:7
Page(s):
2306-2314
Ternary content addressable memory (TCAM) is becoming very popular for designing high-throughput forwarding engines on routers. However, TCAM has potential problems in terms of hardware and power costs, which limits its ability to deploy large amounts of capacity in IP routers. In this paper, we propose new hardware architecture for fast forwarding engines, called fast prefix search RAM-based hardware (FPS-RAM). We designed FPS-RAM hardware with the intent of maintaining the same search performance and physical user interface as TCAM because our objective is to replace the TCAM in the market. Our RAM-based hardware architecture is completely different from that of TCAM and has dramatically reduced the costs and power consumption to 62% and 52%, respectively. We implemented FPS-RAM on an FPGA to examine its lookup operation.
Hardware Implementation of an Inverse Function Delayed Neural Network Using Stochastic Logic
Hongge LI Yoshihiro HAYAKAWA Shigeo SATO Koji NAKAJIMA

PAPER-Biocybernetics, Neurocomputing

Vol:
E89-D No:9
Page(s):
2572-2578
In this paper, the authors present a new digital circuit of neuron hardware using a field programmable gate array (FPGA). A new Inverse function Delayed (ID) neuron model is implemented. The Inverse function Delayed model, which includes the BVP model, has superior associative properties thanks to negative resistance. An associative memory based on the ID model with self-connections has possibilities of improving its basin sizes and memory capacity. In order to decrease circuit area, we employ stochastic logic. The proposed neuron circuit completes the stimulus response output, and its retrieval property with negative resistance is superior to a conventional nonlinear model in basin size of an associative memory.
Prototype Implementation of Real-Time ML Detectors for Spatial Multiplexing Transmission
Toshiaki KOIKE Yukinaga SEKI Hidekazu MURATA Susumu YOSHIDA Kiyomichi ARAKI

PAPER-Wireless Communication Technologies

Vol:
E89-B No:3
Page(s):
845-852
We developed two types of practical maximum-likelihood detectors (MLD) for multiple-input multiple-output (MIMO) systems, using a field programmable gate array (FPGA) device. For implementations, we introduced two simplified metrics called a Manhattan metric and a correlation metric. Using the Manhattan metric, the detector needs no multiplication operations, at the cost of a slight performance degradation within 1 dB. Using the correlation metric, the MIMO-MLD can significantly reduce the complexity in both multiplications and additions without any performance degradation. This paper demonstrates the bit-error-rate performance of these MLD prototypes at a 1 Gbps-order real-time processing speed, through the use of an all-digital baseband 44 MIMO testbed integrated on the same FPGA chip.
SPFD-Based Flexible Transformation of LUT-Based FPGA Circuits
Katsunori TANAKA Shigeru YAMASHITA Yahiko KAMBAYASHI

PAPER-VLSI Design Technology and CAD

Vol:
E88-A No:4
Page(s):
1038-1046
In this paper, we present the condition for the effective wire addition in Look-Up-Table-based (LUT-based) field programmable gate array (FPGA) circuits, and an optimization procedure utilizing the effective wire addition. Each wire has different characteristics, such as delay and power dissipation. Therefore, the replacement of one critical wire for the circuit performance with many non-critical ones, i.e., many-addition-for-one-removal (m-for-1) is sufficiently useful. However, the conventional logic optimization methods based on sets of pairs of functions to be distinguished (SPFDs) for LUT-based FPGA circuits do not make use of the m-for-1 manipulation, and perform only simple replacement and removal, i.e., the one-addition-for-one-removal (1-for-1) manipulation and the no-addition-for-one-removal (0-for-1) manipulation, respectively. Since each LUT can realize an arbitrary internal function with respect to a specified number of input variables, there is no sufficient condition at the logic design level for simple wire addition. Moreover, in general, simple addition of a wire has no effects for removal of another wire, and it is important to derive the condition for non-simple and effective wire addition. We found the SPFD-based condition that wire addition is likely to make another wire redundant or replaceable, and developed an optimization procedure utilizing this effective wire addition. According to the experimental results, when we focused on the delay reduction of LUT-based FPGA circuits, our method reduced the delay by 24.2% from the initial circuits, while the conventional SPFD-based logic optimization and the enhanced global rewiring reduced it by 14.2% and 18.0%, respectively. Thus, our method presented in this paper is sufficiently practical, and is expected to improve the circuit performance.
Secure Download System Based on Software Defined Radio Composed of FPGAs
Hironori UCHIKAWA Kenta UMEBAYASHI Ryuji KOHNO

PAPER

Vol:
E85-B No:12
Page(s):
2601-2609
In this paper, we focus attention on the development of security techniques using software defined radio (SDR) technologies. We propose a new secure download system which uses the characteristics of the field programmable gate arrays (FPGAs) composing the SDR. The proposed system has the novelty that realization of high security encipherment is possible. This is achieved using the characteristic of FPGAs which allows systems to be arranged in a variety of different layouts, as well as by using the configuration information as the key. This unifies the renewal of the key and the encipherment. In addition the proposed system has the merit that it has high security against illegal acquisition such as a wiretapping, and can also be used in conjunction with any other current cipher algorithm. As an evaluation of the security, we show that the proposed system has high immunity to illegal acquisition of software using replay attack, by verification of the protocol as well as by numerical computation. The proposed system can therefore realize high security software downloads based on SDR.
Look Up Table Compaction Based on Folding of Logic Functions
Shinji KIMURA Atsushi ISHII Takashi HORIYAMA Masaki NAKANISHI Hirotsugu KAJIHARA Katsumasa WATANABE

PAPER-Logic Synthesis

Vol:
E85-A No:12
Page(s):
2701-2707
The paper describes the folding method of logic functions to reduce the size of memories to keep the functions. The folding is based on the relation of fractions of logic functions. If the logic function includes 2 or 3 same parts, then only one part should be kept and other parts can be omitted. We show that the logic function of 1-bit addition can be reduced to half size using the bit-wise NOT relation and the bit-wise OR relation. The paper also introduces 3-1 LUT's with the folding mechanism. A full adder can be implemented using only one 3-1 LUT with the folding. Multi-bit AND and OR operations can be mapped to our LUT's not using the extra cascading circuit but using the carry circuit for addition. We have also tested the mapping capability of 4 input functions to our 3-1 LUT's with folding and carry propagation mechanisms. We have shown the reduction of the area consumption when using our LUT's compared to the case using 4-1 LUT's on several benchmark circuits.
A Routability Driven Technology Mapping Algorithm for LUT Based FPGA Designs
Chi-Chou KAO Yen-Tai LAI

PAPER-FPGA Systhesis

Vol:
E84-A No:11
Page(s):
2690-2696
This paper presents a CAD technology mapping algorithm for LUT-based FPGAs. Since interconnections in an FPGA must be accomplished with limited routing resources, routability is the most important objective in a technology mapping algorithm. To optimize routability, the goal of the algorithm is the production of a design with a minimum interconnection. The Min-cut algorithm is first used to partition a graph representing a Boolean network into clusters so that the total number of interconnections between clusters is minimum. To decrease further the number of interconnections needed, clusters are then merged into larger clusters by a pairing technique. This algorithm has been tested on the MCNC benchmark circuits. Compared with other LUT-based FPGA mapping algorithms, the algorithm produces better routability characteristics.
A General Framework to Use Various Decomposition Methods for LUT Network Synthesis
Shigeru YAMASHITA Hiroshi SAWADA Akira NAGOYA

PAPER-VLSI Design Technology and CAD

Vol:
E84-A No:11
Page(s):
2915-2922
This paper presents a new framework for synthesizing look-up table (LUT) networks. Some of the existing LUT network synthesis methods are based on one or two functional (Boolean) decompositions. Our method also uses functional decompositions, but we try to use various decomposition methods, which include algebraic decompositions. Therefore, this method can be thought of as a general framework for synthesizing LUT networks by integrating various decomposition methods. We use a cost database file which is a unique characteristic in our method. We also present comparisons between our method and some well-known LUT network synthesis methods, and evaluate the final results after placement and routing. Although our method is rather heuristic in nature, the experimental results are encouraging.
Defect and Fault Tolerance SRAM-Based FPGAs by Shifting the Configuration Data
Abderrahim DOUMAR Hideo ITO

PAPER-Fault Tolerance

Vol:
E83-D No:5
Page(s):
1104-1115
The homogeneous structure of field programmable gate arrays (FPGAs) suggests that the defect tolerance can be achieved by shifting the configuration data inside the FPGA. This paper proposes a new approach for tolerating the defects in FPGA's configurable logic blocks (CLBs). The defects affecting the FPGA's interconnection resources can also be tolerated with a high probability. This method is suited for the makers, since the yield of the chip is considerably improved, specially for large sizes. On the other hand, defect-free chips can be used as either maximum size, ordinary array chips or fault tolerant chips. In the fault tolerant chips, the users will be able to achieve directly the fault tolerance by only shifting the design data automatically, without changing the physical design of the running application, without loading other configurations data from the off-chip FPGA, and without the intervention of the company. For tolerating defective resources, the use of spare CLBs is required. In this paper, two possibilities for distributing the spare resources (king-shifting and Horse-allocation) are introduced and compared.
Fast Testable Design for SRAM-Based FPGAs
Abderrahim DOUMAR Toshiaki OHMAMEUDA Hideo ITO

PAPER-Fault Tolerance

Vol:
E83-D No:5
Page(s):
1116-1127
This paper presents a new design for testing SRAM-based field programmable gate arrays (FPGAs). The original FPGA's SRAM memory is modified so that the FPGA may have the facility to loop the testing configuration data inside the chip. The full testing of the FPGA is achieved by loading typically only one carefully chosen testing configuration data instead of the whole configurations data. The other required configurations data are obtained by shifting the first one inside the chip. As a result, the test becomes faster. This method does not need a large off-chip memory for the test. The evaluation results prove that this method is very effective when the complexity of the configurable blocks (CLBs) or the chip size increases.