IEICE global.ieice.org Site

Keyword Search Result

[Keyword] fpga(330hit)

181-200hit(330hit)

Accelerating Boolean Matching Using Bloom Filter
Chun ZHANG Yu HU Lingli WANG Lei HE Jiarong TONG

PAPER-VLSI Design Technology and CAD

Vol: E93-A No:10
Page(s): 1775-1781
Boolean matching is a fundamental problem in FPGA synthesis, but existing Boolean matchers do not scale to complex PLBs (programmable logic blocks) or large circuits. This paper proposes a filter-based Boolean matching method, F-BM, which accelerates Boolean matching using lookup tables implemented as Bloom filters that store pre-calculated matching results. To show the effectiveness of F-BM, an area-minimizing post-mapping re-synthesizer that uses Boolean matching as its kernel has been implemented. Tested on a broad selection of benchmarks, the re-synthesizer using F-BM is 80 times faster, with only 0.5% more area, than one using a SAT-based Boolean matcher.
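F-BM stores pre-calculated matching results in Bloom filters. As a minimal sketch of that data structure only (not the authors' implementation), the Python class below answers membership queries with no false negatives but possible false positives. The salted SHA-256 hashing, the parameters `m` and `k`, and the use of 8-bit truth-table strings as keys are illustrative assumptions; the abstract does not say how F-BM encodes its keys or handles false positives.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed by k salted SHA-256 hashes."""

    def __init__(self, m: int = 1024, k: int = 4) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions by hashing the key with k different salts.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely not stored"; True means "probably stored".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


# Hypothetical use: remember 3-input truth tables (as 8-bit strings) that are
# known to be implementable by some PLB configuration.
matchable = BloomFilter()
for tt in ["10010110", "11101000", "00000001"]:
    matchable.add(tt)

print(matchable.might_contain("10010110"))  # True: inserted keys are never missed
```

Because a positive answer is only probabilistic, a general Bloom-filter lookup trades a small false-positive rate for a compact, constant-time table.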
FPGA Implementation of STBC Based Cooperative Relaying System
Hidekazu MURATA Yuji OISHI Koji YAMAMOTO Susumu YOSHIDA

PAPER

Vol: E93-B No:8
Page(s): 1988-1992
A multihop network uses distributed wireless stations as relays, which can improve area size, coverage, and total transmit power efficiency. Computer simulations show that the cooperative relaying scheme provides a transmit diversity effect and offers much better performance than the non-cooperative case. To confirm this performance in actual environments, field trials using real-time communication equipment are now being planned. This paper reports the design and performance of the wireless equipment built for these field trials.
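The abstract does not name the space-time block code used. Assuming the widely used two-branch Alamouti code, with two cooperating stations acting as the two transmit antennas, the sketch below shows encoding over two symbol periods and the linear combining that yields the diversity gain |h1|² + |h2|². Noise is omitted, channel gains are assumed known at the receiver, and the function names are hypothetical.

```python
def alamouti_transmit(s1: complex, s2: complex, h1: complex, h2: complex):
    """Received samples over two symbol periods (noise-free).

    Period 1: station 1 sends s1, station 2 sends s2.
    Period 2: station 1 sends -conj(s2), station 2 sends conj(s1).
    """
    r1 = h1 * s1 + h2 * s2
    r2 = h1 * (-s2.conjugate()) + h2 * s1.conjugate()
    return r1, r2


def alamouti_combine(r1: complex, r2: complex, h1: complex, h2: complex):
    """Linear combining: each estimate is (|h1|^2 + |h2|^2) * symbol, then normalized."""
    gain = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (h1.conjugate() * r1 + h2 * r2.conjugate()) / gain
    s2_hat = (h2.conjugate() * r1 - h1 * r2.conjugate()) / gain
    return s1_hat, s2_hat


h1, h2 = 0.8 - 0.3j, -0.2 + 0.9j     # example flat-fading gains (assumed)
s1, s2 = 1 + 1j, -1 + 1j             # QPSK symbols
r1, r2 = alamouti_transmit(s1, s2, h1, h2)
print(alamouti_combine(r1, r2, h1, h2))  # approximately ((1+1j), (-1+1j))
```

Because both symbols are recovered with the combined gain of both paths, a deep fade on one station's link does not by itself erase the symbol, which is the transmit diversity effect the abstract refers to.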
Theoretical and Heuristic Synthesis of Digital Spiking Neurons for Spike-Pattern-Division Multiplexing
Tetsuro IGUCHI Akira HIRATA Hiroyuki TORIKAI

PAPER-Nonlinear Problems

Vol: E93-A No:8
Page(s): 1486-1496
A digital spiking neuron is a wired system of shift registers that can generate spike-trains with various spike patterns by adjusting the wiring pattern between the registers. Inspired by ultra-wideband impulse radio, a novel theoretical synthesis method for the neuron is presented, targeting spike-pattern-division multiplex communications in an artificial pulse-coupled neural network. A novel heuristic learning algorithm for the neuron, aimed at better communication performance, is also presented. In addition, fundamental comparisons with existing impulse-radio sequence design methods are given.
A Low Power and High Throughput Self Synchronous FPGA Using 65 nm CMOS with Throughput Optimization by Pipeline Alignment
Benjamin STEFAN DEVLIN Toru NAKURA Makoto IKEDA Kunihiro ASADA

PAPER-VLSI Design Technology and CAD

Vol: E93-A No:7
Page(s): 1319-1328
We detail a self-synchronous field programmable gate array (SSFPGA) with a dual-pipeline (DP) architecture that conceals the pre-charge time of dynamic logic, and its throughput optimization using pipeline alignment, evaluated on benchmark circuits. A self-synchronous LUT (SSLUT) consists of a three-input tree-type structure with 8 bits of SRAM for programming. A self-synchronous switch box (SSSB) consists of both pass transistors and buffers to route signals, with 12 bits of SRAM. One common block with one SSLUT and one SSSB occupies 2.2 Mλ² of area with 35 bits of SRAM, and the prototype SSFPGA with 34×30 (1,020) blocks is designed and fabricated in 65 nm CMOS. Measurements at 1.2 V show 430 MHz and 647 MHz operation for a 3-bit ripple-carry adder without and with throughput optimization, respectively. Using the proposed pipeline alignment techniques, the SSFPGA operates at the maximum throughput of 647 MHz on various benchmarks. We demonstrate throughput improvements of up to 56.1 times with these techniques. Pipeline alignment is carried out within the available logic elements in the array and pipeline buffers in the switching matrix.
A High Throughput Medium Access Control Implementation Based on IEEE 802.11e Standard
Min Li HUANG Jin LEE Hendra SETIAWAN Hiroshi OCHI Sin-Chong PARK

PAPER-Terrestrial Radio Communications

Vol: E93-B No:4
Page(s): 948-960
With the growing demand for high-performance multimedia applications over wireless channels, a Medium Access Control (MAC) system that supports high throughput and quality-of-service enhancements is needed. This paper presents an analysis of the standard, the design architecture, and the design issues leading to the implementation of an IEEE 802.11e-based MAC system that supports MAC throughput of over 100 Mbps. To meet the MAC-layer timing constraints, a hardware/software co-design approach is adopted. The proposed MAC architecture is implemented on a Xilinx Virtex-II Pro Field-Programmable Gate Array (FPGA) (XC2VP70-5FF1704C) prototype and connected to a host computer through an external Universal Serial Bus (USB) interface. The design uses 11,508 of the 33,088 available FPGA slices (34%). The measured MAC throughput is 100.7 Mbps and 109.2 Mbps for the voice and video access categories, respectively, transmitted at a data rate of 260 Mbps over an IEEE 802.11n Physical Layer (PHY) using the contention-based hybrid coordination function channel access mechanism.
A Fast and Memory Efficient SPIHT Image Encoder
Zhong-Ho CHEN Alvin W. Y. SU

PAPER-Image Processing and Video Processing

Vol: E93-D No:3
Page(s): 602-610
Errata uploaded on May 1, 2010.
Set partitioning in hierarchical trees (SPIHT) is a well-known image compression scheme. SPIHT offers a good compression ratio and produces an embedded bit-stream for progressive transmission. However, its major disadvantage is its large memory requirement. In this paper, we propose a memory-efficient SPIHT image coder and its parallel implementation. The memory requirement is reduced without sacrificing image quality, and all bit-planes are encoded concurrently to speed up the entire coding flow. Results show that the proposed algorithm is roughly 6 times faster than the original SPIHT. For a 512×512 image, the memory requirement is reduced from 5.83 Mb to 491 Kb. The proposed algorithm is also realized on an FPGA. With a pipelined design, the circuit runs at 110 MHz and encodes a 512×512 image in 1.438 ms. The circuit thus achieves a throughput of 182 MPixels/s, making it suitable for high-performance image compression applications.
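The reported FPGA figures are mutually consistent, as a quick back-of-the-envelope check shows. The script below only restates the abstract's numbers: a 512×512 image encoded in 1.438 ms corresponds to about 182 MPixels/s, and at 110 MHz that time is roughly 158,000 clock cycles, i.e. about 0.6 cycles per pixel, which is consistent with the concurrent bit-plane encoding the abstract describes.

```python
# Back-of-the-envelope check of the SPIHT encoder figures quoted above.
width = height = 512
pixels = width * height              # 262,144 pixels
encode_time_s = 1.438e-3             # reported encoding time per image
clock_hz = 110e6                     # reported circuit clock

throughput_mpix = pixels / encode_time_s / 1e6
cycles = clock_hz * encode_time_s
cycles_per_pixel = cycles / pixels

print(f"{throughput_mpix:.1f} MPixels/s")  # 182.3 MPixels/s
print(f"{cycles:,.0f} cycles, {cycles_per_pixel:.2f} cycles/pixel")
```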
Mapping Parallel FFT Algorithm onto SmartCell Coarse-Grained Reconfigurable Architecture
Cao LIANG Xinming HUANG

PAPER-Integrated Electronics

Vol: E93-C No:3
Page(s): 407-415
The Fast Fourier Transform (FFT) is an important algorithm in many digital signal processing applications, and it often requires a parallel implementation for high throughput. In this paper, we first present SmartCell, a coarse-grained reconfigurable architecture targeted at stream processing. A SmartCell prototype integrates 64 processing elements, configurable interconnections, and dedicated instruction and data memories into a single chip, providing high-performance parallel processing while maintaining post-fabrication flexibility. We then present a parallel FFT algorithm targeted at multi-core computing platforms, which uses an optimized data flow pattern that reduces both communication and configuration overheads. Finally, the proposed parallel FFT algorithm is mapped onto the SmartCell prototype device. Results show that the parallel FFT implementation on SmartCell is about 14.9 and 2.7 times faster than network-on-chip (NoC) and MorphoSys implementations, respectively. SmartCell also achieves energy-efficiency gains of 2.1 and 28.9 times compared with FPGA and DSP implementations, respectively.
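For readers unfamiliar with the data flow being parallelized, here is a generic iterative radix-2 decimation-in-time FFT in Python. It is not the SmartCell mapping described in the paper; it only shows the bit-reversal reordering followed by log2(N) stages of independent butterflies, which is the kind of structure parallel FFT implementations distribute across processing elements.

```python
import cmath

def fft_radix2(x):
    """Iterative radix-2 DIT FFT; len(x) must be a power of two."""
    n = len(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]

    # Bit-reversal permutation of the input.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(n) stages; within a stage all n/2 butterflies are independent.
    size = 2
    while size <= n:
        w_step = cmath.exp(-2j * cmath.pi / size)
        half = size // 2
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
                w *= w_step
        size *= 2
    return a


print([round(abs(v), 6) for v in fft_radix2([1, 0, 0, 0])])  # [1.0, 1.0, 1.0, 1.0]
```

The independence of butterflies within a stage is what makes the transform amenable to parallel hardware; the communication between stages is the overhead that a mapping strategy has to manage.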
High-Speed Passphrase Search System for PGP
Koichi SHIMIZU Daisuke SUZUKI Toyohiro TSURUMARU

PAPER-Application

Vol: E93-A No:1
Page(s): 202-209
We propose an FPGA-based high-speed search system for cryptosystems that employ a passphrase-based security scheme. We first choose PGP as an example of such cryptosystems, overcome several obstacles to high throughput, and develop a high-speed search system for it. As a result, we achieve a throughput of 1.1×10⁵ passphrases per second, 38 times the speed of the fastest software. Furthermore, because passphrase generation is assigned to software, flexible generation methods beyond simple brute force are possible. We implement both brute-force and dictionary-based generation and obtain the same maximum throughput in both cases. We next consider the speed of passphrase generation in order to apply our system to cryptosystems other than PGP, and implement a hardware passphrase generator to achieve higher throughput. In the PGP case, the very heavy hash iteration (1,025 times in our case) lowers the total throughput linearly, so a rate of 1.1×10⁵ passphrases per second suffices. In other cases without such an iteration structure, far more passphrases must be generated, for example 10⁸ per second. This can easily exceed the generation speed that software can offer, so we conclude that passphrase generation must now be placed in hardware instead of software.
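The argument for hardware passphrase generation is a rate calculation, sketched below with the abstract's numbers. It assumes, as a simplification, that one hash iteration is the unit of work and that hashing a passphrase once in a non-iterated scheme costs about the same as one iteration here. Under that assumption, 1.1×10⁵ passphrases per second at 1,025 iterations each is about 10⁸ iterations per second, so a target without iteration would need new passphrases at roughly that rate.

```python
# Rate arithmetic behind the hardware-generation argument (figures from the abstract).
passphrases_per_s = 1.1e5      # measured PGP search throughput
hash_iterations = 1025         # hash iterations per passphrase in the PGP setting

work_per_s = passphrases_per_s * hash_iterations   # hash iterations per second
print(f"{work_per_s:.3g} hash iterations/s")        # 1.13e+08

# Assumption: a target hashed once per passphrase keeps the same hashing
# capacity busy only if ~this many new passphrases arrive per second.
print(f"~{work_per_s:.0e} passphrases/s")           # ~1e+08
```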
Efficient Cut Enumeration Heuristics for Depth-Optimum Technology Mapping for LUT-Based FPGAs
Taiga TAKATA Yusuke MATSUNAGA

PAPER-Embedded, Real-Time and Reconfigurable Systems

Vol: E92-A No:12
Page(s): 3268-3275
Recent technology mappers for LUT-based FPGAs employ cut enumeration. Although many cuts are often needed to find a good network, enumerating all cuts of large size consumes a lot of run-time. Existing algorithms employ bottom-up merging, which computes the Cartesian product of the fanins' cuts at each node. In most cases, the number of cuts is much smaller than the size of the Cartesian product, so the existing algorithms are inefficient. Furthermore, the number of cuts increases exponentially with cut size, which makes the run-time much longer. Several algorithms that enumerate only a subset of the cuts have been presented, but they tend to degrade the quality of the resulting networks. This paper presents two algorithms for enumerating cuts: an exhaustive enumeration and a partial enumeration. Both are efficient because they do not employ bottom-up merging. The partial enumeration reduces the number of enumerated cuts while guaranteeing that a depth-minimum network can still be constructed. Experimental results show that the exhaustive enumeration runs about 5 and 13 times faster than the existing bottom-up algorithm for K = 8 and 9, respectively, while producing the same results. The partial enumeration runs about 9 and 29 times faster than the existing algorithm for K = 8 and 9, respectively. The average area of networks derived from the cuts found by the partial enumeration is only 4% larger than that derived using all cuts, and the depth is the same.
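As a reference point only, here is a minimal Python sketch of the conventional bottom-up K-feasible cut enumeration that the abstract says its algorithms avoid; it is the baseline, not the proposed method. Each node's cuts are built from the Cartesian product of its fanins' cut sets, keeping unions with at most K leaves, plus the trivial cut. Dominated cuts are not pruned, and the tiny network and node names are hypothetical.

```python
from itertools import product

def enumerate_cuts(fanins, order, k):
    """Bottom-up K-feasible cut enumeration.

    fanins: dict node -> list of fanin nodes (primary inputs map to []).
    order:  nodes in topological order.
    Returns dict node -> set of cuts, each cut a frozenset of leaf nodes.
    """
    cuts = {}
    for node in order:
        node_cuts = {frozenset([node])}            # trivial cut
        if fanins[node]:
            # Cartesian product of the fanins' cut sets; most combinations
            # may be discarded as too large or duplicated.
            for combo in product(*(cuts[f] for f in fanins[node])):
                merged = frozenset().union(*combo)
                if len(merged) <= k:
                    node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts


# Hypothetical network: g1 = f(a, b), g2 = f(b, c), g3 = f(g1, g2).
fanins = {"a": [], "b": [], "c": [], "g1": ["a", "b"], "g2": ["b", "c"], "g3": ["g1", "g2"]}
order = ["a", "b", "c", "g1", "g2", "g3"]
cuts = enumerate_cuts(fanins, order, k=3)
print(sorted(sorted(c) for c in cuts["g3"]))
# [['a', 'b', 'c'], ['a', 'b', 'g2'], ['b', 'c', 'g1'], ['g1', 'g2'], ['g3']]
```

With `k=2`, the same product of four combinations yields only one non-trivial cut, illustrating the abstract's point that the Cartesian product can be much larger than the set of cuts actually kept.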
Pipelining a Multi-Mode SHA-384/512 Core with High Area Performance Rate
Anh-Tuan HOANG Katsuhiro YAMAZAKI Shigeru OYANAGI

PAPER-VLSI Systems

Vol: E92-D No:10
Page(s): 2034-2042
The Secure Hash Algorithm 512 (SHA-512), which is used to verify the integrity of a message, involves computational iterations on data. The long computation delay of these iterations limits the overall throughput of the system and makes the computation difficult to pipeline. We describe a way to pipeline the computation using fine-grained pipelining with balanced critical paths. In this method, one critical path is broken into two stages using data forwarding, and the other is broken into three stages using computation postponement. The resulting critical paths all have two adder layers with some data movement, and are thus balanced. The method also allows register reduction. In addition, the similarity between SHA-384 and SHA-512 is exploited in a multi-mode design that generates message digests for both versions with the same throughput and only a small increase in hardware size. Experimental results show that our implementation achieves not only the best area performance rate (throughput divided by area) but also a higher throughput than almost all related work.
PAMELA: Pattern Matching Engine with Limited-Time Update for NIDS/NIPS
Tran Ngoc THINH Surin KITTITORNKUN Shigenori TOMIYAMA

PAPER-VLSI Systems

Vol: E92-D No:5
Page(s): 1049-1061
Several hardware-based pattern matching engines for network intrusion detection/prevention systems (NIDS/NIPSs) can achieve high throughput with fewer hardware resources. However, their flexibility in updating with new patterns is limited, and this remains challenging. This paper describes PAMELA, a PAttern Matching Engine with Limited-time updAte, based on a recently proposed hashing algorithm called Cuckoo Hashing. PAMELA features on-the-fly pattern updates without reconfiguration, more efficient hardware utilization, and higher performance than other works. First, we implement improved parallel exact matching of arbitrary-length patterns based on Cuckoo Hashing and a linked-list technique. Second, while PAMELA is being updated with new attack patterns, both a stack and a FIFO are used to bound the insertion time, which is a drawback of Cuckoo Hashing, and to avoid interrupting the input data stream. Third, we extend the system to multi-character processing to achieve higher throughput. Our engine can accommodate the latest rule set of Snort, an open-source NIDS/NIPS, and achieves a throughput of up to 8.8 Gbit/s while consuming the least hardware. Compared with other approaches implemented on Xilinx FPGA architectures, ours is far more efficient.
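Cuckoo Hashing gives constant-time lookups (one probe per table) but an insertion may trigger a long chain of evictions, which is the insertion-time drawback mentioned above. The sketch below is a generic two-table cuckoo hash with a fixed eviction limit and a small overflow list standing in for PAMELA's stack/FIFO buffering. The hash functions, table size, `max_kicks` limit, and overflow list are illustrative assumptions, not PAMELA's hardware design.

```python
import hashlib

class CuckooHash:
    """Two-table cuckoo hash with a bounded number of evictions per insert."""

    def __init__(self, size: int = 16, max_kicks: int = 8) -> None:
        self.size = size
        self.max_kicks = max_kicks
        self.tables = [[None] * size, [None] * size]
        self.overflow = []   # keys whose insertion hit the eviction limit

    def _slot(self, which: int, key: str) -> int:
        digest = hashlib.sha256(f"{which}:{key}".encode()).digest()
        return int.from_bytes(digest[:4], "big") % self.size

    def contains(self, key: str) -> bool:
        # One slot per table, plus the small overflow list.
        in_tables = any(self.tables[t][self._slot(t, key)] == key for t in (0, 1))
        return in_tables or key in self.overflow

    def insert(self, key: str) -> None:
        if self.contains(key):
            return
        which = 0
        for _ in range(self.max_kicks):
            slot = self._slot(which, key)
            # Place the key and pick up whatever occupied the slot.
            key, self.tables[which][slot] = self.tables[which][slot], key
            if key is None:
                return                      # slot was empty: done
            which ^= 1                      # evicted key moves to its other table
        self.overflow.append(key)           # bound reached: park the homeless key


patterns = CuckooHash()
for p in ["GET /etc/passwd", "cmd.exe", "/bin/sh"]:
    patterns.insert(p)
print(patterns.contains("cmd.exe"), patterns.contains("benign"))  # True False
```

Bounding the number of evictions keeps the worst-case insertion time fixed, at the cost of occasionally parking a key outside the main tables.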
Application-Dependent Interconnect Testing of Xilinx FPGAs Based on Line Branches Partitioning
Teng LIN Jianhua FENG Dunshan YU

LETTER-Dependable Computing

Vol: E92-D No:5
Page(s): 1197-1199
A novel application-dependent interconnect testing scheme for Xilinx Field Programmable Gate Arrays (FPGAs) based on line branches partitioning is presented. The targeted line branches of the interconnects in the FPGAs' Application Configurations (ACs) are partitioned into multiple subsets so that they can be tested with compatible Configurable Logic Block (CLB) configurations in multiple Test Configurations (TCs). Experimental results on the ISCAS89 and ITC99 benchmarks show that this scheme achieves a stuck-at fault coverage higher than 99% with fewer than 11 TCs.
Implementation of a Partially Reconfigurable Multi-Context FPGA Based on Asynchronous Architecture
Hasitha Muthumala WAIDYASOORIYA Masanori HARIYAMA Michitaka KAMEYAMA

PAPER-Electronic Circuits

Vol: E92-C No:4
Page(s): 539-549
This paper presents a novel architecture that increases hardware utilization in multi-context field programmable gate arrays (MC-FPGAs). Conventional MC-FPGAs use dedicated tracks to transfer context-ID bits. This lowers the hardware utilization ratio, since it is very difficult to map different contexts area-efficiently, and it also increases the context-switching power as well as the area and static power of the context-ID tracks. The proposed MC-FPGA uses the same wires to transfer both data and context-ID bits from cell to cell, so programs can be mapped area-efficiently by partitioning them into different contexts. An asynchronous multi-context logic block architecture that increases the processing speed of the multiple contexts is also proposed. The proposed MC-FPGA is fabricated using 6-metal 1-poly CMOS design rules. The measured data and context-ID transfer delays are 2.03 ns and 2.26 ns, respectively. We achieved a 30% processing-time reduction for the SAD-based correspondence search algorithm.
FPGA Implementation of Highly Modular Fast Universal Discrete Transforms
Panan POTIPANTONG Phaophak SIRISUK Soontorn ORAINTARA Apisak WORAPISHET

PAPER-Integrated Electronics

Vol: E92-C No:4
Page(s): 576-586
This paper presents an FPGA implementation of highly modular universal discrete transforms. The implementation relies on the unified discrete Fourier Hartley transform (UDFHT), from which essential sinusoidal transforms including the discrete Fourier transform (DFT), discrete Hartley transform (DHT), discrete cosine transform (DCT), and discrete sine transform (DST) can be realized. It employs a reconfigurable, scalable, and modular architecture consisting of a memory-based FFT processor with pre- and post-processing units. In addition, pipelining is used to coordinate the operation of the sub-modules seamlessly. Experimental results on a Xilinx Virtex-II Pro are given to evaluate the performance of the proposed UDFHT implementation. Two practical applications are also shown to demonstrate the flexibility and modularity of the proposed work.
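The idea that one Fourier core can serve several sinusoidal transforms through pre- and post-processing can be illustrated with a well-known identity (this is not the UDFHT formulation itself). For a real input, the DHT equals Re X[k] - Im X[k], where X is the DFT. The sketch checks this against the direct cas-kernel definition; a naive O(N²) DFT is used for brevity, and the function names are hypothetical.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]

def dht_direct(x):
    # H[k] = sum_t x[t] * cas(2*pi*k*t/N), with cas(theta) = cos(theta) + sin(theta).
    n = len(x)
    return [sum(x[t] * (math.cos(2 * math.pi * k * t / n) + math.sin(2 * math.pi * k * t / n))
                for t in range(n)) for k in range(n)]

def dht_via_dft(x):
    # For real x: Re X[k] = sum x*cos and Im X[k] = -sum x*sin, so H[k] = Re X[k] - Im X[k].
    return [v.real - v.imag for v in dft(x)]

x = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]
print(max(abs(a - b) for a, b in zip(dht_direct(x), dht_via_dft(x))) < 1e-9)  # True
```

In a hardware setting, the DFT would come from the FFT processor and the subtraction would be a post-processing step.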
A Link Removal Methodology for Application-Specific Networks-on-Chip on FPGAs
Daihan WANG Hiroki MATSUTANI Michihiro KOIBUCHI Hideharu AMANO

PAPER-VLSI Systems

Vol: E92-D No:4
Page(s): 575-583
The regular 2-D mesh topology has been used for most Networks-on-Chip (NoCs) on FPGAs. In some applications, however, traffic is spatially biased and some links are lightly utilized, which makes customizing the network by removing links worthwhile. In this paper, a link removal strategy that customizes the routers in an NoC is proposed for reconfigurable systems to minimize the required amount of hardware. Based on pre-analyzed traffic information, links that carry little communication are removed to reduce hardware cost while maintaining adequate performance. Two policies are proposed to avoid deadlocks, and they outperform up*/down* routing, a representative deadlock-free routing scheme for irregular topologies. For the image recognition application susan, the proposed method saves 30% of the hardware without performance degradation.
Optimal Time-Multiplexing in Inter-FPGA Connections for Accelerating Multi-FPGA Prototyping Systems
Masato INAGI Yasuhiro TAKASHIMA Yuichi NAKAMURA Atsushi TAKAHASHI

PAPER-Logic Synthesis, Test and Verification

Vol: E91-A No:12
Page(s): 3539-3547
In multi-FPGA prototyping systems for circuit verification, the serialized time-multiplexed I/O technique is used because an FPGA has a limited number of I/O pins. The verification time depends on which inter-FPGA signals are selected for time-multiplexing. In this paper, we propose a method that minimizes the verification time of multi-FPGA systems by finding an optimal selection of inter-FPGA signals to be time-multiplexed. Experiments show that the estimated verification time is improved by 38.2% on average compared with conventional methods.
Autonomous Repair Fault Tolerant Dynamic Reconfigurable Device
Kentaro NAKAHARA Shin'ichi KOUYAMA Tomonori IZUMI Hiroyuki OCHI Yukihiro NAKAMURA

PAPER-Embedded, Real-Time and Reconfigurable Systems

Vol: E91-A No:12
Page(s): 3612-3621
Reconfigurable devices are now widely used in low-volume and trial production. They are also expected to be used in mission-critical fields such as space development, because system updates and pseudo-repairs can be performed remotely by reconfiguration. In conventional reconfigurable devices, however, configuration memory upsets caused by radiation and alpha particles reconfigure the device unpredictably, resulting in fatal system failures. A reconfigurable device with high fault tolerance against configuration upsets is therefore required. In this paper, we propose an architecture for a fault-tolerant reconfigurable device that autonomously repairs configuration upsets without interrupting system operation. The device consists of a 2D array of "Autonomous-Repair Cells," each of which repairs its own upsets. The architecture's fault tolerance is scalable: a finer-grained Autonomous-Repair Cell provides higher fault tolerance. To determine the architecture, we experimentally analyze four autonomous repair techniques for the cell. Two of them, simple multiplexing (S.M.) and memory multiplexing (M.M.), are then applied, the former to the programmable logic and the latter to the cell-to-cell routing resources. Our evaluation shows that the proposed device achieves an average lifetime of more than 10 years against configuration upsets even in a severe environment such as a satellite orbit.
Design and Implementation of a Non-pipelined MD5 Hardware Architecture Using a New Functional Description
Ignacio ALGREDO-BADILLO Claudia FEREGRINO-URIBE Rene CUMPLIDO Miguel MORALES-SANDOVAL

LETTER-VLSI Systems

Vol: E91-D No:10
Page(s): 2519-2523
MD5 is a cryptographic algorithm used for authentication. When it is implemented in hardware, performance is limited by the data dependency of the iterative compression function. In this paper, a new functional description is proposed with the aim of achieving higher throughput by reducing the critical path and latency. This description can be applied to similar structures in other hash algorithms with comparable data dependence, such as SHA-1, SHA-2, and RIPEMD-160. The proposed MD5 hardware architecture achieves a high throughput/area ratio; FPGA implementation results are presented and discussed, along with comparisons against related work.
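The data dependency mentioned above is visible in the standard MD5 compression loop: each of the 64 steps needs the `b` value produced by the previous step, so consecutive steps cannot simply be overlapped. The Python reference below follows the published algorithm (RFC 1321) in software; it is meant only to show that serial chain, not the authors' new functional description or hardware architecture.

```python
import math
import struct

# Per-step left-rotation amounts and constants K[i] = floor(|sin(i + 1)| * 2^32).
SHIFTS = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]
MASK = 0xFFFFFFFF

def rotl(x: int, c: int) -> int:
    return ((x << c) | (x >> (32 - c))) & MASK

def md5_hex(message: bytes) -> str:
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    bit_len = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    # Padding: 0x80, zeros up to 56 mod 64 bytes, then the 64-bit little-endian length.
    data = message + b"\x80"
    data += b"\x00" * ((56 - len(data) % 64) % 64)
    data += struct.pack("<Q", bit_len)

    for offset in range(0, len(data), 64):
        m = struct.unpack("<16I", data[offset:offset + 64])
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            # Serial dependency: the new b needs f (from the current b, c, d) and a.
            f = (f + a + K[i] + m[g]) & MASK
            a, d, c, b = d, c, b, (b + rotl(f, SHIFTS[i])) & MASK
        a0, b0 = (a0 + a) & MASK, (b0 + b) & MASK
        c0, d0 = (c0 + c) & MASK, (d0 + d) & MASK

    return struct.pack("<4I", a0, b0, c0, d0).hex()


print(md5_hex(b"abc"))  # 900150983cd24fb0d6963f7d28e17f72
```

Hardware designs attack this chain by shortening the logic between successive `b` updates, since the 64 steps themselves must run in order for each block.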
Evaluation of a Field-Programmable VLSI Based on an Asynchronous Bit-Serial Architecture
Masanori HARIYAMA Shota ISHIHARA Michitaka KAMEYAMA

PAPER

Vol: E91-C No:9
Page(s): 1419-1426
This paper presents a novel asynchronous architecture for field-programmable gate arrays (FPGAs) that reduces power consumption. In conventional FPGAs, the dynamic power is dominated by the switch blocks and clock distribution, since FPGAs have complex switch blocks and a large number of registers for high programmability. To reduce the power consumed by switch blocks and clock distribution, an asynchronous bit-serial architecture is proposed. To ensure correct operation independent of data-path lengths, we use level-encoded dual-rail encoding and propose an area-efficient implementation of it. The proposed field-programmable VLSI (FPVLSI) is implemented in a 90 nm CMOS technology. Its delay and power consumption are 61% and 58%, respectively, of those of 4-phase dual-rail encoding, the most common delay-insensitive encoding.
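Level-encoded dual-rail (LEDR) encoding sends each bit on two wires, a value rail V and a repeat rail R, and alternates a phase bit (V XOR R) between successive tokens so that exactly one wire toggles per token regardless of the data. The sketch below shows that textbook encoding and checks the one-transition property. The (0, 0) reset state and the phase convention are assumptions, and the paper's area-efficient circuit implementation is not modeled.

```python
RESET = (0, 0)   # assumed idle state of the (V, R) wires, phase 0

def ledr_encode(bits):
    """Encode bits as (V, R) pairs; token i carries phase (i + 1) % 2.

    V is the data value and R = V XOR phase, so the phase is V XOR R.
    """
    return [(b, b ^ ((i + 1) % 2)) for i, b in enumerate(bits)]

def ledr_decode(pairs):
    return [v for v, _ in pairs]

def toggles(prev, cur):
    """Number of wires that change between two consecutive (V, R) states."""
    return (prev[0] != cur[0]) + (prev[1] != cur[1])

data = [1, 1, 0, 0, 1, 0, 1, 1]
states = [RESET] + ledr_encode(data)
print(all(toggles(p, c) == 1 for p, c in zip(states, states[1:])))  # True
print(ledr_decode(states[1:]) == data)                              # True
```

Because every token causes exactly one transition, the receiver can detect a new token from the phase change alone, without a clock and without the return-to-zero spacer of 4-phase dual-rail signaling.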
Adaptive Impedance Matching System Using FPGA Processor for Efficient Control Algorithm
Hirokazu OBA Minseok KIM Ryotaro TAMAKI Hiroyuki ARAI

PAPER-Microwaves, Millimeter-Waves

Vol: E91-C No:8
Page(s): 1348-1355
The input impedance of an antenna fluctuates with usage conditions, causing a mismatch between the internal circuit and the antenna. An automatic matching system can solve this problem; this paper presents a reconfigurable impedance tuner with a set of fixed capacitors controlled by switching p-i-n diodes. A fast control algorithm for selecting the appropriate tuner configuration is proposed and implemented on an FPGA to demonstrate its performance.
