IEICE global.ieice.org Site

Keyword Search Result

[Keyword] FPGA(329hit)

1-20hit(329hit)

A VVC Dependent Quantization Optimization Based on the Parallel Viterbi Algorithm and Its FPGA Implementation Open Access
Qinghua SHENG Yu CHENG Xiaofang HUANG Changcai LAI Xiaofeng HUANG Haibin YIN

PAPER-Computer System

Pubricized:
2024/03/04
Vol:
E107-D No:7
Page(s):
797-806
Dependent Quantization (DQ) is a new quantization tool introduced in the Versatile Video Coding (VVC) standard. While it provides better rate-distortion calculation accuracy, it also increases the computational complexity and hardware cost compared to the widely used scalar quantization. To address this issue, this paper proposes a parallel-dependent quantization hardware architecture using Verilog HDL language. The architecture preprocesses the coefficients with a scalar quantizer and a high-frequency filter, and then further segments and processes the coefficients in parallel using the Viterbi algorithm. Additionally, the weight bit width of the rate-distortion calculation is reduced to decrease the quantization cycle and computational complexity. Finally, the final quantization of the TU is determined through sequential scanning and judging of the rate-distortion cost. Experimental results show that the proposed algorithm reduces the quantization cycle by an average of 56.96% compared to VVC’s reference platform VTM, with a Bjøntegaard delta bit rate (BDBR) loss of 1.03% and 1.05% under the Low-delay P and Random Access configurations, respectively. Verification on the AMD FPGA development platform demonstrates that the hardware implementation meets the quantization requirements for 1080P@60Hz video hardware encoding.
High-Throughput Exact Matching Implementation on FPGA with Shared Rule Tables among Parallel Pipelines Open Access
Xiaoyong SONG Zhichuan GUO Xinshuo WANG Mangu SONG

PAPER-Network System

Vol:
E107-B No:5
Page(s):
387-397
In software defined network (SDN), packet processing is commonly implemented using match-action model, where packets are processed based on matched actions in match action table. Due to the limited FPGA on-board resources, it is an important challenge to achieve large-scale high throughput based on exact matching (EM), while solving hash conflicts and out-of-order problems. To address these issues, this study proposed an FPGA-based EM table that leverages shared rule tables across multiple pipelines to eliminate memory replication and enhance overall throughput. An out-of-order reordering function is used to ensure packet sequencing within the pipelines. Moreover, to handle collisions and increase load factor of hash table, multiple hash table blocks are combined and an auxiliary CAM-based EM table is integrated in each pipeline. To the best of our knowledge, this is the first time that the proposed design considers the recovery of out-of-order operations in multi-channel EM table for high-speed network packets processing application. Furthermore, it is implemented on Xilinx Alveo U250 field programmable gate arrays, which has a million rules and achieves a processing speed of 200 million operations per second, theoretically enabling throughput exceeding 100 Gbps for 64-Byte size packets.
Grid Sample Based Temporal Iteration for Fully Pipelined 1-ms SLIC Superpixel Segmentation System Open Access
Yuan LI Tingting HU Ryuji FUCHIKAMI Takeshi IKENAGA

PAPER-Computer System

Pubricized:
2023/12/19
Vol:
E107-D No:4
Page(s):
515-524
A 1 millisecond (1-ms) vision system, which processes videos at 1000 frames per second (FPS) within 1 ms/frame delay, plays an increasingly important role in fields such as robotics and factory automation. Superpixel as one of the most extensively employed image oversegmentation methods is a crucial pre-processing step for reducing computations in various computer vision applications. Among the different superpixel methods, simple linear iterative clustering (SLIC) has gained widespread adoption due to its simplicity, effectiveness, and computational efficiency. However, the iterative assignment and update steps in SLIC make it challenging to achieve high processing speed. To address this limitation and develop a SLIC superpixel segmentation system with a 1 ms delay, this paper proposes grid sample based temporal iteration. By leveraging the high frame rate of the input video, the proposed method distributes the iterations into the temporal domain, ensuring that the system's delay keeps within one frame. Additionally, grid sample information is added as initialization information to the obtained superpixel centers for enhancing the stability of superpixels. Furthermore, a selective label propagation based pipeline architecture is proposed for parallel computation of all the possibilities of label propagation. This eliminates data dependency between adjacent pixels and enables a fully pipelined system. The evaluation results demonstrate that the proposed superpixel segmentation system achieves boundary recall and under-segmentation error comparable to the original SLIC algorithm. When considering label consistency, the proposed system surpasses the performance of state-of-the-art superpixel segmentation methods. Moreover, in terms of hardware performance, the proposed system processes 1000 FPS images with 0.985 ms/frame delay.
Pipelined ADPCM Compression for HDR Synthesis on an FPGA
Masahiro NISHIMURA Taito MANABE Yuichiro SHIBATA

PAPER-VLSI Design Technology and CAD

Pubricized:
2023/08/31
Vol:
E107-A No:3
Page(s):
531-539
This paper presents an FPGA implementation of real-time high dynamic range (HDR) synthesis, which expresses a wide dynamic range by combining multiple images with different exposures using image pyramids. We have implemented a pipeline that performs streaming processing on images without using external memory. However, implementation for high-resolution images has been difficult due to large memory usage for line buffers. Therefore, we propose an image compression algorithm based on adaptive differential pulse code modulation (ADPCM). Compression modules based on the algorithm can be easily integrated into the pipeline. When the image resolution is 4K and the pyramid depth is 7, memory usage can be halved from 168.48% to 84.32% by introducing the compression modules, resulting in better quality.
A Unified Software and Hardware Platform for Machine Learning Aided Wireless Systems
Dody ICHWANA PUTRA Muhammad HARRY BINTANG PRATAMA Ryotaro ISSHIKI Yuhei NAGAO Leonardo LANANTE JR Hiroshi OCHI

PAPER-Digital Signal Processing

Pubricized:
2023/08/22
Vol:
E106-A No:12
Page(s):
1493-1503
This paper presents a unified software and hardware wireless AI platform (USHWAP) for developing and evaluating machine learning in wireless systems. The platform integrates multi-software development such as MATLAB and Python with hardware platforms like FPGA and SDR, allowing for flexible and scalable device and edge computing application development. The USHWAP is implemented and validated using FPGAs and SDRs. Wireless signal classification, wireless LAN sensing, and rate adaptation are used as examples to showcase the platform's capabilities. The platform enables versatile development, including software simulation and real-time hardware implementation, offering flexibility and scalability for multiple applications. It is intended to be used by wireless-AI researchers to develop and evaluate intelligent algorithms in a laboratory environment.
Design and Implementation of an On-Line Quality Control System for Latch-Based True Random Number Generator
Naoki FUJIEDA Shuichi ICHIKAWA Ryusei OYA Hitomi KISHIBE

PAPER

Pubricized:
2023/03/24
Vol:
E106-D No:12
Page(s):
1940-1950
This paper presents a design and an implementation of an on-line quality control method for a TRNG (True Random Number Generator) on an FPGA. It is based on a TRNG with RS latches and a temporal XOR corrector, which can make a trade-off between throughput and randomness quality by changing the number of accumulations by XOR. The goal of our method is to increase the throughput within the range of keeping the quality of output random numbers. In order to detect a sign of the loss of quality from the TRNG in parallel with random number generation, our method distinguishes random bitstrings to be tested from those to be output. The test bitstring is generated with the fewer number of accumulations than that of the output bitstring. The number of accumulations will be increased if the test bitstring fails in the randomness test. We designed and evaluated a prototype of on-line quality control system, using a Zynq-7000 FPGA SoC. The results indicate that the TRNG with the proposed method achieved 1.91-2.63 Mbits/s of throughput with 16 latches, following the change of the quality of output random numbers. The total number of logic elements in the prototype system with 16 latches was comparable to an existing system with 256 latches, without quality control capabilities.
FPGA-based Garbling Accelerator with Parallel Pipeline Processing
Rin OISHI Junichiro KADOMOTO Hidetsugu IRIE Shuichi SAKAI

PAPER

Pubricized:
2023/08/02
Vol:
E106-D No:12
Page(s):
1988-1996
As more and more programs handle personal information, the demand for secure handling of data is increasing. The protocol that satisfies this demand is called Secure function evaluation (SFE) and has attracted much attention from a privacy protection perspective. In two-party SFE, two mutually untrustworthy parties compute an arbitrary function on their respective secret inputs without disclosing any information other than the output of the function. For example, it is possible to execute a program while protecting private information, such as genomic information. The garbled circuit (GC) — a method of program obfuscation in which the program is divided into gates and the output is calculated using a symmetric key cipher for each gate — is an efficient method for this purpose. However, GC is computationally expensive and has a significant overhead even with an accelerator. We focus on hardware acceleration because of the nature of GC, which is limited to certain types of calculations, such as encryption and XOR. In this paper, we propose an architecture that accelerates garbling by running multiple garbling engines simultaneously based on the latest FPGA-based GC accelerator. In this architecture, managers are introduced to perform multiple rows of pipeline processing simultaneously. We also propose an optimized implementation of RAM for this FPGA accelerator. As a result, it achieves an average performance improvement of 26% in garbling the same set of programs, compared to the state-of-the-art (SOTA) garbling accelerator.
Power Analysis and Power Modeling of Directly-Connected FPGA Clusters
Kensuke IIZUKA Haruna TAKAGI Aika KAMEI Kazuei HIRONAKA Hideharu AMANO

PAPER

Pubricized:
2023/07/20
Vol:
E106-D No:12
Page(s):
1997-2005
FPGA cluster is a promising platform for future computing not only in the cloud but in the 5G wireless base stations with limited power supply by taking significant advantage of power efficiency. However, almost no power analyses with real systems have been reported. This work reports the detailed power consumption analyses of two FPGA clusters, namely FiC and M-KUBOS clusters with introducing power measurement tools and running the real applications. From the detailed analyses, we find that the number of activated links mainly determines the total power consumption of the systems regardless they are used or not. To improve the performance of applications while reducing power consumption, we should increase the clock frequency of the applications, use the minimum number of links and apply link aggregation. We also propose the power model for both clusters from the results of the analyses and this model can estimate the total power consumption of both FPGA clusters at the design step with 15% errors at maximum.
A Multi-FPGA Implementation of FM-Index Based Genomic Pattern Search
Ullah IMDAD Akram BEN AHMED Kazuei HIRONAKA Kensuke IIZUKA Hideharu AMANO

PAPER-Computer System

Pubricized:
2023/08/09
Vol:
E106-D No:11
Page(s):
1783-1795
FPGA clusters that consist of multiple FPGA boards have been gaining interest in recent times. Massively parallel processing with a stand-alone heterogeneous FPGA cluster with SoC- style FPGAs and mid-scale FPGAs is promising with cost-performance benefit. Here, we propose such a heterogeneous FPGA cluster with FiC and M-KUBOS cluster. FiC consists of multiple boards, mounting middle scale Xilinx's FPGAs and DRAMs, which are tightly coupled with high-speed serial links. In addition, M-KUBOS boards are connected to FiC for ensuring high IO data transfer bandwidth. As an example of massively parallel processing, here we implement genomic pattern search. Next-generation sequencing (NGS) technology has revolutionized biological system related research by its high-speed, scalable and massive throughput. To analyze the genomic data, short read mapping technique is used where short Deoxyribonucleic acid (DNA) sequences are mapped relative to a known reference sequence. Although several pattern matching techniques are available, FM-index based pattern search is perfectly suitable for this task due to the fastest mapping from known indices. Since matching can be done in parallel for different data, the massively parallel computing which distributes data, executes in parallel and gathers the results can be applied. We also implement a data compression method where about 10 times reduction in data size is achieved. We found that a M-KUBOS board matches four FiC boards, and a system with six M-KUBOS boards and 24 FiC boards achieved 30 times faster than the software based implementation.
Implementing Region-Based Segmentation for Hardware Trojan Detection in FPGAs Cell-Level Netlist
Ann Jelyn TIEMPO Yong-Jin JEONG

LETTER-Dependable Computing

Pubricized:
2023/07/28
Vol:
E106-D No:11
Page(s):
1926-1929
Field Programmable Gate Array (FPGA) is gaining popularity because of their reconfigurability which brings in security concerns like inserting hardware trojan. Various detection methods to overcome this threat have been proposed but in the ASIC's supply chain and cannot directly apply to the FPGA application. In this paper, the authors aim to implement a structural feature-based detection method for detecting hardware trojan in a cell-level netlist, which is not well explored yet, where the nets are segmented into smaller groups based on their interconnection and further analyzed by looking at their structural similarities. Experiments show positive performance with an average detection rate of 95.41%, an average false alarm rate of 2.87% and average accuracy of 96.27%.
An Efficient Reconfigurable Architecture for Software Defined Radio
Vijaya BHASKAR C Munaswamy P

PAPER-Information Network

Pubricized:
2023/06/20
Vol:
E106-D No:9
Page(s):
1519-1527
Wireless technology improvements have been continually increasing, resulting in greater needs for system design and implementation to accommodate all newly emerging standards. As a result, developing a system that ensures compatibility with numerous wireless systems has sparked interest. As a result of their flexibility and scalability over alternative wireless design options, software-defined radios (SDRs) are highly motivated for wireless device modelling. This research paper delves into the difficulties of designing a reconfigurable multi modulation baseband modulator for SDR systems that can handle a variety of wireless protocols. This research paper has proposed an area-efficient Reconfigurable Baseband Modulator (RBM) model to accomplish multi modulation scheme and resolve the adaptability and flexibility issues with the wide range of wireless standards. This also presents the feasibility of using a multi modulation baseband modulator to maximize adaptability with the least possible computational complexity overhead in the SDR system for next-generation wireless communication systems and provides parameterization. Finally, the re-configurability is evaluated concerning the appropriate symbols generations and analyzed its performance metrics through hardware synthesize results.
A Low-Cost Neural ODE with Depthwise Separable Convolution for Edge Domain Adaptation on FPGAs
Hiroki KAWAKAMI Hirohisa WATANABE Keisuke SUGIURA Hiroki MATSUTANI

PAPER-Computer System

Pubricized:
2023/04/05
Vol:
E106-D No:7
Page(s):
1186-1197
High-performance deep neural network (DNN)-based systems are in high demand in edge environments. Due to its high computational complexity, it is challenging to deploy DNNs on edge devices with strict limitations on computational resources. In this paper, we derive a compact while highly-accurate DNN model, termed dsODENet, by combining recently-proposed parameter reduction techniques: Neural ODE (Ordinary Differential Equation) and DSC (Depthwise Separable Convolution). Neural ODE exploits a similarity between ResNet and ODE, and shares most of weight parameters among multiple layers, which greatly reduces the memory consumption. We apply dsODENet to a domain adaptation as a practical use case with image classification datasets. We also propose a resource-efficient FPGA-based design for dsODENet, where all the parameters and feature maps except for pre- and post-processing layers can be mapped onto on-chip memories. It is implemented on Xilinx ZCU104 board and evaluated in terms of domain adaptation accuracy, inference speed, FPGA resource utilization, and speedup rate compared to a software counterpart. The results demonstrate that dsODENet achieves comparable or slightly better domain adaptation accuracy compared to our baseline Neural ODE implementation, while the total parameter size without pre- and post-processing layers is reduced by 54.2% to 79.8%. Our FPGA implementation accelerates the inference speed by 23.8 times.
Parallel Implementation of CNN on Multi-FPGA Cluster
Yasuyu FUKUSHIMA Kensuke IIZUKA Hideharu AMANO

PAPER-Computer System

Pubricized:
2023/04/12
Vol:
E106-D No:7
Page(s):
1198-1208
We developed a PYNQ cluster that consists of economical Zynq boards, called M-KUBOS, that are interconnected through low-cost high-performance GTH serial links. For the software environment, we employed the PYNQ open-source software platform. The PYNQ cluster is anticipated to be a multi-access edge computing (MEC) server for 5G mobile networks. We implemented the ResNet-50 inference accelerator on the PYNQ cluster for image recognition of MEC applications. By estimating the execution time of each ResNet-50 layer, layers of ResNet-50 were divided into multiple boards so that the execution time of each board would be as equal as possible for efficient pipeline processing. Owing to the PYNQ cluster in which FPGAs were directly connected by high-speed serial links, stream processing without network bottlenecks and pipeline processing between boards were readily realized. The implementation on 4 boards achieved 292 GOPS performance, 75.1 FPS throughput, and 7.81 GOPS/W power efficiency. It achieved 17 times faster speed and 130 times more power efficiency compared to the implementation on the CPU, and 5.8 times more power efficiency compared to the implementation on the GPU.
High Performance Network Virtualization Architecture on FPGA SmartNIC
Ke WANG Yiwei CHANG Zhichuan GUO

PAPER-Network System

Pubricized:
2022/11/29
Vol:
E106-B No:6
Page(s):
500-508
Network Functional Virtualization (NFV) is a high-performance network interconnection technology that allows access to traditional network transport devices through virtual network links. It is widely used in cloud computing and other high-concurrent access environments. However, there is a long delay in the introduction of software NFV solutions. Other hardware I/O virtualization solutions don't scale very well. Therefore, this paper proposes a virtualization implementation method on 100Gbps high-speed Field Programmable Gate Array (FPGA) network accelerator card, which uses FPGA accelerator to improve the performance of virtual network devices. This method uses the single root I/O virtualization (SR-IOV) technology to allow 256 virtual links to be created for a single Peripheral Component Interconnect express (PCIe) device. And it supports data transfer with virtual machine (VM) in the way of Peripheral Component Interconnect (PCI) passthrough. In addition, the design also adopts the shared extensible queue management mechanism, which supports the flexible allocation of more than 10,000 queues on virtual machines, and ensures the good isolation performance in the data path and control path. The design provides high-bandwidth transmission performance of more than 90Gbps for the entire network system, meeting the performance requirements of hyperscale cloud computing clusters.
Implementation of Fully-Pipelined CNN Inference Accelerator on FPGA and HBM2 Platform
Van-Cam NGUYEN Yasuhiko NAKASHIMA

PAPER-Computer System

Pubricized:
2023/03/17
Vol:
E106-D No:6
Page(s):
1117-1129
Many deep convolutional neural network (CNN) inference accelerators on the field-programmable gate array (FPGA) platform have been widely adopted due to their low power consumption and high performance. In this paper, we develop the following to improve performance and power efficiency. First, we use a high bandwidth memory (HBM) to expand the bandwidth of data transmission between the off-chip memory and the accelerator. Second, a fully-pipelined manner, which consists of pipelined inter-layer computation and a pipelined computation engine, is implemented to decrease idle time among layers. Third, a multi-core architecture with shared-dual buffers is designed to reduce off-chip memory access and maximize the throughput. We designed the proposed accelerator on the Xilinx Alveo U280 platform with in-depth Verilog HDL instead of high-level synthesis as the previous works and explored the VGG-16 model to verify the system during our experiment. With a similar accelerator architecture, the experimental results demonstrate that the memory bandwidth of HBM is 13.2× better than DDR4. Compared with other accelerators in terms of throughput, our accelerator is 1.9×/1.65×/11.9× better than FPGA+HBM2 based/low batch size (4) GPGPU/low batch size (4) CPU. Compared with the previous DDR+FPGA/DDR+GPGPU/DDR+CPU based accelerators in terms of power efficiency, our proposed system provides 1.4-1.7×/1.7-12.6×/6.6-37.1× improvement with the large-scale CNN model.
Study of FIT Dedicated Computer with Dataflow Architecture for High Performance 2-D Magneto-Static Field Simulation
Chenxu WANG Hideki KAWAGUCHI Kota WATANABE

PAPER

Pubricized:
2022/08/23
Vol:
E106-C No:4
Page(s):
136-143
An approach to dedicated computers is discussed in this study as a possibility for portable, low-cost, and low-power consumption high-performance computing technologies. Particularly, dataflow architecture dedicated computer of the finite integration technique (FIT) for 2D magnetostatic field simulation is considered for use in industrial applications. The dataflow architecture circuit of the BiCG-Stab matrix solver of the FIT matrix calculation is designed by the very high-speed integrated circuit hardware description language (VHDL). The operation of the dedicated computer's designed circuit is considered by VHDL logic circuit simulation.
Real-Time Image-Based Vibration Extraction with Memory-Efficient Optical Flow and Block-Based Adaptive Filter
Taito MANABE Yuichiro SHIBATA

PAPER

Pubricized:
2022/09/05
Vol:
E106-A No:3
Page(s):
504-513
In this paper, we propose a real-time vibration extraction system, which extracts vibration component within a given frequency range from videos in real time, for realizing tremor suppression used in microsurgery assistance systems. To overcome the problems in our previous system based on the mean Lucas-Kanade (LK) optical flow of the whole frame, we have introduced a new architecture combining dense optical flow calculated with simple feature matching and block-based band-pass filtering using band-limited multiple Fourier linear combiner (BMFLC). As a feature of optical flow calculation, we use the simplified rotation-invariant histogram of oriented gradients (RIHOG) based on a gradient angle quantized to 1, 2, or 3 bits, which greatly reduces the usage of memory resources for a frame buffer. An obtained optical flow map is then divided into multiple blocks, and BMFLC is applied to the mean optical flow of each block independently. By using the L1-norm of adaptive weight vectors in BMFLC as a criterion, blocks belonging to vibrating objects can be isolated from background at low cost, leading to better extraction accuracy compared to the previous system. The whole system for 480p and 720p resolutions can be implemented on a single Xilinx Zynq-7000 XC7Z020 FPGA without any external memory, and can process a video stream supplied directly from a camera at 60fps.
An eFPGA Generation Suite with Customizable Architecture and IDE
Morihiro KUGA Qian ZHAO Yuya NAKAZATO Motoki AMAGASAKI Masahiro IIDA

PAPER

Pubricized:
2022/10/07
Vol:
E106-A No:3
Page(s):
560-574
From edge devices to cloud servers, providing optimized hardware acceleration for specific applications has become a key approach to improve the efficiency of computer systems. Traditionally, many systems employ commercial field-programmable gate arrays (FPGAs) to implement dedicated hardware accelerator as the CPU's co-processor. However, commercial FPGAs are designed in generic architectures and are provided in the form of discrete chips, which makes it difficult to meet increasingly diversified market needs, such as balancing reconfigurable hardware resources for a specific application, or to be integrated into a customer's system-on-a-chip (SoC) in the form of embedded FPGA (eFPGA). In this paper, we propose an eFPGA generation suite with customizable architecture and integrated development environment (IDE), which covers the entire eFPGA design generation, testing, and utilization stages. For the eFPGA design generation, our intellectual property (IP) generation flow can explore the optimal logic cell, routing, and array structures for given target applications. For the testability, we employ a previously proposed shipping test method that is 100% accurate at detecting all stuck-at faults in the entire FPGA-IP. In addition, we propose a user-friendly and customizable Web-based IDE framework for the generated eFPGA based on the NODE-RED development framework. In the case study, we show an eFPGA architecture exploration example for a differential privacy encryption application using the proposed suite. Then we show the implementation and evaluation of the eFPGA prototype with a 55nm test element group chip design.
The Implementation of a Hybrid Router and Dynamic Switching Algorithm on a Multi-FPGA System
Tomoki SHIMIZU Kohei ITO Kensuke IIZUKA Kazuei HIRONAKA Hideharu AMANO

PAPER

Pubricized:
2022/06/30
Vol:
E105-D No:12
Page(s):
2008-2018
The multi-FPGA system known as, the Flow-in-Cloud (FiC) system, is composed of mid-range FPGAs that are directly interconnected by high-speed serial links. FiC is currently being developed as a server for multi-access edge computing (MEC), which is one of the core technologies of 5G. Because the applications of MEC are sometimes timing-critical, a static time division multiplexing (STDM) network has been used on FiC. However, the STDM network exhibits the disadvantage of decreasing link utilization, especially under light traffic. To solve this problem, we propose a hybrid router that combines packet switching for low-priority communication and STDM for high-priority communication. In our hybrid network, the packet switching uses slots that are unused by the STDM; therefore, best-effort communication by packet switching and QoS guarantee communication by the STDM can be used simultaneously. Furthermore, to improve each link utilization under a low network traffic load, we propose a dynamic communication switching algorithm. In our algorithm, each router monitors the network load metrics, and according to the metrics, timing-critical tasks select the STDM according to the metrics only when congestion occurs. This can achieve both QoS guarantee and efficient utilization of each link with a small resource overhead. In our evaluation, the dynamic algorithm was up to 24.6% faster on the execution time with a high network load compared to the packet switching on a real multi-FPGA system with 24 boards.
A Novel Fixed-Point Conversion Methodology For Digital Signal Processing Systems
Phuong T.K. DINH Linh T.T. DINH Tung T. TRAN Lam S. PHAM Han Le DUC Chi P. HOANG Minh D. NGUYEN

PAPER-Digital Signal Processing

Pubricized:
2022/06/17
Vol:
E105-A No:12
Page(s):
1537-1550
Recently, most signal processing algorithms have been developed with floating-point arithmetic, while the fixed-point arithmetic is more popular with most commercial devices and low-power real-time applications which are implemented on embedded/ASIC/FPGA systems. Therefore, the optimal Floating-point to Fixed-point Conversion (FFC) methodology is a promising solution. In this paper, we propose the FFC consisting of signal grouping technique and simulation-based word length optimization. In order to evaluate the performance of the proposed technique, simulations are carried out and hardware co-simulation on Field Programmable Gate Arrays (FPGAs) platform have been applied to complex Digital Signal Processing (DSP) algorithms: Linear Time Invariant (LTI) systems, multi-mode Fast Fourier Transform (FFT) circuit for IEEE 802.11 ax WLAN Devices and the calibration algorithm of gain and clock skew in Time-Interleaved ADC (TI-ADC) using Adaptive Noise Canceller (ANC). The results show that the proposed technique can reduce the hardware cost about 30% while being able to maintain its speed and reliability.

1-20hit(329hit)

Keyword Search Result

[Keyword] FPGA(329hit)

A VVC Dependent Quantization Optimization Based on the Parallel Viterbi Algorithm and Its FPGA Implementation Open Access

High-Throughput Exact Matching Implementation on FPGA with Shared Rule Tables among Parallel Pipelines Open Access

Grid Sample Based Temporal Iteration for Fully Pipelined 1-ms SLIC Superpixel Segmentation System Open Access

Pipelined ADPCM Compression for HDR Synthesis on an FPGA

A Unified Software and Hardware Platform for Machine Learning Aided Wireless Systems

Design and Implementation of an On-Line Quality Control System for Latch-Based True Random Number Generator

FPGA-based Garbling Accelerator with Parallel Pipeline Processing

Power Analysis and Power Modeling of Directly-Connected FPGA Clusters

A Multi-FPGA Implementation of FM-Index Based Genomic Pattern Search

Implementing Region-Based Segmentation for Hardware Trojan Detection in FPGAs Cell-Level Netlist

An Efficient Reconfigurable Architecture for Software Defined Radio

A Low-Cost Neural ODE with Depthwise Separable Convolution for Edge Domain Adaptation on FPGAs

Parallel Implementation of CNN on Multi-FPGA Cluster

High Performance Network Virtualization Architecture on FPGA SmartNIC

Implementation of Fully-Pipelined CNN Inference Accelerator on FPGA and HBM2 Platform

Study of FIT Dedicated Computer with Dataflow Architecture for High Performance 2-D Magneto-Static Field Simulation

Real-Time Image-Based Vibration Extraction with Memory-Efficient Optical Flow and Block-Based Adaptive Filter

An eFPGA Generation Suite with Customizable Architecture and IDE

The Implementation of a Hybrid Router and Dynamic Switching Algorithm on a Multi-FPGA System

A Novel Fixed-Point Conversion Methodology For Digital Signal Processing Systems

Latest Issue

Links

Call for Papers

Submit to IEICE Trans.

Transactions NEWS

Popular articles