Keyword Search Result

[Keyword] FPGA(329hit)

1-20hit(329hit)

  • A VVC Dependent Quantization Optimization Based on the Parallel Viterbi Algorithm and Its FPGA Implementation (Open Access)

    Qinghua SHENG  Yu CHENG  Xiaofang HUANG  Changcai LAI  Xiaofeng HUANG  Haibin YIN  

     
    PAPER-Computer System

      Publicized:
    2024/03/04
      Vol:
    E107-D No:7
      Page(s):
    797-806

    Dependent Quantization (DQ) is a new quantization tool introduced in the Versatile Video Coding (VVC) standard. While it provides better rate-distortion calculation accuracy, it also increases the computational complexity and hardware cost compared to the widely used scalar quantization. To address this issue, this paper proposes a parallel-dependent quantization hardware architecture using Verilog HDL language. The architecture preprocesses the coefficients with a scalar quantizer and a high-frequency filter, and then further segments and processes the coefficients in parallel using the Viterbi algorithm. Additionally, the weight bit width of the rate-distortion calculation is reduced to decrease the quantization cycle and computational complexity. Finally, the final quantization of the TU is determined through sequential scanning and judging of the rate-distortion cost. Experimental results show that the proposed algorithm reduces the quantization cycle by an average of 56.96% compared to VVC’s reference platform VTM, with a Bjøntegaard delta bit rate (BDBR) loss of 1.03% and 1.05% under the Low-delay P and Random Access configurations, respectively. Verification on the AMD FPGA development platform demonstrates that the hardware implementation meets the quantization requirements for 1080P@60Hz video hardware encoding.

  • High-Throughput Exact Matching Implementation on FPGA with Shared Rule Tables among Parallel Pipelines (Open Access)

    Xiaoyong SONG  Zhichuan GUO  Xinshuo WANG  Mangu SONG  

     
    PAPER-Network System

      Vol:
    E107-B No:5
      Page(s):
    387-397

    In software-defined networking (SDN), packet processing is commonly implemented with the match-action model, in which packets are processed according to the actions matched in a match-action table. Because FPGA on-board resources are limited, achieving large-scale, high-throughput exact matching (EM) while resolving hash conflicts and out-of-order problems is an important challenge. To address these issues, this study proposes an FPGA-based EM table that shares rule tables across multiple pipelines to eliminate memory replication and enhance overall throughput. A reordering function restores packet order within the pipelines. Moreover, to handle collisions and increase the load factor of the hash table, multiple hash table blocks are combined and an auxiliary CAM-based EM table is integrated in each pipeline. To the best of our knowledge, this is the first design to consider the recovery of out-of-order operations in a multi-channel EM table for high-speed network packet processing. The design is implemented on a Xilinx Alveo U250 FPGA, holds a million rules, and achieves a processing speed of 200 million operations per second, theoretically enabling throughput exceeding 100 Gbps for 64-byte packets.
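The throughput figure in the abstract above can be checked with simple arithmetic. The sketch below assumes one table lookup per packet and counts only the 64-byte frame itself (no preamble or inter-frame gap); the function name `line_rate_gbps` is hypothetical and not taken from the paper.

```python
def line_rate_gbps(ops_per_second: float, packet_bytes: int) -> float:
    """Throughput in Gbps, assuming each lookup handles exactly one packet."""
    bits_per_packet = packet_bytes * 8
    return ops_per_second * bits_per_packet / 1e9


# 200 million lookups per second on 64-byte packets (figures from the abstract)
rate = line_rate_gbps(200e6, 64)
print(f"{rate:.1f} Gbps")  # 102.4 Gbps
```

Under these assumptions the lookup rate corresponds to 102.4 Gbps, which is consistent with the "exceeding 100 Gbps" claim.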

  • Grid Sample Based Temporal Iteration for Fully Pipelined 1-ms SLIC Superpixel Segmentation System (Open Access)

    Yuan LI  Tingting HU  Ryuji FUCHIKAMI  Takeshi IKENAGA  

     
    PAPER-Computer System

      Publicized:
    2023/12/19
      Vol:
    E107-D No:4
      Page(s):
    515-524

    A 1 millisecond (1-ms) vision system, which processes videos at 1000 frames per second (FPS) within 1 ms/frame delay, plays an increasingly important role in fields such as robotics and factory automation. Superpixel as one of the most extensively employed image oversegmentation methods is a crucial pre-processing step for reducing computations in various computer vision applications. Among the different superpixel methods, simple linear iterative clustering (SLIC) has gained widespread adoption due to its simplicity, effectiveness, and computational efficiency. However, the iterative assignment and update steps in SLIC make it challenging to achieve high processing speed. To address this limitation and develop a SLIC superpixel segmentation system with a 1 ms delay, this paper proposes grid sample based temporal iteration. By leveraging the high frame rate of the input video, the proposed method distributes the iterations into the temporal domain, ensuring that the system's delay keeps within one frame. Additionally, grid sample information is added as initialization information to the obtained superpixel centers for enhancing the stability of superpixels. Furthermore, a selective label propagation based pipeline architecture is proposed for parallel computation of all the possibilities of label propagation. This eliminates data dependency between adjacent pixels and enables a fully pipelined system. The evaluation results demonstrate that the proposed superpixel segmentation system achieves boundary recall and under-segmentation error comparable to the original SLIC algorithm. When considering label consistency, the proposed system surpasses the performance of state-of-the-art superpixel segmentation methods. Moreover, in terms of hardware performance, the proposed system processes 1000 FPS images with 0.985 ms/frame delay.

  • Pipelined ADPCM Compression for HDR Synthesis on an FPGA

    Masahiro NISHIMURA  Taito MANABE  Yuichiro SHIBATA  

     
    PAPER-VLSI Design Technology and CAD

      Publicized:
    2023/08/31
      Vol:
    E107-A No:3
      Page(s):
    531-539

    This paper presents an FPGA implementation of real-time high dynamic range (HDR) synthesis, which expresses a wide dynamic range by combining multiple images with different exposures using image pyramids. We have implemented a pipeline that performs streaming processing on images without using external memory. However, implementation for high-resolution images has been difficult due to large memory usage for line buffers. Therefore, we propose an image compression algorithm based on adaptive differential pulse code modulation (ADPCM). Compression modules based on the algorithm can be easily integrated into the pipeline. When the image resolution is 4K and the pyramid depth is 7, memory usage can be halved from 168.48% to 84.32% by introducing the compression modules, resulting in better quality.

  • A Unified Software and Hardware Platform for Machine Learning Aided Wireless Systems

    Dody ICHWANA PUTRA  Muhammad HARRY BINTANG PRATAMA  Ryotaro ISSHIKI  Yuhei NAGAO  Leonardo LANANTE JR  Hiroshi OCHI  

     
    PAPER-Digital Signal Processing

      Publicized:
    2023/08/22
      Vol:
    E106-A No:12
      Page(s):
    1493-1503

    This paper presents a unified software and hardware wireless AI platform (USHWAP) for developing and evaluating machine learning in wireless systems. The platform integrates multi-software development such as MATLAB and Python with hardware platforms like FPGA and SDR, allowing for flexible and scalable device and edge computing application development. The USHWAP is implemented and validated using FPGAs and SDRs. Wireless signal classification, wireless LAN sensing, and rate adaptation are used as examples to showcase the platform's capabilities. The platform enables versatile development, including software simulation and real-time hardware implementation, offering flexibility and scalability for multiple applications. It is intended to be used by wireless-AI researchers to develop and evaluate intelligent algorithms in a laboratory environment.

  • Design and Implementation of an On-Line Quality Control System for Latch-Based True Random Number Generator

    Naoki FUJIEDA  Shuichi ICHIKAWA  Ryusei OYA  Hitomi KISHIBE  

     
    PAPER

      Publicized:
    2023/03/24
      Vol:
    E106-D No:12
      Page(s):
    1940-1950

    This paper presents the design and implementation of an on-line quality control method for a TRNG (True Random Number Generator) on an FPGA. It is based on a TRNG with RS latches and a temporal XOR corrector, which can trade throughput against randomness quality by changing the number of accumulations by XOR. The goal of our method is to increase the throughput while maintaining the quality of the output random numbers. In order to detect a sign of quality loss in the TRNG in parallel with random number generation, our method distinguishes random bitstrings to be tested from those to be output. The test bitstring is generated with fewer accumulations than the output bitstring. The number of accumulations is increased if the test bitstring fails the randomness test. We designed and evaluated a prototype on-line quality control system using a Zynq-7000 FPGA SoC. The results indicate that the TRNG with the proposed method achieved a throughput of 1.91-2.63 Mbits/s with 16 latches, following changes in the quality of the output random numbers. The total number of logic elements in the prototype system with 16 latches was comparable to that of an existing system with 256 latches and no quality control capabilities.
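The trade-off described above, where more XOR accumulations improve randomness but lower throughput, can be illustrated in software. This is a minimal sketch, not the authors' hardware design: `xor_corrector` and `xor_bias` are hypothetical names, and the bias formula is the piling-up lemma, which assumes the raw bits are independent and identically biased.

```python
from functools import reduce
from operator import xor


def xor_corrector(raw_bits, n_acc):
    """XOR each group of n_acc consecutive raw bits into one output bit.

    Trailing bits that do not fill a whole group are dropped.
    """
    usable = len(raw_bits) - len(raw_bits) % n_acc
    return [reduce(xor, raw_bits[i:i + n_acc]) for i in range(0, usable, n_acc)]


def xor_bias(raw_bias, n_acc):
    """Bias of the XOR of n_acc independent bits, each with P(1) = 0.5 + raw_bias.

    Piling-up lemma (assumes independence): 2**(n-1) * bias**n.
    """
    return 2 ** (n_acc - 1) * raw_bias ** n_acc


print(xor_corrector([1, 0, 1, 1, 0, 0, 1], 2))  # [1, 0, 0]
print(xor_bias(0.1, 2))  # about 0.02: bias shrinks, throughput halves
```

Each extra accumulation divides the output rate by the group size, which is why a controller that uses no more accumulations than the quality test requires can raise throughput.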

  • FPGA-based Garbling Accelerator with Parallel Pipeline Processing

    Rin OISHI  Junichiro KADOMOTO  Hidetsugu IRIE  Shuichi SAKAI  

     
    PAPER

      Publicized:
    2023/08/02
      Vol:
    E106-D No:12
      Page(s):
    1988-1996

    As more and more programs handle personal information, the demand for secure handling of data is increasing. The protocol that satisfies this demand is called Secure function evaluation (SFE) and has attracted much attention from a privacy protection perspective. In two-party SFE, two mutually untrustworthy parties compute an arbitrary function on their respective secret inputs without disclosing any information other than the output of the function. For example, it is possible to execute a program while protecting private information, such as genomic information. The garbled circuit (GC) — a method of program obfuscation in which the program is divided into gates and the output is calculated using a symmetric key cipher for each gate — is an efficient method for this purpose. However, GC is computationally expensive and has a significant overhead even with an accelerator. We focus on hardware acceleration because of the nature of GC, which is limited to certain types of calculations, such as encryption and XOR. In this paper, we propose an architecture that accelerates garbling by running multiple garbling engines simultaneously based on the latest FPGA-based GC accelerator. In this architecture, managers are introduced to perform multiple rows of pipeline processing simultaneously. We also propose an optimized implementation of RAM for this FPGA accelerator. As a result, it achieves an average performance improvement of 26% in garbling the same set of programs, compared to the state-of-the-art (SOTA) garbling accelerator.

  • Power Analysis and Power Modeling of Directly-Connected FPGA Clusters

    Kensuke IIZUKA  Haruna TAKAGI  Aika KAMEI  Kazuei HIRONAKA  Hideharu AMANO  

     
    PAPER

      Publicized:
    2023/07/20
      Vol:
    E106-D No:12
      Page(s):
    1997-2005

    An FPGA cluster is a promising platform for future computing, not only in the cloud but also in 5G wireless base stations with a limited power supply, because of its significant advantage in power efficiency. However, almost no power analyses of real systems have been reported. This work reports detailed power consumption analyses of two FPGA clusters, FiC and M-KUBOS, by introducing power measurement tools and running real applications. From the detailed analyses, we find that the number of activated links mainly determines the total power consumption of the systems, regardless of whether the links are used. To improve application performance while reducing power consumption, we should increase the clock frequency of the applications, use the minimum number of links, and apply link aggregation. We also propose a power model for both clusters based on the results of the analyses; this model can estimate the total power consumption of both FPGA clusters at the design stage with an error of at most 15%.

  • A Multi-FPGA Implementation of FM-Index Based Genomic Pattern Search

    Ullah IMDAD  Akram BEN AHMED  Kazuei HIRONAKA  Kensuke IIZUKA  Hideharu AMANO  

     
    PAPER-Computer System

      Publicized:
    2023/08/09
      Vol:
    E106-D No:11
      Page(s):
    1783-1795

    FPGA clusters that consist of multiple FPGA boards have been gaining interest in recent times. Massively parallel processing on a stand-alone heterogeneous FPGA cluster with SoC-style FPGAs and mid-scale FPGAs is promising in terms of cost-performance. Here, we propose such a heterogeneous FPGA cluster built from the FiC and M-KUBOS clusters. FiC consists of multiple boards that carry mid-scale Xilinx FPGAs and DRAMs and are tightly coupled by high-speed serial links. In addition, M-KUBOS boards are connected to FiC to ensure high I/O data transfer bandwidth. As an example of massively parallel processing, we implement genomic pattern search. Next-generation sequencing (NGS) technology has revolutionized research on biological systems through its high speed, scalability, and massive throughput. To analyze genomic data, short read mapping is used, in which short deoxyribonucleic acid (DNA) sequences are mapped relative to a known reference sequence. Although several pattern matching techniques are available, FM-index based pattern search is well suited to this task because of its fast mapping from known indices. Since matching can be done in parallel for different data, massively parallel computing that distributes data, executes in parallel, and gathers the results can be applied. We also implement a data compression method that reduces the data size by a factor of about 10. We found that an M-KUBOS board matches four FiC boards, and a system with six M-KUBOS boards and 24 FiC boards ran 30 times faster than the software-based implementation.
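For readers unfamiliar with FM-index based pattern search, the following is a minimal software sketch of the standard backward search over a Burrows-Wheeler transform. It is not the paper's multi-FPGA design: the function names are hypothetical, the suffix array is built naively, and the occurrence table is stored uncompressed for clarity.

```python
def build_fm_index(text):
    """Build the C table and occurrence table of an FM-index (naive construction)."""
    text += "$"  # sentinel, assumed absent from the text and smaller than all symbols
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps around to the sentinel
    alphabet = sorted(set(text))
    # C[c]: number of characters in the text that are strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ, len(bwt)


def count_matches(pattern, C, occ, n):
    """Count occurrences of pattern by backward search, last character first."""
    lo, hi = 0, n  # current suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


C, occ, n = build_fm_index("ACGTACGT")
print(count_matches("ACG", C, occ, n))  # 2
print(count_matches("GTA", C, occ, n))  # 1
print(count_matches("TT", C, occ, n))   # 0
```

Each query touches only the precomputed tables, and queries for different reads are independent, which is the property that makes the distribute, execute, gather scheme in the abstract applicable.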

  • Implementing Region-Based Segmentation for Hardware Trojan Detection in FPGAs Cell-Level Netlist

    Ann Jelyn TIEMPO  Yong-Jin JEONG  

     
    LETTER-Dependable Computing

      Publicized:
    2023/07/28
      Vol:
    E106-D No:11
      Page(s):
    1926-1929

    Field Programmable Gate Arrays (FPGAs) are gaining popularity because of their reconfigurability, which also raises security concerns such as the insertion of hardware trojans. Various detection methods to counter this threat have been proposed, but they target the ASIC supply chain and cannot be applied directly to FPGA applications. In this paper, the authors implement a structural feature-based method for detecting hardware trojans in a cell-level netlist, an area that is not yet well explored: the nets are segmented into smaller groups based on their interconnections and then analyzed for structural similarities. Experiments show positive performance, with an average detection rate of 95.41%, an average false alarm rate of 2.87%, and an average accuracy of 96.27%.

  • An Efficient Reconfigurable Architecture for Software Defined Radio

    Vijaya BHASKAR C  Munaswamy P  

     
    PAPER-Information Network

      Publicized:
    2023/06/20
      Vol:
    E106-D No:9
      Page(s):
    1519-1527

    Wireless technology continues to advance, increasing the demands on system design and implementation to accommodate newly emerging standards. As a result, developing a system that is compatible with numerous wireless systems has attracted interest. Because of their flexibility and scalability compared with alternative wireless design options, software-defined radios (SDRs) are an attractive choice for wireless device modelling. This paper examines the difficulties of designing a reconfigurable multi-modulation baseband modulator for SDR systems that can handle a variety of wireless protocols. It proposes an area-efficient Reconfigurable Baseband Modulator (RBM) model that supports multiple modulation schemes and resolves the adaptability and flexibility issues posed by the wide range of wireless standards. It also presents the feasibility of using a multi-modulation baseband modulator to maximize adaptability with the least possible computational complexity overhead in SDR systems for next-generation wireless communication, and it provides parameterization. Finally, the reconfigurability is evaluated in terms of correct symbol generation, and its performance metrics are analyzed through hardware synthesis results.

  • A Low-Cost Neural ODE with Depthwise Separable Convolution for Edge Domain Adaptation on FPGAs

    Hiroki KAWAKAMI  Hirohisa WATANABE  Keisuke SUGIURA  Hiroki MATSUTANI  

     
    PAPER-Computer System

      Publicized:
    2023/04/05
      Vol:
    E106-D No:7
      Page(s):
    1186-1197

    High-performance deep neural network (DNN)-based systems are in high demand in edge environments. Because of their high computational complexity, it is challenging to deploy DNNs on edge devices with strict limitations on computational resources. In this paper, we derive a compact yet highly accurate DNN model, termed dsODENet, by combining recently proposed parameter reduction techniques: Neural ODE (Ordinary Differential Equation) and DSC (Depthwise Separable Convolution). Neural ODE exploits a similarity between ResNet and ODE, and shares most of the weight parameters among multiple layers, which greatly reduces memory consumption. We apply dsODENet to domain adaptation as a practical use case with image classification datasets. We also propose a resource-efficient FPGA-based design for dsODENet, where all the parameters and feature maps except for pre- and post-processing layers can be mapped onto on-chip memories. It is implemented on a Xilinx ZCU104 board and evaluated in terms of domain adaptation accuracy, inference speed, FPGA resource utilization, and speedup rate compared to a software counterpart. The results demonstrate that dsODENet achieves comparable or slightly better domain adaptation accuracy compared to our baseline Neural ODE implementation, while the total parameter size without pre- and post-processing layers is reduced by 54.2% to 79.8%. Our FPGA implementation accelerates the inference speed by 23.8 times.
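To show why depthwise separable convolution reduces parameter size, the sketch below compares weight counts for a single layer. The layer shape (3x3 kernel, 64 input channels, 128 output channels) is an illustrative assumption, not a configuration from the paper, and bias terms are ignored.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dsc_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1x1 pointwise convolution."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


k, c_in, c_out = 3, 64, 128  # hypothetical layer shape
standard = conv_params(k, c_in, c_out)
separable = dsc_params(k, c_in, c_out)
reduction = 1 - separable / standard

print(standard, separable)                      # 73728 8768
print(f"{reduction:.1%} fewer weights in this layer")
```

The per-layer saving here is larger than the 54.2% to 79.8% whole-model figure reported in the abstract, which is expected if not every layer is replaced and Neural ODE weight sharing has already reduced the baseline.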

  • Parallel Implementation of CNN on Multi-FPGA Cluster

    Yasuyu FUKUSHIMA  Kensuke IIZUKA  Hideharu AMANO  

     
    PAPER-Computer System

      Publicized:
    2023/04/12
      Vol:
    E106-D No:7
      Page(s):
    1198-1208

    We developed a PYNQ cluster that consists of economical Zynq boards, called M-KUBOS, that are interconnected through low-cost high-performance GTH serial links. For the software environment, we employed the PYNQ open-source software platform. The PYNQ cluster is anticipated to be a multi-access edge computing (MEC) server for 5G mobile networks. We implemented the ResNet-50 inference accelerator on the PYNQ cluster for image recognition of MEC applications. By estimating the execution time of each ResNet-50 layer, layers of ResNet-50 were divided into multiple boards so that the execution time of each board would be as equal as possible for efficient pipeline processing. Owing to the PYNQ cluster in which FPGAs were directly connected by high-speed serial links, stream processing without network bottlenecks and pipeline processing between boards were readily realized. The implementation on 4 boards achieved 292 GOPS performance, 75.1 FPS throughput, and 7.81 GOPS/W power efficiency. It achieved 17 times faster speed and 130 times more power efficiency compared to the implementation on the CPU, and 5.8 times more power efficiency compared to the implementation on the GPU.
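The abstract describes dividing layers across boards so that per-board execution times are as equal as possible, but it does not say how the split was chosen. The sketch below substitutes a standard approach, a binary search on the bottleneck time with a greedy feasibility check, and uses made-up integer layer times; `split_layers` is a hypothetical name.

```python
def split_layers(times, boards):
    """Split layer times into at most `boards` contiguous groups,
    minimizing the execution time of the slowest group.

    `times` are integer per-layer estimates (for example, microseconds).
    """
    def groups_needed(limit):
        # Greedily pack consecutive layers while the group stays within `limit`.
        count, current = 1, 0
        for t in times:
            if current + t > limit:
                count, current = count + 1, t
            else:
                current += t
        return count

    lo, hi = max(times), sum(times)
    while lo < hi:  # smallest limit that fits on the available boards
        mid = (lo + hi) // 2
        if groups_needed(mid) <= boards:
            hi = mid
        else:
            lo = mid + 1

    # Rebuild the actual grouping for the optimal limit.
    groups, current = [], []
    for t in times:
        if current and sum(current) + t > lo:
            groups.append(current)
            current = []
        current.append(t)
    groups.append(current)
    return lo, groups


# Hypothetical per-layer times; not measurements from the paper.
bottleneck, groups = split_layers([4, 3, 2, 6, 1, 5, 2, 7], boards=4)
print(bottleneck, groups)  # 8 [[4, 3], [2, 6], [1, 5, 2], [7]]
```

In a pipeline, throughput is set by the slowest stage, so minimizing the largest group time is the quantity of interest rather than the average.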

  • High Performance Network Virtualization Architecture on FPGA SmartNIC

    Ke WANG  Yiwei CHANG  Zhichuan GUO  

     
    PAPER-Network System

      Publicized:
    2022/11/29
      Vol:
    E106-B No:6
      Page(s):
    500-508

    Network Functional Virtualization (NFV) is a high-performance network interconnection technology that allows access to traditional network transport devices through virtual network links. It is widely used in cloud computing and other high-concurrent access environments. However, there is a long delay in the introduction of software NFV solutions. Other hardware I/O virtualization solutions don't scale very well. Therefore, this paper proposes a virtualization implementation method on 100Gbps high-speed Field Programmable Gate Array (FPGA) network accelerator card, which uses FPGA accelerator to improve the performance of virtual network devices. This method uses the single root I/O virtualization (SR-IOV) technology to allow 256 virtual links to be created for a single Peripheral Component Interconnect express (PCIe) device. And it supports data transfer with virtual machine (VM) in the way of Peripheral Component Interconnect (PCI) passthrough. In addition, the design also adopts the shared extensible queue management mechanism, which supports the flexible allocation of more than 10,000 queues on virtual machines, and ensures the good isolation performance in the data path and control path. The design provides high-bandwidth transmission performance of more than 90Gbps for the entire network system, meeting the performance requirements of hyperscale cloud computing clusters.

  • Implementation of Fully-Pipelined CNN Inference Accelerator on FPGA and HBM2 Platform

    Van-Cam NGUYEN  Yasuhiko NAKASHIMA  

     
    PAPER-Computer System

      Publicized:
    2023/03/17
      Vol:
    E106-D No:6
      Page(s):
    1117-1129

    Many deep convolutional neural network (CNN) inference accelerators on the field-programmable gate array (FPGA) platform have been widely adopted due to their low power consumption and high performance. In this paper, we develop the following to improve performance and power efficiency. First, we use a high bandwidth memory (HBM) to expand the bandwidth of data transmission between the off-chip memory and the accelerator. Second, a fully-pipelined manner, which consists of pipelined inter-layer computation and a pipelined computation engine, is implemented to decrease idle time among layers. Third, a multi-core architecture with shared-dual buffers is designed to reduce off-chip memory access and maximize the throughput. We designed the proposed accelerator on the Xilinx Alveo U280 platform with in-depth Verilog HDL instead of high-level synthesis as the previous works and explored the VGG-16 model to verify the system during our experiment. With a similar accelerator architecture, the experimental results demonstrate that the memory bandwidth of HBM is 13.2× better than DDR4. Compared with other accelerators in terms of throughput, our accelerator is 1.9×/1.65×/11.9× better than FPGA+HBM2 based/low batch size (4) GPGPU/low batch size (4) CPU. Compared with the previous DDR+FPGA/DDR+GPGPU/DDR+CPU based accelerators in terms of power efficiency, our proposed system provides 1.4-1.7×/1.7-12.6×/6.6-37.1× improvement with the large-scale CNN model.

  • Study of FIT Dedicated Computer with Dataflow Architecture for High Performance 2-D Magneto-Static Field Simulation

    Chenxu WANG  Hideki KAWAGUCHI  Kota WATANABE  

     
    PAPER

      Publicized:
    2022/08/23
      Vol:
    E106-C No:4
      Page(s):
    136-143

    An approach to dedicated computers is discussed in this study as a possibility for portable, low-cost, and low-power consumption high-performance computing technologies. Particularly, dataflow architecture dedicated computer of the finite integration technique (FIT) for 2D magnetostatic field simulation is considered for use in industrial applications. The dataflow architecture circuit of the BiCG-Stab matrix solver of the FIT matrix calculation is designed by the very high-speed integrated circuit hardware description language (VHDL). The operation of the dedicated computer's designed circuit is considered by VHDL logic circuit simulation.

  • Real-Time Image-Based Vibration Extraction with Memory-Efficient Optical Flow and Block-Based Adaptive Filter

    Taito MANABE  Yuichiro SHIBATA  

     
    PAPER

      Publicized:
    2022/09/05
      Vol:
    E106-A No:3
      Page(s):
    504-513

    In this paper, we propose a real-time vibration extraction system, which extracts vibration component within a given frequency range from videos in real time, for realizing tremor suppression used in microsurgery assistance systems. To overcome the problems in our previous system based on the mean Lucas-Kanade (LK) optical flow of the whole frame, we have introduced a new architecture combining dense optical flow calculated with simple feature matching and block-based band-pass filtering using band-limited multiple Fourier linear combiner (BMFLC). As a feature of optical flow calculation, we use the simplified rotation-invariant histogram of oriented gradients (RIHOG) based on a gradient angle quantized to 1, 2, or 3 bits, which greatly reduces the usage of memory resources for a frame buffer. An obtained optical flow map is then divided into multiple blocks, and BMFLC is applied to the mean optical flow of each block independently. By using the L1-norm of adaptive weight vectors in BMFLC as a criterion, blocks belonging to vibrating objects can be isolated from background at low cost, leading to better extraction accuracy compared to the previous system. The whole system for 480p and 720p resolutions can be implemented on a single Xilinx Zynq-7000 XC7Z020 FPGA without any external memory, and can process a video stream supplied directly from a camera at 60fps.

  • An eFPGA Generation Suite with Customizable Architecture and IDE

    Morihiro KUGA  Qian ZHAO  Yuya NAKAZATO  Motoki AMAGASAKI  Masahiro IIDA  

     
    PAPER

      Publicized:
    2022/10/07
      Vol:
    E106-A No:3
      Page(s):
    560-574

    From edge devices to cloud servers, providing optimized hardware acceleration for specific applications has become a key approach to improve the efficiency of computer systems. Traditionally, many systems employ commercial field-programmable gate arrays (FPGAs) to implement dedicated hardware accelerator as the CPU's co-processor. However, commercial FPGAs are designed in generic architectures and are provided in the form of discrete chips, which makes it difficult to meet increasingly diversified market needs, such as balancing reconfigurable hardware resources for a specific application, or to be integrated into a customer's system-on-a-chip (SoC) in the form of embedded FPGA (eFPGA). In this paper, we propose an eFPGA generation suite with customizable architecture and integrated development environment (IDE), which covers the entire eFPGA design generation, testing, and utilization stages. For the eFPGA design generation, our intellectual property (IP) generation flow can explore the optimal logic cell, routing, and array structures for given target applications. For the testability, we employ a previously proposed shipping test method that is 100% accurate at detecting all stuck-at faults in the entire FPGA-IP. In addition, we propose a user-friendly and customizable Web-based IDE framework for the generated eFPGA based on the NODE-RED development framework. In the case study, we show an eFPGA architecture exploration example for a differential privacy encryption application using the proposed suite. Then we show the implementation and evaluation of the eFPGA prototype with a 55nm test element group chip design.

  • The Implementation of a Hybrid Router and Dynamic Switching Algorithm on a Multi-FPGA System

    Tomoki SHIMIZU  Kohei ITO  Kensuke IIZUKA  Kazuei HIRONAKA  Hideharu AMANO  

     
    PAPER

      Publicized:
    2022/06/30
      Vol:
    E105-D No:12
      Page(s):
    2008-2018

    The multi-FPGA system known as the Flow-in-Cloud (FiC) system is composed of mid-range FPGAs that are directly interconnected by high-speed serial links. FiC is currently being developed as a server for multi-access edge computing (MEC), which is one of the core technologies of 5G. Because the applications of MEC are sometimes timing-critical, a static time division multiplexing (STDM) network has been used on FiC. However, the STDM network has the disadvantage of low link utilization, especially under light traffic. To solve this problem, we propose a hybrid router that combines packet switching for low-priority communication and STDM for high-priority communication. In our hybrid network, the packet switching uses slots that are unused by the STDM; therefore, best-effort communication by packet switching and QoS-guaranteed communication by the STDM can be used simultaneously. Furthermore, to improve the utilization of each link under a low network traffic load, we propose a dynamic communication switching algorithm. In our algorithm, each router monitors the network load metrics, and timing-critical tasks select the STDM according to those metrics only when congestion occurs. This can achieve both a QoS guarantee and efficient utilization of each link with a small resource overhead. In our evaluation on a real multi-FPGA system with 24 boards, the dynamic algorithm was up to 24.6% faster in execution time under a high network load compared to packet switching.

  • A Novel Fixed-Point Conversion Methodology For Digital Signal Processing Systems

    Phuong T.K. DINH  Linh T.T. DINH  Tung T. TRAN  Lam S. PHAM  Han Le DUC  Chi P. HOANG  Minh D. NGUYEN  

     
    PAPER-Digital Signal Processing

      Publicized:
    2022/06/17
      Vol:
    E105-A No:12
      Page(s):
    1537-1550

    Recently, most signal processing algorithms have been developed with floating-point arithmetic, whereas fixed-point arithmetic is more common in commercial devices and low-power real-time applications implemented on embedded, ASIC, or FPGA systems. Therefore, an optimal Floating-point to Fixed-point Conversion (FFC) methodology is a promising solution. In this paper, we propose an FFC methodology consisting of a signal grouping technique and simulation-based word length optimization. To evaluate the performance of the proposed technique, simulations and hardware co-simulation on a Field Programmable Gate Array (FPGA) platform are applied to complex Digital Signal Processing (DSP) algorithms: Linear Time Invariant (LTI) systems, a multi-mode Fast Fourier Transform (FFT) circuit for IEEE 802.11ax WLAN devices, and a calibration algorithm for gain and clock skew in a Time-Interleaved ADC (TI-ADC) using an Adaptive Noise Canceller (ANC). The results show that the proposed technique can reduce the hardware cost by about 30% while maintaining speed and reliability.
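As a rough illustration of simulation-based word length optimization, the sketch below quantizes sample values to a signed fixed-point format and searches for the fewest fractional bits that keep the error within a bound. It is a simplified stand-in, not the paper's methodology: it omits signal grouping, and `to_fixed`, `min_frac_bits`, the sample values, and the error bound are all hypothetical.

```python
def to_fixed(x, frac_bits, word_bits):
    """Quantize x to signed fixed point with saturation and return it as a float."""
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    q = max(lo, min(hi, round(x * scale)))  # round to nearest, then saturate
    return q / scale


def min_frac_bits(samples, word_bits, max_err):
    """Smallest number of fractional bits whose error stays within max_err on all samples."""
    for frac_bits in range(word_bits):
        if all(abs(s - to_fixed(s, frac_bits, word_bits)) <= max_err for s in samples):
            return frac_bits
    return None  # no format of this word length meets the bound


print(to_fixed(0.3, 4, 8))                          # 0.3125
print(min_frac_bits([0.3, -0.7, 0.12], 16, 1e-3))   # 9
```

A real flow would evaluate an application-level metric (for example, output SNR of the filter or FFT) instead of per-sample error, and would search the word length of each signal group rather than a single global format.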
