IEICE global.ieice.org Site

Keyword Search Result

[Keyword] fpga(330hit)

61-80hit(330hit)

Dither NN: Hardware/Algorithm Co-Design for Accurate Quantized Neural Networks
Kota ANDO Kodai UEYOSHI Yuka OBA Kazutoshi HIROSE Ryota UEMATSU Takumi KUDO Masayuki IKEBE Tetsuya ASAI Shinya TAKAMAEDA-YAMAZAKI Masato MOTOMURA

PAPER-Computer System

Publicized: 2019/07/22
Vol: E102-D No:12
Page(s): 2341-2353
Deep neural networks (NNs) have been widely accepted for enabling various AI applications; however, limited computational and memory resources are a major problem on mobile devices. A quantized NN with reduced bit precision is an effective solution that relaxes the resource requirements, but the accuracy degradation caused by its numerical approximation is another problem. We propose a novel quantized NN model that employs the “dithering” technique to improve accuracy with minimal additional hardware, from the viewpoint of hardware-algorithm co-design. Dithering spatially distributes the quantization error occurring at each pixel (neuron) so that the total information loss over the plane is minimized. Experiments using software-based accuracy evaluation and FPGA-based hardware resource estimation demonstrated the effectiveness and efficiency of an NN model with dithering.
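The idea of spreading quantization error to neighbors can be illustrated with a minimal one-dimensional error-diffusion sketch. This is not the method of the paper; the function name `quantize_with_dither`, the `levels` parameter, and the choice to push the whole error to the next element are assumptions made only for illustration.

```python
import math

def quantize_with_dither(values, levels=2):
    """Quantize values in [0, 1] to `levels` uniform steps.

    Each element's quantization error is carried into the next element
    (1-D error diffusion), so the sum of the outputs tracks the sum of
    the inputs even at very low precision.
    """
    step = 1.0 / (levels - 1)
    out = []
    carry = 0.0  # quantization error left over from the previous element
    for v in values:
        target = v + carry
        # Round to the nearest level, then clamp to the valid range.
        q = math.floor(target / step + 0.5) * step
        q = min(1.0, max(0.0, q))
        out.append(q)
        carry = target - q
    return out
```

With ten inputs of 0.3 and two levels, plain rounding yields all zeros, whereas the diffused output contains a mix of 0.0 and 1.0 whose sum stays within half a step of the true total of 3.0.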
Exploiting Packet-Level Parallelism of Packet Parsing for FPGA-Based Switches
Junnan LI Biao HAN Zhigang SUN Tao LI Xiaoyan WANG

PAPER-Transmission Systems and Transmission Equipment for Communications

Publicized: 2019/03/18
Vol: E102-B No:9
Page(s): 1862-1874
FPGA-based switches are appealing nowadays because they balance hardware performance and software flexibility. The packet parser, a foundational component of FPGA-based switches, identifies and extracts the specific fields used in forwarding decisions, e.g., the destination IP address. However, traditional parsers are too rigid to accommodate new protocols. In addition, FPGAs usually have a much lower clock frequency and fewer hardware resources than ASICs. In this paper, we present PLANET, a programmable packet-level parallel parsing architecture for FPGA-based switches, to overcome these two limitations. First, PLANET is flexibly programmable: its parsing algorithms can be updated at run-time. Second, PLANET heavily exploits the parallelism inside packet parsing to compensate for the FPGA's low clock frequency, and reduces resource consumption with a one-block recycling design. We implemented PLANET on an FPGA-based switch prototype with well-integrated datacenter protocols. Evaluation results show that our design can parse packets at up to 100 Gbps while maintaining relatively low parsing latency and using fewer hardware resources than existing proposals.
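As a software illustration of what "extracting a field used in forwarding decisions" means, the following sketch pulls the destination IP address out of an untagged Ethernet frame carrying IPv4. It says nothing about how PLANET itself is built; the function name `extract_dst_ip` is hypothetical, and VLAN tags and other protocols are deliberately ignored.

```python
import socket
import struct

ETH_HEADER_LEN = 14      # dst MAC (6) + src MAC (6) + EtherType (2)
ETHERTYPE_IPV4 = 0x0800
IPV4_MIN_HEADER_LEN = 20
IPV4_DST_OFFSET = 16     # destination address offset inside the IPv4 header

def extract_dst_ip(frame: bytes):
    """Return the IPv4 destination address as a dotted string, or None
    if the frame is too short or does not carry IPv4."""
    if len(frame) < ETH_HEADER_LEN + IPV4_MIN_HEADER_LEN:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return None
    if frame[ETH_HEADER_LEN] >> 4 != 4:  # IP version nibble
        return None
    start = ETH_HEADER_LEN + IPV4_DST_OFFSET
    return socket.inet_ntoa(frame[start:start + 4])
```

A rigid parser hard-codes offsets like these; supporting a new protocol then means changing the parser itself, which is the limitation the abstract refers to.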
An Architecture for Real-Time Retinex-Based Image Enhancement and Haze Removal and Its FPGA Implementation Open Access
Dabwitso KASAUKA Kenta SUGIYAMA Hiroshi TSUTSUI Hiroyuki OKUHATA Yoshikazu MIYANAGA

PAPER

Vol: E102-A No:6
Page(s): 775-782
In recent years, much research interest has developed in image enhancement and haze removal techniques. With increasing demand for real-time enhancement and haze removal, an efficient architecture incorporating both is needed. In this paper, we propose an architecture that supports both real-time Retinex-based image enhancement and haze removal using a single module, efficiently leveraging the similarity between the two algorithms. The implementation results reveal that only 1% logic circuit overhead is required to support both Retinex-based image enhancement in single mode and haze removal based on the Retinex model. Compared with implementing them individually in different modules, this reduction in computational complexity lowers the processing and memory requirements, especially in mobile consumer electronics. Furthermore, we use image enhancement for transmission map estimation instead of soft matting, thereby avoiding further computational complexity that would affect our goal of high-frame-rate real-time processing. Our FPGA implementation, operating at an optimum frequency of 125 MHz with a total block memory size of 5.67 Mbits, supports WUXGA (1,920×1,200) at 60 fps as well as 1080p60 color input. Our proposed design is competitive with existing state-of-the-art designs. Our proposal is tailored to enhance consumer electronics such as on-board cameras, active surveillance intrusion detection systems, autonomous cars, mobile streaming systems, and robotics with low processing and memory requirements.
A 3Gbps/Lane MIPI D-PHY Transmission Buffer Chip
Pil-Ho LEE Young-Chan JANG

LETTER

Vol: E102-A No:6
Page(s): 783-787
A 3Gbps/lane transmission buffer chip including a high-speed mode detector is proposed for a field-programmable gate array (FPGA)-based frame generator supporting the mobile industry processor interface (MIPI) D-PHY version 1.2. It performs 1-to-3 repeat while buffering low voltage differential signaling (LVDS) or scalable low voltage signaling (SLVS) to SLVS.
RNA: An Accurate Residual Network Accelerator for Quantized and Reconstructed Deep Neural Networks
Cheng LUO Wei CAO Lingli WANG Philip H. W. LEONG

PAPER-Applications

Publicized: 2019/02/19
Vol: E102-D No:5
Page(s): 1037-1045
With the continuous refinement of Deep Neural Networks (DNNs), a series of deep and complex networks such as Residual Networks (ResNets) show impressive prediction accuracy in image classification tasks. Unfortunately, the structural complexity and computational cost of residual networks make hardware implementation difficult. In this paper, we present the quantized and reconstructed deep neural network (QR-DNN) technique, which first inserts batch normalization (BN) layers in the network during training, and later removes them to facilitate efficient hardware implementation. Moreover, an accurate and efficient residual network accelerator (RNA) is presented based on QR-DNN with batch-normalization-free structures and weights represented in a logarithmic number system. RNA employs a systolic array architecture to perform shift-and-accumulate operations instead of multiplication operations. QR-DNN is shown to achieve a 1∼2% improvement in accuracy over existing techniques, and RNA over previous best fixed-point accelerators. An FPGA implementation on a Xilinx Zynq XC7Z045 device achieves 804.03 GOPS, 104.15 FPS and 91.41% top-5 accuracy for the ResNet-50 benchmark, and state-of-the-art results are also reported for AlexNet and VGG.
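Why logarithmic weights allow shifts to replace multiplications can be shown in a few lines. This is only a sketch under stated assumptions: weights are rounded to the nearest signed power of two, activations are non-negative integers, and the names `log2_quantize` and `shift_accumulate` are hypothetical rather than taken from the paper.

```python
import math

def log2_quantize(w):
    """Approximate w by sign * 2**exp and return (sign, exp).

    A zero weight is encoded as sign 0.
    """
    if w == 0:
        return (0, 0)
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))
    return (sign, exp)

def shift_accumulate(activations, qweights):
    """Dot product of non-negative integer activations with power-of-two
    weights, using only bit shifts and additions."""
    acc = 0
    for a, (sign, exp) in zip(activations, qweights):
        # Multiplying by 2**exp is a left shift; by 2**-k, a right shift.
        term = a << exp if exp >= 0 else a >> -exp
        acc += sign * term
    return acc
```

For example, weights 0.5, -2.0 and 4.0 become (1, -1), (-1, 1) and (1, 2), and the activations 8, 3 and 1 accumulate to 4 - 6 + 4 = 2, matching the ordinary dot product. Right shifts truncate, so results are exact only when the shifted-out bits are zero.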
VHDL Design of a SpaceFibre Routing Switch Open Access
Alessandro LEONI Pietro NANNIPIERI Luca FANUCCI

LETTER-VLSI Design Technology and CAD

Vol: E102-A No:5
Page(s): 729-731
The technological advancement of satellite instruments requires increasingly fast interconnection technologies, for which no standardised solution exists. SpaceFibre is the forthcoming protocol that promises to overcome the limitations of its predecessor, SpaceWire, offering data rates higher than 1 Gbps. However, while several implementations of the SpaceFibre IP already exist, its Network Layer is still at an experimental level. This article describes the architecture of an implemented SpaceFibre Routing Switch and provides synthesis results for common FPGAs.
GUINNESS: A GUI Based Binarized Deep Neural Network Framework for Software Programmers
Hiroki NAKAHARA Haruyoshi YONEKAWA Tomoya FUJII Masayuki SHIMODA Shimpei SATO

PAPER-Design Tools

Publicized: 2019/02/27
Vol: E102-D No:5
Page(s): 1003-1011
GUINNESS (GUI based binarized neural network synthesizer) is an open-source, GUI-based tool flow for implementing a binarized deep neural network on an FPGA, covering both training on the GPU and inference on the FPGA. Since all operations are done in the GUI, the software designer does not need to write any scripts to design the neural network structure or training behavior, and only needs to specify the hyperparameter values. After training finishes, it automatically generates C++ code to synthesize the bitstream using the Xilinx SDSoC system design tool flow. Thus, our tool flow is suitable for software programmers who are not familiar with FPGA design. In our tool flow, we modify the algorithms for both training and inference for binarized CNN hardware. Since the hardware has limited bit precision, it lacks minimal bias in training. Also, for inference on the hardware, the conventional batch normalization technique requires additional hardware. Our modifications solve these problems. We implemented the VGG-11 benchmark CNN on the Digilent Inc. Zedboard. Compared with conventional binarized implementations on an FPGA, the classification accuracy was almost the same, while the performance per power was 5.1 times better, the performance per area was 8.0 times better, and the performance per memory was 8.2 times better. We also compare the proposed FPGA design with CPU and GPU designs. Compared with the ARM Cortex-A57, it was 1776.3 times faster, it dissipated 3.0 times lower power, and its performance per power was 5706.3 times better. Compared with the Maxwell GPU, it was 11.5 times faster, it dissipated 7.3 times lower power, and its performance per power was 83.0 times better. The disadvantage of our FPGA-based design is that it requires additional time to synthesize the FPGA executable code. In the experiment, synthesis took more than three hours, and the total FPGA design took 75 hours. Since the training of the CNN is dominant, it is considerable.
Automatic Generation Tool of FPGA Components for Robots Open Access
Takeshi OHKAWA Kazushi YAMASHINA Takuya MATSUMOTO Kanemitsu OOTSU Takashi YOKOTA

PAPER-Design Tools

Publicized: 2019/03/01
Vol: E102-D No:5
Page(s): 1012-1019
To realize an intelligent robot system, a large amount of data input from complex and diverse sensors must be processed in a short time. FPGAs are expected to improve the processing performance of robots because they offer better performance per power consumption than high-performance CPUs, but they have lower development productivity than software. In this paper, we discuss the automatic generation of FPGA components for robots. We propose a design tool developed for easy integration of FPGAs into robots. The tool, named cReComp, can automatically convert a circuit written in Verilog HDL into a software component compliant with ROS (Robot Operating System), a robot software framework that is the standard in robot development. To evaluate its productivity, we conducted a subject experiment. As a result, we confirmed that the automatic generation is effective in easing the development of FPGA components for robots.
Scalability Analysis of Deeply Pipelined Tsunami Simulation with Multiple FPGAs Open Access
Antoniette MONDIGO Tomohiro UENO Kentaro SANO Hiroyuki TAKIZAWA

PAPER-Applications

Publicized: 2019/02/05
Vol: E102-D No:5
Page(s): 1029-1036
Since the hardware resource of a single FPGA is limited, one idea to scale the performance of FPGA-based HPC applications is to expand the design space with multiple FPGAs. This paper presents a scalable architecture of a deeply pipelined stream computing platform, where available parallelism and inter-FPGA link characteristics are investigated to achieve a scaled performance. For a practical exploration of this vast design space, a performance model is presented and verified with the evaluation of a tsunami simulation application implemented on Intel Arria 10 FPGAs. Finally, scalability analysis is performed, where speedup is achieved when increasing the computing pipeline over multiple FPGAs while maintaining the problem size of computation. Performance is scaled with multiple FPGAs; however, performance degradation occurs with insufficient available bandwidth and large pipeline overhead brought by inadequate data stream size. Tsunami simulation results show that the highest scaled performance for 8 cascaded Arria 10 FPGAs is achieved with a single pipeline of 5 stream processing elements (SPEs), which obtained a scaled performance of 2.5 TFlops and a parallel efficiency of 98%, indicating the strong scalability of the multi-FPGA stream computing platform.
Power Efficient Object Detector with an Event-Driven Camera for Moving Object Surveillance on an FPGA
Masayuki SHIMODA Shimpei SATO Hiroki NAKAHARA

PAPER-Applications

Publicized: 2019/02/27
Vol: E102-D No:5
Page(s): 1020-1028
We propose an object detector using a sliding window method for an event-driven camera, which outputs a subtracted frame (usually a binary value) when changes are detected in captured images. Since the sliding window skips unchanged portions of the output, the number of target object area candidates decreases dramatically, which means that our system operates faster and with lower power consumption than a system using a straightforward sliding window approach. Since the event-driven camera output consists of binary precision frames, an all binarized convolutional neural network (ABCNN) can be used, which allows all convolutional layers to share the same binarized convolutional circuit, thereby reducing the area requirement. We implemented our proposed method on the Xilinx Inc. Zedboard and then evaluated it using the PETS 2009 dataset. The results showed that our system outperformed a BCNN system in terms of detection performance, hardware requirement, and computation time. We also showed that an FPGA is a better platform for our system than a mobile GPU. From these results, our proposed system is more suitable for embedded systems based on stationary cameras (such as security cameras).
Wideband Radar Frequency Measurement Receiver Based on FPGA without Mixer Open Access
Xinqun LIU Yingxiao ZHAO

LETTER-Computer System

Publicized: 2019/01/18
Vol: E102-D No:4
Page(s): 859-862
In this letter, a flexible and compatible radar frequency measurement receiver with fine resolution is designed. The receiver is implemented on a Virtex-5 Field Programmable Gate Array (FPGA) platform from Xilinx. Mixer-free Digital Down Conversion (DDC) based on a polyphase filter has been successfully introduced in this receiver to obtain a lower-speed data flow and better resolution. The receiver can adapt to more modulation types and a higher density of pulse flow, up to 200,000 pulses per second. The measurement results indicate that the receiver is capable of detecting radar pulse signals of 0.2 µs to 2.5 ms width with a major frequency root mean square error (RMSE) within 0.44 MHz. Moreover, a wider pulse width and a higher DDC decimation rate result in better performance. This frequency measurement receiver has been successfully used in a spaceborne radar system.
Fingertip-Size Optical Module, “Optical I/O Core”, and Its Application in FPGA Open Access
Takahiro NAKAMURA Kenichiro YASHIKI Kenji MIZUTANI Takaaki NEDACHI Junichi FUJIKATA Masatoshi TOKUSHIMA Jun USHIDA Masataka NOGUCHI Daisuke OKAMOTO Yasuyuki SUZUKI Takanori SHIMIZU Koichi TAKEMURA Akio UKITA Yasuhiro IBUSUKI Mitsuru KURIHARA Keizo KINOSHITA Tsuyoshi HORIKAWA Hiroshi YAMAGUCHI Junichi TSUCHIDA Yasuhiko HAGIHARA Kazuhiko KURATA

INVITED PAPER

Vol: E102-C No:4
Page(s): 333-339
An optical I/O core based on silicon photonics technology and optical/electrical assembly was developed as a fingertip-size optical module with high bandwidth density, low power consumption, and high temperature operation. The advantages of the optical I/O core, including hybrid integration of a quantum dot laser diode and an optical pin, allow us to achieve 300-m transmission at 25 Gbps per channel when the optical I/O core is mounted around a field-programmable gate array without clock data recovery.
FPGA Implementation of a Real-Time Super-Resolution System Using Flips and an RNS-Based CNN
Taito MANABE Yuichiro SHIBATA Kiyoshi OGURI

PAPER

Vol: E101-A No:12
Page(s): 2280-2289
Super-resolution technology is one solution for filling the gap between high-resolution displays and lower-resolution images. There are various algorithms to interpolate the lost information, one of which uses a convolutional neural network (CNN). This paper presents an FPGA implementation and a performance evaluation of a novel CNN-based super-resolution system that can process moving images in real time. We apply horizontal and vertical flips to input images instead of enlargement. This flip method prevents information loss and enables the network to make the best use of its patch size. In addition, we adopted the residue number system (RNS) in the network to reduce FPGA resource utilization. Efficient multiplication and addition with LUTs increased the network scale that can be implemented on the same FPGA by approximately 54% compared to an implementation with fixed-point operations. The proposed system can perform super-resolution from 960×540 to 1920×1080 at 60 fps with a latency of less than 1 ms. Despite the resource restrictions of the FPGA, the system can generate clear super-resolution images with smooth edges. The evaluation results also revealed superior quality in terms of the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) index, compared to systems using other methods.
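For readers unfamiliar with the residue number system, the sketch below shows the general principle: an integer is represented by its remainders modulo several pairwise coprime moduli, so addition and multiplication split into small independent channels (which is what makes small lookup tables attractive). The moduli (7, 11, 13) and all function names are illustrative assumptions and are not taken from the paper.

```python
from math import prod

# Pairwise coprime moduli (hypothetical choice); representable range is 0..1000.
MODULI = (7, 11, 13)

def to_rns(x):
    """Encode a non-negative integer as a tuple of residues."""
    return tuple(x % m for m in MODULI)

def rns_add(a, b):
    """Channel-wise addition; no carries cross between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))

def rns_mul(a, b):
    """Channel-wise multiplication on small residues."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))

def from_rns(residues):
    """Decode with the Chinese remainder theorem."""
    total_modulus = prod(MODULI)
    x = 0
    for r, m in zip(residues, MODULI):
        partial = total_modulus // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        x += r * partial * pow(partial, -1, m)
    return x % total_modulus
```

For instance, `from_rns(rns_mul(to_rns(23), to_rns(17)))` recovers 391, provided the true result stays below the product of the moduli (1001 here).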
Lightweight Security Hardware Architecture Using DWT and AES Algorithms
Ignacio ALGREDO-BADILLO Francisco R. CASTILLO-SORIA Kelsey A. RAMÍREZ-GUTIÉRREZ Luis MORALES-ROSALES Alejandro MEDINA-SANTIAGO Claudia FEREGRINO-URIBE

PAPER-Information Network

Publicized: 2018/08/02
Vol: E101-D No:11
Page(s): 2754-2761
With the great increase in digital communications, in which the technological society depends on applications, devices, and networks, security problems motivate research into algorithms and systems that resist attacks, and these in turn need services such as confidentiality, authentication, and integrity. This paper proposes the hardware implementation of a steganographic/cryptographic algorithm based on the DWT (Discrete Wavelet Transform) and the AES (Advanced Encryption Standard) cipher algorithm in CBC mode. The proposed scheme takes advantage of a double-security ciphertext, which makes it difficult to identify and decipher. The hardware architecture reports high efficiency (182.2 bps/slice and 85.2 bps/LUT) and low hardware resource consumption (867 slices and 1853 LUTs), and several parallel implementations can improve the throughput (0.162 Mbps) for processing large amounts of data.
Proxy Responses by FPGA-Based Switch for MapReduce Stragglers
Koya MITSUZUKA Michihiro KOIBUCHI Hideharu AMANO Hiroki MATSUTANI

PAPER-Computer System

Publicized: 2018/06/15
Vol: E101-D No:9
Page(s): 2258-2268
In parallel processing applications, a few worker nodes called “stragglers”, which execute their tasks significantly slower than other nodes, increase the execution time of the job. In this paper, we propose a network-switch-based straggler handling system to reduce the burden on the compute nodes. We also propose how to offload straggler detection and the computation of their results to the network switch with no additional communication between worker nodes. We introduce some approximation techniques for the proxy computation and response at the switch; thus our switch is called “ApproxSW.” In a simulation experiment, the proposed approximation based on task similarity achieves the best accuracy in terms of the quality of the generated Map outputs. We also analyze how to suppress unnecessary proxy computation by the ApproxSW. We implement ApproxSW on a NetFPGA-SUME board that has four 10Gbit Ethernet (10GbE) interfaces and a Virtex-7 FPGA. Experimental results show that the ApproxSW functions do not degrade the original 10GbE switch performance.
Analysis and Implementation of a QoS Optimization Method for Access Networks
Ling ZHENG Zhiliang QIU Weitao PAN Yibo MEI Shiyong SUN Zhiyi ZHANG

PAPER-Network System

Publicized: 2018/03/14
Vol: E101-B No:9
Page(s): 1949-1960
High-performance Network Over Coax, or HINOC for short, is a broadband access technology that can achieve bidirectional transmission for high-speed Internet service through a coaxial medium. In HINOC access networks, a buffer management scheme can improve the fairness of buffer usage among different output ports and the overall loss performance. To provide different services to multiple priority classes while reducing the overall packet loss rate and ensuring fairness among the output ports, this study proposes a QoS optimization method for access networks. A backpressure-based queue threshold control scheme is used to minimize the weighted average packet loss rate among multiple priorities. A theoretical analysis is performed to examine the performance of the proposed scheme, and optimal system parameters are provided. Software simulation shows that the proposed method can improve the average packet loss rate by about 20% to 40% compared with existing buffer management schemes. In addition, FPGA evaluation reveals that the proposed method can be implemented in practical hardware and performs well in access networks.
Pixel Selection and Intensity Directed Symmetry for High Frame Rate and Ultra-Low Delay Matching System
Tingting HU Takeshi IKENAGA

PAPER-Machine Vision and its Applications

Publicized: 2018/02/16
Vol: E101-D No:5
Page(s): 1260-1269
High frame rate and ultra-low delay matching systems play an increasingly important role in human-machine interactive applications, which call for a higher frame rate and lower delay for a better experience. The large amount of data to process and the complex computation in a local-feature-based matching system make it difficult to achieve high processing speed and ultra-low delay matching with limited resources. Aiming at a matching system with a processing speed of more than 1000 fps and a delay of less than 1 ms/frame, this paper puts forward a local binary feature based matching system on a field-programmable gate array (FPGA). Pixel selection based 4-1-4 parallel matching and intensity directed symmetry are proposed for the implementation of this system. To design a basic framework with high processing speed and ultra-low delay using limited resources, pixel selection based 4-1-4 parallel matching is proposed, which makes it possible to achieve four-thread processing with only one thread's resource consumption. Assuming that the orientation of the keypoint will best bisect the patch and will point to the region with high intensity, intensity directed symmetry is proposed to calculate the keypoint orientation in a hardware-friendly way, which is an important part of a rotation-robust matching system. Software experiment results show that the proposed keypoint orientation calculation method achieves almost the same performance as the state-of-the-art intensity centroid orientation calculation method in a matching system. Hardware experiment results show that the designed image processing core can process VGA (640×480) videos at a processing speed of 1306 fps with a delay of 0.8083 ms/frame.
A Hardware-Based Caching System on FPGA NIC for Blockchain
Yuma SAKAKIBARA Shin MORISHIMA Kohei NAKAMURA Hiroki MATSUTANI

PAPER-Computer System

Publicized: 2018/02/02
Vol: E101-D No:5
Page(s): 1350-1360
Engineers and researchers have recently paid attention to Blockchain. Blockchain is a fault-tolerant distributed ledger without administrators. Blockchain originated in cryptocurrency, but it can be applied to other industries. Transferring a digital asset is called a transaction. Blockchain holds all transactions, so the total amount of Blockchain data increases as time proceeds. Meanwhile, the number of Internet of Things (IoT) products has been increasing. It is difficult for IoT products to hold all Blockchain data because of their limited storage capacity. Therefore, they access Blockchain data via servers that hold it. However, if many IoT products access the Blockchain network via servers, the servers will become overloaded. Thus, it is useful to reduce workloads and improve throughput. In this paper, we propose a caching technique using a Field Programmable Gate Array (FPGA)-based Network Interface Card (NIC) that has four 10Gigabit Ethernet (10GbE) interfaces. The proposed system can reduce server overloads because, on a cache hit, the FPGA NIC responds to requests from IoT products instead of the server. We implemented the proposed hardware cache on a NetFPGA-10G board to achieve high throughput. As an evaluation, we counted the number of requests processed by the server or the FPGA NIC. As a result, the throughput improved by 1.97 times on average when hitting the cache.
Enabling FPGA-as-a-Service in the Cloud with hCODE Platform
Qian ZHAO Motoki AMAGASAKI Masahiro IIDA Morihiro KUGA Toshinori SUEYOSHI

PAPER-Design Methodology and Platform

Publicized: 2017/11/17
Vol: E101-D No:2
Page(s): 335-343
Major cloud service providers, including Amazon and Microsoft, have started employing field-programmable gate arrays (FPGAs) to build high-performance and low-power-consumption cloud capability. However, utilizing an FPGA-enabled cloud is still challenging for two main reasons. First, the introduction of software and hardware co-design leads to high development complexity. Second, FPGA virtualization and accelerator scheduling techniques are not fully researched for cluster deployment. In this paper, we propose an open-source FPGA-as-a-service (FaaS) platform, hCODE, to simplify the design, management, and deployment of FPGA accelerators at cluster scale. The proposed platform implements a Shell-and-IP design pattern and an open accelerator repository to reduce the design and management costs of FPGA projects. Efficient FPGA virtualization and accelerator scheduling techniques are proposed to deploy accelerators on the FPGA-enabled cluster easily. With the proposed hCODE, hardware designers and accelerator users can be organized on one platform to efficiently build an open-hardware ecosystem.
Development of an Evaluation Platform and Performance Experimentation of Flex Power FPGA Device
Toshihiro KATASHITA Masakazu HIOKI Yohei HORI Hanpei KOIKE

PAPER-Device and Architecture

Publicized: 2017/11/17
Vol: E101-D No:2
Page(s): 303-313
Field-programmable gate array (FPGA) devices are applied for accelerating specific calculations and reducing power consumption in a wide range of areas. One of the challenges associated with FPGAs is reducing static power to enhance their power efficiency. We propose a method involving fine-grained reconfiguration of the body biases of logic and net resources to reduce the static power of FPGA devices. In addition, we develop an FPGA device called Flex Power FPGA with SOTB technology and demonstrate its power reduction function with a 32-bit counter circuit. In this paper, we describe the construction of an experimental platform to precisely evaluate the power consumption and the maximum operating frequency of the device under various operating voltages and body biases with various practical circuits. Using this platform, we evaluate the Flex Power FPGA chip at operating voltages of 0.5-1.0 V and at body biases of 0.0-0.5 V. In the evaluation, we use a 32-bit adder, a 16-bit multiplier, and an SBOX circuit for AES cryptography. We operate the chip virtually with a uniform body bias voltage to drive all of the logic resources with the same threshold voltage. We demonstrate the advantage of the Flex Power FPGA by comparing its performance with non-reconfigurable biasing.
