High Performance Application Specific Stream Architecture for Hardware Acceleration of HOG-SVM on FPGA

Piyumal RANAWAKA; Mongkol EKPANYAPONG; Adriano TAVARES; Mathew DAILEY; Krit ATHIKULWONGSE; Vitor SILVA

doi:10.1587/transfun.E102.A.1792


Summary :

Conventional sequential processing in software on a general-purpose CPU has become insufficient for certain heavy computations, which demand high processing power to deliver adequate throughput and performance. Interest in high-performance real-time video processing on embedded systems is high, but embedded processing platforms with limited performance can hardly meet the processing demand of several such intensive computations in the computer vision domain. Hardware acceleration is therefore an ideal solution: processing-intensive computations are accelerated by application-specific hardware integrated with a general-purpose CPU. This research focuses on building a parallelized, high-performance, application-specific architecture for such a hardware accelerator for HOG-SVM computation, implemented on a Zynq 7000 FPGA. The Histogram of Oriented Gradients (HOG) technique combined with a Support Vector Machine (SVM) based classifier is versatile and extremely popular in the computer vision domain despite its high demand for processing power. Owing to this popularity and versatility, several previous studies have attempted to obtain adequate throughput on HOG-SVM. With a throughput of 240 FPS at a single scale on VGA frames of size 640x480, this research outperforms the best-case single-scale performance of previous research by a factor of approximately 3-4. It is also an approximately 15x speedup over the GPU-accelerated software version with the same accuracy. This research explores the use of a novel architecture based on deep pipelining, parallel processing, and BRAM structures to achieve high performance on the HOG-SVM computation.
Furthermore, the developed video processing unit (VPU), which acts as a hardware accelerator, will be integrated as a co-processing peripheral to a host CPU using a novel custom accelerator structure with on-chip buses in a System-on-Chip (SoC) fashion. This allows the heavy, redundant computations of video stream processing to be offloaded to the VPU, while the processing power of the CPU is preserved for running lightweight applications. This research mainly focuses on the architectural techniques used to achieve higher performance on the hardware accelerator and on the novel accelerator structure used to integrate the accelerator with the host CPU.
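
To put the quoted figures in perspective, the following is a minimal arithmetic sketch in Python, not taken from the paper. It uses only the numbers stated in the summary (640x480 frames, 240 FPS, a factor of roughly 3-4 over prior work, roughly 15x over the GPU version); the function names are hypothetical, and the baseline frame rates it derives are implied estimates from those approximate ratios, not reported measurements.

```python
# Figures quoted in the summary above.
FRAME_WIDTH = 640
FRAME_HEIGHT = 480
FPS = 240


def pixel_rate(width, height, fps):
    """Pixels that must be consumed per second to sustain the frame rate."""
    return width * height * fps


def frame_budget_ms(fps):
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def implied_baseline_fps(fps, speedup):
    """Frame rate of a baseline that is `speedup` times slower (an estimate)."""
    return fps / speedup


if __name__ == "__main__":
    print("pixels per second:", pixel_rate(FRAME_WIDTH, FRAME_HEIGHT, FPS))
    print("per-frame budget (ms):", round(frame_budget_ms(FPS), 3))
    # Assumption: the "3-4x" and "15x" ratios are applied directly to 240 FPS.
    print("implied prior-work range (FPS):",
          implied_baseline_fps(FPS, 4), "to", implied_baseline_fps(FPS, 3))
    print("implied GPU baseline (FPS):", implied_baseline_fps(FPS, 15))
```

Under these assumptions the accelerator has to consume about 73.7 million pixels per second, which leaves a little over 4 ms per VGA frame.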

Publication: IEICE TRANSACTIONS on Fundamentals Vol.E102-A No.12 pp.1792-1803

Publication Date: 2019/12/01


Online ISSN: 1745-1337

DOI: 10.1587/transfun.E102.A.1792

Type of Manuscript: Special Section PAPER (Special Section on VLSI Design and CAD Algorithms)


Authors

Piyumal RANAWAKA
  University of Moratuwa
Mongkol EKPANYAPONG
  Asian Institute of Technology
Adriano TAVARES
  University of Minho
Mathew DAILEY
  Asian Institute of Technology
Krit ATHIKULWONGSE
  National Science and Technology Development Agency
Vitor SILVA
  University of Minho

Keyword

application specific architecture, hardware acceleration, pipelining, real-time HOG-SVM

Piyumal RANAWAKA, Mongkol EKPANYAPONG, Adriano TAVARES, Mathew DAILEY, Krit ATHIKULWONGSE, Vitor SILVA, "High Performance Application Specific Stream Architecture for Hardware Acceleration of HOG-SVM on FPGA" in IEICE TRANSACTIONS on Fundamentals, vol. E102-A, no. 12, pp. 1792-1803, December 2019, doi: 10.1587/transfun.E102.A.1792.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.E102.A.1792/_p


@ARTICLE{e102-a_12_1792,
author={Piyumal RANAWAKA and Mongkol EKPANYAPONG and Adriano TAVARES and Mathew DAILEY and Krit ATHIKULWONGSE and Vitor SILVA},
journal={IEICE TRANSACTIONS on Fundamentals},
title={High Performance Application Specific Stream Architecture for Hardware Acceleration of HOG-SVM on FPGA},
year={2019},
volume={E102-A},
number={12},
pages={1792-1803},
keywords={application specific architecture, hardware acceleration, pipelining, real-time HOG-SVM},
doi={10.1587/transfun.E102.A.1792},
ISSN={1745-1337},
month={December},}


TY - JOUR
TI - High Performance Application Specific Stream Architecture for Hardware Acceleration of HOG-SVM on FPGA
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1792
EP - 1803
AU - Piyumal RANAWAKA
AU - Mongkol EKPANYAPONG
AU - Adriano TAVARES
AU - Mathew DAILEY
AU - Krit ATHIKULWONGSE
AU - Vitor SILVA
PY - 2019
DO - 10.1587/transfun.E102.A.1792
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E102-A
IS - 12
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - December 2019
ER -