Temporal Ensemble SSDLite: Exploiting Temporal Correlation in Video for Accurate Object Detection

Lukas NAKAMURA, Hiromitsu AWANO

Summary:

We propose “Temporal Ensemble SSDLite,” a new method for video object detection that boosts accuracy while maintaining detection speed and energy consumption. Video object detection is becoming increasingly important as a core part of applications in robotics, autonomous driving, and many other promising fields. Many of these applications require high accuracy and speed to be viable, yet run in compute- and energy-restricted environments. New methods that increase the overall performance of video object detection, i.e., both accuracy and speed, are therefore needed. To increase accuracy, we use ensembling, the machine learning technique of combining the predictions of multiple different models. The drawback of ensembling is its increased computational cost, which is proportional to the number of models used. We overcome this drawback by deploying our ensemble temporally: we run inference with only a single model on each frame, cycling through the models of the ensemble from frame to frame. We then combine the predictions for the last N frames, where N is the number of models in the ensemble, through non-maximum suppression. This is possible because neighboring frames in a video are extremely similar due to temporal correlation. As a result, we gain accuracy from the ensemble while running only a single model per frame, which preserves the detection speed. To evaluate the proposal, we measure accuracy, detection speed, and energy consumption on the Google Edge TPU, a machine learning inference accelerator, with the ImageNet VID dataset. Our results demonstrate an accuracy boost of up to 4.9% while maintaining real-time detection speed and an energy consumption of 181 mJ per image.
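
The temporal ensemble described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the detection tuple layout, the class name `TemporalEnsemble`, the per-class greedy non-maximum suppression, and the IoU threshold of 0.5 are all assumptions made for illustration, and the two stub detectors stand in for real SSDLite models.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

# Assumed detection layout: (x1, y1, x2, y2, score, class_id).
Detection = Tuple[float, float, float, float, float, int]
Detector = Callable[[object], List[Detection]]


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(dets: Sequence[Detection], iou_threshold: float = 0.5) -> List[Detection]:
    """Greedy per-class non-maximum suppression (threshold is an assumption)."""
    kept: List[Detection] = []
    for det in sorted(dets, key=lambda d: d[4], reverse=True):
        # Keep a box unless a higher-scoring box of the same class overlaps it.
        if all(det[5] != k[5] or iou(det, k) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


class TemporalEnsemble:
    """Runs one model per frame and merges the last N frames' predictions."""

    def __init__(self, models: Sequence[Detector], iou_threshold: float = 0.5):
        self.models = list(models)
        # The history holds at most N prediction sets, N = number of models.
        self.history: Deque[List[Detection]] = deque(maxlen=len(self.models))
        self.iou_threshold = iou_threshold
        self.frame_index = 0

    def step(self, frame: object) -> List[Detection]:
        # Only a single model is evaluated on each frame, in round-robin order.
        model = self.models[self.frame_index % len(self.models)]
        self.frame_index += 1
        self.history.append(model(frame))
        pooled = [d for dets in self.history for d in dets]
        return nms(pooled, self.iou_threshold)


# Hypothetical stub detectors standing in for the ensemble members.
def model_a(frame: object) -> List[Detection]:
    return [(10.0, 10.0, 50.0, 50.0, 0.9, 1)]


def model_b(frame: object) -> List[Detection]:
    return [(12.0, 11.0, 52.0, 49.0, 0.8, 1), (100.0, 100.0, 140.0, 140.0, 0.7, 2)]


ensemble = TemporalEnsemble([model_a, model_b])
first = ensemble.step("frame0")   # only model_a has run so far
second = ensemble.step("frame1")  # model_b's output merged with model_a's
```

In this sketch the second call suppresses the lower-scoring duplicate of the class-1 box and keeps the class-2 box that only `model_b` found, which is the mechanism by which the ensemble can gain accuracy at the per-frame cost of a single model.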

Publication
IEICE TRANSACTIONS on Fundamentals Vol.E105-A No.7 pp.1082-1090
Publication Date
2022/07/01
Publicized
2022/01/18
Online ISSN
1745-1337
DOI
10.1587/transfun.2021EAP1068
Type of Manuscript
PAPER
Category
Vision

Authors

Lukas NAKAMURA
  Osaka University
Hiromitsu AWANO
  Kyoto University

Keyword