Finding important regions is essential for applications such as content-aware video compression and video retargeting, which automatically crops a region of a video for display on small screens. Since people are among the main subjects when taking a video, some methods for finding important regions use a visual attention model based on face/pedestrian detection to incorporate the knowledge that people are important. However, such methods usually do not distinguish important people from passers-by and bystanders, which results in false positives. In this paper, we propose a deep neural network (DNN)-based method that classifies each person as important or unimportant, given a video captured with a hand-held camera in which multiple people appear in a single frame. Intuitively, important/unimportant labels are highly correlated when the corresponding people's spatial motions are similar. Based on this assumption, we propose to boost the performance of our important/unimportant classification by using conditional random fields (CRFs) built upon the DNN, which can be trained in an end-to-end manner. Our experimental results show that our method successfully classifies important people and that the use of a DNN with CRFs improves the accuracy.
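The CRF idea in the abstract (labels of people with similar motion should agree) can be illustrated with a small sketch. The abstract does not give the paper's actual potentials or inference scheme, so everything below is an assumption: each person's DNN output is stood in for by a pair of logits (unimportant, important), each person's motion is summarized by a hypothetical 2D descriptor such as mean displacement, pairwise affinities use a Gaussian kernel on motion differences, disagreeing labels pay a Potts penalty, and inference uses parallel mean-field updates in the style of CRF-as-RNN. The names `mean_field`, `motion_affinity`, `weight`, and `bandwidth` are illustrative only and do not come from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def motion_affinity(motions, bandwidth=1.0):
    """Gaussian affinity k_ij between people's motion features (hypothetical choice).

    motions: (N, D) array, one motion descriptor per person.
    """
    diff = motions[:, None, :] - motions[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    k = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    np.fill_diagonal(k, 0.0)  # a person does not send messages to itself
    return k


def mean_field(unary_logits, motions, weight=1.0, bandwidth=1.0, n_iters=5):
    """Approximate marginals q[i, l] of a fully connected binary CRF.

    unary_logits: (N, 2) scores per person, columns = (unimportant, important);
        a stand-in for the per-person DNN output.
    Assumed energy (Potts pairwise term over unordered pairs i < j):
        E(y) = -sum_i unary_logits[i, y_i] + weight * sum_{i<j} k_ij * [y_i != y_j]
    """
    k = motion_affinity(motions, bandwidth)
    q = softmax(unary_logits)  # initialise from the unaries alone
    for _ in range(n_iters):
        # Expected cost of giving person i label l:
        # sum_j k_ij * P(y_j != l) = sum_j k_ij * (1 - q[j, l])
        penalty = k.sum(axis=1, keepdims=True) - k @ q
        # Parallel (synchronous) mean-field update.
        q = softmax(unary_logits - weight * penalty)
    return q


if __name__ == "__main__":
    # Person 0: confidently important; person 1: undecided but moves like person 0;
    # person 2: leaning unimportant and moving very differently.
    logits = np.array([[0.0, 3.0], [0.0, 0.0], [2.0, 0.0]])
    motions = np.array([[1.0, 0.0], [1.0, 0.1], [-5.0, 5.0]])
    q = mean_field(logits, motions)
    print("P(important) per person:", np.round(q[:, 1], 3))
```

In this toy run, person 1 starts at 0.5 and is pulled above 0.5 toward "important" because its motion closely matches person 0, while person 2 is essentially unaffected since its affinity to the others is near zero. Because each mean-field step consists only of matrix products and a softmax, unrolling a fixed number of iterations in an autodiff framework would let gradients flow back into the unary network, which is one common way to make a CRF trainable end to end; the abstract does not say whether the paper uses exactly this construction.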
Mayu OTANI (Nara Institute of Science and Technology)
Atsushi NISHIDA (Dai Nippon Printing Co., Ltd.)
Yuta NAKASHIMA (Osaka University)
Tomokazu SATO (Shiga University)
Naokazu YOKOYA (Nara Institute of Science and Technology)
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Mayu OTANI, Atsushi NISHIDA, Yuta NAKASHIMA, Tomokazu SATO, Naokazu YOKOYA, "Finding Important People in a Video Using Deep Neural Networks with Conditional Random Fields," in IEICE TRANSACTIONS on Information and Systems, vol. E101-D, no. 10, pp. 2509-2517, October 2018, doi: 10.1587/transinf.2018EDP7029.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018EDP7029/_p
@ARTICLE{e101-d_10_2509,
author={Mayu OTANI and Atsushi NISHIDA and Yuta NAKASHIMA and Tomokazu SATO and Naokazu YOKOYA},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Finding Important People in a Video Using Deep Neural Networks with Conditional Random Fields},
year={2018},
volume={E101-D},
number={10},
pages={2509--2517},
abstract={Finding important regions is essential for applications such as content-aware video compression and video retargeting, which automatically crops a region of a video for display on small screens. Since people are among the main subjects when taking a video, some methods for finding important regions use a visual attention model based on face/pedestrian detection to incorporate the knowledge that people are important. However, such methods usually do not distinguish important people from passers-by and bystanders, which results in false positives. In this paper, we propose a deep neural network (DNN)-based method that classifies each person as important or unimportant, given a video captured with a hand-held camera in which multiple people appear in a single frame. Intuitively, important/unimportant labels are highly correlated when the corresponding people's spatial motions are similar. Based on this assumption, we propose to boost the performance of our important/unimportant classification by using conditional random fields (CRFs) built upon the DNN, which can be trained in an end-to-end manner. Our experimental results show that our method successfully classifies important people and that the use of a DNN with CRFs improves the accuracy.},
keywords={},
doi={10.1587/transinf.2018EDP7029},
ISSN={1745-1361},
month={October},}
TY  - JOUR
TI  - Finding Important People in a Video Using Deep Neural Networks with Conditional Random Fields
T2  - IEICE TRANSACTIONS on Information and Systems
SP  - 2509
EP  - 2517
AU  - Mayu OTANI
AU  - Atsushi NISHIDA
AU  - Yuta NAKASHIMA
AU  - Tomokazu SATO
AU  - Naokazu YOKOYA
PY  - 2018
DO  - 10.1587/transinf.2018EDP7029
JO  - IEICE TRANSACTIONS on Information and Systems
SN  - 1745-1361
VL  - E101-D
IS  - 10
JA  - IEICE TRANSACTIONS on Information and Systems
Y1  - 2018/10//
AB  - Finding important regions is essential for applications such as content-aware video compression and video retargeting, which automatically crops a region of a video for display on small screens. Since people are among the main subjects when taking a video, some methods for finding important regions use a visual attention model based on face/pedestrian detection to incorporate the knowledge that people are important. However, such methods usually do not distinguish important people from passers-by and bystanders, which results in false positives. In this paper, we propose a deep neural network (DNN)-based method that classifies each person as important or unimportant, given a video captured with a hand-held camera in which multiple people appear in a single frame. Intuitively, important/unimportant labels are highly correlated when the corresponding people's spatial motions are similar. Based on this assumption, we propose to boost the performance of our important/unimportant classification by using conditional random fields (CRFs) built upon the DNN, which can be trained in an end-to-end manner. Our experimental results show that our method successfully classifies important people and that the use of a DNN with CRFs improves the accuracy.
ER  - 