VisualTextualRank: An Extension of VisualRank to Large-Scale Video Shot Extraction Exploiting Tag Co-occurrence

Nga H. DO; Keiji YANAI

doi:10.1587/transinf.2014EDP7106

IEICE TRANSACTIONS on Information

Summary:

In this paper, we propose a novel ranking method called VisualTextualRank, which ranks media data according to their relevance to specified keywords. We apply the method to a video shot ranking system that aims to automatically obtain video shots corresponding to given action keywords from Web videos. The keywords can be any type of action, such as “surfing wave” (sport action) or “brushing teeth” (daily activity). Top-ranked video shots are expected to be relevant to the keywords. While our baseline exploits only visual features of the data, the proposed method employs both textual information (tags) and visual features. Our method is based on random walks over a bipartite graph to effectively integrate the visual information of video shots and the tag information of Web videos. Note that instead of treating the textual information as an additional feature for shot ranking, we explore the mutual reinforcement between shots and the textual information of their corresponding videos to improve shot ranking. We validated our framework on the database used by the baseline. Experiments showed that the proposed ranking method, VisualTextualRank, significantly improved the performance of the video shot extraction system over the baseline.
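
The mutual reinforcement between shots and tags described in the summary can be illustrated with a small random walk sketch. This is not the formulation from the paper, which the summary does not spell out. It is a minimal illustration under stated assumptions: `visual_sim` is a hypothetical shot-to-shot visual similarity matrix, `shot_tag` is a hypothetical binary matrix marking which tags each shot inherits from its source video, and `alpha` is a damping factor borrowed from PageRank-style walks. The function name `co_rank` and the toy data are likewise made up for illustration.

```python
import numpy as np


def column_normalize(m):
    """Scale each column to sum to 1; all-zero columns stay zero."""
    m = np.asarray(m, dtype=float)
    sums = m.sum(axis=0, keepdims=True)
    return np.divide(m, sums, out=np.zeros_like(m), where=sums > 0)


def co_rank(visual_sim, shot_tag, alpha=0.85, n_iter=100, tol=1e-9):
    """Illustrative shot/tag co-ranking (an assumption, not the paper's update rule)."""
    shot_tag = np.asarray(shot_tag, dtype=float)
    n_shots, n_tags = shot_tag.shape
    S = column_normalize(visual_sim)         # shot -> shot transitions
    to_tags = column_normalize(shot_tag.T)   # shot -> tag step of the bipartite walk
    to_shots = column_normalize(shot_tag)    # tag -> shot step of the bipartite walk
    prior = np.full(n_shots, 1.0 / n_shots)  # uniform restart distribution
    shot_scores = prior.copy()
    tag_scores = np.full(n_tags, 1.0 / n_tags)
    for _ in range(n_iter):
        # Tags collect score from the shots that carry them.
        tag_scores = to_tags @ shot_scores
        # Tags hand score back to shots, then it spreads to visually similar shots.
        propagated = S @ (to_shots @ tag_scores)
        new_scores = alpha * propagated + (1.0 - alpha) * prior
        new_scores /= new_scores.sum()
        converged = np.abs(new_scores - shot_scores).sum() < tol
        shot_scores = new_scores
        if converged:
            break
    return shot_scores, tag_scores


# Toy data: shots 0-2 look alike and share tag 0; shot 3 is a visual outlier with tag 1.
visual_sim = np.array([
    [1.00, 0.90, 0.80, 0.10],
    [0.90, 1.00, 0.85, 0.10],
    [0.80, 0.85, 1.00, 0.10],
    [0.10, 0.10, 0.10, 1.00],
])
shot_tag = np.array([[1, 0], [1, 0], [1, 0], [0, 1]])

shot_scores, tag_scores = co_rank(visual_sim, shot_tag)
ranking = np.argsort(-shot_scores)  # shot indices, best first
```

In this toy setup the outlier shot ends up ranked last, because it receives little score through either the visual links or its tag. Real use would require similarity and tag matrices built from actual video features and metadata.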

Publication: IEICE TRANSACTIONS on Information Vol.E98-D No.1 pp.166-172

Publication Date: 2015/01/01

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2014EDP7106

Type of Manuscript: PAPER

Category: Image Processing and Video Processing

Authors

Nga H. DO
University of Electro-Communications
Keiji YANAI
University of Electro-Communications

Keyword

shot ranking, tag co-occurrence, visual features, bipartite graph

Cite this


Nga H. DO, Keiji YANAI, "VisualTextualRank: An Extension of VisualRank to Large-Scale Video Shot Extraction Exploiting Tag Co-occurrence" in IEICE TRANSACTIONS on Information, vol. E98-D, no. 1, pp. 166-172, January 2015, doi: 10.1587/transinf.2014EDP7106.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2014EDP7106/_p


@ARTICLE{e98-d_1_166,
author={Nga H. DO and Keiji YANAI},
journal={IEICE TRANSACTIONS on Information},
title={VisualTextualRank: An Extension of VisualRank to Large-Scale Video Shot Extraction Exploiting Tag Co-occurrence},
year={2015},
volume={E98-D},
number={1},
pages={166--172},
abstract={In this paper, we propose a novel ranking method called VisualTextualRank which ranks media data according to the relevance between the data and specified keywords. We apply our method to the system of video shot ranking which aims to automatically obtain video shots corresponding to given action keywords from Web videos. The keywords can be any type of action such as “surfing wave” (sport action) or “brushing teeth” (daily activity). Top ranked video shots are expected to be relevant to the keywords. While our baseline exploits only visual features of the data, the proposed method employs both textual information (tags) and visual features. Our method is based on random walks over a bipartite graph to integrate visual information of video shots and tag information of Web videos effectively. Note that instead of treating the textual information as an additional feature for shot ranking, we explore the mutual reinforcement between shots and textual information of their corresponding videos to improve shot ranking. We validated our framework on a database which was used by the baseline. Experiments showed that our proposed ranking method, VisualTextualRank, improved significantly the performance of the system of video shot extraction over the baseline.},
keywords={shot ranking, tag co-occurrence, visual features, bipartite graph},
doi={10.1587/transinf.2014EDP7106},
ISSN={1745-1361},
month={January},
}


TY - JOUR
TI - VisualTextualRank: An Extension of VisualRank to Large-Scale Video Shot Extraction Exploiting Tag Co-occurrence
T2 - IEICE TRANSACTIONS on Information
SP - 166
EP - 172
AU - Nga H. DO
AU - Keiji YANAI
PY - 2015
DO - 10.1587/transinf.2014EDP7106
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E98-D
IS - 1
JA - IEICE TRANSACTIONS on Information
Y1 - January 2015
AB - In this paper, we propose a novel ranking method called VisualTextualRank which ranks media data according to the relevance between the data and specified keywords. We apply our method to the system of video shot ranking which aims to automatically obtain video shots corresponding to given action keywords from Web videos. The keywords can be any type of action such as “surfing wave” (sport action) or “brushing teeth” (daily activity). Top ranked video shots are expected to be relevant to the keywords. While our baseline exploits only visual features of the data, the proposed method employs both textual information (tags) and visual features. Our method is based on random walks over a bipartite graph to integrate visual information of video shots and tag information of Web videos effectively. Note that instead of treating the textual information as an additional feature for shot ranking, we explore the mutual reinforcement between shots and textual information of their corresponding videos to improve shot ranking. We validated our framework on a database which was used by the baseline. Experiments showed that our proposed ranking method, VisualTextualRank, improved significantly the performance of the system of video shot extraction over the baseline.
ER -
