In this paper, we propose integrating multimodal features with conditional random fields (CRFs) for the segmentation of broadcast news stories. We study story boundary cues from the lexical, audio and video modalities: lexical features consist of lexical similarity, chain strength and overall cohesiveness; acoustic features comprise pause duration, pitch, speaker change and audio event type; and visual features comprise shot boundaries, anchor faces and news title captions. These features are extracted at a sequence of boundary candidate positions in the broadcast news. A linear-chain CRF labels each candidate as boundary or non-boundary based on the multimodal features. The sequential learning framework of CRFs effectively captures important inter-label relations and contextual feature information. Story segmentation experiments show that the CRF approach outperforms other popular classifiers, including decision trees (DTs), Bayesian networks (BNs), naive Bayesian classifiers (NBs), multilayer perceptrons (MLPs), support vector machines (SVMs) and maximum entropy (ME) classifiers.
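The labeling step described in the abstract can be illustrated with a minimal sketch of Viterbi decoding over a sequence of boundary candidates. Everything specific here is an assumption made for illustration: the feature names, the weight values, and the function names `emission_score` and `viterbi` are hypothetical and not taken from the paper. In a real CRF the weights are learned from labeled data, and this sketch covers decoding only.

```python
# Hypothetical sketch: Viterbi decoding for a two-label linear-chain model.
# Feature names and weights are invented for illustration; in a trained CRF
# they would be learned from annotated broadcast news data.

LABELS = ("non-boundary", "boundary")

# Assumed per-label weights for a few multimodal cues.
EMISSION_WEIGHTS = {
    "boundary": {
        "long_pause": 1.5,
        "speaker_change": 1.0,
        "shot_boundary": 0.8,
        "anchor_face": 0.7,
    },
    "non-boundary": {
        "high_lexical_similarity": 1.2,
    },
}

# Assumed transition weights: two adjacent boundaries are penalised.
TRANSITION_WEIGHTS = {
    ("non-boundary", "non-boundary"): 0.5,
    ("non-boundary", "boundary"): 0.0,
    ("boundary", "non-boundary"): 0.0,
    ("boundary", "boundary"): -2.0,
}


def emission_score(label, features):
    """Sum the weights of the active features for one label."""
    weights = EMISSION_WEIGHTS[label]
    return sum(weights.get(name, 0.0) for name in features)


def viterbi(candidates):
    """Return the highest-scoring label sequence for the candidates.

    `candidates` is a list of sets of active feature names, one set per
    boundary candidate position, in temporal order.
    """
    if not candidates:
        return []

    # Best score of any path ending in each label at the first position.
    scores = [{label: emission_score(label, candidates[0]) for label in LABELS}]
    backpointers = []

    for features in candidates[1:]:
        previous = scores[-1]
        current, pointers = {}, {}
        for label in LABELS:
            best_prev = max(
                LABELS,
                key=lambda p: previous[p] + TRANSITION_WEIGHTS[(p, label)],
            )
            current[label] = (
                previous[best_prev]
                + TRANSITION_WEIGHTS[(best_prev, label)]
                + emission_score(label, features)
            )
            pointers[label] = best_prev
        scores.append(current)
        backpointers.append(pointers)

    # Trace back from the best final label.
    path = [max(LABELS, key=lambda label: scores[-1][label])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


candidates = [
    {"high_lexical_similarity"},
    {"long_pause", "speaker_change", "shot_boundary"},
    {"long_pause"},
    {"high_lexical_similarity"},
]
print(viterbi(candidates))
```

With these assumed weights, the third candidate would score higher as a boundary if it were classified on its own. The transition penalty for adjacent boundaries makes the sequence model label it non-boundary instead, which is one way inter-label relations can influence the result.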
Xiaoxuan WANG, Lei XIE, Mimi LU, Bin MA, Eng Siong CHNG, Haizhou LI, "Broadcast News Story Segmentation Using Conditional Random Fields and Multimodal Features" in IEICE TRANSACTIONS on Information and Systems,
vol. E95-D, no. 5, pp. 1206-1215, May 2012, doi: 10.1587/transinf.E95.D.1206.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E95.D.1206/_p
@ARTICLE{e95-d_5_1206,
author={Xiaoxuan WANG and Lei XIE and Mimi LU and Bin MA and Eng Siong CHNG and Haizhou LI},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Broadcast News Story Segmentation Using Conditional Random Fields and Multimodal Features},
year={2012},
volume={E95-D},
number={5},
pages={1206--1215},
abstract={In this paper, we propose integrating multimodal features with conditional random fields (CRFs) for the segmentation of broadcast news stories. We study story boundary cues from the lexical, audio and video modalities: lexical features consist of lexical similarity, chain strength and overall cohesiveness; acoustic features comprise pause duration, pitch, speaker change and audio event type; and visual features comprise shot boundaries, anchor faces and news title captions. These features are extracted at a sequence of boundary candidate positions in the broadcast news. A linear-chain CRF labels each candidate as boundary or non-boundary based on the multimodal features. The sequential learning framework of CRFs effectively captures important inter-label relations and contextual feature information. Story segmentation experiments show that the CRF approach outperforms other popular classifiers, including decision trees (DTs), Bayesian networks (BNs), naive Bayesian classifiers (NBs), multilayer perceptrons (MLPs), support vector machines (SVMs) and maximum entropy (ME) classifiers.},
keywords={},
doi={10.1587/transinf.E95.D.1206},
ISSN={1745-1361},
month={May},}
TY  - JOUR
TI  - Broadcast News Story Segmentation Using Conditional Random Fields and Multimodal Features
T2  - IEICE TRANSACTIONS on Information and Systems
SP  - 1206
EP  - 1215
AU  - Xiaoxuan WANG
AU  - Lei XIE
AU  - Mimi LU
AU  - Bin MA
AU  - Eng Siong CHNG
AU  - Haizhou LI
PY  - 2012
DO  - 10.1587/transinf.E95.D.1206
JO  - IEICE TRANSACTIONS on Information and Systems
SN  - 1745-1361
VL  - E95-D
IS  - 5
JA  - IEICE TRANSACTIONS on Information and Systems
Y1  - May 2012
AB  - In this paper, we propose integrating multimodal features with conditional random fields (CRFs) for the segmentation of broadcast news stories. We study story boundary cues from the lexical, audio and video modalities: lexical features consist of lexical similarity, chain strength and overall cohesiveness; acoustic features comprise pause duration, pitch, speaker change and audio event type; and visual features comprise shot boundaries, anchor faces and news title captions. These features are extracted at a sequence of boundary candidate positions in the broadcast news. A linear-chain CRF labels each candidate as boundary or non-boundary based on the multimodal features. The sequential learning framework of CRFs effectively captures important inter-label relations and contextual feature information. Story segmentation experiments show that the CRF approach outperforms other popular classifiers, including decision trees (DTs), Bayesian networks (BNs), naive Bayesian classifiers (NBs), multilayer perceptrons (MLPs), support vector machines (SVMs) and maximum entropy (ME) classifiers.
ER  - 