This letter presents a novel approach to automatic multimodal affect recognition. The audio and visual channels provide complementary information for recognizing human affective states, and we use Boltzmann zippers for model-level fusion to learn the intrinsic correlations between the two modalities. We extract effective audio and visual feature streams at different time scales and feed them into two component Boltzmann chains. The hidden units of the two chains are interconnected to form a Boltzmann zipper, which effectively avoids local energy minima during training. Second-order methods are applied to the Boltzmann zipper to speed up the learning and pruning processes. Experimental results, both on audio-visual emotion data that we recorded in Wizard of Oz scenarios and on data collected from the SEMAINE naturalistic database, demonstrate that our approach is robust and outperforms state-of-the-art methods.
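To make the fusion idea concrete, below is a minimal NumPy sketch of a Boltzmann zipper's structure: two linear chains of binary hidden units (a fast, audio-like chain and a slow, video-like chain) whose hidden units are cross-linked, sampled by Gibbs sweeps over a joint energy function. All names (build_zipper, n_fast, couple_every, etc.) and the random weights are illustrative assumptions; the paper's actual features, second-order training, and pruning procedure are not reproduced here.

import numpy as np

rng = np.random.default_rng(0)

def energy(s, W):
    # Energy of a binary (+/-1) Boltzmann machine: E(s) = -1/2 * s^T W s.
    return -0.5 * s @ W @ s

def build_zipper(n_fast=8, n_slow=4, couple_every=2, scale=0.1):
    # Illustrative random symmetric weights: a fast chain (fine time scale),
    # a slow chain (coarse time scale), and "zipper" links connecting every
    # couple_every-th fast hidden unit to one slow hidden unit.
    n = n_fast + n_slow
    W = np.zeros((n, n))
    for i in range(n_fast - 1):                      # fast-chain links
        W[i, i + 1] = W[i + 1, i] = rng.normal(scale=scale)
    for j in range(n_slow - 1):                      # slow-chain links
        a, b = n_fast + j, n_fast + j + 1
        W[a, b] = W[b, a] = rng.normal(scale=scale)
    for j in range(n_slow):                          # cross-modal zipper links
        i = min(j * couple_every, n_fast - 1)
        a = n_fast + j
        W[i, a] = W[a, i] = rng.normal(scale=scale)
    return W

def gibbs_sweep(s, W, T=1.0):
    # One Gibbs-sampling sweep: for +/-1 units, p(s_i = +1) = sigma(2 h_i / T),
    # where h_i is the local field contributed by all connected units.
    for i in range(len(s)):
        h = W[i] @ s          # diagonal of W is zero, so s[i] drops out
        p = 1.0 / (1.0 + np.exp(-2.0 * h / T))
        s[i] = 1.0 if rng.random() < p else -1.0
    return s

W = build_zipper()
s = rng.choice([-1.0, 1.0], size=W.shape[0])
for _ in range(200):
    s = gibbs_sweep(s, W)
print("final energy:", energy(s, W))

In the paper's setting, each chain would additionally carry visible units clamped to the audio or visual feature stream; coupling only the hidden layers, as the abstract describes, is what lets the two streams run at different frame rates while still sharing one energy function.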
Kun LU and Xin ZHANG, "Multimodal Affect Recognition Using Boltzmann Zippers," in IEICE Transactions on Information and Systems,
vol. E96-D, no. 11, pp. 2496-2499, November 2013, doi: 10.1587/transinf.E96.D.2496.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E96.D.2496/_p
@ARTICLE{e96-d_11_2496,
author={Kun LU and Xin ZHANG},
journal={IEICE Transactions on Information and Systems},
title={Multimodal Affect Recognition Using Boltzmann Zippers},
year={2013},
volume={E96-D},
number={11},
pages={2496-2499},
doi={10.1587/transinf.E96.D.2496},
ISSN={1745-1361},
month={November},
}
TY - JOUR
TI - Multimodal Affect Recognition Using Boltzmann Zippers
T2 - IEICE Transactions on Information and Systems
SP - 2496
EP - 2499
AU - Kun LU
AU - Xin ZHANG
PY - 2013
DO - 10.1587/transinf.E96.D.2496
JO - IEICE Transactions on Information and Systems
SN - 1745-1361
VL - E96-D
IS - 11
JA - IEICE Transactions on Information and Systems
Y1 - November 2013
ER -