This study focuses on modeling the storytelling performance of participants in a group conversation. Storytelling is one of the fundamental communication techniques for conveying information and entertainment effectively to a listener. We present a multimodal analysis of storytelling performance in a group conversation, as evaluated by external observers. A new multimodal data corpus, including the participants' performance scores, is collected through this group storytelling task. We extract multimodal (verbal and nonverbal) features of storytellers and listeners from manual transcriptions of the spoken dialog and from various nonverbal patterns, including each participant's speaking turns, utterance prosody, head gestures, hand gestures, and head direction. We also extract multimodal co-occurrence features, such as co-occurring head gestures, and interaction features, such as a storyteller's utterance overlapping with a listener's backchannel. In the experiment, we model the relationship between the performance indices and the multimodal features using machine-learning techniques. Experimental results show that the highest accuracy (R² = 0.299) for the total storytelling performance (the sum of the index scores) is obtained with a combination of verbal and nonverbal features in a regression task.
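The abstract describes a two-step pipeline: extract verbal and nonverbal features (including interaction features such as utterance/backchannel overlap), then fit a regression from those features to the observer-assigned performance scores, evaluated by R². The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation: the overlap_duration helper, the interval values, the Ridge learner, and the synthetic data are all assumptions made for this example.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def overlap_duration(storyteller_utts, listener_backchannels):
    # Total seconds during which a storyteller utterance overlaps a
    # listener backchannel; each argument is a list of (start, end) pairs.
    total = 0.0
    for s0, s1 in storyteller_utts:
        for b0, b1 in listener_backchannels:
            total += max(0.0, min(s1, b1) - max(s0, b0))
    return total

# Toy intervals (seconds): overlaps of 0.5 s and 0.4 s.
utts = [(0.0, 4.0), (6.0, 9.5)]
backchannels = [(3.5, 4.2), (7.0, 7.4)]
print(overlap_duration(utts, backchannels))  # -> 0.9

# Regress a performance index on multimodal features and report R².
# The feature matrix and scores here are synthetic placeholders.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))               # 40 participants x 5 features
y = X @ np.array([0.8, 0.0, 0.5, -0.3, 0.1]) + rng.normal(scale=0.5, size=40)
r2 = cross_val_score(Ridge(alpha=1.0), X, y, scoring="r2", cv=5).mean()
print(f"mean cross-validated R²: {r2:.3f}")

The pairwise interval sweep is quadratic in the number of segments; for long recordings, sorting both interval lists and merging them in a single pass is the usual speed-up.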
Shogo OKADA
Tokyo Institute of Technology
Mi HANG
Tokyo Institute of Technology
Katsumi NITTA
Tokyo Institute of Technology
Shogo OKADA, Mi HANG, Katsumi NITTA, "Predicting Performance of Collaborative Storytelling Using Multimodal Analysis," in IEICE Transactions on Information and Systems,
vol. E99-D, no. 6, pp. 1462-1473, June 2016, doi: 10.1587/transinf.2015CBP0003.
Abstract: This study focuses on modeling the storytelling performance of participants in a group conversation. Storytelling is one of the fundamental communication techniques for conveying information and entertainment effectively to a listener. We present a multimodal analysis of storytelling performance in a group conversation, as evaluated by external observers. A new multimodal data corpus, including the participants' performance scores, is collected through this group storytelling task. We extract multimodal (verbal and nonverbal) features of storytellers and listeners from manual transcriptions of the spoken dialog and from various nonverbal patterns, including each participant's speaking turns, utterance prosody, head gestures, hand gestures, and head direction. We also extract multimodal co-occurrence features, such as co-occurring head gestures, and interaction features, such as a storyteller's utterance overlapping with a listener's backchannel. In the experiment, we model the relationship between the performance indices and the multimodal features using machine-learning techniques. Experimental results show that the highest accuracy (R² = 0.299) for the total storytelling performance (the sum of the index scores) is obtained with a combination of verbal and nonverbal features in a regression task.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2015CBP0003/_p
@ARTICLE{e99-d_6_1462,
author={Shogo OKADA and Mi HANG and Katsumi NITTA},
journal={IEICE Transactions on Information and Systems},
title={Predicting Performance of Collaborative Storytelling Using Multimodal Analysis},
year={2016},
volume={E99-D},
number={6},
pages={1462-1473},
abstract={This study focuses on modeling the storytelling performance of participants in a group conversation. Storytelling is one of the fundamental communication techniques for conveying information and entertainment effectively to a listener. We present a multimodal analysis of storytelling performance in a group conversation, as evaluated by external observers. A new multimodal data corpus, including the participants' performance scores, is collected through this group storytelling task. We extract multimodal (verbal and nonverbal) features of storytellers and listeners from manual transcriptions of the spoken dialog and from various nonverbal patterns, including each participant's speaking turns, utterance prosody, head gestures, hand gestures, and head direction. We also extract multimodal co-occurrence features, such as co-occurring head gestures, and interaction features, such as a storyteller's utterance overlapping with a listener's backchannel. In the experiment, we model the relationship between the performance indices and the multimodal features using machine-learning techniques. Experimental results show that the highest accuracy (R² = 0.299) for the total storytelling performance (the sum of the index scores) is obtained with a combination of verbal and nonverbal features in a regression task.},
doi={10.1587/transinf.2015CBP0003},
ISSN={1745-1361},
month={June},
}
TY - JOUR
TI - Predicting Performance of Collaborative Storytelling Using Multimodal Analysis
T2 - IEICE Transactions on Information and Systems
SP - 1462
EP - 1473
AU - OKADA, Shogo
AU - HANG, Mi
AU - NITTA, Katsumi
PY - 2016
DO - 10.1587/transinf.2015CBP0003
JO - IEICE Transactions on Information and Systems
SN - 1745-1361
VL - E99-D
IS - 6
JA - IEICE Transactions on Information and Systems
Y1 - 2016/06//
AB - This study focuses on modeling the storytelling performance of participants in a group conversation. Storytelling is one of the fundamental communication techniques for conveying information and entertainment effectively to a listener. We present a multimodal analysis of storytelling performance in a group conversation, as evaluated by external observers. A new multimodal data corpus, including the participants' performance scores, is collected through this group storytelling task. We extract multimodal (verbal and nonverbal) features of storytellers and listeners from manual transcriptions of the spoken dialog and from various nonverbal patterns, including each participant's speaking turns, utterance prosody, head gestures, hand gestures, and head direction. We also extract multimodal co-occurrence features, such as co-occurring head gestures, and interaction features, such as a storyteller's utterance overlapping with a listener's backchannel. In the experiment, we model the relationship between the performance indices and the multimodal features using machine-learning techniques. Experimental results show that the highest accuracy (R² = 0.299) for the total storytelling performance (the sum of the index scores) is obtained with a combination of verbal and nonverbal features in a regression task.
ER -