Detection of Overlapping Speech in Meetings Using Support Vector Machines and Support Vector Regression

Kiyoshi YAMAMOTO; Futoshi ASANO; Takeshi YAMADA; Nobuhiko KITAWAKI

doi:10.1093/ietfec/e89-a.8.2158

Summary:

In this paper, a method of detecting overlapping speech segments in meetings is proposed. It is known that the eigenvalue distribution of the spatial correlation matrix calculated from a multiple microphone input reflects information on the number and relative power of sound sources. However, in a reverberant sound field, the feature of the number of sources in the eigenvalue distribution is degraded by the room reverberation. In the Support Vector Machines approach, the eigenvalue distribution is classified into two classes (overlapping speech segments and single speech segments). In the Support Vector Regression approach, the relative power of sound sources is estimated by using the eigenvalue distribution, and overlapping speech segments are detected based on the estimated relative power. The salient feature of this approach is that the sensitivity of detecting overlapping speech segments can be controlled simply by changing the threshold value of the relative power. The proposed method was evaluated using recorded data of an actual meeting.
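The idea in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes only two microphones, a single narrowband frequency bin, and an anechoic (reverberation-free) simulation, and it uses the normalized second eigenvalue as a crude stand-in for the relative power that the paper estimates with a trained Support Vector Regression model. All function names, phase values, and the threshold are hypothetical.

```python
import cmath
import math
import random


def spatial_correlation_2ch(x1, x2):
    """Entries (a, b, d) of the 2x2 Hermitian matrix R = [[a, b], [conj(b), d]],
    averaged over two equal-length sequences of complex samples."""
    n = len(x1)
    a = sum(abs(v) ** 2 for v in x1) / n
    d = sum(abs(v) ** 2 for v in x2) / n
    b = sum(u * v.conjugate() for u, v in zip(x1, x2)) / n
    return a, b, d


def normalized_eigenvalues(a, b, d):
    """Closed-form eigenvalues of a 2x2 Hermitian matrix, scaled to sum to 1."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    lam1 = mean + radius
    lam2 = max(mean - radius, 0.0)  # clamp tiny negative rounding errors
    total = lam1 + lam2
    return lam1 / total, lam2 / total


def simulate(relative_power, n=2000, seed=0):
    """Simulate one frequency bin at two microphones (hypothetical geometry).

    relative_power is the power of source 2 relative to source 1;
    0.0 means that only one talker is active.
    """
    rng = random.Random(seed)
    h1 = cmath.exp(0.5j)   # assumed inter-microphone phase for source 1
    h2 = cmath.exp(-1.2j)  # assumed inter-microphone phase for source 2
    gain = math.sqrt(relative_power)
    x1, x2 = [], []
    for _ in range(n):
        s1 = complex(rng.gauss(0, 1), rng.gauss(0, 1))
        s2 = gain * complex(rng.gauss(0, 1), rng.gauss(0, 1))
        x1.append(s1 + s2)
        x2.append(h1 * s1 + h2 * s2)
    return normalized_eigenvalues(*spatial_correlation_2ch(x1, x2))


def detect_overlap(estimated_relative_power, threshold):
    """Flag a segment as overlapping when the estimate reaches the threshold."""
    return estimated_relative_power >= threshold


if __name__ == "__main__":
    for rp in (0.0, 0.1, 1.0):
        lam = simulate(rp)
        # Stand-in estimate: the second normalized eigenvalue, not an SVR output.
        print(rp, lam, detect_overlap(lam[1], threshold=0.05))
```

With one active source the correlation matrix is close to rank one, so the second eigenvalue is near zero; a second source raises it. Raising or lowering `threshold` changes how readily a segment is flagged, which mirrors the sensitivity control described in the summary. Reverberation, which the summary says degrades this cue, is not modeled here.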

Publication: IEICE TRANSACTIONS on Fundamentals Vol.E89-A No.8 pp.2158-2165

Publication Date: 2006/08/01

Online ISSN: 1745-1337

DOI: 10.1093/ietfec/e89-a.8.2158

Type of Manuscript: PAPER

Category: Engineering Acoustics

Cite this

Kiyoshi YAMAMOTO, Futoshi ASANO, Takeshi YAMADA, Nobuhiko KITAWAKI, "Detection of Overlapping Speech in Meetings Using Support Vector Machines and Support Vector Regression" in IEICE TRANSACTIONS on Fundamentals, vol. E89-A, no. 8, pp. 2158-2165, August 2006, doi: 10.1093/ietfec/e89-a.8.2158.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1093/ietfec/e89-a.8.2158/_p

@ARTICLE{e89-a_8_2158,
author={Kiyoshi YAMAMOTO and Futoshi ASANO and Takeshi YAMADA and Nobuhiko KITAWAKI},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Detection of Overlapping Speech in Meetings Using Support Vector Machines and Support Vector Regression},
year={2006},
volume={E89-A},
number={8},
pages={2158--2165},
abstract={In this paper, a method of detecting overlapping speech segments in meetings is proposed. It is known that the eigenvalue distribution of the spatial correlation matrix calculated from a multiple microphone input reflects information on the number and relative power of sound sources. However, in a reverberant sound field, the feature of the number of sources in the eigenvalue distribution is degraded by the room reverberation. In the Support Vector Machines approach, the eigenvalue distribution is classified into two classes (overlapping speech segments and single speech segments). In the Support Vector Regression approach, the relative power of sound sources is estimated by using the eigenvalue distribution, and overlapping speech segments are detected based on the estimated relative power. The salient feature of this approach is that the sensitivity of detecting overlapping speech segments can be controlled simply by changing the threshold value of the relative power. The proposed method was evaluated using recorded data of an actual meeting.},
keywords={},
doi={10.1093/ietfec/e89-a.8.2158},
ISSN={1745-1337},
month={August},}

TY  - JOUR
TI  - Detection of Overlapping Speech in Meetings Using Support Vector Machines and Support Vector Regression
T2  - IEICE TRANSACTIONS on Fundamentals
SP  - 2158
EP  - 2165
AU  - Kiyoshi YAMAMOTO
AU  - Futoshi ASANO
AU  - Takeshi YAMADA
AU  - Nobuhiko KITAWAKI
PY  - 2006
DO  - 10.1093/ietfec/e89-a.8.2158
JO  - IEICE TRANSACTIONS on Fundamentals
SN  - 1745-1337
VL  - E89-A
IS  - 8
JA  - IEICE TRANSACTIONS on Fundamentals
Y1  - August 2006
AB  - In this paper, a method of detecting overlapping speech segments in meetings is proposed. It is known that the eigenvalue distribution of the spatial correlation matrix calculated from a multiple microphone input reflects information on the number and relative power of sound sources. However, in a reverberant sound field, the feature of the number of sources in the eigenvalue distribution is degraded by the room reverberation. In the Support Vector Machines approach, the eigenvalue distribution is classified into two classes (overlapping speech segments and single speech segments). In the Support Vector Regression approach, the relative power of sound sources is estimated by using the eigenvalue distribution, and overlapping speech segments are detected based on the estimated relative power. The salient feature of this approach is that the sensitivity of detecting overlapping speech segments can be controlled simply by changing the threshold value of the relative power. The proposed method was evaluated using recorded data of an actual meeting.
ER  -