Within the framework of conventional power-spectrum-based feature extraction, this paper proposes a feature for playback attack detection, constant-Q deep coefficients (CQDC), which uses a deep neural network (DNN) to model the nonlinear relationship between the power spectrum and discriminative information. CQDC relies on three components: the constant-Q transform (CQT), a DNN, and the discrete cosine transform (DCT). The CQT converts the signal from the time domain to the frequency domain; because it is a long-term transform, it provides finer frequency detail. The DNN extracts additional discriminative information for distinguishing playback speech from genuine speech. The DCT decorrelates the feature dimensions. The performance of CQDC is evaluated on the ASVspoof 2017 corpus, version 2.0. Experimental results show that CQDC outperforms existing features based on the CQT power spectrum, reducing the equal error rate (EER) by 19.18% to 51.56%. In addition, we found that the discriminative information in CQDC is distributed across all frequency bins, unlike that of commonly used features.
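The three-stage pipeline described above (CQT, then DNN, then DCT) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the naive Brown-style CQT and its parameters (`fmin=32.7`, 12 bins per octave, 84 bins, hop of 256 samples), the use of a log power spectrum as DNN input, taking the last ReLU hidden layer as the "deep" representation, the random placeholder weights, and keeping 30 coefficients are all hypothetical choices. In practice the weights would come from a network trained to separate genuine from playback speech.

```python
import numpy as np


def cqt_power(x, sr, fmin=32.7, bins_per_octave=12, n_bins=84, hop=256):
    """Naive (Brown-style) constant-Q power spectrogram, shape (frames, bins)."""
    Q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)  # constant ratio f_k / bandwidth
    freqs = fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    if freqs[-1] >= sr / 2:
        raise ValueError("highest CQT bin must lie below the Nyquist frequency")
    lengths = np.ceil(Q * sr / freqs).astype(int)  # low bins get long windows
    max_len = lengths.max()
    if len(x) < max_len:
        raise ValueError(f"signal needs at least {max_len} samples")
    n_frames = 1 + (len(x) - max_len) // hop
    power = np.empty((n_frames, n_bins))
    for k, n in enumerate(lengths):
        # Hann-windowed complex exponential at f_k (Q cycles per window).
        kernel = np.hanning(n) * np.exp(-2j * np.pi * Q * np.arange(n) / n) / n
        offset = (max_len - n) // 2  # centre short kernels inside the longest window
        segs = np.lib.stride_tricks.sliding_window_view(x, n)[offset::hop][:n_frames]
        power[:, k] = np.abs(segs @ kernel) ** 2
    return power


def dct2_ortho(v):
    """Orthonormal DCT-II along the last axis (same as scipy.fft.dct(v, norm='ortho'))."""
    n = v.shape[-1]
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return v @ basis.T


def dnn_hidden(features, weights):
    """Forward pass through fully connected ReLU layers; returns the last hidden layer."""
    h = features
    for W, b in weights:
        h = np.maximum(h @ W + b, 0.0)
    return h


def cqdc(x, sr, weights, n_coeffs=30):
    """Hypothetical CQDC sketch: log CQT power -> DNN hidden layer -> DCT -> truncate."""
    log_power = np.log(cqt_power(x, sr) + 1e-10)
    deep = dnn_hidden(log_power, weights)
    return dct2_ortho(deep)[:, :n_coeffs]


rng = np.random.default_rng(0)
sr = 16000
x = rng.standard_normal(sr)  # 1 s of noise standing in for a speech signal
# Placeholder weights (84 CQT bins -> 256 -> 64); a trained DNN would supply these.
weights = [(0.1 * rng.standard_normal((84, 256)), np.zeros(256)),
           (0.1 * rng.standard_normal((256, 64)), np.zeros(64))]
feats = cqdc(x, sr, weights)
print(feats.shape)
```

The DCT is written out directly so the sketch needs only NumPy; `scipy.fft.dct(v, type=2, norm='ortho')` computes the same transform. With random weights the output only demonstrates shapes and data flow; the discriminative power the abstract reports would depend on training the DNN on genuine and playback speech.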

- Publication: IEICE TRANSACTIONS on Information, Vol. E103-D, No. 2, pp. 464-468
- Publication Date: 2020/02/01
- Publicized: 2019/11/14
- Online ISSN: 1745-1361
- DOI: 10.1587/transinf.2019EDL8115
- Type of Manuscript: LETTER
- Category: Speech and Hearing

Jichen YANG

National University of Singapore

Longting XU

Donghua University

Bo REN

Microsoft Search Technology Center Asia

The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.

Jichen YANG, Longting XU, and Bo REN, "Constant-Q Deep Coefficients for Playback Attack Detection," IEICE TRANSACTIONS on Information, vol. E103-D, no. 2, pp. 464-468, February 2020, doi: 10.1587/transinf.2019EDL8115.

URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDL8115/_p

@ARTICLE{e103-d_2_464,

author={YANG, Jichen and XU, Longting and REN, Bo},

journal={IEICE TRANSACTIONS on Information},

title={Constant-{Q} Deep Coefficients for Playback Attack Detection},

year={2020},

volume={E103-D},

number={2},

pages={464--468},

abstract={Within the framework of conventional power-spectrum-based feature extraction, this paper proposes a feature for playback attack detection, constant-Q deep coefficients (CQDC), which uses a deep neural network (DNN) to model the nonlinear relationship between the power spectrum and discriminative information. CQDC relies on three components: the constant-Q transform (CQT), a DNN, and the discrete cosine transform (DCT). The CQT converts the signal from the time domain to the frequency domain; because it is a long-term transform, it provides finer frequency detail. The DNN extracts additional discriminative information for distinguishing playback speech from genuine speech. The DCT decorrelates the feature dimensions. The performance of CQDC is evaluated on the ASVspoof 2017 corpus, version 2.0. Experimental results show that CQDC outperforms existing features based on the CQT power spectrum, reducing the equal error rate (EER) by 19.18\% to 51.56\%. In addition, we found that the discriminative information in CQDC is distributed across all frequency bins, unlike that of commonly used features.},

keywords={},

doi={10.1587/transinf.2019EDL8115},

ISSN={1745-1361},

month={February},}

TY  - JOUR
TI  - Constant-Q Deep Coefficients for Playback Attack Detection
T2  - IEICE TRANSACTIONS on Information
SP  - 464
EP  - 468
AU  - YANG, Jichen
AU  - XU, Longting
AU  - REN, Bo
PY  - 2020
DO  - 10.1587/transinf.2019EDL8115
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E103-D
IS  - 2
JA  - IEICE TRANSACTIONS on Information
Y1  - 2020/02/01/
AB  - Within the framework of conventional power-spectrum-based feature extraction, this paper proposes a feature for playback attack detection, constant-Q deep coefficients (CQDC), which uses a deep neural network (DNN) to model the nonlinear relationship between the power spectrum and discriminative information. CQDC relies on three components: the constant-Q transform (CQT), a DNN, and the discrete cosine transform (DCT). The CQT converts the signal from the time domain to the frequency domain; because it is a long-term transform, it provides finer frequency detail. The DNN extracts additional discriminative information for distinguishing playback speech from genuine speech. The DCT decorrelates the feature dimensions. The performance of CQDC is evaluated on the ASVspoof 2017 corpus, version 2.0. Experimental results show that CQDC outperforms existing features based on the CQT power spectrum, reducing the equal error rate (EER) by 19.18% to 51.56%. In addition, we found that the discriminative information in CQDC is distributed across all frequency bins, unlike that of commonly used features.
ER  - 