Deep Learning Approaches for Pathological Voice Detection Using Heterogeneous Parameters

JiYeoun LEE; Hee-Jin CHOI

doi:10.1587/transinf.2020EDL8031

IEICE TRANSACTIONS on Information

Summary:

We propose a deep learning-based model for classifying pathological voices using a convolutional neural network and a feedforward neural network. The model uses combinations of heterogeneous parameters, including mel-frequency cepstral coefficients, linear predictive cepstral coefficients and higher-order statistics. We validate the accuracy of this model using the Massachusetts Eye and Ear Infirmary (MEEI) voice disorder database and the Saarbruecken Voice Database (SVD). Our model achieved an accuracy of 99.3% for MEEI and 75.18% for SVD. This model achieved an accuracy that is 7.18% higher than that of competitive models in previous studies.
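As a rough illustration of what "combinations of heterogeneous parameters" could look like in practice, the sketch below computes two per-frame higher-order statistics and concatenates them with a cepstral feature block. It rests on assumptions not stated in the summary: that the higher-order statistics are skewness and kurtosis, that features are combined by simple concatenation, and that 25 ms frames with a 10 ms hop at 16 kHz are used. The function names (`frame_signal`, `higher_order_stats`, `combine_features`) are hypothetical, and the cepstral block is random placeholder data rather than real MFCC or LPCC values.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def higher_order_stats(frames):
    """Per-frame skewness and kurtosis (3rd and 4th standardized moments).

    Assumption: these two statistics stand in for the paper's
    "higher-order statistics"; the summary does not define them.
    """
    mu = frames.mean(axis=1, keepdims=True)
    sigma = frames.std(axis=1, keepdims=True)
    z = (frames - mu) / sigma
    skewness = (z ** 3).mean(axis=1)
    kurtosis = (z ** 4).mean(axis=1)
    return np.stack([skewness, kurtosis], axis=1)


def combine_features(*blocks):
    """Concatenate per-frame feature blocks along the feature axis."""
    return np.concatenate(blocks, axis=1)


rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)                 # 1 s of noise at an assumed 16 kHz
frames = frame_signal(signal, frame_len=400, hop=160)
hos = higher_order_stats(frames)

# Placeholder for 13 cepstral coefficients per frame (not real MFCC/LPCC).
cepstral = rng.standard_normal((frames.shape[0], 13))

features = combine_features(cepstral, hos)
print(features.shape)
```

Each row of `features` would then be one input vector for a classifier; how the paper's networks actually consume the parameters is not specified in this record.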

Publication: IEICE TRANSACTIONS on Information Vol.E103-D No.8 pp.1920-1923

Publication Date: 2020/08/01

Publicized: 2020/05/14

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2020EDL8031

Type of Manuscript: LETTER

Category: Speech and Hearing

Authors

JiYeoun LEE
Jungwon University
Hee-Jin CHOI
KAIST

Keyword

pathological voice detection, feedforward neural network, convolutional neural network, higher-order statistics, deep learning method

Cite this

JiYeoun LEE, Hee-Jin CHOI, "Deep Learning Approaches for Pathological Voice Detection Using Heterogeneous Parameters" in IEICE TRANSACTIONS on Information, vol. E103-D, no. 8, pp. 1920-1923, August 2020, doi: 10.1587/transinf.2020EDL8031.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020EDL8031/_p

@ARTICLE{e103-d_8_1920,
author={JiYeoun LEE and Hee-Jin CHOI},
journal={IEICE TRANSACTIONS on Information},
title={Deep Learning Approaches for Pathological Voice Detection Using Heterogeneous Parameters},
year={2020},
volume={E103-D},
number={8},
pages={1920--1923},
abstract={We propose a deep learning-based model for classifying pathological voices using a convolutional neural network and a feedforward neural network. The model uses combinations of heterogeneous parameters, including mel-frequency cepstral coefficients, linear predictive cepstral coefficients and higher-order statistics. We validate the accuracy of this model using the Massachusetts Eye and Ear Infirmary (MEEI) voice disorder database and the Saarbruecken Voice Database (SVD). Our model achieved an accuracy of 99.3% for MEEI and 75.18% for SVD. This model achieved an accuracy that is 7.18% higher than that of competitive models in previous studies.},
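keywords={pathological voice detection, feedforward neural network, convolutional neural network, higher-order statistics, deep learning method},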
doi={10.1587/transinf.2020EDL8031},
ISSN={1745-1361},
month={August},}

TY  - JOUR
TI  - Deep Learning Approaches for Pathological Voice Detection Using Heterogeneous Parameters
T2  - IEICE TRANSACTIONS on Information
SP  - 1920
EP  - 1923
AU  - JiYeoun LEE
AU  - Hee-Jin CHOI
PY  - 2020
DO  - 10.1587/transinf.2020EDL8031
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E103-D
IS  - 8
JA  - IEICE TRANSACTIONS on Information
Y1  - 2020/08/01
AB  - We propose a deep learning-based model for classifying pathological voices using a convolutional neural network and a feedforward neural network. The model uses combinations of heterogeneous parameters, including mel-frequency cepstral coefficients, linear predictive cepstral coefficients and higher-order statistics. We validate the accuracy of this model using the Massachusetts Eye and Ear Infirmary (MEEI) voice disorder database and the Saarbruecken Voice Database (SVD). Our model achieved an accuracy of 99.3% for MEEI and 75.18% for SVD. This model achieved an accuracy that is 7.18% higher than that of competitive models in previous studies.
ER  - 
