Real-Time Hardware Implementation of a Sound Recognition System with In-Field Learning

Mauricio KUGLER; Teemu TOSSAVAINEN; Miku NAKATSU; Susumu KUROYANAGI; Akira IWATA

doi:10.1587/transinf.2015EDP7432

IEICE TRANSACTIONS on Information

Summary:

The development of assistive devices for automated sound recognition is an important field of research and has been receiving increased attention. However, there are still very few methods specifically developed for identifying environmental sounds. The majority of the existing approaches try to adapt speech recognition techniques for the task, usually incurring high computational complexity. This paper proposes a sound recognition method dedicated to environmental sounds, designed with its main focus on embedded applications. The pre-processing stage is loosely based on the human hearing system, while a robust set of binary features permits a simple k-NN classifier to be used. This gives the system the capability of in-field learning, by which new sounds can be simply added to the reference set in real time, greatly improving its usability. The system was implemented in an FPGA-based platform, developed in-house specifically for this application. The design of the proposed method took into consideration several restrictions imposed by the hardware, such as limited computing power and memory, and supports up to 12 reference sounds of around 5.3 s each. Experiments were performed on a database of 29 sounds. Sensitivity and specificity were evaluated over several random subsets of these signals. The obtained values for sensitivity and specificity, without additional noise, were, respectively, 0.957 and 0.918. With the addition of +6 dB of pink noise, sensitivity and specificity were 0.822 and 0.942, respectively. The in-field learning strategy presented no significant change in sensitivity and a total decrease of 5.4% in specificity when progressively increasing the number of reference sounds from 1 to 9 under noisy conditions. The minimal signal-to-noise ratio required by the prototype to correctly recognize sounds was between -8 dB and 3 dB. These results show that the proposed method and implementation have great potential for several real-life applications.
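
The idea of pairing binary features with a k-NN classifier, so that learning amounts to appending a reference, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `BinaryKnn`, the use of Hamming distance, the rejection threshold, and the packing of features into Python integers are all assumptions made for illustration. Only the capacity of 12 reference sounds comes from the summary above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BinaryKnn:
    """Hypothetical 1-NN classifier over binary feature vectors packed into ints."""

    max_refs: int = 12  # capacity quoted in the summary; everything else is assumed
    refs: List[Tuple[str, int]] = field(default_factory=list)  # (label, bits) pairs

    def learn(self, label: str, bits: int) -> bool:
        """In-field learning: store a new reference, with no retraining step."""
        if len(self.refs) >= self.max_refs:
            return False  # reference memory is full
        self.refs.append((label, bits))
        return True

    def classify(self, bits: int, threshold: int) -> Optional[str]:
        """Return the nearest label, or None if no reference is within threshold."""
        best_label, best_dist = None, threshold + 1
        for label, ref in self.refs:
            dist = bin(ref ^ bits).count("1")  # Hamming distance (assumed metric)
            if dist < best_dist:
                best_label, best_dist = label, dist
        return best_label
```

With such a structure, adding a sound in the field is a single `learn` call, which is why a memory-based classifier suits limited hardware; the rejection threshold is one plausible way to trade sensitivity against specificity.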

Publication: IEICE TRANSACTIONS on Information Vol.E99-D No.7 pp.1885-1894

Publication Date: 2016/07/01

Publicized: 2016/03/30

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2015EDP7432

Type of Manuscript: PAPER

Category: Speech and Hearing

Authors

Mauricio KUGLER
  Nagoya Institute of Technology
Teemu TOSSAVAINEN
  Aalto University
Miku NAKATSU
  Nagoya Institute of Technology
Susumu KUROYANAGI
  Nagoya Institute of Technology
Akira IWATA
  Nagoya Institute of Technology

Keywords

environmental sound recognition, binary features, field-programmable gate arrays, in-field learning

Cite this

Mauricio KUGLER, Teemu TOSSAVAINEN, Miku NAKATSU, Susumu KUROYANAGI, Akira IWATA, "Real-Time Hardware Implementation of a Sound Recognition System with In-Field Learning" in IEICE TRANSACTIONS on Information, vol. E99-D, no. 7, pp. 1885-1894, July 2016, doi: 10.1587/transinf.2015EDP7432.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2015EDP7432/_p

@ARTICLE{e99-d_7_1885,
author={Mauricio KUGLER and Teemu TOSSAVAINEN and Miku NAKATSU and Susumu KUROYANAGI and Akira IWATA},
journal={IEICE TRANSACTIONS on Information},
title={Real-Time Hardware Implementation of a Sound Recognition System with In-Field Learning},
year={2016},
volume={E99-D},
number={7},
pages={1885-1894},
abstract={The development of assistive devices for automated sound recognition is an important field of research and has been receiving increased attention. However, there are still very few methods specifically developed for identifying environmental sounds. The majority of the existing approaches try to adapt speech recognition techniques for the task, usually incurring high computational complexity. This paper proposes a sound recognition method dedicated to environmental sounds, designed with its main focus on embedded applications. The pre-processing stage is loosely based on the human hearing system, while a robust set of binary features permits a simple k-NN classifier to be used. This gives the system the capability of in-field learning, by which new sounds can be simply added to the reference set in real time, greatly improving its usability. The system was implemented in an FPGA-based platform, developed in-house specifically for this application. The design of the proposed method took into consideration several restrictions imposed by the hardware, such as limited computing power and memory, and supports up to 12 reference sounds of around 5.3 s each. Experiments were performed on a database of 29 sounds. Sensitivity and specificity were evaluated over several random subsets of these signals. The obtained values for sensitivity and specificity, without additional noise, were, respectively, 0.957 and 0.918. With the addition of +6 dB of pink noise, sensitivity and specificity were 0.822 and 0.942, respectively. The in-field learning strategy presented no significant change in sensitivity and a total decrease of 5.4% in specificity when progressively increasing the number of reference sounds from 1 to 9 under noisy conditions. The minimal signal-to-noise ratio required by the prototype to correctly recognize sounds was between -8 dB and 3 dB. These results show that the proposed method and implementation have great potential for several real-life applications.},
keywords={environmental sound recognition, binary features, field-programmable gate arrays, in-field learning},
doi={10.1587/transinf.2015EDP7432},
ISSN={1745-1361},
month={July},}

TY - JOUR
TI - Real-Time Hardware Implementation of a Sound Recognition System with In-Field Learning
T2 - IEICE TRANSACTIONS on Information
SP - 1885
EP - 1894
AU - Mauricio KUGLER
AU - Teemu TOSSAVAINEN
AU - Miku NAKATSU
AU - Susumu KUROYANAGI
AU - Akira IWATA
PY - 2016
DO - 10.1587/transinf.2015EDP7432
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E99-D
IS - 7
JA - IEICE TRANSACTIONS on Information
Y1 - July 2016
AB - The development of assistive devices for automated sound recognition is an important field of research and has been receiving increased attention. However, there are still very few methods specifically developed for identifying environmental sounds. The majority of the existing approaches try to adapt speech recognition techniques for the task, usually incurring high computational complexity. This paper proposes a sound recognition method dedicated to environmental sounds, designed with its main focus on embedded applications. The pre-processing stage is loosely based on the human hearing system, while a robust set of binary features permits a simple k-NN classifier to be used. This gives the system the capability of in-field learning, by which new sounds can be simply added to the reference set in real time, greatly improving its usability. The system was implemented in an FPGA-based platform, developed in-house specifically for this application. The design of the proposed method took into consideration several restrictions imposed by the hardware, such as limited computing power and memory, and supports up to 12 reference sounds of around 5.3 s each. Experiments were performed on a database of 29 sounds. Sensitivity and specificity were evaluated over several random subsets of these signals. The obtained values for sensitivity and specificity, without additional noise, were, respectively, 0.957 and 0.918. With the addition of +6 dB of pink noise, sensitivity and specificity were 0.822 and 0.942, respectively. The in-field learning strategy presented no significant change in sensitivity and a total decrease of 5.4% in specificity when progressively increasing the number of reference sounds from 1 to 9 under noisy conditions. The minimal signal-to-noise ratio required by the prototype to correctly recognize sounds was between -8 dB and 3 dB. These results show that the proposed method and implementation have great potential for several real-life applications.
ER -
