In this paper, we propose an image-to-sound mapping method. The method treats an image as a spectrogram and maps it to a sound by taking the inverse FFT of the spectrogram. To make the mapped sound intelligible as speech, amplitude spectra of a speech signal are embedded in the spectrogram. Specifically, we retain the amplitude spectra of the speech signal in the frequency bands where its power is strong and embed the image brightness in the remaining bands. Retaining the strong-power amplitude spectra preserves the speech spectral envelope and improves the speech quality of the mapped sound. The weak-power amplitude spectra of the mapped sound represent the image brightness, so the image is successfully reconstructed from the mapped sound. Simulation results show that the proposed method achieves sufficient speech quality.
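The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes non-overlapping rectangular frames, reuse of the speech phase, a fixed brightness scale `gain`, and a per-frame "strongest bins" rule controlled by `keep_ratio`. None of these details are stated in the abstract, and the function names `map_image_to_sound` and `reconstruct_image` are hypothetical.

```python
import numpy as np


def map_image_to_sound(image, speech, keep_ratio=0.1, gain=0.01):
    """Embed image brightness into the weak-power bins of a speech signal.

    image  : 2-D array (frequency bins x time frames), brightness in [0, 1]
    speech : 1-D array of speech samples
    keep_ratio, gain : illustrative parameters (assumptions, not from the paper)
    """
    n_bins, n_frames = image.shape
    frame_len = 2 * (n_bins - 1)  # an rfft of frame_len samples yields n_bins bins
    if speech.size < n_frames * frame_len:
        raise ValueError("speech signal is too short for this image")

    # Split the speech into non-overlapping frames and take the FFT of each.
    frames = speech[: n_frames * frame_len].reshape(n_frames, frame_len)
    spec = np.fft.rfft(frames, axis=1)
    amp, phase = np.abs(spec), np.angle(spec)

    # Hold the strongest bins of each frame to keep the spectral envelope.
    n_keep = max(1, int(keep_ratio * n_bins))
    threshold = np.sort(amp, axis=1)[:, -n_keep][:, None]
    held = amp >= threshold

    # Elsewhere, replace the amplitude with the scaled image brightness.
    new_amp = np.where(held, amp, gain * image.T)

    # Inverse FFT of the modified spectrogram gives the mapped sound.
    mapped = np.fft.irfft(new_amp * np.exp(1j * phase), n=frame_len, axis=1)
    return mapped.ravel(), held.T


def reconstruct_image(sound, n_bins, gain=0.01):
    """Read the brightness back from the amplitude spectra of the mapped sound."""
    frame_len = 2 * (n_bins - 1)
    frames = sound.reshape(-1, frame_len)
    amp = np.abs(np.fft.rfft(frames, axis=1))
    return np.clip(amp / gain, 0.0, 1.0).T
```

In this sketch, the pixels that fall on held bins carry speech amplitude rather than brightness, so only the remaining pixels are recovered exactly. How the original method treats those bins is not described in the abstract.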
Yuya HOSODA
Osaka University
Arata KAWAMURA
Kyoto Sangyo University
Youji IIGUNI
Osaka University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yuya HOSODA, Arata KAWAMURA, Youji IIGUNI, "An Efficient Image to Sound Mapping Method Preserving Speech Spectral Envelope" in IEICE TRANSACTIONS on Fundamentals,
vol. E103-A, no. 3, pp. 629-630, March 2020, doi: 10.1587/transfun.2019EAL2139.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.2019EAL2139/_p
@ARTICLE{e103-a_3_629,
author={Yuya HOSODA and Arata KAWAMURA and Youji IIGUNI},
journal={IEICE TRANSACTIONS on Fundamentals},
title={An Efficient Image to Sound Mapping Method Preserving Speech Spectral Envelope},
year={2020},
volume={E103-A},
number={3},
pages={629-630},
abstract={In this paper, we propose an image to sound mapping method. This technique treats an image as a spectrogram and maps it to a sound by taking inverse FFT of the spectrogram. Amplitude spectra of a speech signal are embedded to the spectrogram to give speech intelligibility for the mapped sound. Specifically, we hold amplitude spectra of a speech signal with strong power and embed the image brightness in other frequency bands. Holding amplitude spectra of a speech signal with strong power preserves a speech spectral envelope and improves the speech quality of the mapped sound. The amplitude spectra of the mapped sound with weak power represent the image brightness, and then the image is successfully reconstructed from the mapped sound. Simulation results show that the proposed method achieves sufficient speech quality.},
keywords={},
doi={10.1587/transfun.2019EAL2139},
ISSN={1745-1337},
month={March},}
TY - JOUR
TI - An Efficient Image to Sound Mapping Method Preserving Speech Spectral Envelope
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 629
EP - 630
AU - Yuya HOSODA
AU - Arata KAWAMURA
AU - Youji IIGUNI
PY - 2020
DO - 10.1587/transfun.2019EAL2139
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E103-A
IS - 3
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - March 2020
AB - In this paper, we propose an image to sound mapping method. This technique treats an image as a spectrogram and maps it to a sound by taking inverse FFT of the spectrogram. Amplitude spectra of a speech signal are embedded to the spectrogram to give speech intelligibility for the mapped sound. Specifically, we hold amplitude spectra of a speech signal with strong power and embed the image brightness in other frequency bands. Holding amplitude spectra of a speech signal with strong power preserves a speech spectral envelope and improves the speech quality of the mapped sound. The amplitude spectra of the mapped sound with weak power represent the image brightness, and then the image is successfully reconstructed from the mapped sound. Simulation results show that the proposed method achieves sufficient speech quality.
ER -