1-3hit |
Hiroaki KURABAYASHI Makoto OTANI Kazunori ITOH Masami HASHIMOTO Mizue KAYAMA
Binaural reproduction is one of the promising approaches to present a highly realistic virtual auditory space to a listener. Generally, binaural signals are reproduced using a set of headphones that leads to a simple implementation of such a system. In contrast, binaural signals can be presented to a listener using a technique called “transaural reproduction” which employs a few loudspeakers with crosstalk cancellation for compensating acoustic transmissions from the loudspeakers to both ears of the listener. The major advantage of transaural reproduction is that a listener is able to experience binaural reproduction without wearing any device. This leads to a more natural listening environment. However, in transaural reproduction, the listener is required to be still within a very narrow sweet spot because the crosstalk canceller is very sensitive to the listener's head position and orientation. To solve this problem, dynamic transaural systems have been developed by utilizing contact type head tracking. This paper introduces the development of a dynamic transaural system with non-contact head tracking which releases the listener from any attachment, thereby preserving the advantage of transaural reproduction. Experimental results revealed that sound images presented in the horizontal and median planes were localized more accurately when the system tracked the listener's head rotation than when the listeners did not rotate their heads or when the system did not track the listener's head rotation. These results demonstrate that the system works effectively and correctly with the listener's head rotation.
Auditory artifacts due to switching head-related transfer functions (HRTFs) are investigated, using a software-implemented dynamic virtual auditory display (DVAD) developed by the authors. The DVAD responds to a listener's head rotation using a head-tracking device and switching HRTFs to present a highly realistic 3D virtual auditory space to the listener. The DVAD operates on Windows XP and does not require high-performance computers. A total system latency (TSL), which is the delay between head motion and the corresponding change of the ear input signal, is a significant factor of DVADs. The measured TSL of our DVAD is about 50 ms, which is sufficient for practical applications and localization experiments. Another matter of concern is the auditory artifact in DVADs caused by switching HRTFs. Switching HRTFs gives rise to wave discontinuity of synthesized binaural signals, which can be perceived as click noises that degrade the quality of presented sound image. A subjective test and excitation patterns (EPNs) analysis using an auditory filter are performed with various source signals and HRTF spatial resolutions. The results of the subjective test reveal that click noise perception depends on the source signal and the HRTF spatial resolution. Furthermore, EPN analysis reveals that switching HRTFs significantly distorts the EPNs at the off signal frequencies. Such distortions, however, are masked perceptually by broad-bandwidth source signals, whereas they are not masked by narrow-bandwidth source signals, thereby making the click noise more detectable. A higher HRTF spatial resolution leads to smaller distortions. But, depending on the source signal, perceivable click noises still remain even with 0.5-degree spatial resolution, which is less than minimum audible angle (1 degree in front).
A non-audible murmur (NAM), a very weak whisper sound produced without vocal fold vibration, has been researched in the development of a silent-speech communication tool for functional speech disorders as well as human-to-human/machine interfaces with inaudible voice input. The NAM can be detected using a specially designed microphone, called a NAM microphone, attached to the neck. However, the detected NAM signal has a low signal-to-noise ratio and severely suppressed high-frequency component. To improve NAM clarity, the mechanism of a NAM production must be clarified. In this work, an air flow through a glottis in the vocal tract was numerically simulated using computational fluid dynamics and vocal tract shape models that are obtained by a magnetic resonance imaging (MRI) scan for whispered voice production with various strengths, i.e. strong, weak, and very weak. For a very weak whispering during the MRI scan, subjects were trained, just before the scanning, to produce the very weak whispered voice, or the NAM. The numerical results show that a weak vorticity flow occurs in the supraglottal region even during a very weak whisper production; such vorticity flow provide aeroacoustic sources for a very weak whispering, i.e. NAM, as in an ordinary whispering.