Hiroaki KURABAYASHI Makoto OTANI Kazunori ITOH Masami HASHIMOTO Mizue KAYAMA
Binaural reproduction is a promising approach for presenting a highly realistic virtual auditory space to a listener. Generally, binaural signals are reproduced through headphones, which leads to a simple implementation of such a system. Alternatively, binaural signals can be presented to a listener using a technique called “transaural reproduction,” which employs a few loudspeakers together with crosstalk cancellation to compensate for the acoustic transmission from the loudspeakers to both ears of the listener. The major advantage of transaural reproduction is that a listener can experience binaural reproduction without wearing any device, which leads to a more natural listening environment. However, in transaural reproduction, the listener must remain still within a very narrow sweet spot because the crosstalk canceller is very sensitive to the listener's head position and orientation. To solve this problem, dynamic transaural systems have been developed that utilize contact-type head tracking. This paper introduces the development of a dynamic transaural system with non-contact head tracking, which frees the listener from any attachment and thereby preserves the advantage of transaural reproduction. Experimental results revealed that sound images presented in the horizontal and median planes were localized more accurately when the system tracked the listener's head rotation than when the listeners did not rotate their heads or when the system did not track the rotation. These results demonstrate that the system works effectively and correctly with the listener's head rotation.
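The crosstalk canceller at the heart of transaural reproduction can be sketched as a per-frequency-bin inversion of the 2×2 matrix of loudspeaker-to-ear transfer functions. The following is a minimal illustration of that idea, not the authors' implementation; the regularization constant and function name are assumptions for this sketch.

```python
import numpy as np

def crosstalk_canceller(H, beta=1e-3):
    """Regularized inverse of a 2x2 matrix of loudspeaker-to-ear
    transfer functions, computed frequency bin by frequency bin.

    H: complex array of shape (n_bins, 2, 2), H[k, ear, speaker].
    Returns C of shape (n_bins, 2, 2) such that H[k] @ C[k] is close
    to the identity, so each ear receives only its intended signal.
    """
    C = np.empty_like(H)
    I = np.eye(2)
    for k in range(H.shape[0]):
        Hk = H[k]
        # Tikhonov-regularized inverse: (H^H H + beta I)^-1 H^H,
        # which keeps the filters bounded where H is near-singular.
        C[k] = np.linalg.solve(Hk.conj().T @ Hk + beta * I, Hk.conj().T)
    return C
```

The regularization trades cancellation depth against filter gain; a dynamic system would recompute or re-select H whenever the tracked head pose changes.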
Shuichi SAKAMOTO Satoshi HONGO Yôiti SUZUKI
Sensing and reproduction of precise sound-space information are important for realizing highly realistic audio communications. This study was conducted to realize high-precision sensing of 3D sound-space information for transmission to distant places and for preservation of sound data for the future. The proposed method comprises a compact spherical object with numerous microphones. The recorded signals from the microphones, which are uniformly distributed on the sphere, are simply weighted and summed to synthesize the signals presented to a listener's left and right ears. The calculated signals are presented binaurally via ordinary binaural systems such as headphones. Moreover, the weights can be changed according to the listener's 3D head movement, which is well known to be a crucially important factor in human spatial hearing. As a result, 3D sound-space information is acquired in a way that accurately reflects the listener's head movement. We named the proposed method SENZI (Symmetrical object with ENchased ZIllion microphones). The results of computer simulations demonstrate that SENZI outperforms a conventional method (binaural Ambisonics) and can sense 3D sound space with high precision over a wide frequency range.
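The core operation described above — weighting and summing the microphone signals into two ear signals — can be sketched as a pair of dot products. In the actual method the weights are frequency-dependent and are switched with head orientation; this simplified time-domain sketch (function name assumed) only shows the weighted-sum structure.

```python
import numpy as np

def synthesize_binaural(mic_signals, w_left, w_right):
    """Weighted sum of spherical-array microphone signals into left and
    right ear signals (the basic SENZI-style synthesis step).

    mic_signals: array (n_mics, n_samples), one row per microphone.
    w_left, w_right: per-microphone weights, shape (n_mics,); in the
    full method these depend on frequency and on head orientation.
    """
    left = w_left @ mic_signals    # dot product over microphones
    right = w_right @ mic_signals
    return left, right
```

Because the synthesis is a fixed linear combination of the recordings, the array signals can be stored once and rendered later for any head orientation by swapping the weight vectors.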
Tai-Ming CHANG Yi-Ming SHIU Pao-Chi CHANG
This work presents a four-channel headset that achieves a 5.1-channel-like hearing experience using a low-complexity head-related transfer function (HRTF) model and a simplified reverberator. The proposed down-mixing architecture enhances the sound localization capability of a headset by using the HRTF and by simulating multiple sound reflections in a room with Moorer's reverberator. Since the HRTF has large memory and computation requirements, the common-acoustical-pole and zero (CAPZ) model is used to reshape it into a lower-order model. From a power consumption viewpoint, the CAPZ model reduces computational complexity by approximately 40%. The subjective listening tests in this study show that the proposed four-channel headset performs much better than stereo headphones. Moreover, the four-channel headset can be implemented with off-the-shelf components, preserving privacy at low cost.
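The CAPZ idea is that all directions share one set of poles while each direction keeps its own short numerator, so only the zeros must be stored per direction. A minimal sketch of that filter structure follows; the coefficient values and dictionary layout here are hypothetical, chosen only to illustrate the shared-denominator recursion.

```python
import numpy as np

# Hypothetical CAPZ coefficients: one shared (common-acoustical-pole)
# denominator for every direction, plus a short direction-specific numerator.
common_a = np.array([1.0, -0.5])             # shared poles, all directions
zeros_by_dir = {0: np.array([1.0, 0.3]),     # azimuth -> numerator b
                90: np.array([0.8, -0.2])}

def capz_filter(x, azimuth):
    """Filter x with the CAPZ structure: direction-dependent zeros,
    direction-independent poles (plain direct-form IIR recursion)."""
    b, a = zeros_by_dir[azimuth], common_a
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
        acc -= sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0)
        y[n] = acc / a[0]
    return y
```

Sharing the denominator is what saves memory and computation: the recursive part is computed once regardless of how many directions the database covers.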
Masashi OKADA Nobuyuki IWANAGA Tomoya MATSUMURA Takao ONOYE Wataru KOBAYASHI
In this paper, we propose a new 3D sound rendering method for multiple sound sources under limited computational resources. The method is based on fuzzy clustering and achieves the dual benefits of two general methods based on amplitude panning and hard clustering. In embedded systems where the number of reproducible sound sources is restricted, the general methods suffer from localization errors and/or serious quality degradation, whereas the proposed method addresses these problems by executing the clustering process and amplitude panning simultaneously. A computational cost evaluation based on a DSP implementation and a subjective listening test have been performed to demonstrate the applicability of the proposed method to embedded systems and its effectiveness.
Auditory artifacts due to switching head-related transfer functions (HRTFs) are investigated using a software-implemented dynamic virtual auditory display (DVAD) developed by the authors. The DVAD responds to a listener's head rotation by using a head-tracking device and switching HRTFs to present a highly realistic 3D virtual auditory space to the listener. The DVAD operates on Windows XP and does not require high-performance computers. The total system latency (TSL), the delay between head motion and the corresponding change of the ear input signal, is a significant factor for DVADs. The measured TSL of our DVAD is about 50 ms, which is sufficient for practical applications and localization experiments. Another concern in DVADs is the auditory artifact caused by switching HRTFs. Switching HRTFs gives rise to waveform discontinuities in the synthesized binaural signals, which can be perceived as click noises that degrade the quality of the presented sound image. A subjective test and an excitation pattern (EPN) analysis using an auditory filter were performed with various source signals and HRTF spatial resolutions. The results of the subjective test reveal that click noise perception depends on the source signal and the HRTF spatial resolution. Furthermore, the EPN analysis reveals that switching HRTFs significantly distorts the EPNs at off-signal frequencies. Such distortions, however, are perceptually masked by broad-bandwidth source signals, whereas they are not masked by narrow-bandwidth source signals, making the click noise more detectable. A higher HRTF spatial resolution leads to smaller distortions, but, depending on the source signal, perceivable click noises remain even with 0.5-degree spatial resolution, which is smaller than the minimum audible angle (1 degree in front).
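A common mitigation for the switching discontinuity described above — not necessarily the one used in this DVAD — is to run the old and new HRTF filters in parallel for one block and crossfade between their outputs, so the waveform never jumps. A minimal sketch:

```python
import numpy as np

def crossfade(old_block, new_block):
    """Linearly crossfade from the block filtered with the previous HRTF
    to the block filtered with the newly selected HRTF, removing the
    waveform discontinuity that would otherwise be heard as a click."""
    n = len(old_block)
    fade = np.linspace(0.0, 1.0, n)   # 0 -> 1 over one block
    return (1.0 - fade) * old_block + fade * new_block
```

The cost is one extra convolution per switch; whether the residual artifact is audible still depends on the source bandwidth and the angular step between the two HRTFs, as the analysis above indicates.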
Nobuyuki IWANAGA Tomoya MATSUMURA Akihiro YOSHIDA Wataru KOBAYASHI Takao ONOYE
A sound localization method for the proximal region is proposed, based on a low-cost 3D sound localization algorithm that uses head-related transfer functions (HRTFs). The auditory parallax model is applied to the current algorithm so that more accurate HRTFs can be used for sound localization in the proximal region. In addition, head-shadowing effects based on a rigid-sphere model are reproduced in the proximal region by means of a second-order IIR filter. A subjective listening test demonstrates the effectiveness of the proposed method. An embedded system implementation of the proposed method is also described, showing that the method improves sound effects in the proximal region with only a 5.1% increase in memory capacity and an 8.3% increase in computational cost.
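A second-order IIR filter of the kind mentioned above is a biquad: two feedforward and two feedback taps, cheap enough for an embedded renderer. The sketch below implements the generic biquad recursion (direct form II transposed); the head-shadowing coefficients themselves would come from the rigid-sphere model and distance, which are not reproduced here.

```python
def biquad(x, b0, b1, b2, a1, a2):
    """Direct-form II transposed biquad filter; in the proposed method a
    filter of this form models head shadowing in the proximal region.
    Coefficients (b0, b1, b2, a1, a2) are assumed to be derived from the
    rigid-sphere model for a given source distance and angle."""
    y, z1, z2 = [], 0.0, 0.0
    for s in x:
        out = b0 * s + z1
        z1 = b1 * s - a1 * out + z2
        z2 = b2 * s - a2 * out
        y.append(out)
    return y
```

Per sample this costs five multiplies and four adds regardless of source distance, which is consistent with the small overhead figures quoted above.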
Kazuya TSUKAMOTO Yoshinobu KAJIKAWA Yasuo NOMURA
In this paper, we propose a novel sound field reproduction system that uses the simultaneous perturbation (SP) method together with two fast-convergence techniques. Sound field reproduction systems that reproduce a desired signal at the listener's ears generally use fixed preprocessing filters determined in advance from the transfer functions between the loudspeakers and the control points. However, control point movement results in severe localization errors. Our solution is a sound field reproduction system, based on the SP method, that uses only an error signal to update the filter coefficients. The SP method can track any control point movement but suffers from slow convergence. Hence, we also propose two methods that improve the convergence speed. One is a delay control method that compensates for the delay caused by back-and-forth control point movements. The other is a compensation method that offsets the localization error caused by head rotation. Simulations demonstrate that the proposed methods track control point movements well while offering reasonable convergence speeds.
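The simultaneous perturbation method needs only the scalar error: all filter coefficients are perturbed at once with random signs, the error is probed at the two perturbed points, and a gradient estimate is formed from the difference. A generic sketch of one SP update (function and step-size names are assumptions, not the paper's notation):

```python
import numpy as np

rng = np.random.default_rng(0)

def sp_update(w, error_fn, c=1e-3, mu=1e-2):
    """One simultaneous-perturbation step.

    w: current coefficient vector; error_fn: scalar error measured for a
    given w (in the real system, the observed error at the control point);
    c: perturbation size; mu: step size.
    """
    delta = rng.choice([-1.0, 1.0], size=w.shape)          # random signs
    ghat = (error_fn(w + c * delta) - error_fn(w - c * delta)) / (2.0 * c)
    return w - mu * ghat * delta                           # descend estimate
```

Two error measurements per update, independent of the number of coefficients, is what makes the method cheap enough to run continuously — and also why its raw convergence is slow, motivating the two acceleration techniques described above.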
Kosuke TSUJINO Wataru KOBAYASHI Takao ONOYE Yukihiro NAKAMURA
3-D sound using head-related transfer functions (HRTFs) is applicable to embedded systems such as portable devices, since it can create spatial sound effects without multichannel transducers. Low-order modeling of HRTFs with IIR filters is effective for reducing the computational load in embedded applications. Although the modeling of HRTFs with IIR filters has been studied extensively, little attention has been paid to sound movement with IIR filters, which is important for practical applications of 3-D sound. In this paper, a practical method for sound movement is proposed that utilizes time-varying IIR filters and variable delay filters. With the proposed method, the computational cost of sound movement is reduced by about 50% compared to a conventional low-order FIR implementation. To facilitate efficient implementation of 3-D sound movement, tradeoffs between the subjective quality of the output sound and implementation parameters, such as the size of the filter coefficient database and the update period of the filter coefficients, are also discussed.
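A time-varying IIR filter for a moving source can be realized by updating the coefficients toward the target direction once per update period; a simple linear interpolation of the coefficient vectors is sketched below. This is an illustration of the general technique, not the paper's scheme, and note the caveat that interpolating denominator coefficients can transiently destabilize a filter, which is one reason the update period matters.

```python
import numpy as np

def interpolate_coeffs(c_from, c_to, n_steps):
    """Linearly interpolate between two IIR coefficient vectors over
    n_steps update periods, so the filter changes smoothly as the
    source moves from one stored direction to the next.

    Caveat: naive interpolation of feedback coefficients does not
    guarantee stability at intermediate points; short update periods
    and nearby directions keep the excursion small."""
    t = np.linspace(0.0, 1.0, n_steps)[:, None]
    return (1.0 - t) * c_from + t * c_to
```

The database size versus update period tradeoff appears directly here: a denser coefficient database means shorter interpolation paths, while a longer update period amortizes the coefficient recomputation cost.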
Hiroshi HASEGAWA Masao KASUGA Shuichi MATSUMOTO Atsushi KOIKE
HRTFs (head-related transfer functions) are useful for sound field reproduction with spatial fidelity, since they contain the acoustic cues used to perceive the location of a sound image, such as the interaural time difference, the interaural intensity difference, and spectral cues. Generally, FIR filters are used to simulate HRTFs. However, this approach is not suitable for a simple system because the required FIR filter orders are high. In this paper, we propose a method that uses IIR filters for a simple realization of sound image localization. The HRTFs of a dummy head were approximated by (A) fourth- to seventh-order IIR filters and (B) third-order IIR filters. In total, the HRTFs of 24 directions in the horizontal plane were used as the target characteristics. Sound localization experiments for the direction and elevation angle of a sound image were carried out with 3 subjects in a soundproof chamber. The binaural signals synthesized with the HRTFs simulated by FIR filters and approximated by IIR filters (A) and (B) were reproduced via two loudspeakers, and sound image localization in the horizontal plane was realized. The experiments show that sound image localization using the HRTFs approximated by IIR filters (A) is as accurate as that using the FIR filters. This result shows that sound fields with binaural reproduction can be created more simply.
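Why a low-order IIR filter can replace a high-order FIR filter is easy to see with a toy case: a long exponentially decaying FIR impulse response is reproduced exactly by a first-order recursion with just two coefficients. This sketch is only an illustration of the order-reduction principle, not an HRTF fit.

```python
import numpy as np

# A 64-tap FIR impulse response with exponential decay ...
fir = 0.5 ** np.arange(64)

def iir_impulse_response(n):
    """... reproduced by the first-order IIR y[n] = x[n] + 0.5*y[n-1],
    i.e. one feedforward and one feedback coefficient instead of 64 taps."""
    y = np.zeros(n)
    y[0] = 1.0                    # impulse input at n = 0
    for k in range(1, n):
        y[k] = 0.5 * y[k - 1]     # feedback term does all the work
    return y
```

Real HRTFs have resonances and notches rather than a single decay, so third- to seventh-order fits involve an approximation error; the experiments above measure whether that error is perceptually relevant.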
Binaural effects are studied with two measures. One is the detection limen of click sounds under lateralization of diotic or dichotic noise signals, and the other is the phoneme articulation score under localization or lateralization of speech and noise signals. The experiments use a headphone system with the listener's own head-related transfer function (HRTF) filters. The HRTF filter coefficients are calculated individually from the listener's impulse responses measured in a slightly sound-reflective booth. The frequency response of the headphones is compensated for by an inverse filter calculated from the response at the subject's own ear canal entrance. Because the speech frequency band in telecommunication systems is not sufficiently wide, the bandwidth of the HRTF filter is limited to below 6.2 kHz. Nevertheless, the experiments on localization simulation in the horizontal plane show that the sound image is mostly perceived outside the head in the simulated direction. Under simulation of localization or lateralization of speech and noise signals, the phoneme articulation score increases when the simulation spatially separates the phonemes from the noise signals while the total signal-to-noise ratio at both ears is kept constant. This result shows a binaural effect on speech intelligibility under noise, which is regarded as part of the cocktail party effect.
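Headphone compensation of the kind described above is typically a regularized inversion of the measured headphone-to-ear-canal response. A minimal frequency-domain sketch follows; the regularization constant and function name are assumptions, and this is one common construction rather than the authors' exact procedure.

```python
import numpy as np

def inverse_filter(h, n_fft=256, beta=1e-3):
    """Regularized frequency-domain inverse of a measured headphone
    impulse response h, returned as an FIR compensation filter.

    The regularizer beta limits the boost applied at frequencies where
    the measured response is weak, which keeps the filter realizable."""
    H = np.fft.rfft(h, n_fft)
    Hinv = np.conj(H) / (np.abs(H) ** 2 + beta)   # ~1/H where |H| >> beta
    return np.fft.irfft(Hinv, n_fft)
```

In practice the compensation filter is convolved in cascade with the individualized HRTF filters, so the signal arriving at the ear canal entrance carries the HRTF cues without the headphone's own coloration.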