Keisuke YAMADA Hironobu TAKAHASHI Ryuzo HORIUCHI
The sound power level is a physical quantity indispensable for evaluating the amount of sound energy radiated from electrical and mechanical apparatuses. Precise determination of the sound power level requires qualification of the measurement environment, such as a hemi-anechoic room, by estimating the deviation of the sound pressure level from the inverse-square law. In this respect, Annex A of ISO 3745 specifies the procedure for room qualification and defines a tolerance limit on the directivity of the sound source used for the qualification. However, it is impractical to prepare a special loudspeaker solely for room qualification. We therefore developed a simulation method to investigate the influence of sound source directivity on the measured deviation of the sound pressure level from the inverse-square law, introducing a quantitative index for the influence of the directivity. In this study, the Brüel & Kjær type 4202 reference sound source was used as a directional sound source because it has been widely used as a reference standard for the measurement of sound power levels. We experimentally obtained the directivity of the sound source by measuring the sound pressure level over the measurement surface. Moreover, the proposed method was applied to the qualification of several hemi-anechoic rooms, and we discussed the suitability of a directional sound source for this process. Analytical results showed that commercially available reference sound sources may be used for the evaluation of hemi-anechoic rooms, depending on the sound energy absorption coefficient of the inner walls, the direction of the microphone traverse, and the size of the space to be qualified. In other words, the results revealed that a reference sound source, once characterized by the proposed method, can be used for qualifying hemi-anechoic rooms.
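The qualification criterion compares the sound pressure level measured along a microphone traverse with the level predicted by the inverse-square law. As a rough illustration (the function name, reference-distance convention, and toy data are ours, not taken from the paper or from ISO 3745):

```python
import numpy as np

def deviation_from_inverse_square(distances_m, measured_spl_db, ref_index=0):
    """Deviation (dB) of measured SPL from the inverse-square law.

    Under free-field propagation the level at distance r follows
        Lp(r) = Lp(r0) - 20*log10(r / r0),
    so any residual is attributed to the room (and to source directivity).
    """
    r = np.asarray(distances_m, dtype=float)
    lp = np.asarray(measured_spl_db, dtype=float)
    expected = lp[ref_index] - 20.0 * np.log10(r / r[ref_index])
    return lp - expected

# Ideal monopole data: deviations should vanish at every traverse point.
r = np.array([1.0, 2.0, 4.0, 8.0])
lp_ideal = 94.0 - 20.0 * np.log10(r / 1.0)
dev = deviation_from_inverse_square(r, lp_ideal)
```

In a real qualification, `dev` would be checked against the tolerance limits tabulated in the standard.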
Sound source localization is an essential technique in many applications, e.g., speech enhancement, speech capture and human-robot interaction. However, the performance of traditional methods degrades in noisy or reverberant environments, and it is sensitive to the spatial location of the sound source. To solve these problems, we propose a sound source localization framework based on bi-directional interaural matching filters (IMFs) and decision weighting fusion. First, the bi-directional IMF is introduced to describe the difference between binaural signals in the forward and backward directions, respectively. Then, a hybrid interaural matching filter (HIMF), obtained from the bi-directional IMFs through decision weighting fusion, is used to mitigate the effect of source location on localization. Finally, the cosine similarity between the HIMFs computed from the binaural audio and from the transfer functions is employed to measure the probability of each candidate source location. Arranging the similarities for all spatial directions as a matrix, we determine the source location by maximum a posteriori (MAP) estimation. Experimental results indicate that, compared with several state-of-the-art methods, the HIMF is more robust in noisy environments.
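The final step, picking the direction whose stored filter best matches the observation by cosine similarity, can be sketched as follows (the feature vectors and function name are illustrative assumptions, not the paper's actual HIMF computation):

```python
import numpy as np

def localize_by_cosine_similarity(observed, templates):
    """Return the direction whose template vector is most similar to the
    observed vector; the maximiser of the cosine similarity plays the role
    of the MAP estimate described in the abstract."""
    best_dir, best_sim = None, -np.inf
    obs = np.asarray(observed, float)
    for direction, tpl in templates.items():
        tpl = np.asarray(tpl, float)
        sim = obs @ tpl / (np.linalg.norm(obs) * np.linalg.norm(tpl))
        if sim > best_sim:
            best_dir, best_sim = direction, sim
    return best_dir, best_sim

# Toy templates every 30 degrees; the observation is a noisy copy of the
# 90-degree template, so 90 should win.
rng = np.random.default_rng(0)
templates = {d: rng.standard_normal(16) for d in range(0, 360, 30)}
obs = templates[90] + 0.05 * rng.standard_normal(16)
direction, sim = localize_by_cosine_similarity(obs, templates)
```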
Hirofumi NAKAJIMA Keiko KIKUCHI Kazuhiro NAKADAI Yutaka KANEDA
This paper proposes a sound source orientation estimation method suitable for distributed microphone arrangements. The proposed method is based on orientation-extended beamforming (OEBF), which has four features: (a) robustness against reverberation, (b) robustness against noise, (c) freedom in microphone arrangement and (d) feasibility for real-time processing. Regarding (a) and (c), since OEBF is based on a general propagation model using transfer functions (TFs) that include all propagation phenomena, such as reflections and diffraction, OEBF incurs no model errors for these phenomena and is applicable to arbitrary microphone arrangements. Regarding (b), OEBF overcomes noise effects by incorporating three additional processes (amplitude extraction, time-frequency masking and histogram integration), which are also proposed in this paper. As for (d), OEBF is executable in real time because its execution process is the same as that of ordinary beamforming. A numerical experiment was performed to confirm the theoretical validity of OEBF. The results showed that OEBF was able to estimate sound source positions and orientations very precisely. Practical experiments were carried out using a 96-channel microphone array in real environments. The results indicated that OEBF worked properly even in reverberant and noisy environments, and the average estimation error was only 4°.
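Of the three noise-handling processes, histogram integration is the simplest to illustrate: noisy per-frame estimates are pooled into angular bins and the fullest bin wins. A minimal sketch, with toy data and a bin width of our own choosing (not the paper's parameters):

```python
import numpy as np

def histogram_integrate(frame_estimates_deg, bin_width=5):
    """Integrate noisy per-frame direction estimates into one robust estimate
    by voting into angular bins and returning the centre of the fullest bin."""
    bins = np.arange(0, 360 + bin_width, bin_width)
    counts, _ = np.histogram(np.mod(frame_estimates_deg, 360), bins=bins)
    k = int(np.argmax(counts))
    return 0.5 * (bins[k] + bins[k + 1])

rng = np.random.default_rng(0)
true_dir = 132.0
frames = np.concatenate([
    true_dir + 2.0 * rng.standard_normal(80),  # frames dominated by the source
    rng.uniform(0, 360, 20),                   # frames dominated by noise
])
est = histogram_integrate(frames)
```

The outlier frames spread thinly over all bins, so the mode is unaffected by them.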
Hirofumi TSUZUKI Mauricio KUGLER Susumu KUROYANAGI Akira IWATA
This paper presents a Complex-Valued Neural Network-based sound localization method. The proposed approach uses two microphones to localize sound sources over the whole horizontal plane. The method uses the time delay and amplitude difference to generate a set of features, which are then classified by a Complex-Valued Multi-Layer Perceptron. The advantage of using complex values is that the amplitude information can naturally mask the phase information. The proposed method is analyzed experimentally with regard to the spectral characteristics of the target sounds and its tolerance to noise. The obtained results confirm the advantages of Complex-Valued Neural Networks for the sound localization problem in comparison with the traditional Real-Valued Neural Network model.
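The core building block of such a network is a fully connected layer operating on complex numbers, with an activation that squashes the amplitude while preserving the phase. A minimal forward-pass sketch (the split amplitude-phase activation is one common choice in complex-valued networks; the paper's exact activation and architecture may differ):

```python
import numpy as np

def cplx_activation(z):
    """Amplitude-phase split activation: squash the amplitude with tanh
    while leaving the phase of each complex value untouched."""
    return np.tanh(np.abs(z)) * np.exp(1j * np.angle(z))

def cplx_layer(x, W, b):
    """One fully connected complex-valued layer with split activation."""
    return cplx_activation(W @ x + b)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
b = rng.standard_normal(4) + 1j * rng.standard_normal(4)
x = np.exp(1j * np.array([0.1, 0.7, 1.3]))  # unit-amplitude phase features
y = cplx_layer(x, W, b)
```

Because tanh is strictly below 1 for finite inputs, every output amplitude stays inside the unit circle.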
Kenta NIWA Yusuke HIOKA Sumitaka SAKAUCHI Ken'ichi FURUYA Yoichi HANEDA
A method for estimating sound source orientation in a reverberant room using a microphone array is proposed. We extend the conventional modeling of a room transfer function based on the image method to take the directivity of the sound source into account. With this extension, the transfer function between a sound source and a listener (or a microphone) is described by the superposition of the transfer functions from each image source to the listener, each multiplied by the source directivity; thus, the sound source orientation can be estimated by analyzing, from the observed signals, how the image sources are distributed (the power distribution of image sources). We applied eigenvalue analysis to the spatial correlation matrix of the microphone array observations to obtain the power distribution of the image sources. Based on the assumption that the spatial correlation matrix for each pair of source position and orientation is known a priori, the variation of the eigenspace can be modeled. By comparing the eigenspace of the observed signals with that of pre-learned models, we estimated the sound source orientation. In experiments using seven microphones, the sound source orientation was estimated with high accuracy, and the accuracy increased with the reverberation time of the room.
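The key geometric idea is that each wall reflection mirrors not only the source position but also its radiation direction, which is how directivity enters the extended image-method model. A first-order sketch for a shoebox room (function name and toy geometry are ours; the paper's model also includes higher-order images and reflection coefficients):

```python
import numpy as np

def first_order_images(pos, orient, room):
    """First-order image sources for a shoebox room [0,Lx]x[0,Ly]x[0,Lz].

    Each of the six walls mirrors the source position across that wall and
    flips the corresponding component of the source orientation vector.
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            p = np.array(pos, float)
            o = np.array(orient, float)
            p[axis] = 2.0 * wall - p[axis]  # mirror position across the wall
            o[axis] = -o[axis]              # mirror the radiation direction
            images.append((p, o))
    return images

# Source at (1, 2, 1.5) m facing +x in a 4 x 5 x 3 m room.
imgs = first_order_images(pos=[1.0, 2.0, 1.5], orient=[1.0, 0.0, 0.0],
                          room=[4.0, 5.0, 3.0])
```

The spatial pattern of these mirrored orientations is exactly what the eigenspace comparison in the abstract exploits.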
Sang Ha PARK Seokjin LEE Koeng-Mo SUNG
Non-negative matrix factorization (NMF) is widely used for monaural musical sound source separation because of its efficiency and good performance. However, an additional clustering process is required because NMF separates the musical sound mixture into more signals than there are musical tracks. In conventional methods, clustering is performed manually or by training with an additional learning process. Recently, a clustering algorithm based on mel-frequency cepstrum coefficients (MFCCs) was proposed for unsupervised clustering. However, MFCCs supply only limited information for clustering. In this paper, we propose various timbre features for unsupervised clustering and a clustering algorithm that uses them. Simulation experiments were carried out using various musical sound mixtures. The results indicate that the proposed method improves clustering performance compared with conventional MFCC-based clustering.
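The pipeline can be sketched in two stages: factorize a magnitude spectrogram with multiplicative-update NMF, then group the basis spectra by a timbre feature. Here the feature is the spectral centroid, used only as a simple stand-in for the paper's richer timbre feature set, and the data are a toy two-track spectrogram:

```python
import numpy as np

def nmf(V, k, n_iter=500, seed=0):
    """Plain multiplicative-update NMF (Frobenius cost), V ~= W @ H."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, k)) + 1e-6
    H = rng.random((k, T)) + 1e-6
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H

def spectral_centroid(w):
    """Timbre feature: centre of mass of a basis spectrum (bin index units)."""
    bins = np.arange(len(w))
    return float(bins @ w / (w.sum() + 1e-12))

# Toy magnitude spectrogram: a low-frequency and a high-frequency component
# active in different halves of the recording, plus a small noise floor.
rng = np.random.default_rng(1)
V = np.abs(rng.standard_normal((32, 40))) * 0.01
V[3, :20] += 5.0    # "low" track active in the first half
V[28, 20:] += 5.0   # "high" track active in the second half
W, H = nmf(V, k=2)
centroids = sorted(spectral_centroid(W[:, j]) for j in range(2))
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

The two basis spectra land on opposite sides of the frequency axis, so a one-dimensional timbre feature already separates them.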
Suwon SHON David K. HAN Jounghoon BEH Hanseok KO
This paper describes a method for estimating the Direction Of Arrival (DOA) of multiple sound sources over the full azimuth with three microphones. Estimating DOA with paired microphone arrays creates imaginary sound sources, because the time delay of arrival (TDOA) is identical for the real and imaginary sources. Imaginary sound sources are a chronic problem in multiple Sound Source Localization (SSL), because they can be localized as if they were real. Our proposed approach is based on the observation that although each microphone pair creates imaginary sound sources, the DOAs of the imaginary sources differ depending on the orientation of the pair. Given that a real source is always localized in the same direction regardless of the array orientation, we can suppress the imaginary sound sources by minimum filtering based on the Steered Response Power - Phase Transform (SRP-PHAT) method. A set of experiments conducted in a real noisy environment showed that the proposed method localizes multiple sound sources accurately.
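The minimum-filtering idea can be shown with toy SRP maps: a real source peaks in every map, while each mirror image peaks only in the map of the pair that produced it, so the element-wise minimum keeps the real peak and suppresses the ghosts. A sketch with synthetic maps (not the paper's actual SRP-PHAT computation):

```python
import numpy as np

def suppress_imaginary_sources(srp_maps):
    """Element-wise minimum across SRP maps from differently oriented
    microphone pairs, followed by a peak pick over azimuth."""
    fused = np.min(np.stack(srp_maps), axis=0)
    return fused, int(np.argmax(fused))

real_dir = 120  # degrees

def toy_map(ghost_dir):
    """Flat map with a peak at the real source and one mirror-image peak."""
    m = np.full(360, 0.1)
    m[real_dir] = 1.0   # the real source appears in every map
    m[ghost_dir] = 1.0  # the mirror image differs per pair orientation
    return m

fused, est = suppress_imaginary_sources([toy_map(240), toy_map(300), toy_map(20)])
```

Each ghost peak survives in only one of the three maps, so the minimum flattens it back to the floor value.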
Masashi OKADA Nobuyuki IWANAGA Tomoya MATSUMURA Takao ONOYE Wataru KOBAYASHI
In this paper, we propose a new 3D sound rendering method for multiple sound sources under limited computational resources. The method is based on fuzzy clustering and achieves the dual benefits of two general methods based on amplitude panning and hard clustering. In embedded systems, where the number of reproducible sound sources is restricted, the general methods suffer from localization errors and/or serious quality degradation, whereas the proposed method avoids these problems by executing the clustering process and amplitude panning simultaneously. A computational cost evaluation based on a DSP implementation and a subjective listening test were performed to demonstrate the applicability of the proposed method to embedded systems and its effectiveness.
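The combination of clustering and panning can be illustrated with fuzzy c-means memberships: instead of assigning each source hard to one rendered cluster, soft memberships act as panning gains between clusters. A sketch of the standard membership formula with toy positions (the paper's exact clustering and gain rules may differ):

```python
import numpy as np

def fuzzy_memberships(sources, centers, m=2.0, eps=1e-9):
    """Fuzzy c-means membership of each source position to each cluster:
    u[i, k] in [0, 1], with each row summing to 1. A soft membership lets
    a source be amplitude-panned between clusters instead of snapping to one."""
    S = np.asarray(sources, float)[:, None, :]  # (n_sources, 1, dim)
    C = np.asarray(centers, float)[None, :, :]  # (1, n_clusters, dim)
    d = np.linalg.norm(S - C, axis=2) + eps     # (n_sources, n_clusters)
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

centers = [[0.0, 0.0], [4.0, 0.0]]              # two renderable cluster positions
sources = [[0.0, 0.0], [2.0, 0.0], [3.9, 0.1]]  # three actual sources
U = fuzzy_memberships(sources, centers)
```

A source sitting exactly on a center gets membership near 1 there; a source halfway between centers is panned 50/50.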
Kook CHO Hajime OKUMURA Takanobu NISHIURA Yoichi YAMASHITA
In real environments, ambient noise and room reverberation seriously degrade the accuracy of sound source localization. In addition, conventional sound source localization methods cannot localize multiple sound sources accurately in real noisy environments. This paper proposes a new method for multiple sound source localization using a distributed microphone system, i.e., a recording system with multiple microphones dispersed over a wide area. The proposed method localizes a sound source by finding the position that maximizes the correlation coefficients accumulated over multiple channel pairs. After the first sound source is estimated, the typical pattern of the accumulated correlation for a single sound source is subtracted from the observed distribution of the accumulated correlation, and the second sound source is then searched for. To evaluate the effectiveness of the proposed method, two-sound-source localization experiments were carried out in an office room. The results show a sound source localization accuracy of about 99.7%. The proposed method thus localizes multiple sound sources robustly and stably.
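The accumulation step can be sketched as a grid search: for each candidate position, read each microphone pair's cross-correlation at that position's predicted TDOA and sum the values. A deliberately tiny 1-D example with integer-sample delays (geometry, sampling rate, and function name are ours, not the paper's setup):

```python
import numpy as np

def accumulated_correlation_localize(signals, mic_pos, grid, fs, c):
    """Score each candidate position by accumulating, over all microphone
    pairs, the cross-correlation value at the predicted TDOA; return the
    best-scoring position."""
    L = len(signals[0])
    ccs = {}
    for i in range(len(signals)):
        for j in range(i + 1, len(signals)):
            ccs[(i, j)] = np.correlate(signals[i], signals[j], mode="full")
    scores = []
    for p in grid:
        d = [abs(p - m) for m in mic_pos]  # 1-D geometry for brevity
        acc = 0.0
        for (i, j), cc in ccs.items():
            lag = int(round((d[i] - d[j]) * fs / c))  # peak sits at lag d_i - d_j
            acc += cc[(L - 1) + lag]
        scores.append(acc)
    return grid[int(np.argmax(scores))]

# Three microphones on a line, an impulsive source at 1.2 m.
fs, c = 3400.0, 340.0  # 10 samples of delay per metre, so delays are exact
mic_pos = [0.0, 3.0, 5.0]
src = 1.2
signals = []
for m in mic_pos:
    s = np.zeros(200)
    s[50 + int(round(abs(src - m) * fs / c))] = 1.0  # delayed impulse
    signals.append(s)
grid = np.arange(0.0, 3.01, 0.1)
est = accumulated_correlation_localize(signals, mic_pos, grid, fs, c)
```

Only the true position lines up the correlation peaks of all three pairs at once, which is what makes the accumulated score discriminative.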
Yoshifumi CHISAKI Toshimichi TAKADA Masahiro NAGANISHI Tsuyoshi USAGAWA
The frequency-domain binaural model (FDBM) has previously been proposed to localize multiple sound sources. Since the method requires only two input signals and uses the interaural phase and level differences caused by diffraction around the head, it offers great flexibility when the head is regarded simply as an object. However, when the object is symmetric with respect to the two microphones, sound source localization performance degrades, just as a human being suffers front-back confusion due to the symmetry of the median plane. This paper proposes reducing this degradation by combining the outputs of multiple microphone pairs using the FDBM. The proposed method is evaluated by applying it to a security camera system, and the results show improved sound source localization performance owing to the reduced number of cones of confusion.
Hua XIAO Huai-Zong SHAO Qi-Cong PENG
In this paper, a robust sound source localization approach is proposed. The approach retains good performance even when model errors exist. Compared with previous work in this field, the contributions of this paper are as follows. First, an improved broadband, near-field array model is proposed. It takes array gain and phase perturbations into account, is based on the actual positions of the elements, and can be used with arbitrary planar array geometries. Second, a subspace model-error estimation algorithm and a Weighted 2-Dimension Multiple Signal Classification (W2D-MUSIC) algorithm are proposed. The subspace model-error estimation algorithm estimates the unknown parameters of the array model, i.e., the gain and phase perturbations and the positions of the elements, with high accuracy, and its performance improves as the SNR or the number of snapshots increases. The W2D-MUSIC algorithm, based on the improved array model, is then applied to locate sound sources. Together, these two algorithms constitute the robust sound source localization approach. The resulting, more accurate steering vectors can also be provided for further processing such as adaptive beamforming. Numerical examples confirm the effectiveness of the proposed approach.
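The MUSIC machinery underlying W2D-MUSIC can be shown in its basic narrowband, far-field form: eigenvectors of the spatial covariance belonging to the smallest eigenvalues span the noise subspace, and steering vectors of true directions are nearly orthogonal to it. A one-dimensional sketch on a uniform linear array (this is plain MUSIC, not the paper's weighted 2-D, near-field variant):

```python
import numpy as np

def music_spectrum(R, mic_pos, angles_deg, wavelength, n_sources):
    """Narrowband MUSIC pseudospectrum for a linear array: the reciprocal of
    the steering vector's projection onto the noise subspace."""
    w, V = np.linalg.eigh(R)                # eigenvalues in ascending order
    En = V[:, : len(mic_pos) - n_sources]   # noise-subspace eigenvectors
    spec = []
    for th in np.deg2rad(angles_deg):
        a = np.exp(-2j * np.pi * mic_pos * np.sin(th) / wavelength)
        p = a.conj() @ En @ En.conj().T @ a  # ||En^H a||^2, near zero at a source
        spec.append(1.0 / np.real(p))
    return np.array(spec)

# Simulate one source at 20 degrees on an 8-element half-wavelength ULA.
rng = np.random.default_rng(0)
M, wavelength = 8, 2.0
mic_pos = np.arange(M) * (wavelength / 2)
a_true = np.exp(-2j * np.pi * mic_pos * np.sin(np.deg2rad(20.0)) / wavelength)
snaps = 200
S = rng.standard_normal(snaps) + 1j * rng.standard_normal(snaps)
noise = 0.05 * (rng.standard_normal((M, snaps)) + 1j * rng.standard_normal((M, snaps)))
X = np.outer(a_true, S) + noise
R = X @ X.conj().T / snaps
angles = np.arange(-90, 91)
est = angles[int(np.argmax(music_spectrum(R, mic_pos, angles, wavelength, 1)))]
```

Model errors (gain, phase, element positions) distort `a`, which is exactly why the paper calibrates them before running the MUSIC search.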
Junfeng LI Masato AKAGI Yoiti SUZUKI
In this paper, we propose a two-microphone noise reduction method for dealing with non-stationary interfering noise in multiple-noise-source environments, in which traditional two-microphone algorithms cannot function well. In the proposed algorithm, the multiple interfering noise sources are regarded as one virtually integrated noise source in each subband, and the spectrum of the integrated noise is estimated using its virtual direction of arrival. To this end, we introduce a direction finder for the integrated noise that uses only two microphones and performs well even during speech-active periods. The noise spectrum estimate is further improved by integrating a single-channel noise estimation approach, and it is then subtracted from the spectrum of the noisy signal, finally enhancing the desired target signal. The performance of the proposed algorithm is evaluated and compared with traditional algorithms under various conditions. Experimental results demonstrate that the proposed algorithm outperforms the traditional algorithms in terms of objective and subjective speech quality measures.
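The final subtraction stage is classical spectral subtraction. A single-channel sketch with non-overlapping rectangular frames (a simplification of our own; the paper estimates the noise spectrum from two channels via the virtual DOA rather than from a noise-only recording):

```python
import numpy as np

def spectral_subtraction(noisy, noise_mag, frame_len=256, beta=0.01):
    """Magnitude spectral subtraction, frame by frame: subtract the noise
    magnitude estimate, floor the result, and resynthesise with the noisy
    phase. Rectangular non-overlapping frames keep the sketch short."""
    out = np.zeros_like(noisy)
    for start in range(0, len(noisy) - frame_len + 1, frame_len):
        frame = noisy[start:start + frame_len]
        spec = np.fft.rfft(frame)
        mag = np.maximum(np.abs(spec) - noise_mag, beta * noise_mag)
        out[start:start + frame_len] = np.fft.irfft(
            mag * np.exp(1j * np.angle(spec)), frame_len)
    return out

rng = np.random.default_rng(0)
n = 256 * 8
t = np.arange(n)
clean = np.sin(2 * np.pi * 16 * t / 256)  # bin-centred tone
noisy = clean + 0.3 * rng.standard_normal(n)
# Noise magnitude estimated from a separate noise-only stretch with the
# same statistics (an assumption standing in for the paper's DOA-based estimate).
frames = (0.3 * rng.standard_normal(n)).reshape(-1, 256)
noise_mag = np.mean(np.abs(np.fft.rfft(frames, axis=1)), axis=0)
enhanced = spectral_subtraction(noisy, noise_mag)
err_noisy = np.linalg.norm(noisy - clean)
err_enh = np.linalg.norm(enhanced - clean)
```

Subtracting the average noise magnitude removes the bulk of the noise energy in every bin while barely touching the strong tone bin.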
Naoya MOCHIKI Tetsuji OGAWA Tetsunori KOBAYASHI
A new type of sound source segregation method using robot-mounted microphones, which is free from strict head-related transfer function (HRTF) estimation, has been proposed and successfully applied to a system recognizing three simultaneous utterances. The proposed segregation method exploits the sound intensity differences that arise from the particular arrangement of the four directional microphones and the presence of a robot head acting as a sound barrier. The proposed method consists of three-layered signal processing: two-line SAFIA (binary masking based on narrow-band sound intensity comparison), two-line spectral subtraction and their integration. We performed a 20K-vocabulary continuous speech recognition test with three speakers talking simultaneously and achieved more than 70% word error reduction compared with the case without any segregation processing.
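The SAFIA layer reduces to a per-bin intensity comparison: keep a time-frequency bin for the channel in which it is louder. A sketch on toy magnitude spectrograms (the paper's two-line variant and its parameters are not reproduced here):

```python
import numpy as np

def safia_binary_mask(spec_a, spec_b):
    """SAFIA-style segregation: in each time-frequency bin keep channel A's
    component only where channel A is the stronger of the two channels."""
    mask = (np.abs(spec_a) > np.abs(spec_b)).astype(float)
    return mask * spec_a

# Toy spectrograms: speaker A dominates low bins, speaker B high bins,
# and each microphone picks up a weak copy of the other speaker.
A = np.zeros((4, 3)); A[0:2] = 3.0   # A loud in bins 0-1
B = np.zeros((4, 3)); B[2:4] = 3.0   # B loud in bins 2-3
mix_a = A + 0.2 * B                  # microphone facing speaker A
mix_b = B + 0.2 * A                  # microphone facing speaker B
rec_a = safia_binary_mask(mix_a, mix_b)
```

The mask keeps speaker A's bins at full level and zeroes the bins where the leaked copy of speaker B dominates.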
Osamu ICHIKAWA Tetsuya TAKIGUCHI Masafumi NISHIMURA
In two-microphone approaches, interchannel differences in time (ICTD) and interchannel differences in sound level (ICLD) have generally been used for sound source localization. However, those cues are not effective for vertical localization in the median plane (directly in front). For that purpose, spectral cues based on features of head-related transfer functions (HRTFs) have been investigated, but they are not robust enough against signal variations and environmental noise. In this paper, we use a "profile" as a cue, together with a combination of reflectors specially designed for vertical localization. The observed sound is converted into a profile containing information about the reflections as well as the ICTD and ICLD data. The observed profile is decomposed into signal and noise using template profiles associated with sound source locations, and the template minimizing the residual of the decomposition gives the estimated sound source location. Experiments show that this method can correctly provide a rough estimate of the vertical location even in a noisy environment.
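The template-matching step can be sketched as a per-template least-squares fit: model the observation as a scaled template plus residual and keep the location whose template leaves the smallest residual. Toy profiles and location labels below are illustrative assumptions, not the paper's data:

```python
import numpy as np

def match_profile(observed, templates):
    """Decompose the observed profile as (gain * template) + residual and
    return the location label whose template leaves the smallest residual."""
    obs = np.asarray(observed, float)
    best_loc, best_res = None, np.inf
    for loc, tpl in templates.items():
        t = np.asarray(tpl, float)
        gain = max(0.0, (obs @ t) / (t @ t))  # least-squares gain, non-negative
        res = np.linalg.norm(obs - gain * t)
        if res < best_res:
            best_loc, best_res = loc, res
    return best_loc

rng = np.random.default_rng(3)
templates = {loc: np.abs(rng.standard_normal(24))
             for loc in ["front", "up-30", "up-60"]}
observed = 2.5 * templates["up-30"] + 0.05 * rng.standard_normal(24)
loc = match_profile(observed, templates)
```

Scaling the template by a free gain makes the match insensitive to the overall source level, which matters when only the profile shape carries the location cue.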
Toshiharu HORIUCHI Haruhide HOKARI Shoji SHIMADA Takashi INADA
A sound localization method based on adaptive estimation of inverse ear canal transfer functions (ECTFs) using a stereo earphone-microphone combination is proposed. This method can adaptively obtain an individual listener's transfer functions in real time. We evaluate the method by studying the relationship between the estimation error of the inverse ECTFs and the auditory sound localization scores perceived by several listeners. As a result, we clarified that the estimation error of the inverse ECTFs must be less than -10 dB. In addition, we describe two adaptive inverse filtering methods for a real-time signal processing implementation using the affine projection algorithm, and we discuss the convergence time of an adaptive inverse filter to determine its initial value. It is clarified that, in terms of convergence, method 2, based on copied weights as the initial value, is more effective for our sound localization method than method 1, which uses the filtered-x algorithm, when the initial value is the average of many listeners' impulse responses.
Panikos HERACLEOUS Satoshi NAKAMURA Takeshi YAMADA Kiyohiro SHIKANO
This paper describes a method for hands-free speech recognition, and particularly for the simultaneous recognition of multiple sound sources. The method is based on the 3-D Viterbi search, extended to the 3-D N-best search method, which enables the recognition of multiple sound sources. The baseline system integrates two existing technologies--the 3-D Viterbi search and conventional N-best search--into a complete system. A first evaluation of the 3-D N-best search-based system showed that new ideas were necessary to develop a system for the simultaneous recognition of multiple sound sources, and it identified two factors that play important roles in system performance, namely the different likelihood ranges of the sound sources and the direction-based separation of the hypotheses. To address these problems, we implemented likelihood normalization and a path distance-based clustering technique in the baseline 3-D N-best search-based system. The performance of our system was evaluated through experiments on simulated data for the case of two talkers. The experiments showed significant improvements from these two techniques. The best results were obtained by implementing both techniques and using a microphone array composed of 32 channels: the word accuracy for the two talkers was higher than 80%, and the simultaneous word accuracy (both sources recognized correctly at the same time) was higher than 70%, which are very promising results.
Tatsuhiro YONEKURA Rikako NARISAWA Yoshiki WATANABE
This paper proposes a new three-dimensional pointing scheme that emphasizes user friendliness and freedom from cable clutter. The proposed method provides five degrees of freedom through nonverbal human voice: the spatial direction of the sound source, the type of the voice phoneme and the tone of the voice phoneme. The input voice is analyzed with respect to these factors, which then trigger effects predefined for the human interface. In this paper, the estimated spatial direction drives the three-dimensional movement of a virtual object, providing three degrees of freedom, while the type and the tone of the voice phoneme provide the remaining two. Since producing nonverbal voice is an everyday act, and the intonation of the voice can be controlled quite easily and intentionally by human vocal ability, the proposed scheme constitutes a new medium for three-dimensional spatial interaction. In this sense, this paper realizes a cost-effective and handy nonverbal interface that requires no wearable devices, which might otherwise cause physical and psychological fatigue. Using a prototype, the authors evaluate the performance of the scheme from both static and dynamic points of view, show some advantages in look and feel, and discuss possible applications of the proposed scheme.
Shinichi SATO Takuro SATO Atsushi FUKASAWA
This paper describes a method for estimating the locations of multiple sound sources based on a neural network algorithm, together with its performance. An evaluation function is first defined to reflect both the spherical wavefront propagation of sound and the uniqueness of the solution. A neural network is then constructed to satisfy the conditions of this evaluation function, and the locations of the multiple sources are given by the excited neurons. The proposed method is evaluated and compared with a deterministic method based on the Hyperbolic Method for the case of 8 sources on a 200 m × 200 m square plane. The solutions are obtained correctly, without any spurious or dropped solutions. The proposed method is also applied to another case, in which 54 sound sources form 9 groups of 6 sources each. The proposed method is found to be effective and sufficient for practical application.