Tetsuo KOSAKA Shigeki SAGAYAMA
We discuss how to automatically determine the number of mixture components in continuous mixture density HMMs (CHMMs). CHMMs have come into wide use in recent years. One of the major problems with a CHMM is how to determine its structure, that is, how many mixture components and states it has and its optimal topology. The number of mixture components has so far been determined heuristically. To solve this problem, we first investigate the influence of the number of mixture components on model parameters and the output log likelihood value. As a result, in contrast to the principle of "mixture number uniformity", which is applied in conventional approaches to determine the number of mixture components, we propose the principle of "distribution size uniformity". An algorithm is introduced for automatically determining the number of mixture components. The performance of this algorithm is shown through recognition experiments involving all Japanese phonemes. Two types of experiments are carried out. One assumes that the number of mixture components for each state is the same within a phonetic model but may vary between states belonging to different phonemes. The other assumes that each state has a variable number of mixture components. Both experiments give better results than the conventional method.
In this study, focusing on an energy (or intensity) scaled variable of acoustic systems, a new regression analysis method is first proposed theoretically by introducing a multiplicative noise model suited to positively scaled stochastic systems. The effectiveness of the proposed method is then confirmed experimentally by applying it to actual acoustic data.
A new leaky surface wave on lithium tetraborate that propagates along the surface with a higher phase velocity than that of ordinary leaky surface waves, radiating two bulk wave terms into the solid, is described.
Applying the Wigner distribution, which has high time resolution and a high capability for reducing random noise, to acoustic bio-signals increased the possibility of early diagnosis of both intracranial vascular deformation and prosthetic cardiac valve malfunction. In the latter case in particular, the 1st-order local moment of the distribution proved effective.
In an actual acoustic environment, stochastic processes exhibit various non-Gaussian distribution forms, and various nonlinear correlations potentially exist between time series in addition to the linear correlation. In this study, a nonlinear ARMA model based on Bayes' theorem is proposed, in which no artificially pre-established regression function model is assumed between time series, while all of this correlation information is reflected hierarchically. The proposed method is applied to actual road traffic noise data and its practical usefulness is verified.
In direct connection with signal information processing, a practical method of identification and probabilistic prediction for sound insulation systems is theoretically proposed in object-oriented expression forms by introducing a few functional system parameters. Concretely, a method is newly proposed for identifying these functional system parameters and for probabilistically predicting the output when the panel thickness of a double-wall type sound insulation system is changed, especially in the presence of strong background noise inside the reception room. It is based on one of the wide-sense digital filters and the SEA (Statistical Energy Analysis) method. Finally, using actual music sound of an arbitrary distribution type, the effectiveness of the proposed method is confirmed experimentally by applying it to some problems of predicting the cumulative probability distribution of the transmitted sound level fluctuation.
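As a rough illustration of the two quantities named in this abstract, the sketch below computes one time slice of a discrete Wigner distribution and a first-order local moment of that slice. It is not the authors' implementation: the function names, the lag window, the test tone and the use of a circular (wrap-around) first moment are all assumptions made for this example.

```python
import cmath
import math


def wigner_slice(x, n, num_bins):
    """Discrete Wigner distribution of the complex signal x at time index n.

    Bin k corresponds to the frequency k / (2 * num_bins) cycles per sample,
    because the kernel x[n+m] * conj(x[n-m]) doubles the frequency.
    """
    max_lag = min(n, len(x) - 1 - n, num_bins // 2 - 1)
    out = []
    for k in range(num_bins):
        acc = 0j
        for m in range(-max_lag, max_lag + 1):
            # Instantaneous autocorrelation at lag m.
            r = x[n + m] * x[n - m].conjugate()
            acc += r * cmath.exp(-2j * math.pi * k * m / num_bins)
        out.append(acc.real)  # the +m and -m terms are complex conjugates
    return out


def local_mean_frequency(w):
    """Circular first-order moment of one slice, in cycles per sample.

    A circular moment is used because the discrete frequency axis wraps
    around; this is a choice of this sketch, valid for |f| < 0.25.
    """
    num_bins = len(w)
    s = sum(w[k] * cmath.exp(2j * math.pi * k / num_bins)
            for k in range(num_bins))
    return cmath.phase(s) / (4 * math.pi)


# Hypothetical test signal: a complex tone at 0.125 cycles per sample.
tone = [cmath.exp(2j * math.pi * 0.125 * n) for n in range(64)]
w_mid = wigner_slice(tone, 32, 32)
peak_bin = max(range(32), key=lambda k: w_mid[k])
```

For this tone the slice peaks at the bin that corresponds to 0.125 cycles per sample, and the local moment recovers the same frequency. A real-valued recording would first have to be converted to an analytic signal, which is omitted here.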
Takeshi INOUE Noriko WATARI Akira KAMEYAMA Michiya SUZUKI Tetsuo MIYAMA
Wide-band, low-ripple underwater transducers with high-power acoustic radiation capability have been designed on the basis of multiple-mode filter synthesis theory. In addition to piezoelectric ceramic elements, they are composed of triple acoustic matching plates and double backing plates with optimized specific acoustic impedances. One of the backing plates employs an Fe damping alloy to suppress unwanted response peaks in the frequency range above the passband region. Two 3×3 array transducers were fabricated, each with a center frequency of 200 kHz, one as a transmitter and the other as a receiver. The two transducers show high-sensitivity, low-ripple and wide-band transmitting and receiving responses. The transducers were then applied in a color video picture digital transmission system. Clear color video pictures, composed of 256×240 pixels, were successfully received within one second.
Manabu KOTANI Haruya MATSUMOTO Toshihide KANAGAWA
An attempt to apply neural networks to the acoustic diagnosis of a reciprocating compressor is described. The proposed neural network, the Hybrid Neural Network (HNN), is composed of two multi-layered neural networks, an Acoustic Feature Extraction Network (AFEN) and a Fault Discrimination Network (FDN). The AFEN has multiple layers, and the number of units in the middle hidden layer is smaller than in the other layers. The input patterns of the AFEN are logarithmic power spectra. In the AFEN, the error back propagation method is applied as the learning algorithm and the target patterns for the output layer are the same as the input patterns. After learning, the hidden layer holds a compressed representation of the input information. The architecture of the AFEN appropriate for the acoustic diagnosis is examined. This includes the determination of the form of the activation function in the output layer, the number of hidden layers and the numbers of units in the hidden layers. The FDN is composed of three layers and its learning algorithm is the same as that of the AFEN. The appropriate number of units in the hidden layer of the FDN is examined. The input patterns of the FDN are fed from the output of the hidden layer in the trained AFEN. The task of the HNN is to discriminate the types of faults in two elements of the compressor, the valve plate and the valve spring. The performance of the FDN is compared for three different inputs: the output of the hidden layer in the AFEN, the conventional cepstral coefficients and the filterbank outputs. Furthermore, the FDN itself is compared to the conventional pattern recognition technique based on a feature vector distance, the Euclidean distance measure, where the input is taken from the AFEN. The obtained results show that the discrimination accuracy with the HNN is better than that with the other combinations of discrimination method and input. The output criterion of the network for practical use is also discussed. The discrimination accuracy with this criterion is 85.4%, and in no case is a fault condition mistaken for the normal condition. These results suggest that the proposed decision network is effective for acoustic diagnosis.
Shinichi SATO Takuro SATO Atsushi FUKASAWA
A method for estimating multiple sound source locations based on a neural network algorithm, and its performance, are described in this paper. An evaluation function is first defined to reflect both the spherical wavefront propagation of sound and the uniqueness of the solution. A neural network is then constructed to satisfy the conditions for this evaluation function. Locations of multiple sources are given as excited neurons. The proposed method is evaluated and compared with the deterministic method based on the Hyperbolic Method for the case of 8 sources on a square plane of 200 m × 200 m. It is found that the solutions are obtained correctly without any pseudo or dropped-out solutions. The proposed method is also applied to another case in which 54 sound sources are composed of 9 sound groups, each of which contains 6 sound sources. The proposed method is found to be effective and sufficient for practical application.
This paper proposes a new adaptive algorithm for acoustic echo cancellers that converges four times as fast as the normalized LMS (NLMS) algorithm for a speech input, at almost the same computational load. This algorithm reflects both the statistics of the variation of a room impulse response and the whitening of the received input signal. This algorithm, called the ESP (exponentially weighted step-size projection) algorithm, uses a different step size for each coefficient of an adaptive transversal filter. These step sizes are time-invariant and weighted in proportion to the expected variation of a room impulse response. As a result, the algorithm adjusts coefficients with large errors in large steps, and coefficients with small errors in small steps. The algorithm is based on the fact that the expected variation of a room impulse response becomes progressively smaller along the series by the same exponential ratio as the impulse response energy decay. This algorithm also reflects the whitening of the received input signal, i.e., it removes the correlation between consecutive received input vectors. This process is effective for speech, which has a highly non-white spectrum. A geometric interpretation of the proposed algorithm is derived and the convergence condition is proved. A fast projection algorithm is introduced to reduce the computational complexity and is modified for a practical multiple-DSP structure so that it requires almost the same computational load, 2L multiply-add operations, as the conventional NLMS. The algorithm is implemented in an acoustic echo canceller constructed with multiple DSP chips, and its fast convergence is demonstrated.
An acoustic echo canceller that also cancels room noise is proposed. This system has an additive (noise reference) input port, and a noise canceller (NC) precedes the echo canceller (EC) in a cascade configuration. The adaptation control problem for the cascaded echo and noise canceller is solved by controlling the adaptation process to match the occurrence of intermittent speech/echo; the room noise is a stationary signal. A simulation shows that adaptation using the NLMS algorithm is very effective for the echo and noise cancellation. Sub-band cancelling techniques are utilized. Noise cancellation is realized with a lower band EC. Hardware is implemented and its performance is evaluated through experiments in a real acoustic field. The combination of the EC with the NC maintains excellent performance at all echo to room noise power ratios. It is shown that the proposed canceller overcomes the disadvantages traditionally associated with ECs and NCs.
This paper presents a novel algorithm for fast estimation of the coefficients of an adaptive FIR filter. The algorithm is derived from a first-order IIR filter expression that clarifies the estimation process of the NLMS (normalized least mean square) algorithm. The expression shows that the estimation process is equivalent to a procedure that extracts the cross-correlation coefficient between the input and the output of the unknown system to be estimated. This interpretation allows the subtraction of the echo replica to be moved beyond the IIR filter, which gives a structure with an IIR filter coefficient of unity that forms the arithmetic mean. Compared with the conventional NLMS algorithm, this structure greatly improves the convergence rate. Moreover, when the structure is used with a simple technique that limits the term over which the correlation coefficient is calculated at the beginning of the convergence process, the convergence delay becomes negligible. This is a very desirable property for an acoustic echo canceller. In this paper, double-talk and echo path fluctuation are also studied as a first stage toward application to acoustic echo cancellers. These two problems can be resolved by introducing two switches and delays into the evaluation process of the correlation coefficient.
The realization of an acoustic inverse filter is often difficult because of the non-minimum phase property and the long duration of the impulse response of the acoustic enclosure. However, if the signals are divided into a large number of sub-bands, many of the sub-bands are found to be invertible. The invertibility of a sub-band signal depends on the zero distribution of the transfer function in the z-plane. In a multi-microphone system, the transfer functions between the sound source and the microphones have different zero distributions. The method proposed here, taking advantage of the differences in zero distributions, selects the best invertible microphone in each sub-band, and reconstructs the full band signal by summing up the inverse filtered sub-band signals of the best microphones. The quality of the dereverberated signal using the proposed inverse filtering approach improves as the numbers of microphones and sub-bands increase. When seven microphones are used and the number of sub-bands is 513, the quality of the dereverberated speech signals is almost the same as that of the original ones even when the reverberation time is about one second. The introduction of multiple microphones in addition to sub-band processing provides a new way of dealing with the non-minimum phase problem in deconvolution.
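The spherical-propagation constraint mentioned in this abstract can be illustrated with a much simpler stand-in: a brute-force grid search that matches arrival-time differences for a single source. This is neither the neural network method nor the Hyperbolic Method of the paper. The sensor layout, the speed of sound, the grid step and all function names are assumptions made for this sketch.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed value


def arrival_times(source, sensors):
    """Propagation delay from a point source to each sensor (spherical wave)."""
    return [math.dist(source, s) / SPEED_OF_SOUND for s in sensors]


def locate_by_grid_search(times, sensors, size=200.0, step=5.0):
    """Return the grid point whose time differences best match the data."""
    best, best_cost = None, float("inf")
    steps = int(size / step) + 1
    for i in range(steps):
        for j in range(steps):
            p = (i * step, j * step)
            pred = arrival_times(p, sensors)
            # Differences relative to sensor 0 cancel the unknown emission time.
            cost = sum(((times[k] - times[0]) - (pred[k] - pred[0])) ** 2
                       for k in range(1, len(sensors)))
            if cost < best_cost:
                best, best_cost = p, cost
    return best


# Hypothetical setup: four sensors at the corners of a 200 m x 200 m square.
sensors = [(0.0, 0.0), (200.0, 0.0), (0.0, 200.0), (200.0, 200.0)]
true_source = (60.0, 140.0)
emission_time = 0.37  # unknown to the estimator
observed = [t + emission_time for t in arrival_times(true_source, sensors)]
estimate = locate_by_grid_search(observed, sensors)
```

With several simultaneous sources, the arrival times at different sensors must also be matched to the correct source. A plain search like this one does not address that association problem, which is where pseudo and dropped-out solutions can arise.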
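The step-size idea in this abstract can be sketched as follows. This is a simplified illustration and not the ESP algorithm itself. It keeps only the fixed, exponentially decaying per-coefficient step sizes on top of an ordinary NLMS update, and it omits the projection (whitening) part. The function names, the decay ratio `gamma`, the filter length and the synthetic echo path are assumptions made for this example.

```python
import random


def echo(x, h):
    """Output of an FIR echo path h driven by the input x."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def es_nlms(x, d, num_taps, alpha0=1.0, gamma=0.9, eps=1e-12):
    """NLMS with a fixed, exponentially decaying step size per coefficient."""
    steps = [alpha0 * gamma ** i for i in range(num_taps)]  # time-invariant
    h_hat = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input sample first
    for x_n, d_n in zip(x, d):
        buf = [x_n] + buf[:-1]
        err = d_n - sum(h * b for h, b in zip(h_hat, buf))
        norm = sum(b * b for b in buf) + eps
        # Early coefficients (large expected variation) take larger steps.
        h_hat = [h + a * err * b / norm
                 for h, a, b in zip(h_hat, steps, buf)]
    return h_hat


def misalignment(h, h_hat):
    """Squared coefficient error relative to the squared norm of h."""
    return (sum((a - b) ** 2 for a, b in zip(h, h_hat))
            / sum(a * a for a in h))


# Hypothetical echo path whose envelope decays by the same ratio as the steps.
rng = random.Random(0)
true_h = [rng.gauss(0, 1) * 0.9 ** i for i in range(16)]
far_end = [rng.gauss(0, 1) for _ in range(3000)]
mic = echo(far_end, true_h)
h_est = es_nlms(far_end, mic, num_taps=16)
```

In this noise-free setting with white input, the estimate converges to the simulated echo path. Setting `gamma=1.0` reduces the routine to plain NLMS, which gives a baseline for comparing convergence curves.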
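The link between coefficient estimation and cross-correlation can be checked numerically with a short sketch. It shows only the textbook identity for white input: the cross-correlation between input and output, divided by the input power, approximates the impulse response. It does not reproduce the IIR-based structure, the echo replica subtraction or the switches described in the abstract. The system coefficients and function name are hypothetical.

```python
import random


def xcorr_estimate(x, y, num_taps):
    """Estimate FIR coefficients as averaged cross-correlation over power."""
    est = []
    for k in range(num_taps):
        # Both sums run over the same samples, so their ratio equals the
        # ratio of the corresponding arithmetic means.
        num = sum(x[n - k] * y[n] for n in range(k, len(x)))
        den = sum(x[n - k] ** 2 for n in range(k, len(x)))
        est.append(num / den)
    return est


# Hypothetical unknown system and white Gaussian input.
rng = random.Random(1)
system = [0.8, -0.5, 0.3, 0.1]
x = [rng.gauss(0, 1) for _ in range(20000)]
y = [sum(system[i] * x[n - i] for i in range(len(system)) if n - i >= 0)
     for n in range(len(x))]
estimate = xcorr_estimate(x, y, len(system))
```

For a colored input such as speech, the plain ratio is biased by the correlation between input samples, so this sketch should be read only as an illustration of the white-input case.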
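The dependence of invertibility on the zero distribution can be made concrete with a toy example. An FIR transfer function has a stable causal inverse only if all of its zeros lie inside the unit circle. The sketch below applies this test to three hypothetical 3-tap responses and picks the one whose outermost zero is closest to the origin. The responses, the selection rule and the function names are assumptions for illustration. The paper works on sub-band signals and may use a different criterion.

```python
import cmath


def zeros_of_3tap(b0, b1, b2):
    """Zeros of H(z) = b0 + b1 z^-1 + b2 z^-2, i.e. roots of b0 z^2 + b1 z + b2."""
    disc = cmath.sqrt(b1 * b1 - 4 * b0 * b2)
    return ((-b1 + disc) / (2 * b0), (-b1 - disc) / (2 * b0))


def max_zero_radius(h):
    """Largest zero magnitude; a value below 1 means minimum phase."""
    return max(abs(z) for z in zeros_of_3tap(*h))


def pick_most_invertible(responses):
    """Index of the response whose outermost zero is closest to the origin."""
    return min(range(len(responses)),
               key=lambda i: max_zero_radius(responses[i]))


# Hypothetical responses for three microphones in one sub-band.
responses = [
    (1.0, -2.5, 1.0),   # zeros at 2 and 0.5: not invertible
    (1.0, -0.9, 0.2),   # zeros at 0.5 and 0.4: invertible
    (1.0, 0.0, 0.81),   # zeros at +0.9j and -0.9j: invertible, near the circle
]
best = pick_most_invertible(responses)
```

Repeating such a selection independently in every sub-band gives each band a chance of finding at least one microphone with a well-behaved inverse, which matches the motivation stated in the abstract.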
Tsuyoshi USAGAWA Hideki MATSUO Yuji MORITA Masanao EBATA
This paper proposes a new adaptive algorithm for FIR digital filters, intended for acoustic echo cancellers and similar application fields. Unlike a line echo canceller, an acoustic echo canceller requires a large number of taps, and it must work appropriately while driven by a colored input signal. By controlling the filter tap length and updating the filter coefficients multiple times during a single sampling interval, the proposed algorithm improves the convergence characteristics of adaptation even when a colored input signal is applied. This algorithm is named VT-LMS, after variable tap length LMS. Simulation results show the effectiveness of the proposed algorithm not only for white noise but also for colored input signals such as speech. The VT-LMS algorithm has better convergence characteristics than the conventional algorithm, with very little extra computational load.
Yumi TAKIZAWA Shinichi SATO Keisuke ODA Atsushi FUKASAWA
This paper describes a nonstationary spectral analysis method and its application to the prognosis and diagnosis of automobiles. An instantaneous frequency spectrum is considered first at a single point in time based on the instantaneous representation of the autocorrelation. The spectral distortion is then considered on the two-dimensional spectrum, and filtering is introduced into the instantaneous autocorrelations. By the above procedure, the Instantaneous Covariance method (ICOV), the Instantaneous Maximum Entropy Method (IMEM), and the Wigner method are derived and unified. The IMEM is used for the time-dependent spectral estimation of vibration and acoustic sound signals of automobiles. A multi-dimensional (M-D) space is composed based on the variables obtained by the IMEM. The M-D space is transformed into a simple two-dimensional (2-D) plane by a projection matrix chosen experimentally. The proposed method is confirmed to be useful for analyzing nonstationary signals, and it is expected to enable automatic supervision, prognosis and diagnosis for a traffic system.
It often occurs in the acoustic environment that a specific signal is contaminated by additional noise of a non-Gaussian distribution type. In order to extract exactly the various statistical information of only the specific signal from the observed noisy data, stochastic signal processing by digital computer is essential. In this study, a stochastic method for estimating the probability function of the specific signal embedded in the additional noise is first theoretically proposed in a form suitable for quantized level observation. Then, the effectiveness of the proposed method is experimentally confirmed by applying it to data observed in the acoustic environment.
It often occurs in environmental phenomena in our daily life that a specific signal is partially or completely contaminated by additional external noise. In this study, a digital filter is proposed for estimating a specific signal that fluctuates impulsively in the presence of actual external noise with various kinds of probability distribution forms. It is an improved form of a previously reported digital filter. The effectiveness of the proposed theory is experimentally confirmed by applying it to the estimation of an actual impulsive signal in room acoustics.
Hisakazu KIKUCHI Makoto NAKASHIZUKA Hiromichi WATANABE Satoru WATANABE Naoki TOMISAWA
A fast wavelet transform is presented for real-time processing of wavelet transforms. At the architectural level, a processor for the fast wavelet transform has a frequency sampling structure. The fast wavelet transform owes its parallelism both to the frequency sampling structure and to the parallel tapping of a series of delay elements. The computational burden of the fast transform is hence independent of the specific scale values in the wavelets, and the parallel processing of the fast transform is readily implemented for real-time applications. This point is quite different from the computation of wavelet transforms by convolution. We applied the fast wavelet transform to detecting detonation in a vehicle engine for precise real-time control of ignition advancement. The prototype wavelet for this experiment was the Gaussian wavelet (i.e. Gabor function), which is known to have the least spread both in time and in frequency. The number of complex multiplications needed to compute the fast wavelet transform over 51 scales is 714 in this experiment, which is less than one tenth of that required for the convolution method. Experimental results have shown that detonation is successfully detected from the acoustic vibration signal picked up by a single knock sensor embedded in the outer wall of a V-8 engine and is discriminated from other environmental mechanical vibrations.
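For comparison, the convolution method that this abstract contrasts with can be sketched directly. The code below is the baseline computation of a Gabor wavelet coefficient. It is not the fast frequency-sampling transform of the paper, whose structure the abstract does not specify in detail. The wavelet parameters `f0` and `sigma`, the truncation at four standard deviations, the test signal and the function names are assumptions made for this example.

```python
import cmath
import math


def gabor(t, f0=0.1, sigma=10.0):
    """Gaussian-windowed complex exponential (Gabor function) at time t."""
    return (math.exp(-t * t / (2 * sigma * sigma))
            * cmath.exp(2j * math.pi * f0 * t))


def wavelet_coefficient(x, n0, scale, f0=0.1, sigma=10.0):
    """Wavelet coefficient of x at time n0 and one scale, by direct summation."""
    half = int(4 * sigma * scale)  # truncate the Gaussian at about 4 sigma
    acc = 0j
    for m in range(-half, half + 1):
        if 0 <= n0 + m < len(x):
            acc += x[n0 + m] * gabor(m / scale, f0, sigma).conjugate()
    return acc / math.sqrt(scale)


def multiplications_per_coefficient(scale, sigma=10.0):
    """Number of products in the summation above; it grows with the scale."""
    return 2 * int(4 * sigma * scale) + 1


# Hypothetical test signal: a real tone at 0.05 cycles per sample.
signal = [math.cos(2 * math.pi * 0.05 * n) for n in range(512)]
scales = [1, 2, 4]  # centre frequencies 0.1, 0.05 and 0.025 cycles per sample
magnitudes = [abs(wavelet_coefficient(signal, 256, s)) for s in scales]
best_scale = scales[magnitudes.index(max(magnitudes))]
```

The response is largest at the scale whose centre frequency `f0 / scale` matches the tone. The multiplication count per coefficient grows in proportion to the scale, which is the cost that a scale-independent structure avoids.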