A novel pulse neural network model for sound localization is proposed. The model is based on the physiological auditory nervous system. Human beings perceive sound direction using the inter-aural time difference (ITD) and the inter-aural level difference (ILD) between the sounds reaching the two ears. The model extracts these features using only pulse train information. It is divided roughly into three sections: preprocessing of the input signals; transformation of continuous signals into pulse trains; and feature extraction. The last section consists of two parts, an ITD extractor and an ILD extractor. Both extractors are implemented using a pulse neuron model. They share the same network structure, differing only in the parameters and arrangement of the pulse neuron model. The pulse neuron model receives pulse trains and outputs a pulse train. Because the pulses carry only simple information, their data structures are simple and clear, so a strict design is not required to implement the model. These advantages make the model well suited to hardware realization. A computer simulation demonstrated that the model successfully extracts time and level differences between two signals.
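As an illustration of the pulse neuron model described above, the following minimal Python sketch treats it as a leaky integrator that fires when its internal potential crosses a threshold; the function name, the decay factor tau and the threshold value are illustrative assumptions, not parameters taken from the paper.

    import numpy as np

    def pulse_neuron(input_spikes, weights, tau=0.9, threshold=1.0):
        """Minimal leaky-integrator sketch of a pulse neuron: weighted input
        pulses raise an internal potential that decays by a factor tau each
        step; an output pulse is emitted when the potential crosses threshold.
        input_spikes has shape (n_inputs, n_steps) with 0/1 entries."""
        n_steps = input_spikes.shape[1]
        potential = 0.0
        output = np.zeros(n_steps, dtype=int)
        for t in range(n_steps):
            potential = tau * potential + np.dot(weights, input_spikes[:, t])
            if potential >= threshold:
                output[t] = 1
                potential = 0.0  # reset after firing
        return output

Arranging such neurons behind inter-aural delay lines, so that each neuron responds most strongly to one particular delay, is one common way to realize an ITD extractor of the kind the abstract describes.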
Anto Satriyo NUGROHO Susumu KUROYANAGI Akira IWATA
Studies on artificial neural networks have been conducted for a long time, and their contribution has been shown in many fields. However, applying neural networks to real-world domains remains a challenge, since nature does not always provide the satisfactory conditions these methods require. One example is the class-imbalance condition, in which one class is heavily under-represented compared to another. This condition is often found in real-world domains and presents several difficulties for algorithms that assume balanced classes. In this paper, we propose a method for solving problems posed by imbalanced training sets by applying the modified large-scale neural network "CombNET-II." CombNET-II consists of two types of neural networks. The first is a one-layer vector quantization neural network that turns the problem into a more balanced condition. The second consists of several modules of three-layered multilayer perceptrons trained by backpropagation for finer classification. CombNET-II combines the two types of neural networks to solve the problem effectively within a reasonable time. The performance is then evaluated by applying the model to a practical fog forecasting problem. Fog forecasting is an imbalanced-training-set problem, since the probability of fog appearing at the observation location is very low. Fog events must be predicted every 30 minutes based on observations of meteorological conditions. Our experiments showed that CombNET-II achieved a higher prediction rate than the k-nearest neighbor classifier and a three-layered multilayer perceptron trained with backpropagation. Part of this research was presented at the 1999 Fog Forecasting Contest sponsored by the Neurocomputing Technical Group of IEICE, Japan, where CombNET-II achieved the highest accuracy among the participants.
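The stem-and-branch structure described above can be sketched, very loosely, with off-the-shelf components: a quantizer routes each sample to a cluster and a small MLP classifies within that cluster. This is only an illustrative approximation; KMeans and MLPClassifier stand in for the one-layer VQ stem and the backpropagation-trained branches, and the class name TwoStageClassifier is hypothetical.

    import numpy as np
    from sklearn.cluster import KMeans                 # stand-in for the VQ stem network
    from sklearn.neural_network import MLPClassifier   # stand-in for the MLP branches

    class TwoStageClassifier:
        """Illustrative stem/branch structure. Assumes every cluster
        found by the stem contains samples from at least two classes."""
        def __init__(self, n_clusters=4):
            self.stem = KMeans(n_clusters=n_clusters, n_init=10)
            self.branches = {}

        def fit(self, X, y):
            clusters = self.stem.fit_predict(X)
            for c in np.unique(clusters):
                idx = clusters == c
                self.branches[c] = MLPClassifier(hidden_layer_sizes=(32,),
                                                 max_iter=500).fit(X[idx], y[idx])
            return self

        def predict(self, X):
            clusters = self.stem.predict(X)
            return np.array([self.branches[c].predict(x.reshape(1, -1))[0]
                             for c, x in zip(clusters, X)])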
Mauricio KUGLER Teemu TOSSAVAINEN Susumu KUROYANAGI Akira IWATA
Sound localization systems are widely studied and have several potential applications, including hearing aid devices, surveillance and robotics. However, few proposed solutions target portable systems, such as wearable devices, which require a small, unobtrusive platform, or unmanned aerial vehicles, in which weight and low power consumption are critical. The main objective of this research is to achieve real-time sound localization in a small, self-contained device, without relying on large shaped platforms or complex microphone arrays. The proposed device has two surface-mount microphones spaced only 20 mm apart. Such reduced dimensions present challenges for the implementation, as differences in level and spectra become negligible, and only the time-difference of arrival (TDoA) can be used as a localization cue. Three main issues have to be addressed to accomplish these objectives. To achieve real-time processing, the TDoA is calculated by applying zero-crossing spikes to the hardware-friendly Jeffress model. To compensate for the loss of resolution due to the small dimensions, the signal is upsampled several-fold within the system. Finally, coherence-based spectral masking is used to select only frequency components with relevant TDoA information. The proposed system was implemented on a field-programmable gate array (FPGA) based platform, due to the large number of concurrent and independent tasks, which can be efficiently parallelized in reconfigurable hardware devices. Experimental results with white noise and environmental sounds show high accuracies for both anechoic and reverberant conditions.
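A rough Python sketch of the processing chain described above (upsampling, zero-crossing spike encoding and Jeffress-style coincidence counting) is given below; the upsampling factor, lag range and function names are assumptions for illustration, not the values used in the FPGA implementation.

    import numpy as np
    from scipy.signal import resample_poly

    def zero_crossing_spikes(x):
        """Emit a spike (1) at each positive-going zero crossing."""
        s = np.signbit(x)
        return np.concatenate(([0], (s[:-1] & ~s[1:]).astype(int)))

    def tdoa_estimate(left, right, fs, upsample=8, max_lag_s=0.0001):
        """Upsample, spike-encode, and pick the delay with the most
        coincidences (Jeffress-style). A 20 mm spacing keeps |TDoA|
        below roughly 60 microseconds."""
        l = resample_poly(left, upsample, 1)
        r = resample_poly(right, upsample, 1)
        fs_up = fs * upsample
        ls, rs = zero_crossing_spikes(l), zero_crossing_spikes(r)
        max_lag = int(max_lag_s * fs_up)
        lags = list(range(-max_lag, max_lag + 1))
        counts = [np.sum(ls * np.roll(rs, k)) for k in lags]
        return lags[int(np.argmax(counts))] / fs_up  # TDoA in seconds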
Yuji IWAHORI Shinji FUKUI Robert J. WOODHAM Akira IWATA
This paper proposes a new approach to recover the sign of the local surface curvature of an object from three shading images using a neural network. An RBF (Radial Basis Function) neural network is used to learn the mapping from three image irradiances to a position on a sphere. The learned network then maps the image irradiances at neighboring pixels of the test object, taken under three illuminating directions, onto images of a sphere taken under the same illumination conditions. Using the property that the six basic kinds of surface curvature produce different relative locations of the five local points mapped onto the sphere, not only the Gaussian curvature but also the kind of curvature is recovered directly and locally from the arrangement of the mapped points, without knowing the surface gradient at each point. Furthermore, a two-step neural network that combines the forward mapping with its inverse can be used to obtain a local confidence estimate for the results. The entire approach is non-parametric and empirical in that no explicit assumptions are made about light source directions or surface reflectance. Results are demonstrated through experiments on real images.
Shinji FUKUI Yuji IWAHORI Robert J. WOODHAM Kenji FUNAHASHI Akira IWATA
This paper proposes a new method to recover the sign of local Gaussian curvature from multiple (more than three) shading images. The information required to recover the sign of Gaussian curvature is obtained by applying Principal Components Analysis (PCA) to the normalized irradiance measurements. The sign of the Gaussian curvature is recovered from the relative orientation, in a 2-D subspace called the eigen plane, of measurements obtained on a local five-point test pattern. Using multiple shading images gives a more accurate and robust result and minimizes the effect of shadows by allowing a larger area of the visible surface to be analyzed than methods using only three shading images. Furthermore, it allows the method to be applied to specular surfaces. Since PCA removes linear correlations among images, the method produces high-quality results even when the light source directions are not widely dispersed.
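As a hedged illustration of the projection onto the eigen plane mentioned above, the sketch below normalizes the per-pixel irradiance vectors and projects them onto the first two principal components; the five-point test pattern and the actual sign decision rule of the method are not reproduced here.

    import numpy as np

    def eigenplane_projection(irradiance):
        """Project normalized irradiance measurements (n_pixels x n_images)
        onto the 2-D subspace spanned by the first two principal components."""
        # Normalize each pixel's measurement vector to unit length
        norms = np.linalg.norm(irradiance, axis=1, keepdims=True)
        normalized = irradiance / np.maximum(norms, 1e-12)
        centered = normalized - normalized.mean(axis=0)
        # PCA via SVD; rows of vt are the principal directions
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        return centered @ vt[:2].T  # coordinates in the eigen plane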
Hirofumi TSUZUKI Mauricio KUGLER Susumu KUROYANAGI Akira IWATA
This paper presents a sound localization method based on a Complex-Valued Neural Network. The proposed approach uses two microphones to localize sound sources in the whole horizontal plane. The method uses time delay and amplitude difference to generate a set of features, which are then classified by a Complex-Valued Multi-Layer Perceptron. The advantage of using complex values is that the amplitude information can naturally mask the phase information. The proposed method is analyzed experimentally with regard to the spectral characteristics of the target sounds and its tolerance to noise. The obtained results emphasize and confirm the advantages of Complex-Valued Neural Networks over the traditional Real-Valued Neural Network model for the sound localization problem.
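The masking argument above can be illustrated with a hedged sketch in which each frequency band is encoded as a complex number whose magnitude is the band amplitude and whose angle is the inter-channel phase difference; bands with little energy then contribute little to the complex-valued input. The exact feature definition used in the paper may differ.

    import numpy as np

    def complex_features(left, right, fs, n_fft=512):
        """Per-frequency complex feature: band amplitude as magnitude and
        inter-channel phase difference as angle. Low-amplitude bands
        naturally 'mask' their (unreliable) phase information."""
        L = np.fft.rfft(left, n_fft)
        R = np.fft.rfft(right, n_fft)
        amplitude = np.abs(L) + np.abs(R)           # overall band energy
        phase_diff = np.angle(L) - np.angle(R)      # inter-channel phase
        return amplitude * np.exp(1j * phase_diff)  # complex-valued input vector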
Mauricio KUGLER Teemu TOSSAVAINEN Miku NAKATSU Susumu KUROYANAGI Akira IWATA
The development of assistive devices for automated sound recognition is an important field of research and has been receiving increased attention. However, there are still very few methods specifically developed for identifying environmental sounds. The majority of the existing approaches try to adapt speech recognition techniques to the task, usually incurring high computational complexity. This paper proposes a sound recognition method dedicated to environmental sounds, designed with its main focus on embedded applications. The pre-processing stage is loosely based on the human hearing system, while a robust set of binary features permits a simple k-NN classifier to be used. This gives the system the capability of in-field learning, by which new sounds can simply be added to the reference set in real time, greatly improving its usability. The system was implemented on an FPGA based platform, developed in-house specifically for this application. The design of the proposed method took into consideration several restrictions imposed by the hardware, such as limited computing power and memory, and supports up to 12 reference sounds of around 5.3 s each. Experiments were performed on a database of 29 sounds. Sensitivity and specificity were evaluated over several random subsets of these signals. The obtained values for sensitivity and specificity, without additional noise, were, respectively, 0.957 and 0.918. With the addition of +6 dB of pink noise, sensitivity and specificity were 0.822 and 0.942, respectively. The in-field learning strategy presented no significant change in sensitivity and a total decrease of 5.4% in specificity when progressively increasing the number of reference sounds from 1 to 9 under noisy conditions. The minimal signal-to-noise ratio required by the prototype to correctly recognize sounds was between -8 dB and 3 dB. These results show that the proposed method and implementation have great potential for several real-life applications.
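The in-field learning property described above follows naturally from a reference-based classifier: adding a new sound is just appending its feature vectors to the reference set. The sketch below illustrates this with a Hamming-distance k-NN over binary features; the class name and distance choice are assumptions, not the exact implementation.

    import numpy as np

    class BinaryKNN:
        """k-NN over binary feature vectors: 'learning' a new sound is just
        appending its reference vectors, which is what enables in-field
        learning without retraining."""
        def __init__(self, k=3):
            self.k = k
            self.refs, self.labels = [], []

        def add_reference(self, features, label):
            self.refs.append(np.asarray(features, dtype=bool))
            self.labels.append(label)

        def classify(self, features):
            x = np.asarray(features, dtype=bool)
            dists = [np.count_nonzero(x ^ r) for r in self.refs]  # Hamming distance
            nearest = np.argsort(dists)[:self.k]
            votes = [self.labels[i] for i in nearest]
            return max(set(votes), key=votes.count)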
Ahmad Fadzil ARIF Hidekazu TAKAHASHI Akira IWATA Toshio TSUTSUMIDA
This paper compares several popular character recognition techniques proposed to date. 17 feature extraction methods and 4 neural network based recognition processes were applied to handwritten numeral (postal code) recognition. It was found that the Weighted Direction Index Histogram, Peripheral Direction Contributivity Function and Expansion Cell feature extractions gave good results. As for the neural network recognition processes, CombNET-II and the multilayer neural network showed good performance.
Keiji YAMANAKA Susumu KUROYANAGI Akira IWATA
Based on previous work on handwritten Japanese kanji character recognition, a postprocessing system for handwritten Japanese address recognition is proposed. The recognition system is composed of CombNET-II, a general-purpose large-scale character recognizer, and MMVA, a modified majority voting system. Starting from a set of character candidates produced by the character recognizer for each character of the input word, together with a lexicon, an interpretation of the input word is generated. MMVA is used in the postprocessing stage to select the interpretation that accumulates the highest score. When more than one interpretation is possible, the Conflict Analyzing System calls the character recognizer again to generate scores for each character of each interpretation and determine the final output word. The proposed word recognition system was tested on two sets of handwritten Japanese city names, and recognition rates higher than 99% were achieved, demonstrating the effectiveness of the method.
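A minimal sketch of the voting idea described above is given below: each lexicon word accumulates one vote per character position where its character appears among the recognizer's candidates, and ties are left to the conflict-analysis step. The scoring rule is illustrative and simpler than the actual MMVA formulation.

    def vote_lexicon(candidates, lexicon):
        """candidates: list (one per character position) of recognizer
        candidate lists; lexicon: list of valid words. Each word gains one
        vote per position where its character appears among the candidates."""
        scores = {}
        for word in lexicon:
            if len(word) != len(candidates):
                continue
            scores[word] = sum(ch in cands for ch, cands in zip(word, candidates))
        best = max(scores.values(), default=0)
        return [w for w, s in scores.items() if s == best]  # ties go to conflict analysis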
Md. Shoaib BHUIYAN Hiroshi MATSUO Akira IWATA Hideo FUJIMOTO Makoto SATOH
Existing edge detection methods provide unsatisfactory results when contrast changes greatly within an image due to non-uniform illumination. Koch et al. developed an energy function based on the Hopfield neural network, whose coefficients were fixed by trial and error and remain constant for the entire image, irrespective of differences in intensity level. This paper presents an improved edge detection method for non-uniformly illuminated images. We propose that the energy function coefficients for an image with inconsistent illumination should not remain fixed, but rather should vary as a second-order function of the intensity differences between pixels, and we actually use a schedule of changing coefficients. The results, compared with those of existing methods, suggest a better strategy for edge detection that depends on both the dynamic range of the original image pixel values and their contrast.
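As a small illustration of the proposed coefficient schedule, the coupling coefficient for a pixel pair can be written as a quadratic function of their intensity difference; the constants below are placeholders, not the values used by the authors.

    def coefficient(delta_i, a=1.0, b=0.5, c=0.1):
        """Illustrative second-order schedule: the energy-function coefficient
        for a pixel pair grows quadratically with their intensity difference,
        so low-contrast regions still yield usable edge responses."""
        return a * delta_i ** 2 + b * delta_i + c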
Mauricio KUGLER Susumu KUROYANAGI Anto Satriyo NUGROHO Akira IWATA
Modern applications of pattern recognition generate very large amounts of data, which require large computational effort to process. However, most methods intended for large-scale problems merely adapt standard classification methods without considering whether those algorithms are appropriate for large-scale problems. CombNET-II was one of the first methods specifically proposed for such a task. Recently, an extension of this model, named CombNET-III, was proposed. The main modifications over the previous model were the substitution of the expert networks by Support Vector Machines (SVM) and the development of a general probabilistic framework. Although the previous model's performance and flexibility were improved, the low accuracy of the gating network still compromised CombNET-III's classification results. In addition, due to the use of SVM-based experts, the computational complexity is higher than that of CombNET-II. This paper proposes a new two-layered gating network structure that reduces the trade-off between the number of clusters and accuracy, increasing the model's performance with only a small increase in complexity. This high-accuracy gating network also enables the removal of low-confidence expert networks from the decoding procedure. This, together with a new, faster strategy for calculating multiclass SVM outputs, significantly reduces the computational complexity. Experimental results on problems with large numbers of categories show that the proposed model outperforms the original CombNET-III while presenting a computational complexity more than one order of magnitude smaller. Moreover, when applied to a database with a large number of samples, it outperformed all compared methods, confirming the proposed model's flexibility.
The performance of BPNN (a neural network trained by backpropagation) and PCANN (a neural network that computes principal component analysis) for ECG data compression has been investigated from several points of view. We compared them with an existing data compression method, TOMEK, using the MIT/BIH arrhythmia database as ECG data. Both BPNN and PCANN showed better results than TOMEK: they achieved 1.1 to 1.4 times higher compression than TOMEK at the same reproduction accuracy (13.0% PRD and 99.0% CC). While PCANN showed better learning ability than BPNN in a simple learning task, BPNN was slightly better than PCANN in terms of compression rate. In the reproduced waveforms, BPNN and PCANN had almost the same performance, and both were superior to TOMEK. The following characteristics were observed in the experiments. Since PCANN is sensitive to the learning rate, the learning rate had to be controlled precisely as learning progressed. PCANN also tended to need more learning iterations than BPNN to reach the same performance; although PCANN showed better learning ability, the total learning cost was therefore almost the same for both networks. We also analyzed the connection weight patterns. Since PCANN has a clear mathematical background, its behavior can be explained theoretically. BPNN sometimes generated connection weights similar to the principal components; we suppose that BPNN may occasionally generate such patterns and performs well when it does. We conclude as follows. Although the difference in performance is small, it was always observed, and PCANN never exceeded BPNN. When ease of analysis or the relation to mathematics is important, PCANN is suitable; it will be useful for statistical study of the recorded data.
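For reference, the two reproduction-accuracy measures quoted above, PRD (percent root-mean-square difference) and CC (correlation coefficient), can be computed with the standard definitions sketched below; the paper may use a slightly different normalization for PRD.

    import numpy as np

    def prd(original, reconstructed):
        """Percent root-mean-square difference between the original and
        reconstructed ECG (smaller is better)."""
        original = np.asarray(original, dtype=float)
        reconstructed = np.asarray(reconstructed, dtype=float)
        return 100.0 * np.sqrt(np.sum((original - reconstructed) ** 2)
                               / np.sum(original ** 2))

    def cc(original, reconstructed):
        """Correlation coefficient between the original and reconstructed
        ECG (closer to 1 is better)."""
        return np.corrcoef(original, reconstructed)[0, 1]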
Mauricio KUGLER Susumu KUROYANAGI Anto Satriyo NUGROHO Akira IWATA
Several research fields have to deal with very large classification problems, e.g. handwritten character recognition and speech recognition. Many works have proposed methods to address problems with large numbers of samples, but little has been done concerning problems with large numbers of classes. CombNET-II was one of the first methods proposed for such a task. It consists of a sequential-clustering VQ based gating network (stem network) and several Multilayer Perceptron (MLP) based expert classifiers (branch networks). With the objectives of increasing classification accuracy and providing a more flexible model, this paper proposes a new model based on the CombNET-II structure, CombNET-III. The new model, intended for, but not limited to, problems with a large number of classes, replaces the MLP branch networks with multiclass Support Vector Machines (SVM). It also introduces a new probabilistic framework that outputs posterior class probabilities, enabling the model to be applied in different scenarios (e.g. together with Hidden Markov Models). These changes permit the use of a larger number of smaller clusters, which reduces the complexity of the final classifiers. Moreover, the use of binary SVMs with probabilistic outputs and a probabilistic decoding scheme permits a pairwise output encoding in the branch networks, which reduces the computational complexity of the training stage. The experimental results show that the proposed model outperforms both the previous model, CombNET-II, and a single multiclass SVM, while presenting considerably smaller complexity than the latter. It is also confirmed that CombNET-III's classification accuracy scales better than CombNET-II's as the number of clusters increases.
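Under the usual mixture-of-experts reading of the probabilistic framework described above, the posterior class probability combines the gating and branch outputs roughly as follows (a hedged reconstruction, not the paper's exact notation):

    P(\omega_k \mid x) = \sum_{j=1}^{M} P(c_j \mid x)\, P(\omega_k \mid c_j, x)

where P(c_j | x) is given by the stem (gating) network and P(\omega_k | c_j, x) by the SVM-based branch network responsible for cluster c_j.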
Marcus Vinicius LAMAR Md. Shoaib BHUIYAN Akira IWATA
This paper presents a new neural network structure, called Temporal-CombNET (T-CombNET), dedicated to time series analysis and classification. It was developed from a large-scale neural network structure, CombNET-II, which is designed to deal with very large vocabularies, such as in Japanese character recognition. Our modifications of the original CombNET-II model allow it to perform temporal analysis and to be used in a system for recognizing a large set of human movements. In the T-CombNET structure, one of the most important parameters to be set is the space division criterion. In this paper we analyze some practical approaches and present a criterion based on an interclass distance measurement. The performance of T-CombNET is analyzed by applying it to a practical problem, Japanese Kana finger spelling recognition. The obtained results show a recognition rate superior to those of other structures, including the Multi-Layer Perceptron, Learning Vector Quantization, Elman and Jordan partially recurrent neural networks, CombNET-II, and k-NN.