Makoto SAKAI Norihide KITAOKA Seiichi NAKAGAWA
To precisely model the time dependency of features is one of the important issues for speech recognition. Segmental unit input HMM with a dimensionality reduction method has been widely used to address this issue. Linear discriminant analysis (LDA) and heteroscedastic extensions, e.g., heteroscedastic linear discriminant analysis (HLDA) or heteroscedastic discriminant analysis (HDA), are popular approaches to reduce dimensionality. However, it is difficult to find one particular criterion suitable for any kind of data set in carrying out dimensionality reduction while preserving discriminative information. In this paper, we propose a new framework which we call power linear discriminant analysis (PLDA). PLDA can be used to describe various criteria including LDA, HLDA, and HDA with one control parameter. In addition, we provide an efficient selection method using a control parameter without training HMMs nor testing recognition performance on a development data set. Experimental results show that the PLDA is more effective than conventional methods for various data sets.
Kohei MIYASE Kenta TERASHIMA Xiaoqing WEN Seiji KAJIHARA Sudhakar M. REDDY
If a test set for more complex faults than stuck-at faults is generated, higher defect coverage would be obtained. Such a test set, however, would have a large number of test vectors, and hence the test costs would go up. In this paper we propose a method to detect bridge defects with a test set initially generated for stuck-at faults in a full scan sequential circuit. The proposed method doesn't add new test vectors to the test set but modifies test vectors. Therefore there are no negative impacts on test data volume and test application time. The initial fault coverage for stuck-at faults of the test set is guaranteed with modified test vectors. In this paper we focus on detecting as many as possible non-feedback AND-type, OR-type and 4-way bridging faults, respectively. Experimental results show that the proposed method increases the defect coverage.
Yalan YE Zhi-Lin ZHANG Jia CHEN
Fetal electrocardiogram (FECG) extraction is of vital importance in biomedical signal processing. A promising approach is blind source extraction (BSE) emerging from the neural network fields, which is generally implemented in a semi-blind way. In this paper, we propose a robust extraction algorithm that can extract the clear FECG as the first extracted signal. The algorithm exploits the fact that the FECG signal's kurtosis value lies in a specific range, while the kurtosis values of other unwanted signals do not belong to this range. Moreover, the algorithm is very robust to outliers and its robustness is theoretically analyzed and is confirmed by simulation. In addition, the algorithm can work well in some adverse situations when the kurtosis values of some source signals are very close to each other. The above reasons mean that the algorithm is an appealing method which obtains an accurate and reliable FECG.
Satoshi KOBASHIKAWA Satoshi TAKAHASHI
Users require speech recognition systems that offer rapid response and high accuracy concurrently. Speech recognition accuracy is degraded by additive noise, imposed by ambient noise, and convolutional noise, created by space transfer characteristics, especially in distant talking situations. Against each type of noise, existing model adaptation techniques achieve robustness by using HMM-composition and CMN (cepstral mean normalization). Since they need an additive noise sample as well as a user speech sample to generate the models required, they can not achieve rapid response, though it may be possible to catch just the additive noise in a previous step. In the previous step, the technique proposed herein uses just the additive noise to generate an adapted and normalized model against both types of noise. When the user's speech sample is captured, only online-CMN need be performed to start the recognition processing, so the technique offers rapid response. In addition, to cover the unpredictable S/N values possible in real applications, the technique creates several S/N HMMs. Simulations using artificial speech data show that the proposed technique increased the character correct rate by 11.62% compared to CMN.
Modeling visual attention provides an alternative methodology to image description in many applications such as adaptive content delivery and image retrieval. In this paper, we propose a robust approach to the modeling bottom-up visual attention. The main contributions are twofold: 1) We use a principal component analysis (PCA) to transform the RGB color space into three principal components, which intrinsically leads to an opponent representation of colors to ensure good saliency analysis. 2) A practicable framework for modeling visual attention is presented based on a region-level reliability analysis for each feature map. And then the salient map can be robustly generated for a variety of nature images. Experiments show that the proposed algorithm is effective and can characterize the human perception well.
Mohammad NURUL HUDA Muhammad GHULAM Takashi FUKUDA Kouichi KATSURADA Tsuneo NITTA
This paper describes a robust automatic speech recognition (ASR) system with less computation. Acoustic models of a hidden Markov model (HMM)-based classifier include various types of hidden factors such as speaker-specific characteristics, coarticulation, and an acoustic environment, etc. If there exists a canonicalization process that can recover the degraded margin of acoustic likelihoods between correct phonemes and other ones caused by hidden factors, the robustness of ASR systems can be improved. In this paper, we introduce a canonicalization method that is composed of multiple distinctive phonetic feature (DPF) extractors corresponding to each hidden factor canonicalization, and a DPF selector which selects an optimum DPF vector as an input of the HMM-based classifier. The proposed method resolves gender factors and speaker variability, and eliminates noise factors by applying the canonicalzation based on the DPF extractors and two-stage Wiener filtering. In the experiment on AURORA-2J, the proposed method provides higher word accuracy under clean training and significant improvement of word accuracy in low signal-to-noise ratio (SNR) under multi-condition training compared to a standard ASR system with mel frequency ceptral coeffient (MFCC) parameters. Moreover, the proposed method requires a reduced, two-fifth, Gaussian mixture components and less memory to achieve accurate ASR.
This paper presents a methodology for performing on-line voltage risk identification (VRI) in power supply networks using hyperrectangular composite neural networks (HRCNNs) and synchronized phasor measurements. The FHRCNN presented in this study integrates the paradigm of neural networks with the concept of knowledge-based approaches, rendering them both more useful than when applied alone. The fuzzy rules extracted from the dynamic data relating to the power system formalize the knowledge applied by experts when conducting the voltage risk assessment procedure. The efficiency of the proposed technique is demonstrated via its application to the Taiwan Power Provider System (Tai-Power System) under various operating conditions. Overall, the results indicated that the proposed scheme achieves a minimum 97 % success rate in determining the current voltage security level.
Bo YANG Hiroshi MURATA Shigetoshi NAKATAKE
This paper addresses the on-resistance (Ron) extraction of the DMOS based driver in Power IC designs. The proposed method can extract Ron of a driver from its layout data for the arbitrarily shaped metallization patterns. Such a driver is usually composed of arbitrarily shaped metals, arrayed vias, and DMOS transistors. We use FEM to extract the parasitic resistance of the source/drain metals since its strong contribution to Ron. In order to handle the large design case and accelerate the extraction process, a domain decomposition with virtual terminal insertion method is introduced, which succeeds in extraction for a set of industrial test cases including those the FEM without domain decomposition failed in. For a layout in which the DMOS cells are regularly placed, a sub-domain reuse procedure is also proposed, which obtained a dramatic speedup for the extraction. Even without the sub-domain reuse, our method still shows advantage in runtime and memory usage according to the simulation results.
A fair exchange scheme is a protocol by which two parties Alice and Bob exchange items or services without allowing either party to gain advantages by quitting prematurely or otherwise misbehaving. To this end, modern cryptographic solutions use a semi-trusted arbitrator who involves only in cases where one party attempts to cheat or simply crashes. We call such a fair exchange scheme optimistic. When no registration is required between the signer and the arbitrator, we say that the fair exchange scheme is setup-free. To date, the setup-free optimist fair exchange scheme under the standard RSA assumption was only possible from the generic construction of [12], which uses ring signatures. In this paper, we introduce a new setup-free optimistic fair exchange scheme under the standard RSA assumption. Our scheme uses the GQ identity-based signature and is more efficient than [12]. The construction can also be generalized by using various identity-based signature schemes. Our main technique is to allow each user to choose his (or her) own "random" public key in the identity-based signature scheme.
In this letter, a semi-automatic method for road network extraction from high-resolution satellite images is proposed. First, we focus on detecting the seed points in candidate road regions using a method of self-organizing map (SOM). Then, an approach to road tracking is presented, searching for connected points in the direction and candidate domain of a road. A study of Geographical Information Systems (GIS) using high-resolution satellite images is presented in this letter. Experimental results verified the effectiveness and efficiency of this approach.
Tri-Thanh NGUYEN Akira SHIMAZU
Named entities play an important role in many Natural Language Processing applications. Currently, most named entity recognition systems rely on a small set of general named entity (NE) types. Though some efforts have been proposed to expand the hierarchy of NE types, there are still a fixed number of NE types. In real applications, such as question answering or semantic search systems, users may be interested in more diverse specific NE types. This paper proposes a method to extract categories of person named entities from text documents. Based on Dual Iterative Pattern Relation Extraction method, we develop a more suitable model for solving our problem, and explore the generation of different pattern types. A method for validating whether a category is valid or not is proposed to improve the performance, and experiments on Wall Street Journal corpus give promising results.
Expansion of imagination is crucial for lively creativity. However, such expansion is sometimes rather difficult and an environment which supports creativity is required. Because people can attain higher creativity by using words with a thematic relation rather than words with a taxonomical relation, we tried to extract word lists having thematic relations among words. We first extracted word lists from domain specific documents by utilizing inclusive relations between words based on a modifiee/modifier relationship in documents. Next, from the extracted word lists, we removed the word lists having taxonomical relations so as to obtain only word lists having thematic relations. Finally, based on the assumption what kind of knowledge a person can associate when he/she looks at a set of words correlates with how the word set is effective in creativity support, we examined whether the word lists direct us to informative pages on the Web for verifying the availability of our extracted word lists.
Virach SORNLERTLAMVANICH Thatsanee CHAROENPORN Shisanu TONGCHIM Canasai KRUENGKRAI Hitoshi ISAHARA
Several approaches have been studied to cope with the exceptional features of non-segmented languages. When there is no explicit information about the boundary of a word, segmenting an input text is a formidable task in language processing. Not only the contemporary word list, but also usages of the words have to be maintained to cover the use in the current texts. The accuracy and efficiency in higher processing do heavily rely on this word boundary identification task. In this paper, we introduce some statistical based approaches to tackle the problem due to the ambiguity in word segmentation. The word boundary identification problem is then defined as a part of others for performing the unified language processing in total. To exhibit the ability in conducting the unified language processing, we selectively study the tasks of language identification, word extraction, and dictionary-less search engine.
Kenji IDE Ryusuke KAWAHARA Satoshi SHIMIZU Takayuki HAMAMOTO
We have investigated real-time object tracking using a wide view imaging system. For the system, we have designed and fabricated new smart image sensor with four functions effective in wide view imaging, such as a random access function. In this system, eight smart sensors and an octagonal mirror are used and each image obtained by the sensors is equivalent to a partial image of the wide view. In addition, by using an FPGA for processing, the circuits in this system can be scaled down and a panoramic image can be obtained in real time. For object tracking using this system, the object-detection method based on background subtraction is used. When moving objects are detected in the panoramic image, the objects are constantly displayed on the monitor at higher resolution in real time. In this paper, we describe the random access image sensor and show some results obtained using this sensor. In addition, we describe the wide view imaging system using eight sensors. Furthermore, we explain the method of object tracking in this system and show the results of real-time multipl-object tracking.
Haruna MATSUSHITA Yoshifumi NISHIO
Since we can accumulate a large amount of data including useless information in recent years, it is important to investigate various extraction method of clusters from data including much noises. The Self-Organizing Map (SOM) has attracted attention for clustering nowadays. In this study, we propose a method of using plural SOMs (TSOM: Tentacled SOM) for effective data extraction. TSOM consists of two kinds of SOM whose features are different, namely, one self-organizes the area where input data are concentrated, and the other self-organizes the whole of the input space. Each SOM of TSOM can catch the information of other SOMs existing in its neighborhood and self-organizes with the competing and accommodating behaviors. We apply TSOM to data extraction from input data including much noise, and can confirm that TSOM successfully extracts only clusters even in the case that we do not know the number of clusters in advance.
Chaveevan PECHSIRI Asanee KAWTRAKUL
This research aims to develop automatic knowledge mining of causality from texts for supporting an automatic question answering system (QA) in answering 'why' question, which is among the most crucial forms of questions. The out come of this research will assist people in diagnosing problems, such as in plant diseases, health, industrial and etc. While the previous works have extracted causality knowledge within only one or two adjacent EDUs (Elementary Discourse Units), this research focuses to mine causality knowledge existing within multiple EDUs which takes multiple causes and multiple effects in to consideration, where the adjacency between cause and effect is unnecessary. There are two main problems: how to identify the interesting causality events from documents, and how to identify the boundaries of the causative unit and the effective unit in term of the multiple EDUs. In addition, there are at least three main problems involved in boundaries identification: the implicit boundary delimiter, the nonadjacent cause-consequence, and the effect surrounded by causes. This research proposes using verb-pair rules learnt by comparing the Naïve Bayes classifier (NB) and Support Vector Machine (SVM) to identify causality EDUs in Thai agricultural and health news domains. The boundary identification problems are solved by utilizing verb-pair rules, Centering Theory and cue phrase set. The reason for emphasizing on using verbs to extract causality is that they explicitly make, in a certain way, the consequent events of cause-effect, e.g. 'Aphids suck the sap from rice leaves. Then leaves will shrink. Later, they will become yellow and dry.'. The outcome of the proposed methodology shown that the verb-pair rules extracted from NB outperform those extracted from SVM when the corpus contains high occurence of each verb, while the results from SVM is better than NB when the corpus contains less occurence of each verb. The verb-pair rules extracted from NB for causality extraction has the highest precision (0.88) with the recall of 0.75 from the plant disease corpus whereas from SVM has the highest precision (0.89) with the recall of 0.76 from bird flu news. For boundary determination, our methodology can handle very well with approximate 96% accuracy. In addition, the extracted causality results from this research can be generalized as laws in the Inductive-Statistical theory of Hempel's explanation theory, which will be useful for QA and reasoning.
Naoya MOCHIKI Tetsuji OGAWA Tetsunori KOBAYASHI
A new type of sound source segregation method using robot-mounted microphones, which are free from strict head related transfer function (HRTF) estimation, has been proposed and successfully applied to three simultaneous speech recognition systems. The proposed segregation method is executed with sound intensity differences that are due to the particular arrangement of the four directivity microphones and the existence of a robot head acting as a sound barrier. The proposed method consists of three-layered signal processing: two-line SAFIA (binary masking based on the narrow band sound intensity comparison), two-line spectral subtraction and their integration. We performed 20 K vocabulary continuous speech recognition test in the presence of three speakers' simultaneous talk, and achieved more than 70% word error reduction compared with the case without any segregation processing.
Jianfeng XU Toshihiko YAMASAKI Kiyoharu AIZAWA
3D video, which consists of a sequence of mesh models, can reproduce dynamic scenes containing 3D information. To summarize 3D video, a key frame extraction method is developed using rate-distortion (R-D) trade-off. For this purpose, an effective feature vector is extracted for each frame. Shot detection is performed using the feature vectors as a preprocessing followed by key frame extraction. Simple but reasonable definitions of rate and distortion are presented. Based on an assumption of linearity, an R-D curve is generated in each shot, where the locations of the key frames are optimized. Finally, R-D trade-off can be achieved by optimizing a cost function using a Lagrange multiplier, where the number of key frames is optimized in each shot. Therefore, our system will automatically determine the best locations and the number of key frames in the sense of R-D trade-off. Our experimental results show the extracted key frames are compact and faithful to the original 3D video.
Naoto MIURA Akio NAGASAKA Takafumi MIYATAKE
A biometrics system for identifying individuals using the pattern of veins in a finger was previously proposed. The system has the advantage of being resistant to forgery because the pattern is inside a finger. Infrared light is used to capture an image of a finger that shows the vein patterns, which have various widths and brightnesses that change temporally as a result of fluctuations in the amount of blood in the vein, depending on temperature, physical conditions, etc. To robustly extract the precise details of the depicted veins, we developed a method of calculating local maximum curvatures in cross-sectional profiles of a vein image. This method can extract the centerlines of the veins consistently without being affected by the fluctuations in vein width and brightness, so its pattern matching is highly accurate. Experimental results show that our method extracted patterns robustly when vein width and brightness fluctuated, and that the equal error rate for personal identification was 0.0009%, which is much better than that of conventional methods.
Young Woo LEE Sang Min LEE Yoon Sang JI Jong Shill LEE Young Joon CHEE Sung Hwa HONG Sun I. KIM In Young KIM
Digital hearing aid users often complain of difficulty in understanding speech in the presence of background noise. To improve speech perception in a noisy environment, various speech enhancement algorithms have been applied in digital hearing aids. In this study, a speech enhancement algorithm using modified spectral subtraction and companding is proposed for digital hearing aids. We adjusted the biases of the estimated noise spectrum, based on a subtraction factor, to decrease the residual noise. Companding was applied to the channel of the formant frequency based on the speech presence indicator to enhance the formant. Noise suppression was achieved while retaining weak speech components and avoiding the residual noise phenomena. Objective and subjective evaluation under various environmental conditions confirmed the improvement due to the proposed algorithm. We tested segmental SNR and Log Likelihood Ratio (LLR), which have higher correlation with subjective measures. Segmental SNR has the highest and LLR the lowest correlation of the methods tested. In addition, we confirmed by spectrogram that the proposed method significantly reduced the residual noise and enhanced the formants. A mean opinion score that represented the global perception score was tested; this produced the highest quality speech using the proposed method. The results show that the proposed speech enhancement algorithm is beneficial for hearing aid users in noisy environments.