Sung Soo KIM Chang Woo HAN Nam Soo KIM
In this letter, we present useful features accounting for pronunciation prominence and propose a classification technique for prominence detection. A set of phone-specific features is extracted based on a forced alignment of the test pronunciation provided by a speech recognition system. These features are then applied to traditional classifiers such as the support vector machine (SVM), artificial neural network (ANN), and adaptive boosting (AdaBoost) to detect the place of prominence.
In this paper, we propose a novel feature named histogram of template (HOT) for human detection in still images. For every pixel of an image, various templates are defined, each of which contains the pixel itself and two of its neighboring pixels. If the texture and gradient values of the three pixels satisfy a predefined formula, the central pixel is regarded as meeting the corresponding template for that formula. Histograms of the pixels meeting the various templates are calculated for a set of formulas and combined to form the detection feature. Compared with other features, the proposed feature takes texture as well as gradient information into consideration. In addition, it reflects the relationship among three pixels instead of focusing on only one. Human detection experiments on the INRIA dataset show that the proposed HOT feature is more discriminative than the histogram of oriented gradients (HOG) feature under the same training method.
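The template-matching step described above can be illustrated with a minimal sketch. It assumes four hypothetical three-pixel templates (horizontal, vertical, and the two diagonals) and a hypothetical formula ("the center value exceeds both neighbors"). The paper's actual templates and formulas, which combine texture and gradient values, are not given in the abstract. The sketch also omits the per-window or per-cell organization a real detector would use.

```python
# Hypothetical templates: each is a pair of neighbor offsets (dy, dx) around the center pixel.
TEMPLATES = [
    ((0, -1), (0, 1)),    # horizontal
    ((-1, 0), (1, 0)),    # vertical
    ((-1, -1), (1, 1)),   # diagonal
    ((-1, 1), (1, -1)),   # anti-diagonal
]


def center_is_max(c, a, b):
    """Hypothetical formula: the center value is strictly greater than both neighbors."""
    return c > a and c > b


def hot_histogram(img, formula=center_is_max):
    """For each template, count the interior pixels whose three values satisfy `formula`."""
    h, w = len(img), len(img[0])
    hist = [0] * len(TEMPLATES)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = img[y][x]
            for t, ((dy1, dx1), (dy2, dx2)) in enumerate(TEMPLATES):
                if formula(c, img[y + dy1][x + dx1], img[y + dy2][x + dx2]):
                    hist[t] += 1
    return hist


def hot_feature(channel_formula_pairs):
    """Concatenate one histogram per (2-D value array, formula) pair, e.g. intensity and gradient magnitude."""
    feature = []
    for channel, formula in channel_formula_pairs:
        feature.extend(hot_histogram(channel, formula))
    return feature
```

Using several formulas, and evaluating them on both a texture channel and a gradient channel, is what lets the combined feature encode the three-pixel relationships that a single-pixel HOG bin does not capture.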
In this study, discriminative weight training is applied to support vector machine (SVM)-based speech/music classification for the 3GPP2 selectable mode vocoder (SMV). In the proposed approach, the SVM derives the speech/music decision rule from SMV features that are optimally weighted according to a minimum classification error (MCE) criterion. This method differs from previous work in that a different weight is assigned to each SMV feature through a novel process. Experimental results show that the proposed approach is effective for speech/music classification using the SVM.
Dipankar DAS Yoshinori KOBAYASHI Yoshinori KUNO
This paper proposes an integrated approach to simultaneous detection and localization of multiple object categories using both generative and discriminative models. Our approach first generates a set of hypotheses for each object category using a generative model (pLSA) with a bag of visual words representing each object. Based on the variation of objects within a category, the pLSA model automatically fits an optimal number of topics. Then, the discriminative part verifies each hypothesis using a multi-class SVM classifier with merging features that combine the spatial shape and appearance of an object. In the post-processing stage, environmental context information, along with the probabilistic output of the SVM classifier, is used to improve the overall performance of the system. Our integrated approach with merging features and context information allows reliable detection and localization of various object categories in the same image. The performance of the proposed framework is evaluated on various standard datasets (MIT-CSAIL, UIUC, TUD, etc.) and on the authors' own datasets. In experiments, we achieved results superior to those of some state-of-the-art methods on a number of standard datasets. An extensive experimental evaluation on up to ten diverse object categories over thousands of images demonstrates that our system can detect and localize multiple objects within an image in the presence of cluttered background, substantial occlusion, and significant scale changes.
A novel age estimation method is presented that improves performance by fusing complementary information acquired from global and local features of the face. Two-directional two-dimensional principal component analysis ((2D)²PCA) is used for dimensionality reduction and for constructing individual feature spaces. Each feature space contributes a confidence value calculated by support vector machines (SVMs). The confidence values of all the facial features are then fused for final age estimation. Experimental results demonstrate that fusing multiple facial features achieves significant accuracy gains over any single feature. Finally, we propose a fusion method that further improves accuracy.
Shan ZHONG Yuxiang SHAN Liang HE Jia LIU
One of the most important challenges in speaker recognition is intersession variability (ISV), primarily cross-channel effects. Recent NIST speaker recognition evaluations (SREs) include a multilingual scenario in which the training conversations involve multilingual speakers and are collected in a number of other languages, leading to a further decline in performance. One important reason for this decline is that more and more researchers use phonetic clustering to introduce high-level information into speaker recognition, but such language-dependent methods do not work well under multilingual conditions. In this paper, we study both language and channel mismatch using a support vector machine (SVM) speaker recognition system. Maximum likelihood linear regression (MLLR) transforms adapting a universal background model (UBM) are adopted as features. We first introduce a novel language-independent statistical binary decision tree to reduce multi-language effects and compare this data-driven approach with a traditional knowledge-based one. We also construct a framework for channel compensation using feature-domain latent factor analysis (LFA) and MLLR supervector kernel-based nuisance attribute projection (NAP) in the model domain. Results on the NIST SRE 2006 1conv4w-1conv4w/mic corpus show significant improvement. We also compare our compensated MLLR-SVM system with state-of-the-art cepstral Gaussian mixture and SVM systems and combine them for further improvement.
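Nuisance attribute projection, as commonly formulated, removes directions of session or channel variability from supervectors by applying P = I - UUᵀ, where the orthonormal columns of U span the nuisance subspace. The SVM kernel is then evaluated on the projected vectors. The following minimal pure-Python sketch assumes the nuisance directions are already given; in practice they are usually estimated from within-speaker variability of training supervectors. The function names are hypothetical, and this is not the authors' implementation.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(vectors, eps=1e-12):
    """Gram-Schmidt: turn the given nuisance directions into an orthonormal basis (columns of U)."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = dot(u, w)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:  # drop directions that are linearly dependent on earlier ones
            basis.append([wi / norm for wi in w])
    return basis


def nap_project(x, basis):
    """Compute (I - U U^T) x: subtract the component of x along each orthonormal nuisance direction."""
    y = list(x)
    for u in basis:
        c = dot(u, x)
        y = [yi - c * ui for yi, ui in zip(y, u)]
    return y


def nap_kernel(x, z, basis):
    """Linear kernel between two supervectors after nuisance removal: (Px)^T (Pz)."""
    return dot(nap_project(x, basis), nap_project(z, basis))
```

Because P is a projection, applying it to both arguments of a linear kernel is equivalent to using the kernel xᵀPz. This keeps the compensated kernel positive semidefinite.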
Jungsuk SONG Hiroki TAKAKURA Yasuo OKABE Yongjin KWON
Intrusion detection systems (IDSs) have played an important role in defending our networks against cyber attacks. However, since they are unable to detect unknown attacks, i.e., 0-day attacks, the ultimate challenge in the intrusion detection field is how to accurately identify such attacks in an automated manner. Over the past few years, several studies have addressed these problems through anomaly detection using unsupervised learning techniques such as clustering and the one-class support vector machine (SVM). Although these techniques enable intrusion detection models to be constructed at low cost and effort and can detect unforeseen attacks, they still suffer from two main problems: a low detection rate and a high false positive rate. In this paper, we propose a new anomaly detection method based on clustering and multiple one-class SVMs to improve the detection rate while maintaining a low false positive rate. We evaluated our method using the KDD Cup 1999 data set. Evaluation results show that our approach outperforms the existing algorithms reported in the literature, especially in the detection of unknown attacks.
In this letter, we propose a novel approach to speech/music classification based on the support vector machine (SVM) to improve the performance of the 3GPP2 selectable mode vocoder (SMV) codec. We first analyze the features and the classification method used in the real-time speech/music classification algorithm of the SMV, and then apply the SVM for enhanced speech/music classification. For performance evaluation, we compare the proposed algorithm with the traditional algorithm of the SMV. The proposed system is evaluated under various environments and performs better than the original method in the SMV.
Hiroyuki NARITA Yasumasa SAWAMURA Akira HAYASHI
One of the advantages of kernel methods is that they can deal with various kinds of objects, not necessarily vectorial data with a fixed number of attributes. In this paper, we develop kernels for time series data using dynamic time warping (DTW) distances. Since DTW distances are pseudo-distances that do not satisfy the triangle inequality, a kernel matrix based on them is, in general, not positive semidefinite. We use semidefinite programming (SDP) to guarantee the positive semidefiniteness of the kernel matrix. We present neighborhood preserving embedding (NPE), an SDP formulation for obtaining a kernel matrix that best preserves the local geometry of time series data. We also present an out-of-sample extension (OSE) for NPE. We validate our approach on two applications: time series classification and time series embedding for similarity search.
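The claim that DTW violates the triangle inequality can be checked directly with a small example. The following sketch implements the classic dynamic-programming DTW with an absolute-difference local cost, which is an assumption because the abstract does not state the local cost. It also builds a naive exp(-γ·DTW) Gram matrix of the kind that is not guaranteed to be positive semidefinite. The sketch does not implement the paper's SDP-based NPE, and the function names and γ are illustrative.

```python
import math


def dtw(a, b, cost=lambda u, v: abs(u - v)):
    """Classic DTW distance: minimal summed local cost over all monotone alignments of a and b."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = cost(a[i - 1], b[j - 1]) + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def dtw_gram(series, gamma=1.0):
    """Naive kernel matrix K[i][j] = exp(-gamma * DTW); NOT guaranteed to be positive semidefinite."""
    return [[math.exp(-gamma * dtw(a, b)) for b in series] for a in series]
```

For example, `dtw([0], [1, 2])` is 3, `dtw([1, 2], [1, 2, 2])` is 0, and `dtw([0], [1, 2, 2])` is 5. Because 5 > 3 + 0, the triangle inequality fails. This is why the paper resorts to SDP to obtain a valid kernel matrix.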
Yufeng ZHAO Yao ZHAO Zhenfeng ZHU Jeng-Shyang PAN
A novel automatic image annotation (AIA) scheme based on multiple-instance learning (MIL) is proposed. For a given concept, manifold ranking (MR) is first applied to MIL (referred to as MR-MIL) to effectively mine the positive instances (i.e., regions in images) embedded in the positive bags (i.e., images). From the mined positive instances, the semantic model of the concept is built using the probabilistic output of an SVM classifier. The experimental results show that high annotation accuracy can be achieved at the region level.
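The ranking step can be illustrated with the standard manifold ranking iteration f ← αSf + (1 - α)y, where S = D^(-1/2) W D^(-1/2) is the symmetrically normalized affinity matrix and y marks the query nodes. The following sketch is generic manifold ranking with an assumed affinity matrix and assumed parameter values. How MR-MIL builds the region graph and selects queries inside the MIL framework is not specified in the abstract and is not reproduced here.

```python
import math


def manifold_ranking(W, y, alpha=0.9, iters=500):
    """Iterate f <- alpha * S f + (1 - alpha) * y with S = D^-1/2 W D^-1/2.

    W: symmetric non-negative affinity matrix (list of lists, zero diagonal).
    y: initial scores (e.g. 1 for known positive regions, 0 otherwise).
    For 0 < alpha < 1 the iteration converges to (1 - alpha) (I - alpha S)^-1 y.
    """
    n = len(W)
    d = [sum(row) for row in W]
    inv_sqrt = [1.0 / math.sqrt(v) if v > 0 else 0.0 for v in d]
    S = [[inv_sqrt[i] * W[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
    f = list(y)
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i] for i in range(n)]
    return f
```

On a four-node chain graph queried from one end, the scores decrease monotonically with graph distance from the query. This is the behavior that lets regions close to known positives on the data manifold be ranked as likely positive instances.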
Kye-Hwan LEE Sang-Ick KANG Deok-Hwan KIM Joon-Hyuk CHANG
We propose an effective voice-based gender identification method using a support vector machine (SVM). The SVM is a binary classification algorithm that separates two groups by finding an arbitrary nonlinear boundary in a feature space and is known to yield high classification performance. In the present work, we compare the identification performance of the SVM with that of a Gaussian mixture model (GMM)-based method using mel-frequency cepstral coefficients (MFCC). A novel feature fusion scheme based on a combination of the MFCC and the fundamental frequency is proposed to improve gender identification performance. Experimental results demonstrate that gender identification performance using the SVM is significantly better than that of the GMM-based scheme. Moreover, performance improves substantially when the proposed feature fusion technique is applied.
Kyoko SUDO Tatsuya OSAWA Kaoru WAKABAYASHI Hideki KOIKE Kenichi ARAKAWA
We propose a method to detect and quantitatively extract anomalies from surveillance videos. With our method, anomalies are detected as patterns of spatio-temporal features that are outliers in a new feature space. Conventional anomaly detection methods use features such as tracks or local spatio-temporal features, both of which provide insufficient timing information. With our method, the principal components of spatio-temporal features of change are extracted from video sequences several seconds long. This enables anomalies based on movement irregularity, in both position and speed, to be determined. It thus permits the automatic detection of anomalous events in sequences of constant length, regardless of their start and end. We use a one-class SVM, which is an unsupervised outlier detection method. The output of the SVM indicates the distance between an outlier and the concentrated base pattern. We demonstrate that the anomalies extracted using our method subjectively match perceived irregularities in movement patterns. Our method is useful in surveillance services because the captured images can be shown in order of their degree of anomaly, which significantly reduces the time needed to review them.
Service robots need to be able to recognize and identify objects located within complex backgrounds. Since no single method is likely to work in every situation, several methods need to be combined, and robots have to select the appropriate one automatically. In this paper, we propose a scheme to classify situations depending on the characteristics of the object of interest and user demand. We classify situations into four groups and employ different techniques for each. For five object categories, we use the scale-invariant feature transform (SIFT) and kernel principal component analysis (KPCA) in conjunction with a support vector machine (SVM), using intensity, color, and Gabor features. We show that using appropriate features is important when applying KPCA- and SVM-based techniques to different kinds of objects. Through experiments, we show that by using our categorization scheme, a service robot can select an appropriate feature and method and considerably improve its recognition performance. Still, recognition is not perfect. We therefore propose to combine the autonomous method with an interactive method. When the robot fails to recognize an object, the interactive method allows it to recognize the user's request for a specific object and class. We also propose an interactive way to update, with the user's feedback, the object model used to recognize an object upon failure.
Jong Shill LEE Baek Hwan CHO Young Joon CHEE In Young KIM Sun I. KIM
We propose a new approach to personal identification using the derived vectorcardiogram (dVCG). The dVCG was calculated from recorded ECG using the inverse Dower transform. Twenty-one features were extracted from the resulting dVCG. To analyze the effect of each feature and to improve efficiency while maintaining performance, we performed feature selection on these 21 features using the Relief-F algorithm. The set of the eight highest-ranked features and the full set of 21 features were each used for SVM training and testing. The classification accuracy using the entire feature set was 99.53 %. Using only the eight highest-ranked features, the classification accuracy was 99.07 %, only 0.46 percentage points lower than that achieved with the entire feature set. With the eight highest-ranked features, the conventional ECG method achieved a 93 % recognition rate, whereas our method achieved a recognition rate above 99 %, more than 6 percentage points higher. Our experiments show that personal identification is possible using only eight features extracted from the dVCG.
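The idea behind the feature-ranking step can be shown with a short sketch. For clarity, it implements the original two-class Relief rather than Relief-F, the multi-class, k-nearest-neighbor extension used in the study. For every sample, Relief rewards features that differ from the nearest sample of the other class ("miss") and penalizes features that differ from the nearest sample of the same class ("hit"). The sketch assumes features scaled to [0, 1], at least two samples per class, and uses every sample once instead of random sampling. The function names are illustrative.

```python
def relief(X, y):
    """Two-class Relief feature weights (deterministic: each sample is visited once).

    Assumes features are scaled to [0, 1] and each class has at least two samples.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p

    def dist(a, b):  # Manhattan distance used to find nearest neighbors
        return sum(abs(u - v) for u, v in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        h = min(hits, key=lambda j: dist(X[i], X[j]))    # nearest same-class sample
        m = min(misses, key=lambda j: dist(X[i], X[j]))  # nearest other-class sample
        for f in range(p):
            w[f] += (abs(X[i][f] - X[m][f]) - abs(X[i][f] - X[h][f])) / n
    return w


def top_k(weights, k):
    """Indices of the k highest-weighted features (e.g. k = 8 in the study)."""
    return sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)[:k]
```

A feature that separates the classes accumulates a positive weight, while a noise feature tends toward zero or a negative weight. The top-ranked indices then define the reduced feature set passed to the SVM.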
This paper describes a method of producing segmentation point candidates for on-line handwritten Japanese text with a support vector machine (SVM) to improve text recognition. The method extracts multi-dimensional features from on-line strokes of handwritten text and applies the SVM to the extracted features to produce segmentation point candidates. We incorporate the method into a segmentation-by-recognition scheme based on a stochastic model. This model evaluates a likelihood composed of character pattern structure, character segmentation, character recognition, and context, and uses it to finally determine segmentation points and recognize handwritten Japanese text. This paper also details how segmentation point candidates are generated to achieve a high discrimination rate by finding the optimal combination of the segmentation threshold and the concatenation threshold. We compare segmentation by the SVM with segmentation by a neural network (NN) using the HANDS-Kondate_t_bf-2001-11 database and show that the SVM-based method yields better segmentation and character recognition rates.
In this paper, we describe a two-phase method for biomedical named entity recognition consisting of term boundary detection and biomedical category labeling. Term boundary detection can be defined as the task of assigning a label sequence to a given sentence. Biomedical category labeling, by contrast, can be viewed as a local classification problem that does not require knowledge of the labels of other named entities in the sentence. The advantage of dividing the recognition process into two phases is that we can measure the effectiveness of the models at each phase and separately select the appropriate model for each subtask. To obtain better performance in biomedical named entity recognition, we conducted comparative experiments using several learning methods at each phase. Moreover, the results of these machine learning-based models are refined by rule-based postprocessing. We tested our methods on the JNLPBA 2004 shared task and the GENIA corpus.
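The division of labor between the two phases can be made concrete with a small sketch. It assumes the first phase emits B/I/O tags, a common labeling scheme for boundary detection that the abstract does not name explicitly. The tags are decoded into term spans, and each span is then labeled independently by a second-phase classifier. The `classify` argument stands in for whichever learned model is chosen for category labeling; it is a hypothetical placeholder, not the paper's model.

```python
def bio_to_spans(tags):
    """Phase 1 output -> term boundaries: convert B/I/O tags into (start, end) spans, end exclusive.

    A stray 'I' with no open term is treated as the start of a new term.
    """
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def label_entities(tokens, tags, classify):
    """Phase 2: label each detected span on its own; `classify` is a hypothetical per-span classifier."""
    return [(s, e, classify(tokens[s:e])) for s, e in bio_to_spans(tags)]
```

Because `label_entities` sees one span at a time, the category labeler needs no knowledge of neighboring entities' labels. This matches the local-classification view described above and lets each phase's model be swapped and evaluated separately.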
Tomohiro MITSUMORI Masaki MURATA Yasushi FUKUDA Kouichi DOI Hirohumi DOI
Several automated systems for extracting information from biomedical text have been reported. Some are based on manually developed rules or pattern matching. Manually developed rules are specific to a particular analysis; however, new rules must be developed for each new domain. In contrast, a machine learning approach learns the rules from a corpus automatically, although the corpus itself must be developed by human effort. In this article, we present a system for automatically extracting protein-protein interaction information from biomedical text with support vector machines (SVMs). We describe the performance of our system and compare its ability to extract protein-protein interaction information with that of other systems.
A fast, robust, and easy-to-train multi-stage approach for a face detection system is proposed. Motivated by the work of Viola and Jones [1], this approach uses a cascade of classifiers in a coarse-to-fine strategy to significantly reduce detection time while maintaining a high detection rate. However, it differs from previous work in two respects. First, a new stage has been added to detect face candidate regions more quickly by using a larger window size and a larger moving step size. Second, support vector machine (SVM) classifiers are used instead of AdaBoost classifiers in the last stage, and the Haar wavelet features selected by the previous stage are reused for the SVM classifiers robustly and efficiently. By combining AdaBoost and SVM classifiers, the final system achieves both fast and robust detection. Most non-face patterns are rejected quickly in earlier layers, while only a small number of promising face patterns are classified robustly in later layers. The proposed multi-stage system has been shown to run faster than the original AdaBoost-based system while maintaining comparable accuracy.
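The control flow that makes such a cascade fast can be sketched independently of the classifiers themselves. In the following sketch, each stage is a (score function, threshold) pair, and a window is rejected at the first stage whose score falls below its threshold. A helper also shows why a larger window and step size in an added first stage reduces the number of positions to examine. The stage scoring functions are hypothetical placeholders: in the proposed system they would be AdaBoost sums of Haar wavelet features in early stages and an SVM decision value in the last stage.

```python
def run_cascade(window, stages):
    """Evaluate a coarse-to-fine cascade on one window.

    stages: list of (score_fn, threshold) pairs, cheapest first (hypothetical placeholders).
    Returns (accepted, number_of_stages_evaluated); rejection stops evaluation immediately.
    """
    for k, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(window) < threshold:
            return False, k
    return True, len(stages)


def scan_positions(width, height, win, step):
    """Top-left corners of a win-by-win window slid over the image; a larger step yields fewer windows."""
    return [(x, y)
            for y in range(0, height - win + 1, step)
            for x in range(0, width - win + 1, step)]
```

Because most windows exit after one or two cheap stages, the expensive final classifier runs on only a small fraction of positions.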
Shinjiro KAWATO Nobuji TETSUTANI Kenichi HOSAKA
In this paper, we propose a method for detecting and tracking faces in video sequences in real time. It can be applied to a wide range of face scales. Our basic detection strategy is fast extraction of face candidates with a Six-Segmented Rectangular (SSR) filter, followed by face verification with a support vector machine. A motion cue is used in a simple way to avoid picking up false candidates in the background. In face tracking, the between-the-eyes patterns are tracked while the matching template is updated. To cope with various face scales, we use a series of approximately 1/ scale-down images, and an appropriate scale is selected according to the distance between the eyes. We tested our algorithm on 7146 video frames of a news broadcast featuring sign language, at a 320×240 frame size, in which one or two persons appeared. Although gesturing hands often hid faces and interrupted tracking, 89% of faces were correctly tracked. We implemented the system on a PC with a 2.2-GHz Xeon CPU; it runs at 15 frames/second without any special hardware.
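Filters of this kind are fast because each segment sum can be read from an integral image in four lookups. The following sketch computes an integral image and evaluates a 2 × 3 six-segment rectangle. The specific brightness inequalities are an assumption, chosen to express an "eyes darker than the between-the-eyes region and darker than the cheeks" pattern; the abstract does not state the filter's exact conditions. The function names are illustrative.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (one extra zero row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle whose top-left corner is (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def ssr_candidate(ii, x, y, seg_w, seg_h):
    """Six-segment test: segments s1..s3 form the top row, s4..s6 the bottom row.

    Assumed conditions: the eye segments (s1, s3) are darker than the between-the-eyes
    segment (s2) and darker than the cheek segments below them (s4, s6).
    """
    s = [rect_sum(ii, x + c * seg_w, y + r * seg_h, seg_w, seg_h)
         for r in range(2) for c in range(3)]
    s1, s2, s3, s4, _, s6 = s
    return s1 < s2 and s3 < s2 and s1 < s4 and s3 < s6
```

Since every segment sum costs the same regardless of segment size, the same test can be slid across the scale-down images at constant cost per position.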
Masqueraders who impersonate other users pose a serious threat to computer security. Unfortunately, firewalls and misuse-based intrusion detection systems are generally ineffective in detecting masqueraders. Anomaly detection techniques have been proposed as a complementary approach to overcome such limitations. However, they are not accurate enough, and their false alarm rate is too high for practical use. For example, recent empirical studies on masquerade detection using UNIX commands found the accuracy to be below 70%. In this research, we performed a comparative study to investigate the effectiveness of the SVM (support vector machine) technique using the same data set and configuration reported in the previous experiments. To improve the accuracy of masquerade detection, we used command frequencies in sliding windows as feature sets. In addition, we chose to ignore commands commonly used by all users and introduced the concept of a voting engine. Though still imperfect, our approach improved the accuracy of masquerade detection to 80.1% and 94.8%, whereas previous studies reported accuracies of 69.3% and 62.8% in the same configurations. This study convincingly demonstrates that SVM is useful as an anomaly detection technique and that SVM offers several advantages as a tool for detecting masqueraders.
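The three preprocessing ideas above (sliding-window command frequencies, discarding commands used by every user, and voting over window-level decisions) can be sketched as follows. The window size, step, vocabulary, and `min_votes` threshold are hypothetical parameters, and the SVM that would turn each feature vector into an anomalous/normal decision is not shown.

```python
from collections import Counter


def window_features(commands, vocab, window=10, step=1, ignore=frozenset()):
    """Frequency vector over `vocab` for each sliding window of commands, skipping ignored commands."""
    feats = []
    for start in range(0, len(commands) - window + 1, step):
        counts = Counter(c for c in commands[start:start + window] if c not in ignore)
        feats.append([counts[v] for v in vocab])
    return feats


def common_commands(user_histories):
    """Commands that appear in every user's history; they carry little user-specific information."""
    if not user_histories:
        return set()
    common = set(user_histories[0])
    for history in user_histories[1:]:
        common &= set(history)
    return common


def vote(decisions, min_votes):
    """Voting-engine sketch: raise an alarm only if at least `min_votes` window decisions are anomalous."""
    return sum(decisions) >= min_votes
```

Requiring several anomalous windows before alarming trades a little detection latency for fewer false alarms. This is consistent with the motivation for a voting engine given above.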