The search functionality is under construction.

Keyword Search Result

[Keyword] unsupervised(46hit)

21-40hit(46hit)

  • Unsupervised Fingerprint Recognition

    Wei-Ho TSAI  Jun-Wei LIN  Der-Chang TSENG  

     
    PAPER-Image Recognition, Computer Vision

      Vol:
    E96-D No:9
      Page(s):
    2115-2125

    This study extends conventional fingerprint recognition from a supervised to an unsupervised framework. Instead of enrolling fingerprints from known persons to identify unknown fingerprints, our aim is to partition a collection of unknown fingerprints into clusters, so that each cluster consists of fingerprints from the same finger and the number of generated clusters equals the number of distinct fingers involved in the collection. Such an unsupervised framework is helpful for handling situations where a collection of captured fingerprints is not from the enrolled people. The task of fingerprint clustering is formulated as a problem of minimizing the clustering errors characterized by the Rand index. We estimate the Rand index by computing the similarities between fingerprints and then apply a genetic algorithm to minimize the Rand index. Experiments conducted using the FVC2002 database show that the proposed fingerprint clustering method outperforms an intuitive method based on hierarchical agglomerative clustering. The experiments also show that the number of clusters determined by our system is close to the true number of distinct fingers involved in the collection.
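The abstract above treats the Rand index as an error measure to be minimized. As a rough illustration only, the sketch below computes the standard pairwise-agreement Rand index between two labelings and its complement, the fraction of disagreeing pairs. The function names are hypothetical, and the paper's exact definition and its similarity-based estimate are not given in the abstract, so this is not the authors' formulation.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two labelings agree.

    A pair "agrees" when both labelings put the two items in the same
    cluster, or both put them in different clusters (standard definition).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # zero or one item: nothing to disagree about
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def pairwise_error(labels_a, labels_b):
    """Assumed error form: the fraction of disagreeing pairs (lower is better)."""
    return 1.0 - rand_index(labels_a, labels_b)
```

Cluster names do not matter, only co-membership: `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` compares two identical partitions.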

  • Extreme Maximum Margin Clustering

    Chen ZHANG  ShiXiong XIA  Bing LIU  Lei ZHANG  

     
    PAPER-Artificial Intelligence, Data Mining

      Vol:
    E96-D No:8
      Page(s):
    1745-1753

    Maximum margin clustering (MMC) is a newly proposed clustering method that extends the large-margin computation of support vector machine (SVM) to unsupervised learning. Traditionally, MMC is formulated as a nonconvex integer programming problem, which makes it difficult to solve. Several methods rely on reformulating and relaxing the nonconvex optimization problem as a semidefinite program (SDP) or second-order cone program (SOCP), which are computationally expensive and have difficulty handling large-scale data sets. In linear cases, by making use of the constrained concave-convex procedure (CCCP) and cutting plane algorithm, several MMC methods take linear time to converge to a local optimum, but in nonlinear cases, time complexity is still high. Since extreme learning machine (ELM) has achieved similar generalization performance at much faster learning speed than traditional SVM and LS-SVM, we propose an extreme maximum margin clustering (EMMC) algorithm based on ELM. It can perform well in nonlinear cases. Moreover, because EMMC uses random feature mappings, its kernel parameters need not be tuned. Experimental results on several real-world data sets show that EMMC performs better than traditional MMC methods, especially in handling large-scale data sets.

  • Sub-Category Optimization through Cluster Performance Analysis for Multi-View Multi-Pose Object Detection

    Dipankar DAS  Yoshinori KOBAYASHI  Yoshinori KUNO  

     
    PAPER-Image Recognition, Computer Vision

      Vol:
    E94-D No:7
      Page(s):
    1467-1478

    The detection of object categories with large variations in appearance is a fundamental problem in computer vision. The appearance of object categories can change due to intra-class variations, background clutter, and changes in viewpoint and illumination. For object categories with large appearance changes, some kind of sub-categorization based approach is necessary. This paper proposes a sub-category optimization approach that automatically divides an object category into an appropriate number of sub-categories based on appearance variations. Instead of using predefined intra-category sub-categorization based on domain knowledge or validation datasets, we divide the sample space by unsupervised clustering using discriminative image features. We then use a cluster performance analysis (CPA) algorithm to verify the performance of the unsupervised approach. The CPA algorithm uses two performance metrics to determine the optimal number of sub-categories per object category. Furthermore, we employ the optimal sub-category representation as the basis and a supervised multi-category detection system with a χ² merging kernel function to efficiently detect and localize object categories within an image. Extensive experimental results are shown using a standard database and the authors' own database. The comparison results reveal that our approach outperforms the state-of-the-art methods.

  • Unsupervised Feature Selection and Category Classification for a Vision-Based Mobile Robot

    Masahiro TSUKADA  Yuya UTSUMI  Hirokazu MADOKORO  Kazuhito SATO  

     
    PAPER-Image Recognition, Computer Vision

      Vol:
    E94-D No:1
      Page(s):
    127-136

    This paper presents an unsupervised learning-based method for selection of feature points and object category classification without previous setting of the number of categories. Our method consists of the following procedures: 1) detection of feature points and description of features using a Scale-Invariant Feature Transform (SIFT), 2) selection of target feature points using One Class-Support Vector Machines (OC-SVMs), 3) generation of visual words of all SIFT descriptors and histograms in each image of selected feature points using Self-Organizing Maps (SOMs), 4) formation of labels using Adaptive Resonance Theory-2 (ART-2), and 5) creation and classification of categories on a category map of Counter Propagation Networks (CPNs) for visualizing spatial relations between categories. Classification results of static images using a Caltech-256 object category dataset and dynamic images using time-series images obtained using a robot according to movements respectively demonstrate that our method can visualize spatial relations of categories while maintaining time-series characteristics. Moreover, we emphasize the effectiveness of our method for category classification of appearance changes of objects.

  • A Comparative Study of Unsupervised Anomaly Detection Techniques Using Honeypot Data

    Jungsuk SONG  Hiroki TAKAKURA  Yasuo OKABE  Daisuke INOUE  Masashi ETO  Koji NAKAO  

     
    PAPER-Information Network

      Vol:
    E93-D No:9
      Page(s):
    2544-2554

    Intrusion Detection Systems (IDSs) have received considerable attention among network security researchers as one of the most promising countermeasures to defend our crucial computer systems or networks against attackers on the Internet. Over the past few years, many machine learning techniques have been applied to IDSs so as to improve their performance and to construct them with low cost and effort. In particular, unsupervised anomaly detection techniques have a significant advantage in their capability to identify unforeseen attacks, i.e., 0-day attacks, and to build intrusion detection models without any labeled (i.e., pre-classified) training data in an automated manner. In this paper, we conduct a set of experiments to evaluate and analyze the performance of the major unsupervised anomaly detection techniques using real traffic data obtained at our honeypots deployed inside and outside of the campus network of Kyoto University, and using various evaluation criteria, i.e., performance evaluation by similarity measurements and the size of training data, overall performance, detection ability for unknown attacks, and time complexity. Our experimental results give some practical and useful guidelines to IDS researchers and operators, so that they can acquire insight to apply these techniques to the area of intrusion detection, and devise more effective intrusion detection models.

  • An Unsupervised Model of Redundancy for Answer Validation

    Youzheng WU  Hideki KASHIOKA  Satoshi NAKAMURA  

     
    PAPER-Natural Language Processing

      Vol:
    E93-D No:3
      Page(s):
    624-634

    Given a question and a set of its candidate answers, the task of answer validation (AV) aims to return a Boolean value indicating whether a given candidate answer is the correct answer to the question. Unlike previous works, this paper presents an unsupervised model, called the U-model, for AV. This approach regards AV as a classification task and investigates how to effectively incorporate the redundancy of the Web into the proposed architecture. Experimental results with TREC factoid test sets and Chinese test sets indicate that the proposed U-model with redundancy information is very effective for AV. For example, the top@1/mrr@5 scores on the TREC05 and 06 tracks are 40.1/51.5% and 35.8/47.3%, respectively. Furthermore, a cross-model comparison experiment demonstrates that the U-model is the best among the redundancy-based models considered. Even compared with a syntax-based approach, a supervised machine learning approach and a pattern-based approach, the U-model performs much better.
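The top@1 and mrr@5 scores quoted in the abstract above are not defined there. The sketch below assumes the usual definitions: top@1 is the fraction of questions whose first-ranked candidate is correct, and MRR@5 is the mean reciprocal rank of the correct answer within the top five candidates (zero if it is absent). Function and variable names are hypothetical.

```python
def top_at_1(ranked_candidates, gold_answers):
    """Fraction of questions whose top-ranked candidate is the gold answer."""
    hits = sum(
        1
        for candidates, gold in zip(ranked_candidates, gold_answers)
        if candidates and candidates[0] == gold
    )
    return hits / len(gold_answers)


def mrr_at_k(ranked_candidates, gold_answers, k=5):
    """Mean reciprocal rank of the gold answer within the top k candidates."""
    total = 0.0
    for candidates, gold in zip(ranked_candidates, gold_answers):
        for rank, candidate in enumerate(candidates[:k], start=1):
            if candidate == gold:
                total += 1.0 / rank
                break  # only the first correct hit counts
    return total / len(gold_answers)


# Toy data: three questions, each with a ranked candidate list.
ranked = [["a", "b"], ["x", "y", "z"], ["p"]]
gold = ["a", "z", "q"]
scores = (top_at_1(ranked, gold), mrr_at_k(ranked, gold, k=5))
```

On the toy data, one question is answered at rank 1, one at rank 3, and one not at all.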

  • An Efficient Initialization Scheme for SOM Algorithm Based on Reference Point and Filters

    Shu-Ling SHIEH  I-En LIAO  Kuo-Feng HWANG  Heng-Yu CHEN  

     
    PAPER-Data Mining

      Vol:
    E92-D No:3
      Page(s):
    422-432

    This paper proposes an efficient self-organizing map algorithm based on reference point and filters. A strategy called Reference Point SOM (RPSOM) is proposed to improve SOM execution time by means of filtering with two thresholds T1 and T2. We use one threshold, T1, to define the search boundary parameter used to search for the Best-Matching Unit (BMU) with respect to input vectors. The other threshold, T2, is used as the search boundary within which the BMU finds its neighbors. The proposed algorithm reduces the time complexity from O(n²) to O(n) in finding the initial neurons as compared to the algorithm proposed by Su et al. [16]. The RPSOM dramatically reduces the time complexity, especially in the computation of large data sets. From the experimental results, we find that it is better to construct a good initial map and then to use unsupervised learning to make small subsequent adjustments.

  • A New Approach to Unsupervised Target Classification for Polarimetric SAR Images

    Xing RONG  Weijie ZHANG  Jian YANG  Wen HONG  

     
    LETTER-Sensing

      Vol:
    E91-B No:6
      Page(s):
    2081-2084

    A new unsupervised classification method is proposed for polarimetric SAR images to keep the spatial coherence of pixels and edges of different kinds of targets simultaneously. We consider the label scale variability of images by combining Inhomogeneous Markov Random Field (MRF) and Bayes' theorem. After minimizing an energy function using an expansion algorithm based on Graph Cuts, we can obtain classification results that are discontinuity preserving. Using a NASA/JPL AIRSAR image, we demonstrate the effectiveness of the proposed method.

  • An Unsupervised Opinion Mining Approach for Japanese Weblog Reputation Information Using an Improved SO-PMI Algorithm

    Guangwei WANG  Kenji ARAKI  

     
    PAPER-Data Mining

      Vol:
    E91-D No:4
      Page(s):
    1032-1041

    In this paper, we propose an improved SO-PMI (Semantic Orientation Using Pointwise Mutual Information) algorithm, for use in Japanese Weblog Opinion Mining. SO-PMI is an unsupervised approach proposed by Turney that has been shown to work well for English. When this algorithm was translated into Japanese naively, most phrases, whether positive or negative in meaning, received a negative SO. For dealing with this slanting phenomenon, we propose three improvements: to expand the reference words to sets of words, to introduce a balancing factor and to detect neutral expressions. In our experiments, the proposed improvements obtained a well-balanced result: both positive and negative accuracy exceeded 62%, when evaluated on 1,200 opinion sentences sampled from three different domains (reviews of Electronic Products, Cars and Travels from Kakaku.com). In a comparative experiment on the same corpus, a supervised approach (SA-Demo) achieved a very similar accuracy to our method. This shows that our proposed approach effectively adapted SO-PMI for Japanese, and it also shows the generality of SO-PMI.
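For orientation, the baseline SO-PMI that the abstract above builds on can be sketched from Turney's original formulation: the semantic orientation of a phrase is its PMI with a positive reference word minus its PMI with a negative one, which reduces to a single log-ratio of co-occurrence counts. The sketch below shows only that baseline. The paper's three improvements (reference word sets, the balancing factor, neutral detection) are not specified in the abstract and are not reproduced, and the parameter names and default smoothing constant are assumptions.

```python
import math


def so_pmi(near_pos, near_neg, hits_pos, hits_neg, smoothing=0.01):
    """Baseline semantic orientation of a phrase from hit counts.

    near_pos / near_neg: co-occurrence counts of the phrase with the
        positive / negative reference word.
    hits_pos / hits_neg: total counts of the reference words themselves.
    smoothing: small constant added to the co-occurrence counts to
        avoid log(0) and division by zero (assumed value).
    """
    numerator = (near_pos + smoothing) * hits_neg
    denominator = (near_neg + smoothing) * hits_pos
    return math.log2(numerator / denominator)


# A phrase seen four times as often near the positive word as near the
# negative word, with equally frequent reference words, is positive.
example = so_pmi(near_pos=8, near_neg=2, hits_pos=100, hits_neg=100, smoothing=0.0)
```

A positive value suggests positive orientation and a negative value the opposite. The skew described in the abstract would show up here as most phrases scoring below zero.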

  • Cost Reduction of Acoustic Modeling for Real-Environment Applications Using Unsupervised and Selective Training

    Tobias CINCAREK  Tomoki TODA  Hiroshi SARUWATARI  Kiyohiro SHIKANO  

     
    PAPER-Acoustic Modeling

      Vol:
    E91-D No:3
      Page(s):
    499-507

    Development of an ASR application such as a speech-oriented guidance system for a real environment is expensive. Most of the costs are due to human labeling of newly collected speech data to construct the acoustic model for speech recognition. Employment of existing models or sharing models across multiple applications is often difficult, because the characteristics of speech depend on various factors such as possible users, their speaking style and the acoustic environment. Therefore, this paper proposes a combination of unsupervised learning and selective training to reduce the development costs. The employment of unsupervised learning alone is problematic due to the task-dependency of speech recognition and because automatic transcription of speech is error-prone. A theoretically well-defined approach to automatic selection of high-quality and task-specific speech data from an unlabeled data pool is presented. Only those unlabeled data which increase the model likelihood given the labeled data are employed for unsupervised training. The effectiveness of the proposed method is investigated with a simulation experiment to construct adult and child acoustic models for a speech-oriented guidance system. A completely human-labeled database which contains real-environment data collected over two years is available for the development simulation. It is shown experimentally that the employment of selective training alleviates the problems of unsupervised learning, i.e., it is possible to select speech utterances of a certain speaker group but discard noise inputs and utterances with lower recognition accuracy. The simulation experiment is carried out for several selected combinations of data collection and human transcription period. It is found empirically that the proposed method is especially effective if only relatively few of the collected data can be labeled and transcribed by humans.

  • Unsupervised Classification of Polarimetric SAR Images by EM Algorithm

    Kamran-Ullah KHAN  Jian YANG  Weijie ZHANG  

     
    PAPER-Sensing

      Vol:
    E90-B No:12
      Page(s):
    3632-3642

    In this paper, the expectation maximization (EM) algorithm is used for unsupervised classification of polarimetric synthetic aperture radar (SAR) images. The EM algorithm provides an estimate of the parameters of the underlying probability distribution functions (pdf's) for each class. The feature vector is 9-dimensional, consisting of the six magnitudes and three angles of the elements of a coherency matrix. Each of the elements of the feature vector is assigned a specific parametric pdf. In this work, all the features are assumed to be statistically independent. Then we present a two-stage unsupervised clustering procedure. The EM algorithm is first run for a few iterations to obtain an initial partition of, for example, four clusters. A randomly selected sample of, for example, 2% of the pixels of the polarimetric SAR image may be used for unsupervised training. In the second stage, the EM algorithm may be run again to reclassify the first stage clusters into smaller sub-clusters. Each cluster from the first stage will be processed separately in the second stage. This approach makes further classification possible as shown in the results. The training cost is also reduced as the number of feature vectors in a specific cluster is much smaller than in the whole image.

  • Reducing Computation Time of the Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics

    Randy GOMEZ  Tomoki TODA  Hiroshi SARUWATARI  Kiyohiro SHIKANO  

     
    PAPER-Speech and Hearing

      Vol:
    E90-D No:2
      Page(s):
    554-561

    In real-time speech recognition applications, there is a need to implement a fast and reliable adaptation algorithm. We propose a method to reduce adaptation time of the rapid unsupervised speaker adaptation based on HMM-Sufficient Statistics. We use only a single arbitrary utterance without transcriptions in selecting the N-best speakers' Sufficient Statistics created offline to provide data for adaptation to a target speaker. Further reduction of N-best implies a reduction in adaptation time. However, it degrades recognition performance due to insufficiency of data needed to robustly adapt the model. Linear interpolation of the global HMM-Sufficient Statistics offsets this negative effect and achieves a 50% reduction in adaptation time without compromising the recognition performance. Furthermore, we compared our method with Vocal Tract Length Normalization (VTLN), Maximum A Posteriori (MAP) and Maximum Likelihood Linear Regression (MLLR). Moreover, we tested in office, car, crowd and booth noise environments in 10 dB, 15 dB, 20 dB and 25 dB SNRs.

  • Pruning-Based Unsupervised Segmentation for Korean

    In-Su KANG  Seung-Hoon NA  Jong-Hyeok LEE  

     
    PAPER-Natural Language Processing

      Vol:
    E89-D No:10
      Page(s):
    2670-2677

    Compound noun segmentation is a key component for Korean language processing. Supervised approaches require some types of human intervention such as maintaining lexicons, manually segmenting the corpora, or devising heuristic rules. Thus, they suffer from the unknown word problem, and cannot distinguish domain-oriented or corpus-directed segmentation results from the others. These problems can be overcome by unsupervised approaches that employ segmentation clues obtained purely from a raw corpus. However, most unsupervised approaches require tuning of empirical parameters or learning of the statistical dictionary. To develop a tuning-less, learning-free unsupervised segmentation algorithm, this study proposes a pruning-based unsupervised technique that eliminates unhelpful segmentation candidates. In addition, unlike previous unsupervised methods that have relied on purely character-based segmentation clues, this study utilizes word-based segmentation clues. Experimental evaluations show that the pruning scheme is very effective for unsupervised segmentation of Korean compound nouns, and the use of word-based prior knowledge enables better segmentation accuracy. This study also shows that the proposed algorithm performs competitively with or better than other unsupervised methods.

  • Unsupervised and Semi-Supervised Extraction of Clusters from Hypergraphs

    Weiwei DU  Kohei INOUE  Kiichi URAHAMA  

     
    LETTER-Biological Engineering

      Vol:
    E89-D No:7
      Page(s):
    2315-2318

    We extend a graph spectral method for extracting clusters from graphs representing pairwise similarity between data to hypergraph data with hyperedges denoting higher order similarity between data. Our method is robust to noisy outlier data and the number of clusters can be easily determined. The unsupervised method extracts clusters sequentially in the order of the majority of clusters. We derive from the unsupervised algorithm a semi-supervised one which can extract any cluster irrespective of its majority. The performance of those methods is exemplified with synthetic toy data and real image data.

  • Improving Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics in Noisy Environments Using Multi-Template Models

    Randy GOMEZ  Akinobu LEE  Tomoki TODA  Hiroshi SARUWATARI  Kiyohiro SHIKANO  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    998-1005

    This paper describes a method of using multi-template unsupervised speaker adaptation based on HMM-Sufficient Statistics to improve the adaptation performance while keeping adaptation time within a few seconds with just one arbitrary utterance. This adaptation scheme is mainly composed of two processes. The first part is done offline and involves the training of multiple class-dependent acoustic models and the creation of speakers' HMM-Sufficient Statistics based on gender and age. The second part is performed online, where adaptation begins using the single utterance of a test speaker. From this utterance, the system will classify the speaker's class and consequently select the N-best neighbor speakers close to the utterance using Gaussian Mixture Models (GMM). The classified speakers' class template model is then adopted as a base model. From this template model, the adapted model is rapidly constructed using the N-best neighbor speakers' HMM-Sufficient Statistics. Experiments in noisy environment conditions with 20 dB, 15 dB and 10 dB SNR office, crowd, booth, and car noise are performed. The proposed multi-template method achieved 89.5% word accuracy rate compared with 88.1% of the conventional single-template method, while the baseline recognition rate without adaptation is 86.4%. Moreover, experiments using Vocal Tract Length Normalization (VTLN) and supervised Maximum Likelihood Linear Regression (MLLR) are also compared.

  • An Unsupervised Speaker Adaptation Method for Lecture-Style Spontaneous Speech Recognition Using Multiple Recognition Systems

    Seiichi NAKAGAWA  Tomohiro WATANABE  Hiromitsu NISHIZAKI  Takehito UTSURO  

     
    PAPER-Spoken Language Systems

      Vol:
    E88-D No:3
      Page(s):
    463-471

    This paper describes an accurate unsupervised speaker adaptation method for lecture-style spontaneous speech recognition using multiple LVCSR systems. In an unsupervised speaker adaptation framework, the improvement of recognition performance by adapting acoustic models depends remarkably on the accuracy of labels such as phonemes and syllables. Therefore, extraction of the adaptation data guided by a confidence measure is effective for unsupervised adaptation. In this paper, we looked for the high-confidence portions based on the agreement between two LVCSR systems, adapted acoustic models using the portions with highly accurate labels attached, and then improved the recognition accuracy. We applied our method to the Corpus of Spontaneous Japanese (CSJ) and the method improved the recognition rate by about 2.1% in comparison with a traditional method.

  • Unsupervised Word-Sense Disambiguation Using Bilingual Comparable Corpora

    Hiroyuki KAJI  Yasutsugu MORIMOTO  

     
    PAPER-Natural Language Processing

      Vol:
    E88-D No:2
      Page(s):
    289-301

    An unsupervised method for word-sense disambiguation using bilingual comparable corpora was developed. First, it extracts word associations, i.e., statistically significant pairs of associated words, from the corpus of each language. Then, it aligns word associations by consulting a bilingual dictionary and calculates correlation between senses of a target polysemous word and its associated words, which can be regarded as clues for identifying the sense of the target word. To overcome the problem of disparity of topical coverage between corpora of the two languages as well as the problem of ambiguity in word-association alignment, an algorithm for iteratively calculating a sense-vs.-clue correlation matrix for each target word was devised. Word-sense disambiguation for each instance of the target word is done by selecting the sense that maximizes the score, i.e., a weighted sum of the correlations between each sense and clues appearing in the context of the instance. An experiment using Wall Street Journal and Nihon Keizai Shimbun corpora together with the EDR bilingual dictionary showed that the new method has promising performance; namely, the F-measure of its sense selection was 74.6% compared to a baseline of 62.8%. The developed method will possibly be extended into a fully unsupervised method that features automatic division and definition of word senses.
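The sense-selection step described in the abstract above (choose the sense that maximizes a weighted sum of sense-clue correlations over the clues in the context) can be sketched as follows. The correlation values, the clue weights, and all names are hypothetical. The abstract does not say how the weights or the correlation matrix are computed, so both are taken as given inputs, with the weights defaulting to 1.0.

```python
def select_sense(correlation, context_clues, weights=None):
    """Pick the sense with the highest weighted sum of clue correlations.

    correlation: dict mapping sense -> {clue: correlation value}.
    context_clues: clues observed in the context of one target-word instance.
    weights: optional dict mapping clue -> weight (assumed to default to 1.0).
    """
    weights = weights or {}
    scores = {}
    for sense, clue_correlations in correlation.items():
        scores[sense] = sum(
            weights.get(clue, 1.0) * clue_correlations.get(clue, 0.0)
            for clue in context_clues
        )
    best_sense = max(scores, key=scores.get)
    return best_sense, scores


# Toy sense-vs.-clue correlation matrix for the polysemous word "bank".
toy_correlation = {
    "bank/finance": {"money": 0.9, "river": 0.1},
    "bank/shore": {"money": 0.05, "river": 0.8},
}
chosen, sense_scores = select_sense(toy_correlation, ["river", "water"])
```

Clues missing from the matrix, such as "water" here, simply contribute zero to every sense.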

  • Density-Based Spam Detector

    Kenichi YOSHIDA  Fuminori ADACHI  Takashi WASHIO  Hiroshi MOTODA  Teruaki HOMMA  Akihiro NAKASHIMA  Hiromitsu FUJIKAWA  Katsuyuki YAMAZAKI  

     
    PAPER-Internet Systems

      Vol:
    E87-D No:12
      Page(s):
    2678-2688

    The volume of mass unsolicited electronic mail, often known as spam, has recently increased enormously and has become a serious threat not only to the Internet but also to society. This paper proposes a new spam detection method which uses document space density information. Although the proposed method requires extensive e-mail traffic to acquire the necessary information, it can achieve perfect detection (i.e., both recall and precision are 100%) under practical conditions. A direct-mapped cache method contributes to the handling of over 13,000 e-mail messages per second. Experimental results, obtained using over 50 million actual e-mail messages, are also reported in this paper.

  • A Simple Learning Algorithm for Network Formation Based on Growing Self-Organizing Maps

    Hiroki SASAMURA  Toshimichi SAITO  Ryuji OHTA  

     
    LETTER-Nonlinear Problems

      Vol:
    E87-A No:10
      Page(s):
    2807-2810

    This paper presents a simple learning algorithm for network formation. The algorithm is based on self-organizing maps with growing cell structures and can adapt to input data which correspond to nodes of the network. In basic numerical experiments, when a parameter is suitably selected, our algorithm can generate a network having a small-world-like structure. Such a network structure appears in some natural networks and has advantages in practical systems.

  • Global and Local Feature Extraction by Natural Elastic Nets

    Jiann-Ming WU  Zheng-Han LIN  

     
    LETTER-Pattern Recognition

      Vol:
    E87-D No:9
      Page(s):
    2267-2271

    This work explores generative models of handwritten digit images using natural elastic nets. The analysis aims to extract global features as well as distributed local features of handwritten digits. These features are expected to form a basis that is significant for discriminant analysis of handwritten digits and related analysis of character images or natural images.
