The search functionality is under construction.

Keyword Search Result

[Keyword] clustering(170hit)

101-120hit(170hit)

  • Distributed Noise Generation for Density Estimation Based Clustering without Trusted Third Party

    Chunhua SU  Feng BAO  Jianying ZHOU  Tsuyoshi TAKAGI  Kouichi SAKURAI  

     
    LETTER

    Vol: E92-A  No: 8  Page(s): 1868-1871

    The rapid growth of the Internet provides people with tremendous opportunities for data collection, knowledge discovery and cooperative computation. However, it also brings the problem of sensitive information leakage. Both individuals and enterprises may suffer from massive data collection and information retrieval by distrusted parties. In this paper, we propose a privacy-preserving protocol for distributed kernel density estimation-based clustering. Our scheme applies the random data perturbation (RDP) technique and verifiable secret sharing to solve the security problem of distributed kernel density estimation in [4], which assumed a mediating party to help in the computation.

  • An Accurate Scene Segmentation Method Based on Graph Analysis Using Object Matching and Audio Feature

    Makoto YAMAMOTO  Miki HASEYAMA  

     
    PAPER-Speech/Audio

    Vol: E92-A  No: 8  Page(s): 1883-1891

    A method for accurate scene segmentation using two kinds of directed graph obtained by object matching and audio features is proposed. Generally, in audiovisual materials, such as broadcast programs and movies, there are repeated appearances of similar shots that include frames of the same background, object or place, and such shots are included in a single scene. Many scene segmentation methods based on this idea have been proposed; however, since they use color information as visual features, they cannot provide accurate scene segmentation results if the color features change in different shots for which frames include the same object due to camera operations such as zooming and panning. In order to solve this problem, scene segmentation by the proposed method is realized by using two novel approaches. In the first approach, object matching is performed between two frames that are each included in different shots. By using these matching results, repeated appearances of shots for which frames include the same object can be successfully found and represented as a directed graph. The proposed method also generates another directed graph that represents the repeated appearances of shots with similar audio features in the second approach. By combined use of these two directed graphs, degradation of scene segmentation accuracy, which results from using only one kind of graph, can be avoided in the proposed method and thereby accurate scene segmentation can be realized. Experimental results performed by applying the proposed method to actual broadcast programs are shown to verify the effectiveness of the proposed method.

  • Fuzzy Entropy Based Fuzzy c-Means Clustering with Deterministic and Simulated Annealing Methods

    Makoto YASUDA  Takeshi FURUHASHI  

     
    PAPER-Computation and Computational Models

    Vol: E92-D  No: 6  Page(s): 1232-1239

    This article explains how to apply the deterministic annealing (DA) and simulated annealing (SA) methods to fuzzy entropy based fuzzy c-means clustering. By regularizing the fuzzy c-means method with fuzzy entropy, a membership function similar to the Fermi-Dirac distribution function, well known in statistical mechanics, is obtained, and, while optimizing its parameters by SA, the minimum of the Helmholtz free energy for fuzzy c-means clustering is searched by DA. Numerical experiments are performed and the obtained results indicate that this combinatorial algorithm of SA and DA can represent various cluster shapes and divide data more properly and stably than the standard single DA algorithm.
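For reference, the Fermi-Dirac distribution function that the abstract compares the membership function to has the standard form below. The second line is only an illustrative guess at what a membership function "similar to" it might look like; the symbols u_{ik}, d_{ik}, \alpha_k and \beta are assumptions introduced here, not notation taken from the paper.

```latex
% Fermi-Dirac distribution (standard form from statistical mechanics)
f(\varepsilon) = \frac{1}{\exp\!\left(\frac{\varepsilon - \mu}{k_B T}\right) + 1}

% Hypothetical membership of data point k in cluster i with the same shape;
% d_{ik} is a distance to the cluster center, \alpha_k and \beta are assumed parameters
u_{ik} = \frac{1}{\exp\!\left(\alpha_k + \beta\, d_{ik}\right) + 1}
```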

  • Differentiating Honeycombed Images from Normal HRCT Lung Images

    Aamir Saeed MALIK  Tae-Sun CHOI  

     
    LETTER-Biological Engineering

    Vol: E92-D  No: 5  Page(s): 1218-1221

    A classification method is presented for differentiating honeycombed High Resolution Computed Tomographic (HRCT) images from normal HRCT images. For successful classification of honeycombed HRCT images, a complete set of methods and algorithms is described from segmentation to extraction to feature selection to classification. Wavelet energy is selected as a feature for classification using K-means clustering. Test data of 20 patients are used to validate the method.

  • Security and Correctness Analysis on Privacy-Preserving k-Means Clustering Schemes

    Chunhua SU  Feng BAO  Jianying ZHOU  Tsuyoshi TAKAGI  Kouichi SAKURAI  

     
    LETTER-Cryptography and Information Security

    Vol: E92-A  No: 4  Page(s): 1246-1250

    Due to the fast development of the Internet and related IT technologies, it is becoming increasingly easy to access large amounts of data. k-means clustering is a powerful and frequently used technique in data mining. Many research papers on privacy-preserving k-means clustering have been published. In this paper, we analyze the existing privacy-preserving k-means clustering schemes based on cryptographic techniques. We show that those schemes cause privacy breaches and cannot output correct results due to faults in the protocol construction. Furthermore, we analyze our proposal as an option to mitigate such problems, although it still leaks intermediate information during the computation.

  • Training Set Selection for Building Compact and Efficient Language Models

    Keiji YASUDA  Hirofumi YAMAMOTO  Eiichiro SUMITA  

     
    PAPER-Natural Language Processing

    Vol: E92-D  No: 3  Page(s): 506-511

    For statistical language model training, target domain matched corpora are required. However, training corpora sometimes include both target domain matched and unmatched sentences. In such a case, training set selection is effective for both reducing model size and improving model performance. In this paper, a training set selection method for statistical language model training is described. The method provides two advantages for training a language model. One is its capacity to improve the language model performance, and the other is its capacity to reduce computational loads for the language model. The method has four steps. 1) Sentence clustering is applied to all available corpora. 2) Language models are trained on each cluster. 3) Perplexity on the development set is calculated using the language models. 4) For the final language model training, we use the clusters whose language models yield low perplexities. The experimental results indicate that the language model trained on the data selected by our method gives lower perplexity on an open test set than a language model trained on all available corpora.
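Steps 2) to 4) of the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the sentence clustering of step 1) has already been done, and it substitutes an add-one-smoothed unigram model for whatever language model the paper uses. The function names `train_unigram`, `perplexity` and `select_clusters` are hypothetical.

```python
import math
from collections import Counter

def train_unigram(sentences):
    """Step 2 (simplified): count word frequencies over tokenized sentences."""
    counts = Counter(w for s in sentences for w in s)
    return counts, sum(counts.values())

def perplexity(model, sentences, vocab_size):
    """Step 3: perplexity of an add-one-smoothed unigram model on `sentences`."""
    counts, total = model
    log_prob, n = 0.0, 0
    for s in sentences:
        for w in s:
            p = (counts[w] + 1) / (total + vocab_size)  # add-one smoothing (assumption)
            log_prob += math.log(p)
            n += 1
    return math.exp(-log_prob / n)

def select_clusters(clusters, dev_set, keep):
    """Step 4: indices of the `keep` clusters with the lowest dev-set perplexity."""
    vocab = {w for c in clusters for s in c for w in s}
    vocab |= {w for s in dev_set for w in s}
    scored = [
        (perplexity(train_unigram(c), dev_set, len(vocab)), i)
        for i, c in enumerate(clusters)
    ]
    return [i for _, i in sorted(scored)[:keep]]

# Toy data: cluster 0 is travel-like, cluster 1 is finance-like.
clusters = [
    [["book", "a", "flight"], ["a", "flight", "to", "tokyo"]],
    [["stock", "prices", "fell"], ["prices", "rose"]],
]
dev_set = [["a", "flight", "to", "osaka"]]
selected = select_clusters(clusters, dev_set, keep=1)
```

The final model would then be trained on the sentences of the selected clusters only, which is where the size reduction described in the abstract would come from.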

  • Image Recommendation Algorithm Using Feature-Based Collaborative Filtering

    Deok-Hwan KIM  

     
    PAPER-Contents Technology and Web Information Systems

    Vol: E92-D  No: 3  Page(s): 413-421

    As the multimedia contents market continues its rapid expansion, the amount of image content used in mobile phone services, digital libraries, and catalog services is increasing remarkably. In spite of this rapid growth, users experience high levels of frustration when searching for the desired image. Even though new images are profitable to the service providers, traditional collaborative filtering methods cannot recommend them. To solve this problem, in this paper, we propose a feature-based collaborative filtering (FBCF) method that reflects the user's most recent preference by representing his or her purchase sequence in the visual feature space. The proposed approach represents the images that have been purchased in the past as feature clusters in the multi-dimensional feature space and then selects neighbors by using an inter-cluster distance function between their feature clusters. Various experiments using real image data demonstrate that the proposed approach provides higher quality recommendations and better performance than typical collaborative filtering and content-based filtering techniques.

  • Component Reduction for Gaussian Mixture Models

    Kumiko MAEBASHI  Nobuo SUEMATSU  Akira HAYASHI  

     
    PAPER-Pattern Recognition

    Vol: E91-D  No: 12  Page(s): 2846-2853

    The mixture modeling framework is widely used in many applications. In this paper, we propose a component reduction technique that collapses a Gaussian mixture model into a Gaussian mixture with fewer components. The EM (Expectation-Maximization) algorithm is usually used to fit a mixture model to data. Our algorithm is derived by extending mixture model learning using the EM algorithm. In this extension, a difficulty arises from the fact that some crucial quantities cannot be evaluated analytically. We overcome this difficulty by introducing an effective approximation. The effectiveness of our algorithm is demonstrated by applying it to a simple synthetic component reduction task and a phoneme clustering problem.
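To make the idea of "fewer components" concrete, the sketch below merges two one-dimensional Gaussian components into one by matching the weight, mean and variance of the pair. This is a standard moment-preserving merge and is not the EM-based algorithm proposed in the paper; the function name `merge_components` is hypothetical.

```python
def merge_components(w1, mu1, var1, w2, mu2, var2):
    """Merge two weighted 1-D Gaussians into one with the same total weight,
    mean and variance as the two-component mixture they formed."""
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    # Second moment of the pair, minus the squared merged mean.
    var = (w1 * (var1 + mu1 ** 2) + w2 * (var2 + mu2 ** 2)) / w - mu ** 2
    return w, mu, var

# Two equal-weight unit-variance components centered at 0 and 2.
w, mu, var = merge_components(0.5, 0.0, 1.0, 0.5, 2.0, 1.0)
```

The merged variance is larger than either input variance because it also absorbs the spread between the two means.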

  • Robust Speaker Clustering Using Affinity Propagation

    Xiang ZHANG  Ping LU  Hongbin SUO  Qingwei ZHAO  Yonghong YAN  

     
    LETTER-Speech and Hearing

    Vol: E91-D  No: 11  Page(s): 2739-2741

    In this letter, a recently proposed clustering algorithm named affinity propagation is introduced for the task of speaker clustering. This novel algorithm exhibits fast execution speed and finds clusters with low error. However, experiments show that the speaker purity of affinity propagation is not satisfactory. Thus, we propose a hybrid approach that combines affinity propagation with agglomerative hierarchical clustering to improve the clustering performance. Experiments show that, compared with traditional agglomerative hierarchical clustering, the hybrid method achieves better performance on the test corpora.

  • Adaptive Routing Protocol with Energy Efficiency and Event Clustering for Wireless Sensor Networks

    Vinh TRAN QUANG  Takumi MIYOSHI  

     
    PAPER-Wireless Sensor Networks

    Vol: E91-B  No: 9  Page(s): 2795-2805

    Wireless sensor networks (WSNs) are a promising approach for a variety of applications. Designing a routing protocol for WSNs is very challenging because it should be simple, scalable, energy-efficient, and robust enough to deal with a very large number of nodes, and also able to reconfigure itself dynamically in response to node failures and changes in the network topology. Recently, many researchers have focused on developing hierarchical protocols for WSNs. However, most protocols in the literature cannot scale well to large sensor networks and are difficult to apply in real applications. In this paper, we propose a novel adaptive routing protocol for WSNs called ARPEES. The main design features of the proposed method are energy efficiency, dynamic event clustering, and multi-hop relay that considers the trade-off between the residual energy of relay nodes and the distance from the relay node to the base station. With a distributed approach and light overhead traffic, we spread the energy consumption required for aggregating and relaying data across different sensor nodes to prolong the lifetime of the whole network. In this method, we use energy and distance as the parameters of the proposed function to select relay nodes and finally select the optimal path among cluster heads, relay nodes and the base station. The simulation results show that our routing protocol achieves better performance than previous routing protocols.

  • Energy Efficient Online Routing Algorithm for QoS-Sensitive Sensor Networks

    Sungwook KIM  Sungyong PARK  Sooyong PARK  Sungchun KIM  

     
    LETTER-Network

    Vol: E91-B  No: 7  Page(s): 2401-2404

    In this letter, we propose a new energy efficient online routing algorithm for QoS-sensitive sensor networks. An important design principle underlying our algorithm is online decision making based on real time network estimation. This online approach gives adaptability and flexibility to solve a wide range of control tasks for efficient network performance. In addition, our distributed control paradigm is practical for real sensor network management. Simulation results indicate the superior performance of our algorithm in balancing energy efficiency and QoS provisioning.

  • Bilingual Cluster Based Models for Statistical Machine Translation

    Hirofumi YAMAMOTO  Eiichiro SUMITA  

     
    PAPER-Applications

    Vol: E91-D  No: 3  Page(s): 588-597

    We propose a domain specific model for statistical machine translation. It is well-known that domain specific language models perform well in automatic speech recognition. We show that domain specific language and translation models also benefit statistical machine translation. However, there are two problems with using domain specific models. The first is the data sparseness problem. We employ an adaptation technique to overcome this problem. The second issue is domain prediction. In order to perform adaptation, the domain must be provided; however, in many cases, the domain is not known or changes dynamically. For these cases, not only the translation target sentence but also the domain must be predicted. This paper focuses on the domain prediction problem for statistical machine translation. In the proposed method, a bilingual training corpus is automatically clustered into sub-corpora. Each sub-corpus is deemed to be a domain. The domain of a source sentence is predicted by using its similarity to the sub-corpora. The predicted domain (sub-corpus) specific language and translation models are then used for the translation decoding. This approach gave an improvement of 2.7 in BLEU score on the IWSLT05 Japanese to English evaluation corpus (improving the score from 52.4 to 55.1). This is a substantial gain and indicates the validity of the proposed bilingual cluster based models.

  • Development of a Mandarin-English Bilingual Speech Recognition System for Real World Music Retrieval

    Qingqing ZHANG  Jielin PAN  Yang LIN  Jian SHAO  Yonghong YAN  

     
    PAPER-Acoustic Modeling

    Vol: E91-D  No: 3  Page(s): 514-521

    In recent decades, there has been a great deal of research into the problem of bilingual speech recognition - to develop a recognizer that can handle inter- and intra-sentential language switching between two languages. This paper presents our recent work on the development of a grammar-constrained, Mandarin-English bilingual Speech Recognition System (MESRS) for real world music retrieval. Two of the main difficulties in handling bilingual speech recognition systems for real world applications are tackled in this paper. One is to balance the performance and the complexity of the bilingual speech recognition system; the other is to effectively deal with the matrix language accents in the embedded language. In order to process intra-sentential language switching and reduce the amount of data required to robustly estimate statistical models, a compact single set of bilingual acoustic models derived by phone set merging and clustering is developed instead of using two separate monolingual models for each language. In our study, a novel Two-pass phone clustering method based on Confusion Matrix (TCM) is presented and compared with the log-likelihood measure method. Experiments show that TCM achieves better performance. Since the native language of potential system users is Mandarin, which is regarded as the matrix language in our application, their pronunciations of English as the embedded language usually contain Mandarin accents. In order to deal with the matrix language accents in the embedded language, different non-native adaptation approaches are investigated. Experiments show that the model retraining method outperforms other common adaptation methods such as Maximum A Posteriori (MAP). With the effective incorporation of the phone clustering and non-native adaptation approaches, the Phrase Error Rate (PER) of MESRS for English utterances was reduced by 24.47% relative to the baseline monolingual English system, while the PER on Mandarin utterances was comparable to that of the baseline monolingual Mandarin system. For bilingual utterances, a 22.37% relative PER reduction was achieved.

  • Image Segmentation Using Fuzzy Clustering with Spatial Constraints Based on Markov Random Field via Bayesian Theory

    Xiaohe LI  Taiyi ZHANG  Zhan QU  

     
    PAPER-Image Processing

    Vol: E91-A  No: 3  Page(s): 723-729

    Image segmentation is an essential processing step for many image analysis applications. In this paper, a novel image segmentation algorithm using fuzzy C-means clustering (FCM) with spatial constraints based on Markov random field (MRF) via Bayesian theory is proposed. Because it disregards spatial constraint information, the FCM algorithm fails to segment images corrupted by noise. In order to improve the robustness of FCM to noise, a powerful model for the membership functions that incorporates local correlation is given by an MRF defined through a Gibbs function. Spatial information is then incorporated into the FCM by Bayesian theory. Therefore, the proposed algorithm has the advantages of both FCM and MRF, and is robust to noise. Experimental results on synthetic and real-world images are given to demonstrate the robustness and validity of the proposed algorithm.

  • Automatic Language Identification with Discriminative Language Characterization Based on SVM

    Hongbin SUO  Ming LI  Ping LU  Yonghong YAN  

     
    PAPER-Language Identification

    Vol: E91-D  No: 3  Page(s): 567-575

    Robust automatic language identification (LID) is the task of identifying the language from a short utterance spoken by an unknown speaker. The mainstream approaches include parallel phone recognition language modeling (PPRLM), support vector machine (SVM) and the general Gaussian mixture models (GMMs). These systems map the cepstral features of spoken utterances into high level scores by classifiers. In this paper, in order to increase the dimension of the score vector and alleviate the inter-speaker variability within the same language, multiple data groups based on supervised speaker clustering are employed to generate the discriminative language characterization score vectors (DLCSV). The back-end SVM classifiers are used to model the probability distribution of each target language in the DLCSV space. Finally, the output scores of the back-end classifiers are calibrated by a pair-wise posterior probability estimation (PPPE) algorithm. The proposed language identification frameworks are evaluated on the 2003 NIST Language Recognition Evaluation (LRE) databases, and the experiments show that the system described in this paper produces results comparable to those of existing systems. In particular, the SVM framework achieves an equal error rate (EER) of 4.0% in the 30-second task and outperforms the state-of-the-art systems by more than 30% relative error reduction. In addition, the proposed PPRLM and GMM algorithms achieve EERs of 5.1% and 5.0%, respectively.

  • New Inter-Cluster Proximity Index for Fuzzy c-Means Clustering

    Fan LI  Shijin DAI  Qihe LIU  Guowei YANG  

     
    LETTER-Data Mining

    Vol: E91-D  No: 2  Page(s): 363-366

    This letter presents a new inter-cluster proximity index for fuzzy partitions obtained from the fuzzy c-means algorithm. It is defined as the average proximity of all possible pairs of clusters. The proximity of each pair of clusters is determined by the overlap and the separation of the two clusters. The former is quantified by using concepts of Fuzzy Rough sets theory and the latter by computing the distance between cluster centroids. Experimental results indicate the efficiency of the proposed index.

  • GDME: Grey Relational Clustering Applied to a Clock Tree Construction with Zero Skew and Minimal Delay

    Chia-Chun TSAI  Jan-Ou WU  Trong-Yen LEE  

     
    PAPER-VLSI Design Technology and CAD

    Vol: E91-A  No: 1  Page(s): 365-374

    This study has demonstrated that the clock tree construction in an SoC should be expanded to consider the intrinsic delay and skew of each IP's clock sink. A novel algorithm, called GDME, is proposed to combine grey relational clustering and the DME approach for solving the problem of clock tree construction. Grey relational analysis can cluster the best pair of clock sinks and guide the tapping point search of a DME algorithm for constructing a clock tree with zero skew and minimal delay. Experimentally, the proposed algorithm always obtains an RC- or RLC-based clock tree with zero skew and minimal delay for all the test cases and benchmarks. Experimental results demonstrate that the GDME improves up to 3.74% for total average in terms of total wire length compared with other DME algorithms. Furthermore, our results for the zero-skew RLC-based clock trees compared with Hspice are 0.017% and 0.2% lower for absolute average in terms of skew and delay, respectively.

  • Optimizing the Number of Clusters in Multi-Hop Wireless Sensor Networks

    Namhoon KIM  Soohee HAN  Wook Hyun KWON  

     
    LETTER-Network

    Vol: E91-B  No: 1  Page(s): 318-321

    In this paper, an analytical model is proposed to compute the optimal number of clusters that minimizes the energy consumption of multi-hop wireless sensor networks. In the proposed analytical model, the average hop count between a general node (GN) and its nearest clusterhead (CH) is obtained assuming a uniform distribution. How the position of the sink impacts the optimal number of clusters is also discussed. A numerical simulation is carried out to validate the proposed model in various network environments.

  • RK-Means Clustering: K-Means with Reliability

    Chunsheng HUA  Qian CHEN  Haiyuan WU  Toshikazu WADA  

     
    PAPER-Image Recognition, Computer Vision

    Vol: E91-D  No: 1  Page(s): 96-104

    This paper presents an RK-means clustering algorithm which is developed for reliable data grouping by introducing a new reliability evaluation to the K-means clustering algorithm. The conventional K-means clustering algorithm has two shortfalls: 1) the clustering result will become unreliable if the assumed number of clusters is incorrect; 2) during the update of a cluster center, all the data points belonging to that cluster are used equally without considering how distant they are from the cluster center. In this paper, we introduce a new reliability evaluation to the K-means clustering algorithm by considering the triangular relationship among each data point and its two nearest cluster centers. We applied the proposed algorithm to track objects in video sequences and confirmed its effectiveness and advantages.
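The second shortfall and the "triangular relationship" can be illustrated with a reliability-weighted center update. The abstract does not give the reliability formula, so the ratio used below (distance between the two nearest centers divided by the sum of the point's distances to them) is purely an assumption for illustration, and the names `reliability` and `weighted_update` are hypothetical. It requires Python 3.8+ for `math.dist`.

```python
import math

def reliability(x, centers):
    """Assumed reliability of x: gap between its two nearest centers divided by
    the sum of its distances to them. Lies in [0, 1] by the triangle inequality."""
    c1, c2 = sorted(centers, key=lambda c: math.dist(x, c))[:2]
    denom = math.dist(x, c1) + math.dist(x, c2)
    return math.dist(c1, c2) / denom if denom > 0 else 1.0

def weighted_update(points, centers):
    """One center update: assign each point to its nearest center, then take a
    reliability-weighted mean instead of the plain K-means mean."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    totals = [0.0] * len(centers)
    for x in points:
        k = min(range(len(centers)), key=lambda i: math.dist(x, centers[i]))
        r = reliability(x, centers)
        totals[k] += r
        for d in range(dim):
            sums[k][d] += r * x[d]
    # Keep the old center if no weight was assigned to it.
    return [
        [s / t for s in row] if t > 0 else list(c)
        for row, t, c in zip(sums, totals, centers)
    ]

centers = [(0.0, 0.0), (4.0, 0.0)]
points = [(1.0, 0.0), (0.0, 3.0), (4.0, 0.0)]
new_centers = weighted_update(points, centers)
```

Under this assumed weight, the point (0, 3), which lies off to the side of both centers, contributes only half as much to its center as the point (1, 0).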

  • Fuzzy c-Means Algorithms for Data with Tolerance Based on Opposite Criterions

    Yuchi KANZAWA  Yasunori ENDO  Sadaaki MIYAMOTO  

     
    PAPER-Soft Computing

    Vol: E90-A  No: 10  Page(s): 2194-2202

    In this paper, two new clustering algorithms are proposed for data with some errors. In each of these algorithms, the error is interpreted as one of the decision variables -- called "tolerance" -- of a certain optimization problem, as in the previously proposed algorithms, but the tolerance is determined based on the criterion opposite to that of the corresponding previously proposed one. By applying each of our algorithms together with its corresponding previously proposed one, the reliability of the clustering result is discussed. The validity of the proposed algorithms is examined through some numerical experiments.
