IEICE global.ieice.org Site

Keyword Search Result

[Keyword] clustering(170hit)

101-120hit(170hit)

Distributed Noise Generation for Density Estimation Based Clustering without Trusted Third Party
Chunhua SU Feng BAO Jianying ZHOU Tsuyoshi TAKAGI Kouichi SAKURAI

LETTER

Vol:
E92-A No:8
Page(s):
1868-1871
The rapid growth of the Internet provides people with tremendous opportunities for data collection, knowledge discovery and cooperative computation. However, it also brings the problem of sensitive information leakage. Both individuals and enterprises may suffer from massive data collection and information retrieval by untrusted parties. In this paper, we propose a privacy-preserving protocol for distributed kernel density estimation-based clustering. Our scheme applies the random data perturbation (RDP) technique and verifiable secret sharing to solve the security problem of distributed kernel density estimation in [4], which assumed a mediating party to help in the computation.
An Accurate Scene Segmentation Method Based on Graph Analysis Using Object Matching and Audio Feature
Makoto YAMAMOTO Miki HASEYAMA

PAPER-Speech/Audio

Vol:
E92-A No:8
Page(s):
1883-1891
A method for accurate scene segmentation using two kinds of directed graph obtained by object matching and audio features is proposed. Generally, in audiovisual materials, such as broadcast programs and movies, there are repeated appearances of similar shots that include frames of the same background, object or place, and such shots are included in a single scene. Many scene segmentation methods based on this idea have been proposed; however, since they use color information as visual features, they cannot provide accurate scene segmentation results if the color features change in different shots for which frames include the same object due to camera operations such as zooming and panning. In order to solve this problem, scene segmentation by the proposed method is realized by using two novel approaches. In the first approach, object matching is performed between two frames that are each included in different shots. By using these matching results, repeated appearances of shots for which frames include the same object can be successfully found and represented as a directed graph. The proposed method also generates another directed graph that represents the repeated appearances of shots with similar audio features in the second approach. By combined use of these two directed graphs, degradation of scene segmentation accuracy, which results from using only one kind of graph, can be avoided in the proposed method and thereby accurate scene segmentation can be realized. Experimental results performed by applying the proposed method to actual broadcast programs are shown to verify the effectiveness of the proposed method.
Fuzzy Entropy Based Fuzzy c-Means Clustering with Deterministic and Simulated Annealing Methods
Makoto YASUDA Takeshi FURUHASHI

PAPER-Computation and Computational Models

Vol:
E92-D No:6
Page(s):
1232-1239
This article explains how to apply the deterministic annealing (DA) and simulated annealing (SA) methods to fuzzy entropy based fuzzy c-means clustering. By regularizing the fuzzy c-means method with fuzzy entropy, a membership function similar to the Fermi-Dirac distribution function, well known in statistical mechanics, is obtained, and, while optimizing its parameters by SA, the minimum of the Helmholtz free energy for fuzzy c-means clustering is searched by DA. Numerical experiments are performed and the obtained results indicate that this combinatorial algorithm of SA and DA can represent various cluster shapes and divide data more properly and stably than the standard single DA algorithm.
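The abstract does not give the membership function itself. As a hedged sketch, assuming the form usually associated with fuzzy entropy regularization (the symbols $u_{ik}$, $d_{ik}$, $\alpha_k$ and $\beta$ are notation introduced here for illustration, not taken from the listing), a Fermi-Dirac-like membership can be written as:

```latex
u_{ik} = \frac{1}{e^{\alpha_k + \beta d_{ik}} + 1}
```

Here $u_{ik}$ is taken to be the membership of data point $k$ in cluster $i$, $d_{ik}$ a (squared) distance between them, $\alpha_k$ a per-point parameter assumed to enforce the normalization of memberships, and $\beta$ a parameter that plays the role of an inverse temperature, in analogy with the Fermi-Dirac distribution.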
Differentiating Honeycombed Images from Normal HRCT Lung Images
Aamir Saeed MALIK Tae-Sun CHOI

LETTER-Biological Engineering

Vol:
E92-D No:5
Page(s):
1218-1221
A classification method is presented for differentiating honeycombed High Resolution Computed Tomographic (HRCT) images from normal HRCT images. For successful classification of honeycombed HRCT images, a complete set of methods and algorithms is described from segmentation to extraction to feature selection to classification. Wavelet energy is selected as a feature for classification using K-means clustering. Test data of 20 patients are used to validate the method.
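The pairing of a wavelet energy feature with K-means can be illustrated with a minimal sketch. It assumes a one-level Haar decomposition and a two-cluster, one-dimensional K-means; the function names `haar_detail_energy` and `two_means_1d`, the toy images, and the choice of wavelet are all hypothetical and not taken from the paper.

```python
def haar_detail_energy(img):
    """Mean energy of one-level Haar detail coefficients of a 2D list.

    Assumes both image dimensions are even. Each 2x2 block contributes
    three detail coefficients (row, column and diagonal differences).
    """
    energy, blocks = 0.0, 0
    for r in range(0, len(img), 2):
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            row_diff = (a + b - d - e) / 2.0  # top row minus bottom row
            col_diff = (a - b + d - e) / 2.0  # left column minus right column
            diag = (a - b - d + e) / 2.0      # diagonal difference
            energy += row_diff ** 2 + col_diff ** 2 + diag ** 2
            blocks += 1
    return energy / blocks


def two_means_1d(values, iters=20):
    """K-means with K=2 on scalar features; label 0 is the low-energy cluster."""
    lo, hi = min(values), max(values)  # initialise centers at the extremes
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        low = [v for v, lab in zip(values, labels) if lab == 0]
        high = [v for v, lab in zip(values, labels) if lab == 1]
        if low:
            lo = sum(low) / len(low)
        if high:
            hi = sum(high) / len(high)
    return labels


# Toy data: two flat images and two checkerboard (highly textured) images.
flat_a = [[10] * 4 for _ in range(4)]
flat_b = [[5] * 4 for _ in range(4)]
textured_a = [[20 * ((r + c) % 2) for c in range(4)] for r in range(4)]
textured_b = [[22 * ((r + c) % 2) for c in range(4)] for r in range(4)]
energies = [haar_detail_energy(i) for i in (flat_a, flat_b, textured_a, textured_b)]
labels = two_means_1d(energies)
```

Treating the high-energy cluster as the textured class is an assumption of this toy example; real HRCT data would need the segmentation and feature selection steps the abstract mentions.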
Security and Correctness Analysis on Privacy-Preserving k-Means Clustering Schemes
Chunhua SU Feng BAO Jianying ZHOU Tsuyoshi TAKAGI Kouichi SAKURAI

LETTER-Cryptography and Information Security

Vol:
E92-A No:4
Page(s):
1246-1250
Due to the fast development of the Internet and related IT technologies, it is becoming easier and easier to access large amounts of data. k-means clustering is a powerful and frequently used technique in data mining. Many research papers on privacy-preserving k-means clustering have been published. In this paper, we analyze the existing privacy-preserving k-means clustering schemes based on cryptographic techniques. We show that those schemes cause privacy breaches and cannot output correct results due to faults in the protocol construction. Furthermore, we analyze our proposal as an option to mitigate such problems, although it leaks intermediate information during the computation.
Training Set Selection for Building Compact and Efficient Language Models
Keiji YASUDA Hirofumi YAMAMOTO Eiichiro SUMITA

PAPER-Natural Language Processing

Vol:
E92-D No:3
Page(s):
506-511
For statistical language model training, target domain matched corpora are required. However, training corpora sometimes include both target domain matched and unmatched sentences. In such a case, training set selection is effective for both reducing model size and improving model performance. In this paper, a training set selection method for statistical language model training is described. The method provides two advantages for training a language model. One is its capacity to improve the language model performance, and the other is its capacity to reduce computational loads for the language model. The method has four steps. 1) Sentence clustering is applied to all available corpora. 2) Language models are trained on each cluster. 3) Perplexity on the development set is calculated using the language models. 4) For the final language model training, we use the clusters whose language models yield low perplexities. The experimental results indicate that the language model trained on the data selected by our method gives lower perplexity on an open test set than a language model trained on all available corpora.
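Steps 2) to 4) above can be sketched as follows. This is a minimal illustration, assuming the sentence clusters from step 1) are already given, and using an add-one smoothed unigram model in place of whatever language model the authors used. The names `train_unigram`, `perplexity`, `select_clusters` and the `keep` parameter are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences, vocab):
    """Add-one smoothed unigram model over a fixed vocabulary."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}


def perplexity(model, sentences):
    """Per-word perplexity of the model on whitespace-tokenised sentences."""
    log_prob, n_words = 0.0, 0
    for s in sentences:
        for w in s.split():
            log_prob += math.log(model[w])
            n_words += 1
    return math.exp(-log_prob / n_words)


def select_clusters(clusters, dev_set, keep):
    """Return indices of the `keep` clusters with the lowest dev-set perplexity."""
    vocab = {w for c in clusters for s in c for w in s.split()}
    vocab |= {w for s in dev_set for w in s.split()}
    ppl = [perplexity(train_unigram(c, vocab), dev_set) for c in clusters]
    ranked = sorted(range(len(clusters)), key=lambda i: ppl[i])
    return ranked[:keep], ppl


# Toy example: cluster 0 resembles the development set, cluster 1 does not.
clusters = [
    ["where is the station", "where is the hotel"],
    ["the stock price fell", "the bond price rose"],
]
dev_set = ["where is the airport"]
selected, ppl = select_clusters(clusters, dev_set, keep=1)
```

The final model would then be trained on the sentences of the selected clusters only; how many clusters to keep, or which perplexity threshold to apply, is a tuning choice the abstract does not specify.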
Image Recommendation Algorithm Using Feature-Based Collaborative Filtering
Deok-Hwan KIM

PAPER-Contents Technology and Web Information Systems

Vol:
E92-D No:3
Page(s):
413-421
As the multimedia contents market continues its rapid expansion, the amount of image content used in mobile phone services, digital libraries, and catalog services is increasing remarkably. In spite of this rapid growth, users experience high levels of frustration when searching for the desired image. Even though new images are profitable to the service providers, traditional collaborative filtering methods cannot recommend them. To solve this problem, in this paper, we propose a feature-based collaborative filtering (FBCF) method that reflects the user's most recent preference by representing his or her purchase sequence in the visual feature space. The proposed approach represents the images that have been purchased in the past as feature clusters in the multi-dimensional feature space and then selects neighbors by using an inter-cluster distance function between their feature clusters. Various experiments using real image data demonstrate that the proposed approach provides higher quality recommendations and better performance than typical collaborative filtering and content-based filtering techniques.
Component Reduction for Gaussian Mixture Models
Kumiko MAEBASHI Nobuo SUEMATSU Akira HAYASHI

PAPER-Pattern Recognition

Vol:
E91-D No:12
Page(s):
2846-2853
The mixture modeling framework is widely used in many applications. In this paper, we propose a component reduction technique that collapses a Gaussian mixture model into a Gaussian mixture with fewer components. The EM (Expectation-Maximization) algorithm is usually used to fit a mixture model to data. Our algorithm is derived by extending mixture model learning using the EM algorithm. In this extension, a difficulty arises from the fact that some crucial quantities cannot be evaluated analytically. We overcome this difficulty by introducing an effective approximation. The effectiveness of our algorithm is demonstrated by applying it to a simple synthetic component reduction task and a phoneme clustering problem.
Robust Speaker Clustering Using Affinity Propagation
Xiang ZHANG Ping LU Hongbin SUO Qingwei ZHAO Yonghong YAN

LETTER-Speech and Hearing

Vol:
E91-D No:11
Page(s):
2739-2741
In this letter, a recently proposed clustering algorithm named affinity propagation is introduced for the task of speaker clustering. This novel algorithm exhibits fast execution speed and finds clusters with low error. However, experiments show that the speaker purity of affinity propagation is not satisfactory. Thus, we propose a hybrid approach that combines affinity propagation with agglomerative hierarchical clustering to improve the clustering performance. Experiments show that, compared with traditional agglomerative hierarchical clustering, the hybrid method achieves better performance on the test corpora.
Adaptive Routing Protocol with Energy Efficiency and Event Clustering for Wireless Sensor Networks
Vinh TRAN QUANG Takumi MIYOSHI

PAPER-Wireless Sensor Networks

Vol:
E91-B No:9
Page(s):
2795-2805
The wireless sensor network (WSN) is a promising approach for a variety of applications. Designing a routing protocol for WSNs is very challenging because it should be simple, scalable, energy-efficient, and robust enough to deal with a very large number of nodes, and also able to reconfigure itself dynamically in response to node failures and changes in the network topology. Recently, many researchers have focused on developing hierarchical protocols for WSNs. However, most protocols in the literature do not scale well to large sensor networks and are difficult to apply in real applications. In this paper, we propose a novel adaptive routing protocol for WSNs called ARPEES. The main design features of the proposed method are energy efficiency, dynamic event clustering, and multi-hop relay that considers the trade-off between the residual energy of relay nodes and the distance from the relay node to the base station. With a distributed approach that incurs light overhead traffic, we spread the energy consumption required for aggregating and relaying data across different sensor nodes to prolong the lifetime of the whole network. In this method, we use energy and distance as the parameters of the proposed function to select relay nodes and, finally, to select the optimal path among cluster heads, relay nodes and the base station. The simulation results show that our routing protocol achieves better performance than previous routing protocols.
Energy Efficient Online Routing Algorithm for QoS-Sensitive Sensor Networks
Sungwook KIM Sungyong PARK Sooyong PARK Sungchun KIM

LETTER-Network

Vol:
E91-B No:7
Page(s):
2401-2404
In this letter, we propose a new energy efficient online routing algorithm for QoS-sensitive sensor networks. An important design principle underlying our algorithm is online decision making based on real-time network estimation. This online approach gives adaptability and flexibility to solve a wide range of control tasks for efficient network performance. In addition, our distributed control paradigm is practical for real sensor network management. Simulation results indicate the superior performance of our algorithm in balancing energy efficiency and QoS provisioning.
Bilingual Cluster Based Models for Statistical Machine Translation
Hirofumi YAMAMOTO Eiichiro SUMITA

PAPER-Applications

Vol:
E91-D No:3
Page(s):
588-597
We propose a domain specific model for statistical machine translation. It is well known that domain specific language models perform well in automatic speech recognition. We show that domain specific language and translation models also benefit statistical machine translation. However, there are two problems with using domain specific models. The first is the data sparseness problem. We employ an adaptation technique to overcome this problem. The second issue is domain prediction. In order to perform adaptation, the domain must be provided; however, in many cases, the domain is not known or changes dynamically. For these cases, not only the translation target sentence but also the domain must be predicted. This paper focuses on the domain prediction problem for statistical machine translation. In the proposed method, a bilingual training corpus is automatically clustered into sub-corpora. Each sub-corpus is deemed to be a domain. The domain of a source sentence is predicted by using its similarity to the sub-corpora. The language and translation models specific to the predicted domain (sub-corpus) are then used for the translation decoding. This approach gave an improvement of 2.7 in BLEU score on the IWSLT05 Japanese to English evaluation corpus (improving the score from 52.4 to 55.1). This is a substantial gain and indicates the validity of the proposed bilingual cluster based models.
Development of a Mandarin-English Bilingual Speech Recognition System for Real World Music Retrieval
Qingqing ZHANG Jielin PAN Yang LIN Jian SHAO Yonghong YAN

PAPER-Acoustic Modeling

Vol:
E91-D No:3
Page(s):
514-521
In recent decades, there has been a great deal of research into the problem of bilingual speech recognition: developing a recognizer that can handle inter- and intra-sentential language switching between two languages. This paper presents our recent work on the development of a grammar-constrained, Mandarin-English bilingual Speech Recognition System (MESRS) for real world music retrieval. Two of the main difficulties in building bilingual speech recognition systems for real world applications are tackled in this paper. One is to balance the performance and the complexity of the bilingual speech recognition system; the other is to deal effectively with matrix language accents in the embedded language. In order to process intra-sentential language switching and reduce the amount of data required to robustly estimate statistical models, a compact single set of bilingual acoustic models derived by phone set merging and clustering is developed instead of using separate monolingual models for each language. In our study, a novel Two-pass phone clustering method based on Confusion Matrix (TCM) is presented and compared with the log-likelihood measure method. Experiments show that TCM can achieve better performance. Since the native language of potential system users is Mandarin, which is regarded as the matrix language in our application, their pronunciations of English as the embedded language usually contain Mandarin accents. In order to deal with matrix language accents in the embedded language, different non-native adaptation approaches are investigated. Experiments show that the model retraining method outperforms other common adaptation methods such as Maximum A Posteriori (MAP).
With the effective incorporation of the phone clustering and non-native adaptation approaches, the Phrase Error Rate (PER) of MESRS on English utterances was reduced by 24.47% relative to the baseline monolingual English system, while the PER on Mandarin utterances was comparable to that of the baseline monolingual Mandarin system. For bilingual utterances, a 22.37% relative PER reduction was achieved.
Image Segmentation Using Fuzzy Clustering with Spatial Constraints Based on Markov Random Field via Bayesian Theory
Xiaohe LI Taiyi ZHANG Zhan QU

PAPER-Image Processing

Vol:
E91-A No:3
Page(s):
723-729
Image segmentation is an essential processing step for many image analysis applications. In this paper, a novel image segmentation algorithm using fuzzy C-means clustering (FCM) with spatial constraints based on Markov random field (MRF) via Bayesian theory is proposed. Because it disregards spatial constraint information, the FCM algorithm fails to segment images corrupted by noise. In order to improve the robustness of FCM to noise, a powerful model for the membership functions that incorporates local correlation is given by an MRF defined through a Gibbs function. Spatial information is then incorporated into the FCM by Bayesian theory. Therefore, the proposed algorithm has the advantages of both FCM and MRF, and is robust to noise. Experimental results on synthetic and real-world images are given to demonstrate the robustness and validity of the proposed algorithm.
Automatic Language Identification with Discriminative Language Characterization Based on SVM
Hongbin SUO Ming LI Ping LU Yonghong YAN

PAPER-Language Identification

Vol:
E91-D No:3
Page(s):
567-575
Robust automatic language identification (LID) is the task of identifying the language from a short utterance spoken by an unknown speaker. The mainstream approaches include parallel phone recognition language modeling (PPRLM), support vector machine (SVM) and the general Gaussian mixture models (GMMs). These systems map the cepstral features of spoken utterances into high level scores by classifiers. In this paper, in order to increase the dimension of the score vector and alleviate the inter-speaker variability within the same language, multiple data groups based on supervised speaker clustering are employed to generate the discriminative language characterization score vectors (DLCSV). The back-end SVM classifiers are used to model the probability distribution of each target language in the DLCSV space. Finally, the output scores of the back-end classifiers are calibrated by a pair-wise posterior probability estimation (PPPE) algorithm. The proposed language identification frameworks are evaluated on the 2003 NIST Language Recognition Evaluation (LRE) databases, and the experiments show that the system described in this paper produces results comparable to those of existing systems. In particular, the SVM framework achieves an equal error rate (EER) of 4.0% in the 30-second task and outperforms the state-of-the-art systems by more than 30% relative error reduction. In addition, the proposed PPRLM and GMM algorithms achieve EERs of 5.1% and 5.0%, respectively.
New Inter-Cluster Proximity Index for Fuzzy c-Means Clustering
Fan LI Shijin DAI Qihe LIU Guowei YANG

LETTER-Data Mining

Vol:
E91-D No:2
Page(s):
363-366
This letter presents a new inter-cluster proximity index for fuzzy partitions obtained from the fuzzy c-means algorithm. It is defined as the average proximity of all possible pairs of clusters. The proximity of each pair of clusters is determined by the overlap and the separation of the two clusters. The former is quantified by using concepts from fuzzy rough set theory and the latter by computing the distance between cluster centroids. Experimental results indicate the efficiency of the proposed index.
GDME: Grey Relational Clustering Applied to a Clock Tree Construction with Zero Skew and Minimal Delay
Chia-Chun TSAI Jan-Ou WU Trong-Yen LEE

PAPER-VLSI Design Technology and CAD

Vol:
E91-A No:1
Page(s):
365-374
This study demonstrates that clock tree construction in an SoC should be expanded to consider the intrinsic delay and skew of each IP's clock sink. A novel algorithm, called GDME, is proposed that combines grey relational clustering with the DME approach to solve the problem of clock tree construction. Grey relational analysis can cluster the best pair of clock sinks and thereby guide the tapping point search of a DME algorithm for constructing a clock tree with zero skew and minimal delay. Experimentally, the proposed algorithm always obtains an RC- or RLC-based clock tree with zero skew and minimal delay for all the test cases and benchmarks. Experimental results demonstrate that GDME achieves an improvement of up to 3.74% in total average wire length compared with other DME algorithms. Furthermore, our results for the zero-skew RLC-based clock trees are 0.017% and 0.2% lower than those of Hspice in absolute average skew and delay, respectively.
Optimizing the Number of Clusters in Multi-Hop Wireless Sensor Networks
Namhoon KIM Soohee HAN Wook Hyun KWON

LETTER-Network

Vol:
E91-B No:1
Page(s):
318-321
In this paper, an analytical model is proposed to compute the optimal number of clusters that minimizes the energy consumption of multi-hop wireless sensor networks. In the proposed analytical model, the average hop count between a general node (GN) and its nearest clusterhead (CH) is obtained assuming a uniform distribution. How the position of the sink impacts the optimal number of clusters is also discussed. A numerical simulation is carried out to validate the proposed model in various network environments.
RK-Means Clustering: K-Means with Reliability
Chunsheng HUA Qian CHEN Haiyuan WU Toshikazu WADA

PAPER-Image Recognition, Computer Vision

Vol:
E91-D No:1
Page(s):
96-104
This paper presents an RK-means clustering algorithm, which is developed for reliable data grouping by introducing a new reliability evaluation to the K-means clustering algorithm. The conventional K-means clustering algorithm has two shortfalls: 1) the clustering result will become unreliable if the assumed number of clusters is incorrect; 2) during the update of a cluster center, all the data points belonging to that cluster are used equally without considering how distant they are from the cluster center. In this paper, we introduce a new reliability evaluation to the K-means clustering algorithm by considering the triangular relationship among each data point and its two nearest cluster centers. We applied the proposed algorithm to track objects in video sequences and confirmed its effectiveness and advantages.
Fuzzy c-Means Algorithms for Data with Tolerance Based on Opposite Criterions
Yuchi KANZAWA Yasunori ENDO Sadaaki MIYAMOTO

PAPER-Soft Computing

Vol:
E90-A No:10
Page(s):
2194-2202
In this paper, two new clustering algorithms are proposed for data that contain errors. In each of these algorithms, as in the previously proposed algorithms, the error is interpreted as one of the decision variables (called "tolerance") of a certain optimization problem, but the tolerance is determined based on the criterion opposite to that of the corresponding previously proposed algorithm. By applying each of our algorithms together with its previously proposed counterpart, the reliability of the clustering result is discussed. The validity of the proposed algorithms is discussed through numerical experiments.
