
Keyword Search Result

[Keyword] K-means(36hit)

1-20hit(36hit)

  • A Method of K-Means Clustering Based on TF-IDF for Software Requirements Documents Written in Chinese Language

    Jing ZHU  Song HUANG  Yaqing SHI  Kaishun WU  Yanqiu WANG  

     
    PAPER-Software Engineering

      Publicized:
    2021/12/28
      Vol:
    E105-D No:4
      Page(s):
    736-754

    Currently, there is no way to obtain function points automatically when using the function point analysis (FPA) method, especially for requirements documents written in Chinese. Given the characteristics of Chinese grammar with respect to word segmentation, Chinese words must be segmented accurately so that subsequent entity recognition and disambiguation can be carried out within a smaller range, which lays a solid foundation for efficient automatic extraction of function points. This paper therefore proposes a K-Means clustering method based on TF-IDF and conducts experiments on 24 software requirements documents written in Chinese. The results show that the best clustering effect is achieved when 55% to 75% of the extracted information is retained and the number of clusters takes the middle value of the total number of clusters. The method and conclusions of this paper apply not only to Chinese: they also provide an important reference for automatically extracting function points from software requirements documents written in other Oriental languages, and they fill a gap in data preprocessing at the early stage of automatic function point calculation.
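
The TF-IDF step that feeds K-Means can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes documents are already word-segmented into token lists, uses the plain tf = count / length and idf = log(N / df) variant, and the helper names `tfidf`, `keep_top` (one possible reading of "retaining 55% to 75% of the extracted information") and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists, assumed to be already word-segmented.
    tf = count / document length, idf = log(N / df).
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: each term counted once per doc
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        length = len(tokens)
        vectors.append({t: (c / length) * math.log(n / df[t]) for t, c in counts.items()})
    return vectors


def keep_top(vec, fraction):
    """Hypothetical filter: keep the highest-weighted `fraction` of terms (at least one)."""
    k = max(1, round(len(vec) * fraction))
    return dict(sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:k])


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Toy, already-segmented requirement sentences (illustrative data only).
docs = [["用户", "登录", "系统"], ["用户", "注册", "系统"], ["报表", "导出"]]
vecs = [keep_top(v, 0.65) for v in tfidf(docs)]
```

A term that occurs in every document gets idf = log(1) = 0 and so carries no weight; the filtered sparse vectors are what a K-Means step (with Euclidean or cosine distance) would then cluster.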

  • Efficient Task Allocation Protocol for a Hybrid-Hierarchical Spatial-Aerial-Terrestrial Edge-Centric IoT Architecture Open Access

    Abbas JAMALIPOUR  Forough SHIRIN ABKENAR  

     
    INVITED PAPER

      Publicized:
    2021/08/17
      Vol:
    E105-B No:2
      Page(s):
    116-130

    In this paper, we propose a novel Hybrid-Hierarchical Spatial-Aerial-Terrestrial Edge-Centric (H2TEC) architecture for space-air integrated Internet of Things (IoT) networks. H2TEC comprises unmanned aerial vehicles (UAVs) that act as mobile fog nodes, providing the required services to terminal nodes (TNs) in cooperation with satellites. TNs in H2TEC offload their generated tasks to the UAVs for further processing. Because TNs have a limited energy budget, a novel task allocation protocol named TOP is proposed to minimize the energy consumption of TNs while guaranteeing the outage probability and network reliability; to this end, the transmission rate of TNs is optimized. TOP also exploits energy harvesting, whereby low earth orbit satellites transfer energy to UAVs whose remaining energy falls below a predefined threshold. Accordingly, the harvested power of the UAVs is optimized together with the corresponding harvesting time so that the UAVs can improve network throughput by processing more bits. Numerical results reveal that TOP outperforms the baseline method in critical situations where more power is required to process the task. It is also found that even in such situations, the energy harvesting mechanism in TOP yields more efficient network throughput.

  • Research on DoS Attacks Intrusion Detection Model Based on Multi-Dimensional Space Feature Vector Expansion K-Means Algorithm

    Lijun GAO  Zhenyi BIAN  Maode MA  

     
    PAPER-Fundamental Theories for Communications

      Publicized:
    2021/04/22
      Vol:
    E104-B No:11
      Page(s):
    1377-1385

    DoS (Denial of Service) attacks are becoming one of the most serious security threats to global networks. We analyze existing DoS detection methods and defense mechanisms in depth. In recent years, K-Means and its improved variants have been widely examined for security intrusion detection, but their detection accuracy is not satisfactory. In this paper, we propose a multi-dimensional space feature vector expansion K-Means model to detect threats in the network environment. The model uses a genetic algorithm to optimize the weights of the K-Means multi-dimensional space feature vector, which greatly improves the detection rate against 6 typical DoS attacks. To verify the correctness of the model, we conduct a simulation on the NSL-KDD data set. The results show that the multi-dimensional space feature vector expansion K-Means algorithm improves recognition accuracy to 96.88%. Furthermore, the 41 kinds of feature vectors in NSL-KDD are analyzed in detail through extensive experimental training. The feature vectors that contribute positively to the probability of security attack detection are accurately extracted, and a comparison chart is formed to support subsequent research. Theoretical analysis and experimental results show that the multi-dimensional space feature vector expansion K-Means algorithm performs well in detecting DDoS attacks.
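
The idea of feature weights inside K-Means can be sketched as below. This is only an assumed reading: one non-negative weight per feature enters the squared distance, and a genetic algorithm (not shown) would search over that weight vector to maximize detection accuracy. The names `weighted_sq_dist`, `assign` and `weighted_kmeans` are hypothetical.

```python
def weighted_sq_dist(x, c, w):
    """Squared Euclidean distance with one non-negative weight per feature."""
    return sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, c, w))


def assign(points, centroids, weights):
    """Index of the nearest centroid (under the weighted distance) for each point."""
    return [
        min(range(len(centroids)), key=lambda j: weighted_sq_dist(p, centroids[j], weights))
        for p in points
    ]


def weighted_kmeans(points, centroids, weights, iters=20):
    """Lloyd iterations with a fixed feature-weight vector (e.g. one proposed by a GA)."""
    centroids = [list(c) for c in centroids]
    for _ in range(iters):
        labels = assign(points, centroids, weights)
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep an empty cluster's centroid unchanged
                # With per-feature weights the minimizer is still the plain mean.
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centroids, weights), centroids
```

Setting a weight to zero removes that feature from the distance entirely, which is why changing the weight vector can regroup the same points; a GA fitness function would score each candidate weight vector by the resulting detection rate.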

  • Improving Seeded k-Means Clustering with Deviation- and Entropy-Based Term Weightings

    Uraiwan BUATOOM  Waree KONGPRAWECHNON  Thanaruk THEERAMUNKONG  

     
    PAPER

      Publicized:
    2020/01/08
      Vol:
    E103-D No:4
      Page(s):
    748-758

    The outcome of document clustering depends on the scheme used to assign a weight to each term in a document. Recent works have tried to use class-related distributions to enhance discrimination ability, but it is worth exploring whether a deviation approach or an entropy approach is more effective. This paper compares deviation-based distribution and entropy-based distribution as constraints in term weighting. In addition, their potential combinations are investigated to find optimal solutions for guiding the clustering process. In the experiments, the seeded k-means method is used for clustering, and the performance of the deviation-based, entropy-based, and hybrid approaches is analyzed on two English text datasets and one Thai text dataset. The results show that the deviation-based distribution outperforms the entropy-based distribution, and that a suitable combination of these distributions increases clustering accuracy by 10%.
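
Seeded k-means differs from plain k-means only in initialization: each centroid starts at the mean of a few labeled seed documents rather than at a random point. A minimal sketch, assuming dense document vectors and Euclidean distance (the term-weighting schemes compared in the paper are not reproduced); `seeded_kmeans` and its helpers are hypothetical names.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest(p, centroids):
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def seeded_kmeans(points, seeds, iters=20):
    """seeds maps a cluster id to the labeled vectors ("seeds") of that cluster.

    Each centroid starts at the mean of its seeds instead of a random point;
    after that the usual assign / re-average loop runs.
    """
    ids = sorted(seeds)
    centroids = [mean(seeds[i]) for i in ids]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return [ids[nearest(p, centroids)] for p in points]
```

Because the seeds fix which centroid corresponds to which class, the output clusters come back already named by class id.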

  • A Knowledge Representation Based User-Driven Ontology Summarization Method

    Yuehang DING  Hongtao YU  Jianpeng ZHANG  Huanruo LI  Yunjie GU  

     
    LETTER-Data Engineering, Web Information Systems

      Publicized:
    2019/05/30
      Vol:
    E102-D No:9
      Page(s):
    1870-1873

    As the superstructure of a knowledge graph, ontology has been widely applied in knowledge engineering. However, ontologies are becoming increasingly difficult to apply and comprehend as data size and schema complexity grow. Hence, ontology summarization has emerged to enhance the comprehension and application of ontologies. Existing summarization methods mainly focus on an ontology's topology without taking semantic information into consideration, whereas humans understand information based on semantics. We therefore propose a novel algorithm that integrates semantic and topological information, making ontologies more understandable. In our work, semantic and topological information are represented by concept vectors, a set of high-dimensional vectors. Distances between concept vectors represent the similarity of concepts, and we select important concepts according to two criteria: 1) the distances from important concepts to normal concepts should be as short as possible, which indicates that important concepts summarize normal concepts well; 2) the distances from an important concept to the others should be as long as possible, which ensures that important concepts are not similar to each other. K-means++ is adopted to select important concepts. Lastly, we perform extensive evaluations comparing our algorithm with existing ones. The evaluations show that our approach performs better than the others in most cases.
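
The K-means++ seeding rule, which favors picks that are far from those already chosen (matching criterion 2 above), can be sketched as follows. This shows only the standard seeding step on plain lists, under the assumption that concept vectors are available as numeric lists; `kmeans_pp_init` is a hypothetical name and the authors' concept-vector construction is not shown.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng=None):
    """k-means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest center chosen so far."""
    rng = rng or random.Random(0)
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        total = sum(d2)
        if total == 0:
            break  # every point already coincides with a chosen center
        r = rng.random() * total
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc > r:
                centers.append(p)
                break
        else:
            # Guard against floating-point round-off in the running sum.
            centers.append(points[max(range(len(points)), key=d2.__getitem__)])
    return centers
```

Points that coincide with an existing center have weight zero and are never drawn again, so the selected centers (here, candidate important concepts) tend to be spread out rather than redundant.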

  • Anomaly Prediction Based on Machine Learning for Memory-Constrained Devices

    Yuto KITAGAWA  Tasuku ISHIGOOKA  Takuya AZUMI  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2019/05/30
      Vol:
    E102-D No:9
      Page(s):
    1797-1807

    This paper proposes an anomaly prediction method based on k-means clustering for embedded devices with memory constraints. By checking control system behavior in detail with k-means clustering, the method can predict anomalies. However, as with existing k-means clustering methods, continued clustering causes data to accumulate in memory, which is problematic for embedded devices with low memory capacity. We therefore also propose a k-means clustering method that can continue clustering on infinite stream data. The proposed method is based on online k-means clustering with sequential processing; it stores only the data required for anomaly prediction and releases other data from memory. As a result, anomaly prediction can be performed with reduced memory consumption. Experiments were performed on actual control-system data. The results show that the proposed anomaly prediction method can predict anomalies, and that the proposed k-means clustering predicts anomalies comparably to standard k-means clustering while reducing memory consumption. Moreover, the proposed k-means clustering achieves better anomaly prediction results than existing online k-means clustering.
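
Why online (sequential) k-means keeps memory bounded can be seen in a small sketch of the classic MacQueen-style update: each sample nudges its nearest centroid by 1/n and is then discarded, so only k centroids and k counters are stored. This is the generic online baseline, not the paper's modified variant, and the class name and the distance-based `score` used as an anomaly indicator are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class OnlineKMeans:
    """Sequential k-means: memory stays O(k * d) because only the centroids
    and per-cluster counts are kept, never the raw samples."""

    def __init__(self, initial_centroids):
        self.centroids = [list(map(float, c)) for c in initial_centroids]
        self.counts = [1] * len(self.centroids)  # each initial centroid counts as one sample

    def nearest(self, x):
        return min(range(len(self.centroids)), key=lambda j: sq_dist(x, self.centroids[j]))

    def update(self, x):
        """Move the nearest centroid toward x by 1/n_j (a running mean); x is then discarded."""
        j = self.nearest(x)
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]
        c = self.centroids[j]
        for d in range(len(c)):
            c[d] += eta * (x[d] - c[d])
        return j

    def score(self, x):
        """Hypothetical anomaly score: squared distance to the nearest centroid."""
        return sq_dist(x, self.centroids[self.nearest(x)])
```

With the step size 1/n_j, each centroid is exactly the running mean of its initial value and every sample assigned to it so far, which is what lets the raw stream be dropped.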

  • JPEG Steganalysis Based on Multi-Projection Ensemble Discriminant Clustering

    Yan SUN  Guorui FENG  Yanli REN  

     
    LETTER-Information Network

      Publicized:
    2018/10/15
      Vol:
    E102-D No:1
      Page(s):
    198-201

    In this paper, we propose a novel algorithm called multi-projection ensemble discriminant clustering (MPEDC) for JPEG steganalysis. The scheme uses the optimal projection of the linear discriminant analysis (LDA) algorithm to obtain additional projection vectors via a micro-rotation method; these vectors are similar to the optimal vector. MPEDC combines them with the unsupervised K-means algorithm to make a comprehensive classification decision adaptively. The power of the proposed method is demonstrated on three steganographic methods with three feature extraction methods. Experimental results show that accuracy can be improved using iterative discriminant classification.

  • Multi Long-Short Term Memory Models for Short Term Traffic Flow Prediction

    Zelong XUE  Yang XUE  

     
    LETTER-Biocybernetics, Neurocomputing

      Publicized:
    2018/09/18
      Vol:
    E101-D No:12
      Page(s):
    3272-3275

    Many single-model methods have been applied to real-time short-term traffic flow prediction. However, since traffic flow data mixes a variety of components, the performance of a single model is limited. We therefore propose Multi-Long-Short Term Memory Models, which improve traffic flow prediction accuracy compared with state-of-the-art models.

  • Accelerating a Lloyd-Type k-Means Clustering Algorithm with Summable Lower Bounds in a Lower-Dimensional Space

    Kazuo AOYAMA  Kazumi SAITO  Tetsuo IKEDA  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2018/08/02
      Vol:
    E101-D No:11
      Page(s):
    2773-2783

    This paper presents an efficient acceleration algorithm for Lloyd-type k-means clustering that is suitable for large-scale, high-dimensional data sets with potentially numerous classes. The algorithm employs a novel projection-based filter (PRJ) to avoid unnecessary distance calculations, achieving high speed while producing the same results as the standard Lloyd's algorithm. The PRJ exploits a summable lower bound on the squared distance, defined in a lower-dimensional space onto which data points are projected. Within each iteration, the summable lower bound can be tightened dynamically by incrementally adding components in the lower-dimensional space, whereas the existing lower bounds used in other acceleration algorithms work only once as a fixed filter. Experimental results on large-scale, high-dimensional real image data sets demonstrate that, compared with state-of-the-art algorithms, the proposed algorithm runs at high speed with low memory consumption when large k values are given.
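
The principle of a summable lower bound (partial sums of non-negative squared differences can only grow toward the full distance, so a centroid can be rejected as soon as its partial sum reaches the best distance found so far) can be illustrated with classic partial distance search in the original coordinates. This is a simpler, related technique, not the paper's PRJ filter: no projection or bound construction from the paper is reproduced, and `nearest_with_partial_distance` is a hypothetical name.

```python
import math


def nearest_with_partial_distance(x, centroids):
    """Nearest-centroid search that stops summing squared differences for a
    centroid as soon as the partial sum reaches the best distance so far.

    A partial sum of non-negative terms is a lower bound on the full squared
    distance, so a rejected centroid can never be the nearest one.
    Returns (index, squared distance, number of centroids rejected by the bound).
    """
    best_j, best_d = -1, math.inf
    rejected = 0
    for j, c in enumerate(centroids):
        partial = 0.0
        for xi, ci in zip(x, c):
            partial += (xi - ci) ** 2
            if partial >= best_d:
                rejected += 1
                break
        else:
            best_j, best_d = j, partial  # full sum computed and strictly smaller
    return best_j, best_d, rejected
```

The result is identical to a brute-force nearest-centroid search; only the work per rejected centroid shrinks. If the data were first rotated by an orthonormal transform such as PCA (which preserves Euclidean distances), summing the leading components first tends to make the bound tight sooner.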

  • Dynamic Ensemble Selection Based on Rough Set Reduction and Cluster Matching

    Ying-Chun CHEN  Ou LI  Yu SUN  

     
    PAPER-Fundamental Theories for Communications

      Publicized:
    2018/04/11
      Vol:
    E101-B No:10
      Page(s):
    2196-2202

    Ensemble learning is widely used in the field of sensor network monitoring and target identification. To improve the generalization ability and classification precision of ensemble learning, we first propose an approximate attribute reduction algorithm based on rough sets in this paper. The reduction algorithm uses mutual information to measure attribute importance and introduces a correction coefficient and an approximation parameter. Based on a random sampling strategy, we use the approximate attribute reduction algorithm to implement the multi-modal sample space perturbation. To further reduce the ensemble size and realize a dynamic subset of base classifiers that best matches the test sample, we define a similarity parameter between the test samples and training sample sets that takes the similarity and number of the training samples into consideration. We then propose a k-means clustering-based dynamic ensemble selection algorithm. Simulations show that the multi-modal perturbation method effectively selects important attributes and reduces the influence of noise on the classification results. The classification precision and runtime of experiments demonstrate the effectiveness of the proposed dynamic ensemble selection algorithm.

  • On the Feasibility of an Adaptive Movable Access Point System in a Static Indoor WLAN Environment

    Tomoki MURAKAMI  Shingo OKA  Yasushi TAKATORI  Masato MIZOGUCHI  Fumiaki MAEHARA  

     
    PAPER-Antennas and Propagation

      Publicized:
    2018/01/10
      Vol:
    E101-B No:7
      Page(s):
    1693-1700

    This paper investigates an adaptive movable access point (AMAP) system and explores its feasibility in a static indoor classroom environment with an applied wireless local area network (WLAN) system. In the AMAP system, the positions of multiple access points (APs) are adaptively moved in accordance with clustered user groups, which ensures effective coverage for non-uniform user distributions over the target area. This enhances the signal to interference and noise power ratio (SINR) performance. In order to derive the appropriate AP positions, we utilize the k-means method in the AMAP system. To accurately estimate the position of each user within the target area for user clustering, we use the general methods of received signal strength indicator (RSSI) or time of arrival (ToA), measured by the WLAN systems. To clarify the basic effectiveness of the AMAP system, we first evaluate the SINR performance of the AMAP system and a conventional fixed-position AP system with equal intervals using computer simulations. Moreover, we demonstrate the quantitative improvement of the SINR performance by analyzing the ToA and RSSI data measured in an indoor classroom environment in order to clarify the feasibility of the AMAP system.
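
Deriving AP positions with the k-means method amounts to running Lloyd's algorithm on the estimated 2-D user positions and moving each AP to the centroid of its user group. A minimal sketch, assuming position estimates are already available (the RSSI/ToA estimation and the SINR evaluation are not modeled); `place_access_points` is a hypothetical name.

```python
def place_access_points(users, initial_aps, iters=50):
    """Lloyd's k-means on 2-D user positions; each AP moves to its group's centroid.

    users: list of (x, y) position estimates (e.g. derived from RSSI or ToA).
    initial_aps: starting AP positions, one per AP.
    """
    aps = [tuple(map(float, a)) for a in initial_aps]
    for _ in range(iters):
        groups = [[] for _ in aps]
        for ux, uy in users:
            j = min(range(len(aps)), key=lambda i: (ux - aps[i][0]) ** 2 + (uy - aps[i][1]) ** 2)
            groups[j].append((ux, uy))
        new_aps = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g)) if g else aps[j]
            for j, g in enumerate(groups)
        ]
        if new_aps == aps:
            break  # converged: no AP moved
        aps = new_aps
    return aps
```

An AP whose group is empty simply stays where it is; with non-uniform user distributions the APs migrate toward the dense user groups, which is the intended coverage effect.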

  • Filter Level Pruning Based on Similar Feature Extraction for Convolutional Neural Networks

    Lianqiang LI  Yuhui XU  Jie ZHU  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2018/01/18
      Vol:
    E101-D No:4
      Page(s):
    1203-1206

    This paper introduces a filter-level pruning method based on similar feature extraction that compresses and accelerates convolutional neural networks using the k-means++ algorithm. In contrast to other pruning methods, which evaluate the importance of filters to prune redundant ones, the proposed method analyzes the similarities among filters in the features they recognize. This strategy is expected to be more reasonable and effective. Furthermore, our method does not produce an unstructured network. As a result, it needs no extra sparse representation and can be efficiently supported by any off-the-shelf deep learning library. Experimental results show that our filter pruning method reduces the number of parameters and the computational cost of LeNet-5 by a factor of 17.9× with only 0.3% accuracy loss.

  • An FPGA Realization of a Random Forest with k-Means Clustering Using a High-Level Synthesis Design

    Akira JINGUJI  Shimpei SATO  Hiroki NAKAHARA  

     
    PAPER-Emerging Applications

      Publicized:
    2017/11/17
      Vol:
    E101-D No:2
      Page(s):
    354-362

    A random forest (RF) is an ensemble machine learning algorithm used for classification and regression. It consists of multiple decision trees built from randomly sampled data. Compared with other machine learning algorithms, the RF offers simple, fast learning and identification, and it is widely used in various recognition systems. Because each tree requires an unbalanced traversal and communication among all trees is required, the random forest is not suited to SIMD architectures such as GPUs. Although FPGA-based accelerators have been proposed, those implementations were based on HDL design and therefore required longer design time than software-based realizations. In previous work, we showed a high-level synthesis design of the RF that includes a fully pipelined architecture and all-to-all communication. In this paper, to further reduce the amount of hardware, we use k-means clustering to share the comparators of branch nodes in the decision trees. We also develop the krange tool flow, which generates the bitstream from a small number of hyperparameters. Since the proposed tool flow is based on high-level synthesis, we can obtain a high-performance RF with a shorter design time than with conventional HDL design. We implemented the RF on the Xilinx ZC702 evaluation board. Compared with CPU (Intel Xeon(R) E5607) and GPU (NVIDIA GeForce Titan) implementations, the FPGA realization was 8.4 times faster than the CPU and 62.8 times faster than the GPU. In terms of power efficiency, the FPGA realization was 7.8 times better than the CPU and 385.9 times better than the GPU.
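
One way to picture comparator sharing via k-means is a 1-D clustering of the branch-node threshold values: nodes whose thresholds fall into the same cluster are rounded to a shared value and could then reuse one comparator. This is an assumed simplification for illustration; the paper's exact grouping criterion and its effect on accuracy are not reproduced, and `share_thresholds` is a hypothetical name that assumes k does not exceed the number of thresholds.

```python
def share_thresholds(thresholds, k, iters=100):
    """1-D k-means over branch-node thresholds.

    Returns (shared_values, mapping) where mapping[i] is the index of the
    shared value that replaces thresholds[i], so nodes with the same index
    could reuse one comparator.
    """
    values = sorted(thresholds)
    # Start from k values spread evenly through the sorted thresholds.
    centers = [values[(i * (len(values) - 1)) // max(k - 1, 1)] for i in range(k)]

    def nearest(t):
        return min(range(k), key=lambda i: abs(t - centers[i]))

    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for t in thresholds:
            groups[nearest(t)].append(t)
        new = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers, [nearest(t) for t in thresholds]
```

Replacing each threshold by its cluster mean slightly perturbs the tree's decisions, so k trades hardware (fewer distinct comparators) against fidelity to the trained forest.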

  • A Refined Estimator of Multicomponent Third-Order Polynomial Phase Signals

    GuoJian OU  ShiZhong YANG  JianXun DENG  QingPing JIANG  TianQi ZHANG  

     
    PAPER-Fundamental Theories for Communications

      Vol:
    E99-B No:1
      Page(s):
    143-151

    This paper describes a fast and effective algorithm for refining the parameter estimates of multicomponent third-order polynomial phase signals (PPSs). The proposed algorithm combines efficiency with a lower signal-to-noise ratio (SNR) threshold and lower computational complexity. A two-step procedure is used to estimate the parameters of multicomponent third-order PPSs. In the first step, initial estimates of the phase parameters are obtained using the fast Fourier transform (FFT), the k-means algorithm, and three time positions. In the second step, these initial estimates are refined by a simple moving average filter and singular value decomposition (SVD). The SNR threshold of the proposed algorithm is lower than those of the non-linear least squares (NLS) method and the estimation refinement method, even though it uses only a simple moving average filter. In addition, the proposed method has significantly lower complexity than computationally intensive NLS methods. Simulations confirm the effectiveness of the proposed method.

  • Expose Spliced Photographic Basing on Boundary and Noise Features

    Jun HOU  Yan CHENG  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2015/04/01
      Vol:
    E98-D No:7
      Page(s):
    1426-1429

    This paper proposes an algorithm to expose spliced photographs. First, a graph-based segmentation, which defines a predictor to measure the boundary evidence between two neighboring regions, is used to make greedy decisions. The algorithm then obtains a prediction-error image using non-negative linear least-squares prediction. For each pair of segmented neighboring regions, the algorithm gathers their statistical features and computes gray-level co-occurrence matrix features. K-means clustering is applied to create a dictionary, and the vector quantization histogram is taken as the fixed-length result vector. For a tampered image, its noise satisfies a zero-mean Gaussian distribution. The proposed method checks the similarity between the noise distribution and a zero-mean Gaussian distribution, followed by local flatness and texture measurements. Finally, all features are fed to a support vector machine classifier. The algorithm has low computational cost, and experiments show its effectiveness in exposing forgery.
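
How a K-means dictionary turns a variable number of region features into a fixed-length vector can be sketched as a vector quantization histogram. The codebook is assumed to be the centroid list from an earlier K-means run, and the L1 normalization is an assumption; `vq_histogram` is a hypothetical name.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_histogram(features, codebook):
    """Bag-of-words style vector quantization.

    Each feature vector votes for its nearest codeword; the normalized vote
    counts form a histogram whose length equals len(codebook), no matter how
    many features the image produced.
    """
    hist = [0] * len(codebook)
    for f in features:
        j = min(range(len(codebook)), key=lambda i: sq_dist(f, codebook[i]))
        hist[j] += 1
    n = len(features)
    return [h / n for h in hist] if n else [0.0] * len(codebook)
```

The resulting histogram, concatenated with the other features, is the kind of fixed-length input an SVM classifier expects.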

  • Kernel-Reliability-Based K-Means (KRKM) Clustering Algorithm and Image Processing

    Chunsheng HUA  Juntong QI  Jianda HAN  Haiyuan WU  

     
    PAPER-Artificial Intelligence, Data Mining

      Vol:
    E97-D No:9
      Page(s):
    2423-2433

    In this paper, we introduce a novel Kernel-Reliability-based K-Means (KRKM) clustering algorithm for categorizing an unknown dataset under noisy conditions. Unlike conventional clustering algorithms, KRKM uses dynamic kernel functions to measure both the reliability and the similarity of assigning data to neighboring clusters, and noisy data are rejected by being assigned low reliability. The reliability of classifying a data item is measured by a dynamic kernel function whose window size is determined by the triangular relationship between the item and its two nearest clusters. The similarity from a data item to its neighboring clusters is measured by another adaptive kernel function that accounts not only for the similarity between the data and the clusters but also for that between its two nearest clusters. The main contribution of this work is the introduction of dynamic kernel functions to evaluate both reliability and similarity for clustering, which makes the proposed algorithm more efficient in dealing with very noisy data. Various experiments confirm the efficiency and effectiveness of the proposed algorithm.

  • IDDQ Outlier Screening through Two-Phase Approach: Clustering-Based Filtering and Estimation-Based Current-Threshold Determination

    Michihiro SHINTANI  Takashi SATO  

     
    PAPER-Dependable Computing

      Vol:
    E97-D No:8
      Page(s):
    2095-2104

    We propose a novel IDDQ outlier screening flow based on a two-phase approach: clustering-based filtering and estimation-based current-threshold determination. In the proposed flow, a clustering technique first filters out chips that have high IDDQ current. Then, in the current-threshold determination phase, the device parameters of the unfiltered chips are estimated from measured IDDQ currents through Bayesian inference. The estimated device parameters are further used to determine a statistical leakage current distribution for each test pattern and to calculate a suitable current threshold. Numerical experiments using a virtual wafer show that our proposed technique is 14 times more accurate than the nearest neighbor residual (NNR) method and can achieve 80% of the test escape in the case of small leakage faults whose ratios of leakage fault size to nominal IDDQ current are above 40%.

  • Online Learned Player Recognition Model Based Soccer Player Tracking and Labeling for Long-Shot Scenes

    Weicun XU  Qingjie ZHAO  Yuxia WANG  Xuanya LI  

     
    PAPER-Pattern Recognition

      Vol:
    E97-D No:1
      Page(s):
    119-129

    Soccer player tracking and labeling suffer from the similar appearance of players on the same team, especially in long-shot scenes where the players' faces and numbers are too blurry to identify. In this paper, we propose an efficient multi-player tracking system. The tracking system takes the detection responses of a human detector as inputs. To realize real-time player detection, we generate a spatial proposal to minimize the scanning scope of the detector. The tracking system uses discriminative appearance models trained with the online Boosting method to reduce the data-association ambiguity caused by the players' similar appearance. We also propose an online learned player recognition model that can be embedded in the tracking system to perform online player recognition and labeling in long-shot scenes in two stages. In the first stage, to build the model, we use the fast k-means clustering method instead of classic k-means clustering to build and update a visual word vocabulary efficiently online, using informative descriptors extracted from the training samples drawn at each time step of multi-player tracking. The first stage finishes when the vocabulary is ready. In the second stage, given the obtained visual word vocabulary, an incremental vector quantization strategy is used to recognize and label each tracked player. We also perform importance recognition validation to avoid mistakenly recognizing an outlier, namely a person we do not need to recognize, as a player. Both quantitative and qualitative experimental results on long-shot video clips of a real soccer game demonstrate that the proposed player recognition model performs much better than some state-of-the-art online learned models, and that our tracking system performs effectively even in very complicated situations.

  • Study of a Reasonable Initial Center Selection Method Applied to a K-Means Clustering

    WonHee LEE  Samuel Sangkon LEE  Dong-Un AN  

     
    PAPER-Artificial Intelligence, Data Mining

      Vol:
    E96-D No:8
      Page(s):
    1727-1733

    Clustering methods are divided into hierarchical clustering, partitioning clustering, and others; K-Means is a partitioning clustering method. We improve the performance of K-Means by selecting the initial cluster centers through calculation rather than random selection. This method maximizes the distance among the initial cluster centers, so the centers are distributed evenly and the results are more accurate than with randomly selected initial centers. The selection itself is time-consuming, but it can reduce the total clustering time by minimizing reallocation and recalculation. Compared with the standard algorithm, the F-Measure improves by 5.1%.
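
A common way to choose initial centers that are far apart is greedy farthest-first traversal: repeatedly add the point whose distance to its nearest already-chosen center is largest. This is a standard technique offered for illustration and may differ from the calculation used in the paper; starting from the first point and the name `farthest_first_centers` are assumptions.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centers(points, k):
    """Greedy max-min selection of k initial centers."""
    centers = [points[0]]  # assumption: start from the first point
    # dist[i] = squared distance from points[i] to its nearest chosen center
    dist = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        j = max(range(len(points)), key=lambda i: dist[i])
        centers.append(points[j])
        dist = [min(d, sq_dist(p, points[j])) for d, p in zip(dist, points)]
    return centers
```

Unlike random initialization this selection is deterministic, at the cost of O(n·k) extra distance computations up front, which matches the trade-off described above.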

  • Invertible Color-to-Monochrome Conversion Based on Color Quantization with Lightness Constraint

    Go TANAKA  Noriaki SUETAKE  Eiji UCHINO  

     
    LETTER-Image

      Vol:
    E95-A No:11
      Page(s):
    2093-2097

    A method for obtaining a monochrome image from which colors can be rebuilt is proposed. In this method, the colors of an input image are quantized under a lightness constraint, and a palette representing the relationship between quantized colors and gray levels is generated. The output monochrome image is obtained using this palette. Experiments show that the proposed method obtains good monochrome images and rebuilt color images.
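
The role of the palette can be sketched as an invertible mapping between at most 256 quantized colors and distinct gray levels, assigned in order of lightness so the monochrome image still looks plausible. This is an assumed simplification: the color quantization step is not shown, Rec. 601 luma stands in for the paper's lightness measure, and `build_palette` is a hypothetical name.

```python
def build_palette(colors):
    """Map each quantized (r, g, b) color to a distinct gray level and back.

    Gray levels follow the colors' luma order (an assumed stand-in for the
    paper's lightness constraint), spread evenly over 0..255.
    """
    if len(colors) > 256:
        raise ValueError("at most 256 colors can get distinct 8-bit gray levels")

    def luma(c):
        return 0.299 * c[0] + 0.587 * c[1] + 0.114 * c[2]

    ordered = sorted(colors, key=luma)
    n = len(ordered)
    step = 255 / (n - 1) if n > 1 else 0
    gray_of = {c: round(i * step) for i, c in enumerate(ordered)}
    color_of = {g: c for c, g in gray_of.items()}  # the inverse lookup used to rebuild colors
    return gray_of, color_of
```

Because the step between consecutive gray levels is at least 1 when there are 256 or fewer colors, no two colors share a gray level, so every quantized pixel can be recovered exactly from the monochrome image.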
