To fully exploit the attribute information in graphs and dynamically fuse the features from different modalities, this letter proposes the Attributed Graph Clustering Network with Adaptive Feature Fusion (AGC-AFF) for graph clustering, where an Attribute Reconstruction Graph Autoencoder (ARGAE) with a masking operation learns to reconstruct the node attributes and adjacency matrix simultaneously, and an Adaptive Feature Fusion (AFF) mechanism dynamically fuses the features from different modules based on node attention. Extensive experiments on various benchmark datasets demonstrate the effectiveness of the proposed method.
Mikiya YOSHIDA Yusuke ITO Yurino SATO Hiroyuki KOGA
Information-centric networking (ICN) provides low-latency content delivery with in-network caching, but delivery latency depends on cache distance from consumers. To reduce delivery latency, a scheme to cluster domains and retain the main popular content in each cluster with a cache distribution range has been proposed, which enables consumers to retrieve content from neighboring clusters/caches. However, when the distribution of content popularity changes, all content caches may not be distributed adequately in a cluster, so consumers cannot retrieve them from nearby caches. We therefore propose a dynamic clustering scheme to adjust the cache distribution range in accordance with the change in content popularity and evaluate the effectiveness of the proposed scheme through simulation.
Mingyu LI Jihang YIN Yonggang XU Gang HUA Nian XU
Aiming at the problem of “energy hole” caused by random distribution of nodes in large-scale wireless sensor networks (WSNs), this paper proposes an adaptive energy-efficient balanced uneven clustering routing protocol (AEBUC) for WSNs. The competition radius is adaptively adjusted based on the node density and the distance from candidate cluster head (CH) to base station (BS) to achieve scale-controlled adaptive optimal clustering; in candidate CHs, the energy relative density and candidate CH relative density are comprehensively considered to achieve dynamic CH selection. In the inter-cluster communication, based on the principle of energy balance, the relay communication cost function is established and combined with the minimum spanning tree method to realize the optimized inter-cluster multi-hop routing, forming an efficient communication routing tree. The experimental results show that the protocol effectively saves network energy, significantly extends network lifetime, and better solves the “energy hole” problem.
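The inter-cluster routing step above combines a relay cost function with a minimum spanning tree. The abstract does not give the cost function, so the following is only a minimal sketch of the spanning-tree part using Prim's algorithm, assuming a hypothetical relay cost equal to the squared Euclidean distance between nodes. The function name `minimum_spanning_tree` and the sample coordinates are illustrative and not taken from the paper.

```python
import heapq


def minimum_spanning_tree(positions, root=0):
    """Prim's algorithm. Returns {child: parent} for the tree rooted at `root`."""

    def cost(a, b):
        # Hypothetical relay cost: squared Euclidean distance (an assumption,
        # standing in for the paper's energy-balance cost function).
        (x1, y1), (x2, y2) = positions[a], positions[b]
        return (x1 - x2) ** 2 + (y1 - y2) ** 2

    parent = {}
    visited = {root}
    # Candidate edges leaving the tree, ordered by cost.
    heap = [(cost(root, v), root, v) for v in range(len(positions)) if v != root]
    heapq.heapify(heap)
    while heap and len(visited) < len(positions):
        _, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # stale edge: v was already attached more cheaply
        visited.add(v)
        parent[v] = u
        for w in range(len(positions)):
            if w not in visited:
                heapq.heappush(heap, (cost(v, w), v, w))
    return parent


# Node 0 plays the base station; nodes 1..3 are cluster heads on a line.
positions = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (30.0, 0.0)]
routes = minimum_spanning_tree(positions)
print(routes)
```

With this cost, each cluster head relays through its nearest neighbor toward the base station instead of transmitting directly over a long distance.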
Zhaohu LIU Peng SONG Jinshuai MU Wenming ZHENG
Most existing multi-view subspace clustering approaches only capture the inter-view similarities between different views and ignore the optimal local geometric structure of the original data. To this end, in this letter, we put forward a novel method named shared latent embedding learning for multi-view subspace clustering (SLE-MSC), which can efficiently capture a better latent space. To be specific, we introduce a pseudo-label constraint to capture the intra-view similarities within each view. Meanwhile, we utilize a novel optimal graph Laplacian to learn the consistent latent representation, in which the common manifold is considered as the optimal manifold to obtain a more reasonable local geometric structure. Comprehensive experimental results indicate the superiority and effectiveness of the proposed method.
Keishi HANAKAGO Ryo TAKAHASHI Takahiro OHYAMA Fumiyuki ADACHI
In this study, an overloaded large-scale distributed antenna network is considered, for which the number of active users is larger than that of antennas distributed in a base station coverage area (called a cell). To avoid overload, users in each cell are divided into multiple user groups, and, to reduce the computational complexity required for multi-user multiple-input and multiple-output (MU-MIMO), users in each user group are grouped into multiple user clusters so that cluster-wise distributed MU-MIMO can be performed in parallel in each user group. However, as the network size increases, conventional computational methods may not be able to solve combinatorial optimization problems, such as user scheduling and user clustering, which are required for performing cluster-wise distributed MU-MIMO in a finite amount of time. In this study, we apply quantum computing to solve the combinatorial optimization problems of user scheduling and clustering for an overloaded distributed antenna network and propose a quantum computing-based user scheduling and clustering method. The results of computer simulations indicate that as the technology of quantum computers and their related algorithms evolves in the future, the proposed method can realize large-scale dense wireless systems and realize real-time optimization with a short optimization execution cycle.
Xingyu QIAN Xiaogang CHEN Aximu YUEMAIER Shunfen LI Weibang DAI Zhitang SONG
Video-based action recognition encompasses the recognition of appearance and the classification of action types. This work proposes a discrete-temporal-sequence-based motion tendency clustering framework to implement motion clustering by extracting motion tendencies and self-supervised learning. A published traffic intersection dataset (inD) and a self-produced gesture video set are used for evaluation and to validate the motion tendency action recognition hypothesis.
Yun WU Yu SHI Jieming YANG Lishan BAO Chunzhe LI
In Artificial Intelligence for IT Operations scenarios, the Key Performance Indicator (KPI) is a very important operation and maintenance monitoring indicator, and KPI anomaly detection has become a hot research topic in recent years. To address the low detection efficiency and insufficient representation learning of existing methods, this paper proposes HCE-DWL, a fast clustering-based KPI anomaly detection method. First, hierarchical agglomerative clustering (HAC) is combined with deep assignment based on CNN-Embedding (CE) to cluster the KPI data (this combination is referred to as HCE), which improves the clustering efficiency. Next, the centroid of each KPI cluster and its Transformed Outlier Scores (TOS) are each assigned a weight. Finally, they are fed into the LightGBM model for detection (the Double Weight LightGBM model, referred to as DWL). Comparative experiments show that the algorithm effectively improves the efficiency and accuracy of KPI anomaly detection.
Yahui TANG Tong LI Rui ZHU Cong LIU Shuaipeng ZHANG
Service mining aims to use process mining for the analysis of services, making it possible to discover, analyze, and improve service processes. In the context of Web services, all kinds of events related to activities can be recorded and used to extract new information about service processes. However, the distributed nature of the services tends to generate large-scale service event logs, which complicates the discovery and analysis of service processes. To solve this problem, this research focuses on existing large-scale service event logs and proposes a hybrid genetic service mining method based on a trace clustering population (HGSM). By using trace clustering, the complex service system is divided into multiple functionally independent components, thereby simplifying the mining environment. HGSM also improves the mining efficiency of the genetic mining algorithm by improving both the initial population quality and the genetic operations, which lets it better handle large service event logs. Experimental results demonstrate that, compared with existing state-of-the-art mining methods, HGSM handles large service event logs better in terms of both mining efficiency and model quality.
Jinho CHOI Taehwa LEE Kwanwoo KIM Minjae SEO Jian CUI Seungwon SHIN
Bitcoin is currently a hot issue worldwide, and it is expected to become a new legal tender that replaces existing currencies, starting with El Salvador. Due to the nature of cryptocurrency, however, difficulties in tracking have led to misuse and abuse. Consequently, the harm suffered by innocent victims of Bitcoin abuse is also increasing. We propose a way to detect new signatures by applying two-fold NLP-based clustering techniques to the text of Bitcoin abuse reports received from actual victims. By clustering the report texts, we were able to group message templates belonging to the same campaigns. The new approach, which uses the abuse message templates obtained by clustering as signatures for identifying abusers, is highly effective.
Abbas JAMALIPOUR Forough SHIRIN ABKENAR
In this paper, we propose a novel Hybrid-Hierarchical spatial-aerial-Terrestrial Edge-Centric (H2TEC) for the space-air integrated Internet of Things (IoT) networks. H2TEC comprises unmanned aerial vehicles (UAVs) that act as mobile fog nodes to provide the required services for terminal nodes (TNs) in cooperation with the satellites. TNs in H2TEC offload their generated tasks to the UAVs for further processing. Due to the limited energy budget of TNs, a novel task allocation protocol, named TOP, is proposed to minimize the energy consumption of TNs while guaranteeing the outage probability and network reliability, for which the transmission rate of TNs is optimized. TOP also takes advantage of energy harvesting, by which the low earth orbit satellites transfer energy to the UAVs when the remaining energy of the UAVs is below a predefined threshold. To this end, the harvested power of the UAVs is optimized alongside the corresponding harvesting time so that the UAVs can improve the network throughput by processing more bits. Numerical results reveal that TOP outperforms the baseline method in critical situations in which more power is required to process the task. It is also found that even in such situations, the energy harvesting mechanism provided in TOP yields a more efficient network throughput.
Ryo TAKAHASHI Hidenori MATSUO Fumiyuki ADACHI
Ultra-densification of radio access network (RAN) is essential to efficiently handle the ever-increasing mobile data traffic. In this paper, a joint multi-layered user clustering and scheduling is proposed as an inter-cluster interference coordination scheme for ultra-dense RAN using cluster-wise distributed MIMO transmission/reception. The proposed joint multi-layered user clustering and scheduling consists of user clustering using the K-means algorithm, user-cluster layering (called multi-layering) based on the interference-offset-distance (IOD), cluster-antenna association on each layer, and layer-wise round-robin-type scheduling. The user capacity, the sum capacity, and the fairness are evaluated by computer simulations to show the effectiveness of the proposed joint multi-layered user clustering and scheduling. Also shown are uplink and downlink capacity comparisons and optimal IOD setting considering the trade-off between inter-cluster interference mitigation and transmission opportunity.
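Two of the building blocks named above, K-means user clustering and layer-wise round-robin scheduling, can be sketched in a few lines. This is a simplified illustration under stated assumptions: user positions are 2-D points, the initial centroids are supplied by hand, and the layer assignment `layers` is hypothetical. The IOD-based multi-layering and the cluster-antenna association from the paper are not implemented here.

```python
def kmeans(points, centers, iterations=20):
    """Lloyd's algorithm on 2-D points; `centers` holds the initial centroids."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each user joins the nearest centroid.
        labels = [
            min(range(len(centers)),
                key=lambda k: (p[0] - centers[k][0]) ** 2 + (p[1] - centers[k][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[k])  # keep an empty cluster's centroid
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


def round_robin(layers, n_slots):
    """Serve one layer of user clusters per time slot, cycling through the layers."""
    return [layers[slot % len(layers)] for slot in range(n_slots)]


users = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(users, centers=[(0.0, 0.0), (10.0, 10.0)])

# Hypothetical layering: cluster 0 on layer 0, cluster 1 on layer 1.
layers = [[0], [1]]
schedule = round_robin(layers, n_slots=4)
print(labels, centers, schedule)
```

Placing mutually interfering clusters on different layers and serving the layers in turn trades transmission opportunity for reduced inter-cluster interference, which is the trade-off the abstract evaluates through the IOD setting.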
Thanh Vu DANG Hoang Trong VO Gwang Hyun YU Jin Young KIM
Capsules are fundamental informative units that are introduced into capsule networks to manipulate the hierarchical presentation of patterns. The part-whole relationship of an entity is learned through capsule layers, using a routing-by-agreement mechanism that is approximated by a voting procedure. Nevertheless, existing routing methods are computationally inefficient. We address this issue by proposing a novel routing mechanism, namely “shortcut routing”, that directly learns to activate global capsules from local capsules. In our method, the number of operations in the routing procedure is reduced by omitting the capsules in intermediate layers, resulting in lighter routing. To further address the computational problem, we investigate an attention-based approach and propose fuzzy coefficients, which have been found to be more efficient than the mixture coefficients from EM routing. Our method achieves on-par classification results on the MNIST (99.52%), smallNORB (93.91%), and affNIST (89.02%) datasets. Compared to EM routing, our fuzzy-based and attention-based routing methods attain reductions of 1.42 and 2.5 in terms of the number of calculations.
Byeonghak KIM Murray LOEW David K. HAN Hanseok KO
To date, many studies have employed clustering for the classification of unlabeled data. Deep separate clustering applies several deep learning models to conventional clustering algorithms to more clearly separate the distribution of the clusters. In this paper, we employ a convolutional autoencoder to learn the features of input images. Following this, k-means clustering is conducted using the encoded layer features learned by the convolutional autoencoder. A center loss function is then added to aggregate the data points into clusters to increase the intra-cluster homogeneity. Finally, we calculate and increase the inter-cluster separability. We combine all loss functions into a single global objective function. Our new deep clustering method surpasses the performance of existing clustering approaches when compared in experiments under the same conditions.
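The center loss term mentioned above can be illustrated on its own. The sketch below assumes the common form of center loss, half the squared distance between each encoded feature and its assigned cluster center, averaged over the batch. The paper's exact formulation and the weighting of terms in its global objective are not given in the abstract, so the function `center_loss` and the toy data are illustrative only.

```python
def center_loss(features, labels, centers):
    """Mean of 0.5 * ||f_i - c_{y_i}||^2 over all samples.

    features: list of feature vectors (e.g., encoder outputs)
    labels:   cluster index assigned to each feature (e.g., by k-means)
    centers:  list of cluster center vectors
    """
    total = 0.0
    for f, lab in zip(features, labels):
        total += sum((fi - ci) ** 2 for fi, ci in zip(f, centers[lab]))
    return 0.5 * total / len(features)


# Two encoded points assigned to one cluster centered at (1, 0).
features = [(0.0, 0.0), (2.0, 0.0)]
labels = [0, 0]
centers = [(1.0, 0.0)]
loss = center_loss(features, labels, centers)
print(loss)
```

Minimizing this term pulls features toward their cluster centers, which is what increases intra-cluster homogeneity. A separate term would be needed to push the centers apart for inter-cluster separability.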
The pervasive application of Small Private Online Courses (SPOCs) provides a powerful impetus for the reform of higher education. During the teaching process, a teacher needs to understand the difficulty of SPOC videos for students in real time in order to focus on the difficulties and key points of the course in a flipped classroom. However, existing educational data mining techniques pay little attention to SPOC video difficulty clustering or classification. In this paper, we propose an approach to cluster SPOC videos by difficulty using video-watching data in a SPOC. Specifically, a bipartite graph that expresses the learning relationship between students and videos is constructed based on the number of video-watching times. Then, the SimRank++ algorithm is used to measure the similarity of the difficulty between any two videos. Finally, the spectral clustering algorithm is used to implement the video clustering based on the obtained similarity of difficulty. Experiments on a real data set in a SPOC show that the proposed approach has better clustering accuracy than other existing ones. This approach helps teachers learn about the overall difficulty of a SPOC video for students in real time, so that knowledge points can be explained more effectively in a flipped classroom.
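The similarity step above can be sketched with the basic bipartite SimRank recursion. Note the substitution: this is plain, unweighted SimRank, not the SimRank++ variant the paper uses, and it ignores the number of watching times. The function name `bipartite_simrank`, the decay value, and the toy data are assumptions for illustration. The spectral clustering step that would consume the resulting similarity matrix is omitted.

```python
def bipartite_simrank(watches, decay=0.8, iterations=10):
    """Basic SimRank on a student-video bipartite graph.

    watches: dict mapping each student to the set of videos they watched.
    Returns a dict {(video_a, video_b): similarity}.
    """
    students = sorted(watches)
    videos = sorted({v for vs in watches.values() for v in vs})
    watchers = {v: [s for s in students if v in watches[s]] for v in videos}

    sim_s = {(a, b): 1.0 if a == b else 0.0 for a in students for b in students}
    sim_v = {(a, b): 1.0 if a == b else 0.0 for a in videos for b in videos}

    for _ in range(iterations):
        new_s, new_v = {}, {}
        # Two students are similar if they watched similar videos.
        for a in students:
            for b in students:
                if a == b:
                    new_s[a, b] = 1.0
                elif not watches[a] or not watches[b]:
                    new_s[a, b] = 0.0
                else:
                    total = sum(sim_v[x, y] for x in watches[a] for y in watches[b])
                    new_s[a, b] = decay * total / (len(watches[a]) * len(watches[b]))
        # Two videos are similar if they were watched by similar students.
        for a in videos:
            for b in videos:
                if a == b:
                    new_v[a, b] = 1.0
                else:
                    total = sum(sim_s[x, y] for x in watchers[a] for y in watchers[b])
                    new_v[a, b] = decay * total / (len(watchers[a]) * len(watchers[b]))
        sim_s, sim_v = new_s, new_v
    return sim_v


watches = {"s1": {"v1", "v2"}, "s2": {"v1", "v2"}, "s3": {"v3"}}
sim = bipartite_simrank(watches)
print(sim["v1", "v2"], sim["v1", "v3"])
```

In this toy graph, `v1` and `v2` share all their viewers and receive a positive similarity, while `v3` is watched only by an unrelated student and stays at zero similarity to both.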
Kenichi KAWAMURA Akiyoshi INOKI Shouta NAKAYAMA Keisuke WAKAO Yasushi TAKATORI
A method is presented for increasing wireless LAN (WLAN) capacity in high-density environments with IEEE 802.11ax systems. We propose using coordinated scheduling of trigger frames based on our mobile cooperative control concept. High-density WLAN systems are managed by a management server, which gathers wireless environmental information from user equipment through cellular access. Hierarchical clustering of basic service sets is used to form synchronized clusters to reduce interference and increase throughput of high-density WLAN systems based on mobile cooperative control. This method increases uplink capacity by up to 19.4% and by up to 11.3% in total when WLAN access points are deployed close together. This control method is potentially effective for IEEE 802.11ax WLAN systems utilized as 5G mobile network components.
Qian WANG Qingmei ZHOU Wei ZHAO Xuangou WU Xun SHAO
In the age of big data, recommendation systems provide users with fast access to interesting information, resulting in significant commercial value. However, the extreme sparseness of user assessment data is one of the key factors that lead to the poor performance of recommendation algorithms. To address this problem, we propose a recommendation scheme that combines low-rank matrix completion with spectral clustering. Our scheme exploits spectral clustering to divide users into groups of similar users. Meanwhile, low-rank matrix completion is used to effectively predict un-rated items in the sub-matrix produced by the spectral clustering. Experiments on a real dataset show that the proposed scheme can effectively improve the prediction accuracy of un-rated items.
Uraiwan BUATOOM Waree KONGPRAWECHNON Thanaruk THEERAMUNKONG
The outcome of document clustering depends on the scheme used to assign a weight to each term in a document. While recent works have tried to use class-related distributions to enhance discrimination ability, it is worth exploring whether a deviation approach or an entropy approach is more effective. This paper presents a comparison between deviation-based distribution and entropy-based distribution as constraints in term weighting. In addition, their potential combinations are investigated to find optimal solutions in guiding the clustering process. In the experiments, the seeded k-means method is used for clustering, and the performances of the deviation-based, entropy-based, and hybrid approaches are analyzed using two English text datasets and one Thai text dataset. The results show that the deviation-based distribution outperforms the entropy-based distribution, and that a suitable combination of these distributions increases the clustering accuracy by 10%.
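To make the two kinds of distribution statistics concrete, the sketch below computes the entropy and the standard deviation of a term's occurrence counts across classes. It is only an illustration under stated assumptions: the abstract does not specify how these statistics enter the term weight, so the functions `class_entropy` and `class_deviation` and the sample counts are hypothetical.

```python
import math


def class_entropy(counts):
    """Shannon entropy (bits) of a term's distribution over classes."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def class_deviation(counts):
    """Population standard deviation of a term's counts over classes."""
    mean = sum(counts) / len(counts)
    return math.sqrt(sum((c - mean) ** 2 for c in counts) / len(counts))


# Occurrences of two terms in two classes.
spread_term = [5, 5]    # appears equally in both classes
focused_term = [10, 0]  # appears in one class only

print(class_entropy(spread_term), class_deviation(spread_term))
print(class_entropy(focused_term), class_deviation(focused_term))
```

A term concentrated in one class has low entropy and high deviation, so either statistic can signal discriminative power. They scale differently, which is one reason comparing and combining them is a meaningful question.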
In this paper, we consider the clustering problem of independent general subspaces. That is, given data points lying near or on the union of independent low-dimensional linear subspaces, we aim to recover the subspaces and assign the corresponding label to each data point. To solve this problem, we take advantage of both a greedy strategy and an energy minimization strategy to propose a simple yet effective algorithm, based on the assumption that an m-branched (i.e., perfect m-ary) tree constructed by collecting the m nearest neighbor points at each node has a high probability of containing the near-exact subspace. Specifically, subspace candidates are first enumerated by multiple m-branched trees. Each tree starts with a data point and grows by collecting nearest neighbors in breadth-first search order. Then, subspace proposals are selected from the enumeration to initialize the energy minimization algorithm. Finally, both the proposals and the labeling result are finalized by iterative re-estimation and labeling. Experiments with both synthetic and real-world data show that the proposed method can outperform state-of-the-art methods and is practical in real applications.
Lianqiang LI Jie ZHU Ming-Ting SUN
Convolutional Neural Networks (CNNs) usually have millions or even billions of parameters, which makes them hard to deploy on mobile devices. In this work, we present a novel filter-level pruning method to alleviate this issue. More concretely, we first construct an undirected fully connected graph to represent a pre-trained CNN model. Then, we employ the spectral clustering algorithm to divide the graph into subgraphs, which is equivalent to clustering similar filters of the CNN into the same groups. After obtaining the grouping relationships among the filters, we keep one filter per group and retrain the pruned model. Compared with previous pruning methods that identify redundant filters heuristically, the proposed method can select the pruning candidates more reasonably and precisely. Experimental results also show that the proposed pruning method achieves significant improvements over state-of-the-art methods.
Hongcui WANG Shanshan LIU Di JIN Lantian LI Jianwu DANG
Recognizing the different segments of speech belonging to the same speaker is an important speech analysis task in various applications. Recent works have shown that there is an underlying manifold on which speaker utterances live in the model-parameter space. However, most speaker clustering methods work in the Euclidean space, and hence often fail to discover the intrinsic geometrical structure of the data space and fail to use such features. To address this problem, we convert the speaker i-vector representation of utterances in the Euclidean space into a network structure constructed from the local (k) nearest neighbor relationships of these signals. We then propose an efficient community detection model on the speaker content network for clustering signals. The new model is based on probabilistic community memberships and is further refined with the idea that if two connected nodes have a high similarity, their community membership distributions in the model should be made close. This refinement enhances the local invariance assumption, and thus better respects the structure of the underlying manifold than existing community detection methods. Experiments are conducted on graphs built from two Chinese speech databases and the NIST 2008 Speaker Recognition Evaluation (SRE) data. The results provide insight into the structure of the speakers present in the data and confirm the effectiveness of the proposed method. Our method yields better performance than other state-of-the-art clustering algorithms. Metrics for constructing the speaker content graph are also discussed.