Yuguang ZHANG Zhiyong ZHANG Wei ZHANG Deming MAO Zhihong RAO
Using a limited number of probes has always been a focus in interface-level network topology probing to discover complete network topologies. Stop-set-based network topology probing methods significantly reduce the number of probes sent but suffer from the side effect of incomplete topology information discovery. This study proposes an optimized probing method based on stop probabilities (SPs) that builds on existing stop-set-based network topology discovery methods to address the issue of incomplete topology information owing to multipath routing. The statistics of repeat nodes (RNs) and multipath routing on the Internet are analyzed and combined with the principles of stop-set-based probing methods, highlighting that stopping probing at the first RN compromises the completeness of topology discovery. To address this issue, SPs are introduced to adjust the stopping strategy upon encountering RNs during probing. A method is designed for generating SPs that achieves high completeness and low cost based on the distribution of the number of RNs. Simulation experiments demonstrate that the proposed stop-probability-based probing method almost completely discovers network nodes and links across different regions and times over a two-year period, while significantly reducing probing redundancy. In addition, the proposed approach balances and optimizes the trade-off between complete topology discovery and reduced probing costs compared with existing topology probing methods. Building on this, the factors influencing the probing cost of the proposed method and methods to further reduce the number of probes while ensuring completeness are analyzed. The proposed method yields universally applicable SPs in the current Internet environment.
Jingyi ZHANG Kuiyu CHEN Yue MA
Previously, convolutional neural networks have made tremendous progress in target recognition based on micro-Doppler radar. However, these studies only considered the presence of one target at a time in the surveillance area. Simultaneous multi-targets recognition for surveillance radar remains a pretty challenging issue. To alleviate this issue, this letter develops a multi-instance multi-label (MIML) learning strategy, which can automatically locate the crucial input patterns that trigger the labels. Benefitting from its powerful target-label relation discovery ability, the proposed framework can be trained with limited supervision. We emphasize that only echoes from single targets are involved in training data, avoiding the preparation and annotation of multi-targets echo in the training stage. To verify the validity of the proposed method, we model two representative ground moving targets, i.e., person and wheeled vehicles, and carry out numerous comparative experiments. The result demonstrates that the developed framework can simultaneously recognize multiple targets and is also robust to variation of the signal-to-noise ratio (SNR), the initial position of targets, and the difference in scattering coefficient.
The increasing attention to the interpretability of machine learning models has led to the development of methods to explain the behavior of black-box models in a post-hoc manner. However, such post-hoc approaches generate a new explanation for every new input, and these explanations cannot be checked by humans in advance. A method that selects decision rules from a finite ruleset as explanation for neural networks has been proposed, but it cannot be used for other models. In this paper, we propose a model-agnostic explanation method to find a pre-verifiable finite ruleset from which a decision rule is selected to support every prediction made by a given black-box model. First, we define an explanation model that selects the rule, from a ruleset, that gives the closest prediction; this rule works as an alternative explanation or supportive evidence for the prediction of a black-box model. The ruleset should have high coverage to give close predictions for future inputs, but it should also be small enough to be checkable by humans in advance. However, minimizing the ruleset while keeping high coverage leads to a computationally hard combinatorial problem. Hence, we show that this problem can be reduced to a weighted MaxSAT problem composed only of Horn clauses, which can be efficiently solved with modern solvers. Experimental results showed that our method found small rulesets such that the rules selected from them can achieve higher accuracy for structured data as compared to the existing method using rulesets of almost the same size. We also experimentally compared the proposed method with two purely rule-based models, CORELS and defragTrees. Furthermore, we examine rulesets constructed for real datasets and discuss the characteristics of the proposed method from different viewpoints including interpretability, limitation, and possible use cases.
Kosuke MATSUDA Kazuhisa SETA Yuki HAYASHI
Self-directed learning in an appropriately designed environment can help learners retain knowledge tied to experience and motivate them to learn more. For teachers, however, it is difficult to design an environment to give to learners and to give feedback that reflects respect for their independent efforts, while for learners, it is difficult to set learning objectives on their own and to construct knowledge correctly based on their own efforts. In this research, we developed a learning support system that provides a mechanism for constructing an observational learning environment using virtual space and that encourages self-directed knowledge discovery. We confirmed that this system contributes to a learner's structural understanding and its retention and to a greater desire to learn at a level comparable to that of concept map creation, another active learning method.
Kazuhito MATSUDA Kouji KURIHARA Kentaro KAWAKAMI Masafumi YAMAZAKI Fuyuka YAMADA Tsuguchika TABARU Ken YOKOYAMA
Statical causal discovery is an approach to infer the causal relationship between observed variables whose causalities are not revealed. LiNGAM (Linear Non-Gaussian Acyclic Model), an algorithm for causal discovery, can calculate the causal relationship uniquely if the independent components of variables are assumed to be non-Gaussian. However, use-cases of LiNGAM are limited because of its O(d3x) computational complexity, where dx is the number of variables. This paper shows two approaches to accelerate LiNGAM causal discovery: SIMD utilization for LiNGAM's mathematical matrixes operations and MPI parallelization. We evaluate the implementation with the supercomputer Fugaku. Using 96 nodes of Fugaku, our improved version can achieve 17,531 times faster than the original OSS implementation (completed in 17.7 hours).
Chen CHANG Jianjun CAO Qin FENG Nianfeng WENG Yuling SHANG
Most existing truth discovery approaches are designed for structured data, and cannot meet the strong need to extract trustworthy information from raw text data for its unique characteristics such as multifactorial property of text answers (i.e., an answer may contain multiple key factors) and the diversity of word usages (i.e., different words may have the same semantic meaning). As for text answers, there are no absolute correctness or errors, most answers may be partially correct, which is quite different from the situation of traditional truth discovery. To solve these challenges, we propose an optimization-based text truth discovery model which jointly groups keywords extracted from the answers of the specific question into a set of multiple factors. Then, we select the subset of multiple factors as identified truth set for each question by parallel ant colony synchronization optimization algorithm. After that, the answers to each question can be ranked based on the similarities between factors answer provided and identified truth factors. The experiment results on real dataset show that though text data structures are complex, our model can still find reliable answers compared with retrieval-based and state-of-the-art approaches.
The methods proposed in this paper enable resynchronization when a synchronization deviation occurs in a sensor node without a beacon or an ack in a wireless sensor network under ultra-limited but stable resources such as the energy generated from tiny solar cell batteries. The method for a single-hop network is straightforward; when a receiver does not receive data, it is simply placed in recovery mode, in which the receiver sets its cycle length TB to (b±γ)T, where b is non-negative integer, 0 < γ < 1, and T is its cycle length in normal mode, and in which the receiver sets its active interval WB to a value that satisfies WB ≥ W + γT, where W is its active interval in normal mode. In contrast, a sender stays in normal mode. Resynchronization methods for linear multi-hop and tree-based multi-hop sensor networks are constructed using the method for a single-hop network. All the methods proposed here are complete because they are always able to resynchronize networks. The results of simulations based on the resynchronization methods are given and those of an experiment using actual sensor nodes with wireless modules are also presented, which show that the methods are feasible.
Cong LIU Jianpeng ZHANG Guangming LI Shangce GAO Qingtian ZENG
During the execution of software, tremendous amounts of data can be recorded. By exploiting the execution data, one can discover behavioral models to describe the actual software execution. As a well-known open-source process mining toolkit, ProM integrates quantities of process mining techniques and enjoys a variety of applications in a broad range of areas. How to develop a better ProM software, both from user experience and software performance perspective, are of vital importance. To achieve this goal, we need to investigate the real execution behavior of ProM which can provide useful insights on its usage and how it responds to user operations. This paper aims to propose an effective approach to solve this problem. To this end, we first instrument existing ProM framework to capture execution logs without changing its architecture. Then a two-layered framework is introduced to support accurate ProM behavior discovery by characterizing both user interaction behavior and plug-in calling behavior separately. Next, detailed discovery techniques to obtain user interaction behavior model and plug-in calling behavior model are proposed. All proposed approaches have been implemented.
Huan-Bang LI Ryu MIURA Fumihide KOJIMA
Device-to-device (D2D) networks are expected to play a number of roles, such as increasing frequency spectrum efficiency and improving throughput at hot-spots. In this paper, our interest is on the potential of D2D on reducing delivery latency. To enable fast D2D network forming, quick device discovery is essential. For quickening device discovery, we propose a method of defining and using common channel and group channels so as to avoid the channel scan uncertainty faced by the conventional method. Rules for using the common channel and group channels are designed. We evaluate and compare the discovery performance of the proposed method with conventional method by using the superframe structure defined in IEEE 802.15.8 and a general discovery procedure. IEEE 802.15.8 is a standard under development for fully distributed D2D communications. A Netlogo simulator is used to perform step by step MAC simulations. The simulation results verify the effectiveness of the proposed method.
Shouhei FUKUNAGA Yoshimasa TAKABATAKE Tomohiro I Hiroshi SAKAMOTO
A grammar compression is a restricted context-free grammar (CFG) that derives a single string deterministically. The goal of a grammar compression algorithm is to develop a smaller CFG by finding and removing duplicate patterns, which is simply a frequent pattern discovery process. Any frequent pattern can be obtained in linear time; however, a huge working space is required for longer patterns, and the entire string must be preloaded into memory. We propose an online algorithm to address this problem approximately within compressed space. For an input sequence of symbols, a1,a2,..., let Gi be a grammar compression for the string a1a2…ai. In this study, an online algorithm is considered one that can compute Gi+1 from (Gi,ai+1) without explicitly decompressing Gi. Here, let G be a grammar compression for string S. We say that variable X approximates a substring P of S within approximation ratio δ iff for any interval [i,j] with P=S[i,j], the parse tree of G has a node labeled with X that derives S[l,r] for a subinterval [l,r] of [i,j] satisfying |[l,r]|≥δ|[i,j]|. Then, G solves the frequent pattern discovery problem approximately within δ iff for any frequent pattern P of S, there exists a variable that approximates P within δ. Here, δ is called the approximation ratio of G for S. Previously, the best approximation ratio obtained by a polynomial time algorithm was Ω(1/lg2|P|). The main contribution of this work is to present a new lower bound Ω(1/<*|S|lg|P|) that is smaller than the previous bound when lg*|S|
Michael HECK Sakriani SAKTI Satoshi NAKAMURA
In this work we utilize feature transformations that are common in supervised learning without having prior supervision, with the goal to improve Dirichlet process Gaussian mixture model (DPGMM) based acoustic unit discovery. The motivation of using such transformations is to create feature vectors that are more suitable for clustering. The need of labels for these methods makes it difficult to use them in a zero resource setting. To overcome this issue we utilize a first iteration of DPGMM clustering to generate frame based class labels for the target data. The labels serve as basis for learning linear discriminant analysis (LDA), maximum likelihood linear transform (MLLT) and feature-space maximum likelihood linear regression (fMLLR) based feature transformations. The novelty of our approach is the way how we use a traditional acoustic model training pipeline for supervised learning to estimate feature transformations in a zero resource scenario. We show that the learned transformations greatly support the DPGMM sampler in finding better clusters, according to the performance of the DPGMM posteriorgrams on the ABX sound class discriminability task. We also introduce a method for combining posteriorgram outputs of multiple clusterings and demonstrate that such combinations can further improve sound class discriminability.
Natthawut KERTKEIDKACHORN Ryutaro ICHISE
Knowledge graphs (KG) play a crucial role in many modern applications. However, constructing a KG from natural language text is challenging due to the complex structure of the text. Recently, many approaches have been proposed to transform natural language text to triples to obtain KGs. Such approaches have not yet provided efficient results for mapping extracted elements of triples, especially the predicate, to their equivalent elements in a KG. Predicate mapping is essential because it can reduce the heterogeneity of the data and increase the searchability over a KG. In this article, we propose T2KG, an automatic KG creation framework for natural language text, to more effectively map natural language text to predicates. In our framework, a hybrid combination of a rule-based approach and a similarity-based approach is presented for mapping a predicate to its corresponding predicate in a KG. Based on experimental results, the hybrid approach can identify more similar predicate pairs than a baseline method in the predicate mapping task. An experiment on KG creation is also conducted to investigate the performance of the T2KG. The experimental results show that the T2KG also outperforms the baseline in KG creation. Although KG creation is conducted in open domains, in which prior knowledge is not provided, the T2KG still achieves an F1 score of approximately 50% when generating triples in the KG creation task. In addition, an empirical study on knowledge population using various text sources is conducted, and the results indicate the T2KG could be used to obtain knowledge that is not currently available from DBpedia.
Jae-Yoon JUNG Gyunyoung HEO Kyuhyup OH
Smart card payment systems provide a convenient billing mechanism for public transportation providers and passengers. In this paper, a smart card-based transit log is used to reveal functionally related regions in a city, which are called zones. To discover significant zones based on the transit log data, two algorithms, minimum spanning trees and agglomerative hierarchical clustering, are extended by considering the additional factors of geographical distance and adjacency. The hierarchical spatial geocoding system, called Geohash, is adopted to merge nearby bus stops to a region before zone discovery. We identify different urban zones that contain functionally interrelated regions based on passenger trip data stored in the smart card-based transit log by manipulating the level of abstraction and the adjustment parameters.
Liangliang ZHANG Longqi YANG Yong GONG Zhisong PAN Yanyan ZHANG Guyu HU
In multi-view social networks field, a flexible Nonnegative Matrix Factorization (NMF) based framework is proposed which integrates multi-view relation data and feature data for community discovery. Benefit with a relaxed pairwise regularization and a novel orthogonal regularization, it outperforms the-state-of-art algorithms on five real-world datasets in terms of accuracy and NMI.
Semih YUMUSAK Erdogan DOGDU Halife KODAZ Andreas KAMILARIS Pierre-Yves VANDENBUSSCHE
Linked data endpoints are online query gateways to semantically annotated linked data sources. In order to query these data sources, SPARQL query language is used as a standard. Although a linked data endpoint (i.e. SPARQL endpoint) is a basic Web service, it provides a platform for federated online querying and data linking methods. For linked data consumers, SPARQL endpoint availability and discovery are crucial for live querying and semantic information retrieval. Current studies show that availability of linked datasets is very low, while the locations of linked data endpoints change frequently. There are linked data respsitories that collect and list the available linked data endpoints or resources. It is observed that around half of the endpoints listed in existing repositories are not accessible (temporarily or permanently offline). These endpoint URLs are shared through repository websites, such as Datahub.io, however, they are weakly maintained and revised only by their publishers. In this study, a novel metacrawling method is proposed for discovering and monitoring linked data sources on the Web. We implemented the method in a prototype system, named SPARQL Endpoints Discovery (SpEnD). SpEnD starts with a “search keyword” discovery process for finding relevant keywords for the linked data domain and specifically SPARQL endpoints. Then, the collected search keywords are utilized to find linked data sources via popular search engines (Google, Bing, Yahoo, Yandex). By using this method, most of the currently listed SPARQL endpoints in existing endpoint repositories, as well as a significant number of new SPARQL endpoints, have been discovered. We analyze our findings in comparison to Datahub collection in detail.
Jianquan LIU Shoji NISHIMURA Takuya ARAKI Yuichi NAKAMURA
Similarity search is an important and fundamental problem, and thus widely used in various fields of computer science including multimedia, computer vision, database, information retrieval, etc. Recently, since loitering behavior often leads to abnormal situations, such as pickpocketing and terrorist attacks, its analysis attracts increasing attention from research communities. In this paper, we present AntiLoiter, a loitering discovery system adopting efficient similarity search on surveillance videos. As we know, most of existing systems for loitering analysis, mainly focus on how to detect or identify loiterers by behavior tracking techniques. However, the difficulties of tracking-based methods are known as that their analysis results are heavily influenced by occlusions, overlaps, and shadows. Moreover, tracking-based methods need to track the human appearance continuously. Therefore, existing methods are not readily applied to real-world surveillance cameras due to the appearance discontinuity of criminal loiterers. To solve this problem, we abandon the tracking method, instead, propose AntiLoiter to efficiently discover loiterers based on their frequent appearance patterns in longtime multiple surveillance videos. In AntiLoiter, we propose a novel data structure Luigi that indexes data using only similarity value returned by a corresponding function (e.g., face matching). Luigi is adopted to perform efficient similarity search to realize loitering discovery. We conducted extensive experiments on both synthetic and real surveillance videos to evaluate the efficiency and efficacy of our approach. The experimental results show that our system can find out loitering candidates correctly and outperforms existing method by 100 times in terms of runtime.
Asako SOGA Bin UMINO Yuho YAZAKI Motoko HIRAYAMA
This paper reports an assessment of the feasibility and the practicality of a creation support system for contemporary dance e-learning. We developed a Body-part Motion Synthesis System (BMSS) that allows users to create choreographies by synthesizing body-part motions to increase the effect of learning contemporary dance choreography. Short created choreographies can be displayed as animation using 3DCG characters. The system targets students who are studying contemporary dance and is designed to promote the discovery learning of contemporary dance. We conducted a series of evaluation experiments for creating contemporary dance choreographies to verify the learning effectiveness of our system as a support system for discovery learning. As a consequence of experiments with 26 students who created contemporary dances, we verified that BMSS is a helpful creation training tool to discover new choreographic methods, new dance movements, and new awareness of their bodies.
Woongsup LEE Juyeop KIM Dong-Ho CHO
We herein describe an autonomous peer discovery scheme for Device-to-Device (D2D) communications. With the increasing popularity of D2D communications, an efficient means of finding the neighboring node, i.e., peer discovery, is required. To this end, we propose a new autonomous peer discovery scheme that uses azimuth spread (AS), delay spread (DS), and shadow fading of the uplink pilot from each mobile station (MS). Given that AS, DS, and shadow fading are spatially correlated, nodes that have similar values must be neighbors. The proposed scheme filters out the MSs that are unlikely to be neighbors and uses the Kolmogorov-Smirnov (K-S) test to improve the accuracy of neighbor discovery. Unlike previous peer discovery schemes that incur additional signaling overheads, our proposal finds neighboring nodes by using the existing uplink pilot transmission from MSs such that neighboring peers can be found autonomously. Through analysis and simulation, we show that neighboring MSs can be found accurately with low latency.
Meng SUN Hugo VAN HAMME Yimin WANG Xiongwei ZHANG
Unsupervised spoken unit discovery or zero-source speech recognition is an emerging research topic which is important for spoken document analysis of languages or dialects with little human annotation. In this paper, we extend our earlier joint training framework for unsupervised learning of discrete density HMM to continuous density HMM (CDHMM) and apply it to spoken unit discovery. In the proposed recipe, we first cluster a group of Gaussians which then act as initializations to the joint training framework of nonnegative matrix factorization and semi-continuous density HMM (SCDHMM). In SCDHMM, all the hidden states share the same group of Gaussians but with different mixture weights. A CDHMM is subsequently constructed by tying the top-N activated Gaussians to each hidden state. Baum-Welch training is finally conducted to update the parameters of the Gaussians, mixture weights and HMM transition probabilities. Experiments were conducted on word discovery from TIDIGITS and phone discovery from TIMIT. For TIDIGITS, units were modeled by 10 states which turn out to be strongly related to words; while for TIMIT, units were modeled by 3 states which are likely to be phonemes.
Minjoong RIM Gyuhak YEO Seungyeob CHAE Chung G. KANG
One of the most important processes in cellular-assisted device-to-device (D2D) communications is device discovery, which decides whether two devices are located close to each other. The discovery process is performed by devices periodically transmitting discovery signals so that neighbor devices can receive them to recognize their proximate physical presence. While a fixed set of discovery parameters are used regardless of devices in most of the existing works, discovery periods are not necessarily the same for all devices, as they can be set differently depending on their channel conditions and operational environments, e.g., the mobile speeds. In this paper, we present an optimization framework to determine the discovery periods for individual devices in cellular-assisted D2D communication systems. We consider two different types of optimization problems, taking the different user velocities into account: minimizing the average number of undiscovered device pairs, and minimizing the number of discovery signal transmissions while maintaining the average number of undiscovered device pairs for each device less than a pre-specified threshold. We present analytical and simulation results to demonstrate that short discovery periods can be beneficial to high-mobility devices, while longer discovery periods are allowed for devices with lower velocities.