Tian XIE Hongchang CHEN Tuosiyu MING Jianpeng ZHANG Chao GAO Shaomei LI Yuehang DING
In partial label data, the ground-truth label of a training example is concealed in a set of candidate labels associated with the instance. As the ground-truth label is inaccessible, it is difficult to train the classifier via the label information. Consequently, manifold structure information is adopted, under the assumption that neighboring or similar instances in the feature space have similar labels in the label space. However, real-world data may not fully satisfy this assumption. In this paper, a partial label metric learning method based on the likelihood-ratio test is proposed to make partial label data satisfy the manifold assumption. Moreover, the proposed method needs no objective function and treats the data pairs asymmetrically. The experimental results on several real-world PLL datasets indicate that the proposed method outperforms the existing partial label metric learning methods in terms of classification accuracy and disambiguation accuracy while costing less time.
Yuehang DING Hongtao YU Jianpeng ZHANG Yunjie GU Ruiyang HUANG Shize KANG
Redundant relations are explicit relations that can also be deduced implicitly. Although several ontology redundancy elimination methods exist, none of them takes equivalent relations into consideration. Real ontologies, however, usually contain equivalent relations, and the resulting redundancies cannot be completely detected by existing algorithms. To solve this problem, this paper proposes a super-node based ontology redundancy elimination algorithm. The algorithm consists of super-node transformation and transitive redundancy elimination. During the super-node transformation process, nodes equivalent to each other are merged into a super-node. Then, by deleting the overlapped edges, redundancies relating to equivalent relations are eliminated. During the transitive redundancy elimination process, redundant relations are eliminated by comparing concept nodes' direct and indirect neighbors. Most notably, we propose a theorem to validate the irredundancy of real ontologies. Our algorithm outperforms others on both real ontologies and synthetic dynamic ontologies.
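The two stages described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it swaps in a generic union-find merge followed by a plain transitive reduction, and it assumes the ontology is given as a set of directed (child, parent) edges plus a list of equivalence pairs, and that the graph is acyclic once equivalent nodes are merged. The names `to_super_nodes`, `remove_transitive_redundancy`, and `eliminate_redundancy` are hypothetical.

```python
from collections import defaultdict


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def to_super_nodes(edges, equivalences):
    """Merge equivalent nodes and rewrite edges onto their representatives."""
    nodes = {n for e in edges for n in e} | {n for p in equivalences for n in p}
    parent = {n: n for n in nodes}
    for a, b in equivalences:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            # Assumption: the smaller name becomes the super-node label.
            parent[max(ra, rb)] = min(ra, rb)
    merged = set()
    for u, v in edges:
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:  # drop edges that collapse inside one super-node
            merged.add((ru, rv))  # the set removes overlapped (duplicate) edges
    return merged


def remove_transitive_redundancy(edges):
    """Drop every edge (u, v) for which v is also reachable via another path.

    Assumes the edge set is acyclic.
    """
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)

    def reachable(start, target, skip_edge):
        stack, seen = [start], {start}
        while stack:
            node = stack.pop()
            for nxt in succ.get(node, ()):
                if (node, nxt) == skip_edge:
                    continue
                if nxt == target:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return {(u, v) for (u, v) in edges if not reachable(u, v, (u, v))}


def eliminate_redundancy(edges, equivalences):
    return remove_transitive_redundancy(to_super_nodes(edges, equivalences))
```

For instance, with the edges Dog→Mammal, Mammal→Animal, Canine→Animal, Dog→Animal and the equivalence Dog ≡ Canine, the sketch merges Dog and Canine into one super-node, collapses the two overlapping edges to Animal into one, and then removes that edge because Animal is already reachable through Mammal.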
Yuehang DING Hongtao YU Jianpeng ZHANG Huanruo LI Yunjie GU
As the superstructure of knowledge graphs, ontology has been widely applied in knowledge engineering. However, it is becoming increasingly difficult to apply and comprehend due to the growing data size and complexity of schemas. Hence, ontology summarization has emerged to enhance the comprehension and application of ontology. Existing summarization methods mainly focus on an ontology's topology without taking semantic information into consideration, while humans understand information based on semantics. Thus, we propose a novel algorithm that integrates semantic information and topological information, which makes ontology more understandable. In our work, semantic and topological information are represented by concept vectors, a set of high-dimensional vectors. Distances between concept vectors represent concepts' similarity, and we select important concepts following two criteria: 1) the distances from important concepts to normal concepts should be as short as possible, which indicates that important concepts summarize normal concepts well; 2) the distances from an important concept to the other important concepts should be as long as possible, which ensures that important concepts are not similar to each other. K-means++ is adopted to select important concepts. Lastly, we performed extensive evaluations to compare our algorithm with existing ones. The evaluations show that our approach performs better than the others in most cases.
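As a rough illustration of how K-means++ can serve both selection criteria, the sketch below seeds cluster centers with the standard K-means++ rule (each new center drawn with probability proportional to its squared distance from the nearest existing center), refines them with ordinary Lloyd iterations, and returns the concept nearest to each final center. The abstract does not say how concept vectors are built or how centers are mapped back to concepts, so the nearest-concept rule, the function names, and the parameters `iterations` and `seed` are assumptions made for this example. It also assumes there are at least `k` distinct vectors.

```python
import random


def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(vectors, k, rng):
    # K-means++ seeding: first center uniform at random, later centers
    # sampled with probability proportional to squared distance from the
    # nearest center chosen so far, which pushes the centers apart.
    centers = [rng.choice(vectors)]
    while len(centers) < k:
        d2 = [min(sq_dist(v, c) for c in centers) for v in vectors]
        centers.append(rng.choices(vectors, weights=d2, k=1)[0])
    return centers


def select_important_concepts(concept_vectors, k, iterations=20, seed=0):
    """Return the names of k representative concepts (hypothetical helper)."""
    rng = random.Random(seed)
    names = list(concept_vectors)
    vectors = [tuple(concept_vectors[n]) for n in names]
    centers = kmeanspp_seeds(vectors, k, rng)
    for _ in range(iterations):
        # Assignment step: attach every vector to its nearest center.
        clusters = [[] for _ in centers]
        for v in vectors:
            idx = min(range(k), key=lambda i: sq_dist(v, centers[i]))
            clusters[idx].append(v)
        # Update step: move each center to the mean of its members;
        # an empty cluster keeps its previous center.
        centers = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    # Assumption: the concept closest to a center summarizes that cluster.
    chosen = {
        min(names, key=lambda n: sq_dist(concept_vectors[n], c)) for c in centers
    }
    return sorted(chosen)
```

Picking the concept nearest to each center keeps important concepts close to the normal concepts they summarize (criterion 1), while the distance-weighted seeding favors centers that are far from one another (criterion 2).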
Cong LIU Jianpeng ZHANG Guangming LI Shangce GAO Qingtian ZENG
During the execution of software, tremendous amounts of data can be recorded. By exploiting the execution data, one can discover behavioral models that describe the actual software execution. As a well-known open-source process mining toolkit, ProM integrates a large number of process mining techniques and is applied in a broad range of areas. Developing better ProM software, in terms of both user experience and software performance, is of vital importance. To achieve this goal, we need to investigate the real execution behavior of ProM, which can provide useful insights into its usage and how it responds to user operations. This paper proposes an effective approach to this problem. To this end, we first instrument the existing ProM framework to capture execution logs without changing its architecture. Then a two-layered framework is introduced to support accurate ProM behavior discovery by characterizing user interaction behavior and plug-in calling behavior separately. Next, detailed discovery techniques to obtain the user interaction behavior model and the plug-in calling behavior model are proposed. All proposed approaches have been implemented.