Semih YUMUSAK Erdogan DOGDU Halife KODAZ
Linked data sets are created using Semantic Web technologies; they are usually large, and their number is growing. Query execution is therefore costly, and knowing the content of such datasets should help in targeted querying. Our aim in this paper is to classify linked data sets by their knowledge content. Earlier projects such as LOD Cloud, LODStats, and SPARQLES analyze linked data sources in terms of content, availability and infrastructure. In these projects, linked data sets are classified and tagged principally using the VoID vocabulary and analyzed according to their content, availability and infrastructure. Although all linked data sources listed in these projects appear to be classified or tagged, there are few studies on the automated tagging and classification of newly arriving linked data sets. Here, we focus on automated classification of linked data sets using semantic scoring methods. We collected the SPARQL endpoints of 1,328 unique linked datasets from the Datahub, LOD Cloud, LODStats, SPARQLES, and SpEnD projects. We then queried textual descriptions of resources in these data sets using their rdfs:comment and rdfs:label property values. We analyzed these texts with document analysis techniques, treating every SPARQL endpoint as a separate document. To this end, we combined the WordNet semantic relations library with an adapted term frequency-inverse document frequency (tf-idf) analysis of the words and their semantic neighbours. From the WordNet database, we extracted information about the comment/label objects in linked data sources using the hypernym, hyponym, homonym, meronym, region, topic and usage semantic relations. We obtained significant results with the hypernym and topic relations: they yield words that identify data sets, which can be used for automatic classification and tagging of linked data sources.
Using these words, we experimented with different classifiers and scoring methods, which resulted in better classification accuracy.
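The endpoint-as-document idea can be sketched as plain tf-idf over each endpoint's label/comment words, with every word first expanded by its semantic neighbours. This is a minimal sketch and not the authors' adapted scoring. The `neighbours` dictionary is a hypothetical, hand-written stand-in for WordNet hypernym lookups, and the endpoint names and words are invented for illustration.

```python
import math
from collections import Counter

def expand(words, neighbours):
    """Append each word's semantic neighbours (hypothetical stand-in for WordNet hypernyms)."""
    expanded = list(words)
    for w in words:
        expanded.extend(neighbours.get(w, []))
    return expanded

def tfidf(docs):
    """docs maps an endpoint name to its list of terms; each endpoint is one document."""
    n_docs = len(docs)
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))  # document frequency: count each term once per endpoint
    scores = {}
    for name, terms in docs.items():
        tf = Counter(terms)
        total = len(terms)
        scores[name] = {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
    return scores

# Hypothetical endpoints and neighbour map, for illustration only.
neighbours = {"gene": ["molecule"], "protein": ["molecule"],
              "city": ["location"], "river": ["location"]}
raw = {"endpointA": ["gene", "protein", "data"],
       "endpointB": ["city", "river", "data"]}
docs = {name: expand(words, neighbours) for name, words in raw.items()}
scores = tfidf(docs)
top_a = max(scores["endpointA"], key=scores["endpointA"].get)
print(top_a)  # molecule
```

A term that every endpoint shares, such as "data", gets an idf of zero. A neighbour contributed by several words in the same endpoint ("molecule") rises to the top, which is the kind of dataset-identifying word described above.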
Wenpeng LU Hao WU Ping JIAN Yonggang HUANG Heyan HUANG
Word sense disambiguation (WSD) aims to identify the correct sense of an ambiguous word by mining its context information. Previous studies show that classifier combination is an effective way to enhance WSD performance. In this paper, we systematically review state-of-the-art methods for classifier-combination-based WSD, including probability-based and voting-based approaches. Furthermore, we propose a new classifier-combination-based WSD method, namely probability weighted voting with dynamic self-adaptation. Compared with existing approaches, the new method takes into account both the differences among classifiers and those among ambiguous instances. Extensive experiments on a real-world dataset show the superiority of our method over state-of-the-art methods.
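The basic idea of probability weighted voting can be sketched as a weighted sum of each classifier's sense distribution. This is only an assumption-laden simplification. Here the classifier weights are fixed numbers, whereas the proposed method adapts them dynamically per instance, and that adaptation is not reproduced. The sense labels are hypothetical.

```python
from collections import defaultdict

def weighted_vote(prob_dists, weights):
    """Combine per-classifier sense distributions by a weighted sum of probabilities."""
    totals = defaultdict(float)
    for dist, w in zip(prob_dists, weights):
        for sense, p in dist.items():
            totals[sense] += w * p
    return max(totals, key=totals.get)

# Three classifiers' outputs for one ambiguous instance (hypothetical senses).
dists = [
    {"bank/river": 0.6, "bank/money": 0.4},
    {"bank/river": 0.3, "bank/money": 0.7},
    {"bank/river": 0.2, "bank/money": 0.8},
]
print(weighted_vote(dists, [0.5, 0.3, 0.2]))    # bank/money (0.57 vs 0.43)
print(weighted_vote(dists, [0.9, 0.05, 0.05]))  # bank/river (0.565 vs 0.435)
```

Changing only the weights flips the decision. This is why making the weights depend on both the classifier and the instance can matter.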
Seongkyu MUN Suwon SHON Wooil KIM David K. HAN Hanseok KO
Various types of classifiers and feature extraction methods for acoustic scene classification were recently proposed in the IEEE Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 Challenge Task 1. The final evaluation results, however, show that even the top 10 ranked teams achieved extremely low accuracy on particular class pairs with similar sounds. Because such sound classes are difficult to distinguish even by the human ear, the conventional deep-learning-based feature extraction methods used by most DCASE participating teams are considered to face performance limitations. To address the low performance on similar class pairs, this letter proposes employing recurrent neural network (RNN) based source separation for each class prior to the classification step. Since the system can effectively extract trained sound components using the RNN structure, the mid-layer of the RNN can be considered to capture discriminative information about the trained class. This letter therefore proposes using this mid-layer information as novel discriminative features. The proposed feature yields an average classification rate improvement of 2.3% over the conventional method, which uses additional classifiers to handle similar class pairs.
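The notion of reading features off an RNN's mid-layer can be sketched with a simple Elman RNN. Its hidden states are averaged over time, and the results from several class-specific networks are concatenated. Everything here is an assumption for illustration: the letter's separation architecture, the pooling over time, and the concatenation are not specified. Random weights also stand in for trained per-class networks.

```python
import numpy as np

def rnn_hidden_states(frames, w_in, w_rec, b):
    """Run a simple Elman RNN over (T, F) frames; return the (T, H) hidden states."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for x in frames:
        h = np.tanh(x @ w_in + h @ w_rec + b)
        states.append(h)
    return np.stack(states)

def midlayer_features(frames, networks):
    """Average each class-specific network's hidden states over time and concatenate."""
    return np.concatenate([rnn_hidden_states(frames, *net).mean(axis=0) for net in networks])

rng = np.random.default_rng(0)
n_feats, n_hidden = 40, 16
# Random weights stand in for trained per-class source-separation networks (assumption).
networks = [(rng.normal(0, 0.1, (n_feats, n_hidden)),
             rng.normal(0, 0.1, (n_hidden, n_hidden)),
             np.zeros(n_hidden)) for _ in range(2)]
frames = rng.normal(size=(20, n_feats))  # e.g. 20 spectral frames of 40 bins
feature = midlayer_features(frames, networks)
print(feature.shape)  # (32,)
```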
Donghyun YOO Youngjoong KO Jungyun SEO
In this paper, we propose a deep-learning-based model for classifying speech acts using a convolutional neural network (CNN). The model uses bigram features, including part-of-speech (POS) tag bigrams and dependency-relation bigrams, which represent syntactic structural information in utterances. Previous CNN-based classification approaches have commonly exploited word embeddings of morpheme unigrams. In contrast, the proposed model first extracts two different bigram features that reflect the syntactic structure of utterances well and then represents them as vectors using a word embedding technique. As a result, the proposed model using bigram embeddings achieves an accuracy of 89.05%, a relative improvement of 2.8% over competitive models in previous studies.
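Both bigram features can be sketched as string tokens that a word-embedding trainer could consume. The exact token format used in the paper is not given. Joining tags with "_" and pairing each dependent with its head word are assumptions of this sketch, and the parse below is hypothetical.

```python
def pos_bigrams(tags):
    """Adjacent POS-tag pairs, e.g. ['PRP', 'VBP'] -> 'PRP_VBP'."""
    return [f"{a}_{b}" for a, b in zip(tags, tags[1:])]

def dependency_bigrams(tokens, heads):
    """Head-dependent word pairs; heads[i] is the index of token i's head, -1 for the root."""
    return [f"{tokens[h]}-{tok}" for tok, h in zip(tokens, heads) if h >= 0]

tokens = ["I", "need", "help"]
tags = ["PRP", "VBP", "NN"]
heads = [1, -1, 1]  # hypothetical parse: "need" is the root
print(pos_bigrams(tags))                  # ['PRP_VBP', 'VBP_NN']
print(dependency_bigrams(tokens, heads))  # ['need-I', 'need-help']
```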
Lu SUN Mineichi KUDO Keigo KIMURA
Multi-label classification is an appealing and challenging supervised learning problem in which multiple labels, rather than a single label, are associated with an unseen test instance. To remove possible noise in labels and high-dimensional features, multi-label dimension reduction has attracted increasing attention in recent years. Existing methods usually suffer from several problems, such as ignoring label outliers and label correlations. In addition, most of them conduct dimension reduction in either an unsupervised or a supervised way and are therefore unable to exploit both the label information and a large amount of unlabeled data to improve performance. To cope with these problems, we propose a novel method termed Robust sEmi-supervised multi-lAbel DimEnsion Reduction (READER). From the viewpoint of empirical risk minimization, READER selects the most discriminative features for all labels in a semi-supervised way. Specifically, the ℓ2,1-norm induced loss function and regularization term make READER robust to outliers in the data points. READER finds a feature subspace that keeps originally neighboring instances close and embeds labels into a low-dimensional latent space nonlinearly. To optimize the objective function, an efficient algorithm with a convergence guarantee is developed. Extensive empirical studies on real-world datasets demonstrate the superior performance of the proposed method.
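The ℓ2,1 norm sums the Euclidean norms of a matrix's rows. As a regularizer, it pushes whole rows (features) to zero, which is why it yields feature selection. As a loss on residuals, it does not square each row's error, which limits the influence of outliers. The sketch below only computes the norm and ranks features by row norm on a hypothetical projection matrix `W` whose rows index features. It is not READER's optimization algorithm.

```python
import numpy as np

def l21_norm(W):
    """||W||_{2,1}: sum over rows of each row's Euclidean norm."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def top_features(W, k):
    """Rank features (rows of W) by row norm; all-zero rows are effectively discarded."""
    order = np.argsort(-np.linalg.norm(W, axis=1))[:k]
    return [int(i) for i in order]

W = np.array([[3.0, 4.0],   # feature 0: row norm 5
              [0.0, 0.0],   # feature 1: row norm 0
              [1.0, 0.0]])  # feature 2: row norm 1
print(l21_norm(W))         # 6.0
print(top_features(W, 2))  # [0, 2]
```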
Sanay MUHAMMAD UMAR SAEED Syed MUHAMMAD ANWAR Muhammad MAJID
A study on the quantification of human stress using low beta waves of the electroencephalogram (EEG) is presented. For the first time, the importance of low beta waves as a feature for quantifying human stress is highlighted. In this study, twenty-eight participants filled in the Perceived Stress Scale (PSS) questionnaire and had their EEG recorded in the closed-eye condition using a commercially available single-channel EEG headset placed at the frontal site. Regression analysis of the beta waves extracted from the recorded EEG shows that low beta waves can predict PSS scores with a confidence level of 94%. Consequently, when low beta waves are used as a feature with the Naive Bayes algorithm to classify stress levels, the computational cost is reduced seven-fold and the accuracy improves to 71.4%.
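A low-beta feature from a single EEG channel can be sketched as the fraction of spectral power inside the band. Definitions of the low beta band vary. The 12–15 Hz edges, the 256 Hz sampling rate and the 2-second window are assumptions of this sketch, not values from the study. The resulting scalar is the kind of feature that could then be passed to a classifier such as Naive Bayes.

```python
import numpy as np

def relative_band_power(signal, fs, lo=12.0, hi=15.0):
    """Fraction of spectral power between lo and hi Hz (defaults: assumed low-beta band)."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    band = (freqs >= lo) & (freqs <= hi)
    return float(power[band].sum() / power.sum())

fs = 256                       # assumed sampling rate in Hz
t = np.arange(0, 2, 1.0 / fs)  # 2-second window
low_beta = np.sin(2 * np.pi * 13 * t)
high_beta = np.sin(2 * np.pi * 20 * t)
print(relative_band_power(low_beta, fs))   # close to 1
print(relative_band_power(high_beta, fs))  # close to 0
```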
Jaekeun YUN Daehee KIM Sunshin AN
Since sensor nodes are prone to faults due to their highly constrained resources and hostile deployment environments, fault management in wireless sensor networks (WSNs) is essential to guarantee proper network operation, especially routing. In contrast to existing fault management methods, which mainly aim to tolerate faults without considering the fault type, we propose a novel, efficient fault-aware routing method in which faults are classified and handled accordingly. More specifically, we first identify each fault and then attempt to set up a new routing path according to the fault type. Our proposed method can be easily integrated with any existing routing method. We show that it outperforms AODV, REAR, and GPSR, representative works of single-path, multipath, and location-based routing respectively, in terms of energy efficiency and data delivery ratio.
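Fault-type-dependent route repair can be sketched as a dispatch that recomputes a path while excluding whatever failed. The fault types here ("node", "link", "transient") and their handling are hypothetical and not the paper's taxonomy. The path metric is also an assumption: plain hop-count BFS rather than an energy-aware metric.

```python
from collections import deque

def bfs_path(adj, src, dst, dead_nodes=frozenset(), dead_links=frozenset()):
    """Shortest hop-count path that avoids failed nodes and failed (undirected) links."""
    if src in dead_nodes or dst in dead_nodes:
        return None
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj.get(u, []):
            if v in prev or v in dead_nodes or frozenset((u, v)) in dead_links:
                continue
            prev[v] = u
            queue.append(v)
    return None

def reroute(adj, src, dst, current_path, fault_type, fault_item=None):
    """Hypothetical fault-type dispatch; the fault types are illustrative only."""
    if fault_type == "node":
        return bfs_path(adj, src, dst, dead_nodes=frozenset([fault_item]))
    if fault_type == "link":
        return bfs_path(adj, src, dst, dead_links=frozenset([frozenset(fault_item)]))
    if fault_type == "transient":
        return current_path  # keep the existing path and simply retransmit
    raise ValueError(f"unknown fault type: {fault_type}")

adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
path = bfs_path(adj, "a", "d")                           # ['a', 'b', 'd']
print(reroute(adj, "a", "d", path, "node", "b"))         # ['a', 'c', 'd']
print(reroute(adj, "a", "d", path, "link", ("b", "d")))  # ['a', 'c', 'd']
print(reroute(adj, "a", "d", path, "transient"))         # ['a', 'b', 'd']
```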
Seongkyu MUN Minkyu SHIN Suwon SHON Wooil KIM David K. HAN Hanseok KO
Recent acoustic event classification research has focused on training suitable filters to represent acoustic events. However, due to the limited availability of target event databases and the linearity of conventional filters, there is still room for performance improvement. Exploiting the non-linear modeling capability of deep neural networks (DNNs) and their ability to learn beyond pre-trained environments, this letter proposes a DNN-based feature extraction scheme for acoustic event classification. The effectiveness and noise robustness of the proposed method are demonstrated on a database of indoor surveillance environments.
Hirobumi SAITO Prilando Rizki AKBAR Hiromi WATANABE Vinay RAVINDRA Jiro HIROKAWA Kenji URA Pyne BUDHADITYA
We propose a new architecture of antenna, transmitter and receiver feeding configuration for a small synthetic aperture radar (SAR) compatible with a 100 kg class satellite. Promising applications include Earth observation constellations combined with optical sensors, and responsive disaster monitoring missions. The SAR antenna is a deployable, passive, honeycomb panel antenna with a slot array that can be stowed compactly. The RF (radio frequency) instruments are in the satellite body, and the RF signal is fed to the deployable antenna through non-contacting choke flanges at the deployable hinges. This paper describes the development strategy and the present development status of the small spaceborne SAR based on this architecture.
An enormous number of malware samples pose a major threat to our networked society. Antivirus software and intrusion detection systems are widely deployed on hosts and networks as fundamental countermeasures. However, they may fail to detect evasive malware. It is therefore necessary to give high priority to new varieties of malware in order to conduct in-depth analyses and take preventive measures. In this paper, we present a traffic model for malware that can classify the network behaviors of malware and identify new varieties of malware. Our model comprises malware-specific features and general traffic features extracted from packet traces obtained by dynamic analysis of the malware. We apply clustering analysis to generate a classifier and evaluate the proposed model using large-scale live malware samples. The experimental results demonstrate the effectiveness of our model in finding new varieties of malware.
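One simple way to turn clusters into a classifier that can also flag unfamiliar behavior is a nearest-centroid rule with a distance threshold. This is an assumption, not the paper's classifier. The 2-D features, centroids and threshold are hypothetical. In practice, the centroids would come from clustering the traffic features of known samples.

```python
import numpy as np

def assign_or_flag(x, centroids, threshold):
    """Return (cluster index, False) for known behavior, or (None, True) for a possible new variety."""
    dists = np.linalg.norm(centroids - x, axis=1)
    i = int(np.argmin(dists))
    if dists[i] <= threshold:
        return i, False
    return None, True

# Hypothetical 2-D traffic features, e.g. (distinct destination ports, DNS queries per minute).
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
print(assign_or_flag(np.array([0.5, 0.5]), centroids, threshold=2.0))  # (0, False)
print(assign_or_flag(np.array([5.0, 5.0]), centroids, threshold=2.0))  # (None, True)
```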
Hao GE Feng YANG Xiaoguang TU Mei XIE Zheng MA
Recently, numerous methods have been proposed to tackle the problem of fine-grained image classification. However, few of them focus on the pre-processing step of image alignment. In this paper, we propose a new pre-processing method that aims to reduce the variance of objects within the same class, so that the variance between different classes becomes relatively more significant. The proposed approach consists of four procedures. First, the “parts” of the objects are located. Then, the rotation angle and the bounding box are obtained from the spatial relationship of the “parts”. Finally, all images are resized to similar sizes. After processing by the proposed method, the objects in the images are invariant to translation, scale and rotation. Experiments on the CUB-200-2011 and CUB-200-2010 datasets demonstrate that the proposed method can boost recognition performance when used as a pre-processing step for several popular classification algorithms.
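The geometric part of the alignment can be sketched from two located parts: the angle of the axis between them gives the rotation, and the extent of all parts gives a bounding box to crop and rescale. The choice of two keypoints (for example, a bird's head and tail), the point coordinates, and the 224-pixel target size are all hypothetical. The sketch assumes the parts were already located by some detector. Rotating the image by the negative of the returned angle would bring the axis to horizontal.

```python
import math

def alignment_params(head, tail, parts):
    """Rotation angle (degrees) of the head-to-tail axis and the bounding box of all parts."""
    angle = math.degrees(math.atan2(tail[1] - head[1], tail[0] - head[0]))
    xs = [p[0] for p in parts]
    ys = [p[1] for p in parts]
    return angle, (min(xs), min(ys), max(xs), max(ys))

def resize_scale(bbox, target=224):
    """Scale factor that maps the longer bounding-box side to the target size."""
    width, height = bbox[2] - bbox[0], bbox[3] - bbox[1]
    return target / max(width, height)

head, tail = (0.0, 0.0), (0.0, 10.0)  # hypothetical part locations
angle, bbox = alignment_params(head, tail, [head, tail, (3.0, 5.0)])
print(angle, bbox, resize_scale(bbox))
```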
Bayu Adhi TAMA Kyung-Hyune RHEE
Anomaly detection is one approach used in intrusion detection systems (IDSs); it aims to capture any deviation from the profiles of normal network activity. However, it suffers from a high false alarm rate because it has difficulty distinguishing the boundaries between normal and attack profiles. In this paper, we propose an effective anomaly detection approach that hybridizes three techniques, i.e., particle swarm optimization (PSO), ant colony optimization (ACO), and genetic algorithm (GA), for feature selection, and uses an ensemble of four tree-based classifiers, i.e., random forest (RF), naive Bayes tree (NBT), logistic model trees (LMT), and reduced error pruning tree (REPT), for classification. The proposed approach is evaluated on the NSL-KDD dataset, and the experimental results show that it significantly outperforms existing methods in terms of accuracy and false alarm rate.
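The two evaluation quantities, and one possible way to combine four classifiers, can be sketched as follows. The abstract does not state the ensemble's combination rule. Majority voting with ties resolved toward "attack" is an assumption, and the predictions are hypothetical. The false alarm rate is computed as FP / (FP + TN), with attack as the positive class.

```python
def majority_vote(predictions):
    """predictions[c][i] is classifier c's label for sample i (1 = attack, 0 = normal).
    Ties are resolved toward 'attack' (an assumption)."""
    n_clf = len(predictions)
    return [1 if 2 * sum(col) >= n_clf else 0 for col in zip(*predictions)]

def accuracy_and_far(y_true, y_pred):
    """Accuracy and false alarm rate FAR = FP / (FP + TN), with attack as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    correct = sum(1 for t, p in pairs if t == p)
    far = fp / (fp + tn) if fp + tn else 0.0
    return correct / len(pairs), far

votes = majority_vote([[1, 0], [1, 0], [0, 0], [0, 1]])  # four classifiers, two samples
print(votes)  # [1, 0]
print(accuracy_and_far([0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1, 1, 0]))  # (0.75, 0.25)
```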
This paper investigates the effect of noise added to the hidden units of AutoEncoders linked to multilayer perceptrons. It is shown that an internal representation of learned features emerges and that the sparsity of hidden units increases when independent Gaussian noise is added to the inputs of hidden units during deep network training. It is also shown that the weights connecting the contaminated hidden units to the next layer have smaller values and that the outputs of hidden units tend to be more definite (0 or 1). This automatic structuring induced by the added noise is expected to improve the generalization ability of the network. The network structuring was confirmed by experiments on MNIST digit classification with a deep neural network model.
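The noise injection itself amounts to one extra term in the forward pass. Independent Gaussian noise is added to the hidden units' inputs (pre-activations) during training and omitted at evaluation time. The noise level of 1.0, the sigmoid units, the layer sizes and the noise-free evaluation are assumptions of this sketch. The backward pass and AutoEncoder pretraining are not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_forward(x, w, b, noise_std, rng, training=True):
    """Sigmoid hidden layer; Gaussian noise is added to the pre-activations only when training."""
    pre = x @ w + b
    if training and noise_std > 0:
        pre = pre + rng.normal(0.0, noise_std, size=pre.shape)
    return sigmoid(pre)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))          # batch of 4 inputs with 8 features
w = rng.normal(0, 0.5, size=(8, 3))  # 3 hidden units
b = np.zeros(3)
h_train = hidden_forward(x, w, b, noise_std=1.0, rng=rng)
h_eval = hidden_forward(x, w, b, noise_std=1.0, rng=rng, training=False)
```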
Tadashi MATSUO Nobutaka SHIMADA
Appearance-based generic object recognition is a challenging problem because not all possible appearances of objects can be registered, especially as new objects are produced every day. Object functions, however, have a comparatively small number of prototypes. Therefore, function-based classification of new objects could be a valuable tool for generic object recognition. Object functions are closely related to hand-object interactions during the handling of a functional object, i.e., how the hand approaches the object, which parts of the object contact the hand, and the shape of the hand during the interaction. Hand-object interactions are therefore helpful for modeling object functions. However, it is difficult to assign discrete labels to interactions because object shapes and grasping hand postures intrinsically vary continuously. To describe these interactions, we propose an interaction descriptor space acquired from unlabeled appearances of human hand-object interactions. Using interaction descriptors, we can numerically describe the relation between an object's appearance and its possible interactions with the hand. The model infers the quantitative state of the interaction from the object image alone. It also identifies the parts of objects designed for hand interaction, such as grips and handles. We demonstrate that the proposed method can generate, without supervision, interaction descriptors that form clusters corresponding to interaction types. We also demonstrate that the model can infer possible hand-object interactions.
Haibo YIN Jun-an YANG Wei WANG Hui LIU
Transfer boosting, a branch of instance-based transfer learning, is a commonly adopted transfer learning method. However, currently popular transfer boosting methods focus on binary classification problems, even though many practical tasks are multi-class. In this paper, we develop a new algorithm called MultiTransferBoost, based on TransferBoost, for multi-class classification. MultiTransferBoost first separates the multi-class problem into several orthogonal binary classification problems. In each iteration, MultiTransferBoost boosts weighted instances from different source domains. Each instance's weight is assigned and updated by evaluating how difficult the instance is to classify correctly and the “transferability” of the instance's source domain to the target. The updating process is repeated until the predefined training error or iteration number is reached. The weight update factors, which are analyzed and adjusted to minimize the Hamming loss of the output coding, strengthen the connections among the binary sub-problems in each iteration. Experimental results demonstrate that MultiTransferBoost achieves better classification performance at a lower computational cost than existing instance-based algorithms using the One-Against-One (OAO) strategy.
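Splitting a multi-class problem into binary problems through an output code, and decoding by Hamming distance, can be sketched with rows of a Sylvester Hadamard matrix. These rows are mutually orthogonal, and after dropping the constant first column, any two codewords differ in exactly 4 of 7 bits. Using a Hadamard code is an assumption: the paper's actual coding matrix is not given here, and the TransferBoost weight updates are not shown. Each column defines one binary sub-problem, and a single wrong binary prediction is still decoded correctly.

```python
import numpy as np

def hadamard(size):
    """Sylvester construction; size must be a power of 2."""
    H = np.array([[1]])
    while H.shape[0] < size:
        H = np.block([[H, H], [H, -H]])
    return H

def code_matrix(n_classes, size=8):
    """One 0/1 codeword per class; each column is one binary sub-problem."""
    H = hadamard(size)
    return (H[:n_classes, 1:] > 0).astype(int)

def decode(bits, C):
    """Predict the class whose codeword has the smallest Hamming distance to the bits."""
    return int(np.argmin(np.sum(C != np.asarray(bits), axis=1)))

C = code_matrix(4)
noisy = C[2].copy()
noisy[0] ^= 1            # one binary classifier makes a mistake
print(decode(noisy, C))  # 2
```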
Tomoaki YAMADA Chihiro MATSUI Ken TAKEUCHI
To realize solid-state drives (SSDs) with high performance, low energy consumption and high reliability, a storage class memory (SCM)/multi-level cell (MLC) NAND flash hybrid SSD has been proposed. The algorithms of the hybrid SSD should be designed according to the SCM specifications and workload characteristics. In this paper, SCMs are used as a non-volatile cache. Cache operation guidelines and optimal SCM specifications for the hybrid SSD are provided for various workload characteristics. Three kinds of non-volatile cache operation for the hybrid SSD are discussed: i) write cache, ii) read-write cache without space control (RW cache) and iii) read-write cache with space control (RW cache w/ SC). SSD workloads are categorized into eight classes according to read/write ratio, access frequency and access data size. The evaluation results show that, to achieve the highest performance of the hybrid SSD, the write cache algorithm is suitable for write-intensive and read-cold-sequential workloads, while the RW cache algorithm is suitable for read-cold-random workloads. For read-hot-random workloads, the write cache is appropriate when the SCM capacity is less than 3% of the NAND flash capacity, whereas the RW cache should be used when the SCM capacity is more than 5% of the NAND flash capacity. The effect of Memory-type SCM (M-SCM) and Storage-type SCM (S-SCM) on hybrid SSD performance is also analyzed. The M-SCM latency is below 1 µs (high speed), but its capacity is only 2% of the NAND flash capacity (small capacity). The S-SCM capacity, on the other hand, is assumed to be 5% of the NAND flash capacity (large capacity), but its latency exceeds 1 µs (low speed). If the additional SCM cost is limited to 20% of the MLC NAND flash cost, performance improvements of up to 7 times and 8 times are achieved for write-hot-random and read-hot-random workloads, respectively.
Moreover, if the additional SCM cost equals the MLC NAND flash cost, the M-SCM/MLC NAND flash hybrid SSD achieves a 24-times performance improvement.
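The cache-selection guidelines can be summarized as a small decision function. This sketch encodes only the guidelines stated above. The workload names are passed as plain strings, and the function returns `None` where no recommendation is given, that is, for the other workload categories and for read-hot-random workloads with SCM capacity between 3% and 5% of NAND capacity.

```python
def choose_cache(workload, scm_ratio):
    """Cache policy from the stated guidelines; scm_ratio = SCM capacity / NAND flash capacity.
    Returns None where no recommendation is stated."""
    if workload in ("write-intensive", "read-cold-sequential"):
        return "write cache"
    if workload == "read-cold-random":
        return "RW cache"
    if workload == "read-hot-random":
        if scm_ratio < 0.03:
            return "write cache"
        if scm_ratio > 0.05:
            return "RW cache"
    return None

print(choose_cache("read-hot-random", 0.02))  # write cache
print(choose_cache("read-hot-random", 0.10))  # RW cache
print(choose_cache("read-hot-random", 0.04))  # None
```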
The quality of the codebook is very important in visual image classification. To boost classification performance, this paper presents a codebook generation scheme for scene image recognition based on parallel key SIFT analysis (PKSA). The method iteratively applies the classical k-means clustering algorithm and similarity analysis to identify key SIFT descriptors (KSDs) in the input images. It then generates the codebook from the set of KSDs using a relaxed k-means algorithm. To evaluate the performance of the PKSA scheme, the image feature vector is computed by sparse coding with Spatial Pyramid Matching (ScSPM) after the codebook is constructed. The PKSA-based ScSPM method is tested and compared on three public scene image datasets. The experimental results show that the proposed PKSA scheme significantly reduces computational time and improves the categorization rate.
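For orientation, a baseline codebook pipeline is: run k-means on descriptors, use the centroids as visual words, and encode each image by the words its descriptors fall into. This sketch uses plain Lloyd's k-means and a hard-assignment histogram. It is not PKSA's key-descriptor selection, its relaxed k-means, or ScSPM's sparse coding. The 2-D points are toy stand-ins for 128-D SIFT descriptors.

```python
import numpy as np

def nearest(X, centroids):
    """Index of the closest centroid for every row of X."""
    return np.argmin(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1), axis=1)

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns the (k, d) codebook of centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = nearest(X, centroids)
        updated = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids

def bow_histogram(X, centroids):
    """Hard-assignment histogram of visual words (simpler than ScSPM's sparse codes)."""
    return np.bincount(nearest(X, centroids), minlength=len(centroids)) / len(X)

square = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
descriptors = np.vstack([square, square + 10.0])  # toy stand-ins for SIFT descriptors
codebook = kmeans(descriptors, k=2)
codebook = codebook[np.argsort(codebook[:, 0])]
print(codebook)                              # approximately [[0.5, 0.5], [10.5, 10.5]]
print(bow_histogram(descriptors, codebook))  # [0.5 0.5]
```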
Yuyin YU Lishan KE Zhiqiang LIN Qiuyan WANG
Permutation polynomials over $\mathbb{Z}_{p^n}$ are useful in the design of cryptographic algorithms. In this paper, we obtain an equivalent condition for polynomial functions over $\mathbb{Z}_{p^n}$ to be permutations; this condition helps us analyze the randomness of such functions. Our results provide a method to distinguish permutation polynomials from random functions. We also describe how to improve the randomness of permutation polynomials over $\mathbb{Z}_{p^n}$.
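Whether a polynomial permutes $\mathbb{Z}_{p^n}$ can be checked directly by evaluating it at all $p^n$ residues. For comparison, the sketch also implements a well-known classical criterion obtained by Hensel lifting: for $n \ge 2$, $f$ permutes $\mathbb{Z}_{p^n}$ exactly when it permutes $\mathbb{Z}_p$ and $f'(x) \not\equiv 0 \pmod p$ for every $x$. This classical criterion is not necessarily the equivalent condition derived in the paper. Coefficients are listed lowest degree first, a convention of this sketch.

```python
def poly_eval(coeffs, x, m):
    """Evaluate sum(coeffs[i] * x**i) mod m using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % m
    return acc

def is_permutation_bruteforce(coeffs, p, n):
    """Check directly that x -> f(x) is a bijection on Z_{p^n}."""
    m = p ** n
    return len({poly_eval(coeffs, x, m) for x in range(m)}) == m

def derivative(coeffs):
    """Coefficients of the formal derivative f'."""
    return [i * c for i, c in enumerate(coeffs)][1:]

def is_permutation_classical(coeffs, p, n):
    """Classical criterion: f permutes Z_p and, for n >= 2, f'(x) != 0 mod p for every x."""
    if not is_permutation_bruteforce(coeffs, p, 1):
        return False
    if n == 1:
        return True
    d = derivative(coeffs)
    return all(poly_eval(d, x, p) != 0 for x in range(p))

print(is_permutation_bruteforce([0, 1, 2], 2, 3))     # x + 2x^2 over Z_8: True
print(is_permutation_bruteforce([0, 0, 0, 1], 3, 2))  # x^3 over Z_9: False
```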
Shangqi ZHANG Haihong SHEN Chunlei HUO
Building detection from high-resolution remote sensing images is challenging due to high intra-class variability and the difficulty of describing buildings. To address these difficulties, a novel approach is proposed that combines shape-specific feature extraction with discriminative feature classification. Shape-specific features can capture the complex shapes and structures of buildings. Discriminative feature classification is effective in reflecting similarities among buildings and differences between buildings and backgrounds. Experiments demonstrate the effectiveness of the proposed approach.
Chang XU Yingguan WANG Yunlong ZHAN
This paper focuses on the development of a single portable roadside magnetic sensor for vehicle classification. The magnetic sensor is an anisotropic magnetic device that does not need to be embedded in the roadway; it is placed next to the roadway and measures traffic in the immediately adjacent lane. A novel feature extraction and comparison approach is presented for vehicle classification with a single magnetic sensor, based on four different feature sets extracted from the detected magnetic signal. Vehicle classification is then performed with three common classification algorithms: support vector machine, k-nearest neighbors and back-propagation neural network. Experimental results demonstrate that the Peak-Peak feature set combined with the back-propagation neural network performs much better than the other approaches. In addition, the normalization technique is shown to be effective.
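A peak-to-peak feature and a normalized nearest-neighbor classifier can be sketched in a few lines. Several details are assumptions of this sketch: the Peak-Peak feature is taken as the per-axis difference between the maximum and minimum of a 3-axis signal, normalization is min-max scaling fitted on training features, and the classifier is 1-nearest-neighbor (k-NN with k = 1) rather than the back-propagation network. The training features and labels are hypothetical.

```python
import numpy as np

def peak_peak(signal):
    """Peak-to-peak amplitude of each axis of a (T, 3) magnetic signal."""
    s = np.asarray(signal, dtype=float)
    return s.max(axis=0) - s.min(axis=0)

def fit_minmax(features):
    """Per-feature minimum and range from training data (zero ranges replaced by 1)."""
    lo, hi = features.min(axis=0), features.max(axis=0)
    return lo, np.where(hi > lo, hi - lo, 1.0)

def nearest_neighbor(f, train_features, train_labels, lo, span):
    """1-nearest-neighbor classification on min-max normalized features."""
    z_train = (train_features - lo) / span
    z = (f - lo) / span
    return train_labels[int(np.argmin(np.linalg.norm(z_train - z, axis=1)))]

train_features = np.array([[1.0, 1.0, 1.0], [10.0, 12.0, 8.0]])  # hypothetical car / truck
train_labels = ["car", "truck"]
lo, span = fit_minmax(train_features)
test_signal = [[0.0, 0.0, 0.0], [2.0, 1.0, 1.0]]
print(nearest_neighbor(peak_peak(test_signal), train_features, train_labels, lo, span))  # car
```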