Chun-Jung WU Shin-Ying HUANG Katsunari YOSHIOKA Tsutomu MATSUMOTO
A drastic increase in cyberattacks targeting Internet of Things (IoT) devices via the telnet protocol has been observed. IoT malware continues to evolve, and the diversity of operating systems and environments makes it difficult to execute malware samples in an observation environment. To address this problem, we sought an alternative means of investigation that uses the telnet logs of IoT honeypots and analyzes malware without executing it. In this paper, we present a malware classification method based on malware binaries, command sequences, and meta-features. We employ both unsupervised and supervised learning algorithms, as well as text-mining algorithms for handling unstructured data. Clustering analysis is applied to find malware family members and to reveal their inherent features for better explanation. First, the malware binaries are grouped using similarity analysis. Then, we extract key patterns of interaction behavior using an N-gram model. We also train a multiclass classifier to identify IoT malware categories based on common infection behavior. For misclassified subclasses, second-stage sub-training is performed using a file meta-feature. Our results demonstrate 96.70% accuracy, with high precision and recall. The clustering results reveal variants of attack vectors and one denial-of-service (DoS) attack that used only standard Linux commands.
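The N-gram step can be illustrated with a minimal sketch. It assumes each honeypot session has already been parsed into an ordered list of shell commands. The helper names (`command_ngrams`, `top_patterns`) and the sample sessions are hypothetical, and the paper's actual tokenization and choice of N may differ.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def command_ngrams(commands: List[str], n: int = 2) -> Counter:
    """Count contiguous n-grams of commands observed in one telnet session."""
    return Counter(tuple(commands[i:i + n]) for i in range(len(commands) - n + 1))

def top_patterns(sessions: Iterable[List[str]], n: int = 2, k: int = 3) -> List[Tuple[tuple, int]]:
    """Aggregate n-gram counts over many sessions and return the k most frequent."""
    total: Counter = Counter()
    for commands in sessions:
        total.update(command_ngrams(commands, n))
    return total.most_common(k)

# Hypothetical honeypot sessions (illustrative only, not data from the paper).
sessions = [
    ["enable", "system", "shell", "sh", "cat /proc/mounts"],
    ["enable", "system", "shell", "sh", "/bin/busybox wget"],
]
print(top_patterns(sessions))
```

Frequent n-grams shared across sessions, such as the login-to-shell prefix here, are the kind of interaction pattern that could then feed a classifier.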
Haitong YANG Guangyou ZHOU Tingting HE Maoxi LI
In this paper, we study domain adaptation of semantic role classification. Most systems use supervised methods for semantic role classification, but these methods often suffer severe performance drops on out-of-domain test data. These drops are caused by large feature differences between the source and target domains. This paper proposes a framework called Adversarial Domain Adaption Network (ADAN) to address domain adaptation in semantic role classification. The idea behind our method is that the proposed framework can derive domain-invariant features via adversarial learning and narrow the gap between the source and target feature spaces. To evaluate our method, we conduct experiments on the English portion of the CoNLL 2009 shared task. Experimental results show that our method can largely reduce the performance drop on out-of-domain test data.
Xinbo REN Haiyuan WU Qian CHEN Toshiyuki IMAI Takashi KUBO Takashi AKASAKA
Clinical research shows that the morbidity of coronary artery disease (CAD) is gradually increasing in many countries every year, and CAD causes hundreds of thousands of deaths worldwide each year. As optical coherence tomography (OCT), with its high resolution and good contrast, is applied to the investigation of lesion tissue in human vessels, many more micro-structures of the vessel become clearly visible to doctors, which helps improve CAD treatment. Manual qualitative analysis and classification of vessel lesion tissue are time-consuming for doctors because a single intravascular optical coherence tomography (IVOCT) data set of a patient usually contains hundreds of in-vivo vessel images. To overcome this problem, we focus on the superficial layer of the lesion region and propose a model based on local multi-layer regions for characterizing and extracting features of vessel lesion components (lipid, fibrous, and calcified plaque). At the pre-processing stage, we applied two novel automatic methods to remove the catheter and the guide-wire, respectively. Based on the detected lumen boundary, a multi-layer model was built in the proximity lumen boundary region (PLBR). In the multi-layer model, features extracted from the A-line sub-region (ALSR) of each layer were used to characterize the type of tissue in the ALSR. We used 7 human datasets containing a total of 490 OCT images to assess our tissue classification method. Validation was performed by comparing manual assessment with the automatic results derived by our method. The proposed automatic tissue classification method achieved average accuracies of 89.53%, 93.81%, and 91.78% for fibrous, calcified, and lipid plaque, respectively.
Naranchimeg BOLD Chao ZHANG Takuya AKASHI
In the past decade, many state-of-the-art algorithms for image classification as well as audio classification have achieved notable success with the development of deep convolutional neural networks (CNNs). However, most works exploit only a single type of training data. In this paper, we present a study on classifying bird species by exploiting the combination of visual (image) and audio (sound) data using CNNs, which has rarely been addressed so far. Specifically, we propose CNN-based multimodal learning models with three fusion strategies (early, middle, and late) to address the issue of combining training data across domains. The advantage of our proposed method lies in the fact that CNNs can be used not only to extract features from image and audio (spectrogram) data but also to combine the features across modalities. In the experiment, we train and evaluate the network structure on the standard CUB-200-2011 data set combined with an audio data set that we collected for the same species. We observe that a model that uses both types of data outperforms models trained with only one type. We also show that transfer learning can significantly increase classification performance.
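Two of the three fusion points can be sketched at the level of model outputs and feature vectors. The sketch assumes each modality's CNN already produces class logits or feature vectors. The function names and the equal weighting in late fusion are illustrative assumptions, not the paper's configuration. Early fusion, which would combine the raw image and spectrogram inputs, is omitted because it depends on how the inputs are resized to a common shape.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def late_fusion(image_logits: List[float], audio_logits: List[float],
                w_image: float = 0.5) -> List[float]:
    """Late fusion: weighted average of per-modality class probabilities."""
    p_img, p_aud = softmax(image_logits), softmax(audio_logits)
    return [w_image * a + (1.0 - w_image) * b for a, b in zip(p_img, p_aud)]

def middle_fusion(image_features: List[float], audio_features: List[float]) -> List[float]:
    """Middle fusion: concatenate intermediate features before a shared classifier head."""
    return list(image_features) + list(audio_features)

# Hypothetical logits for three species from an image CNN and a spectrogram CNN.
fused = late_fusion([2.0, 0.5, 0.1], [0.2, 1.5, 0.1])
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Late fusion keeps the two networks fully separate until the decision. Middle fusion lets a shared layer learn cross-modal interactions from the concatenated features.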
Tao BAN Ryoichi ISAWA Shin-Ying HUANG Katsunari YOSHIOKA Daisuke INOUE
Along with the proliferation of IoT (Internet of Things) devices, cyberattacks targeting them are on the rise. In this paper, aiming at efficient precaution against and mitigation of emerging IoT cyberthreats, we present a multimodal study on applying machine learning methods to characterize malicious programs that target multiple IoT platforms. Experiments show that opcode sequences obtained by static analysis and API sequences obtained by dynamic analysis provide enough discriminative information for IoT malware to be classified with near-optimal accuracy. The findings reported in this study can enable automated and accelerated identification and mitigation of new IoT cyberthreats.
We propose a class-incremental learning framework for human activity recognition based on the Bag-of-Sequencelets (BoS) model. The framework updates learned models efficiently, without relearning them, when training data for new classes are added. In this framework, any type of video feature can be used, including hand-crafted features, Convolutional Neural Network (CNN)-based features, and combinations of these. Compared with the original BoS, the new framework greatly reduces learning time with little loss of classification accuracy.
Takashi HARADA Yuki ISHIKAWA Ken TANAKA Kenji MIKAWA
Packet classification is the problem of determining the behavior of incoming packets at network devices. The processing latency of packet classification by linear search is proportional to the number of classification rules. To keep the latency caused by classification below a certain level, we need an algorithm that classifies packets in a time independent of the number of classification rules. Arbitrary (including noncontiguous) bitmask rules are expressive enough to efficiently control higher-layer communication, for example to realize access control lists and Quality of Service. In this paper, we propose a classification algorithm for arbitrary bitmask rules based on the run-based trie [1]. The space complexity of the proposed algorithm is linear in the size of the rule list. Its time complexity, excluding construction, can be regarded as constant, independent of the number of rules. Experimental results using a packet classification algorithm benchmark [2] show that our method classifies packets in constant time, independent of the number of rules.
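To make the rule semantics concrete, here is a small sketch of arbitrary bitmask matching using the linear-search baseline described above, not the proposed run-based trie. The `(value, mask, action)` rule layout and first-match priority are assumptions for illustration.

```python
from typing import List, Optional, Tuple

# A rule is (value, mask, action). A header h matches when (h & mask) == (value & mask),
# so the mask may select any, possibly noncontiguous, set of bits.
Rule = Tuple[int, int, str]

def matches(header: int, rule: Rule) -> bool:
    value, mask, _ = rule
    return (header & mask) == (value & mask)

def classify_linear(header: int, rules: List[Rule]) -> Optional[str]:
    """Baseline linear search: return the action of the first matching rule.
    Its latency grows with len(rules), which is the dependence the trie avoids."""
    for rule in rules:
        if matches(header, rule):
            return rule[2]
    return None

rules = [
    (0b1010_0000, 0b1010_0000, "deny"),    # bits 7 and 5 set (noncontiguous mask)
    (0b0000_0000, 0b0000_0000, "permit"),  # wildcard default rule
]
```

For example, a header `0b1011_1111` hits the first rule, while `0b1001_1111` (bit 5 clear) falls through to the default.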
Asera WAYNE ASERA Masayoshi ARITSUGI
In this research, we propose a novel fingerprint liveness detection method that improves the discriminative behavior and classification accuracy of combined features. The approach detects whether a fingerprint comes from a live or a fake source. In this approach, fingerprint images are analyzed in the differential excitation (DE) component and the centralized binary pattern (CBP) component, which yield the DE image and the CBP image, respectively. The resulting images are used to generate a two-dimensional histogram that is subsequently used as a feature vector. To decide whether a fingerprint image is from a live or fake source, the feature vector is processed using support vector machine (SVM) classifiers. To evaluate the performance of the proposed method and compare it with existing approaches, we conducted experiments using the datasets from the 2011 and 2015 Liveness Detection Competitions (LivDet), collected from four sensors. The results show that the proposed method gives comparable or even better results and further indicate that methods based on a combination of features perform better than existing methods.
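Here is a minimal sketch of a DE/CBP joint histogram computed on a grayscale image given as nested lists. It assumes the Weber local descriptor definition of differential excitation and a common 8-neighbour CBP formulation with threshold `tau`. The bin count, `tau`, and the function names are hypothetical choices, and the paper's exact definitions may differ.

```python
import math
from collections import Counter
from typing import List

# 8-neighbourhood (radius 1) in circular order, so indices i and i + 4 are opposite pixels.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
N_CBP_CODES = 32  # 4 neighbour-pair bits + 1 centre bit

def differential_excitation(center: float, neighbours: List[float], eps: float = 1e-6) -> float:
    """Weber-style differential excitation: arctan(sum(x_i - x_c) / x_c), in (-pi/2, pi/2)."""
    return math.atan(sum(x - center for x in neighbours) / (center + eps))

def cbp_code(center: float, neighbours: List[float], tau: float = 1.0) -> int:
    """Centralized binary pattern (assumed formulation): threshold the 4 differences
    between opposite neighbours, plus the centre versus the local mean."""
    def s(x: float) -> int:
        return 1 if abs(x) >= tau else 0
    code = 0
    for i in range(4):
        code |= s(neighbours[i] - neighbours[i + 4]) << i
    local_mean = (center + sum(neighbours)) / 9.0
    return code | (s(center - local_mean) << 4)

def de_cbp_feature(img: List[List[float]], de_bins: int = 8) -> List[int]:
    """Joint 2-D histogram over (quantized DE, CBP code), flattened to a vector."""
    hist: Counter = Counter()
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            nb = [img[r + dr][c + dc] for dr, dc in OFFSETS]
            de = differential_excitation(img[r][c], nb)
            # Map DE from (-pi/2, pi/2) onto bin indices 0 .. de_bins - 1.
            b = min(de_bins - 1, int((de + math.pi / 2) / math.pi * de_bins))
            hist[(b, cbp_code(img[r][c], nb))] += 1
    return [hist[(b, k)] for b in range(de_bins) for k in range(N_CBP_CODES)]
```

The flattened vector (here 8 x 32 = 256 entries) is the kind of input an SVM would then classify as live or fake.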
Tsuneo KATO Atsushi NAGAI Naoki NODA Jianming WU Seiichi YAMAMOTO
Data-driven untying of a recursive autoencoder (RAE) is proposed for utterance intent classification in spoken dialogue systems. In spoken language understanding (SLU) for spoken dialogue systems, an RAE applies a single nonlinear operation to two neighboring child nodes in a parse tree. However, the appropriate operation is considered to be intrinsically different depending on the types of the child nodes. To reduce the gap between the single nonlinear operation of an RAE and these intrinsically different operations, we propose a data-driven untying of autoencoders using part-of-speech (PoS) tags at the leaf nodes. Experimental results on two corpora, the ATIS English data set and a Japanese data set from a smartphone-based spoken dialogue system, showed that the proposed method improves accuracy compared with the tied RAE. They also revealed a reasonable difference in untying between the two languages.
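The difference between a tied and an untied composition step can be sketched as follows. The sketch assumes the standard RAE encoder p = tanh(W[c1; c2] + b). The mapping from PoS-tag pairs to weight sets is a hand-written hypothetical dictionary, whereas the paper derives its untying from data.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Params = Tuple[List[Vector], Vector]  # (W, b) with W of shape d x 2d

def compose(params: Params, c1: Vector, c2: Vector) -> Vector:
    """One RAE encoder step: p = tanh(W [c1; c2] + b)."""
    W, b = params
    x = c1 + c2  # concatenation of the two child vectors
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]

def compose_untied(untied: Dict[Tuple[str, str], Params], tied: Params,
                   tag1: str, tag2: str, c1: Vector, c2: Vector) -> Vector:
    """Untied step (sketch): use the weights assigned to the children's PoS-tag pair,
    falling back to the shared (tied) weights for pairs without their own set."""
    return compose(untied.get((tag1, tag2), tied), c1, c2)

# Hypothetical toy parameters for d = 2.
tied = ([[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]], [0.0, 0.0])
untied = {("ADJ", "NOUN"): ([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]], [0.0, 0.0])}
c1, c2 = [0.2, 0.4], [0.6, -0.2]
```

With untying, an ADJ-NOUN merge and a DET-NOUN merge of the same child vectors can produce different parent vectors.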
Sukhumarn ARCHASANTISUK Takahiro AOYAGI
Communication reliability and energy efficiency are important issues that must be carefully considered in wireless body area network (WBAN) design. Because of the large path loss variation of the WBAN channel, this paper considers transmission power control, which adaptively adjusts the radio transmit power to suit the channel condition. Human motion is one of the dominant factors affecting the channel characteristics in WBAN. Therefore, this paper introduces motion-aware temporal correlation model-based transmission power control, which combines human motion classification and transmission power control to provide an effective approach to reliable and energy-efficient WBAN communication. The human motion classification adopted in this study uses only the received signal strength to identify human motion, so no additional equipment is required. The knowledge of human motion is then used to accurately estimate the channel condition and to select a suitable transmit power. A performance evaluation shows that the proposed method works well under both low and high network loads. Compared with a fixed Tx power of -5 dBm, the proposed method achieved a similar packet loss rate but 20-28% and 27-33% lower average energy consumption for the low and high network traffic cases, respectively.
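The control loop can be sketched in three steps: classify motion from RSS alone, predict the channel, and pick the lowest adequate power. Every number here is hypothetical: the power levels, the per-motion correlation coefficients `RHO`, the standard-deviation thresholds, the receiver sensitivity, and the margin. The first-order predictor is only a stand-in for the paper's temporal correlation model.

```python
import statistics
from typing import List

# Hypothetical radio settings; none of these values come from the paper.
TX_LEVELS_DBM = [-25, -20, -15, -10, -5, 0]
RHO = {"still": 0.9, "walking": 0.6, "running": 0.3}  # per-motion temporal correlation

def classify_motion(rss_window: List[float]) -> str:
    """Toy RSS-only motion classifier: larger RSS fluctuation means more vigorous motion."""
    sd = statistics.pstdev(rss_window)
    if sd < 1.0:
        return "still"
    return "walking" if sd < 4.0 else "running"

def predict_path_loss(pl_history: List[float], motion: str) -> float:
    """First-order temporal correlation: pull the latest path loss toward the window mean,
    trusting the latest sample more when the motion class implies a slowly varying channel."""
    mean = statistics.fmean(pl_history)
    return mean + RHO[motion] * (pl_history[-1] - mean)

def select_tx_power(pl_history: List[float], rss_window: List[float],
                    sensitivity_dbm: float = -90.0, margin_db: float = 3.0) -> int:
    """Lowest power level whose predicted received power clears sensitivity + margin."""
    pl_db = predict_path_loss(pl_history, classify_motion(rss_window))
    for p in TX_LEVELS_DBM:
        if p - pl_db >= sensitivity_dbm + margin_db:
            return p
    return TX_LEVELS_DBM[-1]  # link cannot be closed: fall back to maximum power
```

Choosing the lowest sufficient level, rather than a fixed one, is where the energy savings over a constant Tx power would come from.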
Abu Hena Al MUKTADIR Takaya MIYAZAWA Pedro MARTINEZ-JULIA Hiroaki HARAI Ved P. KAFLE
In this paper, we propose a method for automatic virtual resource allocation using a multi-target classification-based scheme (MTCAS). In our method, an Infrastructure Provider (InP) bundles its CPU, memory, storage, and bandwidth resources as Network Elements (NEs) and categorizes them into several types according to their function, capabilities, location, energy consumption, price, etc. The InP uses MTCAS to optimally allocate a set of NEs to a Virtual Network Operator (VNO). This allocation is subject to constraints such as avoiding resource over-allocation and satisfying multiple Quality of Service (QoS) metrics. To achieve comparable or higher prediction accuracy with less training time than existing ensemble-based multi-target classification (MTC) algorithms, we propose a majority-voting-based ensemble algorithm (MVEN) for MTCAS. We numerically evaluate the performance of MTCAS using MVEN and existing MTC algorithms with synthetic training datasets. The results indicate that MVEN requires 70% less training time while achieving the same accuracy as the related ensemble-based MTC algorithms. The results also demonstrate that increasing the amount of training data increases the efficacy of MTCAS, reducing CPU and memory allocation by about 33% and 51%, respectively.
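Majority voting across an ensemble, applied independently to each target, can be sketched as follows. The base models are abstracted away as lists of predicted labels, and the target and class names in the usage are hypothetical. MVEN's actual base learners and tie-breaking rule are not specified here.

```python
from collections import Counter
from typing import List, Sequence

def majority_vote_multitarget(predictions: Sequence[Sequence[str]]) -> List[str]:
    """predictions[m][t] is base model m's predicted class for target t.
    For each target, return the class predicted by most models; ties go to the
    class that appears first in model order."""
    n_targets = len(predictions[0])
    return [Counter(p[t] for p in predictions).most_common(1)[0][0] for t in range(n_targets)]

# Three hypothetical base models, three targets (CPU, memory, storage NE types).
preds = [
    ["cpu_small", "mem_large", "sto_a"],
    ["cpu_small", "mem_medium", "sto_a"],
    ["cpu_large", "mem_large", "sto_b"],
]
print(majority_vote_multitarget(preds))  # ['cpu_small', 'mem_large', 'sto_a']
```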
Eeva-Sofia HAUKIPURO Ville KOLEHMAINEN Janne MYLLÄRINEN Sebastian REMANDER Janne SALO Tuomas TAKKO Le Ngu NGUYEN Stephan SIGG Rainhard Dieter FINDLING
Biometric authentication, that is, using biometric features for authentication, has been gaining popularity in recent years as more modalities, such as fingerprint, iris, face, voice, gait, and others, are exploited. We explore the effectiveness of three simple electroencephalography (EEG)-based biometric authentication tasks: resting, thinking about a picture, and moving a single finger. We present details of the data processing steps we use for authentication, including extracting features from the frequency power spectrum and MFCCs, and training a multilayer perceptron classifier for authentication. For evaluation, we record an EEG dataset from 27 test subjects. We use three setups, baseline, task-agnostic, and task-specific, to investigate whether person-specific features can be detected across different tasks for authentication. We further evaluate whether different tasks can be distinguished. Our results suggest that the tasks are distinguishable and that our authentication approach works both with features from a specific, fixed task and with features across different tasks.
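Frequency power-spectrum features of the kind mentioned above can be computed per EEG channel by summing spectral power within frequency bands. This sketch uses a naive DFT to stay dependency-free. The band edges are conventional EEG bands, and the sampling rate in the usage is an assumption, not the paper's recording setup. MFCC extraction and the multilayer perceptron are omitted.

```python
import cmath
import math
from typing import Dict, List, Tuple

BANDS: Dict[str, Tuple[float, float]] = {
    "theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0),
}

def power_spectrum(x: List[float]) -> List[float]:
    """Naive DFT power |X_k|^2 for k = 0..N/2 (O(N^2); fine for a short sketch)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def band_powers(x: List[float], fs: float,
                bands: Dict[str, Tuple[float, float]] = BANDS) -> Dict[str, float]:
    """Sum spectral power inside each band; bin k corresponds to frequency k * fs / N."""
    spec = power_spectrum(x)
    n = len(x)
    return {name: sum(p for k, p in enumerate(spec) if lo <= k * fs / n < hi)
            for name, (lo, hi) in bands.items()}

# Usage with a synthetic 10 Hz signal, assuming fs = 128 Hz and a 1 s window.
fs = 128.0
x = [math.sin(2 * math.pi * 10 * t / fs) for t in range(128)]
bp = band_powers(x, fs)
```

The per-band values from each channel could then be concatenated into one feature vector per recording window.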
As the data size of Web-related multi-label classification problems continues to increase, the label space has also grown extremely large. For example, the number of labels in Web page tagging and E-commerce recommendation tasks reaches hundreds of thousands or even millions. In this paper, we propose a graph partitioning tree (GPT), a novel approach for extreme multi-label learning. At each internal node of the tree, the GPT learns a linear separator to partition the feature space, considering an approximate k-nearest neighbor graph of the label vectors. We also develop a simple sequential optimization procedure for learning the linear binary classifiers. Extensive experiments on large-scale real-world data sets showed that our method achieves better prediction accuracy than state-of-the-art tree-based methods while maintaining fast prediction.
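The label-side input to each split is a k-nearest neighbor graph over the training instances' label vectors, and it can be sketched by brute force. The sketch assumes Jaccard similarity between label sets, which is an illustrative choice, and builds an exact graph rather than the approximate one mentioned above. `cut_edges` is a hypothetical score that shows what a graph-aware partition would try to keep small.

```python
from typing import Dict, List, Sequence, Set

def jaccard(a: Set[int], b: Set[int]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def knn_label_graph(label_sets: List[Set[int]], k: int) -> Dict[int, List[int]]:
    """Brute-force k-NN graph: each instance links to the k instances whose label
    sets are most similar (ties broken by index)."""
    graph: Dict[int, List[int]] = {}
    for i, li in enumerate(label_sets):
        sims = sorted(((jaccard(li, lj), j) for j, lj in enumerate(label_sets) if j != i),
                      key=lambda s: (-s[0], s[1]))
        graph[i] = [j for _, j in sims[:k]]
    return graph

def cut_edges(graph: Dict[int, List[int]], side: Sequence[int]) -> int:
    """Count k-NN edges whose endpoints fall on different sides of a binary split."""
    return sum(1 for i, nbrs in graph.items() for j in nbrs if side[i] != side[j])

label_sets = [{1, 2}, {1, 2, 3}, {7, 8}, {7, 9}]
g = knn_label_graph(label_sets, k=1)
```

A split that sends instances 0 and 1 to one child and 2 and 3 to the other cuts no edges, so instances with similar labels stay together.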
Song BIAN Masayuki HIROMOTO Takashi SATO
In this work, we provide the first practical secure email filtering scheme based on homomorphic encryption. Specifically, we construct a secure naïve Bayesian filter (SNBF) using the Paillier scheme, a partially homomorphic encryption (PHE) scheme. We first show that SNBF can be implemented with only additive homomorphism, thus eliminating the need for expensive fully homomorphic schemes. In addition, we explore the design space of a specialized hardware architecture realizing SNBF. We use a recursive Karatsuba Montgomery structure to accelerate the homomorphic operations, in which multiplications of 2048-bit integers are carried out. In our experiments, both software and hardware versions of SNBF are implemented. In software, SNBF achieves a 10^4-10^5x runtime reduction and a 10^3x storage reduction compared with existing fully homomorphic approaches. Instantiating the designed hardware for SNBF achieves a further 33x runtime reduction and 1919x power reduction. The proposed hardware implementation classifies an average-length email in under 0.5 s, which is much more practical than existing solutions.
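The additive homomorphism that SNBF relies on can be demonstrated with a toy Paillier implementation. This sketch uses tiny primes, which are completely insecure and unlike the 2048-bit setting above, and the common choice g = n + 1. The scoring flow is hypothetical: the client encrypts 0/1 word indicators and the server applies integer weights (for example, scaled log-likelihood ratios). It only shows why no ciphertext-ciphertext multiplication is needed.

```python
import math
import random

def keygen(p: int, q: int):
    """Toy Paillier key generation with g = n + 1. Tiny primes: NOT secure."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, mu is simply lam^-1 mod n
    return n, (lam, mu)

def encrypt(n: int, m: int) -> int:
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n: int, sk, c: int) -> int:
    lam, mu = sk
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n

def add(n: int, c1: int, c2: int) -> int:
    return (c1 * c2) % (n * n)  # Enc(m1) * Enc(m2) = Enc(m1 + m2)

def scale(n: int, c: int, k: int) -> int:
    return pow(c, k, n * n)  # Enc(m)^k = Enc(k * m) for a plaintext integer k >= 0

def encrypted_score(n: int, enc_bits, weights) -> int:
    """Server side: Enc(sum_i w_i * x_i) from encrypted 0/1 indicators x_i and
    plaintext integer weights w_i, using only the additive homomorphism."""
    acc = encrypt(n, 0)
    for c, w in zip(enc_bits, weights):
        acc = add(n, acc, scale(n, c, w % n))  # negative weights are encoded mod n
    return acc

def to_signed(n: int, m: int) -> int:
    return m - n if m > n // 2 else m

n, sk = keygen(1009, 1013)
enc = [encrypt(n, x) for x in [1, 0, 1, 1]]  # client encrypts word indicators
score = to_signed(n, decrypt(n, sk, encrypted_score(n, enc, [120, -40, -15, 300])))
print(score)  # 405
```

The decrypted value is the weighted sum 120 - 15 + 300, which a filter would compare with a threshold to make the spam decision.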
Extreme multi-label classification methods have been widely used in Web-scale classification tasks such as Web page tagging and product recommendation. In this paper, we present a novel graph embedding method called “AnnexML”. At the training step, AnnexML constructs a k-nearest neighbor graph of label vectors and attempts to reproduce the graph structure in the embedding space. Prediction is performed efficiently by an approximate nearest neighbor search method that explores the learned k-nearest neighbor graph in the embedding space. We conducted evaluations on several large-scale real-world data sets and compared our method with recent state-of-the-art methods. Experimental results show that AnnexML can significantly improve prediction accuracy, especially on data sets with larger label spaces. In addition, AnnexML improves the trade-off between prediction time and accuracy. At the same level of accuracy, AnnexML's prediction was up to 58 times faster than that of SLEEC, a state-of-the-art embedding-based method.
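Prediction-time search over a proximity graph can be sketched as a greedy walk. This is a generic greedy nearest-neighbour search. It assumes the embeddings are small float vectors and the graph is an adjacency dictionary, and AnnexML's actual search procedure may differ in detail.

```python
import math
from typing import Dict, List, Sequence

def greedy_graph_search(query: Sequence[float], points: List[Sequence[float]],
                        graph: Dict[int, List[int]], start: int) -> int:
    """Greedy approximate nearest-neighbour search on a proximity graph: move to the
    neighbour closest to `query` until no neighbour is closer (a local minimum)."""
    current = start
    while True:
        nxt = min(graph[current], key=lambda j: math.dist(query, points[j]))
        if math.dist(query, points[nxt]) >= math.dist(query, points[current]):
            return current
        current = nxt

# Toy 1-D embeddings linked as a chain 0-1-2-3-4.
points = [[float(i)] for i in range(5)]
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

A prediction would then aggregate the label vectors of the nearest training points found this way.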
In this letter, we propose a static wear leveling technique called Recency-based Wear Leveling (RbWL). The basic idea of RbWL is to execute static wear leveling only as often as necessary, because frequent migrations of cold data by static wear leveling cause significant overhead in a NAND flash memory system. RbWL adjusts its execution frequency according to a threshold value that reflects the lifetime difference between hot and cold blocks and the total lifetime of the NAND flash memory system. The evaluation results show that, compared with other algorithms, RbWL improves the lifetime of NAND flash memory systems by 52%. It also reduces the overhead of wear leveling by 8% to 42% in terms of the number of erase operations and by 13% to 51% in terms of the number of valid-page migrations.
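A threshold-triggered static wear leveling check can be sketched as follows. The rule in `adaptive_threshold` is a hypothetical stand-in for RbWL's threshold, which the abstract only describes as reflecting the hot/cold lifetime difference and the total lifetime. The constants are assumptions.

```python
from typing import List, Optional, Tuple

def adaptive_threshold(total_erases: int, endurance_per_block: int, n_blocks: int,
                       base: int = 100, floor: int = 10) -> int:
    """Hypothetical threshold rule: tolerate a large erase-count gap early in the device's
    life and tighten it as the total endurance budget is consumed."""
    used = total_erases / (endurance_per_block * n_blocks)
    return max(floor, int(base * (1.0 - used)))

def pick_static_wl_pair(erase_counts: List[int], threshold: int) -> Optional[Tuple[int, int]]:
    """Trigger static wear leveling only when the gap between the most- and least-worn
    blocks exceeds `threshold`; return (cold_block, hot_block) to swap, else None."""
    cold = min(range(len(erase_counts)), key=erase_counts.__getitem__)
    hot = max(range(len(erase_counts)), key=erase_counts.__getitem__)
    if erase_counts[hot] - erase_counts[cold] > threshold:
        return cold, hot
    return None
```

Skipping migrations while the gap stays under the threshold is what limits the extra erases and page copies that static wear leveling would otherwise add.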
Mayu OTANI Atsushi NISHIDA Yuta NAKASHIMA Tomokazu SATO Naokazu YOKOYA
Finding important regions is essential for applications such as content-aware video compression and video retargeting, which automatically crops a region of a video for small screens. Since people are among the main subjects when taking a video, some methods for finding important regions use a visual attention model based on face/pedestrian detection to incorporate the knowledge that people are important. However, such methods usually do not distinguish important people from passers-by and bystanders, which results in false positives. In this paper, we propose a deep neural network (DNN)-based method that classifies each person as important or unimportant, given a video captured with a hand-held camera that contains multiple people in a single frame. Intuitively, the important/unimportant labels of people are highly correlated when their spatial motions are similar. Based on this assumption, we propose to boost the performance of our important/unimportant classification by using conditional random fields (CRFs) built upon the DNN, which can be trained in an end-to-end manner. Our experimental results show that our method successfully classifies important people and that the use of a DNN with CRFs improves the accuracy.
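A CRF can couple per-person DNN outputs through motion similarity, which can be illustrated with a tiny fully connected CRF solved by exhaustive enumeration. The Gaussian motion kernel, `w_pair`, and `sigma` are assumptions. Unlike the paper's model, the weights here are fixed rather than trained end to end, and enumeration is only feasible for a handful of people.

```python
import itertools
import math
from typing import List, Sequence

def map_labels(unary: Sequence[float], motions: Sequence[Sequence[float]],
               w_pair: float = 1.0, sigma: float = 1.0) -> List[int]:
    """Exact MAP labelling (1 = important) of a small fully connected CRF.
    unary[i]  : DNN probability that person i is important.
    motions[i]: motion vector of person i; similar motion -> strong agreement term."""
    n = len(unary)
    best, best_energy = [], math.inf
    for labels in itertools.product((0, 1), repeat=n):
        e = 0.0
        for i, y in enumerate(labels):
            p = unary[i] if y == 1 else 1.0 - unary[i]
            e -= math.log(max(p, 1e-12))  # unary cost: negative log-probability
        for i in range(n):
            for j in range(i + 1, n):
                if labels[i] != labels[j]:  # Potts penalty, weighted by motion similarity
                    d2 = sum((a - b) ** 2 for a, b in zip(motions[i], motions[j]))
                    e += w_pair * math.exp(-d2 / (2 * sigma ** 2))
        if e < best_energy:
            best, best_energy = list(labels), e
    return best

# Person 1 is uncertain (0.45) but moves almost exactly like person 0.
unary = [0.9, 0.45, 0.1]
motions = [[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0]]
```

Without the pairwise term, person 1 would be labelled unimportant. With it, the similar motion pulls person 1 to the same label as person 0.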
Inertial sensor-based human action detection and recognition (HADR) is an emerging area in machine learning. We propose a novel time-sequence-based interval convolutional neural network framework for HADR that combines a generator of interesting interval proposals with an interval-based classifier. Experiments demonstrate the good performance of our method.
Yuehua WANG Zhinong ZHONG Anran YANG Ning JING
Review rating prediction is an important problem in machine learning and data mining and has attracted much attention in recent years. Most existing methods for review rating prediction on Location-Based Social Networks capture only the semantics of texts and ignore user information (social links, geolocations, etc.), which makes them less personalized and lowers prediction accuracy. For example, a user's visit to a venue may be influenced by friends' suggestions or the travel distance to the venue. To address this problem, we develop a review rating prediction framework named TSG that uses users' review Text, Social links, and Geolocation information with machine learning techniques. Experimental results demonstrate the effectiveness of the framework.
MathML is a standard markup language for describing math expressions. MathML consists of two sets of elements: Presentation Markup and Content Markup. The former is widely used to display math expressions in Web pages, while the latter is better suited to calculating math expressions. In this letter, we focus on the former and consider classifying Presentation MathML expressions. Identifying the classes of given Presentation MathML expressions is helpful for several applications, such as Presentation-to-Content MathML conversion and text-to-speech. We propose a method for classifying Presentation MathML expressions using a multilayer perceptron. Experimental results show that our method classifies MathML expressions with high accuracy.
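One plausible way to turn a Presentation MathML expression into fixed input features for a multilayer perceptron is a bag of element names plus the tree depth. This feature set is a hypothetical illustration, not the paper's. It uses only Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict

MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"

def tag_features(mathml: str) -> Dict[str, int]:
    """Bag-of-elements features for a Presentation MathML expression:
    counts of each element name (namespace stripped) plus the tree depth."""
    root = ET.fromstring(mathml)
    counts = Counter(el.tag.replace(MATHML_NS, "") for el in root.iter())

    def depth(el: ET.Element) -> int:
        return 1 + max((depth(ch) for ch in el), default=0)

    feats = dict(counts)
    feats["_depth"] = depth(root)
    return feats

expr = '<math xmlns="http://www.w3.org/1998/Math/MathML"><mfrac><mi>a</mi><mn>2</mn></mfrac></math>'
f = tag_features(expr)
```

Mapping this dictionary onto a fixed element vocabulary would yield the kind of vector an MLP consumes.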