Zhisheng HUO Limin XIAO Zhenxue HE Xiaoling RONG Bing WEI
Previous works have studied throughput allocation for heterogeneous storage systems consisting of SSDs and HDDs in the dynamic setting, where users are not all present in the system simultaneously. However, those studies treat multiple servers as one large resource pool and cannot cope with the multi-server environment. We design a dynamic throughput allocation mechanism named DAM, which handles the throughput allocation of multiple heterogeneous servers in the dynamic setting and provides a number of desirable properties. The experimental results show that DAM allocates throughput dynamically across multiple servers while guaranteeing users' local allocations on each server, and provides an efficient and fair throughput allocation for the whole system.
Xinyu HE Lishuang LI Xingchen SONG Degen HUANG Fuji REN
Biomedical event extraction is an important and challenging task in Information Extraction, and it plays a key role in medical research and disease prevention. Most existing event detection methods are based on shallow machine learning methods that rely mainly on domain knowledge and elaborately designed features. Another challenge is that crucial information, as well as the interactions among words or arguments, may be ignored because most works treat all words and sentences equally. Therefore, we employ a Bidirectional Long Short-Term Memory (BLSTM) neural network for event extraction, which avoids complex handcrafted feature extraction. Furthermore, we propose a multi-level attention mechanism that includes word-level attention, which determines the importance of words in a sentence, and sentence-level attention, which determines the importance of relevant arguments. Finally, we train dependency word embeddings and add sentence vectors to enrich semantic information. The experimental results show that our model achieves an F-score of 59.61% on the commonly used biomedical event extraction dataset (MLEE), which outperforms other state-of-the-art methods.
Shunta NAKAGAWA Tatsuya NAGAI Hideaki KANEHARA Keisuke FURUMOTO Makoto TAKITA Yoshiaki SHIRAISHI Takeshi TAKAHASHI Masami MOHRI Yasuhiro TAKANO Masakatu MORII
System administrators and security officials of an organization need to deal with vulnerable IT assets, especially those with severe vulnerabilities, to minimize the risk of these vulnerabilities being exploited. The Common Vulnerability Scoring System (CVSS) can be used to calculate the severity score of vulnerabilities, but it currently requires human operators to choose the input values. A word-level Convolutional Neural Network (CNN) has been proposed to estimate the input parameters of CVSS and derive the severity score of vulnerability notes, but its accuracy needs further improvement. In this paper, we propose a character-level CNN for estimating the severity scores. Experiments show that the proposed scheme outperforms the conventional one in terms of accuracy and how errors occur.
Jung-Been LEE Taek LEE Hoh Peter IN
Mining software artifacts is a useful way to understand the source code of software projects. Topic modeling in particular has been widely used to discover meaningful information from software artifacts. However, software artifacts are unstructured and contain a mix of textual types within the natural text. These characteristics degrade the performance of topic modeling. Among several natural language pre-processing tasks, removing stop words to reduce meaningless and uninteresting terms is an efficient way to improve the quality of topic models. Although many approaches are used to generate effective stop words, the resulting lists are outdated or too general to apply to mining software artifacts. In addition, the performance of the topic model is sensitive to the datasets used in training for each approach. To resolve these problems, we propose an automatic stop word generation approach for topic models of software artifacts. We measure topic coherence among the words in each topic using Pointwise Mutual Information (PMI) and, in every topic modeling loop, add words with a low PMI score to our stop word list. Our experiment shows that our stop word list yields better topic model performance than lists from other approaches.
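The PMI-based selection step can be illustrated with a minimal sketch. It assumes document-level co-occurrence counts and a hypothetical fixed threshold; the abstract does not specify the estimator or the cut-off, so the function names `pmi` and `low_pmi_words` and the parameter `threshold` are illustrative only.

```python
import math


def pmi(word_a, word_b, documents):
    """PMI of two words, estimated from document-level co-occurrence.

    `documents` is a list of token lists. Returns -inf when the two
    words never appear in the same document.
    """
    doc_sets = [set(doc) for doc in documents]
    n = len(doc_sets)
    count_a = sum(word_a in d for d in doc_sets)
    count_b = sum(word_b in d for d in doc_sets)
    count_ab = sum(word_a in d and word_b in d for d in doc_sets)
    if count_ab == 0:
        return float("-inf")
    return math.log((count_ab / n) / ((count_a / n) * (count_b / n)))


def low_pmi_words(topic_words, documents, threshold):
    """Return topic words whose mean PMI with the other topic words is
    below `threshold` (an assumed cut-off, not taken from the paper).

    `topic_words` must contain at least two distinct words.
    """
    flagged = []
    for word in topic_words:
        others = [w for w in topic_words if w != word]
        mean = sum(pmi(word, o, documents) for o in others) / len(others)
        if mean < threshold:
            flagged.append(word)
    return flagged
```

In such a sketch, a word that appears in every document has a PMI of 0 with every other word, so it falls below any positive threshold and becomes a stop word candidate for the next modeling loop.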
The Gabidulin-based locally repairable code (LRC) construction by Silberstein et al. is an important example of distance optimal (r,δ)-LRCs. Its distance optimality has been further shown to cover the case of multiple (r,δ)-locality, where the (r,δ)-locality constraints are different among different symbols. However, the optimality only holds under the ordered (r,δ) condition, where the parameters of the multiple (r,δ)-locality satisfy a specific ordering condition. In this letter, we show that Gabidulin-based LRCs are still distance optimal even without the ordered (r,δ) condition.
A passively mobile system is an abstract notion of mobile ad-hoc networks. It is a collection of agents with computing devices. Agents move in a region, but the algorithm cannot control their physical behavior (i.e., how they move). The population protocol model is one of the promising models, in which the computation proceeds by pairwise communication between two agents. The communicating agents update their states according to a specified transition function (algorithm). In this paper, we consider a general form of the aggregation problem with a base station. The base station is a special agent with more computational power than the other agents. In the aggregation problem, the base station has to sum up the inputs distributed to the other agents. We propose an algorithm that solves the aggregation problem in sub-linear parallel time using a relatively small number of states per agent. More precisely, our algorithm solves the aggregation problem with input domain X in O(√n log² n) parallel time and O(|X|²) states per agent (except for the base station) with high probability.
Daiki MIYAHARA Tatsuya SASAKI Takaaki MIZUKI Hideaki SONE
Kakuro is a popular logic puzzle in which a player fills in all empty squares with digits from 1 to 9 so that the sum of the digits in each (horizontal or vertical) line is equal to a given number, called a clue, and the digits in each line are all different. In 2016, Bultel, Dreier, Dumas, and Lafourcade proposed a physical zero-knowledge proof protocol for Kakuro using a deck of cards; their protocol enables a prover to convince a verifier that the prover knows the solution of a Kakuro puzzle without revealing any information about the solution. One possible drawback of their protocol is that it is not perfectly extractable, meaning that a prover who does not know the solution can convince a verifier with a small probability; therefore, one has to repeat the protocol to make this error negligible. In this paper, to overcome this drawback, we design zero-knowledge proof protocols for Kakuro that have the perfect extractability property. Our improvement relies on the ideas behind the copy protocols in the field of card-based cryptography. By executing our protocols with a real deck of physical playing cards, humans can practically perform an efficient zero-knowledge proof of knowledge for Kakuro.
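The puzzle rule that the prover's solution must satisfy for every line can be written as a short check. This is only a sketch of the Kakuro constraint itself, not of the card-based protocol; the function name `line_is_valid` is hypothetical.

```python
def line_is_valid(digits, clue):
    """Check one horizontal or vertical Kakuro line.

    The line is valid when every entry is a digit from 1 to 9, all
    digits are different, and they sum to the clue.
    """
    return (
        all(1 <= d <= 9 for d in digits)
        and len(set(digits)) == len(digits)
        and sum(digits) == clue
    )
```

A zero-knowledge protocol has to convince the verifier that this predicate holds for every line without exposing the digits themselves.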
We propose a class-incremental learning framework for human activity recognition based on the Bag-of-Sequencelets model (BoS). The framework updates learned models efficiently without having to relearn them when training data of new classes are added. In this framework, all types of features, including hand-crafted features, features based on Convolutional Neural Networks (CNNs), and combinations of those features, can be used as features for videos. Compared with the original BoS, the new framework greatly reduces the learning time with little loss of classification accuracy.
Ryo ASHIDA Sebastian KUHNERT Osamu WATANABE
Miller [9] proposed a linear-time algorithm for computing small separators for 2-connected planar graphs. We explain his algorithm and present a way to modify it into a space-efficient version. Our algorithm can be regarded as a log-space reduction from the separator construction to the breadth-first search tree construction.
Duhu MAN Mark W. JONES Danrong LI Honglong ZHANG Zhan SONG
The consistent alignment of point clouds obtained from multiple scanning positions is a crucial step for many 3D modeling systems. This is especially true for environment modeling. In order to observe the full scene, a common approach is to rotate the scanning device around a rotation axis using a turntable. The final alignment of each frame's data can be computed from the position and orientation of the rotation axis. However, in practice, precise mounting of scanning devices is impossible. It is hard to place the vertical support of the turntable and the rotation axis on a common line, particularly for lower-cost consumer hardware. Therefore, the calibration of the rotation axis of the turntable is an important step for 3D reconstruction. In this paper, we propose a novel calibration method for the rotation axis of the turntable. With the proposed method, multiple 3D profiles of the target scene can be aligned precisely. In the experiments, three different evaluation approaches are used to evaluate the calibration accuracy of the rotation axis. The experimental results show that the proposed rotation axis calibration method achieves high accuracy.
Yuto KITAGAWA Tasuku ISHIGOOKA Takuya AZUMI
This paper proposes an anomaly prediction method based on k-means clustering that targets embedded devices with memory constraints. With this method, anomalies can be predicted by checking control system behavior in detail using k-means clustering. However, as with the existing k-means clustering method, clustering cannot be continued indefinitely because data accumulate in memory, which is problematic for embedded devices with low memory capacity. Therefore, we also propose a k-means clustering method that can continue clustering over infinite stream data. The proposed k-means clustering method is based on online k-means clustering with sequential processing. It stores only the data required for anomaly prediction and releases other data from memory. Owing to these characteristics, the proposed k-means clustering performs anomaly prediction while reducing memory consumption. Experiments were performed on actual control system data for anomaly prediction. Experimental results show that the proposed anomaly prediction method can predict anomalies, and that the proposed k-means clustering predicts anomalies similarly to standard k-means clustering while reducing memory consumption. Moreover, the proposed k-means clustering achieves better anomaly prediction results than existing online k-means clustering.
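For orientation, the generic sequential (online) k-means update on which such methods build can be sketched as follows. This is the textbook running-mean update, not the authors' proposed variant; the class name `OnlineKMeans` and its interface are assumptions made for illustration. Only the centroids and per-cluster counts are kept, so each sample can be discarded after it is processed.

```python
class OnlineKMeans:
    """Textbook sequential k-means: memory use is independent of stream length."""

    def __init__(self, initial_centroids):
        # Seeds carry no weight; the first assigned point replaces its seed.
        self.centroids = [list(c) for c in initial_centroids]
        self.counts = [0] * len(self.centroids)

    @staticmethod
    def _sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def update(self, point):
        """Assign `point` to its nearest centroid and move that centroid
        toward the point by 1/count (a running mean). Returns the index
        of the chosen cluster."""
        idx = min(
            range(len(self.centroids)),
            key=lambda i: self._sq_dist(point, self.centroids[i]),
        )
        self.counts[idx] += 1
        rate = 1.0 / self.counts[idx]
        centroid = self.centroids[idx]
        for d, value in enumerate(point):
            centroid[d] += rate * (value - centroid[d])
        return idx
```

Because each centroid is a running mean of the points assigned to it, the stream never has to be stored, which is the property that matters on memory-constrained devices.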
Tatsuya NAGAI Masaki KAMIZONO Yoshiaki SHIRAISHI Kelin XIA Masami MOHRI Yasuhiro TAKANO Masakatu MORII
Epidemic cyber incidents are caused by malicious websites using exploit kits. Exploit kits make it easier for attackers to perform drive-by download (DBD) attacks. It has been reported, however, that malicious websites using an exploit kit have similar website structure (WS)-trees. Hence, malicious website identification techniques leveraging WS-trees have been studied, where the WS-trees can be estimated from HTTP traffic data. Nevertheless, the defensive component of the exploit kit prevents us from capturing the WS-tree perfectly. This paper therefore presents a new WS-tree construction procedure that exploits the fact that a DBD attack happens within a certain duration. Moreover, this paper proposes a new malicious website identification technique based on clustering the WS-trees of exploit kits. Experimental results on the D3M dataset verify that the proposed technique identifies exploit kits with reasonable accuracy even when HTTP traffic from the malicious sites is partially lost.
Weijun LU Chao GENG Dunshan YU
Forecasting commodity futures prices is a challenging task. We present an algorithm that predicts the trend of commodity futures prices based on a type of structuring data and a back propagation neural network. The random volatility of futures can be filtered out in the structuring data. Moreover, the algorithm is not restricted by the type of futures contract. Experiments show that the algorithm can achieve 80% accuracy in predicting price trends.
Naho ITO Most Shelina AKTAR Yuukou HORITA
In order to evaluate a vehicle detection method, it is necessary to know the correct vehicle position, which is regarded as the “ground truth”. We propose indices for vehicle detection that take subjective evaluation into account and are based on intersection over union (IoU). Subjective evaluation experiments were carried out with respect to misregistration from the ground truth in vehicle detection.
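As a reminder of the underlying measure, IoU for two axis-aligned bounding boxes can be computed as in the sketch below. The box format `(x_min, y_min, x_max, y_max)` and the function name `iou` are assumptions for illustration; the proposed subjective indices are not reproduced here.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max). Returns 0.0 when the
    boxes do not overlap or both have zero area.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

For example, a detection shifted by half its width and height relative to a 2 × 2 ground-truth box overlaps in a 1 × 1 region, giving an IoU of 1/7.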
Yuki HAYAMI Daiki TAKASU Hisakazu AOYANAGI Hiroaki TAKAMATSU Yoshifumi SHIMODAIRA Gosuke OHASHI
The human visual system exhibits a characteristic known as the Helmholtz-Kohlrausch (H-K) effect: even if the hue and the lightness retain the same values, the actual (perceived) lightness changes with the color saturation. Quantification of this effect is expected to be useful for the future development and evaluation of high-quality displays. We have been studying the H-K effect in natural images projected by LED projectors, which play important roles in practical uses. To verify the effectiveness of the determinations of the H-K effect for natural images, we performed a subjective evaluation experiment on natural images using the method of adjustment and compared the experimental values with values calculated from an extended form of Nayatani's equation adapted to natural images. In general, we found a high correlation between the two, although the correlation was low for some images. Therefore, we derived a correction function from the subjective evaluation values for 108 color patterns (hue: 12 × saturation: 3 × lightness: 3) and applied it to the equation for estimating the H-K effect.
A fast cross-validation algorithm for model selection in kernel ridge regression problems is proposed. It aims to further reduce the computational cost of the algorithm proposed by An et al. by using the eigenvalue decomposition of a Gram matrix.
Jan LEWANDOWSKY Gerhard BAUCH Matthias TSCHAUNER Peter OPPERMANN
Receiver implementations with very low quantization resolution will play an important role in 5G, as high-precision quantization and signal processing are costly in terms of computational resources and chip area. Therefore, low-resolution receivers with quasi-optimum performance will be required to meet complexity and latency constraints. The Information Bottleneck method allows for a novel, information-centric approach to designing such receivers. The method was originally introduced by Naftali Tishby et al. and has so far been used mostly in the machine learning field. Interestingly, it can also be applied to build surprisingly good digital communication receivers that work fundamentally differently from state-of-the-art receivers. Instead of minimizing the quantization error, receiver components can be designed to preserve the maximum amount of relevant information for a given bit width. All signal processing in the resulting receivers is performed using only simple lookup operations. In this paper, we first provide a brief introduction to the design of receiver components with the Information Bottleneck method. Throughout, we refer to the decoding of low-density parity-check codes as a practical example. The focus of the paper lies on practical decoder implementations on a digital signal processor, which illustrate the potential of the proposed technique. An Information Bottleneck decoder with 4-bit message passing decoding is found to outperform 8-bit implementations of the well-known min-sum decoder in terms of bit error rate and to perform extremely close to an 8-bit belief propagation decoder, while offering considerably higher net decoding throughput than both conventional decoders.
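For context, the check-node update of the conventional min-sum decoder used as a baseline can be sketched as below. This is the standard textbook rule operating on log-likelihood ratios, not the lookup-based Information Bottleneck decoder; the function name `min_sum_check_node` is hypothetical, and the input must contain at least two messages.

```python
def min_sum_check_node(llrs):
    """Min-sum check-node update for one parity check.

    For each edge, the outgoing message is the product of the signs of
    all other incoming messages times the smallest magnitude among them.
    """
    out = []
    for i in range(len(llrs)):
        others = llrs[:i] + llrs[i + 1:]
        sign = 1.0
        for value in others:
            if value < 0:
                sign = -sign
        out.append(sign * min(abs(value) for value in others))
    return out
```

An Information Bottleneck decoder replaces this arithmetic on real-valued messages with table lookups on a few-bit message alphabet, which is where the throughput gain reported above comes from.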
Conventional approaches to statistical parametric speech synthesis use context-dependent hidden Markov models (HMMs) clustered using decision trees to generate speech parameters from linguistic features. However, decision trees are not always appropriate for modeling complex context dependencies of linguistic features efficiently. An alternative scheme that replaces decision trees with deep neural networks (DNNs) was presented as a possible way to overcome this difficulty. By training the network to represent high-dimensional feedforward dependencies from linguistic features to acoustic features, DNN-based speech synthesis systems convert text into speech. To improve the naturalness of the synthesized speech, this paper presents a novel pre-training method for DNN-based statistical parametric speech synthesis systems. In our method, a deep relational model (DRM), which represents a joint probability of two visible variables, is applied to describe the joint distribution of acoustic and linguistic features. As with DNNs, a DRM consists of several hidden layers and two visible layers. Whereas DNNs represent feedforward dependencies from one set of visible variables (inputs) to another set of visible variables (outputs), a DRM can represent the bidirectional dependencies between two visible variables. During maximum-likelihood (ML)-based training, the model optimizes the parameters of its deep architecture (connection weights between adjacent layers, and biases) considering the bidirectional conversion between 1) acoustic features given linguistic features, and 2) linguistic features given the acoustic features generated by the model itself. Because it considers whether the generated acoustic features are recognizable, our method can obtain reasonable parameters for speech synthesis.
Experimental results in a speech synthesis task show that pre-trained DNN-based systems using our proposed method outperformed randomly initialized DNN-based systems, especially when the amount of training data is limited. Additionally, speaker-dependent speech recognition experiments show that our method outperformed DNN-based systems when the initial parameters of our method were set to the same values as in the synthesis experiments.
Takuya MIYASAKA Hiroshi SATO Masaharu TAKAHASHI
In recent years, MIMO technology, which uses multiple antennas, has been introduced to mobile terminals to increase communication capacity per unit frequency. However, if MIMO antennas are placed close together, strong mutual coupling occurs. Moreover, carrier aggregation (CA), which uses multiple frequencies, is also utilized to improve communication speed. Therefore, mutual coupling must be reduced at multiple frequencies. In this paper, we propose a dual-band decoupling method using a short stub and a branch element, and confirm that the proposed model achieves decoupling and increases radiation efficiency.
Power line communication (PLC) networks play an important role in home networks and in next generation hybrid networks, which provide higher data rates (Gbps) and easier connectivity. The standard medium access control (MAC) protocol of PLC networks, IEEE 1901, uses a special carrier sense multiple access with collision avoidance (CSMA/CA) mechanism, in which a deferral counter is introduced to avoid unnecessary collisions. Although PLC networks have achieved great commercial success, MAC layer analysis for IEEE 1901 PLC networks has received limited attention. Until now, only a few studies have used renewal theory and the strong law of large numbers (SLLN) to analyze the MAC performance of the IEEE 1901 protocol. These studies focus on saturated conditions and neglect the impacts of buffer size and traffic rate. Additionally, they are valid only for homogeneous traffic. Motivated by these limitations, we develop a unified and scalable analytical model for the IEEE 1901 protocol in unsaturated conditions, which comprehensively considers the impacts of traffic rate, buffer size, and traffic types (homogeneous or heterogeneous traffic). In the modeling process, a multi-layer discrete Markov chain model is constructed to depict the basic working principle of the IEEE 1901 protocol. The queueing process of the station buffer is captured using queueing theory. Furthermore, we present a detailed analysis of the IEEE 1901 protocol under heterogeneous traffic conditions. Finally, we conduct extensive simulations to verify the analytical model and evaluate the MAC performance of the IEEE 1901 protocol in PLC networks.