Naoki FUKUSHI Daiki CHIBA Mitsuaki AKIYAMA Masato UCHIDA
In this paper, we propose a method to reduce the labeling cost of acquiring training data for a malicious domain name detection system based on supervised machine learning. In conventional systems, training a classifier with high classification accuracy requires large quantities of benign and malicious domain names as training data. In general, malicious domain names are observed less frequently than benign ones, so it is difficult to acquire a large number of malicious domain names without a dedicated labeling method. We propose a method based on active learning that labels data around the decision boundary of the classification, i.e., in the gray area, and we show that classification accuracy can be improved using approximately 1% of the training data required by conventional systems. Another disadvantage of conventional systems is that a classifier trained with a small amount of training data offers no guarantee of generalization ability. We propose a method based on ensemble learning that integrates multiple classifiers, and we show that it stabilizes and improves classification accuracy. Combining the two proposed methods yields a new malicious domain name detection system that achieves high classification accuracy and generalization ability while labeling only a small amount of training data.
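The gray-area idea above can be sketched in a few lines: label only those unlabeled samples whose predicted probability of being malicious falls near the decision boundary. The function name, margin value, and scores below are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of gray-area sampling: select for labeling only the samples
# whose P(malicious) lies close to the 0.5 decision boundary.
# The margin of 0.1 is an arbitrary illustrative choice.

def select_gray_area(probs, margin=0.1):
    """Return indices of samples whose P(malicious) lies in the gray area
    [0.5 - margin, 0.5 + margin] around the decision boundary."""
    lo, hi = 0.5 - margin, 0.5 + margin
    return [i for i, p in enumerate(probs) if lo <= p <= hi]

# Example: of five scored domains, only the two borderline ones are labeled.
scores = [0.02, 0.48, 0.55, 0.97, 0.30]
print(select_gray_area(scores))  # → [1, 2]
```

Confidently classified domains (scores near 0 or 1) are skipped, which is how the method reaches roughly 1% of the conventional labeling budget.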
Jing ZHANG Degen HUANG Kaiyu HUANG Zhuang LIU Fuji REN
Microblog data contain rich information about real-world events with great commercial value, so microblog-oriented natural language processing (NLP) tasks have attracted considerable attention from researchers. However, the performance of microblog-oriented Chinese Word Segmentation (CWS) based on deep neural networks (DNNs) is still unsatisfactory. One critical reason is that the existing microblog-oriented training corpus is inadequate for training effective weight matrices for DNNs. In this paper, we propose a novel active learning method to extend the scale of the training corpus for DNNs. However, because microblogs contain a large number of partially overlapping sentences, it is difficult to select samples with high annotation value from raw microblogs during the active learning procedure. To select samples with higher annotation value, a parameter λ is introduced to control the number of repeatedly selected samples. Meanwhile, various strategies are adopted to measure the overall annotation value of a sample during the active learning procedure. Experiments on the benchmark datasets of NLPCC 2015 show that our λ-active learning method outperforms the baseline system and the state-of-the-art method. Moreover, the results demonstrate that the performance of DNNs trained on the extended corpus is significantly improved.
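The role of λ described above can be illustrated with a greedy selection loop that discounts a candidate sentence's annotation value by its word overlap with sentences already selected. The overlap measure (Jaccard over word sets) and the scoring form are our assumptions for this sketch, not the paper's exact formulation.

```python
# Illustrative λ-controlled selection: score - λ * max overlap with the
# already-selected set. Larger λ penalizes near-duplicate sentences harder.

def overlap(a, b):
    """Jaccard word overlap between two whitespace-tokenized sentences."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def select(candidates, scores, lam, k):
    """Greedily pick k sentences maximizing score - lam * max-overlap
    with the sentences chosen so far."""
    chosen = []
    pool = list(zip(candidates, scores))
    while pool and len(chosen) < k:
        def adjusted(item):
            s, sc = item
            penalty = max((overlap(s, c) for c in chosen), default=0.0)
            return sc - lam * penalty
        best = max(pool, key=adjusted)
        chosen.append(best[0])
        pool.remove(best)
    return chosen

# With λ=1, the near-duplicate "a b d" is skipped in favor of "x y z".
picked = select(["a b c", "a b d", "x y z"], [1.0, 0.9, 0.5], lam=1.0, k=2)
print(picked)  # → ['a b c', 'x y z']
```

Setting λ = 0 recovers plain score-ranked selection, which would pick the redundant sentence.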
Ryosuke ONDA Yuki HIRAI Kay PENNY Bipin INDURKHYA Keiichi KANEKO
We developed a system called DELTA that supports students' use of backward chaining (BC) to prove the congruence of two triangles. DELTA is designed as an interactive learning environment and supports the use of BC by providing hints and a function that automatically checks the proofs input by the students. DELTA also has coloring, marking, and highlighting functions to support students' attempts to prove the congruence of two triangles. We evaluated the efficacy of DELTA with 36 second-grade students at a junior high school in Japan. We found that (1) the mean number of problems completely solved by the experimental group (EG) was significantly higher than that of the control group on the post-test; (2) the EG effectively used the BC strategy to solve problems; and (3) the students' use of both the forward chaining strategy and the BC strategy led to solving the problems completely.
Hiroko MURAKAMI Koichi SHINODA Sadaoki FURUI
We propose an active learning framework for speech recognition that reduces the amount of data required for acoustic modeling. This framework consists of two steps. We first obtain a phone-error distribution using an acoustic model estimated from transcribed speech data. Then, from a text corpus we select a sentence whose phone-occurrence distribution is close to the phone-error distribution and collect its speech data. We repeat this process to increase the amount of transcribed speech data. We applied this framework to speaker adaptation and acoustic model training. Our evaluation results showed that it significantly reduced the amount of transcribed data while maintaining the same level of accuracy.
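The selection step above can be sketched as choosing, from a text corpus, the sentence whose phone-occurrence distribution is nearest the phone-error distribution. The distance measure (L1) and the toy phone inventory below are assumptions for illustration; the abstract does not commit to a specific divergence.

```python
# Illustrative sentence selection: match a sentence's phone distribution to
# the phone-error distribution of the current acoustic model.

def normalize(counts):
    """Turn raw phone counts into a probability distribution."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def l1_distance(p, q):
    """L1 distance between two distributions over a shared phone set."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def select_sentence(phone_error_dist, candidates):
    """candidates: list of (sentence, phone_counts). Return the sentence
    whose normalized phone distribution is nearest the error distribution."""
    return min(candidates,
               key=lambda c: l1_distance(phone_error_dist, normalize(c[1])))[0]

# Errors concentrate on phones "a" and "k", so the sentence covering both wins.
error_dist = {"a": 0.6, "k": 0.4}
cands = [("ka", {"k": 1, "a": 1}), ("aa", {"a": 2})]
print(select_sentence(error_dist, cands))  # → ka
```

The selected sentence is then recorded and transcribed, and the loop repeats with an updated error distribution.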
Tsubasa KOBAYASHI Masashi SUGIYAMA
The objective of pool-based incremental active learning is to choose a sample to label from a pool of unlabeled samples in an incremental manner so that the generalization error is minimized. In this scenario, the generalization error often hits a minimum in the middle of the incremental active learning procedure and then it starts to increase. In this paper, we address the problem of early labeling stopping in probabilistic classification for minimizing the generalization error and the labeling cost. Among several possible strategies, we propose to stop labeling when the empirical class-posterior approximation error is maximized. Experiments on benchmark datasets demonstrate the usefulness of the proposed strategy.
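The stopping rule above can be illustrated with a simple online peak detector over the tracked criterion: labeling stops once the criterion has passed its maximum. The patience-based peak detection is our own heuristic stand-in, not the paper's procedure, and the criterion values are invented for the example.

```python
# Illustrative early-stopping sketch: stop labeling once a tracked criterion
# (a stand-in for the empirical class-posterior approximation error) has
# peaked, detected when it falls for `patience` consecutive rounds.

def stop_at_peak(values, patience=2):
    """Return the index of the detected peak, or None if no peak was seen."""
    best_i, best_v, below = 0, float("-inf"), 0
    for i, v in enumerate(values):
        if v > best_v:
            best_i, best_v, below = i, v, 0
        else:
            below += 1
            if below >= patience:
                return best_i
    return None

# The criterion rises for three rounds, then declines: stop at round 2.
print(stop_at_peak([0.1, 0.3, 0.5, 0.4, 0.35]))  # → 2
```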
An active learning method, called Two-stage Active learning algorithm (TAL), is developed for software defect prediction. Combining the clustering and support vector machine techniques, this method improves the performance of the predictor with less labeling effort. Experiments validate its effectiveness.
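The two-stage structure described above can be sketched with toy components: a clustering-flavored first stage that labels spread-out representatives, and a second stage that queries points nearest the current decision boundary. A one-dimensional midpoint classifier stands in for the SVM, and every detail here is an illustrative assumption.

```python
# Illustrative two-stage selection in the spirit of TAL (clustering + margin).

def stage_one(points, k=2):
    """Pick k spread-out representatives (here simply the extremes)."""
    pts = sorted(points)
    return [pts[0], pts[-1]] if k == 2 else pts[:k]

def stage_two(points, labeled, n=1):
    """Pick the n unlabeled points nearest the midpoint decision boundary
    between the labeled class means (a toy stand-in for SVM margin)."""
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    boundary = (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2
    unlabeled = [x for x in points if x not in dict(labeled)]
    return sorted(unlabeled, key=lambda x: abs(x - boundary))[:n]

points = [0.0, 0.2, 0.5, 0.8, 1.0]
reps = stage_one(points)                 # label the representatives first
labeled = [(reps[0], 0), (reps[1], 1)]
print(stage_two(points, labeled))        # → [0.5]
```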
Yuzo HAMANAKA Koichi SHINODA Takuya TSUTAOKA Sadaoki FURUI Tadashi EMORI Takafumi KOSHINAKA
We propose a committee-based method of active learning for large vocabulary continuous speech recognition. In this approach, multiple recognizers are trained, and the recognition results obtained from them are used for selecting utterances. Those utterances whose recognition results differ the most among the recognizers are selected and transcribed. Progressive alignment and voting entropy are used to measure the degree of disagreement among the recognizers. Our method was evaluated using 191 hours of speech data from the Corpus of Spontaneous Japanese. It proved to be significantly better than random selection: it required only 63 hours of data to achieve a word accuracy of 74%, whereas standard training (i.e., random selection) required 103 hours. It also proved to be significantly better than conventional uncertainty sampling using word posterior probabilities.
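Voting entropy, one of the disagreement measures named above, can be computed per aligned word position from the committee's votes; utterances with high entropy are the ones selected for transcription. This is a generic illustration of the measure, not the authors' implementation.

```python
# Voting entropy over a committee's hypotheses for one word position:
# 0 when all members agree, maximal when every member votes differently.
import math
from collections import Counter

def voting_entropy(votes):
    """votes: one hypothesis per committee member for a single word."""
    counts = Counter(votes)
    k = len(votes)
    return -sum((c / k) * math.log2(c / k) for c in counts.values())

# Full disagreement yields higher entropy than partial agreement.
print(voting_entropy(["cat", "hat", "bat"]) >
      voting_entropy(["cat", "cat", "hat"]))  # → True
```

Summing or averaging these per-word entropies over an utterance gives a single disagreement score for utterance ranking.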
Hidetoshi SHIMODAIRA Takafumi KANAMORI Masayoshi AOKI Kouta MINE
We propose multiscale bagging as a modification of the bagging procedure. In ordinary bagging, bootstrap resampling is used for generating bootstrap samples; we replace it with the multiscale bootstrap algorithm. In multiscale bagging, the sample size m of the bootstrap samples may differ from the sample size n of the learning dataset. For assessing the output of a classifier, we compute the bootstrap probability of a class label, i.e., the frequency with which a specified class label is observed in the outputs of classifiers learned from bootstrap samples. A scaling law of the bootstrap probability with respect to σ² = n/m has been developed in connection with the geometrical theory. We consider two different ways of using multiscale bagging of classifiers. The first is to construct a confidence set of class labels, instead of a single label. The second is to find inputs close to decision boundaries in the context of query by bagging for active learning. It turned out, interestingly, that an appropriate choice of m is m = -n, i.e., σ² = -1, for the first usage, and m = ∞, i.e., σ² = 0, for the second usage.
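The bootstrap probability defined above can be sketched directly: draw bootstrap samples of size m (possibly different from n), train a classifier on each, and report the frequency with which a query input receives a specified label. A 1-nearest-neighbor rule stands in for the base classifier, and the data and parameters are illustrative.

```python
# Hedged sketch of the bootstrap probability in multiscale bagging, with
# bootstrap sample size m decoupled from the dataset size n (σ² = n/m).
import random

def nn_label(train, x):
    """1-nearest-neighbor label for a scalar input x."""
    return min(train, key=lambda t: abs(t[0] - x))[1]

def bootstrap_probability(data, x, label, m, trials=200, seed=0):
    """Fraction of size-m bootstrap samples whose 1-NN classifier assigns
    `label` to input x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.choice(data) for _ in range(m)]
        if nn_label(sample, x) == label:
            hits += 1
    return hits / trials

# Toy data: class 0 near 0.0, class 1 near 1.0; a clearly class-1 query
# point gets bootstrap probability well above one half.
data = [(0.0, 0), (0.1, 0), (0.9, 1), (1.0, 1)]
print(bootstrap_probability(data, x=0.95, label=1, m=4) > 0.5)  # → True
```

For query by bagging, inputs whose bootstrap probability hovers near 0.5 are the ones close to the decision boundary.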
Kai-Yi CHIN Yen-Lin CHEN Jong-Shin CHEN Zeng-Wei HONG Jim-Min LIN
In our previous project, an XML-based authoring tool was provided for teachers to script multimedia teaching material with animated agents, and a stand-alone learning system was designed for students to display the material and interact with animated agents. We also provided evidence that the authoring tool and learning system in computer-assisted learning successfully enhance learning performance. The aim of this study is to continue the previous project by developing a Web-based multimedia learning system that presents materials and an animated agent in a Web browser. The Web-based multimedia learning system can provide an opportunity for students to engage in independent learning or review of school course work. To demonstrate the efficiency of this learning system, it was applied at an elementary school. An experimental material, `Road Traffic Safety', was presented in two learning systems: a Web-based PowerPoint learning system and a Web-based multimedia learning system. The experiment was carried out in two classes with a total of thirty-one 3rd-grade students. The results suggest that using our authoring tool in a Web-based learning system can improve learning and, in particular, enhance learners' problem-solving ability. Students with higher achievement on the post-test showed better comprehension on problem-solving questions. Furthermore, the feedback from the questionnaire surveys shows that students' learning interest can be fostered when an animated agent is integrated into multimedia teaching materials, and that students prefer to adopt the Web-based multimedia learning system for independent learning after school.
Kazuya UEKI Masashi SUGIYAMA Yasuyuki IHARA
We address the problem of perceived age estimation from face images, and propose a new semi-supervised approach involving two novel aspects. The first novelty is an efficient active learning strategy for reducing the cost of labeling face samples. Given a large number of unlabeled face samples, we reveal the cluster structure of the data and propose to label cluster-representative samples so as to cover as many clusters as possible. This simple sampling strategy allows us to boost the performance of a manifold-based semi-supervised learning method with only a relatively small number of labeled samples. The second contribution is to take the heterogeneous characteristics of human age perception into account. It is rare to misjudge the age of a 5-year-old child as 15, but the age of a 35-year-old person is often misjudged as 45. Thus, the magnitude of the error differs depending on the subject's age. We carried out a large-scale questionnaire survey to quantify the characteristics of human age perception, and propose to utilize the quantified characteristics in a weighted regression framework. Consequently, our proposed method is expressed in the form of weighted least-squares with a manifold regularizer, which is scalable to massive datasets. Through real-world age estimation experiments, we demonstrate the usefulness of the proposed method.
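The weighted least-squares core of the formulation above can be sketched in closed form for a single feature: each sample's squared residual is weighted by the (here invented) reliability of age perception at that age. The manifold regularizer of the actual method is omitted for brevity, and all numbers are illustrative.

```python
# Hedged sketch of weighted least-squares: fit y ≈ a*x + b minimizing
# sum_i w_i * (y_i - a*x_i - b)^2, using the standard closed-form solution.

def weighted_least_squares(xs, ys, ws):
    """Return (a, b) minimizing the weighted squared error for y = a*x + b."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw   # weighted mean of x
    my = sum(w * y for w, y in zip(ws, ys)) / sw   # weighted mean of y
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    a = cov / var
    b = my - a * mx
    return a, b

# On exactly linear data y = 2x + 1, any positive weights recover (2, 1).
a, b = weighted_least_squares([0, 1, 2, 3], [1, 3, 5, 7], [1, 2, 1, 2])
print(round(a, 6), round(b, 6))  # → 2.0 1.0
```

In the age-estimation setting, the weights would be larger for ages where human judgments are reliable (young children) and smaller where they scatter widely (middle age).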
This paper presents a new interactive learning method for spoken word acquisition through human-machine audio-visual interfaces. During the course of learning, the machine makes a decision about whether an orally input word is a word in the lexicon the machine has learned, using both speech and visual cues. Learning is carried out on-line, incrementally, based on a combination of active and unsupervised learning principles. If the machine judges with a high degree of confidence that its decision is correct, it learns the statistical models of the word and a corresponding image category as its meaning in an unsupervised way. Otherwise, it asks the user a question in an active way. The function used to estimate the degree of confidence is also learned adaptively on-line. Experimental results show that the combination of active and unsupervised learning principles enables the machine and the user to adapt to each other, which makes the learning process more efficient.
Masashi SUGIYAMA Hidemitsu OGAWA
In supervised learning, the selection of sample points and models is crucial for acquiring a high level of generalization capability. So far, the problems of active learning and model selection have been studied independently. If sample points and models are optimized simultaneously, a higher level of generalization capability can be expected. We call this problem active learning with model selection. However, it cannot generally be solved by simply combining existing active learning and model selection techniques, because of the active learning/model selection dilemma: the model should be fixed for selecting sample points, and conversely the sample points should be fixed for selecting models. In this paper, we show that the dilemma can be dissolved if there is a set of sample points that is optimal for all models under consideration. Based on this idea, we give a practical procedure for active learning with model selection in trigonometric polynomial models. The effectiveness of the proposed procedure is demonstrated through computer simulations.
Masashi SUGIYAMA Hidemitsu OGAWA
In this paper, we consider the problem of active learning and give a necessary and sufficient condition on sample points for the optimal generalization capability. By utilizing the properties of pseudo-orthogonal bases, we clarify the mechanism by which the optimal generalization capability is achieved. We also show that the condition not only provides the optimal generalization capability but also reduces the computational complexity and memory required to calculate the learning result functions. Based on the optimality condition, we give design methods for optimal sample points in trigonometric polynomial models. Finally, the effectiveness of the proposed active learning method is demonstrated through computer simulations.
Hiroyuki TAKIZAWA Taira NAKAJIMA Hiroaki KOBAYASHI Tadao NAKAMURA
A multilayer perceptron is usually considered a passive learner that only receives given training data. However, if a multilayer perceptron actively gathers training data that resolve its uncertainty about the problem being learned, sufficiently accurate classification can be attained with fewer training data. Recently, such active learning has been receiving increasing interest. In this paper, we propose a novel active learning strategy. The strategy attempts to produce only useful training data for multilayer perceptrons to achieve accurate classification, and avoids generating redundant training data. Furthermore, the strategy attempts to avoid generating temporarily useful training data that will become redundant in the future. As a result, the strategy allows multilayer perceptrons to achieve accurate classification with fewer training data. To demonstrate the performance of the strategy in comparison with other active learning strategies, we also propose an empirical active learning algorithm as an implementation of the strategy, which does not require expensive computations. Experimental results show that the proposed algorithm improves the classification accuracy of a multilayer perceptron with fewer training data than a conventional random selection algorithm that constructs a training data set without an explicit strategy. Moreover, the algorithm outperforms typical active learning algorithms in the experiments. These results show that the algorithm can construct an appropriate training data set at a lower computational cost, which matters because training data generation is usually expensive. Accordingly, the experiments confirm the effectiveness of the strategy. We also discuss some drawbacks of the algorithm.
Sethu VIJAYAKUMAR Hidemitsu OGAWA
In this paper, we discuss the problem of active training data selection for improving the generalization capability of a neural network. We look at the learning problem from a function approximation perspective and formalize it as an inverse problem. Based on this framework, we analytically derive a method of choosing a training data set optimized with respect to the Wiener optimization criterion. The final result uses a priori correlation information on the original function ensemble to devise an efficient sampling scheme which, when used in conjunction with the learning scheme described here, is shown to result in optimal generalization. This result is substantiated through a simulated example and a learning problem in a high-dimensional function space.
Hiroyuki TAKIZAWA Taira NAKAJIMA Masaaki NISHI Hiroaki KOBAYASHI Tadao NAKAMURA
We apply two acceleration techniques for the backpropagation algorithm to an iterative gradient descent algorithm called the network inversion algorithm. Experimental results show that these techniques are also quite effective to decrease the number of iterations required for the detection of input vectors on the classification boundary of a multilayer perceptron.