
Keyword Search Result

[Keyword] ensemble learning (10 hits)

1-10 hits
  • Weighted Generalized Hesitant Fuzzy Sets and Its Application in Ensemble Learning Open Access

    Haijun ZHOU  Weixiang LI  Ming CHENG  Yuan SUN  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2024/01/22
      Vol:
    E107-D No:5
      Page(s):
    694-703

    Traditional intuitionistic fuzzy sets and hesitant fuzzy sets lose some information when representing vague information. To avoid this problem, this paper constructs weighted generalized hesitant fuzzy sets by retaining multiple intuitionistic fuzzy values and assigning them corresponding weights. For the weighted generalized hesitant fuzzy elements in these sets, the paper defines basic operations and proves their properties. On this basis, it gives comparison rules for weighted generalized hesitant fuzzy elements and presents two kinds of aggregation operators. For the weighted generalized hesitant fuzzy preference relation, the paper proposes a definition and a method for computing the corresponding consistency index. Furthermore, the paper designs an ensemble learning algorithm based on weighted generalized hesitant fuzzy sets, carries out experiments on 6 datasets from the UCI database, and compares it with various classification algorithms. The experiments show that the ensemble learning algorithm based on weighted generalized hesitant fuzzy sets performs better on all indicators.

  • Ensemble Malware Classifier Considering PE Section Information

    Ren TAKEUCHI  Rikima MITSUHASHI  Masakatsu NISHIGAKI  Tetsushi OHKI  

     
    PAPER

      Publicized:
    2023/09/19
      Vol:
    E107-A No:3
      Page(s):
    306-318

    The war between cyber attackers and security analysts is gradually intensifying. Owing to the ease of obtaining and creating support tools, recent malware continues to diversify into variants and new species. This increases the burden on security analysts and hinders quick analysis. Identifying malware families is crucial for efficiently analyzing diversified malware; thus, numerous low-cost, general-purpose, deep-learning-based classification techniques have been proposed in recent years. Among these methods, malware images that represent binary features as images are often used. However, no models or architectures specific to malware classification have been proposed in previous studies. Herein, we conduct a detailed analysis of the behavior and structure of malware and focus on PE sections that capture the unique characteristics of malware. First, we validate the features of each PE section that can distinguish malware families. Then, we identify PE sections that contain adequate features to classify families. Further, we propose an ensemble learning-based classification method that combines features of highly discriminative PE sections to improve classification accuracy. Validation on two datasets confirms that the proposed method improves accuracy over the baseline, thereby emphasizing its importance.

  • Ensemble Learning in CNN Augmented with Fully Connected Subnetworks

    Daiki HIRATA  Norikazu TAKAHASHI  

     
    LETTER-Biocybernetics, Neurocomputing

      Publicized:
    2023/04/05
      Vol:
    E106-D No:7
      Page(s):
    1258-1261

    Convolutional Neural Networks (CNNs) have shown remarkable performance in image recognition tasks. In this letter, we propose a new CNN model called the EnsNet, which is composed of one base CNN and multiple Fully Connected SubNetworks (FCSNs). In this model, the set of feature maps generated by the last convolutional layer in the base CNN is divided along channels into disjoint subsets, and these subsets are assigned to the FCSNs. Each of the FCSNs is trained independently of the others so that it can predict the class label of each feature map in the subset assigned to it. The output of the overall model is determined by a majority vote of the base CNN and the FCSNs. Experimental results using the MNIST, Fashion-MNIST and CIFAR-10 datasets show that the proposed approach further improves the performance of CNNs. In particular, an EnsNet achieves a state-of-the-art error rate of 0.16% on MNIST.
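
    The channel split and the final vote described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the contiguous partitioning of channels, and the tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter


def split_channels(num_channels, num_subsets):
    """Partition channel indices 0..num_channels-1 into disjoint subsets.

    Assumption: subsets are contiguous and as equal in size as possible.
    The abstract only states that the subsets are disjoint.
    """
    base, extra = divmod(num_channels, num_subsets)
    subsets, start = [], 0
    for i in range(num_subsets):
        size = base + (1 if i < extra else 0)
        subsets.append(list(range(start, start + size)))
        start += size
    return subsets


def majority_vote(predictions):
    """Return the most frequent class label among the voters.

    Assumption: ties go to the label that appears first in the list.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical usage: 10 feature-map channels shared among 3 subnetworks,
# then a vote over the base CNN's label followed by the subnetworks' labels.
channel_subsets = split_channels(10, 3)
final_label = majority_vote([3, 3, 8, 3, 5])
```

    Placing the base CNN's prediction first in the list means it wins ties under the assumed tie-breaking rule.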

  • A KPI Anomaly Detection Method Based on Fast Clustering

    Yun WU  Yu SHI  Jieming YANG  Lishan BAO  Chunzhe LI  

     
    PAPER

      Publicized:
    2022/05/27
      Vol:
    E105-B No:11
      Page(s):
    1309-1317

    In Artificial Intelligence for IT Operations scenarios, the KPI (Key Performance Indicator) is a very important operation and maintenance monitoring indicator, and KPI anomaly detection has become a research hot spot in recent years. To address the low detection efficiency and insufficient representation learning of existing methods, this paper proposes HCE-DWL, a fast clustering-based KPI anomaly detection method. The method first combines hierarchical agglomerative clustering (HAC) with deep assignment based on CNN-Embedding (CE) to perform cluster analysis (HCE) on KPI data, which improves the clustering efficiency. It then assigns separate weights to the centroid of each KPI cluster and to its Transformed Outlier Scores (TOS), and finally feeds them into the LightGBM model for detection (the Double Weight LightGBM model, referred to as DWL). Comparative experiments show that the algorithm effectively improves the efficiency and accuracy of KPI anomaly detection.

  • An Interpretable Feature Selection Based on Particle Swarm Optimization

    Yi LIU  Wei QIN  Qibin ZHENG  Gensong LI  Mengmeng LI  

     
    LETTER-Pattern Recognition

      Publicized:
    2022/05/09
      Vol:
    E105-D No:8
      Page(s):
    1495-1500

    Feature selection based on particle swarm optimization is often employed to improve the performance of artificial intelligence algorithms. However, its interpretability has lacked concrete research. Improving the stability of the feature selection method is an effective way to improve its interpretability. A novel feature selection approach named Interpretable Particle Swarm Optimization is developed in this paper. It uses four data perturbation methods and three filter feature selection methods to obtain stable feature subsets, and adopts the Fuch map to convert them to initial particles. In addition, it employs a similarity mutation strategy, which applies the Tanimoto distance to choose the nearest 1/3 of individuals to the previous particles to implement mutation. Eleven representative algorithms and four typical datasets are used for a comprehensive comparison with the proposed approach. Accuracy, F1, precision and recall are used as classification measures, and an extension of the Kuncheva indicator is employed as the stability measure. Experiments show that our method has better interpretability than the compared evolutionary algorithms. Furthermore, the classification measures demonstrate that the proposed approach has excellent overall classification performance.

  • Exploration into Gray Area: Toward Efficient Labeling for Detecting Malicious Domain Names

    Naoki FUKUSHI  Daiki CHIBA  Mitsuaki AKIYAMA  Masato UCHIDA  

     
    PAPER

      Publicized:
    2019/10/08
      Vol:
    E103-B No:4
      Page(s):
    375-388

    In this paper, we propose a method to reduce the labeling cost while acquiring training data for a malicious domain name detection system using supervised machine learning. In the conventional systems, to train a classifier with high classification accuracy, large quantities of benign and malicious domain names need to be prepared as training data. In general, malicious domain names are observed less frequently than benign domain names. Therefore, it is difficult to acquire a large number of malicious domain names without a dedicated labeling method. We propose a method based on active learning that labels data around the decision boundary of classification, i.e., in the gray area, and we show that the classification accuracy can be improved by using approximately 1% of the training data used by the conventional systems. Another disadvantage of the conventional system is that if the classifier is trained with a small amount of training data, its generalization ability cannot be guaranteed. We propose a method based on ensemble learning that integrates multiple classifiers, and we show that the classification accuracy can be stabilized and improved. The combination of the two methods proposed here allows us to develop a new system for malicious domain name detection with high classification accuracy and generalization ability by labeling a small amount of training data.
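
    The idea of labeling only data near the decision boundary can be sketched as a simple uncertainty-sampling step. This is an illustration under stated assumptions, not the paper's procedure: it assumes the classifier outputs a probability of being malicious, that the boundary sits at 0.5, and that a fixed labeling budget is given. The function name `select_gray_area` is hypothetical.

```python
def select_gray_area(probabilities, budget):
    """Return indices of the `budget` samples closest to the decision boundary.

    `probabilities[i]` is the classifier's estimated probability that
    domain name i is malicious (assumed input format). Samples whose
    probability is nearest 0.5 are treated as the "gray area".
    """
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:budget]


# Hypothetical usage: pick the 2 most ambiguous of 5 unlabeled domain names.
to_label = select_gray_area([0.02, 0.48, 0.97, 0.60, 0.51], budget=2)
```

    The selected indices would then be sent for manual labeling and added to the training data before the classifier is retrained.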

  • AI@ntiPhish — Machine Learning Mechanisms for Cyber-Phishing Attack

    Yu-Hung CHEN  Jiann-Liang CHEN  

     
    INVITED PAPER

      Publicized:
    2019/02/18
      Vol:
    E102-D No:5
      Page(s):
    878-887

    This study proposes a novel machine learning architecture and various learning algorithms to build anti-phishing services that defend against cyber-phishing attacks. With the rapid development of information technology, hackers engage in cyber-phishing attacks to steal important personal information, which raises information security concerns. The prevention of phishing websites involves various aspects, for example, user training, public awareness, fraudulent phishing, etc. However, recent phishing research has mainly focused on preventing fraudulent phishing and relied on manual identification, which is inefficient for real-time detection systems. In this study, we used methods such as ANOVA, χ², and information gain to evaluate features. Then, we filtered out the unrelated features and kept the 28 most related features for the training and evaluation of traditional machine learning algorithms, such as Support Vector Machine (SVM) with linear or RBF kernels, Logistic Regression (LR), Decision Tree, and K-Nearest Neighbor (KNN). This research also evaluated the above algorithms with the ensemble learning concept by combining multiple classifiers, such as AdaBoost, bagging, and voting. Finally, the eXtreme Gradient Boosting (XGBoost) model exhibited the best performance of 99.2% among the algorithms considered in this study.
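
    One of the feature-evaluation criteria named in this abstract, information gain, can be sketched for discrete features as below, followed by a top-k filter. This is a minimal illustration, not the study's pipeline: the feature names `has_ip` and `noise` are invented, and the study keeps 28 features where this example keeps 1.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(columns, labels, k):
    """Rank features by information gain and keep the k highest."""
    scores = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 1 = phishing, 0 = legitimate.
labels = [1, 1, 0, 0]
columns = {"has_ip": [1, 1, 0, 0], "noise": [1, 0, 1, 0]}
selected = top_k_features(columns, labels, k=1)
```

    A feature that separates the classes perfectly receives the full label entropy as its gain, while a feature that is independent of the labels receives zero.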

  • Unsupervised Weight Parameter Estimation for Exponential Mixture Distribution Based on Symmetric Kullback-Leibler Divergence

    Masato UCHIDA  

     
    LETTER-Information Theory

      Vol:
    E98-A No:11
      Page(s):
    2349-2353

    When there are multiple component predictors, it is promising to integrate them into one predictor for advanced reasoning. If each component predictor is given as a stochastic model in the form of a probability distribution, an exponential mixture of the component probability distributions provides a good way to integrate them. However, the weight parameters used in the exponential mixture model are difficult to estimate if there are no training samples for performance evaluation. As a suboptimal way to solve this problem, the weight parameters may be estimated so that the exponential mixture model is a balance point, defined as an equilibrium point with respect to the distance from/to all component probability distributions. In this paper, we propose a weight parameter estimation method that represents this concept using a symmetric Kullback-Leibler divergence, and we generalize this method.
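
    For discrete distributions, the two ingredients named in this abstract can be sketched as follows: an exponential mixture (a normalized weighted geometric mean of the components) and the symmetric Kullback-Leibler divergence. This is only an illustration of the definitions, not the paper's estimation method. It assumes strictly positive probabilities and non-negative weights that sum to 1, and the function names are hypothetical.

```python
import math


def exponential_mixture(distributions, weights):
    """Normalized product of component probabilities raised to their weights."""
    size = len(distributions[0])
    unnormalized = [
        math.prod(p[x] ** w for p, w in zip(distributions, weights))
        for x in range(size)
    ]
    z = sum(unnormalized)  # normalizing constant
    return [u / z for u in unnormalized]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats; assumes q[x] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetric divergence D(p || q) + D(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical usage: two component predictors over three outcomes.
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
mixture = exponential_mixture([p, q], [0.5, 0.5])
distances = [symmetric_kl(mixture, p), symmetric_kl(mixture, q)]
```

    Comparing the entries of `distances` for different weight vectors gives a rough feel for how the mixture moves between the components. How the paper turns this into an equilibrium condition is not specified in the abstract.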

  • Ensemble Learning Based Segmentation of Metastatic Liver Tumours in Contrast-Enhanced Computed Tomography Open Access

    Akinobu SHIMIZU  Takuya NARIHIRA  Hidefumi KOBATAKE  Daisuke FURUKAWA  Shigeru NAWANO  Kenji SHINOZAKI  

     
    LETTER-Medical Image Processing

      Vol:
    E96-D No:4
      Page(s):
    864-868

    This paper presents an ensemble learning algorithm for liver tumour segmentation from a CT volume in the form of U-Boost and extends the loss functions to improve performance. Five segmentation algorithms trained by the ensemble learning algorithm with different loss functions are compared in terms of error rate and Jaccard Index between the extracted regions and true ones.
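
    The Jaccard Index used here to compare extracted regions with true ones can be sketched as below. This is a minimal illustration that represents each region as a collection of voxel coordinates. The function name and the convention for two empty regions are assumptions.

```python
def jaccard_index(extracted, truth):
    """Size of the intersection divided by the size of the union.

    Both arguments are iterables of hashable voxel coordinates.
    Assumption: two empty regions are treated as a perfect match (1.0).
    """
    extracted, truth = set(extracted), set(truth)
    union = extracted | truth
    if not union:
        return 1.0
    return len(extracted & truth) / len(union)


# Hypothetical usage with (z, y, x) voxel coordinates.
extracted_region = {(0, 0, 0), (0, 0, 1), (0, 1, 1)}
true_region = {(0, 0, 1), (0, 1, 1), (1, 1, 1)}
score = jaccard_index(extracted_region, true_region)
```

    A score of 1.0 means the two regions coincide exactly and 0.0 means they do not overlap at all.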

  • A Corpus-Based Approach for Automatic Thai Unknown Word Recognition Using Boosting Techniques

    Jakkrit TECHO  Cholwich NATTEE  Thanaruk THEERAMUNKONG  

     
    PAPER-Unknown Word Processing

      Vol:
    E92-D No:12
      Page(s):
    2321-2333

    While classification techniques can be applied to automatic unknown word recognition in a language without word boundaries, they face the problem of unbalanced datasets, where the number of positive unknown word candidates is far smaller than that of negative candidates. To solve this problem, this paper presents a corpus-based approach that introduces a so-called group-based ranking evaluation technique into ensemble learning in order to generate a sequence of classification models that later collaborate to select the most probable unknown word from multiple candidates. Given a classification model, the group-based ranking evaluation (GRE) is applied to construct a training dataset for learning the succeeding model, by weighting each of its candidates according to their ranks and correctness when the candidates of an unknown word are considered as one group. A number of experiments have been conducted on a large Thai medical text to evaluate the performance of the proposed group-based ranking evaluation approach, namely V-GRE, compared to the conventional naive Bayes classifier and our vanilla version without ensemble learning. As a result, the proposed method achieves an accuracy of 90.93±0.50% when the first rank is selected and 97.26±0.26% when the top-ten candidates are considered, that is, improvements of 8.45% and 6.79% over the conventional record-based naive Bayes classifier and the vanilla version. Another result, obtained by applying only the best features, shows 93.93±0.22% and up to 98.85±0.15% accuracy for top-1 and top-10, respectively. These are improvements of 3.97% and 9.78% over naive Bayes and the vanilla version. Finally, an error analysis is given.