The search functionality is under construction.

Keyword Search Result

[Keyword] naive Bayes(8hit)

1-8hit
  • Machine Learning-Based Approach for Depression Detection in Twitter Using Content and Activity Features

    Hatoon S. ALSAGRI  Mourad YKHLEF  

     
    PAPER-Data Engineering, Web Information Systems

      Pubricized:
    2020/04/24
      Vol:
    E103-D No:8
      Page(s):
    1825-1832

    Social media channels, such as Facebook, Twitter, and Instagram, have altered our world forever. People are now increasingly connected than ever and reveal a sort of digital persona. Although social media certainly has several remarkable features, the demerits are undeniable as well. Recent studies have indicated a correlation between high usage of social media sites and increased depression. The present study aims to exploit machine learning techniques for detecting a probable depressed Twitter user based on both, his/her network behavior and tweets. For this purpose, we trained and tested classifiers to distinguish whether a user is depressed or not using features extracted from his/her activities in the network and tweets. The results showed that the more features are used, the higher are the accuracy and F-measure scores in detecting depressed users. This method is a data-driven, predictive approach for early detection of depression or other mental illnesses. This study's main contribution is the exploration part of the features and its impact on detecting the depression level.

  • Hardware-Accelerated Secured Naïve Bayesian Filter Based on Partially Homomorphic Encryption

    Song BIAN  Masayuki HIROMOTO  Takashi SATO  

     
    PAPER-Cryptography and Information Security

      Vol:
    E102-A No:2
      Page(s):
    430-439

    In this work, we provide the first practical secure email filtering scheme based on homomorphic encryption. Specifically, we construct a secure naïve Bayesian filter (SNBF) using the Paillier scheme, a partially homomorphic encryption (PHE) scheme. We first show that SNBF can be implemented with only the additive homomorphism, thus eliminating the need to employ expensive fully homomorphic schemes. In addition, the design space for specialized hardware architecture realizing SNBF is explored. We utilize a recursive Karatsuba Montgomery structure to accelerate the homomorphic operations, where multiplication of 2048-bit integers are carried out. Through the experiment, both software and hardware versions of the SNBF are implemented. On software, 104-105x runtime and 103x storage reduction are achieved by SNBF, when compared to existing fully homomorphic approaches. By instantiating the designed hardware for SNBF, a further 33x runtime and 1919x power reduction are achieved. The proposed hardware implementation classifies an average-length email in under 0.5s, which is much more practical than existing solutions.

  • Empirical Studies of a Kernel Density Estimation Based Naive Bayes Method for Software Defect Prediction

    Haijin JI  Song HUANG  Xuewei LV  Yaning WU  Yuntian FENG  

     
    PAPER-Software Engineering

      Pubricized:
    2018/10/03
      Vol:
    E102-D No:1
      Page(s):
    75-84

    Software defect prediction (SDP) plays a significant part in allocating testing resources reasonably, reducing testing costs, and ensuring software quality. One of the most widely used algorithms of SDP models is Naive Bayes (NB) because of its simplicity, effectiveness and robustness. In NB, when a data set has continuous or numeric attributes, they are generally assumed to follow normal distributions and incorporate the probability density function of normal distribution into their conditional probabilities estimates. However, after conducting a Kolmogorov-Smirnov test, we find that the 21 main software metrics follow non-normal distribution at the 5% significance level. Therefore, this paper proposes an improved NB approach, which estimates the conditional probabilities of NB with kernel density estimation of training data sets, to help improve the prediction accuracy of NB for SDP. To evaluate the proposed method, we carry out experiments on 34 software releases obtained from 10 open source projects provided by PROMISE repository. Four well-known classification algorithms are included for comparison, namely Naive Bayes, Support Vector Machine, Logistic Regression and Random Tree. The obtained results show that this new method is more successful than the four well-known classification algorithms in the most software releases.

  • Naive Bayes Classifier Based Partitioner for MapReduce

    Lei CHEN  Wei LU  Ergude BAO  Liqiang WANG  Weiwei XING  Yuanyuan CAI  

     
    PAPER-Graphs and Networks

      Vol:
    E101-A No:5
      Page(s):
    778-786

    MapReduce is an effective framework for processing large datasets in parallel over a cluster. Data locality and data skew on the reduce side are two essential issues in MapReduce. Improving data locality can decrease network traffic by moving reduce tasks to the nodes where the reducer input data is located. Data skew will lead to load imbalance among reducer nodes. Partitioning is an important feature of MapReduce because it determines the reducer nodes to which map output results will be sent. Therefore, an effective partitioner can improve MapReduce performance by increasing data locality and decreasing data skew on the reduce side. Previous studies considering both essential issues can be divided into two categories: those that preferentially improve data locality, such as LEEN, and those that preferentially improve load balance, such as CLP. However, all these studies ignore the fact that for different types of jobs, the priority of data locality and data skew on the reduce side may produce different effects on the execution time. In this paper, we propose a naive Bayes classifier based partitioner, namely, BAPM, which achieves better performance because it can automatically choose the proper algorithm (LEEN or CLP) by leveraging the naive Bayes classifier, i.e., considering job type and bandwidth as classification attributes. Our experiments are performed in a Hadoop cluster, and the results show that BAPM boosts the computing performance of MapReduce. The selection accuracy reaches 95.15%. Further, compared with other popular algorithms, under specific bandwidths, the improvement BAPM achieved is up to 31.31%.

  • Scalable Privacy-Preserving Data Mining with Asynchronously Partitioned Datasets

    Hiroaki KIKUCHI  Daisuke KAGAWA  Anirban BASU  Kazuhiko ISHII  Masayuki TERADA  Sadayuki HONGO  

     
    PAPER-Public Key Based Protocols

      Vol:
    E96-A No:1
      Page(s):
    111-120

    In the Naive Bayes classification problem using a vertically partitioned dataset, the conventional scheme to preserve privacy of each partition uses a secure scalar product and is based on the assumption that the data is synchronized amongst common unique identities. In this paper, we attempt to discard this assumption in order to develop a more efficient and secure scheme to perform classification with minimal disclosure of private data. Our proposed scheme is based on the work by Vaidya and Clifton [2], which uses commutative encryption to perform secure set intersection so that the parties with access to the individual partitions have no knowledge of the intersection. The evaluations presented in this paper are based on experimental results, which show that our proposed protocol scales well with large sparse datasets*.

  • Comparative Analysis of Automatic Exudate Detection between Machine Learning and Traditional Approaches

    Akara SOPHARAK  Bunyarit UYYANONVARA  Sarah BARMAN  Thomas WILLIAMSON  

     
    PAPER-Biological Engineering

      Vol:
    E92-D No:11
      Page(s):
    2264-2271

    To prevent blindness from diabetic retinopathy, periodic screening and early diagnosis are neccessary. Due to lack of expert ophthalmologists in rural area, automated early exudate (one of visible sign of diabetic retinopathy) detection could help to reduce the number of blindness in diabetic patients. Traditional automatic exudate detection methods are based on specific parameter configuration, while the machine learning approaches which seems more flexible may be computationally high cost. A comparative analysis of traditional and machine learning of exudates detection, namely, mathematical morphology, fuzzy c-means clustering, naive Bayesian classifier, Support Vector Machine and Nearest Neighbor classifier are presented. Detected exudates are validated with expert ophthalmologists' hand-drawn ground-truths. The sensitivity, specificity, precision, accuracy and time complexity of each method are also compared.

  • Semi-Supervised Learning to Classify Evaluative Expressions from Labeled and Unlabeled Texts

    Yasuhiro SUZUKI  Hiroya TAKAMURA  Manabu OKUMURA  

     
    PAPER

      Vol:
    E90-D No:10
      Page(s):
    1516-1522

    In this paper, we present a method to automatically acquire a large-scale vocabulary of evaluative expressions from a large corpus of blogs. For the purpose, this paper presents a semi-supervised method for classifying evaluative expressions, that is, tuples of subjects, their attributes, and evaluative words, that indicate either favorable or unfavorable opinions towards a specific subject. Due to its characteristics, our semi-supervised method can classify evaluative expressions in a corpus by their polarities, starting from a very small set of seed training examples and using contextual information in the sentences the expressions belong to. Our experimental results with real Weblog data as our corpus show that this bootstrapping approach can improve the accuracy of methods for classifying favorable and unfavorable opinions. We also show that a reasonable amount of evaluative expressions can be really acquired.

  • Topic Document Model Approach for Naive Bayes Text Classification

    Sang-Bum KIM  Hae-Chang RIM  Jin-Dong KIM  

     
    LETTER-Natural Language Processing

      Vol:
    E88-D No:5
      Page(s):
    1091-1094

    The multinomial naive Bayes model has been widely used for probabilistic text classification. However, the parameter estimation for this model sometimes generates inappropriate probabilities. In this paper, we propose a topic document model for the multinomial naive Bayes text classification, where the parameters are estimated from normalized term frequencies of each training document. Experiments are conducted on Reuters 21578 and 20 Newsgroup collections, and our proposed approach obtained a significant improvement in performance compared to the traditional multinomial naive Bayes.