1-2hit |
Fumihiro KANEI Daiki CHIBA Kunio HATO Katsunari YOSHIOKA Tsutomu MATSUMOTO Mitsuaki AKIYAMA
While the online advertisement is widely used on the web and on mobile applications, the monetary damages by advertising frauds (ad frauds) have become a severe problem. Countermeasures against ad frauds are evaded since they rely on noticeable features (e.g., burstiness of ad requests) that attackers can easily change. We propose an ad-fraud-detection method that leverages robust features against attacker evasion. We designed novel features on the basis of the statistics observed in an ad network calculated from a large amount of ad requests from legitimate users, such as the popularity of publisher websites and the tendencies of client environments. We assume that attackers cannot know of or manipulate these statistics and that features extracted from fraudulent ad requests tend to be outliers. These features are used to construct a machine-learning model for detecting fraudulent ad requests. We evaluated our proposed method by using ad-request logs observed within an actual ad network. The results revealed that our designed features improved the recall rate by 10% and had about 100,000-160,000 fewer false negatives per day than conventional features based on the burstiness of ad requests. In addition, by evaluating detection performance with long-term dataset, we confirmed that the proposed method is robust against performance degradation over time. Finally, we applied our proposed method to a large dataset constructed on an ad network and found several characteristics of the latest ad frauds in the wild, for example, a large amount of fraudulent ad requests is sent from cloud servers.
Yukihiro TAGAMI Hayato KOBAYASHI Shingo ONO Akira TAJIMA
Modeling user activities on the Web is a key problem for various Web services, such as news article recommendation and ad click prediction. In our work-in-progress paper[1], we introduced an approach that summarizes each sequence of user Web page visits using Paragraph Vector[3], considering users and URLs as paragraphs and words, respectively. The learned user representations are used among the user-related prediction tasks in common. In this paper, on the basis of analysis of our Web page visit data, we propose Backward PV-DM, which is a modified version of Paragraph Vector. We show experimental results on two ad-related data sets based on logs from Web services of Yahoo! JAPAN. Our proposed method achieved better results than those of existing vector models.