Kensuke SUMOTO Kenta KANAKOGI Hironori WASHIZAKI Naohiko TSUDA Nobukazu YOSHIOKA Yoshiaki FUKAZAWA Hideyuki KANUKA
Security-related issues have become more significant due to the proliferation of IT. Collating security-related information in a database improves security. For example, Common Vulnerabilities and Exposures (CVE) is a security knowledge repository containing descriptions of vulnerabilities about software or source code. Although the descriptions include various entities, there is not a uniform entity structure, making security analysis difficult using individual entities. Developing a consistent entity structure will enhance the security field. Herein we propose a method to automatically label select entities from CVE descriptions by applying the Named Entity Recognition (NER) technique. We manually labeled 3287 CVE descriptions and conducted experiments using a machine learning model called BERT to compare the proposed method to labeling with regular expressions. Machine learning using the proposed method significantly improves the labeling accuracy. It has an f1 score of about 0.93, precision of about 0.91, and recall of about 0.95, demonstrating that our method has potential to automatically label select entities from CVE descriptions.
Takehiro TAKAYANAGI Kiyoshi IZUMI
Personalized stock recommendations aim to suggest stocks tailored to individual investor needs, significantly aiding the financial decision making of an investor. This study shows the advantages of incorporating context into personalized stock recommendation systems. We embed item contextual information such as technical indicators, fundamental factors, and business activities of individual stocks. Simultaneously, we consider user contextual information such as investors' personality traits, behavioral characteristics, and attributes to create a comprehensive investor profile. Our model incorporating contextual information, validated on novel stock recommendation tasks, demonstrated a notable improvement over baseline models when incorporating these contextual features. Consistent outperformance across various hyperparameters further underscores the robustness and utility of our model in integrating stocks' features and investors' traits into personalized stock recommendations.
This paper addresses the novel task of detecting chorus sections in English and Japanese lyrics text. Although chorus-section detection using audio signals has been studied, whether chorus sections can be detected from text-only lyrics is an open issue. Another open issue is whether patterns of repeating lyric lines such as those appearing in chorus sections depend on language. To investigate these issues, we propose a neural-network-based model for sequence labeling. It can learn phrase repetition and linguistic features to detect chorus sections in lyrics text. It is, however, difficult to train this model since there was no dataset of lyrics with chorus-section annotations as there was no prior work on this task. We therefore generate a large amount of training data with such annotations by leveraging pairs of musical audio signals and their corresponding manually time-aligned lyrics; we first automatically detect chorus sections from the audio signals and then use their temporal positions to transfer them to the line-level chorus-section annotations for the lyrics. Experimental results show that the proposed model with the generated data contributes to detecting the chorus sections, that the model trained on Japanese lyrics can detect chorus sections surprisingly well in English lyrics, and that patterns of repeating lyric lines are language-independent.
Sachiko KANAMORI Hirotsune SATO Naoya TABATA Ryo NOJIMA
To protect user privacy and establish self-information control rights, service providers must notify users of their privacy policies and obtain their consent in advance. The frameworks that impose these requirements are mandatory. Although originally designed to protect user privacy, obtaining user consent in advance has become a mere formality. These problems are induced by the gap between service providers' privacy policies, which prioritize the observance of laws and guidelines, and user expectations which are to easily understand how their data will be handled. To reduce this gap, we construct a tool supporting users in reading privacy policies in Japanese. We designed the tool to present users with separate unique expressions containing relevant information to improve the display format of the privacy policy and render it more comprehensive for Japanese users. To accurately extract the unique expressions from privacy policies, we created training data for machine learning for the constructed tool. The constructed tool provides a summary of privacy policies for users to help them understand the policies of interest. Subsequently, we assess the effectiveness of the constructed tool in experiments and follow-up questionnaires. Our findings reveal that the constructed tool enhances the users' subjective understanding of the services they read about and their awareness of the related risks. We expect that the developed tool will help users better understand the privacy policy content and and make educated decisions based on their understanding of how service providers intend to use their personal data.
Binggang ZHUO Masaki MURATA Qing MA
Paragraph segmentation is a text segmentation task. Iikura et al. achieved excellent results on paragraph segmentation by introducing focal loss to Bidirectional Encoder Representations from Transformers. In this study, we investigated paragraph segmentation on Daily News and Novel datasets. Based on the approach proposed by Iikura et al., we used auxiliary loss to train the model to improve paragraph segmentation performance. Consequently, the average F1-score obtained by the approach of Iikura et al. was 0.6704 on the Daily News dataset, whereas that of our approach was 0.6801. Our approach thus improved the performance by approximately 1%. The performance improvement was also confirmed on the Novel dataset. Furthermore, the results of two-tailed paired t-tests indicated that there was a statistical significance between the performance of the two approaches.
Rizal Setya PERDANA Yoshiteru ISHIDA
This study presents a formulation for generating context-aware natural language by machine from visual representation. Given an image sequence input, the visual storytelling task (VST) aims to generate a coherent, object-focused, and contextualized sentence story. Previous works in this domain faced a problem in modeling an architecture that works in temporal multi-modal data, which led to a low-quality output, such as low lexical diversity, monotonous sentences, and inaccurate context. This study introduces a further improvement, that is, an end-to-end architecture, called cross-modal contextualize attention, optimized to extract visual-temporal features and generate a plausible story. Visual object and non-visual concept features are encoded from the convolutional feature map, and object detection features are joined with language features. Three scenarios are defined in decoding language generation by incorporating weights from a pre-trained language generation model. Extensive experiments are conducted to confirm that the proposed model outperforms other models in terms of automatic metrics and manual human evaluation.
Since cyber attacks such as cyberterrorism against Industrial Control Systems (ICSs) and cyber espionage against companies managing them have increased, the techniques to detect anomalies in early stages are required. To achieve the purpose, several studies have developed anomaly detection methods for ICSs. In particular, some techniques using packet flow regularity in industrial control networks have achieved high-accuracy detection of attacks disrupting the regularity, i.e. normal behaviour, of ICSs. However, these methods cannot identify scanning attacks employed in cyber espionage because the probing packets assimilate into a number of normal ones. For example, the malware called Havex is customised to clandestinely acquire information from targeting ICSs using general request packets. The techniques to detect such scanning attacks using widespread packets await further investigation. Therefore, the goal of this study was to examine high performance methods to identify anomalies even if elaborate packets to avoid alert systems were employed for attacks against industrial control networks. In this paper, a novel detection model for anomalous packets concealing behind normal traffic in industrial control networks was proposed. For the proposal of the sophisticated detection method, we took particular note of packet flow regularity and employed the Markov-chain model to detect anomalies. Moreover, we regarded not only original packets but similar ones to them as normal packets to reduce false alerts because it was indicated that an anomaly detection model using the Markov-chain suffers from the ample false positives affected by a number of normal, irregular packets, namely noise. To calculate the similarity between packets based on the packet flow regularity, a vector representation tool called word2vec was employed. Whilst word2vec is utilised for the culculation of word similarity in natural language processing tasks, we applied the technique to packets in ICSs to calculate packet similarity. As a result, the Markov-chain with word2vec model identified scanning packets assimulating into normal packets in higher performance than the conventional Markov-chain model. In conclusion, employing both packet flow regularity and packet similarity in industrial control networks contributes to improving the performance of anomaly detection in ICSs.
Takeshi HOMMA Yasunari OBUCHI Kazuaki SHIMA Rintaro IKESHITA Hiroaki KOKUBO Takuya MATSUMOTO
For voice-enabled car navigation systems that use a multi-purpose cloud speech recognition service (cloud ASR), utterance classification that is robust against speech recognition errors is needed to realize a user-friendly voice interface. The purpose of this study is to improve the accuracy of utterance classification for voice-enabled car navigation systems when inputs to a classifier are error-prone speech recognition results obtained from a cloud ASR. The role of utterance classification is to predict which car navigation function a user wants to execute from a spontaneous utterance. A cloud ASR causes speech recognition errors due to the noises that occur when traveling in a car, and the errors degrade the accuracy of utterance classification. There are many methods for reducing the number of speech recognition errors by modifying the inside of a speech recognizer. However, application developers cannot apply these methods to cloud ASRs because they cannot customize the ASRs. In this paper, we propose a system for improving the accuracy of utterance classification by modifying both speech-signal inputs to a cloud ASR and recognized-sentence outputs from an ASR. First, our system performs speech enhancement on a user's utterance and then sends both enhanced and non-enhanced speech signals to a cloud ASR. Speech recognition results from both speech signals are merged to reduce the number of recognition errors. Second, to reduce that of utterance classification errors, we propose a data augmentation method, which we call “optimal doping,” where not only accurate transcriptions but also error-prone recognized sentences are added to training data. An evaluation with real user utterances spoken to car navigation products showed that our system reduces the number of utterance classification errors by 54% from a baseline condition. Finally, we propose a semi-automatic upgrading approach for classifiers to benefit from the improved performance of cloud ASRs.
Bo SUN Akinori FUJINO Tatsuya MORI Tao BAN Takeshi TAKAHASHI Daisuke INOUE
Analyzing a malware sample requires much more time and cost than creating it. To understand the behavior of a given malware sample, security analysts often make use of API call logs collected by the dynamic malware analysis tools such as a sandbox. As the amount of the log generated for a malware sample could become tremendously large, inspecting the log requires a time-consuming effort. Meanwhile, antivirus vendors usually publish malware analysis reports (vendor reports) on their websites. These malware analysis reports are the results of careful analysis done by security experts. The problem is that even though there are such analyzed examples for malware samples, associating the vendor reports with the sandbox logs is difficult. This makes security analysts not able to retrieve useful information described in vendor reports. To address this issue, we developed a system called AMAR-Generator that aims to automate the generation of malware analysis reports based on sandbox logs by making use of existing vendor reports. Aiming at a convenient assistant tool for security analysts, our system employs techniques including template matching, API behavior mapping, and malicious behavior database to produce concise human-readable reports that describe the malicious behaviors of malware programs. Through the performance evaluation, we first demonstrate that AMAR-Generator can generate human-readable reports that can be used by a security analyst as the first step of the malware analysis. We also demonstrate that AMAR-Generator can identify the malicious behaviors that are conducted by malware from the sandbox logs; the detection rates are up to 96.74%, 100%, and 74.87% on the sandbox logs collected in 2013, 2014, and 2015, respectively. We also present that it can detect malicious behaviors from unknown types of sandbox logs.
Maoxi LI Qingyu XIANG Zhiming CHEN Mingwen WANG
The-state-of-the-art neural quality estimation (QE) of machine translation model consists of two sub-networks that are tuned separately, a bidirectional recurrent neural network (RNN) encoder-decoder trained for neural machine translation, called the predictor, and an RNN trained for sentence-level QE tasks, called the estimator. We propose to combine the two sub-networks into a whole neural network, called the unified neural network. When training, the bidirectional RNN encoder-decoder are initialized and pre-trained with the bilingual parallel corpus, and then, the networks are trained jointly to minimize the mean absolute error over the QE training samples. Compared with the predictor and estimator approach, the use of a unified neural network helps to train the parameters of the neural networks that are more suitable for the QE task. Experimental results on the benchmark data set of the WMT17 sentence-level QE shared task show that the proposed unified neural network approach consistently outperforms the predictor and estimator approach and significantly outperforms the other baseline QE approaches.
Katsuhide FUJITA Ryosuke WATANABE
Recently, the opportunity to discuss topics on a variety of online discussion bulletin boards has been increasing. However, it can be difficult to understand the contents of each discussion as the number of posts increases. Therefore, it is important to generate headlines that can automatically summarize each post in order to understand the contents of each discussion at a glance. In this paper, we propose a method to extract and generate post headlines for online discussion bulletin boards, automatically. We propose templates with multiple patterns to extract important sentences from the posts. In addition, we propose a method to generate headlines by matching the templates with the patterns. Then, we evaluate the effectiveness of our proposed method using questionnaires.
Kento WATANABE Yuichiroh MATSUBAYASHI Kentaro INUI Satoru FUKAYAMA Tomoyasu NAKANO Masataka GOTO
This paper addresses the issue of modeling the discourse nature of lyrics and presented the first study aiming at capturing the two common discourse-related notions: storylines and themes. We assume that a storyline is a chain of transitions over topics of segments and a song has at least one entire theme. We then hypothesize that transitions over topics of lyric segments can be captured by a probabilistic topic model which incorporates a distribution over transitions of latent topics and that such a distribution of topic transitions is affected by the theme of lyrics. Aiming to test those hypotheses, this study conducts experiments on the word prediction and segment order prediction tasks exploiting a large-scale corpus of popular music lyrics for both English and Japanese (around 100 thousand songs). The findings we gained from these experiments can be summarized into two respects. First, the models with topic transitions significantly outperformed the model without topic transitions in word prediction. This result indicates that typical storylines included in our lyrics datasets were effectively captured as a probabilistic distribution of transitions over latent topics of segments. Second, the model incorporating a latent theme variable on top of topic transitions outperformed the models without such variables in both word prediction and segment order prediction. From this result, we can conclude that considering the notion of theme does contribute to the modeling of storylines of lyrics.
Liang-Chun CHEN Chien-Lung HSU Nai-Wei LO Kuo-Hui YEH Ping-Hsien LIN
With the successful development and rapid advancement of social networking technology, people tend to exchange and share information via online social networks, such as Facebook and LINE.Massive amounts of information are aggregated promptly and circulated quickly among people. However, with the enormous volume of human-interactions, various types of swindles via online social networks have been launched in recent years. Effectively detecting fraudulent activities on social networks has taken on increased importance, and is a topic of ongoing interest. In this paper, we develop a fraud analysis and detection system based on real-time messaging communications, which constitute one of the most common human-interacted services of online social networks. An integrated platform consisting of various text-mining techniques, such as natural language processing, matrix processing and content analysis via a latent semantic model, is proposed. In the system implementation, we first collect a series of fraud events, all of which happened in Taiwan, to construct analysis modules for detecting such fraud events. An Android-based application is then built for alert notification when dubious logs and fraud events happen.
Satoshi MASUDA Tohru MATSUODANI Kazuhiko TSUDA
In the early phases of the system development process, stakeholders exchange ideas and describe requirements in natural language. Requirements described in natural language tend to be vague and include logical inconsistencies, whereas logical consistency is the key to raising the quality and lowering the cost of system development. Hence, it is important to find logical inconsistencies in the whole requirements at this early stage. In verification and validation of the requirements, there are techniques to derive logical formulas from natural language requirements and evaluate their inconsistencies automatically. Users manually chunk the requirements by paragraphs. However, paragraphs do not always represent logical chunks. There can be only one logical chunk over some paragraphs on the other hand some logical chunks in one paragraph. In this paper, we present a practical approach to detecting logical inconsistencies by clustering technique in natural language requirements. Software requirements specifications (SRSs) are the target document type. We use k-means clustering to cluster chunks of requirements and develop semantic role labeling rules to derive “conditions” and “actions” as semantic roles from the requirements by using natural language processing. We also construct an abstraction grammar to transform the conditions and actions into logical formulas. By evaluating the logical formulas with input data patterns, we can find logical inconsistencies. We implemented our approach and conducted experiments on three case studies of requirements written in natural English. The results indicate that our approach can find logical inconsistencies.
Sentence similarity computation is an increasingly important task in applications of natural language processing such as information retrieval, machine translation, text summarization and so on. From the viewpoint of information theory, the essential attribute of natural language is that the carrier of information and the capacity of information can be measured by information content which is already successfully used for word similarity computation in simple ways. Existing sentence similarity methods don't emphasize the information contained by the sentence, and the complicated models they employ often need using empirical parameters or training parameters. This paper presents a fully unsupervised computational model of sentence semantic similarity. It is also a simply and straightforward model that neither needs any empirical parameter nor rely on other NLP tools. The method can obtain state-of-the-art experimental results which show that sentence similarity evaluated by the model is closer to human judgment than multiple competing baselines. The paper also tests the proposed model on the influence of external corpus, the performance of various sizes of the semantic net, and the relationship between efficiency and accuracy.
Yoko NAKAJIMA Michal PTASZYNSKI Hirotoshi HONMA Fumito MASUI
In everyday life, people use past events and their own knowledge in predicting probable unfolding of events. To obtain the necessary knowledge for such predictions, newspapers and the Internet provide a general source of information. Newspapers contain various expressions describing past events, but also current and future events, and opinions. In our research we focused on automatically obtaining sentences that make reference to the future. Such sentences can contain expressions that not only explicitly refer to future events, but could also refer to past or current events. For example, if people read a news article that states “In the near future, there will be an upward trend in the price of gasoline,” they may be likely to buy gasoline now. However, if the article says “The cost of gasoline has just risen 10 yen per liter,” people will not rush to buy gasoline, because they accept this as reality and may expect the cost to decrease in the future. In the following study we firstly investigate future reference sentences in newspapers and Web news. Next, we propose a method for automatic extraction of such sentences by using semantic role labels, without typical approaches (temporal expressions, etc.). In a series of experiments, we extract semantic role patterns from future reference sentences and examine the validity of the extracted patterns in classification of future reference sentences.
Junhwi CHOI Seonghan RYU Kyusong LEE Gary Geunbae LEE
We propose a one-step error detection and correction interface for a voice word processor. This correction interface performs analysis region detection, user intention understanding and error correction utterance recognition, all from a single user utterance input. We evaluate the performance of each component first, and then compare the effectiveness of our interface to two previous interfaces. Our evaluation demonstrates that each component is technically superior to the baselines and that our one-step error detection and correction method yields an error correction interface that is more convenient and natural than the two previous interfaces.
To understand human emotion, it is necessary to be aware of the surrounding situation and individual personalities. In most previous studies, however, these important aspects were not considered. Emotion recognition has been considered as a classification problem. In this paper, we attempt new approaches to utilize a person's situational information and personality for use in understanding emotion. We propose a method of extracting situational information and building a personalized emotion model for reflecting the personality of each character in the text. To extract and utilize situational information, we propose a situation model using lexical and syntactic information. In addition, to reflect the personality of an individual, we propose a personalized emotion model using KBANN (Knowledge-based Artificial Neural Network). Our proposed system has the advantage of using a traditional keyword-spotting algorithm. In addition, we also reflect the fact that the strength of emotion decreases over time. Experimental results show that the proposed system can more accurately and intelligently recognize a person's emotion than previous methods.
Yoonjae CHOI Pum-Mo RYU Hyunki KIM Changki LEE
Event extraction is vital to social media monitoring and social event prediction. In this paper, we propose a method for social event extraction from web documents by identifying binary relations between named entities. There have been many studies on relation extraction, but their aims were mostly academic. For practical application, we try to identify 130 relation types that comprise 31 predefined event types, which address business and public issues. We use structured Support Vector Machine, the state of the art classifier to capture relations. We apply our method on news, blogs and tweets collected from the Internet and discuss the results.
Learning to rank refers to machine learning techniques for training the model in a ranking task. Learning to rank is useful for many applications in Information Retrieval, Natural Language Processing, and Data Mining. Intensive studies have been conducted on the problem and significant progress has been made [1],[2]. This short paper gives an introduction to learning to rank, and it specifically explains the fundamental problems, existing approaches, and future work of learning to rank. Several learning to rank methods using SVM techniques are described in details.