Masataka ARAKI Marie KATSURAI Ikki OHMUKAI Hideaki TAKEDA
Most existing methods for research collaborator recommendation focus on promoting collaboration within a specific discipline and exploit a network structure derived from co-authorship or co-citation information. To find collaboration opportunities outside researchers' own fields of expertise and beyond their social networks, we present an interdisciplinary collaborator recommendation method based on research content similarity. In the proposed method, we calculate textual features that reflect a researcher's interests using a research grant database. To find the most relevant researchers who work in other fields, we compare two strategies: constructing a pairwise similarity matrix in a feature space, and exploiting existing social networks together with content-based similarity. We present a case study at the Graduate University for Advanced Studies in Japan in which actual collaborations across departments are used as ground truth. The results indicate that our content-based approach predicts interdisciplinary collaboration more accurately than conventional collaboration network-based approaches.
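The pairwise, content-based ranking described above can be illustrated with a minimal sketch. It assumes each researcher is already represented by a numeric feature vector and uses cosine similarity; the abstract does not state the similarity measure, so that choice, the function names, and the toy data are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(target, features, departments, top_k=3):
    """Rank researchers outside the target's department by content similarity."""
    candidates = [
        (cosine(features[target], features[name]), name)
        for name in features
        if name != target and departments[name] != departments[target]
    ]
    # Highest similarity first; ties are broken alphabetically by name.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in candidates[:top_k]]


# Hypothetical toy data: textual feature vectors and department labels.
features = {
    "A": [1.0, 0.0, 1.0],
    "B": [1.0, 0.0, 0.9],
    "C": [0.0, 1.0, 0.0],
    "D": [0.9, 0.1, 1.0],
}
departments = {
    "A": "informatics",
    "B": "informatics",
    "C": "genetics",
    "D": "statistics",
}

ranking = recommend("A", features, departments)
print(ranking)
```

Researcher B is the closest match to A in feature space but is excluded because both belong to the same department, which is the point of restricting candidates to other fields.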
Marie KATSURAI Ikki OHMUKAI Hideaki TAKEDA
It is crucial to promote interdisciplinary research and to recommend collaborators from different research fields via academic database analysis. This paper addresses the problem of characterizing researchers' interests with a set of diverse research topics found in a large-scale academic database. Specifically, we first use latent Dirichlet allocation to extract topics as distributions over words from a training dataset. Then, we convert the textual features of a researcher's publications to topic vectors and calculate the centroid of these vectors to summarize the researcher's interests as a single vector. In experiments conducted on CiNii Articles, which is the largest academic database in Japan, we show that the extracted topics reflect the diversity of the research fields in the database. The experimental results also indicate the applicability of the proposed topic representation to the author disambiguation problem.
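The centroid step can be sketched as follows. The sketch assumes the per-publication topic vectors have already been inferred by a trained latent Dirichlet allocation model; the function name and the three-topic toy vectors are hypothetical.

```python
def centroid(topic_vectors):
    """Average a list of equal-length topic vectors into one interest vector."""
    if not topic_vectors:
        raise ValueError("at least one publication is required")
    n = len(topic_vectors)
    dim = len(topic_vectors[0])
    return [sum(vec[k] for vec in topic_vectors) / n for k in range(dim)]


# Hypothetical topic distributions for three publications by one researcher.
publication_topics = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.7, 0.2, 0.1],
]

interest = centroid(publication_topics)
dominant_topic = max(range(len(interest)), key=interest.__getitem__)
print(interest, dominant_topic)
```

Because each input vector is a probability distribution over topics, their centroid also sums to one and can be compared across researchers directly.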
Hiroyoshi NAGAO Koshiro TAMURA Marie KATSURAI
Danmaku commenting has become popular for co-viewing on video-sharing platforms such as Nicovideo. However, many irrelevant comments degrade the quality of the information provided by videos. This information pollution problem can be addressed by a comment classifier trained with an abstention option, which detects comments whose video categories are unclear. To improve the performance of this classification task, this paper presents Nicovideo-specific language representations. Specifically, we used sentences from Nicopedia, a Japanese online encyclopedia of entities that may appear in Nicovideo content, to pre-train a Bidirectional Encoder Representations from Transformers (BERT) model. The resulting model, named Nicopedia BERT, is then fine-tuned to determine whether a given comment falls into any of the predefined categories. Experiments conducted on Nicovideo comment data demonstrated the effectiveness of Nicopedia BERT compared with existing BERT models pre-trained on Wikipedia or tweets. We also evaluated the performance of each model on an additional sentiment classification task, and the results suggest that Nicopedia BERT is applicable as a feature extractor for other social media text.
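The idea of abstaining on unclear comments can be illustrated with a small sketch. Note that this is a simpler stand-in: it applies a confidence threshold to the output of an already trained classifier, whereas the paper trains the classifier with an abstention option. The logits, labels, and threshold value are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_abstention(logits, labels, threshold=0.6):
    """Return the predicted label, or None when the model is not confident."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None  # abstain: the comment's category is unclear
    return labels[best]


# Hypothetical category labels and classifier outputs for two comments.
labels = ["game", "music", "cooking"]
confident = classify_with_abstention([4.0, 0.5, 0.2], labels)
unclear = classify_with_abstention([1.0, 0.9, 1.1], labels)
print(confident, unclear)
```

Raising the threshold makes the classifier abstain more often, trading coverage for precision on the comments it does label.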
Marie KATSURAI Takahiro OGAWA Miki HASEYAMA
In this paper, a novel framework for extracting visual feature-based keyword relationships from an image database is proposed. Based on the observation that a set of relevant keywords tends to share common visual features, the keyword relationships in a target image database are extracted in two steps. First, the relationship between each keyword and its corresponding visual features is modeled using a classifier. This step enables detection of the visual features related to each keyword. Second, the keyword relationships are extracted from the obtained results. Specifically, to measure the relevance between two keywords, the proposed method removes the visual features related to one keyword from the training images and monitors the performance of the classifier obtained for the other keyword. This measurement is the main difference from conventional methods, which focus only on keyword co-occurrences or visual similarities. Results of experiments conducted using an image database showed the effectiveness of the proposed method.
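The removal-based relevance measure can be sketched under strong simplifying assumptions. Here a nearest-centroid classifier stands in for the paper's classifier, the features "related" to a keyword are taken to be those with the largest gap between class means, and performance is measured as training accuracy. All function names and the toy data are hypothetical.

```python
def class_means(X, y):
    """Per-feature means of the positive and negative images for one keyword."""
    dim = len(X[0])
    pos = [x for x, t in zip(X, y) if t]
    neg = [x for x, t in zip(X, y) if not t]
    mean = lambda rows: [sum(r[k] for r in rows) / len(rows) for k in range(dim)]
    return mean(pos), mean(neg)


def related_features(X, y, top_n=1):
    """Indices of the features that best separate the keyword's classes."""
    pos, neg = class_means(X, y)
    gaps = [abs(p - n) for p, n in zip(pos, neg)]
    return sorted(range(len(gaps)), key=lambda k: -gaps[k])[:top_n]


def accuracy(X, y, keep):
    """Training accuracy of a nearest-centroid classifier using features in keep."""
    pos, neg = class_means(X, y)
    dist = lambda x, c: sum((x[k] - c[k]) ** 2 for k in keep)
    hits = sum((dist(x, pos) < dist(x, neg)) == bool(t) for x, t in zip(X, y))
    return hits / len(X)


def relevance(X, y_a, y_b, top_n=1):
    """Accuracy drop for keyword B after removing features related to keyword A."""
    everything = list(range(len(X[0])))
    removed = set(related_features(X, y_a, top_n))
    kept = [k for k in everything if k not in removed]
    return accuracy(X, y_b, everything) - accuracy(X, y_b, kept)


# Hypothetical visual features for six images and three keyword labelings.
X = [[1.0, 0.0, 0.5], [0.9, 0.1, 0.4], [0.1, 1.0, 0.5],
     [0.0, 0.9, 0.6], [0.1, 0.0, 0.5], [0.0, 0.1, 0.4]]
sea = [1, 1, 0, 0, 0, 0]
beach = [1, 1, 0, 0, 0, 0]
car = [0, 0, 1, 1, 0, 0]

sea_beach = relevance(X, sea, beach)
car_beach = relevance(X, car, beach)
print(sea_beach, car_beach)
```

In this toy data, removing the feature tied to "sea" hurts the "beach" classifier, while removing the feature tied to "car" does not, so "sea" scores as more relevant to "beach" than "car" does.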