Kosetsu TSUKUDA Masahiro HAMASAKI Masataka GOTO
For amateur creators, it has become popular to create new content based on existing original work; such new content is called derivative work. Although we know that derivative creation is popular, why is each individual derivative work created? Several factors inspire the creation of derivative works, but such factors usually cannot be observed on the Web. In this paper, we propose a model for inferring latent factors from sequences of derivative work posting events. We assume a sequence to be a stochastic process incorporating three factors: (1) the original work's attractiveness, (2) the original work's popularity, and (3) the derivative work's popularity. To characterize content popularity, we use content ranking data and incorporate rank-biased popularity based on creators' browsing behaviors. Our main contributions are three-fold. First, to the best of our knowledge, this is the first study to model derivative creation activity. Second, using real-world datasets of music-related derivative work creation, we conducted quantitative experiments and showed, in terms of the negative log-likelihood on test data, the effectiveness of adopting all three factors and of considering creators' browsing behaviors. Third, we carried out qualitative experiments and showed that our model is useful for analyzing the following aspects: (1) derivative creation activity in terms of category characteristics, (2) the temporal development of factors that trigger derivative work posting events, (3) creator characteristics, (4) the N-th order derivative creation process, and (5) original work ranking.
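As an illustration of the rank-biased popularity idea, here is a minimal sketch assuming a geometric browsing model, in which a creator scans a ranking from the top and moves to the next rank with a fixed persistence probability; the function names and the `persistence` parameter are hypothetical, not taken from the paper:

```python
def rank_biased_weight(rank, persistence=0.9):
    """Probability that a creator browsing a ranking from the top
    reaches the item at position `rank`, under a geometric
    continuation model (move to the next rank with prob. `persistence`)."""
    return persistence ** (rank - 1)

def popularity_score(daily_ranks, persistence=0.9):
    """Aggregate rank-biased weights over a sequence of ranking
    positions (e.g., an item's daily ranks), so that high ranks
    contribute far more than low ones."""
    return sum(rank_biased_weight(r, persistence) for r in daily_ranks)
```

Under this model an item at rank 1 always counts fully, while an item at rank 10 with persistence 0.9 contributes only about 0.39, reflecting that fewer creators browse that far down.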
Kazuki OTOMO Satoru KOBAYASHI Kensuke FUKUDA Hiroshi ESAKI
System logs are useful for understanding the status of, and detecting faults in, large-scale networks. However, due to the diversity and volume of these logs, log analysis requires much time and effort. In this paper, we propose a log event anomaly detection method for large-scale networks that requires no pre-processing or feature extraction. The key idea is to embed a large amount of diverse data into hidden states by using latent variables. We evaluate our method on 12 months of system logs obtained from a nation-wide academic network in Japan. Through comparisons with Kleinberg's univariate burst detection and a traditional multivariate analysis (i.e., PCA), we demonstrate that our proposed method achieves 14.5% higher recall and 3% higher precision than PCA. A case study shows that the detected anomalies provide effective information for troubleshooting network system faults.
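For context, the PCA baseline this method is compared against is commonly implemented as a residual-subspace (squared prediction error) detector over per-interval log-event count vectors. The following is a generic sketch of that standard technique, not the paper's implementation:

```python
import numpy as np

def pca_anomaly_scores(X, k=2):
    """Residual-subspace anomaly scores (squared prediction error, SPE).

    X: (n_samples, n_features) matrix, e.g., counts of each log event
    type per time interval. Samples poorly explained by the top-k
    principal components receive high scores."""
    Xc = X - X.mean(axis=0)
    # Principal directions via SVD of the centered data matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                      # top-k principal components
    resid = Xc - Xc @ V @ V.T         # projection onto residual subspace
    return (resid ** 2).sum(axis=1)   # SPE per sample
```

A sample lying off the dominant correlation structure (e.g., a burst in one event type that normally co-varies with others) gets a large SPE and is flagged as anomalous.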
Deep graphical models (DGMs) based on generative adversarial nets (GANs) have shown promise in image generation and latent variable inference. A typical model is the iterative adversarial inference model (GibbsNet), which learns the joint distribution between the data and its latent variable. We present RGNet (Re-inference GibbsNet), which introduces a re-inference chain into GibbsNet to improve the quality of generated samples and inferred latent variables. RGNet consists of generative, inference, and discriminative networks. An adversarial game is cast between the generative and inference networks on one side and the discriminative network on the other. The discriminative network is trained to distinguish between (i) the joint inference-latent/data-space pairs and re-inference-latent/data-space pairs and (ii) the joint sampled-latent/generated-data-space pairs. We show empirically that RGNet surpasses GibbsNet in the quality of inferred latent variables and achieves comparable performance on image generation and inpainting tasks.
Ryo MASUMURA Taichi ASAMI Takanobu OBA Hirokazu MASATAKI Sumitaka SAKAUCHI Akinori ITO
This paper proposes a novel domain adaptation method that can utilize out-of-domain text resources and partially domain-matched text resources in language modeling. A major problem in domain adaptation is that it is hard to obtain adequate adaptation effects from out-of-domain text resources. To tackle this problem, our idea is to carry out model merging in a latent variable space created from latent words language models (LWLMs). The latent variables in LWLMs are represented as specific words selected from the observed word space, so LWLMs can share a common latent variable space. This enables us to perform flexible mixture modeling that takes the latent variable space into consideration. This paper presents two types of mixture modeling: LWLM mixture models and LWLM cross-mixture models. The LWLM mixture models perform mixture modeling in the latent word space to mitigate the domain mismatch problem. Furthermore, in the LWLM cross-mixture models, LMs individually constructed from partially matched text resources are split into two element models, each of which can be subjected to mixture modeling. For both approaches, this paper also describes methods to optimize the mixture weights using a validation data set. Experiments show that mixing in the latent word space achieves performance improvements for both the target domain and out-of-domain data compared with mixing in the observed word space.
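The mixture weight optimization on a validation set can be illustrated with the standard EM procedure for linearly interpolated language models; this is a minimal generic sketch under that standard formulation, not necessarily the paper's exact method:

```python
def em_mixture_weights(component_probs, n_iter=50):
    """Optimize linear interpolation weights on held-out data with EM.

    component_probs: list over validation tokens; component_probs[t][i]
    is p_i(w_t | h_t), the probability of the t-th token under the
    i-th component LM. Returns weights maximizing held-out likelihood
    of the mixture p(w|h) = sum_i w_i * p_i(w|h)."""
    k = len(component_probs[0])
    w = [1.0 / k] * k
    for _ in range(n_iter):
        counts = [0.0] * k
        for probs in component_probs:
            # E-step: responsibility of each component for this token.
            denom = sum(wi * pi for wi, pi in zip(w, probs))
            for i, (wi, pi) in enumerate(zip(w, probs)):
                counts[i] += wi * pi / denom
        # M-step: renormalize expected counts into new weights.
        total = sum(counts)
        w = [c / total for c in counts]
    return w
```

Each iteration is guaranteed not to decrease the held-out likelihood, and components that consistently assign higher probability to the validation tokens accumulate larger weights.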
Kosetsu TSUKUDA Keisuke ISHIDA Masahiro HAMASAKI Masataka GOTO
Creating new content based on existing original work is becoming popular, especially among amateur creators. Such new content is called derivative work, and a derivative work can itself be transformed into the next new derivative work; this process is called “N-th order derivative creation.” Although derivative creation is popular, the reason an individual derivative work was created is not observable. To infer the factors that trigger derivative work creation, we have proposed a model that incorporates three factors: (1) the original work's attractiveness, (2) the original work's popularity, and (3) the derivative work's popularity. Based on this model, in this paper we describe Songrium Derivation Factor Analysis, a public web service for browsing derivation factors. Our service is implemented by applying our model to original works and derivative works uploaded to a video sharing service. Songrium Derivation Factor Analysis provides various visualization functions: Original Works Map, Derivation Tree, Popularity Influence Transition Graph, Creator Distribution Map, and Creator Profile. By displaying such information when users browse and watch videos, we aim to enable them to find new content and understand N-th order derivative creation activity at a deeper level.
Takamitsu HASHIMOTO Maomi UENO
Item response theory (IRT) is widely used for test analysis. Most IRT models assume that a subject's responses to different items in a test are statistically independent. However, actual situations often violate this assumption. Thus, conditional independence (CI) tests among items given a latent ability variable are needed, but traditional CI tests suffer from biases. This study investigated a latent conditional independence (LCI) test given a latent variable. Results show that the LCI test can correctly detect CI given a latent variable, whereas traditional CI tests often fail to do so. Applying the LCI test to mathematics test data revealed that items sharing common alternatives may be conditionally dependent.
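To make the problem concrete, a crude traditional-style check can be sketched by stratifying on the rest score (the total score excluding the two items under test) as an observable proxy for the latent ability, then examining within-stratum association. Checks of this proxy-based kind are exactly the ones the paper says suffer from bias, so this is illustrative only; all names are hypothetical:

```python
from collections import defaultdict

def stratified_association(responses, i, j):
    """Crude check of conditional dependence between binary items i and j.

    responses: iterable of 0/1 response vectors, one per subject.
    Stratifies subjects by rest score (total excluding items i and j)
    as a proxy for latent ability, builds a 2x2 table per stratum with
    add-0.5 smoothing, and returns the average within-stratum odds
    ratio; values far from 1 suggest conditional dependence."""
    strata = defaultdict(lambda: [[0.5, 0.5], [0.5, 0.5]])
    for resp in responses:
        rest = sum(v for k, v in enumerate(resp) if k not in (i, j))
        strata[rest][resp[i]][resp[j]] += 1
    ratios = [(t[0][0] * t[1][1]) / (t[0][1] * t[1][0])
              for t in strata.values()]
    return sum(ratios) / len(ratios)
```

Two items that always agree (e.g., because they share a common alternative) yield odds ratios far above 1 within every ability stratum, whereas conditionally independent items yield ratios near 1.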