
Author Search Result

[Author] Masataka GOTO (9 hits)

  • Improvements of Voice Timbre Control Based on Perceived Age in Singing Voice Conversion

    Kazuhiro KOBAYASHI  Tomoki TODA  Tomoyasu NAKANO  Masataka GOTO  Satoshi NAKAMURA  

     
    PAPER-Speech and Hearing

    Publicized: 2016/07/21  Vol: E99-D  No: 11  Page(s): 2767-2777

    As one of the techniques enabling individual singers to produce varieties of voice timbre beyond their own physical constraints, a statistical voice timbre control technique based on the perceived age has been developed. In this technique, the perceived age of a singing voice, which is the age of the singer as perceived by the listener, is used as one of the intuitively understandable measures to describe voice characteristics of the singing voice. The use of statistical voice conversion (SVC) with a singer-dependent multiple-regression Gaussian mixture model (MR-GMM), which effectively models the voice timbre variations caused by a change of the perceived age, makes it possible for individual singers to manipulate the perceived ages of their own singing voices while retaining their own singer identities. However, several issues remain, e.g., 1) the controllable range of the perceived age is limited; 2) the quality of the converted singing voice is significantly degraded compared to that of a natural singing voice; and 3) each singer needs to sing the same phrase set as sung by a reference singer to develop the singer-dependent MR-GMM. To address these issues, we propose the following three methods: 1) a method using gender-dependent modeling to expand the controllable range of the perceived age; 2) a method using direct waveform modification based on spectrum differential to improve the quality of the converted singing voice; and 3) a rapid unsupervised adaptation method based on maximum a posteriori (MAP) estimation to easily develop the singer-dependent MR-GMM. The experimental results show that the proposed methods achieve a wider controllable range of the perceived age, a significant quality improvement of the converted singing voice, and the development of the singer-dependent MR-GMM using only a few arbitrary phrases as adaptation data.

  • Voice Timbre Control Based on Perceived Age in Singing Voice Conversion

    Kazuhiro KOBAYASHI  Tomoki TODA  Hironori DOI  Tomoyasu NAKANO  Masataka GOTO  Graham NEUBIG  Sakriani SAKTI  Satoshi NAKAMURA  

     
    PAPER-Voice Conversion and Speech Enhancement

    Vol: E97-D  No: 6  Page(s): 1419-1428

    The perceived age of a singing voice is the age of the singer as perceived by the listener, and is one of the notable characteristics that determines perceptions of a song. In this paper, we describe an investigation of acoustic features that have an effect on the perceived age, and a novel voice timbre control technique based on the perceived age for singing voice conversion (SVC). Singers can sing expressively by controlling prosody and voice timbre, but the varieties of voices that singers can produce are limited by physical constraints. Previous work has attempted to overcome this limitation through the use of statistical voice conversion. This technique makes it possible to convert the singing voice timbre of an arbitrary source singer into that of an arbitrary target singer. However, it is still difficult to intuitively control singing voice characteristics by manipulating parameters corresponding to specific physical traits, such as gender and age. In this paper, we first investigate the factors that play a part in the listener's perception of the singer's age. Then, we apply a multiple-regression Gaussian mixture model (MR-GMM) to SVC for the purpose of controlling voice timbre based on the perceived age, and we propose SVC based on a modified MR-GMM for manipulating the perceived age while maintaining the singer's individuality. The experimental results show that 1) the perceived age of singing voices corresponds relatively well to the actual age of the singer, 2) prosodic features have a larger effect on the perceived age than spectral features, 3) the individuality of a singer is influenced more heavily by segmental features than prosodic features, and 4) the proposed voice timbre control method makes it possible to change the singer's perceived age while not having an adverse effect on the perceived individuality.

  • Modeling Storylines in Lyrics

    Kento WATANABE  Yuichiroh MATSUBAYASHI  Kentaro INUI  Satoru FUKAYAMA  Tomoyasu NAKANO  Masataka GOTO  

     
    PAPER-Natural Language Processing

    Publicized: 2017/12/22  Vol: E101-D  No: 4  Page(s): 1167-1179

    This paper addresses the issue of modeling the discourse nature of lyrics and presents the first study aiming at capturing two common discourse-related notions: storylines and themes. We assume that a storyline is a chain of transitions over topics of segments and a song has at least one entire theme. We then hypothesize that transitions over topics of lyric segments can be captured by a probabilistic topic model which incorporates a distribution over transitions of latent topics and that such a distribution of topic transitions is affected by the theme of lyrics. Aiming to test those hypotheses, this study conducts experiments on the word prediction and segment order prediction tasks exploiting a large-scale corpus of popular music lyrics for both English and Japanese (around 100 thousand songs). The findings we gained from these experiments can be summarized in two respects. First, the models with topic transitions significantly outperformed the model without topic transitions in word prediction. This result indicates that typical storylines included in our lyrics datasets were effectively captured as a probabilistic distribution of transitions over latent topics of segments. Second, the model incorporating a latent theme variable on top of topic transitions outperformed the models without such variables in both word prediction and segment order prediction. From this result, we can conclude that considering the notion of theme does contribute to the modeling of storylines of lyrics.

  • Songrium Derivation Factor Analysis: A Web Service for Browsing Derivation Factors by Modeling N-th Order Derivative Creation

    Kosetsu TSUKUDA  Keisuke ISHIDA  Masahiro HAMASAKI  Masataka GOTO  

     
    PAPER

    Publicized: 2018/01/18  Vol: E101-D  No: 4  Page(s): 1096-1106

    Creating new content based on existing original work is becoming popular especially among amateur creators. Such new content is called derivative work and can be transformed into the next new derivative work. Such derivative work creation is called “N-th order derivative creation.” Although derivative creation is popular, the reason an individual derivative work was created is not observable. To infer the factors that trigger derivative work creation, we have proposed a model that incorporates three factors: (1) original work's attractiveness, (2) original work's popularity, and (3) derivative work's popularity. Based on this model, in this paper, we describe a public web service for browsing derivation factors called Songrium Derivation Factor Analysis. Our service is implemented by applying our model to original works and derivative works uploaded to a video sharing service. Songrium Derivation Factor Analysis provides various visualization functions: Original Works Map, Derivation Tree, Popularity Influence Transition Graph, Creator Distribution Map, and Creator Profile. By displaying such information when users browse and watch videos, we aim to enable them to find new content and understand the N-th order derivative creation activity at a deeper level.

  • Why and How People View Lyrics While Listening to Music on a Smartphone

    Kosetsu TSUKUDA  Masahiro HAMASAKI  Masataka GOTO  

     
    PAPER-Music Information Processing

    Publicized: 2023/01/18  Vol: E106-D  No: 4  Page(s): 556-564

    Why and how do people view lyrics? Although various lyrics-based music systems have been proposed, this fundamental question remains unexplored. Better understanding of lyrics viewing behavior would be beneficial for both researchers and music streaming platforms to improve their lyrics-based systems. Therefore, in this paper, we investigate why and how people view lyrics, especially when they listen to music on a smartphone. To answer “why,” we conduct a questionnaire-based online user survey involving 206 participants. To answer “how,” we analyze over 23 million lyrics request logs sent from the smartphone application of a music streaming service. Our analysis results suggest several reusable insights, including the following: (1) People have high demand for viewing lyrics to confirm what the artist sings, more deeply understand the lyrics, sing the song, and figure out the structure such as verse and chorus. (2) People like to view lyrics after returning home at night and before going to sleep rather than during the daytime. (3) People usually view the same lyrics repeatedly over time. Applying these insights, we also discuss application examples that could enable people to more actively view lyrics and listen to new songs, which would not only diversify and enrich people's music listening experiences but also be beneficial especially for music streaming platforms.

  • A Method to Detect Chorus Sections in Lyrics Text

    Kento WATANABE  Masataka GOTO  

     
    PAPER-Music Information Processing

    Publicized: 2023/06/02  Vol: E106-D  No: 9  Page(s): 1600-1609

    This paper addresses the novel task of detecting chorus sections in English and Japanese lyrics text. Although chorus-section detection using audio signals has been studied, whether chorus sections can be detected from text-only lyrics is an open issue. Another open issue is whether patterns of repeating lyric lines such as those appearing in chorus sections depend on language. To investigate these issues, we propose a neural-network-based model for sequence labeling. It can learn phrase repetition and linguistic features to detect chorus sections in lyrics text. It is, however, difficult to train this model since there was no dataset of lyrics with chorus-section annotations as there was no prior work on this task. We therefore generate a large amount of training data with such annotations by leveraging pairs of musical audio signals and their corresponding manually time-aligned lyrics; we first automatically detect chorus sections from the audio signals and then use their temporal positions to transfer them to the line-level chorus-section annotations for the lyrics. Experimental results show that the proposed model with the generated data contributes to detecting the chorus sections, that the model trained on Japanese lyrics can detect chorus sections surprisingly well in English lyrics, and that patterns of repeating lyric lines are language-independent.

  • Kiite Cafe: A Web Service Enabling Users to Listen to the Same Song at the Same Moment While Reacting to the Song

    Kosetsu TSUKUDA  Keisuke ISHIDA  Masahiro HAMASAKI  Masataka GOTO  

     
    PAPER-Music Information Processing

    Publicized: 2023/07/28  Vol: E106-D  No: 11  Page(s): 1906-1915

    This paper describes a public web service called Kiite Cafe that lets users get together virtually to listen to music. When users listen to music on Kiite Cafe, their experiences are enhanced by two architectures: (i) visualization of each user's reactions, and (ii) selection of songs from users' favorite songs. These architectures enable users to feel social connection with others and the joy of introducing others to their favorite songs as if they were together listening to music in person. In addition, the architectures provide three user experiences: (1) motivation to react to played songs, (2) the opportunity to listen to a diverse range of songs, and (3) the opportunity to contribute as a curator. By analyzing the behavior logs of 2,399 Kiite Cafe users over a year, we quantitatively show that these user experiences can generate various effects (e.g., users react to a more diverse range of songs on Kiite Cafe than when listening alone). We also discuss how our proposed architectures can enrich music listening experiences with others.

  • DanceUnisoner: A Parametric, Visual, and Interactive Simulation Interface for Choreographic Composition of Group Dance

    Shuhei TSUCHIDA  Satoru FUKAYAMA  Jun KATO  Hiromu YAKURA  Masataka GOTO  

     
    PAPER-Human-computer Interaction

    Publicized: 2023/11/27  Vol: E107-D  No: 3  Page(s): 386-399

    Composing choreography is challenging because it involves numerous iterative refinements. According to our video analysis and interviews, choreographers typically need to imagine dancers' movements to revise drafts on paper since testing new movements and formations with actual dancers takes time. To address this difficulty, we present an interactive group-dance simulation interface, DanceUnisoner, that assists choreographers in composing a group dance in a simulated environment. With DanceUnisoner, choreographers can arrange excerpts from solo-dance videos of dancers throughout a three-dimensional space. They can adjust various parameters related to the dancers in real time, such as each dancer's position and size and each movement's timing. To evaluate the effectiveness of the system's parametric, visual, and interactive interface, we asked seven choreographers to use it and compose group dances. Our observations, interviews, and quantitative analysis revealed their successful usage in iterative refinements and visual checking of choreography, providing insights to facilitate further computational creativity support for choreographers.

  • Modeling N-th Order Derivative Creation Based on Content Attractiveness and Time-Dependent Popularity

    Kosetsu TSUKUDA  Masahiro HAMASAKI  Masataka GOTO  

     
    PAPER

    Publicized: 2020/02/05  Vol: E103-D  No: 5  Page(s): 969-981

    Among amateur creators, it has become popular to create new content based on existing original work; such new content is called derivative work. We know that derivative creation is popular, but why are individual derivative works created? Although there are several factors that inspire the creation of derivative works, such factors cannot usually be observed on the Web. In this paper, we propose a model for inferring latent factors from sequences of derivative work posting events. We assume a sequence to be a stochastic process incorporating the following three factors: (1) the original work's attractiveness, (2) the original work's popularity, and (3) the derivative work's popularity. To characterize content popularity, we use content ranking data and incorporate rank-biased popularity based on the creators' browsing behaviors. Our main contributions are three-fold. First, to the best of our knowledge, this is the first study modeling derivative creation activity. Second, by using real-world datasets of music-related derivative work creation, we conducted quantitative experiments and showed, in terms of the negative log-likelihood for test data, the effectiveness of adopting all three factors to model derivative creation activity and of considering creators' browsing behaviors. Third, we carried out qualitative experiments and showed that our model is useful in analyzing the following aspects: (1) derivative creation activity in terms of category characteristics, (2) temporal development of factors that trigger derivative work posting events, (3) creator characteristics, (4) the N-th order derivative creation process, and (5) original work ranking.