
Keyword Search Results

[Keyword] music (98 hits)

1-20 of 98 hits

  • MDX-Mixer: Music Demixing by Leveraging Source Signals Separated by Existing Demixing Models Open Access

    Tomoyasu NAKANO  Masataka GOTO  

     
    PAPER-Music Information Processing

    Publicized: 2024/04/05  Vol: E107-D No:8  Page(s): 1079-1088

    This paper presents MDX-Mixer, which improves music demixing (MDX) performance by leveraging source signals separated by multiple existing MDX models. Deep-learning-based MDX models have improved their separation performance year by year for four kinds of sound sources: “vocals,” “drums,” “bass,” and “other”. Our research question is whether mixing (i.e., taking a weighted sum of) the signals separated by state-of-the-art MDX models can obtain either the best of each model or even higher separation performance. Previous studies in singing voice separation and MDX have mixed separated signals of the same sound source with each other using time-invariant or time-varying positive mixing weights. In contrast, this study is novel in that it also allows negative weights and performs time-varying mixing using all of the separated source signals and the music acoustic signal before separation. The time-varying weights are estimated by modeling the music acoustic signals and their separated signals, dividing them into short segments. In this paper we propose two new systems: one that estimates time-invariant weights using 1×1 convolution, and one that estimates time-varying weights by applying the MLP-Mixer layer proposed in the computer vision field to each segment. The latter model is called MDX-Mixer. Their performances were evaluated based on the source-to-distortion ratio (SDR) using the well-known MUSDB18-HQ dataset. The results show that MDX-Mixer achieved a higher SDR than the separated signals given by three state-of-the-art MDX models.
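
    As a rough illustration of the time-invariant variant described above, the weighted sum over the separated signals (and the mixture) can be implemented as a single 1×1 convolution. The shapes and the stacking of three model outputs plus the mixture below are assumptions for the sketch, not the paper's exact configuration.

    ```python
    import torch
    import torch.nn as nn

    # Stack K = 4 stereo (C = 2) inputs of length T: three MDX model outputs
    # plus the unseparated mixture (shapes are illustrative).
    K, C, T = 4, 2, 44100
    mixer = nn.Conv1d(K * C, C, kernel_size=1, bias=False)  # weights may be negative

    stacked = torch.randn(1, K * C, T)   # separated signals and the mixture
    estimate = mixer(stacked)            # (1, C, T): re-mixed source estimate
    ```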

  • Improved Source Localization Method of the Small-Aperture Array Based on the Parasitic Fly’s Coupled Ears and MUSIC-Like Algorithm Open Access

    Hongbo LI  Aijun LIU  Qiang YANG  Zhe LYU  Di YAO  

     
    LETTER-Noise and Vibration

    Publicized: 2023/12/08  Vol: E107-A No:8  Page(s): 1355-1359

    To improve the direction-of-arrival (DOA) estimation performance of the small-aperture array, we propose a source localization method inspired by the parasitic fly Ormia's coupled ears and the MUSIC-like algorithm. The Ormia can localize its host cricket's sound precisely despite the tremendous mismatch between the spacing of its ears and the sound wavelength. In this paper, we first implement a biologically inspired coupled system based on the coupled model of the Ormia's ears and solve its responses by the modal decomposition method. Then, we analyze the effect of the system on the received signals of the array. The analysis shows that the system amplifies the amplitude ratio and phase difference between the signals, which is equivalent to creating a virtual array with a larger aperture. Finally, we apply the MUSIC-like algorithm for DOA estimation to suppress the colored noise introduced by the system. Numerical results demonstrate that the proposed method improves the localization precision and resolution of the array.
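
    For orientation, the MUSIC-like step builds on the standard MUSIC pseudospectrum, sketched below; the coupled-ear preprocessing and the letter's colored-noise handling are omitted, so this is a generic sketch rather than the authors' exact algorithm.

    ```python
    import numpy as np

    def music_pseudospectrum(R, A, n_src):
        """R: (M, M) sample covariance of the array output;
        A: (M, G) candidate steering vectors over an angle grid."""
        eigval, eigvec = np.linalg.eigh(R)       # eigenvalues in ascending order
        En = eigvec[:, :R.shape[0] - n_src]      # noise subspace
        proj = En.conj().T @ A                   # project the grid onto it
        return 1.0 / np.sum(np.abs(proj) ** 2, axis=0)  # peaks indicate DOAs
    ```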

  • Dance-Conditioned Artistic Music Generation by Creative-GAN Open Access

    Jiang HUANG  Xianglin HUANG  Lifang YANG  Zhulin TAO  

     
    PAPER-Multimedia Environment Technology

    Publicized: 2023/08/23  Vol: E107-A No:5  Page(s): 836-844

    We present a novel adversarial, end-to-end framework based on Creative-GAN to generate artistic music conditioned on dance videos. Our proposed framework takes visual and motion posture data as input and adopts a quantized vector as the audio representation to generate complex music corresponding to the input. However, a standard GAN merely imitates and reproduces works that humans have already created rather than generating something new and creative. We therefore introduce Creative-GAN, which extends the original GAN framework with two discriminators: one determines whether the generated audio is real music, and the other classifies its music style. We show that the proposed Creative-GAN can generate novel and interesting music that is not found in the training dataset. To evaluate our model, a comprehensive evaluation scheme covering both subjective and objective measures is introduced. Compared with state-of-the-art methods, our approach performs better in terms of music rhythm, generation diversity, dance-music correlation, and overall quality of the generated music.
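
    The two-discriminator design echoes Creative Adversarial Networks; a hypothetical generator objective along those lines is sketched below, where the style term pushes the style classifier toward a uniform posterior. The exact losses used in the paper may differ.

    ```python
    import torch
    import torch.nn.functional as F

    def generator_loss(d_real_logit, d_style_logits):
        """d_real_logit: real/fake discriminator output for generated audio;
        d_style_logits: (B, n_styles) output of the style-classifying
        discriminator (hypothetical CAN-style formulation)."""
        adv = F.binary_cross_entropy_with_logits(
            d_real_logit, torch.ones_like(d_real_logit))       # fool D_real
        n_styles = d_style_logits.size(-1)
        uniform = torch.full_like(d_style_logits, 1.0 / n_styles)
        # Reward music whose style D_style cannot pin down (uniform posterior).
        creative = -(uniform * F.log_softmax(d_style_logits, dim=-1)).sum(-1).mean()
        return adv + creative
    ```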

  • CQTXNet: A Modified Xception Network with Attention Modules for Cover Song Identification

    Jinsoo SEO  Junghyun KIM  Hyemi KIM  

     
    LETTER

    Publicized: 2023/10/02  Vol: E107-D No:1  Page(s): 49-52

    Song-level feature summarization is fundamental for the browsing, retrieval, and indexing of digital music archives. This study proposes a deep neural network model, CQTXNet, for extracting song-level feature summaries for cover song identification. CQTXNet incorporates depth-wise separable convolution, residual network connections, and attention modules to extend previous approaches. An experimental evaluation of the proposed CQTXNet was performed on two publicly available cover song datasets by varying the number of network layers and the type of attention modules.
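
    A minimal sketch of the depthwise separable convolution that Xception-style blocks are built from (residual connections, attention modules, and the CQT front end are omitted; layer sizes are illustrative):

    ```python
    import torch.nn as nn

    def separable_conv(in_ch, out_ch, k=3):
        """Per-channel spatial filtering (depthwise) followed by a 1x1
        pointwise mix across channels."""
        return nn.Sequential(
            nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch),  # depthwise
            nn.Conv2d(in_ch, out_ch, kernel_size=1),                   # pointwise
        )
    ```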

  • Kiite Cafe: A Web Service Enabling Users to Listen to the Same Song at the Same Moment While Reacting to the Song

    Kosetsu TSUKUDA  Keisuke ISHIDA  Masahiro HAMASAKI  Masataka GOTO  

     
    PAPER-Music Information Processing

    Publicized: 2023/07/28  Vol: E106-D No:11  Page(s): 1906-1915

    This paper describes a public web service called Kiite Cafe that lets users get together virtually to listen to music. When users listen to music on Kiite Cafe, their experiences are enhanced by two architectures: (i) visualization of each user's reactions, and (ii) selection of songs from users' favorite songs. These architectures enable users to feel a social connection with others and the joy of introducing others to their favorite songs, as if they were listening to music together in person. In addition, the architectures provide three user experiences: (1) motivation to react to played songs, (2) the opportunity to listen to a diverse range of songs, and (3) the opportunity to contribute as a curator. By analyzing the behavior logs of 2,399 Kiite Cafe users over a year, we quantitatively show that these user experiences can generate various effects (e.g., users react to a more diverse range of songs on Kiite Cafe than when listening alone). We also discuss how our proposed architectures can enrich music listening experiences with others.

  • A Method to Detect Chorus Sections in Lyrics Text

    Kento WATANABE  Masataka GOTO  

     
    PAPER-Music Information Processing

    Publicized: 2023/06/02  Vol: E106-D No:9  Page(s): 1600-1609

    This paper addresses the novel task of detecting chorus sections in English and Japanese lyrics text. Although chorus-section detection using audio signals has been studied, whether chorus sections can be detected from text-only lyrics is an open issue. Another open issue is whether patterns of repeating lyric lines such as those appearing in chorus sections depend on language. To investigate these issues, we propose a neural-network-based model for sequence labeling. It can learn phrase repetition and linguistic features to detect chorus sections in lyrics text. It is, however, difficult to train this model since there was no dataset of lyrics with chorus-section annotations as there was no prior work on this task. We therefore generate a large amount of training data with such annotations by leveraging pairs of musical audio signals and their corresponding manually time-aligned lyrics; we first automatically detect chorus sections from the audio signals and then use their temporal positions to transfer them to the line-level chorus-section annotations for the lyrics. Experimental results show that the proposed model with the generated data contributes to detecting the chorus sections, that the model trained on Japanese lyrics can detect chorus sections surprisingly well in English lyrics, and that patterns of repeating lyric lines are language-independent.
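
    The annotation-transfer step can be pictured as a simple overlap test between audio-detected chorus spans and the time stamps of the aligned lyric lines; a simplified sketch (the paper's exact rule may differ):

    ```python
    def transfer_labels(line_times, chorus_spans):
        """line_times: [(start, end)] in seconds for each lyric line (from the
        time-aligned lyrics); chorus_spans: [(start, end)] chorus sections
        detected from the audio. A line gets label 1 if it overlaps any span."""
        return [int(any(min(le, ce) > max(ls, cs) for cs, ce in chorus_spans))
                for ls, le in line_times]
    ```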

  • Online EEG-Based Emotion Prediction and Music Generation for Inducing Affective States

    Kana MIYAMOTO  Hiroki TANAKA  Satoshi NAKAMURA  

     
    PAPER-Human-computer Interaction

    Publicized: 2022/02/15  Vol: E105-D No:5  Page(s): 1050-1063

    Music is often used for emotion induction because it can change people's emotions. However, since listeners subjectively feel different emotions when listening to the same music, we propose an emotion induction system that generates music adapted to each individual. Our system automatically generates suitable music for emotion induction based on the emotions predicted from an electroencephalogram (EEG). We examined three elements for constructing our system: 1) a music generator that creates music inducing emotions that resemble the inputs, 2) real-time emotion prediction from EEG, and 3) control of the music generator using the predicted emotions to make music suitable for inducing emotions. We constructed our proposed system from these elements and evaluated it. The results showed its effectiveness for inducing emotions and suggest that feedback loops that tailor stimuli to individuals can successfully induce emotions.

  • Pairwise Similarity Normalization Based on a Hubness Score for Improving Cover Song Retrieval Accuracy

    Jin S. SEO  

     
    LETTER-Music Information Processing

    Publicized: 2022/02/21  Vol: E105-D No:5  Page(s): 1130-1134

    A hubness-score-based normalization of the pairwise similarity is proposed for sequence-alignment-based cover song retrieval. Hubness, the tendency of some data points in high-dimensional data sets to occur far more frequently in the nearest-neighbor lists of other points than the rest of the points in the set, is widely known to deteriorate information retrieval accuracy. This paper mitigates the performance degradation due to hubness by normalizing the pairwise similarity with a hubness score. Experiments on two cover song datasets confirm that the proposed similarity normalization improves the cover song retrieval accuracy.
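
    The abstract does not spell out the normalization formula; the sketch below shows one plausible form in which each song's k-occurrence (how often it appears in other songs' top-k lists) down-weights similarities involving hubs. Treat the exact weighting as an assumption.

    ```python
    import numpy as np

    def hubness_normalize(S, k=10):
        """S: (N, N) pairwise similarity matrix (larger = more similar),
        with each song maximally similar to itself on the diagonal."""
        N = S.shape[0]
        topk = np.argsort(-S, axis=1)[:, 1:k + 1]        # skip self at rank 0
        k_occ = np.bincount(topk.ravel(), minlength=N) + 1.0
        hub = k_occ / k_occ.mean()                       # hubness score per song
        return S / np.sqrt(np.outer(hub, hub))           # penalize hub pairs
    ```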

  • DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching

    Satoshi MIZOGUCHI  Yuki SAITO  Shinnosuke TAKAMICHI  Hiroshi SARUWATARI  

     
    PAPER-Speech and Hearing

    Publicized: 2021/07/30  Vol: E104-D No:11  Page(s): 1971-1980

    We propose deep neural network (DNN)-based speech enhancement that reduces musical noise and achieves better auditory impressions. Musical noise is an artifact generated by nonlinear signal processing that negatively affects auditory impressions. We aim to develop musical-noise-free speech enhancement methods that suppress musical noise generation and produce perceptually comfortable enhanced speech. DNN-based speech enhancement using a soft mask achieves high noise reduction but generates musical noise in non-speech regions. Therefore, we first define kurtosis matching for DNN-based low-musical-noise speech enhancement. Kurtosis is the fourth-order moment and is known to correlate with the amount of musical noise. Kurtosis matching is a penalty term in DNN training that works to reduce the amount of musical noise. We further extend this scheme to standardized-moment matching. The extended scheme uses moments of order higher than kurtosis and generalizes the conventional musical-noise-free method based on kurtosis matching. We formulate standardized-moment matching and explore how effectively the higher-order moments reduce the amount of musical noise. Experimental evaluation results 1) demonstrate that kurtosis matching can reduce musical noise without negatively affecting noise suppression and 2) newly reveal that sixth-order-moment matching also achieves low-musical-noise speech enhancement on par with kurtosis matching.
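
    A minimal sketch of a standardized-moment-matching penalty (order 4 recovers kurtosis matching); the paper's exact target statistics and weighting are not given in the abstract, so this is an assumed form:

    ```python
    import torch

    def standardized_moment(x, order):
        """Order-k standardized moment of amplitude samples (k = 4: kurtosis)."""
        z = (x - x.mean()) / (x.std() + 1e-8)
        return z.pow(order).mean()

    def moment_matching_penalty(enhanced, target, order=4):
        """Squared moment mismatch, added to the usual enhancement loss."""
        return (standardized_moment(enhanced, order)
                - standardized_moment(target, order)) ** 2
    ```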

  • Multi-Task Learning for Improved Recognition of Multiple Types of Acoustic Information

    Jae-Won KIM  Hochong PARK  

     
    LETTER-Speech and Hearing

    Publicized: 2021/07/14  Vol: E104-D No:10  Page(s): 1762-1765

    We propose a new method for improving the recognition performance of phonemes, speech emotions, and music genres using multi-task learning. When tasks are closely related, multi-task learning can improve the performance of each task by learning common feature representation for all the tasks. However, the recognition tasks considered in this study demand different input signals of speech and music at different time scales, resulting in input features with different characteristics. In addition, a training dataset with multiple labels for all information sources is not available. Considering these issues, we conduct multi-task learning in a sequential training process using input features with a single label for one information source. A comparative evaluation confirms that the proposed method for multi-task learning provides higher performance for all recognition tasks than individual learning for each task as in conventional methods.

  • Robust Fractional Lower Order Correntropy Algorithm for DOA Estimation in Impulsive Noise Environments

    Quan TIAN  Tianshuang QIU  Jitong MA  Jingchun LI  Rong LI  

     
    PAPER-Antennas and Propagation

    Publicized: 2020/06/29  Vol: E104-B No:1  Page(s): 35-48

    In array signal processing, many methods of handling cases of impulsive noise with an alpha-stable distribution have been studied. By introducing correntropy with a robust statistical property, this paper proposes a novel fractional lower order correntropy (FLOCR) method. The FLOCR-based estimator for array outputs is defined and applied with multiple signal classification (MUSIC) to estimate the direction of arrival (DOA) in alpha-stable distributed noise environments. Comprehensive Monte Carlo simulation results demonstrate that FLOCR-MUSIC outperforms existing algorithms in terms of root mean square error (RMSE) and the probability of resolution, especially in the presence of highly impulsive noise.

  • Salient Chromagram Extraction Based on Trend Removal for Cover Song Identification

    Jin S. SEO  

     
    LETTER

    Publicized: 2020/10/19  Vol: E104-D No:1  Page(s): 51-54

    This paper proposes a salient chromagram obtained by removing the local trend to improve cover song identification accuracy. The proposed salient chromagram emphasizes the tonal content of music, which is well preserved between an original song and its cover version, while reducing the effects of timbre differences. We apply the proposed salient chromagram to sequence-alignment-based cover song identification. Experiments on two cover song datasets confirm that the proposed salient chromagram improves cover song identification accuracy.
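
    The trend-removal idea can be sketched as subtracting a slowly varying per-pitch-class baseline; the temporal median filter below is an assumed trend estimator, not necessarily the paper's choice:

    ```python
    import numpy as np
    from scipy.ndimage import median_filter

    def salient_chromagram(chroma, trend_len=41):
        """chroma: (12, T). Keep only what rises above each pitch class's
        local trend (illustrative trend length)."""
        trend = median_filter(chroma, size=(1, trend_len))  # per-bin, over time
        return np.maximum(chroma - trend, 0.0)
    ```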

  • Modeling N-th Order Derivative Creation Based on Content Attractiveness and Time-Dependent Popularity

    Kosetsu TSUKUDA  Masahiro HAMASAKI  Masataka GOTO  

     
    PAPER

    Publicized: 2020/02/05  Vol: E103-D No:5  Page(s): 969-981

    For amateur creators, it has become increasingly popular to create new content based on existing original work; such new content is called derivative work. We know that derivative creation is popular, but why are individual derivative works created? Although there are several factors that inspire the creation of derivative works, such factors cannot usually be observed on the Web. In this paper, we propose a model for inferring latent factors from sequences of derivative work posting events. We assume a sequence to be a stochastic process incorporating the following three factors: (1) the original work's attractiveness, (2) the original work's popularity, and (3) the derivative work's popularity. To characterize content popularity, we use content ranking data and incorporate rank-biased popularity based on creators' browsing behaviors. Our main contributions are three-fold. First, to the best of our knowledge, this is the first study to model derivative creation activity. Second, using real-world datasets of music-related derivative work creation, we conducted quantitative experiments and showed, in terms of the negative log-likelihood for test data, the effectiveness of adopting all three factors to model derivative creation activity and of considering creators' browsing behaviors. Third, we carried out qualitative experiments and showed that our model is useful for analyzing the following aspects: (1) derivative creation activity in terms of category characteristics, (2) temporal development of factors that trigger derivative work posting events, (3) creator characteristics, (4) the N-th order derivative creation process, and (5) original work ranking.
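
    The rank-biased popularity can be pictured as a geometric discount over chart ranks, analogous to rank-biased precision; the persistence form below is a hypothetical illustration, not the paper's stated formula:

    ```python
    def rank_biased_popularity(daily_ranks, p=0.9):
        """daily_ranks: chart ranks (1 = top) of an original work over days.
        Lower ranks are discounted geometrically, mirroring how far down a
        ranked list creators typically browse (persistence p is assumed)."""
        return sum(p ** (r - 1) for r in daily_ranks)
    ```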

  • Joint Angle, Velocity, and Range Estimation Using 2D MUSIC and Successive Interference Cancellation in FMCW MIMO Radar System

    Jonghyeok LEE  Sunghyun HWANG  Sungjin YOU  Woo-Jin BYUN  Jaehyun PARK  

     
    PAPER-Sensing

    Publicized: 2019/09/11  Vol: E103-B No:3  Page(s): 283-290

    To jointly estimate the angle, velocity, and range of multiple targets in FMCW MIMO radar, a two-dimensional (2D) MUSIC algorithm combined with matched filtering and FFT processing is proposed. By reformulating the received FMCW signal of the colocated MIMO radar, we exploit 2D MUSIC to estimate the angle and Doppler frequency of multiple targets. Then, using a matched filter built from the estimated angle and Doppler frequency together with an FFT operation, the range of each target is estimated. To effectively estimate the parameters of multiple targets with large differences in distance, we also propose a successive interference cancellation method that uses orthogonal projection. That is, rather than estimating the parameters of multiple targets simultaneously using 2D MUSIC, we estimate them sequentially: the parameters of the target with the strongest reflected power are estimated first, and then its effect on the received signal is canceled out by the orthogonal projection. Simulations verify the performance of the proposed algorithm.
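
    The cancellation step amounts to projecting the snapshots onto the orthogonal complement of the strongest target's estimated steering vector; a minimal sketch:

    ```python
    import numpy as np

    def project_out_strongest(X, a_hat):
        """X: (M, N) received snapshots; a_hat: (M,) steering vector of the
        strongest target from the first 2D-MUSIC pass. The projection removes
        that target's contribution before the weaker targets are re-estimated."""
        a = a_hat.reshape(-1, 1)
        P_perp = np.eye(len(a_hat)) - (a @ a.conj().T) / (a.conj().T @ a)
        return P_perp @ X
    ```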

  • Combining CNN and Broad Learning for Music Classification

    Huan TANG  Ning CHEN  

     
    PAPER-Music Information Processing

    Publicized: 2019/12/05  Vol: E103-D No:3  Page(s): 695-701

    Music classification has been inspired by the remarkable success of deep learning. To enhance efficiency while ensuring high performance, a hybrid architecture that combines deep learning and Broad Learning (BL) is proposed for music classification tasks. At the feature extraction stage, a Random CNN (RCNN) is adopted to analyze the Mel-spectrogram of the input music sound. Compared with a conventional CNN, an RCNN has a more flexible structure to adapt to the variance contained in different types of music. At the prediction stage, the BL technique is introduced to enhance prediction accuracy and reduce training time. Experimental results on three benchmark datasets (GTZAN, Ballroom, and Emotion) demonstrate that: i) the proposed scheme achieves higher classification accuracy than a deep-learning-based scheme combining CNN and LSTM on all three benchmark datasets; ii) both RCNN and BL contribute to the performance improvement of the proposed scheme; and iii) the introduction of BL also helps to enhance the prediction efficiency of the proposed scheme.
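
    A minimal sketch of the Broad Learning stage under common BLS assumptions: random nonlinear enhancement nodes are appended to the CNN features and the output weights are solved in closed form by ridge regression (dimensions and the tanh nonlinearity are illustrative):

    ```python
    import numpy as np

    def broad_learning_fit(Z, Y, n_enh=200, reg=1e-3, seed=0):
        """Z: (N, D) features from the random CNN; Y: (N, K) one-hot labels."""
        rng = np.random.default_rng(seed)
        We = rng.standard_normal((Z.shape[1], n_enh))
        A = np.hstack([Z, np.tanh(Z @ We)])    # features + enhancement nodes
        Wout = np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ Y)
        return We, Wout                        # predict via argmax of A_new @ Wout
    ```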

  • Multi-Scale Chroma n-Gram Indexing for Cover Song Identification

    Jin S. SEO  

     
    LETTER

    Publicized: 2019/10/23  Vol: E103-D No:1  Page(s): 59-62

    To enhance cover song identification accuracy on a large music archive, a song-level feature summarization method is proposed using multi-scale representation. Chroma n-grams are extracted at multiple scales to cope with both global and local tempo changes. We derive an index from the extracted n-grams by clustering to reduce storage and computation for DB search. Experiments on widely used music datasets confirm that the proposed method achieves state-of-the-art accuracy while reducing the cost of cover song search.
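
    Chroma n-gram extraction can be sketched as stacking consecutive chroma frames; running it with several (n, hop) settings yields the multiple scales (parameter values below are illustrative):

    ```python
    import numpy as np

    def chroma_ngrams(chroma, n=8, hop=4):
        """chroma: (12, T) beat- or frame-level chromagram.
        Returns a (num_grams, 12 * n) array of stacked n-gram vectors."""
        T = chroma.shape[1]
        return np.array([chroma[:, t:t + n].ravel()
                         for t in range(0, T - n + 1, hop)])
    ```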

  • Estimation of the Matrix Rank of Harmonic Components of a Spectrogram in a Piano Music Signal Based on the Stein's Unbiased Risk Estimator and Median Filter Open Access

    Seokjin LEE  

     
    LETTER-Music Information Processing

    Publicized: 2019/08/22  Vol: E102-D No:11  Page(s): 2276-2279

    Estimating the matrix rank of the harmonic components of a music spectrogram provides useful information, e.g., it determines the number of basis vectors in matrix-factorization-based algorithms, which is required for automatic music transcription or post-processing. In this work, we develop an algorithm based on Stein's unbiased risk estimator (SURE) with a matrix factorization model. The noise variance required for the SURE algorithm is estimated by suppressing the harmonic components via median filtering. An evaluation performed using the MIDI-aligned piano sounds (MAPS) database revealed an average estimation error of -0.26 (standard deviation: 4.4) for the proposed algorithm.
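
    One way to picture the noise-variance step: a median filter along the frequency axis flattens narrowband harmonic peaks, and the variance of the remaining floor feeds the SURE criterion. This reduction is an assumption of the sketch, not necessarily the paper's estimator:

    ```python
    import numpy as np
    from scipy.ndimage import median_filter

    def noise_variance_estimate(spec, width=31):
        """spec: (F, T) magnitude spectrogram of a piano recording."""
        noise_floor = median_filter(spec, size=(width, 1))  # filter over frequency
        return float(np.var(noise_floor))                   # rough SURE input
    ```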

  • Human Activity Identification by Height and Doppler RCS Information Detected by MIMO Radar

    Dai SASAKAWA  Naoki HONMA  Takeshi NAKAYAMA  Shoichi IIZUKA  

     
    PAPER

    Publicized: 2019/01/22  Vol: E102-B No:7  Page(s): 1270-1278

    This paper introduces a method that identifies human activity from height and Doppler Radar Cross Section (RCS) information detected by Multiple-Input Multiple-Output (MIMO) radar. The method estimates the three-dimensional target location by applying the MUltiple SIgnal Classification (MUSIC) method to the observed MIMO channel; the Doppler RCS is calculated from the signal reflected from the target. A gesture recognition algorithm is applied to the trajectory formed by the temporal transition of the estimated human height and the Doppler RCS. In experiments, the proposed method achieves an average recognition rate of over 90%.

  • Stereophonic Music Separation Based on Non-Negative Tensor Factorization with Cepstral Distance Regularization

    Shogo SEKI  Tomoki TODA  Kazuya TAKEDA  

     
    PAPER-Engineering Acoustics

    Vol: E101-A No:7  Page(s): 1057-1064

    This paper proposes a semi-supervised source separation method for stereophonic music signals containing multiple recorded or processed signals, focusing on synthesized stereophonic music. As synthesized music signals are often generated as linear combinations of many individual source signals and their respective mixing gains, phase or phase difference information between inter-channel signals, which represents the spatial characteristics of recording environments, cannot be utilized as an acoustic clue for source separation. Non-negative Tensor Factorization (NTF) is an effective technique for this problem: it decomposes the amplitude spectrograms of the stereo channel music signals into basis vectors and activations of individual music source signals, along with their corresponding mixing gains. However, it is difficult to achieve sufficient separation performance with this method alone, as the acoustic clues available for separation are limited. To address this issue, this paper proposes a Cepstral Distance Regularization (CDR) method for NTF-based stereo channel separation, which makes the cepstrum of the separated source signals follow Gaussian Mixture Models (GMMs) of the corresponding music source signals. These GMMs are trained in advance using available samples. Experimental evaluations separating three and four sound sources are conducted to investigate the effectiveness of the proposed method in both supervised and semi-supervised separation frameworks, and its performance is compared with that of a conventional NTF method. The results demonstrate that the proposed method yields significant improvements in both separation frameworks and that cepstral distance regularization provides better separation parameters.
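
    The spirit of the regularizer can be sketched as a negative log-likelihood of per-frame cepstra under a pre-trained source GMM; the cepstral dimension and the way the term enters the NTF updates are assumptions here:

    ```python
    import numpy as np
    from scipy.fft import dct
    from sklearn.mixture import GaussianMixture

    def cdr_penalty(sep_spec, gmm, n_cep=20):
        """sep_spec: (F, T) amplitude spectrogram of one separated source;
        gmm: GaussianMixture trained on cepstra of clean samples of it."""
        cep = dct(np.log(sep_spec + 1e-12), axis=0, norm='ortho')[:n_cep].T
        return -float(gmm.score_samples(cep).mean())  # penalize unlikely frames
    ```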

  • Enhanced Performance of MUSIC Algorithm Using Spatial Interpolation in Automotive FMCW Radar Systems

    Seongwook LEE  Young-Jun YOON  Seokhyun KANG  Jae-Eun LEE  Seong-Cheol KIM  

     
    PAPER-Antennas and Propagation

    Publicized: 2017/06/28  Vol: E101-B No:1  Page(s): 163-175

    In this paper, we propose a received-signal interpolation method for enhancing the performance of the multiple signal classification (MUSIC) algorithm. In general, the performance of the conventional MUSIC algorithm is very sensitive to the signal-to-noise ratio (SNR) of the received signal. When array elements receive signals with nonuniform SNR values, the resolution is degraded compared with elements receiving signals with uniform SNR values. Hence, we propose a signal calibration technique for improving the resolution of the algorithm. First, based on the original signals, rough direction-of-arrival (DOA) estimation is conducted. In this stage, using the frequency-domain received signals, the SNR values of each antenna element in the array are estimated. Then, a deteriorated element that has a relatively lower SNR value than the other elements is selected by our proposed scheme. Next, the received signal of the selected element is spatially interpolated based on the signals received by the neighboring elements and the DOA information extracted from the rough estimation. Finally, fine DOA estimation is performed again with the calibrated signal. Simulation results show that the angular resolution of the proposed method is better than that of the conventional MUSIC algorithm. We also apply the proposed scheme to actual data measured in the testing ground, and it yields further improved DOA estimation results.
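
    For a uniform linear array and a single dominant path, the interpolation step can be sketched by phase-shifting the neighbors of the deteriorated element according to the steering model; the array geometry and averaging rule below are assumptions:

    ```python
    import numpy as np

    def interpolate_element(x, idx, theta, d=0.5):
        """x: (M,) complex snapshot of a uniform linear array (spacing d in
        wavelengths); idx: low-SNR element; theta: rough DOA in radians."""
        shift = np.exp(-2j * np.pi * d * np.sin(theta))  # inter-element phase
        neighbors = []
        if idx > 0:
            neighbors.append(x[idx - 1] * shift)
        if idx < len(x) - 1:
            neighbors.append(x[idx + 1] / shift)
        x = x.copy()
        x[idx] = np.mean(neighbors)  # replace the deteriorated sample
        return x
    ```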

1-20 of 98 hits