The search functionality is under construction.

Keyword Search Result

[Keyword] SPE(2504hit)

381-400hit(2504hit)

  • Spectral Features Based on Local Normalized Center Moments for Speech Emotion Recognition

    Huawei TAO  Ruiyu LIANG  Xinran ZHANG  Li ZHAO  

     
    LETTER-Speech and Hearing

      Vol:
    E99-A No:10
      Page(s):
    1863-1866

    To examine whether rotational invariance plays the main role in spectrogram features, new spectral features based on local normalized center moments, denoted by LNCMSF, are proposed. The proposed LNCMSF first adopts 2nd order normalized center moments to describe the local energy distribution of the logarithmic energy spectrum, from which the normalized center moment spectrograms NC1 and NC2 are obtained. Secondly, the DCT (Discrete Cosine Transform) is used to eliminate the correlation of NC1 and NC2, yielding the high order cepstral coefficients TNC1 and TNC2. Finally, LNCMSF is generated by combining NC1, NC2, TNC1 and TNC2. The rotational invariance test experiment shows that rotational invariance is not a necessary property in partial spectrogram features. The recognition experiment shows that the maximum UA (Unweighted Average of Class-Wise Recall Rate) of LNCMSF is improved by at least 10.7% and 1.2% compared to those of MFCC (Mel Frequency Cepstrum Coefficient) and HuWSF (Weighted Spectral Features Based on Local Hu Moments), respectively.

  • Investigation of Using Continuous Representation of Various Linguistic Units in Neural Network Based Text-to-Speech Synthesis

    Xin WANG  Shinji TAKAKI  Junichi YAMAGISHI  

     
    PAPER-Speech synthesis

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2471-2480

    Building high-quality text-to-speech (TTS) systems without expert knowledge of the target language and/or time-consuming manual annotation of speech and text data is an important yet challenging research topic. In this kind of TTS system, it is vital to find a representation of the input text that is both effective and easy to acquire. Recently, the continuous representation of raw word inputs, called “word embedding”, has been successfully used in various natural language processing tasks. It has also been used as an additional or alternative linguistic input feature to a neural-network-based acoustic model for TTS systems. In this paper, we further investigate the use of this embedding technique to represent phonemes, syllables and phrases for acoustic models based on recurrent and feed-forward neural networks. Results of the experiments show that most of these continuous representations cannot significantly improve the system's performance when they are fed into the acoustic model either as an additional component or as a replacement for the conventional prosodic context. However, subjective evaluation shows that the continuous representation of phrases can achieve significant improvement when it is combined with the prosodic context as input to the acoustic model based on the feed-forward neural network.

  • N-gram Approximation of Latent Words Language Models for Domain Robust Automatic Speech Recognition Open Access

    Ryo MASUMURA  Taichi ASAMI  Takanobu OBA  Hirokazu MASATAKI  Sumitaka SAKAUCHI  Satoshi TAKAHASHI  

     
    PAPER-Language modeling

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2462-2470

    This paper aims to improve the domain robustness of language modeling for automatic speech recognition (ASR). To this end, we focus on applying the latent words language model (LWLM) to ASR. LWLMs are generative models whose structure is based on Bayesian soft class-based modeling with a vast latent variable space. Their flexible attributes help us to efficiently realize the effects of smoothing and dimensionality reduction and so address the data sparseness problem; LWLMs constructed from limited domain data are expected to robustly cover unknown multiple domains in ASR. However, this flexibility seriously increases computational complexity. To rigorously compute the generative probability for an observed word sequence, we must consider the huge number of possible latent word assignments. Since this is computationally impractical, some approximation is inevitable for ASR implementation. To solve the problem and apply this approach to ASR, this paper presents an n-gram approximation of LWLM. The n-gram approximation is a method that approximates LWLM as a simple back-off n-gram structure, and offers LWLM-based robust one-pass ASR decoding. Our experiments verify the effectiveness of our approach by evaluating perplexity and ASR performance on not only in-domain data sets but also out-of-domain data sets.

  • Statistical Bandwidth Extension for Speech Synthesis Based on Gaussian Mixture Model with Sub-Band Basis Spectrum Model

    Yamato OHTANI  Masatsune TAMURA  Masahiro MORITA  Masami AKAMINE  

     
    PAPER-Voice conversion

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2481-2489

    This paper describes a novel statistical bandwidth extension (BWE) technique based on a Gaussian mixture model (GMM) and a sub-band basis spectrum model (SBM), in which each dimensional component represents a specific acoustic space in the frequency domain. The proposed method can perform BWE on speech data with an arbitrary frequency bandwidth, whereas conventional methods perform the conversion from fixed narrow-band data. In the proposed method, we train a GMM with SBM parameters extracted from full-band spectra in advance. According to the bandwidth of the input signal, the trained GMM is reconstructed into a GMM of the joint probability density between low-band SBM and high-band SBM components. Then high-band SBM components are estimated from the low-band SBM components of the input signal based on the reconstructed GMM. Finally, BWE is achieved by adding the spectra decoded from the estimated high-band SBM components to those of the input signal. To construct the full-band signal from the narrow-band one, we apply this method to log-amplitude spectra and aperiodic components. Objective and subjective evaluation results show that the proposed method extends the bandwidth of speech data robustly for the log-amplitude spectra. Experimental results also indicate that the aperiodic component extracted from the upsampled narrow-band signal achieves the same performance as the restored and the full-band aperiodic components in the proposed method.

  • A Statistical Sample-Based Approach to GMM-Based Voice Conversion Using Tied-Covariance Acoustic Models

    Shinnosuke TAKAMICHI  Tomoki TODA  Graham NEUBIG  Sakriani SAKTI  Satoshi NAKAMURA  

     
    PAPER-Voice conversion

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2490-2498

    This paper presents a novel statistical sample-based approach for Gaussian Mixture Model (GMM)-based Voice Conversion (VC). Although GMM-based VC has the promising flexibility of model adaptation, the quality of converted speech is significantly worse than that of natural speech. This paper addresses the problem of inaccurate modeling, which is one of the main causes of the quality degradation. Recently, we have proposed statistical sample-based speech synthesis using rich context models for high-quality and flexible Hidden Markov Model (HMM)-based Text-To-Speech (TTS) synthesis. This method makes it possible not only to produce high-quality speech by introducing ideas from unit selection synthesis, but also to preserve the flexibility of the original HMM-based TTS. In this paper, we apply this idea to GMM-based VC. The rich context models are first trained for individual joint speech feature vectors, and then we gather them mixture by mixture to form a Rich context-GMM (R-GMM). In conversion, an iterative generation algorithm using R-GMMs is used to convert speech parameters, after initialization using over-trained probability distributions. Because the proposed method utilizes individual speech features, and its formulation is the same as that of conventional GMM-based VC, it makes it possible to produce high-quality speech while keeping the flexibility of the original GMM-based VC. The experimental results demonstrate that the proposed method yields significant improvements in terms of speech quality and speaker individuality in converted speech.

  • Speeding up Deep Neural Networks in Speech Recognition with Piecewise Quantized Sigmoidal Activation Function

    Anhao XING  Qingwei ZHAO  Yonghong YAN  

     
    LETTER-Acoustic modeling

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2558-2561

    This paper proposes a new quantization framework for the activation function of deep neural networks (DNNs). We implement a fixed-point DNN by quantizing the activations into powers-of-two integers. The costly multiplication operations in DNN inference can then be replaced with low-cost bit-shifts, which massively reduces computation. Thus, applying DNN-based speech recognition on embedded systems becomes much easier. Experiments show that the proposed method leads to no performance degradation.
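The idea of replacing a multiplication with a bit-shift can be illustrated with a minimal sketch. It is not the letter's actual quantizer: the function names, the rounding rule (nearest exponent in the log domain), and the `max_shift` limit are all assumptions made for illustration. An activation in (0, 1] is approximated by 2^-k, so multiplying an integer fixed-point weight by it reduces to a right shift by k.

```python
import math

def quantize_to_power_of_two(activation, max_shift=7):
    """Return k so that 2**-k approximates an activation in (0, 1].

    Hypothetical rule: round -log2(activation) to the nearest integer
    and clamp it to [0, max_shift]. Non-positive inputs map to max_shift.
    """
    if activation <= 0.0:
        return max_shift
    k = round(-math.log2(activation))
    return min(max(k, 0), max_shift)

def shift_multiply(weight_fixed, k):
    """Multiply an integer fixed-point weight by 2**-k using a right shift.

    Python's >> floors the result, also for negative integers.
    """
    return weight_fixed >> k

# A sigmoid output of about 0.27 is quantized to 2**-2, so the product
# with the fixed-point weight 64 becomes 64 >> 2.
k = quantize_to_power_of_two(0.27)
product = shift_multiply(64, k)
```

The shift gives the exact product only for the quantized activation; the error relative to the original activation depends on how coarse the power-of-two grid is.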

  • Transfer Semi-Supervised Non-Negative Matrix Factorization for Speech Emotion Recognition

    Peng SONG  Shifeng OU  Xinran ZHANG  Yun JIN  Wenming ZHENG  Jinglei LIU  Yanwei YU  

     
    LETTER-Speech and Hearing

      Publicized:
    2016/07/01
      Vol:
    E99-D No:10
      Page(s):
    2647-2650

    In practice, emotional speech utterances are often collected from different devices or under different conditions, which leads to a discrepancy between the training and testing data and results in a sharp decrease in recognition rates. To solve this problem, a novel transfer semi-supervised non-negative matrix factorization (TSNMF) method is presented in this letter. A semi-supervised non-negative matrix factorization (SNMF) algorithm, utilizing both labeled source and unlabeled target data, is adopted to learn common feature representations. Meanwhile, the maximum mean discrepancy (MMD) is employed as a similarity measurement to reduce the distance between the feature distributions of the two databases. Finally, the TSNMF algorithm, which optimizes the SNMF and MMD functions together, is proposed to obtain robust feature representations across databases. Extensive experiments demonstrate that, in comparison to the state-of-the-art approaches, our proposed method can significantly improve the cross-corpus recognition rates.
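As a rough illustration of the MMD term mentioned above, the following sketch computes the standard biased estimate of the squared MMD between two small sample sets. The RBF kernel, the `gamma` value, and the toy feature vectors are assumptions for illustration only; the letter does not state which kernel or estimator it uses.

```python
import math

def rbf_kernel(u, v, gamma=1.0):
    """RBF kernel exp(-gamma * ||u - v||^2) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between sample sets xs and ys."""
    def mean_kernel(a, b):
        total = sum(rbf_kernel(u, v, gamma) for u in a for v in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

# Toy "source" and "target" features (hypothetical values).
source = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3]]
target = [[1.0, 1.1], [1.2, 0.9], [0.9, 1.3]]

same = mmd_squared(source, source)     # identical sets give zero
shifted = mmd_squared(source, target)  # shifted distribution gives a positive value
```

A value near zero indicates that the two feature distributions look alike under the chosen kernel, which is the property a transfer method would try to encourage.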

  • Investigation of Combining Various Major Language Model Technologies including Data Expansion and Adaptation Open Access

    Ryo MASUMURA  Taichi ASAMI  Takanobu OBA  Hirokazu MASATAKI  Sumitaka SAKAUCHI  Akinori ITO  

     
    PAPER-Language modeling

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2452-2461

    This paper aims to investigate the performance improvements made possible by combining various major language model (LM) technologies and to reveal the interactions between LM technologies in spontaneous automatic speech recognition tasks. While it is clear that recent practical LMs have several problems, isolated use of major LM technologies does not appear to offer sufficient performance. In consideration of this fact, combining various LM technologies has also been examined. However, previous works only focused on modeling technologies with limited text resources, and did not consider other important technologies in practical language modeling, i.e., use of external text resources and unsupervised adaptation. This paper, therefore, employs not only manual transcriptions of target speech recognition tasks but also external text resources. In addition, unsupervised LM adaptation based on multi-pass decoding is added to the combination. We divide LM technologies into three categories and employ key ones including recurrent neural network LMs or discriminative LMs. Our experiments show the effectiveness of combining various LM technologies in not only in-domain tasks, the subject of our previous work, but also out-of-domain tasks. Furthermore, we reveal the relationships between the technologies in both tasks.

  • Investigation of DNN-Based Audio-Visual Speech Recognition

    Satoshi TAMURA  Hiroshi NINOMIYA  Norihide KITAOKA  Shin OSUGA  Yurie IRIBE  Kazuya TAKEDA  Satoru HAYAMIZU  

     
    PAPER-Acoustic modeling

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2444-2451

    Audio-Visual Speech Recognition (AVSR) is one of the techniques used to enhance the robustness of speech recognizers in noisy or real environments. Meanwhile, Deep Neural Networks (DNNs) have recently attracted a lot of attention from researchers in the speech recognition field, because recognition performance can be drastically improved by using DNNs. There are two ways to employ DNN techniques for speech recognition: a hybrid approach and a tandem approach. In the hybrid approach, an emission probability on each Hidden Markov Model (HMM) state is computed using a DNN, while in the tandem approach a DNN is incorporated into a feature extraction scheme. In this paper, we investigate and compare several DNN-based AVSR methods, mainly to clarify how audio and visual modalities should be incorporated using DNNs. We carried out recognition experiments using the CENSREC-1-AV corpus, and we discuss the results to find the best DNN-based AVSR modeling. The results show that a tandem-based method using audio Deep Bottle-Neck Features (DBNFs) and visual ones with multi-stream HMMs is the most suitable, followed by a hybrid approach and another tandem scheme using audio-visual DBNFs.

  • Topics Arising from the WRC-15 with Respect to Satellite-Related Agenda Items Open Access

    Nobuyuki KAWAI  Satoshi IMATA  

     
    INVITED PAPER

      Vol:
    E99-B No:10
      Page(s):
    2113-2120

    With the remarkable advancement of radiocommunication services, including satellite services, the radio-frequency spectrum and the geostationary-satellite orbit are becoming congested. WRC-15 was held in November 2015 to study and implement efficient use of those natural resources. There were a number of satellite-related agenda items associated with frequency allocation, new usages of satellite communications and satellite regulatory issues. This paper gives an overview of the outcomes of these WRC-15 agenda items as well as the agenda items for the next WRC (i.e. WRC-19).

  • Measurement of Wireless LAN Characteristics in Sewer Pipes for Sewer Inspection Systems Using Drifting Wireless Sensor Nodes

    Taiki NAGASHIMA  Yudai TANAKA  Susumu ISHIHARA  

     
    PAPER

      Vol:
    E99-B No:9
      Page(s):
    1989-1997

    Deterioration of sewer pipes is a very important problem in Japan. Sewer inspections have been carried out mainly by visual check or by wired remote robots with a camera. However, such inspection schemes involve high labor and/or monetary cost. With boat-type video cameras or unwired robots, it takes a long time to check the result of the inspection because video data are obtained only after the equipment is retrieved from the pipe. To realize low-cost, safe and quick inspection of sewer pipes, we have proposed a sewer inspection system using drifting wireless sensor nodes. Water, soil, and the narrow space in the pipe make long-range, high-throughput wireless radio communication difficult. Therefore, we have to identify a suitable radio frequency and antenna configuration based on the wireless communication characteristics in sewer pipes. If the frequency is higher, the Fresnel zone, i.e., the space needed for line-of-sight propagation, is small, but the path loss in free space is large. On the other hand, if the frequency is lower, the Fresnel zone is large, but the path loss in free space is small. We conducted wireless communication experiments using 920MHz, 2.4GHz, and 5GHz band off-the-shelf devices in an experimental underground pipe. The measurement results show that the wireless communication range at 5GHz (IEEE 802.11a) is over 8m in a 200mm-diameter pipe and is longer than that of the 920MHz (ARIB STD-T108) and 2.4GHz (IEEE 802.11g, IEEE 802.15.4) bands at their maximum transmission power. In addition, we confirmed that devices using IEEE 802.11a at a 54Mbps bit rate can transmit about 43MB of data while they are within the communication range of an AP and drift at 1m/s in a 200mm-diameter pipe, which is more than devices using other bit rates can transmit.
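The frequency trade-off described above can be made concrete with the textbook free-space formulas for the first Fresnel zone radius and the free-space path loss. This is only a sketch: a sewer pipe is not free space, and the 8 m link length with an obstacle at the midpoint is an assumed example rather than a value used for this purpose in the paper. The function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def fresnel_radius_m(freq_hz, d1_m, d2_m):
    """First Fresnel zone radius at a point d1 from one end and d2 from the other."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    return math.sqrt(wavelength * d1_m * d2_m / (d1_m + d2_m))

def free_space_path_loss_db(freq_hz, dist_m):
    """Free-space path loss in dB: 20*log10(4*pi*d*f/c)."""
    return 20.0 * math.log10(4.0 * math.pi * dist_m * freq_hz / SPEED_OF_LIGHT)

# Compare the three bands named in the abstract over an assumed 8 m link.
for freq_hz in (920e6, 2.4e9, 5.0e9):
    radius = fresnel_radius_m(freq_hz, 4.0, 4.0)
    loss = free_space_path_loss_db(freq_hz, 8.0)
    print(f"{freq_hz / 1e9:.2f} GHz: Fresnel radius {radius:.2f} m, path loss {loss:.1f} dB")
```

Under these free-space assumptions the Fresnel radius shrinks and the path loss grows as the frequency rises, matching the qualitative statement in the abstract.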

  • Vehicle Detection Using Local Size-Specific Classifiers

    SeungJong NOH  Moongu JEON  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2016/06/17
      Vol:
    E99-D No:9
      Page(s):
    2351-2359

    As the number of surveillance cameras keeps increasing, the demand for automated traffic-monitoring systems is growing. In this paper, we propose a practical vehicle detection method for such systems. In the last decade, vehicle detection has mainly been performed by employing an image scan strategy based on sliding windows, whereby a pre-trained appearance model is applied to all image areas. In this approach, because the appearance models are built from vehicle sample images, the normalization of the scales and aspect ratios of samples can significantly influence the performance of vehicle detection. Thus, to successfully apply sliding window schemes to detection, it is crucial to select the normalization sizes carefully. To address this, we present a novel vehicle detection technique. In contrast to conventional methods that determine the normalization sizes without considering given scene conditions, our technique first learns local region-specific size models based on scene-contextual clues, and then utilizes the obtained size models to normalize samples to construct more elaborate appearance models, namely local size-specific classifiers (LSCs). LSCs can provide advantages in terms of both accuracy and operational speed because they ignore unnecessary information on vehicles that are observable in areas far away from each sliding window position. We conduct experiments on real highway traffic videos, and demonstrate that the proposed method achieves 16% higher detection accuracy and at least 3 times faster operational speed than the state-of-the-art technique.

  • Restriction on Motion of Break Arcs Magnetically Blown-Out by Surrounding Walls in a 450VDC/10A Resistive Circuit

    Keisuke KATO  Junya SEKIKAWA  

     
    PAPER

      Vol:
    E99-C No:9
      Page(s):
    1009-1015

    Silver electrical contacts are separated at constant speed and break arcs are generated between them in a 200V-450VDC and 10A resistive circuit. The motion of the break arcs is restricted by surrounding alumina plates. The transverse magnetic field of a permanent magnet is applied to the break arcs. The arc lengthening time and the motion of the break arcs are investigated while changing the supply voltage and the height of a wall located at the upper side of the break arcs. The results show that a higher supply voltage increases the arc lengthening time. The arc lengthening time increases significantly when the break arcs expand into the whole of the surrounding walls.

  • Analysis over Spectral Efficiency and Power Scaling in Massive MIMO Dual-Hop Systems with Multi-Pair Users

    Yi WANG  Baofeng JI  Yongming HUANG  Chunguo LI  Ying HU  Yewang QIAN  Luxi YANG  

     
    PAPER-Information Theory

      Vol:
    E99-A No:9
      Page(s):
    1665-1673

    This paper considers a massive multiple-input-multiple-output (MIMO) relaying system with multi-pair single-antenna users. The relay node adopts a maximum-ratio combining/maximum-ratio transmission (MRC/MRT) strategy for reception/transmission. We analyze the spectral efficiency (SE) and power scaling laws with respect to the number of relay antennas and other system parameters. First, by using the law of large numbers, we derive a closed-form expression of the SE, from which it is shown that the SE per user increases with the number of relay antennas but decreases with the number of user pairs, both logarithmically. It is further found that the transmit power at the source users and the relay can be continuously reduced as the number of relay antennas becomes large while the SE maintains a constant value, which also means that an energy efficiency gain can be obtained simultaneously. Moreover, it is proved that the number of served user pairs can grow proportionally to the number of relay antennas with an arbitrary SE requirement and no extra power cost. All the analytical results are verified through numerical simulations.

  • Knowledge-Based Reestablishment of Primary Exclusive Region in Database-Driven Spectrum Sharing

    Shota YAMASHITA  Koji YAMAMOTO  Takayuki NISHIO  Masahiro MORIKURA  

     
    PAPER

      Vol:
    E99-B No:9
      Page(s):
    2019-2027

    Technological developments in wireless communication have led to an increasing demand for radio frequencies. This has necessitated the practice of spectrum sharing to ensure optimal usage of the limited frequencies, provided this does not cause interference. This paper presents a framework for managing an unexpected situation in which a primary user experiences harmful interference with regard to database-driven secondary use of spectrum allocated to the primary user towards 5G mobile networks, where the primary user is assumed to be a radar system. In our proposed framework, the primary user informs a database that they are experiencing harmful interference. Receiving the information, the database updates a primary exclusive region in which secondary users are unable to operate in the licensed spectrum. Subsequent to the update, this primary exclusive region depends on the knowledge about the secondary users when the primary user experiences harmful interference, knowledge of which is stored in the database. We assume a circular primary exclusive region centered at a primary receiver and derive an optimal radius of the primary exclusive region by applying stochastic geometry. Then, for each type of knowledge stored in the database for the secondary user, we evaluate the optimal radius for a target probability that the primary user experiences harmful interference. The results show that the more detailed the knowledge of the secondary user's density and transmission power stored in the database, the smaller the radius that has to be determined for the primary exclusive region after the update and the more efficient the spatial reuse of the licensed spectrum that can be achieved.

  • Effect of Contact Lubricant on Contact Resistance Characteristics — Contact Resistance of Lubricated Surface and Observation of Lubricant Molecules —

    Terutaka TAMAI  Masahiro YAMAKAWA  Yuta NAKAMURA  

     
    PAPER

      Vol:
    E99-C No:9
      Page(s):
    985-991

    Electrical contact lubricants have been accepted as a means to reduce the friction of contacts and to prevent degradation of contact resistance. However, as the lubricant has an electrical insulation property, its application to the contact surface would seem to be unfavorable for contact resistance. The mechanisms at work in contact interfaces have not been fully understood. In this paper, relationships between contact resistance and contact load were examined with both clean and lubricated surfaces. The orientation of the lubricant molecules was observed in high-magnification STM and AFM images. There was no difference in contact resistance characteristics between clean and lubricated surfaces regardless of lubricant thickness. The molecules were oriented perpendicular to the surface. This fact overturns an established theory of adsorption of non-polar lubricant to a surface.

  • Occurrence of Reignitions of Break Arcs When Moving Range of Arc Spots are Restricted within the Contact Surfaces

    Junya SEKIKAWA  

     
    PAPER

      Vol:
    E99-C No:9
      Page(s):
    992-998

    Silver contacts are separated at constant speed and break arcs are generated in a 300V-450V DC and 10A resistive circuit. The transverse magnetic field of a permanent magnet is applied to the break arcs. The motion of the break arcs, the arc duration and the number of reignitions are investigated when the side surfaces of the contacts are covered with insulator pipes. The following results are obtained. The motion of the break arcs and the arc duration when the anode is covered with the pipe are the same as those without pipes. When the cathode is covered with the pipe, the motion of the break arcs changes from that without the pipes and reignitions occur more frequently. The arc duration becomes longer than that without the pipes because of the occurrence of reignitions. The number of reignitions increases with increasing supply voltage in the range of 300V-400V. The period of occurrence of the reignition with pipes is shorter than that when the cathode is covered with the pipe.

  • Observation of Break Arc Rotated by Radial Magnetic Field in a 48VDC Resistive Circuit Using Two High-Speed Cameras

    Jun MATSUOKA  Junya SEKIKAWA  

     
    BRIEF PAPER

      Vol:
    E99-C No:9
      Page(s):
    1027-1030

    Break arcs are rotated by the radial magnetic field formed by a magnet embedded in a fixed cathode contact. The break arcs are generated in a 48VDC resistive circuit. The circuit current when the contacts are closed is 10A. The depth of the magnet is varied from 1mm to 4mm to change the strength of the radial magnetic field that rotates the break arcs. Images of the break arcs are taken by two high-speed cameras from two directions and the rotational motion of the break arcs is observed. The rotational period of the break arcs is investigated. The following results are obtained. The break arcs rotate clockwise on the cathode surface as seen from the anode side. This rotation direction conforms to the direction of the Lorentz force that the radial magnetic field exerts on the break arcs. The rotational period gradually decreases during the break operation. When the depth of the magnet is larger, the rotational period becomes longer.

  • Complex Networks Clustering for Lower Power Scan Segmentation in At-Speed Testing

    Zhou JIANG  Guiming LUO  Kele SHEN  

     
    PAPER-Electronic Circuits

      Vol:
    E99-C No:9
      Page(s):
    1071-1079

    The scan segmentation method is an efficient solution to the test power problem; however, the use of multiple capture cycles may cause capture violations, thereby leading to fault coverage loss. This issue is much more severe in at-speed testing. In this paper, two scan partition schemes based on complex networks clustering are proposed to minimize the capture violations without increasing test-data volume or adding area overhead. In the partition process, we use a more accurate notion, spoiled nodes, instead of violation edges to analyse the dependency of flip-flops (ffs), and we use the shortest-path betweenness (SPB) method and the Laplacian-based graph partition method to find the best combination of these flip-flops. Beyond that, the proposed methods can use any given power-unaware set of patterns to test circuits, reducing both shift and capture power in at-speed testing. Extensive experiments have been performed on the ISCAS89 and IWLS2005 reference circuits to verify the effectiveness of the proposed methods.

  • Multiple Multicast Transmission Exploiting Channel Simplification

    Changyong SHIN  Yong-Jai PARK  

     
    LETTER-Communication Theory and Signals

      Vol:
    E99-A No:9
      Page(s):
    1745-1749

    In this letter, we present a spectrally efficient multicast method which enables a transmitter to simultaneously transmit multiple multicast streams without any interference among multicast groups. By using unique combiners at receivers with multiple antennas within each multicast group, the proposed method simplifies the multiple channels between the transmitter and the receivers to an equivalent channel. In addition, we establish the sufficient condition on the system configuration that must be satisfied for the channel simplification and provide a combiner design technique for the receivers. To remove interference among multicast groups, the precoder for the transmitter is designed by utilizing the equivalent channels. By exploiting time resources efficiently, the channel simplification (CS) based method achieves a higher sum rate than the time division multiplexing (TDM) based method, which the existing multicast techniques fundamentally employ, in the high signal-to-noise ratio (SNR) regime. Furthermore, we present a multicast method combining the CS based method with the TDM based method to utilize the benefits of both methods. Simulation results demonstrate that the combined multicast method obtains better sum rate performance over the entire SNR regime.
