
Keyword Search Result

[Keyword] SPE(2504hit)

441-460hit(2504hit)

  • Speech Recognition of English by Japanese Using Lexicon Represented by Multiple Reduced Phoneme Sets

    Xiaoyun WANG  Seiichi YAMAMOTO  

     
    PAPER-Speech and Hearing

      Publicized:
    2015/09/10
      Vol:
    E98-D No:12
      Page(s):
    2271-2279

    Recognition of second language (L2) speech is still a challenging task even for state-of-the-art automatic speech recognition (ASR) systems, partly because pronunciation by L2 speakers is usually significantly influenced by the mother tongue of the speakers. The authors previously proposed using a reduced phoneme set (RPS) instead of the canonical one of L2 when the mother tongue of speakers is known, and demonstrated that this reduced phoneme set improved the recognition performance through experiments using English utterances spoken by Japanese. However, the proficiency of L2 speakers varies widely, as does the influence of the mother tongue on their pronunciation. As a result, the effect of the reduced phoneme set is different depending on the speakers' proficiency in L2. In this paper, the authors examine the relation between proficiency of speakers and a reduced phoneme set customized for them. The experimental results are then used as the basis of a novel speech recognition method using a lexicon in which the pronunciation of each lexical item is represented by multiple reduced phoneme sets, and the implementation of a language model most suitable for that lexicon is described. Experimental results demonstrate the high validity of the proposed method.

  • Supervised Denoising Pre-Training for Robust ASR with DNN-HMM

    Shin Jae KANG  Kang Hyun LEE  Nam Soo KIM  

     
    LETTER-Speech and Hearing

      Publicized:
    2015/09/07
      Vol:
    E98-D No:12
      Page(s):
    2345-2348

    In this letter, we propose a novel supervised pre-training technique for deep neural network (DNN)-hidden Markov model systems to achieve robust speech recognition in adverse environments. In the proposed approach, our aim is to initialize the DNN parameters such that they yield abstract features robust to acoustic environment variations. In order to achieve this, we first derive the abstract features from an early fine-tuned DNN model which is trained based on a clean speech database. By using the derived abstract features as the target values, the standard error back-propagation algorithm with the stochastic gradient descent method is performed to estimate the initial parameters of the DNN. The performance of the proposed algorithm was evaluated on Aurora-4 DB, and better results were observed compared to a number of conventional pre-training methods.

  • A Fundamental Inequality for Lower-Bounding the Error Probability for Classical and Classical-Quantum Multiple Access Channels and Its Applications

    Takuya KUBO  Hiroshi NAGAOKA  

     
    PAPER-Shannon Theory

      Vol:
    E98-A No:12
      Page(s):
    2376-2383

    In the study of the capacity problem for multiple access channels (MACs), a lower bound on the error probability obtained by Han plays a crucial role in the converse parts of several kinds of channel coding theorems in the information-spectrum framework. Recently, Yagi and Oohama showed a tighter bound than the Han bound by means of Polyanskiy's converse. In this paper, we give a new bound which generalizes and strengthens the Yagi-Oohama bound, and demonstrate that the bound plays a fundamental role in deriving extensions of several known bounds. In particular, the Yagi-Oohama bound is generalized in two different directions, i.e., to general input distributions and to general encoders. In addition, we extend these bounds to quantum MACs and apply them to the converse problems for several information-spectrum settings.

  • Using Correlated Regression Models to Calculate Cumulative Attributes for Age Estimation

    Lili PAN  Qiangsen HE  Yali ZHENG  Mei XIE  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2015/08/28
      Vol:
    E98-D No:12
      Page(s):
    2349-2352

    Facial age estimation requires accurately capturing the mapping relationship between facial features and corresponding ages, so as to precisely estimate ages for new input facial images. Previous works usually use a one-layer regression model to learn this complex mapping relationship, resulting in low estimation accuracy. In this letter, we propose a new gender-specific regression model with a two-layer structure for more accurate age estimation. Unlike recent two-layer models that use a global regressor to calculate cumulative attributes (CA) and use CA to estimate age, we use gender-specific regressors to calculate CA with more flexibility and precision. Extensive experimental results on the FG-NET and Morph 2 datasets demonstrate the superiority of our method over other state-of-the-art age estimation methods.

  • Performance Enhancement of Cross-Talk Canceller for Four-Speaker System by Selective Speaker Operation

    Su-Jin CHOI  Jeong-Yong BOO  Ki-Jun KIM  Hochong PARK  

     
    LETTER-Speech and Hearing

      Publicized:
    2015/08/25
      Vol:
    E98-D No:12
      Page(s):
    2341-2344

    We propose a method of enhancing the performance of a cross-talk canceller for a four-speaker system with respect to sweet spot size and ringing effect. For a cross-talk canceller to have a large sweet spot, the speaker layout needs to be symmetrical with respect to the listener's position. In addition, the ringing effect of the cross-talk canceller is reduced when many speakers are located close to each other. Based on these properties, the proposed method first selects the two speakers in a four-speaker system that are most symmetrical with respect to the target listener's position and then adds the remaining speakers between these two to the final selection. By operating only these selected speakers, the proposed method enlarges the sweet spot size and reduces the ringing effect. We conducted objective and subjective evaluations and verified that the proposed method improves the performance of the cross-talk canceller compared to the conventional method.

  • F0 Parameterization of Glottalized Tones in HMM-Based Speech Synthesis for Hanoi Vietnamese

    Duy Khanh NINH  Yoichi YAMASHITA  

     
    PAPER-Speech and Hearing

      Publicized:
    2015/09/07
      Vol:
    E98-D No:12
      Page(s):
    2280-2289

    A conventional HMM-based speech synthesis system for Hanoi Vietnamese often suffers from hoarse quality due to incomplete F0 parameterization of glottalized tones. Since estimating F0 from a glottalized waveform is rather problematic for ordinary F0 extractors, we propose a pitch marking algorithm where pitch marks are propagated from regular regions of a speech signal to glottalized ones, from which complete F0 contours for the glottalized tones are derived. Both objective and listening tests confirmed that the proposed F0 parameterization scheme significantly reduces the hoarseness while slightly improving the tone naturalness of synthetic speech. The pitch marking algorithm works as a refinement step based on the results of an F0 extractor. Therefore, the proposed scheme can be combined with any F0 extractor.

  • Speech Enhancement Combining NMF Weighted by Speech Presence Probability and Statistical Model

    Yonggang HU  Xiongwei ZHANG  Xia ZOU  Gang MIN  Meng SUN  Yunfei ZHENG  

     
    LETTER-Speech and Hearing

      Vol:
    E98-A No:12
      Page(s):
    2701-2704

    Conventional non-negative matrix factorization (NMF)-based speech enhancement is accomplished by iterative updating with prior knowledge of the clean speech and noise spectral bases. Using a probabilistic estimate of whether speech is present in a given frame, this letter proposes a speech enhancement algorithm that incorporates the speech presence probability (SPP) obtained via noise estimation into the NMF process. To take advantage of both the NMF-based and statistical model-based approaches, the final enhanced speech is obtained by applying a statistical model-based filter to the output of the SPP-weighted NMF. Objective evaluations using perceptual evaluation of speech quality (PESQ) on TIMIT with 20 noise types at various signal-to-noise ratio (SNR) levels demonstrate the superiority of the proposed algorithm over the conventional NMF and statistical model-based baselines.

  • Circularity of the Fractional Fourier Transform and Spectrum Kurtosis for LFM Signal Detection in Gaussian Noise Model

    Guang Kuo LU  Man Lin XIAO  Ping WEI  Hong Shu LIAO  

     
    LETTER-Digital Signal Processing

      Vol:
    E98-A No:12
      Page(s):
    2709-2712

    This letter investigates the circularity of fractional Fourier transform (FRFT) coefficients containing noise only, and proves that all coefficients coming from white Gaussian noise are circular via the discrete FRFT. In order to use the spectrum kurtosis (SK) as a Gaussian test to check if linear frequency modulation (LFM) signals are present in a set of FRFT points, the effect of the noncircularity of Gaussian variables upon the SK of FRFT coefficients is studied. The SK of the αth-order FRFT coefficients for LFM signals embedded in white Gaussian noise is also derived in this letter. Finally, a signal detection algorithm based on FRFT and SK is proposed. The effectiveness and robustness of this algorithm are evaluated via simulations under lower SNR and weaker components.

  • Beyond 110 GHz InP-HEMT Based Mixer Module Using Flip-Chip Assembly for Precise Spectrum Analysis

    Shoichi SHIBA  Masaru SATO  Hiroshi MATSUMURA  Yoichi KAWANO  Tsuyoshi TAKAHASHI  Toshihide SUZUKI  Yasuhiro NAKASHA  Taisuke IWAI  Naoki HARA  

     
    PAPER

      Vol:
    E98-C No:12
      Page(s):
    1112-1119

    A wide-bandwidth fundamental mixer operating at a frequency above 110 GHz for precise spectrum analysis was developed using the InP HEMT technology. A single-ended resistive mixer was adopted for the mixer circuit. An IF amplifier and LO buffer amplifier were also developed and integrated into the mixer chip. As for packaging into a metal block module, a flip-chip bonding technique was introduced. Compared to face-up mounting with wire connections, flip-chip bonding exhibited good frequency flatness in signal loss. The mixer module with a built-in IF amplifier achieved a conversion gain of 5 dB at an RF frequency of 135 GHz and a 3-dB bandwidth of 35 GHz. The mixer module with an LO buffer amplifier operated well even at an LO power of -20 dBm.

  • On Makespan, Migrations, and QoS Workloads' Execution Times in High Speed Data Centers (Open Access)

    Daniel LAGO  Edmundo MADEIRA  Deep MEDHI  

     
    INVITED PAPER

      Vol:
    E98-B No:11
      Page(s):
    2099-2110

    With the growth of cloud-based services, cloud data centers are experiencing large growth. A key component in a cloud data center is the network technology deployed. In particular, Ethernet technology, commonly deployed in cloud data centers, is already envisioned for 10 Tbps Ethernet. In this paper, we study and analyze the makespan, workload execution times, and virtual machine migrations as the network speed increases. In particular, we consider homogeneous and heterogeneous data centers, virtual machine scheduling algorithms, and workload scheduling algorithms. Results obtained from our study indicate that an increase in network speed reduces makespan and workload execution times, while contributing to an increase in the number of virtual machine migrations. We further observed that how the number of migrations varies with network speed also depends on the virtual machine scheduling algorithm employed.

  • Error Correction Using Long Context Match for Smartphone Speech Recognition

    Yuan LIANG  Koji IWANO  Koichi SHINODA  

     
    PAPER-Speech and Hearing

      Publicized:
    2015/07/31
      Vol:
    E98-D No:11
      Page(s):
    1932-1942

    Most error correction interfaces for speech recognition applications on smartphones require the user to first mark an error region and choose the correct word from a candidate list. We propose a simple multimodal interface to make the process more efficient. We develop Long Context Match (LCM) to get candidates that complement the conventional word confusion network (WCN). Assuming that not only the preceding words but also the succeeding words of the error region are validated by users, we use such contexts to search higher-order n-gram corpora for matching word sequences. For this purpose, we also utilize Web text data. Furthermore, we propose a combination of LCM and WCN (“LCM + WCN”) to provide users with candidate lists that are more relevant than those yielded by WCN alone. We compare our interface with the WCN-based interface on the Corpus of Spontaneous Japanese (CSJ). Our proposed “LCM + WCN” method improved the 1-best accuracy by 23%, improved the Mean Reciprocal Rank (MRR) by 28%, and our interface reduced the user's load by 12%.

  • Robust ASR Based on ETSI Advanced Front-End Using Complex Speech Analysis

    Keita HIGA  Keiichi FUNAKI  

     
    PAPER

      Vol:
    E98-A No:11
      Page(s):
    2211-2219

    The advanced front-end (AFE) for automatic speech recognition (ASR) was standardized by the European Telecommunications Standards Institute (ETSI). The AFE provides speech enhancement realized by an iterative Wiener filter (IWF) in which a smoothed FFT spectrum over adjacent frames is used to design the filter. We have previously proposed robust time-varying complex Auto-Regressive (TV-CAR) speech analysis for an analytic signal and evaluated its performance in speech processing tasks such as F0 estimation and speech enhancement. TV-CAR analysis can estimate a more accurate spectrum than FFT, especially at low frequencies, because of the nature of the analytic signal. In addition, TV-CAR can estimate a more accurate speech spectrum in the presence of additive noise. In this paper, a time-invariant version of wide-band TV-CAR analysis is introduced to the IWF in the AFE and is evaluated using the CENSREC-2 database and its baseline script.

  • Acoustic Event Detection in Speech Overlapping Scenarios Based on High-Resolution Spectral Input and Deep Learning

    Miquel ESPI  Masakiyo FUJIMOTO  Tomohiro NAKATANI  

     
    PAPER-Speech and Hearing

      Publicized:
    2015/06/23
      Vol:
    E98-D No:10
      Page(s):
    1799-1807

    We present a method for recognition of acoustic events in conversation scenarios where speech usually overlaps with other acoustic events. While speech is usually considered the most informative acoustic event in a conversation scene, it does not always contain all the information. Non-speech events, such as a door knock, steps, or keyboard typing, can reveal aspects of the scene that speakers miss or avoid mentioning. Moreover, being able to robustly detect these events could further support speech enhancement and recognition systems by providing useful information cues about the surrounding scenarios and noise. In acoustic event detection, state-of-the-art techniques are typically based on derived features (e.g., MFCC or Mel-filter-banks) which have successfully parameterized the spectrogram of speech but reduce resolution and detail when we are targeting other kinds of events. In this paper, we propose a method that learns features in an unsupervised manner from high-resolution spectrogram patches (considering a patch as a certain number of consecutive frame features stacked together), and integrates them within the deep neural network framework to detect and classify acoustic events. Superiority over both previous works in the field and similar approaches based on derived features has been assessed by statistical measures and evaluation with the CHIL2007 corpus, an annotated database of seminar recordings.

  • Measurement-Based Spectrum Database for Flexible Spectrum Management

    Koya SATO  Masayuki KITAMURA  Kei INAGE  Takeo FUJII  

     
    PAPER

      Vol:
    E98-B No:10
      Page(s):
    2004-2013

    In this paper, we propose the novel concept of a spectrum database for improving the efficiency of spectrum utilization. In the current design of TV white space spectrum databases, a propagation model is utilized to determine the spectrum availability. However, this propagation model has poor accuracy for radio environment estimation because it requires a large interference margin for the PU coverage area to ensure protection of primary users (PUs); thus, it decreases the spectrum sharing efficiency. The proposed spectrum database consists of radio environment measurement results from sensors on mobile terminals such as vehicles and smart phones. In the proposed database, actual measurements of radio signals are used to estimate radio information regarding PUs. Because the sensors on mobile terminals can gather a large amount of data, accurate propagation information can be obtained, including information regarding propagation loss and shadowing. In this paper, we first introduce the architecture of the proposed spectrum database. Then, we present experimental results for the database construction using actual TV broadcast signals. Additionally, from the evaluation results, we discuss the extent to which the proposed database can mitigate the excess interference margin.

  • Robust Voice Activity Detection Algorithm Based on Feature of Frequency Modulation of Harmonics and Its DSP Implementation

    Chung-Chien HSU  Kah-Meng CHEONG  Tai-Shih CHI  Yu TSAO  

     
    PAPER-Speech and Hearing

      Publicized:
    2015/07/10
      Vol:
    E98-D No:10
      Page(s):
    1808-1817

    This paper proposes a voice activity detection (VAD) algorithm based on an energy-related feature of the frequency modulation of harmonics. A multi-resolution spectro-temporal analysis framework, which was developed to extract texture features of the audio signal from its Fourier spectrogram, is used to extract frequency modulation features of the speech signal. The proposed algorithm labels the voice-active segments of the speech signal by comparing the energy-related feature of the frequency modulation of harmonics with a threshold. Then, the proposed VAD is implemented on one of Texas Instruments' (TI) digital signal processor (DSP) platforms for real-time operation. Simulations conducted on the DSP platform demonstrate that the proposed VAD performs significantly better than three standard VADs, ITU-T G.729B, ETSI AMR1 and AMR2, in non-stationary noise in terms of the receiver operating characteristic (ROC) curves and the recognition rates from a practical distributed speech recognition (DSR) system.

  • A Novel Iterative Speaker Model Alignment Method from Non-Parallel Speech for Voice Conversion

    Peng SONG  Wenming ZHENG  Xinran ZHANG  Yun JIN  Cheng ZHA  Minghai XIN  

     
    LETTER-Speech and Hearing

      Vol:
    E98-A No:10
      Page(s):
    2178-2181

    Most current voice conversion methods are based on parallel speech, which is not easily obtained in practice. In this letter, a novel iterative speaker model alignment (ISMA) method is proposed to address this problem. First, the source and target speaker models are each trained from the background model by adopting the maximum a posteriori (MAP) algorithm. Then, a novel ISMA method is presented for alignment and transformation of spectral features. Finally, the proposed ISMA approach is further combined with a Gaussian mixture model (GMM) to improve the conversion performance. A series of objective and subjective experiments are carried out on the CMU ARCTIC dataset, and the results demonstrate that the proposed method significantly outperforms the state-of-the-art approach.

  • User Equipment Centric Downlink Access in Unlicensed Spectrum for Heterogeneous Mobile Network (Open Access)

    Riichi KUDO  B. A. Hirantha Sithira ABEYSEKERA  Yusuke ASAI  Takeo ICHIKAWA  Yasushi TAKATORI  Masato MIZOGUCHI  

     
    PAPER

      Vol:
    E98-B No:10
      Page(s):
    1969-1977

    Combining heterogeneous wireless networks that cross licensed and unlicensed spectra is a promising way of supporting the surge in mobile traffic. The unlicensed band is mostly used by wireless LAN (WLAN) nodes which employ carrier sense multiple access/collision avoidance (CSMA/CA). Since the number of WLAN devices and their traffic are increasing, the wireless resource of the unlicensed band is expected to be more depleted in the 2020s. In such a wireless environment, the throughput could be extremely low and unstable due to the hidden terminal problem and exposed terminal problem, despite the large resources of the allocated frequency band and high peak PHY rate. In this paper, we propose user equipment (UE) centric access in the unlicensed band, with support by licensed band access in the mobile network. The proposed access enables robust downlink transmission from the access point (AP) to the UEs by mitigating the hidden terminal problem. The licensed spectrum access passes information on the user data waiting at the AP to the UEs and triggers UE reception opportunity (RXOP) acquisition. Furthermore, the adaptive use of UE centric downlink access is presented by using the channel utilization measured at the AP. Computer simulations confirm that licensed access assistance enhances the robustness of the unlicensed band access against the hidden terminal problem.

  • Separation of Mass Spectra Based on Probabilistic Latent Component Analysis for Explosives Detection

    Yohei KAWAGUCHI  Masahito TOGAMI  Hisashi NAGANO  Yuichiro HASHIMOTO  Masuyuki SUGIYAMA  Yasuaki TAKADA  

     
    PAPER

      Vol:
    E98-A No:9
      Page(s):
    1888-1897

    A new algorithm for separating mass spectra into individual substances for explosives detection is proposed. In the field of mass spectrometry, separation methods, such as principal-component analysis (PCA) and independent-component analysis (ICA), are widely used. In the case of mass spectra, however, no component has negative values, and the orthogonality condition imposed on components does not necessarily hold. Because these methods allow negative values and PCA imposes an orthogonality condition, they are not suitable for separation of mass spectra. The proposed algorithm is based on probabilistic latent-component analysis (PLCA). PLCA is a statistical formulation of non-negative matrix factorization (NMF) using KL divergence. Because PLCA imposes the constraint of non-negativity but not orthogonality, the algorithm is effective for separating components of mass spectra. In addition, to estimate the components more accurately, a sparsity constraint is applied to PLCA for explosives detection. The main contribution is the industrial application of the algorithm to an explosives-detection system. Results of an experimental evaluation of the algorithm with data obtained in a real railway station demonstrate that the proposed algorithm outperforms PCA and ICA. Also, calculation-time results demonstrate that the algorithm can work in real time.

  • Mass Spectra Separation for Explosives Detection by Using an Attenuation Model

    Yohei KAWAGUCHI  Masahito TOGAMI  Hisashi NAGANO  Yuichiro HASHIMOTO  Masuyuki SUGIYAMA  Yasuaki TAKADA  

     
    PAPER

      Vol:
    E98-A No:9
      Page(s):
    1898-1905

    A new algorithm for separating mass spectra into individual substances is proposed for explosives detection. The conventional algorithm based on probabilistic latent component analysis (PLCA) is effective in many cases because it makes use of the fact that non-negativity and sparsity hold for mass spectra in explosives detection. The algorithm, however, fails to separate mass spectra in some cases because uncertainty cannot be resolved by non-negativity and sparsity constraints alone. To resolve the uncertainty, an algorithm based on shift-invariant PLCA (SIPLCA) utilizing temporal correlation of mass spectra is proposed in this paper. In addition, to prevent overfitting, the temporal correlation is modeled with a function representing attenuation by focusing on the fact that the amount of a substance is attenuated continuously and slowly with time. Results of an experimental evaluation of the algorithm with data obtained in a real railway station demonstrate that the proposed algorithm outperforms the PLCA-based conventional algorithm and the simple SIPLCA-based one. The main novelty of this paper is that an evaluation of the detection performance of explosives detection is demonstrated. Results of the evaluation indicate that the proposed separation algorithm can improve the detection performance.

  • Motion of Break Arcs Occurring between Silver Electrical Contacts with Copper Arc Runners

    Haruki MIYAGAWA  Junya SEKIKAWA  

     
    BRIEF PAPER

      Vol:
    E98-C No:9
      Page(s):
    919-922

    Copper arc runners are fixed on silver electrical contacts. Break arcs are generated between the contacts in a DC resistive circuit. The circuit current when the contacts are closed is 10A. The supply voltage is varied from 200V to 450V. The following results are shown. Cathode spots stay on the cathode surface, but anode spots run on the runner when the supply voltage is 250V or higher. When the supply voltage is greater than 250V, the break arcs run on the runner when the arcs are successfully extinguished, and stay on the runner when arc extinction fails. The arc lengths just before arc extinction with and without the runners are also investigated. The arc lengths are the same with or without the runners for each supply voltage.
