IEICE global.ieice.org Site

Keyword Search Result

[Keyword] SPE(2504hit)

441-460hit(2504hit)

Speech Recognition of English by Japanese Using Lexicon Represented by Multiple Reduced Phoneme Sets
Xiaoyun WANG Seiichi YAMAMOTO

PAPER-Speech and Hearing

Publicized:
2015/09/10
Vol:
E98-D No:12
Page(s):
2271-2279
Recognition of second language (L2) speech is still a challenging task even for state-of-the-art automatic speech recognition (ASR) systems, partly because pronunciation by L2 speakers is usually significantly influenced by the mother tongue of the speakers. The authors previously proposed using a reduced phoneme set (RPS) instead of the canonical one of L2 when the mother tongue of speakers is known, and demonstrated that this reduced phoneme set improved the recognition performance through experiments using English utterances spoken by Japanese. However, the proficiency of L2 speakers varies widely, as does the influence of the mother tongue on their pronunciation. As a result, the effect of the reduced phoneme set is different depending on the speakers' proficiency in L2. In this paper, the authors examine the relation between proficiency of speakers and a reduced phoneme set customized for them. The experimental results are then used as the basis of a novel speech recognition method using a lexicon in which the pronunciation of each lexical item is represented by multiple reduced phoneme sets, and the implementation of a language model most suitable for that lexicon is described. Experimental results demonstrate the high validity of the proposed method.
Supervised Denoising Pre-Training for Robust ASR with DNN-HMM
Shin Jae KANG Kang Hyun LEE Nam Soo KIM

LETTER-Speech and Hearing

Publicized:
2015/09/07
Vol:
E98-D No:12
Page(s):
2345-2348
In this letter, we propose a novel supervised pre-training technique for deep neural network (DNN)-hidden Markov model systems to achieve robust speech recognition in adverse environments. In the proposed approach, our aim is to initialize the DNN parameters such that they yield abstract features robust to acoustic environment variations. In order to achieve this, we first derive the abstract features from an early fine-tuned DNN model which is trained based on a clean speech database. By using the derived abstract features as the target values, the standard error back-propagation algorithm with the stochastic gradient descent method is performed to estimate the initial parameters of the DNN. The performance of the proposed algorithm was evaluated on Aurora-4 DB, and better results were observed compared to a number of conventional pre-training methods.
A Fundamental Inequality for Lower-Bounding the Error Probability for Classical and Classical-Quantum Multiple Access Channels and Its Applications
Takuya KUBO Hiroshi NAGAOKA

PAPER-Shannon Theory

Vol:
E98-A No:12
Page(s):
2376-2383
In the study of the capacity problem for multiple access channels (MACs), a lower bound on the error probability obtained by Han plays a crucial role in the converse parts of several kinds of channel coding theorems in the information-spectrum framework. Recently, Yagi and Oohama showed a tighter bound than the Han bound by means of Polyanskiy's converse. In this paper, we give a new bound which generalizes and strengthens the Yagi-Oohama bound, and demonstrate that the bound plays a fundamental role in deriving extensions of several known bounds. In particular, the Yagi-Oohama bound is generalized in two different directions, i.e., to general input distributions and to general encoders. In addition, we extend these bounds to the quantum MACs and apply them to the converse problems for several information-spectrum settings.
Using Correlated Regression Models to Calculate Cumulative Attributes for Age Estimation
Lili PAN Qiangsen HE Yali ZHENG Mei XIE

LETTER-Image Recognition, Computer Vision

Publicized:
2015/08/28
Vol:
E98-D No:12
Page(s):
2349-2352
Facial age estimation requires accurately capturing the mapping relationship between facial features and corresponding ages, so as to precisely estimate ages for new input facial images. Previous works usually use a one-layer regression model to learn this complex mapping relationship, resulting in low estimation accuracy. In this letter, we propose a new gender-specific regression model with a two-layer structure for more accurate age estimation. Unlike recent two-layer models that use a global regressor to calculate cumulative attributes (CA) and use CA to estimate age, we use gender-specific regressors to calculate CA with more flexibility and precision. Extensive experimental results on the FG-NET and Morph 2 datasets demonstrate the superiority of our method over other state-of-the-art age estimation methods.
Performance Enhancement of Cross-Talk Canceller for Four-Speaker System by Selective Speaker Operation
Su-Jin CHOI Jeong-Yong BOO Ki-Jun KIM Hochong PARK

LETTER-Speech and Hearing

Publicized:
2015/08/25
Vol:
E98-D No:12
Page(s):
2341-2344
We propose a method of enhancing the performance of a cross-talk canceller for a four-speaker system with respect to sweet spot size and ringing effect. For the large sweet spot of a cross-talk canceller, the speaker layout needs to be symmetrical to the listener's position. In addition, a ringing effect of the cross-talk canceller is reduced when many speakers are located close to each other. Based on these properties, the proposed method first selects the two speakers in a four-speaker system that are most symmetrical to the target listener's position and then adds the remaining speakers between these two to the final selection. By operating only these selected speakers, the proposed method enlarges the sweet spot size and reduces the ringing effect. We conducted objective and subjective evaluations and verified that the proposed method improves the performance of the cross-talk canceller compared to the conventional method.
F0 Parameterization of Glottalized Tones in HMM-Based Speech Synthesis for Hanoi Vietnamese
Duy Khanh NINH Yoichi YAMASHITA

PAPER-Speech and Hearing

Publicized:
2015/09/07
Vol:
E98-D No:12
Page(s):
2280-2289
A conventional HMM-based speech synthesis system for Hanoi Vietnamese often suffers from hoarse quality due to incomplete F0 parameterization of glottalized tones. Since estimating F0 from a glottalized waveform is rather problematic for usual F0 extractors, we propose a pitch marking algorithm where pitch marks are propagated from regular regions of a speech signal to glottalized ones, from which complete F0 contours for the glottalized tones are derived. Both objective and listening tests confirmed that the proposed F0 parameterization scheme significantly reduces the hoarseness whilst slightly improving the tone naturalness of synthetic speech. The pitch marking algorithm works as a refinement step based on the results of an F0 extractor. Therefore, the proposed scheme can be combined with any F0 extractor.
Speech Enhancement Combining NMF Weighted by Speech Presence Probability and Statistical Model
Yonggang HU Xiongwei ZHANG Xia ZOU Gang MIN Meng SUN Yunfei ZHENG

LETTER-Speech and Hearing

Vol:
E98-A No:12
Page(s):
2701-2704
- Errata [uploaded on November 1, 2016]
The conventional non-negative matrix factorization (NMF)-based speech enhancement is accomplished by updating iteratively with the prior knowledge of the clean speech and noise spectra bases. With the probabilistic estimation of whether the speech is present or not in a certain frame, this letter proposes a speech enhancement algorithm incorporating the speech presence probability (SPP) obtained via noise estimation to the NMF process. To take advantage of both the NMF-based and statistical model-based approaches, the final enhanced speech is achieved by applying a statistical model-based filter to the output of the SPP weighted NMF. Objective evaluations using perceptual evaluation of speech quality (PESQ) on TIMIT with 20 noise types at various signal-to-noise ratio (SNR) levels demonstrate the superiority of the proposed algorithm over the conventional NMF and statistical model-based baselines.
Circularity of the Fractional Fourier Transform and Spectrum Kurtosis for LFM Signal Detection in Gaussian Noise Model
Guang Kuo LU Man Lin XIAO Ping WEI Hong Shu LIAO

LETTER-Digital Signal Processing

Vol:
E98-A No:12
Page(s):
2709-2712
This letter investigates the circularity of fractional Fourier transform (FRFT) coefficients containing noise only, and proves that all coefficients coming from white Gaussian noise are circular via the discrete FRFT. In order to use the spectrum kurtosis (SK) as a Gaussian test to check if linear frequency modulation (LFM) signals are present in a set of FRFT points, the effect of the noncircularity of Gaussian variables upon the SK of FRFT coefficients is studied. The SK of the α-th-order FRFT coefficients for LFM signals embedded in white Gaussian noise is also derived in this letter. Finally, a signal detection algorithm based on FRFT and SK is proposed. The effectiveness and robustness of this algorithm are evaluated via simulations under lower SNR and weaker components.
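As a rough illustration of why SK can serve as a Gaussian test, the sketch below uses a commonly cited moment-based SK estimator for circular complex coefficients, SK = E|X|^4 / (E|X|^2)^2 - 2, which is close to 0 for circular complex Gaussian noise. The abstract does not state the letter's exact SK definition, so this estimator, the function name `spectral_kurtosis`, and the sample data are assumptions made for illustration only; no FRFT is computed here.

```python
import cmath


def spectral_kurtosis(coeffs):
    """Moment-based SK estimate for complex coefficients (assumed definition).

    Computes mean(|X|^4) / mean(|X|^2)^2 - 2. For circular complex Gaussian
    samples the expected value is about 0; a constant-modulus component
    (as a strongly concentrated signal would produce) gives -1.
    """
    n = len(coeffs)
    second_moment = sum(abs(c) ** 2 for c in coeffs) / n
    fourth_moment = sum(abs(c) ** 4 for c in coeffs) / n
    return fourth_moment / (second_moment ** 2) - 2.0


# Hypothetical coefficients with constant modulus and varying phase.
constant_modulus = [cmath.exp(1j * k) for k in range(64)]
sk_value = spectral_kurtosis(constant_modulus)
print(sk_value)
```

A detector along these lines would compare the SK value against a threshold chosen from the noise-only distribution; how the letter sets that threshold is not described in the abstract.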
Beyond 110 GHz InP-HEMT Based Mixer Module Using Flip-Chip Assembly for Precise Spectrum Analysis
Shoichi SHIBA Masaru SATO Hiroshi MATSUMURA Yoichi KAWANO Tsuyoshi TAKAHASHI Toshihide SUZUKI Yasuhiro NAKASHA Taisuke IWAI Naoki HARA

PAPER

Vol:
E98-C No:12
Page(s):
1112-1119
A wide-bandwidth fundamental mixer operating at a frequency above 110GHz for precise spectrum analysis was developed using the InP HEMT technology. A single-ended resistive mixer was adopted for the mixer circuit. An IF amplifier and LO buffer amplifier were also developed and integrated into the mixer chip. As for packaging into a metal block module, a flip-chip bonding technique was introduced. Compared to face-up mounting with wire connections, flip-chip bonding exhibited good frequency flatness in signal loss. The mixer module with a built-in IF amplifier achieved a conversion gain of 5dB at an RF frequency of 135GHz and a 3-dB bandwidth of 35GHz. The mixer module with an LO buffer amplifier operated well even at an LO power of -20dBm.
On Makespan, Migrations, and QoS Workloads' Execution Times in High Speed Data Centers (Open Access)
Daniel LAGO Edmundo MADEIRA Deep MEDHI

INVITED PAPER

Vol:
E98-B No:11
Page(s):
2099-2110
With the growth of cloud-based services, cloud data centers are experiencing large growth. A key component in a cloud data center is the network technology deployed. In particular, Ethernet technology, commonly deployed in cloud data centers, is already envisioned for 10 Tbps Ethernet. In this paper, we study and analyze the makespan, workload execution times, and virtual machine migrations as the network speed increases. In particular, we consider homogeneous and heterogeneous data centers, virtual machine scheduling algorithms, and workload scheduling algorithms. Results obtained from our study indicate that an increase in network speed reduces makespan and workload execution times, while contributing to an increase in the number of virtual machine migrations. We further observed that the behavior of the number of migrations in relation to network speed also depends on the virtual machine scheduling algorithm employed.
Error Correction Using Long Context Match for Smartphone Speech Recognition
Yuan LIANG Koji IWANO Koichi SHINODA

PAPER-Speech and Hearing

Publicized:
2015/07/31
Vol:
E98-D No:11
Page(s):
1932-1942
Most error correction interfaces for speech recognition applications on smartphones require the user to first mark an error region and choose the correct word from a candidate list. We propose a simple multimodal interface to make the process more efficient. We develop Long Context Match (LCM) to get candidates that complement the conventional word confusion network (WCN). Assuming that not only the preceding words but also the succeeding words of the error region are validated by users, we use such contexts to search higher-order n-grams corpora for matching word sequences. For this purpose, we also utilize the Web text data. Furthermore, we propose a combination of LCM and WCN (“LCM + WCN”) to provide users with candidate lists that are more relevant than those yielded by WCN alone. We compare our interface with the WCN-based interface on the Corpus of Spontaneous Japanese (CSJ). Our proposed “LCM + WCN” method improved the 1-best accuracy by 23%, improved the Mean Reciprocal Rank (MRR) by 28%, and our interface reduced the user's load by 12%.
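The two ranking metrics quoted in this abstract, 1-best accuracy and Mean Reciprocal Rank (MRR), can be sketched as follows. This is a minimal illustration of the standard metric definitions, not the authors' evaluation code; the function names and the toy candidate lists are hypothetical.

```python
def mean_reciprocal_rank(ranked_lists, correct_words):
    """Average of 1/rank of the correct word; contributes 0 when it is absent."""
    total = 0.0
    for candidates, correct in zip(ranked_lists, correct_words):
        if correct in candidates:
            total += 1.0 / (candidates.index(correct) + 1)
    return total / len(correct_words)


def one_best_accuracy(ranked_lists, correct_words):
    """Fraction of error regions whose top candidate is the correct word."""
    hits = sum(
        1
        for candidates, correct in zip(ranked_lists, correct_words)
        if candidates and candidates[0] == correct
    )
    return hits / len(correct_words)


# Hypothetical candidate lists for three error regions.
candidates = [["speech", "speed", "peach"], ["wreck", "recognize"], ["nice", "beach"]]
truth = ["speech", "recognize", "each"]

mrr = mean_reciprocal_rank(candidates, truth)  # (1 + 1/2 + 0) / 3
acc = one_best_accuracy(candidates, truth)     # 1 of 3 top candidates correct
print(mrr, acc)
```

MRR rewards a candidate list for placing the correct word near the top even when it is not first, which is why it is reported alongside 1-best accuracy.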
Robust ASR Based on ETSI Advanced Front-End Using Complex Speech Analysis
Keita HIGA Keiichi FUNAKI

PAPER

Vol:
E98-A No:11
Page(s):
2211-2219
The advanced front-end (AFE) for automatic speech recognition (ASR) was standardized by the European Telecommunications Standards Institute (ETSI). The AFE provides speech enhancement realized by an iterative Wiener filter (IWF) in which a smoothed FFT spectrum over adjacent frames is used to design the filter. We have previously proposed robust time-varying complex Auto-Regressive (TV-CAR) speech analysis for an analytic signal and evaluated its performance in speech processing tasks such as F0 estimation and speech enhancement. TV-CAR analysis can estimate a more accurate spectrum than FFT, especially at low frequencies, because of the nature of the analytic signal. In addition, TV-CAR can estimate a more accurate speech spectrum in the presence of additive noise. In this paper, a time-invariant version of wide-band TV-CAR analysis is introduced to the IWF in the AFE and is evaluated using the CENSREC-2 database and its baseline script.
Acoustic Event Detection in Speech Overlapping Scenarios Based on High-Resolution Spectral Input and Deep Learning
Miquel ESPI Masakiyo FUJIMOTO Tomohiro NAKATANI

PAPER-Speech and Hearing

Publicized:
2015/06/23
Vol:
E98-D No:10
Page(s):
1799-1807
We present a method for recognition of acoustic events in conversation scenarios where speech usually overlaps with other acoustic events. While speech is usually considered the most informative acoustic event in a conversation scene, it does not always contain all the information. Non-speech events, such as a door knock, steps, or keyboard typing, can reveal aspects of the scene that speakers miss or avoid mentioning. Moreover, being able to robustly detect these events could further support speech enhancement and recognition systems by providing useful information cues about the surrounding scenarios and noise. In acoustic event detection, state-of-the-art techniques are typically based on derived features (e.g. MFCC, or Mel-filter-banks) which have successfully parameterized the spectrogram of speech but reduce resolution and detail when we are targeting other kinds of events. In this paper, we propose a method that learns features in an unsupervised manner from high-resolution spectrogram patches (considering a patch as a certain number of consecutive frame features stacked together), and integrates them within the deep neural network framework to detect and classify acoustic events. Superiority over both previous works in the field, and similar approaches based on derived features, has been assessed by statistical measures and evaluation with the CHIL2007 corpus, an annotated database of seminar recordings.
Measurement-Based Spectrum Database for Flexible Spectrum Management
Koya SATO Masayuki KITAMURA Kei INAGE Takeo FUJII

PAPER

Vol:
E98-B No:10
Page(s):
2004-2013
In this paper, we propose the novel concept of a spectrum database for improving the efficiency of spectrum utilization. In the current design of TV white space spectrum databases, a propagation model is utilized to determine the spectrum availability. However, this propagation model has poor accuracy for radio environment estimation because it requires a large interference margin for the PU coverage area to ensure protection of primary users (PUs); thus, it decreases the spectrum sharing efficiency. The proposed spectrum database consists of radio environment measurement results from sensors on mobile terminals such as vehicles and smart phones. In the proposed database, actual measurements of radio signals are used to estimate radio information regarding PUs. Because the sensors on mobile terminals can gather a large amount of data, accurate propagation information can be obtained, including information regarding propagation loss and shadowing. In this paper, we first introduce the architecture of the proposed spectrum database. Then, we present experimental results for the database construction using actual TV broadcast signals. Additionally, from the evaluation results, we discuss the extent to which the proposed database can mitigate the excess interference margin.
Robust Voice Activity Detection Algorithm Based on Feature of Frequency Modulation of Harmonics and Its DSP Implementation
Chung-Chien HSU Kah-Meng CHEONG Tai-Shih CHI Yu TSAO

PAPER-Speech and Hearing

Publicized:
2015/07/10
Vol:
E98-D No:10
Page(s):
1808-1817
This paper proposes a voice activity detection (VAD) algorithm based on an energy-related feature of the frequency modulation of harmonics. A multi-resolution spectro-temporal analysis framework, which was developed to extract texture features of the audio signal from its Fourier spectrogram, is used to extract frequency modulation features of the speech signal. The proposed algorithm labels the voice-active segments of the speech signal by comparing the energy-related feature of the frequency modulation of harmonics with a threshold. Then, the proposed VAD is implemented on one of the Texas Instruments (TI) digital signal processor (DSP) platforms for real-time operation. Simulations conducted on the DSP platform demonstrate that the proposed VAD performs significantly better than three standard VADs, ITU-T G.729B, ETSI AMR1 and AMR2, in non-stationary noise in terms of the receiver operating characteristic (ROC) curves and the recognition rates from a practical distributed speech recognition (DSR) system.
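The final decision step described in this abstract, comparing a per-frame feature with a threshold to label voice-active segments, can be sketched as below. The sketch assumes the energy-related feature has already been computed for each frame; the feature values, the threshold of 0.5, and the function names are hypothetical, and the paper's spectro-temporal feature extraction is not reproduced.

```python
def label_voice_activity(feature_per_frame, threshold):
    """Mark a frame as voice-active when its feature exceeds the threshold."""
    return [value > threshold for value in feature_per_frame]


def active_segments(labels):
    """Group consecutive active frames into (start, end) pairs, end exclusive."""
    segments = []
    start = None
    for index, active in enumerate(labels):
        if active and start is None:
            start = index
        elif not active and start is not None:
            segments.append((start, index))
            start = None
    if start is not None:
        segments.append((start, len(labels)))
    return segments


# Hypothetical per-frame feature values (one value per analysis frame).
features = [0.1, 0.2, 0.9, 1.1, 0.8, 0.1, 0.05, 0.7, 0.9]
labels = label_voice_activity(features, threshold=0.5)
segments = active_segments(labels)
print(segments)
```

Sweeping the threshold and recording hit and false-alarm rates at each value is how an ROC curve such as the one mentioned in the abstract would typically be traced.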
A Novel Iterative Speaker Model Alignment Method from Non-Parallel Speech for Voice Conversion
Peng SONG Wenming ZHENG Xinran ZHANG Yun JIN Cheng ZHA Minghai XIN

LETTER-Speech and Hearing

Vol:
E98-A No:10
Page(s):
2178-2181
Most current voice conversion methods rely on parallel speech, which is not easily obtained in practice. In this letter, a novel iterative speaker model alignment (ISMA) method is proposed to address this problem. First, the source and target speaker models are each trained from the background model by adopting the maximum a posteriori (MAP) algorithm. Then, a novel ISMA method is presented for alignment and transformation of spectral features. Finally, the proposed ISMA approach is further combined with a Gaussian mixture model (GMM) to improve the conversion performance. A series of objective and subjective experiments are carried out on the CMU ARCTIC dataset, and the results demonstrate that the proposed method significantly outperforms the state-of-the-art approach.
User Equipment Centric Downlink Access in Unlicensed Spectrum for Heterogeneous Mobile Network (Open Access)
Riichi KUDO B. A. Hirantha Sithira ABEYSEKERA Yusuke ASAI Takeo ICHIKAWA Yasushi TAKATORI Masato MIZOGUCHI

PAPER

Vol:
E98-B No:10
Page(s):
1969-1977
Combining heterogeneous wireless networks that cross licensed and unlicensed spectra is a promising way of supporting the surge in mobile traffic. The unlicensed band is mostly used by wireless LAN (WLAN) nodes which employ carrier sense multiple access/collision avoidance (CSMA/CA). Since the number of WLAN devices and their traffic are increasing, the wireless resource of the unlicensed band is expected to be more depleted in the 2020s. In such a wireless environment, the throughput could be extremely low and unstable due to the hidden terminal problem and exposed terminal problem, despite the large resources of the allocated frequency band and the high peak PHY rate. In this paper, we propose user equipment (UE) centric access in the unlicensed band, with support by licensed band access in the mobile network. The proposed access enables robust downlink transmission from the access point (AP) to the UEs by mitigating the hidden terminal problem. The licensed spectrum access passes information on the user data waiting at the AP to the UEs and triggers UE reception opportunity (RXOP) acquisition. Furthermore, the adaptive use of UE centric downlink access is presented by using the channel utilization measured at the AP. Computer simulations confirm that licensed access assistance enhances the robustness of the unlicensed band access against the hidden terminal problem.
Separation of Mass Spectra Based on Probabilistic Latent Component Analysis for Explosives Detection
Yohei KAWAGUCHI Masahito TOGAMI Hisashi NAGANO Yuichiro HASHIMOTO Masuyuki SUGIYAMA Yasuaki TAKADA

PAPER

Vol:
E98-A No:9
Page(s):
1888-1897
A new algorithm for separating mass spectra into individual substances for explosives detection is proposed. In the field of mass spectrometry, separation methods, such as principal-component analysis (PCA) and independent-component analysis (ICA), are widely used. However, the components of mass spectra have no negative values, and the orthogonality condition does not necessarily hold for them. Because these methods allow negative values and PCA imposes an orthogonality condition, they are not suitable for separation of mass spectra. The proposed algorithm is based on probabilistic latent-component analysis (PLCA). PLCA is a statistical formulation of non-negative matrix factorization (NMF) using KL divergence. Because PLCA imposes the constraint of non-negativity but not orthogonality, the algorithm is effective for separating components of mass spectra. In addition, to estimate the components more accurately, a sparsity constraint is applied to PLCA for explosives detection. The main contribution is the industrial application of the algorithm to an explosives-detection system. Results of an experimental evaluation of the algorithm with data obtained in a real railway station demonstrate that the proposed algorithm outperforms PCA and ICA. Also, measurements of calculation time demonstrate that the algorithm can work in real time.
Mass Spectra Separation for Explosives Detection by Using an Attenuation Model
Yohei KAWAGUCHI Masahito TOGAMI Hisashi NAGANO Yuichiro HASHIMOTO Masuyuki SUGIYAMA Yasuaki TAKADA

PAPER

Vol:
E98-A No:9
Page(s):
1898-1905
A new algorithm for separating mass spectra into individual substances is proposed for explosives detection. The conventional algorithm based on probabilistic latent component analysis (PLCA) is effective in many cases because it makes use of the fact that non-negativity and sparsity hold for mass spectra in explosives detection. The algorithm, however, fails to separate mass spectra in some cases because uncertainty cannot be resolved by non-negativity and sparsity constraints alone. To resolve the uncertainty, an algorithm based on shift-invariant PLCA (SIPLCA) utilizing temporal correlation of mass spectra is proposed in this paper. In addition, to prevent overfitting, the temporal correlation is modeled with a function representing attenuation by focusing on the fact that the amount of a substance is attenuated continuously and slowly with time. Results of an experimental evaluation of the algorithm with data obtained in a real railway station demonstrate that the proposed algorithm outperforms the PLCA-based conventional algorithm and the simple SIPLCA-based one. The main novelty of this paper is that an evaluation of the detection performance of explosives detection is demonstrated. Results of the evaluation indicate that the proposed separation algorithm can improve the detection performance.
Motion of Break Arcs Occurring between Silver Electrical Contacts with Copper Arc Runners
Haruki MIYAGAWA Junya SEKIKAWA

BRIEF PAPER

Vol:
E98-C No:9
Page(s):
919-922
Copper arc runners are fixed on silver electrical contacts. Break arcs are generated between the contacts in a DC resistive circuit. The circuit current when the contacts are closed is 10A. The supply voltage is changed from 200V to 450V. The following results are shown. Cathode spots stay on the cathode surface but anode spots run on the runner when the supply voltage is 250V and over. When the supply voltage is greater than 250V, the break arcs run on the runner when the arcs are successfully extinguished, and stay on the runner when arc extinction fails. The arc lengths just before arc extinction with or without the runners are also investigated. The arc lengths are the same with or without the runners for each supply voltage.
