
Keyword Search Result

[Keyword] SPE (2504 hits)

341-360 hits (of 2504)

  • An Improved Perceptual MBSS Noise Reduction with an SNR-Based VAD for a Fully Operational Digital Hearing Aid

    Zhaoyang GUO  Xin'an WANG  Bo WANG  Shanshan YONG  

     
    PAPER-Speech and Hearing

    Publicized: 2017/02/17
    Vol: E100-D No:5
    Page(s): 1087-1096

    This paper first reviews state-of-the-art noise reduction methods and points out their weaknesses in noise reduction performance and speech quality, especially in low signal-to-noise ratio (SNR) environments. It then presents an improved perceptual multiband spectral subtraction (MBSS) noise reduction algorithm (NRA) and a novel robust voice activity detection (VAD) scheme based on an amended sub-band SNR. The proposed SNR-based VAD considerably increases the accuracy of discrimination between noise and speech frames. Simulation results show that the proposed NRA achieves better segmental SNR (segSNR) and perceptual evaluation of speech quality (PESQ) scores than other noise reduction algorithms, especially in low-SNR environments. In addition, a fully operational digital hearing aid chip based on the proposed NRA is designed and fabricated in a 0.13 µm CMOS process. The chip implementation shows that the whole chip draws 1.3 mA at 1.2 V operation. Acoustic tests show that the maximum output sound pressure level (OSPL) is 114.6 dB SPL, the equivalent input noise is 5.9 dB SPL, and the total harmonic distortion is 2.5%. The proposed digital hearing aid chip is therefore a promising candidate for high-performance hearing-aid systems.
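
    As an illustration of the structure this abstract describes (per-band over-subtraction gated by a sub-band-SNR VAD that controls noise-estimate updates), the Python sketch below shows the general idea only. The band count, thresholds, smoothing constants, and the function name `mbss_denoise` are illustrative assumptions, not the parameters of the paper or the fabricated chip.

        # Minimal multiband spectral subtraction with a sub-band-SNR VAD.
        # Illustrative sketch only; not the authors' algorithm or parameters.
        import numpy as np
        from scipy.signal import stft, istft

        def mbss_denoise(x, fs, n_bands=4, snr_gate_db=3.0, alpha=4.0, beta=0.02):
            f, t, X = stft(x, fs, nperseg=256)
            mag, phase = np.abs(X), np.angle(X)
            noise = np.mean(mag[:, :10], axis=1, keepdims=True)   # bootstrap noise estimate
            bands = np.array_split(np.arange(len(f)), n_bands)
            out = np.empty_like(mag)
            for m in range(mag.shape[1]):
                frame = mag[:, m:m+1]
                # sub-band-SNR VAD: treat the frame as noise if every band SNR is below the gate
                band_snr = [10*np.log10(np.sum(frame[b]**2) / (np.sum(noise[b]**2) + 1e-12))
                            for b in bands]
                if max(band_snr) < snr_gate_db:                    # noise-only frame -> update estimate
                    noise = 0.9*noise + 0.1*frame
                for b, snr in zip(bands, band_snr):
                    over = alpha if snr < 0 else max(1.0, alpha - 0.15*snr)  # SNR-dependent over-subtraction
                    out[b, m:m+1] = np.maximum(frame[b] - over*noise[b], beta*frame[b])
            _, y = istft(out*np.exp(1j*phase), fs, nperseg=256)
            return y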

  • Learning Corpus-Invariant Discriminant Feature Representations for Speech Emotion Recognition

    Peng SONG  Shifeng OU  Zhenbin DU  Yanyan GUO  Wenming MA  Jinglei LIU  Wenming ZHENG  

     
    LETTER-Speech and Hearing

    Publicized: 2017/02/02
    Vol: E100-D No:5
    Page(s): 1136-1139

    As a hot topic in speech signal processing, speech emotion recognition methods have developed rapidly in recent years, and some satisfactory results have been achieved. However, most of these methods are trained and evaluated on the same corpus. In reality, the training and testing data are often collected from different corpora, whose feature distributions often differ; these discrepancies greatly degrade recognition performance. To tackle this problem, a novel corpus-invariant discriminant feature representation algorithm, called transfer discriminant analysis (TDA), is presented for speech emotion recognition. The basic idea of TDA is to integrate the kernel LDA algorithm and a similarity measurement between distributions into one objective function. Experimental results under cross-corpus conditions show that the proposed method significantly improves recognition rates.

  • Development of the “VoiceTra” Multi-Lingual Speech Translation System Open Access

    Shigeki MATSUDA  Teruaki HAYASHI  Yutaka ASHIKARI  Yoshinori SHIGA  Hidenori KASHIOKA  Keiji YASUDA  Hideo OKUMA  Masao UCHIYAMA  Eiichiro SUMITA  Hisashi KAWAI  Satoshi NAKAMURA  

     
    INVITED PAPER

    Publicized: 2017/01/13
    Vol: E100-D No:4
    Page(s): 621-632

    This study introduces large-scale field experiments of VoiceTra, the world's first speech-to-speech multilingual translation application for smartphones. Approximately 10 million input utterances have been collected since the experiments commenced, and the usage of the collected data is analyzed and discussed. The study makes several contributions. First, it explains the system configuration, the communication protocol between clients and servers, and the details of the multilingual automatic speech recognition, multilingual machine translation, and multilingual speech synthesis subsystems. Second, it demonstrates the effects of mid-term system updates that use the collected data to improve an acoustic model, a language model, and a dictionary. Third, it analyzes system usage.

  • Correlation-Based Optimal Chirp Rate Allocation for Chirp Spread Spectrum Using Multiple Linear Chirps

    Kwang-Yul KIM  Seung-Woo LEE  Yu-Min HWANG  Jae-Seang LEE  Yong-Sin KIM  Jin-Young KIM  Yoan SHIN  

     
    LETTER-Spread Spectrum Technologies and Applications

    Vol: E100-A No:4
    Page(s): 1088-1091

    A chirp spread spectrum (CSS) system uses a chirp signal, whose instantaneous frequency varies with time, to spread the transmission bandwidth. In a CSS system, the transmission performance can be improved simply by increasing the time-bandwidth product, which is known as the processing gain. However, increasing the transmission bandwidth is limited by spectrum regulations. In this letter, we propose a correlation-based chirp rate allocation method that improves the transmission performance by analyzing the cross-correlation coefficient for the same time-bandwidth product. To analyze the performance of the proposed method, we analytically derive the cross-correlation coefficient as a function of the time-bandwidth separation product and simulate the transmission performance. The simulation results show that the proposed method can analytically allocate the optimal chirp rate and improve the transmission performance.
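
    The quantity the letter analyzes is the cross-correlation coefficient between linear chirps of different chirp rates under a common time-bandwidth budget. The sketch below evaluates that coefficient numerically for two arbitrary candidate rates; it does not reproduce the letter's analytical allocation rule, and all parameter values are assumptions.

        # Numerical check of the cross-correlation coefficient between two linear chirps.
        import numpy as np

        def linear_chirp(T, B, fs, up=True):
            """Unit-energy complex baseband linear chirp of duration T and swept bandwidth B."""
            t = np.arange(0, T, 1/fs)
            k = (B / T) * (1 if up else -1)            # chirp rate (Hz/s)
            s = np.exp(1j * np.pi * k * t**2)
            return s / np.linalg.norm(s)

        fs = 1e6                                       # sample rate (Hz)
        T  = 1e-3                                      # symbol duration (s)
        c1 = linear_chirp(T, B=100e3, fs=fs)           # chirp rate 1e8 Hz/s
        c2 = linear_chirp(T, B=250e3, fs=fs)           # chirp rate 2.5e8 Hz/s

        rho = abs(np.vdot(c1, c2))                     # cross-correlation coefficient
        print(f"time-bandwidth products: {T*100e3:.0f}, {T*250e3:.0f}, |rho| = {rho:.3f}")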

  • DCT-OFDM Watermarking Scheme Based on Communication System Model

    Minoru KURIBAYASHI  Shogo SHIGEMOTO  Nobuo FUNABIKI  

     
    PAPER-Spread Spectrum Technologies and Applications

    Vol: E100-A No:4
    Page(s): 944-952

    In conventional spread spectrum (SS) watermarking schemes, random sequences are used to modulate the watermark information. However, because of the mutual interference among those sequences, complicated removal operations are required to improve performance. In this paper, we propose an efficient spread spectrum watermarking scheme that introduces the orthogonal frequency division multiplexing (OFDM) technique into the modulation of watermark information. The SS sequences in the proposed method are DCT basis vectors modulated by a pseudo-random number (PN) sequence. We investigate the SS-based method considering the host interference in the blind detection scenario and analyze the noise caused by attacks. Because every operation is invertible, the quantization index modulation (QIM)-based method is applicable to the OFDM-modulated signals. We also consider the properties of the watermark extraction operation in the SS-based and QIM-based methods and formalize their noisy-channel models in order to employ an error-correcting code. The performance of the two methods with error-correcting codes is numerically evaluated under the constraint of the same distortion level in the watermarked content. The experimental results indicate a criterion, determined by the amount of host interference, for choosing between the SS-based and QIM-based methods for given content: when the host interference is smaller than 0.8 times the watermark signal, the SS-based method is suitable; when it is larger than 1.0 times the watermark signal, the QIM-based method should be selected.
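
    A toy version of the SS branch described here: the carriers are DCT basis vectors modulated elementwise by a PN sequence (which keeps them orthonormal), bits are embedded additively, and blind detection correlates against each carrier. The host model, embedding strength `delta`, and block length are illustrative assumptions, not the paper's settings.

        import numpy as np
        from scipy.fft import idct

        rng = np.random.default_rng(0)
        N, K = 256, 8                                   # block length, number of bits
        host = rng.normal(0, 1, N)                      # stand-in host signal block
        bits = rng.integers(0, 2, K) * 2 - 1            # watermark bits in {-1, +1}
        pn = rng.integers(0, 2, N) * 2 - 1              # pseudo-random (PN) sequence

        def carrier(k):
            """k-th SS sequence: a DCT basis vector modulated by the PN sequence."""
            e = np.zeros(N); e[k + 1] = 1.0             # skip the DC basis vector
            return idct(e, norm='ortho') * pn

        delta = 2.0                                     # embedding strength (assumed)
        marked = host + delta * sum(b * carrier(k) for k, b in enumerate(bits))

        # Blind detection: correlate against each carrier and take the sign;
        # residual errors come from host interference, as discussed in the abstract.
        detected = np.array([np.sign(marked @ carrier(k)) for k in range(K)])
        print("bit errors:", int(np.sum(detected != bits)))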

  • Accent Sandhi Estimation of Tokyo Dialect of Japanese Using Conditional Random Fields Open Access

    Masayuki SUZUKI  Ryo KUROIWA  Keisuke INNAMI  Shumpei KOBAYASHI  Shinya SHIMIZU  Nobuaki MINEMATSU  Keikichi HIROSE  

     
    INVITED PAPER

    Publicized: 2016/12/08
    Vol: E100-D No:4
    Page(s): 655-661

    When synthesizing speech from Japanese text, correct assignment of accent nuclei for input text with arbitrary content is indispensable for obtaining natural-sounding synthetic speech. A phenomenon called accent sandhi occurs in utterances of Japanese: when a word is uttered in a sentence, its accent nucleus may change depending on the context of the preceding and succeeding words. This paper describes a statistical method for automatically predicting the accent nucleus changes due to accent sandhi. First, as the basis of the research, a database of Japanese text was constructed with labels of accent phrase boundaries and accent nucleus positions when uttered in sentences; a single native speaker of Tokyo-dialect Japanese annotated all the labels for 6,344 Japanese sentences. Then, a conditional-random-field-based method was developed using this database to predict accent phrase boundaries and accent nuclei. The proposed method predicted accent nucleus positions for accent phrases with 94.66% accuracy, clearly surpassing the 87.48% accuracy obtained with our rule-based method. A listening experiment was also conducted on synthetic speech obtained with the proposed method and with the rule-based method. The results show that our method significantly improved the naturalness of the synthetic speech.
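
    The prediction task above is sequence labelling with a CRF over per-word features. The sketch below uses the third-party sklearn-crfsuite package to show the shape of such a labeller; the toy features, labels, and the single example sentence are hypothetical and are not the paper's feature set or data.

        # A minimal CRF labeller for accent-sandhi-style tags (illustrative only).
        import sklearn_crfsuite

        def word_features(sent, i):
            w = sent[i]
            return {
                'surface': w['surface'],                     # word surface form
                'pos': w['pos'],                             # part of speech
                'lexical_accent': w['accent_type'],          # dictionary accent type
                'prev_pos': sent[i-1]['pos'] if i > 0 else 'BOS',
                'next_pos': sent[i+1]['pos'] if i < len(sent)-1 else 'EOS',
            }

        # One toy sentence: each word carries a gold label such as 'NUCLEUS' / 'NONE'
        sent = [
            {'surface': 'hashi',  'pos': 'noun', 'accent_type': '1', 'label': 'NUCLEUS'},
            {'surface': 'o',      'pos': 'part', 'accent_type': '0', 'label': 'NONE'},
            {'surface': 'wataru', 'pos': 'verb', 'accent_type': '2', 'label': 'NUCLEUS'},
        ]
        X = [[word_features(sent, i) for i in range(len(sent))]]
        y = [[w['label'] for w in sent]]

        crf = sklearn_crfsuite.CRF(algorithm='lbfgs', c1=0.1, c2=0.1, max_iterations=50)
        crf.fit(X, y)                                        # train on labelled sentences
        print(crf.predict(X))                                # predict nucleus positions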

  • Development and Evaluation of Online Infrastructure to Aid Teaching and Learning of Japanese Prosody Open Access

    Nobuaki MINEMATSU  Ibuki NAKAMURA  Masayuki SUZUKI  Hiroko HIRANO  Chieko NAKAGAWA  Noriko NAKAMURA  Yukinori TAGAWA  Keikichi HIROSE  Hiroya HASHIMOTO  

     
    INVITED PAPER

    Publicized: 2016/12/22
    Vol: E100-D No:4
    Page(s): 662-669

    This paper develops an online, freely available framework to aid the teaching and learning of the prosodic control of Tokyo Japanese: how to generate adequate word accent and phrase intonation. This framework, called OJAD (Online Japanese Accent Dictionary) [1], provides three features. 1) Visual, auditory, systematic, and comprehensive illustration of the patterns of accent change (accent sandhi) of verbs and adjectives; only the changes caused by the twelve fundamental conjugations are covered. 2) Visual illustration of the accent pattern of a given verbal expression, which is a combination of a verb and its postpositional auxiliary words. 3) Visual illustration of the pitch pattern of any given sentence and the expected positions of accent nuclei in the sentence. The third feature is implemented using an accent change prediction module that we developed for Japanese text-to-speech (TTS) synthesis [2],[3]. Experiments show that accent nucleus assignment to given texts by the proposed framework is much more accurate than that by native speakers. Subjective and objective assessments by teachers and learners show the very high pedagogical effectiveness of the developed framework.

  • XY-Separable Scale-Space Filtering by Polynomial Representations and Its Applications Open Access

    Gou KOUTAKI  Keiichi UCHIMURA  

     
    INVITED PAPER

    Publicized: 2017/01/11
    Vol: E100-D No:4
    Page(s): 645-654

    In this paper, we propose the application of principal component analysis (PCA) to scale-spaces. PCA is a standard method in computer vision. Because the translation of an input image into scale-space is a continuous operation, it requires extending conventional finite matrix-based PCA to an infinite number of dimensions. Here, we use spectral theory to resolve this infinite eigenvalue problem through integration, and we propose an approximate solution based on polynomial equations. To clarify its eigensolutions, we apply spectral decomposition to Gaussian scale-space and scale-normalized Laplacian of Gaussian (sLoG) space. As an application of the proposed method, we introduce a method for generating Gaussian-blurred images and sLoG images, demonstrating that such images can be generated with very high accuracy at an arbitrary scale through a simple linear combination. Furthermore, to make scale-space filtering efficient, we approximate the basis filter set with Gaussian lobes and obtain XY-separable filters. As a more practical example, we propose a new scale-invariant feature transform (SIFT) detector.
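
    A finite-dimensional illustration of the underlying idea: run PCA over a family of 1-D Gaussian kernels sampled at many scales, then rebuild a kernel at an unseen scale as a linear combination of a few principal basis filters. The discretisation and number of components are assumptions; the paper's polynomial/spectral machinery for the continuous, infinite-dimensional case is not reproduced here.

        import numpy as np

        x = np.arange(-32, 33)                                # kernel support
        def gauss(sigma):
            g = np.exp(-x**2 / (2 * sigma**2))
            return g / g.sum()

        sigmas = np.linspace(1.0, 8.0, 200)                   # densely sampled scales
        G = np.stack([gauss(s) for s in sigmas])              # (scales, taps)

        mean = G.mean(axis=0)
        U, S, Vt = np.linalg.svd(G - mean, full_matrices=False)
        basis = Vt[:4]                                        # 4 principal basis filters

        # Reconstruct an unseen scale as mean + linear combination of the basis filters
        target = gauss(3.3)
        coeff = basis @ (target - mean)
        approx = mean + coeff @ basis
        print("max reconstruction error:", np.max(np.abs(approx - target)))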

  • Phoneme Set Design Based on Integrated Acoustic and Linguistic Features for Second Language Speech Recognition

    Xiaoyun WANG  Tsuneo KATO  Seiichi YAMAMOTO  

     
    PAPER-Speech and Hearing

    Publicized: 2016/12/29
    Vol: E100-D No:4
    Page(s): 857-864

    Recognition of second-language (L2) speech is a challenging task even for state-of-the-art automatic speech recognition (ASR) systems, partly because pronunciation by L2 speakers is usually significantly influenced by their mother tongue. Considering that the expressions of non-native speakers are usually simpler than those of native speakers, and that second-language speech usually includes mispronunciations and less fluent pronunciation, we propose a novel method that maximizes a unified acoustic and linguistic objective function to derive a phoneme set for second-language speech recognition. We verify the efficacy of the proposed method using second-language speech collected with a translation-game-type, dialogue-based computer-assisted language learning (CALL) system. In this paper, we examine the performance in terms of acoustic likelihood, linguistic discrimination ability, and the integrated objective function for second-language speech. Experiments demonstrate the validity of the phoneme set derived by the proposed method.

  • A Low-Computation Compressive Wideband Spectrum Sensing Algorithm Based on Multirate Coprime Sampling

    Shiyu REN  Zhimin ZENG  Caili GUO  Xuekang SUN  

     
    LETTER-Digital Signal Processing

    Vol: E100-A No:4
    Page(s): 1060-1065

    Compressed sensing (CS)-based wideband spectrum sensing has become a hot topic because it can cut high signal acquisition costs. However, in CS-based approaches, spectral recovery requires large computational complexity. This letter proposes a wideband spectrum sensing algorithm based on multirate coprime sampling. It can detect the entire wideband directly from sub-Nyquist samples without spectral recovery, which brings a significant reduction in computational complexity. Compared with orthogonal matching pursuit, an excellent spectral recovery algorithm, our algorithm maintains good sensing performance with computational complexity that is several orders of magnitude lower.
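
    As background for the coprime-sampling idea, the toy below estimates a power spectrum from two coprime sub-sampled streams by averaging lag products (the difference set of coprime indices covers all lags), then applies a crude threshold detector. This is a generic illustration of coprime-lag autocorrelation estimation, not the letter's detection algorithm; the decimation factors, lag range, and threshold are assumptions.

        import numpy as np

        def coprime_psd(x, M=3, N=5, max_lag=64):
            """PSD estimate from two coprime sub-sampled streams of x (toy version)."""
            idx1 = np.arange(0, len(x), M)
            idx2 = np.arange(0, len(x), N)
            r = np.zeros(max_lag, dtype=complex)
            cnt = np.zeros(max_lag)
            for i in idx1:                                 # average products at each lag i - j
                for j in idx2:
                    lag = i - j
                    if 0 <= lag < max_lag:
                        r[lag] += x[i] * np.conj(x[j])
                        cnt[lag] += 1
            r /= np.maximum(cnt, 1)
            r_full = np.concatenate([r, np.conj(r[1:][::-1])])   # symmetric extension
            return np.abs(np.fft.fftshift(np.fft.fft(r_full)))

        # Toy usage: a sparse multiband signal observed only through the two streams
        n = np.arange(4000)
        x = np.cos(2*np.pi*0.05*n) + 0.5*np.cos(2*np.pi*0.31*n) + 0.1*np.random.randn(len(n))
        psd = coprime_psd(x, M=3, N=5, max_lag=64)
        occupied = np.nonzero(psd > 0.5 * psd.max())[0]          # crude threshold detector
        print("occupied FFT bins:", occupied)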

  • A Speech Enhancement Method Based on Multi-Task Bayesian Compressive Sensing

    Hanxu YOU  Zhixian MA  Wei LI  Jie ZHU  

     
    PAPER-Speech and Hearing

    Publicized: 2016/11/30
    Vol: E100-D No:3
    Page(s): 556-563

    Traditional speech enhancement (SE) algorithms usually show fluctuating performance when dealing with different types of noisy speech signals. In this paper, we propose a multi-task Bayesian compressive sensing based speech enhancement (MT-BCS-SE) algorithm that achieves performance not only comparable to but also more stable than that of traditional SE algorithms. The MT-BCS-SE algorithm exploits the dependence among compressive sensing (CS) measurements and the sparsity of speech signals to perform SE. To obtain sufficiently sparse representations of speech signals, we adopt an overcomplete dictionary, and the K-SVD algorithm is employed to learn various overcomplete dictionaries. The influence of the overcomplete dictionary on the MT-BCS-SE algorithm is evaluated through a large number of experiments, so that the most suitable dictionary can be adopted for the best performance. Experiments were conducted on the well-known NOIZEUS corpus to evaluate the performance of the proposed algorithm. On the NOIZEUS corpus, MT-BCS-SE is shown to be competitive with, or even superior to, traditional SE algorithms such as optimally-modified log-spectral amplitude (OMLSA), multi-band spectral subtraction (SSMul), and minimum mean square error (MMSE), in terms of signal-to-noise ratio (SNR), speech enhancement gain (SEG), and perceptual evaluation of speech quality (PESQ), and to have better stability than traditional SE algorithms.
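
    The sparse-representation front end described above can be sketched with stand-ins: K-SVD is replaced here by scikit-learn's MiniBatchDictionaryLearning, and the multi-task Bayesian CS solver by plain orthogonal matching pursuit (OMP). Frame length, dictionary size, sparsity level, and the random "training frames" are all assumptions; this only shows the dictionary-plus-CS-recovery pipeline shape.

        import numpy as np
        from sklearn.decomposition import MiniBatchDictionaryLearning
        from sklearn.linear_model import orthogonal_mp

        rng = np.random.default_rng(0)
        frame_len, n_atoms, n_meas, k = 64, 128, 32, 6

        # 1) Learn an overcomplete dictionary on (stand-in) clean training frames
        train = rng.standard_normal((2000, frame_len))
        dico = MiniBatchDictionaryLearning(n_components=n_atoms, alpha=1.0, random_state=0)
        D = dico.fit(train).components_.T                  # (frame_len, n_atoms)

        # 2) Compressive measurements of a noisy frame: y = Phi @ x
        x_noisy = rng.standard_normal(frame_len)
        Phi = rng.standard_normal((n_meas, frame_len)) / np.sqrt(n_meas)
        y = Phi @ x_noisy

        # 3) Recover a sparse code over the dictionary and re-synthesise the frame
        A = Phi @ D                                        # effective sensing matrix
        code = orthogonal_mp(A, y, n_nonzero_coefs=k)
        x_hat = D @ code                                   # re-synthesised frame
        print("sparse code non-zeros:", np.count_nonzero(code))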

  • Lexicon-Based Local Representation for Text-Dependent Speaker Verification

    Hanxu YOU  Wei LI  Lianqiang LI  Jie ZHU  

     
    LETTER-Speech and Hearing

    Publicized: 2016/12/05
    Vol: E100-D No:3
    Page(s): 587-589

    A text-dependent i-vector extraction scheme and a lexicon-based binary vector (L-vector) representation are proposed to improve the performance of text-dependent speaker verification. The i-vector and L-vector are used to represent the enrollment and test utterances. An improved cosine distance kernel is constructed by combining the i-vector and L-vector and is used to distinguish both speaker identity and lexical (text) diversity with a back-end support vector machine (SVM). Experiments are conducted on RSR2015 Corpus part 1 and part 2; the results indicate that an improvement of up to 30% can be obtained over the traditional i-vector baseline.
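
    The back-end idea can be sketched as follows: concatenate a length-normalised i-vector with a binary L-vector and score with a cosine-kernel SVM. The dimensions and the random stand-in vectors are assumptions; real i-vectors and lexicon vectors would come from a separate front end.

        import numpy as np
        from sklearn.svm import SVC
        from sklearn.metrics.pairwise import cosine_similarity

        rng = np.random.default_rng(0)
        n_trials, ivec_dim, lex_dim = 200, 100, 30

        ivec = rng.standard_normal((n_trials, ivec_dim))              # stand-in i-vectors
        ivec /= np.linalg.norm(ivec, axis=1, keepdims=True)           # length normalisation
        lvec = rng.integers(0, 2, (n_trials, lex_dim)).astype(float)  # binary L-vectors
        labels = rng.integers(0, 2, n_trials)                         # target / non-target trials

        X = np.hstack([ivec, lvec])                                   # combined representation
        svm = SVC(kernel=cosine_similarity)                           # cosine-distance kernel
        svm.fit(X[:150], labels[:150])
        print("toy accuracy:", svm.score(X[150:], labels[150:]))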

  • An Efficient Image to Sound Mapping Method Using Speech Spectral Phase and Multi-Column Image

    Arata KAWAMURA  Hiro IGARASHI  Youji IIGUNI  

     
    LETTER-Digital Signal Processing

    Vol: E100-A No:3
    Page(s): 893-895

    Image-to-sound mapping is a technique that transforms an image into a sound signal, which is subsequently treated as a sound spectrogram. In general, the transformed sound differs from a human speech signal. Herein, an efficient image-to-sound mapping method that provides an understandable speech signal without any training is proposed. To synthesize such a speech signal, the proposed method utilizes a multi-column image and a speech spectral phase obtained from a long-time observation of speech. The original image can be retrieved from the sound spectrogram of the synthesized speech signal. The qualities of the synthesized speech and the reconstructed image are evaluated by objective tests.
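
    A minimal version of the mapping idea: treat image columns as spectrogram magnitude frames, attach a phase taken from a reference speech recording, and invert with the ISTFT; the image then reappears as the magnitude spectrogram of the synthesized sound. STFT parameters and the stand-in image and "speech" are assumptions, and the paper's multi-column arrangement is only loosely imitated.

        import numpy as np
        from scipy.signal import stft, istft

        fs, nperseg = 16000, 256

        # Reference signal (noise stands in for a long speech recording here)
        speech = np.random.randn(fs * 2)
        _, _, S = stft(speech, fs, nperseg=nperseg)
        phase = np.angle(S)                                   # speech spectral phase

        # Stand-in grayscale image; its columns become spectrogram magnitude frames
        image = np.random.rand(nperseg // 2 + 1, 64)
        n_frames = min(image.shape[1], phase.shape[1])
        spec = image[:, :n_frames] * np.exp(1j * phase[:, :n_frames])

        _, sound = istft(spec, fs, nperseg=nperseg)           # image encoded as audio

        # Retrieval: the image reappears as the magnitude spectrogram of the sound
        _, _, R = stft(sound, fs, nperseg=nperseg)
        recovered = np.abs(R)[:, :n_frames]
        print("reconstruction error:", float(np.mean((recovered - image[:, :n_frames])**2)))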

  • An Improved Supervised Speech Separation Method Based on Perceptual Weighted Deep Recurrent Neural Networks

    Wei HAN  Xiongwei ZHANG  Meng SUN  Li LI  Wenhua SHI  

     
    LETTER-Speech and Hearing

    Vol: E100-A No:2
    Page(s): 718-721

    In this letter, we propose a novel speech separation method based on a perceptually weighted deep recurrent neural network (DRNN) that incorporates the masking properties of the human auditory system. In the supervised training stage, we first utilize the clean label speech of two different speakers to calculate two perceptual weighting matrices. The obtained weighting matrices are then used to adjust the mean squared error between the network outputs and the reference features of the two clean speech signals so that the two signals can mask each other. Experimental results on the TSP speech corpus demonstrate that the proposed speech separation approach achieves significant improvements over state-of-the-art methods when tested with different mixing cases.
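
    The training criterion described above can be written compactly: two perceptual weighting matrices scale the squared errors of the two estimated sources before they are summed. In the sketch, the weights and features are random placeholders standing in for masking-threshold-derived weights and DRNN outputs.

        import numpy as np

        def perceptual_weighted_mse(est1, est2, ref1, ref2, W1, W2):
            """Weighted MSE for two-speaker separation; W1, W2 weight each T-F bin."""
            e1 = W1 * (est1 - ref1) ** 2
            e2 = W2 * (est2 - ref2) ** 2
            return float(np.mean(e1) + np.mean(e2))

        T, F = 100, 129                                       # frames x frequency bins
        rng = np.random.default_rng(0)
        ref1, ref2 = rng.random((T, F)), rng.random((T, F))   # clean magnitude features
        est1, est2 = rng.random((T, F)), rng.random((T, F))   # network outputs (placeholders)
        W1, W2 = rng.random((T, F)), rng.random((T, F))       # perceptual weights (placeholders)
        print("loss:", perceptual_weighted_mse(est1, est2, ref1, ref2, W1, W2))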

  • Joint Optimization of Perceptual Gain Function and Deep Neural Networks for Single-Channel Speech Enhancement

    Wei HAN  Xiongwei ZHANG  Gang MIN  Xingyu ZHOU  Meng SUN  

     
    LETTER-Noise and Vibration

    Vol: E100-A No:2
    Page(s): 714-717

    In this letter, we explore joint optimization of perceptual gain function and deep neural networks (DNNs) for a single-channel speech enhancement task. A DNN architecture is proposed which incorporates the masking properties of the human auditory system to make the residual noise inaudible. This new DNN architecture directly trains a perceptual gain function which is used to estimate the magnitude spectrum of clean speech from noisy speech features. Experimental results demonstrate that the proposed speech enhancement approach can achieve significant improvements over the baselines when tested with TIMIT sentences corrupted by various types of noise, no matter whether the noise conditions are included in the training set or not.

  • Polymer Surface Modification Due to Active Oxygen Species and Ultraviolet Light Exposures

    Kazuki HOSOYA  Ryo WAKAYAMA  Kei OYA  Satoru IWAMORI  

     
    BRIEF PAPER

    Vol: E100-C No:2
    Page(s): 137-140

    Polyethylene (PE), polypropylene (PP), and polystyrene (PS) sheets were exposed to active oxygen species (AOS), e.g., the excited singlet oxygen atom [O(¹D)], excited singlet oxygen molecules (¹O₂), the ground-state oxygen atom [O(³P)], and the hydroxyl radical (OH), generated under two wavelengths (185 and 254 nm) of ultraviolet (UV) light. We investigated the effects of the AOS exposure on the surface modification of these polymer sheets. A nonwoven sheet was used during the surface modification to eliminate the effect of direct UV light irradiation. Although the PE and PP surfaces remained hydrophobic, the PS surface became hydrophilic.

  • Hierarchical Sparse Bayesian Learning with Beta Process Priors for Hyperspectral Imagery Restoration

    Shuai LIU  Licheng JIAO  Shuyuan YANG  Hongying LIU  

     
    PAPER-Pattern Recognition

    Publicized: 2016/11/04
    Vol: E100-D No:2
    Page(s): 350-358

    Restoration is important for improving visual quality and lays the foundation for accurate object detection and terrain classification in image analysis. In this paper, we introduce Beta process priors into hierarchical sparse Bayesian learning for recovering degraded hyperspectral images (HSI), including suppressing various noises and inferring missing data. The proposed method decomposes the HSI into a weighted summation of dictionary elements, a Gaussian noise term, and a sparse noise term. With these, the latent information and the noise characteristics of the HSI can be well learned and represented. Solved by a Gibbs sampler, the underlying dictionary and the noise can be efficiently predicted with no parameter tuning. The performance of the proposed method is compared with state-of-the-art methods and validated on two hyperspectral datasets that are contaminated with Gaussian noise, impulse noise, stripes, and dead pixel lines, or with a large amount of data missing uniformly at random. The visual and quantitative results demonstrate the superiority of the proposed method.

  • Efficient Selection of Users' Pair in Cognitive Radio Network to Maximize Throughput Using Simultaneous Transmit-Sense Approach

    Muhammad Sajjad KHAN  Muhammad USMAN  Vu-Van HIEP  Insoo KOO  

     
    PAPER-Terrestrial Wireless Communication/Broadcasting Technologies

    Publicized: 2016/09/01
    Vol: E100-B No:2
    Page(s): 380-389

    Protection of the licensed user (LU) and utilization of the spectrum are the most important goals in cognitive radio networks. To achieve the first goal, a cognitive user (CU) is required to sense for a longer period, but this adversely affects the second goal, i.e., throughput or utilization of the network, because less time is left for transmission in a time slot. This tradeoff can be relaxed by performing sensing and data transmission simultaneously for the whole frame duration. However, extending the sensing time to the frame duration consumes more energy. In this paper, we propose a new frame structure in which transmission occupies the whole frame duration whereas sensing is performed only until the required detection probability is satisfied. The CU therefore does not have to sense for the whole frame duration and conserves energy by sensing for a shorter duration. With the proposed frame structure, the throughput of all CUs is estimated for the frame and, based on the estimated throughput and the energy consumed in sensing and transmission, the energy-efficient pair of CUs (transmitter and receiver) that maximizes system throughput while consuming less energy is selected for a time slot. The selected CU pair transmits data for the whole time slot, whereas sensing is performed only for a certain duration. The performance improvement of the proposed scheme is demonstrated through simulations that compare it with existing schemes.
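
    The selection step can be stated compactly: among candidate transmitter/receiver CU pairs, pick the one with the best estimated throughput per unit of energy spent on sensing plus transmission. The exact selection metric of the paper is not reproduced here, and all numbers below are made-up placeholders.

        from dataclasses import dataclass

        @dataclass
        class PairEstimate:
            tx: int
            rx: int
            throughput_bits: float      # estimated throughput over the slot
            sensing_energy_j: float     # energy spent until the detection target is met
            tx_energy_j: float          # energy spent transmitting for the whole slot

        def select_pair(candidates):
            """Return the CU pair maximising estimated throughput per joule."""
            return max(candidates,
                       key=lambda p: p.throughput_bits / (p.sensing_energy_j + p.tx_energy_j))

        candidates = [
            PairEstimate(tx=1, rx=4, throughput_bits=2.0e6, sensing_energy_j=0.08, tx_energy_j=0.50),
            PairEstimate(tx=2, rx=3, throughput_bits=1.6e6, sensing_energy_j=0.03, tx_energy_j=0.45),
            PairEstimate(tx=5, rx=6, throughput_bits=2.4e6, sensing_energy_j=0.20, tx_energy_j=0.70),
        ]
        best = select_pair(candidates)
        print(f"selected pair: CU{best.tx} -> CU{best.rx}")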

  • Utilizing Shape-Based Feature and Discriminative Learning for Building Detection

    Shangqi ZHANG  Haihong SHEN  Chunlei HUO  

     
    LETTER-Image Recognition, Computer Vision

    Publicized: 2016/11/18
    Vol: E100-D No:2
    Page(s): 392-395

    Building detection from high-resolution remote sensing images is challenging due to the high intraclass variability and the difficulty of describing buildings. To address these difficulties, a novel approach is proposed based on the combination of shape-specific feature extraction and discriminative feature classification. Shape-specific features can capture the complex shapes and structures of buildings, while discriminative feature classification effectively reflects similarities among buildings and differences between buildings and backgrounds. Experiments demonstrate the effectiveness of the proposed approach.

  • Throughput Enhancement for SATCOM Systems Using Dynamic Spectrum Controlled Channel Allocation under Variable Propagation Conditions

    Katsuya NAKAHIRA  Jun MASHINO  Jun-ichi ABE  Daisuke MURAYAMA  Tadao NAKAGAWA  Takatoshi SUGIYAMA  

     
    PAPER-Satellite Communications

    Publicized: 2016/08/31
    Vol: E100-B No:2
    Page(s): 390-399

    This paper proposes a dynamic spectrum controlled (DSTC) channel allocation algorithm to increase the total throughput of satellite communication (SATCOM) systems. To effectively use satellite resources such as the satellite's maximum transponder bandwidth and maximum transmission power and to handle the propagation gain variation at all earth stations, the DSTC algorithm uses two new transmission techniques: spectrum compression and spectrum division. The algorithm controls various transmission parameters, such as the spectrum compression ratio, number of spectrum divisions, combination of modulation method and FEC coding rate (MODCOD), transmission power, and spectrum bandwidth to ensure a constant transmission bit rate under variable propagation conditions. Simulation results show that the DSTC algorithm achieves up to 1.6 times higher throughput than a simple MODCOD-based algorithm.
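
    The parameter control described above amounts to a per-station search: given the current propagation gain, choose a MODCOD, spectrum compression ratio, and number of spectrum divisions that keep the bit rate at its target within the bandwidth budget. The sketch below shows one such search; the MODCOD table, thresholds, division model, and search order are made-up assumptions, not the DSTC algorithm itself.

        from itertools import product

        MODCODS = [  # (name, spectral efficiency bit/s/Hz, required C/N in dB) -- assumed values
            ("QPSK 1/2", 1.0, 1.0), ("QPSK 3/4", 1.5, 4.0),
            ("8PSK 2/3", 2.0, 6.6), ("16APSK 3/4", 3.0, 10.2),
        ]

        def allocate(target_bps, cn_db, bw_budget_hz,
                     compress_ratios=(1.0, 0.9, 0.8), divisions=(1, 2, 4)):
            """Return the first feasible (modcod, compression, divisions, bandwidth)."""
            for (name, eff, req_cn), c, d in product(MODCODS[::-1], compress_ratios, divisions):
                if cn_db < req_cn:
                    continue                              # link cannot support this MODCOD
                bw = target_bps / eff * c                 # compressed occupied bandwidth
                if bw / d <= bw_budget_hz:                # each division must fit the budget
                    return name, c, d, bw
            return None                                   # constant bit rate not achievable

        print(allocate(target_bps=20e6, cn_db=7.0, bw_budget_hz=12e6))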

341-360 hits (of 2504)