Miki SATO Toru IWASAWA Akihiko SUGIYAMA Toshihiro NISHIZAWA Yosuke TAKANO
This paper presents a single-chip speech dialogue module and its evaluation on a personal robot. The module is implemented on an application processor that was developed primarily for mobile phones, which provides compact size, low power consumption, and low cost. It performs speech recognition with preprocessing functions such as direction-of-arrival (DOA) estimation, noise cancellation, beamforming with an array of microphones, and echo cancellation. It is also equipped with text-to-speech (TTS) conversion. Evaluation results obtained on a new personal robot, PaPeRo-mini, which is a scaled-down version of PaPeRo, demonstrate an 85% correct rate in DOA estimation, and speech recognition rates that are as much as 54% and 30% higher in noisy environments and during robot utterances, respectively. These results are shown to be comparable to those obtained by PaPeRo.
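As a rough illustration of what DOA estimation with a microphone array involves, the sketch below estimates an arrival angle from a single pair of microphones by locating the cross-correlation peak. It is not the module's algorithm, which the abstract does not specify; the function name `estimate_doa_deg`, the far-field assumption, and the speed of sound value are all assumptions made for this example.

```python
import numpy as np

def estimate_doa_deg(sig_a, sig_b, mic_spacing_m, fs_hz, c=343.0):
    """Estimate the arrival angle (degrees from broadside) for one microphone pair.

    Hypothetical helper: assumes a single far-field source and integer-sample delays.
    """
    # Cross-correlate the channels; the peak lag is the time difference of arrival.
    corr = np.correlate(sig_b, sig_a, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_a) - 1)  # > 0 means sig_b lags sig_a
    tdoa = lag / fs_hz
    # Far-field geometry: tdoa = mic_spacing * sin(theta) / c
    sin_theta = np.clip(tdoa * c / mic_spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))

# Usage: white noise delayed by two samples on the second microphone.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
delayed = np.concatenate((np.zeros(2), x[:-2]))
angle = estimate_doa_deg(x, delayed, mic_spacing_m=0.1, fs_hz=16000)
```

With only two microphones the angle is ambiguous between front and back, which is one reason practical systems use a larger array.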
Bong-Jin LEE Chi-Sang JUNG Jeung-Yoon CHOI Hong-Goo KANG
This letter describes the importance of transition regions, e.g., at phoneme boundaries, for automatic speaker recognition compared with steady-state regions. Experimental results of automatic speaker identification tasks confirm that transition regions include the most speaker-distinctive features. A possible reason for these results is described in terms of articulation, in particular the degrees of freedom of the articulators. These results are expected to provide useful information for designing an efficient automatic speaker recognition system.
Yusuke IJIMA Takashi NOSE Makoto TACHIBANA Takao KOBAYASHI
In this paper, we propose a rapid model adaptation technique for emotional speech recognition which enables us to extract paralinguistic information as well as linguistic information contained in speech signals. This technique is based on style estimation and style adaptation using a multiple-regression HMM (MRHMM). In the MRHMM, the mean parameters of the output probability density function are controlled by a low-dimensional parameter vector, called a style vector, which corresponds to a set of the explanatory variables of the multiple regression. The recognition process consists of two stages. In the first stage, the style vector that represents the emotional expression category and the intensity of its expressiveness for the input speech is estimated on a sentence-by-sentence basis. Next, the acoustic models are adapted using the estimated style vector, and then standard HMM-based speech recognition is performed in the second stage. We assess the performance of the proposed technique in the recognition of simulated emotional speech uttered by both professional narrators and non-professional speakers.
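The role of the style vector can be sketched numerically. In the sketch below, each Gaussian mean is a linear function of the style vector, mean = H [1, v], and the style vector is recovered from observed frames by weighted least squares, which is the maximum likelihood solution when the frame-to-Gaussian alignment and the (diagonal) precisions are known. That known alignment is an assumption of this example, as are the names `style_mean` and `estimate_style`; the abstract does not give the authors' estimation procedure in this detail.

```python
import numpy as np

def style_mean(H, v):
    """Mean of one Gaussian: H @ [1, v], with H of shape (dim, 1 + len(v))."""
    xi = np.concatenate(([1.0], v))
    return H @ xi

def estimate_style(obs, H_list, prec_list):
    """Style vector maximizing the likelihood of the frames (alignment assumed known)."""
    n_style = H_list[0].shape[1] - 1
    lhs = np.zeros((n_style, n_style))
    rhs = np.zeros(n_style)
    for o, H, prec in zip(obs, H_list, prec_list):
        bias, A = H[:, 0], H[:, 1:]   # split regression matrix into offset and slope
        W = np.diag(prec)             # diagonal precision (inverse variance)
        lhs += A.T @ W @ A
        rhs += A.T @ W @ (o - bias)
    return np.linalg.solve(lhs, rhs)

# Usage: three frames, 3-dimensional features, 2-dimensional style vector.
rng = np.random.default_rng(1)
H_list = [rng.standard_normal((3, 3)) for _ in range(3)]
prec_list = [np.ones(3) for _ in range(3)]
v_true = np.array([0.5, -1.0])
obs = [style_mean(H, v_true) for H in H_list]
v_hat = estimate_style(obs, H_list, prec_list)
```

Once `v_hat` is available, adapting the acoustic model amounts to recomputing every mean with `style_mean(H, v_hat)` before the second recognition pass.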
Osamu SHIMADA Akihiko SUGIYAMA Toshiyuki NOMURA
This paper proposes a low-complexity noise suppressor with hybrid filterbanks and adaptive time-frequency tiling. An analysis hybrid filterbank provides efficient transformation by further decomposing low-frequency bins after a coarse transformation with a short frame size. A synthesis hybrid filterbank reduces computational complexity in a similar fashion. Adaptive time-frequency tiling reduces the number of spectral gain calculations by adaptively generating tiling information in the time-frequency plane based on the signal characteristics. The average number of instructions on a typical DSP chip has been reduced by 30% to 7.5 MIPS in the case of mono signals sampled at 44.1 kHz. A subjective test result shows that the sound quality of the proposed method is comparable to that of the conventional one.
In this paper, we propose a technique for estimating the degree or intensity of emotional expressions and speaking styles appearing in speech. The key idea is based on a style control technique for speech synthesis using a multiple regression hidden semi-Markov model (MRHSMM), and the proposed technique can be viewed as the inverse of the style control. In the proposed technique, the acoustic features of spectrum, power, fundamental frequency, and duration are simultaneously modeled using the MRHSMM. We derive an algorithm for estimating explanatory variables of the MRHSMM, each of which represents the degree or intensity of emotional expressions and speaking styles appearing in acoustic features of speech, based on a maximum likelihood criterion. We show experimental results to demonstrate the ability of the proposed technique using two types of speech data, simulated emotional speech and spontaneous speech with different speaking styles. It is found that the estimated values have correlation with human perception.
Peng WANG Xiaofeng ZHONG Limin XIAO Shidong ZHOU Jing WANG Yong BAI
In this letter, the performance improvement by the deployment of multiple antennas in cognitive radio systems is studied from a system-level view. The term opportunistic spectrum efficiency (OSE) is defined as the performance metric to evaluate the spectrum opportunities that can actually be exploited by the secondary user (SU). By applying a simple energy combining detector, we show that deploying multiple antennas at the SU transceiver can improve the maximum achievable OSE significantly. Numerical results also reveal that the improvement comes from the reduction of both the detection overhead and the false alarm probability.
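A simple energy combining detector can be sketched as follows: the received energy is summed over all antennas and samples and compared with a threshold. The Monte Carlo helper estimates the false alarm and detection rates for unit-variance complex Gaussian noise. The function names, the Gaussian signal model, and the numeric settings are assumptions chosen for illustration and are not taken from the letter.

```python
import numpy as np

def energy_combining_detect(samples, threshold):
    """samples: complex array (antennas, n_samples). True if the band is declared busy."""
    statistic = np.sum(np.abs(samples) ** 2)  # energy combined over antennas and time
    return bool(statistic > threshold)

def estimate_rates(n_antennas, n_samples, snr_linear, threshold, trials=2000, seed=0):
    """Monte Carlo estimate of (false alarm rate, detection rate)."""
    rng = np.random.default_rng(seed)

    def cgauss(shape):
        # Unit-variance circular complex Gaussian samples.
        return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    shape = (n_antennas, n_samples)
    false_alarms = 0
    detections = 0
    for _ in range(trials):
        noise = cgauss(shape)
        signal = np.sqrt(snr_linear) * cgauss(shape)
        false_alarms += energy_combining_detect(noise, threshold)
        detections += energy_combining_detect(noise + signal, threshold)
    return false_alarms / trials, detections / trials

# Usage: 4 antennas, 50 samples each, SNR of about -5 dB.
pfa, pd = estimate_rates(n_antennas=4, n_samples=50, snr_linear=0.3, threshold=230.0)
```

Because the statistic pools `n_antennas * n_samples` observations, adding antennas allows fewer time samples for the same reliability, which is the kind of detection overhead reduction the letter refers to.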
In this study, discriminative weight training is applied to support vector machine (SVM) based speech/music classification for the 3GPP2 selectable mode vocoder (SMV). In the proposed approach, the speech/music decision rule is derived by the SVM by incorporating optimally weighted features derived from the SMV based on a minimum classification error (MCE) method. This method differs from previous work in that a different weight is assigned to each feature of the SMV, which is a novel process. According to the experimental results, the proposed approach is effective for speech/music classification using the SVM.
Juinn-Horng DENG Jeng-Kuang HWANG
Recently, a new multi-carrier CDMA (MC-CDMA) system with cyclic-shift orthogonal keying (CSOK) has been proposed and shown to be more spectral and power efficient than conventional MC-CDMA systems. In this paper, a novel extension called the multiplexed CSOK (MCSOK) MC-CDMA system is proposed to further increase the data rate while maintaining a low peak-to-average power ratio (PAPR). First, the data stream is divided into multiple parallel substreams that are mapped into QPSK-CSOK symbols in terms of cyclic shifted Chu sequences. Second, these sequences are repeated, modulated, summed, and placed on IFFT subcarriers, resulting in a constant-modulus multiplexed signal that preserves the desired orthogonality among substreams. The receiver performs frequency-domain equalization and uses efficient demultiplexing, despreading, and demapping schemes to detect the modulation symbols. Furthermore, an alternate MCSOK system configuration with high link quality is also presented. Simulations show that the proposed MCSOK system attains lower PAPR and BER, as compared to conventional MC-CDMA system using Walsh codes. Under a rich multipath environment, the high link quality configuration exhibits excellent performance with both diversity gain and MCSOK modulation gain.
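The property that makes cyclic-shift keying workable can be checked in a few lines: a Chu sequence has constant modulus, and its cyclic shifts are mutually orthogonal, so a shift index can carry data and be recovered with one FFT-based circular correlation. The sketch below covers only this mapping and demapping step for a single substream with an optional QPSK phase; the function names and the sequence length are assumptions, and the repetition, multiplexing, and IFFT stages of the proposed system are not modeled.

```python
import numpy as np

def chu_sequence(length, root=1):
    """Chu sequence with ideal periodic autocorrelation (root coprime with length)."""
    n = np.arange(length)
    if length % 2 == 0:
        return np.exp(1j * np.pi * root * n * n / length)
    return np.exp(1j * np.pi * root * n * (n + 1) / length)

def csok_map(base, shift, qpsk_symbol=1.0 + 0.0j):
    """Encode data as a cyclic shift of the base sequence, scaled by a QPSK phase."""
    return qpsk_symbol * np.roll(base, shift)

def csok_demap(received, base):
    """Recover (shift, QPSK symbol) from the peak of the circular cross-correlation."""
    corr = np.fft.ifft(np.fft.fft(received) * np.conj(np.fft.fft(base)))
    shift = int(np.argmax(np.abs(corr)))
    return shift, corr[shift] / len(base)

# Usage: length-16 sequence, shift 5, QPSK symbol at 45 degrees.
base = chu_sequence(16)
qpsk = np.exp(1j * np.pi / 4)
shift_hat, symbol_hat = csok_demap(csok_map(base, 5, qpsk), base)
```

The constant modulus of each shifted sequence is what keeps the peak-to-average power ratio low for a single substream.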
Lei WANG Baoyu ZHENG Qingmin MENG Chao CHEN
Based on Free Probability Theory (FPT), which has become an important branch of Random Matrix Theory (RMT), a new scheme of frequency band sensing for Cognitive Radio (CR) in a Direct-Sequence Code-Division Multiple-Access (DS-CDMA) multiuser network is proposed. Unlike previous studies in the field, the new scheme does not require knowledge of the users' spreading sequences and relies on the asymptotically free behavior of random matrices. Simulation results show that the asymptotic claims hold true even for a small number of observations (which makes the scheme convenient for time-varying topologies), and that the scheme outperforms the classical energy detection scheme and another scheme based on random matrix theory.
Junichi HONDA Kazunori UCHIDA Kwang-Yeol YOON
This paper is concerned with the estimation of radio communication distance when both the transmitter and receiver are arbitrarily distributed on a random rough surface such as a desert, terrain, or sea surface. First, we simulate electromagnetic wave propagation along the rough surface by using the discrete ray tracing method (DRTM) recently proposed by the authors. Second, we determine three parameters by the conjugate gradient method (CGM) combined with the method of least squares. Finally, we derive an analytical expression that estimates the maximum communication distance when the input power of a transmitter and the minimum detectable electric field intensity of a receiver are specified. Random rough surfaces are assumed to follow Gaussian, pn-th order power law, or exponential distributions.
Spectrum sensing is a key technology within Cognitive Radio (CR) systems. Cooperative spectrum sensing using a distributed model provides improved detection of the primary user, but it also opens the CR system to a new security threat: the degradation of cooperative sensing performance due to spectrum sensing data falsification by malicious users. Our proposed scheme, based on robust statistics, utilizes only the available past received power data of the sensing nodes to estimate the distribution parameters of the primary signal presence and absence hypotheses. These estimated parameters are used to perform Dempster-Shafer theory of evidence data fusion, which leads to the elimination of malicious users. Furthermore, in order to enhance performance, a reliability weight for each node is incorporated into the data fusion scheme. Simulation results indicate that our proposed scheme provides a powerful capability for eliminating malicious users as well as a high data fusion gain under various channel conditions.
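The fusion step rests on Dempster's rule of combination. For the two-hypothesis sensing problem, each node reports a mass for "primary absent" (H0), "primary present" (H1), and the whole frame (uncertainty, U), and two reports are merged as sketched below. The dictionary layout and the function name are assumptions for illustration; how the masses are derived from received power and how the reliability weights enter are not modeled here.

```python
def dempster_combine(m1, m2):
    """Combine two mass assignments over {'H0', 'H1', 'U'} with Dempster's rule."""
    # Conflict: one report supports H0 while the other supports H1.
    conflict = m1["H0"] * m2["H1"] + m1["H1"] * m2["H0"]
    if conflict >= 1.0:
        raise ValueError("reports are in total conflict and cannot be combined")
    norm = 1.0 - conflict
    return {
        "H0": (m1["H0"] * m2["H0"] + m1["H0"] * m2["U"] + m1["U"] * m2["H0"]) / norm,
        "H1": (m1["H1"] * m2["H1"] + m1["H1"] * m2["U"] + m1["U"] * m2["H1"]) / norm,
        "U": (m1["U"] * m2["U"]) / norm,
    }

# Usage: two nodes that both lean towards "primary present".
node_a = {"H0": 0.1, "H1": 0.6, "U": 0.3}
node_b = {"H0": 0.2, "H1": 0.5, "U": 0.3}
fused = dempster_combine(node_a, node_b)
```

A node that consistently conflicts with the fused result produces a large `conflict` term, which is one signal a scheme can use to flag falsified reports.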
Koichi ISHIHARA Takayuki KOBAYASHI Riichi KUDO Yasushi TAKATORI Akihide SANO Yutaka MIYAMOTO
In this paper, we use frequency-domain equalization (FDE) to create coherent optical single-carrier (CO-SC) transmission systems that are very tolerant of chromatic dispersion (CD) and polarization mode dispersion (PMD). The efficient transmission of a 25-Gb/s NRZ-QPSK signal by using the proposed FDE is demonstrated under severe CD and PMD conditions. We also discuss the principle of FDE and some techniques suitable for implementing CO-SC-FDE. The results show that a CO-SC-FDE system is very tolerant of CD and PMD and can achieve high transmission rates over single mode fiber without optical dispersion compensation.
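The principle of FDE can be illustrated with a noise-free, single-polarization toy model: when the channel acts on a block as a circular convolution, equalization reduces to one complex division per frequency bin. The sketch assumes a known channel impulse response, zero-forcing weights, and block-circular transmission; the three-tap channel is arbitrary, and the function name is hypothetical. None of these choices are taken from the paper, which addresses CD and PMD in an optical link.

```python
import numpy as np

def fde_equalize(received_block, channel_impulse_response):
    """Zero-forcing frequency-domain equalization of one block."""
    n = len(received_block)
    H = np.fft.fft(channel_impulse_response, n)  # channel frequency response
    # One-tap equalizer per bin, then back to the time domain.
    return np.fft.ifft(np.fft.fft(received_block) / H)

# Usage: a block of 64 QPSK symbols through a three-tap dispersive channel.
rng = np.random.default_rng(2)
bits = rng.integers(0, 2, size=(64, 2))
symbols = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)
channel = np.array([1.0, 0.5, 0.2j])
received = np.fft.ifft(np.fft.fft(symbols) * np.fft.fft(channel, len(symbols)))
recovered = fde_equalize(received, channel)
```

The cost is two FFTs and one division per bin, which grows much more slowly with the channel memory than a time-domain filter of equivalent length.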
Hyoungsuk JEON Sooyeol IM Youmin KIM Seunghee KIM Jinup KIM Hyuckjae LEE
The public safety spectrum is generally under-utilized because its traffic is bursty and mission critical. This letter considers the application of dynamic spectrum access (DSA) to the combined spectrum of public safety (PS) and commercial (CMR) users in a common shared network that can provide both PS and CMR services. Our scenario includes the 700 MHz Public/Private Partnership, which was recently issued by the Federal Communications Commission. We first propose an efficient DSA mechanism to coordinate the combined spectrum, and then establish a call admission control that reflects the proposed DSA in a wideband code division multiple access based network. The essentials of our proposed DSA are opportunistic access to the public safety spectrum and priority access to the commercial spectrum. Simulation results show that these schemes are well harmonized in various network environments.
Hiroshi FUKETA Masanori HASHIMOTO Yukio MITSUYAMA Takao ONOYE
The timing margin of a chip varies from chip to chip due to manufacturing variability, and it also depends on the operating environment and aging. Adaptive speed control with timing error prediction is a promising way to mitigate the timing margin variation, but it inherently carries a critical risk of timing errors when a circuit is slowed down. This paper presents how to evaluate the relation between timing error rate and power dissipation in self-adaptive circuits with timing error prediction. The discussion is experimentally validated using adders in subthreshold operation in a 90 nm CMOS process. We show a trade-off between timing error rate and power dissipation, and reveal the dependency of the trade-off on design parameters.
Abdellah KADDAI Mohammed HALIMI
In this paper, an algebraic trellis vector quantization (ATVQ) that introduces algebraic codebooks into the trellis coded vector quantization (TCVQ) structure is presented. Low encoding complexity and minimum memory storage requirements are achieved using the proposed approach. It exploits the advantages of both the TCVQ and the algebraic codebooks, namely delayed decision, codebook widening, low computational complexity, and no codebook storage. This novel vector quantization scheme is used to encode the wideband speech line spectral frequency (LSF) parameters. Experimental results on wideband speech show that ATVQ yields the same performance as the traditional split vector quantization (SVQ) and the TCVQ in terms of spectral distortion (SD). It can achieve transparent quality at 47 bits/frame with a considerable reduction of memory storage and computational complexity when compared to SVQ and TCVQ.
Thi Huong TRAN Yuanfeng SHE Jiro HIROKAWA Kimio SAKURAI Yoshinori KOGAMI Makoto ANDO
This paper presents a measurement method for determining the effective conductivity of copper-clad dielectric laminate substrates in the millimeter-wave region. The conductivity is indirectly evaluated from measured resonant frequencies and unloaded Q values of a number of Whispering Gallery (WG) modes excited in a circular disk sample, which consists of a copper-clad dielectric substrate with a large diameter of 20-30 wavelengths. We can, therefore, easily obtain the frequency dependence of the effective conductivity of the sample under test over a wide frequency range at once. Almost identical conductivity is predicted for two kinds of WG resonators (the copper-clad type and the sandwich type) with different field distributions; it is self-consistent and provides the important foundation for the method if not for the alternative method at this moment. We measure three kinds of copper foils in the 55-65 GHz band, where the conductivity of electrodeposited copper foil is smaller than that of rolled copper foil and shiny-both-sides copper foil. The measured conductivity for the electrodeposited copper foil decreases with an increase in frequency. The transmission losses measured for microstrip lines fabricated from these substrates are accurately predicted with the conductivity evaluated by this method.
CALL (Computer Assisted Language Learning) systems using ASR (Automatic Speech Recognition) for second language learning have received increasing interest recently. However, it remains a challenge to achieve high speech recognition performance, including accurate detection of erroneous utterances by non-native speakers. Conventionally, possible error patterns, based on linguistic knowledge, are added to the lexicon and language model, or to the ASR grammar network. However, this approach easily runs into a trade-off between the coverage of errors and the increase in perplexity. To solve this problem, we propose a method based on a decision tree to learn effective prediction of errors made by non-native speakers. An experimental evaluation with a number of foreign students learning Japanese shows that the proposed method can effectively generate an ASR grammar network, given a target sentence, that achieves both better coverage of errors and smaller perplexity, resulting in a significant improvement in ASR accuracy.
Motohiro TANABE Masahiro UMEHIRA Koichi ISHIHARA Yasushi TAKATORI
An OFDMA-based channel access scheme has been proposed for dynamic spectrum access to utilize the frequency spectrum efficiently. Though the OFDMA-based scheme is flexible enough to change the bandwidth and channel of the transmitted signals, the OFDMA signal has a large PAPR (Peak to Average Power Ratio). In addition, if the OFDMA receiver does not use a filter to extract sub-carriers before FFT (Fast Fourier Transform) processing, the designated sub-carriers suffer large interference from adjacent channel signals in the FFT processing on the receiving side. To solve the PAPR and adjacent channel interference problems encountered in the OFDMA-based scheme, this paper proposes a novel dynamic channel access scheme using an overlap FFT filter bank based on single carrier modulation. It also presents performance evaluation results of the proposed scheme obtained by computer simulation.
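The PAPR argument can be made concrete with a small calculation. The sketch below computes PAPR in dB and compares a constant-modulus single-carrier block with the worst case of a multi-carrier symbol, in which all sub-carriers add in phase. The function name `papr_db` and the block size of 64 are assumptions for this example, and pulse shaping and oversampling, which raise the PAPR of practical single-carrier signals, are ignored.

```python
import numpy as np

def papr_db(samples):
    """Peak-to-average power ratio of a complex baseband block, in dB."""
    power = np.abs(samples) ** 2
    return float(10.0 * np.log10(power.max() / power.mean()))

n = 64
# Single-carrier block: QPSK symbols sent directly in time, so every sample has unit power.
single_carrier = np.exp(1j * (np.pi / 4 + np.pi / 2 * (np.arange(n) % 4)))
# Multi-carrier worst case: identical symbols on all sub-carriers add up at one instant.
multi_carrier_worst = np.fft.ifft(np.ones(n))

sc_papr = papr_db(single_carrier)
mc_papr = papr_db(multi_carrier_worst)
```

The worst-case multi-carrier value equals `10 * log10(n)`, so it grows with the number of sub-carriers, whereas the unshaped single-carrier block stays at 0 dB.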
Kunihiko TESHIMA Koji YAMAMOTO Hidekazu MURATA Susumu YOSHIDA
In the present paper, the performance of cooperative relaying networks with adaptive relaying scheme selection is analyzed. Cooperative relaying is a new technique to achieve spatial diversity gain by using neighboring stations. However, when multiple stations transmit simultaneously, the number of interference signals increases. As a result, the introduction of cooperative relaying in radio communication systems does not always increase the network capacity, due to co-channel interference. Therefore, in order to achieve high spectral efficiency, it is necessary to select cooperative relaying or non-cooperative relaying adaptively. Assuming both centralized and decentralized adaptive controls, the spectral efficiency is evaluated. The performance under decentralized control is evaluated using a game-theoretic approach. Simulation results show that the introduction of cooperative relaying with centralized control always increases the spectral efficiency. On the other hand, simulation results also show that, when each source selects a relaying scheme independently and selfishly to maximize its own spectral efficiency, the introduction of cooperative relaying may reduce the spectral efficiency due to the increase in the number of interference signals.
Abdorasoul GHASEMI S. Mohammad RAZAVIZADEH
A simple distributed Medium Access Control (MAC) protocol for cognitive wireless networks is proposed. It is assumed that the network is slotted, the spectrum is divided into a number of channels, and the statistical aggregate traffic of the primary network on each channel is modeled by independent Bernoulli random variables. The objective of the cognitive MAC is to maximize the exploitation of the channels' idle time slots. The cognitive users can achieve this aim by appropriately hopping between the channels at each decision stage. The proposed protocol is based on the rule of least failures, which is deployed by each user independently. Using this rule, at each decision stage, the channel with the least number of recorded collisions with the primary and other cognitive users is selected for exploitation. The performance of the proposed protocol for multiple cognitive users is investigated analytically and verified by simulation. It is shown that, as the number of users increases, the user decision under this protocol comes close to the optimum decision that maximizes its own utilization. In addition, to improve opportunity utilization in the case of a large number of cognitive users, an extension to the proposed MAC protocol is presented and evaluated by simulation.
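The rule of least failures, as described above, needs only a per-channel collision counter at each user. A minimal sketch follows; the class name, the tie-breaking choice (lowest channel index), and the update interface are assumptions, since the abstract does not specify them.

```python
class LeastFailuresUser:
    """One cognitive user that tracks its own collisions on each channel."""

    def __init__(self, n_channels):
        self.failures = [0] * n_channels

    def select_channel(self):
        # Pick the channel with the fewest recorded collisions.
        # Ties go to the lowest index (an assumption of this sketch).
        return min(range(len(self.failures)), key=lambda ch: self.failures[ch])

    def record_outcome(self, channel, collided):
        # A collision with the primary or another cognitive user counts as a failure.
        if collided:
            self.failures[channel] += 1

# Usage: three channels; the user collides on channel 0 twice and on channel 1 once.
user = LeastFailuresUser(3)
first_choice = user.select_channel()
user.record_outcome(0, collided=True)
user.record_outcome(0, collided=True)
user.record_outcome(1, collided=True)
next_choice = user.select_channel()
```

Because each user keeps only local counters, no message exchange between cognitive users is needed, which matches the distributed nature of the protocol.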