Nobuo KAWAGUCHI Shigeki MATSUBARA Kazuya TAKEDA Fumitada ITAKURA
CIAIR, Nagoya University, has been compiling an in-car speech database since 1999. This paper describes the basic information contained in this database and analyzes the effects of driving status based on it. We have developed a system called the Data Collection Vehicle (DCV), which supports synchronous recording of multi-channel audio data from 12 microphones that can be placed throughout the vehicle, multi-channel video recording from three cameras, and the collection of vehicle-related data. In the compilation process, each subject had conversations with three types of dialog system: a human, a "Wizard of Oz" system, and a spoken dialog system. Vehicle information such as speed, engine RPM, accelerator/brake-pedal pressure, and steering-wheel motion was also recorded. In this paper, we report on the effect that driving status has on phenomena specific to spoken language.
Nobuhiko KITAWAKI Kou NAGAI Takeshi YAMADA
Recently, wideband speech communication using 7 kHz-wideband speech coding, as described in ITU-T Recommendations G.722, G.722.1, and G.722.2, has become increasingly necessary for advanced IP telephony using PCs. For this application, hands-free communication using separate microphones and loudspeakers is indispensable, and in this situation wideband speech is particularly helpful in enhancing the naturalness of communication. An objective quality measurement methodology for wideband-speech coding has been studied, its essential components being an objective quality measure and an input test signal. This paper examines Wideband-PESQ, which conforms to the draft Annex to ITU-T Recommendation P.862, "Perceptual Evaluation of Speech Quality (PESQ)," as the objective quality measure by evaluating the consistency between the subjectively evaluated MOS (Mean Opinion Score) and the objectively estimated MOS. This paper also verifies the artificial voice conforming to Recommendation P.50, "Artificial Voices," as the input test signal for such measurements by evaluating the consistency between the objectively estimated MOS obtained using a real voice and that obtained using an artificial voice.
Gerasimos XYDAS Dimitris SPILIOTOPOULOS Georgios KOUROUPETROGLOU
Synthetic speech usually suffers from a poor F0 contour. Robust prediction of the underlying pitch targets relies on the quality of the predicted prosodic structures, i.e. the corresponding sequences of tones and breaks. In the present work, we have utilized a linguistically enriched annotated corpus to build data-driven models for predicting prosodic structures with increased accuracy. We have then used a linear regression approach for the F0 modeling. An appropriate XML annotation scheme has been introduced to encode syntax, grammar, new or already given information, phrase subject/object information, as well as rhetorical elements in the corpus, by exploiting a Natural Language Generator (NLG) system. To demonstrate the benefits of introducing the enriched input meta-information, we first show that while tone and break CART predictors have high accuracy when standing alone (92.35% for breaks, 87.76% for accents and 99.03% for endtones), their application in the TtS chain degrades the Linear Regression pitch target model. On the other hand, the enriched linguistic meta-information minimizes the errors of the models, leading to a more natural F0 surface. Both objective and subjective evaluations were adopted for the intonation contours, taking into account the propagated errors introduced by each model in the synthesis chain.
A wavelet feature selection method derived by using a fuzzy evaluation index for speaker identification is described. The concept of a flexible membership function incorporating weighted distance is introduced in the evaluation index to make the modeling of clusters more appropriate. Our results show that this feature selection yields better recognition rates than the unselected wavelet features.
Qi ZHU Noriyuki OHTSUKI Yoshikazu MIYANAGA Norinobu YOSHIDA
This paper proposes a new robust adaptive processing algorithm that is based on the extended least squares (ELS) method with running spectrum filtering (RSF). By utilizing the different characteristics of running spectra between speech signals and noise signals, RSF can retain speech characteristics while noise is effectively reduced. Then, by using ELS, autoregressive moving average (ARMA) parameters can be estimated accurately. In experiments on real speech contaminated by white Gaussian noise and factory noise, we found that the method we propose offered spectrum estimates that were robust against additive noise.
We propose a novel approach based on wavelet decomposition for progressive full spectral rendering. In the fourth progressive stage, our method renders an image that is 95% similar to the result of the non-progressive approach but requires less than 70% of the execution time. The rendered image is visually plausible and indistinguishable from that of the non-progressive method. Our approach is graceful, efficient, progressive, and flexible for full spectral rendering.
In this letter, we propose a low-complexity estimation method of cyclic-prefix (CP) length for a discrete multitone (DMT) very high-speed digital subscriber line (VDSL) system. Using the sign bits of the received DMT VDSL signals, the proposed method provides a good estimate of CP length, which is suitable for various channel characteristics. This simple estimation method is consistent with the initialization procedure of T1E1.4 multi-carrier modulation (MCM)-based VDSL Standard. Finally, simulation results with VDSL test loops are presented.
Seung-Kyun RYU Hong-Goo KANG Sung-Kyo JUNG Dae-Hee YOUN
This paper proposes an algorithm to improve the performance of noise power spectrum estimation using minimum statistics (MS). The minimum statistics noise estimator (MSNE), which is among the most efficient for speech enhancement, often underestimates noise power when the signal characteristics change abruptly. The proposed algorithm improves the accuracy of noise estimation by removing harmonic components of the speech signal. Simulation results verify that the proposed algorithm outperforms the conventional algorithm in terms of the segmental SNR (SegSNR) and the spectral distance (SD).
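The minimum-tracking idea behind MS can be sketched for a single frequency bin as follows. This is a simplified illustration, not the authors' algorithm: the smoothing constant `alpha`, the window length `window`, and the function name are assumptions, and neither the bias compensation used in full MS estimators nor the harmonic-removal step proposed in the paper is modeled.

```python
from collections import deque


def minimum_statistics_noise(power_frames, alpha=0.85, window=8):
    """Return a per-frame noise-power estimate for one frequency bin.

    power_frames: sequence of short-time power values (hypothetical input).
    alpha:        recursive smoothing constant (assumed value).
    window:       number of recent frames searched for the minimum (assumed).
    """
    smoothed = None
    recent = deque(maxlen=window)  # sliding window of smoothed power values
    estimates = []
    for power in power_frames:
        # First-order recursive smoothing of the noisy power.
        if smoothed is None:
            smoothed = power
        else:
            smoothed = alpha * smoothed + (1.0 - alpha) * power
        recent.append(smoothed)
        # The noise estimate is the minimum of the smoothed power in the window.
        estimates.append(min(recent))
    return estimates


if __name__ == "__main__":
    # Noise floor jumps abruptly from 1.0 to 4.0 at frame 10.
    frames = [1.0] * 10 + [4.0] * 20
    for index, value in enumerate(minimum_statistics_noise(frames)):
        print(index, round(value, 3))
```

With a noise floor that jumps from 1.0 to 4.0, the estimate stays near 1.0 until the old values leave the window, which illustrates the underestimation after an abrupt change that the abstract describes.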
Fumiyuki ADACHI Kazuaki TAKEDA Hiromichi TOMEBA
In this Letter, a frequency-domain pre-rake transmission is presented for a direct sequence spread spectrum with time division duplex (DSSS/TDD) system under a frequency-selective fading channel. The mathematical relationship between frequency-domain and time-domain pre-rake transmissions is discussed. Computer simulation confirms that, like time-domain pre-rake transmission, frequency-domain pre-rake transmission can improve the bit error rate (BER) performance. The frequency-domain pre-rake transmission shows only slight performance degradation compared to frequency-domain rake reception for a large spreading factor (SF).
Two methods using comparable corpora to select translation equivalents appropriate to a domain were devised and evaluated. The first method ranks translation equivalents of a target word according to similarity of their contexts to that of the target word. The second method ranks translation equivalents according to the ratio of associated words that suggest them. An experiment using the EDR bilingual dictionary together with Wall Street Journal and Nihon Keizai Shimbun corpora showed that the method using the ratio of associated words outperforms the method based on contextual similarity. Namely, in a quantitative evaluation using pseudo words, the maximum F-measure of the former method was 86%, while that of the latter method was 82%. The key feature of the method using the ratio of associated words is that it outputs selected translation equivalents together with representative associated words, enabling the translation equivalents to be validated.
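The second ranking method can be illustrated with a minimal sketch. It assumes that, for a target word, we already have a mapping from each associated word to the translation equivalents it suggests; how that mapping is derived from the corpora is not modeled here, and the function name, data layout, and example words are hypothetical.

```python
from collections import defaultdict


def rank_by_associated_word_ratio(suggestions):
    """Rank translation equivalents by the share of associated words suggesting them.

    suggestions: dict mapping each associated word of the target word to the
                 set of translation equivalents it suggests (assumed input).
    Returns a list of (equivalent, ratio, supporting_words), best first.
    """
    supporters = defaultdict(list)
    for word, equivalents in suggestions.items():
        for equivalent in equivalents:
            supporters[equivalent].append(word)

    total = len(suggestions)
    ranked = [
        (equivalent, len(words) / total, sorted(words))
        for equivalent, words in supporters.items()
    ]
    # Highest ratio first; ties are broken alphabetically for determinism.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    # Hypothetical associated words for the English target word "bank".
    example = {
        "loan": {"ginkou"},
        "deposit": {"ginkou"},
        "account": {"ginkou"},
        "river": {"dote"},
    }
    for equivalent, ratio, words in rank_by_associated_word_ratio(example):
        print(equivalent, ratio, words)
```

Returning the supporting words alongside each ratio mirrors the feature highlighted in the abstract: the selected equivalents come with representative associated words, so a person can validate them.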
Saburo TANAKA Shozen KUDO Yoshimi HATSUKADE Tatsuoki NAGAISHI Kazuaki NISHI Hajime OTA Shuichi SUZUKI
Because processed foods have become very common, individuals may ingest contaminants that have been accidentally mixed into food. Therefore, a method for detecting small contaminants in food and pharmaceuticals is required. High-Tc SQUID detection systems for metallic contaminants in foods and drugs have been developed for safety purposes. We developed two systems: a large one for meat blocks and a small one for powdered drugs or packaged foods. Both systems consist of SQUID magnetometers, a permanent magnet for magnetization, and a belt conveyor. All samples were magnetized before measurement, and contaminants were detected by high-Tc SQUIDs. As a result, we successfully detected small syringe needles with a length of 2 mm in a meat block and a stainless steel ball as small as 0.3 mm in diameter.
Kiyohiro FURUTANI Takeshi HAMAMOTO Takeo MIKI Masaya NAKANO Takashi KONO Shigeru KIKUDA Yasuhiro KONISHI Tsutomu YOSHIHARA
This paper describes two circuit techniques useful for the design of high-density, high-speed, low-cost double data rate memories. One is a highly flexible row and column redundancy circuit that allows a flexible row redundancy unit to be divided into multiple column redundancy units for higher flexibility, with a new test mode circuit that enables the use of a finer-pitch laser fuse. The other is a compact read data path that allows smooth data flow without wait time in high-frequency operation with a smaller area penalty. These circuit techniques achieved a compact chip size with a cell efficiency of 60.6% and a high bandwidth of 400 MHz operation with CL=2.5.
Authentication-server-based security protocols are mainly used for enhancing the security of wireless networks. In this paper, we specify the RADIUS security protocol in wireless networks with Casper and CSP, and then verify its security properties, such as secrecy and authentication, using FDR. We also show that the RADIUS protocol is vulnerable to the man-in-the-middle attack. In addition, we discuss the security weaknesses of RADIUS and potential countermeasures. Finally, we propose a modified RADIUS protocol that resists the man-in-the-middle attack.
Shinji WATANABE Yasuhiro MINAMI Atsushi NAKAMURA Naonori UEDA
A Shared-State Hidden Markov Model (SS-HMM) has been widely used as an acoustic model in speech recognition. In this paper, we propose a method for constructing SS-HMMs within a practical Bayesian framework. Our method derives the Bayesian model selection criterion for the SS-HMM based on the variational Bayesian approach. The appropriate phonetic decision tree structure of the SS-HMM is found by using the Bayesian criterion. Unlike the conventional asymptotic criteria, this criterion is applicable even in the case of an insufficient amount of training data. The experimental results on isolated word recognition demonstrate that the proposed method does not require a tuning parameter that must be adjusted according to the amount of training data, and is useful for selecting the appropriate SS-HMM structure for practical use.
Masahiro OGUSU Kazuhiko IDE Shigeru OHSHIMA
An inverse-RZ modulation scheme for dense WDM systems is proposed. Inverse-RZ signals have tolerances to chromatic dispersion and optical bandwidth limitation. The strongly pre-filtered inverse-RZ signals can be adapted to ultra-dense WDM systems, in which the spectral efficiencies are over 1.0 b/s/Hz. We have confirmed the error-free transmission of pre-filtered and co-polarized 40-Gb/s inverse-RZ signals where the channel intervals were 37.5 GHz.
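As a quick check of the spectral-efficiency figure, dividing the per-channel bit rate by the channel interval reported above gives the following (the symbols $\eta$, $R_b$, and $\Delta f$ are notation introduced here for illustration, not the authors'):

```latex
\eta = \frac{R_b}{\Delta f}
     = \frac{40\ \text{Gb/s}}{37.5\ \text{GHz}}
     \approx 1.07\ \text{b/s/Hz}
```

This is consistent with the stated spectral efficiency of over 1.0 b/s/Hz.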
Won Seug CHOI Harksoo KIM Jungyun SEO
Analysis of speech acts and discourse structures is essential to a dialogue understanding system because speech acts and discourse structures are closely tied to the speaker's intention. However, it has been difficult to infer a speech act and a discourse structure from a surface utterance because they depend heavily on the context of the utterance. We propose a statistical dialogue analysis model that determines discourse structures as well as speech acts using a maximum entropy model. The model can automatically acquire probabilistic discourse knowledge from an annotated dialogue corpus. Moreover, the model can analyze speech acts and discourse structures in one framework. In the experiment, the model showed better performance than previous approaches.
It is reported that TCP does not perform well in high-speed wide area networks. Because MulTCP behaves like the aggregate of N TCP flows, MulTCP can be used to achieve throughputs of 1 Gbps or more. However, no performance evaluation of MulTCP in high-speed wide area networks has been published. Computer simulations are used to evaluate the performance of MulTCP. The results clarify that synchronized packet losses greatly impact the performance of MulTCP.
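The sense in which MulTCP behaves like an aggregate of N TCP flows can be sketched with a per-RTT congestion-window trace. This is a rough illustration under stated assumptions, not the simulation used in the paper: it assumes the commonly described MulTCP rules (grow by N segments per loss-free RTT, shrink by cwnd/(2N) on a loss, as if only one of the N virtual flows halved its window), and it ignores slow start, timeouts, and queue dynamics. The function and parameter names are hypothetical.

```python
def multcp_window_trace(n_flows, rtts, loss_rtts, initial_cwnd=1.0):
    """Return the congestion window (in segments) after each RTT.

    n_flows:   number of virtual TCP flows N being emulated.
    rtts:      number of round-trip times to simulate.
    loss_rtts: set of RTT indices in which a loss is detected (assumed input).
    """
    cwnd = initial_cwnd
    trace = []
    for rtt in range(rtts):
        if rtt in loss_rtts:
            # Multiplicative decrease: only one of N virtual flows halves.
            cwnd *= 1.0 - 0.5 / n_flows
        else:
            # Additive increase: each of N virtual flows adds one segment.
            cwnd += n_flows
        cwnd = max(cwnd, 1.0)  # never drop below one segment
        trace.append(cwnd)
    return trace


if __name__ == "__main__":
    losses = {20, 40, 60}
    for n in (1, 4, 16):
        trace = multcp_window_trace(n, rtts=80, loss_rtts=losses)
        print(n, round(sum(trace) / len(trace), 1))
```

With `n_flows=1` the rules reduce to ordinary TCP additive-increase/multiplicative-decrease. Larger N recovers from a loss faster, which is why MulTCP is a candidate for high-throughput paths. The sketch does not reproduce the synchronized-loss effects studied in the paper.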
Yukihiro TAHARA Hideyuki OH-HASHI Kazuyuki TOTANI Moriyasu MIYAZAKI Sei-ichi SAITO Osami ISHIDA
A low-loss serial power combiner using suspended stripline is described. It consists of novel broadside-coupled directional couplers that have shunt capacitances at the edges of the coupled sections. These additional shunt capacitances compensate for the poor directivities of the couplers caused by the inhomogeneous dielectric in the suspended stripline structure. The fabricated three-way power combiner has achieved good performance, with an insertion loss of less than 0.23 dB over a bandwidth of 10% in the 2 GHz band.
Akio OGIHARA Hitoshi UNNO Akira SHIOZAKI
We propose a method for discriminating synthetic speech using the pitch pattern of the speech signal. By applying the proposed synthetic speech discrimination system as a pre-process before a conventional HMM speaker verification system, we can improve the security of the conventional speaker verification system against imposture using synthetic speech. The proposed method distinguishes between synthetic speech and natural speech according to the pitch pattern, which is the distribution of the values of the normalized short-range autocorrelation function. We performed a user verification experiment and confirmed the validity of the proposed method.
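A normalized short-range autocorrelation value of the kind mentioned above can be computed per frame as in the sketch below. The abstract does not give the exact definition used by the authors, so this uses one common normalization (cross-energy normalization over the overlapping samples). The lag range, frame length, and function names are assumptions.

```python
import math


def normalized_autocorrelation(frame, lag):
    """Normalized autocorrelation of `frame` at a positive integer `lag`."""
    head = frame[:-lag]
    tail = frame[lag:]
    numerator = sum(a * b for a, b in zip(head, tail))
    energy = sum(a * a for a in head) * sum(b * b for b in tail)
    return numerator / math.sqrt(energy) if energy > 0 else 0.0


def peak_autocorrelation(frame, min_lag, max_lag):
    """Return (peak value, lag) over the searched lag range."""
    return max(
        (normalized_autocorrelation(frame, lag), lag)
        for lag in range(min_lag, max_lag + 1)
    )


if __name__ == "__main__":
    # Synthetic test frame: a sinusoid with a period of 20 samples.
    frame = [math.sin(2 * math.pi * n / 20) for n in range(200)]
    value, lag = peak_autocorrelation(frame, min_lag=10, max_lag=30)
    print(lag, round(value, 3))
```

Collecting the peak values over many frames yields a distribution. A detector along the lines described in the abstract would then compare that distribution for natural and synthetic speech, a step that is not shown here.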
In this study, we construct balanced Boolean functions with high nonlinearity and optimum algebraic degree for both odd and even dimensions. Our approach is based on modifying functions from the Maiorana-McFarland superclass, which was introduced by Carlet. A drawback of Maiorana-McFarland functions is that their restrictions obtained by fixing some of their input variables are affine. Affine functions are cryptographically weak, so there is a risk that this property will be exploited in attacks. Thanks to Carlet's contribution, our constructions do not have this potential weakness, which is shared by the Maiorana-McFarland construction and its modifications.
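The nonlinearity measure and the affine-restriction drawback discussed above can be checked numerically on a small example. The sketch below is not the authors' construction: it builds a plain 4-variable Maiorana-McFarland function f(x, y) = x . pi(y), with an arbitrarily chosen permutation `PI`, computes its nonlinearity from the Walsh spectrum, and confirms that fixing y leaves an affine function of x. All names are hypothetical.

```python
def truth_table(func, n_vars):
    """List the outputs of `func` for all inputs 0 .. 2^n_vars - 1."""
    return [func(v) for v in range(1 << n_vars)]


def walsh_spectrum(table):
    """Fast Walsh-Hadamard transform of the +/-1 form of a truth table."""
    w = [1 - 2 * bit for bit in table]
    half = 1
    while half < len(w):
        for start in range(0, len(w), 2 * half):
            for j in range(start, start + half):
                a, b = w[j], w[j + half]
                w[j], w[j + half] = a + b, a - b
        half *= 2
    return w


def nonlinearity(table):
    """Hamming distance from the function to the nearest affine function."""
    return (len(table) - max(abs(v) for v in walsh_spectrum(table))) // 2


PI = [0, 2, 3, 1]  # arbitrary permutation of {0, 1, 2, 3}, for illustration


def mm_function(v):
    """f(x, y) = x . PI[y], where x is the high 2 bits and y the low 2 bits."""
    x, y = v >> 2, v & 3
    return bin(x & PI[y]).count("1") & 1


if __name__ == "__main__":
    table = truth_table(mm_function, 4)
    print("nonlinearity:", nonlinearity(table))
    print("weight:", sum(table), "of", len(table))
    for y in range(4):
        restricted = [mm_function((x << 2) | y) for x in range(4)]
        print("y =", y, "restriction nonlinearity:", nonlinearity(restricted))
```

The function reaches the maximum nonlinearity for four variables but is not balanced, and each restriction to a fixed y has nonlinearity 0, meaning it is affine. These are the two properties that motivate modified constructions such as the one described in the abstract.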