The search functionality is under construction.

Keyword Search Result

[Keyword] SPE(2504hit)

1501-1520hit(2504hit)

  • CIAIR In-Car Speech Corpus--Influence of Driving Status--

    Nobuo KAWAGUCHI  Shigeki MATSUBARA  Kazuya TAKEDA  Fumitada ITAKURA  

     
    LETTER

      Vol:
    E88-D No:3
      Page(s):
    578-582

    CIAIR, Nagoya University, has been compiling an in-car speech database since 1999. This paper discusses the basic information contained in this database and analyzes the effects of driving status based on the database. We have developed a system called the Data Collection Vehicle (DCV), which supports synchronous recording of multi-channel audio data from 12 microphones that can be placed throughout the vehicle, multi-channel video recording from three cameras, and the collection of vehicle-related data. In the compilation process, each subject had conversations with three types of dialog system: a human, a "Wizard of Oz" system, and a spoken dialog system. Vehicle information such as speed, engine RPM, accelerator/brake-pedal pressure, and steering-wheel motion was also recorded. In this paper, we report on the effect that driving status has on phenomena specific to spoken language.

  • Objective Quality Assessment of Wideband Speech Coding

    Nobuhiko KITAWAKI  Kou NAGAI  Takeshi YAMADA  

     
    PAPER-Network

      Vol:
    E88-B No:3
      Page(s):
    1111-1118

    Recently, wideband speech communication using 7 kHz-wideband speech coding, as described in ITU-T Recommendations G.722, G.722.1, and G.722.2, has become increasingly necessary for use in advanced IP telephony using PCs, since, for this application, hands-free communication using separate microphones and loudspeakers is indispensable, and in this situation wideband speech is particularly helpful in enhancing the naturalness of communication. An objective quality measurement methodology for wideband-speech coding has been studied, its essential components being an objective quality measure and an input test signal. This paper describes Wideband-PESQ conforming to the draft Annex to ITU-T Recommendation P.862, "Perceptual Evaluation of Speech Quality (PESQ)," as the objective quality measure, by evaluating the consistency between the subjectively evaluated MOS (Mean Opinion Score) and objectively estimated MOS. This paper also describes the verification of artificial voice conforming to Recommendation P.50 "Artificial Voices," as the input test signal for such measurements, by evaluating the consistency between the objectively estimated MOS using a real voice and that obtained using an artificial voice.

  • Modeling Improved Prosody Generation from High-Level Linguistically Annotated Corpora

    Gerasimos XYDAS  Dimitris SPILIOTOPOULOS  Georgios KOUROUPETROGLOU  

     
    PAPER-Speech Synthesis and Prosody

      Vol:
    E88-D No:3
      Page(s):
    510-518

    Synthetic speech usually suffers from bad F0 contour surface. The prediction of the underlying pitch targets robustly relies on the quality of the predicted prosodic structures, i.e. the corresponding sequences of tones and breaks. In the present work, we have utilized a linguistically enriched annotated corpus to build data-driven models for predicting prosodic structures with increased accuracy. We have then used a linear regression approach for the F0 modeling. An appropriate XML annotation scheme has been introduced to encode syntax, grammar, new or already given information, phrase subject/object information, as well as rhetorical elements in the corpus, by exploiting a Natural Language Generator (NLG) system. To prove the benefits from the introduction of the enriched input meta-information, we first show that while tone and break CART predictors have high accuracy when standing alone (92.35% for breaks, 87.76% for accents and 99.03% for endtones), their application in the TtS chain degrades the Linear Regression pitch target model. On the other hand, the enriched linguistic meta-information minimizes errors of models leading to a more natural F0 surface. Both objective and subjective evaluation were adopted for the intonation contours by taking into account the propagated errors introduced by each model in the synthesis chain.

  • Wavelet Feature Selection Using Fuzzy Approach to Text Independent Speaker Recognition

    Shung-Yung LUNG  

     
    LETTER-Speech and Hearing

      Vol:
    E88-A No:3
      Page(s):
    779-781

    A wavelet feature selection method derived by using a fuzzy evaluation index for speaker identification is described. The concept of a flexible membership function incorporating weighted distance is introduced in the evaluation index to make the modeling of clusters more appropriate. Our results show that this feature selection yields better performance than the plain wavelet features in terms of recognition rate.

  • Noise-Robust Speech Analysis Using Running Spectrum Filtering

    Qi ZHU  Noriyuki OHTSUKI  Yoshikazu MIYANAGA  Norinobu YOSHIDA  

     
    PAPER-Speech and Hearing

      Vol:
    E88-A No:2
      Page(s):
    541-548

    This paper proposes a new robust adaptive processing algorithm that is based on the extended least squares (ELS) method with running spectrum filtering (RSF). By utilizing the different characteristics of running spectra between speech signals and noise signals, RSF can retain speech characteristics while noise is effectively reduced. Then, by using ELS, autoregressive moving average (ARMA) parameters can be estimated accurately. In experiments on real speech contaminated by white Gaussian noise and factory noise, we found that the method we propose offered spectrum estimates that were robust against additive noise.

  • Progressive Spectral Rendering Using Wavelet Decomposition

    Jin-Ren CHERN  Chung-Ming WANG  

     
    LETTER-Computer Graphics

      Vol:
    E88-D No:2
      Page(s):
    341-345

    We propose a novel approach based on wavelet decomposition for progressive full spectral rendering. In the fourth progressive stage, our method renders an image that is 95% similar to the result of the final non-progressive approach but requires less than 70% of the execution time. The quality of the rendered image is visually plausible and indistinguishable from that of the non-progressive method. Our approach is graceful, efficient, progressive, and flexible for full spectral rendering.

  • Low-Complexity Estimation Method of Cyclic-Prefix Length for DMT VDSL System

    Hui-Chul WON  Gi-Hong IM  

     
    LETTER-Transmission Systems and Transmission Equipment for Communications

      Vol:
    E88-B No:2
      Page(s):
    758-761

    In this letter, we propose a low-complexity estimation method of cyclic-prefix (CP) length for a discrete multitone (DMT) very high-speed digital subscriber line (VDSL) system. Using the sign bits of the received DMT VDSL signals, the proposed method provides a good estimate of CP length, which is suitable for various channel characteristics. This simple estimation method is consistent with the initialization procedure of T1E1.4 multi-carrier modulation (MCM)-based VDSL Standard. Finally, simulation results with VDSL test loops are presented.

  • Improving the Performance of the Minimum Statistics Noise Estimator for Single Channel Speech Enhancement

    Seung-Kyun RYU  Hong-Goo KANG  Sung-Kyo JUNG  Dae-Hee YOUN  

     
    LETTER-Speech and Hearing

      Vol:
    E88-A No:2
      Page(s):
    582-585

    This paper proposes an algorithm to improve the performance of noise power spectrum estimation using minimum statistics (MS). The minimum statistics noise estimator (MSNE) that is most efficient for speech enhancement often underestimates noise power when the signal characteristics change abruptly. The proposed algorithm improves the accuracy of noise estimation by removing harmonic components of the speech signal. Simulation results verify that the performance of the proposed algorithm is better than that of the conventional algorithm in terms of the segmental SNR (SegSNR) and the spectral distance (SD).
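The abstract above reports results in terms of segmental SNR. As a rough illustration of that metric (not the authors' evaluation code), the sketch below averages per-frame SNR values in dB. The frame length of 256 samples and the clamping range of -10 dB to 35 dB are assumptions chosen for illustration, and `segmental_snr` is a hypothetical helper name.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Average per-frame SNR in dB between a clean and an enhanced signal.

    frame_len, lo_db and hi_db are illustrative assumptions, not values
    taken from the paper.
    """
    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in c)
        error_energy = sum((x - y) ** 2 for x, y in zip(c, e))
        if signal_energy == 0.0:
            continue  # skip silent frames, where the ratio is undefined
        if error_energy == 0.0:
            snr_db = hi_db  # perfect reconstruction: use the upper clamp
        else:
            snr_db = 10.0 * math.log10(signal_energy / error_energy)
        # Clamp each frame so outliers do not dominate the average.
        frame_snrs.append(min(max(snr_db, lo_db), hi_db))
    if not frame_snrs:
        raise ValueError("no usable frames")
    return sum(frame_snrs) / len(frame_snrs)


# Toy usage: a constant signal with a constant 10% amplitude error.
clean_signal = [1.0] * 512
enhanced_signal = [0.9] * 512
print(round(segmental_snr(clean_signal, enhanced_signal), 2))
```

Averaging in the dB domain, frame by frame, weights quiet and loud segments equally, which is why this measure is often preferred over a single global SNR for speech.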

  • Frequency-Domain Pre-Rake Transmission for DSSS/TDD Mobile Communications Systems

    Fumiyuki ADACHI  Kazuaki TAKEDA  Hiromichi TOMEBA  

     
    LETTER-Wireless Communication Technologies

      Vol:
    E88-B No:2
      Page(s):
    784-787

    In this Letter, a frequency-domain pre-rake transmission is presented for a direct sequence spread spectrum with time division duplex (DSSS/TDD) system under a frequency-selective fading channel. The mathematical relationship between frequency-domain and time-domain pre-rake transmissions is discussed. Computer simulation confirms that, similar to the time-domain pre-rake transmission, frequency-domain pre-rake transmission can improve the bit error rate (BER) performance. The frequency-domain pre-rake transmission shows only slight performance degradation compared to the frequency-domain rake reception for a large spreading factor (SF).

  • Adapting a Bilingual Dictionary to Domains

    Hiroyuki KAJI  

     
    PAPER-Natural Language Processing

      Vol:
    E88-D No:2
      Page(s):
    302-312

    Two methods using comparable corpora to select translation equivalents appropriate to a domain were devised and evaluated. The first method ranks translation equivalents of a target word according to similarity of their contexts to that of the target word. The second method ranks translation equivalents according to the ratio of associated words that suggest them. An experiment using the EDR bilingual dictionary together with Wall Street Journal and Nihon Keizai Shimbun corpora showed that the method using the ratio of associated words outperforms the method based on contextual similarity. Namely, in a quantitative evaluation using pseudo words, the maximum F-measure of the former method was 86%, while that of the latter method was 82%. The key feature of the method using the ratio of associated words is that it outputs selected translation equivalents together with representative associated words, enabling the translation equivalents to be validated.
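The comparison above is stated in terms of the F-measure. For reference, a minimal sketch of the balanced F-measure (the harmonic mean of precision and recall) follows. The abstract does not give the underlying precision and recall values, so the numbers in the usage line are made up for illustration, and `f_measure` is a hypothetical helper name.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0.0:
        return 0.0  # convention when both are zero
    return 2.0 * precision * recall / (precision + recall)


# Illustrative values only; these are not figures from the paper.
print(round(f_measure(0.9, 0.8), 4))
```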

  • High-Tc SQUID Metal Detection System for Food and Pharmaceutical Contaminants

    Saburo TANAKA  Shozen KUDO  Yoshimi HATSUKADE  Tatsuoki NAGAISHI  Kazuaki NISHI  Hajime OTA  Shuichi SUZUKI  

     
    INVITED PAPER

      Vol:
    E88-C No:2
      Page(s):
    175-179

    Because processed foods have become very common, individuals may ingest contaminants that have been accidentally mixed with food. A method for detecting small contaminants in food and pharmaceuticals is therefore required. High-Tc SQUID detection systems for metallic contaminants in foods and drugs have been developed for safety purposes. We developed two systems: a large one for meat blocks and a small one for powdered drugs or packaged foods. Both systems consist of SQUID magnetometers, a permanent magnet for magnetization, and a belt conveyor. All samples were magnetized before measurement and then detected by high-Tc SQUIDs. As a result, we successfully detected small syringe needles with a length of 2 mm in a meat block and a stainless steel ball as small as 0.3 mm in diameter.

  • Highly Flexible Row and Column Redundancy and Cycle Time Adaptive Read Data Path for Double Data Rate Synchronous Memories

    Kiyohiro FURUTANI  Takeshi HAMAMOTO  Takeo MIKI  Masaya NAKANO  Takashi KONO  Shigeru KIKUDA  Yasuhiro KONISHI  Tsutomu YOSHIHARA  

     
    PAPER-Integrated Electronics

      Vol:
    E88-C No:2
      Page(s):
    255-263

    This paper describes two circuit techniques useful for the design of high-density, high-speed, low-cost double data rate memories. One is a highly flexible row and column redundancy circuit that allows a flexible row redundancy unit to be divided into multiple column redundancy units for higher flexibility, with a new test mode circuit that enables the use of finer-pitch laser fuses. The other is a compact read data path that allows smooth data flow without wait time in high-frequency operation with a smaller area penalty. These circuit techniques achieved a compact chip size with a cell efficiency of 60.6% and a high bandwidth of 400 MHz operation with CL=2.5.

  • Model Checking of RADIUS Protocol in Wireless Networks

    Il-Gon KIM  Jin-Young CHOI  

     
    LETTER-Internet

      Vol:
    E88-B No:1
      Page(s):
    397-398

    Authentication server based security protocols are mainly used for enhancing the security of wireless networks. In this paper, we specify the RADIUS security protocol in wireless networks with Casper and CSP, and then verify its security properties such as secrecy and authentication using FDR. We also show that the RADIUS protocol is vulnerable to the man-in-the-middle attack. In addition, we discuss its security weakness and potential countermeasures related to RADIUS. Finally, we fix it and propose a modified RADIUS protocol that resists the man-in-the-middle attack.

  • Selection of Shared-State Hidden Markov Model Structure Using Bayesian Criterion

    Shinji WATANABE  Yasuhiro MINAMI  Atsushi NAKAMURA  Naonori UEDA  

     
    PAPER

      Vol:
    E88-D No:1
      Page(s):
    1-9

    A Shared-State Hidden Markov Model (SS-HMM) has been widely used as an acoustic model in speech recognition. In this paper, we propose a method for constructing SS-HMMs within a practical Bayesian framework. Our method derives the Bayesian model selection criterion for the SS-HMM based on the variational Bayesian approach. The appropriate phonetic decision tree structure of the SS-HMM is found by using the Bayesian criterion. Unlike the conventional asymptotic criteria, this criterion is applicable even in the case of an insufficient amount of training data. The experimental results on isolated word recognition demonstrate that the proposed method does not require the tuning parameter that must be tuned according to the amount of training data, and is useful for selecting the appropriate SS-HMM structure for practical use.

  • Ultra-Dense WDM with over 100% Spectral Efficiency Using Co-polarized 40-Gb/s Inverse-RZ Signals

    Masahiro OGUSU  Kazuhiko IDE  Shigeru OHSHIMA  

     
    PAPER-Transmission Systems and Transmission Equipment for Communications

      Vol:
    E88-B No:1
      Page(s):
    195-202

    An inverse-RZ modulation scheme for dense WDM systems is proposed. Inverse-RZ signals have tolerances to chromatic dispersion and optical bandwidth limitation. The strongly pre-filtered inverse-RZ signals can be adapted to ultra-dense WDM systems, in which the spectral efficiencies are over 1.0 b/s/Hz. We have confirmed the error-free transmission of pre-filtered and co-polarized 40-Gb/s inverse-RZ signals where the channel intervals were 37.5 GHz.
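The "over 100%" spectral efficiency in the title follows directly from the two figures in the abstract: a 40-Gb/s signal per channel on a 37.5 GHz channel grid. The short sketch below does that arithmetic. It assumes spectral efficiency is defined as per-channel bit rate divided by channel spacing, and `spectral_efficiency` is a hypothetical helper name.

```python
def spectral_efficiency(bit_rate_bps, channel_spacing_hz):
    """Spectral efficiency in b/s/Hz: per-channel bit rate over channel spacing."""
    return bit_rate_bps / channel_spacing_hz


# Figures from the abstract: 40 Gb/s channels spaced 37.5 GHz apart.
se = spectral_efficiency(40e9, 37.5e9)
print(f"{se:.3f} b/s/Hz")
```

The result is about 1.07 b/s/Hz, consistent with the abstract's statement that the spectral efficiency exceeds 1.0 b/s/Hz.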

  • An Integrated Dialogue Analysis Model for Determining Speech Acts and Discourse Structures

    Won Seug CHOI  Harksoo KIM  Jungyun SEO  

     
    PAPER-Natural Language Processing

      Vol:
    E88-D No:1
      Page(s):
    150-157

    Analysis of speech acts and discourse structures is essential to a dialogue understanding system because speech acts and discourse structures are closely tied to the speaker's intention. However, it has been difficult to infer a speech act and a discourse structure from a surface utterance because they depend heavily on the context of the utterance. We propose a statistical dialogue analysis model to determine discourse structures as well as speech acts using a maximum entropy model. The model can automatically acquire probabilistic discourse knowledge from an annotated dialogue corpus. Moreover, the model can analyze speech acts and discourse structures in one framework. In the experiment, the model showed better performance than previous approaches.

  • Performance Evaluation of MulTCP in High-Speed Wide Area Networks

    Masayoshi NABESHIMA  

     
    LETTER-Internet

      Vol:
    E88-B No:1
      Page(s):
    392-396

    It is reported that TCP does not perform well in high-speed wide area networks. Because MulTCP behaves like the aggregate of N TCP flows, MulTCP can be used to achieve throughputs of 1 Gbps or more. However, no performance evaluation of MulTCP in high-speed wide area networks has been published. Computer simulations are used to evaluate the performance of MulTCP. The results clarify that synchronized packet losses greatly impact the performance of MulTCP.
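To make "behaves like the aggregate of N TCP flows" concrete, here is a minimal sketch of the window update rule as MulTCP is commonly described: the congestion window grows by N packets per round trip, and on a loss only one of the N virtual flows halves, so the window shrinks by a factor of 1 - 1/(2N). This is an assumption-laden illustration, not the simulation setup used in the paper, and `multcp_step` is a hypothetical function name.

```python
def multcp_step(cwnd, n_flows, loss):
    """One round-trip update of a MulTCP-style congestion window (in packets).

    Assumed rule: additive increase of n_flows packets per RTT, and
    multiplicative decrease by a factor of (1 - 1 / (2 * n_flows)) on loss.
    """
    if loss:
        return max(1.0, cwnd * (1.0 - 1.0 / (2.0 * n_flows)))
    return cwnd + n_flows


# Toy trace: N = 4 virtual flows, a loss every fifth round trip.
cwnd = 10.0
for rtt in range(1, 11):
    cwnd = multcp_step(cwnd, n_flows=4, loss=(rtt % 5 == 0))
    print(rtt, cwnd)
```

With `n_flows=1` the rule reduces to the familiar TCP behavior of adding one packet per round trip and halving on loss.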

  • A Low-Loss Serial Power Combiner Using Novel Suspended Stripline Couplers

    Yukihiro TAHARA  Hideyuki OH-HASHI  Kazuyuki TOTANI  Moriyasu MIYAZAKI  Sei-ichi SAITO  Osami ISHIDA  

     
    PAPER

      Vol:
    E88-C No:1
      Page(s):
    15-19

    A low-loss serial power combiner using suspended stripline is described. It consists of novel broadside-coupled directional couplers which have shunt capacitances at the edges of the coupled sections. These additional shunt capacitances compensate for the poor directivities of the couplers caused by the inhomogeneous dielectric in the suspended stripline structure. The fabricated three-way power combiner has achieved good performance, with an insertion loss of less than 0.23 dB over a bandwidth of 10% in the 2 GHz band.

  • Discrimination Method of Synthetic Speech Using Pitch Frequency against Synthetic Speech Falsification

    Akio OGIHARA  Hitoshi UNNO  Akira SHIOZAKI  

     
    PAPER-Biometrics

      Vol:
    E88-A No:1
      Page(s):
    280-286

    We propose a method for discriminating synthetic speech using the pitch pattern of the speech signal. By applying the proposed synthetic speech discrimination system as a pre-process before a conventional HMM speaker verification system, we can improve the safety of the conventional speaker verification system against imposture using synthetic speech. The proposed method distinguishes between synthetic speech and natural speech according to the pitch pattern, which is the distribution of values of the normalized short-range autocorrelation function. We performed a user verification experiment and confirmed the validity of the proposed method.

  • Constructing Boolean Functions by Modifying Maiorana-McFarland's Superclass Functions

    Xiangyong ZENG  Lei HU  

     
    PAPER-Symmetric Key Cryptography

      Vol:
    E88-A No:1
      Page(s):
    59-66

    In this study, we construct balanced Boolean functions with a high nonlinearity and an optimum algebraic degree for both odd and even dimensions. Our approach is based on modifying functions from the Maiorana-McFarland superclass, which was introduced by Carlet. A drawback of Maiorana-McFarland functions is that their restrictions obtained by fixing some variables in their input are affine. Affine functions are cryptographically weak, so there is a risk that this property will be exploited in attacks. Due to the contribution of Carlet, our constructions do not have the potential weakness that is shared by the Maiorana-McFarland construction or its modifications.
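As background for the abstract above, the sketch below builds a classical Maiorana-McFarland function f(x, y) = x · π(y) ⊕ g(y) and measures its nonlinearity by brute force over the Walsh spectrum. This is the textbook construction, not the modified superclass functions proposed in the paper. The choices of π (identity), g (zero), and dimension n = 4 are assumptions made to keep the example small, and all helper names are hypothetical.

```python
from itertools import product


def dot(a, b):
    """Inner product of two bit tuples over GF(2)."""
    return sum(x & y for x, y in zip(a, b)) & 1


def mm_function(pi, g, k):
    """Classical Maiorana-McFarland function on 2k bits: f(x, y) = x.pi(y) XOR g(y)."""
    def f(bits):
        x, y = bits[:k], bits[k:]
        return dot(x, pi(y)) ^ g(y)
    return f


def nonlinearity(f, n):
    """Nonlinearity from the Walsh spectrum: 2^(n-1) - max|W_f(a)| / 2."""
    points = list(product((0, 1), repeat=n))
    max_abs = 0
    for a in points:
        walsh = sum((-1) ** (f(v) ^ dot(a, v)) for v in points)
        max_abs = max(max_abs, abs(walsh))
    return 2 ** (n - 1) - max_abs // 2


# Illustrative choice: identity permutation and g = 0 on n = 4 variables.
f = mm_function(lambda y: y, lambda y: 0, k=2)
print(nonlinearity(f, 4))
```

Fixing y in this construction leaves x · π(y) ⊕ g(y), which is affine in x. That is the restriction property the abstract identifies as a potential weakness.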
