The search functionality is under construction.

Keyword Search Result

[Keyword] ATI(18690hit)

10841-10860hit(18690hit)

  • Channel-Count-Independent BIST for Multi-Channel SerDes

    Kouichi YAMAGUCHI  Muneo FUKAISHI  

     
    PAPER-Interface and Interconnect Techniques

      Vol:
    E89-C No:3
      Page(s):
    314-319

    This paper describes a BIST circuit for testing SoC-integrated multi-channel serializer/deserializer (SerDes) macros. A newly developed packet-based PRBS generator enables the BIST to perform at-speed testing of asynchronous data transfers. In addition, a new technique for chained alignment checks between adjacent channels helps achieve a channel-count-independent architecture for verification of multi-channel alignment between SerDes macros. Fabricated in a 0.13-µm CMOS process and operating at > 500 MHz, the BIST has successfully verified all SerDes functions in at-speed testing of 5-Gbps × 20-ch SerDes macros.
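As a rough illustration of the PRBS idea behind such a BIST, the following is a minimal sketch of a plain PRBS7 generator and a matching bit-error checker. It is not the packet-based generator of the paper, whose design the abstract does not describe; the polynomial choice (x^7 + x^6 + 1), the seed, and the names `prbs7` and `count_bit_errors` are assumptions made for illustration only.

```python
from itertools import islice


def prbs7(seed=0x7F):
    """Yield bits of a PRBS7 sequence (7-bit Fibonacci LFSR, period 127)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, or the LFSR stays at zero")
    while True:
        # Feedback is the XOR of the two oldest bits in the register.
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def count_bit_errors(received_bits, seed=0x7F):
    """Compare received bits with a locally regenerated PRBS7 sequence."""
    expected = islice(prbs7(seed), len(received_bits))
    return sum(r != e for r, e in zip(received_bits, expected))


# Hypothetical loopback: send two full periods, then flip one bit.
sent = list(islice(prbs7(), 254))
corrupted = sent.copy()
corrupted[10] ^= 1
```

The checker regenerates the sequence from the same seed instead of storing it, which is the usual reason a PRBS suits on-chip self-test: both ends need only a small shift register.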

  • Text-Independent/Text-Prompted Speaker Recognition by Combining Speaker-Specific GMM with Speaker Adapted Syllable-Based HMM

    Seiichi NAKAGAWA  Wei ZHANG  Mitsuo TAKAHASHI  

     
    PAPER-Speaker Recognition

      Vol:
    E89-D No:3
      Page(s):
    1058-1065

    We present a new text-independent/text-prompted speaker recognition method that combines a speaker-specific Gaussian Mixture Model (GMM) with a syllable-based HMM adapted by MLLR or MAP. This paper evaluates the robustness of the method to changes in speaking style. We conducted a speaker identification experiment using the NTT database, which consists of sentences uttered at three speed modes (normal, fast and slow) by 35 Japanese speakers (22 males and 13 females) in five sessions over ten months. Each speaker uttered only 5 training utterances (about 20 seconds in total). The combination method reduced the identification error rate by about 50%. We obtained an accuracy of 98.8% for text-independent speaker identification across the three speaking style modes (normal, fast, slow) using a short test utterance (about 4 seconds). In particular, we obtained an accuracy of 99.4% for the normal speaking mode. This result is superior to those of conventional methods on the same database. We show that the improvement comes from the mutually compensating effect between the speaker-specific GMM and the speaker-adapted syllable-based HMM.

  • A Frame Detector for Zero-Padded OFDM Systems

    Young-Hwan YOU  Eu-Suk SHIM  Hyoung-Kyu SONG  

     
    LETTER-Transmission Systems and Transmission Equipment for Communications

      Vol:
    E89-B No:3
      Page(s):
    963-965

    This letter proposes an orthogonal frequency division multiplexing (OFDM) frame synchronization scheme when the guard interval (GI) consists of a zero-padded (ZP) sequence. The frame synchronization method uses the ZP symbol where nothing is transmitted for GI so that the drop in received power can be detected to find the beginning of the frame. Simulations reveal that this method significantly improves synchronization performance of the ZP-OFDM system in a multipath fading channel.
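The power-drop idea can be sketched as follows. This is a minimal illustration and not the detector proposed in the letter: it assumes the guard length is known, uses a simple minimum-energy sliding window, and runs on a synthetic signal. The function name `detect_frame_start` and all signal parameters are hypothetical.

```python
import random


def detect_frame_start(samples, guard_len):
    """Return the index just after the lowest-energy window of guard_len samples."""
    if guard_len <= 0 or guard_len > len(samples):
        raise ValueError("guard_len must be between 1 and len(samples)")
    # Energy of the first window, then update it incrementally while sliding.
    energy = sum(abs(s) ** 2 for s in samples[:guard_len])
    best_energy, best_start = energy, 0
    for start in range(1, len(samples) - guard_len + 1):
        energy += abs(samples[start + guard_len - 1]) ** 2
        energy -= abs(samples[start - 1]) ** 2
        if energy < best_energy:
            best_energy, best_start = energy, start
    return best_start + guard_len


rng = random.Random(0)


def qpsk(n):
    """Random unit-spaced QPSK-like samples (synthetic stand-in for OFDM data)."""
    return [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(n)]


def noise(n, sigma):
    """Complex Gaussian noise with standard deviation sigma per component."""
    return [complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(n)]


# Synthetic stream: 40 data samples, a 16-sample silent guard, then the frame.
guard_len = 16
clean = qpsk(40) + [0j] * guard_len + qpsk(64)
received = [s + w for s, w in zip(clean, noise(len(clean), 0.05))]
frame_start = detect_frame_start(received, guard_len)
```

In a multipath channel the guard interval is partly filled by delayed echoes, so a practical detector would need a threshold or a shorter window; this sketch ignores that effect.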

  • On the Number of Integrators Needed for Dynamic Observer Error Linearization via Integrators

    Kyungtak YU  Nam-Hoon JO  Jin Heon SEO  

     
    LETTER-Systems and Control

      Vol:
    E89-A No:3
      Page(s):
    817-821

    In this letter, an illustrative example is given, which shows that the number of integrators needed for dynamic observer error linearization using integrators cannot be bounded by a function of the dimension of the system and the number of outputs, in contrast to dynamic feedback linearization results.

  • Theoretical Limits on Sequences with Ear Zero/Low Correlation Zones

    Fanxin ZENG  

     
    LETTER-Fundamental Theories for Communications

      Vol:
    E89-B No:3
      Page(s):
    949-951

    Sequences with ear zero correlation zones (EZCZs) are employed to suppress inter-symbol interference (ISI) and inter-user interference (IUI) in wireless communications. Theoretical limits on the correlation functions of such sequences are investigated, and lower bounds on the relations among sequence length, width of EZCZs/ELCZs, and family size are derived. These bounds play an important role in assessing the performance of such sequences.

  • High-Performance Distributed Raman Amplification Systems with Limited Pump Power

    Hiroji MASUDA  Masahito TOMIZAWA  Yutaka MIYAMOTO  Kazuo HAGIMOTO  

     
    PAPER-Fiber-Optic Transmission for Communications

      Vol:
    E89-B No:3
      Page(s):
    715-723

    We have clarified both theoretically and experimentally the basic performance of distributed Raman amplification (DRA) transmission systems in trunk networks with DSF or SMF spans where the pump power is limited by practical considerations. The gain and noise characteristics of a fiber span with splice loss are accurately determined by employing three approximation models. A novel pumping scheme called band enhanced pumping (BEP) is proposed that improves the DRA gain and optical SNR (OSNR) by 1.5 and 0.55 dB, respectively, compared with those of a conventional pumping scheme, under typical system conditions. We show that a DRA system with a DSF span has OSNRs that are 2.1 and 2.9 dB higher than those of a system with an SMF span at limited pump powers of 200 and 400 mW, respectively, as typical examples.

  • Single-Channel Multiple Regression for In-Car Speech Enhancement

    Weifeng LI  Katsunobu ITOU  Kazuya TAKEDA  Fumitada ITAKURA  

     
    PAPER-Speech Enhancement

      Vol:
    E89-D No:3
      Page(s):
    1032-1039

    We address issues for improving hands-free speech enhancement and speech recognition performance in different car environments using a single distant microphone. This paper describes a new single-channel in-car speech enhancement method that estimates the log spectra of speech at a close-talking microphone based on the nonlinear regression of the log spectra of the noisy signal captured by a distant microphone and the estimated noise. The proposed method provides significant overall quality improvements in our subjective evaluation of the regression-enhanced speech, and performs best in most objective measures. Based on our isolated word recognition experiments conducted under 15 real car environments, the proposed adaptive nonlinear regression approach shows an advantage in average relative word error rate (WER) reductions of 50.8% and 13.1%, respectively, compared to original noisy speech and ETSI advanced front-end (ETSI ES 202 050).

  • Substring Count Estimation in Extremely Long Strings

    Jinuk BAE  Sukho LEE  

     
    PAPER-Database

      Vol:
    E89-D No:3
      Page(s):
    1148-1156

    To estimate the number of substring matches against string data, count suffix trees (CS-trees) have been used as a kind of alphanumeric histogram. Although the trees are useful for substring count estimation in short data strings (e.g. name or title), they reveal several drawbacks when the target is changed to extremely long strings. First, it becomes too hard or at least slow to build CS-trees, because their origin, the suffix tree, has a memory-bottleneck problem with long strings. Second, some CS-tree node counts are incorrect due to frequent pruning of nodes. Therefore, we propose the count q-gram tree (CQ-tree) as an alphanumeric histogram for long strings. By adopting q-grams (length-q substrings), CQ-trees can be created quickly and correctly within a small amount of available memory. Furthermore, we mathematically provide the lower and upper bounds that the count estimation can reach. To the best of our knowledge, our work is the first to present such bounds among research on estimating alphanumeric selectivity. Our experimental study shows that the CQ-tree outperforms the CS-tree in terms of building time and accuracy.
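To make the q-gram idea concrete, here is a minimal sketch that counts q-grams in one pass and derives an elementary upper bound on the count of a longer pattern. It uses a flat dictionary, not the CQ-tree of the paper, and the bound shown (the minimum count over the pattern's q-grams) is a simple textbook observation, not necessarily the bound derived by the authors. The function names are hypothetical.

```python
from collections import Counter


def build_qgram_counts(text, q):
    """Count every length-q substring of text in a single pass."""
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))


def count_upper_bound(pattern, counts, q):
    """Upper bound on occurrences of pattern: it cannot occur more often
    than the rarest q-gram it contains."""
    if len(pattern) < q:
        raise ValueError("pattern must be at least q characters long")
    return min(counts[pattern[i:i + q]] for i in range(len(pattern) - q + 1))


def exact_count(text, pattern):
    """Exact number of (possibly overlapping) occurrences, for comparison."""
    return sum(text.startswith(pattern, i) for i in range(len(text)))


text = "abracadabra_abracadabra"
counts = build_qgram_counts(text, 3)
upper = count_upper_bound("abrac", counts, 3)
exact = exact_count(text, "abrac")
```

The memory used by the dictionary depends on the number of distinct q-grams and not on the string length, which matches the abstract's motivation for moving from suffixes to fixed-length substrings.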

  • Channel Characterization and Performance Evaluation of Mobile Communication Employing Stratospheric Platforms

    ISKANDAR  Shigeru SHIMAMOTO  

     
    PAPER-Integrated Systems for Communications

      Vol:
    E89-B No:3
      Page(s):
    937-944

    Stratospheric platforms have been recently proposed as a new wireless infrastructure for realizing the next generation of communication systems. To provide high quality services, an investigation of the wireless stratospheric platform channel is essential. This paper proposes a definition and describes an analysis of the wireless channel for the link between stratospheric platforms and terrestrial mobile users based on an experiment in a semi-urban environment. Narrowband channel characteristics are presented in terms of Ricean factor (K factor) and local mean received power over a wide range of elevation angles ranging from 10° to 90°. Finally, we evaluated average bit error probability based on the proposed channel model to examine the channel performance. For the environment in which the measurements were conducted, we find that elevation angles greater than 40° yield better performance.

  • Robust Talker Direction Estimation Based on Weighted CSP Analysis and Maximum Likelihood Estimation

    Yuki DENDA  Takanobu NISHIURA  Yoichi YAMASHITA  

     
    PAPER-Speech Enhancement

      Vol:
    E89-D No:3
      Page(s):
    1050-1057

    This paper describes a new talker direction estimation method for front-end processing to capture distant-talking speech by using a microphone array. The proposed method consists of two algorithms: One is a TDOA (Time Delay Of Arrival) estimation algorithm based on a weighted CSP (Cross-power Spectrum Phase) analysis with an average speech spectrum and CSP coefficient subtraction. The other is a talker direction estimation algorithm based on ML (Maximum Likelihood) estimation in a time sequence of the estimated TDOAs. To evaluate the effectiveness of the proposed method, talker direction estimation experiments were carried out in an actual office room. The results confirmed that the talker direction estimation performance of the proposed method is superior to that of the conventional methods in both diffused- and directional-noise environments.

  • ATR Parallel Decoding Based Speech Recognition System Robust to Noise and Speaking Styles

    Shigeki MATSUDA  Takatoshi JITSUHIRO  Konstantin MARKOV  Satoshi NAKAMURA  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    989-997

    In this paper, we describe a parallel decoding-based ASR system developed at ATR that is robust to noise type, SNR and speaking style. It is difficult to recognize speech affected by various factors, especially when an ASR system contains only a single acoustic model. One solution is to employ multiple acoustic models, one model for each different condition. Even though the robustness of each acoustic model is limited, the whole ASR system can handle various conditions appropriately. In our system, there are two recognition sub-systems which use different features such as MFCC and Differential MFCC (DMFCC). Each sub-system has several acoustic models depending on SNR, speaker gender and speaking style, and during recognition each acoustic model is adapted by fast noise adaptation. From each sub-system, one hypothesis is selected based on posterior probability. The final recognition result is obtained by combining the best hypotheses from the two sub-systems. On the AURORA-2J task used widely for the evaluation of noise robustness, our system achieved higher recognition performance than a system which contains only a single model. Also, our system was tested using normal and hyper-articulated speech contaminated by several background noises, and exhibited high robustness to noise and speaking styles.

  • A High-Accuracy Passive 3D Measurement System Using Phase-Based Image Matching

    Mohammad Abdul MUQUIT  Takuma SHIBAHARA  Takafumi AOKI  

     
    PAPER-Image/Vision Processing

      Vol:
    E89-A No:3
      Page(s):
    686-697

    This paper presents a high-accuracy 3D (three-dimensional) measurement system using multi-camera passive stereo vision to reconstruct 3D surfaces of free form objects. The proposed system is based on an efficient stereo correspondence technique, which consists of (i) coarse-to-fine correspondence search, and (ii) outlier detection and correction, both employing phase-based image matching. The proposed sub-pixel correspondence search technique contributes to dense reconstruction of arbitrary-shaped 3D surfaces with high accuracy. The outlier detection and correction technique contributes to high reliability of reconstructed 3D points. Through a set of experiments, we show that the proposed system measures 3D surfaces of objects with sub-mm accuracy. Also, we demonstrate high-quality dense 3D reconstruction of a human face as a typical example of free form objects. The result suggests that our approach could be used in many computer vision applications.

  • Spatial Fading Simulator Using a Cavity-Excited Circular Array (CECA) for Performance Evaluation of Antenna Arrays

    Chulgyun PARK  Jun-ichi TAKADA  Kei SAKAGUCHI  Takashi OHIRA  

     
    PAPER-Antennas and Propagation

      Vol:
    E89-B No:3
      Page(s):
    906-913

    In this paper we propose a novel spatial fading simulator to evaluate the performance of an array antenna and show its spatial stochastic characteristics by computer simulation based on parameters verified by experimental data. We introduce a cavity-excited circular array (CECA) as a fading simulator that can simulate realistic mobile communication environments. To evaluate the antenna array, two stochastic characteristics are necessary. The first one is the fading phenomenon and the second is the angular spread (AS) of the incident wave. The computer simulation results with respect to fading and AS show that CECA works well as a spatial fading simulator for performance evaluation of an antenna array. We first present the basic structure, features and design methodology of CECA, and then show computer simulation results of the spatial stochastic characteristics. The results convince us that CECA is useful to evaluate performance of antenna arrays.

  • DSRED: A New Queue Management Scheme for the Next Generation Internet

    Bing ZHENG  Mohammed ATIQUZZAMAN  

     
    PAPER-Internet

      Vol:
    E89-B No:3
      Page(s):
    764-774

    Random Early Detection (RED), an active queue management scheme, has been recommended by the Internet Engineering Task Force (IETF) for the next generation routers. RED suffers from a number of performance problems, such as low throughput, large delay/jitter, and induced instability in networks. Many of the previous attempts to improve the performance of RED have been based on optimizing the values of the RED parameters. However, results have shown that such optimizations yielded only limited improvement in performance. In this paper, we propose Double Slope RED (DSRED), a new active queue management scheme to improve the performance of RED. The proposed scheme is based on dynamically changing the slope of the packet drop probability curve as a function of the level of congestion in the buffer. Results show that our proposed scheme achieves better performance than the original RED.
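A two-slope drop curve of the kind described can be sketched as below. This is an illustrative piecewise-linear function under stated assumptions, not the exact DSRED formulation: the threshold names `k_low`, `k_mid`, `k_high` and the breakpoint probability `p_mid` are hypothetical parameters, and the abstract does not give the paper's actual values or slope selection rule.

```python
def drop_probability(avg_queue, k_low, k_mid, k_high, p_mid):
    """Piecewise-linear drop probability with two slopes.

    Below k_low nothing is dropped; between k_low and k_mid the probability
    rises to p_mid; between k_mid and k_high it rises from p_mid to 1;
    at or above k_high every packet is dropped.
    """
    if not (k_low < k_mid < k_high):
        raise ValueError("thresholds must satisfy k_low < k_mid < k_high")
    if not (0.0 <= p_mid <= 1.0):
        raise ValueError("p_mid must be a probability")
    if avg_queue < k_low:
        return 0.0
    if avg_queue < k_mid:
        # Gentle slope under light congestion.
        return p_mid * (avg_queue - k_low) / (k_mid - k_low)
    if avg_queue < k_high:
        # Steeper slope once the buffer is more than half way to k_high.
        return p_mid + (1.0 - p_mid) * (avg_queue - k_mid) / (k_high - k_mid)
    return 1.0


# Hypothetical thresholds in packets: gentle up to 40, steep from 40 to 60.
curve = [drop_probability(q, 20, 40, 60, 0.1) for q in (10, 30, 40, 50, 60)]
```

With a small `p_mid` the first segment drops few packets under light load, while the second segment reacts sharply as the buffer approaches `k_high`; single-slope RED has to trade one behavior against the other.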

  • PS-ZCPA Based Feature Extraction with Auditory Masking, Modulation Enhancement and Noise Reduction for Robust ASR

    Muhammad GHULAM  Takashi FUKUDA  Kouichi KATSURADA  Junsei HORIKAWA  Tsuneo NITTA  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    1015-1023

    A pitch-synchronous (PS) auditory feature extraction method based on ZCPA (Zero-Crossings Peak-Amplitudes) was proposed previously and showed more robustness than conventional ZCPA and MFCC based features. In this paper, firstly, a non-linear adaptive threshold adjustment procedure is introduced into the PS-ZCPA method to get optimal results in noisy conditions with different signal-to-noise ratios (SNRs). Next, auditory masking, a well-known auditory perception, and modulation enhancement that simulates a strong relationship between modulation spectra and intelligibility of speech are embedded into the PS-ZCPA method. Finally, a Wiener filter based noise reduction procedure is integrated into the method to make it more noise-robust, and the performance is evaluated against ETSI ES202 (WI008), which is a standard front-end for distributed speech recognition. All the experiments were carried out on the Aurora-2J database. The experimental results demonstrated improved performance of the PS-ZCPA method by embedding auditory masking into it, and a slightly improved performance by using modulation enhancement. The PS-ZCPA method with Wiener filter based noise reduction also showed better performance than ETSI ES202 (WI008).

  • Adaptive Clock Recovery Method Utilizing Proportional-Integral-Derivative (PID) Control for Circuit Emulation

    Youichi FUKADA  Takeshi YASUDA  Shuji KOMATSU  Koichi SAITO  Yoichi MAEDA  Yasuyuki OKUMURA  

     
    PAPER

      Vol:
    E89-B No:3
      Page(s):
    690-695

    This paper describes a novel adaptive clock recovery method that uses proportional-integral-derivative (PID) control. The adaptive clock method is a clock recovery technique that synchronizes connected terminals via packet networks, and will be indispensable for circuit emulation services in the next generation Ethernet. Our adaptive clock method simultaneously achieves a short starting-time, accuracy, stable recovery clock frequency, and few buffer delays using the PID control technique. We explain the numerical simulations, experimental results, and circuit designs.
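The control loop behind adaptive clock recovery can be sketched as a PID controller that steers the recovered frequency from the receive-buffer fill level. This is a toy discrete-time simulation under stated assumptions, not the authors' design: the gains, the one-second update step, the buffer model, and the names `PID` and `recover_clock` are all hypothetical.

```python
class PID:
    """Textbook discrete PID controller with a unit time step."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error):
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def recover_clock(source_hz, nominal_hz, steps, target_fill=100.0):
    """Track an unknown source clock by regulating buffer fill with PID.

    The buffer gains (source_hz - recovered_hz) units per step; a buffer
    that is too full means the local clock is slow, so the correction is
    added to the nominal frequency.
    """
    pid = PID(kp=0.2, ki=0.01, kd=0.05)  # assumed gains, chosen for stability
    fill = target_fill
    recovered_hz = nominal_hz
    for _ in range(steps):
        fill += source_hz - recovered_hz
        recovered_hz = nominal_hz + pid.update(fill - target_fill)
    return recovered_hz, fill


# Hypothetical case: the source runs 0.5 Hz above the local nominal clock.
recovered_hz, final_fill = recover_clock(1000.5, 1000.0, steps=500)
```

The integral term is what removes the steady-state frequency offset: once the buffer is back at its target, the accumulated error alone holds the recovered clock at the source rate.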

  • Improving Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics in Noisy Environments Using Multi-Template Models

    Randy GOMEZ  Akinobu LEE  Tomoki TODA  Hiroshi SARUWATARI  Kiyohiro SHIKANO  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    998-1005

    This paper describes a method of using multi-template unsupervised speaker adaptation based on HMM-Sufficient Statistics to improve adaptation performance while keeping adaptation time within a few seconds with just one arbitrary utterance. This adaptation scheme is mainly composed of two processes. The first part is done offline and involves the training of multiple class-dependent acoustic models and the creation of speakers' HMM-Sufficient Statistics based on gender and age. The second part is performed online, where adaptation begins using the single utterance of a test speaker. From this utterance, the system classifies the speaker's class and consequently selects the N-best neighbor speakers close to the utterance using Gaussian Mixture Models (GMM). The classified speaker's class template model is then adopted as a base model. From this template model, the adapted model is rapidly constructed using the N-best neighbor speakers' HMM-Sufficient Statistics. Experiments in noisy environment conditions with 20 dB, 15 dB and 10 dB SNR office, crowd, booth, and car noise are performed. The proposed multi-template method achieved an 89.5% word accuracy rate compared with 88.1% for the conventional single-template method, while the baseline recognition rate without adaptation is 86.4%. Moreover, experiments using Vocal Tract Length Normalization (VTLN) and supervised Maximum Likelihood Linear Regression (MLLR) are also compared.

  • A Style Adaptation Technique for Speech Synthesis Using HSMM and Suprasegmental Features

    Makoto TACHIBANA  Junichi YAMAGISHI  Takashi MASUKO  Takao KOBAYASHI  

     
    PAPER-Speech Synthesis

      Vol:
    E89-D No:3
      Page(s):
    1092-1099

    This paper proposes a technique for synthesizing speech with a desired speaking style and/or emotional expression, based on model adaptation in an HMM-based speech synthesis framework. Speaking styles and emotional expressions are characterized by many segmental and suprasegmental features in both spectral and prosodic features. Therefore, it is essential to take account of these features in the model adaptation. The proposed technique, called style adaptation, deals with this issue. Firstly, the maximum likelihood linear regression (MLLR) algorithm, based on a framework of hidden semi-Markov model (HSMM), is presented to provide a mathematically rigorous and robust adaptation of state duration and to adapt both the spectral and prosodic features. Then, a novel tying method for the regression matrices of the MLLR algorithm is also presented to allow the incorporation of both the segmental and suprasegmental speech features into the style adaptation. The proposed tying method uses regression class trees with contextual information. From the results of several subjective tests, we show that these techniques can perform style adaptation while maintaining naturalness of the synthetic speech.

  • Comparative Study of Speaker Identification Methods: dPLRM, SVM and GMM

    Tomoko MATSUI  Kunio TANABE  

     
    PAPER-Speaker Recognition

      Vol:
    E89-D No:3
      Page(s):
    1066-1073

    We compare the performance of three text-independent speaker identification methods, based on dual Penalized Logistic Regression Machine (dPLRM), Support Vector Machine (SVM) and Gaussian Mixture Model (GMM), in experiments with 10 male speakers. The methods are compared on speech data collected over a period of 13 months in 6 utterance sessions, of which the earlier 3 sessions were for obtaining training data of 12 seconds' utterances. Comparisons are made between Mel-frequency cepstrum (MFC) data and log-power spectrum data, and between training data from a single session and from multiple sessions. It is shown that dPLRM with the log-power spectrum data is competitive with the SVM and GMM methods with MFC data when trained on the combined data collected in the earlier three sessions. dPLRM outperforms the GMM method, especially as the amount of training data becomes smaller. Some of these findings have already been reported in [1]-[3].

  • Generating F0 Contours by Statistical Manipulation of Natural F0 Shapes

    Takashi SAITO  

     
    PAPER-Speech Analysis

      Vol:
    E89-D No:3
      Page(s):
    1100-1106

    This paper describes a method of generating F0 contours from natural F0 segmental shapes for speech synthesis. The extracted shapes of the F0 units are basically held invariant by eliminating any averaging operations in the analysis phase and by minimizing modification operations in the synthesis phase. The use of natural F0 shapes has great potential to cover a wide variety of speaking styles with the same framework, including not only read-aloud speech, but also dialogues and emotional speech. A linear-regression statistical model is used to "manipulate" the stored raw F0 shapes to build them up into a sentential F0 contour. Through experimental evaluations, the proposed model is shown to provide stable and robust F0 contour prediction for various speakers. By using this model, linguistically derived information about a sentence can be directly mapped, in a purely data-driven manner, to acoustic F0 values of the sentential intonation contour for a given target speaker.
