IEICE global.ieice.org Site

Keyword Search Result

[Keyword] ATI(18690hit)

10841-10860hit(18690hit)

Channel-Count-Independent BIST for Multi-Channel SerDes
Kouichi YAMAGUCHI Muneo FUKAISHI

PAPER-Interface and Interconnect Techniques

Vol:
E89-C No:3
Page(s):
314-319
This paper describes a BIST circuit for testing SoC integrated multi-channel serializer/deserializer (SerDes) macros. A newly developed packet-based PRBS generator enables the BIST to perform at-speed testing of asynchronous data transfers. In addition, a new technique for chained alignment checks between adjacent channels helps achieve a channel-count-independent architecture for verification of multi-channel alignment between SerDes macros. Fabricated in a 0.13-µm CMOS process and operating at > 500 MHz, the BIST has successfully verified all SerDes functions in at-speed testing of 5-Gbps 20-ch SerDes macros.
Text-Independent/Text-Prompted Speaker Recognition by Combining Speaker-Specific GMM with Speaker Adapted Syllable-Based HMM
Seiichi NAKAGAWA Wei ZHANG Mitsuo TAKAHASHI

PAPER-Speaker Recognition

Vol:
E89-D No:3
Page(s):
1058-1065
We previously presented a new text-independent/text-prompted speaker recognition method that combines a speaker-specific Gaussian Mixture Model (GMM) with a syllable-based HMM adapted by MLLR or MAP. In this paper, the robustness of this method to changes in speaking style is evaluated. A speaker identification experiment was conducted using the NTT database, which consists of sentence data uttered at three speed modes (normal, fast and slow) by 35 Japanese speakers (22 males and 13 females) in five sessions over ten months. Each speaker uttered only 5 training utterances (about 20 seconds in total). The combination method reduced the identification error rate by about 50%. We obtained an accuracy of 98.8% for text-independent speaker identification across the three speaking style modes (normal, fast, slow) using a short test utterance (about 4 seconds). In particular, we obtained an accuracy of 99.4% for the normal speaking mode. This result was superior to that of conventional methods on the same database. We show that this result stems from the compensatory effect between the speaker-specific GMM and the speaker-adapted syllable-based HMM.
A Frame Detector for Zero-Padded OFDM Systems
Young-Hwan YOU Eu-Suk SHIM Hyoung-Kyu SONG

LETTER-Transmission Systems and Transmission Equipment for Communications

Vol:
E89-B No:3
Page(s):
963-965
This letter proposes an orthogonal frequency division multiplexing (OFDM) frame synchronization scheme when the guard interval (GI) consists of a zero-padded (ZP) sequence. The frame synchronization method uses the ZP symbol where nothing is transmitted for GI so that the drop in received power can be detected to find the beginning of the frame. Simulations reveal that this method significantly improves synchronization performance of the ZP-OFDM system in a multipath fading channel.
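The power-drop idea in this abstract can be illustrated with a minimal sketch. It is not the detector from the letter: it only assumes that the zero-padded interval shows up as the lowest-energy window of a known length, and the names `find_zero_padded_gap` and `gap_len` are hypothetical.

```python
def find_zero_padded_gap(samples, gap_len):
    """Return the start index of the gap_len-sample window with the lowest energy.

    Illustrative only: assumes the zero-padded guard interval is the
    quietest window of known length in the received samples.
    """
    if gap_len <= 0 or gap_len > len(samples):
        raise ValueError("gap_len must be between 1 and len(samples)")
    # Instantaneous power of each (real or complex) sample.
    power = [abs(s) ** 2 for s in samples]
    # Energy of the first window, then slide it one sample at a time.
    window = sum(power[:gap_len])
    best_start, best_energy = 0, window
    for start in range(1, len(power) - gap_len + 1):
        window += power[start + gap_len - 1] - power[start - 1]
        if window < best_energy:
            best_start, best_energy = start, window
    return best_start
```

In a noisy multipath channel the minimum would be less sharp than in this idealized form, so a practical detector would likely need averaging or thresholding on top of it.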
On the Number of Integrators Needed for Dynamic Observer Error Linearization via Integrators
Kyungtak YU Nam-Hoon JO Jin Heon SEO

LETTER-Systems and Control

Vol:
E89-A No:3
Page(s):
817-821
In this letter, an illustrative example is given which shows that, in contrast to dynamic feedback linearization results, the number of integrators needed for dynamic observer error linearization using integrators cannot be bounded by a function of the dimension of the system and the number of outputs.
Theoretical Limits on Sequences with Ear Zero/Low Correlation Zones
Fanxin ZENG

LETTER-Fundamental Theories for Communications

Vol:
E89-B No:3
Page(s):
949-951
Sequences with ear zero correlation zones (EZCZs) are employed to suppress inter-symbol interference (ISI) and inter-user interference (IUI) in wireless communications. Theoretical limits on the correlation functions of such sequences are investigated. Lower bounds on the relations among sequence length, width of EZCZs/ELCZs and family size are derived and presented; these bounds play an important role in assessing the performance of such sequences.
High-Performance Distributed Raman Amplification Systems with Limited Pump Power
Hiroji MASUDA Masahito TOMIZAWA Yutaka MIYAMOTO Kazuo HAGIMOTO

PAPER-Fiber-Optic Transmission for Communications

Vol:
E89-B No:3
Page(s):
715-723
We have clarified, both theoretically and experimentally, the basic performance of distributed Raman amplification (DRA) transmission systems in trunk networks with DSF or SMF spans, where the pump power is limited by practical considerations. The gain and noise characteristics of a fiber span with splice loss are accurately determined by employing three approximation models. A novel pumping scheme called band enhanced pumping (BEP) is proposed that improves the DRA gain and optical SNR (OSNR) by 1.5 and 0.55 dB, respectively, compared with those of a conventional pumping scheme, under typical system conditions. We show that a DRA system with a DSF span has OSNRs that are 2.1 and 2.9 dB higher than those of a system with an SMF span at limited pump powers of 200 and 400 mW, respectively, as typical examples.
Single-Channel Multiple Regression for In-Car Speech Enhancement
Weifeng LI Katsunobu ITOU Kazuya TAKEDA Fumitada ITAKURA

PAPER-Speech Enhancement

Vol:
E89-D No:3
Page(s):
1032-1039
We address issues for improving hands-free speech enhancement and speech recognition performance in different car environments using a single distant microphone. This paper describes a new single-channel in-car speech enhancement method that estimates the log spectra of speech at a close-talking microphone based on the nonlinear regression of the log spectra of noisy signal captured by a distant microphone and the estimated noise. The proposed method provides significant overall quality improvements in our subjective evaluation on the regression-enhanced speech, and performed best in most objective measures. Based on our isolated word recognition experiments conducted under 15 real car environments, the proposed adaptive nonlinear regression approach shows an advantage in average relative word error rate (WER) reductions of 50.8% and 13.1%, respectively, compared to original noisy speech and ETSI advanced front-end (ETSI ES 202 050).
Substring Count Estimation in Extremely Long Strings
Jinuk BAE Sukho LEE

PAPER-Database

Vol:
E89-D No:3
Page(s):
1148-1156
To estimate the number of substring matches against string data, count suffix trees (CS-trees) have been used as a kind of alphanumeric histogram. Although the trees are useful for substring count estimation in short data strings (e.g. name or title), they reveal several drawbacks when the target is changed to extremely long strings. First, it becomes too hard or at least slow to build CS-trees, because their origin, the suffix tree, has a memory-bottleneck problem with long strings. Secondly, some of the CS-tree node counts are incorrect due to frequent pruning of nodes. Therefore, we propose the count q-gram tree (CQ-tree) as an alphanumeric histogram for long strings. By adopting q-grams (or length-q substrings), CQ-trees can be created fast and correctly within small available memory. Furthermore, we mathematically provide the lower and upper bounds that the count estimation can reach. To the best of our knowledge, our work is the first to present such bounds among research activities to estimate alphanumeric selectivity. Our experimental study shows that the CQ-tree outperforms the CS-tree in terms of building time and accuracy.
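As a rough illustration of why q-grams are cheap to count, the sketch below tallies every length-q substring in one pass and derives a simple upper bound for a longer pattern. It is not the CQ-tree and does not reproduce the bounds from the paper; the function names are hypothetical, and the bound used is only the elementary fact that a pattern cannot occur more often than its rarest q-gram.

```python
from collections import Counter


def qgram_counts(text, q):
    """Count every length-q substring (q-gram) of text, overlaps included."""
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))


def count_upper_bound(counts, pattern, q):
    """Upper bound on occurrences of pattern, given q-gram counts.

    Each occurrence of pattern contains each of its q-grams at a distinct
    position, so the rarest q-gram caps the occurrence count.
    """
    if len(pattern) < q:
        raise ValueError("pattern must be at least q characters long")
    return min(counts[pattern[i:i + q]] for i in range(len(pattern) - q + 1))


counts = qgram_counts("abababab", 2)
print(counts["ab"], count_upper_bound(counts, "aba", 2))
```

A flat `Counter` stores one entry per distinct q-gram, so its size depends on the alphabet and q rather than on the suffix structure of the whole string.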
Channel Characterization and Performance Evaluation of Mobile Communication Employing Stratospheric Platforms
ISKANDAR Shigeru SHIMAMOTO

PAPER-Integrated Systems for Communications

Vol:
E89-B No:3
Page(s):
937-944
Stratospheric platforms have been recently proposed as a new wireless infrastructure for realizing the next generation of communication systems. To provide high quality services, an investigation of the wireless stratospheric platform channel is essential. This paper proposes a definition and describes an analysis of the wireless channel for the link between stratospheric platforms and terrestrial mobile users based on an experiment in a semi-urban environment. Narrowband channel characteristics are presented in terms of Ricean factor (K factor) and local mean received power over a wide range of elevation angles ranging from 10° to 90°. Finally, we evaluated average bit error probability based on the proposed channel model to examine the channel performance. For the environment in which the measurements were conducted, we find that elevation angles greater than 40° yield better performance.
Robust Talker Direction Estimation Based on Weighted CSP Analysis and Maximum Likelihood Estimation
Yuki DENDA Takanobu NISHIURA Yoichi YAMASHITA

PAPER-Speech Enhancement

Vol:
E89-D No:3
Page(s):
1050-1057
This paper describes a new talker direction estimation method for front-end processing to capture distant-talking speech by using a microphone array. The proposed method consists of two algorithms: One is a TDOA (Time Delay Of Arrival) estimation algorithm based on a weighted CSP (Cross-power Spectrum Phase) analysis with an average speech spectrum and CSP coefficient subtraction. The other is a talker direction estimation algorithm based on ML (Maximum Likelihood) estimation in a time sequence of the estimated TDOAs. To evaluate the effectiveness of the proposed method, talker direction estimation experiments were carried out in an actual office room. The results confirmed that the talker direction estimation performance of the proposed method is superior to that of the conventional methods in both diffused- and directional-noise environments.
ATR Parallel Decoding Based Speech Recognition System Robust to Noise and Speaking Styles
Shigeki MATSUDA Takatoshi JITSUHIRO Konstantin MARKOV Satoshi NAKAMURA

PAPER-Speech Recognition

Vol:
E89-D No:3
Page(s):
989-997
In this paper, we describe a parallel decoding-based ASR system developed at ATR that is robust to noise type, SNR and speaking style. It is difficult to recognize speech affected by various factors, especially when an ASR system contains only a single acoustic model. One solution is to employ multiple acoustic models, one model for each different condition. Even though the robustness of each acoustic model is limited, the whole ASR system can handle various conditions appropriately. In our system, there are two recognition sub-systems which use different features such as MFCC and Differential MFCC (DMFCC). Each sub-system has several acoustic models depending on SNR, speaker gender and speaking style, and during recognition each acoustic model is adapted by fast noise adaptation. From each sub-system, one hypothesis is selected based on posterior probability. The final recognition result is obtained by combining the best hypotheses from the two sub-systems. On the AURORA-2J task, which is widely used for the evaluation of noise robustness, our system achieved higher recognition performance than a system which contains only a single model. Also, our system was tested using normal and hyper-articulated speech contaminated by several background noises, and exhibited high robustness to noise and speaking styles.
A High-Accuracy Passive 3D Measurement System Using Phase-Based Image Matching
Mohammad Abdul MUQUIT Takuma SHIBAHARA Takafumi AOKI

PAPER-Image/Vision Processing

Vol:
E89-A No:3
Page(s):
686-697
This paper presents a high-accuracy 3D (three-dimensional) measurement system using multi-camera passive stereo vision to reconstruct 3D surfaces of free form objects. The proposed system is based on an efficient stereo correspondence technique, which consists of (i) coarse-to-fine correspondence search, and (ii) outlier detection and correction, both employing phase-based image matching. The proposed sub-pixel correspondence search technique contributes to dense reconstruction of arbitrary-shaped 3D surfaces with high accuracy. The outlier detection and correction technique contributes to high reliability of reconstructed 3D points. Through a set of experiments, we show that the proposed system measures 3D surfaces of objects with sub-mm accuracy. Also, we demonstrate high-quality dense 3D reconstruction of a human face as a typical example of free form objects. The result suggests that our approach could be used in many computer vision applications.
Spatial Fading Simulator Using a Cavity-Excited Circular Array (CECA) for Performance Evaluation of Antenna Arrays
Chulgyun PARK Jun-ichi TAKADA Kei SAKAGUCHI Takashi OHIRA

PAPER-Antennas and Propagation

Vol:
E89-B No:3
Page(s):
906-913
In this paper we propose a novel spatial fading simulator to evaluate the performance of an array antenna and show its spatial stochastic characteristics by computer simulation based on parameters verified by experimental data. We introduce a cavity-excited circular array (CECA) as a fading simulator that can simulate realistic mobile communication environments. To evaluate the antenna array, two stochastic characteristics are necessary. The first one is the fading phenomenon and the second is the angular spread (AS) of the incident wave. The computer simulation results with respect to fading and AS show that CECA works well as a spatial fading simulator for performance evaluation of an antenna array. We first present the basic structure, features and design methodology of CECA, and then show computer simulation results of the spatial stochastic characteristics. The results convince us that CECA is useful to evaluate performance of antenna arrays.
DSRED: A New Queue Management Scheme for the Next Generation Internet
Bing ZHENG Mohammed ATIQUZZAMAN

PAPER-Internet

Vol:
E89-B No:3
Page(s):
764-774
Random Early Detection (RED), an active queue management scheme, has been recommended by the Internet Engineering Task Force (IETF) for the next generation routers. RED suffers from a number of performance problems, such as low throughput, large delay/jitter, and induced instability in networks. Many of the previous attempts to improve the performance of RED have been based on optimizing the values of the RED parameters. However, results have shown that such optimizations yielded only limited improvement in performance. In this paper, we propose Double Slope RED (DSRED), a new active queue management scheme to improve the performance of RED. The proposed scheme is based on dynamically changing the slope of the packet drop probability curve as a function of the level of congestion in the buffer. Results show that our proposed scheme achieves better performance than the original RED.
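The idea of a drop probability curve whose slope changes with congestion can be sketched as a two-segment piecewise-linear function. This is a generic illustration, not the DSRED definition from the paper: the thresholds `min_th`, `mid_th`, `max_th` and the breakpoint probability `p_mid` are hypothetical parameters chosen for the example.

```python
def drop_probability(avg_queue, min_th, mid_th, max_th, p_mid):
    """Two-slope packet drop probability as a function of average queue length.

    Below min_th nothing is dropped; between min_th and mid_th the
    probability rises linearly to p_mid; between mid_th and max_th it rises
    linearly from p_mid to 1; at or above max_th every packet is dropped.
    All parameter names are illustrative assumptions.
    """
    if not (min_th < mid_th < max_th) or not (0.0 <= p_mid <= 1.0):
        raise ValueError("need min_th < mid_th < max_th and 0 <= p_mid <= 1")
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    if avg_queue < mid_th:
        # First (gentler or steeper) segment, depending on p_mid.
        return p_mid * (avg_queue - min_th) / (mid_th - min_th)
    # Second segment, from p_mid up to 1.
    return p_mid + (1.0 - p_mid) * (avg_queue - mid_th) / (max_th - mid_th)
```

With a small `p_mid`, the curve stays gentle under light congestion and steepens once the queue passes `mid_th`, which is the qualitative behavior the abstract describes.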
PS-ZCPA Based Feature Extraction with Auditory Masking, Modulation Enhancement and Noise Reduction for Robust ASR
Muhammad GHULAM Takashi FUKUDA Kouichi KATSURADA Junsei HORIKAWA Tsuneo NITTA

PAPER-Speech Recognition

Vol:
E89-D No:3
Page(s):
1015-1023
A pitch-synchronous (PS) auditory feature extraction method based on ZCPA (Zero-Crossings Peak-Amplitudes) was proposed previously and was shown to be more robust than conventional ZCPA and MFCC based features. In this paper, firstly, a non-linear adaptive threshold adjustment procedure is introduced into the PS-ZCPA method to get optimal results in noisy conditions with different signal-to-noise ratios (SNRs). Next, auditory masking, a well-known auditory perception phenomenon, and modulation enhancement that simulates a strong relationship between modulation spectra and intelligibility of speech are embedded into the PS-ZCPA method. Finally, a Wiener filter based noise reduction procedure is integrated into the method to make it more noise-robust, and the performance is evaluated against ETSI ES202 (WI008), which is a standard front-end for distributed speech recognition. All the experiments were carried out on the Aurora-2J database. The experimental results demonstrated improved performance of the PS-ZCPA method when auditory masking was embedded into it, and slightly improved performance when modulation enhancement was used. The PS-ZCPA method with Wiener filter based noise reduction also showed better performance than ETSI ES202 (WI008).
Adaptive Clock Recovery Method Utilizing Proportional-Integral-Derivative (PID) Control for Circuit Emulation
Youichi FUKADA Takeshi YASUDA Shuji KOMATSU Koichi SAITO Yoichi MAEDA Yasuyuki OKUMURA

PAPER

Vol:
E89-B No:3
Page(s):
690-695
This paper describes a novel adaptive clock recovery method that uses proportional-integral-derivative (PID) control. The adaptive clock method is a clock recovery technique that synchronizes connected terminals via packet networks, and will be indispensable for circuit emulation services in the next generation Ethernet. Our adaptive clock method simultaneously achieves a short starting-time, accuracy, stable recovery clock frequency, and few buffer delays using the PID control technique. We explain the numerical simulations, experimental results, and circuit designs.
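For readers unfamiliar with PID control, a minimal discrete-time controller is sketched below. It is a textbook form, not the circuit or algorithm from the paper. The abstract does not say which quantity serves as the error signal; using something like the deviation of receive-buffer occupancy from a target is an assumption, and the gains `kp`, `ki`, `kd` are hypothetical.

```python
class PIDController:
    """Textbook discrete PID controller (illustrative, not the paper's design)."""

    def __init__(self, kp, ki, kd):
        self.kp = kp  # proportional gain
        self.ki = ki  # integral gain
        self.kd = kd  # derivative gain
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        """Return the control output for the current error and time step dt."""
        if dt <= 0:
            raise ValueError("dt must be positive")
        # Accumulate the error over time for the integral term.
        self.integral += error * dt
        # No derivative contribution on the first sample.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

In a clock recovery setting, the returned value would be applied as a correction to the recovered clock frequency; that mapping is also an assumption here.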
Improving Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics in Noisy Environments Using Multi-Template Models
Randy GOMEZ Akinobu LEE Tomoki TODA Hiroshi SARUWATARI Kiyohiro SHIKANO

PAPER-Speech Recognition

Vol:
E89-D No:3
Page(s):
998-1005
This paper describes a method of multi-template unsupervised speaker adaptation based on HMM-Sufficient Statistics that improves adaptation performance while keeping adaptation time within a few seconds with just one arbitrary utterance. This adaptation scheme is mainly composed of two processes. The first part is done offline and involves the training of multiple class-dependent acoustic models and the creation of speakers' HMM-Sufficient Statistics based on gender and age. The second part is performed online, where adaptation begins using the single utterance of a test speaker. From this utterance, the system classifies the speaker's class and consequently selects the N-best neighbor speakers close to the utterance using Gaussian Mixture Models (GMM). The classified speaker's class template model is then adopted as a base model. From this template model, the adapted model is rapidly constructed using the N-best neighbor speakers' HMM-Sufficient Statistics. Experiments in noisy environment conditions with 20 dB, 15 dB and 10 dB SNR office, crowd, booth, and car noise are performed. The proposed multi-template method achieved an 89.5% word accuracy rate compared with 88.1% for the conventional single-template method, while the baseline recognition rate without adaptation is 86.4%. Moreover, experiments using Vocal Tract Length Normalization (VTLN) and supervised Maximum Likelihood Linear Regression (MLLR) are also compared.
A Style Adaptation Technique for Speech Synthesis Using HSMM and Suprasegmental Features
Makoto TACHIBANA Junichi YAMAGISHI Takashi MASUKO Takao KOBAYASHI

PAPER-Speech Synthesis

Vol:
E89-D No:3
Page(s):
1092-1099
This paper proposes a technique for synthesizing speech with a desired speaking style and/or emotional expression, based on model adaptation in an HMM-based speech synthesis framework. Speaking styles and emotional expressions are characterized by many segmental and suprasegmental features in both spectral and prosodic features. Therefore, it is essential to take account of these features in the model adaptation. The proposed technique, called style adaptation, deals with this issue. Firstly, the maximum likelihood linear regression (MLLR) algorithm, based on a framework of the hidden semi-Markov model (HSMM), is presented to provide a mathematically rigorous and robust adaptation of state duration and to adapt both the spectral and prosodic features. Then, a novel tying method for the regression matrices of the MLLR algorithm is also presented to allow the incorporation of both the segmental and suprasegmental speech features into the style adaptation. The proposed tying method uses regression class trees with contextual information. From the results of several subjective tests, we show that these techniques can perform style adaptation while maintaining naturalness of the synthetic speech.
Comparative Study of Speaker Identification Methods: dPLRM, SVM and GMM
Tomoko MATSUI Kunio TANABE

PAPER-Speaker Recognition

Vol:
E89-D No:3
Page(s):
1066-1073
The performance of three text-independent speaker identification methods, based on the dual Penalized Logistic Regression Machine (dPLRM), Support Vector Machine (SVM) and Gaussian Mixture Model (GMM), is compared in experiments with 10 male speakers. The methods are compared on speech data collected over a period of 13 months in 6 utterance-sessions, of which the earlier 3 sessions were for obtaining training data of 12 seconds' utterances. Comparisons are made between the Mel-frequency cepstrum (MFC) data and the log-power spectrum data, and also between training data from a single session and from plural ones. It is shown that dPLRM with the log-power spectrum data is competitive with the SVM and GMM methods with MFC data when trained on the combined data collected in the earlier three sessions. dPLRM outperforms the GMM method, especially as the amount of training data becomes smaller. Some of these findings have already been reported in [1]-[3].
Generating F0 Contours by Statistical Manipulation of Natural F0 Shapes
Takashi SAITO

PAPER-Speech Analysis

Vol:
E89-D No:3
Page(s):
1100-1106
This paper describes a method of generating F0 contours from natural F0 segmental shapes for speech synthesis. The extracted shapes of the F0 units are basically held invariant by eliminating any averaging operations in the analysis phase and by minimizing modification operations in the synthesis phase. The use of natural F0 shapes has great potential to cover a wide variety of speaking styles with the same framework, including not only read-aloud speech, but also dialogues and emotional speech. A linear-regression statistical model is used to "manipulate" the stored raw F0 shapes to build them up into a sentential F0 contour. Through experimental evaluations, the proposed model is shown to provide stable and robust F0 contour prediction for various speakers. By using this model, linguistically derived information about a sentence can be directly mapped, in a purely data-driven manner, to acoustic F0 values of the sentential intonation contour for a given target speaker.
