IEICE global.ieice.org Site

Keyword Search Result

[Keyword] ATI(18690hit)

10901-10920hit(18690hit)

Robust Talker Direction Estimation Based on Weighted CSP Analysis and Maximum Likelihood Estimation
Yuki DENDA Takanobu NISHIURA Yoichi YAMASHITA

PAPER-Speech Enhancement

Vol: E89-D No: 3 Page(s): 1050-1057
This paper describes a new talker direction estimation method for front-end processing to capture distant-talking speech by using a microphone array. The proposed method consists of two algorithms: One is a TDOA (Time Delay Of Arrival) estimation algorithm based on a weighted CSP (Cross-power Spectrum Phase) analysis with an average speech spectrum and CSP coefficient subtraction. The other is a talker direction estimation algorithm based on ML (Maximum Likelihood) estimation in a time sequence of the estimated TDOAs. To evaluate the effectiveness of the proposed method, talker direction estimation experiments were carried out in an actual office room. The results confirmed that the talker direction estimation performance of the proposed method is superior to that of the conventional methods in both diffused- and directional-noise environments.
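As an illustration of the TDOA step, the following is a minimal sketch of plain (unweighted) CSP analysis between two microphone signals. It omits the weighting by an average speech spectrum and the CSP coefficient subtraction that the abstract describes, and the function names `dft` and `csp_tdoa` are hypothetical. A naive DFT keeps the example dependency-free.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def csp_tdoa(x1, x2):
    """Return the lag in samples by which x2 is delayed relative to x1."""
    n = len(x1)
    X1, X2 = dft(x1), dft(x2)
    cross = []
    for a, b in zip(X1, X2):
        c = b * a.conjugate()
        mag = abs(c)
        # Discard the magnitude so that only the phase difference remains.
        cross.append(c / mag if mag > 1e-12 else 0j)
    csp = [v.real for v in dft(cross, inverse=True)]
    peak = max(range(n), key=lambda i: csp[i])
    # Map the circular peak index to a signed lag.
    return peak if peak <= n // 2 else peak - n


# Synthetic check: x2 is x1 circularly delayed by 5 samples.
rng = random.Random(0)
x1 = [rng.uniform(-1, 1) for _ in range(64)]
delay = 5
x2 = x1[-delay:] + x1[:-delay]
estimated = csp_tdoa(x1, x2)
print(estimated)
```

A sequence of such per-frame lags is the kind of input that a subsequent direction estimator would consume.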
Noise Reduction in Time Domain Using Referential Reconstruction
Takehiro IHARA Takayuki NAGAI Kazuhiko OZEKI Akira KUREMATSU

PAPER-Speech and Hearing

Vol: E89-D No: 3 Page(s): 1203-1213
We present a novel approach for single-channel noise reduction of speech signals contaminated by additive noise. In this approach, the system requires speech samples to be uttered in advance by the same speaker as that of the input signal. Speech samples used in this method must have enough phonetic variety to reconstruct the input signal. In the proposed method, which we refer to as referential reconstruction, we have used a small database created from examples of speech, which will be called reference signals. Referential reconstruction uses an example-based approach, in which the objective is to find the candidate speech frame which is the most similar to the clean input frame without noise, although the input frame is contaminated with noise. When candidate frames are found, they become final outputs without any special processing. In order to find the candidate frames, a correlation coefficient is used as a similarity measure. Through automatic speech recognition experiments, the proposed method was shown to be effective, particularly for low-SNR speech signals corrupted with white noise or noise in high-frequency bands. Since the direct implementation of this method requires infeasible computational cost for searching through reference signals, a coarse-to-fine strategy is introduced in this paper.
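The frame search described above can be sketched as an exhaustive search that scores every candidate frame in the reference signal with a correlation coefficient. This is a minimal illustration, not the authors' implementation: the names `correlation` and `best_reference_frame` and the `hop` parameter are assumptions, and the coarse-to-fine strategy is not reproduced.

```python
import math
import random


def correlation(a, b):
    """Pearson correlation coefficient between two equal-length frames."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def best_reference_frame(noisy_frame, reference, hop=1):
    """Search `reference` for the frame most similar to `noisy_frame`."""
    size = len(noisy_frame)
    best_start, best_score = 0, -2.0
    for start in range(0, len(reference) - size + 1, hop):
        score = correlation(noisy_frame, reference[start:start + size])
        if score > best_score:
            best_start, best_score = start, score
    # The matched reference frame is returned unmodified as the output.
    return reference[best_start:best_start + size], best_start, best_score


# Synthetic check: corrupt one reference frame with noise, then recover it.
rng = random.Random(1)
reference = [rng.uniform(-1, 1) for _ in range(200)]
clean = reference[40:56]
noisy = [s + rng.uniform(-0.1, 0.1) for s in clean]
frame, start, score = best_reference_frame(noisy, reference)
print(start, round(score, 3))
```

The cost grows with the length of the reference signal times the frame size, which is consistent with the abstract's remark that a direct implementation is too expensive.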
Design and Performance of an LDPC-Coded FH-OFDMA System in the Uplink Cellular Environments
Yun Hee KIM Kwang Soon KIM Sang Hyun LEE

PAPER-Wireless Communication Technologies

Vol: E89-B No: 3 Page(s): 828-836
An LDPC-coded FH-OFDMA system is proposed for the uplink of a packet-based cellular system, where the frequency hopping (FH) is based on a resource block (RB) for coherent demodulation. For the system, different RB types are employed either for better intercell interference (ICI) averaging capability or for better channel estimation performance. For the receiver, practical iterative channel estimation and decoding methods are proposed to improve the channel estimation performance without boosting the pilot power and to mitigate the adverse effects of the ICI. Extensive simulation results are provided to show the effect of the RB size on the channel estimation and ICI averaging performance as well as possible application of the proposed receiver in harsh mobile environments with dynamic packet allocation.
A Shape-Preserving Method for Watermarking 2D Vector Maps Based on Statistic Detection
Cheng Yong SHAO Hai Long WANG Xia Mu NIU Xiao Tong WANG

LETTER-Application Information Security

Vol: E89-D No: 3 Page(s): 1290-1293
A statistics-based algorithm for watermarking 2D vector maps is proposed. Instead of 2D coordinates, a one-dimensional distance sequence extracted from the original map is used as the cover data to preserve the map's shape. A statistical feature of the cover data is used for data embedding. Experimental results indicate that the scheme achieves better invisibility and is robust to certain attacks.
Teeth Image Recognition for Biometrics
Tae-Woo KIM Tae-Kyung CHO

LETTER-Image Recognition, Computer Vision

Vol: E89-D No: 3 Page(s): 1309-1313
This paper presents a personal identification method based on BMME and LDA for teeth images acquired in the anterior and posterior occlusion states. The method consists of teeth region extraction, BMME, and pattern recognition. The two occlusion states provide a consistent teeth appearance in images, and BMME reduces matching error in pattern recognition. Teeth images are beneficial for recognition because teeth, being rigid objects, cannot be deformed at the moment of image acquisition. In experiments, the algorithm succeeded in teeth recognition for personal identification of 20 people, which suggests that the method could contribute to multi-modal authentication systems.
Substring Count Estimation in Extremely Long Strings
Jinuk BAE Sukho LEE

PAPER-Database

Vol: E89-D No: 3 Page(s): 1148-1156
To estimate the number of substring matches against string data, count suffix trees (CS-trees) have been used as a kind of alphanumeric histogram. Although these trees are useful for substring count estimation in short data strings (e.g., a name or title), they reveal several drawbacks when the target is changed to extremely long strings. First, building CS-trees becomes very hard, or at least slow, because the suffix tree from which they originate has a memory-bottleneck problem with long strings. Second, some CS-tree node counts are incorrect because nodes are pruned frequently. We therefore propose the count q-gram tree (CQ-tree) as an alphanumeric histogram for long strings. By adopting q-grams (length-q substrings), CQ-trees can be created quickly and correctly within a small amount of available memory. Furthermore, we mathematically derive the lower and upper bounds that the count estimation can reach. To the best of our knowledge, ours is the first work to present such bounds among research on estimating alphanumeric selectivity. Our experimental study shows that the CQ-tree outperforms the CS-tree in terms of building time and accuracy.
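To make the q-gram idea concrete, here is a minimal sketch that counts q-grams in a flat table and derives a simple upper bound on a pattern's occurrence count. It is not the CQ-tree itself, and the bound shown (the minimum count over the pattern's q-grams) is a generic one rather than the bounds derived in the paper. The function names are hypothetical.

```python
from collections import Counter


def qgram_counts(text, q):
    """Count every length-q substring of `text`."""
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))


def count_upper_bound(pattern, counts, q):
    """Upper bound on the occurrences of `pattern` (length >= q) in the text.

    Each occurrence of the pattern contains one occurrence of each of its
    q-grams, so the pattern cannot occur more often than its rarest q-gram.
    """
    if len(pattern) < q:
        raise ValueError("pattern must be at least q characters long")
    grams = [pattern[i:i + q] for i in range(len(pattern) - q + 1)]
    return min(counts[g] for g in grams)


text = "abracadabra"
counts = qgram_counts(text, 3)
print(counts["abr"])                           # exact count of a 3-gram
print(count_upper_bound("abra", counts, 3))    # bound for a longer pattern
print(count_upper_bound("abrab", counts, 3))   # contains an unseen 3-gram
```

Building the table needs a single pass over the string, and memory depends on the number of distinct q-grams rather than on the number of suffixes.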
Channel Characterization and Performance Evaluation of Mobile Communication Employing Stratospheric Platforms
ISKANDAR Shigeru SHIMAMOTO

PAPER-Integrated Systems for Communications

Vol: E89-B No: 3 Page(s): 937-944
Stratospheric platforms have recently been proposed as a new wireless infrastructure for realizing the next generation of communication systems. To provide high-quality services, an investigation of the wireless stratospheric platform channel is essential. This paper proposes a definition and describes an analysis of the wireless channel for the link between stratospheric platforms and terrestrial mobile users, based on an experiment in a semi-urban environment. Narrowband channel characteristics are presented in terms of the Ricean factor (K factor) and the local mean received power over a wide range of elevation angles, from 10° to 90°. Finally, we evaluate the average bit error probability based on the proposed channel model to examine the channel performance. For the environment in which the measurements were conducted, we find that elevation angles greater than 40° yield better performance.
Robust Beamforming of Microphone Array Using H_∞ Adaptive Filtering Technique
Jwu-Sheng HU Wei-Han LIU Chieh-Cheng CHENG

PAPER-Speech/Audio Processing

Vol: E89-A No: 3 Page(s): 708-715
In ASR (Automatic Speech Recognition) applications, one of the most important issues in the real-time beamforming of microphone arrays is the inability to capture the whole acoustic dynamics with a finite length of data and a finite number of array elements. For example, a reflected source signal impinging from the side-lobe direction presents a coherent interference, and non-minimum-phase channel dynamics may require an infinite amount of data to achieve perfect equalization (or inversion). All these factors appear as uncertainties or un-modeled dynamics in the received signals. Traditional adaptive algorithms such as NLMS, which do not consider these errors, suffer performance deterioration. In this paper, a time-domain beamformer using an H∞ filtering approach is proposed to adjust the beamforming parameters. This work also proposes a frequency-domain approach called SPFDBB (Soft Penalty Frequency Domain Block Beamformer), which uses an H∞ filtering approach to reduce computational effort and provide purified data to the ASR application. Experimental results show that the adaptive H∞ filtering method is robust to modeling errors and suppresses much more noise interference than the NLMS-based method. Consequently, the ASR correct rate is also improved.
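For context, the NLMS baseline mentioned in the abstract can be sketched as the textbook normalized LMS update, applied here to a toy system-identification problem instead of a microphone array. This is not the proposed H∞ beamformer, and the function name `nlms_identify` and all parameter values are assumptions for illustration.

```python
import random


def nlms_identify(x, d, taps, mu=0.5, eps=1e-8):
    """Adapt an FIR filter so that filtering x approximates d (NLMS)."""
    w = [0.0] * taps      # filter weights
    buf = [0.0] * taps    # most recent inputs, newest first
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = dn - y
        # The step size is normalized by the input power in the buffer.
        norm = eps + sum(b * b for b in buf)
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
    return w


# Noise-free toy problem: recover a known 3-tap response.
rng = random.Random(0)
true_h = [0.5, -0.3, 0.2]
x = [rng.uniform(-1, 1) for _ in range(2000)]
d = [
    sum(true_h[k] * x[n - k] for k in range(len(true_h)) if n - k >= 0)
    for n in range(len(x))
]
w = nlms_identify(x, d, taps=3)
print([round(v, 4) for v in w])
```

The update assumes the desired signal is exactly explained by a finite filter. The abstract's point is that real acoustic channels violate this assumption, which motivates a filter that accounts for modeling error.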
On the Number of Integrators Needed for Dynamic Observer Error Linearization via Integrators
Kyungtak YU Nam-Hoon JO Jin Heon SEO

LETTER-Systems and Control

Vol: E89-A No: 3 Page(s): 817-821
In this letter, an illustrative example is given which shows that, in contrast to dynamic feedback linearization results, the number of integrators needed for dynamic observer error linearization using integrators cannot be bounded by a function of the dimension of the system and the number of outputs.
A High-Accuracy Passive 3D Measurement System Using Phase-Based Image Matching
Mohammad Abdul MUQUIT Takuma SHIBAHARA Takafumi AOKI

PAPER-Image/Vision Processing

Vol: E89-A No: 3 Page(s): 686-697
This paper presents a high-accuracy 3D (three-dimensional) measurement system using multi-camera passive stereo vision to reconstruct 3D surfaces of free-form objects. The proposed system is based on an efficient stereo correspondence technique, which consists of (i) coarse-to-fine correspondence search, and (ii) outlier detection and correction, both employing phase-based image matching. The proposed sub-pixel correspondence search technique contributes to dense reconstruction of arbitrary-shaped 3D surfaces with high accuracy. The outlier detection and correction technique contributes to high reliability of the reconstructed 3D points. Through a set of experiments, we show that the proposed system measures 3D surfaces of objects with sub-mm accuracy. We also demonstrate high-quality dense 3D reconstruction of a human face as a typical example of a free-form object. The result suggests that our approach could be used in many computer vision applications.
DSRED: A New Queue Management Scheme for the Next Generation Internet
Bing ZHENG Mohammed ATIQUZZAMAN

PAPER-Internet

Vol: E89-B No: 3 Page(s): 764-774
Random Early Detection (RED), an active queue management scheme, has been recommended by the Internet Engineering Task Force (IETF) for next-generation routers. RED suffers from a number of performance problems, such as low throughput and large delay/jitter, and it induces instability in networks. Many previous attempts to improve the performance of RED have been based on optimizing the values of the RED parameters. However, results have shown that such optimizations yield only limited improvement in performance. In this paper, we propose Double Slope RED (DSRED), a new active queue management scheme that improves on the performance of RED. The proposed scheme is based on dynamically changing the slope of the packet drop probability curve as a function of the level of congestion in the buffer. Results show that the proposed scheme performs better than the original RED.
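The idea of a drop probability curve whose slope changes with congestion can be sketched as a piecewise-linear function with two segments. The parameterization below (`min_th`, `mid_th`, `max_th`, `p_mid`) and its example values are assumptions chosen for illustration and are not necessarily the parameterization used in the paper.

```python
def double_slope_drop_probability(avg_queue, min_th, mid_th, max_th, p_mid):
    """Two-segment drop probability as a function of average queue length.

    Below min_th nothing is dropped. Between min_th and mid_th the curve
    rises gently to p_mid. Between mid_th and max_th it rises more steeply
    to 1. At or above max_th every packet is dropped.
    """
    if avg_queue < min_th:
        return 0.0
    if avg_queue < mid_th:
        return p_mid * (avg_queue - min_th) / (mid_th - min_th)
    if avg_queue < max_th:
        return p_mid + (1.0 - p_mid) * (avg_queue - mid_th) / (max_th - mid_th)
    return 1.0


# Hypothetical thresholds, in packets.
params = dict(min_th=20, mid_th=50, max_th=80, p_mid=0.1)
for q in (10, 35, 50, 65, 80):
    print(q, double_slope_drop_probability(q, **params))
```

With these values the slope in the upper segment is nine times that of the lower segment, so dropping stays light under mild congestion and becomes aggressive as the buffer fills.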
Theoretical Limits on Sequences with Ear Zero/Low Correlation Zones
Fanxin ZENG

LETTER-Fundamental Theories for Communications

Vol: E89-B No: 3 Page(s): 949-951
Sequences with ear zero correlation zones (EZCZs) are employed to suppress inter-symbol interference (ISI) and inter-user interference (IUI) in wireless communications. Theoretical limits on the correlation functions of such sequences are investigated, and lower bounds on the relations among sequence length, the width of the ear zero/low correlation zones (EZCZs/ELCZs), and family size are derived and presented. These bounds play an important role in assessing the performance of such sequences.
Generating F0 Contours by Statistical Manipulation of Natural F0 Shapes
Takashi SAITO

PAPER-Speech Analysis

Vol: E89-D No: 3 Page(s): 1100-1106
This paper describes a method of generating F0 contours from natural F0 segmental shapes for speech synthesis. The extracted shapes of the F0 units are basically held invariant by eliminating any averaging operations in the analysis phase and by minimizing modification operations in the synthesis phase. The use of natural F0 shapes has great potential to cover a wide variety of speaking styles with the same framework, including not only read-aloud speech, but also dialogues and emotional speech. A linear-regression statistical model is used to "manipulate" the stored raw F0 shapes to build them up into a sentential F0 contour. Through experimental evaluations, the proposed model is shown to provide stable and robust F0 contour prediction for various speakers. By using this model, linguistically derived information about a sentence can be directly mapped, in a purely data-driven manner, to acoustic F0 values of the sentential intonation contour for a given target speaker.
A Development of Circuit Emulation System on TDM over Ethernet Comprising OAM and Protection Function
Akihiko TANAKA Atsushi IWAMURA Masahiko MIZUTANI Yoshihiro ASHI

PAPER

Vol: E89-B No: 3 Page(s): 668-674
Ethernet is widely used and has been adopted in access and metro networks, both for new native Ethernet service applications and for its economic advantages. Apart from these native Ethernet applications, attention is focused on an encapsulation technology for transporting legacy services over Ethernet, i.e., TDM over Ethernet. To apply it to carrier networks, Quality of Service (QoS) requirements must be met, and consideration of operation, administration and maintenance (OAM) aspects is indispensable. Furthermore, a protection function must be applied to the networks for higher reliability. We have studied a method of encapsulating TDM signals for a circuit emulator that accommodates TDM signals over Ethernet. In addition, the OAM mechanism and the protection function are studied. This paper presents the frame format and the details of the OAM mechanism and the protection function, and introduces a circuit developed for adapting TDM over Ethernet.
Improving Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics in Noisy Environments Using Multi-Template Models
Randy GOMEZ Akinobu LEE Tomoki TODA Hiroshi SARUWATARI Kiyohiro SHIKANO

PAPER-Speech Recognition

Vol: E89-D No: 3 Page(s): 998-1005
This paper describes a method of multi-template unsupervised speaker adaptation based on HMM-Sufficient Statistics that improves adaptation performance while keeping the adaptation time within a few seconds using just one arbitrary utterance. The adaptation scheme is composed of two main processes. The first is done offline and involves training multiple class-dependent acoustic models and creating speakers' HMM-Sufficient Statistics based on gender and age. The second is performed online, where adaptation begins using a single utterance of a test speaker. From this utterance, the system classifies the speaker's class and then selects the N-best neighbor speakers close to the utterance using Gaussian Mixture Models (GMM). The template model of the classified speaker's class is then adopted as the base model. From this template model, the adapted model is rapidly constructed using the N-best neighbor speakers' HMM-Sufficient Statistics. Experiments are performed in noisy conditions with office, crowd, booth, and car noise at 20 dB, 15 dB and 10 dB SNR. The proposed multi-template method achieved an 89.5% word accuracy rate, compared with 88.1% for the conventional single-template method and a baseline recognition rate of 86.4% without adaptation. Experiments using Vocal Tract Length Normalization (VTLN) and supervised Maximum Likelihood Linear Regression (MLLR) are also compared.
Comparative Study of Speaker Identification Methods: dPLRM, SVM and GMM
Tomoko MATSUI Kunio TANABE

PAPER-Speaker Recognition

Vol: E89-D No: 3 Page(s): 1066-1073
The performance of three text-independent speaker identification methods, based on the dual Penalized Logistic Regression Machine (dPLRM), the Support Vector Machine (SVM) and the Gaussian Mixture Model (GMM), is compared in experiments with 10 male speakers. The methods are compared on speech data collected over a period of 13 months in 6 utterance sessions, of which the earlier 3 sessions were used to obtain training data consisting of 12-second utterances. Comparisons are made between Mel-frequency cepstrum (MFC) data and log-power spectrum data, and between training data from a single session and from multiple sessions. It is shown that dPLRM with the log-power spectrum data is competitive with the SVM and GMM methods with MFC data when trained on the combined data collected in the earlier three sessions. dPLRM outperforms the GMM method, especially as the amount of training data becomes smaller. Some of these findings have already been reported in [1]-[3].
PS-ZCPA Based Feature Extraction with Auditory Masking, Modulation Enhancement and Noise Reduction for Robust ASR
Muhammad GHULAM Takashi FUKUDA Kouichi KATSURADA Junsei HORIKAWA Tsuneo NITTA

PAPER-Speech Recognition

Vol: E89-D No: 3 Page(s): 1015-1023
A pitch-synchronous (PS) auditory feature extraction method based on ZCPA (Zero-Crossings Peak-Amplitudes) was proposed previously and was shown to be more robust than conventional ZCPA- and MFCC-based features. In this paper, a non-linear adaptive threshold adjustment procedure is first introduced into the PS-ZCPA method to obtain optimal results in noisy conditions with different signal-to-noise ratios (SNRs). Next, auditory masking, a well-known phenomenon of auditory perception, and modulation enhancement, which simulates the strong relationship between modulation spectrums and the intelligibility of speech, are embedded into the PS-ZCPA method. Finally, a Wiener filter based noise reduction procedure is integrated into the method to make it more noise-robust, and the performance is evaluated against ETSI ES202 (WI008), which is a standard front-end for distributed speech recognition. All the experiments were carried out on the Aurora-2J database. The experimental results demonstrated that embedding auditory masking improved the performance of the PS-ZCPA method, and that modulation enhancement improved it slightly. The PS-ZCPA method with Wiener filter based noise reduction also performed better than ETSI ES202 (WI008).
A Style Adaptation Technique for Speech Synthesis Using HSMM and Suprasegmental Features
Makoto TACHIBANA Junichi YAMAGISHI Takashi MASUKO Takao KOBAYASHI

PAPER-Speech Synthesis

Vol: E89-D No: 3 Page(s): 1092-1099
This paper proposes a technique for synthesizing speech with a desired speaking style and/or emotional expression, based on model adaptation in an HMM-based speech synthesis framework. Speaking styles and emotional expressions are characterized by many segmental and suprasegmental features in both spectral and prosodic features. Therefore, it is essential to take account of these features in the model adaptation. The proposed technique, called style adaptation, deals with this issue. First, the maximum likelihood linear regression (MLLR) algorithm, based on a hidden semi-Markov model (HSMM) framework, is presented to provide a mathematically rigorous and robust adaptation of state duration and to adapt both the spectral and prosodic features. Then, a novel tying method for the regression matrices of the MLLR algorithm is presented to allow the incorporation of both the segmental and suprasegmental speech features into the style adaptation. The proposed tying method uses regression class trees with contextual information. From the results of several subjective tests, we show that these techniques can perform style adaptation while maintaining the naturalness of the synthetic speech.
Multi-Ported Register File for Reducing the Impact of PVT Variation
Yuuichirou IKEDA Masaya SUMITA Makoto NAGATA

PAPER-Signal Integrity and Variability

Vol: E89-C No: 3 Page(s): 356-363
We have developed a 32-bit, 32-word register file with 9 read ports and 7 write ports. This register file has several circuits and techniques for reducing the impact of so-called PVT variation: process variation, which is marked in recent process technologies, voltage variation, and temperature variation. We describe these circuits and techniques in detail, and confirm their effects by simulation and measurement of a test chip.
Analysis of Reactance Oscillators Having Multi-Mode Oscillations
Yoshihiro YAMAGAMI Yoshifumi NISHIO Akio USHIDA

PAPER-Circuit Theory

Vol: E89-A No: 3 Page(s): 764-771
We consider oscillators consisting of a reactance circuit and a negative resistor. They may have multi-mode oscillations around the anti-resonant frequencies of the reactance circuit. Oscillators of this kind can be easily synthesized by setting the resonant and anti-resonant frequencies of the reactance circuits. However, it is not easy to analyze the oscillation phenomena, because they have multiple oscillations that depend on the initial guesses. In this paper, we propose a Spice-oriented solution algorithm combining the harmonic balance method with the Newton homotopy method, which can find the multiple solutions on the homotopy paths. In our analysis, the determining equations from the harmonic balance method are given by modified equivalent circuit models of "DC," "Cosine" and "Sine" circuits. The modified circuits can be solved by a simulator, STC (solution curve tracing circuit), where the multiple oscillations are found by the transient analysis of Spice. Thus, we need not derive the troublesome circuit equations, nor the mathematical transformations to get the determining equations. This makes the solution algorithms much simpler.
