Muhammad GHULAM Takashi FUKUDA Kouichi KATSURADA Junsei HORIKAWA Tsuneo NITTA
A pitch-synchronous (PS) auditory feature extraction method based on ZCPA (Zero-Crossings Peak-Amplitudes) was proposed previously and was shown to be more robust than conventional ZCPA and MFCC-based features. In this paper, a non-linear adaptive threshold adjustment procedure is first introduced into the PS-ZCPA method to obtain optimal results in noisy conditions with different signal-to-noise ratios (SNRs). Next, auditory masking, a well-known phenomenon of auditory perception, and modulation enhancement, which reflects the strong relationship between modulation spectra and speech intelligibility, are embedded into the PS-ZCPA method. Finally, a Wiener filter based noise reduction procedure is integrated into the method to make it more noise-robust, and the performance is evaluated against ETSI ES202 (WI008), a standard front-end for distributed speech recognition. All experiments were carried out on the Aurora-2J database. The experimental results show that embedding auditory masking improves the performance of the PS-ZCPA method, and that modulation enhancement improves it slightly. The PS-ZCPA method with Wiener filter based noise reduction also performed better than ETSI ES202 (WI008).
Gabriel Porto VILLARDI Giuseppe Thadeu Freitas de ABREU Ryuji KOHNO
The application of Orthogonal Space-Time Block Codes (O-STBC) as the encoding scheme in the presence of "non-quasi-static" fading is considered. A simple and efficient adaptive channel estimation method, based on interpolating the estimates acquired at the preamble and postamble of framed blocks of information, is developed. Moreover, the proposed method is proven, both theoretically and by simulations, to outperform the alternative of channel tracking despite its significantly lower complexity.
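The interpolation idea can be illustrated with a minimal sketch. It assumes plain linear interpolation between the two estimates and equally spaced blocks; the abstract does not specify the interpolation rule, and the function and argument names are hypothetical.

```python
def interpolate_channel(h_pre, h_post, n_blocks):
    """Linearly interpolate per-antenna channel estimates across a frame.

    h_pre, h_post: lists of complex channel estimates taken at the preamble
    and postamble (hypothetical inputs, one entry per antenna path).
    n_blocks: number of code blocks lying between the two estimates.
    Returns one list of interpolated estimates per block.
    """
    estimates = []
    for k in range(1, n_blocks + 1):
        # Fractional position of block k between preamble (0) and postamble (1).
        t = k / (n_blocks + 1)
        estimates.append([(1 - t) * a + t * b for a, b in zip(h_pre, h_post)])
    return estimates


if __name__ == "__main__":
    for block in interpolate_channel([1 + 0j, 2j], [3 + 0j, 0j], 3):
        print(block)
```

Each block is then decoded with its own interpolated estimate instead of a single estimate held fixed over the whole frame.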
Sakriani SAKTI Satoshi NAKAMURA Konstantin MARKOV
Over the last decade, the Bayesian approach has increased in popularity in many application areas. It uses a probabilistic framework which encodes our beliefs or actions in situations of uncertainty. Information from several models can also be combined within the Bayesian framework to achieve better inference and to better account for modeling uncertainty. The approach we adopt here uses the benefits of the Bayesian framework to improve acoustic model precision in speech recognition systems by modeling a wider-than-triphone context, which is approximated using several less context-dependent models. Such a composition was developed to avoid the crucial problem of limited training data and to reduce the model complexity. To improve model reliability in the presence of unseen contexts and limited training data, flooring and smoothing techniques are applied. Experimental results show that the proposed Bayesian pentaphone model improves word accuracy in comparison with the standard triphone model.
Seiichi NAKAGAWA Wei ZHANG Mitsuo TAKAHASHI
We have presented a new text-independent/text-prompted speaker recognition method that combines a speaker-specific Gaussian Mixture Model (GMM) with a syllable-based HMM adapted by MLLR or MAP. In this paper, the robustness of this method to changes in speaking style is evaluated. Speaker identification experiments were conducted using the NTT database, which consists of sentences uttered at three speed modes (normal, fast and slow) by 35 Japanese speakers (22 males and 13 females) in five sessions over ten months. Each speaker provided only 5 training utterances (about 20 seconds in total). The combination method reduced the identification error rate by about 50%. We obtained an accuracy of 98.8% for text-independent speaker identification over the three speaking style modes (normal, fast, slow) using a short test utterance (about 4 seconds). In particular, we obtained an accuracy of 99.4% for the normal speaking mode. This result is superior to that of conventional methods on the same database. We show that this result stems from the compensatory effect between the speaker-specific GMM and the speaker-adapted syllable-based HMM.
Yuuichirou IKEDA Masaya SUMITA Makoto NAGATA
We have developed a 32-bit, 32-word register file with 9 read ports and 7 write ports. This register file incorporates several circuits and techniques for reducing the impact of process variation, which is pronounced in recent process technologies, as well as voltage variation and temperature variation, collectively called PVT variation. We describe these circuits and techniques in detail, and confirm their effects by simulation and by measurement of a test chip.
Yoichi YUYAMA Akira TSUCHIYA Kazutoshi KOBAYASHI Hidetoshi ONODERA
In this paper, we propose alternate self-shielding to remove critical transitions on on-chip global interconnects. Our proposed method alternates shield and signal wires cycle by cycle. Conventional self-shielding methods need additional wires to remove critical transitions by encoding. The proposed alternate self-shielding, however, requires no additional wires. We evaluate our method by simulating signal transmission with a circuit simulator. The results show that the proposed method achieves a bit rate 10% to 75% higher than the other methods.
A novel class of microstrip bandpass filter is configured using impedance transformers and an improved stepped impedance resonator (SIR). This SIR is composed of a central narrow strip section with an aperture in the ground plane and two wide strip sections at the two sides. This low-high-low SIR has a promising capability of achieving an extremely large ratio of the first two resonant frequencies, which enables the design of a bandpass filter with an ultra-broad stopband. The two quarter-wavelength transformers with low and high impedances, referred to as impedance and admittance inverters, are modeled and utilized as alternative types of inductive and capacitive coupling elements with highly tightened coupling degrees for wideband filter design. After extensive investigation of the two transformers and the proposed SIR, two novel bandpass filters are constructed, designed and implemented. Two sets of predicted and measured frequency responses over a wide frequency range both quantitatively exhibit several attractive features, such as an ultra-broad stopband with deep rejection and a broadened dominant passband with low insertion loss.
The objective of this paper is to present a decision support system which uses a computer-based procedure to detect tumor blocks or lesions in digitized medical images. The authors developed a simple method with low computational effort to detect tumors in T2-weighted Magnetic Resonance Imaging (MRI) brain images, focusing on the connection between the spatial pixel value and tumor properties from four different perspectives: 1) cases having minuscule differences between two images, using a fixed block-based method, 2) tumor shape and size, using the edge and binary images, 3) tumor properties based on texture values, using the spatial pixel intensity distribution controlled by a global discriminant value, and 4) the occurrence of content-specific tumor pixels in thresholded images. Measurements were performed on the following medical datasets: 1) images taken at different time intervals, and 2) images of different brain diseases on single and multiple slices. Experimental results show that the proposed technique incurs a smaller overall error than other proposed methods. In particular, the proposed method reduces both false-alarm and missed-alarm errors, which demonstrates its effectiveness. In this paper, we also present a prototype system, known as PCB, to evaluate the proposed methods in actual experiments, comparing detection accuracy and system performance.
Dianjun CHEN Takeshi HASHIMOTO
We propose two sequence design schemes for an overloaded space-time spreading system with multiple antennas. The first scheme is for a system in which the amplitudes of user signals need not be adjusted, and it provides tradeoffs between user capacity and diversity order. This scheme has a certain similarity to time-sharing, but its performance is further improved by time diversity. The second scheme achieves full diversity order by varying user signal amplitudes. The diversity orders of the respective schemes are proved theoretically and their performance is demonstrated by simulation.
Modern digital systems design requires us to explore a large and complex design space to find the best configuration that satisfies the design requirements. Such exploration requires a sound representation of the design space from which design candidates are efficiently generated, each of which is then evaluated. This paper proposes a plan-generation-evaluation framework which supports the complete process of such design space exploration. The plan phase constitutes a design space of all possible design alternatives by means of a formally defined representation scheme, the attributed AND-OR graph. The generation phase generates a set of candidates by algorithmic pruning of the design space in the attributed AND-OR graph with respect to design requirements as well as architectural constraints. Finally, the evaluation phase measures the performance of the design candidates in the pruned graph to select the best one. A complete process of cache design is presented as an example to show the effectiveness of the proposed framework.
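As a rough illustration of the three phases, the sketch below enumerates candidates from a small AND-OR tree, filters them against one requirement, and selects the cheapest survivor. The dictionary encoding, the single `cost` attribute, and the cache options are all hypothetical, and the filtering is applied to the enumerated candidates rather than to the graph itself, which is a simplification of the pruning described in the abstract.

```python
from itertools import product


def candidates(node):
    """Enumerate (leaf_names, total_cost) pairs for an AND-OR tree.

    Hypothetical encoding: {"leaf": name, "cost": c},
    {"and": [children]} or {"or": [children]}.
    """
    if "leaf" in node:
        return [((node["leaf"],), node["cost"])]
    children = node["and"] if "and" in node else node["or"]
    child_sets = [candidates(child) for child in children]
    if "or" in node:
        # OR node: exactly one alternative is chosen.
        return [cand for cands in child_sets for cand in cands]
    # AND node: one candidate from every child is combined.
    combined = []
    for combo in product(*child_sets):
        names = tuple(name for cand in combo for name in cand[0])
        combined.append((names, sum(cand[1] for cand in combo)))
    return combined


def explore(root, max_cost):
    """Generation: keep candidates meeting the requirement. Evaluation: pick the cheapest."""
    feasible = [cand for cand in candidates(root) if cand[1] <= max_cost]
    return min(feasible, key=lambda cand: cand[1]) if feasible else None


# Plan: a toy cache design space (size AND associativity, each an OR of options).
cache = {"and": [
    {"or": [{"leaf": "16K", "cost": 4}, {"leaf": "32K", "cost": 7}]},
    {"or": [{"leaf": "direct", "cost": 1}, {"leaf": "2-way", "cost": 2}]},
]}

if __name__ == "__main__":
    print(explore(cache, max_cost=6))
```

A real attributed graph would carry several attributes per node and prune subgraphs before enumeration, so that infeasible combinations are never generated.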
Koichi ITO Masahiko HIRATSUKA Takafumi AOKI Tatsuo HIGUCHI
This paper presents a shortest path search algorithm using a model of excitable reaction-diffusion dynamics. In our previous work, we proposed a framework called the Digital Reaction-Diffusion System (DRDS), a model of a discrete-time discrete-space reaction-diffusion system useful for nonlinear signal processing tasks. In this paper, we design a special DRDS, called an "excitable DRDS," which emulates excitable reaction-diffusion dynamics and produces traveling waves. We also demonstrate an application of the excitable DRDS to the shortest path search problem defined on a two-dimensional (2-D) space with arbitrary boundary conditions.
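The link between traveling waves and shortest paths can be sketched with an ordinary breadth-first wavefront on a grid. This is not the excitable DRDS itself: it only mimics the property that a wave expanding at unit speed reaches each cell first along a shortest route. The grid encoding (`#` for obstacles) and the function name are assumptions made for illustration.

```python
from collections import deque


def wavefront_shortest_path(grid, start, goal):
    """Return a shortest 4-connected path as a list of (row, col), or None.

    grid: list of equal-length strings where '#' marks an obstacle.
    """
    rows, cols = len(grid), len(grid[0])
    arrived_from = {start: None}  # cell -> neighbour the wave arrived from
    frontier = deque([start])
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            break
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and nxt not in arrived_from:
                arrived_from[nxt] = cell  # the wave excites each cell only once
                frontier.append(nxt)
    if goal not in arrived_from:
        return None
    # Trace the wave back from the goal to the source.
    path, cell = [], goal
    while cell is not None:
        path.append(cell)
        cell = arrived_from[cell]
    return path[::-1]


if __name__ == "__main__":
    print(wavefront_shortest_path(["..#", ".#.", "..."], (0, 0), (2, 2)))
```

Marking each cell as visited once plays the role of the refractory state in an excitable medium, which prevents the wave from travelling backwards.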
Tetsuki TANIGUCHI Hoang Huy PHAM Nam Xuan TRAN Yoshio KARASAWA
This paper presents a simple method to determine the weights of single-carrier multiple-input multiple-output (MIMO) broadband communication systems that adopt a tapped delay line (TDL) structure on the receiver side for effective communication in frequency-selective fading (FSF) environments. First, assuming perfect knowledge of the channel matrix at both arrays, an iterative design method for the transmitter and receiver weights is proposed. In this approach, the two sets of weights are determined alternately to maximize the signal-to-interference-plus-noise ratio (SINR) by fixing the weights of one side while optimizing the other, and this operation is repeated until the SINR converges. Next, considering the case of an uninformed transmitter, the maximum SINR design method for MIMO systems is extended to a space-time block coding (STBC) scheme working under FSF. Computer simulations demonstrate that the proposed schemes achieve higher SINR than the conventional method with a delay-less structure, particularly for fading with long duration.
Seokho YOON Suk Chan KIM Sun Yong KIM
Recently, a novel detector was proposed by the authors for code acquisition in non-Gaussian impulsive channels [3], which dramatically outperforms the conventional squared-sum detector; however, it requires exact knowledge of the non-Gaussian noise dispersion. In this paper, a robust detector is proposed, which employs the signs and ranks of the received signal samples, instead of their actual values, and so does not require knowledge of the non-Gaussian noise dispersion. The acquisition performance of the proposed detector is compared with that of the detector of [3] in terms of the mean acquisition time. The simulation results show that the proposed scheme is not only robust to deviations from the true value of the non-Gaussian noise dispersion, but also has comparable performance to that of the scheme of [3] using exact knowledge of the non-Gaussian noise dispersion.
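The benefit of using signs and ranks rather than raw values can be illustrated with a Wilcoxon-type signed-rank statistic. This is a generic sketch of the sign-and-rank idea, not the detector proposed in the paper; the function name is hypothetical and ties among magnitudes are assumed not to occur.

```python
def signed_rank_statistic(samples):
    """Sum of sign(x_i) * rank(|x_i|), with ranks starting at 1.

    Ties in magnitude are not handled specially (continuous noise assumed).
    """
    order = sorted(range(len(samples)), key=lambda i: abs(samples[i]))
    stat = 0
    for rank, i in enumerate(order, start=1):
        if samples[i] > 0:
            stat += rank
        elif samples[i] < 0:
            stat -= rank
    return stat


if __name__ == "__main__":
    # Making the impulsive outlier ten times larger leaves the statistic unchanged.
    print(signed_rank_statistic([0.5, -0.2, 3.0, -100.0]))
    print(signed_rank_statistic([0.5, -0.2, 3.0, -1000.0]))
```

Because an impulsive sample can contribute at most the largest rank, the statistic does not depend on the scale of the noise, which is consistent with the detector not needing the noise dispersion.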
Yi QIAN Rose Qingyang HU Catherine ROSENBERG
There are many system proposals for satellite-based broadband communications that promise high capacity and ease of access. Many of these proposals require advanced switching technology and signal processing on board the satellite(s). One solution is based on a geo-synchronous (GEO) satellite system equipped with on-board processing and on-board switching. An important feature of this system is that it allows for a maximum number of simultaneous users, which requires effective medium access control (MAC) layer protocols for connection admission control (CAC) and bandwidth on demand (BoD) algorithms. In this paper, an integrated CAC and BoD algorithm is proposed for a broadband satellite communication system with heterogeneous traffic. A detailed modeling and simulation approach is presented for performance evaluation of the integrated CAC and BoD algorithm based on heterogeneous traffic types. The proposed CAC and BoD scheme is shown to utilize available bandwidth efficiently, to achieve high throughput, and to maintain good Grade of Service (GoS) for all the traffic types. The end-to-end delay for real-time traffic in the system falls well within ITU's Quality of Service (QoS) specification for GEO-based satellite systems.
Hiroshi YOSHIOKA Yushi SHIRATO Kazuji WATANABE
We propose a novel simplified Viterbi equalizer for high symbol rate FWA (Fixed Wireless Access) systems carrying 64QAM signals. Reduced complexity and improved performance are achieved by adopting two approaches. The first is to reduce the number of survivor paths, taking advantage of the large D/U common in LOS (line of sight) communications. The second is to use a multi-stage process to generate desired signal replicas based on their likelihoods. Computer simulations confirm that the proposed replica generation method offers a performance improvement of about 1 dB and that the proposed Viterbi equalizer offers reduced complexity with no performance penalty compared to a full Viterbi equalizer.
M. Shahidur RAHMAN Tetsuya SHIMAMURA
A new system identification based method is proposed for accurate estimation of vocal tract parameters. A problem often encountered in conventional linear prediction analysis is caused by the harmonic structure of the excitation source of voiced speech. This harmonic characteristic is coupled with the estimation of the autoregressive (AR) coefficients, which makes it difficult to estimate the vocal tract filter. This paper models the effective voice source from the residual obtained through covariance analysis in the first pass, which is then used as input to the second-pass least-squares analysis. A better source-filter separation is thus achieved. The formant frequencies and corresponding bandwidths obtained using the proposed method for synthetic vowels are found to be more accurate, by a factor of more than three (in percent), than those of the conventional method. Since the source characteristic is taken into account, local variations due to the positioning of the analysis window are reduced significantly. The validity of the proposed method is also examined by inspecting the spectra obtained from natural vowel sounds uttered by a high-pitched female speaker.
Shinichi NAKAJIMA Sumio WATANABE
In unidentifiable models, Bayes estimation has an advantage in generalization performance over maximum likelihood estimation. However, accurate approximation of the posterior distribution requires huge computational costs. In this paper, we consider an alternative approximation method, which we call a subspace Bayes approach. A subspace Bayes approach is an empirical Bayes approach in which some of the parameters are regarded as hyperparameters. Consequently, in some three-layer models, this approach requires much lower computational cost than Markov chain Monte Carlo methods. We show that, in three-layer linear neural networks, a subspace Bayes approach is asymptotically equivalent to a positive-part James-Stein type shrinkage estimation, and theoretically clarify its generalization error and training error. We also discuss its domination over maximum likelihood estimation and its relation to the variational Bayes approach.
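For reference, the classical positive-part James-Stein estimator of a d-dimensional Gaussian mean (d at least 3) can be sketched as follows. This is the textbook form, given only to show what "positive-part shrinkage" means; the "James-Stein type" estimator derived in the paper for three-layer linear networks may differ in its details, and the function name is hypothetical.

```python
def positive_part_james_stein(x, sigma2=1.0):
    """Shrink the observation x toward the origin.

    x: list of floats, a single observation of a d-dimensional Gaussian
       with unknown mean and known variance sigma2 per coordinate.
    The factor is max(0, 1 - (d - 2) * sigma2 / ||x||^2), so it never
    becomes negative (the "positive part").
    """
    d = len(x)
    norm2 = sum(v * v for v in x)
    if norm2 == 0.0:
        return [0.0] * d
    shrink = max(0.0, 1.0 - (d - 2) * sigma2 / norm2)
    return [shrink * v for v in x]


if __name__ == "__main__":
    print(positive_part_james_stein([3.0, 4.0, 0.0]))   # mild shrinkage
    print(positive_part_james_stein([0.1, 0.1, 0.1]))   # shrunk all the way to zero
```

Observations with small norm are set exactly to zero, while observations far from the origin are left almost unchanged, which is the behaviour that distinguishes this estimator from maximum likelihood.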
Yasuo SATO Shuji HAMADA Toshiyuki MAEDA Atsuo TAKATORI Seiji KAJIHARA
In this paper, we introduce a statistical quality model for delay testing that reflects fabrication process quality, design delay margin, and test timing accuracy. The model provides a measure that predicts the level of chip defects that cause delay failures, including marginal small-delay defects. We can therefore use the model to generate test vectors that are effective in terms of both testing cost and chip quality. The results of experiments using ISCAS89 benchmark data and some large industrial design data reflect various characteristics of our statistical delay quality model.
We propose a novel blind watermarking algorithm, called XFuseMark. Based on multiresolution fusion principles, it hides a small, visually meaningful grayscale logo in a host image, instead of a random-noise-like sequence, and extracts a recognizable version of the embedded logo at the receiving end even without reference to the original host data. XFuseMark is not only secure, i.e., only authorized users holding a private key are able to extract the logo, but also robust against noise addition and image compression. Experiments verify the practical performance of XFuseMark.
Yangxing LIU Satoshi GOTO Takeshi IKENAGA
Text detection in color images has become an active research area in the past few decades. In this paper, we present a novel approach to accurately detect text in color images, possibly with a complex background. The proposed algorithm is based on the combination of connected component analysis and texture feature analysis of unknown text region contours. First, we use an elaborate color image edge detection algorithm to extract all possible text edge pixels. Connected component analysis is performed on these edge pixels to detect the external contour and possible internal contours of potential text regions. The gradient and geometrical characteristics of each region contour are carefully examined to construct candidate text regions and to classify some regions as non-text. Then each candidate text region is verified with texture features derived from the wavelet domain. Finally, the expectation-maximization algorithm is used to binarize each text region to prepare data for recognition. In contrast to previous approaches, our algorithm combines the efficiency of connected component based methods with the robustness of texture based analysis. Experimental results show that the proposed algorithm is robust in text detection with respect to different character sizes, orientations, colors and languages, and provides reliable text binarization results.