In the last three decades of the 20th century, research in speech recognition was carried out intensively worldwide, spurred on by advances in signal processing, algorithms, architectures, and hardware. Recognition systems have been developed for a wide variety of applications, ranging from small-vocabulary keyword recognition over dial-up telephone lines, to medium-size-vocabulary voice-interactive command and control systems for business automation, to large-vocabulary speech dictation, spontaneous speech understanding, and limited-domain speech translation. Although we have witnessed many new technological promises, we have also encountered a number of practical limitations that hinder widespread deployment of applications and services. On the one hand, fast progress has been observed in statistical speech and language modeling. On the other hand, only spotty successes have been reported in applying knowledge sources in acoustics, speech, and language science to improve speech recognition performance and robustness to adverse conditions. In this paper we review some key advances in several areas of speech recognition. A bottom-up detection framework is also proposed to facilitate worldwide research collaboration on incorporating technology advances in both statistical modeling and knowledge integration, with the aim of going beyond the current limitations of speech recognition and benefiting society in the 21st century.
Yoshinari KAMAKURA Hironori RYOUKE Kenji TANIGUCHI
Electron transport in bulk Si and MOSFET inversion layers is studied using an ensemble Monte Carlo (EMC) technique coupled with the molecular dynamics (MD) method. The Coulomb interactions among point charges (electrons and negative ions) are directly taken into account in the simulation. It is demonstrated that the static screening of Coulomb interactions is correctly simulated by the EMC/MD method. Furthermore, we calculate the inversion layer mobility in Si MOSFETs, and mobility roll-off near the threshold voltage is observed by the present approach.
A transmit adaptive array requires the forward link channel state to evaluate the optimum transmit weights, and a feedback channel transports this channel state to the base station. Since the feedback information limits the transmission rate of the reverse link traffic, it is necessary to keep the number of feedback bits to a minimum. This paper presents a system in which the N transmit antennas are extended to 2N transmit antennas while the feedback channel remains limited to that of the N-transmit-antenna system. The additional antennas can provide additional diversity gain but require a higher feedback bit rate. With the feedback channel limited, the quantization error of the feedback information increases, since fewer feedback bits are assigned to each antenna. To overcome the limited feedback rate, this paper proposes transmit antenna selection schemes that effectively use the limited feedback bits, reduce the computational complexity at the mobile station, and ultimately achieve diversity gain. System performance is investigated for the case of N=4 for the various antenna selection schemes on both flat fading and multipath fading channels.
New algorithms for soft-decision and hard-decision maximum likelihood decoding (MLD) of binary linear block codes are proposed. It is widely known that both kinds of MLD can be regarded as integer programming problems with binary arithmetic conditions. Recently, Conti and Traverso proposed an efficient algorithm that uses Gröbner bases to solve integer programming problems with ordinary integer arithmetic conditions. In this paper, the Conti-Traverso algorithm is extended to solve integer programming problems with modulo arithmetic conditions. We also show how to transform soft-decision and hard-decision MLD into integer programming problems to which the extended Conti-Traverso algorithm is applicable.
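To make the integer programming view of hard-decision MLD concrete, the following minimal sketch minimizes the Hamming weight of an error pattern e subject to the modulo-2 constraint H e = s, where s is the syndrome of the received word. It uses exhaustive search over a (7,4) Hamming code chosen purely for illustration; it is not the Gröbner basis method of the paper, and the names `H`, `syndrome`, and `hard_decision_mld` are assumptions introduced here.

```python
from itertools import product

# Parity-check matrix of the (7,4) Hamming code (illustrative choice):
# column i is the binary representation of i, for i = 1..7.
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def syndrome(word):
    """Compute H * word over GF(2)."""
    return tuple(sum(h * w for h, w in zip(row, word)) % 2 for row in H)

def hard_decision_mld(received):
    """Brute-force MLD: find the minimum-weight error e with H e = H r (mod 2)."""
    target = syndrome(received)
    candidates = (
        e for e in product((0, 1), repeat=len(received))
        if syndrome(e) == target
    )
    error = min(candidates, key=sum)  # objective: Hamming weight of e
    codeword = tuple((r + e) % 2 for r, e in zip(received, error))
    return codeword, error

# Codeword (1,1,1,0,0,0,0) with its last bit flipped by the channel.
decoded, error = hard_decision_mld((1, 1, 1, 0, 0, 0, 1))
print(decoded, error)
```

The search space grows as 2^n, so this only serves to show the objective and the modulo arithmetic constraint that an algebraic solver would have to handle.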
Konstantin MARKOV Satoshi NAKAMURA
In current HMM-based speech recognition systems, it is difficult to supplement acoustic spectrum features with additional information such as pitch, gender, or articulator positions. On the other hand, Bayesian Networks (BN) allow for easy combination of different continuous as well as discrete features by exploring conditional dependencies between them. However, the lack of efficient algorithms has limited their application in continuous speech recognition. In this paper we propose a new acoustic model, in which HMMs are used to model temporal speech characteristics and the state probability model is represented by a BN. In our experimental system based on the HMM/BN model, in addition to the speech observation variable, the state BN has two more (hidden) variables representing noise type and SNR value. Evaluation results on the AURORA2 database showed a 36.4% word error rate reduction for the closed noise test, which is comparable with other, much more complex systems utilizing effective adaptation and noise-robust methods.
Noriyuki MIURA Hirotaka KOMATSUBARA Marie MOCHIZUKI Hirokazu HAYASHI Koichi FUKUDA
In this paper, we propose a TCAD-driven hot carrier reduction methodology for the design of 3.3 V I/O pMOSFETs. Hot carrier reliability is a critical issue for surface channel I/O pMOSFETs that share a drain structure with core devices. For high-reliability devices, it is important to reduce both the drain avalanche and channel hot hole components. The drain structures are successfully optimized in a short time by applying TCAD local models. Considering the tradeoffs between hot carrier injection (HCI) and drive current (ION), the SDE/HALO of both core and I/O transistors can be optimized together to reduce process steps and/or photomasks.
Scott T. DUNHAM Pavel FASTENKO Zudian QIN Milan DIEBEL
In this work, we review our recent efforts to make effective use of atomistic calculations for the advancement of VLSI process simulation. We focus on three example applications: the behavior of implanted fluorine, arsenic diffusion and activation, and the impact of charge interactions on doping fluctuations.
Phu Chien NGUYEN Takao OCHI Masato AKAGI
This paper presents a method of temporal decomposition (TD) for line spectral frequency (LSF) parameters, called "Modified Restricted Temporal Decomposition" (MRTD), and its application to low rate speech coding. The LSF parameters have not been used for TD due to the stability problems in the linear predictive coding (LPC) model. To overcome this deficiency, a refinement process is applied to the event vectors in the proposed TD method to preserve their LSF ordering property. Meanwhile, the restricted second order TD model, where only two adjacent event functions can overlap and all event functions at any time sum up to one, is utilized to reduce the computational cost of TD. In addition, based on the geometric interpretation of TD the MRTD method enforces a new property on the event functions, named the "well-shapedness" property, to model the temporal structure of speech more effectively. This paper also proposes a method for speech coding at rates around 1.2 kbps based on STRAIGHT, a high quality speech analysis-synthesis method, using MRTD. In this speech coding method, MRTD based vector quantization is used for encoding spectral information of speech. Subjective test results indicate that the speech quality of the proposed speech coding method is close to that of the 4.8 kbps FS-1016 CELP coder.
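The restricted second-order TD model described above can be sketched as follows: between two adjacent events, each frame is a weighted sum of the two event vectors, with weights that sum to one. Because a convex combination of two strictly increasing vectors is again strictly increasing, ordered LSF event vectors yield ordered reconstructed frames. The event vectors and event function values below are hypothetical, and the sketch does not reproduce the refinement process or the "well-shapedness" constraint of MRTD.

```python
def reconstruct_segment(event_a, event_b, phi):
    """Reconstruct frames between two adjacent events.

    Each frame is phi(n) * event_a + (1 - phi(n)) * event_b, so the two
    overlapping event functions always sum to one.
    """
    frames = []
    for w in phi:
        if not 0.0 <= w <= 1.0:
            raise ValueError("event function values must lie in [0, 1]")
        frames.append([w * a + (1.0 - w) * b for a, b in zip(event_a, event_b)])
    return frames

def is_ordered(lsf):
    """Check the LSF ordering property (strictly increasing frequencies)."""
    return all(x < y for x, y in zip(lsf, lsf[1:]))

# Hypothetical 4th-order LSF event vectors (radians) and a decaying event function.
event_a = [0.3, 0.8, 1.4, 2.1]
event_b = [0.4, 0.7, 1.6, 2.4]
phi = [1.0, 0.75, 0.5, 0.25, 0.0]

frames = reconstruct_segment(event_a, event_b, phi)
print(all(is_ordered(f) for f in frames))
```

This is why preserving the ordering property on the event vectors alone is enough to keep every interpolated frame a valid LSF vector under this model.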
Nobuyoshi KIKUMA Mitoshi FUJIMOTO
This paper reviews the historical development of adaptive antennas in Japan. First, we look at basic adaptive algorithms. In the 1980s in particular, the following issues were of considerable concern: (a) behavior in the presence of coherent interference such as multipath waves or radar clutter, (b) signal degradation when the direction of arrival (DOA) of the desired signal differs from the DOA specified beforehand, in adaptive antennas that use the DOA of the desired signal as prior knowledge, and (c) performance of adaptive antennas when the desired signal and interference are broadband. Although adaptive algorithms have been developed and modified extensively in Japan, we refer in this paper only to the above-mentioned topics. Second, we turn to the implementation of adaptive antennas and advanced technologies. A large number of studies on these subjects have been carried out in Japan. In particular, we focus on the pioneering studies in Japan toward mobile communication applications. They include research on mobile radio propagation for adaptive antennas, calibration methods, and adaptive antennas for mobile terminals. We also refer to adaptive antenna technologies for advanced communication schemes such as CDMA, SDMA, and OFDM. Finally, we note some pilot products that were developed to verify the effect of adaptive antennas in practical environments. A few of the earliest such systems are introduced in this paper.
Young-Joo SUH Min-Sun KIM Young-Jae KIM
There is a growing demand for mobile networks to provide quality of service (QoS) to mobile users, since portable devices are becoming popular and more and more applications require real-time services. Providing QoS to mobile hosts is very difficult because of host mobility. The Resource ReSerVation Protocol (RSVP) establishes and maintains a reservation state to ensure a given QoS level between the sender and receiver. However, RSVP is designed for fixed networks and is thus inadequate in wireless mobile networking environments. In this paper, we propose a resource reservation protocol for mobile hosts in mobile networks. The proposed protocol extends RSVP by introducing RSVP agents in local networks to manage the reservations. The proposed protocol reduces packet delay, bandwidth overhead, and the number of RSVP messages needed to maintain reservation states. We examined the performance of the proposed protocol by simulation and observed improved performance over the existing protocols.
In this paper, we investigate the electron-hole energy states and energy gap in three-dimensional (3D) InAs/GaAs quantum rings and dots with different shapes under external magnetic fields. Our realistic model formulation includes: (i) the effective mass Hamiltonian in non-parabolic approximation for electrons, (ii) the effective mass Hamiltonian in parabolic approximation for holes, (iii) the position- and energy-dependent quasi-particle effective mass approximation for electrons, (iv) the finite hard wall confinement potential, and (v) the Ben Daniel-Duke boundary conditions. To solve the 3D nonlinear problem without any fitting parameters, we have applied the nonlinear iterative method to obtain self-consistent solutions. Due to the penetration of applied magnetic fields into torus ring region, for ellipsoidal- and rectangular-shaped quantum rings we find nonperiodical oscillations of the energy gap between the lowest electron and hole states as a function of external magnetic fields. The nonperiodical oscillation is different from 1D periodical argument and strongly dependent on structure shape and size. The result is useful to study magneto-optical properties of the nanoscale quantum rings and dots.
Hiroshi SARUWATARI Toshiya KAWAMURA Tsuyoki NISHIKAWA Kiyohiro SHIKANO
We propose a new algorithm for blind source separation (BSS), in which independent component analysis (ICA) and beamforming are combined to resolve the low-convergence problem through optimization in ICA. The proposed method consists of the following two parts: frequency-domain ICA with direction-of-arrival (DOA) estimation, and null beamforming based on the estimated DOA. The alternation of learning between ICA and beamforming can realize fast- and high-convergence optimization. The results of the signal separation experiments reveal that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method.
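As an illustration of the null beamforming step, the sketch below builds weights for a two-microphone array at a single frequency so that the array response vanishes in an estimated DOA. The far-field plane-wave model, the sign convention of the steering phase, the sound speed, and all function names are assumptions made for this example; the frequency-domain ICA stage and the alternation between the two stages are not shown.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed value

def steering_phase(freq_hz, spacing_m, doa_deg):
    """Phase factor at the second microphone for a plane wave from doa_deg."""
    delay = spacing_m * math.sin(math.radians(doa_deg)) / SPEED_OF_SOUND
    return cmath.exp(-2j * math.pi * freq_hz * delay)

def null_weights(freq_hz, spacing_m, null_doa_deg):
    """Weights (w1, w2) such that w1 + w2 * a2 = 0 in the null direction."""
    a2 = steering_phase(freq_hz, spacing_m, null_doa_deg)
    return (1.0 + 0j, -1.0 / a2)

def response(weights, freq_hz, spacing_m, doa_deg):
    """Array response w1 * 1 + w2 * a2(doa) for a unit-amplitude plane wave."""
    a2 = steering_phase(freq_hz, spacing_m, doa_deg)
    return weights[0] + weights[1] * a2

# Place a null toward an interferer estimated at +30 degrees (1 kHz, 4 cm spacing).
w = null_weights(1000.0, 0.04, 30.0)
print(abs(response(w, 1000.0, 0.04, 30.0)))   # close to zero
print(abs(response(w, 1000.0, 0.04, -30.0)))  # other source is passed
```

With two sources and two microphones, one such beamformer per source direction gives a separation matrix for each frequency bin, which is the kind of estimate the DOA information can supply.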
Sung Kyung KIM Meejoung KIM Chung Gu KANG
Emerging requirements for higher-rate data services and better spectrum efficiency are the main issues of third-generation mobile radio systems. In particular, a new concept of burst switching has been introduced for supporting packet data services in CDMA-based wireless systems. In a burst switching system, radio resources are allocated to users for the duration of a data burst, which is a series of packets, as opposed to the conventional packet switching scheme. To implement the burst switching scheme, three different states (active, control hold, and dormant) are defined, and two transition timers are employed to release the fundamental and supplemental code channels, respectively, at certain instants. Furthermore, the system is subject to a burst admission control policy, under which a burst is admitted only when the number of currently available channels is greater than the admission threshold. Since there is a trade-off between the additional packet access delay during a burst and resource utilization, depending on the time-out value of the transition timer and the burst admission threshold, it is critical to understand the performance characteristics in terms of the underlying design parameters. In this paper, we develop an analytic model and present a Quasi-Birth-Death (QBD) queueing analysis for evaluating the performance of burst switching schemes. This work focuses on trade-off studies for optimizing the time-out value of the transition timer so as to minimize the average delay. Theoretical performance measures are derived by means of the matrix geometric method, and some simulation results are presented to validate the proposed analytical approach.
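The admission rule and the timer-driven state transitions can be sketched as below. The admission test follows the rule stated in the abstract. The ordering of the transitions (active to control hold after one timer expires, then control hold to dormant after the second) and which code channel each timer releases are assumptions made for illustration, as are all names and the example timer values.

```python
def admit_burst(available_channels, admission_threshold):
    """Admit a burst only if available channels exceed the threshold."""
    return available_channels > admission_threshold

def state_after_idle(idle_time, active_timeout, hold_timeout):
    """Return the user state after idle_time seconds without data.

    Assumed behaviour: the first timer moves the user from active to
    control hold, and the second moves the user on to dormant.
    """
    if idle_time < active_timeout:
        return "active"
    if idle_time < active_timeout + hold_timeout:
        return "control_hold"
    return "dormant"

# Hypothetical parameters: threshold of 3 channels, 1 s and 2 s timers.
print(admit_burst(available_channels=4, admission_threshold=3))
print(state_after_idle(1.5, active_timeout=1.0, hold_timeout=2.0))
```

Longer time-outs keep channels reserved during idle gaps, which lowers access delay for the next packet but reduces utilization; this is the trade-off the analysis is meant to quantify.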
For any pair of distinct nodes in an n-pancake graph, we give an algorithm that constructs n-1 internally disjoint paths connecting the nodes in time polynomial in n. The length of each path obtained and the time complexity of the algorithm are estimated theoretically and verified by computer simulation.
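For readers unfamiliar with the structure, the sketch below builds the n-pancake graph itself: nodes are permutations of 1..n, and two nodes are adjacent when one is obtained from the other by reversing a prefix of length 2 to n. Every node therefore has degree n-1, so n-1 is the largest number of internally disjoint paths that can exist between two nodes. The breadth-first search is only a reference for shortest distances on small n; it is not the path-construction algorithm of the paper, and the function names are introduced here.

```python
from collections import deque

def pancake_neighbors(perm):
    """Return the n-1 neighbors of a node, one per prefix reversal of length 2..n."""
    n = len(perm)
    return [perm[:k][::-1] + perm[k:] for k in range(2, n + 1)]

def bfs_distance(src, dst):
    """Shortest-path length between two nodes (exponential in n; small n only)."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in pancake_neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("nodes must be permutations of the same symbols")

print(pancake_neighbors((1, 2, 3, 4)))
print(bfs_distance((1, 2, 3), (1, 3, 2)))
```

The graph has n! nodes, which is why an algorithm polynomial in n, rather than in the graph size, is the meaningful target.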
Masahiro SERIZAWA Hironori ITO Toshiyuki NOMURA
This paper proposes a silence compression algorithm operating at multi-rates (MR) and with dual-bandwidths (DB), a narrowband and a wideband, for the MPEG (Moving Picture Experts Group)-4 CELP (Code Excited Linear Prediction) standard. The MR/DB operations are implemented by a Variable-Frame-size/Dual-Bandwidth Voice Activity Detection (VF/DB-VAD) module with bandwidth conversions of the input signal, and a Variable-Frame-size Comfort Noise Generator (VF-CNG) module. The CNG module adaptively smoothes the Root Mean Square (RMS) value of the input signal to improve the coding quality during transition periods. The algorithm also employs a Dual-Rate Discontinuous Transmission (DR-DTX) module to reduce an average transmission bitrate during silence periods. Subjective test results show that the proposed silence compression algorithm gives no degradation in coding quality for clean and noisy speech signals. These signals include about 20 to 30% non-speech frames and the average transmission bitrates are reduced by 20 to 40%. The proposed algorithm has been adopted as a part of the ISO/IEC MPEG-4 CELP version 2 standard.
Radomir S. STANKOVIĆ Jaakko ASTOLA
This paper presents a group-theoretic approach to the design of decision diagrams (DDs) with increased functionality of nodes. Basic characteristics of DDs determine their applications, and thus the optimization of DDs with respect to different characteristics is an important task. Increased functionality of nodes provides a means for optimizing DDs. In this paper, the methods for optimization of binary DDs by pairing of variables are interpreted as the optimization of DDs by changing the domain group for the represented functions. It is then pointed out that, for Abelian groups, increasing the functionality of nodes by using larger subgroups may improve some characteristics of DDs at the price of others. With this motivation, we propose the use of non-Abelian groups for the domain of represented functions, taking advantage of basic features of their group representations. At the same time, the present methods for optimization of DDs do not offer any criterion or efficient algorithm for choosing among the variety of possible DDs for an assumed domain group. Therefore, we propose Fourier DDs on non-Abelian groups to exploit the reduced cardinality of the Fourier spectrum on these groups.
This paper reviews the antenna systems for Japanese cellular systems and PHS (Personal Handy-phone System). The unique features of the Japanese cellular system are multi-band operation, compact diversity antennas, electronic beam tilting, and indoor booster systems. The original antennas developed for these purposes are described. PHS is also a mobile communication system unique to Japan, and is mainly used for high-speed, low-cost data transmission. Its original antennas are also presented in this paper.
Yasumasa TSUKAMOTO Tatsuya KUNIKIYO Koji NII Hiroshi MAKINO Shuhei IWADE Kiyoshi ISHIKAWA Yasuo INOUE Norihiko KOTANI
It is still an open problem to elucidate the scaling merits of an embedded SRAM with Low Operating Power (LOP) MOSFETs fabricated in 50, 70 and 100 nm CMOS technology nodes. Taking into account a realistic SRAM cell layout, we evaluated the parasitic capacitance of the bit line (BL) as well as the word line (WL) in each generation. By means of a 3-Dimensional (3D) interconnect simulator (Raphael), we focused on the scaling merit through a comparison of the simulated SRAM BL delay for each CMOS technology node. In this paper, we propose two kinds of original interconnect structure which modify ITRS (International Technology Roadmap for Semiconductors), and make it clear that the original interconnect structures with reduced gate overlap capacitance guarantee the scaling merits of SRAM cells fabricated with LOP MOSFETs in 50 and 70 nm CMOS technology nodes.
Harri VALPOLA Erkki OJA Alexander ILIN Antti HONKELA Juha KARHUNEN
Blind separation of sources from their linear mixtures is a well understood problem. However, if the mixtures are nonlinear, this problem becomes generally very difficult. This is because both the nonlinear mapping and the underlying sources must be learned from the data in a blind manner, and the problem is highly ill-posed without a suitable regularization. In our approach, multilayer perceptrons are used as nonlinear generative models for the data, and variational Bayesian (ensemble) learning is applied for finding the sources. The variational Bayesian technique automatically provides a reasonable regularization of the nonlinear blind separation problem. In this paper, we first consider a static nonlinear mixing model, with a successful application to real-world speech data compression. Then we discuss extraction of sources from nonlinear dynamic processes, and detection of abrupt changes in the process dynamics. In a difficult test problem with chaotic data, our approach clearly outperforms currently available nonlinear prediction and change detection techniques. The proposed methods are computationally demanding, but they can be applied to blind nonlinear problems of higher dimensions than other existing approaches.
Pando GEORGIEV Andrzej CICHOCKI
In this paper we consider the blind source separation (BSS) problem for signals that are spatially uncorrelated of order four but temporally correlated of order four (for instance, speech or biomedical signals). For this type of signal we propose a new sufficient condition for separation using fourth-order statistics, stating that separation is possible if the source signals have distinct normalized cumulant functions (depending on time delay). Using this condition we show that the BSS problem can be converted to a symmetric eigenvalue problem of a generalized cumulant matrix Z^(4)(b) depending on an L-dimensional parameter b, if this matrix has distinct eigenvalues. We prove that the set of parameters b that produce Z^(4)(b) with distinct eigenvalues forms an open subset of R^L whose complement has measure zero. We propose a new separating algorithm that uses Jacobi's method for joint diagonalization of cumulant matrices depending on time delay. We emphasize the following two features of this algorithm: 1) the optimal number of matrices for joint diagonalization is 100-150 (established experimentally), which for large dimensional problems is much smaller than that of JADE; 2) it works well even if the signals from the above class are, additionally, white (of order two) with zero kurtosis (as shown by an example).