In this paper, we describe recent trends in automatic speech recognition. First, we point out that the current state of the art in machine speech recognition is admittedly inferior to human ability; in particular, we assert that acoustic models must be improved. Second, we describe robust feature parameters for noisy environments, which are important in practical use. We then show, from the viewpoints of information theory and pattern recognition, that a large amount of training data collected in the same environment as the recognition stage is useful. Third, we discuss acoustic models and language models, which are central issues in speech recognition. We discuss the principle and limitations of the hidden Markov model (HMM) and recent extended models. The role of a language model is to eliminate improbable candidate words, that is, to reduce the search space; in other words, language models with smaller entropy are preferable. From this standpoint, we survey stochastic language models. Finally, we state some points that deserve attention when constructing speech recognition systems.
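The preference for language models with smaller entropy can be illustrated with a toy comparison. The sketch below is only an assumption-laden example: the corpus, the add-alpha smoothing, and the function names `train_bigram` and `cross_entropy` are all hypothetical and are not taken from the survey. It measures the per-word cross-entropy (in bits) of a smoothed bigram model and of a uniform model on the same test sentence; the corresponding perplexity is 2 raised to that value and can be read as the effective number of candidate words per position.

```python
import math
from collections import Counter


def train_bigram(tokens, vocab, alpha=1.0):
    """Return a function P(w | prev) estimated with add-alpha smoothing.

    Add-alpha smoothing is an illustrative choice, not the survey's method.
    """
    pair_counts = Counter(zip(tokens, tokens[1:]))
    prev_counts = Counter(tokens[:-1])
    v = len(vocab)

    def prob(prev, w):
        return (pair_counts[(prev, w)] + alpha) / (prev_counts[prev] + alpha * v)

    return prob


def cross_entropy(prob, tokens):
    """Average negative log2 probability per predicted word, in bits."""
    pairs = list(zip(tokens, tokens[1:]))
    return -sum(math.log2(prob(prev, w)) for prev, w in pairs) / len(pairs)


# Hypothetical toy corpus, for illustration only.
train = "show me flights to boston show me flights to denver".split()
test = "show me flights to boston".split()
vocab = set(train)

bigram = train_bigram(train, vocab)


def uniform(prev, w):
    """Baseline model: every vocabulary word is equally likely."""
    return 1.0 / len(vocab)


h_bigram = cross_entropy(bigram, test)
h_uniform = cross_entropy(uniform, test)

# Perplexity = 2 ** entropy: the effective branching factor of the search.
print(f"bigram : {h_bigram:.3f} bits, perplexity {2 ** h_bigram:.2f}")
print(f"uniform: {h_uniform:.3f} bits, perplexity {2 ** h_uniform:.2f}")
```

On this toy data the bigram model assigns fewer bits per word than the uniform model, which corresponds to a smaller set of plausible candidates at each position during search.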
Toshinori TAKAI Yuichi KAJI Hiroyuki SEKI
We propose a new subclass of term rewriting systems (TRSs) for which the strongly normalizing (SN) property is decidable. The new class is called almost orthogonal inverse finite path overlapping TRSs (AO-FPO^{-1}-TRSs), and it properly includes AO growing TRSs, for which SN is decidable. A tree automata technique is used to show that SN is decidable for AO-FPO^{-1}-TRSs.
Sang-Kook HAN Duk-Ho JEON Hyun-Do JUNG
Two novel linearization processes for electro-absorption modulators (EAMs) are proposed and demonstrated. Both modulation schemes compensate for the nonlinear component of the EAM by controlling the DC bias voltage of each EAM separately. Simulations of the EAM nonlinearity and the linearization process are performed in both the time and frequency domains. In a simulation of serially cascaded modulation, reductions of 16 dB in IMD3 and 45 dB in IMD5 and a resulting increase of 15 dB in linear dynamic range (LDR) are achieved. In a dual-parallel modulation experiment at 8 GHz, a reduction of 23 dB in IMD3 and a resulting increase of 15.1 dB in LDR are achieved compared with single-EAM operation.
Yoshio KAMEDA Shinichi YOROZU Shuichi TAHARA
We describe the logic design of a single-flux-quantum (SFQ) 2×2 unit switch. It is the main component of the SFQ Banyan packet switch we are developing, which enables a switching capacity of over 1 Tbit/s. In this paper, we focus on the design of the controller in the unit switch. Unlike shift registers or adders, the controller has no simple "off-the-shelf" conventional circuit. To design such a complicated random logic circuit, we need to adopt a systematic top-down design approach. Using a graphical technique, we first obtained the logic functions. Next, to use the deep pipeline architecture, we broke the functions down into one-level logic operations that can be executed within one clock cycle. Finally, we mapped the functions onto the physical circuits using pre-designed SFQ standard cells. The 2×2 unit switch consists of 59 logic gates and needs about 600 Josephson junctions, not counting gate interconnections. We tested the gate-level circuit by logic simulation and found that it operates correctly at a throughput of 40 GHz.
Keiichi TOKUDA Takashi MASUKO Noboru MIYAZAKI Takao KOBAYASHI
This paper proposes a new kind of hidden Markov model (HMM) based on multi-space probability distribution, and derives a parameter estimation algorithm for the extended HMM. HMMs are widely used statistical models for characterizing sequences of speech spectra, and have been successfully applied to speech recognition systems. HMMs are categorized into discrete HMMs and continuous HMMs, which can model sequences of discrete symbols and continuous vectors, respectively. However, neither conventional discrete HMMs nor continuous HMMs can be applied to observation sequences that consist of both continuous values and discrete symbols; F0 pattern modeling of speech is a good illustration. The proposed HMM includes the discrete HMM and the continuous HMM as special cases and, furthermore, can model sequences that consist of observation vectors with variable dimensionality and discrete symbols.
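The F0 case mentioned above can be sketched as a state output probability defined over two spaces: a one-dimensional continuous space for voiced frames and a zero-dimensional space for the unvoiced symbol. The abstract gives no formula, so the following is an assumed minimal form rather than the paper's definition; the function names, the single-Gaussian voiced space, and the parameter values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def msd_output_prob(obs, w_voiced, mean, var):
    """Output probability of one state over two spaces (assumed form).

    obs is a float (e.g. log F0 of a voiced frame) or None (unvoiced frame).
    w_voiced is the weight of the 1-D continuous space; the remaining
    weight, 1 - w_voiced, belongs to the 0-D space for the unvoiced symbol.
    """
    if obs is None:
        # 0-D space: there is no value to score, only the space weight.
        return 1.0 - w_voiced
    # 1-D space: space weight times the Gaussian density of the value.
    return w_voiced * gaussian_pdf(obs, mean, var)


# Hypothetical state parameters, for illustration only.
p_unvoiced = msd_output_prob(None, w_voiced=0.8, mean=5.0, var=0.04)
p_voiced = msd_output_prob(5.1, w_voiced=0.8, mean=5.0, var=0.04)
print(p_unvoiced, p_voiced)
```

Setting `w_voiced` to 1.0 reduces this state to an ordinary continuous density, and setting it to 0.0 leaves only a discrete symbol probability, which mirrors the abstract's remark that discrete and continuous HMMs arise as special cases.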
Hiroshi NAGAHASHI Mohamed IMINE
This paper develops a simple algorithm for calculating a polynomial curve or surface in parallel. The number of arithmetic operations and the time needed for the calculation are evaluated in terms of the polynomial degree, the resolution of the curve, and the number of processors used. We compare our method with a conventional method for generating polynomial curves and surfaces, especially in terms of computation time and the approximation error due to the reduction of the polynomial degree. It is shown that our method can perform fast calculation within tolerable error.
Masahiro ISHIKAWA Kazutaka FURUSE Hanxiong CHEN Nobuo OHBO
Clustering is one of the most important topics in the field of knowledge discovery from databases. Hierarchical clustering is especially useful, since it gives a hierarchical view of a whole database and can be used to guide users in browsing a huge database. In many cases, clustering can be modeled as a graph partitioning problem. When an appropriate distance function between database objects is given, a database can be viewed as an edge-weighted complete graph, where vertices are database objects and edge weights are the distances between them. The construction of an MST (Minimal Spanning Tree) can then be viewed as single-linkage agglomerative clustering of the database objects. In this paper, we propose an efficient MST construction method for a large complete metric graph, which is derived from a database with a metric distance function defined on it. Our method utilizes a metric index to reduce the number of distance calculations. The basic idea is to use the metric postulates to exclude edges that are unlikely to be part of an MST. For this purpose, we introduce a new metric index named MetricMatrix. Experimental results show that our method can drastically reduce the number of distance calculations needed for MST construction in comparison with the classical method.
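The correspondence between MST construction and single-linkage clustering, and the cost of the classical method, can be sketched as follows. This is a plain Prim construction on the implicit complete graph, shown only as a baseline under assumed names (`prim_mst`, `single_linkage_merges`) and hypothetical sample points; it is not the MetricMatrix method, whose details the abstract does not give. The counter shows that the baseline evaluates the distance function n(n-1)/2 times, which is the quantity a metric index aims to reduce.

```python
import math


def prim_mst(points, dist):
    """Classical O(n^2) Prim on the implicit complete graph.

    Returns (edges, calls): edges is a list of (weight, u, v) tuples and
    calls is the number of times the distance function was evaluated.
    """
    n = len(points)
    calls = 0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from the tree to each vertex
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Relax the distances of all remaining vertices against u.
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                calls += 1
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges, calls


def single_linkage_merges(edges):
    """MST edges in increasing weight order give the single-linkage merge order."""
    return sorted(edges)


# Hypothetical sample data: two well-separated groups in the plane.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
edges, calls = prim_mst(points, math.dist)
merges = single_linkage_merges(edges)
print(calls)
for weight, u, v in merges:
    print(f"merge clusters of {points[u]} and {points[v]} at distance {weight:.3f}")
```

Cutting the sorted merge list at a distance threshold yields the flat clusters at that level of the hierarchy. For pruning, the triangle inequality gives the lower bound |d(p, u) - d(p, v)| <= d(u, v) for any reference object p, so an edge whose bound already exceeds the current best candidate need not be evaluated; how much this saves depends on the data and on the choice of reference objects.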
Johannes KNEIP Matthias WEISS Wolfram DRESCHER Volker AUE Jurgen STROBEL Thomas OBERTHUR Michael BOLLE Gerhard FETTWEIS
This paper presents the HiperSonic 1, a multi-standard, application-specific signal processor designed to execute the baseband conversion algorithms in IEEE802.11a- and HIPERLAN/2-based 5 GHz wireless LAN applications. In contrast to the widespread dedicated implementations, most of the computational effort here was mapped onto a configurable, data- and instruction-parallel DSP core. The core is supplemented by mixed-signal A/D and D/A converters and hardware accelerators. The memory and register architecture, instruction set, and peripheral interfaces of the chip were carefully optimized for the targeted applications, leading to a sound combination of flexibility, die area, and power consumption. The 120 MHz, 7.6 million-transistor solution was implemented in 0.18 µm CMOS and performs IEEE802.11a- or HIPERLAN/2-compliant baseband processing at data rates up to 60 Mbit/s.
Hiroyuki EHARA Koji YOSHIDA Kazutoshi YASUNAGA Toshiyuki MORII
This paper presents a high-quality 4-kbit/s speech coding algorithm based on a CELP algorithm. The coder operates on speech frames of 20 ms. The algorithm has the following four main features: multiple sub-codebooks, backward adaptive mode switching, a dispersed-pulse structure, and noise post-processing. The multiple sub-codebooks consist of a pulse-codebook and a random-codebook so that they can handle both noise-like signals (e.g. unvoiced speech, stationary noise) and pulse-like signals (e.g. voiced speech). The backward adaptive mode switching is performed using decoded parameters; therefore, no additional mode bit is transmitted. The random-codebook size is switched according to the backward-adaptively selected mode. This switching improves the subjective quality of unvoiced speech and noise-like signals because the random-codebook size is greatly increased in the corresponding mode. The dispersed-pulse structure improves the performance of sparse pulse excitation by using dispersed pulses instead of simple unit pulses. The noise post-processing employs a stationary background noise generator to produce a stationary noise signal. It significantly improves the subjective quality of the decoded signal under various background noise conditions. Subjective listening tests are conducted in accordance with ACR and DCR tests. The ACR test results indicate that the fundamental performance of the proposed coder, MDP-CELP, is equivalent to that of 32-kbit/s adaptive differential pulse code modulation (ADPCM). The DCR test results show that the performance of the MDP-CELP is equivalent to or better than that of 8-kbit/s conjugate-structure algebraic code excited linear prediction (CS-ACELP) under several background noise conditions.
Kenji SATO Shoichiro KUWAHARA Yutaka MIYAMOTO Koichi MURATA Hiroshi MIYAZAWA
Phase inversion between neighboring pulses, as found in carrier-suppressed return-to-zero pulses, is effective in reducing the signal distortion due to chromatic dispersion and nonlinear effects. A method for generating anti-phase pulses at 40 GHz is demonstrated using semiconductor mode-locked lasers integrated with chirped gratings. The operating principle and pulse characteristics are described. Suppression of pulse distortion due to fiber dispersion is confirmed for the generated anti-phase pulses. Repeaterless 150-km dispersion-shifted-fiber L-band transmission at 42.7 Gbit/s is demonstrated using the pulse source.
Takaaki MIZUKI Takao NISHIZEKI
Suppose that there are players in two hierarchical groups and a computationally unlimited eavesdropper. Using a random deal of cards, a player in the higher group wishes to send a one-bit message information-theoretically securely either to all the players in her group or to all the players in the two groups. This can be done by the so-called 2-level key set protocol. In this paper we give a necessary and sufficient condition for the 2-level key set protocol to succeed.
Young I. SON Hyungbo SHIM Kyoung-cheol PARK Jin H. SEO
We present a state-space approach to the problem of designing a parallel feedforward compensator (PFC) that has the same dimension as the input, i.e., is input-dimensional, for a class of non-square linear systems such that the closed-loop system is strictly passive. For a non-minimum phase system or a system with high relative degree, passification of the system can be achieved only by using a PFC. In our scheme, we first determine a squaring gain matrix and additional dynamics that are connected to the system in a feedforward way; a static passifying control law is then designed. Consequently, the actual feedback controller is the static control law combined with the feedforward dynamics. Necessary and sufficient conditions for the existence of the PFC are given by the static output feedback formulation, which enables the use of linear matrix inequalities (LMIs). Since the proposed PFC is input-dimensional, our design procedure can be viewed as a solution to the low-order dynamic output feedback control problem in the literature. The effectiveness of the proposed method is illustrated by some numerical examples.
Much has been said and written about the changes in analog IC technology such as shrinking line widths, vanishingly low supply voltages, severe power limitations, and digital noise. But beyond these technology changes and their subsequent methodology changes, a far more subtle revolution is happening in the nature of the profession itself. Technology, software, and product evolution have all conspired to create a new kind of analog IC designer, one very different from the IC designers of the past.
Pi-Chung WANG Chia-Tai CHAN Yaw-Chung CHEN
In previous work, Lampson et al. proposed an IP lookup algorithm that performs binary search on prefixes (BSP). The algorithm is attractive, even for IPv6, because of its bounded worst-case memory requirement. Its fast forwarding, however, may come at the cost of slower insertion. Although this trade-off can be justified, the routing-table reconstruction in BSP is too time-consuming to handle frequent route updates. In this work, we propose a fast forwarding-table construction algorithm that can accomplish more than 4,000 route updates per second. Moreover, it is simple enough to meet the need for fast packet forwarding. With the enhanced multiway search tree, we further reduce the depth of the tree and eliminate the pointer storage; this reduces the forwarding table size and shortens the lookup time.
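The idea of answering a longest-prefix-match query with a binary search can be sketched as follows. This is a simplified variant under stated assumptions, not the table layout of Lampson et al. and not the construction algorithm proposed above: each prefix is expanded to an address range, the range boundaries are sorted, and the best matching route is precomputed for every interval between boundaries. The names `prefix_range`, `build_table`, `lookup`, and `ip`, and the sample routes, are hypothetical.

```python
import bisect


def ip(text):
    """Convert dotted-quad IPv4 text to an integer."""
    a, b, c, d = (int(x) for x in text.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def prefix_range(value, length, bits=32):
    """Return the inclusive [low, high] address range covered by value/length."""
    span = 1 << (bits - length)
    low = value & ~(span - 1)
    return low, low + span - 1


def build_table(routes, bits=32):
    """Build a sorted boundary table from (value, length, next_hop) routes.

    Returns (starts, hops): starts[i] is the first address of interval i and
    hops[i] is the next hop of the longest prefix covering that interval.
    """
    ranges = [(*prefix_range(v, l, bits), l, hop) for v, l, hop in routes]
    starts = {0}
    for low, high, _, _ in ranges:
        starts.add(low)
        if high + 1 < (1 << bits):
            starts.add(high + 1)
    starts = sorted(starts)
    hops = []
    for s in starts:
        # The set of covering prefixes is constant inside each interval,
        # so checking the interval's first address is sufficient.
        covering = [(l, hop) for low, high, l, hop in ranges if low <= s <= high]
        hops.append(max(covering, key=lambda c: c[0])[1] if covering else None)
    return starts, hops


def lookup(table, addr):
    """Binary search for the interval containing addr; return its next hop."""
    starts, hops = table
    return hops[bisect.bisect_right(starts, addr) - 1]


# Hypothetical routes, for illustration only.
routes = [
    (ip("0.0.0.0"), 0, "default"),
    (ip("10.0.0.0"), 8, "A"),
    (ip("10.1.0.0"), 16, "B"),
]
table = build_table(routes)
print(lookup(table, ip("10.1.2.3")))
print(lookup(table, ip("10.2.0.0")))
print(lookup(table, ip("11.0.0.0")))
```

Lookup cost is logarithmic in the number of boundaries, but this sketch rebuilds the whole table from scratch for any route change, which illustrates why reconstruction cost becomes the bottleneck under frequent updates.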
This letter proposes a new approach to feature extraction using steerable filters. The approach is based on the concept of an orientation-energy histogram, which yields the local direction of dominant orientation. Testing is carried out using a training set of 1000 and a test set of 300 unknown 40×40 hand-written digits. In the simulations, a correct recognition rate of 92% is obtained.
Miodrag J. MIHALJEVIC Hideki IMAI
It is shown that the effective secret-key size of the TOYOCRYPT-HS1 stream cipher is only 96 bits, although the secret key consists of 128 bits. This characteristic opens the door to a cryptanalysis algorithm based on the time-memory-data trade-off whose overall complexity is significantly smaller than that of an exhaustive search over the effective key space.
Ichiro TAKASHIMA Riichi KAJIWARA Toshio IIJIMA
The concept of a "standardized brain" is familiar in modern functional neuro-imaging techniques including PET and fMRI, but it has never been adopted for optical imaging studies that deal with a regional cortical area rather than the whole brain. In this paper, we propose a "standardized barrel cortex" for rodents, and present a method for mapping optically detected neural activity onto the standard cortex. The standard cortex is defined as a set of simple cortical columns, which are modeled on the cytoarchitectonic patterns of cell aggregates in cortical layer IV of the barrel cortex. Referring to its underlying anatomical structure, the method warps the surface image of individual cortices to fit the standard cortex. The cortex is warped using a two-dimensional free-form deformation technique with direct manipulation. Since optical imaging provides a map of neural activity on the cortical surface, the warping consequently remaps it on the standard cortex. Data presented in this paper show that somatosensory evoked neural activity is successfully represented on the standardized cortex, suggesting that the combination of optical imaging with our method is a promising approach for investigating the functional architecture of the cortex.
Takayoshi TAKEHARA Hideki TODE Koso MURAKAMI
The requirement to realize large-capacity, high-speed communications with guaranteed Quality of Service (QoS) in IP networks is a recent development. Multi-Protocol Label Switching (MPLS), a technique for satisfying these requirements, is the focus of this paper. In the future, congestion and faults on a Label Switched Path (LSP) are expected to seriously affect service contents because various applications will be densely served over a large area. In MPLS, however, methods for solving these problems are not clear. Therefore, this study proposes a concrete traffic engineering method to avoid heavy congestion and, at the same time, endeavors to realize a fault-tolerant network by autonomous restoration, or self-healing.
Fu-Kun CHEN Jar-Ferr YANG Yu-Pin LIN
For multimedia communications, a multimedia codec must be computationally scalable to match different working platforms and integrated services of media sources. In this paper, two condensed stochastic codebook search approaches are proposed to progressively reduce the computation required for the algebraic code excited linear predictive (ACELP) and multi-pulse maximum likelihood quantization (MP-MLQ) coders. By reducing the number of codebook candidates before the search procedure, the proposed methods can effectively reduce the computation required for the ITU-T G.723.1 dual rate speech coder. Simulation results show that the proposed methods can save over 50 percent of the computation for the stochastic codebook search with perceptually negligible degradation in speech quality.
Yuichi TOHMORI Hiroyuki ISHII Hiromi OOHASHI Yuzo YOSHIKUNI
This paper describes the recent progress made in developing wavelength tunable semiconductor light sources for WDM applications. Wide and quasi-continuous wavelength tunings were investigated for a wavelength-selectable laser and a wavelength tunable distributed Bragg reflector (DBR) laser having a super structure grating (SSG). A wavelength-selectable laser consisting of a DFB laser array, a multi-mode interferometer (MMI), and a semiconductor optical amplifier (SOA) demonstrated a quasi-continuous tuning range of 46.9 nm by using temperature control. A wavelength-tunable DBR laser with SSG exhibited a quasi-continuous tuning range of 62.4 nm by using three tuning current controls. Wavelength stabilization was also demonstrated under the temperature variations of 5.