Hiroyoshi YAMAMOTO Yoshihiko NANKAKU Chiyomi MIYAJIMA Keiichi TOKUDA Tadashi KITAMURA
This paper investigates parameter tying structures for a mixture of factor analyzers (MFA) and discriminative training of MFA for speaker identification. The parameters of the factor loading matrices or the diagonal matrices are shared across the mixtures of the MFA. Minimum classification error (MCE) training is then applied to the MFA parameters to enhance discrimination ability. The results of a text-independent speaker identification experiment show that MFA outperforms the conventional Gaussian mixture model (GMM) with diagonal or full covariance matrices and achieves the best performance when the diagonal matrices are shared, resulting in a relative gain of 26% over the GMM with diagonal covariance matrices. The improvement is especially significant when training data are sparse. Recognition performance is further improved by MCE training, with an additional error reduction of 3%.
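The effect of tying can be illustrated by counting free parameters. The sketch below assumes the standard factor-analysis covariance form, Sigma_m = Lambda_m Lambda_m^T + Psi_m, with a D x q loading matrix Lambda_m and a diagonal matrix Psi_m per mixture. The function names and the sizes in the example are hypothetical and not taken from the paper, and the count ignores the rotational redundancy of the loading matrices.

```python
def gmm_diag_params(M, D):
    # Per mixture: mean (D) + diagonal covariance (D); plus M - 1 free weights.
    return M * (2 * D) + (M - 1)


def gmm_full_params(M, D):
    # Per mixture: mean (D) + symmetric covariance (D * (D + 1) / 2).
    return M * (D + D * (D + 1) // 2) + (M - 1)


def mfa_params(M, D, q, share_loading=False, share_diag=False):
    # Means are never shared in this sketch; loading and diagonal
    # matrices are counted once if tied across all mixtures.
    n_loading = D * q if share_loading else M * D * q
    n_diag = D if share_diag else M * D
    return M * D + n_loading + n_diag + (M - 1)


if __name__ == "__main__":
    M, D, q = 8, 24, 4  # hypothetical model sizes
    print("GMM diag      :", gmm_diag_params(M, D))
    print("GMM full      :", gmm_full_params(M, D))
    print("MFA untied    :", mfa_params(M, D, q))
    print("MFA shared Psi:", mfa_params(M, D, q, share_diag=True))
```

With these counts, an MFA sits between the diagonal and full-covariance GMMs, and sharing the diagonal matrices removes (M - 1) * D parameters, which is one plausible reason tying helps when training data are sparse.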
Hirokazu TAKENOUCHI Tatsushi NAKAHARA Kiyoto TAKAHATA Ryo TAKAHASHI Hiroyuki SUZUKI
Asynchronous optical packet switching (OPS) is a promising solution to support the continuous growth of transmission capacity demand. It has been, however, quite difficult to implement key functions needed at the node of such networks with all-optical approaches. We have proposed a new optoelectronic system composed of a packet-by-packet optical clock-pulse generator (OCG), an all-optical serial-to-parallel converter (SPC), a photonic parallel-to-serial converter (PSC), and CMOS circuitry. The system makes it possible to carry out various required functions such as buffering (random access memory), optical packet compression/decompression, and optical label swapping for high-speed asynchronous optical packets.
This paper describes the special demands of conversational speech in the context of corpus-based speech synthesis. The author proposed the CHATR system of prosody-based unit selection for concatenative waveform synthesis seven years ago, and now extends this work to incorporate the results of an analysis of five years of recordings of spontaneous conversational speech in a wide range of actual daily-life situations. The paper proposes that the expression of affect (often translated as 'kansei' in Japanese) is the main factor differentiating laboratory speech from real-world conversational speech, and presents a framework for the specification of affect through differences in speaking style and voice quality. Having an enormous corpus of speech samples available for concatenation allows the selection of complete phrase-sized utterance segments, and changes the focus of unit selection from segmental or phonetic continuity to prosodic and discoursal appropriateness. Samples of the resulting large-corpus-based synthesis can be heard at http://feast.his.atr.jp/AESOP.
A game-theoretic analysis is applied to the evaluation of capacity and stability of a wireless ad hoc network in which each source node independently chooses a route to the destination node so as to enhance throughput. First, the throughput of individual multihop transmission with rate adaptation is evaluated. Observations from this evaluation indicate that the optimal number of hops in terms of the achievable end-to-end throughput depends on the received signal-to-noise ratio. Next, the decentralized adaptive route selection problem in which each source node competes for resources over arbitrary topologies is defined as a game. Numerical results reveal that in some cases this game has no Nash equilibria; i.e., each rational source node cannot determine a unique route. The occurrence of such cases depends on both the transmit power and spatial arrangement of the nodes. Then, the obtained network throughput under the equilibrium conditions is compared to the capacity under centralized scheduling. Numerical results reveal that when the transmit power is low, decentralized adaptive route selection may attain throughput near the capacity.
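The absence of an equilibrium in a route selection game can be checked by brute force on small instances. The following is a minimal sketch that enumerates pure-strategy Nash equilibria; the two-source games and their throughput values are invented for illustration and are not taken from the paper.

```python
from itertools import product


def pure_nash_equilibria(strategies, payoff):
    """Return all pure-strategy profiles from which no player gains by deviating alone.

    strategies: one list of options per player.
    payoff: function mapping a profile (tuple) to a tuple of payoffs.
    """
    equilibria = []
    for profile in product(*strategies):
        stable = True
        for i, options in enumerate(strategies):
            current = payoff(profile)[i]
            for alt in options:
                deviated = profile[:i] + (alt,) + profile[i + 1:]
                if payoff(deviated)[i] > current:
                    stable = False
                    break
            if not stable:
                break
        if stable:
            equilibria.append(profile)
    return equilibria


ROUTES = [["direct", "relay"], ["direct", "relay"]]

# Hypothetical throughputs: source 1 prefers to match source 2's choice,
# source 2 prefers to differ, so best responses cycle forever.
CYCLIC = {
    ("direct", "direct"): (2, 1), ("direct", "relay"): (1, 2),
    ("relay", "direct"): (1, 2), ("relay", "relay"): (2, 1),
}

# Hypothetical throughputs: both sources lose when they pick the same route.
SEPARABLE = {
    ("direct", "direct"): (1, 1), ("direct", "relay"): (3, 2),
    ("relay", "direct"): (2, 3), ("relay", "relay"): (1, 1),
}

if __name__ == "__main__":
    print(pure_nash_equilibria(ROUTES, CYCLIC.get))
    print(pure_nash_equilibria(ROUTES, SEPARABLE.get))
```

The first game has no stable profile, while the second has two, so in neither case can a rational source settle on a unique route without further coordination.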
Hiroshi SHIMAMORI Teruhiko KOHAMA Tamotsu NINOMIYA
A paralleled converter system with synchronous rectifiers (SRs) suffers from several problems, such as surge voltage, inhalation current, and circulating current. Generally, the system stops operating the SRs under light load to avoid these problems. At the same time, however, large voltage fluctuations occur in the output of the modules because of the forward voltage drop of the diode. These fluctuations cause serious faults in semiconductor devices that operate at very low voltages, such as CPUs and VLSI circuits. Moreover, the voltage fluctuations generate unstable current fluctuations in a paralleled converter system with current-sharing control. This paper proposes new switching control methods for the rectifiers to reduce the voltage and current fluctuations. The effectiveness of the proposed methods is confirmed by computer simulation and experimental results.
Chee Seong GOH Sze Yun SET Kazuro KIKUCHI
We report tunable optical devices based on fiber Bragg gratings (FBGs), whose filtering characteristics are controlled by strain distributions. These devices include a widely wavelength tunable filter, a tunable group-velocity dispersion (GVD) compensator, a tunable dispersion slope (DS) compensator, and a variable-bandwidth optical add/drop multiplexer (OADM), which will play important roles for next-generation reconfigurable optical networks.
Weifeng LI Tetsuya SHINDE Hiroshi FUJIMURA Chiyomi MIYAJIMA Takanori NISHINO Katunobu ITOU Kazuya TAKEDA Fumitada ITAKURA
This paper describes a new multi-channel method of noisy speech recognition, which estimates the log spectrum of speech at a close-talking microphone based on the multiple regression of the log spectra (MRLS) of noisy signals captured by distributed microphones. The advantages of the proposed method are as follows: 1) the method requires neither a sensitive geometric layout, nor calibration of the sensors, nor additional pre-processing for tracking the speech source; 2) the system requires very little computation; and 3) the regression weights can be statistically optimized over the given training data. Once the optimal regression weights are obtained by regression learning, they can be used to generate the estimated log spectrum in the recognition phase, where the close-talking speech is no longer required. The performance of the proposed method is illustrated by speech recognition of real in-car dialogue data. In comparison to the nearest distant microphone and a multi-microphone adaptive beamformer, the proposed approach obtains relative word error rate (WER) reductions of 9.8% and 3.6%, respectively.
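The regression step can be sketched for a single frequency bin as an ordinary least-squares fit. This is a minimal illustration, assuming a linear model with a bias term; the function names, the normal-equation solver, and the toy data are assumptions and do not reproduce the paper's training procedure.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_weights(distant_logspec, close_logspec):
    """Least-squares weights [bias, w_1, ..., w_N] for one frequency bin.

    distant_logspec: one list per training frame, holding the log-spectrum
                     value of each of the N distant microphones.
    close_logspec:   close-talking log-spectrum value for each frame.
    """
    rows = [[1.0] + list(x) for x in distant_logspec]
    d = len(rows[0])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    Aty = [sum(r[i] * t for r, t in zip(rows, close_logspec)) for i in range(d)]
    return solve(AtA, Aty)


def estimate_log_spectrum(weights, frame):
    """Recognition phase: only the distant microphones are needed."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], frame))


if __name__ == "__main__":
    # Toy data: two distant microphones, five training frames.
    X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [0.0, 1.0]]
    y = [0.5 + 0.3 * a + 0.7 * b for a, b in X]
    w = fit_weights(X, y)
    print(w, estimate_log_spectrum(w, [2.0, 2.0]))
```

In practice one such weight vector would be fitted per frequency bin, and the close-talking signal is used only during training.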
Amaro LIMA Heiga ZEN Yoshihiko NANKAKU Keiichi TOKUDA Tadashi KITAMURA Fernando G. RESENDE
This paper presents an analysis of the applicability of Sparse Kernel Principal Component Analysis (SKPCA) for feature extraction in speech recognition, as well as a proposed approach that makes the SKPCA technique realizable for a large amount of training data, which is the usual situation in speech recognition systems. Although Kernel Principal Component Analysis (KPCA) has proved to be an efficient technique for speech recognition, it has the disadvantage of requiring the training data to be reduced when their amount is excessively large. This data reduction is needed to avoid computational infeasibility or an extremely high computational burden in the feature representation step for the training and test data. The standard approach to this data reduction is to randomly choose frames from the original data set, which does not necessarily provide a good statistical representation of the original data set. To solve this problem, a likelihood-related re-estimation procedure was applied to the KPCA framework, thus creating SKPCA, which is nevertheless not realizable for large training databases. The proposed approach consists of clustering the training data and applying an SKPCA-like data reduction technique to these clusters, generating reduced data clusters. These reduced data clusters are merged and reduced in a recursive procedure until just one cluster is obtained, making the SKPCA approach realizable for a large amount of training data. The experimental results show the efficiency of the SKPCA technique with the proposed approach over KPCA with the standard sparse solution using randomly chosen frames and over the standard feature extraction techniques.
High-resolution spectrum estimation techniques have been extensively studied in recent publications. Knowledge of the noise variance is vital for spectrum estimation from noise-corrupted observations. This paper presents the use of noise compensation and data extrapolation for spectrum estimation. We assume that the observed data sequence can be represented by a set of autoregressive parameters. A recently proposed iterative algorithm is then used for noise variance estimation while autoregressive parameters are used for data extrapolation. We also present analytical results to show the exponential decay characteristics of the extrapolated samples and the frequency domain smoothing effect of data extrapolation. Some statistical results are also derived. The proposed noise-compensated data extrapolation approach is applied to both the autoregressive and FFT-based spectrum estimation methods. Finally, simulation results show the superiority of the method in terms of bias reduction and resolution improvement for sinusoids buried in noise.
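The decay of extrapolated samples can be seen in a small sketch of linear-prediction extrapolation. It assumes the convention x[n] = c_1 x[n-1] + ... + c_p x[n-p] and uses a hypothetical AR(2) model with poles at radius r = 0.9; the noise variance estimation step from the abstract is not included.

```python
import math


def extrapolate(data, coeffs, n_extra):
    """Extend `data` by n_extra samples using x[n] = sum_k coeffs[k-1] * x[n-k]."""
    x = list(data)
    for _ in range(n_extra):
        x.append(sum(c * x[-k] for k, c in enumerate(coeffs, start=1)))
    return x


# Hypothetical AR(2) model with complex poles r * exp(+/- j*omega).
R, OMEGA = 0.9, math.pi / 4
COEFFS = [2 * R * math.cos(OMEGA), -R * R]

if __name__ == "__main__":
    # Two observed samples of r**n * cos(n * omega), then 30 extrapolated ones.
    observed = [1.0, R * math.cos(OMEGA)]
    x = extrapolate(observed, COEFFS, 30)
    for n in (0, 10, 20, 30):
        print(n, round(x[n], 6), "envelope", round(R ** n, 6))
```

Because the poles lie inside the unit circle, the extrapolated sequence stays under the envelope r^n, so the appended samples act like a smoothly decaying window rather than an abrupt truncation.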
In this report, we propose a tracking algorithm of speaker direction using microphones located at vertices of an equilateral triangle. The method realizes tracking by minimizing a performance index that consists of the cross spectra at three different microphone pairs in the triangular array. We adopt the steepest descent method to minimize it, and for guaranteeing global convergence to the correct direction with high accuracy, we alter the performance index during the adaptation depending on the convergence state. Through some computer simulation and experiments in a real acoustic environment, we show the effectiveness of the proposed method.
Keiji YASUDA Fumiaki SUGAYA Toshiyuki TAKEZAWA Genichiro KIKUI Seiichi YAMAMOTO Masuzo YANAGIDA
In this paper we propose an objective method for assessing the capability of a speech translation system. It automates the translation paired comparison method proposed by Sugaya et al., which gives a simple, easy-to-understand TOEIC score, to succinctly evaluate a speech translation system. To avoid the high evaluation cost of the original method, which requires a large manual effort, the new objective method automates the procedure by employing an objective metric such as BLEU or a DP-based measure. The evaluation results obtained by the proposed method are similar to those of the original method. The proposed method is also used to evaluate the usefulness of a speech translation system. We find that our speech translation system is useful in general, even to users with a higher TOEIC score than the system's.
Takatoshi JITSUHIRO Satoshi NAKAMURA
We propose a new method both for automatically creating non-uniform, context-dependent HMM topologies and for selecting the number of mixture components based on the Variational Bayesian (VB) approach. Although the Maximum Likelihood (ML) criterion is generally used to create HMM topologies, it has an over-fitting problem. Recently, to avoid this problem, the VB approach has been applied to create acoustic models for speech recognition. We introduce the VB approach to the Successive State Splitting (SSS) algorithm, which can create both contextual and temporal variations for HMMs. Experimental results indicate that the proposed method can automatically create a more efficient model than the original method. We also evaluated a method that increases the number of mixture components by using the VB approach and considering temporal structures. The VB approach obtained almost the same performance as ML-based methods with a smaller number of mixture components.
Tetsuro UEDA Shinsuke TANAKA Dola SAHA Siuli ROY Somprakash BANDYOPADHYAY
The use of directional antennas in ad hoc wireless networks can greatly reduce radio interference, thereby improving the utilization of the wireless medium. Our major contribution in this paper is a MAC protocol that exploits the advantages of directional antennas in ad hoc networks to improve system performance and make effective use of the shared wireless medium. To implement an effective MAC protocol in this context, a node should know how to set its transmission direction to transmit a packet to its neighbors and to avoid transmission in other directions where data communications are already in progress. We propose a receiver-centric approach for location tracking and the MAC protocol, so that each node becomes aware of its neighborhood and of the direction of its neighbors for directional communication. A node develops its location-awareness from this neighborhood-awareness and direction-awareness. In this context, researchers usually assume that the gain of a directional antenna is equal to the gain of the corresponding omni-directional antenna. However, for a given amount of input power, the range R with a directional antenna will be much larger than that with an omni-directional antenna. We therefore also propose a two-level transmit power control mechanism to approximately equalize the transmission range R of an antenna operating in omni-directional and directional modes. This will not only improve medium utilization but also help to conserve the power of the transmitting node during directional transmission. Our proposed directional MAC protocol can be effective both in ITS (Intelligent Transportation Systems), which we simulate in String and Parallel Topologies, and in any community network, which we simulate in a Random Topology.
The performance evaluation on the QualNet network simulator clearly indicates the efficiency of our protocol.
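The idea behind two-level transmit power control can be sketched with a simple path-loss model. This is an illustration only: the model (received power proportional to transmit power times antenna gain divided by distance to the power alpha), the constants, and the function names are assumptions, not the protocol's actual parameters.

```python
def tx_range(power_mw, gain, threshold_mw=1e-7, alpha=2.0, k=1e-3):
    """Distance at which received power k * P * G / d**alpha falls to the threshold."""
    return (k * power_mw * gain / threshold_mw) ** (1.0 / alpha)


def directional_power(omni_power_mw, omni_gain, dir_gain):
    """Transmit power that gives the directional mode the same range as the omni mode."""
    return omni_power_mw * omni_gain / dir_gain


if __name__ == "__main__":
    omni_power, omni_gain, dir_gain = 100.0, 1.0, 4.0  # hypothetical values
    reduced = directional_power(omni_power, omni_gain, dir_gain)
    print("omni range              :", tx_range(omni_power, omni_gain))
    print("directional, full power :", tx_range(omni_power, dir_gain))
    print("directional, reduced    :", tx_range(reduced, dir_gain))
```

Under this model the required power reduction equals the gain ratio and does not depend on the path-loss exponent, which is why two fixed power levels are enough to equalize the ranges.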
Masahiko MATSUSHITA Hiromitsu NISHIZAKI Takehito UTSURO Seiichi NAKAGAWA
This paper presents speech-driven Web retrieval models which accept spoken search topics (queries) in the NTCIR-3 Web retrieval task. The major focus of this paper is on improving the speech recognition accuracy of spoken queries and thereby improving retrieval accuracy in speech-driven Web retrieval. We experimentally evaluated techniques for combining the outputs of multiple LVCSR models in the recognition of spoken queries. As model combination techniques, we compared the SVM learning technique with conventional voting schemes such as ROVER. In addition, to investigate the effect of the language model's vocabulary size on retrieval performance, we prepared two language models: one with a vocabulary size of 20,000 and the other with 60,000. We then evaluated the differences in the recognition rates of the spoken queries and in the retrieval performance. We showed that multiple LVCSR model combination could improve both speech recognition and retrieval accuracy in speech-driven text retrieval. Comparing the retrieval accuracies obtained when an LM with a 20,000- or 60,000-word vocabulary is used in an LVCSR system, we found that the larger the vocabulary, the better the retrieval accuracy.
Chiu-Ching TUAN Chen-Chau YANG
Model-based movement patterns play a crucial role in evaluating the performance of mobility-dependent Personal Communication Service (PCS) strategies. This study proposes a new normal walk model that represents the daily movement patterns of a mobile station (MS) in PCS networks more closely than a conventional random walk model. A drift angle θ in this model determines the relative direction in which an MS hands off in the next step, based on the concepts that most real trips follow the shortest path and that the directions of daily motion are mostly symmetric. Hence, θ is assumed to approximately follow a normal distribution with parameters µ = 0° and σ in the range of 5° to 90°. Varying σ thus redistributes the probabilities associated with θ to make the normal mobility patterns more realistic than the random ones. Experimental results verify that the proposed normal walk is correct and valid for modeling an n-layer mesh cluster of PCS networks. Moreover, when σ = 79.5°, a normal walk can almost represent, and even replace, a random walk.
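The way σ redistributes direction probabilities can be sketched by integrating a zero-mean normal density over angular sectors. The sketch assumes that the drift angle is measured in degrees, that it wraps around the circle, and that each of n equally wide sectors corresponds to one neighboring cell (n = 4 here); these mapping choices are illustrative assumptions and may differ from the model in the paper.

```python
import math


def normal_cdf(x, sigma):
    """CDF of a zero-mean normal distribution with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def direction_probabilities(sigma_deg, n_directions=4, wraps=3):
    """Probability that a wrapped N(0, sigma) drift angle falls in each sector.

    Sector k is centred on k * 360 / n_directions degrees relative to the
    previous heading; `wraps` controls how many full turns are summed.
    """
    width = 360.0 / n_directions
    probs = []
    for k in range(n_directions):
        center = k * width
        p = 0.0
        for m in range(-wraps, wraps + 1):
            lo = center - width / 2 + 360.0 * m
            hi = center + width / 2 + 360.0 * m
            p += normal_cdf(hi, sigma_deg) - normal_cdf(lo, sigma_deg)
        probs.append(p)
    return probs


if __name__ == "__main__":
    for sigma in (5.0, 30.0, 90.0):
        print(sigma, [round(p, 4) for p in direction_probabilities(sigma)])
```

A small σ concentrates almost all probability on continuing straight ahead, while a larger σ spreads it towards the other directions.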
Bin ZHEN Mamoru KOBAYASHI Masashi SHIMIZU
Radio frequency identification (RFID) enables everyday objects to be identified, tracked, and recorded. RFID tags must be extremely simple and low in cost to be suitable for large-scale application. An efficient RFID anti-collision mechanism must have low access latency and low power consumption. This paper investigates how to recognize multiple RFID tags within the reader's interrogation range by using framed ALOHA, without knowing the number of tags in advance. To optimize power consumption and overall tag read time, a combinatorial model is proposed to analyze both passive and active tags, taking into account the capture effect over wireless fading channels. Using the model, the parameters for tag set estimation and frame size update are presented. Simulations were conducted to verify the analysis. In addition, we propose a way to combat the capture effect in deterministic anti-collision algorithms.
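The role of the frame size can be illustrated with the textbook analysis of framed ALOHA, in which each tag picks one slot uniformly at random. The sketch below ignores the capture effect and fading that the paper's model takes into account, and its function names are hypothetical.

```python
def expected_singletons(n_tags, frame_size):
    """Expected number of slots chosen by exactly one tag."""
    return n_tags * (1.0 - 1.0 / frame_size) ** (n_tags - 1)


def efficiency(n_tags, frame_size):
    """Fraction of slots in the frame that yield a successful read."""
    return expected_singletons(n_tags, frame_size) / frame_size


def best_frame_size(n_tags, candidates):
    """Candidate frame size with the highest expected efficiency."""
    return max(candidates, key=lambda size: efficiency(n_tags, size))


if __name__ == "__main__":
    n = 50  # hypothetical number of tags in range
    for size in (16, 32, 50, 64, 128):
        print(size, round(efficiency(n, size), 4))
    print("best frame size:", best_frame_size(n, range(2, 201)))
```

In this idealized setting the efficiency peaks when the frame size equals the number of tags, which is why the reader needs an estimate of the tag set before updating the frame size.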
Hideki TODE Makoto WADA Kazuhiko KINOSHITA Toshihiro MASAKI Koso MURAKAMI
A flooding algorithm is an indispensable and fundamental network control mechanism for tasks such as notifying all nodes of some information, transferring data with high reliability, gathering information from all nodes, or reserving a route by flooding messages through the network. In particular, the flooding algorithm is highly effective in heterogeneous and dynamic network environments such as so-called ubiquitous networks, whose topology is indefinite or changes dynamically and whose nodes may be simple and less intelligent. In practice, it is applied to grasp the network topology in a sensor network or an ad hoc network, or to retrieve content information in mobile agent systems. A flooding algorithm has the advantages of robustness and of optimality through parallel processing of messages. However, the flooding mechanism has a fundamental disadvantage: it causes message congestion in the network, which eventually increases the processing time until the flooding control is finished. In this paper, we propose and evaluate methods for producing a more efficient flooding algorithm by adopting the growth processes of primitive creatures, such as molds or microbes.
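The message overhead of plain flooding can be seen in a short simulation. This sketch implements only the basic scheme, in which a node forwards a message to every neighbor except the sender the first time it receives it; it does not model the growth-process methods proposed in the paper, and the example topology is hypothetical.

```python
from collections import deque


def flood(graph, source):
    """Simulate basic flooding and return (reached nodes, messages sent).

    graph: dict mapping each node to a list of its neighbours.
    """
    seen = {source}
    queue = deque((source, nb) for nb in graph[source])
    messages = len(queue)
    while queue:
        sender, node = queue.popleft()
        if node in seen:
            continue  # duplicate delivery: received but not forwarded again
        seen.add(node)
        for nb in graph[node]:
            if nb != sender:
                queue.append((node, nb))
                messages += 1
    return seen, messages


RING = {
    "A": ["B", "D"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "A"],
}

if __name__ == "__main__":
    reached, sent = flood(RING, "A")
    print(sorted(reached), sent)
```

On this four-node ring every node is reached, but five messages are sent where a spanning tree would need only three. Such redundant traffic is what causes congestion in larger networks.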
Jian-Fa QIAN Li-Na ZHANG Shi-Xin ZHU
The ring $F_p + uF_p + \cdots + u^{k-1}F_p$ may be of interest in coding theory, and has already been used in the construction of optimal frequency-hopping sequences. In this work, cyclic codes over $F_p + uF_p + \cdots + u^{k-1}F_p$, which is an open problem posed in [1], are considered. Namely, the structure of cyclic codes over $F_p + uF_p + \cdots + u^{k-1}F_p$ and that of their duals are derived.
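Arithmetic in this ring can be sketched by storing an element as its k coefficients in F_p and truncating products at u^k. The sketch assumes the usual definition with u^k = 0, which the abstract does not state explicitly, and the helper names are hypothetical.

```python
def ring_add(a, b, p):
    """Add two elements given as coefficient lists [a_0, ..., a_{k-1}] over F_p."""
    return [(x + y) % p for x, y in zip(a, b)]


def ring_mul(a, b, p):
    """Multiply two elements, discarding every term of degree >= k (u**k = 0)."""
    k = len(a)
    out = [0] * k
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < k:
                out[i + j] = (out[i + j] + ai * bj) % p
    return out


if __name__ == "__main__":
    p = 3
    u = [0, 1, 0]            # the element u, with k = 3
    one_plus_u = [1, 1, 0]   # the element 1 + u
    u2 = ring_mul(u, u, p)
    print("u^2       =", u2)
    print("u^3       =", ring_mul(u2, u, p))
    cube = ring_mul(ring_mul(one_plus_u, one_plus_u, p), one_plus_u, p)
    print("(1 + u)^3 =", cube)
```

The element u is nilpotent, so the ring is not a field, while 1 + u is a unit; this mix of units and nilpotent elements is what makes the ideal structure, and hence the cyclic codes, over such rings differ from the finite-field case.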
Tomohiro OHNO Shigeki MATSUBARA Nobuo KAWAGUCHI Yasuyoshi INAGAKI
Spontaneously spoken Japanese includes a lot of grammatically ill-formed linguistic phenomena such as fillers, hesitations, inversions, and so on, which do not appear in written language. This paper proposes a novel method of robust dependency parsing using a large-scale spoken language corpus, and evaluates the availability and robustness of the method using spontaneously spoken dialogue sentences. By utilizing stochastic information about the appearance of ill-formed phenomena, the method can robustly parse spoken Japanese including fillers, inversions, or dependencies over utterance units. Experimental results reveal that the parsing accuracy reached 87.0%, and we confirmed that it is effective to utilize the location information of a bunsetsu, and the distance information between bunsetsus as stochastic information.
Kohsuke NISHIMURA Ryo INOHARA Masashi USAMI Shigeyuki AKIBA
Optical regeneration techniques using an electro-absorption modulator (EAM) are reviewed. Simple 3R optical regeneration using an EAM was proposed and verified at 20 Gbit/s. The optical nonlinearities induced in an EAM, including cross-absorption modulation (XAM) and cross-phase modulation (XPM), were quantitatively characterized by experiment. High bit-rate 2R-type all-optical regeneration (wavelength conversion) at 100 Gbit/s was demonstrated using an EAM in conjunction with a delayed interferometer (DI), with a required optical pulse energy of 1.5 pJ. It was verified that the operable bandwidth of the EAM-DI wavelength converter at 40 Gbit/s covered almost the full range of the C-band without tuning the operating conditions.