Hiroshi SHIMAMORI Teruhiko KOHAMA Tamotsu NINOMIYA
A paralleled converter system with synchronous rectifiers (SRs) suffers from several problems, such as surge voltage, inhalation current, and circulating current. Generally, the system stops operation of the SRs under light load to avoid these problems. At the same time, however, large voltage fluctuations occur in the output of the modules due to the forward voltage drop of the diodes. These fluctuations cause serious faults in semiconductor devices that operate at very low voltages, such as CPUs and VLSI devices. Moreover, the voltage fluctuations generate unstable current fluctuations in a paralleled converter system with current-sharing control. This paper proposes new switching control methods for the rectifiers to reduce the voltage and current fluctuations. The effectiveness of the proposed methods is confirmed by computer simulation and experimental results.
Satoshi NAKAMURA Kazuya TAKEDA Kazumasa YAMAMOTO Takeshi YAMADA Shingo KUROIWA Norihide KITAOKA Takanobu NISHIURA Akira SASOU Mitsunori MIZUMACHI Chiyomi MIYAJIMA Masakiyo FUJIMOTO Toshiki ENDO
This paper introduces an evaluation framework for Japanese noisy speech recognition named AURORA-2J. Speech recognition systems must still be improved to be robust to noisy environments, but this improvement requires development of the standard evaluation corpus and assessment technologies. Recently, the Aurora 2, 3 and 4 corpora and their evaluation scenarios have had significant impact on noisy speech recognition research. The AURORA-2J is a Japanese connected digits corpus and its evaluation scripts are designed in the same way as Aurora 2 with the help of European Telecommunications Standards Institute (ETSI) AURORA group. This paper describes the data collection, baseline scripts, and its baseline performance. We also propose a new performance analysis method that considers differences in recognition performance among speakers. This method is based on the word accuracy per speaker, revealing the degree of the individual difference of the recognition performance. We also propose categorization of modifications, applied to the original HTK baseline system, which helps in comparing the systems and in recognizing technologies that improve the performance best within the same category.
Wook SHIN Jong-Youl PARK Dong-Ik LEE
Current access control schemes judge the legality of each access based on immediate information, without considering the associated information hidden in a series of accesses. Because of this deficiency, access control systems cannot efficiently limit attacks that consist of ordinary operations. For the development of trusted operating systems, we extended RBAC and added negative procedural constraints to refuse such attacks. With the procedural constraints, the access control of trusted operating systems can discriminate attack attempts from normal behavior. This paper presents the specification of the extended concept and model, along with simple analysis results.
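The idea of a negative procedural constraint can be illustrated with a small sketch. It is not the authors' model: the class name `ProceduralMonitor`, the operation names, and the rule that a forbidden procedure is a contiguous run of operations in one session are all assumptions made for illustration.

```python
class ProceduralMonitor:
    """Hypothetical monitor: RBAC check plus forbidden operation sequences."""

    def __init__(self, permissions, forbidden_sequences):
        self.permissions = permissions            # role -> set of allowed operations
        self.forbidden = [tuple(s) for s in forbidden_sequences]
        self.history = []                         # operations granted so far

    def request(self, role, operation):
        # Ordinary RBAC decision based on immediate information only.
        if operation not in self.permissions.get(role, set()):
            return False
        # Negative procedural constraint: refuse the access if granting it
        # would complete a forbidden sequence (assumed contiguous here).
        candidate = tuple(self.history + [operation])
        for seq in self.forbidden:
            if candidate[-len(seq):] == seq:
                return False
        self.history.append(operation)
        return True


monitor = ProceduralMonitor(
    permissions={"user": {"read_passwd", "open_socket", "send"}},
    forbidden_sequences=[("read_passwd", "open_socket", "send")],
)
decisions = [monitor.request("user", op)
             for op in ("read_passwd", "open_socket", "send")]
```

Each operation in the example is individually permitted for the role; only the final step, which would complete the forbidden procedure, is refused.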
This paper describes the special demands of conversational speech in the context of corpus-based speech synthesis. The author proposed the CHATR system of prosody-based unit-selection for concatenative waveform synthesis seven years ago, and now extends this work to incorporate the results of an analysis of five years of recordings of spontaneous conversational speech in a wide range of actual daily-life situations. The paper proposes that the expression of affect (often translated as 'kansei' in Japanese) is the main factor differentiating laboratory speech from real-world conversational speech, and presents a framework for the specification of affect through differences in speaking style and voice quality. Having an enormous corpus of speech samples available for concatenation allows the selection of complete phrase-sized utterance segments, and changes the focus of unit selection from segmental or phonetic continuity to prosodic and discoursal appropriateness. Samples of the resulting large-corpus-based synthesis can be heard at http://feast.his.atr.jp/AESOP.
Weifeng LI Tetsuya SHINDE Hiroshi FUJIMURA Chiyomi MIYAJIMA Takanori NISHINO Katunobu ITOU Kazuya TAKEDA Fumitada ITAKURA
This paper describes a new multi-channel method of noisy speech recognition, which estimates the log spectrum of speech at a close-talking microphone based on the multiple regression of the log spectra (MRLS) of noisy signals captured by distributed microphones. The advantages of the proposed method are as follows: 1) the method requires neither a sensitive geometric layout, calibration of the sensors, nor additional pre-processing for tracking the speech source; 2) the system requires very little computation; and 3) the regression weights can be statistically optimized over the given training data. Once the optimal regression weights are obtained by regression learning, they can be utilized to generate the estimated log spectrum in the recognition phase, where the close-talking speech is no longer required. The performance of the proposed method is illustrated by speech recognition of real in-car dialogue data. In comparison to the nearest distant microphone and a multi-microphone adaptive beamformer, the proposed approach obtains relative word error rate (WER) reductions of 9.8% and 3.6%, respectively.
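The regression step described above can be sketched for a single frequency bin as an ordinary least-squares fit. This is a minimal illustration under assumptions, not the authors' implementation: the function names, the bias term, and the synthetic data are hypothetical, and the fit uses plain normal equations.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Raises ZeroDivisionError if the system is singular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_regression(distant, close):
    """Least-squares weights (bias first) for one frequency bin.
    distant: one list per frame holding the log-spectral value of each
    distant microphone; close: close-talking log-spectral value per frame."""
    rows = [[1.0] + list(frame) for frame in distant]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, close)) for i in range(k)]
    return solve_linear(ata, aty)


def estimate(weights, frame):
    """Estimate the close-talking log spectrum from distant microphones only."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], frame))


# Synthetic training data (assumed): two distant microphones, five frames.
train_distant = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 2.0], [0.5, 3.5]]
train_close = [0.5 + 0.3 * a + 0.7 * b for a, b in train_distant]
weights = fit_regression(train_distant, train_close)
estimated = estimate(weights, [2.5, 2.5])
```

Training needs the close-talking signal, but `estimate` uses only the distant microphones, which mirrors the recognition phase described in the abstract.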
Amaro LIMA Heiga ZEN Yoshihiko NANKAKU Keiichi TOKUDA Tadashi KITAMURA Fernando G. RESENDE
This paper presents an analysis of the applicability of Sparse Kernel Principal Component Analysis (SKPCA) to feature extraction in speech recognition, as well as a proposed approach that makes the SKPCA technique realizable for a large amount of training data, which is a usual situation in speech recognition systems. Although Kernel Principal Component Analysis (KPCA) has proved to be an efficient technique for speech recognition, it has the disadvantage of requiring training data reduction when the amount of data is excessively large. This data reduction is important to avoid computational infeasibility and/or an extremely high computational burden in the feature representation step of the training and test data evaluations. The standard approach to this data reduction is to randomly choose frames from the original data set, which does not necessarily provide a good statistical representation of the original data set. To solve this problem, a likelihood-related re-estimation procedure was applied to the KPCA framework, thus creating SKPCA, which nevertheless is not realizable for large training databases. The proposed approach consists of clustering the training data and applying an SKPCA-like data reduction technique to these clusters, generating reduced data clusters. These reduced data clusters are merged and reduced in a recursive procedure until just one cluster is obtained, making the SKPCA approach realizable for a large amount of training data. The experimental results show the efficiency of the SKPCA technique with the proposed approach over KPCA with the standard sparse solution using randomly chosen frames and over the standard feature extraction techniques.
Minkyu PARK Sangchul HAN Heeheon KIM Seongje CHO Yookun CHO
Multiprocessor architectures are becoming common in real-time systems as the workload of real-time systems increases. Recently, new deadline-based (EDF-based) multiprocessor scheduling algorithms have been devised, and comparative studies of their performance are necessary. In this paper, we compare EDZL, a hybrid of EDF and LLF, with other deadline-based scheduling algorithms such as EDF, EDF-US[m/(2m-1)], and fpEDF. We show that EDZL schedules all task sets schedulable by EDF. The experimental results show that the number of preemptions of EDZL is comparable to that of EDF and that the schedulable utilization bound of EDZL is higher than those of the other algorithms we consider.
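The difference between EDF and EDZL can be sketched with a small discrete-time simulation. This is an illustrative sketch under assumptions, not the experimental setup of the paper: jobs are one-shot `(release, execution, deadline)` tuples with integer times, and the function name `schedulable` is hypothetical. EDZL is modeled as EDF in which any job whose laxity has reached zero is promoted to the highest priority.

```python
def schedulable(jobs, m, policy):
    """Simulate m identical processors in unit time steps.
    jobs: list of (release, execution, deadline); policy: "edf" or "edzl".
    Returns True if every job finishes by its deadline."""
    remaining = [execution for (_, execution, _) in jobs]
    t = 0
    while any(r > 0 for r in remaining):
        ready = [i for i, (release, _, _) in enumerate(jobs)
                 if release <= t and remaining[i] > 0]

        def laxity(i):
            return jobs[i][2] - t - remaining[i]

        # Negative laxity means the job can no longer meet its deadline.
        if any(laxity(i) < 0 for i in ready):
            return False
        if policy == "edzl":
            # Zero-laxity jobs first, then earliest deadline.
            ready.sort(key=lambda i: (laxity(i) > 0, jobs[i][2]))
        else:
            ready.sort(key=lambda i: jobs[i][2])
        for i in ready[:m]:
            remaining[i] -= 1
        t += 1
    return True


# Three jobs released at time 0 on two processors (assumed example).
example_jobs = [(0, 1, 2), (0, 1, 2), (0, 3, 3)]
edf_ok = schedulable(example_jobs, 2, "edf")
edzl_ok = schedulable(example_jobs, 2, "edzl")
```

In this example EDF runs the two short jobs first and the long job, which had zero laxity from the start, misses its deadline; EDZL promotes the long job immediately and all three jobs finish in time.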
Hiroyoshi YAMAMOTO Yoshihiko NANKAKU Chiyomi MIYAJIMA Keiichi TOKUDA Tadashi KITAMURA
This paper investigates the parameter tying structures of a mixture of factor analyzers (MFA) and discriminative training of MFA for speaker identification. The parameters of factor loading matrices or diagonal matrices are shared among different mixtures of MFA. Then, minimum classification error (MCE) training is applied to the MFA parameters to enhance the discrimination ability. The result of a text-independent speaker identification experiment shows that MFA outperforms the conventional Gaussian mixture model (GMM) with diagonal or full covariance matrices and achieves the best performance when sharing the diagonal matrices, resulting in a relative gain of 26% over the GMM with diagonal covariance matrices. The improvement is especially significant when training data is sparse. The recognition performance is further improved by MCE training, with an additional gain of 3% error reduction.
Ian R. LANE Tatsuya KAWAHARA Tomoko MATSUI Satoshi NAKAMURA
An efficient, scalable speech recognition architecture combining topic detection and topic-dependent language modeling is proposed for multi-domain spoken language systems. In the proposed approach, the inferred topic is automatically detected from the user's utterance, and speech recognition is then performed by applying an appropriate topic-dependent language model. This approach enables users to freely switch between domains while maintaining high recognition accuracy. As topic detection is performed on a single utterance, detection errors may occur and propagate through the system. To improve robustness, a hierarchical back-off mechanism is introduced where detailed topic models are applied when topic detection is confident and wider models that cover multiple topics are applied in cases of uncertainty. The performance of the proposed architecture is evaluated when combined with two topic detection methods: unigram likelihood and SVMs (Support Vector Machines). On the ATR Basic Travel Expression Corpus, both methods provide a significant reduction in WER (9.7% and 10.3%, respectively) compared to a single language model system. Furthermore, recognition accuracy is comparable to performing decoding with all topic-dependent models in parallel, while the required computational cost is much reduced.
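The hierarchical back-off described above can be sketched as a simple selection rule. The following is a hypothetical illustration, not the authors' system: the topic names, the two-level hierarchy, the use of normalized scores as confidence, and the threshold value are all assumptions.

```python
def select_language_model(scores, parent, threshold):
    """Pick a language model for one utterance.
    scores: topic -> confidence in [0, 1] (assumed normalized to sum to 1)
    parent: topic -> name of the wider model that covers it
    threshold: minimum confidence required to commit to a model"""
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return best                      # confident: detailed topic model
    group = parent[best]
    group_confidence = sum(s for topic, s in scores.items()
                           if parent[topic] == group)
    if group_confidence >= threshold:
        return group                     # uncertain: wider multi-topic model
    return "general"                     # very uncertain: single general model


parent = {"hotel": "stay", "restaurant": "stay",
          "flight": "transport", "train": "transport"}
confident = select_language_model(
    {"hotel": 0.9, "restaurant": 0.05, "flight": 0.03, "train": 0.02},
    parent, threshold=0.6)
uncertain = select_language_model(
    {"hotel": 0.4, "restaurant": 0.35, "flight": 0.25, "train": 0.0},
    parent, threshold=0.6)
```

Backing off to a wider model trades some specificity for robustness when topic detection on a single utterance is unreliable.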
Takashi SAITO Masaharu SAKAMOTO
This paper presents a new framework for effectively creating VoiceFonts for speech synthesis. A VoiceFont in this paper represents a voice inventory aimed at generating personalized voices. Creating well-formed voice inventories is a time-consuming and laborious task. This has become a critical issue for speech synthesis systems that make an attempt to synthesize many high quality voice personalities. The framework we propose here aims to drastically reduce the burden with a twofold approach. First, in order to substantially enhance the accuracy and robustness of automatic speech segmentation, we introduce a multi-layered speech segmentation algorithm with a new measure of segmental reliability. Secondly, to minimize the amount of human intervention in the process of VoiceFont creation, we provide easy-to-use functions in a data viewer and compiler to facilitate checking and validation of the automatically extracted data. We conducted experiments to investigate the accuracy of the automatic speech segmentation, and its robustness to speaker and style variations. The results of the experiments on six speech corpora with a fairly large variation of speaking styles show that the speech segmentation algorithm is quite accurate and robust in extracting segments of both phonemes and accentual phrases. In addition, to subjectively evaluate VoiceFonts created by using the framework, we conducted a listening test for speaker recognizability. The results show that the voice personalities of synthesized speech generated by the VoiceFont-based speech synthesizer are fairly close to those of the donor speakers.
Hiromitsu BAN Chiyomi MIYAJIMA Katsunobu ITOU Kazuya TAKEDA Fumitada ITAKURA
Behavioral synchronization between speech and finger tapping provides a novel approach to improving speech recognition accuracy. We combine a sequence of finger tapping timings recorded alongside an utterance using two distinct methods: in the first method, HMM state transition probabilities at the word boundaries are controlled by the timing of the finger tapping; in the second, the probability (relative frequency) of the finger tapping is used as a 'feature' and combined with MFCC in an HMM recognition system. We evaluate these methods through connected digit recognition under different noise conditions (AURORA-2J). Leveraging the synchrony between speech and finger tapping provides a 46% relative improvement in connected digit recognition experiments.
Hirokazu TAKENOUCHI Tatsushi NAKAHARA Kiyoto TAKAHATA Ryo TAKAHASHI Hiroyuki SUZUKI
Asynchronous optical packet switching (OPS) is a promising solution to support the continuous growth of transmission capacity demand. It has been, however, quite difficult to implement key functions needed at the node of such networks with all-optical approaches. We have proposed a new optoelectronic system composed of a packet-by-packet optical clock-pulse generator (OCG), an all-optical serial-to-parallel converter (SPC), a photonic parallel-to-serial converter (PSC), and CMOS circuitry. The system makes it possible to carry out various required functions such as buffering (random access memory), optical packet compression/decompression, and optical label swapping for high-speed asynchronous optical packets.
Chee Seong GOH Sze Yun SET Kazuro KIKUCHI
We report tunable optical devices based on fiber Bragg gratings (FBGs), whose filtering characteristics are controlled by strain distributions. These devices include a widely wavelength tunable filter, a tunable group-velocity dispersion (GVD) compensator, a tunable dispersion slope (DS) compensator, and a variable-bandwidth optical add/drop multiplexer (OADM), which will play important roles for next-generation reconfigurable optical networks.
This paper analytically formulates both the optimal quantization noise allocation ratio and the coding gain of the two-dimensional morphological Haar wavelet transform. The two-dimensional morphological Haar wavelet transform has been proposed as a nonlinear wavelet transform and is anticipated to be applicable to nonlinear transform coding. To apply a transformation to transform coding, both the optimal quantization noise allocation ratio and the coding gain of the transformation should be derived beforehand, regardless of whether the transformation is linear or nonlinear. The derivation is crucial for the progress of nonlinear transform image coding with nonlinear wavelets because the two-dimensional morphological Haar wavelet is the most basic nonlinear wavelet. We derive both the optimal quantization noise allocation ratio and the coding gain of the two-dimensional morphological Haar wavelet transform by introducing appropriate approximations to handle the cumbersome nonlinear operator included in the transformation. Numerical experiments confirmed the validity of the formulations.
Yohei ITAYA Heiga ZEN Yoshihiko NANKAKU Chiyomi MIYAJIMA Keiichi TOKUDA Tadashi KITAMURA
This paper investigates the effectiveness of the DAEM (Deterministic Annealing EM) algorithm in acoustic modeling for speaker and speech recognition. Although the EM algorithm has been widely used to approximate the ML estimates, it has the problem of initialization dependence. To relax this problem, the DAEM algorithm has been proposed, and its effectiveness has been confirmed in small artificial tasks. In this paper, we apply the DAEM algorithm to practical speech recognition tasks: speaker recognition based on GMMs and continuous speech recognition based on HMMs. Experimental results show that the DAEM algorithm can improve the recognition performance compared to the standard EM algorithm with conventional initialization algorithms, especially in flat-start training for continuous speech recognition.
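The core change that deterministic annealing makes to EM is in the E-step, where the posteriors are raised to a temperature parameter before normalization. The sketch below shows this for a one-dimensional Gaussian mixture; the function name and the example parameters are assumptions, and the M-step and the annealing schedule used in the paper are not shown.

```python
import math


def tempered_posteriors(x, weights, means, variances, beta):
    """E-step posteriors for one observation x, tempered by beta.
    beta = 1 gives the standard EM posteriors; beta near 0 flattens them
    toward uniform, which weakens the influence of the initial parameters."""
    logs = []
    for w, mu, var in zip(weights, means, variances):
        log_joint = (math.log(w)
                     - 0.5 * math.log(2.0 * math.pi * var)
                     - (x - mu) ** 2 / (2.0 * var))
        logs.append(beta * log_joint)
    peak = max(logs)                      # subtract the max for stability
    exps = [math.exp(v - peak) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed two-component mixture; beta is raised gradually toward 1.
schedule = [0.1, 0.5, 1.0]
posteriors_by_beta = [
    tempered_posteriors(0.5, [0.5, 0.5], [0.0, 4.0], [1.0, 1.0], beta)
    for beta in schedule
]
```

As beta increases along the schedule, the posteriors move from nearly uniform assignments toward the sharper assignments of standard EM.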
Sang Gu KANG Doo Hyung WOO Hee Chul LEE
Transferring image information in analog form between the focal plane array (FPA) and the external electronics exposes it to disturbance from outside noise. Integrating an on-chip analog-to-digital (A/D) converter into the readout integrated circuit (ROIC) can eliminate the possibility of noise cross-talk. In addition, information can be transported more power-efficiently in the digital domain than in the analog domain. In designing an on-chip A/D converter for a cooled-type high-density infrared detector array, the most stringent requirements are power dissipation, number of bits, die area, and throughput. In this study, a pipelined A/D converter was adopted because it offers high operation speed with medium power consumption. A capacitor averaging technique and digital error correction were used to achieve high resolution by eliminating the errors caused by device mismatch. The readout circuit was fabricated using a 0.6 µm CMOS process for a 128 × 128 mid-wavelength infrared (MWIR) HgCdTe detector array. The fabricated circuit used a direct injection type input stage, so that the S/N ratio could be maximized by increasing the integration capacitor. The measured performance of the 14 b A/D converter exhibited 0.2 LSB differential non-linearity (DNL) and 4 LSB integral non-linearity (INL). The A/D converter operated at 1 MHz with 75 mW power dissipation at 5 V and occupied a die area of 5.6 mm². These results show that it performs well enough to be applied to cooled-type high-density infrared detector arrays.
Dae-Won KIM Young-il KIM Doheon LEE Kwang Hyung LEE
In this paper, conventional validity indexes are reviewed and the shortcomings of the fuzzy cluster validation index based on inter-cluster proximity are examined. Based on these considerations, a new cluster validity index is proposed for fuzzy partitions obtained from the fuzzy c-means algorithm. The proposed validity index is defined as the average value of the relative intersections of all possible pairs of fuzzy clusters in the system. It computes the overlap between two fuzzy clusters by considering the intersection of each data point in the overlap. The optimal number of clusters is obtained by minimizing the validity index with respect to c. Experiments in which the proposed validity index and several conventional validity indexes were applied to well known data sets highlight the superior qualities of the proposed index.
Hideki TODE Makoto WADA Kazuhiko KINOSHITA Toshihiro MASAKI Koso MURAKAMI
A flooding algorithm is an indispensable and fundamental network control mechanism for achieving tasks such as notifying all nodes of some information, transferring data with high reliability, getting some information from all nodes, or reserving a route by flooding messages in the network. In particular, the flooding algorithm is highly effective in heterogeneous and dynamic network environments such as so-called ubiquitous networks, whose topology is indefinite or changes dynamically and whose nodal functions may be simple and less intelligent. In practice, it is applied to grasp the network topology in a sensor network or an ad-hoc network, or to retrieve content information by mobile agent systems. A flooding algorithm has the advantages of robustness and optimality through parallel processing of messages. However, the flooding mechanism has a fundamental disadvantage: it causes message congestion in the network, and eventually increases the processing time until the flooding control is finished. In this paper, we propose and evaluate methods for producing a more efficient flooding algorithm by adopting the growth processes of primitive creatures, such as molds or microbes.
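The message overhead mentioned above can be seen in a sketch of plain flooding. This shows only the baseline mechanism, not the proposed growth-inspired methods; the function name `flood` and the four-node ring topology are assumptions made for illustration.

```python
from collections import deque


def flood(adjacency, source):
    """Baseline flooding: on first receipt, a node forwards the message to
    every neighbor except the one it came from. Returns the set of nodes
    reached and the total number of messages sent."""
    informed = {source}
    queue = deque((source, neighbor) for neighbor in adjacency[source])
    messages = 0
    while queue:
        sender, node = queue.popleft()
        messages += 1
        if node in informed:
            continue                      # duplicate copy, dropped
        informed.add(node)
        queue.extend((node, neighbor) for neighbor in adjacency[node]
                     if neighbor != sender)
    return informed, messages


# Four nodes connected in a ring (assumed topology).
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
reached, sent = flood(ring, 0)
```

Every node is reached, but more messages are sent than the three that a spanning tree of this ring would need, and the surplus grows with the number of redundant links.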
This paper proposes a novel multi-layer approach to fundamental frequency modeling for concatenative speech synthesis based on a statistical learning technique called additive models. We define an additive F0 contour model consisting of a long-term, intonational phrase-level component and a short-term, accentual phrase-level component, along with a least-squares error criterion that includes a regularization term. A backfitting algorithm derived from this error criterion estimates both components simultaneously by iteratively applying cubic spline smoothers. When this method is applied to a 7,000-utterance Japanese speech corpus, it achieves F0 RMS errors of 28.9 and 29.8 Hz on the training and test data, respectively, with corresponding correlation coefficients of 0.806 and 0.777. The automatically determined intonational and accentual phrase components turn out to behave smoothly, systematically, and intuitively under a variety of prosodic conditions.
Chiu-Ching TUAN Chen-Chau YANG
Model-based movement patterns play a crucial role in evaluating the performance of mobility-dependent Personal Communication Service (PCS) strategies. This study proposes a new normal walk model that represents the daily movement patterns of a mobile station (MS) in PCS networks more closely than a conventional random walk model. A drift angle θ in this model is applied to determine the relative direction in which an MS hands off in the next step, based on the concepts that most real trips follow the shortest path and that the directions of daily motion are mostly symmetric. Hence, θ is assumed to follow approximately a normal distribution with parameters µ set to 0° and σ in the range of 5° to 90°. Varying σ thus redistributes the probabilities associated with θ to make the normal mobility patterns more realistic than the random ones. Experimental results verify that the proposed normal walk is correct and valid for modeling an n-layer mesh cluster of PCS networks. Moreover, when σ = 79.5°, a normal walk can almost represent, and even replace, a random walk.
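The role of the drift angle can be sketched with a short simulation. This is a simplified illustration, not the model evaluated in the paper: it assumes movement on a continuous plane with unit steps rather than handoffs between cells of a mesh cluster, and the function name `normal_walk` is hypothetical.

```python
import math
import random


def normal_walk(steps, sigma_deg, seed=None):
    """Walk with unit steps where each step turns the current heading by a
    drift angle drawn from a normal distribution with mean 0 degrees and
    standard deviation sigma_deg degrees. Returns the list of positions."""
    rng = random.Random(seed)
    x, y, heading = 0.0, 0.0, 0.0
    path = [(x, y)]
    for _ in range(steps):
        heading += rng.gauss(0.0, sigma_deg)      # drift relative to last step
        x += math.cos(math.radians(heading))
        y += math.sin(math.radians(heading))
        path.append((x, y))
    return path


# Small sigma keeps the walk close to a straight (shortest) path;
# large sigma makes the direction of each step far less predictable.
straight_like = normal_walk(50, 5.0, seed=1)
random_like = normal_walk(50, 90.0, seed=1)
```

With a small σ the walker tends to keep its direction, which reflects the assumption that most trips follow the shortest path, while a large σ approaches the behavior of a random walk.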