The search functionality is under construction.

Keyword Search Result

[Keyword] Al(20498hit)

12541-12560hit(20498hit)

  • New Switching Control for Synchronous Rectifications in Low-Voltage Paralleled Converter System without Voltage and Current Fluctuations

    Hiroshi SHIMAMORI  Teruhiko KOHAMA  Tamotsu NINOMIYA  

     
    PAPER-Electronic Circuits

      Vol:
    E88-C No:3
      Page(s):
    395-402

    A paralleled converter system with synchronous rectifiers (SRs) causes several problems such as surge voltage, inhalation current and circulating current. Generally, the system stops operation of the SRs under light load to avoid these problems. However, large voltage fluctuations then occur in the output of the modules due to the forward voltage drop of the diode. The fluctuations cause serious faults in semiconductor devices working at very low voltage, such as CPUs and VLSI. Moreover, the voltage fluctuations generate unstable current fluctuations in the paralleled converter system with current-sharing control. This paper proposes new switching control methods for rectifiers to reduce the voltage and current fluctuations. The effectiveness of the proposed methods is confirmed by computer simulation and experimental results.

  • AURORA-2J: An Evaluation Framework for Japanese Noisy Speech Recognition

    Satoshi NAKAMURA  Kazuya TAKEDA  Kazumasa YAMAMOTO  Takeshi YAMADA  Shingo KUROIWA  Norihide KITAOKA  Takanobu NISHIURA  Akira SASOU  Mitsunori MIZUMACHI  Chiyomi MIYAJIMA  Masakiyo FUJIMOTO  Toshiki ENDO  

     
    PAPER-Speech Corpora and Related Topics

      Vol:
    E88-D No:3
      Page(s):
    535-544

    This paper introduces an evaluation framework for Japanese noisy speech recognition named AURORA-2J. Speech recognition systems must still be improved to be robust to noisy environments, but this improvement requires development of the standard evaluation corpus and assessment technologies. Recently, the Aurora 2, 3 and 4 corpora and their evaluation scenarios have had significant impact on noisy speech recognition research. The AURORA-2J is a Japanese connected digits corpus and its evaluation scripts are designed in the same way as Aurora 2 with the help of European Telecommunications Standards Institute (ETSI) AURORA group. This paper describes the data collection, baseline scripts, and its baseline performance. We also propose a new performance analysis method that considers differences in recognition performance among speakers. This method is based on the word accuracy per speaker, revealing the degree of the individual difference of the recognition performance. We also propose categorization of modifications, applied to the original HTK baseline system, which helps in comparing the systems and in recognizing technologies that improve the performance best within the same category.

  • Extended Role Based Access Control with Procedural Constraints for Trusted Operating Systems

    Wook SHIN  Jong-Youl PARK  Dong-Ik LEE  

     
    PAPER-Application Information Security

      Vol:
    E88-D No:3
      Page(s):
    619-627

    The current scheme of access control judges the legality of each access based on immediate information, without considering associated information hidden in a series of accesses. Due to this deficiency, access control systems do not efficiently limit attacks that consist of ordinary operations. For trusted operating system development, we extended RBAC and added negative procedural constraints to refuse those attacks. With the procedural constraints, the access control of trusted operating systems can discriminate attack trials from normal behaviors. This paper shows the specification of the extended concept and model, and presents simple analysis results.

  • Developments in Corpus-Based Speech Synthesis: Approaching Natural Conversational Speech

    Nick CAMPBELL  

     
    INVITED PAPER

      Vol:
    E88-D No:3
      Page(s):
    376-383

    This paper describes the special demands of conversational speech in the context of corpus-based speech synthesis. The author proposed the CHATR system of prosody-based unit-selection for concatenative waveform synthesis seven years ago, and now extends this work to incorporate the results of an analysis of five years of recordings of spontaneous conversational speech in a wide range of actual daily-life situations. The paper proposes that the expression of affect (often translated as 'kansei' in Japanese) is the main factor differentiating laboratory speech from real-world conversational speech, and presents a framework for the specification of affect through differences in speaking style and voice quality. Having an enormous corpus of speech samples available for concatenation allows the selection of complete phrase-sized utterance segments, and changes the focus of unit selection from segmental or phonetic continuity to one of prosodic and discoursal appropriateness instead. Samples of the resulting large-corpus-based synthesis can be heard at http://feast.his.atr.jp/AESOP.

  • Multiple Regression of Log Spectra for In-Car Speech Recognition Using Multiple Distributed Microphones

    Weifeng LI  Tetsuya SHINDE  Hiroshi FUJIMURA  Chiyomi MIYAJIMA  Takanori NISHINO  Katunobu ITOU  Kazuya TAKEDA  Fumitada ITAKURA  

     
    PAPER-Feature Extraction and Acoustic Modelings

      Vol:
    E88-D No:3
      Page(s):
    384-390

    This paper describes a new multi-channel method of noisy speech recognition, which estimates the log spectrum of speech at a close-talking microphone based on the multiple regression of the log spectra (MRLS) of noisy signals captured by distributed microphones. The advantages of the proposed method are as follows: 1) the method requires neither a sensitive geometric layout, nor calibration of the sensors, nor additional pre-processing for tracking the speech source; 2) the system requires very little computation; and 3) the regression weights can be statistically optimized over the given training data. Once the optimal regression weights are obtained by regression learning, they can be utilized to generate the estimated log spectrum in the recognition phase, where the close-talking speech is no longer required. The performance of the proposed method is illustrated by speech recognition of real in-car dialogue data. In comparison to the nearest distant microphone and a multi-microphone adaptive beamformer, the proposed approach obtains relative word error rate (WER) reductions of 9.8% and 3.6%, respectively.
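The regression step described in this abstract can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit for a single frequency bin with an added bias term; the function names, the bias term, and the toy data are illustrative assumptions and are not taken from the paper.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_regression_weights(distant_frames, close_values):
    """Least-squares weights for one frequency bin (hypothetical helper).

    distant_frames: one list per frame with the log-spectral value at each
    distant microphone. close_values: the close-talking log-spectral value
    per frame. Returns one weight per microphone followed by a bias.
    """
    rows = [frame + [1.0] for frame in distant_frames]  # bias is an assumption
    d = len(rows[0])
    # Normal equations: (A^T A) w = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    aty = [sum(r[i] * y for r, y in zip(rows, close_values)) for i in range(d)]
    return solve_linear(ata, aty)


def estimate_close_talking(weights, frame):
    """Apply trained weights to one frame; no close-talking signal needed."""
    return sum(w * x for w, x in zip(weights, frame)) + weights[-1]
```

In the recognition phase only `estimate_close_talking` would be called, which mirrors the abstract's point that the close-talking signal is needed for training only.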

  • Applying Sparse KPCA for Feature Extraction in Speech Recognition

    Amaro LIMA  Heiga ZEN  Yoshihiko NANKAKU  Keiichi TOKUDA  Tadashi KITAMURA  Fernando G. RESENDE  

     
    PAPER-Feature Extraction and Acoustic Modelings

      Vol:
    E88-D No:3
      Page(s):
    401-409

    This paper presents an analysis of the applicability of Sparse Kernel Principal Component Analysis (SKPCA) for feature extraction in speech recognition, as well as a proposed approach to make the SKPCA technique realizable for a large amount of training data, which is a usual context in speech recognition systems. Although KPCA (Kernel Principal Component Analysis) has proved to be an efficient technique for speech recognition, it has the disadvantage of requiring training data reduction when the amount of data is excessively large. This data reduction is important to avoid computational infeasibility and/or an extremely high computational burden related to the feature representation step of the training and the test data evaluations. The standard approach to perform this data reduction is to randomly choose frames from the original data set, which does not necessarily provide a good statistical representation of the original data set. In order to solve this problem, a likelihood-related re-estimation procedure was applied to the KPCA framework, thus creating the SKPCA, which nevertheless is not realizable for large training databases. The proposed approach consists of clustering the training data and applying to these clusters an SKPCA-like data reduction technique, generating the reduced data clusters. These reduced data clusters are merged and reduced in a recursive procedure until just one cluster is obtained, making the SKPCA approach realizable for a large amount of training data. The experimental results show the efficiency of the SKPCA technique with the proposed approach over the KPCA with the standard sparse solution using randomly chosen frames and over the standard feature extraction techniques.

  • Comparison of Deadline-Based Scheduling Algorithms for Periodic Real-Time Tasks on Multiprocessor

    Minkyu PARK  Sangchul HAN  Heeheon KIM  Seongje CHO  Yookun CHO  

     
    LETTER-System Programs

      Vol:
    E88-D No:3
      Page(s):
    658-661

    Multiprocessor architectures are becoming common in real-time systems as the workload of real-time systems increases. Recently, new deadline-based (EDF-based) multiprocessor scheduling algorithms have been devised, and comparative studies on the performance of these algorithms are necessary. In this paper, we compare EDZL, a hybrid of EDF and LLF, with other deadline-based scheduling algorithms such as EDF, EDF-US[m/(2m-1)], and fpEDF. We show that EDZL schedules all task sets schedulable by EDF. The experimental results show that the number of preemptions of EDZL is comparable to that of EDF and that the schedulable utilization bound of EDZL is higher than those of the other algorithms we consider.
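As a rough illustration of how a hybrid of EDF and LLF can choose jobs, the sketch below gives top priority to any job whose laxity (deadline minus current time minus remaining execution time) has reached zero, and orders the rest by earliest deadline. The `Job` class, the function names, and the toy job set are hypothetical; this is a simplified selection rule, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    deadline: float   # absolute deadline
    remaining: float  # remaining execution time


def laxity(job, now):
    """Slack left before the job must run continuously to meet its deadline."""
    return job.deadline - now - job.remaining


def edzl_pick(jobs, now, processors):
    """Pick up to `processors` jobs: zero-laxity jobs first, then EDF order."""
    ordered = sorted(jobs, key=lambda j: (laxity(j, now) > 0, j.deadline))
    return [j.name for j in ordered[:processors]]


def edf_pick(jobs, now, processors):
    """Plain global EDF selection, shown for comparison."""
    ordered = sorted(jobs, key=lambda j: j.deadline)
    return [j.name for j in ordered[:processors]]
```

With jobs A (deadline 10, remaining 2), B (deadline 12, remaining 3) and C (deadline 20, remaining 20) on two processors at time 0, plain EDF selects A and B, so C would miss its deadline, while the zero-laxity rule selects C first.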

  • Parameter Sharing in Mixture of Factor Analyzers for Speaker Identification

    Hiroyoshi YAMAMOTO  Yoshihiko NANKAKU  Chiyomi MIYAJIMA  Keiichi TOKUDA  Tadashi KITAMURA  

     
    PAPER-Feature Extraction and Acoustic Modelings

      Vol:
    E88-D No:3
      Page(s):
    418-424

    This paper investigates the parameter tying structures of a mixture of factor analyzers (MFA) and discriminative training of MFA for speaker identification. The parameters of factor loading matrices or diagonal matrices are shared in different mixtures of MFA. Then, minimum classification error (MCE) training is applied to the MFA parameters to enhance the discrimination ability. The result of a text-independent speaker identification experiment shows that MFA outperforms the conventional Gaussian mixture model (GMM) with diagonal or full covariance matrices and achieves the best performance when sharing the diagonal matrices, resulting in a relative gain of 26% over the GMM with diagonal covariance matrices. The improvement is especially significant under sparse training data conditions. The recognition performance is further improved by MCE training, with an additional gain of 3% error reduction.

  • Dialogue Speech Recognition by Combining Hierarchical Topic Classification and Language Model Switching

    Ian R. LANE  Tatsuya KAWAHARA  Tomoko MATSUI  Satoshi NAKAMURA  

     
    PAPER-Spoken Language Systems

      Vol:
    E88-D No:3
      Page(s):
    446-454

    An efficient, scalable speech recognition architecture combining topic detection and topic-dependent language modeling is proposed for multi-domain spoken language systems. In the proposed approach, the inferred topic is automatically detected from the user's utterance, and speech recognition is then performed by applying an appropriate topic-dependent language model. This approach enables users to freely switch between domains while maintaining high recognition accuracy. As topic detection is performed on a single utterance, detection errors may occur and propagate through the system. To improve robustness, a hierarchical back-off mechanism is introduced where detailed topic models are applied when topic detection is confident and wider models that cover multiple topics are applied in cases of uncertainty. The performance of the proposed architecture is evaluated when combined with two topic detection methods: unigram likelihood and SVMs (Support Vector Machines). On the ATR Basic Travel Expression Corpus, both methods provide a significant reduction in WER (9.7% and 10.3%, respectively) compared to a single language model system. Furthermore, recognition accuracy is comparable to performing decoding with all topic-dependent models in parallel, while the required computational cost is much reduced.
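The hierarchical back-off idea in this abstract can be sketched as follows. The sketch assumes that the confidence of a broader topic node is the sum of the confidences of the leaf topics beneath it, and that back-off stops at the first node whose confidence reaches a threshold; the function names, the topic hierarchy, and the scores are all illustrative assumptions rather than details from the paper.

```python
def is_under(leaf, node, parent):
    """True if `node` is `leaf` itself or one of its ancestors."""
    current = leaf
    while current is not None:
        if current == node:
            return True
        current = parent.get(current)
    return False


def select_model(leaf_scores, parent, threshold):
    """Return the most specific topic node whose confidence meets the threshold.

    leaf_scores: detection confidence per detailed topic.
    parent: maps each node to its broader topic (None at the root).
    """
    node = max(leaf_scores, key=leaf_scores.get)
    confidence = leaf_scores[node]
    while confidence < threshold and parent.get(node) is not None:
        node = parent[node]  # back off to a wider language model
        confidence = sum(
            score for leaf, score in leaf_scores.items()
            if is_under(leaf, node, parent)
        )
    return node
```

A confident detection keeps the detailed topic model, while an ambiguous one falls back to a model covering several topics, which is the robustness mechanism the abstract describes.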

  • A VoiceFont Creation Framework for Generating Personalized Voices

    Takashi SAITO  Masaharu SAKAMOTO  

     
    PAPER-Speech Synthesis and Prosody

      Vol:
    E88-D No:3
      Page(s):
    525-534

    This paper presents a new framework for effectively creating VoiceFonts for speech synthesis. A VoiceFont in this paper represents a voice inventory aimed at generating personalized voices. Creating well-formed voice inventories is a time-consuming and laborious task. This has become a critical issue for speech synthesis systems that make an attempt to synthesize many high quality voice personalities. The framework we propose here aims to drastically reduce the burden with a twofold approach. First, in order to substantially enhance the accuracy and robustness of automatic speech segmentation, we introduce a multi-layered speech segmentation algorithm with a new measure of segmental reliability. Secondly, to minimize the amount of human intervention in the process of VoiceFont creation, we provide easy-to-use functions in a data viewer and compiler to facilitate checking and validation of the automatically extracted data. We conducted experiments to investigate the accuracy of the automatic speech segmentation, and its robustness to speaker and style variations. The results of the experiments on six speech corpora with a fairly large variation of speaking styles show that the speech segmentation algorithm is quite accurate and robust in extracting segments of both phonemes and accentual phrases. In addition, to subjectively evaluate VoiceFonts created by using the framework, we conducted a listening test for speaker recognizability. The results show that the voice personalities of synthesized speech generated by the VoiceFont-based speech synthesizer are fairly close to those of the donor speakers.

  • Speech Recognition Using Finger Tapping Timings

    Hiromitsu BAN  Chiyomi MIYAJIMA  Katsunobu ITOU  Kazuya TAKEDA  Fumitada ITAKURA  

     
    LETTER-Speech and Hearing

      Vol:
    E88-D No:3
      Page(s):
    667-670

    Behavioral synchronization between speech and finger tapping provides a novel approach to improving speech recognition accuracy. We combine a sequence of finger tapping timings recorded alongside an utterance using two distinct methods: in the first method, HMM state transition probabilities at the word boundaries are controlled by the timing of the finger tapping; in the second, the probability (relative frequency) of the finger tapping is used as a 'feature' and combined with MFCC in a HMM recognition system. We evaluate these methods through connected digit recognition under different noise conditions (AURORA-2J). Leveraging the synchrony between speech and finger tapping provides a 46% relative improvement in connected digit recognition experiments.

  • High-Speed Optical Packet Processing Technologies for Optical Packet-Switched Networks

    Hirokazu TAKENOUCHI  Tatsushi NAKAHARA  Kiyoto TAKAHATA  Ryo TAKAHASHI  Hiroyuki SUZUKI  

     
    INVITED PAPER

      Vol:
    E88-C No:3
      Page(s):
    286-294

    Asynchronous optical packet switching (OPS) is a promising solution to support the continuous growth of transmission capacity demand. It has been, however, quite difficult to implement key functions needed at the node of such networks with all-optical approaches. We have proposed a new optoelectronic system composed of a packet-by-packet optical clock-pulse generator (OCG), an all-optical serial-to-parallel converter (SPC), a photonic parallel-to-serial converter (PSC), and CMOS circuitry. The system makes it possible to carry out various required functions such as buffering (random access memory), optical packet compression/decompression, and optical label swapping for high-speed asynchronous optical packets.

  • Spectrum Tuning of Fiber Bragg Gratings by Strain Distributions and Its Applications

    Chee Seong GOH  Sze Yun SET  Kazuro KIKUCHI  

     
    PAPER

      Vol:
    E88-C No:3
      Page(s):
    363-371

    We report tunable optical devices based on fiber Bragg gratings (FBGs), whose filtering characteristics are controlled by strain distributions. These devices include a widely wavelength tunable filter, a tunable group-velocity dispersion (GVD) compensator, a tunable dispersion slope (DS) compensator, and a variable-bandwidth optical add/drop multiplexer (OADM), which will play important roles for next-generation reconfigurable optical networks.

  • Optimal Quantization Noise Allocation and Coding Gain in Transform Coding with Two-Dimensional Morphological Haar Wavelet

    Yasunari YOKOTA  Xiaoyong TAN  

     
    PAPER-Image Processing and Video Processing

      Vol:
    E88-D No:3
      Page(s):
    636-645

    This paper analytically formulates both the optimal quantization noise allocation ratio and the coding gain of the two-dimensional morphological Haar wavelet transform. The two-dimensional morphological Haar wavelet transform has been proposed as a nonlinear wavelet transform and is expected to be applied to nonlinear transform coding. To use a transformation in transform coding, both the optimal quantization noise allocation ratio and the coding gain of the transformation should be derived beforehand, regardless of whether the transformation is linear or nonlinear. The derivation is crucial for progress in nonlinear transform image coding with nonlinear wavelets because the two-dimensional morphological Haar wavelet is the most basic nonlinear wavelet. We derive both the optimal quantization noise allocation ratio and the coding gain of the two-dimensional morphological Haar wavelet transform by introducing appropriate approximations to handle the cumbersome nonlinear operator included in the transformation. Numerical experiments confirmed the validity of the formulations.

  • Deterministic Annealing EM Algorithm in Acoustic Modeling for Speaker and Speech Recognition

    Yohei ITAYA  Heiga ZEN  Yoshihiko NANKAKU  Chiyomi MIYAJIMA  Keiichi TOKUDA  Tadashi KITAMURA  

     
    PAPER-Feature Extraction and Acoustic Modelings

      Vol:
    E88-D No:3
      Page(s):
    425-431

    This paper investigates the effectiveness of the DAEM (Deterministic Annealing EM) algorithm in acoustic modeling for speaker and speech recognition. Although the EM algorithm has been widely used to approximate the ML estimates, it has the problem of initialization dependence. To relax this problem, the DAEM algorithm has been proposed, and its effectiveness has been confirmed in small artificial tasks. In this paper, we applied the DAEM algorithm to practical speech recognition tasks: speaker recognition based on GMMs and continuous speech recognition based on HMMs. Experimental results show that the DAEM algorithm can improve the recognition performance as compared to the standard EM algorithm with conventional initialization algorithms, especially in flat start training for continuous speech recognition.
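To show why deterministic annealing reduces initialization dependence, the sketch below computes the tempered E-step posterior commonly used in DAEM, where each component's joint probability is raised to a power beta between 0 and 1 before normalization. The function name and the numbers are illustrative assumptions; the annealing schedule and model details used in the paper are not reproduced here.

```python
def tempered_posteriors(weights, likelihoods, beta):
    """E-step responsibilities with temperature parameter beta.

    weights: mixture weights per component.
    likelihoods: p(x | component) for one observation.
    beta = 1 gives the standard EM posterior; beta near 0 flattens the
    posterior towards uniform, so early iterations depend less on the
    initial parameters.
    """
    powered = [(w * p) ** beta for w, p in zip(weights, likelihoods)]
    total = sum(powered)
    return [value / total for value in powered]
```

In a full training loop, beta would be raised gradually towards 1, with EM iterated to convergence at each value.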

  • Optimum Solution of On-Chip A/D Converter for Cooled Type Infrared Focal Plane Array

    Sang Gu KANG  Doo Hyung WOO  Hee Chul LEE  

     
    PAPER-Electronic Circuits

      Vol:
    E88-C No:3
      Page(s):
    413-419

    Transferring the image information in analog form between the focal plane array (FPA) and the external electronics exposes it to disturbance from outside noise. Integrating an on-chip analog-to-digital (A/D) converter into the readout integrated circuit (ROIC) can eliminate the possibility of noise cross-talk. Also, the information can be transported more power-efficiently in the digital domain than in the analog domain. In designing an on-chip A/D converter for a cooled type high density infrared detector array, the most stringent requirements are power dissipation, number of bits, die area and throughput. In this study, a pipelined type A/D converter was adopted because it has high operation speed characteristics with medium power consumption. A capacitor averaging technique and digital error correction for high resolution were used to eliminate the error caused by device mismatch. The readout circuit was fabricated using a 0.6 µm CMOS process for a 128 × 128 mid-wavelength infrared (MWIR) HgCdTe detector array. The fabricated circuit used a direct injection type input stage, so the S/N ratio could be maximized by increasing the integration capacitor. The measured performance of the 14 b A/D converter exhibited 0.2 LSB differential non-linearity (DNL) and 4 LSB integral non-linearity (INL). The A/D converter had a 1 MHz operation speed with 75 mW power dissipation at 5 V. It occupied a die area of 5.6 mm². It showed good performance suitable for cooled type high density infrared detector arrays.

  • Assessing the Quality of Fuzzy Partitions Using Relative Intersection

    Dae-Won KIM  Young-il KIM  Doheon LEE  Kwang Hyung LEE  

     
    PAPER-Computation and Computational Models

      Vol:
    E88-D No:3
      Page(s):
    594-602

    In this paper, conventional validity indexes are reviewed and the shortcomings of the fuzzy cluster validation index based on inter-cluster proximity are examined. Based on these considerations, a new cluster validity index is proposed for fuzzy partitions obtained from the fuzzy c-means algorithm. The proposed validity index is defined as the average value of the relative intersections of all possible pairs of fuzzy clusters in the system. It computes the overlap between two fuzzy clusters by considering the intersection of each data point in the overlap. The optimal number of clusters is obtained by minimizing the validity index with respect to c. Experiments in which the proposed validity index and several conventional validity indexes were applied to well known data sets highlight the superior qualities of the proposed index.

  • Enhanced Flooding Algorithms Introducing the Concept of Biotic Growth

    Hideki TODE  Makoto WADA  Kazuhiko KINOSHITA  Toshihiro MASAKI  Koso MURAKAMI  

     
    PAPER-Software Platform Technologies

      Vol:
    E88-B No:3
      Page(s):
    903-910

    A flooding algorithm is an indispensable and fundamental network control mechanism for achieving tasks such as notifying all nodes of some information, transferring data with high reliability, getting some information from all nodes, or reserving a route by flooding messages in the network. In particular, the flooding algorithm is highly effective in heterogeneous and dynamic network environments such as so-called ubiquitous networks, whose topology is indefinite or changes dynamically and whose nodal function may be simple and less intelligent. It is in fact applied to grasp the network topology in a sensor network or an ad-hoc network, or to retrieve content information by mobile agent systems. A flooding algorithm has the advantages of robustness and optimality through parallel processing of messages. However, the flooding mechanism has a fundamental disadvantage: it causes message congestion in the network, and eventually increases the processing time until the flooding control is finished. In this paper, we propose and evaluate methods for producing a more efficient flooding algorithm by adopting the growth processes of primitive creatures, such as molds or microbes.

  • Fundamental Frequency Modeling for Speech Synthesis Based on a Statistical Learning Technique

    Shinsuke SAKAI  

     
    PAPER-Speech Synthesis and Prosody

      Vol:
    E88-D No:3
      Page(s):
    489-495

    This paper proposes a novel multi-layer approach to fundamental frequency modeling for concatenative speech synthesis based on a statistical learning technique called additive models. We define an additive F0 contour model consisting of a long-term, intonational phrase-level component and a short-term, accentual phrase-level component, along with a least-squares error criterion that includes a regularization term. A backfitting algorithm derived from this error criterion estimates both components simultaneously by iteratively applying cubic spline smoothers. When this method is applied to a 7,000 utterance Japanese speech corpus, it achieves F0 RMS errors of 28.9 and 29.8 Hz on the training and test data, respectively, with corresponding correlation coefficients of 0.806 and 0.777. The automatically determined intonational and accentual phrase components turn out to behave smoothly, systematically, and intuitively under a variety of prosodic conditions.

  • A Compact Normal Walk Modeling in PCS Networks with Mesh Cells

    Chiu-Ching TUAN  Chen-Chau YANG  

     
    PAPER-Mobile Information Network and Personal Communications

      Vol:
    E88-A No:3
      Page(s):
    761-769

    Model-based movement patterns play a crucial role in evaluating the performance of mobility-dependent Personal Communication Service (PCS) strategies. This study proposes a new normal walk model that represents the daily movement patterns of a mobile station (MS) in PCS networks more closely than a conventional random walk model. A drift angle θ in this model is applied to determine the relative direction in which an MS hands off in the next step, based on the concepts that most real trips follow the shortest path and that the directions of daily motion are mostly symmetric. Hence, θ is assumed to approach the normal distribution with the parameters µ set to 0° and σ in the range of 5° to 90°. Varying σ thus redistributes the probabilities associated with θ to make the normal mobility patterns more realistic than the random ones. Experimental results verify that the proposed normal walk is correct and valid for modeling an n-layer mesh cluster of PCS networks. Moreover, when σ = 79.5°, a normal walk can almost represent, and even replace, a random walk.
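The role of the drift angle can be illustrated with a small simulation sketch. It assumes that θ is drawn from a normal distribution with mean 0 and standard deviation σ in degrees, added to the current heading, and then snapped to the nearest of four mesh-cell directions; the snapping rule, the direction encoding, and the function name are assumptions made for illustration and may differ from the paper's model.

```python
import random

# Four neighbour directions of a mesh cell, in degrees (assumed encoding).
DIRECTIONS = [0, 90, 180, 270]


def next_direction(heading_deg, sigma_deg, rng):
    """Sample the direction of the next handoff for one step of the walk."""
    theta = rng.gauss(0.0, sigma_deg)        # drift angle, mean 0
    target = (heading_deg + theta) % 360.0   # new heading in [0, 360)

    def circular_distance(direction):
        diff = abs(target - direction)
        return min(diff, 360.0 - diff)

    return min(DIRECTIONS, key=circular_distance)
```

A small σ keeps the station moving almost straight ahead, while a large σ spreads the probability over all neighbours, which is how the model can approach a random walk.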
