The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] SPE(2504hit)

1121-1140hit(2504hit)

  • The Optimal Architecture Design of Two-Dimension Matrix Multiplication Jumping Systolic Array

    Yun YANG  Shinji KIMURA  

     
    PAPER

      Vol:
    E91-A No:4
      Page(s):
    1101-1111

    This paper proposes an efficient systolic array construction method for optimal planar systolic design of the matrix multiplication. By connection network adjustment among systolic array processing element (PE), the input/output data are jumping in the systolic array for multiplication operation requirements. Various 2-D systolic array topologies, such as square topology and hexagonal topology, have been studied to construct appropriate systolic array configuration and realize high performance matrix multiplication. Based on traditional Kung-Leiserson systolic architecture, the proposed "Jumping Systolic Array (JSA)" algorithm can increase the matrix multiplication speed with less processing elements and few data registers attachment. New systolic arrays, such as square jumping array, redundant dummy latency jumping hexagonal array, and compact parallel flow jumping hexagonal array, are also proposed to improve the concurrent system operation efficiency. Experimental results prove that the JSA algorithm can realize fully concurrent operation and dominate other systolic architectures in the specific systolic array system characteristics, such as band width, matrix complexity, or expansion capability.

  • Hardware Neural Network for a Visual Inspection System

    Seungwoo CHUN  Yoshihiro HAYAKAWA  Koji NAKAJIMA  

     
    PAPER

      Vol:
    E91-A No:4
      Page(s):
    935-942

    The visual inspection of defects in products is heavily dependent on human experience and instinct. In this situation, it is difficult to reduce the production costs and to shorten the inspection time and hence the total process time. Consequently people involved in this area desire an automatic inspection system. In this paper, we propose a hardware neural network, which is expected to provide high-speed operation for automatic inspection of products. Since neural networks can learn, this is a suitable method for self-adjustment of criteria for classification. To achieve high-speed operation, we use parallel and pipelining techniques. Furthermore, we use a piecewise linear function instead of a conventional activation function in order to save hardware resources. Consequently, our proposed hardware neural network achieved 6GCPS and 2GCUPS, which in our test sample proved to be sufficiently fast.

  • Fundamental Consideration on Distance Estimation Using Acoustical Standing Wave

    Noboru NAKASAKO  Tetsuji UEBO  Atsushi MORI  Norimitsu OHMATA  

     
    LETTER-Engineering Acoustics

      Vol:
    E91-A No:4
      Page(s):
    1218-1221

    In the research field of microwave radar, a range finding method based on standing wave is known to be effective for measuring short distances. In this paper, we focus our attention on audible sound and fundamentally examine the distance estimation method in which acoustical standing wave is used.

  • A High-Speed Design of Montgomery Multiplier

    Yibo FAN  Takeshi IKENAGA  Satoshi GOTO  

     
    PAPER

      Vol:
    E91-A No:4
      Page(s):
    971-977

    With the increase of key length used in public cryptographic algorithms such as RSA and ECC, the speed of Montgomery multiplication becomes a bottleneck. This paper proposes a high speed design of Montgomery multiplier. Firstly, a modified scalable high-radix Montgomery algorithm is proposed to reduce critical path. Secondly, a high-radix clock-saving dataflow is proposed to support high-radix operation and one clock cycle delay in dataflow. Finally, a hardware-reused architecture is proposed to reduce the hardware cost and a parallel radix-16 design of data path is proposed to accelerate the speed. By using HHNEC 0.25 µm standard cell library, the implementation results show that the total cost of Montgomery multiplier is 130 KGates, the clock frequency is 180 MHz and the throughput of 1024-bit RSA encryption is 352 kbps. This design is suitable to be used in high speed RSA or ECC encryption/decryption. As a scalable design, it supports any key-length encryption/decryption up to the size of on-chip memory.

  • Optimum Pulse Shape Design for UWB Systems with Timing Jitter

    Wilaiporn LEE  Suwich KUNARUTTANAPRUK  Somchai JITAPUNKUL  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E91-B No:3
      Page(s):
    772-783

    This paper proposes a novel technique in designing the optimum pulse shape for ultra wideband (UWB) systems under the presence of timing jitter. In the UWB systems, pulse transmission power and timing jitter tolerance are crucial keys to communications success. While there is a strong desire to maximize both of them, one must be traded off against the other. In the literature, much effort has been devoted to separately optimize each of them without considering the drawback to the other. In this paper, both factors are jointly considered. The proposed pulse attains the adequate power to survive the noise floor and at the same time provides good resistance to the timing jitter. The proposed pulse also meets the power spectral mask restriction as prescribed by the Federal Communications Commission (FCC) for indoor UWB systems. Simulation results confirm the advantages of the proposed pulse over other previously known UWB pulses. Parameters of the proposed optimization algorithm are also investigated in this paper.

  • Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method

    Goshu NAGINO  Makoto SHOZAKAI  Tomoki TODA  Hiroshi SARUWATARI  Kiyohiro SHIKANO  

     
    PAPER-Corpus

      Vol:
    E91-D No:3
      Page(s):
    607-614

    This paper proposes a technique for building an effective speech corpus with lower cost by utilizing a statistical multidimensional scaling method. The statistical multidimensional scaling method visualizes multiple HMM acoustic models into two-dimensional space. At first, a small number of voice samples per speaker is collected; speaker adapted acoustic models trained with collected utterances, are mapped into two-dimensional space by utilizing the statistical multidimensional scaling method. Next, speakers located in the periphery of the distribution, in a plotted map are selected; a speech corpus is built by collecting enough voice samples for the selected speakers. In an experiment for building an isolated-word speech corpus, the performance of an acoustic model trained with 200 selected speakers was equivalent to that of an acoustic model trained with 533 non-selected speakers. It means that a cost reduction of more than 62% was achieved. In an experiment for building a continuous word speech corpus, the performance of an acoustic model trained with 500 selected speakers was equivalent to that of an acoustic model trained with 1179 non-selected speakers. It means that a cost reduction of more than 57% was achieved.

  • Feature Compensation Employing Multiple Environmental Models for Robust In-Vehicle Speech Recognition

    Wooil KIM  John H.L. HANSEN  

     
    PAPER-Noisy Speech Recognition

      Vol:
    E91-D No:3
      Page(s):
    430-438

    An effective feature compensation method is developed for reliable speech recognition in real-life in-vehicle environments. The CU-Move corpus, used for evaluation, contains a range of speech and noise signals collected for a number of speakers under actual driving conditions. PCGMM-based feature compensation, considered in this paper, utilizes parallel model combination to generate noise-corrupted speech model by combining clean speech and the noise model. In order to address unknown time-varying background noise, an interpolation method of multiple environmental models is employed. To alleviate computational expenses due to multiple models, an Environment Transition Model is employed, which is motivated from Noise Language Model used in Environmental Sniffing. An environment dependent scheme of mixture sharing technique is proposed and shown to be more effective in reducing the computational complexity. A smaller environmental model set is determined by the environment transition model for mixture sharing. The proposed scheme is evaluated on the connected single digits portion of the CU-Move database using the Aurora2 evaluation toolkit. Experimental results indicate that our feature compensation method is effective for improving speech recognition in real-life in-vehicle conditions. A reduction of 73.10% of the computational requirements was obtained by employing the environment dependent mixture sharing scheme with only a slight change in recognition performance. This demonstrates that the proposed method is effective in maintaining the distinctive characteristics among the different environmental models, even when selecting a large number of Gaussian components for mixture sharing.

  • Noisy Speech Recognition Based on Integration/Selection of Multiple Noise Suppression Methods Using Noise GMMs

    Norihide KITAOKA  Souta HAMAGUCHI  Seiichi NAKAGAWA  

     
    PAPER-Noisy Speech Recognition

      Vol:
    E91-D No:3
      Page(s):
    411-421

    To achieve high recognition performance for a wide variety of noise and for a wide range of signal-to-noise ratio, this paper presents methods for integration of four noise reduction algorithms: spectral subtraction with smoothing of time direction, temporal domain SVD-based speech enhancement, GMM-based speech estimation and KLT-based comb-filtering. In this paper, we proposed two types of combination methods of noise suppression algorithms: selection of front-end processor and combination of results from multiple recognition processes. Recognition results on the CENSREC-1 task showed the effectiveness of our proposed methods.

  • Local Peak Enhancement for In-Car Speech Recognition in Noisy Environment

    Osamu ICHIKAWA  Takashi FUKUDA  Masafumi NISHIMURA  

     
    LETTER

      Vol:
    E91-D No:3
      Page(s):
    635-639

    The accuracy of automatic speech recognition in a car is significantly degraded in a very low SNR (Signal to Noise Ratio) situation such as "Fan high" or "Window open". In such cases, speech signals are often buried in broadband noise. Although several existing noise reduction algorithms are known to improve the accuracy, other approaches that can work with them are still required for further improvement. One of the candidates is enhancement of the harmonic structures in human voices. However, most conventional approaches are based on comb filtering, and it is difficult to use them in practical situations, because their assumptions for F0 detection and for voiced/unvoiced detection are not accurate enough in realistic noisy environments. In this paper, we propose a new approach that does not rely on such detection. An observed power spectrum is directly converted into a filter for speech enhancement, by retaining only the local peaks considered to be harmonic structures in the human voice. In our experiments, this approach reduced the word error rate by 17% in realistic automobile environments. Also, it showed further improvement when used with existing noise reduction methods.

  • Canonicalization of Feature Parameters for Robust Speech Recognition Based on Distinctive Phonetic Feature (DPF) Vectors

    Mohammad NURUL HUDA  Muhammad GHULAM  Takashi FUKUDA  Kouichi KATSURADA  Tsuneo NITTA  

     
    PAPER-Feature Extraction

      Vol:
    E91-D No:3
      Page(s):
    488-498

    This paper describes a robust automatic speech recognition (ASR) system with less computation. Acoustic models of a hidden Markov model (HMM)-based classifier include various types of hidden factors such as speaker-specific characteristics, coarticulation, and an acoustic environment, etc. If there exists a canonicalization process that can recover the degraded margin of acoustic likelihoods between correct phonemes and other ones caused by hidden factors, the robustness of ASR systems can be improved. In this paper, we introduce a canonicalization method that is composed of multiple distinctive phonetic feature (DPF) extractors corresponding to each hidden factor canonicalization, and a DPF selector which selects an optimum DPF vector as an input of the HMM-based classifier. The proposed method resolves gender factors and speaker variability, and eliminates noise factors by applying the canonicalzation based on the DPF extractors and two-stage Wiener filtering. In the experiment on AURORA-2J, the proposed method provides higher word accuracy under clean training and significant improvement of word accuracy in low signal-to-noise ratio (SNR) under multi-condition training compared to a standard ASR system with mel frequency ceptral coeffient (MFCC) parameters. Moreover, the proposed method requires a reduced, two-fifth, Gaussian mixture components and less memory to achieve accurate ASR.

  • Improved Noise Reduction with Packet Loss Recovery Based on Post-Filtering over IP Networks

    Jinsul KIM  Hyunwoo LEE  Won RYU  Seungho HAN  Minsoo HAHN  

     
    LETTER-Multimedia Systems for Communications

      Vol:
    E91-B No:3
      Page(s):
    975-979

    This letter mainly focuses on improving current noise reduction methods to solve the critical speech distortion problems with robust noise reduction in noisy speech signals for speech enhancement over IP networks. For robust noise reduction with packet loss recovery, we propose a novel optimized Wiener filtering technique that uses the estimated SNR (Signal-to-Noise Ratio) with packet loss recovery method which is applied as post-filtering over IP-networks. Simulation results demonstrate that the proposed scheme provides better reduction and recovery rates with considering packet loss and SNR environment than other methods.

  • Experimental Evaluation of the Super Sweep Spectrum Analyzer

    Masao NAGANO  Toshio ONODERA  Mototaka SONE  

     
    PAPER-Digital Signal Processing

      Vol:
    E91-A No:3
      Page(s):
    782-790

    A sweep spectrum analyzer has been improved over the years, but the fundamental method has not been changed before the 'Super Sweep' method appeared. The 'Super Sweep' method has been expected to break the limitation of the conventional sweep spectrum analyzer, a limit of the maximum sweep rate which is in inverse proportion to the square of the frequency resolution. The superior performance of the 'Super Sweep' method, however, has not been experimentally proved yet. This paper gives the experimental evaluation on the 'Super Sweep' spectrum analyzer, of which theoretical concepts have already been presented by the authors of this paper. Before giving the experimental results, we give complete analysis for a sweep spectrum analyzer and express the principle of the super-sweep operation with a complete set of equations. We developed an experimental system whose components operated in an optimum condition as the spectrum analyzer. Then we investigated its properties, a peak level reduction and broadening of the frequency resolution of the measured spectrum, by changing the sweep rate. We also confirmed that the experimental system satisfactorily detected the spectrum at least 30 times faster than the conventional method and the sweep rate was in proportion to the bandwidth of the base band signal to be analyzed. We proved that the 'Super Sweep' method broke the restriction of the sweep rate put on a conventional sweep spectrum analyzer.

  • Bilingual Cluster Based Models for Statistical Machine Translation

    Hirofumi YAMAMOTO  Eiichiro SUMITA  

     
    PAPER-Applications

      Vol:
    E91-D No:3
      Page(s):
    588-597

    We propose a domain specific model for statistical machine translation. It is well-known that domain specific language models perform well in automatic speech recognition. We show that domain specific language and translation models also benefit statistical machine translation. However, there are two problems with using domain specific models. The first is the data sparseness problem. We employ an adaptation technique to overcome this problem. The second issue is domain prediction. In order to perform adaptation, the domain must be provided, however in many cases, the domain is not known or changes dynamically. For these cases, not only the translation target sentence but also the domain must be predicted. This paper focuses on the domain prediction problem for statistical machine translation. In the proposed method, a bilingual training corpus, is automatically clustered into sub-corpora. Each sub-corpus is deemed to be a domain. The domain of a source sentence is predicted by using its similarity to the sub-corpora. The predicted domain (sub-corpus) specific language and translation models are then used for the translation decoding. This approach gave an improvement of 2.7 in BLEU score on the IWSLT05 Japanese to English evaluation corpus (improving the score from 52.4 to 55.1). This is a substantial gain and indicates the validity of the proposed bilingual cluster based models.

  • Language Modeling Using PLSA-Based Topic HMM

    Atsushi SAKO  Tetsuya TAKIGUCHI  Yasuo ARIKI  

     
    PAPER-Language Modeling

      Vol:
    E91-D No:3
      Page(s):
    522-528

    In this paper, we propose a PLSA-based language model for sports-related live speech. This model is implemented using a unigram rescaling technique that combines a topic model and an n-gram. In the conventional method, unigram rescaling is performed with a topic distribution estimated from a recognized transcription history. This method can improve the performance, but it cannot express topic transition. By incorporating the concept of topic transition, it is expected that the recognition performance will be improved. Thus, the proposed method employs a "Topic HMM" instead of a history to estimate the topic distribution. The Topic HMM is an Ergodic HMM that expresses typical topic distributions as well as topic transition probabilities. Word accuracy results from our experiments confirmed the superiority of the proposed method over a trigram and a PLSA-based conventional method that uses a recognized history.

  • Design for Testability Method to Avoid Error Masking of Software-Based Self-Test for Processors

    Masato NAKAZATO  Michiko INOUE  Satoshi OHTAKE  Hideo FUJIWARA  

     
    PAPER-High-Level Testing

      Vol:
    E91-D No:3
      Page(s):
    763-770

    In this paper, we propose a design for testability method for test programs of software-based self-test using test program templates. Software-based self-test using templates has a problem of error masking where some faults detected in a test generation for a module are not detected by the test program synthesized from the test. The proposed method achieves 100% template level fault efficiency, that is, it completely avoids the error masking. Moreover, the proposed method has no performance degradation (adds only observation points) and enables at-speed testing.

  • Race-Free Mixed Serial-Parallel Comparison for Low Power Content Addressable Memory

    Seong-Ook JUNG  Sei-Seung YOON  

     
    LETTER-VLSI Design Technology and CAD

      Vol:
    E91-A No:3
      Page(s):
    895-898

    This letter presents a race-free mixed serial-parallel comparison (RFMSPC) scheme which uses both serial and parallel CAMs in a match line. A self-reset search line scheme for the serial CAM is proposed to avoid the timing race problem and additional timing penalties. Various 32 entry CAMs are designed using 90 nm 1.2 V CMOS process to verify the proposed RFMSPC scheme. It shows that the RFMSPC saves power consumption by 40%, 53% and 63% at the cost of a 4%, 6% and 16% increase in search time according to 1, 2, and 4 serial CAM bits in a match line.

  • Robust Speech Recognition by Combining Short-Term and Long-Term Spectrum Based Position-Dependent CMN with Conventional CMN

    Longbiao WANG  Seiichi NAKAGAWA  Norihide KITAOKA  

     
    PAPER-ASR under Reverberant Conditions

      Vol:
    E91-D No:3
      Page(s):
    457-466

    In a distant-talking environment, the length of channel impulse response is longer than the short-term spectral analysis window. Conventional short-term spectrum based Cepstral Mean Normalization (CMN) is therefore, not effective under these conditions. In this paper, we propose a robust speech recognition method by combining a short-term spectrum based CMN with a long-term one. We assume that a static speech segment (such as a vowel, for example) affected by reverberation, can be modeled by a long-term cepstral analysis. Thus, the effect of long reverberation on a static speech segment may be compensated by the long-term spectrum based CMN. The cepstral distance of neighboring frames is used to discriminate the static speech segment (long-term spectrum) and the non-static speech segment (short-term spectrum). The cepstra of the static and non-static speech segments are normalized by the corresponding cepstral means. In a previous study, we proposed an environmentally robust speech recognition method based on Position-Dependent CMN (PDCMN) to compensate for channel distortion depending on speaker position, and which is more efficient than conventional CMN. In this paper, the concept of combining short-term and long-term spectrum based CMN is extended to PDCMN. We call this Variable Term spectrum based PDCMN (VT-PDCMN). Since PDCMN/VT-PDCMN cannot normalize speaker variations because a position-dependent cepstral mean contains the average speaker characteristics over all speakers, we also combine PDCMN/VT-PDCMN with conventional CMN in this study. We conducted the experiments based on our proposed method using limited vocabulary (100 words) distant-talking isolated word recognition in a real environment. The proposed method achieved a relative error reduction rate of 60.9% over the conventional short-term spectrum based CMN and 30.6% over the short-term spectrum based PDCMN.

  • Recognizing Reverberant Speech Based on Amplitude and Frequency Modulation

    Yotaro KUBO  Shigeki OKAWA  Akira KUREMATSU  Katsuhiko SHIRAI  

     
    PAPER-ASR under Reverberant Conditions

      Vol:
    E91-D No:3
      Page(s):
    448-456

    We have attempted to recognize reverberant speech using a novel speech recognition system that depends on not only the spectral envelope and amplitude modulation but also frequency modulation. Most of the features used by modern speech recognition systems, such as MFCC, PLP, and TRAPS, are derived from the energy envelopes of narrowband signals by discarding the information in the carrier signals. However, some experiments show that apart from the spectral/time envelope and its modulation, the information on the zero-crossing points of the carrier signals also plays a significant role in human speech recognition. In realistic environments, a feature that depends on the limited properties of the signal may easily be corrupted. In order to utilize an automatic speech recognizer in an unknown environment, using the information obtained from other signal properties and combining them is important to minimize the effects of the environment. In this paper, we propose a method to analyze carrier signals that are discarded in most of the speech recognition systems. Our system consists of two nonlinear discriminant analyzers that use multilayer perceptrons. One of the nonlinear discriminant analyzers is HATS, which can capture the amplitude modulation of narrowband signals efficiently. The other nonlinear discriminant analyzer is a pseudo-instantaneous frequency analyzer proposed in this paper. This analyzer can capture the frequency modulation of narrowband signals efficiently. The combination of these two analyzers is performed by the method based on the entropy of the feature introduced by Okawa et al. In this paper, in Sect. 2, we first introduce pseudo-instantaneous frequencies to capture a property of the carrier signal. The previous AM analysis method are described in Sect. 3. The proposed system is described in Sect. 4. The experimental setup is presented in Sect. 5, and the results are discussed in Sect. 6. We evaluate the performance of the proposed method by continuous digit recognition of reverberant speech. The proposed system exhibits considerable improvement with regard to the MFCC feature extraction system.

  • Linear Discriminant Analysis Using a Generalized Mean of Class Covariances and Its Application to Speech Recognition

    Makoto SAKAI  Norihide KITAOKA  Seiichi NAKAGAWA  

     
    PAPER-Feature Extraction

      Vol:
    E91-D No:3
      Page(s):
    478-487

    To precisely model the time dependency of features is one of the important issues for speech recognition. Segmental unit input HMM with a dimensionality reduction method has been widely used to address this issue. Linear discriminant analysis (LDA) and heteroscedastic extensions, e.g., heteroscedastic linear discriminant analysis (HLDA) or heteroscedastic discriminant analysis (HDA), are popular approaches to reduce dimensionality. However, it is difficult to find one particular criterion suitable for any kind of data set in carrying out dimensionality reduction while preserving discriminative information. In this paper, we propose a new framework which we call power linear discriminant analysis (PLDA). PLDA can be used to describe various criteria including LDA, HLDA, and HDA with one control parameter. In addition, we provide an efficient selection method using a control parameter without training HMMs nor testing recognition performance on a development data set. Experimental results show that the PLDA is more effective than conventional methods for various data sets.

  • Noise Suppression Based on Multi-Model Compositions Using Multi-Pass Search with Multi-Label N-gram Models

    Takatoshi JITSUHIRO  Tomoji TORIYAMA  Kiyoshi KOGURE  

     
    PAPER-Noisy Speech Recognition

      Vol:
    E91-D No:3
      Page(s):
    402-410

    We propose a noise suppression method based on multi-model compositions and multi-pass search. In real environments, input speech for speech recognition includes many kinds of noise signals. To obtain good recognized candidates, suppressing many kinds of noise signals at once and finding target speech is important. Before noise suppression, to find speech and noise label sequences, we introduce multi-pass search with acoustic models including many kinds of noise models and their compositions, their n-gram models, and their lexicon. Noise suppression is frame-synchronously performed using the multiple models selected by recognized label sequences with time alignments. We evaluated this method using the E-Nightingale task, which contains voice memoranda spoken by nurses during actual work at hospitals. The proposed method obtained higher performance than the conventional method.

1121-1140hit(2504hit)