IEICE global.ieice.org Site

Keyword Search Result

[Keyword] SPE(2504hit)

1121-1140hit(2504hit)

The Optimal Architecture Design of Two-Dimension Matrix Multiplication Jumping Systolic Array
Yun YANG Shinji KIMURA

PAPER

Vol:
E91-A No:4
Page(s):
1101-1111
This paper proposes an efficient systolic array construction method for optimal planar systolic design of the matrix multiplication. By connection network adjustment among systolic array processing element (PE), the input/output data are jumping in the systolic array for multiplication operation requirements. Various 2-D systolic array topologies, such as square topology and hexagonal topology, have been studied to construct appropriate systolic array configuration and realize high performance matrix multiplication. Based on traditional Kung-Leiserson systolic architecture, the proposed "Jumping Systolic Array (JSA)" algorithm can increase the matrix multiplication speed with less processing elements and few data registers attachment. New systolic arrays, such as square jumping array, redundant dummy latency jumping hexagonal array, and compact parallel flow jumping hexagonal array, are also proposed to improve the concurrent system operation efficiency. Experimental results prove that the JSA algorithm can realize fully concurrent operation and dominate other systolic architectures in the specific systolic array system characteristics, such as band width, matrix complexity, or expansion capability.
Hardware Neural Network for a Visual Inspection System
Seungwoo CHUN Yoshihiro HAYAKAWA Koji NAKAJIMA

PAPER

Vol:
E91-A No:4
Page(s):
935-942
The visual inspection of defects in products is heavily dependent on human experience and instinct. In this situation, it is difficult to reduce the production costs and to shorten the inspection time and hence the total process time. Consequently people involved in this area desire an automatic inspection system. In this paper, we propose a hardware neural network, which is expected to provide high-speed operation for automatic inspection of products. Since neural networks can learn, this is a suitable method for self-adjustment of criteria for classification. To achieve high-speed operation, we use parallel and pipelining techniques. Furthermore, we use a piecewise linear function instead of a conventional activation function in order to save hardware resources. Consequently, our proposed hardware neural network achieved 6GCPS and 2GCUPS, which in our test sample proved to be sufficiently fast.
Fundamental Consideration on Distance Estimation Using Acoustical Standing Wave
Noboru NAKASAKO Tetsuji UEBO Atsushi MORI Norimitsu OHMATA

LETTER-Engineering Acoustics

Vol:
E91-A No:4
Page(s):
1218-1221
In the research field of microwave radar, a range finding method based on standing wave is known to be effective for measuring short distances. In this paper, we focus our attention on audible sound and fundamentally examine the distance estimation method in which acoustical standing wave is used.
A High-Speed Design of Montgomery Multiplier
Yibo FAN Takeshi IKENAGA Satoshi GOTO

PAPER

Vol:
E91-A No:4
Page(s):
971-977
With the increase of key length used in public cryptographic algorithms such as RSA and ECC, the speed of Montgomery multiplication becomes a bottleneck. This paper proposes a high speed design of Montgomery multiplier. Firstly, a modified scalable high-radix Montgomery algorithm is proposed to reduce critical path. Secondly, a high-radix clock-saving dataflow is proposed to support high-radix operation and one clock cycle delay in dataflow. Finally, a hardware-reused architecture is proposed to reduce the hardware cost and a parallel radix-16 design of data path is proposed to accelerate the speed. By using HHNEC 0.25 µm standard cell library, the implementation results show that the total cost of Montgomery multiplier is 130 KGates, the clock frequency is 180 MHz and the throughput of 1024-bit RSA encryption is 352 kbps. This design is suitable to be used in high speed RSA or ECC encryption/decryption. As a scalable design, it supports any key-length encryption/decryption up to the size of on-chip memory.
Optimum Pulse Shape Design for UWB Systems with Timing Jitter
Wilaiporn LEE Suwich KUNARUTTANAPRUK Somchai JITAPUNKUL

PAPER-Wireless Communication Technologies

Vol:
E91-B No:3
Page(s):
772-783
This paper proposes a novel technique in designing the optimum pulse shape for ultra wideband (UWB) systems under the presence of timing jitter. In the UWB systems, pulse transmission power and timing jitter tolerance are crucial keys to communications success. While there is a strong desire to maximize both of them, one must be traded off against the other. In the literature, much effort has been devoted to separately optimize each of them without considering the drawback to the other. In this paper, both factors are jointly considered. The proposed pulse attains the adequate power to survive the noise floor and at the same time provides good resistance to the timing jitter. The proposed pulse also meets the power spectral mask restriction as prescribed by the Federal Communications Commission (FCC) for indoor UWB systems. Simulation results confirm the advantages of the proposed pulse over other previously known UWB pulses. Parameters of the proposed optimization algorithm are also investigated in this paper.
Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method
Goshu NAGINO Makoto SHOZAKAI Tomoki TODA Hiroshi SARUWATARI Kiyohiro SHIKANO

PAPER-Corpus

Vol:
E91-D No:3
Page(s):
607-614
This paper proposes a technique for building an effective speech corpus with lower cost by utilizing a statistical multidimensional scaling method. The statistical multidimensional scaling method visualizes multiple HMM acoustic models into two-dimensional space. At first, a small number of voice samples per speaker is collected; speaker adapted acoustic models trained with collected utterances, are mapped into two-dimensional space by utilizing the statistical multidimensional scaling method. Next, speakers located in the periphery of the distribution, in a plotted map are selected; a speech corpus is built by collecting enough voice samples for the selected speakers. In an experiment for building an isolated-word speech corpus, the performance of an acoustic model trained with 200 selected speakers was equivalent to that of an acoustic model trained with 533 non-selected speakers. It means that a cost reduction of more than 62% was achieved. In an experiment for building a continuous word speech corpus, the performance of an acoustic model trained with 500 selected speakers was equivalent to that of an acoustic model trained with 1179 non-selected speakers. It means that a cost reduction of more than 57% was achieved.
Feature Compensation Employing Multiple Environmental Models for Robust In-Vehicle Speech Recognition
Wooil KIM John H.L. HANSEN

PAPER-Noisy Speech Recognition

Vol:
E91-D No:3
Page(s):
430-438
An effective feature compensation method is developed for reliable speech recognition in real-life in-vehicle environments. The CU-Move corpus, used for evaluation, contains a range of speech and noise signals collected for a number of speakers under actual driving conditions. PCGMM-based feature compensation, considered in this paper, utilizes parallel model combination to generate noise-corrupted speech model by combining clean speech and the noise model. In order to address unknown time-varying background noise, an interpolation method of multiple environmental models is employed. To alleviate computational expenses due to multiple models, an Environment Transition Model is employed, which is motivated from Noise Language Model used in Environmental Sniffing. An environment dependent scheme of mixture sharing technique is proposed and shown to be more effective in reducing the computational complexity. A smaller environmental model set is determined by the environment transition model for mixture sharing. The proposed scheme is evaluated on the connected single digits portion of the CU-Move database using the Aurora2 evaluation toolkit. Experimental results indicate that our feature compensation method is effective for improving speech recognition in real-life in-vehicle conditions. A reduction of 73.10% of the computational requirements was obtained by employing the environment dependent mixture sharing scheme with only a slight change in recognition performance. This demonstrates that the proposed method is effective in maintaining the distinctive characteristics among the different environmental models, even when selecting a large number of Gaussian components for mixture sharing.
Noisy Speech Recognition Based on Integration/Selection of Multiple Noise Suppression Methods Using Noise GMMs
Norihide KITAOKA Souta HAMAGUCHI Seiichi NAKAGAWA

PAPER-Noisy Speech Recognition

Vol:
E91-D No:3
Page(s):
411-421
To achieve high recognition performance for a wide variety of noise and for a wide range of signal-to-noise ratio, this paper presents methods for integration of four noise reduction algorithms: spectral subtraction with smoothing of time direction, temporal domain SVD-based speech enhancement, GMM-based speech estimation and KLT-based comb-filtering. In this paper, we proposed two types of combination methods of noise suppression algorithms: selection of front-end processor and combination of results from multiple recognition processes. Recognition results on the CENSREC-1 task showed the effectiveness of our proposed methods.
Local Peak Enhancement for In-Car Speech Recognition in Noisy Environment
Osamu ICHIKAWA Takashi FUKUDA Masafumi NISHIMURA

LETTER

Vol:
E91-D No:3
Page(s):
635-639
The accuracy of automatic speech recognition in a car is significantly degraded in a very low SNR (Signal to Noise Ratio) situation such as "Fan high" or "Window open". In such cases, speech signals are often buried in broadband noise. Although several existing noise reduction algorithms are known to improve the accuracy, other approaches that can work with them are still required for further improvement. One of the candidates is enhancement of the harmonic structures in human voices. However, most conventional approaches are based on comb filtering, and it is difficult to use them in practical situations, because their assumptions for F0 detection and for voiced/unvoiced detection are not accurate enough in realistic noisy environments. In this paper, we propose a new approach that does not rely on such detection. An observed power spectrum is directly converted into a filter for speech enhancement, by retaining only the local peaks considered to be harmonic structures in the human voice. In our experiments, this approach reduced the word error rate by 17% in realistic automobile environments. Also, it showed further improvement when used with existing noise reduction methods.
Canonicalization of Feature Parameters for Robust Speech Recognition Based on Distinctive Phonetic Feature (DPF) Vectors
Mohammad NURUL HUDA Muhammad GHULAM Takashi FUKUDA Kouichi KATSURADA Tsuneo NITTA

PAPER-Feature Extraction

Vol:
E91-D No:3
Page(s):
488-498
This paper describes a robust automatic speech recognition (ASR) system with less computation. Acoustic models of a hidden Markov model (HMM)-based classifier include various types of hidden factors such as speaker-specific characteristics, coarticulation, and an acoustic environment, etc. If there exists a canonicalization process that can recover the degraded margin of acoustic likelihoods between correct phonemes and other ones caused by hidden factors, the robustness of ASR systems can be improved. In this paper, we introduce a canonicalization method that is composed of multiple distinctive phonetic feature (DPF) extractors corresponding to each hidden factor canonicalization, and a DPF selector which selects an optimum DPF vector as an input of the HMM-based classifier. The proposed method resolves gender factors and speaker variability, and eliminates noise factors by applying the canonicalzation based on the DPF extractors and two-stage Wiener filtering. In the experiment on AURORA-2J, the proposed method provides higher word accuracy under clean training and significant improvement of word accuracy in low signal-to-noise ratio (SNR) under multi-condition training compared to a standard ASR system with mel frequency ceptral coeffient (MFCC) parameters. Moreover, the proposed method requires a reduced, two-fifth, Gaussian mixture components and less memory to achieve accurate ASR.
Improved Noise Reduction with Packet Loss Recovery Based on Post-Filtering over IP Networks
Jinsul KIM Hyunwoo LEE Won RYU Seungho HAN Minsoo HAHN

LETTER-Multimedia Systems for Communications

Vol:
E91-B No:3
Page(s):
975-979
This letter mainly focuses on improving current noise reduction methods to solve the critical speech distortion problems with robust noise reduction in noisy speech signals for speech enhancement over IP networks. For robust noise reduction with packet loss recovery, we propose a novel optimized Wiener filtering technique that uses the estimated SNR (Signal-to-Noise Ratio) with packet loss recovery method which is applied as post-filtering over IP-networks. Simulation results demonstrate that the proposed scheme provides better reduction and recovery rates with considering packet loss and SNR environment than other methods.
Experimental Evaluation of the Super Sweep Spectrum Analyzer
Masao NAGANO Toshio ONODERA Mototaka SONE

PAPER-Digital Signal Processing

Vol:
E91-A No:3
Page(s):
782-790
A sweep spectrum analyzer has been improved over the years, but the fundamental method has not been changed before the 'Super Sweep' method appeared. The 'Super Sweep' method has been expected to break the limitation of the conventional sweep spectrum analyzer, a limit of the maximum sweep rate which is in inverse proportion to the square of the frequency resolution. The superior performance of the 'Super Sweep' method, however, has not been experimentally proved yet. This paper gives the experimental evaluation on the 'Super Sweep' spectrum analyzer, of which theoretical concepts have already been presented by the authors of this paper. Before giving the experimental results, we give complete analysis for a sweep spectrum analyzer and express the principle of the super-sweep operation with a complete set of equations. We developed an experimental system whose components operated in an optimum condition as the spectrum analyzer. Then we investigated its properties, a peak level reduction and broadening of the frequency resolution of the measured spectrum, by changing the sweep rate. We also confirmed that the experimental system satisfactorily detected the spectrum at least 30 times faster than the conventional method and the sweep rate was in proportion to the bandwidth of the base band signal to be analyzed. We proved that the 'Super Sweep' method broke the restriction of the sweep rate put on a conventional sweep spectrum analyzer.
Bilingual Cluster Based Models for Statistical Machine Translation
Hirofumi YAMAMOTO Eiichiro SUMITA

PAPER-Applications

Vol:
E91-D No:3
Page(s):
588-597
We propose a domain specific model for statistical machine translation. It is well-known that domain specific language models perform well in automatic speech recognition. We show that domain specific language and translation models also benefit statistical machine translation. However, there are two problems with using domain specific models. The first is the data sparseness problem. We employ an adaptation technique to overcome this problem. The second issue is domain prediction. In order to perform adaptation, the domain must be provided, however in many cases, the domain is not known or changes dynamically. For these cases, not only the translation target sentence but also the domain must be predicted. This paper focuses on the domain prediction problem for statistical machine translation. In the proposed method, a bilingual training corpus, is automatically clustered into sub-corpora. Each sub-corpus is deemed to be a domain. The domain of a source sentence is predicted by using its similarity to the sub-corpora. The predicted domain (sub-corpus) specific language and translation models are then used for the translation decoding. This approach gave an improvement of 2.7 in BLEU score on the IWSLT05 Japanese to English evaluation corpus (improving the score from 52.4 to 55.1). This is a substantial gain and indicates the validity of the proposed bilingual cluster based models.
Language Modeling Using PLSA-Based Topic HMM
Atsushi SAKO Tetsuya TAKIGUCHI Yasuo ARIKI

PAPER-Language Modeling

Vol:
E91-D No:3
Page(s):
522-528
In this paper, we propose a PLSA-based language model for sports-related live speech. This model is implemented using a unigram rescaling technique that combines a topic model and an n-gram. In the conventional method, unigram rescaling is performed with a topic distribution estimated from a recognized transcription history. This method can improve the performance, but it cannot express topic transition. By incorporating the concept of topic transition, it is expected that the recognition performance will be improved. Thus, the proposed method employs a "Topic HMM" instead of a history to estimate the topic distribution. The Topic HMM is an Ergodic HMM that expresses typical topic distributions as well as topic transition probabilities. Word accuracy results from our experiments confirmed the superiority of the proposed method over a trigram and a PLSA-based conventional method that uses a recognized history.
Design for Testability Method to Avoid Error Masking of Software-Based Self-Test for Processors
Masato NAKAZATO Michiko INOUE Satoshi OHTAKE Hideo FUJIWARA

PAPER-High-Level Testing

Vol:
E91-D No:3
Page(s):
763-770
In this paper, we propose a design for testability method for test programs of software-based self-test using test program templates. Software-based self-test using templates has a problem of error masking where some faults detected in a test generation for a module are not detected by the test program synthesized from the test. The proposed method achieves 100% template level fault efficiency, that is, it completely avoids the error masking. Moreover, the proposed method has no performance degradation (adds only observation points) and enables at-speed testing.
Race-Free Mixed Serial-Parallel Comparison for Low Power Content Addressable Memory
Seong-Ook JUNG Sei-Seung YOON

LETTER-VLSI Design Technology and CAD

Vol:
E91-A No:3
Page(s):
895-898
This letter presents a race-free mixed serial-parallel comparison (RFMSPC) scheme which uses both serial and parallel CAMs in a match line. A self-reset search line scheme for the serial CAM is proposed to avoid the timing race problem and additional timing penalties. Various 32 entry CAMs are designed using 90 nm 1.2 V CMOS process to verify the proposed RFMSPC scheme. It shows that the RFMSPC saves power consumption by 40%, 53% and 63% at the cost of a 4%, 6% and 16% increase in search time according to 1, 2, and 4 serial CAM bits in a match line.
Robust Speech Recognition by Combining Short-Term and Long-Term Spectrum Based Position-Dependent CMN with Conventional CMN
Longbiao WANG Seiichi NAKAGAWA Norihide KITAOKA

PAPER-ASR under Reverberant Conditions

Vol:
E91-D No:3
Page(s):
457-466
In a distant-talking environment, the length of channel impulse response is longer than the short-term spectral analysis window. Conventional short-term spectrum based Cepstral Mean Normalization (CMN) is therefore, not effective under these conditions. In this paper, we propose a robust speech recognition method by combining a short-term spectrum based CMN with a long-term one. We assume that a static speech segment (such as a vowel, for example) affected by reverberation, can be modeled by a long-term cepstral analysis. Thus, the effect of long reverberation on a static speech segment may be compensated by the long-term spectrum based CMN. The cepstral distance of neighboring frames is used to discriminate the static speech segment (long-term spectrum) and the non-static speech segment (short-term spectrum). The cepstra of the static and non-static speech segments are normalized by the corresponding cepstral means. In a previous study, we proposed an environmentally robust speech recognition method based on Position-Dependent CMN (PDCMN) to compensate for channel distortion depending on speaker position, and which is more efficient than conventional CMN. In this paper, the concept of combining short-term and long-term spectrum based CMN is extended to PDCMN. We call this Variable Term spectrum based PDCMN (VT-PDCMN). Since PDCMN/VT-PDCMN cannot normalize speaker variations because a position-dependent cepstral mean contains the average speaker characteristics over all speakers, we also combine PDCMN/VT-PDCMN with conventional CMN in this study. We conducted the experiments based on our proposed method using limited vocabulary (100 words) distant-talking isolated word recognition in a real environment. The proposed method achieved a relative error reduction rate of 60.9% over the conventional short-term spectrum based CMN and 30.6% over the short-term spectrum based PDCMN.
Recognizing Reverberant Speech Based on Amplitude and Frequency Modulation
Yotaro KUBO Shigeki OKAWA Akira KUREMATSU Katsuhiko SHIRAI

PAPER-ASR under Reverberant Conditions

Vol:
E91-D No:3
Page(s):
448-456
We have attempted to recognize reverberant speech using a novel speech recognition system that depends on not only the spectral envelope and amplitude modulation but also frequency modulation. Most of the features used by modern speech recognition systems, such as MFCC, PLP, and TRAPS, are derived from the energy envelopes of narrowband signals by discarding the information in the carrier signals. However, some experiments show that apart from the spectral/time envelope and its modulation, the information on the zero-crossing points of the carrier signals also plays a significant role in human speech recognition. In realistic environments, a feature that depends on the limited properties of the signal may easily be corrupted. In order to utilize an automatic speech recognizer in an unknown environment, using the information obtained from other signal properties and combining them is important to minimize the effects of the environment. In this paper, we propose a method to analyze carrier signals that are discarded in most of the speech recognition systems. Our system consists of two nonlinear discriminant analyzers that use multilayer perceptrons. One of the nonlinear discriminant analyzers is HATS, which can capture the amplitude modulation of narrowband signals efficiently. The other nonlinear discriminant analyzer is a pseudo-instantaneous frequency analyzer proposed in this paper. This analyzer can capture the frequency modulation of narrowband signals efficiently. The combination of these two analyzers is performed by the method based on the entropy of the feature introduced by Okawa et al. In this paper, in Sect. 2, we first introduce pseudo-instantaneous frequencies to capture a property of the carrier signal. The previous AM analysis method are described in Sect. 3. The proposed system is described in Sect. 4. The experimental setup is presented in Sect. 5, and the results are discussed in Sect. 6. We evaluate the performance of the proposed method by continuous digit recognition of reverberant speech. The proposed system exhibits considerable improvement with regard to the MFCC feature extraction system.
Linear Discriminant Analysis Using a Generalized Mean of Class Covariances and Its Application to Speech Recognition
Makoto SAKAI Norihide KITAOKA Seiichi NAKAGAWA

PAPER-Feature Extraction

Vol:
E91-D No:3
Page(s):
478-487
To precisely model the time dependency of features is one of the important issues for speech recognition. Segmental unit input HMM with a dimensionality reduction method has been widely used to address this issue. Linear discriminant analysis (LDA) and heteroscedastic extensions, e.g., heteroscedastic linear discriminant analysis (HLDA) or heteroscedastic discriminant analysis (HDA), are popular approaches to reduce dimensionality. However, it is difficult to find one particular criterion suitable for any kind of data set in carrying out dimensionality reduction while preserving discriminative information. In this paper, we propose a new framework which we call power linear discriminant analysis (PLDA). PLDA can be used to describe various criteria including LDA, HLDA, and HDA with one control parameter. In addition, we provide an efficient selection method using a control parameter without training HMMs nor testing recognition performance on a development data set. Experimental results show that the PLDA is more effective than conventional methods for various data sets.
Noise Suppression Based on Multi-Model Compositions Using Multi-Pass Search with Multi-Label N-gram Models
Takatoshi JITSUHIRO Tomoji TORIYAMA Kiyoshi KOGURE

PAPER-Noisy Speech Recognition

Vol:
E91-D No:3
Page(s):
402-410
We propose a noise suppression method based on multi-model compositions and multi-pass search. In real environments, input speech for speech recognition includes many kinds of noise signals. To obtain good recognized candidates, suppressing many kinds of noise signals at once and finding target speech is important. Before noise suppression, to find speech and noise label sequences, we introduce multi-pass search with acoustic models including many kinds of noise models and their compositions, their n-gram models, and their lexicon. Noise suppression is frame-synchronously performed using the multiple models selected by recognized label sequences with time alignments. We evaluated this method using the E-Nightingale task, which contains voice memoranda spoken by nurses during actual work at hospitals. The proposed method obtained higher performance than the conventional method.

1121-1140hit(2504hit)

Keyword Search Result

[Keyword] SPE(2504hit)

The Optimal Architecture Design of Two-Dimension Matrix Multiplication Jumping Systolic Array

Hardware Neural Network for a Visual Inspection System

Fundamental Consideration on Distance Estimation Using Acoustical Standing Wave

A High-Speed Design of Montgomery Multiplier

Optimum Pulse Shape Design for UWB Systems with Timing Jitter

Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method

Feature Compensation Employing Multiple Environmental Models for Robust In-Vehicle Speech Recognition

Noisy Speech Recognition Based on Integration/Selection of Multiple Noise Suppression Methods Using Noise GMMs

Local Peak Enhancement for In-Car Speech Recognition in Noisy Environment

Canonicalization of Feature Parameters for Robust Speech Recognition Based on Distinctive Phonetic Feature (DPF) Vectors

Improved Noise Reduction with Packet Loss Recovery Based on Post-Filtering over IP Networks

Experimental Evaluation of the Super Sweep Spectrum Analyzer

Bilingual Cluster Based Models for Statistical Machine Translation

Language Modeling Using PLSA-Based Topic HMM

Design for Testability Method to Avoid Error Masking of Software-Based Self-Test for Processors

Race-Free Mixed Serial-Parallel Comparison for Low Power Content Addressable Memory

Robust Speech Recognition by Combining Short-Term and Long-Term Spectrum Based Position-Dependent CMN with Conventional CMN

Recognizing Reverberant Speech Based on Amplitude and Frequency Modulation

Linear Discriminant Analysis Using a Generalized Mean of Class Covariances and Its Application to Speech Recognition

Noise Suppression Based on Multi-Model Compositions Using Multi-Pass Search with Multi-Label N-gram Models

Latest Issue

Links

Call for Papers

Submit to IEICE Trans.

Transactions NEWS

Popular articles