Keyword Search Result

[Keyword] EE (4073 hits)

Hits 3021-3040 (of 4073)

  • Signal Integrity Design and Analysis for a 400 MHz RISC Microcontroller

    Akira YAMADA  Yasuhiro NUNOMURA  Hiroaki SUZUKI  Hisakazu SATO  Niichi ITOH  Tetsuya KAGEMOTO  Hironobu ITO  Takashi KURAFUJI  Nobuharu YOSHIOKA  Jingo NAKANISHI  Hiromi NOTANI  Rei AKIYAMA  Atsushi IWABU  Tadao YAMANAKA  Hidehiro TAKATA  Takeshi SHIBAGAKI  Takahiko ARAKAWA  Hiroshi MAKINO  Osamu TOMISAWA  Shuhei IWADE

    PAPER-Design Methods and Implementation
    Vol: E86-C No:4, Page(s): 635-642

    A high-speed 32-bit RISC microcontroller has been developed. To realize high-speed operation with minimal hardware resources, we developed new design and analysis methods covering clock distribution, bus-line layout, and IR drop. As a result, 400 MHz operation has been achieved with a power dissipation of 0.96 W at 1.8 V.

  • Speaker Tracking for Hands-Free Continuous Speech Recognition in Noise Based on a Spectrum-Entropy Beamforming Method

    George NOKAS  Evangelos DERMATAS

    LETTER-Speech and Hearing
    Vol: E86-D No:4, Page(s): 755-758

    In this paper, we present a novel beamformer capable of tracking a rapidly moving speaker in a very noisy environment. The localization algorithm extracts a set of candidate directions of arrival (DOAs) for the signal sources using frequency-domain array signal processing methods. A minimum variance (MV) beamformer then identifies the speech signal DOA as the direction in which the spectrum entropy of the signal is minimized. A fine-tuning process, using a smaller analysis window, selects the MV direction closest to the initial estimate. Extensive experiments over SNRs from 20 dB down to 0 dB show a significant improvement in the recognition rate for a moving speaker, especially at very low SNRs (from 11.11% to 43.79% at 0 dB SNR in an anechoic environment and from 9.9% to 30.51% in a reverberant environment).

  • TCP Performance over IEEE 802.11 Based Multichannel MAC Protocol for Mobile Ad Hoc Networks

    Tsugunao KOBAYASHI

    PAPER-Terrestrial Radio Communications
    Vol: E86-B No:4, Page(s): 1307-1316

    In this paper, we propose a new IEEE 802.11-based multichannel MAC protocol that satisfies a single-transceiver constraint while remaining compatible with the IEEE 802.11 standard. The protocol needs no additional transceivers, so current IEEE 802.11 hardware can be used to implement it. We propose and investigate two channel selection algorithms: a sender-based channel selection scheme and a receiver-based channel selection scheme. We evaluate TCP performance over the proposed multichannel MAC protocol by simulations that take into account the overhead, namely the time taken to change the frequency channel and the time taken for carrier sensing in each channel. The results show that the proposed scheme significantly improves performance with a single transceiver, that the overhead costs of the multichannel MAC protocol are quite large, and that the sender-based scheme outperforms the receiver-based scheme in most cases. They also show that, compared with the single-channel IEEE 802.11 MAC protocol, there is a break-even value of the PLL synthesizer lockup time.

  • Fast Motion Estimation Algorithm and Low-Power CMOS Motion Estimator for MPEG Encoding

    Tadayoshi ENOMOTO  Akira KOTABE

    PAPER-Architecture and Algorithms
    Vol: E86-C No:4, Page(s): 535-545

    A fast motion-estimation (ME) algorithm called "breaking-off search" (BOS) was developed. It improves the processing speed of the full-search (FS) method by a factor of 3.4. The BOS algorithm sometimes achieves better visual quality than FS, and it also solves the visual degradation problems that conventional fast-ME algorithms suffer whenever picture patterns change (i.e., at scene changes). The power dissipation of a 0.6-µm CMOS parallel Wallace-tree motion estimator using BOS was reduced to about 281 mW, 1/28.7 of that of the 0.6-µm CMOS binary-tree motion estimator using FS.

  • High-Speed and Low-Power Techniques of Hardware and Software for Digital Signal Processors

    Hiroshi TAKAHASHI  Rimon IKENO  Yutaka TOYONOH  Akihiro TAKEGAMA  Yasumasa IKEZAKI  Tohru URASAKI  Hitoshi SATOH  Masayasu ITOIGAWA  Yoshinari MATSUMOTO

    PAPER-Circuit Design
    Vol: E86-C No:4, Page(s): 589-596

    High-speed, low-power DSPs have been developed for versatile handset applications. The DSP contains a 16-bit fixed-point DSP core with multiple buses, highly tuned instruction sets, and a low-power architecture. It achieves a CPU power of 404.5 µW/MHz, a peak chip power of 2.08 mW/MHz, a 200 µA standby current, and 160 MHz/160 MIPS performance with a single DSP core. It also operates at 0.68 V over the temperature range from -40°C to 125°C in the worst case (weak corner), even though it uses a process with much higher I-off current than a conventional process in order to obtain a higher operating frequency. In this paper, we discuss circuit design techniques for continuing to scale down valuable IP cores while keeping the same functionality, improving speed, and lowering power dissipation with much lower-voltage operation. To reduce power further through DSP software, Run-time Power Control (RPC) was demonstrated experimentally in an MP3 player, a real-time application running on an Internet audio evaluation module with a 100 MHz/100 MIPS DSP at 1.8 V, and achieved a 32-60% power reduction across various music sources.

  • Diversity Transform of N-DPSK with Decision-Feedback Differential Detection over Correlated Rayleigh Fading

    Fuh-Hsin HWANG

    LETTER-Wireless Communication Technology
    Vol: E86-B No:4, Page(s): 1457-1461

    In this letter, we investigate a diversity scheme that employs a simple transform, symbol interleaving, and decision-feedback differential detection (DF-DD) for differential phase-shift keying (DPSK) transmission over correlated Rayleigh fading. The proposed scheme exploits the intrinsic time diversity within each transmitted block and thus offers marked resistance to fading. It is shown that the technique provides significant diversity gains in a correlated Rayleigh fading channel.

  • A Dynamical N-Queen Problem Solver Using Hysteresis Neural Networks

    Takao YAMAMOTO  Kenya JIN'NO  Haruo HIROSE

    PAPER
    Vol: E86-A No:4, Page(s): 740-745

    In previous studies of combinatorial optimization problem solvers using neural networks, ever since the Hopfield method, converging to the optimum solution faster and more reliably has been regarded as important; that is, only static states are treated as information. From a biological point of view, however, dynamical systems have recently attracted attention. We therefore propose a "dynamical" combinatorial optimization problem solver using hysteresis neural networks. In this paper, the proposed system is evaluated on the N-Queen problem.

  • On Automatic Speech Recognition at the Dawn of the 21st Century

    Chin-Hui LEE

    INVITED SURVEY PAPER
    Vol: E86-D No:3, Page(s): 377-396

    In the last three decades of the 20th century, research in speech recognition was carried out intensively worldwide, spurred on by advances in signal processing, algorithms, architectures, and hardware. Recognition systems have been developed for a wide variety of applications, ranging from small-vocabulary keyword recognition over dial-up telephone lines, to medium-vocabulary voice-interactive command and control systems for business automation, to large-vocabulary dictation, spontaneous speech understanding, and limited-domain speech translation. Although we have witnessed many new technological promises, we have also encountered a number of practical limitations that hinder widespread deployment of applications and services. On one hand, fast progress has been made in statistical speech and language modeling. On the other hand, only spotty successes have been reported in applying knowledge sources from acoustics, speech, and language science to improve speech recognition performance and robustness to adverse conditions. In this paper, we review some key advances in several areas of speech recognition. A bottom-up detection framework is also proposed to facilitate worldwide research collaboration on incorporating advances in both statistical modeling and knowledge integration, in order to go beyond the current limitations of speech recognition and benefit society in the 21st century.

  • Language Modeling Using Patterns Extracted from Parse Trees for Speech Recognition

    Takatoshi JITSUHIRO  Hirofumi YAMAMOTO  Setsuo YAMADA  Genichiro KIKUI  Yoshinori SAGISAKA

    PAPER-Speech and Speaker Recognition
    Vol: E86-D No:3, Page(s): 446-453

    We propose new language models that represent phrasal structures by patterns extracted from parse trees. First, modified word trigram models are proposed. They are extracted from sentences analyzed by the parser's knowledge-based preprocessing. Because this analysis groups sentences into sub-trees of a few words, these trigram models can represent relations among a few neighboring words more strongly than conventional word trigram models. Second, word pattern models are applied on top of these modified word trigram models. The word patterns are extracted from parse trees and can represent phrasal structures and much longer word dependencies than trigram models. Experimental results show that modified trigram models are more effective than traditional trigram models and that pattern models attain slight improvements over modified trigram models. Additional experiments show that pattern models are more effective for long sentences.

  • Adaptive Antennas (Open Access)

    Nobuyoshi KIKUMA  Mitoshi FUJIMOTO

    INVITED PAPER
    Vol: E86-B No:3, Page(s): 968-979

    This paper reviews the historical development of adaptive antennas in Japan. First, we review basic adaptive algorithms. In the 1980s in particular, the following issues were of considerable concern: (a) behavior in the presence of coherent interference such as multipath waves or radar clutter, (b) signal degradation when the direction of arrival (DOA) of the desired signal differs from the DOA specified beforehand in adaptive antennas that use the desired signal's DOA as prior knowledge, and (c) the performance of adaptive antennas when the desired signal and interference are broadband. Although adaptive algorithms have been developed and modified extensively in Japan, this paper addresses only the topics above. Second, we turn to the implementation of adaptive antennas and advanced technologies, subjects on which a large amount of research has been carried out in Japan. In particular, we focus on the pioneering Japanese studies toward mobile communication applications, including research on mobile radio propagation for adaptive antennas, calibration methods, and adaptive antennas for mobile terminals. We also cover adaptive antenna technologies for advanced communication schemes such as CDMA, SDMA, and OFDM. Finally, we look at some pilot products developed to verify the effect of adaptive antennas in practical environments, and introduce a couple of pioneering pieces of equipment.

  • Effect of Conductive Sheet Placed over PCB on Electromagnetic Noise Shielding

    Motoshi TANAKA  Hisashi TAKITA  Hiroshi INOUE

    PAPER-Electromagnetic Compatibility (EMC)
    Vol: E86-B No:3, Page(s): 1125-1131

    This paper investigates the electromagnetic noise shielding effect of a conductive sheet placed over a PCB with a microstrip line. A copper sheet, which is not grounded, is used as a typical conductive sheet. First, the input impedance of the microstrip line and the magnetic field are measured while the distance between the PCB and the sheet is varied, and the distance is set at 8 mm, which does not affect signal transmission. Second, the effect of the sheet size on magnetic field radiation is examined by measurements and FDTD modeling, and the magnetic near-field distribution around the PCB is visualized using FDTD calculations. A conductive sheet wider than the PCB should be effective in suppressing magnetic near-field noise radiation just above the PCB.

  • A Stochastic F0 Contour Model Based on Clustering and a Probabilistic Measure

    Yoichi YAMASHITA  Tomoyoshi ISHIDA  Kazuki SHIMADERA

    PAPER-Speech Synthesis and Prosody
    Vol: E86-D No:3, Page(s): 543-549

    One of the fundamental issues in F0 contour modeling is the relationship between F0 parameters and the linguistic information of a sentence. This paper proposes a stochastic F0 model that probabilistically models the relationship between the F0 contour and linguistic information. For speech synthesis, an F0 generator selects the most probable F0 contour from candidates given by the probabilistic F0 model. The F0 contour of a Japanese sentence is represented by concatenating F0 patterns of a Japanese syntactic unit, the bunsetsu. A bunsetsu F0 pattern consists of an F0 average and an F0 shape. The F0 average is predicted independently for each bunsetsu from its linguistic features by quantification theory. The most probable sequence of bunsetsu F0 shapes for a sentence is found in an F0 shape database based on a probabilistic measure. The probability that an F0 contour is observed for a sentence is defined by two kinds of probabilities: the F0 shape production probability and the F0 shape bigram probability. The latter is the probability that two F0 shapes occur adjacently, similar to a word bigram in speech recognition. Several typical bunsetsu F0 shapes are extracted by clustering the training data and stored in the F0 shape database. The F0 shape production probability is computed for each bunsetsu from the distribution of linguistic feature values within a cluster. The RMS prediction error of the F0 contour is 0.26 octave.

  • Modified Restricted Temporal Decomposition and Its Application to Low Rate Speech Coding

    Phu Chien NGUYEN  Takao OCHI  Masato AKAGI

    PAPER-Speech and Audio Coding
    Vol: E86-D No:3, Page(s): 397-405

    This paper presents a method of temporal decomposition (TD) for line spectral frequency (LSF) parameters, called "Modified Restricted Temporal Decomposition" (MRTD), and its application to low-rate speech coding. LSF parameters have not been used for TD because of stability problems in the linear predictive coding (LPC) model. To overcome this deficiency, the proposed TD method applies a refinement process to the event vectors to preserve their LSF ordering property. Meanwhile, the restricted second-order TD model, in which only two adjacent event functions can overlap and all event functions sum to one at any time, is used to reduce the computational cost of TD. In addition, based on the geometric interpretation of TD, the MRTD method enforces a new property on the event functions, named "well-shapedness," to model the temporal structure of speech more effectively. This paper also proposes a speech coding method at rates around 1.2 kbps based on STRAIGHT, a high-quality speech analysis-synthesis method, using MRTD. In this method, MRTD-based vector quantization is used to encode the spectral information of speech. Subjective test results indicate that the speech quality of the proposed coder is close to that of the 4.8 kbps FS-1016 CELP coder.

  • A Silence Compression Algorithm for the Multi-Rate Dual-Bandwidth MPEG-4 CELP Standard

    Masahiro SERIZAWA  Hironori ITO  Toshiyuki NOMURA

    PAPER-Speech and Audio Coding
    Vol: E86-D No:3, Page(s): 412-417

    This paper proposes a silence compression algorithm operating at multiple rates (MR) and with dual bandwidths (DB), narrowband and wideband, for the MPEG (Moving Picture Experts Group)-4 CELP (Code Excited Linear Prediction) standard. The MR/DB operations are implemented by a Variable-Frame-size/Dual-Bandwidth Voice Activity Detection (VF/DB-VAD) module with bandwidth conversion of the input signal, and a Variable-Frame-size Comfort Noise Generator (VF-CNG) module. The CNG module adaptively smooths the Root Mean Square (RMS) value of the input signal to improve coding quality during transition periods. The algorithm also employs a Dual-Rate Discontinuous Transmission (DR-DTX) module to reduce the average transmission bitrate during silence periods. Subjective test results show that the proposed algorithm causes no degradation in coding quality for clean and noisy speech signals. These signals contain about 20 to 30% non-speech frames, and the average transmission bitrates are reduced by 20 to 40%. The proposed algorithm has been adopted as part of the ISO/IEC MPEG-4 CELP version 2 standard.

  • Signal Processing Representations of Speech

    W. Bastiaan KLEIJN

    INVITED SURVEY PAPER
    Vol: E86-D No:3, Page(s): 359-376

    Synergies in processing requirements and knowledge of human speech production and perception have led to a similarity of the speech signal representations used for the tasks of recognition, coding, and modification. The representations are generally composed of a description of the vocal-tract transfer function and, in the case of coding and modification, a description of the excitation signal. This paper provides an overview of commonly used representations. For coding and modification, autoregressive models represented by line spectral frequencies perform well for the vocal tract, and pitch-synchronous filter banks and modulation-domain filters perform well for the excitation. For recognition, good representations are based on a smoothed magnitude response of the vocal tract.

  • Continuous Speech Recognition Using an On-Line Speaker Adaptation Method Based on Automatic Speaker Clustering

    Wei ZHANG  Seiichi NAKAGAWA

    PAPER-Speech and Speaker Recognition
    Vol: E86-D No:3, Page(s): 464-473

    This paper evaluates an on-line incremental speaker adaptation method for co-channel conversations involving multiple speakers, assuming that the speakers are unknown and change frequently. After each utterance is assigned to a speaker cluster based on vector quantization (VQ) distortion, the acoustic models for each cluster are adapted by Maximum Likelihood Linear Regression (MLLR) or Maximum A Posteriori (MAP) estimation, which can improve continuous speech recognition performance. To demonstrate the effectiveness of the speaker clustering method, continuous speech recognition experiments were conducted with both supervised and unsupervised cluster adaptation. Finally, evaluation experiments on other prepared test data were performed for continuous syllable recognition and large vocabulary continuous speech recognition (LVCSR). The experimental results strongly support the effectiveness of the speaker adaptation and clustering methods presented in this paper.

  • Filter Bank Subtraction for Robust Speech Recognition

    Kazuo ONOE  Hiroyuki SEGI  Takeshi KOBAYAKAWA  Shoei SATO  Shinichi HOMMA  Toru IMAI  Akio ANDO

    PAPER-Robust Speech Recognition and Enhancement
    Vol: E86-D No:3, Page(s): 483-488

    In this paper, we propose a new filter bank subtraction technique for robust speech recognition under various acoustic conditions. Spectral subtraction is a simple and useful technique for reducing the influence of additive noise. Conventional spectral subtraction assumes accurate estimation of the noise spectrum and no correlation between speech and noise. These assumptions, however, are rarely satisfied in practice, which degrades speech recognition accuracy. Moreover, conventional methods yield only slight recognition improvements when the input SNR changes sharply. In the proposed method, the output values of filter banks are used for noise estimation and subtraction. Estimating noise in each filter bank channel, rather than at each frequency point, reduces the need for precise noise estimation. We also take into account the expected phase differences between the speech and noise spectra in the subtraction and control the subtraction coefficient theoretically. Recognition experiments on test sets at several SNRs showed that filter bank subtraction improved word accuracy significantly and outperformed conventional spectral subtraction on all test sets. In further experiments on TV news field reports with environmental noise, the proposed method also yielded better results than the conventional method.

  • Face-to-Talk: Audio-Visual Speech Detection for Robust Speech Recognition in Noisy Environment

    Kazumasa MURAI  Satoshi NAKAMURA

    PAPER-Robust Speech Recognition and Enhancement
    Vol: E86-D No:3, Page(s): 505-513

    This paper discusses "face-to-talk" audio-visual speech detection for robust speech recognition in noisy environments, which consists of a facial-orientation-based switch and audio-visual speech section detection. Most of today's speech recognition systems must be turned on and off by a switch, e.g., "push-to-talk", to indicate which utterance should be recognized, and a specific speech section must be detected prior to any further analysis. To improve usability and performance, we have studied how to extract useful information from the visual modality. We implemented a facial-orientation-based switch that activates speech recognition while the speaker is facing the camera. The speech section is then detected by analyzing the image of the face. Visual speech detection is robust to audio noise, but because articulation starts before the speech and lasts longer than the speech, the detected section tends to be too long, resulting in insertion errors. Therefore, we fuse the sections detected from the audio and visual modalities. Our experiment confirms that the proposed audio-visual speech detection method improves recognition performance in noisy environments.

  • A Hybrid HMM/BN Acoustic Model for Automatic Speech Recognition

    Konstantin MARKOV  Satoshi NAKAMURA

    PAPER-Speech and Speaker Recognition
    Vol: E86-D No:3, Page(s): 438-445

    In current HMM-based speech recognition systems, it is difficult to supplement acoustic spectrum features with additional information such as pitch, gender, and articulator positions. Bayesian networks (BNs), on the other hand, allow easy combination of different continuous and discrete features by exploiting the conditional dependencies between them. However, the lack of efficient algorithms has limited their application in continuous speech recognition. In this paper, we propose a new acoustic model in which HMMs model the temporal characteristics of speech and the state probability model is represented by a BN. In our experimental system based on the HMM/BN model, the state BN has, in addition to the speech observation variable, two more (hidden) variables representing noise type and SNR value. Evaluation on the AURORA2 database showed a 36.4% word error rate reduction for the closed noise test, which is comparable with much more complex systems that use effective adaptation and noise-robust methods.

  • Ensemble Monte Carlo/Molecular Dynamics Simulation of Inversion Layer Mobility in Si MOSFETs--Effects of Substrate Impurity

    Yoshinari KAMAKURA  Hironori RYOUKE  Kenji TANIGUCHI

    PAPER
    Vol: E86-C No:3, Page(s): 357-362

    Electron transport in bulk Si and MOSFET inversion layers is studied using an ensemble Monte Carlo (EMC) technique coupled with the molecular dynamics (MD) method. The Coulomb interactions among point charges (electrons and negative ions) are taken into account directly in the simulation. It is demonstrated that the EMC/MD method correctly simulates the static screening of Coulomb interactions. Furthermore, we calculate the inversion layer mobility in Si MOSFETs, and the simulation shows mobility roll-off near the threshold voltage.
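
The spectrum-entropy criterion in the speaker-tracking letter by Nokas and Dermatas (E86-D No:4) can be illustrated with a minimal sketch. It assumes that the MV beamformer output power spectrum for each candidate DOA is already available; the candidate angles and spectra below are hypothetical, and the letter's fine-tuning stage with a smaller analysis window is not reproduced.

```python
import math
from typing import Dict, Sequence


def spectral_entropy(power: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a power spectrum normalized to sum to one."""
    total = sum(power)
    if total <= 0:
        raise ValueError("power spectrum must have positive total energy")
    entropy = 0.0
    for value in power:
        p = value / total
        if p > 0:  # treat 0 * log(0) as 0
            entropy -= p * math.log(p)
    return entropy


def select_speech_doa(spectra: Dict[float, Sequence[float]]) -> float:
    """Pick the candidate DOA whose beamformer output spectrum has minimum entropy.

    A peaky (speech-like) spectrum has lower entropy than a flat (noise-like) one.
    """
    return min(spectra, key=lambda doa: spectral_entropy(spectra[doa]))


# Hypothetical beamformer output power spectra (8 bins) for three candidate DOAs in degrees.
candidate_spectra = {
    -30.0: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],  # flat, noise-like
    10.0: [0.1, 4.0, 0.2, 3.0, 0.1, 0.5, 0.1, 0.1],   # peaky, speech-like
    45.0: [2.0, 1.5, 1.0, 1.2, 0.9, 1.1, 1.0, 1.3],   # mildly coloured noise
}

print(select_speech_doa(candidate_spectra))  # 10.0
```

Entropy is used here only as a peakiness measure; any real system would compute the spectra from the beamformer output frames rather than take them as given.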
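
The per-channel idea in the filter bank subtraction paper by Onoe et al. (E86-D No:3) can be sketched as below. This is a generic floored subtraction on filter bank outputs. It assumes that the first few frames contain only noise and uses a fixed subtraction coefficient `alpha` and floor `beta`; the paper's theoretically controlled coefficient, which accounts for the phase difference between speech and noise, is not reproduced here, and all function and parameter names are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[float]  # filter bank output magnitudes for one analysis frame


def estimate_noise(frames: Sequence[Frame], n_noise_frames: int) -> List[float]:
    """Average each filter bank channel over the first n_noise_frames frames.

    Assumption: those leading frames contain noise only (no speech).
    """
    if not 0 < n_noise_frames <= len(frames):
        raise ValueError("n_noise_frames must be between 1 and len(frames)")
    n_channels = len(frames[0])
    return [
        sum(frame[k] for frame in frames[:n_noise_frames]) / n_noise_frames
        for k in range(n_channels)
    ]


def filter_bank_subtract(
    frames: Sequence[Frame], noise: Sequence[float], alpha: float = 1.0, beta: float = 0.1
) -> List[List[float]]:
    """Subtract alpha * noise from each channel and floor the result at beta * input."""
    return [
        [max(x - alpha * n, beta * x) for x, n in zip(frame, noise)]
        for frame in frames
    ]


# Two noise-only frames followed by one frame containing speech (2 channels, toy values).
frames = [[1.0, 2.0], [1.0, 2.0], [5.0, 2.5]]
noise = estimate_noise(frames, n_noise_frames=2)
cleaned = filter_bank_subtract(frames, noise)
print(noise)       # [1.0, 2.0]
print(cleaned[2])  # [4.0, 0.5]
```

Working per channel rather than per FFT bin means each noise estimate averages over many bins, which is the robustness argument the abstract gives.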
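
Under the restricted second-order TD model described in the MRTD paper by Nguyen et al. (E86-D No:3), each frame between two adjacent events is a weighted sum of just two event vectors whose weights sum to one. The sketch below additionally assumes non-negative event functions, so the weight lies in [0, 1], and reconstructs such frames; the event vectors and weights are hypothetical, and the MRTD refinement and well-shapedness steps are not shown. Because every reconstructed frame is then a convex combination of two LSF vectors, it keeps the ascending LSF order whenever both event vectors do.

```python
from typing import List, Sequence


def reconstruct_frame(a_k: Sequence[float], a_next: Sequence[float], phi_k: float) -> List[float]:
    """Restricted 2nd-order TD: y(n) = phi_k(n) * a_k + (1 - phi_k(n)) * a_{k+1}.

    phi_k is the value of event function k at frame n; the adjacent event function
    supplies the remaining weight because the two must sum to one.
    """
    if not 0.0 <= phi_k <= 1.0:
        raise ValueError("phi_k must lie in [0, 1] (non-negative event functions assumed)")
    if len(a_k) != len(a_next):
        raise ValueError("event vectors must have the same dimension")
    return [phi_k * x + (1.0 - phi_k) * y for x, y in zip(a_k, a_next)]


def is_ascending(lsf: Sequence[float]) -> bool:
    """LSF ordering property: strictly increasing frequencies."""
    return all(lo < hi for lo, hi in zip(lsf, lsf[1:]))


# Hypothetical normalized LSF event vectors and event-function values between them.
event_a = [0.10, 0.30, 0.60]
event_b = [0.20, 0.40, 0.90]
for phi in [1.0, 0.75, 0.5, 0.25, 0.0]:
    frame = reconstruct_frame(event_a, event_b, phi)
    print(phi, [round(v, 3) for v in frame], is_ascending(frame))
```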
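
The search for the most probable bunsetsu F0 shape sequence in the stochastic F0 contour model of Yamashita et al. (E86-D No:3) combines a per-bunsetsu shape production probability with a shape bigram. A minimal Viterbi sketch of that search follows. It assumes that the two probabilities are simply multiplied (added in the log domain), that the first bunsetsu has no bigram term, and that the shape labels and probability tables are already given; the labels "rise" and "fall" and all values are hypothetical, and the F0 average prediction is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple


def best_shape_sequence(
    production: Sequence[Dict[str, float]],
    bigram: Dict[Tuple[str, str], float],
) -> Tuple[List[str], float]:
    """Viterbi search for the most probable sequence of F0 shape labels.

    production[t][s]: P(shape s | linguistic features of bunsetsu t)
    bigram[(prev, s)]: P(shape s follows shape prev); missing pairs count as zero.
    Returns the best label sequence and its log probability.
    """
    if not production:
        return [], 0.0
    # best[s] = (log prob of the best partial path ending in s, that path)
    best = {s: (math.log(p), [s]) for s, p in production[0].items() if p > 0}
    for probs in production[1:]:
        if not best:
            break
        new_best = {}
        for s, p in probs.items():
            if p <= 0:
                continue
            options = [
                (score + math.log(bigram[(prev, s)]) + math.log(p), path + [s])
                for prev, (score, path) in best.items()
                if bigram.get((prev, s), 0.0) > 0
            ]
            if options:
                new_best[s] = max(options, key=lambda option: option[0])
        best = new_best
    if not best:
        raise ValueError("no shape sequence has non-zero probability")
    score, path = max(best.values(), key=lambda option: option[0])
    return path, score


# Hypothetical two-bunsetsu sentence with two candidate shapes per bunsetsu.
production = [{"rise": 0.6, "fall": 0.4}, {"rise": 0.5, "fall": 0.5}]
bigram = {("rise", "rise"): 0.1, ("rise", "fall"): 0.9,
          ("fall", "rise"): 0.5, ("fall", "fall"): 0.5}
path, log_prob = best_shape_sequence(production, bigram)
print(path, round(math.exp(log_prob), 3))  # ['rise', 'fall'] 0.27
```

The bigram term is what breaks the tie at the second bunsetsu here, mirroring the role the abstract gives it by analogy with a word bigram.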
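
In the hybrid HMM/BN model of Markov and Nakamura (E86-D No:3), each HMM state's output probability comes from a Bayesian network with hidden noise-type and SNR variables. One way to evaluate such a state likelihood is to marginalize the hidden variables: p(o | q) = sum over n and s of P(n | q) P(s | q, n) p(o | q, n, s). The sketch below assumes that factorization (noise type as parent of SNR, both as parents of the observation), a scalar observation with one Gaussian per (noise type, SNR) pair, and hypothetical names and values; the paper's actual network structure and densities may differ.

```python
import math
from typing import Dict, Tuple


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Univariate normal density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def state_likelihood(
    o: float,
    p_noise: Dict[str, float],
    p_snr_given_noise: Dict[str, Dict[str, float]],
    emission: Dict[Tuple[str, str], Tuple[float, float]],
) -> float:
    """p(o | state) = sum_n sum_s P(n) P(s | n) N(o; mean_ns, var_ns) for one HMM state.

    All tables belong to a single state q, so the conditioning on q is implicit.
    """
    total = 0.0
    for noise, p_n in p_noise.items():
        for snr, p_s in p_snr_given_noise[noise].items():
            mean, var = emission[(noise, snr)]
            total += p_n * p_s * gaussian_pdf(o, mean, var)
    return total


# Hypothetical tables for one HMM state.
p_noise = {"car": 0.7, "babble": 0.3}
p_snr_given_noise = {"car": {"high": 0.6, "low": 0.4}, "babble": {"high": 0.5, "low": 0.5}}
emission = {("car", "high"): (0.0, 1.0), ("car", "low"): (0.5, 2.0),
            ("babble", "high"): (0.2, 1.0), ("babble", "low"): (1.0, 3.0)}
print(state_likelihood(0.3, p_noise, p_snr_given_noise, emission))
```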