Keyword Search Result

[Keyword] EE(4073hit)

3041-3060hit(4073hit)

  • Speech Enhancement by Profile Fitting Method

    Osamu ICHIKAWA  Tetsuya TAKIGUCHI  Masafumi NISHIMURA  

     
    PAPER-Robust Speech Recognition and Enhancement

    Vol: E86-D No:3  Page(s): 514-521

    It is believed that distant-talking speech recognition in a noisy environment requires a large-scale microphone array. However, such an array cannot fit into small consumer devices. Our objective is to improve performance with a limited number of microphones (preferably only two, left and right). In this paper, we focus on a profile, that is, the shape of the power distribution as a function of the beamforming direction. An observed profile can be decomposed into known profiles for directional sound sources and a profile for a non-directional background sound source. Evaluations confirmed that this method reduced the CER (Character Error Ratio) for the dictation task by more than 20% compared to a conventional 2-channel Adaptive Spectral Subtraction beamformer in a non-reverberant environment.

  • Grey Filtering and Its Application to Speech Enhancement

    Cheng-Hsiung HSIEH  

     
    PAPER-Robust Speech Recognition and Enhancement

    Vol: E86-D No:3  Page(s): 522-533

    In this paper, a grey filtering approach based on the GM(1,1) model is proposed and then applied to speech enhancement. The fundamental idea of the proposed grey filtering is to relate the estimation error of the GM(1,1) model to additive noise. Simulation results indicate that the additive noise can be estimated accurately by the proposed grey filtering approach with an appropriate scaling factor. Note that the spectral subtraction approach to speech enhancement depends heavily on the accuracy of the additive-noise statistics, and that grey filtering is able to estimate the additive noise appropriately. A magnitude spectral subtraction (MSS) approach to speech enhancement is proposed in which no mechanism to distinguish speech and non-speech portions is required. Two examples are provided to justify the proposed MSS approach based on grey filtering. The simulation results show that the objective of speech enhancement has been achieved by the proposed MSS approach. In addition, the proposed MSS approach is compared with the HFR-based approach in [4] and the ZP approach in [5]. Simulation results indicate that in most cases the HFR-based and ZP approaches outperform the proposed MSS approach in SNR improvement (SNRimp). However, the proposed MSS approach has better subjective listening quality than the HFR-based and ZP approaches.

  • Crosstalk Equalization for High-Speed Digital Transmission Systems

    Hui-Chul WON  Gi-Hong IM  

     
    PAPER-Wireless Communication Technology

    Vol: E86-B No:3  Page(s): 1063-1072

    In this paper, we discuss a crosstalk equalization technique for high-speed digital transmission systems. This technique exploits the cyclostationarity of the crosstalk interferer. We first analyze the eigenstructure of the equalizer in the presence of cyclostationary crosstalk interference. It is shown that the eigenvalues of the equalizer depend on the folded signal and interferer power spectra, and on the cross power spectrum between the signal and the interferer. The expressions for the minimum mean square error (MMSE) and the excess MSE are then obtained from the equalizer's eigenstructure. Analysis and simulation results indicate that this peculiar eigenstructure in the presence of cyclostationary interference leads to significantly different initial convergence and steady-state behavior compared with the stationary noise case. We also show that the performance of the equalizer varies with the relative phase of the symbol clocks used by the signal and the crosstalk interferer.

  • Improved Downlink Performance of Transmit Adaptive Array with Limited Feedback Channel Rate by Applying Transmit Antenna Selection

    Cheol Yong AHN  Dong Ku KIM  

     
    LETTER-Antenna and Propagation

    Vol: E86-B No:3  Page(s): 1186-1190

    A transmit adaptive array requires the forward-link channel state to evaluate the optimum transmit weight, and a feedback channel carries this channel state to the base station. Since the feedback information limits the transmission rate of the reverse-link traffic, the number of feedback bits must be kept to a minimum. This paper presents a system in which N transmit antennas are extended to 2N transmit antennas while the feedback channel remains limited to that of an N-transmit-antenna system. The additional antennas can provide extra diversity gain but require a higher feedback bit rate. With a limited feedback channel, the quantization error of the feedback information increases, since fewer feedback bits are assigned to each antenna. To overcome this limited feedback rate, this paper proposes transmit antenna selection schemes that use the limited feedback bits effectively, reduce the computational complexity at the mobile station, and still achieve diversity gain. System performance is investigated for N=4 with various antenna selection schemes on both flat fading and multipath fading channels.

  • A New Dynamic D-Flip-Flop Aiming at Glitch and Charge Sharing Free

    Sung-Hyun YANG  Younggap YOU  Kyoung-Rok CHO  

     
    PAPER-Electronic Circuits

    Vol: E86-C No:3  Page(s): 496-505

    A dual-modulus (divide-by-128/129) prescaler has been designed in 0.25-µm CMOS technology using new D-flip-flops. The new D-flip-flops are free from glitch problems caused by internal charge sharing. A transistor merging technique has been employed to reduce the number of transistors and to secure reliable high-speed operation. At a 2.5-V supply voltage, the prescaler using the proposed dynamic D-flip-flops can operate at frequencies up to 2.95 GHz and consumes about 10% and about 27% less power than Yuan/Svensson's and Huang's circuits, respectively.

  • Estimating Syntactic Structure from Prosody in Japanese Speech

    Tomoko OHSUGA  Yasuo HORIUCHI  Akira ICHIKAWA  

     
    PAPER-Speech Synthesis and Prosody

    Vol: E86-D No:3  Page(s): 558-564

    In this study, we introduce a method for estimating the syntactic structure of Japanese speech from the F0 contour and pause duration. We define a prosodic unit (PU) as a segment bounded by local minima of the F0 contour or by pauses. By repeatedly combining PUs (a pair of PUs is merged into one PU), a tree structure is gradually generated. Which pair of PUs in a sequence of three PUs should be combined is decided by a discriminant function derived from discriminant analysis of a speech corpus. We applied the method to the ATR Phonetically Balanced Sentences read by four Japanese speakers. With this method, the correct judgement rate for each sequence of three PUs was 79%, and the estimation accuracy of the entire syntactic structure of each sentence was 26%. We consider this a good degree of accuracy for the difficult task of estimating syntactic structure from prosody alone.

  • Speaker Recognition Using Adaptively Boosted Classifiers

    Say-Wei FOO  Eng-Guan LIM  

     
    PAPER-Speech and Speaker Recognition

    Vol: E86-D No:3  Page(s): 474-482

    In this paper, a novel approach to speaker recognition is proposed. The approach uses adaptive boosting (AdaBoost) with classifiers such as Multilayer Perceptrons (MLP) and C4.5 Decision Trees for closed-set, text-dependent speaker recognition. The performance of the systems is assessed on a subset of utterances drawn from the YOHO speaker verification corpus. Experiments show that applying adaptive boosting techniques can significantly improve accuracy. Results also show that the adaptively boosted C4.5 system can achieve a speaker identification accuracy of 98.8%.

  • Establishment of Protection Paths Using Maximum Degree of Sharing in WDM Networks

    Jian-Qing LI  Hong-Shik PARK  Hyeong-Ho LEE  

     
    PAPER-Network Management/Operation

    Vol: E86-B No:3  Page(s): 1109-1116

    In wavelength division multiplexed (WDM) networks, shared path protection provides the same level of protection against a single fiber-link failure as dedicated path protection, with potentially higher network utilization. However, shared path protection is more complex to provision and maintain. In this paper, we introduce a parameter, the degree of sharing, which is the number of protection paths to which a wavelength on a link can be assigned. We propose methods for calculating the maximum degree of sharing. We consider on-line routing and wavelength assignment (RWA) of protection paths established for incremental traffic using the maximum degree of sharing. Establishing protection paths using the maximum degree of sharing can simplify the algorithm. We compare the results in terms of the reduction in calculation time and the number of accepted connection requests for a given number of wavelengths, assuming that wavelengths are assigned according to the First-Fit policy for working paths and the Last-Fit policy for protection paths. The more wavelengths are used, the more the calculation time is reduced. As the load increases, the rate of reduction in calculation time also increases.

  • A New Multistage Search of Algebraic CELP Codebooks Based on Trellis Coding

    Mohammed HALIMI  Abdellah KADDAI  Messaoud BENGHERABI  

     
    PAPER-Speech and Audio Coding

    Vol: E86-D No:3  Page(s): 406-411

    This paper proposes a new multistage search technique for the algebraic codebook in CELP coders, called Trellis Search, inspired by Trellis Coded Quantization (TCQ). This search technique is implemented in the fixed codebook of the G.729 standard and evaluated objectively on a large speech test database. Simulation results show that the proposed search scheme reduces the codebook search execution time by approximately 23% compared with the focused search used in standard G.729. This yields a reduction of about 8% in the execution time of the whole coder, at the cost of a slight degradation in speech quality that is perceptually not noticeable. Moreover, the new technique provides better speech quality than G.729A, at the expense of higher complexity.

  • A New Approach to Blind System Identification in MEG Data

    Kuniharu KISHIDA  Hidekazu FUKAI  Takashi HARA  Kazuhiro SHINOSAKI  

     
    PAPER-Applications

    Vol: E86-A No:3  Page(s): 611-619

    A new blind identification method for transfer functions between variables in feedback systems is introduced for single-sweep MEG data. The method is based on the viewpoint of stochastic/statistical inverse problems. The model requires stationary, linear Gaussian processes. Raw MEG data of brain activity are heavily contaminated with various noise and artifacts, and eliminating them is a crucial problem, especially for this method. Usually, this noise and these artifacts are removed by notch and high-pass filters that are preset automatically. In the present paper, we try two more careful preprocessing procedures for the identification method to obtain impulse responses. One is careful notch filtering and the other is a blind source separation method based on temporal structure. As a result, the identifiability of the transfer functions and their impulse responses is improved in both cases. Transfer functions and impulse responses between MEG sensors are identified using the method in Appendix A for the eyes-closed resting state. Some advantages of the blind source separation method are discussed.

  • Statistical Threshold Voltage Fluctuation Analysis by Monte Carlo Ion Implantation Method

    Yoshinori ODA  Yasuyuki OHKURA  Kaina SUZUKI  Sanae ITO  Hirotaka AMAKAWA  Kenji NISHI  

     
    PAPER

    Vol: E86-C No:3  Page(s): 416-420

    A new method is presented for analyzing random-dopant-induced threshold voltage fluctuations using Monte Carlo ion implantation. The method was applied with a 3D process-device simulation system to investigate Vt fluctuations due to statistical variation of the pocket dopant profile in 0.1-µm MOSFETs. The method is useful for efficiently analyzing statistical fluctuations in sub-100-nm MOSFETs.

  • Audio-Visual Speech Recognition Based on Optimized Product HMMs and GMM Based-MCE-GPD Stream Weight Estimation

    Kenichi KUMATANI  Satoshi NAKAMURA  

     
    PAPER-Speech and Speaker Recognition

    Vol: E86-D No:3  Page(s): 454-463

    In this paper, we describe an adaptive integration method for an audio-visual speech recognition system that uses not only the speaker's audio speech signal but also visual speech signals such as lip images. Human beings communicate with each other by integrating multiple types of sensory information such as hearing and vision, and such integration can also be applied to automatic speech recognition. In integrating audio and visual speech features for speech recognition, there are two important issues: (1) a model that represents the synchronous and asynchronous characteristics between audio and visual features and makes the best use of a whole database that includes uni-modal (audio-only or visual-only) data as well as audio-visual data, and (2) the adaptive estimation of reliability weights for the audio and visual information. This paper mainly investigates these two issues and proposes a novel method to effectively integrate audio and visual information in an audio-visual Automatic Speech Recognition (ASR) system. First, as the model that integrates audio-visual speech information, we apply a product of hidden Markov models (product HMM), the product of an audio HMM and a visual HMM. We newly propose a method that re-estimates the product HMM using audio-visual synchronous speech data so as to train the synchronicity of the audio-visual information, whereas the original product HMM assumes that the audio and visual features are independent. Second, for optimal estimation of the audio-visual reliability weights, we propose a Gaussian mixture model (GMM) based MCE-GPD (minimum classification error and generalized probabilistic descent) algorithm, which reduces the amount of adaptation data and computation required for the GMM estimation. Evaluation experiments show that the proposed audio-visual speech recognition system improves recognition accuracy over conventional systems even when the audio signals are clean.

  • A Study on Acoustic Modeling of Pauses for Recognizing Noisy Conversational Speech

    Jin-Song ZHANG  Konstantin MARKOV  Tomoko MATSUI  Satoshi NAKAMURA  

     
    PAPER-Robust Speech Recognition and Enhancement

    Vol: E86-D No:3  Page(s): 489-496

    This paper presents a study on modeling inter-word pauses to improve the robustness of acoustic models for recognizing noisy conversational speech. When precise contextual modeling is used for pauses, the frequent occurrence and varying acoustics of pauses in noisy conversational speech make it difficult to automatically generate an accurate phonetic transcription of the training data for developing robust acoustic models. This paper proposes exploiting reliable phonetic heuristics of pauses in speech to aid the detection of varying pauses. Based on this, a stepwise approach to optimizing pause HMMs was applied to the data of the DARPA SPINE2 project, and a more accurate phonetic transcription was achieved. The cross-word triphone HMMs developed with this method achieved an absolute word error reduction of 9.2% compared with the conventional method using only context-free modeling of pauses. For the same pause modeling method, using the optimized phonetic segmentation brought an absolute improvement of 5.2%.

  • A Context Clustering Technique for Average Voice Models

    Junichi YAMAGISHI  Masatsune TAMURA  Takashi MASUKO  Keiichi TOKUDA  Takao KOBAYASHI  

     
    PAPER-Speech Synthesis and Prosody

    Vol: E86-D No:3  Page(s): 534-542

    This paper describes a new context clustering technique for the average voice model, which is a set of speaker-independent speech synthesis units. In this technique, we first train speaker-dependent models using a multi-speaker speech database, and then construct a decision tree common to these speaker-dependent models for context clustering. When a node of the decision tree is split, only context-related questions that are applicable to all speaker-dependent models are adopted. As a result, every node of the decision tree always has training data from all speakers. After the decision tree is constructed, all speaker-dependent models are clustered using the common decision tree, and a speaker-independent model, i.e., an average voice model, is obtained by combining the speaker-dependent models. Subjective tests show that the average voice models trained with the proposed technique can generate more natural-sounding speech than conventional average voice models.

  • Automatic Estimation of Accentual Attribute Values of Words for Accent Sandhi Rules of Japanese Text-to-Speech Conversion

    Nobuaki MINEMATSU  Ryuji KITA  Keikichi HIROSE  

     
    PAPER-Speech Synthesis and Prosody

    Vol: E86-D No:3  Page(s): 550-557

    Accurate estimation of the accentual attribute values of words, which is required to apply the rules of Japanese word accent sandhi to prosody generation, is an important factor in realizing high-quality text-to-speech (TTS) conversion. The rules were formulated by Sagisaka et al. and are widely used in Japanese TTS conversion systems. Applying these rules, however, requires the values of a few accentual attributes for each constituent word of the input text. These attribute values cannot be found in any public database or accent dictionary of Japanese. Further, the values are difficult even for native speakers of Japanese to estimate through introspection on the properties of their mother tongue alone. In this paper, an algorithm is proposed in which these values are estimated automatically from a large amount of data on the accent types of accentual phrases, collected through a long series of listening experiments. The proposed algorithm takes inter-speaker differences in knowledge of accent sandhi into account. To improve the coverage of the estimated values over the obtained data, the rules were tentatively modified. Evaluation experiments using two-mora accentual phrases showed the high validity of the estimated values and the modified rules, and also revealed some defects caused by the variety of linguistic expressions in Japanese.

  • Quantum Electron Transport Modeling in Nano-Scale Devices

    Matsuto OGAWA  Hideaki TSUCHIYA  Tanroku MIYOSHI  

     
    INVITED PAPER

    Vol: E86-C No:3  Page(s): 363-371

    We describe the progress we have achieved in developing our quantum transport modeling for nano-scale devices. Our simulation is based on either the non-equilibrium Green's function (NEGF) method or the quantum correction (QC) associated with the density gradient (DG) method and/or the effective potential (EP) method. We show the results of our modeling methods applied to several devices and discuss the issues faced with regard to computational time, open boundary conditions, and their relationship to the self-consistent solution of the Poisson-NEGF equations. We also discuss these issues for efficiently tailored QC Monte Carlo techniques.

  • A Nonlinear Model on the AQM Algorithm GREEN

    Hongwei KONG  Ning GE  Fang RUAN  Chongxi FENG  Pingyi FAN  

     
    PAPER-Packet Transmission

    Vol: E86-B No:2  Page(s): 622-629

    In this paper, we propose a nonlinear control model to characterize the AQM algorithm GREEN. Based on this model, we analyze its performance and prove that a stable oscillation exists at equilibrium. Furthermore, we investigate the effects of factors such as bandwidth, round trip time, and load level on the amplitude and frequency of the oscillation. Theoretical analysis and simulation results indicate that the GREEN algorithm is insensitive to network conditions when the link rate and the round trip time are relatively small, and becomes more sensitive to changes in network conditions when the bandwidth-delay product is relatively high. For GREEN, adaptability to a wide range of network conditions comes at the expense of efficiency.

  • Stability Evaluation of a Dynamic Traffic Engineering Method in a Large-Scale Network

    Takao OGURA  Junji SUZUKI  Akira CHUGO  Masafumi KATOH  Tomonori AOYAMA  

     
    PAPER-MPLS and Routing

    Vol: E86-B No:2  Page(s): 518-525

    As use of the Internet continues to spread rapidly, Traffic Engineering (TE) is needed to optimize IP network resource utilization. In particular, load balancing with TE can prevent traffic from concentrating on a single path between ingress and egress routers. To apply TE, we have constructed an MPLS (Multi-Protocol Label Switching) network with TE capability in the JGN (Japan Gigabit Network) and evaluated its dynamic load balancing behavior from the viewpoint of control stability. We confirmed that with this method, setting appropriate control parameter values enables traffic to be distributed equally over two or more routes in an actual large-scale network. In addition, we verified the method's effectiveness using a digital cinema application as input traffic.

  • Hardware-Efficient Architecture Design for Zerotree Coding in MPEG-4 Still Texture Coder

    Chung-Jr LIAN  Zhong-Lan YANG  Hao-Chieh CHANG  Liang-Gee CHEN  

     
    PAPER-VLSI Design Technology and CAD

    Vol: E86-A No:2  Page(s): 472-479

    This paper presents a hardware-efficient architecture of the tree-depth scan (TDS) and multiple quantization (MQ) scheme for zerotree coding in the MPEG-4 still texture coder. The proposed TDS architecture achieves its maximal throughput-to-area ratio and minimizes external memory access with an on-chip buffer of only one wavelet-tree size. The MQ scheme adopts power-of-two (POT) quantization to realize a cost-effective hardware implementation. The prototype chip has been implemented in TSMC 0.35-µm CMOS 1P4M technology. This architecture can process 30 4-CIF (704×576) frames per second with five spatial scalability layers and five SNR scalability layers at a 100-MHz operating frequency.

  • A Pipeline Structure for High-Speed Step-by-Step RS Decoding

    Tung-Chou CHEN  Che-Ho WEI  Shyue-Win WEI  

     
    LETTER-Fundamental Theories

    Vol: E86-B No:2  Page(s): 847-849

    Based on a modified step-by-step decoding procedure, a high-speed pipelined Reed-Solomon decoder is presented. The decoder requires a delay of only three 2-input XOR gates to decode each coded symbol. The decoder can operate at bit rates on the order of Gbit/s and is thus suitable for very-high-speed data transmission systems.
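
For the profile-fitting entry (Ichikawa et al.), decomposing an observed profile into a known directional profile plus a non-directional background can be illustrated as an ordinary least-squares fit. This is a minimal sketch under stated assumptions, not the authors' algorithm: a single directional source whose profile is known, a flat background profile, and naive clamping of negative powers to zero. The function name `fit_profile` is hypothetical.

```python
from typing import List, Tuple


def fit_profile(observed: List[float], target: List[float]) -> Tuple[float, float]:
    """Least-squares fit of observed[i] ~= s * target[i] + b (hypothetical helper).

    s: power attributed to the directional source with the known profile `target`.
    b: power of a flat (non-directional) background profile.
    Negative results are clamped to zero afterwards; this is a crude shortcut,
    not a proper non-negative least-squares solution.
    """
    n = len(observed)
    st = sum(target)
    stt = sum(t * t for t in target)
    so = sum(observed)
    sto = sum(t * o for t, o in zip(target, observed))
    det = n * stt - st * st
    if det == 0:
        raise ValueError("target profile must not be flat")
    s = (n * sto - st * so) / det   # slope of the normal-equation solution
    b = (stt * so - st * sto) / det  # intercept = flat background level
    return max(s, 0.0), max(b, 0.0)
```

With several directional sources, the same idea extends to one column per known profile plus a constant column.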
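
For the grey filtering entry (Hsieh), the underlying GM(1,1) model is standard: accumulate the sequence (1-AGO), fit x0(k) + a*z1(k) = b by least squares on the background values z1(k), and invert the whitened response. The sketch below computes that model and its estimation error e(k) = x0(k) - x0_hat(k), which the abstract relates to additive noise. The paper's scaling factor and frame handling are not reproduced, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def gm11_fit(x0: List[float]) -> Tuple[float, float]:
    """Estimate the GM(1,1) parameters (a, b) by least squares."""
    if len(x0) < 3:
        raise ValueError("need at least 3 samples")
    # 1-AGO: accumulated generating operation, x1(k) = x0(1) + ... + x0(k)
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    # Background values z1(k) = (x1(k) + x1(k-1)) / 2 for k = 2..n
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]
    y = x0[1:]
    # Grey equation x0(k) + a*z1(k) = b, i.e. y = c*z + b with c = -a
    m = len(z)
    sz, szz = sum(z), sum(v * v for v in z)
    sy, szy = sum(y), sum(zi * yi for zi, yi in zip(z, y))
    det = m * szz - sz * sz
    if det == 0:
        raise ValueError("degenerate sequence")
    c = (m * szy - sz * sy) / det
    b = (szz * sy - sz * szy) / det
    return -c, b


def gm11_predict(x0: List[float], a: float, b: float) -> List[float]:
    """Rebuild x0 from the whitened response x1_hat(k); requires a != 0."""
    x1_hat = [(x0[0] - b / a) * math.exp(-a * k) + b / a for k in range(len(x0))]
    # Inverse AGO: first-order differences of the fitted accumulated sequence
    return [x0[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, len(x0))]


def gm11_residual(x0: List[float]) -> List[float]:
    """Model estimation error e(k) = x0(k) - x0_hat(k)."""
    a, b = gm11_fit(x0)
    return [v - p for v, p in zip(x0, gm11_predict(x0, a, b))]
```

GM(1,1) is normally applied to positive sequences, and the prediction step divides by a, so the fitted a must be nonzero.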
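
The same entry's magnitude spectral subtraction (MSS) step can be sketched generically: subtract an estimated noise magnitude from each frequency bin of a noisy frame, floor the result, and keep the noisy phase. How the noise magnitude is obtained (grey filtering, in the paper) is left to the caller; `alpha` and `beta` are hypothetical tuning parameters, not values taken from the paper.

```python
import cmath
from typing import List


def magnitude_spectral_subtraction(noisy: List[complex], noise_mag: List[float],
                                   alpha: float = 1.0,
                                   beta: float = 0.01) -> List[complex]:
    """Generic MSS on one frame's spectrum.

    noisy:     complex spectrum Y(f) of one frame
    noise_mag: estimated noise magnitude |N(f)| per bin
    alpha:     over-subtraction factor (hypothetical default)
    beta:      spectral floor as a fraction of |Y(f)|, avoiding negative magnitudes
    """
    out = []
    for y, n in zip(noisy, noise_mag):
        mag = max(abs(y) - alpha * n, beta * abs(y))
        # Reuse the noisy phase; only the magnitude is modified
        out.append(cmath.rect(mag, cmath.phase(y)))
    return out
```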
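
The Foo and Lim entry boosts MLP and C4.5 classifiers for closed-set, multi-class speaker identification. The sketch below shows only the core reweighting loop of discrete (binary) AdaBoost, with one-dimensional decision stumps swapped in as the weak learner; it is not the paper's system, labels are assumed to be -1/+1, and all names are hypothetical.

```python
import math
from typing import List, Tuple


def stump_predict(x: float, thr: float, polarity: int) -> int:
    """Decision stump: predict `polarity` if x >= thr, else -polarity."""
    return polarity if x >= thr else -polarity


def train_stump(xs: List[float], ys: List[int], w: List[float]) -> Tuple[float, float, int]:
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = (float("inf"), 0.0, 1)
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if stump_predict(xi, thr, polarity) != yi)
            if err < best[0]:
                best = (err, thr, polarity)
    return best


def adaboost(xs: List[float], ys: List[int], rounds: int = 10):
    """Discrete AdaBoost; returns a list of (alpha, threshold, polarity)."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, thr, pol = train_stump(xs, ys, w)
        if err >= 0.5:
            break  # weak learner no better than chance: stop
        err = max(err, 1e-10)  # avoid division by zero when a stump is perfect
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, thr, pol))
        # Raise weights of misclassified samples, lower the rest, renormalize
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, thr, pol))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x: float) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, thr, pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1
```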
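
For the WDM protection entry (Li et al.), the First-Fit policy for working paths and the Last-Fit policy for protection paths, with a cap on how many protection paths may share one wavelength on a link, can be sketched as follows. Assumptions: routes are precomputed lists of links, wavelength continuity holds (one wavelength on every link of a path), and `max_sharing` is simply a given number standing in for the maximum degree of sharing, since the abstract does not say how it is computed. A complete scheme would also require the corresponding working paths to be link-disjoint, which this sketch does not check. The class and method names are hypothetical.

```python
from typing import List, Optional, Tuple

Link = Tuple[str, str]


class WavelengthState:
    """Per-link wavelength usage (hypothetical data model for illustration)."""

    def __init__(self, links: List[Link], num_wavelengths: int, max_sharing: int):
        self.num_wavelengths = num_wavelengths
        self.max_sharing = max_sharing
        # working[link][w]: wavelength w carries a working path on this link
        self.working = {l: [False] * num_wavelengths for l in links}
        # shared[link][w]: number of protection paths assigned to w on this link
        self.shared = {l: [0] * num_wavelengths for l in links}

    def first_fit_working(self, path: List[Link]) -> Optional[int]:
        """Lowest-index wavelength that is completely free on every link."""
        for w in range(self.num_wavelengths):
            if all(not self.working[l][w] and self.shared[l][w] == 0 for l in path):
                for l in path:
                    self.working[l][w] = True
                return w
        return None  # blocked

    def last_fit_protection(self, path: List[Link]) -> Optional[int]:
        """Highest-index wavelength still shareable on every link of the path."""
        for w in reversed(range(self.num_wavelengths)):
            if all(not self.working[l][w] and self.shared[l][w] < self.max_sharing
                   for l in path):
                for l in path:
                    self.shared[l][w] += 1
                return w
        return None  # blocked
```

Packing working paths from the bottom and protection paths from the top keeps the two pools apart for as long as possible, which is the usual motivation for pairing First-Fit with Last-Fit.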
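
For the audio-visual entry (Kumatani and Nakamura), stream reliability weights are commonly applied as exponents on the stream likelihoods, that is, as a weighted sum of audio and visual log-likelihoods whose weights sum to one. The sketch below assumes that form and takes the weight as given; the paper's GMM-based MCE-GPD estimation of the weights and the product-HMM state structure are not reproduced, and the names are hypothetical.

```python
from typing import Dict, Tuple


def combined_log_likelihood(log_p_audio: float, log_p_visual: float,
                            audio_weight: float) -> float:
    """lambda_a * log p_a + lambda_v * log p_v, with lambda_v = 1 - lambda_a."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * log_p_audio + (1.0 - audio_weight) * log_p_visual


def recognize(scores: Dict[str, Tuple[float, float]], audio_weight: float) -> str:
    """Pick the hypothesis with the highest weighted score.

    scores maps each hypothesis to (audio log-likelihood, visual log-likelihood).
    """
    return max(scores, key=lambda h: combined_log_likelihood(*scores[h], audio_weight))
```

Lowering `audio_weight` shifts the decision toward the visual stream, which is why adapting this weight to the acoustic conditions matters.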
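
For the zerotree coding entry (Lian et al.), the appeal of power-of-two (POT) quantization is that a step size of 2^k turns division and multiplication into bit shifts. A minimal sketch on integer wavelet coefficients, assuming sign-magnitude quantization and mid-interval reconstruction; this is not the MPEG-4 still texture coder's actual quantizer, and the function names are hypothetical.

```python
def pot_quantize(coef: int, shift: int) -> int:
    """Quantize with step 2**shift: a right shift of the magnitude, sign kept."""
    sign = -1 if coef < 0 else 1
    return sign * (abs(coef) >> shift)


def pot_dequantize(q: int, shift: int) -> int:
    """Reconstruct at the middle of the quantization interval using shifts only."""
    if q == 0:
        return 0
    sign = -1 if q < 0 else 1
    offset = (1 << (shift - 1)) if shift > 0 else 0  # half a step
    return sign * ((abs(q) << shift) + offset)
```

For example, 37 with shift 3 quantizes to 4 (37 / 8 truncated) and reconstructs to 36, the midpoint of the interval [32, 40).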

3041-3060hit(4073hit)