
Keyword Search Result

[Keyword] SI(16314hit)

11301-11320hit(16314hit)

  • The Extraction of Vehicle License Plate Region Using Edge Directional Properties of Wavelet Subband

    Sung Wook PARK  Su Cheol HWANG  Jong Wook PARK  

     
    LETTER-Image Processing, Image Pattern Recognition

      Vol:
    E86-D No:3
      Page(s):
    664-669

    Variations in vehicle structure and background make it very difficult to correctly extract a license plate region from a vehicle image. In this paper, we propose a simple method to extract the license plate region using the edge properties of wavelet subbands. The High Frequency Subband (HFS) of an image carries edge information for each direction. Compared with other regions of the vehicle image, edge information in each direction is concentrated in the Headlight-Radiator-Headlight (H-R-H) region and the license plate region. This paper presents a license plate region extraction method based on these edge properties, together with experimental results on various vehicle images.
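
To make the idea of directional edge information in high-frequency subbands concrete, here is a minimal sketch of a one-level 2-D wavelet decomposition. The abstract does not say which wavelet is used; the Haar wavelet, the function name `haar_dwt2`, and the list-of-lists image format are assumptions made only for illustration.

```python
def haar_dwt2(image):
    """One-level 2-D Haar decomposition (illustrative assumption, not the paper's code).

    `image` is a list of rows with even height and width.
    Returns (approximation, horizontal_edges, vertical_edges, diagonal).
    """
    rows, cols = len(image), len(image[0])
    approx, horiz, vert, diag = [], [], [], []
    for r in range(0, rows, 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            # The four pixels of one 2x2 block.
            tl, tr = image[r][c], image[r][c + 1]
            bl, br = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((tl + tr + bl + br) / 4.0)  # local average (low frequency)
            h_row.append((tl + tr - bl - br) / 4.0)  # top vs. bottom: horizontal edges
            v_row.append((tl - tr + bl - br) / 4.0)  # left vs. right: vertical edges
            d_row.append((tl - tr - bl + br) / 4.0)  # diagonal detail
        approx.append(a_row)
        horiz.append(h_row)
        vert.append(v_row)
        diag.append(d_row)
    return approx, horiz, vert, diag


# A 4x4 image with a vertical edge between the first and second columns.
image = [[0, 10, 10, 10] for _ in range(4)]
approx, horiz, vert, diag = haar_dwt2(image)
```

In this toy image only the vertical-edge subband is non-zero, and only in the blocks that straddle the edge. A plate detector in the spirit of the abstract could then look for regions where such directional responses are dense.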

  • Automatic Estimation of Accentual Attribute Values of Words for Accent Sandhi Rules of Japanese Text-to-Speech Conversion

    Nobuaki MINEMATSU  Ryuji KITA  Keikichi HIROSE  

     
    PAPER-Speech Synthesis and Prosody

      Vol:
    E86-D No:3
      Page(s):
    550-557

    Accurate estimation of the accentual attribute values of words, which is required to apply the rules of Japanese word accent sandhi to prosody generation, is an important factor in realizing high-quality text-to-speech (TTS) conversion. The rules were formulated by Sagisaka et al. and are widely used in Japanese TTS conversion systems. Applying these rules, however, requires the values of a few accentual attributes for each constituent word of the input text. These attribute values cannot be found in any public database or in any accent dictionary of Japanese. Further, they are difficult to estimate even for native speakers of Japanese relying only on introspection about their mother tongue. In this paper, an algorithm is proposed in which these values are automatically estimated from a large amount of data on the accent types of accentual phrases, collected through a long series of listening experiments. The proposed algorithm takes inter-speaker differences in knowledge of accent sandhi into account. To improve the coverage of the estimated values over the obtained data, the rules were tentatively modified. Evaluation experiments using two-mora accentual phrases showed the high validity of the estimated values and the modified rules, and also revealed some defects caused by the variety of linguistic expressions in Japanese.

  • Audio-Visual Speech Recognition Based on Optimized Product HMMs and GMM Based-MCE-GPD Stream Weight Estimation

    Kenichi KUMATANI  Satoshi NAKAMURA  

     
    PAPER-Speech and Speaker Recognition

      Vol:
    E86-D No:3
      Page(s):
    454-463

    In this paper, we describe an adaptive integration method for an audio-visual speech recognition system that uses not only the speaker's audio speech signal but also visual speech signals such as lip images. Human beings communicate with each other by integrating multiple types of sensory information such as hearing and vision. Such integration can be applied to automatic speech recognition, too. In the integration of audio and visual speech features for speech recognition, there are two important issues: (1) a model that represents the synchronous and asynchronous characteristics between audio and visual features, and makes the best use of a whole database that includes uni-modal (audio only or visual only) data as well as audio-visual data, and (2) the adaptive estimation of reliability weights for the audio and visual information. This paper mainly investigates these two issues and proposes a novel method to effectively integrate audio and visual information in an audio-visual Automatic Speech Recognition (ASR) system. First, as the model that integrates audio-visual speech information, we apply a product of hidden Markov models (product HMM), the product of an audio HMM and a visual HMM. We propose a new method that re-estimates the product HMM using audio-visual synchronous speech data so as to train the synchronicity of the audio-visual information, whereas the original product HMM assumes independence between the audio and visual features. Second, for optimal estimation of the audio-visual reliability weights, we propose a Gaussian mixture model (GMM) based MCE-GPD (minimum classification error and generalized probabilistic descent) algorithm, which reduces the amount of adaptation data and the amount of computation required for the GMM estimation. Evaluation experiments show that the proposed audio-visual speech recognition system improves recognition accuracy over conventional ones even if the audio signals are clean.

  • A Context Clustering Technique for Average Voice Models

    Junichi YAMAGISHI  Masatsune TAMURA  Takashi MASUKO  Keiichi TOKUDA  Takao KOBAYASHI  

     
    PAPER-Speech Synthesis and Prosody

      Vol:
    E86-D No:3
      Page(s):
    534-542

    This paper describes a new context clustering technique for an average voice model, which is a set of speaker independent speech synthesis units. In the technique, we first train speaker dependent models using a multi-speaker speech database, and then construct a decision tree common to these speaker dependent models for context clustering. When a node of the decision tree is split, only the context related questions that are applicable to all speaker dependent models are adopted. As a result, every node of the decision tree always has training data from all speakers. After construction of the decision tree, all speaker dependent models are clustered using the common decision tree, and a speaker independent model, i.e., an average voice model, is obtained by combining the speaker dependent models. From the results of subjective tests, we show that the average voice models trained using the proposed technique can generate more natural sounding speech than conventional average voice models.

  • High-Quality and Processor-Efficient Implementation of an MPEG-2 AAC Encoder

    Yuichiro TAKAMIZAWA  Toshiyuki NOMURA  Masao IKEKAWA  

     
    PAPER-Speech and Audio Coding

      Vol:
    E86-D No:3
      Page(s):
    418-424

    This paper describes a high-quality and processor-efficient software implementation of an MPEG-2 AAC LC Profile encoder. MDCT and quantization processing are accelerated by 21.3% and 19.0%, respectively, through the use of SIMD instructions. In addition, psycho-acoustic analysis in the MDCT domain makes the use of FFTs unnecessary and reduces the computational cost of the analysis by 56.0%. The results of subjective quality tests show that better sound quality is provided by greater efficiency in quantization processing and Huffman coding. All of this results in a high-quality and processor-efficient software implementation of an MPEG-2 AAC encoder. Subjective test results show that the sound quality achieved at 96 kb/s/stereo is equivalent to that of MP3 at 128 kb/s/stereo. The encoder works 13 times faster than realtime for stereo encoding on an 800 MHz Pentium III processor.

  • Noise and Channel Distortion Robust ASR System for DARPA SPINE2 Task

    Konstantin MARKOV  Tomoko MATSUI  Rainer GRUHN  Jinsong ZHANG  Satoshi NAKAMURA  

     
    PAPER-Robust Speech Recognition and Enhancement

      Vol:
    E86-D No:3
      Page(s):
    497-504

    This paper presents the ATR speech recognition system designed for the DARPA SPINE2 evaluation task. The system is capable of dealing with speech from highly variable, real-world noisy conditions and communication channels. A number of robust techniques are implemented, such as differential spectrum mel-scale cepstrum features, on-line MLLR adaptation, and word-level hypothesis combination, which led to a significant reduction in the word error rate.

  • Image Feature Extraction Algorithm for Support Vector Machines Using Multi-Layer Block Model

    Wonjun HWANG  Hanseok KO  

     
    PAPER-Pattern Recognition

      Vol:
    E86-D No:3
      Page(s):
    623-632

    This paper concerns the recognition of 3-dimensional objects using a proposed multi-layer block model. In particular, we aim to achieve desirable recognition performance while restricting the computational load to a low level using a 3-step feature extraction procedure. An input image is first precisely partitioned into hierarchical layers of blocks in the form of base blocks and overlapping blocks. The hierarchical blocks are merged into a matrix, from which abundant local feature information can be obtained. The extracted local features are then employed by kernel based support vector machines in tournament for enhanced system recognition performance while keeping to a low dimensional feature space. The simulation results show that the proposed feature extraction method reduces the computational load by over 80% and preserves a stable recognition rate under varying illumination and noise conditions.

  • A Class of Codes for Correcting Single Spotty Byte Errors

    Ganesan UMANESAN  Eiji FUJIWARA  

     
    PAPER-Coding Theory

      Vol:
    E86-A No:3
      Page(s):
    704-714

    In certain computer and communication systems, a significant number of byte errors are not hard errors, but a few transient bit errors confined to byte regions. Byte errors of this kind are called spotty byte errors, meaning that not all, but only 2 or 3 random bits in a byte are corrupted. In particular, the codewords of memory systems that use recent high density, wide I/O data semiconductor DRAM chips are prone to this kind of spotty byte error. This is because the presence of strong electromagnetic waves in the environment or the bombardment of a DRAM chip by an energetic particle is highly likely to upset more than just one bit stored in that chip. Under this situation, codes capable of correcting single spotty byte errors are suitable for application in semiconductor memory systems. This paper defines a spotty byte error as a random t-bit error confined to a b-bit byte and proposes a class of codes called Single t/b-error Correcting (St/bEC) codes, which are capable of correcting single spotty byte errors occurring in computer and communication systems. For the case where the chip data output is 16 bits, i.e., b=16, the S3/16EC code proposed in this paper requires only 16 check bits, that is, only one chip is required for check bits at practical information lengths such as 64, 128 and 256 bits. Furthermore, this S3/16EC code is capable of detecting more than 95% of all single 16-bit byte errors at information length 64 bits.
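
The definition of a spotty byte error can be illustrated with a short sketch. It assumes that a "t-bit error" means between 1 and t flipped bits inside a single b-bit byte; the abstract does not spell this out, and the function names are hypothetical. The sketch only classifies error patterns and does not construct the St/bEC code itself.

```python
from math import comb


def is_spotty_byte_error(error_pattern, b, t):
    """Return True if the flipped bits lie in exactly one b-bit byte and number 1..t.

    `error_pattern` is an integer bitmask over the whole codeword (assumed encoding).
    """
    byte_mask = (1 << b) - 1
    corrupted_bytes = []
    while error_pattern:
        chunk = error_pattern & byte_mask
        if chunk:
            corrupted_bytes.append(chunk)
        error_pattern >>= b
    if len(corrupted_bytes) != 1:
        return False  # no error at all, or errors spread over several bytes
    return bin(corrupted_bytes[0]).count("1") <= t


def count_spotty_patterns(b, t):
    """Number of distinct error patterns with 1..t flipped bits inside one b-bit byte."""
    return sum(comb(b, i) for i in range(1, t + 1))


# For b=16, t=3: 16 + 120 + 560 single-byte patterns out of 2**16 - 1 non-zero ones.
spotty_count = count_spotty_patterns(16, 3)
```

Under this assumption only 696 of the 65535 non-zero patterns in a 16-bit byte are spotty, which gives some intuition for why a code restricted to such errors can need far fewer check bits than one that corrects arbitrary byte errors.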

  • Pre-Route Power Analysis Techniques for SoC

    Takashi YAMADA  Takeshi SAKAMOTO  Shinji FURUICHI  Mamoru MUKUNO  Yoshifumi MATSUSHITA  Hiroto YASUURA  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E86-A No:3
      Page(s):
    686-692

    This paper proposes two techniques for improving the accuracy of gate-level power analysis for system-on-a-chip (SoC): (1) creation of custom wire load models for clock nets, and (2) use of layout information (actual net capacitance and input signal transition time). The analysis time is reduced to less than one three-hundredth of the transistor-level power analysis time. The error against a real chip is within 5% (the same level as that of transistor-level power analysis) if technique (2) is used, and within 15% if technique (1) is used.

  • Polar Coordinate Based Nonlinear Function for Frequency-Domain Blind Source Separation

    Hiroshi SAWADA  Ryo MUKAI  Shoko ARAKI  Shoji MAKINO  

     
    PAPER-Convolutive Systems

      Vol:
    E86-A No:3
      Page(s):
    590-596

    This paper discusses a nonlinear function for independent component analysis to process complex-valued signals in frequency-domain blind source separation. Conventionally, nonlinear functions based on the Cartesian coordinates are widely used. However, such functions have a convergence problem. In this paper, we propose a more appropriate nonlinear function that is based on the polar coordinates of a complex number. In addition, we show that the difference between the two types of functions arises from the assumed densities of independent components. Our discussion is supported by several experimental results for separating speech signals, which show that the polar type nonlinear functions behave better than the Cartesian type.
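
The difference between the two families of nonlinear functions can be sketched as follows. The abstract does not give the exact functions; applying tanh separately to the real and imaginary parts (Cartesian type) and applying tanh to the magnitude while keeping the phase (polar type) are commonly used forms and are assumed here, as is the `gain` parameter.

```python
import cmath
import math


def cartesian_nonlinearity(y):
    """Cartesian type (assumed form): tanh applied separately to real and imaginary parts."""
    return complex(math.tanh(y.real), math.tanh(y.imag))


def polar_nonlinearity(y, gain=1.0):
    """Polar type (assumed form): tanh applied to the magnitude, phase left unchanged."""
    return math.tanh(gain * abs(y)) * cmath.exp(1j * cmath.phase(y))


y = complex(2.0, 0.5)
phase_in = cmath.phase(y)
phase_cartesian = cmath.phase(cartesian_nonlinearity(y))
phase_polar = cmath.phase(polar_nonlinearity(y))
```

With these assumed forms the polar type changes only the magnitude of its input, whereas the Cartesian type generally also rotates it, because the real and imaginary parts saturate independently.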

  • Three-Dimensional Triangle-Based Simulation of Etching Processes and Applications

    Oliver LENHART  Eberhard BAR  

     
    PAPER

      Vol:
    E86-C No:3
      Page(s):
    427-432

    A software module for the three-dimensional simulation of etching processes has been developed. It works on multilayer structures given as triangulated surface meshes. The mesh is moved nodewise according to rates which, in this work, have been determined from isotropic and anisotropic components. An important feature of the algorithm is the automatic detection of triple lines along mask edges and the refinement of triangles at these triple lines. This allows for the simulation of underetching. The capabilities of the algorithm are demonstrated by several examples such as the simulation of glass etching for the fabrication of a phase shift mask for optical lithography and the etching of an STI trench structure. Moreover, etch profiles of a silicon substrate covered by an oxide mask are shown for different parameters of the etch components. Spacer etching has also been performed. Furthermore, a specific algorithm for the simulation of purely isotropic etching is described and demonstrated.

  • Performance of Iterative Receiver for Joint Detection and Channel Estimation in SDM/OFDM Systems

    SeungYoung PARK  BoSeok SEO  ChungGu KANG  

     
    LETTER-Wireless Communication Technology

      Vol:
    E86-B No:3
      Page(s):
    1157-1162

    In this letter, we study the performance of the iterative receiver as applied to space division multiplexing/orthogonal frequency division multiplexing (SDM/OFDM) systems. The iterative receiver under consideration employs the soft in/soft out (SISO) decoding process, which operates iteratively in conjunction with channel estimation so that data detection and channel estimation are performed at the same time. As opposed to previous studies in which perfect channel state information is assumed, the effects of channel estimation are taken into account in evaluating the performance of the iterative receiver. It is shown that the channel estimation applied in every iteration step of the iterative receiver plays a crucial role in ensuring performance, especially at a low signal-to-noise power ratio (SNR).

  • Stress Classification Using Subband Based Features

    Tin Lay NWE  Say Wei FOO  Liyanage C. DE SILVA  

     
    PAPER-Speech Synthesis and Prosody

      Vol:
    E86-D No:3
      Page(s):
    565-573

    In research to determine reliable acoustic indicators for the type of stress present in speech, the majority of systems have concentrated on statistics extracted from the pitch contour, the energy contour, wavelet based subband features and Teager-Energy-Operator (TEO) based feature parameters. These systems work mostly on pair-wise distinction between stressed and neutral speech. Their performance decreases substantially when tested in multi-style detection among many stress categories. In this paper, a novel system is proposed using linear short time Log Frequency Power Coefficients (LFPC) and TEO based nonlinear LFPC features in both the time and frequency domains. A five-state Hidden Markov Model (HMM) with continuous Gaussian mixture distribution is used. The stress classification ability of the system is tested using data from the SUSAS (Speech Under Simulated and Actual Stress) database to categorize five stress conditions individually. It is found that the performance of the linear acoustic LFPC features is better than that of the nonlinear TEO based LFPC feature parameters. Results show that with the linear acoustic LFPC features, an average accuracy of 84% and a best accuracy of 95% can be achieved in the classification of the five categories. Tests of the system under different signal-to-noise conditions show that its performance does not degrade drastically as noise increases. It is also observed that classification using nonlinear frequency domain LFPC features gives relatively higher accuracy than that using nonlinear time domain LFPC features.

  • Realistic Scaling Scenario for Sub-100 nm Embedded SRAM Based on 3-Dimensional Interconnect Simulation

    Yasumasa TSUKAMOTO  Tatsuya KUNIKIYO  Koji NII  Hiroshi MAKINO  Shuhei IWADE  Kiyoshi ISHIKAWA  Yasuo INOUE  Norihiko KOTANI  

     
    PAPER

      Vol:
    E86-C No:3
      Page(s):
    439-446

    It is still an open problem to elucidate the scaling merits of an embedded SRAM with Low Operating Power (LOP) MOSFETs fabricated in 50, 70 and 100 nm CMOS technology nodes. Taking into account a realistic SRAM cell layout, we evaluated the parasitic capacitance of the bit line (BL) as well as the word line (WL) in each generation. By means of a 3-Dimensional (3D) interconnect simulator (Raphael), we examined the scaling merit through a comparison of the simulated SRAM BL delay for each CMOS technology node. In this paper, we propose two kinds of original interconnect structures that modify the ITRS (International Technology Roadmap for Semiconductors) structure, and show that the original interconnect structures with reduced gate overlap capacitance guarantee the scaling merits of SRAM cells fabricated with LOP MOSFETs in 50 and 70 nm CMOS technology nodes.

  • In-Advance CPU Time Analysis for Stationary Monte Carlo Device Simulations

    Christoph JUNGEMANN  Bernd MEINERZHAGEN  

     
    PAPER

      Vol:
    E86-C No:3
      Page(s):
    314-319

    In this work it is shown for the first time how to calculate in advance, by momentum-based noise simulation, the CPU time necessary to achieve a predefined error level in stationary Monte Carlo (MC) device simulations. In addition, analytical expressions for the simulation-time factor of terminal current estimation are given. Without further improvements of the MC algorithm, MC simulations of small terminal currents are found to be often prohibitively CPU intensive.

  • Blind Source Separation Algorithms with Matrix Constraints

    Andrzej CICHOCKI  Pando GEORGIEV  

     
    INVITED PAPER-Constant Systems

      Vol:
    E86-A No:3
      Page(s):
    522-531

    In many applications of Independent Component Analysis (ICA) and Blind Source Separation (BSS), the estimated source signals and the mixing or separating matrices have some special structure, or some constraints are imposed on the matrices, such as symmetries, orthogonality, non-negativity, sparseness and a specified invariant norm of the separating matrix. In this paper we present several algorithms and give an overview of some known transformations which allow us to preserve several important constraints.

  • Improved Design Criteria and New Trellis Codes for Space-Time Trellis Coded Modulation in Fast Fading Channels

    Yukihiro SASAZAKI  Tomoaki OHTSUKI  

     
    PAPER-Wireless Communication Technology

      Vol:
    E86-B No:3
      Page(s):
    1057-1062

    Design criteria for space-time trellis codes (STTC's) in fast fading channels have been proposed: the Distance Criterion and the Product Criterion. The design criteria in [1] are based on optimizing the pairwise error probability (PWEP). However, the frame error rate (FER) of STTC's depends on the distance spectrum. In this paper, we propose a new design criterion for STTC's in fast fading channels based on the distance spectrum. The proposed design criterion is based on the product distance distribution for a large signal-to-noise ratio (SNR) and on the trace distribution for a small SNR. Moreover, we propose new STTC's found by computer search based on the proposed design criterion in fast fading channels. By computer simulation, we show that the proposed design criterion is more useful than the Product Criterion in [1] in fast fading channels. We also show that the proposed STTC's achieve a better FER than the conventional STTC's in fast fading channels.
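
As a small illustration of the two pairwise quantities behind the Distance and Product Criteria, the sketch below computes them for one pair of codewords. The abstract only names the criteria; the definitions used here (number of time steps at which the codewords differ, and the product of the squared Euclidean distances over those steps) are the commonly cited fast-fading forms and are stated as an assumption, as are the function name and data layout.

```python
def symbol_distance_and_product(codeword_a, codeword_b):
    """Return (symbol-wise distance, product distance) for two space-time codewords.

    Each codeword is a list of time steps; each time step is a tuple holding one
    complex symbol per transmit antenna (assumed representation).
    """
    distance = 0
    product = 1.0
    for symbols_a, symbols_b in zip(codeword_a, codeword_b):
        # Squared Euclidean distance between the two transmitted symbol vectors.
        squared = sum(abs(x - y) ** 2 for x, y in zip(symbols_a, symbols_b))
        if squared > 0:
            distance += 1       # the codewords differ at this time step
            product *= squared  # only differing steps enter the product
    return distance, product


# Two QPSK codewords over two transmit antennas and three time steps.
a = [(1, 1), (1j, 1), (1, -1)]
b = [(1, 1), (-1j, 1), (1, 1)]
distance, product = symbol_distance_and_product(a, b)
```

A distance-spectrum based criterion, as proposed in the abstract, would look at how such values are distributed over many codeword pairs rather than only at the worst pair; that distribution is not computed here.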

  • Speaker Recognition Using Adaptively Boosted Classifiers

    Say-Wei FOO  Eng-Guan LIM  

     
    PAPER-Speech and Speaker Recognition

      Vol:
    E86-D No:3
      Page(s):
    474-482

    In this paper, a novel approach to speaker recognition is proposed. The approach makes use of adaptive boosting (AdaBoost) and classifiers such as Multilayer Perceptrons (MLP) and C4.5 Decision Trees for closed set, text-dependent speaker recognition. The performance of the systems is assessed using a subset of utterances drawn from the YOHO speaker verification corpus. Experiments show that significant improvement in accuracy can be achieved with the application of adaptive boosting techniques. Results also reveal that an accuracy of 98.8% for speaker identification may be achieved using the adaptively boosted C4.5 system.
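
The core reweighting step of adaptive boosting can be sketched as below. This is the standard binary AdaBoost update, shown only for illustration; the multi-class variant and the MLP and C4.5 base classifiers used in the paper are not reproduced, and the function name is hypothetical.

```python
import math


def adaboost_round(weights, predictions, labels):
    """One round of binary AdaBoost reweighting (illustrative, not the paper's system).

    Returns (alpha, new_weights), where alpha is the vote of the weak classifier
    and new_weights are the normalized sample weights for the next round.
    Assumes the weighted error is strictly between 0 and 1.
    """
    error = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Misclassified samples are scaled up, correctly classified ones scaled down.
    scaled = [
        w * math.exp(alpha if p != y else -alpha)
        for w, p, y in zip(weights, predictions, labels)
    ]
    total = sum(scaled)
    return alpha, [w / total for w in scaled]


# Four equally weighted samples; the weak classifier gets the last one wrong.
alpha, new_weights = adaboost_round(
    weights=[0.25, 0.25, 0.25, 0.25],
    predictions=[1, 1, -1, -1],
    labels=[1, 1, -1, 1],
)
```

After the update the single misclassified sample carries half of the total weight, so the next weak classifier is pushed to get it right.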

  • Antennas for Terrestrial Microwave Relay Links (Open Access)

    Toshikazu HORI  

     
    INVITED PAPER

      Vol:
    E86-B No:3
      Page(s):
    900-908

    Antennas for Japanese terrestrial microwave relay links have been developed since the 1950's and have been in commercial use in Japan up to the present. In particular, the path-length lens antenna developed in 1953 represents a monumental achievement for terrestrial microwave relay links, and the offset antenna for 256 QAM radio relay links developed in 1989 has the best electrical performance in the world. This paper reviews the antennas for Japanese terrestrial microwave relay links that have historical significance and describes the antenna design technologies developed in Japan.

  • Robust Independent Component Analysis via Time-Delayed Cumulant Functions

    Pando GEORGIEV  Andrzej CICHOCKI  

     
    PAPER-Constant Systems

      Vol:
    E86-A No:3
      Page(s):
    573-579

    In this paper we consider the blind source separation (BSS) problem for signals which are spatially uncorrelated of order four, but temporally correlated of order four (for instance speech or biomedical signals). For this type of signal we propose a new sufficient condition for separation using fourth order statistics, stating that separation is possible if the source signals have distinct normalized cumulant functions (depending on time delay). Using this condition we show that the BSS problem can be converted to a symmetric eigenvalue problem of a generalized cumulant matrix Z^(4)(b) depending on an L-dimensional parameter b, if this matrix has distinct eigenvalues. We prove that the set of parameters b which produce Z^(4)(b) with distinct eigenvalues forms an open subset of R^L whose complement has measure zero. We propose a new separating algorithm which uses Jacobi's method for joint diagonalization of cumulant matrices depending on time delay. We emphasize the following two features of this algorithm: 1) The optimal number of matrices for joint diagonalization is 100-150 (established experimentally), which for large dimensional problems is much smaller than that of JADE; 2) It works well even if the signals from the above class are, additionally, white (of order two) with zero kurtosis (as shown by an example).
