The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] SPE(2504hit)

2361-2380hit(2504hit)

  • Mechanical Stress Analysis of Trench Isolation Using a Two-Dimensional Simulation

    Satoshi MATSUDA  Nobuyuki ITOH  Chihiro YOSHINO  Yoshiroh TSUBOI  Yasuhiro KATSUMATA  Hiroshi IWAI  

     
    PAPER-Process Simulation

      Vol:
    E77-C No:2
      Page(s):
    124-128

    Junction leakage current of trench isolation devices is strongly influenced by trench configuration. The origin of the leakage current is the mechanical stress that is generated by the differential thermal expansion between the Si substrate and the SiO2 filled isolation trench during the isolation forming process. A two-dimensional mechanical stress simulation was used to analyze trench-isolated devices. The simulated distribution and magnitude of stress were found to agree with Raman spectroscopic measurements of actual devices. The stress in the deeper regions between deep trenches is likely to increase greatly as the size of devices diminishes, so it is important to reduce this stress and thus suppress junction leakage current.

  • RookNet: A Switching Network for High Speed Communication

    Yuji OIE  Yasuhito SASAKI  Hideo MIYAHARA  

     
    PAPER

      Vol:
    E77-B No:2
      Page(s):
    139-146

    Central switches are expected to operate at the rate of Terabit per second in high speed networks, like the B-ISDN. Photonic switches using lightwave technology based on wavelength division multiplexing (WDM) and frequency division multiplexing (FDM) are promising ones for high speed switching. Such lightwave networks are mainly divided into two groups, according to the number of hops required for packets to arrive at their destinations: single-hop networks such as networks using star coupler and multihop networks such as Manhattan Street Network and ShuffleNet. In this paper we focus our attention on multihop networks and propose a mesh network, referred to as RookNet, for high speed communication. The average transmission delay time and maximum throughput of RookNet is approximately analyzed. It is shown that, as the number of nodes goes to infinity, the maximum throughput aproaches 0.433 and 0.485 when each node is equipped with no internal buffer and internal buffers of infinite capacity for relayed packets, respectively.

  • Spoken Sentence Recognition Based on HMM-LR with Hybrid Language Modeling

    Kenji KITA  Tsuyoshi MORIMOTO  Kazumi OHKURA  Shigeki SAGAYAMA  Yaneo YANO  

     
    PAPER

      Vol:
    E77-D No:2
      Page(s):
    258-265

    This paper describes Japanese spoken sentence recognition using hybrid language modeling, which combines the advantages of both syntactic and stochastic language models. As the baseline system, we adopted the HMM-LR speech recognition system, with which we have already achieved good performance for Japanese phrase recognition tasks. Several improvements have been made to this system aimed at handling continuously spoken sentences. The first improvement is HMM training with continuous utterances as well as word utterances. In previous implementations, HMMs were trained with only word utterances. Continuous utterances are included in the HMM training data because coarticulation effects are much stronger in continuous utterances. The second improvement is the development of a sentential grammar for Japanese. The sentential grammar was created by combining inter- and intra-phrase CFG grammars, which were developed separately. The third improvement is the incorporation of stochastic linguistic knowledge, which includes stochastic CFG and a bigram model of production rules. The system was evaluated using continuously spoken sentences from a conference registration task that included approximately 750 words. We attained a sentence accuracy of 83.9% in the speaker-dependent condition.

  • Tantalum Dry-Etching Characteristics for X-Ray Mask Fabrication

    Akira OZAWA  Shigehisa OHKI  Masatoshi ODA  Hideo YOSHIHARA  

     
    PAPER-Integrated Electronics

      Vol:
    E77-C No:2
      Page(s):
    255-262

    Directional dry etching of Tantalum is described X-ray lithography absorber patterns. Experiments are carried out using both reactive ion etching in CBrF3-based plasma and electron-cyclotron-resonance ion-stream etching in Cl2-based plasma. Ta absorber patterns with perpendicular sidewalls cannot be obtained by RIE when only CBrF3 gas is used as the etchant. While adding CH4 to CBrF3 effectively improves the undercutting of Ta patterns, it deteriorates etching stability because of the intensive deposition effect of CH4 fractions. By adding an Ar/CH4 mixture gas to CBrF3, it is possible to use RIE to fabricate 0.2-µm Ta absorber patterns with perpendicular sidewalls. ECR ion-stream etching is investigated to obtain high etching selectivity between Ta and SiO2 (etching mask)/SiN (membrane). Adding O2 to the Cl2 etchant improves undercutting without remarkably decreasing etching selectivity. Furthermore, an ECR ion-stream etching method is developed to stably etch Ta absorber patterns finer than 0.2µm. This is successfully applied to X-ray lithography mask fabrication for LSI test devices.

  • A Combined Fast Adaptive Filter Algorithm with an Automatic Switching Method

    Youhua WANG  Kenji NAKAYAMA  

     
    PAPER-Adaptive Signal Processing

      Vol:
    E77-A No:1
      Page(s):
    247-256

    This paper proposes a new combined fast algorithm for transversal adaptive filters. The fast transversal filter (FTF) algorithm and the normalized LMS (NLMS) are combined in the following way. In the initialization period, the FTF is used to obtain fast convergence. After converging, the algorithm is switched to the NLMS algorithm because the FTF cannot be used for a long time due to its numerical instability. Nonstationary environment, that is, time varying unknown system for instance, is classified into three categories: slow time varying, fast time varying and sudden time varying systems. The NLMS algorithm is applied to the first situation. In the latter two cases, however, the NLMS algorithm cannot provide a good performance. So, the FTF algorithm is selected. Switching between the two algorithms is automatically controlled by using the difference of the MSE sequence. If the difference exceeds a threshold, then the FTF is selected. Other wise, the NLMS is selected. Compared with the RLS algorithm, the proposed combined algorithm needs less computation, while maintaining the same performance. Furthermore, compared with the FTF algorithm, it provides numerically stable operation.

  • Speech Recognition of lsolated Digits Using Simultaneous Generative Histogram

    Yasuhisa HAYASHI  Akio OGIHARA  Kunio FUKUNAGA  

     
    LETTER

      Vol:
    E76-A No:12
      Page(s):
    2052-2054

    We propose a recognition method for HMM using a simultaneous generative histogram. Proposed method uses the correlation between two features, which is expressed by a simultaneous generative histogram. Then output probabilities of integrated HMM are conditioned by the codeword of another feature. The proposed method is applied to isolated digit word recognition to confirm its validity.

  • A Translation Method from Natural Language Specifications of Communication Protocols into Algebraic Specifications Using Contextual Dependencies

    Yasunori ISHIHARA  Hiroyuki SEKI  Tadao KASAMI  Jun SHIMABUKURO  Kazuhiko OKAWA  

     
    PAPER-Automaton, Language and Theory of Computing

      Vol:
    E76-D No:12
      Page(s):
    1479-1489

    This paper presents a method of translating natural language specifications of communication protocols into algebraic specifications. Such a natural language specification specifies action sequences performed by the protocol machine (program). Usually, a sentence implicitly specifies the state of the protocol machine at which the described actions must be performed. The authors propose a method of analyzing the implicitly specified states of the protocol machine taking the OSI session protocol specification (265 sentences) as an example. The method uses the following properties: (a) syntactic properties of a natural language (English in this paper); (b) syntactic properties introduced by the target algebraic specifications, e.g., type constraints; (c) properties specific to the target domain, e.g., properties of data types. This paper also shows the result of applying this method to the main part of the OSI session protocol specification (29 paragraphs, 98 sentences). For 95 sentences, the translation system uniquely determines the states specified implicitly by these sentences, using only (a) and (b) described above. By using (c) in addition, each implicitly specified state in the remaining three sentences is uniquely determined.

  • A Collision Detection Processor for Intelligent Vehicles

    Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER

      Vol:
    E76-C No:12
      Page(s):
    1804-1811

    Since carelessness in driving causes a terrible traffic accident, it is an important subject for a vehicle to avoid collision autonomously. Real-time collision detection between a vehicle and obstacles will be a key target for the next-generation car electronics system. In collision detection, a large storage capacity is usually required to store the 3-D information on the obstacles lacated in a workspace. Moreover, high-computational power is essential not only in coordinate transformation but also in matching operation. In the proposed collision detection VLSI processor, the matching operation is drastically accelerated by using a Content-Addressable Memory (CAM) which evaluates the magnitude relationships between an input word and all the stored words in parallel. A new obstacle representation based on a union of rectangular solids is also used to reduce the obstacle memory capacity, so that the collision detection can be parformed only by parallel magnitude comparison. Parallel architecture using several identical processor elements (PEs) is employed to perform the coordinate transformation at high speed based on the COordinate Rotation DIgital Computation (CORDIC) algorithms. The collision detection time becomes 5.2 ms using 20 PEs and five CAMs with a 42-kbit capacity.

  • Significance of Suitability Assessment in Speech Synthesis Applications

    Hideki KASUYA  

     
    INVITED PAPER

      Vol:
    E76-A No:11
      Page(s):
    1893-1897

    The paper indicates the importance of suitability assesment in speech synthesis applications. Human factors involved in the use of a synthetic speech are first discussed on the basis of an example of a newspaper company where synthetic speech is extensively used as an aid for proofreading a manuscript. Some findings obtained from perceptual experiments on the subjects' preference for paralinguistic properties of synthetic speech are then described, focusing primarily on the suitability of pitch characteristics, speaker's gender, and speaking rates in the task where subjects are asked to proofread a printed text while listening to the speech. The paper finally claims the need for a flexibile speech synthesis system which helps the users create their own synthetic speech.

  • An Effective Defect-Repair Scheme for a High Speed SRAM

    Sadayuki OOKUMA  Katsuyuki SATO  Akira IDE  Hideyuki AOKI  Takashi AKIOKA  Hideaki UCHIDA  

     
    PAPER-SRAM

      Vol:
    E76-C No:11
      Page(s):
    1620-1625

    To make a fast Bi-CMOS SRAM yield high without speed degradation, three defect-repair methods, the address comparison method, the fuse decoder method and the distributed fuse method, were considered in detail and their advantages and disadvantages were made clear. The distributed fuse method is demonstrated to be further improved by a built-in fuse word driver and a built-in fuse column selector, and fuse analog switches. This enhanced distributed fuse scheme was examined in a fast Bi-CMOS SRAM. A maximun access time of 14 ns and a chip size of 8.8 mm17.4 mm are expected for a 4 Mb Bi-CMOS SRAM in the future.

  • Tree-Based Approaches to Automatic Generation of Speech Synthesis Rules for Prosodic Parameters

    Yoichi YAMASHITA  Manabu TANAKA  Yoshitake AMAKO  Yasuo NOMURA  Yoshikazu OHTA  Atsunori KITOH  Osamu KAKUSHO  Riichiro MIZOGUCHI  

     
    PAPER

      Vol:
    E76-A No:11
      Page(s):
    1934-1941

    This paper describes automatic generation of speech synthesis rules which predict a stress level for each bunsetsu in long noun phrases. The rules are inductively inferred from a lot of speech data by using two kinds of tree-based methods, the conventional decision tree and the SBR-tree methods. The rule sets automatically generated by two methods have almost the same performance and decrease the prediction error to about 14 Hz from 23 Hz of the accent component value. The rate of the correct reproduction of the change for adjacent bunsetsu pairs is also used as a measure for evaluating the generated rule sets and they correctly reproduce the change of about 80%. The effectiveness of the rule sets is verified through the listening test. And, with regard to the comprehensiveness of the generated rules, the rules by the SBR-tree methods are very compact and easy to human experts to interpret and matches the former studies.

  • High Quality Speech Synthesis System Based on Waveform Concatenation of Phoneme Segment

    Tomohisa HIROKAWA  Kenzo ITOH  Hirokazu SATO  

     
    PAPER

      Vol:
    E76-A No:11
      Page(s):
    1964-1970

    A new system for speech synthesis by concatenating waveforms selected from a dictionary is described. The dictionary is constructed from a two-hour speech that includes isolated words and sentences uttered by one male speaker, and contains over 45,000 entries which are identified by their average pitch, dynamic pitch parameter which represents micro pitch structure in a segment, duration and average amplitude. Phoneme duration is set according to phoneme environment, and phoneme power is controlled, by both pitch frequency and phoneme environment. Tests show the average errors in vowel duration and consonant duration are 28.8 ms and 16.8 ms respectively, and the vowel power average error is 2.9 dB. The pitch frequency patterns are calculated according to a conventional model in which the accent component is abbed to a gross phrase component. Set a phoneme string and prosody information, the optimum waveforms are selected from the dictionary by matching their attributes with the given phonetic and prosodic information. A waveform selection function, which has two terms corresponding to prosody and phonological coincidence between rule-set values and waveform values from the dictionary, is proposed. The weight coefficients used in the selection function are determined through subjective hearing tests. The selected waveform segments are then modified in waveform domain to further adjust for the desired prosody. A pitch frequency modification method based on pitch synchronous overlap-add technique is introduced into the system. Lastly, the waveforms are interpolated between voiced waveforms to avoid abrupt changes in voice spectrum and waveform shape. An absolute evaluation test of five grades is performed to the synthesized voice and the mean of the score is 3.1, which is over "good," and while the original speaker quality is retained.

  • Physiologically-Based Speech Synthesis Using Neural Networks

    Makoto HIRAYAMA  Eric Vatikiotis-BATESON  Mitsuo KAWATO  

     
    PAPER

      Vol:
    E76-A No:11
      Page(s):
    1898-1910

    This paper focuses on two areas in our effort to synthesize speech from neuromotor input using neural network models that effect transforms between cognitive intentions to speak, their physiological effects on vocal tract structures, and subsequent realization as acoustic signals. The first area concerns the biomechanical transform between motor commands to muscles and the ensuing articulator behavior. Using physiological data of muscle EMG (electromyography) and articulator movements during natural English speech utterances, three articulator-specific neural networks learn the forward dynamics that relate motor commands to the muscles and motion of the tongue, jaw, ant lips. Compared to a fully-connected network, mapping muscle EMG and motion for all three sets of articulators at once, this modular approach has improved performance by reducing network complexity and has eliminated some of the confounding influence of functional coupling among articulators. Network independence has also allowed us to identify and assess the effects of technical and empirical limitations on an articulator-by-articulator basis. This is particularly important for modeling the tongue whose complex structure is very difficult to examine empirically. The second area of progress concerns the transform between articulator motion and the speech acoustics. From the articulatory movement trajectories, a second neural network generates PARCOR (partial correlation) coefficients which are then used to synthesize the speech acoustics. In the current implementation, articulator velocities have been added as the inputs to the network. As a result, the model now follows the fast changes of the coefficients for consonants generated by relatively slow articulatory movements during natural English utterances. Although much work still needs to be done, progress in these areas brings us closer to our goal of emulating speech production processes computationally.

  • Development of TTS Card for PCs and TTS Software for WSs

    Yoshiyuki HARA  Tsuneo NITTA  Hiroyoshi SAITO  Ken'ichiro KOBAYASHI  

     
    PAPER

      Vol:
    E76-A No:11
      Page(s):
    1999-2007

    Text-to-speech synthesis (TTS) is currently one of the most important media conversion techniques. In this paper, we describe a Japanese TTS card developed for constructing a personal-computer-based multimedia platform, and a TTS software package developed for a workstation-based multimedia platform. Some applications of this hardware and software are also discussed. The TTS consists of a linguistic processing stage for converting text into phonetic and prosodic information, and a speech processing stage for producing speech from the phonetic and prosodic symbols. The linguistic processing stage uses morphological analysis, rewriting rules for accent movement and pause insertion, and other techniques to impart correct accentuation and a natural-sounding intonation to the synthesized speech. The speech processing stage employs the cepstrum method with consonant-vowel (CV) syllables as the synthesis unit to achieve clear and smooth synthesized speech. All of the processing for converting Japanese text (consisting of mixed Japanese Kanji and Kana characters) to synthesized speech is done internally on the TTS card. This allows the card to be used widely in various applications, including electronic mail and telephone service systems without placing any processing burden on the personal computer. The TTS software was used for an E-mail reading tool on a workstation.

  • Development of a Rule-Based Speech Synthesizer Module for Embedded Use

    Mikio YAMAGUCHI  John-Paul HOSOM  

     
    PAPER

      Vol:
    E76-A No:11
      Page(s):
    1990-1998

    A module for rule-based Japanese speech synthesis has been developed. The synthesizer was constructed using the Multiple-Cascade Terminal Analog (MCTA) structure, and this sturcture has been improved in three respects: the voicing-source model has an increased number of variable parameters which allows for voicing-source waveforms that better approximate natural speech; the spectral characteristics of the fricative source have been improved; and the path used for nasal consonants has an increased number of resonators to better conform to theory. The current synthesis system uses a modified stored-pattern data structure which allows better transitions between syllables; however, time-invariant values are used in certain cases in order to decrease the amount of required memory. This system also has a new consolidated method for generating geminate obstruents and syllabic nasals. This synthesizer and synthesis system have been implemented in a re-developed rule-based speech-synthesis module. This module has been constructed using ASIC technology and has both small size (56368 mm) and light weight (19g); it is therefore possible to embed it in various types of portable or moving machinery. The module can be connected directly to a mocroprocessor bus and accepts as input sentences which are generated by the host computer. The input sentences are written with the Japanese katakana or romaji syllabaries and other symbols which describe the sentence structure. The syllable articulation rate for one hundred Japanese syllables (including palatalized sounds) is 65% and for sixty-seven syllables (not including palatalized sounds) is 74%. The word intelligibility, measured using phonetically-balanced words, it 88%.

  • Power Control of a Terminal Analog Synthesizer Using a Glottal Model

    Mikio YAMAGUCHI  

     
    PAPER

      Vol:
    E76-A No:11
      Page(s):
    1957-1963

    A terminal-analog synthesizer which uses a glottal model has already been proposed for rule-based speech synthesis, but the control strategy for glottal source intensity levels has not yet been defined. On the other hand, power-control rules which determine the target segmental power of synthetic speech have been proposed, based on statistical analysis of the power in natural speech. It is pointed out that there is a close correlation between observed fundamental frequency and power levels in natural speech; however, the theoretical reasons for this correlation have not been explained. This paper shows the relationship between fundamental frequency and resultant power in a terminal-analog synthesizer which uses a glottal model. From the equations it can be deduced that the tendency in natural speech for power to increase with fundamental frequency can be closely simulated by the sum of the effect of the radiation characteristic and the effect of the synthesis system's vocal tract transfer function. In addition, this paper proposes a method for adjusting the power of synthetic speech to any desired value. This control method can be executed in real-time.

  • Phoneme Power Control for Speech Synthesis

    Kenzo ITOH  Tomohisa HIROKAWA  Hirokazu SATO  

     
    PAPER

      Vol:
    E76-A No:11
      Page(s):
    1911-1918

    This paper proposes a new method of phoneme power control for speech synthesis by rule. The innovation of this method lies in its use of the phoneme environment and the relationship between speech power and pitch frequency. First, the permissible threshold (PT) for power modification is measured by subjective experiments using power manipulated speech material. As a result, it is concluded that the PT of power modification is 4.1 dB. This experimental result is significant when discussing power control and gives a criterion for power control accuracy. Next, the relationship between speech power and pitch frequency is analyzed using a very large speech data base. The results show that the relationship between phoneme power and pitch frequency is affected by the kind of phoneme, the adjoining phonemes, rising or falling pitch, and initial or final position in the sentence. Finally, we propose that the phoneme power should be controlled by pitch frequency and phoneme environment. This proposal is implemented in a waveform concatenation type text-to-speech synthesizer. This new method yields an averaged root mean square error between real and estimated speech power of 2.17 dB. This value indicates that 94% of the estimated power values are within the permissible threshold of human perception.

  • Speech Segment Selection for Concatenative Synthesis Based on Spectral Distortion Minimization

    Naoto IWAHASHI  Nobuyoshi KAIKI  Yoshinori SAGISAKA  

     
    PAPER

      Vol:
    E76-A No:11
      Page(s):
    1942-1948

    This paper proposes a new scheme for concatenative speech synthesis to improve the speech segment selection procedure. The proposed scheme selects a segment sequence for concatenation by minimizing acoustic distortions between the selected segment and the desired spectrum for the target without the use of heuristics. Four types of distortion, a) the spectral prototypicality of a segment, b) the spectral difference between the source and target contexts, c) the degradation resulting from concatenation of phonemes, and d) the acoustic discontinuity between the concatenated segments, are formulated as acoustic quantities, and used as measures for minimization. A search method for selecting segments from a large speech database is also descrided. In this method, a three-step optimization using dynamic programming is used to minimize the four types of distortion. A perceptual test shows that this proposed segment selection method with minimum distortion criteria produces high quality synthesized speech, and that contextual spectral difference and acoustic discontinuity at the segment boundary are important measures for improving the quality.

  • Synthesis of Protocol Specifications for Design of Responsive Protocols

    Hirotaka IGARASHI  Yoshiaki KAKUDA  Tohru KIKUNO  

     
    PAPER

      Vol:
    E76-D No:11
      Page(s):
    1375-1385

    Responsive protocols are communication protocols which ensure timely and reliable recovery when error events occur. Protocol synthesis for design of responsive protocols is to derive a protocol specification based on a service specification. In the previous methods, if the service specification includes simultaneous transmission of primitives from a high layer to a low layer through different service access points, then the derived protocol specification includes protocol errors of unspecified reception caused by message collisions. Also, they only includes a recovery function such as retransmission of messages. This is not enough for recovery from abnormal states due to coordination loss. This paper extends a class of derived protocol specifications to include message collisions which usually occur in real communication protocols. Furthermore, this paper proposes a new method for synthesis of a responsive protocal specification derived from a service specification such that the derived protocol specification is free from protocol erros of unspecified receptions caused by message collisions and includes two recovery functions: message retransmission and checkpoint restart functions.

  • A Portable Text-to-Speech System Using a Pocket-Sized Formant Speech Synthesizer

    Norio HIGUCHI  Tohru SHIMIZU  Hisashi KAWAI  Seiichi YAMAMOTO  

     
    PAPER

      Vol:
    E76-A No:11
      Page(s):
    1981-1989

    The authors developed a portable Japanese text-to-speech system using a pocket-sized formant speech synthesizer. It consists of a linguistic processor and an acoustic processor. The linguistic processor runs on an MS-DOS personal computer and has functions to determine readings and prosodic information for input sentences written in kana-kanji-mixed style. New techniques, such as minimization of a cost function for phrases, rare-compound flag, semantic information, information of reading selection and restriction by associated particles, are used to increase the accuracy of readings and accent positions. The accuracy of determining readings and accent positions is 98.6% for sentences in newspaper articles. It is possible to use the linguistic processor through an interface library which has also been developed by the authors. Consequently, it has become possible not only to convert whole texts stored in text files but also to convert parts of sentences sent by the interface library sequentially, and the readings and prosodic information are optimized for the whole sentence at one time. The acoustic processor is custom-made hardware, and it has adopted new techniques, for the improvement of rules for vowel devoicing, control of phoneme durations, control of the phrase components of voice fundamental frequency and the construction of the acoustic parameter database. Due to the above-mentioned modifications, the naturalness of synthetic speech generated by a Klatt-type formant speech synthesizer was improved. On a naturalness test it was rated 3.61 on a scale of 5 points from 0 to 4.

2361-2380hit(2504hit)