IEICE global.ieice.org Site

Keyword Search Result

[Keyword] SPE(2504hit)

2361-2380hit(2504hit)

Mechanical Stress Analysis of Trench Isolation Using a Two-Dimensional Simulation
Satoshi MATSUDA Nobuyuki ITOH Chihiro YOSHINO Yoshiroh TSUBOI Yasuhiro KATSUMATA Hiroshi IWAI

PAPER-Process Simulation

Vol:
E77-C No:2
Page(s):
124-128
Junction leakage current of trench isolation devices is strongly influenced by trench configuration. The origin of the leakage current is the mechanical stress that is generated by the differential thermal expansion between the Si substrate and the SiO2 filled isolation trench during the isolation forming process. A two-dimensional mechanical stress simulation was used to analyze trench-isolated devices. The simulated distribution and magnitude of stress were found to agree with Raman spectroscopic measurements of actual devices. The stress in the deeper regions between deep trenches is likely to increase greatly as the size of devices diminishes, so it is important to reduce this stress and thus suppress junction leakage current.
RookNet: A Switching Network for High Speed Communication
Yuji OIE Yasuhito SASAKI Hideo MIYAHARA

PAPER

Vol:
E77-B No:2
Page(s):
139-146
Central switches are expected to operate at rates of terabits per second in high speed networks such as the B-ISDN. Photonic switches using lightwave technology based on wavelength division multiplexing (WDM) and frequency division multiplexing (FDM) are promising for high speed switching. Such lightwave networks are mainly divided into two groups, according to the number of hops required for packets to arrive at their destinations: single-hop networks, such as networks using a star coupler, and multihop networks, such as the Manhattan Street Network and ShuffleNet. In this paper we focus our attention on multihop networks and propose a mesh network, referred to as RookNet, for high speed communication. The average transmission delay time and maximum throughput of RookNet are analyzed approximately. It is shown that, as the number of nodes goes to infinity, the maximum throughput approaches 0.433 and 0.485 when each node is equipped with no internal buffer and internal buffers of infinite capacity for relayed packets, respectively.
Spoken Sentence Recognition Based on HMM-LR with Hybrid Language Modeling
Kenji KITA Tsuyoshi MORIMOTO Kazumi OHKURA Shigeki SAGAYAMA Yaneo YANO

PAPER

Vol:
E77-D No:2
Page(s):
258-265
This paper describes Japanese spoken sentence recognition using hybrid language modeling, which combines the advantages of both syntactic and stochastic language models. As the baseline system, we adopted the HMM-LR speech recognition system, with which we have already achieved good performance for Japanese phrase recognition tasks. Several improvements have been made to this system aimed at handling continuously spoken sentences. The first improvement is HMM training with continuous utterances as well as word utterances. In previous implementations, HMMs were trained with only word utterances. Continuous utterances are included in the HMM training data because coarticulation effects are much stronger in continuous utterances. The second improvement is the development of a sentential grammar for Japanese. The sentential grammar was created by combining inter- and intra-phrase CFG grammars, which were developed separately. The third improvement is the incorporation of stochastic linguistic knowledge, which includes stochastic CFG and a bigram model of production rules. The system was evaluated using continuously spoken sentences from a conference registration task that included approximately 750 words. We attained a sentence accuracy of 83.9% in the speaker-dependent condition.
Tantalum Dry-Etching Characteristics for X-Ray Mask Fabrication
Akira OZAWA Shigehisa OHKI Masatoshi ODA Hideo YOSHIHARA

PAPER-Integrated Electronics

Vol:
E77-C No:2
Page(s):
255-262
Directional dry etching of tantalum is described for X-ray lithography absorber patterns. Experiments are carried out using both reactive ion etching (RIE) in CBrF3-based plasma and electron-cyclotron-resonance (ECR) ion-stream etching in Cl2-based plasma. Ta absorber patterns with perpendicular sidewalls cannot be obtained by RIE when only CBrF3 gas is used as the etchant. While adding CH4 to CBrF3 effectively improves the undercutting of Ta patterns, it deteriorates etching stability because of the intensive deposition effect of CH4 fractions. By adding an Ar/CH4 mixture gas to CBrF3, it is possible to use RIE to fabricate 0.2-µm Ta absorber patterns with perpendicular sidewalls. ECR ion-stream etching is investigated to obtain high etching selectivity between Ta and SiO2 (etching mask)/SiN (membrane). Adding O2 to the Cl2 etchant improves undercutting without remarkably decreasing etching selectivity. Furthermore, an ECR ion-stream etching method is developed to stably etch Ta absorber patterns finer than 0.2 µm. This is successfully applied to X-ray lithography mask fabrication for LSI test devices.
A Combined Fast Adaptive Filter Algorithm with an Automatic Switching Method
Youhua WANG Kenji NAKAYAMA

PAPER-Adaptive Signal Processing

Vol:
E77-A No:1
Page(s):
247-256
This paper proposes a new combined fast algorithm for transversal adaptive filters. The fast transversal filter (FTF) algorithm and the normalized LMS (NLMS) algorithm are combined in the following way. In the initialization period, the FTF is used to obtain fast convergence. After converging, the algorithm is switched to the NLMS algorithm because the FTF cannot be used for a long time due to its numerical instability. Nonstationary environments, for instance time-varying unknown systems, are classified into three categories: slow time-varying, fast time-varying and sudden time-varying systems. The NLMS algorithm is applied to the first situation. In the latter two cases, however, the NLMS algorithm cannot provide good performance, so the FTF algorithm is selected. Switching between the two algorithms is automatically controlled by using the difference of the MSE sequence. If the difference exceeds a threshold, then the FTF is selected. Otherwise, the NLMS is selected. Compared with the RLS algorithm, the proposed combined algorithm needs less computation, while maintaining the same performance. Furthermore, compared with the FTF algorithm, it provides numerically stable operation.
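The switching rule in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the step size `mu`, and the use of the absolute MSE difference are assumptions made for illustration, and the FTF branch is represented only by the returned label because the abstract gives no details of it.

```python
def nlms_step(w, x, d, mu=0.5, eps=1e-8):
    """One normalized LMS update.

    w: current filter weights, x: input tap vector, d: desired sample.
    mu and eps are illustrative values, not taken from the paper.
    """
    y = sum(wi * xi for wi, xi in zip(w, x))   # filter output
    e = d - y                                  # a priori error
    norm = eps + sum(xi * xi for xi in x)      # input power (regularized)
    w_new = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
    return w_new, e


def choose_algorithm(mse_prev, mse_curr, threshold):
    """Pick the algorithm for the next block from the MSE difference.

    Assumption: the difference is taken as an absolute value; the
    abstract only says the difference is compared with a threshold.
    """
    if abs(mse_curr - mse_prev) > threshold:
        return "FTF"    # fast or sudden change detected
    return "NLMS"       # slowly varying or stationary
```

A caller would estimate the MSE over successive blocks of samples, call `choose_algorithm` on consecutive estimates, and run `nlms_step` whenever the result is `"NLMS"`.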
Speech Recognition of Isolated Digits Using Simultaneous Generative Histogram
Yasuhisa HAYASHI Akio OGIHARA Kunio FUKUNAGA

LETTER

Vol:
E76-A No:12
Page(s):
2052-2054
We propose a recognition method for HMM using a simultaneous generative histogram. The proposed method uses the correlation between two features, which is expressed by a simultaneous generative histogram. The output probabilities of the integrated HMM are then conditioned by the codeword of another feature. The proposed method is applied to isolated digit word recognition to confirm its validity.
A Translation Method from Natural Language Specifications of Communication Protocols into Algebraic Specifications Using Contextual Dependencies
Yasunori ISHIHARA Hiroyuki SEKI Tadao KASAMI Jun SHIMABUKURO Kazuhiko OKAWA

PAPER-Automaton, Language and Theory of Computing

Vol:
E76-D No:12
Page(s):
1479-1489
This paper presents a method of translating natural language specifications of communication protocols into algebraic specifications. Such a natural language specification specifies action sequences performed by the protocol machine (program). Usually, a sentence implicitly specifies the state of the protocol machine at which the described actions must be performed. The authors propose a method of analyzing the implicitly specified states of the protocol machine taking the OSI session protocol specification (265 sentences) as an example. The method uses the following properties: (a) syntactic properties of a natural language (English in this paper); (b) syntactic properties introduced by the target algebraic specifications, e.g., type constraints; (c) properties specific to the target domain, e.g., properties of data types. This paper also shows the result of applying this method to the main part of the OSI session protocol specification (29 paragraphs, 98 sentences). For 95 sentences, the translation system uniquely determines the states specified implicitly by these sentences, using only (a) and (b) described above. By using (c) in addition, each implicitly specified state in the remaining three sentences is uniquely determined.
A Collision Detection Processor for Intelligent Vehicles
Masanori HARIYAMA Michitaka KAMEYAMA

PAPER

Vol:
E76-C No:12
Page(s):
1804-1811
Since carelessness in driving can cause terrible traffic accidents, autonomous collision avoidance by the vehicle is an important subject. Real-time collision detection between a vehicle and obstacles will be a key target for the next-generation car electronics system. In collision detection, a large storage capacity is usually required to store the 3-D information on the obstacles located in a workspace. Moreover, high computational power is essential not only in coordinate transformation but also in the matching operation. In the proposed collision detection VLSI processor, the matching operation is drastically accelerated by using a Content-Addressable Memory (CAM) which evaluates the magnitude relationships between an input word and all the stored words in parallel. A new obstacle representation based on a union of rectangular solids is also used to reduce the obstacle memory capacity, so that the collision detection can be performed only by parallel magnitude comparison. A parallel architecture using several identical processor elements (PEs) is employed to perform the coordinate transformation at high speed based on the COordinate Rotation DIgital Computation (CORDIC) algorithms. The collision detection time becomes 5.2 ms using 20 PEs and five CAMs with a 42-kbit capacity.
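The two operations named in this abstract, CORDIC-based coordinate rotation and magnitude comparison against rectangular solids, can be sketched in software as follows. This is only an illustrative analogue under stated assumptions: the names `cordic_rotate`, `Box`, `point_in_box` and `collides` are hypothetical, the rotation is shown in 2-D, and the comparisons run sequentially here, whereas the paper's processor performs them in parallel in a CAM.

```python
import math
from typing import NamedTuple, Tuple


def cordic_rotate(x, y, angle, iterations=24):
    """Rotate (x, y) by `angle` radians with rotation-mode CORDIC.

    Each step uses only a scaling by 2**-i and an add, which is what
    makes the method attractive in hardware. The iteration converges
    for |angle| up to roughly 1.74 rad.
    """
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))   # accumulated CORDIC gain
    z = angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0                # rotate toward z = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    return x / gain, y / gain                      # undo the gain


class Box(NamedTuple):
    """Axis-aligned rectangular solid given by two opposite corners."""
    lo: Tuple[float, float, float]
    hi: Tuple[float, float, float]


def point_in_box(p, box):
    """Containment test using magnitude comparisons only."""
    return all(lo <= c <= hi for c, lo, hi in zip(p, box.lo, box.hi))


def collides(vehicle_points, obstacle_boxes):
    """True if any vehicle point lies inside any solid of the union."""
    return any(point_in_box(p, b)
               for p in vehicle_points for b in obstacle_boxes)
```

Representing an obstacle as a union of boxes means each test reduces to six comparisons per box, with no arithmetic beyond the coordinate transformation itself.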
Significance of Suitability Assessment in Speech Synthesis Applications
Hideki KASUYA

INVITED PAPER

Vol:
E76-A No:11
Page(s):
1893-1897
The paper indicates the importance of suitability assessment in speech synthesis applications. Human factors involved in the use of synthetic speech are first discussed on the basis of an example of a newspaper company where synthetic speech is extensively used as an aid for proofreading a manuscript. Some findings obtained from perceptual experiments on the subjects' preference for paralinguistic properties of synthetic speech are then described, focusing primarily on the suitability of pitch characteristics, speaker's gender, and speaking rates in the task where subjects are asked to proofread a printed text while listening to the speech. The paper finally claims the need for a flexible speech synthesis system which helps the users create their own synthetic speech.
An Effective Defect-Repair Scheme for a High Speed SRAM
Sadayuki OOKUMA Katsuyuki SATO Akira IDE Hideyuki AOKI Takashi AKIOKA Hideaki UCHIDA

PAPER-SRAM

Vol:
E76-C No:11
Page(s):
1620-1625
To achieve a high yield for a fast Bi-CMOS SRAM without speed degradation, three defect-repair methods (the address comparison method, the fuse decoder method and the distributed fuse method) were considered in detail and their advantages and disadvantages were clarified. The distributed fuse method is shown to be further improved by a built-in fuse word driver, a built-in fuse column selector, and fuse analog switches. This enhanced distributed fuse scheme was examined in a fast Bi-CMOS SRAM. A maximum access time of 14 ns and a chip size of 8.8 mm × 17.4 mm are expected for a 4 Mb Bi-CMOS SRAM in the future.
Tree-Based Approaches to Automatic Generation of Speech Synthesis Rules for Prosodic Parameters
Yoichi YAMASHITA Manabu TANAKA Yoshitake AMAKO Yasuo NOMURA Yoshikazu OHTA Atsunori KITOH Osamu KAKUSHO Riichiro MIZOGUCHI

PAPER

Vol:
E76-A No:11
Page(s):
1934-1941
This paper describes automatic generation of speech synthesis rules which predict a stress level for each bunsetsu in long noun phrases. The rules are inductively inferred from a large amount of speech data by using two kinds of tree-based methods, the conventional decision tree and the SBR-tree methods. The rule sets automatically generated by the two methods have almost the same performance and decrease the prediction error of the accent component value from 23 Hz to about 14 Hz. The rate of correct reproduction of the change for adjacent bunsetsu pairs is also used as a measure for evaluating the generated rule sets, and they correctly reproduce about 80% of the changes. The effectiveness of the rule sets is verified through a listening test. With regard to the comprehensibility of the generated rules, the rules produced by the SBR-tree method are very compact, easy for human experts to interpret, and match former studies.
High Quality Speech Synthesis System Based on Waveform Concatenation of Phoneme Segment
Tomohisa HIROKAWA Kenzo ITOH Hirokazu SATO

PAPER

Vol:
E76-A No:11
Page(s):
1964-1970
A new system for speech synthesis by concatenating waveforms selected from a dictionary is described. The dictionary is constructed from two hours of speech that includes isolated words and sentences uttered by one male speaker, and contains over 45,000 entries which are identified by their average pitch, a dynamic pitch parameter which represents micro pitch structure in a segment, duration and average amplitude. Phoneme duration is set according to phoneme environment, and phoneme power is controlled by both pitch frequency and phoneme environment. Tests show the average errors in vowel duration and consonant duration are 28.8 ms and 16.8 ms respectively, and the vowel power average error is 2.9 dB. The pitch frequency patterns are calculated according to a conventional model in which the accent component is added to a gross phrase component. Given a phoneme string and prosody information, the optimum waveforms are selected from the dictionary by matching their attributes with the given phonetic and prosodic information. A waveform selection function, which has two terms corresponding to prosody and phonological coincidence between rule-set values and waveform values from the dictionary, is proposed. The weight coefficients used in the selection function are determined through subjective hearing tests. The selected waveform segments are then modified in the waveform domain to further adjust for the desired prosody. A pitch frequency modification method based on the pitch synchronous overlap-add technique is introduced into the system. Lastly, the waveforms are interpolated between voiced waveforms to avoid abrupt changes in voice spectrum and waveform shape. A five-grade absolute evaluation test is performed on the synthesized voice; the mean score is 3.1, which is over "good," while the original speaker quality is retained.
Physiologically-Based Speech Synthesis Using Neural Networks
Makoto HIRAYAMA Eric Vatikiotis-BATESON Mitsuo KAWATO

PAPER

Vol:
E76-A No:11
Page(s):
1898-1910
This paper focuses on two areas in our effort to synthesize speech from neuromotor input using neural network models that effect transforms between cognitive intentions to speak, their physiological effects on vocal tract structures, and subsequent realization as acoustic signals. The first area concerns the biomechanical transform between motor commands to muscles and the ensuing articulator behavior. Using physiological data of muscle EMG (electromyography) and articulator movements during natural English speech utterances, three articulator-specific neural networks learn the forward dynamics that relate motor commands to the muscles and motion of the tongue, jaw, and lips. Compared to a fully-connected network, mapping muscle EMG and motion for all three sets of articulators at once, this modular approach has improved performance by reducing network complexity and has eliminated some of the confounding influence of functional coupling among articulators. Network independence has also allowed us to identify and assess the effects of technical and empirical limitations on an articulator-by-articulator basis. This is particularly important for modeling the tongue whose complex structure is very difficult to examine empirically. The second area of progress concerns the transform between articulator motion and the speech acoustics. From the articulatory movement trajectories, a second neural network generates PARCOR (partial correlation) coefficients which are then used to synthesize the speech acoustics. In the current implementation, articulator velocities have been added as the inputs to the network. As a result, the model now follows the fast changes of the coefficients for consonants generated by relatively slow articulatory movements during natural English utterances. Although much work still needs to be done, progress in these areas brings us closer to our goal of emulating speech production processes computationally.
Development of TTS Card for PCs and TTS Software for WSs
Yoshiyuki HARA Tsuneo NITTA Hiroyoshi SAITO Ken'ichiro KOBAYASHI

PAPER

Vol:
E76-A No:11
Page(s):
1999-2007
Text-to-speech synthesis (TTS) is currently one of the most important media conversion techniques. In this paper, we describe a Japanese TTS card developed for constructing a personal-computer-based multimedia platform, and a TTS software package developed for a workstation-based multimedia platform. Some applications of this hardware and software are also discussed. The TTS consists of a linguistic processing stage for converting text into phonetic and prosodic information, and a speech processing stage for producing speech from the phonetic and prosodic symbols. The linguistic processing stage uses morphological analysis, rewriting rules for accent movement and pause insertion, and other techniques to impart correct accentuation and a natural-sounding intonation to the synthesized speech. The speech processing stage employs the cepstrum method with consonant-vowel (CV) syllables as the synthesis unit to achieve clear and smooth synthesized speech. All of the processing for converting Japanese text (consisting of mixed Japanese Kanji and Kana characters) to synthesized speech is done internally on the TTS card. This allows the card to be used widely in various applications, including electronic mail and telephone service systems without placing any processing burden on the personal computer. The TTS software was used for an E-mail reading tool on a workstation.
Development of a Rule-Based Speech Synthesizer Module for Embedded Use
Mikio YAMAGUCHI John-Paul HOSOM

PAPER

Vol:
E76-A No:11
Page(s):
1990-1998
A module for rule-based Japanese speech synthesis has been developed. The synthesizer was constructed using the Multiple-Cascade Terminal Analog (MCTA) structure, and this structure has been improved in three respects: the voicing-source model has an increased number of variable parameters which allows for voicing-source waveforms that better approximate natural speech; the spectral characteristics of the fricative source have been improved; and the path used for nasal consonants has an increased number of resonators to better conform to theory. The current synthesis system uses a modified stored-pattern data structure which allows better transitions between syllables; however, time-invariant values are used in certain cases in order to decrease the amount of required memory. This system also has a new consolidated method for generating geminate obstruents and syllabic nasals. This synthesizer and synthesis system have been implemented in a re-developed rule-based speech-synthesis module. This module has been constructed using ASIC technology and has both small size (56368 mm) and light weight (19 g); it is therefore possible to embed it in various types of portable or moving machinery. The module can be connected directly to a microprocessor bus and accepts as input sentences which are generated by the host computer. The input sentences are written with the Japanese katakana or romaji syllabaries and other symbols which describe the sentence structure. The syllable articulation rate for one hundred Japanese syllables (including palatalized sounds) is 65% and for sixty-seven syllables (not including palatalized sounds) is 74%. The word intelligibility, measured using phonetically-balanced words, is 88%.
Power Control of a Terminal Analog Synthesizer Using a Glottal Model
Mikio YAMAGUCHI

PAPER

Vol:
E76-A No:11
Page(s):
1957-1963
A terminal-analog synthesizer which uses a glottal model has already been proposed for rule-based speech synthesis, but the control strategy for glottal source intensity levels has not yet been defined. On the other hand, power-control rules which determine the target segmental power of synthetic speech have been proposed, based on statistical analysis of the power in natural speech. It is pointed out that there is a close correlation between observed fundamental frequency and power levels in natural speech; however, the theoretical reasons for this correlation have not been explained. This paper shows the relationship between fundamental frequency and resultant power in a terminal-analog synthesizer which uses a glottal model. From the equations it can be deduced that the tendency in natural speech for power to increase with fundamental frequency can be closely simulated by the sum of the effect of the radiation characteristic and the effect of the synthesis system's vocal tract transfer function. In addition, this paper proposes a method for adjusting the power of synthetic speech to any desired value. This control method can be executed in real-time.
Phoneme Power Control for Speech Synthesis
Kenzo ITOH Tomohisa HIROKAWA Hirokazu SATO

PAPER

Vol:
E76-A No:11
Page(s):
1911-1918
This paper proposes a new method of phoneme power control for speech synthesis by rule. The innovation of this method lies in its use of the phoneme environment and the relationship between speech power and pitch frequency. First, the permissible threshold (PT) for power modification is measured by subjective experiments using power manipulated speech material. As a result, it is concluded that the PT of power modification is 4.1 dB. This experimental result is significant when discussing power control and gives a criterion for power control accuracy. Next, the relationship between speech power and pitch frequency is analyzed using a very large speech data base. The results show that the relationship between phoneme power and pitch frequency is affected by the kind of phoneme, the adjoining phonemes, rising or falling pitch, and initial or final position in the sentence. Finally, we propose that the phoneme power should be controlled by pitch frequency and phoneme environment. This proposal is implemented in a waveform concatenation type text-to-speech synthesizer. This new method yields an averaged root mean square error between real and estimated speech power of 2.17 dB. This value indicates that 94% of the estimated power values are within the permissible threshold of human perception.
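The closing figure of this abstract can be checked with a short calculation. This is a sketch under an assumption the abstract does not state: that the power estimation errors are zero-mean and normally distributed, so that the RMS error of 2.17 dB plays the role of the standard deviation. The function name is hypothetical.

```python
import math


def fraction_within(threshold_db, rms_error_db):
    """P(|e| <= threshold) for an error e ~ N(0, rms_error_db**2).

    Assumes a zero-mean Gaussian error, which is not stated in the
    abstract and is used here only as a plausibility check.
    """
    return math.erf(threshold_db / (rms_error_db * math.sqrt(2.0)))


# Permissible threshold 4.1 dB and RMS error 2.17 dB, both from the abstract.
print(fraction_within(4.1, 2.17))
```

Under that assumption the result is roughly 0.94, which is consistent with the 94% quoted in the abstract.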
Speech Segment Selection for Concatenative Synthesis Based on Spectral Distortion Minimization
Naoto IWAHASHI Nobuyoshi KAIKI Yoshinori SAGISAKA

PAPER

Vol:
E76-A No:11
Page(s):
1942-1948
This paper proposes a new scheme for concatenative speech synthesis to improve the speech segment selection procedure. The proposed scheme selects a segment sequence for concatenation by minimizing acoustic distortions between the selected segment and the desired spectrum for the target without the use of heuristics. Four types of distortion, a) the spectral prototypicality of a segment, b) the spectral difference between the source and target contexts, c) the degradation resulting from concatenation of phonemes, and d) the acoustic discontinuity between the concatenated segments, are formulated as acoustic quantities, and used as measures for minimization. A search method for selecting segments from a large speech database is also described. In this method, a three-step optimization using dynamic programming is used to minimize the four types of distortion. A perceptual test shows that this proposed segment selection method with minimum distortion criteria produces high quality synthesized speech, and that contextual spectral difference and acoustic discontinuity at the segment boundary are important measures for improving the quality.
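The idea of selecting a segment sequence by dynamic programming can be sketched generically. The code below is not the three-step optimization of the paper: it is a single-pass simplification in which the four distortion types are collapsed into two hypothetical caller-supplied functions, `target_cost` (per-segment distortion) and `join_cost` (distortion between adjacent segments). It also assumes every position has at least one candidate.

```python
def select_segments(candidates, target_cost, join_cost):
    """Choose one segment per position with minimum total cost.

    candidates: list of candidate lists, one list per target position.
    target_cost(i, seg): distortion of using `seg` at position i.
    join_cost(prev, seg): distortion of concatenating prev and seg.
    Returns (total_cost, chosen_segments).
    """
    # best[i][j] = (cost of the best path ending at candidates[i][j],
    #               index of its predecessor in candidates[i - 1])
    best = [[(target_cost(0, c), None) for c in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for c in candidates[i]:
            costs = [best[i - 1][k][0] + join_cost(p, c)
                     for k, p in enumerate(candidates[i - 1])]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append((costs[k_min] + target_cost(i, c), k_min))
        best.append(row)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return total, path
```

The search costs on the order of the number of positions times the square of the number of candidates per position, instead of enumerating every possible sequence.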
Synthesis of Protocol Specifications for Design of Responsive Protocols
Hirotaka IGARASHI Yoshiaki KAKUDA Tohru KIKUNO

PAPER

Vol:
E76-D No:11
Page(s):
1375-1385
Responsive protocols are communication protocols which ensure timely and reliable recovery when error events occur. Protocol synthesis for design of responsive protocols is to derive a protocol specification based on a service specification. In the previous methods, if the service specification includes simultaneous transmission of primitives from a high layer to a low layer through different service access points, then the derived protocol specification includes protocol errors of unspecified reception caused by message collisions. Also, they include only a recovery function such as retransmission of messages. This is not enough for recovery from abnormal states due to coordination loss. This paper extends a class of derived protocol specifications to include message collisions which usually occur in real communication protocols. Furthermore, this paper proposes a new method for synthesis of a responsive protocol specification derived from a service specification such that the derived protocol specification is free from protocol errors of unspecified receptions caused by message collisions and includes two recovery functions: message retransmission and checkpoint restart functions.
A Portable Text-to-Speech System Using a Pocket-Sized Formant Speech Synthesizer
Norio HIGUCHI Tohru SHIMIZU Hisashi KAWAI Seiichi YAMAMOTO

PAPER

Vol:
E76-A No:11
Page(s):
1981-1989
The authors developed a portable Japanese text-to-speech system using a pocket-sized formant speech synthesizer. It consists of a linguistic processor and an acoustic processor. The linguistic processor runs on an MS-DOS personal computer and has functions to determine readings and prosodic information for input sentences written in kana-kanji-mixed style. New techniques, such as minimization of a cost function for phrases, rare-compound flag, semantic information, information of reading selection and restriction by associated particles, are used to increase the accuracy of readings and accent positions. The accuracy of determining readings and accent positions is 98.6% for sentences in newspaper articles. It is possible to use the linguistic processor through an interface library which has also been developed by the authors. Consequently, it has become possible not only to convert whole texts stored in text files but also to convert parts of sentences sent by the interface library sequentially, and the readings and prosodic information are optimized for the whole sentence at one time. The acoustic processor is custom-made hardware, and it has adopted new techniques, for the improvement of rules for vowel devoicing, control of phoneme durations, control of the phrase components of voice fundamental frequency and the construction of the acoustic parameter database. Due to the above-mentioned modifications, the naturalness of synthetic speech generated by a Klatt-type formant speech synthesizer was improved. On a naturalness test it was rated 3.61 on a scale of 5 points from 0 to 4.
