IEICE global.ieice.org Site

Keyword Search Result

[Keyword] Al(20498hit)

19681-19700hit(20498hit)

A System for the Synthesis of High-Quality Speech from Texts on General Weather Conditions
Keikichi HIROSE Hiroya FUJISAKI

PAPER

Vol:
E76-A No:11
Page(s):
1971-1980
A text-to-speech conversion system for Japanese has been developed for the purpose of producing high-quality speech output. This system consists of four processing stages: 1) linguistic processing, 2) phonological processing, 3) control parameter generation, and 4) speech waveform generation. Although the processing at the first stage is restricted to the texts on general weather conditions, the other three stages can also cope with texts of news and narrations on other topics. Since the prosodic features of speech are largely related to the linguistic information, such as word accent, syntactic structure and discourse structure, linguistic processing of a wider range than ever, at least a sentence, is indispensable to obtain good quality speech with respect to the prosody. From this point of view, input text was restricted to the weather forecast sentences and a method for linguistic processing was developed to conduct morpheme, syntactic and semantic analyses simultaneously. A quantitative model for generating fundamental frequency contours was adopted to make a good reflection of the linguistic information on the prosody of synthetic speech. A set of prosodic rules was constructed to generate prosodic symbols representing prosodic structures of the text from the linguistic information obtained at the first stage. A new speech synthesizer based on the terminal analog method was also developed to improve the segmental quality of synthetic speech. It consists of four paths of cascade connection of pole/zero filters and three waveform generators. The four paths are respectively used for the synthesis of vowels and vowel-like sounds, nasal murmur and buzz bar, friction, and plosion, while the three generators produce voicing source waveform approximated by polynomials, white Gaussian noise source for fricatives and impulse source for plosives. 
The validity of the approach above has been confirmed by the listening tests using speech synthesized by the developed system. Improvements both in the quality of prosodic features and in the quality of segmental features were realized for the synthetic speech.
A Feasibility Study on a Simple Stored Channel Simulator for Urban Mobile Radio Environments
Tsutomu TAKEUCHI

PAPER-Radio Communication

Vol:
E76-B No:11
Page(s):
1424-1428
A stored channel simulator for digital mobile radio environments is proposed, which enables field tests to be repeated in the laboratory under identical conditions, since it can reproduce the actual multipath radio channels by using the channel impulse responses (CIRs) measured in the field. Linear interpolation of the CIR is introduced to simplify the structure of the proposed simulator. The performance of the proposed simulator is confirmed by laboratory tests.
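The idea of linearly interpolating stored CIRs can be illustrated with a minimal sketch. It assumes two complex-valued CIR snapshots of equal length; the function names `interpolate_cir` and `apply_channel` are hypothetical and do not come from the paper, whose actual simulator structure is not described in the abstract.

```python
def interpolate_cir(cir_a, cir_b, steps):
    """Return `steps` CIRs moving linearly from cir_a to cir_b (inclusive)."""
    if len(cir_a) != len(cir_b):
        raise ValueError("CIR snapshots must have the same number of taps")
    if steps < 2:
        raise ValueError("need at least two steps to interpolate")
    frames = []
    for k in range(steps):
        w = k / (steps - 1)  # interpolation weight, 0.0 at cir_a and 1.0 at cir_b
        frames.append([(1 - w) * a + w * b for a, b in zip(cir_a, cir_b)])
    return frames


def apply_channel(signal, cir):
    """Pass a sampled signal through one CIR by direct convolution."""
    out = [0j] * (len(signal) + len(cir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(cir):
            out[i + j] += s * h
    return out


if __name__ == "__main__":
    # Two assumed two-tap snapshots: energy moves from the first tap to the second.
    for cir in interpolate_cir([1 + 0j, 0j], [0j, 1 + 0j], 3):
        print(apply_channel([1, 0, 0], cir))
```

Interpolating between stored snapshots lets the simulator hold fewer measured CIRs than output samples, which is presumably where the simplification comes from.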
A Reconfigurable Parallel Processor Based on a TDLCA Model
Masahiro TSUNOYAMA Masataka KAWANAKA Sachio NAITO

PAPER

Vol:
E76-D No:11
Page(s):
1358-1364
This paper proposes a reconfigurable parallel processor based on a two-dimensional linear cellular automaton model. The processor based on the model can be reconfigured quickly by utilizing the characteristics of the automaton used for its model. Moreover, the processor has a shorter data path length between processing elements than the processor based on the one-dimensional linear cellular automaton model, which has already been discussed. The processing elements of the processor based on the two-dimensional linear cellular automaton model are regarded as cells and the operational states of the processor are treated as the states of the automaton. When faults are detected, the processor can be reconfigured by changing its state under the state transition function of the processor determined by the weighting function of the automaton model. The processor can be reconfigured within the clock period required for making a state transition. This processor is extremely effective for real-time data processing systems that require high reliability.
Speech Segment Selection for Concatenative Synthesis Based on Spectral Distortion Minimization
Naoto IWAHASHI Nobuyoshi KAIKI Yoshinori SAGISAKA

PAPER

Vol:
E76-A No:11
Page(s):
1942-1948
This paper proposes a new scheme for concatenative speech synthesis to improve the speech segment selection procedure. The proposed scheme selects a segment sequence for concatenation by minimizing acoustic distortions between the selected segment and the desired spectrum for the target without the use of heuristics. Four types of distortion, a) the spectral prototypicality of a segment, b) the spectral difference between the source and target contexts, c) the degradation resulting from concatenation of phonemes, and d) the acoustic discontinuity between the concatenated segments, are formulated as acoustic quantities, and used as measures for minimization. A search method for selecting segments from a large speech database is also described. In this method, a three-step optimization using dynamic programming is used to minimize the four types of distortion. A perceptual test shows that this proposed segment selection method with minimum distortion criteria produces high quality synthesized speech, and that contextual spectral difference and acoustic discontinuity at the segment boundary are important measures for improving the quality.
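A dynamic-programming search of this general kind can be sketched as follows. This is a simplified illustration, not the paper's three-step procedure: the four distortion types are collapsed into two assumed cost callbacks, `unit_cost` (per-segment distortion against the target) and `join_cost` (discontinuity between adjacent segments), and all names are hypothetical.

```python
def select_segments(candidates, unit_cost, join_cost):
    """Pick one candidate per target position, minimizing total cost.

    candidates: list of candidate lists, one list per target position.
    unit_cost(i, c): assumed distortion of candidate c at position i.
    join_cost(p, c): assumed discontinuity when c follows p.
    """
    if not candidates or any(not row for row in candidates):
        raise ValueError("every target position needs at least one candidate")
    # best[i][j]: minimal cost of any path ending in candidates[i][j]
    best = [[unit_cost(0, c) for c in candidates[0]]]
    back = [[None] * len(candidates[0])]
    for i in range(1, len(candidates)):
        row, ptr = [], []
        for c in candidates[i]:
            costs = [best[i - 1][k] + join_cost(p, c)
                     for k, p in enumerate(candidates[i - 1])]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_min] + unit_cost(i, c))
            ptr.append(k_min)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path, total


if __name__ == "__main__":
    # Toy example: segments are scalars standing in for spectra.
    targets = [1.0, 2.0, 3.0]
    cands = [[0.0, 1.0], [2.0, 5.0], [3.0, 9.0]]
    print(select_segments(cands,
                          lambda i, c: abs(c - targets[i]),
                          lambda p, c: abs(c - p)))
```

The search cost grows with the square of the number of candidates per position rather than exponentially with sequence length, which is what makes selection from a large database tractable.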
Noise Reduction Techniques for a 64-kb ECL-CMOS SRAM with a 2-ns Cycle Time
Kenichi OHHATA Yoshiaki SAKURAI Hiroaki NAMBU Kazuo KANETANI Youji IDEI Toshirou HIRAMOTO Nobuo TAMBA Kunihiko YAMAGUCHI Masanori ODAKA Kunihiko WATANABE Takahide IKEDA Noriyuki HOMMA

PAPER-SRAM

Vol:
E76-C No:11
Page(s):
1611-1619
An ECL-CMOS SRAM technology is proposed which features a combination of ECL word drivers, ECL write circuits and low-voltage CMOS cells. This technology assures both ultra-high speed and high density. In the ECL-CMOS SRAM, various kinds of noise generated during the write cycle seriously affect the memory performance, because it has much faster access than conventional SRAMs. To overcome this problem, we propose three noise reduction techniques: a noise reduction clamp circuit, an emitter follower with a damping capacitor, and a twisted bit line structure with a "normally on" equalizer. These techniques allow fast access and cycle times. To evaluate these techniques, a 64-kb SRAM chip was fabricated using 0.5-µm BiCMOS technology. This SRAM has a short cycle time of 2 ns and a very fast access time of 1.5 ns. Evaluation proves the usefulness of these techniques.
A High-Density Multiple-Valued Content-Addressable Memory Based on One Transistor Cell
Satoshi ARAGAKI Takahiro HANYU Tatsuo HIGUCHI

PAPER-Application Specific Memory

Vol:
E76-C No:11
Page(s):
1649-1656
This paper presents a high-density multiple-valued content-addressable memory (MVCAM) based on a floating-gate MOS device. In the proposed CAM, the basic operation performed in each cell is a threshold function, a kind of inverter whose threshold value is programmable. Various multiple-valued operations for data retrieval can be easily performed using threshold functions. Moreover, each cell circuit in the MVCAM can be implemented using only a single floating-gate MOS transistor. As a result, the cell area of the four-valued CAM is reduced to 37% of that of the conventional dynamic CAM cell.
Observation of Nonlinear Waves in a Graded-Index Planar Waveguide with a Kerr-Like Nonlinear Cover
Kazuhiko OGUSU Masashi YOSHIMURA Hiroo KOMURA

LETTER-Opto-Electronics

Vol:
E76-C No:11
Page(s):
1691-1694
The intensity-dependent transmission characteristics of an Ag⁺-Na⁺ ion-exchanged glass waveguide with a nematic liquid crystal MBBA cover have been investigated experimentally using an Ar⁺ laser. It is found that the transmission characteristics of the TE1 mode are strongly influenced by temperature. Optical bistability has been observed at a particular temperature. Such strong temperature dependence is believed to be caused by an increase in the ordinary refractive index of the MBBA cover due to temperature rise.
Prosodic Characteristics of Japanese Conversational Speech
Nobuyoshi KAIKI Yoshinori SAGISAKA

PAPER

Vol:
E76-A No:11
Page(s):
1927-1933
In this paper, we quantitatively analyzed speech data in seven different styles with the aim of synthesizing natural Japanese conversational speech. Three reading styles were produced at different speeds (slow, normal and fast), and four speaking styles were produced by enacting conversation in different situations (free, hurried, angry and polite). To clarify the differences in prosodic characteristics between conversational speech and read speech, means and standard deviations of vowel duration, vowel amplitude and fundamental frequency (F0) were analyzed. We found large variation in these prosodic parameters. To look more precisely at the segmental duration and segmental amplitude differences between conversational speech and read speech, control rules of prosodic parameters in reading styles were applied to conversational speech. F0 contours of different speaking styles were superposed by normalizing the segmental duration. The differences between estimated values and actual values were analyzed. Large differences were found at sentence-final and key (focused) phrases. Sentence-final positions showed lengthening of segmental vowel duration and increased segmental vowel amplitude. Key phrase positions featured raised F0.
Manifestation of Linguistic Information in the Voice Fundamental Frequency Contours of Spoken Japanese
Hiroya FUJISAKI Keikichi HIROSE Noboru TAKAHASHI

PAPER

Vol:
E76-A No:11
Page(s):
1919-1926
Prosodic features of spoken Japanese play an important role in the transmission of linguistic information concerning the lexical word accent, the sentence structure and the discourse structure. In order to construct prosodic rules for synthesizing high-quality speech, therefore, prosodic features of speech should be quantitatively analyzed with respect to the linguistic information. With a special focus on the fundamental frequency contour, we first define four prosodic units for spoken Japanese, viz., prosodic word, prosodic phrase, prosodic clause and prosodic sentence, based on a decomposition of the fundamental frequency contour using a functional model for the generation process. Syntactic units are also introduced which have rough correspondence to these prosodic units. The relationships between the linguistic information and the characteristics of the components of the fundamental frequency contour are then described on the basis of results obtained by the analysis of two sets of speech material. Analysis of weathercast and newscast sentences showed that prosodic boundaries given by the manner of continuation/termination of phrase components fall into three categories, and are primarily related to the syntactic boundaries. On the other hand, analysis of noun phrases with various combinations of word accent types, syntactic structures, and focal conditions indicated that the magnitude and the shape of the accent components, which of course reflect the information concerning the lexical accent types of constituent words, are largely influenced by the focal structure. The results also indicated that there are cases where prosody fails to meet all the requirements presented by word accent, syntax and discourse.
A Verification Method via Invariant for Communication Protocols Modeled as Extended Communicating Finite-State Machines
Masahiro HIGUCHI Osamu SHIRAKAWA Hiroyuki SEKI Mamoru FUJII Tadao KASAMI

PAPER-Signaling System and Communication Protocol

Vol:
E76-B No:11
Page(s):
1363-1372
This paper presents a method for verifying a safety property of a communication protocol modeled as two extended communicating finite-state machines with two unbounded FIFO channels connecting them. In this method, four types of atomic formulae specifying a condition on a machine and a condition on a sequence of messages in a channel are introduced. A human verifier describes a logical formula which expresses conditions expected to be satisfied by all reachable global states, and a verification system proves by induction that the formula is indeed satisfied by such states (i.e. the formula is an invariant). If the invariant is never satisfied in any unsafe state, it can be concluded that the protocol is safe. To show the effectiveness of this method, a sample protocol extracted from the data transfer phase of the OSI session protocol was verified by using the verification system.
High Quality Synthetic Speech Generation Using Synchronized Oscillators
Kenji HASHIMOTO Takemi MOCHIDA Yasuaki SATO Tetsunori KOBAYASHI Katsuhiko SHIRAI

PAPER

Vol:
E76-A No:11
Page(s):
1949-1956
For the production of high quality synthetic sounds in a text-to-speech system, an excellent synthesizing method of speech signals is indispensable. In this paper, a new speech analysis-synthesis method for the text-to-speech system is proposed. The signals of voiced speech, which have a line spectrum structure at intervals of pitch in the linear frequency domain, can be represented approximately by the superposition of sinusoidal waves. In our system, analysis and synthesis are performed using such a harmonic structure of the signals of voiced speech. In the analysis phase, assuming an exact harmonic structure model at intervals of pitch against the fine structure of the short-time power spectrum, the fundamental frequency f0 is decided so as to minimize the error of the log-power spectrum at each peak position. At the same time, according to the value of the above minimized error, the rate of periodicity of the speech signal is determined. Then the log-power spectrum envelope is represented by the cosine-series interpolating the data which are sampled at every pitch period. In the synthesis phase, numerical solutions of non-linear differential equations which generate sinusoidal waves are used. For voiced sounds, those equations behave as a group of mutually synchronized oscillators. These sinusoidal waves are superposed so as to reconstruct the line spectrum structure. For voiceless sounds, those non-linear differential equations work as passive filters with input noise sources. Our system has the following characteristics. (1) Voiced and voiceless sounds can be treated in the same framework. (2) Since the phase and the power information of each sinusoidal wave can be easily controlled, if necessary, periodic waveforms in the voiced sounds can be precisely reproduced in the time domain. (3) The fundamental frequency f0 and phoneme duration can be easily changed without much degradation of original sound quality.
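The superposition of harmonically related sinusoids for voiced speech can be sketched in a few lines. Note the substitution: the paper obtains its sinusoids from numerical solutions of non-linear differential equations (synchronized oscillators), whereas this sketch simply evaluates `math.sin` directly. The function name and all parameter values are assumptions for illustration.

```python
import math


def synthesize_voiced(f0, amplitudes, phases, sample_rate, n_samples):
    """Sum harmonics k*f0 (k = 1, 2, ...) with the given amplitudes and phases."""
    if len(amplitudes) != len(phases):
        raise ValueError("one phase is required per harmonic amplitude")
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        value = 0.0
        for k, (amp, phase) in enumerate(zip(amplitudes, phases), start=1):
            # k-th harmonic of the fundamental frequency f0
            value += amp * math.sin(2.0 * math.pi * k * f0 * t + phase)
        samples.append(value)
    return samples


if __name__ == "__main__":
    # Assumed values: 100 Hz fundamental, three harmonics, 8 kHz sampling.
    wave = synthesize_voiced(100.0, [1.0, 0.5, 0.25], [0.0, 0.0, 0.0], 8000, 160)
    print(wave[:5])
```

Because each harmonic has its own amplitude and phase, changing f0 while keeping the amplitude envelope fixed gives the kind of pitch modification the abstract lists as characteristic (3).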
A Study on ATM Network Planning Based on Evaluation of Design Items
Makiko YOSHIDA Hiroyuki OKAZAKI

PAPER-Communication Networks and Service

Vol:
E76-B No:11
Page(s):
1333-1340
This paper describes a planning method for ATM networks. The method is based on evaluation of two design items, VC routing and VP routing, as well as on consideration of VPI constraints. In the evaluation, VC routing is compared with VP routing in separate case studies undertaken from the point of view of various parameters such as traffic volume, cost function and network scale. The results suggest the vertical relationship between VC and VP levels in optimally designed ATM networks. VC and VP network levels are then studied separately, and design methods are proposed for individual levels. In addition a perturbation method is proposed for the VC and VP routing use, whose optimum is varied as a function of the parameters described above. Evaluation results show the proposed perturbation method provides cost-effective networks.
Development of a Rule-Based Speech Synthesizer Module for Embedded Use
Mikio YAMAGUCHI John-Paul HOSOM

PAPER

Vol:
E76-A No:11
Page(s):
1990-1998
A module for rule-based Japanese speech synthesis has been developed. The synthesizer was constructed using the Multiple-Cascade Terminal Analog (MCTA) structure, and this structure has been improved in three respects: the voicing-source model has an increased number of variable parameters which allows for voicing-source waveforms that better approximate natural speech; the spectral characteristics of the fricative source have been improved; and the path used for nasal consonants has an increased number of resonators to better conform to theory. The current synthesis system uses a modified stored-pattern data structure which allows better transitions between syllables; however, time-invariant values are used in certain cases in order to decrease the amount of required memory. This system also has a new consolidated method for generating geminate obstruents and syllabic nasals. This synthesizer and synthesis system have been implemented in a re-developed rule-based speech-synthesis module. This module has been constructed using ASIC technology and has both small size (56368 mm) and light weight (19g); it is therefore possible to embed it in various types of portable or moving machinery. The module can be connected directly to a microprocessor bus and accepts as input sentences which are generated by the host computer. The input sentences are written with the Japanese katakana or romaji syllabaries and other symbols which describe the sentence structure. The syllable articulation rate for one hundred Japanese syllables (including palatalized sounds) is 65% and for sixty-seven syllables (not including palatalized sounds) is 74%. The word intelligibility, measured using phonetically-balanced words, is 88%.
The Trend of Functional Memory Development
Keikichi TAMARU

INVITED PAPER

Vol:
E76-C No:11
Page(s):
1545-1554
The concept of functional memory was proposed nearly four decades ago. However, usable products did not appear until the 1980s, despite the long history of development. Functional memory is classified into three categories: a general functional memory, a processing element array with small-size memory, and a special purpose memory. Today the majority of functional memories are associative memories, or content addressable memories (CAMs), and special purpose memories based on CAM. Due to advances in fabrication capability, the capacity of CAM LSI has increased to over 100 K bits. General purpose CAMs have been developed based on the SRAM cell and on the DRAM cell. Typical CAM LSIs of both types, a 20 K bit SRAM-based CAM and a 288 K bit DRAM-based CAM, are introduced. The DRAM-based CAM is attractive for its large capacity. A parallel processor architecture based on the CAM cell is proposed, which is called a Functional Memory Type Parallel Processor (FMPP). Its basic feature is a dual character as a higher performance CAM and a tiny processor array. It can perform highly parallel operations on the stored data.
Separated Equivalent Edge Current Method for Calculating Scattering Cross Sections of Polyhedron Structures
Yonehiko SUNAHARA Hiroyuki OHMINE Hiroshi AOKI Takashi KATAGI Tsutomu HASHIMOTO

PAPER-Antennas and Propagation

Vol:
E76-B No:11
Page(s):
1439-1444
This paper describes a novel method to calculate the fields scattered by a polyhedron structure for an incident plane wave. In this method, the fields diffracted by an edge are calculated using the equivalent edge currents which are separated into components dependent on each of the two surfaces which form the edge. The separated equivalent edge currents are based on the Geometrical Theory of Diffraction (GTD). Using this Separated Equivalent Edge Current Method (SEECM), fields scattered by a polyhedron structure can be calculated without special treatment of the singularity in the diffraction coefficient. This method can also be applied successfully to structures with convex surfaces by modeling them as polyhedron structures.
An Integrated Voice and Data Transmission System with Idle Signal Multiple Access--Dynamic Analysis--
Gang WU Kaiji MUKUMOTO Akira FUKUDA

PAPER-Communication Systems and Transmission Equipment

Vol:
E76-B No:11
Page(s):
1398-1407
In our preceding paper, I-ISMA (Idle Signal Multiple Access for Integrated services), a combination of ISMA and time reservation technique, was proposed to transmit an integrated voice and data traffic in third generation wireless communication networks. There, the channel capacity of I-ISMA was evaluated by the static analysis. To fully estimate performance of contention-based channel access protocols, however, we also need dynamic analysis to evaluate stability, delay, etc. Particularly, in systems concerning real-time voice transmission, delay is one of the most important performance measures. A six-mode model to describe an I-ISMA system is set up. With some assumptions for simplification, the dynamic behavior of the system is approximated by a Markov process so that the EPA (Equilibrium Point Analysis), a fluid approximation method, can be applied to the analysis. Then, numerical and simulation results are obtained for some examples. By means of the same analysis method and under the same conditions, the performance of PRMA is evaluated and compared briefly with that of I-ISMA.
An Analysis of Optimal Frame Rate in Low Bit Rate Video Coding
Yasuhiro TAKISHIMA Masahiro WADA Hitomi MURAKAMI

PAPER-Communication Systems and Transmission Equipment

Vol:
E76-B No:11
Page(s):
1389-1397
We analyze frame rates in low bit rate video coding and show that an optimal frame rate can be theoretically obtained. In low bit rate video coding the frame rate is usually forced to be decreased for reducing the total amount of coded information. The choice of frame rate, however, has a great effect on the picture quality in a trade-off relation between coded picture quality and motion smoothness. It is known from experience that in order to achieve an optimum balance between these two factors, a frame rate has to be selected which is appropriate for the coding scheme, property of the video sequences and coding bit rate. A theoretical analysis, however, on the existence of an optimal frame rate and how the optimal frame rate would be expressed has not been performed. In this paper, coding distortion measured by mean square error is analyzed by using video signal models such as a rate-distortion function for coded frames and inter-frame correlation coefficients for non-coded frames. Overall picture quality taking account of coded picture quality and motion smoothness simultaneously is expressed as a function of frame rate. This analysis shows that the optimum frame rate can be uniquely specified. The maximum frame rate is optimal when the coding bit rate is higher than a certain value for a given video scene, while a frame rate less than the maximum is optimal otherwise. The result of the theoretical analysis is compared with the results of computer simulation. In addition, the relation between this analysis and a subjective evaluation is described. From both comparisons this theoretical analysis can be justified as an effective scheme to indicate the optimal frame rate, and it shows the possibility of improving picture quality by selecting frame rate adaptively.
Loss and Waiting Time Probability Approximation for General Queueing
Kenji NAKAGAWA

PAPER-Communication Theory

Vol:
E76-B No:11
Page(s):
1381-1388
Queueing problems are investigated for very wide classes of input traffic and service time models to obtain good loss probability and waiting time probability approximation. The proposed approximation is based on the fundamental recursion formula and the Chernoff bound technique, both of which require no particular assumption for the stochastic nature of input traffic and service time, such as renewal or Markovian properties. The only essential assumption is stationarity. The accuracy of the obtained approximation is confirmed by comparison with computer simulation. There are a number of advantages of the proposed method of approximation when we apply it to network capacity design or path accommodation design problems. First, the proposed method has the advantage of applying to multi-media traffic. In the ATM network, a variety of bursty or non-bursty cell traffic exist and are superposed, so some unified analysis methodology is required that does not depend on each traffic's characteristics. Since our method assumes only the stationarity of input and service process, it is applicable to arbitrary types of cell streams. Further, this approach can be used for unexpected future traffic models. The second advantage in application is that the proposed probability approximation requires only a small amount of computation. Because of the use of the Chernoff bound technique, the convolution of every traffic's probability density function is replaced by the product of probability generating functions. Hence, the proposed method provides a fast algorithm for, say, the call admission control problem. Third, it has the advantage of accuracy. In this paper, we applied the approximation to the cases of homogeneous CBR traffic, non-homogeneous CBR traffic, M/D/1, AR(1)/D/1, M/M/1 and D/M/1. In all cases, the approximate values agree well with the exact values or computer simulation results from low traffic load to high load.
Moreover, in all cases of the numerical comparison, our approximations are upper bounds of the real values. This is very important for the sake of conservative network design.
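The Chernoff bound step, in which a convolution of per-source distributions is replaced by a product of transforms, can be illustrated with a minimal sketch. It covers only a bufferless sum of independent on-off sources, not the paper's recursion formula or queueing model; the Bernoulli source model, the grid search over `theta`, and all names are assumptions made for this example.

```python
import math


def chernoff_tail_bound(probs, threshold, thetas=None):
    """Upper-bound P(sum of independent Bernoulli(p_i) >= threshold).

    Uses P(S >= c) <= exp(-theta*c) * prod_i E[exp(theta*X_i)] for any
    theta > 0 and keeps the smallest value over a grid of theta.
    """
    if thetas is None:
        thetas = [0.01 * k for k in range(1, 1001)]  # assumed search grid
    best = 1.0  # a probability never exceeds 1
    for theta in thetas:
        # Product of moment generating functions, computed in the log domain.
        log_mgf = sum(math.log(1.0 - p + p * math.exp(theta)) for p in probs)
        best = min(best, math.exp(log_mgf - theta * threshold))
    return best


def exact_tail(probs, threshold):
    """Exact tail probability by explicit convolution, for comparison."""
    dist = [1.0]  # distribution of the empty sum
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)
            new[k + 1] += mass * p
        dist = new
    return sum(mass for k, mass in enumerate(dist) if k >= threshold)


if __name__ == "__main__":
    sources = [0.1] * 20  # twenty assumed sources, each active 10% of the time
    print(chernoff_tail_bound(sources, 6), exact_tail(sources, 6))
```

Since the inequality holds for every positive `theta`, the value returned by the grid search is always a valid upper bound, which matches the conservative behavior the abstract reports for its approximations.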
Physiologically-Based Speech Synthesis Using Neural Networks
Makoto HIRAYAMA Eric Vatikiotis-BATESON Mitsuo KAWATO

PAPER

Vol:
E76-A No:11
Page(s):
1898-1910
This paper focuses on two areas in our effort to synthesize speech from neuromotor input using neural network models that effect transforms between cognitive intentions to speak, their physiological effects on vocal tract structures, and subsequent realization as acoustic signals. The first area concerns the biomechanical transform between motor commands to muscles and the ensuing articulator behavior. Using physiological data of muscle EMG (electromyography) and articulator movements during natural English speech utterances, three articulator-specific neural networks learn the forward dynamics that relate motor commands to the muscles and motion of the tongue, jaw, and lips. Compared to a fully-connected network, mapping muscle EMG and motion for all three sets of articulators at once, this modular approach has improved performance by reducing network complexity and has eliminated some of the confounding influence of functional coupling among articulators. Network independence has also allowed us to identify and assess the effects of technical and empirical limitations on an articulator-by-articulator basis. This is particularly important for modeling the tongue whose complex structure is very difficult to examine empirically. The second area of progress concerns the transform between articulator motion and the speech acoustics. From the articulatory movement trajectories, a second neural network generates PARCOR (partial correlation) coefficients which are then used to synthesize the speech acoustics. In the current implementation, articulator velocities have been added as the inputs to the network. As a result, the model now follows the fast changes of the coefficients for consonants generated by relatively slow articulatory movements during natural English utterances. Although much work still needs to be done, progress in these areas brings us closer to our goal of emulating speech production processes computationally.
Should Responsive Systems be Event-Triggered or Time-Triggered?
Hermann KOPETZ

INVITED PAPER

Vol:
E76-D No:11
Page(s):
1325-1332
In this paper the two different paradigms for the design of responsive, i.e., distributed fault-tolerant real-time systems, the event-triggered (ET) approach and the time-triggered (TT) approach, are analyzed and compared. The comparison focuses on the temporal properties and considers the issues of predictability, testability, resource utilization, extensibility, and assumption coverage.
