The search functionality is under construction.

Keyword Search Result

[Keyword] Ti(30728hit)

29601-29620hit(30728hit)

  • High Quality Synthetic Speech Generation Using Synchronized Oscillators

    Kenji HASHIMOTO  Takemi MOCHIDA  Yasuaki SATO  Tetsunori KOBAYASHI  Katsuhiko SHIRAI  

     
    PAPER

      Vol:
    E76-A No:11
      Page(s):
    1949-1956

    For the production of high quality synthetic sounds in a text-to-speech system, an excellent synthesizing method of speech signals is indispensable. In this paper, a new speech analysis-synthesis method for the text-to-speech system is proposed. The signals of voiced speech, which have a line spectrum structure at intervals of pitch in the linear frequency domain, can be represented approximately by the superposition of sinusoidal waves. In our system, analysis and synthesis are performed using such a harmonic structure of the signals of voiced speech. In the analysis phase, assuming an exact harmonic structure model at intervals of pitch against the fine structure of the short-time power spectrum, the fundamental frequency f0 is decided so as to minimize the error of the log-power spectrum at each peak position. At the same time, according to the value of the above minimized error, the rate of periodicity of the speech signal is determined. Then the log-power spectrum envelope is represented by the cosine-series interpolating the data which are sampled at every pitch period. In the synthesis phase, numerical solutions of non-linear differential equations which generate sinusoidal waves are used. For voiced sounds, those equations behave as a group of mutually synchronized oscillators. These sinusoidal waves are superposed so as to reconstruct the line spectrum structure. For voiceless sounds, those non-linear differential equations work as passive filters with input noise sources. Our system has the following characteristics. (1) Voiced and voiceless sounds can be treated in the same framework. (2) Since the phase and the power information of each sinusoidal wave can be easily controlled, if necessary, periodic waveforms in the voiced sounds can be precisely reproduced in the time domain. (3) The fundamental frequency f0 and phoneme duration can be easily changed without much degradation of original sound quality.
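
    The harmonic model in this abstract (voiced speech as a superposition of sinusoids at multiples of f0) can be illustrated with a minimal sketch. This is not the authors' synthesizer: the paper generates the sinusoids from numerical solutions of non-linear differential equations, whereas the sketch below simply evaluates `math.sin` directly. The function name, parameters, and sample values are all hypothetical.

```python
import math


def synthesize_voiced(f0, amplitudes, phases, sample_rate, num_samples):
    """Sum sinusoids at integer multiples of f0 (harmonic model of voiced speech).

    Assumption: direct evaluation of sin() stands in for the synchronized
    oscillators described in the abstract.
    """
    samples = []
    for n in range(num_samples):
        t = n / sample_rate
        value = 0.0
        # The k-th harmonic sits at k * f0 with its own amplitude and phase.
        for k, (amp, phase) in enumerate(zip(amplitudes, phases), start=1):
            value += amp * math.sin(2.0 * math.pi * k * f0 * t + phase)
        samples.append(value)
    return samples


# Two harmonics of a 100 Hz fundamental, sampled at 8 kHz (illustrative values).
frame = synthesize_voiced(100.0, [1.0, 0.5], [0.0, 0.0], 8000, 160)
```

    Because every component is a multiple of f0, the output repeats once per pitch period (80 samples in this example), which is the line spectrum structure the abstract refers to.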

  • A Portable Text-to-Speech System Using a Pocket-Sized Formant Speech Synthesizer

    Norio HIGUCHI  Tohru SHIMIZU  Hisashi KAWAI  Seiichi YAMAMOTO  

     
    PAPER

      Vol:
    E76-A No:11
      Page(s):
    1981-1989

    The authors developed a portable Japanese text-to-speech system using a pocket-sized formant speech synthesizer. It consists of a linguistic processor and an acoustic processor. The linguistic processor runs on an MS-DOS personal computer and has functions to determine readings and prosodic information for input sentences written in kana-kanji-mixed style. New techniques, such as minimization of a cost function for phrases, rare-compound flag, semantic information, information of reading selection and restriction by associated particles, are used to increase the accuracy of readings and accent positions. The accuracy of determining readings and accent positions is 98.6% for sentences in newspaper articles. It is possible to use the linguistic processor through an interface library which has also been developed by the authors. Consequently, it has become possible not only to convert whole texts stored in text files but also to convert parts of sentences sent by the interface library sequentially, and the readings and prosodic information are optimized for the whole sentence at one time. The acoustic processor is custom-made hardware, and it has adopted new techniques for the improvement of rules for vowel devoicing, control of phoneme durations, control of the phrase components of voice fundamental frequency and the construction of the acoustic parameter database. Due to the above-mentioned modifications, the naturalness of synthetic speech generated by a Klatt-type formant speech synthesizer was improved. On a naturalness test it was rated 3.61 on a five-point scale from 0 to 4.

  • Prosodic Characteristics of Japanese Conversational Speech

    Nobuyoshi KAIKI  Yoshinori SAGISAKA  

     
    PAPER

      Vol:
    E76-A No:11
      Page(s):
    1927-1933

    In this paper, we quantitatively analyzed speech data in seven different styles with the aim of synthesizing natural Japanese conversational speech. Three reading styles were produced at different speeds (slow, normal and fast), and four speaking styles were produced by enacting conversation in different situations (free, hurried, angry and polite). To clarify the differences in prosodic characteristics between conversational speech and read speech, means and standard deviations of vowel duration, vowel amplitude and fundamental frequency (F0) were analyzed. We found large variation in these prosodic parameters. To look more precisely at the segmental duration and segmental amplitude differences between conversational speech and read speech, control rules of prosodic parameters in reading styles were applied to conversational speech. F0 contours of different speaking styles were superposed by normalizing the segmental duration. The differences between estimated values and actual values were analyzed. Large differences were found at sentence-final positions and key (focused) phrases. Sentence-final positions showed lengthening of segmental vowel duration and increased segmental vowel amplitude. Key phrase positions featured raised F0.

  • Physiologically-Based Speech Synthesis Using Neural Networks

    Makoto HIRAYAMA  Eric Vatikiotis-BATESON  Mitsuo KAWATO  

     
    PAPER

      Vol:
    E76-A No:11
      Page(s):
    1898-1910

    This paper focuses on two areas in our effort to synthesize speech from neuromotor input using neural network models that effect transforms between cognitive intentions to speak, their physiological effects on vocal tract structures, and subsequent realization as acoustic signals. The first area concerns the biomechanical transform between motor commands to muscles and the ensuing articulator behavior. Using physiological data of muscle EMG (electromyography) and articulator movements during natural English speech utterances, three articulator-specific neural networks learn the forward dynamics that relate motor commands to the muscles and motion of the tongue, jaw, and lips. Compared to a fully-connected network, mapping muscle EMG and motion for all three sets of articulators at once, this modular approach has improved performance by reducing network complexity and has eliminated some of the confounding influence of functional coupling among articulators. Network independence has also allowed us to identify and assess the effects of technical and empirical limitations on an articulator-by-articulator basis. This is particularly important for modeling the tongue whose complex structure is very difficult to examine empirically. The second area of progress concerns the transform between articulator motion and the speech acoustics. From the articulatory movement trajectories, a second neural network generates PARCOR (partial correlation) coefficients which are then used to synthesize the speech acoustics. In the current implementation, articulator velocities have been added as the inputs to the network. As a result, the model now follows the fast changes of the coefficients for consonants generated by relatively slow articulatory movements during natural English utterances. Although much work still needs to be done, progress in these areas brings us closer to our goal of emulating speech production processes computationally.

  • A Feasibility Study on a Simple Stored Channel Simulator for Urban Mobile Radio Environments

    Tsutomu TAKEUCHI  

     
    PAPER-Radio Communication

      Vol:
    E76-B No:11
      Page(s):
    1424-1428

    A stored channel simulator for digital mobile radio environments is proposed. It enables field tests to be carried out in the laboratory under identical conditions, since it can reproduce actual multipath radio channels by using the channel impulse responses (CIRs) measured in the field. Linear interpolation of the CIR is introduced to simplify the structure of the proposed simulator. The performance of the proposed simulator is confirmed by laboratory tests.
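
    As a rough illustration of linear interpolation of a CIR, the sketch below blends two stored impulse-response snapshots tap by tap. The abstract does not say how the interpolation is arranged, so interpolating between two successive measured snapshots is an assumption, and the function name and tap values are hypothetical.

```python
def interpolate_cir(cir_a, cir_b, fraction):
    """Linearly interpolate between two stored CIR snapshots, tap by tap.

    Assumption: cir_a and cir_b are complex tap gains measured at two
    successive instants; fraction in [0, 1] is the relative position
    between them (0 returns cir_a, 1 returns cir_b).
    """
    if len(cir_a) != len(cir_b):
        raise ValueError("CIR snapshots must have the same number of taps")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    return [(1.0 - fraction) * a + fraction * b for a, b in zip(cir_a, cir_b)]


# A quarter of the way from the first snapshot to the second (illustrative taps).
taps = interpolate_cir([1 + 0j, 0j], [0j, 1 + 0j], 0.25)
```

    Storing only sparse snapshots and filling the gaps this way is one plausible reading of how interpolation could simplify a stored-channel simulator.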

  • Tree-Based Approaches to Automatic Generation of Speech Synthesis Rules for Prosodic Parameters

    Yoichi YAMASHITA  Manabu TANAKA  Yoshitake AMAKO  Yasuo NOMURA  Yoshikazu OHTA  Atsunori KITOH  Osamu KAKUSHO  Riichiro MIZOGUCHI  

     
    PAPER

      Vol:
    E76-A No:11
      Page(s):
    1934-1941

    This paper describes automatic generation of speech synthesis rules which predict a stress level for each bunsetsu in long noun phrases. The rules are inductively inferred from a large amount of speech data by using two kinds of tree-based methods, the conventional decision tree and the SBR-tree methods. The rule sets automatically generated by the two methods have almost the same performance and decrease the prediction error of the accent component value from 23 Hz to about 14 Hz. The rate of correct reproduction of the change for adjacent bunsetsu pairs is also used as a measure for evaluating the generated rule sets, and they correctly reproduce about 80% of the changes. The effectiveness of the rule sets is verified through a listening test. With regard to the comprehensibility of the generated rules, the rules produced by the SBR-tree method are very compact, easy for human experts to interpret, and consistent with former studies.

  • High Quality Speech Synthesis System Based on Waveform Concatenation of Phoneme Segment

    Tomohisa HIROKAWA  Kenzo ITOH  Hirokazu SATO  

     
    PAPER

      Vol:
    E76-A No:11
      Page(s):
    1964-1970

    A new system for speech synthesis by concatenating waveforms selected from a dictionary is described. The dictionary is constructed from two hours of speech that includes isolated words and sentences uttered by one male speaker, and contains over 45,000 entries which are identified by their average pitch, a dynamic pitch parameter which represents micro pitch structure in a segment, duration and average amplitude. Phoneme duration is set according to phoneme environment, and phoneme power is controlled by both pitch frequency and phoneme environment. Tests show the average errors in vowel duration and consonant duration are 28.8 ms and 16.8 ms respectively, and the vowel power average error is 2.9 dB. The pitch frequency patterns are calculated according to a conventional model in which the accent component is added to a gross phrase component. Given a phoneme string and prosody information, the optimum waveforms are selected from the dictionary by matching their attributes with the given phonetic and prosodic information. A waveform selection function, which has two terms corresponding to prosody and phonological coincidence between rule-set values and waveform values from the dictionary, is proposed. The weight coefficients used in the selection function are determined through subjective hearing tests. The selected waveform segments are then modified in the waveform domain to further adjust for the desired prosody. A pitch frequency modification method based on the pitch synchronous overlap-add technique is introduced into the system. Lastly, the waveforms are interpolated between voiced waveforms to avoid abrupt changes in voice spectrum and waveform shape. An absolute evaluation test of five grades is performed on the synthesized voice and the mean score is 3.1, which is over "good," while the original speaker quality is retained.

  • Soft-Decision Decoding Algorithm for Binary Linear Block Codes

    Yong Geol SHIM  Choong Woong LEE  

     
    PAPER-Information Theory and Coding Theory

      Vol:
    E76-A No:11
      Page(s):
    2016-2021

    A soft-decision decoding algorithm for binary linear block codes is proposed. This algorithm seeks to minimize the block error probability. With careful examinations of the first hard-decision decoded results, the candidate codewords are efficiently searched for. Thus, we can reduce the decoding complexity (the number of hard-decision decodings) and lower the block error probability. Computer simulation results are presented for the (23, 12) Golay code. They show that the decoding complexity is considerably reduced and the block error probability is close to that of the maximum likelihood decoder.

  • Development of TTS Card for PCs and TTS Software for WSs

    Yoshiyuki HARA  Tsuneo NITTA  Hiroyoshi SAITO  Ken'ichiro KOBAYASHI  

     
    PAPER

      Vol:
    E76-A No:11
      Page(s):
    1999-2007

    Text-to-speech synthesis (TTS) is currently one of the most important media conversion techniques. In this paper, we describe a Japanese TTS card developed for constructing a personal-computer-based multimedia platform, and a TTS software package developed for a workstation-based multimedia platform. Some applications of this hardware and software are also discussed. The TTS consists of a linguistic processing stage for converting text into phonetic and prosodic information, and a speech processing stage for producing speech from the phonetic and prosodic symbols. The linguistic processing stage uses morphological analysis, rewriting rules for accent movement and pause insertion, and other techniques to impart correct accentuation and a natural-sounding intonation to the synthesized speech. The speech processing stage employs the cepstrum method with consonant-vowel (CV) syllables as the synthesis unit to achieve clear and smooth synthesized speech. All of the processing for converting Japanese text (consisting of mixed Japanese Kanji and Kana characters) to synthesized speech is done internally on the TTS card. This allows the card to be used widely in various applications, including electronic mail and telephone service systems without placing any processing burden on the personal computer. The TTS software was used for an E-mail reading tool on a workstation.

  • Performance of Some Multidestination GBN ARQ Protocols under Unequal Round-Trip Delays

    Tsern-Huei LEE  Jo-Ku HU  

     
    PAPER-Signaling System and Communication Protocol

      Vol:
    E76-B No:11
      Page(s):
    1352-1362

    The performance of various ARQ protocols has recently been analyzed for multidestination environments. In all previous work, the round-trip delays between the transmitter and each of the receivers are assumed (or forced) to be equal to the maximum one, to simplify the analysis and/or the operation. This assumption obviously will sacrifice the system performance. In this paper, we evaluate the throughput efficiencies of three multidestination GBN ARQ protocols under unequal round-trip delays. In the investigated protocols, multiple copies of each data block are (re)transmitted contiguously to the receivers. Tight lower bounds are obtained for the throughput efficiencies of the schemes in which each data block is transmitted with the optimum number of copies. Results show that assuming all the round-trip delays to be equal to the maximum one may sacrifice the performance significantly. We also compare the performances of the three investigated protocols. In general, the performance becomes better as the transmitter utilizes more of the outcomes of previous transmission attempts.

  • Reachability Analysis for Specified Processes in a Behavior Description

    Kenji SHIBATA  Yutaka HIRAKAWA  Akira TAKURA  Tadashi OHTA  

     
    PAPER-Communication Theory

      Vol:
    E76-B No:11
      Page(s):
    1373-1380

    Until now, in a communication system which deals with multiple processes, system behavior has been described by a fixed number of processes. The state reachability problem for specified processes was generally deliberated within a pre-defined number of processes, and was analyzed by essentially searching for all possible behaviors. However, in a system whose number of processes is arbitrary, a given state which is not reachable in a situation consisting of a small number of processes might be reachable in another situation consisting of a larger number of processes. This article discusses the above problem, assuming that the behavior of a system is described by an arbitrary number of processes. After discussing the relationship between our model and the Petri net model, we clarify the properties between the set of reachable states and the number of processes involved in the system, and show an algorithm to obtain a sufficient number of processes for resolving the reachability problem.

  • The Trend of Functional Memory Development

    Keikichi TAMARU  

     
    INVITED PAPER

      Vol:
    E76-C No:11
      Page(s):
    1545-1554

    The concept of functional memory was proposed nearly four decades ago. However, despite the long history of development, actually usable products did not appear until the 1980s. Functional memory is classified into three categories: general functional memory, a processing element array with small-size memory, and special purpose memory. Today the majority of functional memory is associative memory or content addressable memory (CAM), or special purpose memory based on CAM. Due to advances in fabrication capability, the capacity of CAM LSI has increased to over 100 K bits. General purpose CAMs have been developed based on the SRAM cell and the DRAM cell, respectively. Typical CAM LSIs of both types, a 20 K bit SRAM-based CAM and a 288 K bit DRAM-based CAM, are introduced. DRAM-based CAM is attractive for its large capacity. A parallel processor architecture based on the CAM cell is proposed, which is called a Functional Memory Type Parallel Processor (FMPP). Its basic feature is the dual character of a higher performance CAM and a tiny processor array. It can perform highly parallel operations on the stored data.

  • Loss and Waiting Time Probability Approximation for General Queueing

    Kenji NAKAGAWA  

     
    PAPER-Communication Theory

      Vol:
    E76-B No:11
      Page(s):
    1381-1388

    Queueing problems are investigated for very wide classes of input traffic and service time models to obtain good loss probability and waiting time probability approximations. The proposed approximation is based on the fundamental recursion formula and the Chernoff bound technique, both of which require no particular assumption about the stochastic nature of input traffic and service time, such as renewal or Markovian properties. The only essential assumption is stationarity. The accuracy of the obtained approximation is confirmed by comparison with computer simulation. There are a number of advantages of the proposed method of approximation when we apply it to network capacity design or path accommodation design problems. First, the proposed method has the advantage of applying to multi-media traffic. In the ATM network, a variety of bursty or non-bursty cell traffic exists and is superposed, so a unified analysis methodology is required that does not depend on each traffic's characteristics. Since our method assumes only the stationarity of the input and service processes, it is applicable to arbitrary types of cell streams. Further, this approach can be used for unexpected future traffic models. The second advantage in application is that the proposed probability approximation requires only a small amount of computation. Because of the use of the Chernoff bound technique, the convolution of every traffic's probability density function is replaced by the product of probability generating functions. Hence, the proposed method provides a fast algorithm for, say, the call admission control problem. Third, it has the advantage of accuracy. In this paper, we applied the approximation to the cases of homogeneous CBR traffic, non-homogeneous CBR traffic, M/D/1, AR(1)/D/1, M/M/1 and D/M/1. In all cases, the approximating values are sufficiently close to the exact values or computer simulation results from low traffic load to high load. Moreover, in all cases of the numerical comparison, our approximations are upper bounds of the real values. This is very important for the sake of conservative network design.
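
    The remark that the Chernoff bound replaces a convolution by a product of generating functions can be illustrated with a generic textbook sketch. It is not the paper's approximation (which also relies on a recursion formula not given in the abstract). Assumptions: the sources are independent on-off sources, each emitting one cell per slot with its own probability, and the bound is minimized over a hypothetical grid of `theta` values. The moment generating function used here is the probability generating function evaluated at `exp(theta)`.

```python
import math


def chernoff_tail_bound(probabilities, threshold, thetas):
    """Upper-bound P(sum of independent on-off sources >= threshold).

    Uses P(S >= c) <= exp(-theta * c) * prod_i M_i(theta) for theta > 0,
    where M_i(theta) = 1 - p_i + p_i * exp(theta). The product over sources
    takes the place of convolving their distributions.
    """
    best = 1.0  # a probability never exceeds 1 (the theta = 0 case)
    for theta in thetas:
        log_mgf = sum(
            math.log(1.0 - p + p * math.exp(theta)) for p in probabilities
        )
        best = min(best, math.exp(log_mgf - theta * threshold))
    return best


# Ten sources, each active with probability 0.1; chance of 5 or more active.
grid = [0.1 * i for i in range(1, 51)]
bound = chernoff_tail_bound([0.1] * 10, 5, grid)
```

    The result is always an upper bound on the true tail probability, which matches the abstract's point that bounding from above suits conservative network design.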

  • An Analysis of Optimal Frame Rate in Low Bit Rate Video Coding

    Yasuhiro TAKISHIMA  Masahiro WADA  Hitomi MURAKAMI  

     
    PAPER-Communication Systems and Transmission Equipment

      Vol:
    E76-B No:11
      Page(s):
    1389-1397

    We analyze frame rates in low bit rate video coding and show that an optimal frame rate can be theoretically obtained. In low bit rate video coding the frame rate is usually forced to be decreased for reducing the total amount of coded information. The choice of frame rate, however, has a great effect on the picture quality in a trade-off relation between coded picture quality and motion smoothness. It is known from experience that in order to achieve an optimum balance between these two factors, a frame rate has to be selected which is appropriate for the coding scheme, property of the video sequences and coding bit rate. A theoretical analysis, however, on the existence of an optimal frame rate and how the optimal frame rate would be expressed has not been performed. In this paper, coding distortion measured by mean square error is analyzed by using video signal models such as a rate-distortion function for coded frames and inter-frame correlation coefficients for non-coded frames. Overall picture quality taking account of coded picture quality and motion smoothness simultaneously is expressed as a function of frame rate. This analysis shows that the optimum frame rate can be uniquely specified. The maximum frame rate is optimal when the coding bit rate is higher than a certain value for a given video scene, while a frame rate less than the maximum is optimal otherwise. The result of the theoretical analysis is compared with the results of computer simulation. In addition, the relation between this analysis and a subjective evaluation is described. From both comparisons this theoretical analysis can be justified as an effective scheme to indicate the optimal frame rate, and it shows the possibility of improving picture quality by selecting frame rate adaptively.

  • Trends in Capacitor Dielectrics for DRAMs

    Akihiko ISHITANI  Pierre-Yves LESAICHERRE  Satoshi KAMIYAMA  Koichi ANDO  Hirohito WATANABE  

     
    INVITED PAPER

      Vol:
    E76-C No:11
      Page(s):
    1564-1581

    Material research on capacitor dielectrics for DRAM applications is reviewed. The state-of-the-art technologies to prepare Si3N4, Ta2O5, and SrTiO3 thin films for capacitors are described. The down-scaling limits for Si3N4 and Ta2O5 capacitors seem to be 3.5 and 1.5 nm SiO2 equivalent thickness, respectively. Combined with a rugged polysilicon electrode surface, Si3N4- and Ta2O5-based capacitors are available for 256 Mbit and 1 Gbit DRAMs. At the present time, the minimum SiO2 equivalent thickness for high permittivity materials is around 1 nm with a leakage current density of 10^-7 A/cm^2. Among the great variety of ferroelectrics, two families of materials, i.e., Pb(Zr, Ti)O3 and (Ba, Sr)TiO3, have emerged as the most promising candidates for 1 Gbit DRAMs and beyond. If the chemical vapor deposition technology can be established for these materials, capacitor dielectrics should not be a limiting issue for Gbit DRAMs.

  • Design of a Multiplier-Accumulator for High Speed Image Filtering

    Farhad Fuad ISLAM  Keikichi TAMARU  

     
    PAPER-VLSI Design Technology

      Vol:
    E76-A No:11
      Page(s):
    2022-2032

    Multiplication-accumulation is the basic computation required for image filtering operations. For real-time image filtering, very high throughput computation is essential. This work proposes a hardware algorithm for an application-specific VLSI architecture which realizes an area-efficient high throughput multiplier-accumulator. The proposed algorithm utilizes a priori knowledge of filter mask coefficients and optimizes the number of basic hardware components (e.g., full adders, pipeline latches, etc.). This results in a minimum-area VLSI architecture under certain input/output constraints.
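
    To show why multiplication-accumulation is the basic computation in image filtering, here is a plain software sketch of one output pixel of a mask filter. It does not model the proposed hardware algorithm or its use of known coefficients; the function name, image, and mask are hypothetical.

```python
def filter_pixel(image, mask, row, col):
    """Compute one filtered pixel as a sum of coefficient * pixel products.

    Assumption: the mask is anchored at its top-left corner at (row, col)
    and lies entirely inside the image.
    """
    acc = 0
    for i, mask_row in enumerate(mask):
        for j, coeff in enumerate(mask_row):
            acc += coeff * image[row + i][col + j]  # one multiply-accumulate step
    return acc


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
mask = [[1, 0],
        [0, -1]]
value = filter_pixel(image, mask, 0, 0)  # 1*1 + 0*2 + 0*4 + (-1)*5 = -4
```

    Every output pixel costs one multiply-accumulate per mask coefficient, which is why throughput of this single operation dominates real-time filtering.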

  • Manifestation of Linguistic Information in the Voice Fundamental Frequency Contours of Spoken Japanese

    Hiroya FUJISAKI  Keikichi HIROSE  Noboru TAKAHASHI  

     
    PAPER

      Vol:
    E76-A No:11
      Page(s):
    1919-1926

    Prosodic features of the spoken Japanese play an important role in the transmission of linguistic information concerning the lexical word accent, the sentence structure and the discourse structure. In order to construct prosodic rules for synthesizing high-quality speech, therefore, prosodic features of speech should be quantitatively analyzed with respect to the linguistic information. With a special focus on the fundamental frequency contour, we first define four prosodic units for the spoken Japanese, viz., prosodic word, prosodic phrase, prosodic clause and prosodic sentence, based on a decomposition of the fundamental frequency contour using a functional model for the generation process. Syntactic units are also introduced which have rough correspondence to these prosodic units. The relationships between the linguistic information and the characteristics of the components of the fundamental frequency contour are then described on the basis of results obtained by the analysis of two sets of speech material. Analysis of weathercast and newscast sentences showed that prosodic boundaries given by the manner of continuation/termination of phrase components fall into three categories, and are primarily related to the syntactic boundaries. On the other hand, analysis of noun phrases with various combinations of word accent types, syntactic structures, and focal conditions, indicated that the magnitude and the shape of the accent components, which of course reflect the information concerning the lexical accent types of constituent words, are largely influenced by the focal structure. The results also indicated that there are cases where prosody fails to meet all the requirements presented by word accent, syntax and discourse.

  • High-Performance Memory Macrocells with Row and Column Sliceable Architecture

    Nobutaro SHIBATA  Yoshinori GOTOH  Shigeru DATE  

     
    PAPER-Application Specific Memory

      Vol:
    E76-C No:11
      Page(s):
    1641-1648

    A new memory-macrocell architecture has been developed to obtain high-performance macrocells with a short design Turn-Around-Time (TAT) in ASIC design. The authors propose a row- and column-sliceable macrocell architecture in which only nine kinds of rectangular functional cells, called leaf-cells, are abutted to form macrocells of any size. The row-sliceable structure of peripheral circuits is possible due to a newly-developed channel-embedded address decoder combined with via-hole programming. Macrocell performance, especially access time, is kept at a high level by the distributed driver configuration. Zero address-setup time during write operation is achieved by delaying internal write timing with a new delay circuit. A short design TAT of 30 minutes is accomplished due to the simplicity of both macrocell generation and the checking procedure. The macrocells are designed in gate-array and full-custom styles, and fabricated with 0.5 µm CMOS technology.

  • Noise Reduction Techniques for a 64-kb ECL-CMOS SRAM with a 2-ns Cycle Time

    Kenichi OHHATA  Yoshiaki SAKURAI  Hiroaki NAMBU  Kazuo KANETANI  Youji IDEI  Toshirou HIRAMOTO  Nobuo TAMBA  Kunihiko YAMAGUCHI  Masanori ODAKA  Kunihiko WATANABE  Takahide IKEDA  Noriyuki HOMMA  

     
    PAPER-SRAM

      Vol:
    E76-C No:11
      Page(s):
    1611-1619

    An ECL-CMOS SRAM technology is proposed which features a combination of ECL word drivers, ECL write circuits and low-voltage CMOS cells. This technology assures both ultra-high speed and high density. In the ECL-CMOS SRAM, various kinds of noise generated during the write cycle seriously affect the memory performance, because it has much faster access than conventional SRAMs. To overcome this problem, we propose three noise reduction techniques: a noise reduction clamp circuit, an emitter follower with damping capacitor and a twisted bit line structure with "normally on" equalizer. These techniques allow fast access and cycle times. To evaluate these techniques, a 64-kb SRAM chip was fabricated using 0.5-µm BiCMOS technology. This SRAM has a short cycle time of 2 ns and a very fast access time of 1.5 ns. Evaluation proves the usefulness of these techniques.

  • A High-Density Multiple-Valued Content-Addressable Memory Based on One Transistor Cell

    Satoshi ARAGAKI  Takahiro HANYU  Tatsuo HIGUCHI  

     
    PAPER-Application Specific Memory

      Vol:
    E76-C No:11
      Page(s):
    1649-1656

    This paper presents a high-density multiple-valued content-addressable memory (MVCAM) based on a floating-gate MOS device. In the proposed CAM, the basic operation performed in each cell is a threshold function, which is a kind of inverter whose threshold value is programmable. Various multiple-valued operations for data retrieval can be easily performed using threshold functions. Moreover, each cell circuit in the MVCAM can be implemented using only a single floating-gate MOS transistor. As a result, the cell area of the four-valued CAM is reduced to 37% of that of the conventional dynamic CAM cell.
