
Keyword Search Result

[Keyword] SPE(2504hit)

1401-1420hit(2504hit)

  • Speech Synthesis with Various Emotional Expressions and Speaking Styles by Style Interpolation and Morphing

    Makoto TACHIBANA  Junichi YAMAGISHI  Takashi MASUKO  Takao KOBAYASHI  

     
    PAPER

    Vol: E88-D  No: 11  Page(s): 2484-2491

    This paper describes an approach to generating speech with emotional expressivity and speaking style variability. The approach is based on a speaking style and emotional expression modeling technique for HMM-based speech synthesis. We first model several representative styles, each of which is a speaking style and/or an emotional expression, in an HMM-based speech synthesis framework. Then, to generate synthetic speech with an intermediate style from representative ones, we synthesize speech from a model obtained by interpolating representative style models using a model interpolation technique. We assess the style interpolation technique with subjective evaluation tests using four representative styles, i.e., neutral, joyful, sad, and rough in read speech and synthesized speech from models obtained by interpolating models for all combinations of two styles. The results show that speech synthesized from the interpolated model has a style in between the two representative ones. Moreover, we can control the degree of expressivity for speaking styles or emotions in synthesized speech by changing the interpolation ratio in interpolation between neutral and other representative styles. We also show that we can achieve style morphing in speech synthesis, namely, changing style smoothly from one representative style to another by gradually changing the interpolation ratio.
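
The interpolation idea in this abstract can be illustrated with a minimal sketch. The abstract does not state the exact interpolation formula, so the code below assumes each style is represented by a diagonal Gaussian and blends means and variances linearly; the function names `interpolate_gaussians` and `morph` are hypothetical and are not taken from the paper.

```python
def interpolate_gaussians(mean_a, var_a, mean_b, var_b, ratio):
    """Blend two diagonal Gaussians (assumed model form).

    ratio = 0.0 returns style A, ratio = 1.0 returns style B.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    w_a, w_b = 1.0 - ratio, ratio
    # Linear blending of parameters is an assumption made for illustration.
    mean = [w_a * ma + w_b * mb for ma, mb in zip(mean_a, mean_b)]
    var = [w_a * va + w_b * vb for va, vb in zip(var_a, var_b)]
    return mean, var


def morph(mean_a, var_a, mean_b, var_b, steps):
    """Return a list of models that move gradually from style A to style B."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [
        interpolate_gaussians(mean_a, var_a, mean_b, var_b, i / (steps - 1))
        for i in range(steps)
    ]
```

Calling `morph` with a larger `steps` value yields a finer sequence of intermediate models, which corresponds loosely to changing the interpolation ratio gradually as the abstract describes.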

  • Concatenative Speech Synthesis Based on the Plural Unit Selection and Fusion Method

    Tatsuya MIZUTANI  Takehiko KAGOSHIMA  

     
    PAPER-Speech and Hearing

    Vol: E88-D  No: 11  Page(s): 2565-2572

    This paper proposes a novel speech synthesis method to generate human-like natural speech. The conventional unit-selection-based synthesis method selects speech units from a large database, and concatenates them with or without modifying the prosody to generate synthetic speech. This method features highly human-like voice quality. The method, however, has a problem that a suitable speech unit is not necessarily selected. Since the unsuitable speech unit selection causes discontinuity between the consecutive speech units, the synthesized speech quality deteriorates. It might be considered that the conventional method can attain higher speech quality if the database size increases. However, preparation of a larger database requires a longer recording time. The narrator's voice quality does not remain constant throughout the recording period. This fact deteriorates the database quality, and still leaves the problem of unsuitable selection. We propose the plural unit selection and fusion method which avoids this problem. This method integrates the unit fusion used in the unit-training-based method with the conventional unit-selection-based method. The proposed method selects plural speech units for each segment, fuses the selected speech units for each segment, modifies the prosody of the fused speech units, and concatenates them to generate synthetic speech. This unit fusion creates speech units which are connected to one another with much less voice discontinuity, and realizes high quality speech. A subjective evaluation test showed that the proposed method greatly improves the speech quality compared with the conventional method. Also, it showed that the speech quality of the proposed method is kept high regardless of the database size, from small (10 minutes) to large (40 minutes). The proposed method is a new framework in the sense that it is a hybrid method between the unit-selection-based method and the unit-training-based method. 
    In the framework, the algorithms of the unit selection and the unit fusion are exchangeable for more efficient techniques. Thus, the framework is expected to lead to new synthesis methods.

  • Information-Spectrum Characterization of Multiple-Access Channels with Correlated Sources

    Ken-ichi IWATA  Yasutada OOHAMA  

     
    PAPER-Information Theory

    Vol: E88-A  No: 11  Page(s): 3196-3202

    In this paper, an information-spectrum characterization is derived for the reliable transmission of general correlated sources over general multiple-access channels. We consider necessary and sufficient conditions for such transmission by using the information-spectrum methods introduced by Han and Verdu.

  • On Bit Error Probabilities of SSMA Communication Systems Using Spreading Sequences of Markov Chains

    Hiroshi FUJISAKI  Yosuke YAMADA  

     
    PAPER

    Vol: E88-A  No: 10  Page(s): 2669-2677

    We study asynchronous SSMA communication systems using binary spreading sequences of Markov chains and prove the CLT (central limit theorem) for the empirical distribution of the normalized MAI (multiple-access interference). We also prove that the distribution of the normalized MAI for asynchronous systems can never be Gaussian if chains are irreducible and aperiodic. Based on these results, we propose novel theoretical evaluations of bit error probabilities in such systems based on the CLT and compare these and conventional theoretical estimations based on the SGA (standard Gaussian approximation) with experimental results. Consequently we confirm that the proposed theoretical evaluations based on the CLT agree with the experimental results better than the theoretical evaluations based on the SGA. Accordingly, using the theoretical evaluations based on the CLT, we give the optimum spreading sequences of Markov chains in terms of bit error probabilities.
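
For reference, the conventional SGA baseline mentioned in this abstract can be sketched as follows. This is the textbook form for asynchronous systems with random i.i.d. binary spreading sequences, equal received powers, and no thermal noise, where the bit error probability is approximated by Q(sqrt(3N/(K-1))) for K users and spreading factor N. It does not reproduce the paper's CLT-based evaluation or its Markov-chain sequences; the function names are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def sga_bit_error(num_users, spreading_factor):
    """Standard Gaussian approximation of the bit error probability.

    Assumes random i.i.d. spreading sequences, equal powers, and a
    noise-free channel (all assumptions of this sketch, not of the paper).
    """
    if num_users < 2:
        raise ValueError("need at least two users for multiple-access interference")
    if spreading_factor < 1:
        raise ValueError("spreading factor must be positive")
    snr = math.sqrt(3.0 * spreading_factor / (num_users - 1))
    return q_function(snr)
```

Under these assumptions the estimate grows with the number of users and shrinks with the spreading factor, which is the behaviour against which a CLT-based evaluation would be compared.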

  • Demonstration of 10 Gbit/s-Based Time-Spreading and Wavelength-Hopping Optical-Code-Division-Multiplexing Using Fiber-Bragg-Grating En/Decoder

    Naoki MINATO  Hideaki TAMAI  Hideyuki IWAMURA  Satoko KUTSUZAWA  Shuko KOBAYASHI  Kensuke SASAKI  Akihiko NISHIKI  

     
    PAPER

    Vol: E88-B  No: 10  Page(s): 3848-3854

    We studied 10 Gbit/s-based time-spreading and wavelength-hopping (TS-WH) optical code division multiplexing (OCDM) using fiber Bragg gratings (FBGs). To apply it to high-bit-rate systems of ten gigabits per second or more, two techniques are adopted. One is encoding with a maximum spreading time of 400 ps, which is four times the data bit duration, so as to encode without shortening the chip duration. The other is the encoder design: the apodized refractive index profile of the unit gratings composing the encoder is designed to encode pulses of 10-20 ps width at a 10 Gbit/s rate. Using these techniques, 2×10 Gbit/s OCDM is demonstrated successfully. In this scheme, the transmission distance is limited by dispersion because the signal has a wide bandwidth in order to assign a wavelength-hopping pattern. We use no additional devices to compensate for the dispersion, in order to construct a simple and cost-effective system. A novel FBG encoder is designed to incorporate both encoding and compensation of the group delay among chip pulses within one device. We confirm the extension of the transmission distance in TS-WH OCDM from a demonstration over a 40 km-long single-mode fiber.

  • Information-Spectrum Characterization of Broadcast Channel with General Source

    Ken-ichi IWATA  Yasutada OOHAMA  

     
    PAPER-Information Theory

    Vol: E88-A  No: 10  Page(s): 2808-2818

    This paper clarifies a necessary condition and a sufficient condition for transmissibility for a given set of general sources and a given general broadcast channel. The approach is based on the information-spectrum methods introduced by Han and Verdu. Moreover, we consider the capacity region of the general broadcast channel with arbitrarily fixed error probabilities when independent private and common messages are sent over the channel. Furthermore, we treat the capacity region for the mixed broadcast channel.

  • Rules and Algorithms for Phonetic Transcription of Standard Malay

    Yousif A. EL-IMAM  Zuraidah Mohd DON  

     
    PAPER-Speech and Hearing

    Vol: E88-D  No: 10  Page(s): 2354-2372

    Phonetic transcription of text is an indispensable component of text-to-speech (TTS) systems and is used in acoustic modeling for speech recognition and other natural language processing applications. One approach to the transcription of written text into phonetic entities or sounds is to use a set of well-defined context and language-dependent rules. The process of transcribing text into sounds starts by preprocessing the text and representing it by lexical items to which the rules are applicable. The rules can be segregated into phonemic and phonetic rules. Phonemic rules operate on graphemes to convert them into phonemes. Phonetic rules operate on phonemes and convert them into context-dependent phonetic entities with actual sounds. Converting from written text into actual sounds, developing a comprehensive set of rules, and transforming the rules into implementable algorithms for any language cause several problems that have their origins in the relative lack of correspondence between the spelling of the lexical items and their sound contents. For Standard Malay (SM) these problems are not as severe as those for languages of complex spelling systems, such as English and French, but they do exist. In this paper, developing a comprehensive computerized system for processing SM text and transcribing it into phonetic entities and evaluating the performance of this system, irrespective of the application, is discussed. 
    In particular, the following issues are dealt with in this paper: (1) the spelling and other problems of SM writing and their impact on converting graphemes into phonemes, (2) the development of a comprehensive set of grapheme-to-phoneme rules for SM, (3) a description of the phonetic variations of SM or how the phonemes of SM vary in context and the development of a set of phoneme-to-phonetic transcription rules, (4) the formulation of the phonemic and phonetic rules into algorithms that are applicable to the computer-based processing of input SM text, and (5) the evaluation of the performance of the process of converting SM text into actual sounds by the above-mentioned methods.
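
The general shape of a rule-based grapheme-to-phoneme step can be sketched as below. The rule table is a small illustrative toy (a few Malay digraphs and one single letter), not the comprehensive rule set developed in the paper, and it ignores context-dependent cases; the names `TOY_RULES` and `to_phonemes` are hypothetical.

```python
# Toy grapheme-to-phoneme table; an illustrative assumption, not the paper's rules.
TOY_RULES = {
    "ng": "ŋ",
    "ny": "ɲ",
    "sy": "ʃ",
    "kh": "x",
    "c": "tʃ",
}


def to_phonemes(word, rules=TOY_RULES):
    """Convert a word to a phoneme list using longest-match-first rules.

    Graphemes without a rule are passed through unchanged.
    """
    word = word.lower()
    max_len = max(len(grapheme) for grapheme in rules)
    phonemes = []
    i = 0
    while i < len(word):
        # Try the longest candidate grapheme first so digraphs win over letters.
        for size in range(max_len, 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                phonemes.append(rules[chunk])
                i += len(chunk)
                break
        else:
            phonemes.append(word[i])
            i += 1
    return phonemes
```

A full system of the kind the abstract describes would add text preprocessing before this step and a second, context-dependent pass that maps phonemes to phonetic entities.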

  • Millimeter-Wave Broadband Mixers in New Testing and Measurement Instruments for High Data Rate Signal Analyses

    Masayuki KIMISHIMA  

     
    PAPER

    Vol: E88-C  No: 10  Page(s): 1973-1980

    Millimeter-wave (MMW) broadband mixers that are useful in measurement instruments for analyzing MMW high data rate signals have been investigated. First, we propose a specialized RF front-end for analyses of MMW high data rate signals. Next, the required specifications for the first mixers of the front-end are estimated, and the design, fabrication, and testing results of Q-, V-, and W-band monolithic broadband resistive mixers are described. The testing results are compared with the performance of a diode mixer designed for V-band. It was found that the resistive mixers offer very attractive performance: low conversion loss, good frequency flatness, and a high third-order intercept point (IP3) at low local oscillator (LO) power. The developed resistive mixers are suitable for the proposed MMW band measurement instruments.

  • Multi-Gigabit Pre-Emphasis Design and Analysis for Serial Link

    Chih-Hsien LIN  Chang-Hsiao TSAI  Chih-Ning CHEN  Shyh-Jye JOU  

     
    PAPER-Electronic Circuits

    Vol: E88-C  No: 10  Page(s): 2009-2019

    In this paper, a multi-Gbps pre-emphasis design methodology and circuits are proposed for a 4/2 Pulse Amplitude Modulation (PAM) transmitter of a high-speed serial data link over cable. A theoretical analysis of the total frequency response, including pre-emphasis, package, cable loss, and termination, is first carried out. To gain higher data rates without increasing the symbol rate, we use 4-PAM in our system. Then, we propose a pre-emphasis architecture and algorithm that boost the high-frequency response so that the overall frequency response at the receiver side is uniform within the desired frequency range. The overall circuit is implemented in a TSMC 0.18 µm 1P6M 1.8 V CMOS process. A test chip of this transmitter with pre-emphasis, a PLL circuit, and on-chip termination resistors is implemented in a full-custom flow to verify the design methodology. Measurements at 10/5 Gbps (4/2 PAM) are carried out over a 5-meter cable and agree with our analysis and simulation results.

  • High-Speed Digital Circuit Design Using Differential Logic with Asymmetric Signal Transition

    Masao MORIMOTO  Makoto NAGATA  Kazuo TAKI  

     
    PAPER-Electronic Circuits

    Vol: E88-C  No: 10  Page(s): 2001-2008

    Asymmetric slope differential CMOS (ASD-CMOS) and asymmetric slope differential dynamic logic (ASDDL) surpass the highest speed that conventional CMOS logic circuits can achieve, owing to a greatly shortened rise time along with a relatively prolonged fall time. ASD-CMOS is a static logic and ASDDL is a dynamic logic without a per-gate synchronous clock signal. Each needs two-phase operation as well as differential signaling; however, interleaved precharging hides the prolonged fall time, and BDD-based compound logic design mitigates the area increase. An ASD-CMOS 16-bit multiplier in a 0.18-µm CMOS technology demonstrates 1.78 nsec per operation, a 34% reduction from the best delay time achieved by a multiplier using a CMOS standard cell library that is conventional yet tuned to the optimum energy-delay product. ASDDL can be superior to DCVS-DOMINO circuits not only in delay time but also in area and even in power. An ASDDL 16-bit multiplier achieves delay and power reductions of 4% and 20%, respectively, compared with a DCVS-DOMINO realization. A prototype ASD-CMOS 16-bit multiplier with built-in test circuitry fabricated in a 0.13-µm CMOS technology operates with a delay time of 1.57 nsec at 1.2 V.

  • Speculative Computation and Abduction for an Autonomous Agent

    Ken SATOH  

     
    PAPER

    Vol: E88-D  No: 9  Page(s): 2031-2038

    In this paper, we propose an agent architecture that combines speculative computation and abduction. Speculative computation is a tentative computation performed when complete information for the computation has not been obtained. We use a default value to complement such incomplete information. Unlike usual default reasoning, the real value for the information can be obtained during the computation, and the computation can be revised on the fly. In previous work, we applied this technique to distributed problem solving under incomplete communication environments in the context of multi-agent systems and proposed correct procedures in abductive logic programming in terms of perfect model semantics. In that work, however, we regarded assumptions as defaults and used these assumptions for speculative computation. Thus, we could not perform hypothetical reasoning, that is, the original usage of abduction. In this paper, we extend our framework so that both speculative computation and abduction can be performed. As a result, our procedure becomes an extension of the abductive procedure developed by Kakas and Mancarella, augmented by a dynamic belief revision mechanism about the outside world.

  • Anti-Parallel Dipole Coupling of Quantum Dots via an Optical Near-Field Interaction

    Tadashi KAWAZOE  Kiyoshi KOBAYASHI  Motoichi OHTSU  

     
    PAPER

    Vol: E88-C  No: 9  Page(s): 1845-1849

    We observed optically forbidden energy transfer between cubic CuCl quantum dots (QDs) coupled via an optical near-field interaction using time-resolved near-field photoluminescence (PL) spectroscopy. The energy transfer time and exciton lifetime were estimated from the rise and decay times of the PL pump-probe signal, respectively. We found that the exciton lifetime increased as the energy transfer time fell. This result strongly supports the notion that the near-field interaction between QDs produces anti-parallel dipole coupling. Namely, a quantum-dot pair coupled by an optical near field has a long exciton lifetime, which indicates anti-parallel coupling of the QDs forming a weakly radiative quadrupole state.

  • Frequency Domain Microphone Array Calibration and Beamforming for Automatic Speech Recognition

    Jwu-Sheng HU  Chieh-Cheng CHENG  

     
    PAPER-Noise and Vibration

    Vol: E88-A  No: 9  Page(s): 2401-2411

    This investigation proposes two array beamformers, SPFDBB (Soft Penalty Frequency Domain Block Beamformer) and FDABB (Frequency Domain Adjustable Block Beamformer). Compared with conventional beamformers, these frequency-domain methods can significantly reduce the computational requirement in ASR (Automatic Speech Recognition) based applications. Like other reference-signal-based techniques, SPFDBB and FDABB minimize microphone mismatch, desired-signal cancellation caused by reflection effects, and resolution due to the array's position. Additionally, the proposed methods are suitable for both near-field and far-field environments. Generally, the convolution relation between channel and speech source in the time domain cannot be modeled accurately as a multiplication in the frequency domain with a finite window size, especially in ASR applications. SPFDBB and FDABB can approximate this multiplication by treating several frames as a block to achieve a better beamforming result. Moreover, FDABB adjusts the number of frames on-line to cope with variation in the characteristics of both speech and interference signals. Better performance was found to be achievable by combining these methods with an ASR mechanism.

  • A Fast Encoding Technique for Vector Quantization of LSF Parameters

    Sangwon KANG  Yongwon SHIN  Changyong SON  Thomas R. FISCHER  

     
    PAPER-Multimedia Systems for Communications

    Vol: E88-B  No: 9  Page(s): 3750-3755

    A fast encoding technique is described for vector quantization (VQ) of line spectral frequency parameters. A reduction in VQ encoding complexity is achieved by using a preliminary test that reduces the necessary codebook search range. The test is performed based on two criteria. One criterion uses the distance between a specific single element of the input vector and the corresponding element of the codevectors in the codebook. The other criterion makes use of the ordering property of LSF parameters. The fast encoding technique is implemented in the enhanced variable rate codec (EVRC) encoding algorithm. Simulation results show that the average searching range of the codebook can be reduced by 44.50% for the EVRC without degradation of spectral distortion (SD).
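
The first criterion in this abstract, a preliminary test on a single vector element, can be sketched as follows. The sketch assumes a squared Euclidean distortion and skips any codevector whose single-element squared distance already reaches the best distortion found so far, which cannot change the search result. It omits the second criterion (the LSF ordering property) and is not the EVRC implementation; the function names and the choice of element index `k` are hypothetical.

```python
def squared_distance(x, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def full_search(x, codebook):
    """Exhaustive nearest-codevector search; returns (index, distortion)."""
    best_i, best_d = -1, float("inf")
    for i, c in enumerate(codebook):
        d = squared_distance(x, c)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d


def fast_search(x, codebook, k=0):
    """Search with a single-element preliminary test on element k.

    Returns (index, distortion, number of full distance computations).
    """
    best_i, best_d = -1, float("inf")
    tested = 0
    for i, c in enumerate(codebook):
        # One squared term is a lower bound on the full squared distance,
        # so this codevector cannot beat the current best.
        if (x[k] - c[k]) ** 2 >= best_d:
            continue
        tested += 1
        d = squared_distance(x, c)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d, tested
```

Because the skipped codevectors could never have improved on the current best, `fast_search` returns the same index and distortion as `full_search` while computing fewer full distances; how many are saved depends on the codebook and on the order in which codevectors are visited.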

  • Optimum Wavelength Filter Spectrum Response in DWDM Systems for Ultimate Spectral Efficiency

    Shuichi SUZUKI  Yasuo KOKUBUN  

     
    PAPER-Fiber-Optic Transmission for Communications

    Vol: E88-B  No: 9  Page(s): 3649-3659

    A method of evaluating the wavelength filter spectrum response is introduced. The increase of the crosstalk level due to the filtering and the relation between the total crosstalk and the spectral efficiency are derived in detail using the Gaussian filter. Since this method can be applied to various kinds of filter spectrum responses, the ultimate spectral efficiencies of filters are compared. In this comparison, the problem of the box-like filter, which has been considered to be desirable, is revealed, and this is improved by cascading the filter spectrum. The requirement on the rejection floor that inheres in the filter is also made clear.

  • Design of UWB Pulses in Terms of B-Splines

    Mitsuhiro MATSUO  Masaru KAMADA  Hiromasa HABUCHI  

     
    PAPER-Pulse Shape

    Vol: E88-A  No: 9  Page(s): 2287-2298

    The present paper discusses a new construction of UWB pulses within the framework of soft-spectrum adaptation. The employed basis functions are B-splines having the following properties: (i) The B-splines are time-limited piecewise polynomials. (ii) The first-order B-splines are rectangular pulses, and they converge to band-limited functions in the limit as their order tends to infinity. (iii) An analog circuit and a fast digital filter exist for the generation of B-splines. Simple application of the Gram-Schmidt orthonormalization process to the shifted B-splines results in a few basic pulses, which are well time-limited and have a broad bandwidth, but do not comply with the FCC spectral mask. A constrained approximation technique is proposed for adaptively designing pulses so that they approximate target frequency characteristics. At the cost of using eleven shifted B-splines, an example set of four pulses conforming to the FCC spectral mask is obtained.
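
The Gram-Schmidt step applied to shifted B-splines can be sketched numerically as below. The sketch assumes second-order (triangular) B-splines sampled on a uniform grid and approximates the inner product by a Riemann sum; the order, shifts, grid, and function names are illustrative assumptions, and the constrained approximation that enforces the FCC spectral mask is not attempted.

```python
import math


def bspline2(t):
    """Second-order (triangular) B-spline supported on [0, 2]."""
    return max(0.0, 1.0 - abs(t - 1.0))


def sample(shift, n_points, dt):
    """Sample the B-spline shifted right by `shift` on a uniform grid."""
    return [bspline2(i * dt - shift) for i in range(n_points)]


def inner(f, g, dt):
    """Riemann-sum approximation of the inner product of two sampled signals."""
    return sum(a * b for a, b in zip(f, g)) * dt


def gram_schmidt(signals, dt):
    """Orthonormalize sampled signals (modified Gram-Schmidt)."""
    basis = []
    for s in signals:
        r = list(s)
        for q in basis:
            coef = inner(r, q, dt)
            r = [a - coef * b for a, b in zip(r, q)]
        norm = math.sqrt(inner(r, r, dt))
        if norm == 0.0:
            raise ValueError("signals are linearly dependent")
        basis.append([a / norm for a in r])
    return basis
```

For example, `gram_schmidt([sample(k, 401, 0.01) for k in range(3)], 0.01)` orthonormalizes three overlapping shifted pulses on the interval from 0 to 4. The resulting pulses remain time-limited to that interval, since each is a linear combination of time-limited B-splines.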

  • Traffic Sign Classification Using Ring Partitioned Method

    Aryuanto SOETEDJO  Koichi YAMADA  

     
    PAPER-Intelligent Transport System

    Vol: E88-A  No: 9  Page(s): 2419-2426

    Traffic sign recognition usually consists of two stages: detection and classification. In this paper, we describe the classification stage using the ring-partitioned method. The proposed method uses a specified grayscale image in the pre-processing step and ring-partitioned matching in the matching step. The method does not need many carefully prepared samples of traffic sign images for the training process; instead, only the standard traffic signs are used as reference images. The experimental results show the effectiveness of the method in matching traffic sign images with occlusion, rotation, and illumination problems, with fast computation time.

  • Tradeoff between Area Spectral Efficiency and End-to-End Throughput in Rate-Adaptive Multihop Radio Networks

    Koji YAMAMOTO  Susumu YOSHIDA  

     
    PAPER

    Vol: E88-B  No: 9  Page(s): 3532-3540

    We investigate the impact of symbol rate control, modulation level control, and the number of hops on the area spectral efficiency of interference-limited multihop radio networks. By controlling symbol rate and modulation level, data rate can be adapted according to received power. In addition, varying the number of hops can control received power. First, we evaluate the achievable end-to-end throughput of multihop transmission assuming symbol rate and modulation level control. Numerical results reveal that by controlling symbol rate or using multihop transmission, the end-to-end communication range can be extended at the cost of end-to-end throughput, and this may result in lower area spectral efficiency. Next, an expression for the area spectral efficiency of multihop radio networks is derived as a function of the number of hops and the end-to-end throughput. Numerical results also reveal that the resulting area spectral efficiency depends on the specific circumstances; it can, however, be increased only by using multihop transmission.

  • Ultra Wideband Time Hopping Impulse Radio Signal Impact on Performance of TD-SCDMA

    Guangrong YUE  Hongyu CHEN  Shaoqian LI  

     
    PAPER-Co-existance

    Vol: E88-A  No: 9  Page(s): 2373-2380

    This paper studies the power spectral density (PSD) of the multi-user aggregate time hopping (TH) ultra wideband (UWB) signal under asynchronous and synchronous transmission. The TH codes under consideration are a deterministic periodic code and a random integer code. Based on the PSD, the in-band interference power for TD-SCDMA is investigated as a function of the UWB system parameters. Two UWB modulations, TH pulse position modulation (PPM) and TH BPSK, are considered for calculating the in-band interference power. The numerical results indicate that asynchronous transmission is an effective way to decrease the peak in-band interference caused by the multi-user aggregate TH-PPM UWB signal. Although increasing the maximum of the time hopping code elements can smooth the PSD of the TH UWB signal, it is not a good way to reduce the peak in-band interference for TD-SCDMA. For the random integer TH code, when only the continuous spectrum of TH UWB exists in the TD-SCDMA band, or when the multi-user TH UWB signals are transmitted asynchronously, the in-band interference for TD-SCDMA is proportional to the number of UWB users. When a discrete spectral line exists in the TD-SCDMA band, the in-band interference caused by TH UWB with synchronous transmission is proportional to the square of the number of UWB users.

  • Tree-Structured Clustering Methods for Piecewise Linear-Transformation-Based Noise Adaptation

    Zhipeng ZHANG  Toshiaki SUGIMURA  Sadaoki FURUI  

     
    PAPER-Speech and Hearing

    Vol: E88-D  No: 9  Page(s): 2168-2176

    This paper proposes the application of tree-structured clustering to the processing of noisy speech collected under various SNR conditions in the framework of piecewise-linear transformation (PLT)-based HMM adaptation for noisy speech. Three kinds of clustering methods are described: a one-step clustering method that integrates noise and SNR conditions and two two-step clustering methods that construct trees for each SNR condition. According to the clustering results, a noisy speech HMM is made for each node of the tree structure. Based on the likelihood maximization criterion, the HMM that best matches the input speech is selected by tracing the tree from top to bottom, and the selected HMM is further adapted by linear transformation. The proposed methods are evaluated by applying them to a Japanese dialogue recognition system. The results confirm that the proposed methods are effective in recognizing digitally noise-added speech and actual noisy speech issued by a wide range of speakers under various noise conditions. The results also indicate that the one-step clustering method gives better performance than the two-step clustering methods.
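
The top-to-bottom model selection described in this abstract can be sketched with a toy tree. Each node here holds a one-dimensional Gaussian mean standing in for a noisy-speech HMM, and the likelihood is a plain Gaussian log-likelihood; both are stand-in assumptions, as are the names `Node`, `log_likelihood`, and `select_model`. The rule of keeping the best-scoring node seen along the greedy path is one plausible reading of the abstract, not a confirmed detail of the paper, and the subsequent linear-transformation adaptation is omitted.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Tree node with a toy 1-D Gaussian model in place of an HMM."""
    name: str
    mean: float
    children: list = field(default_factory=list)


def log_likelihood(node, samples, var=1.0):
    """Gaussian log-likelihood of the samples under the node's toy model."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * var) - (x - node.mean) ** 2 / (2.0 * var)
        for x in samples
    )


def select_model(root, samples):
    """Trace the tree from the root, following the most likely child each time.

    Returns the node with the highest likelihood seen along that path.
    """
    best, best_score = root, log_likelihood(root, samples)
    node = root
    while node.children:
        node = max(node.children, key=lambda c: log_likelihood(c, samples))
        score = log_likelihood(node, samples)
        if score > best_score:
            best, best_score = node, score
    return best
```

Tracing one path costs a number of likelihood evaluations proportional to the tree depth times the branching factor, rather than one evaluation per node in the whole tree.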
