IEICE global.ieice.org Site

Keyword Search Result

[Keyword] SPE(2504hit)

1461-1480hit(2504hit)

Ultra Low Loss and Long Length Photonic Crystal Fiber
Katsusuke TAJIMA Jian ZHOU

INVITED PAPER

Vol: E88-C, No: 5, Page(s): 870-875
Photonic crystal fiber (PCF) is a promising candidate for future transmission media because it offers features unobtainable in a conventional single-mode fiber. We discuss some important problems in realizing a PCF for transmission purposes. We also present recent progress on the PCF as a transmission medium.
An Optimal Certificate Dispersal Algorithm for Mobile Ad Hoc Networks
Hua ZHENG Shingo OMURA Jiro UCHIDA Koichi WADA

PAPER

Vol: E88-A, No: 5, Page(s): 1258-1266
In this paper, we focus on the problem of how to send a message securely between two users in an ad hoc network using a certificate dispersal system. In this system, special data called certificates are issued between two users, and the issued certificates are stored across the network. Our final goal for this certificate dispersal problem is to construct certificate graphs with a lower dispersability cost, which indicates the average number of certificates stored in each node of an ad hoc network. As a first step, given a certificate graph, we construct two efficient certificate dispersal algorithms, one for strongly connected graphs and one for directed graphs. We show that for a strongly connected graph G = (V, E) and a directed graph H = (V′, E′), the new upper bounds on the dispersability cost (the average number of certificates stored in one node) are O(D_G + |E|/|V|) and O(p_G d_max + |E′|/|V′|) respectively, where D_G is the diameter of G, d_max is the maximum diameter of the strongly connected components of H, and p_G is the number of strongly connected components of H. Furthermore, we give some new lower bounds for the problem and show that our algorithms are optimal for several graph classes.
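For readability, the two upper bounds stated in the abstract above can be typeset as follows. This only restates the abstract; the subscript placement (D_G, d_max, p_G) is an assumption recovered from the flattened text.

```latex
\[
  \text{strongly connected } G=(V,E):\quad O\!\left(D_G + \frac{|E|}{|V|}\right),
  \qquad
  \text{directed } H=(V',E'):\quad O\!\left(p_G\, d_{\max} + \frac{|E'|}{|V'|}\right)
\]
```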
Novel 4RTD Logic Circuits
Hideaki YAMADA Takao WAHO

PAPER-Nanomaterials and Quantum-Effect Devices

Vol: E88-C, No: 4, Page(s): 699-704
Based on the similarity in current-voltage characteristics of resonant-tunneling diodes (RTDs) and tunneling-type superconductive Josephson junctions, novel current-mode logic circuits consisting of four RTDs have been proposed. NAND and NOR functions, as well as AND and OR, can be obtained in a simple circuit configuration. SPICE simulation showed that the present circuits can operate at a clock frequency as high as 200 GHz.
Dual Level Access Scheme for Digital Video Sequences
Thumrongrat AMORNRAKSA Peter SWEENEY

PAPER-Broadcast Systems

Vol: E88-B, No: 4, Page(s): 1632-1640
In this paper, a dual level access scheme is proposed to provide two levels of access to the broadcast data: one to video signals protected for authorized users, the other to extra information (e.g. advertisements) provided for the remaining users in the network. In the scheme, video signals in MPEG format are considered. The video contents are protected from unauthorized viewing by encrypting the DC coefficients of the luminance component in I-frames, which are extracted from the MPEG bit-stream. An improved direct sequence spread spectrum technique is used to add extra information to non-zero AC coefficients, extracted from the same MPEG bit-stream. The resultant MPEG bit-stream still occupies the same existing bandwidth allocated for a broadcast channel. At the receiver, the extra information is recovered and subtracted from the altered AC coefficients. The result is then combined with the decrypted DC coefficients to restore the original MPEG bit-stream. The experimental results show that less than 2.9% of the MPEG bit-stream needed to be encrypted in order to efficiently reduce its commercial value. Also, on average, with a 1.125 Mbps MPEG bit-stream, up to 1.4 kbps of extra information could be successfully transmitted, while the video quality (PSNR) was degraded by 2.81 dB, which was not noticeable.
Adaptive Microphone Array System with Two-Stage Adaptation Mode Controller
Yang-Won JUNG Hong-Goo KANG Chungyong LEE Dae-Hee YOUN Changkyu CHOI Jaywoo KIM

PAPER-Digital Signal Processing

Vol: E88-A, No: 4, Page(s): 972-977
In this paper, an adaptive microphone array system with a two-stage adaptation mode controller (AMC) is proposed for high-quality speech acquisition in real environments. The proposed system includes an adaptive array algorithm, a time-delay estimator and a newly proposed AMC. To ensure proper adaptation of the adaptive array algorithm, the proposed AMC uses not only temporal information, but also spatial information. The proposed AMC is constructed with two processing stages: an initialization stage and a running stage. In the initialization stage, a sound source localization technique is adopted, and a signal correlation characteristic is used in the running stage. For the adaptive array algorithm, a generalized sidelobe canceller with an adaptive blocking matrix is used. The proposed algorithm is implemented as a real-time man-machine interface module of a home-agent robot. Simulation results show a 13 dB SINR improvement with the speaker sitting at a distance of 2 m from the home-agent robot. The speech recognition rate is also enhanced by 32% compared to the single-channel acquisition system.
Bayesian Confidence Scoring and Adaptation Techniques for Speech Recognition
Tae-Yoon KIM Hanseok KO

LETTER-Multimedia Systems for Communications

Vol: E88-B, No: 4, Page(s): 1756-1759
Bayesian combining of confidence measures is proposed for speech recognition. Bayesian combining is achieved by estimating the joint pdf of the confidence feature vector in the correct and incorrect hypothesis classes. In addition, the adaptation of a confidence score using the pdf is presented. The proposed methods reduced the classification error rate by 18% relative to the conventional single-feature-based confidence scoring method in an isolated-word out-of-vocabulary rejection test.
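As a reading aid, a generic way to turn class-conditional joint pdfs of a confidence feature vector into a single score is Bayes' rule. The abstract does not give the paper's exact formulation, so the symbols below (feature vector x, classes C for correct and I for incorrect hypotheses) are assumptions for illustration only.

```latex
\[
  P(C \mid \mathbf{x})
  = \frac{p(\mathbf{x} \mid C)\,P(C)}
         {p(\mathbf{x} \mid C)\,P(C) + p(\mathbf{x} \mid I)\,P(I)}
\]
```

Under this sketch, a hypothesis would be accepted when the posterior exceeds a chosen threshold.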
Diffusion-Type Autonomous Decentralized Flow Control for End-to-End Flow in High-Speed Networks
Chisa TAKANO Masaki AIDA

PAPER-Network

Vol: E88-B, No: 4, Page(s): 1559-1567
We have proposed diffusion-type flow control as a solution for the extremely time-sensitive flow control required for high-speed networks. In our method of flow control, we design in advance simple and appropriate rules for action at the nodes, and these automatically result in stable and efficient network-wide performance through local interactions between nodes. Specifically, we design the rules for the flow control action of each node that simulates the local interaction of a diffusion phenomenon, in order that the packet density is diffused throughout the network as soon as possible. However, in order to make a comparison with other flow control methods under the same conditions, the evaluations in our previous studies used a closed network model, in which the number of packets was unchanged. This paper investigates the performance of our flow control method for an end-to-end flow, in order to show that it is still effective in more realistic networks. We identify the key issues associated with our flow control method when applied to an open network model, and demonstrate a two-step solution. First, we consider the rule for flow control action at the boundary node, which is the ingress node in the network, and propose a rule to achieve smooth diffusion of the packet density. Secondly, we introduce a shaping mechanism, which keeps the number of packets in the network at an appropriate level.
Generalized Variance-Based Markovian Fitting for Self-Similar Traffic Modelling
Shou-Kuo SHAO Malla REDDY PERATI Meng-Guang TSAI Hen-Wai TSAO Jingshown WU

PAPER

Vol: E88-B, No: 4, Page(s): 1493-1502
Most of the proposed self-similar traffic models are asymptotic in nature. Hence, they are less effective in queueing-based performance evaluation when the buffer sizes are small. In this paper, we propose modelling a short range dependent (SRD) process by a generalized variance-based Markovian fitting to provide effective queueing-based performance measures when buffer sizes are small. The proposed method matches the variance of the exact second-order self-similar processes. The fitting procedure determines the related parameters in an exact and straightforward way. The resultant traffic model essentially consists of a superposition of several two-state Markov-modulated Poisson processes (MMPPs) with distinct modulating parameters. We present how well the resultant MMPP can emulate the variance of the original self-similar traffic in the range of the specified time scale, and can provide more accurate bounds for the queueing-based performance measures, namely tail probability, mean waiting time and loss probability. Numerical results show that both the second-order statistics and the queueing-based performance measures when buffer capacity is small are more accurate than those of the variance-based fitting in which the modulating parameters of each superposed two-state MMPP are equal. We then investigate the relationship between time scale and the number of superposed two-state MMPPs. We found that when the performance measures pertaining to larger time scales are not better than those of smaller ones, we need to increase the number of superposed two-state MMPPs to maintain accurate and reliable queueing-based performance measures. We then conclude from the extensive numerical examples that an exact second-order self-similar traffic can be well represented by the proposed model.
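To make the "superposition of several two-state MMPPs" concrete, the sketch below computes the mean arrival rate of such a superposition from standard two-state Markov chain results. It does not reproduce the paper's variance-matching fitting procedure; the class name, field names, and parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TwoStateMMPP:
    """Hypothetical container for one two-state MMPP (illustration only)."""
    rate1: float   # Poisson arrival rate while in state 1
    rate2: float   # Poisson arrival rate while in state 2
    leave1: float  # transition rate from state 1 to state 2
    leave2: float  # transition rate from state 2 to state 1

    def mean_rate(self) -> float:
        # Stationary probabilities of the two-state modulating chain.
        total = self.leave1 + self.leave2
        pi1 = self.leave2 / total
        pi2 = self.leave1 / total
        return pi1 * self.rate1 + pi2 * self.rate2


def superposed_mean_rate(sources):
    # The mean rate of independent superposed sources is the sum of their means.
    return sum(s.mean_rate() for s in sources)


# Made-up parameters with distinct modulating rates per source.
sources = [
    TwoStateMMPP(rate1=10.0, rate2=2.0, leave1=0.1, leave2=0.3),
    TwoStateMMPP(rate1=5.0, rate2=1.0, leave1=0.01, leave2=0.01),
]
print(superposed_mean_rate(sources))
```

Giving each source its own `leave1`/`leave2` pair mirrors the "distinct modulating parameters" mentioned in the abstract; how those values are chosen is the subject of the paper and is not shown here.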
A Sub-0.5 V Differential ED-CMOS/SOI Circuit with Over-1-GHz Operation
Takakuni DOUSEKI Toshishige SHIMAMURA Nobutaro SHIBATA

PAPER-Digital

Vol: E88-C, No: 4, Page(s): 582-588
This paper describes a speed-oriented ultralow-voltage and low-power SOI circuit technique based on a differential enhancement- and depletion-mode (ED)-MOS circuit. Combining an ED-MOS circuit block for critical paths and a multi-Vth CMOS circuit block for noncritical paths, that is, the so-called differential ED-CMOS/SOI circuit, makes it possible to achieve low-power and ultrahigh-speed operation of over 1 GHz at a supply voltage of less than 0.5 V. As two applications of the differential ED-CMOS/SOI circuit, a multi-stage frequency divider that uses the ED-MOS circuit in a first-stage frequency divider and a pipelined adder with a CMOS pipeline register are described in detail. To verify the effectiveness of the ED-CMOS/SOI circuit scheme, we fabricated a 1/8 frequency divider and a 32-bit binary look-ahead carry (BLC) adder using the 0.25-µm MTCMOS/SOI process. The frequency divider operates down to 0.3 V with a maximum operating frequency of 3.6 GHz while suppressing power dissipation to 0.3 mW. The 32-bit adder operates at a frequency of 1 GHz at 0.5 V.
Performance Limitation of On-Chip Global Interconnects for High-Speed Signaling
Akira TSUCHIYA Masanori HASHIMOTO Hidetoshi ONODERA

PAPER

Vol: E88-A, No: 4, Page(s): 885-891
This paper discusses the performance limitations of on-chip interconnects. On-chip global interconnects are considered to be a bottleneck of high-performance LSIs. To overcome this issue, high-speed signaling and large-throughput interconnection using electrical wires have been studied. However, the limitations of on-chip interconnects have not been examined sufficiently. This paper reveals the maximum performance of on-chip global interconnects based on derived analytic expressions and detailed circuit simulation. We derive trade-off curves among bit rate, interconnect length, and eye opening for both single-end and differential signaling. The results show that differential signaling improves signaling performance several times compared with conventional single-end signaling, and demonstrate that 80 Gbps differential signaling on 10 mm interconnects is promising.
Objective Quality Assessment of Wideband Speech Coding
Nobuhiko KITAWAKI Kou NAGAI Takeshi YAMADA

PAPER-Network

Vol: E88-B, No: 3, Page(s): 1111-1118
Recently, wideband speech communication using 7 kHz-wideband speech coding, as described in ITU-T Recommendations G.722, G.722.1, and G.722.2, has become increasingly necessary for use in advanced IP telephony using PCs, since, for this application, hands-free communication using separate microphones and loudspeakers is indispensable, and in this situation wideband speech is particularly helpful in enhancing the naturalness of communication. An objective quality measurement methodology for wideband-speech coding has been studied, its essential components being an objective quality measure and an input test signal. This paper describes Wideband-PESQ conforming to the draft Annex to ITU-T Recommendation P.862, "Perceptual Evaluation of Speech Quality (PESQ)," as the objective quality measure, by evaluating the consistency between the subjectively evaluated MOS (Mean Opinion Score) and objectively estimated MOS. This paper also describes the verification of artificial voice conforming to Recommendation P.50 "Artificial Voices," as the input test signal for such measurements, by evaluating the consistency between the objectively estimated MOS using a real voice and that obtained using an artificial voice.
Parameter Sharing in Mixture of Factor Analyzers for Speaker Identification
Hiroyoshi YAMAMOTO Yoshihiko NANKAKU Chiyomi MIYAJIMA Keiichi TOKUDA Tadashi KITAMURA

PAPER-Feature Extraction and Acoustic Modelings

Vol: E88-D, No: 3, Page(s): 418-424
This paper investigates the parameter tying structures of a mixture of factor analyzers (MFA) and discriminative training of MFA for speaker identification. The parameters of factor loading matrices or diagonal matrices are shared in different mixtures of MFA. Then, minimum classification error (MCE) training is applied to the MFA parameters to enhance the discrimination ability. The result of a text-independent speaker identification experiment shows that MFA outperforms the conventional Gaussian mixture model (GMM) with diagonal or full covariance matrices and achieves the best performance when sharing the diagonal matrices, resulting in a relative gain of 26% over the GMM with diagonal covariance matrices. The improvement is especially significant under sparse training data conditions. The recognition performance is further improved by MCE training, with an additional gain of 3% error reduction.
Applying Sparse KPCA for Feature Extraction in Speech Recognition
Amaro LIMA Heiga ZEN Yoshihiko NANKAKU Keiichi TOKUDA Tadashi KITAMURA Fernando G. RESENDE

PAPER-Feature Extraction and Acoustic Modelings

Vol: E88-D, No: 3, Page(s): 401-409
This paper presents an analysis of the applicability of Sparse Kernel Principal Component Analysis (SKPCA) for feature extraction in speech recognition, as well as a proposed approach to make the SKPCA technique realizable for a large amount of training data, which is a usual context in speech recognition systems. Although KPCA (Kernel Principal Component Analysis) has proved to be an efficient technique for speech recognition, it has the disadvantage of requiring training data reduction when the amount of data is excessively large. This data reduction is important to avoid computational infeasibility and/or an extremely high computational burden related to the feature representation step of the training and test data evaluations. The standard approach to this data reduction is to randomly choose frames from the original data set, which does not necessarily provide a good statistical representation of the original data set. In order to solve this problem, a likelihood-related re-estimation procedure was applied to the KPCA framework, thus creating the SKPCA, which nevertheless is not realizable for large training databases. The proposed approach consists in clustering the training data and applying to these clusters an SKPCA-like data reduction technique, generating the reduced data clusters. These reduced data clusters are merged and reduced in a recursive procedure until just one cluster is obtained, making the SKPCA approach realizable for a large amount of training data. The experimental results show the efficiency of the SKPCA technique with the proposed approach over the KPCA with the standard sparse solution using randomly chosen frames and over the standard feature extraction techniques.
Multiple Regression of Log Spectra for In-Car Speech Recognition Using Multiple Distributed Microphones
Weifeng LI Tetsuya SHINDE Hiroshi FUJIMURA Chiyomi MIYAJIMA Takanori NISHINO Katunobu ITOU Kazuya TAKEDA Fumitada ITAKURA

PAPER-Feature Extraction and Acoustic Modelings

Vol: E88-D, No: 3, Page(s): 384-390
This paper describes a new multi-channel method of noisy speech recognition, which estimates the log spectrum of speech at a close-talking microphone based on the multiple regression of the log spectra (MRLS) of noisy signals captured by distributed microphones. The advantages of the proposed method are as follows: 1) the method requires neither a sensitive geometric layout, nor calibration of the sensors, nor additional pre-processing for tracking the speech source; 2) the system requires very little computation; and 3) the regression weights can be statistically optimized over the given training data. Once the optimal regression weights are obtained by regression learning, they can be used to generate the estimated log spectrum in the recognition phase, where close-talking speech is no longer required. The performance of the proposed method is illustrated by speech recognition of real in-car dialogue data. In comparison to the nearest distant microphone and a multi-microphone adaptive beamformer, the proposed approach obtains relative word error rate (WER) reductions of 9.8% and 3.6%, respectively.
Developments in Corpus-Based Speech Synthesis: Approaching Natural Conversational Speech
Nick CAMPBELL

INVITED PAPER

Vol: E88-D, No: 3, Page(s): 376-383
This paper describes the special demands of conversational speech in the context of corpus-based speech synthesis. The author proposed the CHATR system of prosody-based unit-selection for concatenative waveform synthesis seven years ago, and now extends this work to incorporate the results of an analysis of five years of recordings of spontaneous conversational speech in a wide range of actual daily-life situations. The paper proposes that the expression of affect (often translated as 'kansei' in Japanese) is the main factor differentiating laboratory speech from real-world conversational speech, and presents a framework for the specification of affect through differences in speaking style and voice quality. Having an enormous corpus of speech samples available for concatenation allows the selection of complete phrase-sized utterance segments, and changes the focus of unit selection from segmental or phonetic continuity to one of prosodic and discoursal appropriateness instead. Samples of the resulting large-corpus-based synthesis can be heard at http://feast.his.atr.jp/AESOP.
Modeling Improved Prosody Generation from High-Level Linguistically Annotated Corpora
Gerasimos XYDAS Dimitris SPILIOTOPOULOS Georgios KOUROUPETROGLOU

PAPER-Speech Synthesis and Prosody

Vol: E88-D, No: 3, Page(s): 510-518
Synthetic speech usually suffers from a poor F0 contour surface. The prediction of the underlying pitch targets robustly relies on the quality of the predicted prosodic structures, i.e. the corresponding sequences of tones and breaks. In the present work, we have utilized a linguistically enriched annotated corpus to build data-driven models for predicting prosodic structures with increased accuracy. We have then used a linear regression approach for the F0 modeling. An appropriate XML annotation scheme has been introduced to encode syntax, grammar, new or already given information, phrase subject/object information, as well as rhetorical elements in the corpus, by exploiting a Natural Language Generator (NLG) system. To demonstrate the benefits of introducing the enriched input meta-information, we first show that while tone and break CART predictors have high accuracy when standing alone (92.35% for breaks, 87.76% for accents and 99.03% for endtones), their application in the TtS chain degrades the Linear Regression pitch target model. On the other hand, the enriched linguistic meta-information minimizes the errors of the models, leading to a more natural F0 surface. Both objective and subjective evaluations were adopted for the intonation contours, taking into account the propagated errors introduced by each model in the synthesis chain.
Designing Target Cost Function Based on Prosody of Speech Database
Kazuki ADACHI Tomoki TODA Hiromichi KAWANAMI Hiroshi SARUWATARI Kiyohiro SHIKANO

PAPER-Speech Synthesis and Prosody

Vol: E88-D, No: 3, Page(s): 519-524
This research aims to construct a high-quality Japanese TTS (Text-to-Speech) system that has high flexibility in treating prosody. Many TTS systems have implemented a prosody control system, but such systems have been fundamentally designed to output speech with a standard pitch and speech rate. In this study, we employ a unit selection-concatenation method and also introduce an analysis-synthesis process to provide precisely controlled prosody in output speech. Speech quality degrades in proportion to the amount of prosody modification; therefore, a target cost for prosody is set to evaluate the prosodic difference between target prosody and speech candidates in such a unit selection system. However, the conventional cost ignores the original prosody of speech segments, although it is assumed that the quality deterioration tendency varies in relation to the pitch or speech rate of the original speech. In this paper, we propose a novel cost function design based on the prosody of speech segments. First, we recorded nine databases of Japanese speech with different prosodic characteristics. Then, with respect to the speech databases, we investigated the relationships between the amount of prosody modification and the perceptual degradation. The results indicate that the tendency of perceptual degradation differs according to the prosodic features of the original speech. On the basis of these results, we propose a new cost function design, which changes the cost function according to the prosody of a speech database. Results of preference testing of synthetic speech show that the proposed cost functions generate speech of higher quality than the conventional method.
Robust Dependency Parsing of Spontaneous Japanese Spoken Language
Tomohiro OHNO Shigeki MATSUBARA Nobuo KAWAGUCHI Yasuyoshi INAGAKI

PAPER-Speech Corpora and Related Topics

Vol: E88-D, No: 3, Page(s): 545-552
Spontaneously spoken Japanese includes a lot of grammatically ill-formed linguistic phenomena such as fillers, hesitations, inversions, and so on, which do not appear in written language. This paper proposes a novel method of robust dependency parsing using a large-scale spoken language corpus, and evaluates the availability and robustness of the method using spontaneously spoken dialogue sentences. By utilizing stochastic information about the appearance of ill-formed phenomena, the method can robustly parse spoken Japanese including fillers, inversions, or dependencies over utterance units. Experimental results reveal that the parsing accuracy reached 87.0%, and we confirmed that it is effective to utilize the location information of a bunsetsu, and the distance information between bunsetsus as stochastic information.
Acoustic Modeling of Speaking Styles and Emotional Expressions in HMM-Based Speech Synthesis
Junichi YAMAGISHI Koji ONISHI Takashi MASUKO Takao KOBAYASHI

PAPER-Speech Synthesis and Prosody

Vol: E88-D, No: 3, Page(s): 502-509
This paper describes the modeling of various emotional expressions and speaking styles in synthetic speech using HMM-based speech synthesis. We show two methods for modeling speaking styles and emotional expressions. In the first method, called style-dependent modeling, each speaking style and emotional expression is modeled individually. In the second, called style-mixed modeling, each speaking style and emotional expression is treated as one of the contexts, alongside phonetic, prosodic, and linguistic features, and all speaking styles and emotional expressions are modeled simultaneously by using a single acoustic model. We chose four styles of read speech (neutral, rough, joyful, and sad) and compared the above two modeling methods using these styles. The results of subjective evaluation tests show that both modeling methods have almost the same accuracy, and that it is possible to synthesize speech with a speaking style and emotional expression similar to those of the target speech. In a test of classification of styles in synthesized speech, more than 80% of speech samples generated using both the models were judged to be similar to the target styles. We also show that the style-mixed modeling method gives fewer output and duration distributions than the style-dependent modeling method.
An Unsupervised Speaker Adaptation Method for Lecture-Style Spontaneous Speech Recognition Using Multiple Recognition Systems
Seiichi NAKAGAWA Tomohiro WATANABE Hiromitsu NISHIZAKI Takehito UTSURO

PAPER-Spoken Language Systems

Vol: E88-D, No: 3, Page(s): 463-471
This paper describes an accurate unsupervised speaker adaptation method for lecture-style spontaneous speech recognition using multiple LVCSR systems. In an unsupervised speaker adaptation framework, the improvement in recognition performance obtained by adapting acoustic models depends strongly on the accuracy of labels such as phonemes and syllables. Therefore, extraction of the adaptation data guided by a confidence measure is effective for unsupervised adaptation. In this paper, we looked for high-confidence portions based on the agreement between two LVCSR systems, adapted acoustic models using the portions with highly accurate labels, and then improved the recognition accuracy. We applied our method to the Corpus of Spontaneous Japanese (CSJ), and the method improved the recognition rate by about 2.1% in comparison with a traditional method.
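The idea of selecting high-confidence portions from the agreement between two recognizers can be sketched as follows. This is a minimal illustration, assuming agreement is measured on word sequences via a generic sequence alignment; the abstract does not specify the paper's actual agreement criterion, and the function name and `min_len` parameter are hypothetical.

```python
from difflib import SequenceMatcher


def agreed_segments(hyp_a, hyp_b, min_len=1):
    """Return runs of consecutive words on which both hypotheses agree.

    hyp_a, hyp_b: word lists output by two (hypothetical) recognizers.
    min_len: shortest agreed run that is kept as adaptation data.
    """
    matcher = SequenceMatcher(a=hyp_a, b=hyp_b, autojunk=False)
    segments = []
    for block in matcher.get_matching_blocks():
        # The final block returned by difflib is a zero-length sentinel.
        if block.size > 0 and block.size >= min_len:
            segments.append(hyp_a[block.a:block.a + block.size])
    return segments


system_1 = ["the", "cat", "sat", "on", "the", "mat"]
system_2 = ["the", "cat", "sat", "in", "the", "mat"]
print(agreed_segments(system_1, system_2))
print(agreed_segments(system_1, system_2, min_len=3))
```

Raising `min_len` trades the amount of adaptation data for label reliability, since longer agreed runs are less likely to match by chance.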
