IEICE global.ieice.org Site

Keyword Search Result

[Keyword] Ti(30728hit)

20281-20300hit(30728hit)

A Study on Acoustic Modeling for Speech Recognition of Predominantly Monosyllabic Languages
Ekkarit MANEENOI Visarut AHKUPUTRA Sudaporn LUKSANEEYANAWIN Somchai JITAPUNKUL

PAPER

Vol:
E87-D No:5
Page(s):
1146-1163
This paper presents a study on acoustic modeling for speech recognition of predominantly monosyllabic languages. Various speech units used in speech recognition systems have been investigated. To evaluate the effectiveness of these acoustic models, the Thai language is selected, since it is a predominantly monosyllabic language and has a complex vowel system. Several experiments have been carried out to find the proper speech unit that can accurately create an acoustic model and give a higher recognition rate. Results of recognition rates under different acoustic models are given and compared. In addition, this paper proposes a new speech unit for speech recognition, namely the onset-rhyme unit. Two models are proposed: the Phonotactic Onset-Rhyme Model (PORM) and the Contextual Onset-Rhyme Model (CORM). The models comprise a pair of onset and rhyme units, which make up a syllable. An onset comprises an initial consonant and its transition towards the following vowel. Together with the onset, the rhyme consists of a steady vowel segment and a final consonant. Experimental results show that the onset-rhyme model improves on the efficiency of other speech units. The onset-rhyme model improves on the accuracy of the inter-syllable triphone model by nearly 9.3% and of the context-dependent Initial-Final model by nearly 4.7% for the speaker-dependent systems using only an acoustic model, and by 5.6% and 4.5%, respectively, for the speaker-dependent systems using both acoustic and language models. The results show that the onset-rhyme models attain a high recognition rate. Moreover, they are also more efficient in terms of system complexity.
Orthogonalized Distinctive Phonetic Feature Extraction for Noise-Robust Automatic Speech Recognition
Takashi FUKUDA Tsuneo NITTA

PAPER

Vol:
E87-D No:5
Page(s):
1110-1118
In this paper, we propose a noise-robust automatic speech recognition system that uses orthogonalized distinctive phonetic features (DPFs) as the input to an HMM with diagonal covariance. In the orthogonalized DPF extraction stage, a speech signal is first converted to acoustic features composed of local features (LFs) and ΔP; then a multilayer neural network (MLN) with 153 output units, composed of context-dependent DPFs of a preceding context DPF vector, a current DPF vector, and a following context DPF vector, maps the LFs to DPFs. The Karhunen-Loeve transform (KLT) is then applied to orthogonalize each DPF vector in the context-dependent DPFs, using orthogonal bases calculated from a DPF vector that represents 38 Japanese phonemes. Finally, the orthogonalized DPF vectors are decorrelated from one another using the Gram-Schmidt orthogonalization procedure. In experiments, after evaluating the parameters of the MLN input and output units in the DPF extractor, the orthogonalized DPFs are compared with the original DPFs. The orthogonalized DPFs are then evaluated in comparison with a standard parameter set of MFCCs and dynamic features. Next, noise robustness is tested using four types of additive noise. The experimental results show that the use of the proposed orthogonalized DPFs can significantly reduce the error rate in an isolated spoken-word recognition task both with clean speech and with speech contaminated by additive noise. Furthermore, we achieved significant improvements when combining the orthogonalized DPFs with conventional static MFCCs and ΔP.
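As an illustration of the Gram-Schmidt orthogonalization named in this abstract, the following is a minimal sketch of the generic textbook procedure, not the authors' implementation. The function name `gram_schmidt`, the `eps` tolerance, and the toy three-dimensional vectors are assumptions made for the example; the actual DPF vectors and their dimensions are not given here.

```python
import math


def gram_schmidt(vectors, eps=1e-12):
    """Return an orthonormal basis spanning the same space as `vectors`.

    Generic (modified) Gram-Schmidt; `eps` is a hypothetical tolerance used
    to drop vectors that are linearly dependent on earlier ones.
    """
    basis = []
    for v in vectors:
        w = list(v)
        # Remove the component of w along each basis vector found so far.
        for b in basis:
            coeff = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


# Toy input vectors (illustrative only, not DPF data).
toy_vectors = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
toy_basis = gram_schmidt(toy_vectors)
```

After the call, every pair of vectors in `toy_basis` has a dot product of approximately zero and each vector has unit length, which is the sense in which the vectors are decorrelated from one another.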
F₀ Dynamics in Singing: Evidence from the Data of a Baritone Singer
Hiroki MORI Wakana ODAGIRI Hideki KASUYA

PAPER

Vol:
E87-D No:5
Page(s):
1086-1092
Transitional fundamental frequency (F0) characteristics comprise a crucial part of F0 dynamics in singing. This paper examines the F0 characteristics during the note transition period. An analysis of the singing voice of a professional baritone strongly suggests that asymmetries exist in the mechanisms used for controlling rising and falling. Specifically, the F0 contour in rising transitions can be modeled as a step response from a critically-damped second-order linear system with fixed average/maximum speed of change, whereas that in falling transitions can be modeled as a step response from an underdamped second-order linear system with fixed transition time. The validity of the model is examined through auditory experiments using synthesized singing voice.
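The two transition shapes contrasted in this abstract can be sketched with the standard unit step response of a second-order linear system. This is a textbook formula, not the authors' fitted model: the natural frequency `omega_n`, the damping ratio `zeta`, the note frequencies, and the use of a linear Hz scale are all assumptions chosen for illustration.

```python
import math


def step_response(t, omega_n, zeta):
    """Unit step response of omega_n^2 / (s^2 + 2*zeta*omega_n*s + omega_n^2)."""
    if t < 0:
        return 0.0
    if zeta == 1.0:
        # Critically damped: approaches 1 with no overshoot.
        return 1.0 - (1.0 + omega_n * t) * math.exp(-omega_n * t)
    if 0.0 <= zeta < 1.0:
        # Underdamped: oscillates around 1 while the envelope decays.
        root = math.sqrt(1.0 - zeta * zeta)
        omega_d = omega_n * root
        decay = math.exp(-zeta * omega_n * t)
        return 1.0 - decay * (math.cos(omega_d * t) + (zeta / root) * math.sin(omega_d * t))
    raise ValueError("this sketch covers only 0 <= zeta <= 1")


def f0_transition(t, f_start, f_end, omega_n, zeta):
    """F0 contour moving from f_start to f_end (hypothetical values in Hz)."""
    return f_start + (f_end - f_start) * step_response(t, omega_n, zeta)


# Hypothetical parameters: 20 rad/s natural frequency, sampled every 5 ms.
times = [i * 0.005 for i in range(200)]
rising = [f0_transition(t, 110.0, 147.0, 20.0, 1.0) for t in times]
falling = [f0_transition(t, 147.0, 110.0, 20.0, 0.5) for t in times]
```

With these values the rising contour approaches the target note without passing it, while the falling contour dips below the target before settling, which is the qualitative asymmetry between the critically damped and underdamped cases.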
Speaker Adaptation Method for Acoustic-to-Articulatory Inversion using an HMM-Based Speech Production Model
Sadao HIROYA Masaaki HONDA

PAPER

Vol:
E87-D No:5
Page(s):
1071-1078
We present a speaker adaptation method that makes it possible to determine articulatory parameters from an unknown speaker's speech spectrum using an HMM (Hidden Markov Model)-based speech production model. The model consists of HMMs of articulatory parameters for each phoneme and an articulatory-to-acoustic mapping that transforms the articulatory parameters into a speech spectrum for each HMM state. The model is statistically constructed by using actual articulatory-acoustic data. In the adaptation method, geometrical differences in the vocal tract as well as the articulatory behavior in the reference model are statistically adjusted to an unknown speaker. First, the articulatory parameters are estimated from an unknown speaker's speech spectrum using the reference model. Second, the articulatory-to-acoustic mapping is adjusted by maximizing the output probability of the acoustic parameters for the estimated articulatory parameters of the unknown speaker. With the adaptation method, the RMS error between the estimated articulatory parameters and the observed ones is 1.65 mm. The improvement rate over the speaker-independent model is 56.1%.
Negation as Failure through a Network
Kazunori IRIYA Susumu YAMASAKI

PAPER-Computation and Computational Models

Vol:
E87-D No:5
Page(s):
1200-1207
This paper deals with distributed procedures, caused by negation as failure through a network, where general logic programs are distributed so that they communicate with each other in terms of negation as failure inquiries and responses, but not in terms of derivations of SLD resolutions. Common variables shared as channels among the distributed programs are not treated; instead, negation as failure validated in the whole network is the object of communication among the distributed programs. We can define the semantics for the distributed programs in a network. At the same time, we have distributed proof procedures for distributed programs, by means of negation as failure implemented through the network, where the soundness of the procedure is guaranteed by the defined semantics.
Wavelet Coding of Structured Geometry Data on Triangular Lattice Plane Considering Rate-Distortion Properties
Hiroyuki KANEKO Koichi FUKUDA Akira KAWANAKA

PAPER-Image Processing and Video Processing

Vol:
E87-D No:5
Page(s):
1238-1246
Efficient representations of a 3-D object shape and its texture data have attracted wide attention for the transmission of computer graphics data and for the development of multi-view real image rendering systems on computer networks. Polygonal mesh data, which consist of connectivity information, geometry data, and texture data, are often used for representing 3-D objects in many applications. This paper presents a wavelet coding technique for coding the geometry data structured on a triangular lattice plane obtained by structuring the connectivity of the polygonal mesh data. Since the structured geometry data have an arbitrarily-shaped support on the triangular lattice plane, a shape-adaptive wavelet transform was used to obtain the wavelet coefficients, whose number is identical to the number of original data, while preserving the self-similarity of the wavelet coefficients across subbands. In addition, the wavelet coding technique includes extensions of the zerotree entropy (ZTE) coding for taking into account the rate-distortion properties of the structured geometry data. The parent-children dependencies are defined as the set of wavelet coefficients from different bands that represent the same spatial region in the triangular lattice plane, and the wavelet coefficients in the spatial tree are optimally pruned based on the rate-distortion properties of the geometry data. Experiments in which the proposed wavelet coding was applied to several sets of polygonal mesh data showed that it achieved better coding efficiency than the Topologically Assisted Geometry Compression scheme adopted in the MPEG-4 standard.
Robust Speaker Identification System Based on Multilayer Eigen-Codebook Vector Quantization
Ching-Tang HSIEH Eugene LAI Wan-Chen CHEN

PAPER

Vol:
E87-D No:5
Page(s):
1185-1193
This paper presents some effective methods for improving the performance of a speaker identification system. Based on the multiresolution property of the wavelet transform, the input speech signal is decomposed into various frequency subbands in order not to spread noise distortions over the entire feature space. For capturing the characteristics of the vocal tract, the linear predictive cepstral coefficients (LPCC) of the lower frequency subband for each decomposition process are calculated. In addition, a hard threshold technique for the lower frequency subband in each decomposition process is also applied to eliminate the effect of noise interference. Furthermore, cepstral domain feature vector normalization is applied to all computed features in order to provide similar parameter statistics in all acoustic environments. In order to effectively utilize all these multiband speech features, we propose a modified vector quantization as the identifier. This model uses the multilayer concept to eliminate the interference among the multiband speech features and then uses the principal component analysis (PCA) method to evaluate the codebooks for capturing a more detailed distribution of the speaker's phoneme characteristics. The proposed method is evaluated using the KING speech database for text-independent speaker identification. Experimental results show that the recognition performance of the proposed method is better than those of the vector quantization (VQ) and the Gaussian mixture model (GMM) using full-band LPCC and mel-frequency cepstral coefficients (MFCC) features in both clean and noisy environments. Also, a satisfactory performance can be achieved in low SNR environments.
Phoneme-Balanced and Digit-Sequence-Preserving Connected Digit Patterns for Text-Prompted Speaker Verification
Tsuneo KATO Tohru SHIMIZU

PAPER

Vol:
E87-D No:5
Page(s):
1194-1199
This paper presents a novel design of connected digit patterns to achieve high-accuracy text-prompted speaker verification over a cellular phone network. To reduce the error rate, a phoneme-balanced connected digit pattern for enrollment and digit-sequence-preserving connected digit patterns for verification (i.e., patterns preserving partial digit sequences of the enrollment pattern) are proposed. In addition, a decision procedure using multiple patterns has been designed to overcome the low quality of cellular phone speech. Experimental results on cellular phone speech showed that the phoneme-balanced patterns for enrollment and digit-sequence-preserving patterns for verification reduced the equal error rate by more than 50% compared to the conventional method using randomly-selected and randomly-reordered digit patterns. The decision procedure reduced the error rate by 60%. In addition, this paper shows that verification patterns depending on the pattern of a preceding utterance reduced the error rate by 10%. Overall, the error rate obtained by the proposed method was 1% for 99% of clients and 95% of impostors.
A Priority-Based QoS Routing for Multimedia Traffic in Ad Hoc Wireless Networks with Directional Antenna Using a Zone-Reservation Protocol
Tetsuro UEDA Shinsuke TANAKA Siuli ROY Dola SAHA Somprakash BANDYOPADHYAY

PAPER-Ad-hoc Network

Vol:
E87-B No:5
Page(s):
1085-1094
Quality of Service (QoS) provisioning is a new but challenging research area in the field of Mobile Ad hoc Networks (MANETs) to support multimedia data communication. However, the existing QoS routing protocols in ad hoc networks do not consider a major aspect of the wireless environment, i.e., mutual interference. Interference between nodes belonging to two or more routes within the proximity of one another causes Route Coupling. This can be avoided by using zone-disjoint routes. Two routes are said to be zone-disjoint if data communication over one path does not interfere with the data communication along the other path. In this paper, we propose a scheme for supporting priority-based QoS in MANETs by classifying the traffic flows in the network into different priority classes and giving different treatment to the flows belonging to different classes during routing, so that the high priority flows will achieve the best possible throughput. Our objective is to reduce the effect of coupling between routes used by high and low priority traffic by reserving a zone of communication. The part of the network used for high priority data communication, i.e., the high priority zone, will be avoided by low priority data through the selection of a different route that is maximally zone-disjoint with respect to high priority zones and which consequently allows contention-free transmission of high priority traffic. The suggested protocol selects the shortest path for high priority traffic and diverse routes for low priority traffic that will minimally interfere with high priority flows, thus reducing the effect of coupling between high and low priority routes. This adaptive, priority-based routing protocol is implemented on the Qualnet Simulator using directional antennas to demonstrate the effectiveness of our proposal. The use of directional antennas in our protocol largely reduces the probability of radio interference between communicating hosts compared to omni-directional antennas and improves the overall utilization of the wireless medium in the context of ad hoc wireless networks through Space Division Multiple Access (SDMA).
Energy Consumption Tradeoffs for Compressed Wireless Data at a Mobile Terminal
Jari VEIJALAINEN Eetu OJANEN Mohammad Aminul HAQ Ville-Pekka VAHTEALA Mitsuji MATSUMOTO

PAPER-Mobile Radio

Vol:
E87-B No:5
Page(s):
1123-1130
High-end telecom terminals and PDAs, sometimes called Personal Trusted Devices (PTDs), are programmable and have tens of megabytes of memory and rather fast processors. In this paper we analyze when it is energy-efficient to transfer application data compressed over the downlink and then decompress it at the terminal, or to compress it first at the terminal and then send it compressed over the uplink. These questions are meaningful in the context of usual application code or data and streams that are stored before presentation and require lossless compression methods to be used. We deduce an analytical model and assess the model parameters based on experiments in 2G (GSM) and 3G (FOMA) networks. The results indicate that if the reduction through compression in size of the file to be downloaded is higher than ten percent, energy is saved as compared to receiving the file uncompressed. For the upload case even a two percent reduction in size is enough for energy savings at the terminal with the current transmission speeds and observed energy parameters. If time is saved using compressed files during transmission, then energy is certainly saved. From energy savings at the terminal we cannot deduce time savings, however. Energy and time consumed at the server for compression/decompression are considered negligible in this context and ignored. The same holds for the base stations and other fixed telecom infrastructure components.
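The download tradeoff described in this abstract can be made concrete with a deliberately simple linear energy model. This is an assumed toy model, not the analytical model derived in the paper: the function names, the per-byte energy costs, and the numeric values below are all hypothetical and are not measurements from the GSM or FOMA experiments.

```python
def download_energy_saved(size_bytes, reduction, rx_joules_per_byte,
                          decompress_joules_per_byte):
    """Energy saved at the terminal (J) by downloading a compressed file.

    Assumed toy model: receive cost is proportional to the bytes received,
    decompression cost is proportional to the original size. A positive
    result means compression wins.
    """
    compressed_size = size_bytes * (1.0 - reduction)
    energy_plain = size_bytes * rx_joules_per_byte
    energy_compressed = (compressed_size * rx_joules_per_byte
                         + size_bytes * decompress_joules_per_byte)
    return energy_plain - energy_compressed


def break_even_reduction(rx_joules_per_byte, decompress_joules_per_byte):
    """Smallest size reduction (as a fraction) at which compression pays off."""
    return decompress_joules_per_byte / rx_joules_per_byte


# Hypothetical cost parameters, chosen only to exercise the formulas.
RX_COST = 2e-6      # joules per byte received
DECOMP_COST = 1e-7  # joules per byte decompressed

threshold = break_even_reduction(RX_COST, DECOMP_COST)
saved_big = download_energy_saved(1_000_000, 0.20, RX_COST, DECOMP_COST)
saved_small = download_energy_saved(1_000_000, 0.02, RX_COST, DECOMP_COST)
```

Under this model the saving is `size_bytes * (reduction * rx_cost - decompress_cost)`, so the break-even reduction depends only on the ratio of the two per-byte costs and not on the file size.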
Improved HMM Separation for Distant-Talking Speech Recognition
Tetsuya TAKIGUCHI Masafumi NISHIMURA

PAPER

Vol:
E87-D No:5
Page(s):
1127-1137
In distant-talking speech recognition, the recognition accuracy is seriously degraded by reverberation and environmental noise. A robust speech recognition technique for such environments, HMM separation and composition, has been described in earlier work. HMM separation estimates the model parameters of the acoustic transfer function using adaptation data uttered from an unknown position in noisy and reverberant environments, and HMM composition builds an HMM of noisy and reverberant speech, using the acoustic transfer function estimated by HMM separation. Previously, HMM separation has been applied to the acoustic transfer function based on a single Gaussian distribution. However, the improvement was smaller than expected for impulse responses with long reverberations. This is because the variance of the acoustic transfer function in each frame increases, since the length of the impulse response of the room reverberation is longer than that of the spectral analysis window. In this paper, HMM separation is extended to estimate the acoustic transfer function based on Gaussian mixture components in order to compensate for the greater variability of the acoustic transfer function, and the re-estimation formulae are derived. In addition, this paper introduces a technique to adapt the noise weight for each mel-spaced frequency in order to improve the performance of HMM separation in the linear-spectral domain, since the use of HMM separation in the linear-spectral domain sometimes causes a negative mean output due to the subtraction operation. The extended HMM separation is evaluated on distant-talking speech recognition tasks. The results of the experiments clarify the effectiveness of the proposed method.
Fundamental Properties of M-Convex and L-Convex Functions in Continuous Variables
Kazuo MUROTA Akiyoshi SHIOURA

PAPER

Vol:
E87-A No:5
Page(s):
1042-1052
The concepts of M-convexity and L-convexity, introduced by Murota (1996, 1998) for functions on the integer lattice, extract combinatorial structures in well-solved nonlinear combinatorial optimization problems. These concepts are extended to polyhedral convex functions and quadratic functions on the real space by Murota-Shioura (2000, 2001). In this paper, we consider a further extension to general convex functions. The main aim of this paper is to provide rigorous proofs for fundamental properties of general M-convex and L-convex functions.
Automatic Extraction of Tone Command Parameters for the Model of F₀ Contour Generation for Standard Chinese
Wentao GU Keikichi HIROSE Hiroya FUJISAKI

PAPER

Vol:
E87-D No:5
Page(s):
1079-1085
The model for the process of F0 contour generation, first proposed by Fujisaki and his coworkers, has been successfully applied to Standard Chinese, which is a typical tone language with the distinct feature that both positive and negative tone commands are required. However, the inverse problem, viz., automatic derivation of the model parameters from an observed F0 contour of speech, cannot be solved analytically. Moreover, the extraction of model parameters for Standard Chinese is more difficult than for Japanese and English, because the polarity of tone commands cannot be inferred directly from the F0 contour itself. In this paper, an efficient method is proposed to solve the problem by using information on syllable timing and tone labels. With the same framework as for the successive approximation method proposed for Japanese and English, the method presented here for Standard Chinese is focused on the first-order estimation of tone command parameters. A set of intra-syllable and inter-syllable rules is constructed to recognize the tone command patterns within each syllable. The experiment shows that the method works effectively and gives results comparable to those obtained by manual analysis.
Single Electron Random Number Generator
Hisanao AKIMA Shigeo SATO Koji NAKAJIMA

LETTER-Electronic Circuits

Vol:
E87-C No:5
Page(s):
832-834
A random number generator composed of single electron devices is presented. Owing to the stochastic behavior of the electron tunneling process, single electron devices have intrinsic randomness. Using this randomness, a true random number generator can be implemented. Although fluctuation of device parameters degrades the performance of the proposed circuit, we show that adjusting the bias voltages can compensate for the fluctuation.
Adaptive Wireless Transmission Scheme Considering Stay Time in Spot Mobile Access
Yuki MINODA Katsutoshi TSUKAMOTO Shozo KOMAKI

PAPER-Wireless LAN

Vol:
E87-B No:5
Page(s):
1235-1241
In this paper, an adaptive transmission scheme considering the stay time in a spot mobile access system is proposed. The proposed adaptive transmission scheme selects the modulation format according to the user's stay time in the spot communication zone and the types of data requested by each user. In the proposed system, when the stay time of a user is short, high-speed modulation is selected for this user. When the stay time of a user is long, a more reliable modulation format is selected. Computer simulation results show that, when the carrier-to-noise ratio changes rapidly, the proposed transmission scheme can achieve the same or better performance than a fixed modulation format, without any channel estimation.
The Role of Arbiters for Unconditionally Secure Authentication
Goichiro HANAOKA Junji SHIKATA Yumiko HANAOKA Hideki IMAI

LETTER

Vol:
E87-A No:5
Page(s):
1132-1140
Authentication codes (A-codes, for short) are considered important building blocks for constructing unconditionally secure authentication schemes. Since in conventional A-codes the two communicating parties, transmitter and receiver, utilize a common secret key, such A-codes do not provide non-repudiation. With the aim of adding the non-repudiation property, Simmons introduced A2-codes. Later, Johansson formally defined an improved version of A2-codes called A3-codes. Unlike A2-codes, A3-codes do not require an arbiter to be fully trusted. In this paper, we clarify the security definition of A3-codes, which may be misdefined. We show a concrete attack against an A3-code and conclude that concrete constructions of A3-codes implicitly assume a trusted arbiter. We also show that there is no significant difference between A2-codes and A3-codes in a practical sense and further argue that it is impossible to construct an "ideal" A3-code, that is, one without any trusted arbiter. Finally, we introduce a novel model of asymmetric A-codes with an arbiter that does not have to be fully trusted, and also show a concrete construction of asymmetric A-codes for the model. Since our proposed A-code does not require fully trusted arbiters, it is more secure than A2-codes or A3-codes.
Performance of QPSK/OFDM on Frequency-Selective Rayleigh Fading Channels
Jeong-Woo JWA

LETTER-Wireless Communication Technology

Vol:
E87-B No:5
Page(s):
1407-1411
In this paper, we derive expressions for the bit error probability of QPSK/OFDM on frequency-selective Rayleigh fading channels. In the OFDM system, ICI (interchannel interference) caused by the Doppler spread of the channel degrades the error performance of the system and introduces an error floor even for coherent detection. Analysis results show that the error performance of QPSK/OFDM degrades as the normalized maximum Doppler frequency fD/Bsub increases, where fD is the maximum Doppler frequency and Bsub is the subchannel bandwidth. Computer simulations confirm the theoretical analysis results for BPSK and QPSK signals.
Traceability Schemes against Illegal Distribution of Signed Documents
Shoko YONEZAWA Goichiro HANAOKA Junji SHIKATA Hideki IMAI

LETTER

Vol:
E87-A No:5
Page(s):
1172-1182
Illegal distribution of signed documents can be considered one of the serious problems of digital signatures. In this paper, to solve the problem, we propose three protocols concerning signature schemes. These schemes achieve not only traceability of an illegal user but also universal verifiability. The first scheme is a basic scheme which can trace an illegal receiver, and the generation and tracing of a signed document are simple and efficient. However, in this scheme, it is assumed that a signer is honest. The second scheme gives another tracing method which does not always assume that a signer is honest. Furthermore, in this method, an illegal user can be traced by an authority itself; hence, it is efficient in terms of communication costs. However, in this scheme it is assumed that only a legal verification algorithm exists. Thus, in general, this scheme cannot trace a modified signed document which is accepted by a modified verification algorithm. The third is a scheme which requires no trusted signer and allows a modified verification algorithm. It can trace an illegal receiver or even a signer in such a situation. All of our schemes are constructed by simple combinations of standard signature schemes; consequently, one can flexibly choose suitable building blocks to satisfy the requirements of a system.
On the Performance of Multiuser Diversity under Explicit Quality of Service Constraints over Fading Channels
Shiping DUAN Youyun XU Wentao SONG

PAPER-Wireless Communication Technology

Vol:
E87-B No:5
Page(s):
1290-1296
Multiuser diversity, identified by recent information theoretic results, is a form of diversity inherent in a wireless network. The diversity gain is obtained from independent time-varying fading channels across different users. The main practical issue in multiuser diversity is the lack of Quality of Service (QoS) guarantees. This study proposes a wireless scheduling algorithm named MUDSEQ for downlink channels exploiting multiuser diversity under explicit QoS constraints. The numerical results demonstrate that the novel algorithm can yield non-negligible diversity gain even under tight QoS constraints and in environments with little scattering or slow fading. Additionally, a system framework for dynamic resource allocation based on the proposed algorithm is developed.
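The tension between multiuser diversity and QoS guarantees can be illustrated with a generic opportunistic scheduler that serves the user with the best channel unless some user has waited too long. This is not MUDSEQ, whose details are not given in the abstract; the function `pick_user`, the waiting-time deadline `max_wait`, and the sample gains are hypothetical and stand in for whatever QoS constraint a real scheduler would enforce.

```python
def pick_user(channel_gains, slots_waited, max_wait):
    """Choose which user to serve in one downlink slot.

    Generic sketch: users whose waiting time has reached the (hypothetical)
    deadline `max_wait` take precedence; among the candidates, the one with
    the strongest instantaneous channel is served.
    """
    urgent = [u for u, waited in enumerate(slots_waited) if waited >= max_wait]
    candidates = urgent if urgent else range(len(channel_gains))
    return max(candidates, key=lambda u: channel_gains[u])


# Illustrative channel gains for three users in one slot.
gains = [0.2, 0.9, 0.5]
no_deadline_pressure = pick_user(gains, [0, 0, 0], max_wait=3)
one_user_overdue = pick_user(gains, [3, 0, 0], max_wait=3)
two_users_overdue = pick_user(gains, [3, 0, 4], max_wait=3)
```

When no deadline is pressing, the scheduler picks the strongest channel and captures the diversity gain; once a deadline binds, it gives up some of that gain to serve the overdue user, which is the tradeoff the abstract refers to.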
What are the Essential Cues for Understanding Spoken Language?
Steven GREENBERG Takayuki ARAI

INVITED PAPER

Vol:
E87-D No:5
Page(s):
1059-1070
Classical models of speech recognition assume that a detailed, short-term analysis of the acoustic signal is essential for accurately decoding the speech signal and that this decoding process is rooted in the phonetic segment. This paper presents an alternative view, one in which the time scales required to accurately describe and model spoken language are both shorter and longer than the phonetic segment, and are inherently wedded to the syllable. The syllable reflects a singular property of the acoustic signal -- the modulation spectrum -- which provides a principled, quantitative framework to describe the process by which the listener proceeds from sound to meaning. The ability to understand spoken language (i.e., intelligibility) vitally depends on the integrity of the modulation spectrum within the core range of the syllable (3-10 Hz) and reflects the variation in syllable emphasis associated with the concept of prosodic prominence ("accent"). A model of spoken language is described in which the prosodic properties of the speech signal are embedded in the temporal dynamics associated with the syllable, a unit serving as the organizational interface among the various tiers of linguistic representation.
