We propose Optimal Temporal Decomposition (OTD) of speech for voice morphing that preserves the Δ cepstrum. OTD is an optimal modification of the original Temporal Decomposition (TD) by B. Atal. It is theoretically shown that OTD can achieve minimal spectral distortion for the TD-based approximation of time-varying LPC parameters. Moreover, by applying OTD to preserving the Δ cepstrum, it is also theoretically shown that the Δ cepstrum of a target speaker can be reflected to that of a source speaker. In frequency domain interpolation, the Laplacian Spectral Distortion (LSD) measure is introduced to improve the Inverse Function of Integrated Spectrum (IFIS) based non-uniform frequency warping. Experimental results indicate that the Δ cepstrum of the OTD-based morphing spectra of a source speaker is mostly equal to that of a target speaker except for a piecewise constant factor, and subjective listening tests show that the speech intelligibility of the proposed morphing method is superior to that of the conventional method.
Peng ZHAO Atsusi HIGASHI Yukio SATO
This paper deals with on-line signature verification. A signature is obtained as a sequence of x, y-coordinates of pen-tip movement and writing pressure. The features of a signature are derived from the coordinates and the writing pressure and are decomposed into two principal features, shape and motion, using the DP-matching technique. We found that each point of a signature varies to some degree from one signing to the next. However, the degree of local variation depends on the point: some points are relatively stable and vary little, while others are not. In this paper, we propose to incorporate weighted local variations based on the stability of each point so as to evaluate the difference between two signatures locally as well as globally. The dissimilarity measures are presented with respect to the corresponding features and are combined into one for efficient verification. In addition to the x, y-coordinates, the writing pressure is also considered to be part of shape. Experiments were carried out with a database consisting of 300 genuine signatures and 300 forgeries collected from 10 subjects. The experimental results show the effectiveness of incorporating the weighted local variation: the correct verification rate increased by 1.0% on average, reaching 98.7%.
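The idea of weighting local differences during DP matching can be illustrated with a minimal sketch. It is not the authors' method: the function name `dp_match`, the 1-D feature sequences, the absolute-difference local distance, and the way the weights are supplied are all assumptions made for illustration. A lower weight stands for a reference point that is known to be unstable, so deviations there count less.

```python
from typing import List, Optional, Sequence


def dp_match(ref: Sequence[float],
             test: Sequence[float],
             weights: Optional[Sequence[float]] = None) -> float:
    """Accumulated DP-matching (DTW-style) cost between two 1-D sequences.

    weights[i] scales the local distance at reference point i
    (hypothetical scheme: small weight = unstable point).
    """
    n, m = len(ref), len(test)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    if weights is None:
        weights = [1.0] * n
    if len(weights) != n:
        raise ValueError("one weight per reference point is required")

    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning ref[:i] with test[:j]
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = weights[i - 1] * abs(ref[i - 1] - test[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # ref advances
                                     cost[i][j - 1],      # test advances
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


reference = [1.0, 2.0, 3.0]
sample = [1.0, 5.0, 3.0]
uniform = dp_match(reference, sample)
downweighted = dp_match(reference, sample, weights=[1.0, 0.5, 1.0])
```

In this toy case the middle reference point is treated as unstable, so the same deviation yields a smaller dissimilarity than with uniform weights.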
Eun Joo RHEE Tae Kyun KIM Masayuki NAKAJIMA
This paper presents a system for recognition of on-line cursive Hangul (Korean characters) by means of DP matching of structural information. The penalty function has the following special features. In order to prevent short spurious strokes from causing large penalties, an input stroke is weighted by its length relative to other input strokes. In order to make use of pen-up and pen-down information, a penalty is incurred when two strokes of differing types (i.e. pen-up with pen-down) are matched. Finally, to reduce the chance of obtaining a suboptimal solution, which can result from using the greedy algorithm in DP matching, we look ahead one extra match. In a computer simulation we obtained a recognition rate of 92% for partially cursive characters and 89% for fully cursive characters. Furthermore, for both cases combined, the correct character appears among the top 10 candidates 98% of the time. Thus we confirmed that the proposed algorithm is effective in recognizing cursive Hangul.
Yoshiaki SAITOH Yasushi HASEGAWA Tohru KIRYU Jun'ichi HORI
We use the B-spline function and apply the Oslo algorithm to minimize the number of control points in electrocardiogram (ECG) waveform compression, subject to limits on the evaluation indexes. This method is based on dynamic programming matching to transfer the control points of a reference ECG waveform to the succeeding ECG waveforms. This reduces the execution time for beat-to-beat processing. We also reduced the processing time at several compression stages. When the percent normalized root-mean-square difference is around 10, our method gives the highest compression ratio at a sampling frequency of 250 Hz.
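The abstract does not define its distortion index, so the following is only a sketch under a stated assumption: it uses one common definition of the percent root-mean-square difference in which the mean of the original waveform is removed before normalizing. The function name `prd_normalized` is hypothetical.

```python
import math
from typing import Sequence


def prd_normalized(original: Sequence[float],
                   reconstructed: Sequence[float]) -> float:
    """Percent RMS difference, normalized by the mean-removed signal energy.

    Assumed definition (not taken from the abstract):
        100 * sqrt( sum((x - y)^2) / sum((x - mean(x))^2) )
    """
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    mean = sum(original) / len(original)
    error_energy = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    signal_energy = sum((x - mean) ** 2 for x in original)
    if signal_energy == 0.0:
        raise ValueError("original signal is constant; index is undefined")
    return 100.0 * math.sqrt(error_energy / signal_energy)


# Toy samples, not real ECG data.
score = prd_normalized([1.0, 3.0], [1.0, 2.0])
```

Under this definition a value near 10 means the reconstruction error carries roughly 1% of the energy of the mean-removed original.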
Hiroyoshi MORITA Kingo KOBAYASHI
A method for the compression of ECG data is presented. The method is based on the edit distance algorithm developed for file comparison problems. The edit distance between two sequences of symbols is defined as the number of edit operations required to transform one sequence into the other. We adopt the edit distance algorithm to obtain a list of edit operations, called an edit script, which transforms a reference pulse into a pulse selected from the ECG data. If the decoder knows the same reference, it can reproduce the original pulse from the edit script alone. The edit script is expected to be smaller than the original pulse when the two pulses look alike, and thereby we can reduce the amount of space needed to store the data. Applying the proposed scheme to raw ECG data, we have achieved a high compression ratio of about 14:1 without losing the significant features of the signals.
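The encode/decode idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes pulses are already quantized to integer symbols, uses unit-cost insert, delete, and substitute operations, and the names `edit_script` and `apply_script` are hypothetical.

```python
from typing import List, Sequence, Tuple

Op = Tuple[str, int, int]  # (kind, position, symbol); symbol unused for "del"


def edit_script(ref: Sequence[int], tgt: Sequence[int]) -> List[Op]:
    """Return a minimal edit script that turns ref into tgt.

    Operations are listed from the end of ref towards its start, so they
    can be applied in order without any index adjustment.
    """
    n, m = len(ref), len(tgt)
    # dist[i][j] = edit distance between ref[:i] and tgt[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == tgt[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # delete
                             dist[i][j - 1] + 1,        # insert
                             dist[i - 1][j - 1] + sub)  # match / substitute

    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:  # trace back one optimal path
        if i > 0 and j > 0 and ref[i - 1] == tgt[j - 1] \
                and dist[i][j] == dist[i - 1][j - 1]:
            i, j = i - 1, j - 1                      # symbols match, no op
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append(("sub", i - 1, tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("del", i - 1, 0))
            i -= 1
        else:
            ops.append(("ins", i, tgt[j - 1]))
            j -= 1
    return ops


def apply_script(ref: Sequence[int], ops: Sequence[Op]) -> List[int]:
    """Decoder side: rebuild the target pulse from the shared reference."""
    out = list(ref)
    for kind, pos, symbol in ops:
        if kind == "sub":
            out[pos] = symbol
        elif kind == "del":
            del out[pos]
        else:
            out.insert(pos, symbol)
    return out


reference_pulse = [0, 1, 3, 7, 3, 1, 0]   # toy symbols, not real ECG data
current_pulse = [0, 1, 4, 7, 3, 0]
script = edit_script(reference_pulse, current_pulse)
decoded = apply_script(reference_pulse, script)
```

When the two pulses are similar, the script holds only a few operations, which is the source of the storage saving the abstract describes.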
Shigeki OKAWA Takashi ENDO Tetsunori KOBAYASHI Katsuhiko SHIRAI
In this paper, a new scheme for phrase recognition in conversational speech is proposed, in which prosodic and phonemic information processing are usefully combined. This approach is employed both to produce candidates for phrase boundaries and to discriminate phonemes. The fundamental frequency patterns of continuous utterances are statistically analyzed and the likelihood of the occurrence of a phrase boundary is calculated for every frame. At the same time, the likelihood of the phonemic characteristics of each frame can be obtained using a hierarchical clustering method. These two scores, along with lexical and grammatical constraints, can be effectively utilized to develop possible word sequences or word lattices which correspond to the continuous speech utterances. Our preliminary experiment shows the feasibility of applying prosody to continuous speech recognition, especially for conversational-style utterances.