Takashi NOSE Junichi YAMAGISHI Takashi MASUKO Takao KOBAYASHI
This paper describes a technique for controlling the degree of expressivity of a desired emotional expression and/or speaking style of synthesized speech in an HMM-based speech synthesis framework. With this technique, multiple emotional expressions and speaking styles of speech are modeled in a single model by using a multiple-regression hidden semi-Markov model (MRHSMM). A set of control parameters, called the style vector, is defined, and each speech synthesis unit is modeled by using the MRHSMM, in which mean parameters of the state output and duration distributions are expressed by multiple-regression of the style vector. In the synthesis stage, the mean parameters of the synthesis units are modified by transforming an arbitrarily given style vector that corresponds to a point in a low-dimensional space, called style space, each of whose coordinates represents a certain specific speaking style or emotion of speech. The results of subjective evaluation tests show that style and its intensity can be controlled by changing the style vector.
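The multiple-regression idea above can be illustrated with a minimal sketch. It assumes that each state mean is a linear function of the style vector with a bias term, mu = H [1, v]; the matrix `H`, the function name `regress_mean`, and all numbers are hypothetical and are not taken from the paper.

```python
def regress_mean(H, style_vector):
    """Return mu = H @ [1, v_1, ..., v_L] for one state.

    H is an M x (L+1) regression matrix given as a list of rows. The
    leading 1 supplies a bias term (an assumption of this sketch).
    """
    xi = [1.0] + list(style_vector)
    if any(len(row) != len(xi) for row in H):
        raise ValueError("each row of H needs len(style_vector) + 1 entries")
    return [sum(h * x for h, x in zip(row, xi)) for row in H]


# Two-dimensional feature and two-dimensional style space (made-up numbers).
H = [[1.0, 0.5, -0.2],
     [0.0, 1.0, 2.0]]
neutral = regress_mean(H, [0.0, 0.0])     # origin of the style space
style_a = regress_mean(H, [1.0, 0.0])     # unit intensity along axis 1
emphasized = regress_mean(H, [1.5, 0.0])  # intensity pushed beyond the unit point
```

Moving the style vector along one axis changes the mean linearly, which is how a single model could interpolate or exaggerate a style at synthesis time.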
Hiroshi HOSOBE Ken SATOH Philippe CODOGNET
In this paper, we extend our framework of speculative computation in multi-agent systems by introducing default constraints. In research on multi-agent systems, handling incomplete information due to communication failure or due to other agents' delay in communication is a very important issue. For a solution to this problem, we previously proposed speculative computation based on abduction in the context of master-slave multi-agent systems and gave a procedure in abductive logic programming. In our previous proposal, a master agent prepares a default value for a yes/no question in advance, and it performs speculative computation using the default without waiting for a reply to the question. This computation is effective unless the contradictory reply to the default is returned. In this paper, we formalize speculative constraint processing, and propose a correct operational model for such computation so that we can handle not only yes/no questions, but also more general types of questions.
Iakovos OURANOS Petros STEFANEAS Panayiotis FRANGOS
We present MobileOBJ, a formal framework for specifying and verifying mobile systems. Based on hidden algebra, the components of a mobile system are specified as behavioral objects or Observational Transition Systems, a kind of transition system, enriched with special action and observation operators related to the distinct characteristics of mobile computing systems. The whole system comes up as the concurrent composition of these components. The implementation of the abstract model is achieved using CafeOBJ, an executable, industrial strength algebraic specification language. The visualization of the specification can be done using CafeOBJ graphical notation. In addition, invariant and behavioral properties of mobile systems can be proved through theorem proving techniques, such as structural induction and coinduction that are fully supported by the CafeOBJ system. The application of the proposed framework is presented through the modeling of a mobile computing environment and the services that need to be supported by the former.
Takayuki YAMADA Ryoichi SHINKUMA Tatsuro TAKAHASHI
In conventional road-vehicle communication systems, user terminals in the vehicles have to connect directly to wireless access points (APs). However, vehicle speeds are so high that the channel condition between the terminals and the APs constantly changes because of changing path loss and time-varying fading. In this paper, to compensate for such deterioration, we propose to reduce the relative speed between the terminals and the APs by an inter-vehicle packet relay technique. If a terminal can send data via other vehicles running at lower speeds, the relative speed will decrease, which suppresses the dynamic range of path loss and deterioration by fading. We first validate our method by a numerical analysis using a statistical path-loss model. The numerical analysis verifies that our method is able to suppress deterioration caused by path loss and time-varying fading. However, the numerical analysis considers neither the geometric propagation of paths nor instantaneous and rapid loss changes. Therefore, we also evaluate our method by computer simulations using a geometric propagation model. In the simulations, the phase difference between multiple paths and the loss fluctuation within one frame duration affect the performance. The simulation results validate our method. Furthermore, we investigate the combination of our method and the selection diversity technique, which can suppress channel fluctuation and may enhance the performance of our method. Moreover, we measure interference in the overlapped zone between two AP areas. The measurement shows that our packet relays do not cause an interference problem between areas.
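One way to see why a lower relative speed slows fading is the standard maximum Doppler shift, f_d = v f_c / c. The sketch below only illustrates that textbook relation; the 5 GHz carrier and the two speeds are assumed values, not parameters from the paper.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def max_doppler_hz(relative_speed_mps, carrier_hz):
    """Maximum Doppler shift f_d = v * f_c / c for a given relative speed."""
    return relative_speed_mps * carrier_hz / SPEED_OF_LIGHT


CARRIER_HZ = 5.0e9  # assumed carrier frequency, for illustration only

# Terminal talking directly to a roadside AP at 100 km/h (assumed speed).
direct = max_doppler_hz(100.0 / 3.6, CARRIER_HZ)
# Same terminal relaying via a vehicle that is 20 km/h slower (assumed speed).
relayed = max_doppler_hz(20.0 / 3.6, CARRIER_HZ)
```

Because the Doppler shift is proportional to relative speed, the relayed hop in this example fades five times more slowly than the direct link.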
The effect of dispersions of medium parameters and structure on the recording performance was systematically investigated. A moderately increased M-H loop slope is effective for obtaining higher thermal stability, smaller saturation fields, and higher resolution. It was found that the most influential factor is the dispersion in anisotropy field, Hk. Small Hk dispersion reduced the noise when exchange coupled media were used. Reduced grain size and a stacked structure of the media were expected to give a restricted gain in the signal to noise ratio.
Yuta TSUKAMOTO Arata KAWAMURA Youji IIGUNI
In this paper, a novel speech enhancement algorithm based on MAP estimation is proposed. The proposed speech enhancer adaptively changes the speech spectral density used in the MAP estimation according to the sum of the observed power spectra. In a speech segment, the speech spectral density approaches a Rayleigh distribution to preserve the quality of the enhanced speech, whereas in a non-speech segment it approaches an exponential distribution to reduce noise effectively. Furthermore, when the noise is super-Gaussian, we modify the width of the Gaussian so that the Gaussian model with the modified width approximates the distribution of the super-Gaussian noise. This technique is effective in suppressing residual noise. Computer experiments confirm the effectiveness of the proposed method.
Takumi SANO Fuminori NAITO Shuhei YOSHIDA Manabu YAMAMOTO
In this paper, we present a computer simulation analysis of high-density hologram recording, which is a promising mass optical memory technique. A simulation method for off-axis speckle-shift multiplexed recording based on three-dimensional computer simulation analysis is presented, as well as the signal evaluation of recording and reproduction. Using this simulation method, the characteristic features of recording and reproduction are studied from the viewpoints of signal-to-noise ratio and reproduced image quality, and a high-density speckle-shift multiplexed recording condition is proposed.
Young Woo LEE Sang Min LEE Yoon Sang JI Jong Shill LEE Young Joon CHEE Sung Hwa HONG Sun I. KIM In Young KIM
Digital hearing aid users often complain of difficulty in understanding speech in the presence of background noise. To improve speech perception in a noisy environment, various speech enhancement algorithms have been applied in digital hearing aids. In this study, a speech enhancement algorithm using modified spectral subtraction and companding is proposed for digital hearing aids. We adjusted the biases of the estimated noise spectrum, based on a subtraction factor, to decrease the residual noise. Companding was applied to the channel of the formant frequency based on the speech presence indicator to enhance the formant. Noise suppression was achieved while retaining weak speech components and avoiding the residual noise phenomena. Objective and subjective evaluation under various environmental conditions confirmed the improvement due to the proposed algorithm. We tested segmental SNR and Log Likelihood Ratio (LLR), which have higher correlation with subjective measures. Segmental SNR has the highest and LLR the lowest correlation of the methods tested. In addition, we confirmed by spectrogram that the proposed method significantly reduced the residual noise and enhanced the formants. A mean opinion score that represented the global perception score was tested; this produced the highest quality speech using the proposed method. The results show that the proposed speech enhancement algorithm is beneficial for hearing aid users in noisy environments.
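For orientation, a generic power spectral subtraction step with an over-subtraction factor and a spectral floor can be sketched as follows. This is the common textbook form, not the modified scheme of the paper; the names `alpha` and `beta` and the sample values are assumptions.

```python
def spectral_subtract(noisy_power, noise_power, alpha=2.0, beta=0.01):
    """Per-bin power spectral subtraction.

    alpha scales the estimated noise power before subtraction, and
    beta * noise_power is a floor that keeps every bin positive and
    limits isolated residual peaks. Both are illustrative defaults.
    """
    enhanced = []
    for y, n in zip(noisy_power, noise_power):
        enhanced.append(max(y - alpha * n, beta * n))
    return enhanced


# Three frequency bins of one frame (made-up power values).
noisy = [10.0, 1.0, 4.0]
noise = [1.0, 1.0, 1.0]
enhanced = spectral_subtract(noisy, noise)
```

A larger subtraction factor removes more residual noise but also more weak speech, which is the trade-off that adjusting the noise-estimate bias is meant to manage.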
Hideyuki FURUHASHI Yoshinobu KAJIKAWA Yasuo NOMURA
In this paper, we propose a low complexity realization method for compensating for nonlinear distortion. Generally, nonlinear distortion is compensated for by a linearization system using a Volterra kernel. However, this method has a problem of requiring a huge computational complexity for the convolution needed between an input signal and the 2nd-order Volterra kernel. The Simplified Volterra Filter (SVF), which removes the lines along the main diagonal of the 2nd-order Volterra kernel, has been previously proposed as a way to reduce the computational complexity while maintaining the compensation performance for the nonlinear distortion. However, this method cannot greatly reduce the computational complexity. Hence, we propose a subband linearization system which consists of a subband parallel cascade realization method for the 2nd-order Volterra kernel and subband linear inverse filter. Experimental results show that this proposed linearization system can produce the same compensation ability as the conventional method while reducing the computational complexity.
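The cost of the 2nd-order kernel mentioned above is easiest to see in a direct implementation. The sketch below is a plain second-order Volterra filter, not the subband or simplified structure discussed in the abstract; the kernel values are made up.

```python
def volterra2(x, h1, h2):
    """Direct second-order Volterra filter.

    y[n] = sum_i h1[i] x[n-i] + sum_i sum_j h2[i][j] x[n-i] x[n-j]
    with zero initial conditions. For memory length N, the quadratic
    term needs on the order of N*N multiplications per output sample.
    """
    memory = len(h1)
    y = []
    for n in range(len(x)):
        # Delay line holding x[n], x[n-1], ..., padded with zeros.
        d = [x[n - i] if n - i >= 0 else 0.0 for i in range(memory)]
        linear = sum(h1[i] * d[i] for i in range(memory))
        quadratic = sum(h2[i][j] * d[i] * d[j]
                        for i in range(memory) for j in range(memory))
        y.append(linear + quadratic)
    return y


# Memory length 2 with a single nonzero quadratic tap (illustrative values).
h1 = [1.0, 0.5]
h2 = [[0.1, 0.0],
      [0.0, 0.0]]
output = volterra2([1.0, 2.0], h1, h2)
```

The nested loop over `i` and `j` is the term whose complexity the reduced structures aim to cut.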
Nguyen Hoang HAI Yoshinori NAMIHIRA Feroza BEGUM Shubi KAIJAGE S.M. Abdur RAZZAK Tatsuya KINJO Nianyu ZOU
This paper reports a novel design of photonic crystal fibers (PCFs) with nearly zero ultra-flattened dispersion characteristics. We describe the chromatic dispersion controllability taking non-uniform air hole structures into consideration. By optimizing the non-uniform air hole structures, ultra-flattened zero dispersion PCFs can be efficiently designed. We show numerically that the proposed non-uniform air cladding structures successfully achieve flat dispersion characteristics as well as extremely low confinement losses. As an example, the proposed PCF exhibits a flattened dispersion of 0.27 ps/(nm·km) from 1.5 µm to 1.8 µm wavelength with confinement losses of less than 10^-11 dB/m. Finally, we point out that full controllability of the chromatic dispersion and confinement losses, along with the fabrication technique, are the main advantages of the proposed PCF structure.
Toru IMAI Shoei SATO Shinichi HOMMA Kazuo ONOE Akio KOBAYASHI
This paper describes a new method that detects speech segments online while identifying gender attributes, for efficient dual gender-dependent speech recognition and broadcast news captioning. The proposed online speech detection performs dual-gender phoneme recognition and detects a start-point and an end-point based on the ratio between the cumulative phoneme likelihood and the cumulative non-speech likelihood, with a very small delay from the audio input. While obtaining the speech segments, the phoneme recognizer also identifies gender attributes with high discrimination in order to guide the subsequent dual-gender continuous speech recognizer efficiently. As soon as the start-point is detected, the continuous speech recognizer with parallel gender-dependent acoustic models starts a search and allows search transitions between male and female within a speech segment based on the gender attributes. Speech recognition experiments on conversational commentaries and field reporting from Japanese broadcast news showed that the proposed speech detection method was effective in reducing the false rejection rate from 4.6% to 0.53%, and also in reducing recognition errors, in comparison with a conventional method using adaptive energy thresholds. It was also effective in identifying the gender attributes, with a correct rate of 99.7% of words. With the new speech detection and the gender identification, the proposed dual-gender speech recognition significantly reduced the word error rate by 11.2% relative to a conventional gender-independent system, while keeping the computational cost feasible for real-time operation.
This paper presents a novel approach to single channel speech enhancement in noisy environments. Widely adopted noise reduction techniques based on spectral subtraction are generally expressed as a spectral gain depending on the signal-to-noise ratio (SNR) [1]-[4]. As an SNR estimation method, the well-known decision-directed (DD) estimator of Ephraim and Malah is known to reduce musical noise efficiently in noise frames, but the a priori SNR, which is a crucial parameter of the spectral gain, follows the a posteriori SNR with a delay of one frame in speech frames [5]. Therefore, the noise suppression gain computed from the delayed a priori SNR estimated by the DD algorithm matches the previous frame rather than the current one, which degrades noise reduction performance during abrupt transient parts. To overcome this artifact, we propose a computationally simple but effective speech enhancement technique based on a sigmoid-type function that adaptively determines the weighting factor of the DD algorithm. The proposed approach avoids the delay problem of the a priori SNR while maintaining the advantage of the DD algorithm. The performance of the proposed enhancement algorithm is evaluated by objective and subjective tests under various environments, and it yields better results than the conventional DD-based approach.
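The decision-directed update and an adaptive weighting factor can be sketched as below. The DD formula is the standard one; the sigmoid mapping in `sigmoid_weight` (its input, limits, slope, and center) is purely a hypothetical stand-in, since the abstract does not give the actual function.

```python
import math


def dd_a_priori_snr(prev_clean_power, noise_power, gamma, alpha=0.98):
    """Decision-directed a priori SNR estimate for one frequency bin.

    xi = alpha * |S_prev|^2 / lambda_d + (1 - alpha) * max(gamma - 1, 0),
    where gamma is the current a posteriori SNR.
    """
    return (alpha * prev_clean_power / noise_power
            + (1.0 - alpha) * max(gamma - 1.0, 0.0))


def sigmoid_weight(gamma, gamma_prev, a_min=0.5, a_max=0.98,
                   slope=1.0, center=3.0):
    """Hypothetical sigmoid-type weighting (an assumption of this sketch).

    A large frame-to-frame change in the a posteriori SNR pushes the
    weight toward a_min, so the estimate leans on the current frame.
    """
    change = abs(gamma - gamma_prev)
    return a_min + (a_max - a_min) / (1.0 + math.exp(slope * (change - center)))


steady_alpha = sigmoid_weight(5.0, 5.0)   # stationary frame
onset_alpha = sigmoid_weight(25.0, 1.0)   # abrupt speech onset

# Same onset frame with a fixed and with an adaptive weighting factor.
xi_fixed = dd_a_priori_snr(0.5, 1.0, 25.0, alpha=0.98)
xi_adaptive = dd_a_priori_snr(0.5, 1.0, 25.0, alpha=onset_alpha)
```

With a fixed weight close to 1 the estimate stays near the previous frame at an onset, whereas the adaptive weight lets it follow the current a posteriori SNR much more closely.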
Keiichi FUNAKI Tatsuhiko KINJO
This paper proposes a novel robust fundamental frequency (F0) estimation algorithm based on complex-valued speech analysis for an analytic speech signal. Since an analytic signal provides spectra only over positive frequencies, spectra can be accurately estimated at low frequencies. Consequently, F0 estimation using the residual signal extracted by complex-valued speech analysis is expected to perform better than F0 estimation using the residual signal extracted by conventional real-valued LPC analysis. In this paper, the autocorrelation function weighted by AMDF is adopted as the F0 estimation criterion, and four signals (speech signal, analytic speech signal, LPC residual, and complex LPC residual) are evaluated for F0 estimation. The speech signals used in the experiments were IRS-filtered speech corrupted by additive white Gaussian noise or pink noise at noise levels of 10, 5, 0, and -5 [dB]. The experimental results demonstrate that the proposed algorithm based on the complex LPC residual can perform better than the other methods in noisy environments.
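The AMDF-weighted autocorrelation criterion can be sketched on a real-valued frame as follows. The sketch assumes the common form ACF(lag) / (AMDF(lag) + k); the constant `k`, the search range, and the test tone are assumptions, and the complex-valued LPC analysis of the paper is not reproduced here.

```python
import math


def amdf_weighted_acf_f0(x, fs, f0_min=60.0, f0_max=400.0, k=1.0):
    """Estimate F0 by maximizing ACF(lag) / (AMDF(lag) + k).

    At the pitch period the autocorrelation peaks while the average
    magnitude difference dips, so the ratio sharpens the peak.
    """
    n = len(x)
    lag_min = max(1, int(fs / f0_max))
    lag_max = min(int(fs / f0_min), n - 1)
    best_lag, best_score = None, -math.inf
    for lag in range(lag_min, lag_max + 1):
        pairs = n - lag
        acf = sum(x[i] * x[i + lag] for i in range(pairs))
        amdf = sum(abs(x[i] - x[i + lag]) for i in range(pairs)) / pairs
        score = acf / (amdf + k)
        if score > best_score:
            best_lag, best_score = lag, score
    return fs / best_lag


# Synthetic 100 Hz tone sampled at 8 kHz, 50 ms frame (illustrative input).
fs = 8000
frame = [math.sin(2.0 * math.pi * 100.0 * i / fs) for i in range(400)]
f0 = amdf_weighted_acf_f0(frame, fs)
```

For this clean tone the best lag is 80 samples, one period of the 100 Hz tone. On noisy speech the same function would be applied to whichever of the four candidate signals is being evaluated.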
Sung-il JUNG Younghun KWON Sung-il YANG
A speech enhancement method is proposed that can be implemented efficiently due to its use of wavelet packet transform. The proposed method uses a modified spectral subtraction with noise estimation by a least-squares line method and with an overweighting gain per subband with nonlinear structure, where the overweighting gain is used for suppressing the residue of musical noise and the subband is used for applying the weighted values according to the change of signals. The enhanced speech by our method has the following properties: 1) the speech intelligibility can be assured reliably; 2) the musical noise can be reduced efficiently. Various assessments confirmed that the performance of the proposed method was better than that of the compared methods in various noise-level conditions. Especially, the proposed method showed good results even at low SNR.
Dan DENG Jin-kang ZHU Ling QIU
An LDCs system with finite-rate error-free feedback is proposed in this letter. The optimal transmission codeword is selected at the receiver and the codeword index is sent to the transmitter. A simple random search algorithm is introduced for codebook generation. Moreover, the max-min singular value criterion is adopted for codeword selection. Simulation results show that, with only 3-4 feedback bits, the low-complexity Zero-Forcing receiver can approach the Maximum-Likelihood (ML) performance.
This paper presents an inversion algorithm for dynamic Bayesian networks towards robust speech recognition, namely DBNI, which is a generalization of hidden Markov model inversion (HMMI). As a dual procedure of expectation maximization (EM)-based model reestimation, DBNI finds the 'uncontaminated' speech by moving the input noisy speech to the Gaussian means under the maximum likelihood (ML) sense given the DBN models trained on clean speech. This algorithm can provide both the expressive advantage from DBN and the noise-removal feature from model inversion. Experiments on the Aurora 2.0 database show that the hidden feature model (a typical DBN for speech recognition) with the DBNI algorithm achieves superior performance in terms of word error rate reduction.
A new level shifter is proposed in this paper that mitigates the contention problem between its pull-up and pull-down switches without suffering a delay penalty. A comparison with two conventional shifters (CLS-1 and CLS-2) indicates that the delay times of CLS-1 and CLS-2 are 308% and 26% longer, respectively, than that of the proposed shifter when VDDL/VDDH=0.3 and the fan-out is 2. In addition, the comparison of power-delay products shows that CLS-2 consumes 28.5% more energy than the proposed shifter. For the layout area, the proposed shifter needs only 15% more than CLS-2. Considering the propagation delay times, the power-delay products, and the area overhead, the proposed shifter is considered very suitable for future Very Deep Sub-Micron (VDSM) technologies with low-voltage applications.
Sildomar Takahashi MONTEIRO Yukio KOSUGI
This paper presents a novel feature extraction algorithm based on particle swarms for processing hyperspectral imagery data. Particle swarm optimization, originally developed for global optimization over continuous spaces, is extended to deal with the problem of feature extraction. A formulation utilizing two swarms of particles was developed to optimize simultaneously a desired performance criterion and the number of selected features. Candidate feature sets were evaluated on a regression problem. Artificial neural networks were trained to construct linear and nonlinear models of chemical concentration of glucose in soybean crops. Experimental results utilizing real-world hyperspectral datasets demonstrate the viability of the method. The particle swarms-based approach presented superior performance in comparison with conventional feature extraction methods, on both linear and nonlinear models.
The possibility of using three kinds of new composite materials as materials for high speed sliding contacts was investigated. The results were compared with those of the low speed tests reported earlier. It was found that for high speed rotation in the range from 0.014 m/s to 2 m/s, the order of merit did not change significantly. Based on this, it was concluded that if solid lubricant is effectively supplied to the sliding surface, the influence of frictional heat generated at high speed is slight. Of the three kinds of composite material, composite material CMML-1 had the lowest contact resistance and composite material CMML-3 had the lowest maximum coefficient of friction. 'CM' and 'ML' are initialisms for 'Composite Material' and 'Material of Lubrication', respectively. The number attached to each material name is a numeric value assigned by this laboratory.
Yoshinobu NAKAMURA Junya SEKIKAWA Takayoshi KUBONO
Ag and Pd electrical contact pairs are separated at constant separating speeds (5, 10 and 20 mm/s) in a DC 42 V/8.4 A resistive circuit. The motion of the breaking arc is observed with a high-speed video camera. For Ag contacts, the motion of the breaking arc becomes stable at a certain critical gap at separating speeds of 10 mm/s and 20 mm/s, and the breaking arc moves extensively at the separating speed of 5 mm/s. For Pd contacts, the breaking arc moves extensively regardless of the separating speed. These results are attributed to the following causes. For Ag contacts, the difference in the motion of arc spots at each separating speed is changed by the difference in the total energy input to the contacts. For Pd contacts, the temperature of the contact surfaces is kept high because of the lower thermal conductivity of Pd than Ag.