Conventional confidence measures for assessing the reliability of ASR (automatic speech recognition) output are typically derived from "low-level" information which is obtained during speech recognition decoding. In contrast to these approaches, we propose a novel utterance verification framework which incorporates "high-level" knowledge sources. Specifically, we investigate two application-independent measures: in-domain confidence, the degree of match between the input utterance and the application domain of the back-end system, and discourse coherence, the consistency between consecutive utterances in a dialogue session. A joint confidence score is generated by combining these two measures with an orthodox measure based on GPP (generalized posterior probability). The proposed framework was evaluated on an utterance verification task for spontaneous dialogue performed via an English/Japanese speech-to-speech translation system. Incorporating the two proposed measures significantly improved utterance verification accuracy compared to using GPP alone, realizing reductions in CER (confidence error-rate) of 11.4% and 8.1% for the English and Japanese sides, respectively. When negligible ASR errors (those that do not affect translation) were ignored, further improvement was achieved for the English side, realizing a reduction in CER of up to 14.6% compared to the GPP case.
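The combination and evaluation steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weighted linear combination, the weights, the threshold, and the toy scores are all assumptions made for the example. CER is computed in the usual way, as the fraction of utterances that are either falsely accepted or falsely rejected.

```python
def joint_confidence(gpp, in_domain, coherence, weights=(0.5, 0.25, 0.25)):
    """Weighted linear combination of three confidence scores in [0, 1].

    The linear form and the weights are assumptions for illustration only.
    """
    w_gpp, w_dom, w_coh = weights
    return w_gpp * gpp + w_dom * in_domain + w_coh * coherence


def confidence_error_rate(scores, is_correct, threshold):
    """CER = (false acceptances + false rejections) / number of utterances."""
    if len(scores) != len(is_correct):
        raise ValueError("scores and labels must have the same length")
    errors = 0
    for score, correct in zip(scores, is_correct):
        accepted = score >= threshold
        if accepted != correct:
            errors += 1
    return errors / len(scores)


# Made-up toy data: (GPP, in-domain confidence, discourse coherence, correct?)
utterances = [
    (0.9, 0.8, 0.9, True),
    (0.7, 0.2, 0.1, False),
    (0.4, 0.9, 0.8, True),
    (0.3, 0.3, 0.2, False),
]
labels = [u[3] for u in utterances]
gpp_only = [u[0] for u in utterances]
joint = [joint_confidence(u[0], u[1], u[2]) for u in utterances]

cer_gpp = confidence_error_rate(gpp_only, labels, threshold=0.5)
cer_joint = confidence_error_rate(joint, labels, threshold=0.5)
```

On this toy data the GPP-only score misclassifies two of the four utterances, while the joint score classifies all four correctly; the numbers only illustrate the mechanism and say nothing about the reported results.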
Weifeng LI Katsunobu ITOU Kazuya TAKEDA Fumitada ITAKURA
We address issues in improving hands-free speech enhancement and speech recognition performance in different car environments using a single distant microphone. This paper describes a new single-channel in-car speech enhancement method that estimates the log spectra of speech at a close-talking microphone based on nonlinear regression of the log spectra of the noisy signal captured by a distant microphone and the estimated noise. The proposed method provided significant overall quality improvements in our subjective evaluation of the regression-enhanced speech, and performed best in most objective measures. In our isolated word recognition experiments conducted under 15 real car environments, the proposed adaptive nonlinear regression approach achieved average relative word error rate (WER) reductions of 50.8% and 13.1% compared to the original noisy speech and the ETSI advanced front-end (ETSI ES 202 050), respectively.
Sakriani SAKTI Satoshi NAKAMURA Konstantin MARKOV
Over the last decade, the Bayesian approach has increased in popularity in many application areas. It uses a probabilistic framework which encodes our beliefs or actions in situations of uncertainty. Information from several models can also be combined based on the Bayesian framework to achieve better inference and to better account for modeling uncertainty. The approach we adopted here is to utilize the benefits of the Bayesian framework to improve acoustic model precision in speech recognition systems by modeling a wider-than-triphone context, approximating it with several less context-dependent models. Such a composition was developed in order to avoid the crucial problem of limited training data and to reduce the model complexity. To enhance model reliability in the face of unseen contexts and limited training data, flooring and smoothing techniques are applied. Experimental results show that the proposed Bayesian pentaphone model improves word accuracy in comparison with the standard triphone model.
Shoei SATO Kazuo ONOE Akio KOBAYASHI Toru IMAI
This paper proposes a new method for compensating acoustic scores in the Viterbi search for robust speech recognition. The method introduces noise models to represent a wide variety of noises and realizes robust decoding together with conventional techniques of subtraction and adaptation. It uses the likelihoods of the noise models in two ways. One is to calculate a confidence factor for each input frame by comparing the likelihoods of speech models and noise models; the weight of the acoustic score for a noisy frame is then reduced according to the value of the confidence factor. The other is to use the likelihood of a noise model as an alternative to that of a silence model when the input is noisy. Since a lower confidence factor compresses acoustic scores, the decoder relies more on language scores and keeps more hypotheses within a fixed search depth for a noisy frame. An experiment using commentary transcriptions of a broadcast sports program (MLB: Major League Baseball) showed that the proposed method obtained a 6.7% relative word error reduction. The method also reduced the relative error rate of key words by 17.9%, which is expected to lead to an improvement in metadata extraction accuracy.
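The frame-level weighting described above can be sketched as follows. The abstract does not give the formula for the confidence factor, so the logistic mapping of the speech-versus-noise log-likelihood gap, the `slope` parameter, and all numeric values here are assumptions chosen only to show the idea.

```python
import math


def confidence_factor(speech_loglik, noise_loglik, slope=1.0):
    """Map the speech-vs-noise log-likelihood gap to a value in [0, 1].

    The logistic form is an assumption; it is evaluated in a numerically
    stable way for large positive or negative gaps.
    """
    x = slope * (speech_loglik - noise_loglik)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def compensate_acoustic_scores(acoustic_scores, speech_logliks, noise_logliks):
    """Scale each frame's acoustic log score by that frame's confidence factor."""
    if not (len(acoustic_scores) == len(speech_logliks) == len(noise_logliks)):
        raise ValueError("all inputs must have one value per frame")
    return [
        confidence_factor(s, n) * a
        for a, s, n in zip(acoustic_scores, speech_logliks, noise_logliks)
    ]


# Made-up per-frame values: frame 0 looks like speech, frame 1 like noise,
# frame 2 is ambiguous (equal likelihoods).
frame_scores = [-10.0, -12.0, -11.0]
speech_ll = [-40.0, -55.0, -42.0]
noise_ll = [-48.0, -50.0, -42.0]
weighted = compensate_acoustic_scores(frame_scores, speech_ll, noise_ll)
```

With this weighting, the acoustic log score of a noise-like frame is compressed toward zero, so differences between hypotheses on that frame come mostly from the language score, which matches the behavior the abstract describes.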
When the frame size is downscaled for video transcoding, the new motion vector (MV) must be computed. This paper presents an algorithm to utilize the activity measurement by DC value and the number of non-zero quantized DCT coefficients in the residual macroblock to compose the motion vector. It can reduce the complexity for motion estimation and improve the performance of the spatial domain video transcoder.
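One way to picture activity-based motion vector composition for 2:1 downscaling is an activity-weighted average of the four original motion vectors, scaled to the new frame size. The sketch below is a generic illustration under stated assumptions, not the algorithm of the paper: the activity definition (absolute DC value plus the count of non-zero quantized coefficients), the weighting, and the sample values are all hypothetical.

```python
def macroblock_activity(dc_value, nonzero_coeffs):
    """Hypothetical activity measure for a residual macroblock."""
    return abs(dc_value) + nonzero_coeffs


def compose_motion_vector(macroblocks, scale=2.0):
    """Compose one downscaled MV from several original macroblocks.

    macroblocks: list of (mv_x, mv_y, dc_value, nonzero_coeffs) tuples.
    Returns the activity-weighted mean MV divided by the downscaling factor.
    """
    if not macroblocks:
        raise ValueError("need at least one macroblock")
    weights = [macroblock_activity(dc, nz) for (_mx, _my, dc, nz) in macroblocks]
    total = sum(weights)
    if total == 0:
        # No residual activity anywhere: fall back to a plain average.
        weights = [1.0] * len(macroblocks)
        total = float(len(macroblocks))
    mv_x = sum(w * mb[0] for w, mb in zip(weights, macroblocks)) / total
    mv_y = sum(w * mb[1] for w, mb in zip(weights, macroblocks)) / total
    return (mv_x / scale, mv_y / scale)


# Made-up example: four macroblocks that merge into one after downscaling.
four_macroblocks = [(4, -2, 120, 6), (6, -2, 10, 1), (4, 0, 0, 0), (8, 2, 30, 3)]
new_mv = compose_motion_vector(four_macroblocks)
```

Because the composed vector is computed directly from data already present in the incoming bitstream, no new motion search is needed, which is where the complexity saving comes from.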
Noriko Y. YAMASAKI Yoh TAKEI Kensuke MASUI Kazuhisa MITSUDA Toshimitsu MOROOKA Satoshi NAKAYAMA
In frequency-domain multiplexing (FDM) for TES signals, a magnetic field summation method utilizing a multi-input SQUID has the fundamental merit of small degradation of the signal-to-noise ratio. We formulated shifts of the operation point due to a common impedance and cross-talk currents. These effects are evaluated for several FDM methods, and the requirements for the bandwidth and filters are summarized. The design parameters of multi-input SQUIDs and flux-locked-loop driving circuits are also presented.
Masakazu MIZOKAMI Kawori TAKAKUBO Hajime TAKAKUBO
A four-quadrant-input linear transconductor generating a product or a product-sum current is proposed. The proposed circuit eliminates the influence of channel length modulation and expands the dynamic input voltage range. As an application of the proposed circuit, a four-quadrant analog multiplier is designed. The four-quadrant analog multiplier consists of the proposed circuit, an input circuit and a class AB current buffer. HSPICE simulation results with 0.35 µm n-well single CMOS process parameters are shown in order to evaluate the proposed circuit.
Isamu YAMAGUCHI Fujihiko MATSUMOTO Makoto IZUMA Yasuaki NOGUCHI
In practice, the linearity of a transconductor with a theoretically linear characteristic is deteriorated by mobility degradation. In this paper, a technique to improve the linearity by combining a source-coupled pair with the transconductor is proposed. In the proposed transconductor, the deteriorated linearity of the conventional part is compensated by the transconductance characteristic of the source-coupled pair. In order to confirm the validity of the proposed technique, SPICE simulation is carried out. The transconductance change ratio of the proposed technique is about 1%, which is 1/10 or less of that of the conventional circuit.
Hyung-Min YOON Woo-Shik KANG Oh-Young KWON Seong-Hun JEONG Bum-Seok KANG Tack-Don HAN
New service concepts involving mobile devices with a diverse range of embedded sensors are emerging that share contexts supporting communication on a wireless network infrastructure. To promote these services in mobile devices, we propose a method that can efficiently detect a context provider by partitioning the location, time, speed, and discovery sensitivities.
Giscard WEPIWE Plamen L. SIMEONOV
The paper presents HiPeer, a robust resource distribution and discovery algorithm that can be used for fast and fault-tolerant location of resources in P2P network environments. HiPeer defines a concentric multi-ring overlay networking topology, whereon dynamic network management methods are deployed. In terms of performance, HiPeer delivers a number of lowest bounds. We demonstrate that for any De Bruijn digraph of degree d ≥ 2 and diameter D_DB, HiPeer constructs a highly reliable network, where each node maintains a routing table with at most 2d+2 entries independent of the number N of nodes in the system. Further, we show that any existing resource in the network with at most d nodes can be found within at most D_HiPeer = log_d(N(d-1)+d) - 1 overlay hops. This result is as close to the Moore bound [1] as the query path length in other outstanding P2P proposals based on De Bruijn digraphs. Thus, we argue that HiPeer defines a highly connected network with connectivity d and the lowest yet known lookup bound D_HiPeer. Moreover, we show that any node's "join or leave" operation in HiPeer implies a constant expected reorganization cost on the order of O(d) control messages.
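To get a feel for the two bounds quoted above, the short sketch below evaluates them numerically. It assumes the lookup bound reads as the base-d logarithm of N(d-1)+d, minus 1; the function names and the sample network sizes are made up for illustration.

```python
import math


def routing_table_max_entries(d):
    """Upper bound on routing table size per node: 2d + 2, independent of N."""
    if d < 2:
        raise ValueError("degree d must be at least 2")
    return 2 * d + 2


def lookup_hop_bound(n_nodes, d):
    """Upper bound on overlay hops: log_d(N(d-1) + d) - 1 (assumed reading)."""
    if d < 2 or n_nodes < 1:
        raise ValueError("need d >= 2 and at least one node")
    return math.log(n_nodes * (d - 1) + d, d) - 1


# Sample values chosen so that N(d-1)+d is an exact power of d.
table_d2 = routing_table_max_entries(2)  # 6 entries
hops_d2 = lookup_hop_bound(1022, 2)      # log_2(1024) - 1, about 9 hops
hops_d3 = lookup_hop_bound(120, 3)       # log_3(243) - 1, about 4 hops
```

The table size depends only on d, while the hop bound grows logarithmically with N, so a larger degree trades a slightly larger routing table for shorter lookups.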
Tadashi KAWAZOE Kiyoshi KOBAYASHI Motoichi OHTSU
We observed the optically forbidden energy transfer between cubic CuCl quantum dots (QDs) coupled via an optical near-field interaction using time-resolved near-field photoluminescence (PL) spectroscopy. The energy transfer time and exciton lifetime were estimated from the rise and decay times of the PL pump-probe signal, respectively. We found that the exciton lifetime increased as the energy transfer time fell. This result strongly supports the notion that the near-field interaction between QDs produces anti-parallel dipole coupling. Namely, a quantum-dot pair coupled by an optical near field has a long exciton lifetime, which indicates anti-parallel coupling of the QDs forming a weakly radiative quadrupole state.
Makoto HASEGAWA Masato AKITA Kazutaka IZUMI Takayoshi KUBONO
We initiated development of our own data processing software for laser microscope data in the C# language. This software provides a function for calculating the volume of a target portion, based on a new calculation algorithm that can precisely handle the volume calculation of a portion located on a tilted or distorted surface. In this paper, this algorithm and some exemplary results obtained with it, as well as some further development aims, are briefly described.
Junya SEKIKAWA Tetsuya KITAJIMA Takayoshi ENDO Takayoshi KUBONO
The motion of arc spots of breaking arcs is investigated for Ag electrical contacts in a DC 42 V/10 A resistive circuit using a high-speed camera. The eroded contact surfaces are also observed with a microscope after each breaking operation. As a result, several different kinds of films and eroded regions are distinguished. The diameters of these regions correspond to the widths of the cathode and anode spot regions obtained with the high-speed camera. It is found that the films and eroded regions on the electrical contacts are generated at different stages of the breaking arc.
Hyeon-Ho KIM Sung-Hwan HAN Hyeon-Deok BAE
Recently, DOAS (differential optical absorption spectroscopy) has been used for nondestructive air monitoring, in which the LS (least squares) method is used to calculate trace gas concentrations due to its computational simplicity. This paper applies the ICA (independent component analysis) method to the DOAS system of air monitoring, since the LS method is insufficient to recover the desired spectra perfectly due to the sparsity characteristic. If the sparsity of the reference spectra in the DOAS system imposes the assumption of independence, the ICA algorithm can be used. The proposed method is used to regress the observed spectrum on the estimates of the reference spectra. The ICA algorithm can be seen as a preprocessing method in which the ICs of the references are used as the input to the regression. The performance of the proposed method is evaluated in simulation studies using synthetic data.
Yasuo SAMBE Shintaro WATANABE Dong YU Taichi NAKAMURA Naoki WAKAMIYA
This paper describes a distributed video transcoding system that can simultaneously transcode an MPEG-2 video file into various video coding formats with different rates. The transcoder divides the MPEG-2 file into small segments along the time axis and transcodes them in parallel. Efficient video segment handling methods are proposed that minimize the inter-processor communication overhead and eliminate temporal discontinuities from the re-encoded video. We investigate how segment transcoding should be distributed to obtain the shortest total transcoding time. Experimental results show that implementing distributed transcoding on 10 PCs can decrease the total transcoding time by a factor of about 7 for single transcoding and by a factor of 9.5 for simultaneous transcoding at three different rates.
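The time-axis segmentation and the reported speedups can be illustrated with a small sketch. The fixed-length segmentation below is a simplifying assumption (the abstract does not say how segment boundaries are chosen), and the helper names are hypothetical; parallel efficiency is simply speedup divided by the number of machines.

```python
def split_into_segments(total_frames, segment_frames):
    """Return (start, end) frame ranges, end exclusive, covering the whole file."""
    if total_frames < 0 or segment_frames <= 0:
        raise ValueError("need total_frames >= 0 and segment_frames > 0")
    return [
        (start, min(start + segment_frames, total_frames))
        for start in range(0, total_frames, segment_frames)
    ]


def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear speedup actually achieved."""
    if workers <= 0:
        raise ValueError("need at least one worker")
    return speedup / workers


# A made-up 100-frame file cut into 30-frame segments.
segments = split_into_segments(100, 30)

# Speedup factors quoted in the abstract for 10 PCs.
eff_single = parallel_efficiency(7.0, 10)   # about 0.70
eff_triple = parallel_efficiency(9.5, 10)   # about 0.95
```

Read this way, the quoted figures correspond to roughly 70% of ideal linear speedup for single transcoding and roughly 95% when three rates are produced at once.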
Qun WU Yu-Ming WU Jia-Hui FU Bo-Shi JIN Jong-Chul LEE
This paper presents a cascode-pair distributed amplifier design approach using 0.25 µm GaAs-based PHEMT MMIC technology, which covers 2-32 GHz. Electromagnetic simulation results show that this amplifier achieves 18 dB gain from 2 to 32 GHz and 0.5 dB gain flatness over the band. The reflection coefficients at the input and output ports are below -10 dB up to 27 GHz. The output power at 1 dB compression is greater than 24 dBm at 20 GHz. An appropriate feedback resistance can be utilized to improve P1 dB by about 6 dBm. The DOE (design of experiment) approach is carried out with a simulation tool to obtain better performance, and the tolerance of the devices is also analyzed. The circuit configuration is capable of ultra-broadband amplification.
The latest video coding standard, H.264/AVC, adopts a 4×4 approximate transform instead of the 8×8 discrete cosine transform (DCT) to avoid the inverse transform mismatch problem. However, that is only one of the factors that make it difficult to transcode video content pre-coded with the previous standards to H.264/AVC in the common domain without resorting to cascaded pixel-domain transcoding. In this paper, to support the existing DCT-domain transcoding schemes and to reduce computational complexity, we propose an efficient algorithm that converts a quantized 8×8 DCT block into four newly quantized 4×4 transformed blocks. The experimental results show that the proposed scheme reduces computational complexity by 5-11% and improves video quality by 0.1-0.5 dB compared with the cascaded pixel-domain transcoding scheme that uses inverse quantization (IQ), inverse DCT (IDCT), DCT, and re-quantization (re-Q).
Recent microprocessors have included SIMD (single instruction multiple data) extensions in their instruction set architecture to improve the performance of multimedia applications. SIMD instructions speed up the execution of programs but pose many challenges to software developers. This paper presents an efficient matrix-based splitter (or merger) specialized for SIMD architectures, which can split an N×N 2-D DCT block into four N/2×N/2 or two N×N/2 (or N/2×N) 2-D DCT blocks (or merge small blocks into a large one). The programming-level complexity of the proposed methods is lower than that of the direct approach. Furthermore, even without using SIMD instructions, the algorithmic-level complexity of the proposed DCT splitter/merger is still lower than that of the direct one and is the same as that of the most efficient approach existing in the literature. When N = 8, our method can be applied as a transcoder between the latest video coding standard AVC/H.264 and older ones, such as MPEG-1, MPEG-2 and MPEG-4 Part 2. We also provide image quality tests to show the performance of the proposed 2-D DCT splitter and merger.
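The idea of splitting a DCT block with matrices, without going back to the pixel domain at run time, can be sketched generically. With an orthonormal DCT-II matrix C_N, an N×N DCT block X, and S_i selecting half i of the rows, each N/2×N/2 sub-block DCT is A_i X A_j^T with A_i = C_{N/2} S_i C_N^T, and the A_i can be precomputed. The code below is a plain textbook construction, not the SIMD-oriented factorization of the paper, and all function names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that X = C x C^T for an n x n block x."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def split_matrices(n):
    """Precompute A_0 and A_1, where A_i = C_{n/2} S_i C_n^T."""
    half = n // 2
    c_half = dct_matrix(half)
    c_full_t = transpose(dct_matrix(n))
    # S_i C_n^T is simply rows i*half .. (i+1)*half - 1 of C_n^T.
    return [matmul(c_half, c_full_t[i * half:(i + 1) * half]) for i in range(2)]


def split_dct_block(big):
    """Split an n x n DCT block into a 2 x 2 grid of (n/2) x (n/2) DCT blocks."""
    a = split_matrices(len(big))
    return [[matmul(matmul(a[i], big), transpose(a[j])) for j in range(2)]
            for i in range(2)]


# Made-up 4 x 4 pixel block, its DCT, and the four 2 x 2 DCT sub-blocks.
pixels = [[float(4 * r + c) for c in range(4)] for r in range(4)]
c4 = dct_matrix(4)
big_block = matmul(matmul(c4, pixels), transpose(c4))
sub_blocks = split_dct_block(big_block)
```

Each entry of `sub_blocks` should match the DCT computed directly from the corresponding pixel quadrant, up to floating-point rounding. Merging is the reverse operation: since the A_i are built from orthonormal matrices, the large block can be reassembled as the sum of A_i^T X_ij A_j over the four sub-blocks.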
When video data are transmitted over a network, the video quality must be chosen carefully so that it is as high as possible under the condition that the transmission is not influenced by other Internet services. For ease of implementation, the quality is often selected using the simulcast approach, in which an independent stream is stored and transmitted for each quality level. On the other hand, we have already proposed a scalable structure consisting of base and enhancement data; when high-quality video is required, these data are combined using transcoding methods. In this paper, we propose video content delivery methods with scalable transcoding, in which users can update the quality of video data even after transmission by using base data and differential data. In order to reduce the total time, including not only users' access time but also watching time, we compare the simulcast method with the proposed methods in terms of total content utilization time using a video content access model, and evaluate the transcoding time required to reduce users' waiting time.
Aranzazu OTIN Santiago CELMA Concepcion ALDEA
In this paper we report a 3rd-order Gm-C filter based on pseudo-differential continuous-time transconductors for applications in low-voltage systems over the VHF range. Using a 0.18 µm pure digital CMOS process, a prototype low-pass filter with a -3 dB frequency programmable from 38 MHz to 213 MHz confirms the feasibility of the proposed filter in applications such as data storage systems.