Hansjorg HOFMANN Sakriani SAKTI Chiori HORI Hideki KASHIOKA Satoshi NAKAMURA Wolfgang MINKER
The performance of English automatic speech recognition systems decreases when recognizing spontaneous speech, mainly due to multiple pronunciation variants in the utterances. Previous approaches address this problem by modeling the alteration of the pronunciation on a phoneme-to-phoneme level. However, the phonetic transformation effects induced by the pronunciation of the whole sentence have not yet been considered. In this article, sequence-based pronunciation variation is modeled using a noisy channel approach, where the spontaneous phoneme sequence is considered a “noisy” string and the goal is to recover the “clean” string of the word sequence. In this way, the whole word sequence and its effect on the alteration of the phonemes are taken into consideration. Moreover, the system learns not only the phoneme transformation but also the mapping from phonemes to words directly. In this study, the phonemes are first recognized with the present recognition system, and afterwards the pronunciation variation model based on the noisy channel approach maps from the phoneme to the word level. Two well-known natural language processing approaches are adopted and derived from noisy channel model theory: joint-sequence models and statistical machine translation. Both are applied, and various experiments are conducted using microphone and telephone recordings of spontaneous speech.
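The noisy-channel idea above can be sketched in miniature: pick the word sequence W maximizing log P(phonemes | W) + log P(W). This is a toy illustration only, not the paper's joint-sequence or statistical machine translation models; the lexicon, phoneme strings, probabilities, and the edit-distance channel model are all invented for the example.

```python
import itertools
import math

# Toy pronunciation lexicon: word -> canonical phoneme sequence (invented)
LEXICON = {
    "going": ["g", "ow", "ih", "ng"],
    "to":    ["t", "uw"],
    "gone":  ["g", "ao", "n"],
}
# Toy unigram language model log-probabilities (invented)
LM = {"going": math.log(0.5), "to": math.log(0.4), "gone": math.log(0.1)}

def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[len(a)][len(b)]

def decode(observed, max_words=2):
    """Recover the 'clean' word sequence from a 'noisy' phoneme string by
    maximizing a channel score (negative edit distance over the whole
    concatenated sequence) plus the language model score."""
    best, best_score = None, -math.inf
    for n in range(1, max_words + 1):
        for words in itertools.product(LEXICON, repeat=n):
            canonical = [p for w in words for p in LEXICON[w]]
            score = -edit_distance(canonical, observed) + sum(LM[w] for w in words)
            if score > best_score:
                best, best_score = list(words), score
    return best

# A spontaneous, reduced rendering of "going to" ("gonna"-like phonemes)
print(decode(["g", "ah", "n", "ah", "t", "uw"]))
```

Because the channel score is computed over the concatenated phonemes of the whole candidate word sequence, cross-word reductions are handled jointly rather than phoneme by phoneme.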
When we design a robust vector quantizer (VQ) for noisy channels, an appropriate index assignment function should be contrived to minimize the channel-error effect. For relatively high rates, the complexity of finding an optimal index assignment function is too high for it to be implemented. To overcome this problem, we use a structurally constrained VQ, called the sample-adaptive product quantizer (SAPQ) [12], for low complexity in both quantization and index assignment. The product quantizer (PQ) and its variation SAPQ [13], which are based on the scalar quantizer (SQ) and thus belong to the class of binary lattice VQs [16], have inherent error resilience even when conventional affine index assignment functions, such as the natural binary code, are employed. The error resilience of SAPQ is observed in a weak sense through worst-case bounds. Using SAPQ for noisy channels is especially useful for high rates, e.g., > 1 bit/sample, and it is numerically shown that the channel-limit performance of SAPQ is comparable to that of the best codebook permutation obtained by the binary switching algorithm (BSA) [23]. Further, the PQ or SAPQ codebook with an affine index assignment function is used as the initial guess for the conventional clustering algorithm, and it is shown that the performance of the best BSA result can be easily achieved.
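Why the index assignment matters can be seen with a minimal numeric sketch: on a binary symmetric channel, a bit flip turns the transmitted index into a nearby index in Hamming distance, so the assignment decides which codewords get confused. This is not the SAPQ construction itself; it uses a plain 2-bit scalar quantizer with invented levels and an exhaustive search for the worst permutation.

```python
import itertools

def channel_distortion(codebook, assign, p):
    """Average squared error caused by index bit flips on a binary
    symmetric channel with crossover probability p, for the index
    assignment 'assign' (codeword position -> binary index).
    Assumes a uniform source over the codewords."""
    n = len(codebook)
    b = (n - 1).bit_length()                     # bits per index
    inv = {assign[i]: i for i in range(n)}       # index -> codeword position
    total = 0.0
    for i in range(n):                           # transmitted codeword
        for y in range(n):                       # received index
            d = bin(assign[i] ^ y).count("1")    # Hamming distance
            prob = (p ** d) * ((1 - p) ** (b - d))
            total += prob * (codebook[i] - codebook[inv[y]]) ** 2
    return total / n

levels = [-1.5, -0.5, 0.5, 1.5]                  # 2-bit uniform scalar quantizer
nbc = [0, 1, 2, 3]                               # natural binary code (affine)
worst = max(itertools.permutations(range(4)),
            key=lambda a: channel_distortion(levels, list(a), 0.01))
print(channel_distortion(levels, nbc, 0.01))         # natural binary code
print(channel_distortion(levels, list(worst), 0.01)) # worst permutation
```

At higher rates the number of permutations explodes (factorial in the codebook size), which is exactly why structurally constrained quantizers with good affine assignments are attractive.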
Image quality assessment measures the difference in quality between a reference image and its distorted version. In this paper, we propose a novel reduced-reference (RR) quality assessment method for JPEG-2000 compressed images, which exploits the statistical characteristics of the context information extracted through partial entropy decoding. These statistical features, obtained in the process of JPEG-2000 encoding, are transmitted to the receiver as side information and used at the decompression side to estimate the quality of images transmitted over various noisy channels. In the JPEG-2000 framework, the context of a current coefficient is determined by the pattern of the significance and/or the sign of its neighbors in three bit-plane coding passes and four coding modes. As the context information represents the local properties of images, it can efficiently describe textured patterns and edge orientation. The quality of transmitted images is measured by the difference in entropy of the context information between the received and original images. Moreover, the proposed method can process images directly in the JPEG-2000 compressed domain without full decompression, which accelerates quality assessment. Through simulations, we demonstrate that our method achieves good performance in terms of both quality measurement accuracy and computational complexity.
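The entropy-difference idea reduces to something very compact: compute the Shannon entropy of the context-label distribution on each side and compare. The sketch below is a hedged simplification, with invented context labels; the actual JPEG-2000 context modeling (significance/sign patterns across coding passes) is far richer than a flat symbol list.

```python
import math
from collections import Counter

def entropy(symbols):
    """Shannon entropy (bits) of the empirical symbol distribution."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def rr_quality_score(ref_contexts, recv_contexts):
    """Reduced-reference score: the sender only needs to transmit the
    entropy of the reference's context labels (one number) as side
    information, not the reference image itself."""
    return abs(entropy(ref_contexts) - entropy(recv_contexts))

ref = [0, 1, 1, 2, 2, 2, 3, 3]           # invented context labels
intact = list(ref)
degraded = [0, 0, 0, 0, 0, 0, 3, 3]      # channel errors flatten the contexts
print(rr_quality_score(ref, intact))     # identical contexts -> score 0.0
print(rr_quality_score(ref, degraded))   # larger score = worse quality
```

The RR property is the key design choice here: only a few statistics cross the channel as side information, so the metric stays cheap even when the full reference image is unavailable at the receiver.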
The effects of noisy fading estimates on turbo-coded modulation are studied in the presence of flat Rayleigh fading, and the channel capacity of the system is calculated to determine the limit below which reliable transmission cannot be guaranteed. This limit is then compared to the signal-to-noise ratio required for a turbo-coded modulation scheme to achieve a bit-error rate of 10^-5. Numerical results are obtained, especially for QAM signals. Our results show that even slightly noisy estimates significantly degrade the theoretical limits related to channel capacity, and that effective use of capacity-approaching codes can lower the sensitivity to noisy estimates, although estimation noise beyond a certain threshold cannot be offset by the performance improvement from stronger error-correcting capability.
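The comparison the abstract describes, capacity limit versus the SNR a practical code actually needs, can be illustrated with the standard Shannon formula. This is only an AWGN intuition sketch under perfect channel knowledge, not the paper's Rayleigh-fading capacity with imperfect estimates; the rates chosen are just typical QAM spectral efficiencies.

```python
import math

def capacity_limit_snr_db(rate):
    """SNR (dB) at which the AWGN capacity C = log2(1 + SNR) equals
    'rate' (bit/s/Hz); below this SNR, reliable transmission at that
    rate is impossible regardless of the code used."""
    return 10.0 * math.log10(2.0 ** rate - 1.0)

# Typical spectral efficiencies, e.g. BPSK up to 64-QAM
for rate in (1, 2, 4, 6):
    print(f"{rate} bit/s/Hz: capacity limit {capacity_limit_snr_db(rate):.2f} dB")
```

The gap between a scheme's required SNR at BER 10^-5 and this limit quantifies how capacity-approaching the code is; imperfect fading estimates shift both the achievable limit and the required SNR.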
This paper presents an improved pragmatic approach to coded modulation design that provides higher coding gains, especially for very noisy channels, including those with Rayleigh fading. A signal constellation using four equally utilized dimensions, implemented with two correlative carrier frequencies, is adopted to enhance the performance of the pragmatic approach previously proposed by Viterbi et al. The proposed scheme is shown to perform much better through analysis of system performance parameters and extensive computer simulation under practical channel conditions. The bandwidth and power efficiencies are also analyzed and discussed to provide more design flexibility for different communication environments.