The search functionality is under construction.

Keyword Search Result

[Keyword] perceptual (42 hits)

Results 21-40 of 42 hits

  • A Cross Layer Perceptual Speech Quality Based Wireless VoIP Service

    Tein-Yaw CHUNG  Yung-Mu CHEN  Liang-Yi HUANG  

     
    PAPER

    Vol: E93-A  No: 11  Page(s): 2153-2162

    This paper proposes a cross layer wireless VoIP service which integrates an Adaptive QoS Playout (AQP) algorithm, the E-model, the Stream Control Transmission Protocol (SCTP), IEEE 802.21 Media Independent Handover (MIH) middleware and two user motion detection services. The proposed AQP algorithm integrates the effect of playout control and lost packet retransmission based on the E-model. In addition, by using the partially reliable transmission service from SCTP and the handoff notification from MIH services in a cross layer manner, AQP can reduce the lateness loss rate and improve speech quality under high frame error rates. In the simulations, the performance of AQP is compared with a fixed playout algorithm and four adaptive playout strategies. The simulation results show that the lateness loss rate of AQP is 2% lower than that of existing playout algorithms and the R-factor is 16% higher than that of the compared algorithms when a network has a 50 ms wired propagation delay and a 2.5% frame error rate.
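
The abstract above reports speech quality as an E-model R-factor. As a minimal sketch of how an R-factor is usually turned into an estimated mean opinion score (MOS), the following uses the standard ITU-T G.107 mapping. The function name `r_to_mos` is an illustrative choice, the clamp to a minimum of 1.0 is an added safeguard, and none of this is taken from the paper itself.

```python
def r_to_mos(r: float) -> float:
    """Map an E-model R-factor to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    mos = 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6
    # The cubic dips slightly below 1.0 for very small R; clamp it
    # (this clamp is an assumption added for the sketch).
    return max(1.0, mos)


if __name__ == "__main__":
    for r in (50.0, 70.0, 80.0, 93.2):
        print(f"R = {r:5.1f} -> MOS = {r_to_mos(r):.2f}")
```

Because the mapping is nonlinear, a relative gain in R does not translate into the same relative gain in MOS.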

  • Adaptive Spread-Transform Dither Modulation Using a New Perceptual Model for Color Image Watermarking

    Lihong MA  Dong YU  Gang WEI  Jing TIAN  Hanqing LU  

     
    PAPER-Information Network

    Vol: E93-D  No: 4  Page(s): 843-857

    Major challenges of the conventional spread-transform dither modulation (STDM) watermarking approach are two-fold: (i) it applies a fixed watermarking strength (more specifically, the quantization index step size) to the whole cover image; and (ii) it is fairly vulnerable to amplitude changes. To tackle these challenges, an adaptive spread-transform dither modulation (ASTDM) approach is proposed in this paper for conducting robust color image watermarking by incorporating a new perceptual model into the conventional STDM framework. The proposed approach exploits a new perceptual model to adjust the quantization index step sizes according to the local perceptual characteristics of a cover image. Furthermore, in contrast to the conventional Watson's model, which is vulnerable to amplitude changes, the proposed perceptual model keeps the luminance masking thresholds consistent with any amplitude change while remaining consistent with the properties of the human visual system. In addition, certain color artifacts could be incurred during the watermark embedding procedure, since some intensity values are perceptibly changed to label the watermark. To address this, a color artifact suppression algorithm is proposed by mathematically deriving an upper bound for the intensity values according to the inherent relationship between the saturation and the intensity components. Extensive experiments are conducted using 500 images selected from the Corel database to demonstrate the superior performance of the proposed ASTDM approach.

  • Optimal Gain Filter Design for Perceptual Acoustic Echo Suppressor

    Kihyeon KIM  Hanseok KO  

     
    LETTER-Speech and Hearing

    Vol: E92-D  No: 6  Page(s): 1320-1323

    This Letter proposes an optimal gain filter for the perceptual acoustic echo suppressor. We designed an optimally-modified log-spectral amplitude estimation algorithm for the gain filter in order to achieve robust suppression of echo and noise. A new parameter including information about interferences (echo and noise) of single-talk duration is statistically analyzed, and then the speech absence probability and the a posteriori SNR are judiciously estimated to determine the optimal solution. The experiments show that the proposed gain filter attains a significantly improved reduction of echo and noise with less speech distortion.

  • A Subtractive-Type Speech Enhancement Using the Perceptual Frequency-Weighting Function

    Seiji HAYASHI  Hiroyuki INUKAI  Masahiro SUGUIMOTO  

     
    PAPER-Speech and Hearing

    Vol: E92-A  No: 1  Page(s): 226-234

    The present paper describes quality enhancement of speech corrupted by an additive background noise in a single-channel system. The proposed approach is based on the introduction of a perceptual criterion using a frequency-weighting filter in a subtractive-type enhancement process. Although this subtractive-type method is very attractive because of its simplicity, it produces an unnatural and unpleasant residual noise. Thus, it is difficult to select fixed optimized parameters for all speech and noise conditions. A new and effective algorithm is thus developed based on the masking properties of the human ear. This newly developed algorithm allows for an automatic adaptation in the time and frequency of the enhancement system and determines a suitable noise estimate according to the frequency of the noisy input speech. Experimental results demonstrate that the proposed approach can efficiently remove additive noise related to various kinds of noise corruption.
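
For readers unfamiliar with subtractive-type enhancement, the sketch below shows the generic magnitude-domain form: a noise estimate is subtracted from each frequency bin, and a spectral floor limits the residual artifacts. The over-subtraction factor `alpha` and floor factor `beta` are hypothetical fixed parameters chosen for illustration. The paper instead adapts the process with a perceptual frequency-weighting filter, which is not reproduced here.

```python
from typing import List, Sequence


def spectral_subtract(
    noisy_mag: Sequence[float],
    noise_mag: Sequence[float],
    alpha: float = 2.0,   # over-subtraction factor (illustrative value)
    beta: float = 0.01,   # spectral floor factor (illustrative value)
) -> List[float]:
    """Subtract a noise magnitude estimate from each bin of a noisy spectrum."""
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    enhanced = []
    for y, n in zip(noisy_mag, noise_mag):
        cleaned = y - alpha * n
        # Never fall below a small fraction of the noisy magnitude.
        enhanced.append(max(cleaned, beta * y))
    return enhanced


if __name__ == "__main__":
    print(spectral_subtract([1.0, 0.5, 0.2], [0.1, 0.4, 0.05]))
```

With fixed `alpha` and `beta`, the result is a trade-off between residual noise and speech distortion, which is the difficulty the abstract attributes to fixed parameters.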

  • Masking Property Based Residual Acoustic Echo Cancellation for Hands-Free Communication in Automobile Environment

    Yoonjae LEE  Seokyeong JEONG  Hanseok KO  

     
    LETTER-Speech and Hearing

    Vol: E91-D  No: 10  Page(s): 2528-2531

    A residual acoustic echo cancellation method that employs the masking property is proposed to enhance the speech quality of hands-free communication devices in an automobile environment. The conventional masking property is employed for speech enhancement using the masking threshold of the desired clean speech signal. In this Letter, either the near-end speech or residual noise is selected as the desired signal according to the double-talk detector. Then, the residual echo signal is masked by the desired signal (masker). Experiments confirm the effectiveness of the proposed method by deriving the echo return loss enhancement and by examining speech waveforms and spectrograms.

  • Locally Adaptive Perceptual Compression for Color Images

    Kuo-Cheng LIU  Chun-Hsien CHOU  

     
    PAPER-Image

    Vol: E91-A  No: 8  Page(s): 2213-2222

    The main idea in perceptual image compression is to remove the perceptual redundancy for representing images at the lowest possible bit rate without introducing perceivable distortion. A certain amount of perceptual redundancy is inherent in the color image since human eyes are not perfect sensors for discriminating small differences in color signals. Effectively exploiting the perceptual redundancy will help to improve the coding efficiency of compressing color images. In this paper, a locally adaptive perceptual compression scheme for color images is proposed. The scheme is based on the design of an adaptive quantizer for compressing color images with the nearly lossless visual quality at a low bit rate. An effective way to achieve the nearly lossless visual quality is to shape the quantization error as a part of perceptual redundancy while compressing color images. This method is to control the adaptive quantization stage by the perceptual redundancy of the color image. In this paper, the perceptual redundancy in the form of the noise detection threshold associated with each coefficient in each subband of three color components of the color image is derived based on the finding of perceptually indistinguishable regions of color stimuli in the uniform color space and various masking effects of human visual perception. The quantizer step size for the target coefficient in each color component is adaptively adjusted by the associated noise detection threshold to make sure that the resulting quantization error is not perceivable. Simulation results show that the compression performance of the proposed scheme using the adaptively coefficient-wise quantization is better than that using the band-wise quantization. The nearly lossless visual quality of the reconstructed image can be achieved by the proposed scheme at lower entropy.

  • An Effective QoS Control Scheme for 3D Virtual Environments Based on User's Perception

    Takayuki KURODA  Takuo SUGANUMA  Norio SHIRATORI  

     
    PAPER-Media Communication

    Vol: E91-D  No: 6  Page(s): 1604-1612

    In this paper, we present a new three-dimensional (3D) virtual environment (3DVE) system named "QuViE/P", which keeps the quality of service (QoS) that users actually perceive as high as possible when computer and network resources are limited. To realize this, we focus on the characteristics of users' perceptual quality evaluation of 3D objects. We propose an effective QoS control scheme for QuViE/P by introducing relationships between the system's internal quality parameters and the user's perceptual quality parameters. This scheme can appropriately maintain the QoS of the 3DVE system and is expected to improve convenience when using a 3DVE system where resources are insufficient. We designed and implemented a prototype of QuViE/P using a multiagent framework. The experimental results show that even when the computer resource is reduced to 20% of the required amount, the proposed scheme can maintain the quality of important objects at a certain level.

  • Frame Splitting Scheme for Error-Robust Audio Streaming over Packet-Switching Networks

    Jong Kyu KIM  Jung Su KIM  Hwan Sik YUN  Joon-Hyuk CHANG  Nam Soo KIM  

     
    LETTER-Multimedia Systems for Communications

    Vol: E91-B  No: 2  Page(s): 677-680

    This letter presents a novel frame splitting scheme for error-robust audio streaming over packet-switching networks. In our approach to perceptual audio coding, an audio frame is split into several subframes based on the network configuration such that each packet can be decoded independently at the receiver. A subjective comparison category rating (CCR) test shows that our approach enhances the quality of the decoded audio signal in a lossy packet-switching network environment.

  • Speech Enhancement Based on Perceptually Comfortable Residual Noise

    Jong Won SHIN  Joon-Hyuk CHANG  Nam Soo KIM  

     
    LETTER-Multimedia Systems for Communications

    Vol: E90-B  No: 11  Page(s): 3323-3326

    In this letter, we propose a novel approach to speech enhancement, which incorporates a new criterion based on residual noise shaping. In the proposed approach, our goal is to make the residual noise perceptually comfortable instead of making it less audible. A predetermined `comfort noise' is provided as a target for the spectral shaping. Based on some assumptions, the resulting spectral gain function turns out to be a slight modification of the Wiener filter while requiring very low computational complexity. Subjective listening test shows that the proposed algorithm outperforms the conventional spectral enhancement technique based on soft decision and the noise suppression implemented in IS-893 Selectable Mode Vocoder.
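
The abstract describes its gain function as a slight modification of the Wiener filter. As a point of reference only, the sketch below gives the unmodified Wiener gain for one frequency bin in terms of the a priori SNR. The function name and the symbol `xi` are illustrative, and the comfort-noise modification proposed in the letter is not shown.

```python
def wiener_gain(xi: float) -> float:
    """Classic Wiener spectral gain for one bin, given the a priori SNR xi."""
    if xi < 0.0:
        raise ValueError("a priori SNR must be non-negative")
    return xi / (1.0 + xi)


if __name__ == "__main__":
    for xi in (0.0, 0.1, 1.0, 10.0):
        print(f"xi = {xi:5.1f} -> gain = {wiener_gain(xi):.3f}")
```

The gain tends to 0 in noise-dominated bins and to 1 in speech-dominated bins, and it costs one division per bin.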

  • Bandwidth Extension with Hybrid Signal Extrapolation for Audio Coding

    Chatree BUDSABATHON  Akinori NISHIHARA  

     
    PAPER

    Vol: E90-A  No: 8  Page(s): 1564-1569

    In this paper, we propose a blind method using hybrid signal extrapolation at the decoder to regenerate lost high-frequency components which are removed by encoders. At first, a decoded signal spectral resolution is enhanced by time domain linear predictive extrapolation and then the cut off frequency of each frame is estimated to avoid the spectrum gap between the end of original low frequency spectrum and the beginning of reconstructed high frequency spectrum. By utilizing a correlation between the high frequency spectrum and low frequency spectrum, the low frequency spectrum component is employed to reconstruct the high frequency spectrum component by frequency domain linear predictive extrapolation. Experimental results show an effective improvement of the proposed method in terms of SNR and human listening test results. The proposed method can be used to reconstruct the lost high frequency component to improve the perceptual quality of audio independent of the compression method.

  • Critical Band Subspace-Based Speech Enhancement Using SNR and Auditory Masking Aware Technique

    Jia-Ching WANG  Hsiao-Ping LEE  Jhing-Fa WANG  Chung-Hsien YANG  

     
    PAPER-Speech and Hearing

    Vol: E90-D  No: 7  Page(s): 1055-1062

    In this paper, a new subspace-based speech enhancement algorithm is presented. First, we construct a perceptual filterbank from a psychoacoustic model and incorporate it in the subspace-based enhancement approach. This filterbank is created through a five-level wavelet packet decomposition. The masking properties of the human auditory system are then derived based on the perceptual filterbank. Finally, the prior SNR and the masking threshold of each critical band are used to decide the attenuation factor of the optimal linear estimator. Five different types of in-car noise in the TAICAR database were used in our evaluation. The experimental results demonstrated that our approach outperformed conventional subspace and spectral subtraction methods.

  • Single Channel Speech Enhancement Based on Perceptual Frequency-Weighting

    Seiji HAYASHI  Masahiro SUGUIMOTO  

     
    LETTER-Speech and Hearing

    Vol: E90-D  No: 6  Page(s): 998-1001

    The present paper describes a quality enhancement of speech corrupted by additive background noise in a single channel system. The proposed approach is based on the introduction of perceptual criteria using a frequency-weighting filter in a subtractive-type enhancement process. This newly developed algorithm allows for an automatic adaptation in the time and frequency of the enhancement system and finds a suitable noise estimate according to the frequency of the corrupted speech. Experimental results show that the proposed approach can efficiently remove additive noise related to various types of noise corruption.

  • Dithered Subband Coding with Spectral Subtraction

    Chatree BUDSABATHON  Akinori NISHIHARA  

     
    PAPER-Digital Signal Processing

    Vol: E89-A  No: 6  Page(s): 1788-1793

    In this paper, we propose a combination-based novel technique of dithered subband coding with spectral subtraction for improving the perceptual quality of coded audio at low bit rates. It is well known that signal-correlated distortion is audible when the audio signal is quantized at bit rates lower than the lower bound of perceptual coding. We show that this problem can be overcome by applying the dithering quantization process in each subband. Consequently, the quantization noise is rendered into a signal-independent white noise; this noise is then estimated and removed by spectral subtraction at the decoder. Experimental results show an effective improvement by the proposed method over the conventional one in terms of better SNR and human listening test results. The proposed method can be combined with other existing or future coding methods such as perceptual coding to improve their performance at low bit rates.
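
The idea that dithering makes quantization noise independent of the signal can be illustrated with a uniform quantizer and subtractive dither. This is a minimal sketch under stated assumptions: the abstract does not say whether the dither is subtractive or non-subtractive, and the function name, the uniform dither distribution and the seeded generator are illustrative choices rather than details from the paper.

```python
import random


def dithered_quantize(x: float, step: float, rng: random.Random) -> float:
    """Quantize x with step size `step` using subtractive uniform dither."""
    if step <= 0.0:
        raise ValueError("step must be positive")
    # Dither drawn uniformly over one quantization interval.
    d = rng.uniform(-step / 2.0, step / 2.0)
    q = step * round((x + d) / step)
    # Subtracting the same dither leaves an error bounded by step / 2.
    return q - d


if __name__ == "__main__":
    rng = random.Random(0)
    step = 0.25
    for x in (0.1, 0.1, 0.1, 0.7):
        y = dithered_quantize(x, step, rng)
        print(f"x = {x:.2f} -> y = {y:.4f}, error = {y - x:+.4f}")
```

Repeated quantization of the same input gives different errors, so the error no longer tracks the signal. That property is what allows the noise to be treated as a noise floor and estimated at the decoder.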

  • An Explicit-Form Gain Factor for Speech Enhancement Using Spectral-Domain-Constrained Approach

    Ching-Ta LU  Hsiao-Chuan WANG  

     
    PAPER-Speech and Hearing

    Vol: E89-D  No: 3  Page(s): 1195-1202

    Employing the noise masking threshold (NMT) to adapt a speech enhancement system has become popular due to the advantage of rendering the residual noise perceptually white. Most methods employ the NMT to empirically adjust the parameters of a speech enhancement system according to the various properties of noise. In this article, without any predefined empirical factor, an explicit-form gain factor for a frequency bin is derived by perceptually constraining the residual noise below the NMT in the spectral domain. This perceptual constraint preserves the spectrum of noisy speech when the level of residual noise is less than the NMT. If the level of residual noise exceeds the NMT, then the spectrum of noisy speech is suppressed to reduce the corrupting noise. Experimental results show that the proposed approach can efficiently remove the added noise under various noise corruptions and is almost free from musical residual noise.
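
The behaviour described above (leave a bin untouched while the noise stays under the masking threshold, attenuate it otherwise) can be sketched with a simple rule that scales the noise power down to the threshold. This is only an illustration of the constraint. The explicit-form gain derived in the paper is not given in the abstract, and the function name and the power-ratio rule below are assumptions.

```python
import math


def nmt_constrained_gain(noise_power: float, nmt: float) -> float:
    """Illustrative gain that keeps residual noise power at or below the NMT."""
    if noise_power < 0.0 or nmt < 0.0:
        raise ValueError("powers must be non-negative")
    if noise_power <= nmt:
        # Noise is already masked: preserve the noisy spectrum.
        return 1.0
    # Otherwise attenuate so that gain**2 * noise_power equals the NMT.
    return math.sqrt(nmt / noise_power)


if __name__ == "__main__":
    print(nmt_constrained_gain(noise_power=0.5, nmt=1.0))
    print(nmt_constrained_gain(noise_power=4.0, nmt=1.0))
```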

  • Multimedia Quality Prediction Methodologies for Advanced Mobile and IP-Based Telephony [Open Access]

    Nobuhiko KITAWAKI  

     
    INVITED PAPER

    Vol: E89-B  No: 2  Page(s): 262-272

    This paper describes the author's perspective on multimedia quality prediction methodologies for multimedia communications in advanced mobile and internet protocol (IP)-based telephony, and reports related experiments and trials. First, the paper describes the need for perceptual QoS (Quality of Service) assessment in which various quality factors in multimedia communications for advanced mobile and IP-based telephony are analyzed. Then an objective quality prediction scheme is proposed from the viewpoints of quality measurement tools for each quality factor and an opinion model for compound quality factors in mobile and IP-based communications networks. Finally, the author's current trials of measurement tools and opinion models are described.

  • Designing Target Cost Function Based on Prosody of Speech Database

    Kazuki ADACHI  Tomoki TODA  Hiromichi KAWANAMI  Hiroshi SARUWATARI  Kiyohiro SHIKANO  

     
    PAPER-Speech Synthesis and Prosody

    Vol: E88-D  No: 3  Page(s): 519-524

    This research aims to construct a high-quality Japanese TTS (Text-to-Speech) system that has high flexibility in treating prosody. Many TTS systems have implemented a prosody control system but such systems have been fundamentally designed to output speech with a standard pitch and speech rate. In this study, we employ a unit selection-concatenation method and also introduce an analysis-synthesis process to provide precisely controlled prosody in output speech. Speech quality degrades in proportion to the amount of prosody modification, therefore a target cost for prosody is set to evaluate prosodic difference between target prosody and speech candidates in such a unit selection system. However, the conventional cost ignores the original prosody of speech segments, although it is assumed that the quality deterioration tendency varies in relation to the pitch or speech rate of original speech. In this paper, we propose a novel cost function design based on the prosody of speech segments. First, we recorded nine databases of Japanese speech with different prosodic characteristics. Then with respect to the speech databases, we investigated the relationships between the amount of prosody modification and the perceptual degradation. The results indicate that the tendency of perceptual degradation differs according to the prosodic features of the original speech. On the basis of these results, we propose a new cost function design, which changes a cost function according to the prosody of a speech database. Results of preference testing of synthetic speech show that the proposed cost functions generate speech of higher quality than the conventional method.

  • Judgment Biases of Temporal Order during Apparent Self-Motion

    Wataru TERAMOTO  Hiroshi WATANABE  Hiroyuki UMEMURA  Katsunori MATSUOKA  Shinichi KITA  

     
    PAPER

    Vol: E87-D  No: 6  Page(s): 1466-1476

    Virtual reality systems are among the most useful tools for investigating the characteristics of human perception in dynamic visual environments because the parameters of three-dimensional visual stimuli can be manipulated easily and appropriately according to the purpose of the study. In the present study we examined how the brain processes local stimuli during the global sensation of self-motion (vection) in terms of temporal information processing -- perceptual latency -- using a temporal order judgment task. In Experiment 1 we demonstrated that the targets in the left visual field were perceived prior to those in the right visual field when an observer stared at rightward optokinetic stimuli or perceived self-motion leftward, and vice versa. Especially at 16.0 deg of target eccentricity the biases were much larger with the continuous exposure of optokinetic stimuli than with their intermittent exposure; the former compelled observers to perceive self-motion and the latter hardly did. In Experiment 2 we examined the relationship between the occurrence of vection and temporal order judgments with the exposure duration of optokinetic stimuli fixed between conditions, and showed that the biases were larger when vection occurred than when it did not. In Experiment 3 we showed that the biases were not modulated by the speed of optokinetic stimuli and were not related to the speed of perceived self-motion. This phenomenon can be explained based on exogenous components of attention, the shift of the reference frame for determining the order in which objects come into awareness and imbalance between hemispheric activities. The mechanism is ecologically reasonable in that it allows us to be aware of the incoming events as soon as possible and to avoid any dangerous situations.

  • Color Transfer between Images Based on Basic Color Category

    Youngha CHANG  Suguru SAITO  Masayuki NAKAJIMA  

     
    PAPER-Image Processing, Image Pattern Recognition

    Vol: E86-D  No: 12  Page(s): 2780-2785

    Usually, paintings are more appealing than photographic images. This is because paintings can incorporate styles based on the artist's subjective view of motif. This style can be distinguished by looking at elements such as motif, color, shape deformation and brush texture. In our work, we focus on the effect of "color" element and devise a method for transforming the color of an input photograph according to a reference painting. To do this, we consider basic color category concepts in the color transformation process. We assume that color transformations from one basic color category to another may cause peculiar feelings. Therefore, we restrict each color transformation within the same basic color category. For this, our algorithm first categorizes each pixel color of a photograph into one of eleven basic color categories. Next, for every pixel color of the photograph, the algorithm finds its corresponding color in the same category of a reference painting. Finally, the algorithm substitutes the pixel color with its corresponding color. In this way, we achieve large but natural color transformations of an image.

  • Comparative Assessment of Test Signals Used for Measuring Residual Echo Characteristics

    Nobuhiko KITAWAKI  Takeshi YAMADA  Futoshi ASANO  

     
    PAPER-Network

    Vol: E86-B  No: 3  Page(s): 1102-1108

    Appropriate test signals defined by formula or generated by algorithm are used for measuring objective QoS (Quality of Service) for voice-operated telecommunication devices such as telephones and speech codecs (coder-decoders). However, the test signal for measuring residual echo characteristics in hands-free telecommunications equipped with an acoustic echo canceller is under study in ITU-T Recommendation G.167. This paper describes a comparative assessment of test signals for the measurement of residual echo characteristics. In hands-free telecommunications, the acoustic echo canceller has been developed to remove a room echo signal passing from the loudspeaker to the microphone at the receiving end. The performance of the echo canceller system is evaluated by residual echo characteristics expressed in echo return loss enhancement (ERLE). The ERLE is conventionally measured by putting white noise into the echo canceller system. However, white noise is not adequate as the test signal for measuring the performance of the echo canceller, since the performance may depend on the characteristics of the input test signal, and the characteristics of white noise differ from those of real voice. Therefore, this paper discusses the characteristics of real voice required for objective quality evaluation of an echo canceller system. The test signals used for the verification tests were real voice (RV), white noise (WN), frequency weighted noise (FWN), artificial voice (AV), and composite source signal (CSS), which differ in how closely they approximate real voice characteristics. The comparative assessment shows that the ERLE characteristics measured with artificial voice conforming to ITU-T Recommendation P.50, which has the average characteristics of real voices in the time and frequency domains, are almost equivalent to those of real voice and the best among the test signals. It is concluded that the P.50 artificial voice is satisfactory for the measurement of residual echo characteristics.
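
ERLE, the metric used throughout the abstract above, is commonly computed as the ratio in decibels of the echo power at the microphone to the residual echo power after cancellation. The sketch below follows that common definition over one block of samples. The function name and the block-average power estimate are illustrative choices, not the measurement procedure of the paper or of any ITU-T Recommendation.

```python
import math
from typing import Sequence


def erle_db(echo: Sequence[float], residual: Sequence[float]) -> float:
    """Echo return loss enhancement in dB over one block of samples."""
    if not echo or not residual:
        raise ValueError("both signals must be non-empty")
    p_echo = sum(v * v for v in echo) / len(echo)
    p_res = sum(v * v for v in residual) / len(residual)
    if p_echo == 0.0:
        raise ValueError("echo signal has zero power")
    if p_res == 0.0:
        return math.inf  # perfect cancellation
    return 10.0 * math.log10(p_echo / p_res)


if __name__ == "__main__":
    echo = [1.0, -1.0, 0.5, -0.5]
    residual = [0.1, -0.1, 0.05, -0.05]
    print(f"ERLE = {erle_db(echo, residual):.1f} dB")
```

Since the value depends on the signal driving the canceller, the same canceller can score differently under white noise and under speech-like input, which is the motivation given in the abstract for comparing test signals.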

  • Nonlinear Attractive Force Model for Perceptual Clustering and Geometrical Illusions

    Hiroyuki MATSUNAGA  Kiichi URAHAMA  

     
    PAPER-Neural Nets and Human Being

    Vol: E79-A  No: 10  Page(s): 1587-1594

    A mathematical model based on an optimization formulation is presented for perceptual clustering of dot patterns. The features of the present model are its nonlinearity, which enables the model to reveal hysteresis phenomena, and its scale invariance. The clustering of dots is given by the mutual linking of dots by virtual lines. Every dot is assumed to be perceived at a location displaced from its original place. Simulations show that the model can produce a hierarchical clustering of dots by varying the thresholds for the wiring of virtual lines, and that the model can additionally reproduce some geometrical illusions semiquantitatively. This model is further extended for perceptual grouping in line segment patterns, and geometrical illusions observed in those patterns are reproduced by the extended model.
