
Author Search Result

[Author] Masahiro SERIZAWA (8 hits)

  • A Fast Method of Calculating High-Order Backward LP Coefficients for Wideband CELP Coders

    Masahiro SERIZAWA  Kazunori OZAWA  Atsushi MURASHIMA  

     
    PAPER-Speech and Hearing

    Vol: E83-D  No: 4  Page(s): 870-875

    This paper proposes a fast method of calculating high-order backward Linear Prediction (LP) coefficients for wideband Code Excited LP (CELP) coders operating at around 16 kbit/s. The fast calculation is achieved by a recursive calculation of the high-order autocorrelation of the decoded signal. The recursive calculation can be employed thanks to a novel method of converting the autocorrelation of the decoded signal to that of the residual signal. High-order backward LP coefficients are computed from the autocorrelation of the residual signal using the Levinson-Durbin (LD) procedure. The conversion approximately performs inverse filtering using LP coefficients representing a corresponding envelope spectrum. Due to the recursive calculation, the proposed method achieves a 30% to 45% reduction in the computations needed to calculate the high-order backward LP coefficients compared to the conventional method. Subjective tests show that a wideband Multi-Pulse based CELP (MP-CELP) coder at 16 kbit/s with the proposed method achieves coding quality comparable to that with the conventional method, with a 35% reduction in the computations needed to calculate the backward LP coefficients.
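
The Levinson-Durbin step mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the textbook recursion only, not the paper's fast autocorrelation conversion; the function name `levinson_durbin` and the sign convention (prediction-error filter with a[0] = 1) are assumptions made for this example.

```python
def levinson_durbin(r, order):
    """Textbook Levinson-Durbin recursion (illustrative sketch).

    r     : autocorrelation values r[0], ..., r[order]
    order : LP order
    Returns (a, err), where a[0] == 1.0 and the prediction-error filter
    is A(z) = sum_k a[k] * z**-k, and err is the residual energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate input; stop rather than divide by zero
        # Correlation of the current predictor with lag i
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


# Autocorrelation of a first-order process with pole 0.9 (toy input)
coeffs, energy = levinson_durbin([1.0, 0.9, 0.81], order=2)
```

For this toy input the recursion finds a first-order predictor and leaves the second coefficient at approximately zero.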

  • Video-Quality Estimation Based on Reduced-Reference Model Employing Activity-Difference

    Toru YAMADA  Yoshihiro MIYAMOTO  Yuzo SENDA  Masahiro SERIZAWA  

     
    PAPER-Evaluation

    Vol: E92-A  No: 12  Page(s): 3284-3290

    This paper presents a reduced-reference video-quality estimation method suitable for individual end-user quality monitoring of IPTV services. With the proposed method, the activity values for individual given-size pixel blocks of an original video are transmitted to end-user terminals. At the end-user terminals, the video quality of a received video is estimated on the basis of the activity-difference between the original video and the received video. Psychovisual weightings and video-quality score adjustments for fatal degradations are applied to improve estimation accuracy. In addition, low-bit-rate transmission is achieved by using temporal sub-sampling and by transmitting only the lower six bits of each activity value. The proposed method achieves accurate video-quality estimation using only low-bit-rate original video information (15 kbps for SDTV). The correlation coefficient between actual subjective video quality and estimated quality is 0.901 with 15 kbps side information. The proposed method does not need computationally demanding spatial and gain-and-offset registrations. Therefore, it is suitable for real-time video-quality monitoring in IPTV services.

  • Reduced-Reference Video Quality Estimation Using Representative Luminance

    Toru YAMADA  Yoshihiro MIYAMOTO  Masahiro SERIZAWA  Takao NISHITANI  

     
    PAPER-Measurement Technology

    Vol: E95-A  No: 5  Page(s): 961-968

    This paper proposes a video-quality estimation method based on a reduced-reference model for realtime quality monitoring in video streaming services. The proposed method chooses representative-luminance values for individual original-video frames at a server side and transmits those values, along with the pixel-position information of the representative-luminance values in each frame. On the basis of this information, peak signal-to-noise ratio (PSNR) values at client sides can be estimated. This enables realtime monitoring of video-quality degradation caused by transmission errors. Experimental results show that accurate PSNR estimation can be achieved with additional information at a low bit rate. For SDTV video sequences that are encoded at 1 to 5 Mbps, accurate PSNR estimation (correlation coefficient of 0.92 to 0.95) is achieved with a small amount of additional information of 10 to 50 kbps. This enables accurate realtime quality monitoring in video streaming services without average video-quality degradation.
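
For reference, the PSNR that the method above estimates has a standard full-reference definition, sketched below. This is not the paper's reduced-reference estimator; the function name `psnr`, the flat list-of-luminance-samples input, and the 8-bit peak value of 255 are assumptions for illustration.

```python
import math


def psnr(reference, degraded, peak=255.0):
    """Full-reference PSNR in dB between two equal-length sample lists.

    peak is the maximum sample value (255 assumed for 8-bit luminance).
    """
    if len(reference) != len(degraded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples
    mse = sum((a - b) ** 2 for a, b in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / mse)


# Toy 4-sample "frames": one sample differs by 10 levels
value_db = psnr([16, 128, 200, 235], [16, 128, 210, 235])
```

A reduced-reference scheme such as the one in the abstract approximates this quantity at the client without access to the full original frame.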

  • Noise Suppression with High Speech Quality Based on Weighted Noise Estimation and MMSE STSA

    Masanori KATO  Akihiko SUGIYAMA  Masahiro SERIZAWA  

     
    PAPER-Digital Signal Processing

    Vol: E85-A  No: 7  Page(s): 1710-1718

    A noise suppression algorithm with high speech quality based on weighted noise estimation and MMSE STSA is proposed. The proposed algorithm continuously updates the estimated noise using noisy speech weighted in accordance with an estimated SNR. The spectral gain is modified with the estimated SNR so that it can better utilize the improvement in noise estimation. With a better noise estimate, a more accurate SNR is obtained, resulting in enhanced speech with low distortion. Subjective evaluation results show that five-grade mean opinion scores of the new algorithm with and without a speech codec are improved by as much as 0.35 and 0.40, respectively, compared with either the original MMSE STSA or the EVRC noise suppression algorithm.

  • A Packet Loss Recovery Method Using Packets Arrived behind the Playout Time for CELP Decoding

    Masahiro SERIZAWA  Hironori ITO  

     
    PAPER-Speech and Hearing

    Vol: E86-D  No: 12  Page(s): 2775-2779

    This paper proposes a packet loss recovery method using packets that arrive after the playout time for CELP (Code Excited Linear Prediction) decoding. The proposed method recovers synchronization of the filter states between encoding and decoding in the period following packet loss. The recovery is performed by replacing the degraded filter states with the ones calculated from the late-arriving packet in decoding. When the proposed method is applied to the AMR (Adaptive Multi-Rate) speech decoder, it improves the segmental SNR (Signal-to-Noise Ratio) by 0.2 to 1.8 dB at packet loss rates of 2 to 10% when all the packet losses are due to late arrival. PESQ (Perceptual Evaluation of Speech Quality) results also show that the proposed method slightly improves the speech quality. The subjective test results show that five-grade mean opinion scores are improved by 0.35 and 0.28 at a packet loss rate of 5% at speech coding bitrates of 7.95 and 12.2 kbit/s, respectively.
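
The segmental SNR figure quoted above is an average of per-frame SNRs. A minimal sketch of one common way to compute it follows; the function name `segmental_snr`, the 160-sample frame length, and the clamping range of -10 to 35 dB are assumptions for illustration, since the abstract does not state the paper's exact settings.

```python
import math


def segmental_snr(clean, decoded, frame_len=160, floor_db=-10.0, ceil_db=35.0):
    """Average of per-frame SNRs in dB (illustrative sketch).

    Frames with zero reference energy are skipped; each frame SNR is
    clamped to [floor_db, ceil_db], an assumed common convention.
    """
    n = min(len(clean), len(decoded))
    frame_snrs = []
    for start in range(0, n - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(x * x for x in ref)
        noise = sum((x - y) ** 2 for x, y in zip(ref, dec))
        if signal == 0.0:
            continue  # silent reference frame carries no SNR information
        snr = ceil_db if noise == 0.0 else 10.0 * math.log10(signal / noise)
        frame_snrs.append(min(max(snr, floor_db), ceil_db))
    if not frame_snrs:
        raise ValueError("no usable frames")
    return sum(frame_snrs) / len(frame_snrs)


# Two toy frames of two samples each, with a constant error of 0.1
result_db = segmental_snr([1.0] * 4, [0.9] * 4, frame_len=2)
```

Because each frame contributes equally, a short burst of badly decoded frames after a packet loss lowers this measure more visibly than it would a single whole-signal SNR.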

  • A Silence Compression Algorithm for the Multi-Rate Dual-Bandwidth MPEG-4 CELP Standard

    Masahiro SERIZAWA  Hironori ITO  Toshiyuki NOMURA  

     
    PAPER-Speech and Audio Coding

    Vol: E86-D  No: 3  Page(s): 412-417

    This paper proposes a silence compression algorithm operating at multi-rates (MR) and with dual-bandwidths (DB), a narrowband and a wideband, for the MPEG (Moving Picture Experts Group)-4 CELP (Code Excited Linear Prediction) standard. The MR/DB operations are implemented by a Variable-Frame-size/Dual-Bandwidth Voice Activity Detection (VF/DB-VAD) module with bandwidth conversions of the input signal, and a Variable-Frame-size Comfort Noise Generator (VF-CNG) module. The CNG module adaptively smooths the Root Mean Square (RMS) value of the input signal to improve the coding quality during transition periods. The algorithm also employs a Dual-Rate Discontinuous Transmission (DR-DTX) module to reduce the average transmission bitrate during silence periods. Subjective test results show that the proposed silence compression algorithm gives no degradation in coding quality for clean and noisy speech signals. These signals include about 20 to 30% non-speech frames, and the average transmission bitrates are reduced by 20 to 40%. The proposed algorithm has been adopted as a part of the ISO/IEC MPEG-4 CELP version 2 standard.

  • M-LCELP Speech Coding at 4kb/s with Multi-Mode and Multi-Codebook

    Kazunori OZAWA  Masahiro SERIZAWA  Toshiki MIYANO  Toshiyuki NOMURA  Masao IKEKAWA  Shin-ichi TAUMI  

     
    PAPER

    Vol: E77-B  No: 9  Page(s): 1114-1121

    This paper presents the M-LCELP (Multi-mode Learned Code Excited LPC) speech coder, which has been developed for the next-generation half-rate digital cellular telephone systems. M-LCELP develops the following techniques to achieve high-quality synthetic speech at 4kb/s with practically reasonable computation and memory requirements: (1) Multi-mode and multi-codebook coding to improve coding efficiency, (2) Pitch lag differential coding with pitch tracking to reduce lag transmission rate, (3) A two-stage joint design regular-pulse codebook with common phase structure in voiced frames, to drastically reduce computation and memory requirements, (4) An efficient vector quantization for LSP parameters, (5) An adaptive MA type comb filter to suppress excitation signal inter-harmonic noise. The MOS subjective test results demonstrate that 4.075kb/s M-LCELP synthetic speech quality is almost equivalent to that of a North American full-rate standard VSELP coder. The M-LCELP codec requires 18 MOPS of computation. The codec has been implemented using 2 floating-point DSP chips.

  • 4 kbps Improved Pitch Prediction CELP Speech Coding with 20 msec Frame

    Masahiro SERIZAWA  Kazunori OZAWA  

     
    PAPER

    Vol: E78-D  No: 6  Page(s): 758-763

    This paper proposes a new pitch prediction method for 4 kbps CELP (Code Excited LPC) speech coding with a 20 msec frame, for the future ITU-T 4 kbps speech coding standardization. In conventional CELP speech coding, synthetic speech quality deteriorates rapidly at 4 kbps, especially for female and children's speech with a short pitch period. The pitch prediction performance is significantly degraded for such speech. The main reason is that when the pitch period is shorter than the subframe length, a simple repetition of the past excitation signal based on the estimated lag, rather than true pitch prediction, is usually carried out in the adaptive codebook operation. The proposed pitch prediction method can carry out the pitch prediction without the above approximation by utilizing the current subframe excitation codevector signal when the pitch prediction parameters are determined. To further improve the performance, a split vector synthesis and perceptually spectral weighting method, and a low-complexity perceptually harmonic and spectral weighting method have also been developed. The informal listening test result shows that the 4 kbps speech coder with a 20 msec frame, utilizing all of the proposed improvements, achieves 0.2 MOS higher results than the coder without them.
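
The conventional approximation described in the abstract above, repeating the past excitation when the pitch lag is shorter than the subframe, can be sketched as follows. This illustrates only the conventional adaptive codebook behavior, not the proposed method; the function name `adaptive_codebook_vector` and the integer-lag, unit-gain simplification are assumptions for this example.

```python
def adaptive_codebook_vector(past_excitation, lag, subframe_len):
    """Build an adaptive codebook vector by lag-based repetition (sketch).

    For lag < subframe_len, samples beyond the first `lag` positions are
    copied from samples generated earlier in the same subframe, i.e. the
    last `lag` samples of the past excitation are repeated periodically.
    """
    if lag <= 0 or lag > len(past_excitation):
        raise ValueError("lag must be in 1..len(past_excitation)")
    buf = list(past_excitation)
    start = len(buf)
    for n in range(subframe_len):
        # Index start + n - lag may point into the part just appended
        buf.append(buf[start + n - lag])
    return buf[start:]


# Lag of 2 with a 5-sample subframe: the last two samples repeat
vec = adaptive_codebook_vector([1, 2, 3, 4, 5], lag=2, subframe_len=5)
# vec == [4, 5, 4, 5, 4]
```

The repeated segment ignores the fixed-codebook contribution of the current subframe, which is the approximation the abstract says the proposed method avoids.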