Author Search Result

[Author] Naofumi AOKI (10 hits)

  • A Band Extension Technique for G.711 Speech Using Steganography

    Naofumi AOKI

    LETTER-Network

    Vol: E89-B, No: 6, Page(s): 1896-1898

    This study investigates a band extension technique for speech data encoded with G.711, the most common codec for digital speech communication systems such as VoIP. The proposed technique employs steganography to transmit the side information required for band extension. Because the side information is hidden inside the speech data itself, the proposed technique can enhance speech quality without increasing the amount of transmitted data. The results of a subjective experiment indicate that the proposed technique may improve speech quality compared with the conventional technique.
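
    The abstract does not say how the side information is hidden. A minimal sketch, assuming plain least-significant-bit (LSB) substitution in G.711 mu-law code words, shows why the bit rate stays unchanged: the payload replaces bits that are transmitted anyway. The side-information value (a hypothetical 8-bit high-band gain index) and the helper names are illustrative only; the decoder follows the widely used Sun g711.c formulation of mu-law expansion.

```python
# Hypothetical sketch: LSB substitution in G.711 mu-law code words.
# The paper's actual embedding rule is not given in the abstract.

def ulaw_decode(code):
    """Expand one 8-bit G.711 mu-law code word to a linear sample."""
    u = ~code & 0xFF                      # code words are transmitted bit-inverted
    t = ((u & 0x0F) << 3) + 0x84          # mantissa plus bias (0x84 = 132)
    t <<= (u & 0x70) >> 4                 # scale by the segment (exponent)
    return (0x84 - t) if (u & 0x80) else (t - 0x84)


def embed_lsb(codes, bits):
    """Overwrite the LSB of the first len(bits) code words with the payload."""
    if len(bits) > len(codes):
        raise ValueError("payload longer than cover")
    out = list(codes)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | (bit & 1)
    return out


def extract_lsb(codes, n_bits):
    """Read the payload back from the LSBs at the receiver."""
    return [c & 1 for c in codes[:n_bits]]


def to_bits(value, width):
    """MSB-first binary expansion of a (hypothetical) side-information index."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


if __name__ == "__main__":
    cover = [0xFF, 0x80, 0x23, 0x9A, 0x5C, 0xE7, 0x01, 0x7E]
    stego = embed_lsb(cover, to_bits(0b10110010, 8))
    for c, s in zip(cover, stego):
        print(f"{c:#04x} -> {s:#04x}: {ulaw_decode(c)} -> {ulaw_decode(s)}")
```

    Flipping the code-word LSB moves the sample by one quantization step of its segment (at most 1024 on the 16-bit scale used here), which is the kind of small, speech-shaped distortion that makes LSB hiding attractive for a sketch like this.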

  • High Quality Speech Synthesis Based on the Reproduction of the Randomness in Speech Signals

    Naofumi AOKI

    PAPER-Image & Signal Processing

    Vol: E84-A, No: 9, Page(s): 2198-2206

    A high-quality speech synthesis technique based on wavelet subband analysis of speech signals was devised to enhance the naturalness of synthesized voiced consonants. The technique reproduces a characteristic of voiced consonant speech: a pronounced unvoiced (noise-like) feature in the high-frequency subbands. To mix this unvoiced feature appropriately into voiced speech, a noise inclusion procedure employing the discrete wavelet transform was proposed. This paper also describes a speech synthesizer that employs several random fractal techniques, used mainly to enhance the naturalness of synthesized purely voiced speech. The synthesizer treats three types of fluctuation: (1) pitch period fluctuation, (2) amplitude fluctuation, and (3) waveform fluctuation. In addition, instead of a conventional impulse train, a triangular pulse was used as a simple model of the glottal excitation pulse. Because the spectrum of the triangular pulse falls off more steeply than the -6 dB/oct characteristic required of the glottal excitation pulse, the random fractal interpolation technique was applied to compensate for this degraded frequency characteristic. Psychoacoustic experiments were carried out to evaluate the developed speech synthesis system, focusing especially on how effectively the mixed excitation scheme enhanced the naturalness of voiced consonant speech. Although the proposed techniques are only small modifications to the conventional LPC (linear predictive coding) speech synthesizer, the subjective evaluation suggested that the system could effectively improve the naturalness of synthesized speech, which tends to be degraded in the conventional LPC speech synthesis scheme.
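
    Two ingredients named above, a triangular glottal pulse train and wavelet-domain noise inclusion, can be sketched as follows. This is an illustration under stated assumptions, not the paper's synthesizer: a one-level Haar DWT stands in for whatever wavelet and subband depth the paper used, and the pulse's rise/fall fractions and the `mix` ratio are hypothetical parameters.

```python
import numpy as np


def triangular_pulse_train(n_samples, period, rise=0.4, fall=0.16):
    """Asymmetric triangular glottal pulses, one per pitch period.

    rise/fall are hypothetical fractions of the period (opening/closing phase).
    """
    pulse = np.zeros(period)
    r, f = int(rise * period), int(fall * period)
    pulse[:r] = np.linspace(0.0, 1.0, r, endpoint=False)        # opening phase
    pulse[r:r + f] = np.linspace(1.0, 0.0, f, endpoint=False)   # closing phase
    reps = -(-n_samples // period)                               # ceiling division
    return np.tile(pulse, reps)[:n_samples]


def haar_dwt(x):
    """One-level orthonormal Haar DWT (len(x) must be even)."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation: lower half-band
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail: upper half-band
    return a, d


def haar_idwt(a, d):
    """Inverse of haar_dwt (perfect reconstruction)."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x


def include_noise(voiced, mix, rng):
    """Blend Gaussian noise into the detail subband only.

    mix=0 leaves the signal untouched; mix=1 replaces the detail subband with
    noise of the same RMS. The square-root weights keep the subband energy
    roughly constant because signal and noise are uncorrelated.
    """
    a, d = haar_dwt(voiced)
    rms = np.sqrt(np.mean(d ** 2))
    noise = rng.standard_normal(len(d)) * rms
    return haar_idwt(a, np.sqrt(1.0 - mix) * d + np.sqrt(mix) * noise)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    excitation = triangular_pulse_train(800, 80)     # 80-sample pitch period
    mixed = include_noise(excitation, 0.5, rng)      # voiced low band, noisy high band
    print(mixed[:8])
```

    Touching only the detail coefficients keeps the low-frequency, clearly periodic part of the excitation intact while making the upper band noise-like, which is the mixed-excitation behaviour the abstract describes for voiced consonants.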

  • Some Evaluations on a Digital Watermarking Technique for Music Data Using Distortion Effect

    Yuto MATSUNAGA, Tetsuya KOJIMA, Naofumi AOKI, Yoshinori DOBASHI, Tsuyoshi YAMAMOTO

    PAPER-Information Network

    Publicized: 2019/03/13
    Vol: E102-D, No: 6, Page(s): 1119-1125

    We have proposed a novel concept of a digital watermarking technique for music data that focuses on the use of sound synthesis and sound effect techniques. This paper describes the details of our proposed technique, which employs the distortion effect, one of the most common sound effects, used especially for guitar and bass. It also presents experimental results evaluating the resistance of the proposed technique against several basic malicious attacks: MP3 coding, tempo alteration, pitch alteration, and high-pass filtering. The results demonstrate that the proposed technique potentially has adequate resistance against these attacks, except for the high-pass filtering attack. A technique for increasing the resistance against the high-pass filtering attack is also discussed.
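
    The abstract does not describe the embedding rule, so the sketch below only illustrates the distortion effect the technique builds on. It assumes a tanh soft-clipping curve (one common way to model guitar distortion) and measures the harmonics it adds to a pure tone; the function names and drive values are hypothetical.

```python
import numpy as np


def soft_clip(x, drive):
    """tanh soft-clipping distortion, normalised so that |x| = 1 maps to 1."""
    return np.tanh(drive * x) / np.tanh(drive)


def harmonic_levels(y, fs, f0, count=5):
    """Amplitudes of harmonics 1..count of f0 (assumes f0 falls exactly on an FFT bin)."""
    spec = 2.0 * np.abs(np.fft.rfft(y)) / len(y)   # scale so a unit sine reads 1.0
    bin_hz = fs / len(y)
    return [spec[int(round(k * f0 / bin_hz))] for k in range(1, count + 1)]


if __name__ == "__main__":
    fs, f0 = 8000, 100
    t = np.arange(fs) / fs                      # one second, so bins are 1 Hz apart
    clean = 0.8 * np.sin(2 * np.pi * f0 * t)
    for drive in (0.5, 5.0):
        levels = harmonic_levels(soft_clip(clean, drive), fs, f0)
        print(drive, np.round(levels, 3))
```

    Because the tanh curve is odd-symmetric, only odd harmonics appear, and their level grows with the drive setting. A watermarking scheme could in principle key information to such drive-dependent spectral changes, but how the paper actually does this is not stated here.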

  • A Band Extension Technique for Narrow-Band Telephony Speech Based on Full Wave Rectification

    Naofumi AOKI

    LETTER-Network

    Vol: E93-B, No: 3, Page(s): 729-731

    This study investigates a band extension technique for narrow-band telephony speech. The proposed technique employs full-wave rectification, which nonlinearly generates high-band overtones from the low band. To improve on the conventional technique, this study introduces frame-by-frame gain control based on estimating the gain parameter from the narrow-band telephony speech. A subjective evaluation indicates that the proposed technique outperforms the conventional technique.
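
    The processing chain can be sketched in a few lines: upsample the 8 kHz narrow-band signal to 16 kHz, full-wave rectify it (absolute value) to create overtones, keep only the 4-8 kHz part, and add it back with a per-frame gain. The paper's gain estimator is not given in the abstract; the rule below (high-band RMS set to a fixed `ratio` of the frame's narrow-band RMS) is a hypothetical stand-in, and FFT-domain resampling and brick-wall filtering are simplifications chosen to keep the sketch NumPy-only.

```python
import numpy as np

FS_NB, FS_WB = 8000, 16000


def upsample2(x):
    """Band-limited 2x upsampling (8 kHz -> 16 kHz) by zero-padding the spectrum."""
    n = len(x)
    spec = np.fft.rfft(x)
    padded = np.zeros(n + 1, dtype=complex)      # rfft size for 2n samples
    padded[:len(spec)] = spec
    return 2.0 * np.fft.irfft(padded, 2 * n)     # factor 2 restores the amplitude


def highpass(x, fs, cutoff):
    """Brick-wall high-pass filter applied in the frequency domain."""
    spec = np.fft.rfft(x)
    spec[np.fft.rfftfreq(len(x), 1.0 / fs) <= cutoff] = 0.0
    return np.fft.irfft(spec, len(x))


def extend_band(nb, frame=160, ratio=0.3, eps=1e-12):
    """Full-wave-rectification band extension with a hypothetical per-frame gain."""
    wb = upsample2(nb)
    high = highpass(np.abs(wb), FS_WB, FS_NB / 2)   # rectify, keep 4-8 kHz overtones
    out = wb.copy()
    step = 2 * frame                                 # frame length at 16 kHz
    for start in range(0, len(wb), step):
        seg = slice(start, start + step)
        target = ratio * np.sqrt(np.mean(wb[seg] ** 2))
        gain = target / (np.sqrt(np.mean(high[seg] ** 2)) + eps)
        out[seg] += gain * high[seg]
    return out


if __name__ == "__main__":
    t = np.arange(1600) / FS_NB
    narrow = np.sin(2 * np.pi * 1200 * t)    # 1.2 kHz tone, nothing above 4 kHz
    wide = extend_band(narrow)
    print(len(wide))
```

    For a 1.2 kHz input, rectification produces even harmonics at 2.4, 4.8 and 7.2 kHz; the high-pass keeps the last two, so the output gains content above 4 kHz that the telephone band had removed.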

  • A Fragile Digital Watermarking Technique by Number Theoretic Transform

    Hideaki TAMORI, Naofumi AOKI, Tsuyoshi YAMAMOTO

    LETTER-Image/Visual Signal Processing

    Vol: E85-A, No: 8, Page(s): 1902-1904

    This paper suggests that a watermarking technique based on the number theoretic transform (NTT) may be effective for detecting alterations in lossless digital master images. Because of its fragility, the NTT-based technique is more sensitive to alterations than one based on the discrete Fourier transform (DFT).
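
    A small sketch shows why an NTT-based mark is fragile: all arithmetic is exact integer arithmetic modulo a prime, so verification can be an equality test with no rounding tolerance, and any change to a pixel shifts the marked coefficient. The modulus 257, block length 8, and the embedding rule (overwrite one coefficient with the mark) are assumptions for illustration; note that marked values fall in 0..256, so a real scheme would have to handle the value 256, which does not fit in 8 bits.

```python
P = 257                              # prime modulus; N must divide P - 1 = 256
N = 8                                # block length (e.g. one row of an 8x8 block)
OMEGA = pow(3, (P - 1) // N, P)      # 3 is a primitive root mod 257, so OMEGA has order N


def ntt(x, omega=OMEGA):
    """Naive O(N^2) number theoretic transform over the integers mod P."""
    return [sum(x[n] * pow(omega, n * k, P) for n in range(N)) % P for k in range(N)]


def intt(spec):
    """Inverse NTT: transform with omega^-1, then scale by N^-1 (mod P)."""
    inv_omega = pow(OMEGA, P - 2, P)   # modular inverses via Fermat's little theorem
    inv_n = pow(N, P - 2, P)
    return [(v * inv_n) % P for v in ntt(spec, inv_omega)]


def embed(block, k, mark):
    """Hypothetical rule: force NTT coefficient k to the watermark value."""
    spec = ntt(block)
    spec[k] = mark % P
    return intt(spec)


def verify(block, k, mark):
    """Exact check: any alteration that changes coefficient k is detected."""
    return ntt(block)[k] == mark % P


if __name__ == "__main__":
    block = [52, 55, 61, 66, 70, 61, 64, 73]
    marked = embed(block, 3, 123)
    tampered = list(marked)
    tampered[5] = (tampered[5] + 1) % P
    print(verify(marked, 3, 123), verify(tampered, 3, 123))
```

    Changing pixel n by +1 adds OMEGA^(n*k) to coefficient k, and a power of OMEGA is never zero modulo the prime, so even a single-LSB edit anywhere in the block breaks verification. With a floating-point DFT, the detector would instead need a tolerance to absorb rounding, which is what lets small edits slip through.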

  • Improving the Recognition Accuracy of a Sound Communication System Designed with a Neural Network

    Kosei OZEKI, Naofumi AOKI, Saki ANAZAWA, Yoshinori DOBASHI, Kenichi IKEDA, Hiroshi YASUDA

    PAPER-Engineering Acoustics

    Publicized: 2021/05/06
    Vol: E104-A, No: 11, Page(s): 1577-1584

    This study has developed a system that performs data communication using the high-frequency bands of sound signals. Unlike radio communication systems that rely on advanced wireless devices, it requires only legacy devices, such as the microphones and speakers used in ordinary telephony systems. In this study, we investigated a machine learning approach to improving the accuracy of identifying binary symbols exchanged through sound. This paper describes experimental results evaluating the performance of the proposed technique, which employs a neural network as the classifier of binary symbols. The results indicate that the proposed technique may be appropriate for designing an optimal classifier for the symbol identification task.
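
    The idea of learning the symbol decision instead of hand-tuning a threshold can be sketched with simulated data. Everything specific here is hypothetical: the 48 kHz rate, the 10 ms symbols, the 18/19 kHz tones for bits 0 and 1, and the features (FFT magnitude at the two tone bins). The classifier is deliberately minimal, a single logistic neuron trained by gradient descent, standing in for the paper's neural network, whose architecture the abstract does not give.

```python
import numpy as np

FS = 48000             # hypothetical sampling rate (Hz)
SYMBOL_LEN = 480       # hypothetical symbol duration: 10 ms
F0, F1 = 18000, 19000  # hypothetical high-band tones for bits 0 and 1


def make_symbols(bits, rng, noise_std=1.0):
    """Simulate received frames: one tone burst per bit, random phase, white noise."""
    t = np.arange(SYMBOL_LEN) / FS
    frames = [np.sin(2 * np.pi * (F1 if b else F0) * t + rng.uniform(0, 2 * np.pi))
              + noise_std * rng.standard_normal(SYMBOL_LEN) for b in bits]
    return np.array(frames)


def features(frames):
    """Normalised FFT magnitude at the two tone bins (bins 180 and 190 here)."""
    spec = np.abs(np.fft.rfft(frames, axis=1)) / SYMBOL_LEN
    bins = [F0 * SYMBOL_LEN // FS, F1 * SYMBOL_LEN // FS]
    return spec[:, bins]


def train_neuron(x, y, lr=5.0, epochs=500):
    """Single logistic neuron trained by batch gradient descent on cross-entropy."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        err = p - y
        w -= lr * x.T @ err / len(y)
        b -= lr * err.mean()
    return w, b


def predict(x, w, b):
    """Decide bit 1 when the neuron's pre-activation is positive."""
    return (x @ w + b > 0).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_bits = rng.integers(0, 2, 400)
    test_bits = rng.integers(0, 2, 200)
    w, b = train_neuron(features(make_symbols(train_bits, rng)), train_bits)
    acc = np.mean(predict(features(make_symbols(test_bits, rng)), w, b) == test_bits)
    print(f"accuracy: {acc:.3f}")
```

    In real acoustic channels the tone levels vary with the speaker, microphone and room, which is where a trained classifier can outperform a fixed rule; a larger network would replace `train_neuron` without changing the surrounding pipeline.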

  • A Technique of Lossless Steganography for G.711

    Naofumi AOKI

    LETTER-Network

    Vol: E90-B, No: 11, Page(s): 3271-3273

    This study proposes a lossless steganography technique for G.711, the most common codec for digital speech communication systems such as VoIP. The proposed technique exploits characteristics of G.711 to embed steganogram information without degrading the speech. This paper reports the embedding capacity of the proposed technique.
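
    The abstract does not say which G.711 characteristic is exploited. One redundancy that allows truly lossless hiding is that mu-law has two code words for silence, 0xFF ("+0") and 0x7F ("-0"), which both decode to 0; choosing between them carries one bit per zero-valued sample without changing the decoded waveform. The sketch below assumes this redundancy and mu-law coding; it is an illustration, not necessarily the paper's method.

```python
# Assumption: mu-law G.711, where 0xFF and 0x7F both decode to linear 0.
POS_ZERO, NEG_ZERO = 0xFF, 0x7F


def ulaw_decode(code):
    """Expand one 8-bit G.711 mu-law code word to a linear sample."""
    u = ~code & 0xFF
    t = (((u & 0x0F) << 3) + 0x84) << ((u & 0x70) >> 4)
    return (0x84 - t) if (u & 0x80) else (t - 0x84)


def capacity(codes):
    """Bits that can be hidden: one per zero-valued sample."""
    return sum(c in (POS_ZERO, NEG_ZERO) for c in codes)


def embed(codes, bits):
    """Encode bit 1 as -0 and bit 0 as +0; other code words pass through unchanged."""
    if len(bits) > capacity(codes):
        raise ValueError("payload exceeds capacity")
    payload = iter(bits)
    out = []
    for c in codes:
        if c in (POS_ZERO, NEG_ZERO):
            bit = next(payload, None)
            if bit is not None:
                c = NEG_ZERO if bit else POS_ZERO
        out.append(c)
    return out


def extract(codes, n_bits):
    """Read the sign of each zero code word, in order."""
    return [int(c == NEG_ZERO) for c in codes if c in (POS_ZERO, NEG_ZERO)][:n_bits]


if __name__ == "__main__":
    cover = [0xFF, 0x23, 0xFF, 0x7F, 0x9A, 0xFF, 0xFF, 0x80]
    stego = embed(cover, [1, 0, 1, 1])
    print(capacity(cover), extract(stego, 4))
    print([ulaw_decode(c) for c in stego] == [ulaw_decode(c) for c in cover])
```

    Under this assumption the capacity is simply the number of zero-valued samples, so it depends heavily on how much silence or low-level signal a call contains, which is why a capacity measurement is the natural result to report.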

  • Fractal Modeling of Fluctuations in the Steady Part of Sustained Vowels for High Quality Speech Synthesis

    Naofumi AOKI, Tohru IFUKUBE

    PAPER-Chaos, Bifurcation and Fractal

    Vol: E81-A, No: 9, Page(s): 1803-1810

    The naturalness of normal sustained vowels is considered attributable to the fluctuations observed in their steady part, where the speech signal is seemingly almost periodic. Two kinds of involuntary fluctuation always exist in the steady part of sustained vowels, even when the vowels are phonated as steadily as possible: pitch period fluctuation and waveform fluctuation. In this study, frequency analyses of these fluctuations were conducted to investigate their general characteristics. The results suggested that the frequency characteristics of the fluctuations can be approximated as 1/f^β-like, which is regarded as the specific feature of random fractals. Therefore, a procedure based on random fractal generation methods was proposed to produce these fluctuations and thereby improve the voice quality of synthesized sustained vowels. A series of psychoacoustic experiments was also conducted to evaluate the proposed technique. The results indicated that the technique was effective in making synthesized sustained vowels sound human-like. Unlike sustained vowels synthesized with neither pitch period fluctuation nor waveform fluctuation, those containing the fluctuations were not perceived as buzzer-like, which is the major problem in the voice quality of synthesized sustained vowels. However, it was also found that the presence of the fluctuations alone was not always an acoustic cue for naturalness: synthesized sustained vowels containing fluctuations whose frequency characteristics were the same as those of white noise were perceived as noise-like, which is not at all the voice quality of normal sustained vowels.
    The results of the psychoacoustic experiments indicated that the frequency characteristics of the fluctuations, which can be modeled as 1/f^β-like, were significant factors in the naturalness of normal sustained vowels.
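
    A 1/f^β-like fluctuation can be generated with the Fourier filtering method: shape white Gaussian noise so that its amplitude spectrum falls as f^(-β/2), giving a power spectrum proportional to 1/f^β. This is one standard random fractal generator, swapped in here because the abstract does not detail the paper's methods. The pitch-period model (mean period times one plus a scaled fluctuation) and its parameter values are hypothetical. Setting β = 0 yields the white-noise fluctuation that the experiments found to sound noise-like.

```python
import numpy as np


def fractal_noise(n, beta, rng):
    """Zero-mean, unit-variance noise with power spectrum ~ 1/f**beta (Fourier filtering)."""
    spec = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n)
    scale = np.zeros_like(f)
    scale[1:] = f[1:] ** (-beta / 2)       # amplitude ~ f^(-beta/2); DC removed
    x = np.fft.irfft(spec * scale, n)
    return x / x.std()


def pitch_periods(n_periods, mean_period, rel_std, beta, rng):
    """Hypothetical model: successive pitch periods fluctuating around mean_period."""
    return mean_period * (1.0 + rel_std * fractal_noise(n_periods, beta, rng))


def spectral_slope(x):
    """Least-squares slope of log power vs log frequency (DC and Nyquist excluded)."""
    p = np.abs(np.fft.rfft(x)) ** 2
    f = np.fft.rfftfreq(len(x))
    slope, _ = np.polyfit(np.log(f[1:-1]), np.log(p[1:-1]), 1)
    return slope


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    print(round(spectral_slope(fractal_noise(8192, 1.0, rng)), 2))   # close to -1
    print(np.round(pitch_periods(10, 80.0, 0.01, 1.0, rng), 2))
```

    Checking the log-log slope of the periodogram is the same kind of frequency analysis the abstract describes, run in reverse: the generator targets a slope of -β, and the estimate confirms it.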

  • A Packet Loss Concealment Technique for VoIP Using Steganography

    Noriko KOMAKI, Naofumi AOKI, Tsuyoshi YAMAMOTO

    LETTER

    Vol: E86-A, No: 8, Page(s): 2069-2072

    The speech quality of VoIP (Voice over Internet Protocol) may be degraded by transmission errors such as packet loss and delay, which are basically inevitable in best-effort communications. This study proposes an error concealment technique for such degradation that combines sender-based and receiver-based techniques. In the proposed technique, the sender-based side information required by the receiver-based technique is transmitted using steganography, so the datagram remains completely compatible with the conventional VoIP format. The results of an objective evaluation indicate that the proposed technique may improve speech quality compared with the conventional technique.

  • Modification of Two-Side Pitch Waveform Replication Technique for VoIP Packet Loss Concealment

    Naofumi AOKI

    LETTER-Network

    Vol: E87-B, No: 4, Page(s): 1041-1044

    The speech quality of VoIP (Voice over Internet Protocol) may be degraded by transmission errors such as packet loss and delay, which are basically inevitable in best-effort communications. This study investigates an error concealment technique for such degradation using a receiver-based technique called pitch waveform replication. To enhance the conventional technique, this study proposes a waveform reconstruction technique that also takes into account the pitch variation between the backward and forward frames surrounding the lost (gap) frames. The results of an objective evaluation indicate that the proposed technique may improve speech quality compared with the conventional technique.
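
    Two-side pitch waveform replication fills a gap by repeating the last pitch cycle before it and the first pitch cycle after it, cross-fading from the former to the latter. The sketch below also lets the cycle length move linearly from the backward period to the forward period across the gap, as one hypothetical way to account for pitch variation; the paper's exact reconstruction rule is not given in the abstract. Pitch periods are passed in as parameters rather than estimated, and the function names are illustrative.

```python
import numpy as np


def resample_cycle(cycle, length):
    """Stretch or compress one pitch cycle to `length` samples (periodic linear interp)."""
    src = np.linspace(0.0, 1.0, len(cycle), endpoint=False)
    dst = np.linspace(0.0, 1.0, length, endpoint=False)
    return np.interp(dst, src, cycle, period=1.0)


def conceal_gap(before, after, gap_len, t_back, t_fwd):
    """Two-side pitch waveform replication with a linearly interpolated period.

    before/after: received samples on each side of the gap.
    t_back/t_fwd: pitch periods (samples) of the backward and forward frames.
    """
    back_cycle = before[-t_back:]          # last cycle before the gap
    fwd_cycle = after[:t_fwd]              # first cycle after the gap
    out, pos = [], 0
    while pos < gap_len:
        w = pos / gap_len                  # cross-fade weight at the start of this cycle
        period = int(round((1 - w) * t_back + w * t_fwd))
        b = resample_cycle(back_cycle, period)
        f = resample_cycle(fwd_cycle, period)
        out.append((1 - w) * b + w * f)
        pos += period
    return np.concatenate(out)[:gap_len]


if __name__ == "__main__":
    n = np.arange(1200)
    x = np.sin(2 * np.pi * n / 80) + 0.5 * np.sin(2 * np.pi * 3 * n / 80)
    rec = conceal_gap(x[:400], x[720:], 320, 80, 80)   # 40 ms gap at 8 kHz
    print(np.max(np.abs(rec - x[400:720])))
```

    For a perfectly periodic signal whose gap spans a whole number of cycles, the reconstruction matches the lost samples; with real speech the interpolated period keeps the cross-faded cycles closer in length to both neighbours than a fixed backward period would.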