
Keyword Search Result

[Keyword] sub-band (19 hits)

  • Real-Time Full-Band Voice Conversion with Sub-Band Modeling and Data-Driven Phase Estimation of Spectral Differentials Open Access

    Takaaki SAEKI  Yuki SAITO  Shinnosuke TAKAMICHI  Hiroshi SARUWATARI  

     
    PAPER-Speech and Hearing

      Publicized:
    2021/04/16
      Vol:
    E104-D No:7
      Page(s):
    1002-1016

    This paper proposes two high-fidelity and computationally efficient neural voice conversion (VC) methods based on direct waveform modification using spectral differentials. The conventional spectral-differential VC method with a minimum-phase filter achieves high-quality conversion for narrow-band (16 kHz-sampled) VC but incurs a heavy computational cost in filtering, because the minimum phase obtained using a fixed lifter of the Hilbert transform often results in a long-tap filter. Furthermore, when the method is extended to full-band (48 kHz-sampled) VC, the computational cost grows with the increased number of sampling points, and the converted-speech quality degrades due to large fluctuations in the high-frequency band. To construct a short-tap filter, we propose a lifter-training method for data-driven phase reconstruction that trains a lifter of the Hilbert transform while taking filter truncation into account. We also propose a frequency-band-wise modeling method based on sub-band multi-rate signal processing (the sub-band modeling method) for full-band VC. It improves computational efficiency by reducing the number of sampling points of the signals converted with filtering, and improves converted-speech quality by modeling only the low-frequency band. We conducted several objective and subjective evaluations of the proposed methods using a real-time, online, full-band VC system that we developed based on them. The results indicate that 1) the proposed lifter-training method for narrow-band VC can shorten the tap length to 1/16 without degrading converted-speech quality, 2) the proposed sub-band modeling method for full-band VC can improve converted-speech quality while reducing computational cost, and 3) our real-time, online, full-band VC system can convert 48 kHz-sampled speech in real time, with the converted speech attaining a naturalness mean opinion score of 3.6 out of 5.0.
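The fixed-lifter minimum-phase construction that the abstract identifies as the source of long filters can be sketched as below, under stated assumptions: a symmetric, strictly positive N-point magnitude response, naive DFTs, and the textbook homomorphic (cepstral) method. The `lifter` argument is a hypothetical hook marking where a trained lifter would replace the fixed one; the lifter-training procedure itself is not reproduced here, and the function names are illustrative.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT; adequate for the short lengths used in this sketch."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT matching dft() above."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def fixed_lifter(n):
    """Fixed lifter of the Hilbert-transform relation for an even length n:
    keep quefrencies 0 and n/2, double 1..n/2-1, zero the negative quefrencies."""
    return [1.0] + [2.0] * (n // 2 - 1) + [1.0] + [0.0] * (n // 2 - 1)


def min_phase_filter(magnitude, lifter=None, taps=None):
    """Minimum-phase impulse response for a symmetric, strictly positive
    N-point magnitude response, optionally truncated to `taps` coefficients."""
    n = len(magnitude)
    if n % 2:
        raise ValueError("an even DFT length is assumed")
    lifter = lifter or fixed_lifter(n)
    # Real cepstrum of the log-magnitude response.
    cepstrum = [c.real for c in idft([math.log(m) for m in magnitude])]
    # Liftering folds the cepstrum onto positive quefrencies (causal log spectrum).
    folded = [g * c for g, c in zip(lifter, cepstrum)]
    spectrum = [cmath.exp(c) for c in dft(folded)]
    h = [v.real for v in idft(spectrum)]
    # Truncation is what shortens the filter; it is exact only if h has decayed.
    return h[:taps] if taps else h
```

For a magnitude response taken from the maximum-phase filter 1 - 2z^-1, this returns approximately [2, -1, 0, ...], the minimum-phase filter with the same magnitude.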

  • New Sub-Band Adaptive Volterra Filter for Identification of Loudspeaker

    Satoshi KINOSHITA  Yoshinobu KAJIKAWA  

     
    PAPER-Digital Signal Processing

      Vol:
    E102-A No:12
      Page(s):
    1946-1955

    Adaptive Volterra filters (AVFs) are commonly used to identify nonlinear systems such as loudspeaker systems, and ordinary adaptive algorithms can be used to update their filter coefficients. However, AVFs require a huge computational cost even when the AVF order is constrained to two. Improving computational efficiency is therefore an important issue for the real-time implementation of AVFs. In this paper, we propose a novel sub-band AVF with high computational efficiency for second-order AVFs. The proposed sub-band AVF consists of four parts: input signal transformation for a single sub-band AVF, tap length determination to improve computational efficiency, switching of the number of sub-bands while maintaining estimation accuracy, and an automatic search for an appropriate number of sub-bands. The proposed sub-band AVF can improve computational efficiency for systems, such as loudspeakers, whose dominant nonlinear components are concentrated in a particular frequency band. A simulation demonstrates that the proposed sub-band AVF achieves higher estimation accuracy than conventional efficient AVFs.
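As a point of reference for the complexity argument, the following is a minimal sketch of a conventional full-band second-order AVF adapted with NLMS (not the proposed sub-band AVF). It shows why the cost grows quickly: with memory M, the regressor has M linear terms plus M(M+1)/2 quadratic terms. The memory length, step size and test system are illustrative assumptions.

```python
import random


def volterra_regressor(window):
    """Second-order Volterra regressor for the window [x[n], x[n-1], ...]:
    the linear terms followed by the quadratic terms x[n-i] * x[n-j], i <= j."""
    m = len(window)
    quad = [window[i] * window[j] for i in range(m) for j in range(i, m)]
    return list(window) + quad


def identify_volterra(system, memory, steps=5000, mu=0.5, eps=1e-8, seed=0):
    """Identify a second-order Volterra system with a full-band NLMS-adapted AVF.

    `system` maps a regressor to the desired output (the unknown system)."""
    rng = random.Random(seed)
    window = [0.0] * memory
    w = [0.0] * (memory + memory * (memory + 1) // 2)
    for _ in range(steps):
        window = [rng.uniform(-1.0, 1.0)] + window[:-1]   # shift in a new sample
        u = volterra_regressor(window)
        e = system(u) - sum(wi * ui for wi, ui in zip(w, u))   # a priori error
        norm = eps + sum(ui * ui for ui in u)
        w = [wi + mu * e * ui / norm for wi, ui in zip(w, u)]  # NLMS update
    return w
```

With memory 3 the filter already has 9 coefficients; at memory 100 it would have 5,150, which is the kind of growth a sub-band decomposition aims to curb.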

  • Statistical Bandwidth Extension for Speech Synthesis Based on Gaussian Mixture Model with Sub-Band Basis Spectrum Model

    Yamato OHTANI  Masatsune TAMURA  Masahiro MORITA  Masami AKAMINE  

     
    PAPER-Voice conversion

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2481-2489

    This paper describes a novel statistical bandwidth extension (BWE) technique based on a Gaussian mixture model (GMM) and a sub-band basis spectrum model (SBM), in which each dimensional component represents a specific acoustic space in the frequency domain. The proposed method can perform BWE on speech data with an arbitrary frequency bandwidth, whereas conventional methods perform the conversion from fixed narrow-band data. In the proposed method, a GMM is trained in advance on SBM parameters extracted from full-band spectra. According to the bandwidth of the input signal, the trained GMM is reconstructed into a GMM of the joint probability density of the low-band and high-band SBM components. The high-band SBM components are then estimated from the low-band SBM components of the input signal based on the reconstructed GMM. Finally, BWE is achieved by adding the spectra decoded from the estimated high-band SBM components to those of the input signal. To construct the full-band signal from the narrow-band one, we apply this method to log-amplitude spectra and aperiodic components. Objective and subjective evaluation results show that the proposed method robustly extends the bandwidth of speech data for the log-amplitude spectra. Experimental results also indicate that, in the proposed method, the aperiodic component extracted from the upsampled narrow-band signal performs as well as both the restored and the full-band aperiodic components.
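The estimation step (high-band components from low-band components under a joint GMM) is commonly implemented as the conditional mean E[high | low]. The sketch below assumes one low-band and one high-band dimension per component so that no matrix inversion is needed; real SBM vectors would use the corresponding covariance blocks. The tuple layout and function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_predict_high(x_low, components):
    """Minimum mean-square-error estimate of a high-band value from a low-band value.

    Each component is (weight, mu_low, mu_high, var_low, cov_high_low) of a joint
    Gaussian over (low, high). Returns sum_m p(m | x) * E[high | x, m]."""
    likes = [w * gaussian_pdf(x_low, mu_l, var_l)
             for (w, mu_l, _, var_l, _) in components]
    total = sum(likes)  # assumed nonzero, i.e. x_low is not far outside the model
    estimate = 0.0
    for like, (_, mu_l, mu_h, var_l, cov_hl) in zip(likes, components):
        cond_mean = mu_h + cov_hl / var_l * (x_low - mu_l)  # per-component regression
        estimate += like / total * cond_mean                 # weighted by posterior p(m | x)
    return estimate
```

With a single component this reduces to linear regression; with several, the posterior weights switch smoothly between per-component regressions.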

  • Performance Evaluation of an Improved Multiband Impulse Radio UWB Communication System Based on Sub-Band Selection

    Lin QI  Masaaki KATAYAMA  

     
    PAPER-Communication Theory and Signals

      Vol:
    E99-A No:7
      Page(s):
    1446-1454

    This paper evaluates the performance of an improved multiband impulse radio ultra-wideband (MIR UWB) system based on sub-band selection. In the improved scheme, a data mapping algorithm is introduced into a conventional MIR UWB system, and only some of the sub-bands are selected to transmit information data, which improves the flexibility of sub-band/spectrum allocation, avoids interference, and provides a variety of data rates. Based on the transmitter and receiver diagrams, the exact bit error rate (BER) of the improved system is derived. The performance of the improved MIR UWB system is compared with that of the conventional MIR UWB system over different channels. Simulation results show that the improved system can achieve the same data rate and better BER performance than the conventional MIR UWB system in additive white Gaussian noise (AWGN), multipath fading, and interference coexistence channels. In addition, different data transmission rates and BER performances can easily be achieved by an appropriate choice of system parameters.

  • Sub-Band Noise Reduction in Multi-Channel Digital Hearing Aid

    Qingyun WANG  Ruiyu LIANG  Li JING  Cairong ZOU  Li ZHAO  

     
    LETTER-Speech and Hearing

      Publicized:
    2015/10/14
      Vol:
    E99-D No:1
      Page(s):
    292-295

    Since digital hearing aids are sensitive to time delay and power consumption, the computational complexity of noise reduction must be kept as low as possible. Consequently, some complicated algorithms based on time-frequency domain analysis are very difficult to implement in digital hearing aids. This paper presents an improved noise reduction algorithm with greatly reduced computational complexity for multi-channel digital hearing aids. First, the sub-band sound pressure level (SPL) is calculated in real time. Then, based on the calculated sub-band SPL, the noise in each sub-band is estimated and the speech presence probability is computed. Finally, the a posteriori and a priori signal-to-noise ratios (SNRs) are estimated and a gain function is obtained to reduce the noise adaptively. By replacing the FFT and IFFT with the already computed SPL, the proposed algorithm greatly reduces the computational load. Experiments on a prototype digital hearing aid show that the time delay is reduced to nearly half that of the traditional adaptive Wiener filtering and spectral subtraction algorithms, while the SNR improvement and PESQ score remain satisfactory. Compared with the modulation-frequency-based noise reduction algorithm used in many commercial digital hearing aids, the proposed algorithm achieves more than 5 dB greater SNR improvement as well as lower time delay and power consumption.
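The last stage (a posteriori and a priori SNRs feeding a gain function) can be sketched per sub-band as below. The letter does not state its exact estimators here, so this sketch assumes the widely used decision-directed a priori SNR estimator and the Wiener gain xi / (1 + xi); the dB reference in `band_spl_db` is likewise a hypothetical calibration, not the device's SPL mapping.

```python
import math


def band_spl_db(power, ref_power=1.0):
    """Sub-band level in dB relative to an assumed reference power."""
    return 10.0 * math.log10(max(power, 1e-12) / ref_power)


def dd_wiener_gains(band_powers, noise_powers, prev_gains, prev_post_snrs, alpha=0.98):
    """Per-sub-band gains from a posteriori and a priori SNR estimates.

    Decision-directed a priori SNR and the Wiener gain rule are assumptions."""
    gains, post_snrs = [], []
    for p, n, g_prev, gamma_prev in zip(band_powers, noise_powers,
                                         prev_gains, prev_post_snrs):
        gamma = p / max(n, 1e-12)                       # a posteriori SNR
        xi = (alpha * g_prev ** 2 * gamma_prev          # decision-directed a priori SNR
              + (1.0 - alpha) * max(gamma - 1.0, 0.0))
        gains.append(xi / (1.0 + xi))                   # Wiener gain in [0, 1)
        post_snrs.append(gamma)
    return gains, post_snrs
```

Because every quantity is a per-band scalar, the per-frame cost is linear in the number of sub-bands, with no FFT or IFFT involved.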

  • An Effective Acoustic Feedback Cancellation Algorithm Based on the Normalized Sub-Band Adaptive Filter

    Xia WANG  Ruiyu LIANG  Qingyun WANG  Li ZHAO  Cairong ZOU  

     
    LETTER-Speech and Hearing

      Publicized:
    2015/10/20
      Vol:
    E99-D No:1
      Page(s):
    288-291

    In this letter, an effective acoustic feedback cancellation algorithm based on the normalized sub-band adaptive filter (NSAF) is proposed. To resolve the conflict between a fast convergence rate and low misalignment in the NSAF algorithm, a variable step size is designed that varies automatically according to the update state of the filter. The update state is detected adaptively via the normalized distance between the long-term and short-term averages of the tap-weight vector. Simulation results demonstrate that the proposed algorithm has superior performance in terms of convergence rate and misalignment.
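The update-state detector can be sketched with two exponential averages of the tap-weight vector: their normalized distance is large while the filter is still moving and shrinks once it settles. The smoothing factors and the linear, clipped mapping from distance to step size are hypothetical choices for illustration; the letter's own step-size formula is not reproduced here.

```python
import math


class UpdateStateDetector:
    """Long/short-term tap-weight averages driving a variable step size.

    beta_long, beta_short, mu_min, mu_max and d_ref are illustrative assumptions."""

    def __init__(self, taps, beta_long=0.99, beta_short=0.9,
                 mu_min=0.01, mu_max=1.0, d_ref=0.1):
        self.long_avg = [0.0] * taps
        self.short_avg = [0.0] * taps
        self.beta_long, self.beta_short = beta_long, beta_short
        self.mu_min, self.mu_max, self.d_ref = mu_min, mu_max, d_ref

    def normalized_distance(self, w):
        """Update both averages with the current weights and return their normalized distance."""
        bl, bs = self.beta_long, self.beta_short
        self.long_avg = [bl * a + (1 - bl) * x for a, x in zip(self.long_avg, w)]
        self.short_avg = [bs * a + (1 - bs) * x for a, x in zip(self.short_avg, w)]
        diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(self.long_avg, self.short_avg)))
        ref = math.sqrt(sum(a * a for a in self.long_avg)) + 1e-12
        return diff / ref

    def step_size(self, w):
        """Large step while the weights are moving, small step once they have settled."""
        d = self.normalized_distance(w)
        return self.mu_min + (self.mu_max - self.mu_min) * min(1.0, d / self.d_ref)
```

A sudden change in the feedback path moves the short-term average first, so the distance jumps and the step size returns to its maximum.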

  • The Impact of Sub-Band Spreading Bandwidth on DS-MB-UWB System over Multipath and Narrowband Interference

    Chin-Sean SUM  Hiroshi HARADA  

     
    LETTER-Spread Spectrum Technologies and Applications

      Vol:
    E96-A No:3
      Page(s):
    740-744

    In this paper, we investigate the impact of the sub-band spreading bandwidth (SSBW) on a direct sequence (DS) multiband (MB) ultra wideband (UWB) system under multipath and narrowband interference, using realistic UWB channel models based on actual measurements. To mitigate multipath and narrowband interference effectively, the DS-MB-UWB system transmits data over multiple sub-bands instead of a single wide band. The SSBW is varied by using spreading chips with different durations. The results show that increasing the SSBW does not always improve system performance. Optimum SSBW values exist and vary with operating parameters such as the number of sub-bands and the type of propagation channel model. Additionally, system performance in the presence of narrowband interference is found to depend heavily on the number of sub-bands employed.

  • A Novel Approach Based on Adaptive Long-Term Sub-Band Entropy and Multi-Thresholding Scheme for Detecting Speech Signal

    Kun-Ching WANG  

     
    LETTER-Speech and Hearing

      Vol:
    E95-D No:11
      Page(s):
    2732-2736

    The conventional entropy measure is derived from the full band (0 Hz to 4 kHz); however, it cannot clearly describe spectral variability during voice activity. Here we propose a novel adaptive long-term sub-band entropy (ALT-SubEnpy) measure and combine it with a multi-thresholding scheme for voice activity detection (VAD). Specifically, the ALT-SubEnpy measure is built from four sub-entropy parameters, each computed with a different long-term spectral window length. Accordingly, the ALT-SubEnpy-based algorithm recursively updates four adaptive thresholds, one for each part. The proposed ALT-SubEnpy-based VAD method is shown to be effective under variable noise-level conditions.

  • Voice-Activity Detection Using Long-Term Sub-Band Entropy Measure

    Kun-Ching WANG  

     
    LETTER-Engineering Acoustics

      Vol:
    E95-A No:9
      Page(s):
    1606-1609

    A novel long-term sub-band entropy (LT-SubEntropy) measure, which uses improved long-term spectral analysis and sub-band entropy, is proposed for voice activity detection (VAD). With this measure, the inherent formant structure of the speech spectrogram (well known as the voiceprint) can be exploited accurately. Results show that the proposed VAD is superior to existing standard VAD methods at low SNR levels, especially under variable-level noise.
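A long-term sub-band entropy measure can be sketched in two steps: a long-term spectral envelope over recent frames, then a Shannon entropy inside each sub-band, summed over bands. Peaky (formant or harmonic) spectra give low entropy and flat noise-like spectra give high entropy. The per-bin maximum over frames (an LTSE-style envelope) and the equal-width bands are assumptions, not necessarily the letter's "improved" long-term analysis.

```python
import math


def long_term_envelope(frames, order):
    """Per-bin maximum magnitude over the last `order` frames (LTSE-style, assumed)."""
    recent = frames[-order:]
    return [max(frame[k] for frame in recent) for k in range(len(recent[0]))]


def subband_entropy(spectrum, n_bands):
    """Sum over equal-width sub-bands of the entropy of each band's normalized
    spectrum. Trailing bins that do not fill a whole band are ignored."""
    width = len(spectrum) // n_bands
    total = 0.0
    for b in range(n_bands):
        band = spectrum[b * width:(b + 1) * width]
        energy = sum(band)
        if energy <= 0.0:
            continue  # an empty band contributes nothing
        total -= sum(p * math.log(p) for p in (v / energy for v in band) if p > 0.0)
    return total
```

For 16 flat bins in 4 bands the measure is 4 ln 4 (its maximum), while a spectrum with one nonzero bin per band scores 0.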

  • Interference Mitigation Capability of a Low Duty DS-Multiband-UWB System in Realistic Environment

    Chin-Sean SUM  Shigenobu SASAKI  Hiroshi HARADA  

     
    PAPER

      Vol:
    E94-A No:12
      Page(s):
    2762-2772

    In this paper, the performance of a low duty factor (DF) hybrid direct sequence (DS) multiband (MB)-pulsed ultra wideband (UWB) system is evaluated over realistic propagation channels to highlight its interference mitigation capability. The interference mitigation techniques incorporated in the DS-MB-UWB system form a novel design that uses a frequency-agile multiple sub-band configuration and coexistence-friendly low-DF signaling. The system design includes a Rake-type receiver operating over multipath and multi-user channels in the presence of a coexisting narrowband interferer. The propagation channels are modeled on actual measurement data. First, suppressing the power in the sub-band that coexists with the narrowband signal reduces the performance degradation caused by narrowband interference: fully suppressing the affected sub-band typically yields a one-order-of-magnitude improvement (e.g., the BER improves from 10^-3 to 10^-4). Second, lower-DF signaling mitigates self interference (SI) and multi-user interference (MUI): reducing the DF from 0.5 to 0.04 typically yields a 3 dB improvement. Together, sub-band power suppression and low-DF signaling are shown to be effective mitigation techniques in environments with SI, MUI, and narrowband interference.

  • An Approach Using Combination of Multiple Features through Sigmoid Function for Speech-Presence/Absence Discrimination

    Kun-Ching WANG  Chiun-Li CHIN  

     
    PAPER-Engineering Acoustics

      Vol:
    E94-A No:8
      Page(s):
    1630-1637

    In this paper, we present an approach to detecting speech presence in which the decision rule is based on a combination of multiple features through a sigmoid function. Minimum classification error (MCE) training is used to adjust the combination weights. The features, consisting of three parameters (the ratio of ZCR, the spectral energy, and the spectral entropy), are combined linearly with weights derived in the sub-band domain. First, Bark-scale wavelet decomposition (BSWD) is used to split the input speech into 24 critical sub-bands. Next, the feature parameters are derived from the selected frequency sub-bands to form robust voice feature parameters. To discard seriously corrupted frequency sub-bands, a strategy of adaptive frequency sub-band extraction (AFSE) that depends on the sub-band SNR is then applied, so that only the usable sub-bands are retained. Finally, these three feature parameters, computed only over the useful sub-bands, are combined through a sigmoid-type function with optimal weights based on MSE training to classify each frame as speech-present or speech-absent. Experimental results show that the proposed algorithm outperforms standard methods such as G.729B and AMR2.
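The decision stage can be sketched as: keep only sub-bands whose SNR passes a floor (a stand-in for AFSE), average each feature over those bands, and fuse the averages linearly through a sigmoid. The SNR floor, the averaging, and the fallback when no band is usable are assumptions, and the weights would come from training (not shown).

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def speech_presence(features_by_band, band_snr_db, weights, bias, snr_floor_db=0.0):
    """Speech-presence score in (0, 1); a frame is labeled speech if it exceeds 0.5.

    features_by_band[b] holds the per-band features (e.g. ZCR ratio, energy, entropy)."""
    usable = [f for f, snr in zip(features_by_band, band_snr_db) if snr >= snr_floor_db]
    if not usable:
        return 0.0  # assumed fallback: no reliable band, treat the frame as speech-absent
    avg = [sum(f[i] for f in usable) / len(usable) for i in range(len(weights))]
    return sigmoid(sum(w * a for w, a in zip(weights, avg)) + bias)
```

Discarding low-SNR bands before averaging keeps a single heavily corrupted band from dominating the fused score.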

  • The Use of Overlapped Sub-Bands in Multi-Band, Multi-SNR, Multi-Path Recognition of Noisy Word Utterances

    Yutaka TSUBOI  Takehiro IHARA  Kazuyuki TAKAGI  Kazuhiko OZEKI  

     
    PAPER-Speech and Hearing

      Vol:
    E91-D No:6
      Page(s):
    1774-1782

    A solution to the problem of improving noise robustness in automatic speech recognition is presented within the framework of multi-band, multi-SNR, and multi-path approaches. In our word recognizer, the whole frequency band is divided into seven overlapping sub-bands, and sub-band noisy phoneme HMMs are trained on speech data mixed with filtered white Gaussian noise at multiple SNRs. The acoustic model of a word is built as a set of concatenations of clean and noisy sub-band phoneme HMMs arranged in parallel. A Viterbi decoder allows a search path to transit to another SNR condition at a phoneme boundary. The recognition scores of the sub-bands are then recombined to give the score for a word. Experiments show that the overlapped seven-band system yields the best performance under nonstationary ambient noise. It is also shown that using filtered white Gaussian noise is advantageous for training noisy phoneme HMMs.

  • Noise Robust Speaker Identification Using Sub-Band Weighting in Multi-Band Approach

    Sungtak KIM  Mikyong JI  Youngjoo SUH  Hoirin KIM  

     
    LETTER-Speech and Hearing

      Vol:
    E90-D No:12
      Page(s):
    2110-2114

    Recently, many techniques have been proposed to improve speaker identification in noisy environments. Among these, we consider the feature recombination technique for the multi-band approach to noise-robust speaker identification. The conventional feature recombination technique is very effective under band-limited noise, but under broad-band noise it provides no notable performance improvement over the full-band system. Even when speech is corrupted by broad-band noise, the degree of corruption differs from sub-band to sub-band. In conventional feature recombination for speaker identification, all sub-band features are used to compute a multi-band likelihood score; although the sub-band features are extracted independently, this likelihood computation does not effectively exploit the merit of the multi-band approach. Here we propose a new technique of sub-band likelihood computation with sub-band weighting in the feature recombination method. The signal-to-noise ratio (SNR) is used to compute the sub-band weights. The proposed sub-band-weighted likelihood computation makes a speaker identification system more robust to noise. Experimental results show that the average error reduction rate (ERR) across various noise environments is more than 24% compared with the conventional feature-recombination-based speaker identification system.
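SNR-weighted sub-band likelihood combination can be sketched as a weighted sum of per-band log-likelihoods followed by an arg-max over speakers. The weighting rule used here (linear SNR clipped at a floor, normalized to sum to one) is a hypothetical choice; the letter's own SNR-to-weight mapping is not given in this abstract.

```python
def weighted_multiband_score(band_loglikes, band_snr_db, floor_db=0.0):
    """Combine per-sub-band log-likelihoods with SNR-derived weights (assumed rule)."""
    raw = [10.0 ** (max(s, floor_db) / 10.0) for s in band_snr_db]
    total = sum(raw)
    return sum(w / total * ll for w, ll in zip(raw, band_loglikes))


def identify(speaker_band_loglikes, band_snr_db):
    """Return the speaker whose SNR-weighted multi-band score is highest."""
    return max(speaker_band_loglikes,
               key=lambda spk: weighted_multiband_score(speaker_band_loglikes[spk],
                                                        band_snr_db))
```

When one band is clean (20 dB) and the other is buried in noise (-10 dB), the clean band dominates the decision; with equal SNRs the rule reduces to an unweighted average.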

  • Performance Degradation of a Subband Adaptive Digital Filter with Critical Sampling

    Hiroshi YASUKAWA  

     
    LETTER

      Vol:
    E77-A No:9
      Page(s):
    1497-1501

    A method for evaluating the performance degradation of subband adaptive digital filters (ADFs) is presented. The performance of a simple ADF that uses critical sampling is mainly influenced by the characteristics of the subband filter bank and by the finite-precision arithmetic used. This paper considers a two-channel mirror filter bank and a normalized least mean square (NLMS) algorithm with floating-point arithmetic. Using an equivalent noise model, the theoretical ERLE (echo return loss enhancement) is obtained, along with the theoretical relationships between the ADF output error and the circuit parameters, taking into account finite-precision A/D conversion and finite word length effects in floating-point arithmetic. Simulation results agree well with the analytical values; the difference is only 3 to 5 dB.
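The two ingredients named above can be illustrated with the shortest two-channel mirror filter bank, the Haar pair, where the high-pass filter is the mirror of the low-pass prototype (h1[n] = (-1)^n h0[n]) and each band is decimated by 2 (critical sampling). This is a toy choice that reconstructs perfectly; the filter bank analyzed in the letter is presumably longer and not assumed to be Haar. An ERLE helper in dB is included for completeness.

```python
import math


def qmf_mirror(h0):
    """High-pass mirror of a low-pass prototype: h1[n] = (-1)**n * h0[n]."""
    return [(-1) ** n * c for n, c in enumerate(h0)]


def haar_analysis(x):
    """Critically sampled two-band split (polyphase form of filtering with
    h0 = [1, 1]/sqrt(2) and its mirror, then decimation by 2, up to delay and sign).
    Assumes len(x) is even."""
    s = 1.0 / math.sqrt(2.0)
    low = [s * (x[2 * n] + x[2 * n + 1]) for n in range(len(x) // 2)]
    high = [s * (x[2 * n] - x[2 * n + 1]) for n in range(len(x) // 2)]
    return low, high


def haar_synthesis(low, high):
    """Inverse of haar_analysis: interleave the two reconstructed polyphase samples."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for lo, hi in zip(low, high):
        out += [s * (lo + hi), s * (lo - hi)]
    return out


def erle_db(desired, error):
    """Echo return loss enhancement: 10*log10(sum d^2 / sum e^2)."""
    return 10.0 * math.log10(sum(d * d for d in desired) / sum(e * e for e in error))
```

Each band carries half as many samples as the input, which is where the computational savings (and, with non-ideal filters, the aliasing-related degradation) of critical sampling come from.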

  • Automatic Tap Assignment in Sub-Band Adaptive Filter

    Zhiqiang MA  Kenji NAKAYAMA  Akihiko SUGIYAMA  

     
    LETTER

      Vol:
    E76-B No:7
      Page(s):
    751-754

    An automatic tap assignment method for sub-band adaptive filters is proposed in this letter. The number of taps of the adaptive filter in each band is controlled by the mean-squared error: taps are added to bands with large errors and removed from bands with small errors until the residual errors in all bands become equal. In this way, the number of taps in a band becomes roughly proportional to the length of the unknown system's impulse response in that band. Compared with the existing uniform tap assignment, both the convergence rate and the residual error are improved. The effectiveness of the proposed method has been confirmed through computer simulation.
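One iteration of the rule described above (move taps from the band with the smallest error to the band with the largest, keeping the total fixed) can be sketched as follows. To exercise it without a full adaptive filter, the per-band MSE is modeled as the energy of the unmodeled tail of an exponentially decaying band impulse response; that error model and the decay values are hypothetical.

```python
def reassign_taps(taps, band_mse, step=1, min_taps=1):
    """Move `step` taps from the smallest-MSE band to the largest-MSE band,
    keeping the total number of taps constant."""
    taps = list(taps)
    hi = max(range(len(band_mse)), key=band_mse.__getitem__)
    lo = min(range(len(band_mse)), key=band_mse.__getitem__)
    if hi != lo and taps[lo] - step >= min_taps:
        taps[lo] -= step
        taps[hi] += step
    return taps


def tail_energy(decay, n_taps):
    """Hypothetical residual-error model: energy of a**k, k >= n_taps, squared."""
    return decay ** (2 * n_taps) / (1.0 - decay ** 2)
```

Starting from a uniform [8, 8, 8] with decays [0.5, 0.9, 0.7], repeated iterations move taps toward the slowly decaying band and end up oscillating between [2, 18, 4] and [1, 19, 4], near the point where the modeled tail energies are roughly equal.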

  • Realization of Acoustic Inverse Filtering through Multi-Microphone Sub-Band Processing

    Hong WANG  Fumitada ITAKURA  

     
    PAPER

      Vol:
    E75-A No:11
      Page(s):
    1474-1483

    Realizing an acoustic inverse filter is often difficult because of the non-minimum-phase property and the long duration of the impulse response of the acoustic enclosure. However, if the signals are divided into a large number of sub-bands, many of the sub-bands turn out to be invertible. The invertibility of a sub-band signal depends on the zero distribution of the transfer function in the z-plane. In a multi-microphone system, the transfer functions between the sound source and the microphones have different zero distributions. The proposed method takes advantage of these differences: it selects the best invertible microphone in each sub-band and reconstructs the full-band signal by summing the inverse-filtered sub-band signals of the best microphones. The quality of the dereverberated signal obtained with the proposed inverse filtering approach improves as the number of microphones and sub-bands increases. When seven microphones and 513 sub-bands are used, the dereverberated speech signals are almost the same as the original ones, even when the reverberation time is about one second. Combining multiple microphones with sub-band processing provides a new way of dealing with the non-minimum-phase problem in deconvolution.
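The per-sub-band selection can be sketched for second-order FIR sub-band responses, whose zeros follow from the quadratic formula: a response is stably invertible when all its zeros lie inside the unit circle (minimum phase). Choosing the microphone whose largest zero magnitude is smallest is a hypothetical "best" criterion for illustration; the paper's exact selection rule and filter orders may differ.

```python
import cmath


def zeros_2nd_order(h):
    """Zeros of h[0] + h[1] z^-1 + h[2] z^-2 (h[0] != 0), i.e. roots of
    h[0] z^2 + h[1] z + h[2], via the quadratic formula."""
    a, b, c = h
    disc = cmath.sqrt(b * b - 4 * a * c)
    return [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]


def best_microphone(subband_responses):
    """Index of the microphone with a minimum-phase response in this sub-band and
    the largest margin inside the unit circle, or None if none is invertible."""
    best, best_radius = None, 1.0
    for mic, h in enumerate(subband_responses):
        radius = max(abs(z) for z in zeros_2nd_order(h))
        if radius < best_radius:
            best, best_radius = mic, radius
    return best
```

A response such as 1 - 2.5z^-1 + z^-2 has zeros at 2 and 0.5, so it cannot be stably inverted; another microphone whose zeros in that sub-band lie at 0.2 and 0.3 would be chosen instead.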

  • An Acoustic Echo Canceller with Sub-Band Noise Cancelling

    Hiroshi YASUKAWA  

     
    PAPER

      Vol:
    E75-A No:11
      Page(s):
    1516-1523

    An acoustic echo canceller that also cancels room noise is proposed. The system has an additional (noise reference) input port, and a noise canceller (NC) precedes the echo canceller (EC) in a cascade configuration. The adaptation control problem of the cascaded echo and noise canceller is solved by controlling adaptation to match the occurrence of intermittent speech/echo; the room noise is a stationary signal. A simulation shows that adaptation using the NLMS algorithm is very effective for both echo and noise cancellation. Sub-band cancelling techniques are used, and noise cancellation is realized with a lower-band EC. Hardware was implemented and its performance evaluated through experiments in a real acoustic field. The combination of the EC with the NC maintains excellent performance at all echo-to-room-noise power ratios. The proposed canceller is shown to overcome the disadvantages traditionally associated with ECs and NCs.

  • Variable Rate Video Coding Scheme for Broadcast Quality Transmission and Its ATM Network Applications

    Kenichiro HOSODA  

     
    PAPER

      Vol:
    E75-B No:5
      Page(s):
    349-357

    This paper describes the configuration and performance of a stable, high-compression video coding scheme suitable for broadcast quality. The scheme was developed for high-quality image packet transmission in Asynchronous Transfer Mode (ATM) networks. Implementing image packet transmission in ATM networks poses two problems: achieving a compression scheme with high coding efficiency, and achieving an effective compensation method for cell loss. We describe a scheme that resolves both problems. It divides a two-dimensional spectral image signal into several sub-bands. In the high-frequency band, block-matching interframe prediction and the Discrete Cosine Transform (DCT) are applied to achieve a high compression ratio, while intraframe DCT coding is applied to the baseband. The scheme also provides stable compensation for cell loss. It is shown that, with this system, an original 216 Mbit/s image signal is compressed to about 1/10, and a high-quality reconstructed image that is robust to cell loss is obtained.
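The DCT used for both the high-frequency and baseband coding can be sketched as an orthonormal DCT-II, applied separably to rows and then columns for a 2-D block. The 8×8 block size is the common convention and an assumption here; quantization and entropy coding are not shown.

```python
import math


def dct_ii(block):
    """Orthonormal 1-D DCT-II."""
    n = len(block)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, x in enumerate(block)))
    return out


def dct_ii_2d(block):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct_ii(r) for r in block]
    cols = [dct_ii(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

A flat 8×8 block of value 10 maps to a single DC coefficient of 80 with all other coefficients (numerically) zero, which illustrates the energy compaction that makes DCT coding efficient.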

  • High-Fidelity Sub-Band Coding for Very High Resolution Images

    Takahiro SAITO  Hirofumi HIGUCHI  Takashi KOMATSU  

     
    PAPER

      Vol:
    E75-B No:5
      Page(s):
    327-339

    Very high resolution images with more than 2,000 × 2,000 pels will play a very important role in a wide variety of future multimedia communication applications, ranging from electronic publishing to broadcasting. To make the communication of very high resolution images practicable, we need image coding techniques that can compress them efficiently. Taking into account the channel capacity limits of future communication systems, the requisite compression ratio is estimated to be at least 1/10 to 1/20 for color signals. Among existing image coding techniques, sub-band coding is one of the most suitable. In applying it to high-fidelity compression of very high resolution images, one of the major problems is how to encode the high-frequency sub-band signals. High-frequency sub-band signals are well modeled as having an approximately memoryless probability distribution, so the best way to solve this problem is to improve their quantization. From this standpoint, this work first compares three different scalar quantization schemes and improved permutation codes, which the authors previously developed by extending the concept of permutation codes, in terms of quantization performance for a memoryless probability distribution that closely approximates the real statistical properties of high-frequency sub-band signals. It demonstrates that at low coding rates the improved permutation codes outperform the other scalar quantization schemes, and that their superiority decreases as the coding rate increases.
Moreover, based on these results, this work develops a rate-adaptive quantization technique in which the number of bits assigned to each subblock is determined by the signal variance within the subblock, and the proper quantization scheme is chosen from among different types of quantization schemes according to the allocated number of bits. It applies this technique to the high-fidelity encoding of sub-band signals of very high resolution images to demonstrate its usefulness.
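Variance-driven bit allocation is often done with the classic high-resolution rule b_i = b_avg + 0.5 log2(var_i / geometric mean of the variances), which keeps the total bit budget fixed. The sketch below uses that textbook rule, which may differ from the paper's allocation, together with a hypothetical threshold rule for picking a quantizer type (permutation code at low rates, scalar quantizer otherwise, consistent with the comparison above). A practical coder would also round the allocations and clip negative values.

```python
import math


def allocate_bits(variances, avg_bits):
    """High-resolution rule: b_i = avg + 0.5*log2(var_i / geometric_mean(var)).
    Real-valued; the allocations sum to len(variances) * avg_bits."""
    log_gm = sum(math.log2(v) for v in variances) / len(variances)
    return [avg_bits + 0.5 * (math.log2(v) - log_gm) for v in variances]


def choose_quantizer(bits, threshold=1.5):
    """Hypothetical rule: permutation code at low rates, scalar quantizer otherwise."""
    return "permutation" if bits < threshold else "scalar"
```

For subblock variances [1, 4, 16, 64] at an average of 2 bits, the rule gives [0.5, 1.5, 2.5, 3.5] bits: the quietest subblock would use the permutation code and the others a scalar quantizer.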