The search functionality is under construction.

Keyword Search Result

[Keyword] spectra(266hit)

61-80hit(266hit)

  • Spectral Features Based on Local Hu Moments of Gabor Spectrograms for Speech Emotion Recognition

    Huawei TAO  Ruiyu LIANG  Cheng ZHA  Xinran ZHANG  Li ZHAO  

     
    LETTER-Pattern Recognition

      Publicized:
    2016/05/06
      Vol:
    E99-D No:8
      Page(s):
    2186-2189

    To improve the recognition rate of speech emotion, new spectral features based on local Hu moments of Gabor spectrograms, denoted GSLHu-PCA, are proposed. First, the logarithmic energy spectrum of the emotional speech is computed. Second, Gabor spectrograms are obtained by convolving the logarithmic energy spectrum with a Gabor wavelet. Third, Gabor local Hu moment (GLHu) spectrograms are obtained through a block Hu strategy, and the discrete cosine transform (DCT) is then used to eliminate correlation among the components of the GLHu spectrograms. Fourth, statistical features are extracted from the cepstral coefficients of the GLHu spectrograms and combined into a feature vector. Finally, principal component analysis (PCA) is used to reduce the redundancy of the features. Experimental results on the EmoDB and ABC databases validate the effectiveness of GSLHu-PCA.

  • Learning Deep Dictionary for Hyperspectral Image Denoising

    Leigang HUO  Xiangchu FENG  Chunlei HUO  Chunhong PAN  

     
    LETTER-Pattern Recognition

      Publicized:
    2015/04/20
      Vol:
    E98-D No:7
      Page(s):
    1401-1404

    Using traditional single-layer dictionary learning methods, it is difficult to reveal the complex structures hidden in the hyperspectral images. Motivated by deep learning technique, a deep dictionary learning approach is proposed for hyperspectral image denoising, which consists of hierarchical dictionary learning, feature denoising and fine-tuning. Hierarchical dictionary learning is helpful for uncovering the hidden factors in the spectral dimension, and fine-tuning is beneficial for preserving the spectral structure. Experiments demonstrate the effectiveness of the proposed approach.

  • Error Evaluation of an F0-Adaptive Spectral Envelope Estimator in Robustness against the Additive Noise and F0 Error

    Masanori MORISE  

     
    LETTER-Speech and Hearing

      Publicized:
    2015/04/02
      Vol:
    E98-D No:7
      Page(s):
    1405-1408

    This paper describes an evaluation of a temporally stable spectral envelope estimator proposed in our past research. The past research demonstrated that the proposed algorithm can synthesize speech that is as natural as the input speech. This paper focuses on an objective comparison, in which the proposed algorithm is compared with two modern estimation algorithms in terms of estimation performance and temporal stability. The results show that the proposed algorithm is superior to the others in both aspects.

  • Experimental Validation of Digital Pre-distortion Technique for Dual-band Dual-signal Amplification by Single Feedback Architecture Employing Dual-band Mixer

    Ikuma ANDO  Gia Khanh TRAN  Kiyomichi ARAKI  Takayuki YAMADA  Takana KAHO  Yo YAMAGUCHI  Tadao NAKAGAWA  

     
    PAPER-Electromagnetic Theory

      Vol:
    E98-C No:3
      Page(s):
    242-251

    In this paper we describe and experimentally validate a dual-band digital predistortion (DPD) model we propose that takes account of the intermodulation and harmonic distortion produced when the center frequencies of input bands have a harmonic relationship. We also describe and experimentally validate our proposed novel dual-band power amplifier (PA) linearization architecture consisting of a single feedback loop employing a dual-band mixer. Experiment results show that the DPD linearization the proposed model provides can compensate for intermodulation and harmonic distortion in a way that the conventional two-dimensional (2-D) DPD approach cannot. The proposed feedback architecture should make it possible to simplify analog-to-digital converter (ADC) design and eliminate the time lag between different feedback paths.

  • Speech Watermarking Method Based on Formant Tuning

    Shengbei WANG  Masashi UNOKI  

     
    PAPER

      Vol:
    E98-D No:1
      Page(s):
    29-37

    This paper proposes a speech watermarking method based on the concept of formant tuning. The fact that formant tuning can improve the sound quality of synthesized speech was exploited to make the watermark inaudible. In the proposed method, formants were first extracted with linear prediction (LP) analysis, and watermarks were then embedded by symmetrically controlling a pair of line spectral frequencies (LSFs) as formant tuning. We evaluated the proposed method in two kinds of experiments, on inaudibility and on robustness, in comparison with other methods. Inaudibility was evaluated with objective and subjective tests, and robustness was evaluated with speech codecs and speech processing. The results revealed that the proposed method could satisfy both the inaudibility and the robustness required for speech watermarking.

  • Adaptive Band Activity Ratio Control with Cascaded Energy Allocation for Amplify-and-Forward OFDM Relay Systems

    Quang Thang DUONG  Shinsuke IBI  Seiichi SAMPEI  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E97-B No:11
      Page(s):
    2424-2434

    This paper proposes an adaptive band activity ratio control (ABC) with cascaded energy allocation (CEA) scheme to improve end-to-end spectral efficiency for two-hop amplify-and-forward orthogonal frequency division multiplexing relay systems under transmit energy constraint. Subchannel pairing (SP) based spectrum mapping maps spectral components transmitted over high gain subchannels in the source-to-relay link onto high gain subchannels of the relay-to-destination link to improve the spectral efficiency. However, SP suffers from a frame efficiency reduction due to the notification of information of spectral component order. To compensate for the deficiency of SP, the proposed scheme employs dynamic spectrum control with ABC in which spectral components are mapped onto subchannels having high channel gain in each link, while band activity ratio (BAR) is controlled to an optimal value, which is smaller than 1, so that all spectral components are transmitted over relatively high gain subchannels of the two links. To further improve the performance, energy allocation at the source node and the relay node is serially conducted based on convex optimization, and BAR is controlled to improve discrete-input continuous-output memoryless channel capacity at the relay node. In the proposed scheme, since only information of BAR needs to be notified, the notification overhead is drastically reduced compared to that in SP based spectrum mapping. Numerical analysis confirms that the proposed ABC combined with CEA significantly reduces the required notification overhead while achieving almost the same frame error rate performance compared with the SP based scheme.

  • Improved Spectral Envelope Coding Algorithm Using Adaptive Filtering for G.729.1

    Keunseok CHO  Sangbae JEONG  Minsoo HAHN  

     
    LETTER-Speech and Hearing

      Vol:
    E97-A No:11
      Page(s):
    2254-2257

    This paper proposes a new algorithm to encode the spectral envelope for G.729.1 more accurately. It applies the normalized least-mean-square (NLMS) algorithm to each subband energy of the modified discrete cosine transform (MDCT) in the time-domain alias cancellation (TDAC) of G.729.1. By exploiting the NLMS estimation error of the subband energies, the number of bits allocated to spectral envelope coding is reduced. The saved bits are then reused to improve the spectral envelope estimation and thus enhance the sound quality. Experimental results confirm that the proposed algorithm improves the sound quality under both clean and packet loss conditions.
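
As a rough illustration of the NLMS update mentioned in this abstract, the sketch below runs a generic NLMS one-step predictor over a sequence of values, such as one subband's energy across frames. It is not the coding scheme from the paper or any part of G.729.1. The function name `nlms_predict` and the parameters `order`, `mu`, and `eps` are assumptions made for the example.

```python
def nlms_predict(signal, order=2, mu=0.5, eps=1e-8):
    """Generic NLMS one-step linear predictor (illustrative sketch only).

    signal: sequence of floats (hypothetically, one subband energy per frame).
    order:  number of past samples used for each prediction (assumed value).
    mu:     step size; 0 < mu < 2 keeps the normalized update stable.
    eps:    small constant that avoids division by zero.
    Returns the final weights and the list of prediction errors.
    """
    weights = [0.0] * order
    errors = []
    for n in range(order, len(signal)):
        # Most recent sample first.
        x = signal[n - order:n][::-1]
        prediction = sum(w * xi for w, xi in zip(weights, x))
        error = signal[n] - prediction
        # Normalize the step by the input energy.
        norm = eps + sum(xi * xi for xi in x)
        weights = [w + mu * error * xi / norm for w, xi in zip(weights, x)]
        errors.append(error)
    return weights, errors
```

On a constant input the prediction error shrinks geometrically, which is the behavior a coder could exploit: a small, predictable residual needs fewer bits than the raw value.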

  • Voice Timbre Control Based on Perceived Age in Singing Voice Conversion

    Kazuhiro KOBAYASHI  Tomoki TODA  Hironori DOI  Tomoyasu NAKANO  Masataka GOTO  Graham NEUBIG  Sakriani SAKTI  Satoshi NAKAMURA  

     
    PAPER-Voice Conversion and Speech Enhancement

      Vol:
    E97-D No:6
      Page(s):
    1419-1428

    The perceived age of a singing voice is the age of the singer as perceived by the listener, and is one of the notable characteristics that determine perceptions of a song. In this paper, we describe an investigation of acoustic features that have an effect on the perceived age, and a novel voice timbre control technique based on the perceived age for singing voice conversion (SVC). Singers can sing expressively by controlling prosody and voice timbre, but the varieties of voices that singers can produce are limited by physical constraints. Previous work has attempted to overcome this limitation through the use of statistical voice conversion. This technique makes it possible to convert the singing voice timbre of an arbitrary source singer into that of an arbitrary target singer. However, it is still difficult to intuitively control singing voice characteristics by manipulating parameters corresponding to specific physical traits, such as gender and age. In this paper, we first investigate the factors that play a part in the listener's perception of the singer's age. Then, we apply a multiple-regression Gaussian mixture model (MR-GMM) to SVC for the purpose of controlling voice timbre based on the perceived age, and we propose SVC based on the modified MR-GMM for manipulating the perceived age while maintaining the singer's individuality. The experimental results show that 1) the perceived age of singing voices corresponds relatively well to the actual age of the singer, 2) prosodic features have a larger effect on the perceived age than spectral features, 3) the individuality of a singer is influenced more heavily by segmental features than prosodic features, and 4) the proposed voice timbre control method makes it possible to change the singer's perceived age without an adverse effect on the perceived individuality.

  • A Hybrid Approach to Electrolaryngeal Speech Enhancement Based on Noise Reduction and Statistical Excitation Generation

    Kou TANAKA  Tomoki TODA  Graham NEUBIG  Sakriani SAKTI  Satoshi NAKAMURA  

     
    PAPER-Voice Conversion and Speech Enhancement

      Vol:
    E97-D No:6
      Page(s):
    1429-1437

    This paper presents an electrolaryngeal (EL) speech enhancement method capable of significantly improving the naturalness of EL speech while causing no degradation in its intelligibility. An electrolarynx is an external device that artificially generates excitation sounds to enable laryngectomees to produce EL speech. Although proficient laryngectomees can produce quite intelligible EL speech, it sounds very unnatural due to the mechanical excitation produced by the device. Moreover, the excitation sounds produced by the device often leak outside, adding to EL speech as noise. To address these issues, there are two main conventional approaches to EL speech enhancement: noise reduction and statistical voice conversion (VC). The former approach usually causes no degradation in intelligibility but yields only small improvements in naturalness, as the mechanical excitation sounds remain essentially unchanged. The latter approach, on the other hand, significantly improves the naturalness of EL speech using spectral and excitation parameters of natural voices converted from acoustic parameters of EL speech, but it usually degrades intelligibility owing to errors in conversion. We propose a hybrid approach that uses a noise reduction method for enhancing spectral parameters and a statistical voice conversion method for predicting excitation parameters. Moreover, we further modify the prediction process of the excitation parameters to improve its prediction accuracy and reduce adverse effects caused by unvoiced/voiced prediction errors. The experimental results demonstrate that the proposed method yields significant improvements in naturalness compared with EL speech while keeping intelligibility high enough.

  • Linear Complexity of Generalized Cyclotomic Quaternary Sequences with Period pq

    Dan-dan LI  Qiao-yan WEN  Jie ZHANG  Zu-ling CHANG  

     
    LETTER-Cryptography and Information Security

      Vol:
    E97-A No:5
      Page(s):
    1153-1158

    Pseudo-random sequences with high linear complexity play important roles in many domains. We give the linear complexity of generalized cyclotomic quaternary sequences with period pq over Z4 via the weights of their Fourier spectral sequences. The results show that such sequences have high linear complexity.

  • Multimode Image Clustering Using Optimal Image Descriptor Open Access

    Nasir AHMED  Abdul JALIL  

     
    PAPER

      Vol:
    E97-D No:4
      Page(s):
    743-751

    Manifold learning based image clustering models are usually employed at the local level to deal with images sampled from a nonlinear manifold. Multimode patterns in image data matrices can vary from nominal to significant due to images with different expressions, pose, illumination, or occlusion variations. We show that manifold learning based image clustering models are unable to achieve well separated images at the local level for image datasets with significant multimode data patterns, because the gray level image features used in these clustering models are not able to capture the local neighborhood structure effectively for multimode image datasets. In this study, we use a nearest neighborhood quality (NNQ) measure based criterion to improve the local neighborhood structure in terms of correct nearest neighbors of images locally. We found Gist to be the optimal image descriptor among the HOG, Gist, SUN, SURF, and TED image descriptors based on an overall maximum NNQ measure on 10 benchmark image datasets. We observed significant performance improvement for recently reported clustering models such as Spectral Embedded Clustering (SEC) and Nonnegative Spectral Clustering with Discriminative Regularization (NSDR) using the proposed approach. Experimentally, a significant overall performance improvement of 10.5% (clustering accuracy) and 9.2% (normalized mutual information) on 13 benchmark image datasets is observed for the SEC and NSDR clustering models. Further, the overall computational cost of the SEC model is reduced to 19%, and clustering performance for challenging outdoor natural image databases is significantly improved by using the proposed NNQ measure based optimal image representations.

  • A Study on Objective Quality Measure for Bandwidth-Extended Speech in Mobile Voice Communications

    Takashi SUDO  Hirokazu TANAKA  Ryuji KOHNO  

     
    PAPER-Speech and Hearing

      Vol:
    E97-A No:3
      Page(s):
    792-799

    In this paper, we study an objective quality measure that approximates the subjective mean opinion score (MOS) for bandwidth-extended wideband speech with respect to narrowband speech. Bandwidth-extended speech should be widely evaluated by a subjective quality assessment such as MOS. However, such subjective quality assessments are expensive and time-consuming. This paper proposes a new objective quality measure that combines the perceptual evaluation of speech quality (PESQ) and spectral-distortion. We evaluated the correlation between our proposed scheme and MOS using AMR and AMR-WB speech codecs. The coefficient of correlation between the proposed scheme and the MOS value was found to be 0.973. We concluded that the proposed scheme is a valid and effective objective quality measure.

  • Time-Varying AR Spectral Estimation Using an Indefinite Matrix-Based Sliding Window Fast Linear Prediction

    Kiyoshi NISHIYAMA  

     
    PAPER-Digital Signal Processing

      Vol:
    E97-A No:2
      Page(s):
    547-556

    A method for efficiently estimating the time-varying spectra of nonstationary autoregressive (AR) signals is derived using an indefinite matrix-based sliding window fast linear prediction (ISWFLP). In the linear prediction, the indefinite matrix plays a very important role in sliding an exponentially weighted finite-length window over the prediction error samples. The resulting ISWFLP algorithm successively estimates the time-varying AR parameters of order N at a computational complexity of O(N) per sample. The performance of the AR parameter estimation is superior to the performances of the conventional techniques, including the Yule-Walker, covariance, and Burg methods. Consequently, the ISWFLP-based AR spectral estimation method is able to rapidly track variations in the frequency components with a high resolution and at a low computational cost. The effectiveness of the proposed method is demonstrated by the spectral analysis results of a sinusoidal signal and a speech signal.

  • Improved Spectral Efficiency at Reduced Outage Probability for Cooperative Wireless Networks by Using CSI Directed Estimate and Forward Strategy

    Yihenew Wondie MARYE  Chen LIU  Feng LU  Hua-An ZHAO  

     
    PAPER-Foundations

      Vol:
    E97-A No:1
      Page(s):
    7-17

    Cooperative wireless communication is a communication mechanism that attains diversity through a virtual antenna array formed by sharing resources among different users. Different strategies of resource utilization such as amplify-and-forward (AF) and decode-and-forward (DF) already exist in cooperative networks. Although the implementation of these strategies is simple, their utilization of the channel state information (CSI) is generally poor. As a result, the outage and bit error rate (BER) performances need much more improvement in order to satisfy the upcoming high data rate demands. For that to happen, the spectral efficiency supported by a wireless system at a very low outage probability should be increased. In this paper, a new approach based on the existing ones, called CSI directed estimate and forward (CDEF) with a reduced estimation domain, is proposed. A closed form solution for the optimal signal estimation at the relay using minimum mean square error (MMSE), as well as a possible set reduction of the estimation domain, is given. It is shown that this new strategy attains better symbol error rate (SER) and outage performance than AF or DF when the source-relay link is comparatively better than the relay-destination link. Simulation results also show that it achieves better spectral efficiency at low outage probability for a given signal to noise ratio (SNR), as well as for a fixed outage probability in any operating SNR range.

  • Speckle-Free Phosphor-Scattered Blue Light Emitted out of InGaN/GaN Laser Diode with Broadened Spectral Behavior for High Luminance White Lamp Applications Open Access

    Junichi KINOSHITA  Yoshihisa IKEDA  Yuji TAKEDA  

     
    INVITED PAPER

      Vol:
    E96-C No:11
      Page(s):
    1391-1398

    Ultra-high luminance lamps emitting white light with a well-scattered blue spectrum from InGaN/GaN laser diodes and a phosphor-converted yellow spectrum show speckle contrast values as low as those of LEDs. The spectral behavior of the laser diodes is analyzed to find the reason why such low values are obtained. As a result, the PWM-driven, multi-longitudinal mode with dynamically broadened line-width is found to have a great effect on reducing speckle contrast. Despite using lasers, such speckle-free lamps are considered to be very suitable for high-luminance and various other lighting applications.

  • Nonlinear Modeling and Analysis on Concurrent Amplification of Dual-Band Gaussian Signals Open Access

    Ikuma ANDO  GiaKhanh TRAN  Kiyomichi ARAKI  Takayuki YAMADA  Takana KAHO  Yo YAMAGUCHI  Kazuhiro UEHARA  

     
    PAPER

      Vol:
    E96-C No:10
      Page(s):
    1254-1262

    In the recently developed Flexible Wireless System (FWS), the same platform needs to deal with different wireless systems. This increases nonlinear distortion in its wideband power amplifier (PA) because the PA needs to concurrently amplify multi-band signals. By taking higher harmonics as well as inter- and cross-modulation distortion into consideration, we have developed a method to analytically evaluate the adjacent channel leakage power ratio (ACPR) and error vector magnitude (EVM) on the basis of the PA's nonlinear characteristics. We devise a novel method for modeling the PA amplifying dual-band signals. The method makes it possible to model it merely by performing a one-tone test, making use of the Volterra series expansion and the general Wiener model. We then use the Mehler formula to derive the closed-form expressions of the PA's output power spectral density (PSD), ACPR, and EVM. The derivations are based on the assumption that the transmitted signals are complex Gaussian distributed in orthogonal frequency division multiplexing (OFDM) transmission systems. We validate the method by comparing measurement and simulation results and confirm it can appropriately predict the ACPR and EVM performance of the nonlinear PA output with OFDM inputs. In short, the method enables correct modeling of a wideband PA that amplifies dual-band signals merely by conducting a one-tone test.

  • Improved Speech-Presence Uncertainty Estimation Based on Spectral Gradient for Global Soft Decision-Based Speech Enhancement

    Jong-Woong KIM  Joon-Hyuk CHANG  Sang Won NAM  Dong Kook KIM  Jong Won SHIN  

     
    LETTER-Speech and Hearing

      Vol:
    E96-A No:10
      Page(s):
    2025-2028

    In this paper, we propose a speech-presence uncertainty estimation to improve the global soft decision-based speech enhancement technique by using the spectral gradient scheme. The conventional soft decision-based speech enhancement technique uses a fixed ratio (Q) of the a priori speech-presence and speech-absence probabilities to derive the speech-absence probability (SAP). In contrast, we adaptively change Q according to the spectral gradient between the current and past frames as well as the status of the voice activity in the previous two frames. As a result, distinct values of Q are assigned to each frequency in each frame in order to improve the performance of the SAP by tracking the robust a priori information of the speech-presence in time.

  • Horizontal Spectral Entropy with Long-Span of Time for Robust Voice Activity Detection

    Kun-Ching WANG  

     
    LETTER-Speech and Hearing

      Vol:
    E96-D No:9
      Page(s):
    2156-2161

    This letter introduces an innovative voice activity detection (VAD) method based on horizontal spectral entropy with long-span of time (HSELT) feature sets to improve mobile ASR performance in low signal-to-noise ratio (SNR) conditions. Since the signal characteristics of nonstationary noise change with time, we need long-term information of the noisy speech signal to define a more robust decision rule yielding high accuracy. We find that HSELT measures can horizontally enhance the transition between speech and non-speech segments. Based on this finding, we use the HSELT measures to achieve high accuracy in detecting speech signals from various stationary and nonstationary noises.
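
For orientation, the basic quantity behind entropy-based VAD can be sketched as follows. This is the ordinary per-frame spectral entropy, not the HSELT feature from the letter; the function name `spectral_entropy` and the choice of natural logarithm are assumptions made for the example.

```python
import math


def spectral_entropy(power_spectrum):
    """Shannon entropy of one frame's normalized power spectrum (sketch).

    power_spectrum: non-negative power values, one per frequency bin.
    Returns the entropy in nats: high for flat, noise-like spectra and
    low for peaky spectra such as voiced speech.
    """
    total = sum(power_spectrum)
    if total <= 0:
        raise ValueError("power spectrum must have positive total power")
    # Treat the normalized spectrum as a probability mass function.
    probs = [p / total for p in power_spectrum]
    # Bins with zero power contribute nothing (the limit of p*log p is 0).
    return -sum(p * math.log(p) for p in probs if p > 0)
```

A flat spectrum over N bins gives the maximum value log(N), while a spectrum concentrated in one bin gives zero, so thresholding this value is one simple way to separate noise-like frames from speech-like ones.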

  • Spectral Subtraction Based on Non-extensive Statistics for Speech Recognition

    Hilman PARDEDE  Koji IWANO  Koichi SHINODA  

     
    PAPER-Speech and Hearing

      Vol:
    E96-D No:8
      Page(s):
    1774-1782

    Spectral subtraction (SS) is an additive noise removal method which is derived in an extensive framework. In spectral subtraction, it is assumed that speech and noise spectra follow Gaussian distributions and are independent of each other. Hence, noisy speech also follows a Gaussian distribution. The spectral subtraction formula is obtained by maximizing the likelihood of the noisy speech distribution with respect to its variance. However, it is well known that noisy speech observed in real situations often follows a heavy-tailed distribution, not a Gaussian distribution. In this paper, we introduce a q-Gaussian distribution in the non-extensive statistics to represent the distribution of noisy speech and derive a new spectral subtraction method based on it. We found that the q-Gaussian distribution fits the noisy speech distribution better than the Gaussian distribution does. Our speech recognition experiments using the Aurora-2 database showed that the proposed method, q-spectral subtraction (q-SS), outperformed the conventional SS method.
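
The conventional SS baseline referred to in this abstract can be sketched for a single frame as below. This is the textbook power-domain form with a spectral floor, not the q-SS method proposed in the paper. The function name `spectral_subtract` and the parameters `alpha` (over-subtraction factor) and `floor` are assumptions made for the example.

```python
def spectral_subtract(noisy_power, noise_power, alpha=1.0, floor=0.01):
    """Conventional power spectral subtraction on one frame (sketch).

    noisy_power: power spectrum of the noisy speech frame, one value per bin.
    noise_power: estimated noise power spectrum, same length.
    alpha:       over-subtraction factor (assumed; 1.0 is plain subtraction).
    floor:       fraction of the noisy power kept as a lower bound, so that
                 the estimate never becomes negative.
    Returns the estimated clean-speech power spectrum.
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")
    return [
        max(y - alpha * n, floor * y)
        for y, n in zip(noisy_power, noise_power)
    ]
```

The floor matters in bins where the noise estimate exceeds the observed power: without it the subtraction would yield negative power values.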

  • Spectral Correlation Based Blind Automatic Modulation Classification Using Symbol Rate Estimation

    Azril HANIZ  Minseok KIM  Md. Abdur RAHMAN  Jun-ichi TAKADA  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E96-B No:5
      Page(s):
    1158-1167

    Automatic modulation classification (AMC) is an important function of radio surveillance systems in order to identify unknown signals. Many previous works on AMC have utilized signal cyclostationarity, particularly spectral correlation density (SCD), but many of them fail to address several implementation issues, such as the assumption of perfect knowledge of the symbol rate. In this paper, we discuss several practical issues, e.g. cyclic frequency mismatch, which may affect the SCD, and propose compensation techniques to overcome those issues. We also propose a novel feature extraction technique from the SCD, which utilizes the SCD of not only the original received signal, but also the squared received signal. A symbol rate estimation technique which complements the feature extraction is also proposed. Finally, the classification performance of the system is evaluated through Monte Carlo simulations using a wide variety of modulated signals, and simulation results show that the proposed technique can estimate the symbol rate and classify modulation with a probability of above 0.9 down to SNRs of 5 dB.
