IEICE global.ieice.org Site

Keyword Search Result

[Keyword] spectra(266hit)

61-80hit(266hit)

Spectral Features Based on Local Hu Moments of Gabor Spectrograms for Speech Emotion Recognition
Huawei TAO Ruiyu LIANG Cheng ZHA Xinran ZHANG Li ZHAO

LETTER-Pattern Recognition

Publicized:
2016/05/06
Vol:
E99-D No:8
Page(s):
2186-2189
To improve the recognition rate of speech emotion, new spectral features based on local Hu moments of Gabor spectrograms, denoted GSLHu-PCA, are proposed. First, the logarithmic energy spectrum of the emotional speech is computed. Second, Gabor spectrograms are obtained by convolving the logarithmic energy spectrum with Gabor wavelets. Third, Gabor local Hu moments (GLHu) spectrograms are obtained through a block Hu strategy, and the discrete cosine transform (DCT) is then used to eliminate correlation among the components of the GLHu spectrograms. Fourth, statistical features are extracted from the cepstral coefficients of the GLHu spectrograms, and all the statistical features together form a feature vector. Finally, principal component analysis (PCA) is used to reduce the redundancy of the features. The experimental results on the EmoDB and ABC databases validate the effectiveness of GSLHu-PCA.
Learning Deep Dictionary for Hyperspectral Image Denoising
Leigang HUO Xiangchu FENG Chunlei HUO Chunhong PAN

LETTER-Pattern Recognition

Publicized:
2015/04/20
Vol:
E98-D No:7
Page(s):
1401-1404
With traditional single-layer dictionary learning methods, it is difficult to reveal the complex structures hidden in hyperspectral images. Motivated by deep learning techniques, a deep dictionary learning approach is proposed for hyperspectral image denoising, which consists of hierarchical dictionary learning, feature denoising and fine-tuning. Hierarchical dictionary learning is helpful for uncovering the hidden factors in the spectral dimension, and fine-tuning is beneficial for preserving the spectral structure. Experiments demonstrate the effectiveness of the proposed approach.
Error Evaluation of an F0-Adaptive Spectral Envelope Estimator in Robustness against the Additive Noise and F0 Error
Masanori MORISE

LETTER-Speech and Hearing

Publicized:
2015/04/02
Vol:
E98-D No:7
Page(s):
1405-1408
This paper describes an evaluation of a temporally stable spectral envelope estimator proposed in our past research. The past research demonstrated that the proposed algorithm can synthesize speech that is as natural as the input speech. This paper focuses on an objective comparison, in which the proposed algorithm is compared with two modern estimation algorithms in terms of estimation performance and temporal stability. The results show that the proposed algorithm is superior to the others in both aspects.
Experimental Validation of Digital Pre-distortion Technique for Dual-band Dual-signal Amplification by Single Feedback Architecture Employing Dual-band Mixer
Ikuma ANDO Gia Khanh TRAN Kiyomichi ARAKI Takayuki YAMADA Takana KAHO Yo YAMAGUCHI Tadao NAKAGAWA

PAPER-Electromagnetic Theory

Vol:
E98-C No:3
Page(s):
242-251
In this paper we describe and experimentally validate a proposed dual-band digital predistortion (DPD) model that takes account of the intermodulation and harmonic distortion produced when the center frequencies of the input bands have a harmonic relationship. We also describe and experimentally validate our proposed novel dual-band power amplifier (PA) linearization architecture consisting of a single feedback loop employing a dual-band mixer. Experimental results show that the DPD linearization provided by the proposed model can compensate for intermodulation and harmonic distortion in a way that the conventional two-dimensional (2-D) DPD approach cannot. The proposed feedback architecture should make it possible to simplify analog-to-digital converter (ADC) design and eliminate the time lag between different feedback paths.
Speech Watermarking Method Based on Formant Tuning
Shengbei WANG Masashi UNOKI

PAPER

Vol:
E98-D No:1
Page(s):
29-37
This paper proposes a speech watermarking method based on the concept of formant tuning. The characteristic that formant tuning can improve the sound quality of synthesized speech was employed to achieve inaudibility for watermarking. In the proposed method, formants were first extracted with linear prediction (LP) analysis and then embedded with watermarks by symmetrically controlling a pair of line spectral frequencies (LSFs) as formant tuning. We evaluated the proposed method in two kinds of experiments, on inaudibility and on robustness, in comparison with other methods. Inaudibility was evaluated with objective and subjective tests, and robustness was evaluated with speech codecs and speech processing. The results revealed that the proposed method could satisfy both the inaudibility and the robustness that are required for speech watermarking.
Adaptive Band Activity Ratio Control with Cascaded Energy Allocation for Amplify-and-Forward OFDM Relay Systems
Quang Thang DUONG Shinsuke IBI Seiichi SAMPEI

PAPER-Wireless Communication Technologies

Vol:
E97-B No:11
Page(s):
2424-2434
This paper proposes an adaptive band activity ratio control (ABC) with cascaded energy allocation (CEA) scheme to improve end-to-end spectral efficiency for two-hop amplify-and-forward orthogonal frequency division multiplexing relay systems under transmit energy constraint. Subchannel pairing (SP) based spectrum mapping maps spectral components transmitted over high gain subchannels in the source-to-relay link onto high gain subchannels of the relay-to-destination link to improve the spectral efficiency. However, SP suffers from a frame efficiency reduction due to the notification of information of spectral component order. To compensate for the deficiency of SP, the proposed scheme employs dynamic spectrum control with ABC in which spectral components are mapped onto subchannels having high channel gain in each link, while band activity ratio (BAR) is controlled to an optimal value, which is smaller than 1, so that all spectral components are transmitted over relatively high gain subchannels of the two links. To further improve the performance, energy allocation at the source node and the relay node is serially conducted based on convex optimization, and BAR is controlled to improve discrete-input continuous-output memoryless channel capacity at the relay node. In the proposed scheme, since only information of BAR needs to be notified, the notification overhead is drastically reduced compared to that in SP based spectrum mapping. Numerical analysis confirms that the proposed ABC combined with CEA significantly reduces the required notification overhead while achieving almost the same frame error rate performance compared with the SP based scheme.
Improved Spectral Envelope Coding Algorithm Using Adaptive Filtering for G.729.1
Keunseok CHO Sangbae JEONG Minsoo HAHN

LETTER-Speech and Hearing

Vol:
E97-A No:11
Page(s):
2254-2257
This paper proposes a new algorithm to encode the spectral envelope for G.729.1 more accurately. It applies the normalized least-mean-square (NLMS) algorithm to each subband energy of the modified discrete cosine transform (MDCT) in the time-domain alias cancellation (TDAC) of G.729.1. By utilizing the NLMS estimation error of the subband energies, the number of bits allocated to spectral envelope coding is reduced. The saved bits are then reused to improve the spectral envelope estimation and thus enhance the sound quality. Experimental results confirm that the proposed algorithm improves the sound quality under both clean and packet loss conditions.
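As a rough illustration of the NLMS update mentioned in the abstract above, the following is a minimal sketch of a generic one-step-ahead NLMS predictor over a sequence of values (for example, successive energies of one subband). It is not the authors' G.729.1 algorithm: the function name `nlms_predict`, the filter order, the step size `mu`, and the regularizer `eps` are all assumptions chosen for illustration.

```python
def nlms_predict(signal, order=2, mu=0.5, eps=1e-8):
    """One-step-ahead NLMS prediction (illustrative sketch only).

    Returns (predictions, errors), where errors[n] = signal[n] - predictions[n].
    All parameter names and defaults are hypothetical.
    """
    weights = [0.0] * order
    predictions, errors = [], []
    for n in range(len(signal)):
        # Most recent `order` past samples, newest first, zero-padded at the start.
        x = [signal[n - k - 1] if n - k - 1 >= 0 else 0.0 for k in range(order)]
        # Predict the current sample from past samples.
        y_hat = sum(w * xi for w, xi in zip(weights, x))
        e = signal[n] - y_hat
        # Normalize the step by the input energy; eps avoids division by zero.
        norm = eps + sum(xi * xi for xi in x)
        weights = [w + mu * e * xi / norm for w, xi in zip(weights, x)]
        predictions.append(y_hat)
        errors.append(e)
    return predictions, errors
```

In a coding context, a small prediction error means fewer bits are needed to represent the residual than the raw value, which is the general idea behind reusing saved bits elsewhere.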
Voice Timbre Control Based on Perceived Age in Singing Voice Conversion
Kazuhiro KOBAYASHI Tomoki TODA Hironori DOI Tomoyasu NAKANO Masataka GOTO Graham NEUBIG Sakriani SAKTI Satoshi NAKAMURA

PAPER-Voice Conversion and Speech Enhancement

Vol:
E97-D No:6
Page(s):
1419-1428
The perceived age of a singing voice is the age of the singer as perceived by the listener, and is one of the notable characteristics that determine perceptions of a song. In this paper, we describe an investigation of acoustic features that have an effect on the perceived age, and a novel voice timbre control technique based on the perceived age for singing voice conversion (SVC). Singers can sing expressively by controlling prosody and voice timbre, but the varieties of voices that singers can produce are limited by physical constraints. Previous work has attempted to overcome this limitation through the use of statistical voice conversion. This technique makes it possible to convert the singing voice timbre of an arbitrary source singer into that of an arbitrary target singer. However, it is still difficult to intuitively control singing voice characteristics by manipulating parameters corresponding to specific physical traits, such as gender and age. In this paper, we first investigate the factors that play a part in the listener's perception of the singer's age. Then, we apply a multiple-regression Gaussian mixture model (MR-GMM) to SVC for the purpose of controlling voice timbre based on the perceived age, and we propose SVC based on the modified MR-GMM for manipulating the perceived age while maintaining the singer's individuality. The experimental results show that 1) the perceived age of singing voices corresponds relatively well to the actual age of the singer, 2) prosodic features have a larger effect on the perceived age than spectral features, 3) the individuality of a singer is influenced more heavily by segmental features than by prosodic features, and 4) the proposed voice timbre control method makes it possible to change the singer's perceived age without adversely affecting the perceived individuality.
A Hybrid Approach to Electrolaryngeal Speech Enhancement Based on Noise Reduction and Statistical Excitation Generation
Kou TANAKA Tomoki TODA Graham NEUBIG Sakriani SAKTI Satoshi NAKAMURA

PAPER-Voice Conversion and Speech Enhancement

Vol:
E97-D No:6
Page(s):
1429-1437
This paper presents an electrolaryngeal (EL) speech enhancement method capable of significantly improving the naturalness of EL speech while causing no degradation in its intelligibility. An electrolarynx is an external device that artificially generates excitation sounds to enable laryngectomees to produce EL speech. Although proficient laryngectomees can produce quite intelligible EL speech, it sounds very unnatural due to the mechanical excitation produced by the device. Moreover, the excitation sounds produced by the device often leak outside, adding to EL speech as noise. To address these issues, there are mainly two conventional approaches to EL speech enhancement: noise reduction and statistical voice conversion (VC). The former approach usually causes no degradation in intelligibility but yields only small improvements in naturalness, as the mechanical excitation sounds remain essentially unchanged. On the other hand, the latter approach significantly improves the naturalness of EL speech using spectral and excitation parameters of natural voices converted from acoustic parameters of EL speech, but it usually causes degradation in intelligibility owing to errors in conversion. We propose a hybrid approach using a noise reduction method for enhancing spectral parameters and a statistical voice conversion method for predicting excitation parameters. Moreover, we further modify the prediction process of the excitation parameters to improve its prediction accuracy and reduce adverse effects caused by unvoiced/voiced prediction errors. The experimental results demonstrate that the proposed method yields significant improvements in naturalness compared with EL speech while keeping intelligibility high enough.
Linear Complexity of Generalized Cyclotomic Quaternary Sequences with Period pq
Dan-dan LI Qiao-yan WEN Jie ZHANG Zu-ling CHANG

LETTER-Cryptography and Information Security

Vol:
E97-A No:5
Page(s):
1153-1158
Pseudo-random sequences with high linear complexity play important roles in many domains. We give the linear complexity of generalized cyclotomic quaternary sequences with period pq over Z4 via the weights of their Fourier spectral sequences. The results show that such sequences have high linear complexity.
Multimode Image Clustering Using Optimal Image Descriptor (Open Access)
Nasir AHMED Abdul JALIL

PAPER

Vol:
E97-D No:4
Page(s):
743-751
Manifold learning based image clustering models are usually employed at the local level to deal with images sampled from a nonlinear manifold. Multimode patterns in image data matrices can vary from nominal to significant due to images with different expressions, pose, illumination, or occlusion variations. We show that manifold learning based image clustering models are unable to achieve well separated images at the local level for image datasets with significant multimode data patterns, because the gray level image features used in these clustering models are not able to capture the local neighborhood structure effectively for multimode image datasets. In this study, we use a nearest neighborhood quality (NNQ) measure based criterion to improve the local neighborhood structure in terms of the correct nearest neighbors of images locally. We found Gist to be the optimal image descriptor among the HOG, Gist, SUN, SURF, and TED image descriptors based on an overall maximum NNQ measure on 10 benchmark image datasets. We observed significant performance improvement for recently reported clustering models such as Spectral Embedded Clustering (SEC) and Nonnegative Spectral Clustering with Discriminative Regularization (NSDR) using the proposed approach. Experimentally, significant overall performance improvement of 10.5% (clustering accuracy) and 9.2% (normalized mutual information) on 13 benchmark image datasets is observed for the SEC and NSDR clustering models. Further, the overall computational cost of the SEC model is reduced to 19%, and clustering performance for challenging outdoor natural image databases is significantly improved by using the proposed NNQ measure based optimal image representations.
A Study on Objective Quality Measure for Bandwidth-Extended Speech in Mobile Voice Communications
Takashi SUDO Hirokazu TANAKA Ryuji KOHNO

PAPER-Speech and Hearing

Vol:
E97-A No:3
Page(s):
792-799
In this paper, we study an objective quality measure that approximates the subjective mean opinion score (MOS) for bandwidth-extended wideband speech with respect to narrowband speech. Bandwidth-extended speech should be widely evaluated by a subjective quality assessment such as MOS. However, such subjective quality assessments are expensive and time-consuming. This paper proposes a new objective quality measure that combines the perceptual evaluation of speech quality (PESQ) and spectral-distortion. We evaluated the correlation between our proposed scheme and MOS using AMR and AMR-WB speech codecs. The coefficient of correlation between the proposed scheme and the MOS value was found to be 0.973. We concluded that the proposed scheme is a valid and effective objective quality measure.
Time-Varying AR Spectral Estimation Using an Indefinite Matrix-Based Sliding Window Fast Linear Prediction
Kiyoshi NISHIYAMA

PAPER-Digital Signal Processing

Vol:
E97-A No:2
Page(s):
547-556
A method for efficiently estimating the time-varying spectra of nonstationary autoregressive (AR) signals is derived using an indefinite matrix-based sliding window fast linear prediction (ISWFLP). In the linear prediction, the indefinite matrix plays a very important role in sliding an exponentially weighted finite-length window over the prediction error samples. The resulting ISWFLP algorithm successively estimates the time-varying AR parameters of order N at a computational complexity of O(N) per sample. The performance of the AR parameter estimation is superior to the performances of the conventional techniques, including the Yule-Walker, covariance, and Burg methods. Consequently, the ISWFLP-based AR spectral estimation method is able to rapidly track variations in the frequency components with a high resolution and at a low computational cost. The effectiveness of the proposed method is demonstrated by the spectral analysis results of a sinusoidal signal and a speech signal.
Improved Spectral Efficiency at Reduced Outage Probability for Cooperative Wireless Networks by Using CSI Directed Estimate and Forward Strategy
Yihenew Wondie MARYE Chen LIU Feng LU Hua-An ZHAO

PAPER-Foundations

Vol:
E97-A No:1
Page(s):
7-17
Cooperative wireless communication is a communication mechanism that attains diversity through a virtual antenna array formed by sharing resources among different users. Different strategies of resource utilization, such as amplify-and-forward (AF) and decode-and-forward (DF), already exist in cooperative networks. Although the implementation of these strategies is simple, their utilization of the channel state information (CSI) is generally poor. As a result, the outage and bit error rate (BER) performances need much more improvement in order to satisfy the upcoming high data rate demands. For that to happen, the spectral efficiency supported by a wireless system at a very low outage probability should be increased. In this paper, a new approach based on the previously existing ones, called CSI directed estimate and forward (CDEF) with a reduced estimation domain, is proposed. A closed form solution for the optimal signal estimation at the relay using minimum mean square error (MMSE), as well as a possible set reduction of the estimation domain, is given. It will be shown that this new strategy attains better symbol error rate (SER) and outage performance than AF or DF when the source-relay link is comparatively better than the relay-destination link. Simulation results also show that it achieves better spectral efficiency at low outage probability for a given signal to noise ratio (SNR), as well as for a fixed outage probability in any operating SNR range.
Speckle-Free Phosphor-Scattered Blue Light Emitted out of InGaN/GaN Laser Diode with Broadened Spectral Behavior for High Luminance White Lamp Applications (Open Access)
Junichi KINOSHITA Yoshihisa IKEDA Yuji TAKEDA

INVITED PAPER

Vol:
E96-C No:11
Page(s):
1391-1398
Ultra-high luminance lamps emitting white light with a well-scattered blue spectrum from InGaN/GaN laser diodes and a phosphor-converted yellow spectrum show speckle contrast values as low as those of LEDs. The spectral behavior of the laser diodes is analyzed to find the reason why such low values are obtained. As a result, the PWM-driven, multi-longitudinal mode with dynamically broadened line-width is found to have a great effect on reducing speckle contrast. Despite using lasers, such speckle-free lamps are considered to be very suitable for high-luminance and various other lighting applications.
Nonlinear Modeling and Analysis on Concurrent Amplification of Dual-Band Gaussian Signals (Open Access)
Ikuma ANDO GiaKhanh TRAN Kiyomichi ARAKI Takayuki YAMADA Takana KAHO Yo YAMAGUCHI Kazuhiro UEHARA

PAPER

Vol:
E96-C No:10
Page(s):
1254-1262
In the recently developed Flexible Wireless System (FWS), the same platform needs to deal with different wireless systems. This increases nonlinear distortion in its wideband power amplifier (PA) because the PA needs to concurrently amplify multi-band signals. By taking higher harmonics as well as inter- and cross-modulation distortion into consideration, we have developed a method to analytically evaluate the adjacent channel leakage power ratio (ACPR) and error vector magnitude (EVM) on the basis of the PA's nonlinear characteristics. We devise a novel method for modeling the PA amplifying dual-band signals. The method makes it possible to model it merely by performing a one-tone test, making use of the Volterra series expansion and the general Wiener model. We then use the Mehler formula to derive the closed-form expressions of the PA's output power spectral density (PSD), ACPR, and EVM. The derivations are based on the assumption that the transmitted signals are complex Gaussian distributed in orthogonal frequency division multiplexing (OFDM) transmission systems. We validate the method by comparing measurement and simulation results and confirm it can appropriately predict the ACPR and EVM performance of the nonlinear PA output with OFDM inputs. In short, the method enables correct modeling of a wideband PA that amplifies dual-band signals merely by conducting a one-tone test.
Improved Speech-Presence Uncertainty Estimation Based on Spectral Gradient for Global Soft Decision-Based Speech Enhancement
Jong-Woong KIM Joon-Hyuk CHANG Sang Won NAM Dong Kook KIM Jong Won SHIN

LETTER-Speech and Hearing

Vol:
E96-A No:10
Page(s):
2025-2028
In this paper, we propose a speech-presence uncertainty estimation to improve the global soft decision-based speech enhancement technique by using the spectral gradient scheme. The conventional soft decision-based speech enhancement technique uses a fixed ratio (Q) of the a priori speech-presence and speech-absence probabilities to derive the speech-absence probability (SAP). In contrast, we adaptively change Q according to the spectral gradient between the current and past frames as well as the status of the voice activity in the previous two frames. As a result, distinct values of Q are assigned to each frequency in each frame in order to improve the performance of the SAP by robustly tracking the a priori information of the speech presence over time.
Horizontal Spectral Entropy with Long-Span of Time for Robust Voice Activity Detection
Kun-Ching WANG

LETTER-Speech and Hearing

Vol:
E96-D No:9
Page(s):
2156-2161
This letter introduces an innovative voice activity detection (VAD) method based on horizontal spectral entropy with long-span of time (HSELT) feature sets to improve mobile ASR performance in low signal-to-noise ratio (SNR) conditions. Since the signal characteristics of nonstationary noise change with time, we need long-term information of the noisy speech signal to define a more robust decision rule yielding high accuracy. We find that HSELT measures can horizontally enhance the transition between speech and non-speech segments. Based on this finding, we use the HSELT measures to achieve high accuracy in detecting speech signals in various stationary and nonstationary noises.
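For orientation, the basic quantity underlying entropy-based VAD features can be sketched as the Shannon entropy of a normalized power spectrum. The snippet below computes only this plain per-frame spectral entropy; it does not implement the HSELT feature of the letter, whose long-span "horizontal" construction is not specified in the abstract. The function name `spectral_entropy` is hypothetical.

```python
import math


def spectral_entropy(power_spectrum):
    """Shannon entropy (in bits) of a power spectrum normalized to sum to 1.

    Illustrative sketch only; not the HSELT measure described in the abstract.
    """
    total = sum(power_spectrum)
    if total <= 0:
        return 0.0  # Convention chosen here for an all-zero (silent) frame.
    entropy = 0.0
    for p in power_spectrum:
        q = p / total  # Treat the normalized spectrum as a probability mass.
        if q > 0:
            entropy -= q * math.log2(q)
    return entropy
```

A flat, noise-like spectrum gives the maximum entropy (log2 of the number of bins), while a spectrum concentrated in a few bins gives a low value, which is why entropy is a common cue for separating speech from noise.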
Spectral Subtraction Based on Non-extensive Statistics for Speech Recognition
Hilman PARDEDE Koji IWANO Koichi SHINODA

PAPER-Speech and Hearing

Vol:
E96-D No:8
Page(s):
1774-1782
Spectral subtraction (SS) is an additive noise removal method which is derived in an extensive framework. In spectral subtraction, it is assumed that speech and noise spectra follow Gaussian distributions and are independent of each other. Hence, noisy speech also follows a Gaussian distribution. The spectral subtraction formula is obtained by maximizing the likelihood of the noisy speech distribution with respect to its variance. However, it is well known that noisy speech observed in real situations often follows a heavy-tailed distribution, not a Gaussian distribution. In this paper, we introduce a q-Gaussian distribution in the non-extensive statistics to represent the distribution of noisy speech and derive a new spectral subtraction method based on it. We found that the q-Gaussian distribution fits the noisy speech distribution better than the Gaussian distribution does. Our speech recognition experiments using the Aurora-2 database showed that the proposed method, q-spectral subtraction (q-SS), outperformed the conventional SS method.
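The conventional spectral subtraction that the abstract above takes as its baseline can be sketched as a per-bin subtraction of a noise power estimate from the noisy power spectrum. This is a minimal sketch of the conventional method only, not the q-SS method proposed in the paper; the function name `spectral_subtract` and the spectral-floor parameter `floor` (used to keep results positive) are assumptions added for illustration.

```python
def spectral_subtract(noisy_power, noise_power, floor=0.01):
    """Conventional power spectral subtraction with a spectral floor.

    noisy_power and noise_power are per-bin power values for one frame.
    `floor` is a hypothetical parameter: each output bin is kept at or
    above floor * noisy_power so that it never becomes negative.
    """
    cleaned = []
    for y, n in zip(noisy_power, noise_power):
        # Subtract the noise estimate; clamp over-subtracted bins to the floor.
        cleaned.append(max(y - n, floor * y))
    return cleaned
```

For example, with a noisy frame `[4.0, 1.0]` and a noise estimate `[1.0, 2.0]`, the first bin becomes `3.0`, and the second bin, where the noise estimate exceeds the observation, is clamped to the floor value `0.01`.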
Spectral Correlation Based Blind Automatic Modulation Classification Using Symbol Rate Estimation
Azril HANIZ Minseok KIM Md. Abdur RAHMAN Jun-ichi TAKADA

PAPER-Wireless Communication Technologies

Vol:
E96-B No:5
Page(s):
1158-1167
Automatic modulation classification (AMC) is an important function of radio surveillance systems in order to identify unknown signals. Many previous works on AMC have utilized signal cyclostationarity, particularly spectral correlation density (SCD), but many of them fail to address several implementation issues, such as the assumption of perfect knowledge of the symbol rate. In this paper, we discuss several practical issues, e.g. cyclic frequency mismatch, which may affect the SCD, and propose compensation techniques to overcome those issues. We also propose a novel feature extraction technique from the SCD, which utilizes the SCD of not only the original received signal, but also the squared received signal. A symbol rate estimation technique which complements the feature extraction is also proposed. Finally, the classification performance of the system is evaluated through Monte Carlo simulations using a wide variety of modulated signals, and simulation results show that the proposed technique can estimate the symbol rate and classify modulation with a probability of above 0.9 down to SNRs of 5 dB.
