Shang CAI Yeming XIAO Jielin PAN Qingwei ZHAO Yonghong YAN
Mel Frequency Cepstral Coefficients (MFCC) are the most popular acoustic features used in automatic speech recognition (ASR), mainly because the coefficients capture the most useful information in speech and fit well with the assumptions used in hidden Markov models. As is well known, MFCCs already employ several principles that have known counterparts in the peripheral properties of human hearing: decoupling across frequency, mel-warping of the frequency axis, log-compression of energy, etc. It is natural to introduce more mechanisms of the auditory periphery to improve the noise robustness of MFCC. In this paper, a k-nearest-neighbor-based frequency masking filter is proposed to reduce the audibility of spectral valleys, which are sensitive to noise. In addition, Moore and Glasberg's critical-band equivalent rectangular bandwidth (ERB) expression is used to determine the filter bandwidth. Furthermore, a new bandpass infinite impulse response (IIR) filter is proposed to imitate the temporal masking phenomenon of the human auditory system. These three auditory perceptual mechanisms are combined with the standard MFCC algorithm to investigate their effects on ASR performance, and a revised MFCC extraction scheme is presented. Recognition performance with the standard MFCC, RASTA perceptual linear prediction (RASTA-PLP) and the proposed feature extraction scheme is evaluated on a medium-vocabulary isolated-word recognition task and a more complex large vocabulary continuous speech recognition (LVCSR) task. Experimental results show that consistent robustness against background noise is achieved on both tasks, and the proposed method outperforms both the standard MFCC and RASTA-PLP.
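As a rough illustration of how an ERB expression can set filter bandwidths in an MFCC-style front end, the Python sketch below assumes the widely cited Glasberg and Moore (1990) form ERB(f) = 24.7(4.37f/1000 + 1) Hz (the paper may use a different Moore and Glasberg expression) while keeping mel-spaced centre frequencies. The function names are hypothetical, and the k-nearest-neighbor masking filter and the temporal-masking IIR filter are not modeled.

```python
import math


def erb_bandwidth(f_hz: float) -> float:
    """ERB in Hz at centre frequency f_hz (assumed Glasberg and Moore 1990 form)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def hz_to_mel(f_hz: float) -> float:
    """Common mel mapping used by many MFCC front ends."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def erb_filterbank(f_low: float, f_high: float, n_filters: int):
    """Mel-spaced centre frequencies, each filter spanning one ERB
    (hypothetical combination, not necessarily the paper's design).
    Returns (lower_edge, centre, upper_edge) tuples in Hz."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    bands = []
    for i in range(1, n_filters + 1):
        centre = mel_to_hz(m_low + i * step)
        half_width = erb_bandwidth(centre) / 2.0
        bands.append((centre - half_width, centre, centre + half_width))
    return bands
```

With this choice, filters widen with frequency in proportion to the ERB rather than to the mel spacing alone.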
Peng ZHANG Shuzheng XU Huazhong YANG
To improve the robustness and transparency of spread spectrum (SS) based watermarking, this paper presents a new informed embedding strategy, which we call selective host-interference cancellation. We show that part of the host interference in SS-based watermarking is beneficial to blind watermark extraction or detection, and can be exploited rather than removed. Exploiting this positive effect of the host itself improves watermark robustness without significantly sacrificing media fidelity. The proposed strategy is realized by selectively applying improved SS (ISS) modulation to traditional SS watermarking. Theoretically, under additive white Gaussian noise attacks, the error probability of the new method is several orders of magnitude lower than that of ISS at high signal-to-watermark ratios, and the required minimum watermark power is reduced by 3 dB. Experiments were conducted on real audio signals, and the results show that our scheme is robust against most common attacks even in high-transparency or high-payload applications.
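A minimal sketch of one reading of selective host-interference cancellation, assuming ±1 spreading sequences, one bit per block, and the usual ISS rule s = x + (alpha*b - lam*xp)u, where xp is the normalized projection of the host x onto u. Host interference whose sign already agrees with the bit is left in place (plain SS), and only opposing interference is cancelled. The names embed_bit, detect_bit, alpha and lam are illustrative, not taken from the paper.

```python
def project(x, u):
    """Normalized correlation <x, u> / <u, u> of the host onto the spreading sequence."""
    return sum(a * b for a, b in zip(x, u)) / sum(b * b for b in u)


def embed_bit(x, u, bit, alpha=1.0, lam=1.0):
    """Embed bit in {-1, +1} into host block x using spreading sequence u."""
    x_proj = project(x, u)
    if bit * x_proj >= 0:
        gain = alpha * bit                 # host interference helps: plain SS
    else:
        gain = alpha * bit - lam * x_proj  # host interference hurts: ISS-style cancellation
    return [a + gain * b for a, b in zip(x, u)]


def detect_bit(s, u):
    """Blind detection: sign of the correlation with u."""
    return 1 if project(s, u) >= 0 else -1
```

With lam = 1 and no attack, the cancelled branch leaves a projection of exactly alpha*bit, and the uncancelled branch only adds to a projection that already has the bit's sign, so detection is error-free before any noise is added.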
In this letter, we analyze the influence of motion and out-of-focus blur on both the frequency spectrum and the cepstrum of an iris image. Based on their characteristics, we define two new discriminative blur features, Energy Spectral Density Distribution (ESDD) and Singular Cepstrum Histogram (SCH). To merge the two features for blur detection with a Support Vector Machine (SVM), we propose a merging kernel that is a linear combination of two kernels. Extensive experiments on both synthetic and real datasets demonstrate the validity of our method through improved blur detection performance.
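The merging kernel can be sketched as a convex combination of two base kernels, one per feature group. The RBF base kernels, the weight w, and the split of a concatenated vector into ESDD and SCH parts are assumptions for illustration. A non-negative combination of valid kernels is itself a valid kernel, so the resulting Gram matrix can be passed to an SVM that accepts precomputed kernels (for example, scikit-learn's SVC with kernel="precomputed").

```python
import math


def rbf(a, b, gamma):
    """Gaussian RBF kernel between two equal-length vectors."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def merged_kernel(x, y, split, w=0.5, gamma1=1.0, gamma2=1.0):
    """K(x, y) = w * K1(ESDD part) + (1 - w) * K2(SCH part), with 0 <= w <= 1.
    The first `split` entries are treated as ESDD features, the rest as SCH
    features (assumed layout)."""
    k1 = rbf(x[:split], y[:split], gamma1)
    k2 = rbf(x[split:], y[split:], gamma2)
    return w * k1 + (1.0 - w) * k2


def gram(samples, split, **kwargs):
    """Gram matrix for an SVM that accepts a precomputed kernel."""
    return [[merged_kernel(a, b, split, **kwargs) for b in samples] for a in samples]
```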
Minseok KIM Yohei KONISHI Jun-ichi TAKADA Boxin GAO
This letter proposes an automatic IQ imbalance compensation technique for quadrature modulators based on spectrum measurement of the RF signal with a spectrum analyzer. The analyzer feeds back only the magnitude of the signal's frequency spectrum. To realize IQ imbalance compensation, the conventional steepest descent method is modified: the descent direction is determined empirically, and a variable step size is introduced to accelerate convergence. Experimental results for a four-channel transmitter operating at 11 GHz are presented for verification.
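The Python sketch below illustrates magnitude-only steepest descent with a variable step size. It replaces the spectrum analyzer with a simplified, hypothetical imbalance model in which the compensator scales the Q branch by (1 + c_gain) and rotates it by c_phase, and it estimates the descent direction by finite differences rather than by the empirical rule used in the letter. The growth factor 1.2 and shrink factor 0.5 are arbitrary choices.

```python
import cmath


def image_power(gain_err, phase_err, c_gain, c_phase):
    """Simulated analyzer reading (hypothetical model): relative image-sideband
    power of a single-tone test signal after compensation of the Q branch."""
    g = gain_err * (1.0 + c_gain)
    theta = phase_err + c_phase
    return abs(1.0 - g * cmath.exp(1j * theta)) ** 2 / 4.0


def compensate(measure, iters=300, step=1.0, h=1e-6):
    """Steepest descent using only scalar power readings from `measure`.
    The descent direction is a central finite-difference gradient estimate."""
    c = [0.0, 0.0]
    cost = measure(*c)
    for _ in range(iters):
        grad = []
        for i in range(len(c)):
            plus, minus = c[:], c[:]
            plus[i] += h
            minus[i] -= h
            grad.append((measure(*plus) - measure(*minus)) / (2.0 * h))
        trial = [ci - step * gi for ci, gi in zip(c, grad)]
        trial_cost = measure(*trial)
        if trial_cost < cost:
            c, cost = trial, trial_cost
            step *= 1.2   # move accepted: try a larger step next time
        else:
            step *= 0.5   # move rejected: keep c and shrink the step
    return c, cost
```

For instance, compensate(lambda cg, cp: image_power(1.1, 0.1, cg, cp)) should drive c_gain toward 1/1.1 - 1 and c_phase toward -0.1 in this toy model.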
Speaker change detection involves identifying the time indices in an audio stream at which the speaker identity changes. This paper proposes novel measures for speaker change detection based on the centroid model, which divides the feature space into non-overlapping clusters for effective speaker-change comparison. The centroid model is a computationally efficient variant of the widely used mixture-distribution-based background models for speaker recognition. Experiments were performed on both synthetic and real-world data; the results show that the proposed approach yields promising results compared with conventional statistical measures.
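To make the centroid-model idea concrete, the sketch below quantizes each feature frame to its nearest centroid and compares the cluster-occupancy histograms of two adjacent windows. The half-L1 distance used as the change score is a stand-in chosen for illustration, not one of the paper's proposed measures, and the window length and threshold are hypothetical parameters.

```python
def nearest(frame, centroids):
    """Index of the centroid closest to the frame (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(frame, centroids[k])))


def occupancy(frames, centroids):
    """Fraction of frames falling into each non-overlapping cluster."""
    counts = [0] * len(centroids)
    for f in frames:
        counts[nearest(f, centroids)] += 1
    return [c / len(frames) for c in counts]


def change_score(left, right, centroids):
    """Half L1 distance between occupancy histograms: 0 = identical, 1 = disjoint."""
    p, q = occupancy(left, centroids), occupancy(right, centroids)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def detect_changes(frames, centroids, win, threshold):
    """Frame indices where the windows before and after differ strongly."""
    return [t for t in range(win, len(frames) - win + 1)
            if change_score(frames[t - win:t], frames[t:t + win], centroids) > threshold]
```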
Toshiyuki UTO Yuka TAKEMURA Hidekazu KAMITANI Kenji OHUE
This paper describes a blind watermarking scheme based on cyclic signal processing. With the spread of high-speed networks, there is a growing demand for copyright protection of multimedia data. For efficient image watermarking, there are two major approaches: quantization-based methods and correlation-based methods. In this paper, we propose a correlation-based watermarking technique for three-dimensional (3-D) polygonal models using fast Fourier transforms (FFTs). To generate a watermark with desirable properties similar to those of a pseudonoise signal, an impulse signal on a two-dimensional (2-D) space is spread through the FFT, multiplication by a complex sinusoid signal, and the inverse FFT. This watermark, i.e., the spread impulse signal in a transform domain, is converted to the spatial domain by an inverse wavelet transform and embedded into 3-D data aligned by principal component analysis (PCA). In the detection procedure, after realigning the watermarked mesh model through PCA, we map the 3-D data onto the 2-D space via block segmentation and averaging. The 2-D data are processed by the inverse system, i.e., the FFT, division by the complex sinusoid signal, and the inverse FFT. From the resulting 2-D signal, we detect the position of the maximum value as a signature. Detection rates and information capacity for 3-D bunny models are presented to evaluate the performance of the proposed method.
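The spreading and despreading chain (FFT, multiplication by a complex sinusoid, inverse FFT, and the inverse system at the detector) can be sketched in one dimension as follows. Assumptions: a naive DFT stands in for the FFT, and the complex sinusoid is taken to be a quadratic-phase (chirp) sequence, since multiplying by a linear-phase sinusoid would only shift the impulse. The paper's 2-D version would apply the same steps along both axes; the wavelet, PCA and mesh-mapping stages are omitted.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) DFT; stands in for the FFT in this sketch."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def chirp(n):
    """Assumed spreading sinusoid: quadratic phase exp(j*pi*k^2/n)."""
    return [cmath.exp(1j * cmath.pi * k * k / n) for k in range(n)]


def spread_impulse(position, n):
    """FFT -> multiply by the chirp -> inverse FFT."""
    impulse = [0j] * n
    impulse[position] = 1 + 0j
    spectrum = dft(impulse)
    return dft([s * c for s, c in zip(spectrum, chirp(n))], inverse=True)


def detect_position(signal):
    """Inverse system: FFT -> divide by the chirp -> inverse FFT -> argmax."""
    spectrum = dft(signal)
    recovered = dft([s / c for s, c in zip(spectrum, chirp(len(signal)))], inverse=True)
    return max(range(len(recovered)), key=lambda i: abs(recovered[i]))
```

The spread signal has its energy distributed over all samples, yet the detector collapses it back to a single peak whose position carries the signature.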
Kitti KOONSANIT Chuleerat JARUSKULCHAI
Clustering is a popular tool for exploratory data analysis, and K-means clustering is one widely used technique. Determining the appropriate number of clusters is a significant problem in K-means clustering because its results depend on the chosen number of clusters. Since the number of clusters must be supplied in advance as an input parameter to the K-means algorithm, it often needs to be determined automatically. We propose a new method for automatically determining the appropriate number of clusters for multispectral imagery in the pre-clustering steps, using an extended co-occurrence matrix technique called the tri-co-occurrence matrix technique. The proposed method was tested using a dataset with a known number of clusters. The experimental results were compared with ground truth images and evaluated in terms of accuracy; the tri-co-occurrence technique achieved an accuracy of 84.86%. The results confirmed the effectiveness of the proposed method in finding the appropriate number of clusters, and were compared with those of the original co-occurrence matrix technique and other algorithms.
This paper presents our recent work on building Large Vocabulary Continuous Speech Recognition (LVCSR) systems for the Thai, Indonesian, and Chinese languages. For Thai, since there are no word boundaries in the written form, we have proposed a new method for automatically creating word-like units from a text corpus, and have applied topic and speaking-style adaptation to the language model to recognize spoken-style utterances. For Indonesian, we have applied proper-noun-specific adaptation to acoustic modeling, and rule-based English-to-Indonesian phoneme mapping, to address the large variation in the pronunciation of proper nouns and English words in a spoken-query information retrieval system. In spoken Chinese, long organization names are frequently abbreviated, and abbreviated utterances cannot be recognized if the abbreviations are not included in the dictionary. We have proposed a new method for automatically generating Chinese abbreviations, and by expanding the vocabulary with the generated abbreviations, we have significantly improved the performance of spoken-query-based search.
In this paper, a Schmitt-trigger-based 10T SRAM (ST 10T SRAM) cell with the vertical MOSFET is proposed for low-supply-voltage operation, and its impact on cell size, stability and speed is investigated. The proposed ST 10T SRAM cell with the vertical MOSFET achieves a smaller cell size than the ST 10T SRAM cell with the conventional planar MOSFET. Moreover, the proposed SRAM cell realizes a large static noise margin (SNM) that remains constant against the bottom-node resistance of the vertical MOSFET, without any architectural changes from the present 6T SRAM architecture. The proposed SRAM cell also suppresses the read-time degradation of the ST 10T SRAM cell, owing to the back-bias-effect-free characteristic of the vertical MOSFET. The proposed ST 10T SRAM cell with the vertical MOSFET is therefore a superior SRAM cell for low-supply-voltage operation, offering a small cell size, stable operation, and fast speed within the present 6T SRAM architecture.
Sang-Youl LEE Seung-Dong YANG Jae-Sub OH Ho-Jin YUN Kwang-Seok JEONG Yu-Mi KIM Hi-Deok LEE Ga-Won LEE
In this paper, we fabricated gate-all-around bandgap-engineered (BE) silicon-oxide-nitride-oxide-silicon (SONOS) and silicon-oxide-high-k-oxide-silicon (SOHOS) flash memory devices with a vertical silicon pillar structure as a potential solution for further scaling. Silicon nitride (Si3N4) and hafnium oxide (HfO2) were used as the trapping layers in the SONOS and SOHOS devices, respectively. The BE-SOHOS device has better electrical characteristics than the BE-SONOS device, such as a lower threshold voltage (VTH) of 0.16 V, a higher maximum transconductance (gm,max) of 0.593 µA/V, and an on/off current ratio of 5.76 × 10^8. The memory characteristics of the BE-SONOS device, such as program/erase (P/E) speed, endurance, and data retention, were compared with those of the BE-SOHOS device. The measured data show that the BE-SONOS device has good memory characteristics, such as program speed and data retention. Compared with the BE-SONOS device, the erase speed of the BE-SOHOS device is about five times faster, while its program speed and data retention are slightly worse, which can be explained by the large number of interface traps between the trapping layer and the tunneling oxide.
Kai LI Yanmeng GUO Qiang FU Junfeng LI Yonghong YAN
Traditional two-microphone noise reduction algorithms for highly nonstationary directional noise generally use direction-of-arrival or phase-difference information. The performance of these algorithms deteriorates when diffuse noise coexists with nonstationary directional noise in realistic adverse environments. In this paper, we present a two-channel noise reduction algorithm using a spatial-information-based speech estimator and a spatial-information-controlled soft-decision noise estimator to improve noise reduction performance in realistic nonstationary noisy environments. A target presence probability estimator based on Bayes' rule, using both phase difference and magnitude-squared coherence, is proposed for soft-decision noise estimation, so that the two cues contribute complementary advantages when both directional and diffuse noises are present. The proposed two-microphone noise reduction algorithm is evaluated in terms of noise reduction, log-spectral distance (LSD), and the word recognition rate (WRR) of a distant-talking ASR system in a real noisy room. Experimental results show that the proposed algorithm achieves better noise suppression, without further distorting the desired signal components, than the dual-channel noise reduction algorithms it is compared with.
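The two spatial ingredients can be sketched for a single frequency bin: magnitude-squared coherence (MSC) estimated from recursively smoothed auto- and cross-spectra, and Bayes' rule for a target presence probability. The smoothing factor beta, the small initial power that avoids division by zero, and the function names are illustrative choices; the paper's likelihood models for phase difference and MSC are not reproduced here.

```python
def smoothed_msc(x1, x2, beta=0.8):
    """Magnitude-squared coherence per frame for one frequency bin.
    x1, x2: complex STFT values of the two microphones in that bin."""
    p11 = p22 = 1e-12          # tiny initial power avoids division by zero
    p12 = 0j
    msc = []
    for a, b in zip(x1, x2):
        p11 = beta * p11 + (1 - beta) * abs(a) ** 2
        p22 = beta * p22 + (1 - beta) * abs(b) ** 2
        p12 = beta * p12 + (1 - beta) * a * b.conjugate()
        msc.append(abs(p12) ** 2 / (p11 * p22))
    return msc


def target_presence(prior, like_h1, like_h0):
    """Bayes' rule: P(target present | observation)."""
    num = prior * like_h1
    return num / (num + (1 - prior) * like_h0)
```

If the phase-difference and MSC cues are assumed conditionally independent (an assumption, not a claim about the paper), their per-hypothesis likelihoods can simply be multiplied before calling target_presence.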
Gia Khanh TRAN Shinichi TAJIMA Rindranirina RAMAMONJISON Kei SAKAGUCHI Kiyomichi ARAKI Shoji KANEKO Noriaki MIYAZAKI Satoshi KONISHI Yoji KISHI
This work studies the benefits of heterogeneous cellular networks with picocells overlapping a large macrocell. We consider three different strategies for resource allocation and cell association. The first model employs a spectrum overlapping strategy with SINR-based cell association. The second model avoids interference between the macrocell and picocells through a spectrum splitting strategy; picocell range expansion is also considered in this strategy to enable load balancing between the macrocell and picocells. The last model is a hybrid one, called the fractional spectrum splitting strategy, in which spectrum splitting is applied only at the picocell edge, while the picocell inner region reuses the spectrum of the macrocell. For each strategy, we formulate a resource allocation optimization problem that maximizes the system rate. Our results show that all three strategies outperform the macrocell-only case in terms of system rate, which demonstrates the benefit of heterogeneous networks. Moreover, the fractional spectrum splitting strategy provides the highest system rate at the expense of degraded outage user rate due to macro-pico interference. The spectrum overlapping model provides the second highest system rate gain and also improves the outage user rate owing to full spectrum reuse and the benefit of macro diversity, while the spectrum splitting model achieves a moderate system rate gain.
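The association rules can be illustrated with a toy two-cell layout. The sketch assumes a log-distance path-loss model with made-up parameters; it picks the cell with the highest received power, optionally adding a range-expansion bias to picocells. With full spectrum overlap, the highest received power also gives the highest SINR, because SINR_i = P_i / (N + T - P_i) increases with P_i when the total received power T is fixed.

```python
import math


def rx_power_dbm(tx_dbm, dist_m, exponent=3.5, ref_loss_db=35.0):
    """Log-distance path loss with illustrative (made-up) parameters."""
    return tx_dbm - ref_loss_db - 10.0 * exponent * math.log10(max(dist_m, 1.0))


def associate(ue, cells, pico_bias_db=0.0):
    """Serve the UE from the cell with the largest received power,
    plus a range-expansion bias for picocells."""
    def score(i):
        cell = cells[i]
        power = rx_power_dbm(cell["tx_dbm"], math.dist(ue, cell["pos"]))
        return power + (pico_bias_db if cell["kind"] == "pico" else 0.0)
    return max(range(len(cells)), key=score)


def sinr_db(ue, cells, serving, noise_dbm=-100.0):
    """SINR when every cell transmits on the same spectrum (full overlap)."""
    powers = [10 ** (rx_power_dbm(c["tx_dbm"], math.dist(ue, c["pos"])) / 10) for c in cells]
    noise = 10 ** (noise_dbm / 10)
    interference = sum(powers) - powers[serving]
    return 10 * math.log10(powers[serving] / (noise + interference))
```

A positive bias offloads edge users to the picocell even though their SINR there is lower under full overlap, which is why range expansion pairs naturally with spectrum splitting.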
Haruhiko KAIYA Atsushi OHNISHI
Defining quality requirements completely and correctly is more difficult than defining functional requirements because stakeholders do not state most quality requirements explicitly. We thus propose a method that measures a requirements specification to identify the amount of quality requirements in it. We also propose another method that recommends quality requirements to be defined in such a specification. We expect that stakeholders can identify missing and unnecessary quality requirements when the measured quality requirements differ from the recommended ones. We use a semi-formal language called X-JRDL to represent requirements specifications because it is suitable for analyzing quality requirements. We applied our methods to a requirements specification and found that they help define quality requirements more completely and correctly.
Hiroki HARADA Hiromasa FUJII Shunji MIURA Hidetoshi KAYAMA Yoshiki OKANO Tetsuro IMAI
Cyclostationarity-based feature detection is an important and widely considered signal identification technique for cognitive radios because it requires neither time nor frequency synchronization, nor any prior information other than the cyclic autocorrelation features of the target signals. This paper presents the development and experimental evaluation of cyclostationarity-based signal identification equipment. The equipment is used in conjunction with a spatial channel emulator, which provides an environment for evaluating realistic spectrum sharing scenarios. The results reveal the effectiveness of the cyclostationarity-based signal identification methodology in realistic spectrum sharing scenarios, especially in terms of its capability to identify weak signals.
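The cyclic autocorrelation that such detectors rely on can be estimated directly from samples. The sketch below uses the standard estimator R_x^alpha(tau) = (1/N) sum_n x[n+tau] x*[n] exp(-j 2 pi alpha n); the function name and the unwindowed, single-lag form are simplifications rather than the equipment's actual processing.

```python
import cmath
import math


def cyclic_autocorr(x, alpha, tau=0):
    """Estimate R_x^alpha(tau) with alpha in cycles per sample:
    (1/N) * sum_n x[n+tau] * conj(x[n]) * exp(-j*2*pi*alpha*n)."""
    n_terms = len(x) - tau
    acc = 0j
    for n in range(n_terms):
        acc += x[n + tau] * x[n].conjugate() * cmath.exp(-2j * math.pi * alpha * n)
    return acc / n_terms
```

For example, for x[n] = cos(2*pi*0.1*n) over 1000 samples, |R| is about 0.25 at alpha = 0.2 and close to 0 at alpha = 0.3. Stationary noise has no such nonzero cyclic features, which is what lets weak signals stand out.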
Mohammad Azizur RAHMAN Chunyi SONG Hiroshi HARADA
This paper introduces a unified spectrum sensing method for all existing analog television (TV) signals, including NTSC, PAL and SECAM. We propose a correlation-based method (CBM) that uses a single reference signal to sense any analog TV signal. We also propose an improved energy detection method. The CBM approach has been implemented in a hardware prototype specially designed for participation in the Singapore TV white space (WS) test trial conducted by the Infocomm Development Authority (IDA) of the Singapore government. Analytical and simulation results for the CBM are presented, as well as hardware test results for sensing various analog TV signals. Both AWGN and fading channels are considered. The theoretical results are shown to closely match the simulation results. The sensing performance of the hardware prototype in a fading environment, obtained with a fading simulator, is also presented. The performance of the proposed techniques is reported in terms of probability of false alarm, probability of detection, sensing time, etc., together with a comparative study of the various techniques.
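For reference, a conventional energy detector with a Gaussian-approximation threshold is sketched below; it is the kind of baseline an improved energy detector would be compared against, not the paper's improved method or its CBM. Assuming complex Gaussian noise of known power sigma^2, the mean of N squared magnitudes has mean sigma^2 and variance sigma^4/N, which gives the threshold lambda = sigma^2 (1 + Qinv(Pfa)/sqrt(N)).

```python
import math
from statistics import NormalDist


def energy_statistic(samples):
    """Average received energy per sample."""
    return sum(abs(s) ** 2 for s in samples) / len(samples)


def energy_threshold(noise_power, n, pfa):
    """Gaussian-approximation threshold for complex Gaussian noise:
    lambda = sigma^2 * (1 + Qinv(pfa) / sqrt(n))."""
    q_inv = NormalDist().inv_cdf(1.0 - pfa)
    return noise_power * (1.0 + q_inv / math.sqrt(n))


def detect(samples, noise_power, pfa=0.1):
    """Declare a signal present when the energy exceeds the threshold."""
    return energy_statistic(samples) > energy_threshold(noise_power, len(samples), pfa)
```

Because the threshold depends on an assumed noise power, noise-power uncertainty directly inflates the false-alarm rate, which is one reason correlation-based sensing with a known reference can be attractive.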
In this paper, we study the problem of distributed spectrum allocation under a vertical spectrum sharing scenario in a cognitive radio network. The secondary users share the spectrum licensed to the primary user by observing the primary user's activity statistics, and regulate their transmission strategy to abide by the spectrum sharing etiquette. When the primary user is inactive in a subset of the available frequency bands, the problem, from the perspective of the secondary users, reduces to distributed horizontal spectrum sharing. For a specific class of networks, the latter problem is addressed by the recently proposed GADIA algorithm [1]. In this paper, we present analytical and numerical results on the performance of the GADIA algorithm in conjunction with the above-mentioned vertical spectrum sharing scenario. These results reveal near-optimal performance guarantees for the overall vertical spectrum sharing scenario.
Francisco NOVILLO Ramon FERRUS
Allowing WLANs to exploit opportunistic spectrum access (OSA) is a promising approach to alleviating spectrum congestion in overcrowded unlicensed ISM bands, especially in highly dense WLAN deployments. In this context, novel channel assignment mechanisms that jointly consider available channels in both unlicensed ISM bands and OSA-enabled licensed bands are needed. Unlike classical schemes proposed for legacy WLANs, channel assignment mechanisms for OSA-enabled WLANs must address two distinguishing issues: channel prioritization and spectrum heterogeneity. The first refers to the fact that prioritization criteria other than interference conditions should be considered when choosing between ISM and licensed band channels. The second refers to the fact that channel availability may differ among WLAN access points (APs) because of primary user activity in the OSA-enabled bands. This paper first formulates the channel assignment problem for OSA-enabled WLANs as a binary linear programming (BLP) problem. The BLP problem is solved optimally by branch-and-bound algorithms and used as a benchmark for developing more computationally efficient heuristics. On this basis, a novel channel assignment algorithm based on weighted graph coloring heuristics, which exploits both channel prioritization and spectrum heterogeneity, is proposed. The algorithm is evaluated under different conditions of AP density and primary band availability.
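A greedy coloring pass in the spirit of the proposed heuristic can be sketched as follows. Everything here is hypothetical rather than the paper's algorithm: APs are ordered by fewest available channels and then by most neighbors, and each AP takes, from its own available set (spectrum heterogeneity), the channel that minimizes co-channel interference weight plus a per-channel priority cost (channel prioritization).

```python
def assign_channels(aps, neighbors, available, priority_cost, interference_weight):
    """Greedy weighted-coloring sketch.
    aps: list of AP ids; neighbors: dict ap -> set of interfering APs;
    available: dict ap -> list of usable channels;
    priority_cost: dict channel -> penalty (e.g. higher for licensed OSA channels);
    interference_weight: dict (ap, neighbor) -> weight, default 1.0."""
    order = sorted(aps, key=lambda a: (len(available[a]), -len(neighbors[a])))
    assignment = {}
    for ap in order:
        def cost(ch):
            clash = sum(interference_weight.get((ap, nb), 1.0)
                        for nb in neighbors[ap] if assignment.get(nb) == ch)
            return clash + priority_cost[ch]
        assignment[ap] = min(available[ap], key=cost)
    return assignment
```

Raising the priority cost of licensed channels makes an AP fall back to them only when every ISM channel it can use would clash with a neighbor.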
Javad Afshar JAHANSHAHI Mohammad ESLAMI Seyed Ali GHORASHI
Of late, many researchers have been interested in sparse representations of signals and their applications, such as compressive sensing in cognitive radio (CR) networks, as a way of overcoming the issue of limited bandwidth. Compressive sensing-based wideband spectrum sensing is a novel approach in cognitive radio systems. In these systems, spatial-frequency opportunistic reuse has also attracted interest, through constructing and deploying spatial-frequency power spectral density (PSD) maps. Since the CR sensors are distributed over the region of support, the PSD sensed by each sensor must be transmitted to a master node (base station) to construct the PSD maps in the space and frequency domains. When the number of sensors is large, the data transmission required to construct the PSD map can be challenging. In this paper, a compressive sensing-based scheme is used to transmit the CR sensors' data to the master node, so the measurements are sampled at a rate lower than the Nyquist rate. With the proposed method, a PSD map acceptable for cognitive radio purposes can be obtained with only 30% of the full data transmission. Simulation results also show the robustness of the proposed method against channel variations in comparison with classical methods. Different solution schemes, namely Basis Pursuit, Lasso, LARS and Orthogonal Matching Pursuit, are used, and their performance is evaluated by simulations over a Rician channel for several compression ratios and signal-to-noise ratios. It is also shown that Basis Pursuit and Lasso outperform the other methods, particularly at higher compression rates.
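Of the solvers listed, Orthogonal Matching Pursuit is the simplest to sketch without an optimization library. The pure-Python version below greedily selects the column most correlated with the residual and then re-fits all selected coefficients by least squares. It assumes a real-valued sensing matrix given as a list of rows; the sensing matrix design used in the paper is not reproduced.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def least_squares(basis, y):
    """Solve the normal equations (B^T B) c = B^T y by Gauss-Jordan elimination."""
    p = len(basis)
    aug = [[dot(basis[r], basis[s]) for s in range(p)] + [dot(basis[r], y)]
           for r in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[r][p] / aug[r][r] for r in range(p)]


def omp(A, y, k):
    """Recover a k-sparse x from y = A x; A is a list of M rows of length N."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    norms = [math.sqrt(dot(c, c)) for c in cols]
    support, coef, residual = [], [], list(y)
    for _ in range(k):
        # select the column most correlated with the current residual
        best = max((j for j in range(n) if j not in support),
                   key=lambda j: abs(dot(cols[j], residual)) / norms[j])
        support.append(best)
        # jointly re-fit all selected coefficients (the "orthogonal" step)
        coef = least_squares([cols[j] for j in support], y)
        fit = [sum(c * cols[j][i] for c, j in zip(coef, support)) for i in range(m)]
        residual = [yi - fi for yi, fi in zip(y, fit)]
    x = [0.0] * n
    for c, j in zip(coef, support):
        x[j] = c
    return x
```

For example, with a 48 x 64 Gaussian matrix and a noise-free 2-sparse vector, omp(A, y, 2) typically recovers the vector exactly; Basis Pursuit and Lasso instead solve convex programs and generally need a dedicated solver.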
Motohiro TANABE Masahiro UMEHIRA
An OFDMA-based (Orthogonal Frequency Division Multiple Access-based) channel access scheme for dynamic spectrum access has the drawbacks of a large PAPR (Peak-to-Average Power Ratio) and large ACI (Adjacent Channel Interference). To solve these problems, a flexible channel access scheme based on single-carrier modulation and using an overlap FFT filter-bank was proposed for dynamic spectrum access. To apply the overlap FFT filter-bank to dynamic spectrum access, its performance must be clarified as a function of the design parameters, since its frequency characteristics are critical in dynamic spectrum access applications. This paper analyzes the overlap FFT filter-bank and evaluates its performance, including its frequency characteristics and ACI, according to the design parameters.
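The PAPR drawback mentioned above can be checked numerically. This sketch computes PAPR in dB and forms an OFDM symbol with a naive inverse DFT: a constant-envelope single-carrier QPSK sequence has 0 dB PAPR, whereas the same symbols spread over subcarriers generally do not. It only illustrates the motivation; the overlap FFT filter-bank itself is not modeled, and the function names are illustrative.

```python
import cmath
import math


def papr_db(x):
    """Peak-to-average power ratio of a complex sequence, in dB."""
    powers = [abs(v) ** 2 for v in x]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


def ofdm_symbol(symbols):
    """Time-domain OFDM symbol via a naive inverse DFT (one symbol per subcarrier)."""
    n = len(symbols)
    return [sum(s * cmath.exp(2j * math.pi * k * t / n) for k, s in enumerate(symbols)) / n
            for t in range(n)]
```

In the worst case, where all n subcarriers carry the same symbol, the OFDM waveform collapses into a single peak and the PAPR reaches 10 log10(n) dB.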
Heewan PARK Byungsik YOON Sangwon KANG Andreas SPANIAS
A new codebook mapping algorithm for artificial bandwidth extension (ABE) is introduced in this paper. We design a wideband line spectrum pair (LSP) codebook that shares its indices with the LSP codebook of a narrowband speech codec. The received narrowband LSP codebook indices are used directly to obtain wideband LSP codewords. Thus, the proposed scheme eliminates the codebook search otherwise needed to estimate the wideband spectral envelope. We apply the proposed scheme to bandwidth extension in the adaptive multi-rate (AMR) compressed domain. Its performance is assessed via the perceptual evaluation of speech quality (PESQ), informal listening tests, and weighted million operations per second (WMOPS) calculations.
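The index-coupling idea can be sketched as a table lookup. The training rule here (averaging the wideband LSP vectors of all training frames whose narrowband LSPs quantize to each index) is an assumption for illustration, as are the function names; indices never hit during training are left as zero vectors.

```python
def nearest_index(vec, codebook):
    """Index of the closest codeword (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, codebook[i])))


def train_coupled_codebook(nb_codebook, training_pairs, wb_dim):
    """Build a wideband codebook indexed like the narrowband one (assumed rule:
    mean wideband vector of the training frames mapped to each index)."""
    sums = [[0.0] * wb_dim for _ in nb_codebook]
    counts = [0] * len(nb_codebook)
    for nb_vec, wb_vec in training_pairs:
        i = nearest_index(nb_vec, nb_codebook)
        counts[i] += 1
        sums[i] = [s + w for s, w in zip(sums[i], wb_vec)]
    # unused indices keep their zero vector
    return [[s / c for s in row] if c else row for row, c in zip(sums, counts)]


def extend(received_indices, wb_codebook):
    """Decoder side: direct lookup by received index, no codebook search."""
    return [wb_codebook[i] for i in received_indices]
```

The search cost moves entirely offline into training, so the decoder's per-frame work is a single array access.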