
Keyword Search Result

[Keyword] SPE(2504hit)

721-740hit(2504hit)

  • Noise Robust Feature Scheme for Automatic Speech Recognition Based on Auditory Perceptual Mechanisms

    Shang CAI  Yeming XIAO  Jielin PAN  Qingwei ZHAO  Yonghong YAN  

     
    PAPER-Speech and Hearing

      Vol:
    E95-D No:6
      Page(s):
    1610-1618

    Mel Frequency Cepstral Coefficients (MFCC) are the most popular acoustic features used in automatic speech recognition (ASR), mainly because the coefficients capture the most useful information of the speech and fit well with the assumptions used in hidden Markov models. As is well known, MFCCs already employ several principles which have known counterparts in the peripheral properties of human hearing: decoupling across frequency, mel-warping of the frequency axis, log-compression of energy, etc. It is natural to introduce more mechanisms of the auditory periphery to improve the noise robustness of MFCC. In this paper, a k-nearest neighbors based frequency masking filter is proposed to reduce the audibility of spectral valleys, which are sensitive to noise. In addition, Moore and Glasberg's critical band equivalent rectangular bandwidth (ERB) expression is utilized to determine the filter bandwidth. Furthermore, a new bandpass infinite impulse response (IIR) filter is proposed to imitate the temporal masking phenomenon of the human auditory system. These three auditory perceptual mechanisms are combined with the standard MFCC algorithm in order to investigate their effects on ASR performance, and a revised MFCC extraction scheme is presented. Recognition performance with the standard MFCC, RASTA perceptual linear prediction (RASTA-PLP) and the proposed feature extraction scheme is evaluated on a medium-vocabulary isolated-word recognition task and a more complex large vocabulary continuous speech recognition (LVCSR) task. Experimental results show that consistent robustness against background noise is achieved on these two tasks, and the proposed method outperforms both the standard MFCC and RASTA-PLP.
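
As a rough illustration of how an ERB expression can set a filter bandwidth, the sketch below uses the widely cited Glasberg and Moore (1990) form, ERB(f) = 24.7 (4.37 f / 1000 + 1) with f in Hz. This is an assumption: the abstract does not say which variant of the expression the paper uses, and the function name `erb_hz` is hypothetical.

```python
def erb_hz(center_hz: float) -> float:
    """Equivalent rectangular bandwidth in Hz at a given centre frequency.

    Assumes the Glasberg & Moore (1990) form: 24.7 * (4.37 * f_kHz + 1).
    The paper summarised above may use a different variant.
    """
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


if __name__ == "__main__":
    # Bandwidth grows with centre frequency, so high-frequency filters are wider.
    for f in (250.0, 1000.0, 4000.0):
        print(f"{f:7.1f} Hz -> ERB {erb_hz(f):6.1f} Hz")
```

Under this form the bandwidth grows linearly with centre frequency, so a filterbank built on it uses narrow filters at low frequencies and wider ones at high frequencies.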

  • Selective Host-Interference Cancellation: A New Informed Embedding Strategy for Spread Spectrum Watermarking

    Peng ZHANG  Shuzheng XU  Huazhong YANG  

     
    PAPER-Cryptography and Information Security

      Vol:
    E95-A No:6
      Page(s):
    1065-1073

    To improve the robustness and transparency of spread spectrum (SS) based watermarking, this paper presents a new informed embedding strategy, which we call selective host-interference cancellation. We show that part of the host-interference in SS-based watermarking is beneficial to blind watermark extraction or detection, and can be utilized rather than removed. Utilizing this positive effect of the host itself can improve the watermark robustness without significantly sacrificing the media fidelity. The proposed strategy is realized by selectively applying improved SS (ISS) modulation to traditional SS watermarking. Theoretically, the error probability of the new method under additive white Gaussian noise attacks is several orders of magnitude lower than that of ISS for high signal-to-watermark ratios, and the required minimum watermark power is reduced by 3 dB. Experiments were conducted on real audio signals, and the results show that our scheme is robust against most common attacks even in high-transparency or high-payload applications.

  • Iris Image Blur Detection with Multiple Kernel Learning

    Lili PAN  Mei XIE  Ling MAO  

     
    LETTER-Pattern Recognition

      Vol:
    E95-D No:6
      Page(s):
    1698-1701

    In this letter, we analyze the influence of motion and out-of-focus blur on both frequency spectrum and cepstrum of an iris image. Based on their characteristics, we define two new discriminative blur features represented by Energy Spectral Density Distribution (ESDD) and Singular Cepstrum Histogram (SCH). To merge the two features for blur detection, a merging kernel which is a linear combination of two kernels is proposed when employing Support Vector Machine. Extensive experiments demonstrate the validity of our method by showing the improved blur detection performance on both synthetic and real datasets.
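
The idea of merging two kernels by a linear combination can be sketched as follows. This is only an illustration under stated assumptions: the choice of RBF base kernels, the convex weighting, and every name and parameter (`rbf_kernel`, `merged_kernel`, `weight`, the `gamma_*` values) are hypothetical and not taken from the letter.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def merged_kernel(
    esdd_a: Sequence[float],
    esdd_b: Sequence[float],
    sch_a: Sequence[float],
    sch_b: Sequence[float],
    weight: float = 0.5,
    gamma_esdd: float = 1.0,
    gamma_sch: float = 1.0,
) -> float:
    """Convex combination of one kernel per feature type (assumed form)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    k_esdd = rbf_kernel(esdd_a, esdd_b, gamma_esdd)
    k_sch = rbf_kernel(sch_a, sch_b, gamma_sch)
    return weight * k_esdd + (1.0 - weight) * k_sch


if __name__ == "__main__":
    # Two images described by hypothetical ESDD and SCH feature vectors.
    print(merged_kernel([0.2, 0.8], [0.3, 0.7], [1.0, 0.0], [0.9, 0.1]))
```

A non-negative weighted sum of valid kernels is itself a valid kernel, which is why such a combination can be handed to a Support Vector Machine as a precomputed kernel.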

  • Automatic IQ Imbalance Compensation Technique for Quadrature Modulator by Single-Tone Testing

    Minseok KIM  Yohei KONISHI  Jun-ichi TAKADA  Boxin GAO  

     
    LETTER-Wireless Communication Technologies

      Vol:
    E95-B No:5
      Page(s):
    1864-1868

    This letter proposes an automatic IQ imbalance compensation technique for quadrature modulators by means of spectrum measurement of RF signal using a spectrum analyzer. The analyzer feeds back only magnitude information of the frequency spectrum of the signal. To realize IQ imbalance compensation, the conventional method of steepest descent is modified; the descent direction is empirically determined and a variable step-size is introduced for accelerating convergence. The experimental results for a four-channel transmitter operating at 11 GHz are presented for verification.

  • Speaker Change Detection Based on a Weighted Distance Measure over the Centroid Model

    Jin Soo SEO  

     
    LETTER-Speech and Hearing

      Vol:
    E95-D No:5
      Page(s):
    1543-1546

    Speaker change detection involves the identification of the time indices of an audio stream, where the identity of the speaker changes. This paper proposes novel measures for speaker change detection over the centroid model, which divides the feature space into non-overlapping clusters for effective speaker-change comparison. The centroid model is a computationally-efficient variant of the widely-used mixture-distribution based background models for speaker recognition. Experiments on both synthetic and real-world data were performed; the results show that the proposed approach yields promising results compared with the conventional statistical measures.

  • A Correlation-Based Watermarking Technique of 3-D Meshes via Cyclic Signal Processing

    Toshiyuki UTO  Yuka TAKEMURA  Hidekazu KAMITANI  Kenji OHUE  

     
    PAPER-Image Processing

      Vol:
    E95-D No:5
      Page(s):
    1272-1279

    This paper describes a blind watermarking scheme through cyclic signal processing. With the spread of high-speed networks, there is a growing demand for copyright protection of multimedia data. For efficient watermarking of images, there are two major approaches: a quantization-based method and a correlation-based method. In this paper, we propose a correlation-based watermarking technique for three-dimensional (3-D) polygonal models using fast Fourier transforms (FFTs). To generate a watermark with desirable properties, similar to a pseudonoise signal, an impulse signal on a two-dimensional (2-D) space is spread through the FFT, multiplication by a complex sinusoid signal, and the inverse FFT. This watermark, i.e., the spread impulse signal, in a transform domain is converted to a spatial domain by an inverse wavelet transform, and embedded into 3-D data aligned by principal component analysis (PCA). In the detection procedure, after realigning the watermarked mesh model through PCA, we map the 3-D data onto the 2-D space via block segmentation and an averaging operation. The 2-D data are processed by the inverse system, i.e., the FFT, division by the complex sinusoid signal, and the inverse FFT. From the resulting 2-D signal, we detect the position of the maximum value as a signature. For 3-D bunny models, detection rates and information capacity are shown to evaluate the performance of the proposed method.

  • Automatic Determination of the Appropriate Number of Clusters for Multispectral Image Data

    Kitti KOONSANIT  Chuleerat JARUSKULCHAI  

     
    PAPER-Image Processing

      Vol:
    E95-D No:5
      Page(s):
    1256-1263

    Nowadays, clustering is a popular tool for exploratory data analysis, with one technique being K-means clustering. Determining the appropriate number of clusters is a significant problem in K-means clustering because the results of the K-means technique depend on the number of clusters chosen. The appropriate number of clusters is often needed in advance as an input parameter to the K-means algorithm, so automatic determination of it is desirable. We propose a new method for automatic determination of the appropriate number of clusters using an extended co-occurrence matrix technique, called a tri-co-occurrence matrix technique, for multispectral imagery in the pre-clustering steps. The proposed method was tested using a dataset with a known number of clusters. The experimental results were compared with ground truth images and evaluated in terms of accuracy, with the numerical result of the tri-co-occurrence providing an accuracy of 84.86%. The results from the tests confirmed the effectiveness of the proposed method in finding the appropriate number of clusters and were compared with the original co-occurrence matrix technique and other algorithms.

  • Selected Topics from LVCSR Research for Asian Languages at Tokyo Tech

    Sadaoki FURUI  

     
    PAPER-Speech Processing

      Vol:
    E95-D No:5
      Page(s):
    1182-1194

    This paper presents our recent work in regard to building Large Vocabulary Continuous Speech Recognition (LVCSR) systems for the Thai, Indonesian, and Chinese languages. For Thai, since there is no word boundary in the written form, we have proposed a new method for automatically creating word-like units from a text corpus, and applied topic and speaking style adaptation to the language model to recognize spoken-style utterances. For Indonesian, we have applied proper noun-specific adaptation to acoustic modeling, and rule-based English-to-Indonesian phoneme mapping to solve the problem of large variation in proper noun and English word pronunciation in a spoken-query information retrieval system. In spoken Chinese, long organization names are frequently abbreviated, and abbreviated utterances cannot be recognized if the abbreviations are not included in the dictionary. We have proposed a new method for automatically generating Chinese abbreviations, and by expanding the vocabulary using the generated abbreviations, we have significantly improved the performance of spoken query-based search.

  • A Schmitt Trigger Based SRAM with Vertical MOSFET

    Hyoungjun NA  Tetsuo ENDOH  

     
    PAPER

      Vol:
    E95-C No:5
      Page(s):
    792-801

    In this paper, a Schmitt Trigger based 10T SRAM (ST 10T SRAM) cell with the vertical MOSFET is proposed for low supply voltage operation, and its impacts on cell size, stability and speed performance are investigated. The proposed ST 10T SRAM cell with the vertical MOSFET achieves smaller cell size than the ST 10T SRAM cell with the conventional planar MOSFET. Moreover, the proposed SRAM cell realizes large and constant static noise margin (SNM) against bottom node resistance of the vertical MOSFET without any architectural changes from the present 6T SRAM architecture. The proposed SRAM cell also suppresses the degradation of the read time of the ST 10T SRAM cell due to the back-bias effect free characteristic of the vertical MOSFET. The proposed ST 10T SRAM cell with the vertical MOSFET is a superior SRAM cell for low supply voltage operation with a small cell size, stable operation, and fast speed performance with the present 6T SRAM architecture.

  • Comparative Analysis of Bandgap-Engineered Pillar Type Flash Memory with HfO2 and S3N4 as Trapping Layer

    Sang-Youl LEE  Seung-Dong YANG  Jae-Sub OH  Ho-Jin YUN  Kwang-Seok JEONG  Yu-Mi KIM  Hi-Deok LEE  Ga-Won LEE  

     
    PAPER

      Vol:
    E95-C No:5
      Page(s):
    831-836

    In this paper, we fabricated gate-all-around bandgap-engineered (BE) silicon-oxide-nitride-oxide-silicon (SONOS) and silicon-oxide-high-k-oxide-silicon (SOHOS) flash memory devices with a vertical silicon pillar type structure as a potential solution to scaling down. Silicon nitride (Si3N4) and hafnium oxide (HfO2) were used as trapping layers in the SONOS and SOHOS devices, respectively. The BE-SOHOS device has better electrical characteristics, such as a lower threshold voltage (VTH) of 0.16 V, a higher gm.max of 0.593 µA/V and an on/off current ratio of 5.76×10^8, than the BE-SONOS device. The memory characteristics of the BE-SONOS device, such as program/erase speed (P/E speed), endurance, and data retention, were compared with those of the BE-SOHOS device. The measured data show that the BE-SONOS device has good memory characteristics, such as program speed and data retention. Compared with the BE-SONOS device, the erase speed is enhanced about five times in BE-SOHOS, while the program speed and data retention characteristic are slightly worse, which can be explained by the many interface traps between the trapping layer and the tunneling oxide.

  • Two-Microphone Noise Reduction Using Spatial Information-Based Spectral Amplitude Estimation

    Kai LI  Yanmeng GUO  Qiang FU  Junfeng LI  Yonghong YAN  

     
    PAPER-Speech and Hearing

      Vol:
    E95-D No:5
      Page(s):
    1454-1464

    Traditional two-microphone noise reduction algorithms for dealing with highly nonstationary directional noises generally use direction-of-arrival or phase difference information. The performance of these algorithms deteriorates when diffuse noises coexist with nonstationary directional noises in realistic adverse environments. In this paper, we present a two-channel noise reduction algorithm using a spatial information-based speech estimator and a spatial-information-controlled soft-decision noise estimator to improve the noise reduction performance in realistic nonstationary noisy environments. A target presence probability estimator based on Bayes' rule, using both phase difference and magnitude squared coherence, is proposed for soft-decision noise estimation, so that the two cues can share complementary advantages when both directional noises and diffuse noises are present. The performance of the proposed two-microphone noise reduction algorithm is evaluated by noise reduction, log-spectral distance (LSD) and word recognition rate (WRR) of a distant-talking ASR system in a real room's noisy environment. Experimental results show that the proposed algorithm achieves better noise suppression than the comparative dual-channel noise reduction algorithms without further distorting the desired signal components.

  • Study on Resource Optimization for Heterogeneous Networks

    Gia Khanh TRAN  Shinichi TAJIMA  Rindranirina RAMAMONJISON  Kei SAKAGUCHI  Kiyomichi ARAKI  Shoji KANEKO  Noriaki MIYAZAKI  Satoshi KONISHI  Yoji KISHI  

     
    PAPER

      Vol:
    E95-B No:4
      Page(s):
    1198-1207

    This work studies the benefits of heterogeneous cellular networks with overlapping picocells in a large macrocell. We consider three different strategies for resource allocation and cell association. The first model employs a spectrum overlapping strategy with an SINR-based cell association. The second model avoids the interference between macrocell and picocell through a spectrum splitting strategy. Furthermore, picocell range expansion is also considered in this strategy to enable load balancing between the macrocell and picocells. The last model is a hybrid one, called the fractional spectrum splitting strategy, where the spectrum splitting strategy is applied only at the picocell edge, while the picocell interior reuses the spectrum of the macrocell. We construct a resource allocation optimization problem for each of these strategies to maximize the system rate. Our results show that, in terms of system rate, all three strategies outperform the macrocell-only case, which shows the benefit of heterogeneous networks. Moreover, the fractional spectrum splitting strategy provides the highest system rate at the expense of outage user rate degradation due to inter-macro-pico interference. The spectrum overlapping model provides the second highest system rate gain and also improves the outage user rate owing to full spectrum reuse and the benefit of macro diversity, while the spectrum splitting model achieves a moderate system rate gain.
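
SINR-based cell association with picocell range expansion can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's formulation: the bias-in-dB model of range expansion, the function name `associate`, and its parameters are all hypothetical.

```python
from typing import Mapping


def associate(
    macro_sinr_db: float,
    pico_sinr_db: Mapping[str, float],
    bias_db: float = 0.0,
) -> str:
    """Return the cell a user attaches to.

    The user picks the cell with the largest metric, where picocells get an
    extra bias_db added to their SINR (an assumed model of range expansion).
    With bias_db = 0 this reduces to plain SINR-based association.
    """
    best_cell, best_metric = "macro", macro_sinr_db
    for name, sinr_db in pico_sinr_db.items():
        metric = sinr_db + bias_db
        if metric > best_metric:
            best_cell, best_metric = name, metric
    return best_cell


if __name__ == "__main__":
    picos = {"pico1": 6.0, "pico2": 3.5}
    print(associate(10.0, picos))               # macro has the best raw SINR
    print(associate(10.0, picos, bias_db=6.0))  # bias offloads the user to pico1
```

A positive bias pushes users near a picocell boundary onto the picocell even when the macrocell signal is stronger, which is the load-balancing effect the abstract attributes to range expansion.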

  • Finding Incorrect and Missing Quality Requirements Definitions Using Requirements Frame

    Haruhiko KAIYA  Atsushi OHNISHI  

     
    PAPER

      Vol:
    E95-D No:4
      Page(s):
    1031-1043

    Defining quality requirements completely and correctly is more difficult than defining functional requirements because stakeholders do not state most quality requirements explicitly. We thus propose a method to measure a requirements specification for identifying the amount of quality requirements in the specification. We also propose another method to recommend quality requirements to be defined in such a specification. We expect that stakeholders can identify missing and unnecessary quality requirements when the measured quality requirements differ from the recommended ones. We use a semi-formal language called X-JRDL to represent requirements specifications because it is suitable for analyzing quality requirements. We applied our methods to a requirements specification, and found that they contribute to defining quality requirements more completely and correctly.

  • Development and Experimental Evaluation of Cyclostationarity-Based Signal Identification Equipment for Cognitive Radios

    Hiroki HARADA  Hiromasa FUJII  Shunji MIURA  Hidetoshi KAYAMA  Yoshiki OKANO  Tetsuro IMAI  

     
    PAPER

      Vol:
    E95-B No:4
      Page(s):
    1100-1108

    An important and widely considered signal identification technique for cognitive radios is cyclostationarity-based feature detection because this method does not require time and frequency synchronization and prior information except for information concerning cyclic autocorrelation features of target signals. This paper presents the development and experimental evaluation of cyclostationarity-based signal identification equipment. A spatial channel emulator is used in conjunction with the equipment that provides an environment to evaluate realistic spectrum sharing scenarios. The results reveal the effectiveness of the cyclostationarity-based signal identification methodology in realistic spectrum sharing scenarios, especially in terms of the capability to identify weak signals.

  • Sensing Methods for Detecting Analog Television Signals

    Mohammad Azizur RAHMAN  Chunyi SONG  Hiroshi HARADA  

     
    PAPER

      Vol:
    E95-B No:4
      Page(s):
    1066-1075

    This paper introduces a unified method of spectrum sensing for all existing analog television (TV) signals including NTSC, PAL and SECAM. We propose a correlation based method (CBM) with a single reference signal for sensing any analog TV signal. In addition, we propose an improved energy detection method. The CBM approach has been implemented in a hardware prototype specially designed for participating in the Singapore TV white space (WS) test trial conducted by the Infocomm Development Authority (IDA) of the Singapore government. Analytical and simulation results of the CBM method are presented in the paper, as well as hardware testing results for sensing various analog TV signals. Both AWGN and fading channels are considered. It is shown that the theoretical results closely match those from simulations. Sensing performance of the hardware prototype in a fading environment is also presented, obtained using a fading simulator. We present the performance of the proposed techniques in terms of probability of false alarm, probability of detection, sensing time, etc. We also present a comparative study of the various techniques.

  • Distributed Dynamic Spectrum Allocation for Secondary Users in a Vertical Spectrum Sharing Scenario Open Access

    Behtash BABADI  Vahid TAROKH  

     
    INVITED PAPER

      Vol:
    E95-B No:4
      Page(s):
    1044-1055

    In this paper, we study the problem of distributed spectrum allocation under a vertical spectrum sharing scenario in a cognitive radio network. The secondary users share the spectrum licensed to the primary user by observing the activity statistics of the primary users, and regulate their transmission strategy in order to abide by the spectrum sharing etiquette. When the primary user is inactive in a subset of the available frequency bands, from the perspective of the secondary users the problem reduces to a distributed horizontal spectrum sharing. For a specific class of networks, the latter problem is addressed by the recently proposed GADIA algorithm [1]. In this paper, we present analytical and numerical results on the performance of the GADIA algorithm in conjunction with the above-mentioned vertical spectrum sharing scenario. These results reveal near-optimal performance guarantees for the overall vertical spectrum sharing scenario.

  • Channel Assignment Algorithms for OSA-Enabled WLANs Exploiting Prioritization and Spectrum Heterogeneity

    Francisco NOVILLO  Ramon FERRUS  

     
    PAPER

      Vol:
    E95-B No:4
      Page(s):
    1125-1134

    Allowing WLANs to exploit opportunistic spectrum access (OSA) is a promising approach to alleviate spectrum congestion problems in overcrowded unlicensed ISM bands, especially in highly dense WLAN deployments. In this context, novel channel assignment mechanisms jointly considering available channels in both unlicensed ISM and OSA-enabled licensed bands are needed. Unlike classical schemes proposed for legacy WLANs, channel assignment mechanisms for OSA-enabled WLANs must address two distinguishing issues: channel prioritization and spectrum heterogeneity. The first refers to the fact that additional prioritization criteria other than interference conditions should be considered when choosing between ISM and licensed band channels. The second refers to the fact that channel availability might not be the same for all WLAN Access Points because of primary users' activity in the OSA-enabled bands. This paper first formulates the channel assignment problem for OSA-enabled WLANs as a Binary Linear Programming (BLP) problem. The resulting BLP problem is optimally solved by means of branch and bound algorithms and used as a benchmark to develop more computationally efficient heuristics. On this basis, a novel channel assignment algorithm based on weighted graph coloring heuristics and able to exploit both channel prioritization and spectrum heterogeneity is proposed. The algorithm is evaluated under different conditions of AP density and primary band availability.

  • PSD Map Construction Scheme Based on Compressive Sensing in Cognitive Radio Networks

    Javad Afshar JAHANSHAHI  Mohammad ESLAMI  Seyed Ali GHORASHI  

     
    PAPER

      Vol:
    E95-B No:4
      Page(s):
    1056-1065

    Of late, many researchers have been interested in sparse representation of signals and its applications, such as Compressive Sensing in Cognitive Radio (CR) networks, as a way of overcoming the issue of limited bandwidth. Compressive sensing based wideband spectrum sensing is a novel approach in cognitive radio systems. In these systems, spatial-frequency opportunistic reuse has also emerged as an attractive option, enabled by constructing and deploying spatial-frequency Power Spectral Density (PSD) maps. Since the CR sensors are distributed in the region of support, the PSD sensed by each sensor should be transmitted to a master node (base station) in order to construct the PSD maps in the space and frequency domains. When the number of sensors is large, the data transmission required for construction of the PSD map can be challenging. In this paper, a compressive sensing based scheme is used to transmit the CR sensors' data to the master node. The measurements are therefore sampled at a rate lower than the Nyquist rate. By using the proposed method, an acceptable PSD map for cognitive radio purposes can be achieved with only 30% of the full data transmission. Also, simulation results show the robustness of the proposed method against channel variations in comparison with classical methods. Different solution schemes such as Basis Pursuit, Lasso, Lars and Orthogonal Matching Pursuit are used, and their performance is evaluated by simulations over a Rician channel for several different compression ratios and Signal to Noise Ratios. It is also shown that the Basis Pursuit and Lasso methods outperform the other methods, particularly at higher compression rates.

  • Design and Performance of Overlap FFT Filter-Bank for Dynamic Spectrum Access Applications

    Motohiro TANABE  Masahiro UMEHIRA  

     
    PAPER

      Vol:
    E95-B No:4
      Page(s):
    1249-1255

    An OFDMA-based (Orthogonal Frequency Division Multiple Access-based) channel access scheme for dynamic spectrum access has the drawbacks of large PAPR (Peak to Average Power Ratio) and large ACI (Adjacent Channel Interference). To solve these problems, a flexible channel access scheme using an overlap FFT filter-bank was proposed based on single carrier modulation for dynamic spectrum access. In order to apply the overlap FFT filter-bank for dynamic spectrum access, it is necessary to clarify the performance of the overlap FFT filter-bank according to the design parameters since its frequency characteristics are critical for dynamic spectrum access applications. This paper analyzes the overlap FFT filter-bank and evaluates its performance such as frequency characteristics and ACI performance according to the design parameters.
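
The PAPR drawback mentioned in this abstract can be made concrete with a small calculation. The sketch below is illustrative only and is not the paper's filter-bank: the helper names `tones` and `papr_db`, the 16-sample block, and the choice of four in-phase subcarriers are assumptions made for the example.

```python
import cmath
import math
from typing import List, Sequence


def tones(bins: Sequence[int], n: int) -> List[complex]:
    """Sum of unit-amplitude complex tones at the given DFT bins, n samples."""
    return [
        sum(cmath.exp(2j * math.pi * b * k / n) for b in bins)
        for k in range(n)
    ]


def papr_db(samples: Sequence[complex]) -> float:
    """Peak-to-average power ratio of a sampled signal, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    if not powers:
        raise ValueError("samples must be non-empty")
    mean_power = sum(powers) / len(powers)
    if mean_power == 0.0:
        raise ValueError("signal has zero power")
    return 10.0 * math.log10(max(powers) / mean_power)


if __name__ == "__main__":
    single = tones([1], 16)          # constant envelope, single carrier
    multi = tones([1, 2, 3, 4], 16)  # four in-phase subcarriers add up at k = 0
    print(f"single carrier: {papr_db(single):.2f} dB")
    print(f"four carriers:  {papr_db(multi):.2f} dB")
```

A single complex tone has a constant envelope, while several subcarriers that align in phase produce a peak well above the average power, which is the effect that motivates single carrier modulation in the abstract.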

  • Search-Free Codebook Mapping for Artificial Bandwidth Extension

    Heewan PARK  Byungsik YOON  Sangwon KANG  Andreas SPANIAS  

     
    LETTER-Multimedia Systems for Communications

      Vol:
    E95-B No:4
      Page(s):
    1479-1482

    A new codebook mapping algorithm for artificial bandwidth extension (ABE) is introduced in this paper. We design a wideband line spectrum pair (LSP) codebook which is coupled with the same index as the LSP codebook of a narrowband speech codec. The received narrowband LSP codebook indices are used to directly induce wideband LSP codewords. Thus, the proposed scheme eliminates codebook search processing to estimate the wideband spectrum envelope. We apply the proposed scheme to bandwidth extension in adaptive multi-rate (AMR) compressed domain. Its performance is assessed via the perceptual evaluation of speech quality (PESQ), informal listening tests, and weighted million operations per second (WMOPS) calculations.
