
Keyword Search Result

[Keyword] SI(16314hit)

2941-2960hit(16314hit)

  • One-Bit to Four-Bit Dual Conversion for Security Enhancement against Power Analysis

    Seungkwang LEE  Nam-Su JHO  

     
    PAPER-Cryptography and Information Security

      Vol:
    E99-A No:10
      Page(s):
    1833-1842

    Power analysis exploits the leaked information gained from cryptographic devices including, but not limited to, power consumption generated during cryptographic operations. If a number of power traces are given to an attacker, it is possible to reveal a cryptographic key efficiently, sometimes within a few minutes, using various statistical methods. In this sense, software countermeasures including higher-order masking or software dual-rail with precharge logic have been proposed to produce randomized or constant power consumption during the key-dependent operations. However, they have critical disadvantages in terms of computational time and security. In this paper, we propose a new solution called “one-bit to four-bit dual conversion” for enhanced security against power analysis. For an exemplary embodiment of the proposed scheme, we apply it to an AES implementation and demonstrate its security and performance. The overall costs are approximately 148 KB of memory space for the lookup tables and about a 3-fold increase in execution time compared with the straightforward implementation of AES.

  • Investigation of Using Continuous Representation of Various Linguistic Units in Neural Network Based Text-to-Speech Synthesis

    Xin WANG  Shinji TAKAKI  Junichi YAMAGISHI  

     
    PAPER-Speech synthesis

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2471-2480

    Building high-quality text-to-speech (TTS) systems without expert knowledge of the target language and/or time-consuming manual annotation of speech and text data is an important yet challenging research topic. In this kind of TTS system, it is vital to find a representation of the input text that is both effective and easy to acquire. Recently, the continuous representation of raw word inputs, called “word embedding”, has been successfully used in various natural language processing tasks. It has also been used as an additional or alternative linguistic input feature to a neural-network-based acoustic model for TTS systems. In this paper, we further investigate the use of this embedding technique to represent phonemes, syllables and phrases for the acoustic model based on the recurrent and feed-forward neural network. Results of the experiments show that most of these continuous representations cannot significantly improve the system's performance when they are fed into the acoustic model either as an additional component or as a replacement for the conventional prosodic context. However, subjective evaluation shows that the continuous representation of phrases can achieve significant improvement when it is combined with the prosodic context as input to the acoustic model based on the feed-forward neural network.

  • Statistical Bandwidth Extension for Speech Synthesis Based on Gaussian Mixture Model with Sub-Band Basis Spectrum Model

    Yamato OHTANI  Masatsune TAMURA  Masahiro MORITA  Masami AKAMINE  

     
    PAPER-Voice conversion

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2481-2489

    This paper describes a novel statistical bandwidth extension (BWE) technique based on a Gaussian mixture model (GMM) and a sub-band basis spectrum model (SBM), in which each dimensional component represents a specific acoustic space in the frequency domain. The proposed method can achieve the BWE from speech data with an arbitrary frequency bandwidth whereas the conventional methods perform the conversion from fixed narrow-band data. In the proposed method, we train a GMM with SBM parameters extracted from full-band spectra in advance. According to the bandwidth of input signal, the trained GMM is reconstructed to the GMM of the joint probability density between low-band SBM and high-band SBM components. Then high-band SBM components are estimated from low-band SBM components of the input signal based on the reconstructed GMM. Finally, BWE is achieved by adding the spectra decoded from estimated high-band SBM components to the ones of the input signal. To construct the full-band signal from the narrow-band one, we apply this method to log-amplitude spectra and aperiodic components. Objective and subjective evaluation results show that the proposed method extends the bandwidth of speech data robustly for the log-amplitude spectra. Experimental results also indicate that the aperiodic component extracted from the upsampled narrow-band signal realizes the same performance as the restored and the full-band aperiodic components in the proposed method.

  • A Statistical Sample-Based Approach to GMM-Based Voice Conversion Using Tied-Covariance Acoustic Models

    Shinnosuke TAKAMICHI  Tomoki TODA  Graham NEUBIG  Sakriani SAKTI  Satoshi NAKAMURA  

     
    PAPER-Voice conversion

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2490-2498

    This paper presents a novel statistical sample-based approach for Gaussian Mixture Model (GMM)-based Voice Conversion (VC). Although GMM-based VC has the promising flexibility of model adaptation, quality in converted speech is significantly worse than that of natural speech. This paper addresses the problem of inaccurate modeling, which is one of the main reasons causing the quality degradation. Recently, we have proposed statistical sample-based speech synthesis using rich context models for high-quality and flexible Hidden Markov Model (HMM)-based Text-To-Speech (TTS) synthesis. This method makes it possible not only to produce high-quality speech by introducing ideas from unit selection synthesis, but also to preserve flexibility of the original HMM-based TTS. In this paper, we apply this idea to GMM-based VC. The rich context models are first trained for individual joint speech feature vectors, and then we gather them mixture by mixture to form a Rich context-GMM (R-GMM). In conversion, an iterative generation algorithm using R-GMMs is used to convert speech parameters, after initialization using over-trained probability distributions. Because the proposed method utilizes individual speech features, and its formulation is the same as that of conventional GMM-based VC, it makes it possible to produce high-quality speech while keeping flexibility of the original GMM-based VC. The experimental results demonstrate that the proposed method yields significant improvements in terms of speech quality and speaker individuality in converted speech.

  • Acoustic Scene Analysis Based on Hierarchical Generative Model of Acoustic Event Sequence

    Keisuke IMOTO  Suehiro SHIMAUCHI  

     
    PAPER-Acoustic event detection

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2539-2549

    We propose a novel method for estimating acoustic scenes such as user activities, e.g., “cooking,” “vacuuming,” “watching TV,” or situations, e.g., “being on the bus,” “being in a park,” “meeting,” utilizing the information of acoustic events. There are some methods for estimating acoustic scenes that associate a combination of acoustic events with an acoustic scene. However, the existing methods cannot adequately express acoustic scenes, e.g., “cooking,” that have more than one subordinate category, e.g., “frying ingredients” or “plating food,” because they directly associate acoustic events with acoustic scenes. In this paper, we propose an acoustic scene estimation method based on a hierarchical probabilistic generative model of an acoustic event sequence taking into account the relation among acoustic scenes, their subordinate categories, and acoustic event sequences. In the proposed model, each acoustic scene is represented as a probability distribution over their unsupervised subordinate categories, called “acoustic sub-topics,” and each acoustic sub-topic is represented as a probability distribution over acoustic events. Acoustic scene estimation experiments with real-life sounds showed that the proposed method could correctly extract subordinate categories of acoustic scenes.

  • Re-Ranking Approach of Spoken Term Detection Using Conditional Random Fields-Based Triphone Detection

    Naoki SAWADA  Hiromitsu NISHIZAKI  

     
    PAPER-Spoken term detection

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2518-2527

    This study proposes a two-pass spoken term detection (STD) method. The first pass uses a phoneme-based dynamic time warping (DTW)-based STD, and the second pass recomputes detection scores produced by the first pass using conditional random fields (CRF)-based triphone detectors. In the second-pass, we treat STD as a sequence labeling problem. We use CRF-based triphone detection models based on features generated from multiple types of phoneme-based transcriptions. The models train recognition error patterns such as phoneme-to-phoneme confusions in the CRF framework. Consequently, the models can detect a triphone comprising a query term with a detection probability. In the experimental evaluation of two types of test collections, the CRF-based approach worked well in the re-ranking process for the DTW-based detections. CRF-based re-ranking showed 2.1% and 2.0% absolute improvements in F-measure for each of the two test collections.

  • HISTORY: An Efficient and Robust Algorithm for Noisy 1-Bit Compressed Sensing

    Biao SUN  Hui FENG  Xinxin XU  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2016/07/06
      Vol:
    E99-D No:10
      Page(s):
    2566-2573

    We consider the problem of sparse signal recovery from 1-bit measurements. Due to the noise present in the acquisition and transmission process, some quantized bits may be flipped to their opposite states. These sign flips may result in severe performance degradation. In this study, a novel algorithm, termed HISTORY, is proposed. It consists of Hamming support detection and coefficients recovery. The HISTORY algorithm has high recovery accuracy and is robust to strong measurement noise. Numerical results are provided to demonstrate the effectiveness and superiority of the proposed algorithm.
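
The measurement setting described in this abstract can be illustrated with a minimal sketch. It only models 1-bit (sign) measurements of a sparse signal and random sign flips; it does not implement HISTORY, whose details are not given here. The names `one_bit_measure`, `flip_signs`, and `hamming_distance`, the Gaussian sensing matrix, and the sizes `n`, `m`, `k` are all assumptions chosen for illustration.

```python
import random

def sign(v):
    # Map a real value to a 1-bit measurement in {-1, +1}.
    return 1 if v >= 0 else -1

def one_bit_measure(A, x):
    # Keep only the sign of each linear measurement <a_i, x>.
    return [sign(sum(a * xi for a, xi in zip(row, x))) for row in A]

def flip_signs(y, n_flips, rng):
    # Model noise by flipping n_flips distinct bits to their opposite state.
    flipped = list(y)
    for i in rng.sample(range(len(y)), n_flips):
        flipped[i] = -flipped[i]
    return flipped

def hamming_distance(u, v):
    # Number of positions where two sign vectors disagree.
    return sum(1 for a, b in zip(u, v) if a != b)

rng = random.Random(0)          # fixed seed for repeatability
n, m, k = 50, 200, 5            # hypothetical dimension, measurements, sparsity

x = [0.0] * n                   # k-sparse signal
for i in rng.sample(range(n), k):
    x[i] = rng.gauss(0.0, 1.0)

A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]

y_clean = one_bit_measure(A, x)
y_noisy = flip_signs(y_clean, 20, rng)
```

Because only signs are kept, scaling `x` by a positive constant leaves `y_clean` unchanged, so the amplitude of the signal cannot be recovered from these measurements alone. The Hamming distance between `y_clean` and `y_noisy` equals the number of flipped bits.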

  • Automatic Model Order Selection for Convolutive Non-Negative Matrix Factorization

    Yinan LI  Xiongwei ZHANG  Meng SUN  Chong JIA  Xia ZOU  

     
    LETTER-Speech and Hearing

      Vol:
    E99-A No:10
      Page(s):
    1867-1870

    Exploring a parsimonious model that is just enough to represent the temporal dependency of time-series signals such as audio or speech is a practical requirement for many signal processing applications. A well-suited method for intuitively and efficiently representing magnitude spectra is to use convolutive non-negative matrix factorization (CNMF) to discover the temporal relationship among nearby frames. However, the model order selection problem in CNMF, i.e., the choice of the number of convolutive bases, has seldom been investigated. In this paper, we propose a novel Bayesian framework that can automatically learn the optimal model order through maximum a posteriori (MAP) estimation. The proposed method yields a parsimonious and low-rank approximation by removing the redundant bases iteratively. We conducted intuitive experiments to show that the proposed algorithm is very effective in automatically determining the correct model order.

  • Improved End-to-End Speech Recognition Using Adaptive Per-Dimensional Learning Rate Methods

    Xuyang WANG  Pengyuan ZHANG  Qingwei ZHAO  Jielin PAN  Yonghong YAN  

     
    LETTER-Acoustic modeling

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2550-2553

    The introduction of deep neural networks (DNNs) leads to a significant improvement of the automatic speech recognition (ASR) performance. However, the whole ASR system remains sophisticated because of its dependence on the hidden Markov model (HMM). Recently, a new end-to-end ASR framework, which utilizes recurrent neural networks (RNNs) to directly model context-independent targets with a connectionist temporal classification (CTC) objective function, has been proposed and achieves results comparable to those of the hybrid HMM/DNN system. In this paper, we investigate per-dimensional learning rate methods, ADAGRAD and ADADELTA included, to improve the recognition of the end-to-end system, based on the fact that the blank symbol used in the CTC technique dominates the output and these methods give frequent features small learning rates. Experimental results show that more than 4% relative reduction of word error rate (WER) as well as 5% absolute improvement of label accuracy on the training set are achieved when using ADADELTA, and fewer epochs of training are needed.
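
The remark that per-dimensional methods "give frequent features small learning rates" can be seen in a minimal sketch of the standard ADAGRAD update. This is not the authors' training setup: the two-dimensional toy problem, the unit gradients, and the names `adagrad_step` and `effective_rate` are assumptions for illustration. Dimension 0 plays the role of a frequently active output (such as the CTC blank symbol) and dimension 1 a rare one.

```python
import math

def adagrad_step(params, grads, accum, lr=0.1, eps=1e-8):
    # Standard ADAGRAD: accumulate squared gradients per dimension and
    # divide the step for each dimension by the root of its accumulator.
    for i, g in enumerate(grads):
        accum[i] += g * g
        params[i] -= lr * g / (math.sqrt(accum[i]) + eps)
    return params, accum

def effective_rate(accum, i, lr=0.1, eps=1e-8):
    # Learning rate currently applied to dimension i.
    return lr / (math.sqrt(accum[i]) + eps)

params = [0.0, 0.0]
accum = [0.0, 0.0]

for step in range(100):
    g_frequent = 1.0                           # active on every step
    g_rare = 1.0 if step % 10 == 0 else 0.0    # active on every 10th step
    adagrad_step(params, [g_frequent, g_rare], accum)

rate_frequent = effective_rate(accum, 0)
rate_rare = effective_rate(accum, 1)
```

After the loop, the frequently updated dimension has the larger accumulator and therefore the smaller effective learning rate, which is the property the abstract relies on.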

  • Short Text Classification Based on Distributional Representations of Words

    Chenglong MA  Qingwei ZHAO  Jielin PAN  Yonghong YAN  

     
    LETTER-Text classification

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2562-2565

    Short texts usually encounter the problem of data sparseness, as they do not provide sufficient term co-occurrence information. In this paper, we show how to mitigate the problem in short text classification through word embeddings. We assume that a short text document is a specific sample of one distribution in a Gaussian-Bayesian framework. Furthermore, a fast clustering algorithm is utilized to expand and enrich the context of short text in embedding space. This approach is compared with those based on the classical bag-of-words approaches and neural network based methods. Experimental results validate the effectiveness of the proposed method.

  • Virtual Sensor Idea-Based Geolocation Using RF Multipath Diversity

    Zhigang CHEN  Lei WANG  He HUANG  Guomei ZHANG  

     
    PAPER-Digital Signal Processing

      Vol:
    E99-A No:10
      Page(s):
    1799-1805

    A novel virtual sensors-based positioning method has been presented in this paper, which can make use of both direct paths and indirect paths. By integrating the virtual sensor idea and Bayesian state and observation framework, this method models the indirect paths corresponding to persistent virtual sensors as virtual direct paths and further reformulates the wireless positioning problem as the maximum likelihood estimation of both the mobile terminal's positions and the persistent virtual sensors' positions. Then the method adopts the EM (Expectation Maximization) and the particle filtering schemes to estimate the virtual sensors' positions and finally exploits not only the direct paths' measurements but also the indirect paths' measurements to realize the mobile terminal's positions estimation, thus achieving better positioning performance. Simulation results demonstrate the effectiveness of the proposed method.

  • Channel Impulse Response Measurements-Based Location Estimation Using Kernel Principal Component Analysis

    Zhigang CHEN  Xiaolei ZHANG  Hussain KHURRAM  He HUANG  Guomei ZHANG  

     
    LETTER-Digital Signal Processing

      Vol:
    E99-A No:10
      Page(s):
    1876-1880

    In this letter, a novel channel impulse response (CIR)-based fingerprinting positioning method using kernel principal component analysis (KPCA) has been proposed. During the offline phase of the proposed method, a survey is performed to collect all CIRs from access points, and a fingerprint database is constructed, which has vectors including CIR and physical location. During the online phase, KPCA is first employed to solve the nonlinearity and complexity in the CIR-position dependencies and extract the principal nonlinear features in CIRs, and support vector regression is then used to adaptively learn the regression function between the KPCA components and physical locations. In addition, the iterative narrowing-scope step is further used to refine the estimation. The performance comparison shows that the proposed method outperforms the traditional received-signal-strength-based positioning methods.

  • Side-Lobe Reduced, Circularly Polarized Patch Array Antenna for Synthetic Aperture Radar Imaging

    Mohd Zafri BAHARUDDIN  Yuta IZUMI  Josaphat Tetuko Sri SUMANTYO   YOHANDRI  

     
    PAPER

      Vol:
    E99-C No:10
      Page(s):
    1174-1181

    Antenna radiation patterns have side-lobes that add to ambiguity in the form of ghosting and object repetition in SAR images. An L-band 1.27 GHz, 2×5 element proximity-coupled corner-truncated patch array antenna synthesized using the Dolph-Chebyshev method to reduce side-lobe levels is proposed. The designed antenna was simulated, optimized, and fabricated for antenna performance parameter measurements. Antenna performance characteristics show good agreement with simulated results. A set of antennas was fabricated and then used together with a custom synthetic aperture radar system, and SAR imaging was performed on a point target in an anechoic chamber. Imaging results are also discussed in this paper showing improvement in image output. The antenna and its connected SAR systems developed in this work are different from most previous work in that this work is utilizing circular polarization as opposed to linear polarization.

  • A 10-bit 6.8-GS/s Direct Digital Frequency Synthesizer Employing Complementary Dual-Phase Latch-Based Architecture

    Abdel MARTINEZ ALONSO  Masaya MIYAHARA  Akira MATSUZAWA  

     
    PAPER

      Vol:
    E99-C No:10
      Page(s):
    1200-1210

    This paper introduces a novel Direct Digital Frequency Synthesizer based on a Complementary Dual-Phase Latch-Based sequencing method. Compared to a conventional Direct Digital Frequency Synthesizer using Flip-Flops as synchronizing elements, the proposed architecture allows the data sampling rate to be doubled while trading off area and Power Efficiency. Digital domain modulations can be easily implemented by using a Direct Digital Frequency Synthesizer. However, due to performance limitations, CMOS-based applications have been almost exclusively restricted to VHF, UHF and L bands. This work aims to increase the operation speed and extend the applicability of this technology to Multi-band Multi-standard wireless systems operating up to 2.7 GHz. The design features a 24-bit pipelined Phase Accumulator and a 14x10 bits Phase to Amplitude Converter. The Phase to Amplitude Converter module is compressed by using the Quarter Wave Symmetry technique and is entirely made up of combinational logic inserted into 12 Complementary Dual-Phase Latch-Based pipeline stages. The logic is represented in the form of Sum of Product terms obtained from a 14x10 bits sinusoidal Look-Up-Table. The proposed Direct Digital Frequency Synthesizer is designed and simulated based on 65nm CMOS standard-cell technology. A maximum data sampling rate of 6.8 GS/s is expected. Estimated Spurious Free Dynamic Range and Power Efficiency are 61 dBc and 22 mW/(GS/s) respectively.
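
A behavioral sketch may help illustrate the phase accumulator and quarter-wave symmetry mentioned in this abstract. It is not the authors' circuit: the paper realizes the converter as pipelined combinational logic, whereas this sketch uses a plain table. It assumes that "14x10 bits" means a 14-bit phase input and a 10-bit signed amplitude output, and the half-step table offset and all function names are illustrative choices.

```python
import math

ACC_BITS = 24      # phase accumulator width (from the abstract)
PHASE_BITS = 14    # assumed phase word width into the converter
AMP_BITS = 10      # assumed signed output amplitude width

QUARTER_BITS = PHASE_BITS - 2            # the two MSBs select the quadrant
QUARTER_SIZE = 1 << QUARTER_BITS
AMP_MAX = (1 << (AMP_BITS - 1)) - 1      # 511 for a signed 10-bit output

# Only one quarter of the sine period is stored; samples sit at
# half-step offsets so that mirroring the index is symmetric.
QUARTER_LUT = [
    round(AMP_MAX * math.sin(2 * math.pi * (i + 0.5) / (1 << PHASE_BITS)))
    for i in range(QUARTER_SIZE)
]

def phase_to_amplitude(phase):
    # Rebuild the full sine wave from the quarter table.
    quadrant = phase >> QUARTER_BITS
    index = phase & (QUARTER_SIZE - 1)
    if quadrant & 1:                     # quadrants 1 and 3: mirror the index
        index = QUARTER_SIZE - 1 - index
    value = QUARTER_LUT[index]
    return -value if quadrant & 2 else value   # quadrants 2 and 3: negate

def ddfs(fcw, n_samples):
    # Add the frequency control word each clock; truncate to PHASE_BITS.
    acc = 0
    out = []
    for _ in range(n_samples):
        out.append(phase_to_amplitude(acc >> (ACC_BITS - PHASE_BITS)))
        acc = (acc + fcw) & ((1 << ACC_BITS) - 1)
    return out

def output_frequency(fcw, f_clk):
    # Standard DDFS relation: f_out = fcw * f_clk / 2^ACC_BITS.
    return fcw * f_clk / (1 << ACC_BITS)
```

With these assumptions, the table holds 4096 entries instead of 16384, and a 24-bit accumulator clocked at 6.8 GS/s gives a tuning step of `output_frequency(1, 6.8e9)`, roughly 405 Hz.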

  • A Fully Canonical Bandpass Filter Design Using Microstrip Transversal Resonator Array Configuration

    Masataka OHIRA  Toshiki KATO  Zhewang MA  

     
    PAPER

      Vol:
    E99-C No:10
      Page(s):
    1122-1129

    This paper proposes a new and simple microstrip bandpass filter structure for the design of a fully canonical transversal array filter. The filter is constructed by the parallel arrangement of microstrip even- and odd-mode half-wavelength resonators. In this filter, transmission zeros (TZs) are not produced by cross couplings used in conventional filter designs, but by an intrinsic negative coupling of the odd-mode resonators having open ends with respect to the even-mode resonators with shorted ends. Thus, the control of the resonant frequency and the external Q factor of each resonator makes it possible to form both a specified passband and TZs. As an example, a fully canonical bandpass filter with 2-GHz center frequency, 6% bandwidth, and four TZs is synthesized with a coupling-matrix optimization, and its structural parameters are designed. The designed filter achieves a rapid roll-off and low-loss passband response, which can be confirmed numerically and experimentally.

  • Cooperative Path Selection Framework for Effective Data Gathering in UAV-Aided Wireless Sensor Networks

    Sotheara SAY  Mohamad Erick ERNAWAN  Shigeru SHIMAMOTO  

     
    PAPER

      Vol:
    E99-B No:10
      Page(s):
    2156-2167

    Sensor networks are often used to understand underlying phenomena that are reflected through sensing data. In real world applications, this understanding supports decision makers attempting to access a disaster area or monitor a certain event regularly and thus necessary actions can be triggered in response to the problems. Practitioners designing such systems must overcome difficulties due to the practical limitations of the data and the fidelity of a network condition. This paper explores the design of a network solution for the data acquisition domain with the goal of increasing the efficiency of data gathering efforts. An unmanned aerial vehicle (UAV) is introduced to address various real-world sensor network challenges such as limited resources, lack of real-time representative data, and mobility of a relay station. Towards this goal, we introduce a novel cooperative path selection framework to effectively collect data from multiple sensor sources. The framework consists of six main parts ranging from the system initialization to the UAV data acquisition. The UAV data acquisition is useful to increase situational awareness or used as inputs for data manipulation that support response efforts. We develop a system-based simulation that creates the representative sensor networks and uses the UAV for collecting data packets. Results using our proposed framework are analyzed and compared to existing approaches to show the efficiency of the scheme.

  • Spoken Term Detection Using SVM-Based Classifier Trained with Pre-Indexed Keywords

    Kentaro DOMOTO  Takehito UTSURO  Naoki SAWADA  Hiromitsu NISHIZAKI  

     
    PAPER-Spoken term detection

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2528-2538

    This study presents a two-stage spoken term detection (STD) method that uses the same STD engine twice and a support vector machine (SVM)-based classifier to verify detected terms from the STD engine's output. In a front-end process, the STD engine is used to pre-index target spoken documents from a keyword list built from an automatic speech recognition result. The STD result includes a set of keywords and their detection intervals (positions) in the spoken documents. For keywords having competitive intervals, we rank them based on the STD matching cost and select the one having the longest duration among competitive detections. The selected keywords are registered in the pre-index. They are then used to train an SVM-based classifier. In a query term search process, a query term is searched by the same STD engine, and the output candidates are verified by the SVM-based classifier. Our proposed two-stage STD method with pre-indexing was evaluated using the NTCIR-10 SpokenDoc-2 STD task and it drastically outperformed the traditional STD method based on dynamic time warping and a confusion network-based index.

  • Deforming Pyramid: Multiscale Image Representation Using Pixel Deformation and Filters for Non-Equispaced Signals

    Saho YAGYU  Akie SAKIYAMA  Yuichi TANAKA  

     
    PAPER

      Vol:
    E99-A No:9
      Page(s):
    1646-1654

    We propose an edge-preserving multiscale image decomposition method using filters for non-equispaced signals. It is inspired by the domain transform, which is a high-speed edge-preserving smoothing method, and it can be used in many image processing applications. One of the disadvantages of the domain transform is sensitivity to noise. Even though the proposed method is based on non-equispaced filters similar to the domain transform, it is robust to noise since it employs a multiscale decomposition. It uses the Laplacian pyramid scheme to decompose an input signal into the piecewise-smooth components and detail components. We design the filters by using an optimization based on edge-preserving smoothing with a conversion of signal distances and filters taking into account the distances between signal intervals. In addition, we also propose construction methods of filters for non-equispaced signals by using arbitrary continuous filters or graph spectral filters in order that various filters can be accommodated by the proposed method. As expected, we find that, similar to state-of-the-art edge-preserving smoothing techniques, including the domain transform, our approach can be used in many applications. We evaluated its effectiveness in edge-preserving smoothing of noise-free and noisy images, detail enhancement, pencil drawing, and stylization.

  • Sparse-Graph Codes and Peeling Decoder for Compressed Sensing

    Weijun ZENG  Huali WANG  Xiaofu WU  Hui TIAN  

     
    LETTER-Digital Signal Processing

      Vol:
    E99-A No:9
      Page(s):
    1712-1716

    In this paper, we propose a compressed sensing scheme using sparse-graph codes and peeling decoder (SGPD). By using a mix method for construction of sensing matrices proposed by Pawar and Ramchandran, it generates local sensing matrices and implements sensing and signal recovery in an adaptive manner. Then, we show how to optimize the construction of local sensing matrices using the theory of sparse-graph codes. Like the existing compressed sensing schemes based on sparse-graph codes with “good” degree profile, SGPD requires only O(k) measurements to recover a k-sparse signal of dimension n in the noiseless setting. In the presence of noise, SGPD performs better than the existing compressed sensing schemes based on sparse-graph codes, still with a similar implementation cost. Furthermore, the average variable node degree for sensing matrices is empirically minimized for SGPD among various existing CS schemes, which can reduce the sensing computational complexity.

  • Optimal Gaussian Weight Predictor and Sorting Using Genetic Algorithm for Reversible Watermarking Based on PEE and HS

    Chaiyaporn PANYINDEE  Chuchart PINTAVIROOJ  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2016/06/03
      Vol:
    E99-D No:9
      Page(s):
    2306-2319

    This paper introduces a reversible watermarking algorithm that exploits an adaptable predictor and sorting parameter customized for each image and each payload. Our proposed method relies on a well-known prediction-error expansion (PEE) technique. Using small PE values and a harmonious PE sorting parameter greatly decreases image distortion. In order to exploit adaptable tools, Gaussian weight predictor and expanded variance mean (EVM) are used as parameters in this work. A genetic algorithm is also introduced to optimize all parameters and produce the best results possible. Our results show an improvement in image quality when compared with previous conventional works.
