
Keyword Search Result

[Keyword] (42807hit)

7121-7140hit(42807hit)

  • Improved End-to-End Speech Recognition Using Adaptive Per-Dimensional Learning Rate Methods

    Xuyang WANG  Pengyuan ZHANG  Qingwei ZHAO  Jielin PAN  Yonghong YAN  

     
    LETTER-Acoustic modeling

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2550-2553

    The introduction of deep neural networks (DNNs) has led to a significant improvement in automatic speech recognition (ASR) performance. However, the whole ASR system remains complicated because of its dependence on the hidden Markov model (HMM). Recently, a new end-to-end ASR framework, which uses recurrent neural networks (RNNs) to directly model context-independent targets with the connectionist temporal classification (CTC) objective function, has been proposed and achieves results comparable to those of the hybrid HMM/DNN system. In this paper, we investigate per-dimensional learning rate methods, including ADAGRAD and ADADELTA, to improve the recognition performance of the end-to-end system, motivated by the fact that the blank symbol used in the CTC technique dominates the output and these methods give frequent features small learning rates. Experimental results show that a relative reduction in word error rate (WER) of more than 4%, as well as a 5% absolute improvement in label accuracy on the training set, is achieved when using ADADELTA, and fewer training epochs are needed.
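
The claim that per-dimensional methods give frequent features small learning rates can be illustrated with a minimal ADAGRAD sketch. This is not the authors' implementation: the function name `adagrad_step`, the base rate `lr`, and the toy gradients are assumptions chosen for illustration, with dimension 0 standing in for a frequently active output such as the CTC blank.

```python
import math

def adagrad_step(params, grads, hist, lr=0.1, eps=1e-8):
    """Apply one ADAGRAD update in place; return the per-dimension rates used."""
    rates = []
    for i, g in enumerate(grads):
        hist[i] += g * g                          # accumulate squared gradients
        rate = lr / (math.sqrt(hist[i]) + eps)    # per-dimension learning rate
        params[i] -= rate * g
        rates.append(rate)
    return rates

params = [0.0, 0.0]
hist = [0.0, 0.0]
# Dimension 0 receives a gradient at every step (a "frequent" feature);
# dimension 1 receives a gradient only at the last step (a "rare" feature).
for t in range(10):
    grads = [1.0, 1.0 if t == 9 else 0.0]
    rates = adagrad_step(params, grads, hist)

print(rates)  # the frequent dimension ends with the smaller rate
```

ADADELTA replaces the ever-growing sum in `hist` with decaying averages, so the rate does not shrink toward zero over long training runs.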

  • Speeding up Deep Neural Networks in Speech Recognition with Piecewise Quantized Sigmoidal Activation Function

    Anhao XING  Qingwei ZHAO  Yonghong YAN  

     
    LETTER-Acoustic modeling

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2558-2561

    This paper proposes a new quantization framework for the activation functions of deep neural networks (DNNs). We implement a fixed-point DNN by quantizing the activations into powers-of-two integers. The costly multiplication operations in DNN inference can then be replaced with low-cost bit-shifts, which greatly reduces computation. Thus, applying DNN-based speech recognition on embedded systems becomes much easier. Experiments show that the proposed method leads to no performance degradation.
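
The idea of replacing multiplications with bit-shifts can be sketched as follows. The abstract does not give the quantization rule, so the nearest-power-of-two rounding in the log domain, the `max_shift` limit, and the names `quantize_pow2` and `shift_mul` are all assumptions made for illustration.

```python
import math

def quantize_pow2(a, max_shift=7):
    """Return k such that 2**-k approximates activation a in (0, 1].

    Assumed rule: round in the log domain and clamp k to [0, max_shift].
    """
    if a <= 0:
        return max_shift                  # treat non-positive input as the smallest level
    k = round(-math.log2(a))
    return min(max(k, 0), max_shift)

def shift_mul(w_fixed, k):
    """Multiply a fixed-point integer weight by 2**-k using a right shift."""
    return w_fixed >> k

weights = [64, -32, 16]                   # hypothetical fixed-point weights
activations = [0.5, 0.26, 0.12]           # hypothetical sigmoid outputs
shifts = [quantize_pow2(a) for a in activations]
dot = sum(shift_mul(w, k) for w, k in zip(weights, shifts))
print(shifts, dot)
```

Python's `>>` on negative integers rounds toward negative infinity, which matches an arithmetic shift in hardware.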

  • Robust Hybrid Finger Pattern Identification Using Intersection Enhanced Gabor Based Direction Coding

    Wenming YANG  Wenyang JI  Fei ZHOU  Qingmin LIAO  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2016/07/06
      Vol:
    E99-D No:10
      Page(s):
    2668-2671

    Automated biometric identification using finger vein images has generated increasing interest among researchers, with emerging applications in human biometrics. The traditional feature-level fusion strategy is limited and expensive. To solve this problem, this paper investigates the possible use of infrared hybrid finger patterns on the back side of a finger, which include both finger vein and finger dorsal texture information in the original image, and a database using the proposed hybrid pattern is established. Accordingly, an Intersection enhanced Gabor based Direction Coding (IGDC) method is proposed. The experiment achieves a recognition rate of 98.4127% and an equal error rate of 0.00819 on our newly established database, which is fairly competitive.

  • Short Text Classification Based on Distributional Representations of Words

    Chenglong MA  Qingwei ZHAO  Jielin PAN  Yonghong YAN  

     
    LETTER-Text classification

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2562-2565

    Short texts usually suffer from data sparseness, as they do not provide sufficient term co-occurrence information. In this paper, we show how to mitigate the problem in short text classification through word embeddings. We assume that a short text document is a specific sample of one distribution in a Gaussian-Bayesian framework. Furthermore, a fast clustering algorithm is used to expand and enrich the context of short text in the embedding space. This approach is compared with classical bag-of-words approaches and neural network based methods. Experimental results validate the effectiveness of the proposed method.

  • Transfer Semi-Supervised Non-Negative Matrix Factorization for Speech Emotion Recognition

    Peng SONG  Shifeng OU  Xinran ZHANG  Yun JIN  Wenming ZHENG  Jinglei LIU  Yanwei YU  

     
    LETTER-Speech and Hearing

      Publicized:
    2016/07/01
      Vol:
    E99-D No:10
      Page(s):
    2647-2650

    In practice, emotional speech utterances are often collected from different devices or under different conditions, which leads to a discrepancy between the training and testing data and results in a sharp decrease in recognition rates. To solve this problem, in this letter, a novel transfer semi-supervised non-negative matrix factorization (TSNMF) method is presented. A semi-supervised non-negative matrix factorization (SNMF) algorithm, utilizing both labeled source and unlabeled target data, is adopted to learn common feature representations. Meanwhile, the maximum mean discrepancy (MMD) is employed as a similarity measurement to reduce the distance between the feature distributions of the two databases. Finally, the TSNMF algorithm, which optimizes the SNMF and MMD functions together, is proposed to obtain robust feature representations across databases. Extensive experiments demonstrate that, in comparison to the state-of-the-art approaches, our proposed method can significantly improve the cross-corpus recognition rates.
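
The role of the MMD term can be illustrated with its simplest form. Under a linear kernel (an assumption here; the abstract does not state which kernel is used), the squared MMD reduces to the squared distance between the mean feature vectors of the two databases. The helper names and toy data below are hypothetical.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def linear_mmd2(source, target):
    """Squared MMD under a linear kernel: squared distance between the means."""
    ms, mt = mean_vector(source), mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(ms, mt))

source = [[0.0, 0.0], [2.0, 2.0]]         # toy features from the training corpus
target = [[3.0, 1.0], [5.0, 3.0]]         # toy features from the testing corpus
print(linear_mmd2(source, target))        # large value: the distributions differ

# Removing the mean offset drives the discrepancy to zero.
shift = [t - s for s, t in zip(mean_vector(source), mean_vector(target))]
aligned = [[x - d for x, d in zip(row, shift)] for row in target]
print(linear_mmd2(source, aligned))
```

In a transfer setting such a term is added to the factorization objective so that the learned representations of both corpora are pulled toward a common distribution.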

  • Sensitivity-Characterised Activity Neurogram (SCAN) for Visualising and Understanding the Inner Workings of Deep Neural Network Open Access

    Khe Chai SIM  

     
    INVITED PAPER

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2423-2430

    The Deep Neural Network (DNN) is a powerful machine learning model that has been successfully applied to a wide range of pattern classification tasks. Due to the great ability of DNNs to learn complex mapping functions, it has been possible to train and deploy DNNs pretty much as a black box, without the need for an in-depth understanding of the inner workings of the model. However, this often leads to solutions and systems that achieve great performance but offer very little insight into how and why they work. This paper introduces the Sensitivity-characterised Activity Neurogram (SCAN), a novel approach for understanding the inner workings of a DNN by analysing and visualising the sensitivity patterns of the neuron activities. SCAN constructs a low-dimensional visualisation space for the neurons so that the neuron activities can be visualised in a meaningful and interpretable way. The embedding of the neurons within this visualisation space can be used to compare the neurons, both within the same DNN and across different DNNs trained for the same task. This paper presents the observations from using SCAN to analyse DNN acoustic models for automatic speech recognition.

  • Investigation of Combining Various Major Language Model Technologies including Data Expansion and Adaptation Open Access

    Ryo MASUMURA  Taichi ASAMI  Takanobu OBA  Hirokazu MASATAKI  Sumitaka SAKAUCHI  Akinori ITO  

     
    PAPER-Language modeling

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2452-2461

    This paper aims to investigate the performance improvements made possible by combining various major language model (LM) technologies and to reveal the interactions between LM technologies in spontaneous automatic speech recognition tasks. While it is clear that recent practical LMs have several problems, isolated use of major LM technologies does not appear to offer sufficient performance. In consideration of this fact, combining various LM technologies has also been examined. However, previous works focused only on modeling technologies with limited text resources, and did not consider other important technologies in practical language modeling, i.e., use of external text resources and unsupervised adaptation. This paper, therefore, employs not only manual transcriptions of target speech recognition tasks but also external text resources. In addition, unsupervised LM adaptation based on multi-pass decoding is added to the combination. We divide LM technologies into three categories and employ key ones, including recurrent neural network LMs and discriminative LMs. Our experiments show the effectiveness of combining various LM technologies not only in in-domain tasks, the subject of our previous work, but also in out-of-domain tasks. Furthermore, we also reveal the relationships between the technologies in both tasks.

  • Multi-Sensor Multi-Target Bernoulli Filter with Registration Biases

    Lin GAO  Jian HUANG  Wen SUN  Ping WEI  Hongshu LIAO  

     
    PAPER-Digital Signal Processing

      Vol:
    E99-A No:10
      Page(s):
    1774-1781

    The cardinality balanced multi-target multi-Bernoulli (CBMeMBer) filter has emerged as a promising tool for tracking a time-varying number of targets. However, the standard CBMeMBer filter may perform poorly when measurements are coupled with sensor biases. This paper extends the CBMeMBer filter to simultaneous target tracking and sensor bias estimation by introducing the sensor translational biases into the multi-Bernoulli distribution. In the extended CBMeMBer filter, the biases are modeled as a first-order Gauss-Markov process and assumed to be uncorrelated with target states. Furthermore, the sequential Monte Carlo (SMC) method is adopted to handle the non-linear and non-Gaussian conditions. Simulations are carried out to examine the performance of the proposed filter.

  • Internal Power Loss Formulas of Lumped-Element Matching Circuits for High-Efficiency Wireless Power Transfer

    Kyohei YAMADA  Naoki SAKAI  Takashi OHIRA  

     
    PAPER

      Vol:
    E99-C No:10
      Page(s):
    1182-1189

    Internal power losses in lumped-element impedance matching circuits are formulated by means of the Q factors of the elements and the port impedances to be matched. Assuming that the Q factors are relatively high, the above-mentioned loss is expressed by a simple formula containing only the tangents of the impedances. The formula is a powerful tool for applications that emphasize power efficiency, such as wireless power transfer. In addition to the formulation, we illustrate some design examples with the derived formula: design of the least lossy L-section circuit and a two-stage low-pass ladder. The examples provide ready-to-use knowledge for low-loss matching design.

  • Virtual Sensor Idea-Based Geolocation Using RF Multipath Diversity

    Zhigang CHEN  Lei WANG  He HUANG  Guomei ZHANG  

     
    PAPER-Digital Signal Processing

      Vol:
    E99-A No:10
      Page(s):
    1799-1805

    A novel virtual-sensor-based positioning method is presented in this paper, which can make use of both direct paths and indirect paths. By integrating the virtual sensor idea and a Bayesian state and observation framework, this method models the indirect paths corresponding to persistent virtual sensors as virtual direct paths and further reformulates the wireless positioning problem as the maximum likelihood estimation of both the mobile terminal's positions and the persistent virtual sensors' positions. The method then adopts the EM (Expectation Maximization) and particle filtering schemes to estimate the virtual sensors' positions and finally exploits not only the direct paths' measurements but also the indirect paths' measurements to estimate the mobile terminal's positions, thus achieving better positioning performance. Simulation results demonstrate the effectiveness of the proposed method.

  • Steady-versus-Transient Plot for Analysis of Digital Maps

    Hiroki YAMAOKA  Toshimichi SAITO  

     
    PAPER-Nonlinear Problems

      Vol:
    E99-A No:10
      Page(s):
    1806-1812

    A digital map is a simple dynamical system that is related to various digital dynamical systems including cellular automata, dynamic binary neural networks, and digital spiking neurons. Depending on the parameters and initial condition, the map can exhibit various periodic orbits and transient phenomena leading to them. In order to analyze the dynamics, we present two simple feature quantities. The first and second quantities characterize the plentifulness of the periodic phenomena and the deviation of the transient phenomena, respectively. Using the two feature quantities, we construct the steady-versus-transient plot, which is useful in the visualization and consideration of various digital dynamical systems. As a first step, we demonstrate analysis results for an example of digital maps based on analog bifurcating neuron models.

  • On the Three-Dimensional Channel Routing

    Satoshi TAYU  Toshihiko TAKAHASHI  Eita KOBAYASHI  Shuichi UENO  

     
    PAPER-Graphs and Networks

      Vol:
    E99-A No:10
      Page(s):
    1813-1821

    3-D channel routing is a fundamental problem in the physical design of 3-D integrated circuits. The 3-D channel is a 3-D grid G and the terminals are vertices of G located in the top and bottom layers. A net is a set of terminals to be connected. The objective of the 3-D channel routing problem is to connect the terminals in each net with a Steiner tree (wire) in G using as few layers as possible and wires as short as possible in such a way that wires for distinct nets are disjoint. This paper shows that the problem is intractable. We also show that a sparse set of ν 2-terminal nets can be routed in a 3-D channel with O(√ν) layers using wires of length O(√ν).

  • Non-Crossover and Multi-Mutation Based Genetic Algorithm for Flexible Job-Shop Scheduling Problem

    Zhongshan ZHANG  Yuning CHEN  Yuejin TAN  Jungang YAN  

     
    PAPER-Mathematical Systems Science

      Vol:
    E99-A No:10
      Page(s):
    1856-1862

    This paper presents a non-crossover and multi-mutation based genetic algorithm (NMGA) for the Flexible Job-shop Scheduling Problem (FJSP) with the criterion of minimizing the maximum completion time (makespan). Targeting the characteristics of the FJSP, three mutation operators based on operation sequence coding and machine assignment coding are proposed: flip, slide, and swap. Meanwhile, the NMGA framework, the coding scheme, and the decoding algorithm are also specially designed for the FJSP. In the framework, the crossover recombination operator is not included and a special selection strategy is employed. Computational results based on a set of representative benchmark problems are provided. The evidence indicates that the proposed algorithm is superior to several recently published genetic algorithms in terms of solution quality and convergence ability.
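
The abstract names the three mutation operators but does not define them. The sketch below uses one common reading for permutation-style encodings, which is an assumption and may differ from the paper: swap exchanges two positions, flip reverses a segment, and slide rotates a segment left by one. The sequence `seq` is a hypothetical operation-sequence chromosome.

```python
def swap(seq, i, j):
    """Exchange the genes at positions i and j."""
    out = list(seq)
    out[i], out[j] = out[j], out[i]
    return out

def flip(seq, i, j):
    """Reverse the segment between positions i and j (inclusive)."""
    out = list(seq)
    out[i:j + 1] = out[i:j + 1][::-1]
    return out

def slide(seq, i, j):
    """Rotate the segment i..j left by one, moving gene i to position j."""
    out = list(seq)
    out[i:j + 1] = out[i + 1:j + 1] + out[i:i + 1]
    return out

seq = [0, 1, 2, 3, 4, 5]
print(swap(seq, 1, 4))
print(flip(seq, 1, 4))
print(slide(seq, 1, 4))
```

Each operator returns a new permutation of the same genes, so an operation-sequence chromosome stays feasible without the repair step that crossover often requires.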

  • The Cooperative Recovery Scheme Using Adjacent Base Stations in the Wireless Communication System

    Young-Min KO  Jae-Hyun RO  Hyoung-Kyu SONG  

     
    LETTER-Digital Signal Processing

      Vol:
    E99-A No:10
      Page(s):
    1871-1875

    In a wireless communication system, a base station failure can result in a communication disruption in the cell. This letter proposes an alternative way to cope with base station failure in a wireless communication system based on MIMO-OFDM. Cooperative communication can be a solution to the problem. Unlike general cooperative communication, this letter addresses cooperation among adjacent base stations. This letter proposes a specific configuration of transmission signals which is applied to the CDD scheme. The proposed cooperative system can obtain multiplexing gain and diversity gain at the same time. More reliable performance can be obtained by the proposed cooperative system, which uses cooperation among adjacent base stations.

  • Channel Impulse Response Measurements-Based Location Estimation Using Kernel Principal Component Analysis

    Zhigang CHEN  Xiaolei ZHANG  Hussain KHURRAM  He HUANG  Guomei ZHANG  

     
    LETTER-Digital Signal Processing

      Vol:
    E99-A No:10
      Page(s):
    1876-1880

    In this letter, a novel channel impulse response (CIR)-based fingerprinting positioning method using kernel principal component analysis (KPCA) is proposed. During the offline phase of the proposed method, a survey is performed to collect all CIRs from access points, and a fingerprint database is constructed whose vectors include the CIR and the physical location. During the online phase, KPCA is first employed to handle the nonlinearity and complexity in the CIR-position dependencies and extract the principal nonlinear features in CIRs, and support vector regression is then used to adaptively learn the regression function between the KPCA components and physical locations. In addition, an iterative narrowing-scope step is used to refine the estimation. The performance comparison shows that the proposed method outperforms the traditional received signal strength based positioning methods.

  • A 7.1 GHz 170 W Solid-State Power Amplifier with 20-Way Combiner for Space Applications

    Naoki HASEGAWA  Naoki SHINOHARA  Shigeo KAWASAKI  

     
    PAPER

      Vol:
    E99-C No:10
      Page(s):
    1140-1146

    A high-performance GaN power amplifier circuit operating at 7.1 GHz was demonstrated for potential uses such as a space ground station. First, GaN HEMT chips were investigated for the high power amplifier circuit design. Next, the designed amplifier circuits, matched to the load and source impedances of the non-linear models, were fabricated. In measurement, the class-AB power amplifier circuit with the four-cell chip showed a power added efficiency (PAE) of 42.6% and an output power of 41.7 dBm at -3 dB gain compression. Finally, the good performance of the power amplifier was confirmed in a 20-way radial power combiner with a PAE of 17.4% and an output power of 52.6 dBm at -3 dB gain compression.

  • Side-Lobe Reduced, Circularly Polarized Patch Array Antenna for Synthetic Aperture Radar Imaging

    Mohd Zafri BAHARUDDIN  Yuta IZUMI  Josaphat Tetuko Sri SUMANTYO   YOHANDRI  

     
    PAPER

      Vol:
    E99-C No:10
      Page(s):
    1174-1181

    Antenna radiation patterns have side-lobes that add to ambiguity in the form of ghosting and object repetition in SAR images. An L-band 1.27 GHz, 2×5 element proximity-coupled corner-truncated patch array antenna synthesized using the Dolph-Chebyshev method to reduce side-lobe levels is proposed. The designed antenna was simulated, optimized, and fabricated for antenna performance parameter measurements. Antenna performance characteristics show good agreement with simulated results. A set of antennas was fabricated and then used together with a custom synthetic aperture radar system, and SAR imaging was performed on a point target in an anechoic chamber. Imaging results are also discussed in this paper, showing improvement in image output. The antenna and its connected SAR systems developed in this work differ from most previous work in that they use circular polarization as opposed to linear polarization.

  • A 10-bit 6.8-GS/s Direct Digital Frequency Synthesizer Employing Complementary Dual-Phase Latch-Based Architecture

    Abdel MARTINEZ ALONSO  Masaya MIYAHARA  Akira MATSUZAWA  

     
    PAPER

      Vol:
    E99-C No:10
      Page(s):
    1200-1210

    This paper introduces a novel Direct Digital Frequency Synthesizer based on a Complementary Dual-Phase Latch-Based sequencing method. Compared to a conventional Direct Digital Frequency Synthesizer using Flip-Flops as synchronizing elements, the proposed architecture allows the data sampling rate to be doubled while trading off area and Power Efficiency. Digital domain modulations can be easily implemented by using a Direct Digital Frequency Synthesizer. However, due to performance limitations, CMOS-based applications have been almost exclusively restricted to the VHF, UHF and L bands. This work aims to increase the operation speed and extend the applicability of this technology to Multi-band Multi-standard wireless systems operating up to 2.7 GHz. The design features a 24-bit pipelined Phase Accumulator and a 14x10-bit Phase to Amplitude Converter. The Phase to Amplitude Converter module is compressed by using the Quarter Wave Symmetry technique and is entirely made up of combinational logic inserted into 12 Complementary Dual-Phase Latch-Based pipeline stages. The logic is represented in the form of Sum of Product terms obtained from a 14x10-bit sinusoidal Look-Up-Table. The proposed Direct Digital Frequency Synthesizer is designed and simulated based on 65 nm CMOS standard-cell technology. A maximum data sampling rate of 6.8 GS/s is expected. The estimated Spurious Free Dynamic Range and Power Efficiency are 61 dBc and 22 mW/(GS/s), respectively.
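
The phase accumulator and quarter-wave compression described above can be modelled behaviourally. This is a software sketch only, not the paper's latch-based circuit: the bit widths follow the abstract (24-bit accumulator, 14-bit phase, 10-bit amplitude), while the half-sample phase offset, the signed amplitude scaling, and all function and constant names are assumptions.

```python
import math

ACC_BITS = 24                             # phase accumulator width
PHASE_BITS = 14                           # phase bits fed to the converter
AMP_BITS = 10                             # signed output amplitude width

QUARTER = 1 << (PHASE_BITS - 2)           # table entries for one quarter wave
AMP_MAX = (1 << (AMP_BITS - 1)) - 1       # 511 for a 10-bit signed output

# Only the first quarter of the sine is stored; the half-sample offset
# makes the mirrored lookups exact.
QUARTER_LUT = [
    round(AMP_MAX * math.sin(2 * math.pi * (i + 0.5) / (1 << PHASE_BITS)))
    for i in range(QUARTER)
]

def phase_to_amplitude(phase):
    """Map a 14-bit phase to a signed amplitude using quarter-wave symmetry."""
    quadrant = phase >> (PHASE_BITS - 2)
    idx = phase & (QUARTER - 1)
    if quadrant & 1:                      # quadrants 1 and 3: mirror the index
        idx = QUARTER - 1 - idx
    amp = QUARTER_LUT[idx]
    return -amp if quadrant & 2 else amp  # quadrants 2 and 3: negate

def dds(fcw, n):
    """Generate n samples for frequency control word fcw."""
    acc, out = 0, []
    for _ in range(n):
        out.append(phase_to_amplitude(acc >> (ACC_BITS - PHASE_BITS)))
        acc = (acc + fcw) & ((1 << ACC_BITS) - 1)   # wrap at 2**24
    return out

samples = dds(1 << (ACC_BITS - 2), 4)     # one quarter cycle per sample
print(samples)
```

The output frequency is fcw × fs / 2^24, so at a sampling rate fs the control word sets the tone in steps of fs / 2^24, and the stored table is a quarter of the size of a full-cycle table.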

  • Effective Magnetic Sheet Loading Method for Near Field Communication Antennas

    Takaho SEKIGUCHI  Yoshinobu OKANO  Satoshi OGINO  

     
    BRIEF PAPER

      Vol:
    E99-C No:10
      Page(s):
    1211-1214

    Near field communication (NFC) antennas are often lined with magnetic sheets to reduce performance degradation caused by nearby metal objects. Though amorphous sheets have a high permeability and are suitable magnetic sheets for lining, their magnetic loss is also high. Therefore, this paper suggests a technique of suppressing magnetic loss by modifying the shape of the sheet without changing its composition. The utility of the proposed technique was investigated in this study.

  • Investigation of DNN-Based Audio-Visual Speech Recognition

    Satoshi TAMURA  Hiroshi NINOMIYA  Norihide KITAOKA  Shin OSUGA  Yurie IRIBE  Kazuya TAKEDA  Satoru HAYAMIZU  

     
    PAPER-Acoustic modeling

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2444-2451

    Audio-Visual Speech Recognition (AVSR) is one of the techniques for enhancing the robustness of speech recognizers in noisy or real environments. Meanwhile, Deep Neural Networks (DNNs) have recently attracted a lot of attention from researchers in the speech recognition field, because recognition performance can be drastically improved by using DNNs. There are two ways to employ DNN techniques for speech recognition: a hybrid approach and a tandem approach. In the hybrid approach, the emission probability of each Hidden Markov Model (HMM) state is computed using a DNN, while in the tandem approach a DNN is incorporated into a feature extraction scheme. In this paper, we investigate and compare several DNN-based AVSR methods, mainly to clarify how audio and visual modalities should be incorporated using DNNs. We carried out recognition experiments using the CENSREC-1-AV corpus, and we discuss the results to identify the best DNN-based AVSR modeling. It turns out that a tandem-based method using audio Deep Bottle-Neck Features (DBNFs) and visual ones with multi-stream HMMs is the most suitable, followed by a hybrid approach and another tandem scheme using audio-visual DBNFs.
