Author Search Result

[Author] Jia-Ching WANG (11 hits)

Hits 1-11
  • A Deep Neural Network for Real-Time Driver Drowsiness Detection

    Toan H. VU  An DANG  Jia-Ching WANG  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2019/09/25
      Vol:
    E102-D No:12
      Page(s):
    2637-2641

    We develop a deep neural network (DNN) for detecting driver drowsiness in videos. The proposed DNN model takes the driver's faces extracted from video frames as inputs and consists of three components: a convolutional neural network (CNN), a convolutional control gate-based recurrent neural network (ConvCGRNN), and a voting layer. The CNN learns facial representations from global faces, which are then fed to the ConvCGRNN to learn their temporal dependencies. The voting layer works like an ensemble of many sub-classifiers to predict the drowsiness state. Experimental results on the NTHU-DDD dataset show that our model not only achieves a competitive accuracy of 84.81% without any post-processing but also works in real time at about 100 fps.

  • Efficient Coding Translation of GSM and G.729 Speech Coders across Mobile and IP Networks

    Shu-Min TSAI  Jia-Ching WANG  Jar-Ferr YANG  Jhing-Fa WANG  

     
    PAPER-Speech and Hearing

      Vol:
    E87-D No:2
      Page(s):
    444-452

    In this paper, we propose a speech coding translation scheme that transfers coding parameters between the GSM half rate and G.729 coders. Compared to the conventional decode-then-encode (DTE) scheme, the proposed parameter conversions provide speech interoperability between mobile and IP networks while reducing computational complexity and coding delay. Simulation results show that the proposed methods can reduce the computational load and coding delay incurred in the target encoders by about 30% and achieve almost imperceptible degradation in performance.

  • Critical Band Subspace-Based Speech Enhancement Using SNR and Auditory Masking Aware Technique

    Jia-Ching WANG  Hsiao-Ping LEE  Jhing-Fa WANG  Chung-Hsien YANG  

     
    PAPER-Speech and Hearing

      Vol:
    E90-D No:7
      Page(s):
    1055-1062

    In this paper, a new subspace-based speech enhancement algorithm is presented. First, we construct a perceptual filterbank from a psychoacoustic model and incorporate it into the subspace-based enhancement approach. This filterbank is created through a five-level wavelet packet decomposition. The masking properties of the human auditory system are then derived based on the perceptual filterbank. Finally, the prior SNR and the masking threshold of each critical band are used to decide the attenuation factor of the optimal linear estimator. Five different types of in-car noise from the TAICAR database were used in our evaluation. The experimental results demonstrated that our approach outperformed conventional subspace and spectral subtraction methods.

  • Maximum Volume Constrained Graph Nonnegative Matrix Factorization for Facial Expression Recognition

    Viet-Hang DUONG  Manh-Quan BUI  Jian-Jiun DING  Bach-Tung PHAM  Pham The BAO  Jia-Ching WANG  

     
    LETTER-Image

      Vol:
    E100-A No:12
      Page(s):
    3081-3085

    In this work, two new NMF models are proposed for facial expression recognition. They are called maximum volume constrained nonnegative matrix factorization (MV_NMF) and maximum volume constrained graph nonnegative matrix factorization (MV_GNMF). They achieve sparseness from a larger simplicial cone constraint, and the extracted features preserve the topological structure of the original images.

  • Locality Preserved Joint Nonnegative Matrix Factorization for Speech Emotion Recognition

    Seksan MATHULAPRANGSAN  Yuan-Shan LEE  Jia-Ching WANG  

     
    LETTER

      Publicized:
    2019/01/28
      Vol:
    E102-D No:4
      Page(s):
    821-825

    This study presents a joint dictionary learning approach for speech emotion recognition named locality preserved joint nonnegative matrix factorization (LP-JNMF). The learned representations are shared between the learned dictionaries and the annotation matrix. Moreover, a locality penalty term is incorporated into the objective function, which further improves the system's discriminability.

  • VLSI Architecture and Implementation for Speech Recognizer Based on Discriminative Bayesian Neural Network

    Jhing-Fa WANG  Jia-Ching WANG  An-Nan SUEN  Chung-Hsien WU  Fan-Min LI  

     
    PAPER-Implementations of Signal Processing Systems

      Vol:
    E85-A No:8
      Page(s):
    1861-1869

    In this paper, we present an efficient VLSI architecture for the stand-alone application of a speech recognition system based on a discriminative Bayesian neural network (DBNN). For the recognition phase, the architecture of the Bayesian distance unit (BDU) is constructed first. In association with the BDU, we propose a template-serial architecture for the path distance accumulation to perform the recognition procedure. A corresponding architecture is also developed to accelerate the discriminative training procedure. It contains an intelligent look-up table for the sigmoid function. In comparison to the traditional one-table method, the memory size is reduced drastically with only a slight loss of accuracy. By combining the proposed hardware accelerators with a cost-efficient programmable core, we obtain the benefits of both programmable and application-specific architectures in terms of performance, design complexity, and flexibility.

  • Projection Based Adaptive Window Size Selection for Efficient Motion Estimation in H.264/AVC

    Anand PAUL  Jhing-Fa WANG  Jia-Ching WANG  An-Chao TSAI  Jang-Ting CHEN  

     
    PAPER

      Vol:
    E89-A No:11
      Page(s):
    2970-2976

    This paper introduces a block-based motion estimation algorithm based on projection with adaptive window size selection. Two blocks cannot match well if their corresponding 1D projections do not match well. On this foundation, the 2D block matching problem is translated into a simpler 1D matching problem, which eliminates the majority of potential pixel participation. The projection method is combined with adaptive window size selection, in which an appropriate search window for each block is determined on the basis of the motion vectors and prediction errors obtained for the previous block. This makes the method several times faster than exhaustive search with negligible performance degradation. Encoding QCIF-size video with the proposed method reduces the computational complexity of motion estimation by roughly 45% and of overall encoding by 23%, while maintaining image/video quality.

  • A Novel Fast Mode Decision Algorithm for H.264/AVC Using Particle Swarm Optimization

    Jia-Ching WANG  Yu-Huan SUNG  

     
    PAPER-Image Processing

      Vol:
    E96-A No:11
      Page(s):
    2154-2160

    Video coding plays an important role in human life, especially in communications. H.264/AVC is a prominent video coding standard that has been used in a variety of applications owing to its high efficiency, which comes from several new coding techniques. However, its extremely high encoding complexity hinders its use in real-time applications. This paper presents a new encoding algorithm that uses particle swarm optimization (PSO) to train discriminant functions for classification-based fast mode decision. Experimental results show that the proposed algorithm can successfully reduce encoding time at the expense of negligible quality degradation and bitrate increase.

  • Fast Gated Recurrent Network for Speech Synthesis

    Bima PRIHASTO  Tzu-Chiang TAI  Pao-Chi CHANG  Jia-Ching WANG  

     
    LETTER-Speech and Hearing

      Publicized:
    2022/06/10
      Vol:
    E105-D No:9
      Page(s):
    1634-1638

    The recurrent neural network (RNN) has been used in audio and speech processing, such as language translation and speech recognition. Although RNN-based architectures can be applied to speech synthesis, the long computing time is still the primary concern. This research proposes a fast gated recurrent neural network, a fast RNN-based architecture, for speech synthesis based on the minimal gated unit (MGU). Our architecture removes the unit state history from some equations in the MGU. Our MGU-based architecture is about twice as fast as the other MGU-based architectures, with equally good sound quality.

  • A New Approach of Matrix Factorization on Complex Domain for Data Representation

    Viet-Hang DUONG  Manh-Quan BUI  Jian-Jiun DING  Yuan-Shan LEE  Bach-Tung PHAM  Pham The BAO  Jia-Ching WANG  

     
    LETTER-Pattern Recognition

      Publicized:
    2017/09/15
      Vol:
    E100-D No:12
      Page(s):
    3059-3063

    This work presents a new approach that derives a learned data representation method through matrix factorization on the complex domain. In particular, we introduce an encoding matrix, a new representation of data, that satisfies the simplicial constraint of the projective basis matrix on the field of complex numbers. A complex optimization framework is provided. It employs the gradient descent method and computes the derivative of the cost function based on Wirtinger's calculus.

  • A Block-Based Architecture for Lifting Scheme Discrete Wavelet Transform

    Chung-Hsien YANG  Jia-Ching WANG  Jhing-Fa WANG  Chi-Wei CHANG  

     
    PAPER-Image

      Vol:
    E90-A No:5
      Page(s):
    1062-1071

    The two-dimensional discrete wavelet transform (DWT) for image processing is conventionally implemented with line-based architectures, which are simple and have low complexity. However, they suffer from two main shortcomings: the memory required for storing intermediate data and the long latency of computing wavelet coefficients. This work presents a new block-based architecture for computing lifting-based 2-D DWT coefficients. This architecture yields a significantly lower buffer size. Additionally, the latency is reduced from N^2 down to 3N as compared to the line-based architectures. The proposed architecture supports the JPEG2000 default filters and has been realized on an ARM-based ALTERA EPXA10 Development Board at a frequency of 44.33 MHz.
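
    For readers unfamiliar with the lifting scheme mentioned in the last abstract, the following is a minimal sketch of one level of a 1-D integer lifting transform using the reversible 5/3 filter from JPEG 2000. It is an illustration only and is not the block-based architecture described in the paper. The function names, the restriction to even-length input, and the mirror boundary handling are assumptions made for this example.

```python
def lifting_53_forward(x):
    """One level of a 1-D integer 5/3 lifting transform (illustrative sketch).

    Assumes an even-length list of integers; boundaries are mirrored.
    Returns (approx, detail), the low-pass and high-pass coefficients.
    """
    if len(x) < 2 or len(x) % 2 != 0:
        raise ValueError("this sketch expects an even-length input")
    even, odd = x[0::2], x[1::2]
    half = len(even)

    # Predict step: each odd sample minus the floor-average of its even neighbours.
    detail = []
    for n in range(half):
        right = even[n + 1] if n + 1 < half else even[n]  # mirror at the right edge
        detail.append(odd[n] - (even[n] + right) // 2)

    # Update step: each even sample plus a rounded quarter of neighbouring details.
    approx = []
    for n in range(half):
        left = detail[n - 1] if n > 0 else detail[0]  # mirror at the left edge
        approx.append(even[n] + (left + detail[n] + 2) // 4)
    return approx, detail


def lifting_53_inverse(approx, detail):
    """Undo lifting_53_forward by running the two steps in reverse order."""
    half = len(approx)

    # Undo the update step to recover the even samples.
    even = []
    for n in range(half):
        left = detail[n - 1] if n > 0 else detail[0]
        even.append(approx[n] - (left + detail[n] + 2) // 4)

    # Undo the predict step to recover the odd samples.
    odd = []
    for n in range(half):
        right = even[n + 1] if n + 1 < half else even[n]
        odd.append(detail[n] + (even[n] + right) // 2)

    # Interleave even and odd samples back into the original order.
    x = [0] * (2 * half)
    x[0::2] = even
    x[1::2] = odd
    return x


signal = [3, 7, 1, 8, 2, 9, 4, 6]
approx, detail = lifting_53_forward(signal)
restored = lifting_53_inverse(approx, detail)
```

    Because each lifting step only adds or subtracts a value computed from the other half of the samples, the inverse repeats the same computations with the signs flipped, so the integer transform is exactly reversible. A 2-D transform is typically obtained by applying such a 1-D step along rows and then along columns.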