IEICE global.ieice.org Site

Author Search Result

[Author] Jia-Ching WANG(11hit)


A Deep Neural Network for Real-Time Driver Drowsiness Detection
Toan H. VU An DANG Jia-Ching WANG

LETTER-Image Recognition, Computer Vision

Publicized:
2019/09/25
Vol:
E102-D No:12
Page(s):
2637-2641
We develop a deep neural network (DNN) for detecting driver drowsiness in videos. The proposed model takes the driver's face, extracted from video frames, as input and consists of three components: a convolutional neural network (CNN), a convolutional control gate-based recurrent neural network (ConvCGRNN), and a voting layer. The CNN learns facial representations from global faces, which are then fed to the ConvCGRNN to learn their temporal dependencies. The voting layer works like an ensemble of many sub-classifiers to predict the drowsiness state. Experimental results on the NTHU-DDD dataset show that our model not only achieves a competitive accuracy of 84.81% without any post-processing but also runs in real time at about 100 fps.
Efficient Coding Translation of GSM and G.729 Speech Coders across Mobile and IP Networks
Shu-Min TSAI Jia-Ching WANG Jar-Ferr YANG Jhing-Fa WANG

PAPER-Speech and Hearing

Vol:
E87-D No:2
Page(s):
444-452
In this paper, we propose a speech coding translation scheme that transfers coding parameters between GSM half-rate and G.729 coders. Compared to the conventional decode-then-encode (DTE) scheme, the proposed parameter conversions provide speech interoperability between mobile and IP networks with reduced computational complexity and coding delay. Simulation results show that the proposed methods reduce the computational load and coding delay incurred in the target encoders by about 30%, with almost imperceptible degradation in performance.
Critical Band Subspace-Based Speech Enhancement Using SNR and Auditory Masking Aware Technique
Jia-Ching WANG Hsiao-Ping LEE Jhing-Fa WANG Chung-Hsien YANG

PAPER-Speech and Hearing

Vol:
E90-D No:7
Page(s):
1055-1062
In this paper, a new subspace-based speech enhancement algorithm is presented. First, we construct a perceptual filterbank from a psychoacoustic model and incorporate it into the subspace-based enhancement approach. This filterbank is created through a five-level wavelet packet decomposition. The masking properties of the human auditory system are then derived from the perceptual filterbank. Finally, the prior SNR and the masking threshold of each critical band are used to decide the attenuation factor of the optimal linear estimator. Five different types of in-car noise from the TAICAR database were used in our evaluation. The experimental results demonstrate that our approach outperforms conventional subspace and spectral subtraction methods.
Maximum Volume Constrained Graph Nonnegative Matrix Factorization for Facial Expression Recognition
Viet-Hang DUONG Manh-Quan BUI Jian-Jiun DING Bach-Tung PHAM Pham The BAO Jia-Ching WANG

LETTER-Image

Vol:
E100-A No:12
Page(s):
3081-3085
In this work, two new NMF models are proposed for facial expression recognition: maximum volume constrained nonnegative matrix factorization (MV_NMF) and maximum volume constrained graph nonnegative matrix factorization (MV_GNMF). They achieve sparseness from a larger simplicial cone constraint, and the extracted features preserve the topological structure of the original images.
Locality Preserved Joint Nonnegative Matrix Factorization for Speech Emotion Recognition
Seksan MATHULAPRANGSAN Yuan-Shan LEE Jia-Ching WANG

LETTER

Publicized:
2019/01/28
Vol:
E102-D No:4
Page(s):
821-825
This study presents a joint dictionary learning approach for speech emotion recognition named locality preserved joint nonnegative matrix factorization (LP-JNMF). The learned representations are shared between the learned dictionaries and annotation matrix. Moreover, a locality penalty term is incorporated into the objective function. Thus, the system's discriminability is further improved.
VLSI Architecture and Implementation for Speech Recognizer Based on Discriminative Bayesian Neural Network
Jhing-Fa WANG Jia-Ching WANG An-Nan SUEN Chung-Hsien WU Fan-Min LI

PAPER-Implementations of Signal Processing Systems

Vol:
E85-A No:8
Page(s):
1861-1869
In this paper, we present an efficient VLSI architecture for the stand-alone application of a speech recognition system based on a discriminative Bayesian neural network (DBNN). For the recognition phase, the architecture of the Bayesian distance unit (BDU) is constructed first. In association with the BDU, we propose a template-serial architecture for path distance accumulation to perform the recognition procedure. A corresponding architecture is also developed to accelerate the discriminative training procedure. It contains an intelligent look-up table for the sigmoid function. Compared to the traditional one-table method, the memory size is reduced drastically with only a slight loss of accuracy. By combining the proposed hardware accelerators with a cost-efficient programmable core, we get the most out of both programmable and application-specific architectures in terms of performance, design complexity, and flexibility.
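The abstract does not describe how the look-up table for the sigmoid function is organized. As a rough illustration only, one common way to shrink such a table is to store the non-negative half of a clipped input range and use the identity sigmoid(-x) = 1 - sigmoid(x). The sketch below assumes that approach; `STEP`, `LIMIT`, and `sigmoid_lut` are hypothetical names and values, not taken from the paper.

```python
import math

STEP = 1.0 / 16   # hypothetical table resolution (assumption)
LIMIT = 8.0       # inputs beyond this are treated as saturated (assumption)

# Only x >= 0 is stored; negative inputs reuse the table through symmetry.
TABLE = [1.0 / (1.0 + math.exp(-i * STEP)) for i in range(int(LIMIT / STEP) + 1)]


def sigmoid_lut(x):
    """Approximate sigmoid(x) by nearest-entry lookup in the half-range table."""
    index = min(int(round(abs(x) / STEP)), len(TABLE) - 1)
    value = TABLE[index]
    return value if x >= 0 else 1.0 - value
```

With these illustrative settings the table holds 129 entries, and because the sigmoid's slope never exceeds 0.25, the nearest-entry error stays below 0.01. A finer step trades memory for accuracy.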
Projection Based Adaptive Window Size Selection for Efficient Motion Estimation in H.264/AVC
Anand PAUL Jhing-Fa WANG Jia-Ching WANG An-Chao TSAI Jang-Ting CHEN

PAPER

Vol:
E89-A No:11
Page(s):
2970-2976
This paper introduces a block-based motion estimation algorithm based on projection with adaptive window size selection. Two blocks cannot match well if their corresponding 1D projections do not match well. On this foundation, the 2D block matching problem is translated into a simpler 1D matching problem, which eliminates the majority of potential pixel participation. The projection method is combined with adaptive window size selection, in which an appropriate search window for each block is determined from the motion vectors and prediction errors obtained for the previous block. This makes the method several times faster than exhaustive search with negligible performance degradation. Encoding QCIF-size video with the proposed method reduces the computational complexity of motion estimation by roughly 45% and that of overall encoding by 23%, while maintaining image/video quality.
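The idea of replacing 2D block matching with 1D projection matching can be sketched as follows. This is a minimal illustration, assuming row and column sums as the projections and a sum of absolute differences as the matching cost; the abstract does not give the exact projection, cost, or window selection rule, so the search window is a plain parameter here and all function names are hypothetical.

```python
def projections(frame, top, left, size):
    """Row sums and column sums of the size x size block at (top, left)."""
    rows = [sum(frame[top + r][left:left + size]) for r in range(size)]
    cols = [sum(frame[top + r][left + c] for r in range(size)) for c in range(size)]
    return rows, cols


def projection_cost(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences between the 1D projections of two blocks."""
    cur_rows, cur_cols = projections(cur, top, left, size)
    ref_rows, ref_cols = projections(ref, top + dy, left + dx, size)
    return (sum(abs(a - b) for a, b in zip(cur_rows, ref_rows))
            + sum(abs(a - b) for a, b in zip(cur_cols, ref_cols)))


def search(cur, ref, top, left, size, window):
    """Best (dy, dx) within +/-window, judged by the projection cost only."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block falls outside the reference frame
            cost = projection_cost(cur, ref, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost


# Synthetic frames: the current frame is the reference shifted by (1, 2).
ref = [[3 * y * y + x * x + x * y for x in range(12)] for y in range(12)]
cur = [[ref[y + 1][x + 2] for x in range(10)] for y in range(10)]
motion, cost = search(cur, ref, top=3, left=3, size=4, window=2)
```

Each candidate compares 2 x size projection values instead of size x size pixels, which is where the reduction in pixel participation comes from. In a real encoder the projections would be computed once per block and reused rather than recomputed per candidate.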
A Novel Fast Mode Decision Algorithm for H.264/AVC Using Particle Swarm Optimization
Jia-Ching WANG Yu-Huan SUNG

PAPER-Image Processing

Vol:
E96-A No:11
Page(s):
2154-2160
Video coding plays an important role in human life, especially in communications. H.264/AVC is a prominent video coding standard that has been used in a variety of applications owing to its high efficiency, which comes from several new coding techniques. However, its extremely high encoding complexity hinders its use in real-time applications. This paper presents a new encoding algorithm that uses particle swarm optimization (PSO) to train discriminant functions for classification-based fast mode decision. Experimental results show that the proposed algorithm successfully reduces encoding time at the expense of negligible quality degradation and bitrate increase.
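To make the training step concrete, here is a minimal sketch of a basic global-best PSO fitting a linear discriminant. It is not the paper's algorithm: the features, the hinge-loss objective, the swarm parameters, and the toy data are all assumptions chosen for illustration.

```python
import random


def pso_minimize(cost, dim, particles=20, iterations=100, seed=0,
                 inertia=0.7, cognitive=1.5, social=1.5):
    """Minimize cost(position) with a basic global-best particle swarm."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]            # each particle's best position
    best_cost = [cost(p) for p in pos]
    g = min(range(particles), key=best_cost.__getitem__)
    global_pos, global_cost = best_pos[g][:], best_cost[g]
    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + social * r2 * (global_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            c = cost(pos[i])
            if c < best_cost[i]:
                best_pos[i], best_cost[i] = pos[i][:], c
                if c < global_cost:
                    global_pos, global_cost = pos[i][:], c
    return global_pos, global_cost


# Toy data (made up): one feature per block, label +1 / -1 for two mode classes.
samples = [(0.1, -1), (0.2, -1), (0.3, -1), (0.7, 1), (0.8, 1), (0.9, 1)]


def hinge_loss(params):
    """Hinge loss of the linear discriminant f(x) = w * x + b on the toy data."""
    w, b = params
    return sum(max(0.0, 1.0 - label * (w * x + b)) for x, label in samples)


(weight, bias), loss = pso_minimize(hinge_loss, dim=2)
```

Once trained, the sign of `weight * x + bias` would assign a block to one of the two classes, so the encoder could evaluate only the candidate modes of that class instead of all of them.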
Fast Gated Recurrent Network for Speech Synthesis
Bima PRIHASTO Tzu-Chiang TAI Pao-Chi CHANG Jia-Ching WANG

LETTER-Speech and Hearing

Publicized:
2022/06/10
Vol:
E105-D No:9
Page(s):
1634-1638
The recurrent neural network (RNN) has been used in audio and speech processing tasks such as language translation and speech recognition. Although RNN-based architectures can be applied to speech synthesis, the long computing time remains the primary concern. This research proposes a fast gated recurrent neural network, a fast RNN-based architecture, for speech synthesis based on the minimal gated unit (MGU). Our architecture removes the unit state history from some equations in the MGU. Our MGU-based architecture is about twice as fast as other MGU-based architectures, with equally good sound quality.
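For reference, the standard MGU is usually written with a single forget gate as below, where $x_t$ is the input, $h_{t-1}$ the previous unit state, $\sigma$ the logistic sigmoid, and $\odot$ the element-wise product. The symbol names follow common usage and are not taken from the abstract, which also does not say which of these equations lose the $h_{t-1}$ term.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(f_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= \left(1 - f_t\right) \odot h_{t-1} + f_t \odot \tilde{h}_t
\end{aligned}
```

Each product with $U_f$ or $U_h$ is a matrix-vector multiplication on the unit state history, so dropping such terms is one plausible source of the reported speed-up.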
A New Approach of Matrix Factorization on Complex Domain for Data Representation
Viet-Hang DUONG Manh-Quan BUI Jian-Jiun DING Yuan-Shan LEE Bach-Tung PHAM Pham The BAO Jia-Ching WANG

LETTER-Pattern Recognition

Publicized:
2017/09/15
Vol:
E100-D No:12
Page(s):
3059-3063
This work presents a new approach that derives a learned data representation through matrix factorization on the complex domain. In particular, we introduce an encoding matrix (a new representation of the data) that satisfies the simplicial constraint of the projective basis matrix on the field of complex numbers. A complex optimization framework is provided. It employs the gradient descent method and computes the derivative of the cost function using Wirtinger's calculus.
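As background for the last step, Wirtinger's calculus treats a complex variable $z = x + iy$ and its conjugate $\bar{z}$ as independent, which gives a usable gradient for a real-valued cost $f$ that is not complex-differentiable. The definitions and the generic descent update are shown below; the step size $\eta$ and the iteration index $k$ are notation assumed here, and the paper's actual cost function is not given in the abstract.

```latex
\begin{aligned}
\frac{\partial f}{\partial z} &= \frac{1}{2}\left(\frac{\partial f}{\partial x} - i\,\frac{\partial f}{\partial y}\right), \qquad
\frac{\partial f}{\partial \bar{z}} = \frac{1}{2}\left(\frac{\partial f}{\partial x} + i\,\frac{\partial f}{\partial y}\right) \\
z^{(k+1)} &= z^{(k)} - \eta\,\frac{\partial f}{\partial \bar{z}}\bigg|_{z = z^{(k)}}
\end{aligned}
```

For example, $f(z) = |z|^2 = z\bar{z}$ gives $\partial f / \partial \bar{z} = z$, so each update scales $z$ by $(1 - \eta)$ and drives it toward the minimizer $z = 0$ for $0 < \eta < 1$.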
A Block-Based Architecture for Lifting Scheme Discrete Wavelet Transform
Chung-Hsien YANG Jia-Ching WANG Jhing-Fa WANG Chi-Wei CHANG

PAPER-Image

Vol:
E90-A No:5
Page(s):
1062-1071
The two-dimensional discrete wavelet transform (DWT) for image processing is conventionally implemented with line-based architectures, which are simple and have low complexity. However, they suffer from two main shortcomings: the memory required for storing intermediate data and the long latency of computing wavelet coefficients. This work presents a new block-based architecture for computing lifting-based 2-D DWT coefficients. This architecture yields a significantly smaller buffer size. Additionally, the latency is reduced from N^2 to 3N compared to the line-based architectures. The proposed architecture supports the JPEG2000 default filters and has been realized on an ARM-based ALTERA EPXA10 Development Board at a frequency of 44.33 MHz.
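For readers unfamiliar with lifting, the sketch below shows one level of a 1-D integer 5/3 lifting transform, the reversible filter defined in JPEG2000 Part 1, together with its inverse. It illustrates the predict and update steps only, not the block-based hardware architecture; the symmetric boundary handling, the even-length restriction, and the function names are assumptions made here.

```python
def lift_53_forward(x):
    """One level of integer 5/3 lifting on an even-length list of ints."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("expected an even-length signal")
    even, odd = x[0::2], x[1::2]
    half = len(odd)
    # Predict: each odd sample minus the floor-average of its even neighbours.
    detail = [odd[n] - (even[n] + even[min(n + 1, half - 1)]) // 2
              for n in range(half)]
    # Update: each even sample plus a rounded quarter of neighbouring details.
    approx = [even[n] + (detail[max(n - 1, 0)] + detail[n] + 2) // 4
              for n in range(half)]
    return approx, detail


def lift_53_inverse(approx, detail):
    """Undo lift_53_forward by running the two steps in reverse order."""
    half = len(detail)
    even = [approx[n] - (detail[max(n - 1, 0)] + detail[n] + 2) // 4
            for n in range(half)]
    odd = [detail[n] + (even[n] + even[min(n + 1, half - 1)]) // 2
           for n in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x


signal = [3, 7, 1, 8, 2, 9, 4, 6]
approx, detail = lift_53_forward(signal)
restored = lift_53_inverse(approx, detail)
```

A 2-D transform applies this along rows and then columns. As a sense of scale for the latency figures, with N = 512 the line-based N^2 is 262,144 while 3N is 1,536, in whatever unit the paper counts latency.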
