Author Search Result

[Author] Yang GUO(5hit)

1-5hit
  • A Comprehensive Method to Improve Loudness Compensation and High-Frequency Speech Intelligibility for Digital Hearing Aids

    Zhaoyang GUO  Bo WANG  Xin'an WANG  

     
    LETTER-Speech and Hearing

    Vol: E100-A  No: 7  Page(s): 1552-1556

    A comprehensive method is proposed that applies nonlinear frequency compression (FC) as a complement to multi-band loudness compensation, improving loudness compensation while simultaneously increasing high-frequency speech intelligibility for digital hearing aids. The proposed nonlinear FC (NLFC) improves on conventional methods in that the compression ratio (CR) is adjusted according to the speech intelligibility percentage in different frequency ranges. An adaptive wide dynamic range compression (AWDRC) with a time-varying CR is then applied to achieve adaptive loudness compensation. Experimental results show that mean speech identification is improved in comparison with state-of-the-art methods.
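As a rough illustration of what a compression ratio does in wide dynamic range compression, the sketch below applies the textbook static input/output rule: levels above a threshold grow by only 1/CR dB per input dB. It is not the AWDRC of the abstract, whose CR varies over time; the function name `compress_level_db` and the threshold and ratio values are assumptions chosen for the example.

```python
def compress_level_db(level_db, threshold_db, ratio):
    """Static compressor rule on a level in dB (illustrative, fixed CR).

    Below the threshold the level passes through unchanged; above it,
    each extra input dB yields only 1/ratio dB at the output.
    """
    if ratio < 1.0:
        raise ValueError("compression ratio must be >= 1")
    if level_db <= threshold_db:
        return float(level_db)
    return threshold_db + (level_db - threshold_db) / ratio


# Hypothetical settings: 50 dB threshold, CR of 2.
loud = compress_level_db(80.0, 50.0, 2.0)   # 30 dB over threshold becomes 15 dB
quiet = compress_level_db(40.0, 50.0, 2.0)  # below threshold, unchanged
```

A time-varying scheme would recompute `ratio` per frame or per band before calling a rule of this kind; how the abstract's method chooses that value is not stated in the text above.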

  • On the Key Parameters of the Oscillator-Based Random Source

    Chenyang GUO  Yujie ZHOU  

     
    PAPER-Nonlinear Problems

    Vol: E100-A  No: 9  Page(s): 1956-1964

    This paper presents a mathematical model of the oscillator-based true random number generator (TRNG) to study the influence of several key parameters on the randomness of the output sequence. The output of the model is so close to that of the real TRNG design that the model can be used to generate random bits for research in place of analog simulation. This takes less time than analog simulation and makes it more convenient for researchers to change key parameters in the design. The authors give a method for improving the existing oscillator-based TRNG design to deal with possible bias in the key parameters. The design is fabricated in a 55-nm CMOS process.

  • An Improved Perceptual MBSS Noise Reduction with an SNR-Based VAD for a Fully Operational Digital Hearing Aid

    Zhaoyang GUO  Xin'an WANG  Bo WANG  Shanshan YONG  

     
    PAPER-Speech and Hearing

    Publicized: 2017/02/17  Vol: E100-D  No: 5  Page(s): 1087-1096

    This paper first reviews state-of-the-art noise reduction methods and points out their weaknesses in noise reduction performance and speech quality, especially in low signal-to-noise ratio (SNR) environments. It then presents an improved perceptual multiband spectral subtraction (MBSS) noise reduction algorithm (NRA) and a novel robust voice activity detection (VAD) based on the amended sub-band SNR. The proposed SNR-based VAD considerably increases the accuracy of discrimination between noise and speech frames. Simulation results show that the proposed NRA achieves better segmental SNR (segSNR) and perceptual evaluation of speech quality (PESQ) performance than other noise reduction algorithms, especially in low-SNR environments. In addition, a fully operational digital hearing aid chip based on the proposed NRA is designed and fabricated in a 0.13 µm CMOS process. The final implementation shows that the whole chip consumes 1.3 mA at 1.2 V operation. Acoustic tests show that the maximum output sound pressure level (OSPL) is 114.6 dB SPL, the equivalent input noise is 5.9 dB SPL, and the total harmonic distortion is 2.5%. The proposed digital hearing aid chip is therefore a promising candidate for high-performance hearing-aid systems.

  • Design and Implementation of Deep Neural Network for Edge Computing

    Junyang ZHANG  Yang GUO  Xiao HU  Rongzhen LI  

     
    PAPER-Fundamentals of Information Systems

    Publicized: 2018/05/02  Vol: E101-D  No: 8  Page(s): 1982-1996

    In recent years, deep-learning-based image recognition, speech recognition, text translation and other related applications have brought great convenience to people's lives. With the advent of the era of the internet of everything, running a computationally intensive deep learning algorithm on a resource-limited edge device is a major challenge. For an edge-oriented vector processor, combined with a specific neural network model, a new data layout method is proposed that places the input feature maps in DDR and rearranges the convolutional kernel parameters in the nuclear memory bank. To address the difficulty of parallelizing two-dimensional matrix convolution, a method of parallelizing the matrix convolution calculation in the third dimension is proposed. By initializing the vector register to zero as the starting value of the max pooling, the rectified linear unit (ReLU) activation function and the pooling operation are fused, which reduces repeated access to intermediate data. On the basis of the single-core implementation, a multi-core implementation scheme for the Inception structure is proposed. Finally, based on the proposed vectorization method, five neural network models are implemented, namely AlexNet, VGG16, VGG19, GoogLeNet and ResNet18, and performance statistics and analysis on a CPU, a GTX 1080 Ti and the FT2000 are presented. Experimental results show that the vector processor has computing advantages over the CPU and GPU, and can compute large-scale neural network models in real time.
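The ReLU and max-pooling fusion mentioned in this abstract rests on a simple identity: taking the maximum over a window with the running maximum started at zero gives the same result as applying ReLU first and pooling afterwards. The scalar sketch below checks that identity on a one-dimensional row; it is only an illustration in plain Python, not the vector-register implementation of the paper, and the function names and the non-overlapping window layout are assumptions.

```python
def relu_then_maxpool(row, window):
    """Reference path: ReLU on every element, then max over each window.

    The `activated` list is the intermediate data that fusion avoids.
    """
    activated = [v if v > 0.0 else 0.0 for v in row]
    return [max(activated[i:i + window])
            for i in range(0, len(row) - window + 1, window)]


def fused_relu_maxpool(row, window):
    """Fused path: seed each window's running max with zero.

    A zero seed clamps negative windows to 0, which is exactly what
    ReLU followed by max pooling would produce.
    """
    pooled = []
    for i in range(0, len(row) - window + 1, window):
        acc = 0.0  # zero initial value stands in for the ReLU
        for v in row[i:i + window]:
            if v > acc:
                acc = v
        pooled.append(acc)
    return pooled


row = [-1.0, 2.0, -3.0, -4.0, 5.0, 0.5]
fused = fused_relu_maxpool(row, 2)
reference = relu_then_maxpool(row, 2)
```

Both paths return the same pooled values, but the fused one reads each input once and never stores the activated row.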

  • A Novel 3D Gradient LBP Descriptor for Action Recognition

    Zhaoyang GUO  Xin'an WANG  Bo WANG  Zheng XIE  

     
    LETTER-Image Recognition, Computer Vision

    Publicized: 2017/03/02  Vol: E100-D  No: 6  Page(s): 1388-1392

    In the field of action recognition, features based on Spatio-Temporal Interest Points (STIPs) have shown high efficiency and robustness. However, most state-of-the-art work on describing STIPs focuses on two-dimensional (2D) images and thus ignores information in 3D spatio-temporal space. In addition, a compact descriptor representation is desirable because of storage and computation costs. In this paper, a novel local descriptor named 3D Gradient LBP is proposed, which extends the traditional Local Binary Patterns (LBP) descriptor into 3D spatio-temporal space. The proposed descriptor takes advantage of the neighbourhood information of cuboids in three dimensions, which accounts for its excellent descriptive power for the distribution of grey-level space. Experiments on three challenging datasets (KTH, Weizmann and UT Interaction) validate the effectiveness of the approach in recognizing human actions.
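For readers unfamiliar with the traditional LBP that the last abstract extends, the sketch below computes the classic 8-neighbour code for a single 3x3 grey-level patch: each neighbour is compared with the centre pixel and contributes one bit. This is the ordinary 2D descriptor only, not the 3D Gradient LBP of the paper; the function name `lbp_code`, the clockwise bit ordering starting at the top-left neighbour, and the `>=` comparison are conventions assumed for the example.

```python
def lbp_code(patch):
    """Classic 8-bit LBP code of a 3x3 patch given as three rows.

    A neighbour that is at least as bright as the centre sets its bit.
    """
    center = patch[1][1]
    # Neighbours clockwise from the top-left corner (assumed ordering).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2),
               (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
code = lbp_code(patch)  # bits 0, 4, 5, 6 and 7 are set
```

A histogram of such codes over an image region is what is usually used as the descriptor; a spatio-temporal variant would draw its neighbours from a cuboid rather than a flat patch.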