The search functionality is under construction.
The search functionality is under construction.

Author Search Result

[Author] Wenyi GE(3hit)

1-3hit
  • An Improved U-Net Architecture for Image Dehazing

    Wenyi GE  Yi LIN  Zhitao WANG  Guigui WANG  Shihan TAN  

     
    PAPER-Image Processing and Video Processing

      Pubricized:
    2021/09/14
      Vol:
    E104-D No:12
      Page(s):
    2218-2225

    In this paper, we present a simple yet powerful deep neural network for natural image dehazing. The proposed method is designed based on U-Net architecture and we made some design changes to make it better. We first use Group Normalization to replace Batch Normalization to solve the problem of insufficient batch size due to hardware limitations. Second, we introduce FReLU activation into the U-Net block, which can achieve capturing complicated visual layouts with regular convolutions. Experimental results on public benchmarks demonstrate the effectiveness of the modified components. On the SOTS Indoor and Outdoor datasets, it obtains PSNR of 32.23 and 31.64 respectively, which are comparable performances with state-of-the-art methods. The code is publicly available online soon.

  • Speech Recognition for Air Traffic Control via Feature Learning and End-to-End Training

    Peng FAN  Xiyao HUA  Yi LIN  Bo YANG  Jianwei ZHANG  Wenyi GE  Dongyue GUO  

     
    PAPER-Speech and Hearing

      Pubricized:
    2023/01/23
      Vol:
    E106-D No:4
      Page(s):
    538-544

    In this work, we propose a new automatic speech recognition (ASR) system based on feature learning and an end-to-end training procedure for air traffic control (ATC) systems. The proposed model integrates the feature learning block, recurrent neural network (RNN), and connectionist temporal classification loss to build an end-to-end ASR model. Facing the complex environments of ATC speech, instead of the handcrafted features, a learning block is designed to extract informative features from raw waveforms for acoustic modeling. Both the SincNet and 1D convolution blocks are applied to process the raw waveforms, whose outputs are concatenated to the RNN layers for the temporal modeling. Thanks to the ability to learn representations from raw waveforms, the proposed model can be optimized in a complete end-to-end manner, i.e., from waveform to text. Finally, the multilingual issue in the ATC domain is also considered to achieve the ASR task by constructing a combined vocabulary of Chinese characters and English letters. The proposed approach is validated on a multilingual real-world corpus (ATCSpeech), and the experimental results demonstrate that the proposed approach outperforms other baselines, achieving a 6.9% character error rate.

  • Remote Sensing Image Dehazing Using Multi-Scale Gated Attention for Flight Simulator Open Access

    Qi LIU  Bo WANG  Shihan TAN  Shurong ZOU  Wenyi GE  

     
    PAPER-Image Processing and Video Processing

      Pubricized:
    2024/05/14
      Vol:
    E107-D No:9
      Page(s):
    1206-1218

    For flight simulators, it is crucial to create three-dimensional terrain using clear remote sensing images. However, due to haze and other contributing variables, the obtained remote sensing images typically have low contrast and blurry features. In order to build a flight simulator visual system, we propose a deep learning-based dehaze model for remote sensing images dehazing. An encoder-decoder architecture is proposed that consists of a multiscale fusion module and a gated large kernel convolutional attention module. This architecture can fuse multi-resolution global and local semantic features and can adaptively extract image features under complex terrain. The experimental results demonstrate that, with good generality and application, the model outperforms existing comparison techniques and achieves high-confidence dehazing in remote sensing images with a variety of haze concentrations, multi-complex terrains, and multi-spatial resolutions.