Jianguo WEI Xugang LU Jianwu DANG
Machine learning techniques have long been applied in many fields with considerable success. The purpose of a learning process is generally to obtain a set of parameters from a given data set by minimizing an objective function, so that the parameters explain the data in a maximum likelihood or minimum estimation error sense. However, most learned parameters are highly data dependent and rarely reflect the true physical mechanism underlying the observed data. To obtain the inherent knowledge contained in the observations, it is necessary to combine physical models with the learning process rather than only fitting the observations with a black-box model. To reveal underlying properties of human speech production, we propose a learning process based on a physiological articulatory model and a coarticulation model, both of which are derived from human mechanisms. A two-layer learning framework was designed: the parameters at the physiological level are learned using the physiological articulatory model, and the parameters at the motor planning level are learned using the coarticulation model. The learning process was carried out on an articulatory database of human speech production. The learned parameters were evaluated by numerical experiments and listening tests. The phonetic targets obtained in the planning stage provide evidence for understanding the virtual targets of human speech production. As a result, the model-based learning process reveals the inherent mechanism of human speech through learned parameters that have a definite physical meaning.
Gang JIN Jingsheng ZHAI Jianguo WEI
In this paper, we propose an end-to-end two-branch feature attention network, which we call CAA-Net, intended mainly for single image dehazing. The network consists of two branches: 1) A U-Net composed of different-level feature fusion based on attention (FEPA) structures and residual dense blocks (RDB). We use RDB to make full use of all the hierarchical features of the image; an RDB contains densely connected layers and local feature fusion with local residual learning. We also propose the FEPA structure, which retains the information of shallow layers and transfers it to deep layers. FEPA is composed of several feature attention modules (FPA). FPA combines local residual learning with a channel attention mechanism and a pixel attention mechanism, and can extract features from different channels and image pixels. 2) A network composed of several FEPA structures at different levels. This branch lets the feature weights be learned adaptively by the FPA modules, giving more weight to important features. The final output of CAA-Net is the combination of the prediction results of both branches. Experimental results show that the proposed CAA-Net outperforms previous state-of-the-art algorithms for single image dehazing.
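As a rough illustration of how an FPA-style module can combine channel attention, pixel attention, and local residual learning, the following NumPy sketch may help. It is not the authors' implementation: the function names, the reduction size `R`, the use of 1x1 projections in place of learned convolutions, and the ordering (channel attention first, then pixel attention) are all assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Reweight each channel of x (shape C x H x W) by a gate in (0, 1).

    w1 (R x C) and w2 (C x R) are hypothetical 1x1 projection weights.
    """
    pooled = x.mean(axis=(1, 2))             # global average pool -> (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)    # ReLU bottleneck -> (R,)
    gate = sigmoid(w2 @ hidden)              # one weight per channel -> (C,)
    return x * gate[:, None, None]

def pixel_attention(x, v1, v2):
    """Reweight each spatial position of x by a gate in (0, 1).

    v1 (R x C) and v2 (1 x R) are hypothetical 1x1 projection weights.
    """
    hidden = np.maximum(np.einsum("rc,chw->rhw", v1, x), 0.0)  # (R, H, W)
    gate = sigmoid(np.einsum("or,rhw->ohw", v2, hidden))       # (1, H, W)
    return x * gate

def fpa_block(x, params):
    """Assumed FPA layout: channel attention, pixel attention, skip connection."""
    y = channel_attention(x, params["w1"], params["w2"])
    y = pixel_attention(y, params["v1"], params["v2"])
    return x + y                              # local residual learning

# Toy usage with random weights (illustrative sizes only).
rng = np.random.default_rng(0)
C, H, W, R = 8, 4, 4, 2
params = {
    "w1": rng.standard_normal((R, C)),
    "w2": rng.standard_normal((C, R)),
    "v1": rng.standard_normal((R, C)),
    "v2": rng.standard_normal((1, R)),
}
x = rng.standard_normal((C, H, W))
out = fpa_block(x, params)
```

Because both gates lie strictly between 0 and 1, the attended branch can only attenuate features, while the skip connection keeps the original (shallow) information available to later layers.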