Xi ZHANG Yanan ZHANG Tao GAO Yong FANG Ting CHEN
The original single-shot multibox detector (SSD) algorithm has good detection accuracy and speed for regular object recognition. However, the SSD is not suitable for detecting small objects for two reasons: 1) the relationships among different feature layers with various scales are not considered, 2) the predicted results are solely determined by several independent feature layers. To enhance its detection capability for small objects, this study proposes an improved SSD-based algorithm called proportional channels' fusion SSD (PCF-SSD). Three enhancements are provided by this novel PCF-SSD algorithm. First, a fusion feature pyramid model is proposed by concatenating channels of certain key feature layers in a given proportion for object detection. Second, the default box sizes are adjusted properly for small object detection. Third, an improved loss function is suggested to train the above-proposed fusion model, which can further improve object detection performance. A series of experiments are conducted on the public database Pascal VOC to validate the PCF-SSD. On comparing with the original SSD algorithm, our algorithm improves the mean average precision and detection accuracy for small objects by 3.3% and 3.9%, respectively, with a detection speed of 40FPS. Furthermore, the proposed PCF-SSD can achieve a better balance of detection accuracy and efficiency than the original SSD algorithm, as demonstrated by a series of experimental results.
Xingsi XUE Yirui HUANG Zeqing ZHANG
Ontologies are regarded as the solution to data heterogeneity on the Semantic Web (SW), but they also suffer from the heterogeneity problem, which leads to the ambiguity of data information. Ontology Meta-Matching technique (OMM) is able to solve the ontology heterogeneity problem through aggregating various similarity measures to find the heterogeneous entities. Inspired by the success of Reinforcement Learning (RL) in solving complex optimization problems, this work proposes a RL-based OMM technique to address the ontology heterogeneity problem. First, we propose a novel RL-based OMM framework, and then, a neural network that is called evaluated network is proposed to replace the Q table when we choose the next action of the agent, which is able to reduce memory consumption and computing time. After that, to better guide the training of neural network and improve the accuracy of RL agent, we establish a memory bank to mine depth information during the evaluated network's training procedure, and we use another neural network that is called target network to save the historical parameters. The experiment uses the famous benchmark in ontology matching domain to test our approach's performance, and the comparisons among Deep Reinforcement Learning(DRL), RL and state-of-the-art ontology matching systems show that our approach is able to effectively determine high-quality alignments.
Xincheng CAO Bin YAO Binqiang CHEN Wangpeng HE Suqin GUO Kun CHEN
Tool condition monitoring is one of the core tasks of intelligent manufacturing in digital workshop. This paper presents an intelligent recognize method of tool condition based on deep learning. First, the industrial microphone is used to collect the acoustic signal during machining; then, a central fractal decomposition algorithm is proposed to extract sensitive information; finally, the multi-scale convolutional recurrent neural network is used for deep feature extraction and pattern recognition. The multi-process milling experiments proved that the proposed method is superior to the existing methods, and the recognition accuracy reached 88%.
Wen LIU Yixiao SHAO Shihong ZHAI Zhao YANG Peishuai CHEN
Automatic continuous tracking of objects involved in a construction project is required for such tasks as productivity assessment, unsafe behavior recognition, and progress monitoring. Many computer-vision-based tracking approaches have been investigated and successfully tested on construction sites; however, their practical applications are hindered by the tracking accuracy limited by the dynamic, complex nature of construction sites (i.e. clutter with background, occlusion, varying scale and pose). To achieve better tracking performance, a novel deep-learning-based tracking approach called the Multi-Domain Convolutional Neural Networks (MD-CNN) is proposed and investigated. The proposed approach consists of two key stages: 1) multi-domain representation of learning; and 2) online visual tracking. To evaluate the effectiveness and feasibility of this approach, it is applied to a metro project in Wuhan China, and the results demonstrate good tracking performance in construction scenarios with complex background. The average distance error and F-measure for the MDNet are 7.64 pixels and 67, respectively. The results demonstrate that the proposed approach can be used by site managers to monitor and track workers for hazard prevention in construction sites.
Yong LI Shidi WEI Xuan LIU Yinzheng LUO Yafeng LI Feng SHUANG
The traditional manual inspection is gradually replaced by the unmanned aerial vehicles (UAV) automatic inspection. However, due to the limited computational resources carried by the UAV, the existing deep learning-based algorithm needs a large amount of computational resources, which makes it impossible to realize the online detection. Moreover, there is no effective online detection system at present. To realize the high-precision online detection of electrical equipment, this paper proposes an SSD (Single Shot Multibox Detector) detection algorithm based on the improved Dual network for the images of insulators and spacers taken by UAVs. The proposed algorithm uses MnasNet and MobileNetv3 to form the Dual network to extract multi-level features, which overcomes the shortcoming of single convolutional network-based backbone for feature extraction. Then the features extracted from the two networks are fused together to obtain the features with high-level semantic information. Finally, the proposed algorithm is tested on the public dataset of the insulator and spacer. The experimental results show that the proposed algorithm can detect insulators and spacers efficiently. Compared with other methods, the proposed algorithm has the advantages of smaller model size and higher accuracy. The object detection accuracy of the proposed method is up to 95.1%.
Yue PENG Zuqiang MENG Lina YANG
Medical images play an important role in medical diagnosis. However, acquiring a large number of datasets with annotations is still a difficult task in the medical field. For this reason, research in the field of image-to-image translation is combined with computer-aided diagnosis, and data augmentation methods based on generative adversarial networks are applied to medical images. In this paper, we try to perform data augmentation on unimodal data. The designed StarGAN V2 based network has high performance in augmenting the dataset using a small number of original images, and the augmented data is expanded from unimodal data to multimodal medical images, and this multimodal medical image data can be applied to the segmentation task with some improvement in the segmentation results. Our experiments demonstrate that the generated multimodal medical image data can improve the performance of glioma segmentation.
Runze WANG Zehua ZHANG Yueqin ZHANG Zhongyuan JIANG Shilin SUN Guixiang MA
Recent studies in protein structure prediction such as AlphaFold have enabled deep learning to achieve great attention on the Drug-Target Affinity (DTA) task. Most works are dedicated to embed single molecular property and homogeneous information, ignoring the diverse heterogeneous information gains that are contained in the molecules and interactions. Motivated by this, we propose an end-to-end deep learning framework to perform Molecular Heterogeneous features Fusion (MolHF) for DTA prediction on heterogeneity. To address the challenges that biochemical attributes locates in different heterogeneous spaces, we design a Molecular Heterogeneous Information Learning module with multi-strategy learning. Especially, Molecular Heterogeneous Attention Fusion module is present to obtain the gains of molecular heterogeneous features. With these, the diversity of molecular structure information for drugs can be extracted. Extensive experiments on two benchmark datasets show that our method outperforms the baselines in all four metrics. Ablation studies validate the effect of attentive fusion and multi-group of drug heterogeneous features. Visual presentations demonstrate the impact of protein embedding level and the model ability of fitting data. In summary, the diverse gains brought by heterogeneous information contribute to drug-target affinity prediction.
Hiroyuki NOZAKA Kosuke KAMATA Kazufumi YAMAGATA
The data augmentation method is known as a helpful technique to generate a dataset with a large number of images from one with a small number of images for supervised training in deep learning. However, a low validity augmentation method for image recognition was reported in a recent study on artificial intelligence (AI). This study aimed to clarify the optimal data augmentation method in deep learning model generation for the recognition of white blood cells (WBCs). Study Design: We conducted three different data augmentation methods (rotation, scaling, and distortion) on original WBC images, with each AI model for WBC recognition generated by supervised training. The subjects of the clinical assessment were 51 healthy persons. Thin-layer blood smears were prepared from peripheral blood and subjected to May-Grünwald-Giemsa staining. Results: The only significantly effective technique among the AI models for WBC recognition was data augmentation with rotation. By contrast, the effectiveness of both image distortion and image scaling was poor, and improved accuracy was limited to a specific WBC subcategory. Conclusion: Although data augmentation methods are often used for achieving high accuracy in AI generation with supervised training, we consider that it is necessary to select the optimal data augmentation method for medical AI generation based on the characteristics of medical images.
Wenxin DONG Jianxun ZHANG Shuqiu TAN Xinyue ZHANG
In the pork fat content detection task, traditional physical or chemical methods are strongly destructive, have substantial technical requirements and cannot achieve nondestructive detection without slaughtering. To solve these problems, we propose a novel, convenient and economical method for detecting the fat content of pig B-ultrasound images based on hybrid attention and multiscale fusion learning, which extracts and fuses shallow detail information and deep semantic information at multiple scales. First, a deep learning network is constructed to learn the salient features of fat images through a hybrid attention mechanism. Then, the information describing pork fat is extracted at multiple scales, and the detailed information expressed in the shallow layer and the semantic information expressed in the deep layer are fused later. Finally, a deep convolution network is used to predict the fat content compared with the real label. The experimental results show that the determination coefficient is greater than 0.95 on the 130 groups of pork B-ultrasound image data sets, which is 2.90, 6.10 and 5.13 percentage points higher than that of VGGNet, ResNet and DenseNet, respectively. It indicats that the model could effectively identify the B-ultrasound image of pigs and predict the fat content with high accuracy.
Lie GUO Yibing ZHAO Jiandong GAO
The commonly used object detection algorithm based on convolutional neural network is difficult to meet the real-time requirement on embedded platform due to its large size of model, large amount of calculation, and long inference time. It is necessary to use model compression to reduce the amount of network calculation and increase the speed of network inference. This paper conducts compression of vehicle and pedestrian detection network by pruning and removing redundant parameters. The vehicle and pedestrian detection network is trained based on YOLOv3 model by using K-means++ to cluster the anchor boxes. The detection accuracy is improved by changing the proportion of categorical losses and regression losses for each category in the loss function because of the unbalanced number of targets in the dataset. A layer and channel pruning algorithm is proposed by combining global channel pruning thresholds and L1 norm, which can reduce the time cost of the network layer transfer process and the amount of computation. Network layer fusion based on TensorRT is performed and inference is performed using half-precision floating-point to improve the speed of inference. Results show that the vehicle and pedestrian detection compression network pruned 84% channels and 15 Shortcut modules can reduce the size by 32% and the amount of calculation by 17%. While the network inference time can be decreased to 21 ms, which is 1.48 times faster than the network pruned 84% channels.
Shaorong HU Yuqi ZHANG Yuefei JIN Ziqi DOU
Bus bunching often occurs in public transit system, resulting in a series of problems such as poor punctuality, long waiting time and low service quality. In this paper, we explore the influence of the discrete distribution of traffic operation state on the dynamic evolution of bus bunching. Firstly, we use self-organizing map (SOM) to find the threshold of bus bunching and analyze the factors that affect bus bunching based on GPS data of No. 600 bus line in Xi'an. Then, taking the bus headway as the research index, we construct the bus bunching mechanism model. Finally, a simulation platform is built by MATLAB to examine the trend of headway when various influencing factors show different distribution states along the bus line. In terms of influencing factors, inter vehicle speed, queuing time at intersection and loading time at station are shown to have a significant impact on headway between buses. In terms of the impact of the distribution of crowded road sections on headway, long-distance and concentrated crowded road sections will lead to large interval or bus bunching. When the traffic states along the bus line are randomly distributed among crowded, normal and free, the headway may fluctuate in a large range, which may result in bus bunching, or fluctuate in a small range and remain relatively stable. The headway change curve is determined by the distribution length of each traffic state along the bus line. The research results can help to formulate improvement measures according to traffic operation state for equilibrium bus headway and alleviating bus bunching.
Jialun CAI Weibo HUANG Yingxuan YOU Zhan CHEN Bin REN Hong LIU
Robot motion planning is an important part of the unmanned supermarket. The challenges of motion planning in supermarkets lie in the diversity of the supermarket environment, the complexity of obstacle movement, the vastness of the search space. This paper proposes an adaptive Search and Path planning method based on the Semantic information and Deep reinforcement learning (SPSD), which effectively improves the autonomous decision-making ability of supermarket robots. Firstly, based on the backbone of deep reinforcement learning (DRL), supermarket robots process real-time information from multi-modality sensors to realize high-speed and collision-free motion planning. Meanwhile, in order to solve the problem caused by the uncertainty of the reward in the deep reinforcement learning, common spatial semantic relationships between landmarks and target objects are exploited to define reward function. Finally, dynamics randomization is introduced to improve the generalization performance of the algorithm in the training. The experimental results show that the SPSD algorithm is excellent in the three indicators of generalization performance, training time and path planning length. Compared with other methods, the training time of SPSD is reduced by 27.42% at most, the path planning length is reduced by 21.08% at most, and the trained network of SPSD can be applied to unfamiliar scenes safely and efficiently. The results are motivating enough to consider the application of the proposed method in practical scenes. We have uploaded the video of the results of the experiment to https://www.youtube.com/watch?v=h1wLpm42NZk.
Rong FEI Yufan GUO Junhuai LI Bo HU Lu YANG
With the widespread use of indoor positioning technology, the need for high-precision positioning services is rising; nevertheless, there are several challenges, such as the difficulty of simulating the distribution of interior location data and the enormous inaccuracy of probability computation. As a result, this paper proposes three different neural network model comparisons for indoor location based on WiFi fingerprint - indoor location algorithm based on improved back propagation neural network model, RSSI indoor location algorithm based on neural network angle change, and RSSI indoor location algorithm based on depth neural network angle change - to raise accurately predict indoor location coordinates. Changing the action range of the activation function in the standard back-propagation neural network model achieves the goal of accurately predicting location coordinates. The revised back-propagation neural network model has strong stability and enhances indoor positioning accuracy based on experimental comparisons of loss rate (loss), accuracy rate (acc), and cumulative distribution function (CDF).
Xianyu WANG Cong LI Heyi LI Rui ZHANG Zhifeng LIANG Hai WANG
Visual object tracking is always a challenging task in computer vision. During the tracking, the shape and appearance of the target may change greatly, and because of the lack of sufficient training samples, most of the online learning tracking algorithms will have performance bottlenecks. In this paper, an improved real-time algorithm based on deep learning features is proposed, which combines multi-feature fusion, multi-scale estimation, adaptive updating of target model and re-detection after target loss. The effectiveness and advantages of the proposed algorithm are proved by a large number of comparative experiments with other excellent algorithms on large benchmark datasets.
Yongtang BAO Pengfei ZHOU Yue QI Zhihui WANG Qing FAN
A frontal and realistic face image was synthesized from a single profile face image. It has a wide range of applications in face recognition. Although the frontal face method based on deep learning has made substantial progress in recent years, there is still no guarantee that the generated face has identity consistency and illumination consistency in a significant posture. This paper proposes a novel pixel-based feature regression generative adversarial network (PFR-GAN), which can learn to recover local high-frequency details and preserve identity and illumination frontal face images in an uncontrolled environment. We first propose a Reslu block to obtain richer feature representation and improve the convergence speed of training. We then introduce a feature conversion module to reduce the artifacts caused by face rotation discrepancy, enhance image generation quality, and preserve more high-frequency details of the profile image. We also construct a 30,000 face pose dataset to learn about various uncontrolled field environments. Our dataset includes ages of different races and wild backgrounds, allowing us to handle other datasets and obtain better results. Finally, we introduce a discriminator used for recovering the facial structure of the frontal face images. Quantitative and qualitative experimental results show our PFR-GAN can generate high-quality and high-fidelity frontal face images, and our results are better than the state-of-art results.
Shi-Long SHEN Ai-Guo WU Yong XU
A generative model is presented for two types of person image generation in this paper. First, this model is applied to pose-guided person image generation, i.e., converting the pose of a source person image to the target pose while preserving the texture of that source person image. Second, this model is also used for clothing-guided person image generation, i.e., changing the clothing texture of a source person image to the desired clothing texture. The core idea of the proposed model is to establish the multi-scale correspondence, which can effectively address the misalignment introduced by transferring pose, thereby preserving richer information on appearance. Specifically, the proposed model consists of two stages: 1) It first generates the target semantic map imposed on the target pose to provide more accurate guidance during the generation process. 2) After obtaining the multi-scale feature map by the encoder, the multi-scale correspondence is established, which is useful for a fine-grained generation. Experimental results show the proposed method is superior to state-of-the-art methods in pose-guided person image generation and show its effectiveness in clothing-guided person image generation.
In this paper, we propose improved Generative Adversarial Networks with attention module in Generator, which can enhance the effectiveness of Generator. Furthermore, recent work has shown that Generator conditioning affects GAN performance. Leveraging this insight, we explored the effect of different normalization (spectral normalization, instance normalization) on Generator and Discriminator. Moreover, an enhanced loss function called Wasserstein Divergence distance, can alleviate the problem of difficult to train module in practice.
Wenrong XIAO Yong CHEN Suqin GUO Kun CHEN
An attention residual network with triple feature as input is proposed to predict the remaining useful life (RUL) of bearings. First, the channel attention and spatial attention are connected in series into the residual connection of the residual neural network to obtain a new attention residual module, so that the newly constructed deep learning network can better pay attention to the weak changes of the bearing state. Secondly, the “triple feature” is used as the input of the attention residual network, so that the deep learning network can better grasp the change trend of bearing running state, and better realize the prediction of the RUL of bearing. Finally, The method is verified by a set of experimental data. The results show the method is simple and effective, has high prediction accuracy, and reduces manual intervention in RUL prediction.
Epileptic seizure prediction is an important research topic in the clinical epilepsy treatment, which can provide opportunities to take precautionary measures for epilepsy patients and medical staff. EEG is an commonly used tool for studying brain activity, which records the electrical discharge of brain. Many studies based on machine learning algorithms have been proposed to solve the task using EEG signal. In this study, we propose a novel seizure prediction models based on convolutional neural networks and scalp EEG for a binary classification between preictal and interictal states. The short-time Fourier transform has been used to translate raw EEG signals into STFT sepctrums, which is applied as input of the models. The fusion features have been obtained through the side-output constructions and used to train and test our models. The test results show that our models can achieve comparable results in both sensitivity and FPR upon fusion features. The proposed patient-specific model can be used in seizure prediction system for EEG classification.
In recent years, driver's visual attention has been actively studied for driving automation technology. However, the number of models is few to perceive an insight understanding of driver's attention in various moments. All attention models process multi-level image representations by a two-stream/multi-stream network, increasing the computational cost due to an increment of model parameters. However, multi-level image representation such as optical flow plays a vital role in tasks involving videos. Therefore, to reduce the computational cost of a two-stream network and use multi-level image representation, this work proposes a single stream driver's visual attention model for a critical situation. The experiment was conducted using a publicly available critical driving dataset named BDD-A. Qualitative results confirm the effectiveness of the proposed model. Moreover, quantitative results highlight that the proposed model outperforms state-of-the-art visual attention models according to CC and SIM. Extensive ablation studies verify the presence of optical flow in the model, the position of optical flow in the spatial network, the convolution layers to process optical flow, and the computational cost compared to a two-stream model.