Huafei WANG Xianpeng WANG Xiang LAN Ting SU
Using deep learning (DL) to achieve direction-of-arrival (DOA) estimation is an open and meaningful exploration. Existing DL-based methods achieve DOA estimation through spectrum regression or multi-label classification. However, both approaches suffer from off-grid errors. In this paper, we propose a cascaded deep neural network (DNN) framework, named the off-grid network (OGNet), to provide accurate DOA estimation in the off-grid case. OGNet is composed of an autoencoder built from fully connected (FC) layers and a deep convolutional neural network (CNN) with 2-dimensional convolutional layers. In OGNet, the off-grid error is modeled into the labels, exploiting its sparsity, to achieve off-grid DOA estimation. Compared with state-of-the-art grid-based methods, OGNet shows advantages in precision and resolution. Extensive simulation experiments under different conditions demonstrate the effectiveness and superiority of OGNet.
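One way to picture "modeling the off-grid error into labels" is to pair a sparse on-grid presence vector with an offset vector. The offset vector stores each source's signed distance to its nearest grid point. The sketch below is a minimal illustration under that assumption. The grid range, the step, and the two-vector layout are hypothetical and are not OGNet's actual label format. In OGNet, a network would predict such labels from array data.

```python
# Hypothetical angular grid (not OGNet's): -60 to 60 degrees in 1-degree steps.
GRID_START, GRID_STEP, GRID_SIZE = -60.0, 1.0, 121

def encode_doa_labels(doas):
    """Encode true DOAs (degrees) as an on-grid presence label plus an off-grid offset label."""
    presence = [0.0] * GRID_SIZE  # 1.0 at the grid point nearest each source
    offset = [0.0] * GRID_SIZE    # signed off-grid error at that point, in grid steps
    for theta in doas:
        idx = round((theta - GRID_START) / GRID_STEP)
        idx = min(max(idx, 0), GRID_SIZE - 1)
        presence[idx] = 1.0
        offset[idx] = (theta - (GRID_START + idx * GRID_STEP)) / GRID_STEP
    return presence, offset

def decode_doa_labels(presence, offset, threshold=0.5):
    """Recover DOAs from (predicted) labels: active grid point plus its off-grid offset."""
    return [GRID_START + (i + offset[i]) * GRID_STEP
            for i, p in enumerate(presence) if p > threshold]
```

Because only a few grid points are active, both label vectors stay sparse. The decoded angle is not restricted to grid values, which is the point of carrying the offset.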
Na XING Lu LI Ye ZHANG Shiyi YANG
Unmanned aerial vehicle (UAV)-assisted systems have attracted considerable attention owing to their high probability of line-of-sight (LoS) connections and flexible deployment. In this paper, we aim to minimize the upload time required for a UAV to collect information from sensor nodes in a disaster scenario by optimizing the deployment position of the UAV. To obtain the deployment solution quickly, we propose a data-driven approach in which an optimization strategy acts as the expert. Because images capture spatial configurations well, we use a convolutional neural network (CNN) to learn how to place the UAV. Simulation results demonstrate the effectiveness and generalization of the proposed method. After training, our CNN generates UAV configurations faster than a general optimization-based algorithm.
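The data-driven pipeline needs two things: an image encoding of the sensor layout, which serves as the CNN input, and an "expert" position label produced by optimization. The sketch below shows one way to generate such a pair. It assumes the following, all of which are hypothetical and not taken from the paper:

- the image is a grid of sensor counts;
- the upload-time model is a Shannon rate with a 1/d² SNR at a fixed altitude;
- the expert is an exhaustive search over cell centres.

```python
import math

def rasterize(nodes, area=100.0, size=10):
    """Turn sensor (x, y) positions in a square area into a size x size count image."""
    img = [[0] * size for _ in range(size)]
    for x, y in nodes:
        r = min(int(y / area * size), size - 1)
        c = min(int(x / area * size), size - 1)
        img[r][c] += 1
    return img

def upload_time(uav, nodes, data_bits=1e6, bandwidth=1e6, snr_ref=1e6, height=50.0):
    """Hypothetical total upload time: data / Shannon rate with a 1/d^2 SNR model."""
    total = 0.0
    for x, y in nodes:
        d2 = (x - uav[0]) ** 2 + (y - uav[1]) ** 2 + height ** 2
        rate = bandwidth * math.log2(1 + snr_ref / d2)
        total += data_bits / rate
    return total

def expert_position(nodes, area=100.0, size=10):
    """'Expert' label: exhaustive search over cell centres for the lowest upload time."""
    cell = area / size
    centres = [((c + 0.5) * cell, (r + 0.5) * cell) for r in range(size) for c in range(size)]
    return min(centres, key=lambda p: upload_time(p, nodes))
```

Pairs of `(rasterize(nodes), expert_position(nodes))` would form a supervised training set. Once trained, the CNN replaces the slow search at deployment time.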
Zheqing ZHANG Hao ZHOU Chuan LI Weiwei JIANG
Single-image dehazing is a challenging task in computer vision research. To address the limited representation capability of traditional convolutional neural networks and the high computational overhead of the self-attention mechanisms popular in recent years, we propose image attention and design a single-image dehazing network based on it: IAD-Net. The proposed image attention is a plug-and-play module capable of global modeling. IAD-Net is a parallel network structure that combines the global modeling ability of image attention with the local modeling ability of convolution, so that the network learns both global and local features. The proposed network has excellent feature learning and feature expression ability with low computational overhead, and it also improves the detail information of hazy images. Experiments verify the effectiveness of the image attention module and the competitiveness of IAD-Net with state-of-the-art methods.
Zeyuan JU Zhipeng LIU Yu GAO Haotian LI Qianhang DU Kota YOSHIKAWA Shangce GAO
Medical imaging plays an indispensable role in precise patient diagnosis, and the integration of deep learning into medical diagnostics is becoming increasingly common. However, existing deep learning models face performance and efficiency challenges, especially in resource-constrained scenarios. To overcome these challenges, we introduce a novel dendritic neural EfficientNet model, called DEN, inspired by the function of brain neurons; it efficiently extracts image features and enhances image classification performance. Assessments on a diabetic retinopathy fundus image dataset show that DEN outperforms EfficientNet and other classical neural network models.
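The abstract does not specify DEN's neuron model. Dendritic neuron models in the literature commonly chain four layers:

- a synaptic layer (one sigmoid per input);
- a dendritic layer (a product across synapses);
- a membrane layer (a sum across branches);
- a soma (a final sigmoid).

The sketch below is a minimal forward pass of that classic formulation. It assumes that DEN applies something similar on top of EfficientNet features. The values of `k`, `k_soma`, and `theta` are hypothetical defaults.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dendritic_neuron(x, w, q, k=5.0, k_soma=5.0, theta=0.5):
    """Forward pass of a classic dendritic neuron model (DNM).

    x: inputs of length I; w, q: per-branch weights and thresholds, each M lists of length I.
    """
    branch_outputs = []
    for w_m, q_m in zip(w, q):
        # Synaptic layer: a sigmoid on each weighted input
        synapses = [sigmoid(k * (wi * xi - qi)) for xi, wi, qi in zip(x, w_m, q_m)]
        # Dendritic layer: multiplicative (AND-like) interaction along the branch
        branch_outputs.append(math.prod(synapses))
    # Membrane layer: sum the branches (OR-like); soma: final sigmoid
    v = sum(branch_outputs)
    return sigmoid(k_soma * (v - theta))
```

The multiplicative dendritic layer is what distinguishes this model from a plain weighted-sum neuron. A branch contributes strongly only when all of its synapses are active.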
Akira KITAYAMA Goichi ONO Hiroaki ITO
Edge devices with strict safety and reliability requirements, such as autonomous cars, industrial robots, and drones, require software verification on the device before operation. The human cost and time required for this analysis are a barrier in the software development and update cycle. In particular, the final verification on the edge device should at least strictly confirm that the updated software has not degraded relative to the current version. Because the edge device has no ground-truth data, a human must judge whether each difference between the updated software and the operating version is due to degradation or improvement, which makes this verification very costly. This paper proposes a novel automated method for efficiently verifying, on edge devices, an object detection AI, a type of AI that has found practical use in various applications. In the proposed method, a target object existence detector (TOED), a simple binary classifier, judges whether an object of the recognition target class exists in a region where the predictions of the AI's operating and updated versions differ. Using the TOED judgment and the prediction difference, we constructed an automated verification system for the updated AI. TOED was designed as a simple binary classifier with four convolutional layers, and the accuracy of its object existence judgment was evaluated on the differences between the predictions of the YOLOv5 L and X models on the Cityscapes dataset. The results showed a judgment accuracy above 99.5% with 8.6% over-detection, indicating that a verification system adopting this method would be more efficient than simple analysis of the prediction differences.
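The verification logic can be sketched as follows. Boxes found by only one model version are treated as prediction differences. A TOED-like judge then decides whether a real object is present in each such region. The decision rule is as follows:

- A missed real object counts as degradation.
- A box on empty background counts as a false positive.

This is a minimal sketch under assumptions. `toed` is a hypothetical stand-in callable for the trained classifier. The IoU matching rule and the 0.5 threshold are illustrative, not the paper's exact procedure.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def prediction_differences(operating, updated, thr=0.5):
    """Boxes detected by only one model version (no match at IoU >= thr)."""
    only_op = [a for a in operating if all(iou(a, b) < thr for b in updated)]
    only_up = [b for b in updated if all(iou(b, a) < thr for a in operating)]
    return only_op, only_up

def verify_update(operating, updated, toed):
    """Return (degraded, improved) counts; toed(box) -> True if a target object exists in box."""
    only_op, only_up = prediction_differences(operating, updated)
    degraded = improved = 0
    for box in only_op:      # operating version detected it, updated version did not
        if toed(box):
            degraded += 1    # updated version misses a real object
        else:
            improved += 1    # updated version dropped a false positive
    for box in only_up:      # updated version detected it, operating version did not
        if toed(box):
            improved += 1    # newly found real object
        else:
            degraded += 1    # new false positive
    return degraded, improved
```

An update could then be accepted automatically when `degraded == 0`. Only the difference regions need a judgment, which is why a small binary classifier suffices.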
Zhichao SHA Ziji MA Kunlai XIONG Liangcheng QIN Xueying WANG
Early diagnosis is clinically important for curing skin cancer. However, because some skin cancers share similar visual characteristics and dermatologists rely on subjective experience to distinguish skin cancer types, diagnostic accuracy is often suboptimal. Recently, computer methods introduced into the medical field have helped physicians improve the recognition rate, but some challenges remain. For massive dermoscopic image data, the residual network (ResNet) is well suited to learning the feature relationships within big data because of its greater network depth. To address a deficiency of ResNet, this paper proposes a multi-region feature extraction and raising dimension matching method, which further improves the utilization of medical image features. The method first extracts rich and diverse features from multiple regions of the feature map, avoiding the tendency of traditional residual modules to repeatedly extract features from a few fixed regions. Then, the fused features are strengthened by raising the dimension of the branch-path information and stacking it with the main path, which solves the problem of poor fusion caused by the two paths having different dimensionalities. The proposed method is evaluated on the International Skin Imaging Collaboration (ISIC) Archive dataset, which contains more than 40,000 images. On this dataset and other datasets, the method improves on networks containing traditional residual modules and on several popular networks.
In underwater acoustic communication systems based on orthogonal frequency division multiplexing (OFDM), using clipping to reduce the peak-to-average power ratio (PAPR) causes nonlinear distortion of the signal, which prevents the receiver from accurately recovering the faded signal. In this letter, an Aquila optimizer-based convolutional attention block stacked network (AO-CABNet) is proposed to replace the receiver and improve its ability to recover the original signal. Simulation results show that the AO method has better optimization capability, quickly obtaining the optimal parameters of the network model, and that the proposed AO-CABNet structure outperforms existing schemes.
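The distortion that AO-CABNet must undo comes from amplitude clipping of the time-domain OFDM signal. The sketch below shows that step, a standard transmitter-side operation, not the paper's receiver:

- `idft` maps subcarrier symbols to time-domain samples;
- `papr_db` measures the peak-to-average power ratio (PAPR);
- `clip` limits each sample's amplitude while keeping its phase.

The naive inverse DFT and the chosen clipping level are illustrative choices.

```python
import cmath
import math

def idft(symbols):
    """Naive inverse DFT: frequency-domain subcarrier symbols -> time-domain OFDM samples."""
    n = len(symbols)
    return [sum(s * cmath.exp(2j * math.pi * k * t / n) for k, s in enumerate(symbols)) / n
            for t in range(n)]

def papr_db(x):
    """Peak-to-average power ratio of a sample sequence, in dB."""
    powers = [abs(v) ** 2 for v in x]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))

def clip(x, max_amp):
    """Amplitude clipping: limit |x| to max_amp, keep the phase (a nonlinear operation)."""
    return [v if abs(v) <= max_amp else max_amp * v / abs(v) for v in x]
```

Clipping lowers the peak more than it lowers the average power, so the PAPR drops. The clipped samples, however, no longer correspond to the original subcarrier symbols. This nonlinear error is what a learned receiver tries to compensate for.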
Sota MORIYAMA Koichi ICHIGE Yuichi HORI Masayuki TACHI
In this paper, we propose a method for video reflection removal using a video restoration framework with enhanced deformable networks (EDVR). We examine the effect of each module in EDVR on video reflection removal and modify the models using 3D convolutions. The performance of each modified model is evaluated in terms of the RMSE between the structural similarity (SSIM) and the smoothed SSIM representing temporal consistency.
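A temporal-consistency score of this kind can be computed from per-frame SSIM values. Smooth the sequence, then take the RMSE between the raw and smoothed curves. Flickering restorations produce a jagged SSIM curve and a large RMSE. The abstract does not say how the SSIM is smoothed. The centred moving average and its window size below are assumptions, and per-frame SSIM values are taken as given.

```python
import math

def moving_average(values, window=3):
    """Centred moving average; the window shrinks at the sequence ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        seg = values[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

def temporal_consistency_rmse(frame_ssim, window=3):
    """RMSE between per-frame SSIM and its smoothed version (lower = more temporally stable)."""
    smooth = moving_average(frame_ssim, window)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(frame_ssim, smooth)) / len(frame_ssim))
```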
Ji XI Yue XIE Pengxu JIANG Wei JIANG
Currently, a significant portion of acoustic scene categorization (ASC) research centers on Convolutional Neural Network (CNN) models, primarily because CNNs, taking spectral data as input, effectively extract time-frequency information from audio recordings of scenes. 2D spectral features can express information along many dimensions. However, because spectral features differ from natural images, the same object can carry different meanings depending on its position in the spectrogram. CNN-based ASC networks that do not distinguish between different parts of the input may therefore suffer degraded performance. Considering this, we propose a CNN-based feature pyramid segmentation (FPS) approach. The proposed approach uses spectral features as the model input. These features are split according to preset scales, and each segment-level feature is fed into the CNN for learning. The high-level features from all scales are then fused and fed to a SoftMax classifier to categorize the different scenes. The experiments provide evidence of the efficacy of the FPS strategy and its potential to enhance the performance of ASC systems.
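The segmentation and fusion steps can be outlined in a few lines. The sketch below makes several assumptions:

- the split runs along the frequency axis at scales 1, 2, and 4;
- a mean-energy function stands in for the per-segment CNN;
- fusion is plain concatenation followed by a linear layer and softmax.

All of these are hypothetical choices, and the weights in any usage are illustrative, not trained.

```python
import math

def pyramid_segments(spec, scales=(1, 2, 4)):
    """Split a spectrogram (list of frequency rows) into 1, 2, 4, ... equal frequency bands.

    Trailing rows are dropped if the number of rows is not divisible by a scale.
    """
    n_freq = len(spec)
    segments = []
    for s in scales:
        size = n_freq // s
        segments.extend(spec[i * size:(i + 1) * size] for i in range(s))
    return segments

def segment_feature(segment):
    """Stand-in for the per-segment CNN: mean value of the segment."""
    values = [v for row in segment for v in row]
    return sum(values) / len(values)

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def classify(spec, weights, scales=(1, 2, 4)):
    """Fuse (concatenate) segment-level features from all scales, then apply softmax."""
    fused = [segment_feature(seg) for seg in pyramid_segments(spec, scales)]
    logits = [sum(w * f for w, f in zip(row, fused)) for row in weights]
    return softmax(logits)
```

Each segment is processed separately, so a pattern in a low band and the same pattern in a high band produce different entries in the fused vector. This is the position sensitivity the abstract motivates.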
Pengxu JIANG Yang YANG Yue XIE Cairong ZOU Qingyun WANG
Convolutional neural networks (CNNs) are widely used in acoustic scene classification (ASC) tasks. In most cases, local convolution is used to gather time-frequency information between spectrum nodes, and it is challenging to adequately express non-local links between frequency bands within a finite convolution region. In this paper, we propose a dual-path convolutional neural network based on band interaction blocks (DCNN-bi) for ASC, with the mel-spectrogram as the model's input. We build two parallel CNN paths to learn the high-frequency and low-frequency components of the input feature. Additionally, we design three band interaction blocks (bi-blocks), connected between the two paths, to explore relevant nodes across different frequency bands. Combining the time-frequency information from both paths, the bi-blocks, which have three distinct designs, acquire non-local information and send it back to the respective paths. Experimental results indicate that the bi-blocks can substantially improve the baseline performance of the CNN. Specifically, on the DCASE 2018 and DCASE 2020 datasets, the CNN achieved performance improvements of 1.79% and 3.06%, respectively.
Xueying WANG Yuan HUANG Xin LONG Ziji MA
In recent years, the increasing complexity of deep network structures has hindered their deployment on small, resource-constrained hardware, so deep network models urgently need to be compressed and accelerated. Channel pruning is an effective method for compressing deep neural networks. However, most existing channel pruning methods are prone to falling into local optima. In this paper, we propose a channel pruning method based on an Improved Grey Wolf Optimizer, called IGWO-Pruner, to prune redundant channels of convolutional neural networks. It determines the pruning ratio of each layer with the improved Grey Wolf algorithm and then fine-tunes the pruned network. We evaluate the proposed method on the CIFAR datasets and ILSVRC-2012 with several classical networks, including VGGNet, GoogLeNet, and ResNet-18/34/56/152, and the experimental results demonstrate that it prunes a large number of redundant channels and parameters with little performance loss.
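The per-layer ratio search can be illustrated with the standard Grey Wolf Optimizer. This is the baseline algorithm; the paper's improvements are not described in the abstract. Each wolf is a vector of pruning ratios, one per layer. The three best wolves (alpha, beta, delta) guide the rest, and the factor `a` shrinks from 2 to 0 to move from exploration to exploitation. The fitness function below is a hypothetical toy. A real pruner would fine-tune or evaluate each pruned network, which is far more expensive.

```python
import random

def gwo(fitness, dim, n_wolves=8, iters=100, lo=0.0, hi=0.9, seed=0):
    """Standard Grey Wolf Optimizer (minimisation) over the box [lo, hi]^dim."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best = min(wolves, key=fitness)
    for t in range(iters):
        alpha, beta, delta = sorted(wolves, key=fitness)[:3]  # three leading wolves
        a = 2.0 - 2.0 * t / iters  # decreases linearly from 2 to 0
        new_wolves = []
        for wolf in wolves:
            pos = []
            for d in range(dim):
                guided = []
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - wolf[d])
                    guided.append(leader[d] - A * D)
                # Average the three leader-guided positions, clipped to the valid ratio range
                pos.append(min(max(sum(guided) / 3.0, lo), hi))
            new_wolves.append(pos)
        wolves = new_wolves
        best = min(wolves + [best], key=fitness)  # keep the best solution seen so far
    return best

# Hypothetical toy fitness: pruning sensitive layers is penalised, total pruning is rewarded.
SENSITIVITY = [0.05, 0.2, 1.0]

def toy_fitness(ratios):
    return sum(s * r * r for s, r in zip(SENSITIVITY, ratios)) - 0.1 * sum(ratios)
```

With this toy fitness, the search should prune the insensitive first layer heavily and the sensitive last layer lightly. A population-based search like this is the kind of global exploration that the abstract contrasts with methods prone to local optima.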
Ren TAKEUCHI Rikima MITSUHASHI Masakatsu NISHIGAKI Tetsushi OHKI
The war between cyber attackers and security analysts is gradually intensifying. Because support tools are easy to obtain and create, recent malware continues to diversify into variants and new species. This increases the burden on security analysts and hinders quick analysis. Identifying malware families is crucial for efficiently analyzing diversified malware; thus, numerous low-cost, general-purpose, deep-learning-based classification techniques have been proposed in recent years. These methods often use malware images, which represent binary features as images. However, previous studies have not proposed models or architectures specific to malware classification. Herein, we analyze the behavior and structure of malware in detail and focus on PE sections, which capture the unique characteristics of malware. First, we validate how well the features of each PE section distinguish malware families. Then, we identify the PE sections that contain adequate features to classify families. Further, we propose an ensemble learning-based classification method that combines the features of highly discriminative PE sections to improve classification accuracy. Validation on two datasets confirms that the proposed method improves accuracy over the baseline, thereby emphasizing its importance.
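Two building blocks underlie this approach. The first turns a section's raw bytes into a grayscale "malware image". The second combines per-section classifiers into an ensemble. The sketch below makes three assumptions:

- each selected PE section (for example, `.text` or `.rsrc`) has its own classifier that outputs class probabilities;
- the ensemble uses weighted soft voting;
- the image width of 16 is a hypothetical choice.

The paper's actual combination rule is not given in the abstract.

```python
def bytes_to_image(data, width=16):
    """Reshape raw section bytes into rows of `width` grayscale pixels (0-255), zero-padded."""
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]

def soft_vote(section_probs, weights=None):
    """Weighted average of per-section class-probability vectors; returns (label, averaged)."""
    weights = weights or [1.0] * len(section_probs)
    n_classes = len(section_probs[0])
    avg = [sum(w * p[c] for w, p in zip(weights, section_probs)) / sum(weights)
           for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg
```

Weights could favour the sections found to be most discriminative, so that weakly informative sections still contribute without dominating the vote.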
Acoustic scene classification (ASC) is a fundamental artificial intelligence classification task. ASC systems commonly employ convolutional neural network (CNN) models that take log-Mel spectrograms as input to gather acoustic features. In this paper, we design a CNN-based multi-scale pooling (MSP) strategy for ASC. The log-Mel spectrogram used as the CNN input is partitioned into four segments along the frequency axis, and we devise four CNN channels that take inputs from these distinct frequency ranges. The high-level features extracted from the outputs of the various frequency bands are integrated through frequency pyramid average pooling layers at multiple levels. Subsequently, a softmax classifier classifies the different scenes. Tests on two acoustic datasets demonstrate that the designed model significantly enhances performance.
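The band partition and the multi-level frequency pooling can be sketched as follows. Assumptions:

- the spectrogram is a list of frequency rows;
- each band's CNN output is a frequency-ordered feature vector (the CNN channels themselves are omitted);
- the pyramid levels are (1, 2, 4), a hypothetical choice.

```python
def split_bands(spec, n_bands=4):
    """Partition a log-Mel spectrogram (list of frequency rows) into n_bands frequency segments."""
    size = len(spec) // n_bands
    return [spec[i * size:(i + 1) * size] for i in range(n_bands)]

def frequency_pyramid_pool(freq_features, levels=(1, 2, 4)):
    """Average-pool a frequency-ordered feature vector into 1, 2, 4, ... bins and concatenate."""
    n = len(freq_features)
    pooled = []
    for level in levels:
        for b in range(level):
            chunk = freq_features[b * n // level:(b + 1) * n // level]
            pooled.append(sum(chunk) / len(chunk))
    return pooled
```

Usage: concatenate the four band outputs in frequency order, for example `[v for out in band_outputs for v in out]`, and pass the result to `frequency_pyramid_pool`. The coarse level summarizes the whole spectrum, while the finer levels keep band-specific information for the softmax classifier.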
Jinsoo SEO Junghyun KIM Hyemi KIM
Song-level feature summarization is fundamental for the browsing, retrieval, and indexing of digital music archives. This study proposes a deep neural network model, CQTXNet, for extracting song-level feature summaries for cover song identification. CQTXNet incorporates depth-wise separable convolution, residual network connections, and attention models to extend previous approaches. An experimental evaluation of the proposed CQTXNet was performed on two publicly available cover song datasets by varying the number of network layers and the type of attention modules.
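Parameter counts show why depth-wise separable convolution suits a model that stacks many layers. A standard k×k convolution from C_in to C_out channels has k²·C_in·C_out weights. The separable version has k²·C_in weights (one filter per input channel) plus C_in·C_out weights (a 1×1 point-wise convolution). The cost ratio is therefore 1/C_out + 1/k². The channel counts below are arbitrary examples, not CQTXNet's configuration, and biases are ignored.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    """Depth-wise k x k convolution (one filter per input channel) + 1 x 1 point-wise convolution."""
    return k * k * c_in + c_in * c_out
```

For k = 3 and 64 → 128 channels, the standard layer has 73,728 weights and the separable one has 8,768, roughly 8.4 times fewer.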
Aorui GOU Jingjing LIU Xiaoxiang CHEN Xiaoyang ZENG Yibo FAN
Convolutional Neural Networks (CNNs) and Transformers have achieved remarkable performance in detection and classification tasks. Nevertheless, their feature extraction does not consider local and global information together, leaving room to improve detection and classification performance. In addition, deep learning networks are being designed to be increasingly complex, which significantly increases the computation and storage space they require. This paper proposes a combination of CNN and Transformer and designs a local feature enhancement module and a global context modeling module to enhance the cascade network. The local feature enhancement module enlarges the range of feature extraction, while the global context modeling module captures the global information of the feature maps. To decrease model complexity, a shared sublayer is designed to share weight parameters between adjacent convolutional layers or across convolutional layers, thereby reducing the number of convolutional weight parameters. Moreover, to improve the detection performance of neural networks without increasing network parameters, an optimal transport assignment approach is proposed to solve the label assignment problem, where the cost between each demander and supplier is the sum of the classification loss and the regression loss. Experimental results demonstrate that the proposed Combination of CNN and Transformer with Shared Sublayer (CCTSS) performs better than state-of-the-art methods on various datasets and applications.
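Optimal transport label assignment treats ground-truth objects and the background as suppliers and anchors as demanders. The cost of each pair is a classification loss plus a regression loss. An entropy-regularised transport plan can be found with Sinkhorn-Knopp iterations. This is a minimal sketch of that generic formulation, not CCTSS's implementation. The cost matrix, the supply of 2 units per ground truth, `eps`, and the iteration count are hypothetical.

```python
import math

def sinkhorn_assign(cost, supply, demand, eps=0.1, iters=200):
    """Entropy-regularised optimal transport via Sinkhorn-Knopp.

    cost[i][j]: cost of sending one unit from supplier i (a ground truth or the background)
    to demander j (an anchor). Returns the transport plan pi[i][j].
    """
    n, m = len(cost), len(cost[0])
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows to match supply and columns to match demand
        u = [supply[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [demand[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def assign_labels(plan):
    """Each anchor is assigned to the supplier that sends it the most mass."""
    return [max(range(len(plan)), key=lambda i: plan[i][j]) for j in range(len(plan[0]))]
```

For example, consider one ground truth (row 0) and the background (row 1) over three anchors, with cost `[[0.2, 0.3, 2.0], [1.0, 1.0, 0.1]]`, supply `[2, 1]`, and demand `[1, 1, 1]`. The first two anchors are assigned to the object and the third to the background. The assignment changes only the training targets, which is why it adds no network parameters.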
Keita IMAIZUMI Koichi ICHIGE Tatsuya NAGAO Takahiro HAYASHI
In this paper, we propose a method for predicting radio wave propagation using a correlation graph convolutional neural network (C-Graph CNN). We examine which parameters are suitable as system parameters in C-Graph CNN. The performance of the proposed method is evaluated through simulation in terms of path loss estimation accuracy and computational cost.
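The abstract does not describe how the correlation graph is built. One plausible reading, which is an assumption here, has nodes that carry feature series. Two nodes are linked when their series correlate strongly, and a standard normalised graph convolution, ReLU(D^-1/2 A D^-1/2 H W), propagates features along those links. The 0.8 threshold, the node features, and the weights are hypothetical. A final layer could regress path loss per node.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def correlation_adjacency(series, threshold=0.8):
    """Connect nodes whose series are strongly correlated; self-loops are always included."""
    n = len(series)
    return [[1.0 if i == j or abs(pearson(series[i], series[j])) >= threshold else 0.0
             for j in range(n)] for i in range(n)]

def graph_conv(adj, h, w):
    """One GCN layer: ReLU(D^-1/2 A D^-1/2 H W)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    a_hat = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    hw = [[sum(h[i][k] * w[k][c] for k in range(len(w))) for c in range(len(w[0]))]
          for i in range(n)]
    return [[max(0.0, sum(a_hat[i][j] * hw[j][c] for j in range(n))) for c in range(len(hw[0]))]
            for i in range(n)]
```

The cost of a layer grows with the number of nodes and edges rather than with the full map size. This is relevant to the computational-cost comparison the abstract mentions.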
Single-image deraining is a long-standing, ill-posed problem. In the past few years, convolutional neural network (CNN) methods have nearly dominated computer vision and achieved considerable success in image deraining. Recently, Swin Transformer-based models have also shown impressive performance, even surpassing CNN-based methods and becoming the state of the art on high-level vision tasks. We therefore introduce the Swin Transformer to deraining tasks. In this paper, we propose a deraining model with two sub-networks. The first sub-network has two branches. The Rain Recognition Network, a Unet with Swin Transformer layers, performs a preliminary restoration of the background, especially where rain streaks appear. The Detail Complement Network extracts the background detail beneath the rain streaks. The second sub-network, called Refine-Unet, uses the output of the first to further restore the image. Experiments show that our network improves single-image deraining compared with previous Transformer-based research.
Daiki HIRATA Norikazu TAKAHASHI
Convolutional Neural Networks (CNNs) have shown remarkable performance in image recognition tasks. In this letter, we propose a new CNN model called the EnsNet, which is composed of one base CNN and multiple Fully Connected SubNetworks (FCSNs). In this model, the set of feature maps generated by the last convolutional layer in the base CNN is divided along channels into disjoint subsets, and these subsets are assigned to the FCSNs. Each FCSN is trained independently of the others so that it can predict the class label from the subset of feature maps assigned to it. The output of the overall model is determined by majority vote of the base CNN and the FCSNs. Experimental results using the MNIST, Fashion-MNIST, and CIFAR-10 datasets show that the proposed approach further improves the performance of CNNs. In particular, an EnsNet achieves a state-of-the-art error rate of 0.16% on MNIST.
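The two ensemble mechanics described above are easy to state in code: the channel split and the majority vote. The sketch below assumes the channel count divides evenly among the FCSNs and uses contiguous channel groups. It also breaks ties in favour of the base CNN by listing its prediction first, relying on `Counter.most_common` ordering equal counts by first occurrence. These are illustrative choices, not details from the letter.

```python
from collections import Counter

def split_channels(feature_maps, n_subnets):
    """Divide per-channel feature maps into disjoint, contiguous subsets (one per FCSN)."""
    size = len(feature_maps) // n_subnets
    return [feature_maps[i * size:(i + 1) * size] for i in range(n_subnets)]

def majority_vote(predictions):
    """Most common label; pass the base CNN's prediction first so it wins ties."""
    return Counter(predictions).most_common(1)[0][0]
```

Because each FCSN sees a different slice of the channels, the voters make partly independent errors. This diversity is what lets a vote beat the base CNN alone.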
Hiroyuki NOZAKA Kosuke KAMATA Kazufumi YAMAGATA
Data augmentation is known as a helpful technique for generating a dataset with a large number of images from one with a small number of images for supervised training in deep learning. However, a recent study on artificial intelligence (AI) reported an augmentation method with low validity for image recognition. This study aimed to clarify the optimal data augmentation method in deep learning model generation for the recognition of white blood cells (WBCs). Study Design: We applied three different data augmentation methods (rotation, scaling, and distortion) to original WBC images and generated an AI model for WBC recognition with each method by supervised training. The subjects of the clinical assessment were 51 healthy persons. Thin-layer blood smears were prepared from peripheral blood and subjected to May-Grünwald-Giemsa staining. Results: Among the AI models for WBC recognition, only data augmentation with rotation was significantly effective. By contrast, the effectiveness of both image distortion and image scaling was poor, and improved accuracy was limited to a specific WBC subcategory. Conclusion: Although data augmentation methods are often used to achieve high accuracy in AI generation with supervised training, we consider it necessary to select the optimal data augmentation method for medical AI generation based on the characteristics of medical images.
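Rotation augmentation, the one technique found to be effective here, produces label-preserving copies of each image. The study's rotation angles are not stated. The sketch below uses right-angle rotations, a hypothetical lossless choice that avoids interpolation, on an image stored as a list of pixel rows.

```python
def rotate90(img):
    """Rotate a 2-D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def rotation_augment(img):
    """Return the image plus its 90-, 180-, and 270-degree rotations (all keep the same label)."""
    out = [img]
    for _ in range(3):
        out.append(rotate90(out[-1]))
    return out
```

Each training image thus yields four samples with the same WBC class. This rests on the assumption that a cell's class does not depend on its orientation on the smear.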
Wenxin DONG Jianxun ZHANG Shuqiu TAN Xinyue ZHANG
In pork fat content detection, traditional physical or chemical methods are strongly destructive, have substantial technical requirements, and cannot achieve nondestructive detection without slaughtering. To solve these problems, we propose a novel, convenient, and economical method for detecting pork fat content from pig B-ultrasound images based on hybrid attention and multiscale fusion learning, which extracts and fuses shallow detail information and deep semantic information at multiple scales. First, a deep learning network is constructed to learn the salient features of fat images through a hybrid attention mechanism. Then, the information describing pork fat is extracted at multiple scales, and the detail information expressed in the shallow layers and the semantic information expressed in the deep layers are subsequently fused. Finally, a deep convolutional network predicts the fat content, which is compared with the ground-truth label. The experimental results show that the coefficient of determination exceeds 0.95 on 130 groups of pork B-ultrasound image data, which is 2.90, 6.10, and 5.13 percentage points higher than those of VGGNet, ResNet, and DenseNet, respectively. This indicates that the model can effectively identify pig B-ultrasound images and predict fat content with high accuracy.
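The reported metric is the coefficient of determination, R² = 1 − SS_res / SS_tot. It compares the model's squared errors with the variance of the true fat contents. A gap of 2.90 percentage points means the baseline's R² is 0.0290 lower in absolute terms. This minimal implementation uses made-up example values in its usage; none of them come from the paper's data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

For example, true values `[1, 2, 3, 4]` predicted as `[1.1, 1.9, 3.2, 3.8]` give R² = 0.98. Always predicting the mean gives 0.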