
Keyword Search Result

[Keyword] attention mechanism (36 hits)

1-20 of 36 hits

  • Power Peak Load Forecasting Based on Deep Time Series Analysis Method Open Access

    Ying-Chang HUNG  Duen-Ren LIU  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2024/03/21  Vol: E107-D No:7  Page(s): 845-856

    The prediction of peak power load is a critical factor directly impacting the stability of the power supply; it is strongly time-series in nature and intricately tied to seasonal patterns of electricity usage. Despite its importance, power peak load forecasting remains a multifaceted challenge. This study proposes a forecasting method that combines three primary components: a GRU model, a self-attention mechanism, and a Transformer mechanism. To place the work within the ongoing discourse, the study also reviews additional references on the complexity and current state of the power peak load forecasting problem, building on the existing knowledge base and highlighting contemporary challenges and strategies in the field. Data preprocessing involves comprehensive cleaning, standardization, and the design of relevant functions to ensure robustness in the predictive modeling process. Additionally, to capture temporal changes effectively, the research incorporates "Weekly Moving Average" and "Monthly Moving Average" features into the dataset. For a comprehensive evaluation, the proposed method is compared with established models such as LSTM, a self-attention network, a Transformer, ARIMA, and SVR. The outcomes reveal that the proposed models exhibit superior predictive performance, accurately forecasting electricity consumption. The significance of this research lies in two contributions. First, it introduces a prediction method combining the GRU model, self-attention mechanism, and Transformer mechanism, in line with the contemporary evolution of predictive modeling techniques. Second, it introduces and emphasizes the utility of the "Weekly Moving Average" and "Monthly Moving Average" features, which are crucial for capturing and interpreting seasonal variations in the dataset. By incorporating these features, the model better accounts for seasonal influencing factors, thereby significantly improving the accuracy of peak power load forecasting.
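
    For illustration, the hedged sketch below pairs the kind of moving-average feature engineering the abstract describes with a small GRU encoder whose outputs are refined by self-attention before a linear forecasting head. It is a minimal sketch under assumed hyperparameters (window lengths, layer sizes, and the `peak_load` column name), not the authors' implementation.

```python
# Minimal sketch with assumed hyperparameters; not the paper's exact model.
import pandas as pd
import torch
import torch.nn as nn

def add_moving_average_features(df: pd.DataFrame) -> pd.DataFrame:
    """Append weekly/monthly moving averages of the (assumed) 'peak_load' column."""
    df = df.copy()
    df["weekly_ma"] = df["peak_load"].rolling(window=7, min_periods=1).mean()
    df["monthly_ma"] = df["peak_load"].rolling(window=30, min_periods=1).mean()
    return df

class GRUSelfAttentionForecaster(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64, heads: int = 4):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # next-step peak load

    def forward(self, x):                  # x: (batch, seq_len, n_features)
        h, _ = self.gru(x)                 # (batch, seq_len, hidden)
        a, _ = self.attn(h, h, h)          # self-attention over time steps
        return self.head(a[:, -1])         # predict from the last refined step

model = GRUSelfAttentionForecaster(n_features=3)
y_hat = model(torch.randn(8, 28, 3))       # 8 samples, 28-day windows, 3 features
```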

  • Real-Time Video Matting Based on RVM and Mobile ViT Open Access

    Chengyu WU  Jiangshan QIN  Xiangyang LI  Ao ZHAN  Zhengqiang WANG  

     
    LETTER-Image Recognition, Computer Vision

    Publicized: 2024/01/29  Vol: E107-D No:6  Page(s): 792-796

    Real-time matting is a challenging problem in deep learning. Conventional CNN (convolutional neural network) approaches tend to misjudge foreground and background semantics and produce blurry matting edges, because the limited receptive field restricts a CNN's attention to global context. We propose a real-time matting approach called RMViT (Real-time Matting with Vision Transformer) that uses a Transformer structure, attention, and content-aware guidance to address these issues. Semantic accuracy improves substantially owing to the global context and long-range pixel information the model establishes. Experiments show our approach achieves more than a 30% reduction in error metrics compared with existing real-time matting approaches.

  • Analysis of Blood Cell Image Recognition Methods Based on Improved CNN and Vision Transformer Open Access

    Pingping WANG  Xinyi ZHANG  Yuyan ZHAO  Yueti LI  Kaisheng XU  Shuaiyin ZHAO  

     
    PAPER-Neural Networks and Bioengineering

    Publicized: 2023/09/15  Vol: E107-A No:6  Page(s): 899-908

    Leukemia is a common and highly dangerous blood disease that requires early detection and treatment. Currently, the diagnosis of leukemia types mainly relies on a pathologist's morphological examination of blood cell images, which is a tedious and time-consuming process, and the results are highly subjective and prone to misdiagnosis and missed diagnosis. This research proposes a blood cell image recognition technique based on an enhanced Vision Transformer to address these problems. First, we incorporate convolutions into the token embedding to replace the positional encoding, which represents only coarse spatial information. Then, building on the Transformer's self-attention mechanism, we propose a sparse attention module that selects the identifying regions in the image, further enhancing the model's fine-grained feature expression capability. Finally, we use a contrastive loss function to further increase the intra-class consistency and inter-class difference of the classification features. According to the experimental results, the model achieves an identification accuracy of 92.49% on the Munich single-cell morphological dataset, an improvement of 1.41% over the baseline, and it still outperforms the state-of-the-art Swin Transformer. Our method therefore has the potential to serve as a reference for clinical diagnosis by physicians.
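
    One common way to realize a sparse attention module that keeps only identifying regions is to retain the top-k attention scores per query and renormalize; the sketch below illustrates that idea under assumed values of k and tensor shapes, and is not the paper's exact module.

```python
# Hedged sketch of top-k "sparse" self-attention over image tokens;
# k and the tensor shapes are assumptions, not the paper's exact module.
import torch
import torch.nn.functional as F

def sparse_attention(q, k, v, top_k: int = 16):
    """q, k, v: (batch, tokens, dim). Keep only the top_k scores per query."""
    scale = q.shape[-1] ** -0.5
    scores = (q @ k.transpose(-2, -1)) * scale          # (batch, tokens, tokens)
    kth = scores.topk(top_k, dim=-1).values[..., -1:]   # k-th largest per query
    scores = scores.masked_fill(scores < kth, float("-inf"))
    return F.softmax(scores, dim=-1) @ v                # attend only to selected regions

q = k = v = torch.randn(2, 196, 64)                     # e.g. 14x14 patch tokens
out = sparse_attention(q, k, v)
```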

  • FA-YOLO: A High-Precision and Efficient Method for Fabric Defect Detection in Textile Industry Open Access

    Kai YU  Wentao LYU  Xuyi YU  Qing GUO  Weiqiang XU  Lu ZHANG  

     
    PAPER-Neural Networks and Bioengineering

    Publicized: 2023/09/04  Vol: E107-A No:6  Page(s): 890-898

    Automatic defect detection in fabric images is an essential task in the textile industry. However, fabric images pose some inherent difficulties for detection, such as complex backgrounds and highly uneven defect scales. Moreover, the trade-off between accuracy and speed must be considered in real applications. To address these problems, we propose a novel model based on YOLOv4 to detect defects in fabric images, called Feature Augmentation YOLO (FA-YOLO). In terms of network structure, FA-YOLO adds an additional detection head to improve the detection of small defects and builds a powerful Neck structure to enhance feature fusion. First, to reduce information loss during feature fusion, we perform residual feature augmentation (RFA) on the features after dimensionality reduction with 1×1 convolutions. Afterward, the attention module (SimAM) is embedded at locations with rich features to improve adaptation to complex backgrounds. Adaptive spatial feature fusion (ASFF) is also applied to the output of the Neck to filter inconsistencies across layers. Finally, the cross-stage partial (CSP) structure is introduced for optimization. Experimental results on three real industrial datasets, including the Tianchi fabric dataset (72.5% mAP), the ZJU-Leaper fabric dataset (average F1-score of 0.714), and the NEU-DET steel dataset (77.2% mAP), demonstrate that the proposed FA-YOLO achieves competitive results compared to other state-of-the-art (SoTA) methods.
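
    SimAM is a published parameter-free attention module; the sketch below follows its commonly used energy-based formulation (the lambda value is the usual default and is an assumption here, as is the feature-map shape).

```python
# SimAM parameter-free attention as commonly implemented; lambda is the usual default.
import torch
import torch.nn as nn

class SimAM(nn.Module):
    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x):                          # x: (batch, channels, H, W)
        n = x.shape[2] * x.shape[3] - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        v = d.sum(dim=(2, 3), keepdim=True) / n    # per-channel variance estimate
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(e_inv)            # reweight each spatial position

feat = torch.randn(2, 256, 52, 52)                 # e.g. a Neck feature map (assumed shape)
out = SimAM()(feat)
```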

  • CASEformer — A Transformer-Based Projection Photometric Compensation Network

    Yuqiang ZHANG  Huamin YANG  Cheng HAN  Chao ZHANG  Chaoran ZHU  

     
    PAPER

    Publicized: 2023/09/29  Vol: E107-D No:1  Page(s): 13-28

    In this paper, we present a novel photometric compensation network named CASEformer, which is built upon the Swin module. For the first time, we combine coordinate attention and channel attention mechanisms to extract rich features from input images. Employing a multi-level encoder-decoder architecture with skip connections, we establish multiscale interactions between projection surfaces and projection images, achieving precise inference and compensation. Furthermore, through an attention fusion module that simultaneously leverages both coordinate and channel information, we enhance the global context of the feature maps while preserving fine texture and coordinate details. The experimental results demonstrate the superior compensation effectiveness of our approach compared to current state-of-the-art methods. Additionally, we propose a method for multi-surface projection compensation, further enriching our contributions.
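
    As a rough illustration of combining coordinate attention with channel attention, the sketch below chains a coordinate-attention gate with an SE-style channel gate; the fusion order, reduction ratios, and shapes are assumptions rather than CASEformer's actual attention fusion module.

```python
# Hedged sketch: coordinate attention followed by an SE-style channel gate.
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    def __init__(self, c: int, r: int = 16):
        super().__init__()
        mid = max(8, c // r)
        self.conv1 = nn.Sequential(nn.Conv2d(c, mid, 1), nn.BatchNorm2d(mid), nn.ReLU())
        self.conv_h = nn.Conv2d(mid, c, 1)
        self.conv_w = nn.Conv2d(mid, c, 1)

    def forward(self, x):                                       # (b, c, h, w)
        b, c, h, w = x.shape
        x_h = x.mean(dim=3, keepdim=True)                       # pool along width  -> (b, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # pool along height -> (b, c, w, 1)
        y = self.conv1(torch.cat([x_h, x_w], dim=2))            # shared transform
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                   # (b, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (b, c, 1, w)
        return x * a_h * a_w                                    # direction-aware gating

class ChannelAttention(nn.Module):                              # SE-style channel gate
    def __init__(self, c: int, r: int = 16):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(),
                                nn.Linear(c // r, c), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))                         # global average pool
        return x * w[:, :, None, None]

x = torch.randn(2, 64, 32, 32)
fused = ChannelAttention(64)(CoordAttention(64)(x))
```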

  • A Driver Fatigue Detection Algorithm Based on Dynamic Tracking of Small Facial Targets Using YOLOv7

    Shugang LIU  Yujie WANG  Qiangguo YU  Jie ZHAN  Hongli LIU  Jiangtao LIU  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2023/08/21  Vol: E106-D No:11  Page(s): 1881-1890

    Driver fatigue detection has become crucial in vehicle safety technology, and achieving both high accuracy and real-time performance is paramount. In this paper, we propose a novel driver fatigue detection algorithm based on dynamic tracking of facial eyes and yawning using YOLOv7, named FEY-YOLOv7. The coordinate attention module is inserted into YOLOv7 to improve its dynamic tracking accuracy by focusing on coordinate information. Additionally, a small-target detection head is incorporated into the network architecture to improve feature extraction for small facial targets such as the eyes and mouth. In terms of computation, the YOLOv7 network architecture is significantly simplified to achieve a high detection speed. Using the proposed PERYAWN algorithm, driver status is labeled and detected with four classes: open_eye, closed_eye, open_mouth, and closed_mouth. Furthermore, the guided image filtering algorithm is employed to enhance image details. The proposed FEY-YOLOv7 is trained and validated on RGB-infrared datasets. The results show that FEY-YOLOv7 achieves a mAP of 0.983 at 101 FPS, indicating that it outperforms state-of-the-art methods in both accuracy and speed and provides an effective and practical solution for image-based driver fatigue detection.
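
    The abstract does not spell out how the four per-frame classes are turned into a fatigue decision, so the sketch below shows one plausible, purely hypothetical post-processing step: sliding-window closed-eye and yawn ratios with assumed thresholds. It is not the PERYAWN algorithm.

```python
# Hypothetical post-processing: per-frame eye/mouth classes -> fatigue flag via
# sliding-window ratios. Window length and thresholds are assumptions.
from collections import deque

def fatigue_monitor(frames, window=90, eye_thr=0.4, yawn_thr=0.3):
    """frames: iterable of sets of detected classes, e.g. {'closed_eye', 'open_mouth'}."""
    recent = deque(maxlen=window)
    for classes in frames:
        recent.append(("closed_eye" in classes, "open_mouth" in classes))
        closed_ratio = sum(c for c, _ in recent) / len(recent)
        yawn_ratio = sum(y for _, y in recent) / len(recent)
        yield closed_ratio > eye_thr or yawn_ratio > yawn_thr

states = [{"closed_eye"}] * 60 + [{"open_eye"}] * 30
print(list(fatigue_monitor(states))[-1])   # True: closed-eye ratio exceeds the threshold
```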

  • A Lightweight and Efficient Infrared Pedestrian Semantic Segmentation Method

    Shangdong LIU  Chaojun MEI  Shuai YOU  Xiaoliang YAO  Fei WU  Yimu JI  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2023/06/13  Vol: E106-D No:9  Page(s): 1564-1571

    Thermal imaging pedestrian segmentation systems perform well under different illumination conditions, but they face some drawbacks (e.g., weak pedestrian texture information and blurred object boundaries). Meanwhile, high-performance large models suffer high latency on edge devices with limited computing power. To solve these problems, we propose a real-time thermal infrared pedestrian segmentation method whose feature extraction layers consist of two paths. On the spatial path, we use lossless spatial downsampling to preserve boundary texture details. On the context path, we use atrous convolutions to enlarge the receptive field and obtain more contextual semantic information. A parameter-free attention mechanism is then introduced at the end of each path for effective feature selection, and a Feature Fusion Module (FFM) fuses the selected semantic information from the two paths. Finally, we accelerate inference through multi-threading on the edge computing device. In addition, we create a high-quality infrared pedestrian segmentation dataset to facilitate research. Comparative experiments against other methods on the self-built dataset and two public datasets show that our method is effective. Our code is available at https://github.com/mcjcs001/LEIPNet.
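
    The paper's exact Feature Fusion Module is not described in the abstract; the sketch below shows a BiSeNet-style FFM that concatenates the two paths and reweights the result with a channel gate, with assumed channel sizes.

```python
# Hedged sketch of a BiSeNet-style Feature Fusion Module; channel sizes are assumptions.
import torch
import torch.nn as nn

class FeatureFusionModule(nn.Module):
    def __init__(self, c_spatial: int, c_context: int, c_out: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(c_spatial + c_context, c_out, 1), nn.BatchNorm2d(c_out), nn.ReLU())
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(c_out, c_out, 1), nn.Sigmoid())

    def forward(self, spatial_feat, context_feat):
        x = self.fuse(torch.cat([spatial_feat, context_feat], dim=1))
        return x + x * self.gate(x)        # channel-reweighted residual fusion

ffm = FeatureFusionModule(c_spatial=128, c_context=128, c_out=128)
out = ffm(torch.randn(1, 128, 60, 80), torch.randn(1, 128, 60, 80))
```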

  • An Integrated Convolutional Neural Network with a Fusion Attention Mechanism for Acoustic Scene Classification

    Pengxu JIANG  Yue XIE  Cairong ZOU  Li ZHAO  Qingyun WANG  

     
    LETTER-Engineering Acoustics

    Publicized: 2023/02/06  Vol: E106-A No:8  Page(s): 1057-1061

    Acoustic scene classification (ASC) is a relevant research domain in human-computer interaction. In real life, recorded audio may include a lot of noise and quiet clips, making it hard for earlier ASC research to isolate the crucial scene information in the sound. Furthermore, scene information may be scattered across numerous audio frames, so selecting scene-related frames is crucial for ASC. In this context, an integrated convolutional neural network with a fusion attention mechanism (ICNN-FA) is proposed for ASC. First, segmented mel-spectrograms are used as the input of the ICNN so that the model can learn short-term time-frequency correlations. The designed ICNN model then learns these segment-level features. In addition, the proposed global attention layer gathers global information by integrating the segment features. Finally, the fusion attention layer fuses all segment-level features before the classifier identifies the acoustic scene. Experimental findings on the DCASE 2018 and 2019 ASC datasets indicate the efficacy of the proposed method.
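
    A global attention layer that integrates segment-level features can be realized as attention pooling; the sketch below shows that pattern with assumed dimensions and is not the exact ICNN-FA configuration.

```python
# Hedged sketch of attention pooling over segment-level features; dimensions are assumptions.
import torch
import torch.nn as nn

class SegmentAttentionPool(nn.Module):
    """Scores each segment embedding and returns their weighted sum."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, 1))

    def forward(self, segments):                         # (batch, n_segments, dim)
        w = torch.softmax(self.score(segments), dim=1)   # weights over segments
        return (w * segments).sum(dim=1)                 # (batch, dim)

pooled = SegmentAttentionPool(128)(torch.randn(4, 10, 128))   # 10 spectrogram segments
```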

  • A Visual Question Answering Network Merging High- and Low-Level Semantic Information

    Huimin LI  Dezhi HAN  Chongqing CHEN  Chin-Chen CHANG  Kuan-Ching LI  Dun LI  

     
    PAPER-Core Methods

    Publicized: 2022/01/06  Vol: E106-D No:5  Page(s): 581-589

    Visual Question Answering (VQA) usually uses deep attention mechanisms to learn the fine-grained visual content of images and the textual content of questions. However, deep attention mechanisms learn only high-level semantic information while ignoring the impact of low-level semantic information on answer prediction. To this end, we design a High- and Low-Level Semantic Information Network (HLSIN), which employs two strategies to fuse high-level and low-level semantic information. The first strategy, adaptive weight learning, allows different levels of semantic information to learn weights separately. The second, a gate-sum mechanism, suppresses invalid information at each level and fuses the valid information. On the benchmark VQA-v2 dataset, we quantitatively and qualitatively evaluate HLSIN and conduct extensive ablation studies to explore the reasons behind its effectiveness. Experimental results demonstrate that HLSIN significantly outperforms the previous state-of-the-art, with an overall accuracy of 70.93% on test-dev.
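
    The gate-sum idea can be sketched as a learned sigmoid gate that mixes the high- and low-level features; the dimensions below are assumptions and the exact HLSIN gating may differ.

```python
# Hedged sketch of a gate-sum fusion of high- and low-level features.
import torch
import torch.nn as nn

class GateSumFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, high, low):                  # both (batch, dim)
        g = self.gate(torch.cat([high, low], dim=-1))
        return g * high + (1 - g) * low            # gate suppresses the less useful level

fused = GateSumFusion(512)(torch.randn(2, 512), torch.randn(2, 512))
```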

  • Learning Pixel Perception for Identity and Illumination Consistency Face Frontalization in the Wild

    Yongtang BAO  Pengfei ZHOU  Yue QI  Zhihui WANG  Qing FAN  

     
    PAPER-Person Image Generation

    Publicized: 2022/06/21  Vol: E106-D No:5  Page(s): 794-803

    Synthesizing a frontal, realistic face image from a single profile face image has a wide range of applications in face recognition. Although deep-learning-based face frontalization has made substantial progress in recent years, there is still no guarantee that the generated face preserves identity and illumination consistency under large poses. This paper proposes a novel pixel-based feature regression generative adversarial network (PFR-GAN) that learns to recover local high-frequency details and to preserve identity and illumination consistency in frontal face images generated in uncontrolled environments. We first propose a Reslu block to obtain a richer feature representation and improve the convergence speed of training. We then introduce a feature conversion module to reduce the artifacts caused by face rotation discrepancy, enhance image generation quality, and preserve more high-frequency details of the profile image. We also construct a 30,000-image face pose dataset covering a range of ages, races, and in-the-wild backgrounds, which allows us to generalize to other datasets and obtain better results. Finally, we introduce a discriminator used for recovering the facial structure of the frontal face images. Quantitative and qualitative experimental results show that PFR-GAN can generate high-quality, high-fidelity frontal face images and that our results surpass the state-of-the-art.

  • Chinese Named Entity Recognition Method Based on Dictionary Semantic Knowledge Enhancement

    Tianbin WANG  Ruiyang HUANG  Nan HU  Huansha WANG  Guanghan CHU  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2023/02/15  Vol: E106-D No:5  Page(s): 1010-1017

    Chinese named entity recognition (NER) is a fundamental technology in Chinese natural language processing and is extensively adopted in information extraction, intelligent question answering, and knowledge graphs. Nevertheless, owing to the diversity and complexity of Chinese, most Chinese NER methods fail to sufficiently capture character-granularity semantics, which limits their performance. In this work, we propose DSKE-Chinese NER: Chinese named entity recognition based on dictionary semantic knowledge enhancement. We integrate character-granularity semantic information into the character vector space and obtain vector representations containing this semantic information through an attention mechanism. In addition, we verify the appropriate number of semantic layers through comparative experiments. Experiments on public Chinese datasets such as Weibo, Resume, and MSRA show that the model outperforms character-based LSTM baselines.
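
    One way to attach dictionary semantics to a character representation is additive attention over the embeddings of matched lexicon words; the sketch below is a hypothetical illustration of that step (the matching itself and all dimensions are assumptions, not the DSKE architecture).

```python
# Hypothetical sketch: enrich a character vector with dictionary-word embeddings
# via additive attention. Dimensions and the matching step are assumptions.
import torch
import torch.nn as nn

class DictSemanticFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, char_vec, word_vecs):        # (batch, dim), (batch, n_words, dim)
        q = char_vec.unsqueeze(1).expand_as(word_vecs)
        a = torch.softmax(self.score(torch.cat([q, word_vecs], dim=-1)), dim=1)
        return char_vec + (a * word_vecs).sum(dim=1)   # character enriched with lexicon semantics

fused = DictSemanticFusion(128)(torch.randn(8, 128), torch.randn(8, 5, 128))
```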

  • Selective Learning of Human Pose Estimation Based on Multi-Scale Convergence Network

    Wenkai LIU  Cuizhu QIN  Menglong WU  Wenle BAI  Hongxia DONG  

     
    LETTER-Human-computer Interaction

    Publicized: 2023/02/15  Vol: E106-D No:5  Page(s): 1081-1084

    Pose estimation is a research hotspot in computer vision and the key to computer perception of human activities. The core of human pose estimation is describing the motion of the human body through its major joint points. Large receptive fields and rich spatial information facilitate keypoint localization, so how to capture features at a larger scale and reintegrate them into the feature space is a challenge for pose estimation. To address this problem, we propose a multi-scale convergence network (MSCNet) with a large receptive field and rich spatial information. MSCNet is based on an hourglass network that captures information at different scales to form a consistent understanding of the whole body. Multi-scale receptive field (MSRF) units provide a large receptive field to obtain rich contextual information, which is then selectively enhanced or suppressed by a Squeeze-Excitation (SE) attention mechanism to flexibly perform the pose estimation task. Experimental results show that MSCNet scores 73.1% AP on the COCO dataset, an 8.8% improvement over the mainstream CMUPose method. Compared to the advanced CPN, MSCNet has 68.2% of the computational complexity and only 55.4% of the parameters.
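
    The Squeeze-Excitation attention referred to here is a standard published block; the sketch below shows it in its usual form, with the default reduction ratio assumed rather than taken from MSCNet.

```python
# Standard Squeeze-and-Excitation block; reduction ratio is the usual default.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                      # (batch, channels, H, W)
        w = self.fc(x.mean(dim=(2, 3)))        # squeeze: global average pool
        return x * w[:, :, None, None]         # excite: per-channel reweighting

out = SEBlock(256)(torch.randn(1, 256, 64, 48))
```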

  • DFAM-DETR: Deformable Feature Based Attention Mechanism DETR on Slender Object Detection

    Feng WEN  Mei WANG  Xiaojie HU  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2022/12/09  Vol: E106-D No:3  Page(s): 401-409

    Object detection is one of the most important aspects of computer vision, and the use of CNNs for object detection has yielded substantial results in a variety of fields. However, the fixed sampling of standard convolution layers restricts receptive fields to fixed locations and limits the ability of CNNs to model geometric transformations, which leads to poor performance on slender object detection. To achieve better accuracy and efficiency for slender objects, the proposed detector, DFAM-DETR, not only adjusts the sampling points adaptively but also uses an attention mechanism to enhance the focus on slender object features and extract essential information from global to local scales in the image. This study uses slender-object images from the MS-COCO dataset. The experimental results show that DFAM-DETR achieves excellent detection performance on slender objects compared to CNN-based and transformer-based detectors.
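
    Adaptive sampling points are the core of deformable convolution; the sketch below predicts offsets from the input and feeds them to torchvision's deformable convolution to illustrate the mechanism, without reproducing DFAM-DETR's feature extractor.

```python
# Hedged illustration of deformable sampling; not DFAM-DETR's actual module.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBlock(nn.Module):
    def __init__(self, c_in: int, c_out: int, k: int = 3):
        super().__init__()
        self.offset_pred = nn.Conv2d(c_in, 2 * k * k, kernel_size=k, padding=k // 2)
        self.deform = DeformConv2d(c_in, c_out, kernel_size=k, padding=k // 2)

    def forward(self, x):
        offsets = self.offset_pred(x)      # learned, spatially adaptive sampling offsets
        return self.deform(x, offsets)

out = DeformableBlock(64, 64)(torch.randn(1, 64, 50, 50))
```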

  • Face Hallucination via Multi-Scale Structure Prior Learning

    Yuexi YAO  Tao LU  Kanghui ZHAO  Yanduo ZHANG  Yu WANG  

     
    LETTER-Image

    Publicized: 2022/07/19  Vol: E106-A No:1  Page(s): 92-96

    Recently, deep-learning-based face hallucination methods have learned the mapping between low-resolution (LR) and high-resolution (HR) facial patterns by exploring priors on facial structure. However, maintaining face structure consistency when reconstructing face images at different scales remains a challenging problem. In this letter, we propose novel multi-scale structure prior learning (MSPL) for face hallucination. First, we propose a multi-scale structure prior block (MSPB). Considering the loss of high-frequency information in the LR space, we process the input image in three ascending-scale dimensional spaces and map the image to a high-dimensional space to extract multi-scale structural prior information. The feature maps are then restored to their original size by downsampling, and finally the multi-scale information is fused to restore the feature channels. On this basis, we propose a local detail attention module (LDAM) to focus on the local texture information of faces. We conduct extensive face hallucination reconstruction experiments on a public face dataset (LFW) to verify the effectiveness of our method.

  • A Survey on Explainable Fake News Detection

    Ken MISHIMA  Hayato YAMANA  

     
    SURVEY PAPER-Data Engineering, Web Information Systems

    Publicized: 2022/04/22  Vol: E105-D No:7  Page(s): 1249-1257

    The increasing amount of fake news is a growing problem that will progressively worsen in our interconnected world. Machine learning, particularly deep learning, is being used to detect misinformation; however, the models employed are essentially black boxes, and thus are uninterpretable. This paper presents an overview of explainable fake news detection models. Specifically, we first review the existing models, datasets, evaluation techniques, and visualization processes. Subsequently, possible improvements in this field are identified and discussed.

  • Recursive Multi-Scale Channel-Spatial Attention for Fine-Grained Image Classification

    Dichao LIU  Yu WANG  Kenji MASE  Jien KATO  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2021/12/22  Vol: E105-D No:3  Page(s): 713-726

    Fine-grained image classification is a difficult problem, and previous studies mainly overcome it by locating multiple discriminative regions at different scales and then aggregating the complementary information explored from the located regions. However, locating discriminative regions introduces heavy overhead and is not suitable for real-world applications. In this paper, we propose the recursive multi-scale channel-spatial attention module (RMCSAM) to address this problem. Following the experience of previous research on fine-grained image classification, RMCSAM explores multi-scale attentional information. However, the attentional information is explored by recursively refining the deep feature maps of a convolutional neural network (CNN) to better correspond to multi-scale channel-wise and spatial-wise attention, instead of localizing attention regions. In this way, RMCSAM provides a lightweight module that can be inserted into standard CNNs. Experimental results show that RMCSAM improves classification accuracy and attention-capturing ability over baselines. RMCSAM also performs better than other state-of-the-art attention modules in fine-grained image classification and is complementary to some state-of-the-art approaches for fine-grained image classification. Code is available at https://github.com/Dichao-Liu/Recursive-Multi-Scale-Channel-Spatial-Attention-Module.

  • Gender Recognition Using a Gaze-Guided Self-Attention Mechanism Robust Against Background Bias in Training Samples

    Masashi NISHIYAMA  Michiko INOUE  Yoshio IWAI  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2021/11/18  Vol: E105-D No:2  Page(s): 415-426

    We propose an attention mechanism in deep learning networks for gender recognition using the gaze distribution of human observers when they judge the gender of people in pedestrian images. Prevalent attention mechanisms spatially compute the correlation among values of all cells in an input feature map to calculate attention weights. If a large bias in the background of pedestrian images (e.g., test samples and training samples containing different backgrounds) is present, the attention weights learned using the prevalent attention mechanisms are affected by the bias, which in turn reduces the accuracy of gender recognition. To avoid this problem, we incorporate an attention mechanism called gaze-guided self-attention (GSA) that is inspired by human visual attention. Our method assigns spatially suitable attention weights to each input feature map using the gaze distribution of human observers. In particular, GSA yields promising results even when using training samples with the background bias. The results of experiments on publicly available datasets confirm that our GSA, using the gaze distribution, is more accurate in gender recognition than currently available attention-based methods in the case of background bias between training and test samples.
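
    The core idea, reweighting features with a human gaze distribution, can be sketched as below: a gaze density map is resized to the feature resolution and used to modulate the feature map. How GSA actually learns attention weights from gaze data is more involved; the shapes and normalization here are assumptions.

```python
# Hedged sketch of gaze-guided feature weighting; not the GSA training procedure.
import torch
import torch.nn.functional as F

def gaze_guided_weighting(features, gaze_map):
    """features: (batch, C, H, W); gaze_map: (batch, 1, H0, W0) gaze density."""
    g = F.interpolate(gaze_map, size=features.shape[-2:], mode="bilinear",
                      align_corners=False)
    g = g / (g.amax(dim=(2, 3), keepdim=True) + 1e-6)   # normalize to [0, 1]
    return features * g                                  # emphasize gazed regions

out = gaze_guided_weighting(torch.randn(2, 256, 14, 14), torch.rand(2, 1, 224, 224))
```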

  • Detecting Depression from Speech through an Attentive LSTM Network

    Yan ZHAO  Yue XIE  Ruiyu LIANG  Li ZHANG  Li ZHAO  Chengyu LIU  

     
    LETTER-Speech and Hearing

    Publicized: 2021/08/24  Vol: E104-D No:11  Page(s): 2019-2023

    As a mental disorder, depression endangers people's health and disturbs social functioning. Automatic depression detection, an efficient aid to diagnosis, has attracted considerable research interest. This study presents an attention-based long short-term memory (LSTM) model for depression detection that makes full use of the differences between depressed and non-depressed speech across time frames. The proposed model uses frame-level features, which capture the temporal information of depressive speech, to replace traditional statistical features as the input of the LSTM layers. To obtain richer deep feature representations, the LSTM output is passed to attention layers along both the time and feature dimensions. We then concatenate the outputs of the attention layers and feed the fused feature representation into a fully connected layer, whose output is finally passed to a softmax layer. Experiments conducted on the DAIC-WOZ database demonstrate that the proposed attentive LSTM model achieves an average accuracy of 90.2%, outperforming the traditional LSTM network and the LSTM with local attention by 0.7% and 2.3%, respectively, which indicates its feasibility.
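
    A minimal attentive LSTM with attention over the time axis is sketched below; the paper additionally applies attention along the feature dimension, and all layer sizes here are assumptions.

```python
# Hedged sketch of an LSTM with temporal attention; layer sizes are assumptions.
import torch
import torch.nn as nn

class AttentiveLSTM(nn.Module):
    def __init__(self, n_features: int, hidden: int = 128, n_classes: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)
        self.cls = nn.Linear(hidden, n_classes)

    def forward(self, x):                          # x: (batch, frames, n_features)
        h, _ = self.lstm(x)                        # (batch, frames, hidden)
        w = torch.softmax(self.attn(h), dim=1)     # attention weights over frames
        context = (w * h).sum(dim=1)               # weighted temporal pooling
        return self.cls(context)                   # logits; softmax applied in the loss

logits = AttentiveLSTM(n_features=40)(torch.randn(4, 300, 40))   # 300 frames of features
```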

  • Triplet Attention Network for Video-Based Person Re-Identification

    Rui SUN  Qili LIANG  Zi YANG  Zhenghui ZHAO  Xudong ZHANG  

     
    LETTER-Image Recognition, Computer Vision

    Publicized: 2021/07/21  Vol: E104-D No:10  Page(s): 1775-1779

    Video-based person re-identification (re-ID) aims at retrieving a person across non-overlapping cameras and has achieved promising results owing to deep convolutional neural networks. Because of the dynamic properties of video, background clutter and occlusion are more serious than in image-based person re-ID. In this letter, we present a novel triplet attention network (TriANet) that simultaneously exploits temporal, spatial, and channel context information through the self-attention mechanism to obtain robust and discriminative features. Specifically, the network has two parts. The first part introduces a residual attention subnetwork containing a channel attention module, which captures cross-dimension dependencies using rotation and transformation, and a spatial attention module, which focuses on pedestrian features. In the second part, a temporal attention module is designed to judge the quality of each pedestrian image and to reduce the weight of incomplete pedestrian images, alleviating the occlusion problem. We evaluate the proposed architecture on three datasets: iLIDS-VID, PRID2011, and MARS. Extensive comparative experiments show that our proposed method achieves state-of-the-art results.

  • Optic Disc Detection Based on Saliency Detection and Attention Convolutional Neural Networks

    Ying WANG  Xiaosheng YU  Chengdong WU  

     
    LETTER-Image

    Publicized: 2021/03/23  Vol: E104-A No:9  Page(s): 1370-1374

    The automatic analysis of retinal fundus images is of great significance in large-scale screening for ocular pathologies, and optic disc (OD) localization is a prerequisite step. In this paper, we propose a method for OD detection based on saliency detection and an attention convolutional neural network. First, a wavelet-transform-based saliency detection method is used to detect OD candidate regions as comprehensively as possible, so that the intensity, edge, and texture features of the fundus images are all considered in the OD detection process. Then, an attention mechanism that emphasizes the representation of the OD region is incorporated into the dense network. Finally, each detected candidate region is classified as an OD or non-OD region. The proposed method is evaluated on the DIARETDB0, DIARETDB1, and MESSIDOR datasets, and the experimental results demonstrate its superiority and robustness.

1-20 of 36 hits