
Keyword Search Result

[Keyword] object(437hit)

1-20hit(437hit)

  • REM-CiM: Attentional RGB-Event Fusion Multi-Modal Analog CiM for Area/Energy-Efficient Edge Object Detection during Both Day and Night Open Access

    Yuya ICHIKAWA  Ayumu YAMADA  Naoko MISAWA  Chihiro MATSUI  Ken TAKEUCHI  

     
    PAPER

      Publicized:
    2024/04/09
      Vol:
    E107-C No:10
      Page(s):
    426-435

    Integrating RGB and event sensors improves object detection accuracy, especially at night, owing to the high dynamic range of event cameras. However, introducing an event sensor increases the required computational resources, which makes it difficult to implement RGB-event fusion multi-modal AI on CiM. To tackle this issue, this paper proposes RGB-Event fusion Multi-modal analog Computation-in-Memory (CiM), called REM-CiM, for multi-modal edge object detection AI. In REM-CiM, two proposals, one on the multi-modal AI algorithm and one on the circuit implementation, are co-designed. First, Memory capacity-Efficient Attentional Feature Pyramid Network (MEA-FPN), the model architecture for RGB-event fusion analog CiM, is proposed for parameter-efficient RGB-event fusion. Convolution-less bi-directional calibration (C-BDC) in MEA-FPN extracts important features of each modality with attention modules, while reducing the number of weight parameters by removing large convolutional operations from conventional BDC. The proposed MEA-FPN with C-BDC achieves a 76% reduction in parameters while keeping mean Average Precision (mAP) degradation below 2.3% during both day and night, compared with Attentional FPN fusion (A-FPN), a conventional BDC-adopted FPN fusion. Second, low-bit quantization with clipping (LQC) is proposed to reduce area and energy. The proposed REM-CiM with MEA-FPN and LQC achieves almost the same number of memory cells, 21% less ADC area, 24% less ADC energy, and 0.17% higher mAP than conventional FPN fusion CiM without LQC.
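
    The abstract above names low-bit quantization with clipping (LQC) but does not describe its details. The following is only a minimal sketch of generic symmetric uniform quantization with a clipping range, not the authors' LQC; the function name `quantize_with_clipping` and the parameters `bits` and `clip` are hypothetical and chosen for illustration.

```python
def quantize_with_clipping(values, bits, clip):
    """Generic symmetric uniform quantizer (an assumption, not the paper's LQC).

    values: iterable of floats
    bits:   signed bit width; 2**(bits-1) - 1 positive levels are used
    clip:   values are saturated to [-clip, +clip] before rounding
    """
    levels = 2 ** (bits - 1) - 1      # e.g. 3 positive levels for 3 bits
    step = clip / levels              # width of one quantization step
    quantized = []
    for v in values:
        clipped = max(-clip, min(clip, v))   # saturate outliers
        code = round(clipped / step)         # nearest integer code
        quantized.append(code * step)        # map the code back to a real value
    return quantized


# Hypothetical example: 3-bit quantization with a clipping range of 1.5.
result = quantize_with_clipping([0.2, 0.8, 3.0, -2.0], bits=3, clip=1.5)
print(result)
```

    Clipping trades a saturation error on rare large values for a finer step size on the remaining values, which is the usual reason it is paired with low bit widths.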

  • Reliable Image Matching Using Optimal Combination of Color and Intensity Information Based on Relationship with Surrounding Objects Open Access

    Rina TAGAMI  Hiroki KOBAYASHI  Shuichi AKIZUKI  Manabu HASHIMOTO  

     
    PAPER-Pattern Recognition

      Publicized:
    2024/05/30
      Vol:
    E107-D No:10
      Page(s):
    1312-1321

    Due to the revitalization of the semiconductor industry and efforts toward labor saving and unmanned operation in the retail and food manufacturing industries, objects to be recognized at production sites are increasingly diversified in color and design. Depending on the target object, processing only color information, only intensity information, or a combination of the two may be the most reliable. However, there are few conventional methods for optimizing the color and intensity information to be used, and deep learning is too costly for production sites. In this paper, we optimize the combination of the color and intensity information of a small number of pixels used for matching in the framework of template matching, on the basis of the mutual relationship between the target object and surrounding objects. We propose a fast and reliable matching method using these few pixels. Pixels with a low pixel pattern frequency are selected from color and grayscale images of the target object, and pixels that are highly discriminative from surrounding objects are carefully selected from these pixels. The use of color and intensity information makes the method highly versatile with respect to object design. The use of a small number of pixels that are not shared by the target and surrounding objects provides high robustness to the surrounding objects and enables fast matching. Experiments using real images have confirmed that when 14 pixels are used for matching, the processing time is 6.3 msec and the recognition success rate is 99.7%. The proposed method also showed better positional accuracy than the comparison method, and the optimized pixels had a higher recognition success rate than the non-optimized pixels.

  • DETrack: Multi-Object Tracking Algorithm Based on Feature Decomposition and Feature Enhancement Open Access

    Feng WEN  Haixin HUANG  Xiangyang YIN  Junguang MA  Xiaojie HU  

     
    PAPER-Neural Networks and Bioengineering

      Publicized:
    2024/04/22
      Vol:
    E107-A No:9
      Page(s):
    1522-1533

    Multi-object tracking (MOT) algorithms are typically classified as one-shot or two-step algorithms. The one-shot MOT algorithm is widely studied and applied due to its fast inference speed. However, one-shot algorithms include two sub-tasks of detection and re-ID, which have conflicting directions for model optimization, thus limiting tracking performance. Additionally, MOT algorithms often suffer from serious ID switching issues, which can negatively affect the tracking effect. To address these challenges, this study proposes the DETrack algorithm, which consists of feature decomposition and feature enhancement modules. The feature decomposition module can effectively exploit the differences and correlations of different tasks to solve the conflict problem. Moreover, it can effectively mitigate the competition between the detection and re-ID tasks, while simultaneously enhancing their cooperation. The feature enhancement module can improve feature quality and alleviate the problem of target ID switching. Experimental results demonstrate that DETrack has achieved improvements in multi-object tracking performance, while reducing the number of ID switching. The designed method of feature decomposition and feature enhancement can significantly enhance target tracking effectiveness.

  • Reinforced Voxel-RCNN: An Efficient 3D Object Detection Method Based on Feature Aggregation Open Access

    Jia-ji JIANG  Hai-bin WAN  Hong-min SUN  Tuan-fa QIN  Zheng-qiang WANG  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2024/04/24
      Vol:
    E107-D No:9
      Page(s):
    1228-1238

    In this paper, the three-dimensional (3D) point cloud object detection model Voxel-RCNN (Towards High Performance Voxel-based 3D Object Detection) is used as the baseline network. To address problems in current mainstream 3D point cloud voxelization methods, such as limitations of the backbone and the lack of feature expression ability under the bird’s-eye view (BEV), a high-performance voxel-based 3D object detection network (Reinforced Voxel-RCNN) is proposed. Firstly, a 3D feature extraction module based on the integration of an inverted residual convolutional network and weight normalization is designed on the 3D backbone. This module not only retains more point cloud feature information and enhances the information interaction between convolutional layers, but also improves the feature extraction ability of the backbone network. Secondly, a spatial feature-semantic fusion module based on spatial and channel attention is proposed from a BEV perspective. The mixed use of channel features and semantic features further improves the network’s ability to express point cloud features. On the public KITTI dataset, the experimental results of this paper are better than those of many voxel-based methods. Compared with the baseline network, the 3D average accuracy and BEV average accuracy on the three categories of Car, Cyclist, and Pedestrians are improved. In 3D average accuracy, the improvement is 0.23% for Car, 0.78% for Cyclist, and 2.08% for Pedestrians. In BEV average accuracy, the improvement is 0.32% for Car, 0.99% for Cyclist, and 2.38% for Pedestrians. The findings demonstrate that the algorithm enhancement introduced in this study effectively improves the accuracy of target category detection.

  • Edge Device Verification Techniques for Updated Object Detection AI via Target Object Existence Open Access

    Akira KITAYAMA  Goichi ONO  Hiroaki ITO  

     
    PAPER-Intelligent Transport System

      Publicized:
    2023/12/20
      Vol:
    E107-A No:8
      Page(s):
    1286-1295

    Edge devices with strict safety and reliability requirements, such as autonomous driving cars, industrial robots, and drones, necessitate software verification on such devices before operation. The human cost and time required for this analysis constitute a barrier in the cycle of software development and updating. In particular, the final verification at the edge device should at least strictly confirm that the updated software is not degraded relative to the current version. Since the edge device does not have the correct data, it is necessary for a human to judge whether the difference between the updated software and the operating version is due to degradation or improvement. Therefore, this verification is very costly. This paper proposes a novel automated method for efficient verification on edge devices of an object detection AI, which has found practical use in various applications. In the proposed method, a target object existence detector (TOED) (a simple binary classifier) judges whether an object in the recognition target class exists in the region of a prediction difference between the AI’s operating and updated versions. Using the results of this TOED judgement and the predicted difference, an automated verification system for the updated AI was constructed. TOED was designed as a simple binary classifier with four convolutional layers, and the accuracy of object existence judgment was evaluated for the difference between the predictions of the YOLOv5 L and X models using the Cityscapes dataset. The results showed judgement with more than 99.5% accuracy and 8.6% over detection, thus indicating that a verification system adopting this method would be more efficient than simple analysis of the prediction differences.

  • Amodal Instance Segmentation of Thin Objects with Large Overlaps by Seed-to-Mask Extending Open Access

    Ryohei KANKE  Masanobu TAKAHASHI  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2024/02/29
      Vol:
    E107-D No:7
      Page(s):
    908-911

    Amodal Instance Segmentation (AIS) aims to segment the regions of both visible and invisible parts of overlapping objects. The mainstream Mask R-CNN-based methods are unsuitable for thin objects with large overlaps because of their object proposal features with bounding boxes for three reasons. First, capturing the entire shapes of overlapping thin objects is difficult. Second, the bounding boxes of close objects are almost identical. Third, a bounding box contains many objects in most cases. In this paper, we propose a box-free AIS method, Seed-to-Mask, for thin objects with large overlaps. The method specifies a target object using a seed and iteratively extends the segmented region. We have achieved better performance in experiments on artificial data consisting only of thin objects.

  • A Monkey Swing Counting Algorithm Based on Object Detection Open Access

    Hao CHEN  Zhe-Ming LU  Jie LIU  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2023/12/07
      Vol:
    E107-D No:4
      Page(s):
    579-583

    This Letter focuses on the problem of counting monkeys' head swings using deep learning. There are currently very few papers on monkey detection, and even fewer on counting monkeys' head swings. This research tries to fill this gap by calculating the head swing frequency of monkeys through deep learning, further extending the traditional target detection algorithm. After analyzing object detection results, we localize the monkey's actions over a period. This Letter analyzes the task of counting monkeys' head swings and proposes a standard that accurately describes a monkey's head swing. Under the guidance of this standard, the monkeys' head swing counting accuracy in 50 test videos reaches 94.23%.

  • Non-Cooperative Rational Synthesis Problem on Stochastic Games for Positional Strategies

    So KOIDE  Yoshiaki TAKATA  Hiroyuki SEKI  

     
    PAPER

      Publicized:
    2023/10/11
      Vol:
    E107-D No:3
      Page(s):
    301-311

    Synthesis problems on multiplayer non-zero-sum games (MG) with multiple environment players that behave rationally are the problems to find a good strategy of the system and have been extensively studied. This paper concerns the synthesis problems on stochastic MG (SMG), where a special controller other than players, called nature, which chooses a move in its turn randomly, may exist. Two types of synthesis problems on SMG exist: cooperative rational synthesis problem (CRSP) and non-cooperative rational synthesis problem (NCRSP). The rationality of environment players is modeled by Nash equilibria, and CRSP is the problem to decide whether there exists a Nash equilibrium that gives the system a payoff not less than a given threshold. Ummels et al. studied the complexity of CRSP for various classes of objectives and strategies of players. CRSP fits the situation where the system can make a suggestion of a strategy profile (a tuple of strategies of all players) to the environment players. However, in real applications, the system may rarely have an opportunity to make suggestions to the environment, and thus CRSP is optimistic. NCRSP is the problem to decide whether there exists a strategy σ0 of the system satisfying that for every strategy profile of the environment players that forms a 0-fixed Nash equilibrium (a Nash equilibrium where the system's strategy is fixed to σ0), the system obtains a payoff not less than a given threshold. In this paper, we investigate the complexity of NCRSP for positional (i.e. pure memoryless) strategies. We consider ω-regular objectives as the model of players' objectives, and show the complexity results of the problem for several subclasses of ω-regular objectives. In particular, the problem for terminal reachability (TR) objectives is shown to be Σp2-complete.

  • Improved Head and Data Augmentation to Reduce Artifacts at Grid Boundaries in Object Detection

    Shinji UCHINOURA  Takio KURITA  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2023/10/23
      Vol:
    E107-D No:1
      Page(s):
    115-124

    We investigated the influence of horizontal shifts of the input images for one stage object detection method. We found that the object detector class scores drop when the target object center is at the grid boundary. Many approaches have focused on reducing the aliasing effect of down-sampling to achieve shift-invariance. However, down-sampling does not completely solve this problem at the grid boundary; it is necessary to suppress the dispersion of features in pixels close to the grid boundary into adjacent grid cells. Therefore, this paper proposes two approaches focused on the grid boundary to improve this weak point of current object detection methods. One is the Sub-Grid Feature Extraction Module, in which the sub-grid features are added to the input of the classification head. The other is Grid-Aware Data Augmentation, where augmented data are generated by the grid-level shifts and are used in training. The effectiveness of the proposed approaches is demonstrated using the COCO validation set after applying the proposed method to the FCOS architecture.
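
    To make the notion of an object center lying at a grid boundary concrete, the sketch below computes which grid cell a horizontal center coordinate falls into and how far it is from the nearest cell boundary. It is an illustration only, not the authors' module; the function name and the stride value of 8 pixels are assumptions.

```python
def grid_cell_and_boundary_distance(center_x, stride):
    """Return (cell index, distance in pixels to the nearest cell boundary).

    center_x: horizontal coordinate of the object center, in pixels
    stride:   width of one grid cell in pixels (assumed value in the example)
    """
    cell = int(center_x // stride)            # index of the cell containing the center
    offset = center_x - cell * stride         # position inside that cell
    distance = min(offset, stride - offset)   # 0.0 means exactly on a boundary
    return cell, distance


# Hypothetical example: shifting a center horizontally across a stride-8 grid.
mid_cell = grid_cell_and_boundary_distance(12.0, 8)     # center of cell 1
near_edge = grid_cell_and_boundary_distance(15.0, 8)    # one pixel from a boundary
on_edge = grid_cell_and_boundary_distance(16.0, 8)      # exactly on a boundary
print(mid_cell, near_edge, on_edge)
```

    A grid-level shift of the input image, as used in the augmentation described above, changes this distance and can therefore move a center onto or away from a boundary.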

  • Two-Path Object Knowledge Injection for Detecting Novel Objects With Single-Stage Dense Detector

    KuanChao CHU  Hideki NAKAYAMA  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2023/08/02
      Vol:
    E106-D No:11
      Page(s):
    1868-1880

    We present an effective system for integrating generative zero-shot classification modules into a YOLO-like dense detector to detect novel objects. Most double-stage-based novel object detection methods are achieved by refining the classification output branch but cannot be applied to a dense detector. Our system utilizes two paths to inject knowledge of novel objects into a dense detector. One involves injecting the class confidence for novel classes from a classifier trained on data synthesized via a dual-step generator. This generator learns a mapping function between two feature spaces, resulting in better classification performance. The second path involves re-training the detector head with feature maps synthesized on different intensity levels. This approach significantly increases the predicted objectness for novel objects, which is a major challenge for a dense detector. We also introduce a stop-and-reload mechanism during re-training for optimizing across head layers to better learn synthesized features. Our method relaxes the constraint on the detector head architecture in the previous method and has markedly enhanced performance on the MSCOCO dataset.

  • Multi-Objective Design of EMI Filter with Uncertain Parameters by Preference Set-Based Design Method and Polynomial Chaos Method

    Duc Chinh BUI  Yoshiki KAYANO  Fengchao XIAO  Yoshio KAMI  

     
    PAPER-Electromagnetic Compatibility(EMC)

      Publicized:
    2023/06/30
      Vol:
    E106-B No:10
      Page(s):
    959-968

    Today's electronic devices must meet many requirements, such as those related to performance, limits to the radiated electromagnetic field, size, etc. For such a design, the requirement is to have a solution that simultaneously meets multiple objectives that sometimes include conflicting requirements. In addition, it is also necessary to consider uncertain parameters. This paper proposes a new combination of statistical analysis using the Polynomial Chaos (PC) method for dealing with the random and multi-objective satisfactory design using the Preference Set-based Design (PSD) method. The application in this paper is an Electromagnetic Interference (EMI) filter for a practical case, which includes plural element parameters and uncertain parameters, which are resistors at the source and load, and the performances of the attenuation characteristics. The PC method generates simulation data with high enough accuracy and good computational efficiency, and these data are used as initial data for the meta-modeling of the PSD method. The design parameters of the EMI filter, which satisfy required performances, are obtained in a range by the PSD method. The authors demonstrate the validity of the proposed method. The results show that applying a multi-objective design method using PSD with a statistical method using PC to handle the uncertain problem can be applied to electromagnetic designs to reduce the time and cost of product development.

  • GAN-based Image Translation Model with Self-Attention for Nighttime Dashcam Data Augmentation

    Rebeka SULTANA  Gosuke OHASHI  

     
    PAPER-Intelligent Transport System

      Publicized:
    2023/06/27
      Vol:
    E106-A No:9
      Page(s):
    1202-1210

    High-performance deep learning-based object detection models can reduce traffic accidents using dashcam images during nighttime driving. Deep learning requires a large-scale dataset to obtain a high-performance model. However, existing object detection datasets are mostly daytime scenes and a few nighttime scenes. Increasing the nighttime dataset is laborious and time-consuming. In such a case, it is possible to convert daytime images to nighttime images by image-to-image translation model to augment the nighttime dataset with less effort so that the translated dataset can utilize the annotations of the daytime dataset. Therefore, in this study, a GAN-based image-to-image translation model is proposed by incorporating self-attention with cycle consistency and content/style separation for nighttime data augmentation that shows high fidelity to annotations of the daytime dataset. Experimental results highlight the effectiveness of the proposed model compared with other models in terms of translated images and FID scores. Moreover, the high fidelity of translated images to the annotations is verified by a small object detection model according to detection results and mAP. Ablation studies confirm the effectiveness of self-attention in the proposed model. As a contribution to GAN-based data augmentation, the source code of the proposed image translation model is publicly available at https://github.com/subecky/Image-Translation-With-Self-Attention

  • Location First Non-Maximum Suppression for Uncovered Muck Truck Detection

    Yuxiang ZHANG  Dehua LIU  Chuanpeng SU  Juncheng LIU  

     
    PAPER-Image

      Publicized:
    2022/12/13
      Vol:
    E106-A No:6
      Page(s):
    924-931

    Uncovered muck truck detection aims to detect muck trucks and distinguish whether each is covered by a dust-proof net, in order to trace the source of pollution. Unlike traditional detection problems, recalling all uncovered trucks is more important than accurate localization for pollution traceability. When two objects are very close in an image, the occluded object may not be recalled because the non-maximum suppression (NMS) algorithm can remove the overlapped proposal. To address this issue, we propose a Location First NMS method to match the ground truth boxes and predicted boxes by position rather than class identifier (ID) in the training stage. Firstly, a box matching method is introduced to re-assign the predicted box ID using the closest ground truth one, which can avoid missing objects when the IoU of two proposals is greater than the threshold. Secondly, we design a loss function to adapt the proposed algorithm. Thirdly, an uncovered muck truck detection system is designed using the method in a real scene. Experimental results show the effectiveness of the proposed method.
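
    The problem described above, where standard NMS removes an overlapped proposal, can be illustrated with a minimal sketch of conventional greedy NMS. This is the baseline behavior, not the Location First NMS proposed in the paper; the box coordinates, scores, and the threshold of 0.5 are made-up values for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Conventional greedy NMS: keep the best box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # A box survives only if it does not overlap any already kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes (e.g. two adjacent trucks) and one distant box.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)
```

    In this example the second box overlaps the first with an IoU of about 0.82 and is suppressed even if it corresponds to a separate, partly occluded object, which is the recall failure the abstract refers to.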

  • A Novel SSD-Based Detection Algorithm Suitable for Small Object

    Xi ZHANG  Yanan ZHANG  Tao GAO  Yong FANG  Ting CHEN  

     
    PAPER-Core Methods

      Publicized:
    2022/01/06
      Vol:
    E106-D No:5
      Page(s):
    625-634

    The original single-shot multibox detector (SSD) algorithm has good detection accuracy and speed for regular object recognition. However, the SSD is not suitable for detecting small objects for two reasons: 1) the relationships among different feature layers with various scales are not considered, 2) the predicted results are solely determined by several independent feature layers. To enhance its detection capability for small objects, this study proposes an improved SSD-based algorithm called proportional channels' fusion SSD (PCF-SSD). Three enhancements are provided by this novel PCF-SSD algorithm. First, a fusion feature pyramid model is proposed by concatenating channels of certain key feature layers in a given proportion for object detection. Second, the default box sizes are adjusted properly for small object detection. Third, an improved loss function is suggested to train the above-proposed fusion model, which can further improve object detection performance. A series of experiments are conducted on the public database Pascal VOC to validate the PCF-SSD. On comparing with the original SSD algorithm, our algorithm improves the mean average precision and detection accuracy for small objects by 3.3% and 3.9%, respectively, with a detection speed of 40FPS. Furthermore, the proposed PCF-SSD can achieve a better balance of detection accuracy and efficiency than the original SSD algorithm, as demonstrated by a series of experimental results.

  • Computer Vision-Based Tracking of Workers in Construction Sites Based on MDNet

    Wen LIU  Yixiao SHAO  Shihong ZHAI  Zhao YANG  Peishuai CHEN  

     
    PAPER-Smart Industry

      Publicized:
    2022/10/20
      Vol:
    E106-D No:5
      Page(s):
    653-661

    Automatic continuous tracking of objects involved in a construction project is required for such tasks as productivity assessment, unsafe behavior recognition, and progress monitoring. Many computer-vision-based tracking approaches have been investigated and successfully tested on construction sites; however, their practical applications are hindered by the tracking accuracy limited by the dynamic, complex nature of construction sites (i.e. clutter with background, occlusion, varying scale and pose). To achieve better tracking performance, a novel deep-learning-based tracking approach called the Multi-Domain Convolutional Neural Networks (MD-CNN) is proposed and investigated. The proposed approach consists of two key stages: 1) multi-domain representation of learning; and 2) online visual tracking. To evaluate the effectiveness and feasibility of this approach, it is applied to a metro project in Wuhan China, and the results demonstrate good tracking performance in construction scenarios with complex background. The average distance error and F-measure for the MDNet are 7.64 pixels and 67, respectively. The results demonstrate that the proposed approach can be used by site managers to monitor and track workers for hazard prevention in construction sites.

  • An Improved Real-Time Object Tracking Algorithm Based on Deep Learning Features

    Xianyu WANG  Cong LI  Heyi LI  Rui ZHANG  Zhifeng LIANG  Hai WANG  

     
    PAPER-Object Recognition and Tracking

      Publicized:
    2022/01/07
      Vol:
    E106-D No:5
      Page(s):
    786-793

    Visual object tracking is always a challenging task in computer vision. During the tracking, the shape and appearance of the target may change greatly, and because of the lack of sufficient training samples, most of the online learning tracking algorithms will have performance bottlenecks. In this paper, an improved real-time algorithm based on deep learning features is proposed, which combines multi-feature fusion, multi-scale estimation, adaptive updating of target model and re-detection after target loss. The effectiveness and advantages of the proposed algorithm are proved by a large number of comparative experiments with other excellent algorithms on large benchmark datasets.

  • OPENnet: Object Position Embedding Network for Locating Anti-Bird Thorn of High-Speed Railway

    Zhuo WANG  Junbo LIU  Fan WANG  Jun WU  

     
    LETTER-Intelligent Transportation Systems

      Publicized:
    2022/11/14
      Vol:
    E106-D No:5
      Page(s):
    824-828

    Machine vision-based automatic anti-bird thorn failure inspection, instead of manual identification, remains a great challenge. In this paper, we proposed a novel Object Position Embedding Network (OPENnet), which can improve the precision of anti-bird thorn localization. OPENnet can simultaneously predict the location boxes of the support device and anti-bird thorn by using the proposed double-head network. And then, OPENnet is optimized using the proposed symbiotic loss function (SymLoss), which embeds the object position into the network. The comprehensive experiments are conducted on the real railway video dataset. OPENnet yields competitive performance on anti-bird thorn localization. Specifically, the localization performance gains +3.65 AP, +2.10 AP50, and +1.22 AP75.

  • Effectiveness of Feature Extraction System for Multimodal Sensor Information Based on VRAE and Its Application to Object Recognition

    Kazuki HAYASHI  Daisuke TANAKA  

     
    LETTER-Object Recognition and Tracking

      Publicized:
    2023/01/12
      Vol:
    E106-D No:5
      Page(s):
    833-835

    To achieve object recognition, it is necessary to find the unique features of the objects to be recognized. Prior research suggests that methods using information from multiple modalities are effective for finding these unique features. This paper presents an overview of a system that extracts the features of the objects to be recognized by integrating visual, tactile, and auditory information as multimodal sensor information with VRAE. Furthermore, the effect of changing the combination of modalities is discussed.

  • Convolution Block Feature Addition Module (CBFAM) for Lightweight and Fast Object Detection on Non-GPU Devices

    Min Ho KWAK  Youngwoo KIM  Kangin LEE  Jae Young CHOI  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2023/01/24
      Vol:
    E106-D No:5
      Page(s):
    1106-1110

    This letter proposes a novel lightweight deep learning object detector named LW-YOLOv4-tiny, which incorporates the convolution block feature addition module (CBFAM). The novelty of LW-YOLOv4-tiny is the use of channel-wise convolution and element-wise addition in the CBFAM instead of utilizing the concatenation of different feature maps. The model size and computation requirement are reduced by up to 16.9 Mbytes, 5.4 billion FLOPs (BFLOPS), and 11.3 FPS, which is 31.9%, 22.8%, and 30% smaller and faster than the most recent version of YOLOv4-tiny. From the MSCOCO2017 and PASCAL VOC2012 benchmarks, LW-YOLOv4-tiny achieved 40.2% and 69.3% mAP, respectively.

  • Object-ABN: Learning to Generate Sharp Attention Maps for Action Recognition

    Tomoya NITTA  Tsubasa HIRAKAWA  Hironobu FUJIYOSHI  Toru TAMAKI  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2022/12/14
      Vol:
    E106-D No:3
      Page(s):
    391-400

    In this paper, we propose an extension of the Attention Branch Network (ABN) that uses instance segmentation to generate sharper attention maps for action recognition. Methods for visual explanation such as Grad-CAM usually generate blurry maps that are not intuitive for humans to understand, particularly in recognizing actions of people in videos. Our proposed method, Object-ABN, tackles this issue by introducing a new mask loss that makes the generated attention maps close to the instance segmentation result. Furthermore, the Prototype Conformity (PC) loss and multiple attention maps are introduced to enhance the sharpness of the maps and improve classification performance. Experimental results with UCF101 and SSv2 show that the maps generated by the proposed method are much clearer, both qualitatively and quantitatively, than those of the original ABN.
