Keyword Search Result

[Keyword] object detection (56 hits)

Showing results 1-20 of 56

  • REM-CiM: Attentional RGB-Event Fusion Multi-Modal Analog CiM for Area/Energy-Efficient Edge Object Detection during Both Day and Night Open Access

    Yuya ICHIKAWA  Ayumu YAMADA  Naoko MISAWA  Chihiro MATSUI  Ken TAKEUCHI  

     
    PAPER
    Publicized: 2024/04/09 | Vol: E107-C No:10 | Page(s): 426-435

    Integrating RGB and event sensors improves object detection accuracy, especially at night, thanks to the high dynamic range of event cameras. However, introducing an event sensor increases the required computational resources, which makes it difficult to implement RGB-event fusion multi-modal AI on computation-in-memory (CiM) hardware. To tackle this issue, this paper proposes an RGB-Event fusion Multi-modal analog Computation-in-Memory (CiM), called REM-CiM, for multi-modal edge object detection AI. In REM-CiM, the multi-modal AI algorithm and the circuit implementation are co-designed through two proposals. First, the Memory capacity-Efficient Attentional Feature Pyramid Network (MEA-FPN), a model architecture for RGB-event fusion on analog CiM, is proposed for parameter-efficient RGB-event fusion. Its convolution-less bi-directional calibration (C-BDC) extracts the important features of each modality with attention modules while reducing the number of weight parameters by removing the large convolutional operations of conventional BDC. The proposed MEA-FPN with C-BDC achieves a 76% reduction in parameters while keeping mean Average Precision (mAP) degradation below 2.3% during both day and night, compared with Attentional FPN fusion (A-FPN), a conventional BDC-based FPN fusion. Second, low-bit quantization with clipping (LQC) is proposed to reduce area and energy. The proposed REM-CiM with MEA-FPN and LQC requires almost the same number of memory cells as a conventional FPN-fusion CiM without LQC, with 21% less ADC area, 24% less ADC energy and 0.17% higher mAP.
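
    The LQC details are not recoverable from the abstract alone; as a rough sketch of low-bit quantization with clipping (the 4-bit width and clip range below are assumptions, not the paper's values):

        import numpy as np

        def quantize_with_clipping(x, n_bits=4, clip=1.0):
            # Clip first so the few quantization levels are not wasted on
            # rare outliers, then quantize uniformly over the clipped range.
            x = np.clip(x, -clip, clip)
            scale = (2 ** (n_bits - 1) - 1) / clip
            return np.round(x * scale) / scale

        w = np.random.randn(256) * 0.5
        w_q = quantize_with_clipping(w)  # few-bit weights for a CiM array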

  • Reliable Image Matching Using Optimal Combination of Color and Intensity Information Based on Relationship with Surrounding Objects Open Access

    Rina TAGAMI  Hiroki KOBAYASHI  Shuichi AKIZUKI  Manabu HASHIMOTO  

     
    PAPER-Pattern Recognition
    Publicized: 2024/05/30 | Vol: E107-D No:10 | Page(s): 1312-1321

    Due to the revitalization of the semiconductor industry and the push toward labor saving and unmanned operation in the retail and food manufacturing industries, objects to be recognized at production sites are increasingly diversified in color and design. Depending on the target object, it may be more reliable to process only color information, only intensity information, or a combination of the two. However, few conventional methods optimize which color and intensity information to use, and deep learning is too costly for production sites. In this paper, within the framework of template matching, we optimize the combination of color and intensity information for a small number of pixels used in matching, on the basis of the mutual relationship between the target object and surrounding objects, and we propose a fast and reliable matching method using these few pixels. Pixels with a low pixel-pattern frequency are selected from color and grayscale images of the target object, and from these, pixels that are highly discriminative from surrounding objects are carefully selected. The use of both color and intensity information makes the method highly versatile with respect to object design. The use of a small number of pixels that are not shared by the target and surrounding objects provides high robustness to the surrounding objects and enables fast matching. Experiments using real images confirmed that when 14 pixels are used for matching, the processing time is 6.3 msec and the recognition success rate is 99.7%. The proposed method also showed better positional accuracy than the comparison method, and the optimized pixels achieved a higher recognition success rate than non-optimized pixels.
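
    A minimal sketch of matching with only a few selected pixels (the selection itself, by pixel-pattern frequency and discriminability against surrounding objects, is the paper's contribution and is assumed done offline; the tolerance is hypothetical):

        import numpy as np

        def match_sparse_template(image, pixels, tol=20):
            # pixels: ((dy, dx), expected_value) pairs -- stand-ins for the
            # ~14 rare, discriminative pixels selected in advance.
            H, W = image.shape
            max_dy = max(dy for (dy, _), _ in pixels)
            max_dx = max(dx for (_, dx), _ in pixels)
            best_hits, best_pos = -1, None
            for y in range(H - max_dy):
                for x in range(W - max_dx):
                    hits = sum(abs(int(image[y + dy, x + dx]) - v) <= tol
                               for (dy, dx), v in pixels)
                    if hits > best_hits:
                        best_hits, best_pos = hits, (y, x)
            return best_pos, best_hits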

  • Reinforced Voxel-RCNN: An Efficient 3D Object Detection Method Based on Feature Aggregation Open Access

    Jia-ji JIANG  Hai-bin WAN  Hong-min SUN  Tuan-fa QIN  Zheng-qiang WANG  

     
    PAPER-Image Recognition, Computer Vision
    Publicized: 2024/04/24 | Vol: E107-D No:9 | Page(s): 1228-1238

    In this paper, the Voxel-RCNN three-dimensional (3D) point cloud object detection model ("Towards High Performance Voxel-based 3D Object Detection") is used as the benchmark network. To address weaknesses of current mainstream voxel-based 3D point cloud methods, namely a weak backbone and the lack of feature expressiveness under the bird's-eye view (BEV), a high-performance voxel-based 3D object detection network, Reinforced Voxel-RCNN, is proposed. First, a 3D feature extraction module integrating an inverted residual convolutional network with weight normalization is designed for the 3D backbone. This module retains more point cloud feature information and enhances the information interaction between convolutional layers, thereby improving the feature extraction ability of the backbone network. Second, a spatial feature-semantic fusion module based on spatial and channel attention is proposed from the BEV perspective; the combined use of channel and semantic features further improves the network's ability to express point cloud features. On the public KITTI dataset, the proposed method outperforms many voxel-based methods. Compared with the baseline network, both the 3D average precision and the BEV average precision improve on the Car, Cyclist, and Pedestrian categories: by 0.23%, 0.78%, and 2.08% in 3D average precision, and by 0.32%, 0.99%, and 2.38% in BEV average precision, respectively. The findings demonstrate that the proposed enhancements effectively improve detection accuracy across target categories.
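
    The actual backbone operates on voxel features; purely as an illustration of combining an inverted residual block with weight normalization (a dense PyTorch sketch, not the paper's module):

        import torch.nn as nn
        from torch.nn.utils import weight_norm

        class InvertedResidual3D(nn.Module):
            # Expand -> depthwise 3x3x3 conv -> project, each convolution
            # weight-normalized, with a residual connection.
            def __init__(self, channels, expansion=4):
                super().__init__()
                mid = channels * expansion
                self.block = nn.Sequential(
                    weight_norm(nn.Conv3d(channels, mid, 1)),
                    nn.ReLU(inplace=True),
                    weight_norm(nn.Conv3d(mid, mid, 3, padding=1, groups=mid)),
                    nn.ReLU(inplace=True),
                    weight_norm(nn.Conv3d(mid, channels, 1)),
                )

            def forward(self, x):
                return x + self.block(x)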

  • Edge Device Verification Techniques for Updated Object Detection AI via Target Object Existence Open Access

    Akira KITAYAMA  Goichi ONO  Hiroaki ITO  

     
    PAPER-Intelligent Transport System
    Publicized: 2023/12/20 | Vol: E107-A No:8 | Page(s): 1286-1295

    Edge devices with strict safety and reliability requirements, such as autonomous driving cars, industrial robots, and drones, necessitate software verification on the device before operation. The human cost and time required for this verification constitute a barrier in the cycle of software development and updating. In particular, the final verification on the edge device should at least strictly confirm that the updated software has not degraded relative to the currently operating version. Since the edge device has no ground-truth data, a human must judge whether a difference between the updated and the operating software is due to degradation or improvement, which makes this verification very costly. This paper proposes a novel automated method for efficient on-device verification of an object detection AI, which has found practical use in various applications. In the proposed method, a target object existence detector (TOED), a simple binary classifier, judges whether an object of a recognition target class exists in the region of a prediction difference between the operating and updated versions of the AI. Using the results of this TOED judgment together with the prediction differences, an automated verification system for the updated AI was constructed. TOED was designed as a simple binary classifier with four convolutional layers, and its object-existence judgment accuracy was evaluated on the differences between the predictions of the YOLOv5 L and X models on the Cityscapes dataset. The results showed judgment accuracy of more than 99.5% with 8.6% over-detection, indicating that a verification system adopting this method would be more efficient than simple analysis of the prediction differences.
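
    A minimal sketch of the verification core: collect boxes that appear in one version's predictions but not the other's, then let TOED judge only those crops (the IoU threshold and box format are assumptions):

        def iou(a, b):
            # Boxes as (x1, y1, x2, y2).
            ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
            ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
            inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
            area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
            return inter / (area(a) + area(b) - inter + 1e-9)

        def regions_to_verify(old_dets, new_dets, thr=0.5):
            # Prediction differences: boxes unmatched across versions.
            diffs = [b for b in new_dets
                     if all(iou(b, o) < thr for o in old_dets)]
            diffs += [b for b in old_dets
                      if all(iou(b, n) < thr for n in new_dets)]
            return diffs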

  • A Monkey Swing Counting Algorithm Based on Object Detection Open Access

    Hao CHEN  Zhe-Ming LU  Jie LIU  

     
    LETTER-Image Recognition, Computer Vision
    Publicized: 2023/12/07 | Vol: E107-D No:4 | Page(s): 579-583

    This Letter focuses on the deep learning-based problem of counting monkeys' head swings. Very few papers address monkey detection, and even fewer address counting monkeys' head swings. This research tries to fill that gap by estimating the head swing frequency of monkeys through deep learning, extending a traditional object detection algorithm. After analyzing the object detection results, we localize the monkey's actions over a period of time. This Letter analyzes the task of counting monkeys' head swings and proposes a standard that precisely defines a head swing. Under the guidance of this standard, the head swing counting accuracy on 50 test videos reaches 94.23%.

  • Improved Head and Data Augmentation to Reduce Artifacts at Grid Boundaries in Object Detection

    Shinji UCHINOURA  Takio KURITA  

     
    PAPER-Image Recognition, Computer Vision
    Publicized: 2023/10/23 | Vol: E107-D No:1 | Page(s): 115-124

    We investigated the influence of horizontal shifts of the input images on one-stage object detection methods. We found that the object detector's class scores drop when the target object's center is at a grid boundary. Many approaches have focused on reducing the aliasing effect of down-sampling to achieve shift invariance. However, down-sampling does not completely solve this problem at the grid boundary; it is also necessary to suppress the dispersion of features of pixels close to the grid boundary into adjacent grid cells. Therefore, this paper proposes two approaches focused on the grid boundary to remedy this weak point of current object detection methods. One is the Sub-Grid Feature Extraction Module, in which sub-grid features are added to the input of the classification head. The other is Grid-Aware Data Augmentation, in which augmented data are generated by grid-level shifts and used in training. The effectiveness of the proposed approaches is demonstrated on the COCO validation set after applying them to the FCOS architecture.
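
    A possible form of the grid-level shift augmentation (the stride value and the wrap-around shift via np.roll are simplifications for illustration):

        import numpy as np

        def grid_shift_augment(image, boxes, stride=8):
            # Shift the image and boxes by a random sub-stride offset so
            # object centres fall at varying positions relative to the
            # feature-grid boundaries during training.
            dy, dx = np.random.randint(stride, size=2)
            shifted = np.roll(image, (dy, dx), axis=(0, 1))
            boxes = [(x1 + dx, y1 + dy, x2 + dx, y2 + dy)
                     for (x1, y1, x2, y2) in boxes]
            return shifted, boxes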

  • GAN-based Image Translation Model with Self-Attention for Nighttime Dashcam Data Augmentation

    Rebeka SULTANA  Gosuke OHASHI  

     
    PAPER-Intelligent Transport System
    Publicized: 2023/06/27 | Vol: E106-A No:9 | Page(s): 1202-1210

    High-performance deep learning-based object detection models can help reduce traffic accidents by processing dashcam images during nighttime driving. Deep learning requires a large-scale dataset to obtain a high-performance model, but existing object detection datasets consist mostly of daytime scenes and contain few nighttime scenes, and enlarging a nighttime dataset is laborious and time-consuming. In such a case, daytime images can be converted to nighttime images with an image-to-image translation model to augment the nighttime dataset with less effort, so that the translated dataset can reuse the annotations of the daytime dataset. Therefore, in this study, a GAN-based image-to-image translation model is proposed that incorporates self-attention with cycle consistency and content/style separation for nighttime data augmentation, showing high fidelity to the annotations of the daytime dataset. Experimental results highlight the effectiveness of the proposed model compared with other models in terms of translated images and FID scores. Moreover, the high fidelity of the translated images to the annotations is verified by a small object detection model through detection results and mAP. Ablation studies confirm the effectiveness of self-attention in the proposed model. As a contribution to GAN-based data augmentation, the source code of the proposed image translation model is publicly available at https://github.com/subecky/Image-Translation-With-Self-Attention
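
    The abstract mentions cycle consistency; as a reminder of what that term means in CycleGAN-style training (the generator names here are hypothetical callables, not the paper's API):

        import torch.nn.functional as F

        def cycle_consistency_loss(G_day2night, G_night2day, day, night):
            # Translating to the other domain and back should reconstruct
            # the input image (L1 reconstruction in both directions).
            return (F.l1_loss(G_night2day(G_day2night(day)), day) +
                    F.l1_loss(G_day2night(G_night2day(night)), night))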

  • Location First Non-Maximum Suppression for Uncovered Muck Truck Detection

    Yuxiang ZHANG  Dehua LIU  Chuanpeng SU  Juncheng LIU  

     
    PAPER-Image
    Publicized: 2022/12/13 | Vol: E106-A No:6 | Page(s): 924-931

    Uncovered muck truck detection aims to detect muck trucks and distinguish whether each is covered by a dust-proof net, in order to trace the source of pollution. Unlike traditional detection problems, recalling all uncovered trucks matters more for pollution traceability than accurate localization. When two objects are very close in an image, the occluded object may not be recalled, because the non-maximum suppression (NMS) algorithm can remove the overlapped proposal. To address this issue, we propose a Location First NMS method that matches ground truth boxes and predicted boxes by position rather than by class identifier (ID) in the training stage. First, a box matching method is introduced to re-assign each predicted box's ID using the closest ground truth box, which avoids missing objects when the IoU of two proposals is greater than the threshold. Second, we design a loss function adapted to the proposed algorithm. Third, an uncovered muck truck detection system using the method is built in a real scene. Experimental results show the effectiveness of the proposed method.
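
    One plausible reading of the position-first matching step (centre distance as the position criterion is an assumption; the paper may use IoU instead):

        def match_by_position(pred_box, gt_boxes):
            # Assign the prediction to the nearest ground-truth box by
            # centre distance, regardless of class ID, so an occluded but
            # overlapping object is still matched during training.
            px = (pred_box[0] + pred_box[2]) / 2
            py = (pred_box[1] + pred_box[3]) / 2
            def centre_dist(g):
                gx, gy = (g[0] + g[2]) / 2, (g[1] + g[3]) / 2
                return (px - gx) ** 2 + (py - gy) ** 2
            return min(gt_boxes, key=centre_dist)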

  • A Novel SSD-Based Detection Algorithm Suitable for Small Object

    Xi ZHANG  Yanan ZHANG  Tao GAO  Yong FANG  Ting CHEN  

     
    PAPER-Core Methods
    Publicized: 2022/01/06 | Vol: E106-D No:5 | Page(s): 625-634

    The original single-shot multibox detector (SSD) algorithm has good detection accuracy and speed for regular object recognition. However, the SSD is not suitable for detecting small objects for two reasons: 1) the relationships among feature layers of different scales are not considered, and 2) the predicted results are determined solely by several independent feature layers. To enhance its detection capability for small objects, this study proposes an improved SSD-based algorithm called proportional channels' fusion SSD (PCF-SSD), which provides three enhancements. First, a fusion feature pyramid model is proposed that concatenates channels of certain key feature layers in a given proportion for object detection. Second, the default box sizes are adjusted appropriately for small object detection. Third, an improved loss function is introduced to train the proposed fusion model, which further improves object detection performance. A series of experiments on the public Pascal VOC database validate the PCF-SSD. Compared with the original SSD algorithm, our algorithm improves the mean average precision and the detection accuracy for small objects by 3.3% and 3.9%, respectively, at a detection speed of 40 FPS. Furthermore, the proposed PCF-SSD achieves a better balance of detection accuracy and efficiency than the original SSD algorithm, as demonstrated by the experimental results.
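
    A sketch of fusing channels of key feature layers in a given proportion (the proportions and layer shapes below are placeholders, not the paper's tuned values):

        import torch
        import torch.nn.functional as F

        def proportional_fusion(feats, proportions, out_size):
            # Take a fraction of channels from each key feature layer,
            # resize to a common resolution, and concatenate.
            parts = []
            for f, p in zip(feats, proportions):
                k = max(1, int(f.shape[1] * p))
                parts.append(F.interpolate(f[:, :k], size=out_size))
            return torch.cat(parts, dim=1)

        feats = [torch.randn(1, 512, 38, 38), torch.randn(1, 1024, 19, 19)]
        fused = proportional_fusion(feats, [0.5, 0.25], out_size=(38, 38))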

  • Convolution Block Feature Addition Module (CBFAM) for Lightweight and Fast Object Detection on Non-GPU Devices

    Min Ho KWAK  Youngwoo KIM  Kangin LEE  Jae Young CHOI  

     
    LETTER-Image Recognition, Computer Vision
    Publicized: 2023/01/24 | Vol: E106-D No:5 | Page(s): 1106-1110

    This letter proposes a novel lightweight deep learning object detector named LW-YOLOv4-tiny, which incorporates the convolution block feature addition module (CBFAM). The novelty of LW-YOLOv4-tiny is the use of channel-wise convolution and element-wise addition in the CBFAM instead of the concatenation of different feature maps. Compared with the most recent version of YOLOv4-tiny, the model size is reduced by up to 16.9 Mbytes (31.9%) and the computation requirement by 5.4 billion FLOPs (BFLOPs, 22.8%), while the detection speed is increased by 11.3 FPS (30%). On the MS COCO 2017 and PASCAL VOC 2012 benchmarks, LW-YOLOv4-tiny achieved 40.2% and 69.3% mAP, respectively.
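
    The core replacement the letter describes, concatenation swapped for channel-wise convolution plus element-wise addition, might look like this (a sketch, not the published module):

        import torch.nn as nn

        class CBFAMSketch(nn.Module):
            # Fuse two same-shaped feature maps without the channel growth
            # of concatenation: depthwise (channel-wise) convolution on one
            # input, then element-wise addition with the other.
            def __init__(self, channels):
                super().__init__()
                self.dw = nn.Conv2d(channels, channels, 3, padding=1,
                                    groups=channels)

            def forward(self, a, b):
                return self.dw(a) + b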

  • DFAM-DETR: Deformable Feature Based Attention Mechanism DETR on Slender Object Detection

    Feng WEN  Mei WANG  Xiaojie HU  

     
    PAPER-Image Recognition, Computer Vision
    Publicized: 2022/12/09 | Vol: E106-D No:3 | Page(s): 401-409

    Object detection is one of the most important areas of computer vision, and the use of CNNs for object detection has yielded substantial results in a variety of fields. However, the fixed sampling of standard convolution layers restricts receptive fields to fixed locations and limits CNNs' ability to model geometric transformations, which leads to poor performance on slender object detection. To achieve better accuracy and efficiency on slender objects, the proposed detector DFAM-DETR not only adjusts the sampling points adaptively but also uses an attention mechanism to enhance the focus on slender-object features and extract essential information from global to local scales across the image. This study uses slender-object images from the MS-COCO dataset. The experimental results show that DFAM-DETR achieves excellent detection performance on slender objects compared with CNN-based and transformer-based detectors.
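
    The adaptive sampling the abstract refers to builds on the deformable convolution primitive; a minimal torchvision example (shapes are arbitrary; DFAM-DETR's actual module is more involved):

        import torch
        from torchvision.ops import deform_conv2d

        # A small branch predicts per-location (dy, dx) offsets for each of
        # the 3x3 kernel taps, so sampling can follow a slender shape.
        x = torch.randn(1, 16, 32, 32)
        offset_branch = torch.nn.Conv2d(16, 2 * 3 * 3, 3, padding=1)
        weight = torch.randn(16, 16, 3, 3)
        y = deform_conv2d(x, offset_branch(x), weight, padding=1)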

  • Access Control with Encrypted Feature Maps for Object Detection Models

    Teru NAGAMORI  Hiroki ITO  AprilPyone MAUNGMAUNG  Hitoshi KIYA  

     
    PAPER
    Publicized: 2022/11/02 | Vol: E106-D No:1 | Page(s): 12-21

    In this paper, we propose, for the first time, an access control method with a secret key for object detection models, so that unauthorized users without the secret key cannot benefit from the performance of trained models. The method enables us not only to provide high detection performance to authorized users but also to degrade the performance for unauthorized users. The use of transformed images has been proposed for access control of image classification models, but such images cannot be used for object detection models because of performance degradation. Accordingly, in this paper, selected feature maps are encrypted with a secret key for training and testing models, instead of the input images. In an experiment, the protected models allowed authorized users to obtain almost the same performance as non-protected models while remaining robust against unauthorized access without the key.
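
    As one simple example of transforming a selected feature map with a secret key (a key-seeded channel permutation; the paper's actual encryption may differ):

        import numpy as np

        def encrypt_feature_map(fmap, key):
            # Permute channels with a key-seeded RNG; the same key must be
            # used at training and inference, so a model trained on
            # encrypted maps performs poorly without it.
            rng = np.random.default_rng(key)
            perm = rng.permutation(fmap.shape[0])
            return fmap[perm]

        fmap = np.random.randn(64, 28, 28).astype(np.float32)
        enc = encrypt_feature_map(fmap, key=2023)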

  • MemFRCN: Few Shot Object Detection with Memorable Faster-RCNN

    TongWei LU  ShiHai JIA  Hao ZHANG  

     
    LETTER-Vision
    Publicized: 2022/05/24 | Vol: E105-A No:12 | Page(s): 1626-1630

    Research on few-shot image classification (FSC) has made good progress, but many difficulties remain in few-shot object detection (FSOD). Almost all current FSOD methods face the catastrophic forgetting problem: base-class recognition accuracy drops severely as the model acquires the ability to recognize novel classes. For many methods, accuracy also falls as the number of classes increases. To address this problem, we propose a new memory-based method called Memorable Faster R-CNN (MemFRCN), which makes the model remember the categories it has already seen. Specifically, we propose a new two-stage object detector consisting of a memory-based classifier (MemCla), a fully connected neural network classifier (FCC), and an adaptive fusion block (AdFus). MemCla stores the embedding vector of each category as memory, which gives the model the memory capability to avoid catastrophic forgetting. AdFus fuses the outputs of FCC and MemCla and automatically adjusts the fusion as the number of samples increases, so that the model achieves better performance under various conditions. Our method performs well on unseen classes while maintaining the detection accuracy of seen classes. Experimental results demonstrate that our method outperforms other current methods on multiple benchmarks.
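
    A minimal sketch of a memory-based classifier in the spirit of MemCla (the running-mean prototypes and cosine scoring are assumptions about the exact update rule):

        import torch.nn.functional as F

        class MemClaSketch:
            # One running prototype embedding per class; queries are scored
            # by cosine similarity, so old classes are never overwritten.
            def __init__(self, momentum=0.9):
                self.momentum, self.protos = momentum, {}

            def remember(self, emb, cls):
                old = self.protos.get(cls)
                self.protos[cls] = emb if old is None else \
                    self.momentum * old + (1 - self.momentum) * emb

            def classify(self, emb):
                sims = {c: F.cosine_similarity(emb, p, dim=0).item()
                        for c, p in self.protos.items()}
                return max(sims, key=sims.get)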

  • A Bus Crowdedness Sensing System Using Deep-Learning Based Object Detection

    Wenhao HUANG  Akira TSUGE  Yin CHEN  Tadashi OKOSHI  Jin NAKAZAWA  

     
    PAPER
    Publicized: 2022/06/23 | Vol: E105-D No:10 | Page(s): 1712-1720

    The crowdedness of buses plays an increasingly important role in the disease control of COVID-19, and the lack of a practical approach to sensing bus crowdedness is a major problem. This paper proposes a bus crowdedness sensing system that exploits deep learning-based object detection to count the numbers of passengers getting on and off a bus and thus estimate its crowdedness in real time. In our prototype system, we combine a YOLOv5s object detection model with a Kalman filter object tracking algorithm to implement a sensing algorithm running on a Jetson Nano-based vehicular device mounted on a bus. Using driving recorder video data taken from a real bus, we experimentally evaluate the proposed sensing system and verify that it improves counting accuracy and achieves real-time processing on the Jetson Nano platform.
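
    The counting step on top of detection and tracking could be as simple as a door-line crossing test (the line position and crossing rule are assumptions; the abstract does not detail the counting logic):

        def update_counts(track_moves, door_line_y, counts):
            # track_moves: {track_id: (y_prev, y_now)} centre positions from
            # the Kalman-filter tracker between consecutive frames.
            for y_prev, y_now in track_moves.values():
                if y_prev < door_line_y <= y_now:
                    counts["on"] += 1      # crossed the door line inward
                elif y_prev >= door_line_y > y_now:
                    counts["off"] += 1     # crossed the door line outward
            counts["onboard"] = counts["on"] - counts["off"]
            return counts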

  • BFF R-CNN: Balanced Feature Fusion for Object Detection

    Hongzhe LIU  Ningwei WANG  Xuewei LI  Cheng XU  Yaze LI  

     
    PAPER-Image Recognition, Computer Vision
    Publicized: 2022/05/17 | Vol: E105-D No:8 | Page(s): 1472-1480

    In the neck of a two-stage object detection network, feature fusion is generally carried out in either a top-down or bottom-up manner. However, two types of imbalance may exist: feature imbalance in the neck of the model and gradient imbalance in the region-of-interest extraction layer due to the scale changes of objects. The deeper the network, the more abstract the learned features, i.e., the more semantic information they carry, but the less background, spatial location, and other high-resolution information they retain; conversely, shallow layers learn little semantic information but preserve abundant spatial location information. We propose the Both Ends to Centre to Multiple Layers (BEtM) feature fusion method to solve the feature imbalance problem in the neck and a Multi-level Region of Interest Feature Extraction (MRoIE) layer to solve the gradient imbalance problem. In combination with the Region-based Convolutional Neural Network (R-CNN) framework, our Balanced Feature Fusion (BFF) method offers significantly improved network performance compared with the Faster R-CNN architecture. On the MS COCO 2017 dataset, it achieves an average precision (AP) that is 1.9 points and 3.2 points higher than those of the Feature Pyramid Network (FPN) Faster R-CNN framework and the Generic Region of Interest Extractor (GRoIE) framework, respectively.

  • Temporal Ensemble SSDLite: Exploiting Temporal Correlation in Video for Accurate Object Detection

    Lukas NAKAMURA  Hiromitsu AWANO  

     
    PAPER-Vision
    Publicized: 2022/01/18 | Vol: E105-A No:7 | Page(s): 1082-1090

    We propose "Temporal Ensemble SSDLite," a new method for video object detection that boosts accuracy while maintaining detection speed and energy consumption. Object detection for video is becoming increasingly important as a core part of applications in robotics, autonomous driving, and many other promising fields. Many of these applications require high accuracy and speed to be viable, yet run in compute- and energy-restricted environments; therefore, new methods that increase both the accuracy and the speed of video object detection have to be developed. To increase accuracy we use ensembling, the machine learning technique of combining the predictions of multiple models. The drawback of ensembling is the increased computational cost, which is proportional to the number of models used. We overcome this deficit by deploying our ensemble temporally: we run inference with only a single model at each frame, cycling through the models of the ensemble from frame to frame, and then combine the predictions of the last N frames, where N is the number of models in the ensemble, through non-maximum suppression. This works because temporally close frames in a video are extremely similar. As a result, we gain the accuracy of the ensemble while running only a single model per frame, thereby keeping the detection speed. To evaluate the proposal, we measure accuracy, detection speed, and energy consumption on the Google Edge TPU, a machine learning inference accelerator, with the ImageNet VID dataset. Our results demonstrate an accuracy boost of up to 4.9% while maintaining real-time detection speed and an energy consumption of 181 mJ per image.
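
    The frame-cycling idea fits in a few lines; models and nms here are hypothetical callables (one detector per ensemble member and a standard NMS routine):

        from collections import deque

        def temporal_ensemble(frames, models, nms):
            # Run one model per frame, cycling through the ensemble, and
            # fuse detections from the last len(models) frames with NMS --
            # temporally close frames are similar enough to share boxes.
            history = deque(maxlen=len(models))
            for i, frame in enumerate(frames):
                history.append(models[i % len(models)](frame))
                yield nms([det for dets in history for det in dets])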

  • Saliency Detection via Absorbing Markov Chain with Multi-Level Cues

    Pengfei LV  Xiaosheng YU  Jianning CHI  Chengdong WU  

     
    LETTER-Image
    Publicized: 2021/12/07 | Vol: E105-A No:6 | Page(s): 1010-1014

    A robust saliency detection approach for images with complex backgrounds is proposed. An absorbing Markov chain integrating low-level, mid-level, and high-level cues evolves dynamically, using the similarity between pixels to detect salient objects. The experimental results show that the proposed algorithm has advantages in saliency detection, especially for images with a chaotic background or low contrast.
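
    Assuming the method follows the standard absorbing-Markov-chain saliency formulation, the key quantity is the expected absorption time of each transient node (superpixel):

        import numpy as np

        def expected_absorption_steps(Q):
            # With Q the transient-to-transient transition block of the
            # chain, the expected number of steps before absorption is
            # y = (I - Q)^{-1} 1; nodes with large y are taken as salient.
            n = Q.shape[0]
            return np.linalg.solve(np.eye(n) - Q, np.ones(n))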

  • Localization of Pointed-At Word in Printed Documents via a Single Neural Network

    Rubin ZHAO  Xiaolong ZHENG  Zhihua YING  Lingyan FAN  

     
    PAPER-Image Recognition, Computer Vision
    Publicized: 2022/01/26 | Vol: E105-D No:5 | Page(s): 1075-1084

    Most existing object detection and text detection methods are designed to detect either objects or text alone. In scenarios where the task is to find the target word pointed at by an object, the results of existing methods are far from satisfactory. Yet such scenarios arise often in human-computer interaction, when the computer needs to figure out which word the user is pointing at. Compared with generic object detection, pointed-at word localization (PAWL) requires higher accuracy, especially in dense text. Moreover, characters in printed documents are much smaller than those in scene text detection datasets such as ICDAR-2013, ICDAR-2015, and ICPR-2018. To address these problems, the authors propose a novel target word localization network (TWLN) to detect the pointed-at word in printed documents. In this work, a single deep neural network is trained to extract the features of markers and text sequentially. For each image, the location of the marker is predicted first; according to this prediction, a smaller image is cropped from the original and fed into the same network, which then predicts the location of the pointed-at word. To train and test the network, an efficient approach is proposed to generate the dataset from PDF documents by inserting markers that point at words in the documents, avoiding laborious labeling work. Experiments on the proposed dataset demonstrate that TWLN outperforms the compared object detection method and optical character recognition method on every category of targets, especially when the target is a single character occupying only a few pixels in the image. TWLN was also tested on real photographs, where its accuracy shows no significant difference, confirming the validity of the dataset generation method.

  • Noisy Localization Annotation Refinement for Object Detection

    Jiafeng MAO  Qing YU  Kiyoharu AIZAWA  

     
    PAPER-Image Recognition, Computer Vision
    Publicized: 2021/05/25 | Vol: E104-D No:9 | Page(s): 1478-1485

    Well-annotated datasets are crucial to the training of object detectors. However, producing finely annotated datasets for object detection is extremely labor-intensive, so crowdsourcing is often used to create them, which leads to datasets containing incorrect annotations such as inaccurate localization bounding boxes. In this study, we highlight the problem of object detection with noisy bounding box annotations and show that such noisy annotations harm the performance of deep neural networks. To solve this problem, we further propose a framework that allows the network to modify the noisy dataset by alternating refinement. The experimental results demonstrate that our proposed framework significantly alleviates the influence of noise on model performance.

  • Backbone Alignment and Cascade Tiny Object Detecting Techniques for Dolphin Detection and Classification

    Yih-Cherng LEE  Hung-Wei HSU  Jian-Jiun DING  Wen HOU  Lien-Shiang CHOU  Ronald Y. CHANG  

     
    PAPER-Image
    Publicized: 2020/09/29 | Vol: E104-A No:4 | Page(s): 734-743

    Automatic tracking and classification are essential for studying the behaviors of wild animals. Dynamic long-distance photography, occlusion, protective coloration, and background noise cause irregular interference when designing a computerized algorithm to reduce human labeling effort. Moreover, wild dolphin images must be acquired through on-site surveys, which involve long waiting times, and a fixed camera can hardly be set up to monitor dolphins automatically on the open ocean over several days. Detecting and classifying dolphins well in degraded photos with a single well-known deep learning method on a small dataset is a challenging task. Therefore, in this study, we propose a generic Cascade Small Object Detection (CSOD) algorithm for dolphin detection to handle small-object problems, and develop visualization to backbone based classification (V2BC) for removing noise, highlighting dolphin features, and classifying the dolphin's name. The architecture of CSOD consists of a P-net and an F-net. The P-net uses a coarse YOLOv3 detector as its core network to predict all regions of interest (ROIs) at lower resolution. The more robust F-net is then applied to capture the ROIs from high-resolution photos, overcoming the limitations of a single detector. Moreover, the V2BC method focuses on extracting significant regions of an occluded dolphin and designs post-processing that references the dolphin's backbone to facilitate classification. All experiments show that the proposed algorithm based on CSOD and V2BC outperforms state-of-the-art methods, including Faster R-CNN and YOLOv3 for detection and AlexNet, VGG, and ResNet for classification. Compared with related classification works, the accuracy of the proposed design is over 14% higher, and the proposed CSOD detection system performs 42% better than the original YOLOv3 architecture.
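
    The P-net/F-net cascade reduces to a coarse-to-fine loop; a sketch with PIL-style images and hypothetical detector callables (p_net and f_net return boxes in their input's coordinates):

        def cascade_detect(photo, p_net, f_net, scale=4):
            # Fast P-net proposes ROIs on a downscaled photo; the stronger
            # F-net re-detects inside each ROI cropped at full resolution.
            small = photo.resize((photo.width // scale,
                                  photo.height // scale))
            results = []
            for x1, y1, x2, y2 in p_net(small):
                crop = photo.crop((x1 * scale, y1 * scale,
                                   x2 * scale, y2 * scale))
                results += f_net(crop)
            return results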
