
Keyword Search Result

[Keyword] fusion (253 hits)

Showing hits 21-40 of 253

  • An Improved Real-Time Object Tracking Algorithm Based on Deep Learning Features

    Xianyu WANG  Cong LI  Heyi LI  Rui ZHANG  Zhifeng LIANG  Hai WANG  

     
    PAPER-Object Recognition and Tracking

    Publicized: 2022/01/07  Vol: E106-D No:5  Page(s): 786-793

    Visual object tracking remains a challenging task in computer vision. During tracking, the shape and appearance of the target may change greatly, and because sufficient training samples are lacking, most online learning tracking algorithms run into performance bottlenecks. In this paper, an improved real-time algorithm based on deep learning features is proposed, which combines multi-feature fusion, multi-scale estimation, adaptive updating of the target model, and re-detection after target loss. The effectiveness and advantages of the proposed algorithm are demonstrated by extensive comparative experiments against strong competing algorithms on large benchmark datasets.

  • Epileptic Seizure Prediction Using Convolutional Neural Networks and Fusion Features on Scalp EEG Signals

    Qixin LAN  Bin YAO  Tao QING  

     
    LETTER-Smart Healthcare

    Publicized: 2022/05/27  Vol: E106-D No:5  Page(s): 821-823

    Epileptic seizure prediction is an important research topic in clinical epilepsy treatment, as it gives epilepsy patients and medical staff the opportunity to take precautionary measures. EEG is a commonly used tool for studying brain activity, recording the electrical discharges of the brain. Many machine-learning-based studies have been proposed to solve this task using EEG signals. In this study, we propose novel seizure prediction models based on convolutional neural networks and scalp EEG for binary classification between preictal and interictal states. The short-time Fourier transform is used to translate raw EEG signals into STFT spectrograms, which serve as the input to the models. Fusion features are obtained through side-output constructions and used to train and test our models. The test results show that our models achieve comparable results in both sensitivity and FPR using fusion features. The proposed patient-specific model can be used for EEG classification in seizure prediction systems.
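
    As an illustration of the kind of pre-processing the abstract describes (not the authors' exact pipeline), the short sketch below converts one EEG segment into a log-magnitude STFT spectrogram suitable as CNN input; the sampling rate, window length, and overlap are assumed values.

        import numpy as np
        from scipy.signal import stft

        fs = 256          # assumed sampling rate (Hz); not specified in the abstract
        window_sec = 30   # assumed length of one EEG segment

        segment = np.random.randn(fs * window_sec)          # stand-in for one EEG channel
        f, t, Zxx = stft(segment, fs=fs, nperseg=fs, noverlap=fs // 2)
        spectrogram = np.log1p(np.abs(Zxx))                 # log-magnitude STFT spectrogram
        model_input = spectrogram[np.newaxis, np.newaxis]   # (batch, channel, freq, time) for a CNN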

  • Modality-Fused Graph Network for Cross-Modal Retrieval

    Fei WU  Shuaishuai LI  Guangchuan PENG  Yongheng MA  Xiao-Yuan JING  

     
    LETTER-Pattern Recognition

    Publicized: 2023/02/09  Vol: E106-D No:5  Page(s): 1094-1097

    Cross-modal hashing has attracted much attention for its favorable retrieval performance and low storage cost. However, for existing cross-modal hashing methods, the heterogeneity of data across modalities remains a challenge, and how to fully explore and utilize intra-modality features has not been well studied. In this paper, we propose a novel cross-modal hashing approach called Modality-fused Graph Network (MFGN). The architecture consists of a text channel and an image channel that learn modality-specific features, and a modality fusion channel that uses a graph network to learn modality-shared representations and reduce the heterogeneity across modalities. In addition, an integration module is introduced for the image and text channels to fully explore intra-modality features. Experiments on two widely used datasets show that our approach outperforms state-of-the-art cross-modal hashing methods.

  • Multimodal Named Entity Recognition with Bottleneck Fusion and Contrastive Learning

    Peng WANG  Xiaohang CHEN  Ziyu SHANG  Wenjun KE  

     
    PAPER-Natural Language Processing

    Publicized: 2023/01/18  Vol: E106-D No:4  Page(s): 545-555

    Multimodal named entity recognition (MNER) is the task of recognizing named entities in a multimodal context. Existing methods focus on co-attention mechanisms to discover relationships between modalities, but they still have two deficiencies. First, they fail to fuse multimodal representations in a fine-grained way, which may introduce noise from the visual modality. Second, they ignore the semantic gap between heterogeneous modalities. To solve these issues, we propose a novel MNER method with bottleneck fusion and contrastive learning (BFCL). Specifically, we first incorporate a transformer-based bottleneck fusion mechanism, so that information can be exchanged between modalities only through a small set of bottleneck tokens, thus reducing noise propagation. We then propose two decoupled image-text contrastive losses to align the unimodal representations, pulling the representations of semantically similar modalities closer while pushing those of semantically different modalities farther apart. Experimental results demonstrate that our method is competitive with state-of-the-art models, achieving 74.54% and 85.70% F1-scores on the Twitter-2015 and Twitter-2017 datasets, respectively.
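
    The following is a minimal sketch of the general bottleneck-fusion idea the abstract describes (cross-modal exchange restricted to a few learned tokens); it is not the authors' BFCL implementation, and the feature dimension, head count, and number of bottleneck tokens are assumptions.

        import torch
        import torch.nn as nn

        class BottleneckFusion(nn.Module):
            """Text and image tokens exchange information only via a few bottleneck tokens."""
            def __init__(self, dim=256, heads=4, num_bottleneck=4):
                super().__init__()
                self.bottleneck = nn.Parameter(torch.randn(1, num_bottleneck, dim))
                self.attn_txt = nn.MultiheadAttention(dim, heads, batch_first=True)
                self.attn_img = nn.MultiheadAttention(dim, heads, batch_first=True)

            def forward(self, text_tokens, image_tokens):
                b = text_tokens.size(0)
                z = self.bottleneck.expand(b, -1, -1)
                # bottleneck tokens gather information from each modality separately
                z_txt, _ = self.attn_txt(z, text_tokens, text_tokens)
                z_img, _ = self.attn_img(z, image_tokens, image_tokens)
                z = z_txt + z_img
                # each modality reads back only from the (small) bottleneck
                text_out, _ = self.attn_txt(text_tokens, z, z)
                image_out, _ = self.attn_img(image_tokens, z, z)
                return text_out, image_out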

  • Design and Development of a Card Game for Learning on the Structure of Arithmetic Story by Concatenated Sentence Integration

    Kohei YAMAGUCHI  Yusuke HAYASHI  Tsukasa HIRASHIMA  

     
    LETTER

    Publicized: 2022/09/15  Vol: E106-D No:2  Page(s): 131-136

    This study focuses on creating arithmetic stories as a sub-task of problem posing and proposes a game named “Tri-prop scrabble” as a learning environment based on a fusion of learning and game play. Problem-posing ability has a positive relationship with mathematics achievement and with understanding the mathematical structure of problems. In the proposed game, learners are expected to experience creating and concatenating various arithmetic stories by integrating simple sentences. The results of a preliminary feasibility study show that the participants were able to pose and concatenate various types of arithmetic stories and regarded the game as helpful for learning arithmetic word problems.

  • A Multi-Modal Fusion Network Guided by Feature Co-Occurrence for Urban Region Function Recognition

    Nenghuan ZHANG  Yongbin WANG  Xiaoguang WANG  Peng YU  

     
    PAPER-Multimedia Pattern Processing

    Publicized: 2022/07/25  Vol: E105-D No:10  Page(s): 1769-1779

    Recently, multi-modal fusion methods based on remote sensing data and social sensing data have been widely used for urban region function recognition. However, because of the complexity of the noise problem, most existing methods are not robust enough when applied to real-world scenes, which seriously limits their value in urban planning and management. In addition, how to extract valuable periodic features from social sensing data still needs further study. To this end, we propose a multi-modal fusion network guided by feature co-occurrence for urban region function recognition, which leverages the co-occurrence relationship between multi-modal features to identify abnormal noisy features, guiding the fusion network to suppress them and focus on clean features. Furthermore, we employ a graph convolutional network with a node weighting layer and an interactive update layer to effectively extract valuable periodic features from social sensing data. Lastly, experimental results on publicly available datasets indicate that our proposed method yields promising improvements in both accuracy and robustness over several state-of-the-art methods.
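
    A rough illustration of co-occurrence-guided noise suppression under assumed shapes: cross-modal agreement (here, cosine similarity) gates the features of both modalities. The actual guidance mechanism in the paper may differ; this only conveys the idea of down-weighting features that do not co-occur.

        import torch
        import torch.nn.functional as F

        def cooccurrence_gate(feat_rs, feat_ss):
            """Down-weight features whose cross-modal agreement is low.

            feat_rs, feat_ss: (batch, dim) remote-sensing / social-sensing features.
            """
            # cosine similarity per sample acts as a soft co-occurrence score
            score = F.cosine_similarity(feat_rs, feat_ss, dim=-1, eps=1e-8)   # (batch,)
            gate = torch.sigmoid(score).unsqueeze(-1)                          # (batch, 1)
            # suppress both modalities when they disagree (likely noise)
            return gate * feat_rs, gate * feat_ss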

  • Altered Fingerprints Detection Based on Deep Feature Fusion

    Chao XU  Yunfeng YAN  Lehangyu YANG  Sheng LI  Guorui FENG  

     
    LETTER-Image Processing and Video Processing

    Publicized: 2022/06/13  Vol: E105-D No:9  Page(s): 1647-1651

    Altered fingerprints help criminals evade the police and cause great harm to society. In this letter, an altered fingerprint detection method is proposed. The method is built from two deep convolutional neural networks that learn time-domain and frequency-domain features, respectively, with a spectral attention module connecting the two networks. After the extraction networks, a feature fusion module exploits the relationship between the features of the two networks. We conduct ablation experiments and insert the proposed modules into several popular architectures. The results show that the proposed method improves altered fingerprint detection compared with recent neural networks.
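
    A minimal two-branch sketch of the spatial/frequency-domain idea, assuming grayscale input and a simple FFT magnitude for the frequency branch; the backbones, spectral attention module, and fusion module of the letter are not reproduced here.

        import torch
        import torch.nn as nn

        class TwoBranchDetector(nn.Module):
            """Illustrative two-branch design: one CNN on the raw image, one on its spectrum."""
            def __init__(self):
                super().__init__()
                self.spatial_branch = nn.Sequential(
                    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
                self.spectral_branch = nn.Sequential(
                    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
                self.classifier = nn.Linear(32, 2)   # altered vs. genuine

            def forward(self, x):                    # x: (batch, 1, H, W) grayscale fingerprint
                spectrum = torch.log1p(torch.abs(torch.fft.fft2(x)))   # frequency-domain input
                a = self.spatial_branch(x).flatten(1)
                b = self.spectral_branch(spectrum).flatten(1)
                return self.classifier(torch.cat([a, b], dim=1))       # simple feature fusion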

  • MSFF: A Multi-Scale Feature Fusion Network for Surface Defect Detection of Aluminum Profiles

    Lianshan SUN  Jingxue WEI  Hanchao DU  Yongbin ZHANG  Lifeng HE  

     
    LETTER-Image Recognition, Computer Vision

    Publicized: 2022/05/30  Vol: E105-D No:9  Page(s): 1652-1655

    This paper presents an improved YOLOv3 network, named MSFF-YOLOv3, for precisely detecting the varied surface defects of aluminum profiles in practice. First, we introduce a larger prediction scale to provide detailed information for small-defect detection. Second, we design an efficient attention-guided block to extract more defect features with less overhead. Third, we design a bottom-up pyramid and integrate it with the existing feature pyramid network to construct a twin-tower structure that improves the flow and fusion of features across layers. In addition, we employ the K-median algorithm for anchor clustering to speed up network inference. Experimental results show that the mean average precision of the proposed MSFF-YOLOv3 is higher than that of all compared networks for surface defect detection of aluminum profiles. Moreover, the number of frames processed per second by MSFF-YOLOv3 meets real-time requirements.
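
    The K-median anchor clustering step can be sketched generically as below, with boxes given as an (n, 2) float array of ground-truth widths and heights and the common 1 - IoU distance; the distance measure and iteration count are assumptions, not details from the letter.

        import numpy as np

        def wh_iou(box, boxes):
            """IoU between one (w, h) box and an array of (w, h) boxes, all anchored at the origin."""
            inter = np.minimum(box[0], boxes[:, 0]) * np.minimum(box[1], boxes[:, 1])
            union = box[0] * box[1] + boxes[:, 0] * boxes[:, 1] - inter
            return inter / union

        def kmedian_anchors(boxes, k=9, iters=50):
            """Cluster (w, h) ground-truth boxes into k anchors with distance d = 1 - IoU."""
            rng = np.random.default_rng(0)
            anchors = boxes[rng.choice(len(boxes), k, replace=False)].astype(float)
            for _ in range(iters):
                d = np.stack([1 - wh_iou(a, boxes) for a in anchors])   # (k, n) distances
                assign = d.argmin(axis=0)                               # nearest anchor per box
                for j in range(k):
                    members = boxes[assign == j]
                    if len(members):
                        anchors[j] = np.median(members, axis=0)         # median, not mean
            return anchors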

  • Diabetes Noninvasive Recognition via Improved Capsule Network

    Cunlei WANG  Donghui LI  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2022/05/06  Vol: E105-D No:8  Page(s): 1464-1471

    Noninvasive recognition is an important trend in diabetes recognition. Unfortunately, the accuracy obtained from conventional noninvasive recognition methods is low. This paper proposes a novel diabetes noninvasive recognition method based on plantar pressure images and an improved Capsule Network (DNR-CapsNet). The input of the proposed method is a plantar pressure image, and the output is the recognition result: healthy or possibly diabetic. ResNet18 is used as the backbone of the convolutional layers to convert pixel intensities into local features in the proposed DNR-CapsNet. Then, the PrimaryCaps, SecondaryCaps, and DiabetesCaps layers are developed to perform diabetes recognition. Semantic fusion and locality-constrained dynamic routing are also developed to further improve the recognition accuracy. The experimental results indicate that the proposed method performs better on diabetes noninvasive recognition than state-of-the-art methods.
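
    For reference, the standard CapsNet squash nonlinearity and routing-by-agreement loop are sketched below; the paper's locality-constrained dynamic routing and semantic fusion are extensions of this baseline and are not shown here.

        import torch
        import torch.nn.functional as F

        def squash(s, dim=-1, eps=1e-8):
            """CapsNet squashing nonlinearity: keeps direction, maps length into [0, 1)."""
            n2 = (s ** 2).sum(dim=dim, keepdim=True)
            return (n2 / (1.0 + n2)) * s / torch.sqrt(n2 + eps)

        def dynamic_routing(u_hat, iters=3):
            """Standard routing-by-agreement.

            u_hat: (batch, in_caps, out_caps, out_dim) prediction vectors.
            """
            b = torch.zeros(u_hat.shape[:3], device=u_hat.device)      # routing logits
            for _ in range(iters):
                c = F.softmax(b, dim=2).unsqueeze(-1)                   # coupling coefficients
                v = squash((c * u_hat).sum(dim=1))                      # (batch, out_caps, out_dim)
                b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)            # agreement update
            return v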

  • BFF R-CNN: Balanced Feature Fusion for Object Detection

    Hongzhe LIU  Ningwei WANG  Xuewei LI  Cheng XU  Yaze LI  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2022/05/17  Vol: E105-D No:8  Page(s): 1472-1480

    In the neck part of a two-stage object detection network, feature fusion is generally carried out in either a top-down or bottom-up manner. However, two types of imbalance may exist: feature imbalance in the neck of the model, and gradient imbalance in the region-of-interest extraction layer due to scale changes of objects. The deeper the network, the more abstract the learned features, i.e., the more semantic information can be extracted; however, less background, spatial location, and other high-resolution information is retained. In contrast, the shallow layers learn little semantic information but abundant spatial location information. We propose the Both Ends to Centre to Multiple Layers (BEtM) feature fusion method to solve the feature imbalance problem in the neck, and a Multi-level Region of Interest Feature Extraction (MRoIE) layer to solve the gradient imbalance problem. In combination with the Region-based Convolutional Neural Network (R-CNN) framework, our Balanced Feature Fusion (BFF) method offers significantly improved network performance compared with the Faster R-CNN architecture. On the MS COCO 2017 dataset, it achieves an average precision (AP) that is 1.9 points and 3.2 points higher than those of the Feature Pyramid Network (FPN) Faster R-CNN framework and the Generic Region of Interest Extractor (GRoIE) framework, respectively.

  • Efficient Multi-Scale Feature Fusion for Image Manipulation Detection

    Yuxue ZHANG  Guorui FENG  

     
    LETTER-Information Network

    Publicized: 2022/02/03  Vol: E105-D No:5  Page(s): 1107-1111

    Convolutional neural networks (CNNs) have made extraordinary progress in image classification tasks. However, using a CNN directly to detect image manipulation is less effective. To address this problem, we propose an image filtering layer and a multi-scale feature fusion module that guide the model to perform image manipulation detection more accurately and effectively. A series of experiments shows that our model improves image manipulation detection compared with previous work.

  • RMF-Net: Improving Object Detection with Multi-Scale Strategy

    Yanyan ZHANG  Meiling SHEN  Wensheng YANG  

     
    PAPER-Multimedia Systems for Communications

    Publicized: 2021/12/02  Vol: E105-B No:5  Page(s): 675-683

    We propose a target detection network (RMF-Net) based on a multi-scale strategy to address large differences in detection scale and mutual occlusion, both of which result in inaccurate localization. A multi-layer feature fusion module and a multi-expansion dilated convolution pyramid module were designed on top of the ResNet-101 residual network. By combining the shallow and deep features of the target and expanding the receptive field of the network, these modules improve the network's ability to express the multi-scale features of the target. Moreover, RoI Align pooling was introduced to remove the localization error caused by the repeated quantization of anchor boxes, improving positioning accuracy. Finally, an AD-IoU loss function was designed, which adaptively optimises the distance between the predicted box and the ground-truth box by jointly considering their overlap rate, centre distance, and aspect ratio, thereby improving the detection accuracy for occluded targets. Ablation experiments on the RMF-Net model verified the effectiveness of each factor in improving detection accuracy. Comparative experiments were conducted on the Pascal VOC2007 and Pascal VOC2012 datasets against various convolutional-neural-network-based target detection algorithms. The results demonstrate that RMF-Net exhibits strong scale adaptability at different occlusion rates, with detection accuracy reaching 80.4% and 78.5% on the two datasets, respectively.
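
    The abstract describes AD-IoU only at a high level; a CIoU-style loss that likewise combines overlap, centre distance, and aspect ratio is sketched below as a stand-in, not as the authors' formulation.

        import math
        import torch

        def ciou_style_loss(pred, target, eps=1e-7):
            """IoU loss with centre-distance and aspect-ratio penalties (CIoU-style).

            pred, target: (..., 4) boxes given as (x1, y1, x2, y2).
            """
            # intersection and union
            lt = torch.max(pred[..., :2], target[..., :2])
            rb = torch.min(pred[..., 2:], target[..., 2:])
            wh = (rb - lt).clamp(min=0)
            inter = wh[..., 0] * wh[..., 1]
            area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
            area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
            iou = inter / (area_p + area_t - inter + eps)

            # normalised centre distance (diagonal of the smallest enclosing box)
            c_lt = torch.min(pred[..., :2], target[..., :2])
            c_rb = torch.max(pred[..., 2:], target[..., 2:])
            diag = ((c_rb - c_lt) ** 2).sum(-1) + eps
            centre = (pred[..., :2] + pred[..., 2:]) / 2 - (target[..., :2] + target[..., 2:]) / 2
            dist = (centre ** 2).sum(-1)

            # aspect-ratio consistency term
            w_p, h_p = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
            w_t, h_t = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
            v = (4 / math.pi ** 2) * (torch.atan(w_t / (h_t + eps)) - torch.atan(w_p / (h_p + eps))) ** 2
            alpha = v / (1 - iou + v + eps)

            return 1 - iou + dist / diag + alpha * v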

  • Assessment System of Presentation Slide Design Using Visual and Structural Features

    Shengzhou YI  Junichiro MATSUGAMI  Toshihiko YAMASAKI  

     
    PAPER

    Publicized: 2021/12/01  Vol: E105-D No:3  Page(s): 587-596

    Developing well-designed presentation slides is challenging for many people, especially novices, and the ability to build high-quality slideshows is becoming more important in society. In this study, a neural network was used to distinguish novice from well-designed presentation slides based on visual and structural features. For this purpose, a dataset containing 1,080 slide pairs was newly constructed. One slide of each pair was created by a novice, and the other was the version improved by the same person according to experts' advice. Ten checkpoints frequently pointed out by professional consultants were extracted and set as prediction targets. The intrinsic problem was that the label distribution was imbalanced, because only some of the samples exhibited the corresponding design problems. Therefore, re-sampling methods for addressing class imbalance were applied to improve the accuracy of the proposed model. Furthermore, we combined the target task with an assistant task for transfer and multi-task learning, which helped the proposed model achieve better performance. With the optimal settings for each checkpoint, the average accuracy of the proposed model rose to 81.79%. With the advice provided by our assessment system, the novices significantly improved their slide designs.
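
    One common way to realize the re-sampling step is to over-sample the minority class during batching; the sketch below uses PyTorch's WeightedRandomSampler on toy data and illustrates the idea only, not the authors' exact re-sampling method.

        import torch
        from torch.utils.data import WeightedRandomSampler, DataLoader, TensorDataset

        # toy stand-in: slide features and an imbalanced binary label for one checkpoint
        features = torch.randn(1000, 32)
        labels = (torch.rand(1000) < 0.1).long()        # roughly 10% positive samples

        # weight each sample inversely to its class frequency so batches are balanced
        class_count = torch.bincount(labels).float()
        weights = 1.0 / class_count[labels]
        sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)

        loader = DataLoader(TensorDataset(features, labels), batch_size=64, sampler=sampler)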

  • Image Adjustment for Multi-Exposure Images Based on Convolutional Neural Networks

    Isana FUNAHASHI  Taichi YOSHIDA  Xi ZHANG  Masahiro IWAHASHI  

     
    PAPER-Image Processing and Video Processing

    Publicized: 2021/10/21  Vol: E105-D No:1  Page(s): 123-133

    In this paper, we propose an image adjustment method for multi-exposure images based on convolutional neural networks (CNNs). We refer to image regions that carry no information due to saturation or object motion in multi-exposure images as lacking areas. Lacking areas cause ghosting artifacts in images fused from multi-exposure sets, even by conventional fusion methods designed to tackle this artifact. To avoid this problem, the proposed method estimates the information in lacking areas via adaptive inpainting. The proposed CNN consists of three networks: a warp-and-refinement network, a detection network, and an inpainting network. The second and third networks detect lacking areas and estimate their pixel values, respectively. In the experiments, a simple fusion method combined with the proposed method outperforms state-of-the-art fusion methods in peak signal-to-noise ratio. Moreover, when the proposed method is applied as pre-processing to various fusion methods, the results show clearly reduced artifacts.

  • Semantic Guided Infrared and Visible Image Fusion

    Wei WU  Dazhi ZHANG  Jilei HOU  Yu WANG  Tao LU  Huabing ZHOU  

     
    LETTER-Image

    Publicized: 2021/06/10  Vol: E104-A No:12  Page(s): 1733-1738

    In this letter, we propose a semantic guided infrared and visible image fusion method, which trains a network to fuse different semantic objects with different fusion weights according to their own characteristics. First, we design appropriate fusion weights for each semantic object instead of the whole image. Second, we employ semantic segmentation to obtain the semantic region of each object and generate dedicated weight maps for the infrared and visible images from the pre-designed fusion weights. Third, we feed the weight maps into the loss function to guide the image fusion process. The trained fusion network generates fused images with better visual effect and a more comprehensive scene representation. Moreover, we can enhance the modal features of various semantic objects, benefiting subsequent tasks and applications. Experimental results demonstrate that our method outperforms the state-of-the-art in terms of both visual effect and quantitative metrics.
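
    A rough sketch of how per-object weight maps could be built from a segmentation map and fed into a fusion loss; the class list and weight values are invented for illustration, and the letter's actual loss design may differ.

        import torch

        # per-class fusion weights (infrared weight, visible weight) -- illustrative values only
        CLASS_WEIGHTS = {0: (0.5, 0.5),   # background
                         1: (0.8, 0.2),   # e.g. pedestrian: favour the infrared image
                         2: (0.3, 0.7)}   # e.g. building: favour the visible image

        def build_weight_maps(seg):
            """seg: (H, W) integer class map produced by a semantic segmentation model."""
            w_ir = torch.zeros_like(seg, dtype=torch.float32)
            w_vis = torch.zeros_like(seg, dtype=torch.float32)
            for cls, (a, b) in CLASS_WEIGHTS.items():
                mask = (seg == cls)
                w_ir[mask], w_vis[mask] = a, b
            return w_ir, w_vis

        def weighted_fusion_loss(fused, ir, vis, w_ir, w_vis):
            """Pixel-wise loss steering the fused image toward each source per the weight maps."""
            return (w_ir * (fused - ir) ** 2 + w_vis * (fused - vis) ** 2).mean()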

  • Siamese Visual Tracking with Dual-Pipeline Correlated Fusion Network

    Ying KANG  Cong LIU  Ning WANG  Dianxi SHI  Ning ZHOU  Mengmeng LI  Yunlong WU  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2021/07/09  Vol: E104-D No:10  Page(s): 1702-1711

    Siamese visual tracking, viewed as a max-similarity matching problem against the target template, has attracted increasing attention in computer vision. However, current Siamese trackers struggle to meet the twin demands of accuracy in real-time tracking and robustness in long-term tracking. This work proposes a new Siamese tracker with a dual-pipeline correlated fusion network (named ADF-SiamRPN), which consists of an initial template for robust correlation and a transient template, with adaptive optimal feature selection, for accurate correlation. Through a subsequent learnable correlation-response fusion network, we pursue an overall improvement in tracking performance. To compare ADF-SiamRPN with state-of-the-art trackers, we conduct extensive experiments on benchmarks such as OTB100, UAV123, VOT2016, VOT2018, GOT-10k, LaSOT and TrackingNet. The results demonstrate that ADF-SiamRPN outperforms all compared trackers and achieves the best balance between accuracy and robustness.
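
    For context, the core Siamese operation is a depth-wise cross-correlation of template features over search-region features; the sketch below shows that operation plus a simple fixed blend of the two templates' responses, whereas the paper replaces the fixed blend with a learnable fusion network.

        import torch.nn.functional as F

        def depthwise_xcorr(search, template):
            """Depth-wise cross-correlation of one template over one search region.

            search: (C, Hs, Ws), template: (C, Ht, Wt) -> response: (C, H', W')
            """
            c = search.size(0)
            out = F.conv2d(search.unsqueeze(0), template.unsqueeze(1), groups=c)
            return out.squeeze(0)

        def fuse_responses(resp_init, resp_transient, alpha=0.6):
            """Blend responses from the fixed initial and the adaptive transient templates."""
            return alpha * resp_init + (1 - alpha) * resp_transient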

  • Research on a Prediction Method for Carbon Dioxide Concentration Based on an Optimized LSTM Network of Spatio-Temporal Data Fusion

    Jun MENG  Gangyi DING  Laiyang LIU  

     
    LETTER-Data Engineering, Web Information Systems

    Publicized: 2021/07/08  Vol: E104-D No:10  Page(s): 1753-1757

    In view of the different spatial and temporal resolutions of observed multi-source heterogeneous carbon dioxide data and the uncertain quality of the observations, a data fusion prediction model for observed multi-scale carbon dioxide concentration data is studied. First, a wireless carbon sensor network is created, gross-error data are eliminated from the original dataset, and the remaining valid data are combined with the kriging method to generate a series of continuous surfaces that express specific features and provide unified, spatio-temporally normalized data for the subsequent prediction models. Then, a long short-term memory network processes these continuous, time- and space-normalized data to obtain a carbon dioxide concentration prediction model at arbitrary scales. Finally, the experimental results illustrate that the proposed method with spatio-temporal features is more accurate than single-sensor monitoring without spatio-temporal features.
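
    Ordinary kriging can be approximated with Gaussian-process regression; the sketch below interpolates synthetic sensor readings onto a grid to produce one continuous surface, standing in for the kriging step (the coordinates, kernel, and readings are all assumed, not taken from the paper).

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, WhiteKernel

        # synthetic sensor readings: (x, y) coordinates in metres and CO2 concentration in ppm
        coords = np.random.rand(30, 2) * 1000.0
        co2 = 410.0 + 5.0 * np.sin(coords[:, 0] / 200.0) + np.random.randn(30)

        # Gaussian-process regression with a noise term plays the role of kriging here
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=200.0) + WhiteKernel(), normalize_y=True)
        gp.fit(coords, co2)

        # interpolate onto a regular grid, giving one "continuous surface" per time step
        gx, gy = np.meshgrid(np.linspace(0, 1000, 50), np.linspace(0, 1000, 50))
        surface = gp.predict(np.column_stack([gx.ravel(), gy.ravel()])).reshape(50, 50)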

  • Cross-Domain Energy Consumption Prediction via ED-LSTM Networks

    Ye TAO  Fang KONG  Wenjun JU  Hui LI  Ruichun HOU  

     
    PAPER

    Publicized: 2021/05/11  Vol: E104-D No:8  Page(s): 1204-1213

    As an important type of science and technology service resource, energy consumption data play a vital role in the process of value chain integration between home appliance manufacturers and the state grid. Accurate electricity consumption prediction is essential for demand response programs in smart grid planning. The vast majority of existing prediction algorithms only exploit data belonging to a single domain, i.e., historical electricity load data. However, dependencies and correlations may exist among different domains, such as the regional weather condition and local residential/industrial energy consumption profiles. To take advantage of cross-domain resources, a hybrid energy consumption prediction framework is presented in this paper. This framework combines the long short-term memory model with an encoder-decoder unit (ED-LSTM) to perform sequence-to-sequence forecasting. Extensive experiments are conducted with several of the most commonly used algorithms over integrated cross-domain datasets. The results indicate that the proposed multistep forecasting framework outperforms most of the existing approaches.
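
    A minimal encoder-decoder LSTM (ED-LSTM-style) for multistep forecasting might look like the following; the layer sizes, forecast horizon, and the assumption that the first input feature is the load are illustrative choices, not the paper's settings.

        import torch
        import torch.nn as nn

        class EDLSTM(nn.Module):
            """Encoder-decoder LSTM for multistep (sequence-to-sequence) load forecasting."""
            def __init__(self, n_features, hidden=64, horizon=24):
                super().__init__()
                self.horizon = horizon
                self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
                self.decoder = nn.LSTM(1, hidden, batch_first=True)
                self.head = nn.Linear(hidden, 1)

            def forward(self, x):                        # x: (batch, past_steps, n_features)
                _, state = self.encoder(x)               # summarise history (incl. cross-domain inputs)
                step = x[:, -1:, :1]                     # assumes feature 0 is the observed load
                outputs = []
                for _ in range(self.horizon):            # autoregressive decoding
                    out, state = self.decoder(step, state)
                    step = self.head(out)                # (batch, 1, 1) predicted load
                    outputs.append(step)
                return torch.cat(outputs, dim=1).squeeze(-1)   # (batch, horizon)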

  • A Two-Stage Attention Based Modality Fusion Framework for Multi-Modal Speech Emotion Recognition

    Dongni HU  Chengxin CHEN  Pengyuan ZHANG  Junfeng LI  Yonghong YAN  Qingwei ZHAO  

     
    LETTER-Human-computer Interaction

    Publicized: 2021/04/30  Vol: E104-D No:8  Page(s): 1391-1394

    Recently, automated recognition and analysis of human emotion has attracted increasing attention from multidisciplinary communities. However, it is challenging to utilize emotional information from multiple modalities simultaneously. Previous studies have explored different fusion methods, but they mainly focus on either inter-modality or intra-modality interaction. In this letter, we propose a novel two-stage fusion strategy named modality attention flow (MAF) that models the intra- and inter-modality interactions simultaneously in a unified end-to-end framework. Experimental results show that the proposed approach outperforms the widely used late-fusion methods and achieves even better performance as the number of stacked MAF blocks increases.
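
    A schematic two-stage fusion block, with intra-modality self-attention followed by inter-modality cross-attention, is sketched below for two modalities; it illustrates the general idea rather than the authors' MAF block, and all dimensions and module choices are assumed.

        import torch.nn as nn

        class TwoStageFusion(nn.Module):
            """Stage 1: self-attention within each modality; stage 2: cross-attention between them."""
            def __init__(self, dim=128, heads=4):
                super().__init__()
                self.self_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
                self.self_text = nn.MultiheadAttention(dim, heads, batch_first=True)
                self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)

            def forward(self, audio, text):                       # (batch, seq, dim) each
                a, _ = self.self_audio(audio, audio, audio)       # intra-modality interaction
                t, _ = self.self_text(text, text, text)
                fused, _ = self.cross(a, t, t)                    # inter-modality interaction
                return fused.mean(dim=1)                          # utterance-level representation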

  • Efficient Data Diffusion and Elimination Control Method for Spatio-Temporal Data Retention System Open Access

    Shumpei YAMASAKI  Daiki NOBAYASHI  Kazuya TSUKAMOTO  Takeshi IKENAGA  Myung J. LEE  

     
    PAPER

    Publicized: 2021/01/08  Vol: E104-B No:7  Page(s): 805-816

    With the development and spread of Internet of Things technologies, various types of data for IoT applications can be generated anywhere and at any time. Among such data, there are data that depend heavily on generation time and location; we define these as spatio-temporal data (STD). In previous studies, we proposed an STD retention system using vehicular networks to achieve the “local production and consumption of STD” paradigm. The system can quickly provide STD to users within a specific location by retaining the STD within that area. However, it does not take into account that each type of STD has different retention requirements. In particular, the lifetime of STD and the time needed to diffuse it across the entire area directly influence retention performance. Therefore, we propose an efficient diffusion and elimination control method for retention based on the requirements of the STD. Simulation results demonstrate that the proposed method can satisfy the requirements of the STD while maintaining a high coverage rate in the area.
