The search functionality is under construction.

Keyword Search Result

[Keyword] mechanism(154hit)

1-20hit(154hit)

  • Long-Term Adaptive Bitrate Control Mechanism Open Access

    Pierre LEBRETON  Kazuhisa YAMAGISHI  

     
    PAPER-Multimedia Systems for Communications

      Vol:
    E107-B No:11
      Page(s):
    817-830

    Adaptive bitrate (ABR) video streaming is an important application on the Internet. To ensure that users enjoy high-quality services, ABR control mechanisms need to be designed that select chunks wisely on the basis of the available network throughput. To address the chunk selection problem, this paper describes an adaptive bitrate control mechanism that leverages long-term throughput information in the chunk selection process. While previous work has considered how quality should be requested on a per-chunk basis, the proposed method increases the timeframe of the analysis and allows higher quality of experience (QoE) to be reached. This is done by appropriately selecting a sequence of consecutive chunks’ quality values instead of a single chunk’s value. Simulation results are reported on a large variety of real-world network conditions and various throughput prediction algorithms and show the benefit of the proposed method over conventional ABR control mechanisms.
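
    The idea of choosing a sequence of consecutive chunk qualities, rather than one chunk at a time, can be illustrated with a small exhaustive planner. This is only a sketch and not the authors' algorithm: the QoE score (bitrate reward minus stall and switching penalties), the buffer model, and every name and default value below are assumptions made for illustration.

```python
from itertools import product


def plan_chunk_qualities(bitrates_kbps, predicted_kbps, buffer_s,
                         chunk_s=4.0, rebuffer_penalty=4.0,
                         switch_penalty=1.0, last_level=0):
    """Pick one quality level per upcoming chunk by scoring whole sequences.

    bitrates_kbps  : available encoding bitrates (hypothetical ladder)
    predicted_kbps : predicted throughput for each upcoming chunk
    buffer_s       : seconds of video currently buffered
    The score is an assumed, simplified QoE, not the metric of the paper.
    """
    best_score, best_plan = float("-inf"), None
    levels = range(len(bitrates_kbps))
    for plan in product(levels, repeat=len(predicted_kbps)):
        buf, score, prev = buffer_s, 0.0, last_level
        for level, throughput in zip(plan, predicted_kbps):
            download_s = bitrates_kbps[level] * chunk_s / throughput
            stall_s = max(0.0, download_s - buf)        # playback stalls if buffer runs dry
            buf = max(0.0, buf - download_s) + chunk_s  # drain while downloading, then add chunk
            switch = abs(bitrates_kbps[level] - bitrates_kbps[prev]) / 1000.0
            score += (bitrates_kbps[level] / 1000.0
                      - rebuffer_penalty * stall_s
                      - switch_penalty * switch)
            prev = level
        if score > best_score:
            best_score, best_plan = score, plan
    return list(best_plan)
```

    Exhaustive search grows exponentially with the planning horizon, so a sketch like this is only practical for a few chunks and a short bitrate ladder.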

  • Integrating Event Elements for Chinese-Vietnamese Cross-Lingual Event Retrieval Open Access

    Yuxin HUANG  Yuanlin YANG  Enchang ZHU  Yin LIANG  Yantuan XIAN  

     
    PAPER-Natural Language Processing

      Publicized:
    2024/06/04
      Vol:
    E107-D No:10
      Page(s):
    1353-1361

    Chinese-Vietnamese cross-lingual event retrieval aims to retrieve the Vietnamese sentence describing the same event as a given Chinese query sentence from a set of Vietnamese sentences. Existing mainstream cross-lingual event retrieval methods rely on extracting textual representations from query texts and calculating their similarity with textual representations in other language candidate sets. However, these methods ignore the difference in event elements present during Chinese-Vietnamese cross-language retrieval. Consequently, sentences with similar meanings but different event elements may be incorrectly considered to describe the same event. To address this problem, we propose a cross-lingual retrieval method that integrates event elements. We introduce event elements as an additional supervisory signal, where we calculate the semantic similarity of event elements in two sentences using an attention mechanism to determine the attention score of the event elements. This allows us to establish a one-to-one correspondence between event elements in the text. Additionally, we leverage the multilingual pre-trained language model fine-tuned based on contrastive learning to obtain cross-language sentence representation to calculate the semantic similarity of the sentence texts. By combining these two approaches, we obtain the final text similarity score. Experimental results demonstrate that our proposed method achieves higher retrieval accuracy than the baseline model.

  • Modulation Recognition of Communication Signals Based on Cascade Network Open Access

    Yanli HOU  Chunxiao LIU  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E107-B No:9
      Page(s):
    620-626

    To improve the recognition rate of the end-to-end modulation recognition method based on deep learning, a modulation recognition method for communication signals based on a cascade network is proposed. The cascade is composed of two networks: a Stacked Denoising Auto Encoder (SDAE) network and a DCELDNN (Dilated Convolution, ECA Mechanism, Long Short-Term Memory, Deep Neural Networks) network. The SDAE network is used to denoise the data, reconstruct the input data through encoding and decoding, and extract deep information from the data. The DCELDNN network is constructed based on the CLDNN (Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks) network. In the DCELDNN network, dilated convolution is used instead of normal convolution to enlarge the receptive field and extract signal features, the Efficient Channel Attention (ECA) mechanism is introduced to enhance the expression ability of the features, and the feature vector information is integrated by a Global Average Pooling (GAP) layer, so that signal features are extracted efficiently. Finally, end-to-end classification recognition of communication signals is realized. Test results on the RadioML2018.01a dataset show that the average recognition accuracy of the proposed method reaches 63.1% at SNRs of -10 to 15 dB. Compared with the CNN, LSTM, and CLDNN models, the recognition accuracy is improved by 25.8%, 12.3%, and 4.8%, respectively, at 10 dB SNR.
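
    How dilated convolution enlarges the receptive field without adding kernel weights can be seen in a one-dimensional sketch. The function names and the example dilation rates are assumptions for illustration; the abstract does not give the layer configuration of the DCELDNN network.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (no padding) 1-D convolution whose taps are `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
        for i in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Input samples seen by one output of stacked stride-1 dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

    With a kernel of size 3, three ordinary layers see 7 input samples, while three layers with dilations 1, 2, and 4 see 15, using the same number of weights.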

  • An Optimized CNN-Attention Network for Clipped OFDM Receiver of Underwater Acoustic Communications Open Access

    Feng LIU  Qian XI  Yanli XU  

     
    LETTER-Communication Theory and Signals

      Publicized:
    2023/12/01
      Vol:
    E107-A No:8
      Page(s):
    1408-1412

    In underwater acoustic communication systems based on orthogonal frequency division multiplexing (OFDM), taking clipping to reduce the peak-to-average power ratio leads to nonlinear distortion of the signal, making the receiver unable to recover the faded signal accurately. In this letter, an Aquila optimizer-based convolutional attention block stacked network (AO-CABNet) is proposed to replace the receiver to improve the ability to recover the original signal. Simulation results show that the AO method has better optimization capability to quickly obtain the optimal parameters of the network model, and the proposed AO-CABNet structure outperforms existing schemes.

  • Power Peak Load Forecasting Based on Deep Time Series Analysis Method Open Access

    Ying-Chang HUNG  Duen-Ren LIU  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2024/03/21
      Vol:
    E107-D No:7
      Page(s):
    845-856

    The prediction of peak power load is a critical factor directly impacting the stability of power supply, characterized significantly by its time series nature and intricate ties to the seasonal patterns in electricity usage. Despite its crucial importance, the current landscape of power peak load forecasting remains a multifaceted challenge in the field. This study aims to contribute to this domain by proposing a method that leverages a combination of three primary models - the GRU model, self-attention mechanism, and Transformer mechanism - to forecast peak power load. To contextualize this research within the ongoing discourse, it’s essential to consider the evolving methodologies and advancements in power peak load forecasting. By delving into additional references addressing the complexities and current state of the power peak load forecasting problem, this study aims to build upon the existing knowledge base and offer insights into contemporary challenges and strategies adopted within the field. Data preprocessing in this study involves comprehensive cleaning, standardization, and the design of relevant functions to ensure robustness in the predictive modeling process. Additionally, recognizing the necessity to capture temporal changes effectively, this research incorporates features such as “Weekly Moving Average” and “Monthly Moving Average” into the dataset. To evaluate the proposed methodologies comprehensively, this study conducts comparative analyses with established models such as LSTM, Self-attention network, Transformer, ARIMA, and SVR. The outcomes reveal that the models proposed in this study exhibit superior predictive performance compared to these established models, showcasing their effectiveness in accurately forecasting electricity consumption. The significance of this research lies in two primary contributions. 
Firstly, it introduces an innovative prediction method combining the GRU model, self-attention mechanism, and Transformer mechanism, aligning with the contemporary evolution of predictive modeling techniques in the field. Secondly, it introduces and emphasizes the utility of “Weekly Moving Average” and “Monthly Moving Average” methodologies, crucial in effectively capturing and interpreting seasonal variations within the dataset. By incorporating these features, this study enhances the model’s ability to account for seasonal influencing factors, thereby significantly improving the accuracy of peak power load forecasting. This contribution aligns with the ongoing efforts to refine forecasting methodologies and addresses the pertinent challenges within power peak load forecasting.
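
    The “Weekly Moving Average” and “Monthly Moving Average” features can be sketched as trailing averages over a daily series. The abstract does not define the windows, so the 7-day and 30-day lengths, the trailing (past-only) alignment, and all names below are assumptions.

```python
def trailing_moving_average(values, window):
    """Average of the current value and up to `window - 1` preceding values."""
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        averages.append(running / min(i + 1, window))
    return averages


def add_moving_average_features(daily_peak_load):
    """Attach assumed 7-day and 30-day trailing averages to each daily value."""
    weekly = trailing_moving_average(daily_peak_load, 7)
    monthly = trailing_moving_average(daily_peak_load, 30)
    return [
        {"load": load, "weekly_ma": w, "monthly_ma": m}
        for load, w, m in zip(daily_peak_load, weekly, monthly)
    ]
```

    Using only past values keeps the features free of information from the day being forecast.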

  • Data-Quality Aware Incentive Mechanism Based on Stackelberg Game in Mobile Edge Computing Open Access

    Shuyun LUO  Wushuang WANG  Yifei LI  Jian HOU  Lu ZHANG  

     
    PAPER-Mobile Information Network and Personal Communications

      Publicized:
    2023/09/14
      Vol:
    E107-A No:6
      Page(s):
    873-880

    Crowdsourcing has become a popular data-collection method that relieves the high cost and latency of data gathering. Since the users involved in crowdsourcing are volunteers, incentives are needed to encourage them to provide data. However, current incentive mechanisms mostly pay attention to data quantity while ignoring data quality. In this paper, we design a Data-quality awaRe IncentiVe mEchanism (DRIVE) for collaborative tasks based on the Stackelberg game to motivate users to provide high-quality data; its highlight is a dynamic reward allocation scheme based on the proposed data quality evaluation method. To guarantee that data quality evaluation responds in real time, we introduce the mobile edge computing framework. Finally, a case study is given, and its real-data experiments demonstrate the superior performance of DRIVE.

  • FA-YOLO: A High-Precision and Efficient Method for Fabric Defect Detection in Textile Industry Open Access

    Kai YU  Wentao LYU  Xuyi YU  Qing GUO  Weiqiang XU  Lu ZHANG  

     
    PAPER-Neural Networks and Bioengineering

      Publicized:
    2023/09/04
      Vol:
    E107-A No:6
      Page(s):
    890-898

    The automatic defect detection for fabric images is an essential mission in textile industry. However, there are some inherent difficulties in the detection of fabric images, such as complexity of the background and the highly uneven scales of defects. Moreover, the trade-off between accuracy and speed should be considered in real applications. To address these problems, we propose a novel model based on YOLOv4 to detect defects in fabric images, called Feature Augmentation YOLO (FA-YOLO). In terms of network structure, FA-YOLO adds an additional detection head to improve the detection ability of small defects and builds a powerful Neck structure to enhance feature fusion. First, to reduce information loss during feature fusion, we perform the residual feature augmentation (RFA) on the features after dimensionality reduction by using 1×1 convolution. Afterward, the attention module (SimAM) is embedded into the locations with rich features to improve the adaptation ability to complex backgrounds. Adaptive spatial feature fusion (ASFF) is also applied to output of the Neck to filter inconsistencies across layers. Finally, the cross-stage partial (CSP) structure is introduced for optimization. Experimental results based on three real industrial datasets, including Tianchi fabric dataset (72.5% mAP), ZJU-Leaper fabric dataset (0.714 of average F1-score) and NEU-DET steel dataset (77.2% mAP), demonstrate the proposed FA-YOLO achieves competitive results compared to other state-of-the-art (SoTA) methods.

  • Analysis of Blood Cell Image Recognition Methods Based on Improved CNN and Vision Transformer Open Access

    Pingping WANG  Xinyi ZHANG  Yuyan ZHAO  Yueti LI  Kaisheng XU  Shuaiyin ZHAO  

     
    PAPER-Neural Networks and Bioengineering

      Publicized:
    2023/09/15
      Vol:
    E107-A No:6
      Page(s):
    899-908

    Leukemia is a common and highly dangerous blood disease that requires early detection and treatment. Currently, the diagnosis of leukemia types mainly relies on the pathologist’s morphological examination of blood cell images, which is a tedious and time-consuming process, and the diagnosis results are highly subjective and prone to misdiagnosis and missed diagnosis. To address these problems, this research proposes a blood cell image recognition technique based on an enhanced Vision Transformer. Firstly, this paper incorporates convolutions with token embedding to replace the positional encoding, which represents coarse spatial information. Then, based on the Transformer’s self-attention mechanism, this paper proposes a sparse attention module that can select identifying regions in the image, further enhancing the model’s fine-grained feature expression capability. Finally, this paper uses a contrastive loss function to further increase the intra-class consistency and inter-class difference of classification features. According to experimental results, the model in this study has an identification accuracy of 92.49% on the Munich single-cell morphological dataset, which is an improvement of 1.41% over the baseline. Compared with the state-of-the-art Swin Transformer, this method still achieves better performance. Our method therefore has the potential to provide a reference for clinical diagnosis by physicians.

  • Real-Time Video Matting Based on RVM and Mobile ViT Open Access

    Chengyu WU  Jiangshan QIN  Xiangyang LI  Ao ZHAN  Zhengqiang WANG  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2024/01/29
      Vol:
    E107-D No:6
      Page(s):
    792-796

    Real-time matting is a challenging research topic in deep learning. Conventional CNN (Convolutional Neural Network) approaches tend to misjudge foreground and background semantics and produce blurry matting edges, because the limited receptive field of a CNN restricts its attention to global context. We propose a real-time matting approach called RMViT (Real-time matting with Vision Transformer), which uses a Transformer structure, attention, and content-aware guidance to solve these issues. Semantic accuracy improves considerably due to the establishment of global context and long-range pixel information. Experiments show that our approach achieves a reduction of more than 30% in error metrics compared with existing real-time matting approaches.

  • Development of a Coanda-Drone with Built-in Propellers

    Zejing ZHAO  Bin ZHANG  Hun-ok LIM  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2023/11/10
      Vol:
    E107-D No:2
      Page(s):
    180-190

    In this study, a Coanda-drone with a length, width, and height of 121.6, 121.6, and 191 mm and a total mass of 1166.7 g was designed. Using four propulsion devices, it can produce a maximum thrust of 5428 g. Its structure is very different from that of conventional drones because it combines the jet engine design of a jet fixed-wing drone with the fuselage layout of a rotary-wing drone. The jet drone's advantage of high propulsion is kept, so that it can output greater thrust under the same variation of the PWM waveform output. The propulsion device performs high-speed jetting, and the airflow around the propulsion device is also jetted downward along the direction of the airflow.

  • CASEformer — A Transformer-Based Projection Photometric Compensation Network

    Yuqiang ZHANG  Huamin YANG  Cheng HAN  Chao ZHANG  Chaoran ZHU  

     
    PAPER

      Publicized:
    2023/09/29
      Vol:
    E107-D No:1
      Page(s):
    13-28

    In this paper, we present a novel photometric compensation network named CASEformer, which is built upon the Swin module. For the first time, we combine coordinate attention and channel attention mechanisms to extract rich features from input images. Employing a multi-level encoder-decoder architecture with skip connections, we establish multiscale interactions between projection surfaces and projection images, achieving precise inference and compensation. Furthermore, through an attention fusion module, which simultaneously leverages both coordinate and channel information, we enhance the global context of feature maps while preserving enhanced texture coordinate details. The experimental results demonstrate the superior compensation effectiveness of our approach compared to the current state-of-the-art methods. Additionally, we propose a method for multi-surface projection compensation, further enriching our contributions.

  • A Driver Fatigue Detection Algorithm Based on Dynamic Tracking of Small Facial Targets Using YOLOv7

    Shugang LIU  Yujie WANG  Qiangguo YU  Jie ZHAN  Hongli LIU  Jiangtao LIU  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2023/08/21
      Vol:
    E106-D No:11
      Page(s):
    1881-1890

    Driver fatigue detection has become crucial in vehicle safety technology. Achieving high accuracy and real-time performance in detecting driver fatigue is paramount. In this paper, we propose a novel driver fatigue detection algorithm based on dynamic tracking of Facial Eyes and Yawning using YOLOv7, named FEY-YOLOv7. The Coordinate Attention module is inserted into YOLOv7 to enhance its dynamic tracking accuracy by focusing on coordinate information. Additionally, a small target detection head is incorporated into the network architecture to improve feature extraction for small facial targets such as the eyes and mouth. In terms of computation, the YOLOv7 network architecture is significantly simplified to achieve high detection speed. Using the proposed PERYAWN algorithm, driver status is labeled and detected in four classes: open_eye, closed_eye, open_mouth, and closed_mouth. Furthermore, the Guided Image Filtering algorithm is employed to enhance image details. The proposed FEY-YOLOv7 is trained and validated on RGB-infrared datasets. The results show that FEY-YOLOv7 achieves an mAP of 0.983 and an FPS of 101. This indicates that FEY-YOLOv7 is superior to state-of-the-art methods in accuracy and speed, providing an effective and practical solution for image-based driver fatigue detection.

  • FOM-CDS PUF: A Novel Configurable Dual State Strong PUF Based on Feedback Obfuscation Mechanism against Modeling Attacks

    Hong LI  Wenjun CAO  Chen WANG  Xinrui ZHU  Guisheng LIAO  Zhangqing HE  

     
    PAPER-Cryptography and Information Security

      Publicized:
    2023/03/29
      Vol:
    E106-A No:10
      Page(s):
    1311-1321

    The configurable ring oscillator physical unclonable function (CRO PUF) is a newly proposed strong PUF based on the classic RO PUF, which can generate an exponential number of Challenge-Response Pairs (CRPs) and has good uniqueness and reliability. However, existing proposals have low hardware utilization and are vulnerable to modeling attacks. In this paper, we propose a novel Configurable Dual State (CDS) PUF with lower overhead and higher resistance to modeling attacks. This structure can be flexibly transformed into an RO PUF or a TERO PUF in the same topology according to the parity of the Hamming Weight (HW) of the challenge, which achieves 100% utilization of the inverters and improves the efficiency of hardware utilization. A feedback obfuscation mechanism (FOM) is also proposed, which uses the stable count value of the ring oscillator in the PUF as the updated mask to confuse and hide the original challenge, significantly improving resistance to modeling attacks. The proposed FOM-CDS PUF is analyzed by building a mathematical model and finally implemented on a Xilinx Artix-7 FPGA. The test results show that the FOM-CDS PUF can effectively resist several popular modeling attack methods, with prediction accuracy below 60%. They also show that the FOM-CDS PUF performs well, with uniformity, Bit Error Rate at different temperatures, Bit Error Rate at different voltages, and uniqueness of 53.68%, 7.91%, 5.64%, and 50.33%, respectively.
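
    Selecting the operating state from the parity of the challenge's Hamming weight can be sketched in a few lines. The abstract does not say which parity maps to which state, so the odd-to-RO and even-to-TERO assignment below is an assumption, as are the function names; the feedback obfuscation step is not modeled.

```python
def hamming_weight(challenge: int) -> int:
    """Number of 1 bits in a non-negative integer challenge."""
    return bin(challenge).count("1")


def select_mode(challenge: int, odd_mode: str = "RO", even_mode: str = "TERO") -> str:
    """Choose the PUF state from the parity of the Hamming weight.

    The parity-to-mode mapping is hypothetical; swap the defaults to invert it.
    """
    return odd_mode if hamming_weight(challenge) % 2 else even_mode
```

    Because every challenge has either odd or even weight, each one maps to exactly one of the two states, which matches the abstract's point that the same inverters serve both configurations.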

  • iLEDGER: A Lightweight Blockchain Framework with New Consensus Method for IoT Applications

    Veeramani KARTHIKA  Suresh JAGANATHAN  

     
    PAPER-Cryptography and Information Security

      Publicized:
    2023/03/06
      Vol:
    E106-A No:9
      Page(s):
    1251-1262

    Considering the growth of the IoT network, there is a demand for a decentralized solution. Incorporating blockchain technology will eliminate the challenges faced in centralized solutions, such as i) high infrastructure, ii) maintenance cost, iii) lack of transparency, iv) privacy, and v) data tampering. A blockchain-based IoT network allows businesses to access and share IoT data within their organization without a central authority. Data in the blockchain are stored as blocks, which must be validated and added to the chain; the consensus mechanism plays a significant role in this. However, existing methods are not designed for IoT applications and lack features like i) decentralization, ii) scalability, iii) throughput, iv) faster convergence, and v) network overhead. Moreover, current blockchain frameworks fail to support resource-constrained IoT applications. In this paper, we propose a new consensus method (WoG) and a lightweight blockchain framework (iLEDGER), mainly for resource-constrained IoT applications in a permissioned environment. The proposed work is tested in an application that tracks assets using IoT devices (Raspberry Pi 4 and RFID). Furthermore, the proposed consensus method is analyzed against benign failures, and performance parameters such as CPU usage, memory usage, throughput, transaction execution time, and block generation time are compared with state-of-the-art methods.

  • A Lightweight and Efficient Infrared Pedestrian Semantic Segmentation Method

    Shangdong LIU  Chaojun MEI  Shuai YOU  Xiaoliang YAO  Fei WU  Yimu JI  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2023/06/13
      Vol:
    E106-D No:9
      Page(s):
    1564-1571

    The thermal imaging pedestrian segmentation system has excellent performance in different illumination conditions, but it has some drawbacks (e.g., weak pedestrian texture information, blurred object boundaries). Meanwhile, high-performance large models have higher latency on edge devices with limited computing performance. To solve these problems, we propose a real-time thermal infrared pedestrian segmentation method. The feature extraction layers of our method consist of two paths. On the spatial path, we use lossless spatial downsampling to obtain boundary texture details. On the context path, we use atrous convolutions to enlarge the receptive field and obtain more contextual semantic information. Then, a parameter-free attention mechanism is introduced at the end of each path for effective feature selection. The Feature Fusion Module (FFM) is added to fuse the semantic information of the two paths after selection. Finally, we accelerate inference through multi-threading techniques on the edge computing device. In addition, we create a high-quality infrared pedestrian segmentation dataset to facilitate research. Comparative experiments with other methods on the self-built dataset and two public datasets demonstrate the effectiveness of our method. Our code is available at https://github.com/mcjcs001/LEIPNet.

  • An Integrated Convolutional Neural Network with a Fusion Attention Mechanism for Acoustic Scene Classification

    Pengxu JIANG  Yue XIE  Cairong ZOU  Li ZHAO  Qingyun WANG  

     
    LETTER-Engineering Acoustics

      Publicized:
    2023/02/06
      Vol:
    E106-A No:8
      Page(s):
    1057-1061

    In human-computer interaction, acoustic scene classification (ASC) is one of the relevant research domains. In real life, the recorded audio may include a lot of noise and quiet clips, making it hard for earlier ASC-based research to isolate the crucial scene information in sound. Furthermore, scene information may be scattered across numerous audio frames; hence, selecting scene-related frames is crucial for ASC. In this context, an integrated convolutional neural network with a fusion attention mechanism (ICNN-FA) is proposed for ASC. Firstly, segmented mel-spectrograms as the input of ICNN can assist the model in learning the short-term time-frequency correlation information. Then, the designed ICNN model is employed to learn these segment-level features. In addition, the proposed global attention layer may gather global information by integrating these segment features. Finally, the developed fusion attention layer is utilized to fuse all segment-level features while the classifier classifies various situations. Experimental findings using ASC datasets from DCASE 2018 and 2019 indicate the efficacy of the suggested method.

  • Time-Series Prediction Based on Double Pyramid Bidirectional Feature Fusion Mechanism

    Na WANG  Xianglian ZHAO  

     
    PAPER-Digital Signal Processing

      Publicized:
    2022/12/20
      Vol:
    E106-A No:6
      Page(s):
    886-895

    Time-series prediction is widely applied and is an important problem across many fields, such as stock prediction, sales prediction, and loan prediction, where it is of great value in production and daily life. It requires that the model effectively capture the long-term feature dependence between the output and input. Recent studies show that the Transformer can improve time-series prediction ability. However, the Transformer has some problems that prevent it from being directly applied to time-series prediction: (1) Local agnosticism: self-attention in the Transformer is not sensitive to short-term feature dependence, which leads to model anomalies in time-series; (2) Memory bottleneck: the space complexity of the regular Transformer grows quadratically with the sequence length, making direct modeling of long time-series infeasible. To solve these problems, this paper designs an efficient model for long time-series prediction: a double pyramid bidirectional feature fusion mechanism network with a parallel Temporal Convolution Network (TCN) and FastFormer. This network structure combines the fine-grained time-series information captured by the TCN with the global interactive information captured by FastFormer, and it handles the time-series prediction problem well.
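
    The memory bottleneck follows from standard self-attention storing one score for every pair of positions. A small calculation makes the growth concrete; the function name, the single-head default, and the 4-byte value size are assumptions for illustration.

```python
def attention_matrix_bytes(seq_len, num_heads=1, bytes_per_value=4):
    """Memory for the full seq_len x seq_len attention score matrix of one layer."""
    return num_heads * seq_len * seq_len * bytes_per_value


def growth_factor(short_len, long_len):
    """How many times more score-matrix memory the longer sequence needs."""
    return attention_matrix_bytes(long_len) / attention_matrix_bytes(short_len)
```

    Doubling the sequence length quadruples this memory, and a tenfold longer sequence needs a hundred times as much.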

  • GazeFollowTR: A Method of Gaze Following with Reborn Mechanism

    Jingzhao DAI  Ming LI  Xuejiao HU  Yang LI  Sidan DU  

     
    PAPER-Vision

      Publicized:
    2022/11/30
      Vol:
    E106-A No:6
      Page(s):
    938-946

    Gaze following is the task of estimating where an observer is looking inside a scene. Both the observer and scene information must be learned to determine the gaze directions and gaze points. Many existing works have focused only on scenes or observers, whereas published frameworks for gaze following are limited. In this paper, a gaze following method using a hybrid transformer is proposed. Building on the conventional method (GazeFollow), we make three developments. First, a hybrid transformer is applied for learning head images and gaze positions. Second, the pinball loss function is utilized to control the gaze point error. Finally, a novel ReLU layer with the reborn mechanism (reborn ReLU) is introduced to replace traditional ReLU layers in different network stages. To test the performance of our developments, we train our developed framework with the DL Gaze dataset and evaluate the model on our collected set. Our experimental results show that our framework outperforms the reference methods.
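
    The pinball loss mentioned above is the standard quantile loss, which penalizes under- and over-estimates asymmetrically. The sketch below shows its usual scalar form; how the paper applies it to gaze point coordinates and which quantile it uses are not stated in the abstract, so `tau` and the function names are assumptions.

```python
def pinball_loss(y_true, y_pred, tau=0.5):
    """Standard pinball (quantile) loss for one prediction, with 0 < tau < 1."""
    diff = y_true - y_pred
    # Under-estimates (diff > 0) cost tau per unit, over-estimates cost 1 - tau.
    return max(tau * diff, (tau - 1.0) * diff)


def mean_pinball_loss(targets, predictions, tau=0.5):
    """Average pinball loss over paired targets and predictions."""
    losses = [pinball_loss(t, p, tau) for t, p in zip(targets, predictions)]
    return sum(losses) / len(losses)
```

    With `tau=0.5` the loss is half the absolute error; moving `tau` toward 1 makes under-estimates more expensive than over-estimates.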

  • Photochemical Stability of Organic Electro-Optic Polymer at 1310-nm Wavelength Open Access

    Yukihiro TOMINARI  Toshiki YAMADA  Takahiro KAJI  Akira OTOMO  

     
    BRIEF PAPER

      Publicized:
    2022/11/10
      Vol:
    E106-C No:6
      Page(s):
    228-231

    We investigated the photochemical stability of an electro-optic (EO) polymer under laser irradiation at 1310 nm to reveal photodegradation mechanisms. It was found that one-photon absorption excitation assisted by thermal energy at the temperature is involved in the photodegradation process, in contrast to our previous studies at a wavelength of 1550 nm, where two-photon absorption excitation is involved in the photodegradation process. Thus, both the excitation wavelength and the thermal energy strongly affect the degradation mechanism. In either case, the photodegradation of EO polymers is mainly related to the generation of excited singlet oxygen.

  • A Visual Question Answering Network Merging High- and Low-Level Semantic Information

    Huimin LI  Dezhi HAN  Chongqing CHEN  Chin-Chen CHANG  Kuan-Ching LI  Dun LI  

     
    PAPER-Core Methods

      Publicized:
    2022/01/06
      Vol:
    E106-D No:5
      Page(s):
    581-589

    Visual Question Answering (VQA) usually uses deep attention mechanisms to learn fine-grained visual content of images and textual content of questions. However, the deep attention mechanism can only learn high-level semantic information while ignoring the impact of low-level semantic information on answer prediction. To address this, we design a High- and Low-Level Semantic Information Network (HLSIN), which employs two strategies to fuse high-level and low-level semantic information. The first strategy is adaptive weight learning, which allows different levels of semantic information to learn weights separately. The second is the gate-sum mechanism, which suppresses invalid information in the various levels of information and fuses valid information. On the benchmark VQA-v2 dataset, we quantitatively and qualitatively evaluate HLSIN and conduct extensive ablation studies to explore the reasons behind HLSIN's effectiveness. Experimental results demonstrate that HLSIN significantly outperforms the previous state-of-the-art, with an overall accuracy of 70.93% on test-dev.
