
Keyword Search Result

[Keyword] mechanism (154 hits)

Hits 21-40 (of 154)

  • Effectively Utilizing the Category Labels for Image Captioning

    Junlong FENG  Jianping ZHAO  

     
    PAPER-Core Methods

      Publicized:
    2021/12/13
      Vol:
    E106-D No:5
      Page(s):
    617-624

    As a further investigation of the image captioning task, some works have extended vision-text datasets for specific subtasks, such as stylized caption generation. The corpora in such datasets usually consist of obviously sentiment-bearing words. In some special cases, however, the captions are classified according to the image category. This leads to a latent problem: the generated sentences are close in semantic meaning but belong to different or even opposite categories. It is therefore worth exploring an effective way to use the image category label to increase the difference between captions. To this end, this paper proposes an image captioning network with a label control mechanism (LCNET). First, to further increase the caption difference, LCNET employs a semantic enhancement module that provides the decoder with global semantic vectors. Then, through the proposed label control LSTM, LCNET dynamically modulates caption generation according to the image category labels. Finally, the decoder integrates the spatial image features with the global semantic vectors to output the caption. Results on all the standard evaluation metrics show that our model outperforms the compared models. Caption analysis demonstrates that our approach improves the performance of semantic representation. Compared with other label control mechanisms, our model increases the caption difference according to the labels while also remaining more consistent with the image content.

  • Learning Pixel Perception for Identity and Illumination Consistency Face Frontalization in the Wild

    Yongtang BAO  Pengfei ZHOU  Yue QI  Zhihui WANG  Qing FAN  

     
    PAPER-Person Image Generation

      Publicized:
    2022/06/21
      Vol:
    E106-D No:5
      Page(s):
    794-803

    Synthesizing a realistic frontal face image from a single profile face image has a wide range of applications in face recognition. Although deep-learning-based face frontalization methods have made substantial progress in recent years, there is still no guarantee that the generated face is consistent in identity and illumination under large poses. This paper proposes a novel pixel-based feature regression generative adversarial network (PFR-GAN), which learns to recover local high-frequency details and to generate frontal face images that preserve identity and illumination in uncontrolled environments. We first propose a Reslu block to obtain a richer feature representation and speed up training convergence. We then introduce a feature conversion module to reduce the artifacts caused by face rotation discrepancy, enhance image generation quality, and preserve more high-frequency details of the profile image. We also construct a 30,000-sample face pose dataset covering various uncontrolled, in-the-wild environments. The dataset includes people of different ages and races against in-the-wild backgrounds, which allows the model to handle other datasets and obtain better results. Finally, we introduce a discriminator for recovering the facial structure of the frontal face images. Quantitative and qualitative experimental results show that PFR-GAN generates high-quality, high-fidelity frontal face images and outperforms state-of-the-art methods.

  • Chinese Named Entity Recognition Method Based on Dictionary Semantic Knowledge Enhancement

    Tianbin WANG  Ruiyang HUANG  Nan HU  Huansha WANG  Guanghan CHU  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2023/02/15
      Vol:
    E106-D No:5
      Page(s):
    1010-1017

    Chinese named entity recognition (NER) is a fundamental technology in Chinese natural language processing. It is widely used in information extraction, intelligent question answering, and knowledge graphs. However, due to the diversity and complexity of Chinese, most Chinese NER methods fail to sufficiently capture character-granularity semantics, which limits their performance. In this work, we propose DSKE-Chinese NER: Chinese Named Entity Recognition based on Dictionary Semantic Knowledge Enhancement. We integrate character-granularity semantic information into the character vector space and obtain vector representations containing semantic information through an attention mechanism. In addition, we determine the appropriate number of semantic layers through comparative experiments. Experiments on public Chinese datasets such as Weibo, Resume, and MSRA show that the model outperforms character-based LSTM baselines.

  • Selective Learning of Human Pose Estimation Based on Multi-Scale Convergence Network

    Wenkai LIU  Cuizhu QIN  Menglong WU  Wenle BAI  Hongxia DONG  

     
    LETTER-Human-computer Interaction

      Publicized:
    2023/02/15
      Vol:
    E106-D No:5
      Page(s):
    1081-1084

    Pose estimation is an active research topic in computer vision and key to computer perception of human activities. The core idea of human pose estimation is to describe the motion of the human body through its major joint points. Large receptive fields and rich spatial information facilitate keypoint localization, and capturing features at larger scales and reintegrating them into the feature space remains a challenge for pose estimation. To address this problem, we propose a multi-scale convergence network (MSCNet) with a large receptive field and rich spatial information. MSCNet is based on an hourglass network that captures information at different scales to form a consistent understanding of the whole body. The multi-scale receptive field (MSRF) units provide a large receptive field to obtain rich contextual information, which is then selectively enhanced or suppressed by the Squeeze-and-Excitation (SE) attention mechanism to perform pose estimation flexibly. Experimental results show that MSCNet scores 73.1% AP on the COCO dataset, an 8.8% improvement over the mainstream CMUPose method. Compared with the advanced CPN, MSCNet requires 68.2% of the computational complexity and only 55.4% of the parameters.
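    The Squeeze-and-Excitation step mentioned above can be sketched in plain Python. This is a generic SE block (global average pooling, two small fully connected layers, a sigmoid gate), not MSCNet's implementation. The weight matrices `w1` and `w2` are hypothetical stand-ins for learned parameters, and biases are omitted.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def relu(x: float) -> float:
    return max(0.0, x)


def se_block(feature_map, w1, w2):
    """Squeeze-and-Excitation on a C x H x W feature map stored as nested lists.

    w1: (C // r) x C weights of the reduction layer (hypothetical, normally learned).
    w2: C x (C // r) weights of the expansion layer (hypothetical, normally learned).
    Returns the recalibrated feature map and the per-channel gates.
    """
    # Squeeze: global average pooling gives one descriptor value per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields one gate in (0, 1) per channel.
    hidden = [relu(sum(w * zc for w, zc in zip(row, z))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every position of a channel by that channel's gate.
    recalibrated = [[[v * g for v in row] for row in ch] for ch, g in zip(feature_map, gates)]
    return recalibrated, gates
```

    Channels whose gate is close to 1 keep their activations, while channels gated toward 0 are damped; this is the selective enhancement or suppression of contextual information that the abstract attributes to the SE mechanism.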

  • DFAM-DETR: Deformable Feature Based Attention Mechanism DETR on Slender Object Detection

    Feng WEN  Mei WANG  Xiaojie HU  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2022/12/09
      Vol:
    E106-D No:3
      Page(s):
    401-409

    Object detection is one of the most important areas of computer vision, and the use of CNNs for object detection has yielded substantial results in a variety of fields. However, the fixed sampling grid of standard convolution layers restricts receptive fields to fixed locations and limits the ability of CNNs to model geometric transformations. As a result, CNNs perform poorly on slender object detection. To achieve better accuracy and efficiency in slender object detection, the proposed detector, DFAM-DETR, not only adjusts its sampling points adaptively but also, through an attention mechanism, strengthens its focus on slender object features and extracts essential information from the global to the local level of the image. This study uses images of slender objects from the MS-COCO dataset. Experimental results show that DFAM-DETR achieves excellent detection performance on slender objects compared with CNN-based and transformer-based detectors.
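    Adaptive sampling points are commonly realized with deformable convolution: each kernel tap reads the input at its regular grid position plus an offset, using bilinear interpolation at fractional positions. The sketch below assumes that formulation for one output location of a 3x3 kernel. It is not DFAM-DETR's code, and the `offsets` argument is a hypothetical stand-in for the values a separate layer would predict.

```python
import math


def bilinear(img, y, x):
    """Sample a 2-D grid (list of rows) at fractional (y, x); positions outside read as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Weight shrinks linearly with distance to each of the 4 neighbours.
                total += (1 - abs(y - yy)) * (1 - abs(x - xx)) * img[yy][xx]
    return total


def deformable_point(img, p0, kernel, offsets):
    """One output value of a 3x3 deformable convolution at integer position p0.

    kernel:  9 weights, one per regular grid tap p_k (row-major order).
    offsets: 9 (dy, dx) pairs added to the taps (hypothetical, normally predicted).
    """
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = 0.0
    for w_k, (gy, gx), (oy, ox) in zip(kernel, grid, offsets):
        out += w_k * bilinear(img, p0[0] + gy + oy, p0[1] + gx + ox)
    return out
```

    With all offsets at zero this reduces to an ordinary 3x3 convolution tap sum. Non-zero offsets let the taps slide along an elongated object instead of staying on the square grid.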

  • Face Hallucination via Multi-Scale Structure Prior Learning

    Yuexi YAO  Tao LU  Kanghui ZHAO  Yanduo ZHANG  Yu WANG  

     
    LETTER-Image

      Publicized:
    2022/07/19
      Vol:
    E106-A No:1
      Page(s):
    92-96

    Recently, deep-learning-based face hallucination methods have learned the mapping between low-resolution (LR) and high-resolution (HR) facial patterns by exploiting facial structure priors. However, maintaining face structure consistency after reconstructing face images at different scales remains a challenging problem. In this letter, we propose a novel multi-scale structure prior learning (MSPL) method for face hallucination. First, we propose a multi-scale structure prior block (MSPB). Considering the loss of high-frequency information in the LR space, we process the input image mainly in three ascending-dimensional spaces of different scales, mapping the image to a high-dimensional space to extract multi-scale structural prior information. The feature maps are then restored to their original size by downsampling, and finally the multi-scale information is fused to restore the feature channels. On this basis, we propose a local detail attention module (LDAM) to focus on the local texture information of faces. We conduct extensive face hallucination experiments on a public face dataset (LFW) to verify the effectiveness of our method.

  • A Survey on Explainable Fake News Detection

    Ken MISHIMA  Hayato YAMANA  

     
    SURVEY PAPER-Data Engineering, Web Information Systems

      Publicized:
    2022/04/22
      Vol:
    E105-D No:7
      Page(s):
    1249-1257

    The increasing amount of fake news is a growing problem that will progressively worsen in our interconnected world. Machine learning, particularly deep learning, is being used to detect misinformation; however, the models employed are essentially black boxes, and thus are uninterpretable. This paper presents an overview of explainable fake news detection models. Specifically, we first review the existing models, datasets, evaluation techniques, and visualization processes. Subsequently, possible improvements in this field are identified and discussed.

  • Cluster Expansion Method for Critical Node Problem Based on Contraction Mechanism in Sparse Graphs

    Zheng WANG  Yi DI  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2022/02/24
      Vol:
    E105-D No:6
      Page(s):
    1135-1149

    The objective of the critical node problem is to minimize the pairwise connectivity of the residual graph after removing a specified number of nodes. From a mathematical modeling perspective, the more fragmented components there are and the more evenly sized the disconnected subgraphs, the better the solution quality. Based on this observation, we propose a new Cluster Expansion Method for the Critical Node Problem (CEMCNP). On the one hand, it exploits a contraction mechanism to greedily simplify the sparse graph model; on the other hand, it adopts an incremental cluster expansion approach to keep the size of the formed components within reasonable limits. The proposed algorithm also relies heavily on multi-start iterated local search, while introducing a diversified late acceptance local search strategy to balance diversification and intensification during neighborhood search. Extensive evaluations show that CEMCNP outperforms KBV on 35 of the 42 benchmark instances, while attaining the previous best results on 3 of the challenging instances. In addition, CEMCNP performs on par with the existing MANCNP and VPMS algorithms on 22 of the 42 graph models while requiring fewer node exchange operations.
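    The objective above can be made concrete. Pairwise connectivity counts the node pairs still joined by a path, that is, the sum of |S|(|S|-1)/2 over the connected components S of the residual graph. The sketch below evaluates that objective for a given removal set. It is only the scoring function, not CEMCNP's search, and the adjacency-dict input format is an assumption for illustration.

```python
from collections import deque


def pairwise_connectivity(adj, removed):
    """Sum of |S|(|S|-1)/2 over connected components S after deleting `removed`.

    adj: dict mapping each node to an iterable of its neighbours (undirected graph).
    removed: set of deleted nodes.
    """
    seen = set(removed)  # removed nodes are never visited
    total = 0
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search to measure the size of this component.
        size = 0
        queue = deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        total += size * (size - 1) // 2
    return total
```

    On the path 0-1-2-3-4, deleting node 2 leaves two components of size 2 (connectivity 2), whereas deleting node 1 leaves sizes 1 and 3 (connectivity 3). This illustrates why evenly sized fragments give a better solution.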

  • Recursive Multi-Scale Channel-Spatial Attention for Fine-Grained Image Classification

    Dichao LIU  Yu WANG  Kenji MASE  Jien KATO  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2021/12/22
      Vol:
    E105-D No:3
      Page(s):
    713-726

    Fine-grained image classification is a difficult problem, and previous studies mainly address it by locating multiple discriminative regions at different scales and then aggregating complementary information from the located regions. However, locating discriminative regions introduces heavy overhead and is not suitable for real-world applications. In this paper, we propose the recursive multi-scale channel-spatial attention module (RMCSAM) to address this problem. Following previous research on fine-grained image classification, RMCSAM explores multi-scale attentional information. However, instead of localizing attention regions, it explores the attentional information by recursively refining the deep feature maps of a convolutional neural network (CNN) to better correspond to multi-scale channel-wise and spatial-wise attention. In this way, RMCSAM provides a lightweight module that can be inserted into standard CNNs. Experimental results show that RMCSAM improves classification accuracy and attention-capturing ability over baselines. RMCSAM also performs better than other state-of-the-art attention modules in fine-grained image classification, and is complementary to some state-of-the-art fine-grained image classification approaches. Code is available at https://github.com/Dichao-Liu/Recursive-Multi-Scale-Channel-Spatial-Attention-Module.

  • Toward Blockchain-Based Spoofing Defense for Controlled Optimization of Phases in Traffic Signal System

    Yingxiao XIANG  Chao LI  Tong CHEN  Yike LI  Endong TONG  Wenjia NIU  Qiong LI  Jiqiang LIU  Wei WANG  

     
    PAPER

      Publicized:
    2021/09/13
      Vol:
    E105-D No:2
      Page(s):
    280-288

    Controlled optimization of phases (COP) is a core component of the future intelligent traffic signal system (I-SIG), which has been deployed and tested in countries including the U.S. and China. In such a system, optimal signal control depends on dynamic awareness of the traffic situation via connected vehicles. Unfortunately, I-SIG is vulnerable to data spoofing from any compromised vehicle; in particular, spoofing by the last vehicle can break the system and cause severe traffic congestion. Moreover, coordinated attacks on multiple intersections may even cause cascading failures of the road traffic network. To mitigate this security issue, we design a blockchain-based multi-intersection joint defense mechanism built on COP planning. The major contributions of this paper are as follows. 1) A blockchain network formed by the road-side units at multiple intersections, which are inherently distributed and decentralized, is proposed to achieve accurate and reliable spoofing detection. 2) A COP-oriented smart contract is implemented and used to ensure the credibility of spoofing vehicle detection. Thus, an I-SIG can automatically execute a signal planning scheme based on traffic information free of spoofed data. A security analysis of the data spoofing attack is carried out to demonstrate the security of the mechanism. Meanwhile, experiments on the simulation platform VISSIM and Hyperledger Fabric show the efficiency and practicality of the blockchain-based defense mechanism.

  • Gender Recognition Using a Gaze-Guided Self-Attention Mechanism Robust Against Background Bias in Training Samples

    Masashi NISHIYAMA  Michiko INOUE  Yoshio IWAI  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2021/11/18
      Vol:
    E105-D No:2
      Page(s):
    415-426

    We propose an attention mechanism in deep learning networks for gender recognition that uses the gaze distribution of human observers when they judge the gender of people in pedestrian images. Prevalent attention mechanisms spatially compute the correlation among the values of all cells in an input feature map to calculate attention weights. If there is a large bias in the background of pedestrian images (e.g., test samples and training samples containing different backgrounds), the attention weights learned by the prevalent attention mechanisms are affected by the bias, which in turn reduces the accuracy of gender recognition. To avoid this problem, we incorporate an attention mechanism called gaze-guided self-attention (GSA) that is inspired by human visual attention. Our method assigns spatially suitable attention weights to each input feature map using the gaze distribution of human observers. In particular, GSA yields promising results even when the training samples contain background bias. The results of experiments on publicly available datasets confirm that our GSA, using the gaze distribution, is more accurate in gender recognition than currently available attention-based methods when there is background bias between training and test samples.

  • An Incentivization Mechanism with Validator Voting Profile in Proof-of-Stake-Based Blockchain (Open Access)

    Takeaki MATSUNAGA  Yuanyu ZHANG  Masahiro SASABE  Shoji KASAHARA  

     
    PAPER

      Publicized:
    2021/08/05
      Vol:
    E105-B No:2
      Page(s):
    228-239

    The Proof of Stake (PoS) protocol is a consensus algorithm for blockchains in which the integrity of a new block is validated by voting among nodes called validators. However, because consensus relies on validator voting, voting results are likely to be wrong when the number of validators casting wrong votes increases. In the PoS protocol, validators are motivated to vote correctly by reward and penalty mechanisms: validators who contribute to correct consensus are rewarded, while those who vote incorrectly are penalized. In this paper, we consider an incentivization mechanism based on the voting profile of a validator, which is estimated from the validator's voting history. In this mechanism, the stake collected through penalties is redistributed to validators who vote correctly, improving validators' incentive to contribute to the system. We evaluate the performance of the proposed mechanism by computer simulation, investigating the impact of system parameters on the estimation accuracy of the validator profile and on the amount of each validator's stake. Numerical results show that the proposed mechanism can estimate the voting profile of a validator accurately even when the voting profile changes dynamically. They also show that the proposed mechanism gives more reward to validators with a high voting profile who vote correctly.
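    The abstract does not give the profile estimator or the redistribution rule, so the sketch below is purely hypothetical. It tracks each validator's voting profile as an exponential moving average of correct votes, which lets the estimate follow a profile that drifts over time. It then shares the penalty pool among correct voters in proportion to their profiles. The names `update_profile`, `settle_round`, `alpha`, and `penalty` are illustrative only and do not come from the paper.

```python
def update_profile(profile, voted_correctly, alpha=0.1):
    """Hypothetical estimator: exponential moving average of 1 (correct) / 0 (wrong) votes."""
    return (1 - alpha) * profile + alpha * (1.0 if voted_correctly else 0.0)


def settle_round(stakes, profiles, votes, penalty=1.0, alpha=0.1):
    """One voting round under the hypothetical rule described above.

    stakes, profiles: dicts keyed by validator id (updated in place).
    votes: dict mapping validator id -> True if it voted correctly this round.
    """
    pool = 0.0
    for v, correct in votes.items():
        profiles[v] = update_profile(profiles[v], correct, alpha)
        if not correct:
            cut = min(penalty, stakes[v])  # cannot slash more than the validator holds
            stakes[v] -= cut
            pool += cut
    winners = [v for v, correct in votes.items() if correct]
    weight_sum = sum(profiles[v] for v in winners)
    # If nobody voted correctly, the pool is simply not redistributed in this sketch.
    if weight_sum > 0:
        for v in winners:
            stakes[v] += pool * profiles[v] / weight_sum
    return stakes, profiles
```

    Total stake is conserved whenever at least one validator votes correctly, and a correct voter with a higher estimated profile receives a larger share of the pool.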

  • Speech Paralinguistic Approach for Detecting Dementia Using Gated Convolutional Neural Network

    Mariana RODRIGUES MAKIUCHI  Tifani WARNITA  Nakamasa INOUE  Koichi SHINODA  Michitaka YOSHIMURA  Momoko KITAZAWA  Kei FUNAKI  Yoko EGUCHI  Taishiro KISHIMOTO  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2021/08/03
      Vol:
    E104-D No:11
      Page(s):
    1930-1940

    We propose a non-invasive and cost-effective method to automatically detect dementia using only speech audio data. We extract paralinguistic features from a short speech segment and use Gated Convolutional Neural Networks (GCNN) to classify it as dementia or healthy. We evaluate our method on the Pitt Corpus and on our own dataset, the PROMPT Database. Our method achieves an accuracy of 73.1% on the Pitt Corpus using an average of 114 seconds of speech data. On the PROMPT Database, it achieves an accuracy of 74.7% using 4 seconds of speech data, which improves to 80.8% when all of the patient's speech data are used. Furthermore, we evaluate our method on a three-class classification problem that includes a Mild Cognitive Impairment (MCI) class and achieve an accuracy of 60.6% with 40 seconds of speech data.
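    Assuming the GCNN uses the common gated linear unit of Dauphin et al. (2017), each layer computes h = (X*W + b) ⊙ σ(X*V + c). One convolution produces features, and a second one, passed through a sigmoid, decides how much of each feature passes through. The single-channel, pure-Python sketch below only illustrates that gating; it is not the authors' network.

```python
import math


def conv1d(x, kernel, bias):
    """'Valid' 1-D cross-correlation (what most deep-learning libraries call convolution)."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) + bias for i in range(len(x) - k + 1)]


def gated_conv1d(x, w, b, v, c):
    """Gated linear unit: (x*w + b) multiplied elementwise by sigmoid(x*v + c)."""
    linear = conv1d(x, w, b)  # candidate features
    gate = conv1d(x, v, c)    # gate logits
    return [a * (1.0 / (1.0 + math.exp(-g))) for a, g in zip(linear, gate)]
```

    A gate logit of 0 halves each feature, while a large positive gate lets the linear path through almost unchanged. The network learns `v` and `c` so that the gate opens on informative stretches of the paralinguistic feature sequence.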

  • Detecting Depression from Speech through an Attentive LSTM Network

    Yan ZHAO  Yue XIE  Ruiyu LIANG  Li ZHANG  Li ZHAO  Chengyu LIU  

     
    LETTER-Speech and Hearing

      Publicized:
    2021/08/24
      Vol:
    E104-D No:11
      Page(s):
    2019-2023

    As a mental disorder, depression endangers people's health and affects social order. As an efficient means of diagnosing depression, automatic depression detection has attracted considerable research interest. This study presents an attention-based Long Short-Term Memory (LSTM) model for depression detection that fully exploits the differences between depressed and non-depressed speech across time frames. Instead of traditional statistical features, the proposed model feeds frame-level features, which capture the temporal information of depressive speech, into the LSTM layers. To obtain richer, multi-dimensional deep feature representations, the LSTM output is then passed to attention layers along both the time and feature dimensions. Next, the outputs of the attention layers are concatenated, and the fused feature representation is fed into a fully connected layer. Finally, the output of the fully connected layer is passed to a softmax layer. Experiments on the DAIC-WOZ database demonstrate that the proposed attentive LSTM model achieves an average accuracy of 90.2% and outperforms the traditional LSTM network and LSTM with local attention by 0.7% and 2.3%, respectively, which indicates its feasibility.
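    The time- and feature-dimension attention described above can be sketched as softmax-weighted pooling over one axis of the T x D LSTM output matrix, with the two pooled vectors concatenated. The dot-product scoring against a single query vector is an assumption, since the abstract does not specify the scoring function. The `query_time` and `query_feat` arguments are hypothetical stand-ins for learned parameters.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(rows, query):
    """Score each row against `query`, softmax the scores, return (weights, weighted sum of rows)."""
    scores = [sum(q * x for q, x in zip(query, row)) for row in rows]
    weights = softmax(scores)
    pooled = [sum(w * row[d] for w, row in zip(weights, rows)) for d in range(len(rows[0]))]
    return weights, pooled


def time_feature_attention(lstm_out, query_time, query_feat):
    """lstm_out: T x D list (one row per frame). Returns a vector of length D + T."""
    _, time_pooled = attend(lstm_out, query_time)          # weights over the T frames
    columns = [list(col) for col in zip(*lstm_out)]        # D x T: one row per feature
    _, feat_pooled = attend(columns, query_feat)           # weights over the D features
    return time_pooled + feat_pooled                       # list concatenation
```

    The concatenated vector is what would then go to the fully connected and softmax layers. With all-zero queries the attention degenerates to plain averaging, which is a convenient sanity check.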

  • Multi-Rate Switched Pinning Control for Velocity Control of Vehicle Platoons (Open Access)

    Takuma WAKASA  Kenji SAWADA  

     
    PAPER

      Publicized:
    2021/05/12
      Vol:
    E104-A No:11
      Page(s):
    1461-1469

    This paper proposes a switched pinning control method with a multi-rate mechanism for vehicle platoons. The platoons are modeled as multi-agent systems consisting of mass-damper systems in which pinning agents receive target velocities from external devices (e.g., intelligent traffic signals). We construct a model predictive control (MPC) algorithm that switches pinning agents by solving mixed-integer quadratic programming (MIQP) problems. The optimization rate is determined according to the rate of convergence to the target velocities and the inter-vehicular distances. This multi-rate mechanism can reduce the computational load caused by iterative calculation. Numerical results demonstrate that our method reduces string instability by selecting the pinning agents that minimize the errors between the inter-vehicular distances and the target distances.

  • Triplet Attention Network for Video-Based Person Re-Identification

    Rui SUN  Qili LIANG  Zi YANG  Zhenghui ZHAO  Xudong ZHANG  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2021/07/21
      Vol:
    E104-D No:10
      Page(s):
    1775-1779

    Video-based person re-identification (re-ID) aims to retrieve persons across non-overlapping cameras and has achieved promising results owing to deep convolutional neural networks. Due to the dynamic properties of video, background clutter and occlusion are more serious problems than in image-based person re-ID. In this letter, we present a novel triplet attention network (TriANet) that simultaneously uses temporal, spatial, and channel context information by employing the self-attention mechanism to obtain robust and discriminative features. Specifically, the network has two parts. The first part introduces a residual attention subnetwork, which contains a channel attention module that captures cross-dimension dependencies using rotation and transformation, and a spatial attention module that focuses on pedestrian features. In the second part, a temporal attention module is designed to judge the quality score of each pedestrian frame and to reduce the weight of incomplete pedestrian images, alleviating the occlusion problem. We evaluate our proposed architecture on three datasets: iLIDS-VID, PRID2011, and MARS. Extensive comparative experiments show that our proposed method achieves state-of-the-art results.

  • Optic Disc Detection Based on Saliency Detection and Attention Convolutional Neural Networks

    Ying WANG  Xiaosheng YU  Chengdong WU  

     
    LETTER-Image

      Publicized:
    2021/03/23
      Vol:
    E104-A No:9
      Page(s):
    1370-1374

    Automatic analysis of retinal fundus images is of great significance for large-scale screening of ocular pathologies, in which optic disc (OD) localization is a prerequisite step. In this paper, we propose a method based on saliency detection and an attention convolutional neural network for OD detection. First, a wavelet-transform-based saliency detection method is used to detect as many OD candidate regions as possible, so that the intensity, edge, and texture features of the fundus images are all taken into account in the OD detection process. Then, an attention mechanism that emphasizes the representation of the OD region is incorporated into the dense network. Finally, each detected candidate region is classified as an OD or non-OD region. The proposed method is evaluated on the DIARETDB0, DIARETDB1, and MESSIDOR datasets, and the experimental results demonstrate its superiority and robustness.
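    The wavelet step can be illustrated with one level of a 2-D Haar transform. The approximation band tracks local intensity, and the three detail bands respond to vertical, horizontal, and diagonal changes, that is, to edges and texture. The abstract does not specify the wavelet or how saliency is derived from it, so both the Haar choice and the `detail_energy` cue below are assumptions made for illustration.

```python
def haar2d(img):
    """One level of the 2-D Haar transform (averaging convention) on an even-sized grid.

    Returns four half-size sub-bands: LL (local mean) and three detail bands.
    Naming conventions for the LH / HL bands differ between texts.
    """
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(img), 2):
        rows = ([], [], [], [])
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]          # top-left, top-right
            d, e = img[r + 1][c], img[r + 1][c + 1]  # bottom-left, bottom-right
            rows[0].append((a + b + d + e) / 4)  # local intensity
            rows[1].append((a + b - d - e) / 4)  # top vs bottom: horizontal edges
            rows[2].append((a - b + d - e) / 4)  # left vs right: vertical edges
            rows[3].append((a - b - d + e) / 4)  # diagonal change: texture
        for band, row in zip((ll, lh, hl, hh), rows):
            band.append(row)
    return ll, lh, hl, hh


def detail_energy(img):
    """Hypothetical saliency cue: per-block energy of the three detail sub-bands."""
    _, lh, hl, hh = haar2d(img)
    return [[lh[i][j] ** 2 + hl[i][j] ** 2 + hh[i][j] ** 2 for j in range(len(lh[0]))]
            for i in range(len(lh))]
```

    A flat region produces zero detail energy, while regions with strong local contrast score high. A candidate detector could then combine this energy with the LL intensity band, since the optic disc is typically both bright and sharply bounded.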

  • Consumption Pricing Mechanism of Scientific and Technological Resources Based on Multi-Agent Game Theory: An Interactive Analytical Model and Experimental Validation

    Fanying ZHENG  Fu GU  Yangjian JI  Jianfeng GUO  Xinjian GU  Jin ZHANG  

     
    PAPER

      Publicized:
    2021/04/16
      Vol:
    E104-D No:8
      Page(s):
    1292-1301

    In the context of Web 2.0, interaction between users and resources is increasingly frequent in resource sharing and consumption. However, current research on resource pricing mainly focuses on the attributes of the resource itself and does not weigh the interests of the participants in resource sharing. To address these problems, this paper establishes a pricing mechanism for resource-user interaction evaluation based on multi-agent game theory. The model also incorporates user similarity, evaluation bias based on link analysis, and punishment of academic group cheating. Based on data for 181 scholars and 509 articles from the Wanfang database, this paper conducts 5483 pricing experiments over 13 months. The results show that this model is more effective than other pricing models: the pricing accuracy for resources is 94.2%, and the accuracy of user value evaluation is 96.4%. In addition, the model intuitively shows the relationships among users and among resources. The case study also shows that a user's knowledge level is not positively correlated with his or her authority. Discovering and punishing academic group cheating helps evaluate researchers and resources objectively. The proposed pricing mechanism for scientific and technological resources and their users is a prerequisite for fair trade in scientific and technological resources.

  • Capsule Network with Shortcut Routing (Open Access)

    Thanh Vu DANG  Hoang Trong VO  Gwang Hyun YU  Jin Young KIM  

     
    PAPER-Image

      Publicized:
    2021/01/27
      Vol:
    E104-A No:8
      Page(s):
    1043-1050

    Capsules are fundamental informative units introduced in capsule networks to model the hierarchical representation of patterns. The part-whole relationship of an entity is learned through capsule layers, using a routing-by-agreement mechanism approximated by a voting procedure. However, existing routing methods are computationally inefficient. We address this issue by proposing a novel routing mechanism, namely “shortcut routing”, that directly learns to activate global capsules from local capsules. In our method, the number of operations in the routing procedure is reduced by omitting the capsules in intermediate layers, resulting in lighter routing. To further address the computational problem, we investigate an attention-based approach and propose fuzzy coefficients, which we found to be more efficient than the mixture coefficients of EM routing. Our method achieves on-par classification results on the MNIST (99.52%), smallNORB (93.91%), and affNIST (89.02%) datasets. Compared with EM routing, our fuzzy-based and attention-based routing methods attain reductions of 1.42 and 2.5, respectively, in the number of calculations.
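    For context on the cost that shortcut routing targets, the sketch below shows iterative routing-by-agreement in the style of dynamic routing (Sabour et al., 2017), together with its "squash" nonlinearity. It is a baseline illustration, not the paper's shortcut, fuzzy, or attention-based routing, and not EM routing. The prediction vectors `u_hat` would normally come from learned transformation matrices.

```python
import math


def squash(s, eps=1e-9):
    """Keep the direction of s and map its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / (math.sqrt(norm_sq) + eps)
    return [scale * x for x in s]


def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    t = sum(e)
    return [v / t for v in e]


def dynamic_routing(u_hat, iterations=3):
    """u_hat[i][j]: prediction vector from lower capsule i for upper capsule j.

    Returns the upper-capsule outputs v and the last coupling coefficients c.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits start uniform
    for _ in range(iterations):
        c = [softmax(row) for row in b]  # each lower capsule splits its vote over upper ones
        s = [[sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
             for j in range(n_out)]
        v = [squash(sj) for sj in s]
        # Agreement: raise a logit when the prediction points the same way as the output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(p * o for p, o in zip(u_hat[i][j], v[j]))
    return v, c
```

    Every iteration touches all lower-upper capsule pairs, so the cost grows with the number of capsules and iterations. Skipping intermediate capsule layers, as shortcut routing does, shrinks exactly these loops.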

  • CJAM: Convolutional Neural Network Joint Attention Mechanism in Gait Recognition

    Pengtao JIA  Qi ZHAO  Boze LI  Jing ZHANG  

     
    PAPER

      Publicized:
    2021/04/28
      Vol:
    E104-D No:8
      Page(s):
    1239-1249

    Gait recognition distinguishes one individual from others according to the natural patterns of human gait. It is a challenging signal processing technology for biometric identification due to ambiguous contours and a complex feature extraction procedure. In this work, we propose a new model, the convolutional neural network (CNN) joint attention mechanism (CJAM), to classify gait sequences and perform person identification on the CASIA-A and CASIA-B gait datasets. The CNN extracts gait features, and the attention mechanism continuously focuses on the most discriminative area to achieve person identification. We present the complete process from gait image preprocessing to final identification. The results of 12 experiments show that the new attention model achieves a lower error rate than the other models. The CJAM model outperformed the 3D-CNN, CNN-LSTM (long short-term memory), and simple CNN models by 8.44%, 2.94%, and 1.45%, respectively.
