
IEICE TRANSACTIONS on Information

  • Impact Factor

    0.72

  • Eigenfactor

    0.002

  • article influence

    0.1

  • Cite Score

    1.4


Volume E106-D No.5  (Publication Date:2023/05/01)

    Special Section on Deep Learning Technologies: Architecture, Optimization, Techniques, and Applications
  • FOREWORD Open Access

    Chi-Hua CHEN  

     
    FOREWORD

      Page(s):
    579-580
  • A Visual Question Answering Network Merging High- and Low-Level Semantic Information

    Huimin LI  Dezhi HAN  Chongqing CHEN  Chin-Chen CHANG  Kuan-Ching LI  Dun LI  

     
    PAPER-Core Methods

      Publicized:
    2022/01/06
      Page(s):
    581-589

    Visual Question Answering (VQA) usually uses deep attention mechanisms to learn fine-grained visual content of images and textual content of questions. However, the deep attention mechanism can only learn high-level semantic information while ignoring the impact of low-level semantic information on answer prediction. To address this, we design a High- and Low-Level Semantic Information Network (HLSIN), which employs two strategies to fuse high-level and low-level semantic information. The first strategy, adaptive weight learning, allows different levels of semantic information to learn weights separately. The second, the gate-sum mechanism, suppresses invalid information at the various levels and fuses the valid information. On the benchmark VQA-v2 dataset, we quantitatively and qualitatively evaluate HLSIN and conduct extensive ablation studies to explore the reasons behind HLSIN's effectiveness. Experimental results demonstrate that HLSIN significantly outperforms the previous state-of-the-art, with an overall accuracy of 70.93% on test-dev.

  • The Comparison of Attention Mechanisms with Different Embedding Modes for Performance Improvement of Fine-Grained Classification

    Wujian YE  Run TAN  Yijun LIU  Chin-Chen CHANG  

     
    PAPER-Core Methods

      Publicized:
    2021/12/22
      Page(s):
    590-600

    Fine-grained image classification is one of the key basic tasks of computer vision. A traditional deep convolutional neural network (DCNN) combined with an attention mechanism can focus on partial and local features of fine-grained images, but existing work still gives little consideration to the embedding mode of different attention modules in the network, which leads to unsatisfactory classification results. To solve this problem, three different attention mechanisms, namely the SE, CBAM and ECA modules, are introduced into DCNNs (such as ResNet and VGGNet) so that the DCNN can better focus on the key local features of salient regions in the image. At the same time, we adopt three different embedding modes for the attention modules, namely serial, residual and parallel modes, to further improve the performance of the classification model. The experimental results show that the three attention modules combined with the three embedding modes can effectively improve the performance of the DCNN. Moreover, compared with SE and ECA, CBAM has stronger feature extraction capability. In particular, the parallel-embedded CBAM makes the local information attended to by the DCNN richer and more accurate and brings the best result, which is 1.98% and 1.57% higher than that of the original VGG16 and ResNet34 on the CUB-200-2011 dataset, respectively. The visualization analysis also indicates that the attention modules can be easily embedded into DCNNs, especially in the parallel mode, with stronger generality and universality.

  • A Novel Differential Evolution Algorithm Based on Local Fitness Landscape Information for Optimization Problems

    Jing LIANG  Ke LI  Kunjie YU  Caitong YUE  Yaxin LI  Hui SONG  

     
    PAPER-Core Methods

      Publicized:
    2023/02/13
      Page(s):
    601-616

    The selection of the mutation strategy greatly affects the performance of the differential evolution (DE) algorithm. For different types of optimization problems, different mutation strategies should be selected. How to choose a suitable mutation strategy for different problems is a challenging task. To deal with this challenge, this paper proposes a novel DE algorithm based on the local fitness landscape, called FLIDE. In the proposed method, fitness landscape information is obtained to guide the selection of mutation operators. In this way, different problems can be solved with proper evolutionary mechanisms. Moreover, a population adjustment method is used to balance the search ability and population diversity. On the one hand, the diversity of the population in the early stage is enhanced with a relatively large population. On the other hand, the computational cost is reduced in the later stage with a relatively small population. The evolutionary information is utilized as much as possible to guide the search direction. The proposed method is compared with five popular algorithms on 30 test functions with different characteristics. Experimental results show that the proposed FLIDE is more effective on problems with high dimensions.
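
    For readers unfamiliar with DE mutation strategies, the following is a minimal sketch of two classic operators (DE/rand/1 and DE/best/1) together with a toy selection rule. The function `choose_strategy`, its `ruggedness` input and its threshold are hypothetical illustrations only; the abstract does not specify how FLIDE measures the landscape or maps it to an operator.

```python
from typing import List, Sequence

Vector = List[float]


def mutate_rand_1(pop: Sequence[Vector], r1: int, r2: int, r3: int, f: float) -> Vector:
    """Classic DE/rand/1 mutation: v = x_r1 + F * (x_r2 - x_r3)."""
    return [a + f * (b - c) for a, b, c in zip(pop[r1], pop[r2], pop[r3])]


def mutate_best_1(pop: Sequence[Vector], best: int, r1: int, r2: int, f: float) -> Vector:
    """Classic DE/best/1 mutation: v = x_best + F * (x_r1 - x_r2)."""
    return [a + f * (b - c) for a, b, c in zip(pop[best], pop[r1], pop[r2])]


def choose_strategy(ruggedness: float, threshold: float = 0.5) -> str:
    """Hypothetical rule (an assumption, not FLIDE's actual criterion):
    prefer the exploratory rand/1 operator on rugged landscapes and the
    exploitative best/1 operator on smooth ones."""
    return "rand/1" if ruggedness > threshold else "best/1"


if __name__ == "__main__":
    population = [[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]]
    print(mutate_rand_1(population, 0, 1, 2, f=0.5))
    print(mutate_best_1(population, 1, 2, 0, f=0.5))
    print(choose_strategy(0.8))
```

    The indices are passed explicitly so that the operators stay deterministic; a full DE loop would draw them at random and add crossover and selection.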

  • Effectively Utilizing the Category Labels for Image Captioning

    Junlong FENG  Jianping ZHAO  

     
    PAPER-Core Methods

      Publicized:
    2021/12/13
      Page(s):
    617-624

    As a further investigation of the image captioning task, some works have extended vision-text datasets for specific subtasks, such as stylized caption generation. The corpus in such datasets is usually composed of obviously sentiment-bearing words. In some special cases, however, the captions are classified according to image category. This results in a latent problem: the generated sentences are close in semantic meaning but belong to different or even opposite categories. It is therefore worth exploring an effective way to utilize the image category label to boost the caption difference. In this paper, we propose an image captioning network with a label control mechanism (LCNET). First, to further improve the caption difference, LCNET employs a semantic enhancement module to provide the decoder with global semantic vectors. Then, through the proposed label control LSTM, LCNET can dynamically modulate the caption generation depending on the image category labels. Finally, the decoder integrates the spatial image features with global semantic vectors to output the caption. Evaluation on all the standard metrics shows that our model outperforms the compared models. Caption analysis demonstrates that our approach can improve the performance of semantic representation. Compared with other label control mechanisms, our model is capable of boosting the caption difference according to the labels while keeping better consistency with the image content.

  • A Novel SSD-Based Detection Algorithm Suitable for Small Object

    Xi ZHANG  Yanan ZHANG  Tao GAO  Yong FANG  Ting CHEN  

     
    PAPER-Core Methods

      Publicized:
    2022/01/06
      Page(s):
    625-634

    The original single-shot multibox detector (SSD) algorithm has good detection accuracy and speed for regular object recognition. However, the SSD is not suitable for detecting small objects for two reasons: 1) the relationships among different feature layers with various scales are not considered, and 2) the predicted results are solely determined by several independent feature layers. To enhance its detection capability for small objects, this study proposes an improved SSD-based algorithm called proportional channels' fusion SSD (PCF-SSD). Three enhancements are provided by this novel PCF-SSD algorithm. First, a fusion feature pyramid model is proposed by concatenating channels of certain key feature layers in a given proportion for object detection. Second, the default box sizes are adjusted properly for small object detection. Third, an improved loss function is suggested to train the above-proposed fusion model, which can further improve object detection performance. A series of experiments are conducted on the public database Pascal VOC to validate the PCF-SSD. Compared with the original SSD algorithm, our algorithm improves the mean average precision and detection accuracy for small objects by 3.3% and 3.9%, respectively, with a detection speed of 40 FPS. Furthermore, the proposed PCF-SSD can achieve a better balance of detection accuracy and efficiency than the original SSD algorithm, as demonstrated by a series of experimental results.

  • Deep Reinforcement Learning Based Ontology Meta-Matching Technique

    Xingsi XUE  Yirui HUANG  Zeqing ZHANG  

     
    PAPER-Core Methods

      Publicized:
    2022/03/04
      Page(s):
    635-643

    Ontologies are regarded as the solution to data heterogeneity on the Semantic Web (SW), but they also suffer from the heterogeneity problem, which leads to ambiguity in data information. The Ontology Meta-Matching (OMM) technique is able to solve the ontology heterogeneity problem by aggregating various similarity measures to find the heterogeneous entities. Inspired by the success of Reinforcement Learning (RL) in solving complex optimization problems, this work proposes an RL-based OMM technique to address the ontology heterogeneity problem. First, we propose a novel RL-based OMM framework. Then, a neural network called the evaluated network is proposed to replace the Q table when choosing the next action of the agent, which reduces memory consumption and computing time. After that, to better guide the training of the neural network and improve the accuracy of the RL agent, we establish a memory bank to mine depth information during the evaluated network's training procedure, and we use another neural network, called the target network, to save the historical parameters. The experiment uses the famous benchmark in the ontology matching domain to test our approach's performance, and the comparisons among Deep Reinforcement Learning (DRL), RL and state-of-the-art ontology matching systems show that our approach is able to effectively determine high-quality alignments.

  • Intelligent Tool Condition Monitoring Based on Multi-Scale Convolutional Recurrent Neural Network

    Xincheng CAO  Bin YAO  Binqiang CHEN  Wangpeng HE  Suqin GUO  Kun CHEN  

     
    PAPER-Smart Industry

      Publicized:
    2022/06/16
      Page(s):
    644-652

    Tool condition monitoring is one of the core tasks of intelligent manufacturing in the digital workshop. This paper presents an intelligent recognition method for tool condition based on deep learning. First, an industrial microphone is used to collect the acoustic signal during machining; then, a central fractal decomposition algorithm is proposed to extract sensitive information; finally, a multi-scale convolutional recurrent neural network is used for deep feature extraction and pattern recognition. Multi-process milling experiments show that the proposed method is superior to existing methods, with a recognition accuracy of 88%.

  • Computer Vision-Based Tracking of Workers in Construction Sites Based on MDNet

    Wen LIU  Yixiao SHAO  Shihong ZHAI  Zhao YANG  Peishuai CHEN  

     
    PAPER-Smart Industry

      Publicized:
    2022/10/20
      Page(s):
    653-661

    Automatic continuous tracking of objects involved in a construction project is required for such tasks as productivity assessment, unsafe behavior recognition, and progress monitoring. Many computer-vision-based tracking approaches have been investigated and successfully tested on construction sites; however, their practical applications are hindered by the tracking accuracy limited by the dynamic, complex nature of construction sites (i.e., background clutter, occlusion, varying scale and pose). To achieve better tracking performance, a novel deep-learning-based tracking approach called the Multi-Domain Convolutional Neural Networks (MD-CNN) is proposed and investigated. The proposed approach consists of two key stages: 1) multi-domain representation of learning; and 2) online visual tracking. To evaluate the effectiveness and feasibility of this approach, it is applied to a metro project in Wuhan, China, and the results demonstrate good tracking performance in construction scenarios with complex backgrounds. The average distance error and F-measure for the MDNet are 7.64 pixels and 67, respectively. The results demonstrate that the proposed approach can be used by site managers to monitor and track workers for hazard prevention on construction sites.

  • An Improved Insulator and Spacer Detection Algorithm Based on Dual Network and SSD

    Yong LI  Shidi WEI  Xuan LIU  Yinzheng LUO  Yafeng LI  Feng SHUANG  

     
    PAPER-Smart Industry

      Publicized:
    2022/10/17
      Page(s):
    662-672

    Traditional manual inspection is gradually being replaced by automatic inspection with unmanned aerial vehicles (UAVs). However, existing deep learning-based algorithms need a large amount of computational resources, while the computational resources carried by a UAV are limited, which makes online detection impossible. Moreover, there is no effective online detection system at present. To realize high-precision online detection of electrical equipment, this paper proposes an SSD (Single Shot Multibox Detector) detection algorithm based on an improved Dual network for images of insulators and spacers taken by UAVs. The proposed algorithm uses MnasNet and MobileNetv3 to form the Dual network to extract multi-level features, which overcomes the shortcoming of a backbone based on a single convolutional network for feature extraction. The features extracted from the two networks are then fused to obtain features with high-level semantic information. Finally, the proposed algorithm is tested on the public dataset of insulators and spacers. The experimental results show that the proposed algorithm can detect insulators and spacers efficiently. Compared with other methods, the proposed algorithm has the advantages of smaller model size and higher accuracy. The object detection accuracy of the proposed method is up to 95.1%.

  • Image-to-Image Translation for Data Augmentation on Multimodal Medical Images

    Yue PENG  Zuqiang MENG  Lina YANG  

     
    PAPER-Smart Healthcare

      Publicized:
    2022/03/01
      Page(s):
    686-696

    Medical images play an important role in medical diagnosis. However, acquiring a large number of datasets with annotations is still a difficult task in the medical field. For this reason, research in the field of image-to-image translation is combined with computer-aided diagnosis, and data augmentation methods based on generative adversarial networks are applied to medical images. In this paper, we try to perform data augmentation on unimodal data. The designed StarGAN V2 based network has high performance in augmenting the dataset using a small number of original images. The augmented data is expanded from unimodal data to multimodal medical images, and this multimodal medical image data can be applied to the segmentation task with some improvement in the segmentation results. Our experiments demonstrate that the generated multimodal medical image data can improve the performance of glioma segmentation.

  • MolHF: Molecular Heterogeneous Attributes Fusion for Drug-Target Affinity Prediction on Heterogeneity

    Runze WANG  Zehua ZHANG  Yueqin ZHANG  Zhongyuan JIANG  Shilin SUN  Guixiang MA  

     
    PAPER-Smart Healthcare

      Publicized:
    2022/05/31
      Page(s):
    697-706

    Recent studies in protein structure prediction such as AlphaFold have drawn great attention to deep learning for the Drug-Target Affinity (DTA) task. Most works are dedicated to embedding a single molecular property and homogeneous information, ignoring the diverse heterogeneous information gains that are contained in the molecules and interactions. Motivated by this, we propose an end-to-end deep learning framework to perform Molecular Heterogeneous features Fusion (MolHF) for DTA prediction on heterogeneity. To address the challenge that biochemical attributes are located in different heterogeneous spaces, we design a Molecular Heterogeneous Information Learning module with multi-strategy learning. In particular, a Molecular Heterogeneous Attention Fusion module is presented to obtain the gains of molecular heterogeneous features. With these, the diversity of molecular structure information for drugs can be extracted. Extensive experiments on two benchmark datasets show that our method outperforms the baselines in all four metrics. Ablation studies validate the effect of attentive fusion and of multiple groups of drug heterogeneous features. Visual presentations demonstrate the impact of the protein embedding level and the model's ability to fit the data. In summary, the diverse gains brought by heterogeneous information contribute to drug-target affinity prediction.

  • The Effectiveness of Data Augmentation for Mature White Blood Cell Image Classification in Deep Learning — Selection of an Optimal Technique for Hematological Morphology Recognition —

    Hiroyuki NOZAKA  Kosuke KAMATA  Kazufumi YAMAGATA  

     
    PAPER-Smart Healthcare

      Publicized:
    2022/11/22
      Page(s):
    707-714

    The data augmentation method is known as a helpful technique to generate a dataset with a large number of images from one with a small number of images for supervised training in deep learning. However, a low validity augmentation method for image recognition was reported in a recent study on artificial intelligence (AI). This study aimed to clarify the optimal data augmentation method in deep learning model generation for the recognition of white blood cells (WBCs). Study Design: We conducted three different data augmentation methods (rotation, scaling, and distortion) on original WBC images, with each AI model for WBC recognition generated by supervised training. The subjects of the clinical assessment were 51 healthy persons. Thin-layer blood smears were prepared from peripheral blood and subjected to May-Grünwald-Giemsa staining. Results: The only significantly effective technique among the AI models for WBC recognition was data augmentation with rotation. By contrast, the effectiveness of both image distortion and image scaling was poor, and improved accuracy was limited to a specific WBC subcategory. Conclusion: Although data augmentation methods are often used for achieving high accuracy in AI generation with supervised training, we consider that it is necessary to select the optimal data augmentation method for medical AI generation based on the characteristics of medical images.

  • Fish Detecting Using YOLOv4 and CVAE in Aquaculture Ponds with a Non-Uniform Strong Reflection Background

    Meng ZHAO  Junfeng WU  Hong YU  Haiqing LI  Jingwen XU  Siqi CHENG  Lishuai GU  Juan MENG  

     
    PAPER-Smart Agriculture

      Publicized:
    2022/11/07
      Page(s):
    715-725

    Accurate fish detection is of great significance in aquaculture. However, the non-uniform strong reflection in aquaculture ponds affects the precision of fish detection. This paper combines YOLOv4 and CVAE to accurately detect fish in images with non-uniform strong reflection: the reflection in the image is removed first, and the reflection-removed image is then provided for fish detection. First, the improved YOLOv4 is applied to detect and mask the strong reflective region, locating and labeling it for the subsequent reflection removal. Then, CVAE is combined with the improved YOLOv4 to infer the prior distribution of the reflection region and restore the reflection region from that distribution, so that the reflection can be removed. To further improve the quality of the reflection-removed images, adversarial learning is appended to CVAE. Finally, YOLOv4 is used to detect fish in the high-quality image. In addition, a new image dataset of pond-cultured Takifugu rubripes is constructed, which includes 1000 images with manually annotated fish, and a synthetic dataset including 2000 images with strong reflection is created and merged with the generated dataset for training and verifying the robustness of the proposed method. Comprehensive experiments are performed to compare the proposed method with state-of-the-art fish detection methods without reflection removal on the generated dataset. The results show that the fish detection precision and recall of the proposed method are improved by 2.7% and 2.4%, respectively.

  • Detection Method of Fat Content in Pig B-Ultrasound Based on Deep Learning

    Wenxin DONG  Jianxun ZHANG  Shuqiu TAN  Xinyue ZHANG  

     
    PAPER-Smart Agriculture

      Publicized:
    2022/02/07
      Page(s):
    726-734

    In the pork fat content detection task, traditional physical or chemical methods are strongly destructive, have substantial technical requirements and cannot achieve nondestructive detection without slaughtering. To solve these problems, we propose a novel, convenient and economical method for detecting the fat content of pig B-ultrasound images based on hybrid attention and multiscale fusion learning, which extracts and fuses shallow detail information and deep semantic information at multiple scales. First, a deep learning network is constructed to learn the salient features of fat images through a hybrid attention mechanism. Then, the information describing pork fat is extracted at multiple scales, and the detailed information expressed in the shallow layer and the semantic information expressed in the deep layer are fused. Finally, a deep convolution network is used to predict the fat content, which is compared with the real label. The experimental results show that the determination coefficient is greater than 0.95 on the 130 groups of pork B-ultrasound image data sets, which is 2.90, 6.10 and 5.13 percentage points higher than that of VGGNet, ResNet and DenseNet, respectively. This indicates that the model can effectively identify the B-ultrasound images of pigs and predict the fat content with high accuracy.
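
    The determination coefficient reported above is usually defined as R^2 = 1 - SS_res / SS_tot. The sketch below computes it under that standard definition; the abstract does not state which variant the authors used, and the function name `r_squared` and the sample values are illustrative only.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up fat-content labels and predictions, for illustration only.
    labels = [1.0, 2.0, 3.0]
    print(r_squared(labels, [1.0, 2.0, 3.0]))  # perfect predictions
    print(r_squared(labels, [2.0, 2.0, 2.0]))  # no better than the mean
```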

  • Compression of Vehicle and Pedestrian Detection Network Based on YOLOv3 Model

    Lie GUO  Yibing ZHAO  Jiandong GAO  

     
    PAPER-Intelligent Transportation Systems

      Publicized:
    2022/06/22
      Page(s):
    735-745

    Commonly used object detection algorithms based on convolutional neural networks have difficulty meeting the real-time requirement on embedded platforms because of their large model size, large amount of calculation, and long inference time. It is necessary to use model compression to reduce the amount of network calculation and increase the speed of network inference. This paper compresses a vehicle and pedestrian detection network by pruning and removing redundant parameters. The vehicle and pedestrian detection network is trained based on the YOLOv3 model, using K-means++ to cluster the anchor boxes. Because the number of targets in the dataset is unbalanced, the detection accuracy is improved by changing the proportion of categorical losses and regression losses for each category in the loss function. A layer and channel pruning algorithm is proposed by combining global channel pruning thresholds and the L1 norm, which can reduce the time cost of the network layer transfer process and the amount of computation. Network layer fusion based on TensorRT is performed, and inference uses half-precision floating-point to improve the speed of inference. Results show that the compressed vehicle and pedestrian detection network, with 84% of channels and 15 Shortcut modules pruned, reduces the size by 32% and the amount of calculation by 17%. The network inference time is decreased to 21 ms, which is 1.48 times faster than the network with 84% of channels pruned.
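
    As an illustration of combining a global pruning threshold with the L1 norm, the sketch below ranks every channel in the network by the L1 norm of its weights and marks the weakest fraction for removal. It is a simplified assumption of how such a criterion can work, not the paper's algorithm: the data layout (each layer as a list of flattened channel weights), the function names, and the rule that keeps at least one channel per layer are all hypothetical.

```python
from typing import List

Layer = List[List[float]]  # one flattened weight list per output channel


def l1_norms(layer: Layer) -> List[float]:
    """L1 norm (sum of absolute weights) of each channel in a layer."""
    return [sum(abs(w) for w in channel) for channel in layer]


def global_threshold(layers: List[Layer], prune_ratio: float) -> float:
    """Norm value below which channels are pruned, computed over ALL layers."""
    norms = sorted(n for layer in layers for n in l1_norms(layer))
    k = int(len(norms) * prune_ratio)
    return norms[k] if k < len(norms) else float("inf")


def keep_masks(layers: List[Layer], prune_ratio: float) -> List[List[bool]]:
    """True marks a channel to keep; every layer keeps at least one channel."""
    threshold = global_threshold(layers, prune_ratio)
    masks = []
    for layer in layers:
        norms = l1_norms(layer)
        mask = [n >= threshold for n in norms]
        if not any(mask):
            # Safeguard (assumption): never remove a whole layer this way.
            mask[norms.index(max(norms))] = True
        masks.append(mask)
    return masks


if __name__ == "__main__":
    toy_layers = [
        [[0.1, -0.1], [1.0, 1.0]],
        [[0.5, 0.5], [0.05, 0.0]],
    ]
    print(keep_masks(toy_layers, prune_ratio=0.5))
```

    A global threshold lets layers with many weak channels shrink more than others, which is why the per-layer safeguard is needed in this sketch.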

  • Dynamic Evolution Simulation of Bus Bunching Affected by Traffic Operation State

    Shaorong HU  Yuqi ZHANG  Yuefei JIN  Ziqi DOU  

     
    PAPER-Intelligent Transportation Systems

      Publicized:
    2022/04/13
      Page(s):
    746-755

    Bus bunching often occurs in public transit systems, resulting in a series of problems such as poor punctuality, long waiting times and low service quality. In this paper, we explore the influence of the discrete distribution of traffic operation state on the dynamic evolution of bus bunching. First, we use a self-organizing map (SOM) to find the threshold of bus bunching and analyze the factors that affect bus bunching based on GPS data of the No. 600 bus line in Xi'an. Then, taking the bus headway as the research index, we construct the bus bunching mechanism model. Finally, a simulation platform is built in MATLAB to examine the trend of headway when various influencing factors show different distribution states along the bus line. In terms of influencing factors, inter-vehicle speed, queuing time at intersections and loading time at stations are shown to have a significant impact on headway between buses. In terms of the impact of the distribution of crowded road sections on headway, long-distance and concentrated crowded road sections lead to large intervals or bus bunching. When the traffic states along the bus line are randomly distributed among crowded, normal and free, the headway may fluctuate in a large range, which may result in bus bunching, or fluctuate in a small range and remain relatively stable. The headway change curve is determined by the distribution length of each traffic state along the bus line. The research results can help to formulate improvement measures according to traffic operation state for equalizing bus headway and alleviating bus bunching.

  • Semantic Path Planning for Indoor Navigation Tasks Using Multi-View Context and Prior Knowledge

    Jianbing WU  Weibo HUANG  Guoliang HUA  Wanruo ZHANG  Risheng KANG  Hong LIU  

     
    PAPER-Positioning and Navigation

      Publicized:
    2022/01/20
      Page(s):
    756-764

    Recently, deep reinforcement learning (DRL) methods have significantly improved the performance of target-driven indoor navigation tasks. However, the rich semantic information of environments is still not fully exploited in previous approaches. In addition, existing methods usually tend to overfit to training scenes or objects in target-driven navigation tasks, making it hard to generalize to unseen environments. Human beings can easily adapt to new scenes as they can recognize the objects they see and reason about the possible locations of target objects using their experience. Inspired by this, we propose a DRL-based target-driven navigation model, termed MVC-PK, using Multi-View Context information and Prior semantic Knowledge. It relies only on the semantic label of target objects and allows the robot to find the target without using any geometry map. To perceive the semantic contextual information in the environment, object detectors are leveraged to detect the objects present in the multi-view observations. To enable the semantic reasoning ability of indoor mobile robots, a Graph Convolutional Network is also employed to incorporate prior knowledge. The proposed MVC-PK model is evaluated in the AI2-THOR simulation environment. The results show that MVC-PK (1) significantly improves the cross-scene and cross-target generalization ability, and (2) achieves state-of-the-art performance with 15.2% and 11.0% increase in Success Rate (SR) and Success weighted by Path Length (SPL), respectively.
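
    The two metrics named above have widely used definitions in the embodied navigation literature: SR is the fraction of successful episodes, and SPL averages S_i * l_i / max(p_i, l_i) over episodes, where S_i is the success indicator, l_i the shortest path length and p_i the length of the path actually taken. The sketch below follows those definitions; the abstract does not confirm that the authors compute them exactly this way, and the episode values are made up for illustration.

```python
from typing import Sequence


def success_rate(successes: Sequence[int]) -> float:
    """Fraction of episodes in which the target was reached."""
    return sum(successes) / len(successes)


def spl(successes: Sequence[int],
        shortest_lengths: Sequence[float],
        path_lengths: Sequence[float]) -> float:
    """Success weighted by Path Length, averaged over episodes.

    Each successful episode contributes l / max(p, l), so a detour
    lowers the score; failed episodes contribute 0.
    """
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        total += s * l / max(p, l)
    return total / len(successes)


if __name__ == "__main__":
    # Three made-up episodes: optimal success, success with detour, failure.
    s = [1, 1, 0]
    shortest = [10.0, 10.0, 10.0]
    taken = [10.0, 20.0, 5.0]
    print(success_rate(s))
    print(spl(s, shortest, taken))
```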

  • SPSD: Semantics and Deep Reinforcement Learning Based Motion Planning for Supermarket Robot

    Jialun CAI  Weibo HUANG  Yingxuan YOU  Zhan CHEN  Bin REN  Hong LIU  

     
    PAPER-Positioning and Navigation

      Publicized:
    2022/09/15
      Page(s):
    765-772

    Robot motion planning is an important part of the unmanned supermarket. The challenges of motion planning in supermarkets lie in the diversity of the supermarket environment, the complexity of obstacle movement, and the vastness of the search space. This paper proposes an adaptive Search and Path planning method based on Semantic information and Deep reinforcement learning (SPSD), which effectively improves the autonomous decision-making ability of supermarket robots. First, based on the backbone of deep reinforcement learning (DRL), supermarket robots process real-time information from multi-modality sensors to realize high-speed and collision-free motion planning. Meanwhile, to solve the problem caused by the uncertainty of the reward in deep reinforcement learning, common spatial semantic relationships between landmarks and target objects are exploited to define the reward function. Finally, dynamics randomization is introduced during training to improve the generalization performance of the algorithm. The experimental results show that the SPSD algorithm performs well on the three indicators of generalization performance, training time and path planning length. Compared with other methods, the training time of SPSD is reduced by up to 27.42%, the path planning length is reduced by up to 21.08%, and the trained network of SPSD can be applied to unfamiliar scenes safely and efficiently. The results are motivating enough to consider the application of the proposed method in practical scenes. We have uploaded the video of the results of the experiment to https://www.youtube.com/watch?v=h1wLpm42NZk.

  • An Improved BPNN Method Based on Probability Density for Indoor Location

    Rong FEI  Yufan GUO  Junhuai LI  Bo HU  Lu YANG  

     
    PAPER-Positioning and Navigation

      Publicized:
    2022/12/23
      Page(s):
    773-785

    With the widespread use of indoor positioning technology, the need for high-precision positioning services is rising; nevertheless, there are several challenges, such as the difficulty of simulating the distribution of indoor location data and the large inaccuracy of probability computation. This paper therefore compares three neural network models for indoor location based on WiFi fingerprints - an indoor location algorithm based on an improved back-propagation neural network model, an RSSI indoor location algorithm based on neural network angle change, and an RSSI indoor location algorithm based on deep neural network angle change - in order to predict indoor location coordinates more accurately. Changing the action range of the activation function in the standard back-propagation neural network model achieves the goal of accurately predicting location coordinates. Experimental comparisons of loss rate (loss), accuracy rate (acc), and cumulative distribution function (CDF) show that the revised back-propagation neural network model has strong stability and enhances indoor positioning accuracy.

  • An Improved Real-Time Object Tracking Algorithm Based on Deep Learning Features

    Xianyu WANG  Cong LI  Heyi LI  Rui ZHANG  Zhifeng LIANG  Hai WANG  

     
    PAPER-Object Recognition and Tracking

      Publicized:
    2022/01/07
      Page(s):
    786-793

    Visual object tracking is always a challenging task in computer vision. During the tracking, the shape and appearance of the target may change greatly, and because of the lack of sufficient training samples, most of the online learning tracking algorithms will have performance bottlenecks. In this paper, an improved real-time algorithm based on deep learning features is proposed, which combines multi-feature fusion, multi-scale estimation, adaptive updating of target model and re-detection after target loss. The effectiveness and advantages of the proposed algorithm are proved by a large number of comparative experiments with other excellent algorithms on large benchmark datasets.

  • Learning Pixel Perception for Identity and Illumination Consistency Face Frontalization in the Wild

    Yongtang BAO  Pengfei ZHOU  Yue QI  Zhihui WANG  Qing FAN  

     
    PAPER-Person Image Generation

      Pubricized:
    2022/06/21
      Page(s):
    794-803

    Face frontalization synthesizes a frontal and realistic face image from a single profile face image and has a wide range of applications in face recognition. Although the frontal face method based on deep learning has made substantial progress in recent years, there is still no guarantee that the generated face has identity consistency and illumination consistency in a significant posture. This paper proposes a novel pixel-based feature regression generative adversarial network (PFR-GAN), which can learn to recover local high-frequency details and preserve identity and illumination frontal face images in an uncontrolled environment. We first propose a Reslu block to obtain richer feature representation and improve the convergence speed of training. We then introduce a feature conversion module to reduce the artifacts caused by face rotation discrepancy, enhance image generation quality, and preserve more high-frequency details of the profile image. We also construct a 30,000 face pose dataset to learn about various uncontrolled field environments. Our dataset includes ages of different races and wild backgrounds, allowing us to handle other datasets and obtain better results. Finally, we introduce a discriminator used for recovering the facial structure of the frontal face images. Quantitative and qualitative experimental results show our PFR-GAN can generate high-quality and high-fidelity frontal face images, and our results are better than the state-of-the-art results.

  • Multi-Scale Correspondence Learning for Person Image Generation

    Shi-Long SHEN  Ai-Guo WU  Yong XU  

     
    PAPER-Person Image Generation

      Pubricized:
    2022/04/15
      Page(s):
    804-812

    A generative model is presented for two types of person image generation in this paper. First, this model is applied to pose-guided person image generation, i.e., converting the pose of a source person image to the target pose while preserving the texture of that source person image. Second, this model is also used for clothing-guided person image generation, i.e., changing the clothing texture of a source person image to the desired clothing texture. The core idea of the proposed model is to establish the multi-scale correspondence, which can effectively address the misalignment introduced by transferring pose, thereby preserving richer information on appearance. Specifically, the proposed model consists of two stages: 1) It first generates the target semantic map imposed on the target pose to provide more accurate guidance during the generation process. 2) After obtaining the multi-scale feature map by the encoder, the multi-scale correspondence is established, which is useful for a fine-grained generation. Experimental results show the proposed method is superior to state-of-the-art methods in pose-guided person image generation and show its effectiveness in clothing-guided person image generation.

  • Enhanced Full Attention Generative Adversarial Networks

    KaiXu CHEN  Satoshi YAMANE  

     
    LETTER-Core Methods

      Pubricized:
    2023/01/12
      Page(s):
    813-817

    In this paper, we propose improved Generative Adversarial Networks with an attention module in the Generator, which can enhance the effectiveness of the Generator. Furthermore, recent work has shown that Generator conditioning affects GAN performance. Leveraging this insight, we explored the effect of different normalization (spectral normalization, instance normalization) on the Generator and Discriminator. Moreover, an enhanced loss function, called the Wasserstein Divergence distance, can alleviate the problem that the module is difficult to train in practice.

  • Bearing Remaining Useful Life Prediction Using 2D Attention Residual Network

    Wenrong XIAO  Yong CHEN  Suqin GUO  Kun CHEN  

     
    LETTER-Smart Industry

      Pubricized:
    2022/05/27
      Page(s):
    818-820

    An attention residual network with triple feature as input is proposed to predict the remaining useful life (RUL) of bearings. First, the channel attention and spatial attention are connected in series into the residual connection of the residual neural network to obtain a new attention residual module, so that the newly constructed deep learning network can better pay attention to the weak changes of the bearing state. Second, the “triple feature” is used as the input of the attention residual network, so that the deep learning network can better grasp the change trend of the bearing running state and better predict the RUL of the bearing. Finally, the method is verified by a set of experimental data. The results show the method is simple and effective, has high prediction accuracy, and reduces manual intervention in RUL prediction.

  • Epileptic Seizure Prediction Using Convolutional Neural Networks and Fusion Features on Scalp EEG Signals

    Qixin LAN  Bin YAO  Tao QING  

     
    LETTER-Smart Healthcare

      Pubricized:
    2022/05/27
      Page(s):
    821-823

    Epileptic seizure prediction is an important research topic in clinical epilepsy treatment, which can provide opportunities to take precautionary measures for epilepsy patients and medical staff. EEG is a commonly used tool for studying brain activity, which records the electrical discharge of the brain. Many studies based on machine learning algorithms have been proposed to solve the task using EEG signals. In this study, we propose novel seizure prediction models based on convolutional neural networks and scalp EEG for a binary classification between preictal and interictal states. The short-time Fourier transform has been used to translate raw EEG signals into STFT spectrums, which are applied as input of the models. The fusion features have been obtained through the side-output constructions and used to train and test our models. The test results show that our models can achieve comparable results in both sensitivity and FPR upon fusion features. The proposed patient-specific model can be used in seizure prediction systems for EEG classification.

  • OPENnet: Object Position Embedding Network for Locating Anti-Bird Thorn of High-Speed Railway

    Zhuo WANG  Junbo LIU  Fan WANG  Jun WU  

     
    LETTER-Intelligent Transportation Systems

      Pubricized:
    2022/11/14
      Page(s):
    824-828

    Machine vision-based automatic anti-bird thorn failure inspection, instead of manual identification, remains a great challenge. In this paper, we proposed a novel Object Position Embedding Network (OPENnet), which can improve the precision of anti-bird thorn localization. OPENnet can simultaneously predict the location boxes of the support device and anti-bird thorn by using the proposed double-head network. And then, OPENnet is optimized using the proposed symbiotic loss function (SymLoss), which embeds the object position into the network. The comprehensive experiments are conducted on the real railway video dataset. OPENnet yields competitive performance on anti-bird thorn localization. Specifically, the localization performance gains +3.65 AP, +2.10 AP50, and +1.22 AP75.

  • Clustering-Based Neural Network for Carbon Dioxide Estimation

    Conghui LI  Quanlin ZHONG  Baoyin LI  

     
    LETTER-Intelligent Transportation Systems

      Pubricized:
    2022/08/01
      Page(s):
    829-832

    In recent years, the applications of deep learning have facilitated the development of green intelligent transportation systems (ITS), and carbon dioxide estimation has been one of the important issues in green ITS. Furthermore, the carbon dioxide estimation could be modelled as the fuel consumption estimation. Therefore, a clustering-based neural network is proposed to analyze clusters in accordance with fuel consumption behaviors and obtain the estimated fuel consumption and the estimated carbon dioxide. In experiments, the mean absolute percentage error (MAPE) of the proposed method is only 5.61%, and the performance of the proposed method is higher than other methods.
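
    The MAPE metric quoted in this abstract can be illustrated with a minimal sketch. The function name `mape` and the sample numbers are hypothetical and are not taken from the paper; the sketch only shows the standard definition, the mean of |actual - predicted| / |actual| expressed as a percentage.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent.

    Assumes both sequences have the same non-zero length and that no
    actual value is zero (the metric is undefined in that case).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical fuel-consumption readings and estimates (illustrative only).
example = mape([100.0, 200.0], [110.0, 180.0])
```

    In this illustrative call each estimate is off by 10% of the true value, so the result is approximately 10.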

  • Effectiveness of Feature Extraction System for Multimodal Sensor Information Based on VRAE and Its Application to Object Recognition

    Kazuki HAYASHI  Daisuke TANAKA  

     
    LETTER-Object Recognition and Tracking

      Pubricized:
    2023/01/12
      Page(s):
    833-835

    To achieve object recognition, it is necessary to find the unique features of the objects to be recognized. Results in prior research suggest that methods that use information from multiple modalities are effective for finding the unique features. This paper presents an overview of a system that extracts the features of the objects to be recognized by integrating visual, tactile, and auditory information as multimodal sensor information with VRAE. Furthermore, the effect of changing the combination of modalities is discussed.

  • Special Section on Data Engineering and Information Management
  • FOREWORD Open Access

    Akiyoshi MATONO  

     
    FOREWORD

      Page(s):
    836-837
  • Effective Language Representations for Danmaku Comment Classification in Nicovideo

    Hiroyoshi NAGAO  Koshiro TAMURA  Marie KATSURAI  

     
    PAPER

      Pubricized:
    2023/01/16
      Page(s):
    838-846

    Danmaku commenting has become popular for co-viewing on video-sharing platforms, such as Nicovideo. However, many irrelevant comments usually contaminate the quality of the information provided by videos. Such an information pollutant problem can be solved by a comment classifier trained with an abstention option, which detects comments whose video categories are unclear. To improve the performance of this classification task, this paper presents Nicovideo-specific language representations. Specifically, we used sentences from Nicopedia, a Japanese online encyclopedia of entities that possibly appear in Nicovideo contents, to pre-train a bidirectional encoder representations from Transformers (BERT) model. The resulting model named Nicopedia BERT is then fine-tuned such that it could determine whether a given comment falls into any of predefined categories. The experiments conducted on Nicovideo comment data demonstrated the effectiveness of Nicopedia BERT compared with existing BERT models pre-trained using Wikipedia or tweets. We also evaluated the performance of each model in an additional sentiment classification task, and the obtained results implied the applicability of Nicopedia BERT as a feature extractor of other social media text.

  • Maximizing External Action with Information Provision Over Multiple Rounds in Online Social Networks

    Masaaki MIYASHITA  Norihiko SHINOMIYA  Daisuke KASAMATSU  Genya ISHIGAKI  

     
    PAPER

      Pubricized:
    2023/02/03
      Page(s):
    847-855

    Online social networks have increased their impact on the real world, which motivates information senders to control the propagation process of information to promote particular actions of online users. However, the existing works on information provisioning seem to oversimplify the users' decision-making process that involves information reception, internal actions of social networks, and external actions of social networks. In particular, characterizing the best practices of information provisioning that promotes the users' external actions is a complex task due to the complexity of the propagation process in OSNs, even when the variation of information is limited. Therefore, we propose a new information diffusion model that distinguishes user behaviors inside and outside of OSNs, and formulate an optimization problem to maximize the number of users who take the external actions by providing information over multiple rounds. Also, we define a robust provisioning policy for the problem, which selects a message sequence to maximize the expected number of desired users under the probabilistic uncertainty of OSN settings. Our experiment results infer that there could exist an information provisioning policy that achieves nearly-optimal solutions in different types of OSNs. Furthermore, we empirically demonstrate that the proposed robust policy can be such a universally optimal solution.

  • Construction of a Support Tool for Japanese User Reading of Privacy Policies and Assessment of its User Impact

    Sachiko KANAMORI  Hirotsune SATO  Naoya TABATA  Ryo NOJIMA  

     
    PAPER

      Pubricized:
    2023/02/08
      Page(s):
    856-867

    To protect user privacy and establish self-information control rights, service providers must notify users of their privacy policies and obtain their consent in advance. The frameworks that impose these requirements are mandatory. Although originally designed to protect user privacy, obtaining user consent in advance has become a mere formality. These problems are induced by the gap between service providers' privacy policies, which prioritize the observance of laws and guidelines, and user expectations, which are to easily understand how their data will be handled. To reduce this gap, we construct a tool supporting users in reading privacy policies in Japanese. We designed the tool to present users with separate unique expressions containing relevant information to improve the display format of the privacy policy and render it more comprehensive for Japanese users. To accurately extract the unique expressions from privacy policies, we created training data for machine learning for the constructed tool. The constructed tool provides a summary of privacy policies for users to help them understand the policies of interest. Subsequently, we assess the effectiveness of the constructed tool in experiments and follow-up questionnaires. Our findings reveal that the constructed tool enhances the users' subjective understanding of the services they read about and their awareness of the related risks. We expect that the developed tool will help users better understand the privacy policy content and make educated decisions based on their understanding of how service providers intend to use their personal data.

  • Privacy-Preserving Correlation Coefficient

    Tomoaki MIMOTO  Hiroyuki YOKOYAMA  Toru NAKAMURA  Takamasa ISOHARA  Masayuki HASHIMOTO  Ryosuke KOJIMA  Aki HASEGAWA  Yasushi OKUNO  

     
    PAPER

      Pubricized:
    2023/02/08
      Page(s):
    868-876

    Differential privacy is a confidentiality metric and quantitatively guarantees the confidentiality of individuals. A noise criterion, called sensitivity, must be calculated when constructing a probabilistic disturbance mechanism that satisfies differential privacy. Depending on the statistical process, the sensitivity may be very large or even impossible to compute. As a result, the usefulness of the constructed mechanism may be significantly low; it might even be impossible to directly construct it. In this paper, we first discuss situations in which sensitivity is difficult to calculate, and then propose a differential privacy with additional dummy data as a countermeasure. When the sensitivity in the conventional differential privacy is calculable, a mechanism that satisfies the proposed metric satisfies the conventional differential privacy at the same time, and it is possible to evaluate the relationship between the respective privacy parameters. Next, we derive sensitivity by focusing on correlation coefficients as a case study of a statistical process for which sensitivity is difficult to calculate, and propose a probabilistic disturbing mechanism that satisfies the proposed metric. Finally, we experimentally evaluate the effect of noise on the sensitivity of the proposed and direct methods. Experiments show that privacy-preserving correlation coefficients can be derived with less noise compared to using direct methods.
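
    The role that sensitivity plays in a noise-adding mechanism can be sketched with the conventional Laplace mechanism. This is not the dummy-data method proposed in the paper; it is the textbook baseline, in which the noise scale is sensitivity / epsilon. The function names and parameter values are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with
    rate 1/scale follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Return true_value perturbed with Laplace noise of scale sensitivity/epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    if rng is None:
        rng = random.Random()
    return true_value + laplace_noise(sensitivity / epsilon, rng)


# A larger sensitivity (or a smaller epsilon) widens the noise distribution,
# which is why a very large sensitivity makes the released value less useful.
noisy = laplace_mechanism(0.35, sensitivity=2.0, epsilon=1.0, rng=random.Random(0))
```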

  • Geo-Graph-Indistinguishability: Location Privacy on Road Networks with Differential Privacy

    Shun TAKAGI  Yang CAO  Yasuhito ASANO  Masatoshi YOSHIKAWA  

     
    PAPER

      Pubricized:
    2023/01/16
      Page(s):
    877-894

    In recent years, concerns about location privacy are increasing with the spread of location-based services (LBSs). Many methods to protect location privacy have been proposed in the past decades. In particular, perturbation methods based on Geo-Indistinguishability (GeoI), which randomly perturb a true location to a pseudolocation, are getting attention due to their strong privacy guarantee inherited from differential privacy. However, GeoI is based on the Euclidean plane even though many LBSs are based on road networks (e.g. ride-sharing services). This causes unnecessary noise and thus an insufficient tradeoff between utility and privacy for LBSs on road networks. To address this issue, we propose a new privacy notion, Geo-Graph-Indistinguishability (GeoGI), for locations on a road network to achieve a better tradeoff. We propose Graph-Exponential Mechanism (GEM), which satisfies GeoGI. Moreover, we formalize the optimization problem to find the optimal GEM in terms of the tradeoff. However, the computational complexity of a naive method to find the optimal solution is prohibitive, so we propose a greedy algorithm to find an approximate solution in an acceptable amount of time. Finally, our experiments show that our proposed mechanism outperforms GeoI mechanisms, including optimal GeoI mechanism, with respect to the tradeoff.

  • Prioritization of Lane-Specific Traffic Jam Detection for Automotive Navigation Framework Utilizing Suddenness Index and Automatic Threshold Determination

    Aki HAYASHI  Yuki YOKOHATA  Takahiro HATA  Kouhei MORI  Masato KAMIYA  

     
    PAPER

      Pubricized:
    2023/02/03
      Page(s):
    895-903

    Car navigation systems provide traffic jam information. In this study, we attempt to provide more detailed traffic jam information that considers the lane in which a traffic jam occurs. This makes it possible for users to avoid long waits in queued traffic going toward an unintended destination. Lane-specific traffic jam detection utilizes image processing, which incurs long processing time and high cost. To reduce these, we propose a “suddenness index (SI)” to categorize candidate areas as sudden or periodic. Sudden traffic jams are prioritized as they may lead to accidents. This technology aggregates the number of connected cars for each mesh on a map and quantifies the degree of deviation from the ordinary state. In this paper, we evaluate the proposed method using actual global positioning system (GPS) data and find that the proposed index can cover 100% of sudden lane-specific traffic jams while excluding 82.2% of traffic jam candidates. We also demonstrate the effectiveness of time savings by integrating the proposed method into a demonstration framework. In addition, we improved the proposed method's ability to automatically determine the SI threshold to select the appropriate traffic jam candidates to avoid manual parameter settings.

  • MicroState: An Anomaly Localization Method in Heterogeneous Microservice Systems

    Jingjing YANG  Yuchun GUO  Yishuai CHEN  

     
    PAPER

      Pubricized:
    2023/01/13
      Page(s):
    904-912

    Microservice architecture has been widely adopted for large-scale applications because of its benefits of scalability, flexibility, and reliability. However, microservice architecture also poses new challenges in diagnosing root causes of performance degradation. Existing methods rely on labeled data and suffer a high computation burden. This paper proposes MicroState, an unsupervised and lightweight method to pinpoint the root cause with detailed descriptions. We decompose root cause diagnosis into element location and detailed reason identification. To mitigate the impact of element heterogeneity and dynamic invocations, MicroState generates elements' invoked states, quantifies elements' abnormality by warping-based state comparison, and infers the anomalous group. MicroState locates the root cause element with the consideration of anomaly frequency and persistency. To locate the anomalous metric from diverse metrics, MicroState extracts metrics' trend features and evaluates metrics' abnormality based on their trend feature variation, which reduces the reliance on anomaly detectors. Our experimental evaluation based on public data of the Artificial intelligence for IT Operations Challenge (AIOps Challenge 2020) shows that MicroState locates root cause elements with 87% precision and diagnoses anomaly reasons accurately.

  • Special Section on the Architectures, Protocols, and Applications for the Future Internet
  • FOREWORD Open Access

    ISMAIL ARAI  

     
    FOREWORD

      Page(s):
    913-913
  • Wide-Area and Long-Term Agricultural Sensing System Utilizing UAV and Wireless Technologies

    Hiroshi YAMAMOTO  Shota NISHIURA  Yoshihiro HIGASHIURA  

     
    INVITED PAPER

      Pubricized:
    2023/02/08
      Page(s):
    914-926

    In order to improve crop production and efficiency of farming operations, an IoT (Internet of Things) system for remote monitoring has been attracting a lot of attention. The existing studies have proposed agricultural sensing systems such that environmental information is collected from many sensor nodes installed in farmland through wireless communications (e.g., Wi-Fi, ZigBee). Especially, Low-Power Wide-Area (LPWA) is a focus as a candidate for wireless communication that enables the support of vast farmland for a long time. However, it is difficult to achieve long distance communication even when using the LPWA because a clear line of sight is difficult to keep due to many obstacles such as crops and agricultural machinery in the farmland. In addition, a sensor node cannot run permanently on batteries because the battery capacity is not infinite. On the other hand, an Unmanned Aerial Vehicle (UAV) that can move freely and stably in the sky has been leveraged for agricultural sensor network systems. By utilizing a UAV as the gateway of the sensor network, the gateway can move to the appropriate location to ensure a clear line of sight from the sensor nodes. In addition, the coverage area of the sensor network can be expanded as the UAV travels over a wide area even when short-range and ultra-low-power wireless communication (e.g., Bluetooth Low Energy (BLE)) is adopted. Furthermore, various wireless technologies (e.g., wireless power transfer, wireless positioning) that have the possibility to improve the coverage area and the lifetime of the sensor network have become available. Therefore, in this study, we propose and develop two kinds of new agricultural sensing systems utilizing a UAV and various wireless technologies. The objective of the proposed system is to provide the solution for achieving the wide-area and long-term sensing for the vast farmland. Depending on which problem is in a priority, the proposed system chooses one of two designs. 
    The first design of the system attempts to achieve the wide-area sensing, and so it is based on the LPWA for wireless communication. In the system, to efficiently collect the environmental information, the UAV autonomously travels to search for the locations to maintain the good communication properties of the LPWA to the sensor nodes dispersed over a wide area of farmland. In addition, the second design attempts to achieve the long-term sensing, so it is based on BLE, a typical short-range and ultra-low-power wireless communication technology. In this design, the UAV autonomously flies to the location of sensor nodes and supplies power to them using a wireless power transfer technology for achieving a battery-less sensor node. Through experimental evaluations using a prototype system, it is confirmed that the combination of the UAV and various wireless technologies has the possibility to achieve a wide-area and long-term sensing system for monitoring vast farmland.

  • Performance Aware Egress Path Discovery for Content Provider with SRv6 Egress Peer Engineering

    Yasunobu TOYOTA  Wataru MISHIMA  Koichiro KANAYA  Osamu NAKAMURA  

     
    PAPER

      Pubricized:
    2023/02/22
      Page(s):
    927-939

    QoS of applications is essential for content providers, and it is required to improve the end-to-end communication quality from a content provider to users. Generally, a content provider's data center network is connected to multiple ASes and has multiple egress paths to reach the content user's network. However, on the Internet, the communication quality of network paths outside of the provider's administrative domain is a black box, so multiple egress paths cannot be quantitatively compared. In addition, it is impossible to determine a unique egress path within a network domain because the parameters that affect the QoS of the content are different for each network. We propose a “Performance Aware Egress Path Discovery” method to improve QoS for content providers. The proposed method uses two techniques: Egress Peer Engineering with Segment Routing over IPv6 and Passive End-to-End Measurement. The method is superior in that it allows various metrics depending on the type of content and can be used for measurements without affecting existing systems. To evaluate our method, we deployed the Performance Aware Egress Path Discovery System in an existing content provider network and conducted experiments to provide production services. Our findings from the experiment show that, in this network, 15.9% of users can expect a 30Mbps throughput improvement, and 13.7% of users can expect a 10ms RTT improvement.

  • A Fast Handover Mechanism for Ground-to-Train Free-Space Optical Communication using Station ID Recognition by Dual-Port Camera

    Kosuke MORI  Fumio TERAOKA  Shinichiro HARUYAMA  

     
    PAPER

      Pubricized:
    2023/03/08
      Page(s):
    940-951

    There are demands for high-speed and stable ground-to-train optical communication as a network environment for trains. The existing ground-to-train optical communication system developed by the authors uses a camera and a QPD (Quadrant photo diode) to capture beacon light. The problem with the existing system is that it is impossible to identify the ground station. In the system proposed in this paper, a beacon light modulated with the ID of the ground station is transmitted, and the ground station is identified by demodulating the image from the dual-port camera on the opposite side. In this paper, we developed an actual system and conducted experiments using a car on the road. The results showed that only one packet was lost with the ping command every 1 ms near handover. Although the communication device itself has a bandwidth of 100 Mbps, the throughput before and after the handover was about 94 Mbps, and only dropped to about 89.4 Mbps during the handover.

  • Regular Section
  • Parallelization on a Minimal Substring Search Algorithm for Regular Expressions

    Yosuke OBE  Hiroaki YAMAMOTO  Hiroshi FUJIWARA  

     
    PAPER-Fundamentals of Information Systems

      Pubricized:
    2023/02/08
      Page(s):
    952-958

    Let us consider a regular expression r of length m and a text string T of length n over an alphabet Σ. Then, the RE minimal substring search problem is to find all minimal substrings of T matching r. Yamamoto proposed an O(mn)-time and O(m)-space algorithm using a Thompson automaton. In this paper, we improve Yamamoto's algorithm by introducing parallelism. The proposed algorithm runs in O(mn) time in the worst case and in O(mn/p) time in the best case, where p denotes the number of processors. Besides, we show a parameter related to the parallel time of the proposed algorithm. We evaluate the algorithm experimentally.
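
    The problem statement can be made concrete with a brute-force sketch. It assumes the usual reading of "minimal": a non-empty substring of T that matches r and contains no other matching substring. It uses Python's `re` module and quadratic enumeration, so it is neither Yamamoto's Thompson-automaton algorithm nor the parallel algorithm of the paper; the function name `minimal_matches` is hypothetical.

```python
import re


def minimal_matches(pattern, text):
    """Return (start, end) index pairs of minimal substrings of text matching pattern.

    A match text[start:end] is kept only if no other matching substring
    lies entirely inside it. Empty substrings are ignored by assumption.
    """
    rx = re.compile(pattern)
    n = len(text)
    # Enumerate every non-empty substring and test it against the whole pattern.
    hits = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n + 1)
        if rx.fullmatch(text[i:j]) is not None
    ]
    # Discard any match that strictly contains another match.
    return [
        (i, j)
        for (i, j) in hits
        if not any((a, b) != (i, j) and i <= a and b <= j for (a, b) in hits)
    ]


# For r = a.*b and T = "aabab", the substrings "aab" and "aabab" also match
# but contain a shorter match, so only the two "ab" occurrences remain.
result = minimal_matches(r"a.*b", "aabab")
```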

  • On Lookaheads in Regular Expressions with Backreferences

    Nariyoshi CHIDA  Tachio TERAUCHI  

     
    PAPER-Fundamentals of Information Systems

      Pubricized:
    2023/02/06
      Page(s):
    959-975

    Many modern regular expression engines employ various extensions to give more expressive support for real-world usages. Among the major extensions employed by many of the modern regular expression engines are backreferences and lookaheads. A question of interest about these extended regular expressions is their expressive power. Previous works have shown that (i) the extension by lookaheads does not enhance the expressive power, i.e., the expressive power of regular expressions with lookaheads is still regular, and that (ii) the extension by backreferences enhances the expressive power, i.e., the expressive power of regular expressions with backreferences (abbreviated as rewb) is no longer regular. This raises the following natural question: Does the extension of regular expressions with backreferences by lookaheads enhance the expressive power of regular expressions with backreferences? This paper answers the question positively by proving that adding either positive lookaheads or negative lookaheads increases the expressive power of rewb (the former abbreviated as rewblp and the latter as rewbln). A consequence of our result is that neither the class of finite state automata nor that of memory automata (MFA) of Schmid[2] (which corresponds to regular expressions with backreferences but without lookaheads) corresponds to rewblp or rewbln. To fill the void, as a first step toward building such automata, we propose a new class of automata called memory automata with positive lookaheads (PLMFA) that corresponds to rewblp. The key idea of PLMFA is to extend MFA with a new kind of memories, called positive-lookahead memory, that is used to simulate the backtracking behavior of positive lookaheads. Interestingly, our positive-lookahead memories are almost perfectly symmetric to the capturing-group memories of MFA. Therefore, our PLMFA can be seen as a natural extension of MFA that can be obtained independently of its original intended purpose of simulating rewblp.
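
    The two extensions discussed in this abstract can be illustrated with Python's `re` module, which supports both. The patterns below are hypothetical examples chosen for illustration, and the behaviour of Python's engine is not claimed to coincide with the formal semantics used in the paper.

```python
import re

# Backreference: \1 must repeat exactly what group 1 captured, so this
# pattern accepts the strings a^n b a^n, a language that is not regular.
REWB = re.compile(r"(a*)b\1")

# Positive lookahead: (?=...) inspects the upcoming input without consuming
# it. Here group 1 is captured inside the lookahead and reused afterwards.
REWB_LP = re.compile(r"(?=(a*)b)\1b\1")

# Negative lookahead: (?!...) succeeds only if its sub-pattern does not
# match at the current position; this variant rejects inputs starting with b.
REWB_LN = re.compile(r"(?!b)(a*)b\1")


def accepts(rx, s):
    """Return True if the whole string s is matched by the compiled pattern rx."""
    return rx.fullmatch(s) is not None
```

    For instance, `accepts(REWB, "aabaa")` holds while `accepts(REWB, "aaba")` does not, because the second run of a's must equal the first.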

  • Time Series Forecasting Based on Convolution Transformer

    Na WANG  Xianglian ZHAO  

     
    PAPER-Fundamentals of Information Systems

      Pubricized:
    2023/02/15
      Page(s):
    976-985

    For many fields in real life, time series forecasting is essential. Recent studies have shown that Transformer has certain advantages when dealing with such problems, especially when dealing with long sequence time input and long sequence time forecasting problems. In order to improve the efficiency and local stability of Transformer, these studies combine Transformer and CNN with different structures. However, previous time series forecasting network models based on Transformer cannot make full use of CNN and do not combine the two effectively. In response to this problem in time series forecasting, we propose the time series forecasting algorithm based on convolution Transformer. (1) ES attention mechanism: External attention is combined with the traditional self-attention mechanism through a two-branch network, which reduces the computational cost of the self-attention mechanism and yields higher forecasting accuracy. (2) Frequency enhanced block: A Frequency Enhanced Block is added in front of the ESAttention module, which can capture important structures in time series through frequency domain mapping. (3) Causal dilated convolution: The self-attention mechanism modules are connected by replacing the traditional standard convolution layer with a causal dilated convolution layer, so that the model obtains an exponentially growing receptive field without increasing the calculation consumption. (4) Multi-layer feature fusion: The outputs of different self-attention mechanism modules are extracted, and the convolutional layers are used to adjust the size of the feature map for the fusion. More fine-grained feature information is obtained at negligible computational cost.
Experiments on real world datasets show that the time series network forecasting model structure proposed in this paper can greatly improve the real-time forecasting performance of the current state-of-the-art Transformer model, and the calculation and memory costs are significantly lower. Compared with previous algorithms, the proposed algorithm has achieved a greater performance improvement in both effectiveness and forecasting accuracy.
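
    The claim in item (3), that a causal dilated convolution enlarges the receptive field without extra cost per layer, can be made concrete with a minimal sketch. It assumes stride-1 layers whose dilation doubles at each layer (a common choice, not confirmed by the abstract), and the function names are hypothetical.

```python
def causal_dilated_conv(x, weights, dilation):
    """1-D causal convolution: y[t] = sum_j weights[j] * x[t - j*dilation].

    Positions before the start of the sequence are treated as zeros, so
    y[t] never depends on future inputs.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input steps seen by one output of the stacked layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


def doubling_dilations(num_layers):
    """Assumed schedule: dilation 1, 2, 4, ... (one value per layer)."""
    return [2 ** i for i in range(num_layers)]
```

    With kernel size 3 and four layers, the doubling schedule covers 31 time steps, whereas four standard (dilation 1) layers with the same number of weights cover only 9.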

  • A Practical Model Driven Approach for Designing Security Aware RESTful Web APIs Using SOFL

    Busalire Onesmus EMEKA  Soichiro HIDAKA  Shaoying LIU  

     
    PAPER-Data Engineering, Web Information Systems

      Publicized:
    2023/02/13
      Page(s):
    986-1000

    RESTful web APIs have become ubiquitous, with most modern web applications embracing the micro-service architecture. A RESTful API provides data over the network using HTTP, possibly interacting with databases and other services, and must preserve its security properties. However, REST is not a protocol but rather a set of guidelines on how to design resources accessed over HTTP endpoints. There are guidelines on how related resources should be structured with hierarchical URIs as well as how the different HTTP verbs should be used to represent well-defined actions on those resources. Although security has always been critical in the design of RESTful APIs, there are few or no clear model-driven engineering techniques that use a secure-by-design approach interweaving both the functional and security requirements. We therefore propose an approach to specifying the functional and security requirements of APIs with the practical Structured-Object-oriented Formal Language (SOFL). Our proposed approach provides a generic methodology for designing security-aware APIs by utilizing the concepts of domain models, domain primitives, the Ecore metamodel, and SOFL. We also describe a case study to evaluate the effectiveness of our approach and discuss important issues related to the practical applicability of our method.

  • High-Precision Mobile Robot Localization Using the Integration of RAR and AKF

    Chen WANG  Hong TAN  

     
    PAPER-Information Network

      Publicized:
    2023/01/24
      Page(s):
    1001-1009

    High-precision indoor positioning has gradually become a research hotspot for indoor mobile robots. Relax and Recover (RAR) is an indoor positioning algorithm that uses distance observations. The algorithm restores the robot's trajectory through curve fitting and does not require time synchronization of observations. Positioning can succeed with few observations. However, the algorithm has poor resistance to gross errors and cannot be used for real-time positioning. In this paper, while retaining the advantages of the original algorithm, the RAR algorithm is improved with an adaptive Kalman filter (AKF) based on the innovation sequence to improve its resistance to gross errors. The improved algorithm can be used for real-time navigation and positioning. Experimental validation shows that the improved algorithm is significantly more accurate than the original RAR. Compared with the extended Kalman filter (EKF), the accuracy is also increased by 12.5%, so the improved algorithm can be used for high-precision positioning of indoor mobile robots.

  • Chinese Named Entity Recognition Method Based on Dictionary Semantic Knowledge Enhancement

    Tianbin WANG  Ruiyang HUANG  Nan HU  Huansha WANG  Guanghan CHU  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2023/02/15
      Page(s):
    1010-1017

    Chinese Named Entity Recognition (NER) is a fundamental technology in the field of Chinese natural language processing. It is extensively used in information extraction, intelligent question answering, and knowledge graphs. Nevertheless, due to the diversity and complexity of Chinese, most Chinese NER methods fail to sufficiently capture character-granularity semantics, which affects their performance. In this work, we propose DSKE-Chinese NER: Chinese Named Entity Recognition based on Dictionary Semantic Knowledge Enhancement. We integrate the semantic information of character granularity into the vector space of characters and acquire a vector representation containing semantic information through the attention mechanism. In addition, we verify the appropriate number of semantic layers through a comparative experiment. Experiments on public Chinese datasets such as Weibo, Resume, and MSRA show that the model outperforms character-based LSTM baselines.

  • Prediction of Driver's Visual Attention in Critical Moment Using Optical Flow

    Rebeka SULTANA  Gosuke OHASHI  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2023/01/26
      Page(s):
    1018-1026

    In recent years, drivers' visual attention has been actively studied for driving automation technology. However, few models provide insight into a driver's attention at various moments. All attention models process multi-level image representations with a two-stream or multi-stream network, which increases the computational cost because of the larger number of model parameters. However, multi-level image representations such as optical flow play a vital role in tasks involving videos. Therefore, to reduce the computational cost of a two-stream network while still using multi-level image representation, this work proposes a single-stream model of the driver's visual attention for critical situations. The experiment was conducted using a publicly available critical driving dataset named BDD-A. Qualitative results confirm the effectiveness of the proposed model. Moreover, quantitative results show that the proposed model outperforms state-of-the-art visual attention models according to CC and SIM. Extensive ablation studies examine the presence of optical flow in the model, the position of optical flow in the spatial network, the convolution layers used to process optical flow, and the computational cost compared to a two-stream model.

  • 3D Multiple-Contextual ROI-Attention Network for Efficient and Accurate Volumetric Medical Image Segmentation

    He LI  Yutaro IWAMOTO  Xianhua HAN  Lanfen LIN  Akira FURUKAWA  Shuzo KANASAKI  Yen-Wei CHEN  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2023/02/21
      Page(s):
    1027-1037

    Convolutional neural networks (CNNs) have become popular in medical image segmentation. The widely used deep CNNs are customized to extract multiple representative features for two-dimensional (2D) data, generally called 2D networks. However, 2D networks are inefficient in extracting three-dimensional (3D) spatial features from volumetric images. Although most 2D segmentation networks can be extended to 3D networks, the naively extended 3D methods are resource-intensive. In this paper, we propose an efficient and accurate network for fully automatic 3D segmentation. Specifically, we designed a 3D multiple-contextual extractor to capture rich global contextual dependencies from different feature levels. Then we leveraged an ROI-estimation strategy to crop the ROI bounding box. Meanwhile, we used a 3D ROI-attention module to improve the accuracy of in-region segmentation in the decoder path. Moreover, we used a hybrid Dice loss function to address the issues of class imbalance and blurry contour in medical images. By incorporating the above strategies, we realized a practical end-to-end 3D medical image segmentation with high efficiency and accuracy. To validate the 3D segmentation performance of our proposed method, we conducted extensive experiments on two datasets and demonstrated favorable results over the state-of-the-art methods.

  • Subjective Difficulty Estimation of Educational Comics Using Gaze Features

    Kenya SAKAMOTO  Shizuka SHIRAI  Noriko TAKEMURA  Jason ORLOSKY  Hiroyuki NAGATAKI  Mayumi UEDA  Yuki URANISHI  Haruo TAKEMURA  

     
    PAPER-Educational Technology

      Publicized:
    2023/02/03
      Page(s):
    1038-1048

    This study explores significant eye-gaze features that can be used to estimate subjective difficulty while reading educational comics. Educational comics have grown rapidly as a promising way to teach difficult topics using illustrations and texts. However, comics include a variety of information on one page, so automatically detecting learners' states such as subjective difficulty is difficult with approaches such as system log-based detection, which is common in the Learning Analytics field. In order to solve this problem, this study focused on 28 eye-gaze features, including the proposal of three new features called “Variance in Gaze Convergence,” “Movement between Panels,” and “Movement between Tiles” to estimate two degrees of subjective difficulty. We then ran an experiment in a simulated environment using Virtual Reality (VR) to accurately collect gaze information. We extracted features in two unit levels, page- and panel-units, and evaluated the accuracy with each pattern in user-dependent and user-independent settings, respectively. Our proposed features achieved an average F1 classification-score of 0.721 and 0.742 in user-dependent and user-independent models at panel unit levels, respectively, trained by a Support Vector Machine (SVM).

  • New Training Method for Non-Dominant Hand Pitching Motion Based on Reversal Trajectory of Dominant Hand Pitching Motion Using AR and Vibration

    Masato SOGA  Taiki MORI  

     
    PAPER-Educational Technology

      Publicized:
    2023/02/08
      Page(s):
    1049-1058

    In this paper, we propose a new method for non-dominant limb training. In this method, a learner training the non-dominant limb aims at a target motion generated by reversing his/her own dominant-limb motion. In addition, we designed and developed an interface for the new method that allows the feedback type to be selected. One interface uses AR and sound, and the other uses AR and vibration. We found that vibration feedback was effective for non-dominant hand training of pitching motion, while sound feedback was less effective than vibration.

  • A Computer Simulation Study on Movement Control by Functional Electrical Stimulation Using Optimal Control Technique with Simplified Parameter Estimation

    Fauzan ARROFIQI  Takashi WATANABE  Achmad ARIFIN  

     
    PAPER-Rehabilitation Engineering and Assistive Technology

      Publicized:
    2023/02/21
      Page(s):
    1059-1068

    The purpose of this study was to develop a practical functional electrical stimulation (FES) controller for restoring joint movements, based on an optimal control technique that cascades a linear model predictive control (MPC) and a nonlinear transformation. The cascade configuration was intended to yield an FES controller that can deal with a nonlinear system. The nonlinear transformation converts the linear solution of the linear MPC into a nonlinear solution in the form of an optimized electrical stimulation intensity. Four different types of nonlinear functions were used to realize the nonlinear transformation. A simple parameter estimation method to determine the value of the nonlinear transformation parameter was also developed. The tracking control capability of the proposed controller, together with the parameter estimation, was examined in controlling 1-DOF wrist joint movement through computer simulation. The proposed controller was also compared with a fuzzy FES controller. The proposed MPC-FES controller with the estimated parameter value worked properly and had better control accuracy than the fuzzy controller. The results suggested that the parameter estimation is useful and effective in practical FES control applications for reducing the time needed to determine the parameter value of the proposed controller.

  • Learning Local Similarity with Spatial Interrelations on Content-Based Image Retrieval

    Longjiao ZHAO  Yu WANG  Jien KATO  Yoshiharu ISHIKAWA  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2023/02/14
      Page(s):
    1069-1080

    Convolutional Neural Networks (CNNs) have recently demonstrated outstanding performance in image retrieval tasks. Local convolutional features extracted by CNNs, in particular, show exceptional capability in discrimination. Recent research in this field has concentrated on pooling methods that incorporate local features into global features and assess the global similarity of two images. However, the pooling methods sacrifice the image's local region information and spatial relationships, which are precisely known as the keys to the robustness against occlusion and viewpoint changes. In this paper, instead of pooling methods, we propose an alternative method based on local similarity, determined by directly using local convolutional features. Specifically, we first define three forms of local similarity tensors (LSTs), which take into account information about local regions as well as spatial relationships between them. We then construct a similarity CNN model (SCNN) based on LSTs to assess the similarity between the query and gallery images. The ideal configuration of our method is sought through thorough experiments from three perspectives: local region size, local region content, and spatial relationships between local regions. The experimental results on a modified open dataset (where query images are limited to occluded ones) confirm that the proposed method outperforms the pooling methods because of robustness enhancement. Furthermore, testing on three public retrieval datasets shows that combining LSTs with conventional pooling methods achieves the best results.

  • Selective Learning of Human Pose Estimation Based on Multi-Scale Convergence Network

    Wenkai LIU  Cuizhu QIN  Menglong WU  Wenle BAI  Hongxia DONG  

     
    LETTER-Human-computer Interaction

      Publicized:
    2023/02/15
      Page(s):
    1081-1084

    Pose estimation is a research hot spot in computer vision tasks and the key to computer perception of human activities. The core concept of human pose estimation involves describing the motion of the human body through major joint points. Large receptive fields and rich spatial information facilitate the keypoint localization task, and how to capture features on a larger scale and reintegrate them into the feature space is a challenge for pose estimation. To address this problem, we propose a multi-scale convergence network (MSCNet) with a large receptive field and rich spatial information. The structure of the MSCNet is based on an hourglass network that captures information at different scales to present a consistent understanding of the whole body. The multi-scale receptive field (MSRF) units provide a large receptive field to obtain rich contextual information, which is then selectively enhanced or suppressed by the Squeeze-Excitation (SE) attention mechanism to flexibly perform the pose estimation task. Experimental results show that MSCNet scores 73.1% AP on the COCO dataset, an 8.8% improvement compared to the mainstream CMUPose method. Compared to the advanced CPN, the MSCNet has 68.2% of the computational complexity and only 55.4% of the number of parameters.

  • Blockchain-Based Pension System Ensuring Security, Provenance and Efficiency

    Minhaz KAMAL  Chowdhury Mohammad ABDULLAH  Fairuz SHAIARA  Abu Raihan Mostofa KAMAL  Md Mehedi HASAN  Jik-Soo KIM  Md Azam HOSSAIN  

     
    LETTER-Office Information Systems, e-Business Modeling

      Publicized:
    2023/02/21
      Page(s):
    1085-1088

    This letter presents a digitized pension system based on a consortium blockchain, with the aim of overcoming existing pension system challenges such as multiparty collaboration, manual intervention, high turnaround time, cost transparency, and auditability. In addition, the adoption of Hyperledger Fabric and the introduction of smart contracts aim to transform multi-organizational workflow into a synchronized, automated, modular, and error-free procedure.

  • Local Binary Convolution Based Prior Knowledge of Multi-Direction Features for Finger Vein Verification

    Huijie ZHANG  Ling LU  

     
    LETTER-Pattern Recognition

      Publicized:
    2023/02/22
      Page(s):
    1089-1093

    Finger-vein-based deep neural network authentication systems have been applied widely in real scenarios, such as banking and entrance guard systems. However, to ensure performance, a deep neural network must train many parameters, which requires considerable time and computing resources. This paper proposes a method that introduces artificial features with prior knowledge into the convolution layer. First, it designs a multi-direction pattern based on the traditional local binary pattern, which extracts general spatial information and also reduces the spatial dimension. Then, it establishes a sample-effective deep convolutional neural network by combining this pattern with convolution, giving the network the ability to extract deeper finger vein features. Finally, it trains the model with a composite loss function to increase the inter-class distance and reduce the intra-class distance. Experiments show that the proposed method achieves higher stability and accuracy in finger vein recognition.
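
    For readers unfamiliar with the traditional local binary pattern that the letter builds on, a minimal sketch of the basic 3x3 operator follows. It does not reproduce the multi-direction pattern proposed in the letter; the function name and the clockwise bit ordering are assumptions made for this example.

```python
def lbp_code(patch):
    """Basic 3x3 local binary pattern code of the centre pixel.

    `patch` is a 3x3 list of lists of intensities. Each of the eight
    neighbours contributes one bit: 1 if it is >= the centre, else 0.
    Neighbours are visited clockwise starting at the top-left corner
    (an assumed convention; other orderings are also in use).
    """
    center = patch[1][1]
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2),
               (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (row, col) in enumerate(offsets):
        if patch[row][col] >= center:
            code |= 1 << bit
    return code
```

    A flat patch yields the code 255 because every neighbour equals the centre, while a patch whose neighbours are all darker than the centre yields 0.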

  • Modality-Fused Graph Network for Cross-Modal Retrieval

    Fei WU  Shuaishuai LI  Guangchuan PENG  Yongheng MA  Xiao-Yuan JING  

     
    LETTER-Pattern Recognition

      Publicized:
    2023/02/09
      Page(s):
    1094-1097

    Cross-modal hashing technology has attracted much attention for its favorable retrieval performance and low storage cost. However, for existing cross-modal hashing methods, the heterogeneity of data across modalities is still a challenge and how to fully explore and utilize the intra-modality features has not been well studied. In this paper, we propose a novel cross-modal hashing approach called Modality-fused Graph Network (MFGN). The network architecture consists of a text channel and an image channel that are used to learn modality-specific features, and a modality fusion channel that uses the graph network to learn the modality-shared representations to reduce the heterogeneity across modalities. In addition, an integration module is introduced for the image and text channels to fully explore intra-modality features. Experiments on two widely used datasets show that our approach achieves better results than the state-of-the-art cross-modal hashing methods.

  • Speech Emotion Recognition Using Multihead Attention in Both Time and Feature Dimensions

    Yue XIE  Ruiyu LIANG  Zhenlin LIANG  Xiaoyan ZHAO  Wenhao ZENG  

     
    LETTER-Speech and Hearing

      Publicized:
    2023/02/21
      Page(s):
    1098-1101

    To enhance emotion features and improve the performance of speech emotion recognition, an attention mechanism is employed to identify the important information in both the time and feature dimensions. In the time dimension, multi-head attention is modified with the last state of the long short-term memory (LSTM) output to match the time accumulation characteristic of LSTM. In the feature dimension, scaled dot-product attention is replaced with additive attention, modeled on the state update of LSTM, to construct multi-head attention. This means that a nonlinear transformation replaces the linear mapping in classical multi-head attention. Experiments on the IEMOCAP dataset demonstrate that the attention mechanism can enhance emotional information and improve the performance of speech emotion recognition.
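
    The difference between the two scoring functions contrasted above can be sketched as follows. This is a generic textbook form of scaled dot-product and additive attention, not the LSTM-inspired variant of the letter; all function names and the parameters `w_q`, `w_k`, and `v` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def scaled_dot_score(query, key):
    """Linear score: (q . k) / sqrt(d)."""
    return sum(q * k for q, k in zip(query, key)) / math.sqrt(len(query))


def additive_score(query, key, w_q, w_k, v):
    """Nonlinear score: v . tanh(W_q q + W_k k)."""
    hidden = [math.tanh(a + b)
              for a, b in zip(matvec(w_q, query), matvec(w_k, key))]
    return sum(vi * hi for vi, hi in zip(v, hidden))


def attention_weights(query, keys, score_fn):
    """Normalise the scores of one query against all keys."""
    return softmax([score_fn(query, key) for key in keys])
```

    Either scoring function can be passed to `attention_weights`; for the additive form, the learned parameters would be bound first, for example with `functools.partial`.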

  • Wider Depth Dynamic Range Using Occupancy Map Correction for Immersive Video Coding

    Sung-Gyun LIM  Dong-Ha KIM  Kwan-Jung OH  Gwangsoon LEE  Jun Young JEONG  Jae-Gon KIM  

     
    LETTER-Image Processing and Video Processing

      Publicized:
    2023/02/10
      Page(s):
    1102-1105

    The MPEG Immersive Video (MIV) standard for immersive video coding provides users with an immersive sense of 6 degrees of freedom (6DoF) of view position and orientation by efficiently compressing multiview video acquired from different positions in a limited 3D space. In the MIV reference software called Test Model for Immersive Video (TMIV), the number of pixels to be compressed and transmitted is reduced by removing inter-view redundancy. Therefore, the occupancy information that indicates whether each pixel is valid or invalid must also be transmitted to the decoder for viewport rendering. The occupancy information is embedded in a geometry atlas and transmitted to the decoder side. At this time, to prevent occupancy errors that may occur during the compression of the geometry atlas, a guard band is set in the depth dynamic range. Reducing this guard band can improve the rendering quality by allowing a wider dynamic range for depth representation. Therefore, in this paper, based on the analysis of occupancy error of the current TMIV, two methods of occupancy error correction which allow depth dynamic range extension in the case of computer-generated (CG) sequences are presented. The experimental results show that the proposed method gives an average 2.2% BD-rate bit saving for CG compared to the existing TMIV.

  • Convolution Block Feature Addition Module (CBFAM) for Lightweight and Fast Object Detection on Non-GPU Devices

    Min Ho KWAK  Youngwoo KIM  Kangin LEE  Jae Young CHOI  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2023/01/24
      Page(s):
    1106-1110

    This letter proposes a novel lightweight deep learning object detector named LW-YOLOv4-tiny, which incorporates the convolution block feature addition module (CBFAM). The novelty of LW-YOLOv4-tiny is the use of channel-wise convolution and element-wise addition in the CBFAM instead of utilizing the concatenation of different feature maps. The model size and computation requirement are reduced by up to 16.9 Mbytes, 5.4 billion FLOPs (BFLOPS), and 11.3 FPS, which is 31.9%, 22.8%, and 30% smaller and faster than the most recent version of YOLOv4-tiny. From the MSCOCO2017 and PASCAL VOC2012 benchmarks, LW-YOLOv4-tiny achieved 40.2% and 69.3% mAP, respectively.