IEICE TRANSACTIONS on Information

Impact Factor: 0.72
Eigenfactor: 0.002
Article Influence: 0.1
CiteScore: 1.4


Advance publication (published online immediately after acceptance)

Vision Transformer with Key-select Routing Attention for Single Image Dehazing
Lihan TONG Weijia LI Qingxia YANG Liyuan CHEN Peng CHEN

Publicized:
2024/07/01
- Summary
- Free PDF (2MB)
Towards Superior Pruning Performance in Federated Learning with Discriminative Data
Yinan YANG

Publicized:
2024/06/27
- Summary
- Free PDF (7.9MB)
CLEAR & RETURN: Stopping Run-time Countermeasures in Cryptographic Primitives
Myung-Hyun KIM Seungkwang LEE

Publicized:
2024/06/26
- Summary
- Free PDF (1.7MB)
SH-YOLO: Small Target High Performance YOLO for abnormal behavior detection in escalator scene
Shuoyan LIU Chao LI Yuxin LIU Yanqiu WANG

Publicized:
2024/06/26
- Summary
- Free PDF (708.4KB)
Design and implementation of opto-electrical hybrid floating-point multipliers
Takumi INABA Takatsugu ONO Koji INOUE Satoshi KAWAKAMI

Publicized:
2024/06/26
- Summary
- Free PDF (2.5MB)
Geometric Refactoring of Quantum and Reversible Circuits using Graph Algorithms
Martin LUKAC Saadat NURSULTAN Georgiy KRYLOV Oliver KESZOCZE Abilmansur RAKHMETTULAYEV Michitaka KAMEYAMA

Publicized:
2024/06/24
- Summary
- Free PDF (1010KB)
IAD-Net: Single-Image Dehazing Network Based on Image Attention
Zheqing ZHANG Hao ZHOU Chuan LI Weiwei JIANG

Publicized:
2024/06/20
- Summary
- Free PDF (9.8MB)
Improving the Accuracy of Differential-Neural Distinguisher For DES, Chaskey, and PRESENT
Liu ZHANG Zilong WANG Yindong CHEN

Publicized:
2024/06/20
- Summary
- Free PDF (355.3KB)
Multi-Scale Contrastive Learning for Human Pose Estimation
Wenxia Bao An Lin Hua Huang Xianjun Yang Hemu Chen

Publicized:
2024/06/17
- Summary
- Free PDF (1MB)
HDR-VDA: A Full Stage Data Augmentation Method for HDR Video Reconstruction
Fengshan ZHAO Qin LIU Takeshi IKENAGA

Publicized:
2024/06/17
- Summary
- Free PDF (1.2MB)
Evaluating Introduction of Systems by Goal Dependency Modeling
Haruhiko KAIYA Shinpei OGATA Shinpei HAYASHI

Publicized:
2024/06/11
- Summary
- Free PDF (1.3MB)
MISpeller: Multimodal Information Enhancement for Chinese Spelling Correction
Jiakai LI Jianyong DUAN Hao WANG Li HE Qing ZHANG

Publicized:
2024/06/07
- Summary
- Free PDF (3.3MB)
Integrating Event Elements for Chinese-Vietnamese Cross-lingual Event Retrieval
Yuxin HUANG Yuanlin YANG Enchang ZHU Yin LIANG Yantuan XIAN

Publicized:
2024/06/04
- Summary
- Free PDF (3.9MB)
Space-efficient FPT Algorithms for Degeneracy
Naohito MATSUMOTO Kazuhiro KURITA Masashi KIYOMI

Publicized:
2024/05/31
- Summary
- Free PDF (101.3KB)
Learning Fast Deployment for UAV-Assisted Disaster System
Na XING Lu LI Ye ZHANG Shiyi YANG

Publicized:
2024/05/30
- Summary
- Free PDF (10.3MB)
TDEM: Table data extraction model based on cell segmentation
Zhe Wang Zhe-Ming Lu Hao Luo Yang-Ming Zheng

Publicized:
2024/05/30
- Summary
- Free PDF (838KB)
Reliable image matching using optimal combination of color and intensity information based on relationship with surrounding objects
Rina TAGAMI Hiroki KOBAYASHI Shuichi AKIZUKI Manabu HASHIMOTO

Publicized:
2024/05/30
- Summary
- Free PDF (4.2MB)
The Least Core of Routing Game Without Triangle Inequality
Tomohiro KOBAYASHI Tomomi MATSUI

Publicized:
2024/05/30
- Summary
- Free PDF (232.9KB)
Enumerating floorplans with Aligned Columns
Shin-ichi NAKANO

Publicized:
2024/05/30
- Summary
- Free PDF (365.4KB)
A Two-Phase Algorithm for Reliable and Energy-Efficient Heterogeneous Embedded Systems
Hongzhi XU Binlian ZHANG

Publicized:
2024/05/27
- Summary
- Free PDF (902.4KB)
Smart Contract Timestamp Vulnerability Detection Based on Code Homogeneity
Weizhi WANG Lei XIA Zhuo ZHANG Xiankai MENG

Publicized:
2024/05/27
- Summary
- Free PDF (706KB)
Neural End-to-end Speech Translation Leveraged by ASR Posterior Distribution
Yuka KO Katsuhito SUDOH Sakriani SAKTI Satoshi NAKAMURA

Publicized:
2024/05/24
- Summary
- Free PDF (1.5MB)
Watermarking Method with Scaling Rate Estimation Using Pilot Signal
Rinka KAWANO Masaki KAWAMURA

Publicized:
2024/05/22
- Summary
- Free PDF (1MB)
Type-enhanced Ensemble Triple Representation via Triple-aware Attention for Cross-lingual Entity Alignment
Zhishuo ZHANG Chengxiang TAN Xueyan ZHAO Min YANG

Publicized:
2024/05/22
- Summary
- Free PDF (5.1MB)
Joint Optimization of Task Offloading and Resource Allocation for UAV-Assisted Edge Computing: A Stackelberg Bilayer Game Approach
Peng WANG Guifen CHEN Zhiyao SUN

Publicized:
2024/05/21
- Summary
- Free PDF (678.2KB)
EfficientNet Empowered by Dendritic Learning for Diabetic Retinopathy
Zeyuan JU Zhipeng LIU Yu GAO Haotian LI Qianhang DU Kota YOSHIKAWA Shangce GAO

Publicized:
2024/05/20
- Summary
- Free PDF (517.4KB)
6T-8T hybrid SRAM for lower-power neural-network processing by lowering operating voltage
Ji WU Ruoxi YU Kazuteru NAMBA

Publicized:
2024/05/20
- Summary
- Free PDF (495.1KB)
Chinese Spelling Correction Based on Knowledge Enhancement and Contrastive Learning
Hao WANG Yao Ma Jianyong Duan Li HE Xin Li

Publicized:
2024/05/17
- Summary
- Free PDF (1.2MB)
TIG: A Multitask Temporal Interval Guided Framework for Key Frame Detection
Shijie WANG Xuejiao HU Sheng LIU Ming LI Yang LI Sidan DU

Publicized:
2024/05/17
- Summary
- Free PDF (10.6MB)
Node-to-node and Node-to-set Disjoint Paths Problems in Bicubes
Arata KANEKO Htoo Htoo Sandi KYAW Kunihiro FUJIYOSHI Keiichi KANEKO

Publicized:
2024/05/17
- Summary
- Free PDF (1MB)
Remote Sensing Image Dehazing Using Multi-Scale Gated Attention For Flight Simulator
Qi LIU Bo WANG Shihan TAN Shurong ZOU Wenyi GE

Publicized:
2024/05/14
- Summary
- Free PDF (4.2MB)
Large Class Detection using GNNs: A graph based deep learning approach utilizing three typical GNN model architectures
HanYu Zhang Tomoji Kishi

Publicized:
2024/05/14
- Summary
- Free PDF (1.4MB)
Functional Decomposition of Symmetric Multiple-Valued Functions and Their Compact Representation in Decision Diagrams
Shinobu NAGAYAMA Tsutomu SASAO Jon T. BUTLER

Publicized:
2024/05/14
- Summary
- Free PDF (303.5KB)
Greedy selection of sensors for linear Bayesian estimation under correlated noise
Yoon Hak KIM

Publicized:
2024/05/14
- Summary
- Free PDF (115KB)
New Bounds for Quick Computation of the Lower Bound on the Gate Count of Toffoli-Based Reversible Logic Circuits
Takashi HIRAYAMA Rin SUZUKI Katsuhisa YAMANAKA Yasuaki NISHITANI

Publicized:
2024/05/10
- Summary
- Free PDF (222.1KB)
Evaluation of Multi-valued Data Transmission in Two-Dimensional Symbol Mapping using Linear Mixture Model
Yosuke IIJIMA Atsunori OKADA Yasushi YUMINAKA

Publicized:
2024/05/09
- Summary
- Free PDF (8.2MB)
Using Genetic Algorithm and Mathematical Programming Model for Ambulance Location Problem in Emergency Medical Service
Batnasan Luvaanjalba Elaine Yi-Ling Wu

Publicized:
2024/05/08
- Summary
- Free PDF (909.3KB)
Enhanced Data Transfer Cooperating with Artificial Triplets for Scene Graph Generation
KuanChao CHU Satoshi YAMAZAKI Hideki NAKAYAMA

Publicized:
2024/04/30
- Summary
- Free PDF (9.2MB)
A mmWave sensor and camera fusion system for indoor occupancy detection and tracking
Shenglei LI Haoran LUO Tengfei SHAO Reiko HISHIYAMA

Publicized:
2024/04/26
- Summary
- Free PDF (4.1MB)
Evaluating PAM-4 Data Transmission Quality using Multi-Dimensional Mapping of Received Symbols
Yasushi YUMINAKA Kazuharu NAKAJIMA Yosuke IIJIMA

Publicized:
2024/04/25
- Summary
- Free PDF (7.5MB)
Unsupervised Intrusion Detection Based on Asymmetric Auto-Encoder Feature Extraction
Chunbo Liu Liyin Wang Zhikai Zhang Chunmiao Xiang Zhaojun Gu Zhi Wang Shuang Wang

Publicized:
2024/04/25
- Summary
- Free PDF (1.2MB)
Reinforced Voxel-RCNN: An efficient 3D Object Detection Method Based on Feature Aggregation
Jia-ji JIANG Hai-bin WAN Hong-min SUN Tuan-fa QIN Zheng-qiang WANG

Publicized:
2024/04/24
- Summary
- Free PDF (4.6MB)
A Channel Contrastive Attention-based Local-Nonlocal Mutual block on Super-Resolution
Yuhao LIU Zhenzhong CHU Lifei WEI

Publicized:
2024/04/23
- Summary
- Free PDF (1.5MB)
Error-Tolerance-Aware Write-Energy Reduction of MTJ-Based Quantized Neural Network Hardware
Ken ASANO Masanori NATSUI Takahiro HANYU

Publicized:
2024/04/22
- Summary
- Free PDF (2.1MB)
Skin diagnostic method using Fontana-Masson stained images of stratum corneum cells
Shuto HASEGAWA Koichiro ENOMOTO Taeko MIZUTANI Yuri OKANO Takenori TANAKA Osamu SAKAI

Publicized:
2024/04/19
- Summary
- Free PDF (6.6MB)
Confidence-Driven Contrastive Learning for Document Classification without Annotated Data
Zhewei XU Mizuho IWAIHARA

Publicized:
2024/04/19
- Summary
- Free PDF (2.2MB)
Delta-Sigma Domain Signal Processing Revisited with Related Topics in Stochastic Computing
Takao WAHO Akihisa KOYAMA Hitoshi HAYASHI

Publicized:
2024/04/17
- Summary
- Free PDF (1.7MB)
Extending Binary Neural Networks to Bayesian Neural Networks with Probabilistic Interpretation of Binary Weights
Taisei SAITO Kota ANDO Tetsuya ASAI

Publicized:
2024/04/17
- Summary
- Free PDF (1.1MB)
Unveiling Python Version Compatibility Challenges in Code Snippets on Stack Overflow
Shiyu YANG Tetsuya KANDA Daniel M. GERMAN Yoshiki HIGO

Publicized:
2024/04/16
- Summary
- Free PDF (488.3KB)
On Easily Reconstructable Logic Functions
Tsutomu SASAO

Publicized:
2024/04/16
- Summary
- Free PDF (240.7KB)
Tracking WebVR User Activities through Hand Motions: An Attack Perspective
Jiyeon LEE

Publicized:
2024/04/16
- Summary
- Free PDF (3MB)
Permissionless Blockchain-Based Sybil-Resistant Self-Sovereign Identity Utilizing Attested Execution Secure Processors
Koichi MORIYAMA Akira OTSUKA

Publicized:
2024/04/15
- Summary
- Free PDF (4.1MB)
Cross-Corpus Speech Emotion Recognition Based on Causal Emotion Information Representation
Hongliang FU Qianqian LI Huawei TAO Chunhua ZHU Yue XIE Ruxue GUO

Publicized:
2024/04/12
- Summary
- Free PDF (8.1MB)
Investigating and Enhancing the Neural Distinguisher for Differential Cryptanalysis
Gao WANG Gaoli WANG Siwei SUN

Publicized:
2024/04/12
- Summary
- Free PDF (1.7MB)
Nuclear Norm Minus Frobenius Norm Minimization with Rank Residual Constraint for Image Denoising
Hua HUANG Yiwen SHAN Chuan LI Zhi WANG

Publicized:
2024/04/09
- Summary
- Free PDF (6.9MB)
Improved Just Noticeable Difference Model Based Algorithm for Fast CU Partition in V-PCC
Zhi LIU Heng WANG Yuan LI Hongyun LU Hongyuan JING Mengmeng ZHANG

Publicized:
2024/04/05
- Summary
- Free PDF (1.1MB)
MDX-Mixer: Music Demixing by Leveraging Source Signals Separated by Existing Demixing Models
Tomoyasu NAKANO Masataka GOTO

Publicized:
2024/04/05
- Summary
- Free PDF (2.4MB)
Machine Learning-based System for Heat-Resistant Analysis of Car Lamp Design
Hyebong CHOI Joel SHIN Jeongho KIM Samuel YOON Hyeonmin PARK Hyejin CHO Jiyoung JUNG

Publicized:
2024/04/03
- Summary
- Free PDF (1.4MB)
Agent Allocation-Action Learning with Dynamic Heterogeneous Graph in Multi-task Games
Xianglong LI Yuan LI Jieyuan ZHANG Xinhai XU Donghong LIU

Publicized:
2024/04/03
- Summary
- Free PDF (3.9MB)
FSAMT : Face Shape Adaptive Makeup Transfer
Haoran LUO Tengfei SHAO Shenglei LI Reiko HISHIYAMA

Publicized:
2024/04/02
- Summary
- Free PDF (984.6KB)
Artifact Removal Using Attention Guided Local-Global Dual-Stream Network for Sparse-View CT Reconstruction
Chang SUN Yitong LIU Hongwen YANG

Publicized:
2024/03/29
- Summary
A CNN-based feature pyramid segmentation Strategy for acoustic scene classification
Ji XI Yue XIE Pengxu JIANG Wei JIANG

Publicized:
2024/03/26
- Summary
An IP Core Protection Scheme Based on Hybrid Lightweight Encryption for Neuromorphic Computing System
Ming PAN

The article processing charge of this paper has not been paid.

Publicized:
2022/09/14
- Summary

Whole issue (109.2MB)

Volume E106-D No.5 (Publication Date:2023/05/01)

Special Section on Deep Learning Technologies: Architecture, Optimization, Techniques, and Applications

FOREWORD Open Access
Chi-Hua CHEN

FOREWORD

Page(s):
579-580
- HTML
- Free PDF (70.9KB)
A Visual Question Answering Network Merging High- and Low-Level Semantic Information
Huimin LI Dezhi HAN Chongqing CHEN Chin-Chen CHANG Kuan-Ching LI Dun LI

PAPER-Core Methods

Publicized:
2022/01/06
Page(s):
581-589
Visual Question Answering (VQA) usually uses deep attention mechanisms to learn fine-grained visual content of images and textual content of questions. However, the deep attention mechanism can only learn high-level semantic information while ignoring the impact of the low-level semantic information on answer prediction. To address this, we design a High- and Low-Level Semantic Information Network (HLSIN), which employs two strategies to fuse high-level and low-level semantic information. The first strategy, adaptive weight learning, allows different levels of semantic information to learn weights separately. The second, a gate-sum mechanism, suppresses invalid information at the various levels and fuses the valid information. On the benchmark VQA-v2 dataset, we quantitatively and qualitatively evaluate HLSIN and conduct extensive ablation studies to explore the reasons behind HLSIN's effectiveness. Experimental results demonstrate that HLSIN significantly outperforms the previous state-of-the-art, with an overall accuracy of 70.93% on test-dev.
The Comparison of Attention Mechanisms with Different Embedding Modes for Performance Improvement of Fine-Grained Classification
Wujian YE Run TAN Yijun LIU Chin-Chen CHANG

PAPER-Core Methods

Publicized:
2021/12/22
Page(s):
590-600
Fine-grained image classification is one of the key basic tasks of computer vision. Traditional deep convolutional neural networks (DCNNs) combined with attention mechanisms can focus on partial and local features of fine-grained images, but little consideration has been given to how the different attention modules are embedded in the network, which leads to unsatisfactory classification results. To solve this problem, three attention mechanisms, namely the SE, CBAM and ECA modules, are introduced into DCNNs (such as ResNet and VGGNet) so that the network can better focus on the key local features of salient regions in the image. At the same time, we adopt three embedding modes for the attention modules (serial, residual and parallel) to further improve the performance of the classification model. The experimental results show that the three attention modules combined with the three embedding modes can effectively improve the performance of DCNNs. Moreover, compared with SE and ECA, CBAM has stronger feature extraction capability. In particular, the parallel-embedded CBAM makes the local information attended to by the DCNN richer and more accurate and yields the best results, 1.98% and 1.57% higher than the original VGG16 and ResNet34 on the CUB-200-2011 dataset, respectively. The visualization analysis also indicates that the attention modules can be easily embedded into DCNNs, especially in the parallel mode, with stronger generality.
A Novel Differential Evolution Algorithm Based on Local Fitness Landscape Information for Optimization Problems
Jing LIANG Ke LI Kunjie YU Caitong YUE Yaxin LI Hui SONG

PAPER-Core Methods

Publicized:
2023/02/13
Page(s):
601-616
The selection of the mutation strategy greatly affects the performance of the differential evolution (DE) algorithm. For different types of optimization problems, different mutation strategies should be selected. How to choose a suitable mutation strategy for different problems is a challenging task. To deal with this challenge, this paper proposes a novel DE algorithm based on the local fitness landscape, called FLIDE. In the proposed method, fitness landscape information is obtained to guide the selection of mutation operators. In this way, different problems can be solved with proper evolutionary mechanisms. Moreover, a population adjustment method is used to balance the search ability and population diversity. On the one hand, the diversity of the population in the early stage is enhanced with a relatively large population. On the other hand, the computational cost is reduced in the later stage with a relatively small population. The evolutionary information is utilized as much as possible to guide the search direction. The proposed method is compared with five popular algorithms on 30 test functions with different characteristics. Experimental results show that the proposed FLIDE is more effective on problems with high dimensions.
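As a rough illustration of the two ingredients this abstract names, alternative mutation operators and a population that shrinks over the run, the Python sketch below shows two standard DE mutation operators (DE/rand/1 and DE/best/1) and a linear population-size schedule. The abstract does not state how FLIDE selects operators from landscape information or how it adjusts the population, so no selection rule is sketched, and the function names and the linear schedule are assumptions for illustration rather than the authors' method.

```python
import random
from typing import List, Sequence

Vector = List[float]


def de_rand_1(pop: Sequence[Vector], i: int, f: float, rng: random.Random) -> Vector:
    """Standard DE/rand/1 mutation: v = x_r1 + F * (x_r2 - x_r3)."""
    # Three distinct individuals, none of them the target index i.
    r1, r2, r3 = rng.sample([k for k in range(len(pop)) if k != i], 3)
    return [a + f * (b - c) for a, b, c in zip(pop[r1], pop[r2], pop[r3])]


def de_best_1(pop: Sequence[Vector], fitness: Sequence[float], i: int,
              f: float, rng: random.Random) -> Vector:
    """Standard DE/best/1 mutation (minimization): v = x_best + F * (x_r1 - x_r2)."""
    best = min(range(len(pop)), key=lambda k: fitness[k])
    r1, r2 = rng.sample([k for k in range(len(pop)) if k != i], 2)
    return [a + f * (b - c) for a, b, c in zip(pop[best], pop[r1], pop[r2])]


def population_size(evals_used: int, max_evals: int, n_init: int, n_min: int) -> int:
    """Assumed schedule: shrink linearly from n_init to n_min as the budget is spent."""
    frac = min(max(evals_used / max_evals, 0.0), 1.0)
    return round(n_init + (n_min - n_init) * frac)
```

DE/rand/1 favors exploration and DE/best/1 favors exploitation, which is one reason an algorithm might switch between them depending on the problem.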
Effectively Utilizing the Category Labels for Image Captioning
Junlong FENG Jianping ZHAO

PAPER-Core Methods

Publicized:
2021/12/13
Page(s):
617-624
As a further investigation of the image captioning task, some works have extended vision-text datasets for specific subtasks, such as stylized caption generation. The corpus in such a dataset is usually composed of obvious sentiment-bearing words. In some special cases, however, the captions are classified depending on image category. This results in a latent problem: the generated sentences are close in semantic meaning but belong to different or even opposite categories. It is therefore worth exploring an effective way to utilize the image category label to boost the caption difference. To this end, we propose an image captioning network with a label control mechanism (LCNET) in this paper. First, to further improve the caption difference, LCNET employs a semantic enhancement module to provide the decoder with global semantic vectors. Then, through the proposed label control LSTM, LCNET can dynamically modulate the caption generation depending on the image category labels. Finally, the decoder integrates the spatial image features with global semantic vectors to output the caption. Evaluation with all the standard metrics shows that our model outperforms the compared models. Caption analysis demonstrates that our approach can improve the performance of semantic representation. Compared with other label control mechanisms, our model is capable of boosting the caption difference according to the labels while maintaining better consistency with the image content.
A Novel SSD-Based Detection Algorithm Suitable for Small Object
Xi ZHANG Yanan ZHANG Tao GAO Yong FANG Ting CHEN

PAPER-Core Methods

Publicized:
2022/01/06
Page(s):
625-634
The original single-shot multibox detector (SSD) algorithm has good detection accuracy and speed for regular object recognition. However, the SSD is not suitable for detecting small objects for two reasons: 1) the relationships among different feature layers with various scales are not considered, and 2) the predicted results are solely determined by several independent feature layers. To enhance its detection capability for small objects, this study proposes an improved SSD-based algorithm called proportional channels' fusion SSD (PCF-SSD). Three enhancements are provided by this novel PCF-SSD algorithm. First, a fusion feature pyramid model is proposed by concatenating channels of certain key feature layers in a given proportion for object detection. Second, the default box sizes are adjusted properly for small object detection. Third, an improved loss function is suggested to train the above-proposed fusion model, which can further improve object detection performance. A series of experiments are conducted on the public database Pascal VOC to validate the PCF-SSD. Compared with the original SSD algorithm, our algorithm improves the mean average precision and detection accuracy for small objects by 3.3% and 3.9%, respectively, with a detection speed of 40 FPS. Furthermore, the proposed PCF-SSD can achieve a better balance of detection accuracy and efficiency than the original SSD algorithm, as demonstrated by a series of experimental results.
Deep Reinforcement Learning Based Ontology Meta-Matching Technique
Xingsi XUE Yirui HUANG Zeqing ZHANG

PAPER-Core Methods

Publicized:
2022/03/04
Page(s):
635-643
Ontologies are regarded as the solution to data heterogeneity on the Semantic Web (SW), but they also suffer from the heterogeneity problem, which leads to ambiguity in data information. The Ontology Meta-Matching (OMM) technique can solve the ontology heterogeneity problem by aggregating various similarity measures to find the heterogeneous entities. Inspired by the success of Reinforcement Learning (RL) in solving complex optimization problems, this work proposes an RL-based OMM technique to address the ontology heterogeneity problem. First, we propose a novel RL-based OMM framework. Then, a neural network called the evaluated network is proposed to replace the Q table when choosing the agent's next action, which reduces memory consumption and computing time. After that, to better guide the training of the neural network and improve the accuracy of the RL agent, we establish a memory bank to mine depth information during the evaluated network's training procedure, and we use another neural network, called the target network, to save the historical parameters. The experiment uses a well-known benchmark in the ontology matching domain to test our approach's performance, and the comparisons among Deep Reinforcement Learning (DRL), RL and state-of-the-art ontology matching systems show that our approach is able to effectively determine high-quality alignments.
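The three components named in this abstract (a network that replaces the Q table, a memory bank of past experience, and a target network holding older parameters) match the general pattern of deep Q-learning. The Python sketch below illustrates that general pattern only, using a linear Q-function in place of a neural network. The class and function names, the linear model and the hyperparameter values are assumptions for illustration and are not taken from the paper.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[List[float], int, float, List[float], bool]


class MemoryBank:
    """Fixed-capacity store of past transitions, sampled uniformly for training."""

    def __init__(self, capacity: int) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        self.buffer.append(transition)  # the oldest entry is dropped when full

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))


def q_values(weights: List[List[float]], state: List[float]) -> List[float]:
    """Linear stand-in for a Q network: one weight row per action."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def train_step(eval_w: List[List[float]], target_w: List[List[float]],
               batch: List[Transition], lr: float = 0.1, gamma: float = 0.9) -> None:
    """TD updates on the evaluated weights; bootstrap values come from the frozen copy."""
    for state, action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(q_values(target_w, next_state))
        td_error = reward + bootstrap - q_values(eval_w, state)[action]
        eval_w[action] = [w + lr * td_error * s for w, s in zip(eval_w[action], state)]


def sync_target(eval_w: List[List[float]]) -> List[List[float]]:
    """Copy the evaluated weights so the target network lags behind training."""
    return [row[:] for row in eval_w]
```

Keeping the target weights fixed between syncs stabilizes the bootstrap term, and sampling from stored transitions breaks the correlation between consecutive updates.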
Intelligent Tool Condition Monitoring Based on Multi-Scale Convolutional Recurrent Neural Network
Xincheng CAO Bin YAO Binqiang CHEN Wangpeng HE Suqin GUO Kun CHEN

PAPER-Smart Industry

Publicized:
2022/06/16
Page(s):
644-652
Tool condition monitoring is one of the core tasks of intelligent manufacturing in the digital workshop. This paper presents an intelligent tool condition recognition method based on deep learning. First, an industrial microphone is used to collect the acoustic signal during machining; then, a central fractal decomposition algorithm is proposed to extract sensitive information; finally, a multi-scale convolutional recurrent neural network is used for deep feature extraction and pattern recognition. Multi-process milling experiments show that the proposed method is superior to the existing methods, with a recognition accuracy of 88%.
Computer Vision-Based Tracking of Workers in Construction Sites Based on MDNet
Wen LIU Yixiao SHAO Shihong ZHAI Zhao YANG Peishuai CHEN

PAPER-Smart Industry

Publicized:
2022/10/20
Page(s):
653-661
Automatic continuous tracking of objects involved in a construction project is required for such tasks as productivity assessment, unsafe behavior recognition, and progress monitoring. Many computer-vision-based tracking approaches have been investigated and successfully tested on construction sites; however, their practical applications are hindered by tracking accuracy that is limited by the dynamic, complex nature of construction sites (i.e., background clutter, occlusion, and varying scale and pose). To achieve better tracking performance, a novel deep-learning-based tracking approach called the Multi-Domain Convolutional Neural Networks (MD-CNN) is proposed and investigated. The proposed approach consists of two key stages: 1) multi-domain representation of learning; and 2) online visual tracking. To evaluate the effectiveness and feasibility of this approach, it is applied to a metro project in Wuhan, China, and the results demonstrate good tracking performance in construction scenarios with complex backgrounds. The average distance error and F-measure for the MDNet are 7.64 pixels and 67, respectively. The results demonstrate that the proposed approach can be used by site managers to monitor and track workers for hazard prevention on construction sites.
An Improved Insulator and Spacer Detection Algorithm Based on Dual Network and SSD
Yong LI Shidi WEI Xuan LIU Yinzheng LUO Yafeng LI Feng SHUANG

PAPER-Smart Industry

Publicized:
2022/10/17
Page(s):
662-672
Traditional manual inspection is gradually being replaced by automatic inspection with unmanned aerial vehicles (UAVs). However, the computational resources carried by a UAV are limited, while existing deep learning-based algorithms need a large amount of computational resources, which makes online detection impossible. Moreover, there is no effective online detection system at present. To realize high-precision online detection of electrical equipment, this paper proposes an SSD (Single Shot Multibox Detector) detection algorithm based on an improved Dual network for images of insulators and spacers taken by UAVs. The proposed algorithm uses MnasNet and MobileNetv3 to form the Dual network to extract multi-level features, which overcomes the shortcoming of a backbone based on a single convolutional network for feature extraction. Then the features extracted from the two networks are fused together to obtain features with high-level semantic information. Finally, the proposed algorithm is tested on the public dataset of insulators and spacers. The experimental results show that the proposed algorithm can detect insulators and spacers efficiently. Compared with other methods, the proposed algorithm has the advantages of smaller model size and higher accuracy. The object detection accuracy of the proposed method is up to 95.1%.
Image-to-Image Translation for Data Augmentation on Multimodal Medical Images
Yue PENG Zuqiang MENG Lina YANG

PAPER-Smart Healthcare

Publicized:
2022/03/01
Page(s):
686-696
Medical images play an important role in medical diagnosis. However, acquiring a large number of datasets with annotations is still a difficult task in the medical field. For this reason, research in the field of image-to-image translation is combined with computer-aided diagnosis, and data augmentation methods based on generative adversarial networks are applied to medical images. In this paper, we try to perform data augmentation on unimodal data. The designed StarGAN V2 based network has high performance in augmenting the dataset using a small number of original images, and the augmented data is expanded from unimodal data to multimodal medical images, and this multimodal medical image data can be applied to the segmentation task with some improvement in the segmentation results. Our experiments demonstrate that the generated multimodal medical image data can improve the performance of glioma segmentation.
MolHF: Molecular Heterogeneous Attributes Fusion for Drug-Target Affinity Prediction on Heterogeneity
Runze WANG Zehua ZHANG Yueqin ZHANG Zhongyuan JIANG Shilin SUN Guixiang MA

PAPER-Smart Healthcare

Publicized:
2022/05/31
Page(s):
697-706
Recent studies in protein structure prediction such as AlphaFold have drawn great attention to deep learning for the Drug-Target Affinity (DTA) task. Most works are dedicated to embedding a single molecular property and homogeneous information, ignoring the diverse heterogeneous information gains that are contained in the molecules and interactions. Motivated by this, we propose an end-to-end deep learning framework to perform Molecular Heterogeneous features Fusion (MolHF) for DTA prediction on heterogeneity. To address the challenge that biochemical attributes are located in different heterogeneous spaces, we design a Molecular Heterogeneous Information Learning module with multi-strategy learning. In particular, a Molecular Heterogeneous Attention Fusion module is presented to obtain the gains of molecular heterogeneous features. With these, the diversity of molecular structure information for drugs can be extracted. Extensive experiments on two benchmark datasets show that our method outperforms the baselines in all four metrics. Ablation studies validate the effect of attentive fusion and of multiple groups of drug heterogeneous features. Visual presentations demonstrate the impact of the protein embedding level and the model's ability to fit data. In summary, the diverse gains brought by heterogeneous information contribute to drug-target affinity prediction.
The Effectiveness of Data Augmentation for Mature White Blood Cell Image Classification in Deep Learning — Selection of an Optimal Technique for Hematological Morphology Recognition —
Hiroyuki NOZAKA Kosuke KAMATA Kazufumi YAMAGATA

PAPER-Smart Healthcare

Publicized:
2022/11/22
Page(s):
707-714
The data augmentation method is known as a helpful technique to generate a dataset with a large number of images from one with a small number of images for supervised training in deep learning. However, a low validity augmentation method for image recognition was reported in a recent study on artificial intelligence (AI). This study aimed to clarify the optimal data augmentation method in deep learning model generation for the recognition of white blood cells (WBCs). Study Design: We conducted three different data augmentation methods (rotation, scaling, and distortion) on original WBC images, with each AI model for WBC recognition generated by supervised training. The subjects of the clinical assessment were 51 healthy persons. Thin-layer blood smears were prepared from peripheral blood and subjected to May-Grünwald-Giemsa staining. Results: The only significantly effective technique among the AI models for WBC recognition was data augmentation with rotation. By contrast, the effectiveness of both image distortion and image scaling was poor, and improved accuracy was limited to a specific WBC subcategory. Conclusion: Although data augmentation methods are often used for achieving high accuracy in AI generation with supervised training, we consider that it is necessary to select the optimal data augmentation method for medical AI generation based on the characteristics of medical images.
Fish Detecting Using YOLOv4 and CVAE in Aquaculture Ponds with a Non-Uniform Strong Reflection Background
Meng ZHAO Junfeng WU Hong YU Haiqing LI Jingwen XU Siqi CHENG Lishuai GU Juan MENG

PAPER-Smart Agriculture

Publicized:
2022/11/07
Page(s):
715-725
Accurate fish detection is of great significance in aquaculture. However, non-uniform strong reflection in aquaculture ponds affects the precision of fish detection. This paper combines YOLOv4 and CVAE to accurately detect fish in images with non-uniform strong reflection: the reflection in the image is removed first, and the reflection-removed image is then used for fish detection. First, the improved YOLOv4 is applied to detect and mask the strongly reflective region, locating and labeling it for the subsequent reflection removal. Then, CVAE is combined with the improved YOLOv4 to infer the prior distribution of the reflection region and restore the region from that distribution so that the reflection can be removed. To further improve the quality of the reflection-removed images, adversarial learning is added to CVAE. Finally, YOLOv4 is used to detect fish in the high-quality image. In addition, a new image dataset of pond-cultured Takifugu rubripes is constructed, which includes 1000 images with manually annotated fish; a synthetic dataset of 2000 images with strong reflection is also created and merged with the generated dataset for training and for verifying the robustness of the proposed method. Comprehensive experiments compare the proposed method with state-of-the-art fish detection methods without reflection removal on the generated dataset. The results show that the fish detection precision and recall of the proposed method are improved by 2.7% and 2.4%, respectively.
Detection Method of Fat Content in Pig B-Ultrasound Based on Deep Learning
Wenxin DONG Jianxun ZHANG Shuqiu TAN Xinyue ZHANG

PAPER-Smart Agriculture

Publicized:
2022/02/07
Page(s):
726-734
In the pork fat content detection task, traditional physical or chemical methods are strongly destructive, have substantial technical requirements and cannot achieve nondestructive detection without slaughtering. To solve these problems, we propose a novel, convenient and economical method for detecting the fat content of pig B-ultrasound images based on hybrid attention and multiscale fusion learning, which extracts and fuses shallow detail information and deep semantic information at multiple scales. First, a deep learning network is constructed to learn the salient features of fat images through a hybrid attention mechanism. Then, the information describing pork fat is extracted at multiple scales, and the detailed information expressed in the shallow layer and the semantic information expressed in the deep layer are fused. Finally, a deep convolution network is used to predict the fat content, which is compared with the real label. The experimental results show that the determination coefficient is greater than 0.95 on the 130 groups of pork B-ultrasound image data sets, which is 2.90, 6.10 and 5.13 percentage points higher than that of VGGNet, ResNet and DenseNet, respectively. This indicates that the model can effectively identify the B-ultrasound image of pigs and predict the fat content with high accuracy.
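The determination coefficient reported in this abstract is conventionally computed as R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the labels from their mean. A minimal Python sketch of that standard definition follows; the function name is illustrative, and the abstract does not state which exact variant the authors computed.

```python
from typing import Sequence


def determination_coefficient(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Return R^2 = 1 - SS_res / SS_tot for paired labels and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the label mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all labels are identical")
    return 1.0 - ss_res / ss_tot
```

A value of 1 means the predictions match the labels exactly, and 0 means the model does no better than always predicting the mean label.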
Compression of Vehicle and Pedestrian Detection Network Based on YOLOv3 Model
Lie GUO Yibing ZHAO Jiandong GAO

PAPER-Intelligent Transportation Systems

Pubricized:
2022/06/22
Page(s):
735-745
Commonly used object detection algorithms based on convolutional neural networks struggle to meet the real-time requirement on embedded platforms because of their large model size, large amount of calculation, and long inference time. Model compression is therefore needed to reduce the amount of network calculation and increase the speed of network inference. This paper compresses a vehicle and pedestrian detection network by pruning and removing redundant parameters. The vehicle and pedestrian detection network is trained based on the YOLOv3 model, using K-means++ to cluster the anchor boxes. Because the number of targets in the dataset is unbalanced, the detection accuracy is improved by changing the proportion of categorical losses and regression losses for each category in the loss function. A layer and channel pruning algorithm is proposed by combining global channel pruning thresholds and the L1 norm, which can reduce the time cost of the network layer transfer process and the amount of computation. Network layer fusion based on TensorRT is performed, and inference uses half-precision floating point to improve the speed of inference. Results show that the compressed vehicle and pedestrian detection network, with 84% of channels and 15 Shortcut modules pruned, reduces the size by 32% and the amount of calculation by 17%, while the network inference time decreases to 21 ms, which is 1.48 times faster than the network with 84% of channels pruned.
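The abstract above combines a global channel pruning threshold with the L1 norm. The following is a minimal, generic sketch of that idea only, not the authors' algorithm: the data layout (a dict of layers, each a list of flattened filters), the function names, and the rule of always keeping at least one channel per layer are all assumptions made for illustration.

```python
def l1_norms(layer):
    """L1 norm of each output channel; a channel is a flat list of weights."""
    return [sum(abs(w) for w in channel) for channel in layer]


def global_channel_mask(layers, prune_ratio):
    """Return {layer_name: [keep flags]} using one threshold for all layers.

    The threshold is chosen so that roughly `prune_ratio` of all channels
    in the network fall below it (ties at the threshold are kept).
    """
    all_norms = sorted(n for layer in layers.values() for n in l1_norms(layer))
    k = int(len(all_norms) * prune_ratio)
    threshold = all_norms[k] if k < len(all_norms) else float("inf")
    masks = {}
    for name, layer in layers.items():
        norms = l1_norms(layer)
        keep = [n >= threshold for n in norms]
        if not any(keep):
            # Assumption: never remove a whole layer here; keep its strongest channel.
            keep[norms.index(max(norms))] = True
        masks[name] = keep
    return masks


# Hypothetical two-layer network with two channels per layer.
layers = {
    "conv1": [[0.1, -0.1], [1.0, 2.0]],
    "conv2": [[0.05, 0.0], [0.5, -0.5]],
}
print(global_channel_mask(layers, prune_ratio=0.5))
```

In this toy network the two channels with the smallest L1 norms are masked out, one in each layer. A real pipeline would then rebuild the layers without the masked channels and fine-tune the network.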
Dynamic Evolution Simulation of Bus Bunching Affected by Traffic Operation State
Shaorong HU Yuqi ZHANG Yuefei JIN Ziqi DOU

PAPER-Intelligent Transportation Systems

Pubricized:
2022/04/13
Page(s):
746-755
Bus bunching often occurs in public transit systems, resulting in a series of problems such as poor punctuality, long waiting time and low service quality. In this paper, we explore the influence of the discrete distribution of traffic operation state on the dynamic evolution of bus bunching. Firstly, we use a self-organizing map (SOM) to find the threshold of bus bunching and analyze the factors that affect bus bunching based on GPS data of the No. 600 bus line in Xi'an. Then, taking the bus headway as the research index, we construct the bus bunching mechanism model. Finally, a simulation platform is built in MATLAB to examine the trend of headway when various influencing factors show different distribution states along the bus line. In terms of influencing factors, inter-vehicle speed, queuing time at intersections and loading time at stations are shown to have a significant impact on headway between buses. In terms of the impact of the distribution of crowded road sections on headway, long-distance and concentrated crowded road sections will lead to large intervals or bus bunching. When the traffic states along the bus line are randomly distributed among crowded, normal and free, the headway may fluctuate in a large range, which may result in bus bunching, or fluctuate in a small range and remain relatively stable. The headway change curve is determined by the distribution length of each traffic state along the bus line. The research results can help to formulate improvement measures according to traffic operation state for balancing bus headway and alleviating bus bunching.
Semantic Path Planning for Indoor Navigation Tasks Using Multi-View Context and Prior Knowledge
Jianbing WU Weibo HUANG Guoliang HUA Wanruo ZHANG Risheng KANG Hong LIU

PAPER-Positioning and Navigation

Pubricized:
2022/01/20
Page(s):
756-764
Recently, deep reinforcement learning (DRL) methods have significantly improved the performance of target-driven indoor navigation tasks. However, the rich semantic information of environments is still not fully exploited in previous approaches. In addition, existing methods usually tend to overfit on training scenes or objects in target-driven navigation tasks, making it hard to generalize to unseen environments. Human beings can easily adapt to new scenes as they can recognize the objects they see and reason the possible locations of target objects using their experience. Inspired by this, we propose a DRL-based target-driven navigation model, termed MVC-PK, using Multi-View Context information and Prior semantic Knowledge. It relies only on the semantic label of target objects and allows the robot to find the target without using any geometry map. To perceive the semantic contextual information in the environment, object detectors are leveraged to detect the objects present in the multi-view observations. To enable the semantic reasoning ability of indoor mobile robots, a Graph Convolutional Network is also employed to incorporate prior knowledge. The proposed MVC-PK model is evaluated in the AI2-THOR simulation environment. The results show that MVC-PK (1) significantly improves the cross-scene and cross-target generalization ability, and (2) achieves state-of-the-art performance with 15.2% and 11.0% increase in Success Rate (SR) and Success weighted by Path Length (SPL), respectively.
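The two metrics reported above, Success Rate (SR) and Success weighted by Path Length (SPL), have a widely used definition in the embodied-navigation literature: SPL averages, over episodes, the success indicator times the ratio of the shortest path length to the larger of the shortest and the actually travelled path length. A minimal sketch under that definition follows; the episode tuple layout and the sample values are assumptions for illustration.

```python
def success_rate(episodes):
    """Fraction of episodes that reached the target."""
    return sum(1 for success, _, _ in episodes if success) / len(episodes)


def spl(episodes):
    """Success weighted by Path Length.

    Each episode is (success, shortest_path_length, travelled_path_length),
    with shortest_path_length > 0.
    """
    total = 0.0
    for success, shortest, travelled in episodes:
        if success:
            # An optimal path scores 1; detours are penalized proportionally.
            total += shortest / max(travelled, shortest)
    return total / len(episodes)


# Hypothetical episodes: optimal success, success with a detour, failure.
episodes = [(True, 5.0, 5.0), (True, 5.0, 10.0), (False, 5.0, 20.0)]
print(success_rate(episodes), spl(episodes))
```

SPL can never exceed SR, because each successful episode contributes at most 1 to the sum.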
SPSD: Semantics and Deep Reinforcement Learning Based Motion Planning for Supermarket Robot
Jialun CAI Weibo HUANG Yingxuan YOU Zhan CHEN Bin REN Hong LIU

PAPER-Positioning and Navigation

Pubricized:
2022/09/15
Page(s):
765-772
Robot motion planning is an important part of the unmanned supermarket. The challenges of motion planning in supermarkets lie in the diversity of the supermarket environment, the complexity of obstacle movement, and the vastness of the search space. This paper proposes an adaptive Search and Path planning method based on Semantic information and Deep reinforcement learning (SPSD), which effectively improves the autonomous decision-making ability of supermarket robots. Firstly, based on the backbone of deep reinforcement learning (DRL), supermarket robots process real-time information from multi-modality sensors to realize high-speed and collision-free motion planning. Meanwhile, to solve the problem caused by the uncertainty of the reward in deep reinforcement learning, common spatial semantic relationships between landmarks and target objects are exploited to define the reward function. Finally, dynamics randomization is introduced during training to improve the generalization performance of the algorithm. The experimental results show that the SPSD algorithm performs well on three indicators: generalization performance, training time and path planning length. Compared with other methods, the training time of SPSD is reduced by up to 27.42%, the path planning length is reduced by up to 21.08%, and the trained network of SPSD can be applied to unfamiliar scenes safely and efficiently. The results are motivating enough to consider the application of the proposed method in practical scenes. We have uploaded the video of the results of the experiment to https://www.youtube.com/watch?v=h1wLpm42NZk.
An Improved BPNN Method Based on Probability Density for Indoor Location
Rong FEI Yufan GUO Junhuai LI Bo HU Lu YANG

PAPER-Positioning and Navigation

Pubricized:
2022/12/23
Page(s):
773-785
With the widespread use of indoor positioning technology, the need for high-precision positioning services is rising; nevertheless, there are several challenges, such as the difficulty of simulating the distribution of indoor location data and the large inaccuracy of probability computation. To predict indoor location coordinates more accurately, this paper compares three neural network models for indoor location based on WiFi fingerprints: an indoor location algorithm based on an improved back-propagation neural network model, an RSSI indoor location algorithm based on neural network angle change, and an RSSI indoor location algorithm based on deep neural network angle change. Changing the action range of the activation function in the standard back-propagation neural network model achieves the goal of accurately predicting location coordinates. Experimental comparisons of loss rate (loss), accuracy rate (acc), and cumulative distribution function (CDF) show that the revised back-propagation neural network model has strong stability and enhances indoor positioning accuracy.
An Improved Real-Time Object Tracking Algorithm Based on Deep Learning Features
Xianyu WANG Cong LI Heyi LI Rui ZHANG Zhifeng LIANG Hai WANG

PAPER-Object Recognition and Tracking

Pubricized:
2022/01/07
Page(s):
786-793
Visual object tracking is always a challenging task in computer vision. During the tracking, the shape and appearance of the target may change greatly, and because of the lack of sufficient training samples, most of the online learning tracking algorithms will have performance bottlenecks. In this paper, an improved real-time algorithm based on deep learning features is proposed, which combines multi-feature fusion, multi-scale estimation, adaptive updating of target model and re-detection after target loss. The effectiveness and advantages of the proposed algorithm are proved by a large number of comparative experiments with other excellent algorithms on large benchmark datasets.
Learning Pixel Perception for Identity and Illumination Consistency Face Frontalization in the Wild
Yongtang BAO Pengfei ZHOU Yue QI Zhihui WANG Qing FAN

PAPER-Person Image Generation

Pubricized:
2022/06/21
Page(s):
794-803
Face frontalization synthesizes a frontal and realistic face image from a single profile face image, and it has a wide range of applications in face recognition. Although frontalization methods based on deep learning have made substantial progress in recent years, there is still no guarantee that the generated face preserves identity consistency and illumination consistency under large poses. This paper proposes a novel pixel-based feature regression generative adversarial network (PFR-GAN), which can learn to recover local high-frequency details and to generate frontal face images that preserve identity and illumination in an uncontrolled environment. We first propose a Reslu block to obtain richer feature representation and improve the convergence speed of training. We then introduce a feature conversion module to reduce the artifacts caused by face rotation discrepancy, enhance image generation quality, and preserve more high-frequency details of the profile image. We also construct a face pose dataset of 30,000 images to learn about various uncontrolled field environments. Our dataset covers different ages and races as well as wild backgrounds, allowing us to handle other datasets and obtain better results. Finally, we introduce a discriminator used for recovering the facial structure of the frontal face images. Quantitative and qualitative experimental results show that our PFR-GAN can generate high-quality and high-fidelity frontal face images, and our results are better than the state-of-the-art results.
Multi-Scale Correspondence Learning for Person Image Generation
Shi-Long SHEN Ai-Guo WU Yong XU

PAPER-Person Image Generation

Pubricized:
2022/04/15
Page(s):
804-812
A generative model is presented for two types of person image generation in this paper. First, this model is applied to pose-guided person image generation, i.e., converting the pose of a source person image to the target pose while preserving the texture of that source person image. Second, this model is also used for clothing-guided person image generation, i.e., changing the clothing texture of a source person image to the desired clothing texture. The core idea of the proposed model is to establish the multi-scale correspondence, which can effectively address the misalignment introduced by transferring pose, thereby preserving richer information on appearance. Specifically, the proposed model consists of two stages: 1) It first generates the target semantic map imposed on the target pose to provide more accurate guidance during the generation process. 2) After obtaining the multi-scale feature map by the encoder, the multi-scale correspondence is established, which is useful for a fine-grained generation. Experimental results show the proposed method is superior to state-of-the-art methods in pose-guided person image generation and show its effectiveness in clothing-guided person image generation.
Enhanced Full Attention Generative Adversarial Networks
KaiXu CHEN Satoshi YAMANE

LETTER-Core Methods

Pubricized:
2023/01/12
Page(s):
813-817
In this paper, we propose improved Generative Adversarial Networks with an attention module in the Generator, which can enhance the effectiveness of the Generator. Furthermore, recent work has shown that Generator conditioning affects GAN performance. Leveraging this insight, we explore the effect of different normalizations (spectral normalization, instance normalization) on the Generator and the Discriminator. Moreover, an enhanced loss function, called Wasserstein Divergence distance, can alleviate the problem that the module is difficult to train in practice.
Bearing Remaining Useful Life Prediction Using 2D Attention Residual Network
Wenrong XIAO Yong CHEN Suqin GUO Kun CHEN

LETTER-Smart Industry

Pubricized:
2022/05/27
Page(s):
818-820
An attention residual network with a triple feature as input is proposed to predict the remaining useful life (RUL) of bearings. First, channel attention and spatial attention are connected in series into the residual connection of the residual neural network to obtain a new attention residual module, so that the newly constructed deep learning network can better attend to weak changes of the bearing state. Secondly, the “triple feature” is used as the input of the attention residual network, so that the deep learning network can better grasp the change trend of the bearing running state and better predict the RUL of the bearing. Finally, the method is verified by a set of experimental data. The results show the method is simple and effective, has high prediction accuracy, and reduces manual intervention in RUL prediction.
Epileptic Seizure Prediction Using Convolutional Neural Networks and Fusion Features on Scalp EEG Signals
Qixin LAN Bin YAO Tao QING

LETTER-Smart Healthcare

Pubricized:
2022/05/27
Page(s):
821-823
Epileptic seizure prediction is an important research topic in clinical epilepsy treatment, which can provide opportunities for epilepsy patients and medical staff to take precautionary measures. EEG is a commonly used tool for studying brain activity, which records the electrical discharge of the brain. Many studies based on machine learning algorithms have been proposed to solve the task using EEG signals. In this study, we propose novel seizure prediction models based on convolutional neural networks and scalp EEG for binary classification between preictal and interictal states. The short-time Fourier transform (STFT) is used to translate raw EEG signals into STFT spectra, which are applied as input to the models. The fusion features are obtained through the side-output constructions and used to train and test our models. The test results show that our models can achieve comparable results in both sensitivity and FPR upon fusion features. The proposed patient-specific model can be used in seizure prediction systems for EEG classification.
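The preprocessing step described above, turning a raw signal into short-time Fourier transform spectra, can be illustrated with a small dependency-free sketch. The frame length, hop size, Hann window and synthetic sine input are assumptions chosen for illustration; the abstract does not state the parameters the authors used, and a practical implementation would use an FFT library rather than this direct DFT.

```python
import cmath
import math


def stft_magnitude(signal, frame_len, hop):
    """Magnitude spectrogram: one list of frame_len // 2 + 1 bins per frame."""
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of the windowed frame at frequency bin k.
            acc = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Synthetic stand-in for one EEG channel: a sine with 4 cycles per 32 samples.
signal = [math.sin(2 * math.pi * 4 * n / 32) for n in range(64)]
spectrogram = stft_magnitude(signal, frame_len=32, hop=16)
print(len(spectrogram), len(spectrogram[0]))
```

Each row of the result is the spectrum of one time frame, so the full array can be treated as a time-frequency image and fed to a convolutional network.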
OPENnet: Object Position Embedding Network for Locating Anti-Bird Thorn of High-Speed Railway
Zhuo WANG Junbo LIU Fan WANG Jun WU

LETTER-Intelligent Transportation Systems

Pubricized:
2022/11/14
Page(s):
824-828
Machine vision-based automatic anti-bird thorn failure inspection, as a replacement for manual identification, remains a great challenge. In this paper, we propose a novel Object Position Embedding Network (OPENnet), which can improve the precision of anti-bird thorn localization. OPENnet can simultaneously predict the location boxes of the support device and the anti-bird thorn by using the proposed double-head network. OPENnet is then optimized using the proposed symbiotic loss function (SymLoss), which embeds the object position into the network. Comprehensive experiments are conducted on a real railway video dataset. OPENnet yields competitive performance on anti-bird thorn localization. Specifically, the localization performance gains +3.65 AP, +2.10 AP50, and +1.22 AP75.
Clustering-Based Neural Network for Carbon Dioxide Estimation
Conghui LI Quanlin ZHONG Baoyin LI

LETTER-Intelligent Transportation Systems

Pubricized:
2022/08/01
Page(s):
829-832
In recent years, the applications of deep learning have facilitated the development of green intelligent transportation systems (ITS), and carbon dioxide estimation has been one of the important issues in green ITS. Furthermore, carbon dioxide estimation can be modelled as fuel consumption estimation. Therefore, a clustering-based neural network is proposed to analyze clusters in accordance with fuel consumption behaviors and to obtain the estimated fuel consumption and the estimated carbon dioxide. In experiments, the mean absolute percentage error (MAPE) of the proposed method is only 5.61%, which is lower than that of the other methods compared.
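The mean absolute percentage error (MAPE) reported above is the average of the absolute relative errors, expressed in percent. A minimal sketch follows; the function name and the sample fuel-consumption values are illustrative assumptions.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical measured vs. estimated fuel consumption.
measured = [100.0, 200.0]
estimated = [90.0, 220.0]
print(mape(measured, estimated))
```

Both estimates here are off by 10% of the measured value, so the score is about 10. Lower is better, which is why a MAPE of 5.61% indicates a small average relative error.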
Effectiveness of Feature Extraction System for Multimodal Sensor Information Based on VRAE and Its Application to Object Recognition
Kazuki HAYASHI Daisuke TANAKA

LETTER-Object Recognition and Tracking

Pubricized:
2023/01/12
Page(s):
833-835
To achieve object recognition, it is necessary to find the unique features of the objects to be recognized. Results in prior research suggest that methods using information from multiple modalities are effective for finding these unique features. This paper gives an overview of a system that extracts the features of the objects to be recognized by using VRAE to integrate visual, tactile, and auditory information as multimodal sensor information. Furthermore, the effect of changing the combination of modalities is discussed.

Special Section on Data Engineering and Information Management

FOREWORD Open Access
Akiyoshi MATONO

FOREWORD

Page(s):
836-837
- HTML
- Free PDF (114.1KB)
Effective Language Representations for Danmaku Comment Classification in Nicovideo
Hiroyoshi NAGAO Koshiro TAMURA Marie KATSURAI

PAPER

Pubricized:
2023/01/16
Page(s):
838-846
Danmaku commenting has become popular for co-viewing on video-sharing platforms, such as Nicovideo. However, many irrelevant comments usually contaminate the quality of the information provided by videos. Such an information pollutant problem can be solved by a comment classifier trained with an abstention option, which detects comments whose video categories are unclear. To improve the performance of this classification task, this paper presents Nicovideo-specific language representations. Specifically, we used sentences from Nicopedia, a Japanese online encyclopedia of entities that possibly appear in Nicovideo contents, to pre-train a bidirectional encoder representations from Transformers (BERT) model. The resulting model named Nicopedia BERT is then fine-tuned such that it could determine whether a given comment falls into any of predefined categories. The experiments conducted on Nicovideo comment data demonstrated the effectiveness of Nicopedia BERT compared with existing BERT models pre-trained using Wikipedia or tweets. We also evaluated the performance of each model in an additional sentiment classification task, and the obtained results implied the applicability of Nicopedia BERT as a feature extractor of other social media text.
Maximizing External Action with Information Provision Over Multiple Rounds in Online Social Networks
Masaaki MIYASHITA Norihiko SHINOMIYA Daisuke KASAMATSU Genya ISHIGAKI

PAPER

Pubricized:
2023/02/03
Page(s):
847-855
Online social networks (OSNs) have increased their impact on the real world, which motivates information senders to control the propagation process of information to promote particular actions of online users. However, the existing works on information provisioning seem to oversimplify the users' decision-making process, which involves information reception, actions internal to social networks, and actions external to social networks. In particular, characterizing the best practices of information provisioning that promote the users' external actions is a complex task due to the complexity of the propagation process in OSNs, even when the variation of information is limited. Therefore, we propose a new information diffusion model that distinguishes user behaviors inside and outside of OSNs, and formulate an optimization problem to maximize the number of users who take the external actions by providing information over multiple rounds. Also, we define a robust provisioning policy for the problem, which selects a message sequence to maximize the expected number of desired users under the probabilistic uncertainty of OSN settings. Our experimental results suggest that there could exist an information provisioning policy that achieves near-optimal solutions in different types of OSNs. Furthermore, we empirically demonstrate that the proposed robust policy can be such a universally optimal solution.
Construction of a Support Tool for Japanese User Reading of Privacy Policies and Assessment of its User Impact
Sachiko KANAMORI Hirotsune SATO Naoya TABATA Ryo NOJIMA

PAPER

Pubricized:
2023/02/08
Page(s):
856-867
To protect user privacy and establish self-information control rights, service providers must notify users of their privacy policies and obtain their consent in advance. The frameworks that impose these requirements are mandatory. Although originally designed to protect user privacy, obtaining user consent in advance has become a mere formality. These problems are induced by the gap between service providers' privacy policies, which prioritize the observance of laws and guidelines, and user expectations, which are to easily understand how their data will be handled. To reduce this gap, we construct a tool supporting users in reading privacy policies in Japanese. We designed the tool to present users with separate unique expressions containing relevant information to improve the display format of the privacy policy and render it more comprehensible for Japanese users. To accurately extract the unique expressions from privacy policies, we created training data for machine learning for the constructed tool. The constructed tool provides a summary of privacy policies for users to help them understand the policies of interest. Subsequently, we assess the effectiveness of the constructed tool in experiments and follow-up questionnaires. Our findings reveal that the constructed tool enhances the users' subjective understanding of the services they read about and their awareness of the related risks. We expect that the developed tool will help users better understand the privacy policy content and make educated decisions based on their understanding of how service providers intend to use their personal data.
Privacy-Preserving Correlation Coefficient
Tomoaki MIMOTO Hiroyuki YOKOYAMA Toru NAKAMURA Takamasa ISOHARA Masayuki HASHIMOTO Ryosuke KOJIMA Aki HASEGAWA Yasushi OKUNO

PAPER

Pubricized:
2023/02/08
Page(s):
868-876
Differential privacy is a confidentiality metric and quantitatively guarantees the confidentiality of individuals. A noise criterion, called sensitivity, must be calculated when constructing a probabilistic disturbance mechanism that satisfies differential privacy. Depending on the statistical process, the sensitivity may be very large or even impossible to compute. As a result, the usefulness of the constructed mechanism may be significantly low; it might even be impossible to directly construct it. In this paper, we first discuss situations in which sensitivity is difficult to calculate, and then propose a differential privacy with additional dummy data as a countermeasure. When the sensitivity in the conventional differential privacy is calculable, a mechanism that satisfies the proposed metric satisfies the conventional differential privacy at the same time, and it is possible to evaluate the relationship between the respective privacy parameters. Next, we derive sensitivity by focusing on correlation coefficients as a case study of a statistical process for which sensitivity is difficult to calculate, and propose a probabilistic disturbing mechanism that satisfies the proposed metric. Finally, we experimentally evaluate the effect of noise on the sensitivity of the proposed and direct methods. Experiments show that privacy-preserving correlation coefficients can be derived with less noise compared to using direct methods.
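To make the role of sensitivity in the abstract above concrete, here is a minimal sketch of the textbook Laplace mechanism, in which the noise scale is the sensitivity divided by the privacy parameter epsilon. This is the standard baseline and not the dummy-data mechanism proposed in the paper; the bounded-mean query, its sensitivity of (hi - lo) / n, and all names are assumptions chosen because that sensitivity is easy to derive, unlike the correlation coefficient discussed by the authors.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value with noise calibrated to sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return true_value + laplace_noise(sensitivity / epsilon, rng)


def private_mean(values, lo, hi, epsilon, rng):
    """Differentially private mean of values clipped to the range [lo, hi]."""
    clipped = [min(max(v, lo), hi) for v in values]
    # Changing one record moves the mean by at most (hi - lo) / n.
    sensitivity = (hi - lo) / len(clipped)
    return laplace_mechanism(sum(clipped) / len(clipped), sensitivity, epsilon, rng)


rng = random.Random(0)  # fixed seed only to make the illustration repeatable
ages = [23, 35, 41, 52, 29, 60, 38, 45]
print(private_mean(ages, lo=0, hi=100, epsilon=1.0, rng=rng))
```

The larger the sensitivity, the more noise is required for the same epsilon, which is why a statistic with very large or incomputable sensitivity yields a mechanism of low usefulness.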
Geo-Graph-Indistinguishability: Location Privacy on Road Networks with Differential Privacy
Shun TAKAGI Yang CAO Yasuhito ASANO Masatoshi YOSHIKAWA

PAPER

Pubricized:
2023/01/16
Page(s):
877-894
In recent years, concerns about location privacy have been increasing with the spread of location-based services (LBSs). Many methods to protect location privacy have been proposed in the past decades. In particular, perturbation methods based on Geo-Indistinguishability (GeoI), which randomly perturb a true location to a pseudolocation, are attracting attention due to their strong privacy guarantee inherited from differential privacy. However, GeoI is based on the Euclidean plane even though many LBSs are based on road networks (e.g. ride-sharing services). This causes unnecessary noise and thus an insufficient tradeoff between utility and privacy for LBSs on road networks. To address this issue, we propose a new privacy notion, Geo-Graph-Indistinguishability (GeoGI), for locations on a road network to achieve a better tradeoff. We propose the Graph-Exponential Mechanism (GEM), which satisfies GeoGI. Moreover, we formalize the optimization problem to find the optimal GEM in terms of the tradeoff. However, the computational complexity of a naive method to find the optimal solution is prohibitive, so we propose a greedy algorithm to find an approximate solution in an acceptable amount of time. Finally, our experiments show that our proposed mechanism outperforms GeoI mechanisms, including the optimal GeoI mechanism, with respect to the tradeoff.
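The idea of perturbing a location over a road network rather than the Euclidean plane can be sketched with an exponential-mechanism-style distribution over graph nodes. This is a hedged illustration only: the weighting exp(-epsilon * d / 2), with d the shortest-path distance from the true node, follows the usual exponential mechanism form and is an assumption here, and the abstract does not give the exact definition of GEM. The toy graph and all names are hypothetical.

```python
import heapq
import math


def shortest_path_lengths(graph, source):
    """Dijkstra's algorithm; graph maps node -> list of (neighbor, edge_length)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def graph_exponential_distribution(graph, true_node, epsilon):
    """Output probabilities over reachable nodes, decaying with road distance."""
    dist = shortest_path_lengths(graph, true_node)
    weights = {v: math.exp(-epsilon * d / 2.0) for v, d in dist.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


# Hypothetical road segment a - b - c with edge lengths 1 and 2.
road = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 2.0)],
    "c": [("b", 2.0)],
}
probs = graph_exponential_distribution(road, "a", epsilon=1.0)
print(probs)
```

Because the distances are measured along roads, no probability mass is spent on off-road points, which is the intuition behind the better utility-privacy tradeoff claimed for road-network LBSs.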
Prioritization of Lane-Specific Traffic Jam Detection for Automotive Navigation Framework Utilizing Suddenness Index and Automatic Threshold Determination
Aki HAYASHI Yuki YOKOHATA Takahiro HATA Kouhei MORI Masato KAMIYA

PAPER

Pubricized:
2023/02/03
Page(s):
895-903
Car navigation systems provide traffic jam information. In this study, we attempt to provide more detailed traffic jam information that identifies the lane in which a traffic jam occurs. This makes it possible for users to avoid long waits in queued traffic going toward an unintended destination. Lane-specific traffic jam detection utilizes image processing, which incurs long processing time and high cost. To reduce these, we propose a “suddenness index (SI)” to categorize candidate areas as sudden or periodic. Sudden traffic jams are prioritized as they may lead to accidents. This technology aggregates the number of connected cars for each mesh on a map and quantifies the degree of deviation from the ordinary state. In this paper, we evaluate the proposed method using actual global positioning system (GPS) data and find that the proposed index can cover 100% of sudden lane-specific traffic jams while excluding 82.2% of traffic jam candidates. We also demonstrate the time savings achieved by integrating the proposed method into a demonstration framework. In addition, we extend the proposed method to automatically determine the SI threshold for selecting the appropriate traffic jam candidates, which avoids manual parameter settings.
MicroState: An Anomaly Localization Method in Heterogeneous Microservice Systems
Jingjing YANG Yuchun GUO Yishuai CHEN

PAPER

Pubricized:
2023/01/13
Page(s):
904-912
Microservice architecture has been widely adopted for large-scale applications because of its benefits of scalability, flexibility, and reliability. However, microservice architecture also poses new challenges in diagnosing root causes of performance degradation. Existing methods rely on labeled data and suffer from a high computation burden. This paper proposes MicroState, an unsupervised and lightweight method to pinpoint the root cause with detailed descriptions. We decompose root cause diagnosis into element location and detailed reason identification. To mitigate the impact of element heterogeneity and dynamic invocations, MicroState generates elements' invoked states, quantifies elements' abnormality by warping-based state comparison, and infers the anomalous group. MicroState locates the root cause element by considering anomaly frequency and persistency. To locate the anomalous metric among diverse metrics, MicroState extracts metrics' trend features and evaluates metrics' abnormality based on their trend feature variation, which reduces the reliance on anomaly detectors. Our experimental evaluation based on public data of the Artificial intelligence for IT Operations Challenge (AIOps Challenge 2020) shows that MicroState locates root cause elements with 87% precision and diagnoses anomaly reasons accurately.

Special Section on the Architectures, Protocols, and Applications for the Future Internet

FOREWORD Open Access
ISMAIL ARAI

FOREWORD

Page(s):
913-913
- HTML
- Free PDF (107.9KB)
Wide-Area and Long-Term Agricultural Sensing System Utilizing UAV and Wireless Technologies
Hiroshi YAMAMOTO Shota NISHIURA Yoshihiro HIGASHIURA

INVITED PAPER

Pubricized:
2023/02/08
Page(s):
914-926
In order to improve crop production and efficiency of farming operations, IoT (Internet of Things) systems for remote monitoring have been attracting a lot of attention. Existing studies have proposed agricultural sensing systems in which environmental information is collected from many sensor nodes installed in farmland through wireless communications (e.g., Wi-Fi, ZigBee). In particular, Low-Power Wide-Area (LPWA) has drawn attention as a candidate for wireless communication that can support vast farmland for a long time. However, it is difficult to achieve long-distance communication even when using LPWA, because a clear line of sight is difficult to keep due to many obstacles such as crops and agricultural machinery in the farmland. In addition, a sensor node cannot run permanently on batteries because the battery capacity is finite. On the other hand, an Unmanned Aerial Vehicle (UAV) that can move freely and stably in the sky has been leveraged for agricultural sensor network systems. By utilizing a UAV as the gateway of the sensor network, the gateway can move to the appropriate location to ensure a clear line of sight from the sensor nodes. In addition, the coverage area of the sensor network can be expanded as the UAV travels over a wide area even when short-range and ultra-low-power wireless communication (e.g., Bluetooth Low Energy (BLE)) is adopted. Furthermore, various wireless technologies (e.g., wireless power transfer, wireless positioning) that have the possibility to improve the coverage area and the lifetime of the sensor network have become available. Therefore, in this study, we propose and develop two kinds of new agricultural sensing systems utilizing a UAV and various wireless technologies. The objective of the proposed system is to provide a solution for achieving wide-area and long-term sensing of vast farmland. Depending on which problem takes priority, the proposed system chooses one of two designs.
The first design of the system attempts to achieve the wide-area sensing, and so it is based on the LPWA for wireless communication. In the system, to efficiently collect the environmental information, the UAV autonomously travels to search for the locations to maintain the good communication properties of the LPWA to the sensor nodes dispersed over a wide area of farmland. In addition, the second design attempts to achieve the long-term sensing, so it is based on BLE, a typical short-range and ultra-low-power wireless communication technology. In this design, the UAV autonomously flies to the location of sensor nodes and supplies power to them using a wireless power transfer technology for achieving a battery-less sensor node. Through experimental evaluations using a prototype system, it is confirmed that the combination of the UAV and various wireless technologies has the possibility to achieve a wide-area and long-term sensing system for monitoring vast farmland.
Performance Aware Egress Path Discovery for Content Provider with SRv6 Egress Peer Engineering
Yasunobu TOYOTA Wataru MISHIMA Koichiro KANAYA Osamu NAKAMURA

PAPER

Pubricized:
2023/02/22
Page(s):
927-939
QoS of applications is essential for content providers, and it is required to improve the end-to-end communication quality from a content provider to users. Generally, a content provider's data center network is connected to multiple ASes and has multiple egress paths to reach the content user's network. However, on the Internet, the communication quality of network paths outside of the provider's administrative domain is a black box, so multiple egress paths cannot be quantitatively compared. In addition, it is impossible to determine a unique egress path within a network domain because the parameters that affect the QoS of the content are different for each network. We propose a “Performance Aware Egress Path Discovery” method to improve QoS for content providers. The proposed method uses two techniques: Egress Peer Engineering with Segment Routing over IPv6 and Passive End-to-End Measurement. The method is superior in that it allows various metrics depending on the type of content and can be used for measurements without affecting existing systems. To evaluate our method, we deployed the Performance Aware Egress Path Discovery System in an existing content provider network and conducted experiments to provide production services. Our findings from the experiment show that, in this network, 15.9% of users can expect a 30Mbps throughput improvement, and 13.7% of users can expect a 10ms RTT improvement.
A Fast Handover Mechanism for Ground-to-Train Free-Space Optical Communication using Station ID Recognition by Dual-Port Camera
Kosuke MORI Fumio TERAOKA Shinichiro HARUYAMA

PAPER

Pubricized:
2023/03/08
Page(s):
940-951
There are demands for high-speed and stable ground-to-train optical communication as a network environment for trains. The existing ground-to-train optical communication system developed by the authors uses a camera and a QPD (quadrant photodiode) to capture beacon light. The problem with the existing system is that it cannot identify the ground station. In the system proposed in this paper, a beacon light modulated with the ID of the ground station is transmitted, and the ground station is identified by demodulating the image from the dual-port camera on the opposite side. We developed an actual system and conducted experiments using a car on the road. The results showed that, with the ping command issued every 1 ms, only one packet was lost near handover. Although the communication device itself has a bandwidth of 100 Mbps, the throughput before and after the handover was about 94 Mbps, and it dropped only to about 89.4 Mbps during the handover.

Regular Section

Parallelization on a Minimal Substring Search Algorithm for Regular Expressions
Yosuke OBE Hiroaki YAMAMOTO Hiroshi FUJIWARA

PAPER-Fundamentals of Information Systems

Pubricized:
2023/02/08
Page(s):
952-958
Let us consider a regular expression r of length m and a text string T of length n over an alphabet Σ. The RE minimal substring search problem is then to find all minimal substrings of T matching r. Yamamoto proposed an O(mn)-time and O(m)-space algorithm using a Thompson automaton. In this paper, we improve Yamamoto's algorithm by introducing parallelism. The proposed algorithm runs in O(mn) time in the worst case and in O(mn/p) time in the best case, where p denotes the number of processors. In addition, we present a parameter related to the parallel running time of the proposed algorithm. We evaluate the algorithm experimentally.
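To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not Yamamoto's algorithm or the parallel algorithm of the paper. It assumes that a "minimal substring" is a non-empty substring of T that matches r and contains no other matching substring, and it uses Python's `re` syntax instead of a Thompson automaton. The function name `minimal_matches` is hypothetical.

```python
import re

def minimal_matches(pattern, text):
    """Return (start, end) spans of the minimal substrings of text matching pattern.

    Brute force for illustration only: it tests every non-empty substring,
    so it is far slower than the O(mn) algorithm mentioned in the abstract.
    """
    rx = re.compile(pattern)
    n = len(text)
    # Collect every non-empty span text[i:j] that matches the whole pattern.
    spans = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n + 1)
        if rx.fullmatch(text, i, j)
    ]
    # Keep a span only if no other matching span lies inside it.
    return [
        s for s in spans
        if not any(t != s and s[0] <= t[0] and t[1] <= s[1] for t in spans)
    ]

if __name__ == "__main__":
    print(minimal_matches(r"a[ab]*b", "aabab"))
```

Here the pattern `a[ab]*b` also matches longer substrings such as `aab` and `aabab`, but those are discarded because they contain a shorter match.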
On Lookaheads in Regular Expressions with Backreferences
Nariyoshi CHIDA Tachio TERAUCHI

PAPER-Fundamentals of Information Systems

Pubricized:
2023/02/06
Page(s):
959-975
Many modern regular expression engines employ various extensions to give more expressive support for real-world usages. Among the major extensions employed by many of the modern regular expression engines are backreferences and lookaheads. A question of interest about these extended regular expressions is their expressive power. Previous works have shown that (i) the extension by lookaheads does not enhance the expressive power, i.e., the expressive power of regular expressions with lookaheads is still regular, and that (ii) the extension by backreferences enhances the expressive power, i.e., the expressive power of regular expressions with backreferences (abbreviated as rewb) is no longer regular. This raises the following natural question: Does the extension of regular expressions with backreferences by lookaheads enhance the expressive power of regular expressions with backreferences? This paper answers the question positively by proving that adding either positive lookaheads or negative lookaheads increases the expressive power of rewb (the former abbreviated as rewbl_p and the latter as rewbl_n). A consequence of our result is that neither the class of finite state automata nor that of memory automata (MFA) of Schmid[2] (which corresponds to regular expressions with backreferences but without lookaheads) corresponds to rewbl_p or rewbl_n. To fill the void, as a first step toward building such automata, we propose a new class of automata called memory automata with positive lookaheads (PLMFA) that corresponds to rewbl_p. The key idea of PLMFA is to extend MFA with a new kind of memory, called positive-lookahead memory, that is used to simulate the backtracking behavior of positive lookaheads. Interestingly, our positive-lookahead memories are almost perfectly symmetric to the capturing-group memories of MFA. Therefore, our PLMFA can be seen as a natural extension of MFA that can be obtained independently of its original intended purpose of simulating rewbl_p.
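As a small illustration of the two extensions discussed above, the sketch below uses Python's `re` module, which supports both backreferences (`\1`) and positive lookaheads (`(?=...)`). The patterns and the names `BACKREF`, `LOOKAHEAD`, and `BOTH` are made up for this example and are not taken from the paper; the example shows the syntax only and says nothing about the paper's proofs or the PLMFA construction.

```python
import re

# Backreference: \1 must repeat exactly what group 1 captured, so a full
# match accepts a^n b a^n (n >= 1), a language that is not regular.
BACKREF = re.compile(r"(a+)b\1")

# Positive lookahead: (?=...) checks what follows without consuming input.
# Here it requires at least one digit somewhere in the string.
LOOKAHEAD = re.compile(r"(?=.*\d)[a-z0-9]+")

# Both together: the group is captured inside the lookahead and reused
# afterwards by backreferences.
BOTH = re.compile(r"(?=(a+)b)\1b\1")

if __name__ == "__main__":
    for word in ["aabaa", "aabaaa", "aaba"]:
        print(word, bool(BACKREF.fullmatch(word)), bool(BOTH.fullmatch(word)))
    for word in ["abc1", "abcd"]:
        print(word, bool(LOOKAHEAD.fullmatch(word)))
```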
Time Series Forecasting Based on Convolution Transformer
Na WANG Xianglian ZHAO

PAPER-Fundamentals of Information Systems

Pubricized:
2023/02/15
Page(s):
976-985
Time series forecasting is essential in many real-world fields. Recent studies have shown that Transformer has certain advantages for such problems, especially for long-sequence input and long-sequence forecasting. To improve the efficiency and local stability of Transformer, these studies combine Transformer and CNN in different structures. However, previous Transformer-based time series forecasting models cannot make full use of CNN, and the two have not been combined effectively. In response to this problem, we propose a time series forecasting algorithm based on convolution Transformer. (1) ES attention mechanism: External attention is combined with the traditional self-attention mechanism through a two-branch network, which reduces the computational cost of self-attention and yields higher forecasting accuracy. (2) Frequency enhanced block: A Frequency Enhanced Block is added in front of the ESAttention module to capture important structures in time series through frequency domain mapping. (3) Causal dilated convolution: The self-attention modules are connected by a causal dilated convolution layer in place of the traditional standard convolution layer, so that the receptive field grows exponentially without increasing the computational cost. (4) Multi-layer feature fusion: The outputs of different self-attention modules are extracted, and convolutional layers adjust the size of the feature maps for fusion, so that more fine-grained feature information is obtained at negligible computational cost.
Experiments on real-world datasets show that the proposed forecasting model greatly improves on the real-time forecasting performance of the current state-of-the-art Transformer model, with significantly lower computation and memory costs. Compared with previous algorithms, the proposed algorithm achieves a greater improvement in both effectiveness and forecasting accuracy.
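The causal dilated convolution in item (3) can be illustrated with a short, dependency-free Python sketch. This is a generic single-channel version written for this listing, not the layer used in the paper; the function names and the kernel size and dilation values in the usage part are assumptions. It shows the two properties the abstract relies on: the output at time t depends only on inputs at or before t, and stacking layers with dilations 1, 2, 4, ... makes the receptive field grow exponentially with depth while each layer keeps the same number of weights.

```python
def causal_dilated_conv1d(x, weights, dilation):
    """Single-channel causal convolution with implicit zero padding on the left.

    out[t] = sum_i weights[i] * x[t - i * dilation], skipping indices < 0,
    so no output value depends on a future input.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - i * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out

def receptive_field(kernel_size, dilations):
    """Number of input steps seen by one output of a stack of dilated layers."""
    return 1 + (kernel_size - 1) * sum(dilations)

if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0]
    print(causal_dilated_conv1d(signal, [1.0, 1.0], dilation=2))
    # Four layers with kernel size 2 and doubling dilation (assumed values).
    print(receptive_field(2, [1, 2, 4, 8]))
```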
A Practical Model Driven Approach for Designing Security Aware RESTful Web APIs Using SOFL
Busalire Onesmus EMEKA Soichiro HIDAKA Shaoying LIU

PAPER-Data Engineering, Web Information Systems

Pubricized:
2023/02/13
Page(s):
986-1000
RESTful web APIs have become ubiquitous, with most modern web applications embracing the micro-service architecture. A RESTful API provides data over the network using HTTP, possibly interacting with databases and other services, and must preserve its security properties. However, REST is not a protocol but rather a set of guidelines on how to design resources accessed over HTTP endpoints. There are guidelines on how related resources should be structured with hierarchical URIs as well as how the different HTTP verbs should be used to represent well-defined actions on those resources. Whereas security has always been critical in the design of RESTful APIs, there are few or no clear model driven engineering techniques utilizing a secure-by-design approach that interweaves both the functional and security requirements. We therefore propose an approach to specifying APIs' functional and security requirements with the practical Structured-Object-oriented Formal Language (SOFL). Our proposed approach provides a generic methodology for designing security aware APIs by utilizing concepts of domain models, domain primitives, Ecore metamodel and SOFL. We also describe a case study to evaluate the effectiveness of our approach and discuss important issues in relation to the practical applicability of our method.
High-Precision Mobile Robot Localization Using the Integration of RAR and AKF
Chen WANG Hong TAN

PAPER-Information Network

Pubricized:
2023/01/24
Page(s):
1001-1009
High-precision indoor positioning has gradually become a research hotspot for indoor mobile robots. Relax and Recover (RAR) is an indoor positioning algorithm using distance observations. The algorithm restores the robot's trajectory through curve fitting and does not require time synchronization of observations. Positioning can succeed with few observations. However, the algorithm has poor resistance to gross errors and cannot be used for real-time positioning. In this paper, while retaining the advantages of the original algorithm, the RAR algorithm is improved with the adaptive Kalman filter (AKF) based on the innovation sequence to improve the anti-gross error performance of the original algorithm. The improved algorithm can be used for real-time navigation and positioning. Experimental validation shows that the improved algorithm is significantly more accurate than the original RAR. Compared with the extended Kalman filter (EKF), the accuracy is also increased by 12.5%, so the method can be used for high-precision positioning of indoor mobile robots.
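The idea of adapting a Kalman filter from its innovation sequence can be sketched in one dimension as follows. This is a generic innovation-based adaptive filter for a scalar random-walk state, written for illustration; it is not the paper's RAR and AKF integration, and the function name, the window length, and the noise values `q` and `r0` are all assumptions. The measurement noise variance is re-estimated at each step as the windowed mean of squared innovations minus the predicted state variance, so a burst of large innovations (for example, a gross error) inflates the assumed noise and lowers the gain.

```python
from collections import deque

def adaptive_kf_1d(measurements, q=1e-3, r0=1.0, window=10):
    """Scalar Kalman filter whose measurement noise r adapts to the innovations."""
    x, p, r = measurements[0], 1.0, r0
    innovations = deque(maxlen=window)
    estimates = []
    for z in measurements:
        p_pred = p + q                      # predict: random-walk state model
        v = z - x                           # innovation (measurement residual)
        innovations.append(v)
        c_v = sum(e * e for e in innovations) / len(innovations)
        r = max(c_v - p_pred, 1e-6)         # innovation-based estimate of r, kept positive
        k = p_pred / (p_pred + r)           # Kalman gain
        x = x + k * v                       # update state
        p = (1.0 - k) * p_pred              # update variance
        estimates.append(x)
    return estimates

if __name__ == "__main__":
    # One outlier (10.0) in otherwise constant range measurements.
    print(adaptive_kf_1d([0.0, 0.0, 0.0, 10.0, 0.0, 0.0]))
```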
Chinese Named Entity Recognition Method Based on Dictionary Semantic Knowledge Enhancement
Tianbin WANG Ruiyang HUANG Nan HU Huansha WANG Guanghan CHU

PAPER-Artificial Intelligence, Data Mining

Pubricized:
2023/02/15
Page(s):
1010-1017
Chinese Named Entity Recognition (NER) is a fundamental technology in the field of Chinese Natural Language Processing. It is extensively applied in information extraction, intelligent question answering, and knowledge graphs. Nevertheless, due to the diversity and complexity of Chinese, most Chinese NER methods fail to sufficiently capture character-granularity semantics, which affects the performance of Chinese NER. In this work, we propose DSKE-Chinese NER: Chinese Named Entity Recognition based on Dictionary Semantic Knowledge Enhancement. In a novel approach, we integrate character-granularity semantic information into the vector space of characters and acquire a vector representation containing semantic information through the attention mechanism. In addition, we verify the appropriate number of semantic layers through a comparative experiment. Experiments on public Chinese datasets such as Weibo, Resume and MSRA show that the model outperforms character-based LSTM baselines.
Prediction of Driver's Visual Attention in Critical Moment Using Optical Flow
Rebeka SULTANA Gosuke OHASHI

PAPER-Artificial Intelligence, Data Mining

Pubricized:
2023/01/26
Page(s):
1018-1026
In recent years, driver's visual attention has been actively studied for driving automation technology. However, few models provide insight into the driver's attention in various moments. All attention models process multi-level image representations by a two-stream/multi-stream network, increasing the computational cost due to an increase in model parameters. However, multi-level image representation such as optical flow plays a vital role in tasks involving videos. Therefore, to reduce the computational cost of a two-stream network while still using multi-level image representation, this work proposes a single-stream driver's visual attention model for critical situations. The experiment was conducted using a publicly available critical driving dataset named BDD-A. Qualitative results confirm the effectiveness of the proposed model. Moreover, quantitative results highlight that the proposed model outperforms state-of-the-art visual attention models according to CC and SIM. Extensive ablation studies examine the presence of optical flow in the model, the position of optical flow in the spatial network, the convolution layers to process optical flow, and the computational cost compared to a two-stream model.
3D Multiple-Contextual ROI-Attention Network for Efficient and Accurate Volumetric Medical Image Segmentation
He LI Yutaro IWAMOTO Xianhua HAN Lanfen LIN Akira FURUKAWA Shuzo KANASAKI Yen-Wei CHEN

PAPER-Artificial Intelligence, Data Mining

Pubricized:
2023/02/21
Page(s):
1027-1037
Convolutional neural networks (CNNs) have become popular in medical image segmentation. The widely used deep CNNs are customized to extract multiple representative features for two-dimensional (2D) data, generally called 2D networks. However, 2D networks are inefficient in extracting three-dimensional (3D) spatial features from volumetric images. Although most 2D segmentation networks can be extended to 3D networks, the naively extended 3D methods are resource-intensive. In this paper, we propose an efficient and accurate network for fully automatic 3D segmentation. Specifically, we designed a 3D multiple-contextual extractor to capture rich global contextual dependencies from different feature levels. Then we leveraged an ROI-estimation strategy to crop the ROI bounding box. Meanwhile, we used a 3D ROI-attention module to improve the accuracy of in-region segmentation in the decoder path. Moreover, we used a hybrid Dice loss function to address the issues of class imbalance and blurry contour in medical images. By incorporating the above strategies, we realized a practical end-to-end 3D medical image segmentation with high efficiency and accuracy. To validate the 3D segmentation performance of our proposed method, we conducted extensive experiments on two datasets and demonstrated favorable results over the state-of-the-art methods.
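For readers unfamiliar with Dice-based losses, the following minimal Python sketch computes a plain soft Dice loss on flattened probability and label lists. The abstract does not define its hybrid Dice loss, so this shows only the standard Dice term; the function name and the smoothing constant `eps` are assumptions. Because the loss is a ratio of overlap to total foreground mass, it is less dominated by the many background voxels than a per-voxel average, which is why Dice-style losses are commonly used under class imbalance.

```python
def soft_dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - (2 * |P ∩ G| + eps) / (|P| + |G| + eps).

    pred:   predicted foreground probabilities in [0, 1] (flattened)
    target: ground-truth labels, 0 or 1 (flattened, same length)
    """
    intersection = sum(p * g for p, g in zip(pred, target))
    total = sum(pred) + sum(target)
    dice = (2.0 * intersection + eps) / (total + eps)
    return 1.0 - dice

if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    print(soft_dice_loss([1.0, 0.0, 1.0, 0.0], labels))  # perfect overlap
    print(soft_dice_loss([0.0, 1.0, 0.0, 1.0], labels))  # no overlap
```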
Subjective Difficulty Estimation of Educational Comics Using Gaze Features
Kenya SAKAMOTO Shizuka SHIRAI Noriko TAKEMURA Jason ORLOSKY Hiroyuki NAGATAKI Mayumi UEDA Yuki URANISHI Haruo TAKEMURA

PAPER-Educational Technology

Pubricized:
2023/02/03
Page(s):
1038-1048
This study explores significant eye-gaze features that can be used to estimate subjective difficulty while reading educational comics. Educational comics have grown rapidly as a promising way to teach difficult topics using illustrations and texts. However, comics include a variety of information on one page, so automatically detecting learners' states such as subjective difficulty is difficult with approaches such as system log-based detection, which is common in the Learning Analytics field. In order to solve this problem, this study focused on 28 eye-gaze features, including the proposal of three new features called “Variance in Gaze Convergence,” “Movement between Panels,” and “Movement between Tiles” to estimate two degrees of subjective difficulty. We then ran an experiment in a simulated environment using Virtual Reality (VR) to accurately collect gaze information. We extracted features in two unit levels, page- and panel-units, and evaluated the accuracy with each pattern in user-dependent and user-independent settings, respectively. Our proposed features achieved an average F1 classification-score of 0.721 and 0.742 in user-dependent and user-independent models at panel unit levels, respectively, trained by a Support Vector Machine (SVM).
New Training Method for Non-Dominant Hand Pitching Motion Based on Reversal Trajectory of Dominant Hand Pitching Motion Using AR and Vibration
Masato SOGA Taiki MORI

PAPER-Educational Technology

Pubricized:
2023/02/08
Page(s):
1049-1058
In this paper, we propose a new method for non-dominant limb training. In this method, the learner's target is a motion generated by reversing his/her own dominant-limb motion. In addition, we designed and developed an interface for the new method in which the feedback type can be selected: one interface uses AR and sound, and the other uses AR and vibration. We found that vibration feedback was effective for non-dominant hand training of the pitching motion, while sound feedback was less effective than vibration.
A Computer Simulation Study on Movement Control by Functional Electrical Stimulation Using Optimal Control Technique with Simplified Parameter Estimation
Fauzan ARROFIQI Takashi WATANABE Achmad ARIFIN

PAPER-Rehabilitation Engineering and Assistive Technology

Pubricized:
2023/02/21
Page(s):
1059-1068
The purpose of this study was to develop a practical functional electrical stimulation (FES) controller for joint movement restoration based on an optimal control technique by cascading a linear model predictive control (MPC) and a nonlinear transformation. The cascading configuration was intended to obtain an FES controller that is able to deal with a nonlinear system. The nonlinear transformation was utilized to transform the linear solution of linear MPC into a nonlinear solution in the form of an optimized electrical stimulation intensity. Four different types of nonlinear functions were used to realize the nonlinear transformation. A simple parameter estimation to determine the value of the nonlinear transformation parameter was also developed. The tracking control capability of the proposed controller along with the parameter estimation was examined in controlling the 1-DOF wrist joint movement through computer simulation. The proposed controller was also compared with a fuzzy FES controller. The proposed MPC-FES controller with the estimated parameter value worked properly and had better control accuracy than the fuzzy controller. The results suggest that the parameter estimation is useful and effective in practical FES control applications for reducing the time needed to determine the parameter value of the proposed controller.
Learning Local Similarity with Spatial Interrelations on Content-Based Image Retrieval
Longjiao ZHAO Yu WANG Jien KATO Yoshiharu ISHIKAWA

PAPER-Image Processing and Video Processing

Pubricized:
2023/02/14
Page(s):
1069-1080
Convolutional Neural Networks (CNNs) have recently demonstrated outstanding performance in image retrieval tasks. Local convolutional features extracted by CNNs, in particular, show exceptional capability in discrimination. Recent research in this field has concentrated on pooling methods that incorporate local features into global features and assess the global similarity of two images. However, the pooling methods sacrifice the image's local region information and spatial relationships, which are precisely known as the keys to the robustness against occlusion and viewpoint changes. In this paper, instead of pooling methods, we propose an alternative method based on local similarity, determined by directly using local convolutional features. Specifically, we first define three forms of local similarity tensors (LSTs), which take into account information about local regions as well as spatial relationships between them. We then construct a similarity CNN model (SCNN) based on LSTs to assess the similarity between the query and gallery images. The ideal configuration of our method is sought through thorough experiments from three perspectives: local region size, local region content, and spatial relationships between local regions. The experimental results on a modified open dataset (where query images are limited to occluded ones) confirm that the proposed method outperforms the pooling methods because of robustness enhancement. Furthermore, testing on three public retrieval datasets shows that combining LSTs with conventional pooling methods achieves the best results.
Selective Learning of Human Pose Estimation Based on Multi-Scale Convergence Network
Wenkai LIU Cuizhu QIN Menglong WU Wenle BAI Hongxia DONG

LETTER-Human-computer Interaction

Pubricized:
2023/02/15
Page(s):
1081-1084
Pose estimation is a research hot spot in computer vision tasks and the key to computer perception of human activities. The core concept of human pose estimation involves describing the motion of the human body through major joint points. Large receptive fields and rich spatial information facilitate the keypoint localization task, and how to capture features on a larger scale and reintegrate them into the feature space is a challenge for pose estimation. To address this problem, we propose a multi-scale convergence network (MSCNet) with a large receptive field and rich spatial information. The structure of the MSCNet is based on an hourglass network that captures information at different scales to present a consistent understanding of the whole body. The multi-scale receptive field (MSRF) units provide a large receptive field to obtain rich contextual information, which is then selectively enhanced or suppressed by the Squeeze-Excitation (SE) attention mechanism to flexibly perform the pose estimation task. Experimental results show that MSCNet scores 73.1% AP on the COCO dataset, an 8.8% improvement compared to the mainstream CMUPose method. Compared to the advanced CPN, the MSCNet has 68.2% of the computational complexity and only 55.4% of the number of parameters.
Blockchain-Based Pension System Ensuring Security, Provenance and Efficiency
Minhaz KAMAL Chowdhury Mohammad ABDULLAH Fairuz SHAIARA Abu Raihan Mostofa KAMAL Md Mehedi HASAN Jik-Soo KIM Md Azam HOSSAIN

LETTER-Office Information Systems, e-Business Modeling

Pubricized:
2023/02/21
Page(s):
1085-1088
This letter presents a digitized pension system based on a consortium blockchain, with the aim of overcoming existing pension system challenges such as multiparty collaboration, manual intervention, high turnaround time, cost transparency, auditability, etc. In addition, the adoption of Hyperledger Fabric and the introduction of smart contracts aim to transform multi-organizational workflow into a synchronized, automated, modular, and error-free procedure.
Local Binary Convolution Based Prior Knowledge of Multi-Direction Features for Finger Vein Verification
Huijie ZHANG Ling LU

LETTER-Pattern Recognition

Pubricized:
2023/02/22
Page(s):
1089-1093
Finger-vein authentication systems based on deep neural networks have been widely applied in real scenarios, such as banking and entrance guard systems in various countries. However, to ensure performance, a deep neural network must train many parameters, which requires considerable time and computing resources. This paper proposes a method that introduces artificial features with prior knowledge into the convolution layer. First, it designs a multi-direction pattern based on the traditional local binary pattern, which extracts general spatial information and also reduces the spatial dimension. Then, it establishes a sample effective deep convolutional neural network by combining the pattern with convolution, giving the network the ability to extract deeper finger vein features. Finally, it trains the model with a composite loss function to increase the inter-class distance and reduce the intra-class distance. Experiments show that the proposed method achieves higher stability and accuracy in finger vein recognition.
Modality-Fused Graph Network for Cross-Modal Retrieval
Fei WU Shuaishuai LI Guangchuan PENG Yongheng MA Xiao-Yuan JING

LETTER-Pattern Recognition

Pubricized:
2023/02/09
Page(s):
1094-1097
Cross-modal hashing technology has attracted much attention for its favorable retrieval performance and low storage cost. However, for existing cross-modal hashing methods, the heterogeneity of data across modalities is still a challenge and how to fully explore and utilize the intra-modality features has not been well studied. In this paper, we propose a novel cross-modal hashing approach called Modality-fused Graph Network (MFGN). The network architecture consists of a text channel and an image channel that are used to learn modality-specific features, and a modality fusion channel that uses the graph network to learn the modality-shared representations to reduce the heterogeneity across modalities. In addition, an integration module is introduced for the image and text channels to fully explore intra-modality features. Experiments on two widely used datasets show that our approach achieves better results than the state-of-the-art cross-modal hashing methods.
Speech Emotion Recognition Using Multihead Attention in Both Time and Feature Dimensions
Yue XIE Ruiyu LIANG Zhenlin LIANG Xiaoyan ZHAO Wenhao ZENG

LETTER-Speech and Hearing

Pubricized:
2023/02/21
Page(s):
1098-1101
To enhance emotion features and improve the performance of speech emotion recognition, an attention mechanism is employed to identify the important information in both the time and feature dimensions. In the time dimension, multi-head attention is modified with the last state of the long short-term memory (LSTM) output to match the time accumulation characteristic of LSTM. In the feature dimension, scaled dot-product attention is replaced with additive attention, which follows the state update method of LSTM, to construct multi-head attention. This means that a nonlinear transformation replaces the linear mapping in classical multi-head attention. Experiments on the IEMOCAP dataset demonstrate that the attention mechanism can enhance emotional information and improve the performance of speech emotion recognition.
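The difference between the two scoring functions named above can be shown with a small, dependency-free Python sketch. It computes attention weights for one query over a set of keys, first with scaled dot-product scoring and then with additive scoring of the form v · tanh(Wq q + Wk k). This is the textbook form of each mechanism, not the modified multi-head design of the letter; all function names and the toy matrices in the usage part are assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(matrix, vec):
    return [dot(row, vec) for row in matrix]

def scaled_dot_attention_weights(q, keys):
    """Score = (q . k) / sqrt(d): a linear comparison of query and key."""
    d = len(q)
    return softmax([dot(q, k) / math.sqrt(d) for k in keys])

def additive_attention_weights(q, keys, w_q, w_k, v):
    """Score = v . tanh(W_q q + W_k k): a nonlinear comparison."""
    q_proj = matvec(w_q, q)
    scores = []
    for k in keys:
        hidden = [math.tanh(a + b) for a, b in zip(q_proj, matvec(w_k, k))]
        scores.append(dot(v, hidden))
    return softmax(scores)

if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]   # toy projection matrices
    print(scaled_dot_attention_weights(q, keys))
    print(additive_attention_weights(q, keys, identity, identity, [1.0, 1.0]))
```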
Wider Depth Dynamic Range Using Occupancy Map Correction for Immersive Video Coding
Sung-Gyun LIM Dong-Ha KIM Kwan-Jung OH Gwangsoon LEE Jun Young JEONG Jae-Gon KIM

LETTER-Image Processing and Video Processing

Pubricized:
2023/02/10
Page(s):
1102-1105
The MPEG Immersive Video (MIV) standard for immersive video coding provides users with an immersive sense of 6 degrees of freedom (6DoF) of view position and orientation by efficiently compressing multiview video acquired from different positions in a limited 3D space. In the MIV reference software called Test Model for Immersive Video (TMIV), the number of pixels to be compressed and transmitted is reduced by removing inter-view redundancy. Therefore, the occupancy information that indicates whether each pixel is valid or invalid must also be transmitted to the decoder for viewport rendering. The occupancy information is embedded in a geometry atlas and transmitted to the decoder side. At this time, to prevent occupancy errors that may occur during the compression of the geometry atlas, a guard band is set in the depth dynamic range. Reducing this guard band can improve the rendering quality by allowing a wider dynamic range for depth representation. Therefore, in this paper, based on the analysis of occupancy error of the current TMIV, two methods of occupancy error correction which allow depth dynamic range extension in the case of computer-generated (CG) sequences are presented. The experimental results show that the proposed method gives an average 2.2% BD-rate bit saving for CG compared to the existing TMIV.
Convolution Block Feature Addition Module (CBFAM) for Lightweight and Fast Object Detection on Non-GPU Devices
Min Ho KWAK Youngwoo KIM Kangin LEE Jae Young CHOI

LETTER-Image Recognition, Computer Vision

Pubricized:
2023/01/24
Page(s):
1106-1110
This letter proposes a novel lightweight deep learning object detector named LW-YOLOv4-tiny, which incorporates the convolution block feature addition module (CBFAM). The novelty of LW-YOLOv4-tiny is the use of channel-wise convolution and element-wise addition in the CBFAM instead of utilizing the concatenation of different feature maps. The model size and computation requirement are reduced by up to 16.9 Mbytes, 5.4 billion FLOPs (BFLOPS), and 11.3 FPS, which is 31.9%, 22.8%, and 30% smaller and faster than the most recent version of YOLOv4-tiny. From the MSCOCO2017 and PASCAL VOC2012 benchmarks, LW-YOLOv4-tiny achieved 40.2% and 69.3% mAP, respectively.
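A rough way to see why element-wise addition can be cheaper than concatenation is to count the weights of the convolution that follows the merge. The sketch below is a generic illustration under assumed shapes, not the CBFAM structure itself: concatenating two feature maps with C channels each produces 2C input channels for the next convolution, while adding them keeps C channels. The function name and the channel counts are hypothetical.

```python
def conv_weight_count(kernel_size, in_channels, out_channels):
    """Weights of a standard 2D convolution, bias excluded."""
    return kernel_size * kernel_size * in_channels * out_channels

if __name__ == "__main__":
    channels, out_channels, kernel = 64, 64, 3   # assumed example values

    # Merge by concatenation: the next layer sees 2 * channels inputs.
    after_concat = conv_weight_count(kernel, 2 * channels, out_channels)

    # Merge by element-wise addition: the next layer still sees channels inputs.
    after_add = conv_weight_count(kernel, channels, out_channels)

    print(after_concat, after_add)
```

In this simplified setting the layer after the merge needs half as many weights, and correspondingly fewer multiply-accumulate operations, when addition is used.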