IEICE TRANSACTIONS on Information

Impact Factor

0.72
Eigenfactor

0.002
Article Influence

0.1
CiteScore

1.4


Advance publication (published online immediately after acceptance)

Vision Transformer with Key-select Routing Attention for Single Image Dehazing
Lihan TONG Weijia LI Qingxia YANG Liyuan CHEN Peng CHEN

Publicized:
2024/07/01
- Summary
- Free PDF (2MB)
Towards Superior Pruning Performance in Federated Learning with Discriminative Data
Yinan YANG

Publicized:
2024/06/27
- Summary
- Free PDF (7.9MB)
CLEAR & RETURN: Stopping Run-time Countermeasures in Cryptographic Primitives
Myung-Hyun KIM Seungkwang LEE

Publicized:
2024/06/26
- Summary
- Free PDF (1.7MB)
SH-YOLO: Small Target High Performance YOLO for abnormal behavior detection in escalator scene
Shuoyan LIU Chao LI Yuxin LIU Yanqiu WANG

Publicized:
2024/06/26
- Summary
- Free PDF (708.4KB)
Design and implementation of opto-electrical hybrid floating-point multipliers
Takumi INABA Takatsugu ONO Koji INOUE Satoshi KAWAKAMI

Publicized:
2024/06/26
- Summary
- Free PDF (2.5MB)
Geometric Refactoring of Quantum and Reversible Circuits using Graph Algorithms
Martin LUKAC Saadat NURSULTAN Georgiy KRYLOV Oliver KESZOCZE Abilmansur RAKHMETTULAYEV Michitaka KAMEYAMA

Publicized:
2024/06/24
- Summary
- Free PDF (1010KB)
IAD-Net: Single-Image Dehazing Network Based on Image Attention
Zheqing ZHANG Hao ZHOU Chuan LI Weiwei JIANG

Publicized:
2024/06/20
- Summary
- Free PDF (9.8MB)
Improving the Accuracy of Differential-Neural Distinguisher For DES, Chaskey, and PRESENT
Liu ZHANG Zilong WANG Yindong CHEN

Publicized:
2024/06/20
- Summary
- Free PDF (355.3KB)
Multi-Scale Contrastive Learning for Human Pose Estimation
Wenxia Bao An Lin Hua Huang Xianjun Yang Hemu Chen

Publicized:
2024/06/17
- Summary
- Free PDF (1MB)
HDR-VDA: A Full Stage Data Augmentation Method for HDR Video Reconstruction
Fengshan ZHAO Qin LIU Takeshi IKENAGA

Publicized:
2024/06/17
- Summary
- Free PDF (1.2MB)
Evaluating Introduction of Systems by Goal Dependency Modeling
Haruhiko KAIYA Shinpei OGATA Shinpei HAYASHI

Publicized:
2024/06/11
- Summary
- Free PDF (1.3MB)
MISpeller: Multimodal Information Enhancement for Chinese Spelling Correction
Jiakai LI Jianyong DUAN Hao WANG Li HE Qing ZHANG

Publicized:
2024/06/07
- Summary
- Free PDF (3.3MB)
Integrating Event Elements for Chinese-Vietnamese Cross-lingual Event Retrieval
Yuxin HUANG Yuanlin YANG Enchang ZHU Yin LIANG Yantuan XIAN

Publicized:
2024/06/04
- Summary
- Free PDF (3.9MB)
Space-efficient FPT Algorithms for Degeneracy
Naohito MATSUMOTO Kazuhiro KURITA Masashi KIYOMI

Publicized:
2024/05/31
- Summary
- Free PDF (101.3KB)
Learning Fast Deployment for UAV-Assisted Disaster System
Na XING Lu LI Ye ZHANG Shiyi YANG

Publicized:
2024/05/30
- Summary
- Free PDF (10.3MB)
TDEM: Table data extraction model based on cell segmentation
Zhe Wang Zhe-Ming Lu Hao Luo Yang-Ming Zheng

Publicized:
2024/05/30
- Summary
- Free PDF (838KB)
Reliable image matching using optimal combination of color and intensity information based on relationship with surrounding objects
Rina TAGAMI Hiroki KOBAYASHI Shuichi AKIZUKI Manabu HASHIMOTO

Publicized:
2024/05/30
- Summary
- Free PDF (4.2MB)
The Least Core of Routing Game Without Triangle Inequality
Tomohiro KOBAYASHI Tomomi MATSUI

Publicized:
2024/05/30
- Summary
- Free PDF (232.9KB)
Enumerating floorplans with Aligned Columns
Shin-ichi NAKANO

Publicized:
2024/05/30
- Summary
- Free PDF (365.4KB)
A Two-Phase Algorithm for Reliable and Energy-Efficient Heterogeneous Embedded Systems
Hongzhi XU Binlian ZHANG

Publicized:
2024/05/27
- Summary
- Free PDF (902.4KB)
Smart Contract Timestamp Vulnerability Detection Based on Code Homogeneity
Weizhi WANG Lei XIA Zhuo ZHANG Xiankai MENG

Publicized:
2024/05/27
- Summary
- Free PDF (706KB)
Neural End-to-end Speech Translation Leveraged by ASR Posterior Distribution
Yuka KO Katsuhito SUDOH Sakriani SAKTI Satoshi NAKAMURA

Publicized:
2024/05/24
- Summary
- Free PDF (1.5MB)
Watermarking Method with Scaling Rate Estimation Using Pilot Signal
Rinka KAWANO Masaki KAWAMURA

Publicized:
2024/05/22
- Summary
- Free PDF (1MB)
Type-enhanced Ensemble Triple Representation via Triple-aware Attention for Cross-lingual Entity Alignment
Zhishuo ZHANG Chengxiang TAN Xueyan ZHAO Min YANG

Publicized:
2024/05/22
- Summary
- Free PDF (5.1MB)
Joint Optimization of Task Offloading and Resource Allocation for UAV-Assisted Edge Computing: A Stackelberg Bilayer Game Approach
Peng WANG Guifen CHEN Zhiyao SUN

Publicized:
2024/05/21
- Summary
- Free PDF (678.2KB)
EfficientNet Empowered by Dendritic Learning for Diabetic Retinopathy
Zeyuan JU Zhipeng LIU Yu GAO Haotian LI Qianhang DU Kota YOSHIKAWA Shangce GAO

Publicized:
2024/05/20
- Summary
- Free PDF (517.4KB)
6T-8T hybrid SRAM for lower-power neural-network processing by lowering operating voltage
Ji WU Ruoxi YU Kazuteru NAMBA

Publicized:
2024/05/20
- Summary
- Free PDF (495.1KB)
Chinese Spelling Correction Based on Knowledge Enhancement and Contrastive Learning
Hao WANG Yao Ma Jianyong Duan Li HE Xin Li

Publicized:
2024/05/17
- Summary
- Free PDF (1.2MB)
TIG: A Multitask Temporal Interval Guided Framework for Key Frame Detection
Shijie WANG Xuejiao HU Sheng LIU Ming LI Yang LI Sidan DU

Publicized:
2024/05/17
- Summary
- Free PDF (10.6MB)
Node-to-node and Node-to-set Disjoint Paths Problems in Bicubes
Arata KANEKO Htoo Htoo Sandi KYAW Kunihiro FUJIYOSHI Keiichi KANEKO

Publicized:
2024/05/17
- Summary
- Free PDF (1MB)
Remote Sensing Image Dehazing Using Multi-Scale Gated Attention For Flight Simulator
Qi LIU Bo WANG Shihan TAN Shurong ZOU Wenyi GE

Publicized:
2024/05/14
- Summary
- Free PDF (4.2MB)
Large Class Detection using GNNs: A graph based deep learning approach utilizing three typical GNN model architectures
HanYu Zhang Tomoji Kishi

Publicized:
2024/05/14
- Summary
- Free PDF (1.4MB)
Functional Decomposition of Symmetric Multiple-Valued Functions and Their Compact Representation in Decision Diagrams
Shinobu NAGAYAMA Tsutomu SASAO Jon T. BUTLER

Publicized:
2024/05/14
- Summary
- Free PDF (303.5KB)
Greedy selection of sensors for linear Bayesian estimation under correlated noise
Yoon Hak KIM

Publicized:
2024/05/14
- Summary
- Free PDF (115KB)
New Bounds for Quick Computation of the Lower Bound on the Gate Count of Toffoli-Based Reversible Logic Circuits
Takashi HIRAYAMA Rin SUZUKI Katsuhisa YAMANAKA Yasuaki NISHITANI

Publicized:
2024/05/10
- Summary
- Free PDF (222.1KB)
Evaluation of Multi-valued Data Transmission in Two-Dimensional Symbol Mapping using Linear Mixture Model
Yosuke IIJIMA Atsunori OKADA Yasushi YUMINAKA

Publicized:
2024/05/09
- Summary
- Free PDF (8.2MB)
Using Genetic Algorithm and Mathematical Programming Model for Ambulance Location Problem in Emergency Medical Service
Batnasan Luvaanjalba Elaine Yi-Ling Wu

Publicized:
2024/05/08
- Summary
- Free PDF (909.3KB)
Enhanced Data Transfer Cooperating with Artificial Triplets for Scene Graph Generation
KuanChao CHU Satoshi YAMAZAKI Hideki NAKAYAMA

Publicized:
2024/04/30
- Summary
- Free PDF (9.2MB)
A mmWave sensor and camera fusion system for indoor occupancy detection and tracking
Shenglei LI Haoran LUO Tengfei SHAO Reiko HISHIYAMA

Publicized:
2024/04/26
- Summary
- Free PDF (4.1MB)
Evaluating PAM-4 Data Transmission Quality using Multi-Dimensional Mapping of Received Symbols
Yasushi YUMINAKA Kazuharu NAKAJIMA Yosuke IIJIMA

Publicized:
2024/04/25
- Summary
- Free PDF (7.5MB)
Unsupervised Intrusion Detection Based on Asymmetric Auto-Encoder Feature Extraction
Chunbo Liu Liyin Wang Zhikai Zhang Chunmiao Xiang Zhaojun Gu Zhi Wang Shuang Wang

Publicized:
2024/04/25
- Summary
- Free PDF (1.2MB)
Reinforced Voxel-RCNN: An efficient 3D Object Detection Method Based on Feature Aggregation
Jia-ji JIANG Hai-bin WAN Hong-min SUN Tuan-fa QIN Zheng-qiang WANG

Publicized:
2024/04/24
- Summary
- Free PDF (4.6MB)
A Channel Contrastive Attention-based Local-Nonlocal Mutual block on Super-Resolution
Yuhao LIU Zhenzhong CHU Lifei WEI

Publicized:
2024/04/23
- Summary
- Free PDF (1.5MB)
Error-Tolerance-Aware Write-Energy Reduction of MTJ-Based Quantized Neural Network Hardware
Ken ASANO Masanori NATSUI Takahiro HANYU

Publicized:
2024/04/22
- Summary
- Free PDF (2.1MB)
Skin diagnostic method using Fontana-Masson stained images of stratum corneum cells
Shuto HASEGAWA Koichiro ENOMOTO Taeko MIZUTANI Yuri OKANO Takenori TANAKA Osamu SAKAI

Publicized:
2024/04/19
- Summary
- Free PDF (6.6MB)
Confidence-Driven Contrastive Learning for Document Classification without Annotated Data
Zhewei XU Mizuho IWAIHARA

Publicized:
2024/04/19
- Summary
- Free PDF (2.2MB)
Delta-Sigma Domain Signal Processing Revisited with Related Topics in Stochastic Computing
Takao WAHO Akihisa KOYAMA Hitoshi HAYASHI

Publicized:
2024/04/17
- Summary
- Free PDF (1.7MB)
Extending Binary Neural Networks to Bayesian Neural Networks with Probabilistic Interpretation of Binary Weights
Taisei SAITO Kota ANDO Tetsuya ASAI

Publicized:
2024/04/17
- Summary
- Free PDF (1.1MB)
Unveiling Python Version Compatibility Challenges in Code Snippets on Stack Overflow
Shiyu YANG Tetsuya KANDA Daniel M. GERMAN Yoshiki HIGO

Publicized:
2024/04/16
- Summary
- Free PDF (488.3KB)
On Easily Reconstructable Logic Functions
Tsutomu SASAO

Publicized:
2024/04/16
- Summary
- Free PDF (240.7KB)
Tracking WebVR User Activities through Hand Motions: An Attack Perspective
Jiyeon LEE

Publicized:
2024/04/16
- Summary
- Free PDF (3MB)
Permissionless Blockchain-Based Sybil-Resistant Self-Sovereign Identity Utilizing Attested Execution Secure Processors
Koichi MORIYAMA Akira OTSUKA

Publicized:
2024/04/15
- Summary
- Free PDF (4.1MB)
Cross-Corpus Speech Emotion Recognition Based on Causal Emotion Information Representation
Hongliang FU Qianqian LI Huawei TAO Chunhua ZHU Yue XIE Ruxue GUO

Publicized:
2024/04/12
- Summary
- Free PDF (8.1MB)
Investigating and Enhancing the Neural Distinguisher for Differential Cryptanalysis
Gao WANG Gaoli WANG Siwei SUN

Publicized:
2024/04/12
- Summary
- Free PDF (1.7MB)
Nuclear Norm Minus Frobenius Norm Minimization with Rank Residual Constraint for Image Denoising
Hua HUANG Yiwen SHAN Chuan LI Zhi WANG

Publicized:
2024/04/09
- Summary
- Free PDF (6.9MB)
Improved Just Noticeable Difference Model Based Algorithm for Fast CU Partition in V-PCC
Zhi LIU Heng WANG Yuan LI Hongyun LU Hongyuan JING Mengmeng ZHANG

Publicized:
2024/04/05
- Summary
- Free PDF (1.1MB)
MDX-Mixer: Music Demixing by Leveraging Source Signals Separated by Existing Demixing Models
Tomoyasu NAKANO Masataka GOTO

Publicized:
2024/04/05
- Summary
- Free PDF (2.4MB)
Machine Learning-based System for Heat-Resistant Analysis of Car Lamp Design
Hyebong CHOI Joel SHIN Jeongho KIM Samuel YOON Hyeonmin PARK Hyejin CHO Jiyoung JUNG

Publicized:
2024/04/03
- Summary
- Free PDF (1.4MB)
Agent Allocation-Action Learning with Dynamic Heterogeneous Graph in Multi-task Games
Xianglong LI Yuan LI Jieyuan ZHANG Xinhai XU Donghong LIU

Publicized:
2024/04/03
- Summary
- Free PDF (3.9MB)
FSAMT : Face Shape Adaptive Makeup Transfer
Haoran LUO Tengfei SHAO Shenglei LI Reiko HISHIYAMA

Publicized:
2024/04/02
- Summary
- Free PDF (984.6KB)
Artifact Removal Using Attention Guided Local-Global Dual-Stream Network for Sparse-View CT Reconstruction
Chang SUN Yitong LIU Hongwen YANG

Publicized:
2024/03/29
- Summary
A CNN-based feature pyramid segmentation Strategy for acoustic scene classification
Ji XI Yue XIE Pengxu JIANG Wei JIANG

Publicized:
2024/03/26
- Summary
An IP Core Protection Scheme Based on Hybrid Lightweight Encryption for Neuromorphic Computing System
Ming PAN

The article processing charge of this paper has not been paid.

Publicized:
2022/09/14
- Summary

Whole issue(103MB)

Volume E107-D No.1 (Publication Date:2024/01/01)

Special Section on Enriched Multimedia — Media technologies opening up the future —

FOREWORD Open Access
Ryouichi NISHIMURA

FOREWORD

Page(s):
1-1
- HTML
- Free PDF (590.9KB)
Frameworks for Privacy-Preserving Federated Learning
Le Trieu PHONG Tran Thi PHUONG Lihua WANG Seiichi OZAWA

INVITED PAPER

Publicized:
2023/09/25
Page(s):
2-12
In this paper, we explore privacy-preserving techniques in federated learning, including those that can be used with both neural networks and decision trees. We begin by identifying how information can be leaked in federated learning, after which we present methods to address this issue by introducing two privacy-preserving frameworks that encompass many existing privacy-preserving federated learning (PPFL) systems. Through experiments with publicly available financial, medical, and Internet of Things datasets, we demonstrate the effectiveness of privacy-preserving federated learning and its potential to develop highly accurate, secure, and privacy-preserving machine learning systems in real-world scenarios. The findings highlight the importance of considering privacy in the design and implementation of federated learning systems and suggest that privacy-preserving techniques are essential in enabling the development of effective and practical machine learning systems.
CASEformer — A Transformer-Based Projection Photometric Compensation Network
Yuqiang ZHANG Huamin YANG Cheng HAN Chao ZHANG Chaoran ZHU

PAPER

Publicized:
2023/09/29
Page(s):
13-28
In this paper, we present a novel photometric compensation network named CASEformer, which is built upon the Swin module. For the first time, we combine coordinate attention and channel attention mechanisms to extract rich features from input images. Employing a multi-level encoder-decoder architecture with skip connections, we establish multiscale interactions between projection surfaces and projection images, achieving precise inference and compensation. Furthermore, through an attention fusion module, which simultaneously leverages both coordinate and channel information, we enhance the global context of feature maps while preserving enhanced texture coordinate details. The experimental results demonstrate the superior compensation effectiveness of our approach compared to the current state-of-the-art methods. Additionally, we propose a method for multi-surface projection compensation, further enriching our contributions.
A Coded Aperture as a Key for Information Hiding Designed by Physics-in-the-Loop Optimization
Tomoki MINAMATA Hiroki HAMASAKI Hiroshi KAWASAKI Hajime NAGAHARA Satoshi ONO

PAPER

Publicized:
2023/09/28
Page(s):
29-38
This paper proposes a novel application of coded apertures (CAs) for visual information hiding. The CA is one of the representative computational photography techniques, in which a patterned mask is attached to a camera as an alternative to a conventional circular aperture. With image processing in the post-processing phase, various functions such as omnifocal image capturing and depth estimation can be performed. In general, a watermark embedded as high-frequency components is difficult to extract if it is captured outside the focal length and defocus blur occurs. Installation of a CA into the camera is a simple solution to mitigate the difficulty, and several attempts have been made to design CAs for stable extraction. In contrast, our motivation is to design a specific CA as well as an information hiding scheme; the secret information can only be decoded if an image with hidden information is captured with the key aperture at a certain distance outside the focus range. The proposed technique designs the key aperture patterns and information hiding scheme through evolutionary multi-objective optimization so as to minimize the decryption error of a hidden image when using the key aperture while minimizing the accuracy when using other apertures. During the optimization process, solution candidates, i.e., key aperture patterns and information hiding schemes, are evaluated on actual devices to account for disturbances that cannot be considered in optical simulations. Experimental results have shown that decoding can be performed with the designed key aperture and similar ones, that decrypted image quality deteriorates as the similarity between the key and the aperture used for decryption decreases, and that the proposed information hiding technique works on actual devices.
An Evaluation of the Impact of Distance on Perceptual Quality of Textured 3D Meshes
Duc NGUYEN Tran THUY HIEN Huyen T. T. TRAN Truong THU HUONG Pham NGOC NAM

LETTER

Publicized:
2023/09/25
Page(s):
39-43
Distance-aware quality adaptation is a potential approach to reduce the resource requirement for the transmission and rendering of textured 3D meshes. In this paper, we carry out a subjective experiment to investigate the effects of the distance from the camera on the perceptual quality of textured 3D meshes. Besides, we evaluate the effectiveness of eight image-based objective quality metrics in representing the user's perceptual quality. Our study found that the perceptual quality in terms of mean opinion score increases as the distance from the camera increases. In addition, it is shown that normalized mutual information (NMI), a full-reference objective quality metric, is highly correlated with subjective scores.
Unbiased Pseudo-Labeling for Learning with Noisy Labels
Ryota HIGASHIMOTO Soh YOSHIDA Takashi HORIHATA Mitsuji MUNEYASU

LETTER

Publicized:
2023/09/19
Page(s):
44-48
Noisy labels in training data can significantly harm the performance of deep neural networks (DNNs). Recent research on learning with noisy labels uses a property of DNNs called the memorization effect to divide the training data into a set of data with reliable labels and a set of data with unreliable labels. Methods introducing semi-supervised learning strategies discard the unreliable labels and assign pseudo-labels generated from the confident predictions of the model. So far, this semi-supervised strategy has yielded the best results in this field. However, we observe that even when models are trained on balanced data, the distribution of the pseudo-labels can still exhibit an imbalance that is driven by data similarity. Additionally, a data bias is seen that originates from the division of the training data using the semi-supervised method. If we address both types of bias that arise from pseudo-labels, we can avoid the decrease in generalization performance caused by biased noisy pseudo-labels. We propose a learning method with noisy labels that introduces unbiased pseudo-labeling based on causal inference. The proposed method achieves significant accuracy gains in experiments at high noise rates on the standard benchmarks CIFAR-10 and CIFAR-100.
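The pseudo-labeling step mentioned in this abstract can be illustrated with a minimal sketch. It only shows the generic idea of assigning pseudo-labels from confident predictions and then inspecting their class distribution; it is not the authors' causal-inference method. The function name `pseudo_label`, the threshold value, and the toy probabilities are assumptions made for illustration.

```python
from collections import Counter


def pseudo_label(probabilities, threshold=0.9):
    """Assign a pseudo-label to every sample whose top class probability
    reaches `threshold` (hypothetical value); other samples stay unlabeled.

    probabilities: list of per-sample class-probability lists.
    Returns a dict mapping sample index -> pseudo-label.
    """
    labels = {}
    for index, probs in enumerate(probabilities):
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            labels[index] = best
    return labels


# Toy model outputs for four samples and two classes (made-up numbers).
toy_probs = [[0.95, 0.05], [0.60, 0.40], [0.92, 0.08], [0.10, 0.90]]
toy_labels = pseudo_label(toy_probs)
# Counting labels per class exposes any imbalance among the pseudo-labels.
toy_distribution = Counter(toy_labels.values())
print(toy_labels, dict(toy_distribution))
```

Counting the assigned labels per class, as in the last lines, is one simple way to observe the kind of pseudo-label imbalance the abstract describes.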
CQTXNet: A Modified Xception Network with Attention Modules for Cover Song Identification
Jinsoo SEO Junghyun KIM Hyemi KIM

LETTER

Publicized:
2023/10/02
Page(s):
49-52
Song-level feature summarization is fundamental for the browsing, retrieval, and indexing of digital music archives. This study proposes a deep neural network model, CQTXNet, for extracting a song-level feature summary for cover song identification. CQTXNet incorporates depth-wise separable convolution, residual network connections, and attention models to extend previous approaches. An experimental evaluation of the proposed CQTXNet was performed on two publicly available cover song datasets by varying the number of network layers and the type of attention modules.

Regular Section

Node-to-Set Disjoint Paths Problem in Cross-Cubes
Rikuya SASAKI Hiroyuki ICHIDA Htoo Htoo Sandi KYAW Keiichi KANEKO

PAPER-Fundamentals of Information Systems

Publicized:
2023/10/06
Page(s):
53-59
The increasing demand for high-performance computing in recent years has led to active research on massively parallel systems. The interconnection network in a massively parallel system interconnects hundreds of thousands of processing elements so that they can process large tasks while communicating with one another. By regarding the processing elements as nodes and the links between processing elements as edges, we can discuss various problems of interconnection networks in the framework of graph theory. Many topologies have been proposed for interconnection networks of massively parallel systems. The hypercube is a very popular topology and it has many variants. The cross-cube is one such variant, which can be obtained by adding one extra edge to each node of the hypercube. The cross-cube reduces the diameter of the hypercube, and allows cycles of odd lengths. Therefore, we focus on the cross-cube and propose an algorithm that constructs disjoint paths from a node to a set of nodes. We give a proof of correctness of the algorithm. Also, we show that the time complexity and the maximum path length of the algorithm are O(n³ log n) and 2n - 3, respectively. Moreover, we estimate that the average execution time of the algorithm is O(n²) based on a computer experiment.
Testing and Delay-Monitoring for the High Reliability of Memory-Based Programmable Logic Device
Xihong ZHOU Senling WANG Yoshinobu HIGAMI Hiroshi TAKAHASHI

PAPER-Dependable Computing

Publicized:
2023/10/03
Page(s):
60-71
Memory-based Programmable Logic Device (MPLD) is a new type of reconfigurable device constructed using a general SRAM array in a unique interconnect configuration. This research aims to propose approaches to guarantee the long-term reliability of MPLDs, including a test method to identify interconnect defects in the SRAM array during the production phase and a delay monitoring technique to detect aging-caused failures. The proposed test method configures pre-generated test configuration data into SRAMs to create fault propagation paths, applies an external walking-zero/one vector to excite faults, and identifies faults at the external output ports. The proposed delay monitoring method configures a novel ring oscillator logic design into MPLD to measure delay variations when the device is in practical use. The logic simulation results with fault injection confirm the effectiveness of the proposed methods.
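The walking-zero/one vectors mentioned in this abstract can be sketched as follows. This is only the generic test-pattern idea (a single bit moving across a word), not the paper's test configuration data or fault model. The function names and the 4-bit width are assumptions chosen for illustration.

```python
def walking_vectors(width, walking_bit=1):
    """Return `width` integer test vectors in which a single bit walks
    across a word of `width` bits.

    walking_bit=1: one 1 moves through a background of 0s (walking-one).
    walking_bit=0: one 0 moves through a background of 1s (walking-zero).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if walking_bit not in (0, 1):
        raise ValueError("walking_bit must be 0 or 1")
    mask = (1 << width) - 1
    ones = [1 << position for position in range(width)]
    if walking_bit == 1:
        return ones
    # Invert every bit inside the word to obtain the walking-zero set.
    return [vector ^ mask for vector in ones]


def as_bits(value, width):
    """Format a vector as a zero-padded binary string."""
    return format(value, f"0{width}b")


for vector in walking_vectors(4, walking_bit=1):
    print(as_bits(vector, 4))
for vector in walking_vectors(4, walking_bit=0):
    print(as_bits(vector, 4))
```

Each pattern differs from the background in exactly one position, which is why such vectors are commonly used to excite faults on individual lines one at a time.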
A Novel Double-Tail Generative Adversarial Network for Fast Photo Animation
Gang LIU Xin CHEN Zhixiang GAO

PAPER-Artificial Intelligence, Data Mining

Publicized:
2023/09/28
Page(s):
72-82
Photo animation transforms photos of real-world scenes into anime style images and is a challenging task in AIGC (AI Generated Content). Although previous methods have achieved promising results, they often introduce noticeable artifacts or distortions. In this paper, we propose a novel double-tail generative adversarial network (DTGAN) for fast photo animation. DTGAN is the third version of the AnimeGAN series and is therefore also called AnimeGANv3. The generator of DTGAN has two output tails, a support tail for outputting coarse-grained anime style images and a main tail for refining coarse-grained anime style images. In DTGAN, we propose a novel learnable normalization technique, termed linearly adaptive denormalization (LADE), to prevent artifacts in the generated images. In order to improve the visual quality of the generated anime style images, two novel loss functions suitable for photo animation are proposed: 1) the region smoothing loss function, which is used to weaken the texture details of the generated images to achieve anime effects with abstract details; 2) the fine-grained revision loss function, which is used to eliminate artifacts and noise in the generated anime style image while preserving clear edges. Furthermore, the generator of DTGAN is a lightweight generator framework with only 1.02 million parameters in the inference phase. The proposed DTGAN can easily be trained end-to-end with unpaired training data. Extensive experiments have been conducted to qualitatively and quantitatively demonstrate that our method can produce high-quality anime style images from real-world photos and perform better than the state-of-the-art models.
Research on Lightweight Acoustic Scene Perception Method Based on Drunkard Methodology
Wenkai LIU Lin ZHANG Menglong WU Xichang CAI Hongxia DONG

PAPER-Artificial Intelligence, Data Mining

Publicized:
2023/10/23
Page(s):
83-92
The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance in acoustic scene classification, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of “drunkards” in real-life and the guiding behavior of normal people, and construct a high-precision lightweight model implementation methodology called the “drunkard methodology”. The core idea includes three parts: (1) designing a special feature transformation module based on the different mechanisms of information perception between drunkards and ordinary people, to simulate the process of gradually sobering up and the changes in feature perception ability; (2) studying a lightweight “drunken” model that matches the normal model's perception processing process. The model uses a multi-scale class residual block structure and can obtain finer feature representations by fusing information extracted at different scales; (3) introducing a guiding and fusion module of the conventional model to the “drunken” model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task1 demonstrate that our baseline system achieves 40.4% accuracy and 2.284 loss under the condition of 442.67K parameters and 19.40M MAC (multiply-accumulate operations). After adopting the “drunkard” mechanism, the accuracy is improved to 45.2%, and the loss is reduced by 0.634 under the condition of 551.89K parameters and 23.6M MAC.
Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis
Kenichi FUJITA Atsushi ANDO Yusuke IJIMA

PAPER-Speech and Hearing

Publicized:
2023/10/06
Page(s):
93-104
This paper proposes a speech rhythm-based method for speaker embeddings to model phoneme duration using a few utterances by the target speaker. Speech rhythm is one of the essential factors among speaker characteristics, along with acoustic features such as F0, for reproducing individual utterances in speech synthesis. A novel feature of the proposed method is the rhythm-based embeddings extracted from phonemes and their durations, which are known to be related to speaking rhythm. They are extracted with a speaker identification model similar to the conventional spectral feature-based one. We conducted three experiments, speaker embeddings generation, speech synthesis with generated embeddings, and embedding space analysis, to evaluate the performance. The proposed method demonstrated a moderate speaker identification performance (15.2% EER), even with only phonemes and their duration information. The objective and subjective evaluation results demonstrated that the proposed method can synthesize speech with speech rhythm closer to the target speaker than the conventional method. We also visualized the embeddings to evaluate the relationship between the distance of the embeddings and the perceptual similarity. The visualization of the embedding space and the relation analysis between the closeness indicated that the distribution of embeddings reflects the subjective and objective similarity.
Efficient Action Spotting Using Saliency Feature Weighting
Yuzhi SHI Takayoshi YAMASHITA Tsubasa HIRAKAWA Hironobu FUJIYOSHI Mitsuru NAKAZAWA Yeongnam CHAE Björn STENGER

PAPER-Image Processing and Video Processing

Publicized:
2023/10/17
Page(s):
105-114
Action spotting is a key component in high-level video understanding. The large number of similar frames poses a challenge for recognizing actions in videos. In this paper we use frame saliency to represent the importance of frames for guiding the model to focus on keyframes. We propose the frame saliency weighting module to improve frame saliency and video representation at the same time. Our proposed model contains two encoders, for pre-action and post-action time windows, to encode video context. We validate our design choices and the generality of the proposed method in extensive experiments. On the public SoccerNet-v2 dataset, the method achieves an average mAP of 57.3%, improving over the state of the art. Using embedding features obtained from multiple feature extractors, the average mAP further increases to 75%. We show that reducing the model size by over 90% does not significantly impact performance. Additionally, we use ablation studies to demonstrate the effectiveness of the saliency weighting module. Further, we show that our frame saliency weighting strategy is applicable to existing methods on more general action datasets, such as SoccerNet-v1, ActivityNet v1.3, and UCF101.
Improved Head and Data Augmentation to Reduce Artifacts at Grid Boundaries in Object Detection
Shinji UCHINOURA Takio KURITA

PAPER-Image Recognition, Computer Vision

Publicized:
2023/10/23
Page(s):
115-124
We investigated the influence of horizontal shifts of the input images on one-stage object detection methods. We found that the object detector class scores drop when the target object center is at the grid boundary. Many approaches have focused on reducing the aliasing effect of down-sampling to achieve shift-invariance. However, down-sampling does not completely solve this problem at the grid boundary; it is necessary to suppress the dispersion of features in pixels close to the grid boundary into adjacent grid cells. Therefore, this paper proposes two approaches focused on the grid boundary to improve this weak point of current object detection methods. One is the Sub-Grid Feature Extraction Module, in which the sub-grid features are added to the input of the classification head. The other is Grid-Aware Data Augmentation, where augmented data are generated by the grid-level shifts and are used in training. The effectiveness of the proposed approaches is demonstrated using the COCO validation set after applying the proposed method to the FCOS architecture.
Multi-Task Learning of Japanese How-to Tip Machine Reading Comprehension by a Generative Model
Xiaotian WANG Tingxuan LI Takuya TAMURA Shunsuke NISHIDA Takehito UTSURO

PAPER-Natural Language Processing

Publicized:
2023/10/23
Page(s):
125-134
In the research of machine reading comprehension of Japanese how-to tip QA tasks, conventional extractive machine reading comprehension methods have difficulty in dealing with cases in which the answer string spans multiple locations in the context. The method of fine-tuning of the BERT model for machine reading comprehension tasks is not suitable for such cases. In this paper, we trained a generative machine reading comprehension model of Japanese how-to tip by constructing a generative dataset based on the website “wikihow” as a source of information. We then proposed two methods for multi-task learning to fine-tune the generative model. The first method is the multi-task learning with a generative and extractive hybrid training dataset, where both generative and extractive datasets are simultaneously trained on a single model. The second method is the multi-task learning with the inter-sentence semantic similarity and answer generation, where, drawing upon the answer generation task, the model additionally learns the distance between the sentences of the question/context and the answer in the training examples. The evaluation results showed that both of the multi-task learning methods significantly outperformed the single-task learning method in generative question-and-answer examples. Between the two methods for multi-task learning, that with the inter-sentence semantic similarity and answer generation performed the best in terms of the manual evaluation result. The data and the code are available at https://github.com/EternalEdenn/multitask_ext-gen_sts-gen.
Inference Discrepancy Based Curriculum Learning for Neural Machine Translation
Lei ZHOU Ryohei SASANO Koichi TAKEDA

PAPER-Natural Language Processing

Pubricized:
2023/10/18
Page(s):
135-143
In practice, even a well-trained neural machine translation (NMT) model can still make biased inferences on the training set due to distribution shifts. In the human learning process, if we cannot reproduce something correctly after learning it multiple times, we consider it to be more difficult. Likewise, a training example that causes a large discrepancy between inference and reference implies higher learning difficulty for the MT model. Therefore, we propose to adopt the inference discrepancy of each training example as the difficulty criterion and to rank training examples from easy to hard accordingly. In this way, a trained model can guide the curriculum learning process of an initial model identical to itself. This training scheme can be viewed as a pretrained vanilla model guiding the learning process of a curriculum NMT model. In this paper, we assess the effectiveness of the proposed training scheme and examine the influence of translation direction, evaluation metrics and different curriculum schedules. Experimental results on the translation benchmarks WMT14 English ⇒ German, WMT17 Chinese ⇒ English and Multitarget TED Talks Task (MTTT) English ⇔ German, English ⇔ Chinese, English ⇔ Russian demonstrate that our proposed method consistently improves translation performance over the advanced Transformer baseline.
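The ranking step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the discrepancy is measured, so the unigram-F1 score below, the `translate` callable, and all function names are assumptions made for illustration, not the authors' implementation.

```python
from typing import Callable, Dict, List, Tuple


def token_overlap_discrepancy(hypothesis: str, reference: str) -> float:
    """Hypothetical discrepancy score: 1 - unigram F1 (0 = identical bag of tokens)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 1.0
    remaining: Dict[str, int] = {}
    for token in ref:
        remaining[token] = remaining.get(token, 0) + 1
    overlap = 0
    for token in hyp:
        if remaining.get(token, 0) > 0:
            overlap += 1
            remaining[token] -= 1
    if overlap == 0:
        return 1.0
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 1.0 - 2 * precision * recall / (precision + recall)


def rank_easy_to_hard(
    pairs: List[Tuple[str, str]],
    translate: Callable[[str], str],
    discrepancy: Callable[[str, str], float] = token_overlap_discrepancy,
) -> List[Tuple[str, str]]:
    """Order (source, reference) pairs by the trained model's inference discrepancy."""
    scored = [
        (discrepancy(translate(source), reference), index)
        for index, (source, reference) in enumerate(pairs)
    ]
    scored.sort()  # smallest discrepancy (easiest example) first
    return [pairs[index] for _, index in scored]
```

Here `translate` stands in for inference with the already trained model; the resulting order would then feed whichever curriculum schedule is chosen.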
Negative Learning to Prevent Undesirable Misclassification
Kazuki EGASHIRA Atsuyuki MIYAI Qing YU Go IRIE Kiyoharu AIZAWA

LETTER-Artificial Intelligence, Data Mining

Pubricized:
2023/10/05
Page(s):
144-147
We propose a novel classification problem setting in which Undesirable Classes (UCs) are defined for each class. A UC is a class into which a sample should specifically not be misclassified. To address this setting, we propose a framework that reduces the probabilities of the UCs while increasing the probability of the correct class.
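One way to picture the stated goal (raise the probability of the correct class, lower the probabilities of its UCs) is a cross-entropy term plus a penalty on the UC probabilities. The abstract does not give the loss, so the form below, the `weight` parameter, and the function names are assumptions for illustration only.

```python
import math
from typing import Iterable, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def uc_aware_loss(
    logits: List[float],
    target: int,
    undesirable: Iterable[int],
    weight: float = 1.0,  # hypothetical trade-off between the two terms
    eps: float = 1e-12,
) -> float:
    """Cross-entropy on the target plus -log(1 - p_u) for each undesirable class u."""
    probs = softmax(logits)
    positive = -math.log(probs[target] + eps)
    negative = -sum(math.log(1.0 - probs[u] + eps) for u in undesirable)
    return positive + weight * negative
```

With this form, two predictions that give the correct class the same probability are penalized differently depending on whether the remaining mass falls on a UC or on an ordinary wrong class.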
Shared Latent Embedding Learning for Multi-View Subspace Clustering
Zhaohu LIU Peng SONG Jinshuai MU Wenming ZHENG

LETTER-Artificial Intelligence, Data Mining

Pubricized:
2023/10/17
Page(s):
148-152
Most existing multi-view subspace clustering approaches only capture the inter-view similarities between different views and ignore the optimal local geometric structure of the original data. To this end, in this letter, we put forward a novel method named shared latent embedding learning for multi-view subspace clustering (SLE-MSC), which can efficiently capture a better latent space. To be specific, we introduce a pseudo-label constraint to capture the intra-view similarities within each view. Meanwhile, we utilize a novel optimal graph Laplacian to learn the consistent latent representation, in which the common manifold is considered as the optimal manifold to obtain a more reasonable local geometric structure. Comprehensive experimental results indicate the superiority and effectiveness of the proposed method.
A CNN-Based Multi-Scale Pooling Strategy for Acoustic Scene Classification
Rong HUANG Yue XIE

LETTER-Speech and Hearing

Pubricized:
2023/10/17
Page(s):
153-156
Acoustic scene classification (ASC) is a fundamental task among artificial intelligence classification tasks. ASC systems commonly employ models based on convolutional neural networks (CNNs) that take log-Mel spectrograms as input for extracting acoustic features. In this paper, we designed a CNN-based multi-scale pooling (MSP) strategy for ASC. The log-Mel spectrogram used as the CNN input is partitioned into four segments along the frequency axis, and four CNN channels receive the inputs from the distinct frequency ranges. The high-level features extracted from the outputs in the various frequency bands are integrated through frequency pyramid average pooling layers at multiple levels. Subsequently, a softmax classifier is employed to classify the different scenes. Tests on two acoustic datasets show that the designed model significantly improves classification performance.
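The two frequency-axis operations named in this abstract can be sketched in a few lines. The sketch assumes a spectrogram or feature map stored as a list of frequency rows, equal-width segments, and pyramid levels of 1, 2 and 4 bands; none of these details are stated in the abstract, and the CNN branches between the two steps are omitted.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # rows = frequency bins, columns = time frames


def split_frequency(spec: Matrix, segments: int = 4) -> List[Matrix]:
    """Partition a map into contiguous, near-equal segments along the frequency axis."""
    n = len(spec)
    bounds = [i * n // segments for i in range(segments + 1)]
    return [spec[bounds[i]:bounds[i + 1]] for i in range(segments)]


def band_mean(band: Matrix) -> float:
    """Average of every value in one frequency band."""
    values = [value for row in band for value in row]
    return sum(values) / len(values)


def frequency_pyramid_avg_pool(
    feature_map: Matrix, levels: Sequence[int] = (1, 2, 4)
) -> List[float]:
    """Average-pool over 1, 2, 4, ... frequency bands and concatenate the results."""
    if len(feature_map) < max(levels):
        raise ValueError("need at least one frequency row per band")
    pooled: List[float] = []
    for level in levels:
        pooled.extend(band_mean(band) for band in split_frequency(feature_map, level))
    return pooled
```

In a full model each of the four segments from `split_frequency` would pass through its own CNN branch before pooling; here the pooling is applied directly to a matrix to keep the example self-contained.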
Lightweight and Fast Low-Light Image Enhancement Method Based on PoolFormer
Xin HU Jinhua WANG Sunhan XU

LETTER-Image Processing and Video Processing

Pubricized:
2023/10/05
Page(s):
157-160
Images captured in low-light environments have low visibility and high noise, which seriously affects subsequent visual tasks such as target detection and face recognition. Low-light image enhancement is therefore of great significance for obtaining high-quality images and is a challenging problem in computer vision. LLFormer, a low-light enhancement model based on the Vision Transformer, uses axis-based multi-head self-attention and a cross-layer attention fusion mechanism to reduce complexity and extract features. This algorithm enhances images well. However, the attention computation is complex and the number of parameters is large, which limits the practical application of the model. In response to this problem, a lightweight module, PoolFormer, is used to replace the attention module with spatial pooling, which can increase the parallelism of the network and greatly reduce the number of model parameters. To suppress image noise and improve visual effects, a new loss function is constructed for model optimization. Experimental results show that the proposed method not only reduces the number of parameters by 49%, but also performs better than the baseline model in image detail restoration and noise suppression. On the LOL dataset, the PSNR and SSIM were 24.098 dB and 0.8575, respectively. On the MIT-Adobe FiveK dataset, the PSNR and SSIM were 27.060 dB and 0.9490, respectively. The evaluation results on the two datasets are better than those of current mainstream low-light enhancement algorithms.
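Replacing attention with spatial pooling can be illustrated on a single feature-map channel. The sketch below assumes a 3×3 average-pooling window with stride 1, border cells averaged over valid neighbours only, and subtraction of the input as in the original PoolFormer design; whether the paper's module uses exactly these settings is not stated in the abstract.

```python
from typing import List

Matrix = List[List[float]]


def pool_token_mixer(x: Matrix, pool_size: int = 3) -> Matrix:
    """Parameter-free token mixing: local average pooling minus the input itself."""
    height, width = len(x), len(x[0])
    radius = pool_size // 2
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            total, count = 0.0, 0
            # Average over the in-bounds part of the window centred on (i, j).
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < height and 0 <= jj < width:
                        total += x[ii][jj]
                        count += 1
            out[i][j] = total / count - x[i][j]
    return out
```

The mixer has no learnable weights, which is the source of the parameter savings over an attention block: a flat region maps to zero, and only local deviations from the neighbourhood mean are passed on.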
Robust Visual Tracking Using Hierarchical Vision Transformer with Shifted Windows Multi-Head Self-Attention
Peng GAO Xin-Yue ZHANG Xiao-Li YANG Jian-Cheng NI Fei WANG

LETTER-Image Recognition, Computer Vision

Pubricized:
2023/10/20
Page(s):
161-164
Although Siamese trackers have attracted much attention in recent years for their scalability and efficiency, researchers have ignored the background appearance, which makes these trackers ill-suited to recognizing arbitrary target objects with various appearance variations, especially in complex scenarios with background clutter and distractors. In this paper, we present a simple yet effective Siamese tracker in which shifted windows multi-head self-attention is employed to learn the characteristics of a specific given target object for visual tracking. To validate the effectiveness of our proposed tracker, we use the Swin Transformer as the backbone network and introduce an auxiliary feature enhancement network. Extensive experimental results on two evaluation datasets demonstrate that the proposed tracker outperforms other baselines.
Re-Evaluating Syntax-Based Negation Scope Resolution
Asahi YOSHIDA Yoshihide KATO Shigeki MATSUBARA

LETTER-Natural Language Processing

Pubricized:
2023/10/16
Page(s):
165-168
Negation scope resolution is the process of detecting the negated part of a sentence. Unlike the syntax-based approach employed in previous research, state-of-the-art methods perform better without the explicit use of syntactic structure. This work revisits the syntax-based approach and re-evaluates the effectiveness of syntactic structure in negation scope resolution. We replace the parser used in prior work with state-of-the-art parsers and modify the syntax-based heuristic rules. The experimental results demonstrate that these simple modifications raise the performance of the prior syntax-based method to the same level as state-of-the-art end-to-end neural methods.


Volume E107-D No.1 (Publication Date:2024/01/01)
