Zheqing ZHANG Hao ZHOU Chuan LI Weiwei JIANG
Single-image dehazing is a challenging task in computer vision research. To address the limited representation capability of traditional convolutional neural networks and the high computational overhead of the self-attention mechanism, we propose image attention and design a single-image dehazing network based on it: IAD-Net. The proposed image attention is a plug-and-play module with global modeling ability. IAD-Net is a parallel network structure that combines the global modeling ability of image attention with the local modeling ability of convolution, so that the network can learn both global and local features. The proposed network model has excellent feature learning and feature expression ability, has low computational overhead, and also improves the detail information of hazy images. Experiments verify the effectiveness of the image attention module and the competitiveness of IAD-Net with state-of-the-art methods.
This article describes the idea of utilizing Attested Execution Secure Processors (AESPs) to build a secure Self-Sovereign Identity (SSI) system that satisfies Sybil-resistance under permissionless blockchains. Today’s circumstances, which require people to be online more, have encouraged us to address privacy-preserving digital identity. There is a momentum of research addressing SSI, and many researchers approach blockchain technology as a foundation. SSI brings natural persons various benefits, such as control over their own identities; on the other hand, digital identity systems in the real world require Sybil-resistance to comply with Anti-Money-Laundering (AML) and other needs. The main idea in our proposal is to utilize AESPs for three reasons: the first is the attested execution capability along with tamper-resistance, which is a strong assumption; the second is power and flexibility, allowing various open-source programs to be executed within a secure enclave; and the third is that equipping mobile devices with hardware-assisted security has become the norm. Rafael Pass et al.’s formal abstraction of AESPs and the ideal functionality $\mathcal{G}_\mathtt{att}$ enable us to formulate mathematically how hardware-assisted security works for privacy-preserving secure digital identity systems under permissionless blockchains. Our proposal of the AESP-based SSI architecture and system protocols, $\Pi^{\mathcal{G}_\mathtt{att}}$, demonstrates the advantages of building a proper SSI system that satisfies the Sybil-resistance requirement. Because the protocols assume AESPs, they may eliminate the online distributed committee assumed in other research, such as CanDID; thus, $\Pi^{\mathcal{G}_\mathtt{att}}$ does not need to rely on multi-party computation (MPC), bringing drastic flexibility and efficiency compared with existing SSI systems.
Yuhao LIU Zhenzhong CHU Lifei WEI
In the realm of Single Image Super-Resolution (SISR), the meticulously crafted Nonlocal Sparse Attention-based block demonstrates its efficacy in noise reduction and computational cost reduction for nonlocal (global) features. However, it neglects the traditional Convolutional-based block, which is proficient in handling local features. Thus, merging both the Nonlocal Sparse Attention-based block and the Convolutional-based block to concurrently manage local and nonlocal features poses a significant challenge. To tackle these issues, this paper introduces the Channel Contrastive Attention-based Local-Nonlocal Mutual block (CCLN) for Super-Resolution (SR). (1) We introduce the CCLN block, encompassing the Local Sparse Convolutional-based block for local features and the Nonlocal Sparse Attention-based network block for nonlocal features. (2) We introduce Channel Contrastive Attention (CCA) blocks, incorporating Sparse Aggregation into Convolutional-based blocks. Additionally, we introduce a robust framework to fuse these two blocks, ensuring that each branch operates according to its respective strengths. (3) The CCLN block can seamlessly integrate into established network backbones like the Enhanced Deep Super-Resolution network (EDSR), resulting in the Channel Attention based Local-Nonlocal Mutual Network (CCLNN). Experimental results show that our CCLNN effectively leverages both local and nonlocal features, outperforming other state-of-the-art algorithms.
Power line communication (PLC) provides a flexible-access, wide-distribution, and low-cost communication solution for distribution network services. However, PLC self-organizing networking in distribution networks faces several challenges, such as guaranteeing diversified data transmission requirements, the contradiction between long-term constraints and short-term optimization, and the uncertainty of global information. To address these challenges, we propose a backpressure learning-based data transmission reliability-aware self-organizing networking algorithm to minimize the weighted sum of node data backlogs under the long-term transmission reliability constraint. Specifically, the minimization problem is transformed by Lyapunov optimization and the backpressure algorithm. Finally, we propose a backpressure and data transmission reliability-aware state-action-reward-state-action (SARSA)-based self-organizing networking strategy to realize the PLC networking optimization. Simulation results demonstrate that the proposed algorithm achieves superior performance in terms of data backlog and transmission reliability.
Hiroya HACHIYAMA Takamichi NAKAMOTO
Devices presenting audiovisual information are widespread, but few present olfactory information. We have developed a device called an olfactory display that presents odors to users by mixing multiple fragrances. Previously developed olfactory displays had the problem that the ejection volume of liquid perfume droplets was large and the dynamic range of the blending ratio was small. In this study, we used an inkjet device that ejects small droplets in order to expand the dynamic range of blending ratios and present a variety of scents. By finely controlling the back pressure using an electro-osmotic pump (EO pump) and adjusting the timing of the EO pump and the inkjet device, we succeeded in stabilizing the ejection of the inkjet device and achieved a large dynamic range.
Shinsuke IBI Takumi TAKAHASHI Hisato IWAI
This paper proposes a novel differential active self-interference canceller (DASIC) algorithm for asynchronous in-band full-duplex (IBFD) Gaussian filtered frequency shift keying (GFSK), which is designed for the wireless Internet of Things (IoT). In IBFD communications, where two terminals simultaneously transmit and receive signals in the same frequency band, there is extremely strong self-interference (SI). The SI can be mitigated by an active SI canceller (ASIC), which subtracts an interference replica based on channel state information (CSI) from the received signal. The challenging problem is the realization of asynchronous IBFD for wireless IoT in indoor environments. In the asynchronous mode, pilot contamination is induced by the non-orthogonality between asynchronous pilot sequences. In addition, the transceiver suffers from analog front-end (AFE) impairments, such as phase noise. Due to these impairments, the SI cannot be canceled entirely at the receiver, resulting in residual interference. To address this issue, the DASIC incorporates the principle of the differential codec, whose differential structure makes it possible to suppress the SI without estimating its CSI. Also, on the premise of using an error correction technique, iterative detection and decoding (IDD) is applied to improve the detection capability by exchanging the extrinsic log-likelihood ratio (LLR) between the maximum a-posteriori probability (MAP) detector and the channel decoder. Finally, the validity of the DASIC algorithm is evaluated by computer simulations in terms of the packet error rate (PER). The results clearly demonstrate the possibility of realizing asynchronous IBFD.
Hua HUANG Yiwen SHAN Chuan LI Zhi WANG
Image denoising is an indispensable step in many high-level tasks in image processing and computer vision. However, traditional low-rank minimization-based methods suffer from a bias problem, since only the noisy observation is used to estimate the underlying clean matrix. To overcome this issue, a new low-rank minimization-based method, called nuclear norm minus Frobenius norm rank residual minimization (NFRRM), is proposed for image denoising. The proposed method transforms the ill-posed image denoising problem into rank residual minimization problems by exploiting the nonlocal self-similarity prior. The proposed NFRRM model can accurately estimate the underlying clean matrix by treating each rank residual component flexibly. More importantly, the global optimum of the proposed NFRRM model can be obtained in closed form. Extensive experiments demonstrate that the proposed NFRRM method outperforms many state-of-the-art image denoising methods.
Data sparsity has always been a problem in document classification, for which semi-supervised learning and few-shot learning are studied. An even more extreme scenario is to classify documents without any annotated data, using only category names. In this paper, we introduce a nearest neighbor search-based method, Con2Class, to tackle this tough task. We intend to produce embeddings for predefined categories and predict category embeddings for all the unlabeled documents in a unified embedding space, such that categories can be easily assigned by searching for the nearest predefined category in the embedding space. To achieve this, we propose confidence-driven contrastive learning, in which prompt-based templates are designed and an MLM-maintained contrastive loss is newly proposed to finetune a pretrained language model for embedding production. To deal with the issue that no annotated data is available to validate the classification model, we introduce a confidence factor to estimate the classification ability by evaluating the prediction confidence. The language model having the highest confidence factor is used to produce embeddings for similarity evaluation. Pseudo labels are then assigned by searching for the semantically closest category name, and are further used to train a separate classifier following a progressive self-training strategy for final prediction. Our experiments on five representative datasets demonstrate the superiority of our proposed method over the existing approaches.
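The nearest-category assignment step described in this abstract can be illustrated with a minimal sketch. It assumes that document and category embeddings already exist as plain vectors and that cosine similarity is the closeness measure; the abstract does not state the metric, and the function names and toy vectors below are hypothetical, not taken from Con2Class.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def assign_categories(doc_embeddings, category_embeddings):
    """Give each document the name of its most similar category embedding."""
    labels = []
    for doc in doc_embeddings:
        best = max(
            category_embeddings,
            key=lambda name: cosine(doc, category_embeddings[name]),
        )
        labels.append(best)
    return labels


# Toy 2-D embeddings (hypothetical values, for illustration only).
categories = {"sports": [1.0, 0.0], "politics": [0.0, 1.0]}
docs = [[0.9, 0.1], [0.2, 0.8]]
pseudo_labels = assign_categories(docs, categories)
```

In the setting the abstract describes, labels obtained this way would serve only as pseudo labels for training a separate classifier, not as final predictions.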
The prediction of peak power load is a critical factor directly impacting the stability of power supply, characterized significantly by its time series nature and intricate ties to the seasonal patterns in electricity usage. Despite its crucial importance, the current landscape of power peak load forecasting remains a multifaceted challenge in the field. This study aims to contribute to this domain by proposing a method that leverages a combination of three primary models - the GRU model, self-attention mechanism, and Transformer mechanism - to forecast peak power load. To contextualize this research within the ongoing discourse, it’s essential to consider the evolving methodologies and advancements in power peak load forecasting. By delving into additional references addressing the complexities and current state of the power peak load forecasting problem, this study aims to build upon the existing knowledge base and offer insights into contemporary challenges and strategies adopted within the field. Data preprocessing in this study involves comprehensive cleaning, standardization, and the design of relevant functions to ensure robustness in the predictive modeling process. Additionally, recognizing the necessity to capture temporal changes effectively, this research incorporates features such as “Weekly Moving Average” and “Monthly Moving Average” into the dataset. To evaluate the proposed methodologies comprehensively, this study conducts comparative analyses with established models such as LSTM, Self-attention network, Transformer, ARIMA, and SVR. The outcomes reveal that the models proposed in this study exhibit superior predictive performance compared to these established models, showcasing their effectiveness in accurately forecasting electricity consumption. The significance of this research lies in two primary contributions. 
Firstly, it introduces an innovative prediction method combining the GRU model, self-attention mechanism, and Transformer mechanism, aligning with the contemporary evolution of predictive modeling techniques in the field. Secondly, it introduces and emphasizes the utility of “Weekly Moving Average” and “Monthly Moving Average” methodologies, crucial in effectively capturing and interpreting seasonal variations within the dataset. By incorporating these features, this study enhances the model’s ability to account for seasonal influencing factors, thereby significantly improving the accuracy of peak power load forecasting. This contribution aligns with the ongoing efforts to refine forecasting methodologies and addresses the pertinent challenges within power peak load forecasting.
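As an illustration of the “Weekly Moving Average” and “Monthly Moving Average” features, the sketch below computes trailing averages over a daily load series. The window lengths (7 and 30 days), the trailing rather than centered window, and the handling of the first few days are assumptions; the abstract does not specify them, and the load values are made up.

```python
def trailing_moving_average(values, window):
    """Mean of the current value and up to window - 1 preceding values."""
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that has just left the window.
            running -= values[i - window]
        count = min(i + 1, window)
        averages.append(running / count)
    return averages


# Hypothetical daily peak loads (arbitrary units).
daily_peak_load = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]
weekly_ma = trailing_moving_average(daily_peak_load, window=7)
monthly_ma = trailing_moving_average(daily_peak_load, window=30)
```

Each averaged series has the same length as the input, so it can be attached to the dataset as an extra feature column alongside the raw load.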
Xueying WANG Yuan HUANG Xin LONG Ziji MA
In recent years, the increasing complexity of deep network structures has hindered their application on small, resource-constrained hardware. Therefore, deep network models urgently need to be compressed and accelerated. Channel pruning is an effective method to compress deep neural networks. However, most existing channel pruning methods are prone to falling into local optima. In this paper, we propose a channel pruning method based on an Improved Grey Wolf Optimizer, called IGWO-Pruner, to prune redundant channels of convolutional neural networks. It identifies the pruning ratio of each layer using the Improved Grey Wolf algorithm and then fine-tunes the pruned network model. In the experimental section, we evaluate the proposed method on the CIFAR datasets and ILSVRC-2012 with several classical networks, including VGGNet, GoogLeNet and ResNet-18/34/56/152. The experimental results demonstrate that the proposed method is able to prune a large number of redundant channels and parameters with little performance loss.
Wanying MAN Guiqin YANG Shurui FENG
Software Defined Networking (SDN), a new network architecture, allows for centralized network management by separating the control plane from the forwarding plane. Because forwarding and control are separated, distributed denial of service (DDoS) attacks pose a greater threat to SDN networks. To address the problem, this paper uses a joint high-precision attack detection scheme combining a self-attention mechanism and a support vector machine: a trigger mechanism deployed at both the control and data layers is proposed to trigger the initial detection of DDoS attacks; the data in the network under attack is screened in detail using a combination of the self-attention mechanism and the support vector machine; and, based on the accurate classification results, the control plane initiates attack defense by using OpenFlow protocol features to issue flow tables. The experimental results show that the trigger mechanism can react to the attack in time with less than 20% load, and that the accurate detection mechanism outperforms existing detection methods, with a precision rate of 98.95% and a false alarm rate of only 1.04%. At the same time, the defense strategy can achieve timely recovery of network characteristics.
Pingping WANG Xinyi ZHANG Yuyan ZHAO Yueti LI Kaisheng XU Shuaiyin ZHAO
Leukemia is a common and highly dangerous blood disease that requires early detection and treatment. Currently, the diagnosis of leukemia types mainly relies on the pathologist’s morphological examination of blood cell images, which is a tedious and time-consuming process, and the diagnosis results are highly subjective and prone to misdiagnosis and missed diagnosis. To address these problems, this paper proposes a blood cell image recognition technique based on an enhanced Vision Transformer. First, this paper incorporates convolutions with token embedding to replace the positional encoding, which represents coarse spatial information. Then, based on the Transformer’s self-attention mechanism, this paper proposes a sparse attention module that can select discriminative regions in the image, further enhancing the model’s fine-grained feature expression capability. Finally, this paper uses a contrastive loss function to further increase the intra-class consistency and inter-class difference of classification features. According to experimental results, the model achieves an identification accuracy of 92.49% on the Munich single-cell morphological dataset, an improvement of 1.41% over the baseline. Compared with the state-of-the-art Swin Transformer, this method still achieves better performance. Our method therefore has the potential to provide a reference for clinical diagnosis by physicians.
Li HE Xiaowu ZHANG Jianyong DUAN Hao WANG Xin LI Liang ZHAO
Chinese spelling correction (CSC) models detect and correct a text typo based on the misspelled character and its context. Recently, BERT-based models have dominated the research on Chinese spelling correction. However, these methods only focus on the semantic information of the text during the pretraining stage, neglecting the learning of correcting spelling errors. Moreover, when multiple incorrect characters are in the text, the context introduces noisy information, making it difficult for the model to accurately detect the positions of the incorrect characters and leading to false corrections. To address these limitations, we apply the multimodal pre-trained language model ChineseBERT to the task of spelling correction. We propose a self-distillation learning-based pretraining strategy, where a confusion set is used to construct text containing erroneous characters, allowing the model to jointly learn how to understand language and correct spelling errors. Additionally, we introduce a single-channel masking mechanism to mitigate the noise caused by the incorrect characters. This mechanism masks the semantic encoding channel while preserving the phonetic and glyph encoding channels, reducing the noise introduced by incorrect characters during the prediction process. Finally, experiments are conducted on widely used benchmarks. Our model outperforms state-of-the-art methods by a remarkable margin.
Peng GAO Xin-Yue ZHANG Xiao-Li YANG Jian-Cheng NI Fei WANG
Although Siamese trackers have attracted much attention in recent years due to their scalability and efficiency, researchers have ignored the background appearance, which makes these trackers ill-suited to recognizing arbitrary target objects with various variations, especially in complex scenarios with background clutter and distractors. In this paper, we present a simple yet effective Siamese tracker, in which shifted-window multi-head self-attention is used to learn the characteristics of a specific given target object for visual tracking. To validate the effectiveness of our proposed tracker, we use the Swin Transformer as the backbone network and introduce an auxiliary feature enhancement network. Extensive experimental results on two evaluation datasets demonstrate that the proposed tracker outperforms other baselines.
Images captured in low-light environments have low visibility and high noise, which seriously affects subsequent visual tasks such as target detection and face recognition. Therefore, low-light image enhancement is of great significance in obtaining high-quality images and is a challenging problem in computer vision. A low-light enhancement model, LLFormer, based on the Vision Transformer, uses axis-based multi-head self-attention and a cross-layer attention fusion mechanism to reduce the complexity and achieve feature extraction. This algorithm can enhance images well. However, the calculation of the attention mechanism is complex and the number of parameters is large, which limits the application of the model in practice. In response to this problem, a lightweight module, PoolFormer, is used to replace the attention module with spatial pooling, which can increase the parallelism of the network and greatly reduce the number of model parameters. To suppress image noise and improve visual effects, a new loss function is constructed for model optimization. The experimental results show that the proposed method not only reduces the number of parameters by 49%, but also performs better in terms of image detail restoration and noise suppression compared with the baseline model. On the LOL dataset, the PSNR and SSIM were 24.098 dB and 0.8575, respectively. On the MIT-Adobe FiveK dataset, the PSNR and SSIM were 27.060 dB and 0.9490, respectively. The evaluation results on the two datasets are better than those of the current mainstream low-light enhancement algorithms.
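To make the idea of replacing attention with spatial pooling concrete, the following sketch applies a pooling-based token mixer to a single-channel feature map. It follows the published PoolFormer design as an assumption: a stride-1 average pool whose border windows count only in-bounds pixels, with the input subtracted afterwards. Whether the paper above keeps these details is not stated in the abstract, and the function name is hypothetical.

```python
def pool_token_mixer(feature_map, pool_size=3):
    """Stride-1 average pooling minus the input, on one 2-D channel.

    Border windows average only the pixels that fall inside the map.
    The mixer has no learnable parameters, unlike self-attention.
    """
    height, width = len(feature_map), len(feature_map[0])
    radius = pool_size // 2
    mixed = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            total, count = 0.0, 0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < height and 0 <= x < width:
                        total += feature_map[y][x]
                        count += 1
            mixed[i][j] = total / count - feature_map[i][j]
    return mixed


# A single bright pixel is spread to its neighbours by the pooling step.
spike = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
mixed_spike = pool_token_mixer(spike)
```

Because the mixer only averages neighbouring values, its cost grows linearly with the number of pixels and it adds no weights, which is consistent with the parameter reduction the abstract reports.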
Lei ZHOU Ryohei SASANO Koichi TAKEDA
In practice, even a well-trained neural machine translation (NMT) model can still make biased inferences on the training set due to distribution shifts. In the human learning process, if we cannot reproduce something correctly after learning it multiple times, we consider it to be more difficult. Likewise, a training example causing a large discrepancy between inference and reference implies higher learning difficulty for the MT model. Therefore, we propose to adopt the inference discrepancy of each training example as the difficulty criterion and to rank training examples from easy to hard accordingly. In this way, a trained model can guide the curriculum learning process of an initial model identical to itself. We describe this training scheme by analogy as guiding the learning process of a curriculum NMT model by a pretrained vanilla model. In this paper, we assess the effectiveness of the proposed training scheme and gain insight into the influence of translation direction, evaluation metrics and different curriculum schedules. Experimental results on the translation benchmarks WMT14 English ⇒ German, WMT17 Chinese ⇒ English and Multitarget TED Talks Task (MTTT) English ⇔ German, English ⇔ Chinese, English ⇔ Russian demonstrate that our proposed method consistently improves the translation performance over the advanced Transformer baseline.
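The ranking step can be sketched as follows: a pretrained model translates each training source, a discrepancy score compares the output with the reference, and examples are sorted from low to high discrepancy. The Jaccard-based score and the dictionary standing in for the pretrained model are hypothetical stand-ins for illustration; the abstract does not say which metric the authors use.

```python
def token_discrepancy(hypothesis, reference):
    """Toy discrepancy: 1 - Jaccard overlap of whitespace token sets."""
    hyp, ref = set(hypothesis.split()), set(reference.split())
    if not hyp and not ref:
        return 0.0
    return 1.0 - len(hyp & ref) / len(hyp | ref)


def rank_easy_to_hard(examples, translate, discrepancy):
    """Sort (source, reference) pairs by the model's inference discrepancy."""
    scored = [
        (discrepancy(translate(source), reference), index)
        for index, (source, reference) in enumerate(examples)
    ]
    scored.sort()  # ascending discrepancy; ties keep the original order
    return [examples[index] for _, index in scored]


# A lookup table stands in for the pretrained vanilla model (hypothetical).
toy_outputs = {"a": "x y w", "b": "p q", "c": "r s"}
examples = [("a", "x y z"), ("b", "p q"), ("c", "m n")]
curriculum = rank_easy_to_hard(examples, toy_outputs.get, token_discrepancy)
```

A curriculum schedule would then feed `curriculum` to the new model in order, starting from the examples the pretrained model already reproduces well.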
Guojin LIAO Yongpeng ZUO Qiao LIAO Xiaofeng TIAN
Frame synchronization detection before data transmission is an important module that directly affects the lifetime and coexistence of underwater acoustic communication (UAC) networks, where linear frequency modulation (LFM) is a frame preamble signal commonly used for synchronization. Unlike terrestrial wireless communications, strong bursty noise frequently appears in UAC. Due to the long transmission distance and the low signal-to-noise ratio, strong short-distance bursty noise will greatly reduce the accuracy of conventional fractional Fourier transform (FrFT) detection. We propose a multi-segment verification fractional Fourier transform (MFrFT) preamble detection algorithm to address this challenge. In the proposed algorithm, four adjacent FrFT operations are carried out, and the LFM signal is identified by observing the linear correlation between the two lines that connect successive pairs among three adjacent peak points, called the ‘dual-line-correlation mechanism’. The accurate starting time of the LFM signal can be found according to the peak frequency of the adjacent FrFT. More importantly, MFrFT does not increase the computational complexity. Compared with the conventional FrFT detection method, experimental results show that the proposed algorithm can effectively distinguish between signal starting points and bursty noise with a much lower error detection rate, which in turn minimizes the cost of retransmission.
Rongcheng DONG Taisuke IZUMI Naoki KITAMURA Yuichi SUDO Toshimitsu MASUZAWA
The maximal independent set (MIS) problem is one of the most fundamental problems in the field of distributed computing. This paper focuses on the MIS problem with unreliable communication between processes in the system. We propose a relaxed notion of MIS, named almost MIS (ALMIS), and show that the loosely-stabilizing algorithm proposed in our previous work can achieve exponentially long holding time with logarithmic convergence time and space complexity regarding ALMIS, which cannot be achieved at the same time regarding MIS in our previous work.
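For readers unfamiliar with the base problem, the sketch below checks whether a node set is a maximal independent set of an undirected graph: no two chosen nodes are adjacent, and every unchosen node has a chosen neighbour. It covers only the standard MIS definition. The relaxed ALMIS notion is not defined in the abstract and is not modelled here, and the adjacency-list format is an assumption.

```python
def is_independent(graph, nodes):
    """True if no two nodes in the set are adjacent."""
    return all(neighbor not in nodes for v in nodes for neighbor in graph[v])


def is_maximal_independent_set(graph, nodes):
    """True if the set is independent and no further node can be added."""
    nodes = set(nodes)
    if not is_independent(graph, nodes):
        return False
    # Maximality: every node outside the set has a neighbour inside it.
    return all(
        any(neighbor in nodes for neighbor in graph[v])
        for v in graph
        if v not in nodes
    )


# Path graph 0 - 1 - 2 - 3 as an adjacency list.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

On this path, `{0, 2}` and `{1, 3}` are maximal independent sets, while `{0}` is independent but not maximal because node 2 or node 3 could still be added.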
Zahra AZIZAH Tomoya OHYAMA Xiumin ZHAO Yuichi OHKAWA Takashi MITSUISHI
Learning analytics (LA) has emerged as a technique for educational quality improvement in many learning contexts, including blended learning (BL) courses. Numerous studies show that students' academic performance is significantly impacted by their ability to engage in self-regulated learning (SRL). In this study, learning behaviors indicating SRL and motivation are elucidated during a BL course on second language learning. Online trace data from a mobile language learning application (m-learning app) is used as part of the BL implementation. The observed motivations fell into two categories: high-level motivation (study in time, study again, and early learning) and low-level motivation (cramming and catch up). The results show that students who perform well tend to engage in high-level motivation, while low-performing students tend to engage in low-level motivation. These findings are supported by regression models showing that study in time, followed by early learning, significantly influences academic performance in BL courses, in both the spring and fall semesters. Using the limited resource of m-learning app log data, this BL study could explain the overall BL performance.
High-performance deep learning-based object detection models can reduce traffic accidents using dashcam images during nighttime driving. Deep learning requires a large-scale dataset to obtain a high-performance model. However, existing object detection datasets consist mostly of daytime scenes, with only a few nighttime scenes. Enlarging the nighttime dataset is laborious and time-consuming. In such a case, it is possible to convert daytime images to nighttime images with an image-to-image translation model, augmenting the nighttime dataset with less effort so that the translated dataset can reuse the annotations of the daytime dataset. Therefore, in this study, a GAN-based image-to-image translation model is proposed that incorporates self-attention with cycle consistency and content/style separation for nighttime data augmentation and shows high fidelity to the annotations of the daytime dataset. Experimental results highlight the effectiveness of the proposed model compared with other models in terms of translated images and FID scores. Moreover, the high fidelity of translated images to the annotations is verified by a small object detection model according to detection results and mAP. Ablation studies confirm the effectiveness of self-attention in the proposed model. As a contribution to GAN-based data augmentation, the source code of the proposed image translation model is publicly available at https://github.com/subecky/Image-Translation-With-Self-Attention