Lihan TONG Weijia LI Qingxia YANG Liyuan CHEN Peng CHEN
We present Ksformer, utilizing Multi-scale Key-select Routing Attention (MKRA) for intelligent selection of key areas through multi-channel, multi-scale windows with a top-k operator, and Lightweight Frequency Processing Module (LFPM) to enhance high-frequency features, outperforming other dehazing methods in tests.
Multi-focus image fusion combines partially focused images of the same scene to create an all-in-focus image. Existing multi-focus image fusion algorithms face two problems: reference images are difficult to obtain, and convolutional neural networks focus too heavily on local regions. To address these problems, we propose a fusion algorithm that combines local and global feature encoding. First, we devise two self-supervised image reconstruction tasks and train an encoder-decoder network through multi-task learning. Second, within the encoder, we merge the dense connection module with the PS-ViT module, enabling the network to utilize both local and global information during feature extraction. Finally, to enhance the overall efficiency of the model, distinct loss functions are applied to each task. To preserve the more robust features of the original images, spatial frequency is employed during the fusion stage to obtain the feature map of the fused image. Experimental results demonstrate that, in comparison with twelve other prominent algorithms, our method exhibits good fusion performance in objective evaluation: ten of the twelve selected evaluation metrics improve by more than 0.28%. It also produces subjectively superior visual results.
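The spatial-frequency criterion mentioned above can be illustrated with a minimal sketch. It uses the commonly cited definition SF = sqrt(RF^2 + CF^2), where RF and CF are the root-mean-square horizontal and vertical differences, and it operates on small gray-level blocks rather than on encoder feature maps. The function names and the "keep the block with the higher SF" rule are assumptions for illustration, not the authors' implementation.

```python
import math

def spatial_frequency(block):
    """Spatial frequency of a 2D list of gray values: sqrt(RF^2 + CF^2)."""
    rows, cols = len(block), len(block[0])
    n = rows * cols
    # Row frequency term: squared differences between horizontal neighbors.
    rf2 = sum((block[i][j] - block[i][j - 1]) ** 2
              for i in range(rows) for j in range(1, cols)) / n
    # Column frequency term: squared differences between vertical neighbors.
    cf2 = sum((block[i][j] - block[i - 1][j]) ** 2
              for i in range(1, rows) for j in range(cols)) / n
    return math.sqrt(rf2 + cf2)

def fuse_by_spatial_frequency(block_a, block_b):
    """Hypothetical fusion rule: keep the block with the higher SF (ties go to block_a)."""
    if spatial_frequency(block_a) >= spatial_frequency(block_b):
        return block_a
    return block_b

# A block with strong local contrast (in focus) versus a flat block (defocused).
sharp = [[0, 10], [10, 0]]
flat = [[5, 5], [5, 5]]
fused = fuse_by_spatial_frequency(flat, sharp)
```

Here `fused` is the `sharp` block, since a flat block has a spatial frequency of zero.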
This article focuses on improving the bilateral-branch image segmentation network BiSeNet v2, enhancing its ability to learn spatial details and its overall segmentation accuracy. A modified network called “BiConvNet” is proposed. First, to extract shallow spatial details more effectively, a parallel concatenated strip and dilated (PCSD) convolution module is proposed and used to extract local features and surrounding contextual features in the detail branch. Next, the semantic branch is reconstructed using the lightweight nature of depthwise separable convolution and the high performance of ConvNet, to enable more efficient learning of deep, high-level semantic features. Finally, the bilateral guidance aggregation layer of BiSeNet v2 is fine-tuned, enabling better fusion of the feature maps output by the detail branch and the semantic branch. The experiments examine the contribution of strip convolution and of dilated convolutions of different sizes to segmentation accuracy, and compare them with common convolutions such as Conv2d convolution, CG convolution and CCA convolution. The experiments show that the proposed PCSD convolution module achieves the highest segmentation accuracy in all categories of the Cityscapes dataset compared with these common convolutions. BiConvNet achieved a 9.39% accuracy improvement over the BiSeNet v2 network, with only a slight increase of 1.18M in model parameters, reaching an mIoU of 68.75% on the validation set. Furthermore, in comparative experiments with image segmentation algorithms commonly used for autonomous driving in recent years, BiConvNet demonstrates strong competitive advantages in segmentation accuracy on the Cityscapes and BDD100K datasets.
Seiya KISHIMOTO Ryoya OGINO Kenta ARASE Shinichiro OHNUKI
This paper introduces a computational approach for transient analysis of extensive scattering problems. This novel method is based on the combination of physical optics (PO) and the fast inverse Laplace transform (FILT). PO is a technique for analyzing electromagnetic scattering from large-scale objects. We modify PO for application in the complex frequency domain, where the scattered fields are evaluated. The complex frequency function is efficiently transformed into the time domain using FILT. The effectiveness of this combination is demonstrated through large-scale analysis and transient response for a short pulse incidence. The accuracy is investigated and validated by comparison with reference solutions.
Haonan CHEN Akito IGUCHI Yasuhide TSUJI
To analyze photonic devices whose waveguide structure varies slowly along the propagation direction, we develop a finite element beam propagation method (FE-BPM) with coordinate transformation. In this approach, a longitudinally varying waveguide is converted into an equivalent straight waveguide, so that cumbersome processes in FE-BPM, such as mesh updating and field interpolation at each propagation step, can be avoided. We employ this simulation technique in the shape optimization of photonic devices and show design examples of mode converters. To show the validity of this approach, the calculated results for the designed devices are compared with those of the finite element method (FEM) or the standard FE-BPM.
Hyunuk AHN Akito IGUCHI Keita MORIMOTO Yasuhide TSUJI
We develop a new 3D full-vectorial finite element bidirectional beam propagation method (3DFV-BiBPM) to handle nonradiative dielectric waveguide (NRD guide) components whose waveguide profile varies in the direction perpendicular to the parallel metal plates. The BiBPM is a transfer-matrix-based method in which only transverse cross sections have to be discretized using the finite difference or the finite element scheme, and, unlike the standard BPM, it can treat backward and multiple reflections. An NRD guide with an air gap and a filter with a sapphire resonator are numerically analyzed, taking dielectric losses into account, to investigate the validity of our approach.
In satellite positioning, both the reception of ranging signals and the acquisition of navigation messages are necessary. In general, the acquisition of navigation messages does not always require the reception of radio waves; however, when radio waves are used for acquisition, a period of continuous reception significantly longer than one second is required. The European satellite positioning system, Galileo, began broadcasting new navigation messages in August 2022. The improvement is based on a secondary synchronization pattern, secondary forward error correction, and reduced ephemeris to aid rapid recovery from interruptions in message acquisition caused by temporary deterioration in radio reception. This paper evaluates the recovery characteristics from interruptions in navigation message acquisition during mobile reception of this improved I/NAV navigation message.
Xiangrun LI Qiyu SHENG Guangda ZHOU Jialong WEI Yanmin SHI Zhen ZHAO Yongwei LI Xingfeng LI Yang LIU
Automated tongue segmentation plays a crucial role in computer-aided tongue diagnosis. The challenge lies in developing algorithms that achieve higher segmentation accuracy while requiring less memory and offering fast inference. To address this challenge, we propose a novel Pool-unet that integrates Pool-former and Multi-task mask learning for tongue image segmentation. First, we collected 756 tongue images taken in various shooting environments and from different angles, and accurately labeled the tongue under the guidance of a medical professional. Second, we propose the Pool-unet model, which combines a hierarchical Pool-former module with a U-shaped symmetric encoder-decoder with skip connections. The model uses a patch expanding layer for up-sampling and a patch embedding layer for down-sampling to maintain spatial resolution, and it captures global and local information effectively with fewer parameters and faster inference. Finally, a Multi-task mask learning strategy is designed, which improves the generalization and anti-interference ability of the model through Multi-task pre-training and self-supervised fine-tuning stages. Experimental results on the tongue dataset show that, compared with the state-of-the-art method (OET-NET), our method has 25% fewer model parameters, achieves 22% faster inference, and improves Mean Intersection Over Union (MIOU) and Mean Pixel Accuracy (MPA) by 0.91% and 0.55%, respectively.
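For readers unfamiliar with the two reported metrics, the following minimal sketch computes MIOU and MPA from a confusion matrix using their usual definitions (per-class IoU = TP / (TP + FP + FN) and per-class pixel accuracy = TP / (TP + FN), each averaged over classes). The function name and the example counts are illustrative assumptions and are not taken from the paper.

```python
def miou_and_mpa(confusion):
    """Return (MIOU, MPA) for a square confusion matrix confusion[true][pred]."""
    n = len(confusion)
    ious, accs = [], []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # class-c pixels that were missed
        fp = sum(confusion[r][c] for r in range(n)) - tp  # pixels wrongly labeled as c
        if tp + fp + fn == 0:
            continue  # class is absent from both labels and predictions; skip it
        ious.append(tp / (tp + fp + fn))
        if tp + fn > 0:
            accs.append(tp / (tp + fn))
    return sum(ious) / len(ious), sum(accs) / len(accs)

# Made-up pixel counts for a two-class (background, tongue) segmentation.
example = [[50, 10],
           [5, 35]]
miou, mpa = miou_and_mpa(example)
```

With these made-up counts the per-class IoU values are 50/65 and 35/50, and the per-class accuracies are 50/60 and 35/40.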
Pingping JI Lingge JIANG Chen HE Di HE Zhuxian LIAN
High altitude platform (HAP) communications, which are dominated by line-of-sight links, effectively enhance the spectral efficiency of wireless networks. However, the line-of-sight links, particularly in urban areas, may be severely degraded by the complex communication environment. A reconfigurable intelligent surface (RIS) is employed to establish a cascaded link and improve the quality of communication service by smartly reflecting the signals received from the HAP to users without a direct link. Motivated by this, a joint precoding scheme for a novel RIS-aided beamspace HAP with non-orthogonal multiple access (HAP-NOMA) system is investigated to maximize the minimum user signal-to-leakage-plus-noise ratio (SLNR) with user fairness taken into account. Specifically, the SLNR is used as the metric for designing the joint precoding algorithm with lower complexity, because decoupling the precoding design from the power allocation allows the two parts to be obtained iteratively. To deal with the formulated non-convex problem, we first derive a statistical upper bound on the SLNR based on random matrix theory for large-scale antenna arrays. Then, closed-form expressions of the power matrix and the passive precoding matrix are given by introducing auxiliary variables based on the derived upper bound on the SLNR. The proposed joint precoding depends only on statistical channel state information (SCSI) instead of instantaneous channel state information (ICSI). NOMA serves multiple users in the same group simultaneously to compensate for the loss of spectral efficiency resulting from the beamspace HAP. Numerical results show the effectiveness of the derived statistical upper bound on the SLNR and the performance enhancement of the proposed joint precoding algorithm.
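As a reminder of the quantity being optimized, the textbook SLNR for a single-antenna user k and the corresponding max-min objective can be written as below. This is the generic definition, not the paper's formulation: the symbols (effective channel h_k, precoder w_k, power p_k, noise power sigma^2) are assumed here, and in the RIS-aided beamspace system h_k would be the cascaded HAP-RIS-user channel with NOMA grouping applied.

```latex
\mathrm{SLNR}_k
  = \frac{p_k \,\lvert \mathbf{h}_k^{H}\mathbf{w}_k \rvert^{2}}
         {\sum_{j \neq k} p_k \,\lvert \mathbf{h}_j^{H}\mathbf{w}_k \rvert^{2} + \sigma^{2}},
\qquad
\max_{\{\mathbf{w}_k\},\,\{p_k\}} \; \min_{k} \; \mathrm{SLNR}_k .
```

The numerator is the power that user k's own stream delivers to user k, and the sum in the denominator is the power that the same stream leaks to the other users, so each SLNR depends on only one precoding vector.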
Baku TAKAHARA Tomohiko MITANI Naoki SHINOHARA
We propose microwave heating via electromagnetic coupling using zeroth-order resonators (ZORs) to extend the uniform heating area. ZORs can generate resonant modes with a wavenumber of 0, which corresponds to an infinite guide wavelength. Under this condition, uniform heating is expected because the resulting standing waves would not have nodes or antinodes. In the design proposed in this paper, two ZORs fabricated on dielectric substrates are arranged to face each other for electromagnetic coupling, and a sample placed between the resonators is heated. A single ZOR was investigated using a 3D electromagnetic simulator, and the resonant frequency and electric field distribution of the simulated ZOR were confirmed to be in good agreement with those of the fabricated ZOR. Simulations of two ZORs facing each other were then conducted to evaluate the performance of the proposed system as a heating apparatus. It was found that a resonator spacing of 25 mm was suitable for uniform heating. Heating simulations of SiC and Al2O3 sheets were performed with the obtained structure. The heating uniformity was evaluated by the width L50% over which the power loss distribution exceeds half the maximum value. This evaluation index was equal to 0.397λ0 for SiC and 0.409λ0 for Al2O3, both of which exceed λ0/4, the distance between a neighboring node and antinode of a standing wave, where λ0 is the free-space wavelength. Therefore, the proposed heating apparatus is effective for uniform microwave heating. Because of the different electrical parameters of the heated materials, SiC can be easily heated, whereas Al2O3 heats little. Finally, heating experiments were performed on each of these materials. Good uniformity in temperature was obtained for both SiC and Al2O3 sheets.
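The uniformity index L50% described above can be evaluated from a sampled power loss profile. The sketch below is a minimal version under stated assumptions: it takes the profile as two lists of positions and power loss values, assumes a single contiguous region around the peak, and finds the half-maximum crossings by linear interpolation. The function name and the sample profile are hypothetical.

```python
def half_max_width(xs, ps):
    """Width of the contiguous region around the peak where ps >= 0.5 * max(ps)."""
    half = 0.5 * max(ps)
    peak = ps.index(max(ps))

    # Walk left from the peak while the samples stay at or above half maximum.
    i = peak
    while i > 0 and ps[i - 1] >= half:
        i -= 1
    if i == 0:
        left = xs[0]  # the profile never drops below half maximum on this side
    else:
        t = (half - ps[i - 1]) / (ps[i] - ps[i - 1])
        left = xs[i - 1] + t * (xs[i] - xs[i - 1])

    # Walk right from the peak in the same way.
    j = peak
    while j < len(ps) - 1 and ps[j + 1] >= half:
        j += 1
    if j == len(ps) - 1:
        right = xs[-1]
    else:
        t = (ps[j] - half) / (ps[j] - ps[j + 1])
        right = xs[j] + t * (xs[j + 1] - xs[j])

    return right - left

# Triangular test profile (arbitrary units): the half-maximum width is 2.
width = half_max_width([0, 1, 2, 3, 4], [0, 1, 2, 1, 0])

# The reported indices, in units of the free-space wavelength, against lambda0 / 4.
exceeds_quarter_wave = all(v > 0.25 for v in (0.397, 0.409))
```

Dividing the returned width by the free-space wavelength gives the index in the same normalized form as the values quoted in the abstract.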
Yoshinori ITOTAGAWA Koma ATSUMI Hikaru SEBE Daisuke KANEMOTO Tetsuya HIROSE
This paper describes a programmable differential bandgap reference (PD-BGR) for ultra-low-power IoT (Internet-of-Things) edge node devices. The PD-BGR consists of a current generator (CG) and a differential voltage generator (DVG). The CG is based on a bandgap reference (BGR) and generates an operating current and a voltage, while the DVG generates another voltage from the current. A differential voltage reference is obtained by taking the difference between these two voltages. The PD-BGR can produce a programmable differential output voltage by changing, with digital codes, the multipliers of the MOSFETs in a differential pair and the resistance. Simulation results showed that the proposed PD-BGR can generate 25- to 200-mV reference voltages in 25-mV steps within a ±0.7% temperature inaccuracy over a temperature range from -20 to 100°C. A Monte Carlo simulation showed that the coefficient of variation of the reference was within 1.1%. Measurement results demonstrated that our prototype chips can generate stable programmable differential output voltages that are almost the same as those of the simulation. The average power consumption was only 88.4 nW, with a voltage error of -4/+3 mV across 5 samples.
Hikaru SEBE Daisuke KANEMOTO Tetsuya HIROSE
An extremely low-voltage charge pump (ELV-CP) and its dedicated multi-stage driver (MS-DRV) for sub-60-mV thermoelectric energy harvesting are proposed. The proposed MS-DRV utilizes the output voltages of each ELV-CP to efficiently boost the control clock signals. The boosted clock signals are used as switching signals for each ELV-CP and MS-DRV to turn switch transistors on and off. Moreover, reset transistors are added to the MS-DRV to ensure an adequate non-overlapping period between switching signals. Measurement results demonstrated that the proposed MS-DRV can generate boosted clock signals of 350 mV from an input voltage of 60 mV. The ELV-CP can boost an input voltage of 100 mV with 10.7% peak efficiency. The proposed ELV-CP and MS-DRV can boost input voltages as low as 56 mV.
Reliability is an important figure of merit of a system, and it must be satisfied in safety-critical applications. This paper considers parallel applications on heterogeneous embedded systems and proposes a two-phase algorithm framework that minimizes energy consumption while satisfying the application’s reliability requirement. The first phase performs the initial assignment, and the second phase either satisfies the reliability requirement or improves energy efficiency. Specifically, when the application’s reliability requirement cannot be achieved via the initial assignment, an algorithm for enhancing the reliability of tasks is designed to satisfy the requirement. Because the reliability of the initial assignment may instead exceed the application’s reliability requirement, an algorithm for reducing the execution frequency of tasks is designed to improve energy efficiency. The proposed algorithms are compared with existing algorithms using real parallel applications. Experimental results demonstrate that the proposed algorithms consume less energy while satisfying the application’s reliability requirements.
Rina TAGAMI Hiroki KOBAYASHI Shuichi AKIZUKI Manabu HASHIMOTO
Due to the revitalization of the semiconductor industry and efforts toward labor saving and unmanned operation in the retail and food manufacturing industries, objects to be recognized at production sites are increasingly diversified in color and design. Depending on the target object, it may be more reliable to process only color information, only intensity information, or a combination of the two. However, there are few conventional methods for optimizing the color and intensity information to be used, and deep learning is too costly for production sites. In this paper, we optimize the combination of the color and intensity information of a small number of pixels used for matching in the framework of template matching, on the basis of the mutual relationship between the target object and surrounding objects. We propose a fast and reliable matching method using these few pixels. Pixels with a low pixel pattern frequency are selected from color and grayscale images of the target object, and from these, pixels that are highly discriminative against surrounding objects are carefully selected. The use of both color and intensity information makes the method highly versatile with respect to object design. The use of a small number of pixels that are not shared by the target and surrounding objects provides high robustness to the surrounding objects and enables fast matching. Experiments using real images confirmed that when 14 pixels are used for matching, the processing time is 6.3 msec and the recognition success rate is 99.7%. The proposed method also showed better positional accuracy than the comparison method, and the optimized pixels yielded a higher recognition success rate than the non-optimized pixels.
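The speed benefit of matching with only a few pixels can be seen in a minimal sketch of sparse template matching. The pixel selection itself (low pattern frequency, discriminability against surrounding objects) is the paper's contribution and is not reproduced here: the selected coordinates are simply passed in. The function name, the sum-of-absolute-differences score, and the toy grayscale data are assumptions for illustration.

```python
def match_sparse(image, template, pixels):
    """Return the (row, col) offset minimizing the SAD over the selected template pixels."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best_score, best_pos = None, None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            # Compare only the chosen pixels instead of the full template.
            score = sum(abs(image[r + pr][c + pc] - template[pr][pc])
                        for pr, pc in pixels)
            if best_score is None or score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos

template = [[9, 1],
            [1, 9]]
image = [[0, 0, 0, 0, 0],
         [0, 0, 9, 1, 0],
         [0, 0, 1, 9, 0],
         [0, 0, 0, 0, 0]]
# Hypothetical pre-selected pixels: the two bright corners of the template.
position = match_sparse(image, template, [(0, 0), (1, 1)])
```

The cost per candidate position scales with the number of selected pixels rather than with the template area, which is why a set as small as 14 pixels can keep the search fast.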
Yuka KO Katsuhito SUDOH Sakriani SAKTI Satoshi NAKAMURA
End-to-end speech translation (ST) directly renders source language speech into the target language without the intermediate automatic speech recognition (ASR) output used in a cascade approach. End-to-end ST thus avoids error propagation from intermediate ASR results. Although recent attempts have applied multi-task learning with an auxiliary ASR task to improve ST performance, they use cross-entropy loss against one-hot references in the ASR task, and the trained ST models do not consider possible ASR confusion. In this study, we propose a novel multi-task learning framework for end-to-end ST that uses an ASR-based loss against posterior distributions obtained from a pre-trained ASR model, called ASR posterior-based loss (ASR-PBL). The ASR-PBL method, which enables an ST model to reflect possible ASR confusion among competing hypotheses with similar pronunciations, can be applied to one of the strong multi-task ST baseline models with Hybrid CTC/Attention ASR task loss. In our experiments on the Fisher Spanish-to-English corpus, the proposed method achieved better BLEU results than the baseline that used standard CE loss.
Yuxin HUANG Yuanlin YANG Enchang ZHU Yin LIANG Yantuan XIAN
Chinese-Vietnamese cross-lingual event retrieval aims to retrieve the Vietnamese sentence describing the same event as a given Chinese query sentence from a set of Vietnamese sentences. Existing mainstream cross-lingual event retrieval methods rely on extracting textual representations from query texts and calculating their similarity with textual representations in other language candidate sets. However, these methods ignore the difference in event elements present during Chinese-Vietnamese cross-language retrieval. Consequently, sentences with similar meanings but different event elements may be incorrectly considered to describe the same event. To address this problem, we propose a cross-lingual retrieval method that integrates event elements. We introduce event elements as an additional supervisory signal, where we calculate the semantic similarity of event elements in two sentences using an attention mechanism to determine the attention score of the event elements. This allows us to establish a one-to-one correspondence between event elements in the text. Additionally, we leverage the multilingual pre-trained language model fine-tuned based on contrastive learning to obtain cross-language sentence representation to calculate the semantic similarity of the sentence texts. By combining these two approaches, we obtain the final text similarity score. Experimental results demonstrate that our proposed method achieves higher retrieval accuracy than the baseline model.
Zheqing ZHANG Hao ZHOU Chuan LI Weiwei JIANG
Single-image dehazing is a challenging task in computer vision research. To address the limited representation capability of traditional convolutional neural networks and the high computational overhead of recent self-attention mechanisms, we propose image attention and design a single-image dehazing network based on it, IAD-Net. The proposed image attention is a plug-and-play module with global modeling ability. IAD-Net is a parallel network structure that combines the global modeling ability of image attention with the local modeling ability of convolution, so that the network can learn both global and local features. The proposed network model has excellent feature learning and feature expression abilities, has low computational overhead, and also improves the detail information of hazy images. Experiments verify the effectiveness of the image attention module and the competitiveness of IAD-Net with state-of-the-art methods.
Shi BAO Xiaoyan SONG Xufei ZHUANG Min LU Gao LE
Images with rich color information are an important source of information that people obtain from the objective world. However, it is sometimes difficult for people with red-green color vision deficiencies to obtain color information from color images. We propose a method of color correction for dichromats that is based on their physiological characteristics and considers hue information. First, the hue loss of color pairs under normal color vision is defined, an objective function is constructed on its basis, and the resultant image is obtained by minimizing it. Finally, the effectiveness of the proposed method is verified through comparison tests. People with red-green color vision deficiency fail to distinguish between some red and green colors. When the line connecting a red and a green color is parallel to the a* axis of CIE L*a*b*, they cannot distinguish the color pair, but they can distinguish a color pair whose connecting line is parallel to the b* axis. Therefore, correcting color pairs whose connecting line is parallel to the a* axis yields good results. When color correction is performed on a color, the hue loss between the two colors under normal color vision is supplemented with b* so that individuals with red-green color vision deficiency can perceive the color difference between the color pair. The magnitude of the correction is greatest when the connecting line of the color pair is parallel to the a* axis, and no color correction is applied when the connecting line is parallel to the b* axis. The objective evaluation results show that the method achieves a higher score, indicating that the proposed method can maintain the naturalness of the image while reducing confusing colors.
Feng WANG Xiangyu WEN Lisheng LI Yan WEN Shidong ZHANG Yang LIU
The rapid advancement of cloud-edge-end collaboration offers a feasible solution for realizing low-delay and low-energy-consumption data processing in internet of things (IoT)-based smart distribution grids. The major concern of cloud-edge-end collaboration lies in resource management. However, the joint optimization of heterogeneous resources involves multiple timescales, and the optimization decisions at different timescales are intertwined. In addition, burst electromagnetic interference affects the channel environment of the distribution grid, leading to inaccurate optimization decisions, which can cause negative effects such as slow convergence and strong fluctuations. Hence, we propose a cloud-edge-end collaborative multi-timescale multi-service resource management algorithm. Large-timescale device scheduling is optimized by sliding window pricing matching, which enables accurate matching estimation and effective conflict elimination. Small-timescale compression level selection and power control are jointly optimized by a disturbance-robust upper confidence bound (UCB) method, which perceives the presence of electromagnetic interference and adjusts the exploration tendency to improve convergence. Simulation results illustrate the excellent performance of the proposed algorithm.
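For context on the small-timescale learning step, the following is a minimal sketch of the standard UCB1 rule, not the disturbance-robust variant proposed in the paper. Each "arm" stands in for one candidate decision (for example, a compression level and power setting); the reward function, arm count, and names are hypothetical.

```python
import math

def ucb1_select(counts, means, t):
    """Pick the arm with the largest upper confidence bound at round t (1-based)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # try every arm once before using the bound
    scores = [m + math.sqrt(2.0 * math.log(t) / n) for m, n in zip(means, counts)]
    return scores.index(max(scores))

def run_ucb1(reward_fn, n_arms, rounds):
    """Run UCB1 and return per-arm pull counts and empirical mean rewards."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, means, t)
        reward = reward_fn(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
    return counts, means

# Hypothetical deterministic rewards for three candidate settings.
true_rewards = [0.2, 0.8, 0.5]
counts, means = run_ucb1(lambda arm: true_rewards[arm], n_arms=3, rounds=300)
```

The exploration bonus shrinks as an arm is pulled more often, so pulls concentrate on the best setting. A disturbance-robust variant would additionally adjust this bonus when interference is detected, which is not modeled here.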
Feng LIU Helin WANG Conggai LI Yanli XU
This letter proposes a scheme for the backward transmission of the propagation-delay based three-user X channel, which is reciprocal to the forward transmission. The given scheme successfully delivers 10 expected messages in 6 time-slots by cyclic interference alignment without loss of degrees of freedom, which supports efficient bidirectional transmission between the two ends of the three-user X channel.