Lihan TONG Weijia LI Qingxia YANG Liyuan CHEN Peng CHEN
We present Ksformer, which uses Multi-scale Key-select Routing Attention (MKRA) to select key areas through multi-channel, multi-scale windows with a top-k operator, and a Lightweight Frequency Processing Module (LFPM) to enhance high-frequency features. In tests, Ksformer outperforms other dehazing methods.
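The top-k selection idea can be illustrated with a minimal sketch. It is not the MKRA module: it handles a single query at a single scale with no windows, and the function name `topk_attention` and its arguments are hypothetical. It keeps only the k best-matching keys and applies softmax over those alone.

```python
import math

def topk_attention(query, keys, values, k):
    """Attend only to the k keys with the highest scaled dot-product score."""
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, key)) / scale for key in keys]
    # Indices of the k largest scores (the "selected" keys).
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores only; all other keys get zero weight.
    top = max(scores[i] for i in keep)
    exps = {i: math.exp(scores[i] - top) for i in keep}
    total = sum(exps.values())
    weights = [exps.get(i, 0.0) / total for i in range(len(keys))]
    dim = len(values[0])
    output = [sum(weights[i] * values[i][d] for i in range(len(values)))
              for d in range(dim)]
    return output, weights

# Toy example: three keys, keep the best two.
output, weights = topk_attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
    values=[[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]],
    k=2,
)
```

The third key has the lowest score, so its weight is exactly zero and its value does not contribute to the output.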
Shuoyan LIU Chao LI Yuxin LIU Yanqiu WANG
Escalators are an indispensable facility in public places. While they provide convenience, accidents on them can lead to serious consequences. YOLO is an object detection model that can detect human behavior in real time. However, the model exhibits low accuracy and a high miss rate for small targets. To this end, this paper proposes the Small Target High Performance YOLO (SH-YOLO) model to detect abnormal behavior on escalators. The SH-YOLO model first enhances the backbone network through attention mechanisms. Subsequently, a small target detection layer is incorporated to enhance the detection of key points for small objects. Finally, the Conv and SPPF modules are replaced with a Region Dynamic Perception Depth Separable Conv (DR-DP-Conv) and Atrous Spatial Pyramid Pooling (ASPP), respectively. The experimental results demonstrate that the proposed model can accurately and robustly detect anomalies in real-world escalator scenes.
Congcong FANG Yun JIN Guanlin CHEN Yunfan ZHANG Shidang LI Yong MA Yue XIE
Currently, an increasing number of tasks in speech emotion recognition rely on the analysis of both speech and text features. However, there remains a paucity of research exploring the potential of leveraging large language models like GPT-3 to enhance emotion recognition. In this investigation, we harness the power of the GPT-3 model to extract semantic information from transcribed texts, generating text modal features with a dimensionality of 1536. Subsequently, we perform feature fusion, combining the 1536-dimensional text features with 1188-dimensional acoustic features to yield comprehensive multi-modal recognition outcomes. Our findings reveal that the proposed method achieves a weighted accuracy of 79.62% across the four emotion categories in IEMOCAP, underscoring the considerable enhancement in emotion recognition accuracy facilitated by integrating large language models.
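The abstract does not say how the two feature sets are fused. As a minimal sketch, assuming plain concatenation (the function name `fuse_features` is hypothetical), the fused vector would have 1536 + 1188 = 2724 dimensions:

```python
TEXT_DIM = 1536      # text feature size reported in the abstract
ACOUSTIC_DIM = 1188  # acoustic feature size reported in the abstract

def fuse_features(text_features, acoustic_features):
    """Concatenate text and acoustic features (assumed fusion scheme)."""
    if len(text_features) != TEXT_DIM:
        raise ValueError("unexpected text feature size")
    if len(acoustic_features) != ACOUSTIC_DIM:
        raise ValueError("unexpected acoustic feature size")
    return list(text_features) + list(acoustic_features)

# Placeholder vectors stand in for real embeddings and acoustic descriptors.
fused = fuse_features([0.0] * TEXT_DIM, [1.0] * ACOUSTIC_DIM)
fused_dim = len(fused)
```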
Takahito YOSHIDA Takaharu YAGUCHI Takashi MATSUBARA
Accurately simulating physical systems is essential in various fields. In recent years, deep learning has been used to automatically build models of such systems by learning from data. One such method is the neural ordinary differential equation (neural ODE), which treats the output of a neural network as the time derivative of the system states. However, while this and related methods have shown promise, their training strategies still require further development. Inspired by error analysis techniques in numerical analysis, but with numerical errors replaced by modeling errors, we propose an error-analytic strategy to address this issue. Our strategy can capture long-term errors and thus improve the accuracy of long-term predictions.
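To show why long-term errors matter, the sketch below rolls out a model whose output is treated as the time derivative of the state. It is not the proposed error-analytic strategy: the "network" is a hypothetical one-parameter linear stand-in, and the integrator is explicit Euler. A small error in the learned derivative stays small over a few steps but grows over a long rollout.

```python
def learned_derivative(state, weight):
    """Stand-in for a neural network: returns dx/dt for the given state."""
    return weight * state

def rollout(x0, weight, dt, steps):
    """Integrate dx/dt = f(x) with the explicit Euler method."""
    trajectory = [x0]
    for _ in range(steps):
        x = trajectory[-1]
        trajectory.append(x + dt * learned_derivative(x, weight))
    return trajectory

true_traj = rollout(1.0, weight=-1.00, dt=0.01, steps=500)
model_traj = rollout(1.0, weight=-1.05, dt=0.01, steps=500)  # 5% modeling error

# Relative error after 10 steps versus after 500 steps.
short_err = abs(model_traj[10] - true_traj[10]) / true_traj[10]
long_err = abs(model_traj[500] - true_traj[500]) / true_traj[500]
```

Here `long_err` is far larger than `short_err`, which is the kind of accumulated modeling error that a one-step training loss does not see.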
White-box cryptographic implementations often use masking and shuffling as countermeasures against key extraction attacks. To counter these defenses, higher-order Differential Computation Analysis (HO-DCA) and its variants have been developed. These methods aim to breach these countermeasures without needing reverse engineering. However, these non-invasive attacks are expensive and can be thwarted by updating the masking and shuffling techniques. This paper introduces a simple binary injection attack, aptly named clear & return, designed to bypass advanced masking and shuffling defenses employed in white-box cryptography. The attack involves injecting a small amount of assembly code, which effectively disables run-time random sources. This loss of randomness exposes the unprotected lookup value within white-box implementations, making them vulnerable to simple statistical analysis. In experiments targeting open-source white-box cryptographic implementations, the attack strategy of hijacking entries in the Global Offset Table (GOT) or function calls shows effectiveness in circumventing run-time countermeasures.
Multi-focus image fusion involves combining partially focused images of the same scene to create an all-in-focus image. Existing multi-focus image fusion algorithms face two problems: benchmark images are difficult to obtain, and convolutional neural networks focus too much on local regions. To address them, a fusion algorithm that combines local and global feature encoding is proposed. Initially, we devise two self-supervised image reconstruction tasks and train an encoder-decoder network through multi-task learning. Subsequently, within the encoder, we merge the dense connection module with the PS-ViT module, enabling the network to utilize local and global information during feature extraction. Finally, to enhance the overall efficiency of the model, distinct loss functions are applied to each task. To preserve the more robust features from the original images, spatial frequency is employed during the fusion stage to obtain the feature map of the fused image. Experimental results demonstrate that, in comparison to twelve other prominent algorithms, our method exhibits good fusion performance in objective evaluation. Ten of the twelve selected evaluation metrics show an improvement of more than 0.28%. Subjectively, the method also produces superior visual results.
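The spatial frequency criterion mentioned above can be sketched as follows, using the commonly used definition (root of the mean squared horizontal and vertical first differences). The abstract applies it to encoder feature maps; this sketch assumes small 2D lists and a hypothetical `pick_sharper` selection rule.

```python
import math

def spatial_frequency(img):
    """Spatial frequency of a 2D array given as a list of equal-length rows."""
    rows, cols = len(img), len(img[0])
    # Sum of squared horizontal differences (row frequency term).
    rf = sum((img[i][j] - img[i][j - 1]) ** 2
             for i in range(rows) for j in range(1, cols))
    # Sum of squared vertical differences (column frequency term).
    cf = sum((img[i][j] - img[i - 1][j]) ** 2
             for i in range(1, rows) for j in range(cols))
    return math.sqrt((rf + cf) / (rows * cols))

def pick_sharper(patch_a, patch_b):
    """Keep the patch with the higher spatial frequency (assumed fusion rule)."""
    if spatial_frequency(patch_a) >= spatial_frequency(patch_b):
        return patch_a
    return patch_b

sharp = [[0.0, 1.0], [1.0, 0.0]]    # strong local contrast (in focus)
blurry = [[0.5, 0.5], [0.5, 0.5]]   # flat region (out of focus)
chosen = pick_sharper(sharp, blurry)
```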
Modern memory devices such as DRAM are prone to errors caused by unintended bit flips during operation. Since memory errors severely impact in-memory key-value stores (KVSes), software mechanisms for hardening them against memory errors are being explored. However, it is hard to test the memory error handling code efficiently because of its characteristics: the code is event-driven, the handlers depend on the memory object, and in-memory KVSes manage various objects in a huge memory space. This paper presents MemFI, which supports runtime tests for the memory error handlers of in-memory KVSes. Our approach performs software fault injection of memory errors at the memory object level to trigger the target handler while smoothly carrying out tests on the same running state. To show the effectiveness of MemFI, we integrate error handling mechanisms into two real-world in-memory KVSes, memcached 1.6.9 and Redis 6.2.7, and check their behavior using the MemFI prototypes. The results show that the MemFI-based runtime test allows us to check the behavior of the error handling mechanisms. We also show its efficiency by comparing it to other fault injection approaches based on a trial model.
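Object-level fault injection can be illustrated with a toy example. It does not reflect how MemFI operates on a live memcached or Redis process: the `StoredValue` class and `flip_bit` helper are hypothetical, and the CRC check merely stands in for an error detection mechanism.

```python
import zlib

def flip_bit(buffer, byte_index, bit_index):
    """Flip one bit in place, emulating a single-bit memory error."""
    buffer[byte_index] ^= 1 << bit_index

class StoredValue:
    """A value protected by a CRC32 computed when it was stored."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.crc = zlib.crc32(self.data)

    def is_corrupted(self):
        # The handler under test would be triggered when this returns True.
        return zlib.crc32(self.data) != self.crc

item = StoredValue(b"user:42=alice")
flip_bit(item.data, byte_index=3, bit_index=0)  # inject the fault into this object
corrupted = item.is_corrupted()
```

Flipping the same bit again restores the object, so further tests can continue from the same state.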
Yuan LI Tingting HU Ryuji FUCHIKAMI Takeshi IKENAGA
1 millisecond (1-ms) vision systems are gaining increasing attention in diverse fields like factory automation and robotics, as the ultra-low delay ensures seamless and timely responses. Superpixel segmentation is a pivotal preprocessing step that reduces the number of image primitives for subsequent processing. Recently, there has been a growing emphasis on leveraging deep network-based algorithms to pursue superior performance and better integration into other deep network tasks. Superpixel Sampling Network (SSN) uses a deep network for feature generation and differentiable SLIC for superpixel generation. SSN achieves high performance with a small number of parameters. However, implementing SSN on FPGAs for ultra-low delay faces challenges due to the final layer’s aggregation of intermediate results. To address this limitation, this paper proposes an aggregated-to-pipelined structure for FPGA implementation. The final layer is decomposed into individual final layers for each intermediate result. This architectural adjustment eliminates the need for memory to store intermediate results. Concurrently, the proposed structure leverages the decomposed layers to form a pipelined structure with pixel streaming input to achieve ultra-low latency. To support the pipelined structure, a layer-partitioned memory architecture is proposed. Each final layer has dedicated memory for storing superpixel center information, allowing values to be read and calculated from memory without conflicts. The calculation results of each final layer are accumulated, and the result for each pixel is obtained as the stream reaches the last layer. Evaluation results demonstrate that boundary recall and under-segmentation error remain comparable to SSN, with an average label consistency improvement of 0.035 over SSN. From a hardware performance perspective, the proposed system processes images at 1000 FPS with a delay of 0.947 ms/frame.
This article focuses on improving the BiSeNet v2 bilateral branch image segmentation network structure, enhancing its learning ability for spatial details and its overall image segmentation accuracy. A modified network called “BiConvNet” is proposed. Firstly, to extract shallow spatial details more effectively, a parallel concatenated strip and dilated (PCSD) convolution module is proposed and used to extract local features and surrounding contextual features in the detail branch. Next, the semantic branch is reconstructed using the lightweight capability of depthwise separable convolution and the high performance of ConvNet, in order to enable more efficient learning of deep, high-level semantic features. Finally, fine-tuning is performed on the bilateral guidance aggregation layer of BiSeNet v2, enabling better fusion of the feature maps output by the detail branch and the semantic branch. The experimental part discusses the contribution of strip convolution and of dilated convolutions of different sizes to image segmentation accuracy, and compares them with common convolutions such as Conv2d convolution, CG convolution and CCA convolution. The experiments show that the proposed PCSD convolution module has the highest segmentation accuracy in all categories of the Cityscapes dataset compared with common convolutions. BiConvNet achieved a 9.39% accuracy improvement over the BiSeNet v2 network, with only a slight increase of 1.18M in model parameters. An mIoU of 68.75% was achieved on the validation set. Furthermore, in comparative experiments with image segmentation algorithms commonly used for autonomous driving in recent years, BiConvNet demonstrates strong competitive advantages in segmentation accuracy on the Cityscapes and BDD100K datasets.
Shohei MATSUHARA Kazuyuki SAITO Tomoyuki TAJIMA Aditya RAKHMADI Yoshiki WATANABE Nobuyoshi TAKESHITA
Renal Denervation (RDN) has been developed as a potential treatment for hypertension that is resistant to traditional antihypertensive medication. This technique involves the ablation of nerve fibers around the renal artery from inside the blood vessel, which is intended to suppress sympathetic nerve activity and result in an antihypertensive effect. Currently, clinical investigation is underway to evaluate the effectiveness of RDN in treating treatment-resistant hypertension. Although radio frequency (RF) ablation catheters are commonly used, their heating capacity is limited. Microwave catheters are being considered as another option for RDN. We aim to solve the technical challenges of applying microwave catheters to RDN. In this paper, we designed a catheter with a helix structure and a microwave (2.45 GHz) antenna. The antenna is a coaxial slot antenna, the dimensions of which were determined by optimizing the reflection coefficient through simulation. The measured catheter reflection coefficient is -23.6 dB using egg white and -32 dB in the renal artery. The prototype catheter was evaluated by in vitro experiments to validate the simulation. The procedure was performed successfully in in vivo experiments involving the ablation of porcine renal arteries. The pathological evaluation confirmed that a large area of the perivascular tissue was ablated (> 5 mm) in a single quadrant without significant damage to the renal artery. Our proposed device allows for control of the ablation position and produces deep nerve ablation without overheating the intima or surrounding blood, suggesting a highly capable new denervation catheter.
Ground penetrating radar (GPR) has the advantage of non-destructively and quickly inspecting internal structures such as voids and buried pipes under roads. However, the internal structures must be estimated from the GPR images. Recently, recognition and detection methods for GPR images using deep learning have been studied. This paper examines a cutout-based data augmentation method for accurately interpreting GPR images with deep learning. We find that cutout augmentation yields higher detection rates for all objects used in this study than the commonly used horizontal shift augmentation.
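Cutout augmentation masks a random square of the training image. The sketch below is a simplified, assumed variant that keeps the mask fully inside the image; the function name, mask size, and fill value are hypothetical and not taken from the paper.

```python
import random

def cutout(image, size, rng=random, fill=0.0):
    """Return a copy of a 2D image with one random size x size square masked."""
    rows, cols = len(image), len(image[0])
    # Top-left corner chosen so the square lies entirely inside the image.
    top = rng.randrange(0, rows - size + 1)
    left = rng.randrange(0, cols - size + 1)
    out = [row[:] for row in image]  # leave the original image untouched
    for i in range(top, top + size):
        for j in range(left, left + size):
            out[i][j] = fill
    return out

original = [[1.0] * 8 for _ in range(8)]
augmented = cutout(original, size=3, rng=random.Random(0))
masked_pixels = sum(row.count(0.0) for row in augmented)
```

Passing a seeded `random.Random` instance makes the augmentation reproducible.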
Akira KAWAHARA Jun SHIBAYAMA Kazuhiro FUJITA Junji YAMAUCHI Hisamatsu NAKANO
The numerical dispersion property of the finite-difference time-domain (FDTD) method based on the iterated Crank-Nicolson (ICN) scheme is investigated. The numerical dispersion relation is newly derived from the amplification matrix, and its property is discussed with attention to the eigenvalues of the matrix. It is shown that the ICN-FDTD method is conditionally stable but slightly dissipative.
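The qualitative behavior can be reproduced on a scalar test problem. The sketch below is not the paper's amplification matrix for FDTD: it applies an ICN step (an Euler predictor followed by two averaged corrector iterations, an assumed form) to du/dt = iωu and evaluates the amplification factor G as a function of x = ωΔt. For this form, |G|² = 1 - x⁴/4 + x⁶/16, so the step damps slightly for 0 < x < 2 and becomes unstable for x > 2.

```python
def icn_step(u, lam, dt, iterations=2):
    """One iterated Crank-Nicolson step for du/dt = lam * u."""
    u_new = u + dt * lam * u  # explicit Euler predictor
    for _ in range(iterations):
        u_mid = 0.5 * (u + u_new)    # average of old and predicted values
        u_new = u + dt * lam * u_mid # corrector using the averaged value
    return u_new

def amplification_factor(x):
    """Amplification factor for a purely oscillatory mode, x = omega * dt."""
    return icn_step(1.0 + 0.0j, 1j * x, 1.0)

g_small = abs(amplification_factor(0.5))  # below 1: slightly dissipative
g_limit = abs(amplification_factor(2.0))  # equals 1 at the stability limit
g_large = abs(amplification_factor(2.1))  # above 1: unstable
```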
Ryo KUMAGAI Ryosuke SUGA Tomoki UWANO
In this paper, a single-layer circular polarizer for a linearly polarized horn antenna is proposed. The multiple reflected waves between the aperture and the array provide the desired phase differences between the vertical and horizontal polarizations. The measured gain of the fabricated antenna is 14.4 dBic. The half-power beamwidths of the vertical polarization are 28 and 24 degrees, and those of the horizontal polarization are 31 and 23 degrees, in the vertical and horizontal planes, respectively. The polarizer has a low impact on the gain and beamwidth of the primary horn antenna: their changes are within 1.7 dB and 10 degrees. The 3 dB fractional bandwidth of the axial ratio is measured to be 1.4%.
Fan LIU Zhewang MA Masataka OHIRA Dongchun QIAO Guosheng PU Masaru ICHIKAWA
In this paper, a precise design method for high-order bandpass filters (BPFs) with complicated coupling topologies is proposed and demonstrated through the design of an 11-pole BPF using TM010 mode dielectric resonators (DRs). A novel Z-shaped coupling structure is proposed, which avoids the mixed use of TM010 and TM01δ modes and makes the tuning and assembly of the filter much easier. The coupling topology of the BPF includes three cascade triplets (CTs) of DRs, and both the capacitive and inductive couplings in the CTs are designed to be independently tunable, which consequently produces three controllable transmission zeros on both sides of the passband of the filter. A procedure for mapping the coupling matrix of the BPF to its physical dimensions is developed, and an iterative optimization of these physical dimensions is implemented to achieve the best performance. The high precision of the 11-pole BPF design is shown by the excellent agreement between the electromagnetic simulated response of the filter and the desired target specifications.
The domain decomposition method (DDM) is widely used for analyzing large-scale electromagnetic problems. The method decomposes the target model into small independent subdomains. Electromagnetic analysis inherently suffers from slow convergence when solved with iterative algorithms such as Krylov subspace algorithms. The DDM remedies this issue by decomposing the total system into subdomain problems and gathering the local results into an interface problem, which is then adjusted to obtain the total solution. In this paper, we report the convergence properties of the DDM while varying the local domain size and the region shape for several mesh sizes. The experimental results show that the convergence speed depends on the number of interface problem variables and on the choice of local region shapes. In addition, the convergence property differs according to the target frequency. In general, it is demonstrated that convergence can be accelerated with large cubic subdomains. We propose subdomain selection strategies based on an analysis of the condition numbers of the governing equation.
Haonan CHEN Akito IGUCHI Yasuhide TSUJI
In order to analyze photonic devices whose waveguide structure varies slowly along the propagation direction, we develop a finite element beam propagation method (FE-BPM) with coordinate transformation. In this approach, a longitudinally varying waveguide is converted into an equivalent straight waveguide, so that cumbersome processes in FE-BPM, such as mesh updating and field interpolation at each propagation step, can be avoided. We employ this simulation technique in the shape optimization of photonic devices and show design examples of a mode converter. To show the validity of this approach, the calculated results for the designed devices are compared with those of the finite element method (FEM) or the standard FE-BPM.
Hyunuk AHN Akito IGUCHI Keita MORIMOTO Yasuhide TSUJI
We develop a new 3D full vectorial finite element bidirectional beam propagation method (3DFV-BiBPM) in order to handle nonradiative dielectric waveguide (NRD guide) components in which the waveguide profile varies in the direction perpendicular to the parallel metal plates. The BiBPM is one of the transfer-matrix-based methods in which only transverse cross sections have to be discretized using the finite difference or the finite element scheme, and it can treat backward and multiple reflections, as opposed to the standard BPM. An NRD guide with an air gap and a filter with a sapphire resonator are numerically analyzed, taking dielectric losses into account, to investigate the validity of our approach.
We report on a method for reconstructing the spectrum of incident light from a single image captured by a snapshot multispectral camera. The camera has a dielectric multilayer multispectral filter array (MSFA) integrated onto a CMOS image sensor. A sparse estimation algorithm was applied to reconstruct the spectrum. Using Gaussian functions with various bandwidths and central wavelengths as the basis matrix, the algorithm has been shown to be highly accurate for estimating the spectra of both narrowband monochromatic and broadband fluorescent light emitting diodes (LEDs), regardless of the wavelength band.
In satellite positioning, both the reception of ranging signals and the acquisition of navigation messages are necessary. In general, acquiring navigation messages does not always require the reception of radio waves; however, when radio waves are used for acquisition, a period of continuous reception significantly longer than one second is required. The European satellite positioning system, Galileo, started broadcasting new navigation messages in August 2022. The improvement is based on a secondary synchronization pattern, secondary forward error correction, and reduced ephemeris, which aid rapid recovery from interruptions in message acquisition caused by temporary deterioration in radio reception. This paper evaluates the characteristics of recovery from interruptions in navigation message acquisition during mobile reception of this improved I/NAV navigation message.
Yi CHENG Kexin LI Chunbo XIU Jiaxin LIU
In modern radar systems, the generalized compound distribution model is more suitable for describing the amplitude distribution characteristics of radar sea clutter. Simulating sea clutter accurately and efficiently has important practical significance for radar signal processing and sea surface target detection. However, with the traditional zero memory nonlinearity (ZMNL) method, the correlated generalized compound distribution model cannot deal with non-integral or non-semi-integral parameters. To overcome this shortcoming, a new method of generating correlated generalized compound distributed clutter is proposed, which changes how the Generalized Gamma distributed random sequences are generated in traditional generalized compound distribution models. Firstly, by using the additivity of the Gamma distribution, the Probability Density Function (PDF) of the Gamma distribution is transformed into a second-order nonlinear ordinary differential equation, and the Gamma distributed sequence is solved for arbitrary parameters. Then the Generalized Gamma distributed sequence with arbitrary parameters can be obtained through the nonlinear transformation relationship between the Generalized Gamma distribution and the Gamma distribution, so that the shape parameters of the generalized compound distributed sea clutter are extended to general real numbers. Simulation results show that the proposed method is not only suitable for clutter simulation with non-integral or non-semi-integral shape parameter values, but also further improves the fitting degree.
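The nonlinear transformation between the two distributions can be sketched as follows. This is not the proposed ODE-based generator and produces uncorrelated samples only: it assumes the Stacy parameterization (scale a, shape parameters d and p) and relies on the standard library's Gamma sampler. If Y follows a Gamma distribution with shape d/p and unit scale, then X = a·Y^(1/p) follows a Generalized Gamma distribution, including for non-integral and non-semi-integral shapes.

```python
import random

def generalized_gamma_samples(n, scale, d, p, seed=0):
    """Draw n Generalized Gamma samples by transforming Gamma samples."""
    rng = random.Random(seed)
    shape = d / p  # Gamma shape parameter; any positive real value is allowed
    return [scale * rng.gammavariate(shape, 1.0) ** (1.0 / p) for _ in range(n)]

# Shape d / p = 0.7 is neither an integer nor a half-integer.
samples = generalized_gamma_samples(20000, scale=2.0, d=1.05, p=1.5)

# Sanity check: (X / a)^p is Gamma(d / p, 1), so its mean should be near d / p.
moment = sum((x / 2.0) ** 1.5 for x in samples) / len(samples)
```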