Lihan TONG Weijia LI Qingxia YANG Liyuan CHEN Peng CHEN
We present Ksformer, which uses Multi-scale Key-select Routing Attention (MKRA) to intelligently select key areas through multi-channel, multi-scale windows with a top-k operator, and a Lightweight Frequency Processing Module (LFPM) to enhance high-frequency features. Ksformer outperforms other dehazing methods in tests.
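The top-k selection idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the MKRA module itself: it assumes a single query, plain scaled dot-product scores, and no multi-scale or multi-channel windows, and the function name `topk_attention` is hypothetical.

```python
import math


def topk_attention(query, keys, values, k):
    """Attend from one query to only the k highest-scoring keys.

    Generic illustration of top-k key selection; NOT the paper's MKRA.
    """
    scale = math.sqrt(len(query))
    # Scaled dot-product score between the query and every key.
    scores = [sum(q * x for q, x in zip(query, key)) / scale for key in keys]
    # Indices of the k largest scores: these are the selected keys.
    selected = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores only; unselected keys get zero weight.
    peak = max(scores[i] for i in selected)
    exps = {i: math.exp(scores[i] - peak) for i in selected}
    total = sum(exps.values())
    dim = len(values[0])
    output = [sum(exps[i] / total * values[i][d] for i in selected) for d in range(dim)]
    return output, sorted(selected)


if __name__ == "__main__":
    out, picked = topk_attention([1.0, 0.0],
                                 [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
                                 [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]],
                                 k=2)
    print(picked, out)
```

Restricting the softmax to the selected keys means low-scoring regions contribute nothing to the output, which is the general motivation for a top-k operator in attention.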
Shuoyan LIU Chao LI Yuxin LIU Yanqiu WANG
Escalators are an indispensable facility in public places. While they provide convenience to people, accidents on them can lead to serious consequences. YOLO is a model that detects human behavior in real time. However, it exhibits low accuracy and a high miss rate for small targets. To this end, this paper proposes the Small Target High Performance YOLO (SH-YOLO) model to detect abnormal behavior on escalators. The SH-YOLO model first enhances the backbone network through attention mechanisms. Subsequently, a small target detection layer is incorporated to enhance detection of key points for small objects. Finally, the conv and the SPPF are replaced with a Region Dynamic Perception Depth Separable Conv (DR-DP-Conv) and Atrous Spatial Pyramid Pooling (ASPP), respectively. The experimental results demonstrate that the proposed model is capable of accurately and robustly detecting anomalies in the real-world escalator scene.
Congcong FANG Yun JIN Guanlin CHEN Yunfan ZHANG Shidang LI Yong MA Yue XIE
Currently, an increasing number of tasks in speech emotion recognition rely on the analysis of both speech and text features. However, there remains a paucity of research exploring the potential of leveraging large language models like GPT-3 to enhance emotion recognition. In this investigation, we harness the power of the GPT-3 model to extract semantic information from transcribed texts, generating text modal features with a dimensionality of 1536. Subsequently, we perform feature fusion, combining the 1536-dimensional text features with 1188-dimensional acoustic features to yield comprehensive multi-modal recognition outcomes. Our findings reveal that the proposed method achieves a weighted accuracy of 79.62% across the four emotion categories in IEMOCAP, underscoring the considerable enhancement in emotion recognition accuracy facilitated by integrating large language models.
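The fusion step can be pictured with a small sketch. The abstract does not say how the two feature vectors are combined, so plain concatenation is an assumption here, and the function name `fuse_features` is hypothetical. Only the two dimensionalities (1536 and 1188) come from the text.

```python
TEXT_DIM = 1536      # text feature size stated in the abstract
ACOUSTIC_DIM = 1188  # acoustic feature size stated in the abstract


def fuse_features(text_features, acoustic_features):
    """Fuse both modalities by concatenation (an assumed fusion rule)."""
    if len(text_features) != TEXT_DIM:
        raise ValueError("expected %d text features" % TEXT_DIM)
    if len(acoustic_features) != ACOUSTIC_DIM:
        raise ValueError("expected %d acoustic features" % ACOUSTIC_DIM)
    # Text features first, then acoustic features.
    return list(text_features) + list(acoustic_features)


if __name__ == "__main__":
    fused = fuse_features([0.0] * TEXT_DIM, [1.0] * ACOUSTIC_DIM)
    print(len(fused))
```

Under this assumption the fused vector has 1536 + 1188 = 2724 dimensions and would be passed to a downstream classifier.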
Takahito YOSHIDA Takaharu YAGUCHI Takashi MATSUBARA
Accurately simulating physical systems is essential in various fields. In recent years, deep learning has been used to automatically build models of such systems by learning from data. One such method is the neural ordinary differential equation (neural ODE), which treats the output of a neural network as the time derivative of the system states. However, while this and related methods have shown promise, their training strategies still require further development. Inspired by error analysis techniques in numerical analysis, with numerical errors replaced by modeling errors, we propose an error-analytic strategy to address this issue. As a result, our strategy can capture long-term errors and thus improve the accuracy of long-term predictions.
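To make the neural ODE idea concrete, the sketch below treats a model's output as the time derivative of the state and integrates it forward. It does not implement the proposed error-analytic training strategy. The linear map standing in for a neural network, the explicit Euler integrator, and the names `derivative_model` and `rollout` are all assumptions for illustration.

```python
def derivative_model(state, weights):
    """Stand-in for a neural network: returns d(state)/dt as a linear map."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def rollout(state, weights, dt, steps):
    """Integrate the modeled derivative with explicit Euler steps."""
    trajectory = [list(state)]
    for _ in range(steps):
        rate = derivative_model(state, weights)
        # Euler update: next state = state + dt * modeled time derivative.
        state = [s + dt * r for s, r in zip(state, rate)]
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    # dx/dt = -x, starting from x = 1, integrated for 10 steps of 0.1.
    path = rollout([1.0], [[-1.0]], dt=0.1, steps=10)
    print(path[-1][0])
```

Because each step feeds the previous prediction back into the model, modeling errors accumulate along the rollout, which is why long-term errors matter for training.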
White-box cryptographic implementations often use masking and shuffling as countermeasures against key extraction attacks. To counter these defenses, higher-order Differential Computation Analysis (HO-DCA) and its variants have been developed. These methods aim to breach these countermeasures without needing reverse engineering. However, these non-invasive attacks are expensive and can be thwarted by updating the masking and shuffling techniques. This paper introduces a simple binary injection attack, aptly named clear & return, designed to bypass advanced masking and shuffling defenses employed in white-box cryptography. The attack involves injecting a small amount of assembly code, which effectively disables run-time random sources. This loss of randomness exposes the unprotected lookup value within white-box implementations, making them vulnerable to simple statistical analysis. In experiments targeting open-source white-box cryptographic implementations, the attack strategy of hijacking entries in the Global Offset Table (GOT) or function calls shows effectiveness in circumventing run-time countermeasures.
Keitaro NAKASAI Shin KOMEDA Masateru TSUNODA Masayuki KASHIMA
To automatically measure the mental workload of developers, existing studies have used biometric measures such as brain waves and the heart rate. However, developers are often required to wear certain devices for these measurements, and can therefore be physically burdened. In this study, we evaluated the feasibility of non-contact biometric measures based on the nasal skin temperature (NST). In the experiment, the proposed biometric measures were more accurate than non-biometric measures.
Nan WU Xiaocong LAI Mei CHEN Ying PAN
With the development of the Semantic Web, an increasing number of researchers are utilizing ontology technology to construct domain ontologies. Since there is no unified construction standard, ontology heterogeneity occurs. Ontology matching can fuse heterogeneous ontologies, which enables interoperability between knowledge sources and links concepts to more relevant semantic information. When ontologies differ, reducing false matches and unsuccessful matches is a critical problem to be solved. Moreover, as the number of ontologies increases, the semantic relationships between ontologies become increasingly complex. Consequently, current methods that rely solely on the similarity of names between concepts are no longer sufficient. This paper therefore proposes an ontology matching method based on semantic association. Accurate matching pairs are discovered from existing semantic knowledge, and then the potential semantic associations between concepts are mined according to the characteristics of the contextual structure. The matching method can thus carry out matching on the basis of reliable knowledge. In addition, this paper introduces a probabilistic logic repair method, which can detect and repair conflicts in matching results, to enhance their availability and reliability. The experimental results show that the proposed method effectively improves the quality of matching between ontologies and saves time on repairing incorrect matching pairs. Moreover, compared with existing ontology matching systems, the proposed method has better stability.
Multi-focus image fusion involves combining partially focused images of the same scene to create an all-in-focus image. Existing multi-focus image fusion algorithms face two problems: the benchmark image is difficult to obtain, and the convolutional neural network focuses too much on the local region. To address them, a fusion algorithm that combines local and global feature encoding is proposed. Initially, we devise two self-supervised image reconstruction tasks and train an encoder-decoder network through multi-task learning. Subsequently, within the encoder, we merge the dense connection module with the PS-ViT module, enabling the network to utilize local and global information during feature extraction. Finally, to enhance the overall efficiency of the model, distinct loss functions are applied to each task. To preserve the more robust features from the original images, spatial frequency is employed during the fusion stage to obtain the feature map of the fused image. Experimental results demonstrate that, in comparison to twelve other prominent algorithms, our method exhibits good fusion performance in objective evaluation. Ten of the selected twelve evaluation metrics show an improvement of more than 0.28%. Additionally, it presents superior visual effects subjectively.
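Spatial frequency, used in the fusion stage above, is commonly defined as the root of the mean squared horizontal and vertical differences of a block. The sketch below uses that common definition on small 2-D blocks of numbers. How the paper applies it to encoder feature maps is not stated in the abstract, so the choose-the-sharper-block rule and the function names are assumptions.

```python
import math


def spatial_frequency(block):
    """Common spatial-frequency measure: sqrt(RF^2 + CF^2) of a 2-D block."""
    rows, cols = len(block), len(block[0])
    # Squared differences between horizontally adjacent values (row frequency).
    row_sq = sum((block[i][j] - block[i][j - 1]) ** 2
                 for i in range(rows) for j in range(1, cols))
    # Squared differences between vertically adjacent values (column frequency).
    col_sq = sum((block[i][j] - block[i - 1][j]) ** 2
                 for i in range(1, rows) for j in range(cols))
    return math.sqrt((row_sq + col_sq) / (rows * cols))


def fuse_by_spatial_frequency(block_a, block_b):
    """Keep whichever block has the higher spatial frequency (assumed rule)."""
    if spatial_frequency(block_a) >= spatial_frequency(block_b):
        return block_a
    return block_b


if __name__ == "__main__":
    flat = [[1, 1], [1, 1]]
    sharp = [[0, 1], [1, 0]]
    print(spatial_frequency(flat), spatial_frequency(sharp))
```

A block with more local variation scores higher, so a selection rule of this kind tends to favor the in-focus source.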
Modern memory devices such as DRAM are prone to errors that occur because of unintended bit flips during their operation. Since memory errors severely impact in-memory key-value stores (KVSes), software mechanisms for hardening them against memory errors are being explored. However, it is hard to efficiently test the memory error handling code due to its characteristics: the code is event-driven, the handlers depend on the memory object, and in-memory KVSes manage various objects in a huge memory space. This paper presents MemFI, which supports runtime tests for the memory error handlers of in-memory KVSes. Our approach performs software fault injection of memory errors at the memory object level to trigger the target handler while smoothly carrying out tests on the same running state. To show the effectiveness of MemFI, we integrate error handling mechanisms into two real-world in-memory KVSes, memcached 1.6.9 and Redis 6.2.7, and check their behavior using the MemFI prototypes. The results show that the MemFI-based runtime test allows us to check the behavior of the error handling mechanisms. We also show its efficiency by comparing it to other fault injection approaches based on a trial model.
Yuan LI Tingting HU Ryuji FUCHIKAMI Takeshi IKENAGA
1 millisecond (1-ms) vision systems are gaining increasing attention in diverse fields like factory automation and robotics, as the ultra-low delay ensures seamless and timely responses. Superpixel segmentation is a pivotal preprocessing step that reduces the number of image primitives for subsequent processing. Recently, there has been a growing emphasis on leveraging deep network-based algorithms to pursue superior performance and better integration into other deep network tasks. Superpixel Sampling Network (SSN) employs a deep network for feature generation and differentiable SLIC for superpixel generation. SSN achieves high performance with a small number of parameters. However, implementing SSN on FPGAs for ultra-low delay faces challenges due to the final layer’s aggregation of intermediate results. To address this limitation, this paper proposes an aggregated-to-pipelined structure for FPGA implementation. The final layer is decomposed into individual final layers for each intermediate result. This architectural adjustment eliminates the need for memory to store intermediate results. Concurrently, the proposed structure leverages the decomposed layers to facilitate a pipelined structure with pixel streaming input to achieve ultra-low latency. To cooperate with the pipelined structure, a layer-partitioned memory architecture is proposed. Each final layer has dedicated memory for storing superpixel center information, allowing values to be read and calculated from memory without conflicts. Calculation results of each final layer are accumulated, and the result of each pixel is obtained as the stream reaches the last layer. Evaluation results demonstrate that boundary recall and under-segmentation error remain comparable to SSN, with an average label consistency improvement of 0.035 over SSN. From a hardware performance perspective, the proposed system processes 1000 FPS images with a delay of 0.947 ms/frame.
This article focuses on improving the BiSeNet v2 bilateral branch image segmentation network structure, enhancing its learning ability for spatial details and overall image segmentation accuracy. A modified network called “BiConvNet” is proposed. Firstly, to extract shallow spatial details more effectively, a parallel concatenated strip and dilated (PCSD) convolution module is proposed and used to extract local features and surrounding contextual features in the detail branch. Next, the semantic branch is reconstructed using the lightweight capability of depthwise separable convolution and the high performance of ConvNet, in order to enable more efficient learning of deep advanced semantic features. Finally, fine-tuning is performed on the bilateral guidance aggregation layer of BiSeNet v2, enabling better fusion of the feature maps output by the detail branch and semantic branch. The experimental part discusses the contribution of strip convolution and different sizes of dilated convolution to image segmentation accuracy, and compares them with common convolutions such as Conv2d convolution, CG convolution and CCA convolution. The experiments show that the PCSD convolution module proposed in this paper has the highest segmentation accuracy in all categories of the Cityscapes dataset compared with common convolutions. BiConvNet achieved a 9.39% accuracy improvement over the BiSeNet v2 network, with only a slight increase of 1.18M in model parameters. An mIoU accuracy of 68.75% was achieved on the validation set. Furthermore, in comparative experiments with image segmentation algorithms commonly used for autonomous driving in recent years, BiConvNet demonstrates strong competitive advantages in segmentation accuracy on the Cityscapes and BDD100K datasets.
Shohei MATSUHARA Kazuyuki SAITO Tomoyuki TAJIMA Aditya RAKHMADI Yoshiki WATANABE Nobuyoshi TAKESHITA
Renal Denervation (RDN) has been developed as a potential treatment for hypertension that is resistant to traditional antihypertensive medication. This technique involves the ablation of nerve fibers around the renal artery from inside the blood vessel, which is intended to suppress sympathetic nerve activity and result in an antihypertensive effect. Currently, clinical investigation is underway to evaluate the effectiveness of RDN in treating treatment-resistant hypertension. Although radio frequency (RF) ablation catheters are commonly used, their heating capacity is limited. Microwave catheters are being considered as another option for RDN. We aim to solve the technical challenges of applying microwave catheters to RDN. In this paper, we designed a catheter with a helix structure and a microwave (2.45 GHz) antenna. The antenna is a coaxial slot antenna, the dimensions of which were determined by optimizing the reflection coefficient through simulation. The measured catheter reflection coefficient is -23.6 dB using egg white and -32 dB in the renal artery. The prototype catheter was evaluated by in vitro experiments to validate the simulation. The procedure was performed successfully in in vivo experiments involving the ablation of porcine renal arteries. The pathological evaluation confirmed that a large area of the perivascular tissue was ablated (> 5 mm) in a single quadrant without significant damage to the renal artery. Our proposed device allows for control of the ablation position and produces deep nerve ablation without overheating the intima or surrounding blood, suggesting a highly capable new denervation catheter.
Kaiji OWAKI Yusuke KANDA Hideaki KIMURA
In recent years, the declining birthrate and aging population have become serious problems in Japan. To solve these problems, we have developed a system based on edge AI. This system predicts the future heart rate during walking in real time and provides feedback to improve the quality of exercise and extend healthy life expectancy. In this paper, we predicted the heart rate in real time using the proposed system and provided feedback. Experiments were conducted with and without the predicted heart rate, and the results were compared to demonstrate the effectiveness of the predicted heart rate.
Ground penetrating radar (GPR) has the advantage of non-destructively and quickly inspecting internal structures such as voids and buried pipes under roads. However, it is necessary to estimate the internal structures from the GPR images. Recently, recognition and detection methods for GPR images using deep learning have been studied. This paper examines a cutout-based data augmentation method for accurately estimating internal structures from GPR images with deep learning. We find that the cutout augmentation exhibits higher detection rates for all objects used in this study than a commonly used horizontal shift augmentation.
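Cutout augmentation, as generally understood, masks a randomly placed square patch of the training image. A minimal sketch follows. The patch size, the fill value, and the choice to keep the patch fully inside the image are assumptions, since the abstract gives no settings, and the function name `cutout` is hypothetical.

```python
import random


def cutout(image, size, fill=0, rng=random):
    """Return a copy of a 2-D image with one size x size square masked out."""
    rows, cols = len(image), len(image[0])
    # Pick the top-left corner so the patch stays fully inside the image
    # (an assumed variant; some versions let the patch cross the border).
    top = rng.randrange(rows - size + 1)
    left = rng.randrange(cols - size + 1)
    result = [row[:] for row in image]  # copy so the input is not modified
    for i in range(top, top + size):
        for j in range(left, left + size):
            result[i][j] = fill
    return result


if __name__ == "__main__":
    radar_patch = [[1] * 6 for _ in range(4)]
    for row in cutout(radar_patch, size=2, rng=random.Random(0)):
        print(row)
```

Passing a seeded `random.Random` instance makes the augmentation reproducible; calling the function repeatedly with an unseeded generator yields a different masked region each time.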
Akira KAWAHARA Jun SHIBAYAMA Kazuhiro FUJITA Junji YAMAUCHI Hisamatsu NAKANO
The numerical dispersion property of the finite-difference time-domain (FDTD) method based on the iterated Crank-Nicolson (ICN) scheme is investigated. The numerical dispersion relation is newly derived from the amplification matrix, and its property is discussed with attention to the eigenvalues of the matrix. It is shown that the ICN-FDTD method is conditionally stable but slightly dissipative.
Kensei ITAYA Ryosuke OZAKI Tsuneki YAMASAKI
In this paper, we propose a transient analysis technique for multilayered dispersive media that combines the fast inversion of Laplace transform (FILT) with the continued fraction expanded methods. Numerical results are presented for the reflection response, the inside-time response waveforms, and the electric field distributions of the reflection component. Further, we verify the calculation accuracy of the FILT method for the two types using a convergence test.
Seiya KISHIMOTO Ryoya OGINO Kenta ARASE Shinichiro OHNUKI
This paper introduces a computational approach for transient analysis of extensive scattering problems. This novel method is based on the combination of physical optics (PO) and the fast inverse Laplace transform (FILT). PO is a technique for analyzing electromagnetic scattering from large-scale objects. We modify PO for application in the complex frequency domain, where the scattered fields are evaluated. The complex frequency function is efficiently transformed into the time domain using FILT. The effectiveness of this combination is demonstrated through large-scale analysis and transient response for a short pulse incidence. The accuracy is investigated and validated by comparison with reference solutions.
Ryo KUMAGAI Ryosuke SUGA Tomoki UWANO
In this paper, a single-layer circular polarizer for a linearly polarized horn antenna is proposed. The multiple reflected waves between the aperture and array provide the desired phase differences between vertical and horizontal polarizations. The measured gain of the fabricated antenna is 14.4 dBic. The half-power beamwidths of the vertical polarization are 28 and 24 degrees, and those of the horizontal polarization are 31 and 23 degrees, in the vertical and horizontal planes, respectively. The polarizer has a low impact on the gain and beamwidth of the primary horn antenna: their changes are within 1.7 dB and 10 degrees. The 3 dB fractional bandwidth of the axial ratio is measured to be 1.4%.
The domain decomposition method (DDM) is widely utilized for analyzing large-scale electromagnetic problems. The method decomposes the target model into small independent subdomains. Electromagnetic analysis inherently suffers from slow convergence when solved with iterative algorithms such as Krylov subspace algorithms. The DDM remedies this issue by decomposing the total system into subdomain problems and gathering the local results as an interface problem that is adjusted to achieve the total solution. In this paper, we report the convergence properties of the DDM while varying the size and shape of the local domains for several mesh sizes. The experimental results show that the convergence speed depends on the number of interface problem variables and the selection of the local region shapes. In addition, the convergence property differs according to the target frequency. In general, it is demonstrated that convergence can be accelerated with large cubic subdomains. We propose subdomain selection strategies based on an analysis of the condition numbers of the governing equation.
Haonan CHEN Akito IGUCHI Yasuhide TSUJI
To analyze photonic devices whose waveguide structure varies slowly along the propagation direction, we develop a finite element beam propagation method (FE-BPM) with coordinate transformation. In this approach, a longitudinally varying waveguide is converted into an equivalent straight waveguide, so that cumbersome processes in FE-BPM, such as mesh updating and field interpolation at each propagation step, can be avoided. We employ this simulation technique in the shape optimization of photonic devices and show design examples of a mode converter. To show the validity of this approach, the calculated results for the designed devices are compared with those of the finite element method (FEM) or the standard FE-BPM.