Lihan TONG Weijia LI Qingxia YANG Liyuan CHEN Peng CHEN
We present Ksformer, a dehazing network that uses Multi-scale Key-select Routing Attention (MKRA) to intelligently select key areas through multi-channel, multi-scale windows with a top-k operator, and a Lightweight Frequency Processing Module (LFPM) to enhance high-frequency features. In tests, Ksformer outperforms other dehazing methods.
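The top-k selection that MKRA applies can be illustrated with a minimal single-head sketch in NumPy. It assumes flattened tokens and plain scaled dot-product scores. Ksformer's multi-scale windows, multi-channel routing, and LFPM are not modeled, and `topk_routing_attention` is a hypothetical helper, not the authors' implementation.

```python
import numpy as np

def topk_routing_attention(q, k, v, top_k):
    """Single-head attention in which each query attends only to its top_k keys.

    q: (n_q, d), k: (n_k, d), v: (n_k, d_v). Hypothetical helper, not Ksformer's code.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (n_q, n_k) scaled dot products
    # Indices of the top_k highest-scoring keys for every query (unordered).
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]
    masked = np.full_like(scores, -np.inf)
    np.put_along_axis(masked, idx, np.take_along_axis(scores, idx, axis=-1), axis=-1)
    # Softmax over the surviving keys only; all other keys get exactly zero weight.
    masked -= masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

Setting `top_k` equal to the number of keys recovers ordinary softmax attention, which makes the sparsification easy to check.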
Congcong FANG Yun JIN Guanlin CHEN Yunfan ZHANG Shidang LI Yong MA Yue XIE
Currently, an increasing number of speech emotion recognition tasks rely on the analysis of both speech and text features. However, little research has explored leveraging large language models such as GPT-3 to enhance emotion recognition. In this study, we use the GPT-3 model to extract semantic information from transcribed texts, generating 1536-dimensional text features. We then fuse these 1536-dimensional text features with 1188-dimensional acoustic features to obtain multimodal recognition results. The proposed method achieves a weighted accuracy of 79.62% across the four emotion categories of IEMOCAP, showing that integrating large language models considerably improves emotion recognition accuracy.
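Feature-level fusion of the two modalities can be sketched as standardizing each feature set and concatenating them into a 2724-dimensional vector (1536 + 1188). Concatenation and per-column z-scoring are assumptions, since the abstract does not specify the fusion operator, and the helper names are hypothetical.

```python
import numpy as np

TEXT_DIM, ACOUSTIC_DIM = 1536, 1188  # dimensionalities stated in the abstract

def zscore(x, eps=1e-8):
    # Standardize each feature column so neither modality dominates by scale.
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)

def fuse_features(text_feats, acoustic_feats):
    """Early (feature-level) fusion by concatenation: (n, 1536) + (n, 1188) -> (n, 2724)."""
    assert text_feats.shape[1] == TEXT_DIM and acoustic_feats.shape[1] == ACOUSTIC_DIM
    return np.concatenate([zscore(text_feats), zscore(acoustic_feats)], axis=1)

def weighted_accuracy(y_true, y_pred):
    # In IEMOCAP work, "weighted accuracy" usually means overall accuracy over all utterances.
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```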
Modern memory devices such as DRAM are prone to errors caused by unintended bit flips during operation. Since memory errors severely impact in-memory key-value stores (KVSes), software mechanisms for hardening them against memory errors are being explored. However, it is hard to test memory error handling code efficiently because of its characteristics: the code is event-driven, the handlers depend on the memory object, and in-memory KVSes manage various objects in a huge memory space. This paper presents MemFI, which supports runtime testing of the memory error handlers of in-memory KVSes. Our approach performs software fault injection of memory errors at the memory-object level to trigger the target handler while smoothly carrying out tests on the same running state. To show the effectiveness of MemFI, we integrate error handling mechanisms into two real-world in-memory KVSes, memcached 1.6.9 and Redis 6.2.7, and check their behavior using the MemFI prototypes. The results show that MemFI-based runtime testing allows us to check the behavior of the error handling mechanisms. We also show its efficiency by comparing it with other fault injection approaches based on a trial model.
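The idea of object-level software fault injection can be illustrated with a toy store in Python. This is a hedged sketch, not MemFI, memcached, or Redis code: `ToyKVS`, `inject_bit_flip`, and the CRC32-based detection are hypothetical stand-ins (a real hardened KVS may instead react to hardware error reports). The point is that a bit flip is placed inside one chosen object, its handler fires, and the same running store keeps serving other objects.

```python
import zlib

class ToyKVS:
    """Hypothetical in-memory KVS whose values carry a CRC32 so corruption can be detected."""

    def __init__(self):
        self.store = {}    # key -> bytearray holding the value object
        self.crc = {}      # key -> CRC32 recorded at write time
        self.handled = []  # keys for which the error handler ran

    def set(self, key, value: bytes):
        self.store[key] = bytearray(value)
        self.crc[key] = zlib.crc32(value)

    def on_memory_error(self, key):
        # Error handler under test: here it simply drops the corrupted object.
        self.handled.append(key)
        del self.store[key]
        del self.crc[key]

    def get(self, key):
        if key not in self.store:
            return None
        value = bytes(self.store[key])
        if zlib.crc32(value) != self.crc[key]:
            self.on_memory_error(key)  # detection triggers the handler
            return None
        return value

def inject_bit_flip(kvs, key, byte_offset, bit):
    """Object-level software fault injection: flip one bit inside a chosen object."""
    kvs.store[key][byte_offset] ^= 1 << bit
```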
This article improves the bilateral-branch image segmentation network BiSeNet v2, enhancing its ability to learn spatial details and its overall segmentation accuracy. A modified network called “BiConvNet” is proposed. First, to extract shallow spatial details more effectively, a parallel concatenated strip and dilated (PCSD) convolution module is proposed and used in the detail branch to extract local features and surrounding contextual features. Next, the semantic branch is reconstructed using the lightweight design of depthwise separable convolution and the high performance of ConvNet, so that deep, high-level semantic features are learned more efficiently. Finally, the bilateral guided aggregation layer of BiSeNet v2 is fine-tuned to better fuse the feature maps output by the detail and semantic branches. The experiments examine the contribution of strip convolution and dilated convolutions of different sizes to segmentation accuracy, and compare them with common convolutions such as Conv2d, CG, and CCA convolution. The PCSD convolution module achieves the highest segmentation accuracy among these convolutions in all categories of the Cityscapes dataset. BiConvNet improves accuracy by 9.39% over BiSeNet v2 with only a slight increase of 1.18M model parameters, achieving an mIoU of 68.75% on the validation set. Furthermore, comparative experiments with image segmentation algorithms commonly used in autonomous driving in recent years show that BiConvNet is highly competitive in segmentation accuracy on the Cityscapes and BDD100K datasets.
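Parameter arithmetic shows why depthwise separable convolution keeps the rebuilt semantic branch lightweight, and how strip and dilated kernels widen context cheaply. The channel counts below are illustrative assumptions, not BiConvNet's actual layer sizes.

```python
def conv_params(k_h, k_w, c_in, c_out, bias=False):
    """Weights of a standard k_h x k_w convolution."""
    return k_h * k_w * c_in * c_out + (c_out if bias else 0)

def depthwise_separable_params(k, c_in, c_out, bias=False):
    """Depthwise k x k conv (one filter per input channel) followed by a 1x1 pointwise conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise + ((c_in + c_out) if bias else 0)

def effective_kernel(k, dilation):
    """Span covered by a k-tap dilated kernel: k + (k - 1)(dilation - 1)."""
    return k + (k - 1) * (dilation - 1)
```

For a 3x3 layer mapping 64 to 128 channels, the separable form needs 8,768 weights instead of 73,728, roughly 8.4x fewer.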
Fan LIU Zhewang MA Masataka OHIRA Dongchun QIAO Guosheng PU Masaru ICHIKAWA
In this paper, a precise design method for high-order bandpass filters (BPFs) with complicated coupling topologies is proposed and demonstrated through the design of an 11-pole BPF using TM010 mode dielectric resonators (DRs). A novel Z-shaped coupling structure is proposed, which avoids mixing TM010 and TM01δ modes and makes tuning and assembling the filter much easier. The coupling topology of the BPF includes three cascaded triplets (CTs) of DRs, and both the capacitive and inductive couplings in the CTs are designed to be independently tunable, consequently producing three controllable transmission zeros on both sides of the filter passband. A procedure for mapping the coupling matrix of the BPF to its physical dimensions is developed, and these physical dimensions are iteratively optimized to achieve the best performance. The excellent agreement between the electromagnetically simulated response of the filter and the desired target specifications shows that the design of the 11-pole BPF is highly precise.
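How a coupling matrix determines the filter response, including the transmission zero produced by one cascaded triplet, can be sketched with the standard N+2 coupling-matrix analysis. The sign convention (A = λW - jR + M) and the coupling values are assumptions chosen for illustration; they are not the 11-pole design from the paper, and the helper names are hypothetical.

```python
import numpy as np

def lowpass_freq(f, f0, bw):
    """Map a bandpass frequency f to the normalized lowpass variable lambda."""
    return (f0 / bw) * (f / f0 - f0 / f)

def s_params(M, lam):
    """|S11|, |S21| of an (N+2)x(N+2) normalized coupling matrix at lowpass frequency lam.

    Uses A = lam*W - j*R + M with R = diag(1, 0, ..., 0, 1) and W = diag(0, 1, ..., 1, 0);
    S21 = -2j [A^-1]_{N+1,0}, S11 = 1 + 2j [A^-1]_{0,0}.
    """
    n = M.shape[0]
    R = np.zeros((n, n))
    R[0, 0] = R[-1, -1] = 1.0
    W = np.eye(n)
    W[0, 0] = W[-1, -1] = 0.0
    A_inv = np.linalg.inv(lam * W - 1j * R + M)
    return abs(1 + 2j * A_inv[0, 0]), abs(-2j * A_inv[-1, 0])

def triplet_matrix(ms1, m12, m23, m13, m3l):
    """Source-1-2-3-load with a 1-3 cross coupling (one cascaded triplet)."""
    M = np.zeros((5, 5))
    couplings = {(0, 1): ms1, (1, 2): m12, (2, 3): m23, (1, 3): m13, (3, 4): m3l}
    for (i, j), m in couplings.items():
        M[i, j] = M[j, i] = m  # coupling matrices are symmetric
    return M
```

With zero self-couplings, the triplet's transmission zero sits at λ = M12·M23/M13; flipping the sign of M13 (typically the difference between capacitive and inductive cross coupling) moves it to the other side of the passband.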
Yi CHENG Kexin LI Chunbo XIU Jiaxin LIU
In modern radar systems, the Generalized compound distribution model is well suited to describing the amplitude distribution of radar sea clutter. Accurately and efficiently simulating sea clutter is of great practical significance for radar signal processing and sea-surface target detection. However, in the traditional zero memory nonlinearity (ZMNL) method, the correlated Generalized compound distribution model cannot handle non-integer or non-half-integer parameters. To overcome this shortcoming, a new method of generating correlated Generalized compound distributed clutter is proposed, which changes how the Generalized Gamma distributed random sequences in traditional Generalized compound distribution models are generated. First, using the additivity of the Gamma distribution, the probability density function (PDF) of the Gamma distribution is transformed into a second-order nonlinear ordinary differential equation, from which a Gamma distributed sequence with an arbitrary parameter is solved. Then, the Generalized Gamma distributed sequence with an arbitrary parameter is obtained through the nonlinear transformation between the Generalized Gamma and Gamma distributions, extending the shape parameters of the Generalized compound distributed sea clutter to general real numbers. Simulation results show that the proposed method not only handles clutter simulation with non-integer or non-half-integer shape parameters but also further improves the goodness of fit.
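The nonlinear transformation from a Gamma to a Generalized Gamma variate can be sketched as X = b·G^(1/c) with G ~ Gamma(ν, 1). This sketch swaps in NumPy's Gamma sampler, which already accepts any positive real shape, instead of the paper's ODE-based generator, and it produces uncorrelated samples (the ZMNL correlation step is not shown). The parameter names `nu`, `c`, and `b` are assumptions.

```python
import numpy as np

def generalized_gamma(nu, c, b, size, rng):
    """Samples X = b * G**(1/c) with G ~ Gamma(shape=nu, scale=1).

    X then follows the generalized Gamma law with pdf
    c / (b * Gamma(nu)) * (x/b)**(c*nu - 1) * exp(-(x/b)**c); nu may be any positive real.
    """
    g = rng.gamma(shape=nu, scale=1.0, size=size)
    return b * g ** (1.0 / c)
```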
Anoop A Christo K. THOMAS Kala S
In this paper, a novel Enhanced Spatial Modulation-based Orthogonal Time Frequency Space (ESM-OTFS) scheme is proposed to maximize the benefits of enhanced spatial modulation (ESM) and orthogonal time frequency space (OTFS) transmission. The primary objective of this modulation is to enhance transmission reliability while meeting the demanding requirements of high transmission rates and rapid data transfer in future wireless communication systems. The paper first outlines the system model and the specific signal processing techniques employed in ESM-OTFS. A novel detector based on sparse signal estimation is then presented specifically for ESM-OTFS. The sparse signal estimation uses a fully factorized posterior approximation obtained by Variational Bayesian Inference, which leads to a low-complexity solution without any matrix inversions. Simulation results indicate that ESM-OTFS outperforms traditional spatial modulation-based OTFS, and that the new detection algorithm outperforms other linear detection methods.
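The OTFS part of the transceiver can be sketched as an ISFFT followed by a Heisenberg transform with rectangular pulses, with the inverse chain at the receiver. This is a minimal ideal-channel sketch under assumed grid sizes; the ESM antenna-activation mapping and the Variational Bayesian detector are not modeled.

```python
import numpy as np

def otfs_modulate(x_dd):
    """x_dd: (M, N) delay-Doppler grid (rows: M delay bins, columns: N Doppler bins)."""
    # ISFFT: DFT along delay (axis 0), IDFT along Doppler (axis 1) -> time-frequency grid.
    x_tf = np.fft.ifft(np.fft.fft(x_dd, axis=0, norm="ortho"), axis=1, norm="ortho")
    # Heisenberg transform with a rectangular pulse: M-point IDFT per time slot (column).
    s = np.fft.ifft(x_tf, axis=0, norm="ortho")
    return s.flatten(order="F")  # concatenate the N time slots

def otfs_demodulate(r, M, N):
    r_mat = r.reshape((M, N), order="F")
    y_tf = np.fft.fft(r_mat, axis=0, norm="ortho")  # Wigner transform (rectangular pulse)
    # SFFT: IDFT along delay, DFT along Doppler.
    return np.fft.fft(np.fft.ifft(y_tf, axis=0, norm="ortho"), axis=1, norm="ortho")
```

With rectangular pulses the delay-axis transforms cancel, so the whole modulator reduces to an N-point IDFT along the Doppler axis.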
Akira SAITOU Kaito UCHIDA Kanki KITAYAMA Ryo ISHIKAWA Kazuhiko HONJO
An analytical expression of the transmission for orbital angular momentum (OAM) communication using loop antenna arrays and paraboloids is derived to achieve a communication distance of 100 m. From the field distribution of a single “transformed OAM mode” radiated by a loop antenna, the field collimated by the transmitting paraboloid and its diffracted field are analytically derived. The effects of frequency, paraboloid size, and shifts of the transmitting and receiving arrays from the focal planes are included. Using the diffracted field distribution on the focal plane of the receiving paraboloid, the transmission between the transmitting and receiving loop antennas is analytically estimated. It is shown that the transmission between antennas with different OAM modes is null, whereas the transmission between antennas with the same mode can be reduced. To clarify the mechanism of this reduction, the reduction factors are quantitatively defined and their explicit formulae are derived. Based on the analytical results, numerical estimation is demonstrated for a communication distance of 100 m, with a frequency of 150 GHz, a focal length of 50 cm, and a paraboloid size of 100 cm. When both arrays are located on their respective focal planes, the signal transmission exceeds -7.78 dB for all eight OAM modes, and it is lowest for the highest-order mode. The transmission loss is shown to be mitigated by optimizing the shifts of the transmitting and receiving arrays from their focal planes. Exploiting the tradeoff in improvement among mode orders makes the loss almost uniform. By optimizing the shifts of the arrays, the transmission is improved by 5.98 dB, to more than -1.80 dB.
Tomoya MATSUDA Koji NISHIMURA Hiroyuki HASHIGUCHI
Phased-array technology is primarily employed in atmospheric and wind profiling radars for meteorological remote sensing. As a new direction in phased-array technology, the Multiple-Input Multiple-Output (MIMO) technique, originally developed for communication systems, has been applied to radar systems. A MIMO radar system can create a virtual receive antenna aperture by exploiting transmit degrees of freedom. The MIMO technique requires orthogonal waveforms on each transmitter so that multiple receivers can identify the transmitted signals, and various methods have been developed to achieve this orthogonality. In this study, we focus on the Doppler Division Multiple Access (DDMA) MIMO technique, which uses slightly different frequencies for the transmit waveforms so that they can be separated at the receivers in the Doppler frequency domain. The Middle and Upper atmosphere (MU) radar is a VHF-band phased-array atmospheric radar with multi-channel receivers; to operate it as a MIMO radar, multi-channel transmitters must additionally be installed. In this study, the brightness distribution obtained by beamforming echoes reflected from the moon agreed closely with the calculated antenna pattern, which indicates that the MU radar functions effectively as a MIMO radar. Furthermore, it is demonstrated that applying the MIMO and Capon techniques simultaneously has a mutually enhancing effect.
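DDMA separation can be sketched in slow time: transmitter k applies a pulse-to-pulse phase ramp that shifts its echo by k·PRF/N_tx, and an FFT over pulses at one receiver splits the echoes into Doppler sub-bands. The function names, pulse count, and the assumption that the target Doppler lies within the first sub-band are illustrative, not the MU radar's actual configuration.

```python
import numpy as np

def ddma_phase_codes(n_tx, n_pulses):
    """Slow-time phase ramp per transmitter: tx k is offset by k * PRF / n_tx in Doppler."""
    p = np.arange(n_pulses)
    return np.exp(2j * np.pi * np.outer(np.arange(n_tx), p) / n_tx)  # (n_tx, n_pulses)

def ddma_separate(rx_slow_time, n_tx):
    """Split one receiver's slow-time samples into per-transmitter Doppler sub-bands.

    Assumes n_pulses is a multiple of n_tx and the echo Doppler lies within the first
    PRF / n_tx of the spectrum, so each transmitter occupies its own sub-band.
    """
    n_pulses = rx_slow_time.size
    spectrum = np.fft.fft(rx_slow_time) / n_pulses
    return spectrum.reshape(n_tx, n_pulses // n_tx)  # row k = sub-band of transmitter k
```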
Jiaxin WU Bing LI Li ZHAO Xinzhou XU
Speech Emotion Detection (SED) aims to judge whether the emotion a speaker expresses belongs to the positive or the negative class. SED performance depends heavily on the diversity and prominence of the emotional features extracted from speech. However, most existing research focuses on the effects of a single feature source and of hand-crafted features. We therefore propose an SED approach using multi-source low-level information based recurrent branches (MSIR). Fusing multi-source low-level information yields varied and discriminative representations of speech emotion signals. In addition, a focal-loss function addresses class imbalance by reducing the contribution of well-classified samples and increasing the weights of difficult samples in SED tasks. Experiments on the IEMOCAP corpus demonstrate the effectiveness of the proposed method: compared with the baselines, MSIR achieves significant improvements in Unweighted Average Recall and F1-score.
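The focal-loss weighting can be sketched in its common binary form FL = -α_t(1 - p_t)^γ log(p_t). The values of γ and α, and whether the paper uses exactly this form, are assumptions.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=None, eps=1e-12):
    """Binary focal loss, FL = -alpha_t * (1 - p_t)**gamma * log(p_t), averaged over samples.

    p: predicted probability of the positive class, y: labels in {0, 1}.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    y = np.asarray(y)
    p_t = np.where(y == 1, p, 1 - p)               # probability assigned to the true class
    w = (1 - p_t) ** gamma                         # down-weights easy, well-classified samples
    if alpha is not None:
        w = w * np.where(y == 1, alpha, 1 - alpha)  # optional class-balance weight
    return float(np.mean(-w * np.log(p_t)))
```

With γ = 0 and no α the loss reduces to binary cross-entropy; with γ = 2 a sample predicted at p_t = 0.9 keeps only 1% of its cross-entropy, while one at p_t = 0.4 keeps 36%.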
Daisuke ISHII Takanori HARA Kenichi HIGUCHI
In this paper, we investigate a method for clustering user equipment (UE)-specific transmission access points (APs) in downlink cell-free multiple-input multiple-output (MIMO), assuming that the APs distributed over the system coverage know only part of the instantaneous channel state information (CSI). As a beamforming (BF) method based on partial CSI, we use a layered partially non-orthogonal zero-forcing (ZF) method based on channel matrix muting, which is applicable when different transmitting AP groups are selected for each UE under partial CSI conditions. We propose two AP clustering methods. Both first tentatively determine the transmitting APs independently for each UE and then iteratively update the transmitting APs for each UE based on the estimated throughput, considering the interference among UEs. One of the two methods introduces a UE cluster for each UE into the iterative updates of the transmitting APs to balance throughput performance and scalability. Computer simulations show that the proposed methods achieve higher geometric-mean and worst-user throughput than the conventional methods.
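For reference, plain zero-forcing beamforming with full CSI and the geometric-mean throughput metric used in the evaluation can be sketched as follows. This is an assumption-laden baseline: it does not implement the paper's layered partially non-orthogonal ZF with channel matrix muting, nor its AP clustering.

```python
import numpy as np

def zf_precoder(H):
    """Zero-forcing precoder W = H^H (H H^H)^{-1} for H of shape (n_ue, n_tx_antennas)."""
    return H.conj().T @ np.linalg.inv(H @ H.conj().T)

def geometric_mean(throughputs):
    # Geometric mean rewards fairness: one starved user drags the whole metric down.
    t = np.asarray(throughputs, dtype=float)
    return float(np.exp(np.mean(np.log(t))))

def worst_user(throughputs):
    return float(np.min(throughputs))
```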
Jun SAITO Nobuhide NONAKA Kenichi HIGUCHI
We propose a novel peak-to-average power ratio (PAPR) reduction method for massive multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) signals. It uses a peak cancellation (PC) signal vector that accounts for the variance in average signal power among transmitter antennas and exploits the null space of the MIMO channel. First, we discuss the conditions under which the PC signal vector achieves a sufficient PAPR reduction effect after its projection onto the null space of the MIMO channel. The discussion reveals that the magnitude of the correlation between the PC signal vector before projection and the transmission signal vector should be as low as possible. Based on this observation, and on the fact that suppressing the variation in transmission signal power among antennas, which beamforming (BF) may enhance, helps reduce the PAPR, we propose a novel method for generating a PC signal vector. The proposed PC signal vector is designed so that the signal power levels of all transmitter antennas are kept between the maximum and minimum power threshold levels at the target timing. The newly introduced feature of the proposed method, i.e., raising the signal power above the minimum power threshold, suppresses the transmission signal power variance among antennas and improves the PAPR reduction capability after the PC signal is projected onto the null space of the MIMO channel. This is because the proposed method decreases the magnitude of the correlation between the PC signal vectors before projection and the transmission signal vectors. Computer simulation results show that the proposed method achieves better PAPR reduction performance than the conventional method and requires lower computational complexity than the conventional method to achieve the same target PAPR.
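The PAPR metric and the null-space projection can be sketched in NumPy. Projecting a PC signal vector with P = I - H^H(HH^H)^{-1}H leaves the signals received by the users unchanged, because HP = 0. The threshold-based generation of the PC signal itself is not shown, and the matrix sizes are arbitrary assumptions.

```python
import numpy as np

def papr_db(x):
    """Peak-to-average power ratio of one antenna's time-domain signal, in dB."""
    power = np.abs(x) ** 2
    return 10 * np.log10(power.max() / power.mean())

def null_space_projector(H):
    """P = I - H^H (H H^H)^{-1} H for H of shape (n_users, n_tx), so H @ (P @ c) = 0."""
    n_tx = H.shape[1]
    return np.eye(n_tx) - H.conj().T @ np.linalg.solve(H @ H.conj().T, H)
```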
Takuya SAKAMOTO Itsuki IWATA Toshiki MINAMI Takuya MATSUMOTO
There has been a growing interest in the application of radar technology to the monitoring of humans and animals and their positions, motions, activities, and vital signs. Radar can be used, for example, to remotely measure vital signs such as respiration and heartbeat without contact. Radar-based human sensing is expected to be adopted in a variety of fields, such as medicine, healthcare, and entertainment, but what can be realized by radar-based animal sensing? This paper reviews the latest research trends in the noncontact sensing of animals using radar systems. We also present examples of our past radar experiments for the respiratory measurement of monkeys and the heartbeat measurement of chimpanzees. The trends in this field are reviewed in terms of the target animal species, type of vital sign, and radar type and selection of frequencies.
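The principle behind noncontact respiration measurement can be sketched with a simulated continuous-wave echo: a displacement d changes the echo phase by 4πd/λ, so unwrapping the phase recovers the motion, and its spectral peak gives the breathing rate. The helper names, the 24 GHz carrier, the sampling rate, and the breathing parameters are assumptions, not values from the reviewed experiments.

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def displacement_from_iq(iq, carrier_hz):
    """Recover target displacement from the phase of the complex (I/Q) echo: phi = 4*pi*d/lambda."""
    wavelength = C / carrier_hz
    phase = np.unwrap(np.angle(iq))
    return phase * wavelength / (4 * np.pi)

def dominant_rate_hz(signal, fs):
    """Frequency of the strongest spectral peak, excluding DC."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(signal.size, d=1 / fs)
    return freqs[1:][np.argmax(spectrum[1:])]
```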
Koji YAMANAKA Kazuhiro IYOMASA Takumi SUGITANI Eigo KUWATA Shintaro SHINJO
This paper reviews GaN solid-state power amplifiers (SSPAs) for wireless power transfer and microwave heating. For wireless power transfer, an output power of 9 W with 79% power-added efficiency at 5.8 GHz has been achieved. For microwave heating, an output power of 450 W with 70% drain efficiency at 2.45 GHz has been achieved. Microwave power concentration and uniform microwave heating by phase control of multiple SSPAs are demonstrated.
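The two efficiency figures above use different definitions, and phase control of several SSPAs can concentrate power coherently at a point. The sketch below assumes phase-conjugate steering (each SSPA cancels its own propagation phase), which is one simple way to realize power concentration; the abstract does not state the control law, and the PAE values in the check are illustrative.

```python
import numpy as np

def drain_efficiency(p_out_w, p_dc_w):
    return p_out_w / p_dc_w

def power_added_efficiency(p_out_w, p_in_w, p_dc_w):
    # PAE subtracts the RF drive power, so it is always below the drain efficiency.
    return (p_out_w - p_in_w) / p_dc_w

def focused_power(amplitudes, path_phases, tx_phases):
    """Power of the coherent sum at one point: |sum a_k exp(j(tx_k + path_k))|^2."""
    return abs(np.sum(amplitudes * np.exp(1j * (tx_phases + path_phases)))) ** 2
```

For example, 450 W at 70% drain efficiency implies about 643 W of dc supply power.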
Katsumi KAWAI Naoki SHINOHARA Tomohiko MITANI
This study introduces a novel single-diode rectenna that enhances the rf-dc conversion efficiency using harmonic control of the antenna impedance. We employ source-pull simulations covering the fundamental frequency and its harmonics to achieve a highly efficient rectenna. The source-pull simulation results delineate the source-impedance ranges required for enhanced efficiency at each harmonic. Based on these results, we designed two inverted-F antennas, one with input impedances inside and one with input impedances outside the identified source-impedance ranges. Experimental results show that the proposed rectenna achieves a maximum rf-dc conversion efficiency of 75.9% at a fundamental frequency of 920 MHz, an input power of 10.8 dBm, and a load resistance of 1 kΩ, which is higher than that of the comparative rectenna without harmonic control of the antenna impedance. This study demonstrates that the proposed rectenna achieves high efficiency by directly connecting the antenna to the single diode and controlling the antenna impedance at the harmonics.
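The rf-dc conversion efficiency can be computed from the measured dc voltage across the resistive load. The helper names are hypothetical; the numbers in the check are the operating point reported above.

```python
def dbm_to_watts(p_dbm):
    return 1e-3 * 10 ** (p_dbm / 10)

def rf_dc_efficiency(v_out, r_load, p_in_dbm):
    """eta = P_dc / P_in with P_dc = V_out^2 / R_L across a resistive load."""
    return (v_out ** 2 / r_load) / dbm_to_watts(p_in_dbm)
```

At 10.8 dBm (about 12.0 mW) input, 75.9% efficiency corresponds to roughly 9.1 mW of dc power, or about 3.0 V across 1 kΩ.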
Ting DING Jiandong ZHU Jing YANG Xingmeng JIANG Chengcheng LIU
Considering the non-convexity of hybrid precoding and the hardware constraints of practical systems, we investigate a hybrid precoding architecture that combines limited-resolution overlapped phase shifter networks with a lens array. The analog part is a beam selection network composed of overlapped low-resolution phase shifter networks. In the proposed hybrid precoding algorithm, the analog precoding improves the array gain by using the quantization beam alignment method, whereas the digital precoding achieves multiplexing gain by adopting a Wiener filter precoding scheme based on the minimum mean square error criterion. Finally, in a sparse-scattering millimeter-wave channel with a uniform linear array, the proposed method is compared with an existing scheme through computer simulations using both ideal and non-ideal channel state information. The results show that the proposed scheme performs better in the low signal-to-noise ratio region and achieves a good compromise between system performance and hardware complexity.
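Two ingredients of the hybrid precoder can be sketched: rounding analog phases to the states of a b-bit phase shifter, and a Wiener-filter (MMSE) digital precoder. The Wiener form F = β(H^H H + ξI)^{-1}H^H with ξ = Kσ²/P follows the standard formulation, but applying it to an assumed effective channel after beam selection is illustrative; the lens array and overlapped phase-shifter network are not modeled.

```python
import numpy as np

def quantize_phase(phi, bits):
    """Round phases to the nearest of 2**bits uniformly spaced phase-shifter states."""
    step = 2 * np.pi / 2 ** bits
    return np.mod(np.round(phi / step) * step, 2 * np.pi)

def wiener_precoder(H, noise_var, total_power):
    """Wiener-filter (MMSE) precoder with xi = K * noise_var / total_power.

    H: (K users, N_rf) effective channel after analog beam selection. Uses the
    push-through form H^H (H H^H + xi I)^{-1}, equal to (H^H H + xi I)^{-1} H^H;
    beta scales F so that ||F||_F^2 = total_power.
    """
    K = H.shape[0]
    xi = K * noise_var / total_power
    F = H.conj().T @ np.linalg.inv(H @ H.conj().T + xi * np.eye(K))
    return F * np.sqrt(total_power) / np.linalg.norm(F)
```

As the noise variance shrinks, the Wiener precoder approaches zero forcing; as it grows, it tends toward a matched filter.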
Ayumu YAMADA Zhiyuan HUANG Naoko MISAWA Chihiro MATSUI Ken TAKEUCHI
In this work, fluctuation patterns of ReRAM current are classified automatically by the proposed fluctuation pattern classifier (FPC). FPC is trained on an artificially created dataset to overcome the difficulties of using measured current signals, including annotation cost and imbalanced data amounts. Using FPC, the occurrence of fluctuations under different write conditions is analyzed for both high-resistance-state (HRS) and low-resistance-state (LRS) currents. Based on the measurement and classification results, physical models of the fluctuations are established.
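Training on artificial data can be illustrated with a hypothetical generator of labeled read-current traces. The classes (stable, two-level random telegraph noise, drift), amplitudes, and switching probability are assumptions rather than the paper's dataset; the point is that synthetic traces come with free, balanced labels.

```python
import numpy as np

def synthetic_trace(kind, n=1000, rng=None, i0=1e-6, noise=0.01):
    """Hypothetical generator of one labeled read-current trace.

    kind: "stable" (white noise only), "rtn" (two-level random telegraph noise),
    or "drift" (gradual change). Class names and magnitudes are illustrative only.
    """
    if rng is None:
        rng = np.random.default_rng()
    base = i0 * (1 + noise * rng.standard_normal(n))
    if kind == "stable":
        return base
    if kind == "rtn":
        # Each sample switches state with probability 0.02: a discrete-time telegraph process.
        state = np.cumsum(rng.random(n) < 0.02) % 2
        return base + 0.1 * i0 * state
    if kind == "drift":
        return base + 0.1 * i0 * np.linspace(0, 1, n)
    raise ValueError(f"unknown kind: {kind}")

def make_dataset(n_per_class, rng):
    kinds = ["stable", "rtn", "drift"]
    X = np.stack([synthetic_trace(k, rng=rng) for k in kinds for _ in range(n_per_class)])
    y = np.repeat(np.arange(len(kinds)), n_per_class)  # balanced labels, no annotation cost
    return X, y
```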
Yuya ICHIKAWA Ayumu YAMADA Naoko MISAWA Chihiro MATSUI Ken TAKEUCHI
Integrating RGB and event sensors improves object detection accuracy, especially at night, owing to the high dynamic range of event cameras. However, introducing an event sensor increases the required computational resources, which makes it difficult to implement RGB-event fusion multi-modal AI in CiM. To tackle this issue, this paper proposes RGB-Event fusion Multi-modal analog Computation-in-Memory (CiM), called REM-CiM, for multi-modal edge object detection AI. REM-CiM co-designs two proposals, one for the multi-modal AI algorithm and one for the circuit implementation. First, Memory capacity-Efficient Attentional Feature Pyramid Network (MEA-FPN), a model architecture for RGB-event fusion analog CiM, is proposed for parameter-efficient RGB-event fusion. Convolution-less bi-directional calibration (C-BDC) in MEA-FPN extracts important features of each modality with attention modules while reducing the number of weight parameters by removing the large convolutional operations of conventional BDC. The proposed MEA-FPN with C-BDC achieves a 76% reduction in parameters while keeping the mean Average Precision (mAP) degradation below 2.3% during both day and night, compared with Attentional FPN fusion (A-FPN), a conventional BDC-based FPN fusion. Second, low-bit quantization with clipping (LQC) is proposed to reduce area and energy. The proposed REM-CiM with MEA-FPN and LQC uses almost the same number of memory cells as conventional FPN fusion CiM without LQC while achieving 21% less ADC area, 24% less ADC energy, and 0.17% higher mAP.
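Low-bit quantization with clipping can be sketched as a symmetric uniform quantizer applied after clipping to [-clip, clip]. The bit width, the clipping value, and where LQC is applied inside REM-CiM are assumptions; the helper name is hypothetical.

```python
import numpy as np

def quantize_with_clipping(x, bits, clip):
    """Uniform symmetric quantization after clipping to [-clip, clip].

    Clipping trades a saturation error on rare large values for a finer step on the
    common small ones; fewer bits mean fewer levels for the ADC to resolve.
    """
    levels = 2 ** (bits - 1) - 1  # e.g. 4 bits -> integer codes -7..7
    step = clip / levels
    codes = np.round(np.clip(x, -clip, clip) / step)
    return codes * step, codes.astype(int)
```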
Fuyuki KIHARA Chihiro MATSUI Ken TAKEUCHI
In this work, we propose a 1T1R ReRAM CiM architecture for Hyperdimensional Computing (HDC). The number of source lines and bit lines is reduced by introducing memory cells connected in series, which is especially advantageous in a 3D implementation. The results of the CiM operations contain errors, but HDC is robust against them: even with a 25% error in the XNOR operation, the inference accuracy remains above 90%.
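HDC's tolerance of XNOR errors can be illustrated with a toy simulation: binary class hypervectors, similarity by XNOR plus popcount, and each XNOR output bit flipped independently with 25% probability. The dimensionality, number of classes, query noise, and independent-error model are assumptions, not the measured behavior of the 1T1R array.

```python
import numpy as np

def noisy_xnor_similarity(query, class_hvs, error_rate, rng):
    """Hamming similarity via XNOR + popcount, with each XNOR output bit flipped
    independently with probability error_rate (a crude model of CiM errors)."""
    xnor = (class_hvs == query).astype(np.uint8)  # (n_classes, D) XNOR results
    flips = rng.random(xnor.shape) < error_rate
    return np.sum(xnor ^ flips, axis=1)           # popcount per class

def classify(query, class_hvs, error_rate, rng):
    return int(np.argmax(noisy_xnor_similarity(query, class_hvs, error_rate, rng)))
```

With 20% query noise, the correct class matches on about 80% of bits and a random class on about 50%; after 25% XNOR errors these become about 65% and 50%, and with D = 10,000 that gap is still dozens of standard deviations wide.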
Jiakai LI Jianyong DUAN Hao WANG Li HE Qing ZHANG
Chinese spelling correction is a foundational task in natural language processing that aims to detect and correct spelling errors in text. Most Chinese spelling correction methods use multimodal information to model the relationship between incorrect and correct characters. However, because the features come from different sources, their information is mismatched during fusion, so the relative importance of the different modalities is ignored, which in turn prevents the model from learning efficiently. To this end, this paper proposes a multimodal language model-based Chinese spelling corrector named MISpeller. Built on ChineseBERT as the base model, the method comprehensively captures and fuses character semantic, phonetic, and graphic information in a single model without constructing additional neural networks, and realizes unequal fusion of multi-feature information. In addition, to address overcorrection, a replication mechanism is introduced, and the replication factor is used as a dynamic weight to efficiently fuse the multimodal information. The model can control the proportion of original and predicted characters according to the input text and can learn more specifically where errors occur. Experiments on the SIGHAN benchmark show that the proposed model achieves state-of-the-art performance, improving the correction-level F1 score by an average of 4.36%, which validates its effectiveness.
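The replication mechanism can be sketched as a per-position convex combination of the model's predicted distribution and a one-hot distribution on the original character, weighted by the replication factor. This form and the helper `copy_mix` are assumptions; how MISpeller computes the factor from the fused multimodal features is not shown.

```python
import numpy as np

def copy_mix(pred_probs, input_ids, copy_weight):
    """Blend the model's vocabulary distribution with a 'keep the input character' distribution.

    pred_probs: (seq_len, vocab) softmax outputs; input_ids: (seq_len,) original characters;
    copy_weight: (seq_len,) per-position weights in [0, 1] (hypothetical replication factor).
    """
    seq_len, _ = pred_probs.shape
    copy_dist = np.zeros_like(pred_probs)
    copy_dist[np.arange(seq_len), input_ids] = 1.0  # one-hot on the original character
    w = np.asarray(copy_weight)[:, None]
    return w * copy_dist + (1 - w) * pred_probs
```

A weight near 1 keeps the original character, which is how such a mechanism can curb overcorrection; a weight near 0 defers to the model's prediction.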