Min GAO Gaohua CHEN Jiaxin GU Chunmei ZHANG
Wearing a mask correctly is an effective way to prevent respiratory infectious diseases. However, in some complex settings the accuracy of mask-wearing detection still needs to be improved. This research enhances a mask-wearing detection technique based on YOLOv7-tiny. Distribution Shifting Convolutions (DSConv) replace the 3×3 convolutions of the original model to simplify computation and increase detection precision. To decrease the coordinate regression loss and enhance detection performance, we adopt the Intersection over Union with Minimum Points Distance (MPDIoU) loss function instead of the Complete Intersection over Union (CIoU) loss used in the original model. The GSConv and VoVGSCSP modules are introduced into the model to keep it lightweight and portable. A P6 detection layer is designed to increase detection precision for tiny targets in challenging environments and to decrease the missed and false positive detection rates. The robustness of the model is further increased by creating and annotating a mask-wearing dataset covering multiple environments, with Mixup and Mosaic used for data augmentation. The effectiveness of the model is validated through comparison and ablation experiments on the mask dataset. The results demonstrate that, compared to YOLOv7-tiny, the enhanced detection algorithm improves precision by 5.4%, recall by 1.8%, mAP@.5 by 3%, and mAP@.5:.95 by 1.7%, while FLOPs decrease by 8.5G. The improved detection algorithm therefore achieves faster and more accurate mask-wearing detection.
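As a rough illustration of the loss swap described in this abstract, the sketch below computes an MPDIoU-style loss for one pair of axis-aligned boxes. It assumes the commonly published form of MPDIoU (IoU minus the squared distances between the two top-left corners and between the two bottom-right corners, each normalized by the squared image diagonal). The function name, box layout, and argument names are illustrative assumptions and are not taken from the paper.

```python
def mpdiou_loss(pred, target, img_w, img_h):
    """MPDIoU-style loss for two boxes given as (x1, y1, x2, y2).

    Assumes x1 < x2 and y1 < y2, and that img_w and img_h are the
    width and height of the input image (an assumption of this sketch).
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Plain IoU of the two boxes.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corners.
    d_top_left = (px1 - tx1) ** 2 + (py1 - ty1) ** 2
    d_bottom_right = (px2 - tx2) ** 2 + (py2 - ty2) ** 2

    # Normalize by the squared image diagonal.
    norm = img_w ** 2 + img_h ** 2
    mpdiou = iou - d_top_left / norm - d_bottom_right / norm
    return 1.0 - mpdiou
```

For identical boxes the loss is zero, and it grows as the corners of the predicted box move away from those of the target, even when the boxes no longer overlap.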
Ryotaro MITSUBOSHI Kohei HATANO Eiji TAKIMOTO
Following the formulation of Support Vector Regression (SVR), we consider a regression analogue of soft margin optimization over the feature space indexed by a hypothesis class H. More specifically, the problem is to find a linear model w ∈ ℝ^H that minimizes the sum of ρ-insensitive losses over all training data for as small ρ as possible, where the ρ-insensitive loss for a single data point (x_i, y_i) is defined as max{|y_i − ∑_h w_h h(x_i)| − ρ, 0}. Intuitively, the parameter ρ and the ρ-insensitive loss are defined analogously to the target margin and the hinge loss in soft margin optimization, respectively. Our formulation differs from SVR in two respects: (1) we consider L1-norm regularization instead of L2-norm regularization, and (2) the feature space is implicitly defined by a hypothesis class instead of a kernel. We propose a boosting-type algorithm for solving the problem with a theoretically guaranteed convergence rate under a natural assumption on the weak learnability.
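The ρ-insensitive loss defined in this abstract can be written out directly. The sketch below evaluates it for a finite list of hypotheses; representing the hypothesis class as a Python list of callables, and the function names themselves, are assumptions made only for illustration. It does not implement the boosting-type algorithm.

```python
def rho_insensitive_loss(weights, hypotheses, x, y, rho):
    """max{|y - sum_h w_h h(x)| - rho, 0} for one data point (x, y)."""
    prediction = sum(w * h(x) for w, h in zip(weights, hypotheses))
    return max(abs(y - prediction) - rho, 0.0)


def total_rho_insensitive_loss(weights, hypotheses, data, rho):
    """Sum of rho-insensitive losses over a list of (x, y) pairs."""
    return sum(
        rho_insensitive_loss(weights, hypotheses, x, y, rho) for x, y in data
    )


# Hypothetical example: two hypotheses, h1(x) = x and h2(x) = 1.
hypotheses = [lambda x: x, lambda x: 1.0]
weights = [2.0, 0.5]
data = [(1.0, 3.0), (2.0, 4.5)]
print(total_rho_insensitive_loss(weights, hypotheses, data, rho=0.2))
```

Points whose residual lies within ρ of the prediction contribute nothing, which mirrors how points beyond the target margin contribute no hinge loss in soft margin optimization.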
Yongpeng HU Hang LI J. Andrew ZHANG Xiaojing HUANG Zhiqun CHENG
Analog beamforming with broadband large-scale antenna arrays in millimeter wave (mmWave) multiple input multiple output (MIMO) systems faces the beam squint problem. In this paper, we first investigate the sensitivity of analog beamforming to subarray spatial separations in wideband mmWave systems using hybrid arrays, and propose optimized subarray separations. We then design improved analog beamforming after phase compensation based on a Zadoff-Chu (ZC) sequence to flatten the frequency response of the radio frequency (RF) equivalent channel. Considering a single-carrier frequency-domain equalization (SC-FDE) scheme at the receiver, we derive low-complexity linear zero-forcing (ZF) and minimum mean squared error (MMSE) equalizers in terms of output signal-to-noise ratio (SNR) after equalization. Simulation results show that the improved analog beamforming can effectively remove frequency-selective deep fading caused by beam squint, and achieves better bit-error-rate performance than conventional analog beamforming.
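To make the role of the two equalizers more concrete, the following sketch computes textbook per-frequency-bin ZF and MMSE coefficients for a known channel frequency response. This is the generic SC-FDE form, not the low-complexity derivation of the paper; the function names and the use of a linear-scale SNR argument are assumptions.

```python
def zf_coefficients(channel):
    """Zero-forcing weights W_k = conj(H_k) / |H_k|^2 for each bin.

    Assumes no bin is exactly zero; a true spectral null would
    cause a division by zero here.
    """
    return [h.conjugate() / abs(h) ** 2 for h in channel]


def mmse_coefficients(channel, snr_linear):
    """MMSE weights W_k = conj(H_k) / (|H_k|^2 + 1/SNR) for each bin."""
    return [h.conjugate() / (abs(h) ** 2 + 1.0 / snr_linear) for h in channel]


# Hypothetical two-bin channel: one strong bin and one deeply faded bin.
channel = [1 + 0j, 0.1j]
zf = zf_coefficients(channel)
mmse = mmse_coefficients(channel, snr_linear=10.0)
print([abs(w) for w in zf])
print([abs(w) for w in mmse])
```

In the deeply faded bin the ZF weight becomes large and amplifies noise, while the MMSE weight stays bounded, which is one reason flattening the frequency response before equalization is attractive.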
Xiaohu WANG Yubin DUAN Yi WEI Xinyuan CHEN Huang ZHUN Chaohui ZHAO
With the gradual increase in the application of new energy in microgrids, the Electric Spring (ES), a new type of distributed compensation power electronic device, has been widely studied. The Generalized Electric Spring (G-ES) is an improved topology that solves the space limitation problem of the traditional topology. Because of the way the G-ES is used in the power grid, a reasonable solution to the voltage loss of the critical section feeder is needed. In this paper, a voltage balance equation based on the feedforward compensation coefficient is established, and a two-stage cascade control strategy based on the equation is studied. The first stage of the strategy uses communication to allocate the feedforward compensation coefficients, and the second stage uses the coefficients to realize feedforward fixed-angle control. Simulation analysis shows that the proposed control strategy does not affect the control accuracy of the critical load (CL) and effectively improves the operational range of the G-ES.
Jinguang HAO Gang WANG Honggang WANG Lili WANG Xuefeng LIU
The existing literature focuses on the applications of the fast filter bank because of its excellent frequency responses and low complexity. However, general transfer function expressions of the corresponding subfilters for a specific channel have not been addressed. To fill this gap, this paper derives general closed-form transfer function expressions for the fast filter bank. First, the cascaded structure of the fast filter bank is modelled by a binary tree, with which the index of the subfilter at each stage within the channel can be determined. Then the transfer functions for the two outputs of a subfilter are expressed in a unified form. Finally, the general closed-form transfer functions for the channel and its corresponding subfilters are obtained by variable replacement if the prototype lowpass filters for the stages are given. Analytical results and simulations verify the general expressions. Such closed-form expressions lend themselves easily to analysis and direct computation of the transfer functions and the frequency responses without the structure graph.
The performance of a fully wireless-power-transfer (WPT) node network, in which each node transfers (and receives) energy through a wireless channel when it has sufficient (and insufficient) energy in its battery, was theoretically analyzed. The lost job ratio (LJR), namely, the ratio of (i) the amount of jobs that cannot be done because a node's battery has run out to (ii) the amount of jobs that should be done, is used as a performance metric. It describes the effect of the battery of each node running out and how much additional energy is needed. Although it is known that WPT can reduce the probability of the battery running out among a few nodes within a small area, the performance of a fully WPT network has not been clarified. By using stochastic geometry and first-passage-time analysis for a diffusion process, the expected LJR was theoretically derived. Numerical examples demonstrate that the key parameters determining the performance of the network are the node density, the threshold for switching status between “transferring energy” and “receiving energy,” and the parameters of power conversion. They also demonstrate the following: (1) The mean energy stored in the node battery decreases in the networks because of the loss caused by WPT, and a fully WPT network cannot decrease the probability of the battery running out under the current WPT efficiency. (2) When the saturation value of power conversion increases, a fully WPT network can decrease the probability of the battery running out although the mean energy stored in the node battery still decreases in the networks. This result is explained by the fact that the variance of stored energy in each node battery becomes smaller due to the transfer of energy from nodes with sufficient energy to nodes with insufficient energy.
Keita IMAIZUMI Koichi ICHIGE Tatsuya NAGAO Takahiro HAYASHI
In this paper, we propose a method for predicting radio wave propagation using a correlation graph convolutional neural network (C-Graph CNN). We examine what kind of parameters are suitable to be used as system parameters in C-Graph CNN. Performance of the proposed method is evaluated by the path loss estimation accuracy and the computational cost through simulation.
Xinqun LIU Tao LI Yingxiao ZHAO Jinlin PENG
Conventional Nyquist folding receiver (NYFR) uses zero crossing rising (ZCR) voltage times to control the RF sample clock, which is easily affected by noise. Moreover, the analog and digital parts are not synchronized so that the initial phase of the input signal is lost. Furthermore, it is assumed in most literature that the input signal is in a single Nyquist zone (NZ), which is inconsistent with the actual situation. In this paper, we propose an improved architecture denominated as a dual-channel NYFR with adjustable local oscillator (LOS) and an information recovery algorithm. The simulation results demonstrate the validity and viability of the proposed architecture and the corresponding algorithm.
Yang LIU Yuqi XIA Haoqin SUN Xiaolei MENG Jianxiong BAI Wenbo GUAN Zhen ZHAO Yongwei LI
Speech emotion recognition (SER) has been a complex and difficult task for a long time due to emotional complexity. In this paper, we propose a multitask deep learning approach based on cascaded attention network and self-adaption loss for SER. First, non-personalized features are extracted to represent the process of emotion change while reducing external variables' influence. Second, to highlight salient speech emotion features, a cascade attention network is proposed, where spatial temporal attention can effectively locate the regions of speech that express emotion, while self-attention reduces the dependence on external information. Finally, the influence brought by the differences in gender and human perception of external information is alleviated by using a multitask learning strategy, where a self-adaption loss is introduced to determine the weights of different tasks dynamically. Experimental results on IEMOCAP dataset demonstrate that our method gains an absolute improvement of 1.97% and 0.91% over state-of-the-art strategies in terms of weighted accuracy (WA) and unweighted accuracy (UA), respectively.
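The two metrics reported in this abstract are often defined in the SER literature as overall accuracy (WA) and the mean of per-class recalls (UA). The sketch below follows those common definitions, which is an assumption here because the abstract does not spell them out; the function name and label strings are illustrative.

```python
from collections import defaultdict


def weighted_and_unweighted_accuracy(y_true, y_pred):
    """Return (WA, UA) for two equal-length label sequences.

    WA: fraction of all samples classified correctly.
    UA: average of the per-class recalls, so every class counts equally.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    wa = sum(correct.values()) / len(y_true)
    ua = sum(correct[c] / total[c] for c in total) / len(total)
    return wa, ua


# Hypothetical imbalanced example: predicting the majority class everywhere.
y_true = ["angry", "angry", "angry", "sad"]
y_pred = ["angry", "angry", "angry", "angry"]
print(weighted_and_unweighted_accuracy(y_true, y_pred))
```

On imbalanced data the two values can differ considerably, which is why both are usually reported together.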
Jingzhao DAI Ming LI Xuejiao HU Yang LI Sidan DU
Gaze following is the task of estimating where an observer is looking inside a scene. Both the observer and scene information must be learned to determine the gaze directions and gaze points. Many recent works have focused only on scenes or only on observers. In contrast, published frameworks for gaze following remain limited. In this paper, a gaze following method using a hybrid transformer is proposed. Based on the conventional method (GazeFollow), we make three improvements. First, a hybrid transformer is applied for learning head images and gaze positions. Second, the pinball loss function is utilized to control the gaze point error. Finally, a novel ReLU layer with a reborn mechanism (reborn ReLU) is introduced to replace traditional ReLU layers in different network stages. To test the performance of our improvements, we train the resulting framework on the DL Gaze dataset and evaluate the model on our own collected set. The experimental results show that our framework outperforms the reference methods.
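For readers unfamiliar with the pinball loss mentioned in this abstract, the sketch below gives its standard scalar (quantile regression) form. How the paper applies it to gaze point coordinates is not stated in the abstract, so the per-value usage and the choice of the quantile parameter tau are assumptions.

```python
def pinball_loss(y_true, y_pred, tau):
    """Standard pinball (quantile) loss for a single scalar value.

    tau in (0, 1) sets the asymmetry: under-prediction is weighted
    by tau and over-prediction by (1 - tau).
    """
    diff = y_true - y_pred
    return max(tau * diff, (tau - 1.0) * diff)


# Hypothetical values: the same absolute error costs more when the
# prediction falls below the target and tau is close to 1.
print(pinball_loss(1.0, 0.0, tau=0.9))
print(pinball_loss(0.0, 1.0, tau=0.9))
```

With tau = 0.5 the loss reduces to half the absolute error, so the asymmetric cases can be read as a weighted generalization of an L1 penalty.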
Satomitsu IMAI Atsuya YAMAKAWA
An enzymatic biofuel cell (BFC) that uses lactic acid in human sweat as fuel to generate electricity is an attractive power source for wearable devices. A BFC capable of generating electricity with human sweat has been developed. It comprised a flexible tattoo-seal-type battery with silver oxide vapor-deposited on a flexible material and conductive carbon nanotubes printed on it. The anode and cathode in this battery were arranged in a plane (planar type). This work proposes a thin laminated enzymatic BFC in which a cellulose nanofiber (CNF) sheet is inserted between the two electrodes to absorb human sweat (stack type). Optimizing the anode and changing the arrangement of the electrodes from the planar type to the stack type improved the output and battery life. The stack type achieves a maximum power density of 43.20 μW/cm² at 180 mV, which is 1.25 times that of the planar type.
Conventional enzymatic biofuel cells (EBFCs) use a glucose solution or glucose from the human body. It is desirable to obtain glucose from a substance containing glucose because the glucose concentration can then be kept at the optimum level. This work developed a biofuel cell that generates electricity from cellulose, which is the main component of plants, by using the decomposing enzyme cellulase. Cellulose nanofiber (CNF) was chosen because it decomposes easily. Cyclic voltammetry confirmed that cellulase was effective against CNF. The maximum output of the optimized proposed method was 38.7 μW/cm², which was 85% of the output obtained with a glucose solution at the optimized concentration.
Zhuo WANG Junbo LIU Fan WANG Jun WU
Machine vision-based automatic anti-bird thorn failure inspection, instead of manual identification, remains a great challenge. In this paper, we propose a novel Object Position Embedding Network (OPENnet), which can improve the precision of anti-bird thorn localization. OPENnet simultaneously predicts the location boxes of the support device and the anti-bird thorn by using the proposed double-head network. OPENnet is then optimized using the proposed symbiotic loss function (SymLoss), which embeds the object position into the network. Comprehensive experiments are conducted on a real railway video dataset. OPENnet yields competitive performance on anti-bird thorn localization. Specifically, the localization performance gains +3.65 AP, +2.10 AP50, and +1.22 AP75.
Daiki NISHIYAMA Kazuto FUKUCHI Youhei AKIMOTO Jun SAKUMA
In real world applications of multiclass classification models, misclassification in an important class (e.g., stop sign) can be significantly more harmful than in other classes (e.g., no parking). Thus, it is crucial to improve the recall of an important class while maintaining overall accuracy. For this problem, we found that improving the separation of important classes relative to other classes in the feature space is effective. Existing methods that give a class-sensitive penalty for cross-entropy loss do not improve the separation. Moreover, the methods designed to improve separations between all classes are unsuitable for our purpose because they do not consider the important classes. To achieve the separation, we propose a loss function that explicitly gives loss for the feature space, called class-sensitive additive angular margin (CAMRI) loss. CAMRI loss is expected to reduce the variance of an important class due to the addition of a penalty to the angle between the important class features and the corresponding weight vectors in the feature space. In addition, concentrating the penalty on only the important class hardly sacrifices separating the other classes. Experiments on CIFAR-10, GTSRB, and AwA2 showed that CAMRI loss could improve the recall of a specific class without sacrificing accuracy. In particular, compared with GTSRB's second-worst class recall when trained with cross-entropy loss, CAMRI loss improved recall by 9%.
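The idea of penalizing only the important class with an angular margin can be sketched as follows. This is an ArcFace-style illustration built from the description in the abstract, not the exact CAMRI formulation: the margin and scale values, the clipping of the shifted angle at π, and the function name are all assumptions.

```python
import math


def class_sensitive_margin_loss(cosines, label, important_class,
                                margin=0.5, scale=16.0):
    """Cross-entropy over scaled cosine logits with a margin on one class.

    cosines[j] is the cosine of the angle between the sample feature and
    the weight vector of class j. The additive angular margin is applied
    only when the true label is the important class (assumed behaviour).
    """
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))
        if j == label and label == important_class:
            # Make the target harder: add the margin to the angle.
            theta = min(math.acos(c) + margin, math.pi)
            c = math.cos(theta)
        logits.append(scale * c)

    # Numerically stable log-sum-exp for the softmax denominator.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[label]


cosines = [0.8, 0.1, 0.0]
print(class_sensitive_margin_loss(cosines, label=0, important_class=0))
print(class_sensitive_margin_loss(cosines, label=0, important_class=1))
```

Because the margin lowers the target logit only for samples of the important class, those samples must sit closer to their class weight vector to reach the same loss, while the other classes are trained with an ordinary cosine softmax.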
Yusuke INOUE Kenji HASHIMOTO Hiroyuki SEKI
Multiple context-free grammar (MCFG) is an extension of context-free grammar (CFG), which generates tuples of words. The expressive power of MCFG is between CFG and context-sensitive grammar while MCFG inherits good properties of CFG. In this paper, we introduce weighted multiple context-free grammar (WMCFG) as a quantitative extension of MCFG. Then we investigate properties of WMCFG such as polynomial-time computability of basic problems, its closure property and expressive power.
Tomoya NITTA Tsubasa HIRAKAWA Hironobu FUJIYOSHI Toru TAMAKI
In this paper, we propose an extension of the Attention Branch Network (ABN) that uses instance segmentation to generate sharper attention maps for action recognition. Methods for visual explanation such as Grad-CAM usually generate blurry maps that are not intuitive for humans to understand, particularly in recognizing actions of people in videos. Our proposed method, Object-ABN, tackles this issue by introducing a new mask loss that makes the generated attention maps close to the instance segmentation result. Furthermore, the Prototype Conformity (PC) loss and multiple attention maps are introduced to enhance the sharpness of the maps and improve classification performance. Experimental results with UCF101 and SSv2 show that the maps generated by the proposed method are much clearer, both qualitatively and quantitatively, than those of the original ABN.
Jing ZHANG Dan LI Hong-an LI Xuewen LI Lizhi ZHANG
To address low-quality problems in night images, such as low brightness, poor contrast, noise interference, and color imbalance, a night image enhancement algorithm based on MDIFE-Net curve estimation is presented. The algorithm consists of three main parts. First, we design an illumination estimation curve (IEC), which adjusts the pixel levels of the low-illumination image domain through a non-linear fitting function, maps them to the enhanced image domain, and effectively eliminates the effect of illumination loss. Second, DCE-Net is improved by replacing the original ReLU activation function with the smoother Mish activation function, so that the parameters can be better updated. Finally, an illumination estimation loss function, which combines image attributes with fidelity, is designed to drive the no-reference image enhancement and preserve more image details while enhancing the night image. The experimental results show that our method not only effectively improves image contrast but also makes the details of the target more prominent and improves the visual quality of the image. Compared with four existing low-illumination image enhancement algorithms, our method obtains better NIQE and STD evaluation index values, which verifies the feasibility and validity of the algorithm; ablation experiments verify the rationality and necessity of each component.
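The activation swap described in this abstract can be illustrated with the standard definition of Mish, x · tanh(softplus(x)), compared against ReLU. This is a scalar sketch for intuition only; the function names are illustrative and the network itself is not reproduced.

```python
import math


def softplus(x):
    """Numerically stable ln(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def relu(x):
    """ReLU activation: max(x, 0)."""
    return max(x, 0.0)


# Unlike ReLU, Mish is smooth and passes small negative values through.
for value in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(value, relu(value), mish(value))
```

Mish keeps a small nonzero response and gradient for negative inputs, whereas ReLU outputs exactly zero there; this smoothness is the property the abstract associates with better parameter updates.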
Ze Fu GAO Hai Cheng TAO Qin Yu ZHU Yi Wen JIAO Dong LI Fei Long MAO Chao LI Yi Tong SI Yu Xin WANG
Aiming at the problem of non-line of sight (NLOS) signal recognition for Ultra Wide Band (UWB) positioning, we utilize the concepts of Neural Network Clustering and Neural Network Pattern Recognition. We propose a classification algorithm based on self-organizing feature mapping (SOM) neural network batch processing, and a recognition algorithm based on convolutional neural network (CNN). By assigning different weights to learning, training and testing parts in the data set of UWB location signals with given known patterns, a strong NLOS signal recognizer is trained to minimize the recognition error rate. Finally, the proposed NLOS signal recognition algorithm is verified using data sets from real scenarios. The test results show that the proposed algorithm can solve the problem of UWB NLOS signal recognition under strong signal interference. The simulation results illustrate that the proposed algorithm is significantly more effective compared with other algorithms.
Naoya NIWA Yoshiya SHIKAMA Hideharu AMANO Michihiro KOIBUCHI
Networks-on-Chip (NoCs) are important components for scalable many-core processors. Because the performance of parallel applications is usually sensitive to the latency of NoCs, reducing it is a primary requirement. In this study, a compression router that hides the (de)compression-operation delay is proposed. The compression router (de)compresses the contents of the incoming packet before the switch arbitration is completed, thus shortening the packet length without a latency penalty and reducing the network injection-and-ejection latency. Evaluation results show that the compression router improves the parallel application performance (conjugate gradients (CG), fast Fourier transform (FT), integer sort (IS), and traveling salesman problem (TSP)) and the effective network throughput by up to 33% and 63%, respectively, at a compression ratio of 1.8 on the NoC. The cost is an increase in router area of 0.22 mm² and an increase in energy consumption by a factor of 1.6 compared to the conventional virtual-channel router. Another finding is that off-loading the decompressor onto a network interface decreases the compression-router area by 57% at the expense of a moderate increase in communication latency.
Binggang ZHUO Masaki MURATA Qing MA
Paragraph segmentation is a text segmentation task. Iikura et al. achieved excellent results on paragraph segmentation by introducing focal loss to Bidirectional Encoder Representations from Transformers. In this study, we investigated paragraph segmentation on Daily News and Novel datasets. Based on the approach proposed by Iikura et al., we used auxiliary loss to train the model to improve paragraph segmentation performance. The average F1-score obtained by the approach of Iikura et al. was 0.6704 on the Daily News dataset, whereas that of our approach was 0.6801. Our approach thus improved the F1-score by approximately 1 percentage point (0.0097). The performance improvement was also confirmed on the Novel dataset. Furthermore, the results of two-tailed paired t-tests indicated that the difference in performance between the two approaches was statistically significant.
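The focal loss that this line of work builds on can be written compactly for the binary case (paragraph boundary versus no boundary). The sketch uses the standard form −(1 − p_t)^γ · log(p_t) without class weighting; the binary framing, the default γ, and the function name are assumptions, and the auxiliary loss of the paper is not reproduced.

```python
import math


def binary_focal_loss(prob_positive, label, gamma=2.0, eps=1e-12):
    """Focal loss for one binary prediction.

    prob_positive: predicted probability of the positive class.
    label: 1 for the positive class, 0 for the negative class.
    gamma: focusing parameter; gamma = 0 gives ordinary cross-entropy.
    """
    p_t = prob_positive if label == 1 else 1.0 - prob_positive
    p_t = min(max(p_t, eps), 1.0)  # guard against log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# A confident correct prediction is down-weighted far more strongly
# than an uncertain one.
print(binary_focal_loss(0.9, 1))
print(binary_focal_loss(0.6, 1))
```

Down-weighting easy examples is useful when most positions in a text are not paragraph boundaries, so that the rare boundary positions dominate the training signal.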