Min GAO Gaohua CHEN Jiaxin GU Chunmei ZHANG
Wearing a mask correctly is an effective method to prevent respiratory infectious diseases. However, in some complex settings, the accuracy of mask-wearing detection still needs to be improved. This research enhances a mask-wearing detection technique based on YOLOv7-tiny. Distribution Shifting Convolutions (DSConv) replace the 3×3 convolutions in the original model to simplify computation and increase detection precision. To decrease the coordinate regression loss and enhance detection performance, we adopt the Intersection over Union with Minimum Points Distance (MPDIoU) loss function instead of the Complete Intersection over Union (CIoU) loss used in the original model. The GSConv and VoVGSCSP modules are introduced to make the model more lightweight. A P6 detection layer is designed to increase detection precision for tiny targets in challenging environments and to decrease missed and false positive detection rates. The robustness of the model is further increased by creating and annotating a mask-wearing dataset covering multiple environments, with Mixup and Mosaic used for data augmentation. The effectiveness of the model is validated using comparison and ablation experiments on the mask dataset. The results demonstrate that, compared to YOLOv7-tiny, the precision of the enhanced detection algorithm is improved by 5.4%, recall by 1.8%, mAP@.5 by 3%, and mAP@.5:.95 by 1.7%, while the FLOPs are decreased by 8.5 G. Therefore, the improved detection algorithm achieves mask-wearing detection that is both more accurate and better suited to real-time use.
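The abstract names MPDIoU but does not state its formula. As a rough illustration, the sketch below follows the commonly cited definition, in which the IoU is penalized by the squared distances between the two boxes' top-left corners and bottom-right corners, each normalized by the squared diagonal of the input image. The function names, the `(x1, y1, x2, y2)` box layout, and the `1 - MPDIoU` loss form are assumptions made for this example, not details taken from the paper.

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """MPDIoU between two boxes given as (x1, y1, x2, y2) with x1 < x2, y1 < y2.

    img_w and img_h are the width and height of the input image
    (assumed normalization terms for the corner distances).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Plain IoU: intersection area over union area.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corners.
    d_top_left = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2
    d_bottom_right = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2

    # Normalize by the squared image diagonal.
    norm = img_w ** 2 + img_h ** 2
    return iou - d_top_left / norm - d_bottom_right / norm


def mpdiou_loss(box_a, box_b, img_w, img_h):
    """Regression loss: zero when the predicted and target boxes coincide."""
    return 1.0 - mpdiou(box_a, box_b, img_w, img_h)
```

Unlike plain IoU, this quantity still changes with distance when the boxes do not overlap, so a regression loss built on it continues to provide a training signal in that case.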
Beibei LI Xun RAN Yiran LIU Wensheng LI Qingling DUAN
Fish skin color detection plays a critical role in aquaculture. However, challenges arise from image color cast and the limited dataset, impacting the accuracy of the skin color detection process. To address these issues, we proposed a novel fish skin color detection method, termed VH-YOLOv5s. Specifically, we constructed a dataset for fish skin color detection to tackle the limitation posed by the scarcity of available datasets. Additionally, we proposed a Variance Gray World Algorithm (VGWA) to correct the image color cast. Moreover, the designed Hybrid Spatial Pyramid Pooling (HSPP) module effectively performs multi-scale feature fusion, thereby enhancing the feature representation capability. Extensive experiments have demonstrated that VH-YOLOv5s achieves excellent detection results on the Plectropomus leopardus skin color dataset, with a precision of 91.7%, recall of 90.1%, mAP@0.5 of 95.2%, and mAP@0.5:0.95 of 57.5%. When compared to other models such as Centernet, AutoAssign, and YOLOX-s, VH-YOLOv5s exhibits superior detection performance, surpassing them by 2.5%, 1.8%, and 1.7%, respectively. Furthermore, our model can be deployed directly on mobile phones, making it highly suitable for practical applications.
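For readers unfamiliar with color-cast correction, the following is a minimal sketch of the classical gray world algorithm, which the name VGWA suggests as a starting point. It is not the variance-based variant proposed in the paper, whose details the abstract does not give. The function name and the representation of an image as a list of `(r, g, b)` tuples are assumptions made for this example.

```python
def gray_world(pixels):
    """Classical gray world correction on a list of (r, g, b) pixel tuples.

    Each channel is scaled so that its mean matches the mean of the
    three channel means, which removes a global color cast.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")

    n = len(pixels)
    # Mean intensity of each channel.
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    # Target gray level: average of the channel means.
    gray = sum(means) / 3.0
    # Per-channel gain; leave an all-zero channel unchanged.
    gains = [gray / m if m > 0 else 1.0 for m in means]

    # Apply the gains and clip to the 8-bit range.
    return [
        tuple(min(255.0, p[c] * gains[c]) for c in range(3))
        for p in pixels
    ]
```

For example, an image with a red cast such as `[(100, 50, 50), (200, 100, 100)]` has channel means of 150, 75, and 75; after correction all three channel means equal 100.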
Xiangyu LI Ping RUAN Wei HAO Meilin XIE Tao LV
To achieve precise measurement without landing, the high-mobility vehicle-mounted theodolite needs to be leveled quickly with high precision and ensure sufficient support stability before work. After the measurement, it is also necessary to ensure that the high-mobility vehicle-mounted theodolite can be quickly withdrawn. Therefore, this paper proposes a hierarchical automatic leveling strategy and establishes a two-stage electromechanical automatic leveling mechanism model. Using coarse leveling by the first-stage automatic leveling mechanism and fine leveling by the second-stage automatic leveling mechanism, the model realizes high-precision and fast leveling of vehicle-mounted theodolites. Then, a leveling control method based on repeated positioning is proposed for the first-stage automatic leveling mechanism. To realize rapid withdrawal of high-mobility vehicle-mounted theodolites, the method ensures that the spatial movement paths coincide when the structural parts are unfolded and withdrawn. Next, the leg static balance equation is constructed for the leveling state, and a support force detection method is discussed for realizing stable support of vehicle-mounted theodolites. Furthermore, a mathematical model for “false leg” detection is established, and a “false leg” detection scheme based on the support force detection method is analyzed to significantly improve the support stability of vehicle-mounted theodolites. Finally, an experimental platform is constructed to test the performance of the automatic leveling mechanisms. The experimental results show that the leveling accuracy of the established two-stage electromechanical automatic leveling mechanism can reach 3.6″, and the leveling time is no more than 2 min. The maximum support force error of the support force detection method is less than 15%, and the average support force error is less than 10%. In contrast, the maximum support force error of the drive motor torque detection method reaches 80.12%, and its leg support stability is much lower than that of the support force detection method. The model and analysis method proposed in this paper can also be used for vehicle-mounted radar, vehicle-mounted laser measurement devices, vehicle-mounted artillery launchers and other types of vehicle-mounted equipment with high-precision and high-mobility working requirements.
Yaokun HU Xuanyu PENG Takeshi TODA
The subject must be motionless for conventional radar-based non-contact vital signs measurements. Additionally, the measurement range is limited by the design of the radar module itself. Although measurement accuracy has been improving, practical applications have been slow to develop. This paper proposes a novel radar-based adaptive tracking method for measuring the heart rate of a moving subject. The radar module is fixed on a circular plate that is rotated by stepping motors. In order to protect the user’s privacy, the method uses radar signal processing to detect the subject’s position and control a stepping motor that adjusts the radar’s measurement range. The results of the fixed-route experiments revealed that when the subject was moving at a speed of 0.5 m/s, the mean values of RMSE for heart rate measurements were all below 2.85 beats per minute (bpm), and when moving at a speed of 1 m/s, they were all below 4.05 bpm. When subjects walked at random routes and speeds, the RMSEs of the measurements were all below 6.85 bpm, with a mean value of 4.35 bpm. The average RR interval time of the reconstructed heartbeat signal was highly correlated with the electrocardiography (ECG) data, with a correlation coefficient of 0.9905. In addition, this study not only evaluated the potential effect of arm swing (a more natural walking motion) on heart rate measurement but also demonstrated the ability of the proposed method to measure heart rate in a multiple-people scenario.
Gyulim KIM Hoojin LEE Xinrong LI Seong Ho CHAE
This letter studies the secrecy outage probability (SOP) and the secrecy diversity order of Alamouti STBC with decision feedback (DF) detection over time-selective fading channels. For given temporal correlations, we derive the exact SOPs and their asymptotic approximations for all possible combinations of detection schemes, including joint maximum likelihood (JML), zero-forcing (ZF), and DF, at Bob and Eve. We reveal that the SOP is mainly influenced by the detection scheme of the legitimate receiver rather than that of the eavesdropper, and that the achievable secrecy diversity order converges to two when JML is used at Bob (i.e., JML-JML/ZF/DF) and to one in the other cases (i.e., ZF-JML/ZF/DF, DF-JML/ZF/DF). Here, a p-q combination pair indicates that Bob and Eve adopt the detection methods p ∈ {JML, ZF, DF} and q ∈ {JML, ZF, DF}, respectively.
Kai YU Wentao LYU Xuyi YU Qing GUO Weiqiang XU Lu ZHANG
Automatic defect detection in fabric images is an essential task in the textile industry. However, there are some inherent difficulties in the detection of fabric defects, such as the complexity of the background and the highly uneven scales of defects. Moreover, the trade-off between accuracy and speed should be considered in real applications. To address these problems, we propose a novel model based on YOLOv4 to detect defects in fabric images, called Feature Augmentation YOLO (FA-YOLO). In terms of network structure, FA-YOLO adds an additional detection head to improve the detection of small defects and builds a powerful Neck structure to enhance feature fusion. First, to reduce information loss during feature fusion, we perform residual feature augmentation (RFA) on the features after dimensionality reduction by 1×1 convolution. Afterward, the attention module (SimAM) is embedded into the locations with rich features to improve adaptation to complex backgrounds. Adaptive spatial feature fusion (ASFF) is also applied to the output of the Neck to filter inconsistencies across layers. Finally, the cross-stage partial (CSP) structure is introduced for optimization. Experimental results on three real industrial datasets, namely the Tianchi fabric dataset (72.5% mAP), the ZJU-Leaper fabric dataset (average F1-score of 0.714) and the NEU-DET steel dataset (77.2% mAP), demonstrate that the proposed FA-YOLO achieves competitive results compared to other state-of-the-art (SoTA) methods.
Kenshi OGAWA Masashi KUROSAKI Ryohei NAKAMURA
With the development of drone technology, concerns have arisen about the possibility of drones being equipped with threat payloads for terrorism and other crimes. A drone detection system that can detect drones carrying payloads is needed. A drone’s propeller rotation frequency increases with payload weight. Therefore, a method for estimating propeller rotation frequency will effectively detect the presence or absence of a payload and its weight. In this paper, we propose a method for classifying the payload weight of a drone by estimating its propeller rotation frequency from radar images obtained using a millimeter-wave fast-chirp-modulation multiple-input and multiple-output (MIMO) radar. For each drone model, the proposed method requires a pre-prepared reference dataset that establishes the relationships between the payload weight and propeller rotation frequency. Two experimental measurement cases were conducted to investigate the effectiveness of our proposal. In case 1, we assessed four drones (DJI Matrice 600, DJI Phantom 3, DJI Mavic Pro, and DJI Mavic Mini) to determine whether the propeller rotation frequency of any drone could be correctly estimated. In case 2, experiments were conducted on a hovering Phantom 3 drone with several payloads in a stable position for calculating the accuracy of the payload weight classification. The experimental results indicated that the proposed method could estimate the propeller rotation frequency of any drone and classify payloads in a 250 g step with high accuracy.
Qingqi ZHANG Xiaoan BAO Ren WU Mitsuru NAKATA Qi-Wei GE
Automatic detection of prohibited items is vital in helping security staff be more efficient while improving the public safety index. However, prohibited item detection within X-ray security inspection images is limited by various factors, including the imbalanced distribution of categories, the diversity of prohibited item scales, and overlap between items. In this paper, we propose to leverage the Poisson blending algorithm with the Canny edge operator to alleviate the imbalanced distribution of categories in the X-ray image dataset as far as possible. Based on this, we improve the cascade network to deal with the other two difficulties. To address the scale diversity of prohibited items, we propose the Re-BiFPN feature fusion method, which includes a coordinate attention atrous spatial pyramid pooling (CA-ASPP) module and a recursive connection. The CA-ASPP module can implicitly extract direction-aware and position-aware information from the feature map. The recursive connection feeds the multi-scale feature map processed by the CA-ASPP module to the bottom-up backbone layer for further multi-scale feature extraction. In addition, a Rep-CIoU loss function is designed to address the overlapping problem in X-ray images. Extensive experimental results demonstrate that our method can successfully identify ten types of prohibited items, such as Knives, Scissors, Pressure, etc., and achieves an mAP of 83.4%, which is 3.8% higher than that of the original cascade network. Moreover, our method outperforms other mainstream methods by a significant margin.
This Letter focuses on the problem of counting monkeys' head swings using deep learning. There are currently very few papers on monkey detection, and even fewer on counting monkeys' head swings. This research tries to fill that gap by calculating the head swing frequency of monkeys through deep learning, extending the traditional target detection algorithm. After analyzing the object detection results, we localize the monkey's actions over a period of time. This Letter analyzes the task of counting monkeys' head swings and proposes a standard that accurately describes a monkey's head swing. Under the guidance of this standard, the head swing counting accuracy on 50 test videos reaches 94.23%.
Wocheng XIAO Lingyu LIANG Jianyong CHEN Tao WANG
Video text detection (VTD) aims to localize text instances in videos and has wide applications in downstream tasks. To deal with the variance of different scenes and text instances, existing VTD methods typically integrate multiple models and feature fusion strategies. A VTD method consisting of sophisticated components can effectively improve detection accuracy, but may be too slow for real-time applications. This paper aims to achieve real-time VTD with an adaptive lightweight end-to-end framework. Unlike previous methods that represent text in the spatial domain, we model text instances in the Fourier domain. Specifically, we propose a scale-aware Fourier Contour Embedding (FCE) method, which not only models arbitrarily shaped text contours in videos as compact signatures, but also adaptively selects proper scales for features in the backbone during the training stage. Then, we construct VTD-FCENet to achieve real-time VTD, which encodes temporal correlations of adjacent frames with scale-aware FCE in a lightweight and adaptive manner. Quantitative evaluations were conducted on the ICDAR2013 Video, Minetto and YVT benchmark datasets, and the results show that our VTD-FCENet not only obtains state-of-the-art or competitive detection accuracy, but also allows real-time text detection on HD videos.
Owing to the several cases wherein abnormal sounds, called adventitious sounds, are included in the lung sounds of a patient suffering from pulmonary disease, the objective of this study was to automatically detect abnormal sounds from auscultatory sounds. To this end, we expressed the acoustic features of the normal lung sounds of healthy people and abnormal lung sounds of patients using Gaussian mixture model (GMM)-hidden Markov models (HMMs), and distinguished between normal and abnormal lung sounds. In our previous study, we constructed left-to-right GMM-HMMs with a limited number of states. Because we expressed abnormal sounds that occur intermittently and repeatedly using limited states, the GMM-HMMs could not express the acoustic features of abnormal sounds. Furthermore, because the analysis frame length and intervals were long, the GMM-HMMs could not express the acoustic features of short time segments, such as heart sounds. Therefore, the classification rate of normal and abnormal respiration was low (86.60%). In this study, we propose the construction of ergodic GMM-HMMs with a repetitive structure for intermittent sounds. Furthermore, we considered a suitable frame length and frame interval to analyze acoustic features. Using the ergodic GMM-HMM, which can express the acoustic features of abnormal sounds and heart sounds that occur repeatedly in detail, the classification rate increased (89.34%). The results obtained in this study demonstrated the effectiveness of the proposed method.
Li HE Jingxuan ZHAO Jianyong DUAN Hao WANG Xin LI
In Natural Language Understanding, intent detection and slot filling have been widely used to understand user queries. However, current methods tend to rely on single words and sentences to understand complex semantic concepts, and can only consider local information within the sentence. Therefore, they usually cannot capture long-distance dependencies well and are prone to problems where complex intentions in sentences are difficult to recognize. In order to solve the problem of long-distance dependency of the model, this paper uses ConceptNet as an external knowledge source and introduces its extensive semantic information into the multi-intent detection and slot filling model. Specifically, for a certain sentence, based on confidence scores and semantic relationships, the most relevant conceptual knowledge is selected to equip the sentence, and a concept context map with rich information is constructed. Then, the multi-head graph attention mechanism is used to strengthen context correlation and improve the semantic understanding ability of the model. The experimental results indicate that the model has significantly improved performance compared to other models on the MixATIS and MixSNIPS multi-intent datasets.
Zikang CHEN Wenping GE Henghai FEI Haipeng ZHAO Bowen LI
The combination of multiple-input multiple-output (MIMO) technology and sparse code multiple access (SCMA) can significantly enhance the spectral efficiency of future wireless communication networks. However, the receiver design for downlink MIMO-SCMA systems faces challenges in developing multi-user detection (MUD) schemes that achieve both low latency and low bit error rate (BER). The separated detection scheme in the MIMO-SCMA system involves performing MIMO detection first to obtain estimated signals, followed by SCMA decoding. We propose an enhanced separated detection scheme based on lightweight graph neural networks (GNNs). In this scheme, we introduce the concepts of coordinate point relay and full-category training, which allow the conventional message passing algorithm (MPA) in SCMA decoding to be replaced with image classification techniques based on deep learning (DL). The features of the images used for training encompass crucial information such as the amplitude and phase of the estimated signals, as well as the characteristics of the channels they have passed through. Furthermore, different types of images show distinct directional trends, contributing additional features that enhance the classification precision of the GNNs. Simulation results demonstrate that the enhanced separated detection scheme outperforms existing separated and joint detection schemes in terms of computational complexity, while having a better BER performance than the joint detection schemes at high Eb/N0 (energy per bit to noise power spectral density ratio) values.
Noboru HAYASAKA Riku KASAI Takuya FUTAGAMI
In this paper, we propose a noise-robust scream detection method with the aim of expanding the scream detection system, a sound-based security system. The proposed method uses enhanced screams using Wave-U-Net, which was effective as a noise reduction method for noisy screams. However, the enhanced screams showed different frequency components from clean screams and erroneously emphasized frequency components similar to scream in noise. Therefore, Wave-U-Net was applied even in the process of training Gaussian mixture models, which are discriminators. We conducted detection experiments using the proposed method in various noise environments and determined that the false acceptance rate was reduced by an average of 2.1% or more compared with the conventional method.
Asahi MIZUKOSHI Ayano NAKAI-KASAI Tadashi WADAYAMA
This paper proposes the periodical successive over-relaxation (PSOR)-Jacobi algorithm for minimum mean squared error (MMSE) detection of multiple-input multiple-output (MIMO) signals. The proposed algorithm has the advantages of two conventional methods. One is the Jacobi method, which is an iterative method for solving linear equations and is suitable for parallel implementation. The Jacobi method is thus a promising candidate for high-speed simultaneous linear equation solvers for the MMSE detector. The other is the Chebyshev PSOR method, which has recently been shown to accelerate the convergence speed of linear fixed-point iterations. We compare the convergence performance of the PSOR-Jacobi algorithm with that of conventional algorithms via computer simulation. The results show that the PSOR-Jacobi algorithm achieves faster convergence without increasing computational complexity, and higher detection performance for a fixed number of iterations. This paper also proposes an efficient computation method of inverse matrices using the PSOR-Jacobi algorithm. The results of computer simulation show that the PSOR-Jacobi algorithm also accelerates the computation of inverse matrix.
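To make the setting concrete, the sketch below shows only the plain Jacobi baseline applied to the MMSE linear system (HᵀH + σ²I)x = Hᵀy. It does not implement the Chebyshev PSOR acceleration that the paper adds, which the abstract does not specify. A real-valued channel matrix is assumed for simplicity, and the function names `mmse_system` and `jacobi` are hypothetical.

```python
def mmse_system(H, y, noise_var):
    """Build A = H^T H + noise_var * I and b = H^T y for a real-valued channel H.

    H is a list of rows (receive antennas x transmit antennas).
    """
    n_rx, n_tx = len(H), len(H[0])
    A = [
        [
            sum(H[k][i] * H[k][j] for k in range(n_rx))
            + (noise_var if i == j else 0.0)
            for j in range(n_tx)
        ]
        for i in range(n_tx)
    ]
    b = [sum(H[k][i] * y[k] for k in range(n_rx)) for i in range(n_tx)]
    return A, b


def jacobi(A, b, num_iters):
    """Plain Jacobi iteration: x <- D^{-1} (b - (A - D) x), starting from zero.

    Every component is updated from the previous iterate only, so the
    per-component updates are independent and can run in parallel.
    Convergence is guaranteed when A is strictly diagonally dominant.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(num_iters):
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x
```

A typical call would be `A, b = mmse_system(H, y, noise_var)` followed by `x_hat = jacobi(A, b, 20)`. When there are many more receive antennas than transmit antennas, the matrix A tends to be diagonally dominant, which is the regime in which Jacobi-type detectors are usually considered.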
Recently, multivariate time-series data has been generated in various environments, such as sensor networks and IoT, making anomaly detection in time-series data an essential research topic. Unsupervised learning anomaly detectors identify anomalies by training a model on normal data and producing high residuals for abnormal observations. However, a fundamental issue arises as anomalies do not consistently result in high residuals, necessitating a focus on the time-series patterns of residuals rather than individual residual sizes. In this paper, we present a novel framework comprising two serialized anomaly detectors: the first model calculates residuals as usual, while the second one evaluates the time-series pattern of the computed residuals to determine whether they are normal or abnormal. Experiments conducted on real-world time-series data demonstrate the effectiveness of our proposed framework.
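The idea of serializing two detectors can be illustrated with a deliberately simple toy, shown below. It is not the framework from the paper: the first stage here is just a constant predictor (the training mean), and the second stage is a sliding-window mean over the signed residuals, chosen because it reacts to a sustained same-sign pattern even when every individual residual is small. The names `window_means` and `detect`, and the parameters `width` and `slack`, are hypothetical.

```python
def window_means(residuals, width):
    """Mean of the signed residuals in every sliding window of the given width."""
    return [
        sum(residuals[i:i + width]) / width
        for i in range(len(residuals) - width + 1)
    ]


def detect(train, test, width=4, slack=0.5):
    """Return (pointwise_flags, pattern_flags) for the test series.

    pointwise_flags: indices whose absolute residual exceeds the largest
                     absolute residual seen on normal training data.
    pattern_flags:   start indices of windows whose absolute mean residual
                     exceeds the largest value seen on training data plus slack.
    """
    # Stage 1: a trivial model of normal data (the training mean).
    mean = sum(train) / len(train)
    train_res = [x - mean for x in train]
    test_res = [x - mean for x in test]

    # Conventional detection based on individual residual size.
    point_threshold = max(abs(r) for r in train_res)
    pointwise = [i for i, r in enumerate(test_res) if abs(r) > point_threshold]

    # Stage 2: judge the pattern of residuals over time, not their size.
    pattern_threshold = max(abs(m) for m in window_means(train_res, width)) + slack
    pattern = [
        i for i, m in enumerate(window_means(test_res, width))
        if abs(m) > pattern_threshold
    ]
    return pointwise, pattern
```

If the normal data alternates between 9 and 11, a test segment that stays at 10.8 for four samples produces residuals of only 0.8, smaller than the normal residuals of 1, so the size-based check misses it while the window-based second stage flags it.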
Shiyu TENG Jiaqing LIU Yue HUANG Shurong CHAI Tomoko TATEYAMA Xinyin HUANG Lanfen LIN Yen-Wei CHEN
Depression is a prevalent mental disorder affecting a significant portion of the global population, leading to considerable disability and contributing to the overall burden of disease. Consequently, designing efficient and robust automated methods for depression detection has become imperative. Recently, deep learning methods, especially multimodal fusion methods, have been increasingly used in computer-aided depression detection. Importantly, individuals with depression and those without respond differently to various emotional stimuli, providing valuable information for detecting depression. Building on these observations, we propose an intra- and inter-emotional stimulus transformer-based fusion model to effectively extract depression-related features. The intra-emotional stimulus fusion framework aims to prioritize different modalities, capitalizing on their diversity and complementarity for depression detection. The inter-emotional stimulus model maps each emotional stimulus onto both invariant and specific subspaces using individual invariant and specific encoders. The emotional stimulus-invariant subspace facilitates efficient information sharing and integration across different emotional stimulus categories, while the emotional stimulus specific subspace seeks to enhance diversity and capture the distinct characteristics of individual emotional stimulus categories. Our proposed intra- and inter-emotional stimulus fusion model effectively integrates multimodal data under various emotional stimulus categories, providing a comprehensive representation that allows accurate task predictions in the context of depression detection. We evaluate the proposed model on the Chinese Soochow University students dataset, and the results outperform state-of-the-art models in terms of concordance correlation coefficient (CCC), root mean squared error (RMSE) and accuracy.
Asahi YOSHIDA Yoshihide KATO Shigeki MATSUBARA
Negation scope resolution is the process of detecting the negated part of a sentence. Unlike the syntax-based approach employed in previous research, state-of-the-art methods perform better without the explicit use of syntactic structure. This work revisits the syntax-based approach and re-evaluates the effectiveness of syntactic structure in negation scope resolution. We replace the parser utilized in prior work with state-of-the-art parsers and modify the syntax-based heuristic rules. The experimental results demonstrate that these simple modifications raise the performance of the prior syntax-based method to the same level as state-of-the-art end-to-end neural methods.
We investigated the influence of horizontal shifts of the input images on one-stage object detection methods. We found that the class scores of the object detector drop when the center of the target object is at the grid boundary. Many approaches have focused on reducing the aliasing effect of down-sampling to achieve shift-invariance. However, down-sampling does not completely solve this problem at the grid boundary; it is necessary to suppress the dispersion of features in pixels close to the grid boundary into adjacent grid cells. Therefore, this paper proposes two approaches focused on the grid boundary to address this weak point of current object detection methods. One is the Sub-Grid Feature Extraction Module, in which sub-grid features are added to the input of the classification head. The other is Grid-Aware Data Augmentation, in which augmented data are generated by grid-level shifts and used in training. The effectiveness of the proposed approaches is demonstrated on the COCO validation set after applying them to the FCOS architecture.
Tomoki CHIBA Yusuke ASANO Masaharu TAKAHASHI
The proportion of persons over 65 years old is projected to increase worldwide between 2022 and 2050. The increasing burden on medical staff and the shortage of human resources are growing problems. Bedsores are injuries caused by prolonged pressure on the skin and stagnation of blood flow. The further the damage caused by bedsores progresses, the longer the treatment period becomes. Moreover, patients require surgery in some serious cases. Therefore, early detection is essential. In our research, we are developing a non-contact bedsore detection system using electromagnetic waves at 10.5 GHz. In this paper, we extracted appropriate information from a scalogram and utilized it to detect the sizes of bedsores. In addition, experiments using a phantom were conducted to confirm the basic operation of the bedsore detection system. As a result, using the approximate curves and lines obtained from prior analysis data, it was possible to estimate the volume of each defective area, as well as combinations of its depth and length. Moreover, the experiments showed that it was possible to detect the presence of bedsores and estimate their sizes, although the detection results had slight variations.