Shuoyan LIU Chao LI Yuxin LIU Yanqiu WANG
Escalators are an indispensable facility in public places. While they can provide convenience to people, abnormal accidents can lead to serious consequences. Yolo is a function that detects human behavior in real time. However, the model exhibits low accuracy and a high miss rate for small targets. To this end, this paper proposes the Small Target High Performance YOLO (SH-YOLO) model to detect abnormal behavior in escalators. The SH-YOLO model first enhances the backbone network through attention mechanisms. Subsequently, a small target detection layer is incorporated in order to enhance detection of key points for small objects. Finally, the conv and the SPPF are replaced with a Region Dynamic Perception Depth Separable Conv (DR-DP-Conv) and Atrous Spatial Pyramid Pooling (ASPP), respectively. The experimental results demonstrate that the proposed model is capable of accurately and robustly detecting anomalies in the real-world escalator scene.
Jiaxin WU Bing LI Li ZHAO Xinzhou XU
The task of Speech Emotion Detection (SED) aims at judging positive class and negetive class when the speaker expresses emotions. The SED performances are heavily dependent on the diversity and prominence of emotional features extracted from the speech. However, most of the existing related research focuses on investigating the effects of single feature source and hand-crafted features. Thus, we propose a SED approach using multi-source low-level information based recurrent branches. The fusion multi-source low-level information obtain variety and discriminative representations from speech emotion signals. In addition, focal-loss function benifit for imbalance classes, resulting in reducing the proportion of well-classified samples and increasing the weights for difficult samples on SED tasks. Experiments on IEMOCAP corpus demonstrate the effectiveness of the proposed method. Compared with the baselines, MSIR achieve the significant performance improvements in terms of Unweighted Average Recall and F1-score.
Rong WANG Changjun YU Zhe LYU Aijun LIU
To address the challenge of target signals being completely submerged by ionospheric clutter during typhoon passages, this letter proposes a chaotic detection method for target signals in the background of ionospheric noise under typhoon excitation. Experimental results demonstrate the effectiveness of the proposed method in detecting target signals with harmonic characteristics from strong ionospheric clutter during typhoon passages.
To fully exploit the attribute information in graphs and dynamically fuse the features from different modalities, this letter proposes the Attributed Graph Clustering Network with Adaptive Feature Fusion (AGC-AFF) for graph clustering, where an Attribute Reconstruction Graph Autoencoder (ARGAE) with masking operation learns to reconstruct the node attributes and adjacency matrix simultaneously, and an Adaptive Feature Fusion (AFF) mechanism dynamically fuses the features from different modules based on node attention. Extensive experiments on various benchmark datasets demonstrate the effectiveness of the proposed method.
Yuya ICHIKAWA Ayumu YAMADA Naoko MISAWA Chihiro MATSUI Ken TAKEUCHI
Integrating RGB and event sensors improves object detection accuracy, especially during the night, due to the high-dynamic range of event camera. However, introducing an event sensor leads to an increase in computational resources, which makes the implementation of RGB-event fusion multi-modal AI to CiM difficult. To tackle this issue, this paper proposes RGB-Event fusion Multi-modal analog Computation-in-Memory (CiM), called REM-CiM, for multi-modal edge object detection AI. In REM-CiM, two proposals about multi-modal AI algorithms and circuit implementation are co-designed. First, Memory capacity-Efficient Attentional Feature Pyramid Network (MEA-FPN), the model architecture for RGB-event fusion analog CiM, is proposed for parameter-efficient RGB-event fusion. Convolution-less bi-directional calibration (C-BDC) in MEA-FPN extracts important features of each modality with attention modules, while reducing the number of weight parameters by removing large convolutional operations from conventional BDC. Proposed MEA-FPN w/ C-BDC achieves a 76% reduction of parameters while maintaining mean Average Precision (mAP) degradation to < 2.3% during both day and night, compared with Attentional FPN fusion (A-FPN), a conventional BDC-adopted FPN fusion. Second, the low-bit quantization with clipping (LQC) is proposed to reduce area/energy. Proposed REM-CiM with MEA-FPN and LQC achieves almost the same memory cells, 21% less ADC area, 24% less ADC energy and 0.17% higher mAP than conventional FPN fusion CiM without LQC.
Rina TAGAMI Hiroki KOBAYASHI Shuichi AKIZUKI Manabu HASHIMOTO
Due to the revitalization of the semiconductor industry and efforts to reduce labor and unmanned operations in the retail and food manufacturing industries, objects to be recognized at production sites are increasingly diversified in color and design. Depending on the target objects, it may be more reliable to process only color information, while intensity information may be better, or a combination of color and intensity information may be better. However, there are not many conventional method for optimizing the color and intensity information to be used, and deep learning is too costly for production sites. In this paper, we optimize the combination of the color and intensity information of a small number of pixels used for matching in the framework of template matching, on the basis of the mutual relationship between the target object and surrounding objects. We propose a fast and reliable matching method using these few pixels. Pixels with a low pixel pattern frequency are selected from color and grayscale images of the target object, and pixels that are highly discriminative from surrounding objects are carefully selected from these pixels. The use of color and intensity information makes the method highly versatile for object design. The use of a small number of pixels that are not shared by the target and surrounding objects provides high robustness to the surrounding objects and enables fast matching. Experiments using real images have confirmed that when 14 pixels are used for matching, the processing time is 6.3 msec and the recognition success rate is 99.7%. The proposed method also showed better positional accuracy than the comparison method, and the optimized pixels had a higher recognition success rate than the non-optimized pixels.
Weizhi WANG Lei XIA Zhuo ZHANG Xiankai MENG
Smart contracts, as a form of digital protocol, are computer programs designed for the automatic execution, control, and recording of contractual terms. They permit transactions to be conducted without the need for an intermediary. However, the economic property of smart contracts makes their vulnerabilities susceptible to hacking attacks, leading to significant losses. In this paper, we introduce a smart contract timestamp vulnerability detection technique HomoDec based on code homogeneity. The core idea of this technique involves comparing the homogeneity between the code of the test smart contract and the existing smart contract vulnerability codes in the database to determine whether the tested code has a timestamp vulnerability. Specifically, HomoDec first explores how to vectorize smart contracts reasonably and efficiently, representing smart contract code as a high-dimensional vector containing features of code vulnerabilities. Subsequently, it investigates methods to determine the homogeneity between the test codes and the ones in vulnerability code base, enabling the detection of potential timestamp vulnerabilities in smart contract code.
Chunbo LIU Liyin WANG Zhikai ZHANG Chunmiao XIANG Zhaojun GU Zhi WANG Shuang WANG
Aiming at the problem that large-scale traffic data lack labels and take too long for feature extraction in network intrusion detection, an unsupervised intrusion detection method ACOPOD based on Adam asymmetric autoencoder and COPOD (Copula-Based Outlier Detection) algorithm is proposed. This method uses the Adam asymmetric autoencoder with a reduced structure to extract features from the network data and reduce the data dimension. Then, based on the Copula function, the joint probability distribution of all features is represented by the edge probability of each feature, and then the outliers are detected. Experiments on the published NSL-KDD dataset with six other traditional unsupervised anomaly detection methods show that ACOPOD achieves higher precision and has obvious advantages in running speed. Experiments on the real civil aviation air traffic management network dataset further prove that the method can effectively detect intrusion behavior in the real network environment, and the results are interpretable and helpful for attack source tracing.
Shenglei LI Haoran LUO Tengfei SHAO Reiko HISHIYAMA
Automatic detection and recognition systems have numerous applications in smart city implementation. Despite the accuracy and widespread use of device-based and optical methods, several issues remain. These include device limitations, environmental limitations, and privacy concerns. The FMWC sensor can overcome these issues to detect and track moving people accurately in commercial environments. However, single-chip mmWave sensor solutions might struggle to recognize standing and sitting people due to the necessary static removal module. To address these issues, we propose a real-time indoor people detection and tracking fusion system using mmWave radar and cameras. The proposed fusion system approaches an overall detection accuracy of 93.8% with a median position error of 1.7 m in a commercial environment. Compared to our single-chip mmWave radar solution addressing an overall accuracy of 83.5% for walking people, it performs better in detecting individual stillness, which may feed the security needs in retail. This system visualizes customer information, including trajectories and the number of people. It helps commercial environments prevent crowds during the COVID-19 pandemic and analyze customer visiting patterns for efficient management and marketing. Powered by an IoT platform, the system can be deployed in the cloud for easy large-scale implementation.
Jia-ji JIANG Hai-bin WAN Hong-min SUN Tuan-fa QIN Zheng-qiang WANG
In this paper, the Towards High Performance Voxel-based 3D Object Detection (Voxel-RCNN) three-dimensional (3D) point cloud object detection model is used as the benchmark network. Aiming at the problems existing in the current mainstream 3D point cloud voxelization methods, such as the backbone and the lack of feature expression ability under the bird’s-eye view (BEV), a high-performance voxel-based 3D object detection network (Reinforced Voxel-RCNN) is proposed. Firstly, a 3D feature extraction module based on the integration of inverted residual convolutional network and weight normalization is designed on the 3D backbone. This module can not only well retain more point cloud feature information, enhance the information interaction between convolutional layers, but also improve the feature extraction ability of the backbone network. Secondly, a spatial feature-semantic fusion module based on spatial and channel attention is proposed from a BEV perspective. The mixed use of channel features and semantic features further improves the network’s ability to express point cloud features. In the comparison of experimental results on the public dataset KITTI, the experimental results of this paper are better than many voxel-based methods. Compared with the baseline network, the 3D average accuracy and BEV average accuracy on the three categories of Car, Cyclist, and Pedestrians are improved. Among them, in the 3D average accuracy, the improvement rate of Car category is 0.23%, Cyclist is 0.78%, and Pedestrians is 2.08%. In the context of BEV average accuracy, enhancements are observed: 0.32% for the Car category, 0.99% for Cyclist, and 2.38% for Pedestrians. The findings demonstrate that the algorithm enhancement introduced in this study effectively enhances the accuracy of target category detection.
Shijie WANG Xuejiao HU Sheng LIU Ming LI Yang LI Sidan DU
Detecting key frames in videos has garnered substantial attention in recent years, it is a point-level task and has deep research value and application prospect in daily life. For instances, video surveillance system, video cover generation and highlight moment flashback all demands the technique of key frame detection. However, the task is beset by challenges such as the sparsity of key frame instances, imbalances between target frames and background frames, and the absence of post-processing method. In response to these problems, we introduce a novel and effective Temporal Interval Guided (TIG) framework to precisely localize specific frames. The framework is incorporated with a proposed Point-Level-Soft non-maximum suppression (PLS-NMS) post-processing algorithm which is suitable for point-level task, facilitated by the well-designed confidence score decay function. Furthermore, we propose a TIG-loss, exhibiting sensitivity to temporal interval from target frame, to optimize the two-stage framework. The proposed method can be broadly applied to key frame detection in video understanding, including action start detection and static video summarization. Extensive experimentation validates the efficacy of our approach on action start detection benchmark datasets: THUMOS’14 and Activitynet v1.3, and we have reached state-of-the-art performance. Competitive results are also demonstrated on SumMe and TVSum datasets for deep learning based static video summarization.
Akira KITAYAMA Goichi ONO Hiroaki ITO
Edge devices with strict safety and reliability requirements, such as autonomous driving cars, industrial robots, and drones, necessitate software verification on such devices before operation. The human cost and time required for this analysis constitute a barrier in the cycle of software development and updating. In particular, the final verification at the edge device should at least strictly confirm that the updated software is not degraded from the current it. Since the edge device does not have the correct data, it is necessary for a human to judge whether the difference between the updated software and the operating it is due to degradation or improvement. Therefore, this verification is very costly. This paper proposes a novel automated method for efficient verification on edge devices of an object detection AI, which has found practical use in various applications. In the proposed method, a target object existence detector (TOED) (a simple binary classifier) judges whether an object in the recognition target class exists in the region of a prediction difference between the AI’s operating and updated versions. Using the results of this TOED judgement and the predicted difference, an automated verification system for the updated AI was constructed. TOED was designed as a simple binary classifier with four convolutional layers, and the accuracy of object existence judgment was evaluated for the difference between the predictions of the YOLOv5 L and X models using the Cityscapes dataset. The results showed judgement with more than 99.5% accuracy and 8.6% over detection, thus indicating that a verification system adopting this method would be more efficient than simple analysis of the prediction differences.
Shinsuke IBI Takumi TAKAHASHI Hisato IWAI
This paper proposes a novel differential active self-interference canceller (DASIC) algorithm for asynchronous in-band full-duplex (IBFD) Gaussian filtered frequency shift keying (GFSK), which is designed for wireless Internet of Things (IoT). In IBFD communications, where two terminals simultaneously transmit and receive signals in the same frequency band, there is an extremely strong self-interference (SI). The SI can be mitigated by an active SI canceller (ASIC), which subtracts an interference replica based on channel state information (CSI) from the received signal. The challenging problem is the realization of asynchronous IBFD for wireless IoT in indoor environments. In the asynchronous mode, pilot contamination is induced by the non-orthogonality between asynchronous pilot sequences. In addition, the transceiver suffers from analog front-end (AFE) impairments, such as phase noise. Due to these impairments, the SI cannot be canceled entirely at the receiver, resulting in residual interference. To address the above issue, the DASIC incorporates the principle of the differential codec, which enables to suppress SI without the CSI estimation of SI owing to the differential structure. Also, on the premise of using an error correction technique, iterative detection and decoding (IDD) is applied to improve the detection capability while exchanging the extrinsic log-likelihood ratio (LLR) between the maximum a-posteriori probability (MAP) detector and the channel decoder. Finally, the validity of using the DASIC algorithm is evaluated by computer simulations in terms of the packet error rate (PER). The results clearly demonstrate the possibility of realizing asynchronous IBFD.
Xiangyu LI Ping RUAN Wei HAO Meilin XIE Tao LV
To achieve precise measurement without landing, the high-mobility vehicle-mounted theodolite needs to be leveled quickly with high precision and ensure sufficient support stability before work. After the measurement, it is also necessary to ensure that the high-mobility vehicle-mounted theodolite can be quickly withdrawn. Therefore, this paper proposes a hierarchical automatic leveling strategy and establishes a two-stage electromechanical automatic leveling mechanism model. Using coarse leveling of the first-stage automatic leveling mechanism and fine leveling of the second-stage automatic leveling mechanism, the model realizes high-precision and fast leveling of the vehicle-mounted theodolites. Then, the leveling control method based on repeated positioning is proposed for the first-stage automatic leveling mechanism. To realize the rapid withdrawal for high-mobility vehicle-mounted theodolites, the method ensures the coincidence of spatial movement paths when the structural parts are unfolded and withdrawn. Next, the leg static balance equation is constructed in the leveling state, and the support force detection method is discussed in realizing the stable support for vehicle-mounted theodolites. Furthermore, a mathematical model for “false leg” detection is established furtherly, and a “false leg” detection scheme based on the support force detection method is analyzed to significantly improve the support stability of vehicle-mounted theodolites. Finally, an experimental platform is constructed to perform the performance test for automatic leveling mechanisms. The experimental results show that the leveling accuracy of established two-stage electromechanical automatic leveling mechanism can reach 3.6″, and the leveling time is no more than 2 mins. The maximum support force error of the support force detection method is less than 15%, and the average support force error is less than 10%. In contrast, the maximum support force error of the drive motor torque detection method reaches 80.12%, and its leg support stability is much less than the support force detection method. The model and analysis method proposed in this paper can also be used for vehicle-mounted radar, vehicle-mounted laser measurement devices, vehicle-mounted artillery launchers and other types of vehicle-mounted equipment with high-precision and high-mobility working requirements.
Beibei LI Xun RAN Yiran LIU Wensheng LI Qingling DUAN
Fish skin color detection plays a critical role in aquaculture. However, challenges arise from image color cast and the limited dataset, impacting the accuracy of the skin color detection process. To address these issues, we proposed a novel fish skin color detection method, termed VH-YOLOv5s. Specifically, we constructed a dataset for fish skin color detection to tackle the limitation posed by the scarcity of available datasets. Additionally, we proposed a Variance Gray World Algorithm (VGWA) to correct the image color cast. Moreover, the designed Hybrid Spatial Pyramid Pooling (HSPP) module effectively performs multi-scale feature fusion, thereby enhancing the feature representation capability. Extensive experiments have demonstrated that VH-YOLOv5s achieves excellent detection results on the Plectropomus leopardus skin color dataset, with a precision of 91.7%, recall of 90.1%, mAP@0.5 of 95.2%, and mAP@0.5:0.95 of 57.5%. When compared to other models such as Centernet, AutoAssign, and YOLOX-s, VH-YOLOv5s exhibits superior detection performance, surpassing them by 2.5%, 1.8%, and 1.7%, respectively. Furthermore, our model can be deployed directly on mobile phones, making it highly suitable for practical applications.
Min GAO Gaohua CHEN Jiaxin GU Chunmei ZHANG
Wearing a mask correctly is an effective method to prevent respiratory infectious diseases. Correct mask use is a reliable approach for preventing contagious respiratory infections. However, when dealing with mask-wearing in some complex settings, the detection accuracy still needs to be enhanced. The technique for mask-wearing detection based on YOLOv7-Tiny is enhanced in this research. Distribution Shifting Convolutions (DSConv) based on YOLOv7-tiny are used instead of the 3×3 convolution in the original model to simplify computation and increase detection precision. To decrease the loss of coordinate regression and enhance the detection performance, we adopt the loss function Intersection over Union with Minimum Points Distance (MPDIoU) instead of Complete Intersection over Union (CIoU) in the original model. The model is introduced with the GSConv and VoVGSCSP modules, recognizing the model’s mobility. The P6 detection layer has been designed to increase detection precision for tiny targets in challenging environments and decrease missed and false positive detection rates. The robustness of the model is increased further by creating and marking a mask-wearing data set in a multi environment that uses Mixup and Mosaic technologies for data augmentation. The efficiency of the model is validated in this research using comparison and ablation experiments on the mask dataset. The results demonstrate that when compared to YOLOv7-tiny, the precision of the enhanced detection algorithm is improved by 5.4%, Recall by 1.8%, mAP@.5 by 3%, mAP@.5:.95 by 1.7%, while the FLOPs is decreased by 8.5G. Therefore, the improved detection algorithm realizes more real-time and accurate mask-wearing detection tasks.
Kai YU Wentao LYU Xuyi YU Qing GUO Weiqiang XU Lu ZHANG
The automatic defect detection for fabric images is an essential mission in textile industry. However, there are some inherent difficulties in the detection of fabric images, such as complexity of the background and the highly uneven scales of defects. Moreover, the trade-off between accuracy and speed should be considered in real applications. To address these problems, we propose a novel model based on YOLOv4 to detect defects in fabric images, called Feature Augmentation YOLO (FA-YOLO). In terms of network structure, FA-YOLO adds an additional detection head to improve the detection ability of small defects and builds a powerful Neck structure to enhance feature fusion. First, to reduce information loss during feature fusion, we perform the residual feature augmentation (RFA) on the features after dimensionality reduction by using 1×1 convolution. Afterward, the attention module (SimAM) is embedded into the locations with rich features to improve the adaptation ability to complex backgrounds. Adaptive spatial feature fusion (ASFF) is also applied to output of the Neck to filter inconsistencies across layers. Finally, the cross-stage partial (CSP) structure is introduced for optimization. Experimental results based on three real industrial datasets, including Tianchi fabric dataset (72.5% mAP), ZJU-Leaper fabric dataset (0.714 of average F1-score) and NEU-DET steel dataset (77.2% mAP), demonstrate the proposed FA-YOLO achieves competitive results compared to other state-of-the-art (SoTA) methods.
Gyulim KIM Hoojin LEE Xinrong LI Seong Ho CHAE
This letter studies the secrecy outage probability (SOP) and the secrecy diversity order of Alamouti STBC with decision feedback (DF) detection over the time-selective fading channels. For given temporal correlations, we have derived the exact SOPs and their asymptotic approximations for all possible combinations of detection schemes including joint maximum likehood (JML), zero-forcing (ZF), and DF at Bob and Eve. We reveal that the SOP is mainly influenced by the detection scheme of the legitimate receiver rather than eavesdropper and the achievable secrecy diversity order converges to two and one for JML only at Bob (i.e., JML-JML/ZF/DF) and for the other cases (i.e., ZF-JML/ZF/DF, DF-JML/ZF/DF), respectively. Here, p-q combination pair indicates that Bob and Eve adopt the detection method p ∈ {JML, ZF, DF} and q ∈ {JML, ZF, DF}, respectively.
Yaokun HU Xuanyu PENG Takeshi TODA
The subject must be motionless for conventional radar-based non-contact vital signs measurements. Additionally, the measurement range is limited by the design of the radar module itself. Although the accuracy of measurements has been improving, the prospects for their application could have been faster to develop. This paper proposed a novel radar-based adaptive tracking method for measuring the heart rate of the moving monitored person. The radar module is fixed on a circular plate and driven by stepping motors to rotate it. In order to protect the user’s privacy, the method uses radar signal processing to detect the subject’s position to control a stepping motor that adjusts the radar’s measurement range. The results of the fixed-route experiments revealed that when the subject was moving at a speed of 0.5 m/s, the mean values of RMSE for heart rate measurements were all below 2.85 beat per minute (bpm), and when moving at a speed of 1 m/s, they were all below 4.05 bpm. When subjects walked at random routes and speeds, the RMSE of the measurements were all below 6.85 bpm, with a mean value of 4.35 bpm. The average RR interval time of the reconstructed heartbeat signal was highly correlated with the electrocardiography (ECG) data, with a correlation coefficient of 0.9905. In addition, this study not only evaluated the potential effect of arm swing (more normal walking motion) on heart rate measurement but also demonstrated the ability of the proposed method to measure heart rate in a multiple-people scenario.
Qingqi ZHANG Xiaoan BAO Ren WU Mitsuru NAKATA Qi-Wei GE
Automatic detection of prohibited items is vital in helping security staff be more efficient while improving the public safety index. However, prohibited item detection within X-ray security inspection images is limited by various factors, including the imbalance distribution of categories, diversity of prohibited item scales, and overlap between items. In this paper, we propose to leverage the Poisson blending algorithm with the Canny edge operator to alleviate the imbalance distribution of categories maximally in the X-ray images dataset. Based on this, we improve the cascade network to deal with the other two difficulties. To address the prohibited scale diversity problem, we propose the Re-BiFPN feature fusion method, which includes a coordinate attention atrous spatial pyramid pooling (CA-ASPP) module and a recursive connection. The CA-ASPP module can implicitly extract direction-aware and position-aware information from the feature map. The recursive connection feeds the CA-ASPP module processed multi-scale feature map to the bottom-up backbone layer for further multi-scale feature extraction. In addition, a Rep-CIoU loss function is designed to address the overlapping problem in X-ray images. Extensive experimental results demonstrate that our method can successfully identify ten types of prohibited items, such as Knives, Scissors, Pressure, etc. and achieves 83.4% of mAP, which is 3.8% superior to the original cascade network. Moreover, our method outperforms other mainstream methods by a significant margin.