Wenkai LIU Cuizhu QIN Menglong WU Wenle BAI Hongxia DONG
Pose estimation is an active research topic in computer vision and is key to machine perception of human activity. The core idea of human pose estimation is to describe the motion of the human body through its major joint points. Large receptive fields and rich spatial information help keypoint localization, so a central challenge is how to capture features at larger scales and reintegrate them into the feature space. To address this problem, we propose a multi-scale convergence network (MSCNet) with a large receptive field and rich spatial information. MSCNet is built on an hourglass network that captures information at different scales to form a consistent understanding of the whole body. Multi-scale receptive field (MSRF) units provide a large receptive field to obtain rich contextual information, which the Squeeze-Excitation (SE) attention mechanism then selectively enhances or suppresses to perform pose estimation flexibly. Experimental results show that MSCNet scores 73.1% AP on the COCO dataset, an 8.8% improvement over the mainstream CMUPose method. Compared with the advanced CPN, MSCNet requires 68.2% of the computational complexity and only 55.4% of the parameters.
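The SE attention step mentioned in this abstract can be illustrated with a minimal sketch. It shows only the generic squeeze (global average pooling), excitation (two small fully connected layers) and channel re-scaling pattern. The function name `se_reweight`, the toy weights, and the omission of bias terms are assumptions made for illustration; none of this is MSCNet's actual configuration.

```python
import math

def se_reweight(feature_maps, w1, w2):
    """Generic squeeze-and-excitation re-weighting (illustrative, no biases).

    feature_maps: list of C channels, each an H x W list of lists.
    w1: hidden x C weight matrix (channel reduction).
    w2: C x hidden weight matrix (channel restoration).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_maps
    ]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: each channel is multiplied by its gate in (0, 1).
    scaled = [
        [[v * g for v in row] for row in ch]
        for ch, g in zip(feature_maps, gates)
    ]
    return scaled, gates

# Toy input: two 2x2 channels with hypothetical weights.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[3.0, 3.0], [3.0, 3.0]],
]
w1 = [[0.5, 0.5]]        # reduce 2 channels to 1 hidden unit
w2 = [[1.0], [-1.0]]     # expand back to 2 channel gates
scaled, gates = se_reweight(features, w1, w2)
print(gates)
```

With these toy weights the first channel receives a gate above 0.5 and the second a gate below 0.5, which is the "enhance or suppress" behaviour the abstract describes.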
Wenkai LIU Lin ZHANG Menglong WU Xichang CAI Hongxia DONG
The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal classification performance, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of “drunkards” in real life and the guiding behavior of sober people, and construct a methodology for building high-precision lightweight models, which we call the “drunkard methodology”. The core idea has three parts: (1) designing a special feature transformation module, based on the different ways drunkards and ordinary people perceive information, to simulate the gradual sobering-up process and the accompanying changes in feature perception; (2) developing a lightweight “drunken” model that matches the perception and processing pipeline of the normal model. This model uses a multi-scale class residual block structure and obtains finer feature representations by fusing information extracted at different scales; (3) introducing a module through which the conventional model guides and is fused with the “drunken” model, to speed up the sobering-up process and achieve iterative optimization and improved accuracy. Evaluation results on the official dataset of DCASE2022 Task1 show that our baseline system achieves 40.4% accuracy and a loss of 2.284 with 442.67K parameters and 19.40M MACs (multiply-accumulate operations). After adopting the “drunkard” mechanism, the accuracy improves to 45.2% and the loss is reduced by 0.634, with 551.89K parameters and 23.6M MACs.
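The trade-off reported in this abstract can be made explicit with a little arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative, and the derived values (accuracy gain in percentage points, resulting loss, and relative model growth) are computed here rather than stated by the authors.

```python
# Figures as reported in the abstract (DCASE2022 Task1).
baseline = {"accuracy_pct": 40.4, "loss": 2.284, "params_k": 442.67, "mac_m": 19.40}
drunkard = {"accuracy_pct": 45.2, "params_k": 551.89, "mac_m": 23.6}
loss_reduction = 0.634

# Derived quantities (not stated in the abstract itself).
accuracy_gain_points = drunkard["accuracy_pct"] - baseline["accuracy_pct"]
loss_after = baseline["loss"] - loss_reduction
param_ratio = drunkard["params_k"] / baseline["params_k"]
mac_ratio = drunkard["mac_m"] / baseline["mac_m"]

print(f"accuracy gain: {accuracy_gain_points:.1f} percentage points")
print(f"loss after:    {loss_after:.3f}")
print(f"parameters:    x{param_ratio:.3f}")
print(f"MACs:          x{mac_ratio:.3f}")
```

Read this way, the reported accuracy gain comes with roughly a quarter more parameters and about a fifth more multiply-accumulate operations.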
Menglong WU Cuizhu QIN Hongxia DONG Wenkai LIU Xiaodong NIE Xichang CAI Yundong LI
In many screen-to-camera (S2C) communication systems, barcode preprocessing is an essential prerequisite because barcodes may be deformed by various environmental factors. However, previous studies have focused on barcode detection under static conditions; to date, few studies have addressed dynamic conditions (for example, when the barcode is displayed as a video stream or when the transmitter and receiver are moving). We therefore present a detection and tracking method for dynamic barcodes based on a Siamese network. The CNN backbone of the Siamese network is improved with SE-ResNet. The detection accuracy reaches 89.5%, which is higher than that of other classical detection networks. The EAO reaches 0.384, which is better than previous tracking methods, and the method is also superior to other methods in terms of accuracy and robustness. The SE-ResNet used in this paper improves the EAO by 1.3% compared with the ResNet in SiamMask. Our method is applicable not only to static barcodes but also to real-time tracking and segmentation of barcodes captured in dynamic situations.