Haoran LUO Tengfei SHAO Shenglei LI Reiko HISHIYAMA
Makeup transfer is the process of applying the makeup style from one picture (reference) to another (source), allowing for the modification of characters’ makeup styles. To meet the diverse makeup needs of individuals or samples, the makeup transfer framework should accurately handle various makeup degrees, ranging from subtle to bold, and exhibit intelligence in adapting to the source makeup. This paper introduces a “3-level” adaptive makeup transfer framework, addressing facial makeup through two sub-tasks: 1. Makeup adaptation, utilizing feature descriptors and eyelid curve algorithms to classify 135 organ-level face shapes; 2. Makeup transfer, achieved by learning the reference picture from three branches (color, highlight, pattern) and applying it to the source picture. The proposed framework, termed “Face Shape Adaptive Makeup Transfer” (FSAMT), demonstrates superior results in makeup transfer output quality, as confirmed by experimental results.
This paper presents MDX-Mixer, which improves music demixing (MDX) performance by leveraging source signals separated by multiple existing MDX models. Deep-learning-based MDX models have improved their separation performances year by year for four kinds of sound sources: “vocals,” “drums,” “bass,” and “other”. Our research question is whether mixing (i.e., weighted sum) the signals separated by state-of-the-art MDX models can obtain either the best of everything or higher separation performance. Previously, in singing voice separation and MDX, there have been studies in which separated signals of the same sound source are mixed with each other using time-invariant or time-varying positive mixing weights. In contrast to those, this study is novel in that it allows for negative weights as well and performs time-varying mixing using all of the separated source signals and the music acoustic signal before separation. The time-varying weights are estimated by modeling the music acoustic signals and their separated signals by dividing them into short segments. In this paper we propose two new systems: one that estimates time-invariant weights using 1×1 convolution, and one that estimates time-varying weights by applying the MLP-Mixer layer proposed in the computer vision field to each segment. The latter model is called MDX-Mixer. Their performances were evaluated based on the source-to-distortion ratio (SDR) using the well-known MUSDB18-HQ dataset. The results show that the MDX-Mixer achieved higher SDR than the separated signals given by three state-of-the-art MDX models.
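The time-invariant case described above can be illustrated with a minimal sketch. It is not the authors' system: the weights here are fitted by ordinary least squares rather than learned with a 1×1 convolution, the signals are synthetic noise, and the names `sdr_db`, `fit_time_invariant_weights`, `est_a`, and `est_b` are hypothetical. The SDR is the basic energy-ratio definition, which may differ from the evaluation tool used in the paper. The sketch only shows that a weighted sum over several separated signals plus the unseparated mixture, with negative weights allowed, can score at least as well as any single input on the data it was fitted to.

```python
import numpy as np

def sdr_db(reference, estimate):
    """Basic SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    noise = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

def fit_time_invariant_weights(inputs, target):
    """Least-squares weights (one per input signal, sign unconstrained).

    inputs: array of shape (C, T), target: array of shape (T,).
    Assumption: least squares stands in for a trained 1x1 convolution.
    """
    weights, *_ = np.linalg.lstsq(inputs.T, target, rcond=None)
    return weights

rng = np.random.default_rng(0)
T = 4000
vocals = rng.standard_normal(T)          # synthetic target source
accompaniment = rng.standard_normal(T)   # synthetic interference
mixture = vocals + accompaniment         # signal before separation

# Two hypothetical separator outputs with different kinds of error.
est_a = vocals + 0.5 * accompaniment + 0.1 * rng.standard_normal(T)
est_b = vocals + 0.3 * rng.standard_normal(T)

inputs = np.stack([est_a, est_b, mixture])   # shape (3, T)
weights = fit_time_invariant_weights(inputs, vocals)
mixed = weights @ inputs                      # weighted sum over inputs

best_single = max(sdr_db(vocals, est_a), sdr_db(vocals, est_b))
mixed_sdr = sdr_db(vocals, mixed)
```

In this toy setup the weight on the mixture comes out negative, because subtracting part of the mixture cancels accompaniment that leaked into `est_a`.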
With the rapid advancement of graphics processing units (GPUs), Virtual Reality (VR) experiences have significantly improved, enhancing immersion and realism. However, these advancements also raise security concerns in VR. In this paper, I introduce a new attack leveraging known WebVR vulnerabilities to track the activities of VR users. The proposed attack leverages the user’s hand motion information exposed to web attackers, demonstrating the capability to identify consumed content, such as 3D images and videos, and pilfer private drawings created in a 3D drawing app. To achieve this, I employed a machine learning approach to process controller sensor data and devised techniques to extract sensitive activities during the use of target apps. The experimental results demonstrate that the viewed content in the targeted content viewer can be identified with 90% accuracy. Furthermore, I successfully obtained drawing outlines that precisely match the user’s original drawings without performance degradation, validating the effectiveness of the attack.
Ji XI Yue XIE Pengxu JIANG Wei JIANG
Currently, a significant portion of acoustic scene categorization (ASC) research is centered on Convolutional Neural Network (CNN) models. This preference is primarily due to the CNN’s ability to effectively extract time-frequency information from audio recordings of scenes when spectrum data are used as input. 2D spectrum characteristics allow information to be expressed in many dimensions. Nevertheless, because spectrum properties differ from picture qualities, the same object appearing at different positions on the spectrum map can be interpreted differently. The lack of distinction between different aspects of the input information in CNNs used for ASC may result in a decline in system performance. Considering this, a feature pyramid segmentation (FPS) approach based on CNN is proposed. The proposed approach uses spectrum features as the input for the model. These features are split based on a preset scale, and each segment-level feature is then fed into the CNN for learning. The high-level features output at all feature scales are fused and fed to a SoftMax classifier to categorize different scenes. The experiment provides evidence to support the efficacy of the FPS strategy and its potential to enhance the performance of the ASC system.
Hongliang FU Qianqian LI Huawei TAO Chunhua ZHU Yue XIE Ruxue GUO
Speech emotion recognition (SER) is a key research technology for realizing the third generation of artificial intelligence and is widely used in human-computer interaction, emotion diagnosis, interpersonal communication, and other fields. However, the aliasing of language and semantic information in speech tends to distort the alignment of emotion features, which degrades the performance of cross-corpus SER systems. This paper proposes a cross-corpus SER model based on causal emotion information representation (CEIR). The model uses the reconstruction loss of a deep autoencoder network and the source-domain label information to achieve a preliminary separation of causal features. Then, a causal correlation matrix is constructed and combined with local maximum mean discrepancy (LMMD) feature alignment to make the causal features of different dimensions independent in their joint distribution. Finally, supervised fine-tuning on labeled data is used to achieve effective extraction of causal emotion information. The experimental results show that the average unweighted average recall (UAR) of the proposed algorithm is increased by 3.4% to 7.01% compared with several recent algorithms in the field.
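For readers unfamiliar with the metric, the following is a minimal sketch of unweighted average recall: the per-class recall averaged with equal weight per class, so that frequent emotion classes do not dominate. The function name and the toy labels are hypothetical and are not taken from the paper.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, each class counted equally."""
    classes = sorted(set(y_true))
    recalls = []
    for cls in classes:
        # Indices of all samples whose true label is this class.
        idx = [i for i, label in enumerate(y_true) if label == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

# Toy example: a classifier that always predicts the majority class.
y_true = ["neutral", "neutral", "neutral", "neutral", "angry"]
y_pred = ["neutral", "neutral", "neutral", "neutral", "neutral"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
uar = unweighted_average_recall(y_true, y_pred)
```

Here plain accuracy is 0.8 while UAR is 0.5, which is why UAR is commonly preferred for class-imbalanced emotion corpora.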
Zhi LIU Heng WANG Yuan LI Hongyun LU Hongyuan JING Mengmeng ZHANG
In video-based point cloud compression (V-PCC), the partitioning of the Coding Unit (CU) has ultra-high computational complexity. The Just Noticeable Difference (JND) model is an effective metric to guide this process. However, this paper finds that the performance of the traditional JND model is degraded in V-PCC. For the attribute video, the pixel-filling operation reduces the brightness perception capability of the JND model. For the geometric video, the depth-filling operation degrades the depth perception capability of depth-based JND (JNDD) models in the boundary area. In this paper, a joint JND model (J_JND) is proposed for the attribute video to improve the brightness perception capacity, and an occupancy map guided JNDD model (O_JNDD) is proposed for the geometric video to improve the depth difference estimation accuracy of the boundaries. Based on the two improved JND models, a fast V-PCC CU partitioning algorithm is proposed with adaptive CU depth prediction. The experimental results show that the proposed algorithm eliminates 27.46% of total coding time at the cost of only 0.36% and 0.75% Bjontegaard Delta rate increment under the geometry Point-to-Point (D1) error and attribute Luma Peak Signal-to-Noise Ratio (PSNR), respectively.
Chang SUN Yitong LIU Hongwen YANG
Sparse-view CT reconstruction has gained significant attention due to the growing concerns about radiation safety. Although recent deep learning-based image domain reconstruction methods have achieved encouraging performance over iterative methods, effectively capturing intricate details and organ structures while suppressing noise remains challenging. This study presents a novel dual-stream encoder-decoder-based reconstruction network that combines global path reconstruction from the entire image with local path reconstruction from image patches. These two branches interact through an attention module, which enhances visual quality and preserves image details by learning correlations between image features and patch features. Visual and numerical results show that the proposed method has superior reconstruction capabilities to state-of-the-art 180-, 90-, and 45-view CT reconstruction methods.
Shuai LI Xinhong YOU Shidong ZHANG Mu FANG Pengping ZHANG
Emerging data-intensive services in the distribution grid impose requirements of high-concurrency access for massive internet of things (IoT) devices. However, the lack of effective high-concurrency access management results in severe performance degradation. To address this challenge, we propose a cloud-edge-device collaborative high-concurrency access management algorithm based on multi-timescale joint optimization of channel pre-allocation and load balancing degree. We formulate an optimization problem to minimize the weighted sum of edge-cloud load balancing degree and queuing delay under the constraint of access success rate. The problem is decomposed into a large-timescale channel pre-allocation subproblem solved by the device-edge collaborative access priority scoring mechanism, and a small-timescale data access control subproblem solved by the discounted empirical matching mechanism (DEM) with the perception of high-concurrency number and queue backlog. In particular, information uncertainty caused by externalities is tackled by exploiting discounted empirical performance, which accurately captures the performance influence of historical time points on the present preference value. Simulation results demonstrate the effectiveness of the proposed algorithm in reducing edge-cloud load balancing degree and queuing delay.
Yuto ARIMURA Shigeru YAMASHITA
Stochastic Computing (SC) allows additions and multiplications to be realized with lower power than conventional binary operations if we admit some errors. However, for many complex functions that cannot be realized by additions and multiplications alone, no generic efficient method is known for calculating the function with an SC circuit; such a function must be realized by a generic method such as polynomial approximation, which may lose the advantage of SC. Thus, there has been much research on efficient SC realizations of specific functions; for example, an efficient SC square root circuit with a feedback circuit was recently proposed by D. Wu et al. This paper generalizes the SC square root circuit with a feedback circuit; we identify a situation in which a function can be implemented efficiently by an SC circuit with a feedback circuit. As examples of our generalization, we propose SC circuits to calculate the n-th root and division. We also analyze the accuracy and the hardware costs of our SC circuits. Our results show the effectiveness of our method compared to the conventional SC designs; our framework may be able to implement an SC circuit that is better than the existing methods in terms of the hardware cost or the calculation error.
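The claim that SC realizes multiplication and addition cheaply, at the cost of some error, can be illustrated with a short software simulation. This is a sketch under the common unipolar encoding, where a value in [0, 1] is the probability of a 1 in a random bitstream. It is not the feedback circuit proposed in the paper, and the helper names `to_stream` and `from_stream` are hypothetical.

```python
import numpy as np

def to_stream(value, length, rng):
    """Encode a value in [0, 1] as a random bitstream with P(bit = 1) = value."""
    return rng.random(length) < value

def from_stream(bits):
    """Decode a bitstream by measuring the fraction of ones."""
    return float(bits.mean())

rng = np.random.default_rng(1)
length = 100_000
a, b = 0.6, 0.5

stream_a = to_stream(a, length, rng)
stream_b = to_stream(b, length, rng)

# Multiplication: one AND gate on two independent streams gives a * b.
product = from_stream(stream_a & stream_b)

# Scaled addition: a multiplexer driven by a 0.5 select stream gives (a + b) / 2.
select = to_stream(0.5, length, rng)
scaled_sum = from_stream(np.where(select, stream_a, stream_b))
```

The decoded results are only approximately 0.30 and 0.55. The error shrinks as the stream gets longer, which is the accuracy versus cost trade-off the abstract refers to.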
Kaoru TAKEMURE Yusuke SAKAI Bagus SANTOSO Goichiro HANAOKA Kazuo OHTA
The existing discrete-logarithm-based two-round multi-signature schemes without using the idealized model, i.e., the Algebraic Group Model (AGM), have quite a large reduction loss. This means that an implementation of these schemes requires an elliptic curve (EC) with a very large order for the standard 128-bit security when we consider concrete security. Indeed, the existing standardized ECs have orders too small to ensure 128-bit security of such schemes. Recently, Pan and Wagner proposed two two-round schemes based on the Decisional Diffie-Hellman (DDH) assumption (EUROCRYPT 2023). For 128-bit security in concrete security, the first scheme can use the NIST-standardized EC P-256 and the second can use P-384. However, with these parameter choices, they do not improve the signature size and the communication complexity over the existing non-tight schemes. Therefore, there is no two-round scheme that (i) can use a standardized EC for 128-bit security and (ii) has high efficiency. In this paper, we construct a two-round multi-signature scheme achieving both of them from the DDH assumption. We prove that an EC with at least a 321-bit order is sufficient for our scheme to ensure 128-bit security. Thus, we can use the NIST-standardized EC P-384 for 128-bit security. Moreover, the signature size and the communication complexity per signer of our proposed scheme under P-384 are 1152 bits and 1535 bits, respectively. These are the most efficient among the existing two-round schemes without using the AGM, including Pan-Wagner’s schemes and non-tight schemes which do not use the AGM. Our experiment on an ordinary machine shows that signing and verification can each be completed in about 65 ms with 100 signers. This shows that our scheme has sufficiently reasonable running time in practice.
Longye WANG Chunlin CHEN Xiaoli ZENG Houshan LIU Lingguo KONG Qingping YU Qingsong WANG
Spatial modulation (SM) is a type of multiple-input multiple-output (MIMO) technology that provides several benefits over traditional MIMO systems. SM-MIMO is characterized by its unique transmission principle, which results in lower costs, enhanced spectrum utilization, and reduced inter-channel interference. To optimize channel estimation performance over frequency-selective channels in the spatial modulation system, cross Z-complementary pairs (CZCPs) have been proposed as training sequences. The zero correlation zone (ZCZ) properties of CZCPs for auto-correlation sums and cross-correlation sums enable them to achieve optimal channel estimation performance. In this paper, we systematically construct CZCPs based on binary Golay complementary pairs (GCPs) and binary GCPs obtained via Turyn’s method. We employ a special matrix operation and concatenation method to obtain CZCPs with new lengths 2M + N and 2(M + L), where M and L are the lengths of binary GCPs, and N is the length of a binary GCP obtained via Turyn’s method. Further, we obtain a perfect CZCP with new length 4N and extend the lengths of CZCPs.
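As background for the building block named above, the following sketch checks the defining property of a binary Golay complementary pair: the aperiodic auto-correlations of the two sequences sum to zero at every nonzero shift. It also shows the classical length-doubling concatenation. It does not implement Turyn’s method or the CZCP constructions of the paper, and the function names are hypothetical.

```python
import numpy as np

def aperiodic_autocorr(seq):
    """Aperiodic auto-correlation of a real sequence at shifts 0..len-1."""
    seq = np.asarray(seq)
    n = len(seq)
    return np.array([np.sum(seq[: n - u] * seq[u:]) for u in range(n)])

def is_golay_pair(a, b):
    """True if the auto-correlation sums vanish at all nonzero shifts."""
    total = aperiodic_autocorr(a) + aperiodic_autocorr(b)
    return bool(np.all(total[1:] == 0))

def double_pair(a, b):
    """Classical doubling step: (a, b) -> (a|b, a|-b)."""
    a = np.asarray(a)
    b = np.asarray(b)
    return np.concatenate([a, b]), np.concatenate([a, -b])

a2, b2 = np.array([1, 1]), np.array([1, -1])   # length-2 Golay pair
a4, b4 = double_pair(a2, b2)                   # length-4 Golay pair
a8, b8 = double_pair(a4, b4)                   # length-8 Golay pair
```

A pair of identical sequences, such as `([1, 1, 1, 1], [1, 1, 1, 1])`, fails the check because its sidelobes add instead of cancelling.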
Yingzhong ZHANG Xiaoni DU Wengang JIN Xingbin QIAO
Boolean functions with a few Walsh spectral values have important applications in sequence ciphers and coding theory. In this paper, we first construct a class of Boolean functions with at most five-valued Walsh spectra by using the secondary construction of Boolean functions; in particular, plateaued functions are included. Then, we construct three classes of Boolean functions with five-valued Walsh spectra using Kasami functions and investigate the Walsh spectrum distributions of the new functions. Finally, three classes of minimal linear codes with five weights are obtained, which can be used to design secret sharing schemes with good access structures.
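To make the notion of a Walsh spectrum concrete, the sketch below evaluates the Walsh transform W_f(a) = Σ_x (−1)^{f(x) ⊕ a·x} by brute force for small n. The two example functions are standard textbook cases chosen for illustration. They are not the constructions of the paper, and the function names are hypothetical.

```python
from itertools import product

def walsh_spectrum(f, n):
    """Brute-force Walsh transform of f: {0,1}^n -> {0,1} at every point a."""
    points = list(product((0, 1), repeat=n))
    spectrum = []
    for a in points:
        total = 0
        for x in points:
            dot = sum(ai & xi for ai, xi in zip(a, x)) % 2  # inner product a.x over GF(2)
            total += (-1) ** (f(x) ^ dot)
        spectrum.append(total)
    return spectrum

def bent_example(x):
    """x0*x1 + x2*x3 on 4 variables: a bent function."""
    return (x[0] & x[1]) ^ (x[2] & x[3])

def plateaued_example(x):
    """x0*x1 + x2 on 3 variables: a plateaued function."""
    return (x[0] & x[1]) ^ x[2]

bent_values = set(walsh_spectrum(bent_example, 4))
plateaued_values = set(walsh_spectrum(plateaued_example, 3))
```

The bent example takes only the two values ±4, while the plateaued example takes values in {0, ±4}. Functions with "few" spectral values in the sense of the abstract generalize this behaviour.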
Wenjian WANG Zhi GU Avik Ranjan ADHIKARY Rong LUO
The auto-correlation property of Huffman sequences makes them good candidates for radar and communication systems. However, the high peak-to-average power ratio (PAPR) of Huffman sequences severely limits their application value. In this paper, we propose a novel algorithm to construct Huffman sequences with low PAPR. We use the roots of the polynomials corresponding to Huffman sequences of length M + 1 to construct Huffman sequences of length 2M + 1 with low PAPR.
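The PAPR figure of merit can be sketched as follows, assuming the element-wise definition: peak squared magnitude divided by mean squared magnitude of the sequence elements. Some works instead compute PAPR on an oversampled waveform, so this may not match the exact definition used in the paper. The function name `papr` and the toy sequences are hypothetical.

```python
import numpy as np

def papr(sequence):
    """Peak-to-average power ratio over the elements of a sequence."""
    power = np.abs(np.asarray(sequence, dtype=complex)) ** 2
    return float(power.max() / power.mean())

constant_modulus = [1, -1, 1j, -1j]   # every element has unit power
single_spike = [2, 0, 0, 0]           # all energy in one element

papr_flat = papr(constant_modulus)
papr_spike = papr(single_spike)
```

A constant-modulus sequence attains the minimum value 1, while concentrating all energy in one of four elements gives 4. A low-PAPR design aims to stay close to the former.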
This paper proposes a scheme for reducing pilot interference in cell-free massive multiple-input multiple-output (MIMO) systems through scalable access point (AP) selection and efficient pilot allocation using the Grey Wolf Optimizer (GWO). Specifically, we introduce a bidirectional large-scale fading-based (B-LSFB) AP selection method that builds high-quality connections benefiting both APs and UEs. Then, we limit the number of UEs that each AP can serve and encourage competition among UEs to improve the scalability of this approach. Additionally, we propose a grey wolf optimization based pilot allocation (GWOPA) scheme to minimize pilot contamination. Specifically, we first define a fitness function to quantify the level of pilot interference between UEs, and then construct dynamic interference relationships between any UE and its serving AP sets using a weighted fitness function to minimize pilot interference. The simulation results show that the B-LSFB strategy achieves scalability with performance similar to large-scale fading-based (LSFB) AP selection. Furthermore, the grey wolf optimization-based pilot allocation scheme significantly improves per-user net throughput with low complexity compared to four existing schemes.
Artificial intelligence and the introduction of Internet of Things technologies have benefited from technological advances and new automated computer system technologies. As a result, it is now possible to integrate them into a single offline industrial system. This is accomplished through machine-to-machine communication, which eliminates the human factor. The purpose of this article is to examine security systems for machine-to-machine communication systems that rely on identification and authentication algorithms for real-time monitoring. The article investigates security methods for quickly resolving data processing issues by using the Security Operations Center’s main machine to identify and authenticate devices from 19 different machines. The results indicate that when machines are running offline and performing various tasks, they can be exposed to data leaks and malware attacks by both the individual machine and the system as a whole. The study looks at the operation of 19 computers, 7 of which were subjected to data leakage and malware attacks. AnyLogic software is used to create visual representations of the results using wireless networks and algorithms based on previously processed methods. The W76S is used as a protective element within intelligent sensors due to its built-in memory protection. For 4 machines, the data leakage time with malware attacks was 70 s. For 10 machines, the duration was 150 s with 3 attacks. Machine 15 had the longest attack duration, lasting 190 s and involving 6 malware attacks, while machine 19 had the shortest attack duration, lasting 200 s and involving 7 malware attacks. The highest numbers indicated that attempting to hack a system increased the risk of damaging a device, potentially resulting in the entire system with connected devices failing. Thus, illegal attacks by attackers using malware may be identified over time, and data processing effects can be prevented by intelligent control.
The results reveal that applying identification and authentication methods using a protocol increases cyber-physical system security while also allowing real-time monitoring of offline system security.
Xiangyu LI Ping RUAN Wei HAO Meilin XIE Tao LV
To achieve precise measurement without landing, the high-mobility vehicle-mounted theodolite needs to be leveled quickly with high precision and ensure sufficient support stability before work. After the measurement, it is also necessary to ensure that the high-mobility vehicle-mounted theodolite can be quickly withdrawn. Therefore, this paper proposes a hierarchical automatic leveling strategy and establishes a two-stage electromechanical automatic leveling mechanism model. Using coarse leveling of the first-stage automatic leveling mechanism and fine leveling of the second-stage automatic leveling mechanism, the model realizes high-precision and fast leveling of vehicle-mounted theodolites. Then, a leveling control method based on repeated positioning is proposed for the first-stage automatic leveling mechanism. To realize rapid withdrawal of high-mobility vehicle-mounted theodolites, the method ensures the coincidence of spatial movement paths when the structural parts are unfolded and withdrawn. Next, the leg static balance equation is constructed in the leveling state, and the support force detection method is discussed for realizing stable support of vehicle-mounted theodolites. Furthermore, a mathematical model for “false leg” detection is established, and a “false leg” detection scheme based on the support force detection method is analyzed to significantly improve the support stability of vehicle-mounted theodolites. Finally, an experimental platform is constructed to test the performance of the automatic leveling mechanisms. The experimental results show that the leveling accuracy of the established two-stage electromechanical automatic leveling mechanism can reach 3.6″, and the leveling time is no more than 2 min. The maximum support force error of the support force detection method is less than 15%, and the average support force error is less than 10%.
In contrast, the maximum support force error of the drive motor torque detection method reaches 80.12%, and its leg support stability is much lower than that of the support force detection method. The model and analysis method proposed in this paper can also be used for vehicle-mounted radar, vehicle-mounted laser measurement devices, vehicle-mounted artillery launchers and other types of vehicle-mounted equipment with high-precision and high-mobility working requirements.
Pengxu JIANG Yang YANG Yue XIE Cairong ZOU Qingyun WANG
Convolutional neural networks (CNNs) are widely used in acoustic scene classification (ASC) tasks. In most cases, local convolution is utilized to gather time-frequency information between spectrum nodes. It is challenging to adequately express the non-local link between frequency domains in a finite convolution region. In this paper, we propose a dual-path convolutional neural network based on band interaction block (DCNN-bi) for ASC, with the mel-spectrogram as the model’s input. We build two parallel CNN paths to learn the high-frequency and low-frequency components of the input feature. Additionally, we have created three band interaction blocks (bi-blocks) to explore the pertinent nodes between various frequency bands, which are connected between two paths. Combining the time-frequency information from two paths, the bi-blocks with three distinct designs acquire non-local information and send it back to the respective paths. The experimental results indicate that the utilization of the bi-block has the potential to improve the initial performance of the CNN substantially. Specifically, when applied to the DCASE 2018 and DCASE 2020 datasets, the CNN exhibited performance improvements of 1.79% and 3.06%, respectively.
Changhui CHEN Haibin KAN Jie PENG Li WANG
Permutation polynomials have important applications in cryptography, coding theory and combinatorial designs. In this letter, we construct four classes of permutation polynomials over $\mathbb{F}_{2^n} \times \mathbb{F}_{2^n}$, where $\mathbb{F}_{2^n}$ is the finite field with $2^n$ elements.
Chao HE Xiaoqiong RAN Rong LUO
Cyclic codes are a subclass of linear codes and have applications in consumer electronics, data storage systems, and communication systems as they have efficient encoding and decoding algorithms. Let $C_{(t,e)}$ denote the cyclic code with two nonzeros $\alpha^t$ and $\alpha^e$, where $\alpha$ is a generator of $\mathbb{F}_{3^m}^*$. In this letter, we investigate the ternary cyclic codes with parameters $[3^m - 1, 3^m - 1 - 2m, 4]$ based on some results proposed by Ding and Helleseth in 2013. Two new classes of optimal ternary cyclic codes $C_{(t,e)}$ are presented by choosing the proper $t$ and $e$ and determining the solutions of certain equations over $\mathbb{F}_{3^m}$.
Ze Fu GAO Wen Ge YANG Yi Wen JIAO
Space is becoming increasingly congested and contested, which calls for effective means of monitoring high-value space assets, especially in Space Situational Awareness (SSA) missions; however, existing methods and their corresponding algorithms have imperfections. To overcome this problem, this letter proposes an algorithm for accurate Connected Element Interferometry (CEI) in SSA based on more interpolation information and iterations. Simulation results show that: (i) after iterations, the estimated asymptotic variance of the proposed method basically achieves uniform convergence, and its ratio to the ACRB is 1.00235 for δ_0 ∈ [-0.5, 0.5], which is closer to 1 than that of the current best AM algorithms; (ii) in the interval SNR ∈ [-14 dB, 0 dB], the estimation error of the proposed algorithm decreases significantly and is basically comparable to the CRLB (remaining at 1.236 times). The research in this letter could play a significant role in the effective monitoring and high-precision tracking and measurement of significant space targets during future SSA missions.