Congcong FANG Yun JIN Guanlin CHEN Yunfan ZHANG Shidang LI Yong MA Yue XIE
A growing number of speech emotion recognition tasks rely on the analysis of both speech and text features. However, little research has explored the potential of large language models such as GPT-3 to enhance emotion recognition. In this investigation, we use the GPT-3 model to extract semantic information from transcribed texts, generating text-modality features with a dimensionality of 1536. We then perform feature fusion, combining the 1536-dimensional text features with 1188-dimensional acoustic features to obtain multi-modal recognition results. The proposed method achieves a weighted accuracy of 79.62% across the four emotion categories in IEMOCAP, showing that integrating large language models considerably improves emotion recognition accuracy.
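The fusion and scoring steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the two modalities are fused, so simple concatenation is assumed here, and weighted accuracy is assumed to mean the fraction of correctly classified utterances over the whole test set. The function names `fuse_features` and `weighted_accuracy` are hypothetical.

```python
TEXT_DIM = 1536      # text feature size stated in the abstract
ACOUSTIC_DIM = 1188  # acoustic feature size stated in the abstract


def fuse_features(text_vec, acoustic_vec):
    """Fuse one utterance's features by concatenation (assumed fusion scheme)."""
    if len(text_vec) != TEXT_DIM:
        raise ValueError(f"expected {TEXT_DIM} text features, got {len(text_vec)}")
    if len(acoustic_vec) != ACOUSTIC_DIM:
        raise ValueError(
            f"expected {ACOUSTIC_DIM} acoustic features, got {len(acoustic_vec)}"
        )
    return list(text_vec) + list(acoustic_vec)


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by total utterances (assumed definition)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Placeholder vectors stand in for real GPT-3 and acoustic features.
fused = fuse_features([0.0] * TEXT_DIM, [1.0] * ACOUSTIC_DIM)
print(len(fused))  # 1536 + 1188 = 2724
```

Under this assumption each utterance is represented by a 2724-dimensional vector, which would then be passed to whatever classifier is used for the four emotion categories.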
Shidang LI Chunguo LI Yongming HUANG Dongming WANG Luxi YANG
Considering worst-case channel uncertainties, we investigate the robust energy-efficient (EE) beamforming design problem in a K-user multiple-input-single-output (MISO) interference channel. Our objective is to maximize the worst-case sum EE under individual transmit power constraints. In general, this fractional programming problem is NP-hard to solve optimally. To gain insight into the problem, we first transform the original problem into a lower-bound problem in max-min fractional form by exploiting the relationship between the user rate and the minimum mean square error (MMSE) and by using the min-max inequality. To make it tractable, we convert the fractional form into a subtractive form using the Dinkelbach transformation, and then propose an iterative algorithm based on Lagrangian duality, which leads to a locally optimal solution. Simulation results demonstrate that our proposed robust EE beamforming scheme outperforms the conventional algorithm.
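The Dinkelbach step mentioned above can be illustrated on a toy problem. The sketch below is not the paper's robust MISO algorithm: it assumes a single link with channel gain `h`, circuit power `p_c`, and power budget `p_max` (all hypothetical parameters), and maximizes the energy efficiency log2(1 + h p) / (p + p_c). The ratio is replaced by the subtractive objective log2(1 + h p) - lam (p + p_c), whose maximizer has a closed form, and `lam` is updated until the subtractive objective reaches zero.

```python
import math


def rate(p, h):
    """Achievable rate in bit/s/Hz for transmit power p and channel gain h."""
    return math.log2(1.0 + h * p)


def solve_inner(lam, h, p_max):
    """Maximize rate(p) - lam * (p + p_c) over 0 <= p <= p_max (closed form)."""
    if lam <= 0.0:
        return p_max  # with no power penalty the objective increases in p
    # Stationary point of the concave objective, clipped to the feasible range.
    p = 1.0 / (lam * math.log(2.0)) - 1.0 / h
    return min(max(p, 0.0), p_max)


def dinkelbach(h, p_c, p_max, tol=1e-9, max_iter=100):
    """Return (power, energy efficiency) for the toy single-link problem."""
    lam = 0.0
    p = p_max
    for _ in range(max_iter):
        p = solve_inner(lam, h, p_max)
        gap = rate(p, h) - lam * (p + p_c)  # value of the subtractive problem
        lam = rate(p, h) / (p + p_c)        # Dinkelbach update of the ratio
        if abs(gap) < tol:
            break
    return p, lam


p_opt, ee_opt = dinkelbach(h=2.0, p_c=1.0, p_max=10.0)
print(p_opt, ee_opt)
```

In the paper's setting the inner subtractive problem is itself a robust beamforming problem handled with Lagrangian duality; here it collapses to a one-line formula only because the toy objective is concave in a single scalar.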