
Author Search Result

[Author] Lin LI(25hit)

1-20hit(25hit)

  • Speech Emotion Recognition Using Multihead Attention in Both Time and Feature Dimensions

    Yue XIE  Ruiyu LIANG  Zhenlin LIANG  Xiaoyan ZHAO  Wenhao ZENG  

     
    LETTER-Speech and Hearing

      Publicized:
    2023/02/21
      Vol:
    E106-D No:5
      Page(s):
    1098-1101

    To enhance emotion features and improve the performance of speech emotion recognition, an attention mechanism is employed to identify the important information in both the time and feature dimensions. In the time dimension, multi-head attention is modified with the last state of the long short-term memory (LSTM) output to match the time-accumulation characteristic of LSTM. In the feature dimension, scaled dot-product attention is replaced with additive attention, which follows the state-update method of LSTM, to construct multi-head attention. This means that a nonlinear change replaces the linear mapping in classical multi-head attention. Experiments on the IEMOCAP dataset demonstrate that the attention mechanism can enhance emotional information and improve the performance of speech emotion recognition.

  • An Efficient Mapping Scheme on Neural Networks for Linear Massive MIMO Detection

    Lin LI  Jianhao HU  

     
    PAPER-Digital Signal Processing

      Publicized:
    2023/05/19
      Vol:
    E106-A No:11
      Page(s):
    1416-1423

    For massive multiple-input multiple-output (MIMO) communication systems, simple linear detectors such as zero forcing (ZF) and minimum mean square error (MMSE) can achieve near-optimal detection performance with reduced computational complexity. However, such linear detectors always involve complicated matrix inversion, which incurs high computational overhead in practical implementations. Owing to its massively parallel processing and efficient hardware implementation, the neural network has become a promising approach to signal processing for future wireless communications. In this paper, we first propose an efficient neural network, termed the PINN, that calculates the pseudo-inverse of any type of matrix based on an improved Newton's method. Through detailed analysis and derivation, the linear massive MIMO detectors are mapped onto PINNs, which can take full advantage of the research achievements of neural networks in both algorithms and hardware. Furthermore, an improved limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) quasi-Newton method is studied as the learning algorithm of PINNs to achieve a better performance/complexity trade-off. Simulation results validate the efficiency of the proposed scheme.
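As a rough illustration of how a Newton-type iteration can approximate a pseudo-inverse, the sketch below uses the classical Newton-Schulz update X <- X(2I - AX). This is an assumption made for illustration only: it is not the improved method or the PINN mapping from the paper, and the function names (`matmul`, `transpose`, `newton_pinv`) and the iteration count are hypothetical choices.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def newton_pinv(a, iters=50):
    """Approximate the Moore-Penrose pseudo-inverse of a (m x n).

    Classical Newton-Schulz iteration, not the paper's improved variant.
    The start X0 = A^T / (||A||_1 * ||A||_inf) keeps the spectrum of A X0
    inside (0, 1], which is the usual sufficient condition for convergence.
    """
    m = len(a)
    n = len(a[0])
    norm_1 = max(sum(abs(a[i][j]) for i in range(m)) for j in range(n))
    norm_inf = max(sum(abs(v) for v in row) for row in a)
    x = [[v / (norm_1 * norm_inf) for v in row] for row in transpose(a)]
    for _ in range(iters):
        ax = matmul(a, x)  # m x m
        correction = [[(2.0 if i == j else 0.0) - ax[i][j] for j in range(m)]
                      for i in range(m)]
        x = matmul(x, correction)  # n x m
    return x
```

Each step uses only matrix products, which is the property that makes this family of iterations attractive for parallel hardware; a ZF detector would then apply the resulting matrix to the received vector.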

  • On Algebraic Property of T-Functions

    Ruilin LI  Bing SUN  Chao LI  Shaojing FU  

     
    LETTER

      Vol:
    E95-A No:1
      Page(s):
    267-269

    A T-function is a kind of cryptographic function that has been shown to be useful in various applications. It is known that any function f on $\mathbb{F}_{2^n}$ or $\mathbb{Z}_{2^n}$ automatically induces a unique polynomial $f_F \in \mathbb{F}_{2^n}[x]$ of degree $\le 2^n-1$. In this letter, we study an algebraic property of $f_F$ when f is a T-function. We prove that for a single-cycle T-function f on $\mathbb{F}_{2^n}$ or $\mathbb{Z}_{2^n}$, $\deg f_F = 2^n-2$, which is optimal for a permutation. We also consider a T-function widely used in many cryptographic algorithms, namely the modular addition function $A_b(x) = x+b \in \mathbb{Z}_{2^n}[x]$. We demonstrate how to calculate $\deg (A_b)_F$ from the constant value b. These results can help evaluate the immunity of T-function-based cryptosystems against known attacks such as the interpolation attack and the integral attack.
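The two properties named in this abstract can be checked by brute force for small word sizes. The sketch below relies on the standard definition of a T-function (output bit i depends only on input bits 0 through i) and tests modular addition against it. The helper names are hypothetical and the code does not compute the polynomial degree studied in the letter.

```python
def modular_addition(b, n):
    """Return the map x -> (x + b) mod 2^n on n-bit words."""
    return lambda x: (x + b) % (1 << n)


def is_t_function(f, n):
    """Check that the low i+1 output bits depend only on the low i+1 input bits."""
    for i in range(n):
        mask = (1 << (i + 1)) - 1
        seen = {}
        for x in range(1 << n):
            low_in, low_out = x & mask, f(x) & mask
            # Two inputs sharing low bits must share the low output bits.
            if seen.setdefault(low_in, low_out) != low_out:
                return False
    return True


def is_single_cycle(f, n):
    """Check that iterating f from 0 visits all 2^n words before returning."""
    size = 1 << n
    x, steps = f(0), 1
    while x != 0 and steps <= size:
        x = f(x)
        steps += 1
    return x == 0 and steps == size
```

For example, `is_t_function(modular_addition(5, 4), 4)` holds, while a right shift fails the check because its lowest output bit depends on a higher input bit.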

  • High Quality and Low Complexity Speech Analysis/Synthesis Based on Sinusoidal Representation

    Jianguo TAN  Wenjun ZHANG  Peilin LIU  

     
    LETTER-Speech and Hearing

      Vol:
    E88-D No:12
      Page(s):
    2893-2896

    Sinusoidal representation has been widely applied to speech modification and to low-bit-rate speech and audio coding. Usually, the speech signal is analyzed and synthesized using the overlap-add algorithm or the peak-picking algorithm. However, the overlap-add algorithm is well known for its high computational complexity, and the peak-picking algorithm cannot track transient and syllabic variation well. In this letter, both algorithms are applied to speech analysis/synthesis. Peaks are picked in the power spectral density curve of the speech signal, and the frequencies corresponding to these peaks are arranged in descending order of their power spectral densities. These frequencies are regarded as candidate frequencies, for which the corresponding amplitudes and initial phases are determined according to the least mean square error criterion. The summation of the extracted sinusoidal components is used to successively approximate the original speech signal. The results show that the proposed algorithm can track transient and syllabic variation and can attain good synthesized speech with low computational complexity.
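The amplitude and phase fit described above can be sketched as a small least-squares problem per candidate frequency, followed by subtracting each fitted component from the residual. This is only a sketch under assumptions: peak picking on the power spectral density is omitted, the candidate frequencies (in radians per sample, strictly between 0 and pi) are supplied by the caller, and the function names are hypothetical.

```python
import math


def fit_sinusoid(signal, omega):
    """Least-squares fit of amp * cos(omega * n + phase) to the signal."""
    c = [math.cos(omega * n) for n in range(len(signal))]
    s = [math.sin(omega * n) for n in range(len(signal))]
    cc = sum(v * v for v in c)
    ss = sum(v * v for v in s)
    cs = sum(u * v for u, v in zip(c, s))
    xc = sum(x * v for x, v in zip(signal, c))
    xs = sum(x * v for x, v in zip(signal, s))
    # Solve the 2x2 normal equations for x[n] ~ a*cos(omega*n) + b*sin(omega*n).
    det = cc * ss - cs * cs
    a = (xc * ss - xs * cs) / det
    b = (xs * cc - xc * cs) / det
    # a = amp*cos(phase), b = -amp*sin(phase)
    return math.hypot(a, b), math.atan2(-b, a)


def greedy_sinusoidal_model(signal, omegas):
    """Fit one sinusoid per candidate frequency, subtracting each in turn."""
    residual = list(signal)
    components = []
    for omega in omegas:
        amp, phase = fit_sinusoid(residual, omega)
        components.append((omega, amp, phase))
        residual = [r - amp * math.cos(omega * n + phase)
                    for n, r in enumerate(residual)]
    return components, residual
```

Passing the frequencies in descending order of spectral power mirrors the ordering described in the abstract, so the strongest components are removed from the residual first.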

  • Joint AP Selection and Grey Wolf Optimization Based Pilot Design for Cell-Free Massive MIMO Systems Open Access

    Zelin LIU  Fangmin XU  

     
    PAPER-Communication Theory and Signals

      Publicized:
    2023/10/26
      Vol:
    E107-A No:7
      Page(s):
    1011-1018

    This paper proposes a scheme for reducing pilot interference in cell-free massive multiple-input multiple-output (MIMO) systems through scalable access point (AP) selection and efficient pilot allocation using the Grey Wolf Optimizer (GWO). Specifically, we introduce a bidirectional large-scale fading-based (B-LSFB) AP selection method that builds high-quality connections benefiting both APs and UEs. Then, we limit the number of UEs that each AP can serve and encourage competition among UEs to improve the scalability of this approach. Additionally, we propose a grey wolf optimization based pilot allocation (GWOPA) scheme to minimize pilot contamination. Specifically, we first define a fitness function to quantify the level of pilot interference between UEs, and then construct dynamic interference relationships between any UE and its serving AP sets using a weighted fitness function to minimize pilot interference. The simulation results show that the B-LSFB strategy achieves scalability with performance similar to large-scale fading-based (LSFB) AP selection. Furthermore, the grey wolf optimization-based pilot allocation scheme significantly improves per-user net throughput with low complexity compared to four existing schemes.

  • Attention-Based Dense LSTM for Speech Emotion Recognition Open Access

    Yue XIE  Ruiyu LIANG  Zhenlin LIANG  Li ZHAO  

     
    LETTER-Pattern Recognition

      Publicized:
    2019/04/17
      Vol:
    E102-D No:7
      Page(s):
    1426-1429

    Despite its widespread use for speech emotion recognition, deep learning is severely restricted by the information loss in the high layers of deep neural networks, as well as by the degradation problem. In order to utilize information efficiently and solve degradation, attention-based dense long short-term memory (LSTM) is proposed for speech emotion recognition. LSTM networks, which can process time series such as speech, are constructed, and attention-based dense connections are introduced into them. That is, weight coefficients are added to the skip-connections of each layer to distinguish the difference in emotional information between layers and to avoid interference from redundant information in the bottom layer with the effective information in the top layer. The experiments demonstrate that the proposed method improves the recognition performance by 12% and 7% on the eNTERFACE and IEMOCAP corpora, respectively.

  • Latent Influence Based Self-Attention Framework for Heterogeneous Network Embedding

    Yang YAN  Qiuyan WANG  Lin LIU  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2022/03/24
      Vol:
    E105-D No:7
      Page(s):
    1335-1339

    In recent years, Graph Neural Networks have received enormous attention from academia for their huge potential in modeling network traits such as macrostructure and single-node attributes. However, prior mainstream works mainly focus on homogeneous networks and lack the capacity to characterize the heterogeneous properties of networks. Besides, most previous literature cannot model the latent influence link at a microscopic level, making it infeasible to model the joint relation between the heterogeneity and the mutual interaction within multiple relation types. In this letter, we propose a latent influence based self-attention framework to address the difficulties mentioned above. To model the heterogeneity and mutual interactions, we redesign the attention mechanism with a latent influence factor on the single-type relation level, which learns the importance coefficient from its adjacent neighbors under the same meta-path based patterns. To incorporate the heterogeneous meta-paths in a unified dimension, we develop a novel self-attention based framework for meta-path relation fusion according to the learned meta-path coefficient. Our experimental results demonstrate that our framework not only achieves higher results than current state-of-the-art baselines, but also shows promise in depicting heterogeneous interactive relations under complicated network structures.

  • Weighted Gradient Pretrain for Low-Resource Speech Emotion Recognition

    Yue XIE  Ruiyu LIANG  Xiaoyan ZHAO  Zhenlin LIANG  Jing DU  

     
    LETTER-Speech and Hearing

      Publicized:
    2022/04/04
      Vol:
    E105-D No:7
      Page(s):
    1352-1355

    To alleviate the dependency on the quantity of training sample data in speech emotion recognition, a weighted gradient pre-train algorithm for low-resource speech emotion recognition is proposed. Multiple public emotion corpora are used for pre-training to generate shared hidden layer (SHL) parameters with generalization ability. The parameters are used to initialize the downstream network of the recognition task for the low-resource dataset, thereby improving the recognition performance on low-resource emotion corpora. However, the emotion categories differ among the public corpora, and the number of samples varies greatly, which increases the difficulty of joint training on multiple emotion datasets. To this end, a weighted gradient (WG) algorithm is proposed to enable the shared layer to learn the generalized representation of different datasets without affecting the priority of the emotion recognition on each corpus. Experiments show that the accuracy is improved by using CASIA, IEMOCAP, and eNTERFACE as the known datasets to pre-train the emotion models of GEMEP, and the performance could be improved further by combining WG with a gradient reversal layer.

  • Analytical Drain Current Modeling of Dual-Material Surrounding-Gate MOSFETs

    Zunchao LI  Jinpeng XU  Linlin LIU  Feng LIANG  Kuizhi MEI  

     
    PAPER-Semiconductor Materials and Devices

      Vol:
    E94-C No:6
      Page(s):
    1120-1126

    The asymmetrical halo and dual-material gate structure is used in the surrounding-gate metal-oxide-semiconductor field effect transistor (MOSFET) to improve the performance. By treating the device as three surrounding-gate MOSFETs connected in series and maintaining current continuity, a comprehensive drain current model is developed for it. The model incorporates not only channel length modulation and impact ionization effects, but also the influence of doping concentration and vertical electric field distributions. It is concluded that the device exhibits increased current drivability and improved hot carrier reliability. The derived analytical model is verified with numerical simulation.

  • Data Detection for OFDM Systems with Phase Noise and Channel Estimation Errors Using Variational Inference

    Feng LI  Shuyuan LI  Hailin LI  

     
    PAPER-Communication Theory and Signals

      Vol:
    E100-A No:4
      Page(s):
    1037-1044

    This paper studies a novel iterative detection algorithm for data detection in orthogonal frequency division multiplexing systems in the presence of phase noise (PHN) and channel estimation errors. By simplifying the maximum a posteriori algorithm based on the theory of variational inference, an optimization problem over variational free energy is formulated. After that, the estimation of data, PHN and channel state information is obtained jointly and iteratively. The simulations indicate the validity of this algorithm and show a better performance compared with the traditional schemes.

  • Handwritten Numeral String Recognition: Effects of Character Normalization and Feature Extraction

    Cheng-Lin LIU  Hiroshi SAKO  Hiromichi FUJISAWA  

     
    PAPER-String Recognition

      Vol:
    E88-D No:8
      Page(s):
    1791-1798

    The performance of integrated segmentation and recognition of handwritten numeral strings relies on the classification accuracy and the non-character resistance of the underlying character classifier, which varies depending on the techniques of pattern normalization, feature extraction, and classifier structure. In this paper, we evaluate the effects of 12 normalization functions and four selected feature types on numeral string recognition. Slant correction (deslant) is combined with the normalization functions and features so as to create 96 feature vectors, which are classified using two classifier structures. In experiments on numeral string images of the NIST Special Database 19, the classifiers have yielded very high string recognition accuracies. We show the superiority of moment normalization with adaptive aspect ratio mapping and the gradient direction feature, and observe that slant correction is beneficial to string recognition when combined with good normalization methods.

  • Fast Prediction Unit Selection and Mode Selection for HEVC Intra Prediction

    Heming SUN  Dajiang ZHOU  Peilin LIU  Satoshi GOTO  

     
    PAPER

      Vol:
    E97-A No:2
      Page(s):
    510-519

    As a next-generation video compression standard, High Efficiency Video Coding (HEVC) achieves enhanced coding performance relative to prior standards such as H.264/AVC. In the new standard, the improved intra prediction plays an important role in bit rate saving. Meanwhile, it also involves significantly increased complexity, due to the adoption of a highly flexible coding unit structure and a large number of angular prediction modes. In this paper, we present a low-complexity intra prediction algorithm for HEVC. We first propose a fast preprocessing stage based on a simplified cost model. Based on its results, a fast prediction unit selection scheme reduces the number of prediction unit (PU) levels that require fine processing from 5 to 2. To supply the PU size decision with appropriate thresholds, a fast training method is also designed. Still based on the preprocessing results, an efficient mode selection scheme reduces the maximum number of angular modes to evaluate from 35 to 8. This achieves further algorithm acceleration by eliminating the need to perform fine Hadamard cost calculation. We also propose a 32×32 PU compensation scheme to alleviate the mismatch of cost functions for large transform units, which effectively improves coding performance for high-resolution sequences. In comparison with HM 7.0, the proposed algorithm achieves over 50% complexity reduction in terms of encoding time, with the corresponding bit rate increase lower than 2.0%. Moreover, the achieved complexity reduction is relatively stable and independent of sequence characteristics.

  • Compressive Sensing of Audio Signal via Structured Shrinkage Operators

    Sumxin JIANG  Rendong YING  Peilin LIU  Zhenqi LU  Zenghui ZHANG  

     
    PAPER-Digital Signal Processing

      Vol:
    E97-A No:4
      Page(s):
    923-930

    This paper describes a new method for lossy audio signal compression via compressive sensing (CS). In this method, a structured shrinkage operator is employed to decompose the audio signal into three layers, with two sparse layers, tonal and transient, and additive noise, and then, both the tonal and transient layers are compressed using CS. Since the shrinkage operator is able to take into account the structure information of the coefficients in the transform domain, it is able to achieve a better sparse approximation of the audio signal than traditional methods do. In addition, we propose a sparsity allocation algorithm, which adjusts the sparsity between the two layers, thus improving the performance of CS. Experimental results demonstrated that the new method provided a better compression performance than conventional methods did.

  • An Efficient Test and Repair Flow for Yield Enhancement of One-Time-Programming NROM-Based ROMs

    Tsu-Lin LI  Masaki HASHIZUME  Shyue-Kung LU  

     
    LETTER

      Vol:
    E96-D No:9
      Page(s):
    2026-2030

    NROM is one of the emerging non-volatile-memory technologies, which is promising for replacing current floating-gate-based non-volatile memory such as flash memory. In order to raise the fabrication yield and enhance its reliability, a novel test and repair flow is proposed in this paper. Instead of the conventional fault replacement techniques, a novel fault masking technique is also exploited by considering the logical effects of physical defects when the customer's code is to be programmed. In order to maximize the possibilities of fault masking, a novel data inversion technique is proposed. The corresponding BIST architectures are also presented. According to experimental results, the repair rate and fabrication yield can be improved significantly. Moreover, the incurred hardware overhead is almost negligible.

  • Antenna Array Self-Calibration Algorithm with Location Errors for MUSIC

    Jian BAI  Lin LIU  Xiaoyang ZHANG  

     
    LETTER-Digital Signal Processing

      Publicized:
    2022/04/20
      Vol:
    E105-A No:10
      Page(s):
    1421-1424

    The characteristics of an antenna array, such as sensor location, gain, and phase response, are rarely perfectly known in realistic situations. Location errors usually have a serious impact on DOA (direction of arrival) estimation. In this paper, a novel array location calibration method for the MUSIC (multiple signal classification) algorithm based on the virtual interpolated array is proposed. First, the paper introduces the antenna array positioning scheme. Then, the self-calibration algorithm of the FIR-Wiener filter based on the virtual interpolated array is derived, and its application restrictions are also analyzed. Finally, by simulating different location errors of the antenna array, the effectiveness of the proposed method is validated.

  • Development of Program Difference Tool Based on Tree Mapping

    Lin LIAN  Minoru AIZAWA  Katsuro INOUE  Koji TORII  

     
    PAPER-Software Systems

      Vol:
    E78-D No:10
      Page(s):
    1261-1268

    In the program development process, it is often necessary for programmers to know the differences between two programs, or two different versions of a program. Since programs have structures such as iteration statements and selection statements, applying text-based tools such as UNIX diff to identify the differences may produce unsatisfactory results. In this paper, we exploit a tree as the internal representation of a program, obtain the mapping between two trees, and display the program differences visually based on the mapping and a pretty-printing technique so that the structural differences can be identified immediately.

  • A High Performance and Low Bandwidth Multi-Standard Motion Compensation Design for HD Video Decoder

    Xianmin CHEN  Peilin LIU  Dajiang ZHOU  Jiayi ZHU  Xingguang PAN  Satoshi GOTO  

     
    PAPER

      Vol:
    E93-C No:3
      Page(s):
    253-260

    Motion compensation is widely used in many video coding standards. Due to its bandwidth requirement and complexity, motion compensation is one of the most challenging parts in the design of a high-definition video decoder. In this paper, we propose a high performance and low bandwidth motion compensation design, which supports the H.264/AVC, MPEG-1/2 and Chinese AVS standards. We introduce a 2-dimensional cache that can greatly reduce the external bandwidth requirement. Similarities among the 3 standards are also explored to reduce hardware cost. We also propose a block-pipelining strategy to conceal the long latency of external memory access. Experimental results show that our motion compensation design can reduce the bandwidth by 74% on average and can decode a 1920x1088@30 fps video stream in real time at 80 MHz.

  • A Low-Power MPEG-4 Codec LSI for Mobile Video Application

    Peilin LIU  Li JIANG  Hiroshi NAKAYAMA  Toshiyuki YOSHITAKE  Hiroshi KOMAZAKI  Yasuhiro WATANABE  Hisakatsu ARAKI  Kiyonori MORIOKA  Shinhaeng LEE  Hajime KUBOSAWA  Yukio OTOBE  

     
    PAPER-Design Methods and Implementation

      Vol:
    E86-C No:4
      Page(s):
    652-660

    We have developed a low-power, high-performance MPEG-4 codec LSI for mobile video applications. This codec LSI is capable of up to CIF 30-fps encoding, making it suitable for various visual applications. The measured power consumption of the codec core was 9 mW for QCIF 15-fps codec operation and 38 mW for CIF 30-fps encoding. To provide an error-robust MPEG-4 codec, we implemented an error-resilience function in the LSI. We describe the techniques that have enabled low power consumption and high performance and discuss our test results.

  • Node-Based Genetic Algorithm for Communication Spanning Tree Problem

    Lin LIN  Mitsuo GEN  

     
    PAPER

      Vol:
    E89-B No:4
      Page(s):
    1091-1098

    Genetic Algorithms (GAs) and other Evolutionary Algorithms (EAs) have been successfully applied to solve constrained minimum spanning tree (MST) problems in communication network design and have also been used extensively in a wide variety of communication network design problems. Choosing an appropriate representation of candidate solutions is the essential issue in applying GAs to real-world network design problems, since the encoding and its interaction with the crossover and mutation operators strongly influence the success of GAs. In this paper, we investigate the effect of a new encoding and new crossover and mutation operators on the performance of GAs for the design of minimum spanning tree problems. Based on the performance analysis of these encoding methods in GAs, we improve predecessor-based encoding, in which initialization depends on an underlying random spanning-tree algorithm. The proposed crossover and mutation operators offer locality, heritability, and computational efficiency. We compare the approach with others that encode candidate spanning trees via Prüfer number-based encoding and edge set-based encoding, and demonstrate better results on larger instances of the communication spanning tree design problem.
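To make the idea of a predecessor-based chromosome concrete, the sketch below stores, for every non-root node, its predecessor on the path toward a root node, and initializes chromosomes by growing a random spanning tree. It assumes a complete graph with node 0 as the root; the growth procedure and all function names are illustrative assumptions, not the operators proposed in the paper.

```python
import random


def random_predecessor_chromosome(n, rng):
    """Grow a random spanning tree on n nodes and return its predecessor list."""
    order = list(range(1, n))
    rng.shuffle(order)
    in_tree = [0]            # node 0 is the root (assumption)
    pred = [None] * n        # pred[0] stays None
    for node in order:
        pred[node] = rng.choice(in_tree)  # attach to a node already in the tree
        in_tree.append(node)
    return pred


def decode_edges(pred):
    """Turn a predecessor list into the edge list of the encoded tree."""
    return [(p, v) for v, p in enumerate(pred) if p is not None]


def is_spanning_tree(pred, root=0):
    """Check that every node reaches the root by following predecessors."""
    n = len(pred)
    for v in range(n):
        steps = 0
        while v != root:
            v = pred[v]
            steps += 1
            if v is None or steps > n:  # dangling node or cycle
                return False
    return True
```

A chromosome such as `[None, 2, 1]` fails the check because nodes 1 and 2 point at each other, which is the kind of infeasible offspring that crossover and mutation operators for this encoding need to avoid or repair.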

  • Constant Bit-Rate Multi-Stage Rate Control for Rate-Distortion Optimized H.264/AVC Encoders

    Shuijiong WU  Peilin LIU  Yiqing HUANG  Qin LIU  Takeshi IKENAGA  

     
    PAPER

      Vol:
    E93-D No:7
      Page(s):
    1716-1726

    An H.264/AVC encoder employs rate control to adaptively adjust the quantization parameter (QP) so that coded video can be transmitted over a constant bit-rate (CBR) channel. In this context, bit allocation is crucial since it is directly related to actual bit generation and coding quality. Meanwhile, the rate-distortion-optimization (RDO) based mode-decision technique also strongly affects performance because of the strong relation among mode, bits, and quality. This paper presents a multi-stage rate control scheme for R-D optimized H.264/AVC encoders under CBR video transmission. To enhance the precision of the complexity estimation and bit allocation, a frequency-domain parameter named mean-absolute-transform-difference (MATD) is adopted to represent frame and macroblock (MB) residual complexity. Second, the MATD ratio is utilized to enhance the accuracy of frame layer bit prediction. Then, by considering the bit usage status of the whole sequence, a measurement combining forward and backward bit analysis is proposed to adjust the Lagrange multiplier λ_MODE on the frame layer to optimize the mode decision for all MBs within the current frame. In the next stage, bits are allocated on the MB layer by the proposed remaining complexity analysis. The computed QP is further adjusted according to predicted MB texture bits. Simulation results show that the PSNR improvement is up to 1.13 dB with our algorithm, and the stress of output buffer control is also largely relieved compared with the recommended rate control in the H.264/AVC reference software JM13.2.
