
Author Search Result

[Author] Wei ZHANG(46hit)

1-20hit(46hit)

  • Achieving Weighted Fairness and Efficient Channel Utilization in IEEE 802.11e WLANs

    Wei ZHANG  Jun SUN  Xinbing WANG  

     
    LETTER-Wireless Communication Technologies

      Vol:
    E91-B No:2
      Page(s):
    653-657

    This paper addresses the problem of maximizing the protocol capacity of 802.11e networks, under the assumption that each access category (AC) has the same packet length. We prove that the maximal protocol capacity can be achieved at an optimal operating point with the medium idle probability of , where Tc* is the duration of collision time in terms of slot unit. Our results indicate that the optimal operating point is independent of the number of stations and throughput ratio among ACs, which means the proposed analytical results still hold even when throughput ratio and station number are time-varying. Further, we show that the maximal protocol capacity can be achieved in saturated cases by properly choosing the protocol parameters. We present a parameter configuration algorithm to achieve both efficient channel utilization and proportional fairness in IEEE 802.11e EDCA networks. Extensive simulation and analytical results are presented to verify the proposed ideas.
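    The optimal operating point above is stated in terms of the medium idle probability. The following is a minimal sketch, not the letter's derivation. It assumes the usual Bianchi-style decoupling approximation, in which station i transmits in a generic slot independently with probability τ_i (the values below are hypothetical). Under that assumption, it computes the slot-level probabilities and the throughput ratio between two saturated stations with equal packet lengths.

```python
from math import prod

def slot_probabilities(taus):
    """Return (idle, success, collision) probabilities for one generic slot.

    Assumes each station i attempts transmission independently with
    probability taus[i] (decoupling approximation).
    """
    p_idle = prod(1.0 - t for t in taus)
    # Success: exactly one station transmits in the slot.
    p_success = sum(
        t * prod(1.0 - u for j, u in enumerate(taus) if j != i)
        for i, t in enumerate(taus)
    )
    return p_idle, p_success, 1.0 - p_idle - p_success

def throughput_ratio(tau_i, tau_k):
    """Throughput of station i relative to station k (equal packet lengths).

    Station i succeeds with probability tau_i * prod_{j != i}(1 - tau_j); the
    shared factors cancel, leaving [tau_i/(1-tau_i)] / [tau_k/(1-tau_k)].
    """
    return (tau_i / (1.0 - tau_i)) / (tau_k / (1.0 - tau_k))

# Hypothetical attempt probabilities: two high-priority and three low-priority stations.
taus = [0.04, 0.04, 0.02, 0.02, 0.02]
idle, success, collision = slot_probabilities(taus)
ratio = throughput_ratio(0.04, 0.02)  # about 2.04
```

    In this simplified model, the throughput ratio depends only on the two stations' own attempt probabilities, while the idle probability depends on all of them jointly.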

  • Unsupervised Learning of Continuous Density HMM for Variable-Length Spoken Unit Discovery

    Meng SUN  Hugo VAN HAMME  Yimin WANG  Xiongwei ZHANG  

     
    LETTER-Speech and Hearing

      Publicized:
    2015/10/21
      Vol:
    E99-D No:1
      Page(s):
    296-299

    Unsupervised spoken unit discovery, or zero-resource speech recognition, is an emerging research topic that is important for analyzing spoken documents in languages or dialects with little human annotation. In this paper, we extend our earlier joint training framework for unsupervised learning of discrete density HMMs to continuous density HMMs (CDHMMs) and apply it to spoken unit discovery. In the proposed recipe, we first cluster a group of Gaussians, which then serve as the initialization for the joint training framework of nonnegative matrix factorization and semi-continuous density HMM (SCDHMM). In the SCDHMM, all hidden states share the same group of Gaussians but with different mixture weights. A CDHMM is subsequently constructed by tying the top-N activated Gaussians to each hidden state. Finally, Baum-Welch training is conducted to update the Gaussian parameters, mixture weights, and HMM transition probabilities. Experiments were conducted on word discovery from TIDIGITS and phone discovery from TIMIT. For TIDIGITS, units were modeled by 10 states, which turn out to be strongly related to words; for TIMIT, units were modeled by 3 states, which are likely to correspond to phonemes.
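    The CDHMM construction step above can be sketched as follows. This is an illustration under two assumptions: a state's "activation" of a shared Gaussian is taken to be its SCDHMM mixture weight, and the kept weights are renormalized to sum to 1. The abstract does not spell out either choice, and the function name is hypothetical.

```python
def tie_top_n(weights, n):
    """Build CDHMM mixtures by keeping each state's top-n shared Gaussians.

    weights[s][g] is the SCDHMM mixture weight of shared Gaussian g in state s.
    Returns, per state, a dict {gaussian_index: renormalized_weight}.
    """
    tied = []
    for state_weights in weights:
        # Rank the shared Gaussians by their weight in this state.
        top = sorted(range(len(state_weights)),
                     key=lambda g: state_weights[g], reverse=True)[:n]
        total = sum(state_weights[g] for g in top)
        # Renormalize so the kept weights form a valid mixture.
        tied.append({g: state_weights[g] / total for g in top})
    return tied

# Two hypothetical states sharing four Gaussians; keep the top 2 per state.
mixtures = tie_top_n([[0.5, 0.1, 0.3, 0.1], [0.05, 0.7, 0.05, 0.2]], 2)
```

    After tying, each state owns its own small mixture, so Baum-Welch can update the Gaussians per state instead of sharing them across all states.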

  • Continuous Speech Recognition Using an On-Line Speaker Adaptation Method Based on Automatic Speaker Clustering

    Wei ZHANG  Seiichi NAKAGAWA  

     
    PAPER-Speech and Speaker Recognition

      Vol:
    E86-D No:3
      Page(s):
    464-473

    This paper evaluates an on-line incremental speaker adaptation method for co-channel conversations involving multiple speakers, under the assumption that the speaker is unknown and changes frequently. After each utterance is assigned to a speaker cluster based on Vector Quantization (VQ) distortion, the acoustic models of each cluster are adapted by Maximum Likelihood Linear Regression (MLLR) or Maximum A Posteriori (MAP) estimation, which can improve continuous speech recognition performance. To demonstrate the effectiveness of the speaker clustering method, continuous speech recognition experiments were conducted with both supervised and unsupervised cluster adaptation. Finally, evaluation experiments on additional test data were performed for continuous syllable recognition and large vocabulary continuous speech recognition (LVCSR). The experimental results strongly support the effectiveness of the speaker adaptation and clustering methods presented in this paper.
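    The clustering step above can be illustrated as a nearest-codebook decision. This is a minimal sketch under stated assumptions: Euclidean distance, one VQ codebook per speaker cluster, and a hypothetical `threshold` above which an utterance starts a new cluster. The paper's features, codebook sizes, and decision rule are not given in the abstract.

```python
import math

def vq_distortion(frames, codebook):
    """Average Euclidean distance from each feature frame to its nearest codeword."""
    return sum(min(math.dist(f, c) for c in codebook) for f in frames) / len(frames)

def assign_utterance(frames, cluster_codebooks, threshold):
    """Return the index of the best-matching cluster, or -1 for a new cluster.

    cluster_codebooks: one codebook (list of codewords) per speaker cluster.
    threshold: hypothetical distortion limit; above it, no cluster matches.
    """
    best, best_d = -1, math.inf
    for idx, codebook in enumerate(cluster_codebooks):
        d = vq_distortion(frames, codebook)
        if d < best_d:
            best, best_d = idx, d
    return best if best_d <= threshold else -1
```

    In an adaptation loop, a return value of -1 would start a new cluster (for example, seeded with a codebook trained on that utterance). Otherwise, the chosen cluster's acoustic models would be adapted with MLLR or MAP.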

  • Performance Analysis of IEEE 802.11e EDCA

    Wei ZHANG  Jun SUN  Jing LIU  Haibin ZHANG  

     
    LETTER-Terrestrial Radio Communications

      Vol:
    E90-B No:1
      Page(s):
    180-183

    This letter presents a clearer and more accurate analytical model for evaluating the IEEE 802.11e enhanced distributed channel access (EDCA) protocol. The proposed model distinguishes internal collisions from external collisions. It also distinguishes the two situations in which the backoff counter is decremented: after an arbitration interframe space (AIFS) period that follows a busy period, and after each time slot following the AIFS period. The analytical model is validated through simulation.
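    The AIFS timing that the model distinguishes follows the standard relation AIFS[AC] = AIFSN[AC] × aSlotTime + aSIFSTime. The sketch below assumes the OFDM (802.11a) PHY timing values and the default EDCA AIFSN values for non-AP stations. It also uses a simplified countdown rule (one decrement per idle slot after AIFS, with no interruption), which is exactly the kind of detail the letter's model treats more carefully.

```python
SLOT_US = 9    # aSlotTime for the OFDM (802.11a) PHY, in microseconds
SIFS_US = 16   # aSIFSTime for the OFDM (802.11a) PHY, in microseconds

# Default EDCA AIFSN values for non-AP stations.
AIFSN = {"AC_VO": 2, "AC_VI": 2, "AC_BE": 3, "AC_BK": 7}

def aifs_us(ac):
    """AIFS[AC] = AIFSN[AC] * aSlotTime + aSIFSTime."""
    return AIFSN[ac] * SLOT_US + SIFS_US

def idle_time_before_tx_us(ac, backoff_counter):
    """Idle time before transmitting once the medium becomes idle.

    Simplification: AIFS elapses first, then the counter is decremented once
    per idle slot, and no other station interrupts the countdown.
    """
    return aifs_us(ac) + backoff_counter * SLOT_US

# Best-effort traffic with a backoff counter of 5: 43 us AIFS + 5 slots.
wait = idle_time_before_tx_us("AC_BE", 5)
```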

  • Visualization of Tribologically Induced Energy Disturbance to the Stability of High Density Magnetic Recording

    Bo LIU  Yi-Jun MAN  Wei ZHANG  Yan-Sheng MA  

     
    PAPER

      Vol:
    E85-C No:10
      Page(s):
    1795-1799

    As technology moves toward areal densities of 600-1000 Gb/in² and head-disk spacing well below 10 nm, it is crucially important to prevent both conventionally defined thermal decay and tribologically induced decay of the recorded magnetic signal. This paper reports a novel method for recording and visualizing the signature of potential tribological decay, and presents the details of the methodology, its working principles, and typical results. The method introduces visualizing disks in which a magneto-optical layer with a low Curie temperature replaces the magnetic layer of conventional magnetic media. The method and the corresponding setup were used successfully to visualize potential decay caused by slider-particle-disk contact, slider-disk contact during track-seeking operations, and slider-disk impact during loading and unloading operations.

  • In-Situ Technology for Evaluating the Stability of a Slider in 2 Dimensions

    Wei ZHANG  Bo LIU  

     
    PAPER

      Vol:
    E86-C No:9
      Page(s):
    1874-1878

    As head-disk spacing is reduced, inter-molecular interactions between the head-slider and the disk media are becoming a severe stability concern for head-slider positioning in both the flying-height and track-following directions. There is therefore a need for simple but effective methods of characterizing two-dimensional (2D) stability. Ideally, such methods should be easy to implement both in the laboratory and in quality control for disk drive and component manufacturing. This work explores an in-situ method based on the read process. The method is simple and can effectively reveal the 2D stability of the head-slider in both laboratory and manufacturing environments. The results also suggest that, as the flying height is reduced, the observable sway-mode vibration of the suspension can be excited earlier than the air-bearing vibration mode.

  • Speech Recognition for Air Traffic Control via Feature Learning and End-to-End Training

    Peng FAN  Xiyao HUA  Yi LIN  Bo YANG  Jianwei ZHANG  Wenyi GE  Dongyue GUO  

     
    PAPER-Speech and Hearing

      Publicized:
    2023/01/23
      Vol:
    E106-D No:4
      Page(s):
    538-544

    In this work, we propose a new automatic speech recognition (ASR) system for air traffic control (ATC) based on feature learning and an end-to-end training procedure. The proposed model integrates a feature learning block, a recurrent neural network (RNN), and the connectionist temporal classification (CTC) loss to build an end-to-end ASR model. To cope with the complex environment of ATC speech, a learning block, rather than handcrafted features, is designed to extract informative features from raw waveforms for acoustic modeling. Both SincNet and 1D convolution blocks process the raw waveforms, and their outputs are concatenated and fed to the RNN layers for temporal modeling. Because it learns representations directly from raw waveforms, the proposed model can be optimized in a fully end-to-end manner, i.e., from waveform to text. Finally, the multilingual nature of the ATC domain is addressed by constructing a combined vocabulary of Chinese characters and English letters. The proposed approach is validated on a multilingual real-world corpus (ATCSpeech), and the experimental results demonstrate that it outperforms other baselines, achieving a 6.9% character error rate.

  • Prime-Factor GFFT Architecture for Fast Frequency Domain Decoding of Cyclic Codes

    Yanyan CHANG  Wei ZHANG  Hao WANG  Lina SHI  Yanyan LIU  

     
    LETTER-Coding Theory

      Publicized:
    2023/07/10
      Vol:
    E107-A No:1
      Page(s):
    174-177

    This letter introduces a prime-factor Galois field Fourier transform (PF-GFFT) architecture for frequency domain decoding (FDD) of cyclic codes. First, a fast FDD scheme is designed that converts the original single long Fourier transform into a multi-dimensional transform of smaller sizes. Furthermore, a ladder-shift architecture for the PF-GFFT is explored to solve the rearrangement problem of the input and output data. In this respect, the PF-GFFT can be regarded as a lower-order spectral calculation scheme with a clear advantage in reducing computational complexity. Simulation results show that the PF-GFFT compares favorably with the current general GFFT, simplified GFFT (S-GFFT), and circular shifts GFFT (CS-GFFT) algorithms in time cost, reducing it by nearly an order of magnitude or more. This advantage improves decoding speed and has potential application value in decoding cyclic codes with longer code lengths.
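    The conversion of one long transform into a multi-dimensional transform of smaller sizes can be illustrated with the standard Good-Thomas prime-factor algorithm. For simplicity, the sketch works over the prime field GF(31) rather than the GF(2^m) fields used for cyclic codes. The length N = 15 = 3 × 5 uses coprime factors, so no twiddle factors are needed. The input and output rearrangements below are the usual Ruritanian and CRT index maps; the abstract does not say whether the letter's ladder-shift architecture implements exactly these maps.

```python
P = 31          # prime field GF(31), chosen only for illustration
N1, N2 = 3, 5   # coprime factors of the transform length
N = N1 * N2
OMEGA = pow(3, (P - 1) // N, P)  # 3 is a primitive root mod 31, so OMEGA has order 15

def gft_direct(x):
    """Length-N Fourier transform over GF(P): X[k] = sum_n x[n] * OMEGA^(n*k)."""
    return [sum(x[n] * pow(OMEGA, n * k, P) for n in range(N)) % P for k in range(N)]

def gft_prime_factor(x):
    """Good-Thomas prime-factor algorithm: an N1 x N2 transform with no twiddles."""
    w1 = pow(OMEGA, N2, P)  # primitive N1-th root of unity
    w2 = pow(OMEGA, N1, P)  # primitive N2-th root of unity
    # Input map n = (N2*n1 + N1*n2) mod N reshapes x into an N1 x N2 array.
    a = [[x[(N2 * n1 + N1 * n2) % N] for n2 in range(N2)] for n1 in range(N1)]
    # Length-N2 transforms along the rows.
    b = [[sum(a[n1][n2] * pow(w2, n2 * k2, P) for n2 in range(N2)) % P
          for k2 in range(N2)] for n1 in range(N1)]
    # Length-N1 transforms along the columns.
    c = [[sum(b[n1][k2] * pow(w1, n1 * k1, P) for n1 in range(N1)) % P
          for k2 in range(N2)] for k1 in range(N1)]
    # CRT output map: k = k1 (mod N1) and k = k2 (mod N2).
    e1 = N2 * pow(N2, -1, N1)  # 1 mod N1, 0 mod N2
    e2 = N1 * pow(N1, -1, N2)  # 0 mod N1, 1 mod N2
    X = [0] * N
    for k1 in range(N1):
        for k2 in range(N2):
            X[(e1 * k1 + e2 * k2) % N] = c[k1][k2]
    return X
```

    The direct transform costs on the order of N² multiplications. The factored version costs on the order of N(N1 + N2), which is the source of the complexity reduction that prime-factor schemes exploit.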

  • Interlace Strategy of Video and Audio PTSs in MPEG-2 TS

    Wei ZHANG  Yuanhua ZHOU  

     
    LETTER-Multimedia Systems for Communications

      Vol:
    E87-B No:11
      Page(s):
    3406-3407

    This letter presents a novel self-adaptive strategy for interlacing video and audio presentation time stamps (PTSs) in the MPEG-2 transport stream. By adaptively regulating the relative positions of audio and video access units in the bit-stream according to their PTSs, the proposed strategy provides reliable video and audio synchronization.
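    The basic idea of positioning access units by PTS can be sketched as a merge of two PTS-ordered streams. This is a plain PTS-ordered interleave, not the letter's self-adaptive strategy. It assumes video without B-frame reordering (so video PTS values increase in coding order), and it ignores the 33-bit PTS wraparound during merging. PTS values count ticks of a 90 kHz clock.

```python
import heapq

PTS_CLOCK_HZ = 90_000  # MPEG-2 PTS values count ticks of a 90 kHz clock

def seconds_to_pts(t):
    """Convert seconds to a PTS value (33-bit counter, wraps modulo 2**33)."""
    return round(t * PTS_CLOCK_HZ) % (1 << 33)

def interleave_by_pts(video_aus, audio_aus):
    """Merge video and audio access units so the multiplexed order follows PTS.

    Each access unit is a (pts, stream, payload) tuple, and each input must
    already be sorted by PTS. Ties are broken by stream name ('audio' first).
    """
    return list(heapq.merge(video_aus, audio_aus, key=lambda au: (au[0], au[1])))

# Hypothetical 25 fps video (3600 ticks per frame) and 24 ms audio frames (2160 ticks).
video = [(0, "video", "V0"), (3600, "video", "V1"), (7200, "video", "V2")]
audio = [(0, "audio", "A0"), (2160, "audio", "A1"), (4320, "audio", "A2"), (6480, "audio", "A3")]
order = [au[2] for au in interleave_by_pts(video, audio)]
```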

  • Broadband High Efficiency Power Amplifier with Compact Matching Network

    Weirong WANG  Guohua LIU  Zhiwei ZHANG  Zhiqun CHENG  

     
    BRIEF PAPER-Electronic Circuits

      Publicized:
    2021/03/10
      Vol:
    E104-C No:9
      Page(s):
    467-470

    This letter proposes a power amplifier (PA) with a compact matching network. The structure is a parallel dual radial microstrip line placed in the output matching network branch. The input impedance expression of this structure is derived through theoretical analysis, and the load impedance that satisfies the class-EFJ PA conditions is obtained from this expression. Compared with the traditional design method, this method is simple and novel, and the resulting structure is more compact. To further improve efficiency and extend the bandwidth, the input matching network adopts a stepped-impedance matching method. To verify the design, a broadband high-efficiency PA was built using a GaN HEMT CGH40010F. The measured results show a drain efficiency of 61%-71% over the 1.4-3.8 GHz band, a saturated output power of 40.3-41.8 dBm, and a size of 53 × 47 mm².
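    To put the reported figures in physical units, the sketch below converts the saturated output power from dBm to watts. It also states the standard definition of drain efficiency (RF output power divided by DC drain supply power). The supply voltage and current in the usage line are hypothetical and are not taken from the letter.

```python
def dbm_to_watts(p_dbm):
    """P[W] = 10^((P[dBm] - 30) / 10)."""
    return 10 ** ((p_dbm - 30) / 10)

def drain_efficiency(p_out_w, v_dd, i_dd):
    """Drain efficiency = RF output power / DC drain supply power (V_DD * I_DD)."""
    return p_out_w / (v_dd * i_dd)

# Saturated output power range reported in the abstract, converted to watts.
low, high = dbm_to_watts(40.3), dbm_to_watts(41.8)  # about 10.7 W and 15.1 W

# Hypothetical operating point: 10 W out from a 28 V, 0.5 A drain supply.
eta = drain_efficiency(10.0, 28.0, 0.5)
```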

  • Path Feasibility Analysis of BPEL Processes under Dead Path Elimination Semantics

    Hongda WANG  Jianchun XING  Juelong LI  Qiliang YANG  Xuewei ZHANG  Deshuai HAN  Kai LI  

     
    PAPER-Software Engineering

      Publicized:
    2015/11/27
      Vol:
    E99-D No:3
      Page(s):
    641-649

    The Web Services Business Process Execution Language (BPEL) has become the de facto standard for developing instant service-oriented workflow applications in open environments. The correctness and reliability of BPEL processes have attracted increasing attention. However, the unique features of the BPEL language (e.g., dead path elimination (DPE) semantics and parallelism) pose significant challenges, especially for the path feasibility analysis of BPEL processes. Path feasibility analysis is the basis of BPEL testing because it underlies test case generation. Since BPEL processes support both parallelism and DPE semantics, existing techniques cannot be directly applied to their path feasibility analysis. To address this problem, we present a novel technique for analyzing path feasibility in BPEL processes. First, to handle the unique features mentioned above, we transform a BPEL process into an intermediate model, the BPEL control flow graph, which abstracts the execution flow of BPEL processes. Second, based on this abstraction, we symbolically encode every path of a BPEL process as satisfiability formulas. Finally, we solve these formulas with Satisfiability Modulo Theories (SMT) solvers to obtain the feasible paths of the BPEL process. We illustrate the applicability and feasibility of our technique through a case study.
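    Dead path elimination is what makes path analysis in BPEL unusual. When an activity's join condition is false (with join failures suppressed), the activity is skipped and all of its outgoing links are set to false, so "deadness" propagates downstream. The sketch below propagates link status through a `<flow>` for one fixed assignment of transition-condition values. It assumes the default join condition (the OR of the incoming link statuses) and a hypothetical input format. It is not the paper's control flow graph or SMT encoding: in the paper, the transition values are symbolic, and the solver searches for a satisfying assignment.

```python
def run_dpe(activities, links, transition):
    """Propagate link status through a BPEL <flow> under dead path elimination.

    activities: activity names in a topological order of the link graph.
    links: list of (source, target) pairs.
    transition: dict (source, target) -> bool giving each transition
        condition's value on the path being analyzed (default True).
    Returns the set of activities that execute; the rest are skipped (dead).
    """
    status = {}
    executed = set()
    for act in activities:
        incoming = [l for l in links if l[1] == act]
        # Default join condition: OR of incoming link statuses (true if none).
        runs = not incoming or any(status[l] for l in incoming)
        if runs:
            executed.add(act)
        for l in (l for l in links if l[0] == act):
            # A skipped activity sets all of its outgoing links to false (DPE).
            status[l] = runs and transition.get(l, True)
    return executed

# Hypothetical diamond: A branches to B or C, and both join at D.
links = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
taken = run_dpe(["A", "B", "C", "D"], links, {("A", "C"): False})  # {'A', 'B', 'D'}
```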

  • SegOMP: Sparse Recovery with Fewer Measurements

    Li ZENG  Xiongwei ZHANG  Liang CHEN  Weiwei YANG  

     
    LETTER-Digital Signal Processing

      Vol:
    E97-A No:3
      Page(s):
    862-864

    This letter presents a new measurement and reconstruction framework for Compressed Sensing (CS) that aims to reduce the number of measurements required for faithful reconstruction. A sparse vector is segmented into sparser vectors, which are then sensed randomly. For recovery, we reconstruct these vectors individually and assemble them to obtain the original signal. We show that the proposed scheme, referred to as SegOMP, theoretically yields a higher probability of exact recovery, and that it needs far fewer measurements than the canonical greedy algorithms to achieve the same reconstruction quality. Extensive experiments verify the validity of SegOMP and demonstrate its potential.

  • On the Linear Complexity of Generalized Cyclotomic Binary Sequences with Length 2p²

    Jingwei ZHANG  Chang-An ZHAO  Xiao MA  

     
    PAPER-Cryptography and Information Security

      Vol:
    E93-A No:1
      Page(s):
    302-308

    In this paper, we compare two generalized cyclotomic binary sequences with length 2p² in terms of linear complexity. One, the classical sequence, is defined using the method introduced by Ding and Helleseth, while the other, a modified sequence, is defined in a slightly different manner. We show that the modified sequence has linear complexity 2p², which is higher than that of the classical one.
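    Linear complexity is the length of the shortest linear feedback shift register (LFSR) that generates a sequence. The Berlekamp-Massey algorithm over GF(2) computes it. For a periodic sequence with period T, running the algorithm on two periods is enough, because the linear complexity cannot exceed T. This sketch does not construct the Ding-Helleseth or modified sequences, whose definitions are not given here. It only shows how their linear complexity could be checked numerically (for length-2p² sequences, by feeding 4p² terms).

```python
def linear_complexity(bits):
    """Berlekamp-Massey over GF(2): length of the shortest LFSR generating bits."""
    n = len(bits)
    c = [0] * (n + 1)  # current connection polynomial
    b = [0] * (n + 1)  # connection polynomial before the last length change
    c[0] = b[0] = 1
    L, m = 0, -1
    for i in range(n):
        # Discrepancy between bits[i] and the current LFSR's prediction.
        d = bits[i]
        for j in range(1, L + 1):
            d ^= c[j] & bits[i - j]
        if d:
            t = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L

# Two periods of the m-sequence generated by x^3 + x + 1 (period 7).
m_seq = [1, 0, 0, 1, 0, 1, 1] * 2
print(linear_complexity(m_seq))  # 3
```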

  • Spectra Restoration of Bone-Conducted Speech via Attention-Based Contextual Information and Spectro-Temporal Structure Constraint

    Changyan ZHENG  Tieyong CAO  Jibin YANG  Xiongwei ZHANG  Meng SUN  

     
    LETTER-Digital Signal Processing

      Vol:
    E102-A No:12
      Page(s):
    2001-2007

    Compared with acoustic microphone (AM) speech, bone-conducted microphone (BCM) speech is much more immune to background noise, but it suffers from a severe loss of information due to the characteristics of the human-body transmission channel. In this letter, a new method for speaker-dependent BCM speech enhancement is proposed that focuses on restoring the spectra of the distorted speech. To better infer the missing components, an attention-based bidirectional Long Short-Term Memory (AB-BLSTM) network is designed to make better use of contextual information when modeling the relationship between the spectra of BCM speech and those of the corresponding clean AM speech. Meanwhile, the Structural SIMilarity (SSIM) metric, a structural error metric originating in image processing, is proposed as the loss function; it constrains the spectro-temporal structure of the recovered spectra. Experiments demonstrate that, compared with approaches based on a conventional DNN and mean square error (MSE), the proposed method better recovers the missing phonemes and yields spectra whose spectro-temporal structure is more similar to the target, leading to substantial improvements in objective metrics.
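    To show what an SSIM-based loss measures, the sketch below computes SSIM from the means, variances, and covariance of two spectrogram patches. It uses the usual constants C1 = (0.01 L)² and C2 = (0.03 L)², where L is the data range. For brevity, it uses one global window with population statistics; practical SSIM averages over local, often Gaussian-weighted windows. The letter's exact windowing and spectrogram scaling are not stated in the abstract.

```python
def ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window (global) SSIM between two equal-size spectrogram patches.

    x, y: flat lists of spectral magnitudes scaled to [0, data_range].
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def ssim_loss(pred, target):
    """Loss to minimize: 0 when the patches are structurally identical."""
    return 1.0 - ssim(pred, target)
```

    Unlike MSE, which penalizes each bin independently, the covariance term rewards matching the joint pattern of bins. This is the spectro-temporal structure constraint the abstract refers to.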

  • 2D Human Skeleton Action Recognition Based on Depth Estimation

    Lei WANG  Shanmin YANG  Jianwei ZHANG  Song GU  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2024/02/27
      Vol:
    E107-D No:7
      Page(s):
    869-877

    Human action recognition (HAR) achieves limited accuracy in video surveillance because monocular cameras capture only 2D information. To address this problem, this study proposes a depth estimation-based human skeleton action recognition method (SARDE) that transforms 2D human action data into 3D form to uncover hidden action cues in the 2D data. SARDE comprises two tasks: human skeleton action recognition and monocular depth estimation. The two tasks are integrated in a multi-task manner with end-to-end training, sharing parameters to exploit the correlation between action recognition and depth estimation and to learn depth features that are effective for human action recognition. In this study, graph-structured networks with inception blocks and skip connections are investigated for depth estimation. The experimental results verify the effectiveness and superiority of the proposed method for skeleton action recognition, with state-of-the-art performance on the evaluated datasets.
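    Geometrically, once each 2D joint has an estimated depth, a pinhole camera model lifts it into 3D camera coordinates with X = (u - cx) Z / fx and Y = (v - cy) Z / fy. The sketch below illustrates what "transforming 2D data into 3D" means under that model. It uses hypothetical intrinsics and depths. SARDE learns depth features jointly with recognition, and the abstract does not say it performs this explicit back-projection.

```python
def lift_to_3d(keypoints_2d, depths, fx, fy, cx, cy):
    """Back-project 2D joints to camera coordinates with a pinhole model.

    keypoints_2d: list of (u, v) pixel coordinates; depths: per-joint depth Z.
    fx, fy, cx, cy: camera intrinsics (hypothetical values in the example).
    """
    return [((u - cx) * z / fx, (v - cy) * z / fy, z)
            for (u, v), z in zip(keypoints_2d, depths)]

# Two joints 100 px apart at 2 m depth with a 500 px focal length: 0.4 m apart in 3D.
joints_3d = lift_to_3d([(320, 240), (420, 240)], [2.0, 2.0], 500.0, 500.0, 320.0, 240.0)
```

    Monocular depth is only known up to the accuracy of the depth estimator, so errors in Z scale directly into the X and Y coordinates.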

  • Rate Distortion Optimized Coding Mode Selection for H.264/AVC in Wireless Environments

    Wei ZHANG  Yuanhua ZHOU  

     
    LETTER-Multimedia Systems

      Vol:
    E87-B No:7
      Page(s):
    2057-2060

    A flexible and robust rate-distortion optimization algorithm is presented for selecting macroblock coding modes for H.264/AVC transmission over wireless channels subject to burst errors. A two-state Markov model is used to describe the burst errors at the packet level. Using feedback information from the receiver and an estimate of the channel errors, the algorithm analyzes the distortion of the reconstructed macroblock at the decoder caused by channel errors and by spatial and temporal error propagation. The optimal coding mode for each macroblock is then chosen within a rate-distortion (R-D) framework. Experimental results using the H.264/AVC test model show significantly improved resilience to burst errors.
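    The two ingredients above, a two-state Markov (Gilbert) burst-loss model and Lagrangian mode selection, can be sketched as follows. This simplification assumes that packets are lost exactly when the channel is in the bad state. It also replaces the letter's propagation-aware distortion analysis with a single hypothetical per-mode "lost" distortion, so that the expected distortion is (1 - p) D_received + p D_lost. The mode then minimizes J = E[D] + λR.

```python
def gilbert_stats(p_gb, p_bg):
    """Two-state (Gilbert) burst-loss model.

    p_gb: P(good -> bad) per packet; p_bg: P(bad -> good) per packet.
    Assumes every packet in the bad state is lost and none in the good state.
    Returns (average loss rate, mean burst length in packets).
    """
    loss_rate = p_gb / (p_gb + p_bg)  # stationary probability of the bad state
    mean_burst = 1.0 / p_bg           # time spent in the bad state is geometric
    return loss_rate, mean_burst

def choose_mode(candidates, loss_rate, lam):
    """Pick the coding mode minimizing J = E[D] + lam * R.

    candidates: dict mode -> (rate_bits, d_received, d_lost), where d_lost is
    the (hypothetical) distortion if the macroblock is lost and concealed.
    """
    def cost(item):
        rate, d_ok, d_lost = item[1]
        expected_d = (1 - loss_rate) * d_ok + loss_rate * d_lost
        return expected_d + lam * rate
    return min(candidates.items(), key=cost)[0]

# Hypothetical numbers: intra costs more bits but conceals losses far better.
loss, burst = gilbert_stats(0.01, 0.09)  # 10% loss, bursts of about 11 packets
cands = {"inter": (100, 10.0, 200.0), "intra": (300, 8.0, 40.0)}
mode = choose_mode(cands, loss, 0.05)
```

    On a clean channel (loss rate 0), the same candidates favor the cheaper inter mode. This is why feedback about channel conditions changes the selected mode.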

  • Deep Neural Network Based Monaural Speech Enhancement with Low-Rank Analysis and Speech Present Probability

    Wenhua SHI  Xiongwei ZHANG  Xia ZOU  Meng SUN  Wei HAN  Li LI  Gang MIN  

     
    LETTER-Noise and Vibration

      Vol:
    E101-A No:3
      Page(s):
    585-589

    A monaural speech enhancement method combining a deep neural network (DNN) with low-rank analysis and speech present probability is proposed in this letter. Low-rank and sparse analysis is first applied to the noisy speech spectrogram to obtain an approximate low-rank representation of the noise. Then a joint feature training strategy for DNN-based speech enhancement is presented, which helps the DNN better predict the target speech. To reduce residual noise in highly overlapping regions and at high frequencies, speech present probability (SPP) weighted post-processing is employed to further improve the quality of the speech enhanced by the trained DNN model. Compared with supervised non-negative matrix factorization (NMF) and the conventional DNN method, the proposed method achieves better speech enhancement performance under both stationary and non-stationary conditions.
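    The abstract does not give the form of the SPP-weighted post-processing. One well-known way to weight a spectral gain by speech presence probability is the OM-LSA combination G = G_H1^p × G_min^(1 - p), where p is the SPP. The sketch below uses that form as an assumption, with a hypothetical gain floor `g_min`. Bins where speech is likely keep the estimated gain, and bins where speech is unlikely are pushed toward the floor.

```python
def spp_weighted_gain(gain, spp, g_min=0.1):
    """Combine per-bin speech gains with speech presence probabilities.

    OM-LSA-style form: G = gain**p * g_min**(1 - p), for gain > 0.
    g_min is a hypothetical lower bound on the gain.
    """
    return [g ** p * g_min ** (1 - p) for g, p in zip(gain, spp)]

# Three hypothetical bins: certain speech, certain noise, and an uncertain bin.
weighted = spp_weighted_gain([0.8, 0.8, 0.5], [1.0, 0.0, 0.5], g_min=0.1)
```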

  • Improved Semi-Supervised NMF Based Real-Time Capable Speech Enhancement

    Yonggang HU  Xiongwei ZHANG  Xia ZOU  Meng SUN  Gang MIN  Yinan LI  

     
    LETTER-Speech and Hearing

      Vol:
    E99-A No:1
      Page(s):
    402-406

    Nonnegative matrix factorization (NMF) is one of the most popular tools for speech enhancement. In this letter, we present an improved semi-supervised NMF (ISNMF)-based speech enhancement algorithm that combines noise estimation with Incremental NMF (INMF). In this approach, fixed speech bases are learned offline from training samples in advance, while noise bases are trained on the fly whenever a new noisy frame arrives. The INMF algorithm is adopted for learning the noise bases because it overcomes the difficulties that conventional NMF faces in online processing. The proposed algorithm is real-time capable in the sense that it processes the frames of the noisy speech one by one with manageable computational complexity. Four different objective evaluation measures at various signal-to-noise ratio (SNR) levels demonstrate the superiority of the proposed method over traditional semi-supervised NMF (SNMF) and the well-known robust principal component analysis (RPCA) algorithm.

  • Comparative Study of Head-Disk Spacing Measurement Techniques between Optical Method and Various In-Situ Methods

    Sheng-Bin HU  Zhi-Min YUAN  Wei ZHANG  Bo LIU  Lei WAN  Rui XIAN  

     
    PAPER

      Vol:
    E85-C No:10
      Page(s):
    1784-1788

    The interaction between the slider, the lubricant, and the disk surface is becoming the most crucial robustness concern in advanced data storage systems. This paper reports comparative studies of various techniques for measuring head-disk spacing. Compared with the PW50 method, the triple harmonic method gives a reading much closer to the optically measured head-disk spacing at the on-track center. Specially prepared disks with different carbon overcoat thicknesses (6.5 nm, 11 nm, 16 nm, and 22 nm) were also used to study the reliability and repeatability of the triple harmonic method.
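    The triple harmonic method rests on the Wallace spacing loss. For a written pattern with fundamental wavelength λ, the k-th harmonic's readback amplitude decays as exp(-2πkd/λ), so ln(V1/V3) grows by 4πd/λ. A change in that log ratio therefore gives a relative spacing change, Δd = (λ/4π) Δln(V1/V3). Head- and medium-dependent constants cancel in the difference, which is why the method is attractive in situ. The sketch below is a textbook form of this relation, not necessarily the paper's exact procedure, and the readback amplitudes in the example are hypothetical.

```python
import math

def spacing_change_nm(v1_ref, v3_ref, v1, v3, wavelength_nm):
    """Change in head-disk spacing from first/third harmonic readback amplitudes.

    Wallace spacing loss: harmonic k decays as exp(-2*pi*k*d/lambda), so
    ln(V1/V3) increases by 4*pi*d/lambda. A positive result means the spacing
    increased relative to the reference measurement.
    """
    r_ref = math.log(v1_ref / v3_ref)
    r = math.log(v1 / v3)
    return wavelength_nm / (4 * math.pi) * (r - r_ref)

# Hypothetical readback amplitudes at a reference state and a later state.
delta = spacing_change_nm(1.00, 0.20, 0.98, 0.19, 400.0)
```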

  • A Perceptually Motivated Approach for Speech Enhancement Based on Deep Neural Network

    Wei HAN  Xiongwei ZHANG  Gang MIN  Meng SUN  

     
    LETTER-Speech and Hearing

      Vol:
    E99-A No:4
      Page(s):
    835-838

    In this letter, a novel perceptually motivated single-channel speech enhancement approach based on a Deep Neural Network (DNN) is presented. Taking into account the favorable masking properties of the human auditory system, a new DNN architecture is proposed to reduce the perceptual effect of the residual noise. This architecture is trained directly to learn a gain function that is used both to estimate the power spectrum of clean speech and to shape the spectrum of the residual noise. Experimental results demonstrate that the proposed approach achieves better objective speech quality on TIMIT sentences corrupted by various types of noise, whether or not the noise conditions are included in the training set.
