Wei ZHANG Jun SUN Xinbing WANG
This paper addresses the problem of maximizing the protocol capacity of 802.11e networks, under the assumption that each access category (AC) has the same packet length. We prove that the maximal protocol capacity can be achieved at an optimal operating point with the medium idle probability of , where Tc* is the duration of collision time in terms of slot unit. Our results indicate that the optimal operating point is independent of the number of stations and throughput ratio among ACs, which means the proposed analytical results still hold even when throughput ratio and station number are time-varying. Further, we show that the maximal protocol capacity can be achieved in saturated cases by properly choosing the protocol parameters. We present a parameter configuration algorithm to achieve both efficient channel utilization and proportional fairness in IEEE 802.11e EDCA networks. Extensive simulation and analytical results are presented to verify the proposed ideas.
Meng SUN Hugo VAN HAMME Yimin WANG Xiongwei ZHANG
Unsupervised spoken unit discovery, or zero-resource speech recognition, is an emerging research topic that is important for spoken document analysis of languages or dialects with little human annotation. In this paper, we extend our earlier joint training framework for unsupervised learning of discrete density HMM to continuous density HMM (CDHMM) and apply it to spoken unit discovery. In the proposed recipe, we first cluster a group of Gaussians, which then serve as the initialization of the joint training framework of nonnegative matrix factorization and semi-continuous density HMM (SCDHMM). In SCDHMM, all the hidden states share the same group of Gaussians but with different mixture weights. A CDHMM is subsequently constructed by tying the top-N activated Gaussians to each hidden state. Baum-Welch training is finally conducted to update the parameters of the Gaussians, mixture weights and HMM transition probabilities. Experiments were conducted on word discovery from TIDIGITS and phone discovery from TIMIT. For TIDIGITS, units were modeled by 10 states and turned out to be strongly related to words, while for TIMIT, units were modeled by 3 states and are likely to be phonemes.
This paper evaluates an on-line incremental speaker adaptation method for co-channel conversations involving multiple speakers, under the assumption that the speaker is unknown and changes frequently. After speaker clustering based on Vector Quantization (VQ) distortion is performed for every utterance, the acoustic models for each cluster are adapted by Maximum Likelihood Linear Regression (MLLR) or Maximum A Posteriori (MAP) estimation, which can improve the performance of continuous speech recognition. To demonstrate the effectiveness of the speaker clustering method, continuous speech recognition experiments were conducted with both supervised and unsupervised cluster adaptation. Finally, evaluation experiments on separately prepared test data were performed for continuous syllable recognition and large vocabulary continuous speech recognition (LVCSR). The experimental results strongly support the effectiveness of the speaker adaptation and clustering methods presented in this paper.
Wei ZHANG Jun SUN Jing LIU Haibin ZHANG
This letter presents a clearer and more accurate analytical model to evaluate the IEEE 802.11e enhanced distributed channel access (EDCA) protocol. The proposed model distinguishes internal collisions from external collisions. It also differentiates the two cases in which the backoff counter decreases, i.e., after an arbitration interframe space (AIFS) period following a busy duration, and after a time slot following the AIFS period. The analytical model is validated through simulation.
Bo LIU Yi-Jun MAN Wei ZHANG Yan-Sheng MA
As technology moves to 600-1000 Gb/sq-in areal densities and deep sub-10 nm head-disk spacing, it is of crucial importance to prevent both the conventionally defined thermal decay and the tribologically induced decay of the recorded magnetic signal. This paper reports a novel method for recording and visualizing the signature of potential tribological decay. The details of the methodology, its working principles, and typical results are presented in this work. The method is based on the introduction of a type of visualizing disk that uses a layer of magneto-optical material with a low Curie temperature in place of the magnetic layer used in conventional magnetic media. The method and corresponding setup were used successfully to visualize potential decay caused by slider-particle-disk contact, slider-disk contact during track seeking operations, and slider-disk impact during loading and unloading operations.
As head-disk spacing is reduced, the effects of inter-molecular level interactions between the head-slider and the disk media are becoming a severe concern for the stability of the head-slider's positioning in both the flying height and track following directions. Therefore, there is a need to explore simple but effective methods for characterizing two-dimensional (2D) stability. Ideally, such methods should be easy to implement both in the laboratory and in the quality control of disk drive and component manufacturing. An in-situ method based on the reading process is explored in this work. The method is simple and can effectively reveal the 2D stability of the head-slider in both laboratory and manufacturing environments. The results obtained also suggest that, when the flying height is reduced, the observable sway mode vibration of the suspension can be excited earlier than the air-bearing vibration mode.
Peng FAN Xiyao HUA Yi LIN Bo YANG Jianwei ZHANG Wenyi GE Dongyue GUO
In this work, we propose a new automatic speech recognition (ASR) system based on feature learning and an end-to-end training procedure for air traffic control (ATC) systems. The proposed model integrates the feature learning block, recurrent neural network (RNN), and connectionist temporal classification loss to build an end-to-end ASR model. Facing the complex environments of ATC speech, instead of the handcrafted features, a learning block is designed to extract informative features from raw waveforms for acoustic modeling. Both the SincNet and 1D convolution blocks are applied to process the raw waveforms, whose outputs are concatenated to the RNN layers for the temporal modeling. Thanks to the ability to learn representations from raw waveforms, the proposed model can be optimized in a complete end-to-end manner, i.e., from waveform to text. Finally, the multilingual issue in the ATC domain is also considered to achieve the ASR task by constructing a combined vocabulary of Chinese characters and English letters. The proposed approach is validated on a multilingual real-world corpus (ATCSpeech), and the experimental results demonstrate that the proposed approach outperforms other baselines, achieving a 6.9% character error rate.
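The character error rate reported above is conventionally the edit distance between the recognized and reference character sequences divided by the reference length. The following is a minimal sketch of that metric, assuming whitespace is ignored and that each Chinese character or English letter counts as one token; the function names and these tokenization choices are illustrative assumptions, not details taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / number of reference characters.

    Assumption: whitespace is dropped, so Chinese characters and
    English letters share one combined character vocabulary.
    """
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

Because the distance is computed over individual characters, a mixed Chinese and English transcript needs no language-specific word segmentation before scoring.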
Yanyan CHANG Wei ZHANG Hao WANG Lina SHI Yanyan LIU
This letter introduces a prime-factor Galois field Fourier transform (PF-GFFT) architecture for frequency domain decoding (FDD) of cyclic codes. First, a fast FDD scheme is designed that converts the original single long Fourier transform into a multi-dimensional transform of smaller size. Furthermore, a ladder-shift architecture for PF-GFFT is explored to solve the rearrangement problem of input and output data. In this regard, PF-GFFT can be considered a lower-order spectral calculation scheme, which has a clear advantage in reducing the computational complexity. Simulation results show that PF-GFFT compares favorably with the current general GFFT, simplified-GFFT (S-GFFT), and circular shifts-GFFT (CS-GFFT) algorithms in time cost, which is lower by nearly an order of magnitude or more. This advantage helps improve the decoding speed and has potential application value in decoding cyclic codes with longer code lengths.
This letter presents a novel self-adaptive interlace strategy for video and audio PTSs in the MPEG-2 transport stream. By adaptively regulating the relative position of audio and video access units in the bit-stream according to their PTSs, the proposed strategy provides reliable video and audio synchronization.
Weirong WANG Guohua LIU Zhiwei ZHANG Zhiqun CHENG
This letter proposes a power amplifier (PA) with a compact matching network. The structure is a parallel dual radial microstrip line in the output matching network branch. The input impedance expression for the structure is derived through theoretical analysis, and the load impedance that satisfies the class EFJ PA is obtained from this expression. Compared with the traditional design method, this design method is simple and novel, and the structure is more compact. To further improve efficiency and expand bandwidth, the input matching network adopts a stepped impedance matching method. To verify the correctness of the design, a broadband high-efficiency PA was designed using the GaN HEMT CGH40010F. The test results show that the drain efficiency is 61%-71% in the frequency band 1.4-3.8 GHz, the saturated output power is 40.3-41.8 dBm, and the size is 53×47 mm^2.
Hongda WANG Jianchun XING Juelong LI Qiliang YANG Xuewei ZHANG Deshuai HAN Kai LI
Web Service Business Process Execution Language (BPEL) has become the de facto standard for developing instant service-oriented workflow applications in open environments. The correctness and reliability of BPEL processes have attracted increasing attention. However, the unique features of the BPEL language (e.g., dead path elimination (DPE) semantics and parallelism) pose considerable challenges, especially for path feasibility analysis of BPEL processes. Path feasibility analysis is the basis of BPEL testing because it underlies test case generation. Since BPEL processes support both parallelism and DPE semantics, existing techniques cannot be directly applied to their path feasibility analysis. To address this problem, we present a novel technique to analyze path feasibility for BPEL processes. First, to handle the unique features mentioned above, we transform a BPEL process into an intermediary model, the BPEL control flow graph, which is proposed to abstract the execution flow of BPEL processes. Second, based on this abstraction, we symbolically encode every path of a BPEL process as a set of satisfiability formulas. Finally, we solve these formulas with the help of Satisfiability Modulo Theory (SMT) solvers to obtain the feasible paths of the BPEL process. We illustrate the applicability and feasibility of our technique through a case study.
Li ZENG Xiongwei ZHANG Liang CHEN Weiwei YANG
Presented is a new measuring and reconstruction framework for Compressed Sensing (CS), aiming to reduce the number of measurements required to ensure faithful reconstruction. A sparse vector is segmented into sparser vectors, which are then randomly sensed. For recovery, we reconstruct these vectors individually and assemble them to obtain the original signal. We show that the proposed scheme, referred to as SegOMP, yields a higher probability of exact recovery in theory. It also requires far fewer measurements than the canonical greedy algorithms to achieve the same reconstruction quality. Extensive experiments verify the validity of SegOMP and demonstrate its potential.
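The segment, sense, recover, and assemble pipeline described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the authors' SegOMP: it assumes equal-length contiguous segments, an independent Gaussian sensing matrix per segment, a known per-segment sparsity, and plain orthogonal matching pursuit as the per-segment solver. The names `omp` and `seg_omp` and all parameters are hypothetical.

```python
import numpy as np


def omp(A, y, k):
    """Orthogonal matching pursuit: estimate a k-sparse x with y = A @ x."""
    col_norms = np.linalg.norm(A, axis=0)
    residual = np.asarray(y, dtype=float).copy()
    support = []
    coef = np.zeros(0)
    x = np.zeros(A.shape[1])
    for _ in range(k):
        if np.linalg.norm(residual) < 1e-12:
            break  # measurements already explained
        # Pick the column most correlated with the residual.
        corr = np.abs(A.T @ residual) / col_norms
        if support:
            corr[support] = 0.0  # never pick a column twice
        support.append(int(np.argmax(corr)))
        # Least-squares fit on the current support, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    if support:
        x[support] = coef
    return x


def seg_omp(x, num_segments, m_per_segment, k_per_segment, rng):
    """Segment x, sense each segment randomly, recover with OMP, reassemble.

    Assumption: len(x) is divisible by num_segments.
    """
    recovered = []
    for seg in np.split(x, num_segments):
        A = rng.standard_normal((m_per_segment, seg.size))  # random sensing
        y = A @ seg                                         # measurements
        recovered.append(omp(A, y, k_per_segment))
    return np.concatenate(recovered)
```

A call such as `seg_omp(x, 4, 8, 1, np.random.default_rng(0))` senses a length-64 vector with 8 measurements per segment. How the original work chooses segment boundaries and allocates measurements is not stated in the abstract.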
Jingwei ZHANG Chang-An ZHAO Xiao MA
In this paper, we compare two generalized cyclotomic binary sequences with length 2p^2 in terms of the linear complexity. One classical sequence is defined using the method introduced by Ding and Helleseth, while the other modified sequence is defined in a slightly different manner. We show that the modified sequence has linear complexity of 2p^2, which is higher than that of the classical one.
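For readers unfamiliar with the measure, the linear complexity of a binary sequence is the length of the shortest linear feedback shift register that generates it, and it can be computed with the Berlekamp-Massey algorithm over GF(2). The sketch below shows only that generic computation; it does not reproduce the cyclotomic constructions compared in the paper, and the function name is an assumption.

```python
def linear_complexity(bits):
    """Berlekamp-Massey over GF(2): length of the shortest LFSR for `bits`."""
    n = len(bits)
    c = [0] * (n + 1)  # current connection polynomial
    b = [0] * (n + 1)  # polynomial saved at the last length change
    c[0] = b[0] = 1
    length, last_change = 0, -1
    for i in range(n):
        # Discrepancy between the LFSR prediction and the actual bit.
        d = bits[i]
        for j in range(1, length + 1):
            d ^= c[j] & bits[i - j]
        if d:
            saved = c[:]
            shift = i - last_change
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * length <= i:
                length = i + 1 - length
                last_change = i
                b = saved
    return length
```

For a periodic sequence with period N, running the function on two full periods (2N bits) is the usual way to obtain the linear complexity of the infinite sequence.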
Changyan ZHENG Tieyong CAO Jibin YANG Xiongwei ZHANG Meng SUN
Compared with acoustic microphone (AM) speech, bone-conducted microphone (BCM) speech is much more immune to background noise, but suffers from severe loss of information due to the characteristics of the human-body transmission channel. In this letter, a new method for speaker-dependent BCM speech enhancement is proposed, in which we focus on restoring the spectra of the distorted speech. To better infer the missing components, an attention-based bidirectional Long Short-Term Memory (AB-BLSTM) network is designed to optimize the use of contextual information in modeling the relationship between the spectra of BCM speech and the corresponding clean AM speech. Meanwhile, a structural error metric originating from image processing, the Structural SIMilarity (SSIM) metric, is proposed as the loss function, which constrains the spectro-temporal structures when recovering the spectra. Experiments demonstrate that, compared with approaches based on a conventional DNN and the mean square error (MSE), the proposed method can better recover the missing phonemes and obtain spectra whose spectro-temporal structure is more similar to that of the target, which leads to great improvement on objective metrics.
Lei WANG Shanmin YANG Jianwei ZHANG Song GU
Human action recognition (HAR) exhibits limited accuracy in video surveillance because monocular cameras capture only 2D information. To address this problem, a depth estimation-based human skeleton action recognition method (SARDE) is proposed in this study, with the aim of transforming 2D human action data into 3D format to uncover hidden action clues in the 2D data. SARDE comprises two tasks, i.e., human skeleton action recognition and monocular depth estimation. The two tasks are integrated in a multi-task manner in end-to-end training and share parameters, which exploits the correlation between action recognition and depth estimation to learn depth features effectively for human action recognition. In this study, graph-structured networks with inception blocks and skip connections are investigated for depth estimation. The experimental results verify the effectiveness and superiority of the proposed method in skeleton action recognition, where it reaches state-of-the-art performance on the evaluated datasets.
A flexible and robust rate-distortion optimization algorithm is presented to select the macroblock coding mode for H.264/AVC transmission over wireless channels subject to burst errors. A two-state Markov model is used to describe the burst errors at the packet level. Using feedback information from the receiver and an estimate of the channel errors, the algorithm analyzes the distortion of the reconstructed macroblock at the decoder caused by channel errors and by spatial and temporal error propagation. The optimal coding mode is chosen for each macroblock within a rate-distortion (R-D)-based framework. Experimental results using the H.264/AVC test model show strong resilience to burst errors.
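A two-state Markov model of packet-level burst errors can be illustrated with a short simulation. The sketch below assumes the simplest variant, in which every packet sent in the "bad" state is lost and none is lost in the "good" state; the transition probabilities and function names are hypothetical, and the abstract does not specify how the paper parameterizes its model.

```python
import random


def stationary_loss_rate(p_good_to_bad, p_bad_to_good):
    """Long-run fraction of packets sent while the channel is bad."""
    return p_good_to_bad / (p_good_to_bad + p_bad_to_good)


def mean_burst_length(p_bad_to_good):
    """Expected number of consecutive lost packets in one burst."""
    return 1.0 / p_bad_to_good


def simulate_packet_losses(num_packets, p_good_to_bad, p_bad_to_good, seed=0):
    """Return a list of booleans, True where the packet is lost."""
    rng = random.Random(seed)
    bad = False  # assumption: the channel starts in the good state
    losses = []
    for _ in range(num_packets):
        if bad:
            bad = rng.random() >= p_bad_to_good  # stay bad unless recovering
        else:
            bad = rng.random() < p_good_to_bad   # enter a loss burst
        losses.append(bad)
    return losses
```

With `p_good_to_bad = 0.05` and `p_bad_to_good = 0.45`, about 10% of packets are lost on average, in bursts averaging a little over two packets. Losses of this kind are clustered rather than independent.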
Wenhua SHI Xiongwei ZHANG Xia ZOU Meng SUN Wei HAN Li LI Gang MIN
A monaural speech enhancement method combining a deep neural network (DNN) with low rank analysis and speech presence probability is proposed in this letter. Low rank and sparse analysis is first applied to the noisy speech spectrogram to obtain an approximate low rank representation of the noise. Then a joint feature training strategy for DNN-based speech enhancement is presented, which helps the DNN better predict the target speech. To reduce the residual noise in highly overlapping regions and in the high frequency domain, speech presence probability (SPP) weighted post-processing is employed to further improve the quality of the speech enhanced by the trained DNN model. Compared with supervised non-negative matrix factorization (NMF) and the conventional DNN method, the proposed method obtains improved speech enhancement performance under stationary and non-stationary conditions.
Yonggang HU Xiongwei ZHANG Xia ZOU Meng SUN Gang MIN Yinan LI
Nonnegative matrix factorization (NMF) is one of the most popular tools for speech enhancement. In this letter, we present an improved semi-supervised NMF (ISNMF)-based speech enhancement algorithm combining techniques of noise estimation and Incremental NMF (INMF). In this approach, fixed speech bases are obtained offline from training samples in advance, while noise bases are trained on-the-fly whenever a new noisy frame arrives. The INMF algorithm is adopted for learning the noise bases because it can overcome the difficulties that conventional NMF confronts in online processing. The proposed algorithm is real-time capable in the sense that it processes the time frames of the noisy speech one by one and its computational complexity is feasible. Four different objective evaluation measures at various signal-to-noise ratio (SNR) levels demonstrate the superiority of the proposed method over traditional semi-supervised NMF (SNMF) and the well-known robust principal component analysis (RPCA) algorithm.
Sheng-Bin HU Zhi-Min YUAN Wei ZHANG Bo LIU Lei WAN Rui XIAN
The interaction between slider, lubricant and disk surface is becoming the most crucial robustness concern of advanced data storage systems. This paper reports comparative studies of various techniques for the measurement of head-disk spacing. It is observed that, in the on-track center case, the triple harmonic method gives a reading much closer to the optically obtained head-disk spacing than the PW50 method does. Specially prepared disks with different carbon overcoat thicknesses (6.5 nm, 11 nm, 16 nm and 22 nm) were also used to study the reliability and repeatability of the triple harmonic method.
Wei HAN Xiongwei ZHANG Gang MIN Meng SUN
In this letter, a novel perceptually motivated single channel speech enhancement approach based on a Deep Neural Network (DNN) is presented. Taking into account the masking properties of the human auditory system, a new DNN architecture is proposed to reduce the perceptual effect of the residual noise. This new DNN architecture is directly trained to learn a gain function that is used to estimate the power spectrum of clean speech and to shape the spectrum of the residual noise at the same time. Experimental results demonstrate that the proposed perceptually motivated speech enhancement approach achieves better objective speech quality when tested with TIMIT sentences corrupted by various types of noise, regardless of whether the noise conditions are included in the training set.