The average coding rate of a multi-shot Tunstall code, a variation of variable-to-fixed length (VF) lossless source codes, is investigated for stationary memoryless sources. A multi-shot VF code parses a given source sequence into variable-length blocks and encodes them into fixed-length codewords. If the parsing count is fixed, the overall multi-shot VF code can be treated as a one-shot VF code. For this setting of the Tunstall code, the compression performance is evaluated using two criteria. The first is the average coding rate, defined as the codeword length divided by the average block length. The second is the expectation of the pointwise coding rate. It is proved that both of these average coding rates converge to the entropy of a stationary memoryless source under the assumption that the geometric mean of the leaf counts of the multi-shot Tunstall parsing trees goes to infinity.
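The two criteria can be illustrated with a minimal sketch of an ordinary one-shot Tunstall code (not the multi-shot construction analyzed in the paper). The function names, the example source, and the reading of the pointwise coding rate as codeword length divided by the length of the parsed block are assumptions made for illustration only.

```python
import heapq
import math


def tunstall_leaves(probs, max_leaves):
    """Grow a Tunstall parsing tree for a memoryless source.

    Returns a list of (leaf probability, leaf depth) pairs. The most
    probable leaf is expanded repeatedly; each expansion replaces one
    leaf by len(probs) children, so the leaf count grows by len(probs) - 1.
    """
    k = len(probs)
    # heapq is a min-heap, so probabilities are stored negated.
    heap = [(-p, 1) for p in probs]
    heapq.heapify(heap)
    while len(heap) + k - 1 <= max_leaves:
        neg_p, depth = heapq.heappop(heap)
        for q in probs:
            heapq.heappush(heap, (neg_p * q, depth + 1))
    return [(-neg_p, depth) for neg_p, depth in heap]


def codeword_bits(leaves):
    """Fixed codeword length in bits needed to index every leaf."""
    return math.ceil(math.log2(len(leaves)))


def average_coding_rate(leaves):
    """First criterion: codeword length divided by the average block length."""
    mean_block_length = sum(p * d for p, d in leaves)
    return codeword_bits(leaves) / mean_block_length


def expected_pointwise_rate(leaves):
    """Second criterion (assumed reading): expectation of codeword length / block length."""
    bits = codeword_bits(leaves)
    return sum(p * bits / d for p, d in leaves)


def entropy(probs):
    """Entropy of the source in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs)


if __name__ == "__main__":
    source = [0.7, 0.3]  # hypothetical binary memoryless source
    for size in (4, 64, 1024):
        leaves = tunstall_leaves(source, size)
        print(size, average_coding_rate(leaves), expected_pointwise_rate(leaves))
    print("entropy", entropy(source))
```

Both quantities stay at or above the source entropy in this sketch, and the expected pointwise rate is never smaller than the average coding rate, because the expectation of a reciprocal is at least the reciprocal of the expectation.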
Ken-ichi IWATA Mitsuharu ARIMURA
A generalization of compression via substring enumeration (CSE) for k-th order Markov sources with a finite alphabet is proposed, and an upper bound on the codeword length of the proposed method is presented. We analyze the worst-case maximum redundancy of CSE for k-th order Markov sources with a finite alphabet. The compression ratio of the proposed method asymptotically converges to the optimal one for k-th order Markov sources with a finite alphabet as the length n of a source string tends to infinity.
Sound source localization is an essential technique in many applications, e.g., speech enhancement, speech capturing and human-robot interaction. However, the performance of traditional methods degrades in noisy or reverberant environments and is sensitive to the spatial location of the sound source. To solve these problems, we propose a sound source localization framework based on a bi-directional interaural matching filter (IMF) and decision weighting fusion. First, the bi-directional IMF is put forward to describe the difference between binaural signals in the forward and backward directions, respectively. Then, a hybrid interaural matching filter (HIMF), obtained from the bi-directional IMF through decision weighting fusion, is used to alleviate the effect of the source location on localization performance. Finally, the cosine similarity between the HIMFs computed from the binaural audio and from the transfer functions is employed to measure the probability of the source location. Arranging the similarities for all spatial directions as a matrix, we determine the source location by Maximum A Posteriori (MAP) estimation. Compared with several state-of-the-art methods, experimental results indicate that HIMF is more robust in noisy environments.
Shunji FUNASAKA Koji NAKANO Yasuaki ITO
The main contribution of this paper is to present a work-optimal parallel algorithm for LZW decompression and to implement it in a CUDA-enabled GPU. Since sequential LZW decompression creates a dictionary table by reading codes in a compressed file one by one, it is not easy to parallelize it. We first present a work-optimal parallel LZW decompression algorithm on the CREW-PRAM (Concurrent-Read Exclusive-Write Parallel Random Access Machine), which is a standard theoretical parallel computing model with a shared memory. We then go on to present an efficient implementation of this parallel algorithm on a GPU. The experimental results show that our GPU implementation performs LZW decompression in 1.15 milliseconds for a gray scale TIFF image with 4096×3072 pixels stored in the global memory of GeForce GTX 980. On the other hand, sequential LZW decompression for the same image stored in the main memory of Intel Core i7 CPU takes 50.1 milliseconds. Thus, our parallel LZW decompression on the global memory of the GPU is 43.6 times faster than a sequential LZW decompression on the main memory of the CPU for this image. To show the applicability of our GPU implementation for LZW decompression, we evaluated the SSD-GPU data loading time for three scenarios. The experimental results show that the scenario using our LZW decompression on the GPU is faster than the others.
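The sequential dependency that makes LZW decompression hard to parallelize can be seen in a minimal textbook sketch: each dictionary entry is built from the previously decoded string, so codes must be processed one by one. This is not the paper's parallel algorithm, and it omits the clear codes and variable code widths used by TIFF; the function names are illustrative assumptions.

```python
def lzw_compress(data):
    """Textbook LZW encoder over bytes; returns a list of integer codes."""
    table = {bytes([i]): i for i in range(256)}
    codes = []
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = len(table)  # next free code
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes):
    """Sequential LZW decoder: rebuilds the dictionary while reading codes."""
    if not codes:
        return b""
    table = [bytes([i]) for i in range(256)]
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        elif code == len(table):
            # The code refers to the entry that is about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        # The new entry depends on the previously decoded string.
        table.append(previous + entry[:1])
        previous = entry
    return b"".join(output)


if __name__ == "__main__":
    sample = b"TOBEORNOTTOBEORTOBEORNOT"
    assert lzw_decompress(lzw_compress(sample)) == sample
```

The line `table.append(previous + entry[:1])` is the step a parallel algorithm has to break apart, since in the sequential form entry i cannot be written before code i-1 has been decoded.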
Petri net (PN) is a frequently used model for deadlock detection. Among the various detection methods on PN, reachability analysis is the most accurate since it never produces false positives or false negatives. Although it suffers from the well-known state space explosion problem, reachability analysis is appropriate for small- and medium-scale programs. To mitigate the explosion problem, several kinds of techniques, such as net reduction and abstraction, have been proposed to accelerate reachability analysis. However, these techniques target general PNs and do not take the particularities of the application into consideration, so their optimization potential is not fully exploited. In this paper, the features of mutual exclusion-based programs are considered, and several strategies are proposed to accelerate the reachability analysis. Among these strategies, a customized net reduction rule reduces the scale of the PN, while two marking compression methods and two pruning methods reduce the volume of the reachability graph. Reachability analysis on PN can only report one deadlock on each path. However, the reported deadlock may be a false alarm, in which case real deadlocks may be hidden. To improve the detection efficiency, we propose a deadlock recovery algorithm so that more deadlocks can be detected in a shorter time. To validate the efficiency of these methods, a prototype is implemented and applied to the SPLASH2 benchmarks. The experimental results show that these methods significantly accelerate the reachability analysis for mutual exclusion-based deadlock detection.
Tian CHEN Dandan SHEN Xin YI Huaguo LIANG Xiaoqing WEN Wei WANG
Linear feedback shift register (LFSR) reseeding is an effective method for test data reduction. However, the test patterns generated by LFSR reseeding generally have a high toggle rate and thus cause high test power. To reduce the toggle rate, the X bits in deterministic test cubes can be filled with 0 or 1 appropriately before the seed is encoded. However, X-filling increases the number of specified bits, which makes seed encoding more difficult and also increases the size of the LFSR. This paper presents a test framework that takes both compression ratio and power consumption into consideration. In the first stage, the proposed reseeding-oriented X-filling is performed to reduce shift power (shift filling) and capture power (capture filling). Then, the filled test cubes are encoded using the proposed Compatible Block Code (CBC). The CBC can X-ize specified bits, namely turn specified bits into X bits, and can thus resolve the conflict between low-power filling and seed encoding. Experiments performed on ISCAS'89 benchmark circuits show that our scheme attains a compression ratio of 94.1% and reduces capture power by at least 15% and scan-in power by more than 79.5%.
Although many approaches assuming ideal channels have been proposed in previous research, few authors have considered nonideal communication links. In this paper, we study the problem of distributed decision fusion over nonideal channels by using scan statistics. To obtain the fusion rule under nonideal channels, we set up a nonideal channel model that accounts for modulation error, noise and signal attenuation. Under this model, we update the fusion rule by using scan statistics. We first consider the fusion rule when sensors are distributed on a grid, and then derive expressions for the detection probability and false alarm probability when sensors follow a uniform distribution. Extensive simulations are conducted to investigate the performance of our fusion rule and the influence of the signal-to-noise ratio (SNR) on the detection and false alarm probabilities. These simulations show that the theoretical values of the global detection probability and the global false alarm probability are close to the experimental results, and that the fusion rule performs well in the high-SNR region. Further research is needed to reduce the high computational complexity.
Measurement matrix construction is critically important to signal sampling and reconstruction in compressed sensing. From a practical point of view, deterministic construction of the measurement matrix is preferable to random construction. In this paper, we propose a novel deterministic method for constructing a measurement matrix for compressed sensing, the CS-FF (compressed sensing-finite field) algorithm. In the proposed algorithm, the measurement matrix is constructed from a finite field Quasi-cyclic Low Density Parity Check (QC-LDPC) code and thus has a quasi-cyclic structure. Furthermore, we construct three groups of measurement matrices. The first group consists of the proposed matrix and other matrices, including deterministically and randomly constructed ones. The other two groups are both constructed by our method. We compare the recovery performance of these matrices. Simulation results demonstrate that the recovery performance of our matrix is superior to that of the other matrices. In addition, simulation results show that the compression ratio is an important parameter for analyzing and predicting the recovery performance of the proposed measurement matrix. Moreover, these matrices require less storage than a random one, and they achieve a better trade-off between complexity and performance. Therefore, from a practical perspective, the proposed scheme is hardware friendly and easy to implement, and it is suitable for compressed sensing owing to its quasi-cyclic structure and good recovery performance.
Kazuhiro KOBAYASHI Tomoki TODA Tomoyasu NAKANO Masataka GOTO Satoshi NAKAMURA
As one of the techniques enabling individual singers to produce varieties of voice timbre beyond their own physical constraints, a statistical voice timbre control technique based on the perceived age has been developed. In this technique, the perceived age of a singing voice, which is the age of the singer as perceived by the listener, is used as one of the intuitively understandable measures to describe the voice characteristics of the singing voice. The use of statistical voice conversion (SVC) with a singer-dependent multiple-regression Gaussian mixture model (MR-GMM), which effectively models the voice timbre variations caused by a change of the perceived age, makes it possible for individual singers to manipulate the perceived ages of their own singing voices while retaining their own singer identities. However, several issues remain: 1) the controllable range of the perceived age is limited; 2) the quality of the converted singing voice is significantly degraded compared to that of a natural singing voice; and 3) each singer needs to sing the same phrase set as sung by a reference singer to develop the singer-dependent MR-GMM. To address these issues, we propose the following three methods: 1) a method using gender-dependent modeling to expand the controllable range of the perceived age; 2) a method using direct waveform modification based on spectrum differential to improve the quality of the converted singing voice; and 3) a rapid unsupervised adaptation method based on maximum a posteriori (MAP) estimation to easily develop the singer-dependent MR-GMM. The experimental results show that the proposed methods achieve a wider controllable range of the perceived age, a significant quality improvement of the converted singing voice, and the development of the singer-dependent MR-GMM using only a few arbitrary phrases as adaptation data.
Ramesh KUMAR Abdul AZIZ Inwhee JOE
In this paper, we propose and analyze an opportunistic amplify-and-forward (AF) relaying scheme using antenna selection in conjunction with different adaptive transmission techniques over Rayleigh fading channels. In this scheme, the best antenna of the source and the best relay are selected for communication between the source and destination. Closed-form expressions for the outage probability and average symbol error rate (SER) are derived to confirm that increasing the number of antennas is a better option than increasing the number of relays. We also obtain closed-form expressions for the average channel capacity under three different adaptive transmission techniques: 1) optimal power and rate adaptation; 2) constant power with optimal rate adaptation; and 3) channel inversion with a fixed rate. The channel capacity performance of each adaptive transmission technique is evaluated and compared for different numbers of relays and various antenna configurations. Our derived analytical results are verified through extensive Monte Carlo simulations.
Yoshiaki MORINO Takefumi HIRAGURI Hideaki YOSHINO Kentaro NISHIMORI Takahiro MATSUDA
In IEEE 802.11 wireless local area networks (WLANs), the contention window (CW) in carrier sense multiple access with collision avoidance (CSMA/CA) is one of the most important factors determining throughput performance. In this paper, we propose a novel CW control scheme to achieve high transmission efficiency in dense user environments. Whereas the standard CSMA/CA mechanism employs an adaptive CW control scheme that responds to the number of retransmissions, the proposed scheme uses the optimum CW size, which is shown to be a function of the number of terminal stations. In the proposed scheme, the number of terminal stations is estimated from the probability of packet collision measured at an access point (AP). The optimum CW size is then derived from a theoretical analysis based on a Markov chain model. We evaluate the performance of the proposed scheme with simulation experiments and show that it significantly improves the throughput performance.
Zhigang CHEN Xiaolei ZHANG Hussain KHURRAM He HUANG Guomei ZHANG
In this letter, a novel channel impulse response (CIR)-based fingerprinting positioning method using kernel principal component analysis (KPCA) is proposed. During the offline phase of the proposed method, a survey is performed to collect all CIRs from access points, and a fingerprint database is constructed whose vectors include the CIR and the physical location. During the online phase, KPCA is first employed to handle the nonlinearity and complexity of the CIR-position dependencies and to extract the principal nonlinear features of the CIRs, and support vector regression is then used to adaptively learn the regression function between the KPCA components and physical locations. In addition, an iterative narrowing-scope step is used to refine the estimate. The performance comparison shows that the proposed method outperforms traditional received signal strength based positioning methods.
This paper presents a weighted diversity combining technique for cyclostationarity detection based spectrum sensing of orthogonal frequency division multiplexing signals in cognitive radio. In cognitive radio systems, secondary users must detect the desired signal in an extremely low signal-to-noise ratio (SNR) environment. In such an environment, multiple antenna techniques (space diversity) such as maximum ratio combining are not effective because the energy of the target signal is also extremely weak, and it is difficult to synchronize the received signals. The cyclic autocorrelation function (CAF) is used for traditional cyclostationarity detection based spectrum sensing. In the presented technique, the CAFs of the received signals are combined, whereas general space diversity techniques combine the received signals themselves. In this paper, the values of the CAF at peak and non-peak cyclic frequencies are computed, and we attempt to improve the sensing performance by using different weights for each CAF value. The results were compared with those from conventional methods and showed that the presented technique can improve the spectrum sensing performance.
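The CAF at the center of this technique can be estimated with a short sketch. The estimator below is the standard sample average of x[n+lag] x*[n] exp(-j 2 pi alpha n). The combining function, which sums CAF magnitudes over antenna branches and applies one weight per cyclic frequency, is only a hypothetical stand-in, since the abstract does not give the paper's actual weights or combining rule. A real cosine is used as a toy cyclostationary signal instead of an OFDM waveform.

```python
import cmath
import math


def cyclic_autocorrelation(x, alpha, lag):
    """Sample estimate of the CAF of x at cyclic frequency alpha and the given lag.

    alpha is in cycles per sample; lag is a non-negative integer.
    """
    n_terms = len(x) - lag
    total = 0j
    for n in range(n_terms):
        total += x[n + lag] * x[n].conjugate() * cmath.exp(-2j * math.pi * alpha * n)
    return total / n_terms


def weighted_caf_statistic(branches, alphas, weights, lag):
    """Hypothetical combining rule: weighted sum over cyclic frequencies of the
    CAF magnitudes summed across antenna branches."""
    statistic = 0.0
    for alpha, weight in zip(alphas, weights):
        combined = sum(abs(cyclic_autocorrelation(x, alpha, lag)) for x in branches)
        statistic += weight * combined
    return statistic


if __name__ == "__main__":
    # Toy signal: cos(2*pi*0.1*n) has a cyclic feature at alpha = 0.2.
    x = [math.cos(2 * math.pi * 0.1 * n) for n in range(400)]
    print(abs(cyclic_autocorrelation(x, 0.2, 0)))   # peak cyclic frequency
    print(abs(cyclic_autocorrelation(x, 0.13, 0)))  # non-peak cyclic frequency
    print(weighted_caf_statistic([x, x], [0.2, 0.13], [1.0, -1.0], 0))
```

Giving the non-peak cyclic frequency a negative weight is one simple way to penalize noise-only energy; whether the paper does this is not stated in the abstract.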
Xuyang WANG Pengyuan ZHANG Qingwei ZHAO Jielin PAN Yonghong YAN
The introduction of deep neural networks (DNNs) has led to a significant improvement in automatic speech recognition (ASR) performance. However, the whole ASR system remains complicated due to its dependence on the hidden Markov model (HMM). Recently, a new end-to-end ASR framework, which utilizes recurrent neural networks (RNNs) to directly model context-independent targets with the connectionist temporal classification (CTC) objective function, has been proposed and achieves results comparable with the hybrid HMM/DNN system. In this paper, we investigate per-dimension learning rate methods, including ADAGRAD and ADADELTA, to improve the recognition performance of the end-to-end system, based on the fact that the blank symbol used in the CTC technique dominates the output and these methods give frequent features small learning rates. Experimental results show that a relative reduction of more than 4% in word error rate (WER) as well as a 5% absolute improvement in label accuracy on the training set are achieved when using ADADELTA, and fewer training epochs are needed.
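The claim that these methods give frequent features small learning rates follows directly from their update rules, sketched below in their standard published form for plain Python lists. The hyperparameter defaults and the toy gradient pattern are illustrative assumptions, not the settings used in the paper.

```python
import math


def adagrad_step(params, grads, accum, lr=0.01, eps=1e-8):
    """One ADAGRAD update in place.

    accum holds the running sum of squared gradients per dimension, so a
    dimension that receives gradient often gets a smaller effective step.
    """
    for i, g in enumerate(grads):
        accum[i] += g * g
        params[i] -= lr * g / (math.sqrt(accum[i]) + eps)


def adadelta_step(params, grads, avg_sq_grad, avg_sq_delta, rho=0.95, eps=1e-6):
    """One ADADELTA update in place.

    Uses decaying averages of squared gradients and squared updates, so no
    global learning rate is needed.
    """
    for i, g in enumerate(grads):
        avg_sq_grad[i] = rho * avg_sq_grad[i] + (1 - rho) * g * g
        delta = -math.sqrt(avg_sq_delta[i] + eps) / math.sqrt(avg_sq_grad[i] + eps) * g
        avg_sq_delta[i] = rho * avg_sq_delta[i] + (1 - rho) * delta * delta
        params[i] += delta


if __name__ == "__main__":
    # Dimension 0 gets a gradient at every step (like a dominant blank label),
    # dimension 1 only at every tenth step.
    params, accum = [0.0, 0.0], [0.0, 0.0]
    for step in range(100):
        adagrad_step(params, [1.0, 1.0 if step % 10 == 0 else 0.0], accum)
    print("effective rates:", [0.01 / math.sqrt(a) for a in accum])
```

In the toy run, the frequently updated dimension ends with an accumulated squared gradient of 100 against 10 for the rare one, so its effective step is smaller by a factor of about 3.2.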
The most commonly used scattering parameters (S parameters) are normalized to a real reference resistance, typically 50Ω. In some cases, the use of S parameters normalized to some complex reference impedance is essential or convenient. But there are different definitions of complex-referenced S parameters that are incompatible with each other and serve different purposes. To make matters worse, different simulators implement different ones and which ones are implemented is rarely properly documented. What are possible scenarios in which using the right one matters? This tutorial-style paper is meant as an informal and not overly technical exposition of some such confusing aspects of S parameters, for those who have a basic familiarity with the ordinary, real-referenced S parameters.
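A one-port example makes the incompatibility concrete. The sketch below compares the reflection coefficient under two widely cited definitions, the power-wave form (Z - Zref*)/(Z + Zref) and the pseudo-wave form (Z - Zref)/(Z + Zref). The abstract does not name the definitions the paper covers, so the choice of these two, like the function names and impedance values, is an assumption made for illustration.

```python
def gamma_power_wave(z_load, z_ref):
    """Reflection coefficient under the power-wave definition.

    It is zero when the load is the complex conjugate of the reference
    impedance, which corresponds to maximum power transfer.
    """
    return (z_load - z_ref.conjugate()) / (z_load + z_ref)


def gamma_pseudo_wave(z_load, z_ref):
    """Reflection coefficient under the pseudo-wave definition.

    It is zero when the load equals the reference impedance.
    """
    return (z_load - z_ref) / (z_load + z_ref)


if __name__ == "__main__":
    z_ref = complex(50, 20)          # hypothetical complex reference impedance
    conjugate_load = z_ref.conjugate()
    print(abs(gamma_power_wave(conjugate_load, z_ref)))   # 0: conjugate matched
    print(abs(gamma_pseudo_wave(conjugate_load, z_ref)))  # 0.4: not "matched"
    # With a real reference resistance the two definitions coincide.
    load = complex(75, -10)
    print(gamma_power_wave(load, complex(50, 0)) == gamma_pseudo_wave(load, complex(50, 0)))
```

The same physical load therefore reads as perfectly matched under one definition and as having a reflection magnitude of 0.4 under the other, which is why it matters to know which definition a simulator uses.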
Naoki SAWADA Hiromitsu NISHIZAKI
This study proposes a two-pass spoken term detection (STD) method. The first pass uses phoneme-based dynamic time warping (DTW) for STD, and the second pass recomputes the detection scores produced by the first pass using conditional random field (CRF)-based triphone detectors. In the second pass, we treat STD as a sequence labeling problem. We use CRF-based triphone detection models based on features generated from multiple types of phoneme-based transcriptions. The models learn recognition error patterns such as phoneme-to-phoneme confusions in the CRF framework. Consequently, the models can detect a triphone comprising a query term with a detection probability. In the experimental evaluation on two types of test collections, the CRF-based approach worked well in the re-ranking process for the DTW-based detections. CRF-based re-ranking showed 2.1% and 2.0% absolute improvements in F-measure for the two test collections, respectively.
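The first pass can be pictured with a minimal subsequence DTW over phoneme sequences, where the query may start at any position of the utterance. The 0/1 phoneme distance, the toy phoneme strings and the function names are assumptions made for illustration; the paper's actual distance measure and the CRF-based second pass are not reproduced here.

```python
def phoneme_distance(a, b):
    """Hypothetical local distance: 0 for identical phonemes, 1 otherwise."""
    return 0 if a == b else 1


def subsequence_dtw(query, utterance, dist=phoneme_distance):
    """Find the best match of query inside utterance.

    Returns (cost, end), where end is the 1-based utterance position at
    which the best-matching segment ends. Row 0 is all zeros so that the
    match may begin anywhere in the utterance at no cost.
    """
    inf = float("inf")
    previous = [0.0] * (len(utterance) + 1)
    for i in range(1, len(query) + 1):
        current = [inf] * (len(utterance) + 1)
        for j in range(1, len(utterance) + 1):
            cost = dist(query[i - 1], utterance[j - 1])
            # Diagonal step, or repeat a phoneme on either side.
            current[j] = cost + min(previous[j - 1], previous[j], current[j - 1])
        previous = current
    end = min(range(1, len(utterance) + 1), key=lambda j: previous[j])
    return previous[end], end


if __name__ == "__main__":
    query = "k a t".split()
    print(subsequence_dtw(query, "s o k a t o".split()))  # exact occurrence
    print(subsequence_dtw(query, "s o k e t o".split()))  # one recognition error
```

A low cost marks a candidate detection; in a two-pass design such candidates would then be rescored by a second model.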
Seunggoo NAM Boyoung LEE Beyoungyoun KOH Changsoo KWAK Juseop LEE
This paper presents a K-band fully reconfigurable waveguide resonator filter with a new negative coupling structure. A pair of transmission zeros as well as the center frequency and bandwidth of the presented filter can be adjusted. The filter adopts the concept of a frequency-tunable coupling resonator in the design of the coupling structure, which allows the coupling coefficient to be controlled. All coupling values in the filter structure can be tuned by adjusting the resonant frequency of each frequency-tunable coupling resonator. This work also presents a detailed design method for the coupling resonator with a negative coupling coefficient. In addition, an approach is described for separating the resonant peak produced by the coupling resonator with a negative coupling value from the passband in order to improve the stopband performance. To verify the presented filter structure, a fourth-order waveguide filter has been fabricated and measured. The fabricated filter has a center frequency tuning range from 18.34 GHz to 18.75 GHz and a bandwidth tuning ratio of 1.94:1.
Shinnosuke TAKAMICHI Tomoki TODA Graham NEUBIG Sakriani SAKTI Satoshi NAKAMURA
This paper presents a novel statistical sample-based approach for Gaussian Mixture Model (GMM)-based Voice Conversion (VC). Although GMM-based VC offers promising flexibility for model adaptation, the quality of converted speech is significantly worse than that of natural speech. This paper addresses the problem of inaccurate modeling, which is one of the main causes of the quality degradation. Recently, we proposed statistical sample-based speech synthesis using rich context models for high-quality and flexible Hidden Markov Model (HMM)-based Text-To-Speech (TTS) synthesis. This method makes it possible not only to produce high-quality speech by introducing ideas from unit selection synthesis, but also to preserve the flexibility of the original HMM-based TTS. In this paper, we apply this idea to GMM-based VC. The rich context models are first trained for individual joint speech feature vectors, and then we gather them mixture by mixture to form a Rich context-GMM (R-GMM). In conversion, an iterative generation algorithm using R-GMMs is used to convert speech parameters, after initialization using over-trained probability distributions. Because the proposed method utilizes individual speech features, and its formulation is the same as that of conventional GMM-based VC, it can produce high-quality speech while keeping the flexibility of the original GMM-based VC. The experimental results demonstrate that the proposed method yields significant improvements in terms of speech quality and speaker individuality in converted speech.
Yamato OHTANI Masatsune TAMURA Masahiro MORITA Masami AKAMINE
This paper describes a novel statistical bandwidth extension (BWE) technique based on a Gaussian mixture model (GMM) and a sub-band basis spectrum model (SBM), in which each dimensional component represents a specific acoustic space in the frequency domain. The proposed method can achieve BWE from speech data with an arbitrary frequency bandwidth, whereas conventional methods perform the conversion from fixed narrow-band data. In the proposed method, we train a GMM in advance with SBM parameters extracted from full-band spectra. According to the bandwidth of the input signal, the trained GMM is reconstructed into the GMM of the joint probability density between low-band SBM and high-band SBM components. Then, high-band SBM components are estimated from the low-band SBM components of the input signal based on the reconstructed GMM. Finally, BWE is achieved by adding the spectra decoded from the estimated high-band SBM components to those of the input signal. To construct the full-band signal from the narrow-band one, we apply this method to log-amplitude spectra and aperiodic components. Objective and subjective evaluation results show that the proposed method extends the bandwidth of speech data robustly for the log-amplitude spectra. Experimental results also indicate that the aperiodic component extracted from the upsampled narrow-band signal achieves the same performance as the restored and the full-band aperiodic components in the proposed method.
Yumei WANG Jiawei LIANG Hao WANG Eiji OKI Lin ZHANG
In 3GPP (3rd Generation Partnership Project) LTE (Long Term Evolution) systems, when HARQ (Hybrid Automatic Repeat request) retransmission is invoked, the data at the transmitter are retransmitted randomly or sequentially regardless of their relationship to the wrongly decoded data. Such practice is inefficient since precious transmission resources are spent retransmitting data that may be of no use for error correction at the receiver. This paper proposes an incremental redundancy HARQ scheme based on Error Position Estimating Coding (ePec) and LDPC (Low Density Parity Check Code) channel coding, which is called ePec-LDPC HARQ. With this proposal, the receiver is able to feed back which code blocks within a specific MAC (Media Access Control) PDU (Protocol Data Unit) were wrongly decoded. The transmitter receives the feedback information and then performs targeted retransmission. That is, only the data related to the wrongly decoded code blocks are retransmitted, which can improve the retransmission efficiency and thus reduce the retransmission overload. An enhanced incremental redundancy LDPC coding approach, called EIR-LDPC, together with a physical layer framing method, is developed to implement ePec-LDPC HARQ. Performance evaluations show that ePec-LDPC HARQ reduces the overall transmission resources by 15% compared to a conventional LDPC HARQ scheme. Moreover, the average number of retransmissions of each MAC PDU and the transmission delay are also reduced considerably.