1-4hit |
Ryota KAMINISHI Haruna MIYAMOTO Sayaka SHIOTA Hitoshi KIYA
This study evaluates the effects of some non-learning blind bandwidth extension (BWE) methods on state-of-the-art automatic speaker verification (ASV) systems. Recently, a non-linear bandwidth extension (N-BWE) method has been proposed as a blind, non-learning, and light-weight BWE approach. Other non-learning BWEs have also been developed in recent years. For ASV evaluations, most data available to train ASV systems is narrowband (NB) telephone speech. Meanwhile, wideband (WB) data have been used to train the state-of-the-art ASV systems, such as i-vector, d-vector, and x-vector. This can cause sampling rate mismatches when all datasets are used. In this paper, we investigate the influence of sampling rate mismatches in the x-vector-based ASV systems and how non-learning BWE methods perform against them. The results showed that the N-BWE method improved the equal error rate (EER) on ASV systems based on the x-vector when the mismatches were present. We researched the relationship between objective measurements and EERs. Consequently, the N-BWE method produced the lowest EERs on both ASV systems and obtained the lower RMS-LSD value and the higher STOI score.
Jinwei WANG Xirong MA Yuanping ZHU Jizhou SUN
Modern GPUs have evolved to become a more general processor capable of executing scientific and engineering computations. It provides a highly parallel computing environment due to its large number of computing cores, which are suitable for numerous data parallel arithmetic computations, particularly linear algebra operations. The matrix-vector multiplication is one of the most important dense linear algebraic operations. It is applied to a diverse set of applications in many fields and must therefore be fully optimized to achieve a high-performance. In this paper, we proposed a novel auto-tuning method for matrix-vector multiplication on GPUs, where the number of assigned threads that are used to compute one element of the result vector can be auto-tuned according to the size of matrix. On the Nvidia's GPU GTX 650 with the most recent Kepler architecture, we developed an auto-tuner that can automatically select the optimal number of assigned threads for calculation. Based on the auto-tuner's result, we developed a versatile generic matrix-vector multiplication kernel with the CUDA programming model. A series of experiments on different shapes and sizes of matrices were conducted for comparing the performance of our kernel with that of the kernels from CUBLAS 5.0, MAGMA 1.3 and a warp method. The experiments results show that the performance of our matrix-vector multiplication kernel is close to the optimal behavior with increasing of the size of the matrix and has very little dependency on the shape of the matrix, which is a significant improvement compared to the other three kernels that exhibit unstable performance behavior for different shapes of matrices.
Chin-Chin CHEN Chiou-Yng LEE Erl-Huei LU
This work presents a novel scalable and systolic Montgomery's algorithm in GF(2m). The proposed algorithm is based on the Toeplitz matrix-vector representation, which obtains the scalable and systolic Montgomery multiplier in a flexible manner, and can adapt to the required precision. Analytical results indicate that the proposed multiplier over the generic field of GF(2m) has a latency of d+n(2n+1), where n = m / d , and d denotes the selected digital size. The latency is reduced to d+n(n+1) clock cycles when the field is constructed from generalized equally-spaced polynomials. Since the selected digital size is d ≥5 bits, the proposed architectures have lower time-space complexity than traditional digit-serial multipliers. Moreover, the proposed architectures have regularity, modularity and local interconnect ability, making them very suitable for VLSI implementation.
In this paper, a generalized Montgomery multiplication algorithm in GF(2m) using the Toeplitz matrix-vector representation is presented. The hardware architectures derived from this algorithm provide low-complexity bit-parallel systolic multipliers with trinomials and pentanomials. The results reveal that our proposed multipliers reduce the space complexity of approximately 15% compared with an existing systolic Montgomery multiplier for trinomials. Moreover, the proposed architectures have the features of regularity, modularity, and local interconnection. Accordingly, they are well suited to VLSI implementation.