The spectral envelope is a speech parameter that strongly affects vocoder quality. The Vector Quantized Variational AutoEncoder (VQ-VAE) has recently emerged as a state-of-the-art end-to-end quantization method based on deep learning. This paper proposes a new technique, called VQ-VAE-EMGAN, that improves the embedding space learning of VQ-VAE with a Generative Adversarial Network for quantizing the spectral envelope parameter. In experiments, we designed a quantizer for the spectral envelope parameters of the WORLD vocoder extracted from 16 kHz speech waveforms. The results show that the proposed technique reduced the Log Spectral Distortion (LSD) by around 0.5 dB and increased the PESQ by around 0.17 on average for four target bit operations compared to the conventional VQ-VAE.
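The abstract does not define LSD, so the following is only a minimal sketch of one common definition: the root-mean-square difference, in dB, between a reference and a quantized magnitude spectrum. The function name, the `eps` guard, and the use of magnitude (rather than power) spectra are assumptions made for illustration.

```python
import math

def log_spectral_distortion(ref_mag, est_mag, eps=1e-12):
    """RMS difference in dB between two magnitude spectra (assumed definition)."""
    if not ref_mag or len(ref_mag) != len(est_mag):
        raise ValueError("spectra must be non-empty and of equal length")
    squared_sum = 0.0
    for r, e in zip(ref_mag, est_mag):
        # 20*log10 because the inputs are magnitudes; eps avoids log(0).
        diff_db = 20.0 * math.log10((r + eps) / (e + eps))
        squared_sum += diff_db ** 2
    return math.sqrt(squared_sum / len(ref_mag))
```

Under this definition, a quantized envelope that is uniformly ten times smaller than the reference gives an LSD of about 20 dB, and identical envelopes give 0 dB.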
JianFeng WU HuiBin QIN YongZhu HUA LiHuan SHAO Ji HU ShengYing YANG
This paper proposes a deep neural network (DNN) based framework to address the problem of vector quantization (VQ) for high-dimensional data. The main challenge in applying DNNs to VQ is reducing the binary coding error of the auto-encoder when the distribution of the coding units is far from binary. To address this problem, three fine-tuning methods are adopted: 1) adding Gaussian noise to the input of the coding layer, 2) forcing the output of the coding layer to be binary, and 3) adding a non-binary penalty term to the loss function. These fine-tuning methods have been extensively evaluated on quantizing speech magnitude spectra. The results demonstrate that each of the methods improves the coding performance. When used to quantize 968-dimensional speech spectra with only 18 bits, the DNN-based VQ framework achieved an average PESQ of about 2.09, which is far beyond the capability of conventional VQ methods.
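The abstract does not give the form of the non-binary penalty, so the sketch below uses a hypothetical one: for coding-unit outputs assumed to lie in [0, 1], the mean of h(1 - h) is zero when every unit is exactly 0 or 1 and largest when units sit at 0.5. The function names and the 0.5 threshold are illustrative assumptions, not the paper's formulation.

```python
def non_binary_penalty(codes):
    """Hypothetical penalty: mean of h*(1-h) over coding units in [0, 1]."""
    if not codes:
        raise ValueError("codes must be non-empty")
    return sum(h * (1.0 - h) for h in codes) / len(codes)

def binarize(codes, threshold=0.5):
    """Force coding-layer outputs to bits (assumed hard threshold)."""
    return [1 if h >= threshold else 0 for h in codes]
```

A term like this would be added to the reconstruction loss with some weight, so that training pushes the coding units toward values that survive the hard thresholding with little error.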
Side match vector quantization (SMVQ) was originally developed for image compression and is also useful for steganography. SMVQ requires creating a separate state codebook for each block in both the encoding and decoding phases. Since the conventional method for state codebook generation is extremely time-consuming, this letter proposes a fast generation method. The proposed method is tens of times faster than the conventional one without loss of perceptual visual quality.
Mirza Golam KIBRIA Hidekazu MURATA Susumu YOSHIDA
This study analyzes the performance of a downlink beamformer with partitioned vector quantization under optimized feedback budget allocation. A multiuser multiple-input single-output downlink precoding system with perfect channel state information at mobile stations is considered. The numbers of feedback bits allocated to the channel quality indicator (CQI) and the channel direction indicator (CDI) corresponding to each partition are optimized by exploiting the quantization mean square error. In addition, the effects of equal and unequal partitioning on codebook memory and system capacity are studied and elucidated through simulations. The results show that with optimized CQI-CDI allocation, the feedback budget distributions of equal or unequal partitions are proportional to the size ratios of the partitioned subvectors. Furthermore, it is observed that for large-sized partitions, the ratio of optimal CDI to CQI is much higher than that for small-sized partitions.
Masahiro FUKUI Shigeaki SASAKI Yusuke HIWASAKI Kimitaka TSUTSUMI Sachiko KURIHARA Hitoshi OHMURO Yoichi HANEDA
We propose a new adaptive spectral masking method for algebraic vector quantization (AVQ) of non-sparse signals in the modified discrete cosine transform (MDCT) domain. This paper also proposes switching the adaptive spectral masking on and off depending on whether or not the target signal is non-sparse. The switching decision is based on the results of MDCT-domain sparseness analysis. When the target signal is categorized as non-sparse, the masking level of the target MDCT coefficients is adaptively controlled using spectral envelope information. The performance of the proposed method, as a part of ITU-T G.711.1 Annex D, is evaluated in comparison with conventional AVQ. Subjective listening test results showed that the proposed method improves sound quality by more than 0.1 points on a five-point scale on average for speech, music, and mixed content, which indicates significant improvement.
Wisarn PATCHOO Thomas R. FISCHER
In a sign-magnitude representation of binary lattice codevectors, only a few least significant bit-planes are constrained due to the structure of the lattice, while there is no restriction on other more significant bit-planes. Hence, any convenient bit-plane coding method can be used to encode the lattice codevectors, with modification required only for the lattice-defining, least-significant bit-planes. Simple encoding methods for the lattice-defining bit-planes of the D4, RE8, and Barnes-Wall 16-dimensional lattices are described. Simulation results for the encoding of a uniform source show that standard bit-plane coding together with the proposed encoding provide about the same performance as integer lattice vector quantization when the bit-stream is truncated. When the entire bit-stream is fully decoded, the granular gain of the lattice is realized.
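To illustrate why only the least significant bit-plane is constrained, the sketch below assumes the usual definition of D4 as the integer vectors of length 4 whose coordinates sum to an even number. Since a sign change does not alter parity, membership depends only on the LSBs of the magnitudes, so one LSB is implied by the other three. The helper names are hypothetical, and this is not claimed to be the encoding described in the paper.

```python
def lsb_plane(vec):
    """Least significant bit of each magnitude in sign-magnitude form."""
    return [abs(x) & 1 for x in vec]

def in_d4(vec):
    """D4 membership (assumed definition): even coordinate sum."""
    if len(vec) != 4:
        raise ValueError("D4 vectors have exactly 4 coordinates")
    return sum(lsb_plane(vec)) % 2 == 0

def pack_lsb_plane(vec):
    """Keep 3 of the 4 LSBs; the last is implied by even parity."""
    if not in_d4(vec):
        raise ValueError("vector is not a D4 lattice point")
    return lsb_plane(vec)[:3]

def unpack_lsb_plane(bits3):
    """Recover the full LSB plane from the 3 stored bits."""
    return list(bits3) + [sum(bits3) % 2]
```

All higher bit-planes and the sign bits are unconstrained in this picture, which matches the observation that any convenient bit-plane coder can handle them unchanged.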
Jinfeng GAO Bilan ZHU Masaki NAKAGAWA
The paper describes how a robust and compact on-line handwritten Japanese text recognizer was developed for deployment on hand-held devices by compressing each component of an integrated text recognition system: an SVM classifier that evaluates segmentation points, a combined on-line and off-line character recognizer, a linguistic context processor, and a geometric context evaluation module. Selecting an elastic-matching based on-line recognizer and compressing MQDF2 via a combination of LDA, vector quantization, and data type transformation have contributed to building a remarkably small yet robust recognizer. The compact text recognizer, covering 7,097 character classes, requires only about 15 MB of memory while keeping 93.11% accuracy on horizontal text lines extracted from the TUAT Kondate database. Compared with the original full-scale Japanese text recognizer, the memory size is reduced from 64.1 MB to 14.9 MB while the accuracy drops by only about 0.5 points, from 93.6% to 93.11%. The method is scalable, so even systems of less than 11 MB or less than 6 MB still retain 92.80% or 90.02% accuracy, respectively.
Chi-Jung HUANG Shaw-Hwa HWANG Cheng-Yu YEH
This study proposes an improvement to the Triangular Inequality Elimination (TIE) algorithm for vector quantization (VQ). The proposed approach uses recursive and intersection (RI) rules to compensate and enhance the TIE algorithm. The recursive rule changes reference codewords dynamically and produces the smallest candidate group. The intersection rule removes redundant codewords from these candidate groups. The RI-TIE approach avoids over-reliance on the continuity of the input signal. This study tests the contribution of the RI rules using the VQ-based, G.729 standard LSP encoder and some classic images. Results show that the RI rules perform excellently in the TIE algorithm.
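As background for the abstract above, here is a minimal sketch of the basic triangular inequality elimination idea only, not the proposed RI rules. If a reference codeword lies at distance d from the input, any codeword at least 2d away from that reference cannot be closer to the input and can be skipped. The function names are hypothetical, and updating the reference whenever a better codeword is found is a simplification chosen for this illustration.

```python
import math

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_distance_table(codebook):
    """Precompute all codeword-to-codeword distances (done once, offline)."""
    return [[dist(ci, cj) for cj in codebook] for ci in codebook]

def tie_search(x, codebook, table, ref_index):
    """Nearest-codeword search that skips codewords ruled out by the bound."""
    best, best_d = ref_index, dist(x, codebook[ref_index])
    evaluated = 1  # number of full distance computations performed
    for j in range(len(codebook)):
        if j == ref_index:
            continue
        # d(c_best, c_j) >= 2*d(x, c_best) implies d(x, c_j) >= d(x, c_best).
        if table[best][j] >= 2.0 * best_d:
            continue
        d = dist(x, codebook[j])
        evaluated += 1
        if d < best_d:
            best, best_d = j, d
    return best, evaluated
```

A typical choice of `ref_index` is the codeword selected for the previous input vector, which works well only when consecutive inputs are similar. That dependence on signal continuity is the weakness the abstract says the RI rules are meant to reduce.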
In this paper, a block-constrained trellis coded vector quantization (BC-TCVQ) algorithm is combined with an algebraic codebook to produce an algebraic trellis vector code (ATVC) for use in ACELP coding. ATVC expands the set of allowed algebraic codebook pulse positions, and the trellis branches are labeled with subsets of these positions. The Viterbi algorithm is used to select the excitation codevector. A fast codebook search method using an efficient non-exhaustive search technique is also proposed to reduce the complexity of the ATVC search procedure while maintaining the quality of the reconstructed speech. The ATVC block code is used as the fixed codebook of AMR-NB (12.2 kbps), which reduces the computational complexity compared to the conventional algebraic codebook.
Xu YANG De XU Songhe FENG Yingjun TANG Shuoyan LIU
This paper presents an efficient yet powerful codebook model, named the classified codebook model, for categorizing natural scenes. Current codebook models typically resort to a large codebook to obtain higher scene categorization performance, which severely limits their practical applicability. Our model formulates the codebook model with the theory of vector quantization and thus uses the well-known technique of classified vector quantization for scene-category modeling. The significant feature of our model is that it benefits scene categorization, especially at small codebook sizes, while greatly reducing the computational complexity of quantization. We evaluate the proposed model on a well-known, challenging scene dataset: 15 Natural Scenes. The experiments demonstrate that our model decreases the computation time for codebook generation. Moreover, our model achieves better scene categorization performance, and the performance gain becomes more pronounced at small codebook sizes.
Shih-Chieh SHIE Ji-Han JIANG Long-Tai CHEN Zeng-Hui HUANG
A secret image transmission scheme based on vector quantization (VQ) and a secret codebook is proposed in this article. The goal of this scheme is to transmit a set of good-quality images secretly via another high-quality cover image with the same image size. In order to reduce the data size of secret images, the images are encoded by an adaptive codebook. To guarantee the visual quality of secret images, the adaptive codebook applied at the transmitter is transmitted to the receiver secretly as well. Moreover, to enhance the security of the proposed scheme and to compact the data size of the codebook, the adaptive codebook is encoded based on VQ using another codebook generated from the cover image. Experiments show impressive results.
Makoto NAKASHIZUKA Hidenari NISHIURA Youji IIGUNI
In this study, we introduce shift-invariant sparse image representations using tree-structured dictionaries. Sparse coding is a generative signal model that approximates signals by the linear combinations of atoms in a dictionary. Since a sparsity penalty is introduced during signal approximation and dictionary learning, the dictionary represents the primal structures of the signals. Under the shift-invariance constraint, the dictionary comprises translated structuring elements (SEs). The computational cost and number of atoms in the dictionary increase along with the increasing number of SEs. In this paper, we propose an algorithm for shift-invariant sparse image representation, in which SEs are learnt with a tree-structured approach. By using a tree-structured dictionary, we can reduce the computational cost of the image decomposition to the logarithmic order of the number of SEs. We also present the results of our experiments on the SE learning and the use of our algorithm in image recovery applications.
Mahdieh KHANMOHAMMADI Reza AGHAIEZADEH ZOROOFI Takashi NISHII Hisashi TANAKA Yoshinobu SATO
Quantification of the hip cartilages is clinically important. In this study, we propose an automatic technique for segmentation and visualization of the acetabular and femoral head cartilages based on clinically obtained multi-slice T1-weighted MR data and a hybrid approach. We follow a knowledge-based approach by employing several features, such as the anatomical shapes of the hip femoral and acetabular cartilages and the corresponding image intensities. We estimate the center of the femoral head by a Hough transform and then automatically select the volume of interest. We then automatically segment the hip bones by a self-adaptive vector quantization technique. Next, we localize the articular central line by a modified Canny edge detector based on the first and second derivative filters along the radial lines originating from the femoral head center, together with an anatomical constraint. We then roughly segment the acetabular and femoral head cartilages using the derivative images obtained in the previous step and a top-hat filter. Final masks of the acetabular and femoral head cartilages are generated automatically from the rough results, the estimated articular central line, and the anatomical knowledge. Next, we generate a thickness map for each cartilage in the radial direction based on the Euclidean distance. Three-dimensional pelvic bones, acetabular and femoral cartilages, and the corresponding thicknesses are overlaid and visualized. The techniques have been implemented in C++ and MATLAB. We have evaluated and clarified the usefulness of the proposed techniques on multi-slice MR images of 40 clinical hips.
This Letter proposes a new kind of feature for color image retrieval based on Distance-weighted Boundary Predictive Vector Quantization (DWBPVQ) index histograms. For each color image in the database, six histograms (two for each color component) are calculated from the six corresponding DWBPVQ index sequences. The retrieval simulation results show that, compared with the traditional Spatial-domain Color-Histogram-based (SCH) features and the DCTVQ index histogram-based (DCTVQIH) features, the proposed DWBPVQIH features can greatly improve the recall and precision performance.
In this paper, we propose block matching and learning for color image classification. In our method, training images are partitioned into small blocks. Given a test image, it is also partitioned into small blocks, and mean-blocks corresponding to each test block are calculated with neighbor training blocks. Our method classifies a test image into the class that has the shortest total sum of distances between mean blocks and test ones. We also propose a learning method for reducing memory requirement. Experimental results show that our classification outperforms other classifiers such as support vector machine with bag of keypoints.
Hamed AKBARI Yukio KOSUGI Kazuyuki KOJIMA
In laparoscopic surgery, the lack of tactile sensation and 3D visual feedback makes it difficult to identify the position of a blood vessel intraoperatively. An unintentional partial tear or complete rupture of a blood vessel may result in a serious complication; moreover, if the surgeon cannot manage this situation, open surgery will be necessary. Differentiation of arteries from veins and other structures and the ability to independently detect them have a variety of applications in surgical procedures involving the head, neck, lung, heart, abdomen, and extremities. We have used the artery's pulsatile movement to detect and differentiate arteries from veins. The algorithm for change detection in this study uses edge detection for unsupervised image registration. Changed regions are identified by subtracting the systolic and diastolic images. As a post-processing step, region properties, including color average, area, major and minor axis lengths, perimeter, and solidity, are used as inputs of the LVQ (Learning Vector Quantization) network. The output results in two object classes: arteries and non-artery regions. After post-processing, arteries can be detected in the laparoscopic field. The registration method used here is evaluated in comparison with other linear and nonlinear elastic methods. The performance of this method is evaluated for the detection of arteries in several laparoscopic surgeries on an animal model and on eleven human patients. The performance evaluation criteria are based on false negative and false positive rates. This algorithm is able to detect artery regions, even in cases where the arteries are obscured by other tissues.
Fa-Xin YU Zhe-Ming LU Zhen LI Hao LUO
In this Letter, we propose a novel method of low-level global motion feature description based on Vector Quantization (VQ) index histograms of motion feature vectors (MFVVQIH) for the purpose of video shot retrieval. The contribution lies in three aspects: first, we use VQ to eliminate singular points in the motion feature vector space; second, we utilize the global motion feature vector index histogram of a video shot as the global motion signature; third, performing video shot retrieval on index histograms instead of the original motion feature vectors keeps the computational complexity low and thus enables real-time retrieval. Experimental results show that the proposed scheme has high accuracy and low computational complexity.
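The index-histogram idea can be sketched as follows: each feature vector is replaced by the index of its nearest codeword, and the normalized counts of those indices form the signature of a shot. The function names and the L1 histogram distance are assumptions made for illustration; the abstract does not state which similarity measure is used.

```python
def nearest_index(vec, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    best, best_d = 0, float("inf")
    for i, codeword in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(vec, codeword))
        if d < best_d:
            best, best_d = i, d
    return best

def index_histogram(vectors, codebook):
    """Normalized histogram of VQ indices for one set of feature vectors."""
    if not vectors:
        raise ValueError("need at least one feature vector")
    counts = [0] * len(codebook)
    for v in vectors:
        counts[nearest_index(v, codebook)] += 1
    return [c / len(vectors) for c in counts]

def histogram_distance(h1, h2):
    """L1 distance between two histograms (assumed similarity measure)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

Retrieval then compares one short histogram per shot rather than every raw motion vector, which is where the low computational cost comes from.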
In a codebook-based precoding MIMO system, the precoding codebook largely determines the system performance. Codebook design is therefore crucial, and it depends on the channel fading, the number of antennas, the spatial correlation, and other factors, so each specific channel condition has its own optimum codebook. In this paper, in order to obtain the optimum codebooks, a universal unitary space vector quantization (USVQ) codebook design criterion is provided, which can design the optimum codebooks for various fading and spatially correlated channels with arbitrary antenna configurations. Furthermore, the unitary space K-mean (USK) algorithm, which is iterative and convergent, is proposed to generate the USVQ codebook. Simulations show that the capacities of the precoding MIMO schemes using the USVQ codebooks are very close to those of the ideal precoding cases and exceed those of the schemes using the traditional Grassmannian codebooks and the 3GPP LTE DFT (discrete Fourier transform) codebooks.
ShanXue CHEN FangWei LI WeiLe ZHU TianQi ZHANG
A simple and successful design of the initial codebook for vector quantization (VQ) is presented. In existing initial codebook algorithms, such as the random method, the initial codebook is strongly influenced by the selection of initial codewords and is difficult to match to the features of the training vectors. In the proposed method, the training vectors are sorted according to their norms. Then, the ordered vectors are partitioned into N groups, where N is the size of the codebook. The initial codewords are obtained by calculating the centroid of each group. This initialization method has robust performance and can be combined with the VQ algorithm to further improve the quality of the codebook.
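The three steps above (sort by norm, partition into N groups, take centroids) can be sketched as follows. The abstract does not say how the ordered vectors are partitioned, so splitting them into contiguous groups of nearly equal size is an assumption, as is the function name.

```python
import math

def norm_sorted_initial_codebook(training, n_codewords):
    """Initial codebook from centroids of norm-ordered training groups."""
    if not 1 <= n_codewords <= len(training):
        raise ValueError("need 1 <= n_codewords <= number of training vectors")
    # Step 1: order the training vectors by Euclidean norm.
    ordered = sorted(training, key=lambda v: math.sqrt(sum(c * c for c in v)))
    total = len(ordered)
    dim = len(ordered[0])
    codebook = []
    for g in range(n_codewords):
        # Step 2: contiguous, nearly equal-sized groups (assumed partition rule).
        start = g * total // n_codewords
        end = (g + 1) * total // n_codewords
        group = ordered[start:end]
        # Step 3: the centroid of each group becomes an initial codeword.
        codebook.append([sum(v[i] for v in group) / len(group) for i in range(dim)])
    return codebook
```

The result is deterministic for a given training set, unlike random selection, and it would normally be passed as the starting point to an iterative codebook training algorithm.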
In this paper, a communication system using vector quantization (VQ) and channel coding is considered. A design scheme is proposed to optimize the source codebooks in the transmitter and the receiver. In the proposed algorithm, the overall distortion, including both the quantization error and the channel distortion, is minimized. The proposed algorithm differs from previous work in two respects: a channel encoder is used in the VQ-based communication system, and the source VQ codebook used in the transmitter is different from the one used in the receiver, i.e., the VQ system is asymmetric. In addition, the bounded-distance decoding (BDD) technique is used to combat ambiguity in the channel decoder. Computer simulations show that the optimized system based on the proposed algorithm outperforms a conventional system based on a symmetric VQ codebook. The proposed algorithm also enables reliable image communication over noisy channels.