Keyword Search Result

[Keyword] VQ (31 hits)

Results 1-20 of 31

  • Enhancing VQE Convergence for Optimization Problems with Problem-Specific Parameterized Quantum Circuits

    Atsushi MATSUO  Yudai SUZUKI  Ikko HAMAMURA  Shigeru YAMASHITA  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2023/08/17
      Vol:
    E106-D No:11
      Page(s):
    1772-1782

    The Variational Quantum Eigensolver (VQE) algorithm is gaining interest for its potential use on near-term quantum devices. In the VQE algorithm, parameterized quantum circuits (PQCs) are employed to prepare quantum states, which are then used to compute the expectation value of a given Hamiltonian. Designing efficient PQCs is crucial for improving convergence speed. In this study, we introduce problem-specific PQCs tailored to optimization problems by dynamically generating PQCs that incorporate the problem constraints. This approach reduces the search space by focusing on unitary transformations that benefit the VQE algorithm, thereby accelerating convergence. Our experimental results demonstrate that the proposed PQCs converge faster than state-of-the-art PQCs, highlighting the potential of problem-specific PQCs for optimization problems.
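
    To make the constraint-driven idea concrete, here is a minimal sketch (not the authors' implementation) of a VQE-style loop whose ansatz spans only the feasible subspace of a toy one-hot constraint; the two-qubit register, the Hamiltonian values, and the single-parameter ansatz are all illustrative assumptions.

    ```python
    # Hypothetical toy example: a one-hot constraint over two binary variables
    # leaves only the basis states |01> and |10> feasible, so the ansatz below
    # never leaves that subspace and the classical optimizer searches a smaller
    # space than a generic hardware-efficient PQC would.
    import numpy as np
    from scipy.optimize import minimize

    # Diagonal cost Hamiltonian over |00>, |01>, |10>, |11> (illustrative
    # values; the infeasible states carry large costs).
    H_diag = np.array([10.0, 1.0, 3.0, 10.0])

    def feasible_state(theta):
        """One-parameter ansatz spanning only the feasible subspace {|01>, |10>}."""
        psi = np.zeros(4)
        psi[1] = np.cos(theta)  # amplitude on |01>
        psi[2] = np.sin(theta)  # amplitude on |10>
        return psi

    def energy(params):
        psi = feasible_state(params[0])
        return float(np.sum(H_diag * np.abs(psi) ** 2))

    result = minimize(energy, x0=[0.1], method="COBYLA")
    print(result.x, result.fun)  # converges toward |01> with energy 1.0
    ```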

  • A Visual Question Answering Network Merging High- and Low-Level Semantic Information

    Huimin LI  Dezhi HAN  Chongqing CHEN  Chin-Chen CHANG  Kuan-Ching LI  Dun LI  

     
    PAPER-Core Methods

      Publicized:
    2022/01/06
      Vol:
    E106-D No:5
      Page(s):
    581-589

    Visual Question Answering (VQA) usually uses deep attention mechanisms to learn the fine-grained visual content of images and the textual content of questions. However, a deep attention mechanism learns only high-level semantic information and ignores the impact of low-level semantic information on answer prediction. To address this, we design a High- and Low-Level Semantic Information Network (HLSIN), which employs two strategies to fuse high-level and low-level semantic information. The first, adaptive weight learning, allows each level of semantic information to learn its own weights; the second, a gate-sum mechanism, suppresses invalid information at each level and fuses the valid information. On the benchmark VQA-v2 dataset, we quantitatively and qualitatively evaluate HLSIN and conduct extensive ablation studies to explore the reasons behind its effectiveness. Experimental results demonstrate that HLSIN significantly outperforms the previous state of the art, with an overall accuracy of 70.93% on test-dev.
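
    As a rough illustration of gate-sum style fusion (a sketch under stated assumptions, not the exact HLSIN architecture), one can gate each feature level with its own learned transform and sum the gated features; the feature dimension, random weights, and sigmoid gating below are assumptions made for the example.

    ```python
    # Sketch of gate-sum fusion of high- and low-level feature vectors.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 8
    high = rng.normal(size=d)  # high-level semantic feature (illustrative)
    low = rng.normal(size=d)   # low-level semantic feature (illustrative)
    W_h, W_l = rng.normal(size=(d, d)), rng.normal(size=(d, d))

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Each level gets its own gate; the gates suppress invalid components,
    # and the gated features are summed into one fused representation.
    gate_h = sigmoid(W_h @ high)
    gate_l = sigmoid(W_l @ low)
    fused = gate_h * high + gate_l * low
    print(fused.shape)  # (8,)
    ```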

  • Adaptive Spectral Masking of AVQ Coding and Sparseness Detection for ITU-T G.711.1 Annex D and G.722 Annex B Standards

    Masahiro FUKUI  Shigeaki SASAKI  Yusuke HIWASAKI  Kimitaka TSUTSUMI  Sachiko KURIHARA  Hitoshi OHMURO  Yoichi HANEDA  

     
    PAPER-Speech and Hearing

      Vol:
    E97-D No:5
      Page(s):
    1264-1272

    We propose a new adaptive spectral masking method of algebraic vector quantization (AVQ) for non-sparse signals in the modified discrete cosine transform (MDCT) domain. This paper also proposes switching the adaptive spectral masking on and off depending on whether or not the target signal is non-sparse. The switching decision is based on the results of MDCT-domain sparseness analysis. When the target signal is categorized as non-sparse, the masking level of the target MDCT coefficients is adaptively controlled using spectral envelope information. The performance of the proposed method, as part of ITU-T G.711.1 Annex D, is evaluated in comparison with conventional AVQ. Subjective listening tests showed that the proposed method improves sound quality by more than 0.1 points on a five-point scale on average for speech, music, and mixed content, which indicates a significant improvement.
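
    The on/off switching idea can be sketched as follows; the sparseness measure (spectral flatness), the threshold, and the envelope-relative masking rule are assumptions of this illustration, not the standardized algorithm.

    ```python
    # Illustrative sketch: measure how sparse a frame's MDCT coefficients are
    # and apply spectral masking only for non-sparse frames.
    import numpy as np

    def spectral_flatness(x, eps=1e-12):
        """Geometric/arithmetic mean ratio of the power spectrum; 1.0 = flat."""
        p = x.astype(float) ** 2 + eps
        return np.exp(np.mean(np.log(p))) / np.mean(p)

    def mask_coefficients(mdct, envelope, flatness_threshold=0.5):
        # envelope: per-coefficient spectral envelope, same shape as mdct.
        if spectral_flatness(mdct) < flatness_threshold:
            return mdct  # sparse frame: masking switched off
        # Non-sparse frame: zero coefficients far below the spectral envelope,
        # concentrating the AVQ bit budget on perceptually relevant components.
        mask = np.abs(mdct) >= 0.1 * envelope
        return mdct * mask
    ```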

  • Color Image Retrieval Based on Distance-Weighted Boundary Predictive Vector Quantization Index Histograms

    Zhen SUN  Zhe-Ming LU  Hao LUO  

     
    LETTER-Image Processing and Video Processing

      Vol:
    E92-D No:9
      Page(s):
    1803-1806

    This letter proposes new features for color image retrieval based on Distance-weighted Boundary Predictive Vector Quantization (DWBPVQ) index histograms. For each color image in the database, six histograms (two for each color component) are calculated from the six corresponding DWBPVQ index sequences. Retrieval simulations show that, compared with traditional spatial-domain color-histogram-based (SCH) features and DCTVQ index-histogram-based (DCTVQIH) features, the proposed DWBPVQIH features greatly improve recall and precision.
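
    The index-histogram comparison at the heart of such schemes can be sketched as below; the codebook size, the histogram-intersection similarity, and the averaging over the six per-channel histograms are illustrative choices, not the letter's exact distance.

    ```python
    # Sketch of retrieval by VQ-index histograms: histogram the quantizer
    # indices of each channel and rank images by histogram similarity.
    import numpy as np

    def index_histogram(indices, codebook_size):
        h = np.bincount(indices, minlength=codebook_size).astype(float)
        return h / h.sum()

    def histogram_intersection(h1, h2):
        return np.minimum(h1, h2).sum()  # 1.0 = identical histograms

    def image_similarity(hists_a, hists_b):
        # hists_a, hists_b: the six per-channel index histograms of two images.
        return float(np.mean([histogram_intersection(a, b)
                              for a, b in zip(hists_a, hists_b)]))
    ```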

  • Unitary Space Vector Quantization Codebook Design for Precoding MIMO System

    Ping WU  Lihua LI  Ping ZHANG  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E91-B No:9
      Page(s):
    2917-2924

    In a codebook-based precoding MIMO system, the precoding codebook largely determines system performance. Designing the codebook is therefore crucial, and the design depends on the channel fading, the number of antennas, the spatial correlation, and so on, so specific channel conditions call for correspondingly optimized codebooks. In this paper, to obtain such optimum codebooks, a universal unitary space vector quantization (USVQ) codebook design criterion is provided, which can design optimum codebooks for various fading and spatially correlated channels with arbitrary antenna configurations. Furthermore, the unitary space K-mean (USK) algorithm, which is iterative and convergent, is proposed to generate the USVQ codebook. Simulations show that the capacities of precoding MIMO schemes using USVQ codebooks are very close to those of ideal precoding and exceed those of schemes using traditional Grassmannian codebooks or the 3GPP LTE DFT (discrete Fourier transform) codebooks.
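
    A K-means-style clustering of unit-norm channel directions under a phase-invariant chordal distance captures the flavor of such a design; the dimensions, iteration count, and centroid rule below are assumptions of this toy sketch, not the paper's USK algorithm.

    ```python
    # Toy "unitary space" K-means over complex unit vectors.
    import numpy as np

    rng = np.random.default_rng(1)

    def usk(vectors, k, iters=50):
        # vectors: (n, d) complex unit-norm channel direction vectors.
        centroids = vectors[rng.choice(len(vectors), k, replace=False)]
        for _ in range(iters):
            # Chordal distance d^2 = 1 - |<v, c>|^2 is phase-invariant, so we
            # assign each vector to the centroid with the largest |<v, c>|^2.
            sim = np.abs(vectors @ centroids.conj().T) ** 2
            assign = sim.argmax(axis=1)
            for j in range(k):
                members = vectors[assign == j]
                if len(members):
                    # Centroid = principal eigenvector of the members'
                    # correlation matrix (the chordal mean of unit vectors).
                    R = members.conj().T @ members
                    _, V = np.linalg.eigh(R)
                    centroids[j] = V[:, -1]
        return centroids

    v = rng.normal(size=(200, 4)) + 1j * rng.normal(size=(200, 4))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    codebook = usk(v, k=8)  # an 8-entry unit-vector codebook
    ```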

  • A MFCC-Based CELP Speech Coder for Server-Based Speech Recognition in Network Environments

    Jae Sam YOON  Gil Ho LEE  Hong Kook KIM  

     
    PAPER-Speech/Audio Processing

      Vol:
    E90-A No:3
      Page(s):
    626-632

    Existing standard speech coders can provide high-quality speech communication. However, they tend to degrade the performance of automatic speech recognition (ASR) systems that use the reconstructed speech. The main cause of the degradation is that the linear predictive coefficients (LPCs), the typical spectral envelope parameters in speech coding, are optimized for speech quality rather than for speech recognition performance. In this paper, we propose a speech coder using mel-frequency cepstral coefficients (MFCCs) instead of LPCs to improve the performance of a server-based speech recognition system in network environments. To develop the proposed speech coder at a low bit rate, we first exploit the interframe correlation of MFCCs, which leads to predictive quantization of the MFCCs. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel errors. The result is an 8.7 kbps MFCC-based CELP coder. It is shown that the proposed coder achieves speech quality comparable to 8 kbps G.729, and that an ASR system using the proposed coder achieves a relative word error rate reduction of 6.8% compared to one using G.729 on a large-vocabulary task (AURORA4).
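
    The predictive-plus-safety-net structure can be sketched as follows; the prediction coefficient, the toy codebooks, and the per-frame mode decision are assumptions of this illustration, not the paper's trained quantizers.

    ```python
    # Sketch of predictive MFCC quantization with a safety net: each frame is
    # coded either predictively (residual against a predicted vector) or
    # memorylessly, whichever quantizes better, so channel errors cannot
    # propagate indefinitely through the predictor.
    import numpy as np

    def quantize(vec, codebook):
        idx = int(np.argmin(np.sum((codebook - vec) ** 2, axis=1)))
        return idx, codebook[idx]

    def encode_frame(mfcc, prev_decoded, pred_codebook, safety_codebook, a=0.7):
        predicted = a * prev_decoded                    # inter-frame prediction
        i_pred, q_res = quantize(mfcc - predicted, pred_codebook)
        i_safe, q_safe = quantize(mfcc, safety_codebook)
        err_pred = np.sum((predicted + q_res - mfcc) ** 2)
        err_safe = np.sum((q_safe - mfcc) ** 2)
        if err_safe <= err_pred:                        # safety-net mode wins
            return ("safety", i_safe), q_safe
        return ("predictive", i_pred), predicted + q_res
    ```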

  • Fast Concatenative Speech Synthesis Using Pre-Fused Speech Units Based on the Plural Unit Selection and Fusion Method

    Masatsune TAMURA  Tatsuya MIZUTANI  Takehiko KAGOSHIMA  

     
    PAPER-Speech and Hearing

      Vol:
    E90-D No:2
      Page(s):
    544-553

    We have previously developed a concatenative speech synthesizer based on the plural speech unit selection and fusion method, which can synthesize stable, human-like speech. In this method, plural speech units for each speech segment are selected using a cost function and fused by averaging pitch-cycle waveforms. The method has a large computational cost, but some platforms require a speech synthesis system that works within limited hardware resources. In this paper, we propose an offline unit fusion method that reduces the computational cost. In the proposed method, speech units are fused in advance to build a pre-fused speech unit database. At synthesis time, a speech unit for each segment is selected from the pre-fused database, and the speech waveform is synthesized by applying prosodic modification and concatenation, without the computationally expensive unit fusion process. We compared several algorithms for constructing the pre-fused speech unit database. Subjective and objective evaluations confirm the effectiveness of the proposed method: the quality of synthetic speech from the offline unit fusion method with a 100 MB database is close to that of the online unit fusion method with the 93 MB JP database and slightly below that of the 390 MB US database, while the computational time is reduced by 80%. We also show that the frequency-weighted VQ-based method is effective for constructing the pre-fused speech unit database.
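
    The offline fusion step itself reduces to averaging aligned pitch-cycle waveforms; the sketch below assumes the units are already pitch-marked single cycles and uses simple linear resampling to a common cycle length, which is an assumption of this illustration rather than the paper's alignment method.

    ```python
    # Sketch of offline unit fusion: average length-normalized pitch-cycle
    # waveforms of the plural selected units, so no fusion work remains at
    # synthesis time; the result is stored in the pre-fused unit database.
    import numpy as np

    def fuse_units(units, cycle_len=80):
        # units: list of 1-D arrays, each one pitch cycle of a selected unit.
        resampled = [np.interp(np.linspace(0, len(u) - 1, cycle_len),
                               np.arange(len(u)), u) for u in units]
        return np.mean(resampled, axis=0)
    ```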

  • Subjective Multimedia Quality Assessment

    Matthew D. BROTHERTON  Quan HUYNH-THU  David S. HANDS  Kjell BRUNNSTROM  

     
    INVITED PAPER

      Vol:
    E89-A No:11
      Page(s):
    2920-2932

    The Video Quality Experts Group (VQEG) is preparing a programme of subjective multimedia quality tests. The results from these tests will be used to evaluate the performance of competing objective multimedia quality metrics. The reliability of the subjective test data is of great importance for VQEG's task. This paper provides an overview of VQEG's multimedia ad-hoc group. The work of this group will require subjective tests to be performed by laboratories located in Europe, Asia and North America. For VQEG's multimedia work to be successful, the subjective assessment methodology must be precisely defined and produce reliable and repeatable subjective quality data. Although international standards covering multimedia quality assessment methods are in force, there remains some uncertainty regarding the most effective approach to assessing the subjective quality of multimedia. A review of existing methods is provided. Two experiments are presented investigating the suitability of alternative subjective assessment methods (single-stimulus ACR and SAMVIQ). The results of these experiments are discussed within the context of the VQEG multimedia testing programme.

  • New Tendencies in Subjective Video Quality Evaluation

    Vittorio BARONCINI  

     
    INVITED PAPER

      Vol:
    E89-A No:11
      Page(s):
    2933-2937

    This paper provides an overview of new tendencies in the subjective assessment of video quality for multimedia applications. New subjective assessment methods are described, together with the new general approaches, and some of the motivations behind these new approaches are also given.

  • Non-intrusive Quality Monitoring Method of VoIP Speech Based on Network Performance Metrics

    Masataka MASUDA  Takanori HAYASHI  

     
    PAPER

      Vol:
    E89-B No:2
      Page(s):
    304-312

    With the increasing demand for IP telephony services using Voice over IP (VoIP) technology, techniques for monitoring speech quality in actual networks are required to manage the quality of VoIP services continuously. Since VoIP speech quality is affected by IP network performance factors, non-intrusive methods of monitoring the quality of service (QoS) by passively measuring network performance are attracting keen interest. VQmon technology is one such non-intrusive quality monitoring method. Although VQmon's monitoring functions for packet-arrival behavior events at VoIP gateways are effective, its estimation algorithm does not take differences among VoIP-gateway product implementations into account. We therefore propose a non-intrusive QoS monitoring method that works in conjunction with ITU-T Recommendation P.862 "PESQ" and takes the characteristics of VoIP-gateway products into consideration. We compared the performance of non-intrusive quality monitoring technologies, namely VQmon and the proposed method, in terms of the accuracy of estimated speech quality and mouth-to-ear delay. The experimental results revealed that the proposed method outperforms the conventional one, achieving sufficient accuracy for quality monitoring of VoIP services.
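
    The general shape of non-intrusive monitoring is a mapping from passively measured network metrics to an estimated quality score. The sketch below is a simplified E-model-flavored illustration with made-up coefficients; it is neither VQmon nor the proposed PESQ-aided method.

    ```python
    # Map passively measured loss and delay to a rough R-factor-style score
    # in [0, 100]; all coefficients here are illustrative assumptions.
    def estimate_r_factor(packet_loss_pct, one_way_delay_ms):
        r = 93.2
        r -= 2.5 * packet_loss_pct                  # loss impairment (toy)
        if one_way_delay_ms > 150:                  # delay impairment (toy)
            r -= 0.1 * (one_way_delay_ms - 150)
        return max(0.0, min(100.0, r))
    ```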

  • Adaptive Data Hiding Based on SMVQ Prediction

    Shih-Chieh SHIE  Shinfeng D. LIN  Chih-Ming FANG  

     
    LETTER-Application Information Security

      Vol:
    E89-D No:1
      Page(s):
    358-362

    An adaptive data hiding scheme is proposed that hides considerable quantities of secret data while preserving acceptable visual quality for cover images. The main idea is to hide the secret data in the compressed codes of the cover image during side-match vector quantization (SMVQ) encoding, so that interceptors cannot capture the secret information. Experimental results confirm that the proposed scheme outperforms earlier works. Moreover, the receiver can efficiently recover both the compressed cover image and the hidden secret data at the same time.
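
    One common way to hide a bit inside a VQ index stream, sketched below under stated assumptions, is to split the (state) codebook into two halves and restrict the search to the half selected by the secret bit; the half-split rule is an illustrative stand-in, not the paper's adaptive SMVQ scheme.

    ```python
    # Sketch of index-domain hiding: the transmitted index itself carries one
    # hidden bit, and the receiver recovers both image block and bit.
    import numpy as np

    def encode_block_with_bit(block, state_codebook, bit):
        half = len(state_codebook) // 2
        sub = state_codebook[:half] if bit == 0 else state_codebook[half:]
        errs = np.sum((sub - block.ravel()) ** 2, axis=1)
        local = int(np.argmin(errs))
        return local + (0 if bit == 0 else half)  # transmitted index

    def decode_bit(index, codebook_size):
        return 0 if index < codebook_size // 2 else 1
    ```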

  • Multiband Vector Quantization Based on Inner Product for Wideband Speech Coding

    Joon-Hyuk CHANG  Sanjit K. MITRA  

     
    LETTER-Speech and Hearing

      Vol:
    E88-D No:11
      Page(s):
    2606-2608

    This paper describes a multiband vector quantization (VQ) technique based on the inner product for wideband speech coding at 16 kb/s. Our approach splits the input speech into two separate bands and applies an independent coding scheme to each. A code-excited linear prediction (CELP) coder is used in the lower band, while a transform-based coding strategy is applied in the higher band. The spectral components of the higher band are represented by a set of modulated lapped transform (MLT) coefficients. The higher band is divided into three subbands, and the MLT coefficients of each subband form a vector. For the VQ of these vectors, we propose an inner-product-based distance measure. The proposed 16 kb/s coder with this distortion measure achieves better performance than the 48 kb/s ITU-T G.722 in subjective quality tests.
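
    An inner-product-based selection can be sketched as maximizing the normalized inner product (minimizing the angle) between the input vector and each codevector; treating gain separately is an assumption of this sketch, not necessarily the paper's exact measure.

    ```python
    # Pick the codevector whose direction best matches the input MLT vector.
    import numpy as np

    def inner_product_vq(x, codebook, eps=1e-12):
        xn = x / (np.linalg.norm(x) + eps)
        cn = codebook / (np.linalg.norm(codebook, axis=1, keepdims=True) + eps)
        return int(np.argmax(cn @ xn))  # index of the best codevector
    ```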

  • A Fast Encoding Technique for Vector Quantization of LSF Parameters

    Sangwon KANG  Yongwon SHIN  Changyong SON  Thomas R. FISCHER  

     
    PAPER-Multimedia Systems for Communications

      Vol:
    E88-B No:9
      Page(s):
    3750-3755

    A fast encoding technique is described for vector quantization (VQ) of line spectral frequency (LSF) parameters. The reduction in VQ encoding complexity is achieved with a preliminary test that narrows the necessary codebook search range. The test is based on two criteria: one uses the distance between a specific single element of the input vector and the corresponding element of each codevector in the codebook; the other exploits the ordering property of LSF parameters. The fast encoding technique is implemented in the enhanced variable rate codec (EVRC) encoding algorithm. Simulation results show that the average codebook search range can be reduced by 44.50% for the EVRC without degrading the spectral distortion (SD).
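
    The single-element preliminary test can be sketched as below; the anchor element, the threshold, and the full-search fallback are assumptions of this illustration (the paper's second, ordering-based criterion is omitted for brevity).

    ```python
    # Reject a codevector cheaply when one anchor element is already too far
    # from the input's, skipping the full distance computation.
    import numpy as np

    def fast_lsf_search(x, codebook, anchor=0, threshold=0.05):
        best, best_err = -1, np.inf
        for i, c in enumerate(codebook):
            if abs(c[anchor] - x[anchor]) > threshold:  # preliminary test
                continue                                # skip full distance
            err = np.sum((c - x) ** 2)
            if err < best_err:
                best, best_err = i, err
        if best < 0:  # fallback: no codevector passed the test
            best = int(np.argmin(np.sum((codebook - x) ** 2, axis=1)))
        return best
    ```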

  • Fuzzy Training Algorithm for Wavelet Codebook Based Text-Independent Speaker Identification

    Shung-Yung LUNG  

     
    LETTER-Speech and Hearing

      Vol:
    E88-A No:6
      Page(s):
    1619-1621

    A speaker identification system is proposed that uses wavelet transform (WT) features with a codebook designed by the fuzzy c-means (FCM) algorithm. We combined FCM with the vector quantization (VQ) algorithm to avoid the typical local minima of speaker data compression. Identification accuracies of 94% were achieved for 100 Mandarin speakers.
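
    A minimal FCM update loop, sketched below, shows the soft-membership idea that helps codebook training escape the hard-assignment local minima of plain VQ; the fuzzifier m, the iteration count, and the random initialization are illustrative choices.

    ```python
    # Minimal fuzzy c-means: every training vector influences every codeword
    # through soft memberships u, rather than a hard nearest assignment.
    import numpy as np

    def fcm(data, c, m=2.0, iters=50, eps=1e-9):
        rng = np.random.default_rng(0)
        centers = data[rng.choice(len(data), c, replace=False)]
        for _ in range(iters):
            d2 = ((data[:, None, :] - centers[None]) ** 2).sum(-1) + eps
            u = 1.0 / (d2 ** (1.0 / (m - 1.0)))
            u /= u.sum(axis=1, keepdims=True)           # soft memberships
            w = u ** m
            centers = (w.T @ data) / w.sum(axis=0)[:, None]
        return centers
    ```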

  • Robust Speaker Identification System Based on Multilayer Eigen-Codebook Vector Quantization

    Ching-Tang HSIEH  Eugene LAI  Wan-Chen CHEN  

     
    PAPER

      Vol:
    E87-D No:5
      Page(s):
    1185-1193

    This paper presents some effective methods for improving the performance of a speaker identification system. Based on the multiresolution property of the wavelet transform, the input speech signal is decomposed into various frequency subbands in order not to spread noise distortions over the entire feature space. For capturing the characteristics of the vocal tract, the linear predictive cepstral coefficients (LPCC) of the lower frequency subband for each decomposition process are calculated. In addition, a hard threshold technique for the lower frequency subband in each decomposition process is also applied to eliminate the effect of noise interference. Furthermore, cepstral domain feature vector normalization is applied to all computed features in order to provide similar parameter statistics in all acoustic environments. In order to effectively utilize all these multiband speech features, we propose a modified vector quantization as the identifier. This model uses the multilayer concept to eliminate the interference among the multiband speech features and then uses the principal component analysis (PCA) method to evaluate the codebooks for capturing a more detailed distribution of the speaker's phoneme characteristics. The proposed method is evaluated using the KING speech database for text-independent speaker identification. Experimental results show that the recognition performance of the proposed method is better than those of the vector quantization (VQ) and the Gaussian mixture model (GMM) using full-band LPCC and mel-frequency cepstral coefficients (MFCC) features in both clean and noisy environments. Also, a satisfactory performance can be achieved in low SNR environments.
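
    The PCA-based codebook idea can be sketched as modeling each speaker's feature vectors by a principal subspace and scoring test vectors by reconstruction error in that subspace; the subspace dimension and the scoring rule are assumptions of this sketch, not the paper's multilayer eigen-codebook model.

    ```python
    # Train a per-speaker PCA basis and score a test vector by how well the
    # basis reconstructs it (smaller error = better speaker match).
    import numpy as np

    def train_eigen_codebook(vectors, n_components=8):
        mean = vectors.mean(axis=0)
        _, _, vt = np.linalg.svd(vectors - mean, full_matrices=False)
        return mean, vt[:n_components]     # mean and principal basis

    def reconstruction_error(x, mean, basis):
        r = x - mean
        return float(np.sum((r - basis.T @ (basis @ r)) ** 2))
    ```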

  • A Robust Blind Image Watermarking Scheme Based on Vector Quantization

    Soo-Chang PEI  Jun-Horng CHEN  

     
    PAPER-Image

      Vol:
    E87-A No:4
      Page(s):
    912-919

    Watermarking schemes have been extensively discussed and developed in recent years. Designers usually face a trade-off between two factors: robustness and transparency. To meet these requirements, the watermark message is typically embedded in either the transform domain or the spatial domain. In this paper, we propose a blind image watermarking scheme based on vector quantization. By exploiting a modified binary tree splitting method, a stable codebook can be generated, so that the embedded watermark message survives JPEG compression and additive Gaussian noise. The embedded message can be extracted without reference to the host image, which makes the scheme more practical.
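
    A hedged sketch of partition-based watermark embedding follows; the near-neighbor pairing rule is an illustrative stand-in for the paper's modified binary tree splitting, and `pair_of`/`label_of` are hypothetical lookup tables built offline from the codebook.

    ```python
    # Codewords are grouped into pairs of near neighbors, one member of each
    # pair labeled 0 and the other 1; a block is quantized to the pair member
    # whose label matches the watermark bit, and extraction is blind.
    import numpy as np

    def embed_bit(block_vec, codebook, pair_of, label_of, bit):
        # Nearest codeword overall, then switch to its pair partner if its
        # label disagrees with the watermark bit.
        idx = int(np.argmin(np.sum((codebook - block_vec) ** 2, axis=1)))
        if label_of[idx] != bit:
            idx = pair_of[idx]
        return idx

    def extract_bit(index, label_of):
        return label_of[index]  # blind extraction: no host image needed
    ```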

  • An Improved Fast Encoding Algorithm for Vector Quantization Using 2-Pixel-Merging Sum Pyramid and Manhattan-Distance-First Check

    Zhibin PAN  Koji KOTANI  Tadahiro OHMI  

     
    LETTER-Image Processing, Image Pattern Recognition

      Vol:
    E87-D No:2
      Page(s):
    494-499

    Vector quantization (VQ) involves a very heavy encoding process. In previous work, an efficient encoding algorithm using a mean pyramid was developed. To improve on it, a fast search algorithm is proposed in this letter. Specifically, four major modifications are made. First, the original codebook is rearranged along the sorted real sums to reduce the search scope, and the lower and upper bounds are updated dynamically. Second, sums are used instead of means, which carry round-off error, to completely avoid a mismatched winner. Third, a sum pyramid is constructed by 2-pixel merging rather than 4-pixel merging, generating more intermediate levels. Fourth, the Cauchy-Schwarz inequality is introduced to bridge the Euclidean and Manhattan distances, so that the difference check between two vectors can be pre-conducted using only the much lighter Manhattan distance computation. Experimental results show that the proposed algorithm is more search-efficient.
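
    The fourth modification rests on a simple bound: by Cauchy-Schwarz, for k-dimensional vectors, ||x - c||_1 <= sqrt(k) * ||x - c||_2, so ||x - c||_2^2 >= ||x - c||_1^2 / k, and a cheap L1 distance gives a safe lower bound for rejection. The sketch below shows only this Manhattan-distance-first check, without the sum-pyramid levels.

    ```python
    # Reject a codeword via the L1-based lower bound before paying for the
    # full squared Euclidean distance; output equals full search.
    import numpy as np

    def fast_vq_search(x, codebook):
        k = x.size
        best, best_err = 0, float(np.sum((codebook[0] - x) ** 2))
        for i in range(1, len(codebook)):
            l1 = np.sum(np.abs(codebook[i] - x))
            if l1 * l1 / k >= best_err:   # lower bound already too large
                continue
            err = float(np.sum((codebook[i] - x) ** 2))
            if err < best_err:
                best, best_err = i, err
        return best
    ```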

  • Efficient Genetic Algorithm of Codebook Design for Text-Independent Speaker Recognition

    Chih-Chien Thomas CHEN  Chin-Ta CHEN  Shung-Yung LUNG  

     
    LETTER-Speech and Hearing

      Vol:
    E85-A No:11
      Page(s):
    2529-2531

    This letter presents text-independent speaker identification results for telephone speech. A speaker identification system is proposed that is based on the Karhunen-Loeve transform (KLT) and a codebook designed using a genetic algorithm (GA). We combined the GA with the vector quantization (VQ) algorithm to avoid the typical local minima of speaker data compression. Identification accuracies of 91% were achieved for 100 Mandarin speakers.

  • Image Retrieval Using VQ Based Local Modified Gabor Feature

    Dae-Kyu SHIN  Hyun-Sool KIM  Tae-Yun CHUNG  Sang-Hui PARK  

     
    LETTER-Image Processing, Image Pattern Recognition

      Vol:
    E85-D No:8
      Page(s):
    1349-1353

    This paper proposes a new method of retrieving images from large image databases. The method is based on vector quantization (VQ) of local texture features at interest points automatically detected in an image. The texture features are extracted by a Gabor wavelet filter bank and rearranged to account for rotation. These features are classified by VQ and then used to construct a pattern histogram. Retrieval is performed simply by comparing the pattern histograms of images.

  • Two Fast Nearest Neighbor Searching Algorithms for Vector Quantization

    SeongJoon BAEK  Koeng-Mo SUNG  

     
    PAPER-Algorithms and Data Structures

      Vol:
    E84-A No:10
      Page(s):
    2569-2575

    In this paper, two efficient codebook search algorithms for vector quantization (VQ) are presented. The first fast search algorithm exploits the energy-compaction property of orthogonal transforms. In the transformed domain, the algorithm uses geometrical relations between the input vector and each codeword to discard many unlikely codewords. The second algorithm, which transforms the principal components only, is proposed to reduce the calculation overhead and storage. It exploits the relation between the principal components and the input vector. Since both algorithms reject only codewords that cannot be the nearest codeword, they produce the same output as a conventional full search. Simulation results confirm the effectiveness of the proposed algorithms.
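
    Why transforming helps can be sketched with a partial-distance search: an orthonormal transform preserves Euclidean distances, and if the transform compacts energy into the leading components (as PCA does), the running partial distance exceeds the current best early, so most codewords are rejected after a few components. The orthonormal `basis` below (e.g., a PCA basis) is an assumption of this sketch.

    ```python
    # Partial-distance search in an energy-compacting orthonormal transform
    # domain; output is identical to a full search in the original domain.
    import numpy as np

    def pca_partial_search(x, codebook, basis):
        xt = basis @ x                      # transformed input
        ct = codebook @ basis.T             # transformed codebook
        best, best_err = 0, float(np.sum((ct[0] - xt) ** 2))
        for i in range(1, len(ct)):
            err, rejected = 0.0, False
            for d in range(xt.size):        # leading components first
                err += (ct[i, d] - xt[d]) ** 2
                if err >= best_err:         # safe early rejection
                    rejected = True
                    break
            if not rejected and err < best_err:
                best, best_err = i, err
        return best
    ```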
