IEICE global.ieice.org Site

Keyword Search Result

[Keyword] (42807hit)

7101-7120hit(42807hit)

Experimental Design Method for High-Efficiency Microwave Power Amplifiers Based on a Low-Frequency Active Harmonic Load-Pull Technique
Ryo ISHIKAWA Yoichiro TAKAYAMA Kazuhiko HONJO

PAPER

Vol:
E99-C No:10
Page(s):
1147-1155
A novel experimental design method based on a low-frequency active load-pull technique that includes harmonic tuning is proposed for high-efficiency microwave power amplifiers. The intrinsic core of a transistor with a maximum oscillation frequency of more than several tens of gigahertz can be approximated as a nonlinear current source with no frequency dependence at an operating frequency of several gigahertz. In addition, the reactive parasitic elements in a transistor can be neglected at frequencies much lower than 1 GHz. Therefore, the optimum impedance condition, including harmonics, for obtaining high efficiency in the nonlinear current source can be investigated directly with a low-frequency active harmonic load-pull technique. The optimum load condition at the operating frequency for an external load circuit can then be estimated by considering the properties of the reactive parasitic elements and the nonlinear current source. For an InGaAs/GaAs pHEMT, active harmonic load-pull considering up to the fifth-order harmonic was carried out experimentally at a fundamental frequency of 20 MHz. Using the estimated optimum impedance condition for the equivalent nonlinear current source, high-frequency amplifiers were designed and fabricated for the 1.9-GHz, 2.45-GHz, and 5.8-GHz bands. The fabricated amplifiers exhibited maximum drain efficiencies of 79%, 80%, and 74% at 1.9 GHz, 2.47 GHz, and 5.78 GHz, respectively.
One-Bit to Four-Bit Dual Conversion for Security Enhancement against Power Analysis
Seungkwang LEE Nam-Su JHO

PAPER-Cryptography and Information Security

Vol:
E99-A No:10
Page(s):
1833-1842
Power analysis exploits information leaked from cryptographic devices, including, but not limited to, the power consumed during cryptographic operations. Given a number of power traces, an attacker can recover a cryptographic key efficiently, sometimes within a few minutes, using various statistical methods. To counter this, software countermeasures such as higher-order masking and software dual-rail with precharge logic have been proposed to produce randomized or constant power consumption during key-dependent operations. However, they have critical disadvantages in terms of computational time and security. In this paper, we propose a new solution called “one-bit to four-bit dual conversion” for enhanced security against power analysis. As an exemplary embodiment of the proposed scheme, we apply it to an AES implementation and demonstrate its security and performance. The overall costs are approximately 148 KB of memory for the lookup tables and about a 3-fold increase in execution time compared with a straightforward implementation of AES.
Investigation of Using Continuous Representation of Various Linguistic Units in Neural Network Based Text-to-Speech Synthesis
Xin WANG Shinji TAKAKI Junichi YAMAGISHI

PAPER-Speech synthesis

Publicized:
2016/07/19
Vol:
E99-D No:10
Page(s):
2471-2480
Building high-quality text-to-speech (TTS) systems without expert knowledge of the target language and/or time-consuming manual annotation of speech and text data is an important yet challenging research topic. In this kind of TTS system, it is vital to find a representation of the input text that is both effective and easy to acquire. Recently, the continuous representation of raw word inputs, called “word embedding”, has been used successfully in various natural language processing tasks. It has also been used as an additional or alternative linguistic input feature for neural-network-based acoustic models in TTS systems. In this paper, we further investigate the use of this embedding technique to represent phonemes, syllables, and phrases for acoustic models based on recurrent and feed-forward neural networks. The experimental results show that most of these continuous representations do not significantly improve the system's performance when they are fed into the acoustic model, either as an additional component or as a replacement for the conventional prosodic context. However, subjective evaluation shows that the continuous representation of phrases achieves a significant improvement when it is combined with the prosodic context as input to the acoustic model based on the feed-forward neural network.
N-gram Approximation of Latent Words Language Models for Domain Robust Automatic Speech Recognition (Open Access)
Ryo MASUMURA Taichi ASAMI Takanobu OBA Hirokazu MASATAKI Sumitaka SAKAUCHI Satoshi TAKAHASHI

PAPER-Language modeling

Publicized:
2016/07/19
Vol:
E99-D No:10
Page(s):
2462-2470
This paper aims to improve the domain robustness of language modeling for automatic speech recognition (ASR). To this end, we focus on applying the latent words language model (LWLM) to ASR. LWLMs are generative models whose structure is based on Bayesian soft class-based modeling with a vast latent variable space. Their flexible attributes help to efficiently realize the effects of smoothing and dimensionality reduction and so address the data sparseness problem; LWLMs constructed from limited domain data are expected to robustly cover multiple unknown domains in ASR. However, this flexibility seriously increases computational complexity. To rigorously compute the generative probability of an observed word sequence, we must consider a huge number of possible latent word assignments. Since this is computationally impractical, some approximation is inevitable for ASR implementation. To solve this problem and apply the approach to ASR, this paper presents an n-gram approximation of LWLM. The n-gram approximation approximates the LWLM as a simple back-off n-gram structure and enables LWLM-based robust one-pass ASR decoding. Our experiments verify the effectiveness of our approach by evaluating perplexity and ASR performance on both in-domain and out-of-domain data sets.
Statistical Bandwidth Extension for Speech Synthesis Based on Gaussian Mixture Model with Sub-Band Basis Spectrum Model
Yamato OHTANI Masatsune TAMURA Masahiro MORITA Masami AKAMINE

PAPER-Voice conversion

Publicized:
2016/07/19
Vol:
E99-D No:10
Page(s):
2481-2489
This paper describes a novel statistical bandwidth extension (BWE) technique based on a Gaussian mixture model (GMM) and a sub-band basis spectrum model (SBM), in which each dimensional component represents a specific acoustic space in the frequency domain. The proposed method can perform BWE on speech data with an arbitrary frequency bandwidth, whereas conventional methods perform the conversion from fixed narrow-band data. In the proposed method, we train a GMM in advance with SBM parameters extracted from full-band spectra. According to the bandwidth of the input signal, the trained GMM is reconstructed into a GMM of the joint probability density between low-band and high-band SBM components. High-band SBM components are then estimated from the low-band SBM components of the input signal based on the reconstructed GMM. Finally, BWE is achieved by adding the spectra decoded from the estimated high-band SBM components to those of the input signal. To construct the full-band signal from the narrow-band one, we apply this method to log-amplitude spectra and aperiodic components. Objective and subjective evaluation results show that the proposed method extends the bandwidth of speech data robustly for the log-amplitude spectra. Experimental results also indicate that the aperiodic component extracted from the upsampled narrow-band signal achieves the same performance as the restored and full-band aperiodic components in the proposed method.
A Statistical Sample-Based Approach to GMM-Based Voice Conversion Using Tied-Covariance Acoustic Models
Shinnosuke TAKAMICHI Tomoki TODA Graham NEUBIG Sakriani SAKTI Satoshi NAKAMURA

PAPER-Voice conversion

Publicized:
2016/07/19
Vol:
E99-D No:10
Page(s):
2490-2498
This paper presents a novel statistical sample-based approach for Gaussian Mixture Model (GMM)-based Voice Conversion (VC). Although GMM-based VC offers promising flexibility for model adaptation, the quality of converted speech is significantly worse than that of natural speech. This paper addresses the problem of inaccurate modeling, which is one of the main causes of the quality degradation. Recently, we proposed statistical sample-based speech synthesis using rich context models for high-quality and flexible Hidden Markov Model (HMM)-based Text-To-Speech (TTS) synthesis. This method makes it possible not only to produce high-quality speech by introducing ideas from unit selection synthesis, but also to preserve the flexibility of the original HMM-based TTS. In this paper, we apply this idea to GMM-based VC. The rich context models are first trained for individual joint speech feature vectors, and then gathered mixture by mixture to form a Rich context-GMM (R-GMM). In conversion, an iterative generation algorithm using R-GMMs is used to convert speech parameters, after initialization using over-trained probability distributions. Because the proposed method utilizes individual speech features, and its formulation is the same as that of conventional GMM-based VC, it can produce high-quality speech while keeping the flexibility of the original GMM-based VC. The experimental results demonstrate that the proposed method yields significant improvements in terms of speech quality and speaker individuality in converted speech.
Policy Optimization for Spoken Dialog Management Using Genetic Algorithm
Hang REN Qingwei ZHAO Yonghong YAN

PAPER-Spoken dialog system

Publicized:
2016/07/19
Vol:
E99-D No:10
Page(s):
2499-2507
The optimization of spoken dialog management policies is a non-trivial task because the inputs from the speech recognition and language understanding modules contain errors. The dialog manager needs to ground uncertain semantic information at times to fully understand the needs of human users and successfully complete the required dialog tasks. Approaches based on reinforcement learning are currently mainstream in academia and have proved effective, especially in noisy environments. However, in reinforcement learning the dialog strategy is often represented by a complex numeric model and is thus incomprehensible to humans. The trained policies are very difficult for dialog system designers to verify or modify, which largely limits their deployment in commercial applications. In this paper, we propose a novel framework for optimizing dialog policies specified in a human-readable domain language using a genetic algorithm. We present learning algorithms that use a user simulator and real human-machine dialog corpora. Experimental results show that the proposed approach achieves performance on par with some state-of-the-art reinforcement learning algorithms while maintaining a comprehensible policy structure.
Neural Network Approaches to Dialog Response Retrieval and Generation
Lasguido NIO Sakriani SAKTI Graham NEUBIG Koichiro YOSHINO Satoshi NAKAMURA

PAPER-Spoken dialog system

Publicized:
2016/07/19
Vol:
E99-D No:10
Page(s):
2508-2517
In this work, we propose a new statistical model for building robust dialog systems that uses neural networks to either retrieve or generate a dialog response based on existing data sources. In the retrieval task, we propose an approach that uses paraphrase identification during the retrieval process. This is done by employing recursive autoencoders and dynamic pooling to determine whether two sentences of arbitrary length have the same meaning. For both the generation and retrieval tasks, we propose a model using long short-term memory (LSTM) neural networks that first uses an LSTM encoder to read the user's utterance into a continuous vector-space representation, then uses an LSTM decoder to generate the most probable word sequence. An evaluation based on objective and subjective metrics shows that, compared to standard example-based dialog baselines, the proposed approaches are better able to deal with user inputs that are not well covered in the database.
Semi-Incremental Recognition of On-Line Handwritten Japanese Text
Cuong-Tuan NGUYEN Bilan ZHU Masaki NAKAGAWA

PAPER-Image Recognition, Computer Vision

Publicized:
2016/07/19
Vol:
E99-D No:10
Page(s):
2619-2628
This paper presents a semi-incremental recognition method for on-line handwritten Japanese text and its evaluation. As text becomes longer, recognition time and waiting time become large if the text is recognized only after it has been written (batch recognition). Incremental methods, in which recognition is triggered by every stroke, have therefore been proposed, but they degrade the recognition rate and incur more CPU time. We propose semi-incremental recognition and employ a local processing strategy that focuses on a recent sequence of strokes, defined as the “scope”, rather than on every new stroke. For the latest scope, we build and update a segmentation and recognition candidate lattice and advance the best-path search incrementally. We utilize the result of the best-path search in the previous scope to exclude unnecessary segmentation candidates. This reduces the number of candidate characters to be recognized and hence the processing time. We also reuse the segmentation and recognition candidate lattice of the previous scope for the latest scope. Moreover, triggering recognition only every several strokes saves CPU time. Experiments on the TUAT-Kondate database show the effectiveness of the proposed semi-incremental recognition method not only in reduced processing time and waiting time, but also in recognition accuracy.
Fast Coding-Mode Selection and CU-Depth Prediction Algorithm Based on Text-Block Recognition for Screen Content Coding
Mengmeng ZHANG Ang ZHU Zhi LIU

LETTER-Image Processing and Video Processing

Publicized:
2016/07/12
Vol:
E99-D No:10
Page(s):
2651-2655
As an important extension of High Efficiency Video Coding (HEVC), screen content coding (SCC) includes various new coding modes, such as Intra Block Copy (IBC), Palette-based coding (Palette), and Adaptive Color Transform (ACT). These new tools have improved screen content encoding performance. This paper proposes a fast algorithm that classifies coding units (CUs) as text CUs or non-text CUs. For text CUs, the Intra mode is skipped in the compression process, whereas for non-text CUs, the IBC mode is skipped. The depth range of the current CU is then predicted according to the depth level of its adjacent left CU. Compared with the reference software HM16.7+SCM5.4, the proposed algorithm reduced encoding time by 23% on average, with an increase of approximately 0.44% in Bjøntegaard delta bit rate and a negligible loss in peak signal-to-noise ratio.
Acoustic Scene Analysis Based on Hierarchical Generative Model of Acoustic Event Sequence
Keisuke IMOTO Suehiro SHIMAUCHI

PAPER-Acoustic event detection

Publicized:
2016/07/19
Vol:
E99-D No:10
Page(s):
2539-2549
We propose a novel method for estimating acoustic scenes such as user activities, e.g., “cooking,” “vacuuming,” “watching TV,” or situations, e.g., “being on the bus,” “being in a park,” “meeting,” utilizing the information of acoustic events. There are some methods for estimating acoustic scenes that associate a combination of acoustic events with an acoustic scene. However, the existing methods cannot adequately express acoustic scenes, e.g., “cooking,” that have more than one subordinate category, e.g., “frying ingredients” or “plating food,” because they directly associate acoustic events with acoustic scenes. In this paper, we propose an acoustic scene estimation method based on a hierarchical probabilistic generative model of an acoustic event sequence taking into account the relation among acoustic scenes, their subordinate categories, and acoustic event sequences. In the proposed model, each acoustic scene is represented as a probability distribution over their unsupervised subordinate categories, called “acoustic sub-topics,” and each acoustic sub-topic is represented as a probability distribution over acoustic events. Acoustic scene estimation experiments with real-life sounds showed that the proposed method could correctly extract subordinate categories of acoustic scenes.
Re-Ranking Approach of Spoken Term Detection Using Conditional Random Fields-Based Triphone Detection
Naoki SAWADA Hiromitsu NISHIZAKI

PAPER-Spoken term detection

Publicized:
2016/07/19
Vol:
E99-D No:10
Page(s):
2518-2527
This study proposes a two-pass spoken term detection (STD) method. The first pass uses phoneme-based dynamic time warping (DTW) for STD, and the second pass recomputes the detection scores produced by the first pass using conditional random fields (CRF)-based triphone detectors. In the second pass, we treat STD as a sequence labeling problem. We use CRF-based triphone detection models based on features generated from multiple types of phoneme-based transcriptions. The models learn recognition error patterns, such as phoneme-to-phoneme confusions, in the CRF framework. Consequently, the models can detect each triphone that makes up a query term, together with a detection probability. In an experimental evaluation on two types of test collections, the CRF-based approach worked well for re-ranking the DTW-based detections. CRF-based re-ranking showed absolute improvements in F-measure of 2.1% and 2.0% on the two test collections.
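The first-pass idea of matching a query's phoneme sequence against a longer phoneme transcription can be sketched with subsequence DTW. This is only an illustrative sketch, not the authors' system: the function name `subsequence_dtw_cost`, the 0/1 phoneme mismatch cost, and the free start and end positions in the transcription are all assumptions made for the example.

```python
import math


def subsequence_dtw_cost(query, transcript):
    """Minimum DTW cost of aligning `query` to any contiguous region of
    `transcript`, using a 0/1 local cost (assumed for illustration)."""
    n, m = len(query), len(transcript)
    if n == 0:
        return 0.0
    if m == 0:
        return math.inf
    # dp[i][j]: best cost of aligning query[:i] to a region ending at
    # transcript[j - 1]. Row 0 is all zeros so a match may start anywhere.
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        dp[0][j] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 0.0 if query[i - 1] == transcript[j - 1] else 1.0
            dp[i][j] = local + min(dp[i - 1][j],      # query advances
                                   dp[i][j - 1],      # transcript advances
                                   dp[i - 1][j - 1])  # both advance
    # The match may end anywhere in the transcript.
    return min(dp[n][1:])
```

A lower cost indicates a more likely occurrence of the query term. A second pass such as the CRF-based re-ranking described above would then rescore these candidate detections.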
HISTORY: An Efficient and Robust Algorithm for Noisy 1-Bit Compressed Sensing
Biao SUN Hui FENG Xinxin XU

PAPER-Fundamentals of Information Systems

Publicized:
2016/07/06
Vol:
E99-D No:10
Page(s):
2566-2573
We consider the problem of sparse signal recovery from 1-bit measurements. Due to the noise present in the acquisition and transmission process, some quantized bits may be flipped to their opposite states. These sign flips may result in severe performance degradation. In this study, a novel algorithm, termed HISTORY, is proposed. It consists of Hamming support detection and coefficients recovery. The HISTORY algorithm has high recovery accuracy and is robust to strong measurement noise. Numerical results are provided to demonstrate the effectiveness and superiority of the proposed algorithm.
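The measurement model behind this problem, where each measurement keeps only the sign of a linear projection and noise may flip some signs, can be sketched as follows. This illustrates the problem setting only, not the HISTORY algorithm. The function name `one_bit_measurements`, the Gaussian measurement vectors, and the independent flip probability `flip_prob` are assumptions made for the example.

```python
import random


def one_bit_measurements(x, num_measurements, flip_prob, seed=0):
    """Simulate noisy 1-bit measurements y_i = sign(<a_i, x>) of signal x.

    Each sign is flipped independently with probability `flip_prob`
    (an assumed noise model). Returns (rows, bits), where rows are the
    measurement vectors a_i and bits are the observed signs in {-1, +1}.
    """
    rng = random.Random(seed)
    rows, bits = [], []
    for _ in range(num_measurements):
        # Assumed: i.i.d. standard Gaussian measurement vector.
        a = [rng.gauss(0.0, 1.0) for _ in x]
        inner = sum(ai * xi for ai, xi in zip(a, x))
        bit = 1 if inner >= 0 else -1
        if rng.random() < flip_prob:
            bit = -bit  # sign flip caused by noise
        rows.append(a)
        bits.append(bit)
    return rows, bits
```

A recovery algorithm receives only `rows` and `bits` and must estimate the support and coefficients of the sparse `x` despite the flipped signs.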
A Linear Time Algorithm for Finding a Spanning Tree with Non-Terminal Set V_NT on Cographs
Shin-ichi NAKAYAMA Shigeru MASUYAMA

PAPER-Fundamentals of Information Systems

Publicized:
2016/07/12
Vol:
E99-D No:10
Page(s):
2574-2584
Given a graph G=(V,E), where V and E are a vertex set and an edge set, respectively, together with a subset V_NT of vertices called a non-terminal set, a spanning tree with non-terminal set V_NT is a connected and acyclic spanning subgraph of G that contains all the vertices of V and in which no vertex of the non-terminal set is a leaf. When each edge has a nonnegative integer weight, the problem of finding a minimum spanning tree with non-terminal set V_NT of G is known to be NP-hard. However, the complexity of finding a spanning tree with non-terminal set V_NT on general graphs where each edge has weight one was unknown. In this paper, we consider this problem and first show that it is NP-hard on general graphs even if each edge has weight one. We also show that if G is a cograph, then a spanning tree with non-terminal set V_NT of G can be found in linear time when each edge has weight one.
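The definition above can be made concrete with a small checker that tests whether a candidate edge set is a spanning tree of G in which no vertex of V_NT is a leaf. This is a sketch of the definition only, not the linear-time algorithm for cographs. The function name and argument layout are assumptions made for the example.

```python
from collections import defaultdict


def is_spanning_tree_with_non_terminals(vertices, graph_edges, tree_edges,
                                        non_terminals):
    """Check that `tree_edges` is a spanning tree of G = (vertices,
    graph_edges) in which no vertex of `non_terminals` is a leaf."""
    vertices = set(vertices)
    if not vertices:
        return False
    allowed = {frozenset(e) for e in graph_edges}
    # A tree on n vertices has exactly n - 1 edges.
    if len(tree_edges) != len(vertices) - 1:
        return False
    adjacency = defaultdict(set)
    for u, v in tree_edges:
        # Every tree edge must be an edge of G (this also rejects loops
        # unless G itself lists them, which a simple graph does not).
        if u == v or frozenset((u, v)) not in allowed:
            return False
        adjacency[u].add(v)
        adjacency[v].add(u)
    # Depth-first search for connectivity; with n - 1 edges, a connected
    # subgraph is necessarily acyclic.
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adjacency[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    if seen != vertices:
        return False
    # Non-terminal vertices must have degree at least 2 in the tree.
    return all(len(adjacency[v]) >= 2 for v in non_terminals)
```

For example, on the 4-cycle a-b-c-d-a, the path a-b-c-d is a valid spanning tree for V_NT = {b, c} but not for V_NT = {a}, because a is a leaf of that path.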
Reliability-Enhanced ECC-Based Memory Architecture Using In-Field Self-Repair
Gian MAYUGA Yuta YAMATO Tomokazu YONEDA Yasuo SATO Michiko INOUE

PAPER-Dependable Computing

Publicized:
2016/06/27
Vol:
E99-D No:10
Page(s):
2591-2599
Embedded memory is used extensively in SoCs and is rapidly growing in size and density. It enables SoCs to offer more features, but at the expense of occupying the largest share of chip area. With the continuous scaling of nanoscale device technology, large-area memory is prone to aging-induced faults and soft errors, which affect reliability. In-field test and repair, as well as ECC, can be used to maintain reliability, and recently these methods have been used together in a combined approach, wherein uncorrectable words are repaired while correctable words are left to the ECC. In this paper, we propose a novel in-field repair strategy that repairs uncorrectable words, and possibly correctable words, for an ECC-based memory architecture. It executes an adaptive reconfiguration method that ensures 'fresh' memory words are always used until the spare words run out. Experimental results demonstrate that our strategy enhances reliability and that its area overhead is small.
LRU-LC: Fast Estimating Cardinality of Flows over Sliding Windows
Jingsong SHAN Jianxin LUO Guiqiang NI Yinjin FU Zhaofeng WU

LETTER-Fundamentals of Information Systems

Publicized:
2016/06/29
Vol:
E99-D No:10
Page(s):
2629-2632
Estimating the cardinality of flows over sliding windows on high-speed links remains challenging under time and space constraints. To solve this problem, we present a novel data structure that maintains a summary of the data and propose a constant-time update algorithm for quickly evicting expired information. Moreover, a further memory-reducing scheme is given at the cost of a very small loss of accuracy.
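To illustrate the kind of estimate involved, the sketch below combines the classic linear counting estimator, n ≈ -m ln(z/m) for m buckets of which z are empty, with a per-bucket timestamp so that expired arrivals stop counting. It is not the LRU-LC data structure: the abstract does not describe that structure, and reading "LC" as linear counting is an assumption. The class name `SlidingLinearCounter` is hypothetical, and this naive version scans all buckets at query time rather than evicting in constant time.

```python
import hashlib
import math


class SlidingLinearCounter:
    """Naive sliding-window distinct-flow estimator (illustrative only)."""

    def __init__(self, num_buckets, window):
        self.m = num_buckets
        self.window = window
        # Time of the most recent arrival hashed to each bucket.
        self.last_seen = [None] * num_buckets

    def _bucket(self, flow_id):
        # Deterministic hash of the flow identifier to a bucket index.
        digest = hashlib.blake2b(flow_id.encode("utf-8"),
                                 digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.m

    def add(self, flow_id, now):
        self.last_seen[self._bucket(flow_id)] = now

    def estimate(self, now):
        """Estimate distinct flows seen in the window (now - window, now]."""
        cutoff = now - self.window
        empty = sum(1 for t in self.last_seen if t is None or t <= cutoff)
        if empty == 0:
            return math.inf  # table saturated; estimator undefined
        return -self.m * math.log(empty / self.m)
```

Choosing `num_buckets` well above the expected number of distinct flows keeps collisions rare and the estimate close to the true count.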
Mining Spatial Temporal Saliency Structure for Action Recognition
Yinan LIU Qingbo WU Linfeng XU Bo WU

LETTER-Pattern Recognition

Publicized:
2016/07/06
Vol:
E99-D No:10
Page(s):
2643-2646
Traditional action recognition approaches use pre-defined rigid areas, such as spatial pyramids or cuboids, to process space-time information. However, most actions happen in an unconstrained manner; that is, the same action can occur at different places in different videos. We therefore need a better video representation to deal with space-time variations. In this paper, we introduce the idea of mining spatial temporal saliency. To better handle the uniqueness of each video, we use a space-time over-segmentation approach, such as supervoxels. We choose three different saliency measures that take into consideration not only appearance cues but also motion cues. Furthermore, we design a category-specific mining process to find the discriminative power in each action category. Experiments on action recognition datasets such as UCF11 and HMDB51 show that the proposed spatial temporal saliency video representation can match or surpass some state-of-the-art alternatives in action recognition.
Robust and Adaptive Object Tracking via Correspondence Clustering
Bo WU Yurui XIE Wang LUO

LETTER-Image Recognition, Computer Vision

Publicized:
2016/06/23
Vol:
E99-D No:10
Page(s):
2664-2667
We propose a new visual tracking method, where the target appearance is represented by combining color distribution and keypoints. Firstly, the object is localized via a keypoint-based tracking and matching strategy, where a new clustering method is presented to remove outliers. Secondly, the tracking confidence is evaluated by the color template. According to the tracking confidence, the local and global keypoints matching can be performed adaptively. Finally, we propose a target appearance update method in which the new appearance can be learned and added to the target model. The proposed tracker is compared with five state-of-the-art tracking methods on a recent benchmark dataset. Both qualitative and quantitative evaluations show that our method has favorable performance.
Automatic Model Order Selection for Convolutive Non-Negative Matrix Factorization
Yinan LI Xiongwei ZHANG Meng SUN Chong JIA Xia ZOU

LETTER-Speech and Hearing

Vol:
E99-A No:10
Page(s):
1867-1870
Finding a parsimonious model that is just sufficient to represent the temporal dependency of time-series signals such as audio or speech is a practical requirement for many signal processing applications. A well-suited method for intuitively and efficiently representing magnitude spectra is to use convolutive non-negative matrix factorization (CNMF) to discover the temporal relationship among nearby frames. However, the model order selection problem in CNMF, i.e., the choice of the number of convolutive bases, has seldom been investigated. In this paper, we propose a novel Bayesian framework that can automatically learn the optimal model order through maximum a posteriori (MAP) estimation. The proposed method yields a parsimonious, low-rank approximation by iteratively removing redundant bases. We conducted intuitive experiments to show that the proposed algorithm is very effective in automatically determining the correct model order.
Simple and Tunable MNM by Figure of Eight Resonator and Its Application to Microwave Isolator
Shota KOMATSU Toshiro KODERA

BRIEF PAPER

Vol:
E99-C No:10
Page(s):
1215-1218
Magnet-less non-reciprocal metamaterials (MNMs) synthesize artificial magnetic gyrotropy using metal ring resonators with inserted unilateral components. Their clear advantage over natural magnetic materials is full compatibility with integrated circuit components, but they still suffer from the power consumption of the active components and the footprint of the ring resonator. A new MNM structure based on a varactor-inserted figure-of-eight resonator is introduced, which halves the number of active components and achieves an even smaller footprint than the original simple ring resonator structure, in addition to providing frequency tunability.
