IEICE TRANSACTIONS on Information

  • Impact Factor: 0.59
  • Eigenfactor: 0.002
  • Article Influence: 0.1
  • Cite Score: 1.4

Volume E90-D No.5  (Publication Date:2007/05/01)

    Regular Section
  • A Labeled Transition Model A-LTS for History-Based Aspect Weaving and Its Expressive Power

    Isao YAGI  Yoshiaki TAKATA  Hiroyuki SEKI  

     
    PAPER-Automata and Formal Language Theory

      Page(s):
    799-807

    This paper proposes an event-based transition system called A-LTS. An A-LTS is a simple system consisting of two agents, a basic program and a monitor. The monitor observes the behavior of the basic program and if the behavior matches some pre-defined pattern, then the monitor interrupts the execution of the basic program and possibly triggers the execution of another specific program. An A-LTS models a common feature found in recent software technologies such as Aspect-Oriented Programming (AOP), history-based access control and active database. We investigate the expressive power of A-LTS and show that it is strictly stronger than finite state machines and strictly weaker than pushdown automata (PDA). This implies that the model checking problem for A-LTS is decidable. It is also shown that the expressive power of A-LTS, linear context-free grammar and deterministic PDA are mutually incomparable. We also discuss the relationship between A-LTS and pointcut/advice in AOP.

  • Transfer Information Enhancement with a 2-D Tactile Stimulator Array for an Acoustic Vision Substitute System

    Hirofumi TAKI  Toru SATO  

     
    PAPER-Rehabilitation Engineering and Assistive Technology

      Page(s):
    808-815

    Existing vision substitute systems have insufficient spatial resolution to provide environmental information. To present detailed spatial information, we propose two stimulation methods to enhance transfer information using a 2-D tactile stimulator array. First, stimulators are divided into several groups. Since each stimulator group is activated alternately, the interval of stimulations can be shortened to less than the two-point discrimination threshold. In the case that stimulators are divided into two and four groups, the number of stimulators increases to twice and four times, respectively, that in the case of the two-point discrimination threshold. Further, a user selects the measurement range and the system presents targets within the range. The user acquires spatial information of the entire measurement area by changing the measurement range. This method can accurately present a range of targets. We examine and confirm these methods experimentally.

  • A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis

    Tomoki TODA  Keiichi TOKUDA  

     
    PAPER-Speech and Hearing

      Page(s):
    816-824

    This paper describes a novel parameter generation algorithm for an HMM-based speech synthesis technique. The conventional algorithm generates a parameter trajectory of static features that maximizes the likelihood of a given HMM for the parameter sequence consisting of the static and dynamic features under an explicit constraint between those two features. The generated trajectory is often excessively smoothed due to the statistical processing. Using the over-smoothed speech parameters usually causes muffled sounds. In order to alleviate the over-smoothing effect, we propose a generation algorithm considering not only the HMM likelihood maximized in the conventional algorithm but also a likelihood for a global variance (GV) of the generated trajectory. The latter likelihood works as a penalty for the over-smoothing, i.e., a reduction of the GV of the generated trajectory. The result of a perceptual evaluation demonstrates that the proposed algorithm causes considerably large improvements in the naturalness of synthetic speech.
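The over-smoothing effect described above can be illustrated with a minimal sketch. It assumes that the global variance of a one-dimensional trajectory is simply its variance over time, and it uses a moving average as a stand-in for statistical smoothing; neither function is taken from the paper, and both names are hypothetical.

```python
def global_variance(traj):
    """Variance of a 1-D parameter trajectory over all frames (assumed GV definition)."""
    n = len(traj)
    mean = sum(traj) / n
    return sum((c - mean) ** 2 for c in traj) / n


def moving_average(traj, width=3):
    """Stand-in for over-smoothing: centered moving average, truncated at the edges."""
    half = width // 2
    out = []
    for t in range(len(traj)):
        window = traj[max(0, t - half): t + half + 1]
        out.append(sum(window) / len(window))
    return out


if __name__ == "__main__":
    original = [0.0, 2.0, 0.0, 2.0, 0.0, 2.0]
    smoothed = moving_average(original)
    # Smoothing flattens the trajectory, so its GV drops below the original's.
    print(global_variance(original), global_variance(smoothed))
```

A GV likelihood term penalizes exactly this kind of drop, which is how it counteracts the muffled quality of over-smoothed parameters.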

  • A Hidden Semi-Markov Model-Based Speech Synthesis System

    Heiga ZEN  Keiichi TOKUDA  Takashi MASUKO  Takao KOBAYASHI  Tadashi KITAMURA  

     
    PAPER-Speech and Hearing

      Page(s):
    825-834

    A statistical speech synthesis system based on the hidden Markov model (HMM) was recently proposed. In this system, spectrum, excitation, and duration of speech are modeled simultaneously by context-dependent HMMs, and speech parameter vector sequences are generated from the HMMs themselves. This system defines a speech synthesis problem in a generative model framework and solves it based on the maximum likelihood (ML) criterion. However, there is an inconsistency: although state duration probability density functions (PDFs) are explicitly used in the synthesis part of the system, they have not been incorporated into its training part. This inconsistency can make the synthesized speech sound less natural. In this paper, we propose a statistical speech synthesis system based on a hidden semi-Markov model (HSMM), which can be viewed as an HMM with explicit state duration PDFs. The use of HSMMs can solve the above inconsistency because we can incorporate the state duration PDFs explicitly into both the synthesis and the training parts of the system. Subjective listening test results show that use of HSMMs improves the reported naturalness of synthesized speech.

  • Word Error Rate Minimization Using an Integrated Confidence Measure

    Akio KOBAYASHI  Kazuo ONOE  Shinichi HOMMA  Shoei SATO  Toru IMAI  

     
    PAPER-Speech and Hearing

      Page(s):
    835-843

    This paper describes a new criterion for speech recognition using an integrated confidence measure to minimize the word error rate (WER). The conventional criteria for WER minimization obtain the expected WER of a sentence hypothesis merely by comparing it with other hypotheses in an n-best list. The proposed criterion estimates the expected WER by using an integrated confidence measure with word posterior probabilities for a given acoustic input. The integrated confidence measure, which is implemented as a classifier based on maximum entropy (ME) modeling or support vector machines (SVMs), is used to acquire probabilities reflecting whether the word hypotheses are correct. The classifier combines a variety of confidence measures and can deal with a temporal sequence of them to attain a more reliable confidence. Our proposed criterion for minimizing WER achieved a WER of 9.8% and a 3.9% reduction relative to conventional n-best rescoring methods in transcribing Japanese broadcast news in various environments, such as noisy field and spontaneous speech conditions.

  • Compression of Video Data Using Parametric Line and Natural Cubic Spline Block Level Approximation

    Murtaza Ali KHAN  Yoshio OHNO  

     
    PAPER-Image Processing and Video Processing

      Page(s):
    844-850

    This paper presents a method for lossy compression of digital video data by parametric line and natural cubic spline approximation. The method estimates the variation of pixel values in the temporal dimension by taking groups of pixels together as keyblocks and interpolating them in Euclidean space. A break-and-fit criterion is used to minimize the number of keyblocks required for encoding and decoding the approximated data. Each group of pixels at a fixed spatial location is encoded and decoded independently. The proposed method can easily be incorporated into existing video data compression techniques based on the Discrete Cosine Transform or the Wavelet Transform.
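The break-and-fit idea can be sketched for the parametric line case as follows. This is an illustration under stated assumptions, not the authors' implementation: each frame contributes one block (a tuple of pixel values at a fixed spatial location), the error is the Euclidean distance between a block and its interpolated approximation, and the segment is broken at the worst-fitting frame whenever that error exceeds a tolerance `tol`. The function names and the tolerance parameter are hypothetical.

```python
import math


def lerp_block(a, b, t):
    """Point at parameter t on the parametric line from block a to block b."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))


def _split(frames, lo, hi, tol, keys):
    """Break segment [lo, hi] at its worst-fitting frame until every frame fits within tol."""
    worst_i, worst_err = -1, tol
    for i in range(lo + 1, hi):
        approx = lerp_block(frames[lo], frames[hi], (i - lo) / (hi - lo))
        err = math.dist(frames[i], approx)
        if err > worst_err:
            worst_i, worst_err = i, err
    if worst_i >= 0:
        _split(frames, lo, worst_i, tol, keys)
        keys.append(worst_i)
        _split(frames, worst_i, hi, tol, keys)


def select_keyblocks(frames, tol):
    """Return the sorted frame indices kept as keyblocks."""
    if len(frames) < 2:
        return list(range(len(frames)))
    keys = [0]
    _split(frames, 0, len(frames) - 1, tol, keys)
    keys.append(len(frames) - 1)
    return keys


def decode(frames, keys):
    """Rebuild every frame by interpolating between consecutive keyblocks (only keyblocks are read)."""
    out = [frames[keys[0]]]
    for lo, hi in zip(keys, keys[1:]):
        for i in range(lo + 1, hi + 1):
            out.append(lerp_block(frames[lo], frames[hi], (i - lo) / (hi - lo)))
    return out
```

Only the keyblocks and their frame indices would need to be stored; a larger `tol` keeps fewer keyblocks at the cost of a larger reconstruction error.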

  • Uncalibrated Factorization Using a Variable Symmetric Affine Camera

    Kenichi KANATANI  Yasuyuki SUGAYA  Hanno ACKERMANN  

     
    PAPER-Image Recognition, Computer Vision

      Page(s):
    851-858

    In order to reconstruct 3-D Euclidean shape by the Tomasi-Kanade factorization, one needs to specify an affine camera model such as orthographic, weak perspective, and paraperspective. We present a new method that does not require any such specific models. We show that a minimal requirement for an affine camera to mimic perspective projection leads to a unique camera model, called symmetric affine camera, which has two free functions. We determine their values from input images by linear computation and demonstrate by experiments that an appropriate camera model is automatically selected.

  • Web Page Filtering for Domain Ontology with the Context of Concept

    Bo-Yeong KANG  Hong-Gee KIM  

     
    LETTER-Artificial Intelligence and Cognitive Science

      Page(s):
    859-862

    Despite the importance of domain-specific resource construction for domain ontology development, few studies have sought to develop a method for automatically identifying domain ontology-relevant web pages. To address this situation, here we propose a web page filtering scheme for domain ontology that identifies domain-relevant web pages from the web based on the context of concepts. Testing of the proposed filtering scheme with a business domain ontology on YahooPicks web pages yielded promising filtering results that were superior to those obtained using the baseline system.

  • Identification of ARMA Speech Models Using an Effective Representation of Voice Source

    M. Shahidur RAHMAN  Tetsuya SHIMAMURA  

     
    LETTER-Speech and Hearing

      Page(s):
    863-867

    A two-stage least squares identification method is proposed for estimating ARMA (autoregressive moving average) coefficients from speech signals. A pulse-train-like input sequence is often employed to account for the source effects in estimating vocal tract parameters of voiced speech. Due to glottal and radiation effects, however, the pulse train does not represent the effective voice source. The authors have already proposed a simple but effective model of the voice source for estimating AR (autoregressive) coefficients. This letter extends that approach to ARMA analysis to cover a wider variety of speech sounds, including nasal vowels and consonants. Analysis results on both synthetic and natural nasal speech are presented to demonstrate the analysis ability of the method.

  • Response Time Reduction of Speech Recognizers Using Single Gaussians

    Sangbae JEONG  Hoirin KIM  Minsoo HAHN  

     
    LETTER-Speech and Hearing

      Page(s):
    868-871

    In this paper, we propose an algorithm that reduces the response time of HMM-based speech recognizers by using single Gaussians to select promising HMM states. During recognition, HMM state likelihoods are first evaluated with the corresponding single Gaussians; likelihoods from the original full Gaussians are then computed, and substituted, only for the HMM states with relatively large likelihoods. This significantly reduces the pattern-matching time without any noticeable loss in recognition rate. In addition, we cluster the single Gaussians into groups by measuring the distance between Gaussians, which further reduces the extra memory required. In our 10,000-word Korean POI (point-of-interest) recognition task, the proposed algorithm shows a 35.57% reduction in response time compared with the baseline system, at the cost of a 10% degradation of the WER.
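The two-pass evaluation can be sketched as follows. The sketch makes several assumptions that the abstract does not state: the "original full Gaussians" are treated as diagonal-covariance Gaussian mixtures, "relatively large likelihoods" is interpreted as the top `keep` states, and all function and parameter names are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_gmm(x, mixture):
    """Log density of a mixture given as (weight, mean, var) triples, via log-sum-exp."""
    terms = [math.log(w) + log_gauss(x, m, v) for w, m, v in mixture]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def score_states(x, singles, mixtures, keep):
    """First pass: cheap single-Gaussian scores for all states.
    Second pass: full mixture scores replace the `keep` most promising ones."""
    scores = [log_gauss(x, mean, var) for mean, var in singles]
    promising = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:keep]
    for s in promising:
        scores[s] = log_gmm(x, mixtures[s])
    return scores, sorted(promising)
```

The saving comes from evaluating the full mixtures for only `keep` states per frame; the remaining states retain their approximate single-Gaussian scores.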

  • Frame-Level ρ-Domain R-D Optimization in H.264

    Yutao DONG  Xiangzhong FANG  Jing YANG  

     
    LETTER-Image Processing and Video Processing

      Page(s):
    872-876

    The frame-level R-D optimization in H.264 is very important in video storage scenarios. Among all of the sub-optimal algorithms, a greedy iteration algorithm (GIA) can best lower the computational complexity of frame-level R-D optimization. In order to further lower the computational complexity, a ρ-domain frame-level R-D optimization algorithm is proposed in this letter. Different from GIA, every frame's rate and distortion can be estimated accurately without actual encoding in our proposed algorithm. Simulation results show that our proposed algorithm can lower the computational complexity greatly with negligible variation in peak signal-to-noise ratio (PSNR) compared with GIA.

  • A More Robust Subsampling-Based Image Watermarking

    Chih-Cheng LO  Pao-Tung WANG  Jeng-Shyang PAN  Bin-Yih LIAO  

     
    LETTER-Image Processing and Video Processing

      Page(s):
    877-878

    In this letter, we propose a novel subsampling-based scheme that embeds an image watermark sequentially in order to reduce the risk of common permutation attacks. The watermarked image remains perceptually acceptable, and experimental results show the effectiveness and robustness of the scheme.