The search functionality is under construction.

Keyword Search Result

[Keyword] SPE(2504hit)

1221-1240hit(2504hit)

  • Fast-Delay and Low-Power Level Shifter for Low-Voltage Applications

    O-Sam KWON  Kyeong-Sik MIN  

     
    LETTER-Electronic Circuits

      Vol:
    E90-C No:7
      Page(s):
    1540-1543

    A new level shifter is proposed in this paper that mitigates the contention problem between its pull-up and pull-down switches without suffering a delay penalty. A comparison with two conventional shifters (CLS-1 and CLS-2) indicates that the delay times of CLS-1 and CLS-2 are 308% and 26% longer, respectively, than that of the proposed shifter when VDDL/VDDH=0.3 and the fan-out is 2. In addition, a comparison of power-delay products shows that CLS-2 consumes 28.5% more energy than the proposed shifter. In layout area, the proposed shifter needs only 15% more than CLS-2. Given its propagation delay, power-delay product, and area overhead, the proposed shifter is considered well suited to low-voltage applications in future Very Deep Sub-Micron (VDSM) technologies.

  • Effective Energy Feature Compensation Using Modified Log-energy Dynamic Range Normalization for Robust Speech Recognition

    Yoonjae LEE  Hanseok KO  

     
    LETTER-Fundamental Theories for Communications

      Vol:
    E90-B No:6
      Page(s):
    1508-1511

    This paper proposes effective energy feature normalization methods for robust speech recognition in noisy environments. We first develop an energy subtraction method and a modified method for Log-energy Dynamic Range Normalization (ERN) using an inverse function. We then present a hybrid method combining the energy subtraction and the modified ERN. Using the Aurora2.0 database for representative evaluations, a significant performance improvement over the ERN method is demonstrated.

  • Single Channel Speech Enhancement Based on Perceptual Frequency-Weighting

    Seiji HAYASHI  Masahiro SUGUIMOTO  

     
    LETTER-Speech and Hearing

      Vol:
    E90-D No:6
      Page(s):
    998-1001

    The present paper describes a quality enhancement of speech corrupted by additive background noise in a single channel system. The proposed approach is based on the introduction of perceptual criteria using a frequency-weighting filter in a subtractive-type enhancement process. This newly developed algorithm allows for an automatic adaptation in the time and frequency of the enhancement system and finds a suitable noise estimate according to the frequency of the corrupted speech. Experimental results show that the proposed approach can efficiently remove additive noise related to various types of noise corruption.

  • Predictive Trellis-Coded Quantization of the Cepstral Coefficients for the Distributed Speech Recognition

    Sangwon KANG  Joonseok LEE  

     
    LETTER-Multimedia Systems for Communications

      Vol:
    E90-B No:6
      Page(s):
    1570-1572

    In this paper, we propose a predictive block-constrained trellis-coded quantization (BC-TCQ) to quantize cepstral coefficients for distributed speech recognition. A first-order auto-regressive (AR) predictor is used to predict the cepstral coefficients. To quantize the prediction error signal effectively, we use the BC-TCQ. The quantization is compared to the split vector quantizers used in the ETSI standard, and is shown to lower cepstral distance and bit rates.

  • Mel-Wiener Filter for Mel-LPC Based Speech Recognition

    Md. Babul ISLAM  Kazumasa YAMAMOTO  Hiroshi MATSUMOTO  

     
    PAPER-Speech and Hearing

      Vol:
    E90-D No:6
      Page(s):
    935-942

    This paper proposes a Mel-Wiener filter to enhance Mel-LPC spectra in the presence of additive noise. The transfer function of the proposed filter is defined by using a first-order all-pass filter instead of a unit delay. The filter coefficients are estimated by minimizing the sum of the squared error on the linear frequency scale without applying the bilinear transformation, and the estimation is efficiently implemented in the autocorrelation domain. The proposed filter does not require any time-frequency conversion, which saves a large amount of computational load. The performance of the proposed system is comparable to that of ETSI AFE. The optimum filter order is found to be 3, and thus filtering is computationally inexpensive. The computational cost of the proposed system, excluding VAD, is 53% of that of ETSI AFE.

  • Acoustic Field Analysis of Surface Acoustic Wave Dispersive Delay Lines Using Inclined Chirp IDT

    Koichiro MISU  Koji IBATA  Shusou WADAKA  Takao CHIBA  Minoru K. KUROSAWA  

     
    PAPER-Ultrasonics

      Vol:
    E90-A No:5
      Page(s):
    1014-1020

    Acoustic field analysis results of surface acoustic wave dispersive delay lines using inclined chirp IDTs on a Y-Z LiNbO3 substrate are described. The calculated results are compared with optical measurements. The angular spectrum of the plane wave method is applied to the calculation of the acoustic fields, considering the anisotropy of the SAW velocity by using a polynomial approximation. The acoustic field propagating along the Z-axis of the substrate, which is the main beam excited by the inclined chirp IDT, shows an asymmetric distribution between the +Z and -Z directions. Furthermore, a SAW beam propagating in a slanted direction at an angle of +18° from the Z-axis toward the X-axis is observed. This slanted SAW beam is shown to be the first side lobe excited by the inclined chirp IDT. The acoustic field shows an asymmetric distribution along the X-axis because of the asymmetric structure of the inclined chirp IDT. Finally, the acoustic field of a two-IDT connected structure, which consists of two identical IDTs electrically connected in series, is presented. The acoustic field of the two-IDT connected structure is calculated by superposing the calculated acoustic field excited by one IDT onto the same result shifted along the X-axis. Two SAW beams excited by the IDTs are observed. The distributions of the SAW beams are not parallel. The calculated results show good agreement with the optical measurement results.

  • A Hidden Semi-Markov Model-Based Speech Synthesis System

    Heiga ZEN  Keiichi TOKUDA  Takashi MASUKO  Takao KOBAYASHI  Tadashi KITAMURA  

     
    PAPER-Speech and Hearing

      Vol:
    E90-D No:5
      Page(s):
    825-834

    A statistical speech synthesis system based on the hidden Markov model (HMM) was recently proposed. In this system, spectrum, excitation, and duration of speech are modeled simultaneously by context-dependent HMMs, and speech parameter vector sequences are generated from the HMMs themselves. This system defines a speech synthesis problem in a generative model framework and solves it based on the maximum likelihood (ML) criterion. However, there is an inconsistency: although state duration probability density functions (PDFs) are explicitly used in the synthesis part of the system, they have not been incorporated into its training part. This inconsistency can make the synthesized speech sound less natural. In this paper, we propose a statistical speech synthesis system based on a hidden semi-Markov model (HSMM), which can be viewed as an HMM with explicit state duration PDFs. The use of HSMMs can solve the above inconsistency because we can incorporate the state duration PDFs explicitly into both the synthesis and the training parts of the system. Subjective listening test results show that use of HSMMs improves the reported naturalness of synthesized speech.

  • A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis

    Tomoki TODA  Keiichi TOKUDA  

     
    PAPER-Speech and Hearing

      Vol:
    E90-D No:5
      Page(s):
    816-824

    This paper describes a novel parameter generation algorithm for an HMM-based speech synthesis technique. The conventional algorithm generates a parameter trajectory of static features that maximizes the likelihood of a given HMM for the parameter sequence consisting of the static and dynamic features under an explicit constraint between those two features. The generated trajectory is often excessively smoothed due to the statistical processing. Using the over-smoothed speech parameters usually causes muffled sounds. In order to alleviate the over-smoothing effect, we propose a generation algorithm considering not only the HMM likelihood maximized in the conventional algorithm but also a likelihood for a global variance (GV) of the generated trajectory. The latter likelihood works as a penalty for the over-smoothing, i.e., a reduction of the GV of the generated trajectory. The result of a perceptual evaluation demonstrates that the proposed algorithm yields considerable improvements in the naturalness of synthetic speech.

  • A New Equivalence Relation of Logic Functions and Its Application in the Design of AND-OR-EXOR Networks

    Debatosh DEBNATH  Tsutomu SASAO  

     
    PAPER

      Vol:
    E90-A No:5
      Page(s):
    932-940

    This paper presents a design method for AND-OR-EXOR three-level networks, where a single two-input exclusive-OR (EXOR) gate is used. The network realizes an EXOR of two sum-of-products expressions (EX-SOPs). The problem is to minimize the total number of products in the two sum-of-products expressions (SOPs). We introduce the notion of µ-equivalence of logic functions to develop exact minimization algorithms for EX-SOPs with up to five variables. We minimized all the NP-representative functions for up to five variables and showed that five-variable functions require 9 or fewer products in minimum EX-SOPs. For n-variable functions, minimum EX-SOPs require at most 9·2^(n-5) (n ≥ 6) products. This upper bound is smaller than 2^(n-1), which is the upper bound for SOPs. We also found that, for five-variable functions, minimum EX-SOPs require on average about 40% fewer literals than minimum SOPs.

  • Response Time Reduction of Speech Recognizers Using Single Gaussians

    Sangbae JEONG  Hoirin KIM  Minsoo HAHN  

     
    LETTER-Speech and Hearing

      Vol:
    E90-D No:5
      Page(s):
    868-871

    In this paper, we propose an algorithm that reduces the response time of HMM-based speech recognizers. To reduce the response time, promising HMM states are selected by single Gaussians. During recognition, HMM state likelihoods are first evaluated by the corresponding single Gaussians; likelihoods from the original full Gaussians are then computed and substituted only for the HMM states having relatively large likelihoods. By doing so, we can reduce the pattern-matching time for speech recognition significantly without any noticeable loss of the recognition rate. In addition, we cluster the single Gaussians into groups by measuring the distance between Gaussians, which further reduces the extra memory required. In our 10,000-word Korean POI (point-of-interest) recognition task, the proposed algorithm shows a 35.57% reduction of the response time in comparison with that of the baseline system at the cost of 10% degradation of the WER.

  • A Labeled Transition Model A-LTS for History-Based Aspect Weaving and Its Expressive Power

    Isao YAGI  Yoshiaki TAKATA  Hiroyuki SEKI  

     
    PAPER-Automata and Formal Language Theory

      Vol:
    E90-D No:5
      Page(s):
    799-807

    This paper proposes an event-based transition system called A-LTS. An A-LTS is a simple system consisting of two agents, a basic program and a monitor. The monitor observes the behavior of the basic program and, if the behavior matches some pre-defined pattern, interrupts the execution of the basic program and possibly triggers the execution of another specific program. An A-LTS models a common feature found in recent software technologies such as Aspect-Oriented Programming (AOP), history-based access control and active databases. We investigate the expressive power of A-LTS and show that it is strictly stronger than finite state machines and strictly weaker than pushdown automata (PDA). This implies that the model checking problem for A-LTS is decidable. It is also shown that the expressive powers of A-LTS, linear context-free grammars and deterministic PDA are mutually incomparable. We also discuss the relationship between A-LTS and pointcut/advice in AOP.

  • Assessment of On-Line Model Quality and Threshold Estimation in Speaker Verification

    Javier R. SAETA  Javier HERNANDO  

     
    PAPER-Speech and Hearing

      Vol:
    E90-D No:4
      Page(s):
    759-765

    The selection of the most representative utterances from a speaker is essential for the correct performance of automatic enrollment in speaker verification. Model quality measures and threshold estimation methods mainly deal with the scarcity of data and the difficulty of obtaining data from impostors in real applications. Conventional methods estimate the quality of the training utterances once the model is created. In such a case, it is not possible to ask the user for more utterances during the training session if necessary; a new training session must be started. This is especially impractical in applications where only one or two enrollment sessions are allowed. In this paper, a new on-line quality method based on a male and a female Universal Background Model (UBM) is introduced. The two models act as a reference for new utterances: they show whether the utterances belong to the same speaker and at the same time provide a measure of their quality. On the other hand, the estimation of the verification threshold is also strongly influenced by the previous selection of the speaker's utterances. In this context, potential outliers, i.e., those client scores which are distant from the mean, could lead to wrong estimates of the client mean and variance. To alleviate this problem, some efficient threshold estimation methods based on removing or weighting scores are proposed here. Before estimating the threshold, the client scores catalogued as outliers are removed, pruned or weighted, improving subsequent estimations. Text-dependent experiments have been carried out using a multi-session telephone database in Spanish. The database was recorded by the authors and contains 184 speakers.

  • 18-GHz Clock Distribution Using a Coupled VCO Array

    Takayuki SHIBASAKI  Hirotaka TAMURA  Kouichi KANDA  Hisakatsu YAMAGUCHI  Junji OGAWA  Tadahiro KURODA  

     
    PAPER-Analog and Communications

      Vol:
    E90-C No:4
      Page(s):
    811-822

    This paper describes an 18-GHz coupled VCO array for low-jitter and low-phase-deviation clock distribution. To reduce the skew, jitter and power consumption associated with clock distribution, the clock is generated by a one-dimensional VCO array in which the oscillating nodes of adjacent VCOs are directly connected with wires. The effects of the wire length and the number of unit VCOs in the array are discussed. Both a 4-unit and a 2-unit VCO array for delivering a clock signal to a 16:1 multiplexor were designed and fabricated in a 90-nm CMOS process. The frequency range of the 4-unit VCO array was 16 GHz to 18.5 GHz while each unit VCO consumed 2 mA.

  • Asymmetric Slope Dual Mode Differential Logic Circuit for Compatibility of Low-Power and High-Speed Operations

    Masao MORIMOTO  Makoto NAGATA  Kazuo TAKI  

     
    PAPER-Digital

      Vol:
    E90-C No:4
      Page(s):
    675-682

    Asymmetric Slope Dual Mode Differential Logic (ASDMDL) embodies high-speed dynamic and low-power static operations in a single design. Two-phase dual-rail logic signaling is used in a high-speed operation, where a logical evaluation is preceded by pre-charge, and it asserts one of the rails with an asymmetrically shortened rise transition to express a binary result. On the other hand, single-phase differential logic signaling eliminates pre-charge and leads to a low-power static operation. The operation mode is defined by the logic signaling styles, and no control signal is needed in the logic cell. The design of mixed CMOS and ASDMDL logic circuits can be automated with general logic synthesis and place-and-route techniques, since the physical ASDMDL cell is prepared in such a way to comply with a CMOS standard-cell design flow. A mixed ASDMDL/CMOS micro-processor in a 0.18-µm CMOS technology demonstrated 232 MHz operation, corresponding to 14% speed improvement over a full CMOS implementation. This was achieved by substituting ASDMDL cells for only 4% of the CMOS logic cells in data paths. The low-speed operation of ASDMDL at 100 MHz was nearly equivalent to that of CMOS. However, power consumption was reduced by 3% due to the use of ASDMDL complex logic cells. Area overhead was less than 4%.

  • Distributed Dynamic Spectrum Management for Digital Subscriber Lines

    Yu-Sun LIU  Zeng-Jey SU  

     
    PAPER-Transmission Systems and Transmission Equipment for Communications

      Vol:
    E90-B No:3
      Page(s):
    491-498

    This paper investigates the dynamic spectrum management problem for digital subscriber lines. Two new distributed dynamic spectrum management algorithms, which improve upon the existing iterative water-filling algorithm, are proposed. Unlike the iterative water-filling algorithm, in which crosstalk interference is reduced by using adaptive power backoff, the new algorithms employ full power and mitigate crosstalk interference by shifting one user's spectrum away from the other's. Simulation results show that the new algorithms achieve significant performance gains over the iterative water-filling algorithm in mixed central office/remote terminal (CO/RT) deployment asymmetric digital subscriber line (ADSL) and upstream very-high bit-rate digital subscriber line (VDSL).

  • An Embedding Scheme for Binary and Grayscale Watermarks by Spectrum Spreading and Its Performance Analysis

    Ming-Chiang CHENG  Kuen-Tsair LAY  

     
    PAPER-Image

      Vol:
    E90-A No:3
      Page(s):
    670-681

    Digital watermarking is a technique that aims at hiding a message signal in a multimedia signal for copyright claim, authentication, device control, or broadcast monitoring, etc. In this paper, we focus on embedding watermarks into still images, where the watermarks themselves can be binary sequences or grayscale images. We propose to scramble the watermark bits with pseudo-noise (PN) or orthogonal codes before they are embedded into an image. We also try to incorporate error correction coding (ECC) into the watermarking scheme, anticipating reduction of the watermark bit error rate (WBER). Due to the similarity between the PN/orthogonal-coded watermarking and the spread spectrum communication, it is natural that, following similar derivations regarding data BER in digital communications, we derive certain explicit quantitative relationships regarding the tradeoff between the WBER, the watermark capacity (i.e. the number of watermark bits) and the distortion suffered by the original image, which is measured in terms of the embedded image's signal-to-noise ratio (abbreviated as ISNR). These quantitative relationships are compactly summarized into a so-called tradeoff triangle, which constitutes the major contribution of this paper. For the embedding of grayscale watermarks, an unequal error protection (UEP) scheme is proposed to provide different degrees of robustness for watermark bits of different degrees of significance. In this UEP scheme, optimal strength factors for embedding different watermark bits are sought so that the mean squared error suffered by the extracted watermark, which is by itself a grayscale image, is minimized while a specified ISNR is maintained.

  • Superconductivity for Mass Spectroscopy

    Masataka OHKUBO  

     
    INVITED PAPER

      Vol:
    E90-C No:3
      Page(s):
    550-555

    Time-of-Flight Mass Spectroscopy (TOF-MS) with superconducting detectors has two advantages over MS with conventional ion detectors. First, it covers a very wide range of molecular weights, exceeding 1,000,000. Second, the kinetic energies of accelerated molecules can be measured at impact events one by one. These unique features enable an ultimate detection efficiency of 100% for intact ions and a fragmentation analysis that is critical for top-down proteomics. Superconducting MS is expected to play a role in, for example, the detection of antigen-antibody complexes, which are important for medical diagnosis. This paper describes how superconductivity contributes to MS.

  • Novel Square Photonic Crystal Fibers with Ultra-Flattened Chromatic Dispersion and Low Confinement Losses

    Feroza BEGUM  Yoshinori NAMIHIRA  S.M. Abdur RAZZAK  Nianyu ZOU  

     
    PAPER-Optoelectronics

      Vol:
    E90-C No:3
      Page(s):
    607-612

    This study proposes a novel structure of index-guiding square photonic crystal fibers (SPCF) having simultaneously ultra-flattened chromatic dispersion characteristics and low confinement losses in a wide wavelength range. The finite difference method (FDM) with anisotropic perfectly matched layers (PMLs) is used to analyze the various properties of square PCF. The findings reveal that it is possible to design five-ring PCFs with a flattened negative chromatic dispersion of 0-1.5 ps/(nm.km) in a wavelength range of 1.27 µm to 1.7 µm and a flattened chromatic dispersion of 0 ± 1.15 ps/(nm.km) in a wavelength range of 1.25 µm to 1.61 µm. The fibers also exhibit confinement losses of less than 10^-9 dB/m and 10^-10 dB/m in the wavelength range of 1.25 µm to 1.7 µm.

  • A MFCC-Based CELP Speech Coder for Server-Based Speech Recognition in Network Environments

    Jae Sam YOON  Gil Ho LEE  Hong Kook KIM  

     
    PAPER-Speech/Audio Processing

      Vol:
    E90-A No:3
      Page(s):
    626-632

    Existing standard speech coders can provide high quality speech communication. However, they tend to degrade the performance of automatic speech recognition (ASR) systems that use the reconstructed speech. The main cause of the degradation is that the linear predictive coefficients (LPCs), which are typical spectral envelope parameters in speech coding, are optimized for speech quality rather than for speech recognition performance. In this paper, we propose a speech coder using mel-frequency cepstral coefficients (MFCCs) instead of LPCs to improve the performance of a server-based speech recognition system in network environments. To develop the proposed speech coder with a low bit rate, we first explore the interframe correlation of MFCCs, which leads to predictive quantization of the MFCCs. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel errors. As a result, we propose an 8.7 kbps MFCC-based CELP coder. It is shown that the proposed speech coder has speech quality comparable to that of 8 kbps G.729, and that the ASR system using the proposed speech coder yields a relative word error rate reduction of 6.8% compared to the ASR system using G.729 on a large vocabulary task (AURORA4).

  • X-Ray Detection Using Superconducting Tunnel Junction Shaped Normal-Distribution-Function

    Tohru TAINO  Tomohiro NISHIHARA  Koichi HOSHINO  Hiroaki MYOREN  Hiromi SATO  Hirohiko M. SHIMIZU  Susumu TAKADA  

     
    PAPER

      Vol:
    E90-C No:3
      Page(s):
    566-569

    A normal-distribution-function-shaped superconducting tunnel junction (NDF-STJ) consisting of Nb/Al-AlOx/Al/Nb has been fabricated as an X-ray detector. Current-voltage characteristics were measured at 0.4 K using three kinds of STJs, which have dispersion parameters σ of 0.25, 0.45 and 0.75. These STJs showed a very low subgap leakage current of about 5 nA. By irradiating with 5.9 keV X-rays, we obtained the spectra of these NDF-STJs. They showed good energy resolution with small magnetic fields of below 3 mT, which is about one-tenth of those for conventional-shaped STJs.
