The search functionality is under construction.

Keyword Search Result

[Keyword] SPE(2504hit)

601-620hit(2504hit)

  • A Robust Speech Communication into Smart Info-Media System

    Yoshikazu MIYANAGA  Wataru TAKAHASHI  Shingo YOSHIZAWA  

     
    INVITED PAPER

    Vol: E96-A  No: 11  Page(s): 2074-2080

    This paper introduces the noise-robust speech communication techniques we have developed and describes their implementation in a smart info-media system, i.e., a small robot. The speech communication system consists of automatic speech detection, recognition, and rejection. By using automatic speech detection and recognition, an observed speech waveform can be recognized without a manual trigger. In addition, using speech rejection, the system accepts only registered speech phrases and rejects any other words. In other words, although an arbitrary input speech waveform can be fed into this system and recognized, the system responds only to the registered speech phrases. The developed noise-robust speech processing can reduce various noises in many environments. In addition to the design of noise-robust speech recognition, the LSI design of this system is introduced. By designing a speech recognition application-specific IC (ASIC), we can simultaneously realize low power consumption and real-time processing. This paper describes the LSI architecture of this system and its performance in some field experiments. The system currently achieves a speech recognition accuracy of 85-99% under 0-20 dB SNR and echo environments.

  • Fourier-Domain Modal Delay Measurements for Multimode Fibers Optimized for the 850-nm Band in a Local Area Network

    Chan-Young KIM  Tae-Jung AHN  

     
    PAPER-Optical Fiber for Communications

    Vol: E96-B  No: 11  Page(s): 2840-2844

    We present transmission- and reflection-type measurement methods for the differential mode delay (DMD) of a multimode optical fiber (MMF) optimized for high-speed local area networks (LANs) in the 850-nm band. In contrast to a previously reported transmission-type measurement method for the 1550-nm wavelength band, we demonstrate here high-resolution DMD measurement methods for MMFs in the 850-nm band. As the method is based on a Fourier-domain intermodal interference technique, the measurement sensitivity is approximately 60 dB, and it requires a fiber only a few meters in length. The shorter wavelength also allows a threefold improvement in the measurement resolution. The reflection-type measurement technique is more practical than the transmission-type measurement technique for the field testing of short MMFs already installed in networks. We believe that this method will be a practical tool not only for field testing of short-length MMFs already installed in networks but also for the development of new plastic optical fibers (POFs).

  • Content Aware Image Resizing with Constraint of Object Aspect Ratio Preservation

    Kazu MISHIBA  Masaaki IKEHARA  Takeshi YOSHITOME  

     
    PAPER-Image Processing and Video Processing

    Vol: E96-D  No: 11  Page(s): 2427-2436

    In this paper, we propose a novel content-aware image resizing method based on grid transformation. Our method focuses on not only keeping important regions unchanged but also keeping the aspect ratio of the main object in an image unchanged. The dual conditions can avoid distortion which often occurs when only using the former condition. Our method first calculates image importance. Next, we extract the main objects on an image by using image importance. Finally, we calculate the optimal grid transformation which suppresses changes in size of important regions and in the aspect ratios of the main objects. Our method uses lower and upper thresholds for transformation to suppress distortion due to extreme shrinking and enlargement. To achieve better resizing results, we introduce a boundary discarding process. This process can assign wider regions to important regions, reducing distortions on important regions. Experimental results demonstrate that our proposed method resizes images with less distortion than other resizing methods.

  • A Travel-Efficient Driving Assistance Scheme in VANETs by Providing Recommended Speed

    Chunxiao LI  Weijia CHEN  Dawei HE  Xuelong HU  Shigeru SHIMAMOTO  

     
    PAPER-Intelligent Transport System

    Vol: E96-A  No: 10  Page(s): 2007-2015

    Vehicles' speed is one of the key factors in vehicle travel efficiency, as speed is related to vehicle travel time, travel safety, fuel consumption, and exhaust gas emissions (e.g., CO2 emissions). Therefore, to improve the travel efficiency, a recommended speed calculation scheme is proposed to assist driving in Vehicle Ad hoc networks (VANETs) circumstances. In the proposed scheme, vehicles' current speed and space headway are obtained by Vehicle-to-Roadside unit (V2R) communication and Vehicle-to-Vehicle (V2V) communication. Based on the vehicles' current speed and adjacent vehicles' space headway, a recommended speed is calculated by on-board units installed in the vehicles, and then this recommended speed is provided to drivers. The drivers can change their speed to the recommended speed. At the recommended speed, vehicle travel efficiency can be improved: vehicles can arrive at destinations in a shorter travel time with fewer stop times, lower fuel consumption, and less CO2 emission. In particular, when approaching intersections, vehicles can pass through the intersections with less red light waiting time and a higher non-stop passing rate.

  • Bayesian Nonparametric Approach to Blind Separation of Infinitely Many Sparse Sources

    Hirokazu KAMEOKA  Misa SATO  Takuma ONO  Nobutaka ONO  Shigeki SAGAYAMA  

     
    PAPER

    Vol: E96-A  No: 10  Page(s): 1928-1937

    This paper deals with the problem of underdetermined blind source separation (BSS) where the number of sources is unknown. We propose a BSS approach that simultaneously estimates the number of sources, separates the sources based on the sparseness of speech, estimates the direction of arrival of each source, and performs permutation alignment. We confirmed experimentally that reasonably good separation was obtained with the present method without specifying the number of sources.

  • Speaker-Independent Speech Emotion Recognition Based on Two-Layer Multiple Kernel Learning

    Yun JIN  Peng SONG  Wenming ZHENG  Li ZHAO  Minghai XIN  

     
    LETTER-Speech and Hearing

    Vol: E96-D  No: 10  Page(s): 2286-2289

    In this paper, a two-layer Multiple Kernel Learning (MKL) scheme for speaker-independent speech emotion recognition is presented. In the first layer, MKL is used for feature selection. The training samples are separated into n groups according to some rules. All groups are used for feature selection to obtain n sparse feature subsets. The intersection and the union of all feature subsets are the result of our feature selection methods. In the second layer, MKL is used again for speech emotion classification with the selected features. To evaluate the effectiveness of the proposed two-layer MKL scheme, we compare it with state-of-the-art results. It is shown that our scheme yields a large gain in performance. Furthermore, another experiment is carried out to compare our feature selection method with other popular ones, and the result demonstrates the effectiveness of our feature selection method.

  • Nonlinear Modeling and Analysis on Concurrent Amplification of Dual-Band Gaussian Signals Open Access

    Ikuma ANDO  GiaKhanh TRAN  Kiyomichi ARAKI  Takayuki YAMADA  Takana KAHO  Yo YAMAGUCHI  Kazuhiro UEHARA  

     
    PAPER

    Vol: E96-C  No: 10  Page(s): 1254-1262

    In the recently developed Flexible Wireless System (FWS), the same platform needs to deal with different wireless systems. This increases nonlinear distortion in its wideband power amplifier (PA) because the PA needs to concurrently amplify multi-band signals. By taking higher harmonics as well as inter- and cross-modulation distortion into consideration, we have developed a method to analytically evaluate the adjacent channel leakage power ratio (ACPR) and error vector magnitude (EVM) on the basis of the PA's nonlinear characteristics. We devise a novel method for modeling the PA amplifying dual-band signals. The method makes it possible to model it merely by performing a one-tone test, making use of the Volterra series expansion and the general Wiener model. We then use the Mehler formula to derive the closed-form expressions of the PA's output power spectral density (PSD), ACPR, and EVM. The derivations are based on the assumption that the transmitted signals are complex Gaussian distributed in orthogonal frequency division multiplexing (OFDM) transmission systems. We validate the method by comparing measurement and simulation results and confirm it can appropriately predict the ACPR and EVM performance of the nonlinear PA output with OFDM inputs. In short, the method enables correct modeling of a wideband PA that amplifies dual-band signals merely by conducting a one-tone test.

  • Analytic and Numerical Modeling of Normal Penetration of Early-Time (E1) High Altitude Electromagnetic Pulse (HEMP) into Dispersive Underground Multilayer Structures

    Hee-Do KANG  Il-Young OH  Tong-Ho CHUNG  Jong-Gwan YOOK  

     
    PAPER-Antennas and Propagation

    Vol: E96-B  No: 10  Page(s): 2625-2632

    In this paper, the penetration of an early-time (E1) high altitude electromagnetic pulse (HEMP) into dispersive underground multilayer structures is analyzed using electromagnetic modeling of wave propagation in frequency-dependent lossy media. The electromagnetic pulse is treated over a power spectrum ranging from 100 kHz to the 100 MHz band, considering the fact that the power spectrum of the E1 HEMP rapidly decreases to 30 dB below its maximum value beyond the 100 MHz band. In addition, the propagation channel consisting of several dielectric materials is modeled with the dispersive relative permittivity of each medium. Based on the source and channel models, the propagation phenomenon is analyzed in the frequency and time domains. The attenuation levels at a point 100 m underground are observed to be about 15 and 20 dB at 100 kHz and 1 MHz, respectively, and the peak level of the penetrating electric field is found to be 5.6 kV/m. To ensure the causality of the result, we utilize the Hilbert transform.

  • Complexity of Strong Satisfiability Problems for Reactive System Specifications

    Masaya SHIMAKAWA  Shigeki HAGIHARA  Naoki YONEZAKI  

     
    PAPER-Fundamentals of Information Systems

    Vol: E96-D  No: 10  Page(s): 2187-2193

    Many fatal accidents involving safety-critical reactive systems have occurred in unexpected situations, which were not considered during the design and test phases of system development. To prevent such accidents, reactive systems should be designed to respond appropriately to any request from an environment at any time. Verifying this property during the specification phase reduces the development costs of safety-critical reactive systems. This property of a specification is commonly known as realizability. The complexity of the realizability problem is 2EXPTIME-complete. We have introduced the concept of strong satisfiability, which is a necessary condition for realizability. Many practical unrealizable specifications are also strongly unsatisfiable. In this paper, we show that the complexity of the strong satisfiability problem is EXPSPACE-complete. This means that strong satisfiability offers the advantage of lower complexity for analysis, compared to realizability. Moreover, we show that the strong satisfiability problem remains EXPSPACE-complete even when only formulae with a temporal depth of at most 2 are allowed.

  • Multi-Stage Automatic NE and PoS Annotation Using Pattern-Based and Statistical-Based Techniques for Thai Corpus Construction

    Nattapong TONGTEP  Thanaruk THEERAMUNKONG  

     
    PAPER-Natural Language Processing

    Vol: E96-D  No: 10  Page(s): 2245-2256

    Automated or semi-automated annotation is a practical solution for large-scale corpus construction. However, the special characteristics of the Thai language, such as the lack of word-boundary and sentence-boundary markers, raise several issues in automatic corpus annotation. This paper presents a multi-stage annotation framework containing two stages of chunking and three stages of tagging. The two chunking stages are pattern matching-based named entity (NE) extraction and dictionary-based word segmentation, while the three succeeding tagging stages are dictionary-, pattern- and statistical-based tagging. Applying heuristics of ambiguity priority, NE extraction is performed first on an original text using a set of patterns, in the order of pattern ambiguity. Next, the remaining text is segmented into words with a dictionary. The obtained chunks are then tagged with types of named entities or parts-of-speech (PoS) using dictionaries, patterns and statistics. Focusing on the reduction of human intervention in corpus construction, our experimental results show that the dictionary-based tagging process can assign unique tags to 64.92% of the words, leaving 24.14% unknown words and 10.94% ambiguously tagged words. Subsequently, the pattern-based tagging reduces the unknown words to only 13.34%, while the statistical-based tagging reduces the ambiguously tagged words to only 3.01%.

  • Improved Speech-Presence Uncertainty Estimation Based on Spectral Gradient for Global Soft Decision-Based Speech Enhancement

    Jong-Woong KIM  Joon-Hyuk CHANG  Sang Won NAM  Dong Kook KIM  Jong Won SHIN  

     
    LETTER-Speech and Hearing

    Vol: E96-A  No: 10  Page(s): 2025-2028

    In this paper, we propose a speech-presence uncertainty estimation to improve the global soft decision-based speech enhancement technique by using the spectral gradient scheme. The conventional soft decision-based speech enhancement technique uses a fixed ratio (Q) of the a priori speech-presence and speech-absence probabilities to derive the speech-absence probability (SAP). In contrast, we adaptively change Q according to the spectral gradient between the current and past frames as well as the status of the voice activity in the previous two frames. As a result, a distinct value of Q is assigned to each frequency in each frame in order to improve the performance of the SAP by robustly tracking the a priori information of the speech presence over time.

  • Speaker Recognition Using Sparse Probabilistic Linear Discriminant Analysis

    Hai YANG  Yunfei XU  Qinwei ZHAO  Ruohua ZHOU  Yonghong YAN  

     
    PAPER

    Vol: E96-A  No: 10  Page(s): 1938-1945

    Sparse representation has been studied within the field of signal processing as a means of providing a compact form of signal representation. This paper introduces a sparse representation-based framework for speaker recognition named Sparse Probabilistic Linear Discriminant Analysis. In this latent variable model, probabilistic linear discriminant analysis is modified to obtain an algorithm for learning overcomplete sparse representations by replacing the Gaussian prior on the factors with a Laplace prior that encourages sparseness. For a given speaker signal, the dictionary obtained from this model has good representational power while supporting optimal discrimination of the classes. An expectation-maximization algorithm is derived to train the model with a variational approximation to a range of heavy-tailed distributions whose limit is the Laplace. The variational approximation is also used to compute the likelihood ratio score of all trials of speakers. This approach performed well on the core-extended conditions of the NIST 2010 Speaker Recognition Evaluation, and is competitive with Gaussian Probabilistic Linear Discriminant Analysis in terms of normalized Decision Cost Function and Equal Error Rate.

  • A Capture-Safety Checking Metric Based on Transition-Time-Relation for At-Speed Scan Testing

    Kohei MIYASE  Ryota SAKAI  Xiaoqing WEN  Masao ASO  Hiroshi FURUKAWA  Yuta YAMATO  Seiji KAJIHARA  

     
    PAPER

    Vol: E96-D  No: 9  Page(s): 2003-2011

    Test power has become a critical issue, especially for low-power devices with deeply optimized functional power profiles. Particularly, excessive capture power in at-speed scan testing may cause timing failures that result in test-induced yield loss. This has made capture-safety checking mandatory for test vectors. However, previous capture-safety checking metrics suffer from inadequate accuracy since they ignore the time relations among different transitions caused by a test vector in a circuit. This paper presents a novel metric called the Transition-Time-Relation-based (TTR) metric which takes transition time relations into consideration in capture-safety checking. Detailed analysis done on an industrial circuit has demonstrated the advantages of the TTR metric. Capture-safety checking with the TTR metric greatly improves the accuracy of test vector sign-off and low-capture-power test generation.

  • Speaker Adaptation Based on PARAFAC2 of Transformation Matrices for Continuous Speech Recognition

    Yongwon JEONG  Sangjun LIM  Young Kuk KIM  Hyung Soon KIM  

     
    LETTER-Speech and Hearing

    Vol: E96-D  No: 9  Page(s): 2152-2155

    We present an acoustic model adaptation method where the transformation matrix for a new speaker is given by the product of bases and a weight matrix. The bases are built from the parallel factor analysis 2 (PARAFAC2) of training speakers' transformation matrices. We perform continuous speech recognition experiments using the WSJ0 corpus.

  • A Current-Mirror Winner-Take-All Sense Amplifier for Low Voltage SRAMs

    Song JIA  Heqing XU  Fengfeng WU  Yuan WANG  

     
    BRIEF PAPER-Integrated Electronics

    Vol: E96-C  No: 9  Page(s): 1205-1207

    We propose a current-mode sense amplifier that uses a current mirror to increase the bitline sensing current, which dominates the sensing speed. A comparison of the sensing delay shows that the proposed sense amplifier improves the sensing speed by about 12.6% to 15.4%, depending on the bitline load, over the original winner-take-all (WTA) scheme.

  • Multi-Channel Cooperative Spectrum Sensing in Cognitive Radio Networks

    Ji-Hoon LEE  Woo-Jin SONG  

     
    LETTER-Communication Theory and Signals

    Vol: E96-A  No: 9  Page(s): 1909-1913

    Spectrum sensing is one of the main functions in cognitive radio networks. To improve the sensing performance and increase spectrum efficiency, a number of cooperative spectrum sensing methods have been proposed. However, most of these methods focused on a single-channel environment. In this letter, we present a novel cooperative spectrum sensing method based on cooperator selection in a multi-channel cognitive radio network. Using reinforcement learning, a cognitive radio user can select reliable and robust cooperators, without any a priori knowledge. Using the proposed method, a cognitive radio user can achieve better sensing capability and overcome performance degradation problems due to malicious users or erratic user behavior. Numerical results show that the proposed method can achieve excellent performance.

  • Horizontal Spectral Entropy with Long-Span of Time for Robust Voice Activity Detection

    Kun-Ching WANG  

     
    LETTER-Speech and Hearing

    Vol: E96-D  No: 9  Page(s): 2156-2161

    This letter introduces an innovative voice activity detection (VAD) method based on horizontal spectral entropy with long-span of time (HSELT) feature sets to improve mobile automatic speech recognition (ASR) performance in low signal-to-noise ratio (SNR) conditions. Since the signal characteristics of nonstationary noise change with time, we need long-term information of the noisy speech signal to define a more robust decision rule yielding high accuracy. We find that HSELT measures can horizontally enhance the transition between speech and non-speech segments. Based on this finding, we use the HSELT measures to achieve high accuracy in detecting speech signals from various stationary and nonstationary noises.

  • Spectral Subtraction Based on Non-extensive Statistics for Speech Recognition

    Hilman PARDEDE  Koji IWANO  Koichi SHINODA  

     
    PAPER-Speech and Hearing

    Vol: E96-D  No: 8  Page(s): 1774-1782

    Spectral subtraction (SS) is an additive noise removal method which is derived in an extensive framework. In spectral subtraction, it is assumed that speech and noise spectra follow Gaussian distributions and are independent of each other. Hence, noisy speech also follows a Gaussian distribution. The spectral subtraction formula is obtained by maximizing the likelihood of the noisy speech distribution with respect to its variance. However, it is well known that noisy speech observed in real situations often follows a heavy-tailed distribution, not a Gaussian distribution. In this paper, we introduce a q-Gaussian distribution in the non-extensive statistics to represent the distribution of noisy speech and derive a new spectral subtraction method based on it. We found that the q-Gaussian distribution fits the noisy speech distribution better than the Gaussian distribution does. Our speech recognition experiments using the Aurora-2 database showed that the proposed method, q-spectral subtraction (q-SS), outperformed the conventional SS method.

  • 1.5–9.7-Gb/s Complete 4-PAM Serial Link Transceiver with a Wide Frequency Range CDR

    Bongsub SONG  Kyunghoon KIM  Junan LEE  Kwangsoo KIM  Younglok KIM  Jinwook BURM  

     
    PAPER-Electronic Circuits

    Vol: E96-C  No: 8  Page(s): 1048-1053

    A complete 4-level pulse amplitude modulation (4-PAM) serial link transceiver including a wide frequency range clock generator and clock data recovery (CDR) is proposed in this paper. A dual-loop architecture, consisting of a frequency locked loop (FLL) and a phase locked loop (PLL), is employed for the wide frequency range clocks. The clocks generated by the FLL (clock generator) and the PLL (CDR) are used as the transmitter clock and the receiver clock, respectively. Both the FLL and the PLL employ identical voltage controlled oscillators (VCOs) consisting of ring-type delay cells. To improve the frequency tuning range of the VCO, deep triode PMOS loads are utilized for each delay cell, since the turn-on resistance of the deep triode PMOS varies substantially with the gate voltage. As a result, fabricated in a 0.13-µm CMOS process, the proposed 4-PAM transceiver operates from 1.5 Gb/s to 9.7 Gb/s with a bit error rate of 10^-12. At the maximum data rate, the total power dissipation of the transceiver is 254 mW, and the measured jitter of the recovered clock is 1.61 ps rms.

  • A 36-mW 1.5-GS/s 7-Bit Time-Interleaved SAR ADC Using Source Follower Based Track-and-Hold Circuit in 65-nm CMOS

    Masanori FURUTA  Ippei AKITA  Junya MATSUNO  Tetsuro ITAKURA  

     
    PAPER-Analog Signal Processing

    Vol: E96-A  No: 7  Page(s): 1552-1561

    This paper presents a 7-bit 1.5-GS/s time-interleaved (TI) SAR ADC. The scheme achieves better isolation between sub-ADCs by embedding a track-and-hold (T/H) amplifier and a reference voltage buffer in each sub-ADC. The proposed dynamic T/H circuit enables high-speed, low-power operation. The prototype is fabricated in a 65-nm CMOS technology. The total active area is 0.14 mm^2 and the ADC consumes 36 mW from a 1.2-V supply. The measured results show that the peak spurious-free dynamic range (SFDR) and signal-to-noise-and-distortion ratio (SNDR) are 52.4 dB and 39.6 dB, respectively, and a figure of merit (FoM) of 300 fJ/conv. is achieved.