Yuki IKENO Kunio SAKAKIBARA Nobuyoshi KIKUMA Hiroshi HIRAYAMA
We developed a slotted waveguide planar array antenna with partially parallel feeding in the millimeter-wave band. Travelling-wave excitation is more effective for low-loss feeding of array antennas than parallel feeding systems. However, an array antenna with travelling-wave excitation inherently suffers from the long line effect, which degrades gain through a frequency-dependent beam shift when the array antenna is fed from the edge of the radiating waveguide. To reduce the gain degradation due to frequency change, we developed a partially parallel feeding system. The measured performance of the developed antenna is evaluated in this paper.
Hiroshi HIRAYAMA Nobuyoshi KIKUMA Kunio SAKAKIBARA
A new scheme to avoid the null zone in HF-band RFID without expanding the antenna size is proposed. First, we demonstrate by FDTD simulation that the null zone occurs because of cancellation of magnetic fields over the loop surface. To prevent this cancellation, the loop antenna is split into four parts, which work as a planar array antenna. The outputs of the antennas are gathered by a combining circuit. We have validated by FDTD simulation that the proposed scheme increases the worst-case received power by 13.1 dB.
Nobutaka KITO Kensuke HANAI Naofumi TAKAGI
A C-testable 4-2 adder tree for an easily testable high-speed multiplier is proposed, and a recursive method for test generation is shown. By using the specific patterns that we call 'alternately inverted patterns,' the adder tree, as well as partial product generators, can be tested with 14 patterns regardless of its operand size under the cell fault model. The test patterns are easily fed through the partial product generators. The hardware overhead of the 4-2 adder tree with partial product generators for a 64-bit multiplier is about 15%. By using a previously proposed easily testable adder as the final adder, we can obtain an easily testable high-speed multiplier.
Young-Seok PARK Pyung-Su HAN Woo-Young CHOI
A linear model for feedforward ring oscillators (FROs) is developed and oscillator characteristics are analyzed using the model. The model allows prediction of multiple oscillation modes as well as the oscillation frequency of each mode. The prediction agrees well with SPICE simulation results.
Junichi HORI Kentarou SUNAGA Satoru WATANABE
We investigated suitable spatial inverse filters for cortical dipole imaging from the scalp electroencephalogram (EEG). The effects of incorporating statistical information about signal and noise into inverse procedures were examined by computer simulations and experimental studies. The parametric projection filter (PPF) and parametric Wiener filter (PWF) were applied to an inhomogeneous three-sphere volume conductor head model. The noise covariance matrix was estimated by applying independent component analysis (ICA) to scalp potentials. The simulation results suggest that the PPF and the PWF provided excellent performance when the noise covariance was estimated from the differential noise between the EEG and the signal separated by ICA, and the signal covariance was estimated from the separated signal. Moreover, the spatial resolution of the cortical dipole imaging was improved while the influence of noise was suppressed by including the differential noise at the instant of the imaging and by adjusting the duration of the noise sample according to the signal-to-noise ratio. We applied the proposed imaging technique to human experimental data of visual evoked potentials and obtained reasonable results that are consistent with physiological knowledge.
Jae-Hun CHOI Joon-Hyuk CHANG Seong-Ro LEE
In this paper, a novel approach to speech reinforcement in a low-bit-rate speech coder under ambient noise environments is proposed. The excitation vector of ambient noise is efficiently obtained at the near-end and then combined with the excitation signal of the far-end for a suitable reinforcement gain within the G.729 CS-ACELP Annex B framework. The present approach therefore differs clearly from previous approaches in that it does not require an additional arithmetic step such as the discrete Fourier transform (DFT). Experimental results indicate that the proposed method performs better than, or at least comparably to, conventional approaches with a lower computational burden.
Mohammad Tariqul ISLAM Ahmed Toaha MOBASHSHER Norbahiah MISRAN
In this paper, a novel feeding technique is proposed to feed a printed rectangular ring patch antenna that attains high gain in two bands simultaneously. The prototype antenna exhibits good impedance bandwidths that satisfy the ISM 2.45/5.8 GHz bands and achieves maximum gains of 9.56 and 10.17 dBi, respectively, with a stable radiation pattern.
Tetsuo KOSAKA Yuui TAKEDA Takashi ITO Masaharu KATO Masaki KOHDA
In this paper, we propose a new speaker-class modeling method and an adaptation method for it in an LVCSR system, and evaluate them on the Corpus of Spontaneous Japanese (CSJ). In this method, for each evaluation speaker, acoustically close speakers are selected from the training speakers and the acoustic models are trained using their utterances. One of the major issues of the speaker-class model is determining the selection range of speakers. To solve this problem, several models covering different speaker ranges are prepared in advance for each evaluation speaker, and the most appropriate model is selected on a likelihood basis in the recognition step. In addition, we improved the recognition performance using unsupervised speaker adaptation with the speaker-class models. In the recognition experiments, the proposed speaker adaptation based on speaker-class models yielded a significant improvement over the conventional adaptation method.
Yanqing SUN Yu ZHOU Qingwei ZHAO Yonghong YAN
This paper focuses on the problem of performance degradation in mismatched speech recognition. The F-Ratio analysis method is used to analyze the significance of different frequency bands for speech unit classification, and we find that frequencies around 1 kHz and 3 kHz, which are the upper bounds of the first and the second formants for most of the vowels, should be emphasized in comparison to the Mel-frequency cepstral coefficients (MFCC). The analysis result is further observed to be stable in several typical mismatched situations. Similar to the Mel-frequency scale, another frequency scale called the F-Ratio-scale is thus proposed to optimize the filter bank design for the MFCC features and to make each subband contain equal significance for speech unit classification. Under comparable conditions, the modified features give relative decreases in sentence error rate compared with the MFCC of 43.20% for emotion-affected speech recognition, 35.54% and 23.03% for noisy speech recognition at 15 dB and 0 dB SNR (signal-to-noise ratio), respectively, and 64.50% for the three years' 863 test data. The application of the F-Ratio analysis to the clean training set of the Aurora2 database demonstrates its robustness over languages, texts and sampling rates.
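As an illustration of the kind of quantity an F-Ratio analysis compares across frequency bands, the sketch below computes a Fisher-style ratio of between-class variance to average within-class variance for one band. The abstract does not give the exact definition used by the authors, so the formula, the use of population variance, and the function name `f_ratio` are assumptions made for this example only.

```python
from statistics import mean, pvariance


def f_ratio(classes):
    """Fisher-style F-ratio for one scalar feature (e.g. one band's energy).

    classes: list of lists, one list of feature values per speech unit class.
    Assumed definition: variance of the class means divided by the mean of
    the within-class variances (population variances throughout).
    """
    class_means = [mean(values) for values in classes]
    between = pvariance(class_means)
    within = mean([pvariance(values) for values in classes])
    return between / within


# Hypothetical energies of one frequency band for two speech unit classes.
band_energies = [[1.0, 3.0], [5.0, 7.0]]
print(f_ratio(band_energies))
```

A band whose ratio is large separates the classes well relative to their spread, which is the sense in which such a band would be "significant" for classification under this assumed definition.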
Takashi NOSE Yuhei OTA Takao KOBAYASHI
We propose a segment-based voice conversion technique using hidden Markov model (HMM)-based speech synthesis with nonparallel training data. In the proposed technique, the phoneme information with durations and a quantized F0 contour are extracted from the input speech of a source speaker, and are transmitted to a synthesis part. In the synthesis part, the quantized F0 symbols are used as prosodic context. A phonetically and prosodically context-dependent label sequence is generated from the transmitted phoneme and the F0 symbols. Then, converted speech is generated from the label sequence with durations using the target speaker's pre-trained context-dependent HMMs. In the model training, the models of the source and target speakers can be trained separately, hence there is no need to prepare parallel speech data of the source and target speakers. Objective and subjective experimental results show that the segment-based voice conversion with phonetic and prosodic contexts works effectively even if the parallel speech data is not available.
Alberto Yoshihiro NAKANO Seiichi NAKAGAWA Kazumasa YAMAMOTO
In this work, spatial information consisting of the position and orientation angle of an acoustic source is estimated by an artificial neural network (ANN). The estimated position of a speaker in an enclosed space is used to refine the estimated time delays for a delay-and-sum beamformer, thus enhancing the output signal. The orientation angle, on the other hand, is used to restrict the lexicon used in the recognition phase, assuming that the speaker faces a particular direction while speaking. To compensate for the effect of the transmission channel inside a short frame analysis window, a new cepstral mean normalization (CMN) method based on a Gaussian mixture model (GMM) is investigated and shows better performance than the conventional CMN for short utterances. The performance of the proposed method is evaluated through Japanese digit/command recognition experiments.
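For reference, the conventional CMN that serves as the baseline here can be sketched as subtracting the per-utterance mean cepstral vector from every frame. This is only the textbook baseline, not the GMM-based method investigated in the paper; the function name and the list-of-lists frame layout are assumptions for illustration.

```python
def cepstral_mean_normalize(frames):
    """Conventional CMN: subtract the utterance-level mean from each frame.

    frames: non-empty list of cepstral vectors (lists of floats), one per frame.
    Returns a new list of normalized frames; the input is left unchanged.
    """
    num_frames = len(frames)
    dim = len(frames[0])
    # Mean of each cepstral coefficient over the whole utterance.
    mean_vec = [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]
    return [[frame[d] - mean_vec[d] for d in range(dim)] for frame in frames]


# Two hypothetical 2-dimensional cepstral frames.
print(cepstral_mean_normalize([[1.0, 2.0], [3.0, 6.0]]))
```

Because the mean is estimated from the utterance itself, the estimate becomes unreliable when only a few frames are available, which is the short-utterance weakness the abstract refers to.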
Yasunari OBUCHI Takashi SUMIYOSHI
In this paper we introduce a new framework of audio processing, which is essential to achieve a trigger-free speech interface for home appliances. If the speech interface works continually in real environments, it must extract occasional voice commands and reject everything else. It is extremely important to reduce the number of false alarms because the number of irrelevant inputs is much larger than the number of voice commands even for heavy users of appliances. The framework, called Intentional Voice Command Detection, is based on voice activity detection, but enhanced by various speech/audio processing techniques such as emotion recognition. The effectiveness of the proposed framework is evaluated using a newly-collected large-scale corpus. The advantages of combining various features were tested and confirmed, and the simple LDA-based classifier demonstrated acceptable performance. The effectiveness of various methods of user adaptation is also discussed.
Seongyong AHN Hyejeong HONG HyunJin KIM Jin-Ho AHN Dongmyong BAEK Sungho KANG
This paper proposes a new pattern matching architecture with multi-character processing for deep packet inspection. The proposed architecture detects the start point of pattern matching from multi-character input using input text alignment. By eliminating duplicate hardware components with a process element tree, the hardware cost of the proposed architecture is greatly reduced.
Shun WATANABE Ryutaroh MATSUMOTO Tomohiko UYEMATSU
Privacy amplification is a technique to distill a secret key from a random variable by a function so that the distilled key and the eavesdropper's random variable are statistically independent. There are three kinds of security criteria for the key distilled by privacy amplification: the normalized divergence criterion, which is also known as the weak security criterion, the variational distance criterion, and the divergence criterion, which is also known as the strong security criterion. As a technique to distill a secret key, it is known that the encoder of a Slepian-Wolf code (source coding with full side-information at the decoder) can be used as a function for privacy amplification if we employ the weak security criterion. In this paper, we show that the encoder of a Slepian-Wolf code cannot be used as a function for privacy amplification if we employ criteria other than the weak one.
Yamato OHTANI Tomoki TODA Hiroshi SARUWATARI Kiyohiro SHIKANO
We have developed a one-to-many eigenvoice conversion (EVC) system that allows us to convert a single source speaker's voice into an arbitrary target speaker's voice using an eigenvoice Gaussian mixture model (EV-GMM). This system is capable of effectively building a conversion model for an arbitrary target speaker by adapting the EV-GMM using only a small amount of speech data uttered by the target speaker in a text-independent manner. However, the conversion performance is still insufficient for the following reasons: 1) the excitation signal is not precisely modeled; 2) the oversmoothing of the converted spectrum causes muffled sounds in converted speech; and 3) the conversion model is affected by redundant acoustic variations among the many pre-stored target speakers used for building the EV-GMM. To address these problems, we apply the following promising techniques to one-to-many EVC: 1) mixed excitation; 2) a conversion algorithm considering global variance; and 3) adaptive training of the EV-GMM. The experimental results demonstrate that the conversion performance of one-to-many EVC is significantly improved by integrating all of these techniques into the one-to-many EVC system.
Yoshihide KATO Shigeki MATSUBARA
This paper proposes a method of correcting annotation errors in a treebank. By using a synchronous grammar, the method transforms parse trees containing annotation errors into trees in which the errors are corrected. The synchronous grammar is automatically induced from the treebank. We report an experimental result of applying our method to the Penn Treebank. The result demonstrates that our method corrects syntactic annotation errors with high precision.
In this paper, we propose a hybrid model adaptation approach in which pronunciation and acoustic models are adapted by incorporating the pronunciation and acoustic variabilities of non-native speech in order to improve the performance of non-native automatic speech recognition (ASR). Specifically, the proposed hybrid model adaptation can be performed at either the state-tying or the triphone-modeling level, depending on the level at which acoustic model adaptation is performed. In both methods, we first analyze the pronunciation variant rules of non-native speakers and then classify each rule as either a pronunciation variant or an acoustic variant. The state-tying level hybrid method then adapts the pronunciation models by accommodating the pronunciation variants in the pronunciation dictionary, and adapts the acoustic models by clustering the states of triphone acoustic models using the acoustic variants. The triphone-modeling level hybrid method, on the other hand, initially adapts pronunciation models in the same way as the state-tying level hybrid method; for the acoustic model adaptation, however, the triphone acoustic models are re-estimated based on the adapted pronunciation models, and the states of the re-estimated triphone acoustic models are then clustered using the acoustic variants. Korean-spoken English speech recognition experiments show that ASR systems employing the state-tying and triphone-modeling level adaptation methods achieve relative reductions in average word error rate (WER) of 17.1% and 22.1% for non-native speech, respectively, compared with a baseline ASR system.
Pham Thanh GIANG Kenji NAKAGAWA
The IEEE 802.11 MAC standard for wireless ad hoc networks adopts the Binary Exponential Back-off (BEB) mechanism to resolve bandwidth contention between stations. The BEB mechanism controls the bandwidth allocation for each station by choosing a back-off value from one to CW according to the uniform random distribution, where CW is the contention window size. However, in asymmetric multi-hop networks, some stations are disadvantaged in their opportunity to access the shared channel and may suffer severe throughput degradation when the traffic load is large. As a result, the network performance is degraded in terms of throughput and fairness. In this paper, we propose a new cross-layer scheme that aims to solve the per-flow unfairness problem and achieve good throughput performance in IEEE 802.11 multi-hop ad hoc networks. Our cross-layer scheme collects useful information from the physical, MAC and link layers of the station itself. This information is used to determine the optimal Contention Window (CW) size for per-station fairness. We also use this information to adjust the CW size for each flow in the station in order to achieve per-flow fairness. The performance of our cross-layer scheme is examined on various asymmetric multi-hop network topologies by using Network Simulator (NS-2).
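The baseline BEB behavior described above can be sketched as follows: the back-off value is drawn uniformly from one to CW, CW roughly doubles after a failed transmission, and it returns to its minimum after a success. This is a simplified illustration of standard BEB, not the proposed cross-layer scheme; the constants `CW_MIN` and `CW_MAX` and both function names are assumptions chosen for the example.

```python
import random

# Illustrative contention window bounds (assumed values for this sketch).
CW_MIN = 31
CW_MAX = 1023


def next_cw(cw, collided):
    """Return the contention window to use for the next transmission attempt."""
    if collided:
        # Binary exponential growth: 31 -> 63 -> 127 ... capped at CW_MAX.
        return min(2 * (cw + 1) - 1, CW_MAX)
    # A successful transmission resets the window to its minimum.
    return CW_MIN


def draw_backoff(cw, rng=random):
    """Draw a back-off value uniformly from 1..cw, following the text above."""
    return rng.randint(1, cw)


cw = CW_MIN
for collided in (True, True, False):
    cw = next_cw(cw, collided)
    print(cw, draw_backoff(cw))
```

Since every station applies the same rule regardless of its position in the topology, a station that collides more often keeps a larger window and accesses the channel less, which is the source of the unfairness the abstract describes.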
Tadatoshi BABASAKI Toshimitsu TANAKA Toru TANAKA Yousuke NOZAKI Tadahito AOKI Fujio KUROKAWA
High-efficiency power feeding systems, together with reductions in the power consumption of ICT equipment and cooling systems, are effective solutions for reducing ICT power consumption. A higher voltage direct current (HVDC) power feeding system prototype was produced. This system is composed of rectifier equipment, a power distribution unit, batteries, and the ICT equipment. The configuration is similar to that of a -48 V DC power supply system. The output of the rectifier equipment is 100 kW, and the output voltage is 401.4 V. This paper presents the configuration of the HVDC power feeding system and discusses the basic characteristics of the prototype system.
Statistical speech recognition using continuous-density hidden Markov models (CDHMMs) has yielded many practical applications. However, in general, mismatches between the training data and input data significantly degrade recognition accuracy. Various acoustic model adaptation techniques using a few input utterances have been employed to overcome this problem. In this article, we survey these adaptation techniques, including maximum a posteriori (MAP) estimation, maximum likelihood linear regression (MLLR), and eigenvoice. We also present a schematic view called the adaptation pyramid to illustrate how these methods relate to each other.
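As one concrete instance of the surveyed techniques, the widely used MAP update of a Gaussian mean interpolates between the prior (speaker-independent) mean and the adaptation data, weighted by the occupation counts. The sketch below assumes this standard form; the function name, the prior weight `tau`, and its default value are illustrative choices, not taken from the article.

```python
def map_mean(prior_mean, frames, posteriors, tau=10.0):
    """MAP estimate of one Gaussian mean.

    prior_mean: list of floats, the mean before adaptation.
    frames:     list of observation vectors from the adaptation utterances.
    posteriors: occupation probability of this Gaussian for each frame.
    tau:        assumed prior weight; larger values trust the prior more.
    """
    occupancy = sum(posteriors)
    dim = len(prior_mean)
    # Posterior-weighted sum of the adaptation frames, per dimension.
    weighted = [sum(g * x[d] for g, x in zip(posteriors, frames)) for d in range(dim)]
    return [(tau * prior_mean[d] + weighted[d]) / (tau + occupancy) for d in range(dim)]


# With little data the estimate stays near the prior mean of 0.0.
print(map_mean([0.0], [[2.0], [2.0]], [1.0, 1.0], tau=2.0))
```

With no adaptation frames the function returns the prior mean unchanged, and as the occupancy grows the estimate approaches the data mean, which is why MAP needs more data than transform-based methods such as MLLR to move many parameters.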