Amaro LIMA Heiga ZEN Yoshihiko NANKAKU Keiichi TOKUDA Tadashi KITAMURA Fernando G. RESENDE
This paper presents an analysis of the applicability of Sparse Kernel Principal Component Analysis (SKPCA) for feature extraction in speech recognition, as well as a proposed approach that makes the SKPCA technique realizable for a large amount of training data, which is the usual situation in speech recognition systems. Although KPCA (Kernel Principal Component Analysis) has proved to be an efficient technique for speech recognition, it has the disadvantage of requiring training data reduction when the amount of data is excessively large. This data reduction is important to avoid computational infeasibility and/or an extremely high computational burden in the feature representation step when the training and test data are evaluated. The standard approach to this data reduction is to randomly choose frames from the original data set, which does not necessarily provide a good statistical representation of that set. To solve this problem, a likelihood-related re-estimation procedure was applied to the KPCA framework, creating SKPCA, which nevertheless is not realizable for large training databases. The proposed approach consists of clustering the training data and applying an SKPCA-like data reduction technique to these clusters, generating reduced data clusters. These reduced data clusters are merged and reduced recursively until just one cluster is obtained, making the SKPCA approach realizable for a large amount of training data. The experimental results show the efficiency of the SKPCA technique with the proposed approach over KPCA with the standard sparse solution using randomly chosen frames, and over standard feature extraction techniques.
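The recursive cluster, reduce, and merge scheme can be sketched as follows. This is a minimal illustration under stated assumptions. The SKPCA likelihood-based re-estimation is replaced by a hypothetical `reduce_cluster` that performs greedy farthest-point selection, and the clustering step is replaced by contiguous chunking. Neither is the authors' procedure. What the sketch does show is the control flow: reduce each cluster to at most `m` frames, then repeatedly merge pairs of reduced clusters and reduce again until one cluster remains.

```python
import math


def reduce_cluster(frames, m):
    """Keep at most m representative frames.

    Hypothetical stand-in for the SKPCA likelihood-based re-estimation:
    greedy farthest-point selection, which favours frames spread over the cluster.
    """
    if len(frames) <= m:
        return list(frames)
    selected = [frames[0]]
    # Distance from every frame to its nearest already-selected frame.
    nearest = [math.dist(f, frames[0]) for f in frames]
    while len(selected) < m:
        idx = max(range(len(frames)), key=lambda i: nearest[i])
        selected.append(frames[idx])
        nearest = [min(d, math.dist(f, frames[idx])) for d, f in zip(nearest, frames)]
    return selected


def recursive_reduce(frames, n_clusters, m):
    """Cluster, reduce each cluster, then merge pairs and reduce until one remains."""
    # Hypothetical clustering step: contiguous chunks stand in for a real
    # clustering algorithm applied to the training frames.
    size = math.ceil(len(frames) / n_clusters)
    clusters = [frames[i:i + size] for i in range(0, len(frames), size)]
    reduced = [reduce_cluster(c, m) for c in clusters]
    while len(reduced) > 1:
        merged = []
        for i in range(0, len(reduced), 2):
            pair = reduced[i] + (reduced[i + 1] if i + 1 < len(reduced) else [])
            merged.append(reduce_cluster(pair, m))
        reduced = merged
    return reduced[0]
```

After the first pass, each reduction in this sketch sees at most 2m frames. The cost of the expensive step therefore stays bounded however large the original training set is.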
Chee Seong GOH Sze Yun SET Kazuro KIKUCHI
We report tunable optical devices based on fiber Bragg gratings (FBGs) whose filtering characteristics are controlled by strain distributions. These devices include a widely wavelength-tunable filter, a tunable group-velocity dispersion (GVD) compensator, a tunable dispersion slope (DS) compensator, and a variable-bandwidth optical add/drop multiplexer (OADM), which will play important roles in next-generation reconfigurable optical networks.
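Strain control of this kind rests on the standard Bragg relation λ_B = 2 n_eff Λ. Under axial strain ε, the Bragg wavelength shifts approximately as Δλ_B/λ_B ≈ (1 − p_e) ε. The following minimal sketch assumes a typical effective photoelastic coefficient p_e ≈ 0.22 for silica fiber and a constant temperature; neither value comes from the paper. It shows two effects. A uniform strain tunes the centre wavelength, while a linear strain gradient chirps the grating, which is the basic mechanism behind tunable bandwidth and dispersion.

```python
# Typical effective photoelastic coefficient of silica fiber (assumed value).
P_E = 0.22


def bragg_wavelength(lambda0_nm, strain):
    """Bragg wavelength under uniform axial strain (temperature held constant)."""
    return lambda0_nm * (1.0 + (1.0 - P_E) * strain)


def strain_chirp(lambda0_nm, strain_start, strain_end, n_points):
    """Local Bragg wavelengths along a grating with a linear strain gradient."""
    step = (strain_end - strain_start) / (n_points - 1)
    return [bragg_wavelength(lambda0_nm, strain_start + k * step) for k in range(n_points)]
```

Under these assumptions, a strain of 1000 µε shifts a 1550 nm grating by about 1.2 nm.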
Weifeng LI Tetsuya SHINDE Hiroshi FUJIMURA Chiyomi MIYAJIMA Takanori NISHINO Katunobu ITOU Kazuya TAKEDA Fumitada ITAKURA
This paper describes a new multi-channel method for noisy speech recognition, which estimates the log spectrum of speech at a close-talking microphone based on the multiple regression of the log spectra (MRLS) of noisy signals captured by distributed microphones. The advantages of the proposed method are as follows: 1) the method does not require a carefully designed geometric layout, calibration of the sensors, or additional pre-processing for tracking the speech source; 2) the system requires very little computation; and 3) the regression weights can be statistically optimized over the given training data. Once the optimal regression weights are obtained by regression learning, they can be used to generate the estimated log spectrum in the recognition phase, where close-talking speech is no longer required. The performance of the proposed method is evaluated by speech recognition of real in-car dialogue data. Compared with the nearest distant microphone and a multi-microphone adaptive beamformer, the proposed approach obtains relative word error rate (WER) reductions of 9.8% and 3.6%, respectively.
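The regression step can be illustrated with ordinary least squares. For illustration only, the sketch assumes that one set of weights, including a bias term, is fitted independently for each frequency bin. The function names are hypothetical. Training uses paired close-talking and distant-microphone log spectra. At recognition time, only the distant signals and the stored weights are needed.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mrls_bin(distant, close):
    """Least-squares weights [w0, w1, ..., wM] for one frequency bin.

    distant: per-frame lists of M distant-microphone log-spectral values.
    close:   per-frame close-talking log-spectral values (needed only for training).
    """
    rows = [[1.0] + list(frame) for frame in distant]
    k = len(rows[0])
    # Normal equations (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, close)) for i in range(k)]
    return solve_linear(xtx, xty)


def estimate_bin(weights, frame):
    """Estimated close-talking log value from one frame of distant-mic values."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], frame))
```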
This paper describes the special demands of conversational speech in the context of corpus-based speech synthesis. The author proposed the CHATR system of prosody-based unit selection for concatenative waveform synthesis seven years ago, and now extends this work to incorporate the results of an analysis of five years of recordings of spontaneous conversational speech in a wide range of actual daily-life situations. The paper proposes that the expression of affect (often translated as 'kansei' in Japanese) is the main factor differentiating laboratory speech from real-world conversational speech, and presents a framework for specifying affect through differences in speaking style and voice quality. Having an enormous corpus of speech samples available for concatenation allows the selection of complete phrase-sized utterance segments, and shifts the focus of unit selection from segmental or phonetic continuity to prosodic and discoursal appropriateness. Samples of the resulting large-corpus-based synthesis can be heard at http://feast.his.atr.jp/AESOP.
Hiroyoshi YAMAMOTO Yoshihiko NANKAKU Chiyomi MIYAJIMA Keiichi TOKUDA Tadashi KITAMURA
This paper investigates parameter tying structures of a mixture of factor analyzers (MFA) and discriminative training of MFA for speaker identification. Either the factor loading matrices or the diagonal matrices are shared across the mixture components of the MFA. Minimum classification error (MCE) training is then applied to the MFA parameters to enhance their discrimination ability. A text-independent speaker identification experiment shows that MFA outperforms the conventional Gaussian mixture model (GMM) with diagonal or full covariance matrices, and achieves the best performance when the diagonal matrices are shared, giving a relative gain of 26% over the GMM with diagonal covariance matrices. The improvement is especially significant under sparse training data conditions. Recognition performance is further improved by MCE training, with an additional 3% error reduction.
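In an MFA, each component's covariance is modeled as Σ = ΛΛ^T + Ψ, where Λ is a d × q factor loading matrix and Ψ is a diagonal matrix. A rough parameter count shows why tying helps when training data are sparse. The sketch counts raw stored values, ignoring the rotational redundancy of Λ. The settings are hypothetical and do not reproduce the paper's configuration.

```python
def mfa_covariance(loading, psi):
    """Covariance of one MFA component: Lambda Lambda^T + diag(psi).

    loading: d x q factor loading matrix (list of rows); psi: length-d diagonal.
    """
    d = len(loading)
    return [[sum(a * b for a, b in zip(loading[i], loading[j])) + (psi[i] if i == j else 0.0)
             for j in range(d)] for i in range(d)]


def parameter_count(d, m, q, model):
    """Stored parameters: means, covariance terms, and m - 1 free mixture weights."""
    weights = m - 1
    if model == "gmm_diag":
        return m * (d + d) + weights
    if model == "gmm_full":
        return m * (d + d * (d + 1) // 2) + weights
    if model == "mfa":
        return m * (d + d * q + d) + weights
    if model == "mfa_shared_diag":      # one Psi shared by all mixtures
        return m * (d + d * q) + d + weights
    if model == "mfa_shared_loading":   # one Lambda shared by all mixtures
        return m * (d + d) + d * q + weights
    raise ValueError(f"unknown model: {model}")
```

For d = 20, m = 16, and q = 2, this gives 655 parameters for a diagonal GMM, 3695 for a full-covariance GMM, 1295 for an untied MFA, and 995 when Ψ is shared.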
Naoya KAWAMOTO Naoto MATSUO Atsushi MASUDA Yoshitaka KITAMON Hideki MATSUMURA Yasunori HARADA Tadaki MIYOSHI Hiroki HAMADA
The role of hydrogen in the Si film during excimer laser annealing (ELA) was studied using a novel sample structure consisting of stacked a-Si and SiN films. The hydrogen content of the Si films during ELA was varied by preparing samples whose SiN films, deposited by catalytic chemical vapor deposition (Cat-CVD), contained 2.3-8.2 at.% hydrogen. At low hydrogen concentrations in the Si film, the grain size increases as the hydrogen concentration decreases, and the internal stress of the film decreases as the shot number increases. At high hydrogen concentrations in the Si film, hydrogen bursts were observed at 500 mJ/cm², and the dependence of the internal stress on the shot number became weak even at 318 mJ/cm². These phenomena can basically be explained by the secondary grain growth mechanism that we have proposed.
In this letter, we propose a low-complexity method for estimating the cyclic-prefix (CP) length in a discrete multitone (DMT) very high-speed digital subscriber line (VDSL) system. Using the sign bits of the received DMT VDSL signals, the proposed method provides a good estimate of the CP length and is suitable for various channel characteristics. This simple estimation method is consistent with the initialization procedure of the T1E1.4 multi-carrier modulation (MCM)-based VDSL standard. Finally, simulation results with VDSL test loops are presented.
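Working on sign bits alone can be illustrated with a simple estimator. This is a hypothetical sketch, not the letter's algorithm. It assumes that the FFT size N is known, that the transmitted samples are independent with a symmetric distribution, and that the channel is ideal. Under those assumptions, every CP sample has the same sign as its copy N samples later, while other sample pairs agree about half the time. The fraction of agreeing sign bits at lag N is therefore about 0.5 + 0.5 L/(N + L), and this relation can be solved for the CP length L. The estimator needs only sign comparisons (XORs in hardware) and a counter.

```python
import random


def simulate_dmt_stream(n_symbols, n_fft, cp_len, seed=0):
    """Hypothetical test signal: Gaussian 'IFFT outputs' with a cyclic prefix,
    ideal channel, no noise."""
    rng = random.Random(seed)
    stream = []
    for _ in range(n_symbols):
        body = [rng.gauss(0.0, 1.0) for _ in range(n_fft)]
        stream.extend(body[n_fft - cp_len:] + body)  # CP = last cp_len samples
    return stream


def estimate_cp_length(stream, n_fft, candidates):
    """Estimate the CP length from sign-bit agreement at lag n_fft."""
    signs = [x >= 0 for x in stream]
    pairs = len(signs) - n_fft
    agree = sum(signs[n] == signs[n + n_fft] for n in range(pairs))
    a = agree / pairs
    if a >= 1.0:
        return max(candidates)
    # CP samples always agree with their copy; other pairs agree about half
    # the time, so a ~ 0.5 + 0.5 * L / (N + L).  Solve for L:
    raw = (2 * a - 1) * n_fft / (2 - 2 * a)
    # Snap to the nearest allowed length (candidate set is hypothetical here).
    return min(candidates, key=lambda c: abs(c - raw))
```

In a dispersive channel, intersymbol interference corrupts the first CP samples, so a practical estimator would have to account for it. Snapping the raw estimate to a small set of candidate lengths adds some robustness.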
Kazuya TAKAHASHI Yoshiki KOBAYASHI Miyuki FUJII Naoyuki SHIMBO Hirotada UEDA Kazuo TSUTSUI
We propose a sea surveillance system that automatically detects intruding objects in the sea. The difficulty for an automatic system is detecting objects such as moving boats while reducing false positives caused by waves and reflections in the sea. A false positive is the report of an object that does not actually exist, while a false negative is a failure to detect an intruding object. First, we identify the factors that cause false positives. Second, we propose a new surveillance system that takes these factors into account. Our proposed system combines three detection methods. The first is detection of Differences between Surveillance images and Flapping Reference images (DSFR). The second is detection of Contours from Averaging images (CA). The third is Silhouette object Detection (SD). The combination of DSFR and CA detects various moving objects under normal light conditions, while SD detects objects under backlight conditions. Finally, we apply our proposed method to actual situations, where it detected boats while effectively reducing false positives.
In this paper, we describe an accelerative current-programming method for active matrix OLED (AM-OLED) displays. This new method uses a common-source configuration, an "Acceleration Control" line, and mechanisms that prevent the programming current from flowing through the OLED device. It is intended to solve the basic problem of current-programming pixel circuits: a long programming period, especially at dark gray levels. The proposed method accelerates the current-programming process at all gray levels and could therefore solve this problem.
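A first-order argument shows why programming is slow at dark gray levels. The small programming current must charge the parasitic capacitance of the data line, so the settling time scales roughly as t ≈ C ΔV / I. The values below are hypothetical and serve only to show this scaling.

```python
def programming_time(capacitance_f, delta_v, current_a):
    """Time to slew a capacitance by delta_v with a constant current: C * dV / I."""
    return capacitance_f * delta_v / current_a


# Hypothetical example values (not from the paper): 10 pF data-line
# capacitance, 1 V swing, 1 uA (bright) versus 10 nA (dark) pixel current.
bright = programming_time(10e-12, 1.0, 1e-6)   # about 1e-5 s (10 us)
dark = programming_time(10e-12, 1.0, 10e-9)    # about 1e-3 s (1 ms)
```

A current one hundred times smaller means a charging time one hundred times longer, which can exceed the row time available within a display frame.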
Pornanong PONGPAIBOOL Toru UNO Takuji ARIMA
A numerical technique based on the Finite Difference Time Domain (FDTD) method is proposed for improving the accuracy of rectangular loop antenna analysis. In this technique, quasi-static field behaviour is incorporated into the FDTD update equations, so that higher accuracy can be obtained without using fine cells. The simulation results of the proposed technique are compared with those of the Method of Moments to confirm its effectiveness.
Kiyohiro FURUTANI Takeshi HAMAMOTO Takeo MIKI Masaya NAKANO Takashi KONO Shigeru KIKUDA Yasuhiro KONISHI Tsutomu YOSHIHARA
This paper describes two circuit techniques useful for the design of high-density, high-speed, low-cost double data rate memories. One is a highly flexible row and column redundancy circuit that allows a row redundancy unit to be divided into multiple column redundancy units for higher flexibility, together with a new test mode circuit that enables the use of finer-pitch laser fuses. The other is a compact read data path that allows smooth data flow without wait time in high-frequency operation, with a small area penalty. These circuit techniques achieved a compact chip size with a cell efficiency of 60.6%, and high-bandwidth 400 MHz operation with CL = 2.5 (CAS latency).
Power distribution in multilayered periodic waveguides is first analyzed by longitudinal modal transmission-line theory (L-MTLT). Novel effective characteristic impedances of the equivalent network for TE and TM modes are then derived, and a symmetrical three-layer grating guide is rigorously evaluated to clarify the validity of our approach. Excellent agreement between our results and those obtained by other methods indicates that our approach can not only reveal the physical meaning embedded in multilayered and multi-sectional periodic waveguides, but also predict the various possible Bragg regimes rigorously and simply.
Noriyuki MAEDA Yoshihisa KISHIYAMA Hiroyuki ATARASHI Mamoru SAWAHASHI
This paper proposes the optimum design for adaptively controlling the spreading factor in Orthogonal Frequency and Code Division Multiplexing (OFCDM) with two-dimensional spreading according to the cell configuration, channel load, and propagation channel conditions, assuming an adaptive modulation and channel coding (AMC) scheme employing QPSK and 16QAM data modulation. Furthermore, we propose a two-dimensional orthogonal channelization code assignment scheme that achieves orthogonal multiplexing of multiple physical channels. We first demonstrate that the proposed two-dimensional orthogonal channelization code assignment reduces inter-code interference. Computer simulation results then show that the optimum time-domain spreading factor is SF_Time = 16, except in extremely high-mobility cases such as a maximum Doppler frequency of f_D = 1500 Hz. In such very fast fading environments, it should be decreased to SF_Time = 8 when 16QAM data modulation is used. We also clarify that when the channel load is light, e.g., C_mux/SF = 0.25 (where C_mux and SF denote the number of multiplexed codes and the total spreading factor, respectively), the required average received signal energy per symbol-to-noise power spectral density ratio (E_s/N_0) decreases as the frequency-domain spreading factor is increased up to around SF_Freq = 32 for both QPSK and 16QAM data modulation. When the channel load is close to full, e.g., C_mux/SF = 0.94, the optimum frequency-domain spreading factor is SF_Freq = 1 for 16QAM data modulation and SF_Freq = 1 to 8 for QPSK data modulation, depending on the delay spread. Consequently, by setting several combinations of time- and frequency-domain spreading factors, near-maximum link capacity is achieved in both cellular and hotspot cell configurations under various channel conditions.
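In two-dimensional spreading, as OFCDM is usually described, the chips of each symbol occupy SF_Time OFDM symbols in time and SF_Freq subcarriers in frequency. The total spreading factor is therefore SF = SF_Time × SF_Freq. A 2-D code can be formed as the product of a time-domain and a frequency-domain orthogonal code. The sketch below uses standard OVSF (Walsh) codes to show that two channels multiplexed on the same chip grid can be separated by correlation. It does not reproduce the paper's specific code assignment scheme, and all function names are hypothetical.

```python
def ovsf_codes(sf):
    """Orthogonal variable spreading factor (Walsh) codes of length sf (a power of 2)."""
    codes = [[1]]
    while len(codes[0]) < sf:
        codes = [half for c in codes for half in (c + c, c + [-x for x in c])]
    return codes


def spread_2d(symbol, time_code, freq_code):
    """Chip grid [t][f] = symbol * time_code[t] * freq_code[f]."""
    return [[symbol * ct * cf for cf in freq_code] for ct in time_code]


def despread_2d(grid, time_code, freq_code):
    """Correlate a received chip grid with one 2-D code and normalize by SF."""
    total = sum(grid[t][f] * time_code[t] * freq_code[f]
                for t in range(len(time_code)) for f in range(len(freq_code)))
    return total / (len(time_code) * len(freq_code))
```

The channel load C_mux/SF is the fraction of the SF orthogonal 2-D codes that are in use. Orthogonality holds exactly in this sketch because the chip grid is noiseless and the channel is flat. In a frequency-selective or fast-fading channel, orthogonality is partially lost, and this loss is the source of inter-code interference.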
Nobuyuki ITOH Ken-ichi HIRASHIKI Tadashi TERADA Makoto KIKUTA Shin-ichiro ISHIZUKA Tsuyoshi KOTO Tsuneo SUZUKI Hidehiko AOKI
An integrated 900-MHz ISM-band transceiver LSI for analog cordless telephones has been realized in a cost-effective process technology with sufficient performance. The LSI consists of a fully integrated transceiver, from the RF LNA to the audio amplifier in the RX chain and from the microphone amplifier to the RF PA in the TX chain, together with integrated RX and TX LOs consisting of PLLs and VCOs. Because of the narrow signal bandwidth of analog modulation, the integrated VCO required extremely low phase noise at small offsets from the carrier. Full-duplex operation also required signal isolation between TX and RX. Despite this high integration and performance, the chip cost had to be minimized for low-cost applications. The 12-dB SINAD RX sensitivity was -111.2 dBm, the TX output power was +3 dBm, and the phase noise of the integrated VCO was -77 dBc/Hz at a 3-kHz offset from the carrier. The current consumption in full-duplex operation was 76 mA from a 3.6-V supply. The chip was realized in a standard 0.8-µm silicon BiCMOS process.
An audio signal level compressor based on an approximation algorithm using an interpolating polynomial is presented. Implementing a compression characteristic in a digital audio system requires raising values to fractional powers, which is difficult to perform directly in digital circuits. We introduce a polynomial expression to approximate the power operation, so that the gain calculation is easily performed with a number of additions, multiplications, and a division. Newton's interpolation formula is used to calculate the compression characteristics in a very short time, and the obtained compression characteristics are very close to the ideal ones.
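For a compressor with ratio R above a threshold T, the ideal static gain is (x/T)^(1/R − 1), which requires a fractional power. The sketch below replaces that power with a Newton interpolating polynomial evaluated in nested form. Each sample then costs one division (x/T) plus a few multiply-adds. The node placement, the polynomial degree, and the function names are assumptions for illustration, not the paper's design.

```python
def divided_differences(xs, ys):
    """Newton divided-difference coefficients for nodes xs and values ys."""
    coef = list(ys)
    n = len(xs)
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    return coef


def newton_eval(coef, xs, x):
    """Evaluate the Newton-form polynomial with nested multiply-adds."""
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[k]) + coef[k]
    return result


def make_gain_fn(ratio, threshold, max_level, n_nodes=5):
    """Approximate the static gain (level / threshold) ** (1/ratio - 1) above the
    threshold by a Newton interpolating polynomial (hypothetical setup)."""
    exponent = 1.0 / ratio - 1.0
    top = max_level / threshold
    xs = [1.0 + (top - 1.0) * k / (n_nodes - 1) for k in range(n_nodes)]
    coef = divided_differences(xs, [x ** exponent for x in xs])

    def gain(level):
        if level <= threshold:
            return 1.0                      # below threshold: unity gain
        u = level / threshold               # the one division per sample
        return newton_eval(coef, xs, u)
    return gain
```

With R = 2, T = 1, and five equally spaced nodes on [1, 2], the polynomial matches the ideal gain at the nodes. At x = 1.1, the interpolation remainder bounds the error below 10^-3.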
Nobukazu TAKAI Shigetaka TAKAGI Nobuo FUJII
This paper proposes a rail-to-rail OTA. A rail-to-rail OTA is obtained by adding a signal decomposing circuit at the input of given OTAs that have a limited input voltage range. Each decomposed input voltage signal is converted to a current signal by an OTA, and the output currents of the OTAs are summed to obtain a linear output signal. Since the input signal is decomposed into small-magnitude voltage signals, the OTAs used for voltage-to-current conversion do not require a wide input range, and any OTA can be used to realize an OTA with a rail-to-rail input voltage range. HSPICE simulations are performed to verify the validity of the proposed method.
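The decomposition idea can be checked numerically with ideal components. The sketch assumes ideal clipping and perfectly linear OTAs within their input range; it is not the paper's transistor-level circuit. The input range [v_min, v_max] is split into n segments of width Δ. Each OTA sees only its own piece, offset by Δ/2 so that it stays within ±Δ/2. Because the pieces sum to v − v_min, the summed current is gm (v − (v_min + v_max)/2), which is linear over the full range.

```python
def decompose(v, v_min, v_max, n_segments):
    """Split v into n_segments pieces, each confined to [0, delta]."""
    delta = (v_max - v_min) / n_segments
    return [min(max(v - (v_min + k * delta), 0.0), delta) for k in range(n_segments)]


def rail_to_rail_output(v, v_min, v_max, n_segments, gm):
    """Sum of ideal small-range OTA currents gm * (piece - delta / 2)."""
    delta = (v_max - v_min) / n_segments
    return sum(gm * (piece - delta / 2.0) for piece in decompose(v, v_min, v_max, n_segments))
```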
Hyong Rock PARK Dongwoo KIM Een-Kee HONG
Video telephone service (VTS) is considered one of the promising services to be provided in wideband CDMA (WCDMA) networks. Without a dedicated call admission policy, VTS calls are expected to suffer a relatively high blocking probability, since they normally have more stringent signal quality requirements than ordinary voice calls. In this letter, we consider a prioritized call admission design that reduces the blocking probability of VTS calls, which may encourage users to use the newly provided VTS more comfortably. VTS calls are given priority by reserving a number of channel-processing units. With this reservation, the blocking probability of prioritized VTS calls can be noticeably reduced, although that of ordinary calls increases instead. This letter provides a system model that evaluates the blocking probabilities of VTS and ordinary calls simultaneously, and numerically examines an adequate level of prioritization for VTS calls. The results show that the prioritization level should be selected depending on the received interference as well as the bandwidth required for VTS.
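The reservation trade-off can be illustrated with a classic guard-channel birth-death model. This simplified sketch is not the letter's system model. It assumes Poisson arrivals, a single pool of channel-processing units, one unit per call for both call types, and a common exponential holding time. It also ignores the interference and bandwidth effects that the letter accounts for.

```python
def guard_channel_blocking(capacity, reserved, lam_vts, lam_ord, mu):
    """Blocking probabilities (VTS, ordinary) for a guard-channel birth-death model.

    Ordinary calls are admitted only while fewer than capacity - reserved units
    are busy; VTS calls are admitted while any unit is free.  Both call types
    hold one unit for an exponential time with rate mu (simplifying assumption).
    """
    probs = [1.0]
    for k in range(capacity):
        arrival = lam_vts + lam_ord if k < capacity - reserved else lam_vts
        probs.append(probs[-1] * arrival / ((k + 1) * mu))
    total = sum(probs)
    probs = [p / total for p in probs]
    block_vts = probs[capacity]
    block_ord = sum(probs[capacity - reserved:])
    return block_vts, block_ord
```

Take two units, arrival rates of 0.5 for each call type, and a unit service rate. With no reservation, both call types see the Erlang B blocking probability of 0.2. Reserving one unit for VTS lowers VTS blocking to 1/9, while ordinary-call blocking rises to 5/9.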
This paper describes a second-order continuous-time ΔΣ modulator for a W-CDMA receiver, which operates at a supply voltage of 0.9 V, the lowest so far reported for W-CDMA. Inverter-based balanced OTAs that do not use differential pairs are proposed for low-voltage operation. Circuit parameters are optimized by system simulations. The modulator was implemented in a 0.13-µm CMOS technology and consumes only 1.5 mW. The measured SNDR is 50.9 dB over a bandwidth of 1.92 MHz.
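The noise-shaping behaviour of a second-order loop can be illustrated with a discrete-time behavioural model. The paper's modulator is continuous-time, so this sketch shows only the generic principle. Two integrators and a 1-bit quantizer are arranged so that the signal transfer function is a one-sample delay and the quantization noise is shaped by (1 − z^-1)^2. The helper `enob` converts an SNDR to an effective resolution using the standard full-scale sine formula.

```python
def second_order_dsm(samples):
    """Discrete-time 2nd-order delta-sigma loop with a 1-bit quantizer.

    Difference equations chosen so that STF = z^-1 and NTF = (1 - z^-1)^2.
    """
    x1 = x2 = 0.0
    bits = []
    for u in samples:
        y = 1.0 if x2 >= 0.0 else -1.0   # 1-bit quantizer
        x1 += u - y                       # first integrator
        x2 += x1 - y                      # second integrator
        bits.append(y)
    return bits


def enob(sndr_db):
    """Effective number of bits from SNDR: (SNDR - 1.76) / 6.02."""
    return (sndr_db - 1.76) / 6.02
```

The measured 50.9 dB SNDR corresponds to about 8.2 effective bits. For a DC input of 0.3, the average of the output bit stream tracks the input closely, because the first integrator state stays bounded.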
Jeong-Min JU Gyey-Teak JEONG Joong-Han YOON Cheol-Soon KIM Hyung-Sup KIM Kyung-Sup KWAK
In this study, a multiple U-shaped slot microstrip patch antenna for the 5 GHz band is designed and fabricated. To obtain sufficient bandwidth in the operating band, a Styrofoam layer is inserted between the substrate and the ground plane, the antenna is fed by a coaxial probe, and the probe position is shifted from the center to the left. The measured bandwidth of the fabricated antenna (5.02-5.955 GHz) satisfies VSWR < 2.0 over the 5 GHz bands (5.15-5.35 GHz, 5.47-5.725 GHz, 5.725-5.825 GHz), and the antenna shows a gain of 3.88-9.28 dBi and a broad radiation pattern.
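The VSWR < 2.0 criterion corresponds to a reflection coefficient magnitude below 1/3, or equivalently a return loss above about 9.5 dB. The small helpers below, whose names are illustrative, compute these quantities. They also check that the measured 5.02-5.955 GHz band covers the three target bands. The fractional bandwidth works out to roughly 17%.

```python
import math


def vswr(gamma_mag):
    """VSWR from the magnitude of the reflection coefficient."""
    return (1.0 + gamma_mag) / (1.0 - gamma_mag)


def return_loss_db(gamma_mag):
    """Return loss in dB (positive number) for a reflection magnitude below 1."""
    return -20.0 * math.log10(gamma_mag)


def fractional_bandwidth(f_low, f_high):
    """Bandwidth divided by the centre frequency."""
    return (f_high - f_low) / ((f_high + f_low) / 2.0)


def covers(measured, bands):
    """True if every (low, high) band lies inside the measured (low, high) range."""
    lo, hi = measured
    return all(lo <= b_lo and b_hi <= hi for b_lo, b_hi in bands)
```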
The facility layout problem is one of the most fundamental quadratic assignment problems in operations research. In this paper, we present an improved genetic algorithm for solving the facility layout problem. In our computational model, we propose several improvements to the basic genetic procedures, including conditional crossover and mutation. The performance of the proposed method is evaluated on benchmark problems. Computational results show that the improved genetic algorithm is capable of producing high-quality solutions.
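A baseline permutation-coded genetic algorithm for the quadratic assignment form of the facility layout problem is sketched below. It uses generic operators: binary tournament selection, an order-based crossover, swap mutation, and elitism. The abstract does not specify the paper's conditional crossover and mutation, so they are not reproduced here, and all names are hypothetical. A brute-force solver is included only for checking tiny instances.

```python
import itertools
import random


def qap_cost(perm, flow, dist):
    """Layout cost when facility i is placed at location perm[i]."""
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]] for i in range(n) for j in range(n))


def order_crossover(p1, p2, rng):
    """Keep a random slice of p1 and fill the other positions in p2's order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    kept = set(p1[a:b + 1])
    fill = iter([g for g in p2 if g not in kept])
    return [p1[i] if a <= i <= b else next(fill) for i in range(n)]


def swap_mutation(perm, rate, rng):
    """With probability `rate`, exchange the locations of two facilities."""
    perm = perm[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(perm)), 2)
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def genetic_qap(flow, dist, pop_size=30, generations=100, mutation_rate=0.2, seed=0):
    """Baseline GA with binary tournament selection and elitism."""
    rng = random.Random(seed)
    n = len(flow)

    def cost(p):
        return qap_cost(p, flow, dist)

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if cost(a) <= cost(b) else b

    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    best = min(pop, key=cost)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best layout so far survives
        while len(children) < pop_size:
            child = order_crossover(tournament(), tournament(), rng)
            children.append(swap_mutation(child, mutation_rate, rng))
        pop = children
        best = min(pop, key=cost)
    return best


def brute_force_qap(flow, dist):
    """Exhaustive optimum; only feasible for very small instances."""
    n = len(flow)
    return min((list(p) for p in itertools.permutations(range(n))),
               key=lambda p: qap_cost(p, flow, dist))
```

Elitism guarantees that the best cost never worsens from one generation to the next. Selection pressure comes entirely from the binary tournaments.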