In this paper, we explore an approach to the problem of spoken document categorization, the task of automatically assigning spoken documents to a set of predetermined categories. To categorize spoken documents, subword unit representations are used as an alternative to word units generated by either keyword spotting or large vocabulary continuous speech recognition (LVCSR). An advantage of using subword acoustic unit representations for spoken document categorization is that they require no prior knowledge about the contents of the spoken documents and address the out-of-vocabulary (OOV) problem. Moreover, this method relies on the sounds of speech rather than exact orthography. The use of subword units instead of words allows approximate matching on inaccurate transcriptions, making "sounds-like" spoken document categorization possible. We also explore the performance of our method when the training set contains both perfect and errorful phonetic transcriptions, in the hope that the classifiers can learn from the confusion characteristics of the recognizer and the pronunciation variants of words to improve the robustness of the whole system. Our experiments on both artificially and genuinely corrupted data sets show that the proposed method is more effective and robust than the word-based method.
Ching-Tang HSIEH Eugene LAI Wan-Chen CHEN
This paper presents some effective methods for improving the performance of a speaker identification system. Based on the multiresolution property of the wavelet transform, the input speech signal is decomposed into various frequency subbands in order not to spread noise distortions over the entire feature space. For capturing the characteristics of the vocal tract, the linear predictive cepstral coefficients (LPCC) of the lower frequency subband for each decomposition process are calculated. In addition, a hard threshold technique for the lower frequency subband in each decomposition process is also applied to eliminate the effect of noise interference. Furthermore, cepstral-domain feature vector normalization is applied to all computed features in order to provide similar parameter statistics in all acoustic environments. In order to effectively utilize all of these multiband speech features, we propose a modified vector quantization as the identifier. This model uses the multilayer concept to eliminate the interference among the multiband speech features and then uses the principal component analysis (PCA) method to evaluate the codebooks for capturing a more detailed distribution of the speaker's phoneme characteristics. The proposed method is evaluated using the KING speech database for text-independent speaker identification. Experimental results show that the recognition performance of the proposed method is better than those of vector quantization (VQ) and the Gaussian mixture model (GMM) using full-band LPCC and mel-frequency cepstral coefficient (MFCC) features in both clean and noisy environments. Satisfactory performance can also be achieved in low-SNR environments.
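The subband-plus-LPCC front end described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it uses a one-level Haar split as a stand-in for the multiresolution decomposition, and the standard autocorrelation/Levinson-Durbin route from a subband signal to cepstral coefficients.

```python
import numpy as np

def haar_split(x):
    """One level of Haar analysis: return (lower, higher) frequency subbands."""
    x = x[: len(x) - len(x) % 2]            # force even length
    low = (x[0::2] + x[1::2]) / np.sqrt(2)  # approximation (lower subband)
    high = (x[0::2] - x[1::2]) / np.sqrt(2) # detail (higher subband)
    return low, high

def lpc(x, order):
    """LPC coefficients a[1..p] (x[n] ~ sum_k a[k] x[n-k]) via the
    autocorrelation method and the Levinson-Durbin recursion."""
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order)
    err = r[0]
    for i in range(order):
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err  # reflection coefficient
        a_new = a.copy()
        if i > 0:
            a_new[:i] = a[:i] - k * a[i - 1 :: -1]
        a_new[i] = k
        a = a_new
        err *= 1.0 - k * k
    return a

def lpcc(a, n_cep):
    """Convert LPC coefficients to cepstral coefficients (LPCC)
    with the standard recursion c_n = a_n + sum_k (k/n) c_k a_{n-k}."""
    c = np.zeros(n_cep)
    for n in range(1, n_cep + 1):
        acc = a[n - 1] if n <= len(a) else 0.0
        for k in range(1, n):
            if n - k <= len(a):
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c
```

In the paper's scheme the split is applied recursively and the LPCC are taken from the lower subband at each level; one level suffices to show the mechanics.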
Hiroyuki KANEKO Koichi FUKUDA Akira KAWANAKA
Efficient representations of a 3-D object shape and its texture data have attracted wide attention for the transmission of computer graphics data and for the development of multi-view real-image rendering systems on computer networks. Polygonal mesh data, which consist of connectivity information, geometry data, and texture data, are often used to represent 3-D objects in many applications. This paper presents a wavelet coding technique for the geometry data structured on a triangular lattice plane obtained by structuring the connectivity of the polygonal mesh data. Since the structured geometry data have an arbitrarily shaped support on the triangular lattice plane, a shape-adaptive wavelet transform is used to obtain the wavelet coefficients, whose number is identical to the number of original data, while preserving the self-similarity of the wavelet coefficients across subbands. In addition, the wavelet coding technique extends zerotree entropy (ZTE) coding to take into account the rate-distortion properties of the structured geometry data: the parent-children dependencies are defined as the sets of wavelet coefficients from different bands that represent the same spatial region in the triangular lattice plane, and the wavelet coefficients in the spatial tree are optimally pruned based on the rate-distortion properties of the geometry data. Experiments in which the proposed wavelet coding was applied to several sets of polygonal mesh data showed that it achieves better coding efficiency than the Topologically Assisted Geometry Compression scheme adopted in the MPEG-4 standard.
The present state of IEC and JIS standards on measurement methods for low-loss dielectric and high-temperature superconductor (HTS) materials in the microwave and millimeter-wave range is reviewed. Specifically, four resonance methods are discussed: a two-dielectric-resonator method for dielectric rod measurements, a two-sapphire-resonator method for HTS film measurements, a cavity resonator method for microwave measurements of dielectric plates, and a cutoff circular waveguide method for millimeter-wave measurements of dielectric plates. These methods achieve accuracy sufficient for measuring the temperature dependence of material properties.
The concepts of M-convexity and L-convexity, introduced by Murota (1996, 1998) for functions on the integer lattice, extract combinatorial structures from well-solved nonlinear combinatorial optimization problems. These concepts were extended to polyhedral convex functions and quadratic functions on the real space by Murota and Shioura (2000, 2001). In this paper, we consider a further extension to general convex functions. The main aim of this paper is to provide rigorous proofs of fundamental properties of general M-convex and L-convex functions.
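For readers unfamiliar with these concepts, the original discrete definitions due to Murota, which the paper generalizes to the real space, can be stated as follows (our paraphrase, not the paper's notation):

```latex
% M-convexity (exchange axiom): f : \mathbb{Z}^n \to \mathbb{R} \cup \{+\infty\}
% is M-convex if for all x, y \in \mathrm{dom}\, f and every index i with
% x_i > y_i there exists an index j with x_j < y_j such that
f(x) + f(y) \;\ge\; f(x - e_i + e_j) + f(y + e_i - e_j).
% L-convexity: f is L-convex if it is submodular,
f(p) + f(q) \;\ge\; f(p \vee q) + f(p \wedge q),
% and there exists r \in \mathbb{R} such that
f(p + \mathbf{1}) = f(p) + r \quad \text{for all } p \in \mathrm{dom}\, f.
```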
Stergios STERGIOU Dimitris VOUDOURIS George PAPAKONSTANTINOU
In this work, a novel Multiple-Valued Exclusive-Or Sum Of Products (MVESOP) minimization formulation is analyzed, and an algorithm is presented that detects minimum MVESOP expressions when the weight of the function is less than eight. A heuristic MVESOP algorithm based on a novel cube transformation operation is then presented. Experimental results on MCNC benchmarks and randomly generated functions indicate that the algorithm matches or outperforms state-of-the-art ESOP minimizers in quality.
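An ESOP expression is an exclusive-or of product terms (cubes). As a minimal two-valued illustration of the representation being minimized (the paper's setting is multiple-valued, and the cube encoding below is our own, not the paper's):

```python
def eval_esop(cubes, assignment):
    """Evaluate an ESOP: XOR over product terms (cubes).
    Each cube maps a variable index to its required value (0 or 1);
    variables absent from the cube are don't-cares."""
    result = 0
    for cube in cubes:
        term = all(assignment[v] == bit for v, bit in cube.items())
        result ^= int(term)  # exclusive-or accumulation, not inclusive-or
    return result

# f(x0, x1, x2) = (x0 AND x1) XOR (NOT x2), written as two cubes:
f = [{0: 1, 1: 1}, {2: 0}]
```

Minimization seeks the expression with the fewest such cubes; the "weight" of a function counts its minterms, which bounds the search the exact algorithm performs.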
Osamu ICHIKAWA Tetsuya TAKIGUCHI Masafumi NISHIMURA
In a two-microphone approach, interchannel differences in time (ICTD) and interchannel differences in sound level (ICLD) have generally been used for sound source localization. However, those cues are not effective for vertical localization in the median plane (directly in front). For that purpose, spectral cues based on features of head-related transfer functions (HRTF) have been investigated, but they are not robust enough against signal variations and environmental noise. In this paper, we use a "profile" as a cue while using a combination of reflectors specially designed for vertical localization. The observed sound is converted into a profile containing information about reflections as well as ICTD and ICLD data. The observed profile is decomposed into signal and noise by using template profiles associated with sound source locations. The template minimizing the residual of the decomposition gives the estimated sound source location. Experiments show that this method can correctly provide a rough estimate of the vertical location even in a noisy environment.
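The template-matching step can be sketched as a least-squares decomposition: each candidate location's template profile, plus a flat noise term, is fit to the observed profile, and the location whose fit leaves the smallest residual wins. This is a hypothetical rendering of the idea, not the authors' exact decomposition:

```python
import numpy as np

def localize(observed, templates):
    """Return the location whose template best explains the observation.
    Each template is fit by least squares with a gain and a flat noise
    floor; the candidate with the smallest residual is selected."""
    best_loc, best_res = None, np.inf
    for loc, tpl in templates.items():
        A = np.column_stack([tpl, np.ones_like(tpl)])   # [signal, noise floor]
        coef, *_ = np.linalg.lstsq(A, observed, rcond=None)
        residual = np.linalg.norm(observed - A @ coef)  # unexplained energy
        if residual < best_res:
            best_loc, best_res = loc, residual
    return best_loc
```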
Hidenori KUWAKADO Hatsukazu TANAKA
We propose a method for reducing the size of a share in visual secret sharing schemes. The proposed method causes neither leakage nor loss of the original image. The quality of the recovered image is almost the same as that of previous schemes.
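For context, the classic 2-out-of-2 visual secret sharing construction that such schemes shrink works by subpixel expansion: each secret pixel becomes two subpixels per share, and physically stacking the transparencies (a pixelwise OR) reveals the image. A minimal sketch of that baseline follows; the share-size reduction itself is the paper's contribution and is not reproduced here.

```python
import random

def make_shares(secret, seed=0):
    """Classic 2-out-of-2 visual secret sharing with 1x2 subpixel expansion.
    secret: rows of 0 (white) / 1 (black) pixels. Each share alone is a
    uniformly random pattern and reveals nothing about the secret."""
    rng = random.Random(seed)
    s1, s2 = [], []
    for row in secret:
        r1, r2 = [], []
        for px in row:
            pattern = rng.choice([(0, 1), (1, 0)])
            r1 += pattern
            # white pixel: identical patterns  -> stack is half black
            # black pixel: complementary ones  -> stack is fully black
            r2 += pattern if px == 0 else tuple(1 - b for b in pattern)
        s1.append(r1)
        s2.append(r2)
    return s1, s2

def stack(s1, s2):
    """Overlaying the two transparencies = pixelwise OR."""
    return [[a | b for a, b in zip(r1, r2)] for r1, r2 in zip(s1, s2)]
```

The cost the paper attacks is visible here: every share is twice as wide as the secret.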
Toshiki ENDO Shingo KUROIWA Satoshi NAKAMURA
This paper addresses problems involved in performing speech recognition over mobile and IP networks. The main problem is speech data loss caused by packet loss in the network. We present two missing-feature-based approaches that recover lost regions of speech data. These approaches are based on the reconstruction of missing frames or on marginal distributions. For comparison, we also use a packing method, which skips lost data. We evaluate these approaches with packet loss models, i.e., random loss and Gilbert loss models. The results show that the marginal-distribution-based technique is the most effective in a packet loss environment; the degradation of word accuracy is only 5% when the packet loss rate is 30%, and only 3% when the mean burst loss length is 24 frames, in the case of the DSR front-end. The simple data imputation method is also effective in the case of clean speech.
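The Gilbert loss model used in the evaluation is a two-state Markov chain whose Bad state drops packets, producing the bursty losses typical of real networks. A minimal simulation sketch (parameter names are ours):

```python
import random

def gilbert_loss(n_packets, p_gb, p_bg, seed=1):
    """Two-state Gilbert model: Good -> Bad with probability p_gb,
    Bad -> Good with probability p_bg; packets sent while in the Bad
    state are lost. Returns a list of booleans (True = lost).
    Mean burst loss length is 1 / p_bg; the stationary loss rate
    is p_gb / (p_gb + p_bg)."""
    rng = random.Random(seed)
    lost, bad = [], False
    for _ in range(n_packets):
        # advance the channel state, then record this packet's fate
        bad = (rng.random() < p_gb) if not bad else (rng.random() >= p_bg)
        lost.append(bad)
    return lost
```

Unlike independent random loss, consecutive losses cluster, which is what stresses frame-reconstruction approaches.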
Takashi IWASAKI Makoto TAKASHIMA
A novel method for measuring microwave reflection coefficients without the open and load standards is proposed. In this method, a single probe is inserted into an air line, and the output wave is detected by a vector detector. Offset shorts are used for the calibration. The measurement system is constructed using a 7 mm coaxial line and APC7 connectors. Measurement results in the frequency range of 1-9 GHz demonstrate the feasibility of the proposed method. All the major systematic errors can be estimated from easily obtainable data.
Hisashi FUTAKI Tomoaki OHTSUKI
Recently, low-density parity-check (LDPC) codes have attracted much attention. LDPC codes can achieve near-Shannon-limit performance like turbo codes. For LDPC codes, reduced-complexity decoding algorithms referred to as the uniformly most powerful (UMP) BP-based and normalized BP-based algorithms were proposed for BPSK on an additive white Gaussian noise (AWGN) channel. The conventional BP and BP-based algorithms can be applied to BPSK modulation. For high bit-rate transmission, multilevel modulation is preferred, and a BP algorithm for multilevel modulations has previously been proposed. In this paper, we propose a reduced-complexity BP algorithm for multilevel modulations, in which the first likelihood of the BP algorithm is modified to suit multilevel modulations. We compare the error rate performance of the proposed algorithm with that of the conventional algorithm on AWGN and flat Rayleigh fading channels. We also propose UMP BP-based and normalized BP-based algorithms for multilevel modulations on AWGN and flat Rayleigh fading channels. We show that the error rate performance of the proposed BP algorithm is almost identical to that of the previously proposed algorithm, while its decoding complexity is lower. We also show that the proposed BP-based algorithms achieve a good trade-off between complexity and error rate performance.
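The UMP BP-based (min-sum) simplification that these reduced-complexity decoders share replaces the hyperbolic-tangent check-node rule with signs and minima. A sketch for a single check node follows; the multilevel-modulation likelihood adaptation proposed in the paper is not shown.

```python
import numpy as np

def min_sum_check_update(llrs):
    """Min-sum (UMP BP-based) check-node update. For each edge, the
    outgoing message's sign is the product of the other incoming signs,
    and its magnitude is the minimum of the other incoming magnitudes.
    Assumes all incoming LLRs are nonzero."""
    llrs = np.asarray(llrs, dtype=float)
    signs = np.sign(llrs)
    total_sign = np.prod(signs)          # sign product over all edges
    mags = np.abs(llrs)
    order = np.argsort(mags)
    m1, m2 = mags[order[0]], mags[order[1]]  # two smallest magnitudes
    # the edge holding the global minimum receives the second minimum
    out_mag = np.where(np.arange(len(llrs)) == order[0], m2, m1)
    # dividing out each edge's own sign: total_sign * sign_i
    return total_sign * signs * out_mag
```

Only comparisons and sign flips are needed, which is the source of the complexity reduction over the exact BP update; the normalized variant additionally scales the output magnitudes by a constant factor.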
Jau-Yang CHANG Hsing-Lung CHEN
Future mobile communication systems are expected to support multimedia applications (audio phone, video on demand, video conferencing, file transfer, etc.). Multimedia applications place great demands on bandwidth and impose stringent quality-of-service requirements on mobile wireless networks. In order to provide mobile hosts with a high quality of service in next-generation mobile multimedia wireless networks, better and more efficient bandwidth reservation schemes must be developed. A novel traffic-based bandwidth reservation scheme is proposed in this paper as a solution for supporting quality-of-service guarantees in mobile multimedia wireless networks. Based on the existing network conditions, the proposed scheme makes adaptive decisions for bandwidth reservation and call admission by employing a fuzzy inference mechanism, a timing-based reservation strategy, and a round-borrowing strategy in each base station. The amount of reserved bandwidth for each base station is dynamically adjusted according to the on-line traffic information of each base station. This dynamically adaptive approach reduces the connection-blocking and connection-dropping probabilities while increasing bandwidth utilization for quality-of-service-sensitive mobile multimedia wireless networks. Simulation results show that our traffic-based bandwidth reservation scheme outperforms previously known schemes in terms of connection-blocking probability, connection-dropping probability, and bandwidth utilization.
Hideyuki TORII Makoto NAKAMURA
In the present paper, we evaluate the inter-cell interference of AS-CDMA systems. First, the cross-correlation property of AS-CDMA systems is examined theoretically in order to clarify the fundamental features of the inter-cell interference. The result shows that the influence of one interfering terminal in each adjacent cell is dominant regardless of whether approximate synchronization is maintained. Next, the ratio of interference signal power to desired signal power is evaluated by computer simulation. The simulation result shows that the total interference power does not increase even when approximate synchronization is not maintained.
Yoko UWATE Yoshifumi NISHIO Tetsushi UETA Tohru KAWABE Tohru IKEGUCHI
In this paper, the performance of chaos and burst noises injected into the Hopfield neural network for quadratic assignment problems is investigated. For the evaluation of the noises, two methods that reward finding many nearly optimal solutions are proposed. Computer simulations confirm that the burst noise generated by the Gilbert model, with a laminar part and a burst part, achieves performance as good as that of the intermittency chaos noise near the three-periodic window.
James OKELLO Kenji UEDA Hiroshi OCHI
In this letter, we verify that a blind adaptive algorithm operating at a low intermediate frequency (Low-IF) can be applied to a system where carrier phase synchronization has not been achieved. We consider a quadrature phase-shift keyed (QPSK) signal as the transmitted signal and assume that the orthogonal low intermediate sinusoidal frequency used to generate the transmitted signal is well known. The proposed algorithm combines two algorithms: the least mean square (LMS) algorithm, whose cost function has a unique minimum, and the constant modulus algorithm (CMA), which was first proposed by Godard. By doing this and operating the equalizer at a rate greater than the symbol rate, we take advantage of the variable amplitude of the sub-carriers and the fast convergence of the LMS algorithm to achieve a faster convergence speed. Computer simulations comparing the proposed algorithm with the CMA and the modified CMA (MCMA) show that the proposed algorithm exhibits a faster convergence speed.
Yasunari OBUCHI Nobuo HATAOKA Richard M. STERN
In this paper, we describe a new framework of feature compensation for robust speech recognition, which is especially suitable for small devices. We introduce Delta-cepstrum Normalization (DCN), which normalizes not only the cepstral coefficients but also their time-derivatives. Cepstral Mean Normalization (CMN) and Mean and Variance Normalization (MVN) are fast and efficient algorithms for environmental adaptation and have been widely used. In those algorithms, normalization is applied to the cepstral coefficients to remove irrelevant information from them, but no such normalization is applied to the time-derivative parameters, because it would not remove enough of the irrelevant information. Histogram Equalization (HEQ), in contrast, provides better compensation and can be applied even to the delta and delta-delta cepstra. We investigate various implementations of DCN and show that the best performance is achieved when the normalization of the cepstra and the delta cepstra are mutually interdependent. We evaluate the performance of DCN using speech data recorded with a PDA. DCN provides significant improvements over HEQ: it gives a 15% relative word error rate reduction. We also examine the possibility of combining Vector Taylor Series (VTS) and DCN. Although some combinations do not improve on VTS, the best combination gives better performance than VTS alone. Finally, the advantage of DCN in terms of computation speed is also discussed.
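The normalization chain can be sketched as follows, assuming plain mean-and-variance normalization and a standard regression delta; `dcn_like` is our hypothetical rendering of the idea of also normalizing the time-derivatives, not the authors' exact algorithm.

```python
import numpy as np

def mvn(features):
    """Mean and variance normalization along the time axis (frames x dims)."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0) + 1e-8   # guard against zero variance
    return (features - mu) / sigma

def delta(features, width=2):
    """Standard regression-based delta (time-derivative) features."""
    T = len(features)
    pad = np.pad(features, ((width, width), (0, 0)), mode="edge")
    num = sum(k * (pad[width + k : T + width + k] -
                   pad[width - k : T + width - k])
              for k in range(1, width + 1))
    den = 2 * sum(k * k for k in range(1, width + 1))
    return num / den

def dcn_like(cepstra):
    """Normalize the cepstra, then also normalize their deltas."""
    c = mvn(cepstra)
    return c, mvn(delta(c))
```

CMN corresponds to subtracting only the mean; MVN also scales by the standard deviation; the step beyond both, in the spirit of DCN, is applying the same treatment to the delta stream instead of leaving it raw.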
Takashi KURAFUJI Yasunobu NAKASE Hidehiro TAKATA Yukinaga IMAMURA Rei AKIYAMA Tadao YAMANAKA Atsushi IWABU Shutarou YASUDA Toshitsugu MIWA Yasuhiro NUNOMURA Niichi ITOH Tetsuya KAGEMOTO Nobuharu YOSHIOKA Takeshi SHIBAGAKI Hiroyuki KONDO Masayuki KOYAMA Takahiko ARAKAWA Shuhei IWADE
We apply a selective-sets resizable cache and a complete-hierarchy SRAM to a high-performance, low-power RISC CPU core. The selective-sets resizable cache can change the cache memory size by varying the number of cache sets. It reduces the leakage current by 23% with only a slight degradation of the worst-case operating speed, from 213 MHz to 210 MHz. The complete-hierarchy SRAM enables partial-swing operation not only on the bit lines but also on the global signal lines. It reduces the current consumption of the memory by 4.6% and attains a high-speed access time of 1.4 ns in the typical case.
Making information technology (IT) more accessible to elderly users is an important objective, particularly with regard to input devices. In this study, we investigated how aging and the letter (character) size of a keyboard affect data-entry efficiency. In addition, the computer experience of the elderly was examined relative to efficiency. The performance measures (entry speed and number of characters correctly entered per minute) were twice as good in a young group of computer users as in the middle-aged and elderly groups. An effect of keyboard letter size on performance was observed for the middle-aged and elderly participants who had no experience using a computer. The young, middle-aged, and elderly groups with computer experience were not affected by the size of the keyboard letters.
Christopher J. HOGGER Frank R. KRIWACZEK
We describe a framework for deriving specifications of wizard-like tools by detecting coherent patterns of behaviour among user actions observed in a portal environment. Implementation in the portal of tools compliant with these specifications can then provide useful support for the kind of work patterns observed. The derivation process employs a customizable knowledge base which defines coherent patterns and seeks concrete instances of them among series of actions that occur with sufficient frequency among those observed.
Kwang-deok SEO Seong-cheol HEO Soon-kak KWON Jae-kyoon KIM
In this paper, we propose a dynamic bit-rate reduction scheme for transcoding an MPEG-1 bitstream into an MPEG-4 simple-profile bitstream with a typical bit-rate of 384 kbps. For dynamic bit-rate reduction, a significant reduction in the bit-rate is achieved by combining requantization and frame-skipping. Conventional requantization methods for a homogeneous transcoder cannot be used directly in a heterogeneous transcoder, owing to the mismatch in the quantization parameters between the MPEG-1 and MPEG-4 syntax and the difference in compression efficiency between MPEG-1 and MPEG-4. To solve these problems, a new requantization method is proposed for an MPEG-1 to MPEG-4 transcoder, consisting of R-Q (rate-quantization) modeling with simple feedback and an adjustment of the quantization parameters to compensate for the different coding efficiencies of MPEG-1 and MPEG-4. For bit-rate reduction by frame-skipping, an efficient method is proposed for estimating the relevant motion vectors from the skipped frames; the conventional FDVS (forward dominant vector selection) method is improved to reflect the macroblock types in the skipped frames. Simulation results demonstrate that the proposed method combining requantization and frame-skipping generates a transcoded MPEG-4 bitstream that is much closer to the desired low bit-rate than that of the conventional method, with superior objective quality.