Junibakti SANUBARI Keiichi TOKUDA Mahoki ONODA
In this paper, a new M-estimation technique for the linear prediction analysis of speech is proposed. In the conventional linear prediction (CLP) method, the estimates are strongly affected by the large-amplitude parts of the residual. The proposed method therefore uses a loss function that assigns a large weighting factor to small-amplitude residuals and a small weighting factor to large-amplitude residuals, such as those caused by pitch excitations. The loss function is based on the assumption that the residual signal is independent and identically distributed according to a t-distribution t(α) with α degrees of freedom. The efficiency of this new estimator depends on α. When α = ∞, we get the CLP method. When the proposed method with small α is applied to estimating the formant frequencies and bandwidths of synthetic speech by finding the roots of the prediction polynomial, the estimates are more accurate and have a smaller standard deviation (SD) than those obtained with large α. When the signal is very spiky, the proposed method achieves more efficient and accurate estimates than the robust linear prediction (RBLP) method. The loss function is modified in a manner similar to the autocorrelation method. The solution is calculated by the Newton-Raphson iteration technique. The simulation results show that only a few iterations are needed to reach a stationary point, that the stationary point is always a local minimum, and that the obtained prediction filter is always minimum phase. Preliminary experiments on human speech data indicate that the results are insensitive to the placement of the analysis window and that a higher spectral resolution than with the CLP and RBLP methods can be achieved.
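As a rough illustration of why a t-distribution assumption down-weights large residuals, the negative log-likelihood of a scaled t-distribution can be written out as below. The scale parameter σ and the symbols ρ, ψ, and w are assumptions introduced here for illustration; the exact form used in the paper may differ.

```latex
% Loss for a residual e under t(\alpha) with an assumed scale \sigma
\rho(e) = \frac{\alpha + 1}{2} \,\ln\!\left(1 + \frac{e^{2}}{\alpha\,\sigma^{2}}\right)

% Influence function and the implied weighting factor
\psi(e) = \frac{d\rho}{de} = \frac{(\alpha + 1)\,e}{\alpha\,\sigma^{2} + e^{2}},
\qquad
w(e) = \frac{\psi(e)}{e} = \frac{\alpha + 1}{\alpha\,\sigma^{2} + e^{2}}

% Limit of infinitely many degrees of freedom: the squared-error loss
\lim_{\alpha \to \infty} \rho(e) = \frac{e^{2}}{2\,\sigma^{2}},
\qquad
\lim_{\alpha \to \infty} w(e) = \frac{1}{\sigma^{2}}
```

For finite α the weight w(e) falls as |e| grows, so large residuals contribute less. In the limit of infinite α the weight is constant and the loss reduces to the least-squares criterion.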
Heung-Shik KIM Jong-Soo PARK Myunghwan KIM
An algorithm is presented for selecting the k-th smallest element of a totally ordered (but not sorted) set of n elements, 1 ≤ k ≤ n, in the case that a special-purpose sorter is used as a coprocessor. When a pipeline merge sorter is used as the special-purpose sorter, we analyze the comparison complexity of the algorithm for a given capacity of the sorter. The comparison complexity of the algorithm is 1.4167n + o(n), provided that the capacity of the sorter is 256 elements. The comparison complexity of the algorithm decreases as the capacity of the sorter increases.
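The abstract does not describe the algorithm itself, so the following is only a generic sketch of selection with a fixed-capacity sorter, using a median-of-medians style pivot. It is not the authors' method and does not reproduce the 1.4167n + o(n) bound. The names `sorter` and `select_kth` are hypothetical, and the hardware sorter is simulated with Python's built-in `sorted`.

```python
def sorter(block, capacity):
    """Stand-in for the special-purpose sorter (assumption: simulated in software)."""
    if len(block) > capacity:
        raise ValueError("block exceeds sorter capacity")
    return sorted(block)


def select_kth(items, k, capacity=256):
    """Return the k-th smallest element (k is 1-indexed) of items.

    Only blocks of at most `capacity` elements are ever handed to the sorter.
    """
    items = list(items)
    if not 1 <= k <= len(items):
        raise IndexError("k must satisfy 1 <= k <= n")
    if capacity < 5:
        raise ValueError("this sketch assumes a capacity of at least 5")

    while True:
        # Small enough to sort in one pass through the sorter.
        if len(items) <= capacity:
            return sorter(items, capacity)[k - 1]

        # Sort each block with the sorter and collect the block medians.
        medians = []
        for start in range(0, len(items), capacity):
            block = sorter(items[start:start + capacity], capacity)
            medians.append(block[len(block) // 2])

        # Use the median of the block medians as the pivot.
        pivot = select_kth(medians, (len(medians) + 1) // 2, capacity)

        smaller = [x for x in items if x < pivot]
        equal_count = sum(1 for x in items if x == pivot)

        if k <= len(smaller):
            items = smaller
        elif k <= len(smaller) + equal_count:
            return pivot
        else:
            k -= len(smaller) + equal_count
            items = [x for x in items if x > pivot]
```

For example, `select_kth(data, 10, capacity=256)` returns the same value as `sorted(data)[9]` while never sorting more than 256 elements at once.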
Shinji TSUZUKI Toshiyuki AIBARA Saburo TAZAKI Yoshio YAMADA
In power line SS communication systems, the available frequency range is limited by law to 10 kHz to 450 kHz, so components in the lower and higher frequency regions cannot be transmitted. In this paper, we propose a method for improving the bit error rate by using a PN sequence coded with a new channel code, the FM-V5 code, instead of the PE code, so that the coded PN sequence has a correlation property. This correlation property allows the Viterbi decoding technique to be used effectively on the receiving side. To enhance the correlation property further, we also examine the additional application of a partial response (PR) system, the so-called PRML system, on the receiving side. Computer simulation results show an SNR improvement of about 4.5 dB at a bit error rate of 10^-5 when the FM-V5 code is used without the PR system, compared with the PE code. With the FM-V5 code and the PR(1, -1) system, a further SNR improvement of about 11 dB is obtained at the same bit error rate of 10^-5, compared with the PE code. As a result, our method attains an SNR improvement of about 20 dB compared with the conventional simple PN sequence, that is, the conventional Direct Sequence/Spread Spectrum (DS/SS) method.
Kei IKEDA Mitsutoshi HATORI Kiyoharu AIZAWA
The inherent simplicity of the LMS (Least Mean Square) algorithm has led to its wide usage. However, it is well known that high-speed convergence and low final misadjustment cannot be realized simultaneously by the conventional LMS method. To overcome this trade-off, a new adaptive algorithm using multiple ADFs (Adaptive Digital Filters) is proposed. The proposed algorithm modifies the coefficients using multiple gradient vectors of the squared error, which are computed at different points on the performance surface. First, the proposed algorithm using 2 ADFs is discussed. Simulation results show that both high-speed convergence and low final misadjustment can be realized. The computation time of the proposed algorithm is nearly the same as that of LMS if parallel processing techniques are used. Next, the proposed algorithm using more than 2 ADFs is discussed. If more than 2 ADFs are used, no further improvement in the convergence speed is realized, but the final misadjustment is reduced and the stability is improved. Finally, a method that can improve the convergence property in the presence of correlated input is discussed. It is shown that by using a priori knowledge and a matrix transformation, the convergence property is considerably improved even when a strongly correlated input signal is applied.
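For reference, the conventional single-filter LMS update that the abstract takes as its baseline can be sketched as follows. This is the textbook algorithm, not the proposed multiple-ADF method. The function name `lms_identify` and the step-size convention w ← w + μ·e·u are assumptions made for illustration.

```python
def lms_identify(x, d, num_taps, mu):
    """Conventional LMS adaptive FIR filter.

    x: input samples, d: desired samples, mu: step size.
    Returns the final coefficient vector and the per-sample error sequence.
    """
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Tap-input vector, most recent sample first, zero before the start.
        u = [x[n - i] if n - i >= 0 else 0.0 for i in range(num_taps)]
        y = sum(wi * ui for wi, ui in zip(w, u))   # filter output
        e = d[n] - y                               # estimation error
        # Step along the negative instantaneous gradient of the squared error.
        w = [wi + mu * e * ui for wi, ui in zip(w, u)]
        errors.append(e)
    return w, errors
```

A larger `mu` speeds up convergence but raises the final misadjustment when noise is present, which is the trade-off the abstract refers to.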
Keiichirou YAMANO Dusan JOKANOVIC Tsuyoshi ANDO Masataka OHTA Kaoru TAKAHASHI
In this paper, an approach to the formal specification and verification of ISDN services in LOTOS is presented. As for specification, it is shown that LOTOS can be effectively applied to describe different levels of ISDN service specifications. At the higher level, only the external behaviour of the network is specified. At the lower level, on the other hand, specifications include the behaviour of network components such as switching systems, where each switching system can be specified independently of the others. Such a specification style proves suitable for verification of specifications by using the concept of the simulation relation.
Rinshi SUGINO Yoshiko OKUI Masaki OKUNO Mayumi SHIGENO Yasuhisa SATO Akira OHSAWA Takashi ITO
The mechanism of UV-excited dry cleaning using photoexcited chlorine radicals has been investigated for removing iron and aluminum contamination from a silicon surface. Iron and aluminum contaminants with a surface concentration of 10^13 atoms/cm^2 were intentionally introduced via an ammonium-hydrogen peroxide solution. The silicon etching rates in UV-excited dry cleaning differ depending on the contaminant. Fe and Al can be removed in the same manner. The removal of Fe and Al is highly temperature dependent and is little affected by the silicon etching depth. Both Fe and Al on the silicon surface were completely removed by UV-excited dry cleaning at a cleaning temperature of 170 °C, and were decreased by two orders of magnitude from the initial level when the surface was etched only 2 nm deep.
Yasuhisa HAYASHI Satoshi KONDO Nobuyuki TAKASU Akio OGIHARA Shojiro YONEDA
This study proposes a new training method for the hidden Markov model with separate vector quantization (SVQ-HMM) in speech recognition. The proposed method uses the correlation between two different kinds of features: cepstrum and delta-cepstrum. The correlation is used to decrease the number of reestimations for the two features, so the total computation time for training the models decreases. The proposed method is applied to isolated digit recognition in Japanese.
Spoken language systems such as speech-to-speech dialog translation systems have been gaining more attention in recent years. These systems require full integration of speech recognition and natural language understanding. This paper presents an efficient parsing algorithm that integrates the search problems of speech processing and language processing. The proposed parsing algorithm can be regarded as an extension of the one-pass search algorithm directed by a finite-state network to one directed by a context-free grammar, while retaining the time-synchronous procedure. The extended search algorithm is used to find approximately globally optimal sentence hypotheses; it does not have the overhead that exists in, for example, hierarchical systems based on the lattice parsing approach. The computational complexity of this search algorithm is proportional to the length of the input speech. Because the search process in speech recognition can directly take account of the predictive information in sentence parsing, this framework can be extended to spoken language systems that deal with dynamically varying constraints in dialogue situations.
Zixue CHENG Kaoru TAKAHASHI Norio SHIRATORI Shoichi NOGUCHI
In this paper, we present an automatic implementation method by which executable communication programs in C can be generated from protocol specifications in LOTOS. The implementation method consists of two parts: 1) an implementation strategy and 2) a set of translation rules. The first part consists of the basic ideas on how to realize the primary mechanisms in LOTOS specifications. The second part formulates the implementation method by way of translation rules based on the implementation strategy. The characteristics of our method can be summarized as follows: We formulate our implementation method by way of translation rules. These rules are defined top-down in the form of syntax-directed translation functions. The mechanism for controlling concurrency and communication among the user processes corresponding to the processes in a LOTOS specification is easily realized by using UNIX operating system functions. The translation rules have been implemented on the AS 3000 (SUN3) workstation. An application of this implementation method is demonstrated with a simplified token-ring protocol.
Haruyuki HARADA Mitsuru TANAKA Takashi TAKENAKA
This letter discusses the quality improvement of reconstructed images in diffraction tomography. An efficient iterative procedure based on the modified Newton-Kantorovich method and the Gerchberg-Papoulis algorithm is presented. The simulated results demonstrate the property of high-quality reconstruction even for cases where the first-order Born approximation fails.
Norikuni YABUMOTO Yukio KOMINE
Thermal desorption spectroscopy (TDS) is applied to analyze the oxidation reactions of hydrogen-terminated Si(100) surfaces in both the heating and cooling processes after hydrogen desorption. The oxidation reaction of oxygen and water with a silicon surface after hydrogen desorption shows hysteresis between the heating and cooling processes. In the cooling process, oxidation finishes when the silicon surface is adequately oxidized to a thickness of about 10 Å. Oxidation continues to occur at lower temperatures when the total volume of oxygen and water is too small to saturate the bare silicon surface. The reaction of water with silicon releases hydrogen at more than 500 °C. Hydrogen does not adsorb on the silicon oxide surface. A trace amount of oxygen, less than 1 × 10^-6 Torr, roughens the surface.
Spread spectrum (SS) communication systems have been studied and developed for commercial applications such as mobile communications and consumer communications. The reason is that SS communication systems have various characteristic features, such as robust immunity to interference and jamming, privacy in communication, and random access capability (spread spectrum multiple access: SSMA). The performance of these systems depends mainly on the modulation method used to spread the spectrum. This paper introduces several recent modulation methods in SS communication systems. First, direct-sequence (DS), which is a fundamental modulation in SS communication systems, and M-ary/SSMA, which can increase the number of multiple access users, are described. For increasing the data rate per user, parallel SS, parallel combinatory SS, and code division multiplexing using orthogonal Manchester-coded M-sequences are introduced. Second, frequency hopping (FH), in which the spectrum of the signal is spread by hopping the carrier frequency, and related systems are shown. Finally, from a viewpoint of communication theory, spectral efficiency (i.e., data rate
The selection method of the moment of inertia of the flywheel in a digital measurement system of torque-speed curve plotting for a kind of motor is presented. The selection standards of the moment of inertia and the map displaying the operating ranges of the measurement system are shown. The selection procedure of the moment of inertia is also shown.
Eisuke KUDOH Tadashi MATSUMOTO
The user capacity of a DS/CDMA cellular mobile radio system employing transmitter power control (TPC) is investigated. Assuming a log-normally distributed control error, the outage probability is evaluated through computer simulations. The user capacity decreases dramatically as the power control error increases. If the standard deviation is larger than about 2 dB, the user capacity is decreased by more than 60%. It is shown that a power control error with a standard deviation of 0.5 dB or less is required to accommodate 90% of the maximum user capacity. The capacity decreases in the reverse and forward link channels due to non-uniform user distributions are also investigated. It is shown that if system users are densely distributed within a zone fringe whose thickness is 80% of the radius, the reverse link capacity is decreased by about 22%. The forward link capacity is comparatively insensitive to non-uniform user distribution.
Kenichi HAYASHI Tohru SUGAWARA
A new set of self-consistent linear equations is presented for the analysis of the startup characteristics of gyrotron oscillators with an open cavity consisting of weakly irregular waveguides. Numerical results obtained with these equations are described for the frequency detuning and oscillation starting current of a whispering-gallery-mode gyrotron. Experiments to check the effectiveness of the derived equations showed that they describe the operation of gyrotrons well in comparison with the linear theory that uses an empty cavity field as the wave field.
Applications of neural networks are prevalent in speech recognition research. In this paper, the suitable role of neural networks (mainly back-propagation based multi-layer networks) in speech recognition is first discussed. Considering that speech is a long, variable-length, structured pattern, we recommend a direction in which neural networks are used in cooperation with existing structural analysis frameworks. Activities are surveyed, including those intended to merge neural networks cooperatively into a dynamic programming based structural analysis framework. It is observed that considerable effort has been devoted to suppressing the high nonlinearity of the network output. As far as surveyed, no experiment in a real field setting has been reported.
Kazuo HASHIMOTO Tohru ASAMI Seiichi YAMAMOTO
Since Vendler classified aspect into four categories, state, achievement, activity, and accomplishment, much effort has been made to define the notion of aspect logically. It is commonly agreed that aspect represents the general temporal characteristics of events and states. However, there still remains a considerable amount of disagreement about its formal treatment. One of the major problems is that the aspect of a sentence shifts by certain types of sentence construction. For instance, adding time adverbials to a sentence modifies the original aspect, taking the progressive form of the verb changes the aspect, and so on. These phenomena are known as the aspect shifts. The other is the problem known as the imperfective paradox. The imperfective paradox is a problem of the truth definition of the progressives. The truth condition of the progressive form of the sentence is defined at an internal subinterval of the temporal range of the corresponding non-progressive sentence. If the truth condition of the progressive form of the sentence is defined using the truth condition of the non-progressive form of the sentence, there are logical contradictions of truth definition in a sentence such as "Max was building a house, but he never built it". These problems cause much confusion (1) in the truth definition of aspects, (2) in the definition of aspect operations, such as initiative, terminative, progressive, perfective, etc., and also (3) in the definition of adding time adverbials. This paper reviews the semantic problems with respect to aspect, and presents a consistent mechanism of aspect interpretation in order to settle all these semantic puzzles at once. For the sake of logical clarity, we construct a formal language, Lt, where every meaningful formula is a pair of a meaningful sentence and its aspect. The syntax of Lt describes the phenomenology of aspect shifts. 
The semantics of Lt defines a temporal interpretation for all the meaningful sentences of Lt, assuming the temporal interpretations of three inherent aspects: state, achievement, and activity. The proposed aspect interpretation gives a reasonable account of aspect shifts, and solves the imperfective paradox by assuming the time structure to be backwards linear.
Hyunkoo KANG Yoon UH Tasuku TAKAGI
We propose a new distributed signal (analog or digital) transmission system that is immune to a noisy channel. In the transmitter, the information signal is distributed by a distributor, and the distributed signal is transmitted. In the receiver, the received signal is reconstructed by the inverse distributor. In this system, impulsive interference noise that disturbs the transmitted signal in the channel passes through the decoder only, so this noise is distributed by the inverse distributor while the transmitted signal is reconstructed. When the Fourier transform is used as the distributor, some appended signals make it possible to estimate the inversely distributed noise components. Based on this principle, the transmission system is able to suppress impulsive interference, and the channel has high noise immunity. The construction of a receiver that can eliminate the impulsive noise is derived.
Andreas S. SPANIAS Frank H. WU
The objective of this paper is to provide an overview of the recent developments in the area of speech processing and in particular in the fields of speech coding and speech recognition. The speech coding review covers DPCM coders, model-based vocoders, waveform coders, and hybrid coders. The hybrid coders are described in some detail since they are the subject of current research. Our treatment of speech recognition techniques concentrates on the methodologies for voice recognition and the progress made in speaker independent recognition. In addition, we describe the efforts towards commercial deployment of this technology.
Yoshinori KITAHARA Yoh'ichi TOHKURA
In speech output, which is expected to serve as an ideal man-machine interface, emotion production is an important issue, not only for improving naturalness but also for achieving more sophisticated speech interaction between man and machine. Speech has two aspects: prosodic information and phonetic features. With a view to application in natural, high-quality speech synthesis, the role of prosody in speech perception has been studied. In this paper, the prosodic components that contribute to the expression of emotions and their intensity are clarified by analyzing emotional speech and by conducting listening tests with synthetic speech. The analysis is performed by substituting the components of neutral speech (i.e., speech with no particular emotion) with those of emotional speech while preserving the temporal correspondence by means of DTW. It has been confirmed that the prosodic components, which are composed of pitch structure, temporal structure, and amplitude structure, contribute to the expression of emotions more than the spectral structure of speech does. The results of listening tests using prosody-substituted speech show that temporal structure is the most important for the expression of anger, while all three components are much more important for the intensity of anger. Pitch structure also plays a significant role in the expression of joy and sadness and their intensity. These results make it possible to convert neutral utterances into utterances expressing various emotions. The results can also be applied to controlling the emotional characteristics of speech in synthesis by rule.
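The temporal correspondence by DTW mentioned above can be illustrated with a minimal dynamic time warping sketch. It uses scalar features and an absolute-difference local cost, both of which are assumptions for illustration; the abstract does not state the features or local distance actually used. The function name `dtw_path` is hypothetical.

```python
def dtw_path(a, b):
    """Align two non-empty numeric sequences with classic DTW.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    mapping frame i of `a` to frame j of `b`.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # step in a only
                                     cost[i][j - 1],      # step in b only
                                     cost[i - 1][j - 1])  # step in both

    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path
```

Given such a path, a component of one utterance (for example its pitch contour) could be copied frame by frame onto the time axis of the other, which is the kind of substitution the analysis describes.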