
Keyword Search Result

[Keyword] SPE(2504hit)


  • Cepstral Amplitude Range Normalization for Noise Robust Speech Recognition

    Shingo YOSHIZAWA  Noboru HAYASAKA  Naoya WADA  Yoshikazu MIYANAGA  

    PAPER-Speech and Hearing

    E87-D No:8

    This paper describes a noise robustness technique that normalizes the cepstral amplitude range in order to remove the influence of additive noise. Additive noise causes speech feature mismatches between testing and training environments and it degrades recognition accuracy in noisy environments. We presume an approximate model that expresses the influence by changing the amplitude range and the DC component in the log-spectra. According to this model, we propose a cepstral amplitude range normalization (CARN) that normalizes the cepstral distance between maximum and minimum values. It can estimate noise robust features without prior knowledge or adaptation. We evaluated its performance in an isolated word recognition task by using the Noisex92 database. Compared with the combinations of conventional methods, the CARN could improve recognition accuracy under various SNR conditions.
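
    The range normalization idea can be sketched as follows. This is only an illustration: the abstract does not give the exact CARN formula, so the per-coefficient min/max scaling, the function name `normalize_cepstral_range`, and the `eps` guard are all assumptions.

```python
import numpy as np

def normalize_cepstral_range(cepstra: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each cepstral coefficient track to a unit max-min range.

    `cepstra` has shape (frames, coefficients). Assumed formulation,
    not the paper's exact definition.
    """
    c_max = cepstra.max(axis=0)
    c_min = cepstra.min(axis=0)
    # Guard against a constant track, whose range would be zero.
    span = np.maximum(c_max - c_min, eps)
    # Subtracting the minimum removes the offset; dividing fixes the range.
    return (cepstra - c_min) / span
```

    Under this assumed form, a track whose amplitude range is compressed and whose offset is shifted maps to the same normalized values as the original track, which is the kind of mismatch the abstract attributes to additive noise.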

  • The Impact of Source Traffic Distribution on Quality of Service (QoS) in ATM Networks

    Seshasayi PILLALAMARRI  Sumit GHOSH  


    E87-B No:8

    A principal attraction of ATM networks, in both wired and wireless realizations, is that the key quality of service (QoS) parameters of every call, including end-to-end delay, jitter, and loss are guaranteed by the network when appropriate cell-level traffic controls are imposed at the user network interface (UNI) on a per call basis, utilizing the peak cell rate (PCR) and the sustainable cell rate (SCR) values for the multimedia--voice, video, and data, traffic sources. There are three practical difficulties with these guarantees. First, while PCR and SCR values are, in general, difficult to obtain for traffic sources, the typical user-provided parameter is a combination of the PCR, SCR, and the maximum burstiness over the entire duration of the traffic. Second, the difficulty in accurately defining PCR arises from the requirement that the smallest time interval must be specified over which the PCR is computed which, in the limit, will approach zero or the network's resolution of time. Third, the literature does not contain any reference to a scientific principle underlying these guarantees. Under these circumstances, the issue of providing QoS guarantees in the real world, through traffic controls applied on a per call basis, is rendered uncertain. This paper adopts a radically different, high level approach to the issue of QoS guarantees. It aims at uncovering through systematic experimentation a relationship, if any exists, between the key high level user traffic characteristics and the resulting QoS measures in a realistic operational environment. It may be observed that while each user is solely interested in the QoS of his/her own traffic, the network provider cares for two factors: (1) Maximize the link utilization in the network since links constitute a significant investment, and (2) ensure the QoS guarantees for every user traffic, thereby maintaining customer satisfaction. Based on the observations, this paper proposes a two-phase strategy. 
Under the first phase, the average "link utilization" computed over all the links in a network is maintained within a range, specified by the underlying network provider, through high level call admission control, i.e. by limiting the volume of the incident traffic on the network, at any time. The second phase is based on the hypothesis that the number of traffic sources, their nature--audio, video, or data, and the bandwidth distribution of the source traffic, admitted subject to a specific chosen value of "link utilization" in the network, will exert a unique influence on the cumulative delay distribution at the buffers of the representative nodes and, hence, on the QoS guarantees of each call. The underlying thinking is as follows. The cumulative buffer delay distribution, at any given node and at any time instant, will clearly reflect the cumulative effect of the traffic distributions of the multiple connections that are currently active on the input links. Any bounds imposed on the cumulative buffer delay distribution at the nodes of the network will also dominate the QoS bounds of each of the constituent user traffic. Thus, for each individual traffic source, the buffer delay distributions at the nodes of the network, obtained for different traffic distributions, may serve as its QoS measure. If the hypothesis is proven true, in essence, the number of traffic sources and their bandwidth distribution will serve as a practically realizable high level traffic control in providing realistic QoS guarantees for every call. To verify the correctness of the hypothesis, an experiment is designed that consists of a representative ATM network, traffic sources that are characterized through representative and realistic user-provided parameters, and a given set of input traffic volumes appropriate for a network provider approved link utilization measure. 
The key source traffic parameters include the number of sources that are incident on the network and the constituent links at any given time, the bandwidth requirement of the sources, and their nature. For each call, the constituent cells are generated stochastically, utilizing the typical user-provided parameter as an estimate of the bandwidth requirement. Extensive simulations reveal that, for a given link utilization level held uniform throughout the network, while the QoS metrics--end-to-end cell delay, jitter, and loss, are superior in the presence of many calls each with low bandwidth requirement, they are significantly worse when the network carries fewer calls of very high bandwidths. The findings demonstrate the feasibility of guaranteeing QoS for each and every call through high level traffic controls. As for practicality, call durations are relatively long, ranging from ms to even minutes, thereby enabling network management to exercise realistic controls over them, even in a geographically widely dispersed ATM network. In contrast, current traffic controls that act on ATM cells at the UNI face formidable challenge from high bandwidth traffic where cell lifetimes may be extremely short, in the range of µs. The findings also underscore two additional important contributions of this paper. First, the network provider may collect data on the high level user traffic characteristics, compute the corresponding average link utilization in the network, and measure the cumulative buffer delay distributions at the nodes, in an operational network. The provider may then determine, based on all relevant criteria, a range of input and system parameters over which the network may be permitted to operate, the intersection of all of which may yield a realistic network operating point (NOP). 
During subsequent operation of the network, the network provider may guide and maintain the network at a desired NOP by exercising control over the input and system parameters including link utilization, call admittance based on the requested bandwidth, etc. Second, the finding constitutes a vulnerability of ATM networks which a perpetrator may exploit to launch a performance attack.

  • Temperature Measurements of Breaking Arc between Copper Contacts at Three Constant Speeds (10, 20 and 30 mm/s)

    Tetsuya KITAJIMA  Junya SEKIKAWA  Mitsuru TAKEUCHI  Takayoshi KUBONO  

    PAPER-Arc Discharge

    E87-C No:8

    The purpose of this study is to examine the impact of the opening speed on a breaking arc. The opening speeds are 10, 20 and 30 mm/s. The breaking arc is generated in a D.C. 42 V/10.5 A circuit, and the arc voltage, the arc current, the gap length and the arc spectrum intensity are measured. Arc temperature is calculated by using a Boltzmann plot. Even if the opening speed is changed, the arc temperature starts from a high temperature, and falls gradually to 4650-4750 K with time. Namely, the opening speed has no influence on the arc temperature.
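
    A Boltzmann plot estimates temperature from the slope of ln(I·λ / (g·A)) against the upper-level energy E of each spectral line, where the slope equals -1/(kT). The sketch below is a minimal illustration under that standard relation; the function name and argument layout are assumptions, and the abstract does not state which spectral lines the authors used.

```python
import numpy as np

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def boltzmann_plot_temperature(intensity, wavelength_nm, g_upper, a_ki, e_upper_ev):
    """Estimate temperature (K) from line intensities via a Boltzmann plot.

    Each argument is a sequence with one entry per spectral line.
    """
    intensity = np.asarray(intensity, dtype=float)
    wavelength_nm = np.asarray(wavelength_nm, dtype=float)
    g_upper = np.asarray(g_upper, dtype=float)
    a_ki = np.asarray(a_ki, dtype=float)
    e_upper_ev = np.asarray(e_upper_ev, dtype=float)

    # Ordinate of the Boltzmann plot for each line.
    y = np.log(intensity * wavelength_nm / (g_upper * a_ki))
    # Least-squares straight line; the slope is -1 / (k_B * T).
    slope, _intercept = np.polyfit(e_upper_ev, y, 1)
    return -1.0 / (K_B_EV * slope)
```

    At least two lines with different upper-level energies are needed for the fit; more lines reduce the sensitivity to noise in any single intensity.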

  • Metaheuristic Optimization Algorithms for Texture Classification Using Multichannel Approaches

    Jing-Wein WANG  


    E87-A No:7

    This paper proposes the use of the ratio of wavelet extrema numbers taken from the horizontal and vertical counts respectively as a texture feature, which is called aspect ratio of extrema number (AREN). We formulate the classification problem upon natural and synthesized texture images as an optimization problem and develop a coevolving approach to select both scalar wavelet and multiwavelet feature spaces of greater discriminatory power. Sequential searches and genetic algorithms (GAs) are comparatively investigated. The experiments using wavelet packet decompositions with the innovative packet-tree selection scheme ascertain that the classification accuracy of coevolutionary genetic algorithms (CGAs) is acceptable enough.

  • Terahertz Spectroscopic Imaging and Its Application to Drug Detection

    Kodo KAWASE  Yuichi OGAWA  Yuuki WATANABE  


    E87-C No:7

    We have developed a novel basic technology for terahertz (THz) imaging, which allows detection and identification of chemicals by introducing the component spatial pattern analysis. The spatial distributions of the chemicals were obtained from terahertz multispectral transillumination images, using absorption spectra previously measured with a widely tunable THz-wave parametric oscillator. We have also separated the component spatial patterns of frequency-dependent absorptions in chemicals and frequency-independent components such as plastic, paper and measurement noise in THz spectroscopic images. Further we have applied this technique to the detection and identification of illicit drugs concealed in envelopes.

  • A Statistical Method of Evaluating Pronunciation Proficiency for English Words Spoken by Japanese

    Seiichi NAKAGAWA  Naoki NAKAMURA  Kazumasa MORI  

    PAPER-Speech and Hearing

    E87-D No:7

    In this paper, we propose a statistical method of evaluating the pronunciation proficiency of English words spoken by Japanese. We analyzed statistically the utterances to note a combination that has a high correlation between an English teacher's score and certain acoustic features. We observed that the phoneme recognition rates (correct rate and accuracy) were the best measure of pronunciation proficiency, and the likelihood ratio of English phoneme acoustic models to phoneme acoustic models adapted by Japanese was the second best measure. The effective measure which was highly correlated with the English teacher's score was the combination of the likelihood for American native models, likelihood for English models adapted by Japanese, the best likelihood for arbitrary sequences of acoustic models, phoneme recognition rate and the rate of speech. We obtained a correlation coefficient of 0.81 with open data for vocabulary and 0.69 with open data for speaker at the five words set level, respectively. The coefficient was higher than the correlation between humans' scores, 0.65. In the 15 words set level which corresponds to one or two sentences, we obtained the correlation coefficient of 0.86 with open data for the speaker.

  • Harmonic Model Based Excitation Enhancement for Low-Bit-Rate Speech Coding

    Hong Kook KIM  Mi Suk LEE  Chul Hong KWON  

    LETTER-Speech and Hearing

    E87-D No:7

    A new excitation enhancement technique based on a harmonic model is proposed in this paper to improve the speech quality of low-bit-rate speech coders. This technique is employed only in the decoding process of speech coders and improves high-frequency components of excitation. We develop the procedure of harmonic model parameter estimation and harmonic generation and apply the technique to a current state-of-the-art low-bit-rate speech coder. Experiments on spectrum reading and spectrum distortion measurement show that the proposed excitation enhancement technique improves speech quality.

  • Node Mobility Aware Routing for Mobile Ad Hoc Network

    Shinichi FURUSHO  Teruaki KITASUKA  Tsuneo NAKANISHI  Akira FUKUDA  


    E87-B No:7

    In ad hoc on-demand routing algorithms, when a route is broken a relay node must perform error transaction and the source node must do rerouting to discover an alternate route. It is important to construct a stable route when route discovery occurs. In this paper, we use relative speeds among nodes as a measure of node mobility. Our routing algorithm chooses nodes of lower relative speed as relay nodes. In our simulation, when there is one session in the network, our proposed algorithm reduces the number of route breaks to about three times fewer than with DSR, and it delivers more packets than DSR, with an 18% higher rate. However, our algorithm still needs improvement in congested traffic situations.
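
    The relay choice described above can be illustrated with a small sketch. The abstract does not say how relative speed is measured or exchanged between nodes, so the 2D velocity vectors, the function names `relative_speed` and `pick_relay`, and the neighbor table are hypothetical.

```python
import math

def relative_speed(v_a, v_b):
    """Magnitude of the velocity difference between two nodes (2D vectors)."""
    return math.hypot(v_a[0] - v_b[0], v_a[1] - v_b[1])

def pick_relay(own_velocity, neighbor_velocities):
    """Return the neighbor id with the lowest speed relative to this node.

    `neighbor_velocities` maps a neighbor id to its (vx, vy) velocity.
    """
    return min(
        neighbor_velocities,
        key=lambda node: relative_speed(own_velocity, neighbor_velocities[node]),
    )
```

    A neighbor moving almost in step with the current node is preferred because the link to it is likely to last longer than a link to a neighbor moving in the opposite direction.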

  • An Algorithm for Detecting 3-Way Feature Interactions

    Shizuko KAWAUCHI  Tadashi OHTA  

    PAPER-Software Development Environment

    E87-B No:7

    This paper proposes an algorithm for detecting 3-way interactions. As far as the authors know, this is the first proposal ever made for a detection algorithm of 3-way interactions. In this paper, by analyzing examples, the mechanism of 3-way interactions is clarified and a detection algorithm of 3-way interactions is proposed. Namely the proposed detection algorithm is heuristic. To evaluate the algorithm, we implemented a detection system based on the proposed algorithm and applied it to 12 services, and 82 3-way interactions were detected. This shows the proposed algorithm is effective.

  • Auto Focusing Algorithm for Iris Recognition Camera Using Corneal Specular Reflection

    Kang Ryoung PARK  

    This paper was deleted on March 10, 2006 because it was found to be a duplicate submission (see details in the pdf file).
    PAPER-Image Processing and Video Processing

    E87-D No:7

    Iris recognition identifies a user based on the iris texture information that lies between the white sclera and the black pupil. For fast iris recognition, it is very important to capture a focused image of the user's eye quickly; otherwise, the total recognition time increases and the user is inconvenienced. Previous studies and systems use focusing methods designed for general landscape scenes, without considering the characteristics of iris images. As a result, focusing sometimes takes a long time, especially for users wearing glasses. To overcome these problems, we propose a new iris image acquisition method that captures a focused image of the user's eye very quickly, based on the corneal specular reflection. Experimental results show that the average focusing time for users both with and without glasses is 480 ms, and we conclude that our method can be used for a real-time iris recognition camera.

  • Tunable Dispersion and Dispersion Slope Compensator Based on Two Twin Chirped FBGs with Temperature Gradient for 160 Gbit/s Transmission

    Shin-ichi WAKABAYASHI  Asako BABA  Hitomi MORIYA  Xiaomin WANG  Tatsushi HASEGAWA  Akira SUZUKI  


    E87-C No:7

    We have developed the tunable dispersion compensator based on two twin linearly chirped fiber Bragg gratings with various temperature gradients. Controlling the temperature gradient over one of the twin fiber Bragg gratings by Peltier elements, the dispersion and the dispersion slope were changed independently and continuously. The dispersion and dispersion slope compensator has a large bandwidth of 8 nm and low group-delay ripple of < 4 ps in its chirped fiber Bragg gratings. We experimentally demonstrated a precise controllability of the dispersion and the dispersion slope using linear and parabolic temperature gradient. The dispersion and the dispersion slope changes were achieved continuously with -0.67 ps/nm/ and -0.14 ps/nm2/. The transmission characteristics of the dispersion slope compensation were examined using ultra short pulses in the fiber link. When the total dispersion was zero, the distorted pulse was restored back and the tail was significantly suppressed. 160 Gbit/s signals were also demonstrated over 140 km within 1 dB power penalty by using the dispersion slope compensator.

  • Wide-Band Dispersion Compensation for 1000-km Single-Mode Fiber by Midway Spectral Inversion Using Cascaded Nonlinearities in LiNbO3 Waveguide

    Xiaomin WANG  Daisuke KUNIMATSU  Tatsushi HASEGAWA  Akira SUZUKI  


    E87-C No:7

    We demonstrate wide-band (> 25-nm), long-distance (> 1000-km) chromatic dispersion compensation by midway spectral inversion (MSI) using a periodically poled LiNbO3 device. In order to achieve a flat zero net dispersion, the fourth order dispersion of the single-mode fibers is canceled by MSI, while the third order dispersion is compensated for by the negative slope dispersion compensation fiber (NS-DCF). The second order dispersion is canceled out by both. The long distance propagation is realized by a double recirculation-loop system. A very flat zero dispersion is measured for the first time for over 1000-km single-mode fiber propagation with MSI dispersion compensation.

  • Speculative Selection Routing in 2D Torus Network

    Tran CONG SO  Shigeru OYANAGI  Katsuhiro YAMAZAKI  

    PAPER-Networking and System Architectures

    E87-D No:7

    We have proposed a speculative selection function for adaptive routing, which uses idle cycles of the network physical links to exchange network information between nodes, thus helping to decide the best selection. A previous study on the mesh network showed that speculative selection routing (SSR) gives message selection flexibility that improves network performance by balancing the network traffic in both global and local scopes. This paper evaluates the speculative selection function on a 2D torus network by simulation. The simulation compares the network throughput and latency with various traffic patterns. The visualization graphs show how the speculative selection eliminates hotspots and disperses traffic in the global scope. The simulation results demonstrate that by using speculative selection, the network performance is increased by around 7%. Compared to the mesh network, the torus version has a smaller gain due to the high performance nature of the torus network.

  • A Study of Aspect Ratio of the Aperture and the Effect on Antenna Efficiency in Oversized Rectangular Slotted Waveguide Arrays

    Hisahiro KAI  Jiro HIROKAWA  Makoto ANDO  

    PAPER-Antenna and Propagation

    E87-B No:6

    A post-wall waveguide-fed parallel plate slotted array is an attractive candidate for high efficiency and mass producible planar array antennas for millimeter wave applications. For the slot design of this large sized array, a periodic boundary wall model based on the assumption of infinite array size and a parallel waveguide is used. In fact, the aperture is large but still finite (10-40 wavelength) and the TEM-like wave is perturbed due to the narrow walls at the periphery of the aperture as well as the slot coupling; antenna efficiency is affected by the size and the aspect ratio of the aperture. All these observations imply the unique defects of oversized waveguide arrays. In this paper, the aperture efficiency of post-wall waveguide arrays is assessed as a function of size and aspect ratio of the aperture for the first time, both in theory and measurement. An effective field analysis for an electrically large oversized waveguide array, developed by the author, is utilized for determining the slot excitation coefficients and aperture illumination. It is predicted that the oversized waveguide array has a potential efficiency of 80-90% if the aperture is larger than 18 wavelength on a side and the gain is more than 30 dBi. A transversely wide aperture generally provides higher efficiency than a longitudinally long aperture, provided a perfectly uniform TEM wave would be launched from the feed waveguide.

  • Compensation of Speech Coding Distortion for Wireless Speech Recognition

    Hong Kook KIM  

    LETTER-Speech and Hearing

    E87-D No:6

    In this paper, we perform some experiments to show that the quantization noise caused by low-bit-rate speech coding can be characterized as a white noise process. Then, the signal-to-quantization noise ratio of the decoded speech for a given bit-rate is estimated by observing the perceptual speech quality equivalent to the artificially generated noisy speech obtained by adding a white Gaussian noise source. This information is incorporated into the parameter tuning of a noise-robust compensation algorithm for speech recognition so that the compensation algorithm can be performed better under a range of the estimated SNRs. Finally, we apply the compensation algorithm to a connected digit string recognition system that utilizes speech signals decoded by the GSM adaptive multi-rate (AMR) speech coder. It is shown that the noise-robust compensation algorithm reduces word error rates by 15% or more at low bit-rate modes of the AMR speech coder.
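
    The reference condition mentioned above, speech with white Gaussian noise added at a known SNR, can be generated as in the following sketch. The function name `add_white_noise` and its parameters are illustrative assumptions; the abstract does not describe the authors' tooling.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng=None):
    """Return `signal` plus white Gaussian noise at the requested SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    # Average power of the clean signal.
    signal_power = np.mean(signal ** 2)
    # Noise power that yields the target signal-to-noise ratio.
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise
```

    Passing a seeded generator, for example `np.random.default_rng(0)`, makes the noisy reference reproducible across listening or recognition runs.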

  • TAJODA: Proposed Tactile and Jog Dial Interface for the Blind

    Chieko ASAKAWA  Hironobu TAKAGI  Shuichi INO  Tohru IFUKUBE  


    E87-D No:6

    There is a critical difference between sighted people and the blind in how they obtain information. Screen reading technology assists blind people in accessing digital documents by themselves, helping to bridge this gap. However, digital documents are becoming much more visual these days, using various types of visual effects so that sighted people can explore the information intuitively at a glance. It is very hard to convey visual effects non-visually and intuitively while retaining the original effects. In addition, it takes a long time to explore the information, since blind people use the keyboard for exploration, while sighted people use eye movement. This research aims at improving the non-visual exploration interface and improving the quality of non-visual information. Therefore, TAJODA (tactile jog dial interface) was proposed to solve these problems. It presents verbal information (text information) in the form of speech, while nonverbal information (visual effects) is represented in the form of tactile sensations. It uses a jog dial as an exploration device, which makes it possible to explore forward or backward intuitively in the speech information by spinning the jog dial clockwise or counterclockwise. It also integrates a tactile device to represent visual effects non-visually. Both speech and tactile information can be synchronized with the dial movements. The speed of spinning the dial affects the speech rate. The main part of this paper describes an experimental evaluation of the effectiveness of the proposed TAJODA interface. The experimental system used a preprocessed recorded human voice as test data. The training sessions showed that it was easy to learn how to use TAJODA. The comparison test session clearly showed that the subjects could perform the comparison task using TAJODA significantly faster (2.4 times faster) than with the comparison method that is closest to the existing screen reading function. Through this experiment, our results showed that TAJODA can drastically improve the non-visual exploration interface.

  • Methods of Improving the Accuracy and Reproducibility of Objective Quality Assessment of VoIP Speech

    Akira TAKAHASHI  Masataka MASUDA  Atsuko KURASHIMA  

    PAPER-Multimedia Systems

    E87-B No:6

    VoIP is one of the key technologies for recent telecommunication services. The quality of its services should be discussed in subjective terms. Since subjective quality assessment is time-consuming and expensive, however, objective quality assessment which estimates subjective quality without carrying out subjective quality experiments is desirable. This paper discusses the performance of the objective quality measure that was standardized as ITU-T Recommendation P.862 and clarifies the quality factors that can be evaluated with satisfactory accuracy based on it. We found that P.862 can be applied to the evaluation of coding distortion, tandeming of codecs, transmission bit-errors, packet loss, and silence compression in a codec, at least for clean Japanese speech. In addition, we propose a method of estimating the subjective quality evaluation value from objective measurement results and show the validity of this method. We also evaluate the uniqueness of objective quality assessment based on P.862 from the viewpoints of the effect of measurement noise and the variation of test speech samples, and propose how to improve the reproducibility of objective quality assessment.

  • Microphone Array with Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator for Speech Enhancement

    Hongseok KWON  Jongmok SON  Keunsung BAE  


    E87-A No:6

    This paper describes a new speech enhancement system that employs a microphone array with post-processing based on minimum mean-square error short-time spectral amplitude (MMSE-STSA) estimator. To get more accurate MMSE-STSA estimator in a microphone array, modification and refinement procedure are carried out from each microphone output. Performance of the proposed system is compared with that of other methods using a microphone array. Noise removal experiments for white and pink noises demonstrate the superiority of the proposed speech enhancement system to others with a microphone array in average output SNRs and cepstral distance measures.

  • Toward QoS Management of VoIP: Experimental Investigation of the Relations between IP Network Performances and VoIP Speech Quality

    Hiroki FURUYA  Shinichi NOMOTO  Hideaki YAMADA  Norihiro FUKUMOTO  Fumiaki SUGAYA  


    E87-B No:6

    This paper investigates the relations between IP network performances and the speech quality of the Voice over IP (VoIP) service through extensive experiments on a test bed network. The aim is to establish an effective and practical methodology for telecommunications operators to manage the quality of VoIP service via the management of IP network performances under their control. As IP network performances, utilization of the bottleneck link in the test bed and the following statistical factors of VoIP packets are examined: the standard deviation of delay variations (jitters), the standard deviation of packet interarrival times, and the packet loss ratio. On the other hand, VoIP speech quality is monitored as the Perceptual Evaluation of Speech Quality (PESQ). To investigate the relations under various network conditions, the experiments are performed by varying the following network related parameters of the test bed: the bandwidth of the bottleneck link, the size of the bottleneck buffer, the propagation delay, and the average of the data sizes transmitted as background data traffic. Statistical analyses of the experimental results suggest that managing the standard deviation of jitters in a network serves as a promising methodology, because its close relation to VoIP speech quality possesses robustness to changes in the network conditions. The robustness makes it practically useful since telecommunications operators can apply it to their networks, which are subject to change. The findings in this paper have opened up new visions for telecommunications operators to manage the Quality of Service (QoS) of VoIP service.
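
    The three packet-level statistics named above can be computed from send and receive timestamps roughly as follows. This is a sketch under assumptions: jitter is taken here as the difference between the one-way delays of consecutive delivered packets, which the abstract does not spell out, and the function name `voip_packet_stats` is hypothetical.

```python
import statistics

def voip_packet_stats(send_times, recv_times):
    """Compute jitter std, interarrival std and loss ratio for one stream.

    `recv_times[i]` is None when packet i was lost. Times share one unit.
    """
    delivered = [(s, r) for s, r in zip(send_times, recv_times) if r is not None]
    delays = [r - s for s, r in delivered]
    # Assumed jitter definition: change in delay between consecutive packets.
    jitters = [later - earlier for earlier, later in zip(delays, delays[1:])]
    # Gaps between consecutive arrivals at the receiver.
    interarrivals = [b[1] - a[1] for a, b in zip(delivered, delivered[1:])]
    return {
        "jitter_std": statistics.pstdev(jitters),
        "interarrival_std": statistics.pstdev(interarrivals),
        "loss_ratio": 1 - len(delivered) / len(send_times),
    }
```

    The function needs at least two delivered packets, since both standard deviations are taken over differences between consecutive packets.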

  • Design of a Robust LSP Quantizer for a High-Quality 4-kbit/s CELP Speech Coder

    Yusuke HIWASAKI  Kazunori MANO  Kazutoshi YASUNAGA  Toshiyuki MORII  Hiroyuki EHARA  Takao KANEKO  

    PAPER-Speech and Hearing

    E87-D No:6

    This paper presents an efficient LSP quantizer implementation for low bit-rate coders. The major feature of the quantizer is that it uses a truncated cepstral distance criterion for the code selection procedure. This approach has generally been considered too computationally costly. We utilized the quantizer with a moving-average predictor, two-stage-split vector quantizer and delayed decision. We have investigated the optimal parameter settings in this case and incorporated the quantizer thus obtained into an ITU-T 4-kbit/s speech coding candidate algorithm with a bit budget of 21 bits. The objective performance is better than that with a conventional weighted mean-square criterion, while the complexity is still kept to a reasonable level. The paper also describes the codebook design and techniques that were employed to achieve robustness in noisy channel conditions.
