IEICE global.ieice.org Site

Author Search Result

[Author] Kazuo ONO(12hit)


Coupling Characteristics between a Slab Waveguide and a Tapered Slab Waveguide with a Wedge-Shaped Nonlinear Cladding
Kazuo ONO Tamotsu SAKAI Hisashi OSAWA Yoshihiro OKAMOTO

LETTER-Opto-Electronics

Vol: E75-C No: 8
Page(s): 953-956
A novel coupling configuration consisting of a tapered slab waveguide with a wedge-shaped nonlinear cladding is proposed. Coupling characteristics for TE waves are analyzed by means of the beam propagation method. The proposed configuration is less sensitive to the offset between coupled waveguides than the configuration with a homogeneous nonlinear cladding.
Modal-Matching Analysis of Loss in Bent Graded-Index Optical Slab Waveguides
Maria MIRIANASHVILI Kazuo ONO Masashi HOTTA

PAPER-Electromagnetic Theory

Vol: E84-C No: 2
Page(s): 238-242
Loss in bent graded-index optical slab waveguides is analyzed using the modal-matching method. Conformal mapping replaces the curved structure with an equivalent straight waveguide with a modified index profile. For this planar waveguide structure, the normal modes are calculated using a multilayer approximation method. The wave incident on the bend is first expanded into a finite set of normal modes of the equivalent straight structure, and the transverse fields are matched across the junction. The numerical results show how loss forms in the graded-index waveguides and how it depends on the effective index of the corresponding straight waveguide.
Filter Bank Subtraction for Robust Speech Recognition
Kazuo ONOE Hiroyuki SEGI Takeshi KOBAYAKAWA Shoei SATO Shinichi HOMMA Toru IMAI Akio ANDO

PAPER-Robust Speech Recognition and Enhancement

Vol: E86-D No: 3
Page(s): 483-488
In this paper, we propose a new technique of filter bank subtraction for robust speech recognition under various acoustic conditions. Spectral subtraction is a simple and useful technique for reducing the influence of additive noise. Conventional spectral subtraction assumes accurate estimation of the noise spectrum and no correlation between speech and noise. Those assumptions, however, are rarely satisfied in reality, which degrades speech recognition accuracy. Moreover, the recognition improvement attained by conventional methods is slight when the input SNR changes sharply. We propose a new method in which the output values of filter banks are used for noise estimation and subtraction. By estimating noise at each filter bank, instead of at each frequency point, the method reduces the need for precise noise estimation. We also take into consideration expected phase differences between the spectra of speech and noise in the subtraction and control a subtraction coefficient theoretically. Recognition experiments on test sets at several SNRs showed that the filter bank subtraction technique improved the word accuracy significantly and outperformed conventional spectral subtraction on all the test sets. In other experiments, on recognizing speech from TV news field reports with environmental noise, the proposed subtraction method yielded better results than the conventional method.
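The idea of subtracting noise per filter bank channel rather than per frequency point can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed subtraction coefficient `alpha`, and the flooring rule are assumptions made for illustration, and the phase-difference-based control of the coefficient described in the abstract is not modeled.

```python
def estimate_noise(noise_frames):
    """Average each filter bank channel over frames assumed to contain only noise."""
    count = len(noise_frames)
    return [sum(channel) / count for channel in zip(*noise_frames)]


def filter_bank_subtract(frames, noise, alpha=1.0, floor=0.1):
    """Subtract a per-channel noise estimate from filter bank outputs.

    frames: list of frames, each a list of non-negative filter bank energies
    noise:  one noise-energy estimate per filter bank channel
    alpha:  subtraction coefficient (hypothetical fixed value)
    floor:  fraction of the input energy kept as a lower bound (hypothetical)
    """
    cleaned = []
    for frame in frames:
        if len(frame) != len(noise):
            raise ValueError("frame and noise must have the same number of channels")
        # Flooring keeps each channel energy positive after subtraction.
        cleaned.append([max(e - alpha * n, floor * e) for e, n in zip(frame, noise)])
    return cleaned


noise = estimate_noise([[1.0, 2.0], [3.0, 2.0]])
cleaned = filter_bank_subtract([[10.0, 2.1]], noise)
```

Because each channel pools many frequency points, a per-channel estimate is smoother than a per-bin estimate, which is the property the abstract relies on.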
Coupling Characteristics of Butt-Joined Single-Mode Slab Waveguide and Tapered Slab Waveguide with Nonlinear Cladding
Kazuo ONO Tamotsu SAKAI Hisashi OSAWA Yoshihiro OKAMOTO

LETTER-Electromagnetic Theory

Vol: E74-A No: 12
Page(s): 3949-3951
Power-dependent coupling efficiency for a butt-joined configuration using nonlinear cladding is analyzed by the beam propagation method. This configuration is relatively insensitive to both angular misalignment and offset between the coupled waveguides, provided that the input power is appropriate to ensure propagation of a spatial soliton.
Design Considerations for Multimode Y Junction Waveguides in Lens-Like Media
Kazuo ONO Shinnosuke SAWA

LETTER-Electro-Optics

Vol: E73-E No: 6
Page(s): 870-872
A design method for mode-conversion-type Y junction waveguides in lens-like media is proposed. A Y junction designed to reduce the mode conversion losses from the fundamental mode to higher-order even modes can also reduce the mode conversion losses from the lowest odd mode to higher-order odd modes.
Robust Speech Recognition by Using Compensated Acoustic Scores
Shoei SATO Kazuo ONOE Akio KOBAYASHI Toru IMAI

PAPER-Speech Recognition

Vol: E89-D No: 3
Page(s): 915-921
This paper proposes a new method of compensating acoustic scores in the Viterbi search for robust speech recognition. The method introduces noise models to represent a wide variety of noises and achieves robust decoding together with conventional subtraction and adaptation techniques. It uses the likelihoods of the noise models in two ways. One is to calculate a confidence factor for each input frame by comparing the likelihoods of speech models and noise models; the weight of the acoustic score for a noisy frame is then reduced according to the value of the confidence factor. The other is to use the likelihood of a noise model as an alternative to that of a silence model when the input is noisy. Since a lower confidence factor compresses acoustic scores, the decoder relies more on language scores and keeps more hypotheses within a fixed search depth for a noisy frame. An experiment using commentary transcriptions of a broadcast sports program (MLB: Major League Baseball) showed that the proposed method obtained a 6.7% relative word error reduction. The method also reduced the relative error rate of key words by 17.9%, which is expected to lead to an improvement in metadata extraction accuracy.
Bi-Spectral Acoustic Features for Robust Speech Recognition
Kazuo ONOE Shoei SATO Shinichi HOMMA Akio KOBAYASHI Toru IMAI Tohru TAKAGI

LETTER

Vol: E91-D No: 3
Page(s): 631-634
The extraction of acoustic features for robust speech recognition is very important for improving its performance in realistic environments. The bi-spectrum, based on the Fourier transformation of the third-order cumulants, expresses the non-Gaussianity and the phase information of the speech signal, showing the dependency between frequency components. In this letter, we propose a method of extracting short-time bi-spectral acoustic features by averaging features within a single frame. Merged by principal component analysis (PCA) with the conventional Mel frequency cepstral coefficients (MFCC) based on the power spectrum, the proposed features gave a 6.9% relative reduction in word error rate in Japanese broadcast news transcription experiments.
Word Error Rate Minimization Using an Integrated Confidence Measure
Akio KOBAYASHI Kazuo ONOE Shinichi HOMMA Shoei SATO Toru IMAI

PAPER-Speech and Hearing

Vol: E90-D No: 5
Page(s): 835-843
This paper describes a new criterion for speech recognition using an integrated confidence measure to minimize the word error rate (WER). The conventional criteria for WER minimization obtain the expected WER of a sentence hypothesis merely by comparing it with other hypotheses in an n-best list. The proposed criterion estimates the expected WER by using an integrated confidence measure with word posterior probabilities for a given acoustic input. The integrated confidence measure, which is implemented as a classifier based on maximum entropy (ME) modeling or support vector machines (SVMs), is used to acquire probabilities reflecting whether the word hypotheses are correct. The classifier combines a variety of confidence measures and can handle a temporal sequence of them to attain a more reliable confidence. Our proposed criterion for minimizing WER achieved a WER of 9.8%, a 3.9% reduction relative to conventional n-best rescoring methods, in transcribing Japanese broadcast news in various environments, such as noisy field and spontaneous speech conditions.
Mutual Information Based Dynamic Integration of Multiple Feature Streams for Robust Real-Time LVCSR
Shoei SATO Akio KOBAYASHI Kazuo ONOE Shinichi HOMMA Toru IMAI Tohru TAKAGI Tetsunori KOBAYASHI

PAPER-Speech and Hearing

Vol: E91-D No: 3
Page(s): 815-824
We present a novel method of integrating the likelihoods of multiple feature streams, representing different acoustic aspects, for robust speech recognition. The integration algorithm dynamically calculates a frame-wise stream weight so that a higher weight is given to a stream that is robust to a variety of noisy environments or speaking styles. Such a robust stream is expected to show discriminative ability. A conventional method proposed for the recognition of spoken digits calculates the weights from the entropy of the whole set of HMM states. This paper extends the dynamic weighting to a real-time large-vocabulary continuous speech recognition (LVCSR) system. The proposed weight is calculated in real time from mutual information between an input stream and active HMM states in a search space without an additional likelihood calculation. Furthermore, the mutual information takes the width of the search space into account by calculating the marginal entropy from the number of active states. In this paper, we integrate three features that are extracted through auditory filters by taking into account the human auditory system's ability to extract amplitude and frequency modulations. As a result, features representing energy, amplitude drift, and resonant frequency drift are integrated. These features are expected to provide complementary clues for speech recognition. Speech recognition experiments on field reports and spontaneous commentary from Japanese broadcast news showed that the proposed method reduced word errors by 9.2% in field reports and 4.7% in spontaneous commentaries relative to the best result obtained from a single stream.
Online Speech Detection and Dual-Gender Speech Recognition for Captioning Broadcast News
Toru IMAI Shoei SATO Shinichi HOMMA Kazuo ONOE Akio KOBAYASHI

PAPER-Speech and Hearing

Vol: E90-D No: 8
Page(s): 1286-1291
This paper describes a new method to detect speech segments online while identifying gender attributes for efficient dual gender-dependent speech recognition and broadcast news captioning. The proposed online speech detection performs dual-gender phoneme recognition and detects a start-point and an end-point based on the ratio between the cumulative phoneme likelihood and the cumulative non-speech likelihood with a very small delay from the audio input. While obtaining the speech segments, the phoneme recognizer also identifies gender attributes with high discrimination in order to guide the subsequent dual-gender continuous speech recognizer efficiently. As soon as the start-point is detected, the continuous speech recognizer with parallel gender-dependent acoustic models starts a search and allows search transitions between male and female in a speech segment based on the gender attributes. Speech recognition experiments on conversational commentaries and field reporting from Japanese broadcast news showed that the proposed speech detection method was effective in reducing the false rejection rate from 4.6% to 0.53% and also in reducing recognition errors in comparison with a conventional method using adaptive energy thresholds. It was also effective in identifying the gender attributes, with 99.7% of words identified correctly. With the new speech detection and the gender identification, the proposed dual-gender speech recognition significantly reduced the word error rate by 11.2% relative to a conventional gender-independent system, while keeping the computational cost feasible for real-time operation.
Fluctuation Tolerant Charge-Integration Read Scheme for Ultrafast DNA Sequencing with Nanopore Device
Kazuo ONO Yoshimitsu YANAGAWA Akira KOTABE Riichiro TAKEMURA Tatsuo NAKAGAWA Tomio IWASAKI Takayuki KAWAHARA

PAPER

Vol: E95-C No: 4
Page(s): 651-660
A charge-integration read scheme has been developed for a solid-nanopore DNA sequencer that determines a genome by direct electrical measurement of the transverse tunneling current in single-stranded DNA. The magnitude of the current was simulated with a first-principles molecular dynamics method. It was found that the magnitude is as small as the sub-picoampere range, and that the signals from the four bases show wide distributions that overlap between bases. The distribution is believed to originate from translational and rotational motion of DNA in a nanopore with a frequency of over 10^5 Hz. A sequencing scheme is presented to distinguish the distributed signals. The scheme makes the widely distributed signals converge through time integration by accumulating charge at the capacitance of the nanopore device and read circuits. We estimated that an integration time of 1.4 ms is sufficient to obtain a signal difference of over 10 mV for distinguishing between DNA bases. Moreover, the time is shortened if paired bases, such as A-T and C-G in double-stranded DNA, can be measured simultaneously with two nanopores. Circuit simulations, which included the capacitance of a nanopore calculated with a device simulator, successfully distinguished between DNA bases in less than 2.0 ms. The speed is roughly six orders of magnitude faster than that of a conventional DNA sequencer. It is possible to determine the human genome in one day if 100 nanopores are operated in parallel.
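The one-day estimate at the end of this abstract can be checked with simple arithmetic. The sketch below assumes a human genome of roughly 3 x 10^9 bases (an approximate figure not stated in the abstract) and uses the 2.0 ms upper bound per base and the 100 parallel nanopores that the abstract does state. The function name is hypothetical, and overheads such as repeated reads or DNA handling are ignored.

```python
GENOME_BASES = 3.0e9   # approximate human genome size (assumption, not from the abstract)
READ_TIME_S = 2.0e-3   # upper bound on read time per base, from the abstract
NANOPORES = 100        # number of parallel nanopores, from the abstract


def sequencing_time_hours(bases, read_time_s, nanopores):
    """Total read time in hours when bases are split evenly across nanopores."""
    return bases * read_time_s / nanopores / 3600.0


hours = sequencing_time_hours(GENOME_BASES, READ_TIME_S, NANOPORES)
print(f"Estimated read time: {hours:.1f} hours")
```

Under these assumptions the total comes to under 24 hours, which is consistent with the claim of determining the human genome in one day.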
Simultaneous Subtitling System for Broadcast News Programs with a Speech Recognizer
Akio ANDO Toru IMAI Akio KOBAYASHI Shinichi HOMMA Jun GOTO Nobumasa SEIYAMA Takeshi MISHIMA Takeshi KOBAYAKAWA Shoei SATO Kazuo ONOE Hiroyuki SEGI Atsushi IMAI Atsushi MATSUI Akira NAKAMURA Hideki TANAKA Tohru TAKAGI Eiichi MIYASAKA Haruo ISONO

INVITED PAPER

Vol: E86-D No: 1
Page(s): 15-25
There is a strong demand to expand captioned broadcasting for TV news programs in Japan. However, keyboard entry of captioned manuscripts for news programs cannot keep pace with the speed of speech, because in Japanese it takes time to select the correct characters from among homonyms. In order to implement simultaneous subtitled broadcasting for Japanese news programs, a simultaneous subtitling system based on speech recognition has been developed. This system consists of a real-time speech recognition system that handles broadcast news transcription and a recognition-error correction system in which mistakes in the recognition result are corrected manually with a short delay. NHK started simultaneous subtitled broadcasting for the news program "News 7" on the evening of March 27, 2000.
