IEICE global.ieice.org Site

Keyword Search Result

[Keyword] ISY(44hit)

1-20hit(44hit)

Unbiased Pseudo-Labeling for Learning with Noisy Labels
Ryota HIGASHIMOTO Soh YOSHIDA Takashi HORIHATA Mitsuji MUNEYASU

LETTER

Publicized:
2023/09/19
Vol:
E107-D No:1
Page(s):
44-48
Noisy labels in training data can significantly harm the performance of deep neural networks (DNNs). Recent research on learning with noisy labels uses a property of DNNs called the memorization effect to divide the training data into a set of data with reliable labels and a set of data with unreliable labels. Methods introducing semi-supervised learning strategies discard the unreliable labels and assign pseudo-labels generated from the confident predictions of the model. So far, this semi-supervised strategy has yielded the best results in this field. However, we observe that even when models are trained on balanced data, the distribution of the pseudo-labels can still exhibit an imbalance that is driven by data similarity. Additionally, a data bias is seen that originates from the division of the training data using the semi-supervised method. If we address both types of bias that arise from pseudo-labels, we can avoid the decrease in generalization performance caused by biased noisy pseudo-labels. We propose a learning method with noisy labels that introduces unbiased pseudo-labeling based on causal inference. The proposed method achieves significant accuracy gains in experiments at high noise rates on the standard benchmarks CIFAR-10 and CIFAR-100.
Sample Selection Approach with Number of False Predictions for Learning with Noisy Labels
Yuichiro NOMURA Takio KURITA

PAPER-Image Recognition, Computer Vision

Publicized:
2022/07/21
Vol:
E105-D No:10
Page(s):
1759-1768
In recent years, deep neural networks (DNNs) have made a significant impact on a variety of research fields and applications. One drawback of DNNs is that they require a huge amount of training data. Since it is very expensive to ask experts to label the data, many non-expert data collection methods such as web crawling have been proposed. However, datasets created by non-experts often contain corrupted labels, and DNNs trained on such datasets are unreliable. Since DNNs have an enormous number of parameters, they tend to overfit to noisy labels, resulting in poor generalization performance. This problem is called Learning with Noisy Labels (LNL). Recent studies have shown that DNNs are robust to noisy labels in the early stage of learning, before overfitting to them, because DNNs learn simple patterns first. Therefore, DNNs tend to output true labels for samples with noisy labels in the early stage of learning, and the number of false predictions for samples with noisy labels is higher than for samples with clean labels. Based on these observations, we propose a new sample selection approach for LNL using the number of false predictions. Our method periodically collects the records of false predictions during training and selects samples with a low number of false predictions from the recent records. Then our method iteratively performs sample selection and trains a DNN model using the updated dataset. Since the model is trained with more clean samples and records more accurate false predictions for sample selection, the generalization performance of the model gradually increases. We evaluated our method on two benchmark datasets, CIFAR-10 and CIFAR-100, with synthetically generated noisy labels, and obtained results that are better than or comparable to the state-of-the-art approaches.
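The selection step described in this abstract can be illustrated with a minimal sketch. It assumes that a "false prediction" is a prediction that disagrees with the given (possibly noisy) label; the function name, the `num_keep` parameter, and the toy data are hypothetical and are not taken from the paper.

```python
def select_low_false_prediction_samples(prediction_history, given_labels, num_keep):
    """Count, per sample, how often recent predictions disagreed with the
    given label, then keep the num_keep samples with the fewest disagreements.

    prediction_history: list of per-epoch prediction lists (the recent records).
    given_labels: the given, possibly corrupted, label of each sample.
    num_keep: number of samples to keep (hypothetical hyperparameter).
    """
    n = len(given_labels)
    false_counts = [0] * n
    for epoch_preds in prediction_history:
        for i in range(n):
            if epoch_preds[i] != given_labels[i]:
                false_counts[i] += 1
    # Rank by false-prediction count; break ties by sample index.
    ranked = sorted(range(n), key=lambda i: (false_counts[i], i))
    return sorted(ranked[:num_keep]), false_counts


# Toy records: predictions for 5 samples over the 3 most recent epochs.
history = [
    [0, 1, 2, 1, 0],
    [0, 1, 2, 2, 0],
    [0, 1, 1, 2, 0],
]
labels = [0, 1, 2, 0, 1]  # samples 3 and 4 carry labels the model never predicts
selected, counts = select_low_false_prediction_samples(history, labels, num_keep=3)
print(selected, counts)
```

In an iterative scheme like the one the abstract outlines, the selected indices would define the training set for the next round, after which the records are collected again.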
Consistency Regularization on Clean Samples for Learning with Noisy Labels
Yuichiro NOMURA Takio KURITA

PAPER-Artificial Intelligence, Data Mining

Publicized:
2021/10/28
Vol:
E105-D No:2
Page(s):
387-395
In recent years, deep learning has achieved significant results in various areas of machine learning. Deep learning requires a huge amount of data to train a model, and data collection techniques such as web crawling have been developed. However, there is a risk that these data collection techniques may generate incorrect labels. If a deep learning model for image classification is trained on a dataset with noisy labels, the generalization performance significantly decreases. This problem is called Learning with Noisy Labels (LNL). A recent study on LNL, called DivideMix [1], has successfully divided the dataset into samples with clean labels and samples with noisy labels by modeling the loss distribution of all training samples with a two-component Gaussian mixture model (GMM). It then treats the divided dataset as labeled and unlabeled samples and trains the classification model in a semi-supervised manner. Since the selected samples have lower loss values and are easy to classify, the model is at risk of overfitting to simple patterns during training. To train the classification model without overfitting to the simple patterns, we propose to introduce consistency regularization on the samples selected by the GMM. The consistency regularization perturbs input images and encourages the model to output the same value for the perturbed images and the original images. The classification model simultaneously receives the samples selected as clean and their perturbed versions, and it achieves higher generalization performance with less overfitting to the selected samples. We evaluated our method with synthetically generated noisy labels on CIFAR-10 and CIFAR-100 and obtained results that are comparable to or better than the state-of-the-art method.
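The loss-based split mentioned in this abstract can be sketched as follows. This is a minimal pure-Python illustration of fitting a two-component one-dimensional Gaussian mixture to per-sample losses with EM; the initialization, the variance floor, the 0.5 threshold, and the toy loss values are assumptions for illustration, not the settings of DivideMix or of the paper.

```python
import math


def fit_two_component_gmm(values, n_iter=50):
    """Fit a 1-D two-component Gaussian mixture with plain EM.

    Returns (means, responsibilities), where responsibilities[i][k] is the
    posterior probability that values[i] belongs to component k.
    """
    lo, hi = min(values), max(values)
    means = [lo, hi]                                  # start at the extremes
    variances = [max((hi - lo) ** 2 / 4.0, 1e-6)] * 2
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior of each component for each value.
        resp = []
        for x in values:
            dens = [
                weights[k]
                * math.exp(-((x - means[k]) ** 2) / (2.0 * variances[k]))
                / math.sqrt(2.0 * math.pi * variances[k])
                for k in range(2)
            ]
            total = sum(dens) or 1e-300               # guard against underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var, 1e-6)             # variance floor (assumption)
            weights[k] = nk / len(values)
    return means, resp


# Toy per-sample losses: five small (likely clean) and three large (likely noisy).
losses = [0.05, 0.10, 0.08, 0.12, 0.07, 2.1, 1.9, 2.3]
means, resp = fit_two_component_gmm(losses)
clean_k = 0 if means[0] < means[1] else 1             # low-mean component = "clean"
clean_idx = [i for i, r in enumerate(resp) if r[clean_k] > 0.5]
noisy_idx = [i for i, r in enumerate(resp) if r[clean_k] <= 0.5]
print(clean_idx, noisy_idx)
```

The samples in `clean_idx` would play the role of the labeled set and those in `noisy_idx` the unlabeled set in a semi-supervised training step.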
Noisy Localization Annotation Refinement for Object Detection
Jiafeng MAO Qing YU Kiyoharu AIZAWA

PAPER-Image Recognition, Computer Vision

Publicized:
2021/05/25
Vol:
E104-D No:9
Page(s):
1478-1485
A well-annotated dataset is crucial to the training of object detectors. However, the production of finely annotated datasets for object detection tasks is extremely labor-intensive. Crowdsourcing is therefore often used to create datasets, and such datasets tend to contain incorrect annotations such as inaccurate localization bounding boxes. In this study, we highlight the problem of object detection with noisy bounding box annotations and show that these noisy annotations are harmful to the performance of deep neural networks. To solve this problem, we further propose a framework that allows the network to correct the noisy datasets by alternating refinement. The experimental results demonstrate that our proposed framework can significantly alleviate the influence of noise on model performance.
A Game-Theoretic Approach for Community Detection in Signed Networks
Shuaihui WANG Guyu HU Zhisong PAN Jin ZHANG Dong LI

PAPER-Graphs and Networks

Vol:
E102-A No:6
Page(s):
796-807
Signed networks are ubiquitous in the real world, and it is of great significance to study the problem of community detection in them. In general, the behavior of nodes in a signed network is rational, which matches the behavior of players in game theory, so game theory can be used to model the process of community formation. Unlike unsigned networks, signed networks include both positive and negative edges, representing relationships between friends and foes, respectively. In the process of community formation, nodes usually choose to be in the same community as their friends and in different communities from their foes. Based on this idea, we propose a game-theoretic model to address the problem of community detection in signed networks. Taking nodes as players, we build a gain function based on the numbers of positive edges and negative edges inside and outside a community, and prove the existence of a Nash equilibrium point. In this way, when the game reaches the Nash equilibrium state, the optimal strategy space for all nodes is the result of the final community division. To systematically investigate the performance of our method, extensive experiments on both synthetic networks and real-world networks are conducted. Experimental results demonstrate that our method is not only more accurate than other existing algorithms, but also more robust to noise.
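The idea of nodes acting as players can be illustrated with a small best-response sketch. The gain function below (positive edges inside plus negative edges outside, minus negative edges inside and positive edges outside) is an assumed stand-in, since the abstract does not give the paper's exact formula; the function names and the toy graph are also hypothetical.

```python
def node_gain(node, community, labels, signed_neighbors):
    """Assumed gain: +1 for each friend inside or foe outside the community,
    -1 for each foe inside or friend outside."""
    gain = 0
    for neighbor, sign in signed_neighbors[node]:
        inside = labels[neighbor] == community
        gain += sign if inside else -sign
    return gain


def best_response_communities(n, positive_edges, negative_edges, max_rounds=100):
    """Let each node move to the community that strictly increases its gain,
    and stop when no node wants to move (a pure Nash equilibrium of this game)."""
    signed_neighbors = [[] for _ in range(n)]
    for u, v in positive_edges:
        signed_neighbors[u].append((v, 1))
        signed_neighbors[v].append((u, 1))
    for u, v in negative_edges:
        signed_neighbors[u].append((v, -1))
        signed_neighbors[v].append((u, -1))
    labels = list(range(n))                 # every node starts in its own community
    for _ in range(max_rounds):
        moved = False
        for node in range(n):
            best = labels[node]
            best_gain = node_gain(node, best, labels, signed_neighbors)
            for candidate in sorted(set(labels)):
                g = node_gain(node, candidate, labels, signed_neighbors)
                if g > best_gain:           # move only on strict improvement
                    best, best_gain = candidate, g
            if best != labels[node]:
                labels[node] = best
                moved = True
        if not moved:
            break
    return labels


# Toy signed network: two friendly triangles joined by two hostile edges.
pos = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
neg = [(2, 3), (0, 5)]
labels = best_response_communities(6, pos, neg)
print(labels)
```

With this symmetric gain, every strict improvement by one node raises the total number of satisfied edges, so the loop cannot cycle and ends in a state where no node benefits from moving alone.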
In-Vehicle Voice Interface with Improved Utterance Classification Accuracy Using Off-the-Shelf Cloud Speech Recognizer
Takeshi HOMMA Yasunari OBUCHI Kazuaki SHIMA Rintaro IKESHITA Hiroaki KOKUBO Takuya MATSUMOTO

PAPER-Speech and Hearing

Publicized:
2018/08/31
Vol:
E101-D No:12
Page(s):
3123-3137
For voice-enabled car navigation systems that use a multi-purpose cloud speech recognition service (cloud ASR), utterance classification that is robust against speech recognition errors is needed to realize a user-friendly voice interface. The purpose of this study is to improve the accuracy of utterance classification for voice-enabled car navigation systems when inputs to a classifier are error-prone speech recognition results obtained from a cloud ASR. The role of utterance classification is to predict which car navigation function a user wants to execute from a spontaneous utterance. A cloud ASR causes speech recognition errors due to the noises that occur when traveling in a car, and the errors degrade the accuracy of utterance classification. There are many methods for reducing the number of speech recognition errors by modifying the inside of a speech recognizer. However, application developers cannot apply these methods to cloud ASRs because they cannot customize the ASRs. In this paper, we propose a system for improving the accuracy of utterance classification by modifying both speech-signal inputs to a cloud ASR and recognized-sentence outputs from an ASR. First, our system performs speech enhancement on a user's utterance and then sends both enhanced and non-enhanced speech signals to a cloud ASR. Speech recognition results from both speech signals are merged to reduce the number of recognition errors. Second, to reduce the number of utterance classification errors, we propose a data augmentation method, which we call “optimal doping,” where not only accurate transcriptions but also error-prone recognized sentences are added to training data. An evaluation with real user utterances spoken to car navigation products showed that our system reduces the number of utterance classification errors by 54% from a baseline condition. Finally, we propose a semi-automatic upgrading approach for classifiers to benefit from the improved performance of cloud ASRs.
Harmonic-Based Robust Voice Activity Detection for Enhanced Low SNR Noisy Speech Recognition System
Po-Yi SHIH Po-Chuan LIN Jhing-Fa WANG

PAPER-Speech and Hearing

Vol:
E99-A No:11
Page(s):
1928-1936
This paper describes a novel harmonic-based robust voice activity detection (H-RVAD) method with a harmonic spectral local peak (HSLP) feature. HSLP is extracted by spectral amplitude analysis between adjacent formants, and this characteristic can be used to accurately identify and verify audio streams containing meaningful human speech in low-SNR environments. In addition, an enhanced low-SNR noisy speech recognition system framework with a wakeup module, a speech recognition module, and a confirmation module is proposed. Users can confirm or reject the system feedback when a recognition result is given, which prevents voiced noise from misleading the recognition result. The H-RVAD method is evaluated on the AURORA2 corpus with eight types of noise and three SNR levels, and increased overall average performance from 4% to 20%. In home noise, the H-RVAD method improved the average sentence recognition rate from 4% to 14%.
MTF-Based Kalman Filtering with Linear Prediction for Power Envelope Restoration in Noisy Reverberant Environments
Yang LIU Shota MORITA Masashi UNOKI

PAPER-Digital Signal Processing

Vol:
E99-A No:2
Page(s):
560-569
This paper proposes a method based on modulation transfer function (MTF) to restore the power envelope of noisy reverberant speech by using a Kalman filter with linear prediction (LP). Its advantage is that it can simultaneously suppress the effects of noise and reverberation by restoring the smeared MTF without measuring room impulse responses. This scheme has two processes: power envelope subtraction and power envelope inverse filtering. In the subtraction process, the statistical properties of observation noise and driving noise for power envelope are investigated for the criteria of the Kalman filter which requires noise to be white and Gaussian. Furthermore, LP coefficients drastically affect the Kalman filter performance, and a method is developed for deriving LP coefficients from noisy reverberant speech. In the dereverberation process, an inverse filtering method is applied to remove the effects of reverberation. Objective experiments were conducted under various noisy reverberant conditions to evaluate how well the proposed Kalman filtering method based on MTF improves the signal-to-error ratio (SER) and correlation between restored power envelopes compared with conventional methods. Results showed that the proposed Kalman filtering method based on MTF can improve SER and correlation more than conventional methods.
Contour-Based Binary Image Orientation Detection by Orientation Context and Roulette Distance
Jian ZHOU Takafumi MATSUMARU

PAPER-Image

Vol:
E99-A No:2
Page(s):
621-633
This paper proposes a novel technique to detect the orientation of an image from its contour, which is noisy to varying degrees. Most image orientation detection methods address landscape images or images of a single object, and in these cases the contours are assumed to be free of noise. This paper focuses on contours that become noisy after image segmentation. A polar orientation descriptor, Orientation Context, is used as a feature to describe the coarse distribution of the contour points. This descriptor is verified by theory and experiment to be independent of translation, isotropic scaling, and rotation. The relative orientation is determined by the minimum distance, Roulette Distance, between the descriptor of a template image and that of a test image. The proposed method can detect the direction over the interval from 0 to 359 degrees, which is wider than that of the former contour-based method (Distance Phase [1], from 0 to 179 degrees). Moreover, experimental results show that the proposed method detects orientation more accurately both for the normal binary image (Noise-0, Accuracy-1: 84.8%) (defined later) and for the binary image with slight contour noise (Noise-1, Accuracy-1: 73.5%) than Distance Phase does (Noise-0, Accuracy-1: 56.3%; Noise-1, Accuracy-1: 27.5%). Although the proposed method (O(op^2)) takes more time to detect the orientation than Distance Phase (O(st)), it can run in real time, including preprocessing, at a frame rate of 30.
On Improving the Performance of a Speech Model-Based Blind Reverberation Time Estimation in Noisy Environments
Tung-chin LEE Young-cheol PARK Dae-hee YOUN

LETTER-Measurement Technology

Vol:
E97-A No:12
Page(s):
2688-2692
This paper proposes a method of improving the performance of blind reverberation time (RT) estimation in noisy environments. RT estimation is conducted using a maximum likelihood (ML) method based on the autocorrelation function of the linear predictive residual signal. To reduce the effect of environmental noise, a noise reduction technique is applied to the noisy speech signal. In addition, a frequency coefficient selection is performed to eliminate signal components with low signal-to-noise ratio (SNR). Experimental results confirm that the proposed method improves the accuracy of RT measures, particularly when the speech signal is corrupted by a colored noise with a narrow bandwidth.
Binaural Sound Source Localization in Noisy Reverberant Environments Based on Equalization-Cancellation Theory
Thanh-Duc CHAU Junfeng LI Masato AKAGI

PAPER-Engineering Acoustics

Vol:
E97-A No:10
Page(s):
2011-2020
Sound source localization (SSL) with a binaural input in practical environments is a challenging task due to the effects of noise and reverberation. In the field of psychoacoustics, one of the theories that explain the mechanism of human perception in such environments is the well-known equalization-cancellation (EC) model. Motivated by the EC theory, this paper investigates a binaural SSL method that integrates EC procedures into a beamforming technique. The principal idea is that the EC procedures are first used to eliminate the sound signal component at each candidate direction; the direction of the sound source is then determined as the direction at which the residual energy is minimal. The EC procedures applied in the proposed method differ from those in traditional EC models in that the interference signals in rooms are accounted for in the E and C operations based on limited prior information. Experimental results demonstrate that our proposed method outperforms the traditional SSL algorithms when noise and reverberation are present simultaneously.
An 8-Bit 100-kS/s CMOS Single-Ended SA ADC for 88 Point EEG/MEG Acquisition System
Ji-Hun EO Yeon-Ho JEONG Young-Chan JANG

PAPER

Vol:
E96-A No:2
Page(s):
453-458
An 8-bit 100-kS/s successive approximation (SA) analog-to-digital converter (ADC) is proposed for measuring EEG and MEG signals in an 88-point acquisition system. The architectures of an SA ADC with a single-ended analog input and a split-capacitor-based digital-to-analog converter (SC-DAC) are used to reduce the power consumption and chip area of the entire ADC. The proposed SA ADC uses a time-domain comparator that has an input offset self-calibration circuit. It also includes a serial output interface to support a daisy channel that reduces the number of channels for the multi-point sensor interface. It is designed in a 0.35-µm 1-poly 6-metal CMOS process with a 3.3 V supply so that it can be implemented together with conventional analog circuits such as a low-noise amplifier. The measured DNL and INL of the SA ADC are +0.63/-0.46 and +0.46/-0.51 LSB, respectively. The SNDR is 48.39 dB for a 1.11 kHz analog input signal at a sampling rate of 100 kS/s. The power consumption and core area are 38.71 µW and 0.059 mm², respectively.
Simple Nonbinary Coding Strategy for Very Noisy Relay Channels
Puripong SUTHISOPAPAN Kenta KASAI Anupap MEESOMBOON Virasit IMTAWIL Kohichi SAKANIWA

PAPER-Coding Theory

Vol:
E95-A No:12
Page(s):
2122-2129
From an information-theoretic point of view, it is well known that the capacity of relay channels comprising three terminals is much greater than that of two-terminal direct channels, especially in the low-SNR region. Previously invented relay coding strategies have not been designed to achieve this relaying gain in the low-SNR region. In this paper, we propose a new simple coding strategy for a relay channel with low SNR or, equivalently, for a very noisy relay channel. Multiplicative repetition is used to design this simple coding strategy. We claim that the proposed strategy is simple because the destination and the relay can decode with almost the same computational complexity by sharing the same decoder structure. An appropriate static power allocation that yields a maximum throughput close to the optimal one at low SNRs is also suggested. Under practical constraints such as equal time-sharing, the asymptotic performance of this simple strategy is within 0.5 dB of the achievable rate of a relay channel. Furthermore, the performance at a few thousand bits enjoys a relaying gain of approximately 1 dB.
Sequence-Based Pronunciation Variation Modeling for Spontaneous ASR Using a Noisy Channel Approach
Hansjorg HOFMANN Sakriani SAKTI Chiori HORI Hideki KASHIOKA Satoshi NAKAMURA Wolfgang MINKER

PAPER-Speech and Hearing

Vol:
E95-D No:8
Page(s):
2084-2093
The performance of English automatic speech recognition systems decreases when recognizing spontaneous speech, mainly due to multiple pronunciation variants in the utterances. Previous approaches address this problem by modeling the alteration of the pronunciation on a phoneme-to-phoneme level. However, the phonetic transformation effects induced by the pronunciation of the whole sentence have not yet been considered. In this article, the sequence-based pronunciation variation is modeled using a noisy channel approach where the spontaneous phoneme sequence is considered as a “noisy” string and the goal is to recover the “clean” string of the word sequence. In this way, the whole word sequence and its effect on the alteration of the phonemes are taken into consideration. Moreover, the system learns not only the phoneme transformation but also the mapping from the phoneme to the word directly. In this study, the phonemes are first recognized with the present recognition system, and the pronunciation variation model based on the noisy channel approach then maps from the phoneme to the word level. Two well-known natural language processing approaches are adopted and derived from the noisy channel model theory: joint-sequence models and statistical machine translation. Both are applied, and various experiments are conducted using spontaneous speech recorded by microphone and telephone.
An Efficient Wide-Baseline Dense Matching Descriptor
Yanli WAN Zhenjiang MIAO Zhen TANG Lili WAN Zhe WANG

LETTER-Image Recognition, Computer Vision

Vol:
E95-D No:7
Page(s):
2021-2024
This letter proposes an efficient local descriptor for wide-baseline dense matching. It improves the existing Daisy descriptor by combining intensity-based Haar wavelet response with a new color-based ratio model. The color ratio model is invariant to changes of viewing direction, object geometry, and the direction, intensity and spectral power distribution of the illumination. The experiments show that our descriptor has high discriminative power and robustness.
Speaker Recognition by Combining MFCC and Phase Information in Noisy Conditions
Longbiao WANG Kazue MINAMI Kazumasa YAMAMOTO Seiichi NAKAGAWA

PAPER-Speaker Recognition

Vol:
E93-D No:9
Page(s):
2397-2406
In this paper, we investigate the effectiveness of phase for speaker recognition in noisy conditions and combine the phase information with mel-frequency cepstral coefficients (MFCCs). To date, almost all speaker recognition methods are based on MFCCs, even in noisy conditions. For MFCCs, which dominantly capture vocal tract information, only the magnitude of the Fourier transform of time-domain speech frames is used, and phase information has been ignored. The phase information and MFCCs are expected to be highly complementary because the phase information includes rich voice source information. Furthermore, some studies have reported that phase-based features are robust to noise. In our previous study, a phase information extraction method that normalizes the variation in the phase depending on the clipping position of the input speech was proposed, and the performance of the combination of the phase information and MFCCs was remarkably better than that of MFCCs alone. In this paper, we evaluate the robustness of the proposed phase information for speaker identification in noisy conditions. Spectral subtraction, a method that skips frames with low energy/Signal-to-Noise (SN) ratio, and noisy speech training models are used to analyze the effect of the phase information and MFCCs in noisy conditions. The NTT database and the JNAS (Japanese Newspaper Article Sentences) database, with stationary/non-stationary noise added, were used to evaluate our proposed method. MFCCs outperformed the phase information for clean speech. On the other hand, the degradation of the phase information was significantly smaller than that of MFCCs for noisy speech. The individual result of the phase information was even better than that of MFCCs in many cases when clean speech training models were used. By deleting unreliable frames (frames having low energy/SN), the speaker identification performance was improved significantly. By integrating the phase information with MFCCs, the speaker identification error reduction rate was about 30%-60% compared with the standard MFCC-based method.
Speech Recognition under Multiple Noise Environment Based on Multi-Mixture HMM and Weight Optimization by the Aspect Model
Seong-Jun HAHM Yuichi OHKAWA Masashi ITO Motoyuki SUZUKI Akinori ITO Shozo MAKINO

PAPER-Robust Speech Recognition

Vol:
E93-D No:9
Page(s):
2407-2416
In this paper, we propose an acoustic model that is robust to multiple noise environments, as well as a method for adapting the acoustic model to an environment to improve the model. The model is called "the multi-mixture model," which is based on a mixture of different HMMs, each of which is trained using speech under different noise conditions. Speech recognition experiments showed that the proposed model performs better than the conventional multi-condition model. The method for adaptation is based on the aspect model, which is a "mixture-of-mixture" model. To realize adaptation using an extremely small amount of adaptation data (i.e., a few seconds), we train a small number of mixture models, which can be interpreted as models for "clusters" of noise environments. Then, the models are mixed using weights, which are determined according to the adaptation data. The experimental results showed that the adaptation based on the aspect model improved the word accuracy in a heavy noise environment and showed no performance deterioration for all noise conditions, while the conventional methods either did not improve the performance or showed both improvement and degradation of recognition performance according to noise conditions.
A Novel SNR Estimation Technique Associated with Hybrid ARQ
Qingchun CHEN Pingzhi FAN

PAPER-Communication Theory and Signals

Vol:
E92-A No:11
Page(s):
2895-2909
By using multiple repeated signal replicas to formulate the accumulative observed noisy signal sequence (AONSS) or the differential observed noisy signal sequence (DONSS) in the hybrid ARQ system, a novel data-aided maximum likelihood (DA ML) SNR estimation technique and a blind ML SNR estimation technique are proposed for the AWGN channel. It is revealed that the conventional DA ML estimate is a special case of the novel DA ML estimate, and both the proposed DA ML and the proposed blind ML SNR estimation techniques can offer satisfactory SNR estimation without introducing significant additional complexity to the existing hybrid ARQ scheme. Based on the AONSS, both the generalized deterministic and the random Cramer-Rao lower bounds (GCRLBs), which include the traditional Cramer-Rao lower bounds (CRLBs) as special cases, are also derived. Finally, the applicability of the proposed SNR estimation techniques based on the AONSS and the DONSS is validated through numerical analysis and simulation results.
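For orientation, the conventional data-aided ML estimate that the abstract refers to as a special case can be sketched for a simple real-valued model. The sketch assumes known BPSK pilot symbols s_k in {+1, -1} and observations y_k = a*s_k + n_k with white Gaussian noise; it does not implement the paper's AONSS- or DONSS-based estimators, and the function name and toy values are hypothetical.

```python
def da_ml_snr_estimate(received, pilots):
    """Conventional data-aided ML SNR estimate for real BPSK over AWGN.

    received: observed samples y_k.
    pilots: known transmitted symbols s_k, each +1 or -1.
    Returns (snr_linear, amplitude_estimate, noise_variance_estimate).
    """
    n = len(received)
    # ML amplitude estimate: correlate the observations with the known symbols.
    amplitude = sum(y * s for y, s in zip(received, pilots)) / n
    # ML noise variance: total received power minus estimated signal power.
    total_power = sum(y * y for y in received) / n
    noise_var = total_power - amplitude ** 2
    return amplitude ** 2 / noise_var, amplitude, noise_var


# Toy data: amplitude 1 with noise samples of +0.5 or -0.5 on each symbol.
pilots = [1, -1, 1, -1]
received = [1.5, -0.5, 0.5, -1.5]
snr, amplitude, noise_var = da_ml_snr_estimate(received, pilots)
print(snr, amplitude, noise_var)
```

In a hybrid ARQ setting, repeated replicas of the same packet provide additional observations of the same symbols, which is the resource the proposed estimators exploit.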
Influence of Lombard Effect: Accuracy Analysis of Simulation-Based Assessments of Noisy Speech Recognition Systems for Various Recognition Conditions
Tetsuji OGAWA Tetsunori KOBAYASHI

PAPER-Speech and Hearing

Vol:
E92-D No:11
Page(s):
2244-2252
The accuracy of simulation-based assessments of speech recognition systems under noisy conditions is investigated with a focus on the influence of the Lombard effect on speech recognition performance. This investigation was carried out under various recognition conditions: different sound pressure levels of ambient noise, different recognition tasks, such as continuous speech recognition and spoken word recognition, and different recognition systems, i.e., systems with and without adaptation of the acoustic models to ambient noise. Experimental results showed that accurate simulation was not always achieved when dry sources with a neutral talking style were used, but it could be achieved if dry sources that include the influence of the Lombard effect were used; the simulation in the latter case is accurate, irrespective of the recognition conditions.
Sample-Adaptive Product Quantizers with Affine Index Assignments for Noisy Channels
Dong Sik KIM Youngcheol PARK

PAPER-Fundamental Theories for Communications

Vol:
E92-B No:10
Page(s):
3084-3093
When we design a robust vector quantizer (VQ) for noisy channels, an appropriate index assignment function should be contrived to minimize the channel-error effect. For relatively high rates, the complexity for finding an optimal index assignment function is too high to be implemented. To overcome such a problem, we use a structurally constrained VQ, which is called the sample-adaptive product quantizer (SAPQ) [12], for low complexities of quantization and index assignment. The product quantizer (PQ) and its variation SAPQ [13], which are based on the scalar quantizer (SQ) and thus belong to a class of the binary lattice VQ [16], have inherent error resilience even though the conventional affine index assignment functions, such as the natural binary code, are employed. The error resilience of SAPQ is observed in a weak sense through worst-case bounds. Using SAPQ for noisy channels is useful especially for high rates, e.g., > 1 bit/sample, and it is numerically shown that the channel-limit performance of SAPQ is comparable to that of the best codebook permutation of binary switching algorithm (BSA) [23]. Further, the PQ or SAPQ codebook with an affine index assignment function is used for the initial guess of the conventional clustering algorithm, and it is shown that the performance of the best BSA can be easily achieved.
