
Keyword Search Result

[Keyword] microphone (72 hits)

Results 1-20 of 72

  • Spectra Restoration of Bone-Conducted Speech via Attention-Based Contextual Information and Spectro-Temporal Structure Constraint Open Access

    Changyan ZHENG  Tieyong CAO  Jibin YANG  Xiongwei ZHANG  Meng SUN  

     
    LETTER-Digital Signal Processing

      Vol:
    E102-A No:12
      Page(s):
    2001-2007

    Compared with acoustic microphone (AM) speech, bone-conducted microphone (BCM) speech is largely immune to background noise, but suffers from severe loss of information due to the characteristics of the human-body transmission channel. In this letter, a new method for speaker-dependent BCM speech enhancement is proposed, in which we focus on restoring the spectra of the distorted speech. In order to better infer the missing components, an attention-based bidirectional Long Short-Term Memory (AB-BLSTM) is designed to optimize the use of contextual information when modeling the relationship between the spectra of BCM speech and the corresponding clean AM speech. Meanwhile, the Structural SIMilarity (SSIM) metric, a structural error metric originating in image processing, is adopted as the loss function, which constrains the spectro-temporal structures during recovery of the spectra. Experiments demonstrate that, compared with approaches based on a conventional DNN and the mean square error (MSE), the proposed method better recovers the missing phonemes and obtains spectra whose spectro-temporal structure is more similar to the target, which leads to a large improvement in objective metrics.
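
    As a rough illustration of the loss described above, a global SSIM between two magnitude spectrograms can be sketched as follows. This is a minimal NumPy sketch of the standard single-window SSIM formula, not the letter's actual implementation; the constants `c1` and `c2` and the global (non-windowed) simplification are assumptions.

```python
import numpy as np

def ssim(x, y, c1=1e-4, c2=9e-4):
    """Single-window SSIM between two magnitude spectrograms (2-D arrays)."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2)
    )

def ssim_loss(predicted, target):
    # SSIM equals 1 for identical inputs, so minimizing 1 - SSIM
    # pushes the restored spectrum toward the clean AM spectrum.
    return 1.0 - ssim(predicted, target)
```

    In practice SSIM is usually computed over local windows and averaged; the global version above only conveys the structure of the criterion.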

  • Speech Quality Enhancement for In-Ear Microphone Based on Neural Network

    Hochong PARK  Yong-Shik SHIN  Seong-Hyeon SHIN  

     
    LETTER-Speech and Hearing

      Publicized:
    2019/05/15
      Vol:
    E102-D No:8
      Page(s):
    1594-1597

    Speech captured by an in-ear microphone placed inside an occluded ear has a high signal-to-noise ratio; however, it has different sound characteristics compared to normal speech captured through air conduction. In this study, a method for blind speech quality enhancement is proposed that can convert speech captured by an in-ear microphone to one that resembles normal speech. The proposed method estimates an input-dependent enhancement function by using a neural network in the feature domain and enhances the captured speech via time-domain filtering. Subjective and objective evaluations confirm that the speech enhanced using our proposed method sounds more similar to normal speech than that enhanced using conventional equalizer-based methods.

  • Design and Analysis of First-Order Steerable Nonorthogonal Differential Microphone Arrays

    Qiang YU  Xiaoguang WU  Yaping BAO  

     
    LETTER-Engineering Acoustics

      Vol:
    E101-A No:10
      Page(s):
    1687-1692

    Differential microphone arrays (DMAs) have been widely used in hands-free communication systems because of their frequency-invariant beampatterns, high directivity factors and small apertures. Considering that the position of an acoustic source usually moves within a certain range in real applications, this letter proposes an approach to constructing a steerable first-order differential beampattern using four omnidirectional microphones arranged in a non-orthogonal circular geometry. Theoretical analysis and simulation results show that the beampattern constructed via this method achieves the same directivity factor (DF) as traditional DMAs and a higher white noise gain (WNG) within a certain angular range. The simulation results also show that the proposed method is applicable to speech signal processing. In experiments, we demonstrate the effectiveness and low computational cost of the proposed method.
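
    For reference, the ideal first-order differential beampattern that such designs aim to realize has the textbook closed form B(θ) = α + (1 − α)cos(θ − θs). A minimal sketch of that generic formula follows; it is not the letter's four-microphone construction, and the parameter names are illustrative only.

```python
import numpy as np

def first_order_pattern(theta, steer=0.0, alpha=0.5):
    """Ideal first-order differential beampattern steered to `steer` (radians).

    alpha = 0.5 gives a cardioid: unity gain toward the steering
    direction and a null on the opposite side. The pattern is
    frequency-invariant, which is the key appeal of DMAs."""
    return alpha + (1.0 - alpha) * np.cos(theta - steer)
```

    Steering only shifts the angle θs; the DF of the pattern is unchanged, which is why steerability and directivity can be decoupled.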

  • Integration of Spatial Cue-Based Noise Reduction and Speech Model-Based Source Restoration for Real Time Speech Enhancement

    Tomoko KAWASE  Kenta NIWA  Masakiyo FUJIMOTO  Kazunori KOBAYASHI  Shoko ARAKI  Tomohiro NAKATANI  

     
    PAPER-Digital Signal Processing

      Vol:
    E100-A No:5
      Page(s):
    1127-1136

    We propose a microphone array speech enhancement method that integrates spatial-cue-based source power spectral density (PSD) estimation and statistical speech model-based PSD estimation. The goal of this research was to clearly pick up target speech even in noisy environments such as crowded places, factories, and cars running at high speed. Beamforming with post-Wiener filtering is commonly used in many conventional studies on microphone-array noise reduction. Calculating a Wiener filter requires the speech/noise PSDs, which are estimated using spatial cues obtained from microphone observations. Assuming that the sound sources are sparse in the temporal-spatial domain, the speech/noise PSDs can be estimated accurately; however, PSD estimation errors increase under circumstances where this assumption does not hold. In this study, we integrated speech models with a PSD-estimation-in-beamspace method to correct speech/noise PSD estimation errors. A rough noise PSD estimate was obtained frame by frame by analyzing spatial cues from the array observations. By combining the noise PSD with a statistical model of clean speech, the relationship between the PSD of the observed signal and that of the target speech, hereafter called the observation model, could be described without pre-training. By exploiting Bayes' theorem, a Wiener filter is statistically generated from the observation models. Experiments conducted to evaluate the proposed method showed that the signal-to-noise ratio and naturalness of the output speech signal were significantly better than those obtained with conventional methods.
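
    The post-Wiener filter that such PSD estimates feed can be sketched in a few lines. This is the generic per-frequency Wiener gain with a spectral floor, not the paper's integrated estimator; the floor value and the tiny regularizer are assumptions.

```python
import numpy as np

def wiener_gain(speech_psd, noise_psd, floor=0.05):
    """Per-frequency Wiener gain G = S / (S + N).

    speech_psd, noise_psd: nonnegative arrays of per-bin PSD estimates.
    A spectral floor limits musical noise when the speech PSD
    estimate momentarily drops to zero."""
    g = speech_psd / (speech_psd + noise_psd + 1e-12)
    return np.maximum(g, floor)

# The enhanced spectrum of one frame is then obtained as
#   enhanced = wiener_gain(speech_psd, noise_psd) * beamformer_output
```

    The quality of this gain depends entirely on the PSD estimates, which is why the paper focuses on correcting their errors rather than on the filter itself.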

  • An Extension of MUSIC Exploiting Higher-Order Moments via Nonlinear Mapping

    Yuya SUGIMOTO  Shigeki MIYABE  Takeshi YAMADA  Shoji MAKINO  Biing-Hwang JUANG  

     
    PAPER-Engineering Acoustics

      Vol:
    E99-A No:6
      Page(s):
    1152-1162

    MUltiple SIgnal Classification (MUSIC) is a standard technique for direction-of-arrival (DOA) estimation with high resolution. However, MUSIC cannot estimate DOAs accurately under underdetermined conditions, where the number of sources exceeds the number of microphones. To overcome this drawback, an extension of MUSIC using cumulants, called 2q-MUSIC, has been proposed, but this method greatly suffers from the variance of the statistics, given as the temporal mean of the observation process, and requires long observation. In this paper, we propose a new approach for extending MUSIC that exploits higher-order moments of the signal for underdetermined DOA estimation with smaller variance. We propose an estimation algorithm that nonlinearly maps the observed signal onto a space with expanded dimensionality and conducts MUSIC-based correlation analysis in the expanded space. Since the dimensionality of the noise subspace is increased by the mapping, the proposed method enables DOA estimation under underdetermined conditions. Furthermore, we describe the class of mappings that allows us to analyze the higher-order moments of the observed signal in the original space. We compare 2q-MUSIC and the proposed method through an experiment in which the true number of sources is assumed known as prior information, evaluating the bias-variance tradeoff of the statistics and the computational complexity. The results clarify that the proposed method has advantages in both computational complexity and estimation accuracy in short-time analysis, i.e., when the time duration of the analyzed data is short.
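
    For orientation, the baseline second-order MUSIC that both 2q-MUSIC and the proposed method extend scans candidate steering vectors against the noise subspace of the spatial correlation matrix. A minimal sketch of that generic algorithm follows; it is not the paper's higher-order extension.

```python
import numpy as np

def music_spectrum(R, steering, n_src):
    """MUSIC pseudospectrum.

    R        : (M, M) Hermitian spatial correlation matrix.
    steering : (M, K) candidate steering vectors, one per DOA on a grid.
    n_src    : assumed number of sources (< M).
    Peaks of the returned length-K spectrum indicate the DOAs."""
    eigvals, eigvecs = np.linalg.eigh(R)          # ascending eigenvalues
    noise_sub = eigvecs[:, : R.shape[0] - n_src]  # noise-subspace basis
    proj = np.abs(noise_sub.conj().T @ steering) ** 2
    return 1.0 / (proj.sum(axis=0) + 1e-12)
```

    The underdetermined failure mode is visible here: with n_src ≥ M the noise subspace is empty and the pseudospectrum is undefined, which is what the dimensionality-expanding mapping is designed to fix.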

  • Integration of Multiple Microphone Arrays and Use of Sound Reflections for 3D Localization of Sound Sources

    Carlos T. ISHI  Jani EVEN  Norihiro HAGITA  

     
    PAPER

      Vol:
    E97-A No:9
      Page(s):
    1867-1874

    We propose a method for estimating sound source positions in 3D space by integrating the sound directions estimated by multiple microphone arrays and taking advantage of reflection information. Two types of sources with different directivity properties (human speech and loudspeaker speech) were evaluated at different positions and orientations. Experimental results showed the effectiveness of using reflection information, depending on the source type and on the position and orientation of the sound sources relative to the arrays and walls. The use of reflection information increased the source position detection rates by 10% on average and by up to 60% in the best case.

  • Compressed Sampling and Source Localization of Miniature Microphone Array

    Qingyun WANG  Xinchun JI  Ruiyu LIANG  Li ZHAO  

     
    LETTER

      Vol:
    E97-A No:9
      Page(s):
    1902-1906

    In traditional microphone array signal processing, performance degrades rapidly as the array aperture decreases, which has been a barrier to its implementation in small-scale acoustic systems such as digital hearing aids. In this work, a new compressed sampling method for a miniature microphone array is proposed, which compresses information inside the ADC by means of a mixed system of hardware circuitry and software in order to remove the redundancy among the signals of the different array elements. The architecture of the method was developed in the Verilog language and has already been tested on an FPGA chip. Compressed sampling and reconstruction experiments show successful sparse representation and reconstruction of speech sources. Because it avoids the singularity problem of the correlation matrix of the miniature microphone array, the proposed method achieves higher resolution than the traditional GCC and MUSIC algorithms when used for direction-of-arrival (DOA) estimation in digital hearing aids.

  • 3D Sound-Space Sensing Method Based on Numerous Symmetrically Arranged Microphones

    Shuichi SAKAMOTO  Satoshi HONGO  Yôiti SUZUKI  

     
    PAPER

      Vol:
    E97-A No:9
      Page(s):
    1893-1901

    Sensing and reproduction of precise sound-space information is important for realizing highly realistic audio communications. This study was conducted to realize high-precision sensing of 3D sound-space information for transmission to distant places and for preservation of sound data for the future. The proposed method comprises a compact spherical object with numerous microphones. The recorded signals from the multiple microphones, which are uniformly distributed on the sphere, are simply weighted and summed to synthesize the signals presented to a listener's left and right ears. The calculated signals are presented binaurally via ordinary binaural systems such as headphones. Moreover, the weights can be changed according to the listener's 3D head movement, which is well known to be a crucially important factor in human spatial hearing. For accurate spatial hearing, the 3D sound-space information is acquired so as to accurately reflect the listener's head movement. We named the proposed method SENZI (Symmetrical object with ENchased ZIllion microphones). The results of computer simulations demonstrate that our proposed SENZI outperforms a conventional method (binaural Ambisonics) and can sense 3D sound space with high precision over a wide frequency range.

  • Sound Source Orientation Estimation Based on an Orientation-Extended Beamformer

    Hirofumi NAKAJIMA  Keiko KIKUCHI  Kazuhiro NAKADAI  Yutaka KANEDA  

     
    PAPER

      Vol:
    E97-A No:9
      Page(s):
    1875-1883

    This paper proposes a sound source orientation estimation method that is suitable for a distributed microphone arrangement. The proposed method is based on orientation-extended beamforming (OEBF), which has four features: (a) robustness against reverberation, (b) robustness against noise, (c) free arrangement of microphones and (d) feasibility for real-time processing. Regarding (a) and (c), since OEBF is based on a general propagation model using transfer functions (TFs) that include all propagation phenomena such as reflections and diffractions, OEBF introduces no model errors for the propagation phenomena and is applicable to arbitrary microphone arrangements. Regarding (b), OEBF overcomes noise effects by incorporating three additional processes (amplitude extraction, time-frequency masking and histogram integration) that are also proposed in this paper. As for (d), OEBF is executable in real time, as its execution process is the same as usual beamforming processes. A numerical experiment was performed to confirm the theoretical validity of OEBF. The results showed that OEBF was able to estimate sound source positions and orientations very precisely. Practical experiments were carried out using a 96-channel microphone array in real environments. The results indicated that OEBF worked properly even in reverberant and noisy environments, and the average estimation error was only 4°.

  • Comparison of Output Devices for Augmented Audio Reality

    Kazuhiro KONDO  Naoya ANAZAWA  Yosuke KOBAYASHI  

     
    PAPER-Speech and Hearing

      Vol:
    E97-D No:8
      Page(s):
    2114-2123

    We compared two audio output devices for augmented audio reality applications. In these applications, we plan to use speech annotations on top of the actual ambient environment. Thus, it is essential that these audio output devices deliver intelligible speech annotation along with transparent delivery of the environmental auditory scene. Two candidate devices were compared. The first was the bone-conduction headphone, which delivers speech signals by vibrating the skull while leaving normal hearing of the surroundings intact, since these headphones leave the ear canals open. The other was the binaural microphone/earphone combo, which has a form factor similar to a regular earphone but integrates a small microphone at the ear canal entry. The input from these microphones can be fed back to the earphones along with the annotation speech. We also compared these devices to normal hearing (i.e., without headphones or earphones) for reference. We compared speech intelligibility when competing babble noise was simultaneously presented from the surrounding environment. It was found that the binaural combo can generally deliver speech signals at comparable or higher intelligibility than the bone-conduction headphones. However, with the binaural combo, we found that the ear canal transfer characteristics were altered significantly by shutting the ear canals closed with the earphones. Accordingly, when we employed a compensation filter to account for this transfer function deviation, the resultant speech intelligibility was significantly higher. Both devices were found to be acceptable as audio output devices for augmented audio reality applications, since both are able to deliver speech signals at high intelligibility even when a significant amount of competing noise is present. In fact, both speech output methods were able to deliver speech signals at higher intelligibility than natural speech, especially when the SNR was low.

  • Microphone Classification Using Canonical Correlation Analysis

    Jongwon SEOK  Keunsung BAE  

     
    LETTER-Multimedia Environment Technology

      Vol:
    E97-A No:4
      Page(s):
    1024-1026

    Canonical correlation analysis (CCA) is applied to extract features for microphone classification. We utilize the coherence between near-silence regions. Experimental results show the promise of canonical correlation features for microphone classification.
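
    As a minimal sketch of the underlying tool, the first canonical correlation between two feature matrices can be computed via QR and SVD, since the singular values of the product of the orthonormal bases are the cosines of the principal angles between the two column spaces. This is generic CCA, not the letter's coherence-based feature pipeline.

```python
import numpy as np

def first_canonical_correlation(X, Y):
    """Largest canonical correlation between observation matrices X (n, p)
    and Y (n, q), one row per observation."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)  # orthonormal basis of X's column space
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return min(1.0, s[0])     # clip rounding overshoot above 1
```

    A correlation near 1 means the two feature sets share a common linear subspace, which is the property exploited for classification.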

  • Effective Frame Selection for Blind Source Separation Based on Frequency Domain Independent Component Analysis

    Yusuke MIZUNO  Kazunobu KONDO  Takanori NISHINO  Norihide KITAOKA  Kazuya TAKEDA  

     
    PAPER-Engineering Acoustics

      Vol:
    E97-A No:3
      Page(s):
    784-791

    Blind source separation is a technique that can separate sound sources without such information as the source locations, the number of sources, or the utterance content. Multi-channel source separation using many microphones separates signals with high accuracy even if there are many sources. However, these methods have extremely high computational complexity, which must be reduced. In this paper, we propose a computational complexity reduction method for blind source separation based on frequency-domain independent component analysis (FDICA) and examine which temporal data are effective for source separation. Frames containing many sound sources are effective for FDICA source separation. We assume that a frame with low kurtosis contains many sound sources and preferentially select such frames. In our proposed method, we used the log power spectrum and the kurtosis of the magnitude distribution of the observed data as selection criteria and conducted source separation experiments using speech signals from twelve speakers. We evaluated the separation performance by the signal-to-interference ratio (SIR) improvement score. The SIR improvement score was 24.3dB when all frames were used, and 23.3dB when the 300 frames selected by our criteria were used. These results clarify that our proposed selection criteria based on kurtosis and magnitude are effective. Furthermore, we significantly reduced the computational complexity, which is proportional to the number of selected frames.
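
    The kurtosis-based selection step can be sketched as follows. This is a minimal illustration of the criterion only; the frame size, the combination with the log power spectrum, and tie-breaking are simplifications, not the paper's exact procedure.

```python
import numpy as np

def select_frames(frame_magnitudes, n_select):
    """Return indices of the n_select frames whose magnitude
    distribution has the lowest kurtosis.

    A low-kurtosis (closer to Gaussian) frame is assumed to contain
    many overlapping sources, which makes it informative for FDICA."""
    def kurtosis(x):
        xc = x - x.mean()
        return (xc ** 4).mean() / (xc.var() ** 2 + 1e-12)
    scores = np.array([kurtosis(f) for f in frame_magnitudes])
    return np.argsort(scores)[:n_select]
```

    Because the subsequent ICA cost is proportional to the number of frames it processes, selecting frames up front is what yields the complexity reduction reported above.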

  • An Estimation Method of Sound Source Orientation Using Eigenspace Variation of Spatial Correlation Matrix

    Kenta NIWA  Yusuke HIOKA  Sumitaka SAKAUCHI  Ken'ichi FURUYA  Yoichi HANEDA  

     
    PAPER-Engineering Acoustics

      Vol:
    E96-A No:9
      Page(s):
    1831-1839

    A method for estimating sound source orientation in a reverberant room using a microphone array is proposed. We extend the conventional modeling of a room transfer function based on the image method to take into account the directivity of the sound source. With this extension, the transfer function between a sound source and a listener (or a microphone) is described by the superposition of the transfer functions from each image source to the listener, multiplied by the source directivity; thus, the sound source orientation can be estimated by analyzing, from the observed signals, how the image sources are distributed (the power distribution of the image sources). We applied eigenvalue analysis to the spatial correlation matrix of the microphone array observation to obtain the power distribution of the image sources. Based on the assumption that the spatial correlation matrix for each set of source position and orientation is known a priori, the variation of the eigenspace can be modeled. By comparing the eigenspace of the observed signals with that of pre-learned models, we estimated the sound source orientation. In experiments using seven microphones, the sound source orientation was estimated with high accuracy when the reverberation time of the room was increased.

  • Multichannel Two-Stage Beamforming with Unconstrained Beamformer and Distortion Reduction

    Masahito TOGAMI  Yohei KAWAGUCHI  Yasunari OBUCHI  

     
    PAPER-Engineering Acoustics

      Vol:
    E96-A No:4
      Page(s):
    749-761

    This paper proposes a novel multichannel speech enhancement technique for reverberant rooms that is effective when the noise sources are spatially stationary, such as projector fan noise, air-conditioner noise, and unwanted speech sources behind the microphones. The speech enhancement performance of the conventional multichannel Wiener filter (MWF) degrades when the signal-to-noise ratio (SNR) of the current microphone input signal changes from that of the noise-only period. Furthermore, the MWF structure is computationally inefficient, because the MWF updates the whole spatial beamformer periodically to track switching of the speakers (e.g. turn-taking). In contrast to the MWF, the proposed method reduces noise independently of the SNR. The proposed method has a novel two-stage structure, which reduces noise and distortion of the desired source signal in a cascade by using two different beamformers. The first beamformer focuses on noise reduction without any constraint on the desired source and is insensitive to SNR variation; however, its output signal is distorted. The second beamformer focuses on reducing the distortion of the desired source signal, and complete elimination of the distortion is theoretically assured. Additionally, the proposed method has a computationally efficient structure optimized for spatially stationary noise reduction problems. The first beamformer is updated only when the speech enhancement system is initialized; only the second beamformer is updated periodically to track switching of the active speaker. The experimental results indicate that the proposed method can reduce spatially stationary noise source signals effectively with less distortion of the desired source signal, even in a reverberant conference room.

  • Two-Microphone Noise Reduction Using Spatial Information-Based Spectral Amplitude Estimation

    Kai LI  Yanmeng GUO  Qiang FU  Junfeng LI  Yonghong YAN  

     
    PAPER-Speech and Hearing

      Vol:
    E95-D No:5
      Page(s):
    1454-1464

    Traditional two-microphone noise reduction algorithms for dealing with highly nonstationary directional noises generally use direction-of-arrival or phase-difference information. The performance of these algorithms deteriorates when diffuse noises coexist with nonstationary directional noises in realistic adverse environments. In this paper, we present a two-channel noise reduction algorithm that uses a spatial-information-based speech estimator and a spatial-information-controlled soft-decision noise estimator to improve noise reduction performance in realistic non-stationary noisy environments. A target presence probability estimator based on Bayes' rule, using both the phase difference and the magnitude squared coherence, is proposed for the soft decision of the noise estimator, so that the two cues can share complementary advantages when both directional and diffuse noises are present. The performance of the proposed two-microphone noise reduction algorithm is evaluated by the noise reduction, log-spectral distance (LSD) and word recognition rate (WRR) of a distant-talking ASR system in a real room's noisy environment. Experimental results show that the proposed algorithm achieves better noise suppression, without further distorting the desired signal components, than the comparative dual-channel noise reduction algorithms.

  • A Single-Supply 84 dB DR Audio-Band ADC for Compact Digital Microphones

    Huy-Binh LE  Sang-Gug LEE  Seung-Tak RYU  

     
    PAPER-Electronic Circuits

      Vol:
    E95-C No:1
      Page(s):
    130-136

    A 20 kHz audio-band ADC with a single pair of power and ground pads is implemented for a digital electret microphone. Under the limited power/ground pad condition, the switching noise effect on the signal quality is estimated via post simulations with parasitic models. Performance degradation is minimized by time-domain noise isolation with sufficient time-spacing between the sampling edge and the output transition. The prototype ADC was implemented in a 0.18 µm CMOS process. It operates under a minimum supply voltage of 1.6 V with total current of 420 µA. Operating at 2.56 MHz clock frequency, it achieves 84 dB dynamic range and a 64 dB peak signal-to-(noise+distortion) ratio. The measured power supply rejection at a 100 mVpp 217 Hz square wave is -72 dB.

  • Active Noise Control System for Reducing MR Noise

    Masafumi KUMAMOTO  Masahiro KIDA  Ryotaro HIRAYAMA  Yoshinobu KAJIKAWA  Toru TANI  Yoshimasa KURUMI  

     
    PAPER-Engineering Acoustics

      Vol:
    E94-A No:7
      Page(s):
    1479-1486

    We propose an active noise control (ANC) system for reducing periodic noise generated in a high magnetic field such as noise generated from magnetic resonance imaging (MRI) devices (MR noise). The proposed ANC system utilizes optical microphones and piezoelectric loudspeakers, because specific acoustic equipment is required to overcome the high-field problem, and consists of a head-mounted structure to control noise near the user's ears and to compensate for the low output of the piezoelectric loudspeaker. Moreover, internal model control (IMC)-based feedback ANC is employed because the MR noise includes some periodic components and is predictable. Our experimental results demonstrate that the proposed ANC system (head-mounted structure) can significantly reduce MR noise by approximately 30 dB in a high field in an actual MRI room even if the imaging mode changes frequently.

  • Shaka: User Movement Estimation Considering Reliability, Power Saving, and Latency Using Mobile Phone

    Arei KOBAYASHI  Shigeki MURAMATSU  Daisuke KAMISAKA  Takafumi WATANABE  Atsunori MINAMIKAWA  Takeshi IWAMOTO  Hiroyuki YOKOYAMA  

     
    PAPER

      Vol:
    E94-D No:6
      Page(s):
    1153-1163

    This paper proposes a method that uses the accelerometer, microphone, and GPS in a mobile phone to recognize the movement of the user. Past attempts at identifying the movement associated with riding a bicycle, train, bus or car, and common human movements like standing still, walking or running, have had problems with poor accuracy due to factors such as sudden changes in vibration or periods when the vibrations resembled those of other types of movement. Moreover, previous methods have had the problem of high power consumption because of the sensor processing load. The proposed method aims to avoid these problems by estimating the reliability of the inference result and by combining two inference modes to decrease the power consumption. Field trials demonstrate that our method achieves 90% or better average accuracy for the seven types of movement listed above. Shaka's power saving functionality enables us to extend the battery life of a mobile phone to over 100 hours while our estimation algorithm is running in the background. Furthermore, this paper uses experimental results to show the trade-off between accuracy and latency when estimating user activity.

  • Blind Source Separation Using Dodecahedral Microphone Array under Reverberant Conditions

    Motoki OGASAWARA  Takanori NISHINO  Kazuya TAKEDA  

     
    PAPER-Engineering Acoustics

      Vol:
    E94-A No:3
      Page(s):
    897-906

    The separation and localization of sound source signals are important techniques for many applications, such as highly realistic communication and speech recognition systems. These systems are expected to work without such prior information as the number of sound sources and the environmental conditions. In this paper, we developed a dodecahedral microphone array and propose a novel separation method using the developed device. This method draws on human sound localization cues and uses the acoustical characteristics provided by the shape of the dodecahedral microphone array. Moreover, it includes a method for estimating the number of sound sources that operates without prior information. The sound source separation performance was evaluated under simulated and actual reverberant conditions, and the results were compared with a conventional method. The experimental results showed that our method outperformed the conventional method in separation performance.

  • Improving Power Spectra Estimation in 2-Dimensional Areas Using Number of Active Sound Sources

    Yusuke HIOKA  Ken'ichi FURUYA  Yoichi HANEDA  Akitoshi KATAOKA  

     
    PAPER-Engineering Acoustics

      Vol:
    E94-A No:1
      Page(s):
    273-281

    An improved method for estimating the power spectra of sounds located in a particular 2-dimensional area is proposed. We previously proposed a method that estimates sound power spectra using multiple fixed beamformers in order to emphasize speech located in a particular 2-dimensional area. However, that method has the drawback that the number of areas in which active sound sources are located must be restricted. This restriction makes the method less effective when many noise sources located in different areas are simultaneously active. In this paper, we reveal the cause of this restriction and determine the maximum number of areas for which the method can simultaneously estimate sound power spectra. We then introduce a procedure for identifying the areas that include active sound sources, which reduces the number of unknown power spectra to be estimated. The effectiveness of the proposed method is examined by an experimental evaluation applied to sounds recorded in a practical environment.
