Ann-Chen CHANG Jhih-Chung CHANG
This letter deals with robust eigenspace-based (ESB) beamforming based on decision-directed (DD) correction. It has been shown that the output of the ESB beamformer includes the desired signal and noise under small pointing errors. In conjunction with DD and a soft-decision decoding scheme, the proposed approach can be used to form a robust DD-ESB beamformer without any specific training sequence. Computer simulations are provided to illustrate the effectiveness of the proposed beamformer.
Dongwen YING Masashi UNOKI Xugang LU Jianwu DANG
Reducing noise while introducing less speech distortion is a challenging issue for speech enhancement. We propose a novel approach for reducing noise at the cost of less speech distortion. A noise signal can generally be considered to consist of two components: a "white-like" component with a uniform energy distribution and a "color" component with energy concentrated in some frequency bands. An approach based on noise eigenspace projections is proposed to pack the color component into a subspace, named the "noise subspace". This subspace is then removed from the eigenspace to reduce the color component. For the white-like component, a conventional enhancement algorithm is adopted as a complementary processor. We tested our algorithm on a speech enhancement task using speech data from the Texas Instruments and Massachusetts Institute of Technology (TIMIT) dataset and noise data from NOISEX-92. The experimental results show that the proposed algorithm efficiently reduces noise with little speech distortion. Objective and subjective evaluations confirmed that the proposed algorithm outperformed conventional enhancement algorithms.
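The idea of packing the color component into a noise subspace and removing it can be sketched as follows. This is only a minimal illustration under simplifying assumptions, not the authors' algorithm: the noise subspace is taken to be one-dimensional, it is found by power iteration on a covariance matrix estimated from noise-only frames, and all names and sample values (`noise_frames`, `remove_noise_subspace`, and so on) are hypothetical.

```python
import math

def covariance(frames):
    """Covariance of equal-length vectors, assuming zero-mean data."""
    n = len(frames[0])
    cov = [[0.0] * n for _ in range(n)]
    for f in frames:
        for i in range(n):
            for j in range(n):
                cov[i][j] += f[i] * f[j] / len(frames)
    return cov

def dominant_eigenvector(matrix, iterations=200):
    """Power iteration: unit vector along the largest-eigenvalue direction."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def remove_noise_subspace(frame, noise_direction):
    """Subtract the component of `frame` lying along the unit vector `noise_direction`."""
    coeff = sum(a * b for a, b in zip(frame, noise_direction))
    return [a - coeff * b for a, b in zip(frame, noise_direction)]

# Hypothetical noise-only frames with energy concentrated near the (1, 1, 0) direction.
noise_frames = [[2.0, 2.0, 0.1], [-1.5, -1.5, -0.1], [1.0, 1.0, 0.05], [-2.5, -2.5, 0.0]]
direction = dominant_eigenvector(covariance(noise_frames))

# Hypothetical noisy observation; after projection it has no energy along `direction`.
cleaned = remove_noise_subspace([3.0, 3.0, 1.0], direction)
```

In this sketch the residual white-like component is left untouched; the abstract assigns it to a conventional enhancement algorithm.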
Lina Tomokazu TAKAHASHI Ichiro IDE Hiroshi MURASE
We propose an appearance manifold with view-dependent covariance matrix for face recognition from video sequences in two learning frameworks: supervised learning and incremental unsupervised learning. The method has several advantages. First, the appearance manifold with view-dependent covariance matrix model is robust to pose changes and is also noise invariant, since the embedded covariance matrices are calculated based on their poses in order to learn the samples' distributions along the manifold. Moreover, the proposed incremental unsupervised-learning framework is more realistic for real-world face recognition applications, because it is difficult to collect large amounts of face sequences covering the complete range of poses (from left side view to right side view) for training. The incremental unsupervised-learning framework allows us to train the system with the available initial sequences and later update the system's knowledge incrementally every time an unlabelled sequence is input. In addition, we integrate the appearance manifold with view-dependent covariance matrix model with a pose estimation system to improve the classification accuracy and to easily detect sequences with overlapping poses for the merging process in the incremental unsupervised-learning framework. The experimental results showed that, in both frameworks, the proposed appearance manifold with view-dependent covariance matrix method could recognize faces from video sequences accurately.
Lina Tomokazu TAKAHASHI Ichiro IDE Hiroshi MURASE
We propose the construction of an appearance manifold with embedded view-dependent covariance matrix to recognize 3D objects which are influenced by geometric distortions and quality degradation effects. The appearance manifold is used to capture the pose variability, while the covariance matrix is used to learn the distribution of samples for gaining noise invariance. However, since the appearance of an object in the captured image differs for every pose, the covariance matrix also differs for every pose position. Therefore, it is important to embed view-dependent covariance matrices in the manifold of an object. We propose two models for constructing an appearance manifold with view-dependent covariance matrix, called the View-dependent Covariance matrix by training-Point Interpolation (VCPI) and View-dependent Covariance matrix by Eigenvector Interpolation (VCEI) methods. In the VCPI method, the embedded view-dependent covariance matrix is obtained by interpolating every training point of one pose to the corresponding training point of the consecutive pose. In the VCEI method, the embedded view-dependent covariance matrix is obtained by interpolating only the eigenvectors and eigenvalues, without considering the correspondences of each training image. Because the covariance matrix is embedded in the manifold, our view-dependent covariance matrix methods are robust to pose changes and are also noise invariant. Our main goal is to construct a robust and efficient manifold with embedded view-dependent covariance matrix for recognizing objects from images which are influenced by various degradation effects.
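The training-point interpolation idea behind VCPI can be illustrated with a small sketch. It assumes linear interpolation between corresponding training points of two consecutive poses, which the abstract does not specify, and it uses hypothetical two-dimensional feature points; the function names are illustrative only.

```python
def interpolate_points(points_a, points_b, t):
    """Linearly interpolate corresponding points; t=0 gives pose A, t=1 gives pose B."""
    return [
        [(1 - t) * a + t * b for a, b in zip(pa, pb)]
        for pa, pb in zip(points_a, points_b)
    ]

def sample_covariance(points):
    """Unbiased sample covariance matrix of a list of equal-length vectors."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(dim)]
    return [
        [sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / (n - 1)
         for j in range(dim)]
        for i in range(dim)
    ]

# Hypothetical training points (e.g. degraded samples in feature space) at two poses.
pose_a = [[0.0, 0.0], [2.0, 0.0], [1.0, 1.0]]
pose_b = [[0.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

# Covariance matrix embedded at the pose halfway between A and B.
cov_mid = sample_covariance(interpolate_points(pose_a, pose_b, 0.5))
```

This approach needs point-to-point correspondences between poses; VCEI, as described above, avoids that requirement by interpolating eigenvectors and eigenvalues instead.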
Takehito OGATA Joo Kooi TAN Seiji ISHIKAWA
This paper proposes an efficient technique for human motion recognition based on motion history images and an eigenspace technique. In recent years, human motion recognition has become one of the most popular research fields. It is expected to be applied in a security system, man-machine communication, and so on. In the proposed technique, we use two feature images and the eigenspace technique to realize high-speed recognition. An experiment was performed on recognizing six human motions and the results showed satisfactory performance of the technique.
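For readers unfamiliar with motion history images, the commonly used update rule can be sketched as below. The abstract does not state which two feature images the authors use or how they compute them, so this is a generic illustration: the frame-difference motion mask, the decay step of 1, and the values of `tau` and `threshold` are all assumptions.

```python
def frame_difference_mask(prev_frame, frame, threshold):
    """Binary motion mask from the absolute difference of consecutive grayscale frames."""
    return [
        [abs(a - b) > threshold for a, b in zip(prev_row, row)]
        for prev_row, row in zip(prev_frame, frame)
    ]

def update_mhi(mhi, mask, tau):
    """One MHI step: moving pixels are set to tau, all others decay by 1 toward 0."""
    return [
        [tau if moving else max(0, old - 1) for old, moving in zip(mhi_row, mask_row)]
        for mhi_row, mask_row in zip(mhi, mask)
    ]

# Hypothetical 1x4 frames: a bright blob moving one pixel to the right per frame.
frames = [[[9, 0, 0, 0]], [[0, 9, 0, 0]], [[0, 0, 9, 0]], [[0, 0, 0, 9]]]
mhi = [[0, 0, 0, 0]]
for prev, cur in zip(frames, frames[1:]):
    mhi = update_mhi(mhi, frame_difference_mask(prev, cur, threshold=4), tau=3)

print(mhi)  # [[1, 2, 3, 3]]: larger values mark more recent motion
```

The resulting image encodes where and how recently motion occurred in a single frame, which is what makes it a compact input for an eigenspace technique.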
The previously proposed DOA (Direction Of Arrival) estimation method, which integrates the frequency array data generated from microphone pairs in an equilateral-triangular microphone array, is extended here. The extended method uses four microphones located at the vertices of a regular tetrahedron so that the elevation angle from the array plane can be estimated as well. Furthermore, we introduce separate estimation of azimuth and elevation to reduce the computational load.
In this paper, we propose a DOA (Direction Of Arrival) estimation method for speech signals using three microphones. The angular resolution of the method is almost uniform with respect to DOA. Our previous DOA estimation method, which uses the frequency-domain array data for a pair of microphones, achieves high-precision estimation. However, its resolution degrades as the propagation direction moves away from the array broadside. In the method presented here, we use three microphones located at the vertices of an equilateral triangle and integrate the frequency-domain array data for the three pairs of microphones. For the estimation scheme, subspace analysis of the integrated frequency array data is proposed. Through both computer simulations and experiments in a real acoustical environment, we show the efficiency of the proposed method.
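The loss of resolution away from broadside, and the reason three pairs help, can be illustrated with the standard far-field relation between inter-microphone delay and arrival angle. This is a sketch under assumed values (speed of sound, 0.1 m spacing, a 1 microsecond delay error) and does not reproduce the authors' frequency-domain subspace analysis; the function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def delay_from_doa(angle_deg, spacing_m):
    """Far-field inter-microphone delay for a source at angle_deg from broadside."""
    return spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND

def doa_from_delay(delay_s, spacing_m):
    """Inverse relation: angle from broadside, in degrees, for one microphone pair."""
    return math.degrees(math.asin(SPEED_OF_SOUND * delay_s / spacing_m))

def angle_shift(angle_deg, spacing_m, delay_error_s):
    """Error in the estimated angle caused by a small error in the measured delay."""
    delay = delay_from_doa(angle_deg, spacing_m)
    return doa_from_delay(delay + delay_error_s, spacing_m) - angle_deg

def nearest_broadside_offset(azimuth_deg):
    """Smallest angle from broadside over the three pairs of an equilateral triangle."""
    pair_axes_deg = (0.0, 60.0, 120.0)
    return min(
        abs(math.degrees(math.asin(math.cos(math.radians(azimuth_deg - axis)))))
        for axis in pair_axes_deg
    )

# The same delay error produces a larger angle error far from broadside.
shift_at_broadside = angle_shift(0.0, spacing_m=0.1, delay_error_s=1e-6)
shift_at_60_deg = angle_shift(60.0, spacing_m=0.1, delay_error_s=1e-6)

# With three pairs, some pair always sees the source within about 30 degrees of broadside.
worst_offset = max(nearest_broadside_offset(a) for a in range(360))
```

Because the three pair axes are rotated by 60 degrees relative to each other, every azimuth is close to the broadside of at least one pair, which is consistent with the almost uniform resolution claimed above.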
Fumihiko SAKAUE Takeshi SHAKUNAGA
The present paper reports a robust projection onto eigenspace that is based on iterative projection. The fundamental method was proposed by Shakunaga and Sakaue and involves iterative analysis of relative residual and projection. The present paper refines the projection method by solving linear equations while taking the noise ratio into account. The refinement improves both the efficiency and robustness of the projection. Experimental results indicate that the proposed method works well for various kinds of noise, including shadows, reflections and occlusions. The proposed method can be applied to a wide variety of computer vision problems, including object/face recognition and image-based rendering.
In speech enhancement with an adaptive microphone array, voice activity detection (VAD) is indispensable for adaptation control. Even though many VAD methods have been proposed as pre-processors for speech recognition and compression, they can hardly discriminate nonstationary interferences, which frequently exist in real environments. In this research, we propose a novel VAD method with array signal processing in the wavelet domain. In that domain we can integrate the temporal, spectral and spatial information to achieve robust voice activity discriminability for a nonstationary interference arriving from a direction close to that of the speech. The signals acquired by the microphone array are first decomposed into appropriate subbands using wavelet packets to extract their temporal and spectral features. Then a directionality check and direction estimation are executed on each subband to perform VAD with respect to the spatial information. Computer simulation results for sound data demonstrate that the proposed method keeps its discriminability even for interference arriving from a direction close to that of the speech.
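The subband decomposition step can be sketched with a small wavelet packet transform. The abstract does not name the wavelet or the decomposition depth, so the Haar filter, the two levels, and the test frame below are assumptions chosen only to keep the example short; the directionality check and direction estimation per subband are not shown.

```python
import math

def haar_split(signal):
    """One Haar analysis step: (lowpass, highpass) halves of an even-length signal."""
    s = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high

def wavelet_packet(signal, levels):
    """Full packet tree: every band is split again at each level.

    The signal length must be divisible by 2**levels.
    """
    bands = [signal]
    for _ in range(levels):
        bands = [half for band in bands for half in haar_split(band)]
    return bands

def band_energies(bands):
    """Energy of each subband, a simple temporal/spectral feature per frame."""
    return [sum(x * x for x in band) for band in bands]

# Hypothetical 8-sample frame from one microphone: a fast alternation.
frame = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
energies = band_energies(wavelet_packet(frame, levels=2))
```

Since the Haar transform is orthonormal, the subband energies sum to the energy of the input frame, so they can be compared across subbands and across microphones.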