Asifullah KHAN Syed Fahad TAHIR Tae-Sun CHOI
We present a novel approach to developing Machine Learning (ML) based decoding models for extracting a watermark in the presence of attacks. Statistical characterization of the components of various frequency bands is exploited to allow blind extraction of the watermark. Experimental results show that the proposed ML based decoding scheme can adapt to suit the watermark application by learning the alterations in the feature space incurred by the attack employed.
Yosuke TATEKURA Takeshi WATANABE
A robust multichannel sound reproduction system that utilizes the relationship between the width of the actual control area and the control frequency of the control points is proposed. The reproduction accuracy of a conventional sound reproduction system is reduced by room environment variations when fixed inverse filter coefficients are used. This tendency becomes more significant when control points are arranged more closely. To resolve this problem, the frequency control band at every control point is switched to avoid degrading the reproduced sound in low frequencies, so the pass band range of the control points at both ears is only high-range. That of the other control points is the entire control range. Numerical simulation with real environmental data showed that improvement of the reproduction accuracy is about 6.1 dB on average, even with a temperature fluctuation of 5C as an environmental variation in the listening room.
Suehiro SHIMAUCHI Yoichi HANEDA Akitoshi KATAOKA
We propose a new robust frequency domain acoustic echo cancellation filter that employs a normalized residual echo enhancement. By interpreting the conventional robust step-size control approaches as a statistical-model-based residual echo enhancement problem, the optimal step-size introduced in the most of conventional approaches is regarded as optimal only on the assumption that both the residual echo and the outlier in the error output signal are described by Gaussian distributions. However, the Gaussian-Gaussian mixture assumption does not always hold well, especially when both the residual echo and the outlier are speech signals (known as a double-talk situation). The proposed filtering scheme is based on the Gaussian-Laplacian mixture assumption for the signals normalized by the reference input signal amplitude. By comparing the performances of the proposed and conventional approaches through the simulations, we show that the Gaussian-Laplacian mixture assumption for the normalized signals can provide a better control scheme for the acoustic echo cancellation.
Chirawat KOTCHASARN Poompat SAENGUDOMLERT
We investigate the problem of joint transmitter and receiver power allocation with the minimax mean square error (MSE) criterion for uplink transmissions in a multi-carrier code division multiple access (MC-CDMA) system. The objective of power allocation is to minimize the maximum MSE among all users each of which has limited transmit power. This problem is a nonlinear optimization problem. Using the Lagrange multiplier method, we derive the Karush-Kuhn-Tucker (KKT) conditions which are necessary for a power allocation to be optimal. Numerical results indicate that, compared to the minimum total MSE criterion, the minimax MSE criterion yields a higher total MSE but provides a fairer treatment across the users. The advantages of the minimax MSE criterion are more evident when we consider the bit error rate (BER) estimates. Numerical results show that the minimax MSE criterion yields a lower maximum BER and a lower average BER. We also observe that, with the minimax MSE criterion, some users do not transmit at full power. For comparison, with the minimum total MSE criterion, all users transmit at full power. In addition, we investigate robust joint transmitter and receiver power allocation where the channel state information (CSI) is not perfect. The CSI error is assumed to be unknown but bounded by a deterministic value. This problem is formulated as a semidefinite programming (SDP) problem with bilinear matrix inequality (BMI) constraints. Numerical results show that, with imperfect CSI, the minimax MSE criterion also outperforms the minimum total MSE criterion in terms of the maximum and average BERs.
Kyung-Su KIM Hae-Yeoun LEE Dong-Hyuck IM Heung-Kyu LEE
Commercial markets employ digital right management (DRM) systems to protect valuable high-definition (HD) quality videos. DRM system uses watermarking to provide copyright protection and ownership authentication of multimedia contents. We propose a real-time video watermarking scheme for HD video in the uncompressed domain. Especially, our approach is in aspect of practical perspectives to satisfy perceptual quality, real-time processing, and robustness requirements. We simplify and optimize human visual system mask for real-time performance and also apply dithering technique for invisibility. Extensive experiments are performed to prove that the proposed scheme satisfies the invisibility, real-time processing, and robustness requirements against video processing attacks. We concentrate upon video processing attacks that commonly occur in HD quality videos to display on portable devices. These attacks include not only scaling and low bit-rate encoding, but also malicious attacks such as format conversion and frame rate change.
This paper presents an algorithm for the robust watermarking of 3D polygonal mesh models. The proposed algorithm embeds the watermark into a 2D image extracted from the 3D model, rather than directly embedding it into 3D geometry. The proposed embedding domain, i.e., the 2D image, is devised to be robust against the attacks like mesh simplification which severely modifies the vertices and connectivity while preserving the appearance of the model. The watermark-embedded model is obtained by using a simple vertex perturbation algorithm without iterative optimization. Two exemplary watermark applications using the proposed methods are also presented: one is to embed several bits into 3D models and the other is to detect only the existence of a watermark. The experimental results show that the proposed algorithm is robust against similarity transform, mesh simplification, additive Gaussian noise, quantization of vertex coordinates and mesh smoothing, and that its computational complexity is lower than that of the conventional methods.
Seungwoo CHUN Yoshihiro HAYAKAWA Koji NAKAJIMA
The visual inspection of defects in products is heavily dependent on human experience and instinct. In this situation, it is difficult to reduce the production costs and to shorten the inspection time and hence the total process time. Consequently people involved in this area desire an automatic inspection system. In this paper, we propose a hardware neural network, which is expected to provide high-speed operation for automatic inspection of products. Since neural networks can learn, this is a suitable method for self-adjustment of criteria for classification. To achieve high-speed operation, we use parallel and pipelining techniques. Furthermore, we use a piecewise linear function instead of a conventional activation function in order to save hardware resources. Consequently, our proposed hardware neural network achieved 6GCPS and 2GCUPS, which in our test sample proved to be sufficiently fast.
Fawnizu Azmadi HUSSIN Tomokazu YONEDA Alex ORAILOLU Hideo FUJIWARA
This paper proposes a test methodology for core-based testing of System-on-Chips by utilizing the functional bus as a test access mechanism. The functional bus is used as a transportation channel for the test stimuli and responses from a tester to the cores under test (CUT). To enable test concurrency, local test buffers are added to all CUTs. In order to limit the buffer area overhead while minimizing the test application time, we propose a packet-based scheduling algorithm called PAcket Set Scheduling (PASS), which finds the complete packet delivery schedule under a given power constraint. The utilization of test packets, consisting of a small number of bits of test data, for test data delivery allow an efficient sharing of bus bandwidth with the help of an effective buffer-based test architecture. The experimental results show that the methodology is highly effective, especially for smaller bus widths, compared to previous approaches that do not use the functional bus.
Keiichi FUNAKI Tatsuhiko KINJO
Complex speech analysis for an analytic speech signal can accurately estimate the spectrum in low frequencies since the analytic signal provides spectrum only over positive frequencies. The remarkable feature makes it possible to realize more accurate F0 estimation using complex residual signal extracted by complex-valued speech analysis. We have already proposed F0 estimation using complex LPC residual, in which the autocorrelation function weighted by AMDF was adopted as the criterion. The method adopted MMSE-based complex LPC analysis and it has been reported that it can estimate more accurate F0 for IRS filtered speech corrupted by white Gauss noise although it can not work better for the IRS filtered speech corrupted by pink noise. In this paper, robust complex speech analysis based on ELS (Extended Least Square) method is introduced in order to overcome the drawback. The experimental results for additive white Gauss or pink noise demonstrate that the proposed algorithm based on robust ELS-based complex AR analysis can perform better than other methods.
Xin XU Noboru HAYASAKA Yoshikazu MIYANAGA
This paper proposes a new algorithm named Adaptive Running Spectrum Filtering (ARSF) to restore the amplitude spectra of speech corrupted by additive noises. Based on the pre-hand noise estimation, adaptive filtering is used in speech modulation spectra according to the noise conditions. The periodic structures in the amplitude spectra are kept against noise distortion. Since the amplitude spectral structures contain the information of fundamental frequency, which is the inverse of pitch period, ARSF algorithm is added into robust pitch detection to increase the accuracy. Compared with the conventional methods, experimental results show that the proposed method significantly improves the robustness of pitch detection against noise conditions with several types and SNRs.
In this paper, signal processing techniques which can be applied to automatic speech recognition to improve its robustness are reviewed. The choice of signal processing techniques is strongly dependent on the scenario of the applications. The analysis of scenario and the choice of suitable signal processing techniques are shown through two examples.
Longbiao WANG Seiichi NAKAGAWA Norihide KITAOKA
In a distant-talking environment, the length of channel impulse response is longer than the short-term spectral analysis window. Conventional short-term spectrum based Cepstral Mean Normalization (CMN) is therefore, not effective under these conditions. In this paper, we propose a robust speech recognition method by combining a short-term spectrum based CMN with a long-term one. We assume that a static speech segment (such as a vowel, for example) affected by reverberation, can be modeled by a long-term cepstral analysis. Thus, the effect of long reverberation on a static speech segment may be compensated by the long-term spectrum based CMN. The cepstral distance of neighboring frames is used to discriminate the static speech segment (long-term spectrum) and the non-static speech segment (short-term spectrum). The cepstra of the static and non-static speech segments are normalized by the corresponding cepstral means. In a previous study, we proposed an environmentally robust speech recognition method based on Position-Dependent CMN (PDCMN) to compensate for channel distortion depending on speaker position, and which is more efficient than conventional CMN. In this paper, the concept of combining short-term and long-term spectrum based CMN is extended to PDCMN. We call this Variable Term spectrum based PDCMN (VT-PDCMN). Since PDCMN/VT-PDCMN cannot normalize speaker variations because a position-dependent cepstral mean contains the average speaker characteristics over all speakers, we also combine PDCMN/VT-PDCMN with conventional CMN in this study. We conducted the experiments based on our proposed method using limited vocabulary (100 words) distant-talking isolated word recognition in a real environment. The proposed method achieved a relative error reduction rate of 60.9% over the conventional short-term spectrum based CMN and 30.6% over the short-term spectrum based PDCMN.
Satoshi KOBASHIKAWA Satoshi TAKAHASHI
Users require speech recognition systems that offer rapid response and high accuracy concurrently. Speech recognition accuracy is degraded by additive noise, imposed by ambient noise, and convolutional noise, created by space transfer characteristics, especially in distant talking situations. Against each type of noise, existing model adaptation techniques achieve robustness by using HMM-composition and CMN (cepstral mean normalization). Since they need an additive noise sample as well as a user speech sample to generate the models required, they can not achieve rapid response, though it may be possible to catch just the additive noise in a previous step. In the previous step, the technique proposed herein uses just the additive noise to generate an adapted and normalized model against both types of noise. When the user's speech sample is captured, only online-CMN need be performed to start the recognition processing, so the technique offers rapid response. In addition, to cover the unpredictable S/N values possible in real applications, the technique creates several S/N HMMs. Simulations using artificial speech data show that the proposed technique increased the character correct rate by 11.62% compared to CMN.
Nari TANABE Toshihiro FURUKAWA Shigeo TSUJII
We propose a noise suppression algorithm with the Kalman filter theory. The algorithm aims to achieve robust noise suppression for the additive white and colored disturbance from the canonical state space models with (i) a state equation composed of the speech signal and (ii) an observation equation composed of the speech signal and additive noise. The remarkable features of the proposed algorithm are (1) applied to adaptive white and colored noises where the additive colored noise uses babble noise, (2) realization of high performance noise suppression without sacrificing high quality of the speech signal despite simple noise suppression using only the Kalman filter algorithm, while many conventional methods based on the Kalman filter theory usually perform the noise suppression using the parameter estimation algorithm of AR (auto-regressive) system and the Kalman filter algorithm. We show the effectiveness of the proposed method, which utilizes the Kalman filter theory for the proposed canonical state space model with the colored driving source, using numerical results and subjective evaluation results.
Chunsheng HUA Qian CHEN Haiyuan WU Toshikazu WADA
This paper presents an RK-means clustering algorithm which is developed for reliable data grouping by introducing a new reliability evaluation to the K-means clustering algorithm. The conventional K-means clustering algorithm has two shortfalls: 1) the clustering result will become unreliable if the assumed number of the clusters is incorrect; 2) during the update of a cluster center, all the data points belong to that cluster are used equally without considering how distant they are to the cluster center. In this paper, we introduce a new reliability evaluation to K-means clustering algorithm by considering the triangular relationship among each data point and its two nearest cluster centers. We applied the proposed algorithm to track objects in video sequence and confirmed its effectiveness and advantages.
Ilmu BYUN Hae Gwang HWANG Young Jin SANG Kwang Soon KIM
Various space time code (STC) designs have been proposed to obtain full diversity at full rate in multiple-input multiple-output (MIMO) channels for uncoded systems. However, commercial wireless systems typically employ powerful channel codes such as turbo codes and low density parity check (LDPC) codes together with an STC. For these applications, an STC optimized for uncoded systems may not provide the best performance. In this paper, an STC with relatively good performance over a wide range of code rates is proposed. Simulation results show that the performance of the proposed robust STC is very close to the best performance of the SM and the Golden code in various code rates.
Shouhei KIDERA Takuya SAKAMOTO Toru SATO
Target shape estimation with UWB pulse radars is a promising imaging technique for household robots. We have already proposed a fast imaging algorithm, SEABED, that is based on a reversible transform BST (Boundary Scattering Transform) between the received signals and the target shape. However, the target image obtained by SEABED deteriorates in a noisy environment because it utilizes a derivative of received data. In this paper, we propose a robust imaging method with an envelope of circles. We clarify by numerical simulation that the proposed method can realize a level of robust and fast imaging that cannot be achieved by the original SEABED.
Shouhei KIDERA Takuya SAKAMOTO Toru SATO
UWB pulse radars enable us to measure a target location with high range-resolution, and so are applicable for measurement systems for robots and automobile. We have already proposed a robust and fast imaging algorithm with an envelope of circles, which is suitable for these applications. In this method, we determine time delays from received signals with the matched filter for a transmitted waveform. However, scattered waveforms are different from transmitted one depending on the target shape. Therefore, the resolution of the target edges deteriorates due to these waveform distortions. In this paper, a high-resolution imaging algorithm for convex targets is proposed by iteration of the shape and waveform estimation. We show application examples with numerical simulations and experiments, and confirm its capability to detect edges of an object.
To design a controller with block-diagonal structure for trajectory sensitivity minimization, we propose a method based on LMI. In order to reduce the trajectory sensitivity, linear quadratic regulator theory is adopted, and this is solved using LMI optimization technique.
A new quadruple watermarking scheme of digital images against geometrical attacks is proposed in this letter. We treat the center and the four vertexes of the original image as the reference points and embed the same quadruple watermarks by means of polar coordinates, which is geometrically invariant. The center of an image is assumed to not to be removed after rotating, scaling and local distortions according to the general practical image processing. In the watermark extraction process, the vertexes of the image are found by a searching method. Thus watermark synchronization is obtained. Experimental results show that the scheme is robust to the geometrical distortions including rotation, scaling, cropping and local distortions.