This letter introduces innovative VAD based on horizontal spectral entropy with long-span of time (HSELT) feature sets to improve mobile ASR performance in low signal-to-noise ratio (SNR) conditions. Since the signal characteristics of nonstationary noise change with time, we need long-term information of the noisy speech signal to define a more robust decision rule yielding high accuracy. We find that HSELT measures can horizontally enhance the transition between speech and non-speech segments. Based on this finding, we use the HSELT measures to achieve high accuracy for detecting speech signal form various stationary and nonstationary noises.
In recent years, the demand for low-power design has remained undiminished. In this paper, a pseudo power gating (SPG) structure using a normal logic cell is proposed to extend the power gating to an ultrafine grained region at the gate level. In the proposed method, the controlling value of a logic element is used to control the switching activity of modules computing other inputs of the element. For each element, there exists a submodule controlled by an input to the element. Power reduction is maximized by controlling the order of the submodule selection. A basic algorithm and a switching activity first algorithm have been developed to optimize the power. In this application, a steady maximum depth constraint is added to prevent the depth increase caused by the insertion of the control signal. In this work, various factors affecting the power consumption of library level circuits with the SPG are determined. In such factors, the occurrence of glitches increases the power consumption and a method to reduce the occurrence of glitches is proposed by considering the parity of inverters. The proposed SPG method was evaluated through the simulation of the netlist extracted from the layout using the VDEC Rohm 0.18 µm process. Experiments on ISCAS'85 benchmarks show that the reduction in total power consumption achieved is 13% on average with a 2.5% circuit delay degradation. Finally, the effectiveness of the proposed method under different primary input statistics is considered.
Conventional entropy measure is derived from full-band (range from 0 Hz to 4 kHz); however, it can not clearly describe the spectrum variability during voice-activity. Here we propose a novel concept of adaptive long-term sub-band entropy ( ALT-SubEnpy ) measure and combine it with a multi-thresholding scheme for voice activity detection. In detail, the ALT-SubEnpy measure developed with four part parameters of sub-entropy which uses different long-term spectral window length at each part. Consequently, the proposed ALT-SubEnpy -based algorithm recursively updates the four adaptive thresholds on each part. The proposed ALT-SubEnpy-based VAD method is shown to be an effective method while working at variable noise-level condition.
In this paper, we propose a novel voice activity detection (VAD) algorithm using global speech absence probability (GSAP) based on Teager energy (TE) for speech enhancement. The proposed method provides a better representation of GSAP, resulting in improved decision performance for speech and noise segments by the use of a TE operator which is employed to suppress the influence of noise signals. The performance of our approach is evaluated by objective tests under various environments, and it is found that the suggested method yields better results than conventional schemes.
A novel long-term sub-band entropy (LT-SubEntropy) measure, which uses improved long-term spectral analysis and sub-band entropy, is proposed for voice activity detection (VAD). Based on the measure, we can accurately exploit the inherent nature of the formant structure on speech spectrogram (the well-known as voiceprint). Results show that the proposed VAD is superior to existing standard VAD methods at low SNR levels, especially at variable-level noise.
Xin MAN Takashi HORIYAMA Shinji KIMURA
Clock gating is supported by commercial tools as a power optimization feature based on the guard signal described in HDL (structural method). However, the identification of control signals for gated registers is hard and designer-intensive work. Besides, since the clock gating cells also consume power, it is imperative to minimize the number of inserted clock gating cells and their switching activities for power optimization. In this paper, we propose an automatic multi-stage clock gating algorithm with ILP (Integer Linear Programming) formulation, including clock gating control candidate extraction, constraints construction and optimum control signal selection. By multi-stage clock gating, unnecessary clock pulses to clock gating cells can be avoided by other clock gating cells, so that the switching activity of clock gating cells can be reduced. We find that any multi-stage control signals are also single-stage control signals, and any combination of signals can be selected from single-stage candidates. The proposed method can be applied to 3 or more cascaded stages. The multi-stage clock gating optimization problem is formulated as constraints in LP format for the selection of cascaded clock-gating order of multi-stage candidate combinations, and a commercial ILP solver (IBM CPLEX) is applied to obtain the control signals for each register with minimum switching activity. Those signals are used to generate a gate level description with guarded registers from original design, and a commercial synthesis and layout tools are applied to obtain the circuit with multi-stage clock gating. For a set of benchmark circuits and a Low Density Parity Check (LDPC) Decoder (6.6k gates, 212 F.F.s), the proposed method is applied and actual power consumption is estimated using Synopsys NanoSim after layout. On average, 31% actual power reduction has been obtained compared with original designs with structural clock gating, and more than 10% improvement has been achieved for some circuits compared with single-stage optimization method. CPU time for optimum multi-stage control selection is several seconds for up to 25k variables in LP format. By applying the proposed clock gating, area can also be reduced since the multiplexors controlling register inputs are eliminated.
This paper presents a no-reference (NR) based video-quality estimation method for compressed videos which apply inter-frame prediction. The proposed method does not need bitstream information. Only pixel information of decoded videos is used for the video-quality estimation. An activity value which indicates a variance of luminance values is calculated for every given-size pixel block. The activity difference between an intra-coded frame and its adjacent frame is calculated and is employed for the video-quality estimation. In addition, a blockiness level and a blur level are also estimated at every frame by analyzing pixel information only. The estimated blockiness level and blur level are also taken into account to improve quality-estimation accuracy in the proposed method. Experimental results show that the proposed method achieves accurate video-quality estimation without the original video which does not include any artifacts by the video compression. The correlation coefficient between subjective video quality and estimated quality is 0.925. The proposed method is suitable for automatic video-quality checks when service providers cannot access the original videos.
An algorithm for the discrimination between human upstairs and downstairs using a tri-axial accelerometer is presented in this paper, which consists of vertical acceleration calibration, extraction of two kinds of features (Interquartile Range and Wavelet Energy), effective feature subset selection with the wrapper approach, and SVM classification. The proposed algorithm can recognize upstairs and downstairs with 95.64% average accuracy for different sensor locations, i.e. located on the subject's waist belt, in the trousers pocket, and in the shirt pocket. Even for the mixed data from all sensor locations, the average recognition accuracy can reach 94.84%. Experimental results have successfully validated the effectiveness of the proposed method.
Arei KOBAYASHI Shigeki MURAMATSU Daisuke KAMISAKA Takafumi WATANABE Atsunori MINAMIKAWA Takeshi IWAMOTO Hiroyuki YOKOYAMA
This paper proposes a method for using an accelerometer, microphone, and GPS in a mobile phone to recognize the movement of the user. Past attempts at identifying the movement associated with riding on a bicycle, train, bus or car and common human movements like standing still, walking or running have had problems with poor accuracy due to factors such as sudden changes in vibration or times when the vibrations resembled those for other types of movement. Moreover, previous methods have had problems with has the problem of high power consumption because of the sensor processing load. The proposed method aims to avoid these problems by estimating the reliability of the inference result, and by combining two inference modes to decrease the power consumption. Field trials demonstrate that our method achieves 90% or better average accuracy for the seven types of movement listed above. Shaka's power saving functionality enables us to extend the battery life of a mobile phone to over 100 hours while our estimation algorithm is running in the background. Furthermore, this paper uses experimental results to show the trade-off between accuracy and latency when estimating user activity.
Yuta YAMATO Xiaoqing WEN Kohei MIYASE Hiroshi FURUKAWA Seiji KAJIHARA
Power-aware X-filling is a preferable approach to avoiding IR-drop-induced yield loss in at-speed scan testing. However, the ability of previous X-filling methods to reduce launch switching activity may be unsatisfactory, due to low effect (insufficient and global-only reduction) and/or low scalability (long CPU time). This paper addresses this reduction quality problem with a novel GA (Genetic Algorithm) based X-filling method, called GA-fill. Its goals are (1) to achieve both effectiveness and scalability in a more balanced manner and (2) to make the reduction effect of launch switching activity more concentrated on critical areas that have higher impact on IR-drop-induced yield loss. Evaluation experiments are being conducted on both benchmark and industrial circuits, and the results have demonstrated the usefulness of GA-fill.
The earlier the stage where we perform low power design, the higher the dynamic power reduction we achieve. In this paper, we focus on reducing switching activity in high-level synthesis, especially, in the problem of functional module binding, bus binding or register binding. We propose an effective low power bus binding algorithm based on the table decomposition method, to reduce switching activity. The proposed algorithm is based on the decomposition of the original problem into sub-problems by exploiting the optimal substructure. As a result, it finds an optimal or close-to-optimal binding solution with less computation time. Experimental results show the proposed method obtains a solution 2.3-22.2% closer to optimal solution than one with a conventional heuristic method, 8.0-479.2 times faster than the optimal one (at a threshold value of 1.0E+9).
Chang Woo HAN Shin Jae KANG Nam Soo KIM
In this letter, we propose a novel approach to human activity recognition. We present a class of features that are robust to the tilt of the attached sensor module and a state transition model suitable for HMM-based activity recognition. In addition, postprocessing techniques are applied to stabilize the recognition results. The proposed approach shows significant improvements in recognition experiments over a variety of human activity DB.
Traditional wavelet-based speech enhancement algorithms are ineffective in the presence of highly non-stationary noise because of the difficulties in the accurate estimation of the local noise spectrum. In this paper, a simple method of noise estimation employing the use of a voice activity detector is proposed. We can improve the output of a wavelet-based speech enhancement algorithm in the presence of random noise bursts according to the results of VAD decision. The noisy speech is first preprocessed using bark-scale wavelet packet decomposition ( BSWPD ) to convert a noisy signal into wavelet coefficients (WCs). It is found that the VAD using bark-scale spectral entropy, called as BS-Entropy, parameter is superior to other energy-based approach especially in variable noise-level. The wavelet coefficient threshold (WCT) of each subband is then temporally adjusted according to the result of VAD approach. In a speech-dominated frame, the speech is categorized into either a voiced frame or an unvoiced frame. A voiced frame possesses a strong tone-like spectrum in lower subbands, so that the WCs of lower-band must be reserved. On the contrary, the WCT tends to increase in lower-band if the speech is categorized as unvoiced. In a noise-dominated frame, the background noise can be almost completely removed by increasing the WCT. The objective and subjective experimental results are then used to evaluate the proposed system. The experiments show that this algorithm is valid on various noise conditions, especially for color noise and non-stationary noise conditions.
Toru YAMADA Yoshihiro MIYAMOTO Yuzo SENDA Masahiro SERIZAWA
This paper presents a Reduced-reference based video-quality estimation method suitable for individual end-user quality monitoring of IPTV services. With the proposed method, the activity values for individual given-size pixel blocks of an original video are transmitted to end-user terminals. At the end-user terminals, the video quality of a received video is estimated on the basis of the activity-difference between the original video and the received video. Psychovisual weightings and video-quality score adjustments for fatal degradations are applied to improve estimation accuracy. In addition, low-bit-rate transmission is achieved by using temporal sub-sampling and by transmitting only the lower six bits of each activity value. The proposed method achieves accurate video quality estimation using only low-bit-rate original video information (15 kbps for SDTV). The correlation coefficient between actual subjective video quality and estimated quality is 0.901 with 15 kbps side information. The proposed method does not need computationally demanding spatial and gain-and-offset registrations. Therefore, it is suitable for real-time video-quality monitoring in IPTV services.
In this letter, we propose an efficient method to improve the performance of voiced/unvoiced (V/UV) sounds decision for the selectable mode vocoder (SMV) of 3GPP2 using the Gaussian mixture model (GMM). We first present an effective analysis of the features and the classification method adopted in the SMV. And feature vectors which are applied to the GMM are then selected from relevant parameters of the SMV for the efficient V/UV classification. The performance of the proposed algorithm are evaluated under various conditions and yield better results compared to the conventional method of the SMV.
David COURNAPEAU Tatsuya KAWAHARA
A new online, unsupervised voice activity detection (VAD) method is proposed. The method is based on a feature derived from high-order statistics (HOS), enhanced by a second metric based on normalized autocorrelation peaks to improve its robustness to non-Gaussian noises. This feature is also oriented for discriminating between close-talk and far-field speech, thus providing a VAD method in the context of human-to-human interaction independent of the energy level. The classification is done by an online variation of the Expectation-Maximization (EM) algorithm, to track and adapt to noise variations in the speech signal. Performance of the proposed method is evaluated on an in-house data and on CENSREC-1-C, a publicly available database used for VAD in the context of automatic speech recognition (ASR). On both test sets, the proposed method outperforms a simple energy-based algorithm and is shown to be more robust against the change in speech sparsity, SNR variability and the noise type.
Q-Haing JO Yun-Sik PARK Kye-Hwan LEE Joon-Hyuk CHANG
In this letter, we propose effective feature vectors to improve the performance of voice activity detection (VAD) employing a support vector machine (SVM), which is known to incorporate an optimized nonlinear decision over two different classes. To extract the effective feature vectors, we present a novel scheme that combines the a posteriori SNR, a priori SNR, and predicted SNR, widely adopted in conventional statistical model-based VAD.
Erwan LE MALECOT Masayoshi KOHARA Yoshiaki HORI Kouichi SAKURAI
With the multiplication of attacks against computer networks, system administrators are required to monitor carefully the traffic exchanged by the networks they manage. However, that monitoring task is increasingly laborious because of the augmentation of the amount of data to analyze. And that trend is going to intensify with the explosion of the number of devices connected to computer networks along with the global rise of the available network bandwidth. So system administrators now heavily rely on automated tools to assist them and simplify the analysis of the data. Yet, these tools provide limited support and, most of the time, require highly skilled operators. Recently, some research teams have started to study the application of visualization techniques to the analysis of network traffic data. We believe that this original approach can also allow system administrators to deal with the large amount of data they have to process. In this paper, we introduce a tool for network traffic monitoring using visualization techniques that we developed in order to assist the system administrators of our corporate network. We explain how we designed the tool and some of the choices we made regarding the visualization techniques to use. The resulting tool proposes two linked representations of the network traffic and activity, one in 2D and the other in 3D. As 2D and 3D visualization techniques have different assets, we resulted in combining them in our tool to take advantage of their complementarity. We finally tested our tool in order to evaluate the accuracy of our approach.
An orthonormal basis adaptation method for function approximation was developed and applied to reinforcement learning with multi-dimensional continuous state space. First, a basis used for linear function approximation of a control function is set to an orthonormal basis. Next, basis elements with small activities are replaced with other candidate elements as learning progresses. As this replacement is repeated, the number of basis elements with large activities increases. Example chaos control problems for multiple logistic maps were solved, demonstrating that the method for adapting an orthonormal basis can modify a basis while holding the orthonormality in accordance with changes in the environment to improve the performance of reinforcement learning and to eliminate the adverse effects of redundant noisy states.
Masakiyo FUJIMOTO Kentaro ISHIZUKA
This paper addresses the problem of voice activity detection (VAD) in noisy environments. The VAD method proposed in this paper is based on a statistical model approach, and estimates statistical models sequentially without a priori knowledge of noise. Namely, the proposed method constructs a clean speech / silence state transition model beforehand, and sequentially adapts the model to the noisy environment by using a switching Kalman filter when a signal is observed. In this paper, we carried out two evaluations. In the first, we observed that the proposed method significantly outperforms conventional methods as regards voice activity detection accuracy in simulated noise environments. Second, we evaluated the proposed method on a VAD evaluation framework, CENSREC-1-C. The evaluation results revealed that the proposed method significantly outperforms the baseline results of CENSREC-1-C as regards VAD accuracy in real environments. In addition, we confirmed that the proposed method helps to improve the accuracy of concatenated speech recognition in real environments.