Qin CHENG Linghua ZHANG Bo XUE Feng SHU Yang YU
As an emerging technology, device-free localization (DFL), which uses wireless sensor networks to detect targets that carry no electronic devices, has spawned extensive applications such as security safeguards and smart homes or hospitals. Previous studies formulate DFL as a classification problem, but challenges remain in terms of accuracy and robustness. In this paper, we exploit a generalized thresholding algorithm with parameter p as a penalty function to solve inverse problems with sparsity constraints for DFL. The function applies less bias to the large coefficients and penalizes small coefficients more heavily as the value of p decreases. By exploiting the distinctive capability of the p thresholding function to measure sparsity, the proposed approach achieves accurate and robust localization in challenging environments. Extensive experiments show that the algorithm outperforms current alternatives.
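The idea of a p-parameterized shrinkage inside an iterative sparse solver can be sketched as follows. This is an illustrative ISTA-style reconstruction under our own assumptions (the exact shrinkage rule, step size, and parameter values are not specified by the abstract):

```python
import numpy as np

def p_shrink(y, lam, p):
    """Generalized shrinkage: reduces to soft thresholding at p = 1;
    for p < 1 the threshold lam*|y|^(p-1) biases large coefficients less."""
    mag = np.abs(y)
    thresh = lam * np.maximum(mag, 1e-12) ** (p - 1.0)
    return np.sign(y) * np.maximum(mag - thresh, 0.0)

def ista_p(A, b, lam=0.05, p=0.5, iters=300):
    """ISTA-style iteration: gradient step on ||Ax - b||^2, then p-shrinkage."""
    L = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = p_shrink(x - A.T @ (A @ x - b) / L, lam / L, p)
    return x
```

For p = 1 this reduces to ordinary soft thresholding; as p decreases, large coefficients are shrunk less while small ones face a larger threshold, matching the behaviour described above.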
In this paper, we propose L0 norm optimization in a scrambled sparse representation domain and its application to an Encryption-then-Compression (EtC) system. We design a random unitary transform that conserves L0 norm isometry. The resulting encryption method provides a practical orthogonal matching pursuit (OMP) algorithm that allows computation in the encrypted domain. We prove that the proposed method theoretically has exactly the same estimation performance as the nonencrypted variant of the OMP algorithm. In addition, we demonstrate the security strength of the proposed secure sparse representation when applied to the EtC system. Even if the dictionary information is leaked, the proposed scheme protects the privacy information of observed signals.
In this paper, we propose a secure computation of sparse coding and its application to Encryption-then-Compression (EtC) systems. The proposed scheme introduces secure sparse coding that allows computation of an Orthogonal Matching Pursuit (OMP) algorithm in an encrypted domain. We prove theoretically that the proposed method estimates exactly the same sparse representations that the OMP algorithm for non-encrypted computation does. This means that there is no degradation of the sparse representation performance. Furthermore, the proposed method can control the sparsity without decoding the encrypted signals. Next, we propose an EtC system based on the secure sparse coding. The proposed secure EtC system can protect the private information of the original image contents while performing image compression. It provides the same rate-distortion performance as that of sparse coding without encryption, as demonstrated on both synthetic data and natural images.
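The key property claimed above, that OMP run on unitarily transformed (encrypted) observations and dictionary returns exactly the same sparse representation, can be sketched with a toy example; the dictionary, random-unitary "key", and dimensions below are illustrative, not the paper's construction:

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal Matching Pursuit: greedily pick k atoms of D,
    refitting a least-squares solution on the selected support."""
    support = []
    x = np.zeros(D.shape[1])
    residual = y.astype(float)
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

rng = np.random.default_rng(1)
D, _ = np.linalg.qr(rng.standard_normal((16, 16)))   # toy orthonormal dictionary
x_true = np.zeros(16)
x_true[[2, 9]] = [1.0, -0.7]                         # 2-sparse ground truth
y = D @ x_true

Q, _ = np.linalg.qr(rng.standard_normal((16, 16)))   # random unitary "key"
x_plain = omp(D, y, 2)                               # OMP on plaintext
x_enc = omp(Q @ D, Q @ y, 2)                         # OMP in the encrypted domain
```

Because Q preserves inner products and norms, every greedy selection and least-squares fit is identical in the two domains, so `x_plain` and `x_enc` coincide.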
Jaihyun PARK Bonhwa KU Youngsaeng JIN Hanseok KO
Low-frequency side scan sonar can quickly search a wide range, but the images acquired are of low quality. The image super resolution (SR) method can mitigate this problem. SR typically uses sparse coding, but accurately estimating the sparse coefficients incurs substantial computational costs. To reduce processing time, we propose a region-selective sparse coding based SR system that emphasizes object regions. In particular, the region containing objects of interest is detected in side scan sonar based underwater images, so that the subsequent sparse coding based SR process can be applied selectively. The effectiveness of the proposed method is verified by the reduced processing time required for image reconstruction while preserving the same level of visual quality as conventional methods.
Shilei CHENG Song GU Maoquan YE Mei XIE
Human action recognition in videos has drawn huge research interest in computer vision. The Bag-of-Words (BoW) model is commonly used to obtain video-level representations; however, the BoW model roughly assigns each feature vector to its nearest visual word, and the collection of unordered words ignores the interest points' spatial information, inevitably causing nontrivial quantization errors and impairing improvements in classification rates. To address these drawbacks, we propose an approach for action recognition that encodes spatio-temporal log-Euclidean covariance matrix (ST-LECM) features within the low-rank and sparse representation framework. Motivated by low-rank matrix recovery, local descriptors in a spatio-temporal neighborhood have similar representations and should be approximately low rank. The learned coefficients can not only capture the global data structure but also preserve local consistency. Experimental results show that the proposed approach yields excellent recognition performance on synthetic video datasets and is robust to action variability, view variations and partial occlusion.
Yinan LI Xiongwei ZHANG Meng SUN Yonggang HU Li LI
An online version of convolutive non-negative sparse coding (CNSC) with the generalized Kullback-Leibler (K-L) divergence is proposed to adaptively learn spectral-temporal bases from speech streams. The proposed scheme processes training data piece-by-piece and incrementally updates learned bases with accumulated statistics to overcome the inefficiency of its offline counterpart in processing large scale or streaming data. Compared to conventional non-negative sparse coding, we utilize the convolutive model within bases, so that each basis is capable of describing a relatively long temporal span of signals, which helps to improve the representation power of the model. Moreover, by incorporating a voice activity detector (VAD), we propose an unsupervised enhancement algorithm that updates the noise dictionary adaptively from non-speech intervals. Meanwhile, for the speech intervals, one can adaptively learn the speech bases by keeping the noise ones fixed. Experimental results show that the proposed algorithm outperforms the competing algorithms substantially, especially when the background noise is highly non-stationary.
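As background, the non-convolutive, offline building block, non-negative factorization under the generalized K-L divergence via the standard Lee-Seung multiplicative updates, can be sketched as follows; the convolutive bases and the online accumulated statistics of the proposed scheme are beyond this toy version, and all sizes here are illustrative:

```python
import numpy as np

def kl_nmf(V, r, iters=100, seed=0):
    """Lee-Seung multiplicative updates minimizing the generalized
    K-L divergence D(V || WH); V must be nonnegative."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + 0.1
    H = rng.random((r, m)) + 0.1
    for _ in range(iters):
        WH = W @ H + 1e-9
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + 1e-9)
        WH = W @ H + 1e-9
        W *= ((V / WH) @ H.T) / (np.ones_like(V) @ H.T + 1e-9)
    return W, H

def gkl(V, WH):
    """Generalized Kullback-Leibler divergence between V and its model WH."""
    return float(np.sum(V * np.log((V + 1e-9) / (WH + 1e-9)) - V + WH))
```

The multiplicative form keeps W and H nonnegative and monotonically decreases the generalized K-L divergence; the convolutive and online variants modify what statistics are accumulated, not this basic update structure.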
Local spatio-temporal features are popular in the human action recognition task. In practice, they are usually coupled with a feature encoding approach, which helps to obtain the video-level vector representations that can be used in learning and recognition. In this paper, we present an efficient local feature encoding approach called Approximate Sparse Coding (ASC). ASC computes the sparse codes for a large collection of prototype local feature descriptors in the offline learning phase using Sparse Coding (SC), and in the encoding phase looks up the nearest prototype's precomputed sparse code for each to-be-encoded local feature using Approximate Nearest Neighbour (ANN) search. It shares the low dimensionality of SC and the high speed of ANN, which are both desired properties for a local feature encoding approach. ASC has been extensively evaluated on the KTH dataset and the HMDB51 dataset. We confirmed that it is able to encode a large quantity of local video features into discriminative low-dimensional representations efficiently.
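The two phases of ASC can be sketched as below. For simplicity the sketch uses exact nearest-neighbour search in place of a real ANN index, and a small OMP routine stands in for the offline sparse coder; all names and sizes are illustrative:

```python
import numpy as np

def omp(D, y, k):
    """Tiny OMP standing in for the offline sparse coder."""
    support, x = [], np.zeros(D.shape[1])
    residual = y.astype(float)
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

def asc_train(prototypes, D, k):
    """Offline phase: sparse-code every prototype descriptor once."""
    return np.stack([omp(D, p, k) for p in prototypes])

def asc_encode(x, prototypes, codes):
    """Online phase: return the precomputed code of the nearest prototype
    (exact search here; a real system would use an ANN index)."""
    j = int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))
    return codes[j]
```

The expensive optimization happens once per prototype offline; each new descriptor then costs only one nearest-neighbour query.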
Li TIAN Qi JIA Sei-ichiro KAMATA
In this study, we propose a simple yet general and powerful framework for integrating multiple global and local features by Product Sparse Coding (PSC) for image retrieval. In our framework, multiple global and local features are extracted from images and then transformed into Trimmed-Root (TR)-features. After that, the features are encoded into compact codes by PSC. Finally, a two-stage ranking strategy is proposed for indexing in retrieval. We make three major contributions in this study. First, we propose the TR representation of multiple image features and show that it offers better performance than the original features. Second, the features integrated by PSC are very compact and effective, with lower complexity than standard sparse coding. Finally, the two-stage ranking strategy balances efficiency and memory usage in storage. Experiments demonstrate that our compact image representation is superior to state-of-the-art alternatives for large-scale image retrieval.
Shuang LIU Zhong ZHANG Xiaozhong CAO
Although sparse coding has emerged as an extremely powerful tool for texture and image classification, it neglects the relationship of coding coefficients from the same class in the training stage, which may cause a decline in the classification performance. In this paper, we propose a novel coding strategy named compact sparse coding for ground-based cloud classification. We add a constraint on coding coefficients into the objective function of traditional sparse coding. In this way, coding coefficients from the same class can be forced to their mean vector, making them more compact and discriminative. Experiments demonstrate that our method achieves better performance than the state-of-the-art methods.
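One natural way to write the added compactness constraint (our reading of the abstract; the weights λ and β and the class-mean notation m_c are illustrative, not the authors' exact formulation):

```latex
\min_{D,\,\{s_i\}} \; \sum_{i} \Big( \|x_i - D s_i\|_2^2
  + \lambda \|s_i\|_1
  + \beta \,\|s_i - m_{c(i)}\|_2^2 \Big),
\qquad
m_c = \frac{1}{|\mathcal{I}_c|} \sum_{i \in \mathcal{I}_c} s_i ,
```

where c(i) is the class of sample i and I_c indexes the training samples of class c; the third term pulls same-class coefficients toward their mean, making them more compact and discriminative.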
Yang LI Junyong YE Tongqing WANG Shijian HUANG
Traditional sparse representation-based methods for human action recognition usually pool over the entire video to form the final feature representation, neglecting the spatio-temporal information of features. To employ spatio-temporal information, we present a novel histogram representation obtained by computing statistics on the temporal changes of sparse coding coefficients, frame by frame, in spatial pyramids constructed from videos. The histograms are then fed into a support vector machine with a spatial pyramid matching kernel for final action classification. We validate our method on two benchmarks, KTH and UCF Sports, and experimental results show the effectiveness of our method in human action recognition.
Peng SONG Wenming ZHENG Ruiyu LIANG
In traditional speech emotion recognition systems, when the training and testing utterances are obtained from different corpora, the recognition rates decrease dramatically. To tackle this problem, in this letter, inspired by recent developments in sparse coding and transfer learning, a novel sparse transfer learning method is presented for speech emotion recognition. Firstly, a sparse coding algorithm is employed to learn a robust sparse representation of emotional features. Then, a novel sparse transfer learning approach is presented, in which the distance between the feature distributions of the source and target datasets is used to regularize the objective function of sparse coding. The experimental results demonstrate that, compared with the automatic recognition approach, the proposed method achieves promising improvements in recognition rates and significantly outperforms the classic dimension-reduction-based transfer learning approach.
Jigisha N PATEL Jerin JOSE Suprava PATNAIK
The concept of sparse representation has been gaining momentum in image processing applications, especially image compression, over the last decade. Sparse coding algorithms represent signals as a sparse linear combination of atoms of an overcomplete dictionary. Earlier work shows that sparse coding of images using learned dictionaries outperforms the JPEG standard for image compression. The conventional method of image compression based on sparse coding, though successful, does not adapt the compression rate to the local block characteristics of the image. Here, we propose a new framework in which the image is classified into three classes by measuring block activities, followed by sparse coding of each class using dictionaries learned specifically for that class. The K-SVD algorithm is used for dictionary learning. The sparse coefficients for each class are Huffman encoded and combined to form a single bit stream. The model imparts rate-distortion attributes to the compression, as there is provision for setting a different constraint for each class depending on its characteristics. We analyse and compare this model with the conventional model. The outcomes are encouraging, and the model paves the way for efficient sparse representation based image compression.
Shuang BAI Jianjun HOU Noboru OHNISHI
Local descriptors, Local Binary Pattern (LBP) and Scale-Invariant Feature Transform (SIFT), are widely used in various computer vision applications. They emphasize different aspects of image content. In this letter, we propose to combine them in sparse coding for categorizing scene images. First, we regularly extract LBP and SIFT features from training images. Then, corresponding to each feature, a visual word codebook is constructed. The obtained LBP and SIFT codebooks are used to create a two-dimensional table, in which each entry corresponds to an LBP visual word and a SIFT visual word. Given an input image, LBP and SIFT features extracted from the same positions of the image are encoded together based on sparse coding. After that, spatial max pooling is adopted to determine the image representation. The obtained image representations are converted into one-dimensional features and classified by SVM classifiers. Finally, we conduct extensive experiments on the Scene Categories 8 and MIT 67 Indoor Scene datasets to evaluate the proposed method. The results demonstrate that combining features in the proposed manner is effective for scene categorization.
Ryoichi TAKASHIMA Tetsuya TAKIGUCHI Yasuo ARIKI
This paper presents a voice conversion (VC) technique for noisy environments, where parallel exemplars are introduced to encode the source speech signal and synthesize the target speech signal. The parallel exemplars (dictionary) consist of source exemplars and target exemplars, having the same texts uttered by the source and target speakers. The input source signal is decomposed into the source exemplars, noise exemplars and their weights (activities). Then, using the weights of the source exemplars, the converted signal is constructed from the target exemplars. We carried out speaker conversion tasks using clean speech data and noise-added speech data. The effectiveness of this method was confirmed by comparing it with a conventional Gaussian Mixture Model (GMM)-based method.
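The exemplar-based decomposition can be sketched as follows, using standard K-L multiplicative updates to estimate nonnegative activities; the exemplar matrices and sizes are illustrative, not the authors' setup:

```python
import numpy as np

def activities(A, x, iters=300):
    """Nonnegative activities h with A @ h ≈ x, via multiplicative
    updates for the generalized K-L divergence (A and x nonnegative)."""
    h = np.full(A.shape[1], 0.1)
    for _ in range(iters):
        Ah = A @ h + 1e-9
        h *= (A.T @ (x / Ah)) / (A.sum(axis=0) + 1e-9)
    return h

def convert(src_ex, noise_ex, tgt_ex, x):
    """Decompose x over source + noise exemplars, then reuse only the
    speech activities with the parallel target exemplars."""
    A = np.hstack([src_ex, noise_ex])
    h = activities(A, x)
    return tgt_ex @ h[:src_ex.shape[1]]
```

Discarding the noise-exemplar activities is what makes the conversion noise-robust: only the speech part of the decomposition is carried over to the target speaker's exemplars.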
Ryota MIYATA Koji KURATA Toru AONISHI
We investigate a sparsely encoded Hopfield model with unit replacement by using a statistical mechanical method called self-consistent signal-to-noise analysis. We theoretically obtain a relation between the storage capacity and the number of replacement units for each sparseness a. Moreover, we compare the unit replacement model with the forgetting model in terms of the network storage capacity. The results show that, for a large number of replacement units, the unit replacement model has a finite optimal sparseness on the open interval 0 < a < 1 (where a = 0 corresponds to 1/2 coding and a → 1 to the limit of sparseness) that maximizes the storage capacity, whereas the forgetting model does not.
Obtaining a compact representation of a large feature map built by mapper robots is a critical issue in recent mobile robotics. This “map compression” problem is explored in this paper from the novel perspective of dictionary-based data compression techniques. The primary contribution of the paper is a dictionary-based map compression approach. A map compression system is presented that employs RANSAC map matching and sparse coding as building blocks. The effectiveness of the proposed techniques is investigated in terms of map compression ratio, compression speed, the retrieval performance of compressed/decompressed maps, and relations to Kolmogorov complexity.
A novel method for single-image super resolution without any training samples is presented in this paper. Using sparse representation, the method attempts to recover at each pixel its best possible resolution increase based on the self-similarity of image patches across different scales and rotations. The experiments indicate that the proposed method produces robust and competitive results.
Makoto NAKASHIZUKA Hidenari NISHIURA Youji IIGUNI
In this study, we introduce shift-invariant sparse image representations using tree-structured dictionaries. Sparse coding is a generative signal model that approximates signals by linear combinations of atoms in a dictionary. Since a sparsity penalty is introduced during signal approximation and dictionary learning, the dictionary captures the primary structures of the signals. Under the shift-invariance constraint, the dictionary comprises translated structuring elements (SEs). The computational cost and the number of atoms in the dictionary increase with the number of SEs. In this paper, we propose an algorithm for shift-invariant sparse image representation in which the SEs are learnt with a tree-structured approach. By using a tree-structured dictionary, we can reduce the computational cost of image decomposition to the logarithmic order of the number of SEs. We also present the results of our experiments on SE learning and the use of our algorithm in image recovery applications.
In this paper, an associative memory model with a forgetting process proposed by Mezard et al. is investigated as a means of storing sparsely encoded patterns, using the self-consistent signal-to-noise analysis (SCSNA) proposed by Shiino and Fukai. As in the case of storing non-sparse (non-biased) patterns analyzed by Mezard et al., this sparsely encoded associative memory model is also free from the catastrophic deterioration of memory caused by memory pattern overloading. We theoretically obtain the relationship between the storage capacity and the forgetting rate, and find that there is an optimal forgetting rate leading to the maximum storage capacity, which we call the optimal storage capacity. As the memory pattern firing rate decreases, the optimal storage capacity increases and the optimal forgetting rate decreases. Furthermore, we show that the capacity rate (i.e., the ratio of the storage capacity for the conventional correlation learning rule to the optimal storage capacity) is almost constant with respect to the memory pattern firing rate.