Yixuan ZHANG Meiting XUE Huan ZHANG Shubiao LIU Bei ZHAO
Network traffic control and classification have become increasingly dependent on deep packet inspection (DPI) approaches, which are the most precise techniques for intrusion detection and prevention. However, the increasing traffic volumes and link speed exert considerable pressure on DPI techniques to process packets with high performance in restricted available memory. To overcome this problem, we proposed dual cuckoo filter (DCF) as a data structure based on cuckoo filter (CF). The CF can be extended to the parallel mode called parallel Cuckoo Filter (PCF). The proposed data structure employs an extra hash function to obtain two potential indices of entries. The DCF magnifies the superiority of the CF with no additional memory. Moreover, it can be extended to the parallel mode, resulting in a data structure referred to as parallel Dual Cuckoo filter (PDCF). The implementation results show that using the DCF and PDCF as identification tools in a DPI system results in time improvements of up to 2% and 30% over the CF and PCF, respectively.
Fei WU Shuaishuai LI Guangchuan PENG Yongheng MA Xiao-Yuan JING
Cross-modal hashing technology has attracted much attention for its favorable retrieval performance and low storage cost. However, for existing cross-modal hashing methods, the heterogeneity of data across modalities is still a challenge and how to fully explore and utilize the intra-modality features has not been well studied. In this paper, we propose a novel cross-modal hashing approach called Modality-fused Graph Network (MFGN). The network architecture consists of a text channel and an image channel that are used to learn modality-specific features, and a modality fusion channel that uses the graph network to learn the modality-shared representations to reduce the heterogeneity across modalities. In addition, an integration module is introduced for the image and text channels to fully explore intra-modality features. Experiments on two widely used datasets show that our approach achieves better results than the state-of-the-art cross-modal hashing methods.
Content-based image retrieval has been a hot topic among computer vision researchers for a long time. There have been many advances over the years, one of the recent ones being deep metric learning, inspired by the success of deep neural networks in many machine learning tasks. The goal of metric learning is to extract good high-level features from image pixel data using neural networks. These features provide useful abstractions, which can enable algorithms to perform visual comparison between images with human-like accuracy. To learn these features, supervised information of image similarity or relative similarity is often used. One important issue in deep metric learning is how to define similarity for multi-label or multi-object scenes in images. Traditionally, pairwise similarity is defined based on the presence of a single common label between two images. However, this definition is very coarse and not suitable for multi-label or multi-object data. Another common mistake is to completely ignore the multiplicity of objects in images, hence ignoring the multi-object facet of certain types of datasets. In our work, we propose an approach for learning deep image representations based on the relative similarity of both multi-label and multi-object image data. We introduce an intuitive and effective similarity metric based on the Jaccard similarity coefficient, which is equivalent to the intersection over union of two label sets. Hence we treat similarity as a continuous, as opposed to discrete quantity. We incorporate this similarity metric into a triplet loss with an adaptive margin, and achieve good mean average precision on image retrieval tasks. We further show, using a recently proposed quantization method, that the resulting deep feature can be quantized whilst preserving similarity. We also show that our proposed similarity metric performs better for multi-object images than a previously proposed cosine similarity-based metric. Our proposed method outperforms several state-of-the-art methods on two benchmark datasets.
Maximum inner product search (MIPS) problem has gained much attention in a wide range of applications. In order to overcome the curse of dimensionality in high-dimensional spaces, most of existing methods first transform the MIPS problem into another approximate nearest neighbor search (ANNS) problem and then solve it by Locality Sensitive Hashing (LSH). However, due to the error incurred by the transmission and incomprehensive search strategies, these methods suffer from low precision and have loose probability guarantees. In this paper, we propose a novel search method named Adaptive-LSH (AdaLSH) to solve MIPS problem more efficiently and more precisely. AdaLSH examines objects in the descending order of both norms and (the probably correctly estimated) cosine angles with a query object in support of LSH with extendable windows. Such extendable windows bring not only efficiency in searching but also the probability guarantee of finding exact or approximate MIP objects. AdaLSH gives a better probability guarantee of success than those in conventional algorithms, bringing less running times on various datasets compared with them. In addition, AdaLSH can even support exact MIPS with probability guarantee.
Hashing methods have proven to be effective algorithm for image retrieval. However, learning discriminative hash codes is challenging for unsupervised models. In this paper, we propose a novel distinguishable image retrieval framework, named Unsupervised Deep Embedded Hashing (UDEH), to recursively learn discriminative clustering through soft clustering models and generate highly similar binary codes. We reduce the data dimension by auto-encoder and apply binary constraint loss to reduce quantization error. UDEH can be jointly optimized by standard stochastic gradient descent (SGD) in the embedd layer. We conducted a comprehensive experiment on two popular datasets.
Sangwon SEO Sangbae YUN Jaehong KIM Inkyo KIM Seongwook JIN Seungryoul MAENG
An increasing number of IoT devices are being introduced to the market in many industries, and the number of devices is expected to exceed billions in the near future. With this trend, many researchers have proposed new architectures to manage IoT devices, but the proposed architecture requires a huge memory footprint and computation overheads to look-up billions of devices. This paper proposes a hybrid hashing architecture called H- TLA to solve the problem from an architectural point of view, instead of modifying a hashing algorithm or designing a new one. We implemented a prototype system that shows about a 30% increase in performance while conserving uniformity. Therefore, we show an efficient architecture-level approach for addressing billions of devices.
Yang LI Zhuang MIAO Ming HE Yafei ZHANG Hang LI
How to represent images into highly compact binary codes is a critical issue in many computer vision tasks. Existing deep hashing methods typically focus on designing loss function by using pairwise or triplet labels. However, these methods ignore the attention mechanism in the human visual system. In this letter, we propose a novel Deep Attention Residual Hashing (DARH) method, which directly learns hash codes based on a simple pointwise classification loss function. Compared to previous methods, our method does not need to generate all possible pairwise or triplet labels from the training dataset. Specifically, we develop a new type of attention layer which can learn human eye fixation and significantly improves the representation ability of hash codes. In addition, we embedded the attention layer into the residual network to simultaneously learn discriminative image features and hash codes in an end-to-end manner. Extensive experiments on standard benchmarks demonstrate that our method preserves the instance-level similarity and outperforms state-of-the-art deep hashing methods in the image retrieval application.
Yang LI Zhuang MIAO Jiabao WANG Yafei ZHANG Hang LI
The latest deep hashing methods perform hash codes learning and image feature learning simultaneously by using pairwise or triplet labels. However, generating all possible pairwise or triplet labels from the training dataset can quickly become intractable, where the majority of those samples may produce small costs, resulting in slow convergence. In this letter, we propose a novel deep discriminative supervised hashing method, called DDSH, which directly learns hash codes based on a new combined loss function. Compared to previous methods, our method can take full advantages of the annotated data in terms of pairwise similarity and image identities. Extensive experiments on standard benchmarks demonstrate that our method preserves the instance-level similarity and outperforms state-of-the-art deep hashing methods in the image retrieval application. Remarkably, our 16-bits binary representation can surpass the performance of existing 48-bits binary representation, which demonstrates that our method can effectively improve the speed and precision of large scale image retrieval systems.
Joobeom YUN Junbeom HUR Youngjoo SHIN Dongyoung KOO
Ransomware becomes more and more threatening nowadays. In this paper, we propose CLDSafe, a novel and efficient file backup system against ransomware. It keeps shadow copies of files and provides secure restoration using cloud storage when a computer is infected by ransomware. After our system measures file similarities between a new file on the client and an old file on the server, the old file on the server is backed up securely when the new file is changed substantially. And then, only authenticated users can restore the backup files by using challenge-response mechanism. As a result, our proposed solution will be helpful in recovering systems from ransomware damage.
Audio hashing has been successfully employed for protection, management, and indexing of digital music archives. For a reliable audio hashing system, improving hash matching accuracy is crucial. In this paper, we try to improve a binary audio hash matching performance by utilizing auxiliary information, resilience mask, which is obtained while constructing hash DB. The resilience mask contains reliability information of each hash bit. We propose a new type of resilience mask by considering spectrum scaling and additive noise distortions. Experimental results show that the proposed resilience mask is effective in improving hash matching performance.
In this paper, we exploit MapReduce framework and other optimizations to improve the performance of hash join algorithms on multi-core CPUs, including No partition hash join and partition hash join. We first implement hash join algorithms with a shared-memory MapReduce model on multi-core CPUs, including partition phase, build phase, and probe phase. Then we design an improved cuckoo hash table for our hash join, which consists of a cuckoo hash table and a chained hash table. Based on our implementation, we also propose two optimizations, one for the usage of SIMD instructions, and the other for partition phase. Through experimental result and analysis, we finally find that the partition hash join often outperforms the No partition hash join, and our hash join algorithm is faster than previous work by an average of 30%.
Shunsuke OHASHI Giovanni Yoko KRISTIANTO Goran TOPIC Akiko AIZAWA
Mathematical formulae play an important role in many scientific domains. Regardless of the importance of mathematical formula search, conventional keyword-based retrieval methods are not sufficient for searching mathematical formulae, which are structured as trees. The increasing number as well as the structural complexity of mathematical formulae in scientific articles lead to the necessity for large-scale structure-aware formula search techniques. In this paper, we formulate three types of measures that represent distinctive features of semantic similarity of math formulae, and develop efficient hash-based algorithms for the approximate calculation. Our experiments using NTCIR-11 Math-2 Task dataset, a large-scale test collection for math information retrieval with about 60-million formulae, show that the proposed method improves the search precision while also keeps the scalability and runtime efficiency high.
Hiroaki TAKEBE Yusuke UEHARA Seiichi UCHIDA
Anchor graph hashing (AGH) is a promising hashing method for nearest neighbor (NN) search. AGH realizes efficient search by generating and utilizing a small number of points that are called anchors. In this paper, we propose a method for improving AGH, which considers data distribution in a similarity space and selects suitable anchors by performing principal component analysis (PCA) in the similarity space.
In this paper, generic attacks are presented against hash functions that are constructed by a hashing mode instantiating a Feistel or generalized Feistel networks with an SP-round function. It is observed that the omission of the network twist in the last round can be a weakness against preimage attacks. The first target is a standard Feistel network with an SP round function. Up to 11 rounds can be attacked in generic if a condition on a key schedule function is satisfied. The second target is a 4-branch type-2 generalized Feistel network with an SP round function. Up to 15 rounds can be attacked in generic. These generic attacks are then applied to hashing modes of ISO standard ciphers Camellia-128 without FL and whitening layers and CLEFIA-128.
Zhenfei ZHAO Hao LUO Hua ZHONG Bian YANG Zhe-Ming LU
This letter proposes a mobile application framework named erasable photograph tagging (EPT) for photograph annotation and fast retrieval. The smartphone owner's voice is employed as tags and hidden in the host photograph without an extra feature database aided for retrieval. These digitized tags can be erased anytime with no distortion remaining in the recovered photograph.
Suk-Hwan LEE Seong-Geun KWON Ki-Ryong KWON
With the rapid expansion of vector data model application to digital content such as drawings and digital maps, the security and retrieval for vector data models have become an issue. In this paper, we present a vector data-hashing algorithm for the authentication, copy protection, and indexing of vector data models that are composed of a number of layers in CAD family formats. The proposed hashing algorithm groups polylines in a vector data model and generates group coefficients by the curvatures of the first and second type of polylines. Subsequently, we calculate the feature coefficients by projecting the group coefficients onto a random pattern, and finally generate the binary hash from binarization of the feature coefficients. Based on experimental results using a number of drawings and digital maps, we verified the robustness of the proposed hashing algorithm against various attacks and the uniqueness and security of the random key.
HyungChul KANG Deukjo HONG Dukjae MOON Daesung KWON Jaechul SUNG Seokhie HONG
We present attacks on the generalized Feistel schemes, where each round function consists of a subkey XOR, S-boxes, and then a linear transformation (i.e. a Substitution-Permutation (SP) round function). Our techniques are based on rebound attacks. We assume that the S-boxes have a good differential property and the linear transformation has an optimal branch number. Under this assumption, we firstly describe known-key distinguishers on the type-1, -2, and -3 generalized Feistel schemes up to 21, 13 and 8 rounds, respectively. Then, we use the distinguishers to make several attacks on hash functions where Merkle-Damgård domain extender is used and the compression function is constructed with Matyas-Meyer-Oseas or Miyaguchi-Preneel hash modes from generalized Feistel schemes. Collision attacks are made for 11 rounds of type-1 Feistel scheme. Near collision attacks are made for 13 rounds of type-1 Feistel scheme and 9 rounds of type-2 Feistel scheme. Half collision attacks are made for 15 rounds of type-1 Feistel scheme, 9 rounds of type-2 Feistel scheme, and 5 rounds of type-3 Feistel scheme.
In this paper, we present a new biometric verification system. The proposed system employs a novel biometric hashing scheme that uses our proposed quantization method. The proposed quantization method is based on error-correcting output codes which are used for classification problems in the literature. We improve the performance of the random projection based biometric hashing scheme proposed by Ngo et al. in the literature [5]. We evaluate the performance of the novel biometric hashing scheme with two use case scenarios including the case where an attacker steals the secret key of a legitimate user. Simulation results demonstrate the superior performance of the proposed scheme.
Face image hashing is an emerging method used in biometric verification systems. In this paper, we propose a novel face image hashing method based on a new technique called discriminative projection selection. We apply the Fisher criterion for selecting the rows of a random projection matrix in a user-dependent fashion. Moreover, another contribution of this paper is to employ a bimodal Gaussian mixture model at the quantization step. Our simulation results on three different databases demonstrate that the proposed method has superior performance in comparison to previously proposed random projection based methods.
Gibran FUENTES PINEDA Hisashi KOGA Toshinori WATANABE
We present a scalable approach to automatically discovering particular objects (as opposed to object categories) from a set of images. The basic idea is to search for local image features that consistently appear in the same images under the assumption that such co-occurring features underlie the same object. We first represent each image in the set as a set of visual words (vector quantized local image features) and construct an inverted file to memorize the set of images in which each visual word appears. Then, our object discovery method proceeds by searching the inverted file and extracting visual word sets whose elements tend to appear in the same images; such visual word sets are called co-occurring word sets. Because of unstable and polysemous visual words, a co-occurring word set typically represents only a part of an object. We observe that co-occurring word sets associated with the same object often share many visual words with one another. Hence, to obtain the object models, we further cluster highly overlapping co-occurring word sets in an agglomerative manner. Remarkably, we accelerate both extraction and clustering of co-occurring word sets by Min-Hashing. We show that the models generated by our method can effectively discriminate particular objects. We demonstrate our method on the Oxford buildings dataset. In a quantitative evaluation using a set of ground truth landmarks, our method achieved higher scores than the state-of-the-art methods.