Sendren Sheng-Dong XU Albertus Andrie CHRISTIAN Chien-Peng HO Shun-Long WENG
During the COVID-19 pandemic, robust systems for masked face recognition have been in demand. Most existing solutions use many samples per identity for the model to recognize, but collecting them is very laborious in real-life scenarios. Therefore, we propose “CPNet” as a suitable and reliable way of recognizing masked faces from only a few samples per identity. The prototype classifier uses a few-shot learning paradigm to perform the recognition process. To handle complex and occluded facial features, we incorporate the covariance structure of the classes to refine the class-distance calculation. We also use sharpness-aware minimization (SAM) to improve the classifier. Extensive in-depth experiments on a variety of datasets show that our method achieves remarkable results, with accuracy as high as 95.3%, which is 3.4% higher than that of the baseline prototype network used for comparison.
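The abstract does not give the exact CPNet formulation, but a covariance-refined class distance in a few-shot prototype classifier is commonly a Mahalanobis-style distance to class prototypes. A minimal sketch under that assumption (the function names, the shrinkage term eps, and the use of negative squared distance as logits are all assumptions):

```python
# Hypothetical sketch: prototype classification with a covariance-refined
# (Mahalanobis-style) class distance, in the spirit of the abstract above.
import torch

def class_prototypes_and_precisions(support, labels, n_classes, eps=1e-3):
    """support: (N, D) embeddings; labels: (N,) class ids."""
    protos, precisions = [], []
    for c in range(n_classes):
        feats = support[labels == c]                    # (n_c, D)
        mu = feats.mean(dim=0)
        centered = feats - mu
        # Shrinkage keeps the covariance invertible with only a few shots.
        cov = centered.t() @ centered / max(len(feats) - 1, 1)
        cov = cov + eps * torch.eye(cov.size(0))
        protos.append(mu)
        precisions.append(torch.linalg.inv(cov))
    return torch.stack(protos), torch.stack(precisions)

def mahalanobis_logits(query, protos, precisions):
    """Negative Mahalanobis distance to each prototype as class logits."""
    diff = query.unsqueeze(1) - protos.unsqueeze(0)     # (Q, C, D)
    d2 = torch.einsum('qcd,cde,qce->qc', diff, precisions, diff)
    return -d2
```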
Lin CAO Xibao HUO Yanan GUO Kangning DU
Sketch face recognition refers to matching photos with sketches and has been used effectively in applications ranging from law enforcement to digital entertainment. However, due to the large modality gap between photos and sketches, sketch face recognition remains a challenging task. To reduce the domain gap between sketches and photos, this paper proposes a cascaded transformation generation network for cross-modality image generation and sketch face recognition simultaneously. The proposed network is composed of a generation module, a cascaded feature transformation module, and a classifier module. The generation module aims to generate a high-quality cross-modality image; the cascaded feature transformation module extracts high-level semantic features for generation and recognition simultaneously; and the classifier module completes the sketch face recognition. The proposed network is trained in an end-to-end manner, and the generated images strengthen the recognition accuracy. The recognition performance is verified on the UoM-SGFSv2, e-PRIP, and CUFSF datasets; experimental results show that the proposed method outperforms other state-of-the-art methods.
Yujian FENG Fei WU Yimu JI Xiao-Yuan JING Jian YU
Sketch face recognition matches sketch face images to photo face images. Its main challenge is learning discriminative feature representations that ensure intra-class compactness and inter-class separability. Traditional sketch face recognition methods encouraged samples with the same identity to move closer and samples with different identities to move farther apart, but they did not consider the intra-class compactness of samples. In this paper, we propose the triplet-margin-center loss to cope with this problem by combining the triplet loss and the center loss. The triplet-margin-center loss can simultaneously enlarge the distance between inter-class samples and reduce intra-class sample variations, thereby improving intra-class compactness. Moreover, the triplet-margin-center loss applies a hard triplet sample selection strategy, which effectively selects hard samples to avoid an unstable training phase and slow convergence. With our approach, photo and sketch samples from the same identity lie closer in the projected space, while photo and sketch samples from different identities lie farther apart. In extensive experiments and comparisons with state-of-the-art methods, our approach achieves marked improvements in most cases.
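As a rough illustration of combining a triplet term with a center term, the following hedged sketch uses batch-hard mining; the exact margin, weighting alpha, and mining rule of the triplet-margin-center loss are assumptions, not the authors' implementation:

```python
# Hedged sketch: a loss mixing a batch-hard triplet term (inter-class
# separability) with a center term (intra-class compactness).
import torch
import torch.nn.functional as F

def triplet_center_loss(emb, labels, centers, margin=0.3, alpha=0.5):
    """emb: (N, D) embeddings; labels: (N,); centers: (C, D) learnable."""
    # Center term pulls each sample toward its class center.
    center_term = ((emb - centers[labels]) ** 2).sum(dim=1).mean()
    # Batch-hard triplet term: hardest positive/negative per anchor.
    dist = torch.cdist(emb, emb)                       # (N, N) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos = (dist * same.float()).max(dim=1).values      # hardest positive
    neg = dist.masked_fill(same, float('inf')).min(dim=1).values
    triplet_term = F.relu(pos - neg + margin).mean()
    return triplet_term + alpha * center_term
```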
In this paper, we present a joint multi-patch and multi-task convolutional neural networks (JMM-CNNs) framework to learn a more descriptive and robust face representation for face recognition. In the proposed JMM-CNNs, a set of multi-patch CNNs and a feature fusion network are constructed to learn and fuse global and local facial features; a multi-task learning algorithm, comprising a face recognition task and a pose estimation task, is then applied to the fused feature to obtain a pose-invariant face representation for the face recognition task. To further enhance the pose insensitivity of the learned face representation, we also introduce a similarity regularization term on the features of the two tasks as a regularization loss. Moreover, a simple but effective patch sampling strategy is applied to give the JMM-CNNs an end-to-end network architecture. Experiments on the Multi-PIE dataset demonstrate the effectiveness of the proposed method, and we achieve competitive performance compared with state-of-the-art methods on Labeled Faces in the Wild (LFW), YouTube Faces (YTF), and the MegaFace Challenge.
Linjun SUN Weijun LI Xin NING Liping ZHANG Xiaoli DONG Wei HE
This letter proposes a gradient-enhanced softmax supervisor for face recognition (FR) based on a deep convolutional neural network (DCNN). The proposed supervisor uses the constant-normalized cosine to obtain the score for each class, taking a combination of the intra-class score and the soft maximum of the inter-class scores as the objective function. This mitigates the vanishing gradient problem in the conventional softmax classifier. Experiments on the public Labeled Faces in the Wild (LFW) database show that the proposed supervisor achieves better results than the current state-of-the-art softmax-based approaches for FR.
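One plausible reading of "soft maximum of the inter-class scores" is a log-sum-exp over the non-target cosine scores. A minimal sketch under that assumption (the scale s and the exact way the two terms are combined are assumptions):

```python
# Hedged sketch: cosine class scores with an intra-class term and a
# soft maximum (logsumexp) over inter-class scores as the objective.
import torch
import torch.nn.functional as F

def cosine_softmax_loss(feats, weights, labels, s=30.0):
    """feats: (N, D); weights: (C, D) class weights; labels: (N,)."""
    cos = F.normalize(feats) @ F.normalize(weights).t()      # (N, C) cosines
    target = cos.gather(1, labels.unsqueeze(1)).squeeze(1)   # intra-class score
    inter = cos.masked_fill(
        F.one_hot(labels, cos.size(1)).bool(), float('-inf'))
    soft_max_inter = torch.logsumexp(s * inter, dim=1) / s   # soft maximum
    return (soft_max_inter - target).mean()
```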
In recent years, deep learning based approaches have substantially improved the performance of face recognition. Most existing deep learning techniques work well but neglect effective utilization of face correlation information. The resulting performance loss is noteworthy for personal appearance variations caused by factors such as illumination, pose, occlusion, and misalignment. We believe that face correlation information should be introduced to solve this performance problem originating from intra-personal variations. Recently, graph deep learning approaches have emerged for representing structured graph data, and a graph is a powerful tool for representing the complex information of a face image. In this paper, we survey recent research related to the graph structure of convolutional neural networks and devise a definition of graph structure that draws on compressed sensing and deep learning. The paper explains two properties of our graph: sparsity and depth. Sparsity is advantageous since sparse features are more likely to be linearly separable and more robust. Depth means that the graph supports a multi-resolution, multi-channel learning process. We argue that a sparse-graph-based deep neural network can more effectively make similar objects attract each other and different objects repel each other, akin to a better sparse multi-resolution clustering. Based on this concept, we propose a sparse graph representation based on face correlation information, embedded via sparse reconstruction and deep learning within an irregular domain. The resulting classification is remarkably robust. The proposed method achieves high recognition rates of 99.61% (94.67%) on the benchmark LFW (YTF) facial evaluation database.
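The paper's exact graph construction is not reproducible from the abstract, but sparse-reconstruction graphs are commonly built by coding each sample over the remaining samples and using the sparse coefficients as edge weights. A speculative sketch along those lines (OMP as the sparse solver, the leave-one-out dictionary, and symmetrization are all assumptions):

```python
# Speculative sketch: a sparse affinity graph from sparse reconstruction
# codes, one common way to embed correlation information between samples.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def sparse_graph(X, n_nonzero=5):
    """X: (N, D) samples; returns a symmetric sparse affinity (N, N)."""
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        dictionary = np.delete(X, i, axis=0).T        # leave-one-out dictionary
        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero)
        omp.fit(dictionary, X[i])
        coef = np.insert(omp.coef_, i, 0.0)           # re-insert self position
        W[i] = np.abs(coef)
    return 0.5 * (W + W.T)                            # symmetrize
```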
Chunxiao FAN Xiaopeng HONG Lei TIAN Yue MING Matti PIETIKÄINEN Guoying ZHAO
PCANet, a notable shallow network, employs the histogram representation for feature pooling. However, there are three main problems with this kind of pooling method. First, the histogram-based pooling method binarizes the feature maps, which leads to an inevitable loss of discriminative information. Second, it is difficult to effectively combine other visual cues into a compact representation, because the simple concatenation of various visual cues makes the feature representation inefficient. Third, the dimensionality of the histogram-based output grows exponentially with the number of feature maps used. To overcome these problems, we propose a novel shallow network model, named PCANet-II, which combines second-order statistical pooling with the shallow PCANet. Compared with the histogram-based output, second-order pooling not only provides more discriminative information by preserving both the magnitude and the sign of the convolutional responses, but also dramatically reduces the size of the output features. Moreover, second-order pooling makes it easy to combine other discriminative and robust cues, so we introduce a binary feature difference encoding scheme into PCANet-II to further improve robustness. Experiments demonstrate the effectiveness and robustness of the proposed PCANet-II method.
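Second-order pooling generally means replacing histograms with the covariance of the convolutional responses, which keeps both magnitude and sign and grows only quadratically with the number of maps. A minimal sketch of that idea (the centering and upper-triangle vectorization details are assumptions):

```python
# Minimal sketch: second-order (covariance) pooling over feature maps.
import numpy as np

def second_order_pooling(feature_maps):
    """feature_maps: (C, H, W) responses -> compact (C*(C+1)//2,) vector."""
    C = feature_maps.shape[0]
    X = feature_maps.reshape(C, -1)                   # (C, H*W)
    X = X - X.mean(axis=1, keepdims=True)
    cov = X @ X.T / X.shape[1]                        # (C, C) second-order stats
    iu = np.triu_indices(C)                           # keep upper triangle only
    return cov[iu]
```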
Ruisheng RAN Bin FANG Xuegang WU
Neighborhood preserving embedding (NPE) is a widely used manifold-based dimensionality reduction technique. However, NPE encounters two problems: it suffers from the small-sample-size (SSS) problem, and its performance is seriously sensitive to the neighborhood size k. To overcome these two problems, an exponential neighborhood preserving embedding (ENPE) is proposed in this paper. The main idea of ENPE is to introduce the matrix exponential into NPE, which avoids the SSS problem and yields low sensitivity to the neighborhood size k. Experiments are conducted on the ORL, Georgia Tech, and AR face databases. The results show that ENPE performs advantageously compared with other unsupervised methods, such as PCA, LPP, ELPP, and NPE, and that ENPE is much less sensitive to the neighborhood parameter k than the unsupervised manifold learning methods LPP, ELPP, and NPE.
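The matrix-exponential trick replaces the possibly singular matrices of the generalized eigenproblem with their exponentials, which are always full-rank. A hedged sketch of that idea (the NPE weight construction below is simplified to a connectivity graph and is an assumption; NPE's true weights come from local least-squares reconstruction):

```python
# Hedged sketch: an exponential variant of an NPE-style eigenproblem.
import numpy as np
from scipy.linalg import expm, eigh
from sklearn.neighbors import kneighbors_graph

def enpe(X, k=5, dim=10):
    """X: (N, D) samples; returns a (D, dim) projection matrix."""
    W = kneighbors_graph(X, k, mode='connectivity').toarray()
    M = (np.eye(len(X)) - W).T @ (np.eye(len(X)) - W)  # NPE-style penalty
    A = X.T @ M @ X                                    # (D, D)
    B = X.T @ X                                        # (D, D), may be singular
    # Matrix exponentials are positive definite, avoiding the SSS problem.
    vals, vecs = eigh(expm(A), expm(B))
    return vecs[:, :dim]                               # smallest eigenvalues
```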
Ruisheng RAN Bin FANG Xuegang WU Shougui ZHANG
Exponential discriminant analysis (EDA) is an effective method that has been proposed and widely used to solve the so-called small-sample-size (SSS) problem. In this paper, a simple and effective generalization of EDA, named GEDA, is presented. GEDA uses a general exponential function whose base is larger than Euler's number. Owing to this property of the general exponential function, the distance between samples belonging to different classes becomes larger than in EDA, so the discrimination property is further emphasized. Experimental results on the Extended Yale and CMU-PIE face databases show that GEDA achieves better recognition performance than EDA.
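Since a^M = exp(M ln a) for a symmetric matrix M, the generalization amounts to scaling the scatter matrices before matrix exponentiation. A minimal sketch under that reading (the base value and the discriminant step, which mirrors standard EDA, are assumptions):

```python
# Hedged sketch: EDA with a general exponential base a > e.
import numpy as np
from scipy.linalg import expm, eigh

def general_matrix_exp(M, base):
    return expm(np.log(base) * M)        # a^M for a symmetric matrix M

def geda(Sb, Sw, base=3.0, dim=10):
    """Sb, Sw: (D, D) between-/within-class scatter; returns (D, dim)."""
    # Exponentiation makes Sw full-rank, sidestepping the SSS problem;
    # a base larger than e stretches between-class distances further.
    vals, vecs = eigh(general_matrix_exp(Sb, base),
                      general_matrix_exp(Sw, base))
    return vecs[:, ::-1][:, :dim]        # largest generalized eigenvalues
```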
Jun WANG Guoqing WANG Leida LI
A quantized index for evaluating the pattern similarity of two different datasets is designed by counting the number of correlated dictionary atoms. Guided by this index, a task-specific biometric recognition model transferred from state-of-the-art DNN models is realized for both face and vein recognition.
Wenming YANG Riqiang GAO Qingmin LIAO
This paper presents a strategy, Weighted Voting of Discriminative Regions (WVDR), to improve face recognition performance, especially in Small Sample Size (SSS) and occlusion situations. In WVDR, we extract discriminative regions according to facial key points and discard the remaining parts. Considering that different regions of the face make different contributions to recognition, we assign weights to regions for weighted voting. We construct a decision dictionary from the recognition results of the selected regions in the training phase, and this dictionary is used in a self-defined loss function to obtain the weights. The final identity of a test sample is decided by the weighted voting of the selected regions. In this paper, we combine the WVDR strategy with CRC and SRC separately, and extensive experiments show that our method outperforms the baseline and some representative algorithms.
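The voting step itself is straightforward once per-region weights are available. A minimal sketch (how WVDR actually learns the weights via its decision dictionary and loss function is not reproduced here; the weights are treated as given):

```python
# Minimal sketch: weighted voting over per-region classification results.
import numpy as np

def weighted_region_vote(region_scores, region_weights):
    """region_scores: (R, C) per-region class scores; region_weights: (R,)."""
    votes = np.zeros(region_scores.shape[1])
    for scores, w in zip(region_scores, region_weights):
        votes[np.argmax(scores)] += w     # each region casts a weighted vote
    return int(np.argmax(votes))
```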
Xiaoguang TU Feng YANG Mei XIE Zheng MA
Numerous methods have been developed to handle lighting variations in the preprocessing step of face recognition. However, most of them use only high-frequency information (edges, lines, corners, etc.) for recognition, since pixels lying in these areas have higher local variance values and are thus insensitive to illumination variations. In this case, low-frequency information may be discarded, and some features that are helpful for recognition may be ignored. In this paper, we present a new and efficient method for illumination normalization using an energy minimization framework. The proposed method aims to remove the illumination field of the observed face images while simultaneously preserving the intrinsic facial features. The normalized face image and the illumination field are obtained by a reciprocal iteration scheme. Experiments on the CMU-PIE and Extended Yale B databases show that the proposed method preserves very good visual quality even on images with deep shadows and high-brightness regions, and obtains promising illumination normalization results for better face recognition performance.
Yong CHENG Zuoyong LI Yuanchen HAN
Building on the classic Lambertian reflectance model, we propose in this paper an effective illumination estimation model to extract illumination invariants for face recognition under complex illumination conditions. The illumination estimated by our method not only matches the actual lighting conditions of facial images but also conforms to the imaging principle. Experimental results on the combined Yale B database show that the proposed method extracts more robust illumination invariants, which improves the face recognition rate.
Gee-Sern HSU Hsiao-Chia PENG Ding-Yu LIN Chyi-Yeu LIN
Face recognition across pose is generally tackled by either 2D-based or 3D-based approaches. The 2D-based approaches often require a training set from which the cross-pose multi-view relationship can be learned and applied for recognition. The 3D-based approaches mostly consist of 3D surface reconstruction of each gallery face, synthesis of 2D images of novel views using the reconstructed model, and matching of the synthesized images to the probes. The depth information provides crucial cues for arbitrary poses, but more methods are yet to be developed. Extending a recent face reconstruction method that uses a single 3D reference model and a frontal registered face, this study focuses on using the reconstructed 3D face for recognition. The recognition performance varies with pose: the closer to frontal, the better. Several ways to improve the performance are attempted, including different numbers of fiducial points for alignment, multiple reference models considered in the reconstruction phase, and both frontal and profile poses available in the gallery. These attempts make this approach competitive with state-of-the-art methods.
Face recognition under variable illumination conditions is a challenging task, and numerous approaches have been developed to solve the illumination problem. In this paper, we summarize and analyze some noteworthy issues in illumination processing for face recognition by reviewing various representative approaches. These issues include a principle that associates the various approaches with a commonly used reflectance model, shared considerations such as the contribution of basic processing methods, the processing domain, the feature scale, and a common problem. We also address a more essential question: what to actually normalize. Through the discussion of these issues, we provide suggestions on potential directions for future research. In addition, we conduct evaluation experiments on 1) the contribution of fundamental illumination correction to illumination-insensitive face recognition and 2) the comparative performance of various approaches. Experimental results show that approaches with fundamental illumination correction methods are more insensitive to extreme illumination than those without them. Tan and Triggs' method (TT) using the L1 norm achieves the best results among the nine tested approaches.
Cheng ZHANG Yuzhang GU Zhengmin ZHANG Yunlong ZHAN
In this paper, we propose a face representation approach using the multi-orientation Log-Gabor local binary pattern (MOLGLBP) for face recognition under facial expressions, illuminations, and partial occlusions. Log-Gabor filters with different scales (frequencies) and orientations are applied to the Y, I, and Q channel images in the YIQ color space. Log-Gabor images of different orientations at the same scale are then combined to form a multi-orientation Log-Gabor image (MOLGI), to which two LBP operators are applied. For face recognition, the histogram intersection metric is utilized to measure the similarity of faces. The proposed approach is evaluated on the CurtinFaces database, and experiments demonstrate that it is effective against two simultaneous variations: expression & illumination, and illumination & occlusion.
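The histogram intersection metric itself is compact enough to sketch; the feature extraction (Log-Gabor filtering and LBP encoding) is omitted, and normalizing by the second histogram's mass is an assumption:

```python
# Minimal sketch: histogram intersection similarity between two
# (nonnegative) feature histograms of equal length.
import numpy as np

def histogram_intersection(h1, h2):
    """Returns the overlap of h1 and h2, normalized by the mass of h2."""
    return np.minimum(h1, h2).sum() / h2.sum()
```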
Xue CHEN Chunheng WANG Baihua XIAO Song GAO
This paper proposes to obtain high-level, domain-robust representations for cross-view face recognition. Specifically, we introduce Convolutional Deep Belief Networks (CDBN) as the feature learning model, and a CDBN-based interpolating path between the source and target views is built to model the correlation of cross-view data. The promising results outperform those of other state-of-the-art methods.
Xue CHEN Chunheng WANG Baihua XIAO Yunxue SHAO
In Still-to-Video (S2V) face recognition, only a few high-resolution images are registered for each subject, while the probes are video clips with complex variations. As faces present distinct characteristics under different scenarios, recognition in the original space is obviously inefficient. Thus, in this paper, we propose a novel discriminant analysis method to learn separate mappings for the different scenario patterns (still, video) and further pursue a common discriminant space based on these mappings. Concretely, by modeling each video as a manifold and each image as point data, we formulate the scenario-oriented mapping learning as a Point-Manifold Discriminant Analysis (PMDA) framework. The learning objective incorporates intra-class compactness and inter-class separability for good discrimination. Experiments on the COX-S2V dataset demonstrate the effectiveness of the proposed method.
Raissa RELATOR Yoshihiro HIROHASHI Eisuke ITO Tsuyoshi KATO
Classification tasks in computer vision and brain-computer interface research have enabled applications such as biometrics and cognitive training. However, as in any other discipline, determining a suitable representation of the data has been challenging, and recent approaches have deviated from the familiar form of one vector per data sample. This paper considers a kernel between vector sets, the mean polynomial kernel, motivated by recent studies in which data are approximated by linear subspaces, in particular methods formulated on Grassmann manifolds. This kernel takes a more general approach, since it can also support input data that can be modeled as a vector sequence, without requiring it to be a linear subspace. We discuss how the kernel can be associated with the Projection kernel, a Grassmann kernel. Experimental results using face image sequences and physiological signal data show that the mean polynomial kernel surpasses existing subspace-based methods on Grassmann manifolds in terms of predictive performance and efficiency.
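A kernel between vector sets of this kind is typically the average of a polynomial kernel over all cross pairs. A minimal sketch under that reading (the degree and the absence of an additive constant are assumptions):

```python
# Hedged sketch: a mean polynomial kernel between two vector sets,
# averaging the polynomial kernel (x·y)^d over all cross pairs.
import numpy as np

def mean_polynomial_kernel(X, Y, degree=2):
    """X: (m, D), Y: (n, D) vector sets; returns a scalar similarity."""
    return np.mean((X @ Y.T) ** degree)
```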
Lei LEI Dae-Hwan KIM Won-Jae PARK Sung-Jea KO
In this paper, we propose a simple and efficient face representation feature that adopts the eigenfaces of the Local Binary Pattern (LBP) space, referred to as LBP eigenfaces, for robust face recognition. In the proposed method, LBP eigenfaces are generated by first mapping the original image space to the LBP space and then projecting the LBP space to the LBP eigenface subspace by Principal Component Analysis (PCA). Therefore, LBP eigenfaces capture both the local and global structures of face images. In the experiments, the proposed LBP eigenfaces are integrated into two types of classification methods, Nearest Neighbor (NN) and Collaborative Representation-based Classification (CRC). Experimental results indicate that classification with the LBP eigenfaces outperforms that with the original eigenfaces and the LBP histogram.
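The pipeline described, LBP mapping followed by PCA, can be sketched directly; the LBP radius and number of sampling points below are assumed parameters, not the paper's settings:

```python
# Minimal sketch: LBP eigenfaces, i.e. PCA learned over LBP-space images.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.decomposition import PCA

def lbp_eigenfaces(images, n_components=50, P=8, R=1):
    """images: list of (H, W) grayscale arrays -> (PCA model, projections)."""
    lbp_maps = [local_binary_pattern(img, P, R).ravel() for img in images]
    X = np.stack(lbp_maps)                      # (N, H*W) LBP-space vectors
    pca = PCA(n_components=n_components)
    Z = pca.fit_transform(X)                    # LBP eigenface coefficients
    return pca, Z
```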