Accurate visual correspondence is the foundation of many computer-vision-based applications. Since existing image matching algorithms inevitably generate mismatches, a reliable mismatch-removal algorithm is highly desired to remove mismatches and preserve true matches. This paper proposes a hierarchical progressive trust (HPT) model to solve this problem. The HPT model first adopts a “trust the most trustworthy ones” strategy to select anchor inliers in its bottom layer, and then progressively propagates the trust from the bottom layer to the other layers in a bottom-up way: 1) the bottom layer verifies anchor inliers with the guidance of local features; 2) the middle layers progressively estimate local transformations and perform local verifications; 3) the top layer estimates a global transformation with an anchor-inliers-guided expectation maximization (EM) algorithm and performs global verifications. Experimental results show that the proposed HPT model achieves higher performance than state-of-the-art mismatch-removal methods under both rigid transformations and non-rigid deformations.
Quanxin MA Xiaolin DU Jianbo LI Yang JING Yuqing CHANG
This letter studies the estimation of the structured clutter covariance matrix (CCM) in space-time adaptive processing (STAP) for airborne radar systems. By employing prior knowledge and the persymmetric covariance structure, a new estimation algorithm is proposed based on the whitening ability of the covariance matrix. The proposed algorithm is robust to prior knowledge of varying accuracy, and can whiten the observed interference data to obtain the optimal solution. In addition, the extended factored approach (EFA) is used in the optimization for dimensionality reduction, which reduces the computational burden. Simulation results show that the proposed algorithm can effectively improve STAP performance even when the prior knowledge contains some errors.
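The persymmetric structure and the whitening step mentioned above can be illustrated with a minimal NumPy sketch. It is not the letter's algorithm: it uses the standard forward-backward average of a sample covariance and a Cholesky-based whitener, and it omits the prior knowledge and the EFA step entirely. The function names and the random test data are assumptions made for illustration.

```python
import numpy as np

def persymmetric_estimate(snapshots):
    """Forward-backward averaged sample covariance.

    snapshots: complex array of shape (N, K), one snapshot per column.
    The result R satisfies R = J @ conj(R) @ J, where J is the exchange matrix.
    """
    n, k = snapshots.shape
    sample_cov = snapshots @ snapshots.conj().T / k
    exchange = np.fliplr(np.eye(n))  # J: ones on the anti-diagonal
    return 0.5 * (sample_cov + exchange @ sample_cov.conj() @ exchange)

def whiten(cov, data):
    """Apply the inverse Cholesky factor of cov to data (L^{-1} @ data)."""
    chol = np.linalg.cholesky(cov)
    return np.linalg.solve(chol, data)

# Hypothetical data: 4 degrees of freedom, 64 training snapshots.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)) + 1j * rng.standard_normal((4, 64))
r_hat = persymmetric_estimate(x)
j = np.fliplr(np.eye(4))
```

Whitening the estimate with its own Cholesky factor on both sides yields the identity matrix, which is the sense in which a good covariance estimate "whitens" the interference.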
Xina CHENG Ziken LI Songlin DU Takeshi IKENAGA
The spike height of volleyball players is important in volleyball analysis as a quantitative criterion for evaluating players' motions. It not only provides rich information to audiences in live broadcasts of sports events but also helps to evaluate and improve the performance of players in strategy analysis and player training. In a volleyball game scene, the high similarity between hands, deformation, and occlusion are the three main problems that affect the accuracy of spike height acquisition. To solve these problems, this paper proposes an observation model based on body part connection, categorization, and occlusion, together with a temporal-position-based correction method. Firstly, the skin-pixel-filter-based connection detection solves the problem of high similarity between hands by judging whether a hand is connected to the spiking player. Secondly, the body-part-categorization-based observation uses the probability distribution map of the hand to determine the category of each body part, which solves the deformation problem. Thirdly, the occlusion-part-detection-based observation eliminates the influence of views with occluded body parts by detecting the occluded views with a trained body part classifier. Finally, the temporal-position-based result correction combines the estimated result, which refers to the historical positions, with the posterior result to obtain an optimal result according to the degree of confidence. The experiments are based on videos of the final and semi-final games of the 2014 Japan Inter High School Men's Volleyball in Tokyo Metropolitan Gymnasium, which include 196 spike sequences of 4 teams. The experimental results show that the spike height is successfully detected in 93.37% of the test sequences, with an average spike height error of 5.96 cm.
Songlin DU Yuhao XU Tingting HU Takeshi IKENAGA
High frame rate and ultra-low delay matching systems play an important role in various human-machine interactive applications, which demand better performance in matching deformable and out-of-plane rotating objects. Although many algorithms have been proposed for deformation tracking and matching, few of them are suitable for hardware implementation because of their complicated operations and large time consumption. This paper proposes a hardware-oriented template update and recovery method for a high frame rate and ultra-low delay deformation matching system. In the proposed method, the new template is generated in real time by partially updating the template descriptor and adding new keypoints simultaneously with the pixel-wise matching process (proposal #1), which avoids a large inter-frame delay. The size and shape of the region of interest (ROI) are made flexible, and the Hamming threshold used for brute-force matching is adjusted according to the pixel position and the flexible ROI (proposal #2), which solves the problem of template drift. When the template is judged as lost by a region-wise difference check, it is recovered from the previous one with a relative center-shifting vector (proposal #3). Evaluation results indicate that the proposed method achieves real-time processing at 784 fps at a resolution of 640×480 on a field-programmable gate array (FPGA), with a delay of 0.808 ms/frame, and achieves satisfactory deformation matching results in comparison with other general methods.
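As a rough software illustration of brute-force matching with a Hamming threshold, the following NumPy sketch matches binary descriptors and rejects matches whose distance exceeds a threshold. It is an assumption-laden simplification: the abstract adjusts the threshold according to pixel position and the flexible ROI, whereas this sketch uses a single fixed threshold, and the function name and the toy one-byte descriptors are hypothetical.

```python
import numpy as np

def hamming_brute_force(template_desc, frame_desc, threshold):
    """Match each template descriptor to its nearest frame descriptor.

    Both inputs are uint8 arrays of shape (count, bytes_per_descriptor).
    Returns (template_index, frame_index, distance) for every nearest
    neighbour whose Hamming distance is at most `threshold`.
    """
    # XOR every template descriptor with every frame descriptor, then count bits.
    xor = template_desc[:, None, :] ^ frame_desc[None, :, :]
    dist = np.unpackbits(xor, axis=2).sum(axis=2)
    best = dist.argmin(axis=1)
    best_dist = dist[np.arange(len(template_desc)), best]
    return [
        (i, int(j), int(d))
        for i, (j, d) in enumerate(zip(best, best_dist))
        if d <= threshold
    ]

# Toy one-byte descriptors (hypothetical values).
template = np.array([[0b00001111], [0b11110000]], dtype=np.uint8)
frame = np.array([[0b00001110], [0b01010101]], dtype=np.uint8)
matches = hamming_brute_force(template, frame, threshold=2)
print(matches)  # [(0, 0, 1)]
```

The second template descriptor is rejected because its nearest neighbour is 4 bits away, which exceeds the threshold of 2.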
Lin DU Chang TIAN Mingyong ZENG Jiabao WANG Shanshan JIAO Qing SHEN Wei BAI Aihong LU
Part-based models have proved beneficial for person re-identification (Re-ID) in recent years. Existing models usually use fixed horizontal stripes or rely on human keypoints to obtain each part, which is not consistent with the human visual mechanism. In this paper, we propose a Self-Channel Attention Weighted Part model (SCAWP) for Re-ID. In SCAWP, we first learn a feature map from ResNet50 and use a 1×1 convolution to reduce the dimension of this feature map, which aggregates the channel information. Then, we learn an attention weight map within each channel and multiply it with the feature map to obtain each part. Finally, each part is used for a separate identification task to build the whole model. To verify the performance of SCAWP, we conduct experiments on three benchmark datasets: CUHK03-NP, Market-1501 and DukeMTMC-ReID. SCAWP achieves rank-1/mAP accuracy of 70.4%/68.3%, 94.6%/86.4% and 87.6%/76.8% on the three datasets, respectively.
Lin DU Chang TIAN Mingyong ZENG Jiabao WANG Shanshan JIAO Qing SHEN Guodong WU
Feature learning based on deep networks has been verified as beneficial for person re-identification (Re-ID) in recent years. However, most studies use a single network as the baseline, without considering the fusion of different deep features. By analyzing the attention maps of different networks, we find that the information learned by different networks can complement each other. Therefore, a novel Dual Network Fusion (DNF) framework is proposed. DNF is designed with a trunk branch and two auxiliary branches. In the trunk branch, deep features are concatenated directly along the channel direction. One auxiliary branch is a channel attention branch, which allocates weights to the different deep features. The other is a multi-loss training branch. To verify the performance of DNF, we test it on three benchmark datasets: CUHK03NP, Market-1501 and DukeMTMC-reID. The results show that DNF performs significantly better than a single network and is comparable to most state-of-the-art methods.
Songlin DU Yuan LI Takeshi IKENAGA
High frame rate and ultra-low delay are the most essential requirements for building excellent human-machine-interaction systems. As a state-of-the-art local keypoint detection and feature extraction algorithm, A-KAZE shows high accuracy and robustness. The nonlinear scale space is one of the most important modules in A-KAZE, but it not only incurs a delay of at least one frame but also is not hardware friendly. This paper proposes a hardware-oriented nonlinear scale space for a high frame rate and ultra-low delay A-KAZE matching system. In the proposed matching system, one part of the nonlinear scale space is moved forward in time and calculated in the previous frame (proposal #1), so that the processing delay is reduced to less than 1 ms. To recover the matching accuracy affected by proposal #1, pre-adjustment of the nonlinear scale space (proposal #2) is proposed: the previous two frames are used for motion estimation to predict the motion vector between the previous frame and the current frame. To further improve the matching accuracy, pixel-level pre-adjustment (proposal #3) is proposed: the pre-adjustment changes from block level to pixel level, and each pixel is assigned a unique motion vector. Experimental results show that the proposed matching system achieves an average matching accuracy higher than 95%, which is 5.88% higher than that of the existing high frame rate and ultra-low delay matching system. As for hardware performance, the proposed matching system processes VGA videos (640×480 pixels/frame) at 784 frames per second (fps) with a delay of 0.978 ms/frame.
Songlin DU Zhe WANG Takeshi IKENAGA
High frame rate and ultra-low delay matching systems play an increasingly important role in human-machine interaction, because they guarantee high-quality experiences for users. Existing image matching algorithms always generate mismatches, which heavily weaken the performance of human-machine-interactive systems. Although many mismatch removal algorithms have been proposed, few of them achieve real-time speed with high frame rate and low delay, because of complicated arithmetic operations and iterations. This paper proposes a high frame rate and ultra-low delay mismatch removal system based on temporal constraints and block weighting judgement. The proposed method first uses two temporal constraints (proposal #1 and proposal #2) to find some true matches, and then uses these true matches to generate block weights (proposal #3). Proposal #1 finds some correct matches by checking a triangle route formed by three adjacent frames. Proposal #2 further reduces the mismatch risk by adding one more round of matching in the opposite direction. Finally, proposal #3 classifies the unverified matches as correct or incorrect through the weight of each block. Software experiments show that the proposed mismatch removal system achieves state-of-the-art accuracy in mismatch removal. Hardware experiments indicate that the designed image processing core achieves real-time processing of 784 fps VGA (640×480 pixels/frame) video on a field-programmable gate array (FPGA), with a delay of 0.858 ms/frame.
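The two temporal constraints can be sketched in a few lines of Python. This is one plausible reading of the abstract, not the authors' implementation: matches are represented as dictionaries from keypoint index to keypoint index, the triangle route is assumed to compare the direct match between the first and third frames with the route through the middle frame, and all names and toy values are hypothetical. The block weighting of proposal #3 is not covered.

```python
def triangle_verified(m_ab, m_bc, m_ac):
    """Keep a->c matches whose route a->b->c ends at the same keypoint.

    m_ab, m_bc, m_ac map keypoint indices between frames a, b and c,
    which are assumed to be three adjacent frames.
    """
    return {
        a: c
        for a, c in m_ac.items()
        if a in m_ab and m_bc.get(m_ab[a]) == c
    }

def cross_checked(forward, backward):
    """Keep forward matches that the opposite-direction matching confirms."""
    return {q: t for q, t in forward.items() if backward.get(t) == q}

# Toy matches between three adjacent frames (hypothetical indices).
m_ab = {0: 0, 1: 1, 2: 3}
m_bc = {0: 5, 1: 6, 3: 7}
m_ac = {0: 5, 1: 9, 2: 7}
m_ca = {5: 0, 7: 4}  # matching repeated in the opposite direction

verified = triangle_verified(m_ab, m_bc, m_ac)
confirmed = cross_checked(verified, m_ca)
print(verified)   # {0: 5, 2: 7}
print(confirmed)  # {0: 5}
```

Matches that fail either check are not necessarily wrong; in the abstract they remain unverified and are judged later by block weighting.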
Establishing local visual correspondences between images taken under different conditions is an important and challenging task in computer vision. A common solution for this task is to detect keypoints in the images and then match the keypoints with a feature descriptor. This paper proposes a robust and low-dimensional local feature descriptor named Adaptively Integrated Gradient and Intensity Feature (AIGIF). The proposed AIGIF descriptor partitions the support region surrounding each keypoint into sub-regions, and classifies the sub-regions into two categories: edge-dominated ones and smoothness-dominated ones. For edge-dominated sub-regions, gradient magnitude and orientation features are extracted; for smoothness-dominated sub-regions, an intensity feature is extracted. The gradient and intensity features are integrated to generate the descriptor. Experiments on image matching were conducted to evaluate the performance of the proposed AIGIF. Compared with SIFT, AIGIF achieves a 75% reduction in feature dimension (from 128 bytes to 32 bytes); compared with SURF, AIGIF achieves an 87.5% reduction in feature dimension (from 256 bytes to 32 bytes); compared with the state-of-the-art ORB descriptor, which has the same feature dimension as AIGIF, AIGIF achieves higher accuracy and robustness. In summary, AIGIF combines the advantages of gradient and intensity features, and achieves relatively high accuracy and robustness with a low feature dimension.
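The reduction figures quoted above follow directly from the descriptor sizes. The short check below reproduces the arithmetic; the function name is hypothetical and the byte counts are those stated in the abstract.

```python
def reduction_percent(old_bytes, new_bytes):
    """Relative size reduction, in percent, when going from old to new."""
    return 100.0 * (old_bytes - new_bytes) / old_bytes

sift_vs_aigif = reduction_percent(128, 32)
surf_vs_aigif = reduction_percent(256, 32)
print(sift_vs_aigif)  # 75.0
print(surf_vs_aigif)  # 87.5
```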