Yuzhi SHI Takayoshi YAMASHITA Tsubasa HIRAKAWA Hironobu FUJIYOSHI Mitsuru NAKAZAWA Yeongnam CHAE Björn STENGER
Action spotting is a key component in high-level video understanding. The large number of similar frames poses a challenge for recognizing actions in videos. In this paper we use frame saliency to represent the importance of frames, guiding the model to focus on keyframes. We propose a frame saliency weighting module that improves frame saliency and the video representation at the same time. Our proposed model contains two encoders, for the pre-action and post-action time windows, to encode video context. We validate our design choices and the generality of the proposed method in extensive experiments. On the public SoccerNet-v2 dataset, the method achieves an average mAP of 57.3%, improving over the state of the art. Using embedding features obtained from multiple feature extractors, the average mAP further increases to 75%. We show that reducing the model size by over 90% does not significantly impact performance. Additionally, ablation studies demonstrate the effectiveness of the saliency weighting module. Further, we show that our frame saliency weighting strategy is applicable to existing methods on more general action datasets, such as SoccerNet-v1, ActivityNet v1.3, and UCF101.
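The abstract does not give implementation details of the frame saliency weighting module; the sketch below is only a minimal illustration of the general idea of weighting per-frame embeddings by learned saliency scores before they are passed to a temporal encoder. The function name, the linear saliency head, and all parameter shapes are assumptions, not the authors' architecture.

```python
import numpy as np

def frame_saliency_weighting(frame_feats, w, b=0.0):
    """Weight per-frame embeddings by saliency scores (illustrative sketch only).

    frame_feats : (T, D) array of frame embeddings for one clip
    w, b        : parameters of a hypothetical linear saliency head, shape (D,) and scalar
    Returns the saliency-weighted features and the per-frame saliency scores.
    """
    # Score each frame, then normalize across time with a softmax so that
    # a few keyframes dominate the weighting.
    logits = frame_feats @ w + b                      # (T,)
    logits = logits - logits.max()                    # numerical stability
    saliency = np.exp(logits) / np.exp(logits).sum()  # (T,)

    # Re-weight the frame embeddings; downstream encoders (e.g. the pre-action
    # and post-action encoders described in the abstract) would consume these.
    weighted = frame_feats * saliency[:, None]        # (T, D)
    return weighted, saliency

# Toy usage: 16 frames with 512-dimensional embeddings.
rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 512)).astype(np.float32)
w = rng.standard_normal(512).astype(np.float32)
weighted, saliency = frame_saliency_weighting(feats, w)
print(weighted.shape, saliency.sum())  # (16, 512) 1.0
```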
An Ngoc VAN Mitsuru NAKAZAWA Yoshimitsu AOKI
In recent years, images captured by the AVHRR (Advanced Very High Resolution Radiometer) sensor on the NOAA (National Oceanic and Atmospheric Administration) series of satellites have been widely used for environment and land cover monitoring. To use NOAA images, they must be accurately transformed from the image coordinate system into the map coordinate system. This paper proposes a geometric correction method that corrects the errors caused by this transformation. In this method, the errors in a NOAA image are corrected in the image coordinate system before the transformation into the map coordinate system. First, the elevation values, read from the GTOPO30 database, are examined to divide the data into flat and rough blocks. Next, to increase the number of GCPs (Ground Control Points), additional GCPs are generated from coastline features, besides the GCPs in the database. After reference images are used to correct the missing lines and noise pixels in the top and bottom parts of the image, the elevation errors of the GCP templates are corrected and GCP template matching is applied to find the residual errors of the blocks that match GCP templates. Based on these blocks, the residual errors of the remaining flat and rough blocks are calculated by affine and Radial Basis Function (RBF) transforms, respectively. According to the residual errors, all pixels in the image are moved to their correct positions. Finally, the data are transformed from the image coordinate system into the map coordinate system by bilinear interpolation. With the proposed method, the average error after correction is smaller than 0.2 pixels in both the latitude and longitude directions. This result demonstrates that the proposed method achieves highly accurate geometric correction.
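The abstract states that residual errors of rough blocks are interpolated from the template-matched blocks with a Radial Basis Function transform, but does not specify the kernel or solver. The sketch below is a minimal, assumed illustration of Gaussian-RBF interpolation of per-block residual vectors; the function name, the Gaussian kernel, and the width parameter eps are hypothetical choices, not the paper's exact formulation.

```python
import numpy as np

def rbf_interpolate_residuals(known_xy, known_err, query_xy, eps=0.05):
    """Interpolate residual correction vectors at unmatched blocks (illustrative sketch).

    known_xy  : (N, 2) image coordinates of blocks matched by GCP templates
    known_err : (N, 2) measured residual errors (dx, dy) at those blocks
    query_xy  : (M, 2) coordinates of blocks whose residuals are unknown
    eps       : width of the Gaussian basis function (an assumed choice)
    """
    # Pairwise distances between the matched block centres.
    d = np.linalg.norm(known_xy[:, None, :] - known_xy[None, :, :], axis=-1)
    phi = np.exp(-(eps * d) ** 2)              # Gaussian RBF kernel matrix (N, N)

    # Solve for RBF weights so the interpolant reproduces the known residuals.
    weights = np.linalg.solve(phi, known_err)  # (N, 2)

    # Evaluate the interpolant at the query blocks.
    dq = np.linalg.norm(query_xy[:, None, :] - known_xy[None, :, :], axis=-1)
    return np.exp(-(eps * dq) ** 2) @ weights  # (M, 2) interpolated (dx, dy)

# Toy usage: three matched blocks, one unmatched block.
known_xy = np.array([[10.0, 10.0], [50.0, 12.0], [30.0, 40.0]])
known_err = np.array([[0.3, -0.1], [0.1, 0.2], [-0.2, 0.1]])
print(rbf_interpolate_residuals(known_xy, known_err, np.array([[25.0, 20.0]])))
```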