The search functionality is under construction.

Keyword Search Result

[Keyword] MPO(945hit)

21-40hit(945hit)

  • Spherical Style Deformation on Single Component Models

    Xuemei FENG  Qing FANG  Kouichi KONNO  Zhiyi ZHANG  Katsutsugu MATSUYAMA  

     
    PAPER-Computer Graphics

      Publicized:
    2023/08/22
      Vol:
    E106-D No:11
      Page(s):
    1891-1905

    In this study, we present a spherical style deformation algorithm for single component models that deforms a model toward a spherical style while preserving the local details of the original model. Because 3D models have complex skeleton structures that consist of many components, deformation around the connections between components is complicated, especially when mesh self-intersections must be prevented. To the best of our knowledge, no existing method achieves a spherical style for a 3D model consisting of multiple components, nor is there one suited to a single component. We therefore focus on spherical style deformation of single component models and propose a deformation method that transforms the input model into the spherical style while preserving its local details. Specifically, we define an energy function that combines the as-rigid-as-possible (ARAP) method and spherical features. The spherical term is defined as l2-regularization on a linear feature; accordingly, the corresponding optimization can be solved efficiently. We also observed that the results of our deformation depend on the quality of the input mesh. For instance, when the input mesh consists of many obtuse triangles, the spherical style deformation method fails. To address this problem, we propose an optional, complementary deformation method based on a convex hull proxy model. The proxy method constructs a proxy model of the input model and applies our deformation method to the proxy; the input model is then deformed by projection and interpolation. We have applied the proposed method to simple and complex shapes, compared our experimental results with the 3D geometric stylization method of normal-driven spherical shape analogies, and confirmed that our method successfully deforms models that are smooth, round, and curved. We also discuss the limitations and problems of our algorithm based on the experimental results.

  • Prior Information Based Decomposition and Reconstruction Learning for Micro-Expression Recognition

    Jinsheng WEI  Haoyu CHEN  Guanming LU  Jingjie YAN  Yue XIE  Guoying ZHAO  

     
    LETTER-Image Processing and Video Processing

      Publicized:
    2023/07/13
      Vol:
    E106-D No:10
      Page(s):
    1752-1756

    Micro-expression recognition (MER) draws intensive research interest because micro-expressions (MEs) can reveal genuine emotions. Prior information can guide a model to learn discriminative ME features effectively. However, most works focus on general models with a stronger representation ability that adaptively aggregate ME movement information in a holistic way, which may ignore the prior information and properties of MEs. To solve this issue, and driven by the prior information that the category of an ME can be inferred from the relationship between the actions of different facial components, this work designs a novel model that conforms to this prior information and learns ME movement features in an interpretable way. Specifically, this paper proposes a Decomposition and Reconstruction-based Graph Representation Learning (DeRe-GRL) model to effectively learn high-level ME features. DeRe-GRL includes two modules: an Action Decomposition Module (ADM) and a Relation Reconstruction Module (RRM), where ADM learns action features of facial key components and RRM explores the relationship between these action features. Based on facial key components, ADM divides the geometric movement features extracted by the graph model-based backbone into several sub-features and learns a map matrix that maps these sub-features into multiple action features; RRM then learns weights over all action features to build the relationship between them. The experimental results demonstrate the effectiveness of the proposed modules, and the proposed method achieves competitive performance.

  • Time-Series Prediction Based on Double Pyramid Bidirectional Feature Fusion Mechanism

    Na WANG  Xianglian ZHAO  

     
    PAPER-Digital Signal Processing

      Publicized:
    2022/12/20
      Vol:
    E106-A No:6
      Page(s):
    886-895

    Time-series prediction is applied very extensively and is an important problem across many fields, such as stock prediction, sales prediction, and loan prediction, where it is of great value in production and daily life. It requires that the model effectively capture the long-term feature dependence between the output and the input. Recent studies show that Transformer can improve time-series prediction ability. However, Transformer has some problems that prevent it from being applied directly to time-series prediction: (1) Local agnosticism: Self-attention in Transformer is not sensitive to short-term feature dependence, which leads to model anomalies in time-series; (2) Memory bottleneck: The space complexity of the regular Transformer grows quadratically with the sequence length, making direct modeling of long time-series infeasible. To solve these problems, this paper designs an efficient model for long time-series prediction. It is a double pyramid bidirectional feature fusion mechanism network with a parallel Temporal Convolution Network (TCN) and FastFormer. This network structure combines the fine-grained time-series information captured by the Temporal Convolution Network with the global interactive information captured by FastFormer, so it can handle the time-series prediction problem well.

  • Biofuel Cell Fueled by Decomposing Cellulose Nanofiber to Glucose by Using Cellulase Enzyme

    Ryutaro TANAKA  Satomitsu IMAI  

     
    BRIEF PAPER

      Publicized:
    2022/11/28
      Vol:
    E106-C No:6
      Page(s):
    262-265

    Conventional enzymatic biofuel cells (EBFCs) use a glucose solution or glucose from the human body. It is desirable to obtain glucose from a substance containing glucose because the glucose concentration can then be kept at the optimum level. This work developed a biofuel cell that generates electricity from cellulose, which is the main component of plants, by using the decomposing enzyme cellulase. Cellulose nanofiber (CNF) was chosen for its ease of decomposition. It was confirmed by the cyclic voltammetry method that cellulase was effective against CNF. The maximum output of the optimized proposed method was 38.7 μW/cm², which was 85% of the output obtained with the glucose solution at the optimized concentration.

  • FSPose: A Heterogeneous Framework with Fast and Slow Networks for Human Pose Estimation in Videos

    Jianfeng XU  Satoshi KOMORITA  Kei KAWAMURA  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2023/03/20
      Vol:
    E106-D No:6
      Page(s):
    1165-1174

    We propose a framework for the integration of heterogeneous networks in human pose estimation (HPE) with the aim of balancing accuracy and computational complexity. Although many existing methods can improve the accuracy of HPE using multiple frames in videos, they also increase the computational complexity. The key difference here is that the proposed heterogeneous framework has various networks for different types of frames, while existing methods use the same networks for all frames. In particular, we propose to divide the video frames into two types, including key frames and non-key frames, and adopt three networks including slow networks, fast networks, and transfer networks in our heterogeneous framework. For key frames, a slow network is used that has high accuracy but high computational complexity. For non-key frames that follow a key frame, we propose to warp the heatmap of a slow network from a key frame via a transfer network and fuse it with a fast network that has low accuracy but low computational complexity. Furthermore, when extending to the usage of long-term frames where a large number of non-key frames follow a key frame, the temporal correlation decreases. Therefore, when necessary, we use an additional transfer network that warps the heatmap from a neighboring non-key frame. The experimental results on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed FSPose achieves a better balance between accuracy and computational complexity than the competitor method. Our source code is available at https://github.com/Fenax79/fspose.

  • A QR Decomposition Algorithm with Partial Greedy Permutation for Zero-Forcing Block Diagonalization

    Shigenori KINJO  Takayuki GAMOH  Masaaki YAMANAKA  

     
    PAPER-Communication Theory and Signals

      Publicized:
    2022/10/18
      Vol:
    E106-A No:4
      Page(s):
    665-673

    This paper proposes a new zero-forcing block diagonalization (ZF-BD) scheme that both simplifies ZF-BD and further increases the sum rate of MU-MIMO channels. The proposed scheme also improves BER performance for equivalent SU-MIMO channels. The proposed scheme consists of two components. First, a permuted channel matrix (PCM), which is given by moving the submatrix related to a target user to the bottom of a downlink MIMO channel matrix, is newly defined to obtain a precoding matrix for ZF-BD. Executing QR decomposition alone for a given PCM provides the null space for the target user. Second, a partial MSQRD (PMSQRD) algorithm, which adopts MSQRD only for a target user to improve the bit rate and BER performance for that user, is proposed. Numerical simulations are performed, and the results show improvement in the sum rate performance of the total system. In addition, appropriate bit allocation improves the bit error rate (BER) performance in each equivalent SU-MIMO channel. Successive interference cancellation is applied to achieve further improvement in the BER performance of user terminals.
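
The role of the permutation described above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes real-valued channels, hypothetical channel values, and uses Gram-Schmidt on the rows of the permuted matrix as a stand-in for the QR step. Because the target user's rows come last, the last orthonormal vectors are orthogonal to every other user's channel, so they can serve as a zero-forcing precoder for the target user.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalize the vectors in the given order (the Q factor of a QR step)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(q, w)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:
            basis.append([wi / norm for wi in w])
    return basis

# Hypothetical real-valued downlink channels: 4 transmit antennas,
# two users with 2 receive antennas each (values are illustrative only).
H_other = [[1.0, 2.0, 0.0, -1.0], [0.5, -1.0, 1.0, 2.0]]
H_target = [[2.0, 0.0, 1.0, 1.0], [-1.0, 1.0, 3.0, 0.0]]

# Permuted channel matrix: the target user's submatrix is moved to the bottom.
pcm = H_other + H_target

# Orthonormalizing the rows in order means the vectors produced by the
# target's rows are orthogonal to the span of the other user's rows.
q = gram_schmidt(pcm)
null_basis = q[len(H_other):]

# Interference leaked to the other user and gain kept for the target user.
leakage = max(abs(dot(h, p)) for h in H_other for p in null_basis)
target_gain = max(abs(dot(h, p)) for h in H_target for p in null_basis)
print(leakage, target_gain)
```

A single orthogonalization pass is enough here because only the ordering of the rows matters; no separate null-space computation per user is carried out in this sketch.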

  • An Interpretation Method on Amplitude Intensities for Response Waveforms of Backward Transient Scattered Field Components by a 2-D Coated Metal Cylinder

    Keiji GOTO  Toru KAWANO  

     
    PAPER

      Publicized:
    2022/09/29
      Vol:
    E106-C No:4
      Page(s):
    118-126

    In this paper, we propose an interpretation method on amplitude intensities for response waveforms of backward transient scattered field components for both E- and H-polarizations by a 2-D coated metal cylinder. A time-domain (TD) asymptotic solution, which is referred to as a TD Fourier transform method (TD-FTM), is derived by applying the FTM to a backward transient scattered field expressed in an integral form. The TD-FTM is represented by a combination of a direct geometric optical ray (DGO) and a reflected GO (RGO) series. We use the TD-FTM to derive amplitude intensity ratios (AIRs) between adjacent backward transient scattered field components. By comparing the numerical values of the AIRs with those of the influence factors that compose the AIRs, the major factor(s) can be identified, thereby allowing a detailed interpretation of the amplitude intensities for the response waveforms of backward transient scattered field components. The accuracy and practicality of the TD-FTM are evaluated by comparing it with three reference solutions. The effectiveness of the interpretation method is demonstrated by identifying the major factor(s) affecting the amplitude intensities.

  • Group Sparse Reduced Rank Tensor Regression for Micro-Expression Recognition

    Sunan LI  Yuan ZONG  Cheng LU  Chuangan TANG  Yan ZHAO  

     
    LETTER-Human-computer Interaction

      Publicized:
    2023/01/05
      Vol:
    E106-D No:4
      Page(s):
    575-578

    To overcome the challenge in micro-expression recognition that micro-expressions only emerge in several small facial regions with low intensity, some researchers have proposed facial region partition mechanisms and introduced group sparse learning methods for feature selection. However, such methods have some shortcomings, including the complexity of region division and insufficient utilization of critical facial regions. To address these problems, we propose a novel Group Sparse Reduced Rank Tensor Regression (GSRRTR) that transforms the feature matrix into a tensor by laying blocks and features in different dimensions. We can thus process grids and texture features separately and avoid interference between grids and features. Furthermore, with the use of Tucker decomposition, the feature tensor can be decomposed into a product of a core tensor and a set of matrices, so that the number of parameters and the computational complexity of the scheme decrease. To evaluate the performance of the proposed micro-expression recognition method, extensive experiments are conducted on two micro-expression databases: CASME2 and SMIC. The experimental results show that the proposed method achieves a comparable recognition rate with fewer parameters than state-of-the-art methods.
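
The parameter saving attributed to Tucker decomposition can be made concrete with a small count. The sketch below is only arithmetic: the tensor shape and the Tucker ranks are hypothetical values chosen for illustration, not figures from the paper. A full tensor of shape (I, J, K) stores I·J·K entries, while a Tucker form with ranks (r1, r2, r3) stores a core of r1·r2·r3 entries plus one factor matrix per mode.

```python
from math import prod

def full_params(shape):
    """Number of entries in a dense tensor of the given shape."""
    return prod(shape)

def tucker_params(shape, ranks):
    """Entries in a Tucker form: core tensor plus one factor matrix per mode."""
    core = prod(ranks)
    factors = sum(n * r for n, r in zip(shape, ranks))
    return core + factors

# Hypothetical sizes: 30 blocks x 50 features x 4 channels, ranks (5, 5, 2).
shape = (30, 50, 4)
ranks = (5, 5, 2)

dense = full_params(shape)              # 30 * 50 * 4
compact = tucker_params(shape, ranks)   # 5*5*2 + 30*5 + 50*5 + 4*2
print(dense, compact)
```

With these illustrative sizes the dense tensor has 6000 entries and the Tucker form has 458, which is the kind of reduction the abstract refers to.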

  • Security Evaluation of Initialization Phases and Round Functions of Rocca and AEGIS

    Nobuyuki TAKEUCHI  Kosei SAKAMOTO  Takanori ISOBE  

     
    PAPER

      Publicized:
    2022/11/09
      Vol:
    E106-A No:3
      Page(s):
    253-262

    Authenticated-Encryption with Associated-Data (AEAD) plays an important role in guaranteeing confidentiality, integrity, and authenticity in network communications. To meet the requirements of high-performance applications, several AEADs make use of AES New Instructions (AES-NI), which perform AES encryption and decryption operations dramatically faster through hardware acceleration. At SAC 2013, Wu and Preneel proposed an AES-based AEAD scheme called AEGIS-128/128L/256 to achieve high-speed software implementation. At FSE 2016, Jean and Nikolić generalized the construction of AEGIS and proposed more efficient round functions. At ToSC 2021, Sakamoto et al. further improved the constructions of Jean and Nikolić and proposed an AEAD scheme called Rocca for beyond 5G. In this study, we first evaluate the security of the initialization phases of Rocca and the AEGIS family against differential and integral attacks using MILP (Mixed Integer Linear Programming) tools. Specifically, according to the evaluation based on the lower bounds for the number of active S-boxes, the initialization phases of AEGIS-128/128L/256 are secure against differential attacks after 4/3/6 rounds, respectively. Regarding integral attacks, we present integral distinguishers on 6 rounds and 6/5/7 rounds in the initialization phases of Rocca and AEGIS-128/128L/256, respectively. In addition, we evaluate the round function of Rocca and those of Jean and Nikolić as cryptographic permutations against differential, impossible differential, and integral attacks. Our results indicate that, for differential attacks, the number of active S-boxes grows faster in Rocca than in the round functions of Jean and Nikolić. For impossible differential and integral attacks, we show that the round function of Rocca achieves a sufficient level of security against these attacks in a smaller number of rounds than those of Jean and Nikolić.

  • Accurate Phase Angle Measurement of Backscatter Signal under Noisy Environment

    Tomoya IWASAKI  Osamu TOKUMASU  Jin MITSUGI  

     
    PAPER

      Publicized:
    2022/09/15
      Vol:
    E106-A No:3
      Page(s):
    464-470

    Backscatter communication is an emerging wireless access technology that realizes ultra-low power terminals by exploiting the modulated reflection of an incident radio wave. This paper proposes a method to measure the phase angle of a backscatter link using principal component analysis (PCA). Measuring the phase angle of the backscatter link at the receiver is essential to maximize the signal quality for subsequent demodulation and to measure the distance and the angle of arrival. Popular phase angle measurement methods based on naive phase averaging or linear regression analysis have the drawback of producing erroneous phase angles when the phase angle is close to $\pm\frac{\pi}{2}$ radians and the signal quality is poor. The advantage of the proposal is quantified with a computer simulation, a conducted experiment, and radio propagation experiments.
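
The failure mode near ±π/2 can be reproduced with a small simulation. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes antipodal (±1) backscatter symbols, additive Gaussian noise with a hypothetical standard deviation of 0.1, and a true phase of 1.5 rad. The PCA estimate is taken as the orientation of the principal axis of the 2x2 I/Q covariance matrix, computed in closed form; the naive estimate folds each sample phase into (-π/2, π/2] and averages.

```python
import math
import random

def fold(phi):
    """Fold an angle into (-pi/2, pi/2]; antipodal symbols leave a pi ambiguity."""
    if phi > math.pi / 2:
        return phi - math.pi
    if phi <= -math.pi / 2:
        return phi + math.pi
    return phi

def naive_phase(samples):
    """Average of per-sample phases after folding (breaks near +/- pi/2)."""
    return sum(fold(math.atan2(q, i)) for i, q in samples) / len(samples)

def pca_phase(samples):
    """Orientation of the principal axis of the I/Q covariance matrix."""
    n = len(samples)
    mi = sum(i for i, _ in samples) / n
    mq = sum(q for _, q in samples) / n
    sii = sum((i - mi) ** 2 for i, _ in samples) / n
    sqq = sum((q - mq) ** 2 for _, q in samples) / n
    siq = sum((i - mi) * (q - mq) for i, q in samples) / n
    return 0.5 * math.atan2(2 * siq, sii - sqq)

def angle_error(estimate, truth):
    """Absolute error modulo pi."""
    return abs(fold(estimate - truth))

# Simulated link: hypothetical phase, noise level, and sample count.
rng = random.Random(0)
true_phase = 1.5
samples = []
for _ in range(2000):
    s = rng.choice((-1.0, 1.0))
    samples.append((s * math.cos(true_phase) + rng.gauss(0.0, 0.1),
                    s * math.sin(true_phase) + rng.gauss(0.0, 0.1)))

pca_est = pca_phase(samples)
naive_est = naive_phase(samples)
print(angle_error(pca_est, true_phase), angle_error(naive_est, true_phase))
```

The naive average is pulled toward zero because noisy samples on either side of the ±π/2 boundary fold to opposite signs, whereas the principal axis depends only on second-order statistics and has no such discontinuity.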

  • Ensemble-Based Method for Correcting Global Explanation of Prediction Model

    Masaki HAMAMOTO  Hiroyuki NAMBA  Masashi EGI  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2022/11/15
      Vol:
    E106-D No:2
      Page(s):
    218-228

    Explainable artificial intelligence (AI) technology enables us to quantitatively analyze the whole prediction logic of AI as a global explanation. However, unwanted relationships learned by AI due to data sparsity, high dimensionality, and noise are also visualized in the explanation, which deteriorates confidence in the AI. Thus, methods for correcting those unwanted relationships in explanations have been developed. However, since these methods are applicable only to differentiable machine learning (ML) models and not to non-differentiable models such as tree-based models, they are insufficient for covering a wide range of ML technology. Since these methods also require re-training of the model to correct its explanation (i.e., they are in-processing methods), they cannot be applied to black-box models provided by third parties. Therefore, we propose a method called ensemble-based explanation correction (EBEC) as a post-processing method for correcting the global explanation of a prediction model in a model-agnostic manner by using the Rashomon effect of statistics. We evaluated the performance of EBEC with three different tasks and analyzed its function in more detail. The evaluation results indicate that EBEC can correct the global explanation of the model so that the explanation aligns with the domain knowledge given by the user while maintaining its accuracy. EBEC can be extended in various ways and combined with any method to improve correction performance since it is a post-processing-type correction method. Hence, EBEC would contribute to high-productivity ML modeling as a new type of explanation-correction method.

  • Multi-Input Physical Layer Network Coding in Two-Dimensional Wireless Multihop Networks

    Hideaki TSUGITA  Satoshi DENNO  Yafei HOU  

     
    PAPER-Wireless Communication Technologies

      Publicized:
    2022/08/10
      Vol:
    E106-B No:2
      Page(s):
    193-202

    This paper proposes multi-input physical layer network coding (multi-input PLNC) for high speed wireless communication in two-dimensional wireless multihop networks. In the proposed PLNC, all the terminals send their packets simultaneously for the neighboring relays to maximize the network throughput in the first slot, and all the relays also do the same to the neighboring terminals in the second slot. Those simultaneous signal transmissions cause multiple signals to be received at the relays and the terminals. Signal reception in the multi-input PLNC uses multichannel filtering to mitigate the difficulties caused by the multiple signal reception, which enables the two-input PLNC to be applied. In addition, a non-linear precoding is proposed to reduce the computational complexity of the signal detection at the relays and the terminals. The proposed multi-input PLNC makes all the terminals exchange their packets with the neighboring terminals in only two time slots. The performance of the proposed multi-input PLNC is confirmed by computer simulation. The proposed multi-input physical layer network coding achieves much higher network throughput than conventional techniques in a two-dimensional multihop wireless network with 7 terminals. The proposed multi-input physical layer network coding attains superior transmission performance in wireless hexagonal multihop networks, as long as more than 6 antennas are placed on the terminals and the relays.

  • Spatial-Temporal Aggregated Shuffle Attention for Video Instance Segmentation of Traffic Scene

    Chongren ZHAO  Yinhui ZHANG  Zifen HE  Yunnan DENG  Ying HUANG  Guangchen CHEN  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2022/11/24
      Vol:
    E106-D No:2
      Page(s):
    240-251

    To address the dispersed and dislocated distribution of spatial focus regions in feature pyramid networks and the insufficient acquisition of feature dependencies in both spatial and channel dimensions, this paper proposes a spatial-temporal aggregated shuffle attention for video instance segmentation (STASA-VIS). First, a mixed subsampling (MS) module is designed to embed activating features from the low-level target area of the feature pyramid into the high level, so as to aggregate spatial information on the target area. Taking advantage of the coherent information in video frames, STASA-VIS uses the first of every 5 video frames as the keyframe, propagates the keyframe feature maps of the pyramid layers forward in the time domain, and fuses them with the non-keyframe mixed subsampled features to achieve time-domain consistent feature aggregation. Finally, STASA-VIS embeds shuffle attention in the backbone to capture the pixel-level pairwise relationship and dimensional dependencies among the channels and to reduce the computation. Experimental results show that the segmentation accuracy of STASA-VIS reaches 41.2% and the test speed reaches 34 FPS, which is better in accuracy than the state-of-the-art one-stage video instance segmentation (VIS) methods and achieves real-time segmentation.

  • Orthogonal Deep Feature Decomposition Network for Cross-Resolution Person Re-Identification

    Rui SUN  Zi YANG  Lei ZHANG  Yiheng YU  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2022/08/23
      Vol:
    E105-D No:11
      Page(s):
    1994-1997

    Person images captured by surveillance cameras in real scenes often have low resolution (LR), which causes severe degradation in recognition performance when they are matched with pre-stocked high-resolution (HR) images. Existing methods typically employ super-resolution (SR) techniques to address the resolution discrepancy problem in person re-identification (re-ID). However, SR techniques are intended to enhance the visual fidelity of images for the human eye without caring about the recovery of pedestrian identity information. To cope with this challenge, we propose an orthogonal deep feature decomposition network. We decompose pedestrian features into resolution-related features and identity-related features that are orthogonal to each other, from which we design the identity-preserving loss and resolution-invariant loss to ensure the recovery of pedestrian identity information. Experiments on the MLR-CUHK03 and MLR-VIPeR datasets demonstrate the superiority of our method over the SOTA method.

  • Finite-Horizon Optimal Spatio-Temporal Pattern Control under Spatio-Temporal Logic Specifications

    Takuma KINUGAWA  Toshimitsu USHIO  

     
    PAPER

      Publicized:
    2022/04/08
      Vol:
    E105-D No:10
      Page(s):
    1658-1664

    In spatially distributed systems such as smart buildings and intelligent transportation systems, control of spatio-temporal patterns is an important issue. In this paper, we consider a finite-horizon optimal spatio-temporal pattern control problem where the pattern is specified by a signal spatio-temporal logic formula over finite traces, which will be called an SSTLf formula. We give the syntax and Boolean semantics of SSTLf. Then, we show linear encodings of the temporal and spatial operators used in SSTLf and we convert the problem into a mixed integer programming problem. We illustrate the effectiveness of this proposed approach through an example of a heat system in a room.

  • Low-Complexity Hybrid Precoding Based on PAST for Millimeter Wave Massive MIMO System Open Access

    Rui JIANG  Xiao ZHOU  You Yun XU  Li ZHANG  

     
    PAPER-Fundamental Theories for Communications

      Publicized:
    2022/04/21
      Vol:
    E105-B No:10
      Page(s):
    1192-1201

    Millimeter wave (mmWave) massive Multiple-Input Multiple-Output (MIMO) systems generally adopt hybrid precoding, which combines digital and analog precoders, as an alternative to full digital precoding to reduce RF chains and energy consumption. To balance spectral efficiency, energy efficiency, and hardware complexity, the hybrid-connected system structure should be adopted, and the solution process of hybrid precoding can then be simplified by decomposing the total achievable rate into several sub-rates. However, the singular value decomposition (SVD) incurs high complexity in calculating the optimal unconstrained hybrid precoder for each sub-rate. Therefore, this paper proposes PAST, a low-complexity hybrid precoding algorithm based on projection approximate subspace tracking. The optimal unconstrained hybrid precoder of each sub-rate is estimated with the PAST algorithm, which avoids the high-complexity process of calculating the left and right singular vectors and the singular value matrix by SVD. Simulations demonstrate that PAST matches the spectral efficiency of SVD-based hybrid precoding in fully-connected (FC), hybrid-connected (HC), and sub-connected (SC) system structures. Moreover, the superiority of PAST over SVD-based hybrid precoding in terms of complexity increases with the number of transmitting antennas.

  • Class-E Power Amplifier with Improved PAE Bandwidth Using Double CRLH TL Stub for Harmonic Tuning Open Access

    Shinichi TANAKA  Hirotaka ASAMI  Takahiro SUZUKI  

     
    INVITED PAPER

      Publicized:
    2022/04/11
      Vol:
    E105-C No:10
      Page(s):
    441-448

    This paper presents a class-E power amplifier (PA) with a novel harmonic tuning circuit (HTC) based on composite right-/left-handed transmission lines (CRLH TLs). One of the issues of conventional harmonically tuned PAs is the limited PAE bandwidth. It is shown by simulation that class-E amplifiers have the potential to maintain high PAE over a wider frequency range than, for example, class-F amplifiers. To make full use of these superior characteristics of class-E amplifiers, an HTC using a double CRLH TL stub structure is proposed. The HTC is not only compact but also enhances the inherently wide operation frequency range of the class-E amplifier. A 2-GHz 6-W GaN-HEMT class-E PA using the proposed HTC demonstrated a PAE bandwidth (≥65%) of 380 MHz with a maximum drain efficiency and PAE of 78.5% and 74.0%, respectively.

  • Multibeam Patterns Suitable for Massive MIMO Configurations

    Kentaro NISHIMORI  Jiro HIROKAWA  

     
    PAPER

      Publicized:
    2022/07/13
      Vol:
    E105-B No:10
      Page(s):
    1162-1172

    A multibeam massive multiple input multiple output (MIMO) configuration employs beam selection with high power in the analog part and executes a blind algorithm such as the independent component analysis (ICA), which does not require channel state information in the digital part. Two-dimensional (2-D) multibeams are considered in actual power losses and beam steering errors regarding the multibeam patterns. However, the performance of these 2-D beams depends on the beam pattern of the multibeams, and they are not optimal multibeam patterns suitable for multibeam massive MIMO configurations. In this study, we clarify the performance difference due to the difference of the multibeam pattern and consider the multibeam pattern suitable for the system condition. Specifically, the optimal multibeam pattern was determined with the element spacing and beamwidth of the element directivity as parameters, and the effectiveness of the proposed method was verified via computer simulations.

  • Interpretation Method of Inversion Phenomena on Backward Transient Scattered Field Components by a Coated Metal Cylinder

    Toru KAWANO  Keiji GOTO  

     
    PAPER-Electromagnetic Theory

      Publicized:
    2022/02/24
      Vol:
    E105-C No:9
      Page(s):
    389-397

    An interpretation method of inversion phenomena is newly proposed for backward transient scattered field components for both E- and H-polarizations when an ultra-wideband (UWB) pulse wave radiated from a line source is incident on a two-dimensional metal cylinder covered with a lossless dielectric medium layer (coated metal cylinder). A time-domain (TD) asymptotic solution, which is referred to as a TD saddle point technique (TD-SPT), is derived by applying the SPT in evaluating a backward transient scattered field which is expressed by an integral form. The TD-SPT is represented by a combination of a direct geometric optical ray (DGO) and a reflected GO (RGO) series, thereby being able to extract and calculate any backward transient scattered field component from a response waveform. The TD-SPT is useful in understanding the response waveform of a backward transient scattered field by a coated metal cylinder because it can give us the peak value and arrival time of any field component, namely DGO and RGO components, and interpret analytically inversion phenomenon of any field component. The accuracy, validity, and practicality of the TD-SPT are clarified by comparing it with two kinds of reference solutions.

  • Spatial-Temporal Regularized Correlation Filter with Precise State Estimation for Visual Tracking

    Zhaoqian TANG  Kaoru ARAKAWA  

     
    PAPER-Digital Signal Processing

      Publicized:
    2021/12/15
      Vol:
    E105-A No:6
      Page(s):
    914-922

    Recently, the performance of discriminative correlation filter (CF) trackers in visual tracking has been improving steadily. In this paper, we propose spatial-temporal regularization with precise state estimation based on the discriminative correlation filter (STPSE) in order to achieve better tracking performance. First, we consider the continuous change of the object state, using the information from the previous two filters for training the correlation filter model. Here, we train the correlation filter model with hand-crafted features. Second, we introduce update control in which the average peak-to-correlation energy (APCE) and the distance between the object locations obtained by HOG features and hand-crafted features are utilized to detect abnormality of the state around the object. APCE and the distance indicate the reliability of the filter response; thus, if an abnormality is detected, the proposed method does not update the scale and the object location estimated by the filter response. In the experiments, our tracker (STPSE) achieves significant and real-time performance with only a CPU on the challenging benchmark sequences (OTB2013, OTB2015, and TC128).
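
As a rough illustration of why APCE can flag an unreliable filter response, the sketch below computes it for two small response maps. The abstract does not give the formula; the code assumes the definition commonly used in the correlation filter tracking literature, APCE = |F_max − F_min|² / mean((F − F_min)²), and the response values are made up for illustration.

```python
def apce(response):
    """Average peak-to-correlation energy of a 2-D response map (list of rows)."""
    values = [v for row in response for v in row]
    f_max, f_min = max(values), min(values)
    mean_energy = sum((v - f_min) ** 2 for v in values) / len(values)
    if mean_energy == 0:
        return 0.0  # flat map: no peak to speak of
    return (f_max - f_min) ** 2 / mean_energy

# A single sharp peak, as expected when the target is clearly visible.
sharp = [[0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0]]

# Several competing peaks, as may happen under occlusion or distraction.
cluttered = [[0.5, 0.0, 0.6],
             [0.0, 1.0, 0.0],
             [0.7, 0.0, 0.4]]

print(apce(sharp), apce(cluttered))
```

A tracker using update control of the kind described could skip the model update when the score drops well below its recent average; the threshold for that decision is a design choice not specified in the abstract.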
