Lei WANG Shanmin YANG Jianwei ZHANG Song GU
Human action recognition (HAR) exhibits limited accuracy in video surveillance because monocular cameras capture only 2D information. To address this problem, this study proposes a depth estimation-based human skeleton action recognition method (SARDE), which transforms 2D human action data into a 3D format to uncover action clues hidden in the 2D data. SARDE comprises two tasks: human skeleton action recognition and monocular depth estimation. The two tasks are integrated in a multi-task manner and trained end-to-end with shared parameters, exploiting the correlation between action recognition and depth estimation so that depth features are learned effectively for human action recognition. Graph-structured networks with inception blocks and skip connections are investigated for depth estimation. The experimental results verify the effectiveness of the proposed method in skeleton action recognition: it reaches state-of-the-art performance on the evaluated datasets.
Kyohei MURAKATA Koichi KOBAYASHI Yuh YAMASHITA
The multi-agent surveillance problem is to find optimal trajectories of multiple agents that patrol a given area as evenly as possible. In this paper, we consider the multi-agent surveillance problem based on travel cost minimization. The surveillance area is given by an undirected graph. The penalty for each agent is introduced to evaluate the surveillance performance. Through a mixed logical dynamical system model, the multi-agent surveillance problem is reduced to a mixed integer linear programming (MILP) problem. In model predictive control, trajectories of agents are generated by solving the MILP problem at each discrete time. Furthermore, a condition that the MILP problem is always feasible is derived based on the Chinese postman problem. Finally, the proposed method is demonstrated by a numerical example.
Keita TERASHIMA Koichi KOBAYASHI Yuh YAMASHITA
In a multi-agent system, it is important to consider a design method of cooperative actions in order to achieve a common goal. In this paper, we propose two novel multi-agent reinforcement learning methods, where the control specification is described by linear temporal logic formulas, which represent a common goal. First, we propose a simple solution method, which is directly extended from the single-agent case. In this method, there are some technical issues caused by the increase in the number of agents. Next, to overcome these technical issues, we propose a new method in which an aggregator is introduced. Finally, these two methods are compared by numerical simulations, with a surveillance problem as an example.
M.K. JEEVARAJAN P. NIRMAL KUMAR
We present a reconfigurable deep learning pedestrian detection system for surveillance that detects people with shadows under different lighting and heavily occluded conditions. This work proposes a region-based CNN (R-CNN) combined with CMOS and thermal cameras to obtain human features even under poor lighting conditions. The main advantage of a reconfigurable system over processor-based systems is its high performance and parallelism when processing large amounts of data such as video frames. We discuss the details of the hardware implementation of the proposed real-time pedestrian detection algorithm on a Zynq FPGA. Simulation results show that the proposed integrated approach of the R-CNN architecture with cameras provides better performance in terms of accuracy, precision, and F1-score. The Zynq FPGA implementation was compared with other works, which showed that the proposed architecture offers a good trade-off in terms of quality, accuracy, speed, and resource utilization.
Kazuhisa FUJIMOTO Masanori TAKADA
Neuromorphic computing with a spiking neural network (SNN) is expected to provide a complement or alternative to deep learning in the future. The challenge is to develop optimal SNN models, algorithms, and engineering technologies for real use cases. As a potential use case for neuromorphic computing, we have investigated person monitoring and worker support with a video surveillance system, given its status as a proven deep neural network (DNN) use case. In the future, to increase the number of cameras in such a system, we will need a scalable approach that embeds only a few neuromorphic devices in a camera. Specifically, this will require a shallow SNN model that can be implemented in a few neuromorphic devices while providing a high recognition accuracy comparable to a DNN with the same configuration. A shallow SNN was built by converting ResNet, a proven DNN for image recognition, and a new configuration of the shallow SNN model was developed to improve its accuracy. The proposed shallow SNN model was evaluated with a few neuromorphic devices, and it achieved a recognition accuracy of more than 80% with about 1/130 of the energy consumption of a GPU running a DNN with the same configuration as the SNN.
Yue LI Xiaosheng YU Haijun CAO Ming XU
An autoencoder is trained to generate the background from the surveillance image by setting the training label as the shuffled input, instead of the input itself in a traditional autoencoder. Then the multi-scale features are extracted by a sparse autoencoder from the surveillance image and the corresponding background to detect foreground.
Shuoyan LIU Enze YANG Kai FANG
Abnormal behavior detection is now a widely studied research field, especially for crowded scenes. However, most traditional unsupervised approaches suffer when the normal events in a scene exhibit large visual variety. This paper proposes a self-learning probabilistic Latent Semantic Analysis model, which aims to take full advantage of high-level abnormal information to solve this problem. We select informative observations from the training set to construct “reference events” as a high-level guidance cue. Specifically, the training set is randomly divided into two separate subsets. One is used to learn the model, which defines the initial sequence of “reference events”. The other is used to update the model, and its infrequent samples are added to the “reference events”. Finally, we define anomalies as events that are least similar to the “reference events”. The experimental results demonstrate that the proposed model can detect anomalies accurately and robustly in real-world crowd environments.
We propose a video authentication scheme to verify whether a given video file is recorded by a camera device or touched by a video editing tool. The proposed scheme prepares software characteristics of camera devices and video editing tools in advance, and compares them with the metadata of the given video file. Through practical implementation, we show that the proposed scheme has benefits of fast analysis time, high accuracy and full automation.
Guowei TENG Hao LI Zhenglong YANG
This paper proposes a temporal domain difference based secondary background modeling algorithm for surveillance video coding. The proposed algorithm has three key technical contributions. First, the LDBCBR (Long Distance Block Composed Background Reference) algorithm is proposed, which exploits IBBS (interval of background blocks searching) to weaken the temporal correlation of the foreground. Second, both BCBR (Block Composed Background Reference) and LDBCBR are exploited at the same time to generate the temporary background reference frame; the secondary modeling algorithm utilizes the temporary background blocks generated by BCBR and LDBCBR to obtain the final background frame. Third, the background reference frame is monitored after it is generated: background blocks are updated immediately when they change significantly, the modeling period is shortened for areas where the foreground moves frequently, and the stable background is checked regularly. The proposed algorithm is implemented on the IEEE1857 platform, and the experimental results demonstrate a significant improvement in coding efficiency. On the surveillance test sequences recommended by the China AVS (Advanced Audio Video Standard) working group, our method achieves BD-Rate gains of 6.81% and 27.30% compared with BCBR and the baseline profile, respectively.
Ryo MASUDA Koichi KOBAYASHI Yuh YAMASHITA
The surveillance problem is to find optimal trajectories of agents that patrol a given area as evenly as possible. In this paper, we consider multiple agents with fuel constraints. The surveillance area is given by a weighted directed graph, where the weight assigned to each arc corresponds to the fuel consumption/supply. For each node, the penalty to evaluate the unattended time is introduced. Penalties, agents, and fuels are modeled by a mixed logical dynamical system model. Then, the surveillance problem is reduced to a mixed integer linear programming (MILP) problem. Based on the policy of model predictive control, the MILP problem is solved at each discrete time. In this paper, the feasibility condition for the MILP problem is derived. Finally, the proposed method is demonstrated by a numerical example.
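The per-node penalty for unattended time mentioned above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the penalty is a counter that grows by one at each discrete time step and resets to zero when any agent occupies the node; the abstract does not give the exact update rule, and the name `update_penalties` is hypothetical.

```python
def update_penalties(penalties, agent_positions):
    """Advance node penalties by one discrete time step.

    Assumed rule (not taken from the paper): a node's penalty counts the
    steps since it was last visited and resets to 0 when an agent is on it.
    """
    visited = set(agent_positions)
    return {
        node: 0 if node in visited else value + 1
        for node, value in penalties.items()
    }


# Three nodes, one agent that moves from "a" to "b".
penalties = {"a": 0, "b": 0, "c": 0}
penalties = update_penalties(penalties, ["a"])
penalties = update_penalties(penalties, ["b"])
print(penalties)
```

In a model predictive control setting, a counter of this kind would be summed over the prediction horizon to form the cost that the MILP minimizes; that step is omitted here.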
Houari SABIRIN Hitoshi NISHIMURA Sei NAITO
A multi-camera setup for a surveillance system enables a larger coverage area, especially when a single camera has limited monitoring capability due to certain obstacles. Therefore, for large-scale coverage, multiple cameras are the best option. In this paper, we present a method for detecting multiple objects using several cameras with large overlapping views as this allows synchronization of object identification from a number of views. The proposed method uses a graph structure that is robust enough to represent any detected moving objects by defining their vertices and edges to determine their relationships. By evaluating these object features, represented as a set of attributes in a graph, we can perform lightweight multiple object detection using several cameras, as well as performing object tracking within each camera's field of view and between two cameras. By evaluating each vertex hierarchically as a subgraph, we can further observe the features of the detected object and perform automatic separation of occluding objects. Experimental results show that the proposed method would improve the accuracy of object tracking by reducing the occurrences of incorrect identification compared to individual camera-based tracking.
Zhenglong YANG Guozhong WANG GuoWei TENG
Although HEVC rate control can achieve high coding efficiency, it still does not fully utilize the special characteristics of surveillance videos, which typically have a moving foreground and relatively static background. For surveillance videos, it is usually necessary to provide a better coding quality of the moving foreground. In this paper, a foreground-background CTU λ separate decision scheme is proposed. First, low-complexity pixel-based segmentation is presented to obtain the foreground and the background. Second, the rate distortion (RD) characteristics of the foreground and the background are explored. With the rate distortion optimization (RDO) process, the average CTU λ value of the foreground or the background should be equal to the frame λ. Then, a separate optimal CTU λ decision is proposed with a separate λ clipping method. Finally, a separate updating process is used to obtain reasonable parameters for the foreground and the background. The experimental results show that the quality of the foreground is improved by 0.30 dB in the random access configuration and 0.45 dB in the low delay configuration without degradation of either the rate control accuracy or whole frame quality.
Kaimin CHEN Wei LI Zhaohuan ZHAN Binbin LIANG Songchen HAN
Since camera networks for surveillance are becoming extremely dense, finding the most informative and desirable views from different cameras is of increasing importance. In this paper, we propose a camera selection method for far-field surveillance that aims to provide the clearest visibility possible and to select the cameras that exactly capture the targets. We design a benefit function that takes into account image visibility and the degree of target matching between different cameras. Here, visibility is defined using the entropy of the intensity histogram distribution, and the target correspondence is based on activity features rather than photometric features. The proposed solution is tested in both artificial and real environments. A performance evaluation shows that our target correspondence method is well suited to far-field surveillance, and that our proposed selection method is more effective than existing methods at identifying the cameras that exactly capture the surveillance target.
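As a rough illustration of a visibility measure based on the entropy of the intensity histogram, the sketch below computes the Shannon entropy of an 8-bit grayscale image with NumPy. The function name `intensity_entropy`, the 256-bin histogram, and the use of base-2 logarithms are assumptions made here, not details taken from the paper.

```python
import numpy as np


def intensity_entropy(gray, bins=256):
    """Shannon entropy (in bits) of the intensity histogram of an 8-bit image."""
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    p = hist / hist.sum()          # normalize counts to a probability distribution
    p = p[p > 0]                   # drop empty bins so log2 is defined
    return float(-(p * np.log2(p)).sum()) + 0.0  # "+ 0.0" turns -0.0 into 0.0


# A uniform gray image carries no intensity variation; a noisy one carries a lot.
flat = np.full((64, 64), 128, dtype=np.uint8)
noisy = np.random.default_rng(0).integers(0, 256, size=(64, 64), dtype=np.uint8)
print(intensity_entropy(flat), intensity_entropy(noisy))
```

Under this measure a washed-out or saturated view scores low and a view with a rich intensity distribution scores high, which is one plausible way to rank cameras by visibility.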
Koichi KOBAYASHI Mifuyu KIDO Yuh YAMASHITA
In this paper, a surveillance system by multiple agents, which is called a multi-agent surveillance system, is studied. A surveillance area is given by an undirected connected graph. Then, the optimal control problem for multi-agent surveillance systems (the optimal surveillance problem) is to find trajectories of multiple agents that visit each node as evenly as possible. In our previous work, this problem was reduced to a mixed integer linear programming problem. However, the computation time for solving it grows exponentially with the number of agents. To overcome this technical issue, a new model predictive control method for multi-agent surveillance systems is proposed. First, a procedure of individual optimization, which is a kind of approximate solution method, is proposed. Next, a method to improve the control performance is proposed. In addition, an event-triggering condition is proposed. The effectiveness of the proposed method is demonstrated by a numerical example.
Masato WATANABE Junichi HONDA Takuya OTSUYAMA
Multi-static Primary Surveillance Radar (MSPSR) has recently attracted attention as a new surveillance technology for civil aviation. Using multiple receivers, Primary Surveillance Radar (PSR) detection performance can be improved by synthesizing the reflection characteristics, which change with the aircraft's position. In this paper, we report experimental results from our proposed optical-fiber-connected passive PSR system with transmit signal installed at the Sendai Airport in Japan. The signal-to-noise ratio of the experimental data is evaluated to verify moving target detection. In addition, we confirm the operation of the proposed system using a two-receiver setup that resembles a conventional multi-static radar. Finally, after applying time correction, the delay of the reflected signal from a stationary target remains within the expected range.
Chun-Yu LIU Wei-Hao LIAO Shanq-Jang RUAN
Abnormal crowd behavior detection is an important research topic in computer vision for improving the response time to critical events. In this letter, we introduce a novel method to detect and localize crowd gathering in surveillance videos. The proposed foreground stillness model uses the foreground object mask and the dense optical flow to measure the instantaneous crowd stillness level. Further, we obtain the long-term crowd stillness level with a leaky bucket model, and crowd gathering behavior is then detected by threshold analysis. Experimental results indicate that our proposed approach can detect and locate crowd gathering events and can distinguish between standing and walking crowds. Experiments in realistic scenes, with 88.65% accuracy for the detection of gathering frames, show that our method is effective for crowd gathering behavior detection.
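The step from instantaneous to long-term stillness can be sketched with a generic leaky bucket: each frame pours its instantaneous stillness level into the bucket, a fixed amount leaks out, and a gathering is flagged while the bucket level is at or above a threshold. The leak rate, capacity, threshold, and function names below are illustrative assumptions, not values or identifiers from the letter.

```python
def leaky_bucket(levels, leak=0.25, capacity=10.0):
    """Accumulate per-frame stillness levels while draining `leak` per frame.

    The bucket level is clamped to the range [0, capacity].
    """
    bucket = 0.0
    history = []
    for level in levels:
        bucket = min(capacity, max(0.0, bucket + level - leak))
        history.append(bucket)
    return history


def gathering_frames(levels, threshold=3.0, leak=0.25, capacity=10.0):
    """Flag frames whose long-term stillness is at or above `threshold`."""
    return [b >= threshold for b in leaky_bucket(levels, leak, capacity)]


# Four still frames followed by two frames with no stillness.
levels = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
print(leaky_bucket(levels))
print(gathering_frames(levels))
```

The leak makes brief pauses decay away, so only stillness that persists over many frames pushes the level past the threshold.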
This letter considers a legitimate proactive eavesdropping scenario, where a half-duplex legitimate monitor hires a third-party jammer for jamming the suspicious communication to improve the eavesdropping performance. The interaction between the third-party jammer and the monitor is modeled as a Stackelberg game, where the jammer moves first and sets the price for jamming the suspicious communication, and then the legitimate monitor moves subsequently and determines the requested transmit power of the jamming signals. We derive the optimal jamming price and the optimal jamming transmit power. It is shown that the proposed price-based proactive eavesdropping scheme is effective in improving the successful eavesdropping probability compared to the case without jamming. It is also shown that the proposed scheme outperforms the existing full-duplex scheme when the residual self-interference cannot be neglected.
Kenji KANAI Keigo OGAWA Masaru TAKEUCHI Jiro KATTO Toshitaka TSUDA
To reduce the backbone video traffic generated by video surveillance, we propose an intelligent video surveillance system that offers multi-modal sensor-based event detection and event-driven video rate adaptation. Our proposed system can detect the presence and movements of pedestrians in the monitoring area by using multi-modal sensors (camera, laser scanner and infrared distance sensor) and control surveillance video quality according to the detected events. We evaluate event detection accuracy and video traffic volume in experimental scenarios where up to six pedestrians pass through and/or stop in the monitoring area. The evaluation results show that our system can significantly reduce video traffic while ensuring high-quality surveillance.
This letter investigates the performance of a legitimate surveillance system, where a wireless powered legitimate monitor aims to eavesdrop on a suspicious communication link. A power splitting technique is adopted at the monitor for simultaneous information eavesdropping and energy harvesting. In order to maximize the successful eavesdropping probability, the power splitting ratio is optimized under the minimum harvested energy constraint. Assuming that either perfect channel state information (CSI) or only the channel distribution information (CDI) is available, the closed-form maximum successful eavesdropping probability is obtained in Rayleigh fading channels. It is shown that the minimum harvested energy constraint has no impact on the eavesdropping performance if the constraint is loose. It is also shown that the eavesdropping performance loss due to partial knowledge of CSI is negligible when the channel condition of the eavesdropping link is much better than that of the suspicious communication link.
Ryouichi NISHIMURA Seigo ENOMOTO Hiroaki KATO
Surveillance with multiple cameras and microphones is a promising way to trace the activities of suspicious persons for security purposes. When these sensors are connected to the Internet, they might also jeopardize innocent people's privacy because, as a result of human error, signals from sensors might allow eavesdropping by malicious persons. This paper presents a proposal for exploiting super-resolution to address this problem. Super-resolution is a signal processing technique by which a high-resolution version of a signal can be reproduced from low-resolution versions of the same signal source. Because of this property, an intelligible speech signal can be reconstructed from multiple sensor signals, each of which is completely unintelligible because of its sufficiently low sampling rate. A method based on Bayesian linear regression is proposed and compared with one based on maximum likelihood. Computer simulations using a simple sinusoidal input demonstrate that the methods restore the original signal from the signals that are actually measured. Moreover, results show that the method based on Bayesian linear regression is more robust than maximum likelihood under various microphone configurations in noisy environments and that this advantage is remarkable when the number of microphones used in the process is as small as the minimum required. Finally, listening tests using speech signals confirmed that the mean opinion score (MOS) of the reconstructed signal reaches 3, while that of the original signal captured at each single microphone is almost 1.