Guowei TENG Hao LI Zhenglong YANG
This paper proposes a temporal-domain-difference-based secondary background modeling algorithm for surveillance video coding. The proposed algorithm makes three key technical contributions. First, the LDBCBR (Long Distance Block Composed Background Reference) algorithm is proposed, which exploits IBBS (interval of background blocks searching) to weaken the temporal correlation of the foreground. Second, both BCBR (Block Composed Background Reference) and LDBCBR are used together to generate the temporary background reference frame; the secondary modeling algorithm then uses the temporary background blocks generated by BCBR and LDBCBR to obtain the final background frame. Third, the background reference frame is monitored after it is generated: background blocks are updated immediately when they change significantly, the modeling period is shortened in areas where the foreground moves frequently, and the stable background is checked regularly. The proposed algorithm is implemented on the IEEE 1857 platform, and the experimental results demonstrate a significant improvement in coding efficiency. On the surveillance test sequences recommended by the China AVS (Advanced Audio Video Standard) working group, our method achieves BD-Rate gains of 6.81% and 27.30% compared with BCBR and the baseline profile, respectively.
Axel BEAUGENDRE Satoshi GOTO Takeshi YOSHIMURA
The vast majority of foreground detection methods require heavy hardware optimization to process standard-definition videos in real time. Indeed, those methods process the whole frame not only for the detection but also for the background modelling, which makes them too resource-hungry (time, memory, etc.) to be applied to Ultra High Definition (UHD) videos. This paper presents a real-time background modelling method called Mixed Block Background Modelling (MBBM). It is a spatio-temporal approach which updates the background model by carefully selecting blocks in both a linear and a pseudo-random order and updating the corresponding blocks of the model. Together, the two block selection orders ensure that every block will be updated. For foreground detection purposes, the method is combined with a foreground detection method designed for UHD videos, such as the Adaptive Block-Propagative Background Subtraction method. Experimental results show that the proposed MBBM can process 50 min of 4K UHD video in less than 6 hours, while other methods are estimated to take from 8 days to more than 21 years. Compared to 10 state-of-the-art foreground detection methods, the proposed MBBM shows the best quality results, with an average global quality score of 0.597 (1 being the maximum) on a dataset of 4K UHDTV sequences containing various situations such as illumination variation. Finally, the processing time per pixel of the MBBM is the lowest of all compared methods, with an average of 3.18×10⁻⁸ s.
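As a rough illustration of the two block selection orders mentioned in the abstract above, the sketch below pairs a round-robin (linear) pick with a pseudo-random pick on every frame. The abstract does not give the actual selection rule, so the function name `block_update_schedule`, the one-plus-one budget per frame, and the seed are all assumptions made for illustration only.

```python
import random

def block_update_schedule(n_blocks, n_frames, seed=0):
    """Return, for each frame, the set of block indices to update.

    Assumption: one block is picked in linear (round-robin) order and one
    in pseudo-random order per frame; the published MBBM rule may differ.
    """
    rng = random.Random(seed)
    schedule = []
    for frame in range(n_frames):
        linear = frame % n_blocks         # visits every block once per n_blocks frames
        pseudo = rng.randrange(n_blocks)  # spreads extra updates across the frame
        schedule.append({linear, pseudo})
    return schedule

# Hypothetical toy setting: 16 blocks observed over 16 frames.
schedule = block_update_schedule(n_blocks=16, n_frames=16)
covered = set().union(*schedule)
```

In this sketch the linear order alone already guarantees that every block is refreshed within `n_blocks` frames; the pseudo-random pick only shortens the typical wait for any given block.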
Speaker change detection involves the identification of the time indices of an audio stream, where the identity of the speaker changes. This paper proposes novel measures for speaker change detection over the centroid model, which divides the feature space into non-overlapping clusters for effective speaker-change comparison. The centroid model is a computationally-efficient variant of the widely-used mixture-distribution based background models for speaker recognition. Experiments on both synthetic and real-world data were performed; the results show that the proposed approach yields promising results compared with the conventional statistical measures.
Ayaka YAMAMOTO Yoshio IWAI Hiroshi ISHIGURO
Background subtraction is widely used in detecting moving objects; however, changing illumination conditions, color similarity, and real-time performance remain important problems. In this paper, we introduce a sequential method for adaptively estimating background components using Kalman filters, and a novel method for detecting objects using margined sign correlation (MSC). By applying MSC to our adaptive background model, the proposed system can perform object detection robustly and accurately. The proposed method is suitable for implementation on a graphics processing unit (GPU) and as such, the system realizes real-time performance efficiently. Experimental results demonstrate the performance of the proposed system.
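The adaptive background estimation described above can be pictured with a scalar Kalman filter that tracks the intensity of a single pixel. This is only a minimal sketch under a random-walk state model; the function name `kalman_background` and the noise variances `q` and `r` are hypothetical, and the margined sign correlation (MSC) detection step is not shown.

```python
def kalman_background(observations, q=1e-2, r=4.0, p0=1.0):
    """Track one pixel's background intensity with a scalar Kalman filter.

    Assumptions (not from the abstract): the background follows a random
    walk with process variance q, and observations carry noise variance r.
    """
    x = observations[0]  # initial state estimate
    p = p0               # initial estimate variance
    estimates = []
    for z in observations:
        p = p + q            # predict: state unchanged, uncertainty grows
        k = p / (p + r)      # Kalman gain
        x = x + k * (z - x)  # correct with the new observation
        p = (1.0 - k) * p    # updated estimate variance
        estimates.append(x)
    return estimates

# Hypothetical pixel: steady background at 100 with one foreground spike at 200.
pixel = [100.0] * 20 + [200.0] + [100.0] * 20
estimates = kalman_background(pixel)
```

With a small gain, a brief foreground spike moves the estimate only slightly, while a sustained illumination change would still be absorbed over time.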
Yuuji MUKAI Hideki NODA Takashi OSANAI
This paper discusses speaker verification (SV) using Gaussian mixture models (GMMs), where only utterances of enrolled speakers are required. Such an SV system can be realized using artificially generated cohorts instead of real cohorts from speaker databases. This paper presents a rational approach to set GMM parameters for artificial cohorts based on statistics of GMM parameters for real cohorts. Equal error rates for the proposed method are about 10% less than those for the previous method, where GMM parameters for artificial cohorts were set in an ad hoc manner.
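For readers unfamiliar with the metric reported above, the equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below approximates it by sweeping a threshold over the observed scores; the function name `equal_error_rate` and the toy scores are illustrative assumptions, not data from the paper.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping thresholds over all observed scores.

    A trial is accepted when its score is >= the threshold.
    """
    best_gap, eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine trials rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # impostor trials accepted
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer

# Hypothetical verification scores (higher means more likely the claimed speaker).
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

With these toy scores, one genuine trial and one impostor trial fall on the wrong side of the best threshold, so both error rates meet at 25%.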
Fan-Chieh CHENG Shih-Chia HUANG Shanq-Jang RUAN
In this letter, we propose a novel motion detection method to accurately detect moving objects in automatic video surveillance systems. Based on the proposed Background Generation Mechanism, the presence of either a moving object or background information is first checked in order to support the selective updating of a high-quality adaptive background model, which facilitates the subsequent motion detection using the Laplacian distribution model. The overall detection accuracy results demonstrate that our proposed method attains a substantially higher degree of efficacy, outperforming the state-of-the-art method by average Similarity accuracy rates of up to 56.64%, 27.78%, 50.04%, 43.33%, and 44.09%, respectively.
Xiang ZHANG Hongbin SUO Qingwei ZHAO Yonghong YAN
In this letter, we propose a new approach to SVM-based speaker recognition, which uses a novel kind of phonotactic information as the feature for SVM modeling. Gaussian mixture models (GMMs) have proven extremely successful for text-independent speaker recognition. The GMM universal background model (UBM) is a speaker-independent model, each component of which can be considered as modeling some underlying phonetic sound class. We assume that utterances from different speakers yield different average posterior probabilities on the same Gaussian component of the UBM, and that the supervector composed of the average posterior probabilities on all components of the UBM for each utterance is therefore discriminative. We use these supervectors as the features for SVM-based speaker recognition. Experimental results on a NIST SRE 2006 task show that the proposed approach achieves performance comparable to that of commonly used systems. Fusion results are also presented.
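The supervector described above, one average posterior probability per UBM component, can be sketched as follows. To stay self-contained, the example assumes one-dimensional features and a toy two-component UBM with made-up parameters; real systems use multi-dimensional acoustic features, and the SVM training step is not shown. The names `gaussian_pdf` and `posterior_supervector` are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def posterior_supervector(frames, weights, means, variances):
    """Average the per-component posterior over all frames of one utterance."""
    n_comp = len(weights)
    totals = [0.0] * n_comp
    for x in frames:
        likes = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        norm = sum(likes)
        for c in range(n_comp):
            totals[c] += likes[c] / norm  # posterior of component c for this frame
    return [t / len(frames) for t in totals]

# Toy UBM (assumed values): two equally weighted components at -2 and +2.
weights, means, variances = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]
sv_a = posterior_supervector([-2.1, -1.8, -2.4], weights, means, variances)
sv_b = posterior_supervector([1.9, 2.2, 2.5], weights, means, variances)
```

Each supervector sums to one, and the two toy utterances concentrate their mass on different components, which is the kind of difference the SVM is meant to exploit.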
Yuuji MUKAI Hideki NODA Michiharu NIIMI Takashi OSANAI
This paper presents a text-independent speaker verification method using Gaussian mixture models (GMMs), where only utterances of enrolled speakers are required. Artificial cohorts are used instead of those from speaker databases, and GMMs for artificial cohorts are generated by changing model parameters of the GMM for a claimed speaker. Equal error rates obtained with the proposed method are about 60% lower than those obtained with a conventional method that also uses only utterances of enrolled speakers.