Ning AN Xiao-Guang ZHAO Zeng-Guang HOU
In this study, we address the problem of online RGB-D tracking, which is confronted with various challenges caused by deformation, occlusion, background clutter, and abrupt motion. Different trackers have different strengths and weaknesses, and thus a single tracker performs well only in specific scenarios. We propose a 3D tracker-level fusion algorithm (TLF3D) which enhances the strengths of different trackers and suppresses their weaknesses to achieve robust tracking performance in various scenarios. The fusion result is generated from the outputs of base trackers by optimizing an energy function that considers both the 3D cube attraction and 3D trajectory smoothness. In addition, three complementary base RGB-D trackers with intrinsically different tracking components are proposed for the fusion algorithm. We perform extensive experiments on a large-scale RGB-D benchmark dataset. The evaluation results demonstrate the effectiveness of the proposed fusion algorithm and the superior performance of the proposed TLF3D tracker against state-of-the-art RGB-D trackers.
Rachelle RIVERO Richard LEMENCE Tsuyoshi KATO
With the huge influx of various data nowadays, extracting knowledge from them has become an interesting but tedious task among data scientists, particularly when the data come in heterogeneous form and have missing information. Many data completion techniques have been introduced, especially since the advent of kernel methods, which allow heterogeneous data sets to be represented in a single form: as kernel matrices. However, among the many data completion techniques available in the literature, the problem of mutually completing several incomplete kernel matrices has not yet received much attention. In this paper, we present a new method, called the Mutual Kernel Matrix Completion (MKMC) algorithm, that tackles this problem of mutually inferring the missing entries of multiple kernel matrices by combining the notions of data fusion and kernel matrix completion, and we apply it to biological data sets used for a classification task. We first introduce an objective function that is minimized by exploiting the EM algorithm, which in turn results in an estimate of the missing entries of the kernel matrices involved. The completed kernel matrices are then combined to produce a model matrix that can be used to further improve the obtained estimates. An interesting result of our study is that the E-step and the M-step are given in closed form, which makes our algorithm efficient in terms of time and memory. After completion, the completed kernel matrices are used to train an SVM classifier to test how well the relationships among the entries are preserved. Our empirical results show that the proposed algorithm outperforms the traditional completion techniques in preserving the relationships among the data points and in accurately recovering the missing kernel matrix entries. Overall, MKMC offers a promising solution to the problem of mutual estimation of a number of relevant incomplete kernel matrices.
Weicheng XIE Junxu WEI Zhichao CHEN Tianqian LI
The particle filter is an important algorithm in the field of target tracking. However, this algorithm faces the problem of sample impoverishment, which is caused by the introduction of re-sampling, and it is easily affected by illumination variation. These problems seriously affect the tracking performance of a particle filter algorithm. To solve them, we introduce a particle filter target tracking algorithm based on a dynamic niche genetic algorithm. The application of this dynamic niche genetic algorithm to re-sampling ensures particle diversity, and the algorithm dynamically fuses the color and profile features of the target in order to increase accuracy under illumination variation. According to the test results, the proposed algorithm accurately tracks the target, significantly increases the number of particles, enhances the particle diversity, and exhibits better robustness and better accuracy.
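To make the re-sampling step and the impoverishment problem concrete, the following is a minimal sketch of standard systematic re-sampling together with the effective sample size, a common diagnostic for weight degeneracy. It is not the dynamic niche genetic algorithm proposed in the abstract; the function names `effective_sample_size` and `systematic_resample` are illustrative assumptions.

```python
import random


def effective_sample_size(weights):
    """Return 1 / sum(w_i^2) for the normalized weights (ranges from 1 to N)."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalized)


def systematic_resample(weights, rng=random):
    """Return the indices of the particles kept by systematic re-sampling."""
    n = len(weights)
    total = sum(weights)
    # Build the cumulative distribution of the normalized weights.
    cumulative = []
    running = 0.0
    for w in weights:
        running += w / total
        cumulative.append(running)
    # One random offset, then n evenly spaced positions in [0, 1).
    start = rng.random() / n
    indices = []
    j = 0
    for i in range(n):
        position = start + i / n
        # Advance to the first particle whose cumulative weight exceeds position.
        while j < n - 1 and cumulative[j] <= position:
            j += 1
        indices.append(j)
    return indices
```

When a few particles carry most of the weight, the returned indices repeat those particles many times, which is the loss of diversity that the abstract calls sample impoverishment.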
Miki ENOKI Issei YOSHIDA Masato OGUCHI
In Twitter-like services, countless messages are being posted in real-time every second all around the world. Timely knowledge about what kinds of information are diffusing in social media is quite important. For example, in emergency situations such as earthquakes, users provide instant information on their situation through social media. The collective intelligence of social media is useful as a means of information detection complementary to conventional observation. We have developed a system for monitoring and analyzing information diffusion data in real-time by tracking retweeted tweets. A tweet retweeted by many users indicates that they find the content interesting and impactful. Analysts who use this system can find tweets retweeted by many users and identify the key people who are retweeted frequently by many users or who have retweeted tweets about particular topics. However, bursting situations occur when thousands of social media messages are suddenly posted simultaneously, and the lack of machine resources to handle such situations lowers the system's query performance. Since our system is designed to be used interactively in real-time by many analysts, waiting more than one second for query results is simply not acceptable. To maintain an acceptable query performance, we propose a capacity control method for filtering incoming tweets using extra attribute information from the tweets themselves. Conventionally, there is a trade-off between the query performance and the accuracy of the analysis results. We show that the query performance is improved by our proposed method and that our method is better than the existing methods in terms of maintaining query accuracy.
Xin XU Jiro HIROKAWA Makoto ANDO
This paper presents the design and characterization of an E-band 16×16-slot monopulse array antenna with full-corporate-feed fabricated by the commercially available batch process of diffusion bonding of laminated copper plates. The antenna is multi-layered, and consists of vertically-interconnected radiating elements, a corporate-feed circuit and a comparator. It has four input ports for different excitations. Sum and difference beams in different cut-planes for monopulse operation can be generated. The antenna has a quasi-planar profile, and a total size of 13.31λ0 × 13.31λ0 × 1.52λ0 (λ0 is the wavelength at the design frequency of 78.5 GHz). The antenna demonstrates a wide operation bandwidth of 17.2 GHz (70-87.2 GHz) for VSWR < 2. At 78.5 GHz: 1) for the sum beam, there is a 32.6-dBi realized gain (83% antenna efficiency) and a 33.3-dBi directivity (95% aperture efficiency); 2) for the difference beams in the E-, H-, 45°-, and 135°-planes, the null depths are -53.0, -58.0, -57.8, and -65.6 dB, respectively. Across the full operation band where the sum main-beam and difference null are able to consistently point at the boresight, the antenna also demonstrates excellent performance in terms of high gain, high efficiency, high isolation, low cross-polarization, and distinguished monopulse capability.
Hiraku OKADA Shuhei SUZAKI Tatsuya KATO Kentaro KOBAYASHI Masaaki KATAYAMA
We previously proposed applying compressed sensing to realize information sharing of link quality for wireless mesh networks (WMNs) with grid topology. In this paper, we extend the link quality sharing method so that it can be applied to WMNs with arbitrary topology. For arbitrary topology WMNs, we introduce a link quality matrix and a matrix formula for compressed sensing. By employing a diffusion wavelets basis, the link quality matrix is converted to its sparse equivalent. Based on the sparse matrix, information sharing is achieved by compressed sensing. In addition, we propose compressed transmission for arbitrary topology WMNs, in which only the compressed link quality information is transmitted. Experiments and simulations clarify that the proposed methods can reduce the amount of data transmitted for information sharing and maintain the quality of the shared information.
This paper presents a method to accelerate target recognition processing in advanced driver assistance systems (ADAS). A histogram of oriented gradients (HOG) is an effective descriptor for object recognition in computer vision and image processing. The HOG is expected to replace conventional descriptors, e.g., template-matching, in ADAS. However, the HOG does not consider the occurrences of gradient orientation on objects when localized portions of an image, i.e., a region of interest (ROI), are not set precisely. The size and position of the ROI should be set precisely for each frame in an automotive environment where the target distance changes dynamically. We use radar to determine the size and position of the ROI in a HOG and propose a radar and camera sensor fusion algorithm. Experimental results are discussed.
Vijay JOHN Qian LONG Yuquan XU Zheng LIU Seiichi MITA
Environment perception is an important task for intelligent vehicle applications. Typically, multiple sensors with different characteristics are employed to perceive the environment. To robustly perceive the environment, the information from the different sensors is often integrated or fused. In this article, we propose to perform the sensor fusion and registration of the LIDAR and stereo camera using the particle swarm optimization algorithm, without the aid of any external calibration objects. The proposed algorithm automatically calibrates the sensors and registers the LIDAR range image with the stereo depth image. The registered LIDAR range image functions as the disparity map for the stereo disparity estimation and results in an effective sensor fusion mechanism. Additionally, we perform image denoising using the modified non-local means filter on the input image during the stereo disparity estimation to improve the robustness, especially at night time. To evaluate our proposed algorithm, the calibration and registration algorithm is compared with baseline algorithms on multiple datasets acquired under varying illumination. Compared to the baseline algorithms, we show that our proposed algorithm demonstrates better accuracy. We also demonstrate that integrating the LIDAR range image within the stereo disparity estimation results in an improved disparity map with a significant reduction in computational complexity.
Akihiko HIRATA Jun TAKEUCHI Keisuke HASHIMOTO Jiro HIROKAWA
An alignment control system using beam-tilting 1-D arrays for a 120-GHz-band corporate-feed 2-D waveguide-slot array antenna is presented. The 2-D waveguide-slot array antenna transmits data, and the 1-D arrays are used to determine array alignment. We design two types of 1-D array antenna and fabricate a corporate-feed 2-D waveguide-slot array antenna surrounded by four beam-tilting 1-D arrays. We then construct an alignment control system and evaluate the performance of the control. We find that the angular accuracy of the antenna alignment control is within ±1°.
Sound source localization is an essential technique in many applications, e.g., speech enhancement, speech capturing and human-robot interaction. However, the performance of traditional methods degrades in noisy or reverberant environments, and it is sensitive to the spatial location of the sound source. To solve these problems, we propose a sound source localization framework based on a bi-directional interaural matching filter (IMF) and decision weighting fusion. Firstly, the bi-directional IMF is put forward to describe the difference between binaural signals in the forward and backward directions, respectively. Then, a hybrid interaural matching filter (HIMF), which is obtained from the bi-directional IMF through decision weighting fusion, is used to alleviate the effect of the source location on sound source localization. Finally, the cosine similarity between the HIMFs computed from the binaural audio and from the transfer functions is employed to measure the probability of the source location. By arranging the similarities for all the spatial directions as a matrix, we can determine the source location by Maximum A Posteriori (MAP) estimation. Compared with several state-of-the-art methods, experimental results indicate that HIMF is more robust in noisy environments.
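The final matching step can be illustrated with a minimal sketch: compute the cosine similarity between an observed feature vector and one template per candidate direction, then pick the best-scoring direction. This assumes a uniform prior over directions, in which case the MAP choice reduces to the maximum similarity; the feature vectors here are placeholders rather than real HIMFs, and the names `cosine_similarity` and `best_direction` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_direction(observed, templates):
    """Return the direction key whose template is most similar to `observed`.

    `templates` maps a direction label (e.g. an azimuth in degrees) to a
    feature vector; a uniform prior over directions is assumed.
    """
    return max(templates, key=lambda d: cosine_similarity(observed, templates[d]))
```

For example, with templates `{0: [1.0, 0.0], 90: [0.0, 1.0]}` and an observation `[0.1, 0.9]`, the sketch selects the 90-degree template.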
Although many approaches for ideal channels have been proposed in previous research, few authors have considered the situation of nonideal communication links. In this paper, we study the problem of distributed decision fusion over nonideal channels by using scan statistics. In order to obtain the fusion rule under nonideal channels, we set up a nonideal channel model with modulation error, noise, and signal attenuation. Under this model, we update the fusion rule by using scan statistics. We first consider the fusion rule when sensors are distributed in a grid, and then derive the expressions of the detection probability and false alarm probability when sensors follow a uniform distribution. Extensive simulations are conducted in order to investigate the performance of our fusion rule and the influence of the signal-to-noise ratio (SNR) on the detection and false alarm probabilities. These simulations show that the theoretical values of the global detection probability and the global false alarm probability are close to the experimental results, and that the fusion rule also has high performance in the high SNR region. However, further research is needed to reduce the high computational complexity.
Naoki SAWADA Hiromitsu NISHIZAKI
This study proposes a two-pass spoken term detection (STD) method. The first pass uses a phoneme-based dynamic time warping (DTW)-based STD, and the second pass recomputes detection scores produced by the first pass using conditional random fields (CRF)-based triphone detectors. In the second pass, we treat STD as a sequence labeling problem. We use CRF-based triphone detection models based on features generated from multiple types of phoneme-based transcriptions. The models learn recognition error patterns such as phoneme-to-phoneme confusions in the CRF framework. Consequently, the models can detect a triphone comprising a query term with a detection probability. In the experimental evaluation of two types of test collections, the CRF-based approach worked well in the re-ranking process for the DTW-based detections. CRF-based re-ranking showed 2.1% and 2.0% absolute improvements in F-measure for each of the two test collections.
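As a rough illustration of the first pass, the sketch below computes a textbook DTW distance between two phoneme sequences. It aligns whole sequences with a 0/1 mismatch cost, which is a simplification: a real STD system searches for the query inside a longer utterance and typically uses an acoustically motivated phoneme distance. The function name `dtw_distance` and its default cost are assumptions, not the authors' implementation.

```python
def dtw_distance(query, target, cost=lambda a, b: 0.0 if a == b else 1.0):
    """Return the DTW alignment cost between two phoneme sequences."""
    n, m = len(query), len(target)
    inf = float("inf")
    # table[i][j] holds the best cost of aligning query[:i] with target[:j].
    table = [[inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(query[i - 1], target[j - 1])
            table[i][j] = step + min(
                table[i - 1][j],      # query phoneme repeated against target
                table[i][j - 1],      # target phoneme repeated against query
                table[i - 1][j - 1],  # one-to-one match
            )
    return table[n][m]
```

Because warping lets one phoneme absorb a repeated neighbor, `["k", "a", "t"]` and `["k", "a", "a", "t"]` align at zero cost, while a substituted phoneme adds one unit of cost.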
Hengjun YU Kohei INOUE Kenji HARA Kiichi URAHAMA
In this paper, we propose a method for color error diffusion based on the Neugebauer model for color halftone printing. The Neugebauer model expresses an arbitrary color as a trilinear interpolation of basic colors. The proposed method quantizes the color of each pixel to the basic color that minimizes an accumulated quantization error, and the quantization error is diffused to the ratios of basic colors in subsequent pixels. Experimental results show that the proposed method outperforms conventional color error diffusion methods, including the separable method, in terms of eye model-based mean squared error.
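The basic idea of error diffusion can be shown with a minimal scalar sketch: each value is quantized to the nearest available level, and the quantization error is carried into the next value. This is a one-dimensional, single-channel simplification and does not implement the Neugebauer-model color method described in the abstract; the function name `diffuse_row` and the default levels are illustrative assumptions.

```python
def diffuse_row(values, levels=(0.0, 1.0)):
    """Quantize a row of values, pushing each error onto the next value."""
    output = []
    error = 0.0
    for value in values:
        # Add the error left over from the previous pixel.
        target = value + error
        # Choose the nearest output level (ties go to the first level).
        chosen = min(levels, key=lambda level: abs(level - target))
        output.append(chosen)
        # Carry the new quantization error forward.
        error = target - chosen
    return output
```

A constant mid-gray row such as `[0.5, 0.5, 0.5, 0.5]` becomes an alternating pattern whose average matches the input, which is the property that halftoning relies on.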
Masatsugu ICHINO Hiroaki MAEDA Hiroshi YOSHIURA
A method based on score level fusion using logistic regression has been developed that uses packet header information to classify Internet applications. Applications are classified not on the basis of the individual flows for each type of application but on the basis of all the flows for each type of application, i.e., the “overall traffic flow.” The overall traffic flow is divided into equal time slots, and the applications are classified using statistical information obtained for each time slot. Evaluation using overall traffic flow generated by five types of applications showed that its true and false positive rates are better than those of methods using feature level fusion.
Li FENG Yujun KUANG Binwei WU Zeyang DAI Qin YU
In this paper, we propose a novel censor-based cooperative spectrum sensing strategy, called adaptive energy-efficient sensing (AES), in which both sequential sensing and a censoring report mechanism are employed, aiming to reduce the sensing energy consumption of secondary user relays (SRs). In AES, an anchor secondary user (SU) requires cooperative sensing only when it does not detect the presence of a primary user (PU) by itself, and the cooperative SR adopts decision censoring report only if the sensing result differs from its previous one. We derive generalized-form expressions of the false alarm and detection probabilities over Rayleigh fading channels for AES. The sensing energy consumption is also analyzed. Then, we study the sensing energy overhead minimization problem and show that the sensing time allocation can be optimized to minimize the miss detection probability and sensing energy overhead. Finally, numerical results show that the proposed strategy can remarkably reduce the sensing energy consumption while only slightly degrading the detection performance compared with the traditional scheme.
Ye AI Feng MIAO Qingmao HU Weifeng LI
In this paper, a novel method of high-grade brain tumor segmentation from multi-sequence magnetic resonance images is presented. Firstly, a Gaussian mixture model (GMM) is introduced to derive an initial posterior probability by fitting the fluid attenuation inversion recovery histogram. Secondly, some grayscale and region properties are extracted from different sequences. Thirdly, grayscale and region characteristics with different weights are proposed to adjust the posterior probability. Finally, a cost function based on the posterior probability and neighborhood information is formulated and optimized via graph cut. Experimental results on a public dataset with 20 high-grade brain tumor patient images show that the proposed method achieves a Dice coefficient of 78%, which is higher than that of the standard graph cut algorithm without a probability-adjusting step and of some other cost function-based methods.
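For reference, the Dice coefficient used as the evaluation metric is twice the overlap between the predicted and reference regions divided by the sum of their sizes. The sketch below computes it for regions given as collections of pixel or voxel identifiers; the function name `dice_coefficient` and the convention of returning 1.0 for two empty regions are assumptions.

```python
def dice_coefficient(predicted, reference):
    """Return 2|A ∩ B| / (|A| + |B|) for two collections of pixel ids."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        # Convention assumed here: two empty regions agree perfectly.
        return 1.0
    overlap = len(predicted & reference)
    return 2.0 * overlap / (len(predicted) + len(reference))
```

Two regions of three pixels each that share two pixels give 2·2 / (3 + 3), or about 0.67.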
Yuan LIANG Koji IWANO Koichi SHINODA
Most error correction interfaces for speech recognition applications on smartphones require the user to first mark an error region and then choose the correct word from a candidate list. We propose a simple multimodal interface to make the process more efficient. We develop Long Context Match (LCM) to get candidates that complement the conventional word confusion network (WCN). Assuming that not only the preceding words but also the succeeding words of the error region are validated by users, we use such contexts to search higher-order n-gram corpora for matching word sequences. For this purpose, we also utilize Web text data. Furthermore, we propose a combination of LCM and WCN (“LCM + WCN”) to provide users with candidate lists that are more relevant than those yielded by WCN alone. We compare our interface with the WCN-based interface on the Corpus of Spontaneous Japanese (CSJ). Our proposed “LCM + WCN” method improved the 1-best accuracy by 23% and the Mean Reciprocal Rank (MRR) by 28%, and our interface reduced the user's load by 12%.
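The Mean Reciprocal Rank reported above averages, over all error regions, the reciprocal of the position at which the correct word appears in the candidate list. The following sketch shows the computation; the function name `mean_reciprocal_rank` and the choice to score a missing correct word as zero are assumptions.

```python
def mean_reciprocal_rank(candidate_lists, correct_words):
    """Average 1/rank of the correct word; a missing word contributes 0."""
    total = 0.0
    for candidates, correct in zip(candidate_lists, correct_words):
        if correct in candidates:
            # Ranks are 1-based: the top candidate has rank 1.
            total += 1.0 / (candidates.index(correct) + 1)
    return total / len(correct_words)
```

With three error regions where the correct word is ranked first, ranked third, and absent, the MRR is (1 + 1/3 + 0) / 3, or about 0.44.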
This letter proposes an image fusion method which adopts a union of multiple directional lapped orthogonal transforms (DirLOTs). DirLOTs are used to generate symmetric orthogonal discrete wavelet transforms and then to construct a union of unitary transforms as a redundant dictionary with a multiple directional property. The multiple DirLOTs can overcome a disadvantage of separable wavelets in representing images which contain slant textures and edges. We analyse the characteristics of local luminance contrast, and propose a fusion rule based on the interscale relation of wavelet coefficients. Relying on the above, a novel image fusion method is proposed. Experimental results show that the proposed method significantly improves the fusion performance over that of conventional discrete wavelet transforms.
In social websites, users acquire information from adjacent neighbors as well as distant users by seeking along hyperlinks, and therefore, information diffusions, also seen as processes of “user infection”, show both cascading and jumping routes in social networks. Currently, existing analyses have difficulty distinguishing between the impacts of information seeking behaviors, i.e., random walks, and other factors leading to user infections. To this end, we present a mechanism to recognize and measure the influence of random walks on information diffusions. Firstly, we propose the concept of an information propagation structure (IPS), which is a directed acyclic graph, to represent frequent information diffusion routes in social networks. In an IPS, we represent “jumping routes” as virtual arcs and regard them as the traces of random walks. Secondly, we design a frequent IPS mining algorithm (FIPS). By considering descendant node infections as a consequence of ancestor node infections in an IPS, we can use a Bayesian network to model each IPS, and learn parameters based on the records of information diffusions passing through the IPS. Finally, we present a method for quantitatively describing the influence of random walks. The method is based on Bayesian probabilistic inference in the IPS, which is used to determine the ancestors whose infection causes the infection of target users. We also employ betweenness centralities of arcs to evaluate the contributions of random walks to certain infections. Experiments are carried out with real datasets and simulations. The results show that random walks are influential in the early and steady phases of information diffusions. They help diffusions pass through some topology limitations in social networks.
Yoshitatsu MATSUDA Kazunori YAMAGUCHI Ken-ichiro NISHIOKA
In this paper, a new approach is proposed for extracting the spatio-temporal patterns from a location-based social networking system (SNS) such as Foursquare. The proposed approach consists of the following procedures. First, the spatio-temporal behaviors of users in the SNS are approximated as a probabilistic distribution by using a diffusion-type formula. Since SNS datasets generally consist of sparse check-ins of users at some time points and locations, it is difficult to investigate the spatio-temporal patterns on a wide range of time and space scales. The proposed method can estimate such wide-range patterns by smoothing the sparse datasets with a diffusion-type formula. It is crucial in this method to robustly estimate the scale parameter by giving a prior generative model on the check-ins of users. The robust estimation enables the method to extract appropriate patterns even in small local areas. Next, the covariance matrix among the time points is calculated from the estimated distribution. Then, the principal eigenfunctions are approximately extracted as the spatio-temporal patterns by principal component analysis (PCA). The distribution is a mixture of various patterns, some of which are regular ones with a periodic cycle and some of which are irregular ones corresponding to transient events. Though it is generally difficult to separate such complicated mixtures, the experiments on an actual Foursquare dataset showed that the proposed method can extract many plausible and interesting spatio-temporal patterns.