Chenggang GUO Dongyi CHEN Zhiqi HUANG
Sparse representation has been successfully applied to visual tracking. Recent progress in sparse tracking has mainly been made within the particle filter framework. However, most sparse trackers need to extract complex feature representations for each particle in the limited sample space, leading to high computational cost and inferior tracking performance. To deal with these issues, we propose a novel sparse tracking method based on the circulant reverse lasso model. Benefiting from the properties of circulant matrices, densely sampled target candidates are implicitly generated by cyclically shifting the base feature descriptors and then embedded as a dictionary into a reverse sparse reconstruction model to encode a robust appearance template. The alternating direction method of multipliers is employed for solving the reverse sparse model, and the optimization can be carried out efficiently in the frequency domain, which enables the proposed tracker to run in real time. The calculated sparse coefficient map represents the similarity scores between the template and the circularly shifted samples, so the target location can be predicted directly from the coordinates of the peak coefficient. A scale-aware template updating strategy is combined with correlation filter template learning to account for both appearance deformations and scale variations. Quantitative and qualitative evaluations on two challenging tracking benchmarks demonstrate that the proposed algorithm performs favorably against several state-of-the-art sparse representation based tracking methods.
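As a concrete illustration of the frequency-domain optimization, the sketch below solves a reverse lasso with a circulant dictionary via ADMM, assuming a 1D base feature vector x whose cyclic shifts form the dictionary columns and a template t; the multi-channel features, scale handling, and parameter values of the actual tracker are not reproduced here.

```python
import numpy as np

def circulant_reverse_lasso(x, t, lam=0.01, rho=1.0, iters=50):
    """Solve min_w ||C(x) w - t||^2 + lam*||w||_1 by ADMM, where C(x) is
    the circulant matrix whose columns are cyclic shifts of x. Products
    with C(x) diagonalize under the FFT, so no dense matrix is formed."""
    xf, tf = np.fft.fft(x), np.fft.fft(t)
    z, u = np.zeros_like(x), np.zeros_like(x)
    for _ in range(iters):
        # w-update: (C^T C + rho I) w = C^T t + rho (z - u), per frequency
        wf = (np.conj(xf) * tf + rho * np.fft.fft(z - u)) / (np.abs(xf)**2 + rho)
        w = np.real(np.fft.ifft(wf))
        z = np.sign(w + u) * np.maximum(np.abs(w + u) - lam / rho, 0)  # shrink
        u = u + w - z
    return z  # sparse coefficient map over all cyclic shifts

# the peak of the coefficient map predicts the target's cyclic shift:
# shift = np.argmax(circulant_reverse_lasso(x, t))
```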
Kehai CHEN Tiejun ZHAO Muyun YANG
Learning semantic representations of translation context is beneficial to statistical machine translation (SMT). Previous efforts have focused on implicitly encoding syntactic and semantic knowledge of the translation context with neural networks, which are weak at capturing explicit structural syntactic information. In this paper, we propose a new neural network with a tree-based convolutional architecture that explicitly learns structural syntactic information in the translation context, thus improving translation prediction. Specifically, we first convert parallel sentences with source parse trees into syntax-based linear sequences using a minimum syntax subtree algorithm, and then define a tree-based convolutional network over the linear sequences to jointly learn syntax-based context representations and translation prediction. To verify its effectiveness, the proposed model is integrated into phrase-based SMT. Experiments on large-scale Chinese-to-English and German-to-English translation tasks show that the proposed approach achieves a substantial and significant improvement over several baseline systems.
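A minimal numpy sketch of the convolutional building block such an encoder rests on: a width-k convolution with max-over-time pooling applied to the embeddings of the syntax-linearized sequence. The embedding dimension, window width, and filter count are illustrative, and the tree-specific weight sharing of the actual model is omitted.

```python
import numpy as np

def conv_encode(seq_emb, W, b):
    """seq_emb: (T, d) embeddings of a syntax-linearized token sequence.
    W: (k*d, m) filters over windows of k tokens; b: (m,) bias.
    Returns an m-dim context representation via max-over-time pooling."""
    T, d = seq_emb.shape
    k = W.shape[0] // d
    windows = np.stack([seq_emb[i:i + k].ravel() for i in range(T - k + 1)])
    feats = np.tanh(windows @ W + b)   # (T-k+1, m) feature maps
    return feats.max(axis=0)           # pool over window positions

rng = np.random.default_rng(0)
emb = rng.normal(size=(10, 8))                     # 10 tokens, 8-dim
W, b = rng.normal(size=(3 * 8, 16)), np.zeros(16)  # width-3, 16 filters
print(conv_encode(emb, W, b).shape)                # (16,)
```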
Takahiro OGAWA Sho TAKAHASHI Naofumi WADA Akira TANAKA Miki HASEYAMA
Binary sparse representation based on arbitrary quality metrics and its applications are presented in this paper. The novelties of the proposed method are twofold. First, the method derives a new sparse representation whose coefficients are binary values, which enables the selection of arbitrary image quality metrics. This sparse representation can generate quality metric-independent subspaces while simplifying the calculation procedures. Second, visual saliency is used to pool the quality values obtained for all parts within the target images, which enables more visually pleasing approximation of those images. By introducing these two novel approaches, successful image approximation that considers human perception becomes feasible. Since the proposed method can provide lower-dimensional subspaces obtained with better image quality metrics, it is applicable to several image reconstruction tasks. Experimental results showed high performance of the proposed method on two image reconstruction tasks, image inpainting and super-resolution.
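The greedy sketch below illustrates binary-coefficient sparse approximation under a pluggable quality metric, using plain negative MSE as a stand-in; the paper's actual metrics, saliency-based pooling, and stopping rule are not reproduced.

```python
import numpy as np

def binary_sparse_approx(y, D, k, quality=lambda a, b: -np.sum((a - b)**2)):
    """Greedily select k atoms (columns of D) with coefficients fixed to 1,
    at each step adding the atom that most improves the quality metric.
    Any metric comparing target y and its approximation can be plugged in."""
    selected, approx = set(), np.zeros_like(y)
    for _ in range(k):
        scores = [quality(y, approx + D[:, j]) if j not in selected
                  else -np.inf for j in range(D.shape[1])]
        j = int(np.argmax(scores))
        selected.add(j)
        approx = approx + D[:, j]
    return sorted(selected), approx
```

Because the coefficients are restricted to {0, 1}, the selection step never needs the metric to be differentiable, which is what permits arbitrary quality metrics.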
Shan JIANG Cheng HAN Xiaoqiang DI
Sparse representation has been widely applied to visual tracking for several years. In the sparse representation framework, the tracking problem is transformed into an L1 minimization problem. However, during tracking, the appearance of the target is affected by the external environment. We therefore propose a robust tracking algorithm that combines traditional sparse representation with the particle filter framework. First, we obtain the observation image set from the particle filter. We then apply a 2D transformation to the observation image set, which makes the set of target candidates more robust to misalignment in complex scenes. Moreover, we adopt an occlusion detection mechanism before template updating, which effectively reduces drift. Experimental evaluations on five challenging public sequences, which exhibit occlusion, illumination variation, scale change, and motion blur, demonstrate that our tracker is accurate and robust in comparison with the state of the art.
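A toy version of the pre-update occlusion check, assuming the residual between a candidate patch and its sparse reconstruction is available; the threshold and outlier rule are placeholders for the paper's mechanism.

```python
import numpy as np

def should_update_template(patch, reconstruction, occ_ratio_max=0.25):
    """Skip the template update when too many pixels deviate strongly
    from the sparse reconstruction, i.e. when occlusion is likely."""
    residual = np.abs(patch - reconstruction)
    outliers = residual > residual.mean() + 2 * residual.std()
    return outliers.mean() < occ_ratio_max  # update only if largely visible
```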
Multi-task joint sparse representation (MTJSR) is an efficient multi-task learning (MTL) method for solving different problems together using a shared sparse representation. Inspired by self-paced learning in humans, in which tasks are gradually learned from easy to difficult, I apply this mechanism to MTJSR and propose a multi-task joint sparse representation with self-paced learning (MTJSR-SP) algorithm. In MTJSR-SP, the self-paced learning mechanism is incorporated as a regularizer in the objective function, and an iterative optimization method is applied to solve it. Compared with traditional MTL methods, MTJSR-SP is more robust to noise and outliers. Experimental results on two synthesized datasets, four datasets from the UCI machine learning repository, the Oxford flower dataset, and the Caltech-256 image categorization dataset validate the effectiveness of MTJSR-SP.
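The sketch below shows the textbook self-paced alternation such a regularizer induces: a hard 0/1 weighting admits samples whose loss falls below a threshold, the model is refit on them, and the threshold grows to include harder samples. MTJSR-SP's exact regularizer and sparse-coding subproblem are abstracted into fit_fn.

```python
import numpy as np

def self_paced_weights(losses, lam):
    """Hard self-paced regularizer: v_i = 1 if loss_i < lam else 0."""
    return (losses < lam).astype(float)

def self_paced_training(losses_fn, fit_fn, theta, lam=0.1, mu=1.3, rounds=10):
    """Alternate between selecting easy samples under the current model
    and refitting on them, gradually relaxing the threshold."""
    for _ in range(rounds):
        v = self_paced_weights(losses_fn(theta), lam)  # easy-sample mask
        theta = fit_fn(theta, v)  # weighted refit, e.g. weighted sparse coding
        lam *= mu                 # admit more difficult samples next round
    return theta
```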
Ping ZENG Qingping TAN Haoyu ZHANG Xiankai MENG Zhuo ZHANG Jianjun XU Yan LEI
Deep neural named entity recognition models automatically learn and extract the features of entities, overcoming the traditional models' heavy reliance on complex feature engineering and obscure professional knowledge; they have become a hot topic in recent years. However, existing deep neural models involve only simple character feature learning and extraction methods, which limits their capability. To further explore the performance of deep neural models, we propose two character feature learning models, based on a convolutional neural network and a long short-term memory network, which consider the local semantic and position features of word characters. Experiments conducted on the CoNLL-2003 dataset show that the proposed models outperform traditional ones and demonstrate excellent performance.
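A numpy sketch of the CNN-based variant: each character embedding is augmented with a relative-position feature, then convolved and max-pooled into a word-level character representation. The dimensions and position encoding are illustrative, and the LSTM-based variant is omitted.

```python
import numpy as np

def char_cnn_feature(char_embs, W, b, k=3):
    """char_embs: (L, d) embeddings of one word's characters. A relative
    position in [0, 1] is appended per character, then a width-k
    convolution with max pooling yields the word's character feature."""
    L, d = char_embs.shape
    pos = (np.arange(L) / max(L - 1, 1)).reshape(-1, 1)
    x = np.hstack([char_embs, pos])                       # (L, d+1)
    wins = np.stack([x[i:i + k].ravel() for i in range(L - k + 1)])
    return np.tanh(wins @ W + b).max(axis=0)

rng = np.random.default_rng(1)
embs = rng.normal(size=(6, 15))                    # 6 characters, 15-dim
W, b = rng.normal(size=(3 * 16, 30)), np.zeros(30)
print(char_cnn_feature(embs, W, b).shape)          # (30,)
```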
Yukihiro TAGAMI Hayato KOBAYASHI Shingo ONO Akira TAJIMA
Modeling user activities on the Web is a key problem for various Web services, such as news article recommendation and ad click prediction. In our work-in-progress paper [1], we introduced an approach that summarizes each sequence of user Web page visits using Paragraph Vector [3], treating users and URLs as paragraphs and words, respectively. The learned user representations are shared across the user-related prediction tasks. In this paper, on the basis of an analysis of our Web page visit data, we propose Backward PV-DM, a modified version of Paragraph Vector. We show experimental results on two ad-related data sets based on logs from Web services of Yahoo! JAPAN. Our proposed method achieved better results than those of existing vector models.
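The setup can be reproduced with gensim's standard PV-DM, treating each user as a paragraph and visited URLs as words; the Backward PV-DM modification of the context window is not available in gensim (4.x API shown) and is not reproduced here, and the visit logs below are toy data.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# toy visit logs: user id -> sequence of visited URLs
visits = {
    "user1": ["news.example.com/a", "shop.example.com/x", "news.example.com/b"],
    "user2": ["shop.example.com/x", "shop.example.com/y"],
}
docs = [TaggedDocument(words=urls, tags=[uid]) for uid, urls in visits.items()]

# dm=1 selects PV-DM: predict a visited URL from its context and the user
model = Doc2Vec(docs, dm=1, vector_size=32, window=2, min_count=1, epochs=20)
user_vec = model.dv["user1"]  # user representation shared across tasks
```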
Ping ZENG Qingping TAN Xiankai MENG Haoyu ZHANG Jianjun XU
Determining the validity of knowledge triples and filling in missing entities or relationships in the knowledge graph are crucial tasks for large-scale knowledge graph completion. The main existing solutions use machine learning methods to learn low-dimensional distributed representations of entities and relationships, among which translation models obtain excellent performance. However, existing translation models do not adequately consider the indirect relationships among entities, which affects the precision of the representations. Based on the long short-term memory neural network and existing translation models, we propose a multiple-module hybrid neural network model called TransP. By modeling entity paths and their relationship paths, TransP can effectively mine the indirect relationships among entities and thus improve the quality of knowledge graph completion tasks. Experimental results show that TransP outperforms state-of-the-art models in the entity prediction task and achieves performance comparable to previous models in the relationship prediction task.
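For reference, a numpy sketch of the translation-model scoring that TransP builds on: TransE scores a triple by ||h + r - t||, and an indirect multi-hop path can be composed from its relation vectors. The additive composition below is a simple stand-in for TransP's LSTM path encoder.

```python
import numpy as np

def transe_score(h, r, t):
    """Lower is better: a valid triple should satisfy h + r ≈ t."""
    return np.linalg.norm(h + r - t, ord=1)

def path_score(h, relations, t):
    """Score an indirect path (r1, r2, ...) between h and t by summing
    relation vectors, a stand-in for a learned path representation."""
    return transe_score(h, np.sum(relations, axis=0), t)
```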
Xina ZHANG Xiaoni DU Chenhuang WU
A family of quaternary sequences over Z4 is defined based on the Ding-Helleseth generalized cyclotomic classes modulo pq for two distinct odd primes p and q. The linear complexity is determined by computing the defining polynomial of the sequences, which is in fact connected with the discrete Fourier transform of the sequences. The results show that the sequences possess large linear complexity and are “good” sequences from the viewpoint of cryptography.
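For orientation, the standard field-case relations behind such computations are sketched below; the paper works over Z4 rather than a field, where the defining polynomial lives in a Galois ring extension, so these formulas indicate the idea rather than the paper's exact setting.

```latex
% Linear complexity of an N-periodic sequence s = (s_0, ..., s_{N-1})
% over a field F with \gcd(N, \operatorname{char} F) = 1:
\[
  S(x) = \sum_{i=0}^{N-1} s_i x^i, \qquad
  L(s) = N - \deg\!\big(\gcd\big(x^N - 1,\, S(x)\big)\big).
\]
% Blahut's theorem links L(s) to the discrete Fourier transform:
% L(s) equals the number of nonzero DFT coefficients
\[
  L(s) = \#\{\, 0 \le k < N : \hat{s}_k \neq 0 \,\}, \qquad
  \hat{s}_k = \sum_{i=0}^{N-1} s_i \alpha^{ik},
\]
% with \alpha a primitive N-th root of unity in an extension of F.
```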
Hiroki CHIBA Yuki HYOGO Kazuo MISUE
Spatio-temporal dependent data, such as weather observation data, are data whose attribute values depend on both time and space. Typical methods for visualizing such data plot the attribute values at each point in time on a map and display the series of maps either in chronological order as an animation or juxtaposed horizontally or vertically. However, these methods compel readers who want to grasp the spatial changes of the attribute values to memorize the representations on the maps, and the longer the time period covered by the data, the higher the cognitive load. To solve these problems, the authors propose a visualization method that overlays the representations of multiple instantaneous values on a single static map. This paper explains the design of the proposed method and reports two experiments conducted to investigate its usefulness. The experimental results show that the proposed method is useful in terms of the speed and accuracy with which readers can grasp spatial changes, and in its ability to present data with long time series efficiently.
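An illustrative matplotlib sketch of the overlay idea: values from several time steps drawn on one static set of axes, with time encoded by transparency; the station coordinates, values, and visual encodings are toy stand-ins for the paper's design.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
stations = rng.uniform(0, 10, size=(15, 2))           # fake map coordinates
temps = 15 + np.cumsum(rng.normal(size=(6, 15)), 0)   # 6 time steps of values

fig, ax = plt.subplots()
for step in range(temps.shape[0]):
    alpha = 0.25 + 0.75 * step / (temps.shape[0] - 1)  # older steps fainter
    ax.scatter(stations[:, 0], stations[:, 1],
               s=np.clip(temps[step], 1, None) * 8, alpha=alpha,
               color="tab:red",
               label=f"t={step}" if step in (0, temps.shape[0] - 1) else None)
ax.legend(title="overlaid time steps")
ax.set_title("Instantaneous values overlaid on one static map")
plt.show()
```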
Ying TIAN Mingyong ZENG Aihong LU Bin GAO Zhangkai LUO
A novel and efficient coding method is proposed to improve person re-identification in the XQDA subspace. Traditional CRC (Collaborative Representation based Classification) conducts independent dictionary coding for each image and cannot guarantee improved results over the conventional Euclidean distance. In this letter, by contrast, a specific model is constructed separately for each probe image and each gallery image, i.e., in a probe-gallery pairwise manner. The proposed pairwise-specific CRC method can extract extra discriminative information by enforcing a similarity term that pulls similar sample pairs closer. The approach has been evaluated against current methods on two benchmark datasets, achieving considerable improvement and outstanding performance.
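A numpy sketch of the CRC backbone the method extends: ridge-regularized coding has the closed form below, and matching compares reconstruction residuals. The pairwise-specific similarity term would add a further quadratic penalty to the normal equations; its exact form is left to the paper.

```python
import numpy as np

def crc_code(D, y, lam=0.1):
    """Collaborative representation: min_a ||y - D a||^2 + lam ||a||^2,
    with closed form a = (D^T D + lam I)^{-1} D^T y."""
    G = D.T @ D + lam * np.eye(D.shape[1])
    return np.linalg.solve(G, D.T @ y)

def crc_distance(D, y, lam=0.1):
    """Match score for probe y against gallery dictionary D."""
    a = crc_code(D, y, lam)
    return np.linalg.norm(y - D @ a)
```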
Jun WANG Yuanyun WANG Chengzhi DENG Shengqian WANG Yong QIN
Developing a robust appearance model is a challenging task due to appearance variations of objects such as partial occlusion, illumination variation, rotation and background clutter. Existing tracking algorithms employ linear combinations of target templates to represent target appearances, which are not accurate enough to deal with appearance variations, because the underlying relationship between target candidates and target templates is highly nonlinear. To address this, this paper presents a regularized kernel representation for visual tracking. The feature vectors of target appearances are mapped into a higher-dimensional feature space, in which a target candidate is approximately represented by a nonlinear combination of target templates. The kernel-based appearance model captures both the nonlinear relationship and the nonlinear similarity between target candidates and target templates. The l2-regularization on the coding coefficients makes the approximate solution of target representations more stable. Comprehensive experiments demonstrate superior performance in comparison with state-of-the-art trackers.
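A numpy sketch of the l2-regularized kernel coding: with templates mapped by an RBF kernel, minimizing ||phi(y) - Phi a||^2 + lam ||a||^2 gives the closed form a = (K + lam I)^{-1} k, and the feature-space residual scores the candidate. The kernel choice and parameters are illustrative.

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_representation(templates, candidate, lam=0.1, gamma=0.5):
    """Code a candidate over templates in the kernel-induced space."""
    K = rbf(templates, templates, gamma)                 # template Gram matrix
    k = rbf(templates, candidate[None, :], gamma)[:, 0]  # template-candidate
    a = np.linalg.solve(K + lam * np.eye(len(K)), k)
    # squared reconstruction error in feature space
    err = rbf(candidate[None, :], candidate[None, :], gamma)[0, 0] \
        - 2 * a @ k + a @ K @ a
    return a, err
```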
Yanxia QIN Yue ZHANG Min ZHANG Dequan ZHENG
Large-scale first-hand tweets motivate automatic event detection on Twitter. Previous approaches model events by clustering tweets, words or segments. Event clusters represented by tweets are easier to understand than those represented by words/segments; however, tweets are sparser than words/segments, which makes clustering less effective. This article proposes to represent events with triple structures called frames, which are as efficient as, yet easier to understand than, words/segments. Frames are extracted from the shallow syntactic information of tweets with an unsupervised open information extraction method, introduced for domain-independent relation extraction in a single pass over web-scale data. Bursty frame element extraction then functions as feature selection, filtering frame elements with bursty frequency patterns via a probabilistic model. After clustering and ranking, high-quality events are yielded and reported by linking frame elements back to frames. Experimental results show that frame-based event detection improves precision over a state-of-the-art segment-based event detection baseline. Superior readability of frame-based events compared with segment-based events is demonstrated in example outputs.
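An illustrative burstiness filter: under a binomial null model with a background rate p, a frame element is flagged as bursty in a time window when its count exceeds the expectation by several standard deviations. The z-score rule and null model are stand-ins for the paper's probabilistic model.

```python
import math

def is_bursty(count, n_tweets, p_background, z_thresh=3.0):
    """Flag a frame element whose windowed count exceeds the binomial
    expectation n*p by more than z_thresh standard deviations."""
    mu = n_tweets * p_background
    sigma = math.sqrt(n_tweets * p_background * (1 - p_background))
    return sigma > 0 and (count - mu) / sigma > z_thresh

print(is_bursty(count=120, n_tweets=10000, p_background=0.005))  # True
```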
Ikuo KESHI Yu SUZUKI Koichiro YOSHINO Satoshi NAKAMURA
The problem with distributed representations generated by neural networks is that the meaning of the features is difficult to understand. We propose a new method that gives a specific meaning to each node of a hidden layer by introducing a manually created word semantic vector dictionary into the initial weights and by using paragraph vector models. We conducted experiments to test our hypotheses using a single-domain benchmark for Japanese Twitter sentiment analysis, evaluated the expandability of the method using a diverse and large-scale benchmark, and tested the domain independence of the method using a Wikipedia corpus. Our experimental results demonstrate that the learned vector outperforms the existing paragraph vector on the Twitter sentiment analysis task with the single-domain benchmark. We also assessed, in a user test, the readability of document embeddings, i.e., distributed representations of documents, where readability means that people can understand the meaning of the features with large weights in the distributed representations. A total of 52.4% of the top-five weighted hidden nodes were related to the tweets when one of the paragraph vector models learned the document embeddings. For the expandability evaluation, we improved the dictionary based on the results of the hypothesis test and examined the relationship between the readability of learned word vectors and the task accuracy of Twitter sentiment analysis using the diverse and large-scale benchmark; we also conducted a word similarity task on the Wikipedia corpus to test domain independence. The expandability results of the method are better than or comparable to the performance of the paragraph vector, and the objective and subjective evaluations support that each hidden node maintains a specific meaning. Thus, the proposed method succeeds in improving readability.
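A minimal sketch of the weight-seeding idea: hidden nodes tied to dictionary entries are initialized from the manually created word semantic vector dictionary, so each such node starts with a nameable meaning; the toy dictionary and dimensions are illustrative.

```python
import numpy as np

# toy semantic dictionary: concept name -> vector over the input space
semantic_dict = {
    "positive": np.array([0.9, 0.1, 0.0, 0.2]),
    "negative": np.array([0.0, 0.8, 0.7, 0.1]),
}

def seed_hidden_weights(semantic_dict, in_dim, hidden_dim, scale=0.01, seed=0):
    """The first len(semantic_dict) hidden nodes are initialized with the
    dictionary vectors (keeping their concept names); the rest get small
    random weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=scale, size=(hidden_dim, in_dim))
    names = list(semantic_dict)
    for i, name in enumerate(names):
        W[i] = semantic_dict[name]
    return W, names  # names make heavily weighted nodes human-readable

W, node_names = seed_hidden_weights(semantic_dict, in_dim=4, hidden_dim=8)
```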
Shilei CHENG Song GU Maoquan YE Mei XIE
Human action recognition in videos draws huge research interest in computer vision. The Bag-of-Words (BoW) model is commonly used to obtain video-level representations; however, the BoW model roughly assigns each feature vector to its nearest visual word, and the collection of unordered words ignores the spatial information of the interest points, inevitably causing nontrivial quantization errors and limiting improvements in classification rates. To address these drawbacks, we propose an approach for action recognition that encodes spatio-temporal log-Euclidean covariance matrix (ST-LECM) features within a low-rank and sparse representation framework. Motivated by low-rank matrix recovery, local descriptors in a spatio-temporal neighborhood have similar representations and should be approximately low rank, so the learned coefficients can capture the global data structure while preserving local consistency. Experimental results show that the proposed approach yields excellent recognition performance on synthetic video datasets and is robust to action variability, view variations and partial occlusion.
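A numpy sketch of the log-Euclidean covariance step underlying the ST-LECM feature: the covariance of local descriptors from one spatio-temporal neighborhood is mapped by the matrix logarithm (computed via eigendecomposition, since the matrix is SPD) and vectorized; the regularizer eps is illustrative.

```python
import numpy as np

def lecm_feature(descriptors, eps=1e-5):
    """descriptors: (n, d) local descriptors of one ST neighborhood.
    Returns the vectorized upper triangle of logm(cov)."""
    C = np.cov(descriptors, rowvar=False) + eps * np.eye(descriptors.shape[1])
    vals, vecs = np.linalg.eigh(C)           # SPD matrix, eigh is safe
    logC = (vecs * np.log(vals)) @ vecs.T    # matrix logarithm of C
    iu = np.triu_indices(logC.shape[0])
    return logC[iu]

rng = np.random.default_rng(3)
print(lecm_feature(rng.normal(size=(40, 6))).shape)  # 6x6 -> (21,)
```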
Kattiuscia BITENCOURT Frederico ARAÚJO DURÃO Manoel MENDONÇA Lassion LAIQUE BOMFIM DE SOUZA SANTANA
The emergency response process is quite complex, since a wide variety of elements must be evaluated when making decisions, and uncertainties generated by subjectivity and imprecision affect the safety and effectiveness of actions. The aim of this paper is to develop an ontology for emergency response protocols, in particular for fires in buildings. The developed ontology supports knowledge sharing and the evaluation and review of the protocols used, contributing to the tactical and strategic planning of organizations. The construction of the ontology was based on the Methontology methodology. The domain specification and conceptualization were based on qualitative research, in which 131 terms with definitions were evaluated and 85 were approved by specialists. The domain taxonomy and axioms were then created in the Protégé tool. The specialists validated the ontology using a human-assessment approach (taxonomy, application and structure), ensuring a sustainable ontology model for the tactical rescue phase.
Jun WANG Guoqing WANG Leida LI
A quantized index for evaluating the pattern similarity of two different datasets is designed by counting the number of correlated dictionary atoms. Guided by this index, task-specific biometric recognition models transferred from state-of-the-art DNN models are realized for both face and vein recognition.
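An illustrative form of such an index: normalize the atoms of the two dictionaries, count those in the first dictionary having a strongly correlated counterpart in the second, and normalize by dictionary size. The cosine threshold and normalization are assumptions.

```python
import numpy as np

def pattern_similarity(D1, D2, thresh=0.9):
    """Fraction of D1's atoms (columns) that have a strongly correlated
    atom in D2, i.e. |cosine similarity| above thresh."""
    A = D1 / np.linalg.norm(D1, axis=0, keepdims=True)
    B = D2 / np.linalg.norm(D2, axis=0, keepdims=True)
    corr = np.abs(A.T @ B)                  # (k1, k2) atom correlations
    return float(np.mean(corr.max(axis=1) > thresh))
```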
Knowledge graphs have been shown to be useful for many tasks in artificial intelligence. Triples of knowledge graphs are traditionally structured by human editors or extracted from semi-structured information; however, editing is expensive, and semi-structured information is not common. On the other hand, most such information is stored as text, so it is necessary to develop methods that can extract knowledge from text and then construct or populate a knowledge graph; this has been attempted in various ways. Currently, there are two approaches to constructing a knowledge graph: open information extraction (Open IE) and knowledge graph embedding. Neither is without problems: Stanford Open IE, the current best such system, requires labeled sentences as training data, and knowledge graph embedding systems require numerous triples. Recently, distributed representations of words have become a hot topic in natural language processing, since they require only plain text rather than labeled data for training. Mikolov showed that such representations perform well on the word analogy task, answering questions such as "a is to b as c is to __?" This can be considered a knowledge extraction task from text, finding the missing entity of a triple. However, the accuracy is not sufficiently high when the method is applied in a straightforward manner to relations in knowledge graphs, since it uses only one triple as a positive example. In this paper, we analyze why distributed representations perform well on such tasks, and we propose a new method for extracting knowledge from text that requires much less annotated data. Experiments show that the proposed method achieves considerable improvement over the baseline; in particular, the improvement in HITS@10 was more than doubled for some relations.
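The analogy operation referred to, as exposed by gensim's word2vec: the query vector(b) - vector(a) + vector(c) is answered with most_similar. This is exactly the single-positive-example setting the paper argues is too weak for knowledge-graph relations; the corpus below is a toy.

```python
from gensim.models import Word2Vec

# toy corpus; a realistic setting trains on large-scale plain text
sentences = [["paris", "is", "the", "capital", "of", "france"],
             ["tokyo", "is", "the", "capital", "of", "japan"]] * 50
model = Word2Vec(sentences, vector_size=32, min_count=1, epochs=50, seed=0)

# "france is to paris as japan is to __?"
print(model.wv.most_similar(positive=["paris", "japan"],
                            negative=["france"], topn=3))
```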
Viet-Hang DUONG Manh-Quan BUI Jian-Jiun DING Yuan-Shan LEE Bach-Tung PHAM Pham The BAO Jia-Ching WANG
This work presents a new approach that derives a learned data representation method through matrix factorization on the complex domain. In particular, we introduce an encoding matrix, a new representation of the data, that satisfies the simplicial constraint of the projective basis matrix over the field of complex numbers. A complex optimization framework is provided; it employs gradient descent and computes the derivative of the cost function based on Wirtinger's calculus.
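A numpy sketch of the Wirtinger-calculus update for the unconstrained part of such a factorization: for the real-valued cost J = ||X - WH||_F^2 of complex factors, the steepest-descent direction is the conjugate Wirtinger derivative, e.g. dJ/dW* = -(X - WH)H^H. The simplicial constraint on the basis matrix is omitted here.

```python
import numpy as np

def complex_mf(X, k, lr=1e-3, iters=500, seed=0):
    """Factor complex X ≈ W H by Wirtinger gradient descent on
    J = ||X - W H||_F^2 (constraints omitted in this sketch)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.normal(size=(m, k)) + 1j * rng.normal(size=(m, k))
    H = rng.normal(size=(k, n)) + 1j * rng.normal(size=(k, n))
    for _ in range(iters):
        R = X - W @ H                 # residual
        gW = -R @ H.conj().T          # dJ/dW* (conjugate Wirtinger derivative)
        gH = -W.conj().T @ R          # dJ/dH*
        W, H = W - lr * gW, H - lr * gH
    return W, H
```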
Kengo TSUDA Takanori FUJISAWA Masaaki IKEHARA
In this paper, we introduce a new method for removing random-valued impulse noise from an image. Random-valued impulse noise replaces the pixel value at a random position with a random value. Because of the randomness of the noisy pixel values, such pixels are difficult to detect by comparison with neighboring pixels, the strategy used in many conventional methods. We therefore improve a recent noise detector that uses a non-local search for similar structures, and we propose a new noise removal algorithm based on sparse representation with a DCT basis. Furthermore, the sparse representation can remove impulse noise by exploiting neighboring similar image patches. The proposed method achieves far superior noise removal performance compared with conventional methods, and we confirm its effectiveness quantitatively and qualitatively.
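A sketch of the DCT-domain restoration step using scipy: a patch is sparse-coded by hard-thresholding its 2D DCT coefficients, and the reconstruction overwrites only the pixels the detector flagged as noisy. The threshold is illustrative, and the use of a neighboring similar patch from the non-local search is simplified away.

```python
import numpy as np
from scipy.fft import dctn, idctn

def restore_patch(patch, noisy_mask, thresh=20.0):
    """Sparse representation in the DCT basis via hard thresholding;
    only detected noisy pixels are replaced by the reconstruction."""
    coeffs = dctn(patch, norm='ortho')
    coeffs[np.abs(coeffs) < thresh] = 0.0     # keep only large coefficients
    recon = idctn(coeffs, norm='ortho')
    out = patch.copy()
    out[noisy_mask] = recon[noisy_mask]
    return out
```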