Xibin WANG Fengji LUO Chunyan SANG Jun ZENG Sachio HIROKAWA
With the rapid development of information and Web technologies, people are facing ‘information overload’ in their daily lives. The personalized recommendation system (PRS) is an effective tool for helping users extract meaningful information from big data. Collaborative filtering (CF) is one of the most widely used techniques for recommending personalized products to users. However, the conventional CF technique has some limitations, such as the low accuracy of similarity calculation and the cold start problem. In this paper, a PRS model based on the Support Vector Machine (SVM) is proposed. The proposed model considers not only the items' content information but also the users' demographic and behavior information, in order to fully capture the users' interests and preferences. An improved Particle Swarm Optimization (PSO) algorithm is also proposed to improve the performance of the model. The efficiency of the proposed method is verified on multiple benchmark datasets.
Wanghan LV Lihong HU Weijun ZENG Huali WANG Zhangkai LUO
The L-shaped co-prime array (LCA) is a recently introduced two-dimensional (2-D) sparse array structure, which is extended from the linear co-prime array (CA). Such a sparse array geometry can be used for 2-D parameter estimation with higher degrees-of-freedom (DOF). However, in the scenario where several narrowband transmissions are spread over a wide spectrum, existing techniques based on LCA with Nyquist sampling may encounter a bottleneck in both analog and digital processing. To alleviate the burden of high-rate Nyquist sampling, this work presents a method of joint wideband spectrum and direction-of-arrival (DOA) estimation with compressed sampling based on LCA, referred to as the LCA-based modulated wideband converter (MWC). First, the received signal at each antenna is mixed to baseband, low-pass filtered, and down-sampled to obtain the compressed sampling data. Then, by constructing the virtual received data of the 2-D difference coarray, we estimate the wideband spectrum and DOA jointly using two recovery methods: a joint ESPRIT method and a joint CS method. Numerical simulations illustrate the validity of the proposed LCA-based MWC system and show its superiority.
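As a rough illustration of why a co-prime geometry yields more degrees of freedom than it has physical sensors, the sketch below builds one linear co-prime subarray and its difference coarray. It is a minimal sketch, not the LCA-based MWC system itself: the function names, the choice of the pair (3, 5), and the convention of measuring positions in units of the basic inter-element spacing are all assumptions made here for illustration. An L-shaped array would place one such subarray along each axis.

```python
from math import gcd


def coprime_array_positions(m, n):
    """Sensor positions of a linear co-prime array (illustrative convention).

    One subarray has n sensors spaced m units apart, the other has m sensors
    spaced n units apart; the two share the sensor at position 0.
    """
    if gcd(m, n) != 1:
        raise ValueError("m and n must be co-prime")
    sub1 = {m * k for k in range(n)}
    sub2 = {n * k for k in range(m)}
    return sorted(sub1 | sub2)


def difference_coarray(positions):
    """All distinct pairwise position differences (virtual sensor lags)."""
    return sorted({p - q for p in positions for q in positions})


positions = coprime_array_positions(3, 5)
lags = difference_coarray(positions)
print(len(positions), "physical sensors ->", len(lags), "distinct lags")
```

With this convention, seven physical sensors produce a virtual array with many more distinct lags, which is the property that coarray-based estimators exploit.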
Zhuo JIANG Junhao WEN Jun ZENG Yihao ZHANG Xibin WANG Sachio HIROKAWA
The success of heuristic search in AI planning largely depends on the design of the heuristic. On the other hand, previous experience contains potential domain information that can assist the planning process. In this context, we have studied dynamic macro-based heuristic planning through action relationship analysis. We present an approach for analyzing the action relationship and design an algorithm that learns macros from solved cases. We then propose a dynamic macro-based heuristic that appropriately reuses the macros rather than immediately assigning them to domains. These ideas are incorporated into a working planning system called the Dynamic Macro-based Fast Forward planner. Finally, we evaluate our method in a series of experiments. Our method effectively optimizes planning: it reduces the result length by an average of 10% relative to FF, in a time-economic manner. The efficiency is especially improved when invoking an action consumes time.
Jun ZENG Brendan FLANAGAN Sachio HIROKAWA Eisuke ITO
Web page segmentation has a variety of benefits and potential web applications. Early techniques of web page segmentation are mainly based on machine learning algorithms and rule-based heuristics, which cannot be used for large-scale page segmentation. In this paper, we propose a formulated page segmentation method using visual semantics. Instead of analyzing the visual cues of web pages, this method utilizes three measures to formulate the visual semantics: the layout tree is used to recognize visually similar blocks; the seam degree is used to describe how neatly the blocks are arranged; and the content similarity is used to describe the degree of content coherence between blocks. A comparison experiment was conducted using the VIPS algorithm as a baseline. Experimental results show that the proposed method can divide a Web page into appropriate semantic segments.
Jun ZENG Feng LI Brendan FLANAGAN Sachio HIROKAWA
Content extraction from deep Web pages has received great attention in recent years. However, the increasingly complicated HTML structure of Web documents makes it more difficult to recognize the data records by only analyzing the HTML source code. In this paper, we propose a method named LTDE to extract data records from a deep Web page. Instead of analyzing the HTML source code, LTDE utilizes the visual features of data records in deep Web pages. A Web page is considered as a finite set of visual blocks. The data records are the visual blocks that have similar layouts. We also propose a pattern recognition method named layout tree to cluster the visual blocks with similar layouts. The weight of each cluster is calculated, and the visual blocks in the cluster with the highest weight are chosen as the data records to be extracted. Experimental results show that LTDE is more effective and more robust for Web data extraction than previous works.
Weijun ZENG Huali WANG Hui TIAN
In this letter, a new scheme for multirate coprime sampling and reconstruction of sparse multiband signals with very high carrier frequencies is proposed, where the locations of the signal bands are not known a priori. Simulation results show that the new scheme can simultaneously reduce both the number of sampling channels and the sampling rate for perfect reconstruction, compared to existing schemes that require a large number of sampling channels or a high sampling rate.
Weijun ZENG Huali WANG Xiaofu WU Hui TIAN
In this paper, we propose a compressed sensing scheme using sparse-graph codes and a peeling decoder (SGPD). By using a mix method for construction of sensing matrices proposed by Pawar and Ramchandran, it generates local sensing matrices and implements sensing and signal recovery in an adaptive manner. Then, we show how to optimize the construction of local sensing matrices using the theory of sparse-graph codes. Like the existing compressed sensing schemes based on sparse-graph codes with a “good” degree profile, SGPD requires only O(k) measurements to recover a k-sparse signal of dimension n in the noiseless setting. In the presence of noise, SGPD performs better than the existing compressed sensing schemes based on sparse-graph codes, while retaining a similar implementation cost. Furthermore, the average variable node degree of the sensing matrices is empirically the lowest for SGPD among various existing CS schemes, which can reduce the computational complexity of sensing.
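To make the idea of peeling on a sparse graph concrete, the following is a minimal sketch of a generic noiseless peeling decoder, not the SGPD scheme itself. It assumes a hypothetical measurement design in which each check node records two numbers: the sum of its connected signal entries and the same sum weighted by (index + 1). A check touching exactly one nonzero entry (a singleton) then reveals that entry's index through the ratio of the two numbers. The names `measure`, `peel_decode`, and the toy graph are assumptions made for illustration, and exact rational arithmetic is used so the singleton test is not affected by rounding.

```python
from fractions import Fraction


def measure(x, checks):
    """Two measurements per check node: plain sum and index-weighted sum."""
    y0 = [sum(Fraction(x[i]) for i in c) for c in checks]
    y1 = [sum(Fraction(x[i]) * (i + 1) for i in c) for c in checks]
    return y0, y1


def peel_decode(n, checks, y0, y1):
    """Recover a sparse signal by repeatedly resolving singleton checks."""
    y0, y1 = list(y0), list(y1)
    x_hat = [Fraction(0)] * n
    progress = True
    while progress:
        progress = False
        for c, members in enumerate(checks):
            if y0[c] == 0:
                continue  # empty (or not resolvable by this simple test)
            ratio = y1[c] / y0[c]
            if ratio.denominator != 1:
                continue  # more than one unknown still contributes
            i = int(ratio) - 1
            if i not in members:
                continue
            value = y0[c]
            x_hat[i] += value
            # Peel the recovered entry off every check it participates in.
            for d, other in enumerate(checks):
                if i in other:
                    y0[d] -= value
                    y1[d] -= value * (i + 1)
            progress = True
    return x_hat


x = [0, 5, 0, 0, -2, 0]                   # 2-sparse signal of dimension 6
checks = [{0, 1, 2}, {1, 4}, {3, 4, 5}]   # toy bipartite graph
y0, y1 = measure(x, checks)
recovered = peel_decode(len(x), checks, y0, y1)
print(recovered == x)
```

In this toy graph the middle check initially sees two nonzero entries and cannot be resolved; it becomes a singleton only after one of its neighbours has been peeled, which is the chain reaction that sparse-graph constructions are designed to sustain.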
Wentao LI Min GAO Hua LI Jun ZENG Qingyu XIONG Sachio HIROKAWA
Collaborative filtering (CF) has been widely used in recommender systems to generate personalized recommendations. However, recommender systems using CF are vulnerable to shilling attacks, in which attackers inject fake profiles to manipulate recommendation results. Thus, shilling attacks pose a threat to the credibility of recommender systems. Previous studies mainly derive features from the characteristics of item ratings in user profiles to detect attackers, but these methods suffer from low accuracy when attackers adopt new rating patterns. To overcome this drawback, we derive features from the properties of item popularity in user profiles, which are determined by users' different selecting patterns. This feature extraction method is based on the prior knowledge that attackers select items to rate according to man-made rules, while normal users do so according to their inner preferences. Machine learning classification approaches are then used with these features to detect and remove attackers. Experimental results on the MovieLens dataset and the Amazon review dataset show that our proposed method improves detection performance. In addition, the results justify the practical value of features derived from selecting patterns.
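One simple way to picture popularity-based features is to count how many users rated each item and then summarize those counts over each user's profile. The sketch below does exactly that; it is only an illustration under stated assumptions. The abstract does not list the actual features, so the feature names (`mean_pop`, `min_pop`, `max_pop`), the definition of popularity as a rating count, and the toy profiles are all hypothetical.

```python
from collections import Counter
from statistics import mean


def item_popularity(profiles):
    """Popularity of an item = number of users who rated it (assumption)."""
    return Counter(item for items in profiles.values() for item in items)


def popularity_features(profiles):
    """Per-user summary statistics of the popularity of the items they rated."""
    pop = item_popularity(profiles)
    features = {}
    for user, items in profiles.items():
        if not items:
            continue  # no selections, so no selecting pattern to describe
        values = [pop[i] for i in items]
        features[user] = {
            "mean_pop": mean(values),
            "min_pop": min(values),
            "max_pop": max(values),
        }
    return features


# Toy data: user -> set of rated item ids (hypothetical).
profiles = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"a", "d"},
    "bot": {"x", "y", "a"},
}
features = popularity_features(profiles)
print(features["u1"], features["bot"])
```

Vectors like these could then be passed to any off-the-shelf classifier; which classifier and which statistics work best is an empirical question that this sketch does not address.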
Huan HAO Huali WANG Weijun ZENG Hui TIAN
This paper presents a novel MEMD interval thresholding denoising method, in which relevant modes are selected using a similarity measure between the probability density function of the input and that of each mode. Results on simulated and measured EEG data show that the proposed scheme achieves better performance than traditional denoising methods.
Jie ZOU Ling XU Mengning YANG Xiaohong ZHANG Jun ZENG Sachio HIROKAWA
Bug reports expressed in natural language text are usually numerous, ambiguous, and poorly written, which makes duplicate bug report detection challenging. Current automatic duplicate bug report detection techniques have mainly focused on textual information and ignored some useful factors. To improve the detection accuracy, in this paper we propose a new approach called the LNG (LDA and N-gram) model, which takes advantage of the topic model LDA and the word-based N-gram model. The LNG model considers multiple factors that potentially affect the detection accuracy, including textual information, semantic correlation, word order, contextual connections, and categorial information. In addition, the N-gram model adopted in LNG is improved by modifying its similarity algorithm. The experiment is conducted on more than 230,000 real bug reports from the Eclipse project. In the evaluation, we propose a new evaluation metric, the exact-accuracy (EA) rate, which can be used to better understand the performance of duplicate detection. The evaluation results show that the recall rate, precision rate, and EA rate of the proposed method are all higher than those obtained when the two models are used separately. Also, the recall rate is improved by 2.96%-10.53% compared to the state-of-the-art approach DBTM.
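For readers unfamiliar with word-based N-gram matching, the sketch below compares two short bug report summaries by the overlap of their word bigrams. It is a plain Jaccard similarity chosen here for illustration; the modified similarity algorithm used in the LNG model is not described in the abstract and is not reproduced. The function names, the tokenization rule, and the sample reports are assumptions.

```python
import re


def word_ngrams(text, n=2):
    """Set of word n-grams after lowercasing and simple tokenization."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_similarity(a, b, n=2):
    """Jaccard overlap of the two n-gram sets (illustrative choice)."""
    ga, gb = word_ngrams(a, n), word_ngrams(b, n)
    if not ga and not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


r1 = "Editor crashes when opening large file"
r2 = "Editor crashes when saving large file"
sim = ngram_similarity(r1, r2)
print(round(sim, 3))
```

Because n-grams keep adjacent words together, this kind of score is sensitive to word order in a way that a bag-of-words or topic-level comparison is not, which is the complementary signal the abstract attributes to the N-gram component.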