With the popularity of smart devices, mobile crowdsensing, in which a crowdsensing platform gathers useful data from users of smart devices such as smartphones, has become a prevalent paradigm. Various incentive mechanisms have been adopted by crowdsensing platforms to encourage smart-device users to contribute sensing data. Existing works have concentrated on rewarding users for their short-term effort to provide data, without considering long-term factors or the quality of the data. Our previous work addressed data quality by incorporating the long-term reputation of smart-device users. However, it considered only a quality maximization problem with budget constraints at a single location. In this paper, multiple locations are considered. A Stackelberg game is used to solve a two-stage optimization problem. In the first stage, the crowdsensing platform allocates the budget to different locations and sets prices as incentives for users in order to maximize the total data quality. In the second stage, the users choose how much effort to put into providing data in order to maximize their own utility. Extensive numerical simulations are conducted to evaluate the proposed algorithm.
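The two-stage structure can be illustrated with a toy model solved by backward induction. This is only a sketch under assumptions that are not taken from the paper: one representative user per location, a hypothetical quadratic effort cost `0.5 * cost * effort**2`, and quality proportional to effort through a hypothetical per-location `weight`. Under these assumptions the user's best response is `effort = price / cost`, and the platform's optimal budget share for a location is proportional to `weight**2 / cost`.

```python
import math

def best_effort(price, cost):
    # Stage 2 (follower): the user maximizes price*e - 0.5*cost*e**2,
    # an assumed utility, which gives e = price / cost.
    return price / cost

def platform_plan(budget, weights, costs):
    # Stage 1 (leader): anticipating best_effort, spending b at a location
    # buys quality weight * sqrt(b / cost). Maximizing the sum subject to
    # the total budget gives shares proportional to weight**2 / cost.
    shares = [w * w / c for w, c in zip(weights, costs)]
    total = sum(shares)
    budgets = [budget * s / total for s in shares]
    # The price that exactly spends b when the user best-responds.
    prices = [math.sqrt(b * c) for b, c in zip(budgets, costs)]
    return budgets, prices

def total_quality(prices, weights, costs):
    return sum(w * best_effort(p, c) for p, w, c in zip(prices, weights, costs))

# Hypothetical example: two locations, the second twice as valuable.
weights, costs, budget = [1.0, 2.0], [1.0, 1.0], 100.0
budgets, prices = platform_plan(budget, weights, costs)
equal_prices = [math.sqrt(budget / 2 * c) for c in costs]
```

In this toy setting the planned allocation yields a higher total quality than splitting the budget equally, which is the kind of comparison the first stage is meant to resolve.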
In the current era of data science, data quality has a critical impact on business operations, and meteorological data are no exception. However, conventional methods of meteorological data quality control focus mainly on error detection and null-value detection; that is, they consider only the output data and ignore the quality problems that may arise in the workflow. To rectify this issue, this paper proposes the Total Meteorological Data Quality (TMDQ) framework, based on the Total Quality Management (TQM) perspective, with particular attention to the systematic nature of data warehousing and the need for a process focus. In practical applications, this paper uses the proposed framework as the basis for a system that helps meteorological observers improve and maintain the quality of meteorological data in a timely and efficient manner. To verify the feasibility of the proposed framework and demonstrate its capabilities and usage, it was implemented at the Tamsui Meteorological Observatory (TMO) in Taiwan. The four quality dimension indicators established through the proposed framework help meteorological observers grasp the characteristics of meteorological data from different aspects. The application and research limitations of the proposed framework are discussed, and possible directions for future research are presented.
In many applications, tables are distributed across different data sources, and each data source is updated at a different frequency. Some techniques have been proposed to effectively express the temporal order between different values, so that the most current, i.e. up-to-date, value of a given data item can be easily picked out according to that order. However, the currency of the data items in the same table may differ. That is, when a user asks for a table D, it cannot be ensured that all the most current values of the data items in D are stored in a single table. Since different data sources may overlap, we can construct a conjunctive query on multiple tables to get all the required current values. In this paper, we formalize such a conjunctive query as a currency-preserving query, and study how to generate a minimized currency-preserving query to reduce the cost of visiting different data sources. First, a graph model is proposed to represent the distributed tables and their relationships. Based on the model, we prove that a currency-preserving query is equivalent to a terminal tree in the graph, and give an algorithm to generate a query from a terminal tree. After that, we study the problem of finding a minimized currency-preserving query. The problem is proved to be NP-hard, and some heuristic strategies are provided to solve it. Finally, we conduct experiments on both synthetic and real data sets to verify the effectiveness and efficiency of the proposed techniques.
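The abstract does not define the graph model or the terminal tree, so the following is only a sketch under a stated assumption: tables are nodes, overlaps between tables are undirected edges, and a terminal tree is a tree connecting a given set of required ("terminal") tables, as in the Steiner tree problem. The greedy nearest-terminal heuristic shown here is a generic one and is not claimed to be one of the paper's strategies; the table names are hypothetical.

```python
from collections import deque

def shortest_path_to_any(graph, sources, targets):
    # Multi-source BFS: shortest unweighted path from the current tree
    # (sources) to the closest node in targets, or None if unreachable.
    parent = {s: None for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if node in targets:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

def greedy_terminal_tree(graph, terminals):
    # Grow a tree from the first terminal, repeatedly attaching the
    # closest terminal that is not yet connected.
    terminals = list(terminals)
    tree_nodes = {terminals[0]}
    tree_edges = set()
    remaining = set(terminals[1:])
    while remaining:
        path = shortest_path_to_any(graph, sorted(tree_nodes), remaining)
        if path is None:
            raise ValueError("terminals are not connected")
        for u, v in zip(path, path[1:]):
            tree_edges.add(frozenset((u, v)))
        tree_nodes.update(path)
        remaining -= tree_nodes
    return tree_nodes, tree_edges

# Hypothetical overlap graph between five tables.
graph = {
    "T1": ["T2", "T4"], "T2": ["T1", "T3"], "T3": ["T2", "T5"],
    "T4": ["T1", "T5"], "T5": ["T4", "T3"],
}
nodes, edges = greedy_terminal_tree(graph, ["T1", "T3"])
```

A smaller tree means fewer tables to visit, which is the cost the minimization targets; because the exact problem is NP-hard, a greedy construction like this one gives no optimality guarantee.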
Mohan LI Jianzhong LI Siyao CHENG Yanbin SUN
Currency is one of the important measurements of data quality. The main purpose of the study on data currency is to determine whether a given data item is up-to-date. Though there are already several works on determining data currency, all the proposed methods have limitations. Some works require timestamps of data items that are not always available, and others are based on certain currency rules that can only decide relevant currency and cannot express uncertain semantics. To overcome the limitations of the previous methods, this paper introduces a new approach for determining data currency based on uncertain currency rules. First, a class of uncertain currency rules is provided to infer the possible valid time for a given data item, and then based on the rules, data currency is formally defined. After that, a polynomial time algorithm for evaluating data currency is given based on the uncertain currency rules. Using real-life data sets, the effectiveness and efficiency of the proposed method are experimentally verified.
Md-Mizanur RAHOMAN Ryutaro ICHISE
These days, the Web contains a huge volume of (semi-)structured data, called Linked Data (LD). However, LD suffers from poor data quality, which creates the need to identify erroneous data. Because checking for erroneous data manually is impractical, automatic detection is necessary. According to the data publishing guidelines of LD, data should use an (already defined) ontology, which populates type-annotated LD. Usually, the data type annotation helps in understanding the data. However, we observe that the data type annotation can also be used to identify erroneous data. Therefore, to automatically identify possibly erroneous data in type-annotated LD, we propose a framework that uses a novel nearest-neighbor-based error detection technique. We conduct experiments with our framework on DBpedia, a type-annotated LD dataset, and find that it achieves better error detection performance than a state-of-the-art framework.
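The general idea of using type annotations with nearest neighbors can be sketched as follows. This is a generic k-nearest-neighbor vote and not the framework proposed in the paper: it assumes each entity is described by its declared type and the set of properties it uses, measures similarity with the Jaccard index, and flags an entity when the majority type among its nearest neighbors differs from its own. All entity and property names are hypothetical.

```python
from collections import Counter

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def flag_suspect_types(entities, k=3):
    # entities maps a name to (declared_type, frozenset_of_properties).
    suspects = []
    for name, (etype, props) in entities.items():
        scored = [(jaccard(props, p), t)
                  for other, (t, p) in entities.items() if other != name]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        votes = Counter(t for _, t in scored[:k])
        if not votes:
            continue  # nothing to compare against
        majority_type, _ = votes.most_common(1)[0]
        if majority_type != etype:
            suspects.append(name)
    return suspects

person = frozenset({"name", "birthDate", "birthPlace"})
city = frozenset({"name", "population", "country"})
entities = {
    "p1": ("Person", person), "p2": ("Person", person), "p3": ("Person", person),
    "c1": ("City", city), "c2": ("City", city), "c3": ("City", city),
    # Declared as a City but described like a Person.
    "x": ("City", person),
}
suspects = flag_suspect_types(entities)
```

Here only `x` is flagged, because its closest neighbors by property overlap are all typed `Person` while it is annotated as `City`.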
Jinling ZHOU Xingchun DIAO Jianjun CAO Zhisong PAN
Compared to the traditional functional dependency (FD), the extended conditional functional dependency (CFD) has shown greater potential for detecting and repairing inconsistent data. CFDMiner is a widely used algorithm for mining constant CFDs. However, the search space of CFDMiner is very large, and there is still room to improve its efficiency. In this paper, an efficient pruning strategy is proposed to optimize the algorithm by reducing the search space. Both theoretical analysis and experiments confirm that the optimized algorithm produces results consistent with those of the original CFDMiner.
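To make the notion of a constant CFD concrete, the sketch below checks a table against one such rule. It illustrates only how a constant CFD detects inconsistent tuples; it is not CFDMiner and does not implement the pruning strategy. The attribute names and values are hypothetical.

```python
def find_violations(rows, lhs, rhs):
    # A constant CFD says: every row matching all constants in lhs
    # must carry the constant value given in rhs.
    # lhs is a dict of attribute -> constant; rhs is (attribute, constant).
    attr, value = rhs
    return [
        i for i, row in enumerate(rows)
        if all(row.get(a) == v for a, v in lhs.items()) and row.get(attr) != value
    ]

rows = [
    {"country": "44", "area_code": "131", "city": "EDI"},
    {"country": "44", "area_code": "131", "city": "EDI"},
    {"country": "44", "area_code": "131", "city": "NYC"},  # inconsistent
    {"country": "01", "area_code": "131", "city": "NYC"},  # rule does not apply
]
# Hypothetical rule: country = 44 and area_code = 131 imply city = EDI.
violations = find_violations(rows, {"country": "44", "area_code": "131"}, ("city", "EDI"))
```

Unlike a plain FD, the rule constrains only the rows that match its constants, so the last row is left alone even though it shares the area code.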