Truth Discovery of Multi-Source Text Data

Chen CHANG; Jianjun CAO; Qin FENG; Nianfeng WENG; Yuling SHANG

doi:10.1587/transinf.2018EDL8267

Summary:

Most existing truth discovery approaches are designed for structured data and cannot meet the strong need to extract trustworthy information from raw text data, which has unique characteristics such as the multifactorial property of text answers (i.e., an answer may contain multiple key factors) and the diversity of word usage (i.e., different words may have the same semantic meaning). For text answers there is no absolute right or wrong; most answers are partially correct, which is quite different from the setting of traditional truth discovery. To address these challenges, we propose an optimization-based text truth discovery model that jointly groups the keywords extracted from the answers to a specific question into a set of multiple factors. We then select a subset of these factors as the identified truth set for each question using a parallel ant colony synchronization optimization algorithm. After that, the answers to each question can be ranked by the similarity between the factors each answer provides and the identified truth factors. Experimental results on a real dataset show that, although text data structures are complex, our model can still find reliable answers, comparing favorably with retrieval-based and state-of-the-art approaches.
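
The final ranking step described in the summary can be illustrated with a minimal sketch. It assumes that each answer has already been reduced to a set of factor labels and that a truth factor set has already been selected. The summary does not state which similarity measure the authors use, so Jaccard similarity is swapped in here purely as an assumption. The function names and the toy factor labels are hypothetical, and the keyword grouping and ant colony optimization steps are not implemented.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two factor sets (assumed measure, not from the paper)."""
    union = a | b
    # Two empty sets are treated as having similarity 0.0 to avoid dividing by zero.
    return len(a & b) / len(union) if union else 0.0


def rank_answers(
    answer_factors: Dict[str, Set[str]], truth_factors: Set[str]
) -> List[Tuple[str, float]]:
    """Score each answer against the truth factor set and sort by descending score."""
    scored = [
        (answer_id, jaccard(factors, truth_factors))
        for answer_id, factors in answer_factors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: factor labels per answer for a single question.
answers = {
    "a1": {"fever", "cough"},
    "a2": {"fever", "cough", "fatigue"},
    "a3": {"headache"},
}
truth = {"fever", "cough", "fatigue"}

ranking = rank_answers(answers, truth)
for answer_id, score in ranking:
    print(f"{answer_id}: {score:.2f}")
```

Under this assumed measure, a partially correct answer receives a score strictly between 0 and 1, which matches the observation that text answers are rarely entirely right or wrong.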

Publication: IEICE TRANSACTIONS on Information Vol.E102-D No.11 pp.2249-2252

Publication Date: 2019/11/01

Publicized: 2019/08/22

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2018EDL8267

Type of Manuscript: LETTER

Category: Fundamentals of Information Systems

Authors

Chen CHANG
  Army Engineering University of PLA
Jianjun CAO
  National University of Defense Technology
Qin FENG
  Army Engineering University of PLA
Nianfeng WENG
  National University of Defense Technology
Yuling SHANG
  Army Engineering University of PLA

Keywords

truth discovery, ant colony optimization, text mining

Cite this

Chen CHANG, Jianjun CAO, Qin FENG, Nianfeng WENG, Yuling SHANG, "Truth Discovery of Multi-Source Text Data" in IEICE TRANSACTIONS on Information, vol. E102-D, no. 11, pp. 2249-2252, November 2019, doi: 10.1587/transinf.2018EDL8267.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018EDL8267/_p

@ARTICLE{e102-d_11_2249,
author={Chen CHANG and Jianjun CAO and Qin FENG and Nianfeng WENG and Yuling SHANG},
journal={IEICE TRANSACTIONS on Information},
title={Truth Discovery of Multi-Source Text Data},
year={2019},
volume={E102-D},
number={11},
pages={2249--2252},
keywords={truth discovery, ant colony optimization, text mining},
doi={10.1587/transinf.2018EDL8267},
ISSN={1745-1361},
month={November}
}

TY - JOUR
TI - Truth Discovery of Multi-Source Text Data
T2 - IEICE TRANSACTIONS on Information
SP - 2249
EP - 2252
AU - Chen CHANG
AU - Jianjun CAO
AU - Qin FENG
AU - Nianfeng WENG
AU - Yuling SHANG
PY - 2019
DO - 10.1587/transinf.2018EDL8267
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E102-D
IS - 11
JA - IEICE TRANSACTIONS on Information
Y1 - November 2019
ER -