Real-time content analysis is typically a bottleneck in Web filtering. To accelerate the filtering process, this work presents a simple but effective early decision algorithm that analyzes only part of the Web content. The algorithm makes the filtering decision, either to block or to pass the Web content, as soon as it determines with high probability that the content belongs to a banned or an allowed category. Experiments show that the algorithm needs to examine only about one-fourth of the Web content on average, while the accuracy remains fairly good: 89% for the banned content and 93% for the allowed content. The algorithm can complement other Web filtering approaches, such as URL blocking, to filter Web content with high accuracy and efficiency. Text classification algorithms in other applications can also follow the principle of early decision to speed up their processing.
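The early decision principle described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the classifier, so the sketch assumes a naive-Bayes-style accumulation of per-token log-likelihood ratios, and the names `early_decision`, `log_ratio`, and `confidence` are hypothetical. It only shows how scanning can stop once either category is sufficiently probable.

```python
import math


def early_decision(tokens, log_ratio, prior_log_odds=0.0,
                   confidence=0.95, default="pass"):
    """Scan tokens in order and stop as soon as one category is likely enough.

    ASSUMPTION (not from the paper): evidence is combined naive-Bayes style.
    `log_ratio` maps a token to log(P(token | banned) / P(token | allowed)).
    Returns (decision, number_of_tokens_examined).
    """
    # A posterior of at least `confidence` corresponds to this log-odds bound.
    bound = math.log(confidence / (1.0 - confidence))
    log_odds = prior_log_odds
    examined = 0
    for token in tokens:
        examined += 1
        # Tokens without a known ratio contribute no evidence.
        log_odds += log_ratio.get(token, 0.0)
        if log_odds >= bound:
            return "block", examined   # confident the content is banned
        if log_odds <= -bound:
            return "pass", examined    # confident the content is allowed
    # Content exhausted without reaching the bound: decide by the sign.
    if log_odds > 0:
        return "block", examined
    if log_odds < 0:
        return "pass", examined
    return default, examined


# Hypothetical ratios, for illustration only.
example_ratios = {"casino": 2.0, "poker": 2.0, "lecture": -2.0, "homework": -2.0}
decision, examined = early_decision(
    ["casino", "poker", "the", "rules", "of", "the", "game"], example_ratios
)
```

In this example the scan stops after the second token, because the accumulated log-odds already exceed the bound implied by `confidence`; the remaining tokens are never examined, which is where the speed-up would come from.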
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Po-Ching LIN, Ming-Dao LIU, Ying-Dar LIN, Yuan-Cheng LAI, "Accelerating Web Content Filtering by the Early Decision Algorithm" in IEICE TRANSACTIONS on Information,
vol. E91-D, no. 2, pp. 251-257, February 2008, doi: 10.1093/ietisy/e91-d.2.251.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e91-d.2.251/_p
@ARTICLE{e91-d_2_251,
author={Po-Ching LIN and Ming-Dao LIU and Ying-Dar LIN and Yuan-Cheng LAI},
journal={IEICE TRANSACTIONS on Information},
title={Accelerating Web Content Filtering by the Early Decision Algorithm},
year={2008},
volume={E91-D},
number={2},
pages={251--257},
abstract={Real-time content analysis is typically a bottleneck in Web filtering. To accelerate the filtering process, this work presents a simple, but effective early decision algorithm that analyzes only part of the Web content. This algorithm can make the filtering decision, either to block or to pass the Web content, as soon as it is confident with a high probability that the content really belongs to a banned or an allowed category. Experiments show the algorithm needs to examine only around one-fourth of the Web content on average, while the accuracy remains fairly good: 89% for the banned content and 93% for the allowed content. This algorithm can complement other Web filtering approaches, such as URL blocking, to filter the Web content with high accuracy and efficiency. Text classification algorithms in other applications can also follow the principle of early decision to accelerate their applications.},
keywords={},
doi={10.1093/ietisy/e91-d.2.251},
ISSN={1745-1361},
month={February},}
TY - JOUR
TI - Accelerating Web Content Filtering by the Early Decision Algorithm
T2 - IEICE TRANSACTIONS on Information
SP - 251
EP - 257
AU - Po-Ching LIN
AU - Ming-Dao LIU
AU - Ying-Dar LIN
AU - Yuan-Cheng LAI
PY - 2008
DO - 10.1093/ietisy/e91-d.2.251
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E91-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 2008
AB - Real-time content analysis is typically a bottleneck in Web filtering. To accelerate the filtering process, this work presents a simple, but effective early decision algorithm that analyzes only part of the Web content. This algorithm can make the filtering decision, either to block or to pass the Web content, as soon as it is confident with a high probability that the content really belongs to a banned or an allowed category. Experiments show the algorithm needs to examine only around one-fourth of the Web content on average, while the accuracy remains fairly good: 89% for the banned content and 93% for the allowed content. This algorithm can complement other Web filtering approaches, such as URL blocking, to filter the Web content with high accuracy and efficiency. Text classification algorithms in other applications can also follow the principle of early decision to accelerate their applications.
ER -