Conditional random fields (CRFs) have been successfully applied to a wide range of tasks that involve predicting and labeling structured data, such as natural language tagging and parsing, image segmentation and object recognition, and protein secondary structure prediction. The key advantages of CRFs are their ability to encode a variety of overlapping, non-independent features from empirical data and their support for global normalization and optimization. However, estimating parameters for CRFs is very time-consuming because of the intensive forward-backward computation needed to evaluate the likelihood function and its gradient during training. This paper presents a high-performance training method for CRFs on massively parallel processing systems that allows us to handle huge datasets with hundreds of thousands of data sequences and millions of features. We performed experiments on an important natural language processing task (text chunking) over large-scale corpora and achieved significant results in terms of both reduced training time and improved prediction accuracy.
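For context, the likelihood and gradient mentioned above take the following form in the standard linear-chain CRF model (a generic sketch following Lafferty et al.'s formulation, not notation reproduced from the paper itself). For an observation sequence $\mathbf{x} = (x_1, \dots, x_T)$, a label sequence $\mathbf{y} = (y_1, \dots, y_T)$, feature functions $f_k$, and weights $\theta_k$:

$$
p_\theta(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \theta_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'} \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \theta_k \, f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \Big).
$$

Training maximizes the log-likelihood of the training sequences, whose gradient is the difference between empirical and model-expected feature counts:

$$
\mathcal{L}(\theta) = \sum_{i} \log p_\theta(\mathbf{y}^{(i)} \mid \mathbf{x}^{(i)}),
\qquad
\frac{\partial \mathcal{L}}{\partial \theta_k}
= \sum_{i} \Big( \sum_{t} f_k(y_{t-1}^{(i)}, y_t^{(i)}, \mathbf{x}^{(i)}, t)
- \mathbb{E}_{p_\theta(\mathbf{y} \mid \mathbf{x}^{(i)})} \Big[ \sum_{t} f_k(y_{t-1}, y_t, \mathbf{x}^{(i)}, t) \Big] \Big).
$$

Here $Z(\mathbf{x})$ and the expectations are computed by the forward-backward algorithm at $O(T\,|\mathcal{Y}|^2)$ cost per sequence in every training iteration, which is the bottleneck the paper addresses. Because both sums decompose into independent per-sequence terms, the training data can be partitioned across processors and the partial gradients summed, which is what makes data-parallel training of CRFs natural.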
Xuan-Hieu PHAN, Le-Minh NGUYEN, Yasushi INOGUCHI, Susumu HORIGUCHI, "High-Performance Training of Conditional Random Fields for Large-Scale Applications of Labeling Sequence Data" in IEICE TRANSACTIONS on Information and Systems,
vol. E90-D, no. 1, pp. 13-21, January 2007.
URL: https://global.ieice.org/en_transactions/information/10.1587/e90-d_1_13/_p
@ARTICLE{e90-d_1_13,
author={Xuan-Hieu PHAN and Le-Minh NGUYEN and Yasushi INOGUCHI and Susumu HORIGUCHI},
journal={IEICE TRANSACTIONS on Information and Systems},
title={High-Performance Training of Conditional Random Fields for Large-Scale Applications of Labeling Sequence Data},
year={2007},
volume={E90-D},
number={1},
pages={13-21},
ISSN={1745-1361},
month={January},
}
TY - JOUR
TI - High-Performance Training of Conditional Random Fields for Large-Scale Applications of Labeling Sequence Data
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 13
EP - 21
AU - Xuan-Hieu PHAN
AU - Le-Minh NGUYEN
AU - Yasushi INOGUCHI
AU - Susumu HORIGUCHI
PY - 2007
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E90-D
IS - 1
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - 2007/01//
ER -