When estimation methods are applied, outliers are inevitable. Although several studies have evaluated outlier elimination methods, the extent of the outliers' influence has not been clarified. It is unclear whether we should always be sensitive to outliers, whether outliers should always be removed before estimation, and how much care is required when collecting project data. The goal of this study is therefore to provide a guideline on how sensitively outliers should be handled. In the analysis, we experimentally added outliers to three datasets to analyze their influence. We varied the percentage of outliers, their extent (e.g., at an extent of 100%, an actual effort of 100 person-hours was changed to 200 person-hours), the variables containing outliers (e.g., function points or effort), and the locations of outliers in a dataset. Effort was then estimated from these datasets using multiple linear regression analysis and analogy-based estimation. The experimental results indicate that the influence of outliers on estimation accuracy is non-trivial when the extent or percentage of outliers is considerable (i.e., 100% and 20%, respectively). In contrast, their influence is negligible when the extent and percentage are small (i.e., 50% and 10%, respectively). Moreover, in some cases, linear regression analysis was less affected by outliers than analogy-based estimation.
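The outlier-injection step described in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name `inject_outliers`, the random choice of which records to alter, and the fixed seed are all assumptions made for illustration. The only detail taken from the abstract is the meaning of "extent" (an extent of 100% turns an effort of 100 person-hours into 200).

```python
import random


def inject_outliers(values, percentage, extent, seed=0):
    """Return a copy of `values` with a fraction of entries inflated.

    percentage: fraction of records to alter (0.2 means 20%).
    extent: relative increase (1.0 means 100%, so 100 becomes 200).
    The records to alter are picked at random (an assumption; the
    abstract only says that outlier locations were varied).
    """
    rng = random.Random(seed)
    n_outliers = round(len(values) * percentage)
    chosen = set(rng.sample(range(len(values)), n_outliers))
    return [v * (1 + extent) if i in chosen else v
            for i, v in enumerate(values)]


# Hypothetical data: ten projects, each with 100 person-hours of effort.
effort = [100.0] * 10
noisy = inject_outliers(effort, percentage=0.2, extent=1.0)
```

With these hypothetical inputs, two of the ten effort values are doubled and the rest are unchanged. The same function could be applied to another variable, such as function points, to mirror the abstract's "variables including outliers" setting.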

- Publication: IEICE TRANSACTIONS on Information Vol.E104-D No.1 pp.91-105
- Publication Date: 2021/01/01
- Publicized: 2020/10/02
- Online ISSN: 1745-1361
- DOI: 10.1587/transinf.2020MPP0005
- Type of Manuscript: Special Section PAPER (Special Section on Empirical Software Engineering)

Kenichi ONO

Nara Institute of Science and Technology

Masateru TSUNODA

Kindai University

Akito MONDEN

Okayama University

Kenichi MATSUMOTO

Nara Institute of Science and Technology

The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.

Kenichi ONO, Masateru TSUNODA, Akito MONDEN, Kenichi MATSUMOTO, "Influence of Outliers on Estimation Accuracy of Software Development Effort" in IEICE TRANSACTIONS on Information, vol. E104-D, no. 1, pp. 91-105, January 2021, doi: 10.1587/transinf.2020MPP0005.

URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020MPP0005/_p

@ARTICLE{e104-d_1_91,

author={Kenichi ONO and Masateru TSUNODA and Akito MONDEN and Kenichi MATSUMOTO},

journal={IEICE TRANSACTIONS on Information},

title={Influence of Outliers on Estimation Accuracy of Software Development Effort},

year={2021},

volume={E104-D},

number={1},

pages={91--105},

abstract={When applying estimation methods, the issue of outliers is inevitable. The extent of their influence has not been clarified, though several studies have evaluated outlier elimination methods. It is unclear whether we should always be sensitive to outliers, whether outliers should always be removed before estimation, and what amount of precaution is required for collecting project data. Therefore, the goal of this study is to illustrate a guideline that suggests how sensitively we should handle outliers. In the analysis, we experimentally add outliers to three datasets, to analyze their influence. We modified the percentage of outliers, their extent (e.g., we varied the actual effort from 100 to 200 person-hours when the extent was 100%), the variables including outliers (e.g., adding outliers to function points or effort), and the locations of outliers in a dataset. Next, the effort was estimated using these datasets. We used multiple linear regression analysis and analogy based estimation to estimate the development effort. The experimental results indicate that the influence of outliers on the estimation accuracy is non-trivial when the extent or percentage of outliers is considerable (i.e., 100% and 20%, respectively). In contrast, their influence is negligible when the extent and percentage are small (i.e., 50% and 10%, respectively). Moreover, in some cases, the linear regression analysis was less affected by outliers than analogy based estimation.},

keywords={},

doi={10.1587/transinf.2020MPP0005},

ISSN={1745-1361},

month={January},}

TY - JOUR

TI - Influence of Outliers on Estimation Accuracy of Software Development Effort

T2 - IEICE TRANSACTIONS on Information

SP - 91

EP - 105

AU - Kenichi ONO

AU - Masateru TSUNODA

AU - Akito MONDEN

AU - Kenichi MATSUMOTO

PY - 2021

DO - 10.1587/transinf.2020MPP0005

JO - IEICE TRANSACTIONS on Information

SN - 1745-1361

VL - E104-D

IS - 1

JA - IEICE TRANSACTIONS on Information

Y1 - January 2021

AB - When applying estimation methods, the issue of outliers is inevitable. The extent of their influence has not been clarified, though several studies have evaluated outlier elimination methods. It is unclear whether we should always be sensitive to outliers, whether outliers should always be removed before estimation, and what amount of precaution is required for collecting project data. Therefore, the goal of this study is to illustrate a guideline that suggests how sensitively we should handle outliers. In the analysis, we experimentally add outliers to three datasets, to analyze their influence. We modified the percentage of outliers, their extent (e.g., we varied the actual effort from 100 to 200 person-hours when the extent was 100%), the variables including outliers (e.g., adding outliers to function points or effort), and the locations of outliers in a dataset. Next, the effort was estimated using these datasets. We used multiple linear regression analysis and analogy based estimation to estimate the development effort. The experimental results indicate that the influence of outliers on the estimation accuracy is non-trivial when the extent or percentage of outliers is considerable (i.e., 100% and 20%, respectively). In contrast, their influence is negligible when the extent and percentage are small (i.e., 50% and 10%, respectively). Moreover, in some cases, the linear regression analysis was less affected by outliers than analogy based estimation.

ER -