
An Algorithmic Approach to Missing Data Problem in Modeling Human Aspects in Software Development

Calikli, Gul and Bener, Ayse (2013). An Algorithmic Approach to Missing Data Problem in Modeling Human Aspects in Software Development. In: PROMISE '13: 9th International Conference on Predictive Models in Software Engineering, ACM, New York, USA, article no. 10.


Background: In our previous research, we built defect prediction models using confirmation bias metrics. Due to confirmation bias, developers tend to perform unit tests that make their programs run rather than tests that break their code. This, in turn, leads to an increase in defect density. The prediction model built using confirmation bias metrics performed as well as the models built with static code or churn metrics.

Aims: Collecting confirmation bias metrics may yield partially "missing data" because of developers' tight schedules, evaluation apprehension, and lack of motivation, as well as staff turnover. In this paper, we employ the Expectation-Maximization (EM) algorithm to impute missing confirmation bias data.
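
To make the idea of EM-based imputation concrete, here is a minimal sketch. It is not the authors' implementation, and the abstract does not give the algorithm's details. The sketch assumes a one-component, PCA-style EM loop: the E-step estimates a latent score for each record from its observed values and reconstructs the missing ones, and the M-step refits the component on the completed data. The function name `em_pca_impute`, the use of `None` for missing values, and the default iteration settings are all hypothetical choices made for illustration.

```python
def em_pca_impute(rows, n_iter=500, tol=1e-9):
    """Fill None entries with a rank-1, PCA-style EM loop (illustrative only).

    rows: list of equal-length lists of floats; None marks a missing value.
    Assumes every column has at least one observed value.
    Returns a new table; the input is left unchanged.
    """
    n, d = len(rows), len(rows[0])
    missing = [[v is None for v in r] for r in rows]

    # Start by filling each missing entry with its column's observed mean.
    col_mean = []
    for j in range(d):
        obs = [r[j] for r in rows if r[j] is not None]
        col_mean.append(sum(obs) / len(obs))
    x = [[col_mean[j] if missing[i][j] else rows[i][j] for j in range(d)]
         for i in range(n)]
    w = [1.0] * d  # arbitrary starting direction for the single component

    for _ in range(n_iter):
        mu = [sum(x[i][j] for i in range(n)) / n for j in range(d)]

        # E-step: latent score per record from its observed coordinates,
        # then reconstruct the missing coordinates from that score.
        z, change = [], 0.0
        for i in range(n):
            obs = [j for j in range(d) if not missing[i][j]]
            denom = sum(w[j] ** 2 for j in obs)
            zi = 0.0
            if denom > 0:
                zi = sum(w[j] * (x[i][j] - mu[j]) for j in obs) / denom
            z.append(zi)
            for j in range(d):
                if missing[i][j]:
                    new = mu[j] + w[j] * zi
                    change = max(change, abs(new - x[i][j]))
                    x[i][j] = new

        # M-step: refit the loading vector on the completed data.
        zz = sum(zi * zi for zi in z)
        if zz == 0:
            break
        w = [sum(z[i] * (x[i][j] - mu[j]) for i in range(n)) / zz
             for j in range(d)]
        if change < tol:
            break
    return x


# Toy usage: the observed records lie on the line y = 2x, so the missing
# value in the last record is pulled towards that line.
table = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, None]]
completed = em_pca_impute(table)
```

A single component keeps the sketch short; a practical imputation would likely use more components and a noise model, and would be validated against held-out values.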

Method: We used four datasets from two large-scale companies. For each dataset, we generated all possible missing data configurations and then employed Roweis' EM algorithm to impute the missing data. We built defect prediction models using the imputed data and compared their performance with that of models built using complete data.
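
The enumeration step can be sketched as follows. The abstract does not define a "missing data configuration", so this sketch assumes one plausible reading: a configuration is a non-empty, proper subset of records (hypothetically, one record per developer) whose metric values are all missing. The function names `missing_configurations` and `mask_records` are illustrative and do not come from the paper.

```python
from itertools import combinations


def missing_configurations(n_records):
    """Yield (subset, percent_missing) for every non-empty proper subset.

    Assumption: a configuration is the set of record indices whose values
    are missing; the empty set and the full set are skipped.
    """
    for k in range(1, n_records):
        for subset in combinations(range(n_records), k):
            yield subset, 100.0 * k / n_records


def mask_records(rows, subset):
    """Return a copy of rows with the records in subset replaced by None values."""
    hidden = set(subset)
    return [[None] * len(r) if i in hidden else list(r)
            for i, r in enumerate(rows)]


# Toy usage with three records: 2**3 - 2 = 6 configurations in total.
records = [[0.2, 0.7], [0.5, 0.1], [0.9, 0.4]]
for subset, percent in missing_configurations(len(records)):
    masked = mask_records(records, subset)
    # Each masked table would then be imputed and fed to a prediction model.
```

The number of configurations grows as 2^n - 2 under this reading, so exhaustive enumeration is only practical for small numbers of records.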

Results: In all datasets, when the percentage of missing data was less than or equal to 50% on average, our proposed model built on imputed data yielded performance comparable to that of the models built on complete data.

Conclusions: We may encounter the "missing data" problem when building defect prediction models. Our results show that, instead of discarding missing or noisy data (in our case, confirmation bias metrics), we can use effective techniques such as EM-based imputation to overcome this problem.

Item Type: Conference or Workshop Item
Copyright Holders: 2013 ACM
ISBN: 1-4503-2016-3, 978-1-4503-2016-0
Project Funding Details:
Funded Project Name | Project ID | Funding Body
Discovery Grant | 402003-2012 | NSERC (Natural Sciences and Engineering Research Council of Canada)
Keywords: Algorithms; Human Factors; Measurement
Academic Unit/School: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Computing and Communications
Faculty of Science, Technology, Engineering and Mathematics (STEM)
Item ID: 45347
Depositing User: Gul Calikli
Date Deposited: 18 Feb 2016 13:36
Last Modified: 09 May 2019 10:25