Addressing problems with replicability and validity of repository mining studies through a smart data platform

2017 | journal article. A publication with affiliation to the University of Göttingen.

Cite this publication

Addressing problems with replicability and validity of repository mining studies through a smart data platform
Trautsch, F.; Herbold, S.; Makedonski, P. & Grabowski, J. (2017)
Empirical Software Engineering 23(2), pp. 1036-1083. DOI: https://doi.org/10.1007/s10664-017-9537-x

License

GRO License

Details

Authors
Trautsch, Fabian; Herbold, Steffen; Makedonski, Philip; Grabowski, Jens
Abstract
The use of empirical methods has become common in software engineering. This trend has spawned hundreds of publications, whose results help to understand and improve the software development process. Due to the data-driven nature of this avenue of investigation, we identified several problems within the current state of the art that pose a threat to the replicability and validity of approaches. The heavy re-use of data sets in many studies may invalidate the results if problems with the data itself are identified. Moreover, for many studies the data and/or the implementations are not available, which hinders replication of the results and thereby decreases the comparability between studies. Furthermore, many studies use small data sets that comprise fewer than 10 projects. This poses a threat especially to the external validity of these studies. Even if all information about the studies is available, the diversity of the tooling used can still make their replication very hard. Within this paper, we discuss a potential solution to these problems through a cloud-based platform that integrates data collection and analytics. We created SmartSHARK, which implements our approach. Using SmartSHARK, we collected data from several projects and created different analytic examples. Within this article, we present SmartSHARK and discuss our experiences regarding its use and the problems mentioned above. Additionally, we show how we have addressed the issues that we identified during our work with SmartSHARK.
Issue Date
2017
Journal
Empirical Software Engineering 
ISSN
1382-3256; 1573-7616
Language
English
