Differential testing for machine learning: an analysis for classification algorithms beyond deep learning

Herbold, Steffen; Tunkel, Steffen

2023 | journal article. A publication with affiliation to the University of Göttingen.

Cite this publication

Differential testing for machine learning: an analysis for classification algorithms beyond deep learning
Herbold, S. & Tunkel, S. (2023)
Empirical Software Engineering, 28(2). DOI: https://doi.org/10.1007/s10664-022-10273-9

Documents & Media

document.pdf (2.76 MB, Adobe PDF)

License

GRO License

Details

Authors: Herbold, Steffen; Tunkel, Steffen
Abstract: Differential testing is a useful approach that uses different implementations of the same algorithms and compares the results for software testing. In recent years, this approach was successfully used for test campaigns of deep learning frameworks. There is little knowledge about the application of differential testing beyond deep learning. Within this article, we want to close this gap for classification algorithms. We conduct a case study using Scikit-learn, Weka, Spark MLlib, and Caret in which we identify the potential of differential testing by considering which algorithms are available in multiple frameworks, the feasibility by identifying pairs of algorithms that should exhibit the same behavior, and the effectiveness by executing tests for the identified pairs and analyzing the deviations. While we found a large potential for popular algorithms, the feasibility seems limited because, often, it is not possible to determine configurations that are the same in other frameworks. The execution of the feasible tests revealed that there is a large number of deviations for the scores and classes. Only a lenient approach based on statistical significance of classes does not lead to a huge amount of test failures. The potential of differential testing beyond deep learning seems limited for research into the quality of machine learning libraries. Practitioners may still use the approach if they have deep knowledge about implementations, especially if a coarse oracle that only considers significant differences of classes is sufficient.
Issue Date: 2023
Journal: Empirical Software Engineering
ISSN: 1382-3256
eISSN: 1573-7616
Language: English
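
Illustration (not part of the publication record): the abstract distinguishes strict oracles on scores and classes from a lenient oracle based on the statistical significance of class differences. The following is a minimal sketch of that idea under stated assumptions. The two scorers `score_textbook` and `score_tanh` are hypothetical stand-ins for two framework implementations of the same classifier, the tolerance and significance level are arbitrary, and the chi-squared test on predicted class counts is one plausible choice of test, since the abstract does not name the one the authors used.

```python
import math


def score_textbook(z):
    """Implementation A (hypothetical): logistic score as 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def score_tanh(z):
    """Implementation B (hypothetical): the same function written with tanh."""
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def to_class(score, threshold=0.5):
    """Map a score to a binary class label."""
    return 1 if score >= threshold else 0


def scores_match(scores_a, scores_b, tol=1e-9):
    """Strict oracle on scores: every pair agrees within an absolute tolerance."""
    return all(abs(a - b) <= tol for a, b in zip(scores_a, scores_b))


def classes_match(classes_a, classes_b):
    """Strict oracle on classes: every predicted label is identical."""
    return list(classes_a) == list(classes_b)


def class_counts_p_value(classes_a, classes_b):
    """p-value of a chi-squared test on the 2x2 table of predicted class counts.

    Assumption: binary labels 0/1 given as lists. The test compares how often
    each implementation predicts each class and ignores which instance got
    which label.
    """
    a, b = classes_a.count(0), classes_a.count(1)
    c, d = classes_b.count(0), classes_b.count(1)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # An empty row or column: the counts give no evidence of a difference.
        return 1.0
    stat = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))


def differential_test(inputs, impl_a, impl_b, tol=1e-9, alpha=0.05):
    """Run both implementations on the same inputs and apply all three oracles."""
    scores_a = [impl_a(z) for z in inputs]
    scores_b = [impl_b(z) for z in inputs]
    classes_a = [to_class(s) for s in scores_a]
    classes_b = [to_class(s) for s in scores_b]
    p_value = class_counts_p_value(classes_a, classes_b)
    return {
        "scores_equal": scores_match(scores_a, scores_b, tol),
        "classes_equal": classes_match(classes_a, classes_b),
        "classes_not_significantly_different": p_value >= alpha,
    }


if __name__ == "__main__":
    inputs = [i / 10 for i in range(-30, 31)]
    # Two equivalent implementations of the same scoring function.
    print(differential_test(inputs, score_textbook, score_tanh))
    # A slightly shifted scorer stands in for a configuration mismatch.
    print(differential_test(inputs, score_textbook,
                            lambda z: score_textbook(z + 0.25)))
```

In this toy setup the shifted scorer fails both strict oracles while the count-based oracle still passes. This mirrors the distinction drawn in the abstract but is not evidence for the paper's findings.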

