P-values for classification

2008 | journal article; research paper. A publication with affiliation to the University of Göttingen.


Cite this publication

P-values for classification
Duembgen, L.; Igl, B.-W. & Munk, A. (2008)
Electronic Journal of Statistics 2, pp. 468-493. DOI: https://doi.org/10.1214/08-EJS245

Documents & Media

document.pdf (552.9 kB, Adobe PDF)

License

GRO License

Details

Authors
Duembgen, Lutz; Igl, Bernd-Wolfgang; Munk, Axel 
Abstract
Let $(X, Y)$ be a random variable consisting of an observed feature vector $X \in \mathcal{X}$ and an unobserved class label $Y \in \{1, 2, \ldots, L\}$ with unknown joint distribution. In addition, let $\mathcal{D}$ be a training data set consisting of $n$ completely observed independent copies of $(X, Y)$. Usual classification procedures provide point predictors (classifiers) $\widehat{Y}(X, \mathcal{D})$ of $Y$ or estimate the conditional distribution of $Y$ given $X$. In order to quantify the certainty of classifying $X$ we propose to construct for each $\theta = 1, 2, \ldots, L$ a p-value $\pi_\theta(X, \mathcal{D})$ for the null hypothesis that $Y = \theta$, treating $Y$ temporarily as a fixed parameter. In other words, the point predictor $\widehat{Y}(X, \mathcal{D})$ is replaced with a prediction region for $Y$ with a certain confidence. We argue that (i) this approach is advantageous over traditional approaches and (ii) any reasonable classifier can be modified to yield nonparametric p-values. We discuss issues such as optimality, single use and multiple use validity, as well as computational and graphical aspects.
Issue Date
2008
Journal
Electronic Journal of Statistics 
ISSN
1935-7524
Language
English
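
Illustration
The idea in the abstract, one p-value per candidate class and a prediction region instead of a single predicted label, can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not necessarily the construction used in the paper: the p-value is a simple rank-based one, and the score (Euclidean distance to the class centroid) is an arbitrary illustrative choice. The function names `class_p_values` and `prediction_region` are hypothetical.

```python
import math
from collections import defaultdict


def class_p_values(x, training_data):
    """Return {label: p-value} for the null hypotheses "Y = label".

    training_data is a list of (feature_vector, label) pairs.
    Assumption: the score is the distance to the class centroid; any
    score that treats all class members symmetrically could be used.
    """
    by_label = defaultdict(list)
    for features, label in training_data:
        by_label[label].append(tuple(features))

    x = tuple(x)
    p_values = {}
    for label, points in by_label.items():
        # Tentatively treat x as one more observation from this class.
        augmented = points + [x]
        centroid = [
            sum(p[j] for p in augmented) / len(augmented)
            for j in range(len(x))
        ]
        scores = [math.dist(p, centroid) for p in augmented]
        new_score = scores[-1]
        # Fraction of class members (x included) at least as atypical as x.
        p_values[label] = sum(s >= new_score for s in scores) / len(augmented)
    return p_values


def prediction_region(x, training_data, alpha=0.05):
    """Return all labels whose p-value exceeds alpha."""
    p_values = class_p_values(x, training_data)
    return sorted(label for label, p in p_values.items() if p > alpha)
```

If x really belongs to the class under test and the observations of that class are exchangeable, a p-value of this rank-based form is at most alpha with probability at most alpha, so the region contains the true label with probability at least 1 - alpha. The region may contain several labels, or none, which is how it expresses the certainty of the classification.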
