DNA methylation-based classification of central nervous system tumours

Abstract

Accurate pathological diagnosis is crucial for optimal management of patients with cancer. For the approximately 100 known tumour types of the central nervous system, standardization of the diagnostic process has been shown to be particularly challenging—with substantial inter-observer variability in the histopathological diagnosis of many tumour types. Here we present a comprehensive approach for the DNA methylation-based classification of central nervous system tumours across all entities and age groups, and demonstrate its application in a routine diagnostic setting. We show that the availability of this method may have a substantial impact on diagnostic precision compared to standard methods, resulting in a change of diagnosis in up to 12% of prospective cases. For broader accessibility, we have designed a free online classifier tool, the use of which does not require any additional onsite data processing. Our results provide a blueprint for the generation of machine-learning-based tumour classifiers across other cancer entities, with the potential to fundamentally transform tumour pathology.

Figure 1: Establishing the DNA methylation-based CNS tumour reference cohort.
Figure 2: Development and cross-validation of the DNA methylation-based CNS tumour classifier.
Figure 3: Implementation of the classifier in diagnostic practice.
Figure 4: Reassessment of discrepant cases and establishment of new diagnosis.
Figure 5: DNA methylation-based identification of potential new CNS tumour entities.

Accession codes

Primary accessions

Gene Expression Omnibus

References

  1. Louis, D. N., Ohgaki, H., Wiestler, O. D. & Cavenee, W. K. WHO Classification of Tumours of the Central Nervous System revised 4th edn (IARC, 2016)

  2. van den Bent, M. J. Interobserver variation of the histopathological diagnosis in clinical trials on glioma: a clinician’s perspective. Acta Neuropathol. 120, 297–304 (2010)

  3. Ellison, D. W. et al. Histopathological grading of pediatric ependymoma: reproducibility and clinical relevance in European trial cohorts. J. Negat. Results Biomed. 10, 7 (2011)

  4. Sturm, D. et al. New brain tumor entities emerge from molecular classification of CNS-PNETs. Cell 164, 1060–1072 (2016)

  5. Fernandez, A. F. et al. A DNA methylation fingerprint of 1628 human samples. Genome Res. 22, 407–419 (2012)

  6. Hovestadt, V. et al. Decoding the regulatory landscape of medulloblastoma using DNA methylation sequencing. Nature 510, 537–541 (2014)

  7. Moran, S. et al. Epigenetic profiling to classify cancer of unknown primary: a multicentre, retrospective analysis. Lancet Oncol. 17, 1386–1395 (2016)

  8. Hovestadt, V. et al. Robust molecular subgrouping and copy-number profiling of medulloblastoma from small amounts of archival tumour material using high-density DNA methylation arrays. Acta Neuropathol. 125, 913–916 (2013)

  9. Sturm, D. et al. Hotspot mutations in H3F3A and IDH1 define distinct epigenetic and biological subgroups of glioblastoma. Cancer Cell 22, 425–437 (2012)

  10. Reuss, D. E. et al. Adult IDH wild type astrocytomas biologically and clinically resolve into other tumor entities. Acta Neuropathol. 130, 407–417 (2015)

  11. Pajtler, K. W. et al. Molecular classification of ependymal tumors across all CNS compartments, histopathological grades, and age groups. Cancer Cell 27, 728–743 (2015)

  12. Lambert, S. R. et al. Differential expression and methylation of brain developmental genes define location-specific subsets of pilocytic astrocytoma. Acta Neuropathol. 126, 291–301 (2013)

  13. Thomas, C. et al. Methylation profiling of choroid plexus tumors reveals 3 clinically distinct subgroups. Neuro-oncol. 18, 790–796 (2016)

  14. Mack, S. C. et al. Epigenomic alterations define lethal CIMP-positive ependymomas of infancy. Nature 506, 445–450 (2014)

  15. Johann, P. D. et al. Atypical teratoid/rhabdoid tumors are comprised of three epigenetic subgroups with distinct enhancer landscapes. Cancer Cell 29, 379–393 (2016)

  16. Wiestler, B. et al. Integrated DNA methylation and copy-number profiling identify three clinically and biologically relevant groups of anaplastic glioma. Acta Neuropathol. 128, 561–571 (2014)

  17. Van Der Maaten, L. & Hinton, G. H. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)

  18. Ceccarelli, M. et al. Molecular profiling reveals biologically discrete subsets and pathways of progression in diffuse glioma. Cell 164, 550–563 (2016)

  19. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001)

  20. Sokolova, M. & Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manage. 45, 427–437 (2009)

  21. Sahm, F. et al. Next-generation sequencing in routine brain tumor diagnostics enables an integrated diagnosis and identifies actionable targets. Acta Neuropathol. 131, 903–910 (2016)

  22. Weller, M. et al. Molecular classification of diffuse cerebral WHO grade II/III gliomas using genome- and transcriptome-wide profiling improves stratification of prognostically distinct patient groups. Acta Neuropathol. 129, 679–693 (2015)

  23. The Cancer Genome Atlas Research Network. Comprehensive, integrative genomic analysis of diffuse lower-grade gliomas. N. Engl. J. Med. 372, 2481–2498 (2015)

  24. Hovestadt, V. & Zapatka, M. conumee: enhanced copy-number variation analysis using Illumina methylation arrays. v.1.4.2 R package v.0.99.4 http://www.bioconductor.org/packages/release/bioc/html/conumee.html (2015)

  25. Bady, P., Delorenzi, M. & Hegi, M. E. Sensitivity analysis of the MGMT-STP27 model and impact of genetic and epigenetic context to predict the MGMT methylation status in gliomas and other tumors. J. Mol. Diagn. 18, 350–361 (2016)

  26. Korshunov, A. et al. Histologically distinct neuroepithelial tumors with histone 3 G34 mutation are molecularly similar and comprise a single nosologic entity. Acta Neuropathol. 131, 137–146 (2016)

  27. Korshunov, A. et al. Embryonal tumor with abundant neuropil and true rosettes (ETANTR), ependymoblastoma, and medulloepithelioma share molecular similarity and comprise a single clinicopathological entity. Acta Neuropathol. 128, 279–289 (2014)

  28. Hölsken, A. et al. Adamantinomatous and papillary craniopharyngiomas are characterized by distinct epigenomic as well as mutational and transcriptomic profiles. Acta Neuropathol. Commun. 4, 20 (2016)

  29. Heim, S. et al. Papillary Tumor of the pineal region: a distinct molecular entity. Brain Pathol. 26, 199–205 (2016)

  30. Koelsche, C. et al. Melanotic tumors of the nervous system are characterized by distinct mutational, chromosomal and epigenomic profiles. Brain Pathol. 25, 202–208 (2015)

  31. Jones, D. T. et al. Recurrent somatic alterations of FGFR1 and NTRK2 in pilocytic astrocytoma. Nat. Genet. 45, 927–932 (2013)

  32. Jones, D. T. et al. Dissecting the genomic complexity underlying medulloblastoma. Nature 488, 100–105 (2012)

  33. Pietsch, T. et al. Prognostic significance of clinical, histopathological, and molecular characteristics of medulloblastomas in the prospective HIT2000 multicenter clinical trial cohort. Acta Neuropathol. 128, 137–149 (2014)

  34. R Core Team. R: A Language and Environment for Statistical Computing. http://www.R-project.org/ (R Foundation for Statistical Computing, Vienna, Austria, 2016)

  35. Huber, W. et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat. Methods 12, 115–121 (2015)

  36. Aryee, M. J. et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 30, 1363–1369 (2014)

  37. Leek, J. T. & Storey, J. D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 3, e161 (2007)

  38. Leek, J. T. & Storey, J. D. A general framework for multiple testing dependence. Proc. Natl Acad. Sci. USA 105, 18718–18723 (2008)

  39. Breiman, L. Classification and Regression Trees (Chapman & Hall/CRC, 1984)

  40. Liaw, A. & Wiener, M. Classification and Regression by randomForest. R News 2/3, 18–22 (2002)

  41. Chen, C., Liaw, A. & Breiman, L. Using Random Forest to Learn Imbalanced Data. Report 666 (Univ. California, Berkeley, 2004)

  42. Kim, K. I. & Simon, R. Overfitting, generalization, and MSE in class probability estimation with high-dimensional data. Biom. J. 56, 256–269 (2014)

  43. Simon, R. Class probability estimation for medical studies. Biom. J. 56, 597–600 (2014)

  44. Boström, H. Calibrating Random Forests. In Proc. 7th International Conference on Machine Learning and Applications 121–126 (ICMLA, 2008)

  45. Smola, A. J. Advances in Large Margin Classifiers (MIT press, 2000)

  46. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010)

  47. Appel, I. J., Gronwald, W. & Spang, R. Estimating classification probabilities in high-dimensional diagnostic studies. Bioinformatics 27, 2563–2570 (2011)

  48. Hand, D. J. & Till, R. J. A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach. Learn. 45, 171–186 (2001)

  49. Brier, G. W. Verification of forecasts expressed in terms of probability. Mon. Weath. Rev. 78, 1–3 (1950)

  50. Carter, S. L. et al. Absolute quantification of somatic DNA alterations in human cancer. Nat. Biotechnol. 30, 413–421 (2012)

Acknowledgements

We thank U. Lass, A. Habel and I. Oezen for technical and administrative support, the Microarray unit of the Genomics and Proteomics Core Facility (DKFZ) for methylation services, and the German Glioma Network and the Neuroonkologische Arbeitsgemeinschaft for sharing their data. This research was supported by the DKFZ-Heidelberg Center for Personalized Oncology (DKFZ-HIPO_036), the German Childhood Cancer Foundation (DKS 2015.01), an Illumina Medical Research Grant, the DKTK joint funding project ‘Next Generation Molecular Diagnostics of Malignant Gliomas’, the A Kids’ Brain Tumour Cure (PLGA) Foundation, the Brain Tumour Charity (UK) for the Everest Centre for Paediatric Low-Grade Brain Tumour Research, the Friedberg Charitable Foundation and the Sohn Conference Foundation (to M. Snuderl and M. Karajannis), the RKA-Förderpool (Project 37) and Stichting Kinderen Kankervrij and Stichting AMC Foundation (to E. Aronica), NIH/NCI 5T32CA163185 (to A.O.), NIH/NCI Cancer Center Support Grant P30 CA008748 to MSKCC, the Luxembourg National Research Fund (FNR PEARL P16/BM/11192868 to M.M.) and the National Institute for Health Research (NIHR) UCLH/UCL Biomedical Research Centre (S.Bra.).

Author information

Contributions

D.C. and D.T.W.J. composed the reference cohort and defined methylation classes; M.Si. and V.Ho. developed and technically validated the classification algorithm; D.Sc. developed the classification website; D.C., D.T.W.J., M.Si., A.Ben., V.Ho., D.Sc., D.Sti., M.Z., A.v.D. and S.M.P. developed additional methodology and software; D.C., D.T.W.J., M.Si., D.Stu., C.Ko., F.Sa., L.C., D.E.R., A.Kr., A.K.W., K.H., L.S., P.N.H., K.H.P., J.Schi., G.R., M.Pri., W.B., F.Se., H.W., T.M., O.W., S.Bre., M.S.-R., D.H., A.Ku., C.M.K., H.L.M., S.Ru., K.v.H., M.C.F., A.Gn., G.F., S.T., G.C., C.-M.M., M.G., T.P., M.B., J.D., M.Pl., A.U., W.W., M.M., C.Har., C.H.-M., M.H., A.Kor., A.v.D. and S.M.P. performed the prospective cohort analysis; P.N.H., K.H.P., H.D., B.K.G., J.H., S.F., P.W., Z.J., T.A. and S.Bra. generated and collected the external centre data; K.W.P., A.O., N.W.E., A.K.B., R.C., A.Hö., E.H., R.Be., J.Schi., O.S., K.W., P.V., M.Pa., P.T., D.L., E.A., F.G., E.R., W.S., C.G., F.J.R., A.Bec., M.Pre., C.Hab., R.Bj., J.C., M.F., M.D., S.H., V.Ha., S.Ro., J.R.H., P.K., B.W.K., M.L., B.L., C.M., R.K., Z.K., F.H., A.Koc., A.J., C.Ke., H.M., W.M., U.P., M.Pri., N.G.G., P.H.D., A.P., C.J., T.S.J., B.R., T.P., J.Schr., G.S., M.Wes., G.R., P.W., M.Wel., V.P.C., I.B., A.Hu., N.J., P.A.N., W.P., A.Ga., G.W.R., M.D.T., M.R., M.A.K., M.M., C.Har., K.A., U.S., R.Bu., P.L., M.K., C.H.-M., D.W.E., M.H., S.Bra., A.Kor., A.v.D. and S.M.P. provided reference cohort material and data; K.L., M.B.-H., M.Sc. and R.F. performed methylation profiling; J.Se., K.K., A.T., M.K. and M.Sn. performed technical validation experiments; A.v.D. and S.M.P. supervised the project. The manuscript underwent an internal collaboration-wide review process. All authors approved the final version of the manuscript.

Corresponding authors

Correspondence to Andreas von Deimling or Stefan M. Pfister.

Ethics declarations

Competing interests

A patent for a “DNA-methylation based method for classifying tumor species of the brain” has been applied for by the Deutsches Krebsforschungszentrum Stiftung des öffentlichen Rechts and Ruprecht-Karls-Universität Heidelberg (EP 3067432 A1) with S.M.P., A.v.D., D.T.W.J., D.C., V.Ho., M.Si., M.B.-H. and M.Sc. as inventors. The other authors declare no competing interests.

Additional information

Reviewer Information: Nature thanks S. Pomeroy, M. L. Suva, R. Verhaak and S. Yip for their contribution to the peer review of this work.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Figure 1 Unsupervised clustering of the DNA methylation-based reference cohort.

a, Heat map showing the pairwise Pearson correlation (bottom left) of the 32,000 most variably methylated CpG probes of all 2,801 biologically independent samples of the reference cohort. A detailed view of closely related ependymal classes (top right) and the three subclasses identified in atypical teratoid rhabdoid tumours (ATRTs) (bottom right) indicates higher correlation within classes. The colour code and abbreviations are identical to Fig. 1a. b, Eigenvalue frequencies of a PCA using the 32,000 most variably methylated CpG probes of all 2,801 biologically independent samples as in a. The number of non-trivial components was determined by comparing eigenvalues to the maximum eigenvalue of a PCA using randomized beta values (shuffling of sample labels per probe). c, x and y coordinates of the first five of a total of 500 iterations of t-SNE dimensionality reduction generated by random down-sampling to 90% of the 2,801 biologically independent samples to assess clustering stability. Axis positions of individual cases are connected by a line coloured according to the colour code of Fig. 1a. The depiction illustrates the close proximity of cases of the same class across iterations, indicative of a high stability independent of the exact composition of the reference cohort. d, Pairwise correlation of x and y coordinates between 2,801 biologically independent samples over all iterations of the down-sampling analysis demonstrates a very high correlation within classes (average correlation 0.982), indicating a high stability of the t-SNE analysis.
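
The correlation step in panel a can be illustrated with a minimal sketch. It assumes beta values are held as a plain list of probe rows, and it uses a toy matrix in place of the 32,000 probes and 2,801 samples; the function names and data are hypothetical and are not taken from the authors' pipeline.

```python
import math

def variance(values):
    """Population variance of a list of beta values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def top_variable_probes(beta, k):
    """Indices of the k probes (rows) with the highest variance across samples."""
    ranked = sorted(range(len(beta)), key=lambda i: variance(beta[i]), reverse=True)
    return ranked[:k]

def sample_correlation_matrix(beta, k):
    """Pairwise Pearson correlation between samples over the k most variable probes."""
    rows = [beta[i] for i in top_variable_probes(beta, k)]
    n_samples = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_samples)]
    return [[pearson(columns[a], columns[b]) for b in range(n_samples)]
            for a in range(n_samples)]

# Toy beta-value matrix (hypothetical): rows are probes, columns are samples.
toy_beta = [
    [0.10, 0.12, 0.90, 0.88],  # variable probe
    [0.85, 0.80, 0.15, 0.20],  # variable probe
    [0.50, 0.51, 0.50, 0.49],  # nearly constant probe, dropped when k = 3
    [0.20, 0.25, 0.70, 0.75],  # variable probe
]
corr = sample_correlation_matrix(toy_beta, k=3)
```

In this toy example, samples 0 and 1 correlate strongly with each other and negatively with samples 2 and 3, which is the kind of block structure the heat map displays within and between classes.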

Extended Data Figure 2 Unsupervised clustering is not biased by a range of possible confounding factors.

a, t-SNE representations of the 2,801 biologically independent samples constituting the reference cohort as shown in Fig. 1b overlaid with potentially confounding factors (b–f). b, Distribution of patient sex among the classes illustrates equal or near-equal distribution in many classes, but also an expected enrichment for one sex in some classes (for example, female in meningioma or CNS high-grade neuroepithelial tumours with MN1 alteration). c, Patient age illustrates the expected age distribution of many tumour classes. d–f, The slightly uneven distributions of type of material (for example, pilocytic astrocytoma or meningioma) (d), array preparation date (e) and tissue source (f) are related to the specifics of assembling the reference cohort and do not indicate an apparent confounding effect on the unsupervised clustering.

Extended Data Figure 3 Estimation of tumour purity and relation to TCGA pan-glioma methylation classes.

a, A random forest model was trained to predict absolute tumour purity estimates (ref. 50) using the TCGA pan-glioma dataset (795 biologically independent samples; ref. 18). The plot shows absolute purity estimates and out-of-bag random forest tumour purity predictions (that is, using only random forest trees for which the respective sample was not involved in the training). The estimated mean squared error is 0.015, indicating that this model is able to yield reasonable predictions of tumour purity from methylation data. b, The distribution of random forest predicted purity in the reference dataset (2,801 biologically independent samples). Purity estimates have been transformed into five categories indicated by different shades of blue. The exact case-by-case values are provided in Supplementary Table 2. The median estimated purity in the reference cohort is 66% (range 42% to 87%) and 78% of samples have an estimated purity of at least 60%. c, t-SNE representation of the reference cohort (2,801 biologically independent samples) overlaid with random forest predicted purity categories. Methylation classes are generally composed of mixed tumour purity categories. Tumour purity shows some association with the WHO grade (WHO I median tumour purity 60%, range 39–77%; WHO II median 66%, range 43–80%; WHO III median 68%, range 54–84%; WHO IV median 69%, range 49–87%). A further association of tumour purity with the composition of classes in the unsupervised t-SNE analysis was not evident. d, t-SNE representation of the reference cohort (2,801 biologically independent samples) overlaid with predicted TCGA pan-glioma DNA methylation classes according to the previously published dataset (ref. 18). Pan-glioma methylation classes were predicted by training a random forest on the previously published dataset (ref. 18), which included methylation data of 418 low-grade glioma and 377 glioblastoma samples that were acquired using the Illumina 450k and 27k platforms. The random forest algorithm was trained using the 1,300 CpG signature as described in ref. 18 and using the default settings of the random forest algorithm implemented in the R package randomForest. Pan-glioma class prediction was only performed for subsets of mostly adult astrocytomas, oligodendrogliomas and glioblastomas (magnified areas) included in the previously published dataset (ref. 18). LGm1, LGm2 and LGm3 show a high overlap with the methylation classes A IDH HG, A IDH and O IDH, respectively. LGm4 shows the highest overlap with methylation class GBM RTK II. LGm5 shows the highest overlap with methylation classes GBM MES and GBM RTK I. LGm6 shows the highest overlap with DMG K27, GBM MID and GBM MYCN.

Extended Data Figure 4 Development of the random forest classifier.

a, The random forest training consists of four steps. First, basic filtering was performed to remove probes that were not included on the EPIC array, probes located on the X and Y chromosomes, probes affected by single nucleotide polymorphisms, and probes not mapping uniquely to the genome. In the second step, the probe-wise batch effects between samples from FFPE and frozen material were estimated and adjusted by a linear model approach. In the third step, feature selection was performed by training a random forest algorithm using all probes and selecting the 10,000 probes with the highest variable importance measure. In the last step, the final random forest was trained using only the 10,000 selected probes. The validation of the random forest classifier involves a threefold nested cross-validation. In the outer loop of the cross-validation, the complete random forest training procedure consisting of the four steps described above is applied to the training data and the resulting random forest is used to predict the test data to generate random forest scores. In the inner loop of the cross-validation, a threefold cross-validation is applied to the training data of the outer loop in order to generate random forest scores independent of the test data in the outer loop. These scores are then used to fit a calibration model, that is, an L2-penalized, multinomial, logistic regression that takes the random forest scores of the test data in the outer cross-validation loop to estimate tumour class probabilities (P1, P2, P3). To fit a calibration model to estimate class probabilities of diagnostic samples using all data in the reference set, the random forest scores generated in the outer cross-validation loop were used. b, Schematic depiction of three example binary decision trees of the random forest classifier (left), and magnification of five example decision nodes relevant for glioblastoma classification (right). For prediction, a diagnostic sample enters the root node of each of the 10,000 trees. At every decision node, the decision path is determined by the methylation level of a single CpG, until it reaches a terminal node that provides the class prediction. The joint class prediction of all trees represents the raw prediction score. The colour code and abbreviations are identical to Fig. 1a.
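
The data flow of the threefold nested cross-validation in panel a can be sketched as index bookkeeping. This is a minimal illustration only: it assumes plain (unstratified) random folds, and the function names, seed and sample count are hypothetical rather than the authors' implementation. Model training and calibration are indicated only in comments.

```python
import random

def k_fold_indices(indices, k, rng):
    """Shuffle the given sample indices and split them into k roughly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def nested_cv_splits(n_samples, k_outer=3, k_inner=3, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for a nested cross-validation.

    inner_splits is a list of (inner_train, inner_test) pairs drawn only from
    outer_train, so scores used to fit the calibration model never involve
    the outer test data.
    """
    rng = random.Random(seed)
    outer_folds = k_fold_indices(range(n_samples), k_outer, rng)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [s for j, fold in enumerate(outer_folds) if j != i for s in fold]
        inner_folds = k_fold_indices(outer_train, k_inner, rng)
        inner_splits = []
        for m, inner_test in enumerate(inner_folds):
            inner_train = [s for q, fold in enumerate(inner_folds) if q != m for s in fold]
            inner_splits.append((inner_train, inner_test))
        yield outer_train, outer_test, inner_splits

splits = list(nested_cv_splits(n_samples=12))
for outer_train, outer_test, inner_splits in splits:
    # Inner loop: train on inner_train, score inner_test; the pooled inner scores
    # would be used to fit the calibration model.
    # Outer loop: train on outer_train, score outer_test, then apply the fitted
    # calibration model to those outer test scores.
    pass
```

Keeping the inner folds inside the outer training set is what makes the calibrated probabilities for the outer test samples independent of the data used to fit the calibration.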

Extended Data Figure 5 Comparison of raw and calibrated classifier scores and threshold definition.

a, Density plots illustrating the distribution of raw and calibrated classifier scores for samples correctly classified during cross-validation (n = 2,701 independent biological samples for raw and n = 2,769 independent biological samples for calibrated), depicted for each methylation class or methylation class family. Score calibration results in a harmonization of score distribution and allows the establishment of a shared classification threshold. Three thresholds for maximizing specificity (0.958), maximizing the Youden index (0.836), and the cutoff used in this study (0.9) are indicated by red lines (see also d and e). b, Multivariate score calibration illustrated as a ternary plot showing scores of the three ATRT subclasses (MYC, SHH and TYR; together n = 112 independent biological samples). Arrows indicate transformation of the scores for individual samples by the calibration model, which increases the discrimination between the three subclasses. c, The accuracy of prediction of the random forest classifier constructed of n = 2,801 biologically independent samples (measured by misclassification error, AUC, Brier score, multiclass sensitivity and specificity) is improved by score calibration and by combining classes into methylation class families (MCF). d, To determine a common threshold for the calibrated MCF scores, we performed a ROC analysis of the maximum calibrated MCF scores of all n = 2,801 biologically independent samples calculated via cross-validation. For this ROC analysis, we defined a new binary class, that is, samples correctly classified during the cross-validation using the maximum calibrated MCF score for classification were considered as ‘classifiable’ (n = 2,769) and samples that were falsely classified using this score were considered ‘non-classifiable’ (n = 32). Three thresholds for different sensitivity and specificity are highlighted in the ROC curve: a threshold of 0.958 achieving a maximum specificity of 1 with a sensitivity of 0.827, a threshold of 0.836 obtaining a maximum Youden index with specificity 0.938 and sensitivity 0.934, and our recommended threshold of 0.9 that results in a specificity of 0.938 and a sensitivity of 0.9. Bootstrapped 95% confidence intervals for estimated sensitivity and specificity are indicated in grey. e, Sensitivity and specificity for all possible thresholds applied to cross-validated maximum MCF classifier scores of all n = 2,801 biologically independent samples. Three thresholds for maximizing specificity (0.958), maximizing the Youden index (0.836) and 0.9 are highlighted by red lines.
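
The threshold analysis in panels d and e can be made concrete with a small sketch. It assumes that ‘classifiable’ samples are the positive class and that a sample counts as a positive call when its maximum calibrated score is at or above the threshold; the function names and the toy scores are hypothetical and do not reproduce the cohort values.

```python
def sensitivity_specificity(scores, classifiable, threshold):
    """Sensitivity and specificity of the rule 'score >= threshold'.

    scores:       maximum calibrated score per sample
    classifiable: True if the sample was correctly classified in cross-validation
    """
    tp = sum(1 for s, c in zip(scores, classifiable) if c and s >= threshold)
    fn = sum(1 for s, c in zip(scores, classifiable) if c and s < threshold)
    tn = sum(1 for s, c in zip(scores, classifiable) if not c and s < threshold)
    fp = sum(1 for s, c in zip(scores, classifiable) if not c and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def max_youden_threshold(scores, classifiable):
    """Observed score that maximizes the Youden index (sensitivity + specificity - 1)."""
    best_threshold, best_j = None, -1.0
    for t in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, classifiable, t)
        j = sens + spec - 1.0
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j

# Toy data (hypothetical): six classifiable and four non-classifiable samples.
toy_scores = [0.99, 0.97, 0.95, 0.92, 0.88, 0.80, 0.91, 0.60, 0.45, 0.30]
toy_classifiable = [True, True, True, True, True, True, False, False, False, False]

sens_09, spec_09 = sensitivity_specificity(toy_scores, toy_classifiable, 0.9)
best_threshold, best_j = max_youden_threshold(toy_scores, toy_classifiable)
```

The sketch shows why a fixed cutoff such as 0.9 and the Youden-optimal cutoff generally differ: the former is a deliberate choice of operating point, whereas the latter weights sensitivity and specificity equally.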

Extended Data Figure 6 Diagnostic utility of the DNA methylation-based classifier, assessed at different centres.

a, Implementation of the DNA methylation classifier by five external centres. In total, 401 independent biological samples were analysed. 78% matched to an established class with a cut-off score of ≥0.9 (class colours as in Fig. 1a). A new diagnosis was established in 12% of cases. b, Depiction of individual centre results, illustrating the different composition of samples included in the analysis, variation in the rate of non-matching cases, and of cases for which a new diagnosis was established. Case-by-case details are provided in Supplementary Table 6.

Extended Data Figure 7 Inter-centre and inter-platform reproducibility of the DNA methylation-based classification.

a, Calibrated scores of 53 independent biological samples representing diagnostic CNS tumour cases analysed at the University of Heidelberg and at the New York University pathology department. Both laboratories performed independent DNA extraction, array hybridization and data analysis. Cases falling into green areas were classified identically in both centres (96%); cases in the red area were non-classifiable in one centre (4%). None of the 53 samples was assigned to a different methylation class by the two centres. b, Copy-number profiles calculated from the array data generated at both centres were highly comparable and allowed identification of chromosomal gains, losses, amplifications and deletions. Calculations and interpretation were performed once at each centre. c, Plot of maximum raw classification scores of 16 different tumour samples generated using both 450k and EPIC arrays. All cases fall close to the bisecting line (red), indicating a high concordance of the scores. Further, the methylation class prediction was identical for all samples. d, The CNS tumour classifier also performs well with data generated by WGBS. The plot shows classifier scores calculated from WGBS and 450k arrays of 50 cases comprising 11 different brain tumour entities (bisecting line in red). Methylation beta values were calculated from high-coverage WGBS data (>10-fold average coverage), run through the CNS tumour classifier and plotted against the scores of the same case analysed using 450k arrays. The highest class prediction score was identical in all cases.

Extended Data Figure 8 Example of the PDF report of an IDH wild-type glioblastoma sample.

Extended Data Figure 9 Example work flow and timeline of diagnostic methylation profiling.

Supplementary information

Life Sciences Reporting Summary (PDF 74 kb)

Supplementary Table 1

Overview of reference methylation class characteristics. This table gives an overview of the main characteristics of the 82 tumour and 9 non-tumour methylation classes including full names of the methylation class, association of class with a methylation class family, number of cases per class, class age characteristics, male / female ratio, tumour localization, most frequent pathological diagnoses and a running text summarizing typical class features. Further, the Hex colour code of the reference classes used throughout this manuscript is provided. (XLSX 37 kb)

Supplementary Table 2

Case by case list of reference cohort. This table gives case-by-case details of the n=2801 biologically independent samples constituting the reference cohort including the Sentrix ID (.idat), tissue source, clinical data, methylation class and technical specifications. (XLSX 301 kb)

Supplementary Table 3

Single class sensitivity and specificity. This table provides single class specificity and sensitivity for the ≥0.9 calibrated classifier score for methylation class families and methylation classes that are not assigned to a methylation class family. In addition, single class specificity and sensitivity are provided for the ≥0.5 calibrated classifier score for methylation classes that are part of a methylation class family; this score can be used for subclassification to identify individual family members. The data were generated using n=2801 biologically independent samples. (XLSX 15 kb)

Supplementary Table 4

Case by case list of prospective validation cohort. This table gives case-by-case details of the n=1104 biologically independent samples constituting the prospective clinical cohort including information on the tissue source, clinical data, methylation class prediction (Classifier version V11b2), interpretation of classification and technical specifications. (XLSX 165 kb)

Supplementary Table 5

Case by case list of discordant cases. This table gives case-by-case details of the n=139 biologically independent samples with discordant results between pathological diagnosis and methylation profiling. The cases are categorized into reclassified ("establishing new diagnosis", n=129) or misleading profile (n=10). Information on the orthogonal methods used for reassessment as well as the key information resulting in reclassification is provided. (XLSX 39 kb)

Supplementary Table 6

Case by case list of external diagnostic cohort. This table gives case-by-case details of the n=401 biologically independent samples constituting the external centre diagnostic cohort including clinical data, original pathological diagnosis, methylation class prediction, interpretation of classification and the final pathological diagnosis (after integration with classifier result). (XLSX 54 kb)

About this article

Cite this article

Capper, D., Jones, D., Sill, M. et al. DNA methylation-based classification of central nervous system tumours. Nature 555, 469–474 (2018). https://doi.org/10.1038/nature26000

  • DOI: https://doi.org/10.1038/nature26000
