A comparative study of RNA-Seq and microarray data analysis on the two examples of rectal-cancer patients and Burkitt Lymphoma cells.

2018 | journal article. A publication with affiliation to the University of Göttingen.

Erratum to this publication

Cite this publication

A comparative study of RNA-Seq and microarray data analysis on the two examples of rectal-cancer patients and Burkitt Lymphoma cells.
Wolff, A.; Bayerlová, M.; Gaedcke, J.; Kube, D. & Beißbarth, T. (2018)
PLOS ONE, 13(5), art. e0197162. DOI: https://doi.org/10.1371/journal.pone.0197162

Documents & Media

journal.pone.0197162.pdf (3.09 MB, Adobe PDF)

License

Published Version

Attribution 4.0 (CC BY 4.0)

Details

Authors
Wolff, Alexander ; Bayerlová, Michaela ; Gaedcke, Jochen ; Kube, Dieter ; Beißbarth, Tim 
Abstract
BACKGROUND: Pipeline comparisons for gene expression data are highly valuable for applied real-data analyses, as they enable the selection of a suitable analysis strategy for the dataset at hand. To give a global insight into pipeline performance, such comparisons should cover read mapping, counting, and differential gene expression analysis for RNA-Seq data, and preprocessing, normalization, and differential gene expression analysis for microarray data.
METHODS: Four commonly used RNA-Seq pipelines (STAR/HTSeq-Count/edgeR, STAR/RSEM/edgeR, Sailfish/edgeR, and TopHat2/Cufflinks/CuffDiff) were investigated on multiple levels (alignment and counting) and cross-compared with the microarray counterpart on the level of gene expression and gene ontology enrichment. For these comparisons we generated two matched microarray and RNA-Seq datasets: Burkitt Lymphoma cell line data and rectal cancer patient data.
RESULTS: The overall mapping rate of STAR was 98.98% for the cell line dataset and 98.49% for the patient dataset. TopHat2's overall mapping rate was 97.02% and 96.73%, respectively, while Sailfish reached only 84.81% and 54.44%. The correlation of gene expression between microarray and RNA-Seq data was moderately lower for the patient dataset (ρ = 0.67-0.69) than for the cell line dataset (ρ = 0.87-0.88). The exception was Cufflinks, whose correlations were substantially lower (ρ = 0.21-0.29 and 0.34-0.53). For both datasets we identified very low numbers of differentially expressed genes using the microarray platform. For RNA-Seq we checked the agreement of the differentially expressed genes identified by the different pipelines and of the GO-term enrichment results.
CONCLUSION: The combination of the STAR aligner with HTSeq-Count, followed by the STAR aligner with RSEM and by Sailfish, generated the differentially expressed genes best suited to the dataset at hand and in agreement with most of the other transcriptomics pipelines.
Issue Date
2018
Journal
PLOS ONE 
Organization
Institut für Medizinische Statistik ; Klinik für Allgemein-, Viszeral- und Kinderchirurgie ; Klinik für Hämatologie und Medizinische Onkologie 
ISSN
1932-6203
Language
English
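
The abstract reports the agreement between the microarray and RNA-Seq platforms as a correlation coefficient ρ. A minimal sketch of how such a value could be computed is given below. It assumes that ρ denotes Spearman's rank correlation, which the abstract does not state, and it uses made-up expression values for five hypothetical genes. The function names are illustrative and are not taken from the publication.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the window over all entries tied with the current value.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical per-gene expression values, one entry per matched gene.
    microarray_log_intensity = [7.1, 8.4, 5.2, 9.9, 6.3]
    rnaseq_log_count = [3.0, 4.1, 1.2, 6.5, 3.6]
    print(spearman(microarray_log_intensity, rnaseq_log_count))
```

A rank-based measure is a natural choice in this setting because microarray intensities and RNA-Seq counts are on different scales, so only the ordering of the genes is directly comparable across platforms.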
