Multiscale DNA partitioning: statistical evidence for segments

2014 | journal article; research paper. A publication with affiliation to the University of Göttingen.


Cite this publication

Multiscale DNA partitioning: statistical evidence for segments
Futschik, A.; Hotz, T.; Munk, A. & Sieling, H. (2014)
Bioinformatics, 30(16), pp. 2255-2262. DOI: https://doi.org/10.1093/bioinformatics/btu180

License

GRO License

Details

Authors
Futschik, Andreas; Hotz, Thomas; Munk, Axel; Sieling, Hannes
Abstract
Motivation: DNA segmentation, i.e. the partitioning of DNA into compositionally homogeneous segments, is a basic task in bioinformatics. Different algorithms have been proposed for various partitioning criteria such as Guanine/Cytosine (GC) content, local ancestry in population genetics or copy number variation. A critical component of any such method is the choice of an appropriate number of segments. Some methods use model selection criteria and do not provide suitable error control. Other methods, which are based on simulating a statistic under a null model, provide suitable error control only if the correct null model is chosen.

Results: Here, we focus on partitioning with respect to GC content and propose a new approach that provides statistical error control: as in statistical hypothesis testing, it guarantees with a user-specified probability 1 - alpha that the number of identified segments does not exceed the number of segments actually present. The method is based on a statistical multiscale criterion, which makes it a segmentation method that searches for segments of any length (on all scales) simultaneously. It is also accurate in localizing segments: under benchmark scenarios, our approach leads to a segmentation that is more accurate than the approaches discussed in the comparative review of Elhaik et al. In our real data examples, we find segments that often correspond well to features taken from standard University of California at Santa Cruz (UCSC) genome annotation tracks.
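
As a rough illustration of what "GC content on several scales" means in the abstract above, the following Python sketch computes the GC fraction of a sequence over consecutive non-overlapping windows of several lengths. It is not the authors' multiscale criterion and provides no error control; the function names `gc_fraction` and `multiscale_gc`, the window lengths and the toy sequence are all assumptions made for this example.

```python
from typing import Dict, List


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in seq (case-insensitive); 0.0 for an empty string."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def multiscale_gc(seq: str, window_lengths: List[int]) -> Dict[int, List[float]]:
    """GC fraction over consecutive non-overlapping windows, one list per window length.

    A trailing remainder shorter than the window length is ignored.
    """
    result: Dict[int, List[float]] = {}
    for w in window_lengths:
        if w <= 0:
            raise ValueError("window lengths must be positive")
        result[w] = [
            gc_fraction(seq[i:i + w]) for i in range(0, len(seq) - w + 1, w)
        ]
    return result


# Hypothetical toy sequence: an AT-rich stretch followed by a GC-rich stretch.
toy = "ATATATAT" + "GCGCGCGC"
profile = multiscale_gc(toy, [4, 8])
print(profile)
```

In this toy case the jump in GC fraction between the two halves is visible at both window lengths. A segmentation method of the kind described in the abstract additionally has to decide, with statistical error control, which such changes are real.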
Issue Date
2014
Journal
Bioinformatics 
ISSN
1367-4803
eISSN
1460-2059
Language
English
