Estimating evolutionary distances between genomic sequences from spaced-word matches
2015 | journal article. A publication with affiliation to the University of Göttingen.
Details
- Authors
- Morgenstern, Burkhard; Zhu, Bingyao; Horwege, Sebastian; Leimeister, Chris Andre
- Abstract
- Alignment-free methods are increasingly used to calculate evolutionary distances between DNA and protein sequences as a basis of phylogeny reconstruction. Most of these methods, however, use heuristic distance functions that are not based on any explicit model of molecular evolution. Herein, we propose a simple estimator d(N) of the evolutionary distance between two DNA sequences that is calculated from the number N of (spaced) word matches between them. We show that this distance function is more accurate than other distance measures that are used by alignment-free methods. In addition, we calculate the variance of the normalized number N of (spaced) word matches. We show that the variance of N is smaller for spaced words than for contiguous words, and that the variance is further reduced if our spaced-words approach is used with multiple patterns of 'match positions' and 'don't care positions'. Our software is available online and as downloadable source code at: http://spaced.gobics.de/.
- Issue Date
- 2015
- Status
- published
- Publisher
- BioMed Central Ltd
- Journal
- Algorithms for Molecular Biology
- ISSN
- 1748-7188
- Sponsor
- Open-Access-Publikationsfonds 2015
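
The abstract refers to counting (spaced) word matches between two sequences under a pattern of 'match positions' and 'don't care positions'. The following is a minimal illustrative sketch of such a count, not the authors' software. It assumes a pattern is written as a string of '1' (match position) and '0' (don't care position), and that a match is any pair of positions, one in each sequence, whose words agree at every match position. The function names are hypothetical, the count is unnormalized, and the estimator d(N) from the paper is not reproduced here.

```python
from collections import Counter


def spaced_words(seq, pattern):
    """Yield the spaced word starting at each position of seq.

    pattern is a string of '1' (match position) and '0' (don't care
    position); this encoding is an assumption made for illustration.
    """
    match_positions = [k for k, c in enumerate(pattern) if c == "1"]
    for i in range(len(seq) - len(pattern) + 1):
        # Keep only the characters at match positions.
        yield "".join(seq[i + k] for k in match_positions)


def count_spaced_word_matches(seq1, seq2, pattern):
    """Count position pairs (i, j) whose spaced words are identical."""
    counts1 = Counter(spaced_words(seq1, pattern))
    counts2 = Counter(spaced_words(seq2, pattern))
    # Each word occurring n1 times in seq1 and n2 times in seq2
    # contributes n1 * n2 matching position pairs.
    return sum(n1 * counts2[word] for word, n1 in counts1.items())


if __name__ == "__main__":
    a = "ACGTACGT"
    b = "ACCTACGA"
    print(count_spaced_word_matches(a, b, "101"))  # spaced pattern
    print(count_spaced_word_matches(a, b, "111"))  # contiguous words
```

A pattern consisting only of '1' characters reduces to ordinary contiguous word matches, so the same function covers both cases compared in the abstract.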