Estimating evolutionary distances between genomic sequences from spaced-word matches

2015 | journal article. A publication with affiliation to the University of Göttingen.

Cite this publication

Estimating evolutionary distances between genomic sequences from spaced-word matches
Morgenstern, B.; Zhu, B.; Horwege, S. & Leimeister, C. A. (2015)
Algorithms for Molecular Biology, 10, art. x. DOI: https://doi.org/10.1186/s13015-015-0032-x

Documents & Media

License

Published Version

Attribution 4.0 (CC BY 4.0)

Details

Authors
Morgenstern, Burkhard; Zhu, Bingyao; Horwege, Sebastian; Leimeister, Chris Andre
Abstract
Alignment-free methods are increasingly used to calculate evolutionary distances between DNA and protein sequences as a basis of phylogeny reconstruction. Most of these methods, however, use heuristic distance functions that are not based on any explicit model of molecular evolution. Herein, we propose a simple estimator d(N) of the evolutionary distance between two DNA sequences that is calculated from the number N of (spaced) word matches between them. We show that this distance function is more accurate than other distance measures that are used by alignment-free methods. In addition, we calculate the variance of the normalized number N of (spaced) word matches. We show that the variance of N is smaller for spaced words than for contiguous words, and that the variance is further reduced if our spaced-words approach is used with multiple patterns of 'match positions' and 'don't care positions'. Our software is available online and as downloadable source code at: http://spaced.gobics.de/.
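As an illustration of the quantity N mentioned in the abstract, the following is a minimal sketch of counting spaced-word matches between two sequences for a single pattern. It is not the authors' software. The function names, the encoding of a pattern as a string of '1' (match position) and '0' (don't care position), and the choice to count all pairs of matching positions are assumptions made for this example. The estimator d(N) itself is defined in the paper and is not reproduced here.

```python
from collections import Counter


def spaced_words(seq, pattern):
    """Yield the spaced word starting at each position of seq.

    pattern is a string of '1' (match position) and '0' (don't care
    position); this encoding is an assumption of the sketch.
    """
    match_positions = [i for i, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    for start in range(len(seq) - span + 1):
        # Keep only the characters at match positions; skip don't-care ones.
        yield "".join(seq[start + i] for i in match_positions)


def count_spaced_word_matches(seq_a, seq_b, pattern):
    """Count pairs (i, j) whose spaced words in seq_a and seq_b are equal."""
    counts_a = Counter(spaced_words(seq_a, pattern))
    counts_b = Counter(spaced_words(seq_b, pattern))
    # A word seen n times in seq_a and m times in seq_b gives n * m matches.
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    n_matches = count_spaced_word_matches("ACGTACGT", "ACCTACGA", "101")
    print(n_matches)
```

A pattern consisting only of '1' characters reduces this to counting contiguous word matches, which is the baseline the abstract compares against. Using several patterns would mean repeating the count per pattern and combining the results; how the paper combines them is not shown here.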
Issue Date
2015
Status
published
Publisher
BioMed Central Ltd
Journal
Algorithms for Molecular Biology 
ISSN
1748-7188
Sponsor
Open-Access-Publikationsfonds 2015
