Fast alignment-free sequence comparison using spaced-word frequencies

2014 | journal article. A publication with affiliation to the University of Göttingen.

Cite this publication

Fast alignment-free sequence comparison using spaced-word frequencies
Leimeister, C.-A.; Boden, M.; Horwege, S.; Lindner, S. & Morgenstern, B. (2014)
Bioinformatics 30(14), pp. 1991-1999. DOI: https://doi.org/10.1093/bioinformatics/btu177

Documents & Media

1991.full.pdf (1.2 MB, Adobe PDF)
Supplement (168 kB, Adobe PDF)

License

Published Version

Attribution 3.0 (CC BY 3.0)

Details

Authors
Leimeister, Chris-Andre; Boden, Marcus; Horwege, Sebastian; Lindner, Sebastian; Morgenstern, Burkhard 
Abstract
Motivation: Alignment-free methods for sequence comparison are increasingly used for genome analysis and phylogeny reconstruction; they circumvent various difficulties of traditional alignment-based approaches. In particular, alignment-free methods are much faster than pairwise or multiple alignments. They are, however, less accurate than methods based on sequence alignment. Most alignment-free approaches work by comparing the word composition of sequences. A well-known problem with these methods is that neighbouring word matches are far from independent.
Results: To reduce the statistical dependency between adjacent word matches, we propose to use 'spaced words', defined by patterns of 'match' and 'don't care' positions, for alignment-free sequence comparison. We describe a fast implementation of this approach using recursive hashing and bit operations, and we show that further improvements can be achieved by using multiple patterns instead of single patterns. To evaluate our approach, we use spaced-word frequencies as a basis for fast phylogeny reconstruction. Using real-world and simulated sequence data, we demonstrate that our multiple-pattern approach produces better phylogenies than approaches relying on contiguous words.
Issue Date
2014
Status
published
Publisher
Oxford University Press
Journal
Bioinformatics 
ISSN
1460-2059; 1367-4803
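
Illustration
The spaced-word idea described in the abstract can be sketched in a few lines of Python. This is a naive illustration only, not the authors' implementation, which the abstract says uses recursive hashing and bit operations. The function names, the encoding of patterns as strings of '1' (match) and '0' (don't care), the pooled normalisation across patterns, and the choice of Euclidean distance between frequency vectors are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def spaced_words(seq, pattern):
    """Yield the spaced word at each position of seq.

    pattern is a string of '1' (match) and '0' (don't care) characters;
    only the characters under a '1' are kept (assumed encoding).
    """
    match_positions = [i for i, c in enumerate(pattern) if c == "1"]
    for start in range(len(seq) - len(pattern) + 1):
        yield "".join(seq[start + i] for i in match_positions)


def spaced_word_frequencies(seq, patterns):
    """Relative frequencies of spaced words, pooled over several patterns."""
    counts = Counter()
    for pattern in patterns:
        for word in spaced_words(seq, pattern):
            # Key by (pattern, word) so that equal words obtained from
            # different patterns are counted separately.
            counts[(pattern, word)] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def euclidean_distance(freq_a, freq_b):
    """Euclidean distance between two sparse frequency vectors."""
    keys = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(k, 0.0) - freq_b.get(k, 0.0)) ** 2 for k in keys))
```

For example, with the pattern "101" the sequence "ACGTA" yields the spaced words AG, CT and GA. Pairwise distances computed this way for a set of sequences could then be passed to a distance-based tree-building method, which is the kind of use the abstract describes.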
