Fast alignment-free sequence comparison using spaced-word frequencies

2014 | Journal article. A publication affiliated with the Georg-August-Universität Göttingen.


Suggested citation

Fast alignment-free sequence comparison using spaced-word frequencies
Leimeister, C.-A.; Boden, M.; Horwege, S.; Lindner, S. & Morgenstern, B. (2014)
Bioinformatics 30(14), pp. 1991-1999. DOI: https://doi.org/10.1093/bioinformatics/btu177

Documents & Media

1991.full.pdf (1.2 MB, Adobe PDF)
Supplement (168 kB, Adobe PDF)

License

Published Version

Attribution 3.0 (CC BY 3.0)

Details

Author(s)
Leimeister, Chris-Andre; Boden, Marcus; Horwege, Sebastian; Lindner, Sebastian; Morgenstern, Burkhard 
Abstract
Motivation: Alignment-free methods for sequence comparison are increasingly used for genome analysis and phylogeny reconstruction; they circumvent various difficulties of traditional alignment-based approaches. In particular, alignment-free methods are much faster than pairwise or multiple alignments. They are, however, less accurate than methods based on sequence alignment. Most alignment-free approaches work by comparing the word composition of sequences. A well-known problem with these methods is that neighbouring word matches are far from independent.
Results: To reduce the statistical dependency between adjacent word matches, we propose to use 'spaced words', defined by patterns of 'match' and 'don't care' positions, for alignment-free sequence comparison. We describe a fast implementation of this approach using recursive hashing and bit operations, and we show that further improvements can be achieved by using multiple patterns instead of single patterns. To evaluate our approach, we use spaced-word frequencies as a basis for fast phylogeny reconstruction. Using real-world and simulated sequence data, we demonstrate that our multiple-pattern approach produces better phylogenies than approaches relying on contiguous words.
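The idea of spaced words described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (which the abstract says uses recursive hashing and bit operations); it uses plain string slicing for readability. The function names, the pattern notation ('1' for a match position, '0' for a don't-care position), the choice of Euclidean distance between relative frequency vectors, and the averaging over multiple patterns are all assumptions made for this illustration.

```python
from collections import Counter
from math import sqrt


def spaced_words(seq, pattern):
    """Yield the spaced word at each position of seq.

    pattern is a string of '1' (match) and '0' (don't care);
    only characters at match positions are kept.
    """
    match_positions = [i for i, c in enumerate(pattern) if c == "1"]
    for start in range(len(seq) - len(pattern) + 1):
        yield "".join(seq[start + i] for i in match_positions)


def relative_frequencies(seq, pattern):
    """Map each spaced word to its relative frequency in seq."""
    counts = Counter(spaced_words(seq, pattern))
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()} if total else {}


def euclidean_distance(f, g):
    """Euclidean distance between two sparse frequency vectors."""
    words = set(f) | set(g)
    return sqrt(sum((f.get(w, 0.0) - g.get(w, 0.0)) ** 2 for w in words))


def spaced_word_distance(a, b, patterns):
    """Average the per-pattern distance over several patterns (an assumption)."""
    dists = [
        euclidean_distance(relative_frequencies(a, p), relative_frequencies(b, p))
        for p in patterns
    ]
    return sum(dists) / len(dists)
```

For example, `list(spaced_words("ACGTAC", "101"))` gives `["AG", "CT", "GA", "TC"]`: the middle character of each length-3 window is ignored. A matrix of such pairwise distances could then be passed to a distance-based tree-building method.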
Publication date
2014
Status
published
Publisher
Oxford Univ Press
Journal
Bioinformatics 
ISSN
1460-2059; 1367-4803
