A Novel Sequence-Based Feature for the Identification of DNA-Binding Sites in Proteins Using Jensen-Shannon Divergence

2016 | journal article. A publication with affiliation to the University of Göttingen.


Cite this publication

A Novel Sequence-Based Feature for the Identification of DNA-Binding Sites in Proteins Using Jensen-Shannon Divergence
Dang, T. K. L.; Meckbach, C.; Tacke, R.; Waack, S. & Gueltas, M. (2016)
Entropy, 18(10), art. 379. DOI: https://doi.org/10.3390/e18100379

Documents & Media

entropy-18-00379.pdf (1.17 MB, Adobe PDF)

License

Published Version

Attribution 4.0 CC BY 4.0

Details

Authors
Dang, Truong Khanh Linh; Meckbach, Cornelia; Tacke, Rebecca; Waack, Stephan; Gueltas, Mehmet
Abstract
Knowledge of protein-DNA interactions is essential for fully understanding the molecular activities of life. Many research groups have developed tools that use either structure-based or sequence-based approaches to predict DNA-binding residues in proteins. Structure-based methods usually achieve good results but require knowledge of the protein's 3D structure, while sequence-based methods can be applied to proteins in high throughput but require good features. In this study, we present a new information-theoretic feature derived from the Jensen-Shannon divergence (JSD) between the amino acid distribution of a site and the background distribution of non-binding sites. The new feature indicates how much a given site differs from a non-binding site and is therefore informative for detecting binding sites in proteins. We conduct the study with five-fold cross-validation on 263 proteins using the Random Forest classifier. We evaluate our new features by combining them with other popular existing features such as the position-specific scoring matrix (PSSM), orthogonal binary vector (OBV), and secondary structure (SS). Adding our features significantly boosts the performance of the Random Forest classifier, with a clear increase in sensitivity and Matthews correlation coefficient (MCC).
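To illustrate the kind of quantity the abstract describes, the following is a minimal sketch of the standard Jensen-Shannon divergence between the amino acid distribution of one alignment column and a background distribution. It is not the authors' implementation: the function names, the pseudocount, the example column, and the uniform background (a stand-in for the non-binding-site background, which the abstract does not specify) are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def column_distribution(column, pseudocount=1e-6):
    """Relative amino acid frequencies of one alignment column.

    The pseudocount is an assumption; it only keeps every probability positive.
    """
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return [(counts[a] + pseudocount) / total for a in AMINO_ACIDS]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Standard JSD with equal weights; lies in [0, 1] when using log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical inputs: a column rich in lysine/arginine and a uniform
# background used here only as a placeholder for the non-binding background.
uniform_background = [1 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
site = column_distribution("KKKRKKRK")
score = jensen_shannon_divergence(site, uniform_background)
print(f"JSD score: {score:.3f}")
```

A larger score means the site's residue distribution departs more strongly from the background. In practice the background would be estimated from known non-binding sites rather than assumed uniform.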
Issue Date
2016
Status
published
Publisher
MDPI AG
Journal
Entropy 
ISSN
1099-4300
Sponsor
Open-Access-Publikationsfonds 2016
