Combining features in a graphical model to predict protein binding sites

Wierschin, Torsten; Wang, K.; Welter, Marlon; Waack, Stephan; Stanke, Mario

2015 | journal article. A publication with affiliation to the University of Göttingen.

Cite this publication

Wierschin, T.; Wang, K.; Welter, M.; Waack, S. & Stanke, M. (2015). Combining features in a graphical model to predict protein binding sites. Proteins: Structure, Function, and Bioinformatics, 83(5), pp. 844-852. DOI: https://doi.org/10.1002/prot.24775

License: GRO License

Details

Authors: Wierschin, Torsten; Wang, K.; Welter, Marlon; Waack, Stephan; Stanke, Mario
Abstract: Large efforts have been made in classifying residues as binding sites in proteins using machine learning methods. The prediction task can be translated into the computational challenge of assigning each residue the label binding site or non-binding site. Observational data come from various, possibly highly correlated, sources. They include the structure of the protein but not the structure of the complex. The model class of conditional random fields (CRFs) has previously been used successfully for protein binding site prediction. Here, a new CRF approach is presented that models the dependencies of residues using a general graphical structure defined as a neighborhood graph; our model therefore makes fewer independence assumptions on the labels than sequential labeling approaches. A novel node feature, change in free energy, is introduced into the model, which is then denoted F-CRF. Parameters are trained with an online large-margin algorithm. Using the standard feature class relative accessible surface area alone, the general graph-structure CRF already achieves higher prediction accuracy than the linear-chain CRF of Li et al. On a homodimer set containing 128 chains, F-CRF performs significantly better over a large range of false positive rates than the support-vector-machine-based program PresCont of Zellner et al. F-CRF has a broader scope than PresCont, since it is not constrained to protein subgroups and requires no multiple sequence alignment. The improvement is attributed to the advantageous combination of the novel node feature with the standard feature and to the adopted parameter training method. Proteins 2015; 83:844-852. © 2015 Wiley Periodicals, Inc.
Issue Date: 2015
Status: published
Publisher: Wiley-Blackwell
Journal: Proteins: Structure, Function, and Bioinformatics
ISSN: 1097-0134; 0887-3585
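The abstract combines three ingredients: a neighborhood graph over residues, a CRF with node and edge features, and online large-margin parameter training. The sketch below shows how such pieces can fit together, under assumptions that are not taken from the paper: residues are connected when hypothetical representative coordinates (for example C-alpha positions) lie within a distance cutoff; node features are arbitrary per-residue vectors standing in for features such as relative accessible surface area or change in free energy; inference uses iterated conditional modes (ICM), an approximate method chosen here only for brevity; and training uses a 1-best MIRA / passive-aggressive step, one common online large-margin update. The abstract does not state the cutoff, the inference method, or the exact update used for F-CRF, and the names `neighborhood_graph`, `feature_vector`, `predict` and `train_mira` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def neighborhood_graph(coords: Sequence[Sequence[float]], cutoff: float) -> List[Edge]:
    """Connect residues i < j whose (assumed) representative coordinates lie within cutoff."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.append((i, j))
    return edges


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def feature_vector(x, edges, y):
    """Joint feature map: per-label sums of node features plus counts of the 4 edge label pairs."""
    d = len(x[0])
    phi = [0.0] * (2 * d + 4)
    for i, xi in enumerate(x):
        for k, v in enumerate(xi):
            phi[y[i] * d + k] += v
    for i, j in edges:
        phi[2 * d + 2 * y[i] + y[j]] += 1.0
    return phi


def predict(w, x, edges, sweeps=20):
    """Approximate MAP labelling by iterated conditional modes (ICM); 1 = binding site."""
    d, n = len(x[0]), len(x)
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def local_score(i, label, y):
        # Node term for residue i plus every edge term touching i, neighbours held fixed.
        s = dot(w[label * d:(label + 1) * d], x[i])
        for j in nbrs[i]:
            a, b = (label, y[j]) if i < j else (y[j], label)
            s += w[2 * d + 2 * a + b]
        return s

    # Initialise from node terms only; ties go to label 0 (non-binding).
    y = [1 if dot(w[d:2 * d], xi) > dot(w[:d], xi) else 0 for xi in x]
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            best = 1 if local_score(i, 1, y) > local_score(i, 0, y) else 0
            if best != y[i]:
                y[i], changed = best, True
        if not changed:
            break
    return y


def train_mira(data, epochs=10, C=1.0):
    """Online large-margin training with a 1-best MIRA / passive-aggressive update."""
    d = len(data[0][0][0])
    w = [0.0] * (2 * d + 4)
    for _ in range(epochs):
        for x, edges, y_true in data:
            y_pred = predict(w, x, edges)
            loss = sum(a != b for a, b in zip(y_true, y_pred))  # Hamming loss
            if loss == 0:
                continue
            diff = [a - b for a, b in zip(feature_vector(x, edges, y_true),
                                          feature_vector(x, edges, y_pred))]
            hinge = loss - dot(w, diff)  # required margin minus achieved margin
            norm2 = dot(diff, diff)
            if hinge <= 0 or norm2 == 0:
                continue
            tau = min(C, hinge / norm2)  # smallest step meeting the margin, capped at C
            w = [wi + tau * di for wi, di in zip(w, diff)]
    return w


# Toy example: four residues spaced 3.8 apart along a line, one centred feature each.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
edges = neighborhood_graph(coords, cutoff=5.0)
x = [[1.0], [1.0], [-1.0], [-1.0]]
y = [1, 1, 0, 0]
w = train_mira([(x, edges, y)], epochs=5)
print(edges)                 # [(0, 1), (1, 2), (2, 3)]
print(predict(w, x, edges))  # [1, 1, 0, 0]
```

In a realistic setting the toy feature would be replaced by per-residue values such as relative accessible surface area and change in free energy, computed from the protein structure. ICM can get stuck in local optima on general graphs, so it is only a stand-in for whatever inference procedure the authors actually use.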