Local structure prediction with local structure-based sequence profiles |
| |
Authors: | Yang An-Suei Wang Lu-yong |
| |
Affiliation: | Department of Pharmacology, Columbia Genome Center, and Center for Computational Biology and Bioinformatics, Columbia University, 630 West 168th street, PH 7 W Room 318, New York, NY 10032, USA. ay1@columbia.edu |
| |
Abstract: | MOTIVATION: A large body of experimental and theoretical evidence suggests that local structural determinants are frequently encoded in short segments of protein sequence. Although the local structural information, once recognized, is particularly useful in protein structural and functional analyses, it remains a difficult problem to identify embedded local structural codes based solely on sequence information. RESULTS: In this paper, we describe a local structure prediction method aiming at predicting the backbone structures of nine-residue sequence segments. Two elements are the keys for this local structure prediction procedure. The first key element is the LSBSP1 database, which contains a large number of non-redundant local structure-based sequence profiles for nine-residue structure segments. The second key element is the consensus approach, which identifies a consensus structure from a set of hit structures. The local structure prediction procedure starts by matching a query sequence segment of nine consecutive amino acid residues to all the sequence profiles in the local structure-based sequence profile database (LSBSP1). The consensus structure, which is at the center of the largest structural cluster of the hit structures, is predicted to be the native state structure adopted by the query sequence segment. This local structure prediction method is assessed with a large set of random test protein structures that have not been used in constructing the LSBSP1 database. The benchmark results indicate that the prediction capacities of the novel local structure prediction procedure exceed the prediction capacities of the local backbone structure prediction methods based on the I-sites library by a significant margin. AVAILABILITY: All the computational and assessment procedures have been implemented in the integrated computational system PrISM.1 (Protein Informatics System for Modeling). The system and associated databases for LINUX systems can be downloaded from the website: http://www.columbia.edu/~ay1/. |
| |
Keywords: | |
本文献已被 PubMed Oxford 等数据库收录! |
|