Use of a structural alphabet to find compatible folds for amino acid sequences |
| |
Authors: | Swapnil Mahajan, Alexandre G de Brevern, Yves‐Henri Sanejouand, Narayanaswamy Srinivasan, Bernard Offmann |
| |
Institution: | 1. Université de La Réunion, DSIMB, UMR‐S S1134, La Réunion, France; 2. INSERM, UMR‐S 1134, DSIMB, Paris, France; 3. Laboratoire d'Excellence, GR‐Ex, Paris, France; 4. Université de Nantes, UFIP CNRS UMR 6286 Faculté des Sciences et Techniques, Nantes Cedex 03, France; 5. Univ Paris Diderot, Sorbonne Paris Cité, UMR‐S 1134, Paris, France; 6. Institut National de la Transfusion Sanguine (INTS), Paris, France; 7. Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India; 8. Peaccel, Inc., Cambridge, Massachusetts |
| |
Abstract: | The structural annotation of proteins for which sequence‐search methods detect no homologs of known 3D structure is a major challenge today. We propose an original method that computes the conditional probabilities for the amino‐acid sequence of a protein to fit known protein 3D structures, using a structural alphabet known as “Protein Blocks” (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. This library is used to encode 3D protein structures as 1D PB sequences and to capture sequence‐to‐structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences onto protein folds encoded as PB sequences. It uses no information from residue contacts or sequence‐search methods and does not explicitly incorporate the hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z‐score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivities of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in the PDB: 38 compatible templates were identified by our approach, and 66% of these hits yielded correctly predicted structures. The method scales up well and offers promising perspectives for structural annotation at the genomic level. It has been implemented as a web server that is freely available at http://www.bo‐protscience.fr/forsa. |
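The scoring idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the counts are made-up numbers, the alphabet is cut down to 3 PBs and 4 amino acids, the log-odds form of the score, the gapless alignment, and the shuffle-based Z-score are all assumptions chosen for illustration, and the names `COUNTS`, `log_odds`, `thread_score`, and `z_score` are hypothetical.

```python
import math
import random
from statistics import mean, pstdev

# Hypothetical occurrence counts: COUNTS[pb][aa] is how often amino acid
# `aa` was seen at Protein Block `pb` in some training set (made-up numbers,
# reduced to 3 PBs and 4 amino acids for brevity).
COUNTS = {
    "m": {"A": 50, "L": 40, "V": 10, "G": 5},
    "d": {"A": 10, "L": 15, "V": 45, "G": 5},
    "c": {"A": 10, "L": 5, "V": 10, "G": 40},
}


def log_odds(counts, pseudocount=1.0):
    """Turn per-PB counts into log[P(aa | pb) / P(aa)] scores (assumed form)."""
    amino_acids = sorted({aa for row in counts.values() for aa in row})
    # Background total of each amino acid over all PBs, with pseudocounts.
    total = {
        aa: sum(row.get(aa, 0) + pseudocount for row in counts.values())
        for aa in amino_acids
    }
    grand_total = sum(total.values())
    scores = {}
    for pb, row in counts.items():
        row_total = sum(row.get(aa, 0) + pseudocount for aa in amino_acids)
        scores[pb] = {
            aa: math.log(
                ((row.get(aa, 0) + pseudocount) / row_total)
                / (total[aa] / grand_total)
            )
            for aa in amino_acids
        }
    return scores


def thread_score(seq, pb_seq, scores):
    """Gapless global threading: sum the score of each residue on its PB."""
    if len(seq) != len(pb_seq):
        raise ValueError("sequence and PB string must have equal length")
    return sum(scores[pb][aa] for aa, pb in zip(seq, pb_seq))


def z_score(seq, pb_seq, scores, n_shuffles=200, seed=0):
    """Compare the raw score with scores of shuffled copies of the query."""
    rng = random.Random(seed)
    residues = list(seq)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        null.append(thread_score(residues, pb_seq, scores))
    sd = pstdev(null)
    if sd == 0:
        # Shuffling cannot change the score (e.g. a uniform PB string).
        return 0.0
    return (thread_score(seq, pb_seq, scores) - mean(null)) / sd
```

Under these assumptions, a query would be threaded against every template PB string and a template retained when its Z-score exceeds a cutoff calibrated on a benchmark; the abstract reports such a cutoff chosen for 95% specificity, but its value is not given here and is not reproduced in the sketch.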
| |
Keywords: | protein structures; structural alphabet; fold recognition; protein domains; threading; sequence–structure relationship; structural annotation; protein blocks |
|
|