
1.

An analysis of core deformations in protein superfamilies

Leo-Macias A Lopez-Romero P Lupyan D Zerbino D Ortiz AR 《Biophysical journal》2005,88(2):1291-1299

An analysis is presented on how structural cores modify their shape across homologous proteins, and whether or not a relationship exists between these structural changes and the vibrational normal modes that proteins experience as a result of the topological constraints imposed by the fold. A set of 35 representative, well-populated protein families is studied. The evolutionary directions of deformation are obtained by using multiple structural alignments to superimpose the structures and extract a conserved core, together with principal components analysis to extract the main deformation modes from the three-dimensional superimposition. In parallel, a low-resolution normal mode analysis technique is employed to study the properties of the mechanical core plasticity of these same families. We show that the evolutionary deformations span a low dimensional space of 4-5 dimensions on average. A statistically significant correspondence exists between these principal deformations and the approximately 20 slowest vibrational modes accessible to a particular topology. We conclude that, to a significant extent, the structural response of a protein topology to sequence changes takes place by means of collective deformations along combinations of a small number of low-frequency modes. The findings have implications in structure prediction by homology modeling.
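As a rough illustration of the comparison this abstract describes, the sketch below measures how much of one deformation direction lies inside the subspace spanned by a few mode vectors. It assumes the mode vectors are already orthonormal. The function name `subspace_overlap` and the toy four-dimensional vectors are hypothetical and are not taken from the paper, whose actual statistical procedure is not given in the abstract.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [a / norm for a in v]


def subspace_overlap(deformation, modes):
    """Fraction (0..1) of a deformation direction captured by a mode subspace.

    Assumes `modes` is a list of orthonormal vectors (an assumption of this
    sketch). Returns sqrt(sum_k (d . m_k)^2) for the unit vector d.
    """
    d = normalize(deformation)
    return math.sqrt(sum(dot(d, m) ** 2 for m in modes))


# Toy example: two orthonormal "modes" in a 4-dimensional space.
modes = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
inside = subspace_overlap([3.0, 4.0, 0.0, 0.0], modes)   # fully inside the subspace
outside = subspace_overlap([0.0, 0.0, 2.0, 0.0], modes)  # orthogonal to the subspace
mixed = subspace_overlap([1.0, 0.0, 1.0, 0.0], modes)    # half in, half out
```

An overlap near 1 means the deformation is almost entirely a combination of the chosen modes; an overlap near 0 means the modes do not describe it.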

2.

Evolutionary plasticity of protein families: coupling between sequence and structure variation

Panchenko AR Wolf YI Panchenko LA Madej T 《Proteins》2005,61(3):535-544

In this work we examine how protein structural changes are coupled with sequence variation in the course of evolution of a family of homologs. The sequence-structure correlation analysis performed on 81 homologous protein families shows that the majority of them exhibit statistically significant linear correlation between the measures of sequence and structural similarity. We observed, however, that there are cases where structural variability cannot be mainly explained by sequence variation, such as protein families with a number of disulfide bonds. To understand whether structures from different families and/or folds evolve in the same manner, we compared the degrees of structural change per unit of sequence change ("the evolutionary plasticity of structure") between those families with a significant linear correlation. Using rigorous statistical procedures we find that, with a few exceptions, evolutionary plasticity does not show a statistically significant difference between protein families. Similar sequence-structure analysis performed for protein loop regions shows that evolutionary plasticity of loop regions is greater than for the protein core.
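The "structural change per unit of sequence change" in this abstract can be read as the slope of a straight-line fit. The sketch below is a minimal ordinary least-squares fit with a Pearson correlation coefficient. The function name `linear_fit`, the choice of percent non-identity and RMSD as the two measures, and the data values are all illustrative assumptions, not the measures or data used by the authors.

```python
import math


def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept, plus Pearson r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Made-up pairwise comparisons within one hypothetical family:
# sequence divergence (percent non-identity) versus structural deviation (RMSD).
seq_divergence = [10.0, 20.0, 30.0, 40.0, 50.0]
struct_deviation = [0.5, 0.9, 1.3, 1.7, 2.1]

plasticity, offset, r = linear_fit(seq_divergence, struct_deviation)
```

Here `plasticity` plays the role of the per-family slope; comparing such slopes between families would need the significance tests that the abstract mentions but does not specify.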

3.

The S. cerevisiae architectural HMGB protein NHP6A complexed with DNA: DNA and protein conformational changes upon binding

Masse JE Wong B Yen YM Allain FH Johnson RC Feigon J 《Journal of molecular biology》2002,323(2):263-284

NHP6A is a non-sequence-specific DNA-binding protein from Saccharomyces cerevisiae which belongs to the HMGB protein family. Previously, we have solved the structure of NHP6A in the absence of DNA and modeled its interaction with DNA. Here, we present the refined solution structures of the NHP6A-DNA complex as well as the free 15bp DNA. Both the free and bound forms of the protein adopt the typical L-shaped HMGB domain fold. The DNA in the complex undergoes significant structural rearrangement from its free form while the protein shows smaller but significant conformational changes in the complex. Structural and mutational analysis as well as comparison of the complex with the free DNA provides insight into the factors that contribute to binding site selection and DNA deformations in the complex. Further insight into the amino acid determinants of DNA binding by HMGB domain proteins is given by a correlation study of NHP6A and 32 other HMGB domains belonging to both the DNA-sequence-specific and non-sequence-specific families of HMGB proteins. The resulting correlations can be rationalized by comparison of solved structures of HMGB proteins.

4.

Pre-existing soft modes of motion uniquely defined by native contact topology facilitate ligand binding to proteins

Meireles L Gur M Bakan A Bahar I 《Protein science : a publication of the Protein Society》2011,20(10):1645-1658

Modeling protein flexibility constitutes a major challenge in accurate prediction of protein-ligand and protein-protein interactions in docking simulations. The lack of a reliable method for predicting the conformational changes relevant to substrate binding prevents the productive application of computational docking to proteins that undergo large structural rearrangements. Here, we examine how coarse-grained normal mode analysis has been advantageously applied to modeling protein flexibility associated with ligand binding. First, we highlight recent studies that have shown that there is a close agreement between the large-scale collective motions of proteins predicted by elastic network models and the structural changes experimentally observed upon ligand binding. Then, we discuss studies that have exploited the predicted soft modes in docking simulations. Two general strategies are noted: pregeneration of conformational ensembles that are then utilized as input for standard fixed-backbone docking and protein structure deformation along normal modes concurrent to docking. These studies show that the structural changes apparently "induced" upon ligand binding occur selectively along the soft modes accessible to the protein prior to ligand binding. They further suggest that proteins offer suitable means of accommodating/facilitating the recognition and binding of their ligand, presumably acquired by evolutionary selection of the suitable three-dimensional structure.

5.

Walking through the protein sequence space: towards new generation of the homology modeling

Frenkel ZM Trifonov EN 《Proteins》2007,67(2):271-284

A new method is proposed to reveal apparent evolutionary relationships between protein fragments with similar 3D structures by finding "intermediate" sequences in the proteomic database. Instead of looking for homologies and intermediates for a whole protein domain, we build a chain of intermediate short sequences, which allows one to link similar structural modules of proteins belonging to the same or different families. Several such chains of intermediates can be combined into an evolutionary tree of structural protein modules. All calculations were made for protein fragments of 20 aa residues. Three evolutionary trees for different module structures are described. The aim of the paper is to introduce the new method and to demonstrate its potential for protein structural predictions. The approach also opens new perspectives for protein evolution studies.

6.

The use of soluble protein structures in modeling helical proteins in a layered membrane

Hong Wing Lee Hong Ching Lee Lawrence K. Lee Erdahl T. Teber 《Journal of biomolecular structure & dynamics》2013,31(2):308-318

Major advances have been made in the prediction of soluble protein structures, led by the knowledge-based modeling methods that extract useful structural trends from known protein structures and incorporate them into scoring functions. The same cannot be reported for the class of transmembrane proteins, primarily due to the lack of high-resolution structural data for transmembrane proteins, which renders many of the knowledge-based methods unreliable or invalid. We have developed a method that harnesses the vast structural knowledge available in soluble protein data for use in the modeling of transmembrane proteins. At the core of the method is a set of transmembrane protein decoy sets that allow us to filter features recognized from soluble proteins and train them into a set of scoring functions for transmembrane protein modeling. We have demonstrated that structures of soluble proteins can provide significant insight into transmembrane protein structures. A complementary novel two-stage modeling/selection process that mimics the two-stage helical membrane protein folding was developed. Combined with the scoring function, the method was successfully applied to model 5 transmembrane proteins. The root mean square deviations of the predicted models from the native structures ranged from 5.0 to 8.8 Å.
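For readers unfamiliar with the accuracy measure quoted in this abstract, the following sketch computes a root mean square deviation between two coordinate sets. It assumes the model and native structures are already superimposed and have matching atom order; the optimal superposition step is omitted. The function name `rmsd` and the two-atom coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two pre-superimposed coordinate lists.

    Each argument is a list of (x, y, z) tuples in the same atom order.
    No superposition is performed here; that is an assumption of this sketch.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy example: every atom of the model is displaced by 3 Å along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
deviation = rmsd(native, model)
```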

7.

The combined effects of amino acid substitutions and indels on the evolution of structure within protein families

Zhang Z Wang Y Wang L Gao P 《PloS one》2010,5(12):e14316

Background

In the process of protein evolution, sequence variations within protein families can cause changes in protein structures and functions. However, structures tend to be more conserved than sequences and functions. This leads to an intriguing question: what is the evolutionary mechanism by which sequence variations produce structural changes? To investigate this question, we focused on the most common types of sequence variations: amino acid substitutions and insertions/deletions (indels). Here their combined effects on protein structure evolution within protein families are studied.

Results

Sequence-structure correlation analysis on 75 homologous structure families (from SCOP) that contain 20 or more non-redundant structures shows that in most of these families there is, statistically, a bilinear correlation between the amount of substitutions and indels versus the degree of structure variations. Bilinear regression of percent sequence non-identity (PNI) and standardized number of gaps (SNG) versus RMSD was performed. The coefficients from the regression analysis could be used to estimate the structure changes caused by each unit of substitution (structural substitution sensitivity, SSS) and by each unit of indel (structural indel sensitivity, SIDS). An analysis on 52 families with high bilinear fitting multiple correlation coefficients and statistically significant regression coefficients showed that SSS is mainly constrained by disulfide bonds, which almost have no effects on SIDS.

Conclusions

Structural changes in homologous protein families could be rationally explained by a bilinear model combining amino acid substitutions and indels. These results may further improve our understanding of the evolutionary mechanisms of protein structures.
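The bilinear regression described in the Results can be sketched as a two-predictor least-squares fit, RMSD ≈ SSS × PNI + SIDS × SNG + c. The version below solves the normal equations with a small Gaussian elimination in plain Python. Whether the authors included an intercept term is not stated in the abstract, so the constant `c` is an assumption, as are the function names and the synthetic data.

```python
def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def bilinear_fit(pni, sng, rmsd_values):
    """Fit rmsd = sss * pni + sids * sng + c via the normal equations."""
    rows = [[p, s, 1.0] for p, s in zip(pni, sng)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, rmsd_values)) for i in range(3)]
    return solve_linear(xtx, xty)


# Synthetic data generated from sss = 0.02, sids = 0.5, c = 0.3 (illustrative only).
pni = [10.0, 20.0, 30.0, 40.0, 50.0]   # percent sequence non-identity
sng = [1.0, 0.0, 2.0, 1.0, 3.0]        # standardized number of gaps
rmsd_values = [1.0, 0.7, 1.9, 1.6, 2.8]

sss, sids, c = bilinear_fit(pni, sng, rmsd_values)
```

In this reading, `sss` estimates the structural change per unit of substitution and `sids` the change per unit of indel, matching the two sensitivities named in the abstract.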

8.

Acyl carrier protein structural classification and normal mode analysis

Cantu DC Forrester MJ Charov K Reilly PJ 《Protein science : a publication of the Protein Society》2012,21(5):655-666

All acyl carrier protein primary and tertiary structures were gathered into the ThYme database. They are classified into 16 families by amino acid sequence similarity, with members of the different families having sequences with statistically highly significant differences. These classifications are supported by tertiary structure superposition analysis. Tertiary structures from a number of families are very similar, suggesting that these families may come from a single distant ancestor. Normal vibrational mode analysis was conducted on experimentally determined freestanding structures, showing greater fluctuations at chain termini and loops than in most helices. Their modes overlap more so within families than between different families. The tertiary structures of three acyl carrier protein families that lacked any known structures were predicted as well.

9.

Three-dimensional structures of membrane proteins from genomic sequencing (cited by: 1; self-citations: 0; citations by others: 1)

Hopf TA Colwell LJ Sheridan R Rost B Sander C Marks DS 《Cell》2012,149(7):1607-1621

We show that amino acid covariation in proteins, extracted from the evolutionary sequence record, can be used to fold transmembrane proteins. We use this technique to predict previously unknown 3D structures for 11 transmembrane proteins (with up to 14 helices) from their sequences alone. The prediction method (EVfold_membrane) applies a maximum entropy approach to infer evolutionary covariation in pairs of sequence positions within a protein family and then generates all-atom models with the derived pairwise distance constraints. We benchmark the approach with blinded de novo computation of known transmembrane protein structures from 23 families, demonstrating unprecedented accuracy of the method for large transmembrane proteins. We show how the method can predict oligomerization, functional sites, and conformational changes in transmembrane proteins. With the rapid rise in large-scale sequencing, more accurate and more comprehensive information on evolutionary constraints can be decoded from genetic variation, greatly expanding the repertoire of transmembrane proteins amenable to modeling by this method.

10.

Classification of 29 families of secondary transport proteins into a single structural class using hydropathy profile analysis

Lolkema JS Slotboom DJ 《Journal of molecular biology》2003,327(5):901-909

A classification scheme for membrane proteins is proposed that clusters families of proteins into structural classes based on hydropathy profile analysis. The averaged hydropathy profiles of protein families are taken as fingerprints of the 3D structure of the proteins and, therefore, are able to detect more distant evolutionary relationships than amino acid sequences. A procedure was developed in which hydropathy profile analysis is used initially as a filter in a BLAST search of the NCBI protein database. The strength of the procedure is demonstrated by the classification of 29 families of secondary transporters into a single structural class, termed ST[3]. An exhaustive search of the database revealed that the 29 families contain 568 unique sequences. The proteins are predominantly of prokaryotic origin; most of the characterized transporters in ST[3] transport organic and inorganic anions, and a smaller number are Na(+)/H(+) antiporters. All modes of energy coupling (symport, antiport, uniport) are found in structural class ST[3]. The relevance of the classification for structure/function prediction of uncharacterised transporters in the class is discussed.
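A hydropathy profile of the kind this abstract relies on can be sketched as a sliding-window average of per-residue hydropathy values. The example below uses the Kyte-Doolittle scale and a single sequence; the abstract names neither the scale nor the window size, and the family-averaged profiles it describes would additionally require a multiple sequence alignment. The function name `hydropathy_profile` and the six-residue test sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=5):
    """Sliding-window mean hydropathy along a one-letter amino acid sequence.

    Returns one value per full window, so the result has
    len(sequence) - window + 1 entries.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Toy sequence: a hydrophobic stretch followed by charged residues.
profile = hydropathy_profile("AILKDE", window=3)
```

The profile falls from positive to negative values as the window moves from the hydrophobic residues to the charged ones, which is the kind of signal used to locate membrane-spanning segments.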

11.

Emergence of protein fold families through rational design

Ding F Dokholyan NV 《PLoS computational biology》2006,2(7):e85

Diverse proteins with similar structures are grouped into families of homologs and analogs, if their sequence similarity is higher or lower, respectively, than 20%–30%. It was suggested that protein homologs and analogs originate from a common ancestor and diverge in their distinct evolutionary time scales, emerging as a consequence of the physical properties of the protein sequence space. Although a number of studies have determined key signatures of protein family organization, the sequence-structure factors that differentiate the two evolution-related protein families remain unknown. Here, we stipulate that subtle structural changes, which appear due to accumulating mutations in the homologous families, lead to distinct packing of the protein core and, thus, novel compositions of core residues. The latter process leads to the formation of distinct families of homologs. We propose that such differentiation results in the formation of analogous families. To test our postulate, we developed a molecular modeling and design toolkit, Medusa, to computationally design protein sequences that correspond to the same fold family. We find that analogous proteins emerge when a backbone structure deviates only 1–2 Å root-mean-square deviation from the original structure. For close homologs, core residues are highly conserved. However, when the overall sequence similarity drops to ~25%–30%, the composition of core residues starts to diverge, thereby forming novel families of protein homologs. This direct observation of the formation of protein homologs within a specific fold family supports our hypothesis. The conservation of amino acids in designed sequences recapitulates that of the naturally occurring sequences, thereby validating our computational design methodology.

12.

Quantitative organization of the known protein x-ray structures. I. Methods and short-length-scale results (cited by: 5; self-citations: 0; citations by others: 5)

S Rackovsky 《Proteins》1990,7(4):378-402

We address herein the problem of delineating the relationships between the known protein structures. In order to study this problem, methods have been developed to represent arbitrarily sized fragments of biopolymer backbone, and to compare distributions of such fragments. These methods are applied to a classification of 123 structures representing the entire set of known x-ray structures. The resulting data are analyzed (on the four-C alpha length scale) to determine both the large-scale organization of the set of known structures (i.e., the relationships between large groups of structures, each comprised of proteins that are structurally related) and its local structure (i.e., the quantitative degree of similarity between any two specific structures). It is shown that the set of structures forms a continuum of structural types, ranging from all-helical to all-sheet/barrel proteins. It is further demonstrated that the density of protein structures is not uniform across this continuum, but rather that structures cluster in certain regions, separated by regions of lower population. The properties of the various regions of the structural space are determined. The existence is demonstrated of strong quantitative correlations between the contents of different types of four-C alpha fragments within protein structures, which imply significant constraints on the types of architecture that can occur in proteins. Analysis of the distribution of structures demonstrates some hitherto unsuspected similarities and suggests that, in some circumstances, neither structural similarity nor sequence homology may be necessary conditions for evolutionary relationship between proteins. It is also suggested that these unsuspected similarities may imply similar folding mechanisms for structures of apparently different global architecture. Cases are also noted in which apparently similar structures may fold by different mechanisms. 
The connection between structure and dynamic properties is discussed, and a possible role of dynamics in the evolution of protein structures is suggested. The sensitivity of the methods presented herein to anomalies of structure refinement is demonstrated. It is suggested that the present results provide a framework for analyzing experimental results on structural similarity obtained using vibrational circular dichroism spectra, which are sensitive to local backbone structure.

13.

Exploiting protein structure data to explore the evolution of protein function and biological complexity

Marsden RL Ranea JA Sillero A Redfern O Yeats C Maibaum M Lee D Addou S Reeves GA Dallman TJ Orengo CA 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2006,361(1467):425-440

New directions in biology are being driven by the complete sequencing of genomes, which has given us the protein repertoires of diverse organisms from all kingdoms of life. In tandem with this accumulation of sequence data, worldwide structural genomics initiatives, advanced by the development of improved technologies in X-ray crystallography and NMR, are expanding our knowledge of structural families and increasing our fold libraries. Methods for detecting remote sequence similarities have also been made more sensitive and this means that we can map domains from these structural families onto genome sequences to understand how these families are distributed throughout the genomes and reveal how they might influence the functional repertoires and biological complexities of the organisms. We have used robust protocols to assign sequences from completed genomes to domain structures in the CATH database, allowing up to 60% of domain sequences in these genomes, depending on the organism, to be assigned to a domain family of known structure. Analysis of the distribution of these families throughout bacterial genomes identified more than 300 universal families, some of which had expanded significantly in proportion to genome size. These highly expanded families are primarily involved in metabolism and regulation and appear to make major contributions to the functional repertoire and complexity of bacterial organisms. When comparisons are made across all kingdoms of life, we find a smaller set of universal domain families (approx. 140), of which families involved in protein biosynthesis are the largest conserved component. Analysis of the behaviour of other families reveals that some (e.g. those involved in metabolism, regulation) have remained highly innovative during evolution, making it harder to trace their evolutionary ancestry. 
Structural analyses of metabolic families provide some insights into the mechanisms of functional innovation, which include changes in domain partnerships and significant structural embellishments leading to modulation of active sites and protein interactions.

14.

Folding studies of immunoglobulin-like beta-sandwich proteins suggest that they share a common folding pathway.

J Clarke E Cota S B Fowler S J Hamill 《Structure (London, England : 1993)》1999,7(9):1145-1153

BACKGROUND: Are folding pathways conserved in protein families? To test this explicitly and ask to what extent structure specifies folding pathways requires comparison of proteins with a common fold. Our strategy is to choose members of a highly diverse protein family with no conservation of function and little or no sequence identity, but with structures that are essentially the same. The immunoglobulin-like fold is one of the most common structural families, and is subdivided into superfamilies with no detectable evolutionary or functional relationship. RESULTS: We compared the folding of a number of immunoglobulin-like proteins that have a common structural core and found a strong correlation between folding rate and stability. The results suggest that the folding pathways of these immunoglobulin-like proteins share common features. CONCLUSIONS: This study is the first to compare the folding of structurally related proteins that are members of different superfamilies. The most likely explanation for the results is that interactions that are important in defining the structure of immunoglobulin-like proteins are also used to guide folding.

15.

S100-annexin complexes--structural insights

Rintala-Dempsey AC Rezvanpour A Shaw GS 《The FEBS journal》2008,275(20):4956-4966

Annexins and S100 proteins represent two large, but distinct, calcium-binding protein families. Annexins are made up of a highly alpha-helical core domain that binds calcium ions, allowing them to interact with phospholipid membranes. Furthermore, some annexins, such as annexins A1 and A2, contain an N-terminal region that is expelled from the core domain on calcium binding. These events allow for the interaction of the annexin N-terminus with target proteins, such as S100. In addition, when an S100 protein binds calcium ions, it undergoes a structural reorientation of its helices, exposing a hydrophobic patch capable of interacting with its targets, including the N-terminal sequences of annexins. Structural studies of the complexes between members of these two families have revealed valuable details regarding the mechanisms of the interactions, including the binding surfaces and conformation of the annexin N-terminus. However, other S100-annexin interactions, such as those between S100A11 and annexin A6, or between dicalcin and annexins A1, A2 and A5, appear to be more complicated, involving the annexin core region, perhaps in concert with the N-terminus. The diversity of these interactions indicates that multiple forms of recognition exist between S100 proteins and annexins. S100-annexin interactions have been suggested to play a role in membrane fusion events by the bridging together of two annexin proteins, bound to phospholipid membranes, by an S100 protein. The structures and differential interactions of S100-annexin complexes may indicate that this process has several possible modes of protein-protein recognition.

16.

The Structural Determinants of Intra-Protein Compensatory Substitutions

Shilpi Chaurasia Julien Y. Dutheil 《Molecular biology and evolution》2022,39(4)

Compensatory substitutions happen when one mutation is advantageously selected because it restores the loss of fitness induced by a previous deleterious mutation. How frequently such mutations occur in evolution and what structural and functional context permits their emergence remain open questions. We built an atlas of intra-protein compensatory substitutions using a phylogenetic approach and a dataset of 1,630 bacterial protein families for which high-quality sequence alignments and experimentally derived protein structures were available. We identified more than 51,000 positions coevolving by means of predicted compensatory mutations. Using the evolutionary and structural properties of the analyzed positions, we demonstrate that compensatory mutations are scarce (typically only a few in the protein history) but widespread (the majority of proteins experienced at least one). Typical coevolving residues are evolving slowly, are located in the protein core outside secondary structure motifs, and are more often in contact than expected by chance, even after accounting for their evolutionary rate and solvent exposure. An exception to this general scheme is residues coevolving for charge compensation, which are evolving faster than noncoevolving sites, in contradiction with predictions from simple coevolutionary models, but similar to stem pairs in RNA. While sites with a significant pattern of coevolution by compensatory mutations are rare, the comparative analysis of hundreds of structures ultimately permits a better understanding of the link between the three-dimensional structure of a protein and its fitness landscape.

17.

Resonance Raman spectra of plastocyanin and pseudoazurin: evidence for conserved cysteine ligand conformations in cupredoxins (blue copper proteins) (cited by: 1; self-citations: 0; citations by others: 1)

J Han E T Adman T Beppu R Codd H C Freeman L L Huq T M Loehr J Sanders-Loehr 《Biochemistry》1991,30(45):10904-10913

New resonance Raman (RR) spectra at 15 K are reported for poplar (Populus nigra) and oleander (Oleander nerium) plastocyanins and for Alcaligenes faecalis pseudoazurin. The spectra are compared with those of other blue copper proteins (cupredoxins). In all cases, nine or more vibrational modes between 330 and 460 cm-1 can be assigned to a coupling of the Cu-S(Cys) stretch with Cys ligand deformations. The fact that these vibrations occur at a relatively constant set of frequencies is testimony to the highly conserved ground-state structure of the Cu-Cys moiety. Shifts of the vibrational modes by 1-3 cm-1 upon deuterium exchange can be correlated with N-H...S hydrogen bonds from the protein backbone to the sulfur of the Cys ligand. There is marked variability in the intensities of these Cys-related vibrations, such that each class of cupredoxin has its own pattern of RR intensities. For example, plastocyanins from poplar, oleander, French bean, and spinach have their most intense feature at approximately 425 cm-1; azurins show greatest intensity at approximately 410 cm-1, stellacyanin and ascorbate oxidase at approximately 385 cm-1, and nitrite reductase at approximately 360 cm-1. These variable intensity patterns are related to differences in the electronic excited-state structures. We propose that they have a basis in the protein environment of the copper-cysteinate chromophore. A further insight into the vibrational spectra is provided by the structures of the six cupredoxins for which crystallographic refinements at high resolution are available (plastocyanins from P. nigra, O. nerium, and Enteromorpha prolifera, pseudoazurin from A. faecalis, azurin from Alcaligenes denitrificans, and cucumber basic blue protein). The average of the Cu-S(Cys) bond lengths is 2.12 +/- 0.05 A. Since the observed range of bond lengths falls within the precision of the determinations, this variation is considered insignificant. 
The Cys ligand dihedral angles are also highly conserved. Cu-S gamma-C beta-C alpha is always near -170 degrees and S gamma-C beta-C alpha-N near 170 degrees. As a result, the Cu-S gamma bond is coplanar with the Cys side-chain atoms and part of the polypeptide backbone. The coplanarity accounts for the extensive coupling of Cu-S stretching and Cys deformation modes as seen in the RR spectrum. The conservation of this copper-cysteinate conformation in cupredoxins may indicate a favored pathway for electron transfer.

18.

The role of internal duplication in the evolution of multi-domain proteins

J.C. Nacher M. Hayashida T. Akutsu 《Bio Systems》2010,101(2):127-135

Many proteins consist of several structural domains. These multi-domain proteins have likely been generated by selective genome growth dynamics during evolution to perform new functions as well as to create structures that fold on a biologically feasible time scale. Domain units frequently evolved through a variety of genetic shuffling mechanisms. Here we examine the protein domain statistics of more than 1000 organisms including eukaryotic, archaeal and bacterial species. The analysis extends earlier findings on asymmetric statistical laws for proteome to a wider variety of species. While proteins are composed of a wide range of domains, displaying a power-law decay, the computation of domain families for each protein reveals an exponential distribution, characterizing a protein universe composed of a thin number of unique families. Structural studies in proteomics have shown that domain repeats, or internal duplicated domains, represent a small but significant fraction of genome. In spite of its importance, this observation has been largely overlooked until recently. We model the evolutionary dynamics of proteome and demonstrate that these distinct distributions are in fact rooted in an internal duplication mechanism. This process generates the contemporary protein structural domain universe, determines its reduced thickness, and tames its growth. These findings have important implications, ranging from protein interaction network modeling to evolutionary studies based on fundamental mechanisms governing genome expansion.

19.

Multiple flexible structure alignment using partial order graphs (cited by: 2; self-citations: 0; citations by others: 2)

Ye Y Godzik A 《Bioinformatics (Oxford, England)》2005,21(10):2362-2369

MOTIVATION: Existing comparisons of protein structures are not able to describe structural divergence and flexibility in the structures being compared because they focus on identifying a common invariant core and ignore parts of the structures outside this core. Understanding the structural divergence and flexibility is critical for studying the evolution of functions and specificities of proteins. RESULTS: A new method of multiple protein structure alignment, POSA (Partial Order Structure Alignment), was developed using a partial order graph representation of multiple alignments. POSA has two unique features: (1) identifies and classifies regions that are conserved only in a subset of input structures and (2) allows internal rearrangements in protein structures. POSA outperforms other programs in the cases where structural flexibilities exist and provides new insights by visualizing the mosaic nature of multiple structural alignments. POSA is an ideal tool for studying the variation of protein structures within diverse structural families. AVAILABILITY: POSA is freely available for academic users on a Web server at http://fatcat.burnham.org/POSA

20.

Sequence and hydropathy profile analysis of two classes of secondary transporters

Lolkema JS Slotboom DJ 《Molecular membrane biology》2005,22(3):177-189

A structural class in the MemGen classification of membrane proteins is a set of evolutionary related proteins sharing a similar global fold. A structural class contains both closely related pairs of proteins for which homology is clear from sequence comparison and very distantly related pairs, for which it is not possible to establish homology based on sequence similarity alone. In the latter case the evolutionary link is based on hydropathy profile analysis. Here, we use these evolutionary related sets of proteins to analyze the relationship between E-values in BLAST searches, sequence similarities in multiple sequence alignments and structural similarities in hydropathy profile analyses. Two structural classes of secondary transporters termed ST[3], which includes the Ion Transporter (IT) superfamily and ST[4], which includes the DAACS family (TC# 2.A.23) were extracted from the NCBI protein database. ST[3] contains 2051 unique sequences distributed over 32 families and 59 subfamilies. ST[4] is a smaller class containing 399 unique sequences distributed over 2 families and 7 subfamilies. One subfamily in ST[4] contains a new class of binding protein dependent secondary transporters. Comparison of the averaged hydropathy profiles of the subfamilies in ST[3] and ST[4] revealed that the two classes represent different folds. Divergence of the sequences in ST[4] is much smaller than observed in ST[3], suggesting different constraints on the proteins during evolution. 
Analysis of the correlation between the evolutionary relationship of pairs of proteins in a class and the BLAST E-value revealed that: (i) the BLAST algorithm is unable to pick up the majority of the links between proteins in structural class ST[3], (ii) "low complexity filtering" and "composition based statistics" improve the specificity, but strongly reduce the sensitivity of BLAST searches for distantly related proteins, indicating that these filters are too stringent for the proteins analyzed, and (iii) the E-value cut-off, which may be used to evaluate evolutionary significance of a hit in a BLAST search is very different for the two structural classes of membrane proteins.