Similar Literature
20 similar documents found; search time 109 ms
1.
We classified the carboxylic ester hydrolases (CEHs) into families and clans using multiple sequence alignments, secondary structure analysis, and tertiary structure superpositions. This work establishes, for the first time, a complete systematic structural classification of these enzymes. Members of a family have similar primary, secondary, and tertiary structures, and their active sites and reaction mechanisms are conserved. Families may be grouped into clans when they share similar secondary and tertiary structures, even though the primary structures of members of different families are not similar. CEHs were gathered from public databases using the Basic Local Alignment Search Tool (BLAST) and divided into 91 families, 36 of which are grouped into five clans. Members of one clan have the standard α/β‐hydrolase fold, while members of two other clans have similar folds but a different β‐strand order. The remaining two clans have members with six‐bladed β‐propeller and three‐α‐helix bundle tertiary structures. Families not assigned to clans either have a wide variety of structures or have no members of known structure. At the time of writing, the 91 families contained 321,830 primary structures and 1378 tertiary structures. From these data, we constructed an accessible database: CASTLE (CArboxylic eSTer hydroLasEs, http://www.castle.cbe.iastate.edu ).

2.
From the most recent Brookhaven Protein Co-ordinate Databank, 229 sequence-identical pentapeptide pairs, each found in two unrelated protein structures, were collected; 9115 pairs differing in only one residue were also gathered. In both samples the main-chain fold was conserved about 20% of the time, despite the different atomic environments presented by the unrelated protein architectures. An analysis of the substituted residues, and of the composition of the sequence-similar pentapeptides, suggested several hypotheses about protein folding mechanisms. An examination of the most frequently observed residue substitutions, and of their correlation with structural changes in the oligopeptide pairs, provides a possible guide for site-directed mutagenesis experiments, especially when no tertiary structural information is available.
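The pairing criterion in entry 2 (pentapeptides that are identical, or that differ at exactly one position, across two proteins) can be illustrated with a brute-force window comparison. This is a minimal sketch on plain one-letter sequences; the function name `pentapeptide_pairs` is hypothetical, and a scan over a whole structure database would use a k-mer index rather than comparing every window pair.

```python
from itertools import product


def pentapeptide_pairs(seq_a: str, seq_b: str, k: int = 5):
    """Return start-position pairs (i, j) whose k-residue windows are
    identical, and those that differ at exactly one position."""
    identical, one_substitution = [], []
    starts_a = range(len(seq_a) - k + 1)
    starts_b = range(len(seq_b) - k + 1)
    for i, j in product(starts_a, starts_b):
        window_a, window_b = seq_a[i:i + k], seq_b[j:j + k]
        # Count positions where the two windows carry different residues.
        mismatches = sum(a != b for a, b in zip(window_a, window_b))
        if mismatches == 0:
            identical.append((i, j))
        elif mismatches == 1:
            one_substitution.append((i, j))
    return identical, one_substitution


identical, one_sub = pentapeptide_pairs("MKTAYIAKQR", "GGKTAYIAGG")
print(identical)  # [(1, 2), (2, 3)]
```

The cost grows as the product of the two sequence lengths, which is acceptable for a pair of chains but not for a whole databank.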

3.
To improve secondary structure prediction for protein sequences, this work exploits the information contained in multiple sequence alignments of substituted but structurally related proteins. A database comprising 70 protein families and a total of 2,500 sequences, some of which were aligned by tertiary structural superposition, was used to calculate residue exchange weight matrices for alpha-helical, beta-strand, and coil substructures, respectively. Secondary structure was predicted from the observed residue substitutions in local regions of the multiple alignments, using the largest associated exchange weights in each of the three matrix types. A per-residue comparison of observed and predicted secondary structure yielded a mean accuracy of 72.2%. Individual alpha-helix, beta-strand, and coil states were respectively predicted at 66.7, and 75.8% correctness, representing a well-balanced three-state prediction. The accuracy, verified by cross-validation through jack-knife tests on all protein families, dropped on average only to 70.9%, indicating the rigor of the prediction procedure. In terms of robustness, conceptual clarity, accuracy, and computational efficiency, the method offers considerable advantages, notably its sole reliance on amino acid substitutions within structurally related proteins.
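The per-residue accuracy in entry 3 corresponds to the usual three-state measure (Q3). A minimal sketch, assuming DSSP-style strings with H (helix), E (strand), and C (coil); the abstract does not say how per-state accuracy was normalized, so dividing by the number of residues observed in each state is an assumption here. A jack-knife test would rebuild the exchange matrices with each family held out before scoring that family, which is not shown.

```python
def three_state_accuracy(observed: str, predicted: str):
    """Per-residue three-state accuracy (Q3) and per-state accuracy.

    States are encoded as 'H' (helix), 'E' (strand) and 'C' (coil).
    Per-state accuracy is the share of residues observed in a state
    that were also predicted in that state.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must be the same length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    q3 = 100.0 * correct / len(observed)
    per_state = {}
    for state in "HEC":
        positions = [i for i, o in enumerate(observed) if o == state]
        if positions:  # skip states absent from the observed structure
            hits = sum(predicted[i] == state for i in positions)
            per_state[state] = 100.0 * hits / len(positions)
    return q3, per_state


q3, per_state = three_state_accuracy("HHHHCCEEEC", "HHHCCCEECC")
print(q3)  # 80.0
```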

4.
Protein folding involves the formation of secondary structural elements from the primary sequence and their association into tertiary assemblies. How this primary sequence relates to a specific folded protein structure remains a central question in structural biology. An increasing body of evidence suggests that variations in homologous sequences, ranging from point mutations to substantial insertions or deletions, can yield stable proteins with markedly different folds. Here we report the structural characterization of domain IV (D4) and ΔD4 (polypeptides of 222 and 160 amino acids, respectively), which differ by an N-terminal deletion of 62 amino acids (28% of the overall D4 sequence). The high-resolution crystal structures of the monomeric D4 and the dimeric ΔD4 reveal substantially different folds despite overall conservation of secondary structure. These structures show that tertiary structure formation, even in extended polypeptide sequences, can be highly context dependent, and they serve as a model for structural plasticity in protein isoforms.

5.
Protein structure prediction by comparative modeling benefits greatly from the use of multiple sequence alignment information to improve the accuracy of structural template identification and of the alignment of target sequences to structural templates. Unfortunately, this benefit is limited to protein sequences that have at least several natural sequence homologues. We show here that large, diverse alignments of computationally designed protein sequences confer many of the same benefits as natural sequences in identifying structural templates for comparative modeling targets. A large-scale, massively parallel application of an all-atom protein design algorithm, including a simple model of peptide backbone flexibility, allowed us to generate 500 diverse, non-native, high-quality sequences for each of the 264 protein structures in our test set. PSI-BLAST searches using the sequence profiles generated from the designed sequences ("reverse" BLAST searches) identify true structural homologues of the parent structure with near-perfect accuracy at 54% coverage. In 41 of 49 genomes scanned with reverse BLAST searches, at least one novel structural template (not found by the standard method of PSI-BLAST against the PDB) is identified. Further improvements in coverage, through optimizing the scoring function used to design sequences and applying the method to new protein structures beyond the test set, will allow this method to mature into a useful strategy for identifying distantly related structural templates.

6.
7.
Structural repertoire of the human VH segments.   Total citations: 16; self-citations: 0; citations by others: 16
The VH gene segments produce the part of the VH domains of antibodies that contains the first two hypervariable regions. The sequences of 83 human VH segments with open reading frames, from several individuals, are currently known. It has been shown that these sequences are likely to form a high proportion of the total human repertoire and that an individual's gene repertoire produces about 50 VH segments with different protein sequences. In this paper we present a structural analysis of the amino acid sequences produced by the 83 segments. Particular residue patterns in the sequences of V domains imply particular main-chain conformations, known as canonical structures, for the hypervariable regions. We show that, in almost all cases, the residue patterns in the VH segments imply that the first hypervariable region has one of three different canonical structures and that the second hypervariable region has one of five. The observed combinations of canonical structures in the first and second regions mean that almost all sequences have one of seven main-chain folds. We describe, in outline, the structures of the antigen-binding site loops produced by nearly all the VH segments. The exact specificity of the loops is produced by (1) sequence differences in their surface residues, particularly at sites near the centre of the combining site, and (2) sequence differences in the hypervariable and framework regions that modulate the relative positions of the loops.

8.
The globin family of protein structures was the first for which it was recognized that tertiary structure can be highly conserved even when primary sequences have diverged to a virtually undetectable level of similarity. This principle of structural inertia in molecular evolution is now evident for many other protein families. We have performed a systematic comparison of the sequences and structures of 6 representative hemoglobin subunits as diverse in origin as plants, clams, and humans. Our analysis is based on a 97-residue helical core common to all 6 structures. Amino acid sequence identities range from 12.4% to 42.3% in pairwise comparisons, and, despite these variations, the maximal RMS deviation in alpha-carbon positions is 3.02 Å. Overall, sequence similarity and structural deviation are significantly anticorrelated, with a correlation coefficient of -0.71, but for the set of structures with under 20% pairwise identity this anticorrelation weakens to -0.38, which emphasizes the weak connection between a specific sequence and the tertiary fold. There is substantial structural variability outside the helical core, and the functional characteristics of these globins also differ appreciably. Nevertheless, despite the variations in detail implied by the sequence dissimilarities and functional differences, the core structures of these globins remain remarkably well preserved.
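The anticorrelation reported in entry 8 is a correlation coefficient between pairwise sequence identity and structural deviation; Pearson's r is assumed here. A minimal stdlib sketch follows; the sample values are made up for illustration and are not the paper's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (identity %, C-alpha RMSD in Å) pairs, for illustration only.
pct_identity = [12.4, 18.0, 25.0, 42.3]
rmsd = [3.0, 2.4, 2.6, 1.1]
print(round(pearson_r(pct_identity, rmsd), 2))
```

A negative value means that higher identity goes with lower RMSD, which is the direction the abstract describes.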

9.
Kosloff M, Kolodny R. Proteins. 2008;71(2):891-902
It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure by homology modeling, selecting a template structure for a target sequence by sequence alone implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to creating subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs identified and the magnitude of the dissimilarity depend on the approach used to calculate the differences; in particular, sequence-based structural superposition identifies a larger number of structurally dissimilar pairs than geometry-based structural alignment does. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superposition provides a meaningful measure of structural differences. This approach and geometry-based structural alignment reveal somewhat different information, and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information.
We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).
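In the sequence-based superposition described in entry 9, the residue correspondence comes from the sequence alignment, so only the optimal rigid-body fit remains to be computed. A minimal sketch of that fitting step using the Kabsch algorithm with NumPy, assuming the aligned C-alpha coordinates have already been extracted into matching (N, 3) arrays; geometry-based structural alignment, which also searches for the correspondence, is not shown, and `kabsch_rmsd` is a hypothetical helper name.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """C-alpha RMSD between two (N, 3) arrays of matched atoms after
    optimal rigid-body superposition (Kabsch algorithm)."""
    if p.ndim != 2 or p.shape != q.shape or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    # Remove translation by centering both sets on their centroids.
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    # SVD of the 3x3 cross-covariance matrix.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))


# A rotated and translated copy of the same coordinates should give RMSD ~ 0.
rng = np.random.default_rng(0)
ca_a = rng.normal(size=(10, 3))
theta = np.radians(30.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
ca_b = ca_a @ rot_z.T + np.array([1.0, -2.0, 3.0])
print(kabsch_rmsd(ca_a, ca_b) < 1e-8)  # True
```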

10.
Studying protein function and activity requires structural data. Because experimental structures are available for only a small fraction of all known protein sequences, computational methods such as protein modelling can provide useful information. Over the last few decades we have predicted the structures of numerous proteins with homology modelling methods. In this study we assess the structural quality of these models and the validity of the biological and medical interpretations and predictions based on them. All the models had correct scaffolding and were ranked at least "correct" or "good" by numerical evaluators, even though sequence identity with the template was as low as 8%. The biological explanations based on the models agreed well with experimental structures and other experimental studies. This retrospective analysis of homology models shows the power of protein modelling when every step, from sequence alignment to model building and refinement, is carried out carefully. Modelling can be applied to study and predict many kinds of biological phenomena, and our results indicate that it can do so successfully.

11.
Designating amino-acid sequences that fold into a common main-chain structure as "neutral sequences" for that structure, regardless of their function or stability, we investigated the distribution of neutral sequences in protein sequence space. For four distinct target structures (alpha, beta, alpha/beta, and alpha+beta types) with the same chain length of 108, we generated the respective neutral sequences using the inverse folding technique with a knowledge-based potential function. We assumed that neutral sequences for a protein structure have Z scores greater than or equal to a fixed threshold, defined either as the Z score of the corresponding native sequence (case 1) or as a much greater Z score (case 2). An exploring-walk simulation suggested that the neutral sequences mapped into sequence space are connected to each other through straight neutral paths and form an inherent neutral network over the sequence space. Through another exploring-walk simulation, we investigated contiguous regions between or among the neutral networks for the distinct protein structures and obtained the following results. The closest approach distance between two neutral networks ranged from 5 to 29 on the Hamming distance scale, increasing linearly with the threshold value. The sequences located in the "interchange" regions between two neutral networks have intermediate sequence-profile scores for both corresponding structures. Introducing a "ball" in sequence space that contains at least one neutral sequence for each of the four structures, we found that the minimal radius of a ball centered at an arbitrary position ranged from 35 to 50, while the minimal radius of a ball centered at certain special positions ranged from 20 to 30, on the Hamming distance scale.
The relatively small Hamming distances (5-30) may support an evolutionary mechanism in which sequences move from the network for one structure to the network for a more beneficial structure via the interchange regions.
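Entry 11 measures, on the Hamming scale, the closest approach between neutral networks and the smallest ball that contains a neutral sequence for every structure. The sketch below computes both quantities by exhaustive comparison over explicitly listed sequence sets; the function names are hypothetical and the toy four-residue sequences stand in for the 108-residue chains, whereas the paper explores the networks with walk simulations.

```python
def hamming(seq_a: str, seq_b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def closest_approach(network_a, network_b) -> int:
    """Smallest Hamming distance between any members of two sequence sets."""
    return min(hamming(a, b) for a in network_a for b in network_b)


def min_ball_radius(center: str, networks) -> int:
    """Smallest radius of a Hamming ball around `center` that contains
    at least one sequence from every network."""
    return max(min(hamming(center, s) for s in net) for net in networks)


net_1 = ["AAAA", "AAAC"]
net_2 = ["CCCC", "AACC"]
print(closest_approach(net_1, net_2))           # 1
print(min_ball_radius("AAAA", [net_1, net_2]))  # 2
```

The ball radius is a max over networks of a min over members: the ball must reach the nearest member of the farthest network.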

12.
Locating sequences compatible with a protein structural fold is the well‐known inverse protein‐folding problem. Although significant progress has been made, the success rate of protein design remains low. As a result, a library of designed sequences, or a sequence profile, is currently employed to guide experimental screening or directed evolution. Sequence profiles can be computationally predicted either by iteratively mutating a random sequence to produce energy‐optimized sequences, or by combining the sequences of structurally similar fragments from a template library. The latter approach is computationally more efficient but yields less accurate profiles than the former because it lacks tertiary structural information. Here we present a method called SPIN that predicts Sequence Profiles by an Integrated Neural network based on fragment‐derived sequence profiles and structure‐derived energy profiles. SPIN improves on the fragment‐derived profile by 6.7 percentage points (from 23.6 to 30.3%) in sequence identity between predicted and wild‐type sequences. The method also reduces the number of residues in low‐complexity regions by 15.7% and has a significantly better balance of hydrophilic and hydrophobic residues at the protein surface. The accuracy of the resulting sequence profiles is comparable to that of profiles generated by the protein design program RosettaDesign 3.5. This highly efficient method for predicting sequence profiles from structures will be useful as a single‐body scoring term for improving the scoring functions used in protein design and fold recognition. It also complements protein design programs in guiding the experimental design of sequence libraries for screening and directed evolution of designed sequences. The SPIN server is available at http://sparks‐lab.org . Proteins 2014; 82:2565–2573. © 2014 Wiley Periodicals, Inc.
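Entry 12 scores a predicted profile by the sequence identity between predicted and wild-type sequences. A minimal sketch, assuming a profile is a list of per-position residue frequencies and that the predicted sequence is the most frequent residue at each position; these helpers are hypothetical and are not the SPIN implementation.

```python
from collections import Counter


def frequency_profile(aligned_seqs):
    """Per-position residue frequencies from equal-length aligned sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    n = len(aligned_seqs)
    return [
        {aa: count / n for aa, count in Counter(s[i] for s in aligned_seqs).items()}
        for i in range(length)
    ]


def consensus(profile):
    """Most frequent residue at each position (ties go to the first seen)."""
    return "".join(max(column, key=column.get) for column in profile)


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions with identical residues."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return 100.0 * sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


profile = frequency_profile(["ACDE", "ACDK", "GCDE"])
predicted = consensus(profile)
print(predicted, percent_identity(predicted, "ACDF"))  # ACDE 75.0
```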

13.
Cheng J, Randall A, Baldi P. Proteins. 2006;62(4):1125-1132
Accurate prediction of protein stability changes resulting from single amino acid mutations is important for understanding protein structures and designing new proteins. We use support vector machines to predict protein stability changes for single amino acid mutations, leveraging both sequence and structural information. We evaluate our approach using cross-validation on a large dataset of single amino acid mutations. When only the sign of the stability change is considered, the method achieves 84% accuracy, a significant improvement over previously published results. Moreover, the experimental results show that the accuracy obtained using sequence alone is close to that obtained using tertiary structure information. Because our method can accurately predict protein stability changes from primary sequence information only, it is applicable in many situations where the tertiary structure is unknown, overcoming a major limitation of previous methods that require tertiary information. The web server for predicting protein stability changes upon mutation (MUpro), software, and datasets are available at http://www.igb.uci.edu/servers/servers.html.
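The 84% figure in entry 13 counts only whether the predicted stability change has the right sign. A minimal sketch of that metric; the function name is hypothetical, and treating a change of exactly zero as "not increased" is an assumption, since the abstract states no convention. The SVM itself and the cross-validation folds are not shown.

```python
def sign_accuracy(predicted, observed) -> float:
    """Fraction of mutations whose predicted stability change has the same
    sign (increase vs. not increase) as the measured change."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty, equal-length lists")
    agree = sum((p > 0) == (o > 0) for p, o in zip(predicted, observed))
    return agree / len(predicted)


# Hypothetical predicted and measured stability changes (kcal/mol).
print(sign_accuracy([1.2, -0.5, 0.3, -2.0], [0.8, -1.0, -0.2, -1.5]))  # 0.75
```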

14.
We describe a method to identify protein domain boundaries from sequence information alone, based on the assumption that hydrophobic residues cluster together in space. SnapDRAGON is a suite of programs that predicts domain boundaries from the consistency observed in a set of alternative ab initio three-dimensional (3D) models generated for a given protein multiple sequence alignment. This is achieved by running a distance geometry-based folding technique in conjunction with a 3D domain assignment algorithm. The overall accuracy of our method in predicting the number of domains for a non-redundant data set of 414 multiple alignments, representing 185 single and 231 multiple-domain proteins, is 72.4%. Using the domain linker regions observed in the tertiary structures associated with each query alignment as the standard of truth, inter-domain boundary positions are delineated with an accuracy of 63.9% for proteins comprising only continuous domains, and 35.4% for proteins with discontinuous domains. Overall, domain boundaries are delineated with an accuracy of 51.8%. The prediction accuracy is independent of the pairwise sequence similarities within each alignment. These results demonstrate the capability of our method to delineate domains in protein sequences with a wide variety of structural domain organisations.

15.
Two computational methods widely used in time series analysis were applied to protein sequences, and their ability to derive structural information not directly accessible through classical sequence comparison methods was assessed. The primary structures of 19 rubredoxins from both mesophilic and thermophilic bacteria, coded with the hydrophobicity values of their amino acid residues, were treated as time series and analyzed by 1) recurrence quantification analysis and 2) spectral analysis of the major eigenfunctions of the sequences. The results of the two methods agreed to a large extent and generated a classification consistent with known 3D structural characteristics of the studied proteins. This classification separated a thermophilic protein from the mesophilic proteins in a clear-cut manner. The classification of primary structures given by the two dynamical methods was shown to be fundamentally different from classifications derived from classical sequence homology metrics. Moreover, on a finer scale, the method was able to discriminate between thermophilic and mesophilic proteins in a set of chimeric sequences generated by mixing a mesophilic (Rubr Clopa) and a thermophilic (Rubr Pyrfu) protein. Overall, our results point to a new way of looking at protein sequence comparisons.
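Entry 15 treats a hydrophobicity-coded sequence as a time series and applies recurrence quantification analysis (RQA). A minimal sketch of the simplest RQA quantity, the recurrence rate, assuming the Kyte-Doolittle scale, a delay embedding of consecutive values, and an arbitrary radius; the abstract specifies none of these, and the fuller analysis (further RQA measures such as determinism, plus the spectral step) is not shown.

```python
from math import dist

# Kyte-Doolittle hydropathy scale (an assumed choice of hydrophobicity values).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_series(sequence: str):
    """Code a one-letter protein sequence as a numeric series."""
    return [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]


def recurrence_matrix(series, dim: int = 3, radius: float = 1.0):
    """Embed the series in `dim` dimensions (consecutive values) and mark
    pairs of embedded vectors within `radius` of each other (Euclidean)."""
    vectors = [tuple(series[i:i + dim]) for i in range(len(series) - dim + 1)]
    return [[dist(u, v) <= radius for v in vectors] for u in vectors]


def recurrence_rate(matrix) -> float:
    """Share of recurrent points, excluding the trivial main diagonal."""
    n = len(matrix)
    if n < 2:
        raise ValueError("need at least two embedded vectors")
    hits = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return hits / (n * (n - 1))


rate = recurrence_rate(recurrence_matrix(hydropathy_series("AKAK"), dim=2))
print(round(rate, 3))  # 0.333
```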

16.
DNA harvested directly from complex natural microbial communities by PCR has been successfully used to predict RNase P RNA structure, and can potentially provide an abundant source of information for structural predictions of other RNAs. In this study, we used genetic variation in natural communities to test and refine the secondary and tertiary structural model for bacterial tmRNA. The variability of the proposed tmRNA secondary structures in different organisms and the lack of any predicted tertiary structure suggested that further refinement of the tmRNA model could be useful. To increase the phylogenetic representation of tmRNA sequences, and thereby provide additional data for statistical comparative analysis, we amplified, sequenced, and compared tmRNA sequences from natural microbial communities. Using primers designed from gamma proteobacterial sequences, we determined 44 new tmRNA sequences from a variety of environmental DNA samples. Covariation analyses of these sequences, together with sequences from cultured organisms, confirmed most of the proposed tmRNA model but also provided evidence for a new tertiary interaction. This approach of gathering sequence information from natural microbial communities seems generally applicable to RNA structural analysis.
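Covariation analysis, as used in entry 16, looks for alignment columns whose nucleotides change together, as expected for base-paired positions. One common score is the mutual information between two columns; the sketch below assumes that score (the abstract does not name the statistic used) and represents each column as a string with one character per aligned sequence.

```python
from collections import Counter
from math import log2


def column_mutual_information(col_a: str, col_b: str) -> float:
    """Mutual information (bits) between two alignment columns."""
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    freq_a, freq_b = Counter(col_a), Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), count in freq_ab.items():
        p_xy = count / n
        mi += p_xy * log2(p_xy / ((freq_a[x] / n) * (freq_b[y] / n)))
    return mi


# Perfect Watson-Crick covariation across four sequences gives 2 bits;
# an invariant column carries no covariation signal.
print(column_mutual_information("GCAU", "CGUA"))  # 2.0
print(column_mutual_information("GGGG", "CGUA"))  # 0.0
```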

17.
Finding structural similarities between proteins often helps reveal shared functionality that might not be detected from native sequence information alone. Such similarity is usually detected and quantified by protein structure alignment. Determining the optimal alignment between two protein structures, however, remains a hard problem. An alternative approach is to approximate each three-dimensional protein structure by a sequence of motifs drawn from a structural alphabet. Structure comparison is then performed by comparing the corresponding motif sequences, or structural sequences. In this article, we measure the performance of such alphabets in the context of the protein structure classification problem. We consider both local and global structural sequences. Each letter of a local structural sequence corresponds to the fragment that best matches the corresponding local segment of the protein structure. The global structural sequence is designed to generate the best possible complete chain that matches the full protein structure. We use an alphabet of 20 letters, corresponding to a library of 20 motifs, or protein fragments, of four residues each. We show that the global structural sequences approximate the native structures of proteins well, with an average coordinate root-mean-square deviation (cRMS) of 0.69 Å over 2225 test proteins. The approximation is best for all-α proteins and relatively poorer for all-β proteins. We then test, with different classifiers, how well four different sequence representations of proteins (their native sequence, the sequence of their secondary-structure elements, and the local and global structural sequences based on our fragment library) classify proteins belonging to five distinct CATH folds. Unsurprisingly, the primary sequence alone performs poorly as a structure classifier.
We show that adding either secondary-structure information or local information from the structural sequence considerably improves classification accuracy. The two fragment-based sequences perform better than the secondary-structure sequence, but not yet well enough to be a viable alternative to more computationally intensive methods based on protein structure alignment.

18.
Conversion of the local structural state of a protein from an α-helix to a β-strand is usually associated with a major change in tertiary structure. Similar changes are observed during the self-assembly of amyloidogenic proteins into fibrils, which are implicated in severe disease conditions such as Alzheimer's disease. Studies have emphasized that certain protein sequence fragments, known as chameleon sequences, do not have a strong preference for either helical or extended conformations. Surprisingly, information on the local sequence neighborhood can be used to predict their secondary structure with high accuracy. Here we report a large-scale analysis of chameleon sequences to estimate their propensities for different local structural states such as α-helices, β-strands, and coils. Using propensity information derived from amino acid composition, we highlight their complexity: more than one quarter of them prefer the coil state over regular secondary structures. About half of them show a preference for both α-helix and β-sheet conformations, and the rest favor one of these two states.
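The propensities in entry 18 can be estimated as the ratio of a residue's frequency within a structural state to its overall frequency, so that a value above 1 means the residue is enriched in that state. A minimal sketch assuming that common definition (the abstract does not give its formula) and per-residue state strings in H/E/C form; the function name and input are toy illustrations.

```python
from collections import Counter


def state_propensities(residues: str, states: str):
    """Propensity P(residue | state) / P(residue) for every residue and state."""
    if len(residues) != len(states) or not residues:
        raise ValueError("residue and state strings must be non-empty and equal length")
    n = len(residues)
    residue_counts = Counter(residues)
    state_counts = Counter(states)
    pair_counts = Counter(zip(residues, states))
    return {
        state: {
            aa: (pair_counts[(aa, state)] / state_counts[state]) / (residue_counts[aa] / n)
            for aa in residue_counts
        }
        for state in state_counts
    }


prop = state_propensities("AAGG", "HHCC")
print(prop["H"]["A"], prop["C"]["A"])  # 2.0 0.0
```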

19.
Thioesterases (TEs) are classified into EC 3.1.2.1 through EC 3.1.2.27 based on their activities on different substrates, with many remaining unclassified (EC 3.1.2.–). Analysis of the primary and tertiary structures of known TEs casts new light on this enzyme group. We used strong primary sequence conservation, based on experimentally characterized proteins, as the main criterion, followed by verification with tertiary structure superpositions, mechanisms, and catalytic residue positions, to accurately define TE families. At present, TEs fall into 23 families almost completely unrelated to each other by primary structure. All members of the same family are assumed to have essentially the same tertiary structure; however, TEs in different families can have markedly different folds and mechanisms. Conversely, TEs in different families sometimes have very similar tertiary structures and catalytic mechanisms despite being only slightly or not at all related by primary structure, indicating that they share distant common ancestors and can be grouped into clans. At present, four clans encompass 12 TE families. The new, constantly updated ThYme (Thioester‐active enzYmes) database contains TE primary and tertiary structures, classified into families and clans that differ from those currently found in the literature or in other databases. We review all types of TEs, including those cleaving CoA, ACP, glutathione, and other protein molecules, and we discuss their structures, functions, and mechanisms.

20.
All acyl carrier protein primary and tertiary structures were gathered into the ThYme database. They are classified into 16 families by amino acid sequence similarity, with members of different families having sequences with statistically highly significant differences. These classifications are supported by tertiary structure superposition analysis. Tertiary structures from a number of families are very similar, suggesting that these families may descend from a single distant ancestor. Normal vibrational mode analysis of experimentally determined free-standing structures showed greater fluctuations at chain termini and loops than in most helices. Their modes overlap more within families than between different families. The tertiary structures of three acyl carrier protein families that lacked any known structures were also predicted.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417