The availability of fast and robust algorithms for protein structure comparison provides an opportunity to produce a database of three-dimensional comparisons, called families of structurally similar proteins (FSSP). The database currently contains an extended structural family for each of 154 representative (below 30% sequence identity) protein chains. Each data set contains: the search structure; all its relatives with 70-30% sequence identity, aligned structurally; and all other proteins from the representative set that contain substructures significantly similar to the search structure. Very close relatives (above 70% sequence identity) rarely have significant structural differences and are excluded. The alignments of remote relatives are the result of pairwise all-against-all structural comparisons in the set of 154 representative protein chains. The comparisons were carried out with each of three novel automatic algorithms that cover different aspects of protein structure similarity. The user of the database has the choice between strict rigid-body comparisons and comparisons that take into account interdomain motion or geometrical distortions, and between comparisons that require strictly sequential ordering of segments and comparisons that allow altered topology of loop connections or chain reversals. The data sets report the structurally equivalent residues in the form of a multiple alignment and as a list of matching fragments to facilitate inspection by three-dimensional graphics. If substructures are ignored, the result is a database of structure alignments of full-length proteins, including those in the twilight zone of sequence similarity. (ABSTRACT TRUNCATED AT 250 WORDS)

Detection of homologous proteins by an intermediate sequence search (cited 2 times in total: 0 self-citations, 2 by others)
We developed a variant of the intermediate sequence search method (ISS(new)) for detection and alignment of weakly similar pairs of protein sequences. ISS(new) relates two query sequences by an intermediate sequence that is potentially homologous to both queries. The improvement was achieved by a more robust overlap score for a match between the queries through an intermediate. The approach was benchmarked on a data set of 2369 sequences of known structure with insignificant sequence similarity to each other (BLAST E-value larger than 0.001); 2050 of these sequences had a related structure in the set. ISS(new) performed significantly better than both PSI-BLAST and a previously described intermediate sequence search method. PSI-BLAST could not detect correct homologs for 1619 of the 2369 sequences. In contrast, ISS(new) assigned a correct homolog as the top hit for 121 of these 1619 sequences, while incorrectly assigning homologs for only nine targets; it did not assign homologs for the remainder of the sequences. By estimate, ISS(new) may be able to assign the folds of domains in approximately 29,000 of the approximately 500,000 sequences unassigned by PSI-BLAST, with 90% specificity (1 - false positives fraction). In addition, we show that the 15 alignments with the most significant BLAST E-values include the nearly best alignments constructed by ISS(new).
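
The idea of relating two queries through a shared intermediate can be sketched as follows. This is only an illustration: the abstract does not define the ISS(new) overlap score, so a plain count of shared positions on the intermediate stands in for it, and the input layout, the function names, and the `min_overlap` threshold are all assumptions.

```python
def overlap_length(region_a, region_b):
    """Number of positions shared by two inclusive (start, end) intervals."""
    start = max(region_a[0], region_b[0])
    end = min(region_a[1], region_b[1])
    return max(0, end - start + 1)


def link_through_intermediates(hits_a, hits_b, min_overlap=30):
    """Return (intermediate_id, overlap) pairs linking two queries, best first.

    hits_a and hits_b are hypothetical inputs: each maps an intermediate
    sequence id to the inclusive (start, end) region of that intermediate
    matched by the respective query. The simple overlap count used here is a
    stand-in, not the score from the paper.
    """
    links = []
    for inter_id in hits_a.keys() & hits_b.keys():
        shared = overlap_length(hits_a[inter_id], hits_b[inter_id])
        if shared >= min_overlap:
            links.append((inter_id, shared))
    # Largest overlap first; ties broken by id so the order is deterministic.
    return sorted(links, key=lambda item: (-item[1], item[0]))
```

Requiring the two matches to cover the same stretch of the intermediate is what keeps two queries that hit different domains of one multi-domain intermediate from being linked.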

Ochagavía ME  Wodak S 《Proteins》2004,55(2):436-454
MALECON is a progressive combinatorial procedure for multiple alignments of protein structures. It searches a library of pairwise alignments for all three-protein alignments in which a specified number of residues is consistently aligned. These alignments are progressively expanded to include additional proteins and more spatially equivalent residues, subject to certain criteria. This action involves superimposing the aligned proteins by their hitherto equivalent residues and searching for additional Cα atoms that lie close in space. The performance of MALECON is illustrated and compared with several extant multiple structure alignment methods by using as test the globin homologous superfamily, the OB and the Jellyrolls folds. MALECON gives better definitions of the common structural features in the structurally more diverse proteins of the OB and Jellyrolls folds, but it yields comparable results for the more similar globins. When no consistent multiple alignments can be derived for all members of a protein group, our procedure is still capable of automatically generating consistent alignments and common core definitions for subgroups of the members. This finding is illustrated for proteins of the OB fold and SH3 domains, believed to share common structural features, and should be very instrumental in homology modeling and investigations of protein evolution.

R B Russell  G J Barton 《Proteins》1992,14(2):309-323
An algorithm is presented for the accurate and rapid generation of multiple protein sequence alignments from tertiary structure comparisons. A preliminary multiple sequence alignment is performed using sequence information, which then determines an initial superposition of the structures. A structure comparison algorithm is applied to all pairs of proteins in the superimposed set and a similarity tree calculated. Multiple sequence alignments are then generated by following the tree from the branches to the root. At each branchpoint of the tree, a structure-based sequence alignment and coordinate transformations are output, with the multiple alignment of all structures output at the root. The algorithm encoded in STAMP (STructural Alignment of Multiple Proteins) is shown to give alignments in good agreement with published structural accounts within the dehydrogenase fold domains, globins, and serine proteinases. In order to reduce the need for visual verification, two similarity indices are introduced to determine the quality of each generated structural alignment. Sc quantifies the global structural similarity between pairs or groups of proteins, whereas Pij' provides a normalized measure of the confidence in the alignment of each residue. STAMP alignments have the quality of each alignment characterized by Sc and Pij' values and thus provide a reproducible resource for studies of residue conservation within structural motifs.

Manfred J. Sippl 《Proteins》1993,17(4):355-362
A major problem in the determination of the three-dimensional structure of proteins concerns the quality of the structural models obtained from the interpretation of experimental data. New developments in X-ray crystallography and nuclear magnetic resonance spectroscopy have accelerated the process of structure determination and the biological community is confronted with a steadily increasing number of experimentally determined protein folds. However, in the recent past several experimentally determined protein structures have been proven to contain major errors, indicating that in some cases the interpretation of experimental data is difficult and may yield incorrect models. Such problems can be avoided when computational methods are employed which complement experimental structure determinations. A prerequisite of such computational tools is that they are independent of the parameters obtained from a particular experiment. In addition, such techniques are able to support and accelerate experimental structure determinations. Here we present techniques based on knowledge-based mean fields which can be used to judge the quality of protein folds. The methods can be used to identify misfolded structures as well as faulty parts of structural models. The techniques are even applicable in cases where only the Cα trace of a protein conformation is available. The capabilities of the technique are demonstrated using correct and incorrect protein folds. © 1993 Wiley-Liss, Inc.

We report an interesting case of structural similarity between 2 small, nonhomologous proteins, the third domain of ovomucoid (ovomucoid) and the C-terminal fragment of ribosomal L7/L12 protein (CTF). The region of similarity consists of a 3-stranded β-sheet and an α-helix. This region is highly similar; the corresponding elements of secondary structure share a common topology, and the RMS difference for "equivalent" Cα atoms is 1.6 Å. Surprisingly, this common structure arises from completely different sequences. For the common core, the sequence identity is less than 3%, and there is neither significant sequence similarity nor similarity in the position or orientation of conserved hydrophobic residues. This superposition raises the question of how 2 entirely different sequences can produce an identical structure. Analyzing this common region in ovomucoid revealed that it is stabilized by disulfide bonds. In contrast, the corresponding structure in CTF is stabilized in the α-helix by a composition of residues with high helix-forming propensities. This result suggests that different sequences and different stabilizing interactions can produce an identical structure.

We apply a simple method for aligning protein sequences on the basis of a 3D structure, on a large scale, to the proteins in the scop classification of fold families. This allows us to assess, understand, and improve our automatic method against an objective, manually derived standard, a type of comprehensive evaluation that has not yet been possible for other structural alignment algorithms. Our basic approach directly matches the backbones of two structures, using repeated cycles of dynamic programming and least-squares fitting to determine an alignment minimizing coordinate difference. Because of its simplicity, our method can be readily modified to take into account additional features of protein structure such as the orientation of side chains or the location-dependent cost of opening a gap. Our basic method, augmented by such modifications, can find reasonable alignments for all but 1.5% of the known structural similarities in scop, i.e., all but 32 of the 2,107 superfamily pairs. We discuss the specific protein structural features that make these 32 pairs so difficult to align and show how our procedure effectively partitions the relationships in scop into different categories, depending on what aspects of protein structure are involved (e.g., depending on whether or not consideration of side-chain orientation is necessary for proper alignment). We also show how our pairwise alignment procedure can be extended to generate a multiple alignment for a group of related structures. We have compared these alignments in detail with corresponding manual ones culled from the literature. We find good agreement (to within 95% for the core regions), and detailed comparison highlights how particular protein structural features (such as certain strands) are problematic to align, giving somewhat ambiguous results. With these improvements and systematic tests, our procedure should be useful for the development of scop and the future classification of protein folds.

Multiple protein structure alignment. (cited 3 times in total: 2 self-citations, 3 by others)
A method was developed to compare protein structures and to combine them into a multiple structure consensus. Previous methods of multiple structure comparison have only concatenated pairwise alignments or produced a consensus structure by averaging coordinate sets. The current method is a fusion of the fast structure comparison program SSAP and the multiple sequence alignment program MULTAL. As in MULTAL, structures are progressively combined, producing intermediate consensus structures that are compared directly to each other and all remaining single structures. This leads to a hierarchic "condensation," continually evaluated in the light of the emerging conserved core regions. Following the SSAP approach, all interatomic vectors were retained with well-conserved regions distinguished by coherent vector bundles (the structural equivalent of a conserved sequence position). Each bundle of vectors is summarized by a resultant, whereas vector coherence is captured in an error term, which is the only distinction between conserved and variable positions. Resultant vectors are used directly in the comparison, which is weighted by their error values, giving greater importance to the matching of conserved positions. The resultant vectors and their errors can also be used directly in molecular modeling. Applications of the method were assessed by the quality of the resulting sequence alignments, phylogenetic tree construction, and databank scanning with the consensus. Visual assessment of the structural superpositions and consensus structure for various well-characterized families confirmed that the consensus had identified a reasonable core.

Cooperative unfolding penalties are calculated by statistically evaluating an ensemble of denatured states derived from native structures. The ensemble of denatured states is determined by dividing the native protein into short contiguous segments and defining all possible combinations of native, i.e., interacting, and non-native, i.e., non-interacting, segments. We use a novel knowledge-based scoring function, derived from a set of non-homologous proteins in the Protein Data Bank, to describe the interactions among residues. This procedure is used for the structural identification of cooperative folding cores for four globular proteins: bovine pancreatic trypsin inhibitor, horse heart cytochrome c, French bean plastocyanin, and staphylococcal nuclease. The theoretical folding units are shown to correspond to regions that exhibit enhanced stability against denaturation as determined from experimental hydrogen exchange protection factors. Using a sequence similarity score for related sequences, we show that, in addition to residues necessary for enzymatic function, those amino acids comprising structurally important folding cores are also preferentially conserved during evolution. This implies that the identified folding cores may be part of an array of fundamental structural folding units.
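
The ensemble construction described in this abstract (cut the chain into short contiguous segments, then take every native/non-native combination) can be sketched as below. The segment length, the function names, and the choice to include the fully native and fully non-native states are assumptions for illustration; the abstract gives none of these details, and the scoring function is not reproduced here.

```python
from itertools import product


def split_into_segments(n_residues, segment_length):
    """Divide residue indices 0..n_residues-1 into short contiguous segments."""
    return [range(start, min(start + segment_length, n_residues))
            for start in range(0, n_residues, segment_length)]


def enumerate_states(n_segments):
    """Yield every assignment of native (True) or non-native (False) to segments."""
    return product((True, False), repeat=n_segments)


# Example: a hypothetical 58-residue chain cut into segments of 10 residues.
segments = split_into_segments(58, 10)
states = list(enumerate_states(len(segments)))
```

The number of states doubles with each added segment, which is one practical reason to work with segments rather than individual residues.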

Fuzzy cluster analysis has been applied to the 20 amino acids by using 65 physicochemical properties as a basis for classification. The clustering products, the fuzzy sets (i.e., classical sets with associated membership functions), have provided a new measure of amino acid similarities for use in protein folding studies. This work demonstrates that fuzzy sets of simple molecular attributes, when assigned to amino acid residues in a protein's sequence, can predict the secondary structure of the sequence with reasonable accuracy. An approach is presented for discriminating standard folding states, using near-optimum information splitting in half-overlapping segments of the sequence of assigned membership functions. The method is applied to a nonredundant set of 252 proteins and yields approximately 73% matching for correctly predicted and correctly rejected residues with approximately 60% overall success rate for the correctly recognized ones in three folding states: α-helix, β-strand, and coil. The most useful attributes for discriminating these states appear to be related to size, polarity, and thermodynamic factors. Van der Waals volume, apparent average thickness of surrounding molecular free volume, and a measure of dimensionless surface electron density can explain approximately 95% of prediction results. Hydrogen bonding and hydrophobicity indices do not yet enable clear clustering and prediction.

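Knowledge-based protein structure prediction (cited 5 times in total: 0 self-citations, 5 by others)
This paper reviews knowledge-based methods for predicting the three-dimensional structure of proteins and their progress in recent years. Knowledge-based structure prediction methods currently fall into two main classes. The first is homology modeling, a relatively mature technique whose models are fairly reliable, but which applies only to target sequences with relatively high homology. The second is protein inverse folding, which mainly comprises the 3D profile method and methods based on potential functions; it yields the overall spatial trace of the target protein and is mainly useful for predicting the structure of proteins with relatively low sequence homology.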

It is commonly believed that similarities between the sequences of two proteins imply similarities between their structures. Sequence alignments reliably recognize pairs of proteins of similar structure provided that the percentage sequence identity between their two sequences is sufficiently high. This distinction, however, is statistically less reliable when the percentage sequence identity is lower than 30%, and little is then known about the detailed relationship between the two measures of similarity. Here, we investigate the inverse correlation between structural similarity and sequence similarity on 12 protein structure families. We define the structure similarity between two proteins as the cRMS distance between their structures. The sequence similarity for a pair of proteins is measured as the mean distance between the sequences in the subsets of sequence space compatible with their structures. We obtain an approximation of the sequence space compatible with a protein by designing a collection of protein sequences both stable and specific to the structure of that protein. Using these measures of sequence and structure similarities, we find that structural changes within a protein family are linearly related to changes in sequence similarity.

Based on the recently determined X-ray structures of Torpedo californica acetylcholinesterase and Geotrichum candidum lipase and on their three-dimensional superposition, an improved alignment of a collection of 32 related amino acid sequences of other esterases, lipases, and related proteins was obtained. On the basis of this alignment, 24 residues are found to be invariant in 29 sequences of hydrolytic enzymes, and an additional 49 are well conserved. The conservation in the three remaining sequences is somewhat lower. The conserved residues include the active site, disulfide bridges, salt bridges, and residues in the core of the proteins. Most invariant residues are located at the edges of secondary structural elements. A clear structural basis for the preservation of many of these residues can be determined from comparison of the two X-ray structures.

P A Rice  A Goldman  T A Steitz 《Proteins》1990,8(4):334-340
By exhaustive structural comparisons, we have found that about one-third of the α-helix-turn-β-strand polypeptides in α-β barrel domains share a common structural motif. The chief characteristics of this motif are that first, the geometry of the turn between the α-helix and the β-strand is somewhat constrained, and second, the β-strand contains a hydrophobic patch that fits into a hydrophobic pocket on the α-helix. The geometry of the turn does not seem to be a major determinant of the α-β unit, because the turns vary in length from four to six residues. However, the motif does not occur when there are few constraints on the geometry of the turn, for instance when the turns between the α-helix and the β-strands are very long. It also occurs much less frequently in flat-sheet α-β proteins, where the topology is much less regular and the amount of twist on the sheet varies considerably more than in the barrel proteins. The motif may be one of the basic building blocks from which α-β barrels are constructed.

Liu J  Rost B 《Proteins》2004,55(3):678-688
We developed CHOP, a method for dissecting proteins into domain-like fragments. The basic idea was to cut proteins beginning from very reliable experimental information (PDB), proceeding to expert annotations of domain-like regions (Pfam-A), and completing through cuts based on termini of known proteins. In this way, CHOP dissected more than two thirds of all proteins from 62 proteomes. Analysis of our structural domain-like fragments revealed four surprising results. First, >70% of all dissected proteins contained more than one fragment. Second, most domains spanned on average approximately 100 residues. This average was similar for eukaryotic and prokaryotic proteins, and it is also valid, although previously not described, for all proteins in the PDB. Third, single-domain proteins were significantly longer than most domains in multidomain proteins. Fourth, three fourths of all domains appeared shorter than 210 residues. We believe that our CHOP fragments constitute an important resource for functional and structural genomics. Nevertheless, our main motivation to develop CHOP was that the single-linkage clustering method failed to adequately group full-length proteins. In contrast, CLUP, the simple clustering scheme introduced here, largely succeeded in grouping the CHOP fragments from 62 proteomes such that all members of one cluster shared a basic structural core. CLUP found >63,000 multi- and >118,000 single-member clusters. Although most fragments were restricted to a particular cluster, approximately 24% of the fragments were duplicated in at least two clusters. Our thresholds for grouping two fragments into the same cluster were rather conservative. Nevertheless, our results suggested that structural genomics initiatives have to target >30,000 fragments to at least cover the multimember clusters in 62 proteomes.
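
For readers unfamiliar with the single-linkage baseline mentioned in this abstract, a generic sketch follows. It is not CLUP, whose rules the abstract does not give; the function name and the input format (a list of item ids plus a list of pairs already judged similar) are assumptions.

```python
def single_linkage(items, similar_pairs):
    """Group items so that any chain of similar pairs ends up in one cluster."""
    parent = {item: item for item in items}

    def find(item):
        # Follow parent links to the cluster representative, shortening the
        # path along the way (path halving).
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for item in items:
        clusters.setdefault(find(item), []).append(item)
    return sorted(sorted(members) for members in clusters.values())


# "A" and "C" are never compared directly, yet both join "B" in one cluster.
clusters = single_linkage(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])
```

The example shows the chaining behavior of single linkage: two items with no direct similarity are merged through a shared neighbor, which is easy to picture when a full-length protein contains several unrelated domains.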

The three-dimensional structure of protein is encoded in the sequence, but many amino acid residues carry no essential conformational information, and the identity of those that are structure-determining is elusive. By circular permutation and terminal deletion, we produced and purified 25 Bacillus licheniformis beta-lactamase (ESBL) variants that lack 5-21 contiguous residues each, and collectively have 82% of the sequence and 92% of the non-local atom-atom contacts eliminated. Circular dichroism and size-exclusion chromatography showed that most of the variants form conformationally heterogeneous mixtures, but by measuring catalytic constants, we found that all populate, to a greater or lesser extent, conformations with the essential features of the native fold. This suggests that no segment of the ESBL sequence is essential to the structure as a whole, which is congruent with the notion that local information and modular organization can impart most of the tertiary fold specificity and cooperativity.

An efficient algorithm was characterized that determines the similarity in main chain conformation between short protein substructures. The algorithm computes Δt, the root mean square difference in φ and ψ torsion angles over a small number of amino acids (typically 3–5). Using this algorithm, large numbers of protein substructure comparisons were feasible. The parameter Δt was sensitive to variations in local protein conformation, and it correlates with Δr, the root mean square deviation in atomic coordinates. Values for Δt were obtained that define similarity thresholds, which determine whether two substructures are considered structurally similar. To set a lower bound on the similarity threshold, we estimated the component of Δt due to measurement noise from comparisons of independently refined coordinates of the same protein. A sample distribution of Δt from nonhomologous protein comparisons identified an upper bound on the similarity threshold, one that refrains from incorporating large numbers of nonmatching comparisons. Unlike methods based on Cα atoms alone, Δt was sensitive to rotations in the peptide plane, shown to occur in several proteins. Comparisons of homologous proteins by Δt showed that the active site torsion angles are highly conserved. The Δt method was applied to the α-chain of human hemoglobin, where it readily demonstrated the local differences in the structures of different ligation states.
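
A minimal sketch of a Δt-style measure is given below. It assumes torsion angles in degrees, wraps each difference into [-180, 180) so that 179° and -179° count as 2° apart, and averages over all 2n angles in the window; the abstract does not specify the normalization or the wraparound handling, so both are assumptions, as are the function names.

```python
import math


def angular_difference(a_deg, b_deg):
    """Signed difference between two angles in degrees, wrapped to [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def delta_t(torsions_a, torsions_b):
    """RMS difference in (phi, psi) angles over two equal-length residue windows.

    Each argument is a list of (phi, psi) tuples in degrees, one per residue.
    """
    if len(torsions_a) != len(torsions_b) or not torsions_a:
        raise ValueError("windows must be non-empty and of equal length")
    squared = 0.0
    for (phi_a, psi_a), (phi_b, psi_b) in zip(torsions_a, torsions_b):
        squared += angular_difference(phi_a, phi_b) ** 2
        squared += angular_difference(psi_a, psi_b) ** 2
    # Assumed normalization: mean over all 2n angles in the window.
    return math.sqrt(squared / (2 * len(torsions_a)))
```

Because only torsion angles are compared, no superposition of coordinates is needed, which is what makes large numbers of window comparisons cheap.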

Many protein pairs that share the same fold do not have any detectable sequence similarity, providing a valuable source of information for studying sequence-structure relationship. In this study, we use a stringent data set of structurally similar, sequence-dissimilar protein pairs to characterize residues that may play a role in the determination of protein structure and/or function. For each protein in the database, we identify amino-acid positions that show residue conservation within both close and distant family members. These positions are termed "persistently conserved". We then proceed to determine the "mutually" persistently conserved (MPC) positions: those structurally aligned positions in a protein pair that are persistently conserved in both pair mates. Because of their intra- and interfamily conservation, these positions are good candidates for determining protein fold and function. We find that 45% of the persistently conserved positions are mutually conserved. A significant fraction of them are located in critical positions for secondary structure determination, they are mostly buried, and many of them form spatial clusters within their protein structures. A substitution matrix based on the subset of MPC positions shows two distinct characteristics: (i) it is different from other available matrices, even those that are derived from structural alignments; (ii) its relative entropy is high, emphasizing the special residue restrictions imposed on these positions. Such a substitution matrix should be valuable for protein design experiments.
