Similar literature
 Found 20 similar articles (search time: 0 ms)
1.
Using an information theoretic formalism, we optimize classes of amino acid substitution to be maximally indicative of local protein structure. Our statistically-derived classes are loosely identifiable with the heuristic constructions found in previously published work. However, while these other methods provide a more rigid idealization of physicochemically constrained residue substitution, our classes provide substantially more structural information with many fewer parameters. Moreover, these substitution classes are consistent with the paradigmatic view of the sequence-to-structure relationship in globular proteins which holds that the three-dimensional architecture is predominantly determined by the arrangement of hydrophobic and polar side chains with weak constraints on the actual amino acid identities. More specific constraints are imposed on the placement of prolines, glycines, and the charged residues. These substitution classes have been used in highly accurate predictions of residue solvent accessibility. They could also be used in the identification of homologous proteins, the construction and refinement of multiple sequence alignments, and as a means of condensing and codifying the information in multiple sequence alignments for secondary structure prediction and tertiary fold recognition. © 1996 Wiley-Liss, Inc.

2.
3.
The conservation of residues in columns of a multiple sequence alignment (MSA) reflects the importance of these residues for maintaining the structure and function of a protein. To date, many scores have been suggested for quantifying residue conservation, but none has achieved full rigor in both biology and statistics. In this paper, we present a new approach for measuring the evolutionary conservation at aligned positions. Our conservation measure is related to the logarithmic probabilities for aligned positions, and combines the physicochemical properties and the frequencies of amino acids. Such a measure is both biologically and statistically meaningful. We used the biological-relevance weighted statistical scoring method suggested in this paper, as an alternative to entropy-based procedures, to test the relationship between an amino acid's evolutionary conservation and its role in Phi-value defined protein folding kinetics; our results indicate that the folding nucleus residues may not be significantly more conserved than other residues.

4.
To improve secondary structure predictions in protein sequences, the information residing in multiple sequence alignments of substituted but structurally related proteins is exploited. A database comprised of 70 protein families and a total of 2,500 sequences, some of which were aligned by tertiary structural superpositions, was used to calculate residue exchange weight matrices within alpha-helical, beta-strand, and coil substructures, respectively. Secondary structure predictions were made based on the observed residue substitutions in local regions of the multiple alignments and the largest possible associated exchange weights in each of the three matrix types. Comparison of the observed and predicted secondary structure on a per-residue basis yielded a mean accuracy of 72.2%. Individual alpha-helix, beta-strand, and coil states were respectively predicted at 66.7, and 75.8% correctness, representing a well-balanced three-state prediction. The accuracy level, verified by cross-validation through jack-knife tests on all protein families, dropped, on average, to only 70.9%, indicating the rigor of the prediction procedure. On the basis of robustness, conceptual clarity, accuracy, and executable efficiency, the method has considerable advantage, especially with its sole reliance on amino acid substitutions within structurally related proteins.

5.
Shatsky M  Nussinov R  Wolfson HJ 《Proteins》2006,62(1):209-217
Routinely used multiple-sequence alignment methods use only sequence information. Consequently, they may produce inaccurate alignments. Multiple-structure alignment methods, on the other hand, optimize structural alignment by ignoring sequence information. Here, we present an optimization method that unifies sequence and structure information. The alignment score is based on standard amino acid substitution probabilities combined with newly computed three-dimensional structure alignment probabilities. The advantage of our alignment scheme is in its ability to produce more accurate multiple alignments. We demonstrate the usefulness of the method in three applications: 1) computing more accurate multiple-sequence alignments, 2) analyzing protein conformational changes, and 3) computation of amino acid structure-sequence conservation with application to protein-protein docking prediction. The method is available at http://bioinfo3d.cs.tau.ac.il/staccato/.

6.
The HSSP (Homology-Derived Secondary Structure of Proteins) database provides multiple sequence alignments (MSAs) for proteins of known three-dimensional (3D) structure in the Protein Data Bank (PDB). The database also contains an estimate of the degree of evolutionary conservation at each amino acid position. This estimate, which is based on the relative entropy, correlates with the functional importance of the position; evolutionarily conserved positions (i.e., positions with limited variability and low entropy) are occasionally important to maintain the 3D structure and biological function(s) of the protein. We recently developed the Rate4Site algorithm for scoring amino acid conservation based on their calculated evolutionary rate. This algorithm takes into account the phylogenetic relationships between the homologs and the stochastic nature of the evolutionary process. Here we present the ConSurf-HSSP database of Rate4Site estimates of the evolutionary rates of the amino acid positions, calculated using HSSP's MSAs. The database provides precalculated evolutionary rates for nearly all of the PDB. These rates are projected, using a color code, onto the protein structure, and can be viewed online using the ConSurf server interface. To exemplify the database, we analyzed in detail the conservation pattern obtained for pyruvate kinase and compared the results with those observed using the relative entropy scores of the HSSP database. It is reassuring to know that the main functional region of the enzyme is detectable using both conservation scores. Interestingly, the ConSurf-HSSP calculations mapped additional functionally important regions, which are moderately conserved and were overlooked by the original HSSP estimate. The ConSurf-HSSP database is available online (http://consurf-hssp.tau.ac.il).
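As a rough illustration of the entropy-based conservation estimates mentioned in this abstract, the sketch below computes the Shannon entropy and the relative entropy of a single alignment column. It is a minimal sketch under stated assumptions, not the HSSP or Rate4Site procedure: the function names, the gap character `-`, and the uniform background distribution are hypothetical choices made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: a uniform background; real tools use empirical frequencies.
UNIFORM_BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; gaps ('-') are ignored."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def column_relative_entropy(column, background=UNIFORM_BACKGROUND):
    """Relative entropy (bits) of the column's residue frequencies
    against a background distribution; gaps ('-') are ignored."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    return sum((c / n) * math.log2((c / n) / background[r])
               for r, c in counts.items())


# Columns of an MSA given as equal-length strings can be obtained with zip(*msa).
msa = ["ACD", "ACE", "A-F"]
entropies = [column_entropy(col) for col in zip(*msa)]
```

Low entropy (or high relative entropy) marks a conserved column. Unlike Rate4Site, this per-column view ignores the phylogenetic relationships among the sequences.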

7.
8.
The architecture and weights of an artificial neural network model that predicts putative transmembrane sequences have been developed and optimized by the algorithm of structure evolution. The resulting filter is able to classify membrane/nonmembrane transition regions in sequences of integral human membrane proteins with high accuracy. Similar results have been obtained for both training and test set data, indicating that the network has focused on general features of transmembrane sequences rather than specializing on the training data. Seven physicochemical amino acid properties have been used for sequence encoding. The predictions are compared to hydrophobicity plots.

9.
To investigate the relationships between sequence conservation, protein stability, and protein function, we have measured the thermodynamic stability, folding kinetics, and in vitro peptide-binding activity of a large number of single-site substitutions in the hydrophobic core of the Fyn SH3 domain. Comparison of these data to those derived from an analysis of a large alignment of SH3 domain sequences revealed a very good correlation between the distinct pattern of conservation observed at each core position and the thermodynamic stability of mutants. Conservation was also found to correlate well with the unfolding rates of mutants, but not with the folding rates, suggesting that evolution selects more strongly for optimal native state packing interactions than for maximal folding rates. Structural analysis suggests that residue-residue core packing interactions are very similar in all SH3 domains, which provides an explanation for the correlation between conservation and mutant stability effects studied in a single SH3 domain. We also demonstrate a correlation between stability and the in vitro activity of mutants, and between conservation and activity. However, the relationship between conservation and activity was very strong only for the three most conserved hydrophobic core positions. The weaker correlation between activity and conservation seen at the other seven core positions indicates that maintenance of protein stability is the dominant selective pressure at these positions. In general, the pattern of conservation at hydrophobic core positions appears to arise from conserved packing constraints, and can be effectively utilized to predict the destabilizing effects of amino acid substitutions.

10.
This paper discusses the benefit of mapping paired cysteine mutation patterns as a guide to identifying the positions of protein disulfide bonds. This information can facilitate the computer modeling of protein tertiary structure. First, a simple, paired natural-cysteine-mutation map is presented that identifies the positions of putative disulfide bonds in protein families. The method is based on the observation that if, during the process of evolution, a disulfide-bonded cysteine residue is not conserved, then it is likely that its counterpart will also be mutated. For each target protein, protein databases were searched for the primary amino acid sequences of all known members of distinct protein families. Primary sequence alignment was carried out using PileUp algorithms in the GCG package. To search for correlated mutations, we listed only the positions where cysteine residues were highly conserved and emphasized the mutated residues. In proteins of known three-dimensional structure, a striking pattern of paired cysteine mutations correlated with the positions of known disulfide bridges. For proteins of unknown architecture, the mutation maps showed several positions where disulfide bridging might occur.
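The paired-mutation idea in this abstract can be sketched in a few lines: keep only the alignment columns where cysteine is highly conserved, then check, for each pair of such columns, how often the two cysteines are lost together rather than singly. This is only an illustrative sketch. The function names, the 0.7 conservation cutoff, and the co-mutation score are assumptions made for this example, not the authors' published procedure.

```python
from itertools import combinations


def cysteine_columns(msa, min_fraction=0.7):
    """Indices of columns where at least min_fraction of the sequences
    carry a cysteine (the cutoff is a hypothetical choice)."""
    n = len(msa)
    return [i for i, col in enumerate(zip(*msa))
            if col.count("C") / n >= min_fraction]


def paired_mutation_scores(msa, min_fraction=0.7):
    """For each pair of conserved-cysteine columns, return the fraction of
    cysteine-loss events in which both positions are mutated in the same
    sequence. Pairs with no loss events are omitted."""
    scores = {}
    for i, j in combinations(cysteine_columns(msa, min_fraction), 2):
        both = sum(1 for s in msa if s[i] != "C" and s[j] != "C")
        one = sum(1 for s in msa if (s[i] != "C") != (s[j] != "C"))
        if both + one:
            scores[(i, j)] = both / (both + one)
    return scores


# Toy alignment: the fourth sequence has lost the cysteines at columns 0 and 4.
msa = ["CACAC", "CACAC", "CACAC", "SACAA", "CACAC"]
scores = paired_mutation_scores(msa)
```

A score near 1 for a column pair means the two cysteines tend to be lost together, which is the pattern the abstract associates with a putative disulfide bond. Real alignments would need far more sequences than this toy case.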

11.
Liu XS  Guo WL 《Amino acids》2008,34(4):643-652
Measuring residue conservation at aligned positions has many applications in biology. Recently, a new conservation score has been defined. Unlike the previous methods, the new approach considers both residue frequencies and physicochemistries. Specifically, it measures physicochemistries based on BLOSUM matrices disregarding the meaning of the entries in such matrices, which may involve the problem of log–log probability. In this paper we present a conservation measure that also reflects both frequencies and physicochemistries while considering the fact that the entries of BLOSUM matrices are already interpreted as log probability. When the proposed score is applied to 14 protein examples, the results show that these two conservation scores are equivalent aside from the different score ranges. The method is also used to score the functional sites of three protein families. Compared with the widely used entropy-based methods, the resulting scores are more robust and consistent in the sense that the functional sites are much more conserved because of functional constraints.

12.
Lee S  Lee BC  Kim D 《Proteins》2006,62(4):1107-1114
Knowing protein structure and inferring its function from the structure is one of the main issues of computational structural biology, and often the first step is studying protein secondary structure. There have been many attempts to predict protein secondary structure contents. Previous attempts assumed that the content of protein secondary structure can be predicted successfully using the information on the amino acid composition of a protein. Recent methods achieved remarkable prediction accuracy by using the expanded composition information. The overall average error of the most successful method is 3.4%. Here, we demonstrate that even if we use only the simple amino acid composition information, it is possible to improve the prediction accuracy significantly if the evolutionary information is included. The idea is motivated by the observation that evolutionarily related proteins share similar structures. After calculating the homolog-averaged amino acid composition of a protein, which can be easily obtained from the multiple sequence alignment by running PSI-BLAST, those 20 numbers are learned by a multiple linear regression, an artificial neural network and a support vector regression. The overall average error of the method using support vector regression is 3.3%. It is remarkable that we obtain comparable accuracy without utilizing the expanded composition information such as pair-coupled amino acid composition. This work again demonstrates that the amino acid composition is a fundamental characteristic of a protein. It is anticipated that our novel idea can be applied to many areas of protein bioinformatics where the amino acid composition information is utilized, such as subcellular localization prediction, enzyme subclass prediction, domain boundary prediction, signal sequence prediction, and prediction of unfolded segments in a protein sequence, to name a few.
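The "20 numbers" described in this abstract can be illustrated with a short sketch that computes the amino acid composition of each homolog and averages it across the set. This is a minimal sketch under stated assumptions: the function names are hypothetical, the sequences are taken as already collected (for example from a PSI-BLAST alignment), and the unweighted mean is an assumed averaging scheme that the abstract does not specify.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Fraction of each of the 20 standard amino acids in one sequence.
    Gaps and non-standard symbols are dropped before counting."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS]


def homolog_averaged_composition(sequences):
    """Unweighted mean of the per-sequence compositions (assumed scheme)."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    comps = [composition(s) for s in sequences]
    return [sum(values) / len(comps) for values in zip(*comps)]


# Toy homolog set; aligned rows may contain gap characters.
features = homolog_averaged_composition(["AAAA", "CC-C", "AACC"])
```

The resulting 20-element vector is the kind of input that could then be passed to a regression model such as the linear regression, neural network, or support vector regression named in the abstract.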

13.
Cyclin-dependent kinase 2 (CDK2) is the most thoroughly studied of the cyclin-dependent kinases that regulate essential cellular processes, including the cell cycle, and it has become a model for studies of regulatory mechanisms at the molecular level. This contribution identifies flexible and rigid regions of CDK2 based on temperature B-factors acquired from both X-ray data and molecular dynamics simulations. In addition, the biological relevance of the identified flexible regions and their motions is explored using information from the essential dynamics analysis related to conformational changes of CDK2 and knowledge of its biological function(s). The conserved regions of CMGC protein kinases' primary sequences are located in the most rigid regions identified in our analyses, with the sole exception of the absolutely conserved G13 in the tip of the glycine-rich loop. The conserved rigid regions are important for nucleotide binding, catalysis, and substrate recognition. In contrast, the most flexible regions correlate with those where large conformational changes occur during CDK2 regulation processes. The rigid regions flank and form a rigid skeleton for the flexible regions, which appear to provide the plasticity required for CDK2 regulation. Unlike the rigid regions (which as mentioned are highly conserved) no evidence of evolutionary conservation was found for the flexible regions.

14.
The analysis of sequence conservation is commonly used to predict functionally important sites in proteins. We have developed an approach that first identifies highly conserved sites in a set of orthologous sequences using a weighted substitution‐matrix‐based conservation score and then filters these conserved sites based on the pattern of conservation present in a wider alignment of sequences from the same family and structural information to identify surface‐exposed sites. This allows us to detect specific functional sites in the target protein and exclude regions that are likely to be generally important for the structure or function of the wider protein family. We applied our method to two members of the serpin family of serine protease inhibitors. We first confirmed that our method successfully detected the known heparin binding site in antithrombin while excluding residues known to be generally important in the serpin family. We next applied our sequence analysis approach to neuroserpin and used our results to guide site‐directed polyalanine mutagenesis experiments. The majority of the mutant neuroserpin proteins were found to fold correctly and could still form inhibitory complexes with tissue plasminogen activator (tPA). Kinetic analysis of tPA inhibition, however, revealed altered inhibitory kinetics in several of the mutant proteins, with some mutants showing decreased association with tPA and others showing more rapid dissociation of the covalent complex. Altogether, these results confirm that our sequence analysis approach is a useful tool that can be used to guide mutagenesis experiments for the detection of specific functional sites in proteins. Proteins 2015; 83:135–152. © 2014 Wiley Periodicals, Inc.

15.
Sequence alignment profiles have been shown to be very powerful in creating accurate sequence alignments. Profiles are often used to search a sequence database with a local alignment algorithm. More accurate and longer alignments have been obtained with profile-to-profile comparison. There are several steps that must be performed in creating profile-profile alignments, and each involves choices in parameters and algorithms. These steps include (1) what sequences to include in a multiple alignment used to build each profile, (2) how to weight similar sequences in the multiple alignment and how to determine amino acid frequencies from the weighted alignment, (3) how to score a column from one profile aligned to a column of the other profile, (4) how to score gaps in the profile-profile alignment, and (5) how to include structural information. Large-scale benchmarks consisting of pairs of homologous proteins with structurally determined sequence alignments are necessary for evaluating the efficacy of each scoring scheme. With such a benchmark, we have investigated the properties of profile-profile alignments and found that (1) with optimized gap penalties, most column-column scoring functions behave similarly to one another in alignment accuracy; (2) some functions, however, have much higher search sensitivity and specificity; (3) position-specific weighting schemes in determining amino acid counts in columns of multiple sequence alignments are better than sequence-specific schemes; (4) removing positions in the profile with gaps in the query sequence results in better alignments; and (5) adding predicted and known secondary structure information improves alignments.

16.
We present a method for prediction of functional sites in a set of aligned protein sequences. The method selects sites which are both well conserved and clustered together in space, as inferred from the 3D structures of proteins included in the alignment. We tested the method using 86 alignments from the NCBI CDD database, where the sites of experimentally determined ligand and/or macromolecular interactions are annotated. In agreement with earlier investigations, we found that functional site predictions are most successful when overall background sequence conservation is low, such that sites under evolutionary constraint become apparent. In addition, we found that averaging of conservation values across spatially clustered sites improves predictions under certain conditions: that is, when overall conservation is relatively high and when the site in question involves a large macromolecular binding interface. Under these conditions it is better to look for clusters of conserved sites than to look for particular conserved sites.

17.
The basic DNA-binding modules of 128 protein-DNA interfaces have been analyzed. Although these are less planar, like the protein-protein interfaces, the protein-DNA interfaces can also be dissected into core regions in which all the fully-buried atoms are located, and rim regions having atoms with residual accessibilities. The sequence entropy of the core residues is smaller than those in the rim, indicating that the former are better conserved and possibly contribute more towards the binding free energy, as has been implicated in protein-protein interactions. On the protein side, 1014 Å² of the surface is buried of which 63% belong to the core. There are some differences in the propensities of residues to occur in the core and the rim. In the DNA strands, the nucleotide(s) containing fully-buried atoms in all three components usually occupy central positions of the binding region. A new classification scheme for the interfaces has been introduced based on the composition of secondary structural elements of residues and the results compared with the conventional classification of DNA-binding proteins, as well as the protein class of the molecule. It appears that a common framework may be developed to understand both protein-protein and protein-DNA interactions.

18.
It has been shown previously that some membrane proteins have a conserved core of amino acid residues. This idea not only serves to orient helices during model building exercises but may also provide insight into the structural role of residues mediating helix-helix interactions. Using experimentally determined high-resolution structures of alpha-helical transmembrane proteins we show that, of the residues within the hydrophobic transmembrane spans, the residues at lipid and subunit interfaces are more evolutionarily variable than those within the lipid-inaccessible core of a polypeptide's transmembrane domain. This supports the idea that helix-helix interactions within the same polypeptide chain and those at the interface between different polypeptide chains may arise in distinct ways. To show this, we use a new method to estimate the substitution rate of an amino acid residue given an alignment and phylogenetic tree of closely related proteins. This method gives better sensitivity in the otherwise-conserved transmembrane domains than a conventional similarity analysis and is relatively insensitive to the sequences used.

19.
McGuffin LJ  Jones DT 《Proteins》2002,48(1):44-52
The ultimate goal of structural genomics is to obtain the structure of each protein coded by each gene within a genome to determine gene function. Because of cost and time limitations, it remains impractical to solve the structure for every gene product experimentally. Up to a point, reasonably accurate three‐dimensional structures can be deduced for proteins with homologous sequences by using comparative modeling. Beyond this, fold recognition or threading methods can be used for proteins showing little homology to any known fold, although this is relatively time‐consuming and limited by the library of template folds currently available. Therefore, it is appropriate to develop methods that can increase our knowledge base, expanding our fold libraries by earmarking potentially “novel” folds for experimental structure determination. How can we sift through proteomic data rapidly and yet reliably identify novel folds as targets for structural genomics? We have analyzed a number of simple methods that discriminate between “novel” and “known” folds. We propose that simple alignments of secondary structure elements using predicted secondary structure could potentially be a more selective method than both a simple fold recognition method (GenTHREADER) and standard sequence alignment at finding novel folds when sequences show no detectable homology to proteins with known structures. Proteins 2002;48:44–52. © 2002 Wiley‐Liss, Inc.

20.
Multiple sequence alignments are successfully applied in many studies for understanding the structural and functional relations among single nucleic acids and protein sequences as well as whole families. Because of the rapid growth of sequence databases, multiple sequence alignments can often be very large and difficult to visualize and analyze. We offer a new service that aims to visualize and analyze the multiple alignments obtained with different external algorithms, with new features useful for the comparison of the aligned sequences as well as for the creation of a final image of the alignment. The service is named FASMA and is available at http://bioinformatica.isa.cnr.it/FASMA/.
