Found 20 similar articles (search time: 0 ms)
1.
Babajide A Farber R Hofacker IL Inman J Lapedes AS Stadler PF 《Journal of theoretical biology》2001,212(1):35-46
Knowledge-based potentials can be used to decide whether an amino acid sequence is likely to fold into a prescribed native protein structure. We use this idea to survey the sequence-structure relations in protein space. In particular, we test the following two propositions which were found to be important for efficient evolution: the sequences folding into a particular native fold form extensive neutral networks that percolate through sequence space. The neutral networks of any two native folds approach each other to within a few point mutations. Computer simulations using two very different potential functions, M. Sippl's PROSA pair potential and a neural network based potential, are used to verify these claims.
2.
Modifications of the amino acid sequence generally affect protein stability. Here, we use knowledge-based potentials to estimate the stability of protein structures under sequence variation. Calculations on a variety of protein scaffolds result in a clear distinction of known mutable regions from arbitrarily chosen control patches. For example, randomly changing the sequence of an antibody paratope yields a significantly lower number of destabilized mutants as compared to the randomization of comparable regions on the protein surface. The technique is computationally efficient and can be used to screen protein structures for regions that are amenable to molecular tinkering by preserving the stability of the mutated proteins.
3.
Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is "optimal" in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are "suboptimal" in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for "modelability", we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.
4.
RNA molecules play integral roles in gene regulation, and understanding their structures gives us important insights into their biological functions. Despite recent developments in template-based and parameterized energy functions, the structure of RNA--in particular the nonhelical regions--is still difficult to predict. Knowledge-based potentials have proven efficient in protein structure prediction. In this work, we describe two differentiable knowledge-based potentials derived from a curated data set of RNA structures, with all-atom or coarse-grained representation, respectively. We focus on one aspect of the prediction problem: the identification of native-like RNA conformations from a set of near-native models. Using a variety of near-native RNA models generated from three independent methods, we show that our potential is able to distinguish the native structure and identify native-like conformations, even at the coarse-grained level. The all-atom version of our knowledge-based potential performs better and appears to be more effective at discriminating near-native RNA conformations than one of the most highly regarded parameterized potentials. The fully differentiable form of our potentials will additionally likely be useful for structure refinement and/or molecular dynamics simulations.
5.
The structurally constrained protein evolution (SCPE) model simulates protein divergence considering protein structure explicitly. The model is based on the observation that protein structure is more conserved during evolution than the sequences encoding for that structure. In the previous work, the SCPE model considered only the tertiary structure. Here we show that the performance of the model is enhanced when the oligomeric structure is taken into account. Our results agree with recent evolutionary studies of oligomeric proteins, which show that conservation of the quaternary structure imposes additional constraints on sequence divergence. The incorporation of protein-protein interactions into protein evolution models may be important in the study of quaternary protein structures and complex protein assemblies.
6.
Background
Since the publication of the first draft of the human genome in 2000, bioinformatic data have been accumulating at an overwhelming pace. Currently, more than 3 million sequences and 35 thousand structures of proteins and nucleic acids are available in public databases. Finding correlations in and between these data to answer critical research questions is extremely challenging. This problem needs to be approached from several directions: information science to organize and search the data; information visualization to assist in recognizing correlations; mathematics to formulate statistical inferences; and biology to analyze chemical and physical properties in terms of sequence and structure changes.
7.
8.
The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org).
This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.
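The "noisy set of observed correlations" that the abstract above starts from can be illustrated with a small sketch. It computes plain mutual information between alignment columns, which is only the raw correlation measure and not the maximum entropy model or the EVfold method itself; the function names and the toy alignment are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    `msa` is a list of equal-length aligned sequences (hypothetical input
    format). This is a raw correlation score, not a direct-coupling score.
    """
    n = len(msa)
    freq_i = Counter(seq[i] for seq in msa)
    freq_j = Counter(seq[j] for seq in msa)
    freq_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


def rank_column_pairs(msa):
    """Return all column pairs sorted by decreasing mutual information."""
    length = len(msa[0])
    scores = {
        (i, j): column_mutual_information(msa, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy alignment: columns 0 and 2 co-vary, column 1 is fully conserved.
toy_msa = ["ACA", "ACA", "GCT", "GCT"]
ranked = rank_column_pairs(toy_msa)
```

In this toy case the co-varying pair (0, 2) ranks first and the conserved column carries no signal. Real alignments also contain indirect, transitive correlations, which is the problem the abstract addresses with a global maximum entropy model.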
9.
Karl J. Niklas 《Brittonia》1978,30(3):373-394
A model for speciation is given in which a taxon’s phenotypic variation and concomitant variation in fitness are related to gradients within the environment. Phenotypic expressions within the population are shown to undergo abrupt transitions as a result of discontinuous fitness-functions. Evidence for rapid and abrupt phenotypic variation is explored by analyses of speciation (= origination) rates within the fossil record. In general, a high correlation exists among the area/volume changes of sedimentary rocks known for each geologic period and the apparent speciation rates seen in selected vascular/non-vascular plant groups. Regressions of speciation rates on rock area/volume plots indicate that the Devonian, Carboniferous, and Permian show significant divergences (= residual) rates from predicted origination rates. The Cretaceous shows the highest residual value as a consequence of the rapid appearance of angiosperm fossils. A similar pattern in diversity changes for the mandibulate terrestrial invertebrates is also apparent. The coupled evolution of the angiosperms with specific insect groups appears to be the most tenable explanation for the residual Cretaceous origination rate of the former group. It is postulated that the angiosperms have evolved in part as the result of a phytochemical cost-function such that phytophagous insects are warded off, while potential pollinators are favored. Quantitative/qualitative differences observed in the distribution of secondary metabolites may be evidence for coupled evolution. A “predator-prey mediated co-existence” between phytophagous insects and angiosperms may have served as a factor in allowing the co-existence of less than optimal plant species providing an impetus for relatively rapid speciation turnover.
10.
Background
Homology is a key concept in both evolutionary biology and genomics. Detection of homology is crucial in fields like the functional annotation of protein sequences and the identification of taxon specific genes. Basic homology searches are still frequently performed by pairwise search methods such as BLAST. Vast improvements have been made in the identification of homologous proteins by using more advanced methods that use sequence profiles. However additional improvement could be made by exploiting sources of genomic information other than the primary sequence or tertiary structure.
11.
Knowledge-based potentials are extensively used to represent atomic interactions in modeling the protein structure. We consider a number of problems in constructing efficient knowledge-based potentials for biopolymer modeling. We show that some limitations can be overcome by normalizing estimated interactions through the distribution of distances between noninteracting random probes in protein structure space. We demonstrate that knowledge-based potentials thus constructed can be efficiently applied for analysis of the hydration state of protein atoms. With this approach, one can predict the locations of structural water molecules in a protein globule. We have also succeeded in recognizing the correctly folded protein structure among many misfolded decoys in cases when the interaction with water solvent is dominant for structure formation.
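As a rough illustration of the normalization idea in the abstract above, the sketch below derives a distance-dependent potential with the standard inverse Boltzmann relation, E(r) = -kT ln(P_obs(r) / P_ref(r)), where the reference distribution would come from distances between noninteracting probes. The function name, bin width, pseudocount, and sample distances are all assumptions; the paper's actual procedure is not given in the abstract.

```python
import math
from collections import Counter


def knowledge_based_potential(observed, reference, bin_width=1.0,
                              kT=1.0, pseudocount=1.0):
    """Return {bin_index: energy} using E = -kT * ln(P_obs / P_ref).

    `observed` holds distances between an atom-type pair in known structures;
    `reference` holds distances between noninteracting probes (both are
    hypothetical inputs). A pseudocount avoids log(0) in sparse bins.
    """
    obs = Counter(int(d // bin_width) for d in observed)
    ref = Counter(int(d // bin_width) for d in reference)
    bins = sorted(set(obs) | set(ref))
    n_obs = len(observed) + pseudocount * len(bins)
    n_ref = len(reference) + pseudocount * len(bins)
    potential = {}
    for b in bins:
        p_obs = (obs[b] + pseudocount) / n_obs
        p_ref = (ref[b] + pseudocount) / n_ref
        potential[b] = -kT * math.log(p_obs / p_ref)
    return potential


# Made-up distances (in Å): the pair is enriched near 3 Å relative to probes.
observed_distances = [3.2, 3.5, 3.8, 3.1, 7.5]
reference_distances = [3.5, 7.2, 7.8, 7.1, 7.6]
potential = knowledge_based_potential(observed_distances, reference_distances)
```

Bins where contacts are more frequent than in the reference get negative (favorable) energies, and depleted bins get positive ones. The choice of reference state is the main design decision, which is the point the abstract makes.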
12.
A new approach, MOBILE, is presented that models protein binding-sites including bound ligand molecules as restraints. Initially generated, homology models of the target protein are refined iteratively by including information about bioactive ligands as spatial restraints and optimising the mutual interactions between the ligands and the binding-sites. Thus optimised models can be used for structure-based drug design and virtual screening. In a first step, ligands are docked into an averaged ensemble of crude homology models of the target protein. In the next step, improved homology models are generated, considering explicitly the previously placed ligands by defining restraints between protein and ligand atoms. These restraints are expressed in terms of knowledge-based distance-dependent pair potentials, which were compiled from crystallographically determined protein-ligand complexes. Subsequently, the most favourable models are selected by ranking the interactions between the ligands and the generated pockets using these potentials. Final models are obtained by selecting the best-ranked side-chain conformers from various models, followed by an energy optimisation of the entire complex using a common force-field. Application of the knowledge-based pair potentials proved efficient to restrain the homology modelling process and to score and optimise the modelled protein-ligand complexes. For a test set of 46 protein-ligand complexes, taken from the Protein Data Bank (PDB), the success rate of producing near-native binding-site geometries (rmsd < 2.0 Å) with MODELLER is 70% when the ligand restrains the homology modelling process in its native orientation. Scoring these complexes with the knowledge-based potentials, in 66% of the cases a pose with rmsd < 2.0 Å is found on rank 1. Finally, MOBILE has been applied to two case studies modelling factor Xa based on trypsin and aldose reductase based on aldehyde reductase.
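The success criterion quoted above (rmsd < 2.0 Å) can be made concrete with a short sketch. It assumes the model and native coordinates are already in the same frame with atoms listed in matching order, and it does no superposition; the helper names and the toy coordinates are hypothetical and are not taken from MOBILE or MODELLER.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z)."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def success_rate(pairs, cutoff=2.0):
    """Fraction of (model, native) pairs whose RMSD falls below `cutoff` Å."""
    hits = sum(1 for model, native in pairs if rmsd(model, native) < cutoff)
    return hits / len(pairs)


# Toy example: one model is shifted by 1 Å, the other by 3 Å.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
near = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
far = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
rate = success_rate([(near, native), (far, native)])
```

With a 2.0 Å cutoff only the first toy model counts as near-native, so the rate is 0.5. Percentages such as the 70% and 66% in the abstract are this kind of fraction taken over the 46 test complexes.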
13.
The variational approach of evaluation for knowledge-based potentials is considered for the first time. In this approach, the problem of deriving knowledge-based potentials is solved as an optimization task in the multiparametric model of atom types, reference states and interaction cutoff radii. Using an analogy to liquid state theory, we propose four new reference states and derive the corresponding knowledge-based potentials. The cutoff radii and atom types are optimized to minimize the averaged root-mean-square deviations (RMSD) of the docked ligand positions relative to the experimentally determined poses. The number of atom types is varied on the developed atom type tree with 6 root atom types (C, N, O, S, P and the halogen type) and 49 apical atom types. We showed a pronounced effect of atom type choice on docking accuracy and proved that splitting the elements C, N and O of the periodic system into up to 18 optimal atom types substantially improves docking accuracy.
14.
Background
An increasing number of long noncoding RNAs (lncRNAs) have been identified recently. Different from all the others that function in cis to regulate local gene expression, the newly identified HOTAIR is located between HoxC11 and HoxC12 in the human genome and regulates HoxD expression in multiple tissues. Like the well-characterised lncRNA Xist, HOTAIR binds to polycomb proteins to methylate histones at multiple HoxD loci, but unlike Xist, many details of its structure and function, as well as the trans regulation, remain unclear. Moreover, HOTAIR is involved in the aberrant regulation of gene expression in cancer.
15.
Transfer RNAs (tRNAs) are ancient molecules that are central to translation. Since they probably carry evolutionary signatures that were left behind when the living world diversified, we reconstructed phylogenies directly from the sequence and structure of tRNA using well-established phylogenetic methods. The trees placed tRNAs with long variable arms charging Sec, Tyr, Ser, and Leu consistently at the base of the rooted phylogenies, but failed to reveal groupings that would indicate clear evolutionary links to organismal origin or molecular functions. In order to uncover evolutionary patterns in the trees, we forced tRNAs into monophyletic groups using constraint analyses to generate timelines of organismal diversification and test competing evolutionary hypotheses. Remarkably, organismal timelines showed Archaea was the most ancestral superkingdom, followed by viruses, then superkingdoms Eukarya and Bacteria, in that order, supporting conclusions from recent phylogenomic studies of protein architecture. Strikingly, constraint analyses showed that the origin of viruses was not only ancient, but was linked to Archaea. Our findings have important implications. They support the notion that the archaeal lineage was very ancient, resulted in the first organismal divide, and predated diversification of tRNA function and specificity. Results are also consistent with the concept that viruses contributed to the development of the DNA replication machinery during the early diversification of the living world.
16.
17.
We have examined the cleavage of several synthetic DNA sequences by iron(II)-bleomycin. We find that, although bleomycin cuts mixed sequence DNAs with a preference for GC = GT > GA >> GG, it efficiently cleaves regions of (AT)n cutting exclusively at ApT, not TpA. Isolated ApT steps show very little cleavage while blocks of three or more contiguous ATs are cut as efficiently as GpT. This cleavage is specific for (AT)n, since sequences of the type (TAA)n.(TTA)n and (ATT)n.(AAT)n are hardly cut at all. No cleavage is observed at ApC or CpA within sequences of the type (AC)n.(GT)n; regions of An.Tn are also not cut. Although the cobalt-bleomycin complex (which binds to but does not cleave DNA) yields good DNase I footprints at GT and GC sites, no footprints are observed within (AT)n, suggesting that although the cleavage reaction is efficient, the binding affinity is relatively weak. We propose a model in which bleomycin cleavage is determined by local DNA structure, while strong binding requires the presence of a guanine residue.
18.
19.
Fox SW 《The American biology teacher》1986,48(3):140-9, 169
The evolutionary sequence is being reexamined experimentally from a "Big Bang" origin to the protocell and from the emergence of protocell and variety of species to Darwin's mental power (mind) and society (The Descent of Man). A most fundamentally revisionary consequence of experiments is an emphasis on endogenous ordering. This principle, seen vividly in ordered copolymerization of amino acids, has had new impact on the theory of Darwinian evolution and has been found to apply to the entire sequence. Herein, I will discuss some problems of dealing with teaching controversial subjects.
20.
ABSTRACT: BACKGROUND: The Pi2/9 locus contains multiple nucleotide binding site--leucine-rich repeat (NBS-LRR) genes in the rice genome. Although three functional R-genes have been cloned from this locus, little is known about the origin and evolutionary history of these genes. Herein, an extensive genome-wide survey of Pi2/9 homologs in rice, sorghum, Brachypodium and Arabidopsis was conducted to explore this theme. RESULTS: In our study, 1, 1, 5 and 156 Pi2/9 homologs were detected in Arabidopsis, Brachypodium, sorghum and rice genomes, respectively. Two distinct evolutionary patterns of Pi2/9 homologs, Type I and Type II, were observed in rice lines. Type I Pi2/9 homologs showed evidence of rapid gene diversification, including substantial copy number variations, obscured orthologous relationships, high levels of nucleotide diversity and/or divergence, frequent sequence exchanges and strong positive selection, whereas Type II Pi2/9 homologs exhibited a fairly slow evolutionary rate. Interestingly, the three cloned R-genes from the Pi2/9 locus all belonged to the Type I genes. CONCLUSIONS: Our data show that the Pi2/9 locus had an ancient origin predating the common ancestor of gramineous species. The existence of two types of Pi2/9 homologs suggests that diversifying evolution should be an important strategy of rice to cope with different types of pathogens. The relationship of cloned Pi2/9 genes and Type I genes also suggests that rapid gene diversification might help rice adapt quickly to the changing spectrum of the fungal pathogen M. grisea. Based on these criteria, other potential candidate genes that might confer novel resistance specificities to rice blast could be predicted.