Similar articles
 20 similar articles found (search time: 15 ms)
1.
M J Sippl  S Weitckus 《Proteins》1992,13(3):258-271
We present an approach which can be used to identify native-like folds in a data base of protein conformations in the absence of any sequence homology to proteins in the data base. The method is based on a knowledge-based force field derived from a set of known protein conformations. A given sequence is mounted on all conformations in the data base and the associated energies are calculated. Using several conformations and sequences from the globin family we show that the native conformation is identified correctly. In fact the resolution of the force field is high enough to discriminate between a native fold and several closely related conformations. We then apply the procedure to several globins of known sequence but unknown three dimensional structure. The homology of these sequences to globins of known structures in the data base ranges from 49 to 17%. With one exception we find that for all globin sequences one of the known globin folds is identified as the most favorable conformation. These results are obtained using a force field derived from a data base devoid of globins of known structure. We briefly discuss useful applications in protein structural research and future development of our approach.
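The threading step described in this abstract (mount one sequence on every conformation, compute an energy, pick the lowest) can be sketched roughly as follows. This is a toy under stated assumptions: the two-letter H/P alphabet, the `CONTACT_ENERGY` values, and the contact lists are invented for illustration and are not the knowledge-based force field of the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical contact energies for a two-letter alphabet
# (H = hydrophobic, P = polar). Values are illustrative only.
CONTACT_ENERGY: Dict[Tuple[str, str], float] = {
    ("H", "H"): -1.0,
    ("H", "P"): 0.2,
    ("P", "H"): 0.2,
    ("P", "P"): 0.0,
}

Contacts = List[Tuple[int, int]]


def thread_energy(sequence: str, contacts: Contacts) -> float:
    """Energy of `sequence` mounted on a conformation.

    The conformation is reduced to a list of residue index pairs
    that are in spatial contact (an assumption of this sketch).
    """
    return sum(CONTACT_ENERGY[(sequence[i], sequence[j])] for i, j in contacts)


def rank_folds(sequence: str, folds: Dict[str, Contacts]) -> List[str]:
    """Return fold names ordered from most to least favorable energy."""
    return sorted(folds, key=lambda name: thread_energy(sequence, folds[name]))


# Two made-up conformations for a six-residue chain.
folds = {
    "fold_a": [(0, 3), (1, 4)],
    "fold_b": [(0, 5), (2, 5)],
}
sequence = "HPPHPH"
best = rank_folds(sequence, folds)[0]
```

A real implementation would use distance-dependent potentials derived from a structure data base and would handle sequences and folds of different lengths; the sketch only shows the ranking logic.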

2.
I Lafontaine  R Lavery 《Biopolymers》2000,56(4):292-310
We describe an original approach to determining sequence-structure relationships for DNA. This approach, termed ADAPT, combines all-atom molecular mechanics with a multicopy algorithm to build nucleotides that contain all four standard bases in variable proportions. These nucleotides enable us to search very rapidly for base sequences that energetically favor chosen types of DNA deformation or chosen DNA-protein or DNA-ligand interactions. Sequences satisfying the chosen criteria can be found by energy minimization, combinatorial sequence searching, or genome scanning, in a manner similar to the threading approaches developed for protein structure prediction. In the latter case, we are able to analyze roughly 2000 base pairs per second. Applications of the method to DNA allomorphic transitions, DNA deformation, and specific DNA interactions are presented.

3.
Many modeling studies of supercoiled DNA are based on equilibrium structures from theoretical calculations or energy minimization. Since closed circular DNAs are flexible, it is possible that errors are introduced by calculating properties from a single minimum energy structure, rather than from a complete thermodynamic ensemble. We have investigated this question using molecular dynamics simulations on a low resolution molecular mechanics model in which each base pair is represented by three points (a plane). This allows the inclusion of sequence-dependent variations of tip, inclination, and twist. Three kinds of sequences were tested: (1) homogeneous DNA, in which all base pairs have the helicoidal parameters of an ideal, average B-DNA; (2) random sequence DNA; and (3) curved DNA. We examined the rate of convergence of various structural parameters. Convergence for most of these is slowest for homogeneous sequences, more rapid for random sequences, and most rapid for curved sequences. The most slowly converging parameter is the antipodes profile. In a plasmid with N base pairs (bp), the antipodes distance is the distance d_ij from base pair i to base pair j halfway around the plasmid, j = i + N/2. The antipodes profile at time t is a plot of d_ij over the range i = 1, ..., N/2. In a homogeneous plasmid, convergence requires that the antipodes profile averaged over time must be flat. Even in the small plasmids examined here, the average properties of the ensembles were found to differ from those of static equilibrium structures. These effects will be even more dramatic for larger plasmids. Further, average and dynamic properties are affected by both plasmid size and sequence. © 1996 John Wiley & Sons, Inc.
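The antipodes profile defined in this abstract is easy to state in code. The sketch below assumes each base pair is reduced to a single 3D point and that N is even; the function names are hypothetical, and the indices are 0-based where the abstract counts from 1.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def antipodes_profile(coords: Sequence[Point]) -> List[float]:
    """Distances d_ij with j = i + N/2 for one snapshot of a closed circle."""
    n = len(coords)
    if n == 0 or n % 2:
        raise ValueError("this sketch expects a non-empty, even number of base pairs")
    half = n // 2
    return [math.dist(coords[i], coords[i + half]) for i in range(half)]


def time_averaged_profile(frames: Sequence[Sequence[Point]]) -> List[float]:
    """Average the antipodes profile position by position over all snapshots."""
    profiles = [antipodes_profile(frame) for frame in frames]
    return [sum(column) / len(profiles) for column in zip(*profiles)]
```

For a homogeneous plasmid, the abstract's convergence criterion corresponds to the list returned by `time_averaged_profile` being approximately constant across positions.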

4.
Analyzing protein-DNA recognition mechanisms   Cited by: 1 (self-citations: 0, citations by others: 1)
We present a computational algorithm that can be used to analyze the generic mechanisms involved in protein-DNA recognition. Our approach is based on energy calculations for the full set of base sequences that can be threaded onto the DNA within a protein-DNA complex. It is able to reproduce experimental consensus binding sequences for a variety of DNA binding proteins and also correlates well with the order of measured binding free energies. These results suggest that the crystal structure of a protein-DNA complex can be used to identify all potential binding sequences. By analyzing the energy contributions that lead to base sequence selectivity, it is possible to quantify the concept of direct versus indirect recognition and to identify a new concept describing whether the protein-DNA interaction and DNA deformation terms select optimal binding sites by acting in accord or in disaccord.

5.
Tertiary interactions are crucial in maintaining the tRNA structure and functionality. We used a combined sequence analysis and quantum mechanics approach to calculate accurate energies of the most frequent tRNA tertiary base pairing interactions. Our analysis indicates that six out of the nine classical tertiary interactions are held in place mainly by H-bonds between the bases. In the remaining three cases other effects have to be considered. Tertiary base pairing interaction energies range from -8 to -38 kcal/mol in yeast tRNA(Phe) and are estimated to contribute roughly 25% of the overall tRNA base pairing interaction energy. Six analyzed posttranslational chemical modifications were shown to have minor effect on the geometry of the tertiary interactions. Modifications that introduce a positive charge strongly stabilize the corresponding tertiary interactions. Non-additive effects contribute to the stability of base triplets.

6.
7.
Kosloff M  Kolodny R 《Proteins》2008,71(2):891-902
It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. We have established a data base of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).

8.
Looking into DNA recognition: zinc finger binding specificity   Cited by: 5 (self-citations: 2, citations by others: 3)
We present a quantitative, theoretical analysis of the recognition mechanisms used by two zinc finger proteins: Zif268, which selectively binds to GC-rich sequences, and a Zif268 mutant, which binds to a TATA box site. This analysis is based on a recently developed method (ADAPT), which allows binding specificity to be analyzed via the calculation of complexation energies for all possible DNA target sequences. The results obtained with the zinc finger proteins show that, although both mainly select their targets using direct, pairwise protein–DNA interactions, they also use sequence-dependent DNA deformation to enhance their selectivity. A new extension of our methodology enables us to determine the quantitative contribution of these two components and also to measure the contributions of individual residues to overall specificity. The results show that indirect recognition is particularly important in the case of the TATA box binding mutant, accounting for 30% of the total selectivity. The residue-by-residue analysis of the protein–DNA interaction energy indicates that the existence of amino acid–base contacts does not necessarily imply sequence selectivity, and that side chains without contacts can nevertheless contribute to defining the protein's target sequence.

9.
MOTIVATION: DNA structure plays an important role in a variety of biological processes. Different di- and tri-nucleotide scales have been proposed to capture various aspects of DNA structure including base stacking energy, propeller twist angle, protein deformability, bendability, and position preference. Yet, a general framework for the computational analysis and prediction of DNA structure is still lacking. Such a framework should in particular address the following issues: (1) construction of sequences with extremal properties; (2) quantitative evaluation of sequences with respect to a given genomic background; (3) automatic extraction of extremal sequences and profiles from genomic databases; (4) distribution and asymptotic behavior as the length N of the sequences increases; and (5) complete analysis of correlations between scales. RESULTS: We develop a general framework for sequence analysis based on additive scales, structural or other, that addresses all these issues. We show how to construct extremal sequences and calibrate scores for automatic genomic and database extraction. We show that distributions rapidly converge to normality as N increases. Pairwise correlations between scales depend both on background distribution and sequence length and rapidly converge to an analytically predictable asymptotic value. For di- and tri-nucleotide scales, normal behavior and asymptotic correlation values are attained over a characteristic window length of about 10-15 bp. With a uniform background distribution, pairwise correlations between empirically-derived scales remain relatively small and roughly constant at all lengths, except for propeller twist and protein deformability which are positively correlated. There is a positive (resp. negative) correlation between dinucleotide base stacking (resp. propeller twist and protein deformability) and AT-content that increases in magnitude with length. The framework is applied to the analysis of various DNA tandem repeats. We derive exact expressions for counting the number of repeat unit classes at all lengths. Tandem repeats are likely to result from a variety of different mechanisms, a fraction of which is likely to depend on profiles characterized by extreme structural features.
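An additive dinucleotide scale, as used in this abstract, assigns a value to each of the 16 dinucleotides and scores a sequence by summing over its overlapping dinucleotides. The sketch below shows that scoring and a sliding-window profile. `TOY_SCALE` is a placeholder with invented values, not one of the published structural scales, and the function names are hypothetical.

```python
from typing import Dict, List

# Placeholder dinucleotide scale: every value here is invented for illustration.
TOY_SCALE: Dict[str, float] = {a + b: 0.0 for a in "ACGT" for b in "ACGT"}
TOY_SCALE.update({"AA": 1.0, "TT": 1.0, "AT": 2.0, "TA": 3.0})


def additive_score(seq: str, scale: Dict[str, float]) -> float:
    """Sum the scale values of all overlapping dinucleotides in `seq`."""
    return sum(scale[seq[i:i + 2]] for i in range(len(seq) - 1))


def window_profile(seq: str, scale: Dict[str, float], window: int) -> List[float]:
    """Additive score of every window of length `window` along `seq`."""
    if window < 2 or window > len(seq):
        raise ValueError("window must be between 2 and len(seq)")
    return [
        additive_score(seq[start:start + window], scale)
        for start in range(len(seq) - window + 1)
    ]
```

Scanning a genome with `window_profile` and keeping the highest or lowest windows would be one simple way to extract candidate extremal profiles; the calibration against a genomic background described in the abstract is not covered here.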

10.
RNA molecules, which are found in all living cells, fold into characteristic structures that account for their diverse functional activities. Many of these RNA structures consist of a collection of fundamental RNA motifs. The various combinations of RNA basic components form different RNA classes and define their unique structural and functional properties. The availability of many genome sequences makes it possible to search computationally for functional RNAs. Biological experiments indicate that functional RNAs have characteristic RNA structural motifs represented by specific combinations of base pairings and conserved nucleotides in the loop regions. Searching for those well-ordered RNA structures and their homologues in genomic sequences is very helpful for the understanding of RNA-based gene regulation. In this paper, we consider the following problem: given an RNA sequence with a known secondary structure, efficiently determine candidate segments in genomic sequences that can potentially form RNA secondary structures similar to the given RNA secondary structure. Our new bottom-up approach first searches for all potential stem-loops similar to those of the given RNA secondary structure and then, based on the located stem-loops, detects potential homologous structural RNAs in genomic sequences.

11.
We have developed a technique of partially-restrained molecular mechanics enthalpy minimisation which enables the sequence-dependence of the DNA binding of a non-intercalating ligand to be studied for arbitrary sequences of considerable length (≥ 60 base-pairs). The technique has been applied to analyse the binding of berenil to the minor groove of a 60 base-pair sequence derived from the tyrT promoter; the results are compared with those obtained by DNase I and hydroxyl radical footprinting on the same sequence. The calculated and experimentally observed patterns of binding are in good agreement. Analysis of the modelling data highlights the importance of DNA flexibility in ligand binding. Further, the electrostatic component of the interaction tends to favour binding to AT-rich regions, whilst the van der Waals interaction energy term favours GC-rich ones. The results also suggest that an important contribution to the observed preference for binding in AT-rich regions arises from lower DNA perturbation energies and is not accompanied by reduced DNA structural perturbations in such sequences. It is therefore concluded that those modes of DNA distortion favourable to binding are probably more flexible in AT-rich regions. The structure of the modelled DNA sequence has also been analysed in terms of helical parameters. For the DNA energy-minimised in the absence of berenil, certain helical parameters show marked sequence-dependence. For example, purine-pyrimidine (R-Y) base pairs show a consistent positive buckle whereas this feature is consistently negative for Y-R pairs. Further, CG steps show lower than average values of slide while GC steps show lower than average values of rise. Similar analysis of the modelling data from the calculations including berenil highlights the importance of DNA flexibility in ligand binding. We observe that the binding of berenil induces characteristic responses in different helical parameters for the base-pairs around the binding site. For example, buckle and tilt tend to become more negative to the 5'-side of the binding site and more positive to the 3'-side, while the base steps at either side of the centre of the site show increased twist and decreased roll.

12.
The structural and energetic consequences of cytosine methylation in the 5-position on the supercoil-dependent B-Z equilibrium in alternating dC-dG sequences cloned into recombinant plasmids were investigated. The helical parameters determined with the band shift method for right-handed [10.7 base pairs (bp)/turn] and left-handed (12.8 bp/turn) 5MedC-dG inserts were different from the helical repeat values for unmethylated dC-dG inserts (10.5 bp/turn in the right-handed and 11.5 bp/turn in the left-handed form). We analyzed the thermodynamic parameters ΔG_BZ (free energy difference per base pair between right-handed and left-handed helix structure), ΔG_jx (free energy for formation of one B-Z junction), and b (helix unwinding at a junction region) for varying lengths of dC-dG inserts by two-dimensional gel electrophoresis and application of a statistical mechanics model. A comparison of plasmids fully methylated in vitro with HhaI methylase and their unmethylated counterparts revealed that ΔG_jx is not significantly changed by cytosine methylation. However, this base modification results in an approximate 3-fold decrease of ΔG_BZ and an approximate 2-fold decrease of the unwinding b at B-Z junction regions. Analysis of a pair of related plasmids, each containing two dC-dG blocks, revealed qualitatively different transition behaviors. When the two dC-dG blocks were separated by 95 bp of a mixed sequence, they underwent independent B to Z transitions with separate nucleation events and junction formations. When the two blocks were separated by only a 4 bp GATC sequence, only one nucleation event was necessary, and the Z-helix spread across the nonalternating GATC region. (ABSTRACT TRUNCATED AT 250 WORDS)

13.
14.
The success of comparative analysis in resolving RNA secondary structure and numerous tertiary interactions relies on the presence of base covariations. Although the majority of base covariations in aligned sequences are associated with Watson-Crick base pairs, many involve non-canonical or restricted base pair exchanges (e.g. only G:C/A:U), reflecting more specific structural constraints. We have developed a computer program that determines potential base pairing conformations for a given set of paired nucleotides in a sequence alignment. This program (ISOPAIR) assumes that the base pair conformation is maintained through sequence variation without significantly affecting the path of the sugar-phosphate backbone. ISOPAIR identifies such 'isomorphic' structures for any set of input base pair or base triple sequences. The program was applied to base pairs and triples with known structures and sequence exchanges. In several instances, isomorphic structures were correctly identified with ISOPAIR. Thus, ISOPAIR is useful when assessing non-canonical base pair conformations in comparative analysis. ISOPAIR applications are limited to those cases where unusual base pair exchanges indeed reflect a non-canonical conformation.

15.
MOTIVATION: Base pairing probability matrices have been frequently used for the analyses of structural RNA sequences. Recently, there has been a growing need for computing these probabilities for long DNA sequences by constraining the maximal span of base pairs to a limited value. However, none of the existing programs can exactly compute the base pairing probabilities associated with the energy model of secondary structures under such a constraint. RESULTS: We present an algorithm that exactly computes the base pairing probabilities associated with the energy model under the constraint on the maximal span W of base pairs. The complexity of our algorithm is given by O(NW^2) in time and O(N+W^2) in memory, where N is the sequence length. We show that our algorithm has a higher sensitivity to the true base pairs as compared to that of RNAplfold. We also present an algorithm that predicts a mutually consistent set of local secondary structures by maximizing the expected accuracy function. The comparison of the local secondary structure predictions with those of RNALfold indicates that our algorithm is more accurate. Our algorithms are implemented in the software named 'Rfold.' AVAILABILITY: The C++ source code of the Rfold software and the test dataset used in this study are available at http://www.ncrna.org/software/Rfold/.
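The maximal-span constraint in this abstract restricts which base pairs (i, j) are considered at all. The sketch below only enumerates candidate pairs under that constraint. It does not compute pairing probabilities and is not the Rfold algorithm. The span definition (j - i ≤ W), the minimum hairpin loop of three unpaired bases, and the set of allowed pairs are assumptions of this illustration.

```python
from typing import List, Tuple

# Watson-Crick and G-U wobble pairs (an assumption of this sketch).
CANONICAL = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def candidate_pairs(seq: str, max_span: int, min_loop: int = 3) -> List[Tuple[int, int]]:
    """List 0-based pairs (i, j) with min_loop < j - i <= max_span that can pair.

    For each i, only about `max_span` partners are examined, so the number of
    candidates grows like N * W rather than N * N.
    """
    n = len(seq)
    pairs = []
    for i in range(n):
        last = min(n - 1, i + max_span)
        for j in range(i + min_loop + 1, last + 1):
            if (seq[i], seq[j]) in CANONICAL:
                pairs.append((i, j))
    return pairs
```

Because every position contributes at most W candidates, any dynamic program restricted to these pairs can keep its tables proportional to the window rather than to the whole sequence, which is the intuition behind the W-dependent bounds quoted in the abstract.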

16.
It is generally accepted that many different protein sequences have similar folded structures, and that there is a relatively high probability that a new sequence possesses a previously observed fold. An indirect consequence of this is that protein design should define the sequence space accessible to a given structure, rather than providing a single optimized sequence. We have recently developed a new approach for protein sequence design, which optimizes the complete sequence of a protein based on the knowledge of its backbone structure, its amino acid composition and a physical energy function including van der Waals interactions, electrostatics, and environment free energy. The specificity of the designed sequence for its template backbone is imposed by keeping the amino acid composition fixed. Here, we show that our procedure converges in sequence space, albeit not to the native sequence of the protein. We observe that while polar residues are well conserved in our designed sequences, non-polar amino acids at the surface of a protein are often replaced by polar residues. The designed sequences provide a multiple alignment of sequences that all adopt the same three-dimensional fold. This alignment is used to derive a profile matrix for chicken triose phosphate isomerase, TIM. The matrix is found to recognize significantly the native sequence for TIM, as well as closely related sequences. Possible application of this approach to protein fold recognition is discussed.

17.
Computer model building with a dynamic energy minimization procedure is used here to study the interaction of a pentapeptide sequence from the lac repressor headpiece (lac 53-57) with different base sequences of DNA. The peptide fragment for this purpose was considered in the classical beta-antiparallel as well as the beta-associated conformation. The model of its interaction with DNA was optimised for various binding positions and base sequences. Partitioning of energy is analysed for different dielectric constant values and the main contributing factors to sequence-specific binding are discussed.

18.
Topoisomerase IB (Top1) inhibitors, such as camptothecin (CPT), stabilize the Top1-DNA cleavage complex in a DNA sequence-dependent manner. The sequence selectivity of Top1 inhibitors is important for targeting specific genomic sequences of therapeutic value. However, the molecular mechanisms underlying this selectivity remain largely unknown. We performed molecular dynamics simulations to delineate structural, dynamic and energetic features that contribute to the differential sequence selectivity of the Top1 inhibitors. We found the sequence selectivity of CPT to be highly correlated with the drug binding energies, dynamic and structural properties of the linker domain. Chemical insights, gained by per-residue binding energy analysis revealed that the non-polar interaction between CPT and nucleotide at the +1 position of the cleavage site was the major (favorable) contributor to the total binding energy. Mechanistic insights gained by a potential of mean force analysis implicated that the drug dissociation step was associated with the sequence selectivity. Pharmaceutical insights gained by our molecular dynamics analyses explained why LMP-776, an indenoisoquinoline derivative under clinical development at the National Institutes of Health, displays different sequence selectivity when compared with camptothecin and its clinical derivatives.

19.
Finding structural similarities between proteins often helps reveal shared functionality, which otherwise might not be detected by native sequence information alone. Such similarity is usually detected and quantified by protein structure alignment. Determining the optimal alignment between two protein structures, however, remains a hard problem. An alternative approach is to approximate each three-dimensional protein structure using a sequence of motifs derived from a structural alphabet. Using this approach, structure comparison is performed by comparing the corresponding motif sequences or structural sequences. In this article, we measure the performance of such alphabets in the context of the protein structure classification problem. We consider both local and global structural sequences. Each letter of a local structural sequence corresponds to the best matching fragment to the corresponding local segment of the protein structure. The global structural sequence is designed to generate the best possible complete chain that matches the full protein structure. We use an alphabet of 20 letters, corresponding to a library of 20 motifs or protein fragments having four residues. We show that the global structural sequences approximate well the native structures of proteins, with an average coordinate root mean square of 0.69 Å over 2225 test proteins. The approximation is best for all α-proteins, while relatively poorer for all β-proteins. We then test the performance of four different sequence representations of proteins (their native sequence, the sequence of their secondary-structure elements, and the local and global structural sequences based on our fragment library) with different classifiers in their ability to classify proteins that belong to five distinct folds of CATH. Without surprise, the primary sequence alone performs poorly as a structure classifier. We show that addition of either secondary-structure information or local information from the structural sequence considerably improves the classification accuracy. The two fragment-based sequences perform better than the secondary-structure sequence but not well enough at this stage to be a viable alternative to more computationally intensive methods based on protein structure alignment.
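The coordinate root mean square quoted in this abstract measures how far a model chain deviates from the native coordinates. A minimal sketch follows, assuming the two coordinate sets have the same length and have already been superimposed; the optimal superposition step that a full cRMS calculation normally includes is deliberately left out, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def coordinate_rms(model: Sequence[Point], native: Sequence[Point]) -> float:
    """Root mean square deviation between matched, pre-superimposed coordinates."""
    if len(model) != len(native) or not model:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (mx - nx) ** 2 + (my - ny) ** 2 + (mz - nz) ** 2
        for (mx, my, mz), (nx, ny, nz) in zip(model, native)
    )
    return math.sqrt(squared / len(model))
```

The same function could score a four-residue library fragment against a local segment of a chain, which is roughly how a best-matching letter would be chosen for a local structural sequence; that use is an extrapolation of this sketch, not a description of the paper's procedure.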

20.
Nanda V  DeGrado WF 《Proteins》2005,59(3):454-466
In the absence of experimental structural determination, numerous methods are available to indirectly predict or probe the structure of a target molecule. Genetic modification of a protein sequence is a powerful tool for identifying key residues involved in binding reactions or protein stability. Mutagenesis data is usually incorporated into the modeling process either through manual inspection of model compatibility with empirical data, or through the generation of geometric constraints linking sensitive residues to a binding interface. We present an approach derived from statistical studies of lattice models for introducing mutation information directly into the fitness score. The approach takes into account the phenotype of mutation (neutral or disruptive) and calculates the energy for a given structure over an ensemble of sequences. The structure prediction procedure searches for the optimal conformation where neutral sequences either have no impact or improve stability and disruptive sequences reduce stability relative to wild type. We examine three types of sequence ensembles: information from saturation mutagenesis, scanning mutagenesis, and homologous proteins. Incorporating multiple sequences into a statistical ensemble serves to energetically separate the native state and misfolded structures. As a result, the prediction of structure with a poor force field is sufficiently enhanced by mutational information to improve accuracy. Furthermore, by separating misfolded conformations from the target score, the ensemble energy serves to speed up conformational search algorithms such as Monte Carlo-based methods.
