Similar literature
20 similar articles found
1.
Non-canonical base pairs, mostly present in RNA, often play a prominent role in maintaining its structural diversity. Higher-order structures like base triples are also important in defining and stabilizing the tertiary folded structure of RNA. We have developed a new program, BPFIND, to analyze different types of canonical and non-canonical base pairs and base triples involving at least two direct hydrogen bonds formed between polar atoms of the bases or sugar O2' only. We considered 104 possible types of base pairs, of which examples of 87 types are found in the available RNA crystal structures. Analysis indicates that approximately 32.7% of the base pairs in the functional RNA structures are non-canonical, which include different types of GA and GU wobble base pairs apart from a wide range of other base pair possibilities. We further noticed that more than 10.4% of these base pairs are involved in triplet formation, most of which play an important role in maintaining long-range tertiary contacts in the three-dimensional folded structure of RNA. Apart from detection, the program also gives a quantitative estimate of the conformational deformation of detected base pairs in comparison to an ideal planar base pair. This helps us to gain insight into the extent of their structural variations and thus assists in understanding their specific role in structural and functional diversity.

2.
When protein sequences divergently evolve under functional constraints, some individual amino acid replacements that reverse the charge (e.g. Lys to Asp) may be compensated by a replacement at a second position that reverses the charge in the opposite direction (e.g. Glu to Arg). When these side-chains are near in space (proximal), such double replacements might be driven by natural selection, if either is selectively disadvantageous, but both together restore fully the ability of the protein to contribute to fitness (are together "neutral"). Accordingly, many have sought to identify pairs of positions in a protein sequence that suffer compensatory replacements, often as a way to identify positions near in space in the folded structure. A "charge compensatory signal" might manifest itself in two ways. First, proximal charge compensatory replacements may occur more frequently than predicted from the product of the probabilities of individual positions suffering charge reversing replacements independently. Conversely, charge compensatory pairs of changes may be observed to occur more frequently in proximal pairs of sites than in the average pair. Normally, charge compensatory covariation is detected by comparing the sequences of extant proteins at the "leaves" of phylogenetic trees. We show here that the charge compensatory signal is more evident when it is sought by examining individual branches in the tree between reconstructed ancestral sequences at nodes in the tree. Here, we find that the signal is especially strong when the position pairs are in a single secondary structural unit (e.g. alpha helix or beta strand) that brings the side-chains suffering charge compensatory covariation near in space, and may be useful in secondary structure prediction. Also, "node-node" and "node-leaf" compensatory covariation may be useful to identify the better of two equally parsimonious trees, in a way that is independent of the mathematical formalism used to construct the tree itself. Further, compensatory covariation may provide a signal that indicates whether an episode of sequence evolution contains more or less divergence in functional behavior. Compensatory covariation analysis on reconstructed evolutionary trees may become a valuable tool to analyze genome sequences, and use these analyses to extract biomedically useful information from proteome databases.
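
The first form of the charge compensatory signal described in this abstract (joint replacements occurring more often than the product of the individual reversal probabilities) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The charge table (Lys/Arg positive, Asp/Glu negative, everything else neutral), the function names, and the representation of a branch as a tuple `(ancestor_i, descendant_i, ancestor_j, descendant_j)` are all assumptions made for illustration.

```python
# Assumed charge assignment: K/R positive, D/E negative, all other residues neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def charge(residue):
    """Return +1, -1 or 0 for a one-letter amino acid code."""
    return CHARGE.get(residue, 0)


def is_reversal(ancestor, descendant):
    """True if the replacement flips the sign of the side-chain charge."""
    return charge(ancestor) * charge(descendant) == -1


def is_compensatory(anc_i, desc_i, anc_j, desc_j):
    """True if both sites reverse charge, in opposite directions."""
    return (
        is_reversal(anc_i, desc_i)
        and is_reversal(anc_j, desc_j)
        and charge(anc_i) == -charge(anc_j)
    )


def observed_vs_expected(branches):
    """Compare the observed compensatory frequency with the product of the
    per-site reversal frequencies over a list of tree branches."""
    n = len(branches)
    if n == 0:
        raise ValueError("at least one branch is required")
    p_i = sum(is_reversal(a, b) for a, b, _, _ in branches) / n
    p_j = sum(is_reversal(c, d) for _, _, c, d in branches) / n
    observed = sum(is_compensatory(*branch) for branch in branches) / n
    return observed, p_i * p_j
```

With the example from the abstract, `is_compensatory("K", "D", "E", "R")` is true because Lys to Asp and Glu to Arg reverse the charge in opposite directions. An observed frequency well above the expected product would correspond to the signal described above.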

3.
Comparative sequence analysis complements experimental methods for the determination of RNA three-dimensional structure. This approach is based on the concept that different sequences within the same gene family form similar higher-order structures. The large number of rRNA sequences with sufficient variation, along with improved covariation algorithms, is providing us with the opportunity to identify new base triples in 16S rRNA. The three-dimensional conformations for one of our strongest candidates involving U121 (C124:G237) and/or U121 (U125:A236) (Escherichia coli sequence and numbering) are analyzed here with different molecular modeling tools. Molecular modeling shows that U121 interacts with C124 in the U121 (C124:G237) base triple. This arrangement maintains isomorphic structures for the three most frequent sequence motifs (approximately 93% of known bacterial and archaeal sequences), is consistent with chemical reactivity of U121 in E. coli ribosomes, and is geometrically favorable. Further, the restricted set of observed canonical (GU, AU, GC) base-pair types at positions 124:237 and 125:236 is consistent with the fact that the canonical base-pair sets (for both base pairs) that are not observed in nature prevent the formation of the 121 (124:237) base triple. The analysis described here serves as a general scheme for the prediction of specific secondary and tertiary structure base pairing where there is a network of correlated base changes.

4.
Covariation between positions in a multiple sequence alignment may reflect structural, functional, and/or phylogenetic constraints and can be analyzed by a wide variety of methods. We explored several of these methods for their ability to identify covarying positions related to the divergence of a protein family at different hierarchical levels. Specifically, we compared seven methods on a model system composed of three nested sets of G-protein-coupled receptors (GPCRs) in which a divergence event occurred. The covariation methods analyzed were based on: χ² test, mutual information, substitution matrices, and perturbation methods. We first analyzed the dependence of the covariation scores on residue conservation (measured by sequence entropy), and then we analyzed the networking structure of the top pairs. Two methods out of seven, OMES (Observed minus Expected Squared) and ELSC (Explicit Likelihood of Subset Covariation), favored pairs with intermediate entropy and a networking structure with a central residue involved in several high-scoring pairs. This networking structure was observed for the three sequence sets. In each case, the central residue corresponded to a residue known to be crucial for the evolution of the GPCR family and the subfamily specificity. These central residues can be viewed as evolutionary hubs, in relation with an epistasis-based mechanism of functional divergence within a protein family. Proteins 2014; 82:2141–2156. © 2014 Wiley Periodicals, Inc.
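
Two of the quantities named in this abstract, mutual information between alignment columns and sequence entropy of a single column, can be computed with a few lines of Python. The sketch below uses the standard textbook definitions (in bits) and is not taken from the paper. The function names and the representation of a column as a string with one character per sequence are assumptions.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; 0 for a fully conserved column."""
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in Counter(column).values())


def mutual_information(col_i, col_j):
    """Mutual information (bits) between two alignment columns.

    Each column holds one residue per sequence, in the same sequence order.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_i)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), n_ab in freq_ij.items():
        p_ab = n_ab / n
        mi += p_ab * log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi
```

Two perfectly covarying columns such as `"AAKK"` and `"DDEE"` share all of their entropy, whereas `"AAKK"` and `"DEDE"` vary independently and share none. Mutual information is bounded by the column entropies, which is one reason covariation scores depend on residue conservation as the abstract discusses.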

5.
Most of the hairpin, internal and junction loops that appear single-stranded in standard RNA secondary structures form recurrent 3D motifs, where non-Watson–Crick base pairs play a central role. Non-Watson–Crick base pairs also play crucial roles in tertiary contacts in structured RNA molecules. We previously classified RNA base pairs geometrically so as to group together those base pairs that are structurally similar (isosteric) and therefore able to substitute for each other by mutation without disrupting the 3D structure. Here, we introduce a quantitative measure of base pair isostericity, the IsoDiscrepancy Index (IDI), to more accurately determine which base pair substitutions can potentially occur in conserved motifs. We extract and classify base pairs from a reduced-redundancy set of RNA 3D structures from the Protein Data Bank (PDB) and calculate centroids (exemplars) for each base combination and geometric base pair type (family). We use the exemplars and IDI values to update our online Basepair Catalog and the Isostericity Matrices (IM) for each base pair family. From the database of base pairs observed in 3D structures we derive base pair occurrence frequencies for each of the 12 geometric base pair families. In order to improve the statistics from the 3D structures, we also derive base pair occurrence frequencies from rRNA sequence alignments.

6.
During protein evolution, amino acids change due to a combination of functional constraints and genetic drift. Proteins frequently contain pairs of amino acids that appear to change together (covariation). Analysis of covariation from naturally occurring sets of orthologs cannot distinguish between residue pairs retained by functional requirements of the protein and those pairs existing due to changes along a common evolutionary path. Here, we have separated the two types of covariation by independently recombining every naturally occurring amino acid variant within a set of 15 subtilisin orthologs. Our analysis shows that in this family of subtilisin orthologs, almost all possible pairwise combinations of amino acids can coexist. This suggests that amino acid covariation found in the subtilisin orthologs is almost entirely due to common ancestral origin of the changes rather than functional constraints. We conclude that naturally occurring sequence diversity can be used to identify positions that can vary independently without destroying protein function.

7.
RNA secondary structure prediction using free energy minimization is one method to gain an approximation of structure. Constraints generated by enzymatic mapping or chemical modification can improve the accuracy of secondary structure prediction. We report a facile method that identifies single-stranded regions in RNA using short, randomized DNA oligonucleotides and RNase H cleavage. These regions are then used as constraints in secondary structure prediction. This method was used to improve the secondary structure prediction of Escherichia coli 5S rRNA. The lowest free energy structure without constraints has only 27% of the base pairs present in the phylogenetic structure. The addition of constraints from RNase H cleavage improves the prediction to 100% of base pairs. The same method was used to generate secondary structure constraints for yeast tRNA(Phe), which is accurately predicted in the absence of constraints (95%). Although RNase H mapping does not improve secondary structure prediction for this tRNA, it does eliminate all other suboptimal structures predicted within 10% of the lowest free energy structure. The method is advantageous over other single-stranded nucleases since RNase H is functional in physiological conditions. Moreover, it can be used for any RNA to identify accessible binding sites for oligonucleotides or small molecules.
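
The accuracy figures in this abstract (27%, 100%, 95%) are fractions of reference base pairs that also appear in the predicted structure. A minimal sketch of that calculation follows. The function name and the representation of a structure as a collection of `(i, j)` position pairs are assumptions, and the abstract does not say how the authors scored near-miss pairs, so only exact matches count here.

```python
def fraction_recovered(predicted_pairs, reference_pairs):
    """Fraction of reference base pairs that also occur in the prediction.

    Pairs are (i, j) nucleotide positions; (i, j) and (j, i) are treated as equal.
    """
    reference = {tuple(sorted(pair)) for pair in reference_pairs}
    predicted = {tuple(sorted(pair)) for pair in predicted_pairs}
    if not reference:
        raise ValueError("reference structure contains no base pairs")
    return len(reference & predicted) / len(reference)
```

For a hypothetical reference helix `[(1, 10), (2, 9), (3, 8), (4, 7)]` and a prediction `[(10, 1), (2, 9), (5, 6)]`, two of the four reference pairs are recovered, giving 0.5.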

8.
GuhaThakurta D, Draper DE. Biochemistry 1999, 38(12): 3633-3640
Comparative sequence analysis has successfully predicted secondary structure and tertiary interactions in ribosomal and other RNAs. Experiments presented here ask whether the scope of comparative sequence-based predictions can be extended to specific interactions between proteins and RNA, using as a system the well-characterized C-terminal RNA binding domain of ribosomal protein L11 (L11-C76) and its 58 nucleotide binding region in 23S rRNA. The surface of L11-C76 alpha-helix 3 is known to contact RNA; position 69 in this helix is conserved as serine in most organisms but varies to asparagine (all plastids) or glutamine (Mycoplasma). RNA sequence substitutions unique to these groups of organisms occur at base pairs 1062/1076 or 1058/1080, respectively. The possibility that rRNA base pair substitutions compensate for variants in L11 alpha-helix 3 has been tested by measuring binding affinities between sets of protein and RNA sequence variants. Stability of the RNA tertiary structure, as measured by UV melting experiments, was unexpectedly affected by a 1062/1076 base pair substitution; additional mutations were required to restore a stably folded structure to this RNA. The results show that the asparagine variant of L11-C76 residue 69 has been compensated by substitution of a 1062/1076 base pair, and plausibly suggest a direct contact between the amino acid and base pair. For some of the protein and RNA mutations studied, changes in binding affinity probably reflect longer-range adjustments of the protein-RNA contact surface.

9.
Comparative sequence analysis addresses the problem of RNA folding and RNA structural diversity, and is responsible for determining the folding of many RNA molecules, including 5S, 16S, and 23S rRNAs, tRNA, RNase P RNA, and Group I and II introns. Initially this method was utilized to fold these sequences into their secondary structures. More recently, this method has revealed numerous tertiary correlations, elucidating novel RNA structural motifs, several of which have been experimentally tested and verified, substantiating the general application of this approach. As successful as the comparative methods have been in elucidating higher-order structure, it is clear that additional structure constraints remain to be found. Deciphering such constraints requires more sensitive and rigorous protocols, in addition to RNA sequence datasets that contain additional phylogenetic diversity and an overall increase in the number of sequences. Various RNA databases, including the tRNA and rRNA sequence datasets, continue to grow in number as well as diversity. Described herein is the development of more rigorous comparative analysis protocols. Our initial development and applications on different RNA datasets have been very encouraging. Such analyses on tRNA, 16S and 23S rRNA are substantiating previously proposed associations and are now beginning to reveal additional constraints on these molecules. A subset of these involves several positions that correlate simultaneously with one another, implying that units larger than a base pair can be under a phylogenetic constraint.

10.
The success of comparative analysis in resolving RNA secondary structure and numerous tertiary interactions relies on the presence of base covariations. Although the majority of base covariations in aligned sequences is associated with Watson-Crick base pairs, many involve non-canonical or restricted base pair exchanges (e.g. only G:C/A:U), reflecting more specific structural constraints. We have developed a computer program that determines potential base pairing conformations for a given set of paired nucleotides in a sequence alignment. This program (ISOPAIR) assumes that the base pair conformation is maintained through sequence variation without significantly affecting the path of the sugar-phosphate backbone. ISOPAIR identifies such 'isomorphic' structures for any set of input base pair or base triple sequences. The program was applied to base pairs and triples with known structures and sequence exchanges. In several instances, isomorphic structures were correctly identified with ISOPAIR. Thus, ISOPAIR is useful when assessing non-canonical base pair conformations in comparative analysis. ISOPAIR applications are limited to those cases where unusual base pair exchanges indeed reflect a non-canonical conformation.

11.
12.
Tertiary interacting elements are important features of functional RNA molecules, for example, in all small nucleolytic ribozymes. The recent crystal structure of a tertiary stabilized type I hammerhead ribozyme revealed a conventional Watson-Crick base pair in the catalytic core, formed between nucleotides C3 and G8. We show that any Watson-Crick base pair between these positions retains cleavage competence in two type III ribozymes. In the Arabidopsis thaliana sequence, only moderate differences in cleavage rates are observed for the different base pairs, while the peach latent mosaic viroid (PLMVd) ribozyme exhibits a preference for a pyrimidine at position 3 and a purine at position 8. To understand these differences, we created a series of chimeric ribozymes in which we swapped sequence elements that surround the catalytic core. The kinetic characterization of the resulting ribozymes revealed that the tertiary interacting loop sequences of the PLMVd ribozyme are sufficient to induce the preference for Y3-R8 base pairs in the A. thaliana hammerhead ribozyme. In contrast to this, only when the entire stem-loops I and II of the A. thaliana sequences are grafted on the PLMVd ribozyme is any Watson-Crick base pair similarly tolerated. The data provide evidence for a complex interplay of secondary and tertiary structure elements that lead, mediated by long-range effects, to an individual modulation of the local structure in the catalytic core of different hammerhead ribozymes.

13.
The G x U wobble base pair is a fundamental unit of RNA secondary structure that is present in nearly every class of RNA from organisms of all three phylogenetic domains. It has comparable thermodynamic stability to Watson-Crick base pairs and is nearly isomorphic to them. Therefore, it often substitutes for G x C or A x U base pairs. The G x U wobble base pair also has unique chemical, structural, dynamic and ligand-binding properties, which can only be partially mimicked by Watson-Crick base pairs or other mispairs. These features mark sites containing G x U pairs for recognition by proteins and other RNAs and allow the wobble pair to play essential functional roles in a remarkably wide range of biological processes.

14.
A minimum cycle basis of the tertiary structure of a large ribosomal subunit (LSU) X-ray crystal structure was analyzed. Most cycles are small, as they are composed of 3 to 5 nt, and are repeated across the LSU tertiary structure. We used hierarchical clustering to quantify and classify the 4 nt cycles. One class is defined by the GNRA tetraloop motif. The inspection of the GNRA class revealed peculiar instances in sequence. First is the presence of UA, CA, UC and CC base pairs that substitute for the usual sheared GA base pair. Second is the revelation of GNR(X(n))A tetraloops, where X(n) is bulged out of the classical GNRA structure, and of GN/RA formed by the two strands of interior loops. We were able to unambiguously characterize the cycle classes using base stacking and base pairing annotations. The cycles identified correspond to small and cyclic motifs that compose most of the LSU RNA tertiary structure and contribute to its thermodynamic stability. Consequently, the RNA minimum cycles could well be used as the basic elements of RNA tertiary structure prediction methods.

15.
The accurate prediction of the secondary and tertiary structure of an RNA with different folding algorithms is dependent on several factors, including the energy functions. However, an RNA higher-order structure cannot be predicted accurately from its sequence based on a limited set of energy parameters. The inter- and intramolecular forces between this RNA and other small molecules and macromolecules, in addition to other factors in the cell such as pH, ionic strength, and temperature, influence the complex dynamics associated with transition of a single-stranded RNA to its secondary and tertiary structure. Since all of the factors that affect the formation of an RNA's 3D structure cannot be determined experimentally, statistically derived potential energy has been used in the prediction of protein structure. In the current work, we evaluate the statistical free energy of various secondary structure motifs, including base-pair stacks, hairpin loops, and internal loops, using their statistical frequency obtained from the comparative analysis of more than 50,000 RNA sequences stored in the RNA Comparative Analysis Database (rCAD) at the Comparative RNA Web (CRW) Site. Statistical energy was computed from the structural statistics for several datasets. While the statistical energy for a base-pair stack correlates with experimentally derived free energy values, suggesting a Boltzmann-like distribution, variation is observed between different molecules and their location on the phylogenetic tree of life. Our statistical energy values calculated for several structural elements were utilized in the Mfold RNA-folding algorithm. The combined statistical energy values for base-pair stacks, hairpins and internal loop flanks result in a significant improvement in the accuracy of secondary structure prediction; the hairpin flanks contribute the most.
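
A Boltzmann-like relation between motif frequency and energy, as mentioned in this abstract, is commonly written as ΔG = -RT ln(f_observed / f_reference). The sketch below applies that generic inverse-Boltzmann formula. It is not the authors' procedure: the abstract does not state their reference state or temperature, so the `reference_freq` argument, the default of 310.15 K (37 °C), and the function name are assumptions.

```python
from math import log

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def statistical_energy(observed_count, total_count, reference_freq, temperature=310.15):
    """Inverse-Boltzmann statistical energy (kcal/mol) of a structural motif.

    observed_count / total_count is the motif's observed frequency;
    reference_freq is the frequency expected under an assumed reference state.
    """
    if observed_count <= 0 or total_count <= 0:
        raise ValueError("counts must be positive")
    if not 0 < reference_freq <= 1:
        raise ValueError("reference_freq must be in (0, 1]")
    observed_freq = observed_count / total_count
    return -R_KCAL * temperature * log(observed_freq / reference_freq)
```

A motif seen exactly as often as the reference predicts gets an energy of zero, and one seen twice as often gets -RT ln 2, roughly -0.43 kcal/mol at 37 °C. Over-represented motifs therefore score as more favorable.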

16.

Background

The analysis of RNA sequences, once a small niche field for a small collection of scientists whose primary emphasis was the structure and function of a few RNA molecules, has grown most significantly with the realizations that 1) RNA is implicated in many more functions within the cell, and 2) the analysis of ribosomal RNA sequences is revealing more about the microbial ecology within all biological and environmental systems. The accurate and rapid alignment of these RNA sequences is essential to decipher the maximum amount of information from this data.

Methods

Two computer systems that utilize the Gutell lab's RNA Comparative Analysis Database (rCAD) were developed to align sequences to an existing template alignment available at the Gutell lab's Comparative RNA Web (CRW) Site. Multiple dimensions of cross-indexed information are contained within the relational database rCAD, including sequence alignments, the NCBI phylogenetic tree, and comparative secondary structure information for each aligned sequence. The first program, CRWAlign-1, creates a phylogenetic-based sequence profile for each column in the alignment. The second program, CRWAlign-2, creates a profile based on phylogenetic, secondary structure, and sequence information. Both programs utilize their profiles to align new sequences into the template alignment.

Results

The accuracies of the two CRWAlign programs were compared with the best template-based rRNA alignment programs and the best de-novo alignment programs. We have compared our programs with a total of eight alternative alignment methods on different sets of 16S rRNA alignments with sequence percent identities ranging from 50% to 100%. Both CRWAlign programs were superior to these other programs in accuracy and speed.

Conclusions

Both CRWAlign programs can be used to align the very extensive amount of RNA sequencing that is generated due to the rapid next-generation sequencing technology. This latter technology is augmenting the new paradigm that RNA is intimately implicated in a significant number of functions within the cell. In addition, the use of bacterial 16S rRNA sequencing in the identification of the microbiome in many different environmental systems creates a need for rapid and highly accurate alignment of bacterial 16S rRNA sequences.

17.
The natural bases of nucleic acids form a great variety of base pairs with at least two hydrogen bonds between them. They are classified in twelve main families, with the Watson–Crick family being one of them. In a given family, some of the base pairs are isosteric with one another, meaning that the positions and the distances between the C1′ carbon atoms are very similar. The isostericity of Watson–Crick pairs between the complementary bases forms the basis of RNA helices and of the resulting RNA secondary structure. Several defined suites of non-Watson–Crick base pairs assemble into RNA modules that form recurrent, rather regular, building blocks of the tertiary architecture of folded RNAs. RNA modules are intrinsic to RNA architecture and are therefore disconnected from a biological function specifically attached to an RNA sequence. RNA modules occur in all kingdoms of life and in structured RNAs with diverse functions. Because of chemical and geometrical constraints, isostericity between non-Watson–Crick pairs is restricted and this leads to higher sequence conservation in RNA modules with, consequently, greater difficulties in extracting 3D information from sequence analysis. Nucleic acid helices have to be recognised in several biological processes like replication or translational decoding. In polymerases and the ribosomal decoding site, the recognition occurs on the minor groove sides of the helical fragments. With the use of alternative conformations, protonated or tautomeric forms of the bases, some base pairs with Watson–Crick-like geometries can form and be stabilized. Several of these pairs with Watson–Crick-like geometries extend the concept of isostericity beyond the number of isosteric pairs formed between complementary bases. These observations therefore set limits and constraints to geometric selection in molecular recognition of complementary Watson–Crick pairs for fidelity in replication and translation processes.

18.
Analysis of the catalytic activity of identical mutations in the catalytic cores of nHH8, a very active "extended" hammerhead, and HH16, a less active "minimal" hammerhead, reveals that the tertiary Watson-Crick base pair between C3 and G8 seen in the recent structure of the Schistosoma mansoni extended hammerhead can be replaced by other base pairs in both backgrounds. This supports the model that both hammerheads utilize a similar catalytic mechanism but HH16 is slower because it infrequently samples the active conformation. The relative effect of different mutations at positions 3 and 8 also depends on the identity of residue 17 in both nHH8 and HH16. This synergistic effect can best be explained by transient pairing between residues 3 and 17 and 8 and 13, which stabilize an inactive conformation. Thus, mutants of nHH8 and possibly nHH8 itself are also in dynamic equilibrium with an inactive conformation that may resemble the X-ray structure of a minimal hammerhead. Therefore, both minimal and extended hammerhead structures must be considered to fully understand hammerhead catalysis.

19.
The trans Watson-Crick/Watson-Crick family of base pairs represents a geometric class that plays important structural and possible functional roles in the ribosome, tRNA, and other functional RNA molecules. They nucleate base triplets and quartets, participate as loop-closing terminal base pairs in hairpin motifs and are also responsible for several tertiary interactions that enable sequentially distant regions to interact with each other in RNA molecules. Eleven representative examples spanning nine systems belonging to this geometric family of RNA base pairs, having widely different occurrence statistics in the PDB database, were studied at the HF/6-31G(d,p) level using Morokuma decomposition, Atoms in Molecules as well as Natural Bond Orbital methods in the optimized gas phase geometries and in their crystal structure geometries, respectively. The BSSE and deformation energy corrected interaction energy values for the optimized geometries are compared with the corresponding values in the crystal geometries of the base pairs. For non-protonated base pairs in their optimized geometry, these values ranged from -8.19 kcal/mol to -21.84 kcal/mol and compared favorably with those of canonical base pairs. The interaction energies of these base pairs, in their respective crystal geometries, were, however, lower to varying extents and in one case, that of A:A W:W trans, the value was actually found to be positive. The variation in RMSD between the two geometries was also large and ranged from 0.32 to 2.19 Å. Our analysis shows that the hydrogen bonding characteristics and interaction energies obtained correlated with the nature and type of hydrogen bonds between base pairs, but the occurrence frequencies, interaction energies, and geometric variabilities showed no apparent correlation with one another. Instead, the nature of local interaction energy hyperspace of different base pairs as inferred from the degree of their respective geometric variability could be correlated with the identities of free and bound hydrogen bond donor/acceptor groups present in interacting bases in conjunction with their tertiary and neighboring group interaction potentials in the global context. It also suggests that the concept of isostericity alone may not always determine covariation potentials for base pairs, particularly for those which may be important for RNA dynamics. These considerations are more important than the absolute values of the interaction energies in their respective optimized geometries in rationalizing their occurrences in functional RNAs. They highlight the importance of revising some of the existing DNA-based structure analysis approaches and may have significant implications for RNA structure and dynamics, especially in the context of structure prediction algorithms.

20.
The origin of replication (oriR) involved in the initiation of (-) strand enterovirus RNA synthesis is a quasi-globular multi-domain RNA structure which is maintained by a tertiary kissing interaction. The kissing interaction is formed by base pairing of complementary sequences within the predominant hairpin-loop structures of the enteroviral 3' untranslated region. In this report, we have fully characterised the kissing interaction. Site-directed mutations which affected the different base pairs involved in the kissing interaction were generated in an infectious coxsackie B3 virus cDNA clone. The kissing interaction appeared to consist of 6 bp. Distortion of the interaction by mispairing of each of the base pairs involved in this higher order RNA structure resulted in either temperature sensitive or lethal phenotypes. The nucleotide constitution of the base which gaps the major groove of the kissing domain was not relevant for virus growth. The reciprocal exchange of the complete sequence involved in the kissing resulted in a mutant virus with wild type virus growth characteristics arguing that the base pair constitution is of less importance for the initiation of (-) strand RNA synthesis than the existence of the tertiary structure itself.
