Similar articles
20 similar articles found
1.
MOTIVATION: Modeling RNA pseudoknotted structures remains challenging. Methods have previously been developed to model RNA stem-loops successfully using stochastic context-free grammars (SCFG) adapted from computational linguistics; however, the additional complexity of pseudoknots has made modeling them more difficult. Formally, a context-sensitive grammar is required, which would impose a large increase in complexity. RESULTS: We introduce a new grammar modeling approach for RNA pseudoknotted structures based on parallel communicating grammar systems (PCGS). Our new approach can specify pseudoknotted structures, while avoiding context-sensitive rules, using a single CFG synchronized with a number of regular grammars. Technically, the stochastic version of the grammar model can be as simple as an SCFG. As with SCFG, the new approach permits automatic generation of a single-RNA structure prediction algorithm for each specified pseudoknotted structure model. This approach also makes it possible to develop full probabilistic models of pseudoknotted structures, allowing the prediction of consensus structures by comparative analysis and structural homology recognition in database searches.
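To make the stem-loop versus pseudoknot distinction concrete, here is a minimal Python sketch of a toy stochastic context-free grammar over dot-bracket strings. The grammar `S -> ( S ) S | . S | eps` and the rule probabilities `p_pair`, `p_unpaired`, `p_end` are illustrative assumptions, not the PCGS model described above. Because this grammar is unambiguous, every nested structure has exactly one derivation, whose probability is the product of the rules used; a crossing pattern written with a second bracket type has no derivation at all.

```python
from typing import Optional

def structure_probability(db: str, p_pair: float, p_unpaired: float,
                          p_end: float) -> Optional[float]:
    """Probability of the unique derivation of a dot-bracket string under the
    toy grammar S -> ( S ) S | . S | eps, or None if it cannot be derived."""
    pos = 0
    prob = 1.0

    def parse_s() -> bool:
        nonlocal pos, prob
        while True:
            if pos < len(db) and db[pos] == ".":
                prob *= p_unpaired          # rule S -> . S
                pos += 1
            elif pos < len(db) and db[pos] == "(":
                prob *= p_pair              # rule S -> ( S ) S
                pos += 1
                if not parse_s():           # derive the inner S
                    return False
                if pos >= len(db) or db[pos] != ")":
                    return False            # no matching ')'
                pos += 1                    # the trailing S continues this loop
            else:
                prob *= p_end               # rule S -> eps
                return True

    if not parse_s() or pos != len(db):
        return None                         # not generated by the grammar
    return prob
```

For `"((..))"` the result is `p_pair**2 * p_unpaired**2 * p_end**3`, while the pseudoknot `"((..[[..))..]]"` returns `None`; capturing such crossings is exactly what requires a richer formalism.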

2.
Accurate prediction of RNA pseudoknotted secondary structures from the base sequence is a challenging computational problem. Since prediction algorithms rely on thermodynamic energy models to identify low-energy structures, prediction accuracy depends in large part on the quality of the free energy change parameters. In this work, we use our earlier constraint generation and Boltzmann likelihood parameter estimation methods to obtain new energy parameters for two energy models for secondary structures with pseudoknots, namely the Dirks–Pierce (DP) and the Cao–Chen (CC) models. To train our parameters, and also to test their accuracy, we create a large data set of both pseudoknotted and pseudoknot-free secondary structures. In addition to structural data, our training data set also includes thermodynamic data, for which experimentally determined free energy changes are available for sequences and their reference structures. When incorporated into the HotKnots prediction algorithm, our new parameters result in significantly improved secondary structure prediction on our test data set. Specifically, the prediction accuracy when using our new parameters improves from 68% to 79% for the DP model, and from 70% to 77% for the CC model.
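Boltzmann likelihood estimation rests on the probability that a sequence adopts its reference structure, P(s) = exp(-ΔG(s)/RT) / Z. The sketch below only illustrates that quantity, assuming a small, explicitly enumerated list of candidate free energies in kcal/mol at 37 °C; real implementations compute Z by dynamic programming over all structures and then optimize the parameters. The function names are hypothetical and this is not the authors' code.

```python
import math
from typing import List, Sequence

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def boltzmann_probabilities(energies: Sequence[float],
                            temp_c: float = 37.0) -> List[float]:
    """Boltzmann probability of each structure in an enumerated ensemble."""
    rt = R_KCAL * (temp_c + 273.15)
    e_min = min(energies)
    # Shift by the minimum energy so exp() cannot overflow; the shift cancels.
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def reference_log_likelihood(energies: Sequence[float], ref_index: int,
                             temp_c: float = 37.0) -> float:
    """log P(reference) = -dG_ref/RT - log Z, using the same shift for stability."""
    rt = R_KCAL * (temp_c + 273.15)
    e_min = min(energies)
    log_z = math.log(sum(math.exp(-(e - e_min) / rt) for e in energies))
    return -(energies[ref_index] - e_min) / rt - log_z
```

Parameter estimation would adjust the energy parameters (and hence the ΔG values) to increase the summed log-likelihood of the reference structures in the training set.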

3.
Accurate free energy estimation is essential for RNA structure prediction. The widely used Turner's energy model works well for nested structures. For pseudoknotted RNAs, however, there is no effective rule for estimating loop entropy and free energy. In this work we present a new free energy estimation method, termed the pseudoknot predictor in three-dimensional space (pk3D), which goes beyond Turner's model. Our approach treats nested and pseudoknotted structures alike in one unifying physical framework, regardless of how complex the RNA structures are. We first test the ability of pk3D to select native structures from a large number of decoys for a set of 43 pseudoknotted RNA molecules, with lengths ranging from 23 to 113 nucleotides. We find that pk3D performs slightly better than the Dirks and Pierce extension of Turner's rule. We then test pk3D for blind secondary structure prediction, and find that pk3D gives the best sensitivity and comparable positive predictive value (related to specificity) in predicting pseudoknotted RNA secondary structures, when compared with other methods. A unique strength of pk3D is that it also generates the spatial arrangement of structural elements of the RNA molecule. Comparison of three-dimensional structures predicted by pk3D with the native structures determined by nuclear magnetic resonance or X-ray experiments shows that the predicted spatial arrangement of stems and loops is often similar to that found in the native structure. These close-to-native structures can be used as starting points for further refinement to derive accurate three-dimensional structures of RNA molecules, including those with pseudoknots.
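Sensitivity and positive predictive value (PPV) for secondary structure are usually computed over base pairs. A minimal sketch, assuming an exact-match criterion (some benchmarks also credit pairs shifted by one position, which this sketch does not); the function names are illustrative only.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]

def _normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    # Store each base pair as (smaller index, larger index) so (i, j) == (j, i).
    return {(min(i, j), max(i, j)) for i, j in pairs}

def sensitivity_ppv(predicted: Iterable[Pair],
                    reference: Iterable[Pair]) -> Tuple[float, float]:
    """Sensitivity = TP / |reference pairs|; PPV = TP / |predicted pairs|."""
    pred, ref = _normalize(predicted), _normalize(reference)
    true_pos = len(pred & ref)
    sensitivity = true_pos / len(ref) if ref else 0.0
    ppv = true_pos / len(pred) if pred else 0.0
    return sensitivity, ppv
```

A predictor that proposes many pairs tends to raise sensitivity at the cost of PPV, which is why the two are reported together.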

4.
The core protein of hepatitis C virus (HCV) is a structural protein with potent RNA chaperoning activities mediated by its hydrophilic N-terminal domain D1, which is thought to play a key role in HCV replication. To further characterize the chaperoning properties of core, we studied the interactions between core D1 and the conserved HCV 3'X genomic region required for genome replication. To this end, we monitored the real-time annealing kinetics of the native and mutated, fluorescently labelled 16-nt palindromic sequence (DLS) and 27-nt Stem Loop II (SL2) from 3'X with their respective complementary sequences. Core D1 and peptides consisting of the core basic domains were found to promote both annealing reactions and to partly switch the loop-loop interaction pathway, which predominates in the absence of peptide, towards a pathway involving the stem termini. The chaperone properties of the core D1 peptides were found to be mediated through interaction of their basic clusters with the oligonucleotide phosphate groups, in line with the absence of a high-affinity site for core on HCV genomic RNA. The ability of core to facilitate interconversion between different RNA structures may explain how this protein regulates RNA structural transitions during HCV replication.

5.
Structural genomics projects require strategies for rapidly recognizing protein sequences appropriate for routine structure determination. For large proteins, this strategy includes the dissection of proteins into structural domains that form stable native structures. However, protein dissection essentially remains an empirical and often a tedious process. Here, we describe a simple strategy for rapidly identifying structural domains and assessing their structures. This approach combines the computational prediction of sequence regions corresponding to putative domains with an experimental assessment of their structures and stabilities by NMR and biochemical methods. We tested this approach with nine putative domains predicted from a set of 108 Thermus thermophilus HB8 sequences using PASS, a domain prediction program we previously reported. To facilitate the experimental assessment of the domain structures, we developed a generic 6-hour His-tag-based purification protocol, which enables the sample quality evaluation of a putative structural domain in a single day. As a result, we observed that half of the predicted structural domains were indeed natively folded, as judged by their HSQC spectra. Furthermore, two of the natively folded domains were novel, without related sequences classified in the Pfam and SMART databases, which is a significant result with regard to the ability of structural genomics projects to uniformly cover the protein fold space.

6.
7.
Alternative RNA splicing in multicellular organisms is regulated by a large group of proteins of mainly unknown origin. To predict the functions of these proteins, classification of their domains at the sequence and structural level is necessary. We have focused on four groups of splicing regulators that show increasing diversity among metazoa: the heterogeneous nuclear ribonucleoprotein (hnRNP), serine/arginine (SR), embryonic lethal, abnormal vision (ELAV)-like, and CUG-BP and ETR-like factor (CELF) proteins. Sequence and phylogenetic analyses were used to obtain a broader understanding of their evolutionary relationships. Surprisingly, when we characterised sequence similarities across full-length sequences and conserved domains of ten metazoan species, we found that some hnRNPs were more closely related to SR, ELAV-like and CELF proteins than to other hnRNPs. Phylogenetic analyses and the distribution of the RRM domains suggest that these proteins diversified before the last common ancestor of the metazoans studied here, through domain acquisition and duplication that created genes of mixed evolutionary origin. We propose that these proteins were derived independently rather than through the expansion of a single protein family. Our results highlight inconsistencies in the current classification system for these regulators, which does not adequately reflect their evolutionary relationships, and suggest that a domain-based classification scheme may have more utility.

8.
Recently, several domain-based computational models for predicting protein-protein interactions (PPIs) have been proposed. Conventional methods usually infer domain or domain combination (DC) interactions from known sets of interacting proteins and then use this information to predict PPIs. However, most of these models provide little detail on which domain pair (single-domain interaction) or DC pair (multidomain interaction) actually interacts in a predicted protein interaction. A more comprehensive and concrete computational model for PPI prediction is therefore needed. We developed a computational model that predicts PPIs using information on intraprotein domain cohesion and interprotein DC coupling interactions. A method for identifying the primary interacting DC pair was also incorporated into the model in order to infer the actual participants in a predicted interaction. Our method improved PPI prediction accuracy, and primary interacting DC pair identification was valid specifically for predicting multidomain protein interactions. In this paper, we demonstrate that 1) intraprotein domain cohesion is meaningful in improving the accuracy of domain-based PPI prediction, 2) a prediction model incorporating intradomain cohesion enables us to identify the primary interacting DC pair, and 3) a hybrid approach using intra/interdomain interaction information can lead to more accurate prediction.

9.
Accurate prediction of pseudoknotted nucleic acid secondary structure is an important computational challenge. Prediction algorithms based on dynamic programming aim to find a structure with minimum free energy according to some thermodynamic ("sum of loop energies") model that is implicit in the recurrences of the algorithm. However, a clear definition of what exactly the loops in pseudoknotted structures are, and of their associated energies, has been lacking. In this work, we present a complete classification of loops in pseudoknotted nucleic acid secondary structures, and describe the Rivas and Eddy and other energy models as sum-of-loops energy models. We give a linear time algorithm for parsing a pseudoknotted secondary structure into its component loops. We give two applications of our parsing algorithm. The first is a linear time algorithm to calculate the free energy of a pseudoknotted secondary structure. This is useful for heuristic prediction algorithms, which are widely used since (pseudoknotted) RNA secondary structure prediction is NP-hard. The second application is a linear time algorithm to test the generality of the dynamic programming algorithm of Akutsu for secondary structure prediction. Together with previous work, we use this algorithm to compare the generality of state-of-the-art algorithms on real biological structures.
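A much simpler relative of the parsing step is recovering the base pairs of a pseudoknotted structure and detecting whether any pairs cross. The sketch below assumes the structure is given in extended dot-bracket notation (one bracket type per crossing layer), which the abstract does not specify; it does not classify loops or compute energies, and both functions are illustrative, each running in linear time.

```python
from typing import Dict, List, Tuple

OPEN_TO_CLOSE: Dict[str, str] = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSE_TO_OPEN: Dict[str, str] = {c: o for o, c in OPEN_TO_CLOSE.items()}

def parse_dot_bracket(db: str) -> List[Tuple[int, int]]:
    """Base pairs (i, j) with i < j, ordered by i. One stack per bracket type,
    so crossing pairs written with different bracket types are recovered."""
    stacks: Dict[str, List[int]] = {o: [] for o in OPEN_TO_CLOSE}
    partner = [-1] * len(db)
    for k, ch in enumerate(db):
        if ch in OPEN_TO_CLOSE:
            stacks[ch].append(k)
        elif ch in CLOSE_TO_OPEN:
            stack = stacks[CLOSE_TO_OPEN[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {k}")
            i = stack.pop()
            partner[i], partner[k] = k, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {k}")
    if any(stacks.values()):
        raise ValueError("unclosed bracket")
    # Scanning the partner array yields pairs in order without an O(n log n) sort.
    return [(i, j) for i, j in enumerate(partner) if j > i]

def is_pseudoknot_free(pairs: List[Tuple[int, int]], n: int) -> bool:
    """True if no two pairs cross (i < k < j < l); one stack suffices."""
    partner = [-1] * n
    for i, j in pairs:
        partner[i], partner[j] = j, i
    stack: List[int] = []
    for k in range(n):
        p = partner[k]
        if p > k:
            stack.append(k)              # opening position of a pair
        elif p != -1:
            if not stack or stack[-1] != p:
                return False             # closes a pair that is not innermost
            stack.pop()
    return True
```

For example, `"((..[[..))..]]"` parses to `[(0, 9), (1, 8), (4, 13), (5, 12)]`, and `is_pseudoknot_free` reports `False` because the `[]` pairs cross the `()` pairs.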

10.
MOTIVATION: Protein interactions are of biological interest because they orchestrate a number of cellular processes such as metabolic pathways and immunological recognition. Domains are the building blocks of proteins; therefore, proteins are assumed to interact as a result of their interacting domains. Many domain-based models for protein interaction prediction have been developed, and preliminary results have demonstrated their feasibility. Most of the existing domain-based methods, however, consider only single-domain pairs (one domain from each protein) and assume independence between domain-domain interactions. RESULTS: In this paper, we introduce a domain-based random forest of decision trees to infer protein interactions. Our proposed method is capable of exploring all possible domain interactions and making predictions based on all the protein domains. Experimental results on a Saccharomyces cerevisiae dataset demonstrate that our approach can predict protein-protein interactions with higher sensitivity (79.78%) and specificity (64.38%) than the maximum likelihood approach. Furthermore, our model can be used to infer interactions not only for single-domain pairs but also for multiple domain pairs.
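One way to let a learner see every possible domain interaction is to encode each protein pair as a binary vector over a vocabulary of unordered domain pairs. The sketch below shows such an encoding plus the sensitivity and specificity measures quoted above; the encoding and the domain labels (`"D1"`, `"D2"`, ...) are assumptions for illustration, not the paper's feature design. Vectors like these could be passed to a random-forest implementation such as scikit-learn's `RandomForestClassifier`, which is not reproduced here.

```python
from itertools import product
from typing import Iterable, List, Sequence, Tuple

DomainPair = Tuple[str, str]

def domain_pair_features(domains_a: Iterable[str], domains_b: Iterable[str],
                         vocabulary: Sequence[DomainPair]) -> List[int]:
    """1 for each vocabulary entry (unordered domain pair) present between
    protein A and protein B, else 0."""
    present = {tuple(sorted(p)) for p in product(set(domains_a), set(domains_b))}
    return [1 if tuple(sorted(v)) in present else 0 for v in vocabulary]

def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec
```

Because every domain pair gets its own feature, a tree can split on combinations of domain pairs rather than treating each pair as independent evidence.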

11.
Sumedha, Martin OC, Wagner A. Bio Systems, 2007, 90(2): 475-485
RNA secondary structure is an important computational model for understanding how genetic variation maps into phenotypic (structural) variation. Evolutionary innovation in RNA structures is facilitated by neutral networks: large connected sets of RNA sequences that fold into the same structure. Our work extends and deepens previous studies on neutral networks. First, we show that even the 1-mutant neighborhood of a given sequence (genotype) G0 with structure (phenotype) P contains many structural variants that are not close to P. This holds for biological and generic RNA sequences alike. Second, we analyze the relation between new structures in the 1-neighborhoods of genotypes Gk that are only a moderate Hamming distance k away from G0, and the structure of G0 itself, both for biological and for generic RNA structures. Third, we analyze the relation between the mutational robustness of a sequence and the distances of structural variants near this sequence. Our findings underscore the role of neutral networks in evolutionary innovation, and the role that high robustness can play in diminishing the potential for such innovation.
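The 1-mutant neighborhood and Hamming distance used above are easy to state in code. A minimal sketch, assuming an RNA alphabet `ACGU` and nested dot-bracket structures; folding each neighbor would need an external tool (for example ViennaRNA's RNAfold) and is not shown. Base-pair distance is included as one common way to measure how far a variant structure lies from P; the abstract does not say which structural distance the authors used.

```python
from typing import Iterator, List, Set, Tuple

BASES = "ACGU"

def one_mutant_neighbors(seq: str) -> Iterator[str]:
    """Yield every sequence exactly one point mutation away (3 per position)."""
    for i, base in enumerate(seq):
        for alt in BASES:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def _pairs(db: str) -> Set[Tuple[int, int]]:
    # Assumes a balanced, nested dot-bracket string.
    stack: List[int] = []
    pairs: Set[Tuple[int, int]] = set()
    for k, ch in enumerate(db):
        if ch == "(":
            stack.append(k)
        elif ch == ")":
            pairs.add((stack.pop(), k))
    return pairs

def base_pair_distance(s1: str, s2: str) -> int:
    """Number of base pairs present in exactly one of the two structures."""
    return len(_pairs(s1) ^ _pairs(s2))
```

A sequence of length L has exactly 3L one-mutant neighbors, so exhaustively folding the neighborhood scales linearly with sequence length.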

12.
PKR, an interferon-induced double-stranded RNA-activated serine-threonine kinase, is a component of signal transduction pathways mediating cell growth control and responses to stress and viral infection. Analysis of separate PKR functional domains by NMR and X-ray crystallography has revealed details of the PKR RNA-binding domains and kinase domain, respectively. Here, we report the structural characteristics, calculated from biochemical and neutron scattering data, of a native PKR fraction with a high level of autophosphorylation and constitutive kinase activity. The experiments reveal association of the protein monomer into dimers and tetramers in the absence of double-stranded RNA or other activators. Low-resolution structures of the association states were obtained from the large angle neutron scattering data and reveal the relative orientation of all protein domains in the activated kinase dimer. Low-resolution structures were also obtained for a PKR tetramer-monoclonal antibody complex. Taken together, this information leads to a new model for the structure of the functioning unit of the enzyme, highlights the flexibility of PKR and sheds light on the mechanism of PKR activation. The results of this study emphasize the usefulness of low-resolution structural studies in solution on large, flexible, multidomain proteins.

13.
Phospholipases C (PLCs) reversibly associate with membranes to hydrolyze phosphatidylinositol 4,5-bisphosphate (PI(4,5)P2) and comprise four main classes: beta, gamma, delta, and epsilon. Most eukaryotic PLCs contain a single N-terminal pleckstrin homology (PH) domain, which is thought to play an important role in membrane targeting. The structure of a single PLC PH domain, that from PLCdelta1, has been determined; this PH domain binds PI(4,5)P2 with high affinity and stereospecificity and has served as a paradigm for PH domain functionality. However, experimental studies demonstrate that PH domains from different PLC classes exhibit diverse modes of membrane interaction, reflecting the dissimilarity in their amino acid sequences. To elucidate the structural basis for their differential membrane-binding specificities, we modeled the three-dimensional structures of all mammalian PLC PH domains using bioinformatic tools and calculated their biophysical properties using continuum electrostatic approaches. Our computational analysis accounts for a large body of experimental data, provides predictions for those PH domains with unknown functions, and indicates functional roles for regions other than the canonical lipid-binding site identified in the PLCdelta1-PH structure. In particular, our calculations predict that (1) members of each of the four PLC classes exhibit electrostatic profiles strikingly different from those ordinarily observed for PH domains in general, (2) nonspecific electrostatic interactions contribute to the membrane localization of PLCdelta-, PLCgamma-, and PLCbeta-PH domains, and (3) phosphorylation regulates the interaction of PLCbeta-PH with its effectors through electrostatic repulsion. Our molecular models for PH domains from all of the PLC classes clearly demonstrate how a common structural fold can serve as a scaffold for a wide range of surface features and biophysical properties that support distinctive functional roles.

14.
Poly(C)-binding proteins (PCBPs) are KH (hnRNP K homology) domain-containing proteins that recognize poly(C) DNA and RNA sequences in mammalian cells. Binding poly(C) sequences via the KH domains is critical for PCBP functions. To reveal the mechanisms of KH domain-DNA/RNA recognition and its functional importance, we have determined the crystal structures of the PCBP2 KH1 domain in complex with a 12-nucleotide DNA corresponding to two repeats of the human C-rich strand telomeric DNA, and with its RNA equivalent. The crystal structures reveal molecular details not only of the KH1-DNA/RNA interaction but also of a protein-protein interaction between two KH1 domains. NMR studies on a protein construct containing two KH domains (KH1 + KH2) of PCBP2 indicate that KH1 interacts with KH2 in a way similar to the KH1-KH1 interaction. The crystal structures and NMR data suggest possible ways in which binding to certain nucleic acid targets containing tandem poly(C) motifs may induce structural rearrangement of the KH domains in PCBPs; such rearrangement may be crucial for some PCBP functions.

15.
16.
Algorithms for prediction of RNA secondary structure (the set of base pairs that form when an RNA molecule folds) are valuable to biologists who aim to understand RNA structure and function. Improving the accuracy and efficiency of prediction methods is an ongoing challenge, particularly for pseudoknotted secondary structures, in which base pairs overlap. This challenge is biologically important, since pseudoknotted structures play essential roles in the functions of many RNA molecules, such as splicing and ribosomal frameshifting. State-of-the-art methods, which are based on free energy minimization, have high run-time complexity (typically Θ(n^5) or worse) and can handle (minimize over) only limited types of pseudoknotted structures. We propose a new approach for prediction of pseudoknotted structures, motivated by the hypothesis that RNA structures fold hierarchically, with pseudoknot-free (non-overlapping) base pairs forming first, and pseudoknots forming later so as to minimize energy relative to the folded pseudoknot-free structure. Our HFold algorithm uses two-phase energy minimization to predict hierarchically formed secondary structures in O(n^3) time, matching the complexity of the best algorithms for pseudoknot-free secondary structure prediction via energy minimization. Our algorithm can handle a wide range of biological structures, including kissing hairpins and nested kissing hairpins, which have previously required Θ(n^6) time.
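The O(n^3) bound shared by pseudoknot-free prediction algorithms comes from dynamic programming over all subintervals [i, j] with an inner split point k. The simplest member of that family is Nussinov base-pair maximization, sketched below as a stand-in: it counts canonical pairs instead of minimizing free energy, and it is not HFold. The minimum hairpin loop of 3 unpaired bases and the GU wobble pairs are assumptions of the sketch.

```python
from typing import List, Tuple

CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq: str, min_loop: int = 3) -> Tuple[int, str]:
    """Maximum number of nested base pairs and one optimal dot-bracket structure.
    O(n^3) time, O(n^2) space; pseudoknots are excluded by construction."""
    n = len(seq)
    dp: List[List[int]] = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                      # case 1: j is unpaired
            for k in range(i, j - min_loop):         # case 2: j pairs with k
                if (seq[k], seq[j]) in CANONICAL:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best

    structure = ["."] * n

    def traceback(i: int, j: int) -> None:
        if j - i <= min_loop:
            return                                   # interval too short to pair
        if dp[i][j] == dp[i][j - 1]:
            traceback(i, j - 1)
            return
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in CANONICAL:
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + dp[k + 1][j - 1] + 1:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        traceback(i, k - 1)
                    traceback(k + 1, j - 1)
                    return

    if n:
        traceback(0, n - 1)
    return (dp[0][n - 1] if n else 0), "".join(structure)
```

For example, `nussinov("GGGAAAUCC")` returns `(3, "(((...)))")`. Because every pair (k, j) splits the interval into independent left and inner parts, crossing pairs can never be produced, which is why pseudoknots push the complexity higher.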

17.
18.
Sequence and structural analysis of cadherins allows us to find sequence determinants: a few positions in sequences whose residues are characteristic of and specific to the structures of a given family. Comparison of the five extracellular domains of classic cadherins showed that they share the same sequence determinants despite only nonsignificant sequence similarity between the N-terminal domain and the other extracellular domains. This allowed us to predict secondary structures and propose three-dimensional structures for these domains, which had not previously been structurally analyzed. We suggest a new method of assigning a sequence to its proper protein family: analysis of sequence determinants. The main advantage of this method is that it does not require knowing all or almost all residues in a sequence, as other traditional classification tools such as BLAST, FASTA, and HMM do. Using only the key positions, that is, the residues that serve as sequence determinants, we unequivocally selected all members of the classic cadherin family from among 80,000 examined proteins. In addition, we proposed a model for the secondary structure of the cytoplasmic domain of cadherins based on the principal relations between sequence and secondary structure multialignments. The secondary structure patterns of this domain can serve as distinguishing characteristics of cadherins.
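Classification by sequence determinants only inspects a handful of key positions. The sketch below illustrates the idea under explicit assumptions: sequences are already aligned to a common numbering, positions are 0-based, and the determinant positions and residues are hypothetical placeholders, not the cadherin determinants reported above. The `min_fraction` threshold is likewise an illustrative parameter.

```python
from typing import Dict, List, Set

def determinant_score(seq: str, determinants: Dict[int, Set[str]]) -> float:
    """Fraction of key positions whose residue is one of the allowed residues.
    A gap character or a position beyond the sequence counts as a mismatch."""
    hits = sum(1 for pos, allowed in determinants.items()
               if pos < len(seq) and seq[pos] in allowed)
    return hits / len(determinants)

def select_family(sequences: Dict[str, str], determinants: Dict[int, Set[str]],
                  min_fraction: float = 1.0) -> List[str]:
    """Names of sequences whose determinant score reaches min_fraction."""
    return [name for name, seq in sequences.items()
            if determinant_score(seq, determinants) >= min_fraction]
```

Only the key positions are read, so the rest of each sequence may be unknown or poorly conserved, which mirrors the advantage claimed over whole-sequence tools.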

19.
20.
We propose an algorithm and a program for predicting RNA secondary structure with pseudoknot formation. The algorithm simulates stepwise folding by generating random structures with a Monte Carlo method and then selecting helices for the final structure on the basis of both their probability of occurrence in a random structure and free energy parameters. Versions of the program were tested on ribosomal RNA structures and on RNAs with pseudoknots supported by experimental data. Simulating folding during RNA synthesis is shown to improve the results. Allowing pseudoknot formation makes it possible to predict pseudoknotted structures and improves the prediction of long-range interactions. The program is fairly fast and can predict structures of long RNAs on an ordinary personal computer without requiring large amounts of memory.
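The Monte Carlo step, estimating how often each helix occurs in random structures, can be sketched as follows. Assumptions, all illustrative: candidate helices are maximal stacks of at least 3 canonical pairs, helices are added in uniformly random order, and a helix is skipped only if it reuses an occupied base, so crossing helices (pseudoknots) are permitted. The sketch ignores free energies and co-transcriptional folding, and it omits the final selection step that combines frequencies with energy parameters.

```python
import random
from collections import Counter
from typing import Dict, List, Set, Tuple

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
Helix = Tuple[int, int, int]  # (i, j, length): pairs (i+t, j-t) for t < length

def candidate_helices(seq: str, min_len: int = 3, min_loop: int = 3) -> List[Helix]:
    """Maximal stacks of canonical pairs with at least min_len pairs."""
    n = len(seq)
    helices: List[Helix] = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if (seq[i], seq[j]) not in PAIRS:
                continue
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue  # (i, j) lies inside a stack that starts further out
            length = 0
            while (i + length < j - length - min_loop
                   and (seq[i + length], seq[j - length]) in PAIRS):
                length += 1
            if length >= min_len:
                helices.append((i, j, length))
    return helices

def helix_bases(h: Helix) -> Set[int]:
    i, j, length = h
    return set(range(i, i + length)) | set(range(j - length + 1, j + 1))

def sample_structure(helices: List[Helix], rng: random.Random) -> List[Helix]:
    """Add helices in random order, skipping any that reuse an occupied base."""
    order = helices[:]
    rng.shuffle(order)
    used: Set[int] = set()
    chosen: List[Helix] = []
    for h in order:
        bases = helix_bases(h)
        if used.isdisjoint(bases):
            used |= bases
            chosen.append(h)
    return chosen

def helix_frequencies(seq: str, n_samples: int = 1000,
                      seed: int = 0) -> Dict[Helix, float]:
    """Fraction of random structures in which each candidate helix appears."""
    helices = candidate_helices(seq)
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_samples):
        counts.update(sample_structure(helices, rng))
    return {h: counts[h] / n_samples for h in helices}
```

A fixed `seed` keeps runs reproducible; helices that rarely conflict with others end up with high occurrence frequencies, which is the signal a selection step could then weigh against free energy.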
