Similar articles
1.

Background  

Pairwise stochastic context-free grammars (Pair SCFGs) are powerful tools for evolutionary analysis of RNA, including simultaneous RNA sequence alignment and secondary structure prediction, but the associated algorithms are intensive in both CPU and memory usage. The same problem is faced by other RNA alignment-and-folding algorithms based on Sankoff's 1985 algorithm. It is therefore desirable to constrain such algorithms, by pre-processing the sequences and using this first pass to limit the range of structures and/or alignments that can be considered.

2.
The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases.

3.
A novel method is presented for joint prediction of alignment and common secondary structures of two RNA sequences. The joint consideration of common secondary structures and alignment is accomplished by structural alignment over a search space defined by the newly introduced motif called matched helical regions. The matched helical region formulation generalizes previously employed constraints for structural alignment and thereby better accommodates the structural variability within RNA families. A probabilistic model based on pseudo free energies obtained from precomputed base pairing and alignment probabilities is utilized for scoring structural alignments. Maximum a posteriori (MAP) common secondary structures, sequence alignment and joint posterior probabilities of base pairing are obtained from the model via a dynamic programming algorithm called PARTS. The advantage of the more general structural alignment of PARTS is seen in secondary structure predictions for the RNase P family. For this family, the PARTS MAP predictions of secondary structures and alignment perform significantly better than prior methods that utilize a more restrictive structural alignment model. For the tRNA and 5S rRNA families, the richer structural alignment model of PARTS does not offer a benefit and the method therefore performs comparably with existing alternatives. For all RNA families studied, the posterior probability estimates obtained from PARTS offer an improvement over posterior probability estimates from a single sequence prediction. When considering the base pairings predicted over a threshold value of confidence, the combination of sensitivity and positive predictive value is better for PARTS than for the single sequence prediction. PARTS source code is available for download under the GNU public license at http://rna.urmc.rochester.edu.

4.
We have developed a phylogeny-aware progressive alignment method that recognizes insertions and deletions as distinct evolutionary events and thus avoids systematic errors created by traditional alignment methods. We now extend this method to simultaneously model regional heterogeneity and evolution. This novel method can be flexibly adapted to alignment of nucleotide or amino acid sequences evolving under processes that vary over genomic regions and, being fully probabilistic, provides an estimate of regional heterogeneity of the evolutionary process along the alignment and a measure of local reliability of the solution. Furthermore, the evolutionary modelling of the substitution process permits adjusting the sensitivity and specificity of the alignment and, if high specificity is the aim, leaving sequences unaligned when their divergence is beyond a meaningful detection of homology.

5.
Human observers can perceive the three-dimensional (3-D) structure of their environment using various cues, an important one of which is optic flow. The motion of any point’s projection on the retina depends both on the point’s movement in space and on its distance from the eye. Therefore, retinal motion can be used to extract the 3-D structure of the environment and the shape of objects, in a process known as structure-from-motion (SFM). However, because many combinations of 3-D structure and motion can lead to the same optic flow, SFM is an ill-posed inverse problem. The rigidity hypothesis is a constraint supposed to formally solve the SFM problem and to account for human performance. Recently, however, a number of psychophysical results, with both moving and stationary human observers, have shown that the rigidity hypothesis alone cannot account for human performance in SFM tasks, but no model is known to account for the new results. Here, we construct a Bayesian model of SFM based mainly on one new hypothesis, that of stationarity, coupled with the rigidity hypothesis. The predictions of the model, calculated using a new and powerful methodology called Bayesian programming, account for a wide variety of experimental findings.

6.
The main aim of this paper is to present a simple probabilistic model for the early stage of neuron growth: the specification of an axon out of several initially similar neurites. The model is a Markov process with competition between the growing neurites, wherein longer objects have more chances to grow, and parameter alpha determines the intensity of the competition. For alpha > 1 the model provides results which are qualitatively similar to the experimental ones, i.e. selection of one rapidly elongating axon out of several neurites while other less successful neurites stop growing at some random time. Rigorous mathematical proofs are given.
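The competition described in this abstract can be illustrated with a minimal discrete-time sketch. It is not the authors' model: the abstract does not give the transition rule, so the sketch assumes that at each step exactly one neurite grows by one unit and that neurite i is chosen with probability proportional to its length raised to alpha. The function name `simulate_neurites` and all parameter defaults are hypothetical.

```python
import random


def simulate_neurites(n_neurites=5, alpha=1.5, steps=1000, seed=0):
    """Simulate competing neurites (assumed rule, not taken from the paper).

    At every step one neurite grows by one unit; neurite i is picked with
    probability proportional to length_i ** alpha, so longer neurites are
    more likely to grow and alpha sets how strong that advantage is.
    """
    rng = random.Random(seed)
    lengths = [1.0] * n_neurites  # all neurites start out identical
    for _ in range(steps):
        weights = [length ** alpha for length in lengths]
        winner = rng.choices(range(n_neurites), weights=weights, k=1)[0]
        lengths[winner] += 1.0
    return lengths


if __name__ == "__main__":
    # Compare a weak and a strong competition setting.
    for alpha in (0.5, 2.0):
        print(alpha, sorted(simulate_neurites(alpha=alpha), reverse=True))
```

Running the script prints the final lengths for two values of alpha, which gives a quick way to look at how the length distribution changes with the competition parameter under this assumed rule.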

7.
RNA secondary structure and compensatory evolution   Cited by: 6 (self-citations: 0; citations by others: 6)
The classic concept of epistatic fitness interactions between genes has been extended to study interactions within gene regions, especially between nucleotides that are important in maintaining pre-mRNA/mRNA secondary structures. It is shown that the majority of linkage disequilibria found within the Drosophila Adh gene are likely to be caused by epistatic selection operating on RNA secondary structures. A recently proposed method of RNA secondary structure prediction based on DNA sequence comparisons is reviewed and applied to several types of RNAs, including tRNA, rRNA, and mRNA. The patterns of covariation in these RNAs are analyzed based on Kimura's compensatory evolution model. The results suggest that this model describes the substitution process in the pairing regions (helices) of RNA secondary structures well when the helices are evolutionarily conserved and thermodynamically stable, but fails in some other cases. Epistatic selection maintaining pre-mRNA/mRNA secondary structures is compared to weak selective forces that determine features such as base composition and synonymous codon usage. The relationships among these forces and their relative strengths are addressed. Finally, our mutagenesis experiments using the Drosophila Adh locus are reviewed. These experiments analyze long-range compensatory interactions between the 5' and 3' ends of Adh mRNA, the different constraints on secondary structures in introns and exons, and the possible role of secondary structures in RNA splicing.

8.
9.
A probabilistic generative model for GO enrichment analysis   Cited by: 1 (self-citations: 0; citations by others: 1)
The Gene Ontology (GO) is extensively used to analyze all types of high-throughput experiments. However, researchers still face several challenges when using GO and other functional annotation databases. One problem is the large number of multiple hypotheses that are being tested for each study. In addition, categories often overlap with both direct parents/descendants and other distant categories in the hierarchical structure. This makes it hard to determine if the identified significant categories represent different functional outcomes or rather a redundant view of the same biological processes. To overcome these problems we developed a generative probabilistic model which identifies a (small) subset of categories that, together, explain the selected gene set. Our model accommodates noise and errors in the selected gene set and GO. Using controlled GO data our method correctly recovered most of the selected categories, leading to dramatic improvements over current methods for GO analysis. When used with microarray expression data and ChIP-chip data from yeast and human our method was able to correctly identify both general and specific enriched categories which were overlooked by other methods.

10.
11.
We develop a probabilistic approach to optimum reserve design based on the species-area relationship. Specifically, we focus on the distribution of areas among a set of reserves maximizing biodiversity. We begin by presenting analytic solutions for the neutral case in which all species have the same colonization probability. The optimum size distribution is determined by the local-to-regional species richness ratio k. There is a critical k(t) ratio defined by the number of reserves raised to the scaling exponent of the species-area relationship. Below k(t), a uniform area distribution across reserves maximizes biodiversity. Beyond k(t), biodiversity is maximized by allocating a certain area to one reserve and uniformly allocating the remaining area to the other reserves. We proceed by numerically exploring the robustness of our analytic results when departing from the neutral assumption of identical colonization probabilities across species.

12.
RNA viruses: genome structure and evolution   Cited by: 3 (self-citations: 0; citations by others: 3)
The explosive pace of sequencing of RNA viruses is leading to rapid advances in our understanding of the evolution of these viruses and of the ways in which their genomes are organized and expressed. New insights are coming not only from genomic nucleotide sequence comparisons, but also from direct sequencing of transcribed mRNAs and of RNAs that serve as intermediates in replication.

13.

Background  

Joint alignment and secondary structure prediction of two RNA sequences can significantly improve the accuracy of the structural predictions. Methods addressing this problem, however, are forced to employ constraints that reduce computation by restricting the alignments and/or structures (i.e. folds) that are permissible. In this paper, a new methodology is presented for the purpose of establishing alignment constraints based on nucleotide alignment and insertion posterior probabilities. Using a hidden Markov model, posterior probabilities of alignment and insertion are computed for all possible pairings of nucleotide positions from the two sequences. These alignment and insertion posterior probabilities are additively combined to obtain probabilities of co-incidence for nucleotide position pairs. A suitable alignment constraint is obtained by thresholding the co-incidence probabilities. The constraint is integrated with Dynalign, a free energy minimization algorithm for joint alignment and secondary structure prediction. The resulting method is benchmarked against the previous version of Dynalign and against other programs for pairwise RNA structure prediction.
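The combine-then-threshold step described in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the Dynalign implementation: it assumes the hidden Markov model has already produced three posterior matrices indexed by position pairs (one alignment state and two insertion states, one per sequence), and the function name `coincidence_constraint` and the threshold value are hypothetical.

```python
import numpy as np


def coincidence_constraint(p_align, p_ins1, p_ins2, threshold=0.01):
    """Return a boolean mask of position pairs (i, k) that stay allowed.

    p_align, p_ins1, p_ins2 are same-shaped arrays of posterior
    probabilities for each pair of positions (assumed to come from a
    pairwise HMM that is not shown here). The co-incidence probability
    is their sum; pairs below the threshold are excluded.
    """
    p_co = (np.asarray(p_align, dtype=float)
            + np.asarray(p_ins1, dtype=float)
            + np.asarray(p_ins2, dtype=float))
    return p_co >= threshold


if __name__ == "__main__":
    # Toy 2 x 2 posteriors, invented purely for illustration.
    p_align = [[0.90, 0.00], [0.00, 0.80]]
    p_ins1 = [[0.05, 0.005], [0.00, 0.10]]
    p_ins2 = [[0.00, 0.00], [0.00, 0.00]]
    print(coincidence_constraint(p_align, p_ins1, p_ins2))
```

A downstream dynamic programming routine would then only visit the position pairs where the mask is true, which is where the reduction in computation comes from.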

14.
The evolution of mutualisms presents a puzzle. Why does selection favour cooperation among species rather than cheaters that accept benefits but provide nothing in return? Here we present a general model that predicts three key factors will be important in mutualism evolution: (i) high benefit to cost ratio, (ii) high within‐species relatedness and (iii) high between‐species fidelity. These factors operate by moderating three types of feedback benefit from mutualism: cooperator association, partner‐fidelity feedback and partner choice. In defining the relationship between these processes, our model also allows an assessment of their relative importance. Importantly, the model suggests that phenotypic feedbacks (partner‐fidelity feedback, partner choice) are a more important explanation for between‐species cooperation than the development of genetic correlations among species (cooperator association). We explain the relationship of our model to existing theories and discuss the empirical evidence for our predictions.

15.
Abstract: Many animals and plants show a correlation between the traits of the individuals in the mating pair, implying assortative mating. Given the ubiquity of assortative mating in nature, why and how it has evolved remain open questions. Here we attempt to answer these questions in those cases where the trait under assortment is the same in males and females. We consider the most favorable scenario for assortment to evolve, where the same trait is under assortment and viability selection. We find conditions for assortment to evolve using a multilocus formalism in a haploid population. Our results show how epistasis in fitness between the loci that control the focal trait is crucial for assortment to evolve. We then assume specific forms of assortment in haploids and diploids and study the limiting cases of selective and nonselective mating. We find that selection for increased assortment is weak and that where increased assortment is costly, it does not invade.

16.
We give a mathematical model of the evolution of enzymes whose molecular structure resembles that of metalloporphyrins or chlorophylls. We show, for this model, that even a small amount of these enzymes at the first stage is sufficient for them to increase and become the majority in a cell (similar to the phenomenon of gene fixation). For this purpose we use Kimura's equation, which has been used in the study of genetic evolution and is known from the neutral theory of molecular evolution. Our model is a non-linear, non-equilibrium and non-closed (open to the external world) model.

17.
Conformational sampling is one of the bottlenecks in fragment-based protein structure prediction approaches. They generally start with a coarse-grained optimization where mainchain atoms and centroids of side chains are considered, followed by a fine-grained optimization with an all-atom representation of proteins. It is during this coarse-grained phase that fragment-based methods sample intensely the conformational space. If the native-like region is sampled more, the accuracy of the final all-atom predictions may be improved accordingly. In this work we present EdaFold, a new method for fragment-based protein structure prediction based on an Estimation of Distribution Algorithm. Fragment-based approaches build protein models by assembling short fragments from known protein structures. Whereas the probability mass functions over the fragment libraries are uniform in the usual case, we propose an algorithm that learns from previously generated decoys and steers the search toward native-like regions. A comparison with Rosetta AbInitio protocol shows that EdaFold is able to generate models with lower energies and to enhance the percentage of near-native coarse-grained decoys on a benchmark of [Formula: see text] proteins. The best coarse-grained models produced by both methods were refined into all-atom models and used in molecular replacement. All atom decoys produced out of EdaFold's decoy set reach high enough accuracy to solve the crystallographic phase problem by molecular replacement for some test proteins. EdaFold showed a higher success rate in molecular replacement when compared to Rosetta. Our study suggests that improving low resolution coarse-grained decoys allows computational methods to avoid subsequent sampling issues during all-atom refinement and to produce better all-atom models. EdaFold can be downloaded from http://www.riken.jp/zhangiru/software/.
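The idea of replacing a uniform probability mass function over a fragment library with one learned from earlier decoys can be sketched generically. The abstract does not state EdaFold's actual update rule, so the following is a plain Estimation of Distribution step under assumptions: each decoy is reduced to an (energy, fragment index) pair for a single insertion position, the lowest-energy fraction is treated as the elite set, and add-one style smoothing keeps every fragment selectable. The function name `update_fragment_pmf` and its parameters are hypothetical.

```python
from collections import Counter


def update_fragment_pmf(decoys, n_fragments, elite_fraction=0.5, smoothing=1.0):
    """Estimate a fragment probability mass function for one position.

    decoys: list of (energy, fragment_index) pairs from earlier runs.
    The lowest-energy decoys form the elite set; fragments used by the
    elite set receive more probability mass than under a uniform choice.
    This is a generic EDA update, not the rule used by EdaFold.
    """
    ranked = sorted(decoys, key=lambda decoy: decoy[0])
    n_elite = max(1, int(len(ranked) * elite_fraction))
    counts = Counter(fragment for _, fragment in ranked[:n_elite])
    total = n_elite + smoothing * n_fragments
    return [(counts[f] + smoothing) / total for f in range(n_fragments)]


if __name__ == "__main__":
    # Invented energies and fragment indices, for illustration only.
    decoys = [(-10.0, 0), (-9.0, 0), (-1.0, 1), (0.0, 2), (5.0, 2)]
    print(update_fragment_pmf(decoys, n_fragments=3))
```

Sampling the next round of fragments from the returned distribution, instead of uniformly, is what biases the search toward regions that produced low-energy decoys before.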

18.
19.
The glycosphingolipids have been found in many animal tissues, but the complexity of their molecular structure varies considerably among the different phyla. Relatively simple structures have been found in invertebrate species, while the most complex have been demonstrated in brain tissue of modern fishes and amphibians. The data on the phylogenetic distribution of the glycosphingolipids have been interpreted to indicate that a significant number of gene duplications, involving many different structural genes, may have occurred during a few specific periods of vertebrate evolution. The transition from invertebrate to jawless vertebrate, the divergence of rays and skates from true sharks, the advent of modern bony fishes and the transition from aquatic to terrestrial vertebrates each could have been accompanied by duplications of genes involved in the synthesis and degradation of glycosphingolipids. The evolutionary study of such a multi-enzyme system may be one means to detect alterations in the genome as a whole. The apparent correspondence in time of these gene duplications involved in glycosphingolipid metabolism and periods of rapid vertebrate evolution, which may have been accompanied by significant increases in the amount of cellular DNA, suggests that such changes may have occurred via the mechanism of tetraploidization.

20.
Secondary structure model for 23S ribosomal RNA.   Cited by: 31 (self-citations: 32; citations by others: 31)
A secondary structure model for 23S ribosomal RNA has been constructed on the basis of comparative sequence data, including the complete sequences from E. coli, Bacillus stearothermophilus, human and mouse mitochondria and several partial sequences. The model has been tested extensively with single strand-specific chemical and enzymatic probes. Long range base-paired interactions organize the molecule into six major structural domains containing over 100 individual helices in all. Regions containing the sites of interaction with several ribosomal proteins and 5S RNA have been located. Segments of the 23S RNA structure corresponding to eucaryotic 5.8S and 2S RNA have been identified, and base paired interactions in the model suggest how they are attached to 28S RNA. Functionally important regions, including possible sites of contact with 30S ribosomal subunits, the peptidyl transferase center and locations of intervening sequences in various organisms are discussed. Models for molecular 'switching' of RNA molecules based on coaxial stacking of helices are presented, including a scheme for tRNA-23S RNA interaction.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  ICP registration: 京ICP备09084417号