

Pairwise stochastic context-free grammars (Pair SCFGs) are powerful tools for evolutionary analysis of RNA, including simultaneous RNA sequence alignment and secondary structure prediction, but the associated algorithms are intensive in both CPU and memory usage. The same problem is faced by other RNA alignment-and-folding algorithms based on Sankoff's 1985 algorithm. It is therefore desirable to constrain such algorithms, by pre-processing the sequences and using this first pass to limit the range of structures and/or alignments that can be considered.

The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases.

Protein chemical shifts encode detailed structural information that is difficult and computationally costly to describe at a fundamental level. Statistical and machine learning approaches have been used to infer correlations between chemical shifts and secondary structure from experimental chemical shifts. These methods range from simple statistics such as the chemical shift index to complex methods using neural networks. Notwithstanding their higher accuracy, more complex approaches tend to obscure the relationship between secondary structure and chemical shift and often involve many parameters that need to be trained. We present hidden Markov models (HMMs) with Gaussian emission probabilities to model the dependence between protein chemical shifts and secondary structure. The continuous emission probabilities are modeled as conditional probabilities for a given amino acid and secondary structure type. Using these distributions as outputs of first‐ and second‐order HMMs, we achieve a prediction accuracy of 82.3%, which is competitive with existing methods for predicting secondary structure from protein chemical shifts. Incorporation of sequence‐based secondary structure prediction into our HMM improves the prediction accuracy to 84.0%. Our findings suggest that an HMM with correlated Gaussian distributions conditioned on the secondary structure provides an adequate generative model of chemical shifts. Proteins 2013; © 2012 Wiley Periodicals, Inc.

A novel method is presented for joint prediction of alignment and common secondary structures of two RNA sequences. The joint consideration of common secondary structures and alignment is accomplished by structural alignment over a search space defined by the newly introduced motif called matched helical regions. The matched helical region formulation generalizes previously employed constraints for structural alignment and thereby better accommodates the structural variability within RNA families. A probabilistic model based on pseudo free energies obtained from precomputed base pairing and alignment probabilities is utilized for scoring structural alignments. Maximum a posteriori (MAP) common secondary structures, sequence alignment and joint posterior probabilities of base pairing are obtained from the model via a dynamic programming algorithm called PARTS. The advantage of the more general structural alignment of PARTS is seen in secondary structure predictions for the RNase P family. For this family, the PARTS MAP predictions of secondary structures and alignment perform significantly better than prior methods that utilize a more restrictive structural alignment model. For the tRNA and 5S rRNA families, the richer structural alignment model of PARTS does not offer a benefit and the method therefore performs comparably with existing alternatives. For all RNA families studied, the posterior probability estimates obtained from PARTS offer an improvement over posterior probability estimates from a single sequence prediction. When considering the base pairings predicted over a threshold value of confidence, the combination of sensitivity and positive predictive value is better for PARTS than for the single sequence prediction. PARTS source code is available for download under the GNU public license at http://rna.urmc.rochester.edu.

We have developed a phylogeny-aware progressive alignment method that recognizes insertions and deletions as distinct evolutionary events and thus avoids systematic errors created by traditional alignment methods. We now extend this method to simultaneously model regional heterogeneity and evolution. This novel method can be flexibly adapted to alignment of nucleotide or amino acid sequences evolving under processes that vary over genomic regions and, being fully probabilistic, provides an estimate of regional heterogeneity of the evolutionary process along the alignment and a measure of local reliability of the solution. Furthermore, the evolutionary modelling of the substitution process permits adjusting the sensitivity and specificity of the alignment and, if high specificity is aimed at, leaving sequences unaligned when their divergence is beyond a meaningful detection of homology.

Tracing the evolution of RNA structure in ribosomes   Total citations: 7, self-citations: 0, citations by others: 7
The elucidation of ribosomal structure has shown that the function of ribosomes is fundamentally confined to dynamic interactions established between the RNA components of the ribosomal ensemble. These findings now enable a detailed analysis of the evolution of ribosomal RNA (rRNA) structure. The origin and diversification of rRNA was studied here using phylogenetic tools directly at the structural level. A rooted universal tree was reconstructed from the combined secondary structures of large (LSU) and small (SSU) subunit rRNA using cladistic methods and considerations in statistical mechanics. The evolution of the complete repertoire of structural ribosomal characters was formally traced lineage-by-lineage in the tree, showing a tendency towards molecular simplification and a homogeneous reduction of ribosomal structural change with time. Character tracing revealed patterns of evolution in inter-subunit bridge contacts and tRNA-binding sites that were consistent with the proposed coupling of tRNA translocation and subunit movement. These patterns support the concerted evolution of tRNA-binding sites in the two subunits and the ancestral nature and common origin of certain structural ribosomal features, such as the peptidyl (P) site, the functional relay of the penultimate stem helix of SSU rRNA, and other structures participating in ribosomal dynamics. Overall results provide a rare insight into the evolution of ribosomal structure.

Human observers can perceive the three-dimensional (3-D) structure of their environment using various cues, an important one of which is optic flow. The motion of any point’s projection on the retina depends both on the point’s movement in space and on its distance from the eye. Therefore, retinal motion can be used to extract the 3-D structure of the environment and the shape of objects, in a process known as structure-from-motion (SFM). However, because many combinations of 3-D structure and motion can lead to the same optic flow, SFM is an ill-posed inverse problem. The rigidity hypothesis is a constraint supposed to formally solve the SFM problem and to account for human performance. Recently, however, a number of psychophysical results, with both moving and stationary human observers, have shown that the rigidity hypothesis alone cannot account for human performance in SFM tasks, but no model is known to account for the new results. Here, we construct a Bayesian model of SFM based mainly on one new hypothesis, that of stationarity, coupled with the rigidity hypothesis. The predictions of the model, calculated using a new and powerful methodology called Bayesian programming, account for a wide variety of experimental findings.

The main aim of this paper is to present a simple probabilistic model for the early stage of neuron growth: the specification of an axon out of several initially similar neurites. The model is a Markov process with competition between the growing neurites, wherein longer objects have more chances to grow, and parameter alpha determines the intensity of the competition. For alpha > 1 the model provides results which are qualitatively similar to the experimental ones, i.e. selection of one rapidly elongating axon out of several neurites while other less successful neurites stop growing at some random time. Rigorous mathematical proofs are given.
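
The competition described in this abstract can be illustrated with a minimal simulation sketch. The abstract does not give the transition rule, so everything specific here is an assumption made for illustration: time is discrete, exactly one neurite gains one unit of length per step, and a neurite is chosen with probability proportional to its length raised to the power alpha. The function name `simulate_neurites` and all default values are hypothetical.

```python
import random


def simulate_neurites(n_neurites=5, alpha=2.0, steps=1000, seed=0):
    """Simulate a hypothetical length-biased growth competition.

    Assumed rule (not taken from the paper): at every step one neurite
    is picked with probability proportional to length ** alpha and
    grows by one unit.
    """
    rng = random.Random(seed)          # local generator for reproducibility
    lengths = [1.0] * n_neurites       # all neurites start out identical
    for _ in range(steps):
        # Longer neurites get larger weights; alpha sets how strongly.
        weights = [length ** alpha for length in lengths]
        winner = rng.choices(range(n_neurites), weights=weights, k=1)[0]
        lengths[winner] += 1.0
    return lengths


if __name__ == "__main__":
    for alpha in (0.5, 1.0, 2.0):
        final = sorted(simulate_neurites(alpha=alpha), reverse=True)
        print(f"alpha={alpha}: {final}")
```

Under this assumed rule, one would expect growth to concentrate in a single neurite when alpha is above 1 and to stay more evenly spread when alpha is below 1; the sketch makes no claim about the exact process or results in the paper.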

RNA secondary structure and compensatory evolution   Total citations: 6, self-citations: 0, citations by others: 6
The classic concept of epistatic fitness interactions between genes has been extended to study interactions within gene regions, especially between nucleotides that are important in maintaining pre-mRNA/mRNA secondary structures. It is shown that the majority of linkage disequilibria found within the Drosophila Adh gene are likely to be caused by epistatic selection operating on RNA secondary structures. A recently proposed method of RNA secondary structure prediction based on DNA sequence comparisons is reviewed and applied to several types of RNAs, including tRNA, rRNA, and mRNA. The patterns of covariation in these RNAs are analyzed based on Kimura's compensatory evolution model. The results suggest that this model describes the substitution process in the pairing regions (helices) of RNA secondary structures well when the helices are evolutionarily conserved and thermodynamically stable, but fails in some other cases. Epistatic selection maintaining pre-mRNA/mRNA secondary structures is compared to weak selective forces that determine features such as base composition and synonymous codon usage. The relationships among these forces and their relative strengths are addressed. Finally, our mutagenesis experiments using the Drosophila Adh locus are reviewed. These experiments analyze long-range compensatory interactions between the 5' and 3' ends of Adh mRNA, the different constraints on secondary structures in introns and exons, and the possible role of secondary structures in RNA splicing.

RNA viruses: genome structure and evolution   Total citations: 3, self-citations: 0, citations by others: 3
The explosive pace of sequencing of RNA viruses is leading to rapid advances in our understanding of the evolution of these viruses and of the ways in which their genomes are organized and expressed. New insights are coming not only from genomic nucleotide sequence comparisons, but also from direct sequencing of transcribed mRNAs and of RNAs that serve as intermediates in replication.

A probabilistic generative model for GO enrichment analysis   Total citations: 1, self-citations: 0, citations by others: 1
The Gene Ontology (GO) is extensively used to analyze all types of high-throughput experiments. However, researchers still face several challenges when using GO and other functional annotation databases. One problem is the large number of multiple hypotheses that are being tested for each study. In addition, categories often overlap with both direct parents/descendants and other distant categories in the hierarchical structure. This makes it hard to determine if the identified significant categories represent different functional outcomes or rather a redundant view of the same biological processes. To overcome these problems we developed a generative probabilistic model which identifies a (small) subset of categories that, together, explain the selected gene set. Our model accommodates noise and errors in the selected gene set and GO. Using controlled GO data, our method correctly recovered most of the selected categories, leading to dramatic improvements over current methods for GO analysis. When used with microarray expression data and ChIP-chip data from yeast and human, our method was able to correctly identify both general and specific enriched categories which were overlooked by other methods.



Joint alignment and secondary structure prediction of two RNA sequences can significantly improve the accuracy of the structural predictions. Methods addressing this problem, however, are forced to employ constraints that reduce computation by restricting the alignments and/or structures (i.e. folds) that are permissible. In this paper, a new methodology is presented for the purpose of establishing alignment constraints based on nucleotide alignment and insertion posterior probabilities. Using a hidden Markov model, posterior probabilities of alignment and insertion are computed for all possible pairings of nucleotide positions from the two sequences. These alignment and insertion posterior probabilities are additively combined to obtain probabilities of co-incidence for nucleotide position pairs. A suitable alignment constraint is obtained by thresholding the co-incidence probabilities. The constraint is integrated with Dynalign, a free energy minimization algorithm for joint alignment and secondary structure prediction. The resulting method is benchmarked against the previous version of Dynalign and against other programs for pairwise RNA structure prediction.
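
The combine-then-threshold step described in this abstract can be sketched as follows. This is an illustration only and not Dynalign's code: it assumes the HMM posteriors are already available as three matrices of equal shape (one alignment matrix and one insertion matrix per sequence), and the function name `coincidence_constraint`, the argument names, and the default threshold value are all hypothetical.

```python
import numpy as np


def coincidence_constraint(p_align, p_ins1, p_ins2, threshold=0.01):
    """Return a boolean mask of permitted nucleotide position pairs.

    Each input is an (N1, N2) array of posterior probabilities for
    position i of sequence 1 and position k of sequence 2. The layout
    and the threshold value are assumptions made for this sketch.
    """
    p_align = np.asarray(p_align, dtype=float)
    p_ins1 = np.asarray(p_ins1, dtype=float)
    p_ins2 = np.asarray(p_ins2, dtype=float)
    if not (p_align.shape == p_ins1.shape == p_ins2.shape):
        raise ValueError("all posterior matrices must have the same shape")
    # Additive combination gives the co-incidence probability per pair.
    coincidence = p_align + p_ins1 + p_ins2
    # Pairs at or above the threshold stay in the alignment search space.
    return coincidence >= threshold


if __name__ == "__main__":
    mask = coincidence_constraint(
        [[0.90, 0.001], [0.002, 0.80]],
        [[0.05, 0.001], [0.020, 0.10]],
        [[0.03, 0.001], [0.000, 0.05]],
    )
    print(mask)
```

A boolean mask of this kind lets a downstream dynamic programming routine skip every position pair marked False, which is where the computational saving would come from.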

Ecological Complexity, 2005, 2(3): 312-321
Food webs are networks describing who is eating whom in an ecological community. By now it is clear that many aspects of food-web structure are reproducible across diverse habitats, yet little is known about the driving force behind this structure. Evolutionary and population dynamical mechanisms have been considered. We propose a model for the evolutionary dynamics of food-web topology and show that it accurately reproduces observed food-web characteristics in the steady state. It is based on the observation that most consumers are larger than their resource species and the hypothesis that speciation and extinction rates decrease with increasing body mass. Results give strong support to the evolutionary hypothesis.

We develop a probabilistic approach to optimum reserve design based on the species-area relationship. Specifically, we focus on the distribution of areas among a set of reserves maximizing biodiversity. We begin by presenting analytic solutions for the neutral case in which all species have the same colonization probability. The optimum size distribution is determined by the local-to-regional species richness ratio k. There is a critical k(t) ratio defined by the number of reserves raised to the scaling exponent of the species-area relationship. Below k(t), a uniform area distribution across reserves maximizes biodiversity. Beyond k(t), biodiversity is maximized by allocating a certain area to one reserve and uniformly allocating the remaining area to the other reserves. We proceed by numerically exploring the robustness of our analytic results when departing from the neutral assumption of identical colonization probabilities across species.

Cucumber mosaic virus, a model for RNA virus evolution   Total citations: 5, self-citations: 0, citations by others: 5
Taxonomic relationships: Cucumber mosaic virus (CMV) is the type member of the Cucumovirus genus, in the family Bromoviridae. Additional members of the genus are Peanut stunt virus (PSV) and Tomato aspermy virus (TAV). The RNAs 3 of all members of the genus can be exchanged and still yield a viable virus, while the RNAs 1 and 2 can only be exchanged within a species.
Physical properties: The virus particles are about 29 nm in diameter, and are composed of 180 subunits (T = 3 icosahedral symmetry). The particles sediment with an s value of approximately 98. The virions contain 18% RNA, and are highly labile, relying on RNA–protein interactions for their integrity. The three genomic RNAs, designated RNA 1 (3.3 kb in length), RNA 2 (3.0 kb) and RNA 3 (2.2 kb) are packaged in individual particles; a subgenomic RNA, RNA 4 (1.0 kb), is packaged with the genomic RNA 3, making all the particles roughly equivalent in composition. In some strains an additional subgenomic RNA, RNA 4A is also encapsidated at low levels. The genomic RNAs are single stranded, plus sense RNAs with 5' cap structures, and 3' conserved regions that can be folded into tRNA-like structures.
Satellite RNAs: CMV can harbour molecular parasites known as satellite RNAs (satRNAs) that can dramatically alter the symptom phenotype induced by the virus. The CMV satRNAs do not encode any proteins but rely on the RNA for their biological activity.
Hosts: CMV infects over 1000 species of hosts, including members of 85 plant families, making it the broadest host range virus known. The virus is transmitted from host to host by aphid vectors, in a nonpersistent manner.
Useful web sites: http://mmtsb.scripps.edu/viper/1f15.html (structure); http://www.ncbi.nlm.nih.gov/ICTVdb/ICTVdB/10040001.htm (general information)

Probabilistic approaches for sequence alignment are usually based on pair Hidden Markov Models (HMMs) or Stochastic Context Free Grammars (SCFGs). Recent studies have shown a significant correlation between the content of short indels and their flanking regions, which by definition cannot be modelled by the above two approaches. In this work, we present a context-sensitive indel model based on a pair Tree-Adjoining Grammar (TAG), along with accompanying algorithms for efficient alignment and parameter estimation. The increased precision and statistical power of this model is shown on simulated and real genomic data. As the cost of sequencing plummets, the usefulness of comparative analysis is becoming limited by alignment accuracy rather than data availability. Our results will therefore have an impact on any type of downstream comparative genomics analyses that rely on alignments. Fine-grained studies of small functional regions or disease markers, for example, could be significantly improved by our method. The implementation is available at www.mcb.mcgill.ca/~blanchem/software.html.

Biological macromolecules often undergo large conformational rearrangements during a functional cycle. To simulate these structural transitions with full atomic detail typically demands extensive computational resources. Moreover, it is unclear how to incorporate, in a principled way, additional experimental information that could guide the structural transition. This article develops a probabilistic model for conformational transitions in biomolecules. The model can be viewed as a network of anharmonic springs that break, if the experimental data support the rupture of bonds. Hamiltonian Monte Carlo in internal coordinates is used to infer structural transitions from experimental data, thereby sampling large conformational transitions without distorting the structure. The model is benchmarked on a large set of conformational transitions. Moreover, we demonstrate the use of the probabilistic network model for integrative modeling of macromolecular complexes based on data from crosslinking followed by mass spectrometry.
