期刊界 (All Journals): journal search and academic literature search

1.

Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis (total citations: 22; self-citations: 14; citations by others: 8)

Galtier N; Gouy M 《Molecular biology and evolution》1998,15(7):871-879

A nonhomogeneous, nonstationary stochastic model of DNA sequence evolution allowing varying equilibrium G + C contents among lineages is devised in order to deal with sequences of unequal base compositions. A maximum-likelihood implementation of this model for phylogenetic analyses allows handling of a reasonable number of sequences. The relevance of the model and the accuracy of parameter estimates are theoretically and empirically assessed, using real or simulated data sets. Overall, a significant amount of information about past evolutionary modes can be extracted from DNA sequences, suggesting that process (rates of distinct kinds of nucleotide substitutions) and pattern (the evolutionary tree) can be simultaneously inferred. G + C contents at ancestral nodes are quite accurately estimated. The new method appears to be useful for phylogenetic reconstruction when base composition varies among compared sequences. It may also be suitable for molecular evolution studies.
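
The abstract above turns on G + C content differing among sequences. As a minimal illustration only (not the maximum-likelihood model of the paper), the observed G + C fraction of each sequence in an alignment can be computed as follows. The function name `gc_content`, the taxon labels, and the toy sequences are all hypothetical.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among the unambiguous bases A, C, G, T.

    Gaps and ambiguity codes (for example '-' and 'N') are ignored.
    """
    s = seq.upper()
    counted = sum(s.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (s.count("G") + s.count("C")) / counted


# Hypothetical toy alignment, used only to show the call pattern.
alignment = {"taxon_a": "ATGCGCGC--", "taxon_b": "ATATATGCNN"}
gc_by_taxon = {name: gc_content(seq) for name, seq in alignment.items()}
```

A large spread among the values in `gc_by_taxon` is the kind of compositional heterogeneity that motivates nonstationary models.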

2.

The augmentation algorithm and molecular phylogenetic trees

Richard Holmquist 《Journal of molecular evolution》1978,12(1):17-24

Summary: The augmentation procedure of G.W. Moore leads to correct estimates of the total number of nucleotide substitutions separating two genes descended from a common ancestor, provided the data base is sufficiently dense. These estimates are in agreement with the true distance values from simulations of known evolutionary pathways. The estimates, on the average, are unbiased: they neither overaugment nor underaugment seriously. The variance of the population of augmented distance values reflects accurately the variance of the population of true distance values and is thus not abnormally large due to procedural defects in the algorithm. The augmented distances are in agreement with stochastic models tested on real data when the latter take proper account of the restricted mutability of codons resulting from natural selection. When the experimental data base is not dense, the augmented distance values and population variance may underestimate both the true distance values and their variance. This has a logical consequence that there exist significant and numerous errors in the ancestral sequences reconstructed by the parsimony principle from such data bases. The restrictions, resulting from natural selection, on the mutability of different nucleotide sites are shown to bear critically on the accuracy of estimates of the total number of nucleotide replacements made by stochastic models.

3.

Accounting for uncertainty in dormant life stages in stochastic demographic models

Maria Paniw; Pedro F. Quintana‐Ascencio; Fernando Ojeda; Roberto Salguero‐Gómez 《Oikos》2017,126(6):900-909

Dormant life stages are often critical for population viability in stochastic environments, but accurate field data characterizing them are difficult to collect. Such limitations may translate into uncertainties in demographic parameters describing these stages, which then may propagate errors in the examination of population‐level responses to environmental variation. Expanding on current methods, we 1) apply data‐driven approaches to estimate parameter uncertainty in vital rates of dormant life stages and 2) test whether such estimates provide more robust inferences about population dynamics. We built integral projection models (IPMs) for a fire‐adapted, carnivorous plant species using a Bayesian framework to estimate uncertainty in parameters of three vital rates of dormant seeds – seed‐bank ingression, stasis and egression. We used stochastic population projections and elasticity analyses to quantify the relative sensitivity of the stochastic population growth rate (log λs) to changes in these vital rates at different fire return intervals. We then ran stochastic projections of log λs for 1000 posterior samples of the three seed‐bank vital rates and assessed how strongly their parameter uncertainty propagated into uncertainty in estimates of log λs and the probability of quasi‐extinction, P_q(t). Elasticity analyses indicated that changes in seed‐bank stasis and egression had large effects on log λs across fire return intervals. In turn, uncertainty in the estimates of these two vital rates explained > 50% of the variation in log λs estimates at several fire‐return intervals. Inferences about population viability became less certain as the time between fires widened, with estimates of P_q(t) potentially > 20% higher when considering parameter uncertainty. Our results suggest that, for species with dormant stages, where data is often limited, failing to account for parameter uncertainty in population models may result in incorrect interpretations of population viability.

4.

Entropy of the genetic information and evolution

Masami Hasegawa; Taka-Aki Yano 《Origins of life and evolution of the biosphere》1975,6(1-2):219-227

The entropy of the amino acid sequences coded by DNA is considered as a measure of diversity or variety of proteins, and is taken as a measure of evolution. The DNA or m-RNA sequence is considered as a stationary second-order Markov chain composed of four kinds of bases. Because of the biased nature of the genetic code table, increase of entropy of amino acid sequences is possible with biased nucleotide sequence. Thus the biased DNA base composition and the extreme rarity of the base doublet CpG of higher organisms are explained. It is expected that the amino acid composition was highly biased at the time of the origin of the genetic code table, and the more frequent amino acids have tended to get rarer, and the rarer ones more frequent. This tendency is observed in the evolution of hemoglobin, cytochrome C, fibrinopeptide, immunoglobulin and lysozyme, and protein as a whole.
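
To make the notion of sequence entropy concrete, here is a minimal sketch that computes the Shannon entropy of the residue composition of a sequence. It is a simplification: it treats residues as independent (zero-order), whereas the abstract describes a second-order Markov chain, so it should not be read as the authors' calculation. The function name `shannon_entropy` is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy, in bits per residue, of the composition of seq.

    Assumption: residues are treated as independent draws, so only the
    frequencies of the symbols matter, not their order.
    """
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# A uniform four-letter composition reaches the maximum of 2 bits,
# while a single repeated letter carries no compositional entropy.
uniform_bits = shannon_entropy("ACGT")
constant_bits = shannon_entropy("AAAA")
```

The same function applies to amino acid strings, where the maximum for 20 equally frequent residues is log2(20) bits.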

5.

On the correlation between composition and site-specific evolutionary rate: implications for phylogenetic inference

Gowri-Shankar V; Rattray M 《Molecular biology and evolution》2006,23(2):352-364

Model-based phylogenetic reconstruction methods traditionally assume homogeneity of nucleotide frequencies among sequence sites and lineages. Yet, heterogeneity in base composition is a characteristic shared by most biological sequences. Compositional variation in time, reflected in the compositional biases among contemporary sequences, has already been extensively studied, and its detrimental effects on phylogenetic estimates are known. However, fewer studies have focused on the effects of spatial compositional heterogeneity within genes. We show here that different sites in an alignment do not always share a unique compositional pattern, and we provide examples where nucleotide frequency trends are correlated with the site-specific rate of evolution in RNA genes. Spatial compositional heterogeneity is shown to affect the estimation of evolutionary parameters. With standard phylogenetic methods, estimates of equilibrium frequencies are found to be biased towards the composition observed at fast-evolving sites. Conversely, the ancestral composition estimates of some time-heterogeneous but spatially homogeneous methods are found to be biased towards frequencies observed at invariant and slow-evolving sites. The latter finding challenges the result of a previous study arguing against a hyperthermophilic last universal ancestor from the low apparent G + C content of its rRNA sequences. We propose a new model to account for compositional variation across sites. A Gaussian process prior is used to allow for a smooth change in composition with evolutionary rate. The model has been implemented in the phylogenetic inference software PHASE, and Bayesian methods can be used to obtain the model parameters. The results suggest that this model can accurately capture the observed trends in present-day RNA sequences.

6.

Base Compositional Bias and Phylogenetic Analyses: A Test of the “Flying DNA” Hypothesis

Ronald A. Van Den Bussche; Robert J. Baker; John P. Huelsenbeck; David M. Hillis 《Molecular phylogenetics and evolution》1998,10(3)

Phylogenetic methods can produce biased estimates of phylogeny when base composition varies along different lineages. Pettigrew (1994, Curr. Biol. 4:277–280) has suggested that base composition bias is responsible for the apparent support for the monophyly of bats (Chiroptera: megabats and microbats) from several different nuclear and mitochondrial genes. Pettigrew's “flying DNA” hypothesis makes several predictions: (1) that metabolic constraints associated with flying result in elevated levels of adenine and thymine throughout the genome of both megabats and microbats, (2) that the resulting base compositional bias in bats is sufficient to mislead phylogenetic methods and account for the support for bat monophyly from several nuclear and mitochondrial genes, and (3) that phylogenetic analysis using pairwise distances corrected for compositional bias should eliminate the support for bat monophyly. We tested these predictions by analyzing DNA sequences from two nuclear and three mitochondrial genes. The predicted base compositional bias does not appear to exist in some of the genes, and in other genes the differences in AT content are very small. Analyses under a wide diversity of criteria and models of evolution, including analyses that take base composition into account (using log-determinant distances), all strongly support bat monophyly. Moreover, simulation analyses indicate that even extreme bias toward AT-base composition in bats would be insufficient to explain the observed levels of support for bat monophyly. These analyses provide no support for the “flying DNA” hypothesis, whereas the monophyly of bats appears to be well supported by the DNA sequence data.
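
The abstract mentions log-determinant (LogDet) distances as a composition-aware correction. The sketch below shows one commonly used normalized form for two aligned sequences, d = -(1/4) [ln det F - (1/2)(Σ ln row sums + Σ ln column sums)], where F is the 4 × 4 matrix of joint base frequencies. It is an illustration under that assumption and may differ from the exact variant used in the study; the names `det` and `logdet_distance` are hypothetical.

```python
import math

BASES = "ACGT"


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    d = 1.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(a[r][i]))
        if a[pivot][i] == 0.0:
            return 0.0
        if pivot != i:
            a[i], a[pivot] = a[pivot], a[i]
            d = -d  # a row swap flips the sign of the determinant
        d *= a[i][i]
        for r in range(i + 1, n):
            factor = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= factor * a[i][c]
    return d


def logdet_distance(x: str, y: str) -> float:
    """LogDet distance between two aligned DNA sequences (normalized form)."""
    pairs = [(p, q) for p, q in zip(x.upper(), y.upper())
             if p in BASES and q in BASES]  # skip gaps and ambiguity codes
    f = [[0.0] * 4 for _ in range(4)]
    for p, q in pairs:
        f[BASES.index(p)][BASES.index(q)] += 1.0 / len(pairs)
    rows = [sum(r) for r in f]                                # composition of x
    cols = [sum(f[i][j] for i in range(4)) for j in range(4)]  # composition of y
    det_f = det(f)
    if det_f <= 0.0 or min(rows) == 0.0 or min(cols) == 0.0:
        raise ValueError("LogDet distance is undefined for these sequences")
    log_norm = 0.5 * (sum(map(math.log, rows)) + sum(map(math.log, cols)))
    return -(math.log(det_f) - log_norm) / 4.0
```

For example, `logdet_distance("ACGTACGTAACCGGTT", "ACGTACGTAACCGGTA")` gives a small positive distance, while identical sequences give zero. The distance is undefined when a base is absent from either sequence or when det F is not positive, which is why the sketch raises an error in those cases.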

7.

Mitochondrial DNA Variation, Phylogenetic Relationships, and Evolution of Four Mediterranean Genera of Soles (Soleidae, Pleuronectiformes)

Tinti F; Piccinetti C; Tommasini S; Vallisneri M 《Marine biotechnology (New York, N.Y.)》2000,2(3):274-284

To increase knowledge about the systematics and evolution of Mediterranean soles, we assessed mitochondrial DNA variation, molecular phylogeny, and evolution in eight species from the genera Solea, Microchirus, Monochirus, and Buglossidium by large ribosomal subunit (16S) and cytochrome b (cytb) sequence analysis. Relevant molecular features are the great variation of base composition among species at the third codon in cytb and the heterogeneity of the nucleotide substitution rate. Phylogenies recovered using 16S nucleotide and cytb amino acid sequences agree with those based on morphology in assessing monophyly of Solea species and ancestry of Buglossidium luteum, but they are against the intergeneric differentiation of Microchirus and Monochirus. Conversely, phylogenetic trees based on cytb nucleotide sequences yielded relationships among taxa regardless of their evolutionary histories. The incongruities between morphological and molecular issues suggest the need for reassessing the systematic value of some morphological characters. Approximate estimates of the divergence time of Mediterranean soleid lineages range from 40 to 13 Mya (Oligocene–Miocene), indicating an ancient origin for the group. Received August 31, 1999; accepted December 17, 1999.

8.

The Phylogenetic Utility of the Codon-Degeneracy Model

McClellan DA 《Journal of molecular evolution》2000,51(3):185-193

The codon-degeneracy model (CDM) predicts relative frequencies of substitution for any set of homologous protein-coding DNA sequences based on patterns of nucleotide degeneracy, codon composition, and the assumption of selective neutrality. However, at present, the CDM is reliant on outside estimates of transition bias. A new method by which the power of the CDM can be used to find a synonymous transition bias that is optimal for any given phylogenetic tree topology is presented. An example is illustrated that utilizes optimized transition biases to generate CDM GF-scores for every possible phylogenetic tree for pocket gophers of the genus Orthogeomys. The resulting distribution of CDM GF-scores is compared and contrasted with the results of maximum parsimony and maximum likelihood methods. Although convergence on a single tree topology by the CDM and another method indicates greater support for that particular tree, the value of CDM GF-score as the sole optimality criterion for phylogeny reconstruction remains to be determined. It is clear, however, that the a priori estimation of an optimum transition bias from codon composition has a direct application to differentiating between alternative trees. Received: 13 October 1999 / Accepted: 28 April 2000

9.

Composition-based statistics and translated nucleotide searches: Improving the TBLASTN module of BLAST

E Michael Gertz; Yi-Kuo Yu; Richa Agarwala; Alejandro A Schäffer; Stephen F Altschul 《BMC biology》2006,4(1):41-14

10.

Compensatory substitutions and the evolution of the mitochondrial 12S rRNA gene in mammals (total citations: 5; self-citations: 2; citations by others: 3)

Springer MS; Hollar LJ; Burk A 《Molecular biology and evolution》1995,12(6):1138-1150

12S ribosomal RNA (rRNA) gene sequences from a suite of mammalian taxa (13 placentals, 4 marsupials, 1 monotreme), for which phylogenetic relationships are well established based on independent criteria, were employed to study the evolution of this gene. Phylogenetic analysis of 12S sequences produces a phylogeny that agrees with expectations. Base composition provides evidence for directional symmetrical substitution pressure in loops; in stems, base composition is much more even. Rates of nucleotide substitution are lower in stems than loops. Patterns of nucleotide substitution show an overall preference for transitions over transversions, with this difference more profound in stems than loops. Among different transversion pathways, there is a wide range of transformation frequencies. An analysis of compensatory substitutions shows that there is strong evidence for their occurrence and that a weighting factor of 0.61 should be applied in phylogenetic analyses to account for the dependence of mutations at stem positions relative to positions where changes are independent. Among stem variables (i.e., stem length, interaction distance, substitution rates, G+C content, and the percentage of bases that are paired), several significant correlations were discovered, but stem length and interaction distance are uncorrelated with other variables.
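
The preference for transitions over transversions reported above rests on classifying each observed difference between aligned sequences. The following is a minimal sketch of that classification for one pair of sequences; it counts raw pairwise differences only and is not the analysis pipeline of the study. The function name `count_ts_tv` and the example sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(x: str, y: str):
    """Count transitions and transversions between two aligned sequences.

    RNA input is accepted by mapping U to T. Sites with gaps or
    ambiguity codes in either sequence are skipped.
    """
    x = x.upper().replace("U", "T")
    y = y.upper().replace("U", "T")
    transitions = transversions = 0
    for a, b in zip(x, y):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1  # purine <-> pyrimidine
    return transitions, transversions


# Hypothetical toy pair: A/G and C/U are transitions, G/C and U/A are transversions.
ts, tv = count_ts_tv("AACCGGUU", "GACUGCUA")
```

Running the same count separately on stem and loop positions would give the kind of stem-versus-loop comparison the abstract describes.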

11.

Synonymous substitution rates in Drosophila: Mitochondrial versus nuclear genes

Etsuko N. Moriyama; Jeffrey R. Powell 《Journal of molecular evolution》1997,45(4):378-391

Synonymous substitution rates in mitochondrial and nuclear genes of Drosophila were compared. To make accurate comparisons, we considered the following: (1) relative synonymous rates, which do not require divergence time estimates, should be used; (2) methods estimating divergence should take into account base composition; (3) only very closely related species should be used to avoid effects of saturation; (4) the heterogeneity of rates should be examined. We modified the methods estimating synonymous substitution numbers to account for base composition bias. By using these methods, we found that mitochondrial genes have 1.7–3.4 times higher synonymous substitution rates than the fastest nuclear genes or 4.5–9.0 times higher rates than the average nuclear genes. The average rate of synonymous transversions was 2.7 (estimated from the melanogaster species subgroup) or 2.9 (estimated from the obscura group) times higher in mitochondrial genes than in nuclear genes. Synonymous transversions in mitochondrial genes occurred at an approximately equivalent rate to those in the fastest nuclear genes. This last result is not consistent with the hypothesis that the difference in turnover rates between mitochondrial and nuclear genomes is the major factor determining higher synonymous substitution rates in mtDNA. We conclude that the difference in synonymous substitution rates is due to a combination of two factors: a higher transitional mutation rate in mtDNA and constraints on nuclear genes due to selection for codon usage. Received: 27 November 1996 / Accepted: 8 May 1997

12.

An Improved Codon Modeling Approach for Accurate Estimation of the Mutation Bias

Thibault Latrille; Nicolas Lartillot 《Molecular biology and evolution》2022,39(2)

Phylogenetic codon models are routinely used to characterize selective regimes in coding sequences. Their parametric design, however, is still a matter of debate, in particular concerning the question of how to account for differing nucleotide frequencies and substitution rates. This problem relates to the fact that nucleotide composition in protein-coding sequences is the result of the interactions between mutation and selection. In particular, because of the structure of the genetic code, the nucleotide composition differs between the three coding positions, with the third position showing a more extreme composition. Yet, phylogenetic codon models do not correctly capture this phenomenon and instead predict that the nucleotide composition should be the same for all three positions. Alternatively, some models allow for different nucleotide rates at the three positions, an approach conflating the effects of mutation and selection on nucleotide composition. In practice, it results in inaccurate estimation of the strength of selection. Conceptually, the problem comes from the fact that phylogenetic codon models do not correctly capture the fixation bias acting against the mutational pressure at the mutation–selection equilibrium. To address this problem and to more accurately identify mutation rates and selection strength, we present an improved codon modeling approach where the fixation rate is not seen as a scalar, but as a tensor. This approach gives an accurate representation of how mutation and selection oppose each other at equilibrium and yields a reliable estimate of the mutational process, while disentangling the mean fixation probabilities prevailing in different mutational directions.

13.

Sequence analysis of the α-galactosidase MEL gene governing the efficient production of ethanol from raffinose-rich molasses in the yeast Lachancea thermotolerans

Naoya Takakuwa; Masahiko Tamura; Masao Ohnishi; Yuji Oda 《World journal of microbiology & biotechnology》2007,23(4):587-591

The yeast Lachancea thermotolerans, formerly Kluyveromyces thermotolerans, was tested for the ethanol fermentation of raffinose-rich molasses. Two melibiose-fermenting strains, NBRC 10066 and NBRC 10067, produced more ethanol than eight other strains. The concentration of ethanol synthesized by NBRC 10066 was slightly higher than that by NBRC 10067, probably on the basis of the expression of α-galactosidase. The regions corresponding to the α-galactosidase MEL1 gene of Saccharomyces cerevisiae were amplified. The nucleotide sequences of the two genes designated as MELth1 and MELth2 revealed single open reading frames of 1,416 bp encoding 472 amino acids but differed from each other in one base that converted the amino acid composition. The sequences of the 5′-upstream region from −1 to −515 of the two genes are identical except for one base.

14.

Forty Million Years of Independent Evolution: A Mitochondrial Gene and Its Corresponding Nuclear Pseudogene in Primates

Quesada H; Ramos-Onsins SE; Aguadé M 《Journal of molecular evolution》2005,60(1):1-11

Sequences from nuclear mitochondrial pseudogenes (numts) that originated by transfer of genetic information from mitochondria to the nucleus offer a unique opportunity to compare different regimes of molecular evolution. Analyzing a 1621-nt-long numt of the rRNA specifying mitochondrial DNA residing on human chromosome 3 and its corresponding mitochondrial gene in 18 anthropoid primates, we were able to retrace about 40 MY of primate rDNA evolutionary history. The results illustrate strengths and weaknesses of mtDNA data sets in reconstructing and dating the phylogenetic history of primates. We were able to show the following. In contrast to numt-DNA, (1) the nucleotide composition of mtDNA changed dramatically in the different primate lineages. This is assumed to lead to significant misinterpretations of the mitochondrial evolutionary history. (2) Due to the nucleotide compositional plasticity of primate mtDNA, the phylogenetic reconstruction combining mitochondrial and nuclear sequences is unlikely to yield reliable information for either tree topologies or branch lengths. This is because a major part of the underlying sequence evolution model, the nucleotide composition, is undergoing dramatic change in different mitochondrial lineages. We propose that this problem is also expressed in the occasional unexpected long branches leading to the “common ancestor” of orthologous numt sequences of different primate taxa. (3) The heterogeneous and lineage-specific evolution of mitochondrial sequences in primates renders molecular dating based on primate mtDNA problematic, whereas the numt sequences provide a much more reliable base for dating. [Reviewing Editor: Dr. Rafael Zardoya]

15.

Analysis of donor splice sites in different eukaryotic organisms

Igor B. Rogozin; Luciano Milanesi 《Journal of molecular evolution》1997,45(1):50-59

We present here a new algorithm for functional site analysis. It is based on four main assumptions: each variation of nucleotide composition makes a different contribution to the overall binding free energy of interaction between a functional site and another molecule; nonfunctioning site-like regions (pseudosites) are absent or rare in genomes; there may be errors in the sample of sites; and nucleotides of different site positions are considered to be mutually dependent. In this algorithm, the site set is divided into subsets, each described by a certain consensus. Donor splice sites of the human protein-coding genes were analyzed. Comparing the results with other methods of donor splice site prediction has demonstrated a more accurate prediction of consensus sequences AG/GU(A,G), G/GUnAG, /GU(A,G)AG, /GU(A,G)nGU, and G/GUA than is achieved by weight matrix and consensus (A,C)AG/GU(A,G)AGU with mismatches. The probability of the first type error, E1, for the obtained consensus set was about 0.05, and the probability of the second type error, E2, was 0.15. The analysis demonstrated that accuracy of the functional site prediction could be improved if one takes into account correlations between the site positions. The accuracy of prediction by using human consensus sequences was tested on sequences from different organisms. Some differences in consensus sequences for the plant Arabidopsis sp., the invertebrate Caenorhabditis sp., and the fungus Aspergillus sp. were revealed. For the yeast Saccharomyces sp. only one conservative consensus, /GUA(U,A,C)G(U,A,C), was revealed (E1 = 0.03, E2 = 0.03). Yeast is a very interesting model to use for analysis of molecular mechanisms of splicing. Received: 14 October 1996 / Accepted: 30 January 1997

16.

Dynamics of blood chylomicron fatty acids in a marine carnivore: implications for lipid metabolism and quantitative estimation of predator diets (total citations: 1; self-citations: 1; citations by others: 0)

Cooper MH; Iverson SJ; Heras H 《Journal of comparative physiology. B, Biochemical, systemic, and environmental physiology》2005,175(2):133-145

Blubber fatty acid(s) (FA) signatures can provide accurate estimates of predator diets using quantitative FA signature analysis, provided that aspects of predator FA metabolism are taken into account. Because the intestinal absorption of dietary FA and their incorporation into chylomicrons (the primary transport lipoproteins for dietary FA in the blood) may influence the relationship between FA composition in the diet and adipose tissue, we investigated the metabolism of individual FA at these early stages of assimilation. We also investigated the capacity of chylomicron signatures to provide quantitative estimates of prey composition of an experimental meal. Six captive juvenile grey seals (Halichoerus grypus) were fed either 2.3 kg (n=3) or 4.6 kg (n=3) of Atlantic herring (Clupea harengus). Although chylomicron FA signatures resembled diet signatures at all samplings, absolute differences were smallest at 3-h post-feeding, when chylomicrons were likely largest and had the greatest ratio of triacylglycerol to phospholipid FA. Specific FA that differed significantly between diet and chylomicron signatures reflected either input from endogenous sources or loss through peroxisomal β-oxidation. When these aspects of metabolism were accounted for, the quantitative predictions of diet composition generated using chylomicron signatures were extremely accurate, even when tested against 28 other prey items.

17.

18S rRNA metabarcoding diet analysis of a predatory fish community across seasonal changes in prey availability

Justin M. Waraniak; Terence L. Marsh; Kim T. Scribner 《Ecology and evolution》2019,9(3):1410-1430

Predator–prey relationships are important ecological interactions, affecting biotic community composition and energy flow through a system, and are of interest to ecologists and managers. Morphological diet analysis has been the primary method used to quantify the diets of predators, but emerging molecular techniques using genetic data can provide more accurate estimates of relative diet composition. This study used sequences from the 18S V9 rRNA barcoding region to identify prey items in the gastrointestinal (GI) tracts of predatory fishes. Predator GI samples were taken from the Black River, Cheboygan Co., MI, USA (n = 367 samples, 12 predator species) during periods of high prey availability, including the larval stage of regionally threatened lake sturgeon (Acipenser fulvescens Rafinesque 1817) in late May/early June of 2015 and of relatively lower prey availability in early July of 2015. DNA was extracted and sequenced from 355 samples (96.7%), and prey DNA was identified in 286 of the 355 samples (80.6%). Prey were grouped into 33 ecologically significant taxonomic groups based on the lowest taxonomic level sequences that could be identified using sequences available on GenBank. Changes in the makeup of diet composition, dietary overlap, and predator preference were analyzed comparing the periods of high and low prey abundance. Some predator species exhibited significant seasonal changes in diet composition. Dietary overlap was slightly but significantly higher during the period of high prey abundance; however, there was little change in predator preference. This suggests that change in prey availability was the driving factor in changing predator diet composition and dietary overlap. This study demonstrates the utility of molecular diet analysis and how temporal variability in community composition adds complexity to predator–prey interactions.

18.

Substitutional bias confounds inference of cyanelle origins from sequence data (total citations: 10; self-citations: 0; citations by others: 10)

P. J. Lockhart; C. J. Howe; D. A. Bryant; T. J. Beanland; A. W. D. Larkum 《Journal of molecular evolution》1992,34(2):153-162

Summary: Available molecular and biochemical data offer conflicting evidence for the origin of the cyanelle of Cyanophora paradoxa. We show that the similarity of cyanelle and green chloroplast sequences is probably a result of these two lineages independently developing the same pattern of directional nucleotide change (substitutional bias). This finding suggests caution should be exercised in the interpretation of nucleotide sequence analyses that appear to favor the view of a common endosymbiont for the cyanelle and chlorophyll-b-containing chloroplasts. The data and approaches needed to resolve the issue of cyanelle origins are discussed. Our findings also have general implications for phylogenetic inference under conditions where the base compositions (compositional bias) of the sequences analyzed differ. Offprint requests to: C.J. Howe

19.

A method for estimating rates of nucleotide substitution using DNA sequence data

N. Kaplan; K. Risko 《Theoretical population biology》1982,21(3):318-328

An estimate of the average number of evolutionarily acceptable substitutions per nucleotide since the most recent common ancestor of a pair of homologous sequences is found which uses nucleotide sequence data. The estimate is derived assuming a Poisson-like model for the evolutionary process. A method is also suggested for analyzing nucleotide sequence data in M homologous sequences (M ≥ 3). A simulation study is reported showing that the estimates are satisfactory providing there is sufficient homology between the sequences. To demonstrate the methods a numerical example using some β-globin data is presented.
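
To illustrate the general idea of estimating substitutions per site under a Poisson-type model, here is a sketch of the classical Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. This is a stand-in chosen for familiarity; it is not the estimator derived by Kaplan and Risko. The function name `jukes_cantor_distance` is hypothetical.

```python
import math


def jukes_cantor_distance(x: str, y: str) -> float:
    """Jukes-Cantor estimate of substitutions per site for two aligned sequences.

    Assumes equal base frequencies and equal rates among all substitution
    types. Sites with gaps or ambiguity codes are skipped.
    """
    pairs = [(a, b) for a, b in zip(x.upper(), y.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For instance, two sequences differing at 2 of 8 sites (p = 0.25) give a corrected distance of about 0.304, larger than p because multiple substitutions at the same site are taken into account. The correction breaks down as p approaches 0.75, which mirrors the abstract's caveat that the sequences must retain sufficient homology.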

20.

Empirical comparison of cross-platform normalization methods for gene expression data

Jason Rudy; Faramarz Valafar 《BMC bioinformatics》2011,12(1):1-22

Background

The prediction of secondary structure, i.e. the set of canonical base pairs between nucleotides, is a first step in developing an understanding of the function of an RNA sequence. The most accurate computational methods predict conserved structures for a set of homologous RNA sequences. These methods usually suffer from high computational complexity. In this paper, TurboFold, a novel and efficient method for secondary structure prediction for multiple RNA sequences, is presented.

Results

TurboFold takes, as input, a set of homologous RNA sequences and outputs estimates of the base pairing probabilities for each sequence. The base pairing probabilities for a sequence are estimated by combining intrinsic information, derived from the sequence itself via the nearest neighbor thermodynamic model, with extrinsic information, derived from the other sequences in the input set. For a given sequence, the extrinsic information is computed by using pairwise-sequence-alignment-based probabilities for co-incidence with each of the other sequences, along with estimated base pairing probabilities, from the previous iteration, for the other sequences. The extrinsic information is introduced as free energy modifications for base pairing in a partition function computation based on the nearest neighbor thermodynamic model. This process yields updated estimates of base pairing probability. The updated base pairing probabilities in turn are used to recompute extrinsic information, resulting in the overall iterative estimation procedure that defines TurboFold. TurboFold is benchmarked on a number of ncRNA datasets and compared against alternative secondary structure prediction methods. The iterative procedure in TurboFold is shown to improve estimates of base pairing probability with each iteration, though only small gains are obtained beyond three iterations. Secondary structures composed of base pairs with estimated probabilities higher than a significance threshold are shown to be more accurate for TurboFold than for alternative methods that estimate base pairing probabilities. TurboFold-MEA, which uses base pairing probabilities from TurboFold in a maximum expected accuracy algorithm for secondary structure prediction, has accuracy comparable to the best performing secondary structure prediction methods. The computational and memory requirements for TurboFold are modest and, in terms of sequence length and number of sequences, scale much more favorably than joint alignment and folding algorithms.

Conclusions

TurboFold is an iterative probabilistic method for predicting secondary structures for multiple RNA sequences that efficiently and accurately combines the information from the comparative analysis between sequences with the thermodynamic folding model. Unlike most other multi-sequence structure prediction methods, TurboFold does not enforce strict commonality of structures and is therefore useful for predicting structures for homologous sequences that have diverged significantly. TurboFold can be downloaded as part of the RNAstructure package at http://rna.urmc.rochester.edu.