Similar articles
 20 similar articles found
1.
We describe a procedure for model averaging of relaxed molecular clock models in Bayesian phylogenetics. Our approach allows us to model the distribution of rates of substitution across branches, averaged over a set of models, rather than conditioned on a single model. We implement this procedure and test it on simulated data to show that our method can accurately recover the true underlying distribution of rates. We applied the method to a set of alignments taken from a data set of 12 mammalian species and uncovered evidence that lognormally distributed rates better describe this data set than do exponentially distributed rates. Additionally, our implementation of model averaging permits accurate calculation of the Bayes factor(s) between two or more relaxed molecular clock models. Finally, we introduce a new computational approach for sampling rates of substitution across branches that improves the convergence of our Markov chain Monte Carlo algorithms in this context. Our methods are implemented under the BEAST 1.6 software package, available at http://beast-mcmc.googlecode.com.

2.
Likelihood methods for detecting temporal shifts in diversification rates   Total citations: 8, self-citations: 0, citations by others: 8
Maximum likelihood is a potentially powerful approach for investigating the tempo of diversification using molecular phylogenetic data. Likelihood methods distinguish between rate-constant and rate-variable models of diversification by fitting birth-death models to phylogenetic data. Because model selection in this context is a test of the null hypothesis that diversification rates have been constant over time, strategies for selecting best-fit models must minimize Type I error rates while retaining power to detect rate variation when it is present. Here I examine model selection, parameter estimation, and power to reject the null hypothesis using likelihood models based on the birth-death process. The Akaike information criterion (AIC) has often been used to select among diversification models; however, I find that selecting models based on the lowest AIC score leads to a dramatic inflation of the Type I error rate. When appropriately corrected to reduce Type I error rates, the birth-death likelihood approach performs as well as or better than the widely used gamma statistic, at least when diversification rates have shifted abruptly over time. Analyses of datasets simulated under a range of rate-variable diversification scenarios indicate that the birth-death likelihood method has much greater power to detect variation in diversification rates when extinction is present. Furthermore, this method appears to be the only approach available that can distinguish between a temporal increase in diversification rates and a rate-constant model with nonzero extinction. I illustrate use of the method by analyzing a published phylogeny for Australian agamid lizards.
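
The AIC comparison that this abstract cautions about can be sketched roughly as follows. The function uses the standard definition AIC = 2k - 2 ln L; the model names, log-likelihoods, and parameter counts are hypothetical values chosen for illustration and are not taken from the paper.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fitted models (illustrative numbers only, not from the paper).
models = {
    "rate_constant": {"lnL": -52.3, "k": 2},
    "rate_variable": {"lnL": -50.9, "k": 4},
}

scores = {name: aic(m["lnL"], m["k"]) for name, m in models.items()}

# Positive delta favours the rate-variable model, negative the rate-constant one.
delta_aic = scores["rate_constant"] - scores["rate_variable"]
```

According to the abstract, simply choosing the lowest score inflates the Type I error rate, so in practice `delta_aic` would presumably be compared against a threshold calibrated by simulation under the rate-constant null rather than against zero.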

3.
For populations having dispersal described by fat-tailed kernels (kernels with tails that are not exponentially bounded), asymptotic population spread rates cannot be estimated by traditional models because these models predict continually accelerating (asymptotically infinite) invasion. The impossible predictions come from the fact that the fat-tailed kernels fitted to dispersal data have a quality (nondiscrete individuals and, thus, no moment-generating function) that never applies to data. Real organisms produce finite (and random) numbers of offspring; thus, an empirical moment-generating function can always be determined. Using an alternative method to estimate spread rates in terms of extreme dispersal events, we show that finite estimates can be derived for fat-tailed kernels, and we demonstrate how variable reproduction modifies these rates. Whereas the traditional models define spread rate as the speed of an advancing front describing the expected density of individuals, our alternative definition for spread rate is the expected velocity for the location of the furthest-forward individual in the population. The asymptotic wave speed for a constant net reproductive rate $R_0$ is approximated as $(1/T)(\pi u R_0/2)^{1/2}$ m yr$^{-1}$, where $T$ is generation time, and $u$ is a distance parameter (m$^2$) of Clark et al.'s 2Dt model having shape parameter $p = 1$. From fitted dispersal kernels with fat tails and infinite variance, we derive finite rates of spread and a simple method for numerical estimation. Fitted kernels, with infinite variance, yield distributions of rates of spread that are asymptotically normal and, thus, have finite moments. Variable reproduction can profoundly affect rates of spread. By incorporating the variance in reproduction that results from variable life span, we estimate much lower rates than predicted by the standard approach, which assumes a constant net reproductive rate. Using basic life-history data for trees, we show these estimated rates to be lower than expected from previous analytical models and as interpreted from paleorecords of forest spread at the end of the Pleistocene. Our results suggest reexamination of past rates of spread and the potential for future response to climate change.
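
The wave-speed approximation quoted in this abstract, reading the garbled `R` as the net reproductive rate R0, can be evaluated with a few lines of code. This is only a sketch of that one formula; the function name and the example parameter values are assumptions made for illustration.

```python
import math


def asymptotic_spread_rate(generation_time_yr: float,
                           u_m2: float,
                           r0: float) -> float:
    """Approximate wave speed (m/yr) as (1/T) * sqrt(pi * u * R0 / 2).

    generation_time_yr: generation time T in years
    u_m2: distance parameter u of the 2Dt kernel (m^2), shape p = 1
    r0: constant net reproductive rate R0
    """
    return math.sqrt(math.pi * u_m2 * r0 / 2.0) / generation_time_yr


# Hypothetical tree-like values, chosen only to exercise the formula.
speed = asymptotic_spread_rate(generation_time_yr=50.0, u_m2=100.0, r0=1000.0)
```

Because the speed scales with the square root of R0, quadrupling the reproductive rate only doubles the predicted spread rate.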

4.
Existing methods for identity by descent (IBD) segment detection were designed for SNP array data, not sequence data. Sequence data have a much higher density of genetic variants and a different allele frequency distribution, and can have higher genotype error rates. Consequently, best practices for IBD detection in SNP array data do not necessarily carry over to sequence data. We present a method, IBDseq, for detecting IBD segments in sequence data and a method, SEQERR, for estimating genotype error rates at low-frequency variants by using detected IBD. The IBDseq method estimates probabilities of genotypes observed with error for each pair of individuals under IBD and non-IBD models. The ratio of estimated probabilities under the two models gives a LOD score for IBD. We evaluate several IBD detection methods that are fast enough for application to sequence data (IBDseq, Beagle Refined IBD, PLINK, and GERMLINE) under multiple parameter settings, and we show that IBDseq achieves high power and accuracy for IBD detection in sequence data. The SEQERR method estimates genotype error rates by comparing observed and expected rates of pairs of homozygote and heterozygote genotypes at low-frequency variants in IBD segments. We demonstrate the accuracy of SEQERR in simulated data, and we apply the method to estimate genotype error rates in sequence data from the UK10K and 1000 Genomes projects.

5.
Evolutionary timescales can be estimated from genetic data using phylogenetic methods based on the molecular clock. To account for molecular rate variation among lineages, a number of relaxed-clock models have been developed. Some of these models assume that rates vary among lineages in an autocorrelated manner, so that closely related species share similar rates. In contrast, uncorrelated relaxed clocks allow all of the branch-specific rates to be drawn from a single distribution, without assuming any correlation between rates along neighbouring branches. There is uncertainty about which of these two classes of relaxed-clock models is more appropriate for biological data. We present an R package, NELSI, that allows the evolution of DNA sequences to be simulated according to a range of clock models. Using data generated by this package, we assessed the ability of two Bayesian phylogenetic methods to distinguish among different relaxed-clock models and to quantify rate variation among lineages. The results of our analyses show that rate autocorrelation is typically difficult to detect, even when there is complete taxon sampling. This provides a potential explanation for past failures to detect rate autocorrelation in a range of data sets.

6.
The potency of antiretroviral agents in AIDS clinical trials can be assessed on the basis of an early viral response such as viral decay rate or change in viral load (number of copies of HIV RNA) of the plasma. Linear, parametric nonlinear, and semiparametric nonlinear mixed-effects models have been proposed to estimate viral decay rates in viral dynamic models. However, before applying these models to clinical data, a critical question that remains to be addressed is whether these models produce coherent estimates of viral decay rates, and if not, which model is appropriate and should be used in practice. In this paper, we applied these models to data from an AIDS clinical trial of potent antiviral treatments and found significant incongruity in the estimated rates of reduction in viral load. Simulation studies indicated that reliable estimates of viral decay rate were obtained by using the parametric and semiparametric nonlinear mixed-effects models. Our analysis also indicated that the decay rates estimated by using linear mixed-effects models should be interpreted differently from those estimated by using nonlinear mixed-effects models. The semiparametric nonlinear mixed-effects model is preferred to other models because arbitrary data truncation is not needed. Based on real data analysis and simulation studies, we provide guidelines for estimating viral decay rates from clinical data. (© 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

7.
Coincidence in high-speed flow cytometry: models and measurements   Total citations: 2, self-citations: 0, citations by others: 2
In flow cytometry, the coincident arrival of particles becomes a major problem when high sample rates are required. For the development of our high-speed photodamage flow cytometer (ZAPPER), it was of importance to understand the behavior of cells at flow rates of around 50,000-250,000 events/s. We developed and compared two models that describe the relation between the real cell rate and the detectable single cell rate. Both the Computer Simulation model and the Input/Output Device model show distinct optima for the cell rate. The models were compared to measurements performed on the ZAPPER-prototype. Fits of the two models to the experimental data were excellent for cycle times of 4 and 15 microseconds and acceptable for a 2-microsecond cycle time. A third model (Mercer WB, Rev. Sci. Instr. 37:1515-1521,1966) could be fitted to the experimental data, after the proportionality constant k was adapted to the experimental data. At a yield of detectable single cells of 70%, the maximum cell rates are 180,000, 100,000, and 40,000 cells/s for cycle times of 2, 4, and 15 microseconds, respectively. Based on these results we can now select an optimal cell rate for analysis and sorting based on criteria such as accepted cell loss. In addition, the advantages of reducing the cycle time can now be evaluated with respect to the costs of that modification.
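
To give a feel for why the detectable single-cell rate has an optimum, the sketch below uses a textbook Poisson-arrival assumption: a cell counts as a detectable single only if no other cell arrives within one cycle time. This is not one of the paper's models (the Computer Simulation, Input/Output Device, or Mercer models), and all function names are assumptions for illustration.

```python
import math


def single_cell_yield(true_rate: float, cycle_time: float) -> float:
    """Fraction of cells detected as singles under Poisson arrivals."""
    return math.exp(-true_rate * cycle_time)


def detectable_single_rate(true_rate: float, cycle_time: float) -> float:
    """Detectable single-cell rate (cells/s); peaks at true_rate = 1 / cycle_time."""
    return true_rate * single_cell_yield(true_rate, cycle_time)


def rate_for_yield(target_yield: float, cycle_time: float) -> float:
    """True cell rate (cells/s) at which the single-cell yield equals target_yield."""
    return -math.log(target_yield) / cycle_time


# Cell rates giving a 70% yield for the cycle times mentioned in the abstract.
rates_at_70_percent = {
    tau_us: rate_for_yield(0.70, tau_us * 1e-6) for tau_us in (2, 4, 15)
}
```

Under this simplified assumption the optimum sits at one cell per cycle time; the measured maxima reported in the abstract come from the authors' own models and need not match these values.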

8.
Mitochondrial DNA data have been used extensively to study evolution and early human origins. These applications require estimates of the rate at which nucleotide substitutions occur in the DNA sequence. We consider the problem of estimating substitution rates in the presence of site-to-site rate variation. A coalescent model is presented that allows for different substitution rates for purines and pyrimidines, as well as more detailed models that allow fast and slow rates within each of the purine and pyrimidine classes. A method for estimating such rates is presented. Even for these simple models of site heterogeneity, there are, typically, insufficient data to obtain reliable estimates of site-specific substitution rates. However, estimates of the average rate across all sites appear to be relatively stable even in the presence of site heterogeneity. Simulations of models with site-to-site variation in mutation rate show that hypervariable sites can produce peaks in the pairwise difference curves that have previously been attributed to population dynamics.

9.
Cutting-edge biomedical research programs have entered an era in which phenotypic characterizations for genetically altered rodents can facilitate appropriate care. The veterinary care requirements necessary to support such animal models can include the procedures already adapted as standard practice in companion animal hospitals, and can decrease data variability while increasing survival rates.

10.
Standard models for senescence predict an increase in the additive genetic variance for log mortality rate late in the life cycle. Variance component analysis of age-specific mortality rates of related cohorts is problematic. The actual mortality rates are not observable and can be estimated only crudely at early ages when few individuals are dying and at late ages when most are dead. Therefore, standard quantitative genetic analysis techniques cannot be applied with confidence. We present a novel and rigorous analysis that treats the mortality rates as missing data following two different parametric senescence models. Two recent studies of Drosophila melanogaster, the original analyses of which reached different conclusions, are reanalyzed here. The two-parameter Gompertz model assumes that mortality rates increase exponentially with age. A related but more complex three-parameter logistic model allows for subsequent leveling off in mortality rates at late ages. We find that while additive variance for mortality rates increases for late ages under the Gompertz model, it declines under the logistic model. The results from the two studies are similar, with differences attributable to differences between the experiments.
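
The two senescence models named in this abstract can be compared with a short sketch. The Gompertz hazard is a * exp(b * x); for the logistic model the code uses one common three-parameter form that levels off at b / s, which is an assumption here because the abstract does not give the paper's parameterization. Parameter values are hypothetical.

```python
import math


def gompertz_mortality(age: float, a: float, b: float) -> float:
    """Two-parameter Gompertz hazard: mortality rises exponentially with age."""
    return a * math.exp(b * age)


def logistic_mortality(age: float, a: float, b: float, s: float) -> float:
    """Three-parameter logistic hazard (assumed form).

    Reduces to Gompertz when s = 0 and plateaus at b / s at late ages.
    """
    growth = math.exp(b * age)
    return a * growth / (1.0 + (a * s / b) * (growth - 1.0))


# Hypothetical parameters, chosen only to show the late-age divergence.
a, b, s = 0.005, 0.1, 0.5
ages = [0, 20, 40, 60, 80]
gompertz_curve = [gompertz_mortality(x, a, b) for x in ages]
logistic_curve = [logistic_mortality(x, a, b, s) for x in ages]
```

The two curves nearly coincide at early ages and separate only where few individuals remain alive, which is consistent with the abstract's remark that late-age mortality can be estimated only crudely.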

11.
The choice of a probabilistic model to describe sequence evolution can and should be justified. Underfitting the data through the use of overly simplistic models may miss out on interesting phenomena and lead to incorrect inferences. Overfitting the data with models that are too complex may ascribe biological meaning to statistical artifacts and result in falsely significant findings. We describe a likelihood-based approach for evolutionary model selection. The procedure employs a genetic algorithm (GA) to quickly explore a combinatorially large set of all possible time-reversible Markov models with a fixed number of substitution rates. When applied to stem RNA data subject to well-understood evolutionary forces, the models found by the GA 1) capture the expected overall rate patterns a priori; 2) fit the data better than the best available models based on a priori assumptions, suggesting subtle substitution patterns not previously recognized; 3) cannot be rejected in favor of the general reversible model, implying that the evolution of stem RNA sequences can be explained well with only a few substitution rate parameters; and 4) perform well on simulated data, both in terms of goodness of fit and the ability to estimate evolutionary rates. We also investigate the utility of several distance measures for comparing and contrasting inferred evolutionary models. Using widely available small computer clusters, our approach makes it possible, for the first time, to evaluate the performance of existing RNA evolutionary models by comparing them with a large pool of candidate models and to validate common modeling assumptions. In addition, the new method provides the foundation for rigorous selection and comparison of substitution models for other types of sequence data.

12.
Estimation of division and death rates of lymphocytes in different conditions is vital for quantitative understanding of the immune system. Deuterium, in the form of deuterated glucose or heavy water, can be used to measure rates of proliferation and death of lymphocytes in vivo. Inferring these rates from labeling and delabeling curves has been subject to considerable debate, with different groups suggesting different mathematical models for that purpose. We show that the three most common models, which are based on quite different biological assumptions, actually predict mathematically identical labeling curves with one parameter for the exponential up and down slope, and one parameter defining the maximum labeling level. By extending these previous models, we here propose a novel approach for the analysis of data from deuterium labeling experiments. We construct a model of "kinetic heterogeneity" in which the total cell population consists of many sub-populations with different rates of cell turnover. In this model, for a given distribution of the rates of turnover, the predicted fraction of labeled DNA accumulated and lost can be calculated. Our model reproduces several previously made experimental observations, such as a negative correlation between the length of the labeling period and the rate at which labeled DNA is lost after label cessation. We demonstrate the reliability of the new explicit kinetic heterogeneity model by applying it to artificially generated datasets, and illustrate its usefulness by fitting experimental data. In contrast to previous models, the explicit kinetic heterogeneity model 1) provides a novel way of interpreting labeling data; 2) allows for a non-exponential loss of labeled cells during delabeling; and 3) can be used to describe data with variable labeling length.
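
The two-parameter labeling curve that, according to this abstract, the three common models share can be sketched as below: one rate governs both the exponential rise during labeling and the exponential loss afterwards, and a second parameter sets the maximum labeling level. The function and parameter names are assumptions for illustration, and this is the simple shared curve, not the authors' kinetic heterogeneity model.

```python
import math


def labeled_fraction(t: float, rate: float, max_label: float, t_stop: float) -> float:
    """Fraction of labeled DNA at time t (same time units as 1 / rate).

    Labeling runs from t = 0 to t = t_stop, then the label is withdrawn.
    """
    if t <= t_stop:
        # Up-slope: saturating approach to the maximum labeling level.
        return max_label * (1.0 - math.exp(-rate * t))
    # Down-slope: exponential loss from the level reached at t_stop.
    level_at_stop = max_label * (1.0 - math.exp(-rate * t_stop))
    return level_at_stop * math.exp(-rate * (t - t_stop))


# Hypothetical experiment: 7 days of labeling followed by 21 days of delabeling.
curve = [labeled_fraction(day, rate=0.05, max_label=0.4, t_stop=7.0)
         for day in range(0, 29)]
```

Because the loss here is strictly exponential with the same rate as the rise, this curve cannot reproduce the non-exponential delabeling that the abstract says the explicit kinetic heterogeneity model allows.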

13.
A typical task in the application of aggregated Markov models to ion channel data is the estimation of the transition rates between the states. Realistic models for ion channel data often have one or more loops. We show that the transition rates of a model with loops are not identifiable if the model has either equal open or closed dwell times. This non-identifiability of the transition rates also has an effect on the estimation of the transition rates for models which are not subject to the constraint of either equal open or closed dwell times. If a model with loops has nearly equal dwell times, the Hessian matrix of its likelihood function will be ill-conditioned and the standard deviations of the estimated transition rates become extraordinarily large for a number of data points which are typically recorded in experiments.

14.
Quantifying kill rates and sources of variation in kill rates remains an important challenge in linking predators to their prey. We address current approaches to using global positioning system (GPS)-based movement data for quantifying key predation components of large carnivores. We review approaches to identify kill sites from GPS movement data as a means to estimate kill rates and address advantages of using GPS-based data over past approaches. Despite considerable progress, modelling the probability that a cluster of GPS points is a kill site is no substitute for field visits, but can guide our field efforts. Once kill sites are identified, time spent at a kill site (handling time) and time between kills (killing time) can be determined. We show how statistical models can be used to investigate the influence of factors such as animal characteristics (e.g. age, sex, group size) and landscape features on either handling time or killing efficiency. If we know the prey densities along paths to a kill, we can quantify the 'attack success' parameter in functional response models directly. Problems remain in incorporating the behavioural complexity derived from GPS movement paths into functional response models, particularly in multi-prey systems, but we believe that exploring the details of GPS movement data has put us on the right path.

15.
The covarion hypothesis of molecular evolution proposes that selective pressures on an amino acid or nucleotide site change through time, thus causing changes of evolutionary rate along the edges of a phylogenetic tree. Several kinds of Markov models for the covarion process have been proposed. One model, proposed by Huelsenbeck (2002), has 2 substitution rate classes: the substitution process at a site can switch between a single variable rate, drawn from a discrete gamma distribution, and a zero invariable rate. A second model, suggested by Galtier (2001), assumes rate switches among an arbitrary number of rate classes but switching to and from the invariable rate class is not allowed. The latter model allows for some sites that do not participate in the rate-switching process. Here we propose a general covarion model that combines features of both models, allowing evolutionary rates not only to switch between variable and invariable classes but also to switch among different rates when they are in a variable state. We have implemented all 3 covarion models in a maximum likelihood framework for amino acid sequences and tested them on 23 protein data sets. We found significant likelihood increases for all data sets for the 3 models, compared with a model that does not allow site-specific rate switches along the tree. Furthermore, we found that the general model fit the data better than the simpler covarion models in the majority of the cases, highlighting the complexity in modeling the covarion process. The general covarion model can be used for comparing tree topologies, molecular dating studies, and the investigation of protein adaptation.

16.
Current statistical methods for estimating nest survival rates assume that nests are identical in their propensity to succeed. However, there are several biological reasons to question this assumption. For example, experience of the nest builder, number of nest helpers, genetic fitness of individuals, and site effects may contribute to an inherent disparity between nests with respect to their daily mortality rates. Ignoring such heterogeneity can lead to incorrect survival estimates. Our results show that constant survival models can seriously underestimate overall survival in the presence of heterogeneity. This paper presents a flexible random-effects approach to model heterogeneous nest survival data. We illustrate our methods through data on redwing blackbirds.

17.
We consider three approaches for estimating the rates of nonsynonymous and synonymous changes at each site in a sequence alignment in order to identify sites under positive or negative selection: (1) a suite of fast likelihood-based "counting methods" that employ either a single most likely ancestral reconstruction, weighting across all possible ancestral reconstructions, or sampling from ancestral reconstructions; (2) a random effects likelihood (REL) approach, which models variation in nonsynonymous and synonymous rates across sites according to a predefined distribution, with the selection pressure at an individual site inferred using an empirical Bayes approach; and (3) a fixed effects likelihood (FEL) method that directly estimates nonsynonymous and synonymous substitution rates at each site. All three methods incorporate flexible models of nucleotide substitution bias and variation in both nonsynonymous and synonymous substitution rates across sites, facilitating the comparison between the methods. We demonstrate that the results obtained using these approaches show broad agreement in levels of Type I and Type II error and in estimates of substitution rates. Counting methods are well suited for large alignments, for which there is high power to detect positive and negative selection, but appear to underestimate the substitution rate. A REL approach, which is more computationally intensive than counting methods, has higher power than counting methods to detect selection in data sets of intermediate size but may suffer from higher rates of false positives for small data sets. A FEL approach appears to capture the pattern of rate variation better than counting methods or random effects models, does not suffer from as many false positives as random effects models for data sets comprising few sequences, and can be efficiently parallelized. Our results suggest that previously reported differences between results obtained by counting methods and random effects models arise due to a combination of the conservative nature of counting-based methods, the failure of current random effects models to allow for variation in synonymous substitution rates, and the naive application of random effects models to extremely sparse data sets. We demonstrate our methods on sequence data from the human immunodeficiency virus type 1 env and pol genes and simulated alignments.

18.
Our understanding of the principles underlying the protein-folding problem can be tested by developing and characterizing simple models that make predictions which can be compared to experimental data. Here we extend our earlier model of folding free energy landscapes, in which each residue is considered to be either folded as in the native state or completely disordered, by investigating the role of additional factors representing hydrogen bonding and backbone torsion strain, and by using a hybrid between the master equation approach and the simple transition state theory to evaluate kinetics near the free energy barrier in greater detail. Model calculations of folding phi-values are compared to experimental data for 19 proteins, and for more than half of these, experimental data are reproduced with correlation coefficients between r=0.41 and 0.88; calculations of transition state free energy barriers correlate with rates measured for 37 single domain proteins (r=0.69). The model provides insight into the contribution of alternative-folding pathways, the validity of quasi-equilibrium treatments of the folding landscape, and the magnitude of the Arrhenius prefactor for protein folding. Finally, we discuss the limitations of simple native-state-based models, and as a more general test of such models, provide predictions of folding rates and mechanisms for a comprehensive set of over 400 small protein domains of known structure.

19.
We propose models for describing replacement rate variation in genes and proteins, in which the profile of relative replacement rates along the length of a given sequence is defined as a function of the site number. We consider here two types of functions, one derived from the cosine Fourier series, and the other from discrete wavelet transforms. The number of parameters used for characterizing the substitution rates along the sequences can be flexibly changed and in their most parameter-rich versions, both Fourier and wavelet models become equivalent to the unrestricted-rates model, in which each site of a sequence alignment evolves at a unique rate. When applied to a few real data sets, the new models appeared to fit data better than the discrete gamma model when compared with the Akaike information criterion and the likelihood-ratio test, although the parametric bootstrap version of the Cox test performed for one of the data sets indicated that the difference in likelihoods between the two models is not significant. The new models are applicable to testing biological hypotheses such as the statistical identity of rate variation profiles among homologous protein families. These models are also useful for determining regions in genes and proteins that evolve significantly faster or slower than the sequence average. We illustrate the application of the new method by analyzing human immunoglobulin and Drosophilid alcohol dehydrogenase sequences.

20.
Growth models have been developed using data from recent greenhouse studies on the effects of nitrogen, phosphorus and temperature on growth rates of Salvinia molesta Mitchell. Linear transformations were used for estimating Michaelis-Menten kinetic constants and a significant fit of the data was obtained. This has been confirmed by a comparison with field data. The growth models can be used for predicting the potential for Salvinia to remove nutrients from natural and polluted waters. Values thus obtained have a direct bearing on the design and costing of waste-water treatment ponds using Salvinia. Also, by predicting growth rates at particular temperatures and nutrient levels, the likelihood of successful control by biological agents can be assessed.
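
Estimating Michaelis-Menten constants through a linear transformation, as mentioned in this abstract, might look like the sketch below. It uses the Lineweaver-Burk (double-reciprocal) form, 1/v = (Km/Vmax)(1/S) + 1/Vmax, as one possible transformation; the abstract does not say which one the authors used, and the nutrient concentrations and constants here are hypothetical.

```python
def michaelis_menten(s: float, vmax: float, km: float) -> float:
    """Growth rate at nutrient concentration s."""
    return vmax * s / (km + s)


def lineweaver_burk_fit(concentrations, rates):
    """Estimate (vmax, km) by ordinary least squares on 1/v versus 1/S."""
    x = [1.0 / s for s in concentrations]
    y = [1.0 / v for v in rates]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx                  # equals km / vmax
    intercept = mean_y - slope * mean_x  # equals 1 / vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return vmax, km


# Hypothetical noise-free data generated from known constants.
true_vmax, true_km = 0.2, 0.5
conc = [0.1, 0.25, 0.5, 1.0, 2.0, 5.0]
obs = [michaelis_menten(s, true_vmax, true_km) for s in conc]
est_vmax, est_km = lineweaver_burk_fit(conc, obs)
```

With real, noisy measurements the double-reciprocal transform gives disproportionate weight to low-concentration points, so a direct nonlinear fit is often used as a check.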
