Similar articles
1.
Inferring speciation times under an episodic molecular clock (cited 5 times: 0 self-citations, 5 citations by others)
We extend our recently developed Markov chain Monte Carlo algorithm for Bayesian estimation of species divergence times to allow variable evolutionary rates among lineages. The method can use heterogeneous data from multiple gene loci and accommodate multiple fossil calibrations. Uncertainties in fossil calibrations are described using flexible statistical distributions. The prior for divergence times for nodes lacking fossil calibrations is specified by use of a birth-death process with species sampling. The prior for lineage-specific substitution rates is specified using either a model with autocorrelated rates among adjacent lineages (based on a geometric Brownian motion model of rate drift) or a model with independent rates among lineages specified by a log-normal probability distribution. We develop an infinite-sites theory, which predicts that when the amount of sequence data approaches infinity, the width of the posterior credibility interval and the posterior mean of divergence times form a perfect linear relationship, with the slope indicating uncertainties in time estimates that cannot be reduced by sequence data alone. Simulations are used to study the influence of among-lineage rate variation and the number of loci sampled on the uncertainty of divergence time estimates. The analysis suggests that posterior time estimates typically involve considerable uncertainties even with an infinite amount of sequence data, and that the reliability and precision of fossil calibrations are critically important to divergence time estimation. We apply our new algorithms to two empirical data sets and compare the results with those obtained in previous Bayesian and likelihood analyses. The results demonstrate the utility of our new algorithms.

2.
Several methods have been designed to infer species trees from gene trees while taking into account gene tree/species tree discordance. Although some of these methods provide consistent species tree topology estimates under a standard model, most either do not estimate branch lengths or are computationally slow. An exception, the GLASS method of Mossel and Roch, is consistent for the species tree topology, estimates branch lengths, and is computationally fast. However, GLASS systematically overestimates divergence times, leading to biased estimates of species tree branch lengths. By assuming a multispecies coalescent model in which multiple lineages are sampled from each of two taxa at L independent loci, we derive the distribution of the waiting time until the first interspecific coalescence occurs between the two taxa, considering all loci and measuring from the divergence time. We then use the mean of this distribution to derive a correction to the GLASS estimator of pairwise divergence times. We show that our improved estimator, which we call iGLASS, consistently estimates the divergence time between a pair of taxa as the number of loci approaches infinity, and that it is an unbiased estimator of divergence times when one lineage is sampled per taxon. We also show that many commonly used clustering methods can be combined with the iGLASS estimator of pairwise divergence times to produce a consistent estimator of the species tree topology. Through simulations, we show that iGLASS can greatly reduce the bias and mean squared error in obtaining estimates of divergence times in a species tree.
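
The idea behind the correction can be sketched for the simplest case only, one lineage sampled per taxon. This is an illustrative sketch, not the estimator as published: the function names and the `mean_coal` parameter are hypothetical, and times are assumed to be measured in units where the expected coalescence time of two lineages in the ancestral population equals `mean_coal`. Under that assumption the excess coalescence time at each locus is exponential, so the minimum over L independent loci exceeds the divergence time by `mean_coal / L` on average, and subtracting that amount removes the bias.

```python
from typing import Sequence


def glass_estimate(min_interspecific_times: Sequence[float]) -> float:
    """GLASS-style estimate: the smallest interspecific coalescence time over loci."""
    if not min_interspecific_times:
        raise ValueError("need at least one locus")
    return min(min_interspecific_times)


def corrected_estimate_one_lineage(
    min_interspecific_times: Sequence[float], mean_coal: float = 1.0
) -> float:
    """Bias-corrected estimate for ONE lineage per taxon (illustrative assumption).

    Each locus coalesces at (divergence time + exponential excess with mean
    `mean_coal`). The minimum of L independent exponentials has mean
    mean_coal / L, which is subtracted from the raw minimum.
    """
    n_loci = len(min_interspecific_times)
    return glass_estimate(min_interspecific_times) - mean_coal / n_loci


times = [3.2, 2.5, 4.0]  # made-up per-locus coalescence times
print(glass_estimate(times))
print(corrected_estimate_one_lineage(times))
```

With several lineages per taxon the waiting-time distribution is more involved, and the correction described in the abstract would have to be taken from the paper itself.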

3.
Despite recent technological advances in DNA sequencing, incomplete coverage remains an issue in population genomics, in particular for studies that include ancient samples. Here, we describe an approach to estimate population divergence times for non-overlapping sequence data that is based on probabilities of different genealogical topologies under a structured coalescent model. We show that the approach can be adapted to accommodate common problems such as sequencing errors and postmortem nucleotide misincorporations, and we use simulations to investigate biases involved with estimating genealogical topologies from empirical data. The approach relies on three reference genomes and should be particularly useful for future analysis of genomic data that comprise nonoverlapping sets of sequences, potentially from different points in time. We applied the method to shotgun sequence data from an ancient wolf together with extant dogs and wolves and found striking resemblance to previously described fine-scale population structure among dog breeds. When comparing modern dogs to four geographically distinct wolves, we find that the divergence time between dogs and an Indian wolf is smallest, followed by the divergence times to a Chinese wolf and a Spanish wolf, and a relatively long divergence time to an Alaskan wolf, suggesting that the origin of modern dogs is somewhere in Eurasia, potentially southern Asia. We find that less than two-thirds of all loci in the boxer and poodle genomes are more similar to each other than to a modern gray wolf and that, assuming complete isolation without gene flow, the divergence time between gray wolves and modern European dogs extends to 3,500 generations before the present, corresponding to approximately 10,000 years ago (95% confidence interval [CI]: 9,000-13,000).
We explicitly study the effect of gene flow between dogs and wolves on our estimates and show that a low rate of gene flow is compatible with an even earlier domestication date of approximately 30,000 years ago (95% CI: 15,000-90,000). This observation is in agreement with recent archaeological findings and indicates that human behavior necessary for domestication of wild animals could have appeared much earlier than the development of agriculture.

4.
Molecular methods as applied to the biogeography of single species (phylogeography) or multiple codistributed species (comparative phylogeography) have been productively and extensively used to elucidate common historical features in the diversification of the Earth's biota. However, only recently have methods for estimating population divergence times or their confidence limits while taking into account the critical effects of genetic polymorphism in ancestral species become available, and earlier methods for doing so are underutilized. We review models that address the crucial distinction between the gene divergence, the parameter that is typically recovered in molecular phylogeographic studies, and the population divergence, which is in most cases the parameter of interest and will almost always postdate the gene divergence. Assuming that population sizes of ancestral species are distributed similarly to those of extant species, we show that phylogeographic studies in vertebrates suggest that divergence of alleles in ancestral species can comprise from less than 10% to over 50% of the total divergence between sister species, suggesting that the problem of ancestral polymorphism in dating population divergence can be substantial. The variance in the number of substitutions (among loci for a given species or among species for a given gene) resulting from the stochastic nature of DNA change is generally smaller than the variance due to substitutions along allelic lines whose coalescence times vary due to genetic drift in the ancestral population. Whereas the former variance can be reduced by further DNA sequencing at a single locus, the latter cannot. 
Contrary to phylogeographic intuition, dating population divergence times when allelic lines have achieved reciprocal monophyly is in some ways more challenging than when allelic lines have not achieved monophyly, because in the former case critical data on ancestral population size provided by residual ancestral polymorphism is lost. In the former case differences in coalescence time between species pairs can in principle be explained entirely by differences in ancestral population size without resorting to explanations involving differences in divergence time. Furthermore, the confidence limits on population divergence times are severely underestimated when those for number of substitutions per site in the DNA sequences examined are used as a proxy. This uncertainty highlights the importance of multilocus data in estimating population divergence times; multilocus data can in principle distinguish differences in coalescence time (T) resulting from differences in population divergence time and differences in T due to differences in ancestral population sizes and will reduce the confidence limits on the estimates. We analyze the contribution of ancestral population size (theta) to T and the effect of uncertainty in theta on estimates of population divergence (tau) for single loci under reciprocal monophyly using a simple Bayesian extension of Takahata and Satta's and Yang's recent coalescent methods. The confidence limits on tau decrease when the range over which ancestral population size theta is assumed to be distributed decreases and when tau increases; they generally exclude zero when tau/(4Ne) > 1. We also apply a maximum-likelihood method to several single and multilocus data sets. With multilocus data, the criterion for excluding tau = 0 is roughly that l tau/(4Ne) > 1, where l is the number of loci. 
Our analyses corroborate recent suggestions that increasing the number of loci is critical to decreasing the uncertainty in estimates of population divergence time.

5.
Estimating divergence dates from molecular sequences (cited 25 times: 13 self-citations, 12 citations by others)
The ability to date the time of divergence between lineages using molecular data provides the opportunity to answer many important questions in evolutionary biology. However, molecular dating techniques have previously been criticized for failing to adequately account for variation in the rate of molecular evolution. We present a maximum-likelihood approach to estimating divergence times that deals explicitly with the problem of rate variation. This method has many advantages over previous approaches including the following: (1) a rate constancy test excludes data for which rate heterogeneity is detected; (2) date estimates are generated with confidence intervals that allow the explicit testing of hypotheses regarding divergence times; and (3) a range of sequences and fossil dates are used, removing the reliance on a single calculated calibration rate. We present tests of the accuracy of our method, which show it to be robust to the effects of some modes of rate variation. In addition, we test the effect of substitution model and length of sequence on the accuracy of the dating technique. We believe that the method presented here offers solutions to many of the problems facing molecular dating and provides a platform for future improvements to such analyses.

6.
A method for detecting and quantifying the cooperativity in the simultaneous binding of two ligands, A and B, to DNA (intercooperativity; $\omega_{AB}$) is proposed. This involves the determination of an apparent affinity constant $K_{\mathrm{app}}$ for one of the ligands (A) in the limit of its null saturation ($\nu_A \to 0$), in the presence of the second one (B). A definition for this constant is given and an expression is derived corresponding to a simple model of competitive binding to an unbranched three-state homogeneous polar lattice with nearest-neighbor interactions (Markov chain). The ratio between the apparent and intrinsic affinity constants of one ligand in the maximum saturation limit of the other one becomes $\omega_{AB}^{2}$, and thus can be graphically obtained from $K_{\mathrm{app}}(A)$ vs. $\nu_B$ plots. All the frequencies that define the sequence distribution of ligands can be easily calculated by introducing some generalized statistical weights for the free lattice monomer in a standard sequence generating function procedure. A model of fluorescence quenching emission is obtained from such frequencies under the hypothesis of a short-range electron transfer mechanism of the deactivation; it confirms, as suggested by the binding model, an outstanding influence of the intercooperativity on the distribution.

7.
8.
Due to genetic variation in the ancestor of two populations or two species, the divergence time for DNA sequences from two populations is variable along the genome. Within genomic segments all bases will share the same divergence (because they share a most recent common ancestor) when no recombination event has occurred to split them apart. The size of these segments of constant divergence depends on the recombination rate, but also on the speciation time, the effective population size of the ancestral population, as well as demographic effects and selection. Thus, inference of these parameters may be possible if we can decode the divergence times along a genomic alignment. Here, we present a new hidden Markov model that infers the changing divergence (coalescence) times along the genome alignment using a coalescent framework, in order to estimate the speciation time, the recombination rate, and the ancestral effective population size. The model is efficient enough to allow inference on whole-genome data sets. We first investigate the power and consistency of the model with coalescent simulations and then apply it to the whole-genome sequences of the two orangutan sub-species, Bornean (P. p. pygmaeus) and Sumatran (P. p. abelii) orangutans from the Orangutan Genome Project. We estimate the speciation time between the two sub-species to be thousand years ago and the effective population size of the ancestral orangutan species to be , consistent with recent results based on smaller data sets. We also report a negative correlation between chromosome size and ancestral effective population size, which we interpret as a signature of recombination increasing the efficacy of selection.

9.
Scientists often need to test hypotheses and construct corresponding confidence intervals. In designing a study to test a particular null hypothesis, traditional methods lead to a sample size large enough to provide sufficient statistical power. In contrast, traditional methods based on constructing a confidence interval lead to a sample size likely to control the width of the interval. With either approach, a sample size so large as to waste resources or introduce ethical concerns is undesirable. This work was motivated by the concern that existing sample size methods often make it difficult for scientists to achieve their actual goals. We focus on situations which involve a fixed, unknown scalar parameter representing the true state of nature. The width of the confidence interval is defined as the difference between the (random) upper and lower bounds. An event width is said to occur if the observed confidence interval width is less than a fixed constant chosen a priori. An event validity is said to occur if the parameter of interest is contained between the observed upper and lower confidence interval bounds. An event rejection is said to occur if the confidence interval excludes the null value of the parameter. In our opinion, scientists often implicitly seek to have all three occur: width, validity, and rejection. New results illustrate that neglecting rejection or width (and less so validity) often provides a sample size with a low probability of the simultaneous occurrence of all three events. We recommend considering all three events simultaneously when choosing a criterion for determining a sample size. We provide new theoretical results for any scalar (mean) parameter in a general linear model with Gaussian errors and fixed predictors. Convenient computational forms are included, as well as numerical examples to illustrate our methods.
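
The three events defined above (width, validity, rejection) can be made concrete with a small Monte Carlo sketch. It is not the paper's computational form: it assumes the simplest special case, a single Gaussian mean with a t-type interval, and the function name, its parameters, and the requirement that the caller supply the critical value `crit` (for example, about 2.093 for a 95% interval with 19 degrees of freedom) are choices made here for illustration.

```python
import math
import random
import statistics


def three_event_probability(mu, sigma, null, n, max_width, crit,
                            n_sim=5000, seed=0):
    """Estimate P(width and validity and rejection) by simulation.

    mu, sigma : true mean and standard deviation of the Gaussian data
    null      : null-hypothesis value of the mean
    max_width : a priori bound on the confidence-interval width
    crit      : critical value of the interval, supplied by the caller
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        centre = statistics.fmean(sample)
        half = crit * statistics.stdev(sample) / math.sqrt(n)
        lo, hi = centre - half, centre + half
        narrow = (hi - lo) < max_width        # event "width"
        valid = lo <= mu <= hi                # event "validity"
        rejects = not (lo <= null <= hi)      # event "rejection"
        if narrow and valid and rejects:
            hits += 1
    return hits / n_sim


# Made-up design values: true mean 1, null value 0, 20 observations.
print(three_event_probability(mu=1.0, sigma=1.0, null=0.0, n=20,
                              max_width=1.0, crit=2.093))
```

Rerunning with a larger `n` shows how the joint probability responds to sample size, which is the quantity the abstract recommends using as the design criterion.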

10.
Environmental threats, such as habitat size reduction or environmental pollution, may not cause immediate extinction of a population but shorten the expected time to extinction. We develop a method to estimate the mean time to extinction for a density-dependent population with environmental fluctuation. We first derive a formula for a stochastic differential equation model (canonical model) of a population with logistic growth with environmental and demographic stochasticities. We then study an approximate maximum likelihood (AML) estimate of three parameters (intrinsic growth rate $r$, carrying capacity $K$, and environmental stochasticity $\sigma_e^2$) from a time series of population size. The AML estimate of $r$ has a significant bias, but by adopting the Monte Carlo method, we can remove the bias very effectively (bias-corrected estimate). We can also determine the confidence interval of the parameter based on the Monte Carlo method. If the length of the time series is moderately long (with 40-50 data points), parameter estimation with the Monte Carlo sampling bias correction has a relatively small variance. However, if the time series is short (less than or equal to 10 data points), the estimate has a large variance and is not reliable. If we know the intrinsic growth rate $r$, however, the estimates of $K$ and $\sigma_e^2$ and the mean extinction time $T$ are reliable even if only a short time series is available. We illustrate the method using data for a freshwater fish, Japanese crucian carp (Carassius auratus subsp.) in Lake Biwa, in which the growth rate and environmental noise of crucian carp are estimated using fishery records.

11.
The relationship between speciation times and the corresponding times of gene divergence is of interest in phylogenetic inference as a means of understanding the past evolutionary dynamics of populations and of estimating the timing of speciation events. It has long been recognized that gene divergence times might substantially pre-date speciation events. Although the distribution of the difference between these has previously been studied for the case of two populations, this distribution has not been explicitly computed for larger species phylogenies. Here we derive a simple method for computing this distribution for trees of arbitrary size. A two-stage procedure is proposed which (i) considers the probability distribution of the time from the speciation event at the root of the species tree to the gene coalescent time conditionally on the number of gene lineages available at the root; and (ii) calculates the probability mass function for the number of gene lineages at the root. This two-stage approach dramatically simplifies numerical analysis, because in the first step the conditional distribution does not depend on an underlying species tree, while in the second step the pattern of gene coalescence prior to the species tree root is irrelevant. In addition, the algorithm provides intuition concerning the properties of the distribution with respect to the various features of the underlying species tree. The methodology is complemented by developing probabilistic formulae and software, written in R. The method and software are tested on five-taxon species trees with varying levels of symmetry. The examples demonstrate that more symmetric species trees tend to have larger mean coalescent times and are more likely to have a unimodal gamma-like distribution with a long right tail, while asymmetric trees tend to have smaller mean coalescent times with an exponential-like distribution. 
In addition, species trees with longer branches generally have shorter mean coalescent times, with branches closest to the root of the tree being most influential.
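
The two-stage decomposition described in this abstract can be illustrated for the mean only. The sketch below is not the authors' R software: it assumes the standard Kingman coalescent in the root population, with time in coalescent units so that k lineages wait an exponential time with rate k(k-1)/2 before the next coalescence, and the probability mass function passed in is a made-up example rather than one computed from a species tree.

```python
def mean_time_to_mrca(n_lineages: int) -> float:
    """Stage (i): expected time from the root until n lineages coalesce.

    Sum of the expected waiting times 2 / (k (k - 1)) for k = n, ..., 2,
    which equals 2 (1 - 1/n) in coalescent units.
    """
    if n_lineages < 2:
        raise ValueError("at least two lineages must enter the root population")
    return sum(2.0 / (k * (k - 1)) for k in range(2, n_lineages + 1))


def mean_root_coalescence_time(pmf: dict) -> float:
    """Stage (ii): average stage (i) over the number of lineages at the root.

    `pmf` maps a lineage count to its probability; it is an input here and
    would in practice be computed from the species tree.
    """
    if abs(sum(pmf.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * mean_time_to_mrca(n) for n, p in pmf.items())


example_pmf = {2: 0.5, 3: 0.3, 4: 0.2}  # hypothetical values
print(mean_time_to_mrca(4))
print(mean_root_coalescence_time(example_pmf))
```

Because stage (i) never refers to the species tree and stage (ii) never refers to coalescence above the root, the two pieces can be computed separately, which is the simplification the abstract highlights.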

12.
We study the passage times of a translocating polymer of length $N$ in three dimensions, while it is pulled through a narrow pore with a constant force $F$ applied to one end of the polymer. At small to moderate forces, satisfying the condition $FN^{\nu}/k_BT \lesssim 1$, where $\nu \approx 0.588$ is the Flory exponent for the polymer, we find that $\tau_N$, the mean time the polymer takes to leave the pore, scales as $N^{2+\nu}$ independent of $F$, in agreement with our earlier result for $F = 0$. At strong forces, i.e., for $FN^{\nu}/k_BT > 1$, the behavior of the passage time crosses over to $\tau_N \sim N^{2}/F$. We show here that these behaviors stem from the polymer dynamics in the immediate vicinity of the pore, in particular the memory effects in the polymer chain tension imbalance across the pore.

13.
A pseudolikelihood method for analyzing interval censored data (cited 1 time: 0 self-citations, 1 citation by others)
We introduce a method based on a pseudolikelihood ratio for estimating the distribution function of the survival time in a mixed-case interval censoring model. In a mixed-case model, an individual is observed a random number of times, and at each time it is recorded whether an event has happened or not. One seeks to estimate the distribution of time to event. We use a Poisson process as the basis of a likelihood function to construct a pseudolikelihood ratio statistic for testing the value of the distribution function at a fixed point, and show that this converges under the null hypothesis to a known limit distribution, that can be expressed as a functional of different convex minorants of a two-sided Brownian motion process with parabolic drift. Construction of confidence sets then proceeds by standard inversion. The computation of the confidence sets is simple, requiring the use of the pool-adjacent-violators algorithm or a standard isotonic regression algorithm. We also illustrate the superiority of the proposed method over competitors based on resampling techniques or on the limit distribution of the maximum pseudolikelihood estimator, through simulation studies, and illustrate the different methods on a dataset involving time to HIV seroconversion in a group of haemophiliacs.
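
The pool-adjacent-violators algorithm named in this abstract is short enough to sketch. The version below is a generic weighted least-squares isotonic regression, not the paper's confidence-set procedure; the function name `pava` and the toy inputs are chosen here for illustration.

```python
def pava(values, weights=None):
    """Pool-adjacent-violators: non-decreasing weighted least-squares fit."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block is [weighted mean, total weight, number of points]
    for value, weight in zip(values, weights):
        blocks.append([float(value), float(weight), 1])
        # Pool the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean_b, weight_b, count_b = blocks.pop()
            mean_a, weight_a, count_a = blocks.pop()
            total = weight_a + weight_b
            pooled = (mean_a * weight_a + mean_b * weight_b) / total
            blocks.append([pooled, total, count_a + count_b])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Toy example: observed event proportions at increasing inspection times.
print(pava([0.1, 0.4, 0.3, 0.8]))
```

In an interval-censoring setting the inputs would typically be the event indicators or proportions ordered by inspection time, so that the fitted values form a monotone estimate of the distribution function.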

14.
Climate change has been widely recognized as a key factor driving changes in species distributions. In this study we use a metapopulation model, with a window of suitable climate moving polewards, to explore population shifts and survival of woodland birds under different climate change scenarios and landscape configurations. Extinction vulnerability and expansion ability are predicted for the middle spotted woodpecker Dendrocopus medius and two alternative r‐K strategies under west European climate change scenarios of 1, 2 and 4°C temperature increase per century, corresponding to isotemperature velocities of ca. 2, 4 and 8 km yr⁻¹. The simulated northward expansion of the bird's distribution is typically in the range of only 0–3 km yr⁻¹, in spite of 10–20 times larger maximum dispersal distances. This is too slow to track the climate change‐driven range contraction of 4 or 8 km yr⁻¹ in the south, resulting in metapopulation extinction. K‐selected (large‐bodied) species are especially vulnerable in the simulations. With a temperature increase of 4°C per century bird species go extinct within 104–178 yr. We present a simple approximation formula to predict the mean time to metapopulation extinction using 1) the rate of climate change, which determines the speed of range contraction in the south, 2) the size of the distribution range, which serves as a buffer against extinction, and 3) the northward expansion velocity, determined by species traits and landscape properties. Finally, our results indicate that the northward expansion rate is not constant. It will be initially lagged, suggesting that recently observed expansion rates might be underestimations of future northward expansion rates.

15.
The recent work of Haubold and Wiehe (Mol. Biol. Evol. 18:1157–1160, 2001) considered statistical inference of the divergence time. However, there appears to be a fundamental flaw in the paper since it treated the divergence time as a random variable and not as a parameter. In this note, we derive a valid statistical inference for the divergence time. We derive an estimator for the divergence time as well as explicit expressions for its probability density function, cumulative distribution function and the means. We also provide a 5-line computer program for computing the associated confidence intervals. We expect that the results presented could be useful for statistical modeling of divergence times.

16.
Constructing Confidence Intervals for QTL Location (cited 2 times: 2 self-citations, 0 citations by others)
B. Mangin, B. Goffinet, A. Rebai. Genetics, 1994, 138(4): 1301-1308
We describe a method for constructing the confidence interval of the QTL location parameter. This method is developed in the local asymptotic framework, leading to a linear model at each position of the putative QTL. The idea is to construct a likelihood ratio test, using statistics whose asymptotic distribution does not depend on the nuisance parameters and in particular on the effect of the QTL. We show theoretical properties of the confidence interval built with this test, and compare it with the classical confidence interval using simulations. We show, in particular, that our confidence interval has the correct probability of containing the true map location of the QTL, for almost all QTLs, whereas the classical confidence interval can be very biased for QTLs having small effect.

17.
Construction of confidence intervals or regions is an important part of statistical inference. The usual approach to constructing a confidence interval for a single parameter or confidence region for two or more parameters requires that the distribution of estimated parameters is known or can be assumed. In reality, the sampling distributions of parameters of biological importance are often unknown or difficult to characterize. Distribution-free nonparametric resampling methods such as bootstrapping and permutation have been widely used to construct the confidence interval for a single parameter. There are also several parametric (ellipse) and nonparametric (convex hull peeling, bagplot and HPDregionplot) methods available for constructing confidence regions for two or more parameters. However, these methods have some key deficiencies including biased estimation of the true coverage rate, failure to account for the shape of the distribution inherent in the data and difficulty of implementation. The purpose of this paper is to develop a new distribution-free method for constructing the confidence region that is based only on a few basic geometrical principles and accounts for the actual shape of the distribution inherent in the real data. The new method is implemented in an R package, distfree.cr/R. The statistical properties of the new method are evaluated and compared with those of the other methods through Monte Carlo simulation. Our new method outperforms the other methods regardless of whether the samples are taken from normal or non-normal bivariate distributions. In addition, the superiority of our method is consistent across different sample sizes and different levels of correlation between the two variables. We also analyze three biological data sets to illustrate the use of our new method for genomics and other biological research.
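
For the single-parameter case that the abstract mentions as background, a percentile bootstrap interval can be sketched as follows. This is the generic resampling technique, not the new confidence-region method or the distfree.cr package; the function name, its defaults, and the data are hypothetical.

```python
import random
import statistics


def bootstrap_percentile_ci(data, stat=statistics.mean, n_boot=2000,
                            alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for one scalar statistic."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and recompute the statistic each time.
    replicates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lower = replicates[int((alpha / 2) * n_boot)]
    upper = replicates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


measurements = list(range(1, 21))  # made-up sample
print(bootstrap_percentile_ci(measurements))
```

No sampling distribution is assumed for the statistic, which is the sense in which such intervals are called distribution-free; extending the idea to a joint region for two or more parameters is the problem the paper addresses.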

18.
The Ornstein-Uhlenbeck process has been proposed as a model for the spontaneous activity of a neuron. In this model, the firing of the neuron corresponds to the first passage of the process to a constant boundary, or threshold. While the Laplace transform of the first-passage time distribution is available, the probability distribution function has not been obtained in any tractable form. We address the problem of estimating the parameters of the process when the only available data from a neuron are the interspike intervals, or the times between firings. In particular, we give an algorithm for computing maximum likelihood estimates and their corresponding confidence regions for the three identifiable (of the five model) parameters by numerically inverting the Laplace transform. A comparison of the two-parameter algorithm (where the time constant tau is known a priori) to the three-parameter algorithm shows that significantly more data is required in the latter case to achieve comparable parameter resolution as measured by 95% confidence interval widths. The computational methods described here are an efficient alternative to other well-known estimation techniques for leaky integrate-and-fire models. Moreover, they could serve as a template for performing parameter inference on more complex integrate-and-fire neuronal models.

19.
Divergence time and substitution rate are seriously confounded in phylogenetic analysis, making it difficult to estimate divergence times when the molecular clock (rate constancy among lineages) is violated. This problem can be alleviated to some extent by analyzing multiple gene loci simultaneously and by using multiple calibration points. While different genes may have different patterns of evolutionary rate change, they share the same divergence times. Indeed, the fact that each gene may violate the molecular clock differently leads to the advantage of simultaneous analysis of multiple loci. Multiple calibration points provide the means for characterizing the local evolutionary rates on the phylogeny. In this paper, we extend previous likelihood models of local molecular clock for estimating species divergence times to accommodate multiple calibration points and multiple genes. Heterogeneity among different genes in evolutionary rate and in substitution process is accounted for by the models. We apply the likelihood models to analyze two mitochondrial protein-coding genes, cytochrome oxidase II and cytochrome b, to estimate divergence times of Malagasy mouse lemurs and related outgroups. The likelihood method is compared with the Bayes method of Thorne et al. (1998, Mol. Biol. Evol. 15:1647-1657), which uses a probabilistic model to describe the change in evolutionary rate over time and uses the Markov chain Monte Carlo procedure to derive the posterior distribution of rates and times. Our likelihood implementation has the drawbacks of failing to accommodate uncertainties in fossil calibrations and of requiring the researcher to classify branches on the tree into different rate groups. Both problems are avoided in the Bayes method. Despite the differences in the two methods, however, data partitions and model assumptions had the greatest impact on date estimation. 
The three codon positions have very different substitution rates and evolutionary dynamics, and assumptions in the substitution model affect date estimation in both likelihood and Bayes analyses. The results demonstrate that the separate analysis is unreliable, with dates variable among codon positions and between methods, and that the combined analysis is much more reliable. When the three codon positions were analyzed simultaneously under the most realistic models using all available calibration information, the two methods produced similar results. The divergence of the mouse lemurs is dated to be around 7-10 million years ago, indicating a surprisingly early species radiation for such a morphologically uniform group of primates.

20.
We present a hidden Markov model (HMM) for inferring gradual isolation between two populations during speciation, modelled as a time interval with restricted gene flow. The HMM describes the history of adjacent nucleotides in two genomic sequences, such that the nucleotides can be separated by recombination, can migrate between populations, or can coalesce at variable time points, all dependent on the parameters of the model, which are the effective population sizes, splitting times, recombination rate, and migration rate. We show by extensive simulations that the HMM can accurately infer all parameters except the recombination rate, which is biased downwards. Inference is robust to variation in the mutation rate and the recombination rate over the sequence and also robust to unknown phase of genomes unless they are very closely related. We provide a test for whether divergence is gradual or instantaneous, and we apply the model to three key divergence processes in great apes: (a) the bonobo and common chimpanzee, (b) the eastern and western gorilla, and (c) the Sumatran and Bornean orang-utan. We find that the bonobo and chimpanzee appear to have undergone a clear split, whereas the divergence processes of the gorilla and orang-utan species occurred over several hundred thousand years with gene flow stopping quite recently. We also apply the model to the Homo/Pan speciation event and find that the most likely scenario involves an extended period of gene flow during speciation.
