Similar Articles
1.
Innan H, Zhang K, Marjoram P, Tavaré S, Rosenberg NA. Genetics 2005, 169(3):1763-1777
Several tests of neutral evolution employ the observed number of segregating sites and properties of the haplotype frequency distribution as summary statistics and use simulations to obtain rejection probabilities. Here we develop a “haplotype configuration test” of neutrality (HCT) based on the full haplotype frequency distribution. To enable exact computation of rejection probabilities for small samples, we derive a recursion under the standard coalescent model for the joint distribution of the haplotype frequencies and the number of segregating sites. For larger samples, we consider simulation-based approaches. The utility of the HCT is demonstrated in simulations of alternative models and in application to data from Drosophila melanogaster.
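The two summary statistics named above can be read directly off aligned sequences. The sketch below (Python; the helper name `haplotype_summary` and the input format are assumptions) returns the haplotype frequency distribution, as haplotype counts sorted in decreasing order, together with the number of segregating sites. It only computes the observed configuration; it is not the HCT recursion or its rejection probabilities.

```python
from collections import Counter


def haplotype_summary(seqs):
    """Return (haplotype counts sorted in decreasing order, number of segregating sites).

    `seqs` is a list of aligned sequences of equal length (hypothetical input format).
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned and of equal length")
    counts = sorted(Counter(seqs).values(), reverse=True)
    # A site is segregating if more than one nucleotide occurs in its alignment column.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    return counts, seg_sites


sample = ["AACGT", "AACGT", "ATCGT", "AACGA", "AACGT"]
print(haplotype_summary(sample))  # ([3, 1, 1], 2)
```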

2.
Navarro A, Barton NH. Genetics 2002, 161(2):849-863
We studied the effect of multilocus balancing selection on neutral nucleotide variability at linked sites by simulating a model where diallelic polymorphisms are maintained at an arbitrary number of selected loci by means of symmetric overdominance. Different combinations of alleles define different genetic backgrounds that subdivide the population and strongly affect variability. Several multilocus fitness regimes with different degrees of epistasis and gametic disequilibrium are allowed. Analytical results based on a multilocus extension of the structured coalescent predict that the expected linked neutral diversity increases exponentially with the number of selected loci and can become extremely large. Our simulation results show that although variability increases with the number of genetic backgrounds that are maintained in the population, it is reduced by random fluctuations in the frequencies of those backgrounds and does not reach high levels even in very large populations. We also show that previous results on balancing selection in single-locus systems do not extend to the multilocus scenario in a straightforward way. Different patterns of linkage disequilibrium and of the frequency spectrum of neutral mutations are expected under different degrees of epistasis. Interestingly, the power to detect balancing selection using deviations from a neutral distribution of allele frequencies seems to be diminished under the fitness regime that leads to the largest increase of variability over the neutral case. This and other results are discussed in the light of data from the Mhc.

3.
In order to analyze the pattern of DNA polymorphism in detail, we have developed a simple method using a new statistic, θ_i, which estimates 4Nμ from the number of segregating sites whose allelic nucleotide frequency is i/n among n DNA sequences, where N is the effective population size and μ is the mutation rate per generation per nucleotide site. Under the assumption that mutations are selectively neutral and the population size is constant, the expectation of θ_i equals that of θ, which estimates 4Nμ from the total number of segregating sites, so the distribution of θ_i across i is flat. Therefore, departures of the distribution of θ_i from the horizontal line representing the value of θ reflect changes in population size and natural selection. Coalescent simulations show that the distributions of θ_i in populations that experienced expansion and reduction are U-shaped and upside-down U-shaped, respectively, and that the distributions of θ_i in some populations that experienced a bottleneck are W-shaped. Furthermore, we have applied this method to the SNP data from the International HapMap Project. The analyses show that the distributions of θ_i in the CEU (European), CHB and JPT (Asian) populations differ from that in the YRI (African) population. From these results for nuclear DNA and the already known pattern of polymorphism in human mitochondrial DNA, we infer that the CEU, CHB and JPT populations experienced a bottleneck.
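As a rough illustration, the sketch below assumes unfolded data (the derived allele is known at every site) and uses the estimator θ_i = i·ξ_i, where ξ_i is the number of sites whose derived allele is carried by i of the n sequences; under the neutral constant-size model E[ξ_i] = θ/i, so every θ_i has expectation θ. Whether the paper uses exactly this form or a folded variant is an assumption here, as is the function name. Watterson's θ = S/a_n, with a_n = Σ_{i=1}^{n-1} 1/i, supplies the horizontal reference line.

```python
def theta_i_profile(derived_counts, n):
    """Return ({i: theta_i}, theta_W) from the derived-allele count at each segregating site.

    Assumes unfolded, infinite-sites data: every count lies in 1..n-1.
    """
    xi = [0] * n  # xi[i] = number of sites whose derived allele appears in i sequences
    for c in derived_counts:
        if not 1 <= c <= n - 1:
            raise ValueError("derived-allele counts must lie in 1..n-1")
        xi[c] += 1
    # Under neutrality E[xi_i] = theta / i, so i * xi_i is unbiased for theta.
    theta_i = {i: i * xi[i] for i in range(1, n)}
    a_n = sum(1.0 / i for i in range(1, n))
    theta_w = len(derived_counts) / a_n  # Watterson's estimator: the flat reference line
    return theta_i, theta_w


profile, theta_w = theta_i_profile([1, 1, 1, 2, 3], n=4)
print(profile)            # {1: 3, 2: 2, 3: 3}
print(round(theta_w, 3))  # 2.727
```

Plotting the θ_i values against i and comparing them with θ_W is what reveals the U-, inverted-U- or W-shaped profiles described above.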

4.
Detecting the signature of adaptation on nucleotide variation is often difficult in species that, like Arabidopsis thaliana, may have a complex demographic history. Recent re-sequencing surveys in this species have provided genome-wide information that should mainly reflect its demographic history. We used a large empirical data set (LED) as well as multilocus coalescent simulations to analyse sequence variation at loci involved in the phenylpropanoid pathway of this species. We surveyed DNA sequence variation at nine of these loci (about 19.7 kb) in 23 accessions of A. thaliana and one accession of its close relative Arabidopsis lyrata. Nucleotide variation was lower at nonsynonymous sites than at silent sites in all loci, indicating generalized functional constraint at the protein level. No association between variation and position in the metabolic pathway was detected. When the data were contrasted against the standard neutral model, significant deviations for silent variation were detected with the Tajima's D, Fu's FS and Fay and Wu's H multilocus test statistics. These deviations were in the same direction as in previous large-scale multilocus analyses, suggesting a genome-wide effect. When the nine-locus data set was contrasted against the LED, neither the level (Watterson's θ) nor the pattern (Tajima's D) of variation at these loci deviated from the corresponding empirical distributions, at either the single-locus or the multilocus level. These results are consistent with an important role of the demographic history of A. thaliana in shaping nucleotide variation at the nine phenylpropanoid loci studied. The potential and limitations of the empirical distribution approach are discussed.
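Of the three statistics named above, Tajima's D is the simplest to compute. The following is a minimal single-locus sketch using Tajima's (1989) standard normalizing constants and assuming complete, aligned data; the multilocus versions, Fu's FS and Fay and Wu's H are not shown, and the function names are hypothetical.

```python
import math
from itertools import combinations


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of aligned sequences (pi)."""
    pairs = list(combinations(seqs, 2))
    return sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)


def tajimas_d(n, seg_sites, pi):
    """Tajima's D for sample size n, S segregating sites and mean pairwise differences pi."""
    if seg_sites == 0:
        raise ValueError("Tajima's D is undefined when there are no segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    # The numerator contrasts two estimators of theta: pi and Watterson's S / a1.
    return (pi - seg_sites / a1) / math.sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))


seqs = ["AAAA", "AAAT", "AATT", "ATTT"]
print(tajimas_d(len(seqs), 3, mean_pairwise_differences(seqs)))
```

Negative values indicate an excess of rare variants relative to the neutral expectation, positive values an excess of intermediate-frequency variants.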

5.
Multivariate analysis is a branch of statistics that successfully exploits the powerful tools of linear algebra to obtain a fairly comprehensive theory of estimation. The purpose of this paper is to explore to what extent a linear theory of estimation can be developed in the context of coalescent models used in the analysis of DNA polymorphism. We consider a large class of coalescent models, of which the neutral infinite-sites model is one example. In the process, we discover several limitations of linear estimators that are quite distinct from those in the classical theory. In particular, we prove that there does not exist a uniformly BLUE (best linear unbiased estimator) for the scaled mutation parameter under the assumptions of the neutral model of evolution. In fact, we show that no linear estimator performs uniformly better than Watterson's (1975) method based on the total number of segregating sites. For certain coalescent models, the segregating-sites estimator is actually optimal. The general conclusion is as follows: if genealogical information is useful for estimating the rate of evolution, then there is no optimal linear method; if there is an optimal linear method, then no information other than the total number of segregating sites is needed. Received: 29 July 1998 / Revised version: 9 October 1998
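For reference, Watterson's segregating-sites estimator and its standard error under the neutral infinite-sites model without recombination can be sketched as follows. The variance uses Var(S) = θa_n + θ²b_n, with a_n = Σ_{i=1}^{n-1} 1/i and b_n = Σ_{i=1}^{n-1} 1/i²; substituting the estimate back in for θ is a simplifying assumption of this sketch, and the function name is hypothetical.

```python
import math


def watterson_theta(seg_sites, n):
    """Watterson's (1975) estimator theta_W = S / a_n and a plug-in standard error."""
    a_n = sum(1.0 / i for i in range(1, n))
    b_n = sum(1.0 / i**2 for i in range(1, n))
    theta_w = seg_sites / a_n
    # Var(theta_W) = Var(S) / a_n^2 = theta / a_n + theta^2 * b_n / a_n^2,
    # evaluated here at theta = theta_W.
    variance = theta_w / a_n + theta_w**2 * b_n / a_n**2
    return theta_w, math.sqrt(variance)


theta_w, se = watterson_theta(seg_sites=11, n=4)
print(round(theta_w, 6))  # 6.0
```

The estimator is linear in S, which is the kind of estimator whose optimality the abstract examines.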

6.
Is a New and General Theory of Molecular Systematics Emerging?
The advent and maturation of algorithms for estimating species trees (phylogenetic trees that allow gene tree heterogeneity and whose tips represent lineages, populations and species, as opposed to genes) represent an exciting confluence of phylogenetics, phylogeography, and population genetics, and usher in a new generation of concepts and challenges for the molecular systematist. In this essay I argue that to better deal with the large multilocus datasets brought on by phylogenomics, and to better align the fields of phylogeography and phylogenetics, we should embrace the primacy of species trees, not only as a new and useful practical tool for systematics, but also as a long‐standing conceptual goal of systematics that, largely due to the lack of appropriate computational tools, has been eclipsed in the past few decades. I suggest that phylogenies as gene trees are a “local optimum” for systematics, and review recent advances that will bring us to the broader optimum inherent in species trees. In addition to adopting new methods of phylogenetic analysis (and ideally reserving the term “phylogeny” for species trees rather than gene trees), the new paradigm suggests shifts in a number of practices, such as sampling data to maximize not only the number of accumulated sites but also the number of independently segregating genes; routinely using coalescent or other models in computer simulations to allow for gene tree heterogeneity; and better understanding the role of concatenation in influencing topologies and confidence in phylogenies. By building on the foundation laid by concepts of gene trees and coalescent theory, and by taking cues from recent trends in multilocus phylogeography, molecular systematics stands to be enriched.
Many of the challenges and lessons learned for estimating gene trees will carry over to the challenge of estimating species trees, although adopting the species tree paradigm will clarify many issues (such as the nature of polytomies and the star tree paradox), raise conceptually new challenges, or provide new answers to old questions.

7.
The extent to which natural selection shapes diversity within populations is a key question for population genetics, so there is considerable interest in quantifying the strength of selection. Here we describe a full-likelihood approach for inference about selection at a single site within an otherwise neutral, fully linked sequence of sites. A coalescent model of evolution is used to model the ancestry of a sample of DNA sequences in which the selected site is segregating. The mutation model for both the selected and neutral sites is the infinitely-many-sites model, under which there is no back or parallel mutation. Under this mutation model, a unique perfect phylogeny (a gene tree) can be constructed from the configuration of mutations on the sample sequences. The approach is general and can be used for any bi-allelic selection scheme. Selection is incorporated by modelling the frequencies of the selected and neutral allelic classes stochastically back in time, and then applying a subdivided-population model that treats these population frequencies through time as variable subpopulation sizes. An importance sampling algorithm is then used to explore the space of coalescent trees consistent with the data. The method is applied to a simulated data set and to the gene tree presented in Verrelli et al. (2002).

8.
9.
Interpretations of phylogeographic patterns can change when analyses shift from single gene-tree to multilocus coalescent analyses. Using multilocus coalescent approaches, a species tree and divergence times can be estimated from a set of gene trees while accounting for gene-tree stochasticity. We utilized the conceptual strengths of a multilocus coalescent approach coupled with complete range-wide sampling to examine the speciation history of a broadly distributed, North American warm-desert toad, Anaxyrus punctatus. Phylogenetic analyses provided strong support for three major lineages within A. punctatus. Each lineage broadly corresponded to one of three desert regions. Early speciation in A. punctatus appeared linked to late Miocene-Pliocene development of the Baja California peninsula. This event was likely followed by a Pleistocene divergence associated with the separation of the Chihuahuan and Sonoran Deserts. Our multilocus coalescent-based reconstruction provides an informative contrast to previous single gene-tree estimates of the evolutionary history of A. punctatus.

10.
The Coalescent Process in Models with Selection
Kaplan NL, Darden T, Hudson RR. Genetics 1988, 120(3):819-829
Statistical properties of the process describing the genealogical history of a random sample of genes are obtained for a class of population genetics models with selection. For models with selection, in contrast to models without selection, the distribution of this process, the coalescent process, depends on the distribution of the frequencies of alleles in the ancestral generations. If the ancestral frequency process can be approximated by a diffusion, then the mean and the variance of the number of segregating sites due to selectively neutral mutations in random samples can be numerically calculated. The calculations are greatly simplified if the frequencies of the alleles are tightly regulated. If the mutation rates between alleles maintained by balancing selection are low, then the number of selectively neutral segregating sites in a random sample of genes is expected to substantially exceed the number predicted under a neutral model.

11.
Fu YX. Genetics 1996, 144(2):829-838
The number of segregating sites in a sample of DNA sequences and the age of the most recent common ancestor (MRCA) of the sequences in the sample are positively correlated, so the value of the former can be used to estimate the value of the latter. Using the coalescent approach, we derive in this paper the joint probability distribution of the number of segregating sites and the age of the MRCA of a sample under the neutral Wright-Fisher model. From this distribution, we are able to compute the likelihood function of the number of segregating sites and the posterior probability of the age of the MRCA of a sample. Three point estimators and one interval estimator of the age of the MRCA are developed, and their relationships and properties are investigated. The estimation of the age of the MRCA of human Y chromosomes from a sample with no variation is discussed.
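The paper derives the joint distribution analytically. As a simulation-based stand-in (not Fu's estimators), the sketch below approximates the posterior mean age of the MRCA given an observed number of segregating sites by simulating neutral genealogies and weighting each by the Poisson probability of the observed S. It assumes θ is known, time is in coalescent units (a pair of lineages coalesces at rate 1), and S given total branch length L is Poisson with mean θL/2; the function name is hypothetical.

```python
import math
import random


def posterior_mean_tmrca(s_obs, n, theta, reps=100_000, seed=1):
    """Monte Carlo estimate of E[T_MRCA | S = s_obs] under the standard neutral coalescent.

    Time is in coalescent units; given total branch length L, S ~ Poisson(theta * L / 2).
    theta is treated as known.
    """
    rng = random.Random(seed)
    weighted_sum = weight_total = 0.0
    for _ in range(reps):
        t_mrca = total_length = 0.0
        for k in range(n, 1, -1):
            t_k = rng.expovariate(k * (k - 1) / 2.0)  # time during which k lineages exist
            t_mrca += t_k
            total_length += k * t_k
        lam = theta * total_length / 2.0
        # Weight the simulated genealogy by the Poisson likelihood P(S = s_obs | L).
        if lam > 0:
            w = math.exp(s_obs * math.log(lam) - lam - math.lgamma(s_obs + 1))
        else:
            w = 1.0 if s_obs == 0 else 0.0
        weighted_sum += w * t_mrca
        weight_total += w
    return weighted_sum / weight_total


n, theta = 10, 5.0
# For S = 0 the weights factorize over coalescent intervals, giving a closed-form check.
exact_s0 = sum(2.0 / (k * (k - 1 + theta)) for k in range(2, n + 1))
print(posterior_mean_tmrca(0, n, theta, reps=20_000), exact_s0)
```

The S = 0 case, like the Y-chromosome sample discussed in the abstract, pulls the estimated age well below the prior mean 2(1 - 1/n), because only short genealogies are likely to carry no mutations.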

12.
The ratio of singletons to the total number of segregating sites is used to estimate a reproduction parameter in a population model with large offspring numbers, without having to jointly estimate the mutation rate. For neutral genetic variation, the ratio of singletons to the total number of segregating sites is equivalent to the ratio of the total length of external branches to the total length of the gene genealogy. A multinomial maximum-likelihood method that takes into account more frequency classes than just the singletons is developed to estimate the parameter of another large-offspring-number model. The performance of these methods with respect to sample size and mutation rate, and their bias, are investigated by simulation. Simulations show that the expected value of the ratio of the total length of external branches to the total length of the whole tree decreases as sample size increases for the Kingman coalescent, but can increase or decrease, depending on parameter values, for Λ-coalescents. Considering ratios of tree statistics, as opposed to considering the lengths of various subtrees separately, can yield better insight into the dynamics of gene genealogies.
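Under the infinite-sites model every mutation on an external branch produces a singleton, which is why singletons track external branch length. The Monte Carlo sketch below covers the Kingman coalescent only (Λ-coalescents, which allow multiple mergers, are not simulated) and estimates the expected ratio of external to total branch length for a sample of size n. Because lineages are exchangeable, it only tracks how many of the remaining lineages are still original leaves; the function name is hypothetical.

```python
import random


def mean_external_ratio(n, reps=2000, seed=1):
    """Monte Carlo estimate of E[external length / total length] under the Kingman coalescent."""
    rng = random.Random(seed)
    ratio_sum = 0.0
    for _ in range(reps):
        k, leaves = n, n  # lineages remaining, and how many of them are external (leaf) branches
        ext_len = tot_len = 0.0
        while k > 1:
            t = rng.expovariate(k * (k - 1) / 2.0)  # waiting time to the next merger
            ext_len += leaves * t
            tot_len += k * t
            # Merge two distinct lineages chosen uniformly; indices < leaves are leaf lineages.
            pair = rng.sample(range(k), 2)
            leaves -= sum(1 for idx in pair if idx < leaves)
            k -= 1  # the merged lineage is internal, so it never counts as a leaf again
        ratio_sum += ext_len / tot_len
    return ratio_sum / reps


for n in (5, 10, 40):
    print(n, round(mean_external_ratio(n), 3))
```

The printed values should shrink as n grows, matching the Kingman-coalescent behaviour reported in the abstract.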

13.
Allard RW, Zhang Q, Maroof MAS, Muona OM. Genetics 1992, 131(4):957-969
Data from 311 selfed families isolated from four generations (F8, F13, F23, F45) of an experimental barley population were analyzed to determine patterns of change in character expression for seven quantitative traits, and in single-locus allelic frequencies, and multilocus genetic structure, for 16 Mendelian loci that code for discretely recognizable variants. The analyses showed that large changes in single-locus allelic frequencies and major reorganizations in multilocus genetic structure occurred in each of the generation-to-generation transitions examined. Although associations among a few traits persisted over generations, dynamic dissociations and reassociations occurred among several traits in each generation-transition period. Overall, the restructuring that occurred was characterized by gradual decreases in the number of clusters of associated traits and increases in the number of traits within each cluster. The observed changes in single-locus frequencies and in multilocus genetic structure were attributed to interplay among various evolutionary factors among which natural selection acting in a temporally heterogeneous environment was the guiding force.

14.
Balancing selection at one locus can increase the amount of selectively neutral variation within neighboring genomic regions. Discrete phenotypic polymorphisms studied in natural populations are frequently determined by sets of interacting genes instead of alternative alleles at single loci. We extend coalescent theory to investigate balancing selection on combinations of linked genes. We find that variation at neutral sites is increased across a much larger genomic region relative to the single-locus models: the entire region lying between the two loci in balanced combination is affected to some degree. Epistatic selection maintains these high levels of neutral variation because it directly opposes the homogenizing effect of recombination. The results of the theory are discussed in relation to published gene sequence data, primarily from Drosophila.

15.
Wright SI, Charlesworth B. Genetics 2004, 168(2):1071-1076
We present a maximum-likelihood-ratio test of the standard neutral model, using multilocus data on polymorphism within species and divergence between species. The model is based on the Hudson-Kreitman-Aguade (HKA) test, but allows for an explicit test of selection at individual loci in a multilocus framework. We use coalescent simulations to show that the likelihood-ratio test statistic is conservative, particularly when the assumption of no recombination is violated. Application of the method to polymorphism data from 18 loci from a population of Arabidopsis lyrata provides significant evidence for a balanced polymorphism at a candidate locus thought to be linked to the centromere. The method is also applied to polymorphism data in maize, providing support for the hypothesis of directional selection on genes in the starch pathway.

16.
We investigate the effect of purifying selection at multiple sites on both the shape of the genealogy and the distribution of mutations on the tree. We find that the primary effect of purifying selection on a genealogy is to shift the distribution of mutations on the tree, whereas the shape of the tree remains largely unchanged. This result is relevant to the large number of coalescent estimation procedures, which generally assume neutrality for segregating polymorphisms; applying these estimators to evolutionarily constrained sequences could lead to a significant degree of bias. We also estimate the statistical power of several neutrality tests in detecting weak to moderate purifying selection and find that the power is quite good for some parameter combinations. This result contrasts with previous studies, which predicted low statistical power because of the minor effect that weak purifying selection has on the shape of a genealogy. Finally, we investigate the effect of Hill-Robertson interference among linked deleterious mutations on patterns of molecular variation. We find that dependence among selected loci can substantially reduce the efficacy of even fairly strong purifying selection.

17.
Proportional and separate models, which can apply a different combination of substitution rate matrix (SRM) and among-site rate variation model (ASRVM) to each locus, are frequently used in phylogenetic studies of multilocus data. A proportional model assumes that branch lengths are proportional among partitions, and a separate model assumes that each partition has an independent set of branch lengths. However, the choice among nonpartitioned (i.e., a common combination of models is applied to the concatenated sequences of all loci), proportional and separate models is usually based on the researcher's preference rather than on any information criterion. This study describes two programs, 'Kakusan4' (for DNA sequences) and 'Aminosan' (for amino-acid sequences), which allow the selection of evolutionary models based on several types of information criteria. The programs can handle both multilocus and single-locus data, and provide both an easy-to-use wizard interface and a noninteractive command-line interface. In the case of multilocus data, SRMs and ASRVMs are compared at each locus and for the all-loci concatenated sequences, after which nonpartitioned, proportional and separate models are compared based on information criteria. The programs also provide model configuration files for MrBayes, PAUP*, PhyML, RAxML and Treefinder to support further phylogenetic analysis using a selected model. When likelihoods were optimized by Treefinder, the best-fit models were found to differ depending on the data set. Furthermore, differences in the information criteria among nonpartitioned, proportional and separate models were much larger than those among the nonpartitioned models. These findings suggest that selecting among nonpartitioned, proportional and separate models results in a better phylogenetic tree. Kakusan4 and Aminosan are available at http://www.fifthdimension.jp/. They are licensed under the GNU GPL version 2 and run on Windows, Mac OS X and Linux.
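To show how information criteria arbitrate between nonpartitioned, proportional and separate models, the sketch below applies the standard AIC, AICc and BIC formulas to already-fitted models. The log-likelihoods, parameter counts and alignment length are invented for illustration, using the number of alignment sites as the sample size is a convention assumed here, and none of this reproduces Kakusan4's own output or criteria set.

```python
import math


def information_criteria(log_lik, k, n_sites):
    """AIC, AICc and BIC for a model with k free parameters fitted to n_sites sites."""
    aic = 2 * k - 2 * log_lik
    aicc = aic + 2 * k * (k + 1) / (n_sites - k - 1)  # small-sample correction
    bic = k * math.log(n_sites) - 2 * log_lik
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


def best_model(fits, n_sites, criterion):
    """Name of the model with the lowest value of the chosen criterion."""
    scores = {name: information_criteria(ll, k, n_sites)[criterion]
              for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get)


# Hypothetical (log-likelihood, parameter count) values for one 3,000-site multilocus
# alignment; these numbers are made up for illustration only.
fits = {
    "nonpartitioned": (-15234.7, 10),
    "proportional": (-15180.2, 28),
    "separate": (-15160.9, 60),
}
for criterion in ("AIC", "AICc", "BIC"):
    print(criterion, best_model(fits, 3000, criterion))
```

With these made-up numbers, AIC and AICc favour the proportional model while BIC, whose per-parameter penalty ln(3000) ≈ 8 exceeds AIC's penalty of 2, favours the nonpartitioned one; the criteria need not agree.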

18.
The relationship between speciation times and the corresponding times of gene divergence is of interest in phylogenetic inference as a means of understanding the past evolutionary dynamics of populations and of estimating the timing of speciation events. It has long been recognized that gene divergence times might substantially pre-date speciation events. Although the distribution of the difference between these has previously been studied for the case of two populations, this distribution has not been explicitly computed for larger species phylogenies. Here we derive a simple method for computing this distribution for trees of arbitrary size. A two-stage procedure is proposed which (i) considers the probability distribution of the time from the speciation event at the root of the species tree to the gene coalescent time conditionally on the number of gene lineages available at the root; and (ii) calculates the probability mass function for the number of gene lineages at the root. This two-stage approach dramatically simplifies numerical analysis, because in the first step the conditional distribution does not depend on an underlying species tree, while in the second step the pattern of gene coalescence prior to the species tree root is irrelevant. In addition, the algorithm provides intuition concerning the properties of the distribution with respect to the various features of the underlying species tree. The methodology is complemented by developing probabilistic formulae and software, written in R. The method and software are tested on five-taxon species trees with varying levels of symmetry. The examples demonstrate that more symmetric species trees tend to have larger mean coalescent times and are more likely to have a unimodal gamma-like distribution with a long right tail, while asymmetric trees tend to have smaller mean coalescent times with an exponential-like distribution. 
In addition, species trees with longer branches generally have shorter mean coalescent times, with branches closest to the root of the tree being most influential.
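The two-stage idea can be illustrated on the smallest nontrivial case. The sketch assumes a constant population size, branch lengths in coalescent units, and one sampled gene per species on the species tree ((A,B),C) with internal branch length τ. Stage (i): given k lineages at the root, the time to the gene MRCA is a sum of exponential waiting times with rates j(j-1)/2 for j = 2, ..., k, so its mean is Σ 2/(j(j-1)). Stage (ii): the number of lineages at the root is 2 if the A and B lineages coalesce on the internal branch (probability 1 - e^(-τ)) and 3 otherwise. The function names are hypothetical, and the sketch returns only the mean, not the full distribution computed by the authors' R software.

```python
import math


def mean_time_given_k(k):
    """Stage (i): expected time from the root to the gene MRCA when k lineages enter the root."""
    # While j lineages remain, the waiting time is exponential with rate j(j-1)/2.
    return sum(2.0 / (j * (j - 1)) for j in range(2, k + 1))


def root_lineage_pmf(tau):
    """Stage (ii) for ((A,B),C) with one gene per species and internal branch length tau."""
    p_no_coalescence = math.exp(-tau)  # A and B lineages both survive the internal branch
    return {2: 1.0 - p_no_coalescence, 3: p_no_coalescence}


def mean_root_to_mrca_time(tau):
    """Combine the two stages: average stage (i) over the pmf from stage (ii)."""
    return sum(p * mean_time_given_k(k) for k, p in root_lineage_pmf(tau).items())


for tau in (0.0, 0.5, 2.0):
    print(tau, round(mean_root_to_mrca_time(tau), 4))
```

The mean falls from 4/3 at τ = 0 toward 1 as τ grows, in line with the observation above that longer branches tend to give shorter mean coalescent times.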

19.
Genome-scale sequence data have become increasingly available in phylogenetic studies aimed at understanding the evolutionary histories of species. However, it is challenging to develop probabilistic models that account for the heterogeneity of phylogenomic data. The multispecies coalescent model describes gene trees as independent random variables generated from a coalescent process occurring along the lineages of the species tree. Because the multispecies coalescent model allows gene trees to vary across genes, coalescent-based methods have been widely used to account for heterogeneous gene trees in phylogenomic data analysis. In this paper, we summarize and evaluate the performance of coalescent-based methods for estimating species trees from genome-scale sequence data. We investigate the effects of deep coalescence and mutation on the performance of species tree estimation methods. We found that the coalescent-based methods perform well in estimating species trees for a large number of genes, regardless of the degree of deep coalescence and mutation. The performance of the coalescent methods is negatively correlated with the lengths of internal branches of the species tree.
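To make the multispecies coalescent concrete, the sketch below simulates gene-tree topologies for the three-species tree ((A,B),C) with one sampled gene per species and internal branch length τ in coalescent units. It assumes constant population sizes and gene trees observed without error, so mutation is ignored. If the A and B lineages fail to coalesce on the internal branch (probability e^(-τ)), all three lineages enter the root population and each rooted topology is equally likely. The species tree is then estimated by a simple majority vote over gene-tree topologies; this is an illustrative estimator with hypothetical function names, not one of the methods evaluated in the paper.

```python
import math
import random
from collections import Counter

TOPOLOGIES = ("((A,B),C)", "((A,C),B)", "((B,C),A)")


def simulate_gene_tree(tau, rng):
    """Draw one rooted gene-tree topology under the MSC for species tree ((A,B),C)."""
    if rng.random() < 1.0 - math.exp(-tau):
        return "((A,B),C)"  # A and B coalesced on the internal branch
    return rng.choice(TOPOLOGIES)  # three lineages at the root: first merging pair is uniform


def estimate_species_tree(tau, n_genes, seed=1):
    """Majority vote over independently simulated gene trees; returns (winner, counts)."""
    rng = random.Random(seed)
    counts = Counter(simulate_gene_tree(tau, rng) for _ in range(n_genes))
    return counts.most_common(1)[0][0], counts


winner, counts = estimate_species_tree(tau=0.3, n_genes=1000)
print(winner, dict(counts))
```

Under these assumptions the concordant topology has probability 1 - (2/3)e^(-τ), which exceeds each discordant topology's probability (1/3)e^(-τ) for any τ > 0, so the vote recovers the species tree as the number of genes grows.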

20.

Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP filing No. 09084417