Similar articles
Found 20 similar articles (search time: 31 ms)
1.
This paper concerns the genealogical structure of a sample of chromosomes sharing a neutral rare allele. We suppose that the mutation giving rise to the allele has only happened once in the history of the entire population, and that the allele is of known frequency q in the population. Within a coalescent framework, C. Wiuf and P. Donnelly (1999, Theor. Popul. Biol. 56, 183-201) derived an exact analysis of the conditional genealogy, but it is inconvenient for applications. Here, we develop an approximation to the exact distribution of the conditional genealogy, including an approximation to the distribution of the time at which the mutation arose. The approximations are accurate for frequencies q < 5-10%. In addition, a simple and fast simulation scheme is constructed. We consider a demography parameterized by a d-dimensional vector $\alpha = (\alpha_1, \ldots, \alpha_d)$. It is shown that the conditional genealogy and the age of the mutation have distributions that depend on $a = q\alpha$ and q only, and that the effect of q is a linear scaling of times in the genealogy; if q is doubled, the lengths of all branches in the genealogy are doubled. The theory is exemplified in two different demographies of some interest in the study of human evolution: (1) a population of constant size and (2) a population of exponentially decreasing size (going backward in time).

2.
In this paper we consider the genealogy of a random sample of n chromosomes from a panmictic population which has evolved with constant size N over many generations. We address two related problems. First we describe how genealogical information may be usefully partitioned into information on the events (mutations and coalescences) which occur in the genealogy, and the times between these events. We show that the distribution of the times given information on the events is particularly simple and describe how this can considerably reduce the computational burden when performing inference for these times. Second we investigate the effect on the genealogy of conditioning on a single mutation having occurred during the ancestry of the sample. In particular we use results from the first part of the paper to derive explicit formulae for the density of the age of a mutant allele, conditional on its frequency in either a sample or the population.
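As a rough illustration of the times between coalescence events in a constant-size population, the sketch below simulates the standard Kingman coalescent, in which the waiting time while k lineages remain is exponential with rate k(k-1)/2 in coalescent time units. This is the textbook neutral model, not the conditional analysis described in the abstract; the function names, sample size, and replicate count are assumptions chosen for illustration.

```python
import random


def simulate_coalescence_times(n, rng):
    """Return the waiting times T_n, ..., T_2 (coalescent units) for a sample of n.

    Assumption: standard Kingman coalescent, constant population size,
    no conditioning on mutations.
    """
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2.0  # total pairwise coalescence rate with k lineages
        times.append(rng.expovariate(rate))
    return times


def expected_tmrca(n):
    """Expected time to the most recent common ancestor: 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


# Hypothetical settings for a quick Monte Carlo check of the mean.
rng = random.Random(1)
n = 10
reps = 20000
mean_tmrca = sum(sum(simulate_coalescence_times(n, rng)) for _ in range(reps)) / reps
print(mean_tmrca, expected_tmrca(n))
```

The simulated mean should lie close to the analytical expectation; conditioning on a mutation, as the abstract does, would change these distributions.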

3.
Gene Genealogies within Mutant Allelic Classes   Cited by: 2 (self-citations: 2, citations by others: 0)
M. Slatkin. Genetics 1996, 143(1): 579-587
A coalescent theory of the gene genealogy within an allelic class that arises by a unique mutational event is developed and analyzed. To interpret this theory it was necessary to expand on existing theory for populations of varying size. Two features of the gene genealogy--the average pairwise distance and the total tree length--within the mutant class and within the nonmutant class are found. An index, I, is proposed that describes the extent to which a genealogy is similar to one from a population of constant size (for which I = 0) or to a star genealogy (for which I = 1). The value of I is positive in growing populations and is generally positive for the gene genealogy for the mutant class. The value of I is negative for a population decreasing in size and for the nonmutant class, if the mutant arose recently. The results are discussed in the context of the infinite sites model of mutation, which is appropriate for nucleotide sequence data, and the generalized stepwise mutation model, which is appropriate for microsatellite loci. The same genealogical methods are used to find the probability of at least one recombination event between a nucleotide that defines an allelic class and a marker at a nearby linked site.

4.
Determining the expected distribution of the time to the most recent common ancestor of a sample of individuals may deliver important information about the genetic markers and evolution of the population. In this paper, we introduce a new recursive algorithm to calculate the distribution of the time to the most recent common ancestor of the sample from a population evolved by any conditional multinomial sampling model. The most important advantage of our method is that it can be applied to a sample of any size drawn from a population regardless of its size growth pattern. We also present a very efficient method to implement and store the genealogy tree of the population evolved by the Galton–Watson process. In the final section we present results applied to a simulated population with a single bottleneck event and to real populations of known size histories.

6.
Genealogy of neutral genes in two partially isolated populations   Cited by: 1 (self-citations: 0, citations by others: 1)
Gene genealogy in two partially isolated populations which diverged at a given time t in the past and have since been exchanging individuals at a constant rate m is studied based upon an analytic method for large t and a simulation method for any t. Particular attention is paid to the conditions under which neutral genes sampled from populations are mono-, para-, and polyphyletic in terms of coalescence (divergence) times of genes. It is shown that the probability of monophyly is high if M = 2Nm is less than 0.5 and T = t/(2N) is greater than 1, where N is the size of ancestral and descendant haploid populations, in which case most gene genealogies are likely to be concordant with the population relatedness. This probability decreases as the sample size of genes increases. On the other hand, the case where the probability of monophyly is low will be either that of M greater than 1 and any T or that of M less than 1 and T less than 1, but the clear distinction between these conditions appears very difficult to make. These results are also examined if the gene genealogy is reconstructed from nucleotide differences. It is then shown that the results based upon coalescence times remain valid if the number of nucleotide differences between any pair of genes is not much smaller than 10. To observe such large nucleotide differences in small populations and therefore infer a reliable gene genealogy, we must examine a fairly long stretch of DNA sequences.
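The scaled quantities M = 2Nm and T = t/(2N) and the qualitative regimes stated in the abstract can be written down as a small sketch. The function names and the example values are hypothetical, and the "unclassified" label is an assumption covering parameter combinations the abstract does not explicitly assign to either regime.

```python
def scaled_parameters(N, m, t):
    """Return (M, T) with M = 2*N*m and T = t / (2*N), as defined in the abstract."""
    M = 2 * N * m
    T = t / (2 * N)
    return M, T


def monophyly_regime(M, T):
    """Classify the probability of monophyly using the thresholds quoted in the abstract.

    "high":  M < 0.5 and T > 1
    "low":   M > 1 (any T), or M < 1 and T < 1
    Anything else is labeled "unclassified" (an assumption of this sketch).
    """
    if M < 0.5 and T > 1:
        return "high"
    if M > 1 or (M < 1 and T < 1):
        return "low"
    return "unclassified"


# Hypothetical example: N = 1000 haploid individuals, migration rate 1e-4,
# divergence 4000 generations ago.
M, T = scaled_parameters(1000, 0.0001, 4000)
print(M, T, monophyly_regime(M, T))
```

As the abstract notes, the boundary between the two low-probability conditions is hard to draw, so the thresholds here are only a coarse guide.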

7.
Slade PF, Wakeley J. Genetics 2005, 169(2): 1117-1131
We show that the unstructured ancestral selection graph applies to part of the history of a sample from a population structured by restricted migration among subpopulations, or demes. The result holds in the limit as the number of demes tends to infinity with proportionately weak selection, and we have also made the assumptions of island-type migration and that demes are equivalent in size. After an instantaneous sample-size adjustment, this structured ancestral selection graph converges to an unstructured ancestral selection graph with a mutation parameter that depends inversely on the migration rate. In contrast, the selection parameter for the population is independent of the migration rate and is identical to the selection parameter in an unstructured population. We show analytically that estimators of the migration rate, based on pairwise sequence differences, derived under the assumption of neutrality should perform equally well in the presence of weak selection. We also modify an algorithm for simulating genealogies conditional on the frequencies of two selected alleles in a sample. This permits efficient simulation of stronger selection than was previously possible. Using this new algorithm, we simulate gene genealogies under the many-demes ancestral selection graph and identify some situations in which migration has a strong effect on the time to the most recent common ancestor of the sample. We find that a similar effect also increases the sensitivity of the genealogy to selection.

8.
Molecular sequences obtained at different sampling times from populations of rapidly evolving pathogens and from ancient subfossil and fossil sources are increasingly available with modern sequencing technology. Here, we present a Bayesian statistical inference approach to the joint estimation of mutation rate and population size that incorporates the uncertainty in the genealogy of such temporally spaced sequences by using Markov chain Monte Carlo (MCMC) integration. The Kingman coalescent model is used to describe the time structure of the ancestral tree. We recover information about the unknown true ancestral coalescent tree, population size, and the overall mutation rate from temporally spaced data, that is, from nucleotide sequences gathered at different times, from different individuals, in an evolving haploid population. We briefly discuss the methodological implications and show what can be inferred, in various practically relevant states of prior knowledge. We develop extensions for exponentially growing population size and joint estimation of substitution model parameters. We illustrate some of the important features of this approach on a genealogy of HIV-1 envelope (env) partial sequences.

9.
Fertility inheritance, a phenomenon in which an individual's number of offspring is positively correlated with his or her number of siblings, is a cultural process that can have a strong impact on genetic diversity. Until now, fertility inheritance has been detected primarily using genealogical databases. In this study, we develop a new method to infer fertility inheritance from genetic data in human populations. The method is based on the reconstruction of the gene genealogy of a sample of sequences from a given population and on the computation of the degree of imbalance in this genealogy. We show indeed that this level of imbalance increases with the level of fertility inheritance, and that other phenomena such as hidden population structure are unlikely to generate a signal of imbalance in the genealogy that would be confounded with fertility inheritance. By applying our method to mtDNA samples from 37 human populations, we show that matrilineal fertility inheritance is more frequent in hunter-gatherer populations than in food-producer populations. One possible explanation for this result is that in hunter-gatherer populations, individuals belonging to large kin networks may benefit from stronger social support and may be more likely to have a large number of offspring.

10.
The Genealogy of Samples in Models with Selection   Cited by: 1 (self-citations: 0, citations by others: 1)
C. Neuhauser, S. M. Krone. Genetics 1997, 145(2): 519-534
We introduce the genealogy of a random sample of genes taken from a large haploid population that evolves according to random reproduction with selection and mutation. Without selection, the genealogy is described by Kingman's well-known coalescent process. In the selective case, the genealogy of the sample is embedded in a graph with a coalescing and branching structure. We describe this graph, called the ancestral selection graph, and point out differences and similarities with Kingman's coalescent. We present simulations for a two-allele model with symmetric mutation in which one of the alleles has a selective advantage over the other. We find that when the allele frequencies in the population are already in equilibrium, then the genealogy does not differ much from the neutral case. This is supported by rigorous results. Furthermore, we describe the ancestral selection graph for other selective models with finitely many selection classes, such as the K-allele models, infinitely-many-alleles models, DNA sequence models, and infinitely-many-sites models, and briefly discuss the diploid case.

12.
Genomic survey data now permit an unprecedented level of sensitivity in the detection of departures from canonical evolutionary models, including expansions in population size and selective sweeps. Here, we examine the effects of seemingly subtle differences among sampling distributions on goodness of fit analyses of site frequency spectra constructed from single nucleotide polymorphisms. Conditioning on the observation of exactly two alleles in a random sample results in a site frequency spectrum that is independent of the scaled rate of neutral substitution (θ). Other sampling distributions, including conditioning on a single mutational event in the sample genealogy or randomly selecting a single mutation from a genealogy with multiple mutations, have distinct site frequency spectra that show highly significant departures from the predictions of the biallelic model. Some aspects of data filtering may contribute to significant departures of site frequency spectra from expectation, apart from any violation of the standard neutral model.
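To make the idea of a θ-independent site frequency spectrum concrete, the sketch below uses the classical infinite-sites expectation under the standard neutral model: the expected number of sites at which the derived allele appears i times in a sample of n is θ/i, so the proportions (1/i) / Σ_j (1/j) do not involve θ. Whether this coincides exactly with the biallelic sampling distribution analyzed in the abstract is an assumption; the function name is hypothetical.

```python
from fractions import Fraction


def expected_sfs_proportions(n):
    """Expected unfolded SFS proportions for derived-allele counts i = 1, ..., n-1.

    Assumption: standard neutral infinite-sites model, where the expected
    count in class i is theta / i, so theta cancels after normalization.
    """
    weights = [Fraction(1, i) for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


# Example for a sample of four chromosomes.
for i, p in enumerate(expected_sfs_proportions(4), start=1):
    print(i, p)
```

Observed spectra can then be compared against these proportions in a goodness-of-fit test, keeping in mind the abstract's warning that the choice of sampling distribution matters.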

13.
In this paper we consider the genealogy of two nested mutant alleles, assuming the constant-size neutral coalescent model with infinite sites mutation. We study the conditional genealogy and derive explicit formulas for the joint and marginal site frequency spectra for the double, single and zero mutant allele. In addition, we find the mean ages of the two mutations. We show that the age of the youngest mutation does not depend on the frequency of the single mutant allele and that the frequency spectra for the single mutant allele and the zero mutant allele are the same.

14.
A composite-conditional-likelihood (CCL) approach is proposed to map the position of a trait-influencing mutation (TIM) using the ancestral recombination graph (ARG) and importance sampling to reconstruct the genealogy of DNA sequences with respect to windows of marker loci and predict the linkage disequilibrium pattern observed in a sample of cases and controls. The method is designed to fine-map the location of a disease mutation, not as an association study. The CCL function proposed for the position of the TIM is a weighted product of conditional likelihood functions for windows of a given number of marker loci that encompass the TIM locus, given the sample configuration at the marker loci in those windows. A rare recessive allele is assumed for the TIM and single nucleotide polymorphisms (SNPs) are considered as markers. The method is applied to a range of simulated data sets. Not only do the CCL profiles converge more rapidly with smaller window sizes as the number of simulated histories of the sampled sequences increases, but the maximum-likelihood estimates for the position of the TIM remain as satisfactory, while requiring significantly less computing time. The simulations also suggest that non-random samples, more precisely, a non-proportional number of controls versus the number of cases, has little effect on the estimation procedure as well as sample size and marker density beyond some threshold values. Moreover, when compared with some other recent methods under the same assumptions, the CCL approach proves to be competitive.

15.
In this paper, we show how to construct the genealogy of a sample of genes for a large class of models with selection and mutation. Each gene corresponds to a single locus at which there is no recombination. The genealogy of the sample is embedded in a graph which we call the ancestral selection graph. This graph contains all the information about the ancestry; it is the analogue of Kingman's coalescent process which arises in the case with no selection. The ancestral selection graph can be easily simulated and we outline an algorithm for simulating samples. The main goal is to analyze the ancestral selection graph and to compare it to Kingman's coalescent process. In the case of no mutation, we find that the distribution of the time to the most recent common ancestor does not depend on the selection coefficient and hence is the same as in the neutral case. When the mutation rate is positive, we give a procedure for computing the probability that two individuals in a sample are identical by descent and the Laplace transform of the time to the most recent common ancestor of a sample of two individuals; we evaluate the first two terms of their respective power series in terms of the selection coefficient. The probability of identity by descent depends on both the selection coefficient and the mutation rate and is different from the analogous expression in the neutral case. The Laplace transform does not have a linear correction term in the selection coefficient. We also provide a recursion formula that can be used to approximate the probability of a given sample by simulating backwards along the sample paths of the ancestral selection graph, a technique developed by Griffiths and Tavaré (1994).

16.
17.
M. K. Uyenoyama. Genetics 1997, 147(3): 1389-1400
A method is proposed for characterizing the structure of genealogies among alleles that regulate self-incompatibility in flowering plants. Expected distributions of ratios of divergence times among alleles, scaled by functions of allele number, were generated by numerical simulation. These distributions appeared relatively insensitive to the particular parameter values assigned in the simulations over a fourfold range in effective population size and a 100-fold range in mutation rate. Generalized least-squares estimates of the scaled indices were obtained from genealogies reconstructed from nucleotide sequences of self-incompatibility alleles from natural populations of two solanaceous species. Comparison of the observed indices to the expected distributions generated by numerical simulation indicated that the allelic genealogy of one species appeared consistent with the symmetric balancing selection generated by self-incompatibility. However, the allelic genealogy of the second species showed unusually long terminal branches, suggesting the operation of additional evolutionary processes.

18.
We develop a Bayesian simulation based approach for determining the sample size required for estimating a binomial probability and the difference between two binomial probabilities where we allow for dependence between two fallible diagnostic procedures. Examples include estimating the prevalence of disease in a single population based on results from two imperfect diagnostic tests applied to sampled individuals, or surveys designed to compare the prevalences of two populations using diagnostic outcomes that are subject to misclassification. We propose a two stage procedure in which the tests are initially assumed to be independent conditional on true disease status (i.e. conditionally independent). An interval based sample size determination scheme is performed under this assumption and data are collected and used to test the conditional independence assumption. If the data reveal the diagnostic tests to be conditionally dependent, structure is added to the model to account for dependence and the sample size routine is repeated in order to properly satisfy the criterion under the correct model. We also examine the impact on required sample size when adding an extra heterogeneous population to a study.

19.
It is shown how the mean ancestral times at one locus are affected in a two-locus model with recombination when information is given regarding the number of segregating sites at another locus. For samples of n genes, recursive equations are derived that describe precisely the evolution of the time-depth of such a linked genealogy. Exact numerical solutions and Markov chain Monte Carlo simulations are discussed and compared. The dependence of some properties of a singleton mutation on waiting times between events in the two-locus genealogy is quantified and illustrates the effect of recombination on these properties. The following cases are presented: (1) the distribution of the number of mutant genes in a sample arising from a singleton mutation; (2) the probability that an allele observed in a genes of a sample of size n is the ancestral type (the oldest); (3) the expectation and variance of the age of a mutant having b copies in a sample of n genes. Received: 1 September 2000 / Revised version: 1 October 2001 / Published online: 8 May 2002

20.
How often are polymorphic restriction sites due to a single mutation?   Cited by: 2 (self-citations: 0, citations by others: 2)
An approximate expression is obtained for the probability that a restriction site, which is polymorphic in a random sample, is a site at which two or more mutations have occurred in the descent to the sample from the most recent common ancestor of the sample. The analysis is based on the assumption that the population from which the sample is obtained is at equilibrium under a selectively neutral Wright-Fisher model. Monte Carlo simulations show that the approximation is quite accurate. For commonly observed levels of genetic variation in humans and in natural populations of Drosophila, it is found that multiple mutations would occur at 5 to 10 percent of polymorphic restriction sites assuming that six-cutter enzymes are used on samples of size 50 to 100. Simulations are also used to investigate the bias and mean square error of four estimators of 4Nu, where N is the population size and u is the neutral mutation rate per nucleotide site. Two of the estimators are biased by approximately 20 percent when levels of variation are similar to those which have been observed in natural populations of Drosophila.
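The abstract does not name its four estimators of 4Nu. As a hedged illustration only, the sketch below computes two classical per-site estimators from aligned nucleotide sequences: one based on the number of segregating sites (Watterson's estimator) and one based on the average number of pairwise differences. These may or may not be among the estimators studied in the paper, they are applied here to sequence data rather than restriction-site data, and the function names and toy alignment are hypothetical.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns that contain more than one distinct nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Per-site estimate S / (a_n * L), with a_n = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n / len(seqs[0])


def pairwise_theta(seqs):
    """Per-site estimate: mean number of pairwise differences divided by L."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs) / len(seqs[0])


# Toy alignment of four equal-length sequences (made up for illustration).
sample = ["AAAA", "AAAT", "AATT", "AAAA"]
print(segregating_sites(sample), watterson_theta(sample), pairwise_theta(sample))
```

Both estimators assume each polymorphic site reflects a single mutation, which is exactly the assumption whose failure rate the abstract quantifies.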
