Similar articles
Found 20 similar articles (search time: 15 ms)
1.
Estimating the age of the common ancestor of a sample of DNA sequences   Cited by: 10 (self-citations: 3; citations by others: 7)
We present a simple Monte Carlo method for estimating the age of the most recent common ancestor (MRCA) of a sample of DNA sequences. We show that Templeton's (1993) estimator of the age of the MRCA, which is based on the maximum number of nucleotide differences between two sequences in a sample, is inaccurate, and we demonstrate the new method by reanalyzing a sample of DNA sequences from human Y chromosomes and a sample of human Alu sequences.
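
As an illustration of the kind of Monte Carlo computation involved, the sketch below simulates the time to the MRCA under the standard neutral coalescent. This is an assumption chosen for illustration, not the method of the paper: the abstract does not describe its algorithm, and an actual estimator would condition on the observed sequence data. The function names `simulate_tmrca` and `mean_tmrca` are hypothetical.

```python
import random


def simulate_tmrca(sample_size: int, rng: random.Random) -> float:
    """Draw one time to the MRCA under the standard neutral coalescent.

    Time is measured in coalescent units, the scale on which each pair
    of lineages coalesces at rate 1.
    """
    if sample_size < 2:
        raise ValueError("need at least two sequences")
    total = 0.0
    for k in range(sample_size, 1, -1):
        # With k lineages, the waiting time to the next coalescence is
        # exponential with rate k * (k - 1) / 2.
        total += rng.expovariate(k * (k - 1) / 2)
    return total


def mean_tmrca(sample_size: int, replicates: int, seed: int = 0) -> float:
    """Average the simulated MRCA age over independent replicates."""
    rng = random.Random(seed)
    draws = (simulate_tmrca(sample_size, rng) for _ in range(replicates))
    return sum(draws) / replicates


if __name__ == "__main__":
    # The theoretical mean is 2 * (1 - 1/n), i.e. 1.6 for n = 5.
    print(mean_tmrca(5, 20000))
```

The simulated mean can be checked against the known expectation 2(1 - 1/n) for a sample of n sequences, which gives a quick sanity test before any data-dependent step is added.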

2.
Statistical Properties of a DNA Sample under the Finite-Sites Model   Cited by: 1 (self-citations: 0; citations by others: 1)
Z. Yang. Genetics, 1996, 144(4): 1941-1950
Statistical properties of a DNA sample from a random-mating population of constant size are studied under the finite-sites model. It is assumed that there is no migration and that no recombination occurs within the locus. A Markov process model is used for nucleotide substitution, allowing for multiple substitutions at a single site. The evolutionary rates among sites are treated as either constant or variable. The general likelihood calculation using numerical integration involves intensive computation and is feasible for three or four sequences only; it may be used for validating approximate algorithms. Methods are developed to approximate the probability distribution of the number of segregating sites in a random sample of n sequences, with either constant or variable substitution rates across sites. Calculations using parameter estimates obtained for human D-loop mitochondrial DNAs show that among-site rate variation has a major effect on the distribution of the number of segregating sites; the distribution under the finite-sites model with variable rates among sites is quite different from that under the infinite-sites model.

3.
C. Wiuf, J. Hein. Genetics, 1999, 151(3): 1217-1228
In this article we discuss the ancestry of sequences sampled from the coalescent with recombination with constant population size 2N. We have studied a number of variables based on simulations of sample histories, and some analytical results are derived. Consider the leftmost nucleotide in the sequences. We show that the number of nucleotides sharing a most recent common ancestor (MRCA) with the leftmost nucleotide is approximately log(1 + 4NLr)/(4Nr) when two sequences are compared, where L denotes sequence length in nucleotides and r the recombination rate between any two neighboring nucleotides per generation. For larger samples, the number of nucleotides sharing an MRCA with the leftmost nucleotide decreases and becomes almost independent of 4NLr. Further, we show that a segment of the sequences sharing an MRCA consists on average of 3/(8Nr) nucleotides when two sequences are compared, and that this decreases toward 1/(4Nr) nucleotides when the whole population is sampled. A measure of the correlation between the genealogies of two nucleotides on two sequences is introduced. We show analytically that even when the nucleotides are separated by a large genetic distance but share an MRCA, the genealogies will show only little correlation. This is surprising, because the time until the two nucleotides shared an MRCA is reciprocal to the genetic distance. Simulations show that the mean time until all positions in the sample have found an MRCA increases logarithmically with increasing sequence length and is considerably lower than a theoretically predicted upper bound. On the basis of simulations, it turns out that important properties of the coalescent with recombination of the whole population are reflected in the properties of a small sample.

4.
The number of segregating sites provides an indicator of the degree of DNA sequence variation that is present in a sample, and has been of great interest to the biological, pharmaceutical and medical professions. In this paper, we first provide linear- and expected-sublinear-time algorithms for finding all the segregating sites of a given set of DNA sequences. We also describe a data structure for tracking segregating sites in a set of sequences, such that every time the set is updated with the insertion of a new sequence or removal of an existing one, the segregating sites are updated accordingly without the need to re-scan the entire set of sequences.
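
A minimal sketch of the baseline task, assuming the sequences are already aligned to equal length: a single scan that is linear in the total input size. It is not the paper's expected-sublinear algorithm or its update-tracking data structure, neither of which is specified in the abstract; the function name `segregating_sites` is hypothetical.

```python
from typing import List, Sequence


def segregating_sites(seqs: Sequence[str]) -> List[int]:
    """Return the 0-based column indices at which aligned sequences differ."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    reference = seqs[0]
    # A column segregates as soon as any sequence differs from the first one;
    # any() stops scanning that column at the first mismatch.
    return [
        j for j in range(length)
        if any(s[j] != reference[j] for s in seqs)
    ]


if __name__ == "__main__":
    sample = ["ACGTAC", "ACGTTC", "ACCTAC", "ACGTAC"]
    print(segregating_sites(sample))  # columns 2 and 4 vary in this sample
```

Comparing every sequence against the first is sufficient because a column is monomorphic exactly when all sequences agree with the first one at that column.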

5.
Within-population variation at the DNA level will rarely be studied by sequencing of loci of randomly chosen individuals. Instead, individuals will usually be chosen for sequencing based on some knowledge of their genotype. Data collected in this way require new sampling theory. Motivated by these observations, we have examined the sampling properties of a finite population model with two mutation processes and with no selection or recombination. One mutation process generates new alleles according to an infinite-alleles model, and the other generates polymorphisms at sites according to an infinite-sites model. A sample of n genes is considered. The stationary distribution of the number of segregating sites in a subsample from one of the allelic classes in the sample conditional on the allelic configuration of the sample is studied. A recursive scheme is developed to compute the moments of this distribution, and it is shown that the distribution is functionally independent of the number of additional alleles in the sample and their respective frequencies in the sample. For the case in which the sample contains only two alleles, the distribution of the number of segregating sites in a subsample containing both alleles conditional on the sample frequencies of the alleles is studied. The results are applied to the analysis of DNA sequences of two alleles found at the Adh locus of Drosophila melanogaster. No significant departure from the neutral model is detected.

6.
R. Nielsen, D. M. Weinreich. Genetics, 1999, 153(1): 497-506
McDonald/Kreitman tests performed on animal mtDNA consistently reveal significant deviations from strict neutrality in the direction of an excess number of polymorphic nonsynonymous sites, which is consistent with purifying selection acting on nonsynonymous sites. We show that under models of recurrent neutral and deleterious mutations, the mean age of segregating neutral mutations is greater than the mean age of segregating selected mutations, even in the absence of recombination. We develop a test of the hypothesis that the mean age of segregating synonymous mutations equals the mean age of segregating nonsynonymous mutations in a sample of DNA sequences. The power of this age-of-mutation test and the power of the McDonald/Kreitman test are explored by computer simulations. We apply the new test to 25 previously published mitochondrial data sets and find weak evidence for selection against nonsynonymous mutations.

7.
The frequency distribution of pairwise differences between sequences of mtDNA has recently been used to estimate the size of human populations before and after a hypothetical episode of rapid population growth and the time at which the population grew. To test the internal consistency of this method, we used three different sets of human mtDNA data and the corresponding demographic parameters estimated from the distribution of pairwise differences to determine by simulation the expected number of segregating sites, S, and its empirical distribution. The results indicate that the observed values of S are significantly lower than expected in two of three cases under the assumption of the infinite-sites model. Further simulations in which mutations were allowed to occur more than once at the same site and in which there was variation in mutation rate among sites show that the expected number of segregating sites can be much lower than under the infinite-sites assumption. Nevertheless, the observed value of S is still significantly different from the value expected under the expansion hypothesis in two of three cases.
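
The frequency distribution of pairwise differences referred to above can be tabulated directly from an alignment. The following is a minimal sketch assuming equal-length aligned sequences; the function name `pairwise_difference_counts` is hypothetical, and the demographic fitting and simulation steps of the study are not reproduced.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def pairwise_difference_counts(seqs: Sequence[str]) -> Counter:
    """Map each number of differences to the number of sequence pairs showing it."""
    if seqs and any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    counts: Counter = Counter()
    for a, b in combinations(seqs, 2):
        # Hamming distance between the two aligned sequences.
        counts[sum(x != y for x, y in zip(a, b))] += 1
    return counts


if __name__ == "__main__":
    sample = ["ACGT", "ACGA", "TCGA"]
    # Two pairs differ at one site and one pair differs at two sites.
    print(pairwise_difference_counts(sample))
```

Dividing each count by the total number of pairs, n(n - 1)/2, turns the table into the empirical frequency distribution.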

8.
In human populations, a null allele having several nucleotide differences from the wild-type allele is segregating at the FUT2 locus (the ABO-Secretor locus) encoding α(1,2)fucosyltransferase. To estimate the age of the most recent common ancestor (MRCA) of these two alleles, we sequenced FUT2 homologues from chimpanzee, gorilla, orangutan, and green monkey. Since we did not detect acceleration or any heterogeneity in the substitution rate at this locus among these species, the age of the MRCA was estimated to be around 3 MYA, assuming the divergence time of human and chimpanzee to be 5 MYA. We developed a simple test to examine whether or not the old age of the MRCA of FUT2 is consistent with that expected for two divergent neutral alleles sampled from a random mating population. An application of the test to the data at FUT2 indicated that the age of the MRCA is too old to be explained by the simple neutral assumptions, although our test depends on accurate estimation of the divergence time of human and chimpanzee in units of twice the human population size. Various possibilities including balancing selection are discussed to explain this old age of the MRCA. Received: 9 May 1999 / Accepted: 20 September 1999

9.
The distribution of the number of segregating sites among randomly sampled DNA sequences from a geographically structured population is studied. We assume the infinitely-many-sites model of neutral genes and no recombination. Employing the genealogical process, we derive an equation for the generating function of the distribution of the number of segregating sites. First we study the strong-migration limit and prove that the distribution converges to that for a panmictic population. We also study the case of two sampled DNA sequences in the d-dimensional torus model with homogeneous migration. Received 13 July 1995; received in revised form 21 April 1997

10.
In population genetics, under a neutral Wright-Fisher model, the scaling parameter θ = 4Nμ represents twice the average number of new mutants per generation, where N is the effective population size and μ is the mutation rate per sequence per generation. Watterson proposed a consistent estimator of this parameter based on the number of segregating sites in a sample of nucleotide sequences. We study the distribution of the Watterson estimator. Letting the sample size grow, we establish a central limit theorem for the Watterson estimator: it is asymptotically normal, with a slow rate of convergence. We then prove the asymptotic efficiency of this estimator. In the second part, we illustrate the slow rate of convergence found in the central limit theorem. To this end, by studying the confidence intervals, we show that the asymptotic Gaussian distribution is not a good approximation for the Watterson estimator.
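
For reference, the Watterson estimator divides the number of segregating sites S by the harmonic sum a_n = 1 + 1/2 + ... + 1/(n - 1), where n is the sample size. A minimal sketch follows; the function name `watterson_theta` is hypothetical, and the estimate is per sequence, matching the per-sequence mutation rate used in the abstract.

```python
def watterson_theta(num_segregating: int, sample_size: int) -> float:
    """Watterson's estimate of theta = 4 * N * mu from S and the sample size n."""
    if sample_size < 2:
        raise ValueError("need at least two sequences")
    if num_segregating < 0:
        raise ValueError("the number of segregating sites cannot be negative")
    # a_n = sum_{i=1}^{n-1} 1/i, the expected tree length per unit of theta.
    a_n = sum(1.0 / i for i in range(1, sample_size))
    return num_segregating / a_n


if __name__ == "__main__":
    # With n = 4, a_n = 1 + 1/2 + 1/3 = 11/6, so S = 11 gives roughly 6.
    print(watterson_theta(11, 4))
```

Because a_n grows only logarithmically in n, adding sequences reduces the variance of the estimate slowly, which is in line with the slow convergence reported in the abstract.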

11.
A formula is obtained for the probability that two genes at a single locus, sampled at random from a population at time t, are of particular types. The model assumed is a diffusion approximation to a neutral Wright-Fisher model in which mutation is general and not necessarily symmetric. An example is given of a population in which one allele has a high mutation rate, and the others have an equal, low mutation rate. The matrix Q, with elements given by the probability of sampling two alleles of particular types, is calculated exactly and approximately for this case. A formula is given for the distribution of the number of segregating sites occurring in two randomly sampled finite sequences of completely linked sites, with general mutation at a site and identical mutation structure between sites.

12.
It is known that under neutral mutation at a known mutation rate a sample of nucleotide sequences, within which there is assumed to be no recombination, allows estimation of the effective size of an isolated population. This paper investigates the case of very long sequences, where each pair of sequences allows a precise estimate of the divergence time of those two gene copies. The average divergence time of all pairs of copies estimates twice the effective population number and an estimate can also be derived from the number of segregating sites. One can alternatively estimate the genealogy of the copies. This paper shows how a maximum likelihood estimate of the effective population number can be derived from such a genealogical tree. The pairwise and the segregating sites estimates are shown to be much less efficient than this maximum likelihood estimate, and this is verified by computer simulation. The result implies that there is much to gain by explicitly taking the tree structure of these genealogies into account.

13.
Barcoding is an initiative to define a standard fragment of DNA to be used to assign sequences of unknown origin to existing known species whose sequences are recorded in databases. This is a difficult task when species are closely related and individuals of these species might have more than one origin. Using a previously introduced Bayesian statistical tree-less assignment algorithm based on segregating sites, we examine how it functions in the presence of hidden population subdivision with closely related species using simulations. Not surprisingly, adding samples to the database from a greater proportion of the species range leads to a consistently higher number of accurate results. Without such samples, query sequences that originate from outside of the sampled range are easily misinterpreted as coming from other species. However, we show that even the addition of a single sample from a different subpopulation is sufficient to greatly increase the probability of placement of unknown queries into the correct species group. This study highlights the importance of broad sampling, even with five reference samples per species, in the creation of a reference database.

14.
On the number of segregating sites in genetical models without recombination   Cited by: 51 (self-citations: 0; citations by others: 51)
The distribution is obtained for the number of segregating sites observed in a sample from a population which is subject to recurring, new, mutations but not subject to recombination. After allowance is made for the different effective population sizes, the results apply approximately to three population models, due to Wright, Burrows and Cockerham, and Moran. Included as extreme special cases are the distributions of the number of segregating sites in the whole population and of the number of heterozygous sites in a diploid individual. Some results of Fisher, Haldane, Kimura, and Ewens concerning the means of the distributions for different models are confirmed, but the variances, and the distributions themselves, are new.

15.
In order to analyze the pattern of DNA polymorphism in detail, we have developed a simple method using a new statistic θ(i), which estimates 4Nμ from the number of segregating sites whose allelic nucleotide frequency is i/n among n DNA sequences, where N is the effective population size and μ is the mutation rate per generation per nucleotide site. Under the assumption that mutations are selectively neutral and population size is constant, the expectation of θ(i) is equal to that of θ, which estimates 4Nμ from the number of segregating sites, so that the distribution of θ(i) is flat. Therefore, the departure of the distribution of θ(i) from the horizontal line, which represents the value of θ, reflects change in population size and natural selection. Results of the coalescent simulation show that the distributions of θ(i) in the populations which experienced expansion and reduction are U-shaped and upside-down U-shaped, respectively. The distributions of θ(i) in some populations that experienced a bottleneck are W-shaped. Furthermore, we have applied this method to the SNP data in the International HapMap Project. Results of data analyses show that the distributions of θ(i) in the CEU (European), CHB and JPT (Asian) populations are different from that in the YRI population (African). From these results of data analyses in nuclear DNA and the pattern of polymorphism in human mitochondrial DNA already known, we infer that the CEU, CHB and JPT populations experienced a bottleneck.

16.
A simple genealogical structure is found for a general finite island model of population subdivision. The model allows for variation in the sizes of demes, in contributions to the migrant pool, and in the fraction of each deme that is replaced by migrants every generation. The ancestry of a sample of non-recombining DNA sequences has a simple structure when the sample size is much smaller than the total number of demes in the population. This allows an expression for the probability distribution of the number of segregating sites in the sample to be derived under the infinite-sites mutation model. It also yields easily computed estimators of the migration parameter for each deme in a multi-deme sample. The genealogical process is such that the lineages ancestral to the sample tend to accumulate in demes with low migration rates and/or which contribute disproportionately to the migrant pool. In addition, common ancestor or coalescent events tend to occur in demes of small size. This provides a framework for understanding the determinants of the effective size of the population, and leads to an expression for the probability that the root of a genealogy occurs in a particular geographic region, or among a particular set of demes.

17.
The sample frequency spectrum of a segregating site is the probability distribution of a sample of alleles from a genetic locus, conditional on observing the sample to be polymorphic. This distribution is widely used in population genetic inferences, including statistical tests of neutrality in which a skew in the observed frequency spectrum across independent sites is taken as a signature of departure from neutral evolution. Theoretical aspects of the frequency spectrum have been well studied and several interesting results are available, but they are usually under the assumption that a site has undergone at most one mutation event in the history of the sample. Here, we extend previous theoretical results by allowing for at most two mutation events per site, under a general finite allele model in which the mutation rate is independent of current allelic state but the transition matrix is otherwise completely arbitrary. Our results apply to both nested and nonnested mutations. Only the former has been addressed previously, whereas here we show it is the latter that is more likely to be observed except for very small sample sizes. Further, for any mutation transition matrix, we obtain the joint sample frequency spectrum of the two mutant alleles at a triallelic site, and derive a closed-form formula for the expected age of the younger of the two mutations given their frequencies in the population. Several large-scale resequencing projects for various species are presently under way and the resulting data will include some triallelic polymorphisms. The theoretical results described in this paper should prove useful in population genomic analyses of such data.

18.
J. Hey. Genetics, 1991, 128(4): 831-840
When two samples of DNA sequences are compared, one way in which they may differ is in the presence of fixed differences, which are defined as sites at which all of the sequences in one sample are different from all of the sequences in a second sample. The probability distribution of the number of fixed differences is developed. The theory employs Wright-Fisher genealogies and the infinite sites mutation model. For the case when both samples are drawn randomly from the same population it is found that genealogies permitting fixed differences are very unlikely. Thus the mere presence of fixed differences between samples is statistically significant, even for small samples. The theory is extended to samples from populations that have been separated for some time. The relationship between a simple Poisson distribution of mutations and the distribution of fixed differences is described as a function of the time since populations have been isolated. It is shown how these results may contribute to improved tests of recent balancing or directional selection.
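
The definition of a fixed difference given above translates directly into a per-column check that the two samples share no nucleotide. A minimal sketch, assuming both samples are non-empty and aligned to the same length; the function name `fixed_differences` is hypothetical, and the probability theory of the paper is not reproduced.

```python
from typing import List, Sequence


def fixed_differences(sample_a: Sequence[str], sample_b: Sequence[str]) -> List[int]:
    """Return 0-based columns where every sequence in A differs from every sequence in B."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must contain at least one sequence")
    length = len(sample_a[0])
    if any(len(s) != length for s in list(sample_a) + list(sample_b)):
        raise ValueError("all sequences must be aligned to the same length")
    fixed = []
    for j in range(length):
        states_a = {s[j] for s in sample_a}
        states_b = {s[j] for s in sample_b}
        # The site is a fixed difference when the two samples share no state.
        if states_a.isdisjoint(states_b):
            fixed.append(j)
    return fixed


if __name__ == "__main__":
    first = ["ACGT", "ACGA"]
    second = ["TCGC", "TCGG"]
    print(fixed_differences(first, second))  # columns 0 and 3 share no nucleotide
```

Note that a site can be polymorphic within one or both samples and still count as a fixed difference, as column 3 does in the usage example.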

19.
H. Innan, K. Zhang, P. Marjoram, S. Tavaré, N. A. Rosenberg. Genetics, 2005, 169(3): 1763-1777
Several tests of neutral evolution employ the observed number of segregating sites and properties of the haplotype frequency distribution as summary statistics and use simulations to obtain rejection probabilities. Here we develop a “haplotype configuration test” of neutrality (HCT) based on the full haplotype frequency distribution. To enable exact computation of rejection probabilities for small samples, we derive a recursion under the standard coalescent model for the joint distribution of the haplotype frequencies and the number of segregating sites. For larger samples, we consider simulation-based approaches. The utility of the HCT is demonstrated in simulations of alternative models and in application to data from Drosophila melanogaster.

20.
The evolution of a completely linked diallelic multilocus system of neutral genes in a finite population is studied. A diffusion model incorporating random genetic drift and mutation is used, and recombination is neglected. To begin with, the spectral analysis of the Kolmogorov backward equation for this model is investigated. We apply this to two extreme situations in which the number of sites approaches infinity. One is a De Moivre-Laplace type approximation and the other is a Poisson type approximation. The former is applied to the study of the simultaneous distribution and evolution of a large number of neutral genes. It is applicable to the distribution of a polygenic character controlled by clustered loci on a chromosome, and we show that it differs from the normal distribution on account of random genetic drift and linkage disequilibrium. The latter is applied to the distribution of the number of segregating sites in DNA nucleotide sequences, and the rate of evolution is obtained.
