Similar articles (20 results)
1.
The sample frequency spectrum of a segregating site is the probability distribution of a sample of alleles from a genetic locus, conditional on observing the sample to be polymorphic. This distribution is widely used in population genetic inferences, including statistical tests of neutrality in which a skew in the observed frequency spectrum across independent sites is taken as a signature of departure from neutral evolution. Theoretical aspects of the frequency spectrum have been well studied and several interesting results are available, but they are usually under the assumption that a site has undergone at most one mutation event in the history of the sample. Here, we extend previous theoretical results by allowing for at most two mutation events per site, under a general finite allele model in which the mutation rate is independent of current allelic state but the transition matrix is otherwise completely arbitrary. Our results apply to both nested and nonnested mutations. Only the former has been addressed previously, whereas here we show it is the latter that is more likely to be observed except for very small sample sizes. Further, for any mutation transition matrix, we obtain the joint sample frequency spectrum of the two mutant alleles at a triallelic site, and derive a closed-form formula for the expected age of the younger of the two mutations given their frequencies in the population. Several large-scale resequencing projects for various species are presently under way and the resulting data will include some triallelic polymorphisms. The theoretical results described in this paper should prove useful in population genomic analyses of such data.
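
As a point of reference for the single-mutation case that this abstract extends, the following is a minimal sketch of the classical baseline, not of the paper's two-mutation results. It assumes the standard neutral model with constant population size and at most one mutation per site (assumptions brought in here, not stated in the abstract), under which the probability that the derived allele appears i times in a sample of n, given that the site is polymorphic, is proportional to 1/i. The function name is hypothetical.

```python
from fractions import Fraction


def neutral_sample_frequency_spectrum(n):
    """Return {i: P(derived allele count = i | site is polymorphic)} for i = 1..n-1.

    Assumes the standard neutral model with at most one mutation per site,
    where the expected spectrum is proportional to 1/i.
    """
    if n < 2:
        raise ValueError("need a sample of at least two alleles")
    # Normalizing constant: the (n-1)-th harmonic number.
    harmonic = sum(Fraction(1, j) for j in range(1, n))
    return {i: Fraction(1, i) / harmonic for i in range(1, n)}


spectrum = neutral_sample_frequency_spectrum(5)
for count, prob in spectrum.items():
    print(count, float(prob))
```

Exact fractions are used so that the probabilities can be checked to sum to one without rounding error; a skew away from this 1/i shape is what the neutrality tests mentioned in the abstract look for.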

2.
The Coalescent Process in Models with Selection
N. L. Kaplan, T. Darden, R. R. Hudson. Genetics 1988, 120(3): 819-829
Statistical properties of the process describing the genealogical history of a random sample of genes are obtained for a class of population genetics models with selection. For models with selection, in contrast to models without selection, the distribution of this process, the coalescent process, depends on the distribution of the frequencies of alleles in the ancestral generations. If the ancestral frequency process can be approximated by a diffusion, then the mean and the variance of the number of segregating sites due to selectively neutral mutations in random samples can be numerically calculated. The calculations are greatly simplified if the frequencies of the alleles are tightly regulated. If the mutation rates between alleles maintained by balancing selection are low, then the number of selectively neutral segregating sites in a random sample of genes is expected to substantially exceed the number predicted under a neutral model.

3.
A model of selection involving two selectively equivalent classes of alleles at a locus is considered. One class consists of normal alleles A1, A2, A3, ...; the other class consists of detrimental alleles a1, a2, a3, .... Mutation within and between allelic classes can occur without restriction, but selection operates in such a way as to maintain an approximately constant overall frequency of A-type and a-type alleles. [...] is derived, and it is shown that the distribution of allele frequencies in a sample of detrimental alleles depends on the forward (A to a) mutation rate but not on the selection coefficient, degree of dominance, or mutation rate among a-type alleles. Recurrent mutation therefore generates allelic multiplicity among detrimental alleles, and this is discussed in the context of clinical heterogeneity in simple Mendelian disorders.

4.
A formula is derived for the probability that two genes taken at random from the same locus in two populations isolated at time t ago are of the same allelic type. The model assumed is a neutral one where there are possibly different mutation rates between different alleles. Inequalities are derived for this probability. A particular result is that for a fixed overall mutation rate, the probability is least for the infinite alleles model. Inequalities and approximations are found for Nei's genetic identity at one locus when mutation rates vary, and also for the identity across loci when the overall mutation rates per locus vary. Genetic identity at the molecular level is considered and a probability generating function found for the number of segregating sites between two randomly chosen gametes from two divergent populations, under various models.

5.
The extent to which natural selection shapes diversity within populations is a key question for population genetics. Thus, there is considerable interest in quantifying the strength of selection. A full likelihood approach for inference about selection at a single site within an otherwise neutral fully linked sequence of sites is described here. A coalescent model of evolution is used to model the ancestry of a sample of DNA sequences which have the selected site segregating. The mutation model, for the selected and neutral sites, is the infinitely many-sites model where there is no back or parallel mutation at sites. A unique perfect phylogeny, a gene tree, can be constructed from the configuration of mutations on the sample sequences under this model of mutation. The approach is general and can be used for any bi-allelic selection scheme. Selection is incorporated through modelling the frequency of the selected and neutral allelic classes stochastically back in time, then using a subdivided population model considering the population frequencies through time as variable population sizes. An importance sampling algorithm is then used to explore over coalescent tree space consistent with the data. The method is applied to a simulated data set and the gene tree presented in Verrelli et al. (2002).

6.
A formula is obtained for the probability that two genes at a single locus, sampled at random from a population at time t, are of particular types. The model assumed is a diffusion approximation to a neutral Wright-Fisher model in which mutation is general and not necessarily symmetric. An example is given of a population in which one allele has a high mutation rate, and the others have an equal, low mutation rate. The matrix Q, with elements given by the probability of sampling two alleles of particular types, is calculated exactly and approximately for this case. A formula is given for the distribution of the number of segregating sites occurring in two randomly sampled finite sequences of completely linked sites, with general mutation at a site and identical mutation structure between sites.

7.
In a population intended for breeding and selection, questions of interest relative to a specific segregating QTL are the variance it generates in the population, and the number and effects of its alleles. One approach to address these questions is to extract several inbreds from the population and use them to generate multiple mapping families. Given random sampling of parents, sampling strategy may be an important factor determining the power of the analysis and its accuracy in estimating QTL variance and allelic number. We describe appropriate multiple-family QTL mapping methodology and apply it to simulated data sets to determine optimal sampling strategies in terms of family number versus family size. Genomes were simulated with seven chromosomes, on which 107 markers and six QTL were distributed. The total heritability was 0.60. Two to ten alleles were segregating at each QTL. Sampling strategies ranged from sampling two inbreds and generating a single family of 600 progeny to sampling 40 inbreds and generating 40 families of 15 progeny each. Strategies involving only one to five families were subject to variation due to the sampling of inbred parents. For QTL where more than two alleles were segregating, these strategies did not sample QTL alleles representative of the original population. Conversely, strategies involving 30 or more parents were subject to variation due to sampling of QTL genotypes within the small families obtained. Given these constraints, greatest QTL detection power was obtained for strategies involving five to ten mapping families. The most accurate estimation of the variance generated by the QTL, however, was obtained with strategies involving 20 or more families. Finally, strategies with an intermediate number of families best estimated the number of QTL alleles. We conclude that no overall optimal sampling strategy exists but that the strategy adopted must depend on the objective. Communicated by P. Langridge.

8.
The sampling theory for the infinite site model taking into account the phylogenetic relationship between the alleles is developed for those cases in which two or three alleles are observed in the sample. From this theory a maximum likelihood estimate of θ = 4Nμ can be obtained. Unlike the maximum likelihood estimate of θ based on the infinite allele model or the number of segregating sites, this estimate of θ is a function of the frequencies of the alleles. This method is used to estimate θ for mitochondrial DNA in Drosophila melanogaster and D. virilis from data obtained by Shah and Langley (1979. Nature (London) 281, 696–699) using restriction endonucleases.

9.
Y. X. Fu. Genetics 1997, 146(4): 1489-1499
A coalescent theory for a sample of DNA sequences from a partially selfing diploid population and an algorithm for simulating such samples are developed in this article. Approximate formulas are given for the expectation and the variance of the number of segregating sites in a sample of k sequences from n individuals. Several new estimators of the important parameters θ = 4Nμ and the selfing rate s, where N and μ are, respectively, the effective population size and the mutation rate per sequence per generation, are proposed and their sampling properties are studied.

10.
DNA typing offers a unique opportunity to identify individuals for medical and forensic purposes. Probabilistic inference regarding the chance occurrence of a match between the DNA type of an evidentiary sample and that of an accused suspect, however, requires reliable estimation of genotype and allele frequencies in the population. Although population-based data on DNA typing at several hypervariable loci are being accumulated at various laboratories, a rigorous treatment of the sample size needed for such purposes has not been made from population genetic considerations. It is shown here that the loci that are potentially most useful for forensic identification of individuals have the intrinsic property that they involve a large number of segregating alleles, and a great majority of these alleles are rare. As a consequence, because of the large number of possible genotypes at the hypervariable loci that offer the maximum potential for individualization, the sample size needed to observe all possible genotypes in a sample is large. In fact, the size is so large that even if such a huge number of individuals could be sampled, it could not be guaranteed that such a sample was drawn from a single homogeneous population. Therefore adequate estimation of genotypic probabilities must be based on allele frequencies, and the sample size needed to represent all possible alleles is far more reasonable. Further economization of sample size is possible if one wants to have representation of only the frequent alleles in the sample, so that the rare allele frequencies can be approximated by an upper bound for forensic applications.

11.
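Detection of DPYD allele frequencies in a test population using a gene chip
Dihydropyrimidine dehydrogenase (DPD), encoded by the dihydropyrimidine dehydrogenase gene (DPYD), is the main rate-limiting enzyme in the metabolism of fluoropyrimidine antitumor drugs. Its activity varies markedly among individuals, which affects both the efficacy and the toxic side effects of these drugs. Most variant alleles encoding enzymes with low or no activity result from single nucleotide polymorphisms (SNPs) in the gene, and detecting these SNPs is the basis for predicting a patient's response to the drugs and for individualizing dosing regimens. A gene chip for detecting the alleles defined by six known SNPs in DPYD (DPYD*2, *3, *4, *5, *9, *12) was prepared and optimized, and genotyping criteria for the chip were established. The chip was then used to determine the frequency of variant DPYD alleles in cancer patients (n = 112), patients with kidney disease (n = 83), and healthy individuals (n = 45). In the test population, the mean frequencies of the variant alleles DPYD*5 and DPYD*9 were 32.08% and 11.25%, respectively; the variant alleles DPYD*2, *3, *4, and *12 were not found. Moreover, the frequencies of these single-base mutations did not differ significantly among cancer patients, kidney disease patients, and healthy individuals, or between male and female cancer patients, indicating no significant association with disease occurrence or sex. Genotyping results for 20 samples were verified by direct sequencing; the chip results agreed with direct sequencing in 19 cases. The variant alleles DPYD*5 and DPYD*9 occur at relatively high frequency in the test population, and the gene chip allows their rapid and accurate detection.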

12.
Balancing selection is common on many defense genes, but it has rarely been reported for immune effector proteins such as antimicrobial peptides (AMPs). We describe genetic diversity at a brevinin-1 AMP locus in three species of leopard frogs (Rana pipiens, Rana blairi, and Rana palustris). Several highly divergent allelic lineages are segregating at this locus. That this unusual pattern results from balancing selection is demonstrated by multiple lines of evidence, including a ratio of nonsynonymous/synonymous polymorphism significantly higher than 1, the ZnS test, incongruence between the number of segregating sites and haplotype diversity, and significant Tajima's D values. Our data are more consistent with a model of fluctuating selection in which alleles change frequencies over time than with a model of stable balancing selection such as overdominance. Evidence for fluctuating selection includes skewed allele frequencies, low levels of synonymous variation, nonneutral values of Tajima's D within allelic lineages, an inverse relationship between the frequency of an allelic lineage and its degree of polymorphism, and divergent allele frequencies among populations. AMP loci could be important sites of adaptive genetic diversity, with consequences for host-pathogen coevolution and the ability of species to resist disease epidemics.

13.
Evolutionary Relationship of DNA Sequences in Finite Populations
Fumio Tajima. Genetics 1983, 105(2): 437-460
With the aim of analyzing and interpreting data on DNA polymorphism obtained by DNA sequencing or restriction enzyme technique, a mathematical theory on the expected evolutionary relationship among DNA sequences (nucleons) sampled is developed under the assumption that the evolutionary change of nucleons is determined solely by mutation and random genetic drift. The statistical property of the number of nucleotide differences between randomly chosen nucleons and that of heterozygosity or nucleon diversity is investigated using this theory. These studies indicate that the estimates of the average number of nucleotide differences and nucleon diversity have a large variance, and a large part of this variance is due to stochastic factors. Therefore, increasing sample size does not help reduce the variance significantly. The distribution of sample allele (nucleomorph) frequencies is also studied, and it is shown that a small number of samples are sufficient in order to know the distribution pattern.
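
The "average number of nucleotide differences" discussed in this abstract can be illustrated with a short sketch. It assumes aligned, equal-length sequences with no gaps or missing data, and it computes only the sample statistic, not the variance results of the paper. The function name and the toy sequences are hypothetical.

```python
from itertools import combinations


def mean_pairwise_differences(sequences):
    """Average number of nucleotide differences over all pairs of sequences.

    Assumes the sequences are already aligned and of equal length.
    """
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must have equal length")
    pairs = list(combinations(sequences, 2))
    # Count mismatching positions for each pair, then average over pairs.
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


toy_sample = ["AAAA", "AAAT", "AATT"]  # illustrative data, not from the paper
print(mean_pairwise_differences(toy_sample))
```

For the toy sample the three pairs differ at 1, 2, and 1 positions, so the average is 4/3.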

14.
Michael Lynch. Genetics 2009, 182(1): 295-301
A new generation of high-throughput sequencing strategies will soon lead to the acquisition of high-coverage genomic profiles of hundreds to thousands of individuals within species, generating unprecedented levels of information on the frequencies of nucleotides segregating at individual sites. However, because these new technologies are error prone and yield uneven coverage of alleles in diploid individuals, they also introduce the need for novel methods for analyzing the raw read data. A maximum-likelihood method for the estimation of allele frequencies is developed, eliminating both the need to arbitrarily discard individuals with low coverage and the requirement for an extrinsic measure of the sequence error rate. The resultant estimates are nearly unbiased with asymptotically minimal sampling variance, thereby defining the limits to our ability to estimate population-genetic parameters and providing a logical basis for the optimal design of population-genomic surveys.

15.
Current methods for detecting fluctuating selection require time series data on genotype frequencies. Here, we propose an alternative approach that makes use of DNA polymorphism data from a sample of individuals collected at a single point in time. Our method uses classical diffusion approximations to model temporal fluctuations in the selection coefficients to find the expected distribution of mutation frequencies in the population. Using the Poisson random-field setting we derive the site-frequency spectrum (SFS) for three different models of fluctuating selection. We find that the general effect of fluctuating selection is to produce a more "U"-shaped site-frequency spectrum with an excess of high-frequency derived mutations at the expense of middle-frequency variants. We present likelihood-ratio tests, comparing the fluctuating selection models to the neutral model using SFS data, and use Monte Carlo simulations to assess their power. We find that we have sufficient power to reject a neutral hypothesis using samples on the order of a few hundred SNPs and a sample size of approximately 20 and power to distinguish between selection that varies in time and constant selection for a sample of size 20. We also find that fluctuating selection increases the probability of fixation of selected sites even if, on average, there is no difference in selection among a pair of alleles segregating at the locus. Fluctuating selection will, therefore, lead to an increase in the ratio of divergence to polymorphism similar to that observed under positive directional selection.

16.
Properties of a neutral allele model with intragenic recombination
An infinite-site neutral allele model with crossing-over possible at any of an infinite number of sites is studied. A formula for the variance of the number of segregating sites in a sample of gametes is obtained. An approximate expression for the expected homozygosity is also derived. Simulation results are presented to indicate the accuracy of the approximations. The results concerning the number of segregating sites and the expected homozygosity indicate that a two-locus model and the infinite-site model behave similarly for 4Nu ≤ 2 and r ≤ 5u, where N is the population size, u is the neutral mutation rate, and r is the recombination rate. Simulations of a two-locus model and a four-locus model were also carried out to determine the effect of intragenic recombination on the homozygosity test of Watterson (Genetics 85, 789-814; 88, 405-417) and on the number of unique alleles in a sample. The results indicate that for 4Nu ≤ 2 and r ≤ 10u, the effect of recombination is quite small.

17.
Richard R. Hudson. Genetics 1985, 109(3): 611-631
The sampling distributions of several statistics that measure the association of alleles on gametes (linkage disequilibrium) are estimated under a two-locus neutral infinite allele model using an efficient Monte Carlo method. An often used approximation for the mean squared linkage disequilibrium is shown to be inaccurate unless the proper statistical conditioning is used. The joint distribution of linkage disequilibrium and the allele frequencies in the sample is studied. This estimated joint distribution is sufficient for obtaining an approximate maximum likelihood estimate of C = 4Nc, where N is the population size and c is the recombination rate. It has been suggested that observations of high linkage disequilibrium might be a good basis for rejecting a neutral model in favor of a model in which natural selection maintains genetic variation. It is found that a single sample of chromosomes, examined at two loci, cannot provide sufficient information for such a test if C < 10, because with C this small, very high levels of linkage disequilibrium are not unexpected under the neutral model. In samples of size 50, it is found that, even when C is as large as 50, the distribution of linkage disequilibrium conditional on the allele frequencies is substantially different from the distribution when there is no linkage between the loci. When conditioned on the number of alleles at each locus in the sample, all of the sample statistics examined are nearly independent of θ = 4Nμ, where μ is the neutral mutation rate.
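
To make "association of alleles on gametes" concrete, the sketch below computes two common sample measures of linkage disequilibrium, D and r². It is a simplification: it assumes two biallelic loci with alleles coded 0 and 1, whereas the abstract works under an infinite allele model, and it is not the paper's Monte Carlo method. The function name and the example haplotypes are hypothetical.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for a sample of two-locus haplotypes.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2), alleles coded 0/1.
    D = p_AB - p_A * p_B; r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined when either locus is monomorphic in the sample.
    r_squared = d * d / denom if denom > 0 else float("nan")
    return d, r_squared


complete_association = [(1, 1), (1, 1), (0, 0), (0, 0)]  # illustrative data
print(linkage_disequilibrium(complete_association))
```

With the example haplotypes the two loci are perfectly associated, so D = 0.25 and r² = 1.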

18.
Statistical Properties of a DNA Sample under the Finite-Sites Model
Z. Yang. Genetics 1996, 144(4): 1941-1950
Statistical properties of a DNA sample from a random-mating population of constant size are studied under the finite-sites model. It is assumed that there is no migration and no recombination occurs within the locus. A Markov process model is used for nucleotide substitution, allowing for multiple substitutions at a single site. The evolutionary rates among sites are treated as either constant or variable. The general likelihood calculation using numerical integration involves intensive computation and is feasible for three or four sequences only; it may be used for validating approximate algorithms. Methods are developed to approximate the probability distribution of the number of segregating sites in a random sample of n sequences, with either constant or variable substitution rates across sites. Calculations using parameter estimates obtained for human D-loop mitochondrial DNAs show that among-site rate variation has a major effect on the distribution of the number of segregating sites; the distribution under the finite-sites model with variable rates among sites is quite different from that under the infinite-sites model.

19.
To examine the role of contemporary selection in maintaining significant allele frequency differences at the pantophysin (PanI) locus among populations of the Atlantic cod, Gadus morhua, in northern Norway, we sequenced 127 PanIA alleles sampled from six coastal and two Barents Sea populations. The distributions of variable sites segregating within the PanIA allelic class were then compared among the populations. Significant differences were detected in the overall frequencies of PanIA alleles among populations within coastal and Arctic regions that were similar in magnitude to heterogeneity in the distributions of polymorphic sites segregating within the PanIA allelic class. The differentiation observed at silent sites in the PanIA allelic class contradicts the predicted effects of widescale gene flow and suggests that postsettlement selection acting on cohorts cannot be responsible for the genetic differences described between coastal and Arctic populations. Our results suggest that the marked differences observed between coastal and Arctic populations of G. morhua in northern Norway at the PanI locus reflect the action of recent diversifying selection and that populations throughout the region may be more independent than suggested by previous studies.

20.
Ewens' sampling formula, the probability distribution of a configuration of alleles in a sample of genes under the infinitely-many-alleles model of mutation, is proved by a direct combinatorial argument. The distribution is extended to a model where the population size may vary back in time. The distribution of age-ordered frequencies in the population is also derived in the model, extending the GEM distribution of age-ordered frequencies in a model with a constant-sized population. The genealogy of a rare allele is studied using a combinatorial approach. A connection is explored between the distribution of age-ordered frequencies and ladder indices and heights in a sequence of random variables. In a sample of n genes the connection is with ladder heights and indices in a sequence of draws from an urn containing balls labelled 1,2,...,n; and in the population the connection is with ladder heights and indices in a sequence of independent uniform random variables.
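
For readers unfamiliar with Ewens' sampling formula, a minimal sketch of the constant-population-size case follows; it does not cover the varying-population-size extension described in the abstract. In the standard form, a sample of n genes containing a_j allelic types that each appear exactly j times has probability n! / (θ(θ+1)...(θ+n−1)) × Π_j (θ/j)^(a_j) / a_j!. The function name and argument layout are hypothetical.

```python
from fractions import Fraction
from math import factorial


def ewens_sampling_probability(counts, theta):
    """Probability of an allelic configuration under Ewens' sampling formula.

    counts[j-1] = a_j, the number of allelic types seen exactly j times;
    the sample size is n = sum_j j * a_j. Constant population size is assumed.
    """
    theta = Fraction(theta)
    n = sum(j * a for j, a in enumerate(counts, start=1))
    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = Fraction(1)
    for k in range(n):
        rising *= theta + k
    prob = Fraction(factorial(n)) / rising
    for j, a in enumerate(counts, start=1):
        prob *= (theta / j) ** a / factorial(a)
    return prob


# Sample of n = 3 genes with theta = 1: the three possible configurations.
configs = {"three singletons": [3, 0, 0],
           "one singleton, one pair": [1, 1, 0],
           "one triple": [0, 0, 1]}
for label, counts in configs.items():
    print(label, ewens_sampling_probability(counts, 1))
```

Exact rational arithmetic makes it easy to confirm that the probabilities over all configurations of a given sample size sum to one.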
