Similar articles
1.
In this paper, we establish an upper bound for time to convergence to stationarity for the discrete time infinite alleles Moran model. If M is the population size and μ is the mutation rate, this bound gives a cutoff time of log(M μ)/μ generations. The stationary distribution for this process in the case of sampling without replacement is the Ewens sampling formula. We show that the bound for the total variation distance from the generation t distribution to the Ewens sampling formula is well approximated by one of the extreme value distributions, namely, a standard Gumbel distribution. Beginning with the card shuffling examples of Aldous and Diaconis and extending the ideas of Donnelly and Rodrigues for the two allele model, this model adds to the list of Markov chains that show evidence for the cutoff phenomenon. Because of the broad use of infinite alleles models, this cutoff sets the time scale of applicability for statistical tests based on the Ewens sampling formula and other tests of neutrality in a number of population genetic studies.
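
As a rough illustration of the cutoff time quoted in this abstract, the sketch below evaluates log(M μ)/μ for given values of M and μ. It assumes the logarithm is natural, and the function name and the guard on M μ are choices made here for illustration, not taken from the paper.

```python
import math


def cutoff_generations(population_size: float, mutation_rate: float) -> float:
    """Evaluate log(M * mu) / mu, the cutoff time in generations.

    Assumption: 'log' is the natural logarithm.
    """
    scaled = population_size * mutation_rate
    if mutation_rate <= 0 or scaled <= 1.0:
        # Illustrative guard: the expression is only positive when M * mu > 1.
        raise ValueError("need mutation_rate > 0 and M * mu > 1")
    return math.log(scaled) / mutation_rate


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the order of magnitude.
    print(cutoff_generations(1e6, 1e-4))
```

With these hypothetical inputs M μ is 100, so the result is log(100) divided by the mutation rate.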

2.
An estimator for pairwise relatedness using molecular markers (cited by 21; self-citations: 0; citations by others: 21)
Wang J. Genetics, 2002, 160(3): 1203-1215
I propose a new estimator for jointly estimating two-gene and four-gene coefficients of relatedness between individuals from an outbreeding population with data on codominant genetic markers and compare it, by Monte Carlo simulations, to previous ones in precision and accuracy for different distributions of population allele frequencies, numbers of alleles per locus, actual relationships, sample sizes, and proportions of relatives included in samples. In contrast to several previous estimators, the new estimator is well behaved and applies to any number of alleles per locus and any allele frequency distribution. The estimates for two- and four-gene coefficients of relatedness from the new estimator are unbiased irrespective of the sample size and have sampling variances decreasing consistently with an increasing number of alleles per locus to the minimum asymptotic values determined by the variation in identity-by-descent among loci per se, regardless of the actual relationship. The new estimator is also robust for small sample sizes and for unknown relatives being included in samples for estimating allele frequencies. Compared to previous estimators, the new one is generally advantageous, especially for highly polymorphic loci and/or small sample sizes.

3.
Estimating the age of alleles by use of intraallelic variability (cited by 9; self-citations: 6; citations by others: 3)
A method is presented for estimating the age of an allele by use of its frequency and the extent of variation among different copies. The method uses the joint distribution of the number of copies in a population sample and the coalescence times of the intraallelic gene genealogy conditioned on the number of copies. The linear birth-death process is used to approximate the dynamics of a rare allele in a finite population. A maximum-likelihood estimate of the age of the allele is obtained by Monte Carlo integration over the coalescence times. The method is applied to two alleles at the cystic fibrosis (CFTR) locus, deltaF508 and G542X, for which intraallelic variability at three intronic microsatellite loci has been examined. Our results indicate that G542X is somewhat older than deltaF508. Although absolute estimates depend on the mutation rates at the microsatellite loci, our results support the hypothesis that deltaF508 arose < 500 generations (approximately 10,000 years) ago.

4.
In this paper a number of simulation results relating to the theory of neutral alleles are discussed. In particular, results derived from the distribution of Karlin, McGregor, and Ewens for the number and configuration of alleles in a sample of genes from a selectively neutral locus are considered. The problems discussed concern efficiencies of estimation, an approximation to the distribution of a test-statistic for neutrality, and the effect of changes in population size on an index of neutrality.

5.
Lessard S. Genetics, 2007, 177(2): 1249-1254
An exact sampling formula for a Wright-Fisher population of fixed size N under the infinitely many neutral alleles model is deduced. This extends the Ewens formula for the configuration of a random sample to the case where the sample is drawn from a population of small size, that is, without the usual large-N and small-mutation-rate assumption. The formula is used to prove a conjecture ascertaining the validity of a diffusion approximation for the frequency of a mutant-type allele under weak selection in segregation with a wild-type allele in the limit finite-island model, namely, a population that is subdivided into a finite number of demes of size N and that receives an expected fraction m of migrants from a common migrant pool each generation, as the number of demes goes to infinity. This is done by applying the formula to the migrant ancestors of a single deme and sampling their types at random. The proof of the conjecture confirms an analogy between the island model and a random-mating population, but with a different timescale that has implications for estimation procedures.

6.
Models for selectively neutral mutation, in which mutation always yields a new allele, seem always to lead, in the limit of large population size, to a sampling formula first propounded by Ewens in 1972. It is shown that the asymptotic validity of the Ewens formula is equivalent to a certain limiting joint distribution for the allele proportions in the population, arranged in descending order. The familiar diffusion approximations are corollaries of this limiting distribution, and therefore share the apparent robustness of the sampling formula.
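
For readers unfamiliar with the sampling formula this abstract refers to, the sketch below evaluates the Ewens sampling formula in its standard textbook form: for a sample of n genes in which a_j alleles are each represented j times, the probability is n!/[θ(θ+1)...(θ+n-1)] times the product over j of (θ/j)^(a_j)/a_j!. The function name and the input convention (a list of copy numbers per allele) are assumptions made for this example.

```python
from collections import Counter
from math import factorial


def ewens_probability(allele_counts, theta):
    """Probability of an unlabeled allelic configuration under the
    Ewens sampling formula.

    allele_counts: copies observed for each distinct allele, e.g. [3, 1, 1].
    theta: scaled mutation parameter (must be positive).
    """
    if theta <= 0 or any(c < 1 for c in allele_counts):
        raise ValueError("theta must be positive and every count at least 1")
    n = sum(allele_counts)
    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = 1.0
    for i in range(n):
        rising *= theta + i
    prob = factorial(n) / rising
    # a_j = number of alleles represented exactly j times in the sample.
    for j, a_j in Counter(allele_counts).items():
        prob *= (theta / j) ** a_j / factorial(a_j)
    return prob


if __name__ == "__main__":
    # The three possible configurations of a sample of size 3.
    for config in ([3], [2, 1], [1, 1, 1]):
        print(config, ewens_probability(config, theta=1.0))
```

Summing the probabilities over all configurations of a given sample size should give 1, which is a convenient sanity check on any implementation.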

7.
A significant heterozygote deficiency was found for microsatellite locus 20H7 among adult breeding birds in four populations of the oystercatcher (Haematopus ostralegus). Genotype frequencies at seven other loci were in accordance with Hardy–Weinberg equilibrium. Deviations between observed and expected genotype numbers decreased substantially when the data were corrected based on the estimated frequency of a putative null allele at locus 20H7. However, no null homozygotes were observed in the total sample of 378 individuals. The probability that, because of chance effects, null homozygotes were not represented in the sample (n = 230) from the most intensively studied population (Schiermonnikoog) was estimated to be less than 1%. Parent–offspring comparisons from Schiermonnikoog showed that observed genotype numbers in the offspring were in accordance with expected values based on the estimated frequency of the putative null allele in the population. Moreover, a null homozygote was observed among the nestlings. The combined results indicated that a null allele is present at locus 20H7 in oystercatchers and that the inheritance is according to normal Mendelian segregation. If the absence of null homozygotes among adult animals cannot be ascribed to statistical effects, null homozygotes may suffer a selective disadvantage during the juvenile stage.
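
The chance-effect argument in this abstract can be sketched as follows. Assuming Hardy–Weinberg proportions and independent sampling of individuals, a null allele at frequency r gives null homozygotes at frequency r², so the probability of seeing none among n individuals is (1 - r²)^n. The abstract does not report the estimated null allele frequency, so the value used below is hypothetical, and the authors' actual calculation may differ.

```python
def prob_no_null_homozygotes(null_freq: float, sample_size: int) -> float:
    """Probability that a sample contains no null homozygotes.

    Assumes Hardy-Weinberg proportions and independently sampled individuals.
    """
    if not 0.0 <= null_freq <= 1.0 or sample_size < 0:
        raise ValueError("null_freq must be in [0, 1] and sample_size >= 0")
    return (1.0 - null_freq ** 2) ** sample_size


if __name__ == "__main__":
    # Hypothetical null allele frequency; n = 230 is the sample size in the abstract.
    print(prob_no_null_homozygotes(0.15, 230))
```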

8.
We have examined a model of selection (local selection) in which successive favorable alleles enter into a population by displacing a random fraction of each of the preexisting alleles. When the distribution of fitness among newly arising favorable mutations is given by a power law, the distribution of allele frequencies in the population converges to a Poisson-Dirichlet limit, and the sampling distribution of alleles is a Ewens distribution. This property leads to a convenient algorithm for simulating random equilibrium frequencies of alleles within samples. The model can also be interpreted in terms of species abundances when each invading species displaces a random fraction of each preexisting species, or in terms of age structures in populations subjected to random catastrophes.

9.
Richard R. Hudson. Genetics, 1985, 109(3): 611-631
The sampling distributions of several statistics that measure the association of alleles on gametes (linkage disequilibrium) are estimated under a two-locus neutral infinite allele model using an efficient Monte Carlo method. An often used approximation for the mean squared linkage disequilibrium is shown to be inaccurate unless the proper statistical conditioning is used. The joint distribution of linkage disequilibrium and the allele frequencies in the sample is studied. This estimated joint distribution is sufficient for obtaining an approximate maximum likelihood estimate of C = 4Nc, where N is the population size and c is the recombination rate. It has been suggested that observations of high linkage disequilibrium might be a good basis for rejecting a neutral model in favor of a model in which natural selection maintains genetic variation. It is found that a single sample of chromosomes, examined at two loci, cannot provide sufficient information for such a test if C is less than 10, because with C this small, very high levels of linkage disequilibrium are not unexpected under the neutral model. In samples of size 50, it is found that, even when C is as large as 50, the distribution of linkage disequilibrium conditional on the allele frequencies is substantially different from the distribution when there is no linkage between the loci. When conditioned on the number of alleles at each locus in the sample, all of the sample statistics examined are nearly independent of θ = 4Nμ, where μ is the neutral mutation rate.
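
To make the quantities in this abstract concrete, the sketch below computes two common linkage disequilibrium statistics from gamete counts, D = p_AB - p_A p_B and r² = D²/[p_A(1 - p_A) p_B(1 - p_B)], together with the scaled recombination parameter C = 4Nc. It is restricted to two alleles per locus, which is a simplification of the infinite allele setting studied in the paper, and the function names are hypothetical.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) from the four gamete counts at two biallelic loci."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("at least one gamete is required")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    # r^2 is undefined when either locus is monomorphic in the sample.
    r_squared = d * d / denom if denom > 0 else float("nan")
    return d, r_squared


def scaled_recombination(population_size, recombination_rate):
    """C = 4 * N * c, as defined in the abstract."""
    return 4 * population_size * recombination_rate


if __name__ == "__main__":
    # Hypothetical sample of 50 gametes.
    print(linkage_disequilibrium(20, 5, 5, 20))
    print(scaled_recombination(1000, 0.0025))
```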

10.
Studies of genetics and ecology often require estimates of relatedness coefficients based on genetic marker data. However, with the presence of null alleles, an observed genotype can represent one of several possible true genotypes. This results in biased estimates of relatedness. As the numbers of marker loci are often limited, loci with null alleles cannot be abandoned without substantial loss of statistical power. Here, we show how loci with null alleles can be incorporated into six estimators of relatedness (two novel). We evaluate the performance of various estimators before and after correction for null alleles. If the frequency of a null allele is <0.1, some estimators can be used directly without adjustment; if it is >0.5, the potency of estimation is too low and such a locus should be excluded. We make available a software package entitled PolyRelatedness v1.6, which enables researchers to optimize these estimators to best fit a particular data set.

11.
The Exact Test for Cytonuclear Disequilibria (cited by 2; self-citations: 0; citations by others: 2)
C. J. Basten, M. A. Asmussen. Genetics, 1997, 146(3): 1165-1171
We extend the analysis of the statistical properties of cytonuclear disequilibria in two major ways. First, we develop the asymptotic sampling theory for the nonrandom associations between the alleles at a haploid cytoplasmic locus and the alleles and genotypes at a diploid nuclear locus, when there are an arbitrary number of alleles at each marker. This includes the derivation of the maximum likelihood estimators and their sampling variances for each disequilibrium measure, together with simple tests of the null hypothesis of no disequilibrium. In addition to these new asymptotic tests, we provide the first implementation of Fisher's exact test for the genotypic cytonuclear disequilibria and some approximations of the exact test. We also outline an exact test for allelic cytonuclear disequilibria in multiallelic systems. An exact test should be used for data sets when either the marginal frequencies are extreme or the sample size is small. The utility of this new sampling theory is illustrated through applications to recent nuclear-mtDNA and nuclear-cpDNA data sets. The results also apply to population surveys of nuclear loci in conjunction with markers in cytoplasmically inherited microorganisms.
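
As a minimal illustration of the kind of exact test mentioned in this abstract, the sketch below runs a two-sided Fisher's exact test on a 2 x 2 table of counts (for example, two cytotypes by two nuclear alleles). This is a simplified stand-in: the paper treats genotypic and multiallelic tables, which need larger contingency tables, and the function name and the convention for the two-sided p-value (summing all tables no more probable than the observed one) are assumptions made here.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    probs = [table_prob(x) for x in range(lowest, highest + 1)]
    # Small relative tolerance so ties with the observed table are included.
    return sum(p for p in probs if p <= p_observed * (1 + 1e-9))


if __name__ == "__main__":
    # Hypothetical counts: rows are cytotypes, columns are nuclear alleles.
    print(fisher_exact_2x2(3, 1, 1, 3))
```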

12.
Statistical genetic considerations for maintaining germ plasm collections (cited by 2; self-citations: 0; citations by others: 2)
One objective of the regeneration of genetic populations is to maintain at least one copy of each allele present in the original population. Genetic diversity within populations depends on the number and frequency of alleles across all loci. The objectives of this study on outbreeding crops are: (1) to use probability models to determine optimal sample sizes for the regeneration for a number of alleles at independent loci; and (2) to examine theoretical considerations in choosing core subsets of a collection. If we assume that k-1 alleles occur at an identical low frequency of p0 and that the kth allele occurs at a frequency of 1-[(k-1)p0], for loci with two, three, or four alleles, each with a p0 of 0.05, 89–110 additional individuals are required if at least one allele at each of 10 loci is to be retained with a 90% probability; if 100 loci are involved, 134–155 individuals are required. For two, three, or four alleles, when p0 is 0.03 at each of 10 loci, the sample size required to include at least one of the alleles from each class in each locus is 150–186 individuals; if 100 loci are involved, 75 additional individuals are required. Sample sizes of 160–210 plants are required to capture alleles at frequencies of 0.05 or higher in each of 150 loci, with a 90–95% probability. For rare alleles widespread throughout the collection, most alleles with frequencies of 0.03 and 0.05 per locus will be included in a core subset of 25–100 accessions.

13.
An importance-sampling method is presented for computing the likelihood of the configuration of population genetic data under general assumptions about population history and transitions among states. The configuration of the data is the number of chromosomes sampled that are in each of a finite set of states. Transitions among states are governed by a Markov chain with transition probabilities dependent on one or more parameters. The method assumes that the joint distribution of coalescence times of the underlying gene genealogy is independent of the genetic state of each lineage. Given a set of coalescence times, the probability that a pair of lineages is chosen to coalesce in each replicate is proportional to the contribution that the coalescence event makes to the probability of the data. This method can be applied to gene genealogies generated by the neutral coalescent process and to genealogies generated by other processes, such as a linear birth-death process which provides a good approximation to the dynamics of low-frequency alleles. Two applications are described. In the first, the fit of allele frequencies at two microsatellite loci sampled in a Sardinian population to the one-step mutation model is tested. The one-step model is rejected for one locus but not for the other. The second application is to low-frequency alleles in a geographically subdivided population. The geographic location is the allelic state, and the alleles are assumed to be sufficiently rare that their dynamics can be approximated by a linear birth-death process in which the birth and death rates are independent of geographic location. The analysis of eight low-frequency allozyme alleles found in the glaucous-winged gull, Larus glaucescens, illustrates how geographically restricted dispersal can be detected.

14.
Summary: The main purpose of germplasm banks is to preserve the genetic variability existing in crop species. The effectiveness of the regeneration of collections stored in gene banks is affected by factors such as sample size, random genetic drift, and seed viability. The objective of this paper is to review probability models and population genetics theory to determine the choice of sample size used for seed regeneration. A number of conclusions can be drawn from the results. First, the size of the sample depends largely on the frequency of the least common allele or genotype. Genotypes or alleles occurring at frequencies of more than 10% can be preserved with a sample size of 40 individuals. A sample size of 100 individuals will preserve genotypes (alleles) that occur at frequencies of 5%. If the frequency of rare genotypes (alleles) drops below 5%, larger sample sizes are required. A second conclusion is that for two, three, and four alleles per locus the sample size required to include a copy of each allele depends more on the frequency of the rare allele or alleles than on the number. Samples of 300 to 400 are required to preserve alleles that are present at a frequency of 1%. Third, if seed is bulked, the expected number of parents involved in any sample drawn from the bulk will be less than the number of parents included in the bulk. Fourth, to maintain a rate of breeding (F) of 1%, the effective population size (Ne) should be at least 150 for three alleles, and 300 for four alleles. Fifth, equalizing the reproductive output of each family to two progeny doubles the effective size of the population. Based on the results presented here, a practical option is considered for regenerating maize seed in a program constrained by limited funds. Part of this paper was presented at the Global Maize Germplasm Workshop, CIMMYT, El Batan, Mexico, March 6–12, 1988.
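
The simplest probability model behind statements of this kind can be sketched as follows. If gene copies are drawn independently and an allele has frequency p, the probability that at least one copy appears among n sampled gene copies is 1 - (1 - p)^n. The sketch below inverts that expression for a target retention probability. It is an assumption-laden illustration, not the paper's own model, and it does not attempt to reproduce the sample sizes quoted in the abstract, which rest on criteria not stated here.

```python
import math


def retention_probability(allele_freq: float, n_copies: int) -> float:
    """Probability that at least one of n sampled gene copies carries the allele."""
    return 1.0 - (1.0 - allele_freq) ** n_copies


def min_copies_to_retain(allele_freq: float, target: float) -> int:
    """Smallest number of gene copies giving retention probability >= target."""
    if not (0.0 < allele_freq < 1.0 and 0.0 < target < 1.0):
        raise ValueError("allele_freq and target must lie strictly between 0 and 1")
    n = max(1, math.ceil(math.log(1.0 - target) / math.log(1.0 - allele_freq)))
    # Guard against floating-point rounding in the closed-form estimate.
    while retention_probability(allele_freq, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical targets: 95% retention for alleles at 10%, 5% and 1%.
    for p in (0.10, 0.05, 0.01):
        print(p, min_copies_to_retain(p, 0.95))
```

Under this model a diploid individual contributes two gene copies, so the number of individuals would be roughly half the number of copies if the copies within an individual are treated as independent draws.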

15.
The evolutionary mechanisms that give rise to microsatellite alleles remain poorly understood in general and are especially understudied for fungal microsatellite loci. The unusual G28 microsatellite locus was developed from the Hawaiian mushroom Rhodocollybia laulaha. Here, we employ a novel approach to test for allele size homoplasy and examine competing mechanistic models of microsatellite evolution in the context of biogeographic expectations for this locus based on Hawaiian geologic history. Seven G28 alleles have been identified from a sampling of 153 individuals. The G28 locus is composed of a trinucleotide imperfect motif, which permits examination of the relationships between alleles and allows for detection of potential size homoplasy within the repetitive element. Alignment of G28 allele sequence data across multiple unrelated individuals suggests that alleles of like size are homologous within Hawaii. A variety of gap coding methods are explored in the inference of allele evolution. Length differences between alleles appear to be the result of polymerase slippage at multiple positions in the repetitive element, suggesting an intricate process of allelic evolution, which is not necessarily stepwise. Complex migration scenarios must be invoked to explain the current geographic distribution of alleles if their evolution was in fact sequential (from longest to shortest or from shortest to longest) as predicted by the "progression rule."

16.
Sample size considerations in genetic polymorphism studies (cited by 6; self-citations: 0; citations by others: 6)
C. B-Rao. Human Heredity, 2001, 52(4): 191-200
OBJECTIVES: Molecular studies for genetic polymorphisms are being carried out for a number of different applications, such as genetic disorders in different populations, pharmacogenomics, genetic identification of ethnic groups for forensic and legal applications, genetic identification of breed/stock in animals and plants for commercial applications and conservation of germ plasm. In this paper, for a random sampling scheme, we address two questions: (A) What should be the minimum size of the sample so that, with a prespecified probability, all alleles at a given locus (or haplotypes at a given set of loci) are detected? (B) What should be the sample size so that the allele frequency distribution at a given locus (or haplotype frequency distribution at a given set of loci) is estimated reliably within permissible error limits? METHODS: We have used combinatorial probabilistic arguments and Monte Carlo simulations to answer these questions. RESULTS: We found that the minimum sample size required in case A depends mainly on the prespecified probability of detecting all alleles, while in case B, it varies greatly depending on the permissible error in estimation (which will vary with the application). We have obtained the minimum sample sizes for different degrees of polymorphism at a locus under high stringency, as well as a relaxed level of permissible error. We present a detailed sampling procedure for estimating allele frequencies at a given locus, which will be of use in practical applications. CONCLUSION: Since the sample size required for reliable estimation of allele frequency distribution increases with the number of alleles at the locus, there is a strong case for using biallelic markers (like single nucleotide polymorphisms) when the available sample size is about 800 or less.
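
Question (A) in this abstract can be illustrated with a small sketch that combines a combinatorial calculation with a Monte Carlo check. Assuming gene copies are drawn independently with known allele frequencies, inclusion-exclusion gives the probability that every allele appears at least once among n copies, and a simulation can confirm it. The function names, the allele frequencies and the sampling model are assumptions made for this example and are not taken from the paper.

```python
import random
from itertools import combinations


def prob_all_alleles_detected(freqs, n_copies):
    """Inclusion-exclusion probability that all alleles appear among n_copies draws."""
    k = len(freqs)
    total = 0.0
    for size in range(k + 1):
        for missing in combinations(range(k), size):
            # Probability that none of the alleles in 'missing' is ever drawn.
            remaining = max(1.0 - sum(freqs[i] for i in missing), 0.0)
            total += (-1) ** size * remaining ** n_copies
    return total


def min_sample_size(freqs, target):
    """Smallest number of gene copies detecting all alleles with probability >= target."""
    if any(f <= 0 for f in freqs) or not 0.0 < target < 1.0:
        raise ValueError("frequencies must be positive and target in (0, 1)")
    n = len(freqs)
    while prob_all_alleles_detected(freqs, n) < target:
        n += 1
    return n


def simulate_detection(freqs, n_copies, trials, rng):
    """Monte Carlo estimate of the same probability."""
    k = len(freqs)
    hits = 0
    for _ in range(trials):
        sample = rng.choices(range(k), weights=freqs, k=n_copies)
        hits += len(set(sample)) == k
    return hits / trials


if __name__ == "__main__":
    freqs = [0.6, 0.3, 0.1]  # hypothetical allele frequencies
    n = min_sample_size(freqs, 0.95)
    print(n, prob_all_alleles_detected(freqs, n))
    print(simulate_detection(freqs, n, 20000, random.Random(1)))
```

The exhaustive loop over subsets grows exponentially with the number of alleles, so for highly polymorphic loci the simulation is the more practical of the two routes.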

17.
Despite long-term study, the mechanism explaining the parapatric distribution of two Australian reptile tick species is not understood. We describe the development of primers amplifying 10 microsatellite loci of Bothriocroton hydrosauri, for the study of population structure and dispersal patterns of this tick. The number of alleles per locus ranged from two to seven in ticks from the study site, and the observed heterozygosity ranged from 0.28 to 0.69. Pedigree analysis indicates that one locus is inherited in a non-Mendelian manner in three families, a pattern that was not explained by the presence of a null allele.

18.
Stadler T. Genetics, 2011, 188(3): 663-672
In this article, I develop a methodology for inferring the transmission rate and reproductive value of an epidemic on the basis of genotype data from a sample of infected hosts. The epidemic is modeled by a birth-death process describing the transmission dynamics in combination with an infinite-allele model describing the evolution of alleles. I provide a recursive formulation for the probability of the allele frequencies in a sample of hosts and a Bayesian framework for estimating transmission rates and reproductive values on the basis of observed allele frequencies. Using the Bayesian method, I reanalyze tuberculosis data from the United States. I estimate a net transmission rate of 0.19/year [0.13, 0.24] and a reproductive value of 1.02 [1.01, 1.04]. I demonstrate that the allele frequency probability under the birth-death model does not follow the well-known Ewens' sampling formula that holds under Kingman's coalescent.

19.
It has recently been shown that the Ewens sampling formula may be generated by a Polya-like urn model. A genealogical proof of this result equates the labelling of balls in the urn to the partition by age of alleles in the sample. This urn construction is shown to be equivalent to the construction of Kingman (Proc. Roy. Soc. London Ser. A 361 (1978), 1-20) using a Poisson-Dirichlet "paintbox" and as a consequence, the partition by ages is seen to be equivalent to the size biased permutation of the Poisson-Dirichlet distribution. This approach unifies and extends many results on ages of alleles, the Polya urn, and the Poisson-Dirichlet distribution. Furthermore the Ewens sampling formula is characterized as being the only partition structure which may be generated by an urn-like mechanism.
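
The urn mechanism described in this abstract can be sketched with the construction usually known as Hoppe's urn, which is assumed here to be the Polya-like urn the abstract refers to. The urn starts with one black ball of mass θ. At each step a ball is drawn with probability proportional to its mass: drawing the black ball adds a ball of a new color, and drawing a colored ball adds another ball of the same color. After n draws the color counts form an allelic partition, listed in order of appearance (oldest allele first). The function name and parameters are hypothetical.

```python
import random


def hoppe_urn(n: int, theta: float, rng: random.Random) -> list:
    """Simulate n draws from Hoppe's urn; return copies per allele, oldest first."""
    if n < 0 or theta <= 0:
        raise ValueError("need n >= 0 and theta > 0")
    ball_colors = []  # color index of every colored ball added so far
    counts = []       # counts[c] = number of balls of color c
    for i in range(n):
        # With i colored balls present, the black ball has mass theta of theta + i.
        if rng.random() < theta / (theta + i):
            counts.append(1)
            ball_colors.append(len(counts) - 1)
        else:
            color = ball_colors[rng.randrange(i)]
            counts[color] += 1
            ball_colors.append(color)
    return counts


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(hoppe_urn(10, 1.0, rng))
```

A simple check on the simulation is the expected number of distinct alleles, which under this scheme is the sum of θ/(θ + i) for i from 0 to n - 1.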

20.
In a previous study, Keith (1983) showed by sequential gel electrophoresis of the esterase-5 protein in Drosophila pseudoobscura that a highly polymorphic locus with many alleles can have very similar frequency distributions in populations separated by 500 km. The present work studies another highly polymorphic locus, xanthine dehydrogenase, in the same California population samples, using the same technique to distinguish allelic classes. Twelve electromorphs were found in one population and 15 in the other. Both populations shared a single very frequent (approximately 60%) allele, as well as five other alleles in low but similar frequencies. In addition, each population had an array of unique alleles present only once in one population sample but absent in the other. A statistical test against the stationary distribution for neutral alleles shows that, if the populations are at equilibrium, then purifying selection is operating on xanthine dehydrogenase. The extremely close similarity in frequency distributions of the alleles between populations for both the xanthine dehydrogenase and esterase-5 loci, despite differences in allele frequency distribution between loci, strongly emphasizes the importance of migration in influencing genic diversity in these populations.
