Similar articles
 Found 20 similar articles (search time: 78 ms)
1.
DeGiorgio M, Jankovic I, Rosenberg NA. Genetics 2010, 186(4): 1367-1387
Gene diversity, a commonly used measure of genetic variation, evaluates the proportion of heterozygous individuals expected at a locus in a population, under the assumption of Hardy-Weinberg equilibrium. When using the standard estimator of gene diversity, the inclusion of related or inbred individuals in a sample produces a downward bias. Here, we extend a recently developed estimator shown to be unbiased in a diploid autosomal sample that includes known related or inbred individuals to the general case of arbitrary ploidy. We derive an exact formula for the variance of the new estimator, H, and present an approximation to facilitate evaluation of the variance when each individual is related to at most one other individual in a sample. When examining samples from the human X chromosome, which represent a mixture of haploid and diploid individuals, we find that H performs favorably compared to the standard estimator, both in theoretical computations of mean squared error and in data analysis. We thus propose that H is a useful tool in characterizing gene diversity in samples of arbitrary ploidy that contain related or inbred individuals.
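For orientation, the standard estimator that this abstract contrasts against is commonly written as H = n/(n-1) * (1 - sum of p_i^2), where n is the number of sampled allele copies and p_i are the sample allele frequencies. The sketch below is a minimal illustration that assumes unrelated, non-inbred individuals; the function name and the toy sample are hypothetical, and this is not the relatedness-corrected estimator derived in the paper.

```python
from collections import Counter


def gene_diversity(alleles):
    """Standard gene diversity estimate from a list of sampled allele copies.

    Assumes the allele copies come from unrelated, non-inbred individuals.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two allele copies")
    counts = Counter(alleles)
    # Sum of squared sample allele frequencies (expected homozygosity).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    # The n / (n - 1) factor corrects the small-sample bias of 1 - sum_sq.
    return n / (n - 1) * (1.0 - sum_sq)


# Toy sample: two diploid individuals, four allele copies in total.
sample = ["A", "A", "B", "B"]
h = gene_diversity(sample)
```

When relatives or inbred individuals are present, their allele copies are not independent draws, which is why this simple form is biased downward in such samples.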

2.
Measurement of temporal change in allele frequencies represents an indirect method for estimating the genetically effective size of populations. When allele frequencies are estimated for gene markers that display dominant gene expression, such as random amplified polymorphic DNA (RAPD) and amplified fragment length polymorphism (AFLP) markers, the estimates can be seriously biased. We quantify bias for previous allele frequency estimators and present a new expression that is generally less biased and provides a more precise assessment of temporal allele frequency change. We further develop an estimator for effective population size that is appropriate when dealing with dominant gene markers. Comparison with estimates based on codominantly expressed genes, such as allozymes or microsatellites, indicates that about twice as many loci or sampled individuals are required when using dominant markers to achieve the same precision.

3.
Unbiased estimator for genetic drift and effective population size   Total citations: 2 (self-citations: 0, citations by others: 2)
Jorde PE, Ryman N. Genetics 2007, 177(2): 927-935
Amounts of genetic drift and the effective size of populations can be estimated from observed temporal shifts in sample allele frequencies. Bias in this so-called temporal method has been noted in cases of small sample sizes and when allele frequencies are highly skewed. We characterize bias in commonly applied estimators under different sampling plans and propose an alternative estimator for genetic drift and effective size that weights alleles differently. Numerical evaluations of exact probability distributions and computer simulations verify that this new estimator yields unbiased estimates even when based on a modest number of alleles and loci. At the cost of a larger standard deviation, it thus eliminates the bias associated with earlier estimators. The new estimator should be particularly useful for microsatellite loci and panels of SNPs, representing a large number of alleles, many of which will occur at low frequencies.
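The temporal method summarized above can be sketched roughly as follows. The first function averages per-allele standardized variances; the second pools numerators and denominators over alleles before dividing, which is one way of weighting alleles differently. Treating the pooled form as the paper's exact estimator is an assumption, as are the function names and the choice of the familiar sample-size correction 1/(2*S0) + 1/(2*St), whose applicability depends on the sampling plan.

```python
def f_per_allele_mean(x, y):
    """Mean over alleles of (x - y)^2 / ((x + y)/2 - x*y)."""
    terms = []
    for xi, yi in zip(x, y):
        denom = (xi + yi) / 2.0 - xi * yi
        if denom > 0:  # skip alleles with no information (e.g. absent in both samples)
            terms.append((xi - yi) ** 2 / denom)
    if not terms:
        raise ValueError("no informative alleles")
    return sum(terms) / len(terms)


def f_pooled(x, y):
    """Sum of (x - y)^2 divided by sum of z*(1 - z), with z = (x + y)/2."""
    num = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    den = sum(((xi + yi) / 2.0) * (1.0 - (xi + yi) / 2.0) for xi, yi in zip(x, y))
    if den == 0:
        raise ValueError("no variable alleles")
    return num / den


def ne_temporal(f, t, s0, st):
    """Effective size from F after subtracting the expected sampling contribution.

    t is the number of generations between samples; s0 and st are the numbers
    of diploid individuals sampled at each time point.
    """
    drift = f - 1.0 / (2 * s0) - 1.0 / (2 * st)
    if drift <= 0:
        # The observed shift is no larger than expected from sampling alone.
        return float("inf")
    return t / (2.0 * drift)


# Hypothetical allele frequencies at one biallelic locus, one generation apart.
x = [0.5, 0.5]
y = [0.6, 0.4]
ne = ne_temporal(f_pooled(x, y), t=1, s0=50, st=50)
```

With many rare alleles, the per-allele average gives each rare allele as much weight as a common one, whereas the pooled ratio lets alleles with larger denominators dominate.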

4.
Best linear unbiased allele-frequency estimation in complex pedigrees   Total citations: 4 (self-citations: 0, citations by others: 4)
McPeek MS, Wu X, Ober C. Biometrics 2004, 60(2): 359-367
Many types of genetic analyses depend on estimates of allele frequencies. We consider the problem of allele-frequency estimation based on data from related individuals. The motivation for this work is data collected on the Hutterites, an isolated founder population, so we focus particularly on the case in which the relationships among the sampled individuals are specified by a large, complex pedigree for which maximum likelihood estimation is impractical. For this case, we propose to use the best linear unbiased estimator (BLUE) of allele frequency. We derive this estimator, which is equivalent to the quasi-likelihood estimator for this problem, and we describe an efficient algorithm for computing the estimate and its variance. We show that our estimator has certain desirable small-sample properties in common with the maximum likelihood estimator (MLE) for this problem. We treat both the case when parental origin of each allele is known and when it is unknown. The results are extended to prediction of allele frequency in some set of individuals S based on genotype data collected on a set of individuals R. We compare the mean-squared error of the BLUE, the commonly used naive estimator (sample frequency) and the MLE when the latter is feasible to calculate. The results indicate that although the MLE performs the best of the three, the BLUE is close in performance to the MLE and is substantially easier to calculate, making it particularly useful for large complex pedigrees in which MLE calculation is impractical or infeasible. We apply our method to allele-frequency estimation in a Hutterite data set.

5.
Anderson AD, Weir BS. Genetics 2007, 176(1): 421-440
A maximum-likelihood estimator for pairwise relatedness is presented for the situation in which the individuals under consideration come from a large outbred subpopulation of the population for which allele frequencies are known. We demonstrate via simulations that a variety of commonly used estimators that do not take this kind of misspecification of allele frequencies into account will systematically overestimate the degree of relatedness between two individuals from a subpopulation. A maximum-likelihood estimator that includes F(ST) as a parameter is introduced with the goal of producing the relatedness estimates that would have been obtained if the subpopulation allele frequencies had been known. This estimator is shown to work quite well, even when the value of F(ST) is misspecified. Bootstrap confidence intervals are also examined and shown to exhibit close to nominal coverage when F(ST) is correctly specified.

6.
Williamson EG, Slatkin M. Genetics 1999, 152(2): 755-761
We develop a maximum-likelihood framework for using temporal changes in allele frequencies to estimate the number of breeding individuals in a population. We use simulations to compare the performance of this estimator to an F-statistic estimator of variance effective population size. The maximum-likelihood estimator had a lower variance and smaller bias. Taking advantage of the likelihood framework, we extend the model to include exponential growth and show that temporal allele frequency data from three or more sampling events can be used to test for population growth.

7.
Mouse genetic resources include inbred strains, recombinant inbred lines, chromosome substitution strains, heterogeneous stocks, and the Collaborative Cross (CC). These resources were generated through various breeding designs that potentially produce different genetic architectures, including the level of diversity represented, the spatial distribution of the variation, and the allele frequencies within the resource. By combining sequencing data for 16 inbred strains and the recorded history of related strains, the architecture of genetic variation in mouse resources was determined. The most commonly used resources harbor only a fraction of the genetic diversity of Mus musculus, which is not uniformly distributed, thus resulting in many blind spots. Only resources that include wild-derived inbred strains from subspecies other than M. m. domesticus have no blind spots and a uniform distribution of the variation. Unlike other resources that are primarily suited for gene discovery, the CC is the only resource that can support genome-wide network analysis, which is the foundation of systems genetics. The CC captures significantly more genetic diversity with no blind spots and has a more uniform distribution of the variation than all other resources. Furthermore, the distribution of allele frequencies in the CC resembles that seen in natural populations like humans in which many variants are found at low frequencies and only a minority of variants are common. We conclude that the CC represents a dramatic improvement over existing genetic resources for mammalian systems biology applications.

8.
Determining the origin of individuals in mixed population samples is key in many ecological, conservation and management contexts. Genetic data can be analyzed using genetic stock identification (GSI), where the origin of single individuals is determined using Individual Assignment (IA) and population proportions are estimated with Mixed Stock Analysis (MSA). In such analyses, allele frequencies in a reference baseline are required. Unknown individuals or mixture proportions are assigned to source populations based on the likelihood that their multilocus genotypes occur in a particular baseline sample. Representative sampling of populations included in a baseline is important when designing and performing GSI. Here, we investigate the effects of family sampling on GSI, using both simulated and empirical genotypes for Atlantic salmon (Salmo salar). We show that nonrepresentative sampling leading to inclusion of close relatives in a reference baseline may introduce bias in estimated proportions of contributing populations in a mixed sample, and increases the number of incorrectly assigned individual fish. Simulated data further show that the induced bias increases with increasing family structure, but that it can be partly mitigated by increased baseline population sample sizes. Results from standard accuracy tests of GSI (using only a reference baseline and/or self-assignment) gave a false and elevated indication of the baseline power and accuracy to identify stock proportions and individuals. These findings suggest that family structure in baseline population samples should be quantified and its consequences evaluated, before carrying out GSI.

9.
Geographic variation in microsatellite allele frequencies was assessed at nine sites in two regional vocal dialects of the parrot Amazona auropalliata (yellow-naped amazon) to test for correspondence between dialects and population structure. There was no relationship between the genetic distances between individuals and their dialect membership. High rates of gene flow were estimated between vocal dialects based on genetic differentiation. In addition, 5.5% of pairs of individuals compared across the dialect boundary were estimated to be related at the level of half siblings, indicating that dispersal is ongoing. The number of effective migrants per generation between dialects estimated with the microsatellite data was roughly one-seventh the number estimated with mitochondrial control region sequence data from the same individuals, suggesting that gene flow may be female-biased. Together, these results suggest that the observed mosaic pattern of geographic variation in vocalizations is maintained by learning of local call types by immigrant birds after dispersal. We found no evidence that ongoing habitat fragmentation has contributed to cryptic population structure.

10.
The effect of the Neolithic expansion on European molecular diversity   Total citations: 5 (self-citations: 0, citations by others: 5)
We performed extensive and realistic simulations of the colonization process of Europe by Neolithic farmers, as well as their potential admixture and competition with local Palaeolithic hunter-gatherers. We find that minute amounts of gene flow between Palaeolithic and Neolithic populations should lead to a massive Palaeolithic contribution to the current gene pool of Europeans. This large Palaeolithic contribution is not expected under the demic diffusion (DD) model, which postulates that agriculture diffused over Europe by a massive migration of individuals from the Near East. However, genetic evidence in favour of this model mainly consisted in the observation of allele frequency clines over Europe, which are shown here to be equally probable under a pure DD or a pure acculturation model. The examination of the consequence of range expansions on single nucleotide polymorphism (SNP) diversity reveals that an ascertainment bias consisting of selecting SNPs with high frequencies will promote the observation of genetic clines (which are not expected for random SNPs) and will lead to multimodal mismatch distributions. We conclude that the different patterns of molecular diversity observed for Y chromosome and mitochondrial DNA can be at least partly owing to an ascertainment bias when selecting Y chromosome SNPs for studying European populations.

11.
Kitakado T, Kitada S, Obata Y, Kishino H. Genetics 2006, 173(4): 2063-2072
In stock enhancement programs, it is important to assess mixing rates of released individuals in stocks. For this purpose, genetic stock identification has been applied. The allele frequencies in a composite population are expressed as a mixture of the allele frequencies in the natural and released populations. The estimation of mixing rates is possible, under successive sampling from the composite population, on the basis of temporal changes in allele frequencies. The allele frequencies in the natural population may be estimated from those of the composite population in the preceding year. However, it should be noted that these frequencies can vary between generations due to genetic drift. In this article, we develop a new method for simultaneous estimation of mixing rates and genetic drift in a stock enhancement program. Numerical simulation shows that our procedure estimates the mixing rate with little bias. Although the genetic drift is underestimated when the amount of information is small, reduction of the bias is possible by analyzing multiple unlinked loci. The method was applied to real data on mud crab stocking, and the result showed a yearly variation in the mixing rate.
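The mixture relationship mentioned in this abstract can be written per allele as p_composite = m * p_released + (1 - m) * p_natural, where m is the mixing rate. The sketch below solves that relationship for m by ordinary least squares across alleles. It deliberately ignores genetic drift and sampling error, so it is a simplification for illustration and not the simultaneous estimation method the paper develops; the function name and the frequencies are hypothetical.

```python
def mixing_rate_least_squares(p_natural, p_released, p_composite):
    """Least-squares mixing rate m from p_c = m * p_r + (1 - m) * p_n.

    Each argument is a list of allele frequencies in the same allele order.
    Drift and sampling error are ignored (a simplifying assumption).
    """
    num = sum((c - n) * (r - n)
              for n, r, c in zip(p_natural, p_released, p_composite))
    den = sum((r - n) ** 2 for n, r in zip(p_natural, p_released))
    if den == 0:
        raise ValueError("natural and released frequencies are identical")
    m = num / den
    # A mixing rate is a proportion, so clamp the estimate to [0, 1].
    return min(1.0, max(0.0, m))


# Hypothetical biallelic locus: a composite built with a true mixing rate of 0.25.
m_hat = mixing_rate_least_squares([0.2, 0.8], [0.6, 0.4], [0.3, 0.7])
```

The estimate is only informative when the natural and released populations differ in allele frequency; the denominator measures exactly that difference.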

12.
Maximum-likelihood estimation of relatedness   Total citations: 8 (self-citations: 0, citations by others: 8)
Milligan BG. Genetics 2003, 163(3): 1153-1167
Relatedness between individuals is central to many studies in genetics and population biology. A variety of estimators have been developed to enable molecular marker data to quantify relatedness. Despite this, no effort has been made to characterize the traditional maximum-likelihood estimator in relation to the remainder. This article quantifies its statistical performance under a range of biologically relevant sampling conditions. Under the same range of conditions, the statistical performance of five other commonly used estimators of relatedness is quantified. Comparison among these estimators indicates that the traditional maximum-likelihood estimator exhibits a lower standard error under essentially all conditions. Only for very large amounts of genetic information do most of the other estimators approach the likelihood estimator. However, the likelihood estimator is more biased than any of the others, especially when the amount of genetic information is low or the actual relationship being estimated is near the boundary of the parameter space. Even under these conditions, the amount of bias can be greatly reduced, potentially to biologically irrelevant levels, with suitable genetic sampling. Additionally, the likelihood estimator generally exhibits the lowest root mean-square error, an indication that the bias in fact is quite small. Alternative estimators restricted to yield only biologically interpretable estimates exhibit lower standard errors and greater bias than do unrestricted ones, but generally do not improve over the maximum-likelihood estimator and in some cases exhibit even greater bias. Although some nonlikelihood estimators exhibit better performance with respect to specific metrics under some conditions, none approach the high level of performance exhibited by the likelihood estimator across all conditions and all metrics of performance.

13.
The forensic genetics field is generating extensive population data on polymorphism of short tandem repeats (STR) markers in globally distributed samples. In this study we explored and quantified the informative power of these datasets to address issues related to human evolution and diversity, by using two online resources: an allele frequency dataset representing 141 populations summing up to almost 26 thousand individuals; a genotype dataset consisting of 42 populations and more than 11 thousand individuals. We show that the genetic relationships between populations based on forensic STRs are best explained by geography, as observed when analysing other worldwide datasets generated specifically to study human diversity. However, the global level of genetic differentiation between populations (as measured by a fixation index) is about half the value estimated with those other datasets, which contain a much higher number of markers but far fewer individuals. We suggest that the main factor explaining this difference is an ascertainment bias in forensics data resulting from the choice of markers for individual identification. We show that this choice results in average low variance of heterozygosity across world regions, and hence in low differentiation among populations. Thus, the forensic genetic markers currently produced for the purpose of individual assignment and identification allow the detection of the patterns of neutral genetic structure that characterize the human population but they do underestimate the levels of this genetic structure compared to the datasets of STRs (or other kinds of markers) generated specifically to study the diversity of human populations.

14.
Multilocus DNA fingerprinting methods have been used extensively to address genetic issues in wildlife populations. Hypotheses concerning population subdivision and differing levels of diversity can be addressed through the use of the similarity index (S), a band-sharing coefficient, and many researchers construct hypothesis tests with S based on the work of Lynch. It is shown in the present study, through mathematical analysis and through simulations, that estimates of the variance of a mean S based on Lynch's work are downwardly biased. An unbiased alternative is presented and mathematically justified. It is shown further, however, that even when the bias in Lynch's estimator is corrected, the estimator is highly imprecise compared with estimates based on an alternative approach such as 'parametric bootstrapping' of allele frequencies. Also discussed are permutation tests and their construction given the interdependence of Ss which share individuals. A simulation illustrates how some published misuses of these tests can lead to incorrect conclusions in hypothesis testing.
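The band-sharing similarity index discussed above is usually computed for two fingerprints as S = 2 * n_xy / (n_x + n_y), where n_x and n_y are the numbers of bands scored in each individual and n_xy is the number they share. The sketch below computes S for a pair and the mean S over all pairs in a sample. Band labels, function names, and the toy data are hypothetical, and the naive mean shown here says nothing about its variance, which is the quantity the abstract is concerned with.

```python
from itertools import combinations


def band_sharing(bands_x, bands_y):
    """Similarity index S = 2 * shared bands / (bands in x + bands in y)."""
    x, y = set(bands_x), set(bands_y)
    total = len(x) + len(y)
    if total == 0:
        raise ValueError("both fingerprints are empty")
    return 2.0 * len(x & y) / total


def mean_band_sharing(fingerprints):
    """Mean S over all unordered pairs of individuals."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        raise ValueError("need at least two individuals")
    return sum(band_sharing(a, b) for a, b in pairs) / len(pairs)


# Toy fingerprints: each set holds the band positions scored for one individual.
sample = [{1, 2, 3}, {2, 3, 4}, {1, 2, 3}]
s_bar = mean_band_sharing(sample)
```

Because every individual appears in several pairs, the pairwise S values are not independent, which is the interdependence the abstract warns about when building permutation tests.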

15.
Using striped bass (Morone saxatilis) and six multiplexed microsatellite markers, we evaluated procedures for estimating allele frequencies by pooling DNA from multiple individuals, a method suggested as cost-effective relative to individual genotyping. Using moment-based estimators, we estimated allele frequencies in experimental DNA pools and found that the three primary laboratory steps, DNA quantitation and pooling, PCR amplification, and electrophoresis, accounted for 23, 48, and 29%, respectively, of the technical variance of estimates in pools containing DNA from 2-24 individuals. Exact allele-frequency estimates could be made for pools of sizes 2-8, depending on the locus, by using an integer-valued estimator. Larger pools of size 12 and 24 tended to yield biased estimates; however, replicates of these estimates detected allele frequency differences among pools with different allelic compositions. We also derive an unbiased estimator of Hardy-Weinberg disequilibrium coefficients that uses multiple DNA pools and analyze the cost-efficiency of DNA pooling. DNA pooling yields the most potential cost savings when a large number of loci are employed using a large number of individuals, a situation becoming increasingly common as microsatellite loci are developed in increasing numbers of taxa.

16.
An estimator of relative risk in a case-control study has been proposed in terms of observed cell frequencies and the probability of disease. The bias of the usual estimator, i.e., the odds ratio, relative to the new estimator has been worked out. The expression for the mean square error of the proposed estimator has been derived for situations where the probability of disease is known exactly and where it is estimated through an independent survey. It has been observed that using the odds ratio as an estimate of relative risk introduces serious error when the probability of disease is not negligible. In such situations the proposed estimator can be used with advantage.
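The gap between the odds ratio and the relative risk described above can be illustrated with a direct Bayes-rule conversion: given the exposure frequencies among cases and controls and an externally supplied probability of disease p, one can recover the risks in the exposed and unexposed groups. The sketch below is an illustration of that relationship under the assumption that p is known; it is not claimed to be the exact estimator proposed in the paper, and the function names and cell counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a: exposed cases, b: unexposed cases, c: exposed controls, d: unexposed controls.
    """
    return (a * d) / (b * c)


def relative_risk(a, b, c, d, p):
    """Relative risk from case-control cell counts and disease probability p."""
    exp_in_cases = a / (a + b)     # P(exposed | disease)
    exp_in_controls = c / (c + d)  # P(exposed | no disease)
    # Bayes' rule gives the disease risk within each exposure group.
    risk_exposed = exp_in_cases * p / (
        exp_in_cases * p + exp_in_controls * (1 - p))
    risk_unexposed = (1 - exp_in_cases) * p / (
        (1 - exp_in_cases) * p + (1 - exp_in_controls) * (1 - p))
    return risk_exposed / risk_unexposed


# Hypothetical study: 100 cases and 100 controls.
or_hat = odds_ratio(40, 60, 20, 80)
rr_common = relative_risk(40, 60, 20, 80, p=0.2)   # disease is not rare
rr_rare = relative_risk(40, 60, 20, 80, p=1e-9)    # disease is very rare
```

For a very rare disease the two measures nearly coincide, while for a common disease the odds ratio overstates the relative risk in this example.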

17.
Several estimators have been proposed that use molecular marker data to infer the degree of relatedness for pairs of individuals. The objective of this study was to evaluate the performance of seven estimators when applied to marker data of a set of 33 key individuals from a large complex apple pedigree. The evaluation considered different scenarios of allele frequencies and different numbers of marker loci. The method of moments estimators were Similarity, Queller-Goodnight, Lynch-Ritland and Wang. The maximum likelihood estimators were Thompson, Anderson-Weir and Jacquard. The pedigree-based coancestry coefficients were taken as the point of reference in calculating correlations and root mean square error (RMSE). The marker data comprised 86 multi-allelic SSR markers on 17 linkage groups, covering 11 Morgans. Additionally, we simulated 10 datasets conditional on the real pedigree to support the results on the real dataset. None of the estimators outperformed the others. Knowledge of allele frequencies appeared to be the most influential, i.e., the highest correlations and lowest RMSE were found when frequencies from the founder population were available. When equal allele frequencies were used, all estimators resulted in very similar, but on average lower, correlations. The use of allele frequencies estimated from the set of 33 individuals gave, on average, the poorest results. The maximum likelihood estimators and the Lynch-Ritland estimator were the most sensitive to allele frequencies. The results from the simulation study fully supported the trends in results of the real dataset. This study indicated that high correlations (up to 0.90) and small RMSE (below 0.03) may be obtained when population allelic frequencies are available. In this scenario, the performances of the various estimators were similar, but seemed to favor the maximum likelihood estimators.
In the absence of reliable allele frequencies the method of moments estimators were shown to be more robust. The number of marker loci influenced the average performance of the estimators; however, the ranking was not affected. Correlations up to 0.80 were obtained when two markers per chromosome and appropriate allele frequencies were available. Adding more markers to the current dataset may lead to marginal improvements.

18.
Wang J. Molecular Ecology 2004, 13(10): 3169-3178
Knowledge of the genetic relatedness between a pair of individuals is important in many research areas of quantitative genetics, conservation genetics, evolution and ecology. Many estimators have been developed to estimate such pairwise relatedness (r) using codominant markers, such as microsatellites and enzymes. In contrast, only two estimators are proposed to use dominant markers, such as random amplified polymorphic DNAs (RAPDs) and amplified fragment length polymorphisms (AFLPs), in relatedness inference. They are both biased estimators, and their statistical properties and robustness to the sampling errors in allele frequency have not been investigated. In this short paper, I propose two new pairwise relatedness estimators for dominant markers, and compare them in precision, accuracy and robustness to sampling with the two previous estimators using simulations. It was found that the new estimator based on the least squares approach is unbiased when allele frequencies are known or estimated from a sample without correcting for sampling effects. It has, however, a low precision and as a result, an intermediate overall performance among the four estimators in terms of the mean squared deviation (MSD) of estimates from actual values of r. The new estimator based on a similarity index is slightly biased but has generally the lowest MSD among the four estimators compared, regardless of the number of loci, type of actual relationships, allele frequencies known or estimated from samples. Simulations also show that the confidence intervals estimated by bootstrapping are appropriate for different estimators provided that the number of loci used in the estimation is not small.

19.
Estimation of allele frequencies for VNTR loci   Total citations: 9 (self-citations: 4, citations by others: 5)
VNTR loci provide valuable information for a number of fields of study involving human genetics, ranging from forensics (DNA fingerprinting and paternity testing) to linkage analysis and population genetics. Alleles of a VNTR locus are simply fragments obtained from a particular portion of the DNA molecule and are defined in terms of their length. The essential element of a VNTR fragment is the repeat, which is a short sequence of basepairs. The core of the fragment is composed of a variable number of identical repeats that are linked in tandem. A sample of fragments from a population of individuals exhibits substantial variation in length because of variation in the number of repeats. Each distinct fragment length defines an allele, but any given fragment is measured with error. Therefore the observed distribution of fragment lengths is not discrete but is continuous, and determination of distinct allele classes is not straightforward. A mixture model is the natural statistical method for estimating the allele frequencies of VNTR loci. In this article we develop nonparametric methods for obtaining the distribution of allele sizes and estimates of their frequencies. Methods for obtaining maximum-likelihood estimates are developed. In addition, we suggest an empirical Bayes method to improve the maximum-likelihood estimates of the gene frequencies; the empirical Bayes procedure effects a local smoothing. The latter method works particularly well when measurement error is large relative to the repeat size, because the estimated distribution of allele frequencies when maximum likelihood is used is unreliable because of an alternating pattern of over- and underestimation. We define alleles and estimate the allele frequencies for two VNTR loci from the human genome (D17S79 and D2S44), from data obtained from Lifecodes, Inc.

20.
Messer PW, Neher RA. Genetics 2012, 191(2): 593-605
Selective sweeps are typically associated with a local reduction of genetic diversity around the adaptive site. However, selective sweeps can also quickly carry neutral mutations to observable population frequencies if they arise early in a sweep and hitchhike with the adaptive allele. We show that the interplay between mutation and exponential amplification through hitchhiking results in a characteristic frequency spectrum of the resulting novel haplotype variation that depends only on the ratio of the mutation rate and the selection coefficient of the sweep. On the basis of this result, we develop an estimator for the selection coefficient driving a sweep. Since this estimator utilizes the novel variation arising from mutations during a sweep, it does not rely on preexisting variation and can also be applied to loci that lack recombination. Compared with standard approaches that infer selection coefficients from the size of dips in genetic diversity around the adaptive site, our estimator requires much shorter sequences but sampled at high population depth to capture low-frequency variants; given such data, it consistently outperforms standard approaches. We investigate analytically and numerically how the accuracy of our estimator is affected by the decay of the sweep pattern over time as a consequence of random genetic drift and discuss potential effects of recombination, soft sweeps, and demography. As an example for its use, we apply our estimator to deep sequencing data from human immunodeficiency virus populations.
