Similar literature
20 similar documents found
1.
Given a set of potential source populations, genotypes of an individual of unknown origin at a collection of markers can be used to predict the correct source population of the individual. For improved efficiency, informative markers can be chosen from a larger set of markers to maximize the accuracy of this prediction. However, selecting the loci that are individually most informative does not necessarily produce the optimal panel. Here, using genotypes from eight species (carp, cat, chicken, dog, fly, grayling, human, and maize), this univariate accumulation procedure is compared to new multivariate "greedy" and "maximin" algorithms for choosing marker panels. The procedures generally suggest similar panels, although the greedy method often recommends inclusion of loci that are not chosen by the other algorithms. In seven of the eight species, when applied to five or more markers, all methods achieve at least 94% assignment accuracy on simulated individuals, with one species (dog) producing this level of accuracy with only three markers, and the eighth species (human) requiring approximately 13-16 markers. The new algorithms produce substantial improvements over use of randomly selected markers; where differences among the methods are noticeable, the greedy algorithm leads to slightly higher probabilities of correct assignment. Although none of the approaches necessarily chooses the panel with optimal performance, the algorithms all likely select panels with performance near enough to the maximum that they all are suitable for practical use.
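
The contrast between univariate accumulation and greedy selection can be illustrated with a small sketch. It is not the authors' implementation: it assumes haploid biallelic loci, equal prior probability for each source population, and maximum-likelihood assignment, and it scores a panel by exact enumeration of genotypes. The function names and the frequency table are hypothetical.

```python
from itertools import product

def assignment_accuracy(freqs, panel):
    """Exact probability of correct maximum-likelihood assignment.

    Assumptions (illustrative only): haploid biallelic loci, equal priors,
    freqs[pop][locus] is the frequency of allele 1 in that population.
    """
    total = 0.0
    for genotype in product((0, 1), repeat=len(panel)):
        best = 0.0
        for pop in freqs:
            lik = 1.0
            for allele, locus in zip(genotype, panel):
                p = pop[locus]
                lik *= p if allele == 1 else 1.0 - p
            best = max(best, lik)
        total += best  # the ML rule is correct with probability `best` / n_pops
    return total / len(freqs)

def univariate_panel(freqs, size):
    """Rank loci by single-locus accuracy and keep the top `size`."""
    loci = range(len(freqs[0]))
    ranked = sorted(loci, key=lambda l: assignment_accuracy(freqs, [l]), reverse=True)
    return ranked[:size]

def greedy_panel(freqs, size):
    """Add, one at a time, the locus that most improves the current panel."""
    panel = []
    for _ in range(size):
        candidates = [l for l in range(len(freqs[0])) if l not in panel]
        best = max(candidates, key=lambda l: assignment_accuracy(freqs, panel + [l]))
        panel.append(best)
    return panel

# Hypothetical frequencies: loci 0 and 1 carry the same information,
# locus 2 is weaker on its own but separates the second and third populations.
example_freqs = [
    [0.9, 0.9, 0.5],
    [0.1, 0.1, 0.8],
    [0.1, 0.1, 0.2],
]
```

In this toy table the two individually best loci are redundant, so the univariate ranking keeps both while the greedy rule swaps one of them for the complementary locus, which is the kind of divergence the abstract describes.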

2.
Because of the rich diversity among rice accessions grown around the world in distinct environments, traditional methods using morphology, cross compatibility and geography for classifying rice accessions according to different sub-populations have given way to use of molecular markers. Having a few robust markers that can quickly assign population structure to germplasm will facilitate making more informed choices about genetic diversity within seedbanks and breeding genepools. WHICHLOCI is a computer program that selects the best combination of loci for population assignment through empirical analysis of molecular marker data. This program has been used in surveys of plant species, for fish population assignment, and in human ancestry analysis. Using WHICHLOCI, we ranked the discriminatory power of 72 DNA markers used to genotype 1,604 accessions of the USDA rice core collection, and developed panels with a minimum number of markers for population assignment with 99% or higher accuracy. A total of 14 markers with high discriminatory power, genetic diversity, allelic frequency, and polymorphic information content were identified. A panel of just four markers, RM551, RM11, RM224 and RM44, was effective in assigning germplasm accessions to any of five sub-populations with 99.4% accuracy. Panels using only three markers were effective for assignment of rice germplasm to specific sub-populations, tropical japonica, temperate japonica, indica, aus, and aromatic. Assignment to tropical japonica, temperate japonica, or indica sub-populations was highly reliable using 3–4 markers, demonstrated by the high correlation with assignment using 72 markers. However, population assignment to aus and aromatic groups was less reliable, possibly due to the smaller representation of this material in the USDA core collection. More reference cultivars may be needed to improve population assignment to these two groups. 
This study demonstrated that a small number of DNA markers is effective for classification of germplasm into five sub-populations in rice. This will facilitate rapid screening of large rice germplasm banks for population assignment at a modest cost. The resulting information will be valuable to researchers to verify population classification of germplasm prior to initiating genetic studies, maximizing genetic diversity between sub-populations, or minimizing cross incompatibility while maximizing allelic diversity within specific sub-populations.

3.
Parentage assignment is defined as the identification of the true parents of one focal offspring among a list of candidates and has been commonly used in zoological, ecological, and agricultural studies. Although likelihood‐based parentage assignment is the preferred method in most cases, it requires genotyping a predefined set of DNA markers and providing their population allele frequencies. In the present study, we proposed an alternative method of parentage assignment that does not depend on genotype data and prior information of allele frequencies. Our method employs the restriction site‐associated DNA sequencing (RAD‐seq) reads for clustering into the overlapped RAD loci among the compared individuals, following which the likelihood ratio of parentage assignment could be directly calculated using two parameters: the genome heterozygosity and the error rate of sequencing reads. This method was validated on one simulated and two real data sets with the accurate assignment of true parents to focal offspring. However, our method could not provide a statistical confidence to conclude that the first ranked candidate is a true parent.

4.
The correct identification of hybrids is essential in avian hybridisation studies, but selection of the appropriate set of genetic markers for this purpose is at times complicated. Microsatellites and single nucleotide polymorphisms (SNPs) are currently the most commonly used markers in this field. We compare the efficiency of these two marker types, and their combination, in the identification of the threatened avian species, the greater spotted eagle and the lesser spotted eagle, as well as hybrids between the two species. We developed novel SNP markers from 122 candidate introns distributed genome-wide using only sympatric samples, and tested these markers successfully in 60 sympatric and allopatric spotted eagles using Bayesian model-based approaches. Comparatively, only one out of twelve previously described avian nuclear intron markers showed significant species-specific allele frequency difference, thus stressing the importance of selecting the proper markers. Twenty microsatellites outperformed nine selected SNPs in species identification, but were poorer in hybrid detection, whereas the resolution power of ten microsatellites remained too low for correct assignment. A combination of SNPs and microsatellites resulted in the most efficient and accurate identification of all individuals. Our study shows that the use of various sets of markers could lead to strikingly different assignment results, that hybridisation studies may have been affected by too low a resolution power of the markers used, and that an appropriate set of markers is essential for successful hybrid identification.

5.

Background

The plateau pika (Ochotona curzoniae) is an underground-dwelling mammal, native to the Tibetan plateau of China. A set of 10 polymorphic microsatellite loci has been developed earlier. Its reliability for parentage assignment has been tested in a plateau pika population. Two family groups with a known pedigree were used to validate the power of this set of markers.

Results

The error in parentage assignment using a combination of these 10 loci was very low as indicated by their power of discrimination (0.803 - 0.932), power of exclusion (0.351 - 0.887), and an effectiveness of the combined probability of exclusion in parentage assignment of 99.999%.

Conclusion

All the offspring of a family could be assigned to their biological mother; and their father or relatives could also be identified. This set of markers therefore provides a powerful and efficient tool for parentage assignment and other population analyses in the plateau pika.
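
As a hedged illustration of how per-locus exclusion powers combine, the conventional formula for independent loci is 1 - Π(1 - PE_i). The sketch below assumes independence between loci; the per-locus values are hypothetical placeholders chosen inside the reported range (0.351 to 0.887) and are not the study's data.

```python
from math import prod

def combined_exclusion(per_locus_pe):
    """Combined probability of exclusion, assuming independent loci.

    A non-parent escapes exclusion only if it escapes at every locus,
    so the combined value is 1 minus the product of the escape probabilities.
    """
    for pe in per_locus_pe:
        if not 0.0 <= pe <= 1.0:
            raise ValueError("each exclusion probability must lie in [0, 1]")
    return 1.0 - prod(1.0 - pe for pe in per_locus_pe)

# Hypothetical per-locus values for a 10-locus panel (not taken from the paper).
hypothetical_pe = [0.351, 0.45, 0.52, 0.60, 0.66, 0.71, 0.75, 0.80, 0.85, 0.887]
combined = combined_exclusion(hypothetical_pe)
```

Because the escape probabilities multiply, even loci with modest individual power push the combined value close to 1 once ten of them are used together.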

6.
This study used simulations and a known two-generation pedigree of chinook salmon (Oncorhynchus tshawytscha) to evaluate the effect of full sibs of parents on pedigree reconstruction. Parentage analysis was conducted on 100 parent pair-offspring relationships from pedigrees with unrelated (simulation) and related (chinook salmon) candidate parents. Parentage assignment success for the chinook salmon was lower than in the simulated populations. For example, the six most variable loci (mean H(E) = 0.87) provided a mean of 97% unambiguous assignments in the simulated population and 67% unambiguous assignments for the chinook salmon. Estimates of the pairwise relatedness coefficient (r(xy)) for most nonexcluded false parents and true parents of chinook salmon offspring exceeded 0.50. These results support the conclusion that closely related candidate parents decrease the power of genetic markers for pedigree reconstruction based on exclusion. Ambiguous parentage may be resolved using single parent- and parent pair-offspring likelihood analysis; however, these methods should be used with caution, and they are not replacements for using more loci when many candidate parents are full sibs.

7.
The use of methodologies such as RAPD and AFLP for studying genetic variation in natural populations is widespread in the ecology community. Because data generated using these methods exhibit dominance, their statistical treatment is less straightforward. Several estimators have been proposed for estimating population genetic parameters, assuming simple random sampling and the Hardy-Weinberg (HW) law. The merits of these estimators remain unclear because no comparative studies of their theoretical properties have been carried out. Furthermore, ascertainment bias has not been explicitly modelled. Here, we present a comparison of a set of candidate estimators of null allele frequency (q), locus-specific heterozygosity (h) and average heterozygosity in terms of their bias, standard error, and root mean square error (RMSE). For estimating q and h, we show that none of the estimators considered has the least RMSE over the parameter space. Our proposed zero-correction procedure, however, generally leads to estimators with improved RMSE. Assuming a beta model for the distribution of null homozygote proportions, we show how correction for ascertainment bias can be carried out using a linear transform of the sample average of h and the truncated beta-binomial likelihood. Simulation results indicate that the maximum likelihood and empirical Bayes estimators of average heterozygosity have negligible bias and similar RMSE. Ascertainment bias in estimators of average heterozygosity is most pronounced when the beta distribution is J-shaped and negligible when the latter is inverse J-shaped. The validity of the current findings depends importantly on the HW assumption, a point that we illustrate using data from two published studies.
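
For orientation, the simplest estimator of this kind is the square-root estimator: under Hardy-Weinberg proportions the null-homozygote (band-absent) frequency is q², so q can be estimated as the square root of the observed proportion, and h as 2q(1 - q). The sketch below shows only this naive estimator under the HW assumption; it is not the zero-correction, maximum likelihood, or empirical Bayes procedure of the paper, and the function name is hypothetical.

```python
from math import sqrt

def naive_dominant_estimates(null_homozygotes, sample_size):
    """Square-root estimates for a dominant marker under Hardy-Weinberg.

    null_homozygotes: number of sampled individuals lacking the band.
    Returns (q_hat, h_hat), where q_hat estimates the null allele frequency
    and h_hat = 2 * q_hat * (1 - q_hat) estimates locus heterozygosity.
    """
    if sample_size <= 0 or not 0 <= null_homozygotes <= sample_size:
        raise ValueError("need 0 <= null_homozygotes <= sample_size and sample_size > 0")
    q_hat = sqrt(null_homozygotes / sample_size)
    h_hat = 2.0 * q_hat * (1.0 - q_hat)
    return q_hat, h_hat
```

When no null homozygotes are observed this estimator returns q = 0 and h = 0 regardless of sample size, which is one reason a correction for zero counts can matter in small samples.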

8.
Paternity inference using highly polymorphic codominant markers is becoming common in the study of natural populations. However, multiple males are often found to be genetically compatible with each offspring tested, even when the probability of excluding an unrelated male is high. While various methods exist for evaluating the likelihood of paternity of each nonexcluded male, interpreting these likelihoods has hitherto been difficult, and no method takes account of the incomplete sampling and error-prone genetic data typical of large-scale studies of natural systems. We derive likelihood ratios for paternity inference with codominant markers taking account of typing error, and define a statistic Δ for resolving paternity. Using allele frequencies from the study population in question, a simulation program generates criteria for Δ that permit assignment of paternity to the most likely male with a known level of statistical confidence. The simulation takes account of the number of candidate males, the proportion of males that are sampled and gaps and errors in genetic data. We explore the potentially confounding effect of relatives and show that the method is robust to their presence under commonly encountered conditions. The method is demonstrated using genetic data from the intensively studied red deer (Cervus elaphus) population on the island of Rum, Scotland. The Windows-based computer program, CERVUS, described in this study is available from the authors. CERVUS can be used to calculate allele frequencies, run simulations and perform parentage analysis using data from all types of codominant markers.

9.
Individual-based population assignment tests have thus far mainly relied on the use of microsatellite loci. However, the logistic difficulty of screening large numbers of loci required to reach sufficient statistical power hampers the usefulness of microsatellites in situations of weak population structuring. Amplified fragment length polymorphisms (AFLP) represent an alternative for overcoming this logistical issue as the technique allows the user to characterize a much larger number of loci with a comparable analytical effort. In this study, an assignment test based on maximum likelihood for dominant markers was used to investigate the potential usefulness of AFLP for population assignment. We also compared assignment success achieved with AFLP with that obtained using microsatellites in a case study of low population differentiation involving whitefish (Coregonus clupeaformis) sympatric ecotypes. The analytical investigation showed that the minimum number of AFLP loci required to reach an assignment success of 95% stood within values that are easily achievable in many situations. This also showed how assignment success varied according to the number of AFLP loci used, their absolute frequency and their frequency differential and sampling errors, as well as the number of putative source populations. The case study showed that given a comparable analytical effort in the laboratory, AFLP were much more efficient than the microsatellite loci in discriminating the source of an individual among putative populations. AFLP resulted in higher assignment success at all levels of stringency and the log-likelihood differences between populations obtained with AFLP for each individual were much larger than those obtained with microsatellites. These results indicate that research involving individual-based population assignment methods should benefit considerably from the use of AFLP markers, especially in systems characterized by weak population structuring.

10.
In 1971, John Sved derived an approximate relationship between linkage disequilibrium (LD) and effective population size for an ideal finite population. This seminal work was extended by Sved and Feldman (Theor Pop Biol 4, 129, 1973) and Weir and Hill (Genetics 95, 477, 1980) who derived additional equations with the same purpose. These equations yield useful estimates of effective population size, as they require a single sample in time. As these estimates of effective population size are now commonly used on a variety of genomic data, from arrays of single nucleotide polymorphisms to whole genome data, some authors have investigated their bias through simulation studies and proposed corrections for different mating systems. However, the cause of the bias remains elusive. Here, we show the problems of using LD as a statistical measure and, analogously, the problems in estimating effective population size from such a measure. For that purpose, we compare three commonly used approaches with a transition probability‐based method that we develop here. It provides an exact computation of LD. We show here that the bias in the estimates of LD and effective population size is partly due to low‐frequency markers, tightly linked markers or to a small total number of crossovers per generation. These biases, however, do not decrease when increasing sample size or using unlinked markers. Our results show the issues of such measures of effective population size based on LD and suggest which of the methods studied here should be used in empirical studies as well as the optimal distance between markers for such estimates.
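
Sved's approximation is usually written as E[r²] ≈ 1 / (1 + 4·Ne·c), where c is the recombination fraction between the two markers. A minimal sketch of the forward relation and its inversion follows; it omits any sample-size or mating-system correction, it is not the transition probability method developed in the paper, and the function names are hypothetical.

```python
def expected_r2(ne, c):
    """Sved (1971) approximation: expected r^2 for effective size ne and recombination fraction c."""
    if ne <= 0 or c <= 0:
        raise ValueError("ne and c must be positive")
    return 1.0 / (1.0 + 4.0 * ne * c)

def ne_from_r2(r2, c):
    """Invert the approximation to get a point estimate of effective population size."""
    if not 0.0 < r2 < 1.0:
        raise ValueError("r2 must lie strictly between 0 and 1")
    if c <= 0:
        raise ValueError("c must be positive")
    return (1.0 / r2 - 1.0) / (4.0 * c)
```

The inversion makes the sensitivity visible: for small r² the estimate scales roughly as 1 / (4·c·r²), so small errors in a low r² translate into large changes in the estimated Ne, which is consistent with the concern about bias raised in the abstract.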

11.
Gattepaille LM  Jakobsson M 《Genetics》2012,190(1):159-174
High-throughput genotyping and sequencing technologies can generate dense sets of genetic markers for large numbers of individuals. For most species, these data will contain many markers in linkage disequilibrium (LD). To utilize such data for population structure inference, we investigate the use of haplotypes constructed by combining the alleles at single-nucleotide polymorphisms (SNPs). We introduce a statistic derived from information theory, the gain of informativeness for assignment (GIA), which quantifies the additional information for assigning individuals to populations using haplotype data compared to using individual loci separately. Using a two-loci-two-allele model, we demonstrate that combining markers in linkage equilibrium into haplotypes always leads to nonpositive GIA, suggesting that combining the two markers is not advantageous for ancestry inference. However, for loci in LD, GIA is often positive, suggesting that assignment can be improved by combining markers into haplotypes. Using GIA as a criterion for combining markers into haplotypes, we demonstrate for simulated data a significant improvement of assigning individuals to candidate populations. For the many cases that we investigate, incorrect assignment was reduced between 26% and 97% using haplotype data. For empirical data from French and German individuals, the incorrectly assigned individuals can, for example, be decreased by 73% using haplotypes. Our results can be useful for challenging population structure and assignment problems, in particular for studies where large-scale population-genomic data are available.

12.
Whole-genome regression methods are being increasingly used for the analysis and prediction of complex traits and diseases. In human genetics, these methods are commonly used for inferences about genetic parameters, such as the amount of genetic variance among individuals or the proportion of phenotypic variance that can be explained by regression on molecular markers. This is so even though some of the assumptions commonly adopted for data analysis are at odds with important quantitative genetic concepts. In this article we develop theory that leads to a precise definition of parameters arising in high dimensional genomic regressions; we focus on the so-called genomic heritability: the proportion of variance of a trait that can be explained (in the population) by a linear regression on a set of markers. We propose a definition of this parameter that is framed within the classical quantitative genetics theory and show that the genomic heritability and the trait heritability parameters are equal only when all causal variants are typed. Further, we discuss how the genomic variance and genomic heritability, defined as quantitative genetic parameters, relate to parameters of statistical models commonly used for inferences, and indicate potential inferential problems that are assessed further using simulations. When a large proportion of the markers used in the analysis are in LE with QTL the likelihood function can be misspecified. This can induce a sizable finite-sample bias and, possibly, lack of consistency of likelihood (or Bayesian) estimates. This situation can be encountered if the individuals in the sample are distantly related and linkage disequilibrium spans over short regions. This bias does not negate the use of whole-genome regression models as predictive machines; however, our results indicate that caution is needed when using marker-based regressions for inferences about population parameters such as the genomic heritability.

13.
Recent studies of extra-pair paternity have found support for the idea that heterozygous males have an advantage in siring offspring. Most studies use DNA microsatellite loci to determine paternity and then use the same loci to estimate individual heterozygosity. However, because the likelihood of detecting extra-pair offspring depends on the combinations of parental alleles, it is possible that biases arise from particular allele combinations. This might produce false support for the influence of heterozygosity on mating behaviour. We used a simulation model to assess how large this bias might be. We found two sources of bias. First, we found a bias in the null hypothesis of a simple statistical test commonly used to test several predictions of the heterozygosity hypothesis. The use of randomization tests could eliminate this bias. Second, we found that using the same loci for both paternity and heterozygosity can cause an increase in results supporting the heterozygosity hypothesis when no effect of heterozygosity actually exists. This bias is reduced through the use of more markers with higher levels of polymorphism and heterozygosity, but can be eliminated entirely by using a separate set of markers to determine paternity and assess heterozygosity. The two sources of bias reduce evidence favouring the heterozygosity hypothesis, but do not negate all of the studies that support it. We suggest that further studies of heterozygosity and extra-pair paternity are important and likely to be informative, but our recommendations should be incorporated by researchers to improve the reliability of their conclusions.

14.
Although genetic association studies using unrelated individuals may be subject to bias caused by population stratification, alternative methods that are robust to population stratification, such as family-based association designs, may be less powerful. Furthermore, it is often more feasible and less expensive to collect unrelated individuals. Recently, several statistical methods have been proposed for case-control association tests in a structured population; these methods may be robust to population stratification. In the present study, we propose a quantitative similarity-based association test (QSAT) to identify association between a candidate marker and a quantitative trait of interest, through use of unrelated individuals. For the QSAT, we first determine whether two individuals are from the same subpopulation or from different subpopulations, using genotype data at a set of independent markers. We then perform an association test between the candidate marker and the quantitative trait, through incorporation of such information. Simulation results based on either coalescent models or empirical population genetics data show that the QSAT has a correct type I error rate in the presence of population stratification and that the power of the QSAT is higher than that of family-based association designs.

15.
MOTIVATION: Although population-based association mapping may be subject to the bias caused by population stratification, alternative methods that are robust to population stratification such as family-based linkage analysis have lower mapping resolution. Recently, various statistical methods robust to population stratification were proposed for association studies, using unrelated individuals to identify associations between candidate genes and traits of interest. The association between a candidate gene and a quantitative trait is often evaluated via a regression model with inferred population structure variables as covariates, where the residual distribution is customarily assumed to be from a symmetric and unimodal parametric family, such as a Gaussian, although this may be inappropriate for the analysis of many real-life datasets. RESULTS: In this article, we proposed a new structured association (SA) test. Our method corrects for continuous population stratification by first deriving population structure and kinship matrices through a set of random genetic markers and then modeling the relationship between trait values, genotypic scores at a candidate marker and genetic background variables through a semiparametric model, where the error distribution is modeled as a mixture of Polya trees centered around a normal family of distributions. We compared our model to the existing SA tests in terms of model fit, type I error rate, power, precision and accuracy by application to a real dataset as well as simulated datasets.

16.
Mapping multiple Quantitative Trait Loci by Bayesian classification
Zhang M  Montooth KL  Wells MT  Clark AG  Zhang D 《Genetics》2005,169(4):2305-2318
We developed a classification approach to multiple quantitative trait loci (QTL) mapping built upon a Bayesian framework that incorporates the important prior information that most genotypic markers are not cotransmitted with a QTL or their QTL effects are negligible. The genetic effect of each marker is modeled using a three-component mixture prior with a class for markers having negligible effects and separate classes for markers having positive or negative effects on the trait. The posterior probability of a marker's classification provides a natural statistic for evaluating credibility of identified QTL. This approach performs well, especially with a large number of markers but a relatively small sample size. A heat map to visualize the results is proposed so as to allow investigators to be more or less conservative when identifying QTL. We validated the method using a well-characterized data set for barley heading values from the North American Barley Genome Mapping Project. Application of the method to a new data set revealed sex-specific QTL underlying differences in glucose-6-phosphate dehydrogenase enzyme activity between two Drosophila species. A simulation study demonstrated the power of this approach across levels of trait heritability and when marker data were sparse.

17.
Our understanding of the distribution of worldwide human genomic diversity has greatly increased over recent years thanks to the availability of large data sets derived from short tandem repeats (STRs), insertion deletion polymorphisms (indels) and single nucleotide polymorphisms (SNPs). A concern, however, is that the current picture of worldwide human genomic diversity may be inaccurate because of biases in the selection process of genetic markers (so-called 'ascertainment bias'). To evaluate this problem, we first compared the distribution of genomic diversity between these three types of genetic markers in the populations from the HGDP-CEPH panel for evidence of bias or incongruities. In a second step, using a very relaxed set of criteria to prevent the intrusion of bias, we developed a new set of unbiased STR markers and compared the results against those from available panels. Contrary to recent claims, our results show that the STR markers suffer from no discernible bias, and can thus be used as a baseline reference for human genetic diversity and population differentiation. The bias on SNPs is moderate compared to that on the set of indels analysed, which we recommend should be avoided for work describing the distribution of human genetic diversity or making inference on human settlement history.

18.
This study was designed to address issues regarding sample size and marker location that have arisen from the discovery of SNPs in the genomes of poorly characterized primate species and the application of these markers to the study of primate population genetics. We predict the effect of discovery sample size on the probability of discovering both rare and common SNPs and then compare this prediction with the proportion of common and rare SNPs discovered when different numbers of individuals are sequenced. Second, we examine the effect of genomic region on estimates of common population genetic data, comparing markers from both coding and non-coding regions of the rhesus macaque genome and the population genetic data calculated from these markers, to measure the degree and direction of bias introduced by SNPs located in coding versus non-coding regions of the genome. We found that both discovery sample size and genomic region surveyed affect SNP marker attributes and population genetic estimates, even when these are calculated from an expanded data set containing more individuals than the original discovery data set. Although none of the SNP detection methods or genomic regions tested in this study was completely uninformative, these results show that each has a different kind of genetic variation that is suitable for different purposes, and each introduces specific types of bias. Given that each SNP marker has an individual evolutionary history, we calculated that the most complete and unbiased representation of the genetic diversity present in the individual can be obtained by incorporating at least 10 individuals into the discovery sample set, to ensure the discovery of both common and rare polymorphisms.
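
The dependence of SNP discovery on sample size can be sketched with a simple binomial argument. This is an assumption made for illustration, not the prediction model used in the study: if 2n chromosomes are sampled at random from n diploid individuals and a site has minor allele frequency p, both alleles appear in the sample with probability 1 - (1 - p)^(2n) - p^(2n). The function name is hypothetical.

```python
def discovery_probability(maf, n_individuals):
    """Probability that a biallelic site is seen as polymorphic in the discovery sample.

    Assumes random sampling of 2 * n_individuals chromosomes and no sequencing error.
    """
    if not 0.0 < maf <= 0.5:
        raise ValueError("maf must lie in (0, 0.5]")
    if n_individuals < 1:
        raise ValueError("need at least one individual")
    chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - maf) ** chromosomes - maf ** chromosomes

# Compare a common and a rare variant across discovery sample sizes.
table = {
    n: (discovery_probability(0.30, n), discovery_probability(0.05, n))
    for n in (2, 5, 10, 20)
}
```

Under these assumptions common variants are found with very small discovery panels, while rare variants need many more individuals, which is the qualitative pattern the abstract describes.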

19.
The capability of molecular markers to provide information of genetic structure is influenced by their number and the way they are chosen. This study evaluates the effects of single nucleotide polymorphism (SNP) number and selection strategy on estimates of germplasm diversity and population structure for different types of barley germplasm, namely cultivar and landrace. One hundred and sixty-nine barley landraces from Syria and Jordan and 171 European barley cultivars were genotyped with 1536 SNPs. Different subsets of 384 and 96 SNPs were selected from the 1536 set, based on their ability to detect diversity in landraces or cultivated barley in addition to corresponding randomly chosen subsets. All SNP sets except the landrace-optimised subsets underestimated the diversity present in the landrace germplasm, and all subsets of SNP gave similar estimates for cultivar germplasm. All marker subsets gave qualitatively similar estimates of the population structure in both germplasm sets, but the 96 SNP sets showed much lower data resolution values than the larger SNP sets. From these data we deduce that pre-selecting markers for their diversity in a germplasm set is very worthwhile in terms of the quality of data obtained. Second, we suggest that a properly chosen 384 SNP subset gives a good combination of power and economy for germplasm characterization, whereas the rather modest gain from using 1536 SNPs does not justify the increased cost and 96 markers give unacceptably low performance. Lastly, we propose a specific 384 SNP subset as a standard genotyping tool for middle-eastern landrace barley.

20.
In recent years multilocus data sets have been used to study the demographic history of human populations. In this paper (1) analyses previously done on 60 short tandem repeat (STR) loci are repeated on 30 restriction site polymorphism (RSP) markers; (2) relative population weights are estimated from the RSP data set and compared to previously published estimates from STR and craniometric data sets; and (3) computer simulations are performed to show the effects of ascertainment bias on relative population weight estimates. Not surprisingly, given that the RSP markers were originally identified in a small panel of Caucasians, estimates of relative population weights are biased and the European population weight is artificially inflated. However, the effects of ascertainment bias are not apparent in a principal components plot or estimates of FST. Ascertainment bias can have a large effect in other genetic systems with inherently low heterozygosity such as Alus or single nucleotide polymorphisms (SNPs), and care must be taken to have prior knowledge of how polymorphic markers in a given data set were originally identified. Otherwise, results can be skewed and interpretations faulty.
