Similar literature
1.
Jinliang Wang. Molecular Ecology, 2016, 25(19): 4692-4711
In molecular ecology and conservation genetics studies, the important parameter of effective population size (Ne) is increasingly estimated from a single sample of individuals taken at random from a population and genotyped at a number of marker loci. Several estimators have been developed, based on the information of linkage disequilibrium (LD), heterozygote excess (HE), molecular coancestry (MC) and sibship frequency (SF) in marker data. The most popular is the LD estimator, because it is more accurate than the HE and MC estimators and is simpler to calculate than the SF estimator. However, little is known about the accuracy of the LD estimator relative to that of SF and about the robustness of all single-sample estimators when some simplifying assumptions (e.g. random mating, no linkage, no genotyping errors) are violated. This study fills the gaps and uses extensive simulations to compare the biases and accuracies of the four estimators for different population properties (e.g. bottlenecks, nonrandom mating, haplodiploid), marker properties (e.g. linkage, polymorphisms) and sample properties (e.g. numbers of individuals and markers) and to compare the robustness of the four estimators when marker data are imperfect (with allelic dropouts). Extensive simulations show that the SF estimator is more accurate, has a much wider application scope (e.g. suitable to nonrandom mating such as selfing, haplodiploid species, dominant markers) and is more robust (e.g. to the presence of linkage and genotyping errors of markers) than the other estimators. An empirical data set from a Yellowstone grizzly bear population was analysed to demonstrate the use of the SF estimator in practice.
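To make the LD estimator mentioned in this abstract concrete, here is a minimal sketch. It assumes the classical approximation for unlinked loci under random mating, E[r²] ≈ 1/(3Ne) + 1/S, where r² is the mean squared correlation between alleles at different loci and S is the number of sampled individuals. The abstract does not state which form of the LD estimator was evaluated, so the formula, the function name `ld_effective_size` and its parameters are illustrative assumptions, and no bias correction is applied.

```python
import math


def ld_effective_size(mean_r2: float, sample_size: int) -> float:
    """Point estimate of Ne from mean r^2 across pairs of unlinked loci.

    Assumes E[r^2] ~= 1/(3*Ne) + 1/S (random mating, unlinked loci).
    Hypothetical helper for illustration only.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    # Subtract the part of r^2 expected from sampling S individuals alone.
    drift_signal = mean_r2 - 1.0 / sample_size
    if drift_signal <= 0.0:
        # No LD beyond sampling noise: the estimate is unbounded.
        return math.inf
    return 1.0 / (3.0 * drift_signal)


# Example: 50 individuals, observed mean r^2 slightly above 1/S.
estimate = ld_effective_size(mean_r2=1 / 50 + 1 / 300, sample_size=50)
print(round(estimate, 6))
```

Returning infinity when the observed r² does not exceed the sampling expectation is one possible design choice; a caller could equally report a negative estimate.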

2.
Estimation of effective population sizes from data on genetic markers (total citations: 9; self-citations: 0; citations by others: 9)
The effective population size (Ne) is an important parameter in ecology, evolutionary biology and conservation biology. It is, however, notoriously difficult to estimate, mainly because of the highly stochastic nature of the processes of inbreeding and genetic drift for which Ne is usually defined and measured, and because of the many factors (such as time and spatial scales, systematic forces) confounding such processes. Many methods have been developed in the past three decades to estimate the current, past and ancient effective population sizes using different information extracted from some genetic markers in a sample of individuals. This paper reviews the methodologies proposed for estimating Ne from genetic data using information on heterozygosity excess, linkage disequilibrium, temporal changes in allele frequency, and pattern and amount of genetic variation within and between populations. For each methodology, I describe mainly the logic and genetic model on which it is based, the data required and information used, the interpretation of the estimate obtained, some results from applications to simulated or empirical datasets and future developments that are needed.

3.
The prediction of identity by descent (IBD) probabilities is essential for all methods that map quantitative trait loci (QTL). The IBD probabilities may be predicted from marker genotypes and/or pedigree information. Here, a method is presented that predicts IBD probabilities at a given chromosomal location given data on a haplotype of markers spanning that position. The method is based on a simplification of the coalescence process, and assumes that the number of generations since the base population and the effective population size are known, although effective size may be estimated from the data. The probability that two gametes are IBD at a particular locus increases as the number of markers surrounding the locus with identical alleles increases. This effect is more pronounced when effective population size is high. Hence as effective population size increases, the IBD probabilities become more sensitive to the marker data, which should favour finer scale mapping of the QTL. The IBD probability prediction method was developed for the situation where the pedigree of the animals was unknown (i.e. all information came from the marker genotypes), and the situation where, say, T generations of unknown pedigree are followed by some generations where pedigree and marker genotypes are known.

4.
Related individuals are identical by descent (IBD) at a genetic locus if they share the same DNA material from a common ancestor. Continuous gamete IBD data consist of the lengths of (in order) IBD and non-IBD regions along the genomes for gametes segregating from two related individuals and can be used to distinguish different relationships. Under the assumption that the crossovers follow a Poisson process, we show that the exact calculation of the likelihood of a particular relationship for a given gamete IBD datum is tractable. Great-grandparent–great-grandchild and cousin relationships are used as examples to illustrate our methods.
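The kind of data this abstract describes can be illustrated with a small simulation. The sketch below is not the authors' likelihood calculation. It only generates ordered IBD and non-IBD segment lengths for the simplest case, a grandparent and grandchild, where one meiosis with Poisson-distributed crossovers (rate 1 per Morgan, no interference) makes the transmitted gamete switch between the two grandparental origins at each crossover. The function name `simulate_ibd_segments` and the chromosome length are assumptions chosen for illustration.

```python
import numpy as np


def simulate_ibd_segments(length_morgans: float, rng: np.random.Generator):
    """Return [(is_ibd, length), ...] along one chromosome, in order.

    Hypothetical helper: one meiosis, crossovers as a Poisson process.
    """
    # Number of crossovers is Poisson with mean equal to the genetic length.
    n_crossovers = rng.poisson(length_morgans)
    # Given their number, crossover positions are uniform on the chromosome.
    breaks = np.sort(rng.uniform(0.0, length_morgans, size=n_crossovers))
    edges = np.concatenate(([0.0], breaks, [length_morgans]))
    lengths = np.diff(edges)
    # The first segment comes from the focal grandparent with probability 1/2.
    first_ibd = bool(rng.integers(0, 2))
    states = [first_ibd if i % 2 == 0 else not first_ibd
              for i in range(len(lengths))]
    return list(zip(states, lengths.tolist()))


rng = np.random.default_rng(1)
segments = simulate_ibd_segments(1.5, rng)
for is_ibd, seg_len in segments:
    print("IBD" if is_ibd else "non-IBD", round(seg_len, 3))
```

More distant relationships involve several meioses, so their segment patterns do not reduce to this single alternating process.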

5.
Seventy sorghum inbred lines which formed part of the Queensland Department of Primary Industries (QDPI) sorghum breeding program were screened with 104 previously mapped RFLP markers. The lines were related by pedigree and consisted of ancestral source lines, intermediate lines and recent releases from the program. We compared the effect of defining marker alleles using either identity by state (IBS) or identity by descent (IBD) on our capacity to trace markers through the pedigree and detect evidence of selection for particular alleles. Allelic identities defined using IBD were much more sensitive for detecting non-Mendelian segregation in this pedigree. Only one marker allele showed significant evidence of selection when IBS was used compared with ten regions with particular allelic identities when IBD was used. Regions under selection were compared with the location of QTLs for agronomic traits known to be under selection in the breeding program. Only two of the ten regions were associated with known QTLs that matched with knowledge of the agronomic characteristics of the ancestral lines. Some of the other regions were hypothesised to be associated with genes for particular traits based on the properties of the ancestral source lines.

6.
7.
Prediction of total genetic value using genome-wide dense marker maps (total citations: 63; self-citations: 0; citations by others: 63)
Meuwissen TH, Hayes BJ, Goddard ME. Genetics, 2001, 157(4): 1819-1829
Recent advances in molecular genetic techniques will make dense marker maps available and genotyping many individuals for these markers feasible. Here we attempted to estimate the effects of approximately 50,000 marker haplotypes simultaneously from a limited number of phenotypic records. A genome of 1000 cM was simulated with a marker spacing of 1 cM. The markers surrounding every 1-cM region were combined into marker haplotypes. Due to the finite population size (Ne = 100), the marker haplotypes were in linkage disequilibrium with the QTL located between the markers. Using least squares, all haplotype effects could not be estimated simultaneously. When only the biggest effects were included, they were overestimated and the accuracy of predicting genetic values of the offspring of the recorded animals was only 0.32. Best linear unbiased prediction of haplotype effects assumed equal variances associated with each 1-cM chromosomal segment, which yielded an accuracy of 0.73, although this assumption was far from true. Bayesian methods that assumed a prior distribution of the variance associated with each chromosome segment increased this accuracy to 0.85, even when the prior was not correct. It was concluded that selection on genetic values predicted from markers could substantially increase the rate of genetic gain in animals and plants, especially if combined with reproductive techniques to shorten the generation interval.
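The contrast this abstract draws between least squares and best linear unbiased prediction can be sketched briefly. When every marker effect is assumed to share the same variance, BLUP of the effects has the same form as ridge regression with a shrinkage parameter equal to the ratio of residual to marker-effect variance. The code below is an illustration under that assumption, not the authors' simulation: it uses 0/1/2 genotype codes instead of haplotypes, and the function name `blup_marker_effects`, the data sizes and the shrinkage value are all made up for the example.

```python
import numpy as np


def blup_marker_effects(genotypes, phenotypes, shrinkage):
    """Solve (X'X + shrinkage * I) b = X'y on centred data.

    Hypothetical helper; `shrinkage` plays the role of the ratio of
    residual variance to per-marker effect variance.
    """
    X = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotypes, dtype=float)
    Xc = X - X.mean(axis=0)   # centre each marker column
    yc = y - y.mean()         # centre the records
    lhs = Xc.T @ Xc + shrinkage * np.eye(X.shape[1])
    return np.linalg.solve(lhs, Xc.T @ yc)


# Toy data with more markers than records, so least squares alone
# cannot estimate all effects simultaneously.
rng = np.random.default_rng(0)
n_records, n_markers = 50, 200
genotypes = rng.integers(0, 3, size=(n_records, n_markers))
true_effects = rng.normal(0.0, 0.1, size=n_markers)
phenotypes = genotypes @ true_effects + rng.normal(0.0, 1.0, size=n_records)

effects = blup_marker_effects(genotypes, phenotypes, shrinkage=10.0)
predicted = (genotypes - genotypes.mean(axis=0)) @ effects + phenotypes.mean()
print(effects.shape, predicted.shape)
```

The added diagonal term makes the system solvable even though the marker matrix has fewer rows than columns, which is the situation the abstract describes for least squares.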

8.
A method is proposed to calculate the maximum likelihood estimate of gene frequency and linkage disequilibrium from disease-codominant marker conditional data. The method is illustrated using data on sickle-cell anemia and Duchenne muscular dystrophy and linked polymorphic restriction endonuclease cleavage sites.

9.
10.
Identity by descent (IBD) can be reliably detected for long shared DNA segments, which are found in related individuals. However, many studies contain cohorts of unrelated individuals that share only short IBD segments. New sequencing technologies facilitate identification of short IBD segments through rare variants, which convey more information on IBD than common variants. Current IBD detection methods, however, are not designed to use rare variants for the detection of short IBD segments. Short IBD segments reveal genetic structures at high resolution. Therefore, they can help to improve imputation and phasing, to increase genotyping accuracy for low-coverage sequencing and to increase the power of association studies. Since short IBD segments are further assumed to be old, they can shed light on the evolutionary history of humans. We propose HapFABIA, a computational method that applies biclustering to identify very short IBD segments characterized by rare variants. HapFABIA is designed to detect short IBD segments in genotype data that were obtained from next-generation sequencing, but can also be applied to DNA microarray data. Especially in next-generation sequencing data, HapFABIA exploits rare variants for IBD detection. HapFABIA significantly outperformed competing algorithms at detecting short IBD segments on artificial and simulated data with rare variants. HapFABIA identified 160 588 different short IBD segments characterized by rare variants with a median length of 23 kb (mean 24 kb) in data for chromosome 1 of the 1000 Genomes Project. These short IBD segments contain 752 000 single nucleotide variants (SNVs), which account for 39% of the rare variants and 23.5% of all variants. The vast majority—152 000 IBD segments—are shared by Africans, while only 19 000 and 11 000 are shared by Europeans and Asians, respectively. 
IBD segments that match the Denisova or the Neandertal genome are found significantly more often in Asians and Europeans but also, in some cases exclusively, in Africans. The lengths of IBD segments and their sharing between continental populations indicate that many short IBD segments from chromosome 1 existed before humans migrated out of Africa. Thus, rare variants that tag these short IBD segments predate human migration from Africa. The software package HapFABIA is available from Bioconductor. All data sets, result files and programs for data simulation, preprocessing and evaluation are supplied at http://www.bioinf.jku.at/research/short-IBD.

11.
Estimating recombination rates from population genetic data (total citations: 21; self-citations: 0; citations by others: 21)
P Fearnhead, P Donnelly. Genetics, 2001, 159(3): 1299-1318
We introduce a new method for estimating recombination rates from population genetic data. The method uses a computationally intensive statistical procedure (importance sampling) to calculate the likelihood under a coalescent-based model. Detailed comparisons of the new algorithm with two existing methods (the importance sampling method of Griffiths and Marjoram and the MCMC method of Kuhner and colleagues) show it to be substantially more efficient. (The improvement over the existing importance sampling scheme is typically by four orders of magnitude.) The existing approaches not infrequently led to misleading results on the problems we investigated. We also performed a simulation study to look at the properties of the maximum-likelihood estimator of the recombination rate and its robustness to misspecification of the demographic model.

12.
The advent of genome-wide dense variation data provides an opportunity to investigate ancestry in unprecedented detail, but presents new statistical challenges. We propose a novel inference framework that aims to efficiently capture information on population structure provided by patterns of haplotype similarity. Each individual in a sample is considered in turn as a recipient, whose chromosomes are reconstructed using chunks of DNA donated by the other individuals. Results of this "chromosome painting" can be summarized as a "coancestry matrix," which directly reveals key information about ancestral relationships among individuals. If markers are viewed as independent, we show that this matrix almost completely captures the information used by both standard Principal Components Analysis (PCA) and model-based approaches such as STRUCTURE in a unified manner. Furthermore, when markers are in linkage disequilibrium, the matrix combines information across successive markers to increase the ability to discern fine-scale population structure using PCA. In parallel, we have developed an efficient model-based approach to identify discrete populations using this matrix, which offers advantages over PCA in terms of interpretability and over existing clustering algorithms in terms of speed, number of separable populations, and sensitivity to subtle population structure. We analyse Human Genome Diversity Panel data for 938 individuals and 641,000 markers, and we identify 226 populations reflecting differences on continental, regional, local, and family scales. We present multiple lines of evidence that, while many methods capture similar information among strongly differentiated groups, more subtle population structure in human populations is consistently present at a much finer level than currently available geographic labels and is only captured by the haplotype-based approach. 
The software used for this article, ChromoPainter and fineSTRUCTURE, is available from http://www.paintmychromosomes.com/.

13.
Thomas SC, Hill WG. Genetics, 2000, 155(4): 1961-1972
Previous techniques for estimating quantitative genetic parameters, such as heritability in populations where exact relationships are unknown but are instead inferred from marker genotypes, have used data from individuals on a pairwise level only. At this level, families are weighted according to the number of pairs within which each family appears, hence by size rather than information content, and information from multiple relationships is lost. Estimates of parameters are therefore not the most efficient achievable. Here, Markov chain Monte Carlo techniques have been used to partition the population into complete sibships, including, if known, prior knowledge of the distribution of family sizes. These pedigrees have then been used with restricted maximum likelihood under an animal model to estimate quantitative genetic parameters. Simulations to compare the properties of parameter estimates with those of existing techniques indicate that the use of sibship reconstruction is superior to earlier methods, having lower mean square errors and showing nonsignificant downward bias. In addition, sibship reconstruction allows the estimation of population allele frequencies that account for the relationships within the sample, so prior knowledge of allele frequencies need not be assumed. Extensions to these techniques allow reconstruction of half sibships when some or all of the maternal genotypes are known.

14.
We propose a general likelihood-based approach to the linkage analysis of qualitative and quantitative traits using identity by descent (IBD) data from sib-pairs. We consider the likelihood of IBD data conditional on phenotypes and test the null hypothesis of no linkage between a marker locus and a gene influencing the trait using a score test in the recombination fraction theta between the two loci. This method unifies the linkage analysis of qualitative and quantitative traits into a single inferential framework, yielding a simple and intuitive test statistic. Conditioning on phenotypes avoids unrealistic random sampling assumptions and allows sib-pairs from differing ascertainment mechanisms to be incorporated into a single likelihood analysis. In particular, it allows the selection of sib-pairs based on their trait values and the analysis of only those pairs having the most informative phenotypes. The score test is based on the full likelihood, i.e. the likelihood based on all phenotype data rather than just differences of sib-pair phenotypes. Considering only phenotype differences, as in Haseman and Elston (1972) and Kruglyak and Lander (1995), may result in important losses in power. The linkage score test is derived under general genetic models for the trait, which may include multiple unlinked genes. Population genetic assumptions, such as random mating or linkage equilibrium at the trait loci, are not required. This score test is thus particularly promising for the analysis of complex human traits. The score statistic readily extends to accommodate incomplete IBD data at the test locus, by using the hidden Markov model implemented in the programs MAPMAKER/SIBS and GENEHUNTER (Kruglyak and Lander, 1995; Kruglyak et al., 1996). Preliminary simulation studies indicate that the linkage score test generally matches or outperforms the Haseman-Elston test, the largest gains in power being for selected samples of sib-pairs with extreme phenotypes.

15.
We present data on the population genetics of cystic fibrosis (CF) in Bulgaria, obtained by comprehensive mutation analysis and the construction of intragenic microsatellite haplotypes. The sample of 262 CF alleles analysed is representative of the patients diagnosed during the period of referral and of the three main ethnic groups in the country. ΔF508 accounted for 100% of Gypsy CF alleles, which thus differed significantly from both Bulgarians and ethnic Turks. In Bulgarian and Turkish CF patients, 92% of the mutant alleles were identified, yielding a total of 25 different mutations, of which only 7 occurred at frequencies higher than 1%. The findings were compared to other European populations and to the distribution of phenylketonuria mutations. Genetic distances and population trees demonstrated that in the south-eastern tip of Europe, the overall distribution of CF mutations and polymorphic haplotypes is very close to that of Mediterranean populations, with a high frequency of N1303K and G542X, a large number of rare mutations and a prevalence of the 23 31 13 haplotype in association with ΔF508. These findings are consistent with a main role for the Neolithic expansion in the shaping of the CF mutation spectrum in Bulgaria and southern Europe. Received: 1 September 1996

16.
The sampling theory for the infinite site model taking into account the phylogenetic relationship between the alleles is developed for those cases in which two or three alleles are observed in the sample. From this theory a maximum likelihood estimate of θ = 4Nμ can be obtained. Unlike the maximum likelihood estimate of θ based on the infinite allele model or the number of segregating sites, this estimate of θ is a function of the frequencies of the alleles. This method is used to estimate θ for mitochondrial DNA in Drosophila melanogaster and D. virilis from data obtained by Shah and Langley (1979. Nature (London) 281, 696–699) using restriction endonucleases.

17.
18.
Interest has surged recently in removing siblings from population genetic data sets before conducting downstream analyses. However, even if the pedigree is inferred correctly, this has the potential to do more harm than good. We used computer simulations and empirical samples of coho salmon to evaluate strategies for adjusting samples to account for family structure. We compared performance in full samples and sibling-reduced samples of estimators of allele frequency (P), population differentiation (FST) and effective population size (Ne). Results: (i) unless simulated samples included large family groups together with a component of unrelated individuals, removing siblings generally reduced precision of the P and FST estimates; (ii) the Ne estimate based on the linkage disequilibrium method was largely unbiased using full random samples but became increasingly upwardly biased under aggressive purging of siblings. Under nonrandom sampling (some families over-represented), the Ne estimate using full samples was downwardly biased; removing just the right 'Goldilocks' fraction of siblings could produce an unbiased estimate, but this sweet spot varied widely among scenarios; (iii) weighting individuals based on the inferred pedigree (to produce a best linear unbiased estimator, BLUE) maximized precision of the P estimate when the inferred pedigree was correct but performed poorly when the pedigree was wrong; (iv) a variant of sibling removal that leaves intact small sibling groups appears to be more robust to errors in inferences about family structure. Our results illustrate the complex challenges posed by presence of family structure, suggest that no single optimal solution exists and argue for caution in adjusting population genetic data sets for the presence of putative siblings without fully understanding the consequences.

19.
Reliable selfing rate estimates from imperfect population genetic data (total citations: 2; self-citations: 0; citations by others: 2)
Genotypic frequencies at codominant marker loci in population samples convey information on mating systems. A classical way to extract this information is to measure heterozygote deficiencies (FIS) and obtain the selfing rate s from FIS = s/(2 - s), assuming inbreeding equilibrium. A major drawback is that heterozygote deficiencies are often present without selfing, owing largely to technical artefacts such as null alleles or partial dominance. We show here that, in the absence of gametic disequilibrium, the multilocus structure can be used to derive estimates of s independent of FIS and free of technical biases. Their statistical power and precision are comparable to those of FIS, although they are sensitive to certain types of gametic disequilibria, a bias shared with progeny-array methods but not FIS. We analyse four real data sets spanning a range of mating systems. In two examples, we obtain s = 0 despite positive FIS, strongly suggesting that the latter are artefactual. In the remaining examples, all estimates are consistent. All the computations have been implemented in an open-access and user-friendly software called rmes (robust multilocus estimate of selfing) available at http://ftp.cefe.cnrs.fr, and can be used on any multilocus data. Being able to extract the reliable information from imperfect data, our method opens the way to make use of the ever-growing number of published population genetic studies, in addition to the more demanding progeny-array approaches, to investigate selfing rates.
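The classical relationship quoted in this abstract, FIS = s/(2 - s), can be inverted by simple algebra: FIS(2 - s) = s gives 2 FIS = s(1 + FIS), so s = 2 FIS/(1 + FIS). The sketch below implements only this classical conversion, not the multilocus method of the rmes software, and the function names are chosen here for illustration.

```python
def selfing_rate_from_fis(fis: float) -> float:
    """Invert FIS = s / (2 - s), assuming inbreeding equilibrium."""
    if fis <= -1.0:
        raise ValueError("FIS must be greater than -1")
    return 2.0 * fis / (1.0 + fis)


def fis_from_selfing_rate(s: float) -> float:
    """Forward relationship FIS = s / (2 - s) for 0 <= s <= 1."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("selfing rate must lie in [0, 1]")
    return s / (2.0 - s)


# Round trip: a selfing rate of 0.5 corresponds to FIS = 1/3.
fis = fis_from_selfing_rate(0.5)
print(fis, selfing_rate_from_fis(fis))
```

A negative FIS yields a negative value of s under this formula. The function returns it unchanged so that the caller decides how to treat it.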

20.
Gay J, Myers S, McVean G. Genetics, 2007, 177(2): 881-894
Gene conversion plays an important part in shaping genetic diversity in populations, yet estimating the rate at which it occurs is difficult because of the short lengths of DNA involved. We have developed a new statistical approach to estimating gene conversion rates from genetic variation, by extending an existing model for haplotype data in the presence of crossover events. We show, by simulation, that when the rate of gene conversion events is at least comparable to the rate of crossover events, the method provides a powerful approach to the detection of gene conversion and estimation of its rate. Application of the method to data from the telomeric X chromosome of Drosophila melanogaster, in which crossover activity is suppressed, indicates that gene conversion occurs approximately 400 times more often than crossover events. We also extend the method to estimating variable crossover and gene conversion rates and estimate the rate of gene conversion to be approximately 1.5 times higher than the crossover rate in a region of human chromosome 1 with known recombination hotspots.
