Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
Methods for detecting genetic linkage are more powerful when they fully use all of the data collected from pedigrees. We first discuss a method for obtaining the probability that a pedigree member has a given genotype, conditional on the phenotypes of his relatives. We then develop a rapid method to obtain the conditional probabilities of identity-by-descent sharing of marker alleles for all related pairs of individuals from extended pedigrees. The method assumes that the individuals are noninbred and that the relationship between genotype and phenotype is known for the marker locus studied. The probabilities of identity-by-descent sharing among relative pairs, conditional on marker phenotype information, can then be used in any of the model-free tests for linkage between a trait locus and a marker locus.

2.
Gene diversity is sometimes estimated from samples that contain inbred or related individuals. If inbred or related individuals are included in a sample, then the standard estimator for gene diversity produces a downward bias caused by an inflation of the variance of estimated allele frequencies. We develop an unbiased estimator for gene diversity that relies on kinship coefficients for pairs of individuals with known relationship and that reduces to the standard estimator when all individuals are noninbred and unrelated. Applying our estimator to data simulated based on allele frequencies observed for microsatellite loci in human populations, we find that the new estimator performs favorably compared with the standard estimator in terms of bias and similarly in terms of mean squared error. For human population-genetic data, we find that a close linear relationship previously seen between gene diversity and distance from East Africa is preserved when adjusting for the inclusion of close relatives.
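The abstract does not state the estimator itself, so the following is only a sketch of one kinship-corrected form that matches its description. It assumes diploid individuals and a kinship matrix whose diagonal holds each individual's self-kinship (0.5 when noninbred). Under those assumptions the expected value of 1 - sum(p_i^2) is H(1 - mean kinship), so dividing by (1 - mean kinship) removes the bias. The function names and input layout are hypothetical.

```python
from collections import Counter


def sample_homozygosity(genotypes):
    """Sum of squared sample allele frequencies at one locus."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * len(genotypes)
    return sum((c / total) ** 2 for c in counts.values())


def standard_gene_diversity(genotypes):
    """Usual estimator for unrelated, noninbred diploids: 2n/(2n-1) * (1 - sum p^2)."""
    two_n = 2 * len(genotypes)
    return two_n / (two_n - 1) * (1.0 - sample_homozygosity(genotypes))


def kinship_corrected_gene_diversity(genotypes, kinship):
    """Assumed corrected form: (1 - sum p^2) / (1 - mean kinship).

    kinship is an n x n matrix; the mean is taken over all n*n entries,
    including the diagonal (self-kinship, 0.5 for a noninbred individual).
    """
    n = len(genotypes)
    mean_kinship = sum(sum(row) for row in kinship) / (n * n)
    return (1.0 - sample_homozygosity(genotypes)) / (1.0 - mean_kinship)


# Four individuals at one locus; individuals 0 and 1 are assumed full sibs.
genotypes = [("A", "B"), ("A", "A"), ("B", "C"), ("A", "C")]
unrelated = [[0.5 if i == j else 0.0 for j in range(4)] for i in range(4)]
with_sibs = [row[:] for row in unrelated]
with_sibs[0][1] = with_sibs[1][0] = 0.25
```

With the `unrelated` matrix the mean kinship is 1/(2n), so the corrected form coincides with the standard estimator; with `with_sibs` the correction pushes the estimate upward, countering the downward bias described above.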

3.
In genome-wide association studies, results have been improved through imputation of a denser marker set based on reference haplotypes and phasing of the genotype data. To better handle very large sets of reference haplotypes, pre-phasing with only study individuals has been suggested. We present a possible problem which is aggravated when pre-phasing strategies are used, and suggest a modification avoiding the resulting issues with application to the MaCH tool, although the underlying problem is not specific to that tool. We evaluate the effectiveness of our remedy on a subset of HapMap data, comparing the original version of MaCH and our modified approach. Improvements are demonstrated on the original data (phase switch error rate decreasing by 10%), but the differences are more pronounced in cases where the data is augmented to represent the presence of closely related individuals, especially when siblings are present (30% reduction in switch error rate in the presence of children, 47% reduction in the presence of siblings). The main conclusion of this investigation is that existing statistical methods for phasing and imputation of unrelated individuals might give results of sub-par quality if a subset of study individuals nonetheless are related. As the populations collected for general genome-wide association studies grow in size, including relatives might become more common. If a general GWAS framework for unrelated individuals is employed on datasets with some related individuals, such as familial data or material from domesticated animals, caution should also be taken regarding the quality of haplotypes. Our modification to MaCH is available on request and straightforward to implement. We hope that this mode, if found to be of use, could be integrated as an option in future standard distributions of MaCH.
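The switch error rate quoted above can be illustrated with a small sketch. It uses the common definition (not spelled out in the abstract): among consecutive heterozygous sites, the fraction at which the inferred phase flips relative to the truth. The function name and the 0/1 list encoding of haplotypes are assumptions for illustration; this is not MaCH code.

```python
def switch_error_rate(true_h1, true_h2, inferred_h1):
    """Fraction of consecutive heterozygous-site pairs with a phase switch.

    true_h1, true_h2: the two true haplotypes of one individual (lists of 0/1).
    inferred_h1: one inferred haplotype; the other is implied at heterozygous
    sites. Assumes the inferred genotypes match the true genotypes.
    """
    # Only heterozygous sites carry phase information.
    het_sites = [k for k in range(len(true_h1)) if true_h1[k] != true_h2[k]]
    if len(het_sites) < 2:
        return 0.0
    # orientation[k] is True when the inferred haplotype follows true_h1 there.
    orientation = [inferred_h1[k] == true_h1[k] for k in het_sites]
    switches = sum(1 for a, b in zip(orientation, orientation[1:]) if a != b)
    return switches / (len(het_sites) - 1)


true_h1 = [0, 0, 1, 1, 0]
true_h2 = [1, 1, 0, 0, 0]
one_switch = [0, 0, 0, 0, 0]  # follows true_h1 at sites 0-1, true_h2 at sites 2-3
```

Note that an inferred haplotype equal to `true_h2` scores zero switches, because labelling of the two haplotypes is arbitrary.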

4.
The International Haplotype Map Project (HapMap) has provided an essential database for studies of human population genetics and genome-wide association. Phases I and II of the HapMap project generated genotype data across ∼3 million SNP loci in 270 individuals representing four populations. Phase III provides dense genotype data on ∼1.5 million SNPs, generated by Illumina and Affymetrix platforms in a larger set of individuals. Release 3 of phase III of the HapMap contains 1397 individuals from 11 populations, including 250 of the original 270 phase I and phase II individuals and 1147 additional individuals. Although some known relationships among the phase III individuals have been described in the data release, the genotype data that are currently available provide an opportunity to empirically ascertain previously unknown relationships. We performed a systematic analysis of genetic relatedness and were able not only to confirm the reported relationships, but also to detect numerous additional, previously unidentified pairs of close relatives in the HapMap sample. The inferred relative pairs make it possible to propose standardized subsets of unrelated individuals for use in future studies in which relatedness needs to be clearly defined.

5.
Although genetic association studies using unrelated individuals may be subject to bias caused by population stratification, alternative methods that are robust to population stratification, such as family-based association designs, may be less powerful. Furthermore, it is often more feasible and less expensive to collect unrelated individuals. Recently, several statistical methods have been proposed for case-control association tests in a structured population; these methods may be robust to population stratification. In the present study, we propose a quantitative similarity-based association test (QSAT) to identify association between a candidate marker and a quantitative trait of interest, through use of unrelated individuals. For the QSAT, we first determine whether two individuals are from the same subpopulation or from different subpopulations, using genotype data at a set of independent markers. We then perform an association test between the candidate marker and the quantitative trait, through incorporation of such information. Simulation results based on either coalescent models or empirical population genetics data show that the QSAT has a correct type I error rate in the presence of population stratification and that the power of the QSAT is higher than that of family-based association designs.

6.
The identification of related and unrelated individuals from molecular marker data is often difficult, particularly when no pedigree information is available and the data set is large. High levels of relatedness or inbreeding can influence genotype frequencies and thus genetic marker evaluation, as well as the accurate inference of hidden genetic structure. Identification of related and unrelated individuals is also important in breeding programmes, to inform decisions about breeding pairs and translocations. We present Friends and Family, a Windows executable program with a graphical user interface that identifies unrelated individuals from a pairwise relatedness matrix or table generated in programs such as COANCESTRY and GenAlEx. Friends and Family outputs a list of samples that are all unrelated to each other, based on a user-defined relatedness cut-off value. This unrelated data set can be used in downstream analyses, such as marker evaluation or inference of genetic structure. The results can be compared with those from the full data set to determine the effect related individuals have on the analyses. We demonstrate one of the applications of the program: how the removal of related individuals altered the Hardy–Weinberg equilibrium test outcome for microsatellite markers in an empirical data set. Friends and Family can be obtained from https://github.com/DeondeJager/Friends-and-Family.
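The abstract does not describe how Friends and Family selects its unrelated set, so the sketch below is not the program's algorithm. It is one simple greedy heuristic for the same task: individuals with the fewest relatives at or above the cut-off are considered first, and each is kept only if unrelated to everyone already kept. Treating the cut-off as inclusive, and all names used, are assumptions.

```python
def unrelated_subset(names, relatedness, cutoff):
    """Greedy pick of samples whose pairwise relatedness is below the cut-off.

    names: sample identifiers.
    relatedness: symmetric n x n matrix of pairwise relatedness estimates.
    cutoff: pairs with relatedness >= cutoff count as related (assumption).
    """
    n = len(names)
    relatives = [
        {j for j in range(n) if j != i and relatedness[i][j] >= cutoff}
        for i in range(n)
    ]
    # Consider the least-connected individuals first to keep more samples.
    order = sorted(range(n), key=lambda i: (len(relatives[i]), i))
    kept = []
    for i in order:
        if all(j not in relatives[i] for j in kept):
            kept.append(i)
    return [names[i] for i in sorted(kept)]


names = ["A", "B", "C", "D"]
relatedness = [
    [1.00, 0.50, 0.00, 0.00],
    [0.50, 1.00, 0.25, 0.00],
    [0.00, 0.25, 1.00, 0.00],
    [0.00, 0.00, 0.00, 1.00],
]
subset = unrelated_subset(names, relatedness, cutoff=0.2)
```

A greedy pass like this is fast but is not guaranteed to return the largest possible unrelated set; finding that set exactly is a maximum independent set problem.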

7.
In this paper, different strategies to test for association in samples with related individuals designed for linkage studies are compared. Because no independent controls are available, a family-based association test and case-control tests corrected for the presence of related individuals in which unaffected relatives are used as controls were tested. When unrelated controls are available, additional strategies including selection of a single case per family considering either all families or a subset of linked families, are also considered. Analyses are performed on the simulated dataset, blind to the answers. The case-control test corrected for the presence of related individuals is the most powerful strategy to detect three loci associated with the disease under study. Using a correction factor for the case-control test performed conditional on the marker information rather than unconditional does not impact the power significantly.

8.
Usually, genetic correlations are estimated from breeding designs in the laboratory or greenhouse. However, estimates of the genetic correlation for natural populations are lacking, mostly because pedigrees of wild individuals are rarely known. Recently Lynch (1999) proposed a formula to estimate the genetic correlation in the absence of data on pedigree. This method has been shown to be particularly accurate provided a large sample size and a minimum (20%) proportion of relatives. Lynch (1999) proposed the use of the bootstrap to estimate standard errors associated with genetic correlations, but did not test the reliability of such a method. We tested the bootstrap and showed the jackknife can provide valid estimates of the genetic correlation calculated with the Lynch formula. The occurrence of undefined estimates, combined with the high number of replicates involved in the bootstrap, means there is a high probability of obtaining an upward-biased, incomplete bootstrap, even when there is a high fraction of related pairs in a sample. It is easier to obtain complete jackknife estimates for which all the pseudovalues have been defined. We therefore recommend the use of the jackknife to estimate the genetic correlation with the Lynch formula. Provided data can be collected for more than two individuals at each location, we propose a group sampling method that produces low standard errors associated with the jackknife, even when there is a low fraction of relatives in a sample.
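The jackknife pseudovalues mentioned above follow a standard recipe that can be sketched in a few lines. The code below is generic delete-one jackknife, not the Lynch formula: the Pearson correlation is used only as a stand-in statistic, and the data values are invented for illustration.

```python
import math


def pearson(pairs):
    """Plain Pearson correlation of a list of (x, y) pairs (stand-in statistic)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def jackknife(statistic, data):
    """Delete-one jackknife: returns (estimate, standard error, pseudovalues)."""
    n = len(data)
    full = statistic(data)
    pseudovalues = []
    for i in range(n):
        leave_one_out = data[:i] + data[i + 1:]
        # Pseudovalue i: n * full estimate - (n - 1) * estimate without item i.
        pseudovalues.append(n * full - (n - 1) * statistic(leave_one_out))
    estimate = sum(pseudovalues) / n
    variance = sum((p - estimate) ** 2 for p in pseudovalues) / (n - 1)
    return estimate, math.sqrt(variance / n), pseudovalues


# Invented trait measurements on six pairs, for illustration only.
pairs = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.4), (4.0, 3.8), (5.0, 5.1), (6.0, 5.7)]
estimate, std_error, pseudovalues = jackknife(pearson, pairs)
```

Only n leave-one-out estimates are needed, which is why a complete set of defined pseudovalues is easier to obtain than a complete set of bootstrap replicates when the statistic is sometimes undefined.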

9.
Lynch (1999) proposed a method for estimation of genetic correlations from phenotypic measurements of individuals for which no pedigree information is available. This method assumes that shared environmental effects do not contribute to the similarity of relatives, and it is expected to perform best when sample sizes are large, many individuals in the sample are paired with close relatives, and heritability of the traits is high. We tested the practicality of this method for field biologists by using it to estimate genetic correlations from measurements of field-caught waterstriders (Aquarius remigis). Results for sample sizes of less than 100 pairs were often unstable or undefined, and even with more than 500 pairs only half of those correlations that had been found to be significant in standard laboratory experiments were statistically significant in this study. Statistically removing the influence of environmental effects (shared between relatives) weakened the estimates, possibly by removing some of the genetic similarity between relatives. However, the method did generate statistically significant estimates for some genetic correlations. Lynch (1999) anticipated the problems found, and proposed another method that uses estimates of relatedness between members of pairs (from molecular marker data) to improve the estimates of genetic correlations, but that approach has yet to be tested in the field.

10.
M C Bink, J A Van Arendonk. Genetics 1999, 151(1): 409-420
Augmentation of marker genotypes for ungenotyped individuals is implemented in a Bayesian approach via the use of Markov chain Monte Carlo techniques. Marker data on relatives and phenotypes are combined to compute conditional posterior probabilities for marker genotypes of ungenotyped individuals. The presented procedure allows the analysis of complex pedigrees with ungenotyped individuals to detect segregating quantitative trait loci (QTL). Allelic effects at the QTL were assumed to follow a normal distribution with a covariance matrix based on known QTL position and identity by descent probabilities derived from flanking markers. The Bayesian approach estimates variance due to the single QTL, together with polygenic and residual variance. The method was empirically tested through analyzing simulated data from a complex granddaughter design. Ungenotyped dams were related to one or more sons or grandsires in the design. Heterozygosity of the marker loci and size of QTL were varied. Simulation results indicated a significant increase in power when ungenotyped dams were included in the analysis.

11.
The calculation of heritabilities and genetic correlations, which are necessary for predicting evolutionary responses, requires knowledge about the relatedness between individuals. This information is often not directly available, especially not for natural populations, but can be inferred by using molecular markers such as allozymes. Several methods based on inferred relatedness from marker data have been developed to estimate heritabilities and genetic correlations in natural populations. Most methods use maximum-likelihood procedures to assign pairs or groups of individuals to predefined discrete relatedness classes (e.g., half sibs and unrelated individuals). The Ritland method, on the other hand, uses method of moments estimators to estimate pairwise relatedness among individuals as continuous values. We tested both the Ritland method and a maximum-likelihood method by applying them to a greenhouse population consisting of seed families of the herb Mimulus guttatus and comparing the results to the ones from a frequently used standard method based on half-sib families. Estimates of genetic correlations were far from accurate, especially when we used the Ritland method. However, this study shows that even with a few variable allozyme loci, it is possible to get qualitatively good indications about the presence of heritable genetic variation from marker-based methods, even though both methods underestimated it.

12.
Wang J. Genetics 2012, 191(1): 183-194
Quite a few methods have been proposed to infer sibship and parentage among individuals from their multilocus marker genotypes. They are all based on Mendelian laws either qualitatively (exclusion methods) or quantitatively (likelihood methods), have different optimization criteria, and use different algorithms in searching for the optimal solution. The full-likelihood method assigns sibship and parentage relationships among all sampled individuals jointly. It is by far the most accurate method, but is computationally prohibitive for large data sets with many individuals and many loci. In this article I propose a new likelihood-based method that is computationally efficient enough to handle large data sets. The method uses the sum of the log likelihoods of pairwise relationships in a configuration as the score to measure its plausibility, where log likelihoods of pairwise relationships are calculated only once and stored for repeated use. By analyzing several empirical and many simulated data sets, I show that the new method is more accurate than pairwise likelihood and exclusion-based methods, but is slightly less accurate than the full-likelihood method. However, the new method is computationally much more efficient than the full-likelihood method, and for the cases of both sexes polygamous and markers with genotyping errors, it can be several orders of magnitude faster. The new method can handle a large sample with thousands of individuals and the number of markers limited only by the computer memory.
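The scoring idea described above (pairwise log likelihoods computed once, stored, then summed for each candidate configuration) can be sketched as follows. This is a deliberately reduced illustration, not the published method: it assumes only two relationship classes, full sibs ("FS") and unrelated ("UR"), and the pairwise log-likelihood function is a hypothetical stand-in supplied by the caller.

```python
from itertools import combinations


def build_pairwise_table(individuals, relationships, pair_loglik):
    """Evaluate the pairwise log likelihood once per pair and relationship."""
    return {
        (frozenset((a, b)), r): pair_loglik(a, b, r)
        for a, b in combinations(individuals, 2)
        for r in relationships
    }


def configuration_score(families, table):
    """Score a configuration as the sum of stored pairwise log likelihoods.

    families maps each individual to a family label; two individuals with
    the same label are treated as full sibs, otherwise as unrelated.
    """
    score = 0.0
    for a, b in combinations(sorted(families), 2):
        relationship = "FS" if families[a] == families[b] else "UR"
        score += table[(frozenset((a, b)), relationship)]
    return score


calls = []


def toy_loglik(a, b, relationship):
    """Hypothetical stand-in: favours i1 and i2 being full sibs."""
    calls.append((a, b, relationship))
    if relationship == "FS":
        return -1.0 if {a, b} == {"i1", "i2"} else -5.0
    return -3.0


table = build_pairwise_table(["i1", "i2", "i3"], ["FS", "UR"], toy_loglik)
good = configuration_score({"i1": "f1", "i2": "f1", "i3": "f2"}, table)
poor = configuration_score({"i1": "f1", "i2": "f2", "i3": "f1"}, table)
```

Because every configuration is scored from the stored table, a search over many configurations never re-evaluates the genotype likelihoods, which is where the speed-up over a joint full-likelihood calculation would come from.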

13.
Cooperatively breeding animals live in social groups in which some individuals help to raise the offspring of others, often at the expense of their own reproduction. Kin selection, in which individuals increase their inclusive fitness by aiding genetic relatives, is a powerful explanation for the evolution of cooperative breeding, particularly because most groups consist of family members. However, recent molecular studies have revealed that many cooperative groups also contain unrelated immigrants, and the processes responsible for the formation and maintenance of non-kin coalitions are receiving increasing attention. Here, I provide the first systematic review of group structure for all 213 species of cooperatively breeding birds for which data are available. Although the majority of species (55%) nest in nuclear family groups, cooperative breeding by unrelated individuals is more common than previously recognized: 30% nest in mixed groups of relatives and non-relatives, and 15% nest primarily with non-relatives. Obligate cooperative breeders are far more likely to breed with non-kin than are facultative cooperators, indicating that when constraints on independent breeding are sufficiently severe, the direct benefits of group membership can substitute for potential kin-selected benefits. I review three patterns of dispersal that give rise to social groups with low genetic relatedness, and I discuss the selective pressures that favour the formation of such groups. Although kin selection has undoubtedly been crucial to the origin of most avian social systems, direct benefits have subsequently come to play a predominant role in some societies, allowing cooperation to persist despite low genetic relatedness.

14.
The problem of ascertainment for linkage analysis.   (Cited by: 2 in total; 0 self-citations; 2 by others)
It is generally believed that ascertainment corrections are unnecessary in linkage analysis, provided individuals are selected for study solely on the basis of trait phenotype and not on the basis of marker genotype. The theoretical rationale for this is that standard linkage analytic methods involve conditioning likelihoods on all the trait data, which may be viewed as an application of the ascertainment assumption-free (AAF) method of Ewens and Shute. In this paper, we show that when the observed pedigree structure depends on which relatives within a pedigree happen to have been the probands (proband-dependent, or PD, sampling) conditioning on all the trait data is not a valid application of the AAF method and will result in asymptotically biased estimates of genetic parameters (except under single ascertainment). Furthermore, this result holds even if the recombination fraction R is the only parameter of interest. Since the lod score is proportional to the likelihood of the marker data conditional on all the trait data, this means that when data are obtained under PD sampling the lod score will yield asymptotically biased estimates of R, and that so-called mod scores (i.e., lod scores maximized over both R and parameters theta of the trait distribution) will yield asymptotically biased estimates of R and theta. Furthermore, the problem appears to be intractable, in the sense that it is not possible to formulate the correct likelihood conditional on observed pedigree structure. In this paper we do not investigate the numerical magnitude of the bias, which may be small in many situations. On the other hand, virtually all linkage data sets are collected under PD sampling. Thus, the existence of this bias will be the rule rather than the exception in the usual applications.
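For readers unfamiliar with the lod score, the textbook phase-known case may help: with k recombinants among n informative meioses, the lod at recombination fraction R is log10 of the likelihood at R divided by the likelihood at R = 0.5. The sketch below covers only that simplest case, ignores ascertainment entirely, and uses a hypothetical function name; it is not the conditional likelihood analysed in the paper.

```python
import math


def lod_score(recombinants, meioses, r):
    """Phase-known lod score: log10 of L(r) / L(0.5).

    recombinants: number of recombinant meioses (k).
    meioses: number of informative meioses (n).
    r: recombination fraction, restricted here to 0 < r <= 0.5.
    """
    if not 0.0 < r <= 0.5:
        raise ValueError("recombination fraction must lie in (0, 0.5]")
    likelihood = r ** recombinants * (1.0 - r) ** (meioses - recombinants)
    null_likelihood = 0.5 ** meioses
    return math.log10(likelihood / null_likelihood)


# Scan a grid of recombination fractions for 2 recombinants in 10 meioses.
grid = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5]
best_r = max(grid, key=lambda r: lod_score(2, 10, r))
```

In this idealised setting the lod is maximised at k/n. The abstract's point is that under proband-dependent sampling the corresponding maximiser no longer converges to the true R.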

15.
Wang J. Genetical Research 2007, 89(3): 135-153
Knowledge of the genetic relatedness among individuals is essential in diverse research areas such as behavioural ecology, conservation biology, quantitative genetics and forensics. How to estimate relatedness accurately from genetic marker information has been explored recently by many methodological studies. In this investigation I propose a new likelihood method that uses the genotypes of a triad of individuals in estimating pairwise relatedness (r). The idea is to use a third individual as a control (reference) in estimating the r between two other individuals, thus reducing the chance of genes identical in state being mistakenly inferred as identical by descent. The new method allows for inbreeding and accounts for genotype errors in data. Analyses of both simulated and human microsatellite and SNP datasets show that the quality of r estimates (measured by the root mean squared error, RMSE) is generally improved substantially by the new triadic likelihood method (TL) over the dyadic likelihood method and five moment estimators. Simulations also show that genotyping errors/mutations, when ignored, result in underestimates of r for related dyads, and that incorporating a model of typing errors in the TL method improves r estimates for highly related dyads but impairs those for loosely related or unrelated dyads. The effects of inbreeding were also investigated through simulations. It is concluded that, because most dyads in a natural population are unrelated or only loosely related, the overall performance of the new triadic likelihood method is the best, offering r estimates with an RMSE that is substantially smaller than the five commonly used moment estimators and the dyadic likelihood method.

16.
Sibship reconstruction from genetic data with typing errors   (Cited by: 13 in total; 0 self-citations; 13 by others)
Wang J. Genetics 2004, 166(4): 1963-1979
Likelihood methods have been developed to partition individuals in a sample into full-sib and half-sib families using genetic marker data without parental information. They invariably make the critical assumption that marker data are free of genotyping errors and mutations and are thus completely reliable in inferring sibships. Unfortunately, however, this assumption is rarely tenable for virtually all kinds of genetic markers in practical use and, if violated, can severely bias sibship estimates as shown by simulations in this article. I propose a new likelihood method with simple and robust models of typing error incorporated into it. Simulations show that the new method can be used to infer full- and half-sibships accurately from marker data with a high error rate and to identify typing errors at each locus in each reconstructed sib family. The new method also improves previous ones by adopting a fresh iterative procedure for updating allele frequencies with reconstructed sibships taken into account, by allowing for the use of parental information, and by using efficient algorithms for calculating the likelihood function and searching for the maximum-likelihood configuration. It is tested extensively on simulated data with a varying number of marker loci, different rates of typing errors, and various sample sizes and family structures and applied to two empirical data sets to demonstrate its usefulness.

17.
Abney M. Genetics 2008, 179(3): 1577-1590
Computing identity-by-descent sharing between individuals connected through a large, complex pedigree is a computationally demanding task that often cannot be done using exact methods. What I present here is a rapid computational method for estimating, in large complex pedigrees, the probability that pairs of alleles are IBD given the single-point genotype data at that marker for all individuals. The method can be used on pedigrees of essentially arbitrary size and complexity without the need to divide the individuals into separate subpedigrees. I apply the method to do qualitative trait linkage mapping using the nonparametric sharing statistic S(pairs). The validity of the method is demonstrated via simulation studies on a 13-generation 3028-person pedigree with 700 genotyped individuals. An analysis of an asthma data set of individuals in this pedigree finds four loci with P-values < 10⁻³ that were not detected in prior analyses. The mapping method is fast and can complete analyses of approximately 150 affected individuals within this pedigree for thousands of markers in a matter of hours.

18.
STR markers for kinship analysis   (Cited by: 1 in total; 0 self-citations; 1 by others)
The analysis of short tandem repeats is a widely used method to estimate relatedness between closely related populations or individuals. The AmpFlSTR PCR Amplification Kit has 15 highly variable autosomal markers of tetranucleotide repeats and is principally made to identify individuals and first- or second-degree relatives. However, in many studies one is searching for individuals who are related through more than one generation. We wanted to test whether the amplification kit can also be used to identify more distantly related individuals. Therefore we compared 16 different methods that calculate genetic distance with regard to each method's ability to cluster more distantly related individuals from two test families. Among all the tested methods, the D_A distance of Nei et al. (1983) performed well in clustering family members within a group of unrelated individuals for a broad range of scenarios. However, second-degree relatives were difficult to cluster with any of the examined methods when other family members were absent. With a simulation we further estimated how many markers would actually be needed to detect a certain degree of relatedness. According to this simulation, one would need at least 123 independent microsatellite markers to detect third-degree relatives with 90% probability. In conclusion, the 15 STR markers in the amplification kit are suitable for detecting only very closely related individuals or entire families.
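The D_A distance of Nei et al. (1983) named above has a compact closed form: one minus the average, over loci, of the sum over alleles of the square root of the product of the two allele frequencies. The sketch below implements that formula. How the study applied it to single individuals is not stated in the abstract, so treating an individual's genotype as frequencies of 0, 0.5 or 1 is an assumption, as are the function names and the example genotypes.

```python
import math
from collections import Counter


def nei_da(freqs_x, freqs_y):
    """Nei et al. (1983) D_A distance.

    freqs_x, freqs_y: one dict per locus mapping allele -> frequency,
    with both lists covering the same loci in the same order.
    """
    loci = len(freqs_x)
    total = 0.0
    for fx, fy in zip(freqs_x, freqs_y):
        # Alleles absent from either side contribute sqrt(0) = 0.
        total += sum(math.sqrt(fx[a] * fy[a]) for a in set(fx) & set(fy))
    return 1.0 - total / loci


def genotype_to_freqs(genotype):
    """Turn a diploid genotype such as (12, 14) into within-individual frequencies."""
    counts = Counter(genotype)
    return {allele: count / 2 for allele, count in counts.items()}


# Two individuals typed at two STR loci (repeat counts are invented).
person_1 = [genotype_to_freqs(g) for g in [(12, 14), (9, 9)]]
person_2 = [genotype_to_freqs(g) for g in [(12, 12), (9, 10)]]
distance = nei_da(person_1, person_2)
```

The distance is 0 for identical frequency profiles and 1 when no alleles are shared at any locus, so clustering on it groups individuals that share many alleles.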

19.
Molecular marker data collected from natural populations allows information on genetic relationships to be established without referencing an exact pedigree. Numerous methods have been developed to exploit the marker data. These fall into two main categories: method of moment estimators and likelihood estimators. Method of moment estimators are essentially unbiased, but utilise weighting schemes that are only optimal if the analysed pair is unrelated. Thus, they differ in their efficiency at estimating parameters for different relationship categories. Likelihood estimators show smaller mean squared errors but are much more biased. Both types of estimator have been used in variance component analysis to estimate heritability. All marker-based heritability estimators require that adequate levels of the true relationship be present in the population of interest and that adequate amounts of informative marker data are available. I review the different approaches to relationship estimation, with particular attention to optimizing the use of this relationship information in subsequent variance component estimation.

20.
Regional-based association analysis instead of individual testing of each SNP was introduced in genome-wide association studies to increase the power of gene mapping, especially for rare genetic variants. For regional association tests, the kernel machine-based regression approach was recently proposed as a more powerful alternative to collapsing-based methods. However, the vast majority of existing algorithms and software for the kernel machine-based regression are applicable only to unrelated samples. In this paper, we present a new method for the kernel machine-based regression association analysis of quantitative traits in samples of related individuals. The method is based on the GRAMMAR+ transformation of phenotypes of related individuals, followed by use of existing kernel machine-based regression software for unrelated samples. We compared the performance of kernel-based association analysis on the material of the Genetic Analysis Workshop 17 family sample and real human data by using our transformation, the original untransformed trait, and environmental residuals. We demonstrated that only the GRAMMAR+ transformation produced type I errors close to the nominal value and that this method had the highest empirical power. The new method can be applied to analysis of related samples by using existing software for kernel-based association analysis developed for unrelated samples.
