Similar literature
 Found 20 similar articles; search time 843 ms
1.
Correct annotation of the genetic relationships between samples is essential for population genomic studies, which could be biased by errors or omissions. To this end, we used identity-by-state (IBS) and identity-by-descent (IBD) methods to assess genetic relatedness of individuals within HapMap phase III data. We analyzed data from 1,397 individuals across 11 ethnic populations. Our results support previous studies (Pemberton et al., 2010; Kyriazopoulou-Panagiotopoulou et al., 2011) assessing unknown relatedness present within this population. Additionally, we present evidence for 1,657 novel pairwise relationships across 9 populations. Surprisingly, significant Cotterman's coefficients of relatedness K1 (IBD1) values were detected between pairs of known parents. Furthermore, significant K2 (IBD2) values were detected in 32 previously annotated parent-child relationships. Consistent with a hypothesis of inbreeding, regions of homozygosity (ROH) were identified in the offspring of related parents, of which a subset overlapped those reported in previous studies (Gibson et al., 2010; Johnson et al., 2011). In total, we inferred 28 inbred individuals with ROH that overlapped areas of relatedness between the parents and/or IBD2 sharing at a different genomic locus between a child and a parent. Finally, 8 previously annotated parent-child relationships had unexpected K0 (IBD0) values (resulting from a chromosomal abnormality or genotype error), and 10 previously annotated second-degree relationships along with 38 other novel pairwise relationships had unexpected IBD2 (indicating two separate paths of recent ancestry). These newly described types of relatedness may impact the outcome of previous studies and should inform the design of future studies relying on the HapMap Phase III resource.
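The role of the K0, K1 and K2 coefficients in this abstract can be illustrated with a minimal sketch. It assumes the textbook expected values for outbred pedigrees and the standard identities kinship = K1/4 + K2/2 and relatedness = K1/2 + K2; the class name, the `EXPECTED` table, and the tolerance of 0.1 are hypothetical choices made for illustration and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IBDSharing:
    """Genome-wide probabilities of sharing 0, 1 or 2 alleles IBD."""
    k0: float
    k1: float
    k2: float

    def kinship(self) -> float:
        # Probability that two alleles drawn at random, one from each
        # individual, are identical by descent.
        return self.k1 / 4 + self.k2 / 2

    def relatedness(self) -> float:
        # Expected fraction of the genome shared IBD.
        return self.k1 / 2 + self.k2


# Textbook expectations for outbred pedigrees (an assumption of this sketch).
EXPECTED = {
    "unrelated": IBDSharing(1.0, 0.0, 0.0),
    "parent-child": IBDSharing(0.0, 1.0, 0.0),
    "full siblings": IBDSharing(0.25, 0.5, 0.25),
    "second degree": IBDSharing(0.5, 0.5, 0.0),
    "identical": IBDSharing(0.0, 0.0, 1.0),
}


def flag_unexpected(annotated: str, observed: IBDSharing, tol: float = 0.1):
    """Return the coefficients that deviate from the annotated relationship.

    `tol` is an arbitrary illustrative threshold, not a calibrated cut-off.
    """
    expected = EXPECTED[annotated]
    return [
        name
        for name in ("k0", "k1", "k2")
        if abs(getattr(observed, name) - getattr(expected, name)) > tol
    ]


if __name__ == "__main__":
    # A hypothetical parent-child pair carrying some IBD2 sharing.
    pair = IBDSharing(k0=0.0, k1=0.85, k2=0.15)
    print(flag_unexpected("parent-child", pair), pair.kinship())
```

A parent-child pair flagged on `k2`, as in the hypothetical pair above, corresponds to the kind of unexpected IBD2 sharing the abstract interprets as a sign of inbreeding.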

2.
It is an assumption of large, population-based datasets that samples are annotated accurately whether they correspond to known relationships or unrelated individuals. These annotations are key for a broad range of genetics applications. While many methods are available to assess relatedness that involve estimates of identity-by-descent (IBD) and/or identity-by-state (IBS) allele-sharing proportions, we developed a novel approach that estimates IBD0, 1, and 2 based on observed IBS within windows. When combined with genome-wide IBS information, it provides an intuitive and practical graphical approach with the capacity to analyze datasets with thousands of samples without prior information about relatedness between individuals or haplotypes. We applied the method to a commonly used Human Variation Panel consisting of 400 nominally unrelated individuals. Surprisingly, we identified identical, parent-child, and full-sibling relationships and reconstructed pedigrees. In two instances non-sibling pairs of individuals in these pedigrees had unexpected IBD2 levels, as well as multiple regions of homozygosity, implying inbreeding. This combined method allowed us to distinguish related individuals from those having atypical heterozygosity rates and determine which individuals were outliers with respect to their designated population. Additionally, it becomes increasingly difficult to identify distant relatedness using genome-wide IBS methods alone. However, our IBD method further identified distant relatedness between individuals within populations, supported by the presence of megabase-scale regions lacking IBS0 across individual chromosomes. We benchmarked our approach against the hidden Markov model of a leading software package (PLINK), showing improved calling of distantly related individuals, and we validated it using a known pedigree from a clinical study. The application of this approach could improve genome-wide association, linkage, heterozygosity, and other population genomics studies that rely on SNP genotype data.
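The idea of tallying IBS within windows and looking for stretches that lack IBS0 can be sketched as follows. This is only an illustration under simplifying assumptions: genotypes are biallelic and coded as 0/1/2 copies of one allele, windows are defined by SNP count rather than physical distance, and missing calls are not handled. The function names are hypothetical, and the step that converts windowed IBS into IBD0/1/2 estimates is not reproduced here.

```python
from typing import List, Sequence, Tuple


def ibs_state(g1: int, g2: int) -> int:
    """Number of alleles shared identical by state (0, 1 or 2).

    Genotypes are allele dosages, so opposite homozygotes (0 vs 2) give
    IBS0 and identical genotypes give IBS2.
    """
    return 2 - abs(g1 - g2)


def windowed_ibs_counts(
    a: Sequence[int], b: Sequence[int], window: int
) -> List[Tuple[int, int, int]]:
    """Count (IBS0, IBS1, IBS2) SNPs in consecutive windows of `window` SNPs."""
    if len(a) != len(b):
        raise ValueError("genotype vectors must have the same length")
    counts = []
    for start in range(0, len(a), window):
        tally = [0, 0, 0]
        for g1, g2 in zip(a[start:start + window], b[start:start + window]):
            tally[ibs_state(g1, g2)] += 1
        counts.append(tuple(tally))
    return counts


def windows_without_ibs0(counts: Sequence[Tuple[int, int, int]]) -> List[int]:
    """Indices of windows with no IBS0 SNPs: candidates for IBD1 or IBD2."""
    return [i for i, tally in enumerate(counts) if tally[0] == 0]


if __name__ == "__main__":
    person_a = [0, 2, 1, 1, 2, 0]
    person_b = [2, 2, 1, 0, 2, 1]
    per_window = windowed_ibs_counts(person_a, person_b, window=3)
    print(per_window, windows_without_ibs0(per_window))
```

In real data a single genotyping error can introduce a spurious IBS0 call, so a practical version would probably tolerate a small IBS0 rate per window instead of requiring exactly zero.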

3.
Genealogical inference from genetic data is essential for a variety of applications in human genetics. In genome-wide and sequencing association studies, for example, accurate inference on both recent genetic relatedness, such as family structure, and more distant genetic relatedness, such as population structure, is necessary for protection against spurious associations. Distinguishing familial relatedness from population structure with genotype data, however, is difficult because both manifest as genetic similarity through the sharing of alleles. Existing approaches for inference on recent genetic relatedness have limitations in the presence of population structure, where they either (1) make strong and simplifying assumptions about population structure, which are often untenable, or (2) require correct specification of and appropriate reference population panels for the ancestries in the sample, which might be unknown or not well defined. Here, we propose PC-Relate, a model-free approach for estimating commonly used measures of recent genetic relatedness, such as kinship coefficients and IBD sharing probabilities, in the presence of unspecified structure. PC-Relate uses principal components calculated from genome-screen data to partition genetic correlations among sampled individuals due to the sharing of recent ancestors and more distant common ancestry into two separate components, without requiring specification of the ancestral populations or reference population panels. In simulation studies with population structure, including admixture, we demonstrate that PC-Relate provides accurate estimates of genetic relatedness and improved relationship classification over widely used approaches. We further demonstrate the utility of PC-Relate in applications to three ancestrally diverse samples that vary in both size and genealogical complexity.

4.
The Haseman-Elston method of linkage detection is based on the probabilities of a sib pair having 0, 1, or 2 alleles identical by descent (IBD) at a marker and a trait locus. These probabilities form a 3 × 3 matrix. Here, the characteristic values and characteristic vectors (eigenvalues and eigenvectors) of this matrix were used to clarify the structure of the equations and to simplify calculations. As examples, the regression coefficients were derived for three genetic systems: a trait and a marker, two epistatic traits and two markers, and one trait locus and two markers. The last model was studied under the assumption of no crossover interference; the expression for allele IBD sharing at a trait locus was derived as a function of allele IBD sharing at two marker loci, and the regression was shown to be non-linear.
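To make the 3 × 3 matrix concrete, the sketch below builds the classical sib-pair matrix of trait-locus IBD probabilities conditional on marker IBD for a single marker at recombination fraction theta, and checks its eigenvalues numerically. The parametrisation through psi = theta² + (1 − theta)² and the eigenpairs listed are the standard single-marker results; they are assumed here, not quoted from the article, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def sib_ibd_matrix(theta: float) -> Matrix:
    """P(trait IBD = column | marker IBD = row) for a sib pair.

    Rows and columns are ordered IBD 2, 1, 0.
    """
    psi = theta ** 2 + (1 - theta) ** 2
    return [
        [psi ** 2, 2 * psi * (1 - psi), (1 - psi) ** 2],
        [psi * (1 - psi), 1 - 2 * psi * (1 - psi), psi * (1 - psi)],
        [(1 - psi) ** 2, 2 * psi * (1 - psi), psi ** 2],
    ]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Plain matrix-vector product, to keep the sketch dependency-free."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def eigenpairs(theta: float):
    """Eigenvalues 1, (1 - 2*theta)^2, (1 - 2*theta)^4 with right eigenvectors."""
    lam = (1 - 2 * theta) ** 2
    return [
        (1.0, [1.0, 1.0, 1.0]),
        (lam, [1.0, 0.0, -1.0]),
        (lam ** 2, [1.0, -1.0, 1.0]),
    ]


def expected_trait_sharing(theta: float) -> List[float]:
    """E[proportion IBD at the trait locus | marker IBD = 2, 1, 0]."""
    return matvec(sib_ibd_matrix(theta), [1.0, 0.5, 0.0])


if __name__ == "__main__":
    for value, vector in eigenpairs(0.1):
        print(value, matvec(sib_ibd_matrix(0.1), vector))
```

The middle eigenvalue (1 − 2·theta)² is the factor by which marker sharing predicts trait-locus sharing, which is why it reappears in the regression coefficient of the single-marker case.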

5.
The detection of family relationships in genetic databases is of interest in various scientific disciplines such as genetic epidemiology, population and conservation genetics, forensic science, and genealogical research. Nowadays, screening genetic databases for related individuals forms an important aspect of standard quality control procedures. Relatedness research is usually based on an allele sharing analysis of identity by state (IBS) or identity by descent (IBD) alleles. Existing IBS/IBD methods mainly aim to identify first-degree (parent–offspring or full siblings) and second-degree (half-siblings, avuncular, or grandparent–grandchild) pairs. Little attention has been paid to the detection of relationships in between the first and second degree, such as three-quarter siblings (3/4S), who share fewer alleles than first-degree relatives but more alleles than second-degree relatives. With the progressively increasing sample sizes used in genetic research, it becomes more likely that such relationships are present in the database under study. In this paper, we extend existing likelihood ratio (LR) methodology to accurately infer the existence of 3/4S, distinguishing them from full siblings and second-degree relatives. We use bootstrap confidence intervals to express uncertainty in the LRs. Our proposal accounts for linkage disequilibrium (LD) by using marker pruning, and we validate our methodology with a pedigree-based simulation study accounting for both LD and recombination. An empirical genome-wide array data set from the GCAT Genomes for Life cohort project is used to illustrate the method.
Subject terms: Genetic markers, Population genetics
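A likelihood ratio of this kind can be sketched for independent biallelic markers. The code below is an illustration, not the authors' implementation: it assumes unlinked markers in Hardy-Weinberg equilibrium with no genotyping error, it omits LD pruning and bootstrap intervals, and it uses IBD-sharing probabilities (k0, k1, k2) of (3/8, 1/2, 1/8) for three-quarter siblings, which is the value obtained from a standard outbred pedigree and is not stated in the abstract. All names are hypothetical.

```python
from itertools import product
from math import log

# Assumed (k0, k1, k2) for outbred pedigrees; not taken from the abstract.
K = {
    "3/4S": (3 / 8, 1 / 2, 1 / 8),
    "FS": (1 / 4, 1 / 2, 1 / 4),
    "second degree": (1 / 2, 1 / 2, 0.0),
}


def joint_genotype_probs(p: float):
    """Joint genotype probabilities of a pair, conditional on IBD state.

    `p` is the frequency of the counted allele; genotypes are dosages 0/1/2.
    Returns {ibd_state: {(g1, g2): probability}}, built by enumerating the
    founder alleles involved in each IBD state.
    """
    freq = {0: 1.0 - p, 1: p}
    tables = {0: {}, 1: {}, 2: {}}

    def add(state, key, prob):
        tables[state][key] = tables[state].get(key, 0.0) + prob

    for a, b, c, d in product((0, 1), repeat=4):   # four independent alleles
        add(0, (a + b, c + d), freq[a] * freq[b] * freq[c] * freq[d])
    for s, x, y in product((0, 1), repeat=3):      # one shared allele s
        add(1, (s + x, s + y), freq[s] * freq[x] * freq[y])
    for a, b in product((0, 1), repeat=2):         # both alleles shared
        add(2, (a + b, a + b), freq[a] * freq[b])
    return tables


def log_likelihood(k, genotype_pairs, freqs) -> float:
    """Log-likelihood of (g1, g2) tuples under sharing probabilities k."""
    total = 0.0
    for pair, p in zip(genotype_pairs, freqs):
        tables = joint_genotype_probs(p)
        total += log(sum(k[z] * tables[z].get(pair, 0.0) for z in (0, 1, 2)))
    return total


def log_lr(rel_a: str, rel_b: str, genotype_pairs, freqs) -> float:
    """Log likelihood ratio of relationship rel_a versus rel_b."""
    return (log_likelihood(K[rel_a], genotype_pairs, freqs)
            - log_likelihood(K[rel_b], genotype_pairs, freqs))
```

As a check on the logic, a single marker at which the two individuals are opposite homozygotes is only possible under IBD0, so it contributes a ratio of k0(3/4S) / k0(FS) = 1.5 in favour of three-quarter siblings.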

6.
Hu XS. Heredity, 2005, 94(3): 338-346
The 'spatial' pattern of the correlation of pairwise relatedness among loci within a chromosome is an important aspect for an insight into genomic evolution in natural populations. In this article, a statistical genetic method is presented for estimating the correlation of pairwise relatedness among linked loci. The probabilities of identity-in-state (IIS) are related to the probabilities of identity-by-descent (IBD) for the two- and three-loci cases. By decomposing the joint probabilities of two- or three-loci IBD, the probability of pairwise relatedness at a single locus and its correlation among linked loci can be simultaneously estimated. To provide effective statistical methods for estimation, weighted least squares (LS) and maximum likelihood (ML) methods are evaluated through extensive Monte Carlo simulations. Results show that the ML method gives a better performance than the weighted LS method with haploid genotypic data. However, there are no significant differences between the two methods when two- or three-loci diploid genotypic data are employed. Compared with the optimal size for haploid genotypic data, a smaller optimal sample size is predicted with diploid genotypic data.

7.
We present here four nonparametric statistics for linkage analysis that test whether pairs of affected relatives share marker alleles more often than expected. These statistics are based on simulating the null distribution of a given statistic conditional on the unaffecteds' marker genotypes. Each statistic uses a different measure of marker sharing: the SimAPM statistic uses the simulation-based affected-pedigree-member measure based on identity-by-state (IBS) sharing. The SimKIN (kinship) measure is 1.0 for identity-by-descent (IBD) sharing, 0.0 for no IBD status sharing, and the kinship coefficient when the IBD status is ambiguous. The simulation-based IBD (SimIBD) statistic uses a recursive algorithm to determine the probability of two affecteds sharing a specific allele IBD. The SimISO statistic is identical to SimIBD, except that it also measures marker similarity between unaffected pairs. We evaluated our statistics on data simulated under different two-locus disease models, comparing our results to those obtained with several other nonparametric statistics. Use of IBD information produces dramatic increases in power over the SimAPM method, which uses only IBS information. The power of our best statistic in most cases meets or exceeds the power of the other nonparametric statistics. Furthermore, our statistics perform comparisons between all affected relative pairs within general pedigrees and are not restricted to sib pairs or nuclear families.

8.
In 1972, Haseman and Elston proposed a pioneering regression method for mapping quantitative trait loci using randomly selected sib pairs. Recently, the statistical power of their method was shown to be increased when extremely discordant sib pairs are ascertained. While the precise genetic model may not be known, prior information that constrains IBD probabilities is often available. We investigate properties of tests that are robust against model uncertainty and show that the power gain from further constraining IBD probabilities is marginal. The additional linkage information contained in the trait values can be incorporated by combining the Haseman-Elston regression method and a robust allele sharing test.
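The core of the original Haseman-Elston regression is an ordinary least-squares fit of squared sib-pair trait differences on the estimated proportion of alleles shared IBD at a marker; a negative slope is taken as evidence of linkage. The sketch below shows only that basic regression, with a hypothetical function name and toy data. It does not cover the robust or constrained tests discussed in the abstract, nor ascertainment of discordant pairs.

```python
from typing import Sequence, Tuple


def haseman_elston_fit(
    trait_pairs: Sequence[Tuple[float, float]],
    ibd_proportions: Sequence[float],
) -> Tuple[float, float]:
    """Least-squares fit of squared trait differences on IBD proportion.

    Returns (slope, intercept). Under linkage the slope is expected to be
    negative, because pairs sharing more alleles IBD tend to be more alike.
    """
    y = [(a - b) ** 2 for a, b in trait_pairs]
    x = list(ibd_proportions)
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need one IBD proportion per sib pair, at least two pairs")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("IBD proportions must vary across sib pairs")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Toy data: pairs sharing no alleles differ more than pairs sharing both.
    pairs = [(2.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 4.0)]
    sharing = [0.0, 1.0, 0.0, 1.0]
    print(haseman_elston_fit(pairs, sharing))
```

A one-sided test of the slope against zero would normally follow the fit; it is left out to keep the example short.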

9.
Misspecified relationships can have serious consequences for linkage studies, resulting in either reduced power or false-positive evidence for linkage. If some individuals in the pedigree are untyped, then Mendelian errors may not be observed. Previous approaches to detection of misspecified relationships by use of genotype data were developed for sib and half-sib pairs. We extend the likelihood calculations of Göring and Ott and of Boehnke and Cox to more-general relative pairs, for which identity-by-descent (IBD) status is no longer a Markov chain, and we propose a likelihood-ratio test. We also extend the identity-by-state (IBS)-based test of Ehm and Wagner to nonsib relative pairs. The likelihood-ratio test has high power, but its drawbacks include the need to construct and apply a separate Markov chain for each possible alternative relationship and the need for simulation to assess significance. The IBS-based test is simpler but has lower power. We propose two new test statistics, conditional expected IBD (EIBD) and adjusted IBS (AIBS), designed to retain the simplicity of IBS while increasing power by taking into account chance sharing. In simulations, the power of EIBD is generally close to that of the likelihood-ratio test. The power of AIBS is higher than that of IBS in all cases considered. We suggest a strategy of initial screening by use of EIBD and AIBS, followed by application of the likelihood-ratio test to only a subset of relative pairs, identified by use of EIBD and AIBS. We apply the methods to a Genetic Analysis Workshop 11 data set from the Collaborative Study on the Genetics of Alcoholism.

10.

Background

A fundamental goal of single nucleotide polymorphism (SNP) genotyping is to determine the sharing of alleles between individuals across genomic loci. Such analyses have diverse applications: defining the relatedness of individuals (including unexpected relationships in nominally unrelated individuals, or consanguinity within pedigrees), analyzing meiotic crossovers, identifying a broad range of chromosomal anomalies such as hemizygous deletions and uniparental disomy, and analyzing population structure.

Principal Findings

We present SNPduo, a command-line and web-accessible tool for analyzing and visualizing the relatedness of any two individuals using identity by state. Using identity by state does not require prior knowledge of allele frequencies or pedigree information, and it is more computationally tractable and less affected by population stratification than calculating identity-by-descent probabilities. The web implementation visualizes shared genomic regions and generates UCSC-viewable tracks. The command-line version requires pedigree information for compatibility with existing software and for determining specified relationships, even though pedigrees are not required for IBS calculation; it generates no visual output, is written in portable C++, and is well-suited to analyzing large datasets. We demonstrate how the SNPduo web tool identifies meiotic crossover positions in siblings, and confirm our findings by visualizing meiotic recombination in synthetic three-generation pedigrees. We applied SNPduo to 210 nominally unrelated Phase I / II HapMap samples and, consistent with previous findings, identified six undeclared pairs of related individuals. We further analyzed identity by state in 2,883 individuals from multiplex families with autism and identified a series of anomalies including related parents, an individual with mosaic loss of chromosome 18, an individual with maternal heterodisomy of chromosome 16, and unexplained replicate samples.

Conclusions

SNPduo provides the ability to explore and visualize SNP data to characterize the relatedness between individuals. It is compatible with, but distinct from, other established analysis software such as PLINK, and performs favorably in benchmarking studies for the analyses of genetic relatedness.

11.
There is currently considerable interest in testing the effects of genetic compatibility and heterozygosity on animal mate preferences. Evidence for either effect is rapidly accumulating, although results are not always clear-cut. However, correlations between mating preferences and either genetic similarity or heterozygosity are usually tested independently, and the possibility that similarity and heterozygosity may be confounded has rarely been taken into account. Here we show that measures of genetic similarity (allele sharing, relatedness) may be correlated with heterozygosity, using data from 441 human individuals genotyped at major loci in the major histocompatibility complex, and 281 peafowl (Pavo cristatus) individuals genotyped at 13 microsatellite loci. We show that average levels of allele sharing and relatedness are each significantly associated with heterozygosity in both humans and peafowl, that these relationships are influenced by the level of polymorphism, and that these similarity measures may correlate with heterozygosity in qualitatively different ways. We discuss the implications of these inter-relationships for interpretation of mate choice studies. It has recently become apparent that mating preferences for 'good genes' and 'compatible genes' may introduce discordant choice amongst individuals, since the optimal mate for one trait may not be optimal for the other, and our results are consistent with this idea. The inter-relationship between these measures of genetic quality also carries implications for the way in which mate choice studies are designed and interpreted, and generates predictions that can be tested in future research.

12.
Genomic selection based on the single-step genomic best linear unbiased prediction (ssGBLUP) approach is becoming an important tool in forest tree breeding. The quality of the variance components and the predictive ability of the estimated breeding values (GEBV) depend on how well marker-based genomic relationships describe the actual genetic relationships at unobserved causal loci. We investigated the performance of GEBV obtained when fitting models with genomic covariance matrices based on two identity-by-descent (IBD) and two identity-by-state (IBS) relationship measures. Multiple-trait multiple-site ssGBLUP models were fitted to diameter and stem straightness in five open-pollinated progeny trials of Eucalyptus dunnii, genotyped using the EUChip60K. We also fitted the conventional ABLUP model with a pedigree-based covariance matrix. Estimated relationships from the IBD estimators displayed consistently lower standard deviations than those from the IBS approaches. Although ssGBLUP based on IBS estimators resulted in higher trait-site heritabilities, the gain in accuracy of the relationships using IBD estimators resulted in higher predictive ability and lower bias of GEBV, especially for low-heritability trait-site combinations. ssGBLUP based on IBS and IBD approaches performed considerably better than the traditional ABLUP. In summary, our results advocate the use of the ssGBLUP approach jointly with the IBD relationship matrix in open-pollinated forest tree evaluation.
Subject terms: Plant breeding, Genomics

13.
The Demerelate package offers algorithms to calculate different interindividual relatedness measurements. Three different allele sharing indices, five pairwise weighted estimates of relatedness and four pairwise weighted estimates with sample size correction are implemented to analyse kinship structures within populations. Statistics are based on randomization tests: modelling relatedness coefficients by logistic regression, modelling relatedness with geographic distance by Mantel correlation, and comparing mean relatedness between populations using pairwise t-tests. Demerelate provides an advance on previous software packages by including some estimators not available in R to date, along with FIS, as well as combining analysis of relatedness and spatial structuring. A UPGMA tree visualizes genetic relatedness among individuals. Additionally, Demerelate summarizes information on data sets (allele vs. genotype frequencies; heterozygosity; FIS values). Demerelate is, to our knowledge, the first R package implementing basic allele sharing indices such as Blouin's Mxy relatedness, the estimator of Wang corrected for sample size (wangxy), and estimators based on Moran's I adapted to genetic relatedness, as well as combining all estimators with geographic information. The R environment enables users to better understand relatedness within populations because Demerelate flexibly accepts different data sets (empirical data, reference data, geographical data) and provides intermediate results. Each statistic and tool can be used separately, which helps to understand the suitability of the data for relatedness analysis, and can be easily implemented in custom pipelines.

14.
Genome-wide association studies (GWASs) are commonly used for the mapping of genetic loci that influence complex traits. A problem that is often encountered in both population-based and family-based GWASs is that of identifying cryptic relatedness and population stratification because it is well known that failure to appropriately account for both pedigree and population structure can lead to spurious association. A number of methods have been proposed for identifying relatives in samples from homogeneous populations. A strong assumption of population homogeneity, however, is often untenable, and many GWASs include samples from structured populations. Here, we consider the problem of estimating relatedness in structured populations with admixed ancestry. We propose a method, REAP (relatedness estimation in admixed populations), for robust estimation of identity by descent (IBD)-sharing probabilities and kinship coefficients in admixed populations. REAP appropriately accounts for population structure and ancestry-related assortative mating by using individual-specific allele frequencies at SNPs that are calculated on the basis of ancestry derived from whole-genome analysis. In simulation studies with related individuals and admixture from highly divergent populations, we demonstrate that REAP gives accurate IBD-sharing probabilities and kinship coefficients. We apply REAP to the Mexican Americans in Los Angeles, California (MXL) population sample of release 3 of phase III of the International Haplotype Map Project; in this sample, we identify third- and fourth-degree relatives who have not previously been reported. We also apply REAP to the African American and Hispanic samples from the Women's Health Initiative SNP Health Association Resource (WHI-SHARe) study, in which hundreds of pairs of cryptically related individuals have been identified.
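The use of individual-specific allele frequencies can be illustrated with a simple moment estimator of kinship. In the sketch below, each genotype is centred on twice that individual's own allele frequency at the SNP, so that sharing explained by ancestry is removed before the cross-products are summed. The frequencies are assumed to be supplied by an earlier ancestry analysis, and the ratio-of-sums weighting is one common choice made for this illustration; the exact estimator and weighting used by REAP may differ. The function name is hypothetical.

```python
from math import sqrt
from typing import Sequence


def kinship_with_individual_freqs(
    g_i: Sequence[int],
    g_j: Sequence[int],
    mu_i: Sequence[float],
    mu_j: Sequence[float],
) -> float:
    """Moment estimate of the kinship coefficient for individuals i and j.

    g_i, g_j   : genotypes as 0/1/2 copies of the reference allele.
    mu_i, mu_j : individual-specific allele frequencies at the same SNPs,
                 assumed to lie strictly between 0 and 1.
    """
    numerator = 0.0
    denominator = 0.0
    for gi, gj, mi, mj in zip(g_i, g_j, mu_i, mu_j):
        # Centre each genotype on its own expected value, 2 * mu.
        numerator += (gi - 2 * mi) * (gj - 2 * mj)
        # Scale by the expected genotype standard deviations.
        denominator += sqrt(mi * (1 - mi) * mj * (1 - mj))
    if denominator == 0:
        raise ValueError("no informative SNPs (all frequencies are 0 or 1)")
    return numerator / (4 * denominator)


if __name__ == "__main__":
    freqs = [0.5, 0.5, 0.5, 0.5]
    genotypes = [0, 2, 0, 2]
    # Self-comparison of a fully homozygous individual.
    print(kinship_with_individual_freqs(genotypes, genotypes, freqs, freqs))
```

With a homogeneous sample, `mu_i` and `mu_j` reduce to the same population allele frequencies and the estimator becomes the familiar genotype-correlation form.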

15.
Genetic diversity has emerged as an important source of variation in the ecological properties of populations, but there are few studies of genetic diversity effects on colonisation processes. This relative scarcity of studies is surprising given the influence of colonisation on species coexistence, invasion, and population persistence. Here, we manipulated relatedness in experimental populations of colonising larvae in four sessile marine invertebrates. We then examined the influence of coloniser relatedness on the number, spatial arrangement and phenotype of colonisers following permanent settlement. Overall, relatedness influenced colonisation in all four species, but the effects of relatedness on colonisation differed among species. The variable responses of species to manipulations of relatedness likely reflect differences in intensity of inter- and intra-specific competition among adults, as well as the differential consequences of larval behaviours for each species. Relatedness appears to play an underappreciated role in the colonisation process, and we recommend that future studies of genetic diversity effects consider not only adult stages (the focus of most work to date) but also the importance of genetic diversity in early life history stages.

16.

Background

With the advent of high throughput DNA typing, dense marker maps have become available to investigate genetic diversity on specific regions of the genome. The aim of this paper was to compare two marker based estimates of the genetic diversity in specific genomic regions lying in between markers: IBD-based genetic diversity and heterozygosity.

Methods

A computer-simulated population was set up with individuals carrying a single 1-Morgan chromosome and 1665 SNP markers; from this population, a second population with a lower marker density (166 SNP markers) was derived. For each marker interval based on adjacent markers, the genetic diversity was estimated either by IBD probabilities or heterozygosity. Estimates were compared to each other and to the true genetic diversity. The latter was calculated for a marker in the middle of each marker interval that was not used to estimate genetic diversity.

Results

The simulated population had an average minor allele frequency of 0.28 and an LD (r²) of 0.26, comparable to those of real livestock populations. Genetic diversities estimated by IBD probabilities and by heterozygosity were positively correlated, and correlations with the true genetic diversity were quite similar for the simulated population with a high marker density, both for specific regions (r = 0.19-0.20) and large regions (r = 0.61-0.64) over the genome. For the population with a lower marker density, the correlation with the true genetic diversity turned out to be higher for the IBD-based genetic diversity.

Conclusions

Genetic diversities of ungenotyped regions of the genome (i.e. between markers) estimated by IBD-based methods and heterozygosity give similar results for the simulated population with a high marker density. However, for a population with a lower marker density, the IBD-based method gives a better prediction, since variation and recombination between markers are missed with heterozygosity.

17.
Association mapping is a powerful approach for exploring the molecular basis of phenotypic variations in plants. A maize (Zea mays L.) association mapping panel including 527 inbred lines with tropical, subtropical and temperate backgrounds, representing the global maize diversity, was genotyped using 1,536 single nucleotide polymorphisms (SNPs). In total, 926 SNPs with minor allele frequencies of ≥0.1 were used to estimate the pattern of genetic diversity and relatedness among individuals. The analysis revealed broad phenotypic diversity and complex genetic relatedness in the maize panel. Two different Bayesian approaches identified three specific subpopulations, which were then reconfirmed by principal component analysis (PCA) and tree-based analyses. Marker–trait associations were performed to assess the suitability of different models for false-positive correction by population structure (Q matrix/PCA) and familial kinship (K matrix) alone or in combination in this panel. The K, Q + K and PCA + K models could reduce the false positives, and the Q + K model performed slightly better for flowering time, ear height and ear diameter. Our findings suggest that this maize panel is suitable for association mapping in order to understand the relationship between genotypic and phenotypic variations for agriculturally complex quantitative traits using optimal statistical methods.

18.
The benefits and costs of stratification of affected-sib-pair (ASP) data were examined in three situations: (1) when there is no difference in identity-by-descent (IBD) allele sharing between stratified and unstratified ASP data sets; (2) when there is an increase in IBD allele sharing in one of the stratified groups; and (3) when the data are stratified on the basis of IBD allele-sharing status at one locus, and the stratified ASPs are then analyzed for linkage at a second locus. When there is no difference in IBD sharing between strata, a penalty is always paid for stratifying the data. The loss of power to detect linkage in the stratified ASP data sets is the result of multiple testing and the smaller sample size within individual strata. In the case in which etiologic heterogeneity (i.e., severity of phenotype, age at onset) represents genetic heterogeneity, the power to detect linkage can be increased by stratifying the ASP data. This benefit is obtained when there is sufficient IBD allele sharing and sample sizes. Once linkage has been established for a given locus, data can be stratified on the basis of IBD status at this locus and can be tested for linkage at a second locus. When the relative risk is in the vicinity of 1, the power to detect linkage at the second locus is always greater for the unstratified ASP data set. Even for values of the relative risk that diverge sufficiently from 1, with adequate sample sizes and IBD allele sharing, the benefits of stratifying ASP data are minimal.

19.
Inference of relationships from whole-genome genetic data of a cohort is a crucial prerequisite for genome-wide association studies. Typically, relationships are inferred by computing the kinship coefficients (ϕ) and the genome-wide probability of zero IBD sharing (π0) among all pairs of individuals. Current leading methods are based on pairwise comparisons, which may not scale up to very large cohorts (e.g., sample size >1 million). Here, we propose an efficient relationship inference method, RAFFI. RAFFI leverages the efficient RaPID method to call IBD segments first, and then estimates ϕ and π0 from the detected IBD segments. This inference is achieved by a data-driven approach that adjusts the estimation based on phasing quality and genotyping quality. Using simulations, we showed that RAFFI is robust against phasing/genotyping errors, admixture events, and varying marker densities, and achieves higher accuracy compared to KING, the current leading method, especially for more distant relatives. When applied to the phased UK Biobank data with ~500K individuals, RAFFI is approximately 18 times faster than KING. We expect RAFFI will offer fast and accurate relatedness inference for even larger cohorts.
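How ϕ and π0 follow from called IBD segments can be shown with a naive estimator. The sketch assumes that segments are non-overlapping, are given in centimorgans together with their IBD state, and that the total genome length is known. It does not include the phasing- and genotyping-quality adjustments described in the abstract, so it is not RAFFI's estimator. The degree cut-offs are the commonly used power-of-two kinship boundaries; all names are hypothetical.

```python
from typing import Iterable, Tuple

# (start_cM, end_cM, ibd_state), where ibd_state is 1 or 2.
Segment = Tuple[float, float, int]


def kinship_and_pi0(
    segments: Iterable[Segment], genome_length_cm: float
) -> Tuple[float, float]:
    """Naive estimates of kinship (phi) and P(IBD0) from IBD segments."""
    ibd1 = 0.0
    ibd2 = 0.0
    for start, end, state in segments:
        if state == 1:
            ibd1 += end - start
        elif state == 2:
            ibd2 += end - start
        else:
            raise ValueError("ibd_state must be 1 or 2")
    k1 = ibd1 / genome_length_cm
    k2 = ibd2 / genome_length_cm
    phi = k1 / 4 + k2 / 2
    pi0 = 1.0 - k1 - k2
    return phi, pi0


def degree_from_kinship(phi: float) -> str:
    """Classify by kinship using power-of-two boundaries between degrees."""
    if phi > 2 ** -1.5:    # about 0.354
        return "duplicate or monozygotic twin"
    if phi > 2 ** -2.5:    # about 0.177
        return "first degree"
    if phi > 2 ** -3.5:    # about 0.0884
        return "second degree"
    if phi > 2 ** -4.5:    # about 0.0442
        return "third degree"
    return "unrelated or more distant"


if __name__ == "__main__":
    # Toy 100 cM genome with full-sibling-like sharing.
    toy = [(0.0, 50.0, 1), (50.0, 75.0, 2)]
    phi, pi0 = kinship_and_pi0(toy, genome_length_cm=100.0)
    print(phi, pi0, degree_from_kinship(phi))
```

Within the first degree, π0 is what separates parent-child pairs (π0 near 0) from full siblings (π0 near 0.25), which is why both quantities are reported.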

20.
Pedigree relatedness, not greenbeard genes, explains eusociality   Cited by: 1 (self-citations: 0; citations by others: 1)
The evolution of eusociality, where some individuals altruistically forgo reproduction, poses a dilemma which can be solved by kin selection, i.e. by considering relatedness among cooperating individuals. Most often, such relatedness is caused by pedigree relationships between family members. However, an alternative explanation has recently emerged in an article by Wilson and Hölldobler (2005). Wilson and Hölldobler see the ecological benefit of group living as the principal reason for sociality. In their scenario, individuals sharing the same altruistic allele (analogous to a greenbeard gene) preferentially interact with each other, regardless of pedigree relatedness. We argue that empirical evidence has the potential to answer the question of whether pedigree relatedness plays a role in the evolution of eusociality. We conclude that both phylogenetic studies and studies of intra-genomic conflict support the importance of pedigree relatedness in the evolution of eusociality.
