Similar literature
20 similar articles found (search time: 31 ms)
1.
Correct annotation of the genetic relationships between samples is essential for population genomic studies, which could be biased by errors or omissions. To this end, we used identity-by-state (IBS) and identity-by-descent (IBD) methods to assess genetic relatedness of individuals within HapMap Phase III data. We analyzed data from 1,397 individuals across 11 ethnic populations. Our results support previous studies (Pemberton et al., 2010; Kyriazopoulou-Panagiotopoulou et al., 2011) assessing unknown relatedness present within this population. Additionally, we present evidence for 1,657 novel pairwise relationships across 9 populations. Surprisingly, significant values of Cotterman's coefficient of relatedness K1 (IBD1) were detected between pairs of known parents. Furthermore, significant K2 (IBD2) values were detected in 32 previously annotated parent-child relationships. Consistent with a hypothesis of inbreeding, regions of homozygosity (ROH) were identified in the offspring of related parents, of which a subset overlapped those reported in previous studies (Gibson et al., 2010; Johnson et al., 2011). In total, we inferred 28 inbred individuals with ROH that overlapped areas of relatedness between the parents and/or IBD2 sharing at a different genomic locus between a child and a parent. Finally, 8 previously annotated parent-child relationships had unexpected K0 (IBD0) values (resulting from a chromosomal abnormality or genotype error), and 10 previously annotated second-degree relationships along with 38 other novel pairwise relationships had unexpected IBD2 (indicating two separate paths of recent ancestry). These newly described types of relatedness may impact the outcome of previous studies and should inform the design of future studies relying on the HapMap Phase III resource.

2.
It is an assumption of large, population-based datasets that samples are annotated accurately whether they correspond to known relationships or unrelated individuals. These annotations are key for a broad range of genetics applications. While many methods are available to assess relatedness that involve estimates of identity-by-descent (IBD) and/or identity-by-state (IBS) allele-sharing proportions, we developed a novel approach that estimates IBD0, 1, and 2 based on observed IBS within windows. When combined with genome-wide IBS information, it provides an intuitive and practical graphical approach with the capacity to analyze datasets with thousands of samples without prior information about relatedness between individuals or haplotypes. We applied the method to a commonly used Human Variation Panel consisting of 400 nominally unrelated individuals. Surprisingly, we identified identical, parent-child, and full-sibling relationships and reconstructed pedigrees. In two instances non-sibling pairs of individuals in these pedigrees had unexpected IBD2 levels, as well as multiple regions of homozygosity, implying inbreeding. This combined method allowed us to distinguish related individuals from those having atypical heterozygosity rates and determine which individuals were outliers with respect to their designated population. Additionally, it becomes increasingly difficult to identify distant relatedness using genome-wide IBS methods alone. However, our IBD method further identified distant relatedness between individuals within populations, supported by the presence of megabase-scale regions lacking IBS0 across individual chromosomes. We benchmarked our approach against the hidden Markov model of a leading software package (PLINK), showing improved calling of distantly related individuals, and we validated it using a known pedigree from a clinical study. 
The application of this approach could improve genome-wide association, linkage, heterozygosity, and other population genomics studies that rely on SNP genotype data.
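
The windowed IBS counting that this abstract builds on can be sketched as follows. This is only an illustration, not the authors' implementation: the 0/1/2 genotype coding, the function names `ibs_state` and `windowed_ibs`, and the fixed non-overlapping window are all assumptions made for the example. A window with no IBS0 sites is compatible with the pair sharing at least one haplotype IBD there, which is the intuition behind looking for long regions lacking IBS0.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical coding: 0, 1 or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def ibs_state(g1: int, g2: int) -> int:
    """Number of alleles (0, 1 or 2) shared identical by state at one biallelic SNP."""
    return 2 - abs(g1 - g2)


def windowed_ibs(a: Sequence[Genotype], b: Sequence[Genotype],
                 window: int = 4) -> List[Tuple[int, int, int]]:
    """Return (IBS0, IBS1, IBS2) counts for consecutive non-overlapping SNP windows."""
    if len(a) != len(b):
        raise ValueError("both individuals must be typed at the same SNPs")
    result = []
    for start in range(0, len(a), window):
        counts = [0, 0, 0]
        for g1, g2 in zip(a[start:start + window], b[start:start + window]):
            if g1 is None or g2 is None:
                continue  # skip SNPs with a missing genotype
            counts[ibs_state(g1, g2)] += 1
        result.append((counts[0], counts[1], counts[2]))
    return result


if __name__ == "__main__":
    ind_a = [0, 1, 2, 2, 0, 1, None, 2]
    ind_b = [2, 1, 2, 1, 0, 0, 1, 2]
    for i, (n0, n1, n2) in enumerate(windowed_ibs(ind_a, ind_b)):
        print(f"window {i}: IBS0={n0} IBS1={n1} IBS2={n2}")
```

Real data would use windows of hundreds of SNPs and would need to allow for genotyping error before treating an IBS0-free window as evidence of IBD.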

3.
The determination of the relationship between a pair of individuals is a fundamental application of genetics. Previously, we and others have demonstrated that identity-by-descent (IBD) information generated from high-density single-nucleotide polymorphism (SNP) data can greatly improve the power and accuracy of genetic relationship detection. Whole-genome sequencing (WGS) marks the final step in increasing genetic marker density by assaying all single-nucleotide variants (SNVs), and thus has the potential to further improve relationship detection by enabling more accurate detection of IBD segments and more precise resolution of IBD segment boundaries. However, WGS introduces new complexities that must be addressed in order to achieve these improvements in relationship detection. To evaluate these complexities, we estimated genetic relationships from WGS data for 1490 known pairwise relationships among 258 individuals in 30 families along with 46 population samples as controls. We identified several genomic regions with excess pairwise IBD in both the pedigree and control datasets using three established IBD methods: GERMLINE, fastIBD, and ISCA. These spurious IBD segments produced a 10-fold increase in the rate of detected false-positive relationships among controls compared to high-density microarray datasets. To address this issue, we developed a new method to identify and mask genomic regions with excess IBD. This method, implemented in ERSA 2.0, fully resolved the inflated cryptic relationship detection rates while improving relationship estimation accuracy. ERSA 2.0 detected all 1st through 6th degree relationships, and 55% of 9th through 11th degree relationships in the 30 families. We estimate that WGS data provides a 5% to 15% increase in relationship detection power relative to high-density microarray data for distant relationships. 
Our results identify regions of the genome that are highly problematic for IBD mapping and introduce new software to accurately detect 1st through 9th degree relationships from whole-genome sequence data.

4.
The detection of family relationships in genetic databases is of interest in various scientific disciplines such as genetic epidemiology, population and conservation genetics, forensic science, and genealogical research. Nowadays, screening genetic databases for related individuals forms an important aspect of standard quality control procedures. Relatedness research is usually based on an allele sharing analysis of identity by state (IBS) or identity by descent (IBD) alleles. Existing IBS/IBD methods mainly aim to identify first-degree relationships (parent–offspring or full siblings) and second-degree relationships (half-siblings, avuncular, or grandparent–grandchild pairs). Little attention has been paid to the detection of relationships that lie between the first and second degree, such as three-quarter siblings (3/4S), who share fewer alleles than first-degree relatives but more alleles than second-degree relatives. With the progressively increasing sample sizes used in genetic research, it becomes more likely that such relationships are present in the database under study. In this paper, we extend existing likelihood ratio (LR) methodology to accurately infer the existence of 3/4S, distinguishing them from full siblings and second-degree relatives. We use bootstrap confidence intervals to express uncertainty in the LRs. Our proposal accounts for linkage disequilibrium (LD) by using marker pruning, and we validate our methodology with a pedigree-based simulation study accounting for both LD and recombination. An empirical genome-wide array data set from the GCAT Genomes for Life cohort project is used to illustrate the method. Subject terms: Genetic markers, Population genetics
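
To make the "in-between" position of three-quarter siblings concrete, the expected IBD-sharing probabilities (k0, k1, k2) can be turned into kinship coefficients with the standard identity kinship = k1/4 + k2/2. The sketch below is not taken from the paper; the table name `COTTERMAN` and the function `kinship` are illustrative, and the 3/4S values assume that the two unshared parents are themselves first-degree relatives and that there is no other inbreeding.

```python
from fractions import Fraction as F

# Expected probabilities of sharing 0, 1 or 2 alleles IBD for outbred pairs.
# The 3/4S row assumes the unshared parents are first-degree relatives.
COTTERMAN = {
    "full siblings": (F(1, 4), F(1, 2), F(1, 4)),
    "three-quarter siblings": (F(3, 8), F(1, 2), F(1, 8)),
    "half siblings": (F(1, 2), F(1, 2), F(0)),
}


def kinship(k: tuple) -> F:
    """Kinship coefficient from (k0, k1, k2): k1/4 + k2/2."""
    k0, k1, k2 = k
    if k0 + k1 + k2 != 1:
        raise ValueError("IBD probabilities must sum to 1")
    return k1 / 4 + k2 / 2


if __name__ == "__main__":
    for name, k in COTTERMAN.items():
        print(f"{name}: kinship = {kinship(k)}")
```

Under these assumptions the kinship values are 1/4, 3/16 and 1/8, so 3/4S sit exactly halfway between full and half siblings, which is why genome-wide averages alone separate them poorly and a likelihood-based comparison is attractive.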

5.
In this work we develop a novel algorithm for reconstructing the genomes of ancestral individuals, given genotype or sequence data from contemporary individuals and an extended pedigree of family relationships. A pedigree with complete genomes for every individual enables the study of allele frequency dynamics and haplotype diversity across generations, including deviations from neutrality such as transmission distortion. When studying heritable diseases, ancestral haplotypes can be used to augment genome-wide association studies and track disease inheritance patterns. The building blocks of our reconstruction algorithm are segments of Identity-By-Descent (IBD) shared between two or more genotyped individuals. The method alternates between identifying a source for each IBD segment and assembling IBD segments placed within each ancestral individual. Unlike previous approaches, our method is able to accommodate complex pedigree structures with hundreds of individuals genotyped at millions of SNPs. We apply our method to an Old Order Amish pedigree from Lancaster, Pennsylvania, whose founders came to North America from Europe during the early 18th century. The pedigree includes 1338 individuals from the past 12 generations, 394 with genotype data. The motivation for reconstruction is to understand the genetic basis of diseases segregating in the family through tracking haplotype transmission over time. Using our algorithm thread, we are able to reconstruct an average of 224 ancestral individuals per chromosome. For these ancestral individuals, on average we reconstruct 79% of their haplotypes. We also identify a region on chromosome 16 that is difficult to reconstruct; we find that this region harbors a short Amish-specific copy number variation and the gene HYDIN. thread was developed for endogamous populations, but can be applied to any extensive pedigree with the recent generations genotyped.
We anticipate that this type of practical ancestral reconstruction will become more common and necessary to understand rare and complex heritable diseases in extended families.

6.
Stefanov VT. Genetics, 2000, 156(3): 1403-1410
A methodology is introduced for numerical evaluation, with any given accuracy, of the cumulative probabilities of the proportion of genome shared identical by descent (IBD) on chromosome segments by two individuals in a grandparent-type relationship. Programs are provided in the popular software package Maple for rapidly implementing such evaluations in the cases of grandchild-grandparent and great-grandchild-great-grandparent relationships. Our results can be used to identify chromosomal segments that may contain disease genes. Also, exact P values in significance testing for resemblance of either a grandparent with a grandchild or a great-grandparent with a great-grandchild can be calculated. The genomic continuum model, with Haldane's model for the crossover process, is assumed. This is the model that has been used recently in the genetics literature devoted to IBD calculations. Our methodology is based on viewing the model as a special exponential family and elaborating on recent research results for such families.
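
The quantity evaluated here can be approximated by simulation, which may help readers picture the model. The sketch below is a Monte Carlo stand-in, not the paper's exponential-family method and not its Maple programs. It assumes Haldane's model (crossovers as a Poisson process with rate 1 per Morgan) on a single chromosome segment, and the function names `shared_fraction` and `empirical_cdf` are invented for the example.

```python
import random


def shared_fraction(length_morgans: float, rng: random.Random) -> float:
    """One draw of the fraction of a segment a grandchild shares IBD with one grandparent.

    The transmitted parental gamete alternates between the two grandparental
    haplotypes at each crossover; the shared fraction is the part of the
    segment copied from the chosen grandparent.
    """
    from_grandparent = rng.random() < 0.5  # haplotype at the start of the segment
    position = 0.0
    shared = 0.0
    while True:
        gap = rng.expovariate(1.0)  # distance to the next crossover, in Morgans
        nxt = min(position + gap, length_morgans)
        if from_grandparent:
            shared += nxt - position
        if nxt >= length_morgans:
            break
        position = nxt
        from_grandparent = not from_grandparent
    return min(shared / length_morgans, 1.0)


def empirical_cdf(x: float, length_morgans: float, n: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(shared fraction <= x)."""
    rng = random.Random(seed)
    hits = sum(shared_fraction(length_morgans, rng) <= x for _ in range(n))
    return hits / n


if __name__ == "__main__":
    for x in (0.0, 0.25, 0.5, 0.75):
        print(f"P(fraction <= {x}) ~ {empirical_cdf(x, 1.0):.3f}")
```

The distribution has point masses at 0 and 1, each with probability exp(-L)/2 for a segment of L Morgans (no crossover occurs), which is one reason an exact treatment is more delicate than a smooth density would suggest.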

7.
Homologous long segments along the genomes of close or remote relatives that are identical by descent (IBD) from a common ancestor provide clues for recent events in human genetics. We set out to extensively map such IBD segments in large cohorts and investigate their distribution within and across different populations. We report analysis of several data sets, demonstrating that IBD is more common than expected by naïve models of population genetics. We show that the frequency of IBD pairs is population dependent and can be used to cluster individuals into populations, detect a homogeneous subpopulation within a larger cohort, and infer bottleneck events in such a subpopulation. Specifically, we show that Ashkenazi Jewish individuals are all connected through transitive remote family ties evident by sharing of 50 cM IBD to a publicly available data set of less than 400 individuals. We further expose regions where long-range haplotypes are shared significantly more often than elsewhere in the genome, observed across multiple populations, and enriched for common long structural variation. These are inconsistent with recent relatedness and suggest ancient common ancestry, with limited recombination between haplotypes.

8.
The accurate estimation of the probability of identity by descent (IBD) at loci or genome positions of interest is paramount to the genetic study of quantitative and disease resistance traits. We present a Monte Carlo Markov Chain method to compute IBD probabilities between individuals conditional on DNA markers and on pedigree information. The IBDs can be obtained in a completely general pedigree at any genome position of interest, and all marker and pedigree information available is used. The method can be split into two steps at each iteration. First, phases are sampled using current genotypic configurations of relatives and second, crossover events are simulated conditional on phases. Internal track is kept of all founder origins and crossovers such that the IBD probabilities averaged over replicates are rapidly obtained. We illustrate the method with some examples. First, we show that all pedigree information should be used to obtain line origin probabilities in F2 crosses. Second, the distribution of genetic relationships between half and full sibs is analysed in both simulated data and in real data from an F2 cross in pigs.

9.
Colorectal cancer (CRC) occurs with an increased incidence in individuals with chronic inflammatory bowel disease (IBD) of the colon. Recent data suggest that a family history of colorectal cancer is an independent risk factor for CRC in IBD, an observation that implies that genetic factors are relevant to the development of CRC in this context. Among the genetic defects associated with CRC, the APC I1307K mutation has been detected nearly exclusively in individuals of Ashkenazi Jewish (AJ) origin, occurring in 6%-7% of the AJ general population and in 10%-28% of AJ with either a personal or family history of CRC or adenomatous polyps. These findings, together with the increased incidence of IBD in AJ, prompted the current analysis of the contribution of the APC I1307K variant to CRC in AJ IBD patients. APC I1307K carrier frequencies were determined in 306 AJ individuals affected with IBD and 308 of their unaffected relatives ascertained from a family collection obtained for the identification of IBD susceptibility genes. Prevalence of the I1307K variant was not significantly different among individuals with IBD, Crohn's disease, ulcerative colitis, and unaffected relatives (6.9%, 7.6%, 4.7%, and 6.2%, respectively), and the mutation was detected in only one of five IBD-affected individuals with a diagnosis of CRC. These results reveal that IBD patients of AJ origin carry the APC I1307K variant at the same rate as individuals within the general AJ population. Lack of an increased APC I1307K carrier rate suggests that this mutation does not account for the increased CRC susceptibility associated with IBD.

10.
Inference of relationships from whole-genome genetic data of a cohort is a crucial prerequisite for genome-wide association studies. Typically, relationships are inferred by computing the kinship coefficients (ϕ) and the genome-wide probability of zero IBD sharing (π0) among all pairs of individuals. Current leading methods are based on pairwise comparisons, which may not scale up to very large cohorts (e.g., sample size >1 million). Here, we propose an efficient relationship inference method, RAFFI. RAFFI leverages the efficient RaPID method to call IBD segments first, then estimate the ϕ and π0 from detected IBD segments. This inference is achieved by a data-driven approach that adjusts the estimation based on phasing quality and genotyping quality. Using simulations, we showed that RAFFI is robust against phasing/genotyping errors, admix events, and varying marker densities, and achieves higher accuracy compared to KING, the current leading method, especially for more distant relatives. When applied to the phased UK Biobank data with ~500K individuals, RAFFI is approximately 18 times faster than KING. We expect RAFFI will offer fast and accurate relatedness inference for even larger cohorts.
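
The step of going from detected IBD segments to ϕ and π0 can be illustrated in its simplest form. The sketch below is not RAFFI's estimator: it omits the data-driven adjustment for phasing and genotyping quality that the abstract describes, and it assumes an idealised input of non-overlapping segments already labelled as IBD1 or IBD2. The `Segment` layout and the function name `kinship_and_pi0` are hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical segment record: (start_cM, end_cM, ibd_state), ibd_state in {1, 2}.
Segment = Tuple[float, float, int]


def kinship_and_pi0(segments: Iterable[Segment],
                    genome_length_cm: float) -> Tuple[float, float]:
    """Naive (phi, pi0) from non-overlapping, state-labelled IBD segments."""
    segs = list(segments)
    ibd1 = sum(end - start for start, end, state in segs if state == 1)
    ibd2 = sum(end - start for start, end, state in segs if state == 2)
    k1 = ibd1 / genome_length_cm  # fraction of genome sharing one haplotype IBD
    k2 = ibd2 / genome_length_cm  # fraction of genome sharing both haplotypes IBD
    phi = k1 / 4 + k2 / 2         # kinship coefficient
    pi0 = 1.0 - k1 - k2           # fraction of genome with no IBD sharing
    return phi, pi0


if __name__ == "__main__":
    # Toy pair on a 3400 cM genome: half of it IBD1, a quarter IBD2.
    toy = [(0.0, 1700.0, 1), (1700.0, 2550.0, 2)]
    phi, pi0 = kinship_and_pi0(toy, 3400.0)
    print(f"phi = {phi:.3f}, pi0 = {pi0:.3f}")
```

Reporting π0 alongside ϕ matters because pairs with the same kinship can differ in π0: parent-offspring and full siblings both have ϕ = 1/4, but only parent-offspring pairs have π0 near zero.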

11.
Meirmans PG. Molecular Ecology, 2012, 21(12): 2839-2846
The genetic population structure of many species is characterised by a pattern of isolation by distance (IBD): due to limited dispersal, individuals that are geographically close tend to be genetically more similar than individuals that are far apart. Despite the ubiquity of IBD in nature, many commonly used statistical tests are based on a null model that is completely non-spatial, the Island model. Here, I argue that patterns of spatial autocorrelation deriving from IBD present a problem for such tests as it can severely bias their outcome. I use simulated data to illustrate this problem for two widely used types of tests: tests of hierarchical population structure and the detection of loci under selection. My results show that for both types of tests the presence of IBD can indeed lead to a large number of false positives. I therefore argue that all analyses in a study should take the spatial dependence in the data into account, unless it can be shown that there is no spatial autocorrelation in the allele frequency distribution that is under investigation. Thus, it is urgent to develop additional statistical approaches that are based on a spatially explicit null model instead of the non-spatial Island model.

12.
Genomics, 2020, 112(1): 683-693
Background: Recent studies discovered many genetic variants associated with both psychiatric and inflammatory disorders, but the role of genetic factors in the development of psychiatric comorbidity (PC) in inflammatory bowel disease (IBD) is underexplored. In particular, some of these genetic variants have been linked to the concentrations of circulating cytokines and to symptoms of inflammatory cytokine-associated depression. We analysed genomic features of individuals with IBD by comparing IBD patients with PC with those who have IBD but without PC. We hypothesized that cytokine-related signalling pathways may be significantly associated with psychiatric comorbidity in patients with IBD. Methods: Individuals enrolled in the Manitoba IBD Cohort Study were separated into two groups according to the presence of PC. A sample set comprising 97 IBD individuals with PC (IBD + PC) and 146 IBD individuals without PC (IBD) was first used to identify copy number variations (CNVs) from genome-wide genetic data using three different detection algorithms. The IBD + PC and IBD groups were compared by the number of CNVs overlapping each gene; deletions and duplications were analysed separately. Gene set overrepresentation analysis was then conducted using CNV-overlapped genes and the candidate gene sets of neurological and immunological relevance. Results: The burden of medium-sized CNVs (size between 100 and 500 kilobase pairs) is significantly higher in the IBD + PC group than in the IBD group. Gene-based CNV association analysis did not show significant differences between the two IBD groups. Gene set overrepresentation analysis demonstrated significant enrichment of gene sets related to cytokine signalling pathways among the genes overlapped by deletions in the IBD individuals with PC. Conclusion: Our results confirm the role of cytokine signalling pathways in the development of PC in IBD.
Additionally, our results warrant further study with a larger sample size focusing on cytokine SNPs to further understand the relationship between inflammatory and psychiatric disorders.

13.
Although a few hundred single nucleotide polymorphisms (SNPs) suffice to infer close familial relationships, high density genome-wide SNP data make possible the inference of more distant relationships such as 2nd to 9th cousinships. In order to characterize the relationship between genetic similarity and degree of kinship given a timeframe of 100–300 years, we analyzed the sharing of DNA inferred to be identical by descent (IBD) in a subset of individuals from the 23andMe customer database (n = 22,757) and from the Human Genome Diversity Panel (HGDP-CEPH, n = 952). With data from 121 populations, we show that the average amount of DNA shared IBD in most ethnolinguistically-defined populations, for example Native American groups, Finns and Ashkenazi Jews, differs from continentally-defined populations by several orders of magnitude. Via extensive pedigree-based simulations, we determined bounds for predicted degrees of relationship given the amount of genomic IBD sharing in both endogamous and ‘unrelated’ population samples. Using these bounds as a guide, we detected tens of thousands of 2nd to 9th degree cousin pairs within a heterogeneous set of 5,000 Europeans. The ubiquity of distant relatives, detected via IBD segments, in both ethnolinguistic populations and in large ‘unrelated’ population samples has important implications for genetic genealogy, forensics and genotype/phenotype mapping studies.

14.
Mao Y, Xu S. Heredity, 2005, 94(3): 305-315
Identity-By-Descent (IBD) is a general measurement of the relationship between two groups of genes. If the two groups consist of two homologous genes, one from each individual, the IBD is called the coancestry between the two individuals. Coancestry is an important concept in both population and quantitative genetics. It is the probability that both genes are copies of the same gene in the genealogy. The average coancestry value at a random locus in a population reflects the level of population diversity, effective population size, the level of inbreeding and other attributes. Coancestry is also the building block for the covariance structure used to estimate the additive genetic variance component for a quantitative trait. There are many other types of IBD matrices, depending on the natures of the genes included in each group, and these IBD matrices vary from locus to locus. Molecular markers distributed along the genome provide information that can be used to infer these locus-specific IBD matrices. As a result, we can estimate and test the variance components of a quantitative trait contributed by these loci using the inferred IBD matrices. In this study, we develop the concept of locus-specific epistatic IBD matrices and a Monte Carlo method to infer these IBD matrices. The method is suitable for large pedigrees with arbitrary complexity and various levels of missing marker information. With these locus-specific IBD matrices, we are ready to search for quantitative trait loci along the genome in complicated pedigrees.
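
The pedigree-expected coancestry described at the start of this abstract (the average over the genome, before any marker data are used) can be computed with the classical recursion. This sketch covers only that baseline and not the locus-specific or epistatic IBD matrices the study develops. The pedigree layout and the name `make_coancestry` are illustrative assumptions; the dictionary must list parents before their children.

```python
from functools import lru_cache
from typing import Callable, Dict, Optional, Tuple

# Hypothetical pedigree format: individual -> (father, mother); founders have (None, None).
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def make_coancestry(pedigree: Pedigree) -> Callable[[Optional[str], Optional[str]], float]:
    """Return a function phi(i, j) giving the expected coancestry of i and j."""
    order = {name: idx for idx, name in enumerate(pedigree)}

    @lru_cache(maxsize=None)
    def phi(i: Optional[str], j: Optional[str]) -> float:
        if i is None or j is None:
            return 0.0  # unknown parents are treated as unrelated to everyone
        if i == j:
            father, mother = pedigree[i]
            return 0.5 * (1.0 + phi(father, mother))
        if order[i] < order[j]:
            i, j = j, i  # always recurse through the later-listed individual
        father, mother = pedigree[i]
        return 0.5 * (phi(father, j) + phi(mother, j))

    return phi


if __name__ == "__main__":
    ped: Pedigree = {
        "GF": (None, None),
        "GM": (None, None),
        "S1": ("GF", "GM"),
        "S2": ("GF", "GM"),
        "I": ("S1", "S2"),  # offspring of a full-sib mating
    }
    phi = make_coancestry(ped)
    print("full sibs:", phi("S1", "S2"))
    print("inbred individual with itself:", phi("I", "I"))
```

Recursing through the later-listed individual guarantees that the recursion never steps through a descendant of the other individual, which is what makes the simple averaging rule valid.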

15.
Studies of relatedness have been crucial in molecular ecology over the last decades. Good evidence of this is the fact that studies of population structure, evolution of social behaviours, genetic diversity and quantitative genetics all involve relatedness research. The main aim of this article was to review the most common graphical methods used in allele sharing studies for detecting and identifying family relationships. Both IBS‐ and IBD‐based allele sharing studies are considered. Furthermore, we propose two additional graphical methods from the field of compositional data analysis: the ternary diagram and scatterplots of isometric log‐ratios of IBS and IBD probabilities. We illustrate all graphical tools with genetic data from the HGDP‐CEPH diversity panel, using mainly 377 microsatellites genotyped for 25 individuals from the Maya population of this panel. We enhance all graphics with convex hulls obtained by simulation and use these to confirm the documented relationships. The proposed compositional graphics are shown to be useful in relatedness research, as they also single out the most prominent related pairs. The ternary diagram is advocated for its ability to display all three allele sharing probabilities simultaneously. The log‐ratio plots are advocated as an attempt to overcome the problems with the Euclidean distance interpretation in the classical graphics.
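
The isometric log-ratio (ilr) coordinates mentioned here map a three-part composition such as (p0, p1, p2), the proportions of loci sharing 0, 1 or 2 alleles, to two unconstrained coordinates suitable for an ordinary scatterplot. The sketch below uses one standard orthonormal basis; the abstract does not say which basis the authors chose, so this choice and the function name `ilr` are assumptions.

```python
import math
from typing import Tuple


def ilr(p0: float, p1: float, p2: float) -> Tuple[float, float]:
    """Isometric log-ratio coordinates of a strictly positive 3-part composition.

    z1 contrasts p0 with p1; z2 contrasts their geometric mean with p2.
    """
    if min(p0, p1, p2) <= 0:
        raise ValueError("all parts must be strictly positive")
    z1 = math.sqrt(1 / 2) * math.log(p0 / p1)
    z2 = math.sqrt(2 / 3) * math.log(math.sqrt(p0 * p1) / p2)
    return z1, z2


if __name__ == "__main__":
    # Made-up sharing proportions for two hypothetical pairs of individuals.
    for label, comp in {"pair A": (0.05, 0.45, 0.50), "pair B": (0.25, 0.50, 0.25)}.items():
        z1, z2 = ilr(*comp)
        print(f"{label}: z1 = {z1:.3f}, z2 = {z2:.3f}")
```

Because only ratios enter, the coordinates are unchanged if counts are used instead of proportions. Pairs with a zero part (for example, parent-offspring pairs with no IBS0 loci) need a zero-replacement step before the transform, which this sketch leaves out.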

16.
Misspecified relationships can have serious consequences for linkage studies, resulting in either reduced power or false-positive evidence for linkage. If some individuals in the pedigree are untyped, then Mendelian errors may not be observed. Previous approaches to detection of misspecified relationships by use of genotype data were developed for sib and half-sib pairs. We extend the likelihood calculations of Göring and Ott and Boehnke and Cox to more-general relative pairs, for which identity-by-descent (IBD) status is no longer a Markov chain, and we propose a likelihood-ratio test. We also extend the identity-by-state (IBS)-based test of Ehm and Wagner to nonsib relative pairs. The likelihood-ratio test has high power, but its drawbacks include the need to construct and apply a separate Markov chain for each possible alternative relationship and the need for simulation to assess significance. The IBS-based test is simpler but has lower power. We propose two new test statistics, conditional expected IBD (EIBD) and adjusted IBS (AIBS), designed to retain the simplicity of IBS while increasing power by taking into account chance sharing. In simulations, the power of EIBD is generally close to that of the likelihood-ratio test. The power of AIBS is higher than that of IBS, in all cases considered. We suggest a strategy of initial screening by use of EIBD and AIBS, followed by application of the likelihood-ratio test to only a subset of relative pairs, identified by use of EIBD and AIBS. We apply the methods to a Genetic Analysis Workshop 11 data set from the Collaborative Study on the Genetics of Alcoholism.

17.
One widely used measure of genetic similarity for pairs of relatives is gene identity-by-descent (IBD) sharing. Genes that are copies of a single gene in a common ancestor of the individuals who now carry them are said to be IBD. One obvious extension of the IBD concept is IBD gene(s) shared by more than two individuals. In this paper, I further extend the gene IBD concept to the proportion of genomes shared IBD by every member of a group of relatives. Genome may refer either to the entire autosomal genome or to one or more chromosomal segments or regions with known lengths. Consideration of a genome instead of one or two loci has several advantages. I present a model to describe the crossover process, based on the work of K. P. Donnelly. On the basis of this model, I give a mathematical definition of the proportion of genome shared IBD by relatives, or IBDP for short. Since the distribution of the IBDP is in general very difficult to determine, and since in most applications the mean and variance of the IBDP will suffice, I then provide a method for computing the first two moments of the IBDP. Applications to assessing gene survival, to genetic resemblance between two relatives, and to gene mapping are illustrated with examples. Finally, I discuss the utility of the IBDP in other areas.

18.
Several authors have studied identity by descent (IBD) by way of a continuous recombination process along a chromosome. Despite its potential uses in, for example, gene mapping or delineation of biological relationships there has been no exact algebraic result given for the probability density function of the IBD proportion in any familial relationship. Other authors have derived algebraic approximations in the case of half-sibs by way of the Poisson clumping heuristic and used computational methods to compute the distribution function of the IBD sharing for unilineal relationships. Here we provide a general numerical method for finding the density of IBD sharing that could be applied to any unilineal relationship and more importantly we derive algebraically an expression for the density for a grandparent-grandchild relationship. Initially we assume that recombination events occur at random along a chromosome, then go on to show how the method could be extended to incorporate a form of genetic interference.

19.
Lee SH, Van der Werf JH. Genetics, 2006, 174(2): 1009-1016
Dominance (intralocus allelic interactions) often plays an important role in quantitative trait variation. However, few studies about dominance in QTL mapping have been reported in outbred animal or human populations. This is because common dominance effects can be predicted mainly for many full sibs, which do not often occur in outbred or natural populations with a general pedigree. Moreover, incomplete genotypes for such a pedigree make it infeasible to estimate dominance relationship coefficients between individuals. In this study, identity-by-descent (IBD) coefficients are estimated on the basis of population-wide linkage disequilibrium (LD), which makes it possible to track dominance relationships between unrelated founders. Therefore, it is possible to use dominance effects in QTL mapping without full sibs. Incomplete genotypes with a complex pedigree and many markers can be efficiently dealt with by a Markov chain Monte Carlo method for estimating IBD and dominance relationship matrices (D(RM)). It is shown by simulation that the use of D(RM) increases the likelihood ratio at the true QTL position and the mapping accuracy and power with complete dominance, overdominance, and recessive inheritance modes when using 200 genotyped and phenotyped individuals.

20.

Background  

Pairs of related individuals are widely used in linkage analysis. Most of the tests for linkage analysis are based on statistics associated with identity by descent (IBD) data. The current biotechnology provides data on very densely packed loci, and therefore, it may provide almost continuous IBD data for pairs of closely related individuals. Therefore, the distribution theory for statistics on continuous IBD data is of interest. In particular, distributional results which allow the evaluation of p-values for relevant tests are of importance.
