Similar articles
 20 similar articles found (search time: 15 ms)
1.
Most multipoint linkage programs assume linkage equilibrium among the markers being studied. The assumption is appropriate for the study of sparsely spaced markers with intermarker distances exceeding a few centimorgans, because linkage equilibrium is expected over these intervals for almost all populations. However, with recent advancements in high-throughput genotyping technology, much denser markers are available, and linkage disequilibrium (LD) may exist among the markers. Applying linkage analyses that assume linkage equilibrium to dense markers may lead to bias. Here, we demonstrated that, when some or all of the parental genotypes are missing, assuming linkage equilibrium among tightly linked markers where strong LD exists can cause apparent oversharing of multipoint identity by descent (IBD) between sib pairs and false-positive evidence for multipoint model-free linkage analysis of affected sib pair data. LD can also mimic linkage between a disease locus and multiple tightly linked markers, thus causing false-positive evidence of linkage using parametric models, particularly when heterogeneity LOD score approaches are applied. Bias can be eliminated by inclusion of parental genotype data and can be reduced when additional unaffected siblings are included in the analysis.
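The "oversharing" of IBD mentioned in this abstract can be made concrete with the classical mean test for affected sib pairs. The sketch below is illustrative only and is not taken from the paper: it assumes the number of alleles shared IBD (0, 1 or 2) is known exactly for every pair, in which case the sharing proportion has mean 1/2 and variance 1/8 under no linkage. The function name `mean_ibd_z` is hypothetical.

```python
import math

def mean_ibd_z(ibd_counts):
    """Z statistic of the sib-pair mean test.

    ibd_counts: one value per sib pair, the number of alleles (0, 1 or 2)
    shared identical by descent at the tested position. Assumes IBD is
    observed without error, which real multipoint estimates are not.
    """
    n = len(ibd_counts)
    mean_prop = sum(ibd_counts) / (2 * n)      # mean proportion shared IBD
    # Under no linkage: E[proportion] = 1/2, Var[proportion] = 1/8 per pair.
    return (mean_prop - 0.5) / math.sqrt(1 / (8 * n))
```

A positive Z indicates sharing above the null expectation; spurious oversharing caused by unmodeled LD would push Z upward in the same way that true linkage does.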

2.
Dense SNP maps can be highly informative for linkage studies. But when parental genotypes are missing, multipoint linkage scores can be inflated in regions with substantial marker-marker linkage disequilibrium (LD). Such regions were observed in the Affymetrix SNP genotypes for the Genetic Analysis Workshop 14 (GAW14) Collaborative Study on the Genetics of Alcoholism (COGA) dataset, providing an opportunity to test a novel simulation strategy for studying this problem. First, an inheritance vector (with or without linkage present) is simulated for each replicate, i.e., locations of recombinations and transmission of parental chromosomes are determined for each meiosis. Then, two sets of founder haplotypes are superimposed onto the inheritance vector: one set that is inferred from the actual data and which contains the pattern of LD; and one set created by randomly selecting parental alleles based on the known allele frequencies, with no correlation (LD) between markers. Applying this strategy to a map of 176 SNPs (66 Mb of chromosome 7) for 100 replicates of 116 sibling pairs, significant inflation of multipoint linkage scores was observed in regions of high LD when parental genotypes were set to missing, with no linkage present. Similar inflation was observed in analyses of the COGA data for these affected sib pairs with parental genotypes set to missing, but not after reducing the marker map until r² between any pair of markers was ...
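The two-step strategy in this abstract (simulate the inheritance vector, then superimpose founder haplotypes) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the constant recombination fraction `theta` between adjacent markers, and the 0/1 allele coding are all assumptions made for the example.

```python
import random

def simulate_meiosis(n_markers, theta, rng):
    """One meiosis of the inheritance vector: for each marker, which of the
    parent's two haplotypes (0 or 1) is transmitted. A recombination between
    adjacent markers occurs with probability theta (assumed constant)."""
    origin = rng.randrange(2)
    path = [origin]
    for _ in range(n_markers - 1):
        if rng.random() < theta:
            origin = 1 - origin            # crossover switches the source haplotype
        path.append(origin)
    return path

def transmit(parent_haps, path):
    """Superimpose a parent's two haplotypes onto one simulated meiosis."""
    return [parent_haps[o][m] for m, o in enumerate(path)]

def equilibrium_haplotype(allele_freqs, rng):
    """Founder haplotype with alleles drawn independently per marker (no LD)."""
    return [1 if rng.random() < f else 0 for f in allele_freqs]

rng = random.Random(1)
freqs = [0.3, 0.5, 0.2, 0.4]               # hypothetical allele frequencies
father = [equilibrium_haplotype(freqs, rng) for _ in range(2)]
path = simulate_meiosis(len(freqs), theta=0.01, rng=rng)
child_paternal = transmit(father, path)
```

Because the inheritance vector is drawn first, the same `path` can be reused with a second set of founder haplotypes that carries the observed LD pattern, so any difference in linkage scores between the two sets is attributable to LD alone.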

3.
OBJECTIVES: Describe the inflation in nonparametric multipoint LOD scores due to inter-marker linkage disequilibrium (LD) across many markers with varied allele frequencies. METHOD: Using simulated two-generation families with and without parents, we conducted nonparametric multipoint linkage analysis with 2 to 10 markers with minor allele frequencies (MAF) of 0.5 and 0.1. RESULTS: Misspecification of population haplotype frequencies by assuming linkage equilibrium caused inflated multipoint LOD scores due to inter-marker LD when parental genotypes were not included. Inflation increased as more markers in LD were included and decreased as markers in equilibrium were added. When marker allele frequencies were unequal, the r² measure of LD was a better predictor of inflation than D'. CONCLUSION: This observation strongly supports the evaluation of LD in multipoint linkage analyses, and further suggests that unaccounted for LD may be suspected when two-point and multipoint linkage analyses show a marked disparity in regions with elevated r² measures of LD. Given the increasing popularity of high-density genome-wide SNP screens, inter-marker LD should be a concern in future linkage studies.
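The contrast between r² and D' for markers with unequal allele frequencies can be illustrated with the standard two-locus definitions. The sketch below uses the textbook formulas for biallelic markers; the function name `ld_measures` and the example frequencies are assumptions chosen to mirror the MAF values of 0.5 and 0.1 mentioned in the abstract, not data from the study.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A (marker 1) and B (marker 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    # D' rescales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Allele B (frequency 0.1) occurs only on haplotypes that carry allele A (frequency 0.5).
d, d_prime, r2 = ld_measures(p_ab=0.1, p_a=0.5, p_b=0.1)
```

In this example D' is at its maximum of 1 while r² is only about 0.11, which shows how the two measures can diverge when allele frequencies differ.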

4.
Fan R, Jung J. Human Heredity 2003, 56(4):166-187
This paper proposes variance component models for high resolution joint linkage disequilibrium (LD) and linkage mapping of quantitative trait loci (QTL) based on sibship data; this can include population data if independent individuals are treated as single sibships. One application of these models is late onset complex disease gene mapping, when parental data are not available. The models simultaneously incorporate both LD and linkage information. The LD information is contained in mean coefficients of sibship data. The linkage information is contained in the variance-covariance matrices of trait values for sibships with at least two siblings. We derive formulas for calculating the probability of sharing two trait alleles identical by descent (IBD) for sibpairs in interval mapping of QTL; this is the coefficient of dominant variance of the trait covariance of sibpairs on major QTL. To investigate the performance of the formulas, we calculate the numerical values via the formulas and get satisfactory approximations. We compare the power and sample sizes for both LD and linkage mapping. By simulation and theoretical analysis, we compare the results with those of Fulker and Abecasis "AbAw" approach. It is well known that the resolution of linkage analysis can be low for complex disease gene mapping. LD mapping, on the other hand, can increase mapping precision and is useful in high resolution mapping. Linkage analysis is less sensitive to population subdivisions and admixtures. The level of LD is sensitive to population stratification which may easily lead to spurious association. Performing a joint analysis of LD and linkage mapping can help to overcome the limits of both approaches. Moreover, the advantages of the two complementary strategies can be utilized maximally. In practice, linkage analysis may be performed using pedigree data to identify suggestive linkage between markers and trait loci based on a sparse marker map. 
In the presence of linkage, joint LD and linkage mapping can be carried out to do fine gene mapping based on a dense genetic map using both pedigree and population data. Population and pedigree data of any type can be combined to perform a joint analysis of high resolution LD and linkage mapping of QTL by generalizing the method.

5.
Single-nucleotide polymorphisms (SNPs) are rapidly replacing microsatellites as the markers of choice for genetic linkage studies and many other studies of human pedigrees. Here, we describe an efficient approach for modeling linkage disequilibrium (LD) between markers during multipoint analysis of human pedigrees. Using a gene-counting algorithm suitable for pedigree data, our approach enables rapid estimation of allele and haplotype frequencies within clusters of tightly linked markers. In addition, with the use of a hidden Markov model, our approach allows for multipoint pedigree analysis with large numbers of SNP markers organized into clusters of markers in LD. Simulation results show that our approach resolves previously described biases in multipoint linkage analysis with SNPs that are in LD. An updated version of the freely available Merlin software package uses the approach described here to perform many common pedigree analyses, including haplotyping and haplotype frequency estimation, parametric and nonparametric multipoint linkage analysis of discrete traits, variance-components and regression-based analysis of quantitative traits, calculation of identity-by-descent or kinship coefficients, and case selection for follow-up association studies. To illustrate the possibilities, we examine a data set that provides evidence of linkage of psoriasis to chromosome 17.
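Gene counting is an EM procedure, and its simplest form can be sketched for two biallelic SNPs genotyped in unrelated individuals. This is a deliberately reduced illustration, not the pedigree-aware algorithm described in the abstract and not Merlin's code: the function name, the genotype coding (count of allele 1 at each SNP) and the restriction to unrelated individuals are all assumptions.

```python
def em_haplotype_freqs(genotypes, n_iter=100, tol=1e-10):
    """Estimate two-SNP haplotype frequencies by gene counting (EM).

    genotypes: list of (g1, g2), each the count (0, 1, 2) of allele 1 at
    SNP 1 and SNP 2 in one unrelated individual.
    Returns a dict mapping (allele at SNP 1, allele at SNP 2) to frequency.
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    alleles = {0: [0, 0], 1: [0, 1], 2: [1, 1]}
    base = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1              # phase is ambiguous, handled in the E-step
            continue
        for hap in zip(alleles[g1], alleles[g2]):
            base[hap] += 1                 # phase is certain, count both haplotypes
    freqs = {h: 0.25 for h in base}
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote carries 00/11 rather than 01/10.
        cis = freqs[(0, 0)] * freqs[(1, 1)]
        trans = freqs[(0, 1)] * freqs[(1, 0)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        counts = dict(base)
        for hap in ((0, 0), (1, 1)):
            counts[hap] += n_double_het * w
        for hap in ((0, 1), (1, 0)):
            counts[hap] += n_double_het * (1 - w)
        # M-step: expected haplotype counts divided by the number of chromosomes.
        new = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        delta = max(abs(new[h] - freqs[h]) for h in new)
        freqs = new
        if delta < tol:
            break
    return freqs
```

Haplotype frequencies estimated this way can replace the products of allele frequencies that a linkage-equilibrium analysis would otherwise assume within a marker cluster.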

6.
Li B, Leal SM. Human Heredity 2008, 65(4):199-208
Missing genotype data can increase false-positive evidence for linkage when either parametric or nonparametric analysis is carried out ignoring intermarker linkage disequilibrium (LD). Previously it was demonstrated by Huang et al. [1] that no bias occurs in this situation for affected sib-pairs with unrelated parents when either both parents are genotyped or genotype data is available for two additional unaffected siblings when parental genotypes are missing. However, this is not the case for autosomal recessive consanguineous pedigrees, where missing genotype data for any pedigree member within a consanguinity loop can increase false-positive evidence of linkage. False-positive evidence for linkage is further increased when cryptic consanguinity is present. The amount of false-positive evidence for linkage, and which family members aid in its reduction, is highly dependent on which family members are genotyped. When parental genotype data is available, the false-positive evidence for linkage is usually not as strong as when parental genotype data is unavailable. For a pedigree with an affected proband whose first-cousin parents have been genotyped, further reduction in the false-positive evidence of linkage can be obtained by including genotype data from additional affected siblings of the proband or genotype data from the proband's sibling-grandparents. When parental genotypes are unavailable, false-positive evidence for linkage can be reduced by including genotype data from either unaffected siblings of the proband or the proband's married-in grandparents in the analysis.

7.
Linkage disequilibrium (LD) in crops, established by domestication and early breeding, can be a valuable basis for mapping the genome. We undertook an assessment of LD in sugarcane (Saccharum spp), characterized by one of the most complex crop genomes, with its high ploidy level (≥8) and chromosome number (>100) as well as its interspecific origin. Using AFLP markers, we surveyed 1,537 polymorphisms among 72 modern sugarcane cultivars. We exploited information from available genetic maps to determine a relevant statistical threshold that discriminates marker associations due to linkage from other associations. LD is very common among closely linked markers and steadily decreases within a 0-30 cM window. Many instances of linked markers cannot be recognized due to the confounding effect of polyploidy. However, LD within a sample of cultivars appears as efficient as linkage analysis within a controlled progeny in terms of assigning markers to cosegregation groups. Saturating the genome coverage remains a challenge, but applying LD-based mapping within breeding programs will considerably speed up the localization of genes controlling important traits by making use of phenotypic information produced in the course of selection.

8.
Lee SH, Van der Werf JH. Genetics 2006, 173(4):2329-2337
Within a small region (e.g., <10 cM), there can be multiple quantitative trait loci (QTL) underlying phenotypes of a trait. Simultaneous fine mapping of closely linked QTL needs an efficient tool to remove confounded shade effects among QTL within such a small region. We propose a variance component method using combined linkage disequilibrium (LD) and linkage information and a reversible jump Markov chain Monte Carlo (MCMC) sampling for model selection. QTL identity-by-descent (IBD) coefficients between individuals are estimated by a hybrid MCMC combining the random walk and the meiosis Gibbs sampler. These coefficients are used in a mixed linear model and an empirical Bayesian procedure combines residual maximum likelihood (REML) to estimate QTL effects and a reversible jump MCMC that samples the number of QTL and the posterior QTL intensities across the tested region. Note that two MCMC processes are used, i.e., an (internal) MCMC for IBD estimation and an (external) MCMC for model selection. In a simulation study, the use of the multiple-QTL model clearly removes the shade effects between three closely linked QTL located at 1.125, 3.875, and 7.875 cM across the region of 10 cM, using 40 markers at 0.25-cM intervals. It is shown that the use of combined LD and linkage information gives much more useful information compared to using linkage information alone for both single- and multiple-QTL analyses. When using a lower marker density (11 markers at 1-cM intervals), the signal of the second QTL can disappear. Extreme values of past effective size (resulting in extreme levels of LD) decrease the mapping accuracy.

9.
In studies of complex diseases, a common paradigm is to conduct association analysis at markers in regions identified by linkage analysis, to attempt to narrow the region of interest. Family-based tests for association based on parental transmissions to affected offspring are often used in fine-mapping studies. However, for diseases with late onset, parental genotypes are often missing. Without parental genotypes, family-based tests either compare allele frequencies in affected individuals with those in their unaffected siblings or use siblings to infer missing parental genotypes. An example of the latter approach is the score test implemented in the computer program TRANSMIT. The inference of missing parental genotypes in TRANSMIT assumes that transmissions from parents to affected siblings are independent, which is appropriate when there is no linkage. However, using computer simulations, we show that, when the marker and disease locus are linked and the data set consists of families with multiple affected siblings, this assumption leads to a bias in the score statistic under the null hypothesis of no association between the marker and disease alleles. This bias leads to an inflated type I error rate for the score test in regions of linkage. We present a novel test for association in the presence of linkage (APL) that correctly infers missing parental genotypes in regions of linkage by estimating identity-by-descent parameters, to adjust for correlation between parental transmissions to affected siblings. In simulated data, we demonstrate the validity of the APL test under the null hypothesis of no association and show that the test can be more powerful than the pedigree disequilibrium test and family-based association test. As an example, we compare the performance of the tests in a candidate-gene study in families with Parkinson disease.

10.
Selective genotyping of extreme progeny is a powerful method to increase the information content per individual when looking for quantitative trait loci (QTLs) using molecular markers for which a map is known. However, if marker information from the selected individuals is used to construct the map of the markers, this can lead to distorted segregation of the markers that in turn can lead to the estimation of a spurious linkage between independently inherited markers. The mistaken estimation of linkage between independently inherited markers will occur when there are two (or more) independently inherited QTLs linked to two (or more) markers and the same individuals are used to estimate the map of the markers and to do the QTL estimation. The incorrect linkage occurs because in selecting individuals from the tails of the phenotypic distribution we will also be selecting certain combinations of the markers instead of obtaining a random sample of the true distribution of the marker genotypes. Analytical results are outlined and the analyses of a simulated data set illustrate the problems that could arise when data from individuals chosen by selective genotyping are incorrectly employed to construct a marker map. A strategy is proposed to remedy this problem.

11.
Genotype data from the Illumina Linkage III SNP panel (n = 4,720 SNPs) and the Affymetrix 10 k mapping array (n = 11,120 SNPs) were used to test the effects of linkage disequilibrium (LD) between SNPs in a linkage analysis in the Collaborative Study on the Genetics of Alcoholism pedigree collection (143 pedigrees; 1,614 individuals). The average r² between adjacent markers across the genetic map was 0.099 ± 0.003 in the Illumina III panel and 0.17 ± 0.003 in the Affymetrix 10 k array. In order to determine the effect of LD between marker loci in a nonparametric multipoint linkage analysis, markers in strong LD with another marker (r² > 0.40) were removed (n = 471 loci in the Illumina panel; n = 1,804 loci in the Affymetrix panel) and the linkage analysis results were compared to the results using the entire marker sets. In all analyses using the ALDX1 phenotype, 8 linkage regions on 5 chromosomes (2, 7, 10, 11, X) were detected (peak markers p < 0.01), and the Illumina panel detected an additional region on chromosome 6. Analysis of the same pedigree set and ALDX1 phenotype using short tandem repeat markers (STRs) resulted in 3 linkage regions on 3 chromosomes (peak markers p < 0.01). These results suggest that in this pedigree set, LD between loci with spacing similar to the SNP panels tested may not significantly affect the overall detection of linkage regions in a genome scan. Moreover, since the data quality and information content are greatly improved in the SNP panels over STR genotyping methods, new linkage regions may be identified due to higher information content and data quality in a dense SNP linkage panel.
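Removing markers in strong LD with another marker can be sketched as a greedy pruning pass over a pairwise r² matrix. The abstract does not say which marker of a correlated pair was dropped, so the rule used here (walk the markers in map order and keep one only if it is not above the threshold with any marker already kept) is an assumption, as is the function name `prune_by_r2`.

```python
def prune_by_r2(r2, threshold=0.40):
    """Greedy LD pruning.

    r2: square symmetric matrix (list of lists) of pairwise r^2 values,
        with markers listed in map order.
    Returns the indices of the markers that are kept.
    """
    kept = []
    for j in range(len(r2)):
        # Keep marker j only if it is not in strong LD with any marker kept so far.
        if all(r2[j][k] <= threshold for k in kept):
            kept.append(j)
    return kept

# Hypothetical three-marker example: markers 0 and 1 are in strong LD.
example = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.5],
    [0.1, 0.5, 1.0],
]
kept = prune_by_r2(example)
```

In the example, marker 1 is dropped because of its r² of 0.9 with marker 0, and marker 2 is then kept because its only remaining comparison (0.1 with marker 0) is below the threshold. A different visiting order could retain a different subset, which is one reason pruning results depend on the rule chosen.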

12.
OBJECTIVES: Linkage disequilibrium (LD) between closely spaced SNPs can be accommodated in linkage analysis by specifying the multi-SNP haplotype frequencies, if known. Phased haplotypes in candidate regions can provide gold standard haplotype frequency estimates, and may be of inherent interest as markers. We evaluated the effects of different methods of haplotype frequency estimation, and the use of marker phase information, on linkage analysis of a multi-SNP cluster in a candidate region for Alzheimer's disease (AD). METHODS: We performed parametric linkage analysis of a five-SNP cluster in extended pedigrees to compare the use of: (1) haplotype frequencies estimated by molecular phase determination, maximum likelihood estimation, or by assuming linkage equilibrium (LE); (2) AD families or controls as the frequency source; and (3) unphased or molecularly phased SNP data. RESULTS: There was moderate to strong pairwise LD among the five SNPs. Falsely assuming LE substantially inflated the LOD score, but the method of haplotype frequency estimation and particular sample used made little difference provided that LD was accommodated. Use of phased haplotypes produced a modest increase in the LOD score over unphased SNPs. CONCLUSIONS: Ignoring LD between markers can lead to substantially inflated evidence for linkage in LOD score analysis of extended pedigrees with missing data. Use of marker phase information in linkage analysis may be important in disease studies where the costs of family recruitment and phenotyping greatly exceed the costs of phase determination.

13.
Effectiveness of marker-assisted selection (MAS) and quantitative trait locus (QTL) mapping using population-wide linkage disequilibrium (LD) between markers and QTLs depends on the extent of LD and how it declines with distance between markers and QTLs in a population. Marker-QTL LD can be predicted from LD between markers. Our previous work evaluated LD measures between multi-allelic markers as predictors of usable LD of multi-allelic markers with QTLs. Since single nucleotide polymorphisms (SNPs) are the current marker of choice for high-density genotyping and LD-mapping of QTLs, the objective of this study was to use LD between multi-allelic markers to predict LD among biallelic SNPs or between SNPs and QTLs. Observable LD between multi-allelic markers was evaluated using nine measures. These included two pooled and standardized measures of LD between pairs of alleles at two markers based on Lewontin's LD measure, two pooled measures of squared correlations between alleles, one standardized measure using Hardy-Weinberg heterozygosities, and four measures based on the chi-square statistic for testing for association between alleles at two loci. The standardized chi-square measure that best predicted usable LD between multi-allelic markers and QTLs, based on our previous work, overestimated usable SNP-SNP or SNP-QTL LD. Instead, three other measures were found to be good predictors of usable SNP-SNP or SNP-QTL LD when LD is generated by drift. Therefore, the LD measure between multi-allelic markers that is best for predicting usable LD in a population depends on the type of markers (i.e. multi-allelic or biallelic) that will eventually be used for QTL mapping or MAS.

14.
Li Y, Li Y, Wu S, Han K, Wang Z, Hou W, Zeng Y, Wu R. Genetics 2007, 176(3):1811-1821
Analysis of population structure and organization with DNA-based markers can provide important information regarding the history and evolution of a species. Linkage disequilibrium (LD) analysis based on allelic associations between different loci is emerging as a viable tool to unravel the genetic basis of population differentiation. In this article, we derive the EM algorithm to obtain the maximum-likelihood estimates of the linkage disequilibria between dominant markers, to study the patterns of genetic diversity for a diploid species. The algorithm was expanded to estimate and test linkage disequilibria of different orders among three dominant markers and can be technically extended to manipulate an arbitrary number of dominant markers. The feasibility of the proposed algorithm is validated by an example of population genetic studies of hickory trees, native to southeastern China, using dominant random amplified polymorphic DNA markers. Extensive simulation studies were performed to investigate the statistical properties of this algorithm. The precision of the estimates of linkage disequilibrium between dominant markers was compared with that between codominant markers. Results from simulation studies suggest that three-locus LD analysis displays increased power of LD detection relative to two-locus LD analysis. This algorithm is useful for studying the pattern and amount of genetic variation within and among populations.

15.
Genomewide linkage studies are tending toward the use of single-nucleotide polymorphisms (SNPs) as the markers of choice. However, linkage disequilibrium (LD) between tightly linked SNPs violates the fundamental assumption of linkage equilibrium (LE) between markers that underlies most multipoint calculation algorithms currently available, and this leads to inflated affected-relative-pair allele-sharing statistics when founders' multilocus genotypes are unknown. In this study, we investigate the impact that the degree of LD, marker allele frequency, and association type have on estimating the probabilities of sharing alleles identical by descent in multipoint calculations and hence on type I error rates of different sib-pair linkage approaches that assume LE. We show that marker-marker LD does not inflate type I error rates of affected sib pair (ASP) statistics in the whole parameter space, and that, in any case, discordant sib pairs (DSPs) can be used to control for marker-marker LD in ASPs. We advocate the ASP/DSP design with appropriate sib-pair statistics that test the difference in allele sharing between ASPs and DSPs.

16.
The level of population structure and the extent of linkage disequilibrium (LD) can have large impacts on the power, resolution, and design of genome-wide association studies (GWAS) in plants. Until recently, the topics of LD and population structure have not been explored in oat due to the lack of a high-throughput, high-density marker system. The objectives of this research were to survey the level of population structure and the extent of LD in oat germplasm and determine their implications for GWAS. In total, 1,205 lines and 402 diversity array technology (DArT) markers were used to explore population structure. Principal component analysis and model-based cluster analysis of these data indicated that, for the lines used in this study, relatively weak population structure exists. To explore LD decay, map distances of 2,225 linked DArT marker pairs were compared with LD (estimated as r²). Results showed that LD between linked markers decayed rapidly to r² = 0.2 for marker pairs with a map distance of 1.0 centimorgan (cM). For GWAS, we suggest a minimum of one marker every cM, but higher densities of markers should increase marker-QTL association and therefore detection power. Additionally, it was found that LD was relatively consistent across the majority of germplasm clusters. These findings suggest that GWAS in oat can include germplasm with diverse origins and backgrounds. The results from this research demonstrate the feasibility of GWAS and related analyses in oat.

17.
Current genome-wide linkage-mapping single-nucleotide polymorphism (SNP) panels with densities of 0.3 cM are likely to have increased intermarker linkage disequilibrium (LD) compared to 5-cM microsatellite panels. The resulting difference in haplotype frequencies versus that predicted may affect multipoint linkage analysis with ungenotyped founders; a common haplotype may be assumed to be rare, leading to inflation of identical-by-descent (IBD) allele-sharing estimates and evidence for linkage. Using data simulated for the Genetic Analysis Workshop 14, we assessed bias in allele-sharing measures and nonparametric linkage (NPL all) and Kong and Cox LOD (KC-LOD) scores in a targeted analysis of regions with and without LD and with and without genes. Using over 100 replicates, we found that if founders were not genotyped, multipoint IBD estimates and delta parameters were modestly inflated and NPL all and KC-LOD scores were biased upwards in the region with LD and no gene; rather than centering on the null, the mean NPL all and KC-LOD scores were 0.51 ± 0.91 and 0.19 ± 0.38, respectively. Reduction of LD by dropping markers reduced this upward bias. These trends were not seen in the non-LD region with no gene. In regions with genes (with and without LD), a slight loss in power with dropping markers was suggested. These results indicate that LD should be considered in dense scans; removal of markers in LD may reduce false-positive results although information may also be lost. Methods to address LD in a high-throughput manner are needed for efficient, robust genomic scans with dense SNPs.

18.
Substantial increases of linkage disequilibrium (LD) both in magnitude and in range have been observed in recently admixed populations such as African-Americans (AfA). On the other hand, it has also been shown that LD in AfAs was very similar to that of Africans. In this study, we attempted to resolve these contradicting observations by conducting a systematic examination of the LD structure in AfAs by genotyping a sample of AfA individuals at 24,341 single nucleotide polymorphisms (SNPs) spanning almost the entire chromosome 21, with an average density of 1.5 kb/SNP. The overall LD in AfAs is similar to that in African populations and much less than that in European populations. Even when the ancestry-informative markers (AIMs) were used, extended LD in AfAs was found to be limited to a certain magnitude range (0.2 ≤ r² ≤ 0.8) and a certain distance range, that is, between-marker distances of more than 200 kb. Furthermore, the inclusion of AfA individuals with predominant African ancestry was found to reduce the overall magnitude of LD. Elevation of LD in the AfA population, compared with its parental populations, can only be observed at markers with large allele frequency differences between the 2 parental populations, and only in limited scenarios. AfA individuals of wholly African ancestry contribute little to the extended LD in the AfA population, and further genotyping or association analysis conducted using only admixed individuals may lead to higher statistical power and possibly reduced cost.

19.
Characterizing the spatial patterns of genetic diversity in human populations has a wide range of applications, from detecting genetic mutations associated with disease to inferring human history. Current approaches, including the widely used principal-component analysis, are not suited for the analysis of linked markers, and local and long-range linkage disequilibrium (LD) can dramatically reduce the accuracy of spatial localization when unaccounted for. To overcome this, we have introduced an approach that performs spatial localization of individuals on the basis of their genetic data and explicitly models LD among markers by using a multivariate normal distribution. By leveraging external reference panels, we derive closed-form solutions to the optimization procedure to achieve a computationally efficient method that can handle large data sets. We validate the method on empirical data from a large sample of European individuals from the POPRES data set, as well as on a large sample of individuals of Spanish ancestry. First, we show that by modeling LD, we achieve accuracy superior to that of existing methods. Importantly, whereas other methods show decreased performance when dense marker panels are used in the inference, our approach improves in accuracy as more markers become available. Second, we show that accurate localization of genetic data can be achieved with only a part of the genome, and this could potentially enable the spatial localization of admixed samples that have a fraction of their genome originating from a given continent. Finally, we demonstrate that our approach is resistant to distortions resulting from long-range LD regions; such distortions can dramatically bias the results when unaccounted for.

20.
We have compared the efficiency of the lod score test which assumes heterogeneity (lod2) to the standard lod score test which assumes homogeneity (lod1) when three-point linkage analysis is used in successive map intervals. If it is assumed that a gene located midway between two linked marker loci is responsible for a proportion of disease cases, then the lod1 test loses power relative to the lod2 test, as the proportion of linked families decreases, as the flanking markers are more closely linked, and as more map intervals are tested. Moreover, when multipoint analysis is used, linkage for a disease gene is more likely to be incorrectly excluded from a complete and dense linkage map if true genetic heterogeneity is ignored. We thus conclude that, in general, the lod2 linkage test is more efficient for detecting a true linkage when a complete genetic marker map is screened for a heterogeneous disorder.
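The difference between the two tests can be sketched with the widely used admixture formulation of the heterogeneity lod score, in which a proportion alpha of families is linked. Whether this matches the exact lod2 definition used in the paper is an assumption; the function names are hypothetical, and the sketch works at a single fixed map position, whereas a full analysis would also maximize over position.

```python
import math

def hlod(family_lods, alpha):
    """Heterogeneity lod score at one map position.

    family_lods: per-family lod scores at that position
    alpha: assumed proportion of linked families (0 <= alpha <= 1)
    """
    return sum(math.log10(alpha * 10 ** lod + (1 - alpha)) for lod in family_lods)

def max_hlod(family_lods, steps=100):
    """Maximize hlod over a grid of alpha values; returns (hlod, alpha)."""
    return max((hlod(family_lods, a / steps), a / steps) for a in range(steps + 1))

# Hypothetical mixture: two families support linkage, two argue against it.
lods = [1.5, 1.2, -2.0, -1.8]
lod1 = sum(lods)                  # homogeneity lod score (alpha fixed at 1)
lod2, alpha_hat = max_hlod(lods)
```

With alpha fixed at 1 the heterogeneity score reduces to the ordinary summed lod score, so the maximized value can never be lower than lod1 or lower than zero. In the example the unlinked families drag lod1 below zero while lod2 stays positive, which is the power advantage the abstract describes, and the same flexibility is why unmodeled LD can inflate heterogeneity lod scores.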
