Similar literature
1.
Genotype data from the Illumina Linkage III SNP panel (n = 4,720 SNPs) and the Affymetrix 10 k mapping array (n = 11,120 SNPs) were used to test the effects of linkage disequilibrium (LD) between SNPs in a linkage analysis in the Collaborative Study on the Genetics of Alcoholism pedigree collection (143 pedigrees; 1,614 individuals). The average r2 between adjacent markers across the genetic map was 0.099 +/- 0.003 in the Illumina III panel and 0.17 +/- 0.003 in the Affymetrix 10 k array. In order to determine the effect of LD between marker loci in a nonparametric multipoint linkage analysis, markers in strong LD with another marker (r2 > 0.40) were removed (n = 471 loci in the Illumina panel; n = 1,804 loci in the Affymetrix panel) and the linkage analysis results were compared to the results using the entire marker sets. In all analyses using the ALDX1 phenotype, 8 linkage regions on 5 chromosomes (2, 7, 10, 11, X) were detected (peak markers p < 0.01), and the Illumina panel detected an additional region on chromosome 6. Analysis of the same pedigree set and ALDX1 phenotype using short tandem repeat markers (STRs) resulted in 3 linkage regions on 3 chromosomes (peak markers p < 0.01). These results suggest that in this pedigree set, LD between loci with spacing similar to the SNP panels tested may not significantly affect the overall detection of linkage regions in a genome scan. Moreover, since the data quality and information content are greatly improved in the SNP panels over STR genotyping methods, new linkage regions may be identified due to higher information content and data quality in a dense SNP linkage panel.
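
The pruning step in this abstract (dropping markers in strong LD with another marker, r2 > 0.40) can be illustrated with a minimal Python sketch. It is not the authors' procedure: it assumes genotypes coded as allele counts (0/1/2), uses the squared genotype correlation as a stand-in for r2, and prunes greedily from left to right. The names `genotype_r2` and `prune_by_r2` and the toy data are hypothetical.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two markers' allele counts (0/1/2)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker carries no LD information
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def prune_by_r2(genotypes, threshold=0.40):
    """Keep a marker only if its r2 with every already-kept marker is <= threshold.

    genotypes: list of markers, each a list of allele counts per individual.
    Returns the indices of the retained markers.
    """
    kept = []
    for idx, marker in enumerate(genotypes):
        if all(genotype_r2(marker, genotypes[k]) <= threshold for k in kept):
            kept.append(idx)
    return kept


# Toy data (assumed for illustration): marker 1 duplicates marker 0,
# marker 2 is uncorrelated with marker 0.
markers = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],
    [1, 0, 1, 2, 1, 1],
]
print(prune_by_r2(markers))  # prints [0, 2]
```

Comparing every candidate against all retained markers is quadratic in the number of markers; on a genome-wide panel one would more plausibly restrict the comparison to markers within a physical or genetic window.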

2.
Dense SNP maps can be highly informative for linkage studies. But when parental genotypes are missing, multipoint linkage scores can be inflated in regions with substantial marker-marker linkage disequilibrium (LD). Such regions were observed in the Affymetrix SNP genotypes for the Genetic Analysis Workshop 14 (GAW14) Collaborative Study on the Genetics of Alcoholism (COGA) dataset, providing an opportunity to test a novel simulation strategy for studying this problem. First, an inheritance vector (with or without linkage present) is simulated for each replicate, i.e., locations of recombinations and transmission of parental chromosomes are determined for each meiosis. Then, two sets of founder haplotypes are superimposed onto the inheritance vector: one set that is inferred from the actual data and which contains the pattern of LD; and one set created by randomly selecting parental alleles based on the known allele frequencies, with no correlation (LD) between markers. Applying this strategy to a map of 176 SNPs (66 Mb of chromosome 7) for 100 replicates of 116 sibling pairs, significant inflation of multipoint linkage scores was observed in regions of high LD when parental genotypes were set to missing, with no linkage present. Similar inflation was observed in analyses of the COGA data for these affected sib pairs with parental genotypes set to missing, but not after reducing the marker map until r2 between any pair of markers was …

3.
Both theoretical and applied studies have shown that single nucleotide polymorphism (SNP) markers are more powerful and cost-effective in linkage analysis than current microsatellite marker assays. Here we performed a whole-genome scan on 115 White, non-Hispanic families segregating for alcohol dependence, using one 10.3-cM microsatellite marker set and two SNP data sets (0.33-cM, 0.78-cM spacing). Two definitions of alcohol dependence (ALDX1 and ALDX2) were used. Our multipoint nonparametric linkage analysis found that alcoholism was nominally linked to 12 genomic regions. The linkage peaks obtained by using the microsatellite marker set and the two SNP sets had a high degree of correspondence in general, but the microsatellite marker set was insufficient to detect some nominal linkage peaks. The presence of linkage disequilibrium between markers did not significantly affect the results. Across the entire genome, the SNP datasets had a much higher average linkage information content (0.33 cM: 0.93, 0.78 cM: 0.91) than did the microsatellite marker set (0.57). The linkage peaks obtained through the two SNP datasets were very similar, with some minor differences. We conclude that genome-wide linkage analysis using approximately 5,000 SNP markers evenly distributed across the human genome is sufficient and might be more powerful than current 10-cM microsatellite marker assays.

4.
Once genetic linkage has been identified for a complex disease, the next step is often association analysis, in which single-nucleotide polymorphisms (SNPs) within the linkage region are genotyped and tested for association with the disease. If a SNP shows evidence of association, it is useful to know whether the linkage result can be explained, in part or in full, by the candidate SNP. We propose a novel approach that quantifies the degree of linkage disequilibrium (LD) between the candidate SNP and the putative disease locus through joint modeling of linkage and association. We describe a simple likelihood of the marker data conditional on the trait data for a sample of affected sib pairs, with disease penetrances and disease-SNP haplotype frequencies as parameters. We estimate model parameters by maximum likelihood and propose two likelihood-ratio tests to characterize the relationship of the candidate SNP and the disease locus. The first test assesses whether the candidate SNP and the disease locus are in linkage equilibrium so that the SNP plays no causal role in the linkage signal. The second test assesses whether the candidate SNP and the disease locus are in complete LD so that the SNP or a marker in complete LD with it may account fully for the linkage signal. Our method also yields a genetic model that includes parameter estimates for disease-SNP haplotype frequencies and the degree of disease-SNP LD. Our method provides a new tool for detecting linkage and association and can be extended to study designs that include unaffected family members.

5.
The genotyping of closely spaced single-nucleotide polymorphism (SNP) markers frequently yields highly correlated data, owing to extensive linkage disequilibrium (LD) between markers. The extent of LD varies widely across the genome and drives the number of frequent haplotypes observed in small regions. Several studies have illustrated the possibility that LD or haplotype data could be used to select a subset of SNPs that optimize the information retained in a genomic region while reducing the genotyping effort and simplifying the analysis. We propose a method based on the spectral decomposition of the matrices of pairwise LD between markers, and we select markers on the basis of their contributions to the total genetic variation. We also modify Clayton's "haplotype tagging SNP" selection method, which utilizes haplotype information. For both methods, we propose sliding window-based algorithms that allow the methods to be applied to large chromosomal regions. Our procedures require genotype information about a small number of individuals for an initial set of SNPs and selection of an optimum subset of SNPs that could be efficiently genotyped on larger numbers of samples while retaining most of the genetic variation in samples. We identify suitable parameter combinations for the procedures, and we show that a sample size of 50-100 individuals achieves consistent results in studies of simulated data sets in linkage equilibrium and LD. When applied to experimental data sets, both procedures were similarly effective at reducing the genotyping requirement while maintaining the genetic information content throughout the regions. We also show that haplotype-association results that Hosking et al. obtained near CYP2D6 were almost identical before and after marker selection.

6.
Population-based mapping approaches are attractive for tracing the genetic background to phenotypic traits in wild species, given that it is often difficult to gather the extensive and well-defined pedigrees needed for quantitative trait locus analysis. However, the feasibility of association or hitch-hiking mapping is dependent on the degree of linkage disequilibrium (LD) in the population, on which there is as yet limited information for wild species. Here we use single nucleotide polymorphism (SNP) markers from 23 genes in a recently established linkage map of the Z chromosome of the collared flycatcher, to study the extent of LD in a natural bird population. In most but not all cases we find SNPs within the same intron (less than 500 bp) to be in perfect LD. However, LD then decays to background level at a distance of 1 cM or 400-500 kb. Although LD seems more extensive than in other species, if the observed pattern is representative for other regions of the genome and turns out to be a general feature of natural bird populations, dense marker maps might be needed for genome scans aimed at identifying association between marker and trait loci.

7.
Genome-wide linkage analysis using microsatellite markers has been successful in the identification of numerous Mendelian and complex disease loci. The recent availability of high-density single-nucleotide polymorphism (SNP) maps provides a potentially more powerful option. Using the simulated and Collaborative Study on the Genetics of Alcoholism (COGA) datasets from the Genetic Analysis Workshop 14 (GAW14), we examined how altering the density of SNP marker sets impacted the overall information content, the power to detect trait loci, and the number of false positive results. For the simulated data we used SNP maps with density of 0.3 cM, 1 cM, 2 cM, and 3 cM. For the COGA data we combined the marker sets from Illumina and Affymetrix to create a map with average density of 0.25 cM and then, using a sub-sample of these markers, created maps with density of 0.3 cM, 0.6 cM, 1 cM, 2 cM, and 3 cM. For each marker set, multipoint linkage analysis using MERLIN was performed for both dominant and recessive traits derived from marker loci. Our results showed that information content increased with increased map density. For the homogeneous, completely penetrant traits we created, there was only a modest difference in ability to detect trait loci. Additionally, as map density increased there was only a slight increase in the number of false positive results when there was linkage disequilibrium (LD) between markers. The presence of LD between markers may have led to an increased number of false positive regions but no clear relationship between regions of high LD and locations of false positive linkage signals was observed.

8.
Linkage disequilibrium (LD) refers to the correlation among neighboring alleles, reflecting non-random patterns of association between alleles at (nearby) loci. A better understanding of LD in the porcine genome is of direct relevance for identification of genes and mutations with a certain effect on the traits of interest. Here, 215 SNPs in seven genomic regions were genotyped in individuals of three breeds. Pairwise linkage disequilibrium was calculated for all marker pairs. To estimate the extent of LD, all pairwise LD values were plotted against the distance between the markers. Based on SNP markers in four genomic regions analyzed in three panels from populations of Large White, Dutch Landrace, and Meishan origin, useful LD is estimated to extend for approximately 40 to 60 kb in the porcine genome.
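
For readers unfamiliar with the pairwise LD measures that recur in these abstracts (D, D' and r2), the following is a minimal Python sketch of the standard textbook definitions for two biallelic loci. It assumes known haplotype and allele frequencies rather than estimating them from genotypes; the function name `ld_stats` and the example frequencies are hypothetical and not taken from any of the listed studies.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of alleles A and B
    """
    d = p_ab - p_a * p_b  # deviation from the frequency expected under independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Assumed example: allele B (frequency 0.1) only ever occurs together with
# allele A (frequency 0.5), so D' is at its maximum while r2 stays low.
d, d_prime, r2 = ld_stats(p_ab=0.1, p_a=0.5, p_b=0.1)
print(d_prime, round(r2, 3))
```

In this example D' equals 1 while r2 is only about 0.11, which illustrates why the two measures can disagree when allele frequencies at the two loci are unequal.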

9.
Ulgen A, Li W. BMC Genetics 2005, 6(Suppl 1):S13
We compared linkage analysis results for an alcoholism trait, ALDX1 (DSM-III-R and Feighner criteria) using a nonparametric linkage analysis method, which takes into account allele sharing among several affected persons, for both microsatellite and single-nucleotide polymorphism (SNP) markers (Affymetrix and Illumina) in the Collaborative Study on the Genetics of Alcoholism (COGA) dataset provided to participants at the Genetic Analysis Workshop 14 (GAW14). The two sets of linkage results from the dense Affymetrix SNP markers and less densely spaced Illumina SNP markers are very similar. The linkage analysis results from microsatellite and SNP markers are generally similar, but the match is not perfect. Strong linkage peaks were found on chromosome 7 in three sets of linkage analyses using both SNP and microsatellite marker data. We also observed that for SNP markers, using the given genetic map and using the map obtained by converting 1 megabase pair (1 Mb) to 1 centimorgan (cM) did not change the linkage results. We recommend the use of the 1 Mb-to-1 cM converted map in a first round of linkage analysis with SNP markers in which map integration is an issue.

10.
Farmed Atlantic salmon (Salmo salar) is a globally important production species, including in Australia where breeding and selection has been in progress since the 1960s. The recent development of SNP genotyping platforms means genome-wide association and genomic prediction can now be implemented to speed genetic gain. As a precursor, this study collected genotypes at 218 132 SNPs in 777 fish from a Tasmanian breeding population to assess levels of genetic diversity, the strength of linkage disequilibrium (LD) and imputation accuracy. Genetic diversity in Tasmanian Atlantic salmon was lower than observed within European populations when compared using four diversity metrics. The distribution of allele frequencies also showed a clear difference, with the Tasmanian animals carrying an excess of low minor allele frequency variants. The strength of observed LD was high at short distances (<25 kb) and remained above background for marker pairs separated by large chromosomal distances (hundreds of kb), in sharp contrast to the European Atlantic salmon tested. Genotypes were used to evaluate the accuracy of imputation from low density (0.5 to 5 K) up to increased density SNP sets (78 K). This revealed high imputation accuracies (0.89–0.97), suggesting that the use of low density SNP sets will be a successful approach for genomic prediction in this population. The long-range LD, comparatively low genetic diversity and high imputation accuracy in Tasmanian salmon is consistent with known aspects of their population history, which involved a small founding population and an absence of subsequent introgression. The findings of this study represent an important first step towards the design of methods to apply genomics in this economically important population.

11.
OBJECTIVES: Linkage disequilibrium (LD) between closely spaced SNPs can be accommodated in linkage analysis by specifying the multi-SNP haplotype frequencies, if known. Phased haplotypes in candidate regions can provide gold standard haplotype frequency estimates, and may be of inherent interest as markers. We evaluated the effects of different methods of haplotype frequency estimation, and the use of marker phase information, on linkage analysis of a multi-SNP cluster in a candidate region for Alzheimer's disease (AD). METHODS: We performed parametric linkage analysis of a five-SNP cluster in extended pedigrees to compare the use of: (1) haplotype frequencies estimated by molecular phase determination, maximum likelihood estimation, or by assuming linkage equilibrium (LE); (2) AD families or controls as the frequency source; and (3) unphased or molecularly phased SNP data. RESULTS: There was moderate to strong pairwise LD among the five SNPs. Falsely assuming LE substantially inflated the LOD score, but the method of haplotype frequency estimation and particular sample used made little difference provided that LD was accommodated. Use of phased haplotypes produced a modest increase in the LOD score over unphased SNPs. CONCLUSIONS: Ignoring LD between markers can lead to substantially inflated evidence for linkage in LOD score analysis of extended pedigrees with missing data. Use of marker phase information in linkage analysis may be important in disease studies where the costs of family recruitment and phenotyping greatly exceed the costs of phase determination.

12.
We constructed an integrated DNA marker linkage map of eggplant (Solanum melongena L.) using DNA marker segregation data sets obtained from two independent intraspecific F(2) populations. The linkage map consisted of 12 linkage groups and encompassed 1,285.5 cM in total. We mapped 952 DNA markers, including 313 genomic SSR markers developed by random sequencing of simple sequence repeat (SSR)-enriched genomic libraries, and 623 single-nucleotide polymorphisms (SNP) and insertion/deletion polymorphisms (InDels) found in eggplant-expressed sequence tags (ESTs) and related genomic sequences [introns and untranslated regions (UTRs)]. Because of their co-dominant inheritance and their highly polymorphic and multi-allelic nature, the SSR markers may be more versatile than the SNP and InDel markers for map-based genetic analysis of any traits of interest using segregating populations derived from any intraspecific crosses of practical breeding materials. However, we found that the distribution of microsatellites in the genome was biased to some extent, and therefore a considerable part of the eggplant genome was first detected when gene-derived SNP and InDel markers were mapped. Of the 623 SNP and InDel markers mapped onto the eggplant integrated map, 469 were derived from eggplant unigenes contained within Solanum orthologous (SOL) gene sets (i.e., sets of orthologous unigenes from eggplant, tomato, and potato). Out of the 469 markers, 326 could also be mapped onto the tomato map. These common markers will be informative landmarks for the transfer of tomato's more saturated genomic information to eggplant and will also provide comparative information on the genome organization of the two solanaceous species. The data are available from the DNA marker database of vegetables, VegMarks (http://vegmarks.nivot.affrc.go.jp).

13.
The pattern of linkage disequilibrium in German Holstein cattle
This study presents a second generation of linkage disequilibrium (LD) map statistics for the whole genome of the Holstein–Friesian population, which has a four times higher resolution compared with that of the maps available so far. We used DNA samples of 810 German Holstein–Friesian cattle genotyped by the Illumina Bovine SNP50K BeadChip to analyse LD structure. A panel of 40 854 (75.6%) markers was included in the final analysis. The pairwise r2 statistic of SNPs up to 5 Mb apart across the genome was estimated. A mean value of r2 = 0.30 ± 0.32 was observed in pairwise distances of <25 kb and it dropped to 0.20 ± 0.24 at 50–75 kb, which is nearly the average inter-marker space in this study. The proportion of SNPs in useful LD (r2 ≥ 0.25) was 26% for the distance of 50 and 75 kb between SNPs. We found a lower level of LD for SNP pairs at the distance ≤100 kb than previously thought. Analysis revealed 712 haplo-blocks spanning 4.7% of the genome and containing 8.0% of all SNPs. Mean and median block length were estimated as 164 ± 117 kb and 144 kb respectively. Allele frequencies of the SNPs have a considerable and systematic impact on the estimate of r2. It is shown that minimizing the allele frequency difference between SNPs reduces the influence of frequency on r2 estimates. Analysis of past effective population size based on the direct estimates of recombination rates from SNP data showed a decline in effective population size to Ne = 103 up to ~4 generations ago. Systematic effects of marker density and effective population size on observed LD and haplotype structure are discussed.

14.
Quantitative trait loci (QTL) affecting the phenotype of interest can be detected using linkage analysis (LA), linkage disequilibrium (LD) mapping or a combination of both (LDLA). The LA approach uses information from recombination events within the observed pedigree and LD mapping from the historical recombinations within the unobserved pedigree. We propose the Bayesian variable selection approach for combined LDLA analysis for single-nucleotide polymorphism (SNP) data. The novel approach uses both sources of information simultaneously as is commonly done in plant and animal genetics, but it makes fewer assumptions about population demography than previous LDLA methods. This differs from approaches in human genetics, where LDLA methods use LA information conditional on LD information or the other way round. We argue that the multilocus LDLA model is more powerful for the detection of phenotype–genotype associations than single-locus LDLA analysis. To illustrate the performance of the Bayesian multilocus LDLA method, we analyzed simulation replicates based on real SNP genotype data from small three-generational CEPH families and compared the results with the commonly used quantitative transmission disequilibrium test (QTDT). This paper is intended to be conceptual in the sense that it is not meant to be a practical method for analyzing high-density SNP data, which is more common. Our aim was to test whether this approach can function in principle.

15.
OBJECTIVES: Describe the inflation in nonparametric multipoint LOD scores due to inter-marker linkage disequilibrium (LD) across many markers with varied allele frequencies. METHOD: Using simulated two-generation families with and without parents, we conducted nonparametric multipoint linkage analysis with 2 to 10 markers with minor allele frequencies (MAF) of 0.5 and 0.1. RESULTS: Misspecification of population haplotype frequencies by assuming linkage equilibrium caused inflated multipoint LOD scores due to inter-marker LD when parental genotypes were not included. Inflation increased as more markers in LD were included and decreased as markers in equilibrium were added. When marker allele frequencies were unequal, the r2 measure of LD was a better predictor of inflation than D'. CONCLUSION: This observation strongly supports the evaluation of LD in multipoint linkage analyses, and further suggests that unaccounted for LD may be suspected when two-point and multipoint linkage analyses show a marked disparity in regions with elevated r2 measures of LD. Given the increasing popularity of high-density genome-wide SNP screens, inter-marker LD should be a concern in future linkage studies.

16.
At present there is tremendous interest in characterizing the magnitude and distribution of linkage disequilibrium (LD) throughout the human genome, which will provide the necessary foundation for genome-wide LD analyses and facilitate detailed evolutionary studies. To this end, a human high-density single-nucleotide polymorphism (SNP) marker map has been constructed. Many of the SNPs on this map, however, were identified by sampling a small number of chromosomes from a single population, and inferences drawn from studies using such SNPs may be influenced by ascertainment bias (AB). Through extensive simulations, we have found that AB is a potentially significant problem in estimating and comparing LD within and between populations. Specifically, the magnitude of AB is a function of the SNP discovery strategy, number of chromosomes used for SNP discovery, population genetic characteristics of the particular genomic region considered, amount of gene flow between populations, and demographic history of the populations. We demonstrate that a balanced SNP discovery strategy (where equal numbers of chromosomes are sampled from multiple subpopulations) is the optimal study design for generating broadly applicable SNP resources. Finally, we validate our theoretical predictions by comparing our results to publicly available data from ten genes sequenced in 24 African American and 23 European American individuals.

17.
Linkage disequilibrium (LD) content was calculated for the Genetic Analysis Workshop 14 Affymetrix and Illumina single-nucleotide polymorphism (SNP) genome scans of the Collaborative Study on the Genetics of Alcoholism samples. Pair-wise LD was measured as both D' and r2 on 505 pedigree founder individuals. The r2 estimates were then used to correct the multipoint identity by descent matrix (MIBD) calculation to account for LD and LOD scores on chromosomes 3 and 18 were calculated for COGA's ttdt3 electrophysiological trait using those MIBDs. Extensive LD was observed throughout both marker sets, and it was higher in Affymetrix's more dense SNP map. However, SNP density did not solely account for Affymetrix's higher LD. MIBD estimation procedures assume linkage equilibrium to construct genotypes of non-genotyped pedigree founder individuals, and dense SNP genotyping maps are likely to contain moderate to high LD between markers. LOD score plots calculated after correction for LD followed the same general pattern as uncorrected ones. Since in our study almost half of the pedigree founders were genotyped, it is possible that LD had a minor impact on the LOD scores. Caution should probably be taken when using high density SNP maps when many non-genotyped founders are present in the study pedigrees.

18.
Association mapping is based on linkage disequilibrium (LD) resulting from historical recombinations and helps in understanding the genetic basis of complex traits. Many factors affect LD and, therefore, it must be determined empirically in the germplasm under investigation to examine the prospects of successful genome-wide association mapping. The objectives of our study were to (1) examine the extent of LD with simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) markers in 1,537 commercial maize inbred lines belonging to four heterotic pools, (2) compare the LD patterns determined by these two marker types, (3) evaluate the number of SNP markers needed to perform genome-wide association analyses, and (4) investigate temporal trends of LD. Mean values of the squared correlation coefficient ( $ \bar{R} $ ) were almost identical for unlinked, linked, and adjacent SSR marker pairs. In contrast, $ \bar{R} $ values were lowest for the unlinked SNP loci and highest for the SNPs within amplicons. LD decay varied across the different heterotic pools and the individual chromosomes. The SSR markers employed in the present study are not adequate for association analysis, because of insufficient marker density for the germplasm evaluated. Based on the decay of LD in the various heterotic pools, we would need between 4,000 and 65,000 SNP markers to detect associations with rather large quantitative trait loci (QTL) with reasonable power. A much higher marker density is required to identify QTL with smaller effects. However, not only the total number of markers but also their distribution among and along the chromosomes is crucial for undertaking powerful association analyses.

19.
Significant interest has emerged in mapping genetic susceptibility for complex traits through whole-genome association studies. These studies rely on the extent of association, i.e., linkage disequilibrium (LD), between single nucleotide polymorphisms (SNPs) across the human genome. LD describes the nonrandom association between SNP pairs and can be used as a metric when designing maximally informative panels of SNPs for association studies in human populations. Using data from the 1.58 million SNPs genotyped by Perlegen, we explored the allele frequency dependence of the LD statistic r(2) both empirically and theoretically. We show that average r(2) values between SNPs unmatched for allele frequency are always limited to much less than 1 (theoretical approximately 0.46 to 0.57 for this dataset). Frequency matching of SNP pairs provides a more sensitive measure for assessing the average decay of LD and generates average r(2) values across nearly the entire informative range (from 0 to 0.89 through 0.95). Additionally, we analyzed the extent of perfect LD (r(2) = 1.0) using frequency-matched SNPs and found significant differences in the extent of LD in genic regions versus intergenic regions. The SNP pairs exhibiting perfect LD showed a significant bias for derived, nonancestral alleles, providing evidence for positive natural selection in the human genome.

20.
Genetic association studies of common disease often rely on linkage disequilibrium (LD) along the human genome and in the population under study. Although understanding the characteristics of this correlation has been the focus of many large-scale surveys (culminating in genomewide haplotype maps), the results of different studies have yielded wide-ranging estimates. Since understanding these differences (and whether they can be reconciled) has important implications for whole-genome association studies, in this article we dissect biases in these estimations that are due to known aspects of study design and analytic methodology. In particular, we document in the empirical data that the long-known complicating effects of allele frequency, marker density, and sample size largely reconcile all large-scale surveys. Two exceptions are an underappraisal of redundancy among single-nucleotide polymorphisms (SNPs) when evaluation is limited to short regions (as in candidate-gene resequencing studies) and an inflation in the extent of LD in HapMap phase I, which is likely due to oversampling of specific haplotypes in the creation of the public SNP map. Understanding these factors can guide the understanding of empirical LD surveys and has implications for genetic association studies.
