Similar articles
20 similar articles found (search time: 15 ms)
1.
Population stratification can be a serious obstacle in the analysis of genomewide association studies. We propose a method for evaluating the significance of association scores in whole-genome cohorts with stratification. Our approach is a randomization test akin to a standard permutation test. It conditions on the genotype matrix and thus takes into account not only the population structure but also the complex linkage disequilibrium structure of the genome. As we show in simulation experiments, our method achieves higher power and significantly better control over false-positive rates than do existing methods. In addition, it can be easily applied to whole-genome association studies.
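For orientation, the "standard permutation test" that this abstract takes as its point of comparison can be sketched as follows. This is a minimal illustration, not the authors' method: the function name `permutation_p_value`, the mean-difference score, and the 0/1/2 genotype coding are assumptions made for the example. Plain label shuffling treats all subjects as exchangeable, which is exactly what breaks down under population stratification.

```python
import random


def permutation_p_value(genotypes, phenotypes, n_perm=2000, seed=0):
    """Empirical p value for one marker by shuffling case/control labels.

    genotypes:  allele counts per subject (assumed coding: 0, 1 or 2)
    phenotypes: 1 for cases, 0 for controls
    """
    rng = random.Random(seed)

    def score(labels):
        # Hypothetical association score: absolute difference in mean genotype.
        cases = [g for g, y in zip(genotypes, labels) if y == 1]
        controls = [g for g, y in zip(genotypes, labels) if y == 0]
        return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

    observed = score(phenotypes)
    labels = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        # Shuffling keeps the numbers of cases and controls fixed.
        rng.shuffle(labels)
        if score(labels) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Because the labels are shuffled uniformly, any allele-frequency difference that tracks ancestry rather than disease is counted as signal; a stratification-aware randomization would have to restrict or reweight the shuffles.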

2.
With the widespread availability of SNP genotype data, there is great interest in analyzing pedigree haplotype data. Intermarker linkage disequilibrium for microsatellite markers is usually low due to their physical distance; however, for dense maps of SNP markers, there can be strong linkage disequilibrium between marker loci. Linkage analysis (parametric and nonparametric) and family-based association studies are currently being carried out using dense maps of SNP marker loci. Monte Carlo methods are often used for both linkage and association studies; however, to date there are no programs available that can generate haplotype and/or genotype data consisting of a large number of loci for pedigree structures. SimPed is a program that quickly generates haplotype and/or genotype data for pedigrees of virtually any size and complexity. Marker data either in linkage disequilibrium or equilibrium can be generated for more than 20,000 diallelic or multiallelic marker loci. Haplotypes and/or genotypes are generated for pedigree structures using specified genetic map distances and haplotype and/or allele frequencies. The simulated data generated by SimPed are useful for a variety of purposes, including evaluating methods that estimate haplotype frequencies for pedigree data, evaluating type I error due to intermarker linkage disequilibrium, and estimating empirical p values for linkage and family-based association studies.

3.
Genomewide association studies are being conducted to unravel the genetic etiology of complex human diseases. Because of cost constraints, these studies typically employ a two-stage design, under which a large panel of markers is examined in a subsample of subjects, and the most-promising markers are then examined in all subjects. This report describes a simple and efficient method to evaluate statistical significance for such genome studies. The proposed method, which properly accounts for the correlated nature of polymorphism data, provides accurate control of the overall false-positive rate and is substantially more powerful than the standard Bonferroni correction, especially when the markers are in strong linkage disequilibrium.

4.
A major aim of association studies is the identification of polymorphisms (usually SNPs) associated with a trait. Tests of association may be based on individual SNPs or on sets of neighboring SNPs, by use of (for example) a product P value method or Hotelling's T² test. Linkage disequilibrium, the nonindependence of SNPs in physical proximity, causes problems for all these tests. First, multiple-testing correction for individual-SNP tests or for multilocus tests either leads to conservative P values (if Bonferroni correction is used) or is computationally expensive (if permutation is used). Second, calculation of product P values usually requires permutation. Here, we present the direct simulation approach (DSA), a method that accurately approximates P values obtained by permutation but is much faster. It may be used whenever tests are based on score statistics, for example with Armitage's trend test or its multivariate analogue. The DSA can be used with binary, continuous, or count traits and allows adjustment for covariates. We demonstrate the accuracy of the DSA on real and simulated data and illustrate how it might be used in the analysis of a whole-genome association study.
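Armitage's trend test, named above as an example of a score statistic, can be computed directly from a 2 × 3 table of genotype counts. The sketch below uses the textbook Cochran-Armitage formula with additive weights (0, 1, 2); it is not the DSA itself, and the function name and argument layout are assumptions made for illustration.

```python
import math


def armitage_trend_test(case_counts, control_counts, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x 3 genotype table.

    case_counts / control_counts: subjects carrying 0, 1 and 2 copies of
    the tested allele. Returns (chi2, p) with 1 degree of freedom.
    """
    n = [r + s for r, s in zip(case_counts, control_counts)]  # column totals
    N = sum(n)                 # all subjects
    R = sum(case_counts)       # all cases
    S = N - R                  # all controls
    sum_rx = sum(r * x for r, x in zip(case_counts, weights))
    sum_nx = sum(c * x for c, x in zip(n, weights))
    sum_nx2 = sum(c * x * x for c, x in zip(n, weights))
    numerator = N * (N * sum_rx - R * sum_nx) ** 2
    denominator = R * S * (N * sum_nx2 - sum_nx ** 2)
    chi2 = numerator / denominator
    # Upper tail of a 1-df chi-square: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

The statistic equals N times the squared correlation between case status and the genotype score, which is why it behaves like a score test for an additive effect. For example, `armitage_trend_test((10, 20, 30), (30, 20, 10))` gives a chi-square of 20 on 1 degree of freedom.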

5.
Cardon LR. Human Heredity 2000, 50(6): 350-358
A multiple-regression model is described for the detection of linkage disequilibrium in quantitative trait loci. The model is developed for application to large numbers of single nucleotide polymorphism (SNP) markers genotyped on small nuclear families. Parental data are not required by the method, although it provides a direct means to test quantitative trait locus-marker allele association and to determine whether any such association is attributable to linkage disequilibrium or population admixture. Analytical expectations for the regression coefficients are derived, allowing direct interpretation of the parameter estimates. Simulation studies indicate a substantial improvement in power over classical linkage studies of sibling pairs and show the effects of population admixture on the model outcomes.

6.
Lou XY, Casella G, Littell RC, Yang MC, Johnson JA, Wu R. Genetics 2003, 163(4): 1533-1548
For tightly linked loci, cosegregation may lead to nonrandom associations between alleles in a population. Because of its evolutionary relationship with linkage, this phenomenon is called linkage disequilibrium. Linkage disequilibrium-based mapping has become a major focus of recent genome research into complex traits. In this article, we present a new statistical method for mapping quantitative trait loci (QTL) of additive, dominant, and epistatic effects in equilibrium natural populations. Our method is based on haplotype analysis of multilocus linkage disequilibrium and exhibits two significant advantages over current disequilibrium mapping methods. First, we have derived closed-form solutions for estimating the marker-QTL haplotype frequencies within the maximum-likelihood framework implemented by the EM algorithm. The allele frequencies of putative QTL and their linkage disequilibria with the markers are estimated by solving a system of regular equations. This procedure has significantly improved the computational efficiency and the precision of parameter estimation. Second, our method can detect marker-QTL disequilibria of different orders and QTL epistatic interactions of various kinds on the basis of a multilocus analysis. This not only enhances the precision of parameter estimation but also makes it possible to perform whole-genome association studies. We carried out extensive simulation studies to examine the robustness and statistical performance of our method. The application of the new method was validated using a case study from humans, in which we successfully detected significant QTL affecting human body height. Finally, we discuss the implications of our method for genome projects and its extension to broader circumstances. The computer program for the method proposed in this article is available at the webpage http://www.ifasstat.ufl.edu/genome/~LD.

7.
Estimates of the degree of nonrandom association among genes (linkage disequilibrium) can provide evidence of the role of natural selection in maintaining allozyme polymorphisms in natural populations. This paper outlines the maximum likelihood procedures for such estimates based on gametic or zygotic frequencies at the level of two loci. The analysis is extended to estimating disequilibrium between three loci. In particular, the question of the sampling requirements to detect different intensities of disequilibrium is considered. It is found that relatively large samples are required to detect nonrandom association, unless gene frequencies are intermediate and disequilibrium is relatively intense. This might be one reason why cases of linkage disequilibrium have so far proved to be the exception, rather than the rule, in population studies.
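For the simplest case covered by this abstract, two diallelic loci with directly counted gametes, the maximum likelihood estimate of disequilibrium is just the observed haplotype frequency minus the product of the observed allele frequencies. The sketch below illustrates that case together with the standard 1-df chi-square test (n times r²). The function names are hypothetical, and the sample-size helper is only a rough guide, since it returns the n at which the expected statistic reaches the 5% critical value rather than an n that guarantees high power.

```python
import math


def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Two-locus disequilibrium from counts of the four gamete types.

    Returns (D, r2, chi2), where chi2 = n * r2 tests D = 0 on 1 df.
    Both loci must be polymorphic in the sample.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    D = p_AB - p_A * p_B
    r2 = D * D / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, r2, total * r2


def min_gametes_to_detect(r2, critical=3.841):
    """Smallest n for which n * r2 reaches the 5% chi-square threshold."""
    return math.ceil(critical / r2)
```

Since r² is D² divided by the product of the four allele frequencies, a fixed D yields a smaller r² when allele frequencies are intermediate, while the largest attainable |D| is greatest there. In either case weak disequilibrium (say r² = 0.01) already calls for several hundred gametes, consistent with the abstract's conclusion that large samples are needed.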

8.
Genomewide association studies (GWAS) aim to identify genetic markers strongly associated with quantitative traits by utilizing linkage disequilibrium (LD) between candidate genes and markers. However, because of LD between nearby genetic markers, the standard GWAS approaches typically detect a number of correlated SNPs covering long genomic regions, making corrections for multiple testing overly conservative. Additionally, the high dimensionality of modern GWAS data poses considerable challenges for GWAS procedures such as permutation tests, which are computationally intensive. We propose a cluster‐based GWAS approach that first divides the genome into many large nonoverlapping windows and uses linkage disequilibrium network analysis in combination with principal component (PC) analysis as dimensional reduction tools to summarize the SNP data to independent PCs within clusters of loci connected by high LD. We then introduce single‐ and multilocus models that can efficiently conduct the association tests on such high‐dimensional data. The methods can be adapted to different model structures and used to analyse samples collected from the wild or from biparental F2 populations, which are commonly used in ecological genetics mapping studies. We demonstrate the performance of our approaches with two publicly available data sets from a plant (Arabidopsis thaliana) and a fish (Pungitius pungitius), as well as with simulated data.

9.
There is a substantial literature on the use of linkage disequilibrium (LD) to estimate effective population size using unlinked loci. The estimates are extremely sensitive to the sampling process, and there is currently no theory to cope with the possible biases. We derive formulae for the analysis of idealised populations mating at random with multi-allelic (microsatellite) loci. The ‘Burrows composite index’ is introduced in a novel way with a ‘composite haplotype table’. We show that in a sample of diploid size , the mean value of or from the composite haplotype table is biased by a factor of , rather than the usual factor for a conventional haplotype table. But analysis of population data using these formulae leads to estimates that are unrealistically low. We provide theory and simulation to show that this bias towards low estimates is due to null alleles, and introduce a randomised permutation correction to compensate for the bias. We also consider the effect of introducing a within-locus disequilibrium factor to , and find that this factor leads to a bias in the estimate. However this bias can be overcome using the same randomised permutation correction, to yield an altered with lower variance than the original , and one that is also insensitive to null alleles. The resulting formulae are used to provide estimates on 40 samples of the Queensland fruit fly, Bactrocera tryoni, from populations with widely divergent expectations. Linkage relationships are known for most of the microsatellite loci in this species. We find that there is little difference in the estimated values from using known unlinked loci as compared to using all loci, which is important for conservation studies where linkage relationships are unknown.

10.
Summary: We reconsider the method of Ott and Falk (1982) for the analysis of genetic linkage and of epistasis in the presence of phenotypic association. Their approach is extended to allow for gametic disequilibrium between marker and trait loci. We show that epistasis and tight linkage with gametic disequilibrium may be indistinguishable as explanations of association even in a very large pedigree.

11.
Contemporary genetic association studies may test hundreds of thousands of genetic variants for association, often with multiple binary and continuous traits or under more than one model of inheritance. Many of these association tests may be correlated with one another because of linkage disequilibrium between nearby markers and correlation between traits and models. Permutation tests and simulation-based methods are often employed to adjust groups of correlated tests for multiple testing, since conventional methods such as Bonferroni correction are overly conservative when tests are correlated. We present here a method of computing P values adjusted for correlated tests (PACT) that attains the accuracy of permutation or simulation-based tests in much less computation time, and we show that our method applies to many common association tests that are based on multiple traits, markers, and genetic models. Simulation demonstrates that PACT attains the power of permutation testing and provides a valid adjustment for hundreds of correlated association tests. In data analyzed as part of the Finland–United States Investigation of NIDDM Genetics (FUSION) study, we observe a near one-to-one relationship (r² > 0.999) between PACT and the corresponding permutation-based P values, achieving the same precision as permutation testing but thousands of times faster.
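The permutation adjustment that PACT is benchmarked against can be sketched with a single-step max-statistic procedure: each marker's observed statistic is compared with the permutation distribution of the largest statistic across all markers. This is a generic illustration under assumed names (`abs_mean_diff`, `max_stat_adjusted_p`) and an unstandardized mean-difference score, not the PACT algorithm.

```python
import random


def abs_mean_diff(marker, labels):
    """Hypothetical per-marker score: |mean genotype in cases - controls|."""
    cases = [g for g, y in zip(marker, labels) if y == 1]
    controls = [g for g, y in zip(marker, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def max_stat_adjusted_p(markers, phenotypes, n_perm=1000, seed=0):
    """Adjusted p values from the permutation null of the maximum statistic.

    markers: list of genotype vectors, one per marker, in subject order.
    """
    rng = random.Random(seed)
    observed = [abs_mean_diff(m, phenotypes) for m in markers]
    labels = list(phenotypes)
    exceed = [0] * len(markers)
    for _ in range(n_perm):
        # One shuffle is shared by all markers, preserving their correlation.
        rng.shuffle(labels)
        max_null = max(abs_mean_diff(m, labels) for m in markers)
        for j, obs in enumerate(observed):
            if max_null >= obs:
                exceed[j] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]
```

When markers are in perfect linkage disequilibrium (identical genotype vectors), the maximum over markers equals the single-marker statistic, so the adjusted p value carries no multiple-testing penalty, whereas a Bonferroni correction would still multiply by the number of markers. The cost is one full pass over all markers per permutation, which is the computation that faster approximations aim to avoid.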

12.
Genome-wide association studies using high-throughput technology are already being conducted despite the significant hurdles that need to be overcome (Nat Rev Genet 6:95–108, 2005; Nat Rev Genet 6:109–118, 2005). Methods for detecting haplotype association signals in genome-wide haplotype datasets are as yet very limited. Much methodological research has already been devoted to linkage disequilibrium (LD) fine mapping, where the focus is the identification of the disease locus rather than the detection of a disease signal. Applications of these approaches to genome-wide scanning are limited by the strong model assumptions of the sharing process, which lead to computational complexity. We describe a new algorithm for the initial identification of disease susceptibility loci in genome-wide haplotype association studies. Excess sharing of ancestral haplotypes, which indicates the presence of a disease locus, is detected with a simple, easy-to-interpret, χ²-based statistic. The method allows genome-wide scanning for qualitative traits within reasonable computational timeframes and can serve as a first-pass analysis prior to the use of likelihood-based methods, providing candidate regions and inferred susceptibility haplotypes. Our method makes no assumptions regarding the population history or the pattern of background LD. Statistical significance is evaluated with permutation tests. The method is illustrated on simulated and real data, where it is applied to simple (cystic fibrosis) and complex (multiple sclerosis) disease examples. The statistic has low type I error and greater power to map disease loci than conventional single-marker tests for low to moderate levels of LD.

13.
By means of a combination of genome-wide and follow-up studies, recent large-scale association studies of populations of European descent have now identified over 46 loci associated with coronary artery disease (CAD). As part of the TAICHI Consortium, we have collected and genotyped 8556 subjects from Taiwan, comprising 5423 controls and 3133 cases with coronary artery disease, for 9087 CAD SNPs using the CardioMetaboChip. We applied penalized logistic regression to ascertain the top SNPs that contribute together to CAD susceptibility in Taiwan. We observed that the 9p21 locus contributes to CAD at the level of genome-wide significance (rs1537372; for the major allele C, the effect estimate is -0.216, standard error 0.033, p value 5.8 × 10⁻¹⁰). In contrast to a previous report, we propose that the 9p21 locus is a single genetic contribution to CAD in Taiwan because: 1) the penalized logistic regression and the follow-up conditional analysis suggested that rs1537372 accounts for all of the CAD association in 9p21, and 2) high linkage disequilibrium is observed for all associated SNPs in 9p21. We also observed evidence for the following loci at a false discovery rate >5%: SH2B3, ADAMTS7, PHACTR1, GGCX, HTRA1, COL4A1, and LARP6-LRRC49. We also took advantage of the fact that penalized methods are an efficient approach to search for gene-by-gene interactions, and observed that two-way interactions between the PHACTR1 and ADAMTS7 loci and between the SH2B3 and COL4A1 loci contribute to CAD risk. Both the similarities and the differences between the significance of these loci and the significance of loci in studies of populations of European descent underscore the fact that further genetic association studies in additional populations will provide clues to identify the genetic architecture of CAD across all populations worldwide.

14.
Many popular methods for exploring gene-gene interactions, including the case-only approach, rely on the key assumption that physically distant loci are in linkage equilibrium in the underlying population. These methods utilize the presence of correlation between unlinked loci in a disease-enriched sample as evidence of interactions among the loci in the etiology of the disease. We use data from the CGEMS case-control genome-wide association study of breast cancer to demonstrate empirically that the case-only and related methods have the potential to create large-scale false positives because of the presence of population stratification (PS) that creates long-range linkage disequilibrium in the genome. We show that the bias can be removed by considering parametric and nonparametric methods that assume gene-gene independence between unlinked loci, not in the entire population, but only conditional on population substructure that can be uncovered based on the principal components of a suitably large panel of PS markers. Applications in the CGEMS study as well as simulated data show that the proposed methods are robust to the presence of population stratification and are yet much more powerful, relative to standard logistic regression methods that are also commonly used as robust alternatives to the case-only type methods.

15.

Background  

The information provided by dense genome-wide markers using high throughput technology is of considerable potential in human disease studies and livestock breeding programs. Genome-wide association studies relate individual single nucleotide polymorphisms (SNP) from dense SNP panels to individual measurements of complex traits, with the underlying assumption being that any association is caused by linkage disequilibrium (LD) between SNP and quantitative trait loci (QTL) affecting the trait. Often SNP are in genomic regions of no trait variation. Whole genome Bayesian models are an effective way of incorporating this and other important prior information into modelling. However a full Bayesian analysis is often not feasible due to the large computational time involved.

16.
Genome Wide Association Studies (GWAS) are a standard approach for large-scale common variation characterization and for identification of single loci predisposing to disease. However, due to issues of moderate sample sizes and particularly multiple testing correction, many variants of smaller effect size are not detected within a single allele analysis framework. Thus, small main effects and potential epistatic effects are not consistently observed in GWAS using standard analytical approaches that consider only single SNP alleles. Here, we propose unique methodology that aggregates variants of interest (for example, genes in a biological pathway) using GWAS results. Multiple testing and type I error concerns are minimized using empirical genomic randomization to estimate significance. Randomization corrects for common pathway-based analysis biases, such as SNP coverage and density, linkage disequilibrium, gene size and pathway size. Pathway Analysis by Randomization Incorporating Structure (PARIS) applies this randomization and in doing so directly accounts for linkage disequilibrium effects. PARIS is independent of association analysis method and is thus applicable to GWAS datasets of all study designs. Using the KEGG database as an example, we apply PARIS to the publicly available Autism Genetic Resource Exchange GWAS dataset, revealing pathways with a significant enrichment of positive association results.

17.
Zhou H, Wei LJ, Xu X, Xu X. Human Heredity 2008, 65(3): 166-174
In the search to detect genetic associations between complex traits and DNA variants, a common practice is to select a subset of single nucleotide polymorphisms (tag SNPs) in a gene or chromosomal region of interest. This allows study of untyped polymorphisms in this region through the phenomenon of linkage disequilibrium (LD). However, it is crucial in the analysis to utilize such multiple SNP markers efficiently. In this study, we present a robust testing approach (T(C)) that combines single marker association test statistics or p values. This combination is based on the summation of single test statistics or p values, giving greater weight to those with lower p values. We compared the power of T(C) in identifying common trait loci, using tag SNPs within the same haplotype block in which the trait loci reside, with that of competing published tests, in case-control settings. These competing tests included the Bonferroni procedure (T(B)), the simple permutation procedure (T(P)), the permutation procedure proposed by Hoh et al. (T(P-H)) and its revised version using 'deflated' statistics (T(P-H_def)), the traditional χ² procedure (T(CHI)), the regression procedure (Hotelling T² test) (T(R)) and the haplotype-based test (T(H)). Results of these comparisons show that our proposed combining procedure (T(C)) is preferred in all scenarios examined. We also apply this new test to a data set from a previously reported association study on airway responsiveness to methacholine.

18.

Background  

Population structure analysis is important to genetic association studies and evolutionary investigations. Parametric approaches, e.g. STRUCTURE and L-POP, usually assume Hardy-Weinberg equilibrium (HWE) and linkage equilibrium among loci in sample population individuals. However, the assumptions may not hold and allele frequency estimation may not be accurate in some data sets. The improved version of STRUCTURE (version 2.1) can incorporate linkage information among loci but is still sensitive to high background linkage disequilibrium. Nowadays, large-scale single nucleotide polymorphisms (SNPs) are becoming popular in genetic studies. Therefore, it is imperative to have software that makes full use of these genetic data to generate inference even when model assumptions do not hold or allele frequency estimation suffers from high variation.

19.
Andolfatto P, Przeworski M. Genetics 2000, 156(1): 257-268
We analyze nucleotide polymorphism data for a large number of loci in areas of normal to high recombination in Drosophila melanogaster and D. simulans (24 and 16 loci, respectively). We find a genome-wide, systematic departure from the neutral expectation for a panmictic population at equilibrium in natural populations of both species. The distribution of sequence-based estimates of 2Nc across loci is inconsistent with the assumptions of the standard neutral theory, given the observed levels of nucleotide diversity and accepted values for recombination and mutation rates. Under these assumptions, most estimates of 2Nc are severalfold too low; in other words, both species exhibit greater intralocus linkage disequilibrium than expected. Variation in recombination or mutation rates is not sufficient to account for the excess of linkage disequilibrium. While an equilibrium island model does not seem to account for the data, more complicated forms of population structure may. A proper test of alternative demographic models will require loci to be sampled in a more consistent fashion.

20.
Richard R. Hudson. Genetics 1985, 109(3): 611-631
The sampling distributions of several statistics that measure the association of alleles on gametes (linkage disequilibrium) are estimated under a two-locus neutral infinite allele model using an efficient Monte Carlo method. An often used approximation for the mean squared linkage disequilibrium is shown to be inaccurate unless the proper statistical conditioning is used. The joint distribution of linkage disequilibrium and the allele frequencies in the sample is studied. This estimated joint distribution is sufficient for obtaining an approximate maximum likelihood estimate of C = 4Nc, where N is the population size and c is the recombination rate. It has been suggested that observations of high linkage disequilibrium might be a good basis for rejecting a neutral model in favor of a model in which natural selection maintains genetic variation. It is found that a single sample of chromosomes examined at two loci cannot provide sufficient information for such a test if C is less than 10, because with C this small, very high levels of linkage disequilibrium are not unexpected under the neutral model. In samples of size 50, it is found that, even when C is as large as 50, the distribution of linkage disequilibrium conditional on the allele frequencies is substantially different from the distribution when there is no linkage between the loci. When conditioned on the number of alleles at each locus in the sample, all of the sample statistics examined are nearly independent of θ = 4Nμ, where μ is the neutral mutation rate.
