20 similar articles found.
1.
Wang X Zhu X Qin H Cooper RS Ewens WJ Li C Li M 《Bioinformatics (Oxford, England)》2011,27(5):670-677
MOTIVATION: Admixed populations offer a unique opportunity for mapping diseases that have large disease allele frequency differences between ancestral populations. However, association analysis in such populations is challenging because population stratification may lead to association with loci unlinked to the disease locus. METHODS AND RESULTS: We show that local ancestry at a test single nucleotide polymorphism (SNP) may be confounded with the association signal, and that ignoring it can lead to spurious association. We demonstrate theoretically that adjustment for local ancestry at the test SNP is sufficient to remove the spurious association regardless of the mechanism of population stratification, whether due to local or global ancestry differences among study subjects; however, global ancestry adjustment procedures may not be effective. We further develop two novel association tests that adjust for local ancestry. Our first test is based on a conditional likelihood framework which models the distribution of the test SNP given disease status and flanking marker genotypes. A key advantage of this test lies in its ability to incorporate different directions of association in the ancestral populations. Our second test, which is computationally simpler, is based on logistic regression, with adjustment for local ancestry proportion. We conducted extensive simulations and found that the Type I error rates of our tests are under control; however, the global adjustment procedures yielded inflated Type I error rates when stratification is due to local ancestry difference.
2.
Case-control tests for association are an important tool for mapping complex-trait genes. But population structure can invalidate this approach, leading to apparent associations at markers that are unlinked to disease loci. Family-based tests of association can avoid this problem, but such studies are often more expensive and in some cases--particularly for late-onset diseases--are impractical. In this review article we describe a series of approaches published over the past 2 years which use multilocus genotype data to enable valid case-control tests of association, even in the presence of population structure. These tests can be classified into two categories. "Genomic control" methods use the independent marker loci to adjust the distribution of a standard test statistic, while "structured association" methods infer the details of population structure en route to testing for association. We discuss the statistical issues involved in the different approaches and present results from simulations comparing the relative performance of the methods under a range of models.
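The "genomic control" idea described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure of any particular paper in the review: it assumes 1-df chi-square association statistics, estimates an inflation factor as the median statistic at the null markers divided by the median of a 1-df chi-square distribution (about 0.4549), and never deflates below 1. The function and variable names are hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(null_stats, test_stats):
    """Rescale 1-df chi-square statistics by an estimated inflation factor.

    null_stats: statistics at independent marker loci assumed unlinked to disease.
    test_stats: statistics at the candidate loci to be adjusted.
    Returns (inflation_factor, adjusted_test_stats).
    """
    # Inflation factor: observed median over the median expected under the null.
    # It is bounded below by 1 so that statistics are never inflated.
    lam = max(1.0, median(null_stats) / CHI2_1DF_MEDIAN)
    return lam, [s / lam for s in test_stats]
```

For example, if the null markers have a median statistic twice the expected median, every candidate statistic is halved before it is compared with the chi-square reference distribution.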
3.
Tsai HJ Choudhry S Naqvi M Rodriguez-Cintron W Burchard EG Ziv E 《Human genetics》2005,118(3-4):424-433
Population stratification may confound the results of genetic association studies among unrelated individuals from admixed populations. Several methods have been proposed to estimate the ancestral information in admixed populations and used to adjust for population stratification in genetic association tests. We evaluate the performance of three different methods: maximum likelihood estimation, ADMIXMAP and Structure, through various simulated data sets and real data from Latino subjects participating in a genetic study of asthma. All three methods provide similar information on the accuracy of ancestral estimates and control the type I error rate at an approximately similar rate. The most important factor in determining accuracy of the ancestry estimate and in minimizing type I error rate is the number of markers used to estimate ancestry. We demonstrate that approximately 100 ancestry informative markers (AIMs) are required to obtain estimates of ancestry whose correlation with the true individual ancestral proportions exceeds 0.9. In addition, after accounting for the ancestry information in association tests, the excess of type I error rate is controlled at the 5% level when 100 markers are used to estimate ancestry. However, since the effect of admixture on the type I error rate worsens with sample size, the accuracy of ancestry estimates also needs to increase to make the appropriate correction. Using data from the Latino subjects, we also apply these methods to an association study between body mass index and 44 AIMs. These simulations are meant to provide some practical guidelines for investigators conducting association studies in admixed populations.
4.
Background
Infectious disease of livestock continues to be a cause of substantial economic loss and has adverse welfare consequences in both the developing and developed world. New solutions to control disease are needed and research focused on the genetic loci determining variation in immune-related traits has the potential to deliver solutions. However, identifying selectable markers and the causal genes involved in disease resistance and vaccine response is not straightforward. The aims of this study were to locate regions of the bovine genome that control the immune response post immunisation. 195 F2 and backcross Holstein Charolais cattle were immunised with a 40-mer peptide derived from foot-and-mouth disease virus (FMDV). T cell and antibody (IgG1 and IgG2) responses were measured at several time points post immunisation. All experimental animals (F0, F1 and F2, n = 982) were genotyped with 165 microsatellite markers for the genome scan.
Results
Considerable variability in the immune responses across time was observed and sire, dam and age had significant effects on responses at specific time points. There were significant correlations within traits across time, and between IgG1 and IgG2 traits, and some weak correlations were also detected between T cell and IgG2 responses. The whole genome scan detected 77 quantitative trait loci (QTL), on 22 chromosomes, including clusters of QTL on BTA 4, 5, 6, 20, 23 and 25. Two QTL reached 5% genome-wide significance (on BTA 6 and 24) and one on BTA 20 reached 1% genome-wide significance.
Conclusions
A proportion of the variance in the T cell and antibody response post immunisation with an FMDV peptide has a genetic component. Even though the antigen was relatively simple, the humoral and cell mediated responses were clearly under complex genetic control, with the majority of QTL located outside the MHC locus. The results suggest that there may be specific genes or loci that impact on variation in both the primary and secondary immune responses, whereas other loci may be specifically important for early or later phases of the immune response. Future fine mapping of the QTL clusters identified has the potential to reveal the causal variations underlying the variation in immune response observed.
5.
We present a new haplotype-based approach for inferring local genetic ancestry of individuals in an admixed population. Most existing approaches for local ancestry estimation ignore the latent genetic relatedness between ancestral populations and treat them as independent. In this article, we exploit such information by building an inheritance model that describes both the ancestral populations and the admixed population jointly in a unified framework. Based on an assumption that the common hypothetical founder haplotypes give rise to both the ancestral and the admixed population haplotypes, we employ an infinite hidden Markov model to characterize each ancestral population and further extend it to generate the admixed population. Through an effective utilization of the population structural information under a principled nonparametric Bayesian framework, the resulting model is significantly less sensitive to the choice and the amount of training data for ancestral populations than state-of-the-art algorithms. We also improve the robustness under deviation from common modeling assumptions by incorporating population-specific scale parameters that allow variable recombination rates in different populations. Our method is applicable to an admixed population from an arbitrary number of ancestral populations and also performs competitively in terms of spurious ancestry proportions under a general multiway admixture assumption. We validate the proposed method by simulation under various admixing scenarios and present empirical analysis results from a worldwide-distributed dataset from the Human Genome Diversity Project.
6.
In many case-control genetic association studies, a set of correlated secondary phenotypes that may share common genetic factors with disease status are collected. Examination of these secondary phenotypes can yield valuable insights about the disease etiology and supplement the main studies. However, due to unequal sampling probabilities between cases and controls, standard regression analysis that assesses the effect of SNPs (single nucleotide polymorphisms) on secondary phenotypes using cases only, controls only, or combined samples of cases and controls can yield inflated type I error rates when the test SNP is associated with the disease. To solve this issue, we propose a Gaussian copula-based approach that efficiently models the dependence between disease status and secondary phenotypes. Through simulations, we show that our method yields correct type I error rates for the analysis of secondary phenotypes under a wide range of situations. To illustrate the effectiveness of our method in the analysis of real data, we applied our method to a genome-wide association study on high-density lipoprotein cholesterol (HDL-C), where "cases" are defined as individuals with extremely high HDL-C level and "controls" are defined as those with low HDL-C level. We treated 4 quantitative traits with varying degrees of correlation with HDL-C as secondary phenotypes and tested for association with SNPs in LIPG, a gene that is well known to be associated with HDL-C. We show that when the correlation between the primary and secondary phenotypes is >0.2, the P values from case-control combined unadjusted analysis are much more significant than methods that aim to correct for ascertainment bias. Our results suggest that to avoid false-positive associations, it is important to appropriately model secondary phenotypes in case-control genetic association studies.
7.
Anderson CA Pettersson FH Clarke GM Cardon LR Morris AP Zondervan KT 《Nature protocols》2010,5(9):1564-1573
This protocol details the steps for data quality assessment and control that are typically carried out during case-control association studies. The steps described involve the identification and removal of DNA samples and markers that introduce bias. These critical steps are paramount to the success of a case-control study and are necessary before statistically testing for association. We describe how to use PLINK, a tool for handling SNP data, to perform assessments of failure rate per individual and per SNP and to assess the degree of relatedness between individuals. We also detail other quality-control procedures, including the use of SMARTPCA software for the identification of ancestral outliers. These platforms were selected because they are user-friendly, widely used and computationally efficient. Steps needed to detect and establish a disease association using case-control data are not discussed here. Issues concerning study design and marker selection in case-control studies have been discussed in our earlier protocols. This protocol, which is routinely used in our labs, should take approximately 8 h to complete.
8.
Hinds DA Stokowski RP Patil N Konvicka K Kershenobich D Cox DR Ballinger DG 《American journal of human genetics》2004,74(2):317-325
Association studies in populations that are genetically heterogeneous can yield large numbers of spurious associations if population subgroups are unequally represented among cases and controls. This problem is particularly acute for studies involving pooled genotyping of very large numbers of single-nucleotide-polymorphism (SNP) markers, because most methods for analysis of association in structured populations require individual genotyping data. In this study, we present several strategies for matching case and control pools to have similar genetic compositions, based on ancestry information inferred from genotype data for approximately 300 SNPs tiled on an oligonucleotide-based genotyping array. We also discuss methods for measuring the impact of population stratification on an association study. Results for an admixed population and a phenotype strongly confounded with ancestry show that these simple matching strategies can effectively mitigate the impact of population stratification.
9.
In case-control genetic association studies, cases are subjects with the disease and controls are subjects without the disease. At the time of case-control data collection, information about secondary phenotypes is also collected. In addition to studies of primary diseases, there has been some interest in studying genetic variants associated with secondary phenotypes. In genetic association studies, the deviation from Hardy-Weinberg proportion (HWP) of each genetic marker is assessed as an initial quality check to identify questionable genotypes. Generally, HWP tests are performed based on the controls for the primary disease or secondary phenotype. However, when the disease or phenotype of interest is common, the controls do not represent the general population. Therefore, using only controls for testing HWP can result in a highly inflated type I error rate for the disease- and/or phenotype-associated variants. Recently, two approaches, the likelihood ratio test (LRT) approach and the mixture HWP (mHWP) exact test were proposed for testing HWP in samples from case-control studies. Here, we show that these two approaches result in inflated type I error rates and could lead to the removal from further analysis of potential causal genetic variants associated with the primary disease and/or secondary phenotype when the study of primary disease is frequency-matched on the secondary phenotype. Therefore, we proposed alternative approaches, which extend the LRT and mHWP approaches, for assessing HWP that account for frequency matching. The goal was to maintain more (possible causative) single-nucleotide polymorphisms in the sample for further analysis. Our simulation results showed that both extended approaches could control type I error probabilities. 
We also applied the proposed approaches to test HWP for SNPs from a genome-wide association study of lung cancer that was frequency-matched on smoking status and found that the proposed approaches can keep more genetic variants for association studies.
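As background for the HWP quality check mentioned in this abstract, the following is a minimal sketch of the textbook chi-square goodness-of-fit test for Hardy-Weinberg proportions at a biallelic marker. It is not the LRT or mHWP approach discussed above, nor the authors' extensions; it only illustrates the baseline check that is usually run on controls. The function names are hypothetical.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 df) for deviation from Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb: observed counts of the three genotypes at one marker.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Genotype classes with zero expected count (monomorphic marker) are skipped.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def chi2_1df_pvalue(stat):
    """Upper-tail probability of a 1-df chi-square distribution."""
    return erfc(sqrt(stat / 2.0))
```

With genotype counts 30, 40 and 30, the allele frequency is 0.5, the expected counts are 25, 50 and 25, and the statistic is 4, which corresponds to a P value of roughly 0.046.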
10.
11.
Wang K 《Biostatistics (Oxford, England)》2012,13(4):724-733
The central theme in case-control genetic association studies is to efficiently identify genetic markers associated with trait status. Powerful statistical methods are critical to accomplishing this goal. A popular method is the omnibus Pearson's chi-square test applied to genotype counts. To achieve increased power, tests based on an assumed trait model have been proposed. However, they are not robust to model misspecification. Much research has been carried out on enhancing robustness of such model-based tests. An analysis framework that tests the equality of allele frequency while allowing for different deviation from Hardy-Weinberg equilibrium (HWE) between cases and controls is proposed. The proposed method does not require specification of trait models nor HWE. It involves only 1 degree of freedom. The likelihood ratio statistic, score statistic, and Wald statistic associated with this framework are introduced. Their performance is evaluated by extensive computer simulation in comparison with existing methods.
12.
We studied a trend test for genetic association between disease and the number of risk alleles using case-control data. When the data are sampled from families, this trend test can be adjusted to take into account the correlations among family members in complex pedigrees. However, the test depends on the scores based on the underlying genetic model and thus it may have substantial loss of power when the model is misspecified. Since the mode of inheritance will be unknown for complex diseases, we have developed two robust trend tests for case-control studies using family data. These robust tests have relatively good power for a class of possible genetic models. The trend tests and robust trend tests were applied to a dataset of Genetic Analysis Workshop 14 from the Collaborative Study on the Genetics of Alcoholism.
13.
In the search to detect genetic associations between complex traits and DNA variants, a practice is to select a subset of Single Nucleotide Polymorphisms (tag SNPs) in a gene or chromosomal region of interest. This allows study of untyped polymorphisms in this region through the phenomenon of linkage disequilibrium (LD). However, it is crucial in the analysis to utilize such multiple SNP markers efficiently. In this study, we present a robust testing approach (T(C)) that combines single marker association test statistics or p values. This combination is based on the summation of single test statistics or p values, giving greater weight to those with lower p values. We compared the powers of T(C) in identifying common trait loci, using tag SNPs within the same haplotype block that the trait loci reside, with competing published tests, in case-control settings. These competing tests included the Bonferroni procedure (T(B)), the simple permutation procedure (T(P)), the permutation procedure proposed by Hoh et al. (T(P-H)) and its revised version using 'deflated' statistics (T(P-H_def)), the traditional chi-square procedure (T(CHI)), the regression procedure (Hotelling's T-squared test) (T(R)) and the haplotype-based test (T(H)). Results of these comparisons show that our proposed combining procedure (T(C)) is preferred in all scenarios examined. We also apply this new test to a data set from a previously reported association study on airway responsiveness to methacholine.
14.
Accounting for unmeasured population substructure in case-control studies of genetic association using a novel latent-class model
We propose a novel latent-class approach to detect and account for population stratification in a case-control study of association between a candidate gene and a disease. In our approach, population substructure is detected and accounted for using data on additional loci that are in linkage equilibrium within subpopulations but have alleles that vary in frequency between subpopulations. We have tested our approach using simulated data based on allele frequencies in 12 short tandem repeat (STR) loci in four populations in Argentina.
15.
16.
SUMMARY: A website that plots power and sample size calculations over a range of up to eight parameters (including diagnostic misclassification error parameters) for two commonly used statistical tests of genetic association, the linear trend test and the genotypic test of association. AVAILABILITY: This method is made available via the website http://linkage.rockefeller.edu/pawe3d/ CONTACT: pawe3d@linkage.rockefeller.edu.
17.
A practical introduction to Random Forest for genetic association studies in ecology and evolution
Large genomic studies are becoming increasingly common with advances in sequencing technology, and our ability to understand how genomic variation influences phenotypic variation between individuals has never been greater. The exploration of such relationships first requires the identification of associations between molecular markers and phenotypes. Here, we explore the use of Random Forest (RF), a powerful machine‐learning algorithm, in genomic studies to discern loci underlying both discrete and quantitative traits, particularly when studying wild or nonmodel organisms. RF is becoming increasingly used in ecological and population genetics because, unlike traditional methods, it can efficiently analyse thousands of loci simultaneously and account for nonadditive interactions. However, understanding both the power and limitations of Random Forest is important for its proper implementation and the interpretation of results. We therefore provide a practical introduction to the algorithm and its use for identifying associations between molecular markers and phenotypes, discussing such topics as data limitations, algorithm initiation and optimization, as well as interpretation. We also provide short R tutorials as examples, with the aim of providing a guide to the implementation of the algorithm. Topics discussed here are intended to serve as an entry point for molecular ecologists interested in employing Random Forest to identify trait associations in genomic data sets.
18.
In genome-wide association studies (GWAS), single-marker analysis is usually employed to identify the most significant single nucleotide polymorphisms (SNPs). The trend test has been proposed for analysis of case-control association. Three trend tests, optimal for the recessive, additive and dominant models respectively, are available. When the underlying genetic model is unknown, the maximum of the three trend test results (MAX) has been shown to be robust against genetic model misspecification. Since the asymptotic distribution of MAX depends on the allele frequency of the SNP, ranking SNPs by the P-value of MAX may differ from ranking them by the MAX statistic. Calculating the P-value of MAX for 300,000 (300 K) or more SNPs is computationally intensive, and software to obtain the P-value of MAX is not widely available. On the other hand, the MAX statistic is very easy to calculate without complex computer programs. Thus, we study whether or not one could use the MAX statistic instead of its P-value to rank SNPs in GWAS. The approaches using MAX and its P-value to rank SNPs are referred to as MAX-rank and P-rank. By applying MAX-rank and P-rank to simulated and four real datasets from GWAS, we found the ranks of SNPs with true association are very similar using both approaches. Thus, we recommend using MAX-rank for genome-wide scans. After the top-ranked SNPs are identified, their P-values based on MAX can be calculated and compared with the significance level.
The work of Q. Li was partially supported by the Knowledge Innovation Program of the Chinese Academy of Sciences, No. 30465W0 and 30475V0. The research of Z Li was partially sponsored by NIH grant EY014478.
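The abstract above notes that the MAX statistic is easy to compute. A minimal sketch follows, assuming the usual Cochran-Armitage trend statistic with genotype scores (0, 0, 1), (0, 0.5, 1) and (0, 1, 1) for the recessive, additive and dominant models, and genotype counts ordered by the number of risk alleles (0, 1, 2). The abstract itself does not give the formula, so these choices and all names below are assumptions made for illustration.

```python
from math import sqrt

# Genotype scores for 0, 1 and 2 copies of the risk allele (assumed convention).
MODEL_SCORES = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}


def trend_z(cases, controls, scores):
    """Cochran-Armitage trend statistic for one set of genotype scores."""
    r_tot, s_tot = sum(cases), sum(controls)
    n_tot = r_tot + s_tot
    totals = [r + s for r, s in zip(cases, controls)]
    num = sum(x * (s_tot * r - r_tot * s)
              for x, r, s in zip(scores, cases, controls))
    spread = (n_tot * sum(x * x * n for x, n in zip(scores, totals))
              - sum(x * n for x, n in zip(scores, totals)) ** 2)
    var = r_tot * s_tot * spread
    if var <= 0:
        return 0.0  # no variation in scores or an empty group: no evidence
    return sqrt(n_tot) * num / sqrt(var)


def max_statistic(cases, controls):
    """MAX: largest absolute trend statistic over the three genetic models."""
    return max(abs(trend_z(cases, controls, x)) for x in MODEL_SCORES.values())
```

Ranking SNPs by `max_statistic` corresponds to what the abstract calls MAX-rank; P-rank would additionally require the null distribution of MAX, which depends on allele frequency and is not attempted here.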
19.
Background
The availability of sequences from whole genomes to reconstruct the tree of life has the potential to enable the development of phylogenomic hypotheses in ways that have not been before possible. A significant bottleneck in the analysis of genomic-scale views of the tree of life is the time required for manual curation of genomic data into multi-gene phylogenetic matrices.
Results
To keep pace with the exponentially growing volume of molecular data in the genomic era, we have developed an automated technique, ASAP (Automated Simultaneous Analysis Phylogenetics), to assemble these multigene/multi species matrices and to evaluate the significance of individual genes within the context of a given phylogenetic hypothesis.
Conclusion
Applications of ASAP may enable scientists to re-evaluate species relationships and to develop new phylogenomic hypotheses based on genome-scale data.
20.
The HapMap project has given case-control association studies a unique opportunity to uncover the genetic basis of complex diseases. However, persistent issues in such studies remain the proper quantification of, testing for, and correction for population stratification (PS). In this paper, we present the first unified paradigm that addresses all three fundamental issues within one statistical framework. Our unified approach makes use of an omnibus quantity (delta), which can be estimated in a case-control study from suitable null loci. We show how this estimated value can be used to quantify PS, to statistically test for PS, and to correct for PS, all in the context of case-control studies. Moreover, we provide guidelines for interpreting values of delta in association studies (e.g., at alpha = 0.05, a delta of size 0.416 is small, a delta of size 0.653 is medium, and a delta of size 1.115 is large). A novel feature of our testing procedure is its ability to test for either strictly any PS or only 'practically important' PS. We also performed simulations to compare our correction procedure with Genomic Control (GC). Our results show that, unlike GC, it maintains good Type I error rates and power across all levels of PS.