1.
Wang K 《Biostatistics (Oxford, England)》2012,13(4):724-733
The central theme in case-control genetic association studies is to efficiently identify genetic markers associated with trait status. Powerful statistical methods are critical to accomplishing this goal. A popular method is the omnibus Pearson's chi-square test applied to genotype counts. To achieve increased power, tests based on an assumed trait model have been proposed. However, they are not robust to model misspecification. Much research has been carried out on enhancing the robustness of such model-based tests. An analysis framework is proposed that tests the equality of allele frequencies while allowing for different deviations from Hardy-Weinberg equilibrium (HWE) between cases and controls. The proposed method requires neither a specified trait model nor HWE. It involves only 1 degree of freedom. The likelihood ratio statistic, score statistic, and Wald statistic associated with this framework are introduced. Their performance is evaluated by extensive computer simulation in comparison with existing methods.
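As a point of reference for the omnibus test mentioned in this abstract, the following is a minimal sketch of Pearson's chi-square test on a 2x3 table of genotype counts. It is not the 1-degree-of-freedom method the paper proposes. The function name and the example counts are illustrative assumptions, and the sketch assumes every genotype column has a nonzero total.

```python
import math

def genotype_chi_square(cases, controls):
    """Pearson chi-square on a 2x3 table of genotype counts.

    cases, controls: three counts each, in the same genotype order.
    Returns (statistic, p_value) with 2 degrees of freedom.
    """
    table = [list(cases), list(controls)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(3):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 2 degrees of freedom the chi-square survival function is exp(-x/2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value

# Hypothetical counts for genotypes (AA, Aa, aa).
stat, p_value = genotype_chi_square((50, 40, 10), (30, 50, 20))
```

The closed-form p-value holds only for 2 degrees of freedom, which is why the sketch needs no statistics library.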
2.
The simplest and most commonly used approach for genetic associations is the case-control study design of unrelated people. This design is susceptible to population stratification. This problem is obviated in family-based studies, but it is usually difficult to accumulate large enough samples of well-characterized families. We addressed empirically whether the two designs give similar estimates of association in 93 investigations where both unrelated case-control and family-based designs had been employed. Estimated odds ratios differed beyond chance between the two designs in only four instances (4%). The summary relative odds ratio (ROR) (the ratio of odds ratios obtained from unrelated case-control and family-based studies) was close to unity (0.96 [95% confidence interval, 0.91-1.01]). There was no heterogeneity in the ROR across studies (amount of heterogeneity beyond chance I² = 0%). Differences in whether results were nominally statistically significant (p < 0.05) or not with the two designs were common (opposite classification rates 14% and 17%); this largely reflected differences in power. Conclusions were largely similar in diverse subgroup analyses. Unrelated case-control and family-based designs give overall similar estimates of association. We cannot rule out rare large biases or common small biases.
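The relative odds ratio defined in this abstract can be illustrated for a single pair of estimates. This is a minimal sketch that assumes the two odds ratios come from independent samples and that their standard errors are given on the log scale. It does not reproduce the meta-analytic pooling behind the summary ROR, and the function name and inputs are hypothetical.

```python
import math

def relative_odds_ratio(or_cc, se_log_cc, or_fb, se_log_fb, z=1.96):
    """ROR = OR(case-control) / OR(family-based), with an approximate 95% CI.

    Assumes independent estimates, so variances add on the log scale.
    """
    log_ror = math.log(or_cc) - math.log(or_fb)
    se = math.sqrt(se_log_cc ** 2 + se_log_fb ** 2)
    return (math.exp(log_ror),
            math.exp(log_ror - z * se),
            math.exp(log_ror + z * se))

# Hypothetical inputs: OR 1.30 (SE 0.10) versus OR 1.25 (SE 0.15).
ror, lower, upper = relative_odds_ratio(1.30, 0.10, 1.25, 0.15)
```

An interval that contains 1 means the two designs do not differ beyond chance for that comparison.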
3.
Family-based association studies (FBAS) have the advantages of controlling for population stratification and testing for linkage and association simultaneously. We propose a retrospective multilevel model (rMLM) approach to analyze sibship data by using genotypic information as the dependent variable. Simulated data sets were generated using the simulation of linkage and association (SIMLA) program. We compared rMLM to the sib transmission/disequilibrium test (S-TDT), sibling disequilibrium test (SDT), conditional logistic regression (CLR) and generalized estimating equations (GEE) on the measures of power, type I error, estimation bias and standard error. The results indicated that rMLM was a valid test of association in the presence of linkage using sibship data. The advantages of rMLM became more evident when the data contained concordant sibships. Compared to GEE, rMLM underestimated the odds ratio (OR) less. Our results support the application of rMLM to detect gene-disease associations using sibship data. However, caution is warranted because the type I error rate may increase when there is association without linkage between the disease locus and the genotyped marker.
4.
Epstein MP Veal CD Trembath RC Barker JN Li C Satten GA 《American journal of human genetics》2005,76(4):592-608
The selection of an appropriate control sample for use in association mapping requires serious deliberation. Unrelated controls are generally easy to collect, but the resulting analyses are susceptible to spurious association arising from population stratification. Parental controls are popular, since triads comprising a case and two parents can be used in analyses that are robust to this stratification. However, parental controls are often expensive and difficult to collect. In some situations, studies may have both parental and unrelated controls available for analysis. For example, a candidate-gene study may analyze triads but may have an additional sample of unrelated controls for examination of background linkage disequilibrium in genomic regions. Also, studies may collect a sample of triads to confirm results initially found using a traditional case-control study. Initial association studies also may collect each type of control, to provide insurance against the weaknesses of the other type. In these situations, resulting samples will consist of some triads, some unrelated controls, and, possibly, some unrelated cases. Rather than analyze the triads and unrelated subjects separately, we present a likelihood-based approach for combining their information in a single combined association analysis. Our approach allows for joint analysis of data from both triad and case-control study designs. Simulations indicate that our proposed approach is more powerful than association tests that are based on each separate sample. Our approach also allows for flexible modeling and estimation of allele effects, as well as for missing parental data. We illustrate the usefulness of our approach using SNP data from a candidate-gene study of psoriasis.
5.
6.
7.
Optimized group sequential study designs for tests of genetic linkage and association in complex diseases
The study of genetic linkage or association in complex traits requires large sample sizes, as the expected effect sizes are small and extremely low significance levels need to be adopted. One possible way to reduce the numbers of phenotypings and genotypings is the use of a sequential study design. Here, average sample sizes are decreased by conducting interim analyses with the possibility to stop the investigation early if the result is significant. We applied optimized group sequential study designs to the analysis of genetic linkage (one-sided mean test) and association (two-sided transmission/disequilibrium test). For designs with two and three stages at overall significance levels of .05 and .0001 and a power of .8, we calculated necessary sample sizes, time points, and critical boundaries for interim and final analyses. Monte Carlo simulation analyses were performed to confirm the validity of the asymptotic approximation. Furthermore, we calculated average sample sizes required under the null and alternative hypotheses in the different study designs. It was shown that the application of a group sequential design led to a maximal increase in sample size of 8% under the null hypothesis, compared with the fixed-sample design. This was contrasted by savings of up to 20% in average sample sizes under the alternative hypothesis, depending on the applied design. These savings affect the amounts of genotyping and phenotyping required for a study and therefore lead to a significant decrease in cost and time.
8.
Tan Q Christiansen L Christensen K Bathum L Li S Zhao JH Kruse TA 《Genetical research》2005,86(3):223-231
Haplotype inference has become an important part of human genetic data analysis due to its functional and statistical advantages over the single-locus approach in linkage disequilibrium mapping. Different statistical methods have been proposed for detecting haplotype-disease associations using unphased multi-locus genotype data, ranging from the early simple gene-counting method to recent work using the generalized linear model. However, these methods are either confined to the case-control design or unable to yield unbiased point and interval estimates of haplotype effects. Based on the popular logistic regression model, we present a new approach for haplotype association analysis of human disease traits. Using haplotype-based parameterization, our model infers the effects of specific haplotypes (point estimation) and constructs confidence intervals for the risks of haplotypes (interval estimation). Based on the estimated parameters, the model calculates haplotype frequency conditional on the trait value for both discrete and continuous traits. Moreover, our model provides an overall significance level for the association between the disease trait and a group or all of the haplotypes. Because it uses direct maximization in haplotype estimation, our method also facilitates a computer simulation approach for correcting the significance level of an individual haplotype to adjust for multiple testing. We show, by applying the model to an empirical data set, that our method based on the well-known logistic regression model is a useful tool for haplotype association analysis of human disease traits.
9.
Accurate haplotype inference for multiple linked single-nucleotide polymorphisms using sibship data (total citations: 1; self-citations: 0; citations by others: 1)
Sibships are commonly used in genetic dissection of complex diseases, particularly for late-onset diseases. Haplotype-based association studies have been advocated as powerful tools for fine mapping and positional cloning of complex disease genes. Existing methods for haplotype inference using data from relatives were originally developed for pedigree data. In this study, we proposed a new statistical method for haplotype inference for multiple tightly linked single-nucleotide polymorphisms (SNPs), which is tailored for extensively accumulated sibship data. This new method was implemented via an expectation-maximization (EM) algorithm without the usual assumption of linkage equilibrium among markers. Our EM algorithm does not incur extra computational burden for haplotype inference using sibship data when compared with using unrelated parental data. Furthermore, its computational efficiency is not affected by increasing sibship size. We examined the robustness and statistical performance of our new method in simulated data created from an empirical haplotype data set of human growth hormone gene 1. The utility of our method was illustrated with an application to the analyses of haplotypes of three candidate genes for osteoporosis.
10.
We studied a trend test for genetic association between disease and the number of risk alleles using case-control data. When the data are sampled from families, this trend test can be adjusted to take into account the correlations among family members in complex pedigrees. However, the test depends on the scores based on the underlying genetic model and thus it may have substantial loss of power when the model is misspecified. Since the mode of inheritance will be unknown for complex diseases, we have developed two robust trend tests for case-control studies using family data. These robust tests have relatively good power for a class of possible genetic models. The trend tests and robust trend tests were applied to a dataset of Genetic Analysis Workshop 14 from the Collaborative Study on the Genetics of Alcoholism.
11.
A sibship test for linkage in the presence of association: the sib transmission/disequilibrium test. (total citations: 36; self-citations: 9; citations by others: 36)
Linkage analysis with genetic markers has been successful in the localization of genes for many monogenic human diseases. In studies of complex diseases, however, tests that rely on linkage disequilibrium (the simultaneous presence of linkage and association) are often more powerful than those that rely on linkage alone. This advantage is illustrated by the transmission/disequilibrium test (TDT). The TDT requires data (marker genotypes) for affected individuals and their parents; for some diseases, however, data from parents may be difficult or impossible to obtain. In this article, we describe a method, called the "sib TDT" (or "S-TDT"), that overcomes this problem by use of marker data from unaffected sibs instead of from parents, thus allowing application of the principle of the TDT to sibships without parental data. In a single collection of families, there might be some that can be analyzed only by the TDT and others that are suitable for analysis by the S-TDT. We show how all the data may be used jointly in one overall TDT-type procedure that tests for linkage in the presence of association. These extensions of the TDT will be valuable for the study of diseases of late onset, such as non-insulin-dependent diabetes, cardiovascular diseases, and other diseases associated with aging.
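For orientation, the classic parent-based TDT that this abstract builds on can be sketched in a few lines. The sketch uses the standard McNemar-type statistic for a biallelic marker, not the S-TDT itself, which replaces parental transmissions with comparisons between affected and unaffected sibs. The function name and counts are illustrative assumptions.

```python
import math

def tdt(transmitted, not_transmitted):
    """Classic TDT for one allele of a biallelic marker.

    transmitted: number of heterozygous parents who transmitted the allele
    to an affected child; not_transmitted: number who did not.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    b, c = transmitted, not_transmitted
    stat = (b - c) ** 2 / (b + c)
    # Chi-square (1 df) survival function: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
stat, p_value = tdt(60, 40)
```

Only heterozygous parents contribute, which is what makes the test robust to population stratification.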
12.
A simulated study of historical controls using real data (total citations: 1; self-citations: 0; citations by others: 1)
Data from the first and second National Wilms' Tumor studies were used to simulate how use of the first study's historical controls in the design and analysis of the second study might have influenced the conclusions of the investigations. It was seen that the conclusions from a fully randomized study can differ in substance from those of one using historical controls.
13.
We study a two-stage analysis of genetic association for case-control studies. In the first stage, we compare Hardy-Weinberg disequilibrium coefficients between cases and controls and, in the second stage, we apply the Cochran-Armitage trend test. The two analyses are statistically independent when Hardy-Weinberg equilibrium holds in the population, so all the samples are used in both stages. The significance level in the first stage is adaptively determined based on its conditional power. Given the level in the first stage, the level for the second stage analysis is determined with the overall Type I error being asymptotically controlled. For finite sample sizes, a parametric bootstrap method is used to control the overall Type I error rate. This two-stage analysis is often more powerful than the Cochran-Armitage trend test alone for a large association study. The new approach is applied to SNPs from a real study.
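The two ingredients named in this abstract can be sketched separately: the Hardy-Weinberg disequilibrium coefficient used in the first stage and the Cochran-Armitage trend test used in the second. This is a minimal sketch with additive scores (0, 1, 2) as an assumption. It does not implement the adaptive choice of significance levels or the parametric bootstrap, and the function names and counts are hypothetical.

```python
import math

def hwd_coefficient(n0, n1, n2):
    """Hardy-Weinberg disequilibrium coefficient D = P(AA) - P(A)^2.

    n0, n1, n2: counts of genotypes carrying 0, 1, 2 copies of allele A.
    """
    n = n0 + n1 + n2
    p_aa = n2 / n
    p_a = (2 * n2 + n1) / (2 * n)
    return p_aa - p_a ** 2

def trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test; returns (statistic, p_value), 1 df."""
    r_total, s_total = sum(cases), sum(controls)
    n_total = r_total + s_total
    cols = [r + s for r, s in zip(cases, controls)]
    sum_xr = sum(x * r for x, r in zip(scores, cases))
    sum_xn = sum(x * n for x, n in zip(scores, cols))
    sum_x2n = sum(x * x * n for x, n in zip(scores, cols))
    u = n_total * sum_xr - r_total * sum_xn
    var = (r_total * s_total / n_total) * (n_total * sum_x2n - sum_xn ** 2)
    stat = u * u / var
    # Chi-square (1 df) survival function via the complementary error function.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts ordered by number of risk alleles (0, 1, 2).
cases, controls = (10, 40, 50), (20, 50, 30)
d_cases = hwd_coefficient(*cases)
stat, p_value = trend_test(cases, controls)
```

A first-stage comparison would contrast `hwd_coefficient` between cases and controls before applying `trend_test`.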
14.
Dudbridge F 《Human heredity》2008,66(2):87-98
Missing data occur in genetic association studies for several reasons including missing family members and uncertain haplotype phase. Maximum likelihood is a commonly used approach to accommodate missing data, but it can be difficult to apply to family-based association studies, because of possible loss of robustness to confounding by population stratification. Here a novel likelihood for nuclear families is proposed, in which distinct sets of association parameters are used to model the parental genotypes and the offspring genotypes. This approach is robust to population structure when the data are complete, and has only minor loss of robustness when there are missing data. It also allows a novel conditioning step that gives valid analysis for multiple offspring in the presence of linkage. Unrelated subjects are included by regarding them as the children of two missing parents. Simulations and theory indicate similar operating characteristics to TRANSMIT, but with no bias with missing data in the presence of linkage. In comparison with FBAT and PCPH, the proposed model is slightly less robust to population structure but has greater power to detect strong effects. In comparison to APL and MITDT, the model is more robust to stratification and can accommodate sibships of any size. The methods are implemented for binary and continuous traits in software, UNPHASED, available from the author.
15.
A H Brown 《Biometrics》1975,31(1):145-160
Procedures for estimating the genetic parameters of plant populations frequently employ progeny testing to ascertain the genotype of maternal plants. However, when experimental resources are limited (e.g., electrophoretic markers), the large progeny sizes required for accurate typing severely restrict the number of families that can be tested. In this paper, four experimental designs with partial progeny testing are compared with the standard procedure of complete testing for their statistical efficiency in estimating the gene frequency, fixation index, and outcrossing rate at a single diallelic locus. It is shown that substantial increases in efficiency can be obtained (especially in inbred populations) if one or two individuals per family are assayed, and then further progeny testing is confined to those families which give rise to a heterozygote in this initial screening. Sample sizes for various purposes are computed and factors affecting the applicability of such "censored" designs are discussed.
16.
For assessment of genetic association between single-nucleotide polymorphisms (SNPs) and disease status, the logistic-regression model or generalized linear model is typically employed. However, testing for deviation from Hardy-Weinberg proportion in a patient group could be another approach for genetic-association studies. The Hardy-Weinberg proportion is one of the most important principles in population genetics. Deviation from Hardy-Weinberg proportion among cases (patients) could provide additional evidence for the association between SNPs and diseases. To develop a more powerful statistical test for genetic-association studies, we combined evidence about deviation from Hardy-Weinberg proportion in case subjects and standard regression approaches that use case and control subjects. In this paper, we propose two approaches for combining such information: the mean-based tail-strength measure and the median-based tail-strength measure. These measures integrate logistic regression and Hardy-Weinberg-proportion tests for the study of the association between a binary disease outcome and an SNP on the basis of case- and control-subject data. For both mean-based and median-based tail-strength measures, we derived exact formulas to compute p values. We also developed an approach for obtaining empirical p values with the use of a resampling procedure. Results from simulation studies and real-disease studies demonstrate that the proposed approach is more powerful than the traditional logistic-regression model. The type I error probabilities of our approach were also well controlled.
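The Hardy-Weinberg-proportion test among cases, one of the two components this abstract combines, can be sketched as a standard chi-square goodness-of-fit test. The tail-strength measures themselves are not reproduced here. The function name and counts are illustrative assumptions, and the sketch assumes both alleles are observed so that no expected count is zero.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb: genotype counts at a biallelic SNP.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Chi-square (1 df) survival function via the complementary error function.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical case genotype counts with an excess of homozygotes.
stat, p_value = hwe_chi_square(40, 20, 40)
```

For small samples an exact test is usually preferred over this asymptotic approximation.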
17.
Complex diseases, by definition, involve multiple factors, including gene-gene interactions and gene-environment interactions. Researchers commonly rely on simulated data to evaluate their approaches for detecting high-order interactions in disease gene mapping. A publicly available simulation program to generate samples involving complex genetic and environmental interactions is of great interest to the community. We have developed a software package named gs1.0, which has been widely used since its publication. In this article, we present an upgraded version gs2.0, which not only inherits its capacity to generate realistic genotype data but also provides great functionality and flexibility to simulate various interaction models. In addition to a standalone version, a user-friendly web server (http://cbc.case.edu/gs) has been set up to help users to build complex interaction models. Furthermore, by utilizing three three-locus models as an example, we have shown how realistic model parameters can be chosen in generating simulated data.
18.
Population-based tests of association have used data from either case-control studies or studies based on trios (affected child and parents). Case-control studies are more prone to false-positive results caused by inappropriate controls, which can occur if, for example, there is population admixture or stratification. An advantage of family-based tests is that cases and controls are well matched, but parental data may not always be available, especially for late-onset diseases. Three recent family-based tests of association and linkage utilize unaffected siblings as surrogates for untyped parents. In this paper, we propose an extension of one of these tests. We describe and compare the four tests in the context of a complex disease for both biallelic and multiallelic markers, as well as for sibships of different sizes. We also examine the consequences of having some parental data in the sample.
19.
Garner C 《Human heredity》2006,61(1):22-26
BACKGROUND: The optimal control sample would be ethnically matched and at minimal risk of developing the disease. Alternatively, one could collect random individuals from the population or select individuals to reduce the number of at-risk individuals in the sample. The effect of randomly selected individuals in a control sample on the statistical power and the odds ratio estimate was investigated. METHODS: Case and control genotype distributions were simulated using standard genetic models with an additional term representing the proportion of unidentified cases in the control sample. Power and odds ratio were calculated from the genotype distributions generated under different sampling scenarios using established methods. RESULTS: Random sampling of controls resulted in a loss in power and a reduction in the odds ratio estimate to a degree that is determined by the proportion of random sampling and the prevalence of the disease. Random sampling resulted in a 19% loss in power for a disease having prevalence of 0.20, compared to a control sample that contained no at-risk individuals. Having random controls results in a decrease in the odds ratio estimate. CONCLUSIONS: Investigators planning case-control genetic association studies should be aware of the statistical costs of different ascertainment approaches.
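The attenuation of the odds ratio described in the RESULTS can be illustrated at the allele-frequency level. This is a simplified sketch, not the genotype-based simulation used in the study: it treats the control sample as a mixture of unaffected individuals and a fraction of unidentified cases, which for purely random controls would equal the disease prevalence. All names and numbers are hypothetical.

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)

def observed_allelic_or(p_case, p_unaffected, hidden_case_fraction):
    """Allelic odds ratio when a fraction of controls are unidentified cases.

    p_case, p_unaffected: risk-allele frequencies in cases and in truly
    unaffected individuals; hidden_case_fraction: share of the control
    sample that is actually affected.
    """
    f = hidden_case_fraction
    p_control = (1.0 - f) * p_unaffected + f * p_case
    return odds(p_case) / odds(p_control)

# Hypothetical allele frequencies: 0.30 in cases, 0.20 in unaffected people.
clean_or = observed_allelic_or(0.30, 0.20, 0.0)
random_or = observed_allelic_or(0.30, 0.20, 0.20)  # prevalence 0.20
```

In this example the estimate shrinks toward 1 as the hidden-case fraction grows, which matches the direction of the bias reported in the abstract.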
20.