Similar literature
20 similar articles found (search time: 31 ms)
1.
The ascertainment problem arises when families are sampled by a nonrandom process and some assumption about this sampling process must be made in order to estimate genetic parameters. Under classical ascertainment assumptions, estimation of genetic parameters cannot be separated from estimation of the parameters of the ascertainment process, so that any misspecification of the ascertainment process causes biases in estimation of the genetic parameters. Ewens and Shute proposed a resolution to this problem, involving conditioning the likelihood of the sample on the part of the data which is "relevant to ascertainment." The usefulness of this approach can only be assessed by examining the properties (in particular, bias and standard error) of the estimates which arise by using it for a wide range of parameter values and family size distributions and then comparing these biases and standard errors with those arising under classical ascertainment procedures. These comparisons are carried out in the present paper, and we also compare the proposed method with procedures which condition on, or ignore, parts of the data.
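
As a rough illustration of why ascertainment must be modeled, the sketch below computes the large-sample value of the uncorrected segregation-ratio estimate when only sibships with at least one affected child enter the sample. It assumes complete (truncate) ascertainment, a fixed sibship size, and a simple binomial segregation model; the function name is hypothetical, and this is a textbook calculation, not the Ewens and Shute procedure.

```python
def naive_segregation_ratio(p: float, s: int) -> float:
    """Large-sample value of (affected sibs) / (all sibs) when only sibships
    of size s with at least one affected child are sampled.

    Assumes each child is affected independently with probability p
    (illustrative binomial model; names here are hypothetical).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if s < 1:
        raise ValueError("s must be a positive integer")
    # E[r | r >= 1] / s, where r ~ Binomial(s, p), simplifies to
    # p / (1 - (1 - p)^s).
    return p / (1.0 - (1.0 - p) ** s)


if __name__ == "__main__":
    true_p = 0.25  # e.g. a recessive trait in carrier-by-carrier matings
    for size in (2, 3, 5, 10):
        print(size, round(naive_segregation_ratio(true_p, size), 4))
```

Under these assumptions the uncorrected estimate always exceeds the true segregation ratio, and the inflation is largest for small sibships.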

2.
Cannings and Thompson suggested conditioning on the phenotypes of the probands to correct for ascertainment in the analysis of pedigree data. The method assumes single ascertainment and can be expected to yield asymptotically biased parameter estimates except in this specific case. However, because the method is easy to apply, we investigated the degree of bias in the more typical situation of multiple ascertainment, in the hope that the bias might be small and that the method could be applied more generally. To explore the utility of conditioning on probands to correct for multiple ascertainment, we calculated the asymptotic value of the segregation ratio for two versions of the simple Mendelian segregation model on sibship data. For both versions, we found that this asymptotic value decreased approximately linearly as the ascertainment probability increased. When ascertainment was complete, the segregation-ratio estimates were zero, not just asymptotically but for finite sample size as well. In some cases, conditioning on probands actually resulted in greater parameter bias than no ascertainment correction at all. These results hold for a variety of sibship-size distributions, several modes of inheritance, and a wide range of population prevalences of affected individuals.

3.
Ascertainment-adjusted parameter estimates from a genetic analysis are typically assumed to reflect the parameter values in the original population from which the ascertained data were collected. Burton et al. (2000) recently showed that, given unmodeled parameter heterogeneity, the standard ascertainment adjustment leads to biased parameter estimates of the population-based values. This finding has important implications in complex genetic studies, because of the potential existence of unmodeled genetic parameter heterogeneity. The authors further stated the important point that, given unmodeled heterogeneity, the ascertainment-adjusted parameter estimates reflect the true parameter values in the ascertained subpopulation. They illustrated these statements with two examples. By revisiting these examples, we demonstrate that if the ascertainment scheme and the nature of the data can be correctly modeled, then an ascertainment-adjusted analysis returns population-based parameter estimates. We further demonstrate that if the ascertainment scheme and data cannot be modeled properly, then the resulting ascertainment-adjusted analysis produces parameter estimates that generally do not reflect the true values in either the original population or the ascertained subpopulation.

4.
Wu LY, Sun L, Bull SB. Human Heredity. 2006;62(2):84-96
BACKGROUND/AIMS: In genome-wide linkage analysis of quantitative trait loci (QTL), locus-specific heritability estimates are biased when the original data are used to both localize linkage and estimate effects, due to maximization of the LOD score over the genome. Positive bias is increased by adoption of stringent significance levels to control genome-wide type I error. We propose multi-locus bootstrap resampling estimators for bias reduction in the situation in which linkage peaks at more than one QTL are of interest. METHODS: Bootstrap estimates were based on repeated sample splitting in the original dataset. We conducted simulation studies in nuclear families with 0 to 5 QTLs and applied the methods in a genome-wide analysis of a blood pressure phenotype in extended pedigrees from the Framingham Heart Study (FHS). RESULTS: Compared to naive estimates in the original simulation samples, bootstrap estimates had reduced bias and smaller mean squared error. In the FHS pedigrees, the bootstrap yielded heritability estimates as much as 70% smaller than in the original sample. CONCLUSIONS: Because effect estimates obtained in an initial study are typically inflated relative to those expected in an independent replication study, successful replication will be more likely when sample size requirements are based on bias-reduced estimates.

5.
Genome-wide association studies (GWAS) provide an important approach to identifying common genetic variants that predispose to human disease. A typical GWAS may genotype hundreds of thousands of single nucleotide polymorphisms (SNPs) located throughout the human genome in a set of cases and controls. Logistic regression is often used to test for association between a SNP genotype and case versus control status, with corresponding odds ratios (ORs) typically reported only for those SNPs meeting selection criteria. However, when these estimates are based on the original data used to detect the variant, the results are affected by a selection bias sometimes referred to as the "winner's curse" (Capen and others, 1971). The actual genetic association is typically overestimated. We show that such selection bias may be severe in the sense that the conditional expectation of the standard OR estimator may be quite far away from the underlying parameter. Also, standard confidence intervals (CIs) may have coverage far from the desired rate for the selected ORs. We propose and evaluate 3 bias-reduced estimators, and also corresponding weighted estimators that combine corrected and uncorrected estimators, to reduce selection bias. Their corresponding CIs are also proposed. We study the performance of these estimators using simulated data sets and show that they reduce the bias and give CI coverage close to the desired level under various scenarios, even for associations having only small statistical power.
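
The selection bias described above can be made concrete with a short calculation. The sketch below assumes the standardized estimate Z = (estimated log OR) / SE is normally distributed with mean `mu` and unit variance, and that a SNP is reported only when |Z| exceeds a threshold `c`; it then returns the conditional expectation of Z given selection. The normal approximation, the function names, and the parameters are illustrative assumptions, not the estimators proposed in the abstract.

```python
import math


def _normal_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _normal_upper_tail(x: float) -> float:
    """P(Z > x) for a standard normal Z, computed via erfc for accuracy."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def conditional_mean_z(mu: float, c: float) -> float:
    """E[Z | |Z| > c] for Z ~ N(mu, 1), with c > 0 (two-sided selection)."""
    if c <= 0.0:
        raise ValueError("c must be positive")
    p_upper = _normal_upper_tail(c - mu)   # P(Z > c)
    p_lower = _normal_upper_tail(c + mu)   # P(Z < -c)
    p_select = p_upper + p_lower
    # E[Z; Z > c] + E[Z; Z < -c] = mu * p_select + pdf(c - mu) - pdf(c + mu)
    numerator = mu * p_select + _normal_pdf(c - mu) - _normal_pdf(c + mu)
    return numerator / p_select


if __name__ == "__main__":
    for mu in (1.0, 3.0, 6.0, 10.0):
        print(mu, round(conditional_mean_z(mu, c=5.0), 3))
```

In this simplified setting the conditional mean is far above `mu` when power is low and approaches `mu` as power approaches 1, which matches the qualitative behavior the abstract describes.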

6.
We present a new method for simulating samples of marker haplotypes, genotypes, or diplotypes in case-control studies in which the markers are linked to a disease locus in any specified region of the genome. The method allows realistic features to be incorporated into the simulations, including selection acting on disease alleles, sample ascertainment of disease chromosomes and polymorphic markers, a genetic dominance model of disease expression that allows incomplete penetrance and phenocopies, and an accurate genetic map of recombination rates and hotspots for recombination in the human genome (or, alternatively, an improved method for simulating the distribution of hotspots). The new method uses an approach that combines simulation of the coalescent process for the sampled chromosomes with a diffusion process used to model the evolution of the disease-mutation frequency over time. Examples illustrate how the method may be used to study the expected power of a marker-disease association study.

7.
Next-generation sequencing has led to many complex-trait rare-variant (RV) association studies. Although single-variant association analysis can be performed, it is grossly underpowered. Therefore, researchers have developed many RV association tests that aggregate multiple variant sites across a genetic region (e.g., gene), and test for the association between the trait and the aggregated genotype. After these aggregate tests detect an association, it is only possible to estimate the average genetic effect for a group of RVs. As a result of the "winner’s curse," such an estimate can be biased. Although for common variants one can obtain unbiased estimates of genetic parameters by analyzing a replication sample, for RVs it is desirable to obtain unbiased genetic estimates for the study where the association is identified. This is because there can be substantial heterogeneity of RV sites and frequencies even among closely related populations. In order to obtain an unbiased estimate for aggregated RV analysis, we developed bootstrap-sample-split algorithms to reduce the bias of the winner’s curse. The unbiased estimates are greatly important for understanding the population-specific contribution of RVs to the heritability of complex traits. We also demonstrate both theoretically and via simulations that for aggregate RV analysis the genetic variance for a gene or region will always be underestimated, sometimes substantially, because of the presence of noncausal variants or because of the presence of causal variants with effects of different magnitudes or directions. Therefore, even if RVs play a major role in the complex-trait etiologies, a portion of the heritability will remain missing, and the contribution of RVs to the complex-trait etiologies will be underestimated.

8.
We tested the power of a segregation analysis method (first proposed by Elandt-Johnson) to distinguish between single-locus and two-locus models, with and without environmentally caused reduced penetrance. We also looked at the effect of ascertainment probability on the analysis and at the proband-conditioned ascertainment correction proposed by Cannings and Thompson. We found that: (1) the segregation analysis has sufficient power to distinguish between the fully-penetrant double-recessive (RR) model and the fully-penetrant single-locus dominant and recessive models; (2) the method can also distinguish fairly well between the dominant-recessive (DR) and RR models, even when one does not take into account the population prevalence; (3) the method has much less power to distinguish between the fully-penetrant RR model and the single-locus models with reduced penetrance; (4) when environmental penetrance is taken into account in the analysis, the power of the method to distinguish between the one- and two-locus models improves substantially; (5) the estimates of ascertainment probability, pi, were robust, regardless of the model under which the data were generated; and (6) the Cannings-Thompson approach to ascertainment correction worked well only when the pi used to generate the data was less than 0.1.

9.
In population-based case-control association studies, the regular chi-squared test is often used to investigate association between a candidate locus and disease. However, it is well known that this test may be biased in the presence of population stratification and/or genotyping error. Unlike some other biases, this bias will not go away with increasing sample size. On the contrary, the false-positive rate will be much larger when the sample size is increased. The usual family-based designs are robust against population stratification, but they are sensitive to genotyping error. In this article, we propose a novel method of simultaneously correcting for the bias arising from population stratification and/or for the genotyping error in case-control studies. The appropriate corrections depend on sample odds ratios of the standard 2x3 tables of genotype by case and control from null loci. Therefore, the test is simple to apply. The corrected test is robust against misspecification of the genetic model. If the null hypothesis of no association is rejected, the corrections can be further used to estimate the effect of the genetic factor. We conducted a simulation study to investigate the performance of the new method, using parameter values similar to those found in real-data examples. The results show that the corrected test approximately maintains the expected type I error rate under various simulation conditions. It also improves the power of the association test in the presence of population stratification and/or genotyping error. The discrepancy in power between the tests with correction and those without correction tends to be more extreme as the magnitude of the bias becomes larger. Therefore, the bias-correction method proposed in this article should be useful for the genetic analysis of complex traits.

10.
The detrimental effects of the winner’s curse, including overestimation of the genetic effects of associated variants and underestimation of sufficient sample sizes for replication studies, are well-recognized in genome-wide association studies (GWAS). These effects can be expected to worsen as the field moves from GWAS into whole genome sequencing. To date, few studies have reported statistical adjustments to the naive estimates, due to the lack of suitable statistical methods and computational tools. We have developed an efficient genome-wide non-parametric method that explicitly accounts for the threshold, ranking, and allele frequency effects in whole genome scans. Here, we implement the method to provide bias-reduced estimates via bootstrap re-sampling (BR-squared) for association studies of both disease status and quantitative traits, and we report the results of applying BR-squared to GWAS of psoriasis and HbA1c. We observed over 50% reduction in the genetic effect size estimation for many associated SNPs. This translates into a greater than fourfold increase in sample size requirements for successful replication studies, which in part explains some of the apparent failures in replicating the original signals. Our analysis suggests that adjusting for the winner’s curse is critical for interpreting findings from whole genome scans and planning replication and meta-GWAS studies, as well as in attempts to translate findings into the clinical setting.

11.
Feng R, Zhang H. Human Genetics. 2006;119(4):429-435
Most genetic studies recruit high-risk families, and the discoveries are based on non-randomly selected groups. We must consider the consequences of this ascertainment process in order to apply the results of genetic research to the general population. In previous reports, we developed a latent variable model to assess the familial aggregation and inheritability of ordinal-scaled diseases, and found a major gene component of alcoholism after applying the model to the data from the Yale family study of comorbidity of alcoholism and anxiety (YFSCAA). In this report, we examine the ascertainment effects on parameter estimates and correct potential bias in the latent variable model. The simulation studies for various ascertainment schemes suggest that our ascertainment adjustment is necessary and effective. We also find that the estimated effects are relatively unbiased for the particular ascertainment scheme used in the YFSCAA, which assures the validity of our earlier conclusion.

12.
A mediation model explores the direct and indirect effects between an independent variable and a dependent variable by including other variables (or mediators). Mediation analysis has recently been used to dissect the direct and indirect effects of genetic variants on complex diseases using case-control studies. However, bias could arise in the estimations of the genetic variant-mediator association because the presence or absence of the mediator in the study samples is not sampled following the principles of case-control study design. In this case, the mediation analysis using data from case-control studies might lead to biased estimates of coefficients and indirect effects. In this article, we investigated a multiple-mediation model involving a three-path mediating effect through two mediators using case-control study data. We propose an approach to correct bias in coefficients and provide accurate estimates of the specific indirect effects. Our approach can also be used when the original case-control study is frequency matched on one of the mediators. We employed bootstrapping to assess the significance of indirect effects. We conducted simulation studies to investigate the performance of the proposed approach, and showed that it provides more accurate estimates of the indirect effects as well as the percent mediated than standard regressions. We then applied this approach to study the mediating effects of both smoking and chronic obstructive pulmonary disease (COPD) on the association between the CHRNA5-A3 gene locus and lung cancer risk using data from a lung cancer case-control study. The results showed that the genetic variant influences lung cancer risk indirectly through all three different pathways. The percent of genetic association mediated was 18.3% through smoking alone, 30.2% through COPD alone, and 20.6% through the path including both smoking and COPD, and the total genetic variant-lung cancer association explained by the two mediators was 69.1%.  

13.
Accurate estimates of the penetrance rate of autosomal dominant conditions are important, among other issues, for optimizing recurrence risks in genetic counseling. The present work on penetrance rate estimation from pedigree data considers the following situations: 1) estimation of the penetrance rate K (brief review of the method); 2) construction of exact credible intervals for K estimates; 3) specificity and heterogeneity issues; 4) penetrance rate estimates obtained through molecular testing of families; 5) lack of information about the phenotype of the pedigree generator; 6) genealogies containing grouped parent-offspring information; 7) ascertainment issues responsible for the inflation of K estimates.

14.
A method of historical inference that accounts for ascertainment bias is developed and applied to single-nucleotide polymorphism (SNP) data in humans. The data consist of 84 short fragments of the genome that were selected, from three recent SNP surveys, to contain at least two polymorphisms in their respective ascertainment samples and that were then fully resequenced in 47 globally distributed individuals. Ascertainment bias is the deviation, from what would be observed in a random sample, caused either by discovery of polymorphisms in small samples or by locus selection based on levels or patterns of polymorphism. The three SNP surveys from which the present data were derived differ both in their protocols for ascertainment and in the size of the samples used for discovery. We implemented a Monte Carlo maximum-likelihood method to fit a subdivided-population model that includes a possible change in effective size at some time in the past. Incorrectly assuming that ascertainment bias does not exist causes errors in inference, affecting both estimates of migration rates and historical changes in size. Migration rates are overestimated when ascertainment bias is ignored. However, the direction of error in inferences about changes in effective population size (whether the population is inferred to be shrinking or growing) depends on whether either the numbers of SNPs per fragment or the SNP-allele frequencies are analyzed. We use the abbreviation "SDL," for "SNP-discovered locus," in recognition of the genomic-discovery context of SNPs. When ascertainment bias is modeled fully, both the number of SNPs per SDL and their allele frequencies support a scenario of growth in effective size in the context of a subdivided population. If subdivision is ignored, however, the hypothesis of constant effective population size cannot be rejected. 
An important conclusion of this work is that, in demographic or other studies, SNP data are useful only to the extent that their ascertainment can be modeled.

15.
Inferring the parentage of a sample of individuals is often a prerequisite for many types of analysis in molecular ecology, evolutionary biology and quantitative genetics. In all but a few cases, the method of parentage assignment is divorced from the methods used to estimate the parameters of primary interest, such as mate choice or heritability. Here we present a Bayesian approach that simultaneously estimates the parentage of a sample of individuals and a wide range of population-level parameters in which we are interested. We show that joint estimation of parentage and population-level parameters increases the power of parentage assignment, reduces bias in parameter estimation, and accurately evaluates uncertainty in both. We illustrate the method by analysing a number of simulated test data sets, and through a re-analysis of parentage in the Seychelles warbler, Acrocephalus sechellensis. A combination of behavioural, spatial and genetic data is used in the analyses and, importantly, the method does not require strong prior information about the relationship between nongenetic data and parentage.

16.
Murphy A, Weiss ST, Lange C. PLoS Genetics. 2008;4(9):e1000197
For genome-wide association studies in family-based designs, we propose a powerful two-stage testing strategy that can be applied in situations in which parent-offspring trio data are available and all offspring are affected with the trait or disease under study. In the first step of the testing strategy, we construct estimators of genetic effect size in the completely ascertained sample of affected offspring and their parents that are statistically independent of the family-based association/transmission disequilibrium tests (FBATs/TDTs) that are calculated in the second step of the testing strategy. For each marker, the genetic effect is estimated (without requiring an estimate of the SNP allele frequency) and the conditional power of the corresponding FBAT/TDT is computed. Based on the power estimates, a weighted Bonferroni procedure assigns an individually adjusted significance level to each SNP. In the second stage, the SNPs are tested with the FBAT/TDT statistic at the individually adjusted significance levels. Using simulation studies for scenarios with up to 1,000,000 SNPs, varying allele frequencies and genetic effect sizes, the power of the strategy is compared with standard methodology (e.g., FBATs/TDTs with Bonferroni correction). In all considered situations, the proposed testing strategy demonstrates substantial power increases over the standard approach, even when the true genetic model is unknown and must be selected based on the conditional power estimates. The practical relevance of our methodology is illustrated by an application to a genome-wide association study for childhood asthma, in which we detect two markers meeting genome-wide significance that would not have been detected using standard methodology.

17.
Certain human hereditary conditions, notably those with low penetrance and those which require an environmental event such as infectious disease exposure, are difficult to localize in pedigree analysis, because of uncertainty in the phenotype of an affected patient's relatives. An approach to locating these genes in human cohort studies would be to use association analysis, which depends on linkage disequilibrium of flanking polymorphic DNA markers. In theory, a high degree of linkage disequilibrium between genes separated by 10-20 cM will be generated and persist in populations that have a history of recent (3-20 generations ago) admixture between genetically differentiated racial groups, such as has occurred in African Americans and Hispanic populations. We have conducted analytic and computer simulations to quantify the effect of genetic, genomic, and population parameters that affect the amount and ascertainment of linkage disequilibrium in populations with a history of genetic admixture. Our goal is to thoroughly explore the ranges of all relevant parameters or factors (e.g., sample size and degree of genetic differentiation between populations) that may be involved in gene localization studies, in hopes of prescribing guidelines for an efficient mapping strategy. The results provide reasonable limits on sample size (200-300 patients), marker number (200-300 in 20-cM intervals), and allele differentiation (loci with allele frequency difference of ≥ 0.3 between admixed parent populations) to produce an efficient approach (> 95% ascertainment) for locating genes not easily tracked in human pedigrees.

18.
We describe three statistical results that we have found to be useful in case-control genetic association testing. All three involve combining the discovery of novel genetic variants, usually by sequencing, with genotyping methods that recognize previously discovered variants. We first consider expanding the list of known variants by concentrating variant-discovery in cases. Although the naive inclusion of cases-only sequencing data would create a bias, we show that some sequencing data may be retained, even if controls are not sequenced. Furthermore, for alleles of intermediate frequency, cases-only sequencing with bias-correction entails little if any loss of power, compared to dividing the same sequencing effort among cases and controls. Secondly, we investigate more strongly focused variant discovery to obtain a greater enrichment for disease-related variants. We show how case status, family history, and marker sharing enrich the discovery set by increments that are multiplicative with penetrance, enabling the preferential discovery of high-penetrance variants. A third result applies when sequencing is the primary means of counting alleles in both cases and controls, but a supplementary pooled genotyping sample is used to identify the variants that are very rare. We show that this raises no validity issues, and we evaluate a less expensive and more adaptive approach to judging rarity, based on group-specific variants. We demonstrate the important and unusual caveat that this method requires equal sample sizes for validity. These three results can be used to more efficiently detect the association of rare genetic variants with disease.

19.
Population genetic studies provide insights into the evolutionary processes that influence the distribution of sequence variants within and among wild populations. FST is among the most widely used measures for genetic differentiation and plays a central role in ecological and evolutionary genetic studies. It is commonly thought that large sample sizes are required in order to precisely infer FST and that small sample sizes lead to overestimation of genetic differentiation. Until recently, studies in ecological model organisms incorporated a limited number of genetic markers, but since the emergence of next generation sequencing, the panel size of genetic markers available even in non-reference organisms has rapidly increased. In this study we examine whether a large number of genetic markers can substitute for small sample sizes when estimating FST. We tested the behavior of three different estimators that infer FST and that are commonly used in population genetic studies. By simulating populations, we assessed the effects of sample size and the number of markers on the various estimates of genetic differentiation. Furthermore, we tested the effect of ascertainment bias on these estimates. We show that the population sample size can be significantly reduced (as small as n = 4–6) when using an appropriate estimator and a large number of bi-allelic genetic markers (k > 1,000). Therefore, conservation genetic studies can now obtain almost the same statistical power as studies performed on model organisms using markers developed with next-generation sequencing.
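
The abstract does not name its three estimators, so the following is only a sketch of one widely used choice, a Hudson-style estimator with a small-sample correction, combined across bi-allelic markers as a ratio of sums. The function name and argument layout are hypothetical; `n1` and `n2` are assumed to be the numbers of sampled allele copies per population (twice the number of diploid individuals).

```python
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float],
               n1: int, n2: int) -> float:
    """Hudson-style FST across bi-allelic markers (ratio of sums).

    p1, p2: sample allele frequencies per marker in populations 1 and 2.
    n1, n2: sampled allele copies per population (assumed equal across
            markers; an illustrative simplification).
    """
    if len(p1) != len(p2) or len(p1) == 0:
        raise ValueError("p1 and p2 must be non-empty and of equal length")
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two allele copies per population")
    numerator = 0.0
    denominator = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference minus within-sample sampling variance.
        numerator += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that two alleles drawn from different populations differ.
        denominator += a * (1 - b) + b * (1 - a)
    if denominator == 0.0:
        raise ValueError("no between-population variation at these markers")
    return numerator / denominator


if __name__ == "__main__":
    # Five diploid individuals per population -> 10 allele copies each.
    print(hudson_fst([0.9, 0.2, 0.5], [0.1, 0.3, 0.5], n1=10, n2=10))
```

Summing numerators and denominators over markers before dividing is what lets many markers compensate for few individuals: per-marker noise averages out even when each frequency is estimated from a handful of samples.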

20.
Jiang Wei, Yu Weichuan. BMC Genomics. 2016;17(1):19-32
Background

A replication study is a commonly used verification method to filter out false positives in genome-wide association studies (GWAS). If an association can be confirmed in a replication study, it is very likely to be a true positive. To design a replication study, traditional approaches calculate power by treating the replication study as another independent primary study. These approaches do not use the information given by the primary study. Besides, they need to specify a minimum detectable effect size, which may be subjective. One might think of replacing the minimum effect size with the observed effect sizes in the power calculation. However, this approach will make the designed replication study underpowered, since we are only interested in the positive associations from the primary study and the problem of the “winner’s curse” will occur.

Results

An Empirical Bayes (EB) based method is proposed to estimate the power of a replication study for each association. The corresponding credible interval is also estimated in the proposed approach. Simulation experiments show that our method is better than other plug-in based estimators in terms of overcoming the winner’s curse and providing higher estimation accuracy. The coverage probability of the resulting credible interval is well-calibrated in the simulation experiments. A weighted average method is used to estimate the average power of all underlying true associations, which in turn is used to determine the sample size of the replication study. Sample sizes are estimated for 6 diseases from the Wellcome Trust Case Control Consortium (WTCCC) using our method. They are higher than the sample sizes estimated by plugging observed effect sizes into the power calculation.

Conclusions

Our new method can objectively determine replication study’s sample size by using information extracted from primary study. Also the winner’s curse is alleviated. Thus, it is a better choice when designing replication studies of GWAS. The R-package is available at: http://bioinformatics.ust.hk/RPower.html.

