Similar literature
20 similar articles found.
1.
Genetic case-control association studies often include data on clinical covariates, such as body mass index (BMI), smoking status, or age, that may modify the underlying genetic risk of case or control samples. For example, in type 2 diabetes, odds ratios for established variants estimated from low-BMI cases are larger than those estimated from high-BMI cases. An unanswered question is how to use this information to maximize statistical power in case-control studies that ascertain individuals on the basis of phenotype (case-control ascertainment) or phenotype and clinical covariates (case-control-covariate ascertainment). While current approaches improve power in studies with random ascertainment, they often lose power under case-control ascertainment and fail to capture available power increases under case-control-covariate ascertainment. We show that an informed conditioning approach, based on the liability threshold model with parameters informed by external epidemiological information, fully accounts for disease prevalence and non-random ascertainment of phenotype as well as covariates and provides a substantial increase in power while maintaining a properly controlled false-positive rate. Our method outperforms standard case-control association tests with or without covariates, tests of gene × covariate interaction, and previously proposed tests for dealing with covariates in ascertained data, with especially large improvements in the case of case-control-covariate ascertainment. We investigate empirical case-control studies of type 2 diabetes, prostate cancer, lung cancer, breast cancer, rheumatoid arthritis, age-related macular degeneration, and end-stage kidney disease over a total of 89,726 samples. In these datasets, informed conditioning outperforms logistic regression for 115 of the 157 known associated variants investigated (P-value = 1 × 10⁻⁹). The improvement varied across diseases with a 16% median increase in χ² test statistics and a commensurate increase in power. This suggests that applying our method to existing and future association studies of these diseases may identify novel disease loci.

2.
Polanski A  Kimmel M 《Genetics》2003,165(1):427-436
We present new methodology for calculating sampling distributions of single-nucleotide polymorphism (SNP) frequencies in populations with time-varying size. Our approach is based on deriving analytical expressions for frequencies of SNPs. Analytical expressions allow for computations that are faster and more accurate than Monte Carlo simulations. In contrast to other articles showing analytical formulas for frequencies of SNPs, we derive expressions that contain coefficients that do not explode when the genealogy size increases. We also provide analytical formulas to describe the way in which the ascertainment procedure modifies SNP distributions. Using our methods, we study the power to test the hypothesis of exponential population expansion vs. the hypothesis of evolution with constant population size. We also analyze some of the available SNP data and we compare our results of demographic parameter estimation to those obtained in previous studies in population genetics. The analyzed data seem consistent with the hypothesis of past population growth of modern humans. The analysis of the data also shows a very strong sensitivity of estimated demographic parameters to changes of the model of the ascertainment procedure.

3.
Genomewide association studies are now a widely used approach in the search for loci that affect complex traits. After detection of significant association, estimates of penetrance and allele-frequency parameters for the associated variant indicate the importance of that variant and facilitate the planning of replication studies. However, when these estimates are based on the original data used to detect the variant, the results are affected by an ascertainment bias known as the "winner's curse." The actual genetic effect is typically smaller than its estimate. This overestimation of the genetic effect may cause replication studies to fail because the necessary sample size is underestimated. Here, we present an approach that corrects for the ascertainment bias and generates an estimate of the frequency of a variant and its penetrance parameters. The method produces a point estimate and confidence region for the parameter estimates. We study the performance of this method using simulated data sets and show that it is possible to greatly reduce the bias in the parameter estimates, even when the original association study had low power. The uncertainty of the estimate decreases with increasing sample size, independent of the power of the original test for association. Finally, we show that application of the method to case-control data can improve the design of replication studies considerably.

4.
Despite increased interest in applying single nucleotide polymorphism (SNP) data to questions in natural systems, one unresolved issue is to what extent the ascertainment bias induced during the SNP discovery phase will impact available analysis methods. Although most studies addressing ascertainment bias have focused on human populations, it is not clear whether existing methods will work when applied to other species with more complex demographic histories and more significant levels of population structure. Here we present findings from an empirical approach to exploring the effect of population structure on issues of ascertainment bias in the Eastern Fence Lizard, Sceloporus undulatus. We find that frequency spectra and summary statistics were highly sensitive to SNP discovery strategy, necessitating careful selection of the initial ascertainment panel. Randomly selected ascertainment panels performed as well as ascertainment panels chosen to jointly sample geographic, phenotypic, and genetic diversity. Geographically restricted panels resulted in larger biases. Additionally, we found existing ascertainment bias correction methods, which were not developed for geographically structured data sets, were largely effective at reducing the impact of ascertainment bias. Because bias correction methods performed well even when underlying assumptions were violated, our results suggest tools are currently available to analyze SNP data in structured populations.

5.
Feng R  Zhang H 《Human genetics》2006,119(4):429-435
Most genetic studies recruit high-risk families, and the discoveries are based on non-randomly selected groups. We must consider the consequences of this ascertainment process in order to apply the results of genetic research to the general population. In previous reports, we developed a latent variable model to assess the familial aggregation and inheritability of ordinal-scaled diseases, and found a major gene component of alcoholism after applying the model to the data from the Yale family study of comorbidity of alcoholism and anxiety (YFSCAA). In this report, we examine the ascertainment effects on parameter estimates and correct potential bias in the latent variable model. The simulation studies for various ascertainment schemes suggest that our ascertainment adjustment is necessary and effective. We also find that the estimated effects are relatively unbiased for the particular ascertainment scheme used in the YFSCAA, which assures the validity of our earlier conclusion.

6.
Causal mediation analyses with rank preserving models
We present a linear rank preserving model (RPM) approach for analyzing mediation of a randomized baseline intervention's effect on a univariate follow-up outcome. Unlike standard mediation analyses, our approach does not assume that the mediating factor is also randomly assigned to individuals in addition to the randomized baseline intervention (i.e., sequential ignorability), but does make several structural interaction assumptions that currently are untestable. The G-estimation procedure for the proposed RPM represents an extension of the work on direct effects of randomized intervention effects for survival outcomes by Robins and Greenland (1994, Journal of the American Statistical Association 89, 737-749) and on intervention non-adherence by Ten Have et al. (2004, Journal of the American Statistical Association 99, 8-16). Simulations show good estimation and confidence interval performance by the proposed RPM approach under unmeasured confounding relative to the standard mediation approach, but poor performance under departures from the structural interaction assumptions. The trade-off between these assumptions is evaluated in the context of two suicide/depression intervention studies.

7.
Thornton KR  Jensen JD 《Genetics》2007,175(2):737-750
Rapid typing of genetic variation at many regions of the genome is an efficient way to survey variability in natural populations in an effort to identify segments of the genome that have experienced recent natural selection. Following such a genome scan, individual regions may be chosen for further sequencing and a more detailed analysis of patterns of variability, often to perform a parametric test for selection and to estimate the strength of a recent selective sweep. We show here that not accounting for the ascertainment of loci in such analyses leads to false inference of natural selection when the true model is selective neutrality, because the procedure of choosing unusual loci (in comparison to the rest of the genome-scan data) selects regions of the genome with genealogies similar to those expected under models of recent directional selection. We describe a simple and efficient correction for this ascertainment bias, which restores the false-positive rate to near-nominal levels. For the parameters considered here, we find that obtaining a test with the expected distribution of P-values depends on accurately accounting both for ascertainment of regions and for demography. Finally, we use simulations to explore the utility of relying on outlier loci to detect recent selective sweeps. We find that measures of diversity and of population differentiation are more effective than summaries of the site-frequency spectrum and that sequencing larger regions (2.5 kbp) in genome-scan studies leads to more power to detect recent selective sweeps.

8.
The association of a candidate gene with disease can be efficiently evaluated by a case-control study in which allele frequencies are compared for diseased cases and unaffected controls. However, when the distribution of genotypes in the population deviates from Hardy-Weinberg proportions, the frequency of genotypes, rather than alleles, should be compared by the Armitage test for trend. We present formulas for power and sample size for studies that use Armitage's trend test. The formulas make no assumptions about Hardy-Weinberg equilibrium, but do assume random ascertainment of cases and controls, all of whom are independent of one another. We demonstrate the accuracy of the formulas by simulations.
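For readers unfamiliar with the trend test named in this abstract, the following is a minimal sketch of the standard Cochran-Armitage statistic for a 2 × 3 genotype table. It does not reproduce the power and sample-size formulas of the paper, which the abstract does not state. The function name `armitage_trend_test`, the additive scores (0, 1, 2), and the example counts are illustrative assumptions.

```python
import math


def armitage_trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test on genotype counts.

    cases, controls: counts per genotype (e.g. aa, Aa, AA).
    scores: assumed additive coding of the genotypes (an illustrative choice).
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    if not (len(cases) == len(controls) == len(scores)):
        raise ValueError("cases, controls and scores must have equal length")

    r_total = sum(cases)        # number of cases
    s_total = sum(controls)     # number of controls
    n_total = r_total + s_total
    col_totals = [r + s for r, s in zip(cases, controls)]

    # Trend statistic: T = sum_i t_i * (r_i * S - s_i * R)
    t_stat = sum(t * (r * s_total - s * r_total)
                 for t, r, s in zip(scores, cases, controls))

    # Var(T) = (R * S / N) * (N * sum t_i^2 n_i - (sum t_i n_i)^2)
    sum_t2n = sum(t * t * n for t, n in zip(scores, col_totals))
    sum_tn = sum(t * n for t, n in zip(scores, col_totals))
    variance = (r_total * s_total / n_total) * (n_total * sum_t2n - sum_tn ** 2)
    if variance == 0:
        raise ValueError("statistic undefined: no variation in scores or status")

    chi2 = t_stat ** 2 / variance
    # For 1 df, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts, for illustration only.
chi2_example, p_example = armitage_trend_test((10, 20, 30), (30, 20, 10))
```

Because the statistic uses genotype counts directly, it does not rely on Hardy-Weinberg proportions, which is the property the abstract emphasizes.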

9.
The Cannings and Thompson approach to correcting for ascertainment bias in family studies is extended to settings with multiple ascertainment. The extension is based on maximizing a pseudolikelihood. Two approaches to computing standard errors for the maximum pseudolikelihood estimate are described. One is especially simple to compute, while the other is more generally applicable. Simulation experiments suggest that the standard-error computations can be quite accurate.

10.
We propose a likelihood ratio test to assess whether sampling has been completed in closed population size estimation studies. More precisely, we assess if the expected number of subjects that have never been sampled is below a user-specified threshold. The likelihood ratio test statistic has a nonstandard distribution under the null hypothesis. Critical values can be easily approximated and tabulated, and they do not depend on model specification. We illustrate the test in a simulation study and three real data examples, one of which involves ascertainment bias of amyotrophic lateral sclerosis in Gulf War veterans.

11.
Species-specific differences in microsatellite locus length and ascertainment bias have both been proposed to explain differences in microsatellite variability and length usually observed when loci isolated in one species are used to survey variation in a related species. Here we provide a simple algebraic approach to independently estimate the contributions of true species-specific length differences and ascertainment bias. We apply this approach to a reciprocal-isolation microsatellite study and show contributions of both ascertainment bias and a true longer average microsatellite length in Drosophila melanogaster compared with D. simulans.

12.
The effect of proband designation on segregation analysis
In many family studies, it is often difficult to know exactly how the families were ascertained. Even if known, the circumstances under which the families came to the attention of the study may violate the assumptions of classical ascertainment bias correction. The purpose of this work was to investigate the effect on segregation analysis of violations of the assumptions of the classical ascertainment model. We simulated family data generated under a simple recessive model of inheritance. We then ascertained families under different "scenarios." These scenarios were designed to simulate actual conditions under which families come to the attention of, and then interact with, a clinic or genetic study. We show that how one designates probands, which one must do under the classical ascertainment model, can influence parameter estimation and hypothesis testing. We demonstrate that, in some cases, there may be no "correct" way to designate probands. Further, we show that interactions within the family, the conditions under which the genetic study must function, and even social influences can have a profound effect on segregation analysis. We also propose a method for dealing with the ascertainment problem that is applicable to almost any study situation.

13.
G-estimation of structural nested models (SNMs) plays an important role in estimating the effects of time-varying treatments with appropriate adjustment for time-dependent confounding. As SNMs for a failure time outcome, structural nested accelerated failure time models (SNAFTMs) and structural nested cumulative failure time models have been developed. The latter models are included in the class of structural nested mean models (SNMMs) and do not involve artificial censoring, which induces several difficulties in g-estimation of SNAFTMs. Recently, restricted mean time lost (RMTL), which corresponds to the area under a distribution function up to a restriction time, has been attracting attention in clinical trial communities as an appropriate summary measure of a failure time outcome. In this study, we propose another SNMM for a failure time outcome, called the structural nested RMTL model (SNRMTLM), and describe randomized and observational g-estimation procedures that use different assumptions for the treatment mechanism in a randomized trial setting. We also provide methods to estimate marginal RMTLs under static treatment regimes using estimated SNRMTLMs. A simulation study evaluates finite-sample performances of the proposed methods compared with the conventional intention-to-treat and per-protocol analyses. We illustrate the proposed methods using data from a randomized controlled trial for cardiovascular disease with treatment changes. G-estimation of SNRMTLMs is a useful tool to estimate the effects of time-varying treatments on a failure time outcome.
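The verbal definition of RMTL in this abstract (the area under a distribution function up to a restriction time) can be written out as below. The notation is an assumption made here and is not taken from the paper: T is the failure time, F its distribution function, S = 1 − F the survival function, τ the restriction time, and RMST the restricted mean survival time.

```latex
\mathrm{RMTL}(\tau)
  = \int_0^{\tau} F(t)\,dt
  = E\left[(\tau - T)_+\right]
  = \tau - \mathrm{RMST}(\tau),
\qquad
\mathrm{RMST}(\tau)
  = \int_0^{\tau} S(t)\,dt
  = E\left[\min(T,\tau)\right].
```

The last equality in the first line follows from F = 1 − S, so the two summaries carry the same information for a fixed τ.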

14.
The ascertainment problem arises when families are sampled by a nonrandom process and some assumption about this sampling process must be made in order to estimate genetic parameters. Under classical ascertainment assumptions, estimation of genetic parameters cannot be separated from estimation of the parameters of the ascertainment process, so that any misspecification of the ascertainment process causes biases in estimation of the genetic parameters. Ewens and Shute proposed a resolution to this problem, involving conditioning the likelihood of the sample on the part of the data which is "relevant to ascertainment." The usefulness of this approach can only be assessed by examining the properties (in particular, bias and standard error) of the estimates which arise by using it for a wide range of parameter values and family size distributions and then comparing these biases and standard errors with those arising under classical ascertainment procedures. These comparisons are carried out in the present paper, and we also compare the proposed method with procedures which condition on, or ignore, parts of the data.

15.
Joffe MM  Yang WP  Feldman H 《Biometrics》2012,68(1):275-286
In principle, G-estimation is an attractive approach for dealing with confounding by variables affected by treatment. It has rarely been applied for estimation of the effects of treatment on failure-time outcomes. Part of this is due to artificial censoring, an analytic device which considers some subjects who actually were observed to fail as if they were censored. Artificial censoring leads to a lack of smoothness in the estimating function, which can pose problems in variance estimation and in optimization. It also can lead to failure to have solutions to the usual estimating functions, which then raises questions about the appropriate criteria for optimization. To improve performance of the optimization procedures, we consider approaches for reducing the amount of artificial censoring, propose the substitution of smooth for indicator functions, and propose the use of estimating functions scaled to a measure of the information in the data; we evaluate performance of these approaches using simulation. We also consider appropriate optimization criteria in the presence of information loss due to artificial censoring. We motivate and illustrate our approaches using observational data on the effect of erythropoietin on mortality among subjects on hemodialysis.

16.
Certain human hereditary conditions, notably those with low penetrance and those which require an environmental event such as infectious disease exposure, are difficult to localize in pedigree analysis, because of uncertainty in the phenotype of an affected patient's relatives. An approach to locating these genes in human cohort studies would be to use association analysis, which depends on linkage disequilibrium of flanking polymorphic DNA markers. In theory, a high degree of linkage disequilibrium between genes separated by 10-20 cM will be generated and persist in populations that have a history of recent (3-20 generations ago) admixture between genetically differentiated racial groups, such as has occurred in African Americans and Hispanic populations. We have conducted analytic and computer simulations to quantify the effect of genetic, genomic, and population parameters that affect the amount and ascertainment of linkage disequilibrium in populations with a history of genetic admixture. Our goal is to thoroughly explore the ranges of all relevant parameters or factors (e.g., sample size and degree of genetic differentiation between populations) that may be involved in gene localization studies, in hopes of prescribing guidelines for an efficient mapping strategy. The results provide reasonable limits on sample size (200-300 patients), marker number (200-300 in 20-cM intervals), and allele differentiation (loci with allele frequency difference of ≥ 0.3 between admixed parent populations) to produce an efficient approach (>95% ascertainment) for locating genes not easily tracked in human pedigrees.

17.
For genome-wide association studies in family-based designs, a new, universally applicable approach is proposed. Using a modified Liptak’s method, we combine the p-value of the family-based association test (FBAT) statistic with the p-value for the Van Steen-statistic. The Van Steen-statistic is independent of the FBAT-statistic and utilizes information that is ignored by traditional FBAT-approaches. The new test statistic takes advantage of all available information about the genetic association, while, by virtue of its design, it achieves complete robustness against confounding due to population stratification. The approach is suitable for the analysis of almost any trait type for which FBATs are available, e.g. binary, continuous, time-to-onset, multivariate, etc. The efficiency and the validity of the new approach depend on the specification of a nuisance/tuning parameter and the weight parameters in the modified Liptak’s method. For different trait types and ascertainment conditions, we discuss general guidelines for the optimal specification of the tuning parameter and the weight parameters. Our simulation experiments and an application to an Alzheimer study show the validity and the efficiency of the new method, which achieves power levels that are comparable to those of population-based approaches.
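As background for the combination step described in this abstract, here is a minimal sketch of the standard weighted Liptak (weighted Z-score) method for independent one-sided p-values. It is not the modified version proposed in the paper, whose details the abstract does not give. The function name `liptak_combine` and the example p-values and weights are illustrative assumptions.

```python
import math
from statistics import NormalDist


def liptak_combine(p_values, weights):
    """Combine independent one-sided p-values with the weighted Z-score method.

    Each p-value is mapped to z_i = Phi^{-1}(1 - p_i); the combined statistic
    Z = sum(w_i * z_i) / sqrt(sum(w_i^2)) is standard normal under the null.
    p-values must lie strictly between 0 and 1.
    """
    if len(p_values) != len(weights) or not p_values:
        raise ValueError("need one weight per p-value")

    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in p_values]

    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    z_combined = numerator / denominator

    # One-sided combined p-value.
    return 1.0 - std_normal.cdf(z_combined)


# Hypothetical inputs: two p-values from independent statistics, equal weights.
combined_p = liptak_combine([0.05, 0.05], [1.0, 1.0])
```

The independence of the two statistics, which the abstract states for the FBAT and Van Steen statistics, is what justifies treating the combined Z as standard normal under the null hypothesis.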

18.
McNemar's test is popular for assessing the difference between proportions when two observations are taken on each experimental unit. It is useful under a variety of epidemiological study designs that produce correlated binary outcomes. In studies involving outcome ascertainment, cost or feasibility concerns often lead researchers to employ error-prone surrogate diagnostic tests. Assuming an available gold standard diagnostic method, we address point and confidence interval estimation of the true difference in proportions and the paired-data odds ratio by incorporating external or internal validation data. We distinguish two special cases, depending on whether it is reasonable to assume that the diagnostic test properties remain the same for both assessments (e.g., at baseline and at follow-up). Likelihood-based analysis yields closed-form estimates when validation data are external and requires numeric optimization when they are internal. The latter approach offers important advantages in terms of robustness and efficient odds ratio estimation. We consider internal validation study designs geared toward optimizing efficiency given a fixed cost allocated for measurements. Two motivating examples are presented, using gold standard and surrogate bivariate binary diagnoses of bacterial vaginosis (BV) on women participating in the HIV Epidemiology Research Study (HERS).
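The quantities this abstract starts from can be sketched as follows for a single paired 2 × 2 table: the uncorrected McNemar statistic, the difference in proportions, and the paired-data odds ratio. The sketch assumes error-free diagnoses and does not implement the misclassification adjustment with validation data that the paper proposes. The function name `mcnemar_summary`, the cell labels, and the example counts are illustrative assumptions.

```python
import math


def mcnemar_summary(a, b, c, d):
    """Uncorrected McNemar analysis of paired binary outcomes.

    Cell layout (first assessment x second assessment):
        a: positive / positive    b: positive / negative
        c: negative / positive    d: negative / negative
    Only the discordant cells b and c drive the test.
    """
    n = a + b + c + d
    if b + c == 0:
        raise ValueError("no discordant pairs: the test statistic is undefined")

    chi2 = (b - c) ** 2 / (b + c)               # 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df

    # Difference in marginal proportions: (a + b)/n - (a + c)/n.
    difference = (b - c) / n
    # Conditional (paired-data) odds ratio; undefined when c is zero.
    odds_ratio = b / c if c > 0 else math.inf

    return {
        "chi2": chi2,
        "p_value": p_value,
        "difference": difference,
        "odds_ratio": odds_ratio,
    }


# Hypothetical paired counts, for illustration only.
summary = mcnemar_summary(a=50, b=25, c=15, d=10)
```

With an error-prone surrogate test, the observed b and c are distorted by misclassification, which is the motivation for the validation-data corrections described in the abstract.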

19.
The increasing use of single nucleotide polymorphisms (SNPs) in studies of nonmodel organisms accentuates the need to evaluate the influence of ascertainment bias on accurate ecological or evolutionary inference. Using a panel of 1641 expressed sequence tag-derived SNPs developed for northwest Atlantic cod (Gadus morhua), we examined the influence of ascertainment bias and its potential impact on assignment of individuals to populations ranging widely in origin. We hypothesized that reductions in assignment success would be associated with lower diversity in geographical regions outside the location of ascertainment. Individuals were genotyped from 13 locations spanning much of the contemporary range of Atlantic cod. Diversity, measured as average sample heterozygosity and number of polymorphic loci, declined (c. 30%) from the western (He = 0.36) to eastern (He = 0.25) Atlantic, consistent with a signal of ascertainment bias. Assignment success was examined separately for pools of loci representing differing degrees of reductions in diversity. SNPs displaying the largest declines in diversity produced the most accurate assignment in the ascertainment region (c. 83%) and the lowest levels of correct assignment outside the ascertainment region (c. 31%). Interestingly, several isolated locations showed no effect of assignment bias and consistently displayed 100% correct assignment. Contrary to expectations, estimates of accurate assignment range-wide using all loci displayed remarkable similarity despite reductions in diversity. Our results support the use of large SNP panels in assignment studies of high geneflow marine species. However, our evidence of significant reductions in assignment success using some pools of loci suggests that ascertainment bias may influence assignment results and should be evaluated in large-scale assignment studies.

20.
An evolutionary analysis method for determining orthologous relationships
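How to correctly determine orthologous and paralogous relationships between genes remains a key problem, still awaiting a better solution, in functional genome annotation and comparative genomics. In previous work, we used an evolutionary analysis approach to determine ortholog/paralog relationships within multigene families; here we develop the evolutionary analysis method for determining orthologous relationships in full. Case studies of 44 homologous protein families show that, compared with the popular COG method (Clusters of Orthologous Groups of proteins), this method can determine orthologous relationships in general and can accurately annotate the molecular functions of genomes.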
