Similar literature
Found 20 similar articles (search time: 46 ms)
1.
It is commonly assumed that the parameter estimates of a statistical genetics model that has been adjusted for ascertainment will estimate parameters in the general population from which the ascertained subpopulation was originally drawn. We show that this is true only in certain restricted circumstances. More generally, ascertainment-adjusted parameter estimates reflect parameters in the ascertained subpopulation. In many situations, this shift in perspective is immaterial: the parameters of interest are the same in the ascertained sample and in the population from which it was drawn, and it is therefore irrelevant to which population inferences are presumed to apply. In other circumstances, however, this is not so. This has important implications, particularly for studies investigating the etiology of complex diseases.

2.
We revisit the usual conditional likelihood for stratum-matched case-control studies and consider three alternatives that may be more appropriate for family-based gene-characterization studies: first, the prospective likelihood, that is, Pr(D | G, A); second, the retrospective likelihood, Pr(G | D); and third, the ascertainment-corrected joint likelihood, Pr(D, G | A). These likelihoods provide unbiased estimators of genetic relative risk parameters, as well as population allele frequencies and baseline risks. The parameter estimates based on the retrospective likelihood remain unbiased even when the ascertainment scheme cannot be modeled, as long as ascertainment only depends on families' phenotypes. Despite the need to estimate additional parameters, the prospective, retrospective, and joint likelihoods can lead to considerable gains in efficiency, relative to the conditional likelihood, when estimating genetic relative risk. This is true if baseline risks and allele frequencies can be assumed to be homogeneous. In the presence of heterogeneity, however, the parameter estimates assuming homogeneity can be seriously biased. We discuss the extent of this problem and present a mixed models approach for providing consistent parameter estimates when baseline risks and allele frequencies are heterogeneous. The efficiency gains of the mixed-model prospective, retrospective, and joint likelihoods relative to the efficiency of conditional likelihood are small in the situations presented here.

3.
On Ewens' equivalence theorem for ascertainment sampling schemes   Cited by: 1 (self-citations: 1, citations by others: 0)
The usual likelihood formulations for segregation analysis of a genetic trait ignore both the at-risk but unobservable families and the demographic structure of the surrounding population. Families are not ascertained if, by chance, they have no affected members or if the affected members are not ascertained. Ewens has shown that likelihoods which take into explicit account both unobservable families and demographic parameters lead to the same maximum likelihood estimates of segregation and ascertainment parameters as the usual likelihoods. This paper provides an alternative proof of Ewens' theorem based on the Poisson distribution and simple continuous optimization techniques.

4.
Feng R, Zhang H. Human Genetics 2006, 119(4):429-435
Most genetic studies recruit high-risk families, and the discoveries are based on non-randomly selected groups. We must consider the consequences of this ascertainment process in order to apply the results of genetic research to the general population. In previous reports, we developed a latent variable model to assess the familial aggregation and inheritability of ordinal-scaled diseases, and found a major gene component of alcoholism after applying the model to the data from the Yale family study of comorbidity of alcoholism and anxiety (YFSCAA). In this report, we examine the ascertainment effects on parameter estimates and correct potential bias in the latent variable model. The simulation studies for various ascertainment schemes suggest that our ascertainment adjustment is necessary and effective. We also find that the estimated effects are relatively unbiased for the particular ascertainment scheme used in the YFSCAA, which assures the validity of our earlier conclusion.

5.
OBJECTIVES: The Admixture test is routinely used in linkage analysis to take account of genetic heterogeneity, and yields an estimate of the proportion of families (alpha) segregating the linked disease gene. In complex disorders, the assumptions of the Admixture test are violated. We therefore explore how the estimate of alpha relates to the true proportion of linked families with a complex disorder in a population or dataset. METHODS: We simulated a two-locus heterogeneity model and varied genetic parameters, ascertainment scheme and phenocopy frequency. RESULTS: In this model, alpha is almost always overestimated, by as little as 5% to as much as 60%. The bias is largely attributable to (1) intrafamilial heterogeneity arising from ascertainment of families with many affected members or from analysis of dense pedigrees; (2) low informativeness, which occurs in the presence of reduced penetrance; and (3) differences in the evidence for linkage in linked and unlinked families. This bias is also affected by the analysis phenocopy frequency, but only if the linked locus is dominant and the unlinked locus is recessive. CONCLUSIONS: We conclude that, in complex diseases, the Admixture test has greater value in detecting linkage than in estimating the proportion of linked families in a dataset.
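As an illustration of how alpha is estimated in an admixture analysis, the sketch below uses the commonly quoted heterogeneity LOD form, HLOD(alpha) = sum over families of log10(alpha * 10^LOD_i + 1 - alpha), evaluated at one fixed recombination fraction. This is a minimal sketch and not the simulation design of the abstract; the function names and the per-family LOD scores are hypothetical.

```python
import math

def hlod(family_lods, alpha):
    """Heterogeneity LOD for a given alpha at one fixed recombination fraction.

    family_lods: per-family LOD scores (hypothetical inputs).
    alpha: assumed proportion of linked families, between 0 and 1.
    """
    return sum(math.log10(alpha * 10 ** lod + (1.0 - alpha)) for lod in family_lods)

def maximize_alpha(family_lods, steps=1000):
    """Grid search over alpha in [0, 1]; returns (alpha_hat, HLOD at alpha_hat)."""
    best_alpha, best_value = 0.0, 0.0  # HLOD is exactly 0 at alpha = 0
    for k in range(steps + 1):
        alpha = k / steps
        value = hlod(family_lods, alpha)
        if value > best_value:
            best_alpha, best_value = alpha, value
    return best_alpha, best_value

# Hypothetical per-family LOD scores: three families support linkage, three do not.
example_lods = [1.2, 0.8, -0.9, 1.5, -1.1, -0.7]
alpha_hat, max_hlod = maximize_alpha(example_lods)
```

At alpha = 1 the HLOD reduces to the ordinary summed LOD score, so the maximized HLOD can never fall below the larger of zero and the homogeneity LOD.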

6.

Background

In genetic studies of rare complex diseases it is common to ascertain familial data from population based registries through all incident cases diagnosed during a pre-defined enrollment period. Such an ascertainment procedure is typically taken into account in the statistical analysis of the familial data by constructing either a retrospective or prospective likelihood expression, which conditions on the ascertainment event. Both of these approaches lead to a substantial loss of valuable data.

Methodology and Findings

Here we consider instead the possibilities provided by a Bayesian approach to risk analysis, which also incorporates the ascertainment procedure and reference information concerning the genetic composition of the target population into the considered statistical model. Furthermore, the proposed Bayesian hierarchical survival model does not require the considered genotype or haplotype effects to be expressed as functions of corresponding allelic effects. Our modeling strategy is illustrated by a risk analysis of type 1 diabetes mellitus (T1D) in the Finnish population, based on the HLA-A, HLA-B and DRB1 human leucocyte antigen (HLA) information available for both ascertained sibships and a large number of unrelated individuals from the Finnish bone marrow donor registry. The heterozygous genotype DR3/DR4 at the DRB1 locus was associated with the lowest predictive probability of T1D-free survival to the age of 15, the estimate being 0.936 (95% credible interval 0.926 to 0.945), compared to the average population T1D-free survival probability of 0.995.

Significance

The proposed statistical method can be modified to other population-based family data ascertained from a disease registry provided that the ascertainment process is well documented, and that external information concerning the sizes of birth cohorts and a suitable reference sample are available. We confirm the earlier findings from the same data concerning the HLA-DR3/4 related risks for T1D, and also provide here estimated predictive probabilities of disease free survival as a function of age.

7.
Nielsen R, Hubisz MJ, Clark AG. Genetics 2004, 168(4):2373-2382
Most of the available SNP data have eluded valid population genetic analysis because most population genetical methods do not correctly accommodate the special discovery process used to identify SNPs. Most of the available SNP data have allele frequency distributions that are biased by the ascertainment protocol. We here show how this problem can be corrected by obtaining maximum-likelihood estimates of the true allele frequency distribution. In simple cases, the ML estimate of the true allele frequency distribution can be obtained analytically, but in other cases computational methods based on numerical optimization or the EM algorithm must be used. We illustrate the new correction method by analyzing some previously published SNP data from the SNP Consortium. Appropriate treatment of SNP ascertainment is vital to our ability to make correct inferences from the data of the International HapMap Project.
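A simple analytic case of this kind of correction can be sketched as follows. The sketch assumes that SNPs were discovered by requiring variability in a random subsample of d chromosomes taken from the n typed chromosomes, so each observed frequency class is reweighted by the inverse of its ascertainment probability. The function names and this particular ascertainment scheme are assumptions made for illustration, not details confirmed by the abstract.

```python
from math import comb

def ascertainment_prob(i, n, d):
    """Probability that a site with i derived copies among n chromosomes is
    variable in a random subsample of d chromosomes (the assumed discovery panel)."""
    return 1.0 - (comb(i, d) + comb(n - i, d)) / comb(n, d)

def corrected_spectrum(observed_counts, d):
    """Reweight an observed frequency spectrum by inverse ascertainment probability.

    observed_counts[i - 1] is the number of SNPs with i derived copies, i = 1..n-1.
    Requires 2 <= d <= n so that every class has a positive ascertainment probability.
    Returns estimated proportions over the same classes, summing to 1.
    """
    n = len(observed_counts) + 1
    weights = [count / ascertainment_prob(i, n, d)
               for i, count in enumerate(observed_counts, start=1)]
    total = sum(weights)
    return [w / total for w in weights]
```

With d = 2 the ascertainment probability is lowest for the rarest classes, so rare alleles are weighted up most strongly, which matches the direction of the bias described above.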

8.
Tai JJ, Hsiao CK. Human Heredity 2001, 51(4):192-198
In human genetic analysis, data are collected through the so-called 'ascertainment procedure'. Statistically this sampling scheme can be thought of as a multistage sampling method. At the first stage, one or several probands are ascertained. At the subsequent stages, a sequential sampling scheme is applied. Sampling in such a way is virtually a nonrandom procedure, which, in most cases, causes biased estimation which may be intractable. This paper focuses on the underlying causes of the intractability problem of ascertained genetic data. Three types of parameters, i.e. target, design and nuisance parameters, are defined as the essential components for formulating the true likelihood of a set of data. These parameters are also classified into explicit or implicit parameters depending on whether they can be expressed explicitly in the likelihood function. For ascertained genetic data, a sequential scheme is regarded as an implicit design parameter, and a true pedigree structure as an implicit nuisance parameter. The intractability problem is attributed to loss of information of any implicit parameter in likelihood formulation. Several approaches to build a likelihood for estimation of the segregation ratio when only an observed pedigree structure is available are proposed.

9.
Shwachman-Diamond syndrome is a rare disorder of unknown cause. Reports have indicated the occurrence of affected siblings, but formal segregation analysis has not been performed. In families collected for genetic studies, the mean paternal age and mean difference in parental ages were found to be consistent with the general population. We determined estimates of segregation proportion in a cohort of 84 patients with complete sibship data under the assumption of complete ascertainment, using the Li and Mantel estimator, and of single ascertainment with the Davie modification. A third estimate was also computed with the expectation-maximization (EM) algorithm. All three estimates supported an autosomal recessive mode of inheritance, but complete ascertainment was found to be unlikely. Although there are no overt signs of disease in adult carriers (parents), the use of serum trypsinogen levels to indicate exocrine pancreatic dysfunction was evaluated as a potential measure for heterozygote expression. No consistent differences were found in levels between parents and a normal control population. Although genetic heterogeneity cannot be excluded, our results indicate that simulation and genetic analyses of Shwachman-Diamond syndrome should consider a recessive model of inheritance.
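The two ascertainment assumptions lead to different simple estimators of the segregation proportion. The sketch below shows the Li and Mantel estimator for complete ascertainment, (R - J) / (T - J), where T is the total number of children, R the number affected and J the number of sibships with exactly one affected child. For single ascertainment it shows the basic proband-exclusion estimator, which drops one proband per sibship; this stands in for, and is not claimed to be, the Davie modification used in the study. The sibship data are hypothetical.

```python
def li_mantel(sibships):
    """Li and Mantel estimator under complete ascertainment.

    sibships: list of (sibship_size, number_affected) pairs, each with >= 1 affected.
    """
    total = sum(size for size, _ in sibships)
    affected = sum(aff for _, aff in sibships)
    singletons = sum(1 for _, aff in sibships if aff == 1)
    return (affected - singletons) / (total - singletons)

def proband_exclusion(sibships):
    """Simple estimator under single ascertainment: remove one proband per sibship
    and treat the remaining sibs as an ordinary binomial sample."""
    total = sum(size for size, _ in sibships)
    affected = sum(aff for _, aff in sibships)
    families = len(sibships)
    return (affected - families) / (total - families)

# Hypothetical sibships as (size, affected) pairs, not data from the study.
example = [(2, 1), (2, 1), (3, 1), (3, 2), (4, 1), (4, 2)]
p_complete = li_mantel(example)
p_single = proband_exclusion(example)
```

On the same data the single-ascertainment estimate is the smaller of the two, because it removes a proband from every sibship rather than only from sibships with one affected child.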

10.
Genomewide association studies are now a widely used approach in the search for loci that affect complex traits. After detection of significant association, estimates of penetrance and allele-frequency parameters for the associated variant indicate the importance of that variant and facilitate the planning of replication studies. However, when these estimates are based on the original data used to detect the variant, the results are affected by an ascertainment bias known as the "winner's curse." The actual genetic effect is typically smaller than its estimate. This overestimation of the genetic effect may cause replication studies to fail because the necessary sample size is underestimated. Here, we present an approach that corrects for the ascertainment bias and generates an estimate of the frequency of a variant and its penetrance parameters. The method produces a point estimate and confidence region for the parameter estimates. We study the performance of this method using simulated data sets and show that it is possible to greatly reduce the bias in the parameter estimates, even when the original association study had low power. The uncertainty of the estimate decreases with increasing sample size, independent of the power of the original test for association. Finally, we show that application of the method to case-control data can improve the design of replication studies considerably.
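The winner's curse itself is easy to reproduce in a toy simulation. The sketch below is not the correction method of the abstract; it only shows the bias, assuming normally distributed effect estimates with a known standard error and a one-sided significance threshold. All names and parameter values are hypothetical.

```python
import random
import statistics

def simulate_winners_curse(true_effect, std_error, z_threshold, n_studies, seed=0):
    """Simulate repeated studies and compare all estimates with the significant ones.

    Returns (mean of all estimates, mean of significant estimates, empirical power).
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_studies)]
    # Keep only studies whose z-score passes the one-sided threshold.
    significant = [b for b in estimates if b / std_error > z_threshold]
    power = len(significant) / n_studies
    return statistics.mean(estimates), statistics.mean(significant), power

# Low-power setting: the true effect is two standard errors from zero,
# but significance requires an estimate more than three standard errors from zero.
mean_all, mean_significant, power = simulate_winners_curse(
    true_effect=0.10, std_error=0.05, z_threshold=3.0, n_studies=20000)
```

Every significant estimate must exceed 0.15 in this setting, so the mean of the significant estimates necessarily overstates the true effect of 0.10, while the mean over all studies stays close to it.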

11.
Summary: Estimation of abundance is important in both open and closed population capture–recapture analysis, but unmodeled heterogeneity of capture probability leads to negative bias in abundance estimates. This article defines and develops a suite of open population capture–recapture models using finite mixtures to model heterogeneity of capture and survival probabilities. Model comparisons and parameter estimation use likelihood‐based methods. A real example is analyzed, and simulations are used to check the main features of the heterogeneous models, especially the quality of estimation of abundance, survival, recruitment, and turnover. The two major advances in this article are the provision of realistic abundance estimates that take account of heterogeneity of capture, and an appraisal of the amount of overestimation of survival arising from conditioning on the first capture when heterogeneity of survival is present.
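The negative bias from unmodeled capture heterogeneity can be seen in the simplest closed-population case. The sketch below uses a two-occasion Lincoln-Petersen estimator rather than the open-population mixture models of the article, and it plugs expected counts into the estimator (a large-sample approximation). It assumes each animal keeps the same capture probability on both occasions; the function name and the numbers are hypothetical.

```python
def expected_lincoln_petersen(population_size, class_fractions, capture_probs):
    """Large-sample value of the Lincoln-Petersen estimate n1 * n2 / m when the
    population is a finite mixture of classes with different capture probabilities."""
    mean_p = sum(f * p for f, p in zip(class_fractions, capture_probs))
    mean_p_squared = sum(f * p * p for f, p in zip(class_fractions, capture_probs))
    n1 = population_size * mean_p           # expected captures on occasion 1
    n2 = population_size * mean_p           # expected captures on occasion 2
    m = population_size * mean_p_squared    # expected animals caught on both occasions
    return n1 * n2 / m

# Same mean capture probability (0.4), with and without heterogeneity.
homogeneous = expected_lincoln_petersen(1000, [1.0], [0.4])
heterogeneous = expected_lincoln_petersen(1000, [0.5, 0.5], [0.2, 0.6])
```

Because the mean of the squared capture probabilities is at least the square of their mean, the heterogeneous value can only fall at or below the true population size.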

12.
The Danish Twin Registry is the oldest national twin register in the world, initiated in 1954 by ascertainment of twins born from 1870 to 1910. During a number of studies birth cohorts have been added to the register, and by the recent addition of birth cohorts from 1931 to 1952 the Registry now comprises 127 birth cohorts of twins from 1870 to 1996, with a total of more than 65,000 twin pairs included. In all cohorts the ascertainment has been population-based and independent of the traits studied, although different procedures of ascertainment have been employed. In the oldest cohorts only twin pairs with both twins surviving to age 6 have been included while from 1931 all ascertained twins are included. The completeness of the ascertainment after adjustment for infant mortality is high, with approximately 90% ascertained up to 1968, and complete ascertainment of all liveborn twin pairs since 1968. The Danish Twin Registry is used as a source for large studies on genetic influence on aging and age-related health problems, normal variation in clinical parameters associated with the metabolic syndrome and cardiovascular diseases, and clinical studies of specific diseases. The combination of survey data with data obtained by linkage to national health related registers enables follow-up studies both of the general twin population and of twins from clinical studies.

13.
Single-nucleotide polymorphism (SNP) data are routinely obtained by sequencing a region of interest in a small panel, constructing a chip with probes specific to sites found to vary in the panel, and using the chip to assay subsequent samples. The size of the chip is often reduced by removing low-frequency alleles from the set of SNPs. Using coalescent estimation of the scaled population size parameter, Θ, as a test case, we demonstrate the loss of information inherent in this procedure and develop corrections for coalescent analysis of SNPs obtained via a panel. We show that more accurate Θ-estimates can be recovered if the panel size is known, but at considerable computational cost as the panel individuals must be explicitly modeled in the analysis. We extend this technique to apply to the case where rare alleles have been omitted from the SNP panel. We find that when appropriate corrections for panel ascertainment and rare-allele omission are used, the biases introduced by ascertainment are largely correctable, but recovered estimates are less accurate than would be obtained with fully sequenced data. This method is then applied to recombinant multiple population data to investigate the effects of recombination and migration on the estimate of Θ.

14.
The effect of proband designation on segregation analysis   Cited by: 5 (self-citations: 4, citations by others: 1)
In many family studies, it is often difficult to know exactly how the families were ascertained. Even if known, the circumstances under which the families came to the attention of the study may violate the assumptions of classical ascertainment bias correction. The purpose of this work was to investigate the effect on segregation analysis of violations of the assumptions of the classical ascertainment model. We simulated family data generated under a simple recessive model of inheritance. We then ascertained families under different "scenarios." These scenarios were designed to simulate actual conditions under which families come to the attention of, and then interact with, a clinic or genetic study. We show that how one designates probands, which one must do under the classical ascertainment model, can influence parameter estimation and hypothesis testing. We demonstrate that, in some cases, there may be no "correct" way to designate probands. Further, we show that interactions within the family, the conditions under which the genetic study must function, and even social influences can have a profound effect on segregation analysis. We also propose a method for dealing with the ascertainment problem that is applicable to almost any study situation.

15.
The pooling robustness property of distance sampling results in unbiased abundance estimation even when sources of variation in detection probability are not modeled. However, this property cannot be relied upon to produce unbiased subpopulation abundance estimates when using a single pooled detection function that ignores subpopulations. We investigate by simulation the effect of differences in subpopulation detectability upon bias in subpopulation abundance estimates. We contrast subpopulation abundance estimates using a pooled detection function with estimates derived using a detection function model employing a subpopulation covariate. Using point transect survey data from a multispecies songbird study, species-specific abundance estimates are compared using pooled detection functions with and without a small number of adjustment terms, and a detection function with species as a covariate. With simulation, we demonstrate the bias of subpopulation abundance estimates when a pooled detection function is employed. The magnitude of the bias is positively related to the magnitude of disparity between the subpopulation detection functions. However, the abundance estimate for the entire population remains unbiased except when there is extreme heterogeneity in detection functions. Inclusion of a detection function model with a subpopulation covariate essentially removes the bias of the subpopulation abundance estimates. The analysis of the songbird point count surveys shows some bias in species-specific abundance estimates when a pooled detection function is used. Pooling robustness is a unique property of distance sampling, producing unbiased abundance estimates at the level of the study area even in the presence of large differences in detectability between subpopulations. 
In situations where subpopulation abundance estimates are required for data-poor subpopulations and where the subpopulations can be identified, we recommend the use of subpopulation as a covariate to reduce bias induced in subpopulation abundance estimates.

16.
A resolution of the ascertainment sampling problem. III. Pedigrees.   Cited by: 4 (self-citations: 3, citations by others: 1)
When nuclear families are sampled by an ascertainment procedure whose properties are not known, biased estimates of genetic parameters will arise if an incorrect specification of the ascertainment procedure is made. Elsewhere we have put forward a resolution of this problem by introducing an ascertainment-assumption-free (AAF) method, for nuclear family data, which gives asymptotically unbiased estimators no matter what the true nature of the ascertainment process. In the present paper we extend this method to cover pedigree data. Problems that arise with pedigrees but not with families (for example, the question of which families in a pedigree are "ascertainable") are also considered. Comparisons of numerical results for pedigrees and nuclear families are also made.

17.
Aspects of parameter estimation in ascertainment sampling schemes.   Cited by: 6 (self-citations: 6, citations by others: 0)
It has recently been suggested that ascertainment sampling estimation procedures commonly used are not fully efficient in that the number of unobserved families is an unknown parameter that should be estimated (contrary to common practice) along with the genetic parameters for fully efficient estimation. It has also been suggested that the frequency distribution of family size contains unknown parameters that should similarly be estimated with the genetic parameters. These two suggestions are considered in this paper. It is shown by means of an equivalence theorem that in both cases the estimates and their variances obtained by adopting the suggested procedure are identical with those found by ignoring the unobserved families and by ignoring the family-size distribution. This demonstration leads to a formal justification of further procedures, in particular: (1) use of "method-of-moments" estimators, (2) ignoring the ascertainment scheme in some cases when estimating parameters, and (3) forming estimates of parameters when various parts of the data are obtained by different ascertainment schemes.

18.
K. R. Koots, J. P. Gibson. Genetics 1996, 143(3):1409-1416
A data set of 1572 heritability estimates and 1015 pairs of genetic and phenotypic correlation estimates, constructed from a survey of published beef cattle genetic parameter estimates, provided a rare opportunity to study realized sampling variances of genetic parameter estimates. The distribution of both heritability estimates and genetic correlation estimates, when plotted against estimated accuracy, was consistent with random error variance being some three times the sampling variance predicted from standard formulae. This result was consistent with the observation that the variance of estimates of heritabilities and genetic correlations between populations was about four times the predicted sampling variance, suggesting few real differences in genetic parameters between populations. Except where there was a strong biological or statistical expectation of a difference, there was little evidence for differences between genetic and phenotypic correlations for most trait combinations or for differences in genetic correlations between populations. These results suggest that, even for controlled populations, estimating genetic parameters specific to a given population is less useful than commonly believed. A serendipitous discovery was that, in the standard formula for theoretical standard error of a genetic correlation estimate, the heritabilities refer to the estimated values and not, as seems generally assumed, the true population values.

19.
An extensive Monte Carlo study has been carried out in order to study the effect of measurement error on the precision of parameter estimates of an insulin binding system. Hypothetical radioimmunoassay experiments were generated for insulin binding to erythrocytes. The design of experiments followed strictly the protocol of real experiments. Randomly generated error was added to the synthetic data. The standard technique, a weighted non-linear regression analysis, was employed to re-estimate parameters of a model of two receptor sites and a model of negative co-operativity. As the original parameter values were known, the differences between original and estimated values were studied for (a) measurement error in the range from 0 to 17%, (b) random initial estimates and (c) error-free non-specific binding. In addition, analytical estimates of parameter precision were compared with the true between-experiment variation of parameter estimates. At the measurement error of 12%, a one site model is recommended to estimate the high affinity population of the two sites model. Plausible results can be expected in 90% of experiments, the between-experiment variation being approximately 30%. The model of two receptor sites gives approximately two thirds of plausible results. The high affinity population can be estimated with the between-experiment variation of 40%, the low affinity population is virtually unidentifiable with the between-experiment variation of approximately 100% and parameter estimates biased to higher values. Only half of the results obtained from the model of negative co-operativity are plausible, the variation in parameter estimates ranges from 90 to 150% and estimates are biased to higher values. At the level of 12% measurement error, random initial estimates do not significantly affect the estimation process, provided initial estimates are selected from a feasible range.
At the same measurement error, the error-free non-specific binding does not improve the results, indicating that the mean of six replicates may be taken as a reliable estimate of non-specific binding. The analytical estimates of the coefficient of variation systematically underestimate the true between-experiment coefficient of variation; the difference was found to be about 50%.

20.
The ascertainment problem arises when families are sampled by a nonrandom process and some assumption about this sampling process must be made in order to estimate genetic parameters. Under classical ascertainment assumptions, estimation of genetic parameters cannot be separated from estimation of the parameters of the ascertainment process, so that any misspecification of the ascertainment process causes biases in estimation of the genetic parameters. Ewens and Shute proposed a resolution to this problem, involving conditioning the likelihood of the sample on the part of the data which is "relevant to ascertainment." The usefulness of this approach can only be assessed by examining the properties (in particular, bias and standard error) of the estimates which arise by using it for a wide range of parameter values and family size distributions and then comparing these biases and standard errors with those arising under classical ascertainment procedures. These comparisons are carried out in the present paper, and we also compare the proposed method with procedures which condition on, or ignore, parts of the data.
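To make concrete why the genetic and ascertainment parameters are entangled under the classical model, the standard textbook likelihood for a sibship can be written out. The notation is introduced here and is not taken from the abstract: s is the sibship size, r the number of affected children, p the segregation probability, and \pi the probability that any given affected child becomes a proband.

```latex
\Pr(r \mid s,\ \text{at least one proband})
  = \frac{\binom{s}{r}\, p^{r} (1-p)^{s-r} \left[ 1 - (1-\pi)^{r} \right]}
         {1 - (1 - \pi p)^{s}}
```

Both p and \pi appear in the numerator and the denominator, so a wrong value or a wrong model for \pi changes the estimate of p. The two classical limits follow directly: \pi \to 1 gives the truncated binomial of complete ascertainment, and \pi \to 0 gives \binom{s-1}{r-1} p^{r-1} (1-p)^{s-r}, the single-ascertainment form in which one proband is set aside.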

Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号