Similar articles
 Found 20 similar articles (search took 15 ms)
1.
2.
Estimating intraclass correlation for binary data   (total citations: 5; self-citations: 0; citations by others: 5)
This paper reviews many different estimators of intraclass correlation that have been proposed for binary data and compares them in an extensive simulation study. Some of the estimators are very specific, while others result from general methods such as pseudo-likelihood and extended quasi-likelihood estimation. The simulation study identifies several useful estimators, one of which does not seem to have been considered previously for binary data. Estimators based on extended quasi-likelihood are found to have a substantial bias in some circumstances.

3.
A logistic regression model with random effects is commonly applied to analyze clustered binary data, and every cluster is assumed to have a different proportion of success. However, it could be of interest to obtain the proportion of success over clusters (i.e. the marginal proportion of success). Furthermore, the degree of correlation among data of the same cluster (intraclass correlation) is also a relevant concept to assess, but when using logistic regression with random effects it is not possible to get an analytical expression of the estimators for marginal proportion and intraclass correlation. In our paper, we assess and compare approaches using different kinds of approximations: based on the logistic‐normal mixed effects model (LN), linear mixed model (LMM), and generalized estimating equations (GEE). The comparisons are completed by using two real data examples and a simulation study. The results show that the performance of the approaches strongly depends on the magnitude of the marginal proportion, the intraclass correlation, and the sample size. In general, the reliability of the approaches worsens with low marginal proportion and large intraclass correlation. The LMM and GEE approaches emerge as reliable when the sample size is large.

4.
Familial aggregation for 33 different variables from the craniofacial complex was estimated through intraclass correlation coefficients for four different relationships: parent-offspring pairs, sibs, cousins and unrelated pairs. The population chosen for the study was La Sabana, D.F., a Venezuelan isolate of Negroid origin. The general tendency observed among the different correlations was as anticipated: sibs show higher correlations than cousins, and these were higher than for unrelated pairs. Parent-offspring correlations were lower than expected. The significant correlations observed among sibs for 17 of the variables indicate aggregation due to genetic and/or common environmental factors. On the other hand, little genetic determination was detected for sella-C point distance or for upper dental arch depth, both of which show intraclass sib correlations ≤0.1.

5.
When we employ cluster sampling to collect data with matched pairs, the assumption of independence between all matched pairs is unlikely to hold. This paper notes that applying interval estimators that do not account for the intraclass correlation between matched pairs to estimate the simple difference between two proportions of response can be quite misleading, especially when both the number of matched pairs per cluster and the intraclass correlation between matched pairs within clusters are large. This paper develops two asymptotic interval estimators of the simple difference that accommodate data from cluster sampling with correlated matched pairs. This paper further applies Monte Carlo simulation to compare the finite sample performance of these estimators and demonstrates that the interval estimator derived from a quadratic equation proposed here can perform quite well in a variety of situations.

6.
Summary: At least two common practices exist when a negative variance component estimate is obtained: either setting it to zero or not reporting the estimate. The consequences of these practices are investigated in the context of intraclass correlation estimation in terms of bias, variance and mean squared error (MSE). For the one-way analysis of variance random effects model and its extension to the common correlation model, we compare five estimators: analysis of variance (ANOVA), concentrated ANOVA, truncated ANOVA and two maximum likelihood-like (ML) estimators. For the balanced case, the exact bias and MSE are calculated via numerical integration of the exact sample distributions, while a Monte Carlo simulation study is conducted for the unbalanced case. The results indicate that the ANOVA estimator performs well except for designs with family size n = 2. The two ML estimators are generally poor, and the concentrated and truncated ANOVA estimators have some advantages over the ANOVA estimator in terms of MSE. However, their large biases may make the concentrated and truncated ANOVA estimators objectionable when the intraclass correlation is small. Bias should be a concern when a pooled estimate is obtained from the literature, since the intraclass correlation is below 0.05 in many genetic studies.
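As a rough illustration of the estimators named in this abstract, the sketch below computes the standard one-way ANOVA estimate of the intraclass correlation for a balanced design and a truncated variant that sets negative estimates to zero. The function names `anova_icc` and `truncated_anova_icc` are hypothetical, the formula is the textbook balanced-design one rather than anything taken from the paper itself, and the concentrated ANOVA and ML-like estimators are not attempted here.

```python
from statistics import mean


def anova_icc(groups):
    """One-way ANOVA estimate of the intraclass correlation.

    Assumes a balanced design: `groups` is a list of k >= 2 families,
    each a list of n >= 2 numeric observations (same n for all).
    """
    k = len(groups)
    n = len(groups[0]) if k else 0
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 groups of equal size n >= 2")

    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    # Between-group and within-group mean squares.
    msb = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    msw = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (k * (n - 1))

    denom = msb + (n - 1) * msw
    if denom == 0:
        raise ValueError("no variation in the data; estimate is undefined")
    return (msb - msw) / denom


def truncated_anova_icc(groups):
    """ANOVA estimate with negative values set to zero (illustrative)."""
    return max(0.0, anova_icc(groups))
```

With families of size n = 2, for example, `anova_icc([[1, 4], [2, 3]])` is negative because the two family means coincide, and the truncated version reports 0 instead. Truncation of this kind keeps the estimate in a plausible range but introduces the bias the abstract warns about.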

7.
A Muñoz  B Rosner  V Carey 《Biometrics》1986,42(3):653-658
In the statistical analysis of twinship and familial data, one often encounters the need for regression methods that control for the possibly different aggregation of the twins and the members of the families according to their type (e.g., monozygotic, dizygotic for twinship data). We present the maximum likelihood solution for the regression analysis of data in the presence of heterogeneous intraclass correlations. This work extends previous results for the case of a homogeneous intraclass correlation. An application of the methods for the analysis of twinship data is included.

8.
Estimating the effects of haplotypes on the age of onset of a disease is an important step toward the discovery of genes that influence complex human diseases. A haplotype is a specific sequence of nucleotides on the same chromosome of an individual and can only be measured indirectly through the genotype. We consider cohort studies which collect genotype data on a subset of cohort members through case-cohort or nested case-control sampling. We formulate the effects of haplotypes and possibly time-varying environmental variables on the age of onset through a broad class of semiparametric regression models. We construct appropriate nonparametric likelihoods, which involve both finite- and infinite-dimensional parameters. The corresponding nonparametric maximum likelihood estimators are shown to be consistent, asymptotically normal, and asymptotically efficient. Consistent variance-covariance estimators are provided, and efficient and reliable numerical algorithms are developed. Simulation studies demonstrate that the asymptotic approximations are accurate in practical settings and that case-cohort and nested case-control designs are highly cost-effective. An application to a major cardiovascular study is provided.

9.
We are interested in the estimation of average treatment effects based on right-censored data of an observational study. We focus on causal inference of differences between t-year absolute event risks in a situation with competing risks. We derive doubly robust estimation equations and implement estimators for the nuisance parameters based on working regression models for the outcome, censoring, and treatment distribution conditional on auxiliary baseline covariates. We use the functional delta method to show that these estimators are regular asymptotically linear estimators and estimate their variances based on estimates of their influence functions. In empirical studies, we assess the robustness of the estimators and the coverage of confidence intervals. The methods are further illustrated using data from a Danish registry study.

10.
Since Liang and Zeger (1986) proposed the ‘generalized estimating equations’ approach for the estimation of regression parameters in models with correlated discrete responses, a lot of work has been devoted to the investigation of the properties of the corresponding GEE estimators. However, the effects of different kinds of covariates have often been overlooked. In this paper it is shown that the use of non-singular block invariant matrices of covariates, as e.g. a design matrix in an analysis of variance model, leads to GEE estimators which are identical regardless of the ‘working’ correlation matrix used. Moreover, they are efficient (McCullagh, 1983). If on the other hand only covariates are used which are invariant within blocks, the efficiency gain in choosing the ‘correct’ vs. an ‘incorrect’ correlation structure is shown to be negligible. The results of a simple simulation study suggest that although different GEE estimators are not identical and are not as efficient as a ML estimator, the differences are still negligible if both types of invariant covariates are present.

11.
Cereal crop yield is determined by different yield components such as seed weight, seed number per spike, and the number of tillers and spikes. Negative correlations between these traits are often attributed to resource limitation. However, recent evidence suggests that the same genes or regulatory modules can regulate both inflorescence branching and tillering. It is therefore important to explore the role of genetic correlations between different yield components in small grain cereals. In this work, we studied pleiotropic effects of row type genes on seed size, seed number per spike, thousand grain weight, and tillering in barley to better understand the genetic correlations between individual yield components. Allelic mutants of nine different row type loci (36 mutants), in the original spring barley varieties Barke, Bonus and Foma and introgressed in the spring barley cultivar Bowman, were phenotyped under greenhouse and outdoor conditions. We identified two main mutant groups characterized by their relationships between seed and tillering parameters. The first group comprises all mutants with an increased number of seeds and significant change in tiller number at early development (group 1a) or reduced tillering only at full maturity (group 1b). Mutants in the second group are characterized by a reduction in seeds per spike and tiller number, thus exhibiting positive correlations between seed and tiller number. Reduced tillering at full maturity (group 1b) is likely due to resource limitations. In contrast, altered tillering at early development (groups 1a and 2) suggests that the same genes or regulatory modules affect inflorescence and shoot branching. Understanding the genetic bases of the trade-offs between these traits is important for the genetic manipulation of individual yield components.

12.
13.
Three different estimators are presented for the types of parameters present in mathematical models of animal epidemics. The estimators make use of the data collected during an epidemic, which may be limited, incomplete, or under collection on an ongoing basis. When data are being collected on an ongoing basis, the estimated parameters can be used to evaluate putative control strategies. These estimators were tested using simulated epidemics based on a spatial, discrete-time, gravity-type, stochastic mathematical model containing two parameters. Target epidemics were simulated with the model and the three estimators were implemented using various combinations of collected data to independently determine the two parameters.

14.
Cockerham CC 《Genetics》1973,74(4):701-712
A genic analysis of variance of data on mate pairs for a codominant gene is developed. This analysis provides estimators of the correlation, F, of genes within individuals, of the correlation, Θ, of genes between mates, and of various variances, all relative to the correlation or variation among genes of nonmates. The data are manipulated into marginal distributions to produce another method of obtaining the same estimators. Several examples are given of how assumptions about the model and parameters modify the estimators, which were used in constructing χ² tests of hypotheses concerning F and Θ. A recessive gene is also considered. Only the frequency of recessive genotypes and the correlation of recessive mates are estimable in this case unless one makes very demanding assumptions about the model. Numerical examples of the analysis of variance and estimators are given for both a codominant and a recessive gene.

15.
Quantitative risk assessments in public health settings aim to describe the hazard of a specific exposure in a given population on the basis of epidemiological and/or experimental results. Two different risk quantities, the absolute lifetime excess risk and the loss-of-lifetime, which differ in their definition of hazard, are discussed and compared. For both measures, estimation procedures are derived and the relationships among the various estimates currently in use are investigated. It is shown that the two most common estimators can be written as special cases of a more general concept. This leads to conclusions about the assumptions on which different estimation procedures are implicitly based. For all discussed estimators, variance estimates are derived. The analytical results for both risk parameters are illustrated by an example on lung cancer risk due to residential radon in Germany.

16.
Twin studies of child temperament using objective measures consistently suggest moderate heritability for most dimensions. However, parent rating measures produce unusual patterns of results. Intraclass correlations for identical (MZ) twins are typically high, whereas fraternal (DZ) twin intraclass correlations are much lower than would be predicted from an additive genetic model. The 'too low' DZ correlations can be explained by parent-rating biases that either exaggerate the differences between DZ twins (contrast effects) or that inflate the similarity of MZ twins (assimilation effects), or by the presence of non-additive genetic variance. To evaluate the three possible explanations, we used model-fitting procedures applied to parent-rating data averaged across 14, 20, 24, and 36 months of age in a sample of 196 twin pairs participating in the MacArthur Longitudinal Twin Study. The data were best described by a model that included contrast effects. Implications for non-twin research are discussed.

17.
This paper discusses interval estimation for the ratio of the mean failure times on the basis of paired exponential observations. It considers five interval estimators: the confidence interval using an idea similar to Fieller's theorem (CIFT), the confidence interval using an exact parametric test (CIEP), the confidence interval using the marginal likelihood ratio test (CILR), the confidence interval assuming no matching effect (CINM), and the confidence interval using a locally most powerful test (CIMP). To evaluate and compare the performance of these five interval estimators, the paper applies Monte Carlo simulation. With respect to the coverage probability, the CIFT, CILR, and CIMP, although all derived from large sample theory, can perform well even when the number of pairs n is as small as 10. Compared with the CILR, the CIEP with equal tail probabilities is likely to lose efficiency. However, this loss can be reduced by using the optimal tail probabilities to minimize the average length when n is small (<20). The CIMP is preferable to the CIEP in a variety of the situations considered here. In fact, the average length of the CIMP with the optimal tail probabilities can even be shorter than that of the CILR. When the intraclass correlation between failure times within pairs is 0 (i.e., the failure times within the same pair are independent), the CINM, which is derived for two independent samples, is certainly the best of the five interval estimators considered here. When the intraclass correlation is nonzero but small (<0.10), the CIFT is recommended for obtaining a relatively short interval estimate without sacrificing coverage probability. When the intraclass correlation is moderate or large, either the CILR or the CIMP with the optimal tail probabilities is preferable to the others. The paper also notes that if the intraclass correlation between failure times within pairs is large, use of the CINM can be misleading, especially when the number of pairs is large.

18.
Nonlinear mixed effects models for repeated measures data   (total citations: 51; self-citations: 1; citations by others: 50)
We propose a general, nonlinear mixed effects model for repeated measures data and define estimators for its parameters. The proposed estimators are a natural combination of least squares estimators for nonlinear fixed effects models and maximum likelihood (or restricted maximum likelihood) estimators for linear mixed effects models. We implement Newton-Raphson estimation using previously developed computational methods for nonlinear fixed effects models and for linear mixed effects models. Two examples are presented and the connections between this work and recent work on generalized linear mixed effects models are discussed.

19.
Several estimators have been proposed that use molecular marker data to infer the degree of relatedness for pairs of individuals. The objective of this study was to evaluate the performance of seven estimators when applied to marker data of a set of 33 key individuals from a large complex apple pedigree. The evaluation considered different scenarios of allele frequencies and different numbers of marker loci. The method of moments estimators were Similarity, Queller-Goodnight, Lynch-Ritland and Wang. The maximum likelihood estimators were Thompson, Anderson-Weir and Jacquard. The pedigree-based coancestry coefficients were taken as the point of reference in calculating correlations and root mean square error (RMSE). The marker data comprised 86 multi-allelic SSR markers on 17 linkage groups, covering 11 Morgans. Additionally, we simulated 10 datasets conditional on the real pedigree to support the results on the real dataset. None of the estimators outperformed the others. Knowledge of allele frequencies appeared to be the most influential factor, i.e., the highest correlations and lowest RMSE were found when frequencies from the founder population were available. When equal allele frequencies were used, all estimators resulted in very similar, but on average lower, correlations. The use of allele frequencies estimated from the set of 33 individuals gave, on average, the poorest results. The maximum likelihood estimators and the Lynch-Ritland estimator were the most sensitive to allele frequencies. The results from the simulation study fully supported the trends in results of the real dataset. This study indicated that high correlations (up to 0.90) and small RMSE (below 0.03) may be obtained when population allelic frequencies are available. In this scenario, the performances of the various estimators were similar, but seemed to favor the maximum likelihood estimators. In the absence of reliable allele frequencies, the method of moments estimators were shown to be more robust. The number of marker loci influenced the average performance of the estimators; however, the ranking was not affected. Correlations up to 0.80 were obtained when two markers per chromosome and appropriate allele frequencies were available. Adding more markers to the current dataset may lead to marginal improvements.

20.
T H Meuwissen 《Biometrics》1991,47(1):195-203
The effect of family structure is of increasing importance in modern breeding schemes, because improved breeding value estimation methods that use all family information increase the intraclass correlations between relatives, and because increased family sizes are possible with improved reproduction rates. In addition, reduction of the generation intervals in modern breeding schemes leads to increased intraclass (family) correlations, because young animals have little information on individual or on progeny performance. This paper derives an approximation for the selection differential in a population divided into families. The result is then extended to an approximation for the selection differentials in populations that are divided into full sib families within paternal half sib families. The approximation is compared with Monte Carlo results, from which it is concluded that the approximation is satisfactory (i.e., rarely more than 5% in error). In some practical situations the approximation is shown to be not more than 2% in error. With high intraclass correlations and few animals selected, the reduction of the selection differentials is maximal. When breeding values are based on family information and the family structure is not accounted for, overestimation of the selection differentials can be up to 61%.
