Similar literature
 Found 20 similar articles; search took 31 ms
1.
Since Liang and Zeger (1986) proposed the ‘generalized estimating equations’ approach for the estimation of regression parameters in models with correlated discrete responses, a lot of work has been devoted to the investigation of the properties of the corresponding GEE estimators. However, the effects of different kinds of covariates have often been overlooked. In this paper it is shown that the use of non-singular block invariant matrices of covariates, as e.g. a design matrix in an analysis of variance model, leads to GEE estimators which are identical regardless of the ‘working’ correlation matrix used. Moreover, they are efficient (McCullagh, 1983). If on the other hand only covariates are used which are invariant within blocks, the efficiency gain in choosing the ‘correct’ vs. an ‘incorrect’ correlation structure is shown to be negligible. The results of a simple simulation study suggest that although different GEE estimators are not identical and are not as efficient as a ML estimator, the differences are still negligible if both types of invariant covariates are present.

2.
A phylogenetic analysis of the Melanthripidae genus Cranothrips Bagnall is presented. A data matrix with continuous and discrete characters was analysed under parsimony criteria. Continuous and discrete characters were analysed, separately and in combination. When the different blocks of characters were analysed separately, important differences in tree topologies occurred. The optimal tree obtained from discrete characters alone was similar to the tree resulting from total evidence. For most groups, the support values resulting from all the evidence analysis were higher than those obtained from the discrete‐only analysis. Two new species from Australia are described and illustrated, Cranothrips ibisca sp.n. and Cranothrips conostylus sp.n. A key to the 12 species in the genus is provided. Additionally, the host associations and the distributional patterns of the four worldwide genera of Melanthripidae are discussed.

3.
A phylogenetic analysis of the Australian Aeolothripidae genus Desmothrips Hood is presented. A data matrix with 27 species is analysed under parsimony criteria. The monophyly of Desmothrips is recovered. Continuous and discrete characters were analysed separately and in combination, and continuous characters were rescaled and analysed under equal weights. Three new species from the northwestern and one from the southeastern areas of Australia are described and illustrated. A key to the 18 species of Desmothrips is provided.

4.
Within behavioural research, non‐normally distributed data with a complicated structure are common. For instance, data can represent repeated observations of quantities on the same individual. The regression analysis of such data is complicated both by the interdependency of the observations (response variables) and by their non‐normal distribution. Over the last decade, such data have been more and more frequently analysed using generalized mixed‐effect models. Some researchers invoke the heavy machinery of mixed‐effect modelling to obtain the desired population‐level (marginal) inference, which can be achieved by using simpler tools, namely marginal models. This paper highlights marginal modelling (using generalized estimating equations [GEE]) as an alternative method. In various situations, GEE can be based on fewer assumptions and directly generate estimates (population‐level parameters) which are of immediate interest to the behavioural researcher (such as population means). Using four examples from behavioural research, we demonstrate the use, advantages, and limits of the GEE approach as implemented within the functions of the ‘geepack’ package in R.

5.
6.
Longitudinal data analysis for discrete and continuous outcomes   Total citations: 170 (self-citations: 0; citations by others: 170)
S L Zeger, K Y Liang. Biometrics, 1986, 42(1): 121-130
Longitudinal data sets are comprised of repeated observations of an outcome and a set of covariates for each of many subjects. One objective of statistical analysis is to describe the marginal expectation of the outcome variable as a function of the covariates while accounting for the correlation among the repeated observations for a given subject. This paper proposes a unifying approach to such analysis for a variety of discrete and continuous outcomes. A class of generalized estimating equations (GEEs) for the regression parameters is proposed. The equations are extensions of those used in quasi-likelihood (Wedderburn, 1974, Biometrika 61, 439-447) methods. The GEEs have solutions which are consistent and asymptotically Gaussian even when the time dependence is misspecified as we often expect. A consistent variance estimate is presented. We illustrate the use of the GEE approach with longitudinal data from a study of the effect of mothers' stress on children's morbidity.
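
For orientation, the estimating equations summarized in this abstract are usually written in the following textbook form. The notation is an assumption and is not taken from the abstract: K subjects, response vector y_i with marginal mean mu_i(beta), a diagonal matrix A_i of variance functions, a working correlation matrix R(alpha), and a scale parameter phi (parameterizations of phi differ between sources).

```latex
% GEE for the regression parameters \beta (standard textbook form)
\sum_{i=1}^{K} D_i^{\top} V_i^{-1} \bigl( y_i - \mu_i(\beta) \bigr) = 0,
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta^{\top}},
\qquad
V_i = \phi \, A_i^{1/2} R(\alpha) A_i^{1/2}

% Robust ("sandwich") variance estimate, consistent even if R(\alpha) is misspecified
\widehat{\operatorname{Var}}(\hat{\beta}) = B^{-1} M B^{-1},
\qquad
B = \sum_{i=1}^{K} D_i^{\top} V_i^{-1} D_i,
\qquad
M = \sum_{i=1}^{K} D_i^{\top} V_i^{-1} (y_i - \hat{\mu}_i)(y_i - \hat{\mu}_i)^{\top} V_i^{-1} D_i
```

The second display is the usual reading of the "consistent variance estimate" mentioned above: the outer factors use the working covariance, while the middle factor uses the empirical residual cross-products.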

7.
This paper presents a method for analysing longitudinal data when there are dropouts. In particular, we develop a simple method based on generalized linear mixture models for handling nonignorable dropouts for a variety of discrete and continuous outcomes. Statistical inference for the model parameters is based on a generalized estimating equations (GEE) approach (Liang and Zeger, 1986). The proposed method yields estimates of the model parameters that are valid when nonresponse is nonignorable under a variety of assumptions concerning the dropout process. Furthermore, the proposed method can be implemented using widely available statistical software. Finally, an example using data from a clinical trial of contracepting women is used to illustrate the methodology.

8.
This study is concerned with statistical methods used for the analysis of comparative data (in which observations are not expected to be independent because they are sampled across phylogenetically related species). The phylogenetically independent contrasts (PIC), phylogenetic generalized least‐squares (PGLS), and phylogenetic autocorrelation (PA) methods are compared. Although the independent contrasts are not orthogonal, they are independent if the data conform to the Brownian motion model of evolution on which they are based. It is shown that uncentered correlations and regressions through the origin using the PIC method are identical to those obtained using PGLS with an intercept included in the model. The PIC method is a special case of PGLS. Corrected standard errors are given for estimates of the ancestral states based on the PGLS approach. The treatment of trees with hard polytomies is discussed and is shown to be an algorithmic rather than a statistical problem. Some of the relationships among the methods are shown graphically using the multivariate space in which variables are represented as vectors with respect to OTUs used as coordinate axes. The maximum‐likelihood estimate of the autoregressive parameter, ρ, has not been computed correctly in previous studies (an appendix with MATLAB code provides a corrected algorithm). The importance of the eigenvalues and eigenvectors of the connection matrix, W, for the distribution of ρ is discussed. The PA method is shown to have several problems that limit its usefulness in comparative studies. Although the PA method is a generalized least‐squares procedure, it cannot be made equivalent to the PGLS method using a phylogenetic model.

9.
We propose a new method to estimate and correct for phylogenetic inertia in comparative data analysis. The method, called phylogenetic eigenvector regression (PVR) starts by performing a principal coordinate analysis on a pairwise phylogenetic distance matrix between species. Traits under analysis are regressed on eigenvectors retained by a broken-stick model in such a way that estimated values express phylogenetic trends in data and residuals express independent evolution of each species. This partitioning is similar to that realized by the spatial autoregressive method, but the method proposed here overcomes the problem of low statistical performance that occurs with autoregressive method when phylogenetic correlation is low or when sample size is too small to detect it. Also, PVR is easier to perform with large samples because it is based on well-known techniques of multivariate and regression analyses. We evaluated the performance of PVR and compared it with the autoregressive method using real datasets and simulations. A detailed worked example using body size evolution of Carnivora mammals indicated that phylogenetic inertia in this trait is elevated and similarly estimated by both methods. In this example, Type I error at α = 0.05 of PVR was equal to 0.048, but an increase in the number of eigenvectors used in the regression increases the error. Also, similarity between PVR and the autoregressive method, defined by correlation between their residuals, decreased by overestimating the number of eigenvalues necessary to express the phylogenetic distance matrix. To evaluate the influence of cladogram topology on the distribution of eigenvalues extracted from the double-centered phylogenetic distance matrix, we analyzed 100 randomly generated cladograms (up to 100 species). Multiple linear regression of log transformed variables indicated that the number of eigenvalues extracted by the broken-stick model can be fully explained by cladogram topology. 
Therefore, the broken-stick model is an adequate criterion for determining the correct number of eigenvectors to be used by PVR. We also simulated distinct levels of phylogenetic inertia by producing a trend across 10, 25, and 50 species arranged in “comblike” cladograms and then adding random vectors with increased residual variances around this trend. In doing so, we provide an evaluation of the performance of both methods with data generated under different evolutionary models than tested previously. The results showed that both PVR and the autoregressive method are efficient in detecting inertia in data when sample size is relatively high (more than 25 species) and when phylogenetic inertia is high. However, PVR is more efficient at smaller sample sizes and when the level of phylogenetic inertia is low. These conclusions were also supported by the analysis of 10 real datasets regarding body size evolution in different animal clades. We concluded that PVR can be a useful alternative to an autoregressive method in comparative data analysis.
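
The broken-stick retention rule mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the eigenvalues are made up, non-positive eigenvalues are simply dropped (an assumption, since principal coordinate analysis can return them), and the rule used here keeps leading components only while their observed share of variance exceeds the broken-stick expectation.

```python
def broken_stick_expectations(n):
    """Expected variance share of each of n components under the broken-stick model.

    The k-th expectation is (1/n) * sum_{i=k}^{n} 1/i.
    """
    return [sum(1.0 / i for i in range(k, n + 1)) / n for k in range(1, n + 1)]


def retained_components(eigenvalues):
    """Count leading components whose observed share exceeds the expected share."""
    # Assumption: only positive eigenvalues are considered.
    positive = sorted((v for v in eigenvalues if v > 0), reverse=True)
    total = sum(positive)
    expected = broken_stick_expectations(len(positive))
    count = 0
    for value, expected_share in zip(positive, expected):
        if value / total <= expected_share:
            break  # stop at the first component that does not beat the model
        count += 1
    return count


# Hypothetical eigenvalues of a double-centered distance matrix.
example_eigenvalues = [5.0, 2.0, 1.0, 0.5, 0.5]
print(retained_components(example_eigenvalues))
```

In PVR as described above, the eigenvectors belonging to the retained eigenvalues would then serve as predictors in the regression of the trait.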

10.
Phylogenetic meta-analysis   Total citations: 1 (self-citations: 0; citations by others: 1)
Meta-analysis is a powerful statistical technique that combines the results of independent studies to identify general trends. When the species under examination are not independent however, it is also necessary to incorporate phylogenetic information into the analysis. Unfortunately, current meta-analytic approaches cannot account for lack of independence resulting from shared evolutionary history, so a general solution to this problem is lacking. In this article, I derive a model for phylogenetic meta-analysis, so that data across studies may be summarized with evolutionary history explicitly incorporated. The approach takes advantage of common aspects of linear statistical models used by both meta-analysis and the phylogenetic comparative method, thereby allowing them to be analytically combined. In this manner, the correlation structure generated by phylogenetic history can be incorporated directly into the meta-analytic procedure. I illustrate the approach by examining the prevalence of body size clines in mammals. The approach is general, and can also be used to incorporate correlation structure among studies generated by other factors, such as spatial or temporal proximity, or environmental similarity. Therefore, this procedure provides a general statistical template for meta-analytic techniques that can account for attributes that generate nonindependence among studies. Implications of the phylogenetic meta-analysis are discussed.

11.
In a recent study, the phylogeny of Caseidae (a herbivorous family of Palaeozoic synapsids belonging to the paraphyletic grade known as pelycosaurs) was analysed with a dataset employing more than three hundred continuous morphological characters in an effort to follow the principles of total evidence. Continuous characters are a source of great debate, with disagreements surrounding their suitability for and treatment in phylogenetic analysis. A number of shortcomings were identified in the handling of continuous characters in this study of caseids, including the use of gap weighting to discretize the characters and potential issues with redundancy and character non‐independence. Therefore, an alternative treatment for these characters is suggested here. First, rather than using gap weighting, the continuous characters were analysed in the program TNT, in which the raw values can be treated as continuous rather than discrete. Second, prior to the phylogenetic analysis, the continuous characters were subjected to a log‐ratio principal component analysis, and then the principal components were included in the character matrix rather than the raw ratios. Analysing the original data in TNT produced little difference in the results, but using the principal components as continuous characters resulted in alternative positions for Caseopsis agilis, Ennatosaurus tecton and Caseoides sanangeloensis. The differences are judged to be due to the reduced redundancy of the characters, the smaller number of principal components not overwhelming the discrete characters and the use of a scaling method which allows principal components with a higher variance to have a greater influence on the analysis. The positions of highly fragmentary fossils depended heavily on the method used to treat the missing characters in the principal component analysis, and so the method proposed here is not recommended for analysing very incomplete taxa.

12.
Missing data are a common problem in longitudinal studies in the health sciences. Motivated by data from the Muscatine Coronary Risk Factor (MCRF) study, a longitudinal study of obesity, we propose a simple imputation method for handling non-ignorable non-responses (i.e., when non-response is related to the specific values that should have been obtained) in longitudinal studies with either discrete or continuous outcomes. In the proposed approach, two regression models are specified; one for the marginal mean of the response, the other for the conditional mean of the response given non-response patterns. Statistical inference for the model parameters is based on the generalized estimating equations (GEE) approach. An appealing feature of the proposed method is that it can be readily implemented using existing, widely-available statistical software. The method is illustrated using longitudinal data on obesity from the MCRF study.

13.
The generalized estimating equation (GEE) has been a popular tool for marginal regression analysis with longitudinal data, and its extension, the weighted GEE approach, can further accommodate data that are missing at random (MAR). Model selection methodologies for GEE, however, have not been systematically developed to allow for missing data. We propose the missing longitudinal information criterion (MLIC) for selection of the mean model, and the MLIC for correlation (MLICC) for selection of the correlation structure in GEE when the outcome data are subject to dropout/monotone missingness and are MAR. Our simulation results reveal that the MLIC and MLICC are effective for variable selection in the mean model and selecting the correlation structure, respectively. We also demonstrate the remarkable drawbacks of naively treating incomplete data as if they were complete and applying the existing GEE model selection method. The utility of the proposed method is further illustrated by two real applications involving missing longitudinal outcome data.

14.
Brownian motion computer simulation was used to test the statistical properties of a spatial autoregressive method in estimating evolutionary correlations between two traits using interspecific comparative data. When applied with a phylogeny of 42 species, the method exhibited reasonable Type I and II error rates. Estimation abilities were comparable to those of independent contrasts and minimum evolution (parsimony) methods, and generally superior to a traditional nonphylogenetic approach (not taking phylogenies into account at all). However, the autoregressive method performed extremely poorly with a smaller phylogeny (15 species) and with nearly independent (“star”) phylogenies. In both of these situations, any phylogenetic autocorrelation present in the data was not detected by the method. Results show how diagnostic techniques (e.g., Moran's I) can be useful in detecting and avoiding such situations, but that such techniques should not be used as definitive evidence that phylogenetic correlation is not present in a set of comparative data. The correction factor (α) proposed by Gittleman and Kot (1990) for use in weighting phylogenetic information had little effect in most analyses of 15 or 42 species with incorrect phylogenetic information, and may require much larger sample sizes before significant improvement is shown. With the sample sizes tested in this study, however, the autoregressive method implemented with this correction factor and correct phylogenetic information led to downwardly biased estimates of the absolute magnitude of the evolutionary correlation between two traits. Cautions and recommendations for implementation of the spatial autoregressive method are given; computer programs to conduct the analyses are available on request.
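
As a rough illustration of the Moran's I diagnostic named in this abstract, the statistic can be computed from trait values and a matrix of pairwise weights. This is a sketch under assumptions: the function name, the toy trait values, and the weight matrix (1 for sister species, 0 otherwise) are invented for the example and do not come from the study.

```python
def morans_i(values, weights):
    """Moran's I for trait values and a square matrix of pairwise weights.

    I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where S0 is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in weights)
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    squared_deviations = sum(d * d for d in deviations)
    return (n / weight_sum) * cross_products / squared_deviations


# Hypothetical weights: species 0 and 1 are sisters, as are species 2 and 3.
sister_weights = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 1.0, 5.0, 5.0], sister_weights)   # sisters alike
alternating = morans_i([1.0, 5.0, 1.0, 5.0], sister_weights)  # sisters differ
print(clustered, alternating)
```

Values near +1 indicate that closely related species resemble each other, and values near -1 that they differ. As the abstract cautions, a value near zero is not definitive evidence that phylogenetic correlation is absent.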

15.
Many critical ecological issues require the analysis of large spatial point data sets – for example, modelling species distributions, abundance and spread from survey data. But modelling spatial relationships, especially in large point data sets, presents major computational challenges. We use a novel Bayesian hierarchical statistical approach, 'spatial predictive process' modelling, to predict the distribution of a major invasive plant species, Celastrus orbiculatus, in the northeastern USA. The model runs orders of magnitude faster than traditional geostatistical models on a large data set of c. 4000 points, and performs better than generalized linear models, generalized additive models and geographically weighted regression in cross-validation. We also use this approach to model simultaneously the distributions of a set of four major invasive species in a spatially explicit multivariate model. This multispecies analysis demonstrates that some pairs of species exhibit negative residual spatial covariation, suggesting potential competitive interaction or divergent responses to unmeasured factors.

16.
BACKGROUND AND AIMS: The amount of DNA per chromosome set is known to be a fairly constant characteristic of a species. Its interspecific variation is enormous, but the biological significance of this variation is little understood. Some of the characters believed to be correlated with DNA amount are alpine habitat, life history and breeding system. In the present study, the aim is to distinguish between direct causal connections and chance correlation of the amount of DNA in the genus Veronica. METHODS: Estimates of DNA amount were analysed for 42 members of Veroniceae in connection with results from a phylogenetic analysis of plastid trnL-F DNA sequences and tested correlations using standard statistical tests, phylogenetically independent contrasts and a model-based generalized least squares method to distinguish the phylogenetic effect on the results. KEY RESULTS: There appears to be a lower upper limit for DNA amount in annuals than in perennials. Most DNA C-values in Veroniceae are below the mean DNA C-value for annuals in angiosperms as a whole. However, the long-debated correlation of low genome size with annual life history is not significant (P = 0.12) using either standard statistical tests or independent contrasts, but it is significant with the generalized least squares method (P < 0.01). CONCLUSIONS: The correlation of annual life history and low genome size found in earlier studies could be due to the association of annual life history and selfing, which is significantly correlated with low genome size using any of the three tests applied. This correlation can be explained by models showing a reduction in transposable elements in selfers. A significant correlation of higher genome sizes with alpine habitats was also detected.

17.
We performed multi-directional chromosome painting in a comparative cytogenetic study of the three Atelinae species Brachyteles arachnoides, Ateles paniscus paniscus and Ateles belzebuth marginatus, in order to reconstruct phylogenetic relationships within this Platyrrhini subfamily. Comparative chromosome maps between these species were established by multi-color fluorescence in situ hybridization (FISH) employing human, Saguinus oedipus and Lagothrix lagothricha chromosome-specific probes. The three species included in this study and four previously analyzed species from all four Atelinae genera were subjected to a phylogenetic analysis on the basis of a data matrix comprised of 82 discrete chromosome characters. The results confirmed that Atelinae represent a monophyletic clade with a putative ancestral karyotype of 2n = 62 chromosomes. Phylogenetic analysis revealed an evolutionary branching sequence [Alouatta [Brachyteles [Lagothrix and Ateles]]] in Atelinae and [Ateles belzebuth marginatus [Ateles paniscus paniscus [Ateles belzebuth hybridus and Ateles geoffroyi]]] in genus Ateles. The chromosomal data support a re-evaluation of the taxonomic status of Ateles b. hybridus.

18.
A new method of coding polymorphic multistate characters for phylogenetic analysis is presented. By dividing such characters into subcharacters, their frequency distributions can be represented with discrete states. Differential weighting is used to counter the effect of representing one character with multiple characters. The new method, generalized frequency coding (GFC), is potentially superior to previously used methods in that it incorporates more information and is applicable to both qualitative and quantitative characters. When applied to a previously published data set that includes both types of polymorphic multistate characters, the method performed well, as assessed with g1 and nonparametric bootstrap statistics and giving results congruent with those of other studies. The data set was also used to compare GFC with both gap-weighting and Manhattan distance step matrix coding. On these grounds and for philosophical reasons, we consider GFC to be a better estimator of phylogeny.

19.
GEE with Gaussian estimation of the correlations when data are incomplete   Total citations: 4 (self-citations: 0; citations by others: 4)
This paper considers a modification of generalized estimating equations (GEE) for handling missing binary response data. The proposed method uses Gaussian estimation of the correlation parameters, i.e., the estimating function that yields an estimate of the correlation parameters is obtained from the multivariate normal likelihood. The proposed method yields consistent estimates of the regression parameters when data are missing completely at random (MCAR). However, when data are missing at random (MAR), consistency may not hold. In a simulation study with repeated binary outcomes that are missing at random, the magnitude of the potential bias that can arise is examined. The results of the simulation study indicate that, when the working correlation matrix is correctly specified, the bias is almost negligible for the modified GEE. In the simulation study, the proposed modification of GEE is also compared to the standard GEE, multiple imputation, and weighted estimating equations approaches. Finally, the proposed method is illustrated using data from a longitudinal clinical trial comparing two therapeutic treatments, zidovudine (AZT) and didanosine (ddI), in patients with HIV.

20.
Two phylogenetic comparative methods, independent contrasts and generalized least squares models, can be used to determine the statistical relationship between two or more traits. We show that the two approaches are functionally identical and that either can be used to make statistical inferences about values at internal nodes of a phylogenetic tree (hypothetical ancestors), to estimate relationships between characters, and to predict values for unmeasured species. Regression equations derived from independent contrasts can be placed back onto the original data space, including computation of both confidence intervals and prediction intervals for new observations. Predictions for unmeasured species (including extinct forms) can be made increasingly accurate and precise as the specificity of their placement on a phylogenetic tree increases, which can greatly increase statistical power to detect, for example, deviation of a single species from an allometric prediction. We reexamine published data for basal metabolic rates (BMR) of birds and show that conventional and phylogenetic allometric equations differ significantly. In new results, we show that, as compared with nonpasserines, passerines exhibit a lower rate of evolution in both body mass and mass-corrected BMR; passerines also have significantly smaller body masses than their sister clade. These differences may justify separate, clade-specific allometric equations for prediction of avian basal metabolic rates.
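
The independent-contrasts side of this comparison can be sketched on a toy tree. This is a minimal illustration assuming Brownian motion and the standard Felsenstein recursion. The three-species tree, branch lengths, trait values, and function names are all invented for the example and are not from the paper.

```python
import math


def contrasts(node, trait):
    """Return (node value, adjusted branch length, standardized contrasts).

    A tip is (name, branch_length); an internal node is
    (left_subtree, right_subtree, branch_length).
    """
    if len(node) == 2:
        name, length = node
        return trait[name], length, []
    left, right, length = node
    x1, v1, c1 = contrasts(left, trait)
    x2, v2, c2 = contrasts(right, trait)
    # Standardized contrast between the two daughter lineages.
    standardized = (x1 - x2) / math.sqrt(v1 + v2)
    # Ancestral value: average weighted by inverse branch lengths.
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # Lengthen the branch below to account for uncertainty in the estimate.
    adjusted = length + v1 * v2 / (v1 + v2)
    return value, adjusted, c1 + c2 + [standardized]


def slope_through_origin(cx, cy):
    """Least-squares slope of cy on cx with no intercept."""
    return sum(a * b for a, b in zip(cx, cy)) / sum(a * a for a in cx)


# Hypothetical tree ((A:1,B:1):1,C:2) with made-up trait values.
tree = ((("A", 1.0), ("B", 1.0), 1.0), ("C", 2.0), 0.0)
x = {"A": 1.0, "B": 3.0, "C": 6.0}
y = {"A": 2.0, "B": 4.0, "C": 9.0}
_, _, cx = contrasts(tree, x)
_, _, cy = contrasts(tree, y)
pic_slope = slope_through_origin(cx, cy)
print(pic_slope)
```

For this toy tree, a generalized least squares fit with an intercept and the Brownian-motion covariance matrix implied by the same branch lengths yields the same slope (31/23), which is the kind of equivalence the abstract describes.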
