Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
M Tsujitani, G G Koch. Biometrics 1991, 47(3):1135-1141
This article describes graphical diagnostic methods for log odds ratio regression models. To study the effects of an additional covariate on log odds ratio regression analysis, three types of residual plots based on weighted least squares (WLS) are discussed: (i) added variable plot (partial regression plot), (ii) partial residual plot, and (iii) augmented partial residual plot. These plots provide diagnostic procedures for identifying heterogeneity of error variances, outliers, or nonlinearity of the model. They are especially useful for clarifying whether including a covariate as a linear term is appropriate, or whether quadratic or other nonlinear transformations are preferable. A well-known data set for case-control studies is analyzed to illustrate the residual plots.
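
As a rough illustration of the added variable plot mentioned in this abstract, the sketch below computes its coordinates for a weighted least-squares fit with one existing covariate `x` and one candidate covariate `z`. It is a minimal sketch, not the authors' log odds ratio procedure: the function names, the single-covariate baseline model, and the toy data are all assumptions made for illustration.

```python
def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b


def wls_residuals(x, y, w):
    """Residuals of y after the weighted fit on (1, x)."""
    a, b = wls_line(x, y, w)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def added_variable_points(x, z, y, w):
    """Coordinates of the added variable plot for candidate covariate z.

    Horizontal axis: residuals of z on (1, x).
    Vertical axis:   residuals of y on (1, x).
    Also returns the weighted through-origin slope of the plotted points.
    """
    rz = wls_residuals(x, z, w)
    ry = wls_residuals(x, y, w)
    slope = (sum(wi * u * v for wi, u, v in zip(w, rz, ry))
             / sum(wi * u * u for wi, u in zip(w, rz)))
    return rz, ry, slope


if __name__ == "__main__":
    # Hypothetical data: y depends exactly linearly on x and z.
    x = [0, 1, 2, 3, 4, 5]
    z = [1, 0, 2, 1, 3, 5]
    w = [1, 2, 1, 3, 1, 2]
    y = [1 + 2 * xi + 3 * zi for xi, zi in zip(x, z)]
    rz, ry, slope = added_variable_points(x, z, y, w)
    print(slope)
```

A roughly linear cloud in this plot suggests that entering `z` as a linear term is adequate, while curvature suggests trying a transformation. In the toy data the slope recovers the coefficient of `z` (3, up to rounding) because `y` is an exact linear function of `x` and `z`.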

2.
G Heller, J S Simonoff. Biometrics 1992, 48(1):101-115
Although the analysis of censored survival data using the proportional hazards and linear regression models is common, there has been little work examining the ability of these estimators to predict time to failure. This is unfortunate, since a predictive plot illustrating the relationship between time to failure and a continuous covariate can be far more informative regarding the risk associated with the covariate than a Kaplan-Meier plot obtained by discretizing the variable. In this paper the predictive power of the Cox (1972, Journal of the Royal Statistical Society, Series B 34, 187-202) proportional hazards estimator and the Buckley-James (1979, Biometrika 66, 429-436) censored regression estimator are compared. Using computer simulations and heuristic arguments, it is shown that the choice of method depends on the censoring proportion, strength of the regression, the form of the censoring distribution, and the form of the failure distribution. Several examples are provided to illustrate the usefulness of the methods.

3.
Abstract. We examined variation in bird species richness, abundance and guild composition along an agricultural gradient in New Guinea, and looked for any additive influence of habitat heterogeneity on these variables. The study was based on a grid of survey plots, six plots wide and 24 plots long with the long axis running from a settlement 2.4 km through active and abandoned agricultural plots towards a large area of forest. Each circular survey plot (25 m radius) was assigned to a broad habitat type, ten habitat measures taken, and birds counted for 1 h in each plot. Principal component analysis (PCA) habitat axis 1 described an axis of decreasing forest alteration (larger trees, greater tree densities, fuller canopy) that was positively correlated with distance from the settlement. Bird richness and abundance were highest at intermediate disturbance levels (plots with mid‐range axis 1 scores). Proportions of insectivores and frugivores increased with decreasing forest alteration, while proportions of nectarivores decreased. We calculated three measures of habitat heterogeneity by comparing each plot's PCA score to those of eight neighbouring plots (50–110 m away). These measures reflected how different the plot was to its neighbours, how variable the habitat was around the plot, and the degree to which the plot bordered less disturbed forest. We related these measures to plot bird variable scores independently, and to residuals following regressions of bird scores against PCA scores. Heterogeneity measures had no significant influence on abundance or richness measures, but there were greater proportions of frugivores in plots showing a given degree of habitat alteration if they bordered more pristine habitat. While we readily identified differences in bird communities along the agricultural gradient, the influences of habitat heterogeneity were not striking for birds at this fine scale.

4.
Abstract. In European phytosociology, variable plot sizes are traditionally used for sampling different vegetation types. This practice may generate problems in current vegetation or habitat survey projects based on large data sets, which include relevés made by many authors at different times. In order to determine the extent of variation in plot sizes used in European phytosociology, we collected a data set of 41 174 relevés with an indication of plot size, published in six major European journals focusing on phytosociology from 1970 to 2000. As an additional data set, we took 27 365 relevés from the Czech National Phytosociological Database. From each data set, we calculated basic statistical figures for plot sizes used to sample vegetation of various phytosociological classes. The results show that in Europe the traditionally used size of vegetation plots is roughly proportional to vegetation height; however, there is a large variation in plot size, both within and among vegetation classes. The effect of variable plot sizes on vegetation analysis and classification is not sufficiently known, but use of standardized plot sizes would be desirable in future projects of vegetation or habitat survey. Based on our analysis, we suggest four plot sizes as possible standards. They are 4 m² for sampling aquatic vegetation and low‐grown herbaceous vegetation, 16 m² for most grassland, heathland and other herbaceous or low‐scrub vegetation types, 50 m² for scrub, and 200 m² for woodlands. It has been pointed out that in some situations, sampling in either small or large plots may result in assignment of relevés to different phytosociological classes or habitat types. Therefore defining vegetation and habitat types as scale‐dependent concepts is needed.

5.
This study describes changes in woody vegetation in the Mwanihana forest, Udzungwa Mountains National Park, Tanzania, over an altitude range of 470–1700 m. Two methods, fixed‐ and variable‐area plots, are compared to elucidate altitudinal variation in tropical forest structure, diversity and community composition. Six 25 m × 100 m fixed area plots recorded a total of 2143 woody stems of ≥3 cm d.b.h. from 204 species. The 78 variable‐area plots recorded the nearest twenty trees of ≥20 cm d.b.h. to an objectively chosen point, giving a total of 1560 stems in 9.1 ha from 156 species. A linear trend of increasing stem density with altitude was seen for variable‐area plots. Species diversity is highest at high elevations. There was no clear zonation of elevational vegetation types. Restricted range taxa occur at all altitudes sampled. The study also revealed some methodological considerations. Bias in sample size and plot area can be tested by employing two sampling methods. Of the two methods used, fixed area plots are preferred as variable area plots are impractical in tangled understorey. Plot size must be controlled for in order to make reliable observations of diversity. Sampling along a continuous or near‐continuous altitudinal gradient with sufficient replication is also important.

6.
Li L, Shao J, Palta M. Biometrics 2005, 61(3):824-830
Covariate measurement error in regression is typically assumed to act in an additive or multiplicative manner on the true covariate value. However, such an assumption does not hold for the measurement error of sleep-disordered breathing (SDB) in the Wisconsin Sleep Cohort Study (WSCS). The true covariate is the severity of SDB, and the observed surrogate is the number of breathing pauses per unit time of sleep, which has a nonnegative semicontinuous distribution with a point mass at zero. We propose a latent variable measurement error model for the error structure in this situation and implement it in a linear mixed model. The estimation procedure is similar to regression calibration but involves a distributional assumption for the latent variable. Modeling and model-fitting strategies are explored and illustrated through an example from the WSCS.

7.
Lin DY, Wei LJ, Ying Z. Biometrics 2002, 58(1):1-12
Residuals have long been used for graphical and numerical examinations of the adequacy of regression models. Conventional residual analysis based on the plots of raw residuals or their smoothed curves is highly subjective, whereas most numerical goodness-of-fit tests provide little information about the nature of model misspecification. In this paper, we develop objective and informative model-checking techniques by taking the cumulative sums of residuals over certain coordinates (e.g., covariates or fitted values) or by considering some related aggregates of residuals, such as moving sums and moving averages. For a variety of statistical models and data structures, including generalized linear models with independent or dependent observations, the distributions of these stochastic processes under the assumed model can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be easily generated by computer simulation. Each observed process can then be compared, both graphically and numerically, with a number of realizations from the Gaussian process. Such comparisons enable one to assess objectively whether a trend seen in a residual plot reflects model misspecification or natural variation. The proposed techniques are particularly useful in checking the functional form of a covariate and the link function. Illustrations with several medical studies are provided.
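
The cumulative-sum idea in this abstract can be sketched for the simplest case, a straight-line least-squares fit with distinct covariate values. The code below only computes the observed process (residuals accumulated in order of the covariate and scaled by the square root of the sample size); the simulated Gaussian reference realizations described in the abstract are deliberately omitted. Function names and the toy data are assumptions for illustration, not the authors' implementation.

```python
import math


def ols_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    return ybar - b * xbar, b


def cumulative_residual_process(x, y):
    """Observed cumulative-residual process over the covariate x.

    Returns a list of (x_value, W) pairs, where W is the sum of the
    residuals of all observations with covariate <= x_value, divided
    by sqrt(n). Assumes the x values are distinct.
    """
    n = len(x)
    a, b = ols_line(x, y)
    order = sorted(range(n), key=lambda i: x[i])
    running = 0.0
    process = []
    for i in order:
        running += y[i] - (a + b * x[i])
        process.append((x[i], running / math.sqrt(n)))
    return process


if __name__ == "__main__":
    # Hypothetical data: the true relationship is quadratic, but a
    # straight line is fitted, so the process drifts systematically.
    x = [-3, -2, -1, 0, 1, 2, 3]
    y = [xi ** 2 for xi in x]
    for x_value, w in cumulative_residual_process(x, y):
        print(x_value, round(w, 3))
```

With an intercept in the model the residuals sum to zero, so the process always returns to zero at the largest covariate value. What carries the diagnostic information is the size and shape of the excursion in between, judged against realizations simulated under the assumed model.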

8.
It is a common practice to analyze the data from agricultural experiments using randomized block design where each block has homogeneous experimental units (plots). However, many times, during the period of study, due to unavoidable circumstances such as natural calamities, say floods, for example, plots are no longer homogeneous. Thus, it is desirable to consider the plot effects also in the model. An appropriate model and an example of the analysis of data with plot effects is provided here.

9.
Recently, several multiple plot similarity indices have been presented that cure some of the problems associated with the approaches for the calculation of compositional similarity for groups of plots by averaging pairwise similarities. These new indices calculate the similarity between more than two plots whilst considering the species composition on all compared plots. The resulting similarity value is true for the whole group of plots considered (called neighborhood in the following). Here, we review the possibilities for multiple plot similarity calculation and additionally explore coefficients that examine multiple plot similarity between a reference plot (named focal plot in the following) and any number of surrounding plots. The latter represent measures of singularity. Further, we establish a framework for applying these two kinds of multiple plot measures to gridded data including an algorithm for testing the significance of calculated values against random expectations. The capability of multiple plot measures for detecting species compositional gradients and local/regional hotspots within this framework is tested. For this purpose, several artificial data sets with known gradients in species composition (random, gradient, central hotspot, hotspot bottom right) are constructed on the basis of a real data set from a Tundra ecosystem in northern Sweden (Abisko). The coefficients that best reflect the positions of the plots on the realized gradients in species composition are considered as performing best with regard to pattern detection. The tested measures of multiple plot similarity and singularity produced considerably different results when applied to one real and 4 artificial data sets. The newly proposed symmetric singularity coefficient has the best overall performance which makes it suitable for local/regional hotspot detection and for incorporating local to regional similarity analyses in reserve selection procedures.

10.
Error propagation and scaling for tropical forest biomass estimates   (cited by: 10 in total; self-citations: 0; citations by others: 10)
The above-ground biomass (AGB) of tropical forests is a crucial variable for ecologists, biogeochemists, foresters and policymakers. Tree inventories are an efficient way of assessing forest carbon stocks and emissions to the atmosphere during deforestation. To make correct inferences about long-term changes in biomass stocks, it is essential to know the uncertainty associated with AGB estimates, yet this uncertainty is rarely evaluated carefully. Here, we quantify four types of uncertainty that could lead to statistical error in AGB estimates: (i) error due to tree measurement; (ii) error due to the choice of an allometric model relating AGB to other tree dimensions; (iii) sampling uncertainty, related to the size of the study plot; (iv) representativeness of a network of small plots across a vast forest landscape. In previous studies, these sources of error were reported but rarely integrated into a consistent framework. We estimate all four terms in a 50 hectare (ha, where 1 ha = 10⁴ m²) plot on Barro Colorado Island, Panama, and in a network of 1 ha plots scattered across central Panama. We find that the most important source of error is currently related to the choice of the allometric model. More work should be devoted to improving the predictive power of allometric models for biomass.

11.
An experiment was carried out to assess the significance of inter-plot competition in a yield trial of potato cultivars. Seventeen cultivars were deliberately chosen and assessed for yield in single-drill and four-drill plots. Inter-plot competition for fresh-weight yield was a significant factor in the single-drill plots. It was modelled using a common competition coefficient with a covariate based on neighbour fresh-weight yields. In contrast, there was no statistically significant inter-plot competition for specific gravity. After adjustment for inter-plot competition, varietal ranking in estimated monoculture yield differed little from that based on unadjusted means. However, there was a reduction in the range of yield estimates, and a closer agreement with the observed pure-stand yields from the inner two drills of the four-drill plots. The adjustment for monoculture performance was most pronounced for the higher and lower yielding varieties, as expected from the assumption that the performance of high yielding varieties was enhanced in a competitive environment at the expense of low yielding ones. A general and flexible method of estimating competition coefficients in variety trials, together with a suitable algorithm, was developed and is explained in an appendix. It was used to check for inter-plot competition in a number of potato trials with single-drill plots and a large number of entries. Competition was found in some trials but not in others. Thus, where potato tubers are grown in single-drill plots for assessment of fresh-weight yield, adjustment should be made for inter-plot competition when evidence of inter-drill competition is found.

12.
Multiple imputation (MI) is used to handle missing at random (MAR) data. Despite warnings from statisticians, continuous variables are often recoded into binary variables. With MI it is important that the imputation and analysis models are compatible; variables should be imputed in the same form they appear in the analysis model. With an encoded binary variable more accurate imputations may be obtained by imputing the underlying continuous variable. We conducted a simulation study to explore how best to impute a binary variable that was created from an underlying continuous variable. We generated a completely observed continuous outcome associated with an incomplete binary covariate that is a categorized version of an underlying continuous covariate, and an auxiliary variable associated with the underlying continuous covariate. We simulated data with several sample sizes, and set 25% and 50% of data in the covariate to MAR dependent on the outcome and the auxiliary variable. We compared the performance of five different imputation methods: (a) Imputation of the binary variable using logistic regression; (b) imputation of the continuous variable using linear regression, then categorizing into the binary variable; (c, d) imputation of both the continuous and binary variables using fully conditional specification (FCS) and multivariate normal imputation; (e) substantive-model compatible (SMC) FCS. Bias and standard errors were large when the continuous variable only was imputed. The other methods performed adequately. Imputation of both the binary and continuous variables using FCS often encountered mathematical difficulties. We recommend the SMC-FCS method as it performed best in our simulation studies.

13.
Summary. Soil movement from plot to plot in long-term field experiments caused by tillage, soil fauna, wind, and water leads to experimental errors. The paper attempts to quantify the total movement in current long-term field experiments. A soil movement model was fitted to soil-phosphorus (P) recordings of two 90-year-old field experiments. The model fitted well and indicated why the soil P concentration of the unmanured plots had increased for many years. The removal of P by crops from the unmanured plots had simply been more than compensated for by the soil exchange between the unmanured plots and the adjacent P-fertilized ones. Furthermore, the model was used for simulating soil movement in 21 of the world's field experiments that are more than 50 years old, assuming the same rate of soil transport as estimated before. The simulations showed that, on average across these experiments, only 28% of the plough-layer soil present in their net-plots (the central quarter of each plot) today should originate from the plough-layer soil that was inside the plots when the experiments started. The work indicates that the movement of soil is a serious general problem in long-term field experiments, a problem with implications for our understanding of long-term processes in agro-ecosystems.

14.
Dunson DB, Perreault SD. Biometrics 2001, 57(1):302-308
This article describes a general class of factor analytic models for the analysis of clustered multivariate data in the presence of informative missingness. We assume that there are distinct sets of cluster-level latent variables related to the primary outcomes and to the censoring process, and we account for dependency between these latent variables through a hierarchical model. A linear model is used to relate covariates and latent variables to the primary outcomes for each subunit. A generalized linear model accounts for covariate and latent variable effects on the probability of censoring for subunits within each cluster. The model accounts for correlation within clusters and within subunits through a flexible factor analytic framework that allows multiple latent variables and covariate effects on the latent variables. The structure of the model facilitates implementation of Markov chain Monte Carlo methods for posterior estimation. Data from a spermatotoxicity study are analyzed to illustrate the proposed approach.

15.
In this study, an individual tree crown ratio (CR) model was developed with a data set from a total of 3134 Mongolian oak (Quercus mongolica) trees within 112 sample plots allocated in Wangqing Forest Bureau of northeast China. Because of high correlation among the observations taken from the same sampling plots, the random effects at levels of both blocks defined as stands that have different site conditions and plots were taken into account to develop a nested two-level nonlinear mixed-effect model. Various stand and tree characteristics were assessed to explore their contributions to improvement of model prediction. Diameter at breast height, plot dominant tree height and plot dominant tree diameter were found to be significant predictors. Exponential model with plot dominant tree height as a predictor had a stronger ability to account for the heteroskedasticity. When random effects were modeled at block level alone, the correlations among the residuals remained significant. These correlations were successfully reduced when random effects were modeled at both block and plot levels. The random effects from the interaction of blocks and sample plots on tree CR were substantially large. The model that took into account both the block effect and the interaction of blocks and sample plots had higher prediction accuracy than the one with the block effect and population average considered alone. Introducing stand density into the model through dummy variables could further improve its prediction. This implied that the developed method for developing tree CR models of Mongolian oak is promising and can be applied to similar studies for other tree species.

16.
The focus of many medical applications is to model the impact of several factors on time to an event. A standard approach for such analyses is the Cox proportional hazards model. It assumes that the factors act linearly on the log hazard function (linearity assumption) and that their effects are constant over time (proportional hazards (PH) assumption). Variable selection is often required to specify a more parsimonious model aiming to include only variables with an influence on the outcome. As follow-up increases the effect of a variable often gets weaker, which means that it varies in time. However, spurious time-varying effects may also be introduced by mismodelling other parts of the multivariable model, such as omission of an important covariate or an incorrect functional form of a continuous covariate. These issues interact. To check whether the effect of a variable varies in time several tests for non-PH have been proposed. However, they are not sufficient to derive a model, as appropriate modelling of the shape of time-varying effects is required. In three examples we will compare five recently published strategies to assess whether and how the effects of covariates from a multivariable model vary in time. For practical use we will give some recommendations.

17.
The scatter plot is a well known and easily applicable graphical tool to explore relationships between two quantitative variables. For the exploration of relations between multiple variables, generalisations of the scatter plot are useful. We present an overview of multivariate scatter plots focussing on the following situations. Firstly, we look at a scatter plot for portraying relations between quantitative variables within one data matrix. Secondly, we discuss a similar plot for the case of qualitative variables. Thirdly, we describe scatter plots for the relationships between two sets of variables where we focus on correlations. Finally, we treat plots of the relationships between multiple response and predictor variables, focussing on the matrix of regression coefficients. We will present both known and new results, where an important original contribution concerns a procedure for the inclusion of scales for the variables in multivariate scatter plots. We provide software for drawing such scales. We illustrate the construction and interpretation of the plots by means of examples on data collected in a genomic research program on taste in tomato.

18.
19.
Historical ecological data are valuable for reconstructing early environmental and vegetation community conditions and examining change to vegetation communities and disturbance regimes over decadal and longer temporal scales, but these data are not free from error. We examine the spatial uncertainties associated with 18,000 vegetation plots in the decades-old California Vegetation Type Mapping (VTM) dataset that has been digitized for use in modern ecological analysis. We examine the relationship between plot location error and basemap year, basemap scale, plot elevation, plot slope, and general plot habitat type. Bivariate plots and classification and regression tree analysis (CART) confirm that basemap scale and age are the strongest explanation of total error. Total error in spatial location for all plots ranged from 126.9 m to 462.3 m; plots drawn on 15-min (1:62,500-scale) basemaps had total error ranging from 126 m to 199.7 m, and plots drawn on coarser-scale basemaps (1:125,000-scale) had total errors ranging from 241 m to 461.2 m. Relocation of individual VTM plots is considerably easier for plots originally marked on 1:62,500-scale maps produced after 1904, and more difficult for plots originally marked on 1:125,000-scale maps produced before 1898. Biogeographical analyses that rely less on relocating individual plots, such as environmental niche modeling or multivariate analyses can alleviate some of these concerns, but all researchers using these kinds of data need to consider errors in spatial location of plots. The paper also discusses ways in which the differing spatial error might be reported and visualized by those using the dataset, and how the data might be used in modern environmental niche models.

20.
Analysis with time-to-event data in clinical and epidemiological studies often encounters missing covariate values, and the missing at random assumption is commonly adopted, which assumes that missingness depends on the observed data, including the observed outcome which is the minimum of survival and censoring time. However, it is conceivable that in certain settings, missingness of covariate values is related to the survival time but not to the censoring time. This is especially so when covariate missingness is related to an unmeasured variable affected by the patient's illness and prognosis factors at baseline. If this is the case, then the covariate missingness is not at random as the survival time is censored, and it creates a challenge in data analysis. In this article, we propose an approach to deal with such survival-time-dependent covariate missingness based on the well known Cox proportional hazard model. Our method is based on inverse propensity weighting with the propensity estimated by nonparametric kernel regression. Our estimators are consistent and asymptotically normal, and their finite-sample performance is examined through simulation. An application to a real-data example is included for illustration.
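
The weighting step named in this abstract can be sketched in a much simplified setting. The code below estimates the probability that a covariate is observed as a smooth function of time with a Nadaraya-Watson (Gaussian kernel) regression, then gives each complete record a weight equal to the inverse of that estimate. It assumes every time is fully observed, so it ignores the censoring that is the central difficulty of the paper. The function names, the Gaussian kernel, and the fixed bandwidth are illustrative assumptions, not the authors' estimator.

```python
import math


def kernel_propensity(t0, times, observed, bandwidth):
    """Nadaraya-Watson estimate of P(covariate observed | time = t0).

    times:     event times (assumed fully observed in this sketch)
    observed:  1 if the covariate was recorded for that subject, else 0
    bandwidth: Gaussian kernel bandwidth (hypothetical tuning parameter)
    """
    kernel = [math.exp(-0.5 * ((t0 - t) / bandwidth) ** 2) for t in times]
    return sum(k * r for k, r in zip(kernel, observed)) / sum(kernel)


def inverse_propensity_weights(times, observed, bandwidth):
    """Weight 1/propensity for complete records and 0 for incomplete ones."""
    weights = []
    for t, r in zip(times, observed):
        if r:
            weights.append(1.0 / kernel_propensity(t, times, observed, bandwidth))
        else:
            weights.append(0.0)
    return weights


if __name__ == "__main__":
    # Hypothetical data: covariates are missing more often at short times.
    times = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0]
    observed = [0, 1, 0, 1, 1, 1, 1, 1]
    for t, w in zip(times, inverse_propensity_weights(times, observed, 1.0)):
        print(t, round(w, 3))
```

Complete records from regions where missingness is common receive larger weights, so they stand in for the similar subjects whose covariates were not recorded. These weights would then enter the estimating equations of the outcome model.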
