Similar documents
20 similar documents found
1.
Spatial autocorrelation and red herrings in geographical ecology   Cited by: 14 (1 self-citation, 13 by others)
Aim Spatial autocorrelation in ecological data can inflate Type I errors in statistical analyses. There has also been a recent claim that spatial autocorrelation generates ‘red herrings’, such that virtually all past analyses are flawed. We consider the origins of this phenomenon, the implications of spatial autocorrelation for macro‐scale patterns of species diversity and set out a clarification of the statistical problems generated by its presence. Location To illustrate the issues involved, we analyse the species richness of the birds of western/central Europe, north Africa and the Middle East. Methods Spatial correlograms for richness and five environmental variables were generated using Moran's I coefficients. Multiple regression, using both ordinary least‐squares (OLS) and generalized least squares (GLS) assuming a spatial structure in the residuals, were used to identify the strongest predictors of richness. Autocorrelation analyses of the residuals obtained after stepwise OLS regression were undertaken, and the ranks of variables in the full OLS and GLS models were compared. Results Bird richness is characterized by a quadratic north–south gradient. Spatial correlograms usually had positive autocorrelation up to c. 1600 km. Including the environmental variables successively in the OLS model reduced spatial autocorrelation in the residuals to non‐detectable levels, indicating that the variables explained all spatial structure in the data. In principle, if residuals are not autocorrelated then OLS is a special case of GLS. However, our comparison between OLS and GLS models including all environmental variables revealed that GLS de‐emphasized predictors with strong autocorrelation and long‐distance clinal structures, giving more importance to variables acting at smaller geographical scales. Conclusion Although spatial autocorrelation should always be investigated, it does not necessarily generate bias. 
Rather, it can be a useful tool to investigate mechanisms operating on richness at different spatial scales. Claims that analyses that do not take into account spatial autocorrelation are flawed are without foundation.
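Moran's I, the coefficient behind the correlograms described above, is straightforward to compute from a vector of values and a spatial weight matrix. The sketch below is illustrative only (the function name and toy data are hypothetical, not taken from the study):

```python
import numpy as np

def morans_i(values, weights):
    """Moran's I spatial autocorrelation coefficient.

    values  : 1-D array of observations (e.g. species richness per cell)
    weights : (n, n) spatial weight matrix (e.g. 1 for neighbours, 0 otherwise)
    """
    z = values - values.mean()
    n = len(values)
    w_sum = weights.sum()
    num = n * (weights * np.outer(z, z)).sum()
    den = w_sum * (z ** 2).sum()
    return num / den

# Toy example: a smooth gradient along a line of 5 sites with
# adjacent-neighbour weights shows positive autocorrelation.
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
W = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
print(morans_i(vals, W))  # → 0.5
```

Binning pairs of sites by distance class and computing I within each class yields the correlogram described in the abstract.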

2.
In isothermal titration calorimetry (ITC), the two main sources of random (statistical) error are associated with the extraction of the heat q from the measured temperature changes and with the delivery of metered volumes of titrant. The former leads to uncertainty that is approximately constant and the latter to uncertainty that is proportional to q. The role of these errors in the analysis of ITC data by nonlinear least squares is examined for the case of 1:1 binding, M + X ⇌ MX. The standard errors in the key parameters, the equilibrium constant K° and the enthalpy ΔH°, are assessed from the variance-covariance matrix computed for exactly fitting data. Monte Carlo calculations confirm that these "exact" estimates will normally suffice and show further that neglect of weights in the nonlinear fitting can result in significant loss of efficiency. The effects of the titrant volume error are strongly dependent on assumptions about the nature of this error: If it is random in the integral volume instead of the differential volume, correlated least-squares is required for proper analysis, and the parameter standard errors decrease with increasing number of titration steps rather than increase.
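The interplay of weighting and Monte Carlo error assessment described above can be illustrated with a toy example. The model below is a simplified saturation curve, not the paper's full ITC heat model, and all names and numbers are hypothetical:

```python
import numpy as np
from scipy.optimize import curve_fit

# Simplified 1:1 binding signal (NOT the full ITC heat model):
# q(x) = dH * K*x / (1 + K*x), a saturation form in total ligand x.
def q_model(x, K, dH):
    return dH * K * x / (1.0 + K * x)

rng = np.random.default_rng(0)
x = np.linspace(0.1, 5.0, 20)
K_true, dH_true = 2.0, -10.0
# Constant plus proportional error, mimicking the two sources above.
sigma = 0.05 + 0.02 * np.abs(q_model(x, K_true, dH_true))

# Monte Carlo: refit many noisy synthetic data sets; the spread of the
# estimates can be compared with the covariance reported by curve_fit.
fits = []
for _ in range(300):
    y = q_model(x, K_true, dH_true) + rng.normal(0.0, sigma)
    p, _ = curve_fit(q_model, x, y, p0=(1.0, -5.0),
                     sigma=sigma, absolute_sigma=True)
    fits.append(p)
fits = np.array(fits)
print("MC std of K, dH:", fits.std(axis=0))
```

Refitting the same synthetic data without the `sigma` argument (unweighted) demonstrates the loss of efficiency the abstract describes.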

3.
Short phylogenetic distances between taxa occur, for example, in studies on ribosomal RNA-genes with slow substitution rates. For consistently short distances, it is proved that in the completely singular limit of the covariance matrix ordinary least squares (OLS) estimates are minimum variance or best linear unbiased (BLU) estimates of phylogenetic tree branch lengths. Although OLS estimates are in this situation equal to generalized least squares (GLS) estimates, the GLS chi-square likelihood ratio test will be inapplicable as it is associated with zero degrees of freedom. Consequently, an OLS normal distribution test or an analogous bootstrap approach will provide optimal branch length tests of significance for consistently short phylogenetic distances. As the asymptotic covariances between branch lengths will be equal to zero, it follows that the product rule can be used in tree evaluation to calculate an approximate simultaneous confidence probability that all interior branches are positive.

4.
Stocks of commercial fish are often modelled using sampling data of various types, of unknown precision, and from various sources assumed independent. We want each set to contribute to estimates of the parameters in relation to its precision and goodness of fit with the model. Iterative re-weighting of the sets is proposed for linear models until the weight of each set is found to be proportional to (relative weighting) or equal to (absolute weighting) the set-specific residual variances resulting from a generalised least squares fit. Formulae for the residual variances are put forward involving fractional allocation of degrees of freedom depending on the numbers of independent observations in each set, the numbers of sets contributing to the estimate of each parameter, and the number of weights estimated. To illustrate the procedure, numbers of the 1984 year-class of North Sea cod (a) landed commercially each year, and (b) caught per unit of trawling time by an annual groundfish survey are modelled as a function of age to estimate total mortality, Z, relative catching power of the two fishing methods, and relative precision of the two sets of observations as indices of stock abundance. It was found that the survey abundance indices displayed residual variance about 29 times higher than that of the annual landings.
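The iterative re-weighting scheme described above can be sketched for a linear model with two data sets of unknown, unequal precision. This is a simplified illustration that ignores the paper's fractional degrees-of-freedom allocation; the function name and data are hypothetical:

```python
import numpy as np

def iterative_set_weights(X_sets, y_sets, n_iter=20):
    """Iteratively re-weight data sets by their residual variances.

    Each set contributes to a common linear model y = X b; after each
    weighted fit, a set's weight is reset to the inverse of its own
    mean squared residual (relative weighting).
    """
    w = np.ones(len(X_sets))
    for _ in range(n_iter):
        Xw = np.vstack([np.sqrt(wi) * Xi for wi, Xi in zip(w, X_sets)])
        yw = np.concatenate([np.sqrt(wi) * yi for wi, yi in zip(w, y_sets)])
        b, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
        w = np.array([1.0 / np.mean((yi - Xi @ b) ** 2)
                      for Xi, yi in zip(X_sets, y_sets)])
    return b, w

# Two sets measuring the same line y = 1 + 2x, one far noisier.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 40)
X = np.column_stack([np.ones_like(x), x])
y_precise = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.1, x.size)
y_noisy = X @ np.array([1.0, 2.0]) + rng.normal(0, 1.0, x.size)
b, w = iterative_set_weights([X, X], [y_precise, y_noisy])
print(b, w[0] / w[1])  # slope near 2; the precise set receives far more weight
```

The fixed point of the iteration gives weights proportional to the inverse residual variances, so the precise set dominates the parameter estimates, as intended.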

5.
Comparative methods are widely used in ecology and evolution. The most frequently used comparative methods are based on an explicit evolutionary model. However, recent approaches have been popularized that are without an evolutionary basis or an underlying null model. Here we highlight the limitations of such techniques in comparative analyses by using simulations to compare two commonly used comparative methods with and without evolutionary basis, respectively: generalized least squares (GLS) and phylogenetic eigenvector regression (PVR). We find that GLS methods are more efficient at estimating model parameters and produce lower variance in parameter estimates, lower phylogenetic signal in residuals, and lower Type I error rates than PVR methods. These results can very likely be generalized to eigenvector methods that control for space and both space and phylogeny. We highlight that GLS methods can be adapted in numerous ways and that the variance structure used in these models can be flexibly optimized to each data set.
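The GLS estimator at the heart of this comparison has a simple closed form, b = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y, where V encodes the assumed error covariance (e.g. from a Brownian-motion model on the phylogeny). A minimal sketch, not the authors' implementation, with a placeholder covariance:

```python
import numpy as np

def gls_fit(X, y, V):
    """Generalized least squares: b = (X' V^-1 X)^-1 X' V^-1 y.

    V is the assumed error covariance matrix; with V = I this
    reduces to ordinary least squares.
    """
    Vi = np.linalg.inv(V)
    A = X.T @ Vi @ X
    b = np.linalg.solve(A, X.T @ Vi @ y)
    return b, np.linalg.inv(A)  # estimate and (unscaled) covariance

# Sanity check: with an identity covariance, GLS coincides with OLS.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(25), rng.normal(size=25)])
y = X @ np.array([0.5, 1.5]) + rng.normal(0, 0.2, 25)
b_gls, _ = gls_fit(X, y, np.eye(25))
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(b_gls, b_ols))  # → True
```

Substituting a phylogenetic (or spatial) covariance for `np.eye(25)` gives the model-based estimator whose efficiency the simulations above evaluate.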

6.
Aim  In their recent paper, Kissling & Carl (2008) recommended the spatial error simultaneous autoregressive model (SARerr) over ordinary least squares (OLS) for modelling species distribution. We compared these models with the generalized least squares model (GLS) and a variant of SAR (SARvario). GLS and SARvario are superior to standard implementations of SAR because the spatial covariance structure is described by a semivariogram model.
Innovation  We used the complete datasets employed by Kissling & Carl (2008), with strong spatial autocorrelation, and two datasets in which the spatial structure was degraded by sample reduction and grid coarsening. GLS performed consistently better than OLS, SARerr and SARvario in all datasets, especially in terms of goodness of fit. SARvario was marginally better than SARerr in the degraded datasets.
Main conclusions  GLS was more reliable than SAR-based models, so its use is recommended when dealing with spatially autocorrelated data.

7.

Background

Questions about the reliability of parametric standard errors (SEs) from nonlinear least squares (LS) algorithms have led to a general mistrust of these precision estimators that is often unwarranted.

Methods

The importance of non-Gaussian parameter distributions is illustrated by converting linear models to nonlinear by substituting e^A, ln A, and 1/A for a linear parameter a. Monte Carlo (MC) simulations characterize parameter distributions in more complex cases, including when data have varying uncertainty and should be weighted, but weights are neglected. This situation leads to loss of precision and erroneous parametric SEs, as is illustrated for the Lineweaver-Burk analysis of enzyme kinetics data and the analysis of isothermal titration calorimetry data.

Results

Non-Gaussian parameter distributions are generally asymmetric and biased. However, when the parametric SE is < 10% of the magnitude of the parameter, both the bias and the asymmetry can usually be ignored. Sometimes nonlinear estimators can be redefined to give more normal distributions and better convergence properties.

Conclusion

Variable data uncertainty, or heteroscedasticity, can sometimes be handled by data transforms but more generally requires weighted LS, which in turn require knowledge of the data variance.

General significance

Parametric SEs are rigorously correct in linear LS under the usual assumptions, and are a trustworthy approximation in nonlinear LS provided they are sufficiently small — a condition favored by the abundant, precise data routinely collected in many modern instrumental methods.

8.
O'Neill ME, Mathews KL. Biometrics 2002, 58(1):216-224
This article develops a weighted least squares version of Levene's test of homogeneity of variance for a general design, available both for univariate and multivariate situations. When the design is balanced, the univariate and two common multivariate test statistics turn out to be proportional to the corresponding ordinary least squares test statistics obtained from an analysis of variance of the absolute values of the standardized mean-based residuals from the original analysis of the data. The constant of proportionality is simply a design-dependent multiplier (which does not necessarily tend to unity). Explicit results are presented for randomized block and Latin square designs and are illustrated for factorial treatment designs and split-plot experiments. The distribution of the univariate test statistic is close to a standard F-distribution, although it can be slightly underdispersed. For a complex design, the test assesses homogeneity of variance across blocks, treatments, or treatment factors and offers an objective interpretation of residual plots.
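The identity underlying this construction, that Levene's test is a one-way ANOVA on absolute deviations from the group centres, can be verified numerically. A sketch with arbitrary synthetic groups (the weighted, general-design version above reduces to this only in the balanced one-way case):

```python
import numpy as np
from scipy.stats import f_oneway, levene

# Three groups; the third has a larger spread.
rng = np.random.default_rng(2)
g1 = rng.normal(0, 1.0, 30)
g2 = rng.normal(0, 1.0, 30)
g3 = rng.normal(0, 3.0, 30)

# Levene's statistic (center='mean') equals the one-way ANOVA F
# computed on the absolute deviations from each group mean.
abs_dev = [np.abs(g - g.mean()) for g in (g1, g2, g3)]
stat_anova, p_anova = f_oneway(*abs_dev)
stat_lev, p_lev = levene(g1, g2, g3, center='mean')
print(stat_anova, stat_lev)  # identical by construction
```

Using `center='median'` instead gives the more robust Brown-Forsythe variant.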

9.
The experimental variance of enzymic steady-state kinetic experiments depends on velocity as approximated by a power function, Var(v) = K1·v^α (Askelöf, P., Korsfeldt, M. and Mannervik, B. (1976) Eur. J. Biochem. 69, 61-67). The values of the constants (K1, α) can be estimated by making replicate measurements of velocity, and the inverse of the function can then be used as a weighting factor. In order to avoid measurement of a large number of replicates to establish the error structure of a kinetic data set, a different approach was tested. After a preliminary regression using a 'good model', which satisfies reasonable goodness-of-fit criteria, the residuals were taken to represent the experimental error. The neighbouring residuals were grouped together and the sum of their mean squared values was used as a measure of the variance in the neighbourhood of the corresponding measurements. The values of the constants obtained in this way agreed with those obtained by replicates.
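The replicate-based route to the power-function error model Var(v) = K1·v^α amounts to regressing log sample variance on log mean velocity. A sketch with synthetic data; the constants and replicate counts are chosen arbitrarily for illustration:

```python
import numpy as np

# Simulate replicate velocity measurements whose variance follows
# Var(v) = K1 * v**alpha, then recover K1 and alpha by a log-log fit.
rng = np.random.default_rng(3)
K1_true, alpha_true = 0.01, 2.0
v_levels = np.array([0.5, 1.0, 2.0, 4.0, 8.0])

means, variances = [], []
for v in v_levels:
    reps = v + rng.normal(0, np.sqrt(K1_true * v ** alpha_true), 200)
    means.append(reps.mean())
    variances.append(reps.var(ddof=1))

# log Var = log K1 + alpha * log v  →  straight line in log-log space.
slope, intercept = np.polyfit(np.log(means), np.log(variances), 1)
print("alpha ~", slope, " K1 ~", np.exp(intercept))
```

The fitted 1/Var(v) values then serve as least-squares weights, which is exactly the role the abstract assigns to the inverse of the power function.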

10.
Analysis of fluorescence decay kinetics aims at the determination of the analytic expression and the numerical values of the pertinent parameters which describe the decay process. In the well-known method of least-squares, one assumes a plausible functional form for the decay data and adjusts the values of the parameters until the statistically best fit is obtained between the data and the calculated decay function, i.e., until the sum of the weighted squares of the residuals is at a minimum. It is shown that proper weighting of the squares of the residuals may markedly improve the quality of the analysis. Such weighting requires information about the character of the experimental noise, which is often available, e.g., when the noise is due to counting error in photon-counting techniques. Furthermore, dramatic improvements in the accuracy of the analysis may often be achieved by use of auxiliary information available about the system studied. For example, the preexponents in a multiexponential fluorescence decay of a mixture of chromophores (such as tryptophan residues in a protein molecule) may sometimes be estimated independently; much higher accuracy can then be attained for the decay lifetimes by analysis of the decay kinetics. It is proposed that the shape of the autocorrelation function of the weighted residuals may serve as a convenient criterion for the quality of fit between the experimental data and the decay function obtained by analysis. The above conclusions were reached by analysis of computer-simulated experiments, and the usefulness of this approach is illustrated. The importance of stating the uncertainties in the estimated parameters inherent in the analysis of decay kinetics is stressed.

11.
Huang J, Ma S, Xie H. Biometrics 2006, 62(3):813-820
We consider two regularization approaches, the LASSO and the threshold-gradient-directed regularization, for estimation and variable selection in the accelerated failure time model with multiple covariates based on Stute's weighted least squares method. The Stute estimator uses Kaplan-Meier weights to account for censoring in the least squares criterion. The weighted least squares objective function makes the adaptation of this approach to multiple covariate settings computationally feasible. We use V-fold cross-validation and a modified Akaike's Information Criterion for tuning parameter selection, and a bootstrap approach for variance estimation. The proposed method is evaluated using simulations and demonstrated on a real data example.

12.
Susko E. Systematic Biology 2011, 60(5):668-675
Generalized least squares (GLS) methods provide a relatively fast means of constructing a confidence set of topologies. Because they utilize information about the covariances between distances, it is reasonable to expect additional efficiency in estimation and confidence set construction relative to other least squares (LS) methods. Difficulties have been found to arise in a number of practical settings due to estimates of covariance matrices being ill conditioned or even noninvertible. We present here new ways of estimating the covariance matrices for distances that are much more likely to be positive definite, as the actual covariance matrices are. A thorough investigation of performance is also conducted. An alternative to GLS that has been proposed for constructing confidence sets of topologies is weighted least squares (WLS). As currently implemented, this approach is equivalent to the use of GLS but with covariances set to zero rather than being estimated. In effect, this approach assumes normality of the estimated distances and zero covariances. As the results here illustrate, this assumption leads to poor performance. A 95% confidence set is almost certain to contain the true topology but will contain many more topologies than are needed. On the other hand, the results here also indicate that, among LS methods, WLS performs quite well at estimating the correct topology. It turns out to be possible to improve the performance of WLS for confidence set construction through a relatively inexpensive normal parametric bootstrap that utilizes the same variances and covariances of GLS. The resulting procedure is shown to perform at least as well as GLS and thus provides a reasonable alternative in cases where covariance matrices are ill conditioned.

13.
The major and structurally unique glucosinolate (GLS) in leaves of Eruca sativa L. (salad rocket) was identified as 4-mercaptobutyl GLS. Both 4-methylthiobutyl GLS and 4-methylsulfinylbutyl GLS were also present, but at lower concentrations. The 4-mercaptobutyl GLS was observed to oxidise under common GLS extraction conditions, generating a disulfide GLS that may be reduced efficiently by tris(2-carboxyethyl) phosphine hydrochloride (TCEP) to reform the parent molecule. The identities of 4-mercaptobutyl GLS and of the corresponding dimeric GLS were confirmed by LC/MS, MS/MS and NMR. Myrosinase treatment of an enriched GLS fraction or of the purified dimer GLS generated a mixture of unique bi-functional disulfides, including bis-(4-isothiocyanatobutyl) disulfide (previously identified elsewhere). TCEP reduction of the purified dimer, followed by myrosinase treatment, yielded only 4-mercaptobutyl ITC. GLS-derived volatiles generated by autolysis of fresh seedlings and true leaves were 4-mercaptobutyl ITC (from the newly identified GLS), 4-methylthiobutyl ITC (from 4-methylthiobutyl GLS) and 4-methylsulfinylbutyl ITC (from 4-methylsulfinylbutyl GLS); no unusual bi-functional disulfides were found in fresh leaf autolysate. These results led to the conclusion that, in planta, the new GLS must be present as 4-mercaptobutyl GLS and not as the disulfide found after extraction and sample concentration. This new GLS and its isothiocyanate are likely to contribute to the unique odour and flavour of E. sativa.

14.
A method for fitting regression models to data that exhibit spatial correlation and heteroskedasticity is proposed. It is well known that ignoring a nonconstant variance does not bias least-squares estimates of regression parameters; thus, data analysts are easily led to the false belief that moderate heteroskedasticity can generally be ignored. Unfortunately, ignoring nonconstant variance when fitting variograms can seriously bias estimated correlation functions. By modeling heteroskedasticity and standardizing by estimated standard deviations, our approach eliminates this bias in the correlations. A combination of parametric and nonparametric regression techniques is used to iteratively estimate the various components of the model. The approach is demonstrated on a large data set of predicted nitrogen runoff from agricultural lands in the Midwest and Northern Plains regions of the U.S.A. For this data set, the model comprises three main components: (1) the mean function, which includes farming practice variables, local soil and climate characteristics, and the nitrogen application treatment, is assumed to be linear in the parameters and is fitted by generalized least squares; (2) the variance function, which contains a local and a spatial component whose shapes are left unspecified, is estimated by local linear regression; and (3) the spatial correlation function is estimated by fitting a parametric variogram model to the standardized residuals, with the standardization adjusting the variogram for the presence of heteroskedasticity. The fitting of these three components is iterated until convergence. The model provides an improved fit to the data compared with a previous model that ignored the heteroskedasticity and the spatial correlation.

15.

Background  

The least squares (LS) method for constructing confidence sets of trees is closely related to LS tree building methods, in which the goodness of fit of the distances measured on the tree (patristic distances) to the observed distances between taxa is the criterion used for selecting the best topology. The generalized LS (GLS) method for topology testing is often frustrated by the computational difficulties in calculating the covariance matrix and its inverse, which in practice requires approximations. The weighted LS (WLS) allows for a more efficient albeit approximate calculation of the test statistic by ignoring the covariances between the distances.

16.
The simultaneous estimation of individual growth curves and a mean growth curve is accomplished by weighted least squares. A polynomial curve is fitted for each individual and the polynomial parameters are linear functions of parameters corresponding to covariates. A simple, computationally efficient variance-covariance estimator is derived. The resultant estimate is used in the weighted least squares estimation. The results are compared to empirical Bayes estimation.

17.
A general purpose digital computer program is described for application to biological experiments that require a non-linear regression analysis. The mathematical function, or model, to be fitted to a given set of experimental data is written as a section within the program. Given initial estimates for the parameters of the function, the program uses an iterative procedure to adjust the parameters until the sum of squares of residuals has converged to a minimum.

18.
1. The normalization of biochemical data to weight them appropriately for parameter estimation is considered, with reference particularly to data from tracer kinetics and enzyme kinetics. If the data are in replicate, it is recommended that the sum of squared deviations for each experimental variable at each time or concentration point is divided by the local variance at that point. 2. If there is only one observation for each variable at each sampling point, normalization may still be required if the observations cover more than one order of magnitude, but there is no absolute criterion for judging the effect of the weighting that is produced. The goodness of fit that is produced by minimizing the weighted sum of squares of deviations must be judged subjectively. It is suggested that the goodness of fit may be regarded as satisfactory if the data points are distributed uniformly on either side of the fitted curve. A chi-square test may be used to decide whether the distribution is abnormal. The proportion of the residual variance associated with points on one or other side of the fitted curve may also be taken into account, because this gives an indication of the sensitivity of the residual variance to movement of the curve away from particular data points. These criteria for judging the effect of weighting are only valid if the model equation may reasonably be expected to apply to all the data points. 3. On this basis, normalizing by dividing the deviation for each data point by the experimental observation or by the equivalent value calculated by the model equation may both be shown to produce a consistent bias for numerically small observations, the former biasing the curve towards the smallest observations, the latter tending to produce a curve that is above the numerically smaller data points. 
It was found that dividing each deviation by the mean of observed and calculated variable appropriate to it produces a weighting that is fairly free from bias as judged by the criteria mentioned above. This normalization factor was tested on published data from both tracer kinetics and enzyme kinetics.
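The recommended normalization, dividing each deviation by the mean of the observed and calculated values, can be sketched for a single-exponential tracer decay. This is a hypothetical example, not the paper's data; the model and numbers are placeholders:

```python
import numpy as np
from scipy.optimize import least_squares

# Weight each deviation by the mean of the observed and calculated
# values, applied to a single-exponential model y = a * exp(-k * t).
def residuals(p, t, y):
    a, k = p
    y_calc = a * np.exp(-k * t)
    return (y - y_calc) / (0.5 * (y + y_calc))

rng = np.random.default_rng(4)
t = np.linspace(0, 5, 30)
y_true = 100.0 * np.exp(-1.0 * t)
# Roughly constant relative error, so observations span two orders
# of magnitude in scale -- the situation the abstract addresses.
y = y_true * (1 + rng.normal(0, 0.05, t.size))

fit = least_squares(residuals, x0=[50.0, 0.5], args=(t, y))
print(fit.x)  # estimates of a and k
```

Dividing by the observed value alone, or by the calculated value alone, reproduces the two biased schemes the abstract warns against.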

19.
Investigation of protein-ligand interactions obtained from experiments plays a crucial part in the design of newly discovered and effective drugs. Analyzing the data extracted from known interactions could help scientists to predict the binding affinities of promising ligands before conducting experiments. The objective of this study is to advance the CIFAP (compressed images for affinity prediction) method, which is relevant to a protein-ligand model, identifying 2D electrostatic potential images by separating the binding site of protein-ligand complexes and using the images for predicting the computational affinity information represented by pIC50 values. The CIFAP method has 2 phases, namely, data modeling and prediction. In the data modeling phase, the separated 3D structure of the binding pocket with the ligand inside is fitted into an electrostatic potential grid box, which is then compressed through 3 orthogonal directions into three 2D images for each protein-ligand complex. Sequential floating forward selection technique is performed for acquiring prediction patterns from the images. In the prediction phase, support vector regression (SVR) and partial least squares regression are used for testing the quality of the CIFAP method for predicting the binding affinity of 45 CHK1 inhibitors derived from 2-aminothiazole-4-carboxamide. The results show that the CIFAP method using both support vector regression and partial least squares regression is very effective for predicting the binding affinities of CHK1-ligand complexes with low-error values and high correlation. As a future work, the results could be improved by working on the pose of the ligands inside the grid.

20.
New methods are used to compare seven qPCR analysis methods for their performance in estimating the quantification cycle (Cq) and amplification efficiency (E) for a large test data set (94 samples for each of 4 dilutions) from a recent study. Precision and linearity are assessed using chi-square (χ2), which is the minimized quantity in least-squares (LS) fitting, equivalent to the variance in unweighted LS, and commonly used to define statistical efficiency. All methods yield Cqs that vary strongly in precision with the starting concentration N0, requiring weighted LS for proper calibration fitting of Cq vs log(N0). Then χ2 for cubic calibration fits compares the inherent precision of the Cqs, while increases in χ2 for quadratic and linear fits show the significance of nonlinearity. Nonlinearity is further manifested in unphysical estimates of E from the same Cq data, results which also challenge a tenet of all qPCR analysis methods — that E is constant throughout the baseline region. Constant-threshold (Ct) methods underperform the other methods when the data vary considerably in scale, as these data do.
