Similar articles
1.
Several statistical methods have been developed for adjusting the odds ratio of the relation between two dichotomous variables X and Y for some confounders Z. With the exception of the Mantel-Haenszel method, commonly used methods, notably binary logistic regression, are not symmetrical in X and Y. The classical Mantel-Haenszel method, however, works only for confounders with a limited number of discrete strata, which limits its utility, and appears to have no basis in statistical models. Here we revisit the Mantel-Haenszel method and propose an extension to continuous and vector-valued Z. The idea is to replace the observed cell entries in strata of the Mantel-Haenszel procedure by subject-specific classification probabilities for the four possible values of (X, Y) predicted by a suitable statistical model. For situations where X and Y can be treated symmetrically we propose and explore the multinomial logistic model. Under the homogeneity hypothesis, which states that the odds ratio does not depend on Z, the logarithm of the odds ratio estimator can be expressed as a simple linear combination of three parameters of this model. Methods for testing the homogeneity hypothesis are proposed. The relationship between this method and binary logistic regression is explored. A numerical example using survey data is presented.
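For orientation, the classical stratified Mantel-Haenszel estimator that this abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the textbook estimator only, not of the authors' model-based extension; the function name `mantel_haenszel_or` and the `(a, b, c, d)` cell layout are assumptions made for this example.

```python
from typing import Iterable, Tuple

# One 2x2 stratum, laid out as (a, b, c, d):
#   a = X=1,Y=1   b = X=1,Y=0   c = X=0,Y=1   d = X=0,Y=0
# (layout assumed for this illustration)
Stratum = Tuple[float, float, float, float]


def mantel_haenszel_or(strata: Iterable[Stratum]) -> float:
    """Classical Mantel-Haenszel common odds ratio over discrete strata.

    OR_MH = sum_i(a_i * d_i / n_i) / sum_i(b_i * c_i / n_i),
    where n_i is the total count in stratum i.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: every b*c product is zero")
    return numerator / denominator
```

Because the cell entries are ordinary numbers, the same function also accepts fractional entries, which is the form of input that replacing observed counts with predicted classification probabilities would produce.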

2.
Aims  There are numerous grassland ecosystem types on the Tibetan Plateau. These include the alpine meadow and steppe and degraded alpine meadow and steppe. This study aimed at developing a method to estimate aboveground biomass (AGB) for these grasslands from hyperspectral data and to explore the feasibility of applying air/satellite-borne remote sensing techniques to AGB estimation at larger scales.
Methods  We carried out a field survey to collect hyperspectral reflectance and AGB for five major grassland ecosystems on the Tibetan Plateau and calculated seven narrow-band vegetation indices and the vegetation index based on universal pattern decomposition (VIUPD) from the spectra to estimate AGB. First, we investigated correlations between AGB and each of these vegetation indices to identify the best estimator of AGB for each ecosystem type. Next, we estimated AGB for the five pooled ecosystem types by developing models containing dummy variables. Finally, we compared the predictions of simple regression models and the models containing dummy variables to seek an ecosystem type-independent model to improve prediction of AGB for these various grassland ecosystems from hyperspectral measurements.
Important findings  When we considered each ecosystem type separately, all eight vegetation indices provided good estimates of AGB, with the best predictor of AGB varying among different ecosystems. When AGB of all the five ecosystems was estimated together using a simple linear model, VIUPD showed the lowest prediction error among the eight vegetation indices. The regression models containing dummy variables predicted AGB with higher accuracy than the simple models, which could be attributed to the dummy variables accounting for the effects of ecosystem type on the relationship between AGB and vegetation index (VI). These results suggest that VIUPD is the best predictor of AGB among simple regression models. Moreover, both VIUPD and the soil-adjusted VI could provide accurate estimates of AGB with dummy variables integrated in regression models. Therefore, ground-based hyperspectral measurements are useful for estimating AGB, which indicates the potential of applying satellite/airborne remote sensing techniques to AGB estimation of these grasslands on the Tibetan Plateau.

3.
Multiple components linear least-squares methods have been proposed for the detection of periodic components in nonsinusoidal longitudinal time series. However, a proper test for comparison of parameters obtained from this method for two or more time series is not yet available. Accordingly, we propose two methods, one parametric and one nonparametric, to compare parameters from rhythmometric models with multiple components. The parametric method is based on techniques commonly and generally employed in linear regression analysis. The comparison of parameters among two or more time series is accomplished by the use of so-called dummy variables. The nonparametric method is based on bootstrap techniques. This approach basically tests if the difference in any given parameter obtained by fitting a model with the same periods to two different longitudinal time series differs from zero. This method calculates a confidence interval for the difference in the tested parameter. If this interval does not contain zero, it can be concluded that the parameters obtained from the two time series are different with high probability. An estimation of the p-value for the corresponding test can also be calculated. By the use of similar bootstrap techniques, confidence intervals can also be obtained for any parameter derived from the multiple component fit of several periods to nonsinusoidal longitudinal time series, including the orthophase (peak time), bathyphase (trough time), and global amplitude (difference between the maximum and the minimum) of the fitted model waveform. These methods represent a valuable tool for the comparison of rhythm parameters obtained by multiple component analysis, and they render this approach as a generally applicable one for waveform representation and detection of periodicities in nonsinusoidal, sparse, and noisy longitudinal time series sampled with either equidistant or unequidistant observations.
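The nonparametric idea in this abstract (a bootstrap confidence interval for the difference in a parameter, checked against zero) can be illustrated with a deliberately simplified sketch. The abstract does not give the resampling scheme, so everything here is an assumption: the statistic is the sample mean rather than a fitted rhythm parameter, observations are resampled independently with replacement, and the interval is a plain percentile interval. The name `bootstrap_diff_ci` is hypothetical.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def bootstrap_diff_ci(
    x: Sequence[float],
    y: Sequence[float],
    stat: Callable[[List[float]], float] = mean,  # stand-in for a fitted parameter
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for stat(x) - stat(y)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        # Resample each series with replacement, keeping its original length.
        xb = [rng.choice(x) for _ in x]
        yb = [rng.choice(y) for _ in y]
        diffs.append(stat(xb) - stat(yb))
    diffs.sort()
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return diffs[lo_idx], diffs[hi_idx]
```

If the returned interval excludes zero, the two series would be judged to differ in that statistic at roughly the chosen level. For real rhythmometric data, `stat` would have to refit the multiple-component model on each resample.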

4.
5.
MIXED MODEL APPROACHES FOR ESTIMATING GENETIC VARIANCES AND COVARIANCES   (cited 62 times: 4 self-citations, 58 by others)
The limitations of analysis of variance (ANOVA) methods in estimating genetic variances are discussed. Among the three methods for mixed linear models (maximum likelihood, ML; restricted maximum likelihood, REML; and minimum norm quadratic unbiased estimation, MINQUE), the MINQUE method is presented with formulae for estimating variance components and covariance components and for predicting genetic effects. Several genetic models, which cannot be appropriately analyzed by ANOVA methods, are introduced in the form of mixed linear models. Genetic models with independent random effects can be analyzed by the MINQUE(1) method, which is a MINQUE method with all prior values set to 1. The MINQUE(1) method can give unbiased estimates of variance components and covariance components, and linear unbiased prediction (LUP) of genetic effects. There are more complicated genetic models for plant seeds which involve correlated random effects. The MINQUE(0/1) method, which is a MINQUE method with all prior covariances set to 0 and all prior variances set to 1, is suitable for estimating variance and covariance components in these models. Mixed model approaches have an advantage over ANOVA methods in their capacity to analyze unbalanced data and complicated models. Some problems concerning estimation and hypothesis testing by the MINQUE method are discussed.

6.
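Crown width is an important variable for describing the growth status of individual trees and for building tree growth and yield models. This study examined 10- to 55-year-old Korean pine plantations at the Dabiangou Forest Farm in the mountainous region of eastern Liaoning. Using tree-by-tree measurement data for 2763 Korean pines in 66 permanent sample plots, we selected a base crown width model, introduced an individual-tree competition index (Rd) by reparameterization, and introduced stand density and canopy layer as dummy variables. We then built crown width quantile regression models at different quantiles (0.50, 0.90, 0.93, 0.95, 0.96, 0.99) and compared them with the traditional method to select the optimal quantile for modelling maximum crown width in the stand. To reflect differences in crown width among individual trees within a stand, we built plot-level linear mixed-effects quantile regression crown width models at the optimal quantiles and analysed the effect of each variable on individual-tree crown width. F-tests showed that the crown width models differed significantly among stand densities and canopy layers. After canopy layer, stand density and competition were introduced into the base model, the adjusted R² (Ra²) increased by 0.0104, the root mean square error decreased by 0.0115, and the mean square error fell to 7.4%. Compared with the least squares method, the quantile regression models better simulated the maximum crown width of individual trees under stand conditions, and the 0.96 and 0.93 quantiles were selected as the optimal quantiles for the upper and lower canopy layers, respectively. The linear quantile regression models with mixed effects outperformed traditional quantile regression on evaluation indices such as the Akaike information criterion, the Bayesian information criterion and the HQ information criterion; parameter standard errors decreased markedly, and the mixed effects explained the differences among plots well. In both the upper and lower canopy layers, maximum crown width decreased as stand density increased and increased with relative diameter. Stand density affected crown width more strongly in the lower layer than in the upper layer, and when stand density was sufficiently high, crown width first increased and then decreased with increasing diameter at breast height. The mixed-effects quantile regression model built in this study effectively improves goodness of fit. In future, measures such as regulating stand density and moderate tending and thinning can support the scientific establishment and sustainable development of Korean pine plantations in the mountainous region of eastern Liaoning.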

7.
Hydrophobicity (logP) as well as quantitative structure-toxicity relationships (QSTRs) of some benzene derivatives acting by narcosis have been established based on narcotic mechanisms of action and toxicity data to the fathead minnow, Daphnia magna and Vibrio fischeri using an information-theoretic topological index (Id). Excellent results are obtained in multiparametric regression upon introduction of dummy parameters (indicator variables). A consistent increase in $R^2_A$ values indicated that, in spite of collinearity between Id and one of the indicator variables ($I_{3-6}$), the proposed models are statistically significant.

8.
Hans C, Dunson DB. Biometrics 2005, 61(4): 1018-1026
In regression applications with categorical predictors, interest often focuses on comparing the null hypothesis of homogeneity to an ordered alternative. This article proposes a Bayesian approach for addressing this problem in the setting of normal linear and probit regression models. The regression coefficients are assigned a conditionally conjugate prior density consisting of mixtures of point masses at 0 and truncated normal densities, with a (possibly unknown) changepoint parameter included to accommodate umbrella ordering. Two strategies of prior elicitation are considered: (1) a Bayesian Bonferroni approach in which the probability of the global null hypothesis is specified and local hypotheses are considered independent; and (2) an approach which treats these probabilities as random. A single Gibbs sampling chain can be used to obtain posterior probabilities for the different hypotheses and to estimate regression coefficients and predictive quantities either by model averaging or under the preferred hypothesis. The methods are applied to data from a carcinogenesis study.

9.
An adaptive multivariate test is proposed for a subset of regression coefficients in a linear model. This adaptive method uses the studentized deleted residuals to calculate an appropriate weight for each observation. The weights are then used to compute Wilks' lambda for the weighted model. The adaptive test is performed by permuting the independent variables corresponding to those parameters that are assumed to equal zero in the null hypothesis. The permuted variables are then weighted to obtain a permutation test statistic that is used to estimate the p-value. An example is presented of a multivariate regression that uses systolic and diastolic blood pressure as dependent variables with age and body mass index as independent variables. The simulation results show that the adaptive test maintains its size for the three multivariate error distributions that were used in the study. For normal error models the power of the adaptive test nearly equaled that of the non-adaptive test. For models that used non-normal errors the adaptive test was considerably more powerful than the traditional non-adaptive test.

10.
A new method for estimating skeletal age at death from the morphology of the auricular surface of the ilium is presented. It uses a multiple regression analysis with dummy variables, and is based on the examination of 700 modern Japanese skeletal remains with age records. The observer using this method needs only to check for the presence or absence of nine (for a male) or seven (for a female) features on the auricular surface and to select the parameter estimates of each feature, calculated by multiple regression analysis with dummy variables. The observer can obtain an estimated age from the sum of parameter estimates. It is shown that a fine granular texture of the auricular surface is typical of younger individuals, whereas a heavily porous texture is characteristic of older individuals, and that both of these features are very useful for estimating age. Our method is shown here to be more accurate than other methods, especially in the older age ranges. Since the auricular surface allows more expedient observations than other parts of the skeleton, this new method can be expected to improve the overall accuracy of estimating skeletal age at death.

11.
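Application of ordinary kriging in insect ecology   (cited 6 times: 3 self-citations, 3 by others)
Geostatistics is a statistical approach, based on regionalized variables and using the variogram as its main tool, for analysing the structure of spatially correlated variables. When an experimental variogram fluctuates strongly, an optimal fit cannot be obtained, but an interactive (human-computer dialogue) fitting procedure that lets the analyst choose parameters flexibly can yield reasonably good parameters for the variogram model. Using weighted polynomial regression together with the interactive procedure, we obtained reasonably good fits of first-level and second-level spherical models, and we also fitted the experimental variogram with a linear function. Ordinary kriging was then used to obtain the best linear unbiased interpolated estimates at the points to be estimated under each theoretical model, yielding the kriging interpolation weights. The method was applied to rice planthopper observation data from experimental fields in Fuxi Township, Dasha Town, Sihui City, Guangdong Province. From the data at several observation points surrounding each point to be estimated, the insect distribution density at that point was estimated effectively, and the goodness of fit and estimation errors of the different theoretical models were compared and discussed. The results show that the second-level spherical model fitted best, followed by the first-level spherical model, while the linear function fitted worst but was the simplest to compute.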
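As background for the spherical models mentioned in this abstract, the standard single-structure spherical variogram can be sketched as below. This is the textbook form, not the authors' fitted model: the abstract gives no parameter values, and the names `nugget`, `partial_sill` and `range_` are assumptions chosen for the illustration.

```python
def spherical_variogram(
    h: float, nugget: float, partial_sill: float, range_: float
) -> float:
    """Textbook spherical variogram model gamma(h).

    gamma(0) = 0
    gamma(h) = nugget + partial_sill * (1.5*(h/a) - 0.5*(h/a)**3)  for 0 < h <= a
    gamma(h) = nugget + partial_sill                               for h > a
    where a is the range (parameter names are hypothetical).
    """
    if h < 0 or range_ <= 0:
        raise ValueError("lag must be non-negative and range must be positive")
    if h == 0:
        return 0.0  # by definition the variogram vanishes at zero lag
    if h >= range_:
        return nugget + partial_sill  # the sill is reached at the range
    ratio = h / range_
    return nugget + partial_sill * (1.5 * ratio - 0.5 * ratio ** 3)
```

A nested model with two structures, which may be what the second-level spherical model refers to, would add a second spherical term with its own partial sill and range; that reading is a guess rather than something the abstract states.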

12.
Gianola D, Sorensen D. Genetics 2004, 167(3): 1407-1424
Multivariate models are of great importance in theoretical and applied quantitative genetics. We extend quantitative genetic theory to accommodate situations in which there is linear feedback or recursiveness between the phenotypes involved in a multivariate system, assuming an infinitesimal, additive, model of inheritance. It is shown that structural parameters defining a simultaneous or recursive system have a bearing on the interpretation of quantitative genetic parameter estimates (e.g., heritability, offspring-parent regression, genetic correlation) when such features are ignored. Matrix representations are given for treating a plethora of feedback-recursive situations. The likelihood function is derived, assuming multivariate normality, and results from econometric theory for parameter identification are adapted to a quantitative genetic setting. A Bayesian treatment with a Markov chain Monte Carlo implementation is suggested for inference and developed. When the system is fully recursive, all conditional posterior distributions are in closed form, so Gibbs sampling is straightforward. If there is feedback, a Metropolis step may be embedded for sampling the structural parameters, since their conditional distributions are unknown. Extensions of the model to discrete random variables and to nonlinear relationships between phenotypes are discussed.

13.
SUMMARY: Differential gene expression detection using microarrays has attracted considerable research interest recently. Many methods have been proposed, including variants of F-statistics, non-parametric approaches and empirical Bayesian methods. The SAM statistic has been shown to perform well in empirical studies, but it is essentially an ad hoc shrinkage method. The idea is that for small-sample microarray data, it is often useful to pool information across genes to improve efficiency. Under a Bayesian framework, Smyth formally derived test statistics with shrinkage using hierarchical models. In this paper we cast differential gene expression detection in the familiar framework of the linear regression model. Commonly used test statistics correspond to using least squares to estimate the regression parameters. Based on the vast literature on linear models, we can naturally consider other alternatives. Here we explore penalized linear regression. We propose penalized t-/F-statistics for two-class microarray data based on the [Formula: see text] penalty. We show that the penalized test statistics make intuitive sense, and through applications we illustrate their good performance. AVAILABILITY: Supplementary information including program codes, more detailed analysis results and R functions for the proposed methods can be found at http://www.biostat.umn.edu/~baolin/research CONTACT: baolin@biostat.umn.edu SUPPLEMENTARY INFORMATION: http://www.biostat.umn.edu/~baolin/research.

14.
Application of Bayes's theorem to the analysis of nonlinear regression models is limited by numerical problems associated with calculation of integrals of functions of several variables. For k-parameter models that are linear in l of the parameters, a dimension-reduction procedure is described for factoring the posterior distribution into the product of a multivariate normal density and a function of k-l nonlinear parameters. Integrals can then be calculated with (k-l)-dimensional numerical integration. A four-parameter, two-compartment pharmacokinetic model of lidocaine disposition is analyzed using a change of variables in order to obtain a model that is linear in two parameters. It is shown that a Bayesian analysis, with reduction of dimensionality, applied to this model produces appropriate results with reasonable CPU-time requirements.

15.
Aim  Although parameter estimates are not as affected by spatial autocorrelation as Type I errors, the change from classical null hypothesis significance testing to model selection under an information theoretic approach does not completely avoid problems caused by spatial autocorrelation. Here we briefly review the model selection approach based on the Akaike information criterion (AIC) and present a new routine for Spatial Analysis in Macroecology (SAM) software that helps establish minimum adequate models in the presence of spatial autocorrelation.
Innovation  We illustrate how a model selection approach based on the AIC can be used in geographical data by modelling patterns of mammal species in South America represented in a grid system (n = 383) with 2° of resolution, as a function of five environmental explanatory variables, performing an exhaustive search of minimum adequate models considering three regression methods: non-spatial ordinary least squares (OLS), spatial eigenvector mapping and the autoregressive (lagged-response) model. The models selected by spatial methods included a smaller number of explanatory variables than the one selected by OLS, and the minimum adequate models contained different explanatory variables, although model averaging revealed a similar rank of explanatory variables.
Main conclusions  We stress that the AIC is sensitive to the presence of spatial autocorrelation, generating unstable and overfitted minimum adequate models to describe macroecological data based on non-spatial OLS regression. Alternative regression techniques provided different minimum adequate models and have different uncertainty levels. Despite this, the averaged model based on Akaike weights generates consistent and robust results across different methods and may be the best approach for understanding macroecological patterns.
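The Akaike weights on which the averaged model rests follow a standard formula, sketched below as an illustration only. This is not the SAM software's routine, and the function name `akaike_weights` is an assumption. Each model's weight is proportional to exp(-Δ/2), where Δ is its AIC minus the smallest AIC in the candidate set.

```python
import math
from typing import List, Sequence


def akaike_weights(aic_values: Sequence[float]) -> List[float]:
    """Akaike weights w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2),

    with delta_i = AIC_i - min(AIC) over the candidate models.
    """
    if not aic_values:
        raise ValueError("at least one AIC value is required")
    best = min(aic_values)
    # Relative likelihood of each model given the data and the candidate set.
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]
```

A model-averaged coefficient is then the weighted sum of that coefficient's estimates across the candidate models, with a model that omits the variable usually contributing zero.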

16.
Bayesian multimodel inference for geostatistical regression models   (cited 2 times: 0 self-citations, 2 by others)
Johnson DS, Hoeting JA. PLoS ONE 2011, 6(11): e25677
The problem of simultaneous covariate selection and parameter inference for spatial regression models is considered. Previous research has shown that failure to take spatial correlation into account can influence the outcome of standard model selection methods. A Markov chain Monte Carlo (MCMC) method is investigated for the calculation of parameter estimates and posterior model probabilities for spatial regression models. The method can accommodate normal and non-normal response data and a large number of covariates. Thus the method is very flexible and can be used to fit spatial linear models, spatial linear mixed models, and spatial generalized linear mixed models (GLMMs). The Bayesian MCMC method also allows a priori unequal weighting of covariates, which is not possible with many model selection methods such as Akaike's information criterion (AIC). The proposed method is demonstrated on two data sets. The first is the whiptail lizard data set which has been previously analyzed by other researchers investigating model selection methods. Our results confirmed the previous analysis suggesting that sandy soil and ant abundance were strongly associated with lizard abundance. The second data set concerned pollution tolerant fish abundance in relation to several environmental factors. Results indicate that abundance is positively related to Strahler stream order and a habitat quality index. Abundance is negatively related to percent watershed disturbance.

17.
Errors‐in‐variables models in high‐dimensional settings pose two challenges in application. First, the number of observed covariates is larger than the sample size, while only a small number of covariates are true predictors under an assumption of model sparsity. Second, the presence of measurement error can result in severely biased parameter estimates, and also affects the ability of penalized methods such as the lasso to recover the true sparsity pattern. A new estimation procedure called SIMulation‐SELection‐EXtrapolation (SIMSELEX) is proposed. This procedure makes double use of lasso methodology. First, the lasso is used to estimate sparse solutions in the simulation step, after which a group lasso is implemented to do variable selection. The SIMSELEX estimator is shown to perform well in variable selection, and has significantly lower estimation error than naive estimators that ignore measurement error. SIMSELEX can be applied in a variety of errors‐in‐variables settings, including linear models, generalized linear models, and Cox survival models. It is furthermore shown in the Supporting Information how SIMSELEX can be applied to spline‐based regression models. A simulation study is conducted to compare the SIMSELEX estimators to existing methods in the linear and logistic model settings, and to evaluate performance compared to naive methods in the Cox and spline models. Finally, the method is used to analyze a microarray dataset that contains gene expression measurements of favorable histology Wilms tumors.

18.
To test hypotheses regarding relations between meaningful parameters, it is often necessary to calculate these parameters from other directly measured variables. For example, the relationship between O2 consumption and O2 delivery may be of interest, although these may be computed from measurements of cardiac output and blood O2 contents. If a measured variable is used in the calculation of two derived parameters, error in the measurement will couple the calculated parameters and introduce a bias, which can lead to incorrect conclusions. This paper presents a method of correcting for this bias in the linear regression coefficient and the Pearson correlation coefficient when calculations involve the nonlinear and linear combination of the measured variables. The general solution is obtained when the first two terms of a Taylor series expansion of the function can be used to represent the function, as in the case of multiplication. A significance test for the hypothesis that the regression coefficient is equal to zero is also presented. Physiological examples are provided demonstrating this technique, and the correction methods are also applied in simulations to verify the adequacy of the technique and to test for the magnitude of the coupling effect. In two previous studies of O2 consumption and delivery, the effect of coupled error is shown to be small when the range of O2 deliveries studied is large, and measurement errors are of reasonable size.

19.
A fundamental challenge to understanding patterns in ecological systems lies in employing methods that can analyse, test and draw inference from measured associations between variables across scales. Hierarchical linear models (HLM) use advanced estimation algorithms to measure regression relationships and variance–covariance parameters in hierarchically structured data. Although hierarchical models have occasionally been used in the analysis of ecological data, their full potential to describe scales of association, diagnose variance explained, and to partition uncertainty has not been employed. In this paper we argue that the use of the HLM framework can enable significantly improved inference about ecological processes across levels of organization. After briefly describing the principles behind HLM, we give two examples that demonstrate a protocol for building hierarchical models and answering questions about the relationships between variables at multiple scales. The first example employs maximum likelihood methods to construct a two-level linear model predicting herbivore damage to a perennial plant at the individual- and patch-scale; the second example uses Bayesian estimation techniques to develop a three-level logistic model of plant flowering probability across individual plants, microsites and populations. HLM model development and diagnostics illustrate the importance of incorporating scale when modelling associations in ecological systems and offer a sophisticated yet accessible method for studies of populations, communities and ecosystems. We suggest that a greater coupling of hierarchical study designs and hierarchical analysis will yield significant insights on how ecological processes operate across scales.

20.
Shin Y, Raudenbush SW. Biometrics 2007, 63(4): 1262-1268
The development of model-based methods for incomplete data has been a seminal contribution to statistical practice. Under the assumption of ignorable missingness, one estimates the joint distribution of the complete data for $\theta \in \Theta$ from the incomplete or observed data $y_{\mathrm{obs}}$. Many interesting models involve one-to-one transformations of $\theta$. For example, with $y_i \sim N(\mu, \Sigma)$ for $i = 1, \ldots, n$ and $\theta = (\mu, \Sigma)$, an ordinary least squares (OLS) regression model is a one-to-one transformation of $\theta$. Inferences based on such a transformation are equivalent to inferences based on OLS using data multiply imputed from $f(y_{\mathrm{mis}} \mid y_{\mathrm{obs}}, \theta)$ for missing $y_{\mathrm{mis}}$. Thus, identification of $\theta$ from $y_{\mathrm{obs}}$ is equivalent to identification of the regression model. In this article, we consider a model for two-level data with continuous outcomes where the observations within each cluster are dependent. The parameters of the hierarchical linear model (HLM) of interest, however, lie in a subspace of $\Theta$ in general. This identification of the joint distribution overidentifies the HLM. We show how to characterize the joint distribution so that its parameters are a one-to-one transformation of the parameters of the HLM. This leads to efficient estimation of the HLM from incomplete data using either the transformation method or the method of multiple imputation. The approach allows outcomes and covariates to be missing at either of the two levels, and the HLM of interest can involve the regression of any subset of variables on a disjoint subset of variables conceived as covariates.
