Similar Documents
20 similar documents found
1.
Percentage is widely used to describe different results in food microbiology, e.g., probability of microbial growth, percent inactivated, and percent of positive samples. Four sets of percentage data, percent-growth-positive, germination extent, probability for one cell to grow, and maximum fraction of positive tubes, were obtained from our own experiments and the literature. These data were modeled using linear and logistic regression. Five methods were used to compare the goodness of fit of the two models: percentage of predictions closer to observations, range of the differences (predicted value minus observed value), deviation of the model, linear regression between the observed and predicted values, and bias and accuracy factors. Logistic regression was a better predictor of at least 78% of the observations in all four data sets. In all cases, the deviation of logistic models was much smaller. The linear correlation between observations and logistic predictions was always stronger. Validation (accomplished using part of one data set) also demonstrated that the logistic model was more accurate in predicting new data points. Bias and accuracy factors were found to be less informative when evaluating models developed for percentage data, since neither of these indices can compare predictions at zero. Model simplification for the logistic model was demonstrated with one data set. The simplified model was as powerful in making predictions as the full linear model, and it also gave clearer insight into determining the key experimental factors.
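The point about bias and accuracy factors failing at zero can be made concrete with a short sketch. It assumes the definitions commonly used in predictive microbiology, B_f = 10^mean(log10(pred/obs)) and A_f = 10^mean(|log10(pred/obs)|); the function name and the example values are hypothetical and are not taken from the study.

```python
import math

def bias_accuracy_factors(predicted, observed):
    """Return (bias factor, accuracy factor) for paired predictions and observations.

    Assumed definitions: B_f = 10**mean(log10(pred/obs)) and
    A_f = 10**mean(|log10(pred/obs)|). Both need strictly positive values,
    so a zero observation or prediction (e.g. 0% growth) cannot be scored.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    log_ratios = []
    for p, o in zip(predicted, observed):
        if p <= 0 or o <= 0:
            raise ValueError(f"undefined for zero or negative values: pred={p}, obs={o}")
        log_ratios.append(math.log10(p / o))
    n = len(log_ratios)
    bias = 10 ** (sum(log_ratios) / n)                     # systematic over/under-prediction
    accuracy = 10 ** (sum(abs(r) for r in log_ratios) / n)  # average fold-error
    return bias, accuracy
```

For example, predictions of 20 and 50 against observations of 10 and 100 give B_f = 1 (the over- and under-prediction cancel) but A_f = 2 (predictions are off by a factor of two on average), while any pair containing 0 raises an error.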

2.
Neuropeptides are an important class of signaling molecules that result from complex and variable posttranslational processing of precursor proteins and thus are difficult to identify based solely on genomic information. Bioinformatics prediction of precursor cleavage sites can support effective biochemical characterization of neuropeptides. Neuropeptide cleavage models were developed using comprehensive human, mouse, rat, and cattle precursor data sets and used to compare predicted neuropeptide processing across these species. Logistic regression and artificial neural network models were used to predict cleavages based on amino acids and the physicochemical properties of amino acids at precursor sequence locations proximal to cleavage. Correct cleavage classification rates across species and models ranged from 85% to 100%, suggesting that amino acids and amino acid properties have a major impact on the probability of cleavage and that these factors have comparable effects in human, mouse, rat, and cattle. The variable accuracy of each species-specific model in predicting cleavage sites indicated that there are species- and precursor-specific processing patterns. Prediction of mouse cleavages using rat models was highly accurate, yet the reverse was not observed. Sensitivity and specificity revealed that logistic models are well suited to maximizing the rate of true noncleavage predictions with moderate rates of true cleavage predictions; meanwhile, artificial neural networks maximize the rate of true cleavage predictions with moderate to low rates of true noncleavage predictions. Logistic models also provided insights into the strength of the amino acid associations with cleavage. Predictions of neuropeptide cleavage sites using the human, mouse, rat, and cattle models are available at . Electronic supplementary material: The online version of this article (doi:) contains supplementary material, which is available to authorized users.
Allison Tegge and Bruce Southey contributed equally to this work.
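The sensitivity/specificity trade-off described above can be illustrated with a minimal sketch: predicted cleavage probabilities are thresholded, and sensitivity (rate of true cleavage predictions) and specificity (rate of true noncleavage predictions) are computed. The probabilities, labels, and threshold values are hypothetical, not outputs of the authors' models.

```python
def sensitivity_specificity(actual, probs, threshold):
    """Classify as cleavage (1) when prob >= threshold; return (sensitivity, specificity)."""
    predicted = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # true cleavage rate
    specificity = tn / (tn + fp)  # true noncleavage rate
    return sensitivity, specificity

# Hypothetical sites: 1 = cleaved, 0 = not cleaved, with model probabilities
actual = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.55, 0.35, 0.6, 0.3, 0.1]
for t in (0.3, 0.5, 0.7):
    print(t, sensitivity_specificity(actual, probs, t))
```

Raising the threshold trades true cleavage predictions for true noncleavage predictions, which is the kind of difference the abstract reports between the logistic and neural network models.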

3.
Although multicenter data are common, many prediction model studies ignore this during model development. The objective of this study is to evaluate the predictive performance of regression methods for developing clinical risk prediction models using multicenter data, and provide guidelines for practice. We compared the predictive performance of standard logistic regression, generalized estimating equations, random intercept logistic regression, and fixed effects logistic regression. First, we presented a case study on the diagnosis of ovarian cancer. Subsequently, a simulation study investigated the performance of the different models as a function of the amount of clustering, development sample size, distribution of center-specific intercepts, the presence of a center-predictor interaction, and the presence of a dependency between center effects and predictors. The results showed that when sample sizes were sufficiently large, conditional models yielded calibrated predictions, whereas marginal models yielded miscalibrated predictions. Small sample sizes led to overfitting and unreliable predictions. This miscalibration was worse with more heavily clustered data. Calibration of random intercept logistic regression was better than that of standard logistic regression even when center-specific intercepts were not normally distributed, a center-predictor interaction was present, center effects and predictors were dependent, or when the model was applied in a new center. Therefore, to make reliable predictions in a specific center, we recommend random intercept logistic regression.
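Why marginal models miscalibrate center-specific predictions can be seen in a small sketch. Assuming a random intercept u ~ Normal(0, sigma^2) on the logit scale, the population-averaged probability E[expit(eta + u)] is not equal to expit(eta), the probability for a center with an average intercept. The linear predictor and sigma below are hypothetical, and the integral is approximated with a simple trapezoid rule rather than any method from the paper.

```python
import math

def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def marginal_probability(eta, sigma, n_points=2001, width=8.0):
    """Population-averaged P(Y=1) when a center's logit is eta + u,
    u ~ Normal(0, sigma^2), via the trapezoid rule on +/- width*sigma."""
    if sigma == 0:
        return expit(eta)
    lo = -width * sigma
    step = 2.0 * width * sigma / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        u = lo + i * step
        density = math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n_points - 1) else 1.0
        total += weight * expit(eta + u) * density
    return total * step

conditional = expit(-2.0)                   # center with an average (u = 0) intercept
marginal = marginal_probability(-2.0, 1.0)  # averaged over centers
print(f"conditional={conditional:.3f} marginal={marginal:.3f}")
```

The marginal probability is pulled toward 0.5 relative to the conditional one, and the gap grows with sigma, which is consistent with the abstract's finding that miscalibration of marginal models worsens with heavier clustering.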

4.
5.
Bayesian multimodel inference for geostatistical regression models   Cited by: 2 (self-citations: 0; by others: 2)
Johnson DS  Hoeting JA 《PloS one》2011,6(11):e25677
The problem of simultaneous covariate selection and parameter inference for spatial regression models is considered. Previous research has shown that failure to take spatial correlation into account can influence the outcome of standard model selection methods. A Markov chain Monte Carlo (MCMC) method is investigated for the calculation of parameter estimates and posterior model probabilities for spatial regression models. The method can accommodate normal and non-normal response data and a large number of covariates. Thus the method is very flexible and can be used to fit spatial linear models, spatial linear mixed models, and spatial generalized linear mixed models (GLMMs). The Bayesian MCMC method also allows a priori unequal weighting of covariates, which is not possible with many model selection methods such as Akaike's information criterion (AIC). The proposed method is demonstrated on two data sets. The first is the whiptail lizard data set, which has been previously analyzed by other researchers investigating model selection methods. Our results confirmed the previous analysis suggesting that sandy soil and ant abundance were strongly associated with lizard abundance. The second data set concerned the abundance of pollution-tolerant fish in relation to several environmental factors. Results indicate that abundance is positively related to Strahler stream order and a habitat quality index. Abundance is negatively related to percent watershed disturbance.
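The role of unequal prior weights can be sketched with Bayes' rule for model probabilities, p(M_k | y) ∝ p(y | M_k) p(M_k), which MCMC model-selection methods approximate. This sketch assumes the log marginal likelihoods are already available; the values and the function name are hypothetical and do not reproduce the authors' sampler.

```python
import math

def posterior_model_probs(log_marginal_likelihoods, prior_probs):
    """p(M_k | y) proportional to p(y | M_k) * p(M_k), normalized on the log scale."""
    if len(log_marginal_likelihoods) != len(prior_probs):
        raise ValueError("one prior probability per model is required")
    log_post = [ll + math.log(p) for ll, p in zip(log_marginal_likelihoods, prior_probs)]
    m = max(log_post)                          # subtract the max to avoid overflow
    unnorm = [math.exp(v - m) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical: model 2 fits 3 times better; equal vs. unequal prior weights
print(posterior_model_probs([0.0, math.log(3.0)], [0.5, 0.5]))
print(posterior_model_probs([0.0, math.log(3.0)], [0.75, 0.25]))
```

With equal priors the better-fitting model gets probability 0.75; giving it a prior weight of only 0.25 brings the two models level, which is the kind of a priori weighting an information criterion such as AIC does not offer.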

6.
Aim: We modelled the relationship between breeding evidence for five species of forest songbirds (ruby-crowned kinglet (Regulus calendula), Blackburnian warbler (Dendroica fusca), black-throated blue warbler (Dendroica caerulescens), bay-breasted warbler (Dendroica castanea) and Connecticut warbler (Oporornis agilis)) and a variety of macro-climate variables, to examine the importance of climate as a factor determining the distribution of breeding in these species and to assess the usefulness of spatial predictions generated from these models. Location: Modelling was conducted over the entire province of Ontario, Canada, an area of ≈900,000 km². Methods: Data on the distribution of breeding in the province were derived from the Breeding Bird Atlas of Ontario. We used logistic regression to model the relationship between the probability of breeding (assessed in 10 km × 10 km blocks) and estimates of a variety of climate variables at the same scale. We selected the models with the fewest explanatory variables that still achieved close to the best possible classification accuracy. Results: The final models for these five species had from one to six explanatory variables and an overall concordance of 70.4% to 86.3%, indicating good classification accuracy. Results from subsampling 50% of the original data ten times indicate that (1) the classification accuracy of the model for the data used to generate the model is not very sensitive to the specific observations used to generate the model; (2) the classification accuracy of the test data is close to the classification accuracy of the model data; and (3) the classification accuracy of the test data is not dependent on the specific observations used to generate the model. We generated a spatial prediction of the probability of occurrence of each species for Ontario using the relationships defined by the logistic regression models and 1 km gridded estimates of the necessary climate variables.
These probability maps closely matched the maps of observed evidence of breeding from the Atlas. Main conclusions: Although the mechanisms controlling breeding distribution cannot be determined using this method, we can conclude that (1) macro-climate is an important factor directly and/or indirectly determining the distribution of breeding in these species, and (2) spatial predictions of the probability of breeding are accurate enough to be useful in predicting the probability of breeding in unsampled areas.
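Concordance for a logistic model is often computed as the pairwise c statistic: the fraction of (breeding, non-breeding) block pairs in which the breeding block receives the higher predicted probability, with ties counted as one half. This sketch assumes that reading of "overall concordance" (the abstract does not say whether it instead means percent correctly classified at a cutoff), and the probabilities are hypothetical.

```python
def concordance(probs, outcomes):
    """Pairwise concordance (c statistic); ties count as 1/2."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative outcome")
    score = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                score += 1.0
            elif p == q:
                score += 0.5
    return score / (len(pos) * len(neg))

# Hypothetical blocks: predicted breeding probability and observed breeding (1/0)
print(concordance([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))
```

Here three of the four breeding/non-breeding pairs are ranked correctly, so the concordance is 0.75.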

7.
Advances in research on forest fire occurrence prediction models in China   Cited by: 2 (self-citations: 0; by others: 2)
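Through a literature review, we summarize the current state of research in China on forest fire occurrence prediction models and analyze it in terms of the drivers of fire occurrence, models predicting the probability of fire occurrence, models predicting the frequency of fire occurrence, and model testing methods. The main conclusions are as follows: 1) meteorology, topography, vegetation, fuels, and human activity are the main drivers affecting fire occurrence and the prediction accuracy of models; 2) among fire occurrence probability models, geographically weighted logistic regression accounts for spatial correlation among variables, the Gompit regression model is suited to fire data with an asymmetric structure, and the random forest model requires no multicollinearity test and improves prediction accuracy while avoiding overfitting, making it one of the preferred methods for predicting fire occurrence probability; 3) among fire frequency models, negative binomial regression is better suited to modeling overdispersed data, while zero-inflated and hurdle models can handle fire data containing a large number of zeros; 4) the ROC test, AIC, the likelihood ratio test, and the Wald test are commonly used to evaluate fire probability and frequency models. Research on fire occurrence prediction models remains a focus of current forest fire management in China, and the choice of prediction model should depend on the characteristics of the fire data in each region. In addition, more influencing factors should be considered when building fire prediction models in order to improve their accuracy; in the future, the application of other mathematical models to fire occurrence prediction should be explored further to continually improve the accuracy of fire occurrence prediction models.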
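The excess-zero problem in fire counts can be illustrated with a zero-inflated Poisson (ZIP) distribution, one of the model families named above: with probability pi a region-day is a "structural" zero, otherwise the count is Poisson(lambda). The sketch below shows the pmf and checks the standard moments, mean (1 - pi)·lambda and variance (1 - pi)·lambda·(1 + pi·lambda), so the extra zeros also produce overdispersion. Parameter values are hypothetical; hurdle models differ in that all zeros come from a separate binary process and positive counts follow a zero-truncated distribution.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: structural zero with probability pi, else Poisson(lam)."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base

def zip_moments(lam, pi, k_max=60):
    """Mean and variance by direct summation over k = 0..k_max-1."""
    mean = sum(k * zip_pmf(k, lam, pi) for k in range(k_max))
    second = sum(k * k * zip_pmf(k, lam, pi) for k in range(k_max))
    return mean, second - mean ** 2

lam, pi = 3.0, 0.4   # hypothetical fire-count parameters
mean, var = zip_moments(lam, pi)
print(f"P(0)={zip_pmf(0, lam, pi):.3f} vs Poisson {poisson_pmf(0, lam):.3f}; mean={mean:.2f}, var={var:.2f}")
```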

8.
An estimate of the relative risk, adjusted for confounders, can be obtained from a fitted logistic regression model, but it is substantially overestimated when the outcome is not rare. The log-binomial model (binomial errors with a log link) is increasingly used for this purpose. However, this model's performance, goodness-of-fit tests, and case-wise diagnostics have not been studied. Extensive simulations are used to compare the performance of the log-binomial model, a logistic regression-based method proposed by Schouten et al. (1993), and a Poisson regression approach proposed by Zou (2004) and Carter, Lipsitz, and Tilley (2005). Log-binomial regression resulted in "failure" rates (non-convergence, out-of-bounds predicted probabilities) as high as 59%. Estimates by the method of Schouten et al. (1993) produced fitted log-binomial probabilities greater than unity in up to 19% of samples to which a log-binomial model had been successfully fit, and in up to 78% of samples when the log-binomial model fit failed. Similar percentages were observed for the Poisson regression approach. Coefficient and standard error estimates from the three models were similar. Rejection rates for goodness-of-fit tests for the log-binomial fit were around 5%. The power of goodness-of-fit tests was modest when an incorrect logistic regression model was fit. Examples demonstrate the use of the methods. Uncritical use of the log-binomial regression model is not recommended.
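Why the logistic odds ratio overstates the relative risk when the outcome is common can be shown with an unadjusted 2×2 table; the log-binomial model targets the risk ratio directly, since exp of its coefficient is a risk ratio. The counts below are hypothetical and no confounder adjustment is attempted in this sketch.

```python
def risk_ratio_and_odds_ratio(a, b, c, d):
    """2x2 table: exposed has a events and b non-events; unexposed has c events and d non-events."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    odds_ratio = (a / b) / (c / d)
    return rr, odds_ratio

print(risk_ratio_and_odds_ratio(2, 998, 1, 999))  # rare outcome: OR close to RR
print(risk_ratio_and_odds_ratio(60, 40, 30, 70))  # common outcome: OR far above RR
```

In both hypothetical tables the risk ratio is 2, but the odds ratio is about 2.002 when the outcome is rare and 3.5 when 30 to 60% of subjects have it.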

9.
A new multi-model approach (MMA) for sweat loss prediction is proposed to improve prediction accuracy. MMA was computed as the average of the sweat losses predicted by two existing thermoregulation models: the rational model SCENARIO and the empirical model Heat Strain Decision Aid (HSDA). Three independent physiological datasets, a total of 44 trials, were used to compare predictions by MMA, SCENARIO, and HSDA. The observed sweat losses were collected under different combinations of uniform ensembles, environmental conditions (15–40°C, RH 25–75%), and exercise intensities (250–600 W). Root mean square deviation (RMSD), residual plots, and paired t tests were used to compare predictions with observations. Overall, MMA reduced RMSD by 30–39% in comparison with either SCENARIO or HSDA, and increased the prediction accuracy to 66% from 34% or 55%. Of the MMA predictions, 70% fell within the range of mean observed value ± SD, while only 43% of SCENARIO and 50% of HSDA predictions fell within the same range. Paired t tests showed that differences between observations and MMA predictions were not significant, but differences between observations and SCENARIO or HSDA predictions were significant for two of the datasets. Thus, MMA predicted sweat loss more accurately than either of the two single models for the three datasets used. Future work will evaluate MMA using additional physiological data to expand the scope of populations and conditions.
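The averaging and RMSD comparison described above can be sketched as follows. The values are hypothetical sweat losses (in liters) chosen so that one model over-predicts and the other under-predicts; they are not data from the study.

```python
import math

def rmsd(predicted, observed):
    """Root mean square deviation between paired predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))

def multi_model_average(*model_predictions):
    """Element-wise, equally weighted mean of several models' predictions."""
    return [sum(values) / len(values) for values in zip(*model_predictions)]

observed = [1.0, 2.0, 3.0]
model_a = [1.4, 2.4, 3.4]   # hypothetical model that over-predicts
model_b = [0.6, 1.6, 2.6]   # hypothetical model that under-predicts
mma = multi_model_average(model_a, model_b)
print(rmsd(model_a, observed), rmsd(model_b, observed), rmsd(mma, observed))
```

An equal-weight average helps most when the component models err in opposite directions, as in this contrived case where the errors cancel exactly; if both models were biased the same way, the average would inherit that bias.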

10.
Large-scale surveys, such as national forest inventories and vegetation monitoring programs, usually have complex sampling designs that include geographical stratification and units organized in clusters. When models are developed using data from such programs, a key question is whether or not to utilize design information when analyzing the relationship between a response variable and a set of covariates. Standard statistical regression methods often fail to account for complex sampling designs, which may lead to severely biased estimators of model coefficients. Furthermore, ignoring that data are spatially correlated within clusters may underestimate the standard errors of regression coefficient estimates, with a risk for drawing wrong conclusions. We first review general approaches that account for complex sampling designs, e.g. methods using probability weighting, and stress the need to explore the effects of the sampling design when applying logistic regression models. We then use Monte Carlo simulation to compare the performance of the standard logistic regression model with two approaches to model correlated binary responses, i.e. cluster-specific and population-averaged logistic regression models. As an example, we analyze the occurrence of epiphytic hair lichens in the genus Bryoria, an indicator of forest ecosystem integrity. Based on data from the National Forest Inventory (NFI) for the period 1993–2014, we generated a data set on hair lichen occurrence on >100,000 Picea abies trees distributed throughout Sweden. The NFI data included ten covariates representing forest structure and climate variables potentially affecting lichen occurrence. Our analyses show the importance of taking complex sampling designs and correlated binary responses into account in logistic regression modeling to avoid the risk of obtaining notably biased parameter estimators and standard errors, and erroneous interpretations about factors affecting e.g. hair lichen occurrence.
We recommend comparisons of unweighted and weighted logistic regression analyses as an essential step in development of models based on data from large-scale surveys.
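The recommended comparison of unweighted and weighted analyses can be sketched with a probability-weighted logistic regression, where each observation's log-likelihood contribution is multiplied by its design weight (a pseudo-likelihood) and the model is fitted by Newton-Raphson. This is a minimal one-covariate sketch on hypothetical data; it gives point estimates only, and design-based standard errors that account for clustering would need a separate (e.g. sandwich) variance estimator not shown here.

```python
import math

def weighted_logistic_fit(x, y, w, iterations=50, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x by maximizing the weighted log-likelihood
    sum_i w_i * [y_i*eta_i - log(1 + exp(eta_i))] with Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = wi * (yi - p)          # weighted score contribution
            v = wi * p * (1.0 - p)     # weighted information contribution
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det   # solve the 2x2 Newton system
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# Hypothetical survey: x = stratum indicator, y = lichen present
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 1, 1, 0, 1, 1, 1]
print("unweighted:", weighted_logistic_fit(x, y, [1] * 8))
print("weighted:  ", weighted_logistic_fit(x, y, [1, 1, 3, 3, 1, 1, 1, 1]))
```

In this toy example the unweighted fit suggests a clear stratum effect (slope log 3), while up-weighting under-sampled positive units in stratum 0 removes it (slope 0), which is why a discrepancy between the two fits is worth investigating.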

11.
Over 90 percent of the more than 250,000 hip fractures that occur annually in the United States are the result of falls from standing height. Despite this, the stresses associated with femoral fracture from a fall have not been investigated previously. Our objectives were to use three-dimensional finite element models of the proximal femur (with geometries and material properties based directly on quantitative computed tomography) to compare predicted stress distributions for one-legged stance and for a fall to the lateral greater trochanter. We also wished to test the correspondence between model predictions and in vitro strain gage data and failure loads for cadaveric femora subjected to these loading conditions. An additional goal was to use the model predictions to compare the sensitivity of several imaging sites in the proximal femur which are used for the in vivo prediction of hip fracture risk. In this first of two parts, linear finite element models of two unpaired human cadaveric femora were generated. In Part II, the models were extended to include nonlinear material properties for the cortical and trabecular bone. While there was poor correspondence between strain gage data and model predictions, there was excellent agreement between the in vitro failure data and the linear model, especially using a von Mises effective strain failure criterion. Both the onset of structural yielding (within 22 and 4 percent) and the load at fracture (within 8 and 5 percent) were predicted accurately for the two femora tested. For the simulation of one-legged stance, the peak stresses occurred in the primary compressive trabeculae of the subcapital region.(ABSTRACT TRUNCATED AT 250 WORDS)
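A von Mises effective (equivalent) strain reduces the three principal strains at a point to one scalar that can be compared with a failure threshold. This sketch assumes one common definition, (√2/3)·sqrt((e1 − e2)² + (e2 − e3)² + (e3 − e1)²); the paper's exact definition and failure limit are not given in the abstract, so treat both as assumptions.

```python
import math

def von_mises_equivalent_strain(e1, e2, e3):
    """Equivalent strain from principal strains, assuming the common definition
    (sqrt(2)/3) * sqrt((e1-e2)^2 + (e2-e3)^2 + (e3-e1)^2)."""
    return (math.sqrt(2.0) / 3.0) * math.sqrt((e1 - e2) ** 2 + (e2 - e3) ** 2 + (e3 - e1) ** 2)

# Volume-preserving uniaxial strain: the equivalent strain equals the axial strain
print(von_mises_equivalent_strain(0.01, -0.005, -0.005))
```

With this normalization, a volume-preserving uniaxial strain maps to its axial value, and a purely volumetric (equal-principal) strain contributes zero, since only the distortional part of the strain enters the measure.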

12.
Joint regression analysis of correlated data using Gaussian copulas   Cited by: 2 (self-citations: 0; by others: 2)
Song PX  Li M  Yuan Y 《Biometrics》2009,65(1):60-68
Summary. This article concerns a new joint modeling approach for correlated data analysis. Utilizing Gaussian copulas, we present a unified and flexible machinery to integrate separate one-dimensional generalized linear models (GLMs) into a joint regression analysis of continuous, discrete, and mixed correlated outcomes. This essentially leads to a multivariate analogue of the univariate GLM theory and hence an efficiency gain in the estimation of regression coefficients. The availability of joint probability models enables us to develop full maximum likelihood inference. Numerical illustrations focus on regression models for discrete correlated data, including multidimensional logistic regression models and a joint model for mixed normal and binary outcomes. In the simulation studies, the proposed copula-based joint model is compared with the popular generalized estimating equations (GEE), a moment-based estimating-equation method for joining univariate GLMs. Two real-world data examples are used in the illustration.

13.
14.
Hardegree SP 《Annals of botany》2006,97(6):1115-1125
BACKGROUND AND AIMS: The purpose of this study was to compare the relative accuracy of different thermal-germination models in predicting germination-time under constant-temperature conditions. Of specific interest was the assessment of shape assumptions associated with the cardinal-temperature germination model and probit distribution often used to distribute thermal coefficients among seed subpopulations. METHODS: The seeds of four rangeland grass species were germinated over the constant-temperature range of 3-38 degrees C and monitored for subpopulation variability in germination-rate response. Subpopulation-specific germination rate was estimated as a function of temperature and residual model error for three variations of the cardinal-temperature model, non-linear regression and piece-wise linear regression. The data were used to test relative model fit under alternative assumptions regarding model shape. KEY RESULTS: In general, optimal model fit was obtained by limiting model-shape assumptions. All models were relatively accurate in the sub-optimal temperature range except in the 3 degrees C treatment where predicted germination times were in error by as much as 70 d for the cardinal-temperature models. CONCLUSIONS: Germination model selection should be driven by research objectives. Cardinal-temperature models yield coefficients that can be directly compared for purposes of screening germplasm. Other model formulations, however, may be more accurate in predicting germination-time, especially at low temperatures where small errors in predicted rate can result in relatively large errors in germination time.
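The closing point, that small rate errors at low temperature become large time errors, follows from germination time being the reciprocal of germination rate. This sketch assumes a simple triangular cardinal-temperature model (rate rising linearly from a base to an optimum temperature, then falling to a ceiling); the study's three model variants and its subpopulation distributions are not reproduced, and all parameter values are hypothetical.

```python
import math

def germination_rate(T, t_base, t_opt, t_ceil, r_max):
    """Triangular cardinal-temperature model; rate in 1/day."""
    if T <= t_base or T >= t_ceil:
        return 0.0
    if T <= t_opt:
        return r_max * (T - t_base) / (t_opt - t_base)
    return r_max * (t_ceil - T) / (t_ceil - t_opt)

def germination_time(T, t_base, t_opt, t_ceil, r_max):
    """Days to germination = 1 / rate (infinite if the rate is zero)."""
    rate = germination_rate(T, t_base, t_opt, t_ceil, r_max)
    return math.inf if rate == 0.0 else 1.0 / rate

# Hypothetical parameters; shift the base temperature by 0.5 degrees C
for T in (3.0, 20.0):
    t1 = germination_time(T, 2.0, 25.0, 40.0, 0.5)
    t2 = germination_time(T, 2.5, 25.0, 40.0, 0.5)
    print(f"T={T:4.1f} C: {t1:6.1f} d vs {t2:6.1f} d")
```

With these made-up parameters, the 0.5 °C shift in base temperature roughly doubles the predicted time at 3 °C (46 d vs 90 d) but changes it by less than 1% at 20 °C.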

15.
16.
J M Neuhaus  N P Jewell 《Biometrics》1990,46(4):977-990
Recently, a great deal of attention has been given to binary regression models for clustered or correlated observations. The data of interest consist of a binary dependent or response variable, together with independent variables X1, ..., Xk, where sets of observations are grouped together into clusters. A number of models and methods of analysis have been suggested to study such data. Many of these are extensions in some way of the familiar logistic regression model for binary data that are not grouped (i.e., each cluster is of size 1). In general, the analyses of these clustered data models proceed by assuming that the observed clusters are a simple random sample of clusters selected from a population of clusters. In this paper, we consider the application of these procedures to the case where the clusters are selected randomly in a manner that depends on the pattern of responses in the cluster. For example, we show that ignoring the retrospective nature of the sample design, by fitting standard logistic regression models for clustered binary data, may result in misleading estimates of the effects of covariates and the precision of estimated regression coefficients.

17.
1. We compared the capacity of logistic regression (LR) and classification tree (CT) models to predict microhabitat use and the summer distribution of juvenile Atlantic salmon, Salmo salar, in two reaches of a small stream in eastern Quebec. 2. The models predicted the presence or absence of salmon at a location on the basis of habitat features (depth, current velocity, presence of instream and overhead cover, substratum particle size, and distance to stream bank) measured at that location. Models were validated by means of crossover field tests evaluating the performance of models developed for one reach (calibration trials) when applied to the other reach (validation trials). Model performance was evaluated with regard to accuracy, generality and ease of use and interpretation. Prediction maps based on habitat features were also built to compare the observed position of fish with those predicted by LR and CT models. 3. The spatial distribution of active fish differed markedly from that of resting fish, apparently as a result of the selection for water greater than about 30 cm depth by active fish and for the presence of rocky cover by resting fish. 4. All models made accurate predictions, validated by crossover trials. For both LR and CT models, the prediction maps reflected well the actual fish distributions. However, CT models were easier to build and interpret than LR models. CT models also had less variable performance and a smaller decline in predictive capability in crossover trials (for fish at rest), suggesting that they may be more transferable than LR models.

18.
Horton NJ  Laird NM 《Biometrics》2001,57(1):34-42
This article presents a new method for maximum likelihood estimation of logistic regression models with incomplete covariate data where auxiliary information is available. This auxiliary information is extraneous to the regression model of interest but predictive of the covariate with missing data. Ibrahim (1990, Journal of the American Statistical Association 85, 765-769) provides a general method for estimating generalized linear regression models with missing covariates using the EM algorithm that is easily implemented when there is no auxiliary data. Vach (1997, Statistics in Medicine 16, 57-72) describes how the method can be extended when the outcome and auxiliary data are conditionally independent given the covariates in the model. Our method allows the incorporation of auxiliary data without making the conditional independence assumption. We suggest tests of conditional independence and compare the performance of several estimators in an example concerning mental health service utilization in children. Using an artificial dataset, we compare the performance of several estimators when auxiliary data are available.

19.
D P Byar  N Mantel 《Biometrics》1975,31(4):943-947
Interrelationships among three response-time models which incorporate covariate information are explored. The most general of these models is the logistic-exponential, in which the log odds of the probability of responding in a fixed interval is assumed to be a linear function of the covariates; this model includes a parameter W for the width of the discrete time intervals in which responses occur. As W tends to 0, this model is equivalent to a continuous-time exponential model in which the log hazard is linear in the covariates. As W tends to infinity, it is equivalent to a continuous-time exponential model in which the hazard itself is a linear function of the covariates. This second model was fitted to the data used in an earlier publication describing the logistic-exponential model, and very close agreement of the estimates of the regression coefficients is demonstrated.
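The two limits can be motivated heuristically, assuming a constant hazard λ within an interval of width W. This is a simplifying assumption for illustration, not necessarily the authors' derivation. The probability p of responding within the interval and its odds are

```latex
p = 1 - e^{-\lambda W}, \qquad \frac{p}{1-p} = e^{\lambda W} - 1 ,

\text{small } W:\quad \log\frac{p}{1-p} = \log(\lambda W) + O(\lambda W) \approx \log W + \log\lambda ,

\text{large } W:\quad \log\frac{p}{1-p} = \lambda W + \log\!\left(1 - e^{-\lambda W}\right) \approx \lambda W .
```

So if the log odds are linear in the covariates, then for small W this corresponds to log λ being linear (up to the offset log W), and for large W to λ itself being linear (up to the scale factor W), matching the two exponential models described above.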

20.
We propose using a variant of logistic regression (LR) with L2-regularization to fit gene–gene and gene–environment interaction models. Studies have shown that many common diseases are influenced by the interaction of certain genes. LR models with quadratic penalization not only correctly characterize the influential genes along with their interaction structures but also yield additional benefits in handling high-dimensional, discrete factors with a binary response. We illustrate the advantages of using an L2-regularization scheme and compare its performance with that of "multifactor dimensionality reduction" and "FlexTree," two recent tools for identifying gene–gene interactions. Through simulated and real data sets, we demonstrate that our method outperforms other methods in the identification of the interaction structures as well as in prediction accuracy. In addition, we validate the significance of the factors selected through bootstrap analyses.
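Quadratic (L2) penalization can be sketched as minimizing the mean logistic loss plus (λ/2)·‖β‖², leaving the intercept unpenalized. This minimal version uses plain gradient descent with a step size of 1/L from a trace bound on the gradient's Lipschitz constant; it is not the authors' fitting algorithm, and the genotype coding (two binary gene indicators plus their product as an interaction term) and data are hypothetical.

```python
import math

def fit_l2_logistic(X, y, lam, iterations=5000):
    """Minimize mean log-loss + (lam/2)*||beta||^2 by gradient descent.
    The intercept b0 is not penalized."""
    n, d = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * d
    # 1/L step, with L bounded by 0.25 * mean(1 + ||x_i||^2) + lam
    lipschitz = 0.25 * sum(1.0 + sum(v * v for v in row) for row in X) / n + lam
    step = 1.0 / lipschitz
    for _ in range(iterations):
        g0, g = 0.0, [0.0] * d
        for row, yi in zip(X, y):
            eta = b0 + sum(bj * xj for bj, xj in zip(beta, row))
            r = (1.0 / (1.0 + math.exp(-eta)) - yi) / n
            g0 += r
            for j in range(d):
                g[j] += r * row[j]
        b0 -= step * g0
        beta = [bj - step * (gj + lam * bj) for bj, gj in zip(beta, g)]
    return b0, beta

# Hypothetical columns: gene1, gene2, gene1*gene2 (interaction); y = disease status
X = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 1, 0],
     [1, 0, 0], [1, 0, 0], [1, 1, 1], [1, 1, 1]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
for lam in (0.1, 10.0):
    print(lam, fit_l2_logistic(X, y, lam)[1])
```

In this toy data every carrier of both variants is a case, so unpenalized maximum likelihood estimates would diverge; the penalty keeps the coefficients finite, and a larger λ shrinks them further toward zero.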
