Similar documents (20 results)
1.
Percentages are widely used to describe results in food microbiology, e.g., the probability of microbial growth, the percentage of cells inactivated, and the percentage of positive samples. Four sets of percentage data (percent growth-positive, germination extent, probability of growth from a single cell, and maximum fraction of positive tubes) were obtained from our own experiments and from the literature. These data were modeled using linear and logistic regression. Five methods were used to compare the goodness of fit of the two models: the percentage of predictions closer to the observations, the range of the differences (predicted value minus observed value), the deviation of the model, linear regression between observed and predicted values, and the bias and accuracy factors. Logistic regression was the better predictor for at least 78% of the observations in all four data sets. In all cases, the deviation of the logistic models was much smaller, and the linear correlation between observations and logistic predictions was always stronger. Validation (performed using part of one data set) also showed that the logistic model predicted new data points more accurately. Bias and accuracy factors were less informative for evaluating models of percentage data, because neither index can compare predictions at zero. Simplification of the logistic model was demonstrated with one data set: the simplified model was as powerful in making predictions as the full linear model and also gave clearer insight into the key experimental factors.
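Why bias and accuracy factors fail at zero can be seen directly from their usual definitions, which we assume here: B_f = 10^mean(log10(pred/obs)) and A_f = 10^mean(|log10(pred/obs)|). The minimal sketch below uses hypothetical percentages, not data from the study.

```python
import math

def bias_factor(predicted, observed):
    """Bias factor: 10 ** mean(log10(predicted / observed)); 1.0 means no systematic bias."""
    logs = [math.log10(p / o) for p, o in zip(predicted, observed)]
    return 10 ** (sum(logs) / len(logs))

def accuracy_factor(predicted, observed):
    """Accuracy factor: 10 ** mean(|log10(predicted / observed)|); always >= 1.0."""
    logs = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(logs) / len(logs))

# Hypothetical percentage predictions and observations (no zeros).
pred = [20.0, 45.0, 80.0]
obs = [25.0, 40.0, 80.0]
print(bias_factor(pred, obs), accuracy_factor(pred, obs))

# A 0% observation or a 0% prediction makes the log ratio undefined.
for p, o in [([10.0], [0.0]), ([0.0], [10.0])]:
    try:
        accuracy_factor(p, o)
    except (ZeroDivisionError, ValueError) as exc:
        print("undefined at zero:", type(exc).__name__)
```

A zero observation raises `ZeroDivisionError` and a zero prediction raises `ValueError`. Data sets containing 0% outcomes therefore need other indices, such as the residual range or model deviation.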

2.
Neuropeptides are an important class of signaling molecules that result from complex and variable posttranslational processing of precursor proteins. They are therefore difficult to identify from genomic information alone. Bioinformatics prediction of precursor cleavage sites can support effective biochemical characterization of neuropeptides. Neuropeptide cleavage models were developed using comprehensive human, mouse, rat, and cattle precursor data sets and were used to compare predicted neuropeptide processing across these species. Logistic regression and artificial neural network models were used to predict cleavages from the amino acids, and the physicochemical properties of those amino acids, at precursor sequence locations proximal to cleavage. Correct cleavage classification rates across species and models ranged from 85% to 100%. This suggests that amino acids and their properties have a major impact on the probability of cleavage, and that these factors have comparable effects in human, mouse, rat, and cattle. The variable accuracy of each species-specific model in predicting cleavage sites indicated that there are species- and precursor-specific processing patterns. Rat models predicted mouse cleavages highly accurately, but the reverse was not observed. Sensitivity and specificity showed that logistic models are well suited to maximizing the rate of true noncleavage predictions while giving moderate rates of true cleavage predictions. Artificial neural networks, in contrast, maximize the rate of true cleavage predictions with moderate to low rates of true noncleavage predictions. Logistic models also provided insight into the strength of the associations between amino acids and cleavage. Prediction of neuropeptide cleavage sites using the human, mouse, rat, and cattle models is available at . Electronic supplementary material: The online version of this article (doi:) contains supplementary material, which is available to authorized users.
Allison Tegge and Bruce Southey contributed equally to this work.
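The sensitivity/specificity trade-off described above depends partly on the probability threshold used to call a site "cleaved". The sketch below uses hypothetical labels and model probabilities, not data from the study. It shows that lowering the threshold raises sensitivity (true cleavage rate) at the cost of specificity (true noncleavage rate).

```python
def classify(probabilities, threshold=0.5):
    """Turn predicted cleavage probabilities into 1 (cleavage) / 0 (no cleavage) calls."""
    return [1 if p >= threshold else 0 for p in probabilities]

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = cleavage site."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical sites: 1 = true cleavage, with made-up model probabilities.
y = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.6, 0.4, 0.3, 0.2, 0.55, 0.1, 0.05]
for threshold in (0.5, 0.25):
    print(threshold, sensitivity_specificity(y, classify(probs, threshold)))
```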

3.
Computational models of electrical activity and calcium signaling in cardiac myocytes are important tools for understanding physiology. However, the sensitivity of these models to changes in parameters is often not well understood, because parameter evaluation can be a time-consuming, tedious process. I demonstrate here what I believe is a novel method for rapidly determining how changes in parameters affect outputs. In three models of the ventricular action potential, parameters were randomized, repeated simulations were run, important outputs were calculated, and multivariable regression was performed on the collected results. The randomized parameters included both maximal rates of ion transport and gating variable characteristics. The procedure generated simplified, empirical models that predicted the outputs resulting from new sets of input parameters. The linear regression models were quite accurate despite nonlinearities in the mechanistic models. Moreover, the regression coefficients, which represent parameter sensitivities, were robust even when parameters were varied over a wide range. Most importantly, a side-by-side comparison of two similar models identified fundamental differences in model behavior and revealed model predictions that were both consistent and inconsistent with experimental data. This new method therefore shows promise as a tool for characterizing and assessing computational models. The general strategy may also suggest methods for integrating traditional quantitative models with large-scale data sets obtained using high-throughput technologies.
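The randomize-simulate-regress procedure can be sketched in a few lines. This sketch rests on several assumptions. `toy_model` is a hypothetical power-law function standing in for an action-potential simulation, and `g_na` and `g_k` are illustrative parameter names. Parameters are perturbed by log-normal scale factors, and the log of the output is regressed on the log scale factors. For this toy model the regression recovers the exponents exactly. For a real mechanistic model the fit would only be approximate.

```python
import math
import random

def toy_model(g_na, g_k):
    """Hypothetical stand-in for a mechanistic simulation returning one scalar output."""
    return 100.0 * g_na ** 0.8 / g_k ** 0.3

def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def regression_sensitivities(n_trials=300, sigma=0.2, seed=1):
    """Randomize parameters, run the model, and regress log(output) on log(scale factors)."""
    rng = random.Random(seed)
    rows, ys = [], []
    for _ in range(n_trials):
        # Each parameter is multiplied by a log-normally distributed scale factor.
        log_scales = [rng.gauss(0.0, sigma) for _ in range(2)]
        output = toy_model(math.exp(log_scales[0]), math.exp(log_scales[1]))
        rows.append([1.0] + log_scales)  # intercept column + predictors
        ys.append(math.log(output))
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)  # [intercept, sensitivity to g_na, sensitivity to g_k]

print(regression_sensitivities())
```

Each coefficient reads as a sensitivity: a 1% increase in `g_na` changes the output by roughly 0.8%.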

4.
Ecologists often develop complex regression models that include multiple categorical and continuous variables, interactions among predictors, and nonlinear relationships between the response and predictor variables. Nomograms, which are graphical devices for presenting mathematical functions and calculating output values, can help biologists interpret and present these complex models. To illustrate the benefits of nomograms, we developed a logistic regression model of elk (Cervus elaphus) resource selection. With this model, we demonstrated how a nomogram helps scientists and managers interpret interactions among variables, compare the relative biological importance of variables, and examine the predicted shapes of relationships (e.g., linear vs. nonlinear) between response and predictor variables. Although our example focused on logistic regression, nomograms are equally useful for other linear and nonlinear models. Regardless of the approach used for model development, nomograms and other graphical summaries can help scientists and managers develop, interpret, and apply statistical models.
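The arithmetic behind a nomogram for a main-effects logistic model can be sketched as follows. We assume a common convention in which the predictor with the widest effect span receives a 0–100 point axis and the others are scaled proportionally. Total points are then mapped back to a linear predictor and a probability. The variable names, coefficients, and ranges are hypothetical, not the authors' elk model, and interactions (which need separate axes) are not handled.

```python
import math

# Hypothetical main-effects logistic model; names, coefficients, and ranges are illustrative only.
INTERCEPT = -1.0
COEFS = {"slope_deg": -0.05, "dist_road_km": 0.40, "forage_index": 1.20}
RANGES = {"slope_deg": (0.0, 40.0), "dist_road_km": (0.0, 5.0), "forage_index": (0.0, 1.0)}

def reference_value(name):
    """End of the range with the smallest contribution, so points are never negative."""
    lo, hi = RANGES[name]
    return lo if COEFS[name] >= 0 else hi

# The predictor with the widest effect span sets the 0-100 point scale.
MAX_SPAN = max(abs(COEFS[n]) * (RANGES[n][1] - RANGES[n][0]) for n in COEFS)

def points(name, value):
    """Nomogram points contributed by one predictor value."""
    return 100.0 * COEFS[name] * (value - reference_value(name)) / MAX_SPAN

def probability_from_points(total_points):
    """Read total points back onto the probability axis."""
    baseline = INTERCEPT + sum(COEFS[n] * reference_value(n) for n in COEFS)
    linear_predictor = baseline + total_points * MAX_SPAN / 100.0
    return 1.0 / (1.0 + math.exp(-linear_predictor))

def direct_probability(values):
    """Probability computed directly from the regression equation, for comparison."""
    linear_predictor = INTERCEPT + sum(COEFS[n] * values[n] for n in COEFS)
    return 1.0 / (1.0 + math.exp(-linear_predictor))

site = {"slope_deg": 10.0, "dist_road_km": 2.0, "forage_index": 0.5}
total = sum(points(n, v) for n, v in site.items())
print(total, probability_from_points(total), direct_probability(site))
```

Reading points off each axis and summing them gives the same probability as evaluating the equation. This is why a nomogram can replace the formula for field use.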

5.
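Research progress on forest fire occurrence prediction models in China   Total citations: 2; self-citations: 0; citations by others: 2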
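Based on a literature review, this paper summarizes the current state of research on forest fire occurrence prediction models in China. It analyzes the drivers of fire occurrence, models that predict the probability of fire occurrence, models that predict fire frequency, and model validation methods. The main conclusions are as follows: 1) meteorology, topography, vegetation, fuels, and human activity are the main drivers affecting fire occurrence and model prediction accuracy; 2) among fire occurrence probability models, geographically weighted logistic regression accounts for spatial correlation among variables, the Gompit regression model suits asymmetrically structured fire data, and the random forest model requires no multicollinearity test and improves prediction accuracy while avoiding overfitting, making it one of the preferred methods for predicting fire occurrence probability; 3) among fire frequency models, negative binomial regression is better suited to modeling overdispersed data, while zero-inflated and hurdle models can handle the large number of zeros in fire data; 4) ROC analysis, AIC, the likelihood ratio test, and the Wald test are the methods commonly used to evaluate fire probability and frequency models. Research on fire occurrence prediction models remains a focus of forest fire management in China. The choice of prediction model should depend on the characteristics of the fire data in each region. In addition, more influencing factors should be considered when building fire prediction models in order to improve their accuracy. Future work should further explore the application of other mathematical models to fire occurrence prediction so that the accuracy of these models continues to improve.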
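Conclusion 3 can be made concrete with two quantities: a dispersion check (variance/mean, which a Poisson model assumes is 1) and the zero-inflated Poisson probability mass function. In the ZIP model, a zero arises either as a structural zero with probability pi or from the Poisson count process. The counts below are hypothetical. A hurdle model differs: all zeros come from a separate binary process and positive counts from a zero-truncated distribution (not shown).

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of observing k events with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: a structural zero with probability pi, otherwise a Poisson(lam) count."""
    count_part = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + count_part if k == 0 else count_part

def dispersion_ratio(counts):
    """Sample variance / mean; values well above 1 point to overdispersion."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean

# Hypothetical yearly fire counts for ten grid cells (mostly zeros).
counts = [0, 0, 0, 0, 0, 1, 0, 7, 0, 12]
print(dispersion_ratio(counts))
print(zip_pmf(0, 2.0, 0.4), poisson_pmf(0, 2.0))
```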

6.
Aim: We modelled the relationship between breeding evidence for five species of forest songbirds (ruby-crowned kinglet (Regulus calendula), Blackburnian warbler (Dendroica fusca), black-throated blue warbler (Dendroica caerulescens), bay-breasted warbler (Dendroica castanea), and Connecticut warbler (Oporornis agilis)) and a variety of macro-climate variables. Our goals were to examine the importance of climate in determining the breeding distribution of these species and to assess the usefulness of spatial predictions generated from these models. Location: Modelling was conducted over the entire province of Ontario, Canada, an area of ≈900,000 km². Methods: Data on the distribution of breeding in the province were derived from the Breeding Bird Atlas of Ontario. We used logistic regression to model the relationship between the probability of breeding (assessed in 10 km × 10 km blocks) and estimates of a variety of climate variables at the same scale. We selected the models with the fewest explanatory variables that still had close to the best possible classification accuracy. Results: The final models for the five species had one to six explanatory variables and an overall concordance of 70.4% to 86.3%, indicating good classification accuracy. We subsampled 50% of the original data ten times. The results indicate that (1) the classification accuracy of the model for the data used to generate it is not very sensitive to the specific observations used, (2) the classification accuracy for the test data is close to that for the model data, and (3) the classification accuracy for the test data does not depend on the specific observations used to generate the model. We generated a spatial prediction of the probability of occurrence of each species across Ontario using the relationships defined by the logistic regression models and 1 km gridded estimates of the necessary climate variables.
These probability maps closely matched the maps of observed breeding evidence from the Atlas. Main conclusions: Although this method cannot determine the mechanisms controlling breeding distribution, we can conclude that (1) macro-climate is an important factor, directly and/or indirectly, in determining the breeding distribution of these species, and (2) spatial predictions of the probability of breeding are accurate enough to be useful for predicting breeding in unsampled areas.
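The "concordance" reported for logistic models is often computed over all pairs of one breeding block and one non-breeding block. A pair is concordant when the breeding block has the higher predicted probability. The abstract does not say which variant was used, so the sketch below returns both the percent concordant and the c statistic, which counts ties as half. The data are hypothetical.

```python
def concordance(probs, outcomes):
    """Return (percent concordant, c statistic) over all event/non-event pairs."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    nonevents = [p for p, y in zip(probs, outcomes) if y == 0]
    concordant = ties = 0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                concordant += 1  # breeding block ranked above non-breeding block
            elif pe == pn:
                ties += 1
    pairs = len(events) * len(nonevents)
    return 100.0 * concordant / pairs, (concordant + 0.5 * ties) / pairs

# Hypothetical predicted probabilities for six atlas blocks (1 = breeding evidence).
probs = [0.9, 0.7, 0.4, 0.6, 0.3, 0.4]
outcomes = [1, 1, 1, 0, 0, 0]
print(concordance(probs, outcomes))
```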

7.
A new multi-model approach (MMA) to sweat loss prediction is proposed to improve prediction accuracy. MMA was computed as the average of the sweat losses predicted by two existing thermoregulation models: the rational model SCENARIO and the empirical model Heat Strain Decision Aid (HSDA). Three independent physiological datasets, comprising a total of 44 trials, were used to compare predictions by MMA, SCENARIO, and HSDA. The observed sweat losses were collected under different combinations of uniform ensembles, environmental conditions (15–40 °C, 25–75% RH), and exercise intensities (250–600 W). Root mean square deviation (RMSD), residual plots, and paired t tests were used to compare predictions with observations. Overall, MMA reduced RMSD by 30–39% compared with either SCENARIO or HSDA. It also increased prediction accuracy to 66%, from 34% and 55% for the two single models. Of the MMA predictions, 70% fell within the range of the mean observed value ± SD, whereas only 43% of SCENARIO and 50% of HSDA predictions fell within the same range. Paired t tests showed that differences between observations and MMA predictions were not significant. In contrast, SCENARIO and HSDA predictions differed significantly from observations for two of the datasets. Thus, MMA predicted sweat loss more accurately than either single model for the three datasets used. Future work will evaluate MMA using additional physiological data to expand the scope of populations and conditions.
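The multi-model average and the RMSD comparison are both simple to compute. The sketch below uses hypothetical values in which one model over-predicts and the other under-predicts. This is the situation where averaging helps most.

```python
import math

def rmsd(pred, obs):
    """Root mean square deviation between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def multi_model_average(pred_a, pred_b):
    """MMA: element-wise mean of two models' predictions."""
    return [(a + b) / 2.0 for a, b in zip(pred_a, pred_b)]

# Hypothetical sweat losses for three trials (not the study's data).
obs = [1.0, 1.5, 2.0]
model_a = [1.3, 1.9, 2.4]  # over-predicts
model_b = [0.8, 1.2, 1.7]  # under-predicts
mma = multi_model_average(model_a, model_b)
print(rmsd(model_a, obs), rmsd(model_b, obs), rmsd(mma, obs))
```

Because RMSD is a norm, the RMSD of the average can never exceed the mean of the two models' RMSDs. It falls below the better single model mainly when the models' errors have opposite signs, as they do here.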

8.
Over 90 percent of the more than 250,000 hip fractures that occur annually in the United States result from falls from standing height. Despite this, the stresses associated with femoral fracture from a fall have not previously been investigated. Our objectives were to use three-dimensional finite element models of the proximal femur (with geometries and material properties based directly on quantitative computed tomography) to compare predicted stress distributions for one-legged stance and for a fall onto the lateral greater trochanter. We also wished to test the correspondence between model predictions and in vitro strain gage data and failure loads for cadaveric femora subjected to these loading conditions. An additional goal was to use the model predictions to compare the sensitivity of several imaging sites in the proximal femur that are used for the in vivo prediction of hip fracture risk. In this first part of a two-part study, linear finite element models of two unpaired human cadaveric femora were generated. In Part II, the models were extended to include nonlinear material properties for the cortical and trabecular bone. Correspondence between strain gage data and model predictions was poor. However, agreement between the in vitro failure data and the linear model was excellent, especially when a von Mises effective strain failure criterion was used. Both the onset of structural yielding (within 22 and 4 percent) and the load at fracture (within 8 and 5 percent) were predicted accurately for the two femora tested. For the simulation of one-legged stance, the peak stresses occurred in the primary compressive trabeculae of the subcapital region. (ABSTRACT TRUNCATED AT 250 WORDS)
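Definitions of von Mises effective (equivalent) strain vary between finite element packages. The sketch below assumes the common incompressible form (nu = 0.5) computed from principal strains. The abstract does not give the study's exact definition or failure limit, so the limit used here is hypothetical. In this form, an incompressible uniaxial state (e, -e/2, -e/2) returns e, and a purely hydrostatic state returns 0.

```python
import math

def von_mises_effective_strain(e1, e2, e3):
    """Von Mises equivalent strain from principal strains (incompressible, nu = 0.5, form)."""
    return (math.sqrt(2.0) / 3.0) * math.sqrt((e1 - e2) ** 2 + (e2 - e3) ** 2 + (e3 - e1) ** 2)

def fraction_exceeding(principal_strains, limit):
    """Share of elements whose effective strain exceeds a (hypothetical) failure limit."""
    flagged = [von_mises_effective_strain(*e) > limit for e in principal_strains]
    return sum(flagged) / len(flagged)

# Hypothetical principal strains for two elements.
elements = [(0.01, -0.005, -0.005), (0.001, 0.0, 0.0)]
print(fraction_exceeding(elements, 0.005))
```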

9.
10.
Although a number of regression models for ordinal responses have been proposed, these models are not widely known or applied in epidemiology and biomedical research. Existing overviews of these models are either highly technical or consider only a small part of this class of models. As a result, it is difficult to understand the features of the models and to recognize important relations between them. In this paper we give an overview of logistic regression models for ordinal data based on cumulative and conditional probabilities. We show how the most popular ordinal regression models, namely the proportional odds model and the continuation ratio model, are embedded in the framework of generalized linear models. We describe the characteristics and interpretations of these models and show how the calculations can be performed in SAS and S-PLUS. We illustrate and compare the methods by applying them to data from a study investigating the effect of several risk factors on diabetic retinopathy. A special aspect is the violation of the usual assumption of equal slopes, which makes the correct application of standard models impossible. We show how extensions of the standard models can be used to handle this situation adequately.
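The two model families differ in which probabilities the logistic function is applied to. The proportional odds model uses cumulative probabilities, while the continuation ratio model uses conditional probabilities of stopping at category j given that category j has been reached. The sketch below assumes one common parameterization for each: logit P(Y <= j) = theta_j - beta*x, and logit P(Y = j | Y >= j) = alpha_j + beta*x. Sign conventions differ between software packages. All parameter values are hypothetical.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def proportional_odds_probs(thresholds, beta, x):
    """Category probabilities when logit P(Y <= j) = theta_j - beta * x (thresholds increasing)."""
    cumulative = [expit(t - beta * x) for t in thresholds] + [1.0]
    return [cumulative[0]] + [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]

def continuation_ratio_probs(alphas, beta, x):
    """Category probabilities when logit P(Y = j | Y >= j) = alpha_j + beta * x."""
    probs, reached = [], 1.0
    for a in alphas:
        stop = expit(a + beta * x)  # probability of stopping at this category
        probs.append(reached * stop)
        reached *= 1.0 - stop  # probability of continuing past it
    probs.append(reached)  # last category
    return probs

print(proportional_odds_probs([-1.0, 0.5, 2.0], 0.8, 1.0))
print(continuation_ratio_probs([-0.5, 0.0, 0.5], 0.3, 1.0))
```

Relaxing the equal-slopes assumption would mean using a separate beta for each threshold. In the proportional odds form, the cumulative probabilities can then cross for some x, which is one reason the extensions need care.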

11.
12.
We describe a supervised prediction method for diagnosing acute myeloid leukemia (AML) from patient samples based on flow cytometry measurements. We use a data-driven approach with machine learning methods to train a computational model. The model takes flow cytometry measurements from a single patient and gives a confidence score that the patient is AML-positive. Our solution is based on a regularized logistic regression model that aggregates AML test statistics calculated from individual test tubes with different cell populations and fluorescent markers. The model construction is entirely data-driven, and no prior biological knowledge is used. The described solution achieved 100% classification accuracy in the DREAM6/FlowCAP2 Molecular Classification of Acute Myeloid Leukaemia Challenge, against a gold standard of 20 AML-positive and 160 healthy patients. Here we perform a more extensive validation of the prediction model's performance and further improve and simplify our original method. We show that statistically equivalent results can be obtained by using simple average marker intensities as features in the logistic regression model. In addition to the logistic regression model, we present other classification models and compare their performance quantitatively. The key benefit of our prediction method, compared with other solutions of similar performance, is that our model uses only a small fraction of the flow cytometry measurements, which makes it highly economical.
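The abstract does not state which penalty was used. The sketch below assumes an L2 (ridge) penalty, fitted by batch gradient descent on mean log-loss plus (lam/2)·||w||², with an unpenalized intercept. The two features stand in for hypothetical standardized average marker intensities, and the patients are made up. The penalty keeps the weights finite even though these toy data are perfectly separable, and a stronger penalty shrinks them further.

```python
import math

def predict_proba(w, b, x):
    """Predicted probability of AML-positive for one feature vector."""
    return 1.0 / (1.0 + math.exp(-(b + sum(wj * xj for wj, xj in zip(w, x)))))

def fit_l2_logistic(X, y, lam=0.1, lr=0.1, epochs=2000):
    """Ridge-penalized logistic regression by batch gradient descent; intercept not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (g / n + lam * wj) for wj, g in zip(w, grad_w)]
    return w, b

# Hypothetical standardized per-patient features (e.g., two average marker intensities).
X = [[2.0, 1.5], [1.8, 2.2], [2.5, 1.9],
     [-1.0, -0.5], [-0.8, -1.2], [-1.5, -0.9], [-0.6, -1.1], [-1.2, -0.4]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
w, b = fit_l2_logistic(X, y)
print(w, b, [round(predict_proba(w, b, x), 3) for x in X])
```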

13.
Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare stepwise linear regression (SLR), classification and regression tree (CART), and random forest (RF) models for predicting and mapping the spatial distribution of soil Cd. It also aimed to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0–20 cm) samples were collected and randomly divided into calibration (222 samples) and validation (54 samples) datasets. Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate soil Cd concentrations and to identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration showed acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model had the largest prediction errors, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg. The RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg. The RF model also had the highest R² (0.772). The CART model ranked second, with ME, MAE, RMSE, and R² values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg, and 0.644, respectively. The three prediction maps showed generally similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were located primarily in the alluvial valley plain of the Fuchun River and its tributaries, owing to the dramatic industrialization and urbanization that have occurred there. The most important variable explaining high soil Cd accumulation was the presence of metal smelting industries.
The good performance of the RF model was attributable to its ability to handle nonlinear and hierarchical relationships between soil Cd and environmental variables. These results confirm that the RF approach is promising for predicting and mapping the spatial distribution of soil Cd at the regional scale.
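The validation statistics reported above can be computed as in the sketch below. We assume residuals are defined as predicted minus observed, so a positive ME means over-prediction. R² is computed as 1 − SSE/SST, although some studies instead report the squared Pearson correlation. The values are hypothetical concentrations in mg/kg.

```python
import math

def error_metrics(pred, obs):
    """ME, MAE, RMSE and R^2 (1 - SSE/SST) for a validation set; residual = pred - obs."""
    n = len(obs)
    resid = [p - o for p, o in zip(pred, obs)]
    mean_obs = sum(obs) / n
    sse = sum(r * r for r in resid)
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return {
        "ME": sum(resid) / n,
        "MAE": sum(abs(r) for r in resid) / n,
        "RMSE": math.sqrt(sse / n),
        "R2": 1.0 - sse / sst,
    }

# Hypothetical validation samples (mg/kg).
obs = [0.2, 0.3, 0.5, 0.8]
pred = [0.25, 0.3, 0.45, 0.9]
print(error_metrics(pred, obs))
```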

14.
Although multicenter data are common, many prediction model studies ignore this clustering during model development. The objective of this study is to evaluate the predictive performance of regression methods for developing clinical risk prediction models from multicenter data, and to provide guidelines for practice. We compared the predictive performance of standard logistic regression, generalized estimating equations, random intercept logistic regression, and fixed effects logistic regression. First, we present a case study on the diagnosis of ovarian cancer. Next, a simulation study investigated the performance of the different models as a function of five factors: the amount of clustering, the development sample size, the distribution of center-specific intercepts, the presence of a center-predictor interaction, and the presence of a dependency between center effects and predictors. The results showed that when sample sizes were sufficiently large, conditional models yielded calibrated predictions, whereas marginal models yielded miscalibrated predictions. This miscalibration was worse with more heavily clustered data. Small sample sizes led to overfitting and unreliable predictions. Calibration of random intercept logistic regression was better than that of standard logistic regression even in several unfavorable conditions: when center-specific intercepts were not normally distributed, when a center-predictor interaction was present, when center effects and predictors were dependent, or when the model was applied in a new center. Therefore, to make reliable predictions in a specific center, we recommend random intercept logistic regression.
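The gap between conditional and marginal predictions can be shown numerically. A random intercept model predicts expit(eta + u_c) for center c. A marginal model instead targets the population average E[expit(eta + u)], which is pulled toward 0.5. A marginal prediction is therefore miscalibrated for any particular center, and more so as the between-center variance grows. The sketch below assumes normally distributed center intercepts. The values of eta and sigma are hypothetical.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def conditional_probability(eta, u):
    """Center-specific (conditional) prediction from a random intercept model."""
    return expit(eta + u)

def marginal_probability(eta, sigma, n_grid=2001, width=8.0):
    """Population-averaged E[expit(eta + u)] for u ~ Normal(0, sigma^2), by the trapezoid rule."""
    lo, hi = -width * sigma, width * sigma
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        u = lo + i * step
        density = math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * expit(eta + u) * density
    return total * step

eta, sigma = 2.0, 1.5  # hypothetical linear predictor and between-center SD
print(conditional_probability(eta, 0.0), marginal_probability(eta, sigma))
```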

15.
Knowledge about the distribution and habitat requirements of species is important for analyzing their role in marine ecosystems and for establishing sanctuaries. However, such knowledge is scarce, especially for many chondrichthyan species. In this study, the spatial distribution of the stingray Neotrygon kuhlii on the Australian North and Northwest Shelf was predicted with models for the first time. Predictions were based on two different types of habitat suitability models: logistic regression and maximum entropy modeling. Catch data for N. kuhlii from Australian trawl surveys, combined with randomly selected pseudo-absences, were used for modeling together with data sets of several environmental variables. Both modeling methods yielded plausible, validated habitat suitability models containing water depth and salinity as significant independent variables. The model-based predictions of the probability of occurrence of N. kuhlii were similar for both methods, which supports the quality of the models. According to the predictions, N. kuhlii has its highest probability of occurrence at a water depth of about 60 m and a salinity of about 35 PSU. The results indicate that both modeling methods are powerful tools for predicting the spatial distribution and habitat quality of marine fish species. They are therefore suitable for detecting possible distributions in areas with only a few field records.
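The abstract does not describe how the pseudo-absences were drawn. The sketch below assumes the simplest scheme: uniform random points within a bounding box of the study area, labelled 0, appended to the presence records, which are labelled 1. The coordinates and box are hypothetical. In practice, environmental values (depth, salinity) would then be extracted at every point before the logistic regression is fitted. That step is not shown.

```python
import random

def add_pseudo_absences(presences, bbox, n_absences, seed=0):
    """Return (x, y, label) rows: presences labelled 1 plus uniform pseudo-absences labelled 0."""
    xmin, ymin, xmax, ymax = bbox
    rng = random.Random(seed)
    rows = [(x, y, 1) for x, y in presences]
    for _ in range(n_absences):
        rows.append((rng.uniform(xmin, xmax), rng.uniform(ymin, ymax), 0))
    return rows

# Hypothetical presence records (lon, lat) and a hypothetical study-area box.
presences = [(120.5, -14.2), (125.0, -12.8), (130.1, -11.5)]
rows = add_pseudo_absences(presences, (113.0, -22.0, 137.0, -10.0), 50, seed=1)
print(len(rows), rows[:4])
```

Common variants exclude pseudo-absences that fall too close to presence records, or restrict them to the depth range that the trawl surveys actually sampled.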

16.
We compared probability surfaces derived from one set of environmental variables using three Geographic Information System (GIS)-based approaches: logistic regression and Akaike's Information Criterion (AIC), Multiple Criteria Evaluation (MCE), and Bayesian analysis (specifically Dempster-Shafer theory). We used lynx (Lynx canadensis) as our focal species. We developed our environment relationship model using track data collected in Banff National Park, Alberta, Canada, during the winters of 1997 to 2000. The accuracy of the three spatial models was compared using a contingency table method. We determined three measures: the percentage of cases in which both presence and absence points were correctly classified (overall accuracy), the failure to predict a species where it occurred (omission error), and the prediction of presence where there was absence (commission error). Overall accuracy showed that the logistic regression approach was the most accurate (74.51%). Multiple criteria evaluation was intermediate (39.22%), while the Dempster-Shafer (D-S) theory model was the poorest (29.90%). However, omission and commission errors tell a different story: logistic regression had the lowest commission error, while D-S theory produced the lowest omission error. Our results provide evidence that habitat modellers should evaluate all three error measures when ascribing confidence to their models. We suggest that, for our study area at least, the logistic regression model is optimal. However, where the sample size is small or the species is very rare, it may also be useful to explore and/or use a more ecologically cautious modelling approach (e.g., Dempster-Shafer). Such an approach would over-predict, protect more sites, and thereby minimize the risk of missing critical habitat in conservation plans.
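The three contingency-table measures can disagree sharply, as the sketch below shows with hypothetical data. We assume omission error is computed relative to observed presences and commission error relative to observed absences. Some remote-sensing texts divide commission by predicted presences instead. A model that predicts presence everywhere has zero omission error but poor overall accuracy and maximal commission error, which mirrors the trade-off described above.

```python
def contingency_errors(observed, predicted):
    """Overall accuracy, omission error and commission error for 1 = presence, 0 = absence."""
    tp = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 1)
    fn = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 0)
    fp = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 1)
    tn = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 0)
    overall = (tp + tn) / len(observed)
    omission = fn / (tp + fn)  # presences the model missed
    commission = fp / (fp + tn)  # absences the model called present
    return overall, omission, commission

# Hypothetical validation points: 2 presences, 8 absences.
observed = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
strict_model = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
cautious_model = [1] * 10  # over-predicts presence everywhere
print(contingency_errors(observed, strict_model))
print(contingency_errors(observed, cautious_model))
```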

17.
Intensive care units (ICUs) are increasingly interested in assessing and improving their performance. ICU length of stay (LoS) can be seen as an indicator of the efficiency of care. However, there is little consensus on which prognostic method should be used to adjust ICU LoS for case-mix factors. This study compared the performance of different regression models in predicting ICU LoS. We included data from 32,667 unplanned admissions to ICUs participating in the Dutch National Intensive Care Evaluation (NICE) in 2011. We predicted ICU LoS using eight regression models: ordinary least squares regression on untransformed ICU LoS, on LoS truncated at 30 days, and on log-transformed LoS; a generalized linear model with a Gaussian distribution and a logarithmic link function; Poisson regression; negative binomial regression; Gamma regression with a logarithmic link function; and the original and recalibrated APACHE IV model. Each model was applied to all patients together and to survivors and non-survivors separately. We assessed the predictive performance of the models using bootstrapping, with the squared Pearson correlation coefficient (R²), root mean squared prediction error (RMSPE), mean absolute prediction error (MAPE), and bias as measures. The distribution of ICU LoS was right-skewed, with a median of 1.7 days (interquartile range 0.8 to 4.0) and a mean of 4.2 days (standard deviation 7.9). Predictive performance ranged from 0.09 to 0.20 for R², from 7.28 to 8.74 days for RMSPE, from 3.00 to 4.42 days for MAPE, and from −2.99 to 1.64 days for bias. Predictive performance was slightly better for survivors than for non-survivors. We were disappointed in the predictive performance of the regression models. We conclude that it is difficult to predict the LoS of unplanned ICU admissions using only patient characteristics available at admission.
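One practical issue with the log-transformed OLS model is converting predictions back to days. With right-skewed LoS, the naive back-transform exp(mean of log LoS) is the geometric mean, which underestimates the arithmetic mean. The abstract does not say how this was handled. The sketch below illustrates Duan's smearing estimator, a standard correction that is swapped in here for illustration only, on an intercept-only model with hypothetical LoS values. In that special case the smeared estimate equals the sample mean exactly.

```python
import math

def naive_and_smeared_means(los_days):
    """Intercept-only OLS on log(LoS): naive back-transform vs. Duan's smearing estimate."""
    logs = [math.log(v) for v in los_days]
    mean_log = sum(logs) / len(logs)  # OLS fit of an intercept-only model on the log scale
    naive = math.exp(mean_log)  # geometric mean: biased low for right-skewed data
    smearing_factor = sum(math.exp(l - mean_log) for l in logs) / len(logs)
    return naive, naive * smearing_factor

# Hypothetical right-skewed ICU lengths of stay in days.
los = [0.5, 0.8, 1.2, 1.7, 2.5, 4.0, 9.0, 21.0]
print(naive_and_smeared_means(los), sum(los) / len(los))
```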

18.
Motivated by a clinical prediction problem, we performed a simulation study to compare different approaches to building risk prediction models. The aim was to derive robust prediction models for hospital survival in patients with acute heart failure from three highly correlated blood parameters measured up to four times, with predictive ability taking explicit priority over interpretability. Methods that relied only on the original predictors were compared with methods using an expanded predictor space that included transformations and interactions. Predictors were simulated as transformations and combinations of multivariate normal variables. These were fitted to the partly skewed and bimodally distributed original data so that the simulated data mimicked the original covariate structure. Different penalized versions of logistic regression, as well as random forests and generalized additive models, were investigated, with classical logistic regression as a benchmark. Their performance was assessed using measures of predictive accuracy, model discrimination, and model calibration. Three scenarios were investigated, using different subsets of the original data with different numbers of observations and events per variable. In this setting, a risk prediction model had to be based on a small set of highly correlated and interconnected predictors. Here, Elastic Net and Ridge logistic regression performed well compared with their competitors. The other methods did not lead to substantial improvements, or even performed worse than standard logistic regression. Our work demonstrates how simulation studies that mimic relevant features of a specific data set can support the choice of a good modeling strategy.
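The first step of such a simulation design, drawing multivariate normal variables with a prescribed correlation and then transforming them, can be sketched with a Cholesky factor. The correlation matrix below is hypothetical, not the study's. The exp() transform merely illustrates how skewed predictors can be produced, and the study's fitting of the normal variables to the original data is not reproduced.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L @ L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def simulate_correlated(corr, n, seed=0):
    """Draw n standard-normal vectors whose correlation matrix is corr."""
    low = cholesky(corr)
    rng = random.Random(seed)
    d = len(corr)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        rows.append([sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)])
    return rows

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical correlation among three blood parameters.
CORR = [[1.0, 0.8, 0.6], [0.8, 1.0, 0.7], [0.6, 0.7, 1.0]]
sample = simulate_correlated(CORR, 20000, seed=42)
skewed = [math.exp(row[0]) for row in sample]  # right-skewed version of parameter 1
print(pearson([r[0] for r in sample], [r[1] for r in sample]))
```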

19.
20.
Binary regression models for spatial data are commonly used in disciplines such as epidemiology and ecology. Many spatially referenced binary data sets suffer from location error, which occurs when the recorded location of an observation differs from its true location. When location error occurs, the values of the covariates associated with the true spatial locations of the observations cannot be obtained. We show how a change of support (COS) can be applied to regression models for binary data to provide coefficient estimates when the true values of the covariates are unavailable. The method applies when the unknown locations of the observations are contained within nonoverlapping, arbitrarily shaped polygons. The COS accommodates spatial and nonspatial covariates and preserves the convenient interpretation of methods such as logistic and probit regression. Using a simulation experiment, we compare binary regression models with a COS to naive approaches that ignore location error. We illustrate the flexibility of the COS by modeling individual-level disease risk in a population using a binary data set in which the locations of the observations are unknown but contained within administrative units. Our simulation experiment and data illustration corroborate that conventional regression models for binary data that ignore location error are unreliable. The COS, by contrast, can eliminate bias while preserving the choice of model.
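One simple way to picture a change of support for logistic regression is sketched below. It is an illustrative assumption, not necessarily the authors' formulation. If an observation's true location is equally likely to be any grid cell in its polygon, its success probability is the average of the point-level probabilities over those cells. The naive alternative plugs the polygon-mean covariate into the model once. Because the logistic function is nonlinear, the two differ. The covariate values and coefficients are hypothetical.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def polygon_probability(cell_covariates, beta0, beta1):
    """P(y = 1) when the true location is assumed uniform over the polygon's grid cells."""
    return sum(expit(beta0 + beta1 * x) for x in cell_covariates) / len(cell_covariates)

def naive_probability(cell_covariates, beta0, beta1):
    """Naive plug-in: evaluate the model once at the polygon-mean covariate."""
    x_bar = sum(cell_covariates) / len(cell_covariates)
    return expit(beta0 + beta1 * x_bar)

def log_likelihood(ys, polygons, beta0, beta1):
    """Bernoulli log-likelihood using polygon-averaged probabilities."""
    total = 0.0
    for y, cells in zip(ys, polygons):
        p = polygon_probability(cells, beta0, beta1)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

cells = [-2.0, 0.0, 3.0]  # hypothetical covariate values in one administrative unit
print(polygon_probability(cells, 1.0, 1.5), naive_probability(cells, 1.0, 1.5))
```

Maximizing `log_likelihood` over `beta0` and `beta1` would give coefficient estimates that keep the usual logistic-regression interpretation at the point level.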
