Similar articles
Found 20 similar articles (search time: 15 ms)
1.
We present a graphical measure for assessing the explanatory power of regression models with a binary response. The binary regression quantile plot and an area defined by it are used for the visual comparison and ordering of nested binary response regression models. The plot shows how well various models explain the data. Two data sets are analyzed and the area representing the fit of a model is shown to agree with the usual likelihood ratio test.

2.
A variety of linear and nonlinear mathematical models have been proposed to characterize Salmonella mutagenicity data sets, but no systematic procedure has been suggested for comparing two or more data sets across experiments, laboratories, occasions, mutagens or treatment conditions. In this paper, a general method for data-set comparison is provided. Nonlinear regression techniques are applied to real data sets. Data-set and parameter equivalence are described in depth. Confidence-band construction for nonlinear models and other graphical techniques are presented as auxiliary tools. Key Statistical Analysis System (SAS) programs are provided.

3.
Various mechanistic and black-box models were applied for on-line estimation of viable cell concentrations in fed-batch cultivation processes for CHO cells. Data from six fed-batch cultivation experiments were used to identify the underlying models, and six further independent data sets were used to determine the performance of the estimators. The performances were quantified by means of the root mean square error (RMSE) between the estimates and the corresponding off-line measured validation data sets. It is shown that even simple techniques based on empirical and linear model approaches provide a fairly good on-line estimation performance. The best results with respect to the validation data sets were obtained with hybrid models, multivariate linear regression techniques and support vector regression. Hybrid models provide additional important information about the specific cellular growth rates during the cultivation.
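The RMSE criterion mentioned in this abstract can be illustrated with a minimal sketch. The function name `rmse` and the sample numbers are assumptions made for illustration only; they are not taken from the paper.

```python
import math


def rmse(estimates, measurements):
    """Root mean square error between paired estimates and measurements.

    Hypothetical helper: the paper does not give code, this only shows
    the standard RMSE definition.
    """
    if len(estimates) != len(measurements) or len(estimates) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(e - m) ** 2 for e, m in zip(estimates, measurements)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative (made-up) on-line estimates vs. off-line measurements.
online_estimates = [1.0, 2.1, 3.9]
offline_measurements = [1.2, 2.0, 4.0]
print(rmse(online_estimates, offline_measurements))
```

A lower value means the estimator tracks the off-line validation measurements more closely; the comparison is only meaningful between estimators evaluated on the same validation data.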

4.
Recent advances in big data and analytics research have provided a wealth of large data sets that are too big to be analyzed in their entirety, due to restrictions on computer memory or storage size. New Bayesian methods have been developed for data sets that are large only due to large sample sizes. These methods partition big data sets into subsets and perform independent Bayesian Markov chain Monte Carlo analyses on the subsets. The methods then combine the independent subset posterior samples to estimate a posterior density given the full data set. These approaches were shown to be effective for Bayesian models including logistic regression models, Gaussian mixture models and hierarchical models. Here, we introduce the R package parallelMCMCcombine, which carries out four of these techniques for combining independent subset posterior samples. We illustrate each of the methods using a Bayesian logistic regression model for simulation data and a Bayesian Gamma model for real data; we also demonstrate features and capabilities of the R package. The package assumes the user has carried out the Bayesian analysis and has produced the independent subposterior samples outside of the package. The methods are primarily suited to models with unknown parameters of fixed dimension that exist in continuous parameter spaces. We envision this tool will allow researchers to explore the various methods for their specific applications and will assist future progress in this rapidly developing field.

5.
Binary regression models for spatial data are commonly used in disciplines such as epidemiology and ecology. Many spatially referenced binary data sets suffer from location error, which occurs when the recorded location of an observation differs from its true location. When location error occurs, values of the covariates associated with the true spatial locations of the observations cannot be obtained. We show how a change of support (COS) can be applied to regression models for binary data to provide coefficient estimates when the true values of the covariates are unavailable, but the unknown locations of the observations are contained within nonoverlapping arbitrarily shaped polygons. The COS accommodates spatial and nonspatial covariates and preserves the convenient interpretation of methods such as logistic and probit regression. Using a simulation experiment, we compare binary regression models with a COS to naive approaches that ignore location error. We illustrate the flexibility of the COS by modeling individual-level disease risk in a population using a binary data set where the locations of the observations are unknown but contained within administrative units. Our simulation experiment and data illustration corroborate that conventional regression models for binary data that ignore location error are unreliable, but that the COS can be used to eliminate bias while preserving model choice.

6.
J M Neuhaus, N P Jewell. Biometrics 1990, 46(4): 977-990
Recently a great deal of attention has been given to binary regression models for clustered or correlated observations. The data of interest take the form of a binary dependent or response variable, together with independent variables X1, ..., Xk, where sets of observations are grouped together into clusters. A number of models and methods of analysis have been suggested to study such data. Many of these are extensions in some way of the familiar logistic regression model for binary data that are not grouped (i.e., each cluster is of size 1). In general, the analyses of these clustered data models proceed by assuming that the observed clusters are a simple random sample of clusters selected from a population of clusters. In this paper, we consider the application of these procedures to the case where the clusters are selected randomly in a manner that depends on the pattern of responses in the cluster. For example, we show that ignoring the retrospective nature of the sample design, by fitting standard logistic regression models for clustered binary data, may result in misleading estimates of the effects of covariates and the precision of estimated regression coefficients.

7.
It is often assumed that the von Bertalanffy growth model (VBGM) is appropriate to describe growth in length-at-age of elasmobranchs. However, a review of the literature suggests that a two-phase growth model could better describe growth in elasmobranchs. We compare the two-phase growth model (TPGM) with the VBGM for 18 data sets of elasmobranch species, by fitting the models to the 36 age-length-at-age data pairs available. The Akaike Information Criterion (AIC) and the difference in AIC between both models revealed that in 23 cases the probability that the TPGM was true was ≥50%. The VBGM tends to estimate larger L∞ values than the two-phase growth model, while the k parameter tends to be underestimated. The growth rate in length-at-age appears to decrease near the age at first maturity in several species of elasmobranch. The importance of the TPGM lies in that it may better describe this aspect of the life history of many elasmobranchs. In this context, we conclude that the TPGM should be used along with other growth models in order to precisely estimate elasmobranch life history parameters.
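One common way to turn AIC differences into a per-model probability is through Akaike weights. The sketch below assumes that this standard construction is what the abstract refers to, which the abstract itself does not confirm; the function name `akaike_weights` and the AIC values are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values for competing models into Akaike weights.

    Each weight is exp(-delta/2) normalized over all models, where delta
    is the model's AIC minus the smallest AIC in the set. The weights sum
    to 1 and are often read as the relative support for each model.
    """
    best = min(aic_values)
    relative_likelihoods = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative_likelihoods)
    return [r / total for r in relative_likelihoods]


# Made-up AIC values for a TPGM fit and a VBGM fit to the same data set.
weight_tpgm, weight_vbgm = akaike_weights([100.0, 102.0])
print(weight_tpgm >= 0.5)  # the TPGM has the lower AIC, so its weight exceeds 0.5
```

With only two candidate models, the model with the lower AIC always receives a weight of at least 0.5, so a "≥50%" threshold amounts to asking which model has the smaller AIC.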

8.
Yang X, Belin TR, Boscardin WJ. Biometrics 2005, 61(2): 498-506
Across multiply imputed data sets, variable selection methods such as stepwise regression and other criterion-based strategies that include or exclude particular variables typically result in models with different selected predictors, thus presenting a problem for combining the results from separate complete-data analyses. Here, drawing on a Bayesian framework, we propose two alternative strategies to address the problem of choosing among linear regression models when there are missing covariates. One approach, which we call "impute, then select" (ITS), involves initially performing multiple imputation and then applying Bayesian variable selection to the multiply imputed data sets. A second strategy is to conduct Bayesian variable selection and missing data imputation simultaneously within one Gibbs sampling process, which we call "simultaneously impute and select" (SIAS). The methods are implemented and evaluated using the Bayesian procedure known as stochastic search variable selection for multivariate normal data sets, but both strategies offer general frameworks within which different Bayesian variable selection algorithms could be used for other types of data sets. A study of mental health services utilization among children in foster care programs is used to illustrate the techniques. Simulation studies show that both ITS and SIAS outperform complete-case analysis with stepwise variable selection and that SIAS slightly outperforms ITS.

9.
Computational models of electrical activity and calcium signaling in cardiac myocytes are important tools for understanding physiology. The sensitivity of these models to changes in parameters is often not well understood, however, because parameter evaluation can be a time-consuming, tedious process. I demonstrate here what I believe is a novel method for rapidly determining how changes in parameters affect outputs. In three models of the ventricular action potential, parameters were randomized, repeated simulations were run, important outputs were calculated, and multivariable regression was performed on the collected results. Random parameters included both maximal rates of ion transport and gating variable characteristics. The procedure generated simplified, empirical models that predicted outputs resulting from new sets of input parameters. The linear regression models were quite accurate, despite nonlinearities in the mechanistic models. Moreover, the regression coefficients, which represent parameter sensitivities, were robust, even when parameters were varied over a wide range. Most importantly, a side-by-side comparison of two similar models identified fundamental differences in model behavior, and revealed model predictions that were both consistent with, and inconsistent with, experimental data. This new method therefore shows promise as a tool for the characterization and assessment of computational models. The general strategy may also suggest methods for integrating traditional quantitative models with large-scale data sets obtained using high-throughput technologies.

10.
Qin LX, Self SG. Biometrics 2006, 62(2): 526-533
Identification of differentially expressed genes and clustering of genes are two important and complementary objectives addressed with gene expression data. For the differential expression question, many "per-gene" analytic methods have been proposed. These methods can generally be characterized as using a regression function to independently model the observations for each gene; various adjustments for multiplicity are then used to interpret the statistical significance of these per-gene regression models over the collection of genes analyzed. Motivated by this common structure of per-gene models, we proposed a new model-based clustering method, the clustering of regression models method, which groups genes that share a similar relationship to the covariate(s). This method provides a unified approach for a family of clustering procedures and can be applied to data collected with various experimental designs. In addition, when combined with per-gene methods for assessing differential expression that employ the same regression modeling structure, an integrated framework for the analysis of microarray data is obtained. The proposed methodology was applied to two microarray data sets, one from a breast cancer study and the other from a yeast cell cycle study.

11.
Considering that the temporal trend in stocking, expressed as number of trees per unit area, is the opposite of that of growth, and that both trajectories are sigmoidal, we derived a temporal trajectory of density decrease by reversing the temporal trend of a generalized growth function. We derived and analysed twelve stand-level mortality models by using four data sets from monospecific even-aged stands. Stand dominant height rather than stand age was incorporated as an indicator of the growth stage, and a careful examination of the models' conformity with the essential logical properties of stand-level survival models was conducted. We first tested the models' adequacy and general predictive performance by fitting them to parameterization data sets and subsequently assessing them with validation data. The regression equations were afterwards re-fitted over the total data sets to make use of all available information in the final parameter estimates. Nine model formulations were successfully fitted and four of them were the most adequate in describing stand density decrease with dominant height growth. The site-specific effect was incorporated in the newly derived models through the predictor variable and the stand-specific starting density was accounted for through a specific model parameter. These new dominant height-dependent mortality equations can be considered for inclusion in the framework of stand-level growth models as transition functions.

12.
I evaluated the predictive ability of statistical models obtained by applying seven methods of variable selection to 12 ecological and environmental data sets. Cross-validation, involving repeated splits of each data set into training and validation subsets, was used to obtain honest estimates of predictive ability that could be fairly compared among methods. There was surprisingly little difference in predictive ability among five methods based on multiple linear regression. Stepwise methods performed similarly to exhaustive algorithms for subset selection, and the choice of criterion for comparing models (Akaike's information criterion, Schwarz's Bayesian information criterion or F statistics) had little effect on predictive ability. For most of the data sets, two methods based on regression trees yielded models with substantially lower predictive ability. I argue that there is no 'best' method of variable selection and that any of the regression-based approaches discussed here is capable of yielding useful predictive models.

13.
14.
This study explores the ability of regression models, with no knowledge of the underlying physiology, to estimate physiological parameters relevant for metabolism and endocrinology. Four regression models were compared: multiple linear regression (MLR), principal component regression (PCR), partial least-squares regression (PLS) and regression using artificial neural networks (ANN). The pathway of mammalian gluconeogenesis was analyzed using [U-13C]glucose as tracer. A set of data was simulated by randomly selecting physiologically appropriate metabolic fluxes for the 9 steps of this pathway as independent variables. The isotope labeling patterns of key intermediates in the pathway were then calculated for each set of fluxes, yielding 29 dependent variables. Two thousand sets were created, allowing independent training and test data. Regression models were asked to predict the nine fluxes, given only the 29 isotopomers. For large training sets (>50) the artificial neural network model was superior, capturing 95% of the variability in the gluconeogenic flux, whereas the three linear models captured only 75%. This reflects the ability of neural networks to capture the inherent non-linearities of the metabolic system. The effect of error in the variables and the addition of random variables to the data set was considered. Model sensitivities were used to find the isotopomers that most influenced the predicted flux values. These studies provide the first test of multivariate regression models for the analysis of isotopomer flux data. They provide insight for metabolomics and the future of isotopic tracers in metabolic research where the underlying physiology is complex or unknown. We acknowledge the support of NIH Grant DK58533 and the DuPont-MIT Alliance.

15.
It is typical in QTL mapping experiments that the number of markers under investigation is large. This poses a challenge to commonly used regression models since the number of feature variables is usually much larger than the sample size, especially when epistasis effects are to be considered. The greedy nature of the conventional stepwise procedures is well known and is even more conspicuous in such cases. In this article, we propose a two-phase procedure based on penalized likelihood techniques and the extended Bayes information criterion (EBIC) for QTL mapping. The procedure consists of a screening phase and a selection phase. In the screening phase, the main and interaction features are alternately screened by a penalized likelihood mechanism. In the selection phase, a low-dimensional approach using EBIC is applied to the features retained in the screening phase to identify QTL. The two-phase procedure has the asymptotic property that its positive detection rate (PDR) and false discovery rate (FDR) converge to 1 and 0, respectively, as sample size goes to infinity. The two-phase procedure is compared with both traditional and recently developed approaches by simulation studies. A real data analysis is presented to demonstrate the application of the two-phase procedure.

16.
The generalized estimating equations (GEE) derived by Liang and Zeger to analyze longitudinal data have been used in a wide range of medical and biological applications. To make regression a useful and meaningful statistical tool, emphasis should be placed not only on inference or fitting, but also on diagnosing potential data problems. Most of the usual diagnostics for linear regression models have been generalized for GEE. However, global influence measures based on the volume of confidence ellipsoids are not available for GEE analysis. This article presents an extension of these measures that is valid for correlated-measures regression analysis using GEEs. The proposed measures are illustrated by an analysis of epileptic seizure count data arising from a study of progabide as an adjuvant therapy for partial seizures and some simulated data sets.

17.
Abstract. The use of Generalized Linear Models (GLM) in vegetation analysis has been advocated to accommodate complex species response curves. This paper investigates the potential advantages of using classification and regression trees (CART), a recursive partitioning method that is free of distributional assumptions. We used multiple logistic regression (a form of GLM) and CART to predict the distribution of three major oak species in California. We compared two types of model: polynomial logistic regression models optimized to account for non-linearity and factor interactions, and simple CART models. Each type of model was developed using learning data sets of 2085 and 410 sample cases, and assessed on test sets containing 2016 and 3691 cases respectively. The responses of the three species to environmental gradients were varied and often non-homogeneous or context dependent. We tested the methods for predictive accuracy: CART models performed significantly better than our polynomial logistic regression models in four of the six cases considered, and equally well in the two remaining cases. CART also showed a superior ability to detect factor interactions. Insight gained from CART models then helped develop improved parametric models. Although the probabilistic form of logistic regression results is better suited to testing theories about species responses to environmental gradients, we found that CART models are intuitive, easy to develop and interpret, and constitute a valuable tool for modeling species distributions.

18.
19.
S M Snapinn, R D Small. Biometrics 1986, 42(3): 583-592
Regression models of the type proposed by McCullagh (1980, Journal of the Royal Statistical Society, Series B 42, 109-142) are a general and powerful method of analyzing ordered categorical responses, assuming categorization of an (unknown) continuous response of a specified distribution type. Tests of significance with these models are generally based on likelihood-ratio statistics that have asymptotic chi-squared distributions; therefore, investigators with small data sets may be concerned with the small-sample behavior of these tests. In a Monte Carlo sampling study, significance tests based on the ordinal model are found to be powerful, but a modified test procedure (using an F distribution with a finite number of degrees of freedom for the denominator) is suggested such that the empirical significance level agrees more closely with the nominal significance level in small-sample situations. We also discuss the parallels between an ordinal regression model assuming underlying normality and conventional multiple regression. We illustrate the model with two data sets: one from a study investigating the relationship between phosphorus in soil and plant-available phosphorus in corn grown in that soil, and the other from a clinical trial comparing analgesic drugs.

20.
Summary. The maternal age dependence of Down's syndrome rates was analyzed by two mathematical models: a discontinuous slope (DS) model, which fits different exponential equations to different parts of the 20–49 age interval, and a CPE model, which fits a function that is the sum of a constant and an exponential term over the whole 20–49 range. The CPE model had been considered but rejected by Penrose, who preferred models postulating changes with age assuming either a power function X^10, where X is age, or a Poisson model in which accumulation of 17 events was the assumed threshold for the occurrence of Down's syndrome. However, subsequent analyses indicated that the two models preferred by Penrose did not fit recent data sets as well as the DS or CPE model. Here we report analyses of broadened power and Poisson models in which n (the postulated number of independent events) can vary. Five data sets are analyzed. For the power models the range of the optimal n is 11 to 13; for the Poisson it is 17 to 25. The DS, Poisson, and power models each give the best fit to one data set; the CPE, to two sets. No particular model is clearly preferable. It appears unlikely that, with a data set from any single available source, a specific etiologic hypothesis for the maternal age dependence of Down's syndrome can be clearly inferred by the use of these or similar regression models.
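As a reading aid, the two functional forms named in this summary might be written as follows. The abstract gives no formulas, so the parameterization and the symbols r, a, b, c and k are assumptions chosen for illustration; only X (maternal age) and n (the exponent) come from the text.

```latex
% CPE model: a constant plus an exponential term over ages 20 to 49
r_{\mathrm{CPE}}(X) = a + \exp(b + cX), \qquad 20 \le X \le 49

% Broadened power model: Penrose's X^{10} with a free exponent n
r_{\mathrm{power}}(X) = k\,X^{n}
```

Under this reading, the reported optimal range of n from 11 to 13 for the power models means the fitted exponent sits somewhat above Penrose's original value of 10.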
