Similar Literature
20 similar documents found (search time: 31 ms)
1.
Variable selection is critical in competing risks regression with high-dimensional data. Although penalized variable selection methods and other machine learning-based approaches have been developed, many of these methods often suffer from instability in practice. This paper proposes a novel method named Random Approximate Elastic Net (RAEN). Under the proportional subdistribution hazards model, RAEN provides a stable and generalizable solution to the large-p-small-n variable selection problem for competing risks data. Our general framework allows the proposed algorithm to be applied to other time-to-event regression models, including competing risks quantile regression and accelerated failure time models. Through extensive simulations, we show that the new computationally intensive algorithm markedly improves variable selection and parameter estimation. A user-friendly R package, RAEN, is developed for public use. We also apply our method to a cancer study to identify influential genes associated with death or progression from bladder cancer.

2.
The fit of the logit and probit models for quantal response data can be improved by embedding these classical models within a richer parametric family indexed by one or two shape parameters. In this paper, a symmetric extended logistic model indexed by a shape parameter λ is discussed, with application to dose-response curves. The usual maximum likelihood method is employed to estimate the parameters of the model. The need to include the shape parameter λ is illustrated by analyzing a set of real experimental data and comparing the fit of the extended logistic model with those obtained from the standard logit and probit models.
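The abstract's extended family is indexed by a shape parameter λ, but its exact functional form is not given here. As a minimal sketch of the maximum-likelihood machinery for quantal response data, the following fits the standard two-parameter logit by direct likelihood maximization on synthetic dose-response data (all values illustrative); the shape extension would add λ as a third free parameter in the response function.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)
dose = np.repeat(np.linspace(-2.0, 2.0, 9), 50)        # centred log-doses
true_a, true_b = 0.3, 1.5                              # illustrative truth
y = rng.binomial(1, expit(true_a + true_b * dose))     # quantal responses

def negloglik(theta):
    """Negative Bernoulli log-likelihood for the standard logit model."""
    a, b = theta
    p = np.clip(expit(a + b * dose), 1e-12, 1.0 - 1e-12)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

fit = minimize(negloglik, x0=[0.0, 1.0], method="Nelder-Mead")
a_hat, b_hat = fit.x
```

With 450 binary responses, the recovered intercept and slope land close to the generating values.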

3.
Previous investigators have developed models of temperature distribution in the human limb by treating it as a regular circular or elliptical tapered cylinder. In reality, however, the limb is not a regular tapered cylinder: its radius and eccentricity vary along its length. In view of this, a model of temperature distribution in an irregular, tapered, elliptically shaped human limb is proposed for the three-dimensional steady-state case. The limb is assumed to be composed of multiple cylindrical substructures with variable radius and eccentricity. The mathematical model incorporates the effects of blood mass flow rate, metabolic activity, and thermal conductivity. The outer surface is exposed to the environment, and appropriate boundary conditions have been framed. The finite element method is employed to obtain the solution. Temperature profiles are computed in the dermal layers of a human limb and used to study the effect of shape, microstructure, and biophysical parameters on the temperature distribution. The proposed model is more realistic than conventional models, since it can be applied to both regular and irregular body structures with variable radius and eccentricity to study their thermal behaviour.
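The paper solves a 3-D finite element problem on an irregular elliptical limb; as a much-simplified illustration of the same physics, the sketch below solves a 1-D steady-state Pennes-type bioheat equation (conduction plus blood perfusion plus metabolic heating) by finite differences. All parameter values are illustrative, not taken from the paper.

```python
import numpy as np

# 1-D steady-state Pennes-type bioheat model, solved by finite differences.
k, M, qm = 0.5, 2000.0, 1000.0      # conductivity (W/m/K), perfusion (W/m^3/K), metabolism (W/m^3)
L, N = 0.03, 61                     # tissue depth (m), grid points
h, T_env, T_art = 10.0, 25.0, 37.0  # surface film coefficient, ambient and arterial temperature

dx = L / (N - 1)
A = np.zeros((N, N))
b = np.zeros(N)

A[0, 0], b[0] = 1.0, T_art          # core node held at body temperature
for i in range(1, N - 1):           # k T'' + M (T_art - T) + qm = 0
    A[i, i - 1] = A[i, i + 1] = k / dx**2
    A[i, i] = -2.0 * k / dx**2 - M
    b[i] = -(M * T_art + qm)
A[-1, -2] = -k / dx                 # surface balance: -k dT/dx = h (T - T_env)
A[-1, -1] = k / dx + h
b[-1] = h * T_env

T = np.linalg.solve(A, b)           # temperature profile from core to skin
```

The solution decreases monotonically from the 37 °C core toward a skin temperature between ambient and core, the qualitative behaviour the dermal-layer profiles in the paper exhibit.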

4.
Owing to the lack of sufficient parameters, certain nonlinear exploitation models of common usage in fisheries management are thought to be too inflexible to portray the productivities of fish stocks with sufficient fidelity. These models [typified by the formulation of Schaefer, 1954] are statistically well behaved, however, since their governing equations all have fixed degrees of nonlinearity. Although one model of record [Pella and Tomlinson, 1969] offers an extra degree of parametric freedom not found in the aforementioned models, its productivity equation contains a variable exponent that introduces variable nonlinearity into the fitting procedure, an undesirable property that often leads to ill-determined parameter estimates. A new productivity formulation of fixed degree is developed here which exhibits the extra degree of parametric control not found in the Schaefer model but avoids the instabilities arising from variable exponents (as in the Pella-Tomlinson model) by having its parametric controls wholly contained in its coefficients. Some of the attributes and shortcomings of nonlinear single- and multiple-species models are also discussed.

5.
MOTIVATION: To study lowly expressed genes in microarray experiments, it is useful to increase the photometric gain in the scanning. However, a large gain may cause some pixels for highly expressed genes to become saturated. Spatial statistical models that model spot shapes on the pixel level may be used to infer information about the saturated pixel intensities. Other possible applications for spot shape models include data quality control and accurate determination of spot centres and spot diameters. RESULTS: Spatial statistical models for spotted microarrays are studied, including pixel-level transformations and spot shape models. The models are applied to a dataset from 50-mer oligonucleotide microarrays with 452 selected Arabidopsis genes. Logarithmic, Box-Cox and inverse hyperbolic sine transformations are compared in combination with four spot shape models: a cylindric plateau shape, an isotropic Gaussian distribution and a difference of two scaled Gaussian distributions suggested in the literature, as well as a proposed new polynomial-hyperbolic spot shape model. A substantial improvement is obtained for the dataset studied by the polynomial-hyperbolic spot shape model in combination with the Box-Cox transformation. The spatial statistical models are used to correct spot measurements with saturation by extrapolating the censored data. AVAILABILITY: Source code for R is available at http://www.matfys.kvl.dk/~ekstrom/spotshapes/

6.
Auditory perception neurons, also called inner hair cells (IHCs) because of their physical shape, transform the mechanical movements of the basilar membrane into electrical impulses. The impulse coding of the IHC is the main information carrier in the auditory process and is the basis for improvements in cochlear implants as well as for low-rate, high-quality speech processing and compression. This paper compares biologically motivated models (Meddis, Cooke, Payton) with a newly developed model which is transfer-function oriented. The new model has only three reservoirs, and its parameters can be controlled through five small ROM tables. This model is compared with the often-used Meddis model in terms of accuracy, system parameter flexibility, and hardware effort in an FPGA implementation. Received: 26 March 1997 / Accepted in revised version: 27 May 1997

7.
We put forward a new item response model which is an extension of the binomial error model first introduced by Keats and Lord. Like the binomial error model, the basic latent variable can be interpreted as a probability of responding in a certain way to an arbitrarily specified item. For a set of dichotomous items, this model gives predictions that are similar to other single-parameter IRT models (such as the Rasch model) but has certain advantages in more complex cases. The first is that in specifying a flexible two-parameter Beta distribution for the latent variable, it is easy to formulate models for randomized experiments in which there is no reason to believe that either the latent variable or its distribution vary over randomly composed experimental groups. Second, the elementary response function is such that extensions to more complex cases (e.g., polychotomous responses, unfolding scales) are straightforward. Third, the probability metric of the latent trait allows tractable extensions to cover a wide variety of stochastic response processes.
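Under the Keats-Lord binomial error model with a two-parameter Beta distribution on the latent probability, the marginal number-correct score over n dichotomous items is beta-binomial. A quick sketch with illustrative parameter values:

```python
import numpy as np
from scipy.stats import betabinom

n_items, a, b = 20, 4.0, 2.0                 # 20 items; Beta(4, 2) latent ability (illustrative)
scores = np.arange(n_items + 1)
pmf = betabinom.pmf(scores, n_items, a, b)   # marginal number-correct distribution
mean_score = betabinom.mean(n_items, a, b)   # equals n * a / (a + b)
```

The mean score n·a/(a + b) here is 20·4/6 ≈ 13.33, the expected number correct for this latent ability distribution.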

8.
Computational models of electrical activity and calcium signaling in cardiac myocytes are important tools for understanding physiology. The sensitivity of these models to changes in parameters is often not well-understood, however, because parameter evaluation can be a time-consuming, tedious process. I demonstrate here what I believe is a novel method for rapidly determining how changes in parameters affect outputs. In three models of the ventricular action potential, parameters were randomized, repeated simulations were run, important outputs were calculated, and multivariable regression was performed on the collected results. Random parameters included both maximal rates of ion transport and gating variable characteristics. The procedure generated simplified, empirical models that predicted outputs resulting from new sets of input parameters. The linear regression models were quite accurate, despite nonlinearities in the mechanistic models. Moreover, the regression coefficients, which represent parameter sensitivities, were robust, even when parameters were varied over a wide range. Most importantly, a side-by-side comparison of two similar models identified fundamental differences in model behavior, and revealed model predictions that were both consistent with, and inconsistent with, experimental data. This new method therefore shows promise as a tool for the characterization and assessment of computational models. The general strategy may also suggest methods for integrating traditional quantitative models with large-scale data sets obtained using high-throughput technologies.
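The procedure described above (randomize parameters, run the model, regress outputs on inputs) can be sketched with a toy nonlinear function standing in for an action-potential model; the function and all parameter values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def model_output(p):
    """Toy nonlinear model standing in for a mechanistic simulation."""
    return p[0] * np.exp(0.5 * p[1]) / (1.0 + p[2])

base = np.array([2.0, 1.0, 0.5])                      # baseline parameters (illustrative)
n_trials = 500
scale = rng.lognormal(0.0, 0.1, size=(n_trials, 3))   # random per-trial scale factors
outputs = np.array([model_output(base * s) for s in scale])

# Regress log-output on log scale factors; the slopes approximate
# parameter sensitivities d log(output) / d log(parameter).
X = np.column_stack([np.ones(n_trials), np.log(scale)])
coef, *_ = np.linalg.lstsq(X, np.log(outputs), rcond=None)
sens = coef[1:]
```

Despite the model's nonlinearity, the linear regression recovers the local elasticities (here 1, 0.5, and -1/3 for the three parameters), mirroring the paper's observation that the empirical linear models are quite accurate.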

9.
Lloyd CJ 《Biometrics》2000,56(3):862-867
The performance of a diagnostic test is summarized by its receiver operating characteristic (ROC) curve. Under quite natural assumptions about the latent variable underlying the test, the ROC curve is convex. Empirical data on a test's performance often come in the form of observed true positive and false positive relative frequencies under varying conditions. This paper describes a family of regression models for analyzing such data. The underlying ROC curves are specified by a quality parameter δ and a shape parameter μ and are guaranteed to be convex provided δ > 1. Both the position along the ROC curve and the quality parameter δ are modeled linearly with covariates at the level of the individual. The shape parameter μ enters the model through the link function log(p^μ) − log(1 − p^μ) of a binomial regression and is estimated either by search or from an appropriately constructed variate. One simple application is to the meta-analysis of independent studies of the same diagnostic test, illustrated on some data of Moses, Shapiro, and Littenberg (1993). A second application, to so-called vigilance data, is given, where ROC curves differ across subjects and modeling of the position along the ROC curve is of primary interest.

10.
The Weibull model is a flexible growth model that describes both general population growth and plant disease progress. However, the lack of an asymptotic parameter has limited its wider application. In the present study, an asymptotic parameter K was introduced into the original Weibull model, written as y = K{1 − exp[−((t − a)/b)^c]}, in which a, b, c, and K are location, scale, shape, and asymptotic parameters, respectively, y is the proportion of disease, and t is time. A wide range of simulated disease progress data sets were generated using logistic, Gompertz, and monomolecular models by specifying different parameter values, and fitted to both the original and modified Weibull models. The modified model provided statistically better fits for all data than the original model. The modified model can thus improve the curve-fitting ability of the original model, which often failed to converge, especially when the asymptote is less than 1.0. Actual disease progress data on wheat leaf rust and tomato root rot with different asymptotic values were also used to compare the original and modified Weibull models. The modified model provided a statistically better fit than the original model, and model estimates of the asymptotic parameter K were nearly identical to the actual disease maxima, reflecting the characteristics of the host pathosystem. Comparison of the logistic, Gompertz, and Weibull models including parameter K by fitting to the observed data on wheat leaf rust and tomato root rot revealed the applicability of the modified Weibull model, which in a majority of cases provided a statistically superior fit.
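A minimal sketch of fitting the modified Weibull, y = K{1 − exp[−((t − a)/b)^c]}, to a disease progress curve whose asymptote lies below 1.0 (synthetic data; parameter values illustrative):

```python
import numpy as np
from scipy.optimize import curve_fit

def modified_weibull(t, a, b, c, K):
    """y = K * (1 - exp(-((t - a)/b)**c)); a location, b scale, c shape, K asymptote."""
    u = np.clip((t - a) / b, 0.0, None)   # guard against t < a during optimization
    return K * (1.0 - np.exp(-(u**c)))

rng = np.random.default_rng(2)
t = np.linspace(1.0, 30.0, 30)            # assessment times
y = modified_weibull(t, 0.5, 12.0, 2.0, 0.8) + rng.normal(0.0, 0.01, t.size)

popt, _ = curve_fit(
    modified_weibull, t, y, p0=[0.0, 10.0, 1.5, 1.0],
    bounds=([-5.0, 1e-3, 0.1, 0.0], [0.9, 100.0, 10.0, 2.0]),
)
a_hat, b_hat, c_hat, K_hat = popt
```

The fitted K recovers the true maximum disease level of 0.8, the case in which the abstract notes the original (asymptote-free) model often fails to converge.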

11.
This paper examines different mathematical models of insect dispersal and infection spread and compares these with field data. Reaction-diffusion and integro-difference equation models are used to model the spatio-temporal spread of Wolbachia in Drosophila simulans populations. The models include cytoplasmic incompatibility between infected females and uninfected males that creates a threshold density, similar to an Allee effect, preventing increase from low incidence of infection in the host population. The model builds on an earlier model (Turelli & Hoffmann, 1991) by incorporating imperfect maternal transmission. Simulations of the models using the same parameter values produce different dynamics for each model. These differences become very marked in the integro-difference equation models when insect dispersal patterns are assumed to be non-Gaussian. The success or failure of invasion by Wolbachia in the simulations may be attributed to the insect dispersal mechanism used in the model rather than the parameter values. As the integro-difference models predict very different outcomes depending on the underlying assumptions about insect dispersal patterns, this emphasizes that good field data on real (rather than idealized) dispersal patterns need to be collected before models such as these can be used for predictive purposes.
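A generic 1-D integro-difference sketch of the mechanism described above: bistable local dynamics with an invasion threshold (a stand-in for the cytoplasmic-incompatibility dynamics; the actual Turelli-Hoffmann map is not reproduced here), followed by dispersal via convolution with a Gaussian or a fatter-tailed Laplace kernel of equal variance. All parameter values are illustrative.

```python
import numpy as np

x = np.linspace(-50.0, 50.0, 1001)
dx = x[1] - x[0]

def local_map(p, threshold=0.25):
    """Bistable growth: frequencies below the threshold decline, above it rise."""
    return np.clip(p + 2.0 * p * (p - threshold) * (1.0 - p), 0.0, 1.0)

def step(p, kernel):
    """One generation: local dynamics, then dispersal by convolution."""
    return np.convolve(local_map(p), kernel, mode="same") * dx

sigma = 1.0
gauss = np.exp(-x**2 / (2.0 * sigma**2)) / np.sqrt(2.0 * np.pi * sigma**2)
lap_b = sigma / np.sqrt(2.0)                    # Laplace scale giving the same variance
laplace = np.exp(-np.abs(x) / lap_b) / (2.0 * lap_b)

p_gauss = p_lap = np.where(np.abs(x) < 5.0, 0.9, 0.0)   # initial infected patch
for _ in range(20):
    p_gauss, p_lap = step(p_gauss, gauss), step(p_lap, laplace)

width = lambda p: dx * np.count_nonzero(p > 0.5)   # extent of established infection
```

Because the local dynamics are above-threshold in the initial patch, both kernels produce a spreading wave; the point of the paper is that the shape of the kernel (Gaussian versus fat-tailed) can change whether and how fast such a wave advances.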

12.
Considering that the temporal trend in stocking, expressed as the number of trees per unit area, is the opposite of that of growth, and that both trajectories are sigmoidal, we derived a temporal trajectory of density decrease by reversing the temporal trend of a generalized growth function. We derived and analysed twelve stand-level mortality models using four data sets from monospecific even-aged stands. Stand dominant height rather than stand age was incorporated as an indicator of the growth stage, and the models' conformity with the essential logical properties of stand-level survival models was carefully examined. We first tested the models' adequacy and general predictive performance by fitting them to parameterization data sets and subsequently assessing them with validation data. The regression equations were then re-fitted over the total data sets to make use of all available information in the final parameter estimates. Nine model formulations were successfully fitted, and four of them were the most adequate in describing stand density decrease with dominant height growth. The site-specific effect was incorporated in the newly derived models through the predictor variable, and the stand-specific starting density was accounted for through a specific model parameter. These new dominant height-dependent mortality equations can be considered for inclusion in the framework of stand-level growth models as transition functions.

13.
This paper has extended and updated my earlier list and analysis of candidate models used in theoretical modelling and empirical examination of species–area relationships (SARs). I have also reviewed trivariate models that can be applied to include a second independent variable (in addition to area) and discussed extensively the justifications for fitting curves to SARs and the choice of model. There is also a summary of the characteristics of several new candidate models, especially extended power models, logarithmic models and parameterizations of the negative-exponential family and the logistic family. I have, moreover, examined the characteristics and shapes of trivariate linear, logarithmic and power models, including combination variables and interaction terms. The choice of models according to best fit may conflict with problems of non-normality or heteroscedasticity. The need to compare parameter estimates between data sets should also affect model choice. With few data points and large scatter, models with few parameters are often preferable. With narrow-scale windows, even inflexible models such as the power model and the logarithmic model may produce good fits, whereas with wider-scale windows where inflexible models do not fit well, more flexible models such as the second persistence (P2) model and the cumulative Weibull distribution may be preferable. When extrapolations and expected shapes are important, one should consider models with expected shapes, e.g. the power model for sample area curves and the P2 model for isolate curves. The choice of trivariate models poses special challenges, which one can more effectively evaluate by inspecting graphical plots.
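As a minimal example of the simplest candidate in such comparisons, the classic power model S = cA^z can be fitted on log-log axes by least squares (the area and species-count data are invented for illustration):

```python
import numpy as np

A = np.array([1.0, 4.0, 16.0, 64.0, 256.0, 1024.0])   # island areas (invented)
S = np.array([5.0, 8.0, 14.0, 22.0, 37.0, 60.0])      # species counts (invented)

z, log_c = np.polyfit(np.log(A), np.log(S), 1)        # least squares on log-log axes
c = np.exp(log_c)
S_pred = c * A**z                                     # back-transformed power curve
```

Note that fitting in log space changes the error assumption (multiplicative rather than additive), one of the model-choice issues the review discusses.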

14.
1. Using data on breeding birds from a 35-year study of Florida scrub-jays Aphelocoma coerulescens (Bosc 1795), we show that survival probabilities are structured by age, birth cohort, and maternal family, but not by sex. Using both accelerated failure time (AFT) and Cox proportional hazards models, the data are best described by models incorporating variation among birth cohorts and greater mortality hazard with increasing age. AFT models using Weibull distributions with the shape parameter > 1 were always the best-fitting models. 2. Shared frailty models allowing for family structure greatly reduce model deviance. The best-fitting models included a term for frailty shared by maternal families. 3. To ask how long a data set must be to reach qualitatively the same conclusions, we repeated the analyses for all possible truncated data sets of two years in length or greater. The length of the data set affects the parameter estimates, but not the qualitative conclusions. In all but three of 337 truncated data sets, the best-fitting models pointed to the same conclusions as the full data set. Shared frailty models appear to be quite robust. 4. The data are not adequate for testing hypotheses as to whether variation in frailty is heritable. 5. Substantial structured heterogeneity for survival exists in this population. Such structured heterogeneity has been shown to have substantial effects in reducing demographic stochasticity.

15.
Errors-in-variables models in high-dimensional settings pose two challenges in application. First, the number of observed covariates is larger than the sample size, while only a small number of covariates are true predictors under an assumption of model sparsity. Second, the presence of measurement error can result in severely biased parameter estimates, and also affects the ability of penalized methods such as the lasso to recover the true sparsity pattern. A new estimation procedure called SIMulation-SELection-EXtrapolation (SIMSELEX) is proposed. This procedure makes double use of lasso methodology. First, the lasso is used to estimate sparse solutions in the simulation step, after which a group lasso is implemented to do variable selection. The SIMSELEX estimator is shown to perform well in variable selection, and has significantly lower estimation error than naive estimators that ignore measurement error. SIMSELEX can be applied in a variety of errors-in-variables settings, including linear models, generalized linear models, and Cox survival models. It is furthermore shown in the Supporting Information how SIMSELEX can be applied to spline-based regression models. A simulation study is conducted to compare the SIMSELEX estimators to existing methods in the linear and logistic model settings, and to evaluate performance compared to naive methods in the Cox and spline models. Finally, the method is used to analyze a microarray dataset that contains gene expression measurements of favorable histology Wilms tumors.

16.
Yin G  Ibrahim JG 《Biometrics》2005,61(1):208-216
For multivariate failure time data, we propose a new class of shared gamma frailty models by imposing the Box-Cox transformation on the hazard function, and the product of the baseline hazard and the frailty. This novel class of models allows for a very broad range of shapes and relationships between the hazard and baseline hazard functions. It includes the well-known Cox gamma frailty model and a new additive gamma frailty model as two special cases. Due to the nonnegative hazard constraint, this shared gamma frailty model is computationally challenging in the Bayesian paradigm. The joint priors are constructed through a conditional-marginal specification, in which the conditional distribution is univariate, and it absorbs the nonlinear parameter constraints. The marginal part of the prior specification is free of constraints. The prior distributions allow us to easily compute the full conditionals needed for Gibbs sampling, while incorporating the constraints. This class of shared gamma frailty models is illustrated with a real dataset.

17.
Larsen K 《Biometrics》2005,61(4):1049-1055
This article is motivated by the Women's Health and Aging Study, where information about physical functioning was recorded along with death information in a group of elderly women. The focus is on determining whether having difficulties in daily living tasks is accompanied by a higher mortality rate. To this end, a two-parameter logistic regression model is used for the modeling of binary questionnaire data assuming an underlying continuous latent variable, difficulty in daily living. The Cox model is used for the survival information, and the continuous latent variable is included as an explanatory variable along with other observed variables. Parameters are estimated by maximizing the likelihood for the joint distribution of the items and the time-to-event information. In addition to presenting a new statistical model, this article also illustrates the use of the model in a real data setting and addresses the more practical issues of model building, diagnostics, and parameter interpretation.

18.
Studies of latent traits often collect data for multiple items measuring different aspects of the trait. For such data, it is common to consider models in which the different items are manifestations of a normal latent variable, which depends on covariates through a linear regression model. This article proposes a flexible Bayesian alternative in which the unknown latent variable density can change dynamically in location and shape across levels of a predictor. Scale mixtures of underlying normals are used in order to model flexibly the measurement errors and allow mixed categorical and continuous scales. A dynamic mixture of Dirichlet processes is used to characterize the latent response distributions. Posterior computation proceeds via a Markov chain Monte Carlo algorithm, with predictive densities used as a basis for inferences and evaluation of model fit. The methods are illustrated using data from a study of DNA damage in response to oxidative stress.

19.
Distance-based approaches in phylogenetics such as Neighbor-Joining are a fast and popular approach for building trees. These methods take pairs of sequences, and from them construct a value that, in expectation, is additive under a stochastic model of site substitution. Most models assume a distribution of rates across sites, often based on a gamma distribution. Provided the (shape) parameter of this distribution is known, the method can correctly reconstruct the tree. However, if the shape parameter is not known then we show that topologically different trees, with different shape parameters and associated positive branch lengths, can lead to exactly matching distributions on pairwise site patterns between all pairs of taxa. Thus, one could not distinguish between the two trees using pairs of sequences without some prior knowledge of the shape parameter. More surprisingly, this can happen for any choice of distinct shape parameters on the two trees, and thus the result is not peculiar to a particular or contrived selection of the shape parameters. On a positive note, we point out known conditions where identifiability can be restored (namely, when the branch lengths are clocklike, or if methods such as maximum likelihood are used).
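A standard illustration of how the gamma shape parameter enters pairwise distances is the Jukes-Cantor distance with gamma-distributed rates, d = (3α/4)[(1 − 4p/3)^(−1/α) − 1], where p is the observed proportion of differing sites (the paper's setting is more general, but the same dependence on α applies): the same p yields different distances under different shape parameters, so branch lengths estimated under the wrong α are distorted.

```python
import math

def jc_gamma_distance(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates across sites (shape alpha)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)

p = 0.30                                   # observed proportion of differing sites
d_low = jc_gamma_distance(p, alpha=0.5)    # strong rate heterogeneity
d_high = jc_gamma_distance(p, alpha=50.0)  # nearly homogeneous rates
# As alpha grows, the distance approaches the classic -(3/4) * ln(1 - 4p/3).
```

For p = 0.3 the corrected distance is roughly 0.67 under α = 0.5 but only about 0.38 under near-homogeneous rates, which is why prior knowledge of α matters for distance-based reconstruction.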

20.
Ecological niche models and species distribution models are used in many fields of science. Despite their popularity, important aspects of the modeling process, such as model selection, have only recently been developed. Choosing the environmental variables with which to create these models is another critical part of the process, but the methods currently in use are not consistent in their results, and no comprehensive approach exists for performing this step. Here, we compared seven heuristic methods of variable selection against a novel approach that selects the best sets of variables by evaluating the performance of models created with all combinations of variables and distinct parameter settings of the algorithm in concert. Except for the jackknife method for one of the 12 species and the fluctuation index for two of the 12 species, none of the heuristic methods for variable selection coincided with the exhaustive one. Performance decreased in models created using variables selected with the heuristic methods, and both underfitting and overfitting were detected when comparing their geographic projections with those of models created with variables selected by the exhaustive method. Because the exhaustive approach can be time consuming, a two-step exercise may be necessary. However, this method identifies adequate variable sets and parameter settings in concert that are associated with increased model performance.
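The exhaustive strategy (evaluate models built from every combination of variables and keep the best-scoring set) can be sketched as follows; the variable names, the toy linear model, and the AIC-style score are all invented for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
names = ["bio1", "bio5", "bio12", "noise1", "noise2"]   # hypothetical predictor names
n = 200
X = rng.normal(size=(n, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(0.0, 0.5, n)  # only bio1, bio12 matter

def aic(cols):
    """AIC-style score for an OLS model using the given variable subset."""
    Xs = np.column_stack([np.ones(n), X[:, list(cols)]])
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    rss = float(np.sum((y - Xs @ beta) ** 2))
    return n * np.log(rss / n) + 2.0 * (len(cols) + 1)

all_subsets = (c for r in range(1, 6) for c in itertools.combinations(range(5), r))
best = min(all_subsets, key=aic)
best_vars = [names[i] for i in best]
```

With k candidate variables this evaluates 2^k − 1 subsets, which is why the paper notes the exhaustive approach can be time consuming; in real applications each "fit" is a full niche-model run under multiple algorithm settings, not a least-squares solve.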


Copyright © Beijing Qinyun Technology Development Co., Ltd. | 京ICP备09084417号