Similar Articles
20 similar articles found (search time: 20 ms)
1.
M LeBlanc, J Crowley. Biometrics, 1992, 48(2):411-425
A method is developed for obtaining tree-structured relative risk estimates for censored survival data. The first step of a full likelihood estimation procedure is used in a recursive partitioning algorithm that adopts most aspects of the widely used Classification and Regression Tree (CART) algorithm of Breiman et al. (1984, Classification and Regression Trees, Belmont, California: Wadsworth). The performance of the technique is investigated through simulation and compared to the tree-structured survival methods proposed by Davis and Anderson (1989, Statistics in Medicine 8, 947-961) and Therneau, Grambsch, and Fleming (1990, Biometrika 77, 147-160).
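As a rough illustration of the splitting step such tree methods share, the sketch below searches one covariate for the cutpoint that maximizes the summed log-likelihood of an exponential survival model in each child node. It is a toy stand-in, not LeBlanc and Crowley's relative-risk procedure; the data and the constant-hazard model are invented for illustration.

```python
import math

def exp_loglik(times, events):
    """Log-likelihood of an exponential model at its MLE hazard.
    times: follow-up times; events: 1 = event observed, 0 = censored."""
    total_time = sum(times)
    n_events = sum(events)
    if n_events == 0 or total_time == 0:
        return 0.0
    lam = n_events / total_time  # MLE of the constant hazard
    return n_events * math.log(lam) - lam * total_time

def best_split(x, times, events):
    """Search one covariate for the cutpoint maximizing the summed
    child log-likelihoods (one CART-style partitioning step)."""
    best = (None, -math.inf)
    for cut in sorted(set(x))[:-1]:
        left = [i for i, v in enumerate(x) if v <= cut]
        right = [i for i, v in enumerate(x) if v > cut]
        ll = (exp_loglik([times[i] for i in left], [events[i] for i in left])
              + exp_loglik([times[i] for i in right], [events[i] for i in right]))
        if ll > best[1]:
            best = (cut, ll)
    return best

# Low covariate values -> long survival, high values -> short survival.
x = [1, 2, 3, 4, 10, 11, 12, 13]
times = [9.0, 8.0, 9.5, 7.5, 2.0, 1.5, 2.5, 1.0]
events = [1, 1, 0, 1, 1, 1, 1, 1]
cut, ll = best_split(x, times, events)
print(cut)  # 4 -- the split separating the two risk groups
```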

2.
Su X, Fan J. Biometrics, 2004, 60(1):93-99
A method of constructing trees for correlated failure times is put forward. It adopts the backfitting idea of classification and regression trees (CART) (Breiman et al., 1984, in Classification and Regression Trees). The tree method is developed based on the maximized likelihoods associated with the gamma frailty model and standard likelihood-related techniques are incorporated. The proposed method is assessed through simulations conducted under a variety of model configurations and illustrated using the chronic granulomatous disease (CGD) study data.

3.
The use of Generalized Linear Models (GLM) in vegetation analysis has been advocated to accommodate complex species response curves. This paper investigates the potential advantages of using classification and regression trees (CART), a recursive partitioning method that is free of distributional assumptions. We used multiple logistic regression (a form of GLM) and CART to predict the distribution of three major oak species in California. We compared two types of model: polynomial logistic regression models optimized to account for non‐linearity and factor interactions, and simple CART‐models. Each type of model was developed using learning data sets of 2085 and 410 sample cases, and assessed on test sets containing 2016 and 3691 cases respectively. The responses of the three species to environmental gradients were varied and often non‐homogeneous or context dependent. We tested the methods for predictive accuracy: CART‐models performed significantly better than our polynomial logistic regression models in four of the six cases considered, and as well in the two remaining cases. CART also showed a superior ability to detect factor interactions. Insight gained from CART‐models then helped develop improved parametric models. Although the probabilistic form of logistic regression results is more adapted to test theories about species responses to environmental gradients, we found that CART‐models are intuitive, easy to develop and interpret, and constitute a valuable tool for modeling species distributions.
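The CART side of such a comparison reduces, at each node, to an exhaustive search for the impurity-minimizing cutpoint along an environmental gradient. The sketch below shows a single Gini-impurity split on an invented elevation gradient for a hypothetical species; it is a minimal illustration, not the models fitted in this study.

```python
def gini(labels):
    """Gini impurity of a set of 0/1 presence labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_gini_split(x, y):
    """Cutpoint on one environmental gradient minimizing the
    size-weighted child impurity (one CART classification split)."""
    n = len(y)
    best = (None, float("inf"))
    for cut in sorted(set(x))[:-1]:
        left = [y[i] for i in range(n) if x[i] <= cut]
        right = [y[i] for i in range(n) if x[i] > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (cut, score)
    return best

# Toy gradient: the hypothetical species is present only above 600 m.
elev = [200, 300, 450, 500, 650, 700, 800, 900]
present = [0, 0, 0, 0, 1, 1, 1, 1]
cut, impurity = best_gini_split(elev, present)
print(cut, impurity)  # 500 0.0 -- a perfect split on this toy data
```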

4.
Accurately assessing a patient’s risk of a given event is essential in making informed treatment decisions. One approach is to stratify patients into two or more distinct risk groups with respect to a specific outcome using both clinical and demographic variables. Outcomes may be categorical or continuous in nature; important examples in cancer studies might include level of toxicity or time to recurrence. Recursive partitioning methods are ideal for building such risk groups. Two such methods are Classification and Regression Trees (CART) and a more recent competitor known as the partitioning Deletion/Substitution/Addition (partDSA) algorithm, both of which also utilize loss functions (e.g., squared error for a continuous outcome) as the basis for building, selecting, and assessing predictors but differ in the manner by which regression trees are constructed. Recently, we have shown that partDSA often outperforms CART in so‐called “full data” settings (e.g., uncensored outcomes). However, when confronted with censored outcome data, the loss functions used by both procedures must be modified. There have been several attempts to adapt CART for right‐censored data. This article describes two such extensions for partDSA that make use of observed data loss functions constructed using inverse probability of censoring weights. Such loss functions are consistent estimates of their uncensored counterparts provided that the corresponding censoring model is correctly specified. The relative performance of these new methods is evaluated via simulation studies and illustrated through an analysis of clinical trial data on brain cancer patients. The implementation of partDSA for uncensored and right‐censored outcomes is publicly available in the R package partDSA.
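The inverse-probability-of-censoring-weighted (IPCW) loss idea can be sketched as follows: estimate the censoring survivor function G with a Kaplan-Meier curve on the censoring events, then weight each uncensored squared error by 1/G(T) while censored observations contribute nothing. This is a minimal illustration with invented data, not the partDSA implementation.

```python
def km_censoring_survival(times, events):
    """Kaplan-Meier estimate of the censoring survivor function G(t).
    A 'censoring event' here is events == 0."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk, surv, curve = len(times), 1.0, []
    for i in order:
        if events[i] == 0:  # censoring observed at this time
            surv *= (at_risk - 1) / at_risk
        curve.append((times[i], surv))
        at_risk -= 1
    return curve

def G(curve, t):
    """Left-continuous lookup: G just before time t."""
    g = 1.0
    for time, s in curve:
        if time < t:
            g = s
        else:
            break
    return g

def ipcw_squared_error(times, events, predictions):
    """Observed-data squared-error loss: uncensored subjects are
    weighted by 1/G(T); censored subjects contribute nothing."""
    curve = km_censoring_survival(times, events)
    total = 0.0
    for t, d, pred in zip(times, events, predictions):
        if d == 1:
            total += (t - pred) ** 2 / G(curve, t)
    return total / len(times)

times = [2.0, 3.0, 4.0, 5.0, 6.0]
events = [1, 0, 1, 1, 1]      # one censored observation at t = 3
preds = [4.0] * 5             # a constant predictor of survival time
loss = ipcw_squared_error(times, events, preds)
print(round(loss, 3))  # 2.133
```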

5.
6.
7.
The application of remote sensing (RS) and geographical information system (GIS) techniques in the natural sciences has increased, and their use in vegetation studies is all but inevitable given the problems inherent in traditional methods (e.g. sampling, calculation, and analysis). Scientists therefore need sufficient information about the efficiency of these techniques as tools in their studies. This study evaluates the efficiency of IRS-P6 LISS III and Landsat ETM+ imagery in identifying plant groups. For this purpose, 143 training samples were collected from areas showing a homogeneous composition of plant species over an area of at least 3600 m2 (60 × 60 m). The coordinates of these training samples were recorded using a GPS device and transferred to a GIS database, and the ENVI 4.2 package was used to process and analyze the satellite data. Several processing methods, including spectral separability analysis, supervised classification, and classification accuracy assessment, were applied to evaluate the efficiency of the data. The results indicated that alfalfa farming and the Juniperus polycarpus–Artemisia kopetdaghensis community had the highest separability on the satellite images (1.99 for Landsat and 2 for IRS). In contrast, the lowest separabilities were between the Ju. polycarpus–Onobrychis cornuta and Ju. polycarpus–Ar. kopetdaghensis communities (1.57) on the Landsat data and between the Ju. polycarpus–Ar. kopetdaghensis and Ju. polycarpus–Agropyron intermedium communities (1.53) on the IRS data. These results suggest that the satellite data are somewhat able to identify plant groups when vegetation communities are sufficiently homogeneous, abundant, and spectrally and ecologically separable.

8.
We evaluated the predictive power of two classification techniques, one parametric – discriminant function analysis (DFA) and the other non-parametric – classification and regression tree analysis (CART), in order to provide a non-subjective quantitative method of determining age class in Vancouver Island marmots (Marmota vancouverensis) and hoary marmots (Marmota caligata). For both techniques we used morphological measurements of known-age male and female marmots from two independent population studies to build and test predictive models of age class. Both techniques had high predictive power (69–86%) for both sexes and both species. Overall, the two methods performed identically with 81% correct classification. DFA was marginally better at discriminating among older more challenging age classes compared to CART. However, in our test samples, cases with missing values in any of the discriminant variables were deleted and hence unclassified by DFA, whereas CART used values from closely correlated variables to substitute for the missing values. Therefore, overall, CART performed better (CART 81% vs. DFA 76%) because of its ability to classify incomplete cases. Correct classification rates were approximately 10% higher for hoary marmots than for Vancouver Island marmots, a result that could be attributed to different sets of morphological measurements. Zygomatic arch breadth measured in hoary marmots was the most important predictor of age class in both sexes using both classification techniques. We recommend that CART analysis be performed on data-sets with incomplete records and used as a variable screening tool prior to DFA on more complete data-sets.

9.
Incomplete data are a serious problem in the multivariate analysis of clinical trials. Usually a complete-case analysis is performed: All incomplete observation vectors are excluded from the analysis. Provided that observations are missing randomly, an easy-to-handle available-case analysis is introduced, allowing the analysis of all data without insertion or deletion of observations. This method is applied to parametric and nonparametric test procedures of the O'Brien type, which are more powerful than the conventional Hotelling's T² for detecting alternatives where the (treatment) effect has the same direction for all observed variables. In addition, the applicability of these so-called directional tests, especially in the case of small samples, and their pros and cons are discussed.
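A minimal sketch of an available-case, O'Brien-type directional score: each variable is standardized over the subjects that observed it, each subject is scored by the mean of their observed standardized values (so incomplete records are kept rather than deleted), and the scores are compared between groups with a two-sample t statistic. The data and the pooled-variance t test are illustrative simplifications, not the exact procedures of the paper.

```python
from statistics import mean, stdev

def obrien_scores(data):
    """Available-case O'Brien scores: standardize each variable on the
    subjects that observed it (None = missing), then average each
    subject's standardized values over the observed variables."""
    n_vars = len(data[0])
    cols = []
    for j in range(n_vars):
        vals = [row[j] for row in data if row[j] is not None]
        cols.append((mean(vals), stdev(vals)))
    scores = []
    for row in data:
        z = [(row[j] - cols[j][0]) / cols[j][1]
             for j in range(n_vars) if row[j] is not None]
        scores.append(mean(z))
    return scores

def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

# Treatment shifts both variables upward; two records are incomplete
# but still contribute through their observed variable.
treat = [[5.1, 7.2], [5.8, None], [6.0, 7.9], [5.5, 7.5]]
ctrl = [[4.0, 6.1], [4.4, 6.0], [3.9, None], [4.2, 6.3]]
scores = obrien_scores(treat + ctrl)
t = two_sample_t(scores[:4], scores[4:])
print(t > 0)  # True -- effect in the same direction on both variables
```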

10.
Nonparametric approaches including a classification and regression tree (CART), a nonparametric changepoint analysis (nCPA) and a Bayesian hierarchical modeling (BHM) method were developed to determine ecoregional nutrient response thresholds. A CART analysis revealed that hierarchical structure was important for predicting Chl a concentrations from total nitrogen (TN) and total phosphorus (TP). The nCPA and BHM methods confirmed the CART results for each node in the tree, and the 90% confidence interval for each threshold was calculated to quantify uncertainty. The CART, nCPA, and BHM methods suggested that the nutrient criteria differed significantly within certain nutrient ecoregions and that numerical nutrient criteria of 0.0150–0.222 mg/L TP and 0.300–1.766 mg/L TN may control Chl a concentrations in the various lake ecoregions. The results of this analysis suggest that the integration of CART, nCPA and BHM might be useful for determining nutrient thresholds.
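A one-changepoint version of the nCPA step can be sketched as a search for the nutrient cutpoint that minimizes the pooled within-group sum of squares of the response; the TP and chlorophyll-a values below are invented for illustration.

```python
def ncpa_threshold(x, y):
    """Nonparametric changepoint: the cutpoint on the nutrient axis that
    minimizes the pooled within-group sum of squares of the response."""
    def ss(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between tied x values
        left = [p[1] for p in pairs[:k]]
        right = [p[1] for p in pairs[k:]]
        total = ss(left) + ss(right)
        if total < best[1]:
            best = (pairs[k - 1][0], total)
    return best

# Toy TP (mg/L) vs chlorophyll-a: the response jumps above 0.05 mg/L TP.
tp = [0.01, 0.02, 0.03, 0.04, 0.05, 0.08, 0.10, 0.15, 0.20, 0.30]
chla = [2.0, 2.5, 2.2, 2.8, 2.4, 9.0, 10.5, 9.8, 11.0, 10.2]
cut, wss = ncpa_threshold(tp, chla)
print(cut)  # 0.05 -- the detected nutrient threshold
```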

11.
Ideally, randomized trials would be used to compare the long-term effectiveness of dynamic treatment regimes on clinically relevant outcomes. However, because randomized trials are not always feasible or timely, we often must rely on observational data to compare dynamic treatment regimes. An example of a dynamic treatment regime is “start combined antiretroviral therapy (cART) within 6 months of CD4 cell count first dropping below x cells/mm3 or diagnosis of an AIDS-defining illness, whichever happens first” where x can take values between 200 and 500. Recently, Cain et al. (Ann. Intern. Med. 154(8):509–515, 2011) used inverse probability (IP) weighting of dynamic marginal structural models to find the x that minimizes 5-year mortality risk under similar dynamic regimes using observational data. Unlike standard methods, IP weighting can appropriately adjust for measured time-varying confounders (e.g., CD4 cell count, viral load) that are affected by prior treatment. Here we describe an alternative method to IP weighting for comparing the effectiveness of dynamic cART regimes: the parametric g-formula. The parametric g-formula naturally handles dynamic regimes and, like IP weighting, can appropriately adjust for measured time-varying confounders. However, estimators based on the parametric g-formula are more efficient than IP weighted estimators. This is often at the expense of more parametric assumptions. Here we describe how to use the parametric g-formula to estimate risk by the end of a user-specified follow-up period under dynamic treatment regimes. We describe an application of this method to answer the “when to start” question using data from the HIV-CAUSAL Collaboration.
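A toy Monte Carlo version of the parametric g-formula for a two-period dynamic regime might look like the following. Every model here (baseline CD4, CD4 progression under and off treatment, outcome risk) is made up purely for illustration; in a real analysis each model would be fit to the observational data, and follow-up would span many periods.

```python
import math
import random

def simulate_risk(threshold, n=20000, seed=1):
    """Monte Carlo g-formula under the dynamic rule 'start cART when
    CD4 first drops below `threshold`'. All coefficients are invented."""
    rng = random.Random(seed)
    events = 0
    for _ in range(n):
        cd4 = rng.gauss(400, 100)          # baseline CD4 (assumed model)
        on_tx = cd4 < threshold            # regime evaluated at time 0
        # CD4 drifts down untreated, recovers on treatment (assumed model)
        cd4 += rng.gauss(60, 40) if on_tx else rng.gauss(-80, 40)
        on_tx = on_tx or cd4 < threshold   # regime re-evaluated at time 1
        # Outcome model: risk falls with CD4 and with treatment (assumed)
        risk = 1 / (1 + math.exp(0.01 * (cd4 - 150) + (1.5 if on_tx else 0.0)))
        if rng.random() < risk:
            events += 1
    return events / n

r200 = simulate_risk(200)  # start cART only below 200 cells/mm3
r500 = simulate_risk(500)  # start cART below 500 cells/mm3
print(r500 < r200)  # earlier initiation yields lower risk in this toy model
```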

12.
The traditional q1* methodology for constructing upper confidence limits (UCLs) for the low-dose slopes of quantal dose-response functions has two limitations: (i) it is based on an asymptotic statistical result that has been shown via Monte Carlo simulation not to hold in practice for small, real bioassay experiments (Portier and Hoel, 1983); and (ii) it assumes that the multistage model (which represents cumulative hazard as a polynomial function of dose) is correct. This paper presents an uncertainty analysis approach for fitting dose-response functions to data that does not require specific parametric assumptions or depend on asymptotic results. It has the advantage that the resulting estimates of the dose-response function (and uncertainties about it) no longer depend on the validity of an assumed parametric family nor on the accuracy of the asymptotic approximation. The method derives posterior densities for the true response rates in the dose groups, rather than deriving posterior densities for model parameters, as in other Bayesian approaches (Sielken, 1991), or resampling the observed data points, as in the bootstrap and other resampling methods. It does so by conditioning constrained maximum-entropy priors on the observed data. Monte Carlo sampling of the posterior (constrained, conditioned) probability distributions generates values of response probabilities that might be observed if the experiment were repeated with very large sample sizes. A dose-response curve is fit to each such simulated dataset. If no parametric model has been specified, then a generalized representation (e.g., a power-series or orthonormal polynomial expansion) of the unknown dose-response function is fit to each simulated dataset using “model-free” methods. The simulation-based frequency distribution of all the dose-response curves fit to the simulated datasets yields a posterior distribution function for the low-dose slope of the dose-response curve.
An upper confidence limit on the low-dose slope is obtained directly from this posterior distribution. This “Data Cube” procedure is illustrated with a real dataset for benzene, and is seen to produce more policy-relevant insights than does the traditional q1* methodology. For example, it shows how far apart the 90%, 95%, and 99% limits are and reveals how uncertainty about total and incremental risk varies with dose level (typically being dominated at low doses by uncertainty about the response of the control group, and being dominated at high doses by sampling variability). Strengths and limitations of the Data Cube approach are summarized, and potential decision-analytic applications to making better informed risk management decisions are briefly discussed.
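The posterior-simulation idea can be sketched with a simpler stand-in: conjugate Beta posteriors for the group response rates replace the constrained maximum-entropy conditioning, and a least-squares line through (dose, p) replaces the model-free expansions; the UCL is then read off the simulated slope distribution. The bioassay counts below are invented, and this is not the Data Cube procedure itself.

```python
import random

def sample_slopes(doses, successes, trials, n_draws=2000, seed=0):
    """Draw response probabilities for each dose group from
    Beta(1 + x, 1 + n - x) posteriors, fit a least-squares line through
    (dose, p) for each draw, and collect the slopes."""
    rng = random.Random(seed)
    d_mean = sum(doses) / len(doses)
    den = sum((d - d_mean) ** 2 for d in doses)
    slopes = []
    for _ in range(n_draws):
        ps = [rng.betavariate(1 + x, 1 + n - x)
              for x, n in zip(successes, trials)]
        p_mean = sum(ps) / len(ps)
        num = sum((d - d_mean) * (p - p_mean) for d, p in zip(doses, ps))
        slopes.append(num / den)
    return sorted(slopes)

# Toy bioassay: tumour counts rise with dose.
doses = [0.0, 1.0, 2.0, 4.0]
tumours = [1, 3, 6, 14]
animals = [50, 50, 50, 50]
slopes = sample_slopes(doses, tumours, animals)
ucl95 = slopes[int(0.95 * len(slopes))]   # 95% UCL on the slope
median = slopes[len(slopes) // 2]
print(ucl95 > median > 0)
```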

13.
The dynamics of the "Etang de Berre", a brackish lagoon close to the French Mediterranean coast, are strongly disturbed by freshwater inputs from a hydroelectric power station. The system dynamics have been described as a sequence of typical daily states from a set of physicochemical variables, such as temperature, salinity and dissolved oxygen, collected over three years by an automatic sampling station. Each daily pattern summarizes the evolution, hour by hour, of the physicochemical variables. This article presents results of forecasts of the states of the system subjected to the simultaneous effects of meteorological conditions and freshwater releases. We recall the main steps of the classification tree method used to build the predictive model (Classification and Regression Trees, Breiman et al., 1984) and propose a transfer procedure to test the stability of the model. Results obtained on the Etang de Berre data set allow us to describe and predict the effects of the environmental variables on the system dynamics within a margin of error. The transfer procedure applied after the tree-building process gives a maximum gain in prediction accuracy of about 15%.

14.
Sequence comparison with concave weighting functions
We consider efficient methods for computing a difference metric between two sequences of symbols, where the cost of an operation to insert or delete a block of symbols is a concave function of the block's length. Alternatively, sequences can be optimally aligned when gap penalties are a concave function of the gap length. Two algorithms based on the ‘candidate list paradigm’ first used by Waterman (1984) are presented. The first computes significantly more parsimonious candidate lists than Waterman's method. The second refines the first to the point of guaranteeing O(N² lg N) worst-case time complexity, and under certain conditions O(N²). Experimental data show how various properties of the comparison problem affect the methods' relative performance. A number of extensions are discussed, among them a technique for constructing optimal alignments in O(N) space in expectation. This variation gives a practical method for comparing long amino acid sequences on a small computer. This work was supported in part by NSF Grant DCR-8511455.
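Before any candidate-list machinery, the underlying recurrence can be stated as a naive dynamic program that tries every block length at every cell, essentially Waterman's original formulation; it runs in roughly cubic time rather than the O(N² lg N) of the paper's algorithms, so it serves only as a correctness baseline. The concave gap function below is an arbitrary example, not one from the paper.

```python
def align_cost(a, b, mismatch=1.0, gap=lambda k: 2.0 + 0.5 * (k ** 0.5)):
    """Minimum edit cost between a and b when deleting or inserting a
    block of k symbols costs gap(k), a concave function of k.
    Naive O(n*m*(n+m)) dynamic program (Waterman-style recurrence)."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i and j:  # substitute or match one symbol
                sub = D[i - 1][j - 1] + (0.0 if a[i - 1] == b[j - 1] else mismatch)
                D[i][j] = min(D[i][j], sub)
            for k in range(1, i + 1):   # delete a block of k from a
                D[i][j] = min(D[i][j], D[i - k][j] + gap(k))
            for k in range(1, j + 1):   # insert a block of k into b
                D[i][j] = min(D[i][j], D[i][j - k] + gap(k))
    return D[n][m]

# With a concave penalty, one 4-symbol block gap is cheaper than
# several shorter gaps: gap(4) = 2.0 + 0.5 * sqrt(4) = 3.0.
cost = align_cost("ACGTACGT", "ACGT")
print(cost)  # 3.0
```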

15.
16.

Background

Classification and regression tree (CART) models are tree-based exploratory data analysis methods which have been shown to be very useful in identifying and estimating complex hierarchical relationships in ecological and medical contexts. In this paper, a Bayesian CART model is described and applied to the problem of modelling the cryptosporidiosis infection in Queensland, Australia.

Methodology/Principal Findings

We compared the results of a Bayesian CART model with those obtained using a Bayesian spatial conditional autoregressive (CAR) model. Overall, the analyses indicated that the nature and magnitude of the effect estimates were similar for the two methods in this study, but the CART model more easily accommodated higher order interaction effects.

Conclusions/Significance

A Bayesian CART model for identification and estimation of the spatial distribution of disease risk is useful in monitoring and assessing infectious disease prevention and control.

17.
18.
The results of four macrophyte assessment methods (French Indice Biologique Macrophytique en Rivière, German Reference Index, British Mean Trophic Rank and Dutch Macrophyte Score) were compared, based on plant survey data of medium-sized lowland streams in Central Europe. To intercalibrate the good quality class boundaries, two alternative methods were applied: direct comparison and the use of “common metrics”. While the French and British methods were highly related (R² > 0.75), the German RI showed less (0.20 < R² < 0.55) and the Dutch DMS the least correlation (R² < 0.10) with the other methods. Of 70 macrophyte metrics tested, only Ellenberg_N was considerably related to three of the national assessment methods, thus representing a potential common metric for intercalibration. Comparison of quality class boundaries via regression analysis using both intercalibration approaches revealed major differences between the classifications of the French, German and British methods, which are, in addition, related in a nonlinear way.

19.
Estimation of evolutionary distances between nucleotide sequences
A formal mathematical analysis of the substitution process in nucleotide sequence evolution was carried out in terms of a Markov process. Using matrix algebra theory, the theoretical foundation of Barry and Hartigan's (Stat. Sci. 2:191–210, 1987) and Lanave et al.'s (J. Mol. Evol. 20:86–93, 1984) methods is provided. Extensive computer simulation was used to compare the accuracy and effectiveness of various methods for estimating the evolutionary distance between two nucleotide sequences. It was shown that the multiparameter methods of Lanave et al. (J. Mol. Evol. 20:86–93, 1984), Gojobori et al. (J. Mol. Evol. 18:414–422, 1982), and Barry and Hartigan (Stat. Sci. 2:191–210, 1987) are preferable to others for the purpose of phylogenetic analysis when the sequences are long. However, when sequences are short and the evolutionary distance is large, Tajima and Nei's (Mol. Biol. Evol. 1:269–285, 1984) method is superior to others.
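For contrast with the multiparameter methods discussed above, the simplest one-parameter estimator (the Jukes-Cantor correction) can be written in a few lines; it assumes equal base frequencies and a single substitution rate, which is exactly what the multiparameter methods relax. The sequences below are invented.

```python
import math

def jukes_cantor(seq1, seq2):
    """One-parameter Jukes-Cantor estimate of the evolutionary distance
    (expected substitutions per site) between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    diffs = sum(1 for a, b in zip(seq1, seq2) if a != b)
    p = diffs / len(seq1)  # observed proportion of differing sites
    if p >= 0.75:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)

s1 = "ACGTACGTACGTACGTACGT"
s2 = "ACGTACGAACGTACTTACGT"   # 2 differences over 20 sites, p = 0.1
d = jukes_cantor(s1, s2)
print(round(d, 4))  # 0.1073 -- slightly above p, correcting for multiple hits
```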

20.
Publication bias is a major concern in conducting systematic reviews and meta-analyses. Various sensitivity analysis or bias-correction methods have been developed based on selection models, and they have some advantages over the widely used trim-and-fill bias-correction method. However, likelihood methods based on selection models may have difficulty in obtaining precise estimates and reasonable confidence intervals, or require a rather complicated sensitivity analysis process. Herein, we develop a simple publication bias adjustment method by utilizing the information on conducted but still unpublished trials from clinical trial registries. We introduce an estimating equation for parameter estimation in the selection function by regarding the publication bias issue as a missing data problem under the missing not at random assumption. With the estimated selection function, we introduce the inverse probability weighting (IPW) method to estimate the overall mean across studies. Furthermore, the IPW versions of heterogeneity measures such as the between-study variance and the I² measure are proposed. We propose methods to construct confidence intervals based on asymptotic normal approximation as well as on parametric bootstrap. Through numerical experiments, we observed that the estimators successfully eliminated bias, and the confidence intervals had empirical coverage probabilities close to the nominal level. On the other hand, the confidence interval based on asymptotic normal approximation is much wider in some scenarios than the bootstrap confidence interval. Therefore, the latter is recommended for practical use.
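Once a selection function has been estimated, the IPW step itself is simple: weight each published effect by the inverse of its estimated publication probability. The effects and probabilities below are invented; estimating the selection function from registry data is the hard part the paper addresses, and is not shown here.

```python
def ipw_overall_mean(effects, pub_probs):
    """IPW estimate of the overall mean effect: each published study is
    weighted by the inverse of its estimated publication probability."""
    weights = [1.0 / p for p in pub_probs]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)

# Published effects; the small/negative studies were less likely to be
# published, so they carry larger weights and pull the naive mean down.
effects = [0.45, 0.40, 0.30, 0.10, -0.05]
pub_probs = [0.95, 0.90, 0.80, 0.40, 0.20]  # invented selection estimates
naive = sum(effects) / len(effects)
adjusted = ipw_overall_mean(effects, pub_probs)
print(round(naive, 3), round(adjusted, 3))  # 0.24 0.118
```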


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)