Similar articles
20 similar articles found.
1.
Hanson T, Yang M. Biometrics, 2007, 63(1): 88-95
Methodology for implementing the proportional odds regression model for survival data assuming a mixture of finite Polya trees (MPT) prior on baseline survival is presented. Extensions to frailties and generalized odds rates are discussed. Although all manner of censoring and truncation can be accommodated, we discuss model implementation, regression diagnostics, and model comparison for right-censored data. An advantage of the MPT model is the relative ease with which predictive densities, survival, and hazard curves are generated. Much discussion is devoted to practical implementation of the proposed models, and a novel MCMC algorithm based on an approximating parametric normal model is developed. A modest simulation study comparing the small sample behavior of the MPT model to a rank-based estimator and a real data example are presented.

2.
Spatially Dependent Polya Tree Modeling for Survival Data   (Cited by: 1; self-citations: 0; citations by others: 1)
Summary: With the proliferation of spatially oriented time-to-event data, spatial modeling in the survival context has received increased recent attention. A traditional way to capture a spatial pattern is to introduce frailty terms in the linear predictor of a semiparametric model, such as proportional hazards or accelerated failure time. We propose a new methodology to capture the spatial pattern by assuming a prior based on a mixture of spatially dependent Polya trees for the baseline survival in the proportional hazards model. Thanks to modern Markov chain Monte Carlo (MCMC) methods, this approach remains computationally feasible in a fully hierarchical Bayesian framework. We compare the spatially dependent mixture of Polya trees (MPT) approach to the traditional spatial frailty approach, and illustrate the usefulness of this method with an analysis of Iowan breast cancer survival data from the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute. Our method provides better goodness of fit than the traditional alternatives as measured by log pseudo marginal likelihood (LPML), the deviance information criterion (DIC), and full sample score (FSS) statistics.

3.
Tree-based methods are popular nonparametric tools in studying time-to-event outcomes. In this article, we introduce a novel framework for survival trees and ensembles, where the trees partition the dynamic survivor population and can handle time-dependent covariates. Using the idea of randomized tests, we develop generalized time-dependent receiver operating characteristic (ROC) curves for evaluating the performance of survival trees. The tree-building algorithm is guided by decision-theoretic criteria based on ROC, specifically targeting prediction accuracy. To address the instability issue of a single tree, we propose a novel ensemble procedure based on averaging martingale estimating equations, which is different from existing methods that average the predicted survival or cumulative hazard functions from individual trees. Extensive simulation studies are conducted to examine the performance of the proposed methods. We apply the methods to a study on AIDS for illustration.

4.
The ROC (receiver operating characteristic) curve is the most commonly used statistical tool for describing the discriminatory accuracy of a diagnostic test. Classical estimation of the ROC curve relies on data from a simple random sample from the target population. In practice, estimation is often complicated because not all subjects undergo a definitive assessment of disease status (verification). Estimation of the ROC curve based on data only from subjects with verified disease status may be badly biased. In this work we investigate the properties of the doubly robust (DR) method for estimating the ROC curve under verification bias originally developed by Rotnitzky, Faraggi and Schisterman (2006) for estimating the area under the ROC curve. The DR method can be applied to continuous-scale tests and allows for a non-ignorable process of selection to verification. We develop the estimator's asymptotic distribution and examine its finite sample properties via a simulation study. We exemplify the DR procedure for estimation of ROC curves with data collected on patients undergoing electron beam computed tomography, a diagnostic test for calcification of the arteries.

5.
Summary: In medical research, the receiver operating characteristic (ROC) curves can be used to evaluate the performance of biomarkers for diagnosing diseases or predicting the risk of developing a disease in the future. The area under the ROC curve (ROC AUC), as a summary measure of ROC curves, is widely utilized, especially when comparing multiple ROC curves. In observational studies, the estimation of the AUC is often complicated by the presence of missing biomarker values, which means that the existing estimators of the AUC are potentially biased. In this article, we develop robust statistical methods for estimating the ROC AUC and the proposed methods use information from auxiliary variables that are potentially predictive of the missingness of the biomarkers or the missing biomarker values. We are particularly interested in auxiliary variables that are predictive of the missing biomarker values. In the case of missing at random (MAR), that is, missingness of biomarker values only depends on the observed data, our estimators have the attractive feature of being consistent if one correctly specifies, conditional on auxiliary variables and disease status, either the model for the probabilities of being missing or the model for the biomarker values. In the case of missing not at random (MNAR), that is, missingness may depend on the unobserved biomarker values, we propose a sensitivity analysis to assess the impact of MNAR on the estimation of the ROC AUC. The asymptotic properties of the proposed estimators are studied and their finite-sample behaviors are evaluated in simulation studies. The methods are further illustrated using data from a study of maternal depression during pregnancy.

6.
We have isolated four members of the Arabidopsis cyclophilin (CyP) gene family, designated ROC1 to ROC4 (rotamase CyP). Deduced peptides of ROC1, 2 and 3 are 75% to 91% identical to Brassica napus cytosolic CyP, contain no leader peptides and include a conserved seven amino-acid insertion relative to mammalian cytosolic CyPs. Two other Arabidopsis CyPs, ROC5 (43H1; ATCYP1) and ROC6 (ATCYP2), share these features. ROC1, ROC2, ROC3 and ROC5 are expressed in all tested organs of light-grown plants. ROC2 and ROC5 show elevated expression in flowers. Expression of ROC1, ROC2, and ROC3 decreases in darkness and these genes also exhibit small elevations in expression upon wounding. The five Arabidopsis genes encoding putative cytosolic CyPs (ROC1, 2, 3, 5 and 6) contain no introns. In contrast, ROC4, which encodes a chloroplast stromal CyP, is interrupted by six introns. ROC4 is not expressed in roots, and is strongly induced by light. Phylogenetic trees of all known CyPs and CyP-related proteins provide evidence of possible horizontal transfer of CyP genes between prokaryotes and eukaryotes and of a possible polyphyletic origin of these proteins within eukaryotes. These trees also show significant grouping of eukaryotic CyPs on the basis of subcellular localization and structure. Mitochondrial CyPs are closely related to cytosolic CyPs of the source organism, but endoplasmic reticulum CyPs form separate clades. Known plant CyPs fall into three clades, one including the majority of higher-plant cytosolic CyPs, one including only ROC2 and a related rice CyP, and one including only chloroplast CyPs.

7.
Many medical diagnostic studies involve three ordinal diagnostic groups in which the diagnostic accuracy can be summarized by the volume or partial volume under a Receiver Operating Characteristic (ROC) surface. We study in this paper the statistical comparison of diagnostic accuracy from multiple diagnostic tests when three ordinal diagnostic groups are involved. Under the assumption that the multiple diagnostic tests follow a multivariate normal distribution within each diagnostic group, we provide the asymptotic variance and covariance for the maximum likelihood estimates of the volumes under the ROC surfaces from multiple diagnostic tests and propose statistical tests of whether the diagnostic accuracy as measured by the volume under the ROC surface is the same for multiple diagnostic tests. We also propose a confidence interval estimate for the difference of two volumes under two ROC surfaces. Our approach depends crucially on the assumptions of normal distributions on diagnostic tests, and might not be robust when such assumptions are violated. Finally, we apply our proposed methodology to a real data set of 118 subjects to compare the diagnostic accuracy of early stage Alzheimer's disease (AD) from multiple neuropsychological tests.
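
To make the "volume under the ROC surface" concrete, here is a minimal sketch of a nonparametric estimate for three ordered groups: the fraction of triples, one observation per group, that fall in the correct order. This is a swapped-in empirical estimator, not the maximum likelihood estimator under normality that the abstract describes, and the function name `empirical_vus` and the toy data are assumptions made for illustration.

```python
from itertools import product


def empirical_vus(group1, group2, group3):
    """Fraction of triples (x1, x2, x3), one value per group, with x1 < x2 < x3.

    The groups are assumed to be ordered so that higher test values
    indicate a more advanced diagnostic group. Ties count as incorrect.
    """
    total = len(group1) * len(group2) * len(group3)
    if total == 0:
        raise ValueError("each group needs at least one observation")
    correct = sum(
        1 for x1, x2, x3 in product(group1, group2, group3) if x1 < x2 < x3
    )
    return correct / total


# Hypothetical test values for three ordinal diagnostic groups.
perfect = empirical_vus([1, 2], [3, 4], [5, 6])   # every triple is ordered
mixed = empirical_vus([1, 4], [2, 3], [5, 0])     # only 2 of 8 triples are ordered
print(perfect, mixed)
```

A test that carries no information orders a random triple correctly with probability 1/6, so 1/6 plays the role for this measure that 1/2 plays for the two-group AUC.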

8.
Null models for generating binary phylogenetic trees are useful for testing evolutionary hypotheses and reconstructing phylogenies. We consider two such null models - the Yule and uniform models - and in particular the induced distribution they generate on the number C(n) of cherries in the tree, where a cherry is a pair of leaves each of which is adjacent to a common ancestor. By realizing the process of cherry formation in these two models by extended Polya urn models we show that C(n) is asymptotically normal. We also give exact formulas for the mean and standard deviation of C(n) in these two models. This allows simple statistical tests for the Yule and uniform null hypotheses.
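
The urn view of cherry formation can be sketched for the Yule model as follows. The sketch assumes the usual reading of Yule growth, in which a uniformly chosen leaf splits into two: if that leaf belongs to a cherry, the old cherry is replaced by a new one and the count is unchanged; otherwise a new cherry appears. The function names are hypothetical, and the recurrence is this sketch's own derivation rather than the paper's formulas.

```python
import random
from fractions import Fraction


def expected_cherries_yule(n):
    """Exact mean number of cherries in an n-leaf tree under the assumed Yule step."""
    if n < 2:
        raise ValueError("need at least two leaves")
    mean = Fraction(1)  # a 2-leaf tree is a single cherry
    for k in range(2, n):
        # With k leaves, the split leaf lies outside every cherry with
        # probability 1 - 2*C(k)/k, in which case one cherry is added.
        mean = mean + (1 - 2 * mean / k)
    return mean


def simulate_cherries_yule(n, rng):
    """One random draw of the cherry count C(n) using the same urn step."""
    if n < 2:
        raise ValueError("need at least two leaves")
    cherries = 1
    for k in range(2, n):
        if rng.random() >= 2 * cherries / k:  # chosen leaf is not in a cherry
            cherries += 1
    return cherries


rng = random.Random(1)
draws = [simulate_cherries_yule(30, rng) for _ in range(2000)]
print(expected_cherries_yule(30), sum(draws) / len(draws))
```

Under this recurrence the exact mean works out to n/3 for every n of at least 3, which the simulated average should approach.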

9.
10.
Yu T. PLoS ONE, 2012, 7(7): e40598
The receiver operating characteristic (ROC) curve is an important tool to gauge the performance of classifiers. In certain situations of high-throughput data analysis, the data is heavily class-skewed, i.e. most features tested belong to the true negative class. In such cases, only a small portion of the ROC curve is relevant in practical terms, rendering the ROC curve and its area under the curve (AUC) insufficient for the purpose of judging classifier performance. Here we define an ROC surface (ROCS) using true positive rate (TPR), false positive rate (FPR), and true discovery rate (TDR). The ROC surface, together with the associated quantities, volume under the surface (VUS) and FDR-controlled area under the ROC curve (FCAUC), provides a useful approach for gauging classifier performance on class-skewed high-throughput data. The implementation as an R package is available at http://userwww.service.emory.edu/~tyu8/ROCS/.

11.
Maximum likelihood (ML) is a widely used criterion for selecting optimal evolutionary trees. However, the nature of the likelihood surface for trees is still not sufficiently understood, especially with regard to the frequency of multiple optima. Here, we initiate an analytic study for identifying sequences that generate multiple optima. We concentrate on the problem of optimizing edge weights for a given tree or trees (as opposed to searching through the space of all trees). We report a new approach to computing ML directly, which we have used to find large families of sequences that have multiple optima, including sequences with a continuum of optimal points. Such data sets are best supported by different (two or more) phylogenies that vary significantly in their timings of evolutionary events. Some standard biological processes can lead to data with multiple optima, and consequently the field needs further investigation. Our results imply that hill-climbing techniques as currently implemented in various software packages cannot guarantee that one will find the global ML point, even if it is unique.

12.
A working guide to boosted regression trees   (Cited by: 33; self-citations: 0; citations by others: 33)
1. Ecologists use statistical models for both explanation and prediction, and need techniques that are flexible enough to express typical features of their data, such as nonlinearities and interactions. 2. This study provides a working guide to boosted regression trees (BRT), an ensemble method for fitting statistical models that differs fundamentally from conventional techniques that aim to fit a single parsimonious model. Boosted regression trees combine the strengths of two algorithms: regression trees (models that relate a response to their predictors by recursive binary splits) and boosting (an adaptive method for combining many simple models to give improved predictive performance). The final BRT model can be understood as an additive regression model in which individual terms are simple trees, fitted in a forward, stagewise fashion. 3. Boosted regression trees incorporate important advantages of tree-based methods, handling different types of predictor variables and accommodating missing data. They have no need for prior data transformation or elimination of outliers, can fit complex nonlinear relationships, and automatically handle interaction effects between predictors. Fitting multiple trees in BRT overcomes the biggest drawback of single tree models: their relatively poor predictive performance. Although BRT models are complex, they can be summarized in ways that give powerful ecological insight, and their predictive performance is superior to most traditional modelling methods. 4. The unique features of BRT raise a number of practical issues in model fitting. We demonstrate the practicalities and advantages of using BRT through a distributional analysis of the short-finned eel (Anguilla australis Richardson), a native freshwater fish of New Zealand. We use a data set of over 13 000 sites to illustrate effects of several settings, and then fit and interpret a model using a subset of the data. We provide code and a tutorial to enable the wider use of BRT by ecologists.
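
The "additive model of simple trees, fitted in a forward, stagewise fashion" can be illustrated with a minimal sketch. It assumes a single numeric predictor, squared-error loss, and one-split trees (stumps); the function names, the learning rate, and the toy data are all hypothetical, and this is not the code or tutorial that the article provides.

```python
def fit_stump(x, residuals):
    """Find the single split on x that minimizes squared error on the residuals."""
    best = None
    # Dropping the largest value keeps the right-hand side non-empty.
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("x needs at least two distinct values")
    return best[1:]


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def fit_brt(x, y, n_trees=50, learning_rate=0.1):
    """Forward stagewise fit: each stump is fitted to the current residuals."""
    intercept = sum(y) / len(y)
    prediction = [intercept] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, prediction)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrink each stump's contribution by the learning rate.
        prediction = [
            pi + learning_rate * predict_stump(stump, xi)
            for xi, pi in zip(x, prediction)
        ]
    return intercept, stumps, learning_rate


def predict_brt(model, xi):
    intercept, stumps, learning_rate = model
    return intercept + learning_rate * sum(predict_stump(s, xi) for s in stumps)


def training_mse(model, x, y):
    return sum((yi - predict_brt(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)


# Hypothetical nonlinear toy data.
x = list(range(10))
y = [xi ** 2 for xi in x]
for n in (0, 1, 50):
    print(n, training_mse(fit_brt(x, y, n_trees=n), x, y))
```

A small learning rate with many trees trades fitting speed for smoother updates; real analyses would also need multiple predictors, deeper trees, and held-out data to choose the number of trees.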

13.
This paper explains the theoretical background of various methods for the evaluation of diagnostic image quality. It introduces the Receiver Operating Characteristics (ROC) methodology. The major prerequisites, some important features of the experimental set-up and of the data analysis of an ROC study are explained. Dedicated software packages to support ROC studies are introduced. A possible method for relating image quality and cost-effectiveness is presented. Finally some current problems of the ROC method and possible improvements are discussed.
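
As a small illustration of the data-analysis step in an ROC study, the sketch below sweeps a decision threshold over rating scores, collects the resulting (false positive rate, true positive rate) points, and integrates them with the trapezoidal rule. The function names and the six-case data set are assumptions made for this example and do not come from the paper or from any dedicated ROC software.

```python
def roc_points(scores, labels):
    """Empirical ROC points, calling a case positive when score >= threshold.

    labels use 1 for diseased cases and 0 for non-diseased cases; both
    classes must be present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one case of each class")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical reader confidence scores and true disease status.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
print(points)
print(auc_trapezoid(points))
```

For these data the area equals 8/9, which matches the fraction of (diseased, non-diseased) pairs in which the diseased case received the higher score.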

14.
The accuracy of a single diagnostic test for binary outcome can be summarized by the area under the receiver operating characteristic (ROC) curve. Volume under the surface and hypervolume under the manifold have been proposed as extensions for multiple class diagnosis (Scurfield, 1996, 1998). However, the lack of simple inferential procedures for such measures has limited their practical utility. Part of the difficulty is that calculating such quantities may not be straightforward, even with a single test. The decision rule used to generate the ROC surface requires class probability assessments, which are not provided by the tests. We develop a method based on estimating the probabilities via some procedure, for example, multinomial logistic regression. Bootstrap inferences are proposed to account for variability in estimating the probabilities and perform well in simulations. The ROC measures are compared to the correct classification rate, which depends heavily on class prevalences. An example of tumor classification with microarray data demonstrates that this property may lead to substantially different analyses. The ROC-based analysis yields notable decreases in model complexity over previous analyses.

15.
We examine the impact of likelihood surface characteristics on phylogenetic inference. Amino acid data sets simulated from topologies with branch length features chosen to represent varying degrees of difficulty for likelihood maximization are analyzed. We present situations where the tree found to achieve the global maximum in likelihood is often not equal to the true tree. We use the program covSEARCH to demonstrate how the use of adaptively sized pools of candidate trees that are updated using confidence tests results in solution sets that are highly likely to contain the true tree. This approach requires more computation than traditional maximum likelihood methods, hence covSEARCH is best suited to small to medium-sized alignments or large alignments with some constrained nodes. The majority rule consensus tree computed from the confidence sets also proves to be different from the generating topology. Although low phylogenetic signal in the input alignment can result in large confidence sets of trees, some biological information can still be obtained based on nodes that exhibit high support within the confidence set. Two real data examples are analyzed: mammal mitochondrial proteins and a small tubulin alignment. We conclude that the technique of confidence set optimization can significantly improve the robustness of phylogenetic inference at a reasonable computational cost. Additionally, when either very short internal branches or very long terminal branches are present, confident resolution of specific bipartitions or subtrees, rather than whole-tree phylogenies, may be the most realistic goal for phylogenetic methods. [Reviewing Editor: Dr. Nicolas Galtier]

16.
The receiver operating characteristic (ROC) curve is commonly used to evaluate and compare the accuracy of classification methods or markers. Estimating ROC curves has been an important problem in various fields including biometric recognition and diagnostic medicine. In real applications, classification markers are often developed under two or more ordered conditions, such that a natural stochastic ordering exists among the observations. Incorporating such a stochastic ordering into estimation can improve statistical efficiency (Davidov and Herman, 2012). In addition, clustered and correlated data arise when multiple measurements are gleaned from the same subject, making estimation of ROC curves complicated due to within-cluster correlations. In this article, we propose to model the ROC curve using a weighted empirical process to jointly account for the order constraint and within-cluster correlation structure. The algebraic properties of resulting summary statistics of the ROC curve such as its area and partial area are also studied. The algebraic expressions reduce to the ones by Davidov and Herman (2012) for independent observations. We derive asymptotic properties of the proposed order-restricted estimators and show that they have smaller mean-squared errors than the existing estimators. Simulation studies also demonstrate better performance of the newly proposed estimators over existing methods for finite samples. The proposed method is further exemplified with the fingerprint matching data from the National Institute of Standards and Technology Special Database 4.

17.
18.
ABSTRACT: BACKGROUND: The increased use of multi-locus data sets for phylogenetic reconstruction has increased the need to determine whether a set of gene trees significantly deviates from the phylogenetic patterns of other genes. Such unusual gene trees may have been influenced by other evolutionary processes such as selection, gene duplication, or horizontal gene transfer. RESULTS: Motivated by this problem we propose a nonparametric goodness-of-fit test for two empirical distributions of gene trees, and we developed the software GeneOut to estimate a p-value for the test. Our approach maps trees into a multi-dimensional vector space and then applies support vector machines (SVMs) to measure the separation between two sets of pre-defined trees. We use a permutation test to assess the significance of the SVM separation. To demonstrate the performance of GeneOut, we applied it to the comparison of gene trees simulated within different species trees across a range of species tree depths. Applied directly to sets of simulated gene trees with large sample sizes, GeneOut was able to detect very small differences between two sets of gene trees generated under different species trees. Our statistical test can also include tree reconstruction into its test framework through a variety of phylogenetic optimality criteria. When applied to DNA sequence data simulated from different sets of gene trees, results in the form of receiver operating characteristic (ROC) curves indicated that GeneOut performed well in the detection of differences between sets of trees with different distributions in a multi-dimensional space. Furthermore, it controlled false positive and false negative rates very well, indicating a high degree of accuracy.
CONCLUSIONS: The non-parametric nature of our statistical test provides fast and efficient analyses, and makes it an applicable test for any scenario where evolutionary or other factors can lead to trees with different multi-dimensional distributions. The software GeneOut is freely available under the GNU public license.
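
The permutation step of such a test can be sketched in a few lines. This sketch assumes the trees have already been mapped to numeric vectors, and it swaps the SVM separation for a much simpler statistic, the distance between group centroids, purely to keep the example self-contained. It is not GeneOut, and every name and data value below is hypothetical.

```python
import random


def centroid_distance(group_a, group_b):
    """Euclidean distance between the mean vectors of two groups."""
    dim = len(group_a[0])
    mean_a = [sum(v[i] for v in group_a) / len(group_a) for i in range(dim)]
    mean_b = [sum(v[i] for v in group_b) / len(group_b) for i in range(dim)]
    return sum((p - q) ** 2 for p, q in zip(mean_a, mean_b)) ** 0.5


def permutation_p_value(group_a, group_b, statistic, n_permutations=999, seed=0):
    """Share of label shuffles whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            at_least_as_extreme += 1
    # Counting the observed labelling itself keeps the p-value above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical tree vectors: set B is set A shifted far away.
set_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
set_b = [(x + 10.0, y + 10.0) for x, y in set_a]
print(permutation_p_value(set_a, set_b, centroid_distance))
```

Because the statistic is passed in as a function, a separation measure from a fitted classifier could be substituted without changing the permutation logic.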

19.
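Using remote sensing to extract fine-scale land-cover information is one of the important means of studying ecosystem structure, processes, and functions. Because ecosystems in tropical regions are complex, fine-scale land-cover extraction is subject to considerable uncertainty, and the phenomena of "same object, different spectra" and "same spectrum, different objects" arise easily. Taking fine-scale remote-sensing information extraction for Hainan Island, located in the tropics, as an example, and based on a comprehensive analysis of the spectral characteristics, spatial distribution, and patch shapes of typical land-cover types, this study constructed and optimized the Water and Land differing Index (WLI), the Grass and Shrub differing Index (GSI), and the Field and Sand differing Index (SSI). Combining these with multi-source data such as the Vegetation Index of the Universal Pattern Decomposition Method (VIUPD) and a Digital Elevation Model (DEM), it proposed a decision-tree-based, object-oriented method for remote-sensing information extraction. The method first determines the objects to be extracted and clarifies the object classes and their membership relations, and then extracts land-cover information such as natural forest, rubber plantations, and pulp and paper plantations layer by layer and item by item. The results show that the overall extraction accuracy reached 88%, which is 22 percentage points higher than that of the traditional supervised classification method (66%), a clear improvement.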

20.
The receiver operating characteristic (ROC) curve is a popular tool to evaluate and compare the accuracy of diagnostic tests to distinguish the diseased group from the nondiseased group when the test results are continuous or ordinal. A complicated data setting occurs when multiple tests are measured on abnormal and normal locations from the same subject and the measurements are clustered within the subject. Although least squares regression methods can be used for the estimation of the ROC curve from correlated data, how to develop the least squares methods to estimate the ROC curve from the clustered data has not been studied. Also, the statistical properties of the least squares methods under the clustering setting are unknown. In this article, we develop the least squares ROC methods to allow the baseline and link functions to differ, and more importantly, to accommodate clustered data with discrete covariates. The methods can generate smooth ROC curves that satisfy the inherent continuous property of the true underlying curve. The least squares methods are shown to be more efficient than the existing nonparametric ROC methods under appropriate model assumptions in simulation studies. We apply the methods to a real example in the detection of glaucomatous deterioration. We also derive the asymptotic properties of the proposed methods.
