Similar documents
Found 20 similar documents (search time: 640 ms)
1.
The development of clinical prediction models requires the selection of suitable predictor variables. Techniques to perform objective Bayesian variable selection in the linear model are well developed and have been extended to the generalized linear model setting as well as to the Cox proportional hazards model. Here, we consider discrete time‐to‐event data with competing risks and propose methodology to develop a clinical prediction model for the daily risk of acquiring a ventilator‐associated pneumonia (VAP) attributed to P. aeruginosa (PA) in intensive care units. The competing events for a PA VAP are extubation, death, and VAP due to other bacteria. Baseline variables are potentially important to predict the outcome at the start of ventilation, but may lose some of their predictive power after a certain time. Therefore, we use a landmark approach for dynamic Bayesian variable selection where the set of relevant predictors depends on the time already spent at risk. We finally determine the direct impact of a variable on each competing event through cause‐specific variable selection.
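The landmark approach described above can be illustrated with a minimal data-preparation sketch: at landmark time s, keep only subjects still at risk and reset the time origin, so a separate model (with its own selected predictors) can be fitted per landmark. The function below is a hypothetical illustration under simplified assumptions, not the authors' implementation:

```python
import numpy as np

def landmark_dataset(time, status, s):
    """Landmarking sketch: keep only subjects still at risk at landmark
    time s, and reset the clock so prediction starts from s.

    time   : observed event/censoring times
    status : event indicator (any coding; carried through unchanged)
    s      : landmark time
    """
    time = np.asarray(time, float)
    status = np.asarray(status)
    keep = time > s                 # subjects still at risk at s
    return time[keep] - s, status[keep]
```

A per-landmark variable-selection step would then be run on each such dataset, letting the selected predictor set change with time already spent at risk.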

2.
Prognostic models for time-to-event data play a prominent role in therapy assignment, risk stratification and inter-hospital quality assurance. The assessment of their prognostic value is vital not only for responsible resource allocation, but also for their widespread acceptance. The additional presence of competing risks to the event of interest requires proper handling not only on the model building side, but also during assessment. Research into methods for the evaluation of the prognostic potential of models accounting for competing risks is still needed, as most proposed methods measure either their discrimination or calibration, but do not examine both simultaneously. We adapt the prediction error proposal of Graf et al. (Statistics in Medicine 1999, 18, 2529–2545) and Gerds and Schumacher (Biometrical Journal 2006, 48, 1029–1040) to handle models with competing risks, i.e. more than one possible event type, and introduce a consistent estimator. A simulation study investigating the behaviour of the estimator in small sample size situations and for different levels of censoring together with a real data application follows.
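The adapted prediction error above is, at its core, a time-dependent Brier score with inverse-probability-of-censoring weights (IPCW). A minimal numpy sketch of such an estimator follows; the function name, arguments, and the exact weighting scheme are illustrative assumptions, not the authors' estimator:

```python
import numpy as np

def ipcw_brier_competing(time, event, pred_cif, t, cens_surv):
    """Sketch of an IPCW Brier score for competing risks at horizon t.

    time      : observed times
    event     : 0 = censored, 1 = event of interest, 2 = competing event
    pred_cif  : model-predicted cumulative incidence F1(t | x) per subject
    t         : evaluation horizon
    cens_surv : callable G(u) = P(censoring > u), e.g. a Kaplan-Meier
                estimate of the censoring distribution
    """
    time = np.asarray(time, float)
    event = np.asarray(event)
    pred = np.asarray(pred_cif, float)

    # Outcome indicator: event of interest occurred by t
    y = ((time <= t) & (event == 1)).astype(float)

    # IPCW weights: subjects with any event by t are weighted by 1/G(T),
    # subjects still at risk at t by 1/G(t); those censored before t get 0.
    w = np.zeros_like(time)
    observed_by_t = (time <= t) & (event != 0)
    w[observed_by_t] = 1.0 / cens_surv(time[observed_by_t])
    at_risk = time > t
    w[at_risk] = 1.0 / cens_surv(t)

    return float(np.mean(w * (y - pred) ** 2))
```

With no censoring (G = 1 everywhere) this reduces to the plain mean squared difference between the event indicator and the predicted cumulative incidence.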

3.
Variable selection is critical in competing risks regression with high-dimensional data. Although penalized variable selection methods and other machine learning-based approaches have been developed, many of these methods often suffer from instability in practice. This paper proposes a novel method named Random Approximate Elastic Net (RAEN). Under the proportional subdistribution hazards model, RAEN provides a stable and generalizable solution to the large-p-small-n variable selection problem for competing risks data. Our general framework allows the proposed algorithm to be applicable to other time-to-event regression models, including competing risks quantile regression and accelerated failure time models. Through extensive simulations, we show that the new, computationally intensive algorithm markedly improves variable selection and parameter estimation. A user-friendly R package, RAEN, has been developed for public use. We also apply our method to a cancer study to identify influential genes associated with death or progression from bladder cancer.

4.
In the context of time-to-event analysis, a primary objective is to model the risk of experiencing a particular event in relation to a set of observed predictors. The Concordance Index (C-Index) is a statistic frequently used in practice to assess how well such models discriminate between various risk levels in a population. However, the properties of conventional C-Index estimators when applied to left-truncated time-to-event data have not been well studied, despite the fact that left-truncation is commonly encountered in observational studies. We show that the limiting values of the conventional C-Index estimators depend on the underlying distribution of truncation times, which is similar to the situation with right-censoring as discussed in Uno et al. (2011) [On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statistics in Medicine 30(10), 1105–1117]. We develop a new C-Index estimator based on inverse probability weighting (IPW) that corrects for this limitation, and we generalize this estimator to settings with left-truncated and right-censored data. The proposed IPW estimators are highly robust to the underlying truncation distribution and often outperform the conventional methods in terms of bias, mean squared error, and coverage probability. We apply these estimators to evaluate a predictive survival model for mortality among patients with end-stage renal disease.
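A weighted variant of Harrell's concordance estimator illustrates the IPW idea: with unit weights it reduces to the conventional C-Index, and a truncation correction would supply weights estimated from the truncation distribution. The O(n²) pairwise loop and the argument names below are illustrative assumptions, not the authors' estimator:

```python
import numpy as np

def weighted_c_index(time, event, risk, weight):
    """Sketch of an inverse-probability-weighted concordance index.

    A pair (i, j) is usable when the shorter observed time is an event;
    it is concordant when the subject with the shorter time also has the
    higher predicted risk. weight = 1 for all subjects recovers the
    conventional Harrell estimator.
    """
    time, event = np.asarray(time, float), np.asarray(event, bool)
    risk, weight = np.asarray(risk, float), np.asarray(weight, float)
    num = den = 0.0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue                     # only events anchor usable pairs
        for j in range(n):
            if time[j] > time[i]:        # j outlived i
                w = weight[i] * weight[j]
                den += w
                if risk[i] > risk[j]:
                    num += w
                elif risk[i] == risk[j]:
                    num += 0.5 * w       # ties count one half
    return num / den
```

Under left truncation, the hypothetical `weight` entries would be inverse probabilities of being observed given the estimated truncation-time distribution.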

5.
Semi-competing risks refer to the time-to-event analysis setting, where the occurrence of a non-terminal event is subject to whether a terminal event has occurred, but not vice versa. Semi-competing risks arise in a broad range of clinical contexts, including studies of preeclampsia, a condition that may arise during pregnancy and for which delivery is a terminal event. Models that acknowledge semi-competing risks enable investigation of relationships between covariates and the joint timing of the outcomes, but methods for model selection and prediction of semi-competing risks in high dimensions are lacking. Moreover, in such settings researchers commonly analyze only a single or composite outcome, losing valuable information and limiting clinical utility; in the obstetric setting, this means ignoring valuable insight into the timing of delivery after the onset of preeclampsia. To address this gap, we propose a novel penalized estimation framework for frailty-based illness–death multi-state modeling of semi-competing risks. Our approach combines non-convex and structured fusion penalization, inducing global sparsity as well as parsimony across submodels. We perform estimation and model selection via a pathwise routine for non-convex optimization, and prove statistical error rate results in this setting. We present a simulation study investigating estimation error and model selection performance, and a comprehensive application of the method to joint risk modeling of preeclampsia and timing of delivery using pregnancy data from an electronic health record.

6.
LncRNAs and miRNAs are key molecules in the mechanism of competing endogenous RNAs (ceRNA), and their interactions have been discovered to play important roles in gene regulation. As a supplement to the identification of lncRNA‐miRNA interactions from CLIP‐seq experiments, in silico prediction can select the most promising candidates for experimental validation. Although developing computational tools for predicting lncRNA‐miRNA interactions is of great importance for deciphering the ceRNA mechanism, little effort has been made in this direction. In this paper, we propose an approach based on linear neighbour representation to predict lncRNA‐miRNA interactions (LNRLMI). Specifically, we first constructed a bipartite network by combining the known interaction network with similarities based on the expression profiles of lncRNAs and miRNAs. Based on this data integration, the linear neighbour representation method was used to construct a prediction model. To evaluate the prediction performance of the proposed model, k‐fold cross-validation was implemented. As a result, LNRLMI yielded average AUCs of 0.8475 ± 0.0032, 0.8960 ± 0.0015 and 0.9069 ± 0.0014 on 2‐fold, 5‐fold and 10‐fold cross-validation, respectively. A series of comparison experiments with other methods was also conducted, and the results showed that our method is feasible and effective for predicting lncRNA‐miRNA interactions via a combination of different types of useful side information. It is anticipated that LNRLMI could be a useful tool for predicting the non‐coding RNA regulation networks in which lncRNAs and miRNAs are involved.

7.
Many research questions involve time-to-event outcomes that can be prevented from occurring due to competing events. In these settings, we must be careful about the causal interpretation of classical statistical estimands. In particular, estimands on the hazard scale, such as ratios of cause-specific or subdistribution hazards, are fundamentally hard to interpret causally. Estimands on the risk scale, such as contrasts of cumulative incidence functions, do have a clear causal interpretation, but they only capture the total effect of the treatment on the event of interest; that is, effects both through and outside of the competing event. To disentangle causal treatment effects on the event of interest and competing events, the separable direct and indirect effects were recently introduced. Here we provide new results on the estimation of direct and indirect separable effects in continuous time. In particular, we derive the nonparametric influence function in continuous time and use it to construct an estimator that has certain robustness properties. We also propose a simple estimator based on semiparametric models for the two cause-specific hazard functions. We describe the asymptotic properties of these estimators and present results from simulation studies, suggesting that the estimators behave satisfactorily in finite samples. Finally, we reanalyze the prostate cancer trial from Stensrud et al. (2020).

8.
Prognosis is usually expressed in terms of the probability that a patient will or will not have experienced an event of interest t years after diagnosis of a disease. This quantity, however, is of little informative value for a patient who is still event-free after a number of years. Such a patient would be much more interested in the conditional probability of being event-free in the upcoming t years, given that he/she did not experience the event in the s years after diagnosis, called “conditional survival.” It is the simplest form of a dynamic prediction and can be dealt with using straightforward extensions of standard time-to-event analyses in clinical cohort studies. For a healthy individual, a related problem with further complications is the so-called “age-conditional probability of developing cancer” in the next t years. Here, the competing risk of dying from other diseases has to be taken into account. For both situations, the hazard function provides the central dynamic concept, which can be further extended in a natural way to build dynamic prediction models that incorporate both baseline and time-dependent characteristics. Such models are able to exploit the most current information accumulating over time in order to accurately predict the further course or development of a disease. In this article, the biostatistical challenges as well as the relevance and importance of dynamic prediction are illustrated using studies of multiple myeloma, a hematologic malignancy with a formerly rather poor prognosis which has improved over the last few years.
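In the absence of competing risks, the conditional survival described above is simply the ratio of two Kaplan-Meier estimates, P(T > s + t | T > s) = S(s + t) / S(s). A minimal sketch (function names are illustrative; the abstract's competing-risk extension would replace the Kaplan-Meier with cumulative incidence estimates):

```python
import numpy as np

def km_survival(time, event, t):
    """Kaplan-Meier estimate of S(t) from right-censored data
    (event = 1 for an observed event, 0 for censoring)."""
    time, event = np.asarray(time, float), np.asarray(event, bool)
    s = 1.0
    for u in np.unique(time[event]):        # distinct event times, ascending
        if u > t:
            break
        at_risk = np.sum(time >= u)
        deaths = np.sum((time == u) & event)
        s *= 1.0 - deaths / at_risk
    return s

def conditional_survival(time, event, s_years, t_years):
    """P(event-free for t more years | event-free at s years) = S(s+t)/S(s)."""
    return km_survival(time, event, s_years + t_years) / km_survival(time, event, s_years)
```

For a patient already event-free at s years, this ratio is typically far more informative than the unconditional t-year survival reported at diagnosis.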

9.
Objective: Male pattern baldness (MPB), also known as androgenetic alopecia (AGA), is a common type of hair loss in men, with roughly 80% of the phenotypic variance explained by genetic factors. Current genetic-inference studies of MPB are based mainly on European populations, and studies in East Asian populations are scarce. This study validated MPB-associated loci reported in European populations in a Chinese population and built genetic inference models. Methods: We investigated the association of 486 single-nucleotide polymorphism (SNP) loci linked to MPB in European populations in 312 Chinese Han males, and screened the associated loci using stepwise regression and Lasso regression, respectively. Prediction models were built with a logistic regression algorithm and evaluated by ten-fold cross-validation. We then compared the predictive accuracy for MPB of four commonly used classifier models: logistic regression, k-nearest neighbours, random forest, and support vector machine. Results: 174 SNP loci were significantly associated with MPB in Chinese Han males (P < 0.05). The two screening methods yielded sets of 22 and 25 SNPs, from which 22-SNP and 25-SNP logistic regression prediction models were built. Measured by AUC (area under the ROC curve), the two models predicted MPB with accuracies of 0.85 and 0.84, which dropped to 0.81 and 0.77 after ten-fold cross-validation. When age was added as a predictor, the AUC of both models reached a maximum of 0.89. The logistic regression prediction model clearly outperformed the other classifier models examined in this study. Conclusion: Overall, although the accuracy of the prediction models has not yet reached the level expected for clinical use, SNPs hold considerable promise for genetic prediction of MPB and can inform early diagnosis, clinical intervention, and forensic applications.
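The AUC used above to score the classifiers can be computed directly from predicted scores as a Mann-Whitney statistic, without fitting an ROC curve. A self-contained sketch (not the study's code):

```python
import numpy as np

def auc_mann_whitney(y_true, scores):
    """AUC as the Mann-Whitney statistic: the probability that a randomly
    chosen case scores higher than a randomly chosen control, with ties
    counting one half."""
    y = np.asarray(y_true, bool)
    s = np.asarray(scores, float)
    pos, neg = s[y], s[~y]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

In a cross-validation loop, the same function would be applied to the held-out fold's predicted probabilities, and the fold AUCs averaged.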

10.
In recent years there have been a series of advances in the field of dynamic prediction. Among those is the development of methods for dynamic prediction of the cumulative incidence function in a competing risk setting. These models enable the predictions to be updated as time progresses and more information becomes available, for example when a patient comes back for a follow‐up visit after completing a year of treatment, the risk of death, and adverse events may have changed since treatment initiation. One approach to model the cumulative incidence function in competing risks is by direct binomial regression, where right censoring of the event times is handled by inverse probability of censoring weights. We extend the approach by combining it with landmarking to enable dynamic prediction of the cumulative incidence function. The proposed models are very flexible, as they allow the covariates to have complex time‐varying effects, and we illustrate how to investigate possible time‐varying structures using Wald tests. The models are fitted using generalized estimating equations. The method is applied to bone marrow transplant data and the performance is investigated in a simulation study.

11.

Background

The development of a risk assessment tool for long-term hepatocellular carcinoma risk would be helpful in identifying high-risk patients and providing information for clinical consultation.

Methods

The model derivation and validation cohorts consisted of 975 and 572 anti-HCV seropositives, respectively. The model included age, alanine aminotransferase (ALT), the ratio of aspartate aminotransferase to ALT, serum HCV RNA levels, cirrhosis status, and HCV genotype. Two risk prediction models were developed: one for all anti-HCV seropositives, and the other for anti-HCV seropositives with detectable HCV RNA. Cox's proportional hazards models were used to estimate regression coefficients of HCC risk predictors and to derive risk scores. The cumulative HCC risks in the validation cohort were estimated by Kaplan-Meier methods. The area under the receiver operating characteristic curve (AUROC) was used to evaluate the performance of the risk models.

Results

All predictors were significantly associated with HCC. The summary risk scores of the two models derived from the derivation cohort were predictive of HCC risk in the validation cohort, and clearly divided the validation cohort into three groups (p<0.001). The AUROC for predicting 5-year HCC risk in the validation cohort was satisfactory for both models, at 0.73 and 0.70, respectively.

Conclusion

Scoring systems for predicting HCC risk in HCV-infected patients had good validity and discrimination capability, and may help triage patients toward alternative management strategies.
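A scoring system of the kind described above typically takes the fitted Cox coefficients as weights for a linear risk score, then cuts the score into risk groups. A hypothetical sketch (the coefficients, cutpoints, and function names are illustrative, not the published model):

```python
import numpy as np

def risk_scores(X, beta):
    """Linear predictor of a fitted Cox model: score_i = sum_k beta_k * x_ik.
    beta would come from the fitted proportional hazards model."""
    return np.asarray(X, float) @ np.asarray(beta, float)

def tertile_groups(scores):
    """Split subjects into three risk groups at the tertiles of the score,
    as a stand-in for the cutpoints a published scoring system would fix."""
    scores = np.asarray(scores, float)
    lo, hi = np.quantile(scores, [1 / 3, 2 / 3])
    return np.where(scores <= lo, 0, np.where(scores <= hi, 1, 2))
```

Kaplan-Meier curves per group (as in the abstract) would then visualize how well the score separates cumulative HCC risk.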

12.
Time‐dependent covariates are frequently encountered in regression analysis for event history data and competing risks. They are often essential predictors, which cannot be substituted by time‐fixed covariates. This study briefly recalls the different types of time‐dependent covariates, as classified by Kalbfleisch and Prentice [The Statistical Analysis of Failure Time Data, Wiley, New York, 2002] with the intent of clarifying their role and emphasizing the limitations in standard survival models and in the competing risks setting. If random (internal) time‐dependent covariates are to be included in the modeling process, then it is still possible to estimate cause‐specific hazards but prediction of the cumulative incidences and survival probabilities based on these is no longer feasible. This article aims at providing some possible strategies for dealing with these prediction problems. In a multi‐state framework, a first approach uses internal covariates to define additional (intermediate) transient states in the competing risks model. Another approach is to apply the landmark analysis as described by van Houwelingen [Scandinavian Journal of Statistics 2007, 34, 70–85] in order to study cumulative incidences at different subintervals of the entire study period. The final strategy is to extend the competing risks model by considering all the possible combinations between internal covariate levels and cause‐specific events as final states. In all of those proposals, it is possible to estimate the changes/differences of the cumulative risks associated with simple internal covariates. An illustrative example based on bone marrow transplant data is presented in order to compare the different methods.

13.
Analysis of molecular data promises identification of biomarkers for improving prognostic models, thus potentially enabling better patient management. For identifying such biomarkers, risk prediction models can be employed that link high-dimensional molecular covariate data to a clinical endpoint. In low-dimensional settings, a multitude of statistical techniques already exists for building such models, e.g. allowing for variable selection or for quantifying the added value of a new biomarker. We provide an overview of techniques for regularized estimation that transfer this toward high-dimensional settings, with a focus on models for time-to-event endpoints. Techniques for incorporating specific covariate structure are discussed, as well as techniques for dealing with more complex endpoints. Employing gene expression data from patients with diffuse large B-cell lymphoma, some typical modeling issues from low-dimensional settings are illustrated in a high-dimensional application. First, the performance of classical stepwise regression is compared to stage-wise regression, as implemented by a component-wise likelihood-based boosting approach. A second issue arises when artificially transforming the response into a binary variable. The effects of the resulting loss of efficiency and potential bias in a high-dimensional setting are illustrated, and a link to competing risks models is provided. Finally, we discuss conditions for adequately quantifying the added value of high-dimensional gene expression measurements, both at the stage of model fitting and when performing evaluation.

14.
The present paper describes the algebraic and statistical relationships between the population-based mortality risk measures, the Standardized Mortality Ratio (SMR) and the Standardized Risk Ratio (SRR), and their respective proportional mortality-based counterparts, the internally and externally Standardized Proportional Mortality Ratios (SPMR, SePMR). The paper shows how, under some reasonable assumptions, asymptotically precise inferences about population-based risk measures can be made from studies of proportional mortality. Through application of the asymptotic multivariate normal approximation to the multinomial distribution, shortest confidence intervals for the relative SMR (RSMR) involving the corresponding SPMR are constructed for any cause of death. This same technique is also used to construct asymptotic prediction intervals about the cause-specific SePMR which, with high probability, contain the corresponding relative SRR (RSRR). The utility of the proportional mortality measures SPMR and SePMR as estimators of the corresponding cause-specific risks RSMR and RSRR in occupational epidemiologic research is empirically evaluated using data from two recent occupational cohort studies. Asymptotic Bonferroni-type simultaneous inferential techniques are also developed for these measures, which facilitate the assessment of overall risk in the presence of several competing factors.
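The basic SMR is the ratio of observed to expected deaths; a common asymptotic confidence interval works on the log scale with standard error 1/sqrt(observed). The sketch below shows this standard construction for orientation (it is not the paper's shortest-interval or multinomial machinery):

```python
import math

def smr_with_ci(observed, expected, z=1.96):
    """Standardized Mortality Ratio with an approximate 95% CI from the
    normal approximation to the Poisson on the log scale:
    exp(log SMR +/- z / sqrt(observed))."""
    smr = observed / expected
    se_log = 1.0 / math.sqrt(observed)
    return smr, smr * math.exp(-z * se_log), smr * math.exp(z * se_log)
```

An SPMR analysis would replace the expected-death denominator with one computed from proportional mortality, which is where the paper's multinomial approximation enters.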

15.
Applied ecology is based on an assumption that a management action will result in a predicted outcome. Testing the prediction accuracy of ecological models is the most powerful way of evaluating the knowledge implicit in this cause-effect relationship; however, predictive modeling and prediction testing are spreading only slowly in ecology. The challenge of prediction testing is particularly acute for small-scale studies, because withholding data for prediction testing (e.g., via k-fold cross validation) can reduce model precision. However, by necessity small-scale studies are common. We use one such study that explored small mammal abundance along an elevational gradient to test the prediction accuracy of models with varying degrees of information content. For each of three small mammal species, we conducted 5000 iterations of the following process: (1) randomly selected 75 % of the data to develop generalized linear models of species abundance that used detailed site measurements as covariates, (2) used an information theoretic approach to compare the top model with detailed covariates to habitat type-only and null models constructed with the same data, (3) tested those models’ ability to predict the 25 % of the randomly withheld data, and (4) evaluated prediction accuracy with a quadratic loss function. Detailed models fit the model-evaluation data best but had greater expected prediction error when predicting out-of-sample data relative to the habitat type models. Relationships between species and detailed site variables may be evident only within the framework of explicitly hierarchical analyses. We show that even with a small but relatively typical dataset (n = 28 sampling locations across 125 km over two years), researchers can effectively compare models with different information content and measure models’ predictive power, thus evaluating their own ecological understanding and defining the limits of their inferences. Identifying the appropriate scope of inference through prediction testing is ecologically valuable and is attainable even with small datasets.
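The repeated 75/25 split with a quadratic loss described in steps (1)-(4) can be sketched generically; the `fit`/`predict` callables stand in for whatever model (GLM, habitat-only, or null) is being tested, and the names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def split_and_score(x, y, fit, predict, train_frac=0.75, n_iter=200):
    """Repeated random train/test splits: fit on train_frac of the data,
    score the held-out remainder with a quadratic loss, and average the
    loss over iterations to estimate expected prediction error."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)
    losses = []
    for _ in range(n_iter):
        idx = rng.permutation(n)
        k = int(train_frac * n)
        tr, te = idx[:k], idx[k:]
        model = fit(x[tr], y[tr])
        pred = predict(model, x[te])
        losses.append(np.mean((y[te] - pred) ** 2))   # quadratic loss
    return float(np.mean(losses))
```

A null model, for example, would be `fit = lambda xt, yt: yt.mean()` with `predict = lambda m, xt: m`; comparing its averaged loss against richer models is exactly the kind of contrast the study reports.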

16.
Tree-based methods are popular nonparametric tools in studying time-to-event outcomes. In this article, we introduce a novel framework for survival trees and ensembles, where the trees partition the dynamic survivor population and can handle time-dependent covariates. Using the idea of randomized tests, we develop generalized time-dependent receiver operating characteristic (ROC) curves for evaluating the performance of survival trees. The tree-building algorithm is guided by decision-theoretic criteria based on ROC, targeting specifically for prediction accuracy. To address the instability issue of a single tree, we propose a novel ensemble procedure based on averaging martingale estimating equations, which is different from existing methods that average the predicted survival or cumulative hazard functions from individual trees. Extensive simulation studies are conducted to examine the performance of the proposed methods. We apply the methods to a study on AIDS for illustration.

17.
A competing risk model is developed to accommodate both planned Type I censoring and random withdrawals. MLEs, their properties, and confidence regions for parameters and mean lifetimes are obtained for a model regarding random censoring as a competing risk, and compared to those obtained for the model in which withdrawals are regarded as random censoring. Estimated net and crude probabilities are calculated and compared for the two models. The model is developed for two competing risks, one following a Weibull distribution and the other a Rayleigh distribution, with random withdrawals following a Weibull distribution.
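The latent-failure-time view of this model is easy to simulate: draw one time per risk, observe the minimum, and record which risk caused it. A sketch under illustrative parameters (note the Rayleigh is a Weibull with shape 2, up to a reparametrization of the scale):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_competing(n, weib_shape=1.5, weib_scale=2.0, rayl_scale=2.0):
    """Simulate latent competing-risk times: T1 ~ Weibull, T2 ~ Rayleigh
    (generated as a shape-2 Weibull); observe the minimum and which risk
    caused the event."""
    t1 = weib_scale * rng.weibull(weib_shape, n)
    t2 = rayl_scale * rng.weibull(2.0, n)    # Rayleigh up to scale convention
    time = np.minimum(t1, t2)
    cause = np.where(t1 <= t2, 1, 2)
    return time, cause
```

Adding planned Type I censoring would amount to truncating `time` at the study end and recoding those observations as censored; the paper's MLE comparison is then carried out on such data.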

18.

Background

Current risk prediction models in heart failure (HF) including clinical characteristics and biomarkers have only moderate predictive value. The aim of this study was to use matrix-assisted laser desorption ionisation mass spectrometry (MALDI-MS) profiling to determine if a combination of peptides identified with MALDI-MS will better predict clinical outcomes of patients with HF.

Methods

A cohort of 100 patients with HF was recruited in the biomarker discovery phase (50 patients who died or had a HF hospital admission vs. 50 patients who did not have an event). Peptide extraction from plasma samples was performed using reversed-phase C18. Samples were then analysed using MALDI-MS. A multiple-peptide biomarker model was discovered that was able to predict clinical outcomes for patients with HF. Finally, this model was validated in an independent cohort of 100 patients with HF.

Results

After normalisation and alignment of all the processed spectra, a total of 11,389 peptides (m/z) were detected using MALDI-MS. A multiple-biomarker model was developed from 14 plasma peptides that was able to predict clinical outcomes in HF patients with an area under the receiver operating characteristic curve (AUC) of 1.000 (p = 0.0005). This model was validated in the independent cohort of 100 HF patients, yielding an AUC of 0.817 (p = 0.0005) in the biomarker validation phase. Addition of this model to the BIOSTAT risk prediction model increased the predictive probability for clinical outcomes of HF from an AUC of 0.643 to an AUC of 0.823 (p = 0.0021). Moreover, both the fourteen-peptide model alone and its combination with the BIOSTAT risk prediction model achieved better time-to-event prediction of clinical events in patients with HF (p = 0.0005).

Conclusions

The results obtained in this study suggest that a cluster of plasma peptides measured using MALDI-MS can reliably predict clinical outcomes in HF, which may help enable precision medicine in HF.

19.
The current approach to using machine learning (ML) algorithms in healthcare is to either require clinician oversight for every use case or use their predictions without any human oversight. We explore a middle ground that lets ML algorithms abstain from making a prediction to simultaneously improve their reliability and reduce the burden placed on human experts. To this end, we present a general penalized loss minimization framework for training selective prediction-set (SPS) models, which choose to either output a prediction set or abstain. The resulting models abstain when the outcome is difficult to predict accurately, such as on subjects who are too different from the training data, and achieve higher accuracy on those they do give predictions for. We then introduce a model-agnostic, statistical inference procedure for the coverage rate of an SPS model that ensembles individual models trained using K-fold cross-validation. We find that SPS ensembles attain prediction-set coverage rates closer to the nominal level and have narrower confidence intervals for their marginal coverage rates. We apply our method to train neural networks that abstain more for out-of-sample images on the MNIST digit prediction task and achieve higher predictive accuracy for ICU patients compared to existing approaches.
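The abstention idea can be illustrated in its simplest form: output a label only when the model's confidence clears a threshold, otherwise abstain. The SPS models in the abstract output prediction *sets* learned via penalized loss minimization; the sketch below shows only the simpler single-label abstention variant, with illustrative names:

```python
import numpy as np

def predict_or_abstain(probs, threshold=0.8):
    """Selective prediction sketch: return the argmax class when the top
    predicted probability >= threshold, otherwise abstain (return -1)."""
    probs = np.asarray(probs, float)
    top = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    return np.where(top >= threshold, labels, -1)
```

Raising the threshold trades coverage (fraction of subjects given a prediction) for accuracy on the predicted subset, which is the trade-off the SPS framework optimizes.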

20.
Genotypes with extreme phenotypes are valuable for studying ‘difficult’ quantitative traits. Genomic prediction (GP) might allow the identification of such extremes by phenotyping a training population of limited size and predicting genotypes with extreme phenotypes in large sequences of germplasm collections. We tested this approach employing seedling root traits in maize and the extensively genotyped Ames Panel. A training population made up of 384 inbred lines from the Ames Panel was phenotyped by extracting root traits from images using the software program ARIA. A ridge regression best linear unbiased prediction strategy was used to train a GP model. Genomic estimated breeding values for the trait ‘total root length’ (TRL) were predicted for 2431 inbred lines, which had previously been genotyped by sequencing. Selections were made for 100 extreme TRL lines, and those with the predicted longest or shortest TRL were validated for TRL and other root traits. The two predicted extreme groups with regard to TRL were significantly different (P = 0.0001). The difference between groups in predicted means for TRL was 145.1 cm, and 118.7 cm for observed means, which were significantly different (P = 0.001). The accuracy of predicting the rank between 1 and 200 of the validation population based on TRL (longest to shortest) was determined using a Spearman correlation to be ρ = 0.55. Taken together, our results support the idea that GP may be a useful approach for identifying the most informative genotypes in sequenced germplasm collections to facilitate experiments for quantitative inherited traits.
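Ridge regression BLUP has a closed form: marker effects b = (Z'Z + λI)⁻¹ Z'y, and genomic estimated breeding values are Z_test·b; prediction accuracy can then be summarized by the Spearman correlation of predicted versus observed ranks. A minimal sketch under illustrative names (ties are ignored in the rank computation):

```python
import numpy as np

def ridge_gebv(Z_train, y_train, Z_test, lam=1.0):
    """RR-BLUP sketch: marker effects b = (Z'Z + lam*I)^-1 Z'y,
    then GEBV = Z_test @ b."""
    Z, y = np.asarray(Z_train, float), np.asarray(y_train, float)
    p = Z.shape[1]
    b = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)
    return np.asarray(Z_test, float) @ b

def spearman_rho(a, b):
    """Spearman correlation: Pearson correlation of the ranks
    (no tie correction in this sketch)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])
```

In the study's setting, `Z_train` would hold marker genotypes of the 384 phenotyped lines, `Z_test` those of the 2431 candidate lines, and `spearman_rho` would compare predicted and observed TRL ranks in the validation set.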
