Found 20 similar articles. Search took 7 ms.
1.
Recently, meta-analysis has been widely utilized to combine information across multiple studies to evaluate a common effect. Integrating data from similar studies is particularly useful in genomic studies, where the individual study sample sizes are not large relative to the number of parameters of interest. In this article, we are interested in developing robust prognostic rules for the prediction of t-year survival based on multiple studies. We propose to construct a composite score for prediction by fitting a stratified semiparametric transformation model that allows the studies to have related but not identical outcomes. To evaluate the accuracy of the resulting score, we provide point and interval estimators for commonly used accuracy measures, including time-specific receiver operating characteristic curves and positive and negative predictive values. We apply the proposed procedures to develop prognostic rules for the 5-year survival of breast cancer patients based on five breast cancer genomic studies.
2.
Jürgen Wellmann. Biometrical Journal (Biometrische Zeitschrift) 2000, 42(2): 215–221
An S-estimator is defined for the one-way random effects model, analogous to the S-estimator for the model of i.i.d. random vectors. The estimator resembles the multivariate S-estimator with respect to existence and weak continuity. In addition, the existence proof yields an upper bound for the breakdown point of the S-estimator of one of the variance components, and this bound turns out to be rather low. An improvement of the estimator is proposed that overcomes this deficiency. Nevertheless, the estimator illustrates that new robustness problems arise in more structured models.
3.
Efron-type measures of prediction error for survival analysis (cited by: 3; self-citations: 0, citations by others: 3)
Estimates of the prediction error play an important role in the development of statistical methods and models, and in their applications. We adapt the resampling tools of Efron and Tibshirani (1997, Journal of the American Statistical Association 92, 548–560) to survival analysis with right-censored event times. We find that flexible rules, such as artificial neural nets, classification and regression trees, or regression splines, can be assessed and compared to less flexible rules in the same data where they are developed. The methods are illustrated with data from a breast cancer trial.
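The resampling idea underlying this abstract traces back to the plain .632 estimator, which blends the optimistic apparent error with the pessimistic out-of-bag bootstrap error. A minimal sketch for uncensored classification data follows; the threshold rule and all names are illustrative, not the survival-adapted estimator of the paper:

```python
import random

def err632(xs, ys, fit, predict, n_boot=50, seed=1):
    """Efron-Tibshirani .632 estimator:
    err632 = 0.368 * apparent + 0.632 * out-of-bag error."""
    rng = random.Random(seed)
    n = len(xs)
    model = fit(xs, ys)
    # Apparent (resubstitution) error: train and test on the same data.
    apparent = sum(predict(model, x) != y for x, y in zip(xs, ys)) / n
    oob_rates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap resample
        oob = sorted(set(range(n)) - set(idx))          # left-out observations
        if not oob:
            continue  # rare resample containing every observation
        m = fit([xs[i] for i in idx], [ys[i] for i in idx])
        oob_rates.append(sum(predict(m, xs[i]) != ys[i] for i in oob) / len(oob))
    return 0.368 * apparent + 0.632 * (sum(oob_rates) / len(oob_rates))

# Toy rule: classify by a threshold at the training-set mean (illustrative only).
thr_fit = lambda xs, ys: sum(xs) / len(xs)
thr_predict = lambda t, x: int(x > t)
est = err632([0.1, 0.2, 0.3, 0.9, 1.0, 1.1, 0.15, 0.95],
             [0, 0, 0, 1, 1, 1, 0, 1], thr_fit, thr_predict)
```

Adapting this to right-censored event times, as the abstract does, requires replacing the 0/1 loss with a censoring-weighted loss.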
4.
Alexander von Eye, Jochen Brandtstädter. Biometrical Journal (Biometrische Zeitschrift) 1988, 30(6): 651–665
Prediction analysis (PA) of cross classifications is characterized as a method for the analysis of local prediction hypotheses, that is, hypotheses that link particular predictor states to particular states of criteria. To evaluate the success of a prediction, PA compares the observed frequency distribution with an expected one, the latter estimated under the assumption of independence between predictors and criteria. When predictors or criteria have ordinal categories, the success of a prediction hypothesis is overestimated if the cell frequencies regress on the ranks of the variable categories. Using the method of log-linear models, it is shown how ordinal categories can be taken into account in PA. Numerical examples are given from the areas of cognitive development and drug research.
5.
Paul Blanche, Jean-François Dartigues, Hélène Jacqmin-Gadda. Biometrical Journal (Biometrische Zeitschrift) 2013, 55(5): 687–704
To quantify the ability of a marker to predict the onset of a clinical outcome in the future, time-dependent estimators of sensitivity, specificity, and the ROC curve have been proposed that account for censoring of the outcome. In this paper, we review these estimators, recall their assumptions about the censoring mechanism, and highlight their relationships and properties. A simulation study shows that marker-dependent censoring can lead to important biases for ROC estimators not adapted to this case. A slight modification of the inverse probability of censoring weighting estimators proposed by Uno et al. (2007) and Hung and Chiang (2010a) performs as well as the nearest neighbor estimator of Heagerty et al. (2000) in the simulation study and has interesting practical properties. Finally, the estimators were used to evaluate the ability of a marker combining age and a cognitive test to predict dementia in the elderly. Data were obtained from the French PAQUID cohort. The censoring is clearly marker-dependent, leading to appreciable differences between the ROC curves estimated with the different methods.
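The inverse probability of censoring weighting (IPCW) idea reviewed above can be sketched in a few lines for the cumulative/dynamic definitions of sensitivity and specificity. In this simplified sketch the censoring survival function G is passed in as a known function, whereas in practice it is estimated by Kaplan–Meier on the censoring times; the function name is an illustration, not the estimator of any one paper:

```python
def ipcw_sens_spec(marker, time, event, c, t, G):
    """Cumulative/dynamic sensitivity and specificity at horizon t with
    inverse-probability-of-censoring weights.
    Cases:    subjects with an observed event by t, weighted by 1/G(time_i).
    Controls: subjects observed beyond t, weighted by 1/G(t)."""
    num_se = den_se = num_sp = den_sp = 0.0
    for m, ti, d in zip(marker, time, event):
        if d == 1 and ti <= t:          # a case by horizon t
            w = 1.0 / G(ti)
            den_se += w
            num_se += w * (m > c)       # marker above cutoff c
        elif ti > t:                    # still event-free at t
            w = 1.0 / G(t)
            den_sp += w
            num_sp += w * (m <= c)
    return num_se / den_se, num_sp / den_sp

# Without censoring (G constant at 1), the weights reduce to 1 and the
# usual empirical sensitivity/specificity are recovered.
se, sp = ipcw_sens_spec([5, 4, 3, 2, 1], [1, 2, 10, 10, 10],
                        [1, 1, 0, 0, 0], c=3.5, t=5, G=lambda s: 1.0)
```

Sweeping the cutoff c then traces the time-dependent ROC curve at horizon t.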
6.
7.
Human error analysis is certainly a challenge today for all involved in safety and environmental risk assessment. The risk assessment process should not ignore the role of humans in accidental events or the consequences that may derive from human error. This article presents a case study of the Success Likelihood Index Method (SLIM) applied to the Electric Power Company of Serbia (EPCS), with the aim of demonstrating the importance of human error analysis in risk assessment. A database of work-related injuries, accidents, and critical interventions that occurred over a 10-year period in the EPCS provided the basis for this study. The research comprised an analysis of 1074 workplaces, with a total of 3997 employees. A detailed analysis identified 10 typical human errors and their performance shaping factors (PSFs), and estimated the human error probability (HEP). Based on the results, one can conclude that PSF control remains crucial for human error reduction, and thus for the prevention of occupational injuries and fatalities (the number of injuries decreased from 58 in 2012 to 44 in 2013, with no fatalities recorded). Furthermore, the case study performed at the EPCS confirmed that SLIM is highly applicable for the quantification of human errors, comprehensive, and easy to perform.
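The core SLIM computation is simple enough to sketch: a Success Likelihood Index (SLI) is formed as a weighted average of PSF ratings, and log10(HEP) is assumed linear in the SLI, calibrated from tasks with known HEPs. The weights, ratings, and anchor values below are invented for illustration only:

```python
import math

def sli(weights, ratings):
    """Success Likelihood Index: weighted average of PSF ratings,
    with the expert weights normalized to sum to one."""
    total = sum(weights)
    return sum(w / total * r for w, r in zip(weights, ratings))

def calibrate(sli_a, hep_a, sli_b, hep_b):
    """Fit log10(HEP) = a*SLI + b from two anchor tasks whose HEPs
    are known (e.g. from historical data)."""
    a = (math.log10(hep_a) - math.log10(hep_b)) / (sli_a - sli_b)
    b = math.log10(hep_a) - a * sli_a
    return a, b

def hep(s, a, b):
    """Human error probability implied by an SLI value."""
    return 10 ** (a * s + b)

# Hypothetical anchors: a well-performed task (SLI 9, HEP 1e-4) and a
# poorly performed one (SLI 2, HEP 0.1).
a, b = calibrate(9.0, 1e-4, 2.0, 0.1)
# Hypothetical PSF weights (e.g. training, stress, procedures) and
# ratings on a 1-9 scale for the task under study.
task_sli = sli([0.4, 0.35, 0.25], [7, 4, 8])
task_hep = hep(task_sli, a, b)
```

A higher SLI (better performance conditions) maps to a lower HEP, which is the monotone relation the method relies on.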
8.
Ronny Kuhnert, Victor J. Del Rio Vilas, James Gallagher, Dankmar Böhning. Biometrical Journal (Biometrische Zeitschrift) 2008, 50(6): 993–1005
Estimation of a population size by means of capture-recapture techniques is an important problem in many areas of the life and social sciences. We consider the frequencies-of-frequencies situation, where a count variable summarizes how often a unit has been identified in the target population of interest. The distribution of this count variable is zero-truncated, since zero identifications do not occur in the sample. As an application we consider the surveillance of scrapie in Great Britain. In this case study, holdings with scrapie that are not identified (zero counts) do not enter the surveillance database. The count variable of interest is the number of scrapie cases per holding. A common model for count distributions is the Poisson distribution and, to adjust for potential heterogeneity, a discrete mixture of Poisson distributions is used. Mixtures of Poissons usually provide an excellent fit, as will be demonstrated in the application of interest. However, as has recently been demonstrated, mixtures also suffer from the so-called boundary problem, resulting in overestimation of the population size. It is suggested here to select the mixture model on the basis of the Bayesian Information Criterion. This strategy is further refined by employing a bagging procedure leading to a series of estimates of the population size. Using the median of this series, highly influential size estimates are avoided. Limited simulation studies show that the procedure leads to estimates with remarkably small bias. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
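The simplest version of the zero-truncated idea (a single Poisson component, not the mixture of the paper) can be sketched directly: the rate is estimated from the truncated mean, and every observed unit is scaled up by the inverse of its detection probability. All names and the toy data are illustrative:

```python
import math

def ztp_lambda(counts, iters=200):
    """Poisson rate MLE under zero truncation: the truncated mean
    satisfies mean = lam / (1 - exp(-lam)); solved by the fixed-point
    iteration lam <- mean * (1 - exp(-lam)) (requires mean > 1)."""
    mean = sum(counts) / len(counts)
    lam = mean
    for _ in range(iters):
        lam = mean * (1 - math.exp(-lam))
    return lam

def population_size(counts):
    """Horvitz-Thompson style total: each observed unit stands for
    1 / P(count > 0) = 1 / (1 - exp(-lam)) population units,
    so the never-identified zero-count units are imputed."""
    lam = ztp_lambda(counts)
    return len(counts) / (1 - math.exp(-lam))

# Toy surveillance data: 100 observed holdings, of which 60 reported
# one case, 30 reported two, and 10 reported three.
counts = [1] * 60 + [2] * 30 + [3] * 10
n_hat = population_size(counts)
```

The mixture model of the abstract replaces the single rate lam by several components, which is where the boundary problem, and hence the BIC selection and bagging refinements, enter.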
9.
On fitting Cox's proportional hazards models to survey data (cited by: 8; self-citations: 0, citations by others: 8)
10.
11.
Assessment of the misclassification error rate is of high practical relevance in many biomedical applications. As this is a complex problem, theoretical results on estimator performance are few; most findings originate from Monte Carlo simulations, which typically take place in the "normal setting": the covariables of two groups have a multivariate normal distribution, the groups differ in location but have the same covariance matrix, and the linear discriminant function (LDF) is used for prediction. We perform a new simulation to compare existing nonparametric estimators in a more complex situation. The underlying distribution is based on a logistic model with six binary as well as continuous covariables. To study estimator performance for varying true error rates, three prediction rules, including nonparametric classification trees and parametric logistic regression, and sample sizes ranging from 100 to 1,000 are considered. In contrast to most published papers, we turn our attention to estimator performance based on simple, even inappropriate prediction rules and relatively large training sets. For the most part, the results agree with the usual findings. The most striking behavior was seen when applying (simple) classification trees for prediction: since the apparent error rate Êrr.app is biased, linear combinations incorporating Êrr.app underestimate the true error rate even for large sample sizes. The .632+ estimator, which was designed to correct the overoptimism of Efron's .632 estimator for nonparametric prediction rules, performs best of all such linear combinations. The bootstrap estimator Êrr.B0 and the cross-validation estimator Êrr.cv, which do not depend on Êrr.app, seem to track the true error rate. Although the disadvantages of both estimators, the pessimism of Êrr.B0 and the high variability of Êrr.cv, shrink with increasing sample size, they remain visible.
We conclude that the asymptotic behavior of the apparent error rate is important for the choice of a particular estimator. For the assessment of estimator performance, the variance of the true error rate is crucial; in general, the stability of the prediction procedure is essential for the application of estimators based on resampling methods. (© 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
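The cross-validation estimator Êrr.cv discussed above, which does not depend on the apparent error, can be sketched in a few lines. The fold assignment and the toy threshold rule are illustrative simplifications:

```python
def kfold_error(xs, ys, fit, predict, k=5):
    """k-fold cross-validation estimate of the misclassification rate:
    each observation is held out exactly once and predicted by a rule
    trained on the remaining folds, so the apparent (resubstitution)
    error never enters the estimate."""
    n = len(xs)
    wrong = 0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        wrong += sum(predict(model, xs[i]) != ys[i] for i in test)
    return wrong / n

# Toy rule: classify by a threshold at the training-set mean.
rule_fit = lambda xs, ys: sum(xs) / len(xs)
rule_predict = lambda thr, x: int(x > thr)
xs = [0.0, 0.1, 0.2, 0.3, 0.4, 1.0, 1.1, 1.2, 1.3, 1.4]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
cv_err = kfold_error(xs, ys, rule_fit, rule_predict, k=5)
```

On this well-separated toy data every held-out point is classified correctly; the high variability of Êrr.cv noted in the abstract only shows up on noisier data.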
12.
Harrell's c-index, or concordance C, has been widely used as a measure of separation of two survival distributions. In the absence of censored data, the c-index estimates the Mann–Whitney parameter Pr(X > Y), which has been repeatedly utilized in various statistical contexts. In the presence of randomly censored data, the c-index no longer estimates Pr(X > Y), but rather a parameter that also involves the underlying censoring distributions. This is in contrast to Efron's maximum likelihood estimator of the Mann–Whitney parameter, which is recommended in the setting of random censorship.
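Harrell's c-index itself is straightforward to compute, which makes the abstract's point concrete: the pairs that are "usable" depend on the censoring, so the estimand changes under censoring. A minimal O(n²) sketch:

```python
def c_index(risk, time, event):
    """Harrell's concordance with right censoring: a pair (i, j) is
    usable only when the smaller observed time is an actual event
    (event[i] == 1); it is concordant when the earlier failure carries
    the higher risk score, and risk ties count 1/2."""
    conc = usable = 0.0
    n = len(risk)
    for i in range(n):
        for j in range(n):
            if event[i] == 1 and time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    conc += 1.0
                elif risk[i] == risk[j]:
                    conc += 0.5
    return conc / usable
```

Censoring silently removes pairs from the denominator, which is why the c-index drifts away from Pr(X > Y) under random censorship.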
13.
Online Prediction of the Running Time of Tasks (cited by: 7; self-citations: 0, citations by others: 7)
Peter A. Dinda. Cluster Computing 2002, 5(3): 225–236
We describe and evaluate the Running Time Advisor (RTA), a system that can predict the running time of a compute-bound task on a typical shared, unreserved commodity host. The prediction is computed from linear time series predictions of host load and takes the form of a confidence interval that neatly expresses the error associated with the measurement and prediction processes, error that must be captured to make statistically valid decisions based on the predictions. Adaptive applications make such decisions in pursuit of consistently high performance, choosing, for example, the host where a task is most likely to meet its deadline. We begin by describing the system and summarizing the results of our previously published work on host load prediction. We then describe our algorithm for computing predictions of running time from host load predictions. We next evaluate the system using over 100,000 randomized testcases run on 39 different hosts, finding that it is indeed capable of computing correct and useful confidence intervals. Finally, we report on our experience with using the RTA in application-oriented real-time scheduling in distributed systems.
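The key idea, turning a load forecast plus its error into a running-time confidence interval, can be illustrated with a deliberately crude stand-in. The RTA itself fits AR-class time-series models to host load; the mean/standard-deviation predictor and the 1/(1+L) CPU-share assumption below are simplifications for illustration, not the paper's algorithm:

```python
import statistics

def predict_load(history, window=8, z=1.96):
    """One-step host-load forecast: the mean of the most recent window,
    with a normal-theory confidence interval built from the window's
    spread (a stand-in for the AR predictors used by the RTA)."""
    recent = history[-window:]
    mu = statistics.fmean(recent)
    sd = statistics.stdev(recent)
    return mu, (mu - z * sd, mu + z * sd)

def runtime_interval(nominal_secs, load_ci):
    """Map a load interval to a running-time interval, assuming a
    compute-bound task on a host with load L gets roughly a 1/(1 + L)
    CPU share and so needs about (1 + L) times its nominal time."""
    lo, hi = load_ci
    return nominal_secs * (1 + lo), nominal_secs * (1 + hi)

mu, ci = predict_load([0.5] * 8)        # perfectly steady load
rt_lo, rt_hi = runtime_interval(10, ci) # 10 s of nominal CPU work
```

An adaptive scheduler would then pick the host whose running-time interval lies entirely before the task's deadline.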
14.
15.
16.
Axel Benner, Manuela Zucknick, Thomas Hielscher, Carina Ittrich, Ulrich Mansmann. Biometrical Journal (Biometrische Zeitschrift) 2010, 52(1): 50–69
The Cox proportional hazards regression model is the most popular approach to modeling covariate information for survival times. In this context, the development of high-dimensional models, where the number of covariates is much larger than the number of observations (p ≫ n), is an ongoing challenge. A practicable approach in such situations is ridge-penalized Cox regression. Besides finding the best prediction rule, one is often interested in determining a subset of covariates that are the most important for prognosis; this could be a gene set in the biostatistical analysis of microarray data. Covariate selection can then, for example, be done by L1-penalized Cox regression using the lasso (Tibshirani (1997). Statistics in Medicine 16, 385–395). Several approaches beyond the lasso that incorporate covariate selection have been developed in recent years, including modifications of the lasso as well as nonconvex variants such as the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li (2001). Journal of the American Statistical Association 96, 1348–1360; Fan and Li (2002). The Annals of Statistics 30, 74–99). The purpose of this article is to incorporate these methods practically into the model-building process when analyzing high-dimensional data with the Cox proportional hazards model. To evaluate penalized regression models beyond the lasso, we included SCAD variants and the adaptive lasso (Zou (2006). Journal of the American Statistical Association 101, 1418–1429). We compare them with "standard" approaches such as ridge regression, the lasso, and the elastic net. Predictive accuracy, features of variable selection, and estimation bias are studied to assess the practical use of these methods. We observed that the performance of SCAD and the adaptive lasso is highly dependent on nontrivial preselection procedures. A practical solution to this problem does not yet exist.
Since there is a high risk of missing relevant covariates when SCAD or the adaptive lasso is applied after an inappropriate initial selection step, we recommend staying with the lasso or the elastic net in actual data applications. However, given the promising results for truly sparse models, we see some advantage in SCAD and the adaptive lasso if better preselection procedures become available. This requires further methodological research.
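The bias difference between the lasso and SCAD that motivates this comparison is visible already in the univariate thresholding rules used inside coordinate-descent solvers (shown here for a standardized design, not the Cox-specific implementation of the article):

```python
def soft_threshold(z, lam):
    """Lasso coordinate update: shrink toward zero by lam. Note that
    even large coefficients remain biased downward by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def scad_threshold(z, lam, a=3.7):
    """SCAD update (Fan & Li, 2001): identical to the lasso near zero,
    a blended region for moderate effects, and no shrinkage at all for
    |z| > a*lam, which removes the lasso's bias on large effects."""
    az = abs(z)
    if az <= 2 * lam:
        return soft_threshold(z, lam)
    if az <= a * lam:
        sign = 1.0 if z > 0 else -1.0
        return ((a - 1) * z - sign * a * lam) / (a - 2)
    return z
```

With lam = 1, a large coefficient z = 5 is returned unchanged by SCAD but shrunk to 4 by the lasso, which is exactly the estimation-bias contrast the abstract studies.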
17.
Anne Chao, H.-Y. Pan, Shu-Chuan Chiang. Biometrical Journal (Biometrische Zeitschrift) 2008, 50(6): 957–970
The Petersen–Lincoln estimator has been used to estimate the size of a population in a single mark-release experiment. However, the estimator is not valid when the capture sample and recapture sample are not independent. We provide an intuitive interpretation of "independence" between samples based on the 2 × 2 categorical data formed by capture/non-capture in each of the two samples. From this interpretation, we review a general measure of "dependence" and quantify the correlation bias of the Petersen–Lincoln estimator when two types of dependence (local list dependence and heterogeneity of capture probability) exist. An important implication for the census undercount problem is that instead of using a post-enumeration sample to assess the undercount of a census, one should conduct a prior enumeration sample to avoid correlation bias. We extend the Petersen–Lincoln method to the case of two populations, propose a new estimator of the size of the shared population, and derive its variance. We discuss a special case where the correlation bias of the proposed estimator due to dependence between samples vanishes. The proposed method is applied to a study of the relapse rate of illicit drug use in Taiwan. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
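The baseline estimator whose correlation bias the paper quantifies is a one-liner. A sketch with Chapman's standard bias-corrected variant added for comparison (the numbers in the usage line are made up):

```python
def petersen_lincoln(n1, n2, m):
    """Classical two-sample estimate: n1 marked in the first sample,
    n2 captured in the second, m recaptured. Under independence of the
    samples, the recapture fraction m/n2 estimates the coverage n1/N,
    giving N-hat = n1 * n2 / m."""
    return n1 * n2 / m

def chapman(n1, n2, m):
    """Chapman's bias-corrected variant; also defined when m = 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Hypothetical experiment: 200 marked, 150 recaptured sample, 30 marked
# among them.
n_hat = petersen_lincoln(200, 150, 30)
```

When the samples are positively dependent, m is inflated relative to the independent case and N-hat is biased downward, which is the correlation bias analyzed in the abstract.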
18.
Objective: To predict the B-cell epitopes of Staphylococcus aureus enterotoxin A (SEA). Methods: Using genomic DNA of the S. aureus strain M3, isolated from raw milk in Hefei, as the template, the SEA gene was amplified by PCR, sequenced, and analyzed. The secondary structure, flexibility, hydrophilicity, surface probability, and antigenic index of the SEA protein were analyzed with the DNAStar Protean software to predict its B-cell epitopes. Results: The SEA gene of strain M3 is 774 bp long and encodes a 257-amino-acid SEA protein with a relative molecular mass of 29.67 kDa. The nucleotide and amino acid sequences of the M3 SEA gene are 98.7% and 98.4% homologous, respectively, to those of the reference strain. The dominant B-cell epitopes of the SEA protein are located at residues 64–68, 100–107, 138–141, 156–160, 166–173, 213–217, and 237–244 of the peptide chain. Conclusion: Seven dominant B-cell epitopes of the SEA protein were predicted, laying the foundation for cloning and expressing the epitope proteins and preparing monoclonal antibodies against SEA epitopes.
19.
Jari Oksanen. Plant Ecology 1988, 74(1): 29–32
When the first two eigenvalues of correspondence analysis are close to each other, their order can be reversed by random variation in the data, and the first axis can in fact point in any direction within the plane defined by the two axes. The configuration of the points in the plane may remain unchanged, yet their projections onto any particular line in the plane can be highly variable. Detrending preserves the ordering along the first axis, and the second axis is detrended with respect to the first; very different configurations therefore result when the orientation of the first axis in the plane changes. This can lead to a situation where the detrended solutions are very unstable under random variation and are therefore only incidentally interpretable.
20.
For a more general structural model, this paper presents an algorithm for the variance of the approximate distribution of a commonly used instrumental-variable estimator of the parameters, together with an example in which the unknown true value x follows an exponential distribution. The algorithm has practical value for exploring statistical regularities in the biological sciences.
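The instrumental-variable estimator in question (for the simple linear structural relation y = a + b*x with x measured with error) can be sketched alongside the naive least-squares slope it corrects; the data below are constructed, not from the paper's exponential example:

```python
def iv_slope(z, x, y):
    """Instrumental-variable slope for y = a + b*x when x is observed
    with error: cov(z, y) / cov(z, x), using an instrument z that is
    correlated with the true covariate but not with its measurement
    error."""
    n = len(z)
    zm, xm, ym = sum(z) / n, sum(x) / n, sum(y) / n
    szy = sum((zi - zm) * (yi - ym) for zi, yi in zip(z, y))
    szx = sum((zi - zm) * (xi - xm) for zi, xi in zip(z, x))
    return szy / szx

def ols_slope(x, y):
    """Naive least-squares slope cov(x, y) / var(x); attenuated toward
    zero when x carries measurement error."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    sxx = sum((xi - xm) ** 2 for xi in x)
    return sxy / sxx

z = [1.0, 2.0, 3.0, 4.0]       # instrument: here equal to the true x
y = [2.0, 4.0, 6.0, 8.0]       # generated with true slope b = 2
x_obs = [1.5, 1.5, 2.5, 4.5]   # true x plus error uncorrelated with z
```

On these data the IV slope recovers b = 2 exactly, while least squares on the error-prone x is attenuated below 2, which is the bias the instrumental-variable construction removes.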