Similar Articles
20 similar articles found (search time: 31 ms)
1.
Survival model predictive accuracy and ROC curves
Heagerty PJ, Zheng Y. Biometrics. 2005;61(1):92-105.
The predictive accuracy of a survival model can be summarized using extensions of the proportion of variation explained by the model, or R2, commonly used for continuous response models, or using extensions of sensitivity and specificity, which are commonly used for binary response models. In this article we propose new time-dependent accuracy summaries based on time-specific versions of sensitivity and specificity calculated over risk sets. We connect the accuracy summaries to a previously proposed global concordance measure, which is a variant of Kendall's tau. In addition, we show how standard Cox regression output can be used to obtain estimates of time-dependent sensitivity and specificity, and time-dependent receiver operating characteristic (ROC) curves. Semiparametric estimation methods appropriate for both proportional and nonproportional hazards data are introduced, evaluated in simulations, and illustrated using two familiar survival data sets.
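A minimal sketch of the cumulative/dynamic flavour of these time-dependent summaries may help fix ideas: at horizon t, subjects with events by t are treated as cases and later survivors as controls, and sensitivity and specificity are traced out over score thresholds. Censoring is ignored here for brevity (Heagerty and Zheng's risk-set estimators handle it through the Cox model output), and all names (event_time, risk_score, t) are illustrative:

    import numpy as np

    def cumulative_dynamic_roc(event_time, risk_score, t):
        """ROC at horizon t: cases have T <= t, controls have T > t.

        Censoring is ignored for clarity; risk-set or IPCW estimators
        are needed on real censored data.
        """
        cases = event_time <= t
        controls = ~cases
        thresholds = np.unique(risk_score)[::-1]
        tpr = [np.mean(risk_score[cases] >= c) for c in thresholds]    # sensitivity
        fpr = [np.mean(risk_score[controls] >= c) for c in thresholds] # 1 - specificity
        return np.array(fpr), np.array(tpr)

    # toy usage: higher score should mean higher risk (shorter survival)
    rng = np.random.default_rng(0)
    T = rng.exponential(10, size=200)
    score = -np.log(T) + rng.normal(0, 0.5, size=200)
    fpr, tpr = cumulative_dynamic_roc(T, score, t=5.0)
    auc_t = np.trapz(tpr, fpr)   # time-dependent AUC at t = 5

Integrating the curve with np.trapz then yields a time-dependent AUC at the chosen horizon.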

2.
Finding biomarkers and building risk scores to predict the occurrence of survival outcomes is a major concern of clinical epidemiology, and so is the evaluation of prognostic models. In this paper, we are concerned with the estimation of the time-dependent AUC (area under the receiver operating characteristic curve), which naturally extends the standard AUC to the setting of survival outcomes and makes it possible to evaluate the discriminative power of prognostic models. We establish a simple and useful relation between the predictiveness curve and the time-dependent AUC, denoted AUC(t). This relation confirms that the predictiveness curve is the key concept for evaluating the calibration and discrimination of prognostic models. It also highlights that accurate estimates of the conditional absolute risk function should yield accurate estimates of AUC(t). From this observation, we derive several estimators for AUC(t) relying on distinct estimators of the conditional absolute risk function. An empirical study was conducted to compare our estimators with existing ones and to assess the effect of model misspecification, when estimating the conditional absolute risk function, on the estimation of AUC(t). We further illustrate the methodology on the Mayo PBC and the VA lung cancer data sets.
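For orientation, the cumulative/dynamic time-dependent AUC referred to above is often written as follows (notation chosen here for illustration, not quoted from the paper):

    \mathrm{AUC}(t) = \Pr\bigl( M_i > M_j \mid T_i \le t,\; T_j > t \bigr)

where M is the marker, for instance the estimated conditional absolute risk F(t | X) = P(T <= t | X), and T is the survival time. Written this way, it is plausible that an accurate estimate of the conditional absolute risk function, plugged in as the marker, carries over to an accurate estimate of AUC(t), which is the observation the paper builds on.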

3.
Bayesian methods for estimating dose-response curves from linearized multi-stage models in quantal bioassay are studied. A Gibbs sampling approach with data augmentation is employed to compute the Bayes estimates. In addition, estimation of the “relative additional risk” and the “risk-specific dose” is studied. Model selection based on conditional predictive ordinates from cross-validated data is developed. Model adequacy is addressed by means of a posterior predictive tail-area test.

4.

Background

It is important to accurately determine the performance of peptide:MHC binding predictions, as this enables users to compare and choose between different prediction methods and provides estimates of the expected error rate. Two common approaches to determine prediction performance are cross-validation, in which all available data are iteratively split into training and testing data, and the use of blind sets generated separately from the data used to construct the predictive method. In the present study, we have compared cross-validated prediction performances generated on our last benchmark dataset from 2009 with prediction performances generated on data subsequently added to the Immune Epitope Database (IEDB), which served as a blind set.

Results

We found that cross-validated performances systematically overestimated performance on the blind set. This was found not to be due to the presence of similar peptides in the cross-validation dataset. Rather, we found that small size and low sequence/affinity diversity of either training or blind datasets were associated with large differences in cross-validated vs. blind prediction performances. We use these findings to derive quantitative rules of how large and diverse datasets need to be to provide generalizable performance estimates.
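The mechanics of the comparison can be sketched with scikit-learn on synthetic data; binary binder labels and logistic regression stand in for the actual peptide:MHC affinity predictors, and the overestimation reported above only emerges with real, less diverse data:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    # historic benchmark (stand-in for the 2009 dataset) ...
    X_old = rng.normal(size=(300, 20))
    y_old = (X_old[:, 0] + rng.normal(size=300) > 0).astype(int)
    # ... and later additions serving as a blind set
    X_blind = rng.normal(size=(150, 20))
    y_blind = (X_blind[:, 0] + rng.normal(size=150) > 0).astype(int)

    clf = LogisticRegression(max_iter=1000)
    cv_auc = cross_val_score(clf, X_old, y_old, cv=5, scoring="roc_auc").mean()
    blind_auc = roc_auc_score(
        y_blind, clf.fit(X_old, y_old).predict_proba(X_blind)[:, 1])
    print(f"cross-validated AUC {cv_auc:.3f} vs blind-set AUC {blind_auc:.3f}")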

Conclusion

It has long been known that cross-validated prediction performance estimates often overestimate performance on independently generated blind set data. Here we identify and quantify the specific factors contributing to this effect for MHC-I binding predictions. An increasing number of peptides for which MHC binding affinities are measured experimentally have been selected based on binding predictions; they are therefore less diverse than historic datasets that sampled the entire sequence and affinity space, which makes them more difficult benchmark data sets. This has to be taken into account when comparing performance metrics between different benchmarks, and when deriving error estimates for predictions based on benchmark performance.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-241) contains supplementary material, which is available to authorized users.

5.
Esophageal cancer ranks as the eighth most common cancer and the sixth most common cause of cancer death worldwide. MicroRNAs (miRNAs) are small noncoding RNAs that regulate a wide variety of cancer-related cellular processes. In the current study, a series of previously published gene expression microarray data sets from the Gene Expression Omnibus and The Cancer Genome Atlas were downloaded and divided into training, internal validation, and external validation sets. A least absolute shrinkage and selection operator (LASSO) Cox regression model with 10-fold cross-validation was used to select the miRNAs associated with the prognosis of esophageal squamous cell carcinoma (ESCC) and to construct a six-miRNA signature. The prediction accuracy of this signature was then assessed in the validation and test sets using Kaplan–Meier analysis, time-dependent receiver operating characteristic (ROC) curves, and the dynamic area under the ROC curve. The prediction accuracy of the miRNA signature was much better than that of tumor–node–metastasis (TNM) stage in all three sets. Stratified analysis also demonstrated that the predictive ability of this signature was independent of TNM stage. Finally, functional experiments, including apoptosis and colony formation assays, were performed to further reveal the regulatory role of these miRNAs in ESCC. Our study demonstrates the promising potential of this novel six-miRNA signature as an independent biomarker for survival prediction in ESCC patients.
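A sketch of the penalized Cox step using lifelines; the penalty settings, column names, and data below are placeholders rather than the study's configuration (the study tuned the penalty by 10-fold cross-validation rather than fixing it):

    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(2)
    n, p = 120, 30                                  # samples x candidate miRNAs
    X = pd.DataFrame(rng.normal(size=(n, p)),
                     columns=[f"miR_{i}" for i in range(p)])
    X["T"] = rng.exponential(20, size=n)            # survival time
    X["E"] = rng.integers(0, 2, size=n)             # event indicator

    # L1-penalized (lasso) Cox regression; coefficients shrunk exactly to
    # zero drop out of the signature
    cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)
    cph.fit(X, duration_col="T", event_col="E")
    selected = cph.params_[cph.params_.abs() > 1e-6].index  # surviving miRNAs
    risk_score = cph.predict_partial_hazard(X)              # higher = higher risk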

6.
Assessing influence in regression analysis with censored data.
Escobar LA, Meeker WQ. Biometrics. 1992;48(2):507-528.
In this paper we show how to evaluate the effect that perturbations to the model, data, or case weights have on maximum likelihood estimates from censored survival data. The ideas and methods also apply to other nonlinear estimation problems. We review the ideas behind using log-likelihood displacement and local influence methods. We describe new interpretations for some local influence statistics and show how these statistics extend and complement traditional case deletion influence statistics for linear least squares. These statistics identify individual and combinations of cases that have important influence on estimates of parameters and functions of these parameters. We illustrate the methods by reanalyzing the Stanford Heart Transplant data with a parametric regression model.
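The log-likelihood displacement idea is easy to illustrate on a toy parametric model; the sketch below uses a censored exponential model as a stand-in for the paper's regression models (all data and names are illustrative):

    import numpy as np

    def exp_loglik(rate, t, e):
        # log-likelihood of censored exponential data: events contribute
        # log f(t) = log(rate) - rate*t, censored cases contribute -rate*t
        return np.sum(e * np.log(rate) - rate * t)

    t = np.array([5., 8., 12., 2., 30., 7.])   # observed times
    e = np.array([1, 1, 0, 1, 0, 1])           # 1 = event, 0 = censored
    mle = e.sum() / t.sum()                    # closed-form MLE for the rate

    full = exp_loglik(mle, t, e)
    for i in range(len(t)):
        keep = np.arange(len(t)) != i
        mle_i = e[keep].sum() / t[keep].sum()  # re-estimate without case i
        # log-likelihood displacement: influence of case i on the fit
        ld_i = 2 * (full - exp_loglik(mle_i, t, e))
        print(f"case {i}: LD = {ld_i:.4f}")

Cases with large LD values are those whose deletion moves the MLE enough to noticeably lower the full-data log-likelihood.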

7.
Survival prediction from high-dimensional genomic data is dependent on a proper regularization method. With an increasing number of such methods proposed in the literature, comparative studies are called for, and some have been performed. However, there is currently no consensus on which prediction assessment criterion should be used for time-to-event data. Without firm knowledge about whether the choice of evaluation criterion may affect the conclusions made as to which regularization method performs best, these comparative studies may be of limited value. In this paper, four evaluation criteria are investigated: the log-rank test for two groups, the area under the time-dependent ROC curve (AUC), an R2-measure based on the Cox partial likelihood, and an R2-measure based on the Brier score. The criteria are compared according to how they rank six widely used regularization methods that are based on the Cox regression model, namely univariate selection, principal components regression (PCR), supervised PCR, partial least squares regression, ridge regression, and the lasso. Based on our application to three microarray gene expression data sets, we find that the results obtained from the widely used log-rank test deviate from the other three criteria studied. For future studies, where one might also want to include non-likelihood or non-model-based regularization methods, we argue in favor of the AUC and the R2-measure based on the Brier score, as these neither suffer from the arbitrary splitting into two groups nor depend on the Cox partial likelihood.
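As an illustration of one criterion argued for above, here is a sketch of a Brier-score-based R2 at a fixed horizon, with censoring ignored for brevity (real survival data requires inverse-probability-of-censoring weighting; all names and data are illustrative):

    import numpy as np

    def brier_r2(surv_prob_t, event_time, t):
        """R2 = 1 - BS(model) / BS(null) at horizon t (censoring ignored)."""
        y = (event_time > t).astype(float)        # 1 if event-free at t
        bs_model = np.mean((y - surv_prob_t) ** 2)
        bs_null = np.mean((y - y.mean()) ** 2)    # null model: marginal rate
        return 1.0 - bs_model / bs_null

    rng = np.random.default_rng(3)
    x = rng.normal(size=200)                      # a prognostic covariate
    T = rng.exponential(np.exp(x))                # true scale depends on x
    S_hat = np.exp(-5.0 / np.exp(x))              # model-based S(5 | x)
    print(brier_r2(S_hat, T, t=5.0))

Unlike the log-rank criterion, this needs no split of the cohort into two groups, and unlike the partial-likelihood R2 it applies to any method that outputs survival probabilities.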

8.
9.
Analysis of microarray data is associated with the methodological problems of high dimension and small sample size. Various methods have been used for variable selection in high-dimension, small-sample-size settings with a single survival endpoint. However, little effort has been directed toward addressing competing risks, where there is more than one type of failure. This study compared three typical variable selection techniques, Lasso, elastic net, and likelihood-based boosting, for high-dimensional time-to-event data with competing risks. The performance of these methods was evaluated in a simulation study and by analyzing a real dataset of bladder cancer patients, using time-dependent receiver operating characteristic (ROC) curves and bootstrap .632+ prediction error curves. The elastic net penalization method was shown to outperform Lasso and boosting. Based on the elastic net, 33 genes out of 1381 genes related to bladder cancer were selected. By fitting the Fine and Gray model, eight genes were highly significant (P < 0.001). Among them, expression of RTN4, SON, IGF1R, SNRPE, PTGR1, PLEK, and ETFDH was associated with a decrease in survival time, whereas SMARCAD1 expression was associated with an increase in survival time. This study indicates that the elastic net has a higher capacity than Lasso and boosting for the prediction of survival time in bladder cancer patients. Moreover, genes selected by all methods improved the predictive power of the model based only on clinical variables, indicating the value of the information contained in the microarray features.

10.
The development of high-throughput technology has generated a massive amount of high-dimensional data, much of it of discrete type. Robust and efficient learning algorithms such as LASSO [1] are required for feature selection and overfitting control. However, most feature selection algorithms are only applicable to continuous data. In this paper, we propose a novel method for sparse support vector machines (SVMs) with L_p (p < 1) regularization. Efficient algorithms (LpSVM) are developed for learning a classifier applicable to high-dimensional data sets with both discrete and continuous data types. The regularization parameters are estimated by maximizing the area under the ROC curve (AUC) on cross-validation data. Experimental results on protein sequence and SNP data attest to the accuracy, sparsity, and efficiency of the proposed algorithm. Biomarkers identified with our method are compared with those from other methods in the literature. The software package in Matlab is available upon request.

11.
Accessibility of high-throughput genotyping technology allows genome-wide association studies for common complex diseases. This paper addresses two challenges commonly facing such studies: (i) searching an enormous number of possible gene interactions and (ii) finding reproducible associations. These challenges have traditionally been addressed in statistics, while here we apply computational approaches: optimization and cross-validation. A complex risk factor is modeled as a subset of single nucleotide polymorphisms (SNPs) with specified alleles, and the optimization formulation asks for the subset with the maximum odds ratio. To measure and compare the ability of search methods to find reproducible risk factors, we propose to apply a cross-validation scheme usually used for prediction validation. We have applied and cross-validated known search methods, with proposed enhancements, on real case-control studies for several diseases (Crohn's disease, autoimmune disorder, tick-borne encephalitis, lung cancer, and rheumatoid arthritis). The proposed methods compare favorably with exhaustive search: they are faster, find statistically significant risk factors more frequently, and have a significantly higher leave-half-out cross-validation rate.
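A toy version of this formulation, with a greedy forward search standing in for the paper's optimization methods and synthetic genotypes (all names and the Haldane correction are illustrative choices):

    import numpy as np

    def odds_ratio(snp_subset, X, y):
        """OR for carrying the risk alleles at all SNPs in the subset."""
        carrier = X[:, snp_subset].all(axis=1)
        a = np.sum(carrier & (y == 1)) + 0.5   # +0.5: Haldane correction
        b = np.sum(carrier & (y == 0)) + 0.5
        c = np.sum(~carrier & (y == 1)) + 0.5
        d = np.sum(~carrier & (y == 0)) + 0.5
        return (a * d) / (b * c)

    def greedy_search(X, y, max_size=3):
        chosen = []
        for _ in range(max_size):
            best = max((j for j in range(X.shape[1]) if j not in chosen),
                       key=lambda j: odds_ratio(chosen + [j], X, y))
            chosen.append(best)
        return chosen

    rng = np.random.default_rng(4)
    X = rng.integers(0, 2, size=(400, 50)).astype(bool)   # risk-allele flags
    y = ((X[:, 3] & X[:, 7]) | (rng.random(400) < 0.3)).astype(int)

    # leave-half-out: search on one half, check reproducibility on the other
    perm = rng.permutation(400)
    train, test = perm[:200], perm[200:]
    factor = greedy_search(X[train], y[train])
    print(factor, odds_ratio(factor, X[test], y[test]))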

12.

Purpose

Recent high-throughput sequencing technology has identified numerous somatic mutations across the whole exome in a variety of cancers. In this study, we generate a predictive model employing the whole exome somatic mutational profile of ovarian high-grade serous carcinomas (Ov-HGSCs) obtained from The Cancer Genome Atlas data portal.

Methods

A total of 311 patients were included for modeling overall survival (OS) and 259 patients were included for modeling progression free survival (PFS) in an analysis of 509 genes. The model was validated with complete leave-one-out cross-validation involving re-selecting genes for each iteration of the cross-validation procedure. Cross-validated Kaplan-Meier curves were generated. Cross-validated time dependent receiver operating characteristic (ROC) curves were computed and the area under the curve (AUC) values were calculated from the ROC curves to estimate the predictive accuracy of the survival risk models.
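Schematically, the key point is that gene selection sits inside the leave-one-out loop. The sketch below shows the pattern with scikit-learn, where a binary high/low-risk label and univariate F-test selection stand in for the survival endpoint and the study's actual selection rule:

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import LeaveOneOut, cross_val_predict

    rng = np.random.default_rng(5)
    X = rng.normal(size=(100, 509))                  # 509 candidate genes
    y = (X[:, 0] - X[:, 1] + rng.normal(size=100) > 0).astype(int)

    # Placing selection inside the pipeline means the genes are re-selected
    # from scratch on each leave-one-out training set -- the "complete"
    # cross-validation that avoids selection bias.
    model = make_pipeline(SelectKBest(f_classif, k=20),
                          LogisticRegression(max_iter=1000))
    pred = cross_val_predict(model, X, y, cv=LeaveOneOut(),
                             method="predict_proba")[:, 1]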

Results

There was a significant difference in OS between the high-risk group (median, 28.1 months) and the low-risk group (median, 61.5 months) (permuted p-value <0.001). There was also a significant difference in PFS between the high-risk group (10.9 months) and the low-risk group (22.3 months) (permuted p-value <0.001). Cross-validated AUC values were 0.807 for OS and 0.747 for PFS based on a defined landmark time of t = 36 months. In comparisons between a predictive model containing only gene variables and a combined model containing both gene variables and clinical covariates, the predictive model containing gene variables without clinical covariates was effective, and high AUC values were observed for both OS and PFS.

Conclusions

We designed a predictive model using a somatic mutation profile obtained from high-throughput genomic sequencing data in Ov-HGSC samples that may represent a new strategy for applying high-throughput sequencing data to clinical practice.

13.
Würschum T, Kraft T. Heredity. 2014;112(4):463-468.
Association mapping has become a widely applied genomic approach to identify quantitative trait loci (QTL) and dissect the genetic architecture of complex traits. However, approaches to assess the quality of the obtained QTL results are lacking. We therefore evaluated the potential of cross-validation in association mapping based on a large sugar beet data set. Our results show that the proportions of the population that should be used as estimation and validation sets, respectively, depend on the size of the mapping population. Generally, fivefold cross-validation, that is, with 20% of the lines as an independent validation set, appears appropriate for commonly used population sizes. The predictive power for the proportion of genotypic variance explained by QTL was overestimated by 38% on average, indicating a strong bias in the estimated QTL effects. The cross-validated predictive power ranged between 4 and 50%, which is a more realistic estimate of this parameter for complex traits. In addition, QTL frequency distributions can be used to assess the precision of QTL position estimates and the robustness of the detected QTL. In summary, cross-validation can be a valuable tool to assess the quality of QTL parameters in association mapping.
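The estimation/validation split logic can be sketched as follows; marginal-correlation QTL detection and a linear model are simplifying stand-ins for an actual association-mapping scan, and all data are synthetic:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(6)
    n, m = 500, 200                       # lines x markers
    X = rng.integers(0, 2, size=(n, m)).astype(float)
    y = X[:, :5] @ rng.normal(1, 0.2, 5) + rng.normal(0, 2, n)  # 5 true QTL

    r2_cv = []
    for est, val in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        # "detect" QTL on the estimation set only (top marginal correlations
        # stand in for a genome-wide scan)
        corr = np.abs([np.corrcoef(X[est, j], y[est])[0, 1] for j in range(m)])
        qtl = np.argsort(corr)[-5:]
        fit = LinearRegression().fit(X[est][:, qtl], y[est])
        r2_cv.append(fit.score(X[val][:, qtl], y[val]))  # predictive power
    print(np.mean(r2_cv))

The gap between the in-sample R2 on the estimation set and the cross-validated value mirrors the bias in explained genotypic variance described above.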

14.
We investigate a new method to place patients into risk groups in censored survival data. Properties such as median survival time and end survival rate are implicitly improved by optimizing the area under the survival curve. Artificial neural networks (ANN) are trained to either maximize or minimize this area using a genetic algorithm, and are combined into an ensemble to predict one of low, intermediate, or high risk groups. Estimated patient risk can influence treatment choices and is important for study stratification. A common approach is to sort the patients according to a prognostic index and then group them along the quartile limits; the Cox proportional hazards model (Cox) is one example of this approach. Another method of risk grouping is recursive partitioning (Rpart), which constructs a decision tree where each branch point maximizes the statistical separation between the groups. ANN, Cox, and Rpart are compared on five publicly available data sets with varying properties. Cross-validation, as well as separate test sets, are used to validate the models. Results on the test sets show comparable performance, except for the smallest data set, where Rpart's predicted risk groups turn out to be inverted, an example of crossing survival curves. Cross-validation shows that all three models exhibit crossing of some survival curves on this small data set, but that the ANN model achieves the best separation of groups in terms of median survival time before such crossings. The conclusion is that optimizing the area under the survival curve is a viable approach to identify risk groups. Training ANNs to optimize this area combines two key strengths of prognostic indices and Rpart: first, a desired minimum group size can be specified, as for a prognostic index; second, non-linear effects among the covariates can be utilized, as Rpart is also able to do.

15.
Murray S, Tsiatis AA. Biometrics. 1999;55(4):1085-1092.
This research develops nonparametric strategies for sequentially monitoring clinical trial data where detecting years of life saved is of interest. The recommended test statistic looks at integrated differences in survival estimates during the time frame of interest. In many practical situations, the test statistic presented has an independent increments covariance structure. Hence, with little additional work, we may apply these testing procedures using available methodology. In the case where an independent increments covariance structure is present, we suggest how clinical trial data might be monitored using these statistics in an information-based design. The resulting study design maintains the desired stochastic operating characteristics regardless of the shapes of the survival curves being compared. This offers an advantage over the popular log-rank-based design strategy since more restrictive assumptions relating to the behavior of the hazards are required to guarantee the planned power of the test. Recommendations for how to sequentially monitor clinical trial progress in the nonindependent increments case are also provided along with an example.
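The recommended statistic is essentially the area between two Kaplan-Meier curves over the time frame of interest. A compact sketch on synthetic data, without the sequential-monitoring machinery (ties are handled sequentially for simplicity; all names are illustrative):

    import numpy as np

    def km(time, event, grid):
        """Kaplan-Meier survival estimate evaluated on a common time grid."""
        order = np.argsort(time)
        t, e = time[order], event[order]
        surv, s = [], 1.0
        for g in grid:
            while len(t) and t[0] <= g:
                n_at_risk = len(t)          # subjects with time >= t[0]
                if e[0]:
                    s *= 1 - 1 / n_at_risk  # step down at each event
                t, e = t[1:], e[1:]         # censored cases just leave the risk set
            surv.append(s)
        return np.array(surv)

    rng = np.random.default_rng(7)
    t1, t0 = rng.exponential(12, 150), rng.exponential(9, 150)  # treatment, control
    e1, e0 = rng.random(150) < .8, rng.random(150) < .8         # ~20% censored
    grid = np.linspace(0, 15, 200)                              # time frame of interest
    # integrated difference in survival curves, i.e. years of life saved up to t=15
    yls = np.trapz(km(t1, e1, grid) - km(t0, e0, grid), grid)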

16.
The fate of scientific hypotheses often relies on the ability of a computational model to explain the data, quantified in modern statistical approaches by the likelihood function. The log-likelihood is the key element for parameter estimation and model evaluation. However, the log-likelihood of complex models in fields such as computational biology and neuroscience is often intractable to compute analytically or numerically. In those cases, researchers can often only estimate the log-likelihood by comparing observed data with synthetic observations generated by model simulations. Standard techniques to approximate the likelihood via simulation either use summary statistics of the data or are at risk of producing substantial biases in the estimate. Here, we explore another method, inverse binomial sampling (IBS), which can estimate the log-likelihood of an entire data set efficiently and without bias. For each observation, IBS draws samples from the simulator model until one matches the observation. The log-likelihood estimate is then a function of the number of samples drawn. The variance of this estimator is uniformly bounded, achieves the minimum variance for an unbiased estimator, and we can compute calibrated estimates of the variance. We provide theoretical arguments in favor of IBS and an empirical assessment of the method for maximum-likelihood estimation with simulation-based models. As case studies, we take three model-fitting problems of increasing complexity from computational and cognitive neuroscience. In all problems, IBS generally produces lower error in the estimated parameters and maximum log-likelihood values than alternative sampling methods with the same average number of samples. Our results demonstrate the potential of IBS as a practical, robust, and easy to implement method for log-likelihood evaluation when exact techniques are not available.
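The core of IBS is compact enough to sketch directly: for each observation, draw from the simulator until the first match occurs at draw K, and use -sum_{k=1}^{K-1} 1/k as the unbiased estimate of the log-probability of that observation. The Bernoulli simulator below is a toy stand-in for a real model:

    import numpy as np

    def ibs_loglik(simulate, observations, rng):
        """Unbiased log-likelihood estimate via inverse binomial sampling."""
        total = 0.0
        for obs in observations:
            k = 1
            while simulate(rng) != obs:     # draw until the first match
                k += 1
            total += -np.sum(1.0 / np.arange(1, k))  # empty sum = 0 if k == 1
        return total

    # toy model: coin with p(heads) = 0.3; the exact log-likelihood is known
    rng = np.random.default_rng(8)
    obs = (rng.random(100) < 0.3).astype(int)
    sim = lambda r: int(r.random() < 0.3)
    est = ibs_loglik(sim, obs, rng)
    exact = np.sum(np.where(obs == 1, np.log(0.3), np.log(0.7)))
    print(est, exact)

Because the estimate depends only on the number of draws until a match, no summary statistics or kernel choices are involved, which is where the unbiasedness comes from.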

17.
Acute myeloid leukaemia (AML) is the most common type of adult acute leukaemia and has a poor prognosis. Thus, optimal risk stratification is of the greatest importance for a reasonable choice of treatment and for prognostic evaluation. For our study, a total of 1707 samples of AML patients from three public databases were divided into meta-training, meta-testing, and validation sets. The meta-training set was used to build the risk prediction model, and the other four data sets were employed for validation. Using the log-rank test and univariate Cox regression analysis as well as LASSO-Cox, AML patients were divided into high-risk and low-risk groups based on an AML risk score (AMLRS) constituted by 10 survival-related genes. In the meta-training, meta-testing, and validation sets, the patients in the low-risk group all had a significantly longer overall survival (OS) than those in the high-risk group (P < .001), and the area under the ROC curve (AUC) by time-dependent ROC was 0.5854-0.7905 for 1 year, 0.6652-0.8066 for 3 years, and 0.6622-0.8034 for 5 years. Multivariate Cox regression analysis indicated that AMLRS was an independent prognostic factor in all four data sets. A nomogram combining the AMLRS and two clinical parameters performed well in predicting 1-year, 3-year, and 5-year OS. Finally, we created a web-based prognostic model to predict the prognosis of AML patients (https://tcgi.shinyapps.io/amlrs_nomogram/).

18.
A popular commercially available oligonucleotide microarray technology employs sets of 25-base-pair oligonucleotide probes to measure gene expression levels. A mathematical algorithm is required to compute an estimate of gene expression from the multiple probes. Previously proposed methods for summarizing gene expression data have either been substantially ad hoc or have relied on model assumptions that may be easily violated. Here we present a new algorithm for calculating gene expression from probe sets. Our approach is functionally related to leave-one-out cross-validation, a non-parametric statistical technique that is often applied in limited-data situations. We illustrate this approach using data from our study seeking a molecular fingerprint of STAT3-regulated genes for early detection of human cancer.

19.

Background

Modern experimental techniques deliver data sets containing profiles of tens of thousands of potential molecular and genetic markers that can be used to improve medical diagnostics. Previous studies performed with three different experimental methods on the same set of neuroblastoma patients create an opportunity to examine whether augmenting gene expression profiles with information on copy number variation can lead to improved predictions of patient survival. We propose a methodology based on a comprehensive cross-validation protocol that includes feature selection within the cross-validation loop and classification using machine learning. We also test the dependence of the results on the feature selection process using four different feature selection methods.

Results

The models utilising features selected based on information entropy are slightly, but significantly, better than those using features obtained with the t-test. A synergy between data on genetic variation and gene expression is possible, but not confirmed. A slight, but statistically significant, increase in the predictive power of machine learning models was observed for models built on combined data sets. This was found both when using the out-of-bag estimate and in cross-validation performed on a single set of variables. However, the improvement was smaller and non-significant when models were built within the full cross-validation procedure that included feature selection within the cross-validation loop. Good correlation between the performance of the models in internal and external cross-validation was observed, confirming the robustness of the proposed protocol and results.

Conclusions

We have developed a protocol for building predictive machine learning models. The protocol can provide robust estimates of model performance on unseen data, and it is particularly well-suited for small data sets. We have applied this protocol to develop prognostic models for neuroblastoma, using data on copy number variation and gene expression. We have shown that combining these two sources of information may increase the quality of the models. Nevertheless, the increase is small, and larger samples are required to reduce the noise and bias arising from overfitting.

Reviewers

This article was reviewed by Lan Hu, Tim Beissbarth and Dimitar Vassilev.

20.
In medical statistics, many alternative strategies are available for building a prediction model based on training data. Prediction models are routinely compared by means of their prediction performance on independent validation data. If only one data set is available for training and validation, rival strategies can still be compared based on repeated bootstraps of the same data. Often, however, the overall performance of rival strategies is similar, and it is thus difficult to decide on one model. Here, we investigate the variability of the prediction models that results when the same modelling strategy is applied to different training sets. For each modelling strategy we estimate a confidence score based on the same repeated bootstraps. A new decomposition of the expected Brier score is obtained, as well as estimates of population-average confidence scores. The latter can be used to distinguish rival prediction models with similar prediction performances. Furthermore, at the subject level, a confidence score may provide useful supplementary information for new patients who want to base a medical decision on predicted risk. The ideas are illustrated and discussed using data from cancer studies, including settings with a high-dimensional predictor space.
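The bootstrap-based confidence idea can be sketched as follows: refit the same strategy on bootstrap training sets and measure, per subject, how much the predicted risks vary. The variance-based score and its scaling below are illustrative choices, not the paper's exact decomposition of the Brier score:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.utils import resample

    rng = np.random.default_rng(9)
    X = rng.normal(size=(200, 10))
    y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)
    X_new = rng.normal(size=(5, 10))                 # new patients

    preds = []
    for b in range(200):                             # repeated bootstraps
        Xb, yb = resample(X, y, random_state=b)      # bootstrap training set
        model = LogisticRegression(max_iter=1000).fit(Xb, yb)
        preds.append(model.predict_proba(X_new)[:, 1])
    preds = np.array(preds)

    # per-subject confidence: low spread across refits = more stable prediction
    mean_risk = preds.mean(axis=0)
    confidence = 1 - 2 * preds.std(axis=0)           # illustrative scaling

Two strategies with similar average Brier scores can then be told apart by which one yields the more stable per-subject predictions across refits.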
