Similar literature
20 similar documents found (search time: 15 ms)
1.
Tree-based methods are popular nonparametric tools in studying time-to-event outcomes. In this article, we introduce a novel framework for survival trees and ensembles, where the trees partition the dynamic survivor population and can handle time-dependent covariates. Using the idea of randomized tests, we develop generalized time-dependent receiver operating characteristic (ROC) curves for evaluating the performance of survival trees. The tree-building algorithm is guided by decision-theoretic criteria based on ROC, specifically targeting prediction accuracy. To address the instability issue of a single tree, we propose a novel ensemble procedure based on averaging martingale estimating equations, which is different from existing methods that average the predicted survival or cumulative hazard functions from individual trees. Extensive simulation studies are conducted to examine the performance of the proposed methods. We apply the methods to a study on AIDS for illustration.

2.
3.
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble‐based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30‐day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in‐sample and out‐of‐sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short‐term mortality in population‐based samples of subjects with cardiovascular disease.

4.
Wet tropical forests are among the most diverse ecosystems on Earth and can host several hundreds of tree species per hectare. To maintain such diversity, the community must contain large numbers of relatively rare species rather than be dominated by a few very common trees, as is often the case in temperate forests. Explaining the mechanisms preventing dominance by common species has been a major task of tropical forest ecology. One of the most promising mechanisms is negative density dependence (NDD) of tree abundance driven by pests, including fungal diseases (‘pest pressure’). NDD entails that the chance of survival of a sapling increases with the distance from a mature tree of the same species, thus preventing species from becoming locally dominant. Curiously, the strength of NDD is negatively correlated with abundance, meaning that tree species that are more common generally show weaker NDD (Comita et al. 2010). Interactions between plants and soil pathogens have been shown to play an important role in NDD (Klironomos 2002), and rare species are apparently more strongly affected (Mangan et al. 2010). However, the genetic mechanisms underlying this phenomenon have remained obscure. In this issue of Molecular Ecology, Marden et al. (2017) suggest that reduced diversity of the genes involved in pathogen recognition (Resistance genes or R genes) could explain why NDD is stronger in locally rare species.

5.
Despite potential interactive effects of plant species and genotypic diversity (SD and GD, respectively) on consumers, studies have usually examined these effects separately. We evaluated the individual and combined effects of tree SD and mahogany (Swietenia macrophylla) GD on the arthropod community associated with mahogany. We conducted this study within the context of a tree diversity experiment consisting of 74 plots with 64 saplings/plot. We sampled 24 of these plots, classified as monocultures of mahogany or polycultures of four species (including mahogany). Within each plot type, mahogany was represented by either one or four maternal families. We surveyed arthropods on mahogany and estimated total arthropod abundance and species richness, as well as abundance and richness separately for herbivorous and predatory arthropods. Overall tree SD and mahogany GD had positive effects on total arthropod species richness and abundance on mahogany, and also exerted interactive effects on total species richness (but not abundance). Analyses conducted by trophic level group showed contrasting patterns; SD positively influenced herbivore species richness but not abundance, and did not affect either predator richness or abundance. GD influenced predator species richness but not abundance, and did not influence herbivore abundance or richness. There were interactive effects of GD and SD only for predator species richness. These results provide evidence that intra‐ and inter‐specific plant diversity exert interactive controls on associated consumer communities, and that the relative importance of SD and GD may vary among higher trophic levels, presumably due to differences in the underlying mechanisms or consumer traits.

6.
MOTIVATION: Classification and regression trees have long been used for cancer diagnosis and prognosis. Nevertheless, instability and variable selection bias, as well as overfitting, are well-known problems of tree-based methods. In this article, we investigate whether ensemble tree classifiers can ameliorate these difficulties, using data from two recent studies of radical prostatectomy in prostate cancer. RESULTS: Using time to progression following prostatectomy as the relevant clinical endpoint, we found that ensemble tree classifiers robustly and reproducibly identified three subgroups of patients in the two clinical datasets: non-progressors, early progressors and late progressors. Moreover, the consensus classifications were independent predictors of time to progression compared to known clinical prognostic factors.

7.
8.

Background

microRNAs (miRNAs) are short regulatory RNAs that are involved in several diseases, including cancers. Identifying miRNA functions is very important in understanding disease mechanisms and determining the efficacy of drugs. An increasing number of computational methods have been developed to explore miRNA functions by inferring the miRNA-mRNA regulatory relationships from data. Each of the methods is developed based on some assumptions and constraints, for instance, assuming linear relationships between variables. For such reasons, computational methods are often subject to the problem of inconsistent performance across different datasets. On the other hand, ensemble methods integrate the results from individual methods and have been proved to outperform each of their individual component methods in theory.

Results

In this paper, we investigate the performance of some ensemble methods over the commonly used miRNA target prediction methods. We apply eight different popular miRNA target prediction methods to three cancer datasets, and compare their performance with the ensemble methods which integrate the results from each combination of the individual methods. The validation results using experimentally confirmed databases show that the results of the ensemble methods complement those obtained by the individual methods and that the ensemble methods perform better than the individual methods across different datasets. The ensemble method Pearson+IDA+Lasso, which combines methods from different approaches (a correlation method, a causal inference method, and a regression method), is the best-performing ensemble method in this study. Further analysis of the results of this ensemble method shows that it can obtain more targets that could not be found by any of the single methods, and that the discovered targets are more statistically significant and functionally enriched. The source codes, datasets, miRNA target predictions by all methods, and the ground truth for validation are available in the Supplementary materials.
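
The abstract does not say how the individual predictions are integrated. As a minimal sketch, assuming each method returns a best-first ranking of the same candidate targets and that rankings are merged with a simple Borda count (an assumption, not necessarily the authors' rule), the idea can be illustrated as follows. The function name and the toy gene labels are hypothetical.

```python
def borda_ensemble(rankings):
    """Merge several best-first rankings of the same items by Borda count.

    Each item earns (n - position) points from a ranking of length n,
    so the top item earns n points and the last item earns 1 point.
    Ties are broken alphabetically to keep the result deterministic.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] = scores.get(item, 0) + (n - position)
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical rankings of three candidate targets from three methods.
pearson_rank = ["geneA", "geneB", "geneC"]
ida_rank = ["geneB", "geneA", "geneC"]
lasso_rank = ["geneB", "geneC", "geneA"]

consensus = borda_ensemble([pearson_rank, ida_rank, lasso_rank])
```

A rank-based merge of this kind lets methods with incomparable score scales (correlations, causal effects, regression coefficients) vote on an equal footing.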

9.
Compelling evidence has implicated the Wnt signaling pathway in the pathogenesis of colorectal cancer. We assessed the use of tag single nucleotide polymorphisms (tSNPs) in adenomatous polyposis coli (APC)/β-catenin (CTNNB1) genes to predict outcomes in patients with colorectal cancer. We selected and genotyped 10 tSNPs to predict common variants across the entire APC and CTNNB1 genes in 282 colorectal cancer patients. The associations of these tSNPs with distant metastasis-free survival and overall survival were evaluated by Kaplan-Meier analysis, Cox regression model, and survival tree analysis. The 5-year overall survival rate was 68.3%. Survival tree analysis identified a higher-order genetic interaction profile consisting of the APC rs565453, CTNNB1 rs2293303, and APC rs1816769 that was significantly associated with overall survival. The 5-year overall survival rates were 89.2%, 66.1%, and 58.8% for the low-, medium-, and high-risk genetic profiles, respectively (log-rank P = 0.001). After adjusting for possible confounders, including age, gender, carcinoembryonic antigen levels, tumor differentiation, stage, lymphovascular invasion, perineural invasion, and lymph node involvement, the genetic interaction profile remained significant. None of the studied SNPs was individually associated with distant metastasis-free survival or overall survival. Our results suggest that the genetic interaction profile among Wnt pathway SNPs might increase the prognostic value in outcome prediction for colorectal cancer.
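
For readers unfamiliar with the Kaplan-Meier analysis mentioned above, the following is a minimal sketch of the standard product-limit estimator, written from the textbook definition rather than from the study's own code. The function name and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times:  follow-up time for each patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set.
        n_at_risk -= leaving
    return curve


# Hypothetical toy data: four patients, the second one censored at t = 2.
km_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Censored patients contribute to the risk set up to their last follow-up time but never trigger a drop in the curve, which is why the estimate at t = 3 uses two patients at risk rather than three.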

10.
Several studies have demonstrated an association between high tumor tissue levels of total tissue inhibitor of metalloproteinases-1 (TIMP-1) and a poor prognosis of primary breast cancer patients. In the present study we investigated whether measurements of the uncomplexed fraction of TIMP-1 added prognostic information to that already obtained from total TIMP-1. We measured the uncomplexed fraction of TIMP-1, using a thoroughly validated ELISA specific for this fraction, in 341 tumor tissue extracts obtained from patients with primary breast cancer. These measurements were related to previously performed measurements of total TIMP-1 as well as to patient outcome. The observation time was 8.3 years (range, 7.3-11.3 years). During this period 136 patients died, and 153 patients experienced recurrence of disease. Cox regression analysis of recurrence-free survival (RFS) suggested that a score based on both uncomplexed and total TIMP-1, reflecting the tumor level of TIMP-1/MMP complexes, would be a more precise estimate of prognosis than total TIMP-1 alone. Univariate survival analysis showed a highly significant relationship between high values of the score and poor outcomes for RFS (p = 0.0002; hazard ratio = 2.7; 95% confidence interval, 1.5-4.8). Similar results were found for overall survival (p = 0.0001; hazard ratio = 3.3; 95% confidence interval, 1.8-6.3). Multivariate analysis of RFS and overall survival demonstrated that the score remained significant when the classical prognostic factors used in breast cancer were included (p < 0.0001). The present study raises the hypothesis that it is the tumor level of TIMP-1/MMP complexes (i.e. activated matrix metalloproteinases) rather than TIMP-1 itself that determines prognosis, supporting the use of the combined score and not only total TIMP-1 in stratification of breast cancer patients.

11.
Large-scale association studies are being undertaken with the hope of uncovering the genetic determinants of complex disease. We describe a computationally efficient method for inferring genealogies from population genotype data and show how these genealogies can be used to fine map disease loci and interpret association signals. These genealogies take the form of the ancestral recombination graph (ARG). The ARG defines a genealogical tree for each locus, and, as one moves along the chromosome, the topologies of consecutive trees shift according to the impact of historical recombination events. There are two stages to our analysis. First, we infer plausible ARGs, using a heuristic algorithm, which can handle unphased and missing data and is fast enough to be applied to large-scale studies. Second, we test the genealogical tree at each locus for a clustering of the disease cases beneath a branch, suggesting that a causative mutation occurred on that branch. Since the true ARG is unknown, we average this analysis over an ensemble of inferred ARGs. We have characterized the performance of our method across a wide range of simulated disease models. Compared with simpler tests, our method gives increased accuracy in positioning untyped causative loci and can also be used to estimate the frequencies of untyped causative alleles. We have applied our method to Ueda et al.'s association study of CTLA4 and Graves disease, showing how it can be used to dissect the association signal, giving potentially interesting results of allelic heterogeneity and interaction. Similar approaches analyzing an ensemble of ARGs inferred using our method may be applicable to many other problems of inference from population genotype data.

12.
13.
A kinetic approach to the prediction of RNA secondary structures (total citations: 3; self-citations: 0; citations by others: 3)
A new approach to the prediction of RNA secondary structures, based on analysis of the kinetics of molecular self-organisation, is proposed herein. A Markov process is used to describe structural rearrangements during secondary structure formation. This process is modelled by a Monte-Carlo method. Examples are given of calculating the kinetic ensemble of secondary structures by this method. The distribution of time-dependent probabilities within the ensembles is obtained. An effective method of searching for the equilibrium ensemble is also suggested. This method is based on the construction of a tree of all possible secondary structures of RNA. By ascribing a probability to each structure (according to its free energy), the Boltzmann equilibrium ensemble can be obtained.
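
The final step, ascribing a probability to each structure according to its free energy, follows the standard Boltzmann relation p_i = exp(-G_i / RT) / Z. A minimal sketch is shown here, assuming free energies in kcal/mol and a temperature of 310.15 K. The function name, the temperature default, and the example energies are illustrative assumptions, not values from the paper.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_probabilities(free_energies, temperature=310.15):
    """Convert structure free energies (kcal/mol) to equilibrium probabilities."""
    rt = R_KCAL * temperature
    # Shift by the minimum energy so the exponentials cannot overflow;
    # the shift cancels out after normalisation.
    g_min = min(free_energies)
    weights = [math.exp(-(g - g_min) / rt) for g in free_energies]
    partition = sum(weights)
    return [w / partition for w in weights]


# Hypothetical free energies for three candidate secondary structures.
energies = [-5.0, -4.0, 0.0]
probs = boltzmann_probabilities(energies)
```

Lower free energy always maps to higher probability, and the probabilities sum to one over whatever set of structures is enumerated.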

14.
15.
Many important biological processes (e.g. cellular differentiation during development, aging, disease etiology etc.) are very unlikely to be controlled by a single gene, but rather by the underlying complex regulatory interactions between thousands of genes within …

16.
Genomics, 2019, 111(5): 1115-1123
Gene-environment (G-E) interactions have important implications for the etiology and progression of many complex diseases. Compared to continuous markers and categorical disease status, prognosis has been less investigated, with the additional challenges brought by the unique characteristics of survival outcomes. Most of the existing G-E interaction approaches for prognosis data share the limitation that they cannot accommodate long-tailed or contaminated outcomes. In this study, for prognosis data, we develop a robust G-E interaction identification approach using the censored quantile partial correlation (CQPCorr) technique. The proposed approach is built on the quantile regression technique (and hence has a solid statistical basis), uses weights to easily accommodate censoring, and adopts partial correlation to identify important interactions while properly controlling for the main genetic and environmental effects. In simulation, it outperforms multiple competitors with more accurate identification. In the analysis of TCGA data on lung cancer and melanoma, biologically sensible findings different from using the alternatives are made.

17.
18.
The adaptive potential of the northernmost Pinus sylvestris L. (and other northern tree) populations is considered by examining first the current patterns of quantitative genetic adaptive traits, which show high population differentiation and clines. We then consider the postglacial history of the populations using both paleobiological and genetic data. The current patterns of diversity at nuclear genes suggest that the traces of admixture are mostly visible in mitochondrial DNA variation patterns. There is little evidence of increased diversity due to admixture between an eastern and western colonization lineage, but no signal of reduced diversity (due to sequential bottlenecks) either. Quantitative trait variation in the north is not associated with the colonizing lineages. The current clines arose rapidly and may be based on standing genetic variation. The initial phenotypic response of Scots pine in the north is predicted to be increased survival and growth. The genetic responses are examined based on quantitative genetic predictions of sustained selection response and compared with earlier simulation results that have aimed at more ecological realism. The phenotypic responses of increased growth and survival reduce the opportunity for selection and delay the evolutionary responses. The lengthening of the thermal growing period also causes selection on the critical photoperiod in the different populations. Future studies should aim at including multiple ecological and genetic factors in evaluating potential responses.

19.
MOTIVATION: In cancer research, prediction of time to death or relapse is important for a meaningful tumor classification and selecting appropriate therapies. Survival prognosis is typically based on clinical and histological parameters. There is increasing interest in identifying genetic markers that better capture the status of a tumor in order to improve on existing predictions. The accumulation of genetic alterations during tumor progression can be used for the assessment of the genetic status of the tumor. For modeling dependences between the genetic events, evolutionary tree models have been applied. RESULTS: Mixture models of oncogenetic trees provide a probabilistic framework for the estimation of typical pathogenetic routes. From these models we derive a genetic progression score (GPS) that estimates the genetic status of a tumor. GPS is calculated for glioblastoma patients from loss of heterozygosity measurements and for prostate cancer patients from comparative genomic hybridization measurements. Cox proportional hazard models are then fitted to observed survival times of glioblastoma patients and to times until PSA relapse following radical prostatectomy of prostate cancer patients. It turns out that the genetically defined GPS is predictive even after adjustment for classical clinical markers and thus can be considered a medically relevant prognostic factor. AVAILABILITY: Mtreemix, a software package for estimating tree mixture models, is freely available for non-commercial users at http://mtreemix.bioinf.mpi-sb.mpg.de. The raw cancer datasets and R code for the analysis with Cox models are available upon request from the corresponding author.

20.
Edenhamn P, Höggren M, Carlson A. Hereditas, 2000, 133(2): 115-122
Genetic diversity is expected to decrease in small and isolated populations as a consequence of founder effects, bottlenecks, inbreeding and genetic drift. In this study we analyse temporal and spatial effects on genetic variation and progeny viability of the European tree frog (Hyla arborea) at two scales. First, the Swedish distribution has been isolated from the continental distribution for more than 8000 years, and secondly, within Sweden, recent habitat alterations that have taken place during this century have increased isolation between local populations. Genetic variation and progeny survival in relation to isolation were studied within the entire Swedish distribution of the tree frog. Allozyme electrophoresis analysis of froglets, sampled across the Swedish distribution, revealed a low overall genetic variation (1.06 alleles/locus) at the protein level in comparison with continental populations (1.54-1.68 alleles/locus). However, egg hatchability (97%) and early larval survival (95%) were not lower than in other parts of the tree frog distribution or in other anuran species. Within the Swedish distribution, early larval survival was lower in isolated breeding ponds than in more central ones. However, no differences in genetic variation were found in relation to isolation. Polymorphism was detected only at a single locus, and was restricted geographically to the eastern part of the Swedish distribution. Bottlenecks due to climatic changes and fragmentation of suitable habitat (primarily natural pastures with ponds) are suggested as possible causes of the low genetic diversity of the Swedish tree frog population.
