1.
David M. Barnard Matthew J. Germino David S. Pilliod Robert S. Arkle Cara Applestein Bill E. Davidson Matthew R. Fisk 《Restoration Ecology》2019,27(5):1053-1063
Improving predictions of restoration outcomes is increasingly important to resource managers for accountability and adaptive management, yet there is limited guidance for selecting a predictive model from the multitude available. The goal of this article was to identify an optimal predictive framework for restoration ecology using 11 modeling frameworks (including machine learning, inferential, and ensemble approaches) and three data groups (field data, geographic data [GIS], and a combination thereof). We test this approach with a dataset from a large postfire sagebrush reestablishment project in the Great Basin, U.S.A. Predictive power varied among models and data groups, ranging from 58% to 79% accuracy. Finer‐scale field data generally had the greatest predictive power, although GIS data were present in the best models overall. An ensemble prediction computed from the 10 models parameterized to field data was well above average for accuracy but was outperformed by others that prioritized model parsimony by selecting predictor variables based on rankings of their importance among all candidate models. The variation in predictive power among a suite of modeling frameworks underscores the importance of a model comparison and refinement approach that evaluates multiple models and data groups, and selects variables based on their contribution to predictive power. The enhanced understanding of factors influencing restoration outcomes accomplished by this framework has the potential to aid the adaptive management process for improving future restoration outcomes.
2.
3.
4.
Söllner J 《Journal of molecular recognition : JMR》2006,19(3):209-214
Recently, new machine learning classifiers for the prediction of linear B-cell epitopes were presented. Here we show the application of Receiver Operating Characteristic (ROC) convex hulls to select optimal classifiers, as well as possibilities to improve the post-test probability (PTP) to meet real-world requirements such as high-throughput epitope screening of whole proteomes. The major finding is that ROC convex hulls present an easy-to-use way to rank classifiers based on their prediction conservativity as well as to select candidates for ensemble classifiers when validating against the antigenicity profile of 10 HIV-1 proteins. We also show that linear models are at least as effective at modeling the available data as multi-layer feed-forward neural networks.
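The ROC convex hull idea mentioned in this abstract can be illustrated with a small Python sketch. It is not the paper's implementation: the helper name `roc_convex_hull` and the operating points in the usage note are assumptions made for illustration. Each classifier is reduced to one (false positive rate, true positive rate) point, and only classifiers on the upper hull can be optimal for some trade-off between the two error types.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def _cross(o: Point, a: Point, b: Point) -> float:
    """Cross product of vectors o->a and o->b (negative for a clockwise turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points: Iterable[Point]) -> List[Point]:
    """Return the upper convex hull of ROC points, running from (0, 0) to (1, 1).

    The trivial classifiers (0, 0) and (1, 1) are always added, so any point
    below the diagonal is automatically excluded from the hull.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Point] = []
    for p in pts:
        # Drop the previous point while it lies on or below the chord to p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


if __name__ == "__main__":
    # Hypothetical operating points for four classifiers (illustrative only).
    candidates = [(0.1, 0.6), (0.3, 0.7), (0.4, 0.5), (0.5, 0.9)]
    print(roc_convex_hull(candidates))
```

In this sketch, classifiers whose points fall off the hull are dominated by a combination of hull classifiers; points nearer the left end of the hull correspond to more conservative predictors (fewer false positives).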
5.
6.
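MicroRNAs (miRNAs) are a class of small, non-protein-coding regulatory RNAs that perform broad and important regulatory functions in eukaryotes. Because miRNA expression is spatially and temporally specific, predicting miRNAs computationally and then validating them in targeted experiments is an important route to miRNA discovery. Reducing the false positive rate is a major challenge for miRNA prediction methods. This study used an ensemble learning approach to build SVMbagging, a classifier for predicting miRNA precursors. Results on the training set, the test set, and an independent test set show that the method is robust, has a low false positive rate, and generalizes well; in particular, at a threshold of 0.9 the specificity reaches 99.90% with a sensitivity above 26%, which makes the method suitable for genome-wide prediction. Applying SVMbagging to the whole human genome at a threshold of 0.9 yielded 14,933 candidate miRNA precursors. Comparison with high-throughput small RNA sequencing data showed that 4,481 of these precursors had perfectly matching small RNA sequences, very close to the theoretically estimated number of true positives. Finally, 32 candidate miRNAs were tested experimentally, and 2 of them were confirmed as genuine miRNAs.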
7.
The effectiveness of comparative modeling approaches for protein structure prediction can be substantially improved by incorporating predicted structural information in the initial sequence-structure alignment. Motivated by the approaches used to align protein structures, this article focuses on developing machine learning approaches for estimating the RMSD value of a pair of protein fragments. These estimated fragment-level RMSD values can be used to construct the alignment, assess the quality of an alignment, and identify high-quality alignment segments. We present algorithms to solve this fragment-level RMSD prediction problem using a supervised learning framework based on support vector regression and classification that incorporates protein profiles, predicted secondary structure, effective information encoding schemes, and novel second-order pairwise exponential kernel functions. Our comprehensive empirical study shows superior results compared with the profile-to-profile scoring schemes. We also show that for protein pairs with low sequence similarity (less than 12% sequence identity) these new local structural features, alone or in conjunction with profile-based information, lead to alignments that are considerably more accurate than those obtained by schemes that use only profile and/or predicted secondary structure information.
8.
In the 'omic' era, hundreds of genomes are available for protein sequence analysis, and some 30 per cent of all sequences are of membrane proteins. Unlike globular proteins, a 3D model for membrane proteins can hardly be computed starting from the sequence. Why is this so? What can we really compute and with what reliability? These and other matters are outlined.
9.
Megan L. Smith Megan Ruffley Anahí Espíndola David C. Tank Jack Sullivan Bryan C. Carstens 《Molecular ecology》2017,26(17):4562-4573
Phylogeographic data sets have grown from tens to thousands of loci in recent years, but extant statistical methods do not take full advantage of these large data sets. For example, approximate Bayesian computation (ABC) is a commonly used method for the explicit comparison of alternate demographic histories, but it is limited by the “curse of dimensionality” and issues related to the simulation and summarization of data when applied to next‐generation sequencing (NGS) data sets. We implement here several improvements to overcome these difficulties. We use a Random Forest (RF) classifier for model selection to circumvent the curse of dimensionality and apply a binned representation of the multidimensional site frequency spectrum (mSFS) to address issues related to the simulation and summarization of large SNP data sets. We evaluate the performance of these improvements using simulation and find low overall error rates (~7%). We then apply the approach to data from Haplotrema vancouverense, a land snail endemic to the Pacific Northwest of North America. Fifteen demographic models were compared, and our results support a model of recent dispersal from coastal to inland rainforests. Our results demonstrate that binning is an effective strategy for the construction of a mSFS and imply that the statistical power of RF when applied to demographic model selection is at least comparable to traditional ABC algorithms. Importantly, by combining these strategies, large sets of models with differing numbers of populations can be evaluated.
10.
Luís Reino Pedro Beja Miguel B. Araújo Stéphane Dray Pedro Segurado 《Diversity & distributions》2013,19(4):423-432
Aim
Although the negative effects of habitat fragmentation have been widely documented at the landscape scale, much less is known about its impacts on species distributions at the biogeographical scale. We hypothesize that fragmentation influences the large‐scale distribution of area‐ and edge‐sensitive species by limiting their occurrence in regions with fragmented habitats, despite otherwise favourable environmental conditions. We test this hypothesis by assessing the interplay of climate and landscape factors influencing the distribution of the calandra lark, a grassland specialist that is highly sensitive to habitat fragmentation.
Location
Iberian Peninsula, Europe.
Methods
Ecological niche modelling was used to investigate the relative influence of climate/topography, landscape fragmentation and spatial structure on calandra lark distribution. Modelling explicitly assumed a hierarchically structured effect among explanatory variables, with climate/topography operating at broader spatial scales than landscape variables. An eigenvector‐based spatial filtering approach was used to cancel bias introduced by spatial autocorrelation. The information theoretic approach was used in model selection, and variation partitioning was used to isolate the unique and shared effects of sets of explanatory variables.
Results
Climate and topography were the most influential variables shaping the distribution of calandra lark, but incorporating landscape metrics contributed significantly to model improvement. The probability of calandra lark occurrence increased with total habitat area and declined with the number of patches and edge density. Variation partitioning showed a strong overlap between variation explained by climate/topography and landscape variables. After accounting for spatial structure in species distribution, the explanatory power of environmental variables remained largely unchanged.
Main conclusions
We have shown here that landscape fragmentation can influence species distributions at the biogeographical scale. Incorporating fragmentation metrics into large‐scale ecological niche models may contribute to a better understanding of the mechanisms driving species distributions and to improved predictive modelling of range shifts associated with land use and climate changes.
11.
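Immune cell infiltration is of great significance for the diagnosis and prognosis of cancer. In this study, gene expression data for non-small cell lung cancer (NSCLC) tumor and normal tissues were collected from the TCGA database, and the CIBERSORT tool was used to obtain the proportions of 22 immune cell types as an assessment of immune cell infiltration. Using these 22 proportions as features, machine learning methods were used to build classification models separating NSCLC tumor tissue from normal tissue; the random forest model achieved AUC = 0.987, sensitivity 0.98, and specificity 0.84. A random forest model classifying lung adenocarcinoma versus lung squamous cell carcinoma tumor tissue achieved AUC = 0.827, sensitivity 0.75, and specificity 0.77. LASSO regression was used to screen the 22 immune cell features, and an immune cell score composed of the 8 retained strongly associated features was combined with clinical characteristics to build an NSCLC prognostic model. In evaluation and validation, the prognostic model reached a C-index of 0.71, and the 3-year and 5-year calibration curves fit well, indicating that prognostic risk can be predicted accurately. The classification and prognostic models built here on immune cell infiltration aim to provide new strategies for research on NSCLC diagnosis and prognosis.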
12.
Samad Jahandideh Lukasz Jaroszewski Adam Godzik 《Acta Crystallographica. Section D, Structural Biology》2014,70(3):627-635
Obtaining diffraction-quality crystals remains one of the major bottlenecks in structural biology. The ability to predict the chances of crystallization from the amino‐acid sequence of the protein can, at least partly, address this problem by allowing a crystallographer to select homologs that are more likely to succeed and/or to modify the sequence of the target to avoid features that are detrimental to successful crystallization. In 2007, the now widely used XtalPred algorithm [Slabinski et al. (2007), Protein Sci. 16, 2472–2482] was developed. XtalPred classifies proteins into five 'crystallization classes' based on a simple statistical analysis of the physicochemical features of a protein. Here, towards the same goal, advanced machine‐learning methods are applied and, in addition, the predictive potential of additional protein features such as predicted surface ruggedness, hydrophobicity, side‐chain entropy of surface residues and amino‐acid composition of the predicted protein surface is tested. The new XtalPred‐RF (random forest) achieves significant improvement of the prediction of crystallization success over the original XtalPred. To illustrate this, XtalPred‐RF was tested by revisiting target selection from 271 Pfam families targeted by the Joint Center for Structural Genomics (JCSG) in PSI‐2, and it was estimated that the number of targets entered into the protein‐production and crystallization pipeline could have been reduced by 30% without lowering the number of families for which the first structures were solved. The prediction improvement depends on the subset of targets used as a testing set and reaches 100% (i.e. twofold) for the top class of predicted targets.
13.
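Applications of the random forest model in classification and regression analysis (total citations: 25; self-citations: 0; citations by others: 25)
The random forest model is an algorithm based on classification trees, proposed by Breiman and Cutler in 2001. By aggregating a large number of classification trees it improves predictive accuracy, and it is a new model that can replace traditional machine learning methods such as neural networks. Random forests run fast and perform well on large data sets. They are not affected by the multicollinearity problem faced by ordinary regression analysis and require no variable selection. Existing random forest software packages report the importance of all variables. In addition, random forests make it easy to compute nonlinear effects of variables and can capture interactions between variables. They are also insensitive to outliers. Through three case studies, this paper introduces applications of random forests to discriminant analysis of insect species, analysis of presence/absence data (in place of logistic regression), and regression analysis. The data formats and R code of the case studies can serve as a reference for studying applications of random forests in classification and regression analysis.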
14.
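Species distribution models are widely used to assess the effects of climate change on species distributions. With advances in computing and statistics, models for simulating species distributions have proliferated, yet little is known about their relative performance, so comparative analyses are needed to assess climate change impacts more reliably. This study used three relatively novel ensemble learning models (random forest (RF), generalized boosting, and NeuralEnsembles), three conventional models (generalized linear model, generalized additive model, and classification and regression tree), three global circulation models (GCMs) (MIROC32_medres, JP; CCCMA_CGCM3, CA; BCCR-BCM2.0, NW), and one emission scenario (SRES_A2) to simulate the potential distribution of Masson pine (Pinus massoniana) under the historical baseline climate (1961-1990) and in three future periods (2010-2039, 2020s; 2040-2069, 2050s; 2070-2099, 2080s). Species absence areas were selected with an environmental threshold method, current and future climate data were downscaled with the ClimateChina software, and model accuracy was evaluated using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, the Kappa statistic, the true skill statistic (TSS), and the seed zone range of Masson pine. The results showed that all six species distribution models had high predictive accuracy, but the ensemble learning models were slightly more accurate than the conventional models, with RF the most accurate. Across the three GCMs and six models, the response of Masson pine to climate change showed both consistent and divergent patterns. Consistently, the distribution of Masson pine shifts gradually northward over time and the area of its future potential distribution gradually increases. The divergence lies in the migration distance and the magnitude of area change, which differed among models and climate scenarios: the changes predicted under NW were smaller than those under CA and JP, and the RF model predicted the largest migration distance and area change. Over time, the differences among the 18 predicted maps of future potential distribution (6 models × 3 GCMs) also increased, with spatial inconsistency concentrated along the northern and western margins of the potential distribution. Differences in the construction principles of the models and differences among the GCMs are the main causes of the differing predictions.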
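Two of the accuracy measures named in this abstract, the Kappa statistic and the true skill statistic (TSS), can be computed directly from a presence/absence confusion matrix. The following Python sketch uses the standard definitions (TSS = sensitivity + specificity - 1; Cohen's kappa = (observed agreement - chance agreement) / (1 - chance agreement)). The function name `kappa_and_tss` and the counts in the example are assumptions for illustration, not values from the study.

```python
from typing import Tuple


def kappa_and_tss(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Return (Cohen's kappa, true skill statistic) for a 2x2 confusion matrix.

    tp: presences predicted as presences    fn: presences predicted as absences
    fp: absences predicted as presences     tn: absences predicted as absences
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # fraction of true presences recovered
    specificity = tn / (tn + fp)   # fraction of true absences recovered
    tss = sensitivity + specificity - 1.0

    observed = (tp + tn) / n       # observed agreement
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return kappa, tss


if __name__ == "__main__":
    # Made-up counts for 100 evaluation sites (illustrative only).
    kappa, tss = kappa_and_tss(tp=40, fp=10, fn=10, tn=40)
    print(f"kappa = {kappa:.2f}, TSS = {tss:.2f}")
```

Unlike Kappa, TSS does not depend on prevalence (the share of presence sites), which is one reason the two are often reported together.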
15.
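《Chinese Journal of Biochemistry and Molecular Biology》2021,37(6):822-829
Protein S-sulfenylation is a reversible protein post-translational modification (PTM) that plays a vital role in biological growth and is also associated with several diseases. From the perspective of both basic research and drug development, this raises a challenging question: which sites are S-sulfenylation sites? To address this question, this paper developed a machine-learning-based prediction method. The main steps of the system are: (1) encoding the proteins as pseudo amino acid representations of equal length; (2) balancing the training data set by downsampling; and (3) building an integrated prediction system with an ensemble method. On the independent test set the accuracy reached 90.77%, and the other metrics improved markedly over existing methods, which is helpful for the development of bioinformatics. A user-friendly web server was established at http://www.jci-bioinfo.cn/iSulf_Wide-PseAAC, where predictions can be made online without working through complicated formulas, providing users with convenience and guidance for further research. The mathematical methods used in this paper may also help solve many other problems in related fields.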
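Step (2) above, balancing the training set by downsampling, can be sketched in a few lines of Python. The abstract does not say which downsampling scheme was used, so this example assumes the simplest one: randomly keeping as many samples from each class as the smallest class contains. The function name `downsample_to_balance` and the toy peptide labels are hypothetical.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def downsample_to_balance(
    samples: Sequence[str], labels: Sequence[Hashable], seed: int = 0
) -> List[Tuple[str, Hashable]]:
    """Randomly downsample every class to the size of the smallest class."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = min(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        # rng.sample draws without replacement, so no sample is duplicated.
        for sample in rng.sample(members, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)  # avoid all samples of one class coming first
    return balanced


if __name__ == "__main__":
    # Toy data: 3 modified sites (label 1) and 9 unmodified sites (label 0).
    peptides = [f"pep{i}" for i in range(12)]
    site_labels = [1] * 3 + [0] * 9
    for peptide, label in downsample_to_balance(peptides, site_labels):
        print(peptide, label)
```

Downsampling discards majority-class data, which is one reason it is often paired with an ensemble, as in step (3): each ensemble member can be trained on a different random subset of the majority class.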
16.
17.
Ronald C. Kessler Sherri Rose Karestan C. Koenen Elie G. Karam Paul E. Stang Dan J. Stein Steven G. Heeringa Eric D. Hill Israel Liberzon Katie A. McLaughlin Samuel A. McLean Beth E. Pennell Maria Petukhova Anthony J. Rosellini Ayelet M. Ruscio Victoria Shahly Arieh Y. Shalev Derrick Silove Alan M. Zaslavsky Matthias C. Angermeyer Evelyn J. Bromet José Miguel Caldas de Almeida Giovanni de Girolamo Peter de Jonge Koen Demyttenaere Silvia E. Florescu Oye Gureje Josep Maria Haro Hristo Hinkov Norito Kawakami Viviane Kovess‐Masfety Sing Lee Maria Elena Medina‐Mora Samuel D. Murphy Fernando Navarro‐Mateu Marina Piazza Jose Posada‐Villa Kate Scott Yolanda Torres Maria Carmen Viana 《World psychiatry》2014,13(3):265-274
Post‐traumatic stress disorder (PTSD) should be one of the most preventable mental disorders, since many people exposed to traumatic experiences (TEs) could be targeted in first response settings in the immediate aftermath of exposure for preventive intervention. However, these interventions are costly and the proportion of TE‐exposed people who develop PTSD is small. To be cost‐effective, risk prediction rules are needed to target high‐risk people in the immediate aftermath of a TE. Although a number of studies have been carried out to examine prospective predictors of PTSD among people recently exposed to TEs, most were either small or focused on a narrow sample, making it unclear how well PTSD can be predicted in the total population of people exposed to TEs. The current report investigates this issue in a large sample based on the World Health Organization (WHO)'s World Mental Health Surveys. Retrospective reports were obtained on the predictors of PTSD associated with 47,466 TE exposures in representative community surveys carried out in 24 countries. Machine learning methods (random forests, penalized regression, super learner) were used to develop a model predicting PTSD from information about TE type, socio‐demographics, and prior histories of cumulative TE exposure and DSM‐IV disorders. DSM‐IV PTSD prevalence was 4.0% across the 47,466 TE exposures. 95.6% of these PTSD cases were associated with the 10.0% of exposures (i.e., 4,747) classified by machine learning algorithm as having highest predicted PTSD risk. The 47,466 exposures were divided into 20 ventiles (20 groups of equal size) ranked by predicted PTSD risk. PTSD occurred after 56.3% of the TEs in the highest‐risk ventile, 20.0% of the TEs in the second highest ventile, and 0.0‐1.3% of the TEs in the 18 remaining ventiles. These patterns of differential risk were quite stable across demographic‐geographic sub‐samples. 
These results demonstrate that a sensitive risk algorithm can be created using data collected in the immediate aftermath of TE exposure to target people at highest risk of PTSD. However, validation of the algorithm is needed in prospective samples, and additional work is warranted to refine the algorithm both in terms of determining a minimum required predictor set and developing a practical administration and scoring protocol that can be used in routine clinical practice.
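The ventile analysis described in this abstract (ranking exposures by predicted risk, cutting them into 20 equal-sized groups, and reporting the PTSD rate in each) can be sketched as follows. This is an illustrative Python example, not the authors' code: the function name `risk_group_event_rates` and the toy data are assumptions, and any leftover observations are simply placed in the last group. The abstract's figure of 4,747 exposures is 10.0% of 47,466, rounded to a whole exposure.

```python
from typing import List, Sequence


def risk_group_event_rates(
    risks: Sequence[float], outcomes: Sequence[int], n_groups: int = 20
) -> List[float]:
    """Event rate per risk group, from the highest-risk group to the lowest.

    risks:    predicted probability of the outcome for each observation
    outcomes: 1 if the outcome occurred, 0 otherwise
    """
    if len(risks) != len(outcomes) or len(risks) < n_groups:
        raise ValueError("need matching inputs with at least n_groups rows")

    # Indices sorted by predicted risk, highest first.
    order = sorted(range(len(risks)), key=lambda i: risks[i], reverse=True)
    size = len(order) // n_groups
    rates = []
    for g in range(n_groups):
        start = g * size
        # The last group absorbs any remainder when the split is uneven.
        end = start + size if g < n_groups - 1 else len(order)
        members = order[start:end]
        rates.append(sum(outcomes[i] for i in members) / len(members))
    return rates


if __name__ == "__main__":
    # Toy data: 40 exposures with increasing predicted risk; only the three
    # highest-risk exposures are followed by the outcome.
    toy_risks = [i / 40 for i in range(40)]
    toy_outcomes = [1 if i >= 37 else 0 for i in range(40)]
    rates = risk_group_event_rates(toy_risks, toy_outcomes)
    print(rates[:3])
    print(round(0.10 * 47466))  # size of the top 10% of exposures
```

A steep drop in event rate after the first one or two groups, as reported in the abstract, is what makes it plausible to target preventive interventions at a small high-risk fraction.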
18.
Maurizio FIASCH Maria CUZZOLA Giuseppe IRRERA Pasquale IACOPINO Francesco Carlo MORABITO 《Frontiers in Biology》2011,6(4):263-273
Acute graft-versus-host disease (aGVHD) is a serious systemic complication of allogeneic hematopoietic stem cell transplantation (HSCT) causing considerable morbidity and mortality. Acute GVHD occurs when alloreactive donor-derived T cells recognize host-recipient antigens as foreign. These trigger a complex multiphase process that ultimately results in apoptotic injury in target organs. The early events leading to GVHD seem to occur very soon, presumably within hours from the graft infusion. Therefore, when the first signs of aGVHD clinically manifest, the disease has been ongoing for several days at the cellular level, and the inflammatory cytokine cascade is fully activated. So, it comes as no surprise that progress in treatment based on clinical diagnosis of aGVHD has been limited in the past 30 years. It is likely that a pre-emptive strategy using systemic high-dose corticosteroids as early as possible could improve the outcome of aGVHD. Due to the deleterious effects of such treatment, particularly in terms of infection risk posed by systemic steroid administration in a population that is already immune-suppressed, it is critical to identify biomarker signatures for approaching this very complex task. Some research groups have begun addressing this issue through molecular and proteomic analyses, combining these approaches with computational intelligence techniques, with the specific aim of facilitating the identification of diagnostic biomarkers in aGVHD. In this review, we focus on the aGVHD scenario and on the more recent state of the art. We also attempt to give an overview of the classical and novel techniques proposed as medical decision support systems for the diagnosis of GVHD.
19.
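Lysine succinylation is a novel post-translational modification that plays an important role in protein regulation and the control of cellular function, so accurate identification of succinylation sites in proteins is necessary. Traditional experiments are costly in materials and money. Computational prediction has recently been proposed as an efficient alternative. In this study we developed a new prediction method, iSucc-PseAAC, by combining several classification algorithms with different feature extraction methods. We found that, with feature extraction based on coupled sequences (PseAAC), the support vector machine gave the best classification performance, and ensemble learning was used to address the data imbalance problem. Compared with the predictions of existing methods, iSucc-PseAAC is more meaningful and practical for distinguishing lysine succinylation sites.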
20.
The Buckley–James (BJ) model is a typical semiparametric accelerated failure time model, which is closely related to the ordinary least squares method and easy to construct. However, the traditional BJ model, built on a linearity assumption, captures only simple linear relationships and has difficulty handling nonlinear problems. To overcome this difficulty, in this paper we develop a novel regression model for right-censored survival data within the learning framework of the BJ model, based on random survival forests (RSF), the extreme learning machine (ELM), and the L2 boosting algorithm. The proposed method, referred to as the ELM-based BJ boosting model, first employs RSF for covariate imputation, then develops a new ensemble of ELMs (an ELM-based boosting algorithm for regression) using the ensemble scheme of L2 boosting, and finally uses the output function of the proposed ELM-based boosting model to replace the linear combination of covariates in the BJ model. Because the logarithm of survival time is fitted to the covariates by the nonparametric ELM-based boosting method instead of the least squares method, the ELM-based BJ boosting model can capture both linear and nonlinear covariate effects. In both simulation studies and real data applications, in terms of concordance index and integrated Brier score, the proposed ELM-based BJ boosting model can outperform the traditional BJ model, two kinds of BJ boosting models proposed by Wang et al., RSF, and the Cox proportional hazards model.