Similar literature
20 similar documents found; search time: 15 ms
1.
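A review of machine-learning-based modeling and analysis of gut microbiota data   Total citations: 1, self-citations: 0, citations by others: 1
The human gut microbiota is closely linked to human health and disease. Modeling and analyzing metagenomic data from the gut microbiota is therefore important, for both scientific research and practical application, in fields related to disease prediction and diagnosis. From the perspective of big-data analysis and machine learning, this paper reviews the principles and procedures of algorithms for modeling, analyzing, and predicting from human gut microbiota data, together with representative research applications. The aim is to advance research on gut microbiota analysis, to explore effective ways of combining machine learning algorithms with such analysis, to provide a reference for developing new diagnostic and therapeutic approaches based on gut microbiota data, and to promote the development of precision medicine in China.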

2.
3.
Aim Predicting species distribution is of fundamental importance for ecology and conservation. However, distribution models are usually established for only one region and it is unknown whether they can be transferred to other geographical regions. We studied the distribution of six amphibian species in five regions to address the question of whether the effect of landscape variables varied among regions. We analysed the effect of 10 variables extracted in six concentric buffers (from 100 m to 3 km) describing landscape composition around breeding ponds at different spatial scales. We used data on the occurrence of amphibian species in a total of 655 breeding ponds. We accounted for proximity to neighbouring populations by including a connectivity index in our models. We used logistic regression and information-theoretic model selection to evaluate candidate models for each species. Location Switzerland. Results The explained deviance of each species' best models varied between 5% and 32%. Models that included interactions between a region and a landscape variable were always included in the most parsimonious models. For all species, models including region-by-landscape interactions had greater support (Akaike weights) than models that did not include interaction terms. The spatial scale at which landscape variables affected species distribution varied from 100 m to 1000 m, which was in agreement with several recent studies suggesting that land use far away from the ponds can affect pond occupancy. Main conclusions Different species are affected by different landscape variables at different spatial scales and these effects may vary geographically, resulting in a generally low transferability of distribution models across regions. We also found that connectivity seems generally more important than landscape variables. This suggests that metapopulation processes may play a more important role in species distribution than habitat characteristics.

4.
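This study assessed lake dynamics in the area of Tibet north of the Tanggula Mountains (the Tangbei region) and predicted changes in the spatial pattern of lakes. Object-oriented classification and spectral angle vector change detection were used to generate ecosystem distribution data for the Tangbei region at 5-year intervals from 2000 to 2015. On this basis, we analyzed conversions between lakes and other ecosystems and the characteristics of their spatial patterns, and assessed the dynamics of lake spatial patterns and their relationship with relevant physical-geographic factors. Boosted regression trees were used to identify the contributions of different factors to lake dynamics, and the GEOMOD model was used to predict spatial changes in lakes up to 2030. The results showed that lakes in the Tangbei region expanded by 14.2% from 2000 to 2015, making lake expansion one of the main forms of ecosystem change in the region. Of the 15 lakes larger than 10 km² in the region, 10 expanded and 5 shrank, the latter by only small amounts. Spatial pattern analysis showed that lake patches increased in both area and number, and that the share of area in large patches rose slightly. Lakes with large expansion were mostly located in areas with high elevation, steep slopes, low temperature, low precipitation, and proximity to glaciers. Areas adjacent to existing lakes, with low temperature, low precipitation, and gentle slopes, had a higher probability of converting to lake. Based on the trend of the past 15 years, lakes in the Tangbei region will expand by a further 119 km² by 2030, with the main form of change shifting from the expansion of large lakes to the expansion of small water bodies.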

5.
Due to socioeconomic differences, the accuracy and extent of reporting on the occurrence of native species differ among countries, which can impact the performance of species distribution models. We assessed the importance of geographical biases in occurrence data on model performance using Hydrilla verticillata as a case study. We used Maxent to predict potential North American distribution of the aquatic invasive macrophyte based upon training data from its native range. We produced a model using all available native range occurrence data, then explored the change in model performance produced by omitting subsets of training data based on political boundaries. We also compared those results with models trained on data from which a random sample of occurrence data was omitted from across the native range. Although most models accurately predicted the occurrence of H. verticillata in North America (AUC > 0.7600), data omissions influenced model predictions. Omitting data based on political boundaries resulted in larger shifts in model accuracy than omitting randomly selected occurrence data. For well-documented species like H. verticillata, missing records from single countries or ecoregions may minimally influence model predictions, but for species with fewer documented occurrences or poorly understood ranges, geographic biases could misguide predictions. Regardless of focal species, we recommend that future species distribution modeling efforts begin with a reflection on potential spatial biases of available occurrence data. Improved biodiversity surveillance and reporting will provide benefit not only in invaded ranges but also within under-reported and unexplored native ranges.

6.
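任慧敏 (Ren Huimin), 王圣云 (Wang Shengyun). 《生态学报》 (Acta Ecologica Sinica), 2023, 43(12): 4858-4867
Improving human well-being in a low-carbon way is a fundamental requirement of sustainable development, and the carbon intensity of human well-being is a new indicator for measuring the degree of sustainable development. Using this indicator, the paper applies exploratory spatial data analysis to examine the spatio-temporal evolution of the carbon intensity of human well-being worldwide, and uses a spatial Durbin model to reveal its influencing factors and their spatial spillover effects. The findings are as follows. (1) From 1980 to 2016 the global carbon intensity of human well-being declined markedly, but a significant gap remained between countries in North America, Europe, and Oceania and countries (regions) elsewhere. (2) Economic growth, energy consumption structure, industrialization, capital accumulation, and mortality raised the global carbon intensity of human well-being, whereas trade dependence was the main driver steadily lowering it. (3) The factors behind differences in the carbon intensity of human well-being show regional heterogeneity: the effects of energy consumption structure, trade dependence, industrialization, capital accumulation, and mortality differ with the level and stage of economic development, producing a clear "North-South divide" and different spatial spillover effects. (4) Energy consumption structure, industrialization, capital accumulation, and mortality in neighboring countries increase a country's own carbon intensity of human well-being, whereas economic growth and trade dependence in neighboring countries lower it.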

7.
8.
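Simulation and prediction of the spatial pattern of Robinia pseudoacacia flowering dates in the warm temperate zone of eastern China   Total citations: 1, self-citations: 0, citations by others: 1
徐琳 (Xu Lin), 陈效逑 (Chen Xiaoqiu), 杜星 (Du Xing). 《生态学报》 (Acta Ecologica Sinica), 2013, 33(12): 3584-3593
Modeling the spatial relationship between the flowering dates of Robinia pseudoacacia and air temperature is of scientific importance for revealing the ecological mechanisms that shape the spatial phenological patterns of nectar plants and for timing beekeeping operations. Using data on the first, peak, and end flowering dates of Robinia pseudoacacia at 26 stations in the warm temperate zone of eastern China for 1986-2005, we built multi-year mean and year-by-year spatial phenology models based on daily mean temperature, simulated the multi-year mean and yearly spatial patterns of flowering dates, and tested the models by spatial extrapolation. We then fed gridded daily mean temperature data for 1986-2005 at 8 km × 8 km resolution into the models to obtain the multi-year mean and yearly spatial patterns of flowering dates over continuous geographic space, and made a tentative design of suitable routes for migratory beekeeping within the study area. The results show that the spatial patterns of multi-year mean and yearly daily mean temperature during the optimum period in 1986-2005 control the spatial patterns of the multi-year mean and yearly flowering dates, respectively. The spatial series of multi-year mean flowering dates was significantly negatively correlated with the spatial series of daily mean temperature during the optimum period (P<0.001); the multi-year mean temperature-phenology spatial models explained 87%, 86%, and 77% of the variance in first, peak, and end flowering dates, with root mean square errors (RMSE) of 2.5, 2.7, and 4.1 d, respectively. Likewise, the yearly spatial series of flowering dates were all significantly negatively correlated with the spatial series of daily mean temperature during the optimum period (P<0.05); the yearly temperature-phenology spatial models explained 44%-94%, 57%-92%, and 39%-84% of the variance in first, peak, and end flowering dates, with mean RMSEs of 3.9, 4.0, and 5.4 d, respectively. The predicted multi-year mean flowering dates over continuous geographic space become progressively later from south to north and from plains to hills and mountains. Accordingly, migratory beekeeping in the warm temperate zone of eastern China can follow a western, a central, and an eastern route, with an approximate duration of 40-50 d. In addition, the predicted linear trends in first, peak, and end flowering dates over continuous geographic space during 1986-2005 are predominantly toward earlier dates; the areas with significant advancement account for 78%, 26%, and 32% of the total area, respectively.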

9.
10.
11.
The application of species distribution models (SDMs) to areas outside of where a model was created allows informed decisions across large spatial scales, yet transferability remains a challenge in ecological modeling. We examined how regional variation in animal-environment relationships influenced model transferability for Canada lynx (Lynx canadensis), with an additional conservation aim of modeling lynx habitat across the northwestern United States. Simultaneously, we explored the effect of sample size from GPS data on SDM performance and transferability. We used data from three geographically distinct Canada lynx populations in Washington (n = 17 individuals), Montana (n = 66), and Wyoming (n = 10) from 1996 to 2015. We assessed regional variation in lynx-environment relationships between these three populations using principal components analysis (PCA). We used ensemble modeling to develop SDMs for each population and all populations combined and assessed model prediction and transferability for each model scenario using withheld data and an extensive independent dataset (n = 650). Finally, we examined GPS data efficiency by testing models created with sample sizes of 5%-100% of the original datasets. PCA results indicated some differences in environmental characteristics between populations; models created from individual populations showed differential transferability based on the populations' similarity in PCA space. Despite population differences, a single model created from all populations performed as well as, or better than, each individual population model. Model performance was mostly insensitive to GPS sample size, with a plateau in predictive ability reached at ~30% of the total GPS dataset when initial sample size was large.
Based on these results, we generated well-validated spatial predictions of Canada lynx distribution across a large portion of the species' southern range, with precipitation and temperature the primary environmental predictors in the model. We also demonstrated substantial redundancy in our large GPS dataset, with predictive performance insensitive to sample sizes above 30% of the original.

12.
Clinical prediction models play a key role in risk stratification, therapy assignment and many other fields of medical decision making. Before they can enter clinical practice, their usefulness has to be demonstrated using systematic validation. Methods to assess their predictive performance have been proposed for continuous, binary, and time-to-event outcomes, but the literature on validation methods for discrete time-to-event models with competing risks is sparse. The present paper tries to fill this gap and proposes new methodology to quantify discrimination, calibration, and prediction error (PE) for discrete time-to-event outcomes in the presence of competing risks. In our case study, the goal was to predict the risk of ventilator-associated pneumonia (VAP) attributed to Pseudomonas aeruginosa in intensive care units (ICUs). Competing events are extubation, death, and VAP due to other bacteria. The aim of this application is to validate complex prediction models developed in previous work on more recently available validation data.

13.
Aim Variation partitioning based on canonical analysis is the most commonly used analysis to investigate community patterns according to environmental and spatial predictors. Ecologists use this method in order to understand the pure contribution of the environment independent of space, and vice versa, as well as to control for inflated type I error in assessing the environmental component under spatial autocorrelation. Our goal is to use numerical simulations to compare how different spatial predictors and model selection procedures perform in assessing the importance of the spatial component and in controlling for type I error while testing environmental predictors. Innovation We determine for the first time how the ability of commonly used (polynomial regressors) and novel methods based on eigenvector maps compare in the realm of spatial variation partitioning. We introduce a novel forward selection procedure to select spatial regressors for community analysis. Finally, we point out a number of issues that have not been previously considered about the joint explained variation between environment and space, which should be taken into account when reporting and testing the unique contributions of environment and space in patterning ecological communities. Main conclusions In tests of species‐environment relationships, spatial autocorrelation is known to inflate the level of type I error and make the tests of significance invalid. First, one must determine if the spatial component is significant using all spatial predictors (Moran's eigenvector maps). If it is, consider a model selection for the set of spatial predictors (an individual‐species forward selection procedure is to be preferred) and use the environmental and selected spatial predictors in a partial regression or partial canonical analysis scheme. This is an effective way of controlling for type I error in such tests. Polynomial regressors do not provide tests with a correct level of type I error.  

14.
Aim The transferability of species distribution models requires that species show climatic equilibrium throughout their entire distribution area. We test this assumption for the case of the spotted hyena, Crocuta crocuta, a large carnivore that has shifted its distribution over the last 100,000 years from a widespread Eurasian and African range to its current geographical distribution, restricted to the Sub-Saharan areas of the African continent. Location Western Eurasia and Africa. Methods The current realized distribution of C. crocuta was estimated using presences and reliable absences as well as climatic, land-cover and anthropic variables as predictors. The potential distribution was estimated using presences and a set of pseudo-absences selected from outside climatically suitable localities, with only climatic variables serving as predictors. The current potential distribution was transferred to the Last Interglacial period (126,000 yr BP) using the palaeoclimatic data yielded by the GENESIS 2 general circulation model, and validated with European fossil data. Generalized linear models were used on all occasions. Results Climatic variables are able to predict the current distribution of the species with high accuracy. The geographical projection of this model indicates that the species is distributed over almost all of its potential suitable area, which allows us to suppose that the current distribution of this species is in climatic equilibrium. However, the time transference of model predictions for the western Eurasian region reveals almost no suitable conditions for hyenas, despite the widespread presence of C. crocuta fossil remains on this continent during the Last Interglacial period. Main conclusions Our results indicate that, even when model results suggest a climatic equilibrium for a species distribution, the time transferability of such models does not necessarily provide realistic results.
This occurs because the current geographical range does not allow estimations of all of the environmental requirements of a species. Therefore, any model trained with current data risks underestimating the potential suitable environmental and geographical range for species in a new area or time period.

15.
A combined transmembrane topology and signal peptide prediction method   Total citations: 31, self-citations: 0, citations by others: 31
An inherent problem in transmembrane protein topology prediction and signal peptide prediction is the high similarity between the hydrophobic regions of a transmembrane helix and that of a signal peptide, leading to cross-reaction between the two types of predictions. To improve predictions further, it is therefore important to make a predictor that aims to discriminate between the two classes. In addition, topology information can be gained when successfully predicting a signal peptide leading a transmembrane protein since it dictates that the N terminus of the mature protein must be on the non-cytoplasmic side of the membrane. Here, we present Phobius, a combined transmembrane protein topology and signal peptide predictor. The predictor is based on a hidden Markov model (HMM) that models the different sequence regions of a signal peptide and the different regions of a transmembrane protein in a series of interconnected states. Training was done on a newly assembled and curated dataset. Compared to TMHMM and SignalP, errors coming from cross-prediction between transmembrane segments and signal peptides were reduced substantially by Phobius. False classifications of signal peptides were reduced from 26.1% to 3.9% and false classifications of transmembrane helices were reduced from 19.0% to 7.7%. Phobius was applied to the proteomes of Homo sapiens and Escherichia coli. Here we also noted a drastic reduction of false classifications compared to TMHMM/SignalP, suggesting that Phobius is well suited for whole-genome annotation of signal peptides and transmembrane regions. The method is available at as well as at

16.

17.
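Vegetation type constrains the heterogeneity of soil structure and elements, giving rise to differences in the spatial distribution of soil nutrients. This paper examined differences in soil nutrient content (total nitrogen, TN; total phosphorus, TP; total potassium, TK; soil organic matter, SOM) among vegetation types in a typical karst small watershed, and compared the prediction accuracy of ordinary kriging, a regression model, and a vegetation-type-based regression model for soil nutrients. The results showed that TN, TK, and SOM were significantly correlated with vegetation type (P<0.05), whereas TP was not (P=0.390). TN and SOM differed significantly between shrubland and cropland, and TK content differed significantly between arbor forest and shrub-grassland, between shrubland and cropland, and between shrub-grassland and cropland. Terrain factors in the discontinuous typical karst small watershed were highly spatially heterogeneous; the multiple linear regression model based on the actual terrain factors at each sampling point was more accurate than ordinary kriging, which relies on the positions of known and predicted points, and the vegetation-type-based regression model improved the prediction accuracy for TN.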

18.
19.
A Monod kinetic model, logistic equation model, and statistical regression model were developed for a Chinese hamster ovary cell bioprocess operated under three different modes of operation (batch, bolus fed-batch, and continuous fed-batch) and grown on two different bioreactor scales (3 L bench-top and 15 L pilot-scale). The Monod kinetic model was developed for all modes of operation under study and predicted cell density and glucose, glutamine, lactate, and ammonia concentrations well for the bioprocess. However, it was computationally demanding due to the large number of parameters necessary to produce a good model fit. The transferability of the Monod kinetic model structure and parameter set across bioreactor scales and modes of operation was investigated and a parameter sensitivity analysis performed. The experimentally determined parameters had the greatest influence on model performance. They changed with scale and mode of operation, but were easily calculated. The remaining parameters, which were fitted using a differential evolutionary algorithm, were not as crucial. Logistic equation and statistical regression models were investigated as alternatives to the Monod kinetic model. They were less computationally intensive to develop due to the absence of a large parameter set. However, modeling of the nutrient and metabolite concentrations proved to be troublesome due to the logistic equation model structure and the inability of both models to incorporate a feed. The complexity, computational load, and effort required for model development have to be balanced with the necessary level of model sophistication when choosing which model type to develop for a particular application. © 2012 American Institute of Chemical Engineers Biotechnol. Prog., 2013

20.
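Extracellular matrix proteins play important roles in a range of cellular biological processes, and their dysregulation can lead to many major diseases. Theoretical reference data on extracellular matrix proteins are the basis for their efficient identification, and researchers have developed a series of extracellular matrix protein prediction tools based on machine learning methods. This paper first describes the basic workflow for building such prediction tools on machine learning models, then summarizes, tool by tool, the results achieved by existing extracellular matrix protein prediction tools, and finally sets out the problems these tools currently face and possible ways to improve them.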


Copyright©北京勤云科技发展有限公司  京ICP备09084417号