Similar literature
20 similar articles found.
1.
Abstract

The use of modern data science has recently emerged as a promising new path to tackling the complex challenges involved in the creation of next-generation chemistry and materials. However, despite the appeal of this potentially transformative development, the chemistry community has yet to incorporate it as a central tool in everyday work. Our research program is designed to enable and advance this emerging research approach. It is centred around the creation of a software ecosystem that brings together physics-based modelling, high-throughput in silico screening and data analytics (i.e. the use of machine learning and informatics for the validation, mining and modelling of chemical data). This cyberinfrastructure is devised to offer a comprehensive set of data science techniques and tools as well as a general-purpose scope to make it as versatile and widely applicable as possible. It also emphasises user-friendliness to make it accessible to the community at large. It thus provides the means for the large-scale exploration of chemical space and for a better understanding of the hidden mechanisms that determine the properties of complex chemical systems. Such insights can dramatically accelerate, streamline and ultimately transform the way chemical research is conducted. Aside from serving as a production-level tool, our cyberinfrastructure is also designed to facilitate and assess methodological innovation. Both the software and method development work are driven by concrete molecular design problems, which also allow us to assess the efficacy of the overall cyberinfrastructure.

2.
Leveraging existing presence records and geospatial datasets, species distribution modeling has been widely applied to inform species conservation and restoration efforts. Maxent is one of the most popular modeling algorithms, yet recent research has demonstrated that Maxent models are vulnerable to prediction errors related to spatial sampling bias and model complexity. Despite elevated rates of biodiversity imperilment in stream ecosystems, the application of Maxent models to stream networks has lagged, as has the availability of tools to address potential sources of error and calculate model evaluation metrics when modeling in nonraster environments (such as stream networks). Herein, we use Maxent and customized R code to estimate the potential distribution of paddlefish (Polyodon spathula) at a stream‐segment level within the Arkansas River basin, USA, while accounting for potential spatial sampling bias and model complexity. Filtering the presence data appeared to adequately remove an eastward, large‐river sampling bias that was evident within the unfiltered presence dataset. In particular, our novel riverscape filter provided a repeatable means of obtaining a relatively even coverage of presence data among watersheds and streams of varying sizes. The greatest differences in estimated distributions were observed among models constructed with default versus AICc‐selected parameterization. Although all models had similarly high performance and evaluation metrics, the AICc‐selected models were more inclusive of westward‐situated and smaller, headwater streams. Overall, our results solidified the importance of accounting for model complexity and spatial sampling bias in SDMs constructed within stream networks and provided a roadmap for future paddlefish restoration efforts in the study area.
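
Note: the abstract above compares default parameterization against parameterization selected by the small-sample corrected Akaike information criterion. As a minimal sketch of what that criterion computes (not the authors' R code), the function below applies the standard formula AICc = 2k - 2 ln L + 2k(k + 1)/(n - k - 1). The function name `aicc`, the candidate labels, and all numbers are hypothetical and chosen only for illustration.

```python
def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Small-sample corrected AIC.

    log_likelihood: maximized log-likelihood of the fitted model
    k: number of estimated parameters
    n: number of observations (e.g. presence records)
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical candidates: (log-likelihood, parameter count, sample size).
candidates = {
    "default": (-412.3, 18, 120),
    "simpler": (-420.1, 7, 120),
}
scores = {name: aicc(*args) for name, args in candidates.items()}
best = min(scores, key=scores.get)  # lowest AICc is preferred
print(best, scores)
```

In this made-up comparison the simpler model wins despite its lower likelihood, because the penalty on 18 parameters outweighs the gain in fit.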

3.
Mathematical equations are fundamental to modeling biological networks, but as networks get large and revisions frequent, it becomes difficult to manage equations directly or to combine previously developed models. Multiple simultaneous efforts to create graphical standards, rule‐based languages, and integrated software workbenches aim to simplify biological modeling but none fully meets the need for transparent, extensible, and reusable models. In this paper we describe PySB, an approach in which models are not only created using programs, they are programs. PySB draws on programmatic modeling concepts from little b and ProMot, the rule‐based languages BioNetGen and Kappa and the growing library of Python numerical tools. Central to PySB is a library of macros encoding familiar biochemical actions such as binding, catalysis, and polymerization, making it possible to use a high‐level, action‐oriented vocabulary to construct detailed models. As Python programs, PySB models leverage tools and practices from the open‐source software community, substantially advancing our ability to distribute and manage the work of testing biochemical hypotheses. We illustrate these ideas using new and previously published models of apoptosis.

4.
Models and data used to describe species–area relationships confound sampling with ecological process as they fail to acknowledge that estimates of species richness arise due to sampling. This compromises our ability to make ecological inferences from and about species–area relationships. We develop and illustrate hierarchical community models of abundance and frequency to estimate species richness. The models we propose separate sampling from ecological processes by explicitly accounting for the fact that sampled patches are seldom completely covered by sampling plots and that individuals present in the sampling plots are imperfectly detected. We propose a multispecies abundance model in which community assembly is treated as the summation of an ensemble of species‐level Poisson processes and estimate patch‐level species richness as a derived parameter. We use sampling process models appropriate for specific survey methods. We propose a multispecies frequency model that treats the number of plots in which a species occurs as a binomial process. We illustrate these models using data collected in surveys of early‐successional bird species and plants in young forest plantation patches. Results indicate that only mature forest plant species deviated from the constant density hypothesis, but the null model suggested that the deviations were too small to alter the form of species–area relationships. Nevertheless, results from simulations clearly show that the aggregate pattern of individual species density–area relationships and occurrence probability–area relationships can alter the form of species–area relationships. The plant community model estimated that only half of the species present in the regional species pool were encountered during the survey. The modeling framework we propose explicitly accounts for sampling processes so that ecological processes can be examined free of sampling artefacts. 
Our modeling approach is extensible: it could be applied to a variety of study designs and allows the inclusion of additional environmental covariates.
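
Note: the abstract above separates the ecological process (species-level Poisson abundances) from the sampling process (partial plot coverage and imperfect detection). The sketch below is a forward simulation of that kind of data-generating process, not the authors' hierarchical estimation procedure. The function names, the densities, and the `coverage` and `detect_p` values are all assumptions made for illustration.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_patch(densities, area, coverage, detect_p, rng):
    """Return (true richness, observed richness) for one patch.

    densities: expected individuals per unit area, one value per species
    coverage:  fraction of the patch covered by sampling plots
    detect_p:  probability of detecting an individual that is inside a plot
    """
    true_richness = 0
    observed_richness = 0
    for density in densities:
        abundance = poisson(rng, density * area)  # ecological process
        in_plots = sum(rng.random() < coverage for _ in range(abundance))
        detected = sum(rng.random() < detect_p for _ in range(in_plots))
        true_richness += abundance > 0
        observed_richness += detected > 0  # sampling process
    return true_richness, observed_richness


rng = random.Random(1)
true_r, obs_r = simulate_patch([0.2, 0.5, 1.0, 0.05], area=4.0,
                               coverage=0.5, detect_p=0.6, rng=rng)
print(true_r, obs_r)
```

Observed richness can never exceed true richness in this setup, which is the gap a model of the sampling process is meant to account for.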

5.
Existing cure‐rate survival models are generally not convenient for modeling and estimating the survival quantiles of a patient with specified covariate values. This paper proposes a novel class of cure‐rate model, the transform‐both‐sides cure‐rate model (TBSCRM), that can be used to make inferences about both the cure‐rate and the survival quantiles. We develop Bayesian inference about the covariate effects on the cure‐rate as well as on the survival quantiles via Markov Chain Monte Carlo (MCMC) tools. We also show that the TBSCRM‐based Bayesian method outperforms methods based on existing cure‐rate models in our simulation studies and in application to the breast cancer survival data from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) database.

6.
The Quality by Design (QbD) approach to the production of therapeutic monoclonal antibodies (mAbs) emphasizes an understanding of the production process, ensuring product quality is maintained throughout. Current methods for measuring critical quality attributes (CQAs) such as glycation and glycosylation are time and resource intensive, and are often run offline only once per batch process. Process analytical technology (PAT) tools such as Raman spectroscopy combined with chemometric modeling can provide real-time measurements of process variables and are aligned with the QbD approach. This study utilizes these tools to build partial least squares (PLS) regression models to provide real-time monitoring of glycation and glycosylation profiles. In total, seven cell-line-specific chemometric PLS models (% mono-glycated, % non-glycated, % G0F-GlcNAc, % G0, % G0F, % G1F, and % G2F) were considered. PLS models were initially developed using small-scale data to verify the capability of Raman to measure these CQAs effectively. Accurate PLS model predictions were observed at small scale (5 L). At manufacturing scale (2000 L) some glycosylation models showed higher error, indicating that scale may be a key consideration in glycosylation profile PLS model development. Model robustness was then considered by supplementing models with a single batch of manufacturing-scale data. This data addition had a significant impact on the predictive capability of each model, with an improvement of 77.5% in the case of G2F. The finalized models show the capability of Raman as a PAT tool to deliver real-time monitoring of glycation and glycosylation profiles at manufacturing scale.

7.
The aim of the ecospat package is to make available novel tools and methods to support spatial analyses and modeling of species niches and distributions in a coherent workflow. The package is written in the R language (R Development Core Team) and contains several features, unique in their implementation, that are complementary to other existing R packages. Pre‐modeling analyses include species niche quantifications and comparisons between distinct ranges or time periods, measures of phylogenetic diversity, and other data exploration functionalities (e.g. extrapolation detection, ExDet). Core modeling brings together the new approach of ensemble of small models (ESM) and various implementations of the spatially‐explicit modeling of species assemblages (SESAM) framework. Post‐modeling analyses include evaluation of species predictions based on presence‐only data (Boyce index) and of community predictions, phylogenetic diversity and environmentally‐constrained species co‐occurrence analyses. The ecospat package also provides some functions to supplement the ‘biomod2’ package (e.g. data preparation, permutation tests and cross‐validation of model predictive power). With this novel package, we intend to stimulate the use of comprehensive approaches in spatial modelling of species and community distributions.

8.
9.
10.
Functional trait composition is increasingly recognized as key to better understanding and predicting community responses to environmental gradients. Predictive approaches traditionally model the weighted mean trait values of communities (CWMs) as a function of environmental gradients. However, most approaches treat traits as independent regardless of known tradeoffs between them, which could lead to spurious predictions. To address this issue, we suggest jointly modeling a suite of functional traits along environmental gradients while accounting for relationships between traits. We use generalized additive mixed effect models to predict the functional composition of alpine grasslands in the Guisane Valley (France). We demonstrate that, compared to traditional approaches, joint trait models explain considerable amounts of variation in CWMs, yield less uncertainty in trait CWM predictions and provide more realistic spatial projections when extrapolating to novel environmental conditions. Modeling traits and their co‐variation jointly is an alternative and superior approach to predicting traits independently. Additionally, compared to a ‘predict first, assemble later’ approach that estimates trait CWMs post hoc based on stacked species distribution models, our ‘assemble first, predict later’ approach directly models trait responses along environmental gradients, and does not require data and models on species’ distributions, but only mean functional trait values per community plot. This highlights the great potential of joint trait modeling approaches in large‐scale mapping applications, such as spatial projections of the functional composition of vegetation and associated ecosystem services as a response to contemporary global change.

11.
ModEco: an integrated software package for ecological niche modeling   Total citations: 2 (self-citations: 0; citations by others: 2)
Qinghua Guo, Yu Liu. Ecography, 2010, 33(4): 637-642
ModEco is a software package for ecological niche modeling. It integrates a range of niche modeling methods within a geographical information system. ModEco provides a user‐friendly platform that enables users to explore, analyze, and model species distribution data with relative ease. ModEco has several unique features: 1) it deals with different types of ecological observation data, such as presence and absence data, presence‐only data, and abundance data; 2) it provides a range of models when dealing with presence‐only data, such as presence‐only models, pseudo‐absence models, background vs presence data models, and ensemble models; and 3) it includes relatively comprehensive tools for data visualization, feature selection, and accuracy assessment.

12.
Summary: The aim of this article is to develop a spatial model for multi‐subject fMRI data. There has been extensive work on univariate modeling of each voxel for single and multi‐subject data, some work on spatial modeling of single‐subject data, and some recent work on spatial modeling of multi‐subject data. However, there has been no work on spatial models that explicitly account for inter‐subject variability in activation locations. In this article, we use the idea of activation centers and model the inter‐subject variability in activation locations directly. Our model is specified in a Bayesian hierarchical framework which allows us to draw inferences at all levels: the population level, the individual level, and the voxel level. We use Gaussian mixtures for the probability that an individual has a particular activation. This helps answer an important question that is not addressed by any of the previous methods: What proportion of subjects had significant activity in a given region? Our approach incorporates the unknown number of mixture components into the model as a parameter whose posterior distribution is estimated by reversible jump Markov chain Monte Carlo. We demonstrate our method with an fMRI study of resolving proactive interference and show dramatically better precision of localization with our method relative to the standard mass‐univariate method. Although we are motivated by fMRI data, this model could easily be modified to handle other types of imaging data.

13.
Mathematical models in biology are powerful tools for the study and exploration of complex dynamics. Nevertheless, bringing theoretical results to an agreement with experimental observations involves acknowledging a great deal of uncertainty intrinsic to our theoretical representation of a real system. Proper handling of such uncertainties is key to the successful usage of models to predict experimental or field observations. This problem has been addressed over the years by many tools for model calibration and parameter estimation. In this article we present a general framework for uncertainty analysis and parameter estimation that is designed to handle uncertainties associated with the modeling of dynamic biological systems while remaining agnostic as to the type of model used. We apply the framework to fit an SIR-like influenza transmission model to 7 years of incidence data in three European countries: Belgium, the Netherlands and Portugal.
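
Note: the abstract above fits an "SIR-like" transmission model but does not give its equations. As a rough sketch of the model family only, the code below integrates the textbook SIR system (dS/dt = -βSI/N, dI/dt = βSI/N - γI, dR/dt = γI) with a fixed-step Euler scheme. The function name `simulate_sir` and every parameter value are assumptions for illustration and are not taken from the article.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the classic SIR equations with Euler steps.

    beta:  transmission rate per day
    gamma: recovery rate per day
    Returns a list of (S, I, R) tuples, one per day, starting at day 0.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Hypothetical parameters: basic reproduction number beta / gamma = 2.
trajectory = simulate_sir(beta=0.5, gamma=0.25, s0=9990, i0=10, r0=0, days=100)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print(peak_day, trajectory[peak_day])
```

A calibration framework of the kind described would wrap such a simulator in a likelihood and explore `beta` and `gamma` against observed incidence; that step is not shown here.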

14.
Assessing the potential future of current forest stands is key to designing conservation strategies and understanding potential future impacts on ecosystem service supplies. This is particularly true in the Mediterranean basin, where important future climatic changes are expected. Here, we assess and compare two commonly used modeling approaches (niche‐ and process‐based models) to project the future of current stands of three forest species with contrasting distributions, using regionalized climate for continental Spain. Results highlight variability in model ability to estimate current distributions, and the inherent large uncertainty involved in making projections into the future. CO2 fertilization through projected increased atmospheric CO2 concentrations is shown to increase forest productivity in the mechanistic process‐based model (despite increased drought stress) by up to three times that of the non‐CO2 fertilization scenario by the period 2050–2080, which is in stark contrast to projections of reduced habitat suitability from the niche‐based models by the same period. This highlights the importance of introducing aspects of plant biogeochemistry into current niche‐based models for a realistic projection of future species distributions. We conclude that the future of current Mediterranean forest stands is highly uncertain and suggest that a new synergy between niche‐ and process‐based models is urgently needed in order to improve our predictive ability.

15.
Biology is an information-driven science. Large-scale data sets from genomics, physiology, population genetics and imaging are driving research at a dizzying rate. Simultaneously, interdisciplinary collaborations among experimental biologists, theorists, statisticians and computer scientists have become the key to making effective use of these data sets. However, too many biologists have trouble accessing and using these electronic data sets and tools effectively. A 'cyberinfrastructure' is a combination of databases, network protocols and computational services that brings people, information and computational tools together to perform science in this information-driven world. This article reviews the components of a biological cyberinfrastructure, discusses current and pending implementations, and notes the many challenges that lie ahead.

16.
Substance flow analysis (SFA) is a frequently used industrial ecology technique for studying societal metal flows, but it is limited in its ability to inform us about future developments in metal flow patterns and how we can affect them. Equation‐based simulation modeling techniques, such as dynamic SFA and system dynamics, can usefully complement static SFA studies in this respect, but they are also restricted in several ways. The objective of this article is to demonstrate the ability of agent‐based modeling to overcome these limitations and its usefulness as a tool for studying societal metal flow systems. The body of the article summarizes the parallel implementation of two models (an agent‐based model and a system dynamics model), both addressing the following research question: What conditions foster the development of a closed‐loop flow network for metals in mobile phones? The results from in silico experimentation with these models highlight three important differences between agent‐based modeling (ABM) and equation‐based modeling (EBM) techniques. An analysis of how these differences affected the insights that could be extracted from the constructed models points to several key advantages of ABM in the study of metal flow systems. In particular, this analysis suggests that a key advantage of the ABM technique is its flexibility to enable the representation of societal metal flow systems in a more native manner. This added flexibility endows modelers with enhanced leverage to identify options for steering metal flows and opens new opportunities for using the metaphor of an ecosystem to understand metal flow systems more fully.

17.
A primary focus of historical biogeography is to understand changes in species ranges, abundance and genetic connectivity, and changes in community composition. Traditionally, biogeographic inference has relied on distinct lines of evidence, including DNA sequences, fossils and hindcasted ecological niche models. In this review we propose that the development of integrative modeling approaches that leverage multiple distinct data types from diverse disciplines has the potential to revolutionize the field of biogeography. Although each data type contains information on a distinct aspect of species’ biogeographic histories, few studies formally integrate multiple types in analysis. For example, post hoc congruence among analyses based on different data types (e.g. fossils and genetics) is commonly assumed to indicate likely biogeographic histories. Unfortunately, analyses of different data often reach discordant conclusions. Thus, fundamental and unresolved debates continue regarding speed and timing of postglacial migration, location and size of glacial refugia, and degree of long distance dispersal. Formal statistical integration can help address these issues. More specifically, formal integration can leverage all available evidence, account for inherent biases associated with different data types, and quantify data and process uncertainty. Novel, quantitative integration of data and models across fields is now possible due to recent advances in cyberinfrastructure, spatial modeling, online and aggregated ecological databases, data processing and quantitative methods. Our purpose is to make the case for and give examples of rigorous integration of genetic, fossil and environmental/occurrence data for inferring biogeographic history. 
In particular, we 1) review the need for such a framework; 2) explain common data types and approaches used to infer biogeographic history (and the challenges with each); 3) review state‐of‐the‐art examples of data integration in biogeography; 4) lay out a series of novel, suggested improvements on current methods; and 5) provide an outlook on technical feasibility and future opportunities.

18.
19.
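Liu Fang (刘芳), Li Sheng (李晟), Li Diqiang (李迪强). Acta Ecologica Sinica (生态学报), 2013, 33(21): 7047-7057
Detailed information on the geographic distribution of species is the foundation of ecological research and of conservation planning. Compared with directly estimating population size, obtaining presence/absence data on a species' distribution is more practical. As a result, methods that combine presence/absence data with environmental variables to build models predicting the spatial distribution of species have advanced considerably in recent years and are now widely applied. The key steps in predicting species distributions from presence/absence data are: 1) building an overall conceptual model; 2) collecting species presence/absence data and preparing environmental variable layers; 3) selecting appropriate statistical models and algorithms; and 4) evaluating the model. The conceptual model states the research hypotheses and determines the data collection and modeling methods. Species distribution data can be collected through systematic or non-systematic surveys. Environmental variables related to the species' distribution are screened, prepared, and processed with GIS tools into digital layers that meet the model's requirements and have a suitable spatial scale. Using the environmental variables and the presence/absence data, an appropriate method and software package are chosen to build the model, which is then tested and evaluated. We summarize the different algorithms and software used to build species distribution models. This paper addresses each of these steps in turn and describes the technical details needed for research based on species presence/absence data, in the hope of providing a reference for readers.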

20.
A vast amount of ecological knowledge generated over the past two decades has hinged upon the ability of model selection methods to discriminate among various ecological hypotheses. The last decade has seen the rise of Bayesian hierarchical models in ecology. Consequently, commonly used tools, such as the AIC, become largely inapplicable and there appears to be no consensus about a particular model selection tool that can be universally applied. We focus on a specific class of competing Bayesian spatial capture–recapture (SCR) models and apply and evaluate some of the recommended Bayesian model selection tools: (1) Bayes Factor, using (a) Gelfand‐Dey and (b) harmonic mean methods, (2) Deviance Information Criterion (DIC), (3) Watanabe‐Akaike's Information Criterion (WAIC) and (4) posterior predictive loss criterion. In all, we evaluate 25 variants of model selection tools in our study. We evaluate these model selection tools from the standpoint of selecting the “true” model and parameter estimation. In all, we generate 120 simulated data sets using the true model and assess the frequency with which the true model is selected and how well the tool estimates N (population size), a parameter of much importance to ecologists. We find that when information content is low in the data, no particular model selection tool can be recommended to help realize, simultaneously, both the goals of model selection and parameter estimation. But, in general (when we consider both the objectives together), we recommend the use of our application of the Bayes Factor (Gelfand‐Dey with MAP approximation) for Bayesian SCR models. Our study highlights the point that although new model selection tools are emerging (e.g., WAIC) in the applied statistics literature, those tools based on sound theory even under approximation may still perform much better.
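
Note: of the tools listed in the abstract above, WAIC is the simplest to compute from MCMC output. The sketch below uses the commonly cited form WAIC = -2 (lppd - p_WAIC), where lppd sums the log of the posterior-mean likelihood for each observation and p_WAIC sums the posterior variance of the pointwise log-likelihood. It is a generic illustration, not the SCR-specific implementation evaluated in the study; the function names and the toy input matrix are assumptions.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def waic(log_lik):
    """Compute WAIC from pointwise log-likelihoods.

    log_lik[s][i] is the log-likelihood of observation i under posterior
    draw s. At least two draws are required for the variance term.
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        lppd += log_mean_exp(column)
        mean = sum(column) / n_draws
        p_waic += sum((v - mean) ** 2 for v in column) / (n_draws - 1)
    return -2.0 * (lppd - p_waic)


# Toy input: 3 posterior draws, 2 observations (values are made up).
toy = [[-1.0, -2.0], [-1.2, -1.9], [-0.9, -2.3]]
print(waic(toy))
```

Lower WAIC indicates better expected predictive performance; when comparing models, the same observations must be used for every candidate.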
