Similar Literature
1.
A scientific workflow system is a workflow management system composed of a series of specially designed data analysis and management steps, organized according to a defined logic, that accomplishes specific scientific research within a given execution environment. Scientific workflow systems aim to let scientists around the world exchange ideas on a simple, easy-to-use platform, jointly design experiments at a global scale, and share data, experimental procedures, and results. Each scientist can independently create workflows, execute them, and view results in real time; different scientists can also conveniently share and reuse these workflows. Taking the Kepler system and the Biodiversity Virtual e-Laboratory (BioVeL) as examples, this paper reviews the history, background, existing projects, and applications of scientific workflows. Using an ecological niche modelling workflow as an example, it describes the process and characteristics of a scientific workflow. Finally, based on an analysis of existing scientific workflow systems, we offer our views and expectations regarding their future directions and open problems.

2.
One of the challenges of computation-centric research is to make the research reproducible in a form that others can repeat and re-use with minimal effort. In addition to the data and tools necessary to re-run analyses, execution environments play a crucial role because of dependencies on the operating system and software versions used. However, some of the challenges of reproducible science can be addressed by using appropriate computational tools and cloud computing to provide an execution environment.

Here, we demonstrate the use of a Kepler scientific workflow for reproducible science that is sharable, reusable, and re-executable. These workflows reduce barriers to sharing and will save researchers time when undertaking similar research in the future.

To provide infrastructure that enables reproducible science, we have developed the cloud-based Collaborative Environment for Ecosystem Science Research and Analysis (CoESRA) to build, execute and share sophisticated computation-centric research. CoESRA provides users with a storage and computational platform that is accessible from a web browser in the form of a virtual desktop. Any registered user can access the virtual desktop to build, execute and share Kepler workflows. This approach enables computational scientists to share complete workflows in a pre-configured environment so that others can reproduce the computational research with minimal effort.

As a case study, we developed and shared a complete IUCN Red List of Ecosystems Assessment workflow that reproduces the assessments undertaken by Burns et al. (2015) on Mountain Ash forests in the Central Highlands of Victoria, Australia. This workflow provides an opportunity for other researchers and stakeholders to run the assessment with minimal supervision, and enables researchers to re-evaluate it when additional data become available. The assessment can be run in a CoESRA virtual desktop by opening the workflow in the Kepler user interface and pressing a "start" button. The workflow is pre-configured with all the open-access datasets and writes results to a pre-configured folder.

3.
4.
Environmental sensor networks are now commonly being deployed within environmental observatories and as components of smaller-scale ecological and environmental experiments. Effectively using data from these sensor networks presents technical challenges that are difficult for scientists to overcome, severely limiting the adoption of automated sensing technologies in environmental science. The Realtime Environment for Analytical Processing (REAP) is an NSF-funded project to address the technical challenges related to accessing and using heterogeneous sensor data from within the Kepler scientific workflow system. Using distinct use cases in terrestrial ecology and oceanography as motivating examples, we describe workflows and extensions to Kepler to stream and analyze data from observatory networks and archives. We focus on the use of two newly integrated data sources in Kepler: DataTurbine and OPeNDAP. Integrated access to both near real-time data streams and data archives from within Kepler facilitates both simple data exploration and sophisticated analysis and modeling with these data sources.
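As a rough illustration of the kind of archive access the OPeNDAP integration provides (shown here outside of Kepler, using the xarray package), remotely hosted datasets can be opened lazily over the DAP protocol. The URL and variable name below are placeholders, not actual REAP data sources:

```python
# Minimal sketch of OPeNDAP-style remote data access, assuming the
# xarray and netCDF4 packages are installed. URL and variable name
# are hypothetical placeholders, not REAP endpoints.
import xarray as xr

url = "http://example.org/opendap/sst_archive.nc"   # placeholder endpoint
ds = xr.open_dataset(url)                # reads metadata; values load lazily
sst = ds["sst"].sel(time=slice("2008-01", "2008-12"))  # subset before download
print(sst.mean().values)                 # triggers transfer of only this slice
```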

5.
6.
Motivated by the widespread use of workflow systems in e-Science applications, this article introduces a formal analysis framework for the verification and profiling of the control flow aspects of scientific workflows. The framework relies on process algebras that characterise each workflow component with a process behaviour, which is then used to build a CTL state model that can be reasoned about. We demonstrate the benefits of the approach by modelling the control flow behaviour of the Discovery Net system, one of the earliest workflow-based e-Science systems, and present how some key properties of workflows and individual service utilisation can be queried at design time. Our approach is generic and can be applied easily to modelling workflows developed in any other system. It also provides a formal basis for the comparison of control aspects of e-Science workflow systems and a design method for future systems.
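The flavour of such CTL reasoning can be conveyed with a toy example: checking the formula EF p ("some reachable state satisfies p") on a small hand-built state model via backward reachability. This is a generic sketch, not the Discovery Net tooling, and the states and labels are invented:

```python
# Toy CTL check: EF p ("p is reachable") on a small workflow state model.
# States, transitions and labels are hypothetical, for illustration only.

transitions = {
    "start":   ["fetch"],
    "fetch":   ["clean", "fail"],
    "clean":   ["analyse"],
    "analyse": ["done"],
    "fail":    [],
    "done":    [],
}
labels = {"done": {"completed"}, "fail": {"error"}}  # atomic propositions

def ef(prop):
    """States satisfying EF prop: least fixpoint of backward reachability."""
    sat = {s for s in transitions if prop in labels.get(s, set())}
    changed = True
    while changed:
        changed = False
        for s, succs in transitions.items():
            if s not in sat and any(t in sat for t in succs):
                sat.add(s)
                changed = True
    return sat

print("start" in ef("completed"))  # True: the workflow can complete
print("start" in ef("error"))      # True: a failure is also reachable
```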

7.
Soma, Prathibha; Latha, B. Cluster Computing, 2021, 24(2): 1123-1134

Scientific workflow applications are used by scientists to carry out research in domains such as physics, chemistry, and astronomy. These applications require huge computational resources, and cloud platforms are currently used to run them efficiently. Improving the makespan and cost of workflow execution on a cloud platform requires identifying the proper number of Virtual Machines (VMs) and choosing the proper VM type. Because the cloud platform is dynamic, the available resources and the types of those resources are two important factors in the cost and makespan of workflow execution. The primary objective of this work is to analyze the relationships among cloud configuration parameters (number of VMs, type of VM, VM configuration) when executing scientific workflow applications on a cloud platform. To accurately analyze the influence of cloud resource configuration and scheduling policies, this work builds a new predictive model using the Box–Behnken design, a modelling technique from Response Surface Methodology (RSM) that produces quadratic mathematical models for analyzing relationships among input and output variables. Workflow cost and makespan models were built for real-world scientific workflows using ANOVA; the models fit well and can be useful in analyzing the performance of scientific workflow applications on cloud platforms.
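To make the response-surface idea concrete, the sketch below fits a quadratic (second-order) model of makespan against two cloud configuration factors on synthetic data. The factor names, data, and model form are illustrative stand-ins, not the paper's experiment:

```python
# Sketch of a quadratic response-surface fit, the kind of model produced
# by RSM/Box-Behnken analysis. Data and factors are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
n_vms = rng.integers(2, 16, size=60)       # factor 1: number of VMs
vm_cpus = rng.integers(1, 8, size=60)      # factor 2: cores per VM
X = np.column_stack([n_vms, vm_cpus])
# Hypothetical makespan: improves with resources, plus noise.
makespan = 500 / (n_vms * vm_cpus) + 20 + rng.normal(0, 2, size=60)

quad = PolynomialFeatures(degree=2, include_bias=False)  # x1, x2, x1^2, x1*x2, x2^2
model = LinearRegression().fit(quad.fit_transform(X), makespan)
print(model.score(quad.transform(X), makespan))  # R^2 of the quadratic fit
```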


8.

Background

Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts.

Results

In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on using an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: It allows the integration of existing Taverna and Galaxy workflows in a single environment, and supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible, where the users can either instantiate the whole system on the cloud, or delegate the execution of certain sub-workflows to the cloud infrastructure.

Conclusions

Tavaxy reduces the workflow development cycle by introducing the use of workflow patterns to simplify workflow creation. It enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high performance cloud computing to cope with the increasing data size and complexity of analysis. The system can be accessed either through a cloud-enabled web-interface or downloaded and installed to run within the user's local environment. All resources related to Tavaxy are available at http://www.tavaxy.org.

9.
10.
Taverna: a tool for the composition and enactment of bioinformatics workflows
MOTIVATION: In silico experiments in bioinformatics involve the co-ordinated use of computational tools and information repositories. A growing number of these resources are being made available with programmatic access in the form of Web services. Bioinformatics scientists will need to orchestrate these Web services in workflows as part of their analyses. RESULTS: The Taverna project has developed a tool for the composition and enactment of bioinformatics workflows for the life sciences community. The tool includes a workbench application which provides a graphical user interface for the composition of workflows. These workflows are written in a new language called the simple conceptual unified flow language (Scufl), whereby each step within a workflow represents one atomic task. Two examples are used to illustrate the ease with which in silico experiments can be represented as Scufl workflows using the workbench application.

11.
Chinese Journal of Plant Ecology (植物生态学报), 2015, 39(9): 932
The concept of ecological thresholds was raised in the 1970s but has since been given different definitions and interpretations depending on the research field or discipline. For most scientists, ecological thresholds refer to the points or zones that link abrupt changes between alternative stable states of an ecosystem. The measurement and quantification of ecological thresholds have great theoretical and practical significance in ecological research for clarifying the structure and function of ecosystems, planning sustainable development modes, and delimiting ecological red lines when managing the ecosystems of a region. Reviewing the existing concepts and classifications of ecological thresholds, we propose a new concept and definition at two levels: ecological threshold points, i.e. the turning points where quantitative changes become qualitative changes, which can be considered ecological red lines; and ecological threshold zones, i.e. the regime shifts of quantitative changes among different stable states, which can be considered the yellow and/or orange warning boundaries of gradual ecological change. Yellow thresholds mean that an ecosystem can return to a stable state through self-adjustment; orange thresholds indicate that the ecosystem will stay in an equilibrium state after interference factors are removed; and red thresholds, as the critical threshold points, indicate that the ecosystem will undergo irreversible degradation or even collapse beyond those points. We also summarize two popular types of methods for determining ecological thresholds: statistical analysis and modelling based on field observation data. The applications of ecological thresholds in ecosystem services, biodiversity conservation and ecosystem management research are also reviewed. Future research on ecological thresholds should focus on the following aspects: (1) methodological development for the measurement and quantification of ecological thresholds; (2) the scaling effects of ecological thresholds and the establishment of a national-scale observation system and network; and (3) the implementation of ecological thresholds as early-warning tools in ecosystem management and in delimiting ecological red lines.
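One common statistical route to a threshold point is piecewise (segmented) regression: scan candidate breakpoints and keep the one minimizing residual error. The sketch below uses synthetic data and is only one simple example of the statistical methods the review covers:

```python
# Sketch of threshold-point detection by two-segment piecewise regression.
# Data are synthetic, with a true breakpoint at x = 6.
import numpy as np

x = np.linspace(0, 10, 200)
y = np.where(x < 6, 1.0, 1.0 + 2.5 * (x - 6))
y += np.random.default_rng(1).normal(0, 0.3, x.size)

def sse_for_break(b):
    """Total squared error of separate linear fits left/right of b."""
    err = 0.0
    for mask in (x < b, x >= b):
        coeffs = np.polyfit(x[mask], y[mask], 1)
        err += np.sum((y[mask] - np.polyval(coeffs, x[mask])) ** 2)
    return err

candidates = x[10:-10]  # keep enough points on each side to fit a line
threshold = min(candidates, key=sse_for_break)
print(f"estimated threshold point: {threshold:.2f}")  # near the true break at 6
```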

12.

Background  

Scientific workflows improve the process of scientific experiments by making computations explicit, underscoring data flow, and emphasizing the participation of humans in the process when intuition and human reasoning are required. Workflows for experiments also highlight transitions among experimental phases, allowing intermediate results to be verified and supporting the proper handling of semantic mismatches and different file formats among the various tools used in the scientific process. Thus, scientific workflows are important for the modeling and subsequent capture of bioinformatics-related data. While much research has been conducted on the implementation of scientific workflows, the initial process of actually designing and generating the workflow at the conceptual level has received little consideration.

13.
Data analysis processes in scientific applications can be expressed as coarse-grain workflows of complex data processing operations with data flow dependencies between them. Performance optimization of these workflows can be viewed as a search for a set of optimal values in a multidimensional parameter space consisting of input performance parameters to the applications that are known to affect their execution times. While some performance parameters such as grouping of workflow components and their mapping to machines do not affect the accuracy of the analysis, others may dictate trading the output quality of individual components (and of the whole workflow) for performance. This paper describes an integrated framework which is capable of supporting performance optimizations along multiple such parameters. Using two real-world applications in the spatial, multidimensional data analysis domain, we present an experimental evaluation of the proposed framework.
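The search described here can be pictured as optimization over a small parameter grid where some knobs only affect speed and others trade output quality for runtime. The parameter names and cost/quality functions below are stand-ins for real measurements, not the paper's framework:

```python
# Sketch of a multidimensional parameter search where chunk size affects
# only speed while resolution trades quality for runtime. All functions
# are illustrative stand-ins for measured performance data.
from itertools import product

def runtime(chunk, resolution):      # hypothetical measured runtime (s)
    return 100 / chunk + 5 * resolution

def quality(resolution):             # hypothetical output-quality score in [0, 1)
    return 1 - 1 / (1 + resolution)

best = None
for chunk, resolution in product([1, 2, 4, 8], [1, 2, 4]):
    if quality(resolution) < 0.7:    # reject configurations below a quality floor
        continue
    t = runtime(chunk, resolution)
    if best is None or t < best[0]:
        best = (t, chunk, resolution)

print(best)  # fastest configuration that meets the quality constraint
```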

14.
Many data manipulation processes involve the use of programming libraries. Because they are used repeatedly, these processes benefit from automation, and a convenient form of automation is workflows, which also allow such processes to be shared amongst the community. The Taverna workflow system has been extended to enable it to use and invoke Java classes and methods as tasks within Taverna workflows. These classes and methods are selected during workflow construction by a Java Doclet application called the API Consumer. The selection is stored as an XML file, which enables Taverna to present the subset of the API for use in the composition of workflows. The ability of Taverna to invoke Java classes and methods is demonstrated by a workflow in which we use libSBML to map gene expression data onto a metabolic pathway represented as an SBML model. AVAILABILITY: Taverna and the API Consumer application can be freely downloaded from http://taverna.sourceforge.net

15.
Ecological niche modelling (ENM) Components are a set of reusable workflow components specialized for performing ENM tasks within the Taverna workflow management system. Each component encapsulates specific functionality and can be combined with other components to facilitate the creation of larger and more complex workflows. One key distinguishing feature of ENM Components is that most tasks are performed remotely by calling web services, simplifying software setup and maintenance on the client side and allowing more powerful computing resources to be exploited. This paper presents the current set of ENM Components in the context of the Taverna family of tools for creating, publishing and sharing workflows. An example is included showing how the components can be used in a preliminary investigation of the effects of mixing different spatial resolutions in ENM experiments.

16.
Recent improvements in online information communication and mobile location-aware technologies have led to the production of large volumes of volunteered geographic information. Widespread, large-scale efforts by volunteers to collect data can inform and drive scientific advances in diverse fields, including ecology and climatology. Traditional workflows to check the quality of such volunteered information can be costly and time consuming as they heavily rely on human interventions. However, identifying factors that can influence data quality, such as inconsistency, is crucial when these data are used in modeling and decision-making frameworks. Recently developed workflows use simple statistical approaches that assume that the majority of the information is consistent. However, this assumption is not generalizable, and ignores underlying geographic and environmental contextual variability that may explain apparent inconsistencies. Here we describe an automated workflow to check inconsistency based on the availability of contextual environmental information for sampling locations. The workflow consists of three steps: (1) dimensionality reduction to facilitate further analysis and interpretation of results, (2) model-based clustering to group observations according to their contextual conditions, and (3) identification of inconsistent observations within each cluster. The workflow was applied to volunteered observations of flowering in common and cloned lilac plants (Syringa vulgaris and Syringa x chinensis) in the United States for the period 1980 to 2013. About 97% of the observations for both common and cloned lilacs were flagged as consistent, indicating that volunteers provided reliable information for this case study. Relative to the original dataset, the exclusion of inconsistent observations changed the apparent rate of change in lilac bloom dates by two days per decade, indicating the importance of inconsistency checking as a key step in data quality assessment for volunteered geographic information. Initiatives that leverage volunteered geographic information can adapt this workflow to improve the quality of their datasets and the robustness of their scientific analyses.
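A minimal rendering of the three-step workflow (dimensionality reduction, model-based clustering, per-cluster outlier flagging) in scikit-learn terms might look like the following. The data, component counts, and the ~3% flagging cutoff are placeholders, not the study's actual settings:

```python
# Sketch of the three-step consistency check: (1) PCA, (2) model-based
# clustering with a Gaussian mixture, (3) flag low-likelihood observations.
# Data and all parameter choices are illustrative placeholders.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
env = rng.normal(size=(500, 12))      # stand-in environmental covariates

reduced = PCA(n_components=3).fit_transform(env)                     # step 1
gmm = GaussianMixture(n_components=4, random_state=0).fit(reduced)   # step 2
log_lik = gmm.score_samples(reduced)  # per-observation log-likelihood

cutoff = np.quantile(log_lik, 0.03)   # flag the least likely ~3%      # step 3
inconsistent = log_lik < cutoff
print(f"flagged {inconsistent.sum()} of {len(env)} observations")
```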

17.
Nowadays, scientists and companies are confronted with multiple competing goals, such as makespan in high-performance computing and economic cost in Clouds, that have to be optimised simultaneously. Multi-objective scheduling of scientific applications in these systems is therefore receiving increasing research attention. Most existing approaches aggregate all objectives in a single function, defined a priori without any knowledge of the problem being solved, which negatively impacts the quality of the solutions. In contrast, Pareto-based approaches, whose outcome is a set of (nearly) optimal solutions representing a tradeoff among the different objectives, have been scarcely studied. In this paper, we analyse MOHEFT, a Pareto-based list scheduling heuristic that provides the user with a set of tradeoff-optimal solutions from which the one that best suits the user's requirements can be selected manually. We demonstrate the potential of our method for multi-objective workflow scheduling on the commercial Amazon EC2 Cloud. We compare the quality of the MOHEFT tradeoff solutions with two state-of-the-art approaches using different synthetic and real-world workflows: the classical HEFT algorithm for single-objective scheduling and the SPEA2* genetic algorithm used in multi-objective optimisation problems. The results demonstrate that our approach is able to compute solutions of higher quality than SPEA2*. In addition, we show that MOHEFT is more suitable than SPEA2* for workflow scheduling in the context of commercial Clouds, since the genetic-based approach is unable to deal with some of the constraints imposed by these systems.
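The tradeoff set that a Pareto-based scheduler returns can be illustrated with a plain non-dominated filter over (makespan, cost) pairs. The candidate schedules below are made up, and this is the generic dominance check, not the MOHEFT heuristic itself:

```python
# Sketch of Pareto filtering over (makespan, cost) pairs: a solution is
# dropped if another solution is at least as good on both objectives and
# different (hence strictly better on one). Candidate values are made up.
def pareto_front(solutions):
    front = []
    for s in solutions:
        dominated = any(
            o[0] <= s[0] and o[1] <= s[1] and o != s
            for o in solutions
        )
        if not dominated:
            front.append(s)
    return front

# (makespan in minutes, cost in dollars) for hypothetical schedules
candidates = [(120, 4.0), (90, 6.5), (200, 2.0), (95, 6.0), (130, 4.5)]
print(pareto_front(candidates))
# [(120, 4.0), (90, 6.5), (200, 2.0), (95, 6.0)] -- (130, 4.5) is dominated
```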

18.
Flux observation is fundamental to quantitatively describing material cycling and energy exchange among soil, vegetation, and the atmosphere. As a technique for directly measuring the energy and mass exchange fluxes between vegetation canopies and the atmosphere, the eddy covariance technique has gradually become the internationally accepted standard method for flux observation. With the wide application of eddy covariance in global carbon and water cycle research, long-term continuous flux observations are providing critical data support and mechanistic insight for accurately evaluating ecosystem carbon sequestration capacity, water and energy balance, ecosystem feedbacks to global climate change, the optimization and validation of regional and global models, and the impacts of extreme events on ecosystem structure and function. Long-term site-scale flux observations have established baseline intensities of ecosystem carbon and water fluxes and their seasonal and interannual variability across climate zones and vegetation types. Multi-site networked observations have characterized the spatial variability of ecosystem carbon fluxes at regional and global scales and revealed the biogeographic mechanisms by which temperature and precipitation control the spatial patterns of ecosystem carbon fluxes at the regional scale. This paper outlines the basic principles, assumptions, and system components of the eddy covariance technique, summarizes the main applications of long-term networked eddy flux observations in research on terrestrial ecosystem carbon and water fluxes, and discusses prospects for future flux research.
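The core of the eddy covariance principle is the time-averaged product of the fluctuations of vertical wind speed and the scalar of interest, F = mean(w'c'). A minimal numerical illustration on synthetic high-frequency data (not real tower measurements):

```python
# Minimal eddy covariance illustration: the flux is the covariance of
# vertical wind speed w and scalar concentration c over an averaging
# period, F = mean(w' * c'). Data are synthetic (10 Hz over 30 min).
import numpy as np

rng = np.random.default_rng(0)
n = 10 * 60 * 30                       # 10 Hz samples over 30 minutes
w = rng.normal(0.0, 0.3, n)            # vertical wind speed (m s-1)
c = 400 + 5 * w + rng.normal(0, 1, n)  # scalar concentration, correlated with w

w_prime = w - w.mean()                 # fluctuations about the period mean
c_prime = c - c.mean()
flux = np.mean(w_prime * c_prime)      # eddy flux (concentration units x m s-1)
print(f"flux = {flux:.3f}")
```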

19.
To improve the efficiency and effectiveness of clinical laboratories, workflow analysis should be applied. To this end, specific laboratory functions and processes were identified, and methods for analyzing workflows and rules to control them are discussed. It is shown how workflow analysis can be applied in clinical laboratories using discrete event simulation. A particular model (SCALES: Support and Consequences through Advanced Laboratory Expert Systems) is applied to analyze the workflow at several workstations, and the results of a validation attempt are given. The information obtained from this study appeared to be very useful from both a methodological and a practical point of view.
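To give a flavour of what a discrete event simulation of a laboratory workstation looks like, here is a toy queueing model using the Python simpy package; the arrival and processing times are hypothetical, and this is the general technique rather than the SCALES model:

```python
# Toy discrete event simulation of a laboratory workstation queue using
# the simpy package. Timing parameters are hypothetical; this illustrates
# the technique, not the SCALES model from the paper.
import random
import simpy

random.seed(0)

def sample(env, station, name):
    arrived = env.now
    with station.request() as req:                      # queue for the analyzer
        yield req
        yield env.timeout(random.expovariate(1 / 4.0))  # ~4 min processing
    print(f"{name} turnaround: {env.now - arrived:.1f} min")

def arrivals(env, station):
    for i in range(20):
        env.process(sample(env, station, f"sample-{i}"))
        yield env.timeout(random.expovariate(1 / 3.0))  # ~3 min between arrivals

env = simpy.Environment()
analyzer = simpy.Resource(env, capacity=1)  # one analyzer workstation
env.process(arrivals(env, analyzer))
env.run()
```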

20.
Next generation sequencing (NGS) has been a great success and is now a standard research method in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in a rather short time, producing huge amounts of data, and complex bioinformatics analyses are required to turn these data into scientific findings. To run these analyses quickly, automated workflows implemented on high-performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high-throughput processing, especially when the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput, and dedicated solutions have to be developed to achieve all of these aims. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high-throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files.
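One generic robustness ingredient for high-throughput pipelines of this kind is retrying transient failures with exponential backoff. The sketch below wraps an arbitrary shell step; it is a general pattern with a hypothetical example command, not code from the paper's supporting information:

```python
# Generic robustness pattern for HPC pipeline steps: retry a shell command
# with exponential backoff on failure. The example command is a placeholder;
# this is not the paper's actual implementation.
import subprocess
import time

def run_with_retries(cmd, max_attempts=3, base_delay=30):
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd, shell=True)
        if result.returncode == 0:
            return
        wait = base_delay * 2 ** (attempt - 1)
        print(f"attempt {attempt} failed (rc={result.returncode}); retrying in {wait}s")
        time.sleep(wait)
    raise RuntimeError(f"step failed after {max_attempts} attempts: {cmd}")

# Hypothetical alignment step in an exome workflow:
run_with_retries("bwa mem ref.fa sample_R1.fq sample_R2.fq > sample.sam")
```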
