Similar Documents
20 similar documents found.
1.
Taverna: a tool for the composition and enactment of bioinformatics workflows
MOTIVATION: In silico experiments in bioinformatics involve the co-ordinated use of computational tools and information repositories. A growing number of these resources are being made available with programmatic access in the form of Web services. Bioinformatics scientists will need to orchestrate these Web services in workflows as part of their analyses. RESULTS: The Taverna project has developed a tool for the composition and enactment of bioinformatics workflows for the life sciences community. The tool includes a workbench application which provides a graphical user interface for the composition of workflows. These workflows are written in a new language called the simple conceptual unified flow language (Scufl), whereby each step within a workflow represents one atomic task. Two examples are used to illustrate the ease with which in silico experiments can be represented as Scufl workflows using the workbench application.
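
The abstract does not include a Scufl listing, but the data-flow model it describes — a graph of atomic tasks joined by data links, enacted in dependency order — is easy to illustrate. Below is a minimal Python sketch of that model (not actual Scufl syntax; the task names and their outputs are invented):

```python
# Minimal sketch of the Scufl data-flow model: processors are atomic tasks,
# links carry one processor's output to another's input, and enactment runs
# tasks in dependency order. Not real Scufl; names and data are illustrative.
from graphlib import TopologicalSorter  # Python 3.9+

def fetch_sequence(acc):            # atomic task: pretend repository lookup
    return f">{acc}\nMKT..."

def run_analysis(fasta):            # atomic task: pretend analysis service
    return {"input": fasta, "score": 0.92}

processors = {"fetch": fetch_sequence, "analyse": run_analysis}
links = {"analyse": {"fetch"}}      # 'analyse' consumes the output of 'fetch'

def enact(workflow_inputs):
    results = {}
    for name in TopologicalSorter(links).static_order():
        upstream = [results[dep] for dep in links.get(name, ())]
        args = upstream or [workflow_inputs[name]]
        results[name] = processors[name](*args)
    return results

print(enact({"fetch": "P12345"}))
```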

2.

Background  

Scientific workflows improve the process of scientific experiments by making computations explicit, underscoring data flow, and emphasizing the participation of humans in the process when intuition and human reasoning are required. Workflows for experiments also highlight transitions among experimental phases, allowing intermediate results to be verified and supporting the proper handling of semantic mismatches and different file formats among the various tools used in the scientific process. Thus, scientific workflows are important for the modeling and subsequent capture of bioinformatics-related data. While much research has been conducted on the implementation of scientific workflows, the initial process of actually designing and generating the workflow at the conceptual level has received little consideration.

3.

Background

Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts.

Results

In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on an extensible set of reusable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: it allows the integration of existing Taverna and Galaxy workflows in a single environment, and supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible: users can either instantiate the whole system on the cloud or delegate the execution of certain sub-workflows to the cloud infrastructure.

Conclusions

Tavaxy shortens the workflow development cycle by introducing workflow patterns that simplify workflow creation. It enables the reuse and integration of existing (sub-)workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high-performance cloud computing to cope with increasing data sizes and analysis complexity. The system can be accessed either through a cloud-enabled web interface or downloaded and installed to run within the user's local environment. All resources related to Tavaxy are available at http://www.tavaxy.org.
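
As a rough illustration of the hierarchical-workflow concept that underpins this integration — an entire imported sub-workflow behaving as a single node inside the enclosing workflow — consider the following hypothetical Python sketch (the step functions and data are invented, not Tavaxy code):

```python
# Hypothetical sketch of hierarchical composition: a complete sub-workflow
# (e.g., one imported from Taverna or Galaxy) is wrapped so that it behaves
# as one atomic node inside an enclosing workflow.
def make_subworkflow(steps):
    """Compose a list of single-input functions into one callable node."""
    def run(data):
        for step in steps:
            data = step(data)
        return data
    return run

# Pretend these two steps come from an imported Taverna workflow.
imported = make_subworkflow([str.strip, str.upper])

# The enclosing workflow treats the import as a single step.
pipeline = [imported, lambda s: s.replace("U", "T")]   # e.g., RNA -> DNA
data = "  augcaugc  "
for step in pipeline:
    data = step(data)
print(data)   # "  augcaugc  " -> "AUGCAUGC" -> "ATGCATGC"
```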

4.
Mass spectrometry coupled to high-performance liquid chromatography (HPLC-MS) is evolving more quickly than ever. A wide range of different instrument types and experimental setups are in common use, and modern instruments acquire huge amounts of data, thus requiring tools for efficient and automated data analysis. Most existing software for analyzing HPLC-MS data is monolithic and tailored toward a specific application. A more flexible alternative consists of pipeline-based tool kits allowing the construction of custom analysis workflows from small building blocks, e.g., the Trans-Proteomic Pipeline (TPP) or The OpenMS Proteomics Pipeline (TOPP). One drawback, however, is the hurdle of setting up complex workflows using command-line tools. We present TOPPAS, The OpenMS Proteomics Pipeline ASsistant, a graphical user interface (GUI) for the rapid composition of HPLC-MS analysis workflows. Workflow construction reduces to simple drag-and-drop of analysis tools and adding connections in between. Integration of external tools into these workflows is possible as well. Once workflows have been developed, they can be deployed in other workflow management systems or batch processing systems in a fully automated fashion. The implementation is portable and has been tested under Windows, Mac OS X, and Linux. TOPPAS is open-source software and available free of charge at http://www.OpenMS.de/TOPPAS.
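
The command-line hurdle that TOPPAS removes looks roughly like the sketch below, which chains TOPP-style tools through subprocess calls. The tool names follow the TOPP convention, but the exact tools and flags available in any given OpenMS release are an assumption here and should be checked against its documentation:

```python
# Sketch of command-line pipelining of TOPP-style tools; tool names and the
# -in/-out flags are assumptions based on the TOPP convention, not verified
# against a specific OpenMS release.
import subprocess

steps = [
    ["FileConverter",           "-in", "raw.mzXML",     "-out", "spectra.mzML"],
    ["NoiseFilterGaussian",     "-in", "spectra.mzML",  "-out", "filtered.mzML"],
    ["FeatureFinderCentroided", "-in", "filtered.mzML", "-out", "features.featureXML"],
]

for cmd in steps:
    subprocess.run(cmd, check=True)   # abort the workflow if any step fails
```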

5.
Challenges and opportunities in proteomics data analysis
Accurate, consistent, and transparent data processing and analysis are integral and critical parts of proteomics workflows in general and for biomarker discovery in particular. Definition of common standards for data representation and analysis and the creation of data repositories are essential to compare, exchange, and share data within the community. Current issues in data processing, analysis, and validation are discussed together with opportunities for improving the process in the future and for defining alternative workflows.

6.

Modeling complex computational applications as large computational workflows has proved an effective way to understand the intricacies of such applications and to determine the best approach to realizing them. Scheduling these workflows in the cloud while also satisfying users' differing quality-of-service requirements is a challenging task. The present paper introduces a new direction based on a divide-and-conquer approach to scheduling such workflows. The proposed Divide-and-conquer Workflow Scheduling algorithm (DQWS) is designed to minimize the cost of workflow execution while respecting its deadline. The critical path concept is the inspiration behind the divide-and-conquer process. DQWS finds the critical path, schedules it, removes it from the workflow, and thereby divides the leftover into several mini-workflows. The process continues until only chain-structured workflows, called linear graphs, remain; scheduling these linear graphs is performed in the final phase of the algorithm. Experiments show that DQWS outperforms its competitors, both in meeting deadlines and in minimizing the monetary cost of executing scheduled workflows.
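
A schematic sketch of the divide-and-conquer loop described above: repeatedly extract the critical (longest) path from the task DAG, schedule it, and remove it. This is a simplification of DQWS — the deadline and cost model, resource assignment, and the special handling of linear graphs are omitted — and the task runtimes below are invented:

```python
# Simplified divide-and-conquer scheduling loop in the spirit of DQWS.
from graphlib import TopologicalSorter

def critical_path(tasks, edges):
    """Longest path through the DAG, where each node's weight is its runtime."""
    preds = {t: set() for t in tasks}
    for u, v in edges:
        preds[v].add(u)
    dist, back = {}, {}
    for t in TopologicalSorter(preds).static_order():
        best = max(preds[t], key=lambda p: dist[p], default=None)
        dist[t] = tasks[t] + (dist[best] if best is not None else 0)
        back[t] = best
    end = max(dist, key=dist.get)      # endpoint of the heaviest path
    path = []
    while end is not None:
        path.append(end)
        end = back[end]
    return path[::-1]

tasks = {"A": 3, "B": 2, "C": 4, "D": 1}                  # invented runtimes
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]

while tasks:                                              # divide and conquer
    cp = critical_path(tasks, edges)
    print("schedule path:", cp)        # in DQWS: assign to a cloud resource
    for t in cp:
        del tasks[t]
    edges = [(u, v) for u, v in edges if u in tasks and v in tasks]
```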


7.
8.
9.
A scientific workflow system is a workflow management system composed of a series of specially designed data analysis and management steps, organized according to a defined logic, that accomplishes a specific piece of scientific research within a given runtime environment. Scientific workflow systems aim to let scientists around the world exchange ideas on an easy-to-use platform, jointly design global-scale experiments, and share data, experimental procedures, and results. Each scientist can independently create workflows, execute them, and view results in real time; different scientists can also conveniently share and reuse these workflows. Taking the Kepler system and the Biodiversity Virtual e-Laboratory (BioVeL) as examples, this paper introduces the history, background, existing projects, and applications of scientific workflows. Using an ecological niche modeling workflow as an example, it describes the process and characteristics of scientific workflows. Finally, based on an analysis of existing scientific workflow systems, the authors offer their views and expectations regarding development directions and open problems.

10.
《植物生态学报》2013,22(3):277
A scientific workflow system is designed specifically to organize, manage and execute a series of research steps, or a workflow, in a given runtime environment. The vision for scientific workflow systems is that scientists around the world can collaborate on designing global-scale experiments, sharing data sets, experimental processes, and results on an easy-to-use platform. Each scientist can create and execute their own workflows, view results in real time, and subsequently share and reuse those workflows with other scientists. Two case studies, using the Kepler system and BioVeL, are introduced in this paper. An ecological niche modeling process, a specialized form of scientific workflow supported by both the Kepler system and BioVeL, is used to describe and discuss the features, developmental trends, and open problems of scientific workflows.

11.
MOTIVATION: Computationally, in silico experiments in biology are workflows describing the collaboration of people, data and methods. The Grid and Web services are proposed as the next-generation infrastructure supporting the deployment of bioinformatics workflows. But the growing number of autonomous and heterogeneous services poses challenges to the underlying middleware with respect to composition, i.e., the discovery and interoperability of the services required within in silico experiments. In the IRIS project, we handle the problem of service interoperability by a semi-automatic procedure for identifying and placing customizable adapters into workflows built by service composition. RESULTS: We show the effectiveness and robustness of the software-aided composition procedure by a case study in the field of life science. In this study we combine different database services with different analysis services with the objective of discovering the required adapters. Our experiments show that we can identify relevant adapters with high precision and recall.
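
The adapter-placement idea can be illustrated with a hypothetical sketch: when the declared output type of one service does not match the input type of the next, a registered adapter is spliced between them. The type names, services, and adapters below are invented and do not come from the IRIS project:

```python
# Hypothetical sketch of adapter placement during service composition.
ADAPTERS = {("genbank", "fasta"): lambda rec: rec["sequence"],  # strip metadata
            ("fasta", "genbank"): lambda seq: {"sequence": seq}}

def compose(services):
    """services: list of (name, in_type, out_type, fn). Returns a callable."""
    plan = [services[0]]
    for prev, nxt in zip(services, services[1:]):
        if prev[2] != nxt[1]:                      # output/input type mismatch
            adapter = ADAPTERS[(prev[2], nxt[1])]  # fail loudly if unknown
            plan.append(("adapter", prev[2], nxt[1], adapter))
        plan.append(nxt)
    def run(x):
        for _, _, _, fn in plan:
            x = fn(x)
        return x
    return run

fetch = ("fetch", "id", "genbank", lambda i: {"sequence": "ATGC", "id": i})
align = ("align", "fasta", "report", lambda s: f"aligned:{s}")
print(compose([fetch, align])("X99901"))   # adapter bridges genbank -> fasta
```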

12.
Amplicon sequencing has been the method of choice in many high-throughput DNA sequencing (HTS) applications. To date there has been a heavy focus on the means by which to analyse the burgeoning amount of data afforded by HTS. In contrast, there has been a distinct lack of attention paid to the importance of sample preparation and the fidelity of library generation. No amount of high-end bioinformatics can compensate for poorly prepared samples, and it is therefore imperative that careful attention is given to sample preparation and library generation within workflows, especially those involving multiple PCR steps. This paper redresses this imbalance by focusing on the benchtop aspects of typical amplicon workflows: sample screening, the target region, and library generation. Empirical data are provided to illustrate the scope of the problem. Lastly, the impact of various data analysis parameters is also investigated in the context of how the data were initially generated. It is hoped this paper may serve to highlight the importance of pre-analysis workflows in achieving meaningful, future-proof data that can be analysed appropriately. As amplicon sequencing gains traction in a variety of diagnostic applications, from forensics to environmental DNA (eDNA), it is paramount that both workflows and analytics are fit for purpose.

13.
14.
Glycans are intrinsically complex biomolecules that pose particular analytical challenges. Standard workflows for glycan analysis are based on mass spectrometry, often coupled with separation techniques such as liquid chromatography and ion mobility spectrometry. However, this approach does not yield direct structural information and cannot always distinguish between isomers. This gap might be filled in the future by gas-phase infrared spectroscopy, which has emerged as a promising structure-sensitive technique for glycan fingerprinting. This review highlights recent applications of gas-phase infrared spectroscopy for the analysis of synthetic and biological glycans and how they can be integrated into mass spectrometry-based workflows.

15.
Mammalian cell banks for biopharmaceutical production are usually derived from a single progenitor cell. Different methods to estimate the probability that a cell bank is clonally derived, or the probability of clonality (PoC), associated with various cloning workflows have been reported previously. In this review, a systematic analysis and comparison of the methods used to calculate the PoC are provided. As single-cell deposition and high-resolution imaging technologies continue to advance and the cloning workflow evolves, an aligned understanding of and best practice for estimating the PoC are necessary to compare the different cloning workflows adopted across the biopharmaceutical industry, and will help to accelerate regulatory acceptance.
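
As one concrete example of the kind of calculation such methods formalize, the sketch below uses a simple Poisson model for limiting-dilution cloning: the chance that an occupied well grew from exactly one cell, and the combined PoC after independent rounds of subcloning. This is an illustration of the general approach, not a reproduction of any specific method compared in the review:

```python
# Simple Poisson model for PoC after limiting-dilution cloning; assumptions
# (e.g., independence of rounds, no imaging evidence) differ from the more
# elaborate published methods this review compares.
from math import exp

def poc_single_round(lam):
    """P(well grew from exactly 1 cell | it grew from >= 1 cell),
    with cells seeded at an average of `lam` cells per well."""
    return lam * exp(-lam) / (1.0 - exp(-lam))

def poc_multi_round(lams):
    """Independent rounds of subcloning: non-clonality must survive them all."""
    p_nonclonal = 1.0
    for lam in lams:
        p_nonclonal *= 1.0 - poc_single_round(lam)
    return 1.0 - p_nonclonal

print(f"{poc_single_round(0.5):.3f}")        # ~0.771 at 0.5 cells/well
print(f"{poc_multi_round([0.5, 0.5]):.3f}")  # ~0.947 after two such rounds
```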

16.
Many data manipulation processes involve the use of programming libraries. Because such processes are used repeatedly, they benefit from automation, conveniently in the form of workflows that also allow the processes to be shared amongst the community. The Taverna workflow system has been extended to enable it to use and invoke Java classes and methods as tasks within Taverna workflows. These classes and methods are selected for use during workflow construction by a Java Doclet application called the API Consumer. The selection is stored as an XML file which enables Taverna to present the subset of the API for use in the composition of workflows. The ability of Taverna to invoke Java classes and methods is demonstrated by a workflow in which we use libSBML to map gene expression data onto a metabolic pathway represented as an SBML model. AVAILABILITY: Taverna and the API Consumer application can be freely downloaded from http://taverna.sourceforge.net
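
The mapping step mentioned above can be sketched as follows. Note the assumptions: the paper's workflow invoked libSBML's Java classes through Taverna, whereas this standalone sketch uses libSBML's Python bindings, and the model file name and expression table are invented:

```python
# Standalone sketch of mapping expression values onto an SBML model via
# libSBML's Python bindings (pip install python-libsbml); the Taverna
# workflow in the paper used the Java API instead.
import libsbml

expression = {"G6P": 2.4, "F6P": 0.7}   # invented: species id -> fold change

doc = libsbml.readSBML("glycolysis.xml")
model = doc.getModel()
if model is None:
    raise SystemExit("could not parse SBML model")

for i in range(model.getNumSpecies()):
    sp = model.getSpecies(i)
    value = expression.get(sp.getId())
    if value is not None:
        # record the mapping, e.g. for later visualisation of the pathway
        print(f"{sp.getId()} ({sp.getName()}): expression {value}")
```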

17.
Data processing forms an integral part of biomarker discovery and contributes significantly to the ultimate result. To compare and evaluate various publicly available open source label-free data processing workflows, we developed msCompare, a modular framework that allows the arbitrary combination of different feature detection/quantification and alignment/matching algorithms in conjunction with a novel scoring method to evaluate their overall performance. We used msCompare to assess the performance of workflows built from modules of publicly available data processing packages such as SuperHirn, OpenMS, and MZmine and our in-house developed modules on peptide-spiked urine and trypsin-digested cerebrospinal fluid (CSF) samples. We found that the quality of results varied greatly among workflows, and interestingly, heterogeneous combinations of algorithms often performed better than the homogeneous workflows. Our scoring method showed that the union of feature matrices of different workflows outperformed the original homogeneous workflows in some cases. msCompare is open source software (https://trac.nbic.nl/mscompare), and we provide a web-based data processing service for our framework by integration into the Galaxy server of the Netherlands Bioinformatics Center (http://galaxy.nbic.nl/galaxy) to allow scientists to determine which combination of modules provides the most accurate processing for their particular LC-MS data sets.
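
A schematic sketch of the "union of feature matrices" idea: features reported by different workflows are merged, collapsing near-duplicates that agree within m/z and retention-time tolerances. The tolerances and feature values below are illustrative, not those used by msCompare:

```python
# Merge feature lists from several workflows into one union matrix,
# collapsing features that match within m/z and RT tolerances (invented).
def merge_features(matrices, mz_tol=0.01, rt_tol=15.0):
    merged = []
    for features in matrices:               # one feature list per workflow
        for mz, rt in features:
            for m, r in merged:
                if abs(mz - m) <= mz_tol and abs(rt - r) <= rt_tol:
                    break                   # same feature seen before
            else:
                merged.append((mz, rt))
    return merged

superhirn = [(445.12, 300.2), (512.30, 410.0)]   # invented (m/z, RT) pairs
openms    = [(445.125, 302.0), (610.22, 520.5)]
print(merge_features([superhirn, openms]))
# -> [(445.12, 300.2), (512.3, 410.0), (610.22, 520.5)]
```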

18.
Analysis of any mammalian plasma proteome is a challenge, particularly by mass spectrometry, due to the presence of albumin and other abundant proteins which can mask the detection of low-abundance proteins. As detection of human plasma proteins is valuable in diagnostics, exploring various workflows with minimal fractionation prior to mass spectral analysis is required in order to study population diversity across a large cohort of samples. Here, we used a 'reference plasma sample', a pool of plasma from 10 healthy individuals (5 males and 5 females, aged 25–60 years) from the Indian population. The 14 most abundant proteins were immunodepleted from plasma, which was then evaluated by three different workflows for proteome analysis using a nanoflow reverse phase liquid chromatography system coupled to an LTQ Orbitrap Velos mass spectrometer. Analysis of the reference plasma sample (a) without prefractionation, (b) after prefractionation at the peptide level by strong cation exchange chromatography, and (c) after prefractionation at the protein level by sodium dodecyl sulfate polyacrylamide gel electrophoresis led to the identification of 194, 251 and 342 proteins, respectively. Together, a comprehensive dataset of 517 unique proteins was achieved from all three workflows, including 271 high-confidence proteins identified by ≥2 unique peptides in any workflow or by a single peptide in at least two workflows. A total of 70 proteins were common to all three workflows. Some of the proteins were unique to our study and could be specific to the Indian population. The high-confidence dataset obtained from our study may be useful for studying population diversity and in the discovery and validation stages of biomarker identification.
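
The combination rule described above — the union across the three workflows, with high confidence requiring ≥2 unique peptides in any workflow or a single peptide in at least two workflows — can be sketched with a few set operations (protein IDs and peptide counts are invented):

```python
# Sketch of the paper's combination rule with invented data.
peptide_counts = {                  # workflow -> {protein: unique peptides}
    "no_prefractionation": {"ALB": 12, "APOA1": 3, "P1": 1},
    "scx_peptide_level":   {"ALB": 15, "P1": 1, "P2": 1},
    "sds_page_protein":    {"ALB": 20, "APOA1": 5, "P3": 2},
}

union = set().union(*[set(w) for w in peptide_counts.values()])

high_conf = {
    p for p in union
    if any(w.get(p, 0) >= 2 for w in peptide_counts.values())   # >=2 peptides
    or sum(p in w for w in peptide_counts.values()) >= 2        # 1 peptide, 2 workflows
}
print(len(union), sorted(high_conf))   # 5 ['ALB', 'APOA1', 'P1', 'P3']
```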

19.
All steps of cryogenic electron-microscopy (cryo-EM) workflows have rapidly evolved over the last decade. Advances in both single-particle analysis (SPA) cryo-EM and cryo-electron tomography (cryo-ET) have facilitated the determination of high-resolution biomolecular structures that are not tractable with other methods. However, challenges remain. For SPA, these include improved resolution in an additional dimension: time. For cryo-ET, these include accessing difficult-to-image areas of a cell and finding rare molecules. Finally, there is a need for automated and faster workflows, as many projects are limited by throughput. Here, we review current developments in SPA cryo-EM and cryo-ET that push these boundaries. Collectively, these advances are poised to propel our spatial and temporal understanding of macromolecular processes.

20.