Similar Articles
20 similar articles found.
1.
Large-scale genome projects require the analysis of large amounts of raw data. This analysis often involves the application of a chain of biology-based programs. Many of these programs are difficult to operate because they are non-integrated, command-line driven, and platform-dependent. The problem is compounded when the number of data files involved is large, making navigation and status-tracking difficult. To demonstrate how this problem can be addressed, we have created a platform-independent Web front end that integrates a set of programs used in a genomic project analyzing gene function by transposon mutagenesis in Saccharomyces cerevisiae. In particular, these programs help define a large number of transposon insertion events within the yeast genome, identifying both the precise site of transposon insertion and the potential open reading frames disrupted by each insertion event. Our Web interface facilitates this analysis by performing the following tasks. First, it allows each of the analysis programs to be launched against multiple directories of data files. Second, it allows the user to view, download, and upload files generated by the programs. Third, it indicates which sets of data directories have been processed by each program. Although designed specifically to aid in this project, our interface exemplifies a general approach by which independent software programs may be integrated into an efficient protocol for large-scale genomic data processing.
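The launch-and-track pattern this abstract describes can be pictured with a minimal Python sketch; the marker-file convention and the tool commands are placeholder assumptions, not the project's actual code.

```python
"""Minimal sketch of launching a tool per data directory and tracking status.
The ".done_<tool>" marker convention is an invented illustration."""
import subprocess
from pathlib import Path

def run_tool_on_dirs(tool_cmd, data_dirs):
    """Run one analysis tool against every data directory, skipping done ones."""
    for d in map(Path, data_dirs):
        marker = d / f".done_{Path(tool_cmd[0]).name}"
        if marker.exists():                      # already processed: skip
            continue
        result = subprocess.run(tool_cmd + [str(d)], capture_output=True, text=True)
        if result.returncode == 0:
            marker.write_text(result.stdout)     # record output as provenance

def report_status(data_dirs, tool_names):
    """Show which directories each tool has processed (the status view)."""
    for d in map(Path, data_dirs):
        done = [t for t in tool_names if (d / f".done_{t}").exists()]
        print(f"{d}: processed by {', '.join(done) or 'none'}")
```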

2.
Quantifying ecosystem structure is of key importance for ecology, conservation, restoration, and biodiversity monitoring because the diversity, geographic distribution and abundance of animals, plants and other organisms are tightly linked to the physical structure of vegetation and associated microclimates. Light Detection And Ranging (LiDAR), an active remote sensing technique, can provide detailed and high-resolution information on ecosystem structure because the laser pulse emitted from the sensor and its subsequent return signal from the vegetation (leaves, branches, stems) deliver three-dimensional point clouds from which metrics of vegetation structure (e.g. ecosystem height, cover, and structural complexity) can be derived. However, processing 3D LiDAR point clouds into geospatial data products of ecosystem structure remains challenging across broad spatial extents due to the large volume of national or regional point cloud datasets (typically multiple terabytes, consisting of hundreds of billions of points). Here, we present a high-throughput workflow called 'Laserfarm' enabling the efficient, scalable and distributed processing of multi-terabyte LiDAR point clouds from national and regional airborne laser scanning (ALS) surveys into geospatial data products of ecosystem structure. Laserfarm is a free and open-source, end-to-end workflow which contains modular pipelines for the re-tiling, normalization, feature extraction and rasterization of point cloud information from ALS and other LiDAR surveys. The workflow is designed for horizontal scalability and can be deployed with distributed computing on different infrastructures, e.g. a cluster of virtual machines. We demonstrate the Laserfarm workflow by processing a country-wide multi-terabyte ALS dataset of the Netherlands (covering ∼34,000 km2, with ∼700 billion points and ∼16 TB of uncompressed LiDAR point clouds) into 25 raster layers at 10 m resolution capturing ecosystem height, cover and structural complexity at a national extent. The Laserfarm workflow, implemented in Python and available as Jupyter Notebooks, is applicable to other LiDAR datasets and enables users to execute automated pipelines for generating consistent and reproducible geospatial data products of ecosystem structure from massive amounts of LiDAR point clouds on distributed computing infrastructures, including cloud computing environments. We provide information on workflow performance (including total CPU times, total wall-time estimates and average CPU times for single files and LiDAR metrics) and discuss how the Laserfarm workflow can be scaled to other LiDAR datasets and computing environments, including remote cloud infrastructures. The Laserfarm workflow allows a broad user community to process massive amounts of LiDAR point clouds for mapping vegetation structure, e.g. for applications in ecology, biodiversity monitoring and ecosystem restoration.
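The core per-tile step, turning normalized point heights into a 10 m raster metric, can be sketched in plain NumPy; this is an illustration of the idea, not Laserfarm's actual API, and the function name and grouping scheme are invented.

```python
"""Illustrative sketch (not Laserfarm's API): bin normalized point heights
into 10 m cells and reduce each cell to one structural metric."""
import numpy as np

def rasterize_metric(x, y, z, cell=10.0, q=95):
    """Per-cell q-th percentile of normalized height; NaN for empty cells."""
    col = ((x - x.min()) // cell).astype(int)
    row = ((y - y.min()) // cell).astype(int)
    raster = np.full((row.max() + 1, col.max() + 1), np.nan)
    cells = {}
    for r, c, h in zip(row, col, z):          # group points by raster cell
        cells.setdefault((r, c), []).append(h)
    for (r, c), heights in cells.items():     # reduce each cell to one metric
        raster[r, c] = np.percentile(heights, q)
    return raster
```

Repeating such a reduction for different metrics (cover fractions, height percentiles, complexity indices) yields a stack of raster layers like the 25 national layers described above.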

3.
The procedures undertaken during the construction of chronologies often result in large quantities of data, typically in many associated files. The management of these data files, especially maintaining the relationships between processed and raw data, becomes increasingly difficult as a collection grows. To maintain a high level of accountability it is necessary to ensure that any derived or processed data can be replicated by independent researchers. Due primarily to practical limitations, some authors publish only subsets of the processed and/or raw data that constitute a chronology, making the process of independent scientific scrutiny difficult and at times impossible. There are associated problems when master chronologies are utilised for the purpose of dating specimens. Maintaining the relationships between chronologies and specimens is extremely important for accountability purposes, especially when errors are detected or revisions required. In these circumstances it is highly desirable to locate all specimens that have been dated with the original chronology to ascertain the implications of changes. This paper describes a new relational database design that addresses these problems and which has been implemented in the Corina dendrochronology application and web service. The new Corina system enables the data and analyses for all chronology building and cross-dating processes to be stored and documented, enabling scrutiny at every stage.
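A minimal relational sketch of the accountability idea, using an invented schema (these are not Corina's actual tables): every chronology records its member series, and every dating event records the chronology used, so affected specimens can be located when a chronology is revised.

```python
"""Toy relational schema for chronology/specimen traceability (invented names)."""
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE series       (id INTEGER PRIMARY KEY, specimen TEXT, raw_data BLOB);
CREATE TABLE chronology   (id INTEGER PRIMARY KEY, name TEXT, version INTEGER);
CREATE TABLE chron_member (chron_id INTEGER REFERENCES chronology,
                           series_id INTEGER REFERENCES series);
CREATE TABLE dating_event (specimen TEXT, chron_id INTEGER REFERENCES chronology,
                           dated_on TEXT);
""")

# If chronology 1 is revised, find every specimen that was dated with it:
affected = db.execute(
    "SELECT DISTINCT specimen FROM dating_event WHERE chron_id = ?", (1,)
).fetchall()
```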

5.
SPIRE is a Python program written to modernize the user interaction with SPIDER, the image processing system for electron microscopical reconstruction projects. SPIRE provides a graphical user interface (GUI) to SPIDER for executing batch files of SPIDER commands. It also lets users quickly view the status of a project by showing the last batch files that were run, as well as the data files that were generated. SPIRE handles the flexibility of the SPIDER programming environment through configuration files: XML-tagged documents that describe the batch files, directory trees, and presentation of the GUI for a given type of reconstruction project. It also provides the capability to connect to a laboratory database, for downloading parameters required by batch files at the start of a project, and uploading reconstruction results at the end of a project.
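The configuration-file mechanism might look roughly like this; the XML tags and attributes below are invented for illustration and are not SPIRE's actual schema.

```python
"""Sketch of an XML project configuration mapping batch files to a directory
tree; tag and attribute names are assumptions, not SPIRE's real format."""
import xml.etree.ElementTree as ET

CONFIG = """
<project type="single-particle">
  <batchfile name="align.spi"  dir="Alignment"  produces="avg001.dat"/>
  <batchfile name="refine.spi" dir="Refinement" produces="vol001.dat"/>
</project>
"""

root = ET.fromstring(CONFIG)
for bf in root.iter("batchfile"):
    # a GUI would turn each entry into a launch button and a status indicator
    print(f"run {bf.get('name')} in {bf.get('dir')}, expect {bf.get('produces')}")
```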

6.
The usefulness of the 3D Portable Document Format (PDF) for clinical, educational, and research purposes has recently been shown. However, the lack of a simple tool for converting biomedical data into the model data in the necessary Universal 3D (U3D) file format is a drawback for the broad acceptance of this new technology. A new module for the image processing and rapid prototyping framework MeVisLab not only provides a platform-independent way to create surface meshes from biomedical/DICOM and other data and to export them into U3D; it also lets the user add metadata to these meshes to predefine colors and names that can be processed by PDF authoring software while generating 3D PDF files. Furthermore, the source code of the module is available and well documented, so it can easily be modified for one's own purposes.
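The metadata-tagging idea can be pictured as pairing each exported mesh with a predefined name and colour for the PDF authoring step; this is a rough sketch with invented names and fields, not MeVisLab code.

```python
"""Toy container pairing a surface mesh with display metadata (assumptions)."""
from dataclasses import dataclass, field

@dataclass
class TaggedMesh:
    name: str                                       # label shown in the 3D PDF model tree
    color_rgb: tuple                                # predefined display colour
    vertices: list = field(default_factory=list)    # (x, y, z) triples
    faces: list = field(default_factory=list)       # vertex-index triples

mesh = TaggedMesh("left_ventricle", (0.8, 0.1, 0.1))
```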

7.
SUMMARY: Chimera allows the construction of chimeric protein or nucleic acid sequence files by concatenating sequences from two or more sequence files in PHYLIP format. It allows the user to interactively select genes and species from the input files. The concatenated result is written to a single output file in PHYLIP or NEXUS format. AVAILABILITY: The computer program, including supporting files and example files, is available from http://www.dalicon.com/chimera/.
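A bare-bones sketch of the concatenation Chimera performs, assuming simple sequential PHYLIP files with 10-character name fields; real PHYLIP parsing (interleaved format, relaxed names) is more involved, and these helper names are invented.

```python
"""Concatenate sequences sharing a species name across PHYLIP files (sketch)."""

def read_phylip(path):
    """Parse a sequential PHYLIP file into {species: sequence}."""
    with open(path) as fh:
        n, _ = fh.readline().split()
        seqs = {}
        for _ in range(int(n)):
            line = fh.readline()
            seqs[line[:10].strip()] = "".join(line[10:].split())
    return seqs

def concatenate(files, out):
    parts = [read_phylip(f) for f in files]
    shared = set(parts[0]).intersection(*parts[1:])   # species present in all files
    rows = {s: "".join(p[s] for p in parts) for s in sorted(shared)}
    with open(out, "w") as fh:
        fh.write(f" {len(rows)} {len(next(iter(rows.values())))}\n")
        for name, seq in rows.items():
            fh.write(f"{name:<10}{seq}\n")
```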

8.
High-throughput sequencing methods have become a routine analysis tool in environmental sciences as well as in the public and private sectors. These methods produce vast amounts of data, which need to be analysed in several steps. Although the bioinformatics can be handled with several public tools, many analytical pipelines offer too few options for the optimal analysis of more complicated or customized designs. Here, we introduce PipeCraft, a flexible and handy bioinformatics pipeline with a user-friendly graphical interface that links several public tools for analysing amplicon sequencing data. Users are able to customize the pipeline by selecting the most suitable tools and options to process raw sequences from the Illumina, Pacific Biosciences, Ion Torrent and Roche 454 sequencing platforms. We described the design and options of PipeCraft and evaluated its performance by analysing data sets from three different sequencing platforms. We demonstrated that PipeCraft is able to process large data sets within 24 hr. The graphical user interface and the automated links between the various bioinformatics tools enable easy customization of the workflow. All analytical steps and options are recorded in log files and are easily traceable.
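The customization idea reduces to an ordered composition of user-selected stages with every choice logged; a sketch with stub stage functions, not the public tools PipeCraft actually wraps.

```python
"""Sketch of a user-customizable, logged pipeline of analysis stages."""
import logging
logging.basicConfig(filename="pipeline.log", level=logging.INFO)

def demultiplex(reads):    return reads   # stand-ins for wrapped public tools
def trim(reads):           return reads
def chimera_filter(reads): return reads
def cluster(reads):        return reads

PIPELINE = [demultiplex, trim, chimera_filter, cluster]   # user-selected stages

def run(reads):
    for step in PIPELINE:
        logging.info("running %s", step.__name__)   # traceable record of each step
        reads = step(reads)
    return reads
```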

9.
We introduce here MATtrack, an open-source MATLAB-based computational platform developed to process multi-TIFF files produced by a photo-conversion time-lapse protocol for live cell fluorescence microscopy. MATtrack automatically performs a series of steps required for image processing, including extraction and import of numerical values from multi-TIFF files, red/green image classification using gating parameters, noise filtering, background extraction, contrast stretching and temporal smoothing. MATtrack also integrates a series of algorithms for quantitative image analysis enabling the construction of mean and standard deviation images, clustering and classification of subcellular regions, and injection point approximation. In addition, MATtrack features a simple user interface, which enables monitoring of fluorescent signal intensity in multiple regions of interest over time. The latter encapsulates a region-growing method to automatically delineate the contours of regions of interest selected by the user, and performs background and regional average fluorescence tracking and automatic plotting. Finally, MATtrack provides convenient visualization and exploration tools, including a migration map, which gives an overview of protein intracellular trajectories and accumulation areas. In conclusion, MATtrack is an open-source MATLAB-based software package tailored to facilitate the analysis and visualization of large data files derived from real-time live cell fluorescence microscopy using photoconvertible proteins. It is flexible, user friendly, compatible with Windows, Mac, and Linux, and with a wide range of data acquisition software. MATtrack is freely available for download at eleceng.dit.ie/courtney/MATtrack.zip.
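MATtrack itself is MATLAB; as a language-neutral illustration, here is a Python sketch of region growing from a user-selected seed pixel. The tolerance rule (grow while a neighbour stays within a fixed distance of the running region mean) is an assumption; the paper does not specify its exact criterion.

```python
"""Simple 4-connected region growing from a seed pixel (illustrative only)."""
import numpy as np
from collections import deque

def region_grow(img, seed, tol=10.0):
    """Return a boolean ROI mask grown from `seed` on a 2D intensity image."""
    mask = np.zeros(img.shape, dtype=bool)
    queue, mask[seed] = deque([seed]), True
    total, count = float(img[seed]), 1           # running region mean
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r-1, c), (r+1, c), (r, c-1), (r, c+1)):
            if (0 <= nr < img.shape[0] and 0 <= nc < img.shape[1]
                    and not mask[nr, nc]
                    and abs(float(img[nr, nc]) - total / count) <= tol):
                mask[nr, nc] = True
                total, count = total + float(img[nr, nc]), count + 1
                queue.append((nr, nc))
    return mask   # mean intensity inside this mask per frame = tracking signal
```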

10.
Parallel BLAST on split databases
SUMMARY: BLAST programs often run on large SMP machines where multiple threads can work simultaneously and there is enough memory to cache the databases between program runs. A group of programs is described which allows comparable performance to be achieved with a Beowulf configuration in which no node has enough memory to cache a database but the cluster as an aggregate does. To achieve this result, databases are split into equal-sized pieces and stored locally on each node. Each query is run on all nodes in parallel and the resulting BLAST output files from all nodes are merged to yield the final output. AVAILABILITY: Source code is available from ftp://saf.bio.caltech.edu/
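The split-database pattern can be sketched as follows; the legacy blastall flags are indicative only, the piece names are invented, and a faithful merge would re-sort hits by score rather than simply concatenate per-piece reports.

```python
"""Sketch: run one query against every database piece in parallel, then merge."""
import subprocess
from multiprocessing import Pool

PIECES = [f"nr.{i:02d}" for i in range(8)]   # pre-split database pieces (assumed names)

def search(piece):
    out = f"hits.{piece}.txt"
    subprocess.run(["blastall", "-p", "blastp", "-d", piece,
                    "-i", "query.fa", "-o", out], check=True)
    return out

if __name__ == "__main__":
    with Pool(len(PIECES)) as pool:          # stands in for one worker per node
        outputs = pool.map(search, PIECES)
    with open("merged.txt", "w") as merged:  # naive concatenation merge
        for f in outputs:
            merged.write(open(f).read())
```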

11.
IDR: the ImmunoDeficiency Resource
The ImmunoDeficiency Resource (IDR), freely available at http://www.uta.fi/imt/bioinfo/idr/, is a comprehensive knowledge base on immunodeficiencies. It is designed for different user groups, such as researchers, physicians and nurses, as well as patients, their families and the general public. Information on immunodeficiencies is stored as fact files, which are disease- and gene-based information resources. We have developed an inherited disease markup language (IDML) data model, designed for storing disease- and gene-specific data in extensible markup language (XML) format. Fact files written in IDML can be used to present data in different contexts and on different platforms. All the information in the IDR is validated by expert curators.
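The fact-file idea can be illustrated by building a small IDML-like XML document; the element and attribute names below are invented for illustration and are not the actual IDML schema.

```python
"""Build a toy disease fact file in an IDML-like XML shape (invented tags)."""
import xml.etree.ElementTree as ET

fact = ET.Element("factfile", type="disease")
ET.SubElement(fact, "name").text = "X-linked agammaglobulinemia"
ET.SubElement(fact, "gene", symbol="BTK")
ET.SubElement(fact, "curation", status="validated")   # expert-curated entry

ET.ElementTree(fact).write("xla_fact.xml")   # reusable across contexts/platforms
```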

12.
Failing to open computer files that describe image data is not the most frustrating experience that the user of a computer can suffer, but it is high on the list of possible aggravations. To ameliorate this, the structure of uncompressed image data files is described here. The various ways in which information that describes a picture can be recorded are reviewed, and a primary distinction between raster- or bitmap-based and vector- or object-based image data files is drawn. Bitmap-based image data files are the more useful of the two formats for recording complicated images such as digital light micrographs, whereas object-based files are better for recording illustrations and cartoons. Computer software for opening a very large variety of different formats of digital image data is recommended, and if these fail, ways are described for opening bitmap-based digital image data files whose format is unknown.
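For a file of unknown format, the usual first step is to inspect its leading "magic" bytes before resorting to raw-import guesses; a small sketch (format table abridged):

```python
"""Guess a bitmap file's format from its magic bytes."""

MAGIC = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff":      "JPEG",
    b"II*\x00":           "TIFF (little-endian)",
    b"MM\x00*":           "TIFF (big-endian)",
    b"BM":                "BMP",
    b"GIF8":              "GIF",
}

def sniff(path):
    head = open(path, "rb").read(8)
    for magic, fmt in MAGIC.items():
        if head.startswith(magic):
            return fmt
    return "unknown: try raw import with guessed width/height/bit depth"
```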

14.
As the number of satellite sensors and data centers continues to grow, managing and processing massive remote sensing data from multiple distributed sources is becoming the norm. However, combining multiple satellite data centers for collaborative processing of massive remote sensing (RS) data still faces many challenges. To reduce the huge amount of data migration and improve the efficiency of multi-datacenter collaborative processing, this paper presents the infrastructure and services for data management and workflow management in massive remote sensing data production. A dynamic data scheduling strategy is employed to reduce duplicate data requests and data processing. By combining remote sensing spatial metadata repositories with the Gfarm grid file system, unified management of raw data, intermediate products, and final products is achieved during co-processing. In addition, multi-level task order repositories and workflow templates are used to construct the production workflow automatically. With the help of specific heuristic scheduling rules, production tasks are executed quickly. Ultimately, the Multi-datacenter Collaborative Process System (MDCPS) was implemented for large-scale remote sensing data production on the basis of this data and workflow management. Experiments showed that these strategies can significantly enhance the efficiency of co-processing across multiple data centers.
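The deduplicating data-scheduling idea reduces to consulting a metadata registry before migrating any scene between centers; a toy sketch with invented names, not the MDCPS implementation.

```python
"""Toy scheduler: fetch a scene from a remote center only if not already local."""

class SceneScheduler:
    def __init__(self, fetch):
        self.fetch = fetch      # callable that pulls data from a remote data center
        self.registry = {}      # scene_id -> local path (metadata repository stand-in)

    def get(self, scene_id):
        if scene_id not in self.registry:        # avoid duplicate data migration
            self.registry[scene_id] = self.fetch(scene_id)
        return self.registry[scene_id]

# usage: the fetch function is a stand-in for a real transfer
sched = SceneScheduler(fetch=lambda sid: f"/local/cache/{sid}.tif")
path = sched.get("SCENE_001")   # a second get() returns the cached path
```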

15.
Global lipidomics analysis across large sample sizes produces high-content datasets that require dedicated software tools supporting lipid identification and quantification, efficient data management and lipidome visualization. Here we present a novel software-based platform for streamlined data processing, management and visualization of shotgun lipidomics data acquired using high-resolution Orbitrap mass spectrometry. The platform features the ALEX framework, designed for automated identification and export of lipid species intensities directly from proprietary mass spectral data files, and an auxiliary workflow using database exploration tools for integration of sample information, computation of lipid abundance and lipidome visualization. A key feature of the platform is the organization of lipidomics data in "database table format", which provides the user with unsurpassed flexibility for rapid lipidome navigation using selected features within the dataset. To demonstrate the efficacy of the platform, we present a comparative neurolipidomics study of cerebellum, hippocampus and somatosensory barrel cortex (S1BF) from wild-type and knockout mice devoid of the putative lipid phosphate phosphatase PRG-1 (plasticity related gene-1). The presented framework is generic, extendable to the processing and integration of other lipidomic data structures, can be interfaced with post-processing protocols supporting statistical testing and multivariate analysis, and can serve as an avenue for disseminating lipidomics data within the scientific community. The ALEX software is available at www.msLipidomics.info.
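The "database table format" amounts to one long table with a row per (sample, lipid species, intensity) record, which makes slicing the lipidome by any feature a one-liner; a sketch with invented column names and made-up values.

```python
"""Long-format lipidomics table: navigate by any selected feature (sketch)."""
import pandas as pd

rows = [
    {"sample": "WT_cerebellum", "species": "PC 34:1", "intensity": 1.8e6},
    {"sample": "KO_cerebellum", "species": "PC 34:1", "intensity": 9.1e5},
    {"sample": "WT_cerebellum", "species": "PE 38:4", "intensity": 7.2e5},
]
table = pd.DataFrame(rows)

# e.g. mean intensity per lipid species across samples
print(table.groupby("species")["intensity"].mean())
```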

16.
To meet the practical demands on current ECG monitoring equipment for miniaturization, real-time operation, high sampling rates, and large storage capacity, this work adopts an ECG detection system design based on recent SOPC (System On a Programmable Chip) technology. The functions of a DSP and an MCU are integrated on a single FPGA, multiple channels of ECG signals are processed in parallel inside the FPGA, an SD card records long periods of continuous ECG signal, and extended functions such as real-time ECG analysis and arrhythmia early warning are implemented.

18.
Flow data from a cell sorter have been processed by hardwired circuits which include amplification, discrimination, coincidence requirements, peak sensing and holding, A-D conversion, and computerized pulse height analysis with storage of the spectra obtained. Two-dimensional spectra can be stored directly in memory, on tape and on disk. Three- and four-parameter cellular events can be recorded online during the flow measurement in a sequential mode on tape for subsequent recall. Simple processing of these data can be performed to display two-dimensional projections from these multidimensional spaces, based on threshold conditions for the remaining parameters. Interfaced transmission of the stored data to a large-scale computer enables more sophisticated data analysis. Data reduction by means of a multidimensional probability analysis has been carried out in order to transfer the spectra to a computerized picture system for display. This system creates perspective two-dimensional images from a three-dimensional data space. Frequency can be converted into grey levels. Hard copy in color (color as the third dimension and color intensity as frequency) simplifies the visualization of multiparametric flow data sets.
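In modern terms, the described storage of two-parameter spectra and gated projection of multiparameter events can be sketched with NumPy histograms; the data and threshold below are synthetic.

```python
"""Sketch: 2D pulse-height spectra and a gated projection of 3-parameter events."""
import numpy as np

rng = np.random.default_rng(0)
events = rng.normal(100, 20, size=(10_000, 3))   # columns: three measured parameters

# two-dimensional spectrum of the first two parameters
spectrum, _, _ = np.histogram2d(events[:, 0], events[:, 1], bins=64)

# projection with a gate: keep only events passing a threshold on parameter 3
gated = events[events[:, 2] > 120]
projection, _, _ = np.histogram2d(gated[:, 0], gated[:, 1], bins=64)
```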

19.
Proteomics research infrastructures and core facilities within the Core for Life alliance advocate for community policies for quality control to ensure high standards in proteomics services.

Core facilities and research infrastructures have become an essential part of the scientific ecosystem. In the field of proteomics, national and international networks and research platforms have been established during the past decade that are supposed to set standards for high-quality services, promote the exchange of professional information, and enable access to cutting-edge, specialized proteomics technologies. Either centralized or distributed, these national and international proteomics infrastructures and technology platforms generate massive amounts of data for the research community, and support a broad range of translational, computational and multi-omics initiatives and basic research projects.

By delegating part of their work to these services, researchers expect that the core facility adjusts its analytical protocols appropriately for their project, acquiring data that conform to the best research practices of the scientific community. The implementation of quality assessment measures and commonly accepted quality controls in data generation is therefore crucially important for proteomics research infrastructures and the scientists who rely on them.

However, current quality control and quality assessment procedures in proteomics core facilities and research infrastructures are a motley collection of protocols, standards, reference compounds and software tools. Proteomics relies on a customized multi-step workflow typically consisting of sample preparation, data acquisition and data processing, and the implementation of each step differs among facilities. For example, sample preparation involves enzymatic digestion of the proteins, which can be performed in-solution, in-gel, or on-beads, often with different proteolytic enzymes, chemicals, and conditions among laboratories. Data acquisition protocols are often customized to the particular instrument setup, and the acquired spectra and chromatograms are processed by different software tools provided by equipment vendors or third parties, or developed in-house.
Moreover, core facilities implement their own guidelines to monitor the performance and quality of the entire workflow, typically using different commercially available standards such as pre-digested cell lysates, recombinant proteins, protein mixtures, or isotopically labeled peptides. Currently, there is no clear consensus on whether, when and how to perform quality control checks. There is even less quality control in walk-in facilities, where the staff are only responsible for correct usage of the instruments and users select and execute the analytical workflow themselves. It is therefore not surprising that instrument stability and the robustness of the applied analytical approach are often unclear, which compromises analytical rigor.
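One concrete form such a consensus check could take (an illustration, not an established community standard) is a control-chart rule on a spiked reference standard, flagging runs that drift beyond a set number of standard deviations from the facility's own history.

```python
"""Toy QC check: flag runs where a reference standard's signal drifts."""
import statistics

def qc_flags(intensities, n_baseline=10, k=2.0):
    """Return (run_index, value) pairs outside mean +/- k*sd of the baseline runs."""
    base = intensities[:n_baseline]                 # establish control limits
    mu, sd = statistics.mean(base), statistics.stdev(base)
    return [(i, x) for i, x in enumerate(intensities)
            if abs(x - mu) > k * sd]                # runs needing investigation
```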

20.
A strain of Listeria monocytogenes isolated from a drain in a food-processing plant was demonstrated, by determination of D values, to be more resistant to the lethal effect of heat at 56 or 59 degrees C following incubation for 45 min in tryptose phosphate broth (TPB) at pH 12.0 than following incubation for the same time in TPB at pH 7.3. Cells survived for at least 6 days when they were suspended in TPB at pH 9.0, 10.0, or 11.0 and stored at 4 or 21 degrees C. Cells of L. monocytogenes incubated at 37 degrees C for 45 min and then stored for 48 or 144 h in TPB at pH 10.0 were more resistant to heat treatment at 56 degrees C than were cells stored in TPB at pH 7.3. The alkaline-stress response in L. monocytogenes may thus induce resistance to otherwise lethal thermal-processing conditions. Treatment of cells in 0.05 M potassium phosphate buffer (pH 7.00 +/- 0.05) containing 2.0 or 2.4 mg of free chlorine per liter reduced populations by as much as 1.3 log(10) CFU/ml, while treatment with 6.0 mg of free chlorine per liter reduced populations by as much as 4.02 log(10) CFU/ml. The remaining subpopulations of chlorine-treated cells exhibited some injury, and cells treated with chlorine for 10 min were more sensitive to heating at 56 degrees C than cells treated for 5 min. Contamination of foods by L. monocytogenes cells that have survived exposure to processing environments ineffectively cleaned or sanitized with alkaline detergents or disinfectants may therefore have more severe implications than previously recognized. Alkaline-pH-induced cross-protection of L. monocytogenes against heat has the potential to enhance survival in minimally processed as well as heat-and-serve foods, and in foods on holding tables, in food service facilities, and in the home. Cells surviving exposure to chlorine, in contrast, are more sensitive to heat; thus, the effectiveness of thermal processing in achieving desired log(10)-unit reductions is not compromised in these cells.
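For reference, the D value used here is the heating time at a given temperature for a tenfold (1 log10) drop in viable count, i.e. the negative reciprocal of the slope of log10(survivors) versus time; a worked sketch with made-up survivor counts.

```python
"""Estimate a D value from a log-linear survivor curve (hypothetical data)."""
import numpy as np

time_min = np.array([0, 2, 4, 6, 8])               # heating time at 56 degrees C
log10_cfu = np.array([7.0, 6.1, 5.2, 4.4, 3.5])    # made-up survivor counts

slope, _ = np.polyfit(time_min, log10_cfu, 1)      # fit log10(N) = slope*t + b
D_value = -1.0 / slope                             # minutes per 1 log10 reduction
print(f"D56 = {D_value:.1f} min")
```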
