Similar Documents
20 similar documents retrieved (search time: 515 ms)
1.
Public health surveillance is undergoing a revolution driven by advances in information technology. Many countries have seen vast improvements in the collection, ingestion, analysis, visualization, and dissemination of public health data. Resource-limited countries have lagged behind because of challenges in information technology infrastructure, public health resources, and the costs of proprietary software. The Suite for Automated Global Electronic bioSurveillance (SAGES) is a collection of modular, flexible, freely available software tools for electronic disease surveillance in resource-limited settings. One or more SAGES tools may be used in concert with existing surveillance applications, or the tools may be deployed together as an end-to-end biosurveillance capability. This flexibility allows the development of an inexpensive, customized, and sustainable disease surveillance system. The ability to rapidly assess anomalous disease activity may lead to more efficient use of limited resources and better compliance with the World Health Organization's International Health Regulations.
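A minimal sketch of the kind of aberration-detection algorithm such surveillance systems rely on, in the style of the EARS C2 method (a z-score against a sliding baseline). The baseline length, guard band, and threshold below are conventional illustrative choices, not SAGES's actual parameters.

```python
# Illustrative C2-style aberration detection over daily syndromic counts.
# The 7-day baseline, 2-day guard band and threshold of 3 are conventional
# choices, not SAGES's actual implementation.
from statistics import mean, stdev

def c2_alerts(counts, baseline=7, guard=2, threshold=3.0):
    """Flag days whose count exceeds the baseline mean + threshold * sd."""
    alerts = []
    for t in range(baseline + guard, len(counts)):
        window = counts[t - guard - baseline:t - guard]
        mu, sd = mean(window), stdev(window)
        z = (counts[t] - mu) / (sd if sd > 0 else 1.0)
        alerts.append(z > threshold)
    return alerts

daily_cases = [4, 5, 3, 6, 4, 5, 4, 5, 6, 18]   # hypothetical clinic counts
print(c2_alerts(daily_cases))                    # the spike on the final day is flagged
```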

2.
Tsakanikas P, Manolakos ES. Proteomics 2011, 11(10):2038-2050.
Two-dimensional gel electrophoresis (2-DE) is the most established protein separation method in expression proteomics. Despite the existence of sophisticated software tools, 2-DE gel image analysis remains a serious bottleneck. The low accuracy of commercial software packages and the extensive manual calibration they often require for acceptable results show that we are far from the goal of a fully automated, reliable, high-throughput gel processing system. We present a novel spot detection and quantification methodology that draws heavily on unsupervised machine-learning methods. The proposed hierarchical machine learning-based segmentation reduces both the number of faint spots missed (improving sensitivity) and the number of extraneous spots introduced (improving precision). The detection and quantification performance has been thoroughly evaluated and is shown to compare favorably (higher F-measure) with a commercially available software package (PDQuest). The whole image analysis pipeline is fully automated and suitable for high-throughput proteomics analysis, since it requires no manual recalibration each time a new 2-DE gel image is analyzed. Furthermore, it can easily be parallelized for high performance and applied without modification to prealigned group average gels.
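A sketch of the general unsupervised-segmentation idea, assuming a normalized grayscale gel image where spots are dark: a two-component Gaussian mixture separates spot pixels from background, then connected components are counted and quantified. This illustrates the approach only; it is not the authors' hierarchical method.

```python
# Unsupervised gel segmentation sketch: GMM pixel classification followed
# by connected-component spot counting and integrated-density quantification.
import numpy as np
from sklearn.mixture import GaussianMixture
from scipy import ndimage

def detect_spots(gel):
    """gel: 2-D float array in [0, 1], dark spots on light background."""
    gmm = GaussianMixture(n_components=2, random_state=0)
    labels = gmm.fit_predict(gel.reshape(-1, 1)).reshape(gel.shape)
    spot_class = np.argmin(gmm.means_.ravel())    # the darker component = spots
    mask = labels == spot_class
    components, n = ndimage.label(mask)
    volumes = ndimage.sum(1.0 - gel, components, index=np.arange(1, n + 1))
    return n, volumes                             # spot count, integrated densities
```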

3.
MOTIVATION: Effective use of proteomics data, specifically mass spectrometry data, relies on the ability to read and write the many mass spectrometer file formats. Even with vendor-specific libraries and vendor-neutral file formats such as mzXML and mzData, it can be difficult to extract raw data files in a form suitable for batch processing and basic research. Introduced here is the ProteomeCommons.org Input and Output Framework (IO Framework), designed to abstractly represent mass spectrometry data. This is a public, open-source, free-to-use framework that supports most mass spectrometry data formats, including current formats, legacy formats, and proprietary formats that require a vendor-specific library to operate. The IO Framework includes an on-line tool for non-programmers and a set of libraries that developers may use to convert between various proteomics file formats. AVAILABILITY: The current source code and documentation for the ProteomeCommons.org IO Framework are freely available at http://www.proteomecommons.org/current/531/
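The core design idea of such an I/O framework is a common in-memory representation plus one reader per format. The real framework is a Java library; the hypothetical Python analogue below uses illustrative class names only to show the pattern.

```python
# Format abstraction sketch: downstream code sees only Spectrum objects,
# so batch conversion is independent of the on-disk format.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator, List, Tuple

@dataclass
class Spectrum:
    ms_level: int
    peaks: List[Tuple[float, float]]     # (m/z, intensity) pairs

class SpectrumReader(ABC):
    """One concrete reader per file format (mzXML, mzData, vendor, ...)."""
    @abstractmethod
    def spectra(self, path: str) -> Iterator[Spectrum]:
        ...

def convert(reader: SpectrumReader, writer, path: str) -> None:
    # Any reader can be paired with any writer for batch conversion.
    for spectrum in reader.spectra(path):
        writer.write(spectrum)
```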

4.
Dowsey AW, Dunn MJ, Yang GZ. Proteomics 2004, 4(12):3800-3812.
The quest for high-throughput proteomics has revealed a number of critical issues. While improvements in two-dimensional gel electrophoresis (2-DE) sample preparation, staining and imaging are being actively pursued by industry, reliable high-throughput spot matching and quantification remains a significant bottleneck in the bioinformatics pipeline, restricting the flow of data to mass spectrometry through robotic spot excision and protein digestion. To this end, it is important to establish a full multi-site Grid infrastructure for the processing, archival, standardisation and retrieval of proteomic data and metadata. Particular emphasis needs to be placed on large-scale image mining and statistical cross-validation for reliable, fully automated differential expression analysis, and on the development of a statistical 2-DE object model and ontology that underpins the emerging HUPO PSI GPS (Human Proteome Organization Proteomics Standards Initiative General Proteomics Standards). The first step towards this goal is to overcome the computational and communications burden entailed by the image analysis of 2-DE gels with Grid-enabled cluster computing. This paper presents the proTurbo framework, part of the ProteomeGRID, which combines Condor cluster management with CORBA communications and JPEG-LS lossless image compression for task farming. A novel probabilistic eager scheduler minimises makespan by duplicating tasks in response to the likelihood that the owners of the Condor machines will evict them. A 60-gel experiment was pairwise image-registered (3540 tasks) on a 40-machine Linux cluster. Real-world performance and network overhead were gauged, and Poisson-distributed worker evictions were simulated. Our results show a 4:1 lossless and 9:1 near-lossless image compression ratio, so network overhead did not affect other users. With 40 workers, a 32x speed-up was seen (80% resource efficiency), and the eager scheduler reduced the impact of evictions by 58%.
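A toy version of the eviction-aware replication idea, assuming independent exponential eviction times on each worker; the survival target and eviction model are illustrative assumptions, not the proTurbo scheduler itself.

```python
# How many replicas of a task are needed so that, despite owner evictions,
# at least one copy finishes with high probability?
import math

def replicas_needed(task_runtime_s, eviction_rate_per_s, target=0.95):
    """Replicas so that P(at least one copy survives) >= target, assuming
    independent exponential eviction times on each worker."""
    p_survive = math.exp(-eviction_rate_per_s * task_runtime_s)
    if p_survive >= target:
        return 1
    return math.ceil(math.log(1 - target) / math.log(1 - p_survive))

print(replicas_needed(600, 1 / 3600))   # 10-min task, ~1 eviction/hour -> 2
```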

5.
In recent decades, community groups have transformed habitat restoration, pest control and species translocations in New Zealand. Large areas of wild New Zealand benefit hugely from ongoing management by community‐based restoration groups. Areas near cities and towns have especially good access to pools of keen volunteers. Community groups are involved in monitoring progress with their work, as well as monitoring biodiversity changes in general at their project sites. New tools powered by modern technologies are creating the opportunity for New Zealand's community volunteers to play a transformative role in biodiversity monitoring for either purpose. These tools are reducing the resources and expertise required for species detection and identification. Smartphones with cameras, GPS, audio recorders and data apps make it easier than ever to record species observations. Crowd‐sourced identification of species in photographs and sounds loaded onto NatureWatch NZ allows volunteers to make observations of a much wider range of taxa than just common birds and trees. Realising this potential requires community groups, scientists and their institutions to collaborate in building and maintaining simple, accessible monitoring systems that (i) require and promote standard monitoring methods, (ii) provide efficient data entry in standard formats, (iii) generate automated results of use to community groups and (iv) facilitate public sharing of data to contribute to regional, national and global biodiversity monitoring. Some New Zealand monitoring systems developed recently to assist community‐based restoration groups with monitoring mammalian predator control are good examples of this approach. Making this happen at a large scale across many community groups and taxa requires increased and coordinated long‐term institutional support for monitoring systems and training.

6.
Genotoxicity testing is an important component of toxicity assessment. As illustrated by the European Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) regulation, it concerns all chemicals used in industry. The commonly used in vivo mammalian tests appear ill adapted to tackle the large compound sets involved, due to throughput, cost, and ethical issues. The somatic mutation and recombination test (SMART) represents a more scalable alternative, since it uses Drosophila, which develops faster and requires less infrastructure. Despite these advantages, the manual scoring of hairs on Drosophila wings required for the SMART limits its usage. To overcome this limitation, we have developed an automated SMART readout. It consists of automated imaging, followed by an image analysis pipeline that measures individual wing genotoxicity scores. Finally, we have developed a wing score-based dose-dependency approach that can provide genotoxicity profiles. We validated the method using six compounds, obtaining profiles almost identical to those obtained from manual measures, even for low-genotoxicity compounds such as urethane. The automated SMART, with its faster and more reliable readout, fulfills the need for a high-throughput in vivo test. The flexible imaging strategy we describe and the analysis tools we provide should facilitate the optimization and dissemination of our methods.
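A minimal sketch of the kind of readout such a pipeline performs: threshold a normalized grayscale wing image, count mutant-clone components, and summarize per-wing counts at each dose into a profile. The threshold and scoring rule are illustrative assumptions, not the published pipeline.

```python
# Per-wing spot counting plus a simple dose-dependency summary.
import numpy as np
from scipy import ndimage

def wing_score(img, threshold=0.35):
    """Count dark mutant clones in a normalized grayscale wing image."""
    _, n_spots = ndimage.label(img < threshold)
    return n_spots

def dose_profile(images_by_dose):
    """Mean spot count per wing at each dose -> a genotoxicity profile."""
    return {dose: float(np.mean([wing_score(im) for im in imgs]))
            for dose, imgs in sorted(images_by_dose.items())}
```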

7.
Bioinformatics tools for gene and protein sequence analysis have become an integral part of biology in the post-genomic era. Release of the Plasmodium falciparum genome sequence has allowed biologists to define the parasite's gene and predicted protein content as well as their sequences. Using pI and molecular weight as identifying characteristics of each protein, we have developed a bioinformatics tool to aid identification of proteins from Plasmodium falciparum. The tool makes use of a virtual 2-DE generated by plotting all of the proteins from the Plasmodium database on a pI versus molecular weight scale. Proteins are identified by comparing the migration position of protein spots of interest on an experimental 2-DE gel with their position on the virtual 2-DE. The procedure has been automated in the form of user-friendly software called "Plasmo2D". The tool can be downloaded from http://144.16.89.25/Plasmo2D.zip.
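A sketch of the virtual 2-DE idea using Biopython's ProtParam: compute pI and molecular weight for each predicted protein, then match an experimental spot by proximity on the pI-versus-MW plane. The matching tolerances are illustrative choices, not Plasmo2D's actual rule.

```python
# Build a virtual 2-DE from predicted protein sequences and match spots.
from Bio.SeqUtils.ProtParam import ProteinAnalysis

def virtual_2de(proteins):
    """proteins: {protein_id: amino acid sequence} -> {id: (pI, MW)}."""
    coords = {}
    for pid, seq in proteins.items():
        pa = ProteinAnalysis(seq)
        coords[pid] = (pa.isoelectric_point(), pa.molecular_weight())
    return coords

def match_spot(coords, spot_pi, spot_mw, pi_tol=0.5, mw_tol=0.05):
    """Candidate proteins within tolerance of an experimental spot."""
    return [pid for pid, (pi, mw) in coords.items()
            if abs(pi - spot_pi) <= pi_tol
            and abs(mw - spot_mw) / spot_mw <= mw_tol]
```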

8.
During a meeting of the SYSGENET working group 'Bioinformatics', currently available software tools and databases for systems genetics in mice were reviewed and the needs for future development were discussed. The group evaluated interoperability and performed initial feasibility studies. To aid future compatibility of software and the exchange of already developed software modules, the group strongly recommended integrating the HAPPY and R/qtl analysis toolboxes, the GeneNetwork and XGAP database platforms, and the TIQS and xQTL processing platforms. R should be used as the principal computer language for QTL data analysis in all platforms, and a 'cloud' should be used for software dissemination to the community. Furthermore, the working group recommended that all data models and software source code be made visible in public repositories to allow a coordinated effort on the use of common data structures and file formats.

9.
With continued efforts towards a single MSI data format, data conversion routines must be made universally available. The benefits of a common imaging format, imzML, are slowly becoming more widely appreciated, but the format is still used by only a small proportion of imaging groups. Increased awareness among researchers and continued support from major MS vendors in providing tools for converting proprietary formats into imzML are likely to result in rapidly increasing uptake of the format. It is important that this does not lead to the exclusion of researchers using older or unsupported instruments. We describe an open-source converter, imzMLConverter, to guard against this. We propose that proprietary formats first be converted to mzML using one of the widely available converters, such as msconvert, and that imzMLConverter then be used to convert mzML to imzML. This will allow a wider audience to benefit from the imzML format immediately.
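The proposed two-step route, sketched as a small pipeline script. msconvert is ProteoWizard's converter; the imzMLConverter invocation below (jar name and arguments) is an assumption, so consult the converter's documentation for the real command line.

```python
# Two-step conversion: vendor format -> mzML -> imzML.
import os
import subprocess

def to_imzml(raw_file, outdir="."):
    # Step 1: vendor format -> mzML with ProteoWizard's msconvert.
    subprocess.run(["msconvert", raw_file, "--mzML", "-o", outdir], check=True)
    base = os.path.splitext(os.path.basename(raw_file))[0]
    mzml = os.path.join(outdir, base + ".mzML")
    # Step 2: mzML -> imzML (hypothetical invocation; check the docs).
    subprocess.run(["java", "-jar", "imzMLConverter.jar", mzml], check=True)
```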

10.
BACKGROUND: The annotation of genomes from next-generation sequencing platforms needs to be rapid, high-throughput, and fully integrated and automated. Although a few Web-based annotation services have recently become available, they may not be the best solution for researchers who need to annotate a large number of genomes, possibly including proprietary data, and store them locally for further analysis. To address this need, we developed a standalone software application, the Annotation of microbial Genome Sequences (AGeS) system, which incorporates publicly available and in-house-developed bioinformatics tools and databases, many of which are parallelized for high-throughput performance. METHODOLOGY: The AGeS system supports three main capabilities. The first is the storage of input contig sequences and the resulting annotation data in a central, customized database. The second is the annotation of microbial genomes using an integrated software pipeline, which first analyzes contigs from high-throughput sequencing by locating genomic regions that code for proteins, RNA, and other genomic elements through the Do-It-Yourself Annotation (DIYA) framework. The identified protein-coding regions are then functionally annotated using the in-house-developed Pipeline for Protein Annotation (PIPA). The third capability is the visualization of annotated sequences using GBrowse. To date, we have implemented these capabilities for bacterial genomes. AGeS was evaluated by comparing its genome annotations with those provided by three other methods. Our results indicate that the software tools integrated into AGeS provide annotations in general agreement with those of the compared methods, as demonstrated by a >94% overlap in the number of identified genes, a significant number of identical annotated features, and a >90% agreement in enzyme function predictions.
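A sketch of the kind of agreement check behind the reported figures: given the gene sets and functional labels produced by two annotation methods, compute the gene overlap fraction and the share of identical labels among shared genes. The data structures are illustrative.

```python
# Compare two genome annotations: gene overlap and functional agreement.
def annotation_agreement(genes_a, genes_b, functions_a, functions_b):
    """genes_*: iterables of gene IDs; functions_*: {gene_id: label}."""
    shared = set(genes_a) & set(genes_b)
    overlap = len(shared) / max(len(set(genes_a)), len(set(genes_b)))
    both = [g for g in shared if g in functions_a and g in functions_b]
    same = sum(functions_a[g] == functions_b[g] for g in both)
    return overlap, (same / len(both) if both else 0.0)
```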

11.
Multivariate data analysis has been combined with proteomics to enhance the recovery of information from 2-DE of cod muscle proteins under different storage conditions. Proteins were extracted from samples stored under 11 different conditions and resolved by 2-DE. The 2-DE data were subjected to principal component analysis (PCA) and discriminant partial least squares regression (DPLSR). PCA of the 2-DE data revealed that the samples formed groups according to frozen storage time, whereas different storage temperatures or chilled storage in modified-atmosphere packaging did not lead to distinct changes in the protein pattern. Applying DPLSR to the 2-DE data enabled the selection of protein spots critical for differentiating 3 and 6 months of frozen storage from 12 months of frozen storage. Some of these protein spots were identified by MS/MS, revealing that myosin light chains 1, 2 and 3, triose-phosphate isomerase, glyceraldehyde-3-phosphate dehydrogenase, aldolase A, two alpha-actin fragments, and a nucleoside diphosphate kinase B fragment change in concentration during frozen storage. Applying proteomics, multivariate data analysis and MS/MS to protein changes in cod muscle during storage has yielded new knowledge on the issue and enables a better understanding of the underlying biochemical processes.
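A minimal sketch of the chemometric workflow on toy data: PCA for unsupervised grouping of spot volumes, then discriminant PLS implemented as PLS regression against dummy-coded classes, with spots ranked by their weight in the PLS components. Matrix shapes and class labels are illustrative.

```python
# PCA + discriminant PLS on a gels-by-spots matrix.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.random((11, 500))               # 11 gels x 500 spot volumes (toy data)
y = np.array([0]*4 + [1]*4 + [2]*3)     # hypothetical storage-time classes

scores = PCA(n_components=2).fit_transform(X)   # unsupervised sample grouping
Y = np.eye(3)[y]                                # dummy-coded classes for DPLSR
pls = PLSRegression(n_components=2).fit(X, Y)
ranking = np.abs(pls.x_weights_).sum(axis=1)    # spot influence on components
top_spots = np.argsort(ranking)[::-1][:10]      # candidate discriminating spots
print(top_spots)
```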

12.
13.
Muller PY, Studer E, Miserez AR. BioTechniques 2001, 31(6):1306-1313.
In all fields of molecular biology, researchers are increasingly challenged by experiments planned and evaluated on the basis of nucleic acid and protein sequence data, generally retrieved from public databases. Despite the wide spectrum of available Web-based software tools for sequence analysis, their routine use has disadvantages, particularly the elaborate and heterogeneous ways of handling data input, output, and storage. Here we present a Microsoft Word add-in written in Visual Basic, the Molecular BioComputing Suite (MBCS), available at the BioTechniques Software Library (www.BioTechniques.com). The MBCS software aims to manage and expedite a wide range of sequence analyses and manipulations within an integrated text editor environment, including menu-guided commands. Its independence from sequence formats enables MBCS to serve as a pivotal application between other software tools for sequence analysis, manipulation, annotation, and editing.

14.
Despite the success of several international initiatives, the glycosciences still lack a managed infrastructure that contributes to the advancement of research through the provision of comprehensive structural and experimental glycan data collections. UniCarbKB is an initiative that aims to create an online information storage and search platform for glycomics and glycobiology research. The knowledgebase will offer a freely accessible, information-rich resource supported by querying interfaces, annotation technologies and the adoption of common standards to integrate structural, experimental and functional data. The UniCarbKB framework endeavors to support the growth of glycobioinformatics and the dissemination of knowledge through an open, unified portal that encourages the sharing of data. To achieve this, the framework is committed to developing tools and procedures that support data annotation and to expanding interoperability through cross-referencing of existing databases. Database URL: http://www.unicarbkb.org.

15.
Over recent years, a number of initiatives have proposed standard reporting guidelines for functional genomics experiments. Associated with these are data models that may be used as the basis for the design of software tools that store and transmit experiment data in standard formats. Central to the success of such data handling tools is their usability. Successful data handling tools are expected to yield benefits in time saving and quality assurance. Here, we describe the collection of datasets that conform to the recently proposed data model for plant metabolomics known as ArMet (architecture for metabolomics) and illustrate a number of approaches to robust data collection that have been developed in collaboration between software engineers and biologists. These examples also serve to validate ArMet from the data collection perspective by demonstrating that a range of software tools, supporting data recording and data upload to central databases, can be built using the data model as the basis of their design.
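An illustration of the design idea that data-entry structures derived directly from the data model yield records that are valid by construction. The fields below are a simplified, hypothetical subset, not the actual ArMet schema.

```python
# Model-driven data collection sketch: records mirror the data model.
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class SampleDescription:
    organism: str
    tissue: str
    harvest_date: date

@dataclass
class MetabolomeExperiment:
    experiment_id: str
    protocol: str
    samples: List[SampleDescription] = field(default_factory=list)

    def validate(self) -> None:
        # A model-level constraint enforced at collection time (illustrative).
        assert self.samples, "at least one sample is required by the model"
```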

16.
The Human Proteome Organisation's Proteomics Standards Initiative (PSI) has developed GelML (gel electrophoresis markup language), a data exchange format for representing gel electrophoresis experiments performed in proteomics investigations. The format closely follows the reporting guidelines for gel electrophoresis, which form part of the Minimum Information About a Proteomics Experiment (MIAPE) set of modules. GelML supports the capture of metadata (such as experimental protocols) and data (such as gel images) resulting from gel electrophoresis, so that laboratories can be compliant with the MIAPE Gel Electrophoresis guidelines while allowing such data sets to be exchanged or downloaded from public repositories. The format is sufficiently flexible to capture data from a broad range of experimental processes, and it complements other PSI formats for MS data and for the results of protein and peptide identifications, allowing entire gel-based proteome workflows to be captured. GelML resulted from PSI's open standardisation process, consisting of both public consultation and anonymous review of the specifications.
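A sketch of emitting gel-electrophoresis metadata as XML so that protocols and image references travel together in one document. The element names here are simplified placeholders, not the actual GelML schema.

```python
# Writing simplified gel metadata as XML (placeholder elements, not GelML).
import xml.etree.ElementTree as ET

doc = ET.Element("GelExperiment")                 # placeholder root element
protocol = ET.SubElement(doc, "Protocol")
protocol.text = "2-DE, pH 3-10 IPG strip, 12% SDS-PAGE"
ET.SubElement(doc, "GelImage", uri="gels/sample01.png")  # image reference
ET.ElementTree(doc).write("experiment.xml",
                          xml_declaration=True, encoding="utf-8")
```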

17.
With the human Plasma Proteome Project (PPP) pilot phase completed, the largest and most ambitious proteomics experiment to date has reached its first milestone. The correspondingly impressive amount of data from this pilot project emphasized the need for a centralized dissemination mechanism and led to the development of a detailed, PPP-specific data gathering infrastructure at the University of Michigan, Ann Arbor, as well as the protein identifications database project at the European Bioinformatics Institute as a general proteomics data repository. One issue that cropped up while discussing which data to store for the PPP is whether the raw, binary data coming from the mass spectrometers should be stored, or rather the more compact and already significantly processed peak lists. As this debate is not restricted to the PPP but concerns the proteomics community in general, we attempt to detail the relative merits and caveats of centralized storage and dissemination of raw data and/or peak lists, building on the extensive experience gained during the PPP pilot phase. Finally, some suggestions are made for both immediate and future storage of MS data in public repositories.

18.

Background  

New "next generation" DNA sequencing technologies offer individual researchers the ability to rapidly generate large amounts of genome sequence data at dramatically reduced costs. As a result, a need has arisen for new software tools for storage, management and analysis of genome sequence data. Although bioinformatic tools are available for the analysis and management of genome sequences, limitations still remain. For example, restrictions on the submission of data and use of these tools may be imposed, thereby making them unsuitable for sequencing projects that need to remain in-house or proprietary during their initial stages. Furthermore, the availability and use of next generation sequencing in industrial, governmental and academic environments requires biologist to have access to computational support for the curation and analysis of the data generated; however, this type of support is not always immediately available.  相似文献   

19.
A broad range of mass spectrometers are used in mass spectrometry (MS)-based proteomics research. Each type of instrument possesses a unique design, data system and performance specifications, resulting in strengths and weaknesses for different types of experiments. Unfortunately, the native binary data formats produced by each type of mass spectrometer also differ and are usually proprietary. The diverse, nontransparent nature of the data structure complicates the integration of new instruments into preexisting infrastructure, impedes the analysis, exchange, comparison and publication of results from different experiments and laboratories, and prevents the bioinformatics community from accessing data sets required for software development. Here, we introduce the 'mzXML' format, an open, generic XML (extensible markup language) representation of MS data. We have also developed an accompanying suite of supporting programs. We expect that this format will facilitate data management, interpretation and dissemination in proteomics research.
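Reading an open-format file in practice, shown with the pyteomics library (one of several independent mzXML readers, not the toolset that accompanied the format's release); the file name is hypothetical.

```python
# Iterate over spectra in an mzXML file with pyteomics.
from pyteomics import mzxml

with mzxml.read("run01.mzXML") as spectra:       # hypothetical file name
    for spectrum in spectra:
        if spectrum["msLevel"] == 1:             # survey scans only
            print(spectrum["retentionTime"], len(spectrum["m/z array"]))
```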

20.
SUMMARY: The large amount of data produced by proteomics experiments requires effective bioinformatics tools for the integration of data management and data analysis. Here we introduce a suite of tools developed at Vanderbilt University to support production proteomics. We present the Backup Utility Service tool for automated instrument file backup and the ScanSifter tool for data conversion. We also describe a queuing system to coordinate identification pipelines and the File Collector tool for batch copying analytical results. These tools are individually useful but collectively reinforce each other. They are particularly valuable for proteomics core facilities or research institutions that need to manage multiple mass spectrometers. With minor changes, they could support other types of biomolecular resource facilities.
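A minimal sketch of the automated-backup idea behind a tool like the Backup Utility Service: copy instrument files that have finished writing to archive storage, skipping those already backed up. The paths, file extension and age check are illustrative assumptions, not the Vanderbilt implementation.

```python
# Copy settled instrument files to an archive directory, once each.
import shutil
import time
from pathlib import Path

def backup_new_files(instrument_dir, archive_dir, min_age_s=300):
    archive = Path(archive_dir)
    for f in Path(instrument_dir).glob("*.raw"):
        settled = time.time() - f.stat().st_mtime > min_age_s  # not still being written
        if settled and not (archive / f.name).exists():
            shutil.copy2(f, archive / f.name)
```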
