Similar Documents
20 similar documents found.
1.
Bioinformatics support for high-throughput proteomics   (cited 2 times: 0 self-citations, 2 by others)
In the "post-genome" era, mass spectrometry (MS) has become an important method for the analysis of proteome data. The rapid advancement of this technique in combination with other methods used in proteomics results in an increasing number of high-throughput projects. This leads to an increasing amount of data that needs to be archived and analyzed.To cope with the need for automated data conversion, storage, and analysis in the field of proteomics, the open source system ProDB was developed. The system handles data conversion from different mass spectrometer software, automates data analysis, and allows the annotation of MS spectra (e.g. assign gene names, store data on protein modifications). The system is based on an extensible relational database to store the mass spectra together with the experimental setup. It also provides a graphical user interface (GUI) for managing the experimental steps which led to the MS data. Furthermore, it allows the integration of genome and proteome data. Data from an ongoing experiment was used to compare manual and automated analysis. First tests showed that the automation resulted in a significant saving of time. Furthermore, the quality and interpretability of the results was improved in all cases.  相似文献   

2.
3.
The effective extraction of information from multidimensional data sets derived from phenotyping experiments is a growing challenge in biology. Data visualization tools are important resources that can aid in exploratory analysis of complex data sets. Phenotyping experiments on model organisms produce data sets in which a large number of phenotypic measures are collected for each individual in a group. A critical initial step in the analysis of such multidimensional data sets is the exploratory analysis of data distribution and correlation. To facilitate rapid visualization and exploratory analysis of multidimensional complex trait data, we developed a user-friendly, web-based software tool called Phenostat. Phenostat is composed of a dynamic graphical environment that allows the user to inspect the distribution of multiple variables in a data set simultaneously. Individuals can be selected by clicking directly on the graphs, which displays their identity, highlights the corresponding values in all graphs, and allows their inclusion in or exclusion from the analysis. Statistical analysis is provided by R package functions. Phenostat is particularly suited for rapid distribution and correlation analysis of subsets of data. An analysis of behavioral and physiologic data from a large mouse phenotyping experiment using Phenostat reveals previously unsuspected correlations. Phenostat is freely available to academic institutions and nonprofit organizations and can be used from our website.

4.
Protein microarray technology is rapidly growing and has the potential to accelerate the discovery of targets of serum antibody responses in cancer, autoimmunity, and infectious disease. Analytical tools for interpreting such high-throughput array data, however, are not well established. We developed a concentration-dependent analysis (CDA) method that normalizes protein microarray data based on the concentration of spotted probes. We show that this analysis samples a data space complementary to other commonly employed analyses, and we demonstrate experimental validation of 92% of the hits identified by the intersection of CDA with other tools. These data support the use of CDA either as a preprocessing step for a more complete proteomic microarray data analysis or as a stand-alone analysis method.
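One simple way to realize a concentration-dependent normalization is to regress spot intensity on log probe concentration and score spots by their residuals. The sketch below assumes that form purely for illustration; the published CDA method may differ in detail.

```python
# Toy concentration-dependent normalization: regress signal on log probe
# concentration and flag spots with large positive residuals (assumed form).
import numpy as np

rng = np.random.default_rng(0)
conc = rng.uniform(5, 500, 200)                        # spotted probe concentrations
signal = 0.8 * np.log(conc) + rng.normal(0, 0.3, 200)  # concentration-driven baseline
signal[:5] += 2.0                                      # five spots with true antibody signal

slope, intercept = np.polyfit(np.log(conc), signal, 1)
residual = signal - (slope * np.log(conc) + intercept)

hits = np.where(residual > 3 * residual.std())[0]      # concentration-adjusted hits
print("candidate hits:", hits)
```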

5.
Metagenomic sequencing has produced significant amounts of data in recent years. For example, as of summer 2013, MG-RAST had been used to annotate over 110,000 data sets totaling over 43 terabases. With metagenomic sequencing finding ever wider adoption in the scientific community, the existing web-based analysis tools and infrastructure in MG-RAST provide limited capability for data retrieval and analysis, such as comparative analysis between multiple data sets. Moreover, although the system provides many analysis tools, it is not comprehensive. By opening MG-RAST up via a web services API (application programming interface) we have greatly expanded access to MG-RAST data and provided a mechanism for using third-party analysis tools with MG-RAST data. This RESTful (Representational State Transfer) API, implemented as part of the DOE Systems Biology Knowledgebase project (KBase, http://kbase.us), makes all data and data objects created by the MG-RAST pipeline accessible as JSON objects. It complements the existing MG-RAST web interface, constitutes the basis of KBase's microbial community capabilities, and exposes a comprehensive collection of data to programmers. The API is compatible with most programming environments and should be easy to use for end users and third parties. It provides comprehensive access to sequence data, quality control results, annotations, and many other data types, and where feasible we have used standards to expose data and metadata. Code examples are provided in a number of languages, both to show the versatility of the API and to provide a starting point for users. We present an API that exposes the data in MG-RAST for consumption by our users, greatly enhancing the utility of the MG-RAST service.
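A hedged sketch of what consuming such a RESTful JSON API looks like from Python follows; the base URL and resource path are assumptions for illustration, and the real endpoints should be taken from the MG-RAST API documentation.

```python
# Fetch one MG-RAST data object as JSON over the RESTful API.
# NOTE: base URL and resource path are assumptions for illustration;
# consult the MG-RAST API documentation for the real endpoints.
import requests

BASE = "https://api.mg-rast.org/1"      # assumed base URL
metagenome_id = "mgm4447943.3"          # hypothetical accession

resp = requests.get(f"{BASE}/metagenome/{metagenome_id}", timeout=30)
resp.raise_for_status()
record = resp.json()                    # pipeline objects are exposed as JSON

# JSON maps onto plain Python dicts, so third-party tools need no special client.
for key in sorted(record):
    print(key, type(record[key]).__name__)
```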

6.
Beharav A, Nevo E. Genetica 2003, 119(3):259-267
We examined the predictive validity of results obtained when discriminant analysis is used to distinguish statistically among two or more populations with a large sample of random amplified polymorphic DNA (RAPD) loci but a small sample of genotypes from each population. We compared results from randomized data with results from real data of three studies, using 100 randomized shufflings of genotypes into the various populations. We generally found substantial differences between randomized and real data across several characteristics of the discriminant analysis. We showed that a high percentage of correct classification is also obtainable from randomized data, mainly when the number of populations is low. However, the correct classification percentage obtained from the real data was generally significantly higher than that obtained from the randomized data. We suggest that the large real differences in allele frequencies at the polymorphic RAPD loci clearly distinguished the various populations, and that the populations differ significantly in their RAPD contents in accordance with ecological heterogeneity. We found either no or only a small difference between the correct classification rate obtained by the leaving-one-out procedure and that obtained from the original data, attributable to the low number of loci selected by the stepwise method. These results strengthen and support our conclusions and lead us to base the discriminant analysis on only a low number of discriminating variables.
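The randomization check the authors describe can be sketched as a permutation test: compare the leaving-one-out classification rate of a discriminant analysis on the real population labels against rates from 100 shufflings of genotypes into populations. The sketch below uses simulated RAPD-like data and scikit-learn, not the authors' original code.

```python
# Simulated permutation check: real vs shuffled population labels.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(1)
n_pop, n_per_pop, n_loci = 3, 10, 50            # few genotypes, many RAPD loci
freqs = rng.uniform(0.2, 0.8, (n_pop, n_loci))  # distinct allele frequencies
X = np.vstack([rng.binomial(1, freqs[p], (n_per_pop, n_loci))
               for p in range(n_pop)])
y = np.repeat(np.arange(n_pop), n_per_pop)

def loo_rate(labels):
    """Leaving-one-out correct classification rate for the given labels."""
    lda = LinearDiscriminantAnalysis()
    return cross_val_score(lda, X, labels, cv=LeaveOneOut()).mean()

real = loo_rate(y)
null = [loo_rate(rng.permutation(y)) for _ in range(100)]
print(f"real labels: {real:.2f}, randomized mean: {np.mean(null):.2f}")
print("P(randomized >= real):", np.mean([r >= real for r in null]))
```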

7.
Biological measurements frequently involve recording parameters as a function of time, space, or frequency. Later, during the analysis phase of a study, the researcher splits the recorded data trace into smaller sections, analyzes each section separately by finding a mean or fitting against a specified function, and uses the analysis results in the study. Here, we present software that allows these data traces to be analyzed in a manner that ensures repeatability of the analysis and simplifies the application of FAIR (findability, accessibility, interoperability, and reusability) principles in such studies. At the same time, it streamlines the routine data analysis pipeline and gives fast access to an overview of the analysis results. For that, the software supports reading the raw data, processing the data as specified in the protocol, and storing all intermediate results in the laboratory database. The software can be extended by study- or hardware-specific modules to provide the required data import and analysis facilities. To simplify the development of data entry web interfaces that can be used to enter data describing the experiments, we released a web framework with an example implementation of such a site. The software is covered by an open-source license and is available through several online channels.
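The core analysis step the abstract describes, splitting a trace into protocol-defined sections and reducing each one to fitted parameters, can be sketched as follows; the section boundaries and the fitted function (a line) are invented for illustration.

```python
# Split a recorded trace into protocol-defined sections and reduce each one to
# fitted parameters; boundaries and the fitted function are invented here.
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 60, 6000)                         # 60 s recording
trace = np.sin(0.5 * t) + rng.normal(0, 0.05, t.size)

sections = [(0, 20), (20, 40), (40, 60)]             # seconds, from the protocol
results = []
for t0, t1 in sections:
    m = (t >= t0) & (t < t1)
    slope, intercept = np.polyfit(t[m], trace[m], 1)
    # keeping every intermediate result makes the analysis repeatable (FAIR)
    results.append({"t0": t0, "t1": t1, "mean": float(trace[m].mean()),
                    "slope": float(slope), "intercept": float(intercept)})

for row in results:
    print(row)
```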

8.
Next-generation sequencing (NGS) technology is revolutionizing the fields of population genetics, molecular ecology, and conservation biology, but it can be challenging for researchers to learn the new and rapidly evolving techniques required to use NGS data. A recent workshop entitled ‘Population Genomic Data Analysis’ was held to provide training in conceptual and practical aspects of data production and analysis for population genomics, with an emphasis on NGS data analysis. The workshop brought together 16 instructors who were experts in the field of population genomics and 31 student participants. Instructors provided helpful and often entertaining advice on how to choose and use an NGS method for a given research question, and on critical aspects of NGS data production and analysis such as library preparation, filtering to remove sequencing errors and outlier loci, and genotype calling. In addition, instructors offered general advice about how to approach population genomics data analysis and how to build a career in science. The overarching messages of the workshop were that NGS data analysis should be approached with a keen understanding of the theoretical models underlying the analyses, and with analyses tailored to each research question and project. When analysed carefully, NGS data provide extremely powerful tools for answering crucial questions in disciplines ranging from evolution and ecology to conservation and agriculture, including questions that could not be answered before NGS technology was developed.

9.
High-throughput genomic data provide an opportunity for identifying pathways and genes that are related to various clinical phenotypes. Besides these genomic data, another valuable source is biological knowledge about the genes and pathways that might be related to the phenotypes of many complex diseases; databases of such knowledge are often called metadata. In microarray data analysis, such metadata are currently explored in post hoc ways by gene set enrichment analysis but have hardly been utilized in the modeling step. We propose and evaluate a pathway-based gradient descent boosting procedure for nonparametric pathway-based regression (NPR) analysis that efficiently integrates genomic data and metadata. NPR models consider multiple pathways simultaneously, allow complex interactions among genes within the pathways, and can be applied to identify pathways and genes related to variation in the phenotypes. These methods also provide a way of mitigating the problem of a large number of potential interactions, by limiting analysis to biologically plausible interactions between genes in related pathways. Our simulation studies indicate that the proposed boosting procedure can indeed identify relevant pathways. Application to a gene expression data set on breast cancer distant metastasis indicated that the Wnt, apoptosis, and cell cycle-regulated pathways are most likely related to the risk of distant metastasis among lymph-node-negative breast cancer patients. Results from analysis of two other breast cancer gene expression data sets indicate that the pathways of metalloendopeptidases (MMPs) and MMP inhibitors, as well as cell proliferation, cell growth, and maintenance, are important to breast cancer relapse and survival. We also observed that incorporating the pathway information yields better prediction of cancer recurrence.
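A loose sketch of the idea of pathway-based boosting is given below: at each step a least-squares base learner restricted to one pathway's genes is fitted to the residuals, and the best-fitting pathway is added with shrinkage. This is an assumed simplification for illustration; the published NPR boosting procedure is more elaborate.

```python
# Toy pathway-based boosting: each base learner is a least-squares fit on one
# pathway's genes; the best pathway per step is added with shrinkage nu.
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 30
X = rng.normal(size=(n, p))
pathways = {"Wnt": [0, 1, 2], "apoptosis": [3, 4, 5], "cell_cycle": [6, 7, 8]}
y = X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, n)    # only "Wnt" genes matter

def ls_beta(cols, r):
    beta, *_ = np.linalg.lstsq(X[:, cols], r, rcond=None)
    return beta

def sse(cols, r):
    return np.sum((r - X[:, cols] @ ls_beta(cols, r)) ** 2)

F, resid, nu, chosen = np.zeros(n), y.copy(), 0.1, []
for _ in range(50):
    best = min(pathways, key=lambda k: sse(pathways[k], resid))
    F += nu * X[:, pathways[best]] @ ls_beta(pathways[best], resid)
    resid = y - F
    chosen.append(best)

# Pathways selected most often are flagged as related to the phenotype.
print({k: chosen.count(k) for k in pathways})
```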

10.
Inefficient coding and manipulation of pedigree data have often hindered the progress of genetic studies. In this paper we present a methodology for interfacing a database management system (DBMS) called MEGADATS with a linkage analysis program called LIPED. Two families segregating a dominant trait and one test marker were used in a simulated exercise to demonstrate how a DBMS can automate tedious clerical steps and improve the efficiency of a genetic analysis. The merits of this approach to data management are discussed. We conclude that a standardized format for genetic analysis programs would greatly facilitate data analysis.

11.
ESTAP--an automated system for the analysis of EST data   (cited 2 times: 0 self-citations, 2 by others)
The EST Analysis Pipeline (ESTAP) is a set of analytical procedures that automatically verify, cleanse, store, and analyze ESTs generated on high-throughput platforms. It uses a relational database to store sequence data and analysis results, which facilitates both the search for specific information and statistical analysis. ESTAP provides for easy viewing of the original and cleansed data, as well as the analysis results, via a web browser. It also allows the data owner to submit selected sequences to dbEST in a semi-automated fashion.

12.
Flow cytometry (FCM) is an analytical tool widely used in cancer and HIV/AIDS research and treatment, stem cell manipulation, and the detection of microorganisms in environmental samples. Current data standards do not capture the full scope of FCM experiments, and there is a demand for software tools that can assist in the exploration and analysis of large FCM datasets. We are implementing a standardized approach to capturing, analyzing, and disseminating FCM data that will facilitate both more complex analyses and analysis of datasets that could not previously be studied efficiently. Initial work has focused on developing a community-based guideline for recording and reporting the details of FCM experiments. Open source software tools that implement this standard are being created, with an emphasis on facilitating reproducible and extensible data analyses. In addition, tools for electronic collaboration will assist the integrated access and comprehension of experiments, empowering users to collaborate on FCM analyses. This coordinated, joint development of bioinformatics standards and software tools for FCM data analysis has the potential to greatly facilitate both basic and clinical research, with impact on a notably diverse range of medical and environmental research areas.

13.
Davey HM, Jones A, Shaw AD, Kell DB. Cytometry 1999, 35(2):162-168
BACKGROUND: When exploited fully, flow cytometry can provide multiparametric data for each cell in the sample of interest. While this makes flow cytometry a powerful technique for discriminating between different cell types, the data can be difficult to interpret. Traditionally, dual-parameter plots are used to visualize flow cytometric data; for a data set consisting of seven parameters, one must examine 21 such plots. A more efficient approach is to reduce the dimensionality of the data (e.g., using unsupervised methods such as principal components analysis) so that fewer graphs need to be examined, or to use supervised multivariate data analysis methods to predict the identity of the analyzed particles. MATERIALS AND METHODS: We collected multiparametric data sets for microbiological samples stained with six cocktails of fluorescent stains. Multivariate data analysis methods were explored as a means of microbial detection and identification. RESULTS: We show that while all cocktails and all methods gave good predictive accuracy (>94%), careful selection of both the stains and the analysis method improved this figure (to >99% accuracy), even on a data set that was not used in building the supervised multivariate calibration model. CONCLUSIONS: Flow cytometry provides a rapid method of obtaining multiparametric data for distinguishing between microorganisms. Multivariate data analysis methods have an important role to play in extracting information from these data. Artificial neural networks proved to be the most suitable method of data analysis.
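The unsupervised route mentioned in the background, projecting seven-parameter events onto a few principal components so that one scatter plot replaces the 21 pairwise plots, can be sketched as follows with simulated data.

```python
# Project simulated seven-parameter cytometry events onto two principal
# components, replacing the 21 pairwise dual-parameter plots with one view.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
pop_a = rng.normal(0.0, 1.0, (500, 7))     # two simulated microbial populations
pop_b = rng.normal(1.5, 1.0, (500, 7))
events = np.vstack([pop_a, pop_b])

pca = PCA(n_components=2)
scores = pca.fit_transform(events)
print("variance explained:", pca.explained_variance_ratio_)
# scores[:, 0] vs scores[:, 1] can now be inspected in a single plot
```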

14.
It is well known that significant metabolic changes take place as cells are transformed from normal to malignant. This review focuses on the use of different bioinformatics tools in cancer metabolomics studies. The article begins by describing different metabolomics technologies and data generation techniques. An overview of data pre-processing techniques is provided, and multivariate data analysis techniques are discussed and illustrated with case studies, including principal component analysis, clustering techniques, self-organizing maps, partial least squares, and discriminant function analysis. Also included is a discussion of available software packages.

15.
As pharmacological data sets become increasingly large and complex, new visual analysis and filtering programs are needed to aid their appreciation. One of the most commonly used methods for visualizing biological data is the Venn diagram, but currently available Venn analysis software often presents problems for biological scientists, in that only a limited number of simultaneous data sets can be analyzed. An improved appreciation of the connectivity between multiple, highly complex data sets is crucial for the next generation of analysis of genomic and proteomic data streams. We describe the development of VENNTURE, a program that facilitates visualization of up to six data sets in a user-friendly manner. The program includes versatile output features, whereby grouped data points can easily be exported into a spreadsheet. To demonstrate its experimental utility we applied VENNTURE to a highly complex parallel paradigm: comparison of dose-dependent G protein-coupled receptor phosphoproteomic data across multiple cellular physiological contexts. VENNTURE reliably and simply dissected six complex data sets into easily identifiable groups for straightforward analysis and data output, and its features and ease of analysis are a clear advance over currently available Venn diagram programs. VENNTURE enabled the delineation of highly complex patterns of dose-dependent G protein-coupled receptor activity and its dependence on physiological cellular context. This study highlights the potential for such a program in fields such as pharmacology, genomics, and bioinformatics.
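Underlying any six-set Venn analysis is the computation of the 2^6 - 1 = 63 exclusive intersection regions and their export to a spreadsheet-readable file. The sketch below shows that set logic on toy data; VENNTURE's actual output format will differ.

```python
# Compute the 63 exclusive regions of a six-set Venn and export them to CSV.
import csv
import random
from itertools import combinations

random.seed(0)
sets = {name: set(random.sample(range(100), 40)) for name in "ABCDEF"}  # toy data

names = list(sets)
rows = []
for r in range(1, len(names) + 1):
    for combo in combinations(names, r):
        inside = set.intersection(*(sets[n] for n in combo))
        rest = [sets[n] for n in names if n not in combo]
        region = inside - set().union(*rest)       # exclusive to this combo
        rows.append({"region": "&".join(combo), "n": len(region),
                     "members": ";".join(map(str, sorted(region)))})

with open("venn_regions.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["region", "n", "members"])
    writer.writeheader()
    writer.writerows(rows)
print(f"wrote {len(rows)} regions")                # 2**6 - 1 = 63
```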

16.
Despite impressive advances in the application of computer image analysis to cytology, many of the identification tasks that cytologists are called on to perform remain refractory to automated image analysis. The major reason is that a large fraction of these images, though simple for a human to deal with, are too complex to yield to current image analysis methodologies. It may be years before automated computer image analysis is reduced to clinical practicality, and even then it is not clear that all cytologic image analyses will prove amenable to automation. In the meantime, semiautomatic image analysis (computer-aided microscopy) can provide a viable alternative, especially for persistently difficult image analysis problems. In semiautomatic image analysis, the onerous tasks of data acquisition--e.g., stage movement, data entry, and storage--are left to the computer, while the decision-making tasks--e.g., identifying a cell's morphologic class--are left to the observer. Such a system proves easy and flexible to use as well as economical to build. It can also provide a reliable database for the later evaluation of fully automated systems as they are developed. One such semiautomatic system, the Image Combining Computer Microscope (ICCM), is described, and the range of its application is illustrated. Examples of ICCM applications discussed include neuronal cell plots, three-dimensional dendrite tracking, serial section reconstruction of axons, and mapping of plaques and tangles in Alzheimer's disease. They illustrate how powerful a semiautomated system can be in handling complex image analysis problems. It is suggested that semiautomated image analysis provides a viable long-range alternative for many cytologic image analysis problems.

17.

Background

Over recent years there has been a strong movement towards the improvement of vital statistics and other types of health data that inform evidence-based policies. Collecting such data is not cost-free. To date there is no systematic framework to guide investment decisions on methods of data collection for vital statistics or health information in general. We developed a framework to systematically assess the comparative costs and outcomes/benefits of the various data collection methods (DCMs) for vital statistics.

Methodology

The proposed framework is four-pronged and utilises two major economic approaches to systematically assess the available data collection methods: cost-effectiveness analysis and efficiency analysis. We built a stylised example of a hypothetical low-income country to perform a simulation exercise in order to illustrate an application of the framework.

Findings

Using simulated data, the results from the stylised example show that the rankings of the data collection methods are not affected by the use of either cost-effectiveness or efficiency analysis. However, the rankings are affected by how quantities are measured.

Conclusion

There have been several calls for global improvements in collecting usable data, including vital statistics, from health information systems to inform public health policies. Ours is the first study to propose a systematic framework to assist countries in undertaking an economic evaluation of DCMs. Despite numerous challenges, we demonstrate that a systematic assessment of the outputs and costs of DCMs is not only necessary but also feasible. The proposed framework is general enough to be easily extended to other areas of health information.
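A stylised sketch of the framework's cost-effectiveness comparison, echoing the paper's simulation exercise, is shown below; the DCMs, costs, and output quantities are hypothetical.

```python
# Rank hypothetical data collection methods (DCMs) by cost per unit of output.
dcms = {
    "civil registration":  {"cost": 120_000, "deaths_recorded": 40_000},
    "sample registration": {"cost": 30_000,  "deaths_recorded": 8_000},
    "household survey":    {"cost": 55_000,  "deaths_recorded": 10_000},
}

for d in dcms.values():
    d["cost_per_output"] = d["cost"] / d["deaths_recorded"]

# As the simulation found, rankings can flip when the output quantity is
# measured differently -- the framework is meant to expose that sensitivity.
for name, d in sorted(dcms.items(), key=lambda kv: kv[1]["cost_per_output"]):
    print(f"{name}: {d['cost_per_output']:.2f} cost units per death recorded")
```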

18.
Biophysical Journal 2022, 121(15):2830-2839
Optical tweezers are a single-molecule technique that allows probing of intra- and intermolecular interactions governing complex biological processes involving molecular motors, protein-nucleic acid interactions, and protein/RNA folding. Recent developments in instrumentation have eased and accelerated optical tweezers data acquisition, but analysis of the data remains challenging. Here, to enable high-throughput data analysis, we developed an automated Python-based analysis pipeline called POTATO (practical optical tweezers analysis tool). POTATO automatically processes the high-frequency raw data generated by force-ramp experiments and identifies (un)folding events using predefined parameters. After segmentation of the force-distance trajectories at the identified (un)folding events, sections of the curve can be fitted independently to worm-like chain and freely jointed chain models, and the work applied to the molecule can be calculated by numerical integration. Furthermore, the tool allows plotting of constant-force data and fitting of the Gaussian distance distribution over time. All these features are wrapped in a user-friendly graphical interface, which allows researchers without programming knowledge to perform sophisticated data analysis.
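One step of such a pipeline, fitting a force-extension section to a worm-like chain model, can be sketched with the Marko-Siggia interpolation formula; the data and parameter values below are simulated, and POTATO's actual fitting options are richer.

```python
# Fit a simulated force-extension section to the Marko-Siggia worm-like chain
# interpolation formula and integrate the fitted curve to get the applied work.
import numpy as np
from scipy.optimize import curve_fit
from scipy.integrate import trapezoid

kT = 4.11  # thermal energy at room temperature, pN*nm

def wlc(x, Lp, Lc):
    """Marko-Siggia force (pN) at extension x (nm); Lp, Lc in nm."""
    s = x / Lc
    return (kT / Lp) * (0.25 / (1.0 - s) ** 2 - 0.25 + s)

rng = np.random.default_rng(5)
x = np.linspace(50, 280, 120)                     # extensions below Lc
f = wlc(x, 45.0, 300.0) + rng.normal(0, 0.3, x.size)

popt, pcov = curve_fit(wlc, x, f, p0=[50.0, 320.0],
                       bounds=([1.0, 285.0], [200.0, 1000.0]))
print(f"persistence length: {popt[0]:.1f} nm, contour length: {popt[1]:.1f} nm")

work = trapezoid(wlc(x, *popt), x)                # numerical integration
print(f"work over this section: {work:.0f} pN*nm")
```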

19.

Introduction

Systematic reviewer authors intending to include all randomized participants in their meta-analyses need to make assumptions about the outcomes of participants with missing data.

Objective

The objective of this paper is to provide systematic reviewer authors with relatively simple guidance for addressing dichotomous data for participants excluded from analyses of randomized trials.

Methods

This guide is based on a review of the Cochrane handbook and published methodological research. It deals both with participants excluded from the analysis who were considered ‘non-adherent to the protocol’ but for whom data are available, and with participants whose data are missing.

Results

Systematic reviewer authors should include data from ‘non-adherent’ participants excluded from the primary study authors' analysis but for whom data are available. For missing, unavailable participant data, authors may conduct a complete case analysis (excluding those with missing data) as the primary analysis. Alternatively, they may conduct a primary analysis that makes plausible assumptions about the outcomes of participants with missing data. When the primary analysis suggests important benefit, sensitivity meta-analyses using relatively extreme assumptions that vary in plausibility can inform the extent to which risk of bias affects confidence in the results of the primary analysis. The more plausible assumptions draw on the outcome event rates within the trial or in all trials included in the meta-analysis. The proposed guide does not take into account the uncertainty associated with assumed events.

Conclusions

This guide proposes methods for handling participants excluded from analyses of randomized trials. These methods can help in establishing the extent to which risk of bias impacts meta-analysis results.
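The guidance can be sketched numerically for a single trial with dichotomous outcomes: a complete-case estimate, a plausible imputation, and an extreme assumption. The counts below are invented for illustration.

```python
# One trial, dichotomous outcome: risk ratios under three missing-data
# assumptions, from complete-case through plausible to extreme.
def risk_ratio(ev_t, n_t, ev_c, n_c):
    return (ev_t / n_t) / (ev_c / n_c)

# randomized n / observed events / participants actually followed up
treat = {"n": 100, "events": 20, "followed": 90}
ctrl  = {"n": 100, "events": 30, "followed": 95}

# 1) complete-case analysis: exclude participants with missing data
rr_cc = risk_ratio(treat["events"], treat["followed"],
                   ctrl["events"], ctrl["followed"])

# 2) plausible assumption: missing participants experience the event at the
#    rate observed in their own arm
def imputed(arm):
    rate = arm["events"] / arm["followed"]
    return arm["events"] + rate * (arm["n"] - arm["followed"]), arm["n"]

rr_plaus = risk_ratio(*imputed(treat), *imputed(ctrl))

# 3) extreme assumption: all missing in treatment have the event, none in control
rr_extreme = risk_ratio(treat["events"] + (treat["n"] - treat["followed"]),
                        treat["n"], ctrl["events"], ctrl["n"])

print(f"complete case RR={rr_cc:.2f}, plausible RR={rr_plaus:.2f}, "
      f"extreme RR={rr_extreme:.2f}")
```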

20.
A method of quantifying community spatial patterns, community pattern analysis, is described. It is proposed that ordination analysis be used to obtain an integrated score for each quadrat from transect data. For the data presented here, separate ordinations were made of the floristic and environmental (soils) data. The ordination axis scores are then analysed using two- or three-term local variance analysis to quantify the scales of community pattern. Correlation analyses allow the relationships between the vegetation and soils data (as represented by ordination axis scores) and other environmental data to be investigated at defined scales. The advantages of this method, which employs the joint application of conventional methods, are that it includes the influence of all species in the analysis and that it identifies multiple uncorrelated scales of pattern within a community.
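A sketch of two-term local variance applied to ordination axis scores along a transect follows; the scores are simulated, and the scaling of the variance term is one common convention (after Hill 1973), so details may differ from the paper's.

```python
# Two-term local variance of ordination axis scores along a transect; the
# block size where variance peaks estimates the scale of community pattern.
import numpy as np

def ttlv(x, block):
    """Two-term local variance at a given block size (after Hill 1973;
    one common scaling -- the peak position is what matters)."""
    n = len(x) - 2 * block + 1
    diffs = [(x[i:i + block].mean() - x[i + block:i + 2 * block].mean()) ** 2 / 2
             for i in range(n)]
    return float(np.mean(diffs))

rng = np.random.default_rng(6)
quadrats = np.arange(256)
# simulated axis scores with a repeating pattern roughly every 32 quadrats
scores = np.sin(2 * np.pi * quadrats / 32) + rng.normal(0, 0.3, 256)

for b in (2, 4, 8, 16, 32, 64):
    print(f"block {b:3d}: TTLV = {ttlv(scores, b):.3f}")
# TTLV should peak near block 16 (half the 32-quadrat cycle).
```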
