Data Analysis Tool Extension (DAnTE) is a statistical tool designed to address challenges associated with quantitative bottom-up, shotgun proteomics data. This tool has also been demonstrated for microarray data and can easily be extended to other high-throughput data types. DAnTE features selected normalization methods, missing value imputation algorithms, peptide-to-protein rollup methods, an extensive array of plotting functions and a comprehensive hypothesis-testing scheme that can handle unbalanced data and random effects. The graphical user interface (GUI) is designed to be very intuitive and user friendly. AVAILABILITY: DAnTE may be downloaded free of charge at http://omics.pnl.gov/software/. SUPPLEMENTARY INFORMATION: An example dataset with instructions on how to perform a series of analysis steps is available at http://omics.pnl.gov/software/

The effective extraction of information from multidimensional data sets derived from phenotyping experiments is a growing challenge in biology. Data visualization tools are important resources that can aid in exploratory data analysis of complex data sets. Phenotyping experiments of model organisms produce data sets in which a large number of phenotypic measures are collected for each individual in a group. A critical initial step in the analysis of such multidimensional data sets is the exploratory analysis of data distribution and correlation. To facilitate the rapid visualization and exploratory analysis of multidimensional complex trait data, we have developed a user-friendly, web-based software tool called Phenostat. Phenostat is composed of a dynamic graphical environment that allows the user to inspect the distribution of multiple variables in a data set simultaneously. Individuals can be selected by directly clicking on the graphs and thus displaying their identity, highlighting corresponding values in all graphs, allowing their inclusion or exclusion from the analysis. Statistical analysis is provided by R package functions. Phenostat is particularly suited for rapid distribution and correlation analysis of subsets of data. An analysis of behavioral and physiologic data stemming from a large mouse phenotyping experiment using Phenostat reveals previously unsuspected correlations. Phenostat is freely available to academic institutions and nonprofit organizations and can be used from our website at .

CellDepot, which contains over 270 datasets from 8 species and many tissues, is an integrated web application that lets scientists explore single-cell RNA-seq (scRNA-seq) datasets and compare datasets across studies through a user-friendly interface with advanced visualization and analytical capabilities. First, it provides an efficient data management system through which users can upload single-cell datasets and query the database by multiple attributes such as species and cell type. In addition, a graphical multi-logic, multi-condition query builder and a convenient filtering tool, both backed by a MySQL database, allow users to quickly find the datasets of interest and compare the expression of gene(s) across them. Moreover, by embedding the cellxgene VIP tool, CellDepot enables fast, interactive, and scalable exploration of individual datasets, yielding more refined insights such as cell composition, gene expression profiles, and differentially expressed genes among cell types, using more than 20 frequently applied plotting functions and high-level analysis methods from single-cell research. In summary, the web portal, available at http://celldepot.bxgenomics.com, promotes large-scale single-cell data sharing, facilitates meta-analysis and visualization, and encourages scientists to contribute to the single-cell community in a tractable and collaborative way. Finally, CellDepot is released as open-source software under the MIT license to encourage community contribution, broad adoption, and local deployment for private datasets.

There are many ftp or http servers storing data required for biological research. While some download applications are available, there is no user-friendly download application with a graphical interface specifically designed and adapted to meet the requirements of bioinformatics. BioDownloader is a program for downloading and updating files from ftp and http servers. It is optimized to work robustly with large numbers of files. It allows the selective retrieval of only the required files (batch downloads, multiple file masks, ls-lR file parsing, recursive search, recent updates, etc.). BioDownloader has a built-in repository containing the settings for common bioinformatics file-synchronization needs, including the Protein Data Bank (PDB) and National Center for Biotechnology Information (NCBI) databases. It can post-process downloaded files, including archive extraction and file conversions. AVAILABILITY: The program can be installed from http://dunbrack.fccc.edu/BioDownloader. The software is freely available for both non-commercial and commercial users under the BSD license.

We describe the PloGO R package, a simple open-source tool for plotting gene ontology (GO) annotation and abundance information, which was developed to aid the bioinformatics analysis of multi-condition label-free proteomics experiments using quantitation based on spectral counting. PloGO can incorporate abundance data (raw spectral counts) or normalized spectral abundance factor (NSAF) data in addition to the GO annotation, can handle multiple files, and allows for a targeted collection of GO categories of interest. Our main aims were to help identify interesting subsets of proteins for further analysis, such as those arising from a partition of a protein data set based on presence and absence or on multiple pair-wise comparisons, and to provide GO summaries that can be easily used in subsequent analyses. Though developed with label-free proteomics experiments in mind, it is not specific to that approach and can be used for any multi-condition experiment for which GO information has been generated.
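
The NSAF values mentioned above can be illustrated with a short sketch. It assumes the commonly used definition, in which each protein's spectral count is divided by its length and the result is normalized by the sum of that ratio over all proteins in the sample. The function name `nsaf`, the protein identifiers, and the numbers are hypothetical, and PloGO itself (an R package) may prepare its input differently.

```python
def nsaf(spectral_counts, lengths):
    """Return NSAF per protein under the common definition:
    (SpC / L) for the protein divided by the sum of (SpC / L) over all proteins.

    spectral_counts and lengths are dicts keyed by protein identifier
    (hypothetical identifiers; not taken from PloGO).
    """
    # Spectral abundance factor: spectral count per residue.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("all spectral counts are zero; NSAF is undefined")
    # Normalize so that the values in one sample sum to 1.
    return {p: value / total for p, value in saf.items()}


if __name__ == "__main__":
    counts = {"P1": 10, "P2": 30}
    protein_lengths = {"P1": 100, "P2": 300}
    print(nsaf(counts, protein_lengths))
```

Because the values within one sample sum to 1, NSAF makes samples with different total spectral counts comparable, which is presumably why a tool for multi-condition experiments accepts it as an alternative to raw counts.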

Novel and improved computational tools are required to transform large-scale proteomics data into valuable information of biological relevance. To this end, we developed ProteoConnections, a bioinformatics platform tailored to address the pressing needs of proteomics analyses. The primary focus of this platform is to organize peptide and protein identifications, evaluate the quality of the acquired data set, profile abundance changes, and accelerate data interpretation. Peptide and protein identifications are stored in a relational database to facilitate data mining and to evaluate the quality of data sets using graphical reports. We integrated databases of known PTMs and other bioinformatics tools to facilitate the analysis of phosphoproteomics data sets and to provide insights for subsequent biological validation experiments. Phosphorylation sites are also annotated according to kinase consensus motifs, contextual environment, protein domains, binding motifs, and evolutionary conservation across different species. The practical application of ProteoConnections is further demonstrated for the analysis of the phosphoproteomics data sets from rat intestinal IEC-6 cells, where we identified 9615 phosphorylation sites on 2108 phosphoproteins. Combined proteomics and bioinformatics analyses revealed valuable biological insights on the regulation of phosphoprotein functions via the introduction of new binding sites on scaffold proteins or the modulation of protein-protein, protein-DNA, or protein-RNA interactions. Quantitative proteomics data can be integrated into ProteoConnections to determine the changes in protein phosphorylation under different cell stimulation conditions or kinase inhibitors, as demonstrated here for the MEK inhibitor PD184352.

Kernel-based classification models that can categorize very large sets of unknown data samples are a valuable tool for the scientific community. Excel2SVM, a stand-alone Python mathematical analysis tool, bridges the gap between researchers and computer science by providing a simple graphical user interface that allows users to examine data and perform maximal margin classification. The software is fast and efficient, and it gives researchers full access to a complicated, high-level algorithm: training support vector machines and classifying unknown data files. Excel2SVM can convert data to the proper sparse format and supports a variety of kernel functions along with cost factors/modes, grids, cross-validation, and several other functions. The program works with any type of quantitative data, which makes Excel2SVM well suited to analyzing a wide variety of input. The software is free and available at www.bioinformatics.org/excel2svm. A link to the software may also be found at www.kernel-machines.org. The graphical user interface has been shown to produce kernel models that give accurate results and classify data through a decision boundary.
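
The abstract does not say which sparse format Excel2SVM targets. As a minimal sketch, assuming it resembles the widely used LIBSVM/SVMlight text format (one sample per line, written as a label followed by 1-based `index:value` pairs for the non-zero features), a spreadsheet-style row could be converted as follows. The function name `to_sparse_line` is hypothetical and is not part of Excel2SVM.

```python
def to_sparse_line(label, features):
    """Format one sample as 'label idx:value ...' in the LIBSVM/SVMlight style.

    Assumption: feature indices are 1-based and zero-valued features are
    omitted, which is the usual convention for this text format.
    """
    pairs = [
        f"{index}:{value:g}"
        for index, value in enumerate(features, start=1)
        if value != 0
    ]
    return " ".join([str(label)] + pairs)


if __name__ == "__main__":
    # Two hypothetical rows: a class label followed by three measurements.
    rows = [(1, [0.5, 0, 2]), (-1, [0, 1.5, 0])]
    for label, features in rows:
        print(to_sparse_line(label, features))
```

Omitting zeros keeps files small when most features are absent, which matters for wide data sets exported from spreadsheets.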

Recently, applications of mass spectrometry in the field of clinical proteomics have gained tremendous visibility in the scientific and clinical community. One major objective is the search for potential biomarkers in complex body fluids like serum, plasma, urine, saliva, or cerebrospinal fluid. For this purpose, efficient visualization of large data sets derived from patient cohorts is crucial to give clinical experts an interactive impression of the data quality. Additionally, it is necessary to apply statistical analysis and pattern matching algorithms to attain validated signal patterns that may allow for later applications in sample classification. We introduce the new ClinProTools bioinformatics software, which performs all major steps of profiling, screening, and monitoring applications in clinical proteomics. ClinProTools is the data interpretation software of the mass spectrometry-based ClinProt solutions for biomarker analysis. ClinProTools performs data pretreatment, visualization, statistics, pattern determination, pattern evaluation, and classification of spectra. This article will focus on ClinProTools' powerful and intuitive visualization options for clinical proteomics applications.

The rapidly growing number of biomedical studies supported by mass spectrometry based quantitative proteomics data has made it increasingly difficult to obtain an overview of the current status of the research field. A better way of organizing the biomedical proteomics information from these studies and making it available to the research community is therefore called for. In the presented work, we have investigated scientific publications describing the analysis of the cerebrospinal fluid proteome in relation to multiple sclerosis, Parkinson's disease and Alzheimer's disease. Based on a detailed set of filtering criteria we extracted 85 data sets containing quantitative information for close to 2000 proteins. This information was made available in CSF-PR 2.0 (http://probe.uib.no/csf-pr-2.0), which includes novel approaches for filtering, visualizing and comparing quantitative proteomics information in an interactive and user-friendly environment. CSF-PR 2.0 will be an invaluable resource for anyone interested in quantitative proteomics on cerebrospinal fluid.

SUMMARY: Besides classical clustering methods such as hierarchical clustering, in recent years biclustering has become a popular approach to analyze biological data sets, e.g. gene expression data. The Biclustering Analysis Toolbox (BicAT) is a software platform for clustering-based data analysis that integrates various biclustering and clustering techniques within a common graphical user interface. Furthermore, BicAT provides different facilities for data preparation, inspection and postprocessing such as discretization, filtering of biclusters according to specific criteria or gene pair analysis for constructing gene interconnection graphs. The possibility to use different biclustering algorithms inside a single graphical tool allows the user to compare clustering results and choose the algorithm that best fits a specific biological scenario. The toolbox is described in the context of gene expression analysis, but is also applicable to other types of data, e.g. data from proteomics or synthetic lethal experiments. AVAILABILITY: The BicAT toolbox is freely available at http://www.tik.ee.ethz.ch/sop/bicat and runs on all operating systems. The Java source code of the program and a developer's guide are provided on the website as well. Therefore, users may modify the program and add further algorithms or extensions.

We present a user-friendly graphical data analysis approach for stability analysis of genotype x environment interactions, using Tai's stability model and additive main effects and multiplicative interaction (AMMI) biplots. This practical approach integrates statistical and graphical analysis tools available in SAS systems and provides user-friendly applications that perform complete stability analyses by running SAS macros in the background, without requiring users to write SAS program statements or use pull-down menu interfaces. With this macro approach, agronomists and plant breeders can perform stability analysis effectively and spend more time on data exploration and on interpreting graphs and output, rather than on debugging program errors. The necessary MACRO-CALL files can be downloaded from the author's home page at http://www.ag.unr.edu/gf. The nature and the distinctive features of the graphics produced by these applications are illustrated by using published data.

As the Human Genome Project and other genome projects experience remarkable success and a flood of biological data is produced by means of high-throughput sequencing techniques, detection of horizontal gene transfer (HGT) becomes a promising field in bioinformatics. This review describes two freeware programs: T-REX for MS Windows and RHOM for Linux. T-REX is a graphical user interface program that offers functions to reconstruct the HGT network among the donor and receptor hosts from the gene and species distance matrices. RHOM is a set of command-line driven programs used to detect HGT in genomes. While T-REX impresses with a user-friendly interface and drawing of the reticulation network, the strengths of RHOM are its extensive statistical framework for genomes and its graphical display of the estimated sequence position probabilities for the candidate horizontally transferred genes.

As pharmacological data sets become increasingly large and complex, new visual analysis and filtering programs are needed to aid their appreciation. One of the most commonly used methods for visualizing biological data is the Venn diagram. Currently used Venn analysis software often presents problems to biological scientists, in that only a limited number of data sets can be analyzed simultaneously. An improved appreciation of the connectivity between multiple, highly complex datasets is crucial for the next generation of data analysis of genomic and proteomic data streams. We describe the development of VENNTURE, a program that facilitates visualization of up to six datasets in a user-friendly manner. This program includes versatile output features, where grouped data points can be easily exported into a spreadsheet. To demonstrate its unique experimental utility we applied VENNTURE to a highly complex parallel paradigm, i.e. comparison of multiple G protein-coupled receptor drug dose phosphoproteomic data, in multiple cellular physiological contexts. VENNTURE was able to reliably and simply dissect six complex data sets into easily identifiable groups for straightforward analysis and data output. Applied to complex pharmacological datasets, VENNTURE's features and ease of analysis are much improved over those of currently available Venn diagram programs. VENNTURE enabled the delineation of highly complex patterns of dose-dependent G protein-coupled receptor activity and its dependence on physiological cellular contexts. This study highlights the potential for such a program in fields such as pharmacology, genomics, and bioinformatics.
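
The grouping step behind a multi-set Venn analysis can be sketched in a few lines. This is an illustration of the general idea only, not VENNTURE's implementation: every item is assigned to the exact combination of datasets that contain it, so six datasets yield at most 2^6 - 1 = 63 non-empty regions. The function name `venn_regions` and the example data are hypothetical.

```python
from collections import defaultdict


def venn_regions(datasets):
    """Map each exact combination of dataset names to the items found in
    precisely those datasets (one entry per non-empty Venn region).

    datasets: dict of dataset name -> set of items.
    """
    # First record, for every item, the names of all datasets containing it.
    membership = defaultdict(set)
    for name, items in datasets.items():
        for item in items:
            membership[item].add(name)

    # Then group items that share the same membership signature.
    regions = defaultdict(set)
    for item, names in membership.items():
        regions[frozenset(names)].add(item)
    return dict(regions)


if __name__ == "__main__":
    example = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 5}}
    for names, items in venn_regions(example).items():
        print(sorted(names), sorted(items))
```

Each region is a plain set, so writing the groups to a spreadsheet-compatible file, as the abstract describes, would only require iterating over the returned dictionary.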

The analysis of electrophysiological recordings often involves visual inspection of time series data to locate specific experiment epochs, mask artifacts, and verify the results of signal processing steps, such as filtering or spike detection. Long-term experiments with continuous data acquisition generate large amounts of data. Rapid browsing through these massive datasets poses a challenge to conventional data plotting software because the plotting time increases proportionately to the increase in the volume of data. This paper presents FTSPlot, which is a visualization concept for large-scale time series datasets using techniques from the field of high performance computer graphics, such as hierarchic level of detail and out-of-core data handling. In a preprocessing step, time series data, event, and interval annotations are converted into an optimized data format, which then permits fast, interactive visualization. The preprocessing step has a computational complexity of ; the visualization itself can be done with a complexity of O(1) and is therefore independent of the amount of data. A demonstration prototype has been implemented and benchmarks show that the technology is capable of displaying large amounts of time series data, event, and interval annotations lag-free with ms. The current 64-bit implementation theoretically supports datasets with up to bytes, on the x86_64 architecture currently up to bytes are supported, and benchmarks have been conducted with 2^40 bytes/1 TiB or double precision samples. The presented software is freely available and can be included as a Qt GUI component in future software projects, providing a standard visualization method for long-term electrophysiological experiments.
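
The hierarchic level-of-detail idea can be illustrated with a minimal in-memory sketch. It assumes a min/max pyramid in which each level halves the number of entries, which is one common way to implement level of detail for time series; FTSPlot's actual on-disk format and its out-of-core handling are not shown. The function names `build_minmax_pyramid` and `pick_level` are hypothetical.

```python
def build_minmax_pyramid(samples):
    """Build levels of (min, max) pairs; level k summarizes 2**k samples per entry.

    Level 0 holds (x, x) for every sample; each higher level merges
    adjacent pairs of the level below until one entry remains.
    """
    level = [(x, x) for x in samples]
    levels = [level]
    while len(level) > 1:
        merged = []
        for i in range(0, len(level), 2):
            chunk = level[i:i + 2]  # the last chunk may hold a single entry
            merged.append((min(lo for lo, _ in chunk),
                           max(hi for _, hi in chunk)))
        level = merged
        levels.append(level)
    return levels


def pick_level(levels, start, stop, max_points):
    """Return (k, entries) for the finest level whose entries covering the
    sample range [start, stop) number at most max_points."""
    for k, level in enumerate(levels):
        first = start >> k                 # floor(start / 2**k)
        last = -(-stop // (1 << k))        # ceil(stop / 2**k)
        if last - first <= max_points:
            return k, level[first:last]
    return len(levels) - 1, levels[-1]


if __name__ == "__main__":
    pyramid = build_minmax_pyramid([3, 1, 4, 1, 5, 9, 2, 6])
    print(pick_level(pyramid, 0, 8, 2))
```

Because a query never returns more than `max_points` entries (for example, one per screen column), the amount of data drawn per frame is bounded by the display width rather than by the length of the recording.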

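As proteomics research advances, mass spectrometry-based selected reaction monitoring (SRM) has become an important approach in targeted proteomics, of which biomarker discovery is a representative application. Guided by prior hypotheses, SRM selectively acquires the mass spectrometric signals that match the hypothesized conditions and excludes interfering ion signals that do not, thereby yielding quantitative information for specific proteins. SRM offers advantages such as higher sensitivity and precision and a wider dynamic range. The technique can be divided into three steps: experimental design, data acquisition, and data analysis. Across these steps, the most important task is to use bioinformatics to summarize the results of existing experimental data and to predict precursor and product ion pairs for SRM experiments using machine learning methods and empirically derived rules. Research on bioinformatics methods for data quality control and quantification plays an important role in improving the reliability of SRM data. In addition, to facilitate SRM research, this article collects and summarizes software, tools, and database resources related to SRM. As mass spectrometers continue to develop, new SRM experimental strategies, analysis methods, and computational tools have emerged. Combined with better-optimized experimental strategies and methods and with more accurate bioinformatics algorithms and tools, SRM will play an increasingly important role in the future development of proteomics.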

DEPD: a novel database for differentially expressed proteins   (total citations: 4; self-citations: 0; citations by others: 4)
SUMMARY: The Differentially Expressed Protein Database was designed to store the output of comparative proteomics studies and provides a publicly available query and analysis platform for data mining. The database contains information about more than 3000 differentially expressed proteins (DEPs) manually extracted from the published literature, including relevant biological, experimental and methodological elements. Tools for visualization and functional analysis of DEPs are provided via a user-friendly web interface. AVAILABILITY: http://protchem.hunnu.edu.cn/depd/.

Proteomic studies involve the identification as well as qualitative and quantitative comparison of proteins expressed under different conditions, and elucidation of their properties and functions, usually in a large-scale, high-throughput format. The high dimensionality of data generated from these studies will require the development of improved bioinformatics tools and data-mining approaches for efficient and accurate data analysis of biological specimens from healthy and diseased individuals. Mining large proteomics data sets provides a better understanding of the complexities between the normal and abnormal cell proteome of various biological systems, including environmental hazards, infectious agents (bioterrorism) and cancers. This review will shed light on recent developments in bioinformatics and data-mining approaches, and their limitations when applied to proteomics data sets, in order to strengthen the interdependence between proteomic technologies and bioinformatics tools.

Quality control and preprocessing of metagenomic datasets   (total citations: 2; self-citations: 0; citations by others: 2)
SUMMARY: Here, we present PRINSEQ for easy and rapid quality control and data preprocessing of genomic and metagenomic datasets. Summary statistics of FASTA (and QUAL) or FASTQ files are generated in tabular and graphical form and sequences can be filtered, reformatted and trimmed by a variety of options to improve downstream analysis. Availability and Implementation: This open-source application was implemented in Perl and can be used as a standalone version or accessed online through a user-friendly web interface. The source code, user help and additional information are available at http://prinseq.sourceforge.net/.
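
The kind of per-read statistics and filtering described above can be sketched as follows. This is a simplified illustration in Python, not PRINSEQ's Perl implementation or its command-line options. It assumes four-line FASTQ records and Phred+33 quality encoding, and the function names `read_fastq`, `mean_quality`, and `filter_reads` as well as the thresholds are hypothetical.

```python
def read_fastq(lines):
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = next(it).strip()
        next(it)  # the '+' separator line carries no data needed here
        quality = next(it).strip()
        yield header.strip()[1:], sequence, quality


def mean_quality(quality, offset=33):
    """Mean Phred score of a quality string (assumes Phred+33 encoding)."""
    if not quality:
        return 0.0
    return sum(ord(char) - offset for char in quality) / len(quality)


def filter_reads(records, min_len, min_mean_q):
    """Keep reads that are long enough and have a high enough mean quality."""
    for read_id, sequence, quality in records:
        if len(sequence) >= min_len and mean_quality(quality) >= min_mean_q:
            yield read_id, sequence, quality


if __name__ == "__main__":
    fastq = ["@r1", "ACGT", "+", "IIII", "@r2", "AC", "+", "!!"]
    for record in filter_reads(read_fastq(fastq), min_len=3, min_mean_q=20):
        print(record)
```

Working with generators keeps memory use flat regardless of file size, which suits the large metagenomic datasets the abstract has in mind.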
