Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
MOTIVATION: The program MBBC 2.0 clusters time-course microarray data using a Bayesian product partition model. RESULTS: The Bayesian product partition model in Booth et al. (2007) simultaneously searches for the optimal number of clusters and assigns cluster memberships based on temporal changes in gene expression. MBBC 2.0 makes this method easily available to statisticians and scientists. It is built with three freely available languages, Ox, R and C++, taking advantage of the strengths of each. Within MBBC, the search algorithm is implemented in Ox and the resulting graphs are drawn with R. A user-friendly graphical interface built with C++ runs the Ox and R programs internally. Thus, MBBC users are not required to know how to use Ox, R or C++, but they must be pre-installed. AVAILABILITY: A self-extractable zip file, MBBC20zip.exe, is available at the MBBC webpage www.stat.ufl.edu/~casella/mbbc/, which contains MBBC.exe, source files, and all other related files. The current version works only in the Windows operating system. A free installation program and overview for Ox is available at www.doornik.com. A detailed installation guide for Ox is provided by MBBC, and is accessible without installing Ox. R is available at www.r-project.org/.

2.
MOTIVATION: Typical GC-MS-based metabolite profiling experiments may comprise hundreds of chromatogram files, each of which contains up to 1000 mass spectral tags (MSTs). MSTs are the characteristic patterns of approximately 25-250 fragment ions and respective isotopomers, which are generated after gas chromatography (GC) by electron impact ionization (EI) of the separated chemical molecules. These fragment ions are subsequently detected by time-of-flight (TOF) mass spectrometry (MS). MSTs of profiling experiments are typically reported as a list of ions, which are characterized by mass, chromatographic retention index (RI) or retention time (RT), and arbitrary abundance. The first two parameters allow the identification, the latter the quantification of the represented chemical compounds. Many software tools have been reported for the pre-processing, the so-called curve resolution and deconvolution, of GC-(EI-TOF)-MS files. Pre-processing tools generate numerical data matrices, which contain all aligned MSTs and samples of an experiment. This process, however, is error prone, mainly due to (i) the imprecise RI or RT alignment of MSTs and (ii) the high complexity of biological samples. This complexity causes co-elution of compounds and, as a consequence, non-selective (in other words, impure) MSTs. The selection and validation of optimal fragment ions for the specific and selective quantification of simultaneously eluting compounds is, therefore, mandatory. Currently, validation is performed in most laboratories under human supervision. So far, no software tool supports the non-targeted and user-independent quality assessment of the data matrices prior to statistical analysis. TagFinder may fill this gap. STRATEGY: TagFinder facilitates the analysis of all fragment ions that are observed in GC-(EI-TOF)-MS profiling experiments. The non-targeted approach allows the discovery of novel and unexpected compounds.
In addition, mass isotopomer resolution is maintained by TagFinder processing. This feature is essential for metabolic flux analyses and highly useful, but not required, for metabolite profiling. Whenever possible, TagFinder gives precedence to chemical means of standardization, for example, the use of internal reference compounds for retention time calibration or quantitative standardization. In addition, external standardization is supported for both compound identification and calibration. The workflow of TagFinder comprises (i) the import of fragment ion data, namely mass, time and arbitrary abundance (intensity), from a chromatography file interchange format or from peak lists provided by other chromatogram pre-processing software, (ii) the annotation of sample information and grouping of samples into classes, (iii) the RI calculation, (iv) the binning of observed fragment ions of equal mass from different chromatograms into RI windows, (v) the combination of these bins, so-called mass tags, into time groups of co-eluting fragment ions, (vi) the test of time groups for intensity-correlated mass tags, (vii) the data matrix generation and (viii) the extraction of selective mass tags supported by compound identification. Thus, TagFinder supports both non-targeted fingerprinting analyses and metabolite-targeted profiling. AVAILABILITY: Exemplary TagFinder workspaces and test data sets are made available upon request to the contact authors. TagFinder is made freely available for academic use from http://www-en.mpimp-golm.mpg.de/03-research/researchGroups/01-dept1/Root_Metabolism/smp/TagFinder/index.html.
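Step (iv) of the workflow above, binning fragment ions of equal mass from different chromatograms into RI windows, can be illustrated with a minimal Python sketch. This is not TagFinder's implementation: the function name `bin_mass_tags`, the tuple layout of the input, and the fixed window width are all assumptions made for illustration.

```python
from collections import defaultdict


def bin_mass_tags(ions, ri_window=2.0):
    """Group fragment ions by (nominal mass, RI window index).

    ions: iterable of (sample, mass, ri, intensity) tuples (hypothetical layout).
    Returns {(mass, window_index): {sample: summed intensity}}.
    """
    bins = defaultdict(lambda: defaultdict(float))
    for sample, mass, ri, intensity in ions:
        # Fixed-width RI window; an assumption, not TagFinder's actual rule.
        window = int(ri // ri_window)
        # Ions of equal nominal mass in the same window share one bin (mass tag).
        bins[(round(mass), window)][sample] += intensity
    return {key: dict(per_sample) for key, per_sample in bins.items()}
```

Each resulting bin corresponds to one row of a data matrix with one column per sample. A fixed grid splits ions that fall close to a window edge, so a practical tool would likely need a more tolerant grouping rule.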

3.
MHCPEP, a database of MHC-binding peptides: update 1997.   Total citations: 10; self-citations: 1; citations by others: 10
MHCPEP (http://wehih.wehi.edu.au/mhcpep/) is a curated database comprising over 13 000 peptide sequences known to bind MHC molecules. Entries are compiled from published reports as well as from direct submissions of experimental data. Each entry contains the peptide sequence, its MHC specificity and, where available, experimental method, observed activity, binding affinity, source protein and anchor positions, as well as publication references. The present format of the database allows text string matching searches but can easily be converted for use in conjunction with sequence analysis packages. The database can be accessed via Internet using WWW or FTP.

4.
5.
The NEXUS Class Library (NCL) is a collection of C++ classes designed to simplify interpreting data files written in the NEXUS format used by many computer programs for phylogenetic analyses. The NEXUS format allows different programs to share the same data files, even though none of the programs can interpret all of the data stored therein. Because users are not required to reformat the data file for each program, use of the NEXUS format prevents cut-and-paste errors as well as the proliferation of copies of the original data file. The purpose of making the NCL available is to encourage the use of the NEXUS format by making it relatively easy for programmers to add the ability to interpret NEXUS files in newly developed software. AVAILABILITY: The NCL is freely available under the GNU General Public License from http://hydrodictyon.eeb.uconn.edu/ncl/. SUPPLEMENTARY INFORMATION: Documentation for the NCL (general information and source code documentation) is available in HTML format at http://hydrodictyon.eeb.uconn.edu/ncl/
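The idea that programs can share one NEXUS file while each interprets only the blocks it understands can be sketched in a few lines of Python. This is a simplified illustration and not the NCL API: the function name `read_nexus_blocks` is hypothetical, and the sketch assumes the usual `#NEXUS` header, `BEGIN name; ... END;` blocks, semicolon-terminated commands, and non-nested `[comments]`. Quoted tokens containing semicolons and repeated block names are not handled.

```python
import re


def read_nexus_blocks(text):
    """Return {BLOCK_NAME: [command, ...]} for a NEXUS document (simplified)."""
    if not text.lstrip().upper().startswith("#NEXUS"):
        raise ValueError("not a NEXUS file: missing #NEXUS header")
    text = re.sub(r"\[[^\]]*\]", "", text)  # drop [comments]; nesting not handled
    body = text.lstrip()[len("#NEXUS"):]
    blocks, current = {}, None
    for command in (c.strip() for c in body.split(";")):
        if not command:
            continue
        words = command.split()
        keyword = words[0].upper()
        if keyword == "BEGIN" and len(words) > 1:
            current = words[1].upper()  # block names are matched case-insensitively
            blocks[current] = []
        elif keyword in ("END", "ENDBLOCK"):
            current = None
        elif current is not None:
            blocks[current].append(" ".join(words))
    return blocks
```

A program that only understands trees could then read `read_nexus_blocks(text).get("TREES", [])` and ignore every other block, which is the sharing behaviour the abstract describes.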

6.
Cyclone aims to facilitate the use of BioCyc, a collection of Pathway/Genome Databases (PGDBs). Cyclone provides a fully extensible Java Object API to analyze and visualize these data. Cyclone can read and write PGDBs, and can write its own data in the CycloneML format. This format is automatically generated from the BioCyc ontology by Cyclone itself, ensuring continued compatibility. Cyclone objects can also be stored in a relational database, CycloneDB. Queries can be written in SQL, and in an intuitive and concise object-oriented query language, Hibernate Query Language (HQL). In addition, Cyclone interfaces easily with Java software including the Eclipse IDE for HQL editing, the Jung API for graph algorithms or Cytoscape for graph visualization. AVAILABILITY: Cyclone is freely available under an open source license at: http://sourceforge.net/projects/nemo-cyclone. SUPPLEMENTARY INFORMATION: For download and installation instructions, tutorials, use cases and examples, see http://nemo-cyclone.sourceforge.net.

7.
In the present study, we simultaneously measured several polyols, such as adonitol, arabitol, dulcitol, glucose, myo-inositol, mannitol, sorbitol, and xylitol, in urine by gas chromatography/mass spectrometry with positive chemical ionization. We also examined the possible relationship between the levels of these metabolites and age in normal individuals. For quantification by GC/MS, 200 µL of a urine sample was diluted with 3 mL of distilled water, lyophilized, acetylated, and then analyzed. Using this method, we were able to quantify as little as 0.5-1.0 ng/µL, and the calibration curves were linear from 0.25 to 250 ng/µL (r² > 0.991). Analytical recoveries were over 89.4%, and the inter-day and intra-day variability for accuracy and reproducibility was less than 20%. In normal urine samples, the levels of polyols were gender-differentiated and age-related. This simple GC/MS method is sensitive and allows the measurement of a wide range of polyols using small amounts of urine. We conclude that the quantitation of urinary polyols using GC/MS appears to be a clinically useful method for assessing polyol-pathway activity.
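The linear calibration reported above (a straight-line fit over a concentration range, judged by r²) follows the standard least-squares recipe. The Python sketch below shows that generic recipe only; it is not the authors' procedure, and the function names `fit_calibration` and `quantify` as well as any numbers passed to them are hypothetical.

```python
def fit_calibration(concentrations, responses):
    """Ordinary least-squares line through (concentration, response) pairs.

    Returns (slope, intercept, r_squared).
    """
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(responses) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, responses))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination: 1 - residual SS / total SS.
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(concentrations, responses))
    ss_tot = sum((y - mean_y) ** 2 for y in responses)
    return slope, intercept, 1.0 - ss_res / ss_tot


def quantify(response, slope, intercept):
    """Invert the calibration line to estimate a concentration."""
    return (response - intercept) / slope
```

In use, the standards would span the working range (here 0.25 to 250 ng/µL), and `quantify` would only be trusted for responses that fall inside that range.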

8.
State-of-the-art (DNA) sequencing methods applied in "Omics" studies grant insight into the 'blueprints' of organisms from all domains of life. Sequencing is carried out around the globe and the data are submitted to the public repositories of the International Nucleotide Sequence Database Collaboration. However, the context in which these studies are conducted often gets lost, because experimental data, as well as information about the environment, are rarely submitted along with the sequence data. If these contextual data or metadata are missing, key opportunities for comparison and analysis across studies and habitats are hampered or even lost. To address this problem, the Genomic Standards Consortium (GSC) promotes checklists and standards to better describe our sequence data collection and to promote the capturing, exchange and integration of sequence data with contextual data. In a recent community effort, the GSC has developed a series of recommendations for contextual data that should be submitted along with sequence data. To help the scientific community significantly enhance the quality and quantity of contextual data in the public sequence data repositories, specialized software tools are needed. In this work we present CDinFusion, a web-based tool to integrate contextual and sequence data in (Multi)FASTA format prior to submission. The tool is open source and available under the Lesser GNU Public License 3. A public installation is hosted and maintained at the Max Planck Institute for Marine Microbiology at http://www.megx.net/cdinfusion. The tool may also be installed locally using the open source code available at http://code.google.com/p/cdinfusion.

9.
High-density single nucleotide polymorphism microarrays (SNP chips) provide information on a subject's genome, such as copy number and genotype (heterozygosity/homozygosity) at a SNP. While fluorescence in situ hybridization and karyotyping reveal many abnormalities, SNP chips provide a higher resolution map of the human genome that can be used to detect, e.g., aneuploidies, microdeletions, microduplications and loss of heterozygosity (LOH). As a variety of diseases are linked to such chromosomal abnormalities, SNP chips promise new insights for these diseases by aiding in the discovery of such regions, and may suggest targets for intervention. The R package SNPchip contains classes and methods useful for storing, visualizing and analyzing high density SNP data. Originally developed from the SNPscan web-tool, SNPchip utilizes S4 classes and extends other open source R tools available at Bioconductor. This has numerous advantages, including the ability to build statistical models for SNP-level data that operate on instances of the class, and to communicate with other R packages that add additional functionality. AVAILABILITY: The package is available from the Bioconductor web page at www.bioconductor.org. SUPPLEMENTARY INFORMATION: The supplementary material as described in this article (case studies, installation guidelines and R code) is available from http://biostat.jhsph.edu/~iruczins/publications/sm/

10.
MHCPEP--a database of MHC-binding peptides: update 1995.   Total citations: 1; self-citations: 0; citations by others: 1
MHCPEP is a curated database comprising over 6000 peptide sequences known to bind MHC molecules. Entries are compiled from published reports as well as from direct submissions of experimental data. Each entry contains the peptide sequence, MHC specificity and, when available, experimental method, observed activity, binding affinity, source protein, anchor positions, as well as publication references. The present format of the database allows text string matching searches but can easily be converted for use in conjunction with sequence analysis packages. The database can be accessed via Internet using Gopher, FTP or WWW.

11.
PRIDE: the proteomics identifications database   Total citations: 2; self-citations: 0; citations by others: 2
The advent of high-throughput proteomics has enabled the identification of ever-increasing numbers of proteins. Correspondingly, the number of publications centered on these protein identifications has increased dramatically. With the first results of the HUPO Plasma Proteome Project being analyzed and many other large-scale proteomics projects about to disseminate their data, this trend is not likely to flatten out any time soon. However, the publication mechanism of these identified proteins has lagged behind in technical terms. Often very long lists of identifications are either published directly with the article, resulting in both a voluminous and rather tedious read, or are included on the publisher's website as supplementary information. In either case, these lists are typically only provided as portable document format documents with a custom-made layout, making it practically impossible for computer programs to interpret them, let alone efficiently query them. Here we propose the proteomics identifications (PRIDE) database (http://www.ebi.ac.uk/pride) as a means to finally turn publicly available data into publicly accessible data. PRIDE offers a web-based query interface, a user-friendly data upload facility, and a documented application programming interface for direct computational access. The complete PRIDE database, source code, data, and support tools are freely available for web access or download and local installation.

12.
MHCPEP, a database of MHC-binding peptides: update 1996.   Total citations: 1; self-citations: 1; citations by others: 0
MHCPEP is a curated database comprising over 9000 peptide sequences known to bind MHC molecules. Entries are compiled from published reports as well as from direct submissions of experimental data. Each entry contains the peptide sequence, its MHC specificity and, when available, experimental method, observed activity, binding affinity, source protein, anchor positions and publication references. The present format of the database allows text string matching searches but can easily be converted for use in conjunction with sequence analysis packages. The database can be accessed via Internet using WWW, FTP or Gopher.

13.
α-Bisabolol is a commercially important aroma chemical currently obtained from the Candeia tree (Vanillosmopsis erythropappa). Continuous unsustainable harvesting of the Candeia tree has prompted the urgent need to identify alternative crops as a source of this commercially important sesquiterpene alcohol. A chemotaxonomic assessment of two Salvia species indigenous to South Africa is presented, and these species are recommended as a potential source of α-bisabolol. The essential oil obtained by hydrodistillation of the aerial parts was analysed by gas chromatography coupled to mass spectrometry (GC–MS) and mid-infrared spectroscopy (MIRS). Orthogonal projections to latent structures–discriminant analysis (OPLS–DA) was used for multivariate classification of the oils based on GC–MS and MIRS data. Partial least squares (PLS) calibration models were developed on the MIRS data for the quantification of α-bisabolol using GC–MS as the reference method. A clear distinction between Salvia stenophylla and Salvia runcinata oils was observed using OPLS–DA on both GC–MS and MIRS data. The MIR calibration model showed a high coefficient of determination (R² = 0.999) and a low error of prediction (RMSEP = 0.540%) for α-bisabolol content.

14.
SUMMARY: CisML is an XML-based format for sequence motif detection software. This proposed standard is applicable to many types of sequence motif detection programs. It is intended to facilitate the integration of data and the comparison of results from different software packages, and to simplify the development of downstream tools. XSL stylesheets are provided for easy generation of text, html and graphical reports from CisML-formatted data. AVAILABILITY: http://zlab.bu.edu/CisML/ SUPPLEMENTARY INFORMATION: Example CisML-formatted data and XSL stylesheets for report generation are available along with the sample output.

15.
Data processing forms an integral part of biomarker discovery and contributes significantly to the ultimate result. To compare and evaluate various publicly available open source label-free data processing workflows, we developed msCompare, a modular framework that allows the arbitrary combination of different feature detection/quantification and alignment/matching algorithms in conjunction with a novel scoring method to evaluate their overall performance. We used msCompare to assess the performance of workflows built from modules of publicly available data processing packages such as SuperHirn, OpenMS, and MZmine and our in-house developed modules on peptide-spiked urine and trypsin-digested cerebrospinal fluid (CSF) samples. We found that the quality of results varied greatly among workflows and, interestingly, heterogeneous combinations of algorithms often performed better than the homogeneous workflows. Our scoring method showed that the union of feature matrices of different workflows outperformed the original homogeneous workflows in some cases. msCompare is open source software (https://trac.nbic.nl/mscompare), and we provide a web-based data processing service for our framework by integration into the Galaxy server of the Netherlands Bioinformatics Center (http://galaxy.nbic.nl/galaxy) to allow scientists to determine which combination of modules provides the most accurate processing for their particular LC-MS data sets.

16.
SUMMARY: We present GenomeDiagram, a flexible, open-source Python module for the visualization of large-scale genomic, comparative genomic and other data with reference to a single chromosome or other biological sequence. GenomeDiagram may be used to generate publication-quality vector graphics, rastered images and in-line streamed graphics for webpages. The package integrates with datatypes from the BioPython project, and is available for Windows, Linux and Mac OS X systems. AVAILABILITY: GenomeDiagram is freely available as source code (under GNU Public License) at http://bioinf.scri.ac.uk/lp/programs.html, and requires Python 2.3 or higher, and recent versions of the ReportLab and BioPython packages. SUPPLEMENTARY INFORMATION: A user manual, example code and images are available at http://bioinf.scri.ac.uk/lp/programs.html.

17.
High-throughput experimentation has revolutionized data-driven experimental sciences and opened the door to the application of machine learning techniques. Nevertheless, the quality of any data analysis strongly depends on the quality of the data and specifically the degree to which random effects in the experimental data-generating process are quantified and accounted for. Accordingly, calibration, i.e. the quantitative association between observed quantities and measurement responses, is a core element of many workflows in experimental sciences. Particularly in life sciences, univariate calibration, often involving non-linear saturation effects, must be performed to extract quantitative information from measured data. At the same time, the estimation of uncertainty is inseparably connected to quantitative experimentation. Adequate calibration models that describe not only the input/output relationship in a measurement system but also its inherent measurement noise are required. Due to its mathematical nature, statistically robust calibration modeling remains a challenge for many practitioners, while being extremely beneficial for machine learning applications. In this work, we present a bottom-up conceptual and computational approach that solves many problems of understanding and implementing non-linear, empirical calibration modeling for quantification of analytes and process modeling. The methodology is first applied to the optical measurement of biomass concentrations in a high-throughput cultivation system, then to the quantification of glucose by an automated enzymatic assay. We implemented the conceptual framework in two Python packages, calibr8 and murefi, with which we demonstrate how to make uncertainty quantification for various calibration tasks more accessible.
Our software packages enable more reproducible and automatable data analysis routines compared to commonly observed workflows in life sciences. Subsequently, we combine the previously established calibration models with a hierarchical Monod-like ordinary differential equation model of microbial growth to describe multiple replicates of Corynebacterium glutamicum batch cultures. Key process model parameters are learned by both maximum likelihood estimation and Bayesian inference, highlighting the flexibility of the statistical and computational framework.
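For readers unfamiliar with Monod-type growth models, the sketch below integrates the textbook Monod equations for a single batch culture: biomass grows at a rate of mu_max * S / (K_S + S) and consumes substrate according to a fixed yield. It is a plain forward-Euler illustration in Python, not the hierarchical model of the paper and not the calibr8 or murefi API; the function name `simulate_monod` and all parameter values used with it are hypothetical.

```python
def simulate_monod(x0, s0, mu_max, k_s, yield_xs, t_end, dt=0.01):
    """Forward-Euler integration of textbook Monod batch growth.

    x: biomass, s: substrate, yield_xs: biomass formed per substrate consumed.
    Returns a list of (time, biomass, substrate) tuples.
    """
    x, s, t = x0, s0, 0.0
    trajectory = [(t, x, s)]
    for _ in range(int(round(t_end / dt))):
        mu = mu_max * s / (k_s + s)        # Monod specific growth rate
        dx = min(mu * x * dt, s * yield_xs)  # cannot use more substrate than remains
        x += dx
        s = max(s - dx / yield_xs, 0.0)    # guard against rounding below zero
        t += dt
        trajectory.append((t, x, s))
    return trajectory
```

Because every unit of biomass formed removes 1/yield_xs units of substrate, the final biomass approaches x0 + yield_xs * s0 once the substrate is exhausted. Fitting such a model to measurements would additionally require a calibration model linking biomass to the observed signal, which is the part the abstract focuses on.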

18.
mzTab is the most recent standard format developed by the Proteomics Standards Initiative. mzTab is a flexible tab‐delimited file format that can capture identification and quantification results coming from MS‐based proteomics and metabolomics approaches. We here present an open‐source Java application programming interface for mzTab called jmzTab. The software allows the efficient processing of mzTab files, providing read and write capabilities, and is designed to be embedded in other software packages. The second key feature of the jmzTab model is that it provides a flexible framework to maintain the logical integrity between the metadata and the table‐based sections in the mzTab files. In this article, as two example implementations, we also describe two stand‐alone tools that can be used to validate mzTab files and to convert PRIDE XML files to mzTab. The library is freely available at http://mztab.googlecode.com.
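The split between a metadata section and table-based sections can be illustrated with a short reader for the tab-delimited layout. The sketch is in Python and is not the jmzTab API; the function name `read_mztab` is hypothetical. It assumes the mzTab 1.0 convention that the first field of each line is a three-letter prefix (`MTD` for metadata, `PRH`/`PRT`, `PEH`/`PEP`, `PSH`/`PSM` and `SMH`/`SML` for table headers and rows, `COM` for comments) and performs only one integrity check: a table row must be preceded by its header.

```python
from collections import defaultdict

# Row prefix -> header prefix, assuming the mzTab 1.0 line prefixes.
HEADER_FOR = {"PRT": "PRH", "PEP": "PEH", "PSM": "PSH", "SML": "SMH"}


def read_mztab(text):
    """Return (metadata dict, {row_prefix: [row dict, ...]}) for mzTab text."""
    metadata, headers, tables = {}, {}, defaultdict(list)
    for line in text.splitlines():
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "MTD" and len(fields) >= 3:
            metadata[fields[1]] = fields[2]
        elif prefix in HEADER_FOR.values():
            headers[prefix] = fields[1:]
        elif prefix in HEADER_FOR:
            columns = headers.get(HEADER_FOR[prefix])
            if columns is None:
                raise ValueError(f"{prefix} row found before its {HEADER_FOR[prefix]} header")
            tables[prefix].append(dict(zip(columns, fields[1:])))
        # COM lines, blank lines and unknown prefixes are skipped in this sketch.
    return metadata, dict(tables)
```

A fuller validator would also cross-check the tables against the metadata, for example that every column referring to an assay or study variable has a matching metadata entry, which is the kind of consistency the abstract attributes to jmzTab.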

19.
GDPC: connecting researchers with multiple integrated data sources   Total citations: 1; self-citations: 0; citations by others: 1
The goal of this project is to simplify access to genomic diversity and phenotype data, thereby encouraging reuse of these data. The Genomic Diversity and Phenotype Connection (GDPC) accomplishes this by retrieving data from one or more data sources and by allowing researchers to analyze integrated data in a standard format. GDPC is written in JAVA and provides (1) data sources available as web services that transfer XML formatted data via the SOAP protocol; (2) a JAVA API for programmatic access to data sources; and (3) a front-end application that allows users to manage data sources, retrieve data based on filters, sort/group data based on property values and save/open the data as XML files. AVAILABILITY: The source code, compiled code, documentation and GDPC Browser are freely available at: www.maizegenetics.net/gdpc/index.html. The current release of GDPC is version 1.0, with updated releases planned for the future. Comments are welcome.

20.
ABSTRACT: BACKGROUND: Two-dimensional data need to be processed and analysed in almost any experimental laboratory. Some tasks in this context may be performed with generic software such as spreadsheet programs, which are available ubiquitously; others may require more specialised software that requires paid licences. Additionally, more complex software packages typically require more time from the individual user to understand and operate. Practical and convenient graphical data analysis software in Java with a user-friendly interface is rare. RESULTS: We have developed SDAR, a Java application to analyse two-dimensional data with an intuitive graphical user interface. A smart ASCII parser allows import of data into SDAR without particular format requirements. The centre piece of SDAR is the Java class GraphPanel, which provides methods for generic tasks of data visualisation. Data can be manipulated and analysed with respect to the most common operations encountered in an experimental biochemical laboratory. Images of the data plots can be generated in SVG, TIFF or PNG format. Data exported by SDAR are annotated with commands compatible with the Grace software. CONCLUSION: Since SDAR is implemented in Java, it is truly cross-platform compatible. The software is easy to install, and very convenient to use judging by experience in our own laboratories. It is freely available to academic users at http://www.structuralchemistry.org/pcsb/. To download SDAR, users will be asked for their name, institution and email address. A manual, as well as the source code of the GraphPanel class, can also be downloaded from this site.
