Similar articles
1.

Background

A convergence of high-throughput sequencing and computational power is transforming biology into information science. Despite these technological advances, converting bits and bytes of sequence information into meaningful insights remains a challenging enterprise. Biological systems operate on multiple hierarchical levels from genomes to biomes. Holistic understanding of biological systems requires agile software tools that permit comparative analyses across multiple information levels (DNA, RNA, protein, and metabolites) to identify emergent properties, diagnose system states, or predict responses to environmental change.

Results

Here we adopt the MetaPathways annotation and analysis pipeline and Pathway Tools to construct environmental pathway/genome databases (ePGDBs) that describe microbial community metabolism using MetaCyc, a highly curated database of metabolic pathways and components covering all domains of life. We evaluate Pathway Tools’ performance on three datasets with different complexity and coding potential, including simulated metagenomes, a symbiotic system, and the Hawaii Ocean Time-series. We define accuracy and sensitivity relationships between read length, coverage and pathway recovery and evaluate the impact of taxonomic pruning on ePGDB construction and interpretation. Resulting ePGDBs provide interactive metabolic maps, predict emergent metabolic pathways associated with biosynthesis and energy production and differentiate between genomic potential and phenotypic expression across defined environmental gradients.

Conclusions

This multi-tiered analysis provides the user community with specific operating guidelines, performance metrics and prediction hazards for more reliable ePGDB construction and interpretation. Moreover, it demonstrates the power of Pathway Tools in predicting metabolic interactions in natural and engineered ecosystems.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-619) contains supplementary material, which is available to authorized users.

2.
3.

Background

A direct link between the names and structures of compounds and the functional groups contained within them is important, not only because biochemists frequently rely on literature that uses a free-text format to describe functional groups, but also because metabolic models depend upon the connections between enzymes and substrates being known and appropriately stored in databases.

Methodology

We have developed a database named “Biochemical Substructure Search Catalogue” (BiSSCat), which contains 489 functional groups, >200,000 compounds and >1,000,000 different computationally constructed substructures, to allow identification of chemical compounds of biological interest.

Conclusions

This database and its associated web-based search program (http://bisscat.org/) can be used to find compounds containing selected combinations of substructures and functional groups. It can be used to determine possible additional substrates for known enzymes and for putative enzymes found in genome projects. Its applications to enzyme inhibitor design are also discussed.

4.

Background  

Defining the location of genes and the precise nature of gene products remains a fundamental challenge in genome annotation. Interrogating tandem mass spectrometry data using genomic sequence provides an unbiased method to identify novel translation products. A six-frame translation of the entire human genome was used as the query database to search for novel blood proteins in the data from the Human Proteome Organization Plasma Proteome Project. Because this target database is orders of magnitude larger than the databases traditionally employed in tandem mass spectra analysis, careful attention to significance testing is required. Confidence of identification is assessed using our previously described Poisson statistic, which estimates the significance of multi-peptide identifications incorporating the length of the matching sequence, number of spectra searched and size of the target sequence database.
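As an illustration of the six-frame translation mentioned in this abstract, the following is a minimal sketch, not the authors' pipeline. It assumes the standard genetic code and an uppercase or lowercase DNA string; the function names (`reverse_complement`, `translate`, `six_frame_translation`) and the frame labels (`+1` to `-3`) are hypothetical choices made for this example.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate three forward and three reverse-strand reading frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
print(frames["+1"])  # MA*
```

Every input sequence yields six candidate protein strings, and each of those splits further at stop codons, which is consistent with the abstract's remark that the resulting search database is orders of magnitude larger than a conventional protein database.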

5.

Background  

Pathway models serve as the basis for much of systems biology. They are often built using programs designed for the purpose. Constructing new models generally requires simultaneous access to experimental data of diverse types, to databases of well-characterized biological compounds and molecular intermediates, and to reference model pathways. However, few if any software applications provide all such capabilities within a single user interface.

6.
7.
Mittler T  Levy M  Chad F  Karen S 《Bioinformation》2010,5(5):224-226
The Basic Local Alignment Search Tool (BLAST) compares one or more query sequences to a database of sequences and identifies those sequences that are similar to the query above a user-defined threshold. We have developed a user-friendly web application, MULTBLAST, which runs a series of BLAST searches on a user-supplied list of proteins against one or more target protein or nucleotide databases. The application pre-processes the data, launches each individual BLAST search on the University of Nevada, Reno's TimeLogic DeCypher® system (available from Active Motif, Inc.) and retrieves and combines all the results into a simple, easy-to-read output file. The output file presents the list of query proteins, followed by the BLAST results for the matching sequences from each target database in consecutive columns. This format is especially useful either for comparing the results from the different target databases or for analyzing the results while keeping the identification of each target database separate.

Availability

The application is available at http://blastpipe.biochem.unr.edu/
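The output layout described in this abstract (one row per query protein, one column per target database) can be sketched as follows. This is only an illustration of the layout, not MULTBLAST's actual code or file format: the function name `combine_results`, the tab-separated output, the `-` placeholder for missing hits, and the example identifiers are all assumptions.

```python
def combine_results(queries, results_by_db):
    """Build a tab-separated table: one row per query, one column per database.

    results_by_db maps a database name to a dict of {query: hit description}.
    Queries with no hit in a database get a "-" placeholder.
    """
    db_names = list(results_by_db)
    lines = ["\t".join(["query"] + db_names)]
    for query in queries:
        row = [query] + [results_by_db[db].get(query, "-") for db in db_names]
        lines.append("\t".join(row))
    return "\n".join(lines)


# Hypothetical example data: two queries searched against two databases.
table = combine_results(
    ["P1", "P2"],
    {
        "dbA": {"P1": "hitA1"},
        "dbB": {"P1": "hitB1", "P2": "hitB2"},
    },
)
print(table)
```

Keeping each database in its own column preserves the source of every hit, which is the property the abstract highlights as useful for cross-database comparison.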

8.
The EcoCyc and MetaCyc databases
EcoCyc is an organism-specific Pathway/Genome Database that describes the metabolic and signal-transduction pathways of Escherichia coli, its enzymes, and, as a new addition, its transport proteins. MetaCyc is a new metabolic-pathway database that describes pathways and enzymes of many different organisms, with a microbial focus. Both databases are queried using the Pathway Tools graphical user interface, which provides a wide variety of query operations and visualization tools. EcoCyc and MetaCyc are available at http://ecocyc.PangeaSystems.com/ecocyc/

9.
10.

Background

Despite several recent advances in the automated generation of draft metabolic reconstructions, the manual curation of these networks to produce high quality genome-scale metabolic models remains a labour-intensive and challenging task.

Results

We present PathwayBooster, an open-source software tool to support the manual comparison and curation of metabolic models. It combines gene annotations from GenBank files and other sources with information retrieved from the metabolic databases BRENDA and KEGG to produce a set of pathway diagrams and reports summarising the evidence for the presence of a reaction in a given organism’s metabolic network. By comparing multiple sources of evidence within a common framework, PathwayBooster assists the curator in the identification of likely false positive (misannotated enzyme) and false negative (pathway hole) reactions. Reaction evidence may be taken from alternative annotations of the same genome and/or a set of closely related organisms.

Conclusions

By integrating and visualising evidence from multiple sources, PathwayBooster reduces the manual effort required in the curation of a metabolic model. The software is available online at http://www.theosysbio.bio.ic.ac.uk/resources/pathwaybooster/.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0447-2) contains supplementary material, which is available to authorized users.

11.

Background

The Semantic Web has established itself as a framework for using and sharing data across applications and database boundaries. Here, we present a web-based platform for querying biological Semantic Web databases graphically.

Results

SPARQLGraph offers an intuitive drag-and-drop query builder, which converts the visual graph into a query and executes it on a public endpoint. The tool integrates several publicly available Semantic Web databases, including the databases of the recently released EBI RDF platform. Furthermore, it provides several predefined template queries for answering biological questions. Users can easily create and save new query graphs, which can also be shared with other researchers.
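To make the idea of converting a visual graph into a query concrete, here is a minimal sketch under stated assumptions. It is not SPARQLGraph's implementation: the graph is assumed to be a list of (subject, predicate, object) edges whose variable nodes start with `?`, and the function name `graph_to_sparql`, the `ex:` prefix, and the example predicates are hypothetical.

```python
def graph_to_sparql(edges, prefixes=None, limit=100):
    """Turn (subject, predicate, object) edges into a SPARQL SELECT query."""
    header = [
        f"PREFIX {name}: <{iri}>" for name, iri in (prefixes or {}).items()
    ]
    variables = []
    patterns = []
    for triple in edges:
        # Collect each variable once, in order of first appearance.
        for term in triple:
            if term.startswith("?") and term not in variables:
                variables.append(term)
        patterns.append("  " + " ".join(triple) + " .")
    select = " ".join(variables) if variables else "*"
    body = ["SELECT " + select + " WHERE {"] + patterns + ["}", f"LIMIT {limit}"]
    return "\n".join(header + body)


# Hypothetical two-edge graph: a protein node linked to the gene encoding it.
query = graph_to_sparql(
    [("?protein", "a", "ex:Protein"), ("?protein", "ex:encodedBy", "?gene")],
    prefixes={"ex": "http://example.org/"},
    limit=10,
)
print(query)
```

Each edge drawn in the graph becomes one triple pattern, so the user never has to write query syntax by hand. Sending the resulting string to an endpoint is left out of the sketch.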

Conclusions

This new graphical way of creating queries for biological Semantic Web databases considerably facilitates usability as it removes the requirement of knowing specific query languages and database structures. The system is freely available at http://sparqlgraph.i-med.ac.at.

12.

Background

Cupriavidus necator JMP134 is a Gram-negative β-proteobacterium able to grow on a variety of aromatic and chloroaromatic compounds as its sole carbon and energy source.

Methodology/Principal Findings

Its genome consists of four replicons (two chromosomes and two plasmids) containing a total of 6631 protein-coding genes. Comparative analysis identified 1910 core genes common to the four genomes compared (C. necator JMP134, C. necator H16, C. metallidurans CH34, R. solanacearum GMI1000). Although secondary chromosomes found in the Cupriavidus, Ralstonia, and Burkholderia lineages are all derived from plasmids, analyses of the plasmid partition proteins located on those chromosomes indicate that different plasmids gave rise to the secondary chromosomes in each lineage. The C. necator JMP134 genome contains 300 genes putatively involved in the catabolism of aromatic compounds and encodes most of the central ring-cleavage pathways. This strain also shows additional metabolic capabilities towards alicyclic compounds and the potential for catabolism of almost all proteinogenic amino acids. This remarkable catabolic potential seems to be sustained by a high degree of genetic redundancy, most probably providing this catabolically versatile bacterium with the different levels of metabolic response and alternative regulation needed to cope with a challenging environment. Comparison of Cupriavidus genomes indicates that broad metabolic capability is a general trait of the genus; however, specialization towards a particular nutritional niche (xenobiotic degradation, chemolithoautotrophy or symbiotic nitrogen fixation) seems to be shaped mostly by the acquisition of “specialized” plasmids.

Conclusions/Significance

The availability of the complete genome sequence for C. necator JMP134 provides the groundwork for further elucidation of the mechanisms and regulation of chloroaromatic compound biodegradation.

13.

Background

Matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) for yeast identification is limited by the requirement for protein extraction and for robust reference spectra across yeast species in databases. We evaluated its ability to identify a range of yeasts in comparison with phenotypic methods.

Methods

MALDI-TOF MS was performed on 30 reference and 167 clinical isolates followed by prospective examination of 67 clinical strains in parallel with biochemical testing (total n = 264). Discordant/unreliable identifications were resolved by sequencing of the internal transcribed spacer region of the rRNA gene cluster.

Principal Findings

Of 30 reference strains, 20 (67%; 16 species) were identified to species level (spectral score ≥2.0) and 24 (80%) to genus level (score ≥1.70). Of clinical isolates, 140/167 (84%) strains were correctly identified with scores of ≥2.0 and 160/167 (96%) with scores of ≥1.70; amongst Candida spp. (n = 148), correct species assignment at scores of ≥2.0 and ≥1.70 was obtained for 86% and 96% of isolates, respectively (vs. 76.4% by biochemical methods). Prospectively, species-level identification was achieved for 79% of isolates, whilst 91% and 94% of strains yielded scores of ≥1.90 and ≥1.70, respectively (100% of isolates were identified by biochemical methods). All test scores of 1.70–1.90 provided correct species assignment despite being identified to “genus-level”. MALDI-TOF MS identified uncommon Candida spp., differentiated Candida parapsilosis from C. orthopsilosis and C. metapsilosis and distinguished between C. glabrata, C. nivariensis and C. bracarensis. Yeasts with scores of <1.70 were rare species such as C. nivariensis (3/10 strains) and C. bracarensis (n = 1) but included 4/12 Cryptococcus neoformans. There were no misidentifications. Four novel species-specific spectra were obtained. Protein extraction was essential for reliable results.
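The score cut-offs reported in these findings amount to a simple threshold rule, sketched below. The cut-off values (≥2.0 for species, ≥1.70 for genus) come from the abstract; the function name `identification_level` and the label `"unreliable"` for scores below 1.70 are assumptions made for this example, not terms from the instrument software.

```python
def identification_level(score, species_cutoff=2.0, genus_cutoff=1.70):
    """Map a MALDI-TOF MS spectral score to an identification level.

    The default cut-offs follow the abstract; the returned labels are
    illustrative names chosen for this sketch.
    """
    if score >= species_cutoff:
        return "species"
    if score >= genus_cutoff:
        return "genus"
    return "unreliable"


for score in (2.15, 1.85, 1.50):
    print(score, identification_level(score))
```

Because the cut-offs are parameters, the effect of the lower species-level threshold discussed in the conclusions can be explored by calling, for example, `identification_level(1.85, species_cutoff=1.70)`.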

Conclusions

MALDI-TOF MS enabled rapid, reliable identification of clinically-important yeasts. The addition of spectra to databases and reduction in identification scores required for species-level identification may improve its utility.

14.

Background

In metabolomics research using mass spectrometry (MS), systematic searching of high-resolution mass data against compound databases is often the first step of metabolite annotation to determine elemental compositions possessing similar theoretical mass numbers. However, incorrect hits derived from errors in mass analyses will be included in the results of elemental composition searches. To assess the quality of peak annotation information, a novel methodology for false discovery rate (FDR) evaluation is presented in this study. Based on the FDR analyses, several aspects of an elemental composition search are discussed, including setting a threshold, estimating the FDR, and the types of elemental composition databases most reliable for searching.
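The mass-based database search described here can be sketched as a tolerance window around each theoretical mass. This is an illustrative toy, not the method of the study: the three-entry database, the function names, and the assumption that the input is a neutral monoisotopic mass (no adduct correction) are all choices made for the example. The atomic masses are standard monoisotopic values for C, H, N and O.

```python
# Monoisotopic masses (in Da) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Toy compound database: name -> elemental composition (illustrative only).
TOY_DB = {
    "glucose": {"C": 6, "H": 12, "O": 6},
    "fructose": {"C": 6, "H": 12, "O": 6},
    "quercetin": {"C": 15, "H": 10, "O": 7},
}


def monoisotopic_mass(formula):
    """Theoretical monoisotopic mass of an elemental composition."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def search_by_mass(measured, database, tolerance_ppm):
    """Return (name, error in ppm) for entries within the mass tolerance."""
    hits = []
    for name, formula in database.items():
        theoretical = monoisotopic_mass(formula)
        error_ppm = (measured - theoretical) / theoretical * 1e6
        if abs(error_ppm) <= tolerance_ppm:
            hits.append((name, error_ppm))
    return hits


# A measured neutral mass close to C6H12O6 matches both isomers.
for name, error_ppm in search_by_mass(180.0634, TOY_DB, tolerance_ppm=5.0):
    print(f"{name}: {error_ppm:+.2f} ppm")
```

The example returns both glucose and fructose, since isomers share one elemental composition. Widening `tolerance_ppm` or enlarging the database admits more candidates per query, which is the kind of incorrect hit the FDR evaluation is meant to quantify.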

Methodology/Principal Findings

The FDR can be determined from one measured value (i.e., the hit rate for search queries) and four parameters determined by Monte Carlo simulation. The results indicate that relatively high FDR values (30–50%) were obtained when searching time-of-flight (TOF)/MS data using the KNApSAcK and KEGG databases. In addition, searches against large all-in-one databases (e.g., PubChem) always produced unacceptable results (FDR >70%). The estimated FDRs suggest that the quality of search results can be improved not only by performing more accurate mass analysis but also by modifying the properties of the compound database. A theoretical analysis indicates that the FDR could be improved by using a compound database that is smaller but more complete.

Conclusions/Significance

High accuracy mass analysis, such as Fourier transform (FT)-MS, is needed for reliable annotation (FDR <10%). In addition, a small, customized compound database is preferable for high-quality annotation of metabolome data.

15.

Background

Flavonoid metabolites remain in blood for periods of time potentially long enough to allow interactions with cellular components of this tissue. It is well-established that flavonoids are metabolised within the intestine and liver into methylated, sulphated and glucuronidated counterparts, which inhibit platelet function.

Methodology/Principal Findings

We present evidence suggesting that platelets, which contain metabolic enzymes, are an alternative location for flavonoid metabolism. Quercetin and a plasma metabolite of this compound, 4′-O-methyl quercetin (tamarixetin), were shown by confocal microscopy to gain access to the cytosolic compartment of platelets. High-performance liquid chromatography (HPLC) and mass spectrometry (MS) showed that quercetin was transformed into a compound with a mass identical to tamarixetin, suggesting that the flavonoid was methylated by catechol-O-methyl transferase (COMT) within platelets.

Conclusions/Significance

Platelets potentially mediate a third phase of flavonoid metabolism, which may impact on the regulation of the function of these cells by metabolites of these dietary compounds.

16.
17.

Background

The Distributed Annotation System (DAS) offers a standard protocol for sharing and integrating annotations on biological sequences. There are more than 1000 DAS sources available and the number is steadily increasing. Clients are an essential part of the DAS system and integrate data from several independent sources in order to create a useful representation to the user. While web-based DAS clients exist, most of them do not have direct interaction capabilities such as dragging and zooming with the mouse.

Results

Here we present GenExp, a web-based and fully interactive visual DAS client. GenExp is a genome-oriented DAS client capable of creating informative representations of genomic data, zooming out from base level to complete chromosomes. It proposes a novel approach to genomic data rendering and uses the latest HTML5 web technologies to create the data representation inside the client browser. Thanks to client-side rendering, most position changes do not need a network request to the server, so responses to zooming and panning are almost immediate. In GenExp it is possible to explore the genome intuitively by moving it with the mouse, just as in geographical map applications. Additionally, GenExp can display more than one data viewer at the same time and can save the current state of the application so that it can be revisited later.

Conclusions

GenExp is a new interactive web-based client for DAS and addresses some of the shortcomings of the existing clients. It uses client-side data rendering techniques, resulting in easier genome browsing and exploration. GenExp is open source under the GPL license and is freely available at http://gralggen.lsi.upc.edu/recerca/genexp.

18.

Background

Frankia is a genus of soil actinobacteria forming nitrogen-fixing root-nodule symbiotic relationships with non-leguminous woody plant species, collectively called actinorhizals, from eight dicotyledonous families. Frankia strains are classified into four host-specificity groups (HSGs), each of which exhibits a distinct host range. Genome sizes of representative strains of Alnus, Casuarina, and Elaeagnus HSGs are highly diverged and are positively correlated with the size of their host ranges.

Results

The content and size of 12 Frankia genomes were investigated by in silico comparative genome hybridization and pulsed-field gel electrophoresis, respectively. Data were collected from four query strains of each HSG and compared with those of reference strains possessing completely sequenced genomes. The degree of difference in genome content between query and reference strains varied depending on HSG. Elaeagnus query strains were missing the greatest number (22–32%) of genes compared with the corresponding reference genome; Casuarina query strains lacked the fewest (0–4%), with Alnus query strains intermediate (14–18%). In spite of the remarkable gene loss, genome sizes of Alnus and Elaeagnus query strains were larger than would be expected based on total length of the absent genes. In contrast, Casuarina query strains had smaller genomes than expected.

Conclusions

The positive correlation between genome size and host range held true across all investigated strains, supporting the hypothesis that size and genome content differences are responsible for observed diversity in host plants and host plant biogeography among Frankia strains. In addition, our results suggest that different dynamics of shuffling of genome content have contributed to these symbiotic and biogeographic adaptations. Elaeagnus strains, and to a lesser extent Alnus strains, have gained and lost many genes to adapt to a wide range of environments and host plants. Conversely, rather than acquiring new genes, Casuarina strains have discarded genes to reduce genome size, suggesting an evolutionary orientation towards existence as specialist symbionts.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-609) contains supplementary material, which is available to authorized users.

19.

Background

Ocimum sanctum L. (O. tenuiflorum), of the family Lamiaceae, is an important component of the Indian tradition of medicine as well as of culture around the world, and hence is known as “Holy basil” in India. This plant is mentioned in the ancient texts of Ayurveda as an “elixir of life” (life-saving) herb and has been worshipped for over 3000 years due to its healing properties. Although it is used for various ailments, validation of molecules for differential activities is yet to be fully analyzed, as about 80% of the patents on this plant are on extracts or plant parts and mainly focus on essential oil components. To understand the full metabolic potential of this plant, the whole nuclear and chloroplast genomes were sequenced for the first time, combining sequence data from four libraries and three NGS platforms.

Results

The saturated draft assembly of the genome was about 386 Mb, along with the plastid genome of 142,245 bp, turning out to be the smallest in Lamiaceae. In addition to SSR markers, 136 proteins were identified as homologous to five important plant genomes. Pathway analysis indicated an abundance of phenylpropanoids in O. sanctum. Phylogenetic analysis of the chloroplast proteome placed Salvia miltiorrhiza as the nearest neighbor. Comparison of the chemical compounds and gene availability in O. sanctum and S. miltiorrhiza indicated the potential for the discovery of new active molecules.

Conclusion

The genome sequence and annotation of O. sanctum provide new insights into the function of genes and the medicinal nature of the metabolites synthesized in this plant. This information is highly beneficial for mining biosynthetic pathways for important metabolites in related species.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1640-z) contains supplementary material, which is available to authorized users.

20.
MOTIVATION: Bioinformatics requires reusable software tools for creating model-organism databases (MODs). RESULTS: The Pathway Tools is a reusable, production-quality software environment for creating a type of MOD called a Pathway/Genome Database (PGDB). A PGDB such as EcoCyc (see http://ecocyc.org) integrates our evolving understanding of the genes, proteins, metabolic network, and genetic network of an organism. This paper provides an overview of the four main components of the Pathway Tools: The PathoLogic component supports creation of new PGDBs from the annotated genome of an organism. The Pathway/Genome Navigator provides query, visualization, and Web-publishing services for PGDBs. The Pathway/Genome Editors support interactive updating of PGDBs. The Pathway Tools ontology defines the schema of PGDBs. The Pathway Tools makes use of the Ocelot object database system for data management services for PGDBs. The Pathway Tools has been used to build PGDBs for 13 organisms within SRI and by external users.
