1.
Niels W Hanson Kishori M Konwar Alyse K Hawley Tomer Altman Peter D Karp Steven J Hallam 《BMC genomics》2014,15(1)
Background
A convergence of high-throughput sequencing and computational power is transforming biology into information science. Despite these technological advances, converting bits and bytes of sequence information into meaningful insights remains a challenging enterprise. Biological systems operate on multiple hierarchical levels from genomes to biomes. Holistic understanding of biological systems requires agile software tools that permit comparative analyses across multiple information levels (DNA, RNA, protein, and metabolites) to identify emergent properties, diagnose system states, or predict responses to environmental change.
Results
Here we adopt the MetaPathways annotation and analysis pipeline and Pathway Tools to construct environmental pathway/genome databases (ePGDBs) that describe microbial community metabolism using MetaCyc, a highly curated database of metabolic pathways and components covering all domains of life. We evaluate Pathway Tools’ performance on three datasets with different complexity and coding potential, including simulated metagenomes, a symbiotic system, and the Hawaii Ocean Time-series. We define accuracy and sensitivity relationships between read length, coverage and pathway recovery and evaluate the impact of taxonomic pruning on ePGDB construction and interpretation. Resulting ePGDBs provide interactive metabolic maps, predict emergent metabolic pathways associated with biosynthesis and energy production and differentiate between genomic potential and phenotypic expression across defined environmental gradients.
Conclusions
This multi-tiered analysis provides the user community with specific operating guidelines, performance metrics and prediction hazards for more reliable ePGDB construction and interpretation. Moreover, it demonstrates the power of Pathway Tools in predicting metabolic interactions in natural and engineered ecosystems.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-619) contains supplementary material, which is available to authorized users.
2.
3.
Background
A direct link between the names and structures of compounds and the functional groups contained within them is important, not only because biochemists frequently rely on literature that uses a free-text format to describe functional groups, but also because metabolic models depend upon the connections between enzymes and substrates being known and appropriately stored in databases.
Methodology
We have developed a database named “Biochemical Substructure Search Catalogue” (BiSSCat), which contains 489 functional groups, >200,000 compounds and >1,000,000 different computationally constructed substructures, to allow identification of chemical compounds of biological interest.
Conclusions
This database and its associated web-based search program (http://bisscat.org/) can be used to find compounds containing selected combinations of substructures and functional groups. It can be used to determine possible additional substrates for known enzymes and for putative enzymes found in genome projects. Its applications to enzyme inhibitor design are also discussed.
4.
Novel gene and gene model detection using a whole genome open reading frame analysis in proteomics (total citations: 4; self-citations: 1; citations by others: 3)
Fermin D Allen BB Blackwell TW Menon R Adamski M Xu Y Ulintz P Omenn GS States DJ 《Genome biology》2006,7(4):R35-13
Background
Defining the location of genes and the precise nature of gene products remains a fundamental challenge in genome annotation. Interrogating tandem mass spectrometry data using genomic sequence provides an unbiased method to identify novel translation products. A six-frame translation of the entire human genome was used as the query database to search for novel blood proteins in the data from the Human Proteome Organization Plasma Proteome Project. Because this target database is orders of magnitude larger than the databases traditionally employed in tandem mass spectra analysis, careful attention to significance testing is required. Confidence of identification is assessed using our previously described Poisson statistic, which estimates the significance of multi-peptide identifications incorporating the length of the matching sequence, number of spectra searched and size of the target sequence database.
5.
Robert W Byrnes Dawn Cotter Andreia Maer Joshua Li David Nadeau Shankar Subramaniam 《BMC systems biology》2009,3(1):99-10
Background
Pathway models serve as the basis for much of systems biology. They are often built using programs designed for the purpose. Constructing new models generally requires simultaneous access to experimental data of diverse types, to databases of well-characterized biological compounds and molecular intermediates, and to reference model pathways. However, few if any software applications provide all such capabilities within a single user interface.
6.
7.
The Basic Local Alignment Search Tool (BLAST) compares one or more query sequences to a database of sequences and identifies those sequences that are similar to the query above a user-defined threshold. We have developed a user-friendly web application, MULTBLAST, that runs a series of BLAST searches on a user-supplied list of proteins against one or more target protein or nucleotide databases. The application pre-processes the data, launches each individual BLAST search on the University of Nevada, Reno's TimeLogic DeCypher® system (available from Active Motif, Inc.), and retrieves and combines all the results into a simple, easy-to-read output file. The output file presents the list of the query proteins, followed by the BLAST results for the matching sequences from each target database in consecutive columns. This format is especially useful for either comparing the results from the different target databases or analyzing the results while keeping the identification of each target database separate.
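The column layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MULTBLAST's actual code: the function names `combine_hits` and `to_tsv`, the dictionary layout, the `-` placeholder for missing hits, and the sample identifiers are all hypothetical, and no BLAST search is actually run.

```python
import csv
import io


def combine_hits(queries, hits_by_db, no_hit="-"):
    """Build a table with one row per query and one column per target database.

    queries    -- list of query protein identifiers (hypothetical input format)
    hits_by_db -- dict mapping database name -> {query id -> list of hit ids}
    no_hit     -- placeholder written when a database returned nothing
    """
    db_names = list(hits_by_db)          # column order follows insertion order
    rows = [["query"] + db_names]        # header row
    for query in queries:
        row = [query]
        for db in db_names:
            hits = hits_by_db[db].get(query, [])
            row.append("; ".join(hits) if hits else no_hit)
        rows.append(row)
    return rows


def to_tsv(rows):
    """Serialise the table as tab-separated text."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Made-up example data: two queries searched against two databases.
queries = ["protA", "protB"]
hits_by_db = {
    "db1": {"protA": ["hit1", "hit2"]},
    "db2": {"protA": ["hit3"], "protB": ["hit4"]},
}
table = combine_hits(queries, hits_by_db)
tsv_text = to_tsv(table)
```

Keeping each database in its own column, rather than pooling all hits, is what lets a reader compare databases side by side while still seeing which database produced which hit.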
Availability
The application is available at http://blastpipe.biochem.unr.edu/
8.
Karp PD Riley M Saier M Paulsen IT Paley SM Pellegrini-Toole A 《Nucleic acids research》2000,28(1):56-59
EcoCyc is an organism-specific Pathway/Genome Database that describes the metabolic and signal-transduction pathways of Escherichia coli, its enzymes, and (a new addition) its transport proteins. MetaCyc is a new metabolic-pathway database that describes pathways and enzymes of many different organisms, with a microbial focus. Both databases are queried using the Pathway Tools graphical user interface, which provides a wide variety of query operations and visualization tools. EcoCyc and MetaCyc are available at http://ecocyc.PangeaSystems.com/ecocyc/
9.
10.
Background
Despite several recent advances in the automated generation of draft metabolic reconstructions, the manual curation of these networks to produce high quality genome-scale metabolic models remains a labour-intensive and challenging task.
Results
We present PathwayBooster, an open-source software tool to support the manual comparison and curation of metabolic models. It combines gene annotations from GenBank files and other sources with information retrieved from the metabolic databases BRENDA and KEGG to produce a set of pathway diagrams and reports summarising the evidence for the presence of a reaction in a given organism’s metabolic network. By comparing multiple sources of evidence within a common framework, PathwayBooster assists the curator in the identification of likely false positive (misannotated enzyme) and false negative (pathway hole) reactions. Reaction evidence may be taken from alternative annotations of the same genome and/or a set of closely related organisms.
Conclusions
By integrating and visualising evidence from multiple sources, PathwayBooster reduces the manual effort required in the curation of a metabolic model. The software is available online at http://www.theosysbio.bio.ic.ac.uk/resources/pathwaybooster/.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-014-0447-2) contains supplementary material, which is available to authorized users.
11.
Background
Semantic Web has established itself as a framework for using and sharing data across applications and database boundaries. Here, we present a web-based platform for querying biological Semantic Web databases in a graphical way.
Results
SPARQLGraph offers an intuitive drag & drop query builder, which converts the visual graph into a query and executes it on a public endpoint. The tool integrates several publicly available Semantic Web databases, including the databases of the just recently released EBI RDF platform. Furthermore, it provides several predefined template queries for answering biological questions. Users can easily create and save new query graphs, which can also be shared with other researchers.
Conclusions
This new graphical way of creating queries for biological Semantic Web databases considerably facilitates usability as it removes the requirement of knowing specific query languages and database structures. The system is freely available at http://sparqlgraph.i-med.ac.at.
12.
Athanasios Lykidis Danilo Pérez-Pantoja Thomas Ledger Kostantinos Mavromatis Iain J. Anderson Natalia N. Ivanova Sean D. Hooper Alla Lapidus Susan Lucas Bernardo González Nikos C. Kyrpides 《PloS one》2010,5(3)
Background
Cupriavidus necator JMP134 is a Gram-negative β-proteobacterium able to grow on a variety of aromatic and chloroaromatic compounds as its sole carbon and energy source.
Methodology/Principal Findings
Its genome consists of four replicons (two chromosomes and two plasmids) containing a total of 6631 protein coding genes. Comparative analysis identified 1910 core genes common to the four genomes compared (C. necator JMP134, C. necator H16, C. metallidurans CH34, R. solanacearum GMI1000). Although secondary chromosomes found in the Cupriavidus, Ralstonia, and Burkholderia lineages are all derived from plasmids, analyses of the plasmid partition proteins located on those chromosomes indicate that different plasmids gave rise to the secondary chromosomes in each lineage. The C. necator JMP134 genome contains 300 genes putatively involved in the catabolism of aromatic compounds and encodes most of the central ring-cleavage pathways. This strain also shows additional metabolic capabilities towards alicyclic compounds and the potential for catabolism of almost all proteinogenic amino acids. This remarkable catabolic potential seems to be sustained by a high degree of genetic redundancy, most probably enabling this catabolically versatile bacterium with different levels of metabolic responses and alternative regulation necessary to cope with a challenging environment. From the comparison of Cupriavidus genomes, it is possible to state that a broad metabolic capability is a general trait for Cupriavidus genus, however certain specialization towards a nutritional niche (xenobiotics degradation, chemolithoautotrophy or symbiotic nitrogen fixation) seems to be shaped mostly by the acquisition of “specialized” plasmids.
Conclusions/Significance
The availability of the complete genome sequence for C. necator JMP134 provides the groundwork for further elucidation of the mechanisms and regulation of chloroaromatic compound biodegradation.
13.
Pinto A Halliday C Zahra M van Hal S Olma T Maszewska K Iredell JR Meyer W Chen SC 《PloS one》2011,6(10):e25712
Background
Matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) for yeast identification is limited by the requirement for protein extraction and for robust reference spectra across yeast species in databases. We evaluated its ability to identify a range of yeasts in comparison with phenotypic methods.
Methods
MALDI-TOF MS was performed on 30 reference and 167 clinical isolates followed by prospective examination of 67 clinical strains in parallel with biochemical testing (total n = 264). Discordant/unreliable identifications were resolved by sequencing of the internal transcribed spacer region of the rRNA gene cluster.
Principal Findings
Twenty (67%; 16 species), and 24 (80%) of 30 reference strains were identified to species, (spectral score ≥2.0) and genus (score ≥1.70)-level, respectively. Of clinical isolates, 140/167 (84%) strains were correctly identified with scores of ≥2.0 and 160/167 (96%) with scores of ≥1.70; amongst Candida spp. (n = 148), correct species assignment at scores of ≥2.0, and ≥1.70 was obtained for 86% and 96% isolates, respectively (vs. 76.4% by biochemical methods). Prospectively, species-level identification was achieved for 79% of isolates, whilst 91% and 94% of strains yielded scores of ≥1.90 and ≥1.70, respectively (100% isolates identified by biochemical methods). All test scores of 1.70–1.90 provided correct species assignment despite being identified to “genus-level”. MALDI-TOF MS identified uncommon Candida spp., differentiated Candida parapsilosis from C. orthopsilosis and C. metapsilosis and distinguished between C. glabrata, C. nivariensis and C. bracarensis. Yeasts with scores of <1.70 were rare species such as C. nivariensis (3/10 strains) and C. bracarensis (n = 1) but included 4/12 Cryptococcus neoformans. There were no misidentifications. Four novel species-specific spectra were obtained. Protein extraction was essential for reliable results.
Conclusions
MALDI-TOF MS enabled rapid, reliable identification of clinically-important yeasts. The addition of spectra to databases and reduction in identification scores required for species-level identification may improve its utility.
14.
Fumio Matsuda Yoko Shinbo Akira Oikawa Masami Yokota Hirai Oliver Fiehn Shigehiko Kanaya Kazuki Saito 《PloS one》2009,4(10)
Background
In metabolomics researches using mass spectrometry (MS), systematic searching of high-resolution mass data against compound databases is often the first step of metabolite annotation to determine elemental compositions possessing similar theoretical mass numbers. However, incorrect hits derived from errors in mass analyses will be included in the results of elemental composition searches. To assess the quality of peak annotation information, a novel methodology for false discovery rates (FDR) evaluation is presented in this study. Based on the FDR analyses, several aspects of an elemental composition search, including setting a threshold, estimating FDR, and the types of elemental composition databases most reliable for searching are discussed.
Methodology/Principal Findings
The FDR can be determined from one measured value (i.e., the hit rate for search queries) and four parameters determined by Monte Carlo simulation. The results indicate that relatively high FDR values (30–50%) were obtained when searching time-of-flight (TOF)/MS data using the KNApSAcK and KEGG databases. In addition, searches against large all-in-one databases (e.g., PubChem) always produced unacceptable results (FDR >70%). The estimated FDRs suggest that the quality of search results can be improved not only by performing more accurate mass analysis but also by modifying the properties of the compound database. A theoretical analysis indicates that FDR could be improved by using compound database with smaller but higher completeness entries.
Conclusions/Significance
High accuracy mass analysis, such as Fourier transform (FT)-MS, is needed for reliable annotation (FDR <10%). In addition, a small, customized compound database is preferable for high-quality annotation of metabolome data.
15.
Bernice Wright Trevor Gibson Jeremy Spencer Julie A. Lovegrove Jonathan M. Gibbins 《PloS one》2010,5(3)
Background
Flavonoid metabolites remain in blood for periods of time potentially long enough to allow interactions with cellular components of this tissue. It is well-established that flavonoids are metabolised within the intestine and liver into methylated, sulphated and glucuronidated counterparts, which inhibit platelet function.
Methodology/Principal Findings
We demonstrate evidence suggesting platelets which contain metabolic enzymes, as an alternative location for flavonoid metabolism. Quercetin and a plasma metabolite of this compound, 4′-O-methyl quercetin (tamarixetin) were shown to gain access to the cytosolic compartment of platelets, using confocal microscopy. High performance liquid chromatography (HPLC) and mass spectrometry (MS) showed that quercetin was transformed into a compound with a mass identical to tamarixetin, suggesting that the flavonoid was methylated by catechol-O-methyl transferase (COMT) within platelets.
Conclusions/Significance
Platelets potentially mediate a third phase of flavonoid metabolism, which may impact on the regulation of the function of these cells by metabolites of these dietary compounds.
16.
17.
Background
The Distributed Annotation System (DAS) offers a standard protocol for sharing and integrating annotations on biological sequences. There are more than 1000 DAS sources available and the number is steadily increasing. Clients are an essential part of the DAS system and integrate data from several independent sources in order to create a useful representation to the user. While web-based DAS clients exist, most of them do not have direct interaction capabilities such as dragging and zooming with the mouse.
Results
Here we present GenExp, a web based and fully interactive visual DAS client. GenExp is a genome oriented DAS client capable of creating informative representations of genomic data zooming out from base level to complete chromosomes. It proposes a novel approach to genomic data rendering and uses the latest HTML5 web technologies to create the data representation inside the client browser. Thanks to client-side rendering most position changes do not need a network request to the server and so responses to zooming and panning are almost immediate. In GenExp it is possible to explore the genome intuitively moving it with the mouse just like geographical map applications. Additionally, in GenExp it is possible to have more than one data viewer at the same time and to save the current state of the application to revisit it later on.
Conclusions
GenExp is a new interactive web-based client for DAS and addresses some of the short-comings of the existing clients. It uses client-side data rendering techniques resulting in easier genome browsing and exploration. GenExp is open source under the GPL license and it is freely available at http://gralggen.lsi.upc.edu/recerca/genexp.
18.
Ken-ichi Kucho Takashi Yamanaka Hideo Sasakawa Samira R Mansour Toshiki Uchiumi 《BMC genomics》2014,15(1)
Background
Frankia is a genus of soil actinobacteria forming nitrogen-fixing root-nodule symbiotic relationships with non-leguminous woody plant species, collectively called actinorhizals, from eight dicotyledonous families. Frankia strains are classified into four host-specificity groups (HSGs), each of which exhibits a distinct host range. Genome sizes of representative strains of Alnus, Casuarina, and Elaeagnus HSGs are highly diverged and are positively correlated with the size of their host ranges.
Results
The content and size of 12 Frankia genomes were investigated by in silico comparative genome hybridization and pulsed-field gel electrophoresis, respectively. Data were collected from four query strains of each HSG and compared with those of reference strains possessing completely sequenced genomes. The degree of difference in genome content between query and reference strains varied depending on HSG. Elaeagnus query strains were missing the greatest number (22–32%) of genes compared with the corresponding reference genome; Casuarina query strains lacked the fewest (0–4%), with Alnus query strains intermediate (14–18%). In spite of the remarkable gene loss, genome sizes of Alnus and Elaeagnus query strains were larger than would be expected based on total length of the absent genes. In contrast, Casuarina query strains had smaller genomes than expected.
Conclusions
The positive correlation between genome size and host range held true across all investigated strains, supporting the hypothesis that size and genome content differences are responsible for observed diversity in host plants and host plant biogeography among Frankia strains. In addition, our results suggest that different dynamics of shuffling of genome content have contributed to these symbiotic and biogeographic adaptations. Elaeagnus strains, and to a lesser extent Alnus strains, have gained and lost many genes to adapt to a wide range of environments and host plants. Conversely, rather than acquiring new genes, Casuarina strains have discarded genes to reduce genome size, suggesting an evolutionary orientation towards existence as specialist symbionts.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-609) contains supplementary material, which is available to authorized users.
19.
Shubhra Rastogi Alok Kalra Vikrant Gupta Feroz Khan Raj Kishori Lal Anil Kumar Tripathi Sriram Parameswaran Chellappa Gopalakrishnan Gopalakrishna Ramaswamy Ajit Kumar Shasany 《BMC genomics》2015,16(1)
Background
Ocimum sanctum L. (O. tenuiflorum) family-Lamiaceae is an important component of Indian tradition of medicine as well as culture around the world, and hence is known as “Holy basil” in India. This plant is mentioned in the ancient texts of Ayurveda as an “elixir of life” (life saving) herb and worshipped for over 3000 years due to its healing properties. Although used in various ailments, validation of molecules for differential activities is yet to be fully analyzed, as about 80 % of the patents on this plant are on extracts or the plant parts, and mainly focussed on essential oil components. With a view to understand the full metabolic potential of this plant whole nuclear and chloroplast genomes were sequenced for the first time combining the sequence data from 4 libraries and three NGS platforms.
Results
The saturated draft assembly of the genome was about 386 Mb, along with the plastid genome of 142,245 bp, turning out to be the smallest in Lamiaceae. In addition to SSR markers, 136 proteins were identified as homologous to five important plant genomes. Pathway analysis indicated an abundance of phenylpropanoids in O. sanctum. Phylogenetic analysis for chloroplast proteome placed Salvia miltiorrhiza as the nearest neighbor. Comparison of the chemical compounds and genes availability in O. sanctum and S. miltiorrhiza indicated the potential for the discovery of new active molecules.
Conclusion
The genome sequence and annotation of O. sanctum provides new insights into the function of genes and the medicinal nature of the metabolites synthesized in this plant. This information is highly beneficial for mining biosynthetic pathways for important metabolites in related species.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1640-z) contains supplementary material, which is available to authorized users.
20.
MOTIVATION: Bioinformatics requires reusable software tools for creating model-organism databases (MODs). RESULTS: The Pathway Tools is a reusable, production-quality software environment for creating a type of MOD called a Pathway/Genome Database (PGDB). A PGDB such as EcoCyc (see http://ecocyc.org) integrates our evolving understanding of the genes, proteins, metabolic network, and genetic network of an organism. This paper provides an overview of the four main components of the Pathway Tools: The PathoLogic component supports creation of new PGDBs from the annotated genome of an organism. The Pathway/Genome Navigator provides query, visualization, and Web-publishing services for PGDBs. The Pathway/Genome Editors support interactive updating of PGDBs. The Pathway Tools ontology defines the schema of PGDBs. The Pathway Tools makes use of the Ocelot object database system for data management services for PGDBs. The Pathway Tools has been used to build PGDBs for 13 organisms within SRI and by external users.