Similar articles
 20 similar articles found (search time: 31 ms)
1.
2.
The ability to aggregate experimental data analysis and results into a concise and interpretable format is a key step in evaluating the success of an experiment. This critical step determines baselines for reproducibility and is a key requirement for data dissemination. However, in practice it can be difficult to consolidate data analyses that encapsulate the broad range of datatypes available in the life sciences. We present STENCIL, a web templating engine designed to organize, visualize, and enable the sharing of interactive data visualizations. STENCIL leverages a flexible web framework for creating templates to render highly customizable visual front ends. This flexibility enables researchers to render small or large sets of experimental outcomes, producing high-quality downloadable and editable figures that retain their original relationship to the source data. REST API-based back ends provide programmatic data access and support easy data sharing. STENCIL is a lightweight tool that can stream data from Galaxy, a popular bioinformatic analysis web platform. STENCIL has been used to support the analysis and dissemination of two large-scale genomic projects containing the complete data analysis for over 2,400 distinct datasets. Code and implementation details are available on GitHub: https://github.com/CEGRcode/stencil

3.
Corynebacteria are used for a wide variety of industrial purposes, but some species are associated with human diseases. With an increasing number of corynebacterial genomes having been sequenced, comparative analysis of these strains may provide a better understanding of their biology, phylogeny, virulence and taxonomy, which may lead to the discovery of beneficial industrial strains or contribute to better management of diseases. To facilitate ongoing research on corynebacteria, a specialized central repository and analysis platform for the corynebacterial research community is needed to host the fast-growing amount of genomic data and facilitate the analysis of these data. Here we present CoryneBase, a genomic database for Corynebacterium with diverse functionality for the analysis of genomes, which aims to provide: (1) annotated genome sequences of Corynebacterium, where 165,918 coding sequences and 4,180 RNAs can be found in 27 species; (2) access to comprehensive Corynebacterium data through the use of advanced web technologies for interactive web interfaces; and (3) advanced bioinformatic analysis tools consisting of standard BLAST for homology search, VFDB BLAST for sequence homology search against the Virulence Factor Database (VFDB), the Pairwise Genome Comparison (PGC) tool for comparative genomic analysis, and a newly designed Pathogenomics Profiling Tool (PathoProT) for comparative pathogenomic analysis. CoryneBase offers access to a range of Corynebacterium genomic resources as well as analysis tools for comparative genomics and pathogenomics. It is publicly available at http://corynebacterium.um.edu.my/.

4.
Data summarization and triage is one of the current top challenges in visual analytics. The goal is to let users visually inspect large data sets and examine or request data with particular characteristics. The need for summarization and visual analytics is also felt when dealing with digital representations of DNA sequences. Genomic data sets are growing rapidly, making their analysis increasingly difficult and raising the need for new, scalable tools. For example, being able to look at very large DNA sequences while immediately identifying potentially interesting regions would provide the biologist with a flexible exploratory and analytical tool. In this paper we present a new concept, the “information profile”, which provides a quantitative measure of the local complexity of a DNA sequence, independently of the direction of processing. The computation of the information profiles is tractable: we show that it can be done in time proportional to the length of the sequence. We also describe a tool to compute the information profiles of a given DNA sequence, and use the genome of the fission yeast Schizosaccharomyces pombe strain 972 h and five human chromosomes 22 for illustration. We show that information profiles are useful for detecting large-scale genomic regularities by visual inspection. Several discovery strategies are possible, including the standalone analysis of single sequences, the comparative analysis of sequences from individuals from the same species, and the comparative analysis of sequences from different organisms. The comparison scale can be varied, allowing the users to zoom in on specific details, or obtain a broad overview of a long segment. Software applications have been made available for non-commercial use at http://bioinformatics.ua.pt/software/dna-at-glance.
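The abstract does not spell out how an information profile is computed, so the following is only a minimal sketch of one way to obtain a direction-independent, linear-time measure of local complexity. It assumes an adaptive order-k context model with additive smoothing, run once in each direction, and keeps the smaller per-base information content. The function names, the parameters `k` and `alpha`, and the choice of model are all assumptions made for illustration, not the published method.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: input is uppercase and contains only these symbols


def _one_pass(seq, k, alpha):
    """Per-base information content (bits) from an adaptive order-k context model."""
    counts = defaultdict(lambda: [0, 0, 0, 0])  # context -> counts of A, C, G, T
    bits = []
    for i, base in enumerate(seq):
        ctx = seq[max(0, i - k):i]          # up to k preceding bases
        c = counts[ctx]
        s = ALPHABET.index(base)
        p = (c[s] + alpha) / (sum(c) + 4 * alpha)  # smoothed probability estimate
        bits.append(-math.log2(p))
        c[s] += 1                           # update the model after scoring the base
    return bits


def information_profile(seq, k=3, alpha=1.0):
    """Minimum of the forward and backward per-base information content."""
    forward = _one_pass(seq, k, alpha)
    backward = _one_pass(seq[::-1], k, alpha)[::-1]
    return [min(f, b) for f, b in zip(forward, backward)]
```

Each pass touches every base once, so the cost grows linearly with sequence length for a fixed `k`. Low values mark repetitive, predictable regions; plotting the list (optionally smoothed) against position gives a profile that can be inspected visually.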

5.
6.

Background

Next-generation sequencing technologies are rapidly generating whole-genome datasets for an increasing number of organisms. However, phylogenetic reconstruction of genomic data remains difficult because de novo assembly for non-model genomes and multi-genome alignment are challenging.

Results

To greatly simplify the analysis, we present an Assembly and Alignment-Free (AAF) method (https://sourceforge.net/projects/aaf-phylogeny) that constructs phylogenies directly from unassembled genome sequence data, bypassing both genome assembly and alignment. Using mathematical calculations, models of sequence evolution, and simulated sequencing of published genomes, we address both evolutionary and sampling issues caused by direct reconstruction, including homoplasy, sequencing errors, and incomplete sequencing coverage. From these results, we calculate the statistical properties of the pairwise distances between genomes, allowing us to optimize parameter selection and perform bootstrapping. As a test case with real data, we successfully reconstructed the phylogeny of 12 mammals using raw sequencing reads. We also applied AAF to 21 tropical tree genome datasets with low coverage to demonstrate its effectiveness on non-model organisms.
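The idea of estimating pairwise distances directly from unassembled reads can be sketched with shared k-mers. The abstract does not give the distance formula, so the form used here, d = -(1/k) ln(n_shared / n_total) with n_total taken as the smaller k-mer set, is an assumption for illustration, as are the function names. The sketch also ignores reverse complements and the filtering of low-frequency k-mers that real sequencing errors would require.

```python
import math


def kmers(reads, k):
    """Return the set of all k-mers found in a collection of reads."""
    found = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            found.add(read[i:i + k])
    return found


def kmer_distance(reads_a, reads_b, k):
    """Alignment-free distance from the fraction of shared k-mers (assumed formula)."""
    a, b = kmers(reads_a, k), kmers(reads_b, k)
    shared = len(a & b)
    total = min(len(a), len(b))
    if shared == 0:
        return math.inf  # no shared k-mers: distance is undefined, reported as infinite
    return -math.log(shared / total) / k
```

A matrix of such pairwise distances could then be passed to any distance-based tree-building method, and resampling k-mers would give one route to bootstrap support.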

Conclusion

Our AAF method opens up phylogenomics for species without an appropriate reference genome or high sequence coverage, and rapidly creates a phylogenetic framework for further analysis of genome structure and diversity among non-model organisms.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1647-5) contains supplementary material, which is available to authorized users.

7.
High-throughput techniques have considerably increased the potential of comparative genomics whilst simultaneously posing many new challenges. One of those challenges involves efficiently mining the large amount of data produced and exploring the landscape of both conserved and idiosyncratic genomic regions across multiple genomes. Domains of application of these analyses are diverse: identification of evolutionary events, inference of gene functions, detection of niche-specific genes or phylogenetic profiling. Insyght is a comparative genomic visualization tool that combines three complementary displays: (i) a table for thoroughly browsing amongst homologues, (ii) a comparator of orthologue functional annotations and (iii) a genomic organization view designed to improve the legibility of rearrangements and distinctive loci. The latter display combines symbolic and proportional graphical paradigms. Synchronized navigation across multiple species and interoperability between the views are core features of Insyght. A gene filter mechanism is provided that helps the user to build a biologically relevant gene set according to multiple criteria such as presence/absence of homologues and/or various annotations. We illustrate the use of Insyght with scenarios. Currently, only Bacteria and Archaea are supported. A public instance is available at http://genome.jouy.inra.fr/Insyght. The tool is freely downloadable for private data set analysis.

8.
9.
In plants and animals, chromosomal breakage and fusion events based on conserved syntenic genomic blocks lead to conserved patterns of karyotype evolution among species of the same family. However, karyotype information has not been well utilized in genomic comparison studies. We present CrusView, a Java-based bioinformatic application utilizing Standard Widget Toolkit/Swing graphics libraries and a SQLite database for performing visualized analyses of comparative genomics data in Brassicaceae (crucifer) plants. Compared with similar software and databases, one of the unique features of CrusView is its integration of karyotype information when comparing two genomes. This feature allows users to perform karyotype-based genome assembly and karyotype-assisted genome synteny analyses with preset karyotype patterns of the Brassicaceae genomes. Additionally, CrusView is a local program, which gives its users high flexibility when analyzing unpublished genomes and allows users to upload self-defined genomic information so that they can visually study the associations between genome structural variations and genetic elements, including chromosomal rearrangements, genomic macrosynteny, gene families, high-frequency recombination sites, and tandem and segmental duplications between related species. This tool will greatly facilitate karyotype, chromosome, and genome evolution studies using visualized comparative genomics approaches in Brassicaceae species. CrusView is freely available at http://www.cmbb.arizona.edu/CrusView/. The Brassicaceae (crucifer) plant family contains more than 3,700 species, including the model plant organism Arabidopsis (Arabidopsis thaliana); economically important crop species, such as Brassica rapa and Brassica napus; and close relatives of Arabidopsis used in abiotic stress research, such as Eutrema salsugineum and Schrenkiella parvula. 
Because Brassicaceae plants have high scientific and economic importance, several whole-genome sequencing projects of the species in this family have been recently launched (http://www.brassica.info). Moreover, Brassicaceae is also a good system for population genomics. The 1001 Arabidopsis Genomes Project (http://www.1001genomes.org/) plans to generate complete genome sequences for 1,001 Arabidopsis strains to study the associations between genetic variation and phenotypic diversity. The Value-directed Evolutionary Genomics Initiative project aims to understand the genome evolution of Brassicaceae species by sequencing several close relatives of Arabidopsis, such as Arabidopsis lyrata and Capsella rubella. Recent advances in high-throughput sequencing technology have greatly expedited these whole-genome sequencing projects of versatile nonmodel organisms. Although increasingly longer reads can now be produced from high-throughput sequencing experiments, de novo assembler tools can only generate contig and/or scaffold sequences from high-throughput sequencing reads. These tools cannot generate complete chromosome sequences without genetic and/or physical maps that typically require years to create. This limitation makes chromosome-scale structural variation (i.e. translocation, inversion, deletion and insertion, and segmental and tandem duplication) and genomic macrosynteny analyses difficult to perform. In both plants and animals, genomes of species within the same family have evolved with conserved karyotype patterns due to the rearrangements of large chromosomal segments. Chromosomal karyotypes can be obtained from comparative chromosomal painting (CCP) experiments by performing in situ hybridization experiments on bacterial artificial chromosome sequences between related species. 
The genome of each Brassicaceae member is composed of 24 conserved genomic blocks that have been considered as the basic units of chromosomal rearrangement during genome evolution (Lysak et al., 2006). The sizes of these conserved blocks range from several to dozens of megabases. Currently, karyotypes profiled by CCP experiments in approximately 20 Brassicaceae species are available; such karyotypes include those from Arabidopsis (n = 5), Hornungia alpina (n = 6), Eutrema spp. (n = 7), A. lyrata (n = 8), B. rapa (n = 10), and Polyctenium fremontii (n = 14). By utilizing the karyotype information in Brassicaceae, we have developed a tool, KGBassembler (for Karyotype-based Genome assembler for Brassicaceae), to finalize the assembly of chromosomes from scaffolds/contigs without relying on a genetic/physical map (Ma et al., 2012). Over the past 2 years, complete whole-genome sequences of several Brassicaceae species have been released, including the aforementioned A. lyrata, S. parvula, B. rapa, and E. salsugineum (Dassanayake et al., 2011; Hu et al., 2011; Wang et al., 2011; Wright and Agren, 2011; Wu et al., 2012; Yang et al., 2013). These genomic resources have opened a new era of comparative genomics in Brassicaceae to better understand the genomic evolution (Cheng et al., 2012). Numerous tools and databases are available for performing comparative genomics analysis in plants. CoGe is a comparative genomics analysis platform that is now a part of the iPlant Collaborative Project (Goff et al., 2011). The CoGe database currently includes nearly 2,000 genome sequences of approximately 1,500 organisms, allowing users to perform online visual analyses of genome synteny and duplication events (Tang and Lyons, 2012). PLAZA and Vista are also Web-based databases that provide comparative analysis services on the genomic data deposited in the databases (Frazer et al., 2004; Van Bel et al., 2012). 
Other stand-alone bioinformatic applications for comparative genomic analysis, such as Easyfig and genoPlotR, are commonly used to generate synteny plots of given genome segments at a scale ranging from a single gene to one chromosome (Guy et al., 2010; Sullivan et al., 2011). In this work, we present a Java-based bioinformatic application, CrusView, for performing visualized analyses of genome synteny and karyotype evolution in Brassicaceae species. CrusView features a user-friendly graphical user interface (GUI) implemented with Standard Widget Toolkit (SWT)/Swing graphics libraries and a SQLite database used to manage local genomic data. Compared with the most commonly used tools in comparative genomics, one of the unique features of CrusView is that available karyotype data of a Brassicaceae species are incorporated to facilitate karyotype-based chromosome assembly and analyses of chromosomal structural evolution. Compared with Web-based tools, the stand-alone CrusView tool was also designed to give users higher flexibility in analyzing currently unpublished genome data and integrating self-defined genomic information based on the users’ interests, such as gene families, gene duplications, chromosomal break points, Gene Ontology terms, and groups of orthologs/paralogs, with the genomic synteny maps. In addition, CrusView can generate images representing genomic synteny between two compared genomes in PNG/SVG/PDF high-resolution formats that are suitable for publication.

10.
Microbial community profiling identifies and quantifies organisms in metagenomic sequencing data using either reference-based or unsupervised approaches. However, current reference-based profiling methods only report the presence and abundance of single reference genomes that are available in databases. Since only a small fraction of environmental genomes is represented in genomic databases, these approaches entail the risk of false identifications and often suggest a higher precision than justified by the data. Therefore, we developed MicrobeGPS, a novel metagenomic profiling approach that overcomes these limitations. MicrobeGPS is the first method that identifies microbiota in the sample and estimates their genomic distances to known reference genomes. With this strategy, MicrobeGPS identifies organisms down to the strain level and highlights possibly inaccurate identifications when the correct reference genome is missing. We demonstrate on three metagenomic datasets of different origin that our approach successfully avoids misleading interpretation of results and additionally provides more accurate results than current profiling methods. Our results indicate that MicrobeGPS can enable reference-based taxonomic profiling of complex and less characterized microbial communities. MicrobeGPS is open source and available from https://sourceforge.net/projects/microbegps/ as source code and binary distribution for Windows and Linux operating systems.

11.
PhyloTrac is an integrated desktop application for analysis of PhyloChip microarray data. PhyloTrac combined with PhyloChip provides turnkey and comprehensive identification and analysis of bacterial and archaeal communities in complex environmental samples. PhyloTrac is free for noncommercial organizations and is available for all major operating systems at http://www.phylotrac.org/. The PhyloChip is a low-cost Affymetrix GeneChip microarray, developed at Lawrence Berkeley National Laboratory (LBNL), designed to detect and quantify abundance of bacterial and archaeal taxa using signature probes targeting all known 16S rRNA gene sequences. The second generation of the PhyloChip microarray targets nearly 9,000 operational taxonomic units (OTUs), with an average of 24 probes, each 25 bp long, and the upcoming third-generation PhyloChip application will target an even larger number of OTUs. Multiple, complex environments have been successfully analyzed using the PhyloChip microarray, including, among others, air (2), soil (1), the human lung (6), and the gut (9). PhyloChip microarrays are manufactured by Affymetrix, but to date, analysis has been available only from within LBNL, limiting the accessibility of the technology. PhyloTrac addresses this limitation by providing a standardized analysis package for the PhyloChip microarray, including microarray normalization, OTU quantification, multiple interactive visualizations, and integrated analytics.

12.
13.
14.
TnSeq has become a popular technique for determining the essentiality of genomic regions in bacterial organisms. Several methods have been developed to analyze the wealth of data that has been obtained through TnSeq experiments. We developed a tool for analyzing Himar1 TnSeq data called TRANSIT. TRANSIT provides a graphical interface to three different statistical methods for analyzing TnSeq data. These methods cover a variety of approaches capable of identifying essential genes in individual datasets as well as comparative analysis between conditions. We demonstrate the utility of this software by analyzing TnSeq datasets of M. tuberculosis grown on glycerol and cholesterol. We show that TRANSIT can be used to discover genes that have previously been implicated in growth on these carbon sources. TRANSIT is written in Python, and thus can be run on Windows, OSX and Linux platforms. The source code is distributed under the GNU GPL v3 license and can be obtained from the following GitHub repository: https://github.com/mad-lab/transit
This is a PLOS Computational Biology Software paper

15.

Background

An increasing number of microbial genomes are being sequenced and deposited in public databases. In addition, several closely related strains are also being sequenced in order to understand the genetic basis of diversity and the mechanisms that lead to the acquisition of new genetic traits. These efforts have created a need to visualize microbial genomes and perform genome comparisons on a finer scale. We have developed GenomeViz to enable rapid visualization and subsequent comparisons of several microbial genomes in an interactive environment.

Results

Here we describe a program that allows visualization of both qualitative and quantitative information from complete and partially sequenced microbial genomes. Using GenomeViz, data deriving from studies on genomic islands, gene/protein classifications, GC content, GC skew, whole genome alignments, microarrays and proteomics may be plotted. Several genomes can be visualized interactively at the same time from a comparative genomic perspective and publication quality circular genome plots can be created.
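Two of the quantitative tracks mentioned above, GC content and GC skew, are usually computed over sliding windows before plotting. The following is a minimal sketch of that calculation using the conventional definition of skew, (G - C) / (G + C); the function name and the `window` and `step` parameters are illustrative assumptions and do not describe GenomeViz's own input format or code.

```python
def gc_windows(seq, window, step):
    """Return (start, gc_content, gc_skew) for each full window along seq."""
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        content = (g + c) / window
        # Skew is undefined when a window has no G or C; report 0.0 in that case.
        skew = (g - c) / (g + c) if g + c else 0.0
        results.append((start, content, skew))
    return results
```

The resulting tuples can be written out as a table and drawn as one ring of a circular genome plot, with the window start mapped to the angular position.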

Conclusions

GenomeViz should allow researchers to perform visualization and comparative analysis of up to eight different microbial genomes simultaneously.

16.

Background

First pass methods based on BLAST match are commonly used as an initial step to separate the different phylogenetic histories of genes in microbial genomes, and target putative horizontal gene transfer (HGT) events. This will continue to be necessary given the rapid growth of genomic data and the technical difficulties in conducting large-scale explicit phylogenetic analyses. However, these methods often produce misleading results due to their inability to resolve indirect phylogenetic links and their vulnerability to stochastic events.

Results

A new computational method of rapid, exhaustive and genome-wide detection of HGT was developed, featuring the systematic analysis of BLAST hit distribution patterns in the context of a priori defined hierarchical evolutionary categories. Genes that fall beyond a series of statistically determined thresholds are identified as not adhering to the typical vertical history of the organisms in question, but instead having a putative horizontal origin. Tests on simulated genomic data suggest that this approach effectively targets atypically distributed genes that are highly likely to be HGT-derived, and exhibits robust performance compared to conventional BLAST-based approaches. This method was further tested on real genomic datasets, including Rickettsia genomes, and was compared to previous studies. Results show consistency with currently employed categories of HGT prediction methods. In-depth analysis of both simulated and real genomic data suggests that the method is notably insensitive to stochastic events such as gene loss, rate variation and database error, which are common challenges to the current methodology. An automated pipeline was created to implement this approach and was made publicly available at: https://github.com/DittmarLab/HGTector. The program is versatile, easily deployed, and has a low requirement for computational resources.
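The general shape of a hit-distribution test can be sketched as follows. This is not the HGTector pipeline: the category labels ("self", "close", "distal"), the use of summed bit scores as weights, and the fixed cutoffs passed in by the caller are all assumptions made for illustration. In particular, the statistical determination of the thresholds described above is not shown.

```python
from collections import defaultdict


def group_weights(hits):
    """Sum bit scores per evolutionary category for one gene's BLAST hits."""
    weights = defaultdict(float)
    for category, bit_score in hits:
        weights[category] += bit_score
    return weights


def flag_candidates(hits_by_gene, close_cutoff, distal_cutoff):
    """Flag genes with unusually weak 'close' support and strong 'distal' support."""
    flagged = []
    for gene, hits in hits_by_gene.items():
        w = group_weights(hits)
        if w["close"] < close_cutoff and w["distal"] >= distal_cutoff:
            flagged.append(gene)
    return sorted(flagged)
```

For example, a gene whose hits fall almost entirely in distantly related taxa, with little or no support from close relatives, would be returned as a candidate for follow-up phylogenetic analysis.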

Conclusions

HGTector is an effective tool for initial or standalone large-scale discovery of candidate HGT-derived genes.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-717) contains supplementary material, which is available to authorized users.

17.
Recombination is an important evolutionary force in bacteria, but it remains challenging to reconstruct the imports that occurred in the ancestry of a genomic sample. Here we present ClonalFrameML, which uses maximum likelihood inference to simultaneously detect recombination in bacterial genomes and account for it in phylogenetic reconstruction. ClonalFrameML can analyse hundreds of genomes in a matter of hours, and we demonstrate its usefulness on simulated and real datasets. We find evidence for recombination hotspots associated with mobile elements in Clostridium difficile ST6 and a previously undescribed 310 kb chromosomal replacement in Staphylococcus aureus ST582. ClonalFrameML is freely available at http://clonalframeml.googlecode.com/.

18.
Cyanobacterial KnowledgeBase (CKB) is a free-access database that contains the genomic and proteomic information of 74 fully sequenced cyanobacterial genomes belonging to seven orders. The database also contains tools for sequence analysis. The species report and the gene report provide details about each species and gene (including sequence features and gene ontology annotations), respectively. The database also includes cyanoBLAST, an advanced tool that facilitates comparative analysis among cyanobacterial genomes and the genomes of E. coli (prokaryote) and Arabidopsis (eukaryote). The database is developed and maintained by the Sub-Distributed Informatics Centre (sponsored by the Department of Biotechnology, Govt. of India) of the National Facility for Marine Cyanobacteria, a facility dedicated to marine cyanobacterial research. CKB is freely available at http://nfmc.res.in/ckb/index.html.

19.
This article introduces the neuroimaging community to the dynamic visualization workbench, Weave (https://www.oicweave.org/), and a set of enhancements to allow the visualization of brain maps. The enhancements comprise a set of brain choropleths and the ability to display these as stacked slices, accessible with a slider. For the first time, this allows the neuroimaging community to take advantage of the advanced tools already available for exploring geographic data. Our brain choropleths are modeled after widely used geographic maps, but this mashup of brain choropleths with extant visualization software fills an important neuroinformatic niche. To date, most neuroinformatic tools have provided online databases and atlases of the brain, but not good ways to display the related data (e.g., behavioral, genetic, and medical data). The extension of the choropleth to brain maps allows us to leverage general-purpose visualization tools for concurrent exploration of brain images and related data. Related data can be represented as a variety of tables, charts and graphs that are dynamically linked to each other and to the brain choropleths. We demonstrate that the simplified region-based analyses that underlie choropleths can provide insights into neuroimaging data comparable to those achieved by using more conventional methods. In addition, the interactive interface facilitates additional insights by allowing the user to filter, compare, and drill down into the visual representations of the data. This enhanced data visualization capability is useful during the initial phases of data analysis, and the resulting visualizations provide a compelling way to publish data as an online supplement to journal articles.

20.
Isolating pure microbial cultures and cultivating them in the laboratory on defined media is used to more fully characterize the metabolism and physiology of organisms. However, identifying an appropriate growth medium for a novel isolate remains a challenging task. Even organisms with sequenced and annotated genomes can be difficult to grow, despite our ability to build genome-scale metabolic networks that connect genomic data with metabolic function. The scientific literature is scattered with information about defined growth media used successfully for cultivating a wide variety of organisms, but to date there exists no centralized repository to inform efforts to cultivate less characterized organisms by bridging the gap between genomic data and compound composition for growth media. Here we present MediaDB, a manually curated database of defined media that have been used for cultivating organisms with sequenced genomes, with an emphasis on organisms with metabolic network models. The database is accessible online, can be queried by keyword searches or downloaded in its entirety, and can generate exportable individual media formulation files. The data assembled in MediaDB facilitate comparative studies of organism growth media, serve as a starting point for formulating novel growth media, and contribute to formulating media for in silico investigation of metabolic networks. MediaDB is freely available for public use at https://mediadb.systemsbiology.net.
