Similar literature
20 similar documents found (search time: 31 ms)
1.
We describe here our experience in annotating the Drosophila melanogaster genome sequence, in the course of which we developed several new open-source software tools and a database schema to support large-scale genome annotation. We have developed these into an integrated and reusable software system for whole-genome annotation. The key contributions to overall annotation quality are the marshalling of high-quality sequences for alignments and the design of a system with a flexible, adaptable and expandable architecture.

2.
We present the first collection of tools aimed at automated genome assembly validation. This work formalizes several mechanisms for detecting mis-assemblies, and describes their implementation in our automated validation pipeline, called amosvalidate. We demonstrate the application of our pipeline in both bacterial and eukaryotic genome assemblies, and highlight several assembly errors in both draft and finished genomes. The software described is compatible with common assembly formats and is released, open-source, at .

3.
MOTIVATION: The increased availability of genome sequences of closely related organisms has generated much interest in utilizing homology to improve the accuracy of gene prediction programs. Generalized pair hidden Markov models (GPHMMs) have been proposed as one means to address this need. However, all GPHMM implementations currently available are either closed-source or the details of their operation are not fully described in the literature, leaving a significant hurdle for others wishing to advance the state of the art in GPHMM design. RESULTS: We have developed an open-source GPHMM gene finder, TWAIN, which performs very well on two related Aspergillus species, A. fumigatus and A. nidulans, finding 89% of the exons and predicting 74% of the gene models exactly correctly in a test set of 147 conserved gene pairs. We describe the implementation of this GPHMM and we explicitly address the assumptions and limitations of the system. We suggest possible ways of relaxing those assumptions to improve the utility of the system without sacrificing efficiency beyond what is practical. AVAILABILITY: Available at http://www.tigr.org/software/pirate/twain/twain.html under the open-source Artistic License.

4.
Restauro-G: A Rapid Genome Re-Annotation System for Comparative Genomics
Annotations of complete genome sequences submitted directly from sequencing projects are diverse in terms of annotation strategies and update frequencies. These inconsistencies make comparative studies difficult. To allow rapid data preparation of a large number of complete genomes, automation and speed are important for genome re-annotation. Here we introduce an open-source rapid genome re-annotation software system, Restauro-G, specialized for bacterial genomes. Restauro-G re-annotates a genome by similarity searches utilizing the BLAST-Like Alignment Tool, referring to protein databases such as UniProt KB, NCBI nr, NCBI COGs, Pfam, and PSORTb. Re-annotation by Restauro-G achieved over 98% accuracy for most bacterial chromosomes in comparison with the original manually curated annotation of EMBL releases. Restauro-G was developed in the generic bioinformatics workbench G-language Genome Analysis Environment and is distributed at http://restauro-g.iab.keio.ac.jp/ under the GNU General Public License.

5.
Rainbow is a program that provides a graphical user interface to construct supertrees using different methods. It also provides tools to analyze the quality of the supertrees produced. Rainbow is available for Mac OS X, Windows and Linux. AVAILABILITY: Rainbow is free, open-source software. Its binary files, source code, and manual can be downloaded from the Rainbow web page: http://genome.cs.iastate.edu/Rainbow/

6.
Controlled simulations of genome evolution are useful for benchmarking tools. However, many simulators lack extensibility and cannot measure parameters directly from data. These issues are addressed by three new open-source programs: GSIMULATOR (for neutrally evolving DNA), SIMGRAM (for generic structured features) and SIMGENOME (for syntenic genome blocks). Each offers algorithms for parameter measurement and reconstruction of ancestral sequence. All three tools out-perform the leading neutral DNA simulator (DAWG) in benchmarks. The programs are available at .

7.
BioNetBuilder: automatic integration of biological networks
BioNetBuilder is an open-source client-server Cytoscape plugin that offers a user-friendly interface to create biological networks integrated from several databases. Users can create networks for approximately 1500 organisms, including common model organisms and human. Currently supported databases include: DIP, BIND, Prolinks, KEGG, HPRD, The BioGrid and GO, among others. The BioNetBuilder plugin client is available as a Java Webstart, providing a platform-independent network interface to these public databases. Availability: http://err.bio.nyu.edu/cytoscape/bionetbuilder/

8.
9.
Third-generation sequencing technologies can generate very long reads with relatively high error rates. The lengths of the reads, which sometimes exceed one million bases, make them invaluable for resolving complex repeats that cannot be assembled using shorter reads. Many high-quality genome assemblies have already been produced, curated, and annotated using the previous generation of sequencing data, and full re-assembly of these genomes with long reads is not always practical or cost-effective. One strategy to upgrade existing assemblies is to generate additional coverage using long-read data, and add that to the previously assembled contigs. SAMBA is a tool that is designed to scaffold and gap-fill existing genome assemblies with additional long-read data, resulting in substantially greater contiguity. SAMBA is the only tool of its kind that also computes and fills in the sequence for all spanned gaps in the scaffolds, yielding much longer contigs. Here we compare SAMBA to several similar tools capable of re-scaffolding assemblies using long-read data, and we show that SAMBA yields better contiguity and introduces fewer errors than competing methods. SAMBA is open-source software that is distributed at https://github.com/alekseyzimin/masurca.

10.
This article describes specific procedures for conducting quality assessment of Affymetrix GeneChip® soybean genome data and for performing analyses to determine differential gene expression using the open-source R programming environment in conjunction with the open-source Bioconductor software. We describe procedures for extracting those Affymetrix probe set IDs related specifically to the soybean genome on the Affymetrix soybean chip and demonstrate the use of exploratory plots including images of raw probe-level data, boxplots, density plots and M versus A plots. RNA degradation and recommended procedures from Affymetrix for quality control are discussed. An appropriate probe-level model provides an excellent quality assessment tool. To demonstrate this, we discuss and display chip pseudo-images of weights, residuals and signed residuals and additional probe-level modeling plots that may be used to identify aberrant chips. The Robust Multichip Averaging (RMA) procedure was used for background correction, normalization and summarization of the AffyBatch probe-level data to obtain expression level data and to discover differentially expressed genes. Examples of boxplots and MA plots are presented for the expression level data. Volcano plots and heatmaps are used to demonstrate the use of (log) fold changes in conjunction with ordinary and moderated t-statistics for determining interesting genes. We show, with real data, how implementation of functions in R and Bioconductor successfully identified differentially expressed genes that may play a role in soybean resistance to a fungal pathogen, Phakopsora pachyrhizi. Complete source code for performing all quality assessment and statistical procedures may be downloaded from our web source: http://css.ncifcrf.gov/services/download/MicroarraySoybean.zip.
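As a rough illustration of the M versus A quantities mentioned in the abstract above, the sketch below computes them for a single probe set measured on two chips. It uses the conventional definitions (M is the log2 fold change, A is the mean log2 intensity); the function name `ma_values` and the example intensities are hypothetical, and this is a Python illustration rather than the R/Bioconductor code the authors distribute.

```python
import math


def ma_values(intensity_a, intensity_b):
    """Return (M, A) for one probe set measured on two chips.

    M is the log2 fold change between the two intensities and
    A is their average log2 intensity (conventional MA-plot axes).
    Both intensities must be positive.
    """
    log_a = math.log2(intensity_a)
    log_b = math.log2(intensity_b)
    m = log_a - log_b          # log2 fold change (vertical axis)
    a = (log_a + log_b) / 2    # mean log2 intensity (horizontal axis)
    return m, a


if __name__ == "__main__":
    # Hypothetical intensities: a four-fold difference gives M = 2.
    print(ma_values(8.0, 2.0))
```

Plotting M against A for every probe set gives the MA plot; points far from M = 0 are candidates for differential expression, subject to the statistical tests the abstract describes.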

11.
Versatile and open software for comparing large genomes
The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes. Two new graphical viewing tools provide alternative ways to analyze genome alignments. The new system is the first version of MUMmer to be released as open-source software. This allows other developers to contribute to the code base and freely redistribute the code. The MUMmer sources are available at .

12.
MOTIVATION: As more whole genome sequences become available, comparing multiple genomes at the sequence level can provide insight into new biological discovery. However, there are significant challenges for genome comparison. One challenge is the requirement for computational resources owing to the large volume of genome data. More importantly, since the choice of genomes to be compared is entirely subjective, there are too many choices for genome comparison. For these reasons, there is a pressing need for bioinformatics systems for comparing multiple genomes where users can freely choose the genomes to be compared. RESULTS: PLATCOM (Platform for Computational Comparative Genomics) is an integrated system for the comparative analysis of multiple genomes. The system is built on several public databases, and a suite of genome analysis applications is provided as exemplary genome data mining tools over these internal databases. Researchers are able to visually investigate genomic sequence similarities, conserved gene neighborhoods, conserved metabolic pathways and putative gene fusion events among a set of selected multiple genomes. AVAILABILITY: http://platcom.informatics.indiana.edu/platcom

13.
14.
One of the most complex and computationally intensive tasks of genome sequence analysis is genome assembly. Even today, few centres have the resources, in both software and hardware, to assemble a genome from the thousands or millions of individual sequences generated in a whole-genome shotgun sequencing project. With the rapid growth in the number of sequenced genomes has come an increase in the number of organisms for which two or more closely related species have been sequenced. This has created the possibility of building a comparative genome assembly algorithm, which can assemble a newly sequenced genome by mapping it onto a reference genome. We describe here a novel algorithm for comparative genome assembly that can accurately assemble a typical bacterial genome in less than four minutes on a standard desktop computer. The software is available as part of the open-source AMOS project.

15.
REGANOR     
With >1,000 prokaryotic genome sequencing projects ongoing or already finished, comprehensive comparative analysis of the gene content of these genomes has become viable. To allow for a meaningful comparative analysis, gene prediction of the various genomes should be as accurate as possible. It is clear that improving the state of genome annotation requires automated gene identification methods to cope with the influence of artifacts, such as genomic GC content. There is currently still room for improvement in the state of annotations. We present a web server and a database of high-quality gene predictions. The web server is a resource for gene identification in prokaryote genome sequences. It implements our previously described, accurate gene finding method REGANOR. We also provide novel gene predictions for 241 complete, or almost complete, prokaryotic genomes. We demonstrate how this resource can easily be utilised to identify promising candidates for currently missing genes from genome annotations with several examples. All data sets are available online. AVAILABILITY: The gene finding server is accessible via https://www.cebitec.uni-bielefeld.de/groups/brf/software/reganor/cgi-bin/reganor_upload.cgi. The server software is available with the GenDB genome annotation system (version 2.2.1 onwards) under the GNU general public license. The software can be downloaded from https://sourceforge.net/projects/gendb/. More information on installing GenDB and REGANOR and the system requirements can be found on the GenDB project page http://www.cebitec.uni-bielefeld.de/groups/brf/software/wiki/GenDBWiki/AdministratorDocumentation/GenDBInstallation

16.
Variable (V) domains of immunoglobulins (Ig) and T cell receptors (TCR) are generated from genomic V gene segments (V-genes). At present, such V-genes have been annotated only within the genome of a few species. We have developed a bioinformatics tool that accelerates the task of identifying functional V-genes from genome datasets. Automated recognition is accomplished by recognizing key V-gene signatures, such as recombination signal sequences, size of the exon region, and position of amino acid motifs within the translated exon. This algorithm also classifies extracted V-genes into either TCR or Ig loci. We describe the implementation of the algorithm and validate its accuracy by comparing V-genes identified from the human and mouse genomes with known V-gene annotations documented and available in public repositories. The advantages and utility of the algorithm are illustrated by using it to identify functional V-genes in the rat genome, where V-gene annotation is still incomplete. This allowed us to perform a comparative human–rodent phylogenetic analysis based on V-genes that supports the hypothesis that distinct evolutionary pressures shape the TCR and Ig V-gene repertoires. Our program, together with a graphical user interface, is available as open-source software, downloadable at http://code.google.com/p/vgenextract/.

17.
MOTIVATION: Advances in microscopy technology have led to the creation of high-throughput microscopes that are capable of generating several hundred gigabytes of images in a few days. Analyzing such a wealth of data manually is nearly impossible and requires an automated approach. There are at present a number of open-source and commercial software packages that allow the user to apply algorithms of different degrees of sophistication to the images and extract desired metrics. However, the types of metrics that can be extracted are severely limited by the specific image processing algorithms that the application implements, and by the expertise of the user. In most commercial software, code unavailability prevents implementation by the end user of newly developed algorithms better suited for a particular type of imaging assay. While it is possible to implement new algorithms in open-source software, rewiring an image processing application requires a high degree of expertise. To obviate these limitations, we have developed an open-source high-throughput application that allows implementation of different biological assays such as cell tracking or ancestry recording, through the use of small, relatively simple image processing modules connected into sophisticated imaging pipelines. By connecting modules, non-expert users can apply the particular combination of well-established and novel algorithms developed by us and others that are best suited for each individual assay type. In addition, our data exploration and visualization modules make it easy to discover or select specific cell phenotypes from a heterogeneous population. AVAILABILITY: CellAnimation is distributed under the Creative Commons Attribution-NonCommercial 3.0 Unported license (http://creativecommons.org/licenses/by-nc/3.0/). CellAnimation source code and documentation may be downloaded from www.vanderbilt.edu/viibre/software/documents/CellAnimation.zip. Sample data are available at www.vanderbilt.edu/viibre/software/documents/movies.zip. CONTACT: walter.georgescu@vanderbilt.edu SUPPLEMENTARY INFORMATION: Supplementary data available at Bioinformatics online.

18.
The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a graphical interface cloud computing solution that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and 5–10 hours to process a full exome sequence and $30 and 3–8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface in Amazon EC2.

19.
The introduction of next-generation sequencing methods in genome studies has made it possible to shift research from a gene-centric approach to a genome-wide view. Although methods and tools to detect single nucleotide polymorphisms are becoming more mature, methods to identify and visualize structural variation (SV) are still in their infancy. Most genome browsers can only compare a given sequence to a reference genome; therefore, direct comparison of multiple individuals remains a challenge. Efficient approaches to explore and visualize SVs and directly compare two or more individuals are thus desirable. In this article, we present a visualization approach that uses space-filling Hilbert curves to explore SVs based on both read-depth and paired-end information. An interactive open-source Java application, called Meander, implements the proposed methodology, and its functionality is demonstrated using two cases. With Meander, users can explore variations at different levels of resolution and simultaneously compare up to four different individuals against a common reference. The application was developed using Java version 1.6 and Processing.org and can be run on any platform. It can be found at http://homes.esat.kuleuven.be/~bioiuser/meander.
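To make the space-filling idea in the abstract above concrete, the following is a minimal sketch of the standard index-to-coordinate mapping for a Hilbert curve, which places consecutive positions along a one-dimensional sequence (for example, genomic bins) into neighbouring cells of a square grid. It is not taken from Meander; the function names `hilbert_d2xy` and `layout_on_grid` and the per-bin read-depth values are hypothetical and serve only as an illustration.

```python
def hilbert_d2xy(side, d):
    """Map index d along a Hilbert curve to (x, y) on a side x side grid.

    side must be a power of two and 0 <= d < side * side.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def layout_on_grid(values, side):
    """Place a 1D list of per-bin values on a 2D grid along the curve."""
    grid = [[None] * side for _ in range(side)]
    for d, value in enumerate(values[: side * side]):
        x, y = hilbert_d2xy(side, d)
        grid[y][x] = value
    return grid


if __name__ == "__main__":
    # Hypothetical read depths for 16 consecutive genomic bins.
    depths = [30, 31, 29, 30, 12, 11, 10, 30, 31, 60, 62, 61, 30, 29, 30, 31]
    for row in layout_on_grid(depths, 4):
        print(row)
```

Because bins that are adjacent along the sequence stay adjacent on the grid, a run of unusually low or high depth shows up as a compact patch rather than a thin stripe, which is the property that makes this layout attractive for browsing a long sequence at several resolutions.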

20.
MOTIVATION: A few years ago, FlyBase undertook to design a new database schema to store Drosophila data. It would fully integrate genomic sequence and annotation data with bibliographic, genetic, phenotypic and molecular data from the literature representing a distillation of the first 100 years of research on this major animal model system. In developing this new integrated schema, FlyBase also made a commitment to ensure that its design was generic, extensible and available as open source, so that it could be employed as the core schema of any model organism data repository, thereby avoiding redundant software development and potentially increasing interoperability. Our question was whether we could create a relational database schema that would be successfully reused. RESULTS: Chado is a relational database schema now being used to manage biological knowledge for a wide variety of organisms, from human to pathogens, especially the classes of information that directly or indirectly can be associated with genome sequences or the primary RNA and protein products encoded by a genome. Biological databases that conform to this schema can interoperate with one another, and with application software from the Generic Model Organism Database (GMOD) toolkit. Chado is distinctive because its design is driven by ontologies. The use of ontologies (or controlled vocabularies) is ubiquitous across the schema, as they are used as a means of typing entities. The Chado schema is partitioned into integrated subschemas (modules), each encapsulating a different biological domain, and each described using representations in appropriate ontologies. To illustrate this methodology, we describe here the Chado modules used for describing genomic sequences. AVAILABILITY: GMOD is a collaboration of several model organism database groups, including FlyBase, to develop a set of open-source software for managing model organism data. The Chado schema is freely distributed under the terms of the Artistic License (http://www.opensource.org/licenses/artistic-license.php) from GMOD (www.gmod.org).
