Similar literature
Found 20 similar articles (search time: 31 ms)
1.
Comparative genomics is a powerful means to gain insight into the evolutionary processes that shape the genomes of related species. As the number of sequenced genomes increases, the development of software to perform accurate cross-species analyses becomes indispensable. However, many implementations that have the ability to compare multiple genomes exhibit unfavorable computational and memory requirements, limiting the number of genomes that can be analyzed in one run. Here, we present a software package to unveil genomic homology based on the identification of conservation of gene content and gene order (collinearity), i-ADHoRe 3.0, and its application to eukaryotic genomes. The use of efficient algorithms and support for parallel computing enable the analysis of large-scale data sets. Unlike other tools, i-ADHoRe can process the Ensembl data set, containing 49 species, in 1 h. Furthermore, the profile search is more sensitive in detecting degenerate genomic homology than chaining pairwise collinearity information based on transitive homology. From ultra-conserved collinear regions between mammals and birds, by integrating coexpression information and protein-protein interactions, we identified more than 400 regions in the human genome showing significant functional coherence. The different algorithmic improvements ensure that i-ADHoRe 3.0 will remain a powerful tool to study genome evolution.

2.
Published genomes frequently contain erroneous gene models that represent issues associated with identification of open reading frames, start sites, splice sites, and related structural features. The source of these inconsistencies is often traced back to integration across text file formats designed to describe long read alignments and predicted gene structures. In addition, the majority of gene prediction frameworks do not provide robust downstream filtering to remove problematic gene annotations, nor do they represent these annotations in a format consistent with current file standards. These frameworks also lack consideration for functional attributes, such as the presence or absence of protein domains that can be used for gene model validation. To provide oversight to the increasing number of published genome annotations, we present a software package, the Gene Filtering, Analysis, and Conversion (gFACs), to filter, analyze, and convert predicted gene models and alignments. The software operates across a wide range of alignment, analysis, and gene prediction files with a flexible framework for defining gene models with reliable structural and functional attributes. gFACs supports common downstream applications, including genome browsers, and generates extensive details on the filtering process, including distributions that can be visualized to further assess the proposed gene space. gFACs is freely available and implemented in Perl with support from BioPerl libraries at https://gitlab.com/PlantGenomicsLab/gFACs.

3.
4.
Identifying genomic homology within and between genomes is essential when studying genome evolution. In the past years, different computational techniques have been developed to detect homology even when the actual similarity between homologous segments is low. Depending on the strategy used, these methods search for pairs of chromosomal segments between which either both gene content and order are conserved or gene content only. However, because homologous segments can lose different sets of genes after their divergence, these methods still often fail to detect genomic homology. Recently, more advanced approaches have been developed that can combine gene order and content information of multiple genomic segments.
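The idea of conserved gene content and order between two segments can be illustrated with a minimal sketch. It assumes each genome is an ordered list of gene family identifiers that occur at most once per genome, and it only reports strictly adjacent, same-orientation runs. The function name `collinear_runs` and the `min_len` parameter are hypothetical; this is not the algorithm of any tool described here, which also tolerate gaps, gene loss and inversions.

```python
def collinear_runs(order_a, order_b, min_len=3):
    """Return runs of genes that are consecutive, in the same order, in both lists.

    Simplifying assumptions (illustration only): each gene ID is unique within
    a genome, no gaps are allowed, and only forward orientation is considered.
    """
    pos_b = {gene: j for j, gene in enumerate(order_b)}
    runs, current = [], []
    for gene in order_a:
        j = pos_b.get(gene)
        # Extend the run only if this gene directly follows the previous one in B.
        if j is not None and current and pos_b[current[-1]] + 1 == j:
            current.append(gene)
        else:
            if len(current) >= min_len:
                runs.append(current)
            current = [gene] if j is not None else []
    if len(current) >= min_len:
        runs.append(current)
    return runs


genome_a = ["g1", "g2", "g3", "x", "g4", "g5"]
genome_b = ["g4", "g5", "y", "g1", "g2", "g3"]
print(collinear_runs(genome_a, genome_b, min_len=2))
```

Because a single lost gene breaks a run in this strict version, it also shows why differential gene loss makes homologous segments hard to detect with pairwise comparisons alone.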

5.
6.
7.

Background  

The conservation of gene order among prokaryotic genomes can provide valuable insight into gene function, protein interactions, or events by which genomes have evolved. Although some tools are available for visualizing and comparing the order of genes between genomes of study, few support an efficient and organized analysis between large numbers of genomes. The Prokaryotic Sequence homology Analysis Tool (PSAT) is a web tool for comparing gene neighborhoods among multiple prokaryotic genomes.

8.
We present a software package, Genquire, that allows visualization, querying, hand editing, and de novo markup of complete or partially annotated genomes. The system is written in Perl/Tk and uses, where possible, existing BioPerl data models and methods for representation and manipulation of the sequence and annotation objects. An adaptor API is provided to allow Genquire to display a wide range of databases and flat files, and a plugins API provides an interface to other sequence analysis software. AVAILABILITY: Genquire v3.03 is open-source software. The code is available for download and/or contribution at http://www.bioinformatics.org/Genquire

9.
Hasan MS, Liu Q, Wang H, Fazekas J, Chen B, Che D. Bioinformation, 2012, 8(4):203-205
Genomic Islands (GIs) are genomic regions that originate from other organisms and are acquired through a process known as Horizontal Gene Transfer (HGT). Detection of GIs plays a significant role in biomedical research since such alien genomic regions usually contain important features, such as pathogenic genes. We have developed a user-friendly graphical user interface, Genomic Island Suite of Tools (GIST), which is a platform for scientific users to predict GIs. This software package includes five commonly used tools: AlienHunter, IslandPath, Colombo SIGI-HMM, INDeGenIUS and Pai-Ida. It also includes an optimization program, EGID, that combines the results of the existing tools for more accurate prediction. The tools in GIST can be used either separately or sequentially. GIST also includes a download feature that facilitates collecting the input genomes automatically from the FTP server of the National Center for Biotechnology Information (NCBI). GIST was implemented in Java, and was compiled and executed on Linux/Unix operating systems. AVAILABILITY: The database is available for free at http://www5.esu.edu/cpsc/bioinfo/software/GIST.

10.
MOTIVATION: The annotation of the Arabidopsis thaliana genome remains a problem in terms of time and quality. To improve the annotation process, we want to choose the most appropriate tools to use inside a computer-assisted annotation platform. We therefore need evaluation of prediction programs with Arabidopsis sequences containing multiple genes. RESULTS: We have developed AraSet, a data set of contigs of validated genes, enabling the evaluation of multi-gene models for the Arabidopsis genome. Besides conventional metrics to evaluate gene prediction at the site and the exon levels, new measures were introduced for the prediction at the protein sequence level as well as for the evaluation of gene models. This evaluation method is of general interest and could apply to any new gene prediction software and to any eukaryotic genome. The GeneMark.hmm program appears to be the most accurate software at all three levels for the Arabidopsis genomic sequences. Gene modeling could be further improved by combination of prediction software. AVAILABILITY: The AraSet sequence set, the Perl programs and complementary results and notes are available at http://sphinx.rug.ac.be:8080/biocomp/napav/. CONTACT: Pierre.Rouze@gengenp.rug.ac.be.

11.
BlastAlign uses NCBI blastn to build a multiple nucleotide alignment and is intended for use with sequences that have large indels or are otherwise difficult to align globally. The program builds a matrix representing regions of homology along the sequences, from which it selects the 'most representative' sequence and then extracts the blastn query-anchored multiple alignment for this sequence. The matrix is printed and allows subgroups to be identified visually and an option allows other sequences to be used as the 'most representative'. The program contains elements of both Perl and Python and will run on UNIX (including Mac OSX) and DOS. An additional Perl program BlastAlignP uses tblastn to align nucleotide sequences to a single amino acid sequence, thus allowing an open reading frame to be maintained in the resulting multiple alignment. AVAILABILITY: It is freely available at http://www.bio.ic.ac.uk/research/belshaw/BlastAlign.tar and at http://evolve.zoo.ox.ac.uk/software/blastalign.

12.
TOPD/FMTS: a new software to compare phylogenetic trees
SUMMARY: TOPD/FMTS has been developed to evaluate similarities and differences between phylogenetic trees. The software implements several new algorithms (including the Disagree method, which returns the taxa that disagree between two trees, and the Nodal method, which compares two trees using nodal information) and several previously described methods (such as the Partition method, Triplets or Quartets) to compare phylogenetic trees. One of the novelties of this software is that the FMTS (From Multiple to Single) program allows the comparison of trees that contain both orthologs and paralogs. Each option is also complemented with a randomization analysis to test the null hypothesis that the similarity between two trees is not better than chance expectation. AVAILABILITY: The Perl source code of TOPD/FMTS is available at http://genomes.urv.es/topd.
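As a rough illustration of partition-style tree comparison, the sketch below counts the clades that appear in one rooted tree but not in the other. It assumes trees are given as nested Python tuples with string leaf names, and the names `clades` and `partition_distance` are hypothetical. It is a simplified rooted-clade variant written for this example, not the TOPD/FMTS implementation.

```python
def clades(tree):
    """Return (all leaves, set of non-trivial clades) for a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):          # a leaf is a taxon name
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)                  # every internal node defines a clade
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)              # the root clade is shared by definition
    return all_leaves, found


def partition_distance(tree1, tree2):
    """Count clades present in exactly one of the two trees."""
    leaves1, clades1 = clades(tree1)
    leaves2, clades2 = clades(tree2)
    if leaves1 != leaves2:
        raise ValueError("trees must contain the same taxa")
    return len(clades1 ^ clades2)


t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
t3 = ((("A", "B"), "C"), "D")
print(partition_distance(t1, t2), partition_distance(t1, t3))
```

A distance of zero means the two trees imply the same groupings; larger values mean more groupings are unique to one tree.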

13.
We report the development of SearchDOGS Bacteria, software to automatically detect missing genes in annotated bacterial genomes by combining BLAST searches with comparative genomics. Having successfully applied the approach to yeast genomes, we redeveloped SearchDOGS to function as a standalone, downloadable package, requiring only a set of GenBank annotation files as input. The software automatically generates a homology structure using reciprocal BLAST and a synteny-based method; this is followed by a scan of the entire genome of each species for unannotated genes. Results are provided in an HTML interface, providing coordinates, BLAST results, syntenic location, omega values (Ka/Ks, where Ks is the number of synonymous substitutions per synonymous site and Ka is the number of nonsynonymous substitutions per nonsynonymous site) for protein conservation estimates, and other information for each candidate gene. Using SearchDOGS Bacteria, we identified 155 gene candidates in the Shigella boydii sb227 genome, including 56 candidates of length < 60 codons. SearchDOGS Bacteria has two major advantages over currently available annotation software. First, it outperforms current methods in terms of sensitivity and is highly effective at identifying small or highly diverged genes. Second, as a freely downloadable package, it can be used with unpublished or confidential data.
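The omega value defined above is a simple ratio, which the following sketch computes from already estimated Ka and Ks. The function names and the interpretation thresholds (omega below 1 suggesting purifying selection, above 1 suggesting positive selection) follow the conventional reading of Ka/Ks and are assumptions of this example; how SearchDOGS estimates Ka and Ks is not shown.

```python
def omega(ka: float, ks: float) -> float:
    """Return Ka/Ks given substitution rates per nonsynonymous/synonymous site."""
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    return ka / ks


def interpret(value: float) -> str:
    """Conventional reading of omega (an assumption of this sketch)."""
    if value < 1:
        return "purifying selection"
    if value > 1:
        return "positive selection"
    return "neutral"


w = omega(0.5, 2.0)  # hypothetical rates chosen for illustration
print(w, interpret(w))
```

A candidate open reading frame with omega well below 1 is conserved at the protein level, which supports it being a real gene.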

14.
The emergence of next-generation sequencing (NGS) technologies has significantly improved sequencing throughput and reduced costs. However, the short read length, duplicate reads and massive volume of data make the data processing much more difficult and complicated than with first-generation sequencing technology. Although there are some software packages developed to assess the data quality, those packages either are not easily available to users or require bioinformatics skills and computer resources. Moreover, almost all the quality assessment software currently available does not take sequencing errors into account when dealing with the duplicate assessment in NGS data. Here, we present a new user-friendly quality assessment software package called BIGpre, which works for both Illumina and 454 platforms. BIGpre contains all the functions of other quality assessment software, such as the correlation between forward and reverse reads, read GC-content distribution, and base Ns quality. More importantly, BIGpre incorporates associated programs to detect and remove duplicate reads after taking sequencing errors into account, and to trim low-quality reads from raw data as well. BIGpre is primarily written in Perl and integrates graphical capability from the statistics package R. This package produces both tabular and graphical summaries of data quality for sequencing datasets from Illumina and 454 platforms. Processing hundreds of millions of reads within minutes, this package provides immediate diagnostic information for users to manipulate sequencing data for downstream analyses. BIGpre is freely available at http://bigpre.sourceforge.net/.

15.
Recognition of gene starts is a difficult and yet unsolved problem. We present a program, Dragon Gene Start Finder (DGSF), which assesses the gene start in mammalian genomes and predicts a region which should overlap with the first exon of the gene or be in its proximity. The program has been rigorously tested on human chromosomes 4, 21 and 22, and in a strand specific search achieves an overall sensitivity of approximately 65% and a positive predictive value of approximately 78%. The sensitivity for the CpG-island related promoters is >88%. DGSF is free for academic and non-profit users at http://sdmc.lit.org.sg/promoter/dragonGSF1_0/genestart.htm; the download version of the program integrated within the TRANSPLORER package can be obtained from Biobase GmbH, at http://www.biobase.de/.
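Sensitivity and positive predictive value, the two metrics quoted above, are defined from counts of true positives (TP), false negatives (FN) and false positives (FP). The sketch below uses the standard definitions; the counts are invented purely so that the ratios land near the reported 65% and 78%, and are not data from the DGSF evaluation.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of real gene starts that were predicted: TP / (TP + FN)."""
    return tp / (tp + fn)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Fraction of predictions that are real gene starts: TP / (TP + FP)."""
    return tp / (tp + fp)


# Hypothetical counts, chosen only for illustration.
tp, fn, fp = 78, 42, 22
print(round(sensitivity(tp, fn), 2), round(positive_predictive_value(tp, fp), 2))
```

The two numbers answer different questions: sensitivity measures how many true gene starts are found, while positive predictive value measures how trustworthy an individual prediction is.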

16.
17.
SUMMARY: Currently, new bacterial genomes are being published on a monthly basis. With the growing amount of genome sequence data, there is a demand for a flexible and easy-to-maintain structure for storing sequence data and results from bioinformatic analysis. More than 150 sequenced bacterial genomes are now available, and comparisons of properties for taxonomically similar organisms are not readily available to many biologists. In addition to the most basic information, such as AT content, chromosome length, tRNA count and rRNA count, a large number of more complex calculations are needed to perform detailed comparative genomics. DNA structural calculations like curvature and stacking energy, and DNA compositions like base skews, oligo skews and repeats at the local and global level, are just a few of the analyses that are presented on the CBS Genome Atlas Web page. Complex analysis, changing methods and frequent addition of new models are factors that require a dynamic database layout. Using basic tools like the GNU Make system, csh, Perl and MySQL, we have created a flexible database environment for storing and maintaining such results for a collection of complete microbial genomes. Currently, these results amount to more than 220 pieces of information. The backbone of this solution consists of a program package written in Perl, which enables administrators to synchronize and update the database content. The MySQL database has been connected to the CBS web-server via PHP4, to present dynamic web content for users outside the center. This solution is tightly fitted to existing server infrastructure and the solutions proposed here can perhaps serve as a template for other research groups to solve database issues. AVAILABILITY: A web based user interface which is dynamically linked to the Genome Atlas Database can be accessed via www.cbs.dtu.dk/services/GenomeAtlas/.
SUPPLEMENTARY INFORMATION: This paper has a supplemental information page which links to the examples presented: www.cbs.dtu.dk/services/GenomeAtlas/suppl/bioinfdatabase.
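Two of the basic per-genome properties mentioned above, AT content and a base skew, can be computed directly from a sequence string. The sketch below is only an illustration: the function names are hypothetical, and the choice of GC skew, (G - C) / (G + C), as the example base skew is an assumption, since the abstract does not say which skews the Genome Atlas stores.

```python
def at_content(seq: str) -> float:
    """Fraction of A and T among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("A") + seq.count("T")) / total


def gc_skew(seq: str) -> float:
    """GC skew (G - C) / (G + C); returns 0.0 when there is no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


example = "ATGCGGGA"  # toy sequence, not real genome data
print(at_content(example), gc_skew(example))
```

Values like these are the kind of scalar results that fit naturally into one row per genome in a relational table, with more columns added as new analyses are introduced.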

18.
DiffTool is a resource to build and visualize protein clusters computed from a sequence database. The package provides a clustering tool to construct protein families according to sequence similarities and a web interface to query the corresponding clusters. A subtractive genome analysis tool selects protein families specific for a genome or a group of genomes. For each protein cluster, DiffTool includes access to sequences, coloured multiple alignments and phylogenetic trees. AVAILABILITY: A cluster database built from yeast and complete prokaryotic genomes is queryable at http://bioweb.pasteur.fr/seqanal/difftool. All the Perl sources are freely available to non-profit organizations upon request.

19.

Background

The design of oligonucleotides and PCR primers for studying large genomes is complicated by the redundancy of sequences. The eukaryotic genomes are particularly difficult to study due to abundant repeats. The speed of most existing primer evaluation programs is not sufficient for large-scale experiments.

Results

In order to improve the efficiency and success rate of automatic primer/oligo design, we created a novel method which allows rapid masking of repeats in large sequence files, for example in eukaryotic genomes. It also allows the detection of all alternative binding sites of PCR primers and the prediction of PCR products. The new method was implemented in a collection of efficient programs, the GENOMEMASKER package. The performance of the programs was compared to other similar programs. We also modified the PRIMER3 program, to be able to design primers from lowercase-masked sequences.

Conclusion

The GENOMEMASKER package is able to mask the entire human genome for non-unique primers within 6 hours and find locations of all binding sites for 10 000 designed primer pairs within 10 minutes. Additionally, it predicts all alternative PCR products from large genomes for given primer pairs.
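The notion of lowercase masking for non-unique primer sites can be sketched in a few lines. The example below marks every position covered by a k-mer that occurs more than once in the sequence. It is a toy under stated assumptions: the function name `mask_repeated_kmers` is hypothetical, the reverse complement strand is ignored, and everything is held in memory, so it says nothing about how GENOMEMASKER itself achieves its speed.

```python
from collections import Counter


def mask_repeated_kmers(seq: str, k: int) -> str:
    """Lowercase all positions covered by a k-mer that occurs more than once."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    n_windows = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n_windows))
    masked = [False] * len(seq)
    for i in range(n_windows):
        if counts[seq[i:i + k]] > 1:       # non-unique k-mer: mask its span
            for j in range(i, i + k):
                masked[j] = True
    return "".join(base.lower() if m else base for base, m in zip(seq, masked))


print(mask_repeated_kmers("ACGTTTACGA", 3))
```

A primer design program that accepts lowercase-masked input can then avoid placing primers over the lowercase regions.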

20.
SUMMARY: Repeated elements such as satellites and transposons are ubiquitous in eukaryotic genomes. De novo computational identification and classification of such elements is a challenging problem. Therefore, repeat annotation of sequenced genomes has historically largely relied on sequence similarity to hand-curated libraries of known repeat families. We present a new approach to de novo repeat annotation that exploits characteristic patterns of local alignments induced by certain classes of repeats. We describe PILER, a package of efficient search algorithms for identifying such patterns. Novel repeats found using PILER are reported for Homo sapiens, Arabidopsis thaliana and Drosophila melanogaster. AVAILABILITY: The PILER software is freely available at http://www.drive5.com/piler.
