Similar articles
20 similar articles found
1.
The SYSTERS (short for SYSTEmatic Re-Searching) protein sequence cluster set classifies all sequences from SWISS-PROT and PIR into disjoint protein family clusters and, hierarchically, into superfamily and subfamily clusters. The cluster set can be searched with a sequence using the SSMAL search tool or a traditional database search tool such as BLAST or FASTA. Additionally, a multiple alignment is generated for each cluster and annotated with domain information from the Pfam database of protein domain families. A taxonomic overview of the organisms covered by a cluster is given based on the NCBI taxonomy. The cluster set is available for querying and browsing at http://www.dkfz-heidelberg.de/tbi/services/cluster/systersform

2.
Reference phylogenies are crucial for providing a taxonomic framework for interpretation of marker gene and metagenomic surveys, which continue to reveal novel species at a remarkable rate. Greengenes is a dedicated full-length 16S rRNA gene database that provides users with a curated taxonomy based on de novo tree inference. We developed a ‘taxonomy to tree’ approach for transferring group names from an existing taxonomy to a tree topology, and used it to apply the Greengenes, National Center for Biotechnology Information (NCBI) and cyanoDB (Cyanobacteria only) taxonomies to a de novo tree comprising 408 315 sequences. We also incorporated explicit rank information provided by the NCBI taxonomy to group names (by prefixing rank designations) for better user orientation and classification consistency. The resulting merged taxonomy improved the classification of 75% of the sequences by one or more ranks relative to the original NCBI taxonomy with the most pronounced improvements occurring in under-classified environmental sequences. We also assessed candidate phyla (divisions) currently defined by NCBI and present recommendations for consolidation of 34 redundantly named groups. All intermediate results from the pipeline, which includes tree inference, jackknifing and transfer of a donor taxonomy to a recipient tree (tax2tree) are available for download. The improved Greengenes taxonomy should provide important infrastructure for a wide range of megasequencing projects studying ecosystems on scales ranging from our own bodies (the Human Microbiome Project) to the entire planet (the Earth Microbiome Project). The implementation of the software can be obtained from http://sourceforge.net/projects/tax2tree/.

3.
The flanking sequences provided by dbSNP of NCBI are usually short and of fixed length, with no option for further extension, which makes the design of appropriate PCR primers difficult. Here, we introduce a tool named “SNP-Flankplus” that provides a web environment for retrieving SNP flanking sequences from both the dbSNP and the nucleotide databases of NCBI. Two SNP ID types, rs# and ss#, are accepted for querying SNP flanking sequences with adjustable lengths for at least sixteen organisms.

4.
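Determining which species an organism belongs to, together with its detailed classification, from a scientific name, a taxonomy ID, or an arbitrary nucleic acid or protein sequence is one of the most basic and important steps in bioinformatics analysis. However, both the analysis and the retrieval of results are currently done by hand, which is time-consuming, laborious and error-prone. This study aims to retrieve species information from the NCBI website automatically or in batches. By analyzing the online NCBI BLAST results and the characteristics of the page source, we wrote automated scripts in Perl that retrieve, in batches, the taxonomic information for query or alignment results. The Perl scripts written in this study solve the problem of automatically or batch-retrieving species classification information after sequences are aligned online at NCBI. They are applicable to scientific names, taxonomy IDs, and arbitrary nucleic acid or protein sequences of bacteria, fungi, animals, plants and other organisms, and can serve as a reference for other researchers analyzing biological data.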

5.
The NCBI Gene Expression Omnibus (GEO) represents the largest public repository of microarray data. However, finding data in GEO can be challenging. We have developed GEOmetadb in an attempt to make querying the GEO metadata both easier and more powerful. All GEO metadata records as well as the relationships between them are parsed and stored in a local MySQL database. A powerful, flexible web search interface with several convenient utilities provides query capabilities not available via NCBI tools. In addition, a Bioconductor package, GEOmetadb, that utilizes a SQLite export of the entire GEOmetadb database is also available, rendering the entire GEO database accessible with the full power of SQL-based queries from within R. AVAILABILITY: The web interface and SQLite databases are available at http://gbnci.abcc.ncifcrf.gov/geo/. The Bioconductor package is available via the Bioconductor project. The corresponding MATLAB implementation is also available at the same website.
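
To illustrate the kind of SQL-based metadata query that a SQLite export makes possible, here is a minimal sketch using Python's standard sqlite3 module rather than the R package named in the abstract. The table names (series, sample, series_sample), the column names and all rows are hypothetical toy data; they are assumptions for illustration and are not taken from the GEOmetadb schema.

```python
import sqlite3
from typing import List, Tuple


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical series/sample layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE series (series_id TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE sample (sample_id TEXT PRIMARY KEY, organism TEXT);
        CREATE TABLE series_sample (series_id TEXT, sample_id TEXT);
        """
    )
    # Invented rows, used only so the query below has something to return.
    conn.executemany(
        "INSERT INTO series VALUES (?, ?)",
        [("S1", "Toy series one"), ("S2", "Toy series two")],
    )
    conn.executemany(
        "INSERT INTO sample VALUES (?, ?)",
        [("M1", "Homo sapiens"), ("M2", "Homo sapiens"), ("M3", "Mus musculus")],
    )
    conn.executemany(
        "INSERT INTO series_sample VALUES (?, ?)",
        [("S1", "M1"), ("S1", "M2"), ("S2", "M3")],
    )
    return conn


def series_for_organism(conn: sqlite3.Connection, organism: str) -> List[Tuple[str, int]]:
    """Return (series_id, sample count) for every series with samples of one organism."""
    query = """
        SELECT ss.series_id, COUNT(*) AS n_samples
        FROM series_sample AS ss
        JOIN sample AS s ON s.sample_id = ss.sample_id
        WHERE s.organism = ?
        GROUP BY ss.series_id
        ORDER BY ss.series_id
    """
    return conn.execute(query, (organism,)).fetchall()
```

The join across a linking table is the point of the sketch: relationships between record types are stored explicitly, so one query can answer a question that would otherwise need several separate lookups.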

6.

Background  

The NCBI taxonomy provides one of the most powerful ways to navigate sequence databases, but currently users are forced to formulate queries according to a single taxonomic classification. Given that there is no universal agreement on the classification of organisms, providing a single classification places constraints on the questions biologists can ask. However, maintaining multiple classifications is burdensome in the face of a constantly growing NCBI classification.

7.
NEWT is a new taxonomy portal to the SWISS-PROT protein sequence knowledgebase. It contains taxonomy data, which is updated daily, for the complete set of species represented in SWISS-PROT, as well as those stored at the NCBI. Users can navigate through the taxonomy tree and access corresponding SWISS-PROT protein entries. In addition, a manually curated selection of external links allows access to specific information on selected species. NEWT is available at http://www.ebi.ac.uk/newt/.

8.
GenBank   Cited by: 51 (self-citations: 4; citations by others: 47)
The GenBank® sequence database incorporates publicly available DNA sequences of >55 000 different organisms, primarily through direct submission of sequence data from individual laboratories and large-scale sequencing projects. Most submissions are made using the BankIt (Web) or Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Data exchange with the EMBL Data Library and the DNA Data Bank of Japan helps ensure comprehensive worldwide coverage. GenBank data is accessible through NCBI's integrated retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping and protein structure information, plus the biomedical literature via PubMed. Sequence similarity searching is provided by the BLAST family of programs. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. NCBI also offers a wide range of WWW retrieval and analysis services based on GenBank data. The GenBank database and related resources are freely accessible via the NCBI home page at http://www.ncbi.nlm.nih.gov

9.
Retrieving homologous DNA and protein sequences from existing databases is a fundamental routine in bioinformatics research. Programs of the NCBI BLAST family are widely used for this purpose. We evaluated paraBLAST, a parallelised version of the NCBI BLAST algorithm, using the Message Passing Interface (MPI) on a multi-node compute cluster. Here, we propose static and dynamic database-partitioning schemes based on the availability of the cluster. We evaluated the application of the algorithm in querying nucleotide sequences against a large-scale sequence database with different numbers of database partitions and, hence, different numbers of CPUs. Since the program's tasks are performed independently of each other, each available CPU can run its own copy of BLAST queries, resulting in reduced interference between processes and leading to a highly scalable solution.
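
The abstract does not say how the database is split, so the following is only a sketch of one plausible static scheme: sequences are assigned, longest first, to whichever partition currently holds the fewest residues, so that each CPU receives a similar amount of work. The function name partition_database and the greedy balancing rule are assumptions made for illustration, not a description of paraBLAST.

```python
import heapq
from typing import Dict, List, Tuple


def partition_database(seq_lengths: Dict[str, int], n_parts: int) -> List[List[str]]:
    """Greedy longest-first split of sequence IDs into n_parts of similar total length."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    # Heap entries are (residues assigned so far, partition index).
    heap: List[Tuple[int, int]] = [(0, i) for i in range(n_parts)]
    heapq.heapify(heap)
    parts: List[List[str]] = [[] for _ in range(n_parts)]
    # Place the longest sequences first; ties are broken by ID for determinism.
    ordered = sorted(seq_lengths.items(), key=lambda kv: (-kv[1], kv[0]))
    for seq_id, length in ordered:
        total, idx = heapq.heappop(heap)  # currently lightest partition
        parts[idx].append(seq_id)
        heapq.heappush(heap, (total + length, idx))
    return parts
```

A dynamic scheme would instead hand out smaller chunks from a shared queue as workers become free, which trades some coordination overhead for better tolerance of uneven node speeds.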

10.
GenBank
The GenBank sequence database incorporates publicly available DNA sequences of more than 105 000 different organisms, primarily through direct submission of sequence data from individual laboratories and large-scale sequencing projects. Most submissions are made using the BankIt (web) or Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Data exchange with the EMBL Data Library and the DNA Data Bank of Japan helps ensure comprehensive worldwide coverage. GenBank data is accessible through NCBI’s integrated retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical literature via PubMed. Sequence similarity searching is provided by the BLAST family of programs. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. NCBI also offers a wide range of World Wide Web retrieval and analysis services based on GenBank data. The GenBank database and related resources are freely accessible via the NCBI home page at http://www.ncbi.nlm.nih.gov.

11.
PGTdb: a database providing growth temperatures of prokaryotes   Cited by: 6 (self-citations: 0; citations by others: 6)
The Prokaryotic Growth Temperature database (PGTdb) contains a total of 1334 temperature data points from 1072 prokaryotic organisms (Bacteria and Archaea). PGTdb integrates microbial growth temperature data from a literature survey with the corresponding nucleotide/protein sequence and protein structure data from related databases. A direct correlation is observed between the average growth temperature of an organism and the melting temperature of proteins from the organism. Therefore, this database is useful not only for microbiologists to obtain cultivation conditions, but also for biochemists and structural biologists to study the correlation between protein sequences/structures and their thermostability. In addition, the taxonomy and ribosomal RNA sequence(s) of an organism are linked through NCBI Taxonomy and the Ribosomal RNA Operon Copy Number Database rrndb, respectively. PGTdb is the only integrated database on the Internet to provide the growth temperature data of the prokaryotes and the combined information of their nucleotide/protein sequences, protein structures, taxonomy and phylogeny. AVAILABILITY: http://pgtdb.csie.ncu.edu.tw

12.
MOTIVATION: The recent rapid rise in the availability of whole genome DNA sequence data has led to bottlenecks in their complete analysis. Specifically, there is a need for software tools that will allow mining of gene and putative gene data at a whole genome level. These new tools will complement the current set already in use for studying specific aspects of individual genes and putative genes in detail. A key software challenge is to make them user-friendly, without losing their flexibility and capability for use in research. RESULTS: GeneOrder, a web-based, interactive computational tool, allows researchers to compare the order of genes in two genomes. It has been tested on full genome sequence data for viruses, mitochondria and chloroplasts that were obtained from the NCBI GenBank database. It is accessible at http://www.bif.atcc.org/GENEOrder/index.html. GeneOrder prepares the comparison in table form, listing the order of similar genes. Hyperlinks are provided from this output; these lead to the 'Protein Coding Regions' in the NCBI database.

13.
14.
The measurement of biodiversity is an integral aspect of life science research. With the establishment of second- and third-generation sequencing technologies, an increasing amount of metabarcoding data is being generated as we seek to describe the extent and patterns of biodiversity in multiple contexts. The reliability and accuracy of taxonomically assigning metabarcoding sequencing data have been shown to be critically influenced by the quality and completeness of reference databases. Custom, curated, eukaryotic reference databases, however, are scarce, as are the software programs for generating them. Here, we present crabs (Creating Reference databases for Amplicon-Based Sequencing), a software package to create custom reference databases for metabarcoding studies. crabs includes tools to download sequences from multiple online repositories (i.e., NCBI, BOLD, EMBL, MitoFish), retrieve amplicon regions through in silico PCR analysis and pairwise global alignments, curate the database through multiple filtering parameters (e.g., dereplication, sequence length, sequence quality, unresolved taxonomy, inclusion/exclusion filter), export the reference database in multiple formats for immediate use in taxonomy assignment software, and investigate the reference database through implemented visualizations for diversity, primer efficiency, reference sequence length, database completeness and taxonomic resolution. crabs is a versatile tool for generating curated reference databases of user-specified genetic markers to aid taxonomy assignment from metabarcoding sequencing data. crabs can be installed via docker and is available for download as a conda package and via GitHub (https://github.com/gjeunen/reference_database_creator).
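
Two of the steps named above, in silico PCR and dereplication, can be illustrated with a short sketch. It is deliberately simplified and is not how crabs itself works: primers are matched as exact strings with no mismatches or degenerate bases, and the function names extract_amplicon and dereplicate are hypothetical names chosen for this example.

```python
from typing import Dict, Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_amplicon(seq: str, fwd: str, rev: str) -> Optional[str]:
    """Return the region between the two primers (primers excluded).

    The reverse primer is given 5'->3' on the opposite strand, so its reverse
    complement is searched for downstream of the forward primer. Returns None
    if either primer site is missing.
    """
    seq = seq.upper()
    start = seq.find(fwd.upper())
    if start == -1:
        return None
    start += len(fwd)
    end = seq.find(reverse_complement(rev.upper()), start)
    if end == -1:
        return None
    return seq[start:end]


def dereplicate(amplicons: Dict[str, str]) -> Dict[str, str]:
    """Keep only the first ID seen for each distinct amplicon sequence."""
    first_id_for_seq: Dict[str, str] = {}
    for seq_id, amplicon in amplicons.items():
        first_id_for_seq.setdefault(amplicon, seq_id)
    return {seq_id: amplicon for amplicon, seq_id in first_id_for_seq.items()}
```

Real primer sets usually contain degenerate bases and tolerate a few mismatches, which is why production tools rely on alignment rather than exact string search.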

15.
16.
Peter Schattner. Genomics, 2009, 93(3): 187-195
Integrated genome databases, such as the UCSC, Ensembl and NCBI MapViewer databases, and their associated data querying and visualization interfaces (e.g. the genome browsers) have transformed the way that molecular biologists, geneticists and bioinformaticists analyze genomic data. Nevertheless, because of the complexity of these tools, many researchers take advantage of only a fraction of their capabilities. In this tutorial, using examples from medical genetics and alternative splicing, I describe some of the biological questions that can be addressed with these techniques. I also show why doing so is typically more effective than using alternative methods and indicate some of the resources available for learning more about the advanced capabilities of these powerful tools.

17.
As an archive of sequence data for over 165,000 species, GenBank is an indispensable resource for phylogenetic inference. Here we describe an informatics processing pipeline and online database, the PhyLoTA Browser (http://loco.biosci.arizona.edu/pb), which offers a view of GenBank tailored for molecular phylogenetics. The first release of the Browser is computed from 2.6 million sequences representing the taxonomically enriched subset of GenBank sequences for eukaryotes (excluding most genome survey sequences, ESTs, and other high-throughput data). In addition to summarizing sequence diversity and species diversity across nodes in the NCBI taxonomy, it reports 87,000 potentially phylogenetically informative clusters of homologous sequences, which can be viewed or downloaded, along with provisional alignments and coarse phylogenetic trees. At each node in the NCBI hierarchy, the user can display a "data availability matrix" of all available sequences for entries in a subtaxa-by-clusters matrix. This matrix provides a guidepost for subsequent assembly of multigene data sets or supertrees. The database allows for comparison of results from previous GenBank releases, highlighting recent additions of either sequences or taxa to GenBank and letting investigators track progress on data availability worldwide. Although the reported alignments and trees are extremely approximate, the database reports several statistics correlated with alignment quality to help users choose from alternative data sources.

18.
The importance of choosing the proper systematic system in Fusarium taxonomy is stated and a comparative listing of synonyms of the sections Sporotrichiella, Roseum, Liseola, Discolor, Gibbosum, Martiella, as used by various authors in three monographs, is given (2, 4, 9).

19.
GenBank
GenBank® is a comprehensive sequence database that contains publicly available DNA sequences for more than 119 000 different organisms, obtained primarily through the submission of sequence data from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the BankIt (web) or Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in the UK and the DNA Data Bank of Japan helps ensure worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, go to the NCBI home page at: http://www.ncbi.nlm.nih.gov.

20.
Large-scale comparative and systematic studies rely on the seamless merging of multiple datasets. However, taxonomic nomenclature is constantly being revised, making it problematic to combine data from different resources or different years of publication, which use different synonyms. This is certainly true for amphibians, which have experienced a spike in taxonomic revisions, in part as a result of the widespread use of DNA barcoding to resolve cryptic species delimitation issues and of large-scale collaborative efforts to revise the entire amphibian tree. The ‘Amphibian Species of the World Online Reference’ (ASW) is one of the most widely used and most regularly updated databases for amphibian taxonomy, but existing R tools for querying synonyms such as ‘taxize’ do not include this resource. ‘AmphiNom’ is a tool suite written in the R programming language designed to facilitate batch-querying amphibian species names against the ASW database. This facilitates the merging of datasets that use different nomenclature, and its functionality is easily integrated into customizable R workflows. Moreover, it allows direct querying of the ASW website using R and straightforward reporting of summary information on current amphibian systematics.
