Similar Documents
 20 similar documents found
1.
GenBank.   Cited 8 times (self-citations: 3; citations by others: 5)
The GenBank sequence database continues to expand its data coverage, quality control, annotation content and retrieval services for the scientific community. Besides handling direct submissions of sequence data from authors, GenBank also incorporates DNA sequences from all available public sources; an integrated retrieval system, known as Entrez, also makes available data from the major protein sequence and structural databases, and from U.S. and European patents. MEDLINE abstracts from published articles describing the sequences are also included as an additional source of biological annotation for sequence entries. GenBank supports distribution of the data via FTP, CD-ROM, and E-mail servers. Network server-client programs provide access to an integrated database for literature retrieval and sequence similarity searching.

2.
MOTIVATION: To identify and characterize regions of functional interest in genomic sequence requires full, flexible query access to an integrated, up-to-date view of all related information, irrespective of where it is stored (within an organization or across the Internet) and its format (traditional database, flat file, web site, results of runtime analysis). Wide-ranging multi-source queries often return unmanageably large result sets, requiring non-traditional approaches to exclude extraneous data. RESULTS: Target Informatics Net (TINet) is a readily extensible data integration system developed at GlaxoSmithKline (GSK), based on the Object-Protocol Model (OPM) multidatabase middleware system of Gene Logic Inc. Data sources currently integrated include: the Mouse Genome Database (MGD) and Gene Expression Database (GXD), GenBank, SwissProt, PubMed, GeneCards, the results of runtime BLAST and PROSITE searches, and GSK proprietary relational databases. Special-purpose class methods used to filter and augment query results include regular expression pattern-matching over BLAST HSP alignments and retrieving partial sequences derived from primary structure annotations. All data sources and methods are accessible through an SQL-like query language or a GUI, so that when new investigations arise no additional programming beyond query specification is required. The power and flexibility of this approach are illustrated in such integrated queries as: (1) 'find homologs in genomic sequence to all novel genes cloned and reported in the scientific literature within the past three months that are linked to the MeSH term "neoplasms"'; (2) 'using a neuropeptide precursor query sequence, return only HSPs where the target genomic sequences conserve the G[KR][KR] motif at the appropriate points in the HSP alignment'; and (3) 'of the human genomic sequences annotated with exon boundaries in GenBank, return only those with valid putative donor/acceptor sites and start/stop codons'.
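Query (2) above amounts to a regular-expression test applied column by column over an HSP alignment. The following is a minimal sketch of how such a filter could work, not TINet's actual class method; the function name `motif_conserved` and the convention that an HSP whose query region lacks the motif is rejected are assumptions made for illustration.

```python
import re

# Motif from query (2): glycine followed by two basic residues (K or R).
MOTIF = re.compile(r"G[KR][KR]")


def motif_conserved(query_aln: str, subject_aln: str, motif: re.Pattern = MOTIF) -> bool:
    """Hypothetical HSP filter: True if every motif match in the query is
    also a motif match in the subject at the same alignment columns."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have the same length")
    # Alignment columns that hold a query residue (gap characters '-' skipped).
    cols = [i for i, ch in enumerate(query_aln) if ch != "-"]
    ungapped_query = "".join(query_aln[i] for i in cols)
    found = False
    for m in motif.finditer(ungapped_query):
        found = True
        # Subject residues aligned to the motif residues of the query.
        aligned = "".join(subject_aln[i] for i in cols[m.start():m.end()])
        if not motif.fullmatch(aligned):  # a gap or substitution breaks the motif
            return False
    # Assumption: an HSP whose query region contains no motif is not reported.
    return found


print(motif_conserved("AGKRLL-GRK", "AGRKLLAGKR"))  # True
print(motif_conserved("AGKRLL-GRK", "AGRKLLAGER"))  # False
```

Working on the ungapped query keeps motif residues contiguous even when the query alignment contains gaps, while the saved column indices let the subject be checked at exactly the aligned positions.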

3.
GenBank.   Cited 2 times (self-citations: 1; citations by others: 2)
The GenBank(R) sequence database (http://www.ncbi.nlm.nih.gov/) incorporates DNA sequences from all available public sources, primarily through the direct submission of sequence data from individual laboratories and from large-scale sequencing projects. Most submitters use the BankIt (WWW) or Sequin programs to send their sequence data. Data exchange with the EMBL Data Library and the DNA Data Bank of Japan helps ensure comprehensive worldwide coverage. GenBank data is accessible through NCBI's integrated retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome and protein structure information. MEDLINE(R) abstracts from published articles describing the sequences are also included as an additional source of biological annotation. Sequence similarity searching is offered through the BLAST series of database search programs. In addition to FTP, e-mail and server/client versions of Entrez and BLAST, NCBI offers a wide range of World Wide Web retrieval and analysis services of interest to biologists.

4.
PISCES: a protein sequence culling server   Cited 21 times (self-citations: 0; citations by others: 21)
PISCES is a public server for culling sets of protein sequences from the Protein Data Bank (PDB) by sequence identity and structural quality criteria. PISCES can provide lists culled from the entire PDB or from lists of PDB entries or chains provided by the user. The sequence identities are obtained from PSI-BLAST alignments with position-specific substitution matrices derived from the non-redundant protein sequence database. PISCES therefore provides better lists than servers that use BLAST, which is unable to identify many relationships below 40% sequence identity and often overestimates sequence identity by aligning only well-conserved fragments. PDB sequences are updated weekly. PISCES can also cull non-PDB sequences provided by the user as a list of GenBank identifiers, a FASTA format file, or BLAST/PSI-BLAST output.
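The abstract does not spell out the culling procedure, so the following is only a generic greedy sketch of culling by identity and structural quality: chains are visited from best (lowest) resolution to worst, and a chain is kept only if it is sufficiently dissimilar to everything kept so far. The `identity` function stands in for PSI-BLAST-derived identities; the 25% default threshold and the chain IDs are hypothetical.

```python
from typing import Callable, Sequence


def cull(chains: Sequence[str],
         resolution: dict[str, float],
         identity: Callable[[str, str], float],
         max_identity: float = 25.0) -> list[str]:
    """Greedy culling sketch: visit chains from best (lowest) resolution to worst
    and keep a chain only if its percent identity to every chain already kept
    is at or below max_identity."""
    kept: list[str] = []
    for chain in sorted(chains, key=lambda c: resolution[c]):
        if all(identity(chain, other) <= max_identity for other in kept):
            kept.append(chain)
    return kept


# Toy data: pairwise identities in percent; unlisted pairs are treated as 0.
pairs = {frozenset({"1ABC_A", "2XYZ_B"}): 90.0}
ident = lambda a, b: pairs.get(frozenset({a, b}), 0.0)
res = {"1ABC_A": 2.0, "2XYZ_B": 1.5, "3DEF_A": 2.5}
print(cull(list(res), res, ident))  # ['2XYZ_B', '3DEF_A']
```

Sorting by quality first means that when two chains are redundant, the better-resolved one is the one that survives.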

5.
Database scanning programs such as BLAST and FASTA are used nowadays by most biologists for the post-genomic processing of DNA or protein sequence information (in particular to retrieve the structure/function of uncharacterized proteins). Unfortunately, their results can be polluted by identical alignments (called redundancies) coming from the same protein or DNA sequences present in different entries of the database. This makes efficient use of the listed alignments difficult. Pretreatment of databases has been proposed to suppress strictly identical entries. However, many identical alignments still remain, since redundancies may occur locally for entries corresponding to various fragments of the same sequence, or for entries corresponding to very homologous sequences that differ at only a few residues, such as orthologous proteins. In the present work, we show that redundant alignments can indeed be numerous even when working with a pretreated non-redundant data bank, reaching as much as 60% of the output results depending on the query and the bank. Therefore, the accuracy and efficiency of post-genomic work will be greatly increased if these redundancies are removed. To solve this previously unaddressed problem, we have developed an algorithm that allows the efficient and safe suppression of all the redundancies with no loss of information. This algorithm is based on various filtering steps that we describe here in the context of the Automat similarity search program, and such an algorithm should also be added to the other similarity search programs (BLAST, FASTA, etc.).
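The core idea of collapsing identical alignments without losing information can be sketched as grouping hits by the alignment they produce and keeping the list of every entry in each group. This is a simplified single-step filter, not the multi-step Automat algorithm; the `Hit` fields and the grouping key (query span plus gapped subject string) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject_id: str
    score: float
    q_start: int
    q_end: int
    aligned_subject: str  # gapped subject string as it appears in the alignment


def collapse_redundant(hits: list[Hit]) -> list[tuple[Hit, list[str]]]:
    """Group hits that produce the same alignment (same query span and same
    aligned subject residues). Return one representative per group, highest
    score first, together with the IDs of all entries in that group."""
    groups: dict[tuple[int, int, str], list[Hit]] = {}
    for h in hits:
        groups.setdefault((h.q_start, h.q_end, h.aligned_subject), []).append(h)
    result = []
    for members in groups.values():
        best = max(members, key=lambda h: h.score)
        # Keeping every entry ID means no information is discarded.
        result.append((best, sorted(h.subject_id for h in members)))
    result.sort(key=lambda pair: pair[0].score, reverse=True)
    return result


hits = [Hit("P1", 100.0, 1, 50, "MKV"), Hit("P2", 100.0, 1, 50, "MKV"),
        Hit("P3", 80.0, 1, 50, "MRV")]
for best, ids in collapse_redundant(hits):
    print(best.subject_id, ids)
```

Catching near-identical entries such as orthologs differing at a few residues, as the abstract describes, would need a looser key than exact string equality.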

6.
Wolbachia endosymbiotic bacteria are widespread in filarial nematodes and are directly involved in the immune response of the host. In addition, antibiotics that disrupt Wolbachia interfere with filarial nematode development; thus, Wolbachia provide an excellent target for control of filariasis. A 63.1 kb bacterial artificial chromosome insert, from the Wolbachia endosymbiont of the human filarial parasite Brugia malayi, has been sequenced using the New England Biolabs Inc. Genome Priming System transposition kit in conjunction with primer walking methods. The bacterial artificial chromosome insert contains approximately 57 potential ORFs, which have been compared by individual protein BLAST analysis with the 35 published complete microbial genomes in the Comprehensive Microbial Resource database at The Institute for Genomic Research and in the NCBI GenBank database, as well as with data from 22 incomplete genomes from the DOE Joint Genome Institute. Twenty-five of the putative ORFs have significant similarity to genes from the alpha-proteobacterium Rickettsia prowazekii, the most closely related completed genome, as well as to the newly sequenced alpha-proteobacterial endosymbiont Sinorhizobium meliloti. The bacterial artificial chromosome insert sequence, however, has little conserved synteny with the R. prowazekii and S. meliloti genomes. Significant sequence similarity was also found in comparisons with the currently available sequence data from the Wolbachia endosymbiont of Drosophila melanogaster. Analysis of this bacterial artificial chromosome insert provides useful gene density and comparative genomic data that will contribute to whole genome sequencing of Wolbachia from the B. malayi host. This will also lead to a better understanding of the interactions between the endosymbiont and its host and will offer novel approaches and drug targets for elimination of filarial disease.
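As a rough illustration of the gene-density figure the insert provides, dividing the approximate ORF count by the insert length gives the following; this is simple arithmetic on the stated numbers, not a value reported by the authors.

```latex
\text{ORF density} \approx \frac{57\ \text{ORFs}}{63.1\ \text{kb}} \approx 0.90\ \text{ORFs per kb},
\qquad
\frac{63.1\ \text{kb}}{57\ \text{ORFs}} \approx 1.11\ \text{kb per ORF}
```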

7.
The BayGenomics gene-trap resource (http://baygenomics.ucsf.edu) provides researchers with access to thousands of mouse embryonic stem (ES) cell lines harboring characterized insertional mutations in both known and novel genes. Each cell line contains an insertional mutation in a specific gene. The identity of the gene that has been interrupted can be determined from a DNA sequence tag. Approximately 75% of our cell lines contain insertional mutations in known mouse genes or genes that share strong sequence similarities with genes that have been identified in other organisms. These cell lines readily transmit the mutation to the germline of mice and many mutant lines of mice have already been generated from this resource. BayGenomics provides facile access to our entire database, including sequence tags for each mutant ES cell line, through the World Wide Web. Investigators can browse our resource, search for specific entries, download any portion of our database and BLAST sequences of interest against our entire set of cell line sequence tags. They can then obtain the mutant ES cell line for the purpose of generating knockout mice.

8.
GenBank.   Cited 2 times (self-citations: 0; citations by others: 2)
The GenBank(R) sequence database incorporates DNA sequences from all available public sources, primarily through the direct submission of sequence data from individual laboratories and from large-scale sequencing projects. Most submitters use the BankIt (Web) or Sequin programs to format and send sequence data. Data exchange with the EMBL Data Library and the DNA Data Bank of Japan helps ensure comprehensive worldwide coverage. GenBank data is accessible through NCBI's integrated retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome and protein structure information. MEDLINE(R) abstracts from published articles describing the sequences are included as an additional source of biological annotation through the PubMed search system. Sequence similarity searching is offered through the BLAST series of database search programs. In addition to FTP, Email, and server/client versions of Entrez and BLAST, NCBI offers a wide range of World Wide Web retrieval and analysis services based on GenBank data. The GenBank database and related resources are freely accessible via the URL: http://www.ncbi.nlm.nih.gov

9.
We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites. Matches to vector were found in 0.23% of 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is increasing more quickly than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries, and will have an impact on issues of database management. The incorporated vector fragments described here may also be useful for a crude estimate of the fidelity of sequence information in the database. In alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%.
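One crude way to flag possible vector incorporation is to look for exact k-mers shared between a database entry and a vector on either strand. This is only an illustrative screen, not the alignment-based method used in the study; the function names and the default `k=20` are assumptions, and a practical screen would use alignment tools against a curated vector collection.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def vector_kmer_hits(entry: str, vector: str, k: int = 20) -> list[int]:
    """0-based start positions in `entry` whose k-mer occurs exactly in
    `vector` on either strand (a crude contamination screen)."""
    entry, vector = entry.upper(), vector.upper()
    vector_kmers: set[str] = set()
    for strand in (vector, revcomp(vector)):
        vector_kmers.update(strand[i:i + k] for i in range(len(strand) - k + 1))
    return [i for i in range(len(entry) - k + 1) if entry[i:i + k] in vector_kmers]


print(vector_kmer_hits("TTTTATGCTTTT", "ATGCGTAC", k=4))  # [4]
```

Checking the reverse complement matters because a vector fragment can be inserted in either orientation.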

10.
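Based on conserved sequences of ferritin genes, a homologous sequence was found by searching the Chinese sturgeon (Acipenser sinensis) EST data in GenBank. The sequence was amplified by RT-PCR and its sequencing errors were corrected, yielding the full-length cDNA of a Chinese sturgeon ferritin subunit, which was annotated and submitted to GenBank under accession number EU348782. The cDNA is 896 bp long and contains a complete 531 bp coding region, predicted to encode a protein of 176 aa with a molecular weight (Mr) of 20,339.9 and a theoretical isoelectric point of 5.66. It shares the highest identity, 82.9%, with Atlantic salmon ferritin. The gene is expressed in many Chinese sturgeon tissues, including liver, pancreas, muscle, brain, heart, gill and gastric mucosa, with higher expression in pancreas and heart and lower expression in muscle. A three-dimensional structure of the protein was obtained by homology modeling; it contains 5 α-helices and 10 turns and superimposes well on human, frog and bacterial ferritins, showing high similarity and indicating that the structure and function of this protein have been highly conserved during gene evolution.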
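The link between the 531 bp coding region and the 176 aa protein is that the ORF length includes the stop codon: 531 / 3 = 177 codons, minus one stop codon, gives 176 amino acids. The sketch below shows a simple forward-strand longest-ORF search and that conversion; it is an illustration only, not the annotation procedure the authors used, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(cdna: str) -> tuple[int, int]:
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.
    `end` is exclusive and includes the stop codon; (0, 0) if none is found."""
    seq = cdna.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ATG after this stop
    return best


def protein_length(orf_bp: int) -> int:
    """Amino acids encoded by an ORF whose length includes the stop codon."""
    return orf_bp // 3 - 1


print(protein_length(531))  # 176
```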

11.
MOTIVATION: The past decade has seen extension in the methods of sequence analysis from single gene based to analyzing multiple genes and proteins simultaneously. Consequently, there is a need for software tools that will allow mining of these enormous datasets at genome level effectively. A key challenge is to make them user-friendly, available to a larger community and integrate with public domain software without much hassle. RESULTS: A web-based interactive computational tool is described for visualization and comparison of gene order from prokaryotic and selected viral genome data. Many intriguing similarities and differences in gene order of multiple genomes can be compared and revealed. The interface facilitates easy extraction of the nucleotide sequence of the gene of interest and BLAST analysis against GenBank at NCBI to provide insights into gene functions and orthologs of the gene in other species.
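One simple way to quantify gene-order similarity is to count neighbouring gene pairs that are adjacent in both genomes. The tool's own comparison method is not described in the abstract, so this is only a sketch; it assumes genes have already been mapped to shared ortholog names and treats each genome as a linear list (ignoring circularity and strand).

```python
def adjacent_pairs(order: list[str]) -> set[frozenset[str]]:
    """Unordered neighbouring gene pairs in a linear gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def conserved_adjacencies(order_a: list[str], order_b: list[str]) -> set[frozenset[str]]:
    """Gene pairs adjacent in both genomes, in either orientation."""
    return adjacent_pairs(order_a) & adjacent_pairs(order_b)


genome_a = ["dnaA", "dnaN", "recF", "gyrB"]
genome_b = ["gyrB", "recF", "dnaA", "dnaN"]
print(sorted(sorted(p) for p in conserved_adjacencies(genome_a, genome_b)))
# [['dnaA', 'dnaN'], ['gyrB', 'recF']]
```

Using unordered pairs means a block that was inverted between the two genomes still counts as conserved.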

12.
13.
GenBank.   Cited 4 times (self-citations: 1; citations by others: 3)
The GenBank sequence database incorporates DNA sequences from all available public sources, primarily through the direct submission of sequence data from authors and from large-scale sequencing projects. Data exchange with the EMBL Data Library and the DNA Data Bank of Japan helps ensure comprehensive coverage. GenBank continues to focus on quality control and annotation while expanding data coverage and retrieval services. An integrated retrieval system, known as Entrez, incorporates data from the major DNA and protein sequence databases, along with genome maps and protein structure information. MEDLINE abstracts from published articles describing the sequences are also included as an additional source of biological annotation. Sequence similarity searching is offered through the BLAST family of programs. All of NCBI's services are offered through the World Wide Web. In addition, there are specialized server/client versions as well as FTP and e-mail server access.

14.
GDB: the Human Genome Database.   Cited 6 times (self-citations: 0; citations by others: 6)
The Genome Database (GDB, http://www.gdb.org ) is a public repository of data on human genes, clones, STSs, polymorphisms and maps. GDB entries are highly cross-linked to each other, to literature citations and to entries in other databases, including the sequence databases, OMIM, and the Mouse Genome Database. Mapping data from large genome centers and smaller mapping efforts are added to GDB on an ongoing basis. The database can be searched by a variety of methods, ranging from keyword searches to complex queries. Major functionality extensions in the last year include the ongoing computation of integrated human genome maps, called Comprehensive Maps, and the use of those maps to support positional queries and graphic displays. The capabilities of the GDB map viewer (Mapview) have been extended to include map printing and the graphical display of ad hoc query results. The HUGO Nomenclature Committee continues to curate the proposed and official gene symbols and related data in collaboration with GDB. As genome research shifts its emphasis from mapping to sequencing and functional analysis, the scope of the GDB schema is being extended. We are in the process of adding representations of gene function and expression, and improving our representation of human polymorphism and mutation.

15.
As an archive of sequence data for over 165,000 species, GenBank is an indispensable resource for phylogenetic inference. Here we describe an informatics processing pipeline and online database, the PhyLoTA Browser (http://loco.biosci.arizona.edu/pb), which offers a view of GenBank tailored for molecular phylogenetics. The first release of the Browser is computed from 2.6 million sequences representing the taxonomically enriched subset of GenBank sequences for eukaryotes (excluding most genome survey sequences, ESTs, and other high-throughput data). In addition to summarizing sequence diversity and species diversity across nodes in the NCBI taxonomy, it reports 87,000 potentially phylogenetically informative clusters of homologous sequences, which can be viewed or downloaded, along with provisional alignments and coarse phylogenetic trees. At each node in the NCBI hierarchy, the user can display a "data availability matrix" of all available sequences for entries in a subtaxa-by-clusters matrix. This matrix provides a guidepost for subsequent assembly of multigene data sets or supertrees. The database allows for comparison of results from previous GenBank releases, highlighting recent additions of either sequences or taxa to GenBank and letting investigators track progress on data availability worldwide. Although the reported alignments and trees are extremely approximate, the database reports several statistics correlated with alignment quality to help users choose from alternative data sources.
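A subtaxa-by-clusters "data availability matrix" can be pictured as a count table built from (subtaxon, cluster) labels, one per sequence. The sketch below is an illustration under that assumption, not PhyLoTA's implementation; the taxon and cluster names are hypothetical.

```python
from collections import defaultdict


def availability_matrix(records):
    """records: iterable of (subtaxon, cluster_id) pairs, one per sequence.
    Returns (taxa, clusters, matrix), where matrix[i][j] is the number of
    sequences of taxa[i] that fall in clusters[j]."""
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for taxon, cluster in records:
        counts[(taxon, cluster)] += 1
    taxa = sorted({t for t, _ in counts})
    clusters = sorted({c for _, c in counts})
    matrix = [[counts.get((t, c), 0) for c in clusters] for t in taxa]
    return taxa, clusters, matrix


records = [("Felidae", "cytb"), ("Felidae", "cytb"),
           ("Canidae", "cytb"), ("Canidae", "12S")]
taxa, clusters, matrix = availability_matrix(records)
print(clusters)  # ['12S', 'cytb']
for taxon, row in zip(taxa, matrix):
    print(taxon, row)
```

Zero cells show at a glance which taxon/locus combinations are missing, which is what makes such a matrix useful for planning multigene data sets.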

16.
The National Center for Biotechnology Information (NCBI) integrates data from more than 20 biological databases through a flexible search and retrieval system called Entrez. A core Entrez database, Entrez Nucleotide, includes GenBank and is tightly linked to the NCBI Taxonomy database, the Entrez Protein database, and the scientific literature in PubMed. A suite of more specialized databases for genomes, genes, gene families, gene expression, gene variation, and protein domains dovetails with the core databases to make Entrez a powerful system for genomic research. Linked to the full range of Entrez databases is the NCBI Map Viewer, which displays aligned genetic, physical, and sequence maps for eukaryotic genomes, including those of many plants. A specialized plant query page allows maps from all plant genomes covered by the Map Viewer to be searched in tandem to produce a display of aligned maps from several species. PlantBLAST searches against the sequences shown in the Map Viewer allow BLAST alignments to be viewed within a genomic context. In addition, precomputed sequence similarities, such as those for proteins offered by BLAST Link, enable fluid navigation from unannotated to annotated sequences, quickening the pace of discovery. NCBI Web pages for plants, such as Plant Genome Central, complete the system by providing centralized access to NCBI's genomic resources as well as links to organism-specific Web pages beyond NCBI.

17.
Knowledge about the influence of environmental stress such as the action of chemotherapeutic agents on gene expression in Entamoeba histolytica is limited. We plan to use oligonucleotide microarray hybridization to approach these questions. As the basis for our array, sequence data from the genome project carried out by the Institute for Genomic Research (TIGR) and the Sanger Institute were used to annotate parts of the parasite genome. Three subgenomic databases containing enzymes, cytoskeleton genes, and stress genes were compiled with the help of the ExPASy proteomics website and the BLAST servers at the two genome project sites. The known sequences from reference species, mostly human and Escherichia coli, were searched against TIGR and Sanger E. histolytica sequence contigs and the homologs were copied into a Microsoft Access database. In a similar way, two additional databases of cytoskeletal genes and stress genes were generated. Metabolic pathways could be assembled from our enzyme database, but sometimes they were incomplete as is the case for the sterol biosynthesis pathway. The raw databases contained a significant number of duplicate entries which were merged to obtain curated non-redundant databases. This procedure revealed that some E. histolytica genes may have several putative functions. Representative examples such as the case of the delta-aminolevulinate synthase/serine palmitoyltransferase are discussed.
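The merging step, and why it exposes genes with several putative functions, can be sketched by grouping rows on a contig key and collecting distinct function labels. The original work used a Microsoft Access database; this Python stand-in and the contig IDs are hypothetical.

```python
from collections import defaultdict


def merge_duplicates(rows):
    """rows: (contig_id, putative_function) pairs, possibly duplicated.
    Returns {contig_id: sorted list of distinct putative functions}."""
    merged: dict[str, set[str]] = defaultdict(set)
    for contig_id, function in rows:
        merged[contig_id].add(function)  # exact duplicates collapse in the set
    return {cid: sorted(funcs) for cid, funcs in merged.items()}


def multifunctional(merged):
    """Contigs that ended up with more than one putative function."""
    return sorted(cid for cid, funcs in merged.items() if len(funcs) > 1)


rows = [("contig_1", "delta-aminolevulinate synthase"),
        ("contig_1", "serine palmitoyltransferase"),
        ("contig_2", "actin"), ("contig_2", "actin")]
merged = merge_duplicates(rows)
print(multifunctional(merged))  # ['contig_1']
```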

18.
An extensive effort of the International Rice Genome Sequencing Project (IRGSP) has resulted in rapid accumulation of genome sequence, and >137 Mb has already been made available to the public domain as of August 2001. This requires a high-throughput annotation scheme to extract biologically useful and timely information from the sequence data on a regular basis. A new automated annotation system and database called Rice Genome Automated Annotation System (RiceGAAS) has been developed to execute a reliable and up-to-date analysis of the genome sequence as well as to store and retrieve the results of annotation. The system has the following functional features: (i) collection of rice genome sequences from GenBank; (ii) execution of gene prediction and homology search programs; (iii) integration of results from various analyses and automatic interpretation of coding regions; (iv) re-execution of analysis, integration and automatic interpretation with the latest entries in reference databases; (v) integrated visualization of the stored data using web-based graphical view. RiceGAAS also has a data submission mechanism that allows public users to perform fully automated annotation of their own sequences. The system can be accessed at http://RiceGAAS.dna.affrc.go.jp/.

19.
GenBank   Cited 51 times (self-citations: 4; citations by others: 47)
The GenBank(R) sequence database incorporates publicly available DNA sequences of >55,000 different organisms, primarily through direct submission of sequence data from individual laboratories and large-scale sequencing projects. Most submissions are made using the BankIt (Web) or Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Data exchange with the EMBL Data Library and the DNA Data Bank of Japan helps ensure comprehensive worldwide coverage. GenBank data is accessible through NCBI's integrated retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping and protein structure information, plus the biomedical literature via PubMed. Sequence similarity searching is provided by the BLAST family of programs. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. NCBI also offers a wide range of WWW retrieval and analysis services based on GenBank data. The GenBank database and related resources are freely accessible via the NCBI home page at http://www.ncbi.nlm.nih.gov

20.
Human bone marrow stromal cells (HBMSC) are pluripotent cells with the potential to differentiate into osteoblasts, chondrocytes, myelosupportive stroma, and marrow adipocytes. We used high-throughput DNA sequencing analysis to generate 4258 single-pass sequencing reactions (known as expressed sequence tags, or ESTs) obtained from the 5' (97) and 3' (4161) ends of human cDNA clones from an HBMSC cDNA library. Our goal was to obtain tag sequences from the maximum number of possible genes and to deposit them in the publicly accessible database for ESTs (dbEST of the National Center for Biotechnology Information). Comparisons of our EST sequencing data with nonredundant human mRNA and protein databases showed that the ESTs represent 1860 gene clusters. The EST sequencing data analysis showed 60 novel genes found only in this cDNA library after BLAST analysis against 3.0 million ESTs in NCBI's dbEST database. The BLAST search also identified ESTs with close homology to known genes, which suggests that these may be newly recognized members of known gene families. The gene expression profile of this cell type is revealed by analyzing both the frequency with which a message is encountered and the functional categorization of expressed sequences. Comparing an EST sequence with the human genomic sequence database enables assignment of an EST to a specific chromosomal region (a process called digital gene localization) and often enables immediate partial determination of intron/exon boundaries within the genomic structure. It is expected that high-throughput EST sequencing and data mining analysis will greatly promote our understanding of gene expression in these cells and of growth and development of the skeleton.
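The "frequency with which a message is encountered" can be illustrated by counting ESTs per gene cluster and expressing each count as a share of all ESTs. This is a minimal sketch that assumes each EST has already been assigned to a cluster; the EST IDs and gene symbols are hypothetical.

```python
from collections import Counter


def expression_profile(est_to_cluster: dict[str, str]) -> list[tuple[str, int, float]]:
    """Count ESTs per gene cluster and return (cluster, count, percent of all ESTs),
    most abundant cluster first."""
    counts = Counter(est_to_cluster.values())
    total = sum(counts.values())
    return [(cluster, n, 100.0 * n / total) for cluster, n in counts.most_common()]


ests = {"est1": "COL1A1", "est2": "COL1A1", "est3": "SPARC", "est4": "COL1A1"}
for cluster, n, pct in expression_profile(ests):
    print(f"{cluster}\t{n}\t{pct:.1f}%")
```

EST abundance of this kind is only a rough proxy for expression level, since library construction and sequencing from the 5' or 3' end both bias which clones are sampled.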


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417