Similar articles
 Found 20 similar articles; search time 31 ms
1.
2.
Recent changes in the GenBank On-line Service.   Total citations: 2 (self: 0, others: 2)
The GenBank On-line Service provides access to the GenBank and EMBL nucleic acid sequence databases and to the Swiss-Prot and GenPept protein sequence databases. Users can query the databases by sequence similarity and annotation keywords and retrieve entries of interest. This access is available through e-mail servers, anonymous FTP, anonymous interactive login, and login to established, password-protected, individual accounts.

3.
We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites. Matches to vector were found in 0.23% of 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is increasing more quickly than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries, and will have an impact on issues of database management. The incorporated vector fragments described here may also be useful for a crude estimate of the fidelity of sequence information in the database. In alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%.

4.
The GenBank nucleic acid sequence database is a computer-based collection of all published DNA and RNA sequences; it contains over five million bases in close to six thousand sequence entries drawn from four thousand five hundred published articles. Each sequence is accompanied by relevant biological annotation. The database is available either on magnetic tape, on floppy diskettes, on-line or in hardcopy form. We discuss the structure of the database, the extent of the data and the implications of the database for research on nucleic acids.

5.
The EMBL data library.   Total citations: 25 (self: 15, others: 10)
The EMBL Data Library was the first internationally supported central resource for nucleic acid sequence data. Working in close collaboration with its American counterpart, GenBank (1), the library prepares and makes available to the scientific community a comprehensive collection of the published nucleic acid sequences. This paper describes briefly the contents of the database, how it is available, and possible future enhancements of Data Library services.

6.
The use of databanks in genetic research assumes reliability of the information they contain. Currently, error detection in the manually or electronically entered data contained in the nucleotide sequence databanks at EMBL, Heidelberg and GenBank at Los Alamos is limited. We have used a subset of sequences from these databanks to train neural networks to recognize pre-mRNA splicing signals in human genes. During the training on 33 human genes from the EMBL databank, seven genes appeared to disturb the learning process. Subsequent investigation revealed discrepancies from the original published papers for three genes. In four genes, we found wrongly assigned splicing frames of introns. We believe this to be a reflection of the fact that splicing frames cannot always be unambiguously assigned on the basis of experimental data. Thus incorrect assignments arise both from mere typographical misprints and from erroneous interpretation of experiments. Training on 241 human sequences from GenBank revealed nine new errors. We propose that such errors could be detected by computer algorithms designed to check the consistency of data prior to their incorporation in databanks.

7.
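Construction and application of a web interface for the sequence homology analysis software BLAST   Total citations: 5 (self: 1, others: 4)
A web interface for the sequence homology analysis software BLAST was built on a PC/Linux server within a local area network (intranet). Every computer on the network can reach the server through the web to query public databases and in-house databases. The setup is confidential, efficient, and free of charge, and can handle the large-scale, rapid data analysis tasks of laboratories and research institutes.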

8.
《FEBS letters》1986,205(2):299-302
We have searched the GenBank nucleic acid sequence database for potential short restriction fragments. All possible oligonucleotides up to length five are found at least once flanked by known restriction recognition patterns. Thus, searches in the database for a specific sequence corresponding to a desired oligonucleotide would often point to one or more sources of short, retrievable fragments containing that sequence. These results underscore the potential of nucleic acid sequence databases in planning experiments.
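
The kind of search this abstract describes can be sketched roughly as follows. This is only an illustrative sketch, not the authors' procedure: the three recognition sites, the function name `flanked_fragments`, and the `max_len` cutoff are assumptions chosen for the example, since the abstract does not say which enzymes or length limits were used.

```python
import re

# Assumed example set of recognition sites; the abstract does not list the enzymes used.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def flanked_fragments(seq, oligo, max_len=60):
    """Find stretches of seq that start and end with a recognition site,
    contain oligo strictly between the two sites, and span at most max_len bases.

    Returns a list of (start, end, left_enzyme, right_enzyme) tuples,
    where start/end are 0-based, end-exclusive coordinates.
    """
    seq = seq.upper()
    oligo = oligo.upper()

    # Collect every occurrence of every site (lookahead allows overlapping hits).
    hits = []
    for name, site in SITES.items():
        for m in re.finditer(f"(?={site})", seq):
            hits.append((m.start(), m.start() + len(site), name))
    hits.sort()

    found = []
    for l_start, l_end, l_name in hits:
        for r_start, r_end, r_name in hits:
            if r_start < l_end:
                continue  # right site must begin after the left site ends
            if r_end - l_start > max_len:
                continue  # fragment too long to count as "short"
            if oligo in seq[l_end:r_start]:
                found.append((l_start, r_end, l_name, r_name))
    return found


print(flanked_fragments("TTGAATTCACGTAGGATCCAA", "CGT"))
```

In the toy sequence above, the oligonucleotide CGT lies between an EcoRI site and a BamHI site, so a single fragment is reported. A real search would also need to consider the reverse strand and the actual cut positions within each site, which this sketch ignores.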

9.
A database (SpliceDB) of known mammalian splice site sequences has been developed. We extracted 43 337 splice pairs from mammalian divisions of the gene-centered Infogene database, including sites from incomplete or alternatively spliced genes. Known EST sequences supported 22 815 of them. After discarding sequences with putative errors and ambiguous location of splice junctions the verified dataset includes 22 489 entries. Of these, 98.71% contain canonical GT-AG junctions (22 199 entries) and 0.56% have non-canonical GC-AG splice site pairs. The remainder (0.73%) falls into many small groups (with a maximum size of 0.05%). We especially studied non-canonical splice sites, which comprise 3.73% of GenBank annotated splice pairs. EST alignments allowed us to verify only the exonic part of splice sites. To check the conserved dinucleotides we compared sequences of human non-canonical splice sites with sequences from the high throughput genome sequencing project (HTG). Out of 171 human non-canonical and EST-supported splice pairs, 156 (91.23%) had a clear match in the human HTG. They can be classified after sequence analysis as: 79 GC-AG pairs (of which one was an error that was corrected to GC-AG), 61 errors corrected to GT-AG canonical pairs, six AT-AC pairs (of which two were errors corrected to AT-AC), one case was produced from a non-existent intron, seven cases were found in HTG that were deposited to GenBank and finally there were only two other cases left of supported non-canonical splice pairs. The information about verified splice site sequences for canonical and non-canonical sites is presented in SpliceDB with the supporting evidence. We also built weight matrices for the major splice groups, which can be incorporated into gene prediction programs. SpliceDB is available at the computational genomic Web server of the Sanger Centre: http://genomic.sanger.ac.uk/spldb/SpliceDB.html and at http://www.softberry.com/spldb/SpliceDB.html.
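
The grouping of splice pairs by their terminal dinucleotides can be illustrated with a small sketch. It is not taken from SpliceDB: the function names `classify` and `tally` are hypothetical, and the sketch assumes each intron is supplied as a plain string running from donor to acceptor.

```python
from collections import Counter


def classify(intron):
    """Label an intron by its first two (donor) and last two (acceptor) bases."""
    intron = intron.upper()
    pair = f"{intron[:2]}-{intron[-2:]}"
    if pair == "GT-AG":
        return "canonical"
    if pair in ("GC-AG", "AT-AC"):
        return pair  # the two named non-canonical groups in the abstract
    return "other"


def tally(introns):
    """Return the percentage of introns in each class, rounded to two decimals."""
    counts = Counter(classify(i) for i in introns)
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


print(tally(["GTAAAG", "GTCCAG", "GCTTAG", "ATTTAC"]))
```

The same rounding reproduces the figures quoted in the abstract: 22 199 of 22 489 entries is 98.71%, and 156 of 171 pairs is 91.23%.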

10.
Ligand-Gated Ion Channels (LGIC) are polymeric transmembrane proteins involved in the fast response to numerous neurotransmitters. All these receptors are formed by homologous subunits and the last two decades revealed an unexpected wealth of genes coding for these subunits. The Ligand-Gated Ion Channel database (LGICdb) has been developed to handle this increasing amount of data. The database aims to provide only one entry for each gene, containing annotated nucleic acid and protein sequences. The repository is carefully structured and the entries can be retrieved by various criteria. In addition to the sequences, the LGICdb provides multiple sequence alignments, phylogenetic analyses and atomic coordinates when available. The database is accessible via the World Wide Web (http://www.pasteur.fr/recherche/banques/LGIC/LGIC.html), where it is continuously updated. Version 16 (September 2000), available for download, contained 333 entries covering 34 species.

11.
The GenBank genetic sequence databank.   Total citations: 36 (self: 6, others: 30)
The GenBank Genetic Sequence Data Bank contains over 5700 entries for DNA and RNA sequences that have been reported since 1967. This paper briefly describes the contents of the database, the forms in which the database is distributed, and the services we offer to scientists who use the GenBank database.

12.
A strategy has been developed for the construction of a validated, comprehensive composite protein sequence database. Entries are amalgamated from primary source data bases by a largely automated set of processes in which redundant and trivially different entries are eliminated. A modular approach has been adopted to allow scientific judgement to be used at each stage of database processing and amalgamation. Source databases are assigned a priority depending on the quality of sequence validation and commenting. Rejection of entries from the lower priority database, in each pairwise comparison of databases, is carried out according to optionally defined redundancy criteria based on sequence segment mismatches. Efficient algorithms for this methodology are embodied in the COMPO software system. COMPO has been applied for over 2 years in construction and regular updating of the OWL composite protein sequence database from the source databases NBRF-PIR, SWISS-PROT, a GenBank translation retrieved from the feature tables, NBRF-NEW, NEWAT86, PSD-KYOTO and the sequences contained in the Brookhaven protein structure databank. OWL is part of the ISIS integrated data resource of protein sequence and structure [Akrigg et al. (1988) Nature, 335, 745-746]. The modular nature of the integration process greatly facilitates the frequent updating of OWL following releases of the source databases. The extent of redundancy in these sources is revealed by the comparison process. The advantages of a robust composite database for sequence similarity searching and information retrieval are discussed.

13.
Mapping from GenBank to MEDLINE   Total citations: 1 (self: 1, others: 0)
GenBank has been based largely on literature that provides nucleic acid sequences. To find additional literature that is relevant to a given sequence, a search of MEDLINE can prove helpful. This paper documents some of the similarities between GenBank and MEDLINE that facilitate retrieval of documents from MEDLINE. In particular, techniques and examples are presented which take GenBank information and lead to MEDLINE information that supplements the GenBank information.

14.
EXProt (database for EXPerimentally verified Protein functions) is a new non-redundant database containing protein sequences for which the function has been experimentally verified. It is a selection of 3976 entries from the Prokaryotes section of the EMBL Nucleotide Sequence Database, Release 66, and 375 entries from the Pseudomonas Community Annotation Project (PseudoCAP). The entries in EXProt all have a unique ID number and provide information about the organism, protein sequence, functional annotation, link to entry in original database, and if known, gene name and link to references in PubMed/Medline. The EXProt web page (http://www.cmbi.nl/EXProt) provides further details of the database and a link to a BLAST search (blastp & blastx) of the database. The EXProt entries are indexed in SRS (http://www.cmbi.nl/srs/) and can be searched by means of keywords. Authors can be reached by email (exprot@cmbi.kun.nl).

15.
MHCBN: a comprehensive database of MHC binding and non-binding peptides   Total citations: 6 (self: 0, others: 6)
MHCBN is a comprehensive database of Major Histocompatibility Complex (MHC) binding and non-binding peptides compiled from published literature and existing databases. The latest version of the database has 19 777 entries including 17 129 MHC binders and 2648 MHC non-binders for more than 400 MHC molecules. The database has sequence and structure data of (a) source proteins of peptides and (b) MHC molecules. MHCBN has a number of web tools that include: (i) mapping of peptide on query sequence; (ii) search on any field; (iii) creation of data sets; and (iv) online data submission. The database also provides hypertext links to major databases like SWISS-PROT, PDB, IMGT/HLA-DB, GenBank and PUBMED.

16.
The Exon/Intron (ExInt) database incorporates information on the exon/intron structure of eukaryotic genes. Features in the database include: intron nucleotide sequence, amino acid sequence of the corresponding protein, position of the introns at the amino acid level and intron phase. From ExInt, we have also generated four additional databases each with ExInt entries containing predicted introns, introns experimentally defined, organelle introns or nuclear introns. ExInt is accessible through a retrieval system with pointers to GenBank. The database can be searched by keywords, locus name, NID, accession number or length of the protein. ExInt is freely accessible at http://intron.bic.nus.edu.sg/exint/exint.html

17.
GenBank.   Total citations: 8 (self: 3, others: 5)
The GenBank sequence database continues to expand its data coverage, quality control, annotation content and retrieval services for the scientific community. Besides handling direct submissions of sequence data from authors, GenBank also incorporates DNA sequences from all available public sources; an integrated retrieval system, known as Entrez, also makes available data from the major protein sequence and structural databases, and from U.S. and European patents. MEDLINE abstracts from published articles describing the sequences are also included as an additional source of biological annotation for sequence entries. GenBank supports distribution of the data via FTP, CD-ROM, and e-mail servers. Network server-client programs provide access to an integrated database for literature retrieval and sequence similarity searching.

18.
In this paper, we describe an automated system for distributing updates to the GenBank nucleic acid sequence database, using the Usenet news system as the underlying transport mechanism. Our system allows new loci to be distributed as soon as the sequences are available, over existing networks, using existing Usenet software and infrastructure currently available on a wide range of computer systems.

19.
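Cloning, sequence analysis, and expression profiling of the cathepsin D gene of the silkworm Bombyx mori   Total citations: 2 (self: 0, others: 2)
Cathepsin D (CtD) is a lysosomal aspartic endoprotease that takes part in many physiological and pathological processes and plays an especially important role in insect development and metamorphosis. Using cathepsin D nucleic acid sequences deposited at NCBI and the silkworm (Bombyx mori) expressed sequence tag (EST) database, the full-length cDNA of the silkworm cathepsin D gene (BmCtD; DQ010007) was obtained by in silico cloning. The cDNA is 1 543 bp long, with an ORF of 1 152 bp. Homology analysis showed that BmCtD is highly similar to CtD from other species. The BmCtD mRNA undergoes alternative splicing, and the other mRNA form was named BmCtDⅠ. RT-PCR showed that the gene is expressed in all of the silkworm developmental stages and tissues examined in this study.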

20.
Prediction of splice junctions in mRNA sequences.   Total citations: 8 (self: 6, others: 2)
K Nakata, M Kanehisa, C DeLisi 《Nucleic acids research》1985,13(14):5327-5340
A general method based on the statistical technique of discriminant analysis is developed to distinguish boundaries of coding and non-coding regions in nucleic acid sequences. In particular, the method is applied to the prediction of splicing sites in messenger RNA precursors. Information used for discrimination includes consensus sequence patterns around splice junctions, free energy of snRNA and mRNA base pairing, and statistical differences between coding and non-coding regions such as periodic appearance of specific bases in coding regions reflecting the non-random usage of degenerate codons. Given the reading frame of an exon (but not the exon/intron boundaries), the method will predict the following exon, namely, the intron to be excised out. When applied to human sequences in the GenBank database, the method correctly identified 80% of true splice junctions.
