Similar articles
1.
2.
O-GLYCBASE is a database of glycoproteins with O-linked glycosylation sites. Entries with at least one experimentally verified O-glycosylation site have been compiled from protein sequence databases and literature. Each entry contains information about the glycan involved, the species, sequence, a literature reference and http-linked cross-references to other databases. Version 4.0 contains 179 protein entries, an approximate 15% increase over the last version. Sequence logos representing the acceptor specificity patterns for GalNAc, GlcNAc, mannosyl and xylosyl transferases are shown. The O-GLYCBASE database is available through the WWW at http://www.cbs.dtu.dk/databases/OGLYCBASE/

3.
The 5' and 3' untranslated regions of eukaryotic mRNAs may play a crucial role in the regulation of gene expression controlling mRNA localization, stability and translational efficiency. For this reason we developed UTRdb, a specialized database of 5' and 3' untranslated sequences of eukaryotic mRNAs cleaned from redundancy. UTRdb entries are enriched with specialized information not present in the primary databases including the presence of nucleotide sequence patterns already demonstrated by experimental analysis to have some functional role. All these patterns have been collected in the UTRsite database so that it is possible to search any input sequence for the presence of annotated functional motifs. Furthermore, UTRdb entries have been annotated for the presence of repetitive elements. All internet resources implemented for retrieval and functional analysis of 5' and 3' untranslated regions of eukaryotic mRNAs are accessible at http://bigarea.area.ba.cnr.it:8000/EmbIT/UTRHome/

4.
The 5' and 3' untranslated regions of eukaryotic mRNAs may play a crucial role in the regulation of gene expression controlling mRNA localization, stability and translational efficiency. For this reason we developed UTRdb (http://bigarea.area.ba.cnr.it:8000/BioWWW/#UTRdb), a specialized database of 5' and 3' untranslated sequences of eukaryotic mRNAs cleaned from redundancy. UTRdb entries are enriched with specialized information not present in the primary databases including the presence of nucleotide sequence patterns already demonstrated by experimental analysis to have some functional role. All these patterns have been collected in the UTRsite database so that it is possible to search any input sequence for the presence of annotated functional motifs. Furthermore, UTRdb entries have been annotated for the presence of repetitive elements.

5.
6.
7.
MOTIVATION: Tandemly organized repetitive sequences (satellite DNA) are widespread in complex eukaryotic genomes. In plants, satellite repeats often represent a substantial part of nuclear DNA, but little is known about the molecular mechanisms of their amplification and their possible role(s) in genome evolution and function. Unfortunately, addressing these questions via characterization of general sequence properties of known satellite repeats has been hindered by the difficulty of obtaining a complete and unbiased set of sequence data for this analysis. This is mainly due to the presence in the public databases of multiple entries of homologous sequences and of single entries that contain more than one repeated unit (monomer). RESULTS: We have established a computer database specialized for plant satellite repeats (PlantSat) that integrates sequence data available from various resources with supplementary information including repeat consensus sequences, abundances, and chromosomal localizations. The sequences are stored as individual repeat monomers grouped into families, which simplifies their computer analysis and makes it more accurate. Using this feature, we have performed a basic sequence analysis of the whole set of plant satellite repeats with respect to their monomer length and nucleotide composition. The analysis revealed several preferred length ranges of the monomers (approximately 165 bp and its multiples) and an over-representation of the AA/TT dinucleotide in the repeats. We have also detected an enrichment of satellite DNA sequences for the motif CAAAA that is supposed to be involved in breakage-reunion of repeated sequences.
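
The composition analysis described in this abstract can be illustrated with a minimal Python sketch. It is not the PlantSat authors' code: the function names `dinucleotide_odds` and `count_motif` are hypothetical, and the observed/expected ratio computed from single-nucleotide frequencies on one strand is only one common way to quantify dinucleotide over-representation.

```python
from collections import Counter


def dinucleotide_odds(seq):
    """Return observed/expected ratios for each dinucleotide found in seq.

    The expected count assumes independent neighbouring bases, so it is the
    product of the two single-base frequencies times the number of
    dinucleotide positions. Ratios above 1 suggest over-representation.
    """
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for pair, observed in di.items():
        expected = (mono[pair[0]] / n) * (mono[pair[1]] / n) * (n - 1)
        ratios[pair] = observed / expected
    return ratios


def count_motif(seq, motif="CAAAA"):
    """Count occurrences of motif in seq, allowing overlapping matches."""
    seq = seq.upper()
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)
```

Applied to each monomer in a family, ratios such as `dinucleotide_odds(monomer)["AA"]` could then be compared across families; a real analysis would also consider the reverse strand, since the abstract reports AA/TT jointly.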

8.
The 1999 SWISS-2DPAGE database update   Cited by: 9 (self-citations: 0; citations by others: 9)
SWISS-2DPAGE (http://www.expasy.ch/ch2d/) is an annotated two-dimensional polyacrylamide gel electrophoresis (2-DE) database established in 1993. The current release contains 24 reference maps from human and mouse biological samples, as well as from Saccharomyces cerevisiae, Escherichia coli and Dictyostelium discoideum. These reference maps now have 2824 identified spots, corresponding to 614 separate protein entries in the database, in addition to virtual entries for each SWISS-PROT sequence or any user-entered amino acid sequence. Last year's improvements to the SWISS-2DPAGE database are as follows: three new maps have been created and several others have been updated; cross-references to newly built federated 2-DE databases have been added; new functions to access the data have been provided through the ExPASy proteomics server.

9.
UniProt archive     
UniProt Archive (UniParc) is the most comprehensive, non-redundant protein sequence database available. Its protein sequences are retrieved from predominant, publicly accessible resources. All new and updated protein sequences are collected and loaded daily into UniParc for full coverage. To avoid redundancy, each unique sequence is stored only once with a stable protein identifier, which can be used later in UniParc to identify the same protein in all source databases. When proteins are loaded into the database, database cross-references are created to link them to the origins of the sequences. As a result, performing a sequence search against UniParc is equivalent to performing the same search against all databases cross-referenced by UniParc. UniParc contains only protein sequences and database cross-references; all other information must be retrieved from the source databases.
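
The storage scheme in this abstract (one record per unique sequence, a stable identifier, and cross-references back to each source) can be sketched in Python. This is a toy in-memory model, not UniParc's implementation: the class `SequenceArchive`, its methods, the `SEQ000001` identifier format and the accessions in the usage note are all assumptions made for illustration.

```python
class SequenceArchive:
    """Toy non-redundant sequence store with cross-references to sources."""

    def __init__(self):
        self._id_by_sequence = {}  # sequence string -> stable identifier
        self._xrefs_by_id = {}     # identifier -> set of (database, accession)

    def load(self, sequence, source_db, accession):
        """Store the sequence once and record where it came from."""
        key = sequence.upper()
        if key not in self._id_by_sequence:
            # Hypothetical identifier format: a running counter, never reused.
            new_id = "SEQ{:06d}".format(len(self._id_by_sequence) + 1)
            self._id_by_sequence[key] = new_id
            self._xrefs_by_id[new_id] = set()
        seq_id = self._id_by_sequence[key]
        self._xrefs_by_id[seq_id].add((source_db, accession))
        return seq_id

    def cross_references(self, seq_id):
        """Return all (database, accession) pairs linked to one identifier."""
        return sorted(self._xrefs_by_id[seq_id])
```

Loading the same sequence from two sources, for example `archive.load("MKT", "DatabaseA", "A1")` and `archive.load("MKT", "DatabaseB", "B7")`, returns the same identifier both times, so a search against the archive reaches every cross-referenced source entry.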

10.
To evaluate the importance of the surrounding nucleotide sequence in the selection of a splice site for mRNA, we have carried out computer studies of eukaryotic protein genes whose entire nucleotide sequences were available. A splice site-like sequence that has a significant homology to the consensus splice junction sequences is frequently found within an intron and exon. It is found that the higher the homology of a candidate donor site sequence to the nine-nucleotide consensus sequence, the higher is its probability of being a donor site. For most of the donors, the stability of presumed base-pairing with U1-RNA is higher than that of donor-like sequences, if any, in the adjacent exon and intron. However, homology of a candidate acceptor sequence to the 15-nucleotide consensus is a poor criterion of an acceptor site. The presence of a sequence that could serve as a branch-point 18 to 37 nucleotides before an acceptor does not seem to be critical in distinguishing it from an acceptor-like sequence. For genes of human, rat, mouse and chicken, respectively, nucleotide frequencies around splice junctions of many genes have been calculated. They seem to be different at some positions around a donor site from species to species. The acceptors for these vertebrates have longer pyrimidine-rich regions than the previous consensus sequence. The newly derived nucleotide frequencies were used as the standard to calculate the weighted homology score of a candidate splice site sequence in a gene of the four species. This weighted homology score of the 40 to 60-nucleotide intron-exon sequence is a much better criterion of an acceptor. These results suggest that the most important signal in the selection of a splice site resides in the surrounding nucleotide sequence. It is also suggested that the surrounding nucleotide sequence alone is not generally sufficient for the selection.
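
The abstract does not give the formula for its weighted homology score, so the Python sketch below substitutes a standard log-odds position weight matrix as one plausible way to score a candidate site against position-specific nucleotide frequencies. The function names, the pseudocount and the uniform background of 0.25 are assumptions, not the authors' method.

```python
import math


def build_frequency_table(sites, pseudocount=1.0):
    """Estimate per-position nucleotide frequencies from aligned known sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all aligned sites must have the same length")
    table = []
    for pos in range(length):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos].upper()] += 1
        total = sum(counts.values())
        table.append({base: count / total for base, count in counts.items()})
    return table


def log_odds_score(candidate, table, background=0.25):
    """Sum log2(frequency / background) over positions; higher fits better."""
    if len(candidate) != len(table):
        raise ValueError("candidate length must match the frequency table")
    return sum(
        math.log2(table[pos][base] / background)
        for pos, base in enumerate(candidate.upper())
    )
```

With a table built from known donor sites of one species, a candidate that matches the frequent base at each position scores higher than one that does not, which mirrors the idea of weighting each position by how strongly it is conserved.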

11.
BioThesaurus is a web-based system designed to map a comprehensive collection of protein and gene names to protein entries in the UniProt Knowledgebase. Currently covering more than two million proteins, BioThesaurus consists of over 2.8 million names extracted from multiple molecular biological databases according to the database cross-references in iProClass. The BioThesaurus web site allows the retrieval of synonymous names of given protein entries and the identification of protein entries sharing the same names. AVAILABILITY: BioThesaurus is accessible for online searching at http://pir.georgetown.edu/iprolink/biothesaurus

12.
13.
14.
15.
GenBank.   Cited by: 19 (self-citations: 15; citations by others: 19)
D Benson, D J Lipman, J Ostell. Nucleic Acids Research, 1993, 21(13): 2963-2965
The GenBank sequence database has undergone an expansion in data coverage, annotation content and the development of new services for the scientific community. In addition to nucleotide sequences, data from the major protein sequence and structural databases, and from U.S. and European patents is now included in an integrated system. MEDLINE abstracts from published articles describing the sequences provide an important new source of biological annotation for sequence entries. In addition to the continued support of existing services, new CD-ROM and network-based systems have been implemented for literature retrieval and sequence similarity searching. Major releases of GenBank are now more frequent and the data are distributed in several new forms for both end users and software developers.

16.
MOTIVATION: Promoter prediction is important for the analysis of gene regulation. Although a number of promoter prediction algorithms have been reported in the literature, significant improvement in prediction accuracy remains a challenge. In this paper, an effective promoter identification algorithm, called PromoterExplorer, is proposed. In our approach, we analyze the different roles of various features, that is, local distribution of pentamers, positional CpG island features and digitized DNA sequence, and then combine them to build a high-dimensional input vector. A cascade AdaBoost-based learning procedure is adopted to select the most 'informative' or 'discriminating' features to build a sequence of weak classifiers, which are combined to form a strong classifier so as to achieve better performance. The cascade structure used for identification can also reduce false positives. RESULTS: PromoterExplorer is tested on large-scale DNA sequences from different databases, including the EPD, DBTSS, GenBank and human chromosome 22. Experimental results show that consistent and promising performance can be achieved.
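
Two of the feature types named in this abstract, pentamer distribution and CpG island features, can be illustrated with a short Python sketch. It is not the PromoterExplorer feature extractor, whose exact definitions the abstract does not give: the function names are hypothetical, and the CpG observed/expected ratio used here is the widely used measure (number of CpG times sequence length, divided by number of C times number of G), chosen as an assumption.

```python
from collections import Counter


def pentamer_counts(seq):
    """Count every overlapping 5-mer in the sequence window."""
    seq = seq.upper()
    return Counter(seq[i:i + 5] for i in range(len(seq) - 4))


def cpg_features(seq):
    """Return (GC content, CpG observed/expected ratio) for a window."""
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    gc_content = (c_count + g_count) / n if n else 0.0
    # The ratio is undefined without both C and G; report 0.0 in that case.
    if c_count and g_count:
        obs_exp = cpg_count * n / (c_count * g_count)
    else:
        obs_exp = 0.0
    return gc_content, obs_exp
```

Values like these, computed over sliding windows, could be concatenated into the kind of high-dimensional input vector the abstract mentions before being passed to a boosted classifier.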

17.
18.
The Swiss-Prot protein knowledgebase provides manually annotated entries for all species, but concentrates on the annotation of entries from model organisms to ensure the presence of high quality annotation of representative members of all protein families. A specific Plant Protein Annotation Program (PPAP) was started to cope with the increasing amount of data produced by the complete sequencing of plant genomes. Its main goal is the annotation of proteins from the model plant organism Arabidopsis thaliana. In addition to bibliographic references, experimental results, computed features and sometimes even contradictory conclusions, direct links to specialized databases connect amino acid sequences with the current knowledge in plant sciences. As protein families and groups of plant-specific proteins are regularly reviewed to keep up with current scientific findings, we hope that the wealth of information of Arabidopsis origin accumulated in our knowledgebase, and the numerous software tools provided on the Expert Protein Analysis System (ExPASy) web site might help to identify and reveal the function of proteins originating from other plants. Recently, a single, centralized, authoritative resource for protein sequences and functional information, UniProt, was created by joining the information contained in Swiss-Prot, Translation of the EMBL nucleotide sequence (TrEMBL), and the Protein Information Resource-Protein Sequence Database (PIR-PSD). A rising problem is that an increasing number of nucleotide sequences are not being submitted to the public databases, and thus the proteins inferred from such sequences will have difficulties finding their way to the Swiss-Prot or TrEMBL databases.

19.
20.
The activity of eukaryotic promoters is highly sensitive to site-specific modifications by DNA methylations. We have used the E1A promoter of adenovirus type 12 (Ad12) DNA to investigate the effects of methylations at different promoter sites on its activity. The chloramphenicol acetyltransferase gene has served as an activity indicator. Activity of the E1A promoter is lost or markedly decreased by deoxycytidine methylation of two HpaII (5'-C-C-G-G-3') or seven HhaI (5'-G-C-G-C-3') sites upstream from the 3' located T-A-T-A signal. There are two T-A-T-A signals in the E1A promoter of adenovirus type 12 DNA, one T-A-T-T-A-T sequence starting at nucleotide 276 (5' located), a second T-A-T-T-T-A-A sequence starting at nucleotide 414 (3' located). Deoxycytidine methylations at two AluI (5'-A-G-C-T-3') sites downstream from the 5' located T-A-T-A signal have no effect on promoter activity. When one EcoRI (5'-G-A-A-T-T-C-3') or one TaqI (5'-T-C-G-A-3') sequence at 281 base-pairs upstream or 61 base-pairs downstream from the 5' located E1A T-A-T-A signal, respectively, is deoxyadenosine methylated, the promoter becomes inactive. Deoxyadenosine methylation at one MboI (5'-G-A-T-C-3') site, which is located 127 nucleotides downstream from the 5' located T-A-T-A signal, fails to decrease E1A promoter activity. There is no conspicuous anatomical relation of any of these sites to the two presumptive enhancer sequences in the E1A promoter. We conclude that 5-deoxymethylcytidine or N6-methyldeoxyadenosine residues have to be introduced at highly specific promoter sites to inactivate the promoter. These sites are probably different for different promoters.
