Similar literature
20 similar documents found (search time: 15 ms)
1.
MOTIVATION: The context of the start codon (typically, AUG) and the features of the 5' untranslated regions (5' UTRs) are important for understanding translation regulation in eukaryotic mRNAs and for accurate prediction of the coding region in genomic and cDNA sequences. The presence of AUG triplets in 5' UTRs (upstream AUGs) might affect the initiation rate and, in the context of gene prediction, could reduce the accuracy of the identification of the authentic start. To reveal potential connections between the presence of upstream AUGs and other features of 5' UTRs, such as their length and the start codon context, we undertook a systematic analysis of the available eukaryotic 5' UTR sequences. RESULTS: We show that a large fraction of 5' UTRs in the available cDNA sequences, 15-53% depending on the organism, contain upstream ATGs. A negative correlation was observed between the information content of the translation start signal and the length of the 5' UTR. Similarly, a negative correlation exists between the 'strength' of the start context and the number of upstream ATGs. Typically, cDNAs containing long 5' UTRs with multiple upstream ATGs have a 'weak' start context, and in contrast, cDNAs containing short 5' UTRs without ATGs have 'strong' starts. These counter-intuitive results may be interpreted in terms of upstream AUGs having an important role in the regulation of translation efficiency by ensuring a low basal translation level via double negative control and creating the potential for additional regulatory mechanisms. One such mechanism, supported by experimental studies of some mRNAs, is removal of the AUG-containing portion of the 5' UTR by alternative splicing. AVAILABILITY: An ATG_EVALUATOR program is available upon request or at www.itba.mi.cnr.it/webgene. CONTACT: rogozin@ncbi.nlm.nih.gov, milanesi@itba.mi.cnr.it.
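
As a rough illustration of the quantities discussed in this abstract, the following minimal Python sketch counts ATG triplets in the 5' UTR of a cDNA and extracts the bases flanking the annotated start codon. It is not the ATG_EVALUATOR program. The function names, the 0-based `cds_start` convention, and the default context window are assumptions made for this example only.

```python
def count_upstream_atgs(cdna: str, cds_start: int) -> int:
    """Count ATG triplets (in any frame) lying entirely within the 5' UTR.

    cdna      -- cDNA sequence written 5' to 3'
    cds_start -- 0-based index of the A of the annotated start ATG (assumed convention)
    """
    utr = cdna[:cds_start].upper()
    # Slide one base at a time so that all three frames are checked.
    return sum(1 for i in range(len(utr) - 2) if utr[i:i + 3] == "ATG")


def start_context(cdna: str, cds_start: int, upstream: int = 6, downstream: int = 1) -> str:
    """Return the start codon together with its flanking bases.

    The default window (6 bases before, 1 base after the ATG) is an
    illustrative choice, not a value taken from the abstract.
    """
    lo = max(0, cds_start - upstream)
    hi = cds_start + 3 + downstream
    return cdna[lo:hi].upper()
```

For the hypothetical sequence `GGATGCCATGGCCACCATGGCT` with the annotated start at index 16, the 5' UTR contains two upstream ATGs. Scoring how 'strong' the extracted context is would require a position weight matrix, which is outside this sketch.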

2.
ISSD Version 2.0: taxonomic range extended.
Two more organisms from different taxonomic groups were added to a new version of the Integrated Sequence-Structure Database (ISSD). ISSD serves as an integrated source of sequence and structure information for the analysis of correlations between mRNA synonymous codon usage and three-dimensional structure of the encoded proteins. ISSD now holds 88 non-homologous Escherichia coli proteins and 25 yeast Saccharomyces cerevisiae proteins in addition to the expanded set of mammalian proteins, which includes 166 proteins (107 in ISSD Version 1.0). Comparison of ISSD sequences with organism-specific codon usage data derived from the CUTG database shows that it is a representative subset of the GenBank coding sequence data. Preliminary results of the statistical analysis confirm that the sequence-structure correlations we observed earlier are also present in the upgraded ISSD (Version 2.0), including bacterial and yeast proteins. The ISSD Version 2.0 release includes an improved Web-based data search and retrieval system and is accessible via URL http://www.protein.bio.msu.su/issd/. ISSD can also be accessed at ExPASy, URL http://www.expasy.ch/swissmod/swiss-model.html

3.
Mitochondria, besides their central role in energy metabolism, have recently been found to be involved in a number of basic processes of cell life and to contribute to the pathogenesis of many degenerative diseases. All functions of mitochondria depend on the interaction of nuclear and organellar genomes. Mitochondrial genomes have been extensively sequenced and analysed and the data collected in several specialised databases. In order to collect information on nuclear-encoded mitochondrial proteins we developed MitoNuc and MitoAln, two related databases containing, respectively, detailed information on sequenced nuclear genes coding for mitochondrial proteins in Metazoa and yeast, and the multiple alignments of the relevant homologous protein coding regions. MitoNuc and MitoAln retrieval through SRS at http://bio-www.ba.cnr.it:8000/srs6/ allows easy extraction of sequence data, subsequences defined by specific features and nucleotide or amino acid multiple alignments.

4.
The AMmtDB database (http://bio-www.ba.cnr.it:8000/srs6/) has been updated by collecting the multi-aligned sequences of Chordata mitochondrial genes coding for proteins and tRNAs. The genes coding for proteins are multi-aligned based on the translated sequences and both the nucleotide and amino acid multi-alignments are provided. AMmtDB data selected through SRS can be viewed and managed using GeneDoc or other programs for the management of multi-aligned data depending on the user's operating system. The multiple alignments have been produced with the CLUSTALW and PILEUP programs and then carefully optimized manually.

5.
Sample classification and class prediction are the aims of many gene expression studies. We present a web-based application, Prophet, which builds prediction rules and allows them to be used for further sample classification. Prophet automatically chooses the best classifier, along with the optimal selection of genes, using a strategy that renders unbiased cross-validated errors. Prophet is linked to different microarray data analysis modules, and includes a unique feature: the possibility of performing the functional interpretation of the molecular signature found. Availability: Prophet can be found at the URL http://prophet.bioinfo.cipf.es/ or within the GEPAS package at http://www.gepas.org/. Supplementary information: http://gepas.bioinfo.cipf.es/tutorial/prophet.html.

6.
The GoSh database is a collection of 58 990 Capra hircus and Ovis aries expressed sequence tags. A Perl pipeline was prepared to process sequences, and data were collected in a MySQL database. A PHP-based web interface allows browsing and querying the database. Putative single nucleotide polymorphism (SNP) detection and a search for repeats were performed, and links to external related resources were provided. Sequences were annotated against three different databases and an algorithm was implemented to create statistics of the distribution of retrieved homologous ontologies in the Gene Ontology categories. The GoSh database is a repository of data and links related to goat and sheep expressed genes. AVAILABILITY: The GoSh database is available at http://www.itb.cnr.it/gosh/

7.
SUMMARY: HELM is a web tool designed to automate the analysis of protein sequences searching for alpha helix motifs. This analysis can be useful in protein engineering studies aimed at identifying regions to be modified in order to obtain more suitable features of local and/or global stability. AVAILABILITY: The tool is available to academic and commercial institutions at the URL http://crisceb.area.na.cnr.it/angelo/PROTEIN_TOOLS/HELM/ CONTACT: angelo@crisceb.area.na.cnr.it

8.
The AMmtDB database (http://bighost.area.ba.cnr.it/mitochondriome) has been updated by collecting the multi-aligned sequences of Chordata and Invertebrata mitochondrial genes coding for proteins and tRNAs. Links to the multi-aligned mtDNA intraspecies variants, collected in VarMmtDB at the Mitochondriome web site, have been introduced. The genes coding for proteins are multi-aligned based on the translated sequences and both the nucleotide and amino acid multi-alignments are provided. AMmtDB data selected through SRS can be viewed and managed using GeneDoc or other programs for the management of multi-aligned data depending on the user’s operating system. The multiple alignments have been produced with the CLUSTALW and PILEUP programs and then carefully optimized manually.

9.
We have constructed a non-homologous database, termed the Integrated Sequence-Structure Database (ISSD), which comprises the coding sequences of genes, amino acid sequences of the corresponding proteins, their secondary structure and straight phi, psi angle assignments, and polypeptide backbone coordinates. Each protein entry in the database holds the alignment of nucleotide sequence, amino acid sequence and the PDB three-dimensional structure data. The nucleotide and amino acid sequences for each entry are selected on the basis of exact matches of the source organism and cell environment. The current version 1.0 of ISSD is available on the WWW at http://www.protein.bio.msu.su/issd/ and includes 107 non-homologous mammalian proteins, of which 80 are human proteins. We have used the database to analyse synonymous codon usage patterns in mRNA sequences, showing their correlation with three-dimensional structure features of the encoded proteins. Possible ISSD applications include optimisation of protein expression, improvement of protein structure prediction accuracy, and analysis of evolutionary aspects of the nucleotide sequence-protein structure relationship.
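
The synonymous codon usage analysis mentioned in this abstract starts from a simple tally. The sketch below is one possible way to compute, for a single coding sequence, the fraction of each codon among the codons for the same amino acid. It assumes the standard genetic code and is not taken from ISSD. The function name `synonymous_usage` is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_usage(cds: str) -> dict:
    """Map each amino acid to {codon: fraction among its synonymous codons}.

    cds is read in frame from its first base; codons containing characters
    other than A, C, G, T are skipped.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    by_aa = defaultdict(dict)
    for codon, n in counts.items():
        by_aa[CODON_TABLE[codon]][codon] = n

    return {
        aa: {codon: n / sum(group.values()) for codon, n in group.items()}
        for aa, group in by_aa.items()
    }
```

Relating these fractions to secondary structure would additionally require a per-residue structure label for each codon, which the abstract says the database entries hold but which is not modelled here.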

10.
11.
12.
MOTIVATION: Accurate gene structure annotation is a challenging computational problem in genomics. The best results are achieved with spliced alignment of full-length cDNAs or multiple expressed sequence tags (ESTs) with sufficient overlap to cover the entire gene. For most species, cDNA and EST collections are far from comprehensive. We sought to overcome this bottleneck by exploring the possibility of using combined EST resources from fairly diverged species that still share a common gene space. Previous spliced alignment tools were found inadequate for this task because they rely on very high sequence similarity between the ESTs and the genomic DNA. RESULTS: We have developed a computer program, GeneSeqer, which is capable of aligning thousands of ESTs with a long genomic sequence in a reasonable amount of time. The algorithm is uniquely designed to tolerate a high percentage of mismatches and insertions or deletions in the EST relative to the genomic template. This feature allows use of non-cognate ESTs for gene structure prediction, including ESTs derived from duplicated genes and homologous genes from related species. The increased gene prediction sensitivity results in part from novel splice site prediction models that are also available as a stand-alone splice site prediction tool. We assessed GeneSeqer performance relative to a standard Arabidopsis thaliana gene set and demonstrate its utility for plant genome annotation. In particular, we propose that this method provides a timely tool for the annotation of the rice genome, using abundant ESTs from other cereals and plants. AVAILABILITY: The source code is available for download at http://bioinformatics.iastate.edu/bioinformatics2go/gs/download.html. Web servers for Arabidopsis and other plant species are accessible at http://www.plantgdb.org/cgi-bin/AtGeneSeqer.cgi and http://www.plantgdb.org/cgi-bin/GeneSeqer.cgi, respectively. For non-plant species, use http://bioinformatics.iastate.edu/cgi-bin/gs.cgi. 
The splice site prediction tool (SplicePredictor) is distributed with the GeneSeqer code. A SplicePredictor web server is available at http://bioinformatics.iastate.edu/cgi-bin/sp.cgi

13.
The computer system mRNA-FAST (mRNA Function, Activity, STructure; http://wwwmgs.bionet.nsc.ru/mgs/dbases/trsig/) is described. The system has been developed to analyze nucleotide sequences of mRNA and to measure their essential properties. The system compiles a database of translation signals, including nucleotide sequences of the regulatory regions with structural and experimental information on their specific activities. It also contains programs to search for local homology between mRNA and translation signals, to search for potential signals based on analysis of oligonucleotide dictionaries, and to model secondary RNA structure. Possible applications of the mRNA-FAST system are discussed.

14.
MOTIVATION: Subcellular localization is a key functional characteristic of proteins. A fully automatic and reliable prediction system for protein subcellular localization is needed, especially for the analysis of large-scale genome sequences. RESULTS: In this paper, a Support Vector Machine (SVM) is introduced to predict the subcellular localization of proteins from their amino acid compositions. The total prediction accuracies reach 91.4% for three subcellular locations in prokaryotic organisms and 79.4% for four locations in eukaryotic organisms. Predictions by our approach are robust to errors in the protein N-terminal sequences. This new approach provides superior prediction performance compared with existing algorithms based on amino acid composition and can be a complementary method to other existing methods based on sorting signals. AVAILABILITY: A web server implementing the prediction method is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/. SUPPLEMENTARY INFORMATION: Supplementary material is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/.
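
The input feature named in this abstract, amino acid composition, can be sketched as a 20-dimensional vector of residue fractions. The Python example below shows only that feature extraction step. The function name and the alphabetical one-letter ordering are assumptions of this sketch. The abstract does not state the SVM kernel, parameters or training procedure, so none are shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def aa_composition(sequence: str) -> list:
    """Return the fraction of each standard amino acid in the sequence.

    Non-standard symbols (e.g. X) are ignored, so the fractions are taken
    over the standard residues only and sum to 1.
    """
    sequence = sequence.upper()
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]
```

A vector of this kind, computed for each protein, is what a classifier such as an SVM would be trained on. Because composition discards residue order, it is plausible that such a feature is insensitive to local errors such as a wrong N-terminus, which is consistent with the robustness claim in the abstract.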

15.
HUGE is a database for human large proteins newly identified by the Kazusa cDNA project, which aims to predict protein primary structures from sequences of human large cDNAs (>4 kb). In particular, cDNA clones capable of coding for large proteins (>50 kDa) are current targets of the project. More than 700 sequences of human cDNAs (average size, 5.1 kb) have been determined to date and deposited in the public databases. Notable information implied from the cDNAs and the predicted protein sequences can be obtained through HUGE via the World Wide Web at URL http://www.kazusa.or.jp/huge

16.
The computer system mRNA-FAST (mRNA Function, Activity, STructure; http://wwwmgs.bionet.nsc.ru/mgs/dbases/trsig/) is described. The system has been developed to analyze nucleotide sequences of mRNA and to measure their essential properties. The system compiles a database of translation signals, including nucleotide sequences of the regulatory regions with structural and experimental information on their specific activities. It also contains programs to search for local homology between mRNA and translation signals, to search for potential signals based on analysis of oligonucleotide dictionaries, and to model secondary RNA structure. Possible applications of the mRNA-FAST system are discussed.

17.
The present paper describes the improvements in MmtDB, a specialised database designed to collect Metazoa mitochondrial DNA variants. Priority in the data collection has been given to Metazoa for which a large number of variants is available, e.g., humans. Starting from the sequences available in the Nucleotide Sequence Databases, the redundant sequences have been removed and new sequences from other sources have been added. Value-added information is associated with each variant sequence, e.g., analysed region, experimental method, tissue and cell lines, population data, sex, age, family code and information about the variation events (nucleotide position, involved gene, restriction site gain or loss). Cross-references to the EMBL Data Library are introduced, as well as internal cross-referencing among MmtDB entries according to tissue, heteroplasmic, familial and haplotypic correlation. Furthermore, MmtDB has a new section, AMmtDB: Aligned Metazoan mitochondrial biosequences. MmtDB can be accessed through the World Wide Web at URL http://WWW.ba.cnr.it/[symbol: see text]areamt08/MmtDBWWW.htm

18.
KEYnet is a database where gene and protein names are hierarchically structured. Particular care has been devoted to the search and organisation of synonyms. The structuring is based on biological criteria in order to assist the user in data search and to minimise the risk of information loss. Links to the EMBL data library by the entry name and the accession number are implemented. KEYnet is available through the WWW at the following site: http://www.ba.cnr.it/keynet.html

19.
Nair R, Rost B. Nucleic Acids Research, 2003, 31(13): 3337-3340
LOC3D (http://cubic.bioc.columbia.edu/db/LOC3d/) is both a weekly-updated database and a web server for predictions of sub-cellular localization for eukaryotic proteins of known three-dimensional (3D) structure. Localization is predicted using four different methods: (i) PredictNLS, prediction of nuclear proteins through nuclear localization signals; (ii) LOChom, inferring localization through sequence homology; (iii) LOCkey, inferring localization through automatic text analysis of SWISS-PROT keywords; and (iv) LOC3Dini, ab initio prediction through a system of neural networks and support vector machines. The final prediction is based on the method that predicts localization with the highest confidence. The LOC3D database currently contains predictions for >8700 eukaryotic protein chains taken from the Protein Data Bank (PDB). The web server can be used to predict sub-cellular localization for proteins for which only a predicted structure is available from threading servers. This makes the resource of particular interest to structural genomics initiatives.
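
The rule stated in this abstract, that the final prediction comes from the method with the highest confidence, can be sketched in a few lines of Python. The data layout (a dictionary from method name to a localization and a confidence value) and the numbers in the usage note are assumptions made for illustration. The abstract does not say how LOC3D represents or scales confidence.

```python
def final_prediction(predictions: dict) -> tuple:
    """Pick the prediction of the most confident method.

    predictions maps a method name to (localization, confidence), where a
    larger confidence means a more reliable prediction (assumed convention).
    Returns (method, localization, confidence); raises ValueError if empty.
    """
    if not predictions:
        raise ValueError("no predictions supplied")
    best = max(predictions, key=lambda method: predictions[method][1])
    localization, confidence = predictions[best]
    return best, localization, confidence
```

With hypothetical inputs such as `{"PredictNLS": ("nuclear", 0.62), "LOChom": ("cytoplasmic", 0.91)}`, the function selects the LOChom result. Comparing confidences across methods in this way presupposes that they are on a common scale.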

20.
A proteomic approach combining two-dimensional electrophoresis, Western blot and matrix-assisted laser desorption tandem time-of-flight mass spectrometry has been used to map the extracellular proteins of Streptococcus equi ssp. zooepidemicus (S. zooepidemicus) strain ATCC 35246. These bioinformatic technologies facilitated the identification of novel S. zooepidemicus vaccine candidate antigens and therapeutic agents. Despite the limitations posed by the unavailability of complete genome and proteome data for S. zooepidemicus, seven of 15 chosen immunogenic spots were successfully identified as streptococcal proteins (AE1 and AE4 c . 10) from homologous Streptococcus species. Among these, AE6 and AE7 were identified as S. zooepidemicus UDP-N-acetyl-glucosamine pyrophosphorylase and UDP-glucose pyrophosphorylase proteins. In addition, AE4 was determined to be glyceraldehyde-3-phosphate dehydrogenase from Enterococcus faecalis. SignalP 3.0 (http://www.cbs.dtu.dk/servicess/SignalIP) prediction suggested that AE5, AE7 and AE9 contained signal peptides. BLAST (http://www.sanger.ac.uk) results showed that the nucleotide sequences of all identified proteins shared high homology (≥65%) with S. zooepidemicus. The majority of proteins identified in our study remain formally unreported in S. zooepidemicus. However, these proteins serve a vital role in the immune system and reproduction of host species. Therefore, we further evaluated the proteins as vaccine candidates in this study.
