Similar Articles
20 similar articles found
1.
TransTerm: a database of translational signals. (Total citations: 3; self-citations: 0; citations by others: 3)
The TransTerm database of sequence contexts of stop and start codons has been expanded to include approximately 50% more species than last year's release. It now contains 148 organisms and >39 500 coding sequences; it is now available on the World Wide Web. The database includes: (i) initiation and termination sequence contexts organized by species; (ii) summary parameters about the individual sequences (sequence length, GC%, GC3, Nc, CAI) in addition to tables of base frequencies for each species' stop and start codon sequence context; (iii) species codon usage tables; and (iv) summary tables of stop signal frequency.

2.
The TransTerm database of termination codon contexts has been extended to include sense codon usage and initiation codon contexts. The database was constructed from 23,721 coding sequences from 93 organisms. It contains: a) the sequence around the termination codon (-10, +10); b) the sequence around the initiation codon (-20, +10); c) the length, the G+C content at the third codon position (GC3), the 'codon adaptation index' (CAI) and the 'effective number of codons' statistic (Nc); d) summary tables for each organism, including total codon usage, stop codon and tetranucleotide stop-signal usage, and matrices tallying base frequencies at each position around the initiation and termination codons. The data are arranged to facilitate investigation of the relationships between the three phases of protein synthesis. The database is available electronically from EMBL.
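GC3, one of the per-sequence parameters listed above, is the fraction of G or C at the third position of each codon. A minimal Python sketch, assuming an in-frame coding sequence that starts at its first base; it counts every complete codon, including start and stop codons, whereas some variants (such as GC3s) exclude them, so TransTerm's values may differ.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at the third position of each complete codon."""
    seq = cds.upper().replace("U", "T")
    n_codons = len(seq) // 3  # a trailing partial codon is ignored
    # Third codon positions are at indices 2, 5, 8, ...
    third = [seq[3 * i + 2] for i in range(n_codons)]
    if not third:
        return 0.0
    return sum(base in "GC" for base in third) / len(third)

print(gc3("ATGGCCAAATTTGAGTAA"))  # 0.5 (G, C, A, T, G, A)
```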

3.
TransTerm is a database of mRNA sequences and parameters useful for detecting translational control signals in general. TransTerm-98 has been expanded beyond previous releases to include full coding sequences and UTRs, while retaining the original small contexts around the coding sequence start and stop codons. The database contains more than 130 000 non-redundant coding sequences with associated untranslated regions (UTRs) from over 450 species. This includes the complete genomes of 12 prokaryotic organisms and one eukaryotic organism. Several coding sequence parameters are available: coding sequence length, Nc, GC3 and, when it is computable, the Codon Adaptation Index (CAI). Codon usage tables and summaries of start- and stop-codon contexts are also included. TransTerm-98 is available both as a relational database with a WWW interface and in flatfile format, which can also be viewed with an Internet browser. TransTerm is available at: http://biochem.otago.ac.nz:800/Transterm/homepage.html
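The Codon Adaptation Index mentioned above scores a gene by the geometric mean of the relative adaptiveness w of its codons, where w is a codon's count in a reference gene set divided by the count of its most frequent synonym. The sketch below is an illustration, not TransTerm's code: the 0.5 pseudocount for codons absent from the reference set, the exclusion of Met, Trp and stop codons, and the helper names are assumptions. Returning NaN mirrors the note that CAI is not always computable.

```python
import math
from collections import Counter

# Standard genetic code, built from the usual TCAG-ordered amino-acid string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def codons(cds):
    """Split an in-frame sequence into complete codons."""
    seq = cds.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

def relative_adaptiveness(reference_cdss):
    """w per codon: reference count divided by the count of its most-used synonym."""
    counts = Counter(c for cds in reference_cdss for c in codons(cds))
    w = {}
    for codon, aa in CODON_TABLE.items():
        if aa in "*MW":  # stops and single-codon amino acids offer no choice
            continue
        best = max(counts[s] for s, a in CODON_TABLE.items() if a == aa)
        if best > 0:  # amino acid unseen in the reference set: w stays undefined
            w[codon] = max(counts[codon], 0.5) / best  # 0.5 pseudocount (assumed)
    return w

def cai(cds, w):
    """Geometric mean of w over the gene's codons; NaN if no codon has a w value."""
    logs = [math.log(w[c]) for c in codons(cds) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

In practice the reference set would be a group of highly expressed genes from the same species.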

4.
TransTerm-97 contains more than 97 500 non-redundant coding-sequence initiation and termination contexts compiled from GenBank release 101 (15 June 1997). In addition, several coding sequence parameters are available: coding sequence length, Nc, GC3 and, when it is computable, the codon adaptation index (CAI). Codon usage tables and summaries of start and stop codon contexts are also included. The information covers more than 325 species and organelles, including seven complete bacterial genomes and one complete eukaryotic genome. To promote research on translational control of protein synthesis, TransTerm has been converted into a relational database to simplify queries. The relational database manager, PostgreSQL, provides access to the database using SQL (Structured Query Language). A World Wide Web interface using forms is being completed to give casual users access to the database. Extensions are planned to include the full 5'-UTR, full coding sequence and 3'-UTR. TransTerm-97 is available on the World Wide Web at: http://biochem.otago.ac.nz:800/Transterm/homepage.html
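To illustrate the kind of SQL query a relational form of such a database makes possible, here is a sketch that uses Python's built-in sqlite3 module in place of PostgreSQL. The table name `cds`, its columns and the rows are hypothetical toy data, not TransTerm's actual schema or contents.

```python
import sqlite3

# Hypothetical schema: one row per coding sequence (toy data only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cds (id INTEGER PRIMARY KEY, species TEXT NOT NULL, "
    "stop_context TEXT NOT NULL)"  # stop codon followed by downstream bases
)
conn.executemany(
    "INSERT INTO cds (species, stop_context) VALUES (?, ?)",
    [
        ("Escherichia coli", "TAAT"),
        ("Escherichia coli", "TAAG"),
        ("Escherichia coli", "TGAC"),
        ("Saccharomyces cerevisiae", "TAAG"),
    ],
)

# Stop codon usage per species: the first three bases of each stop context.
query = """
    SELECT species, substr(stop_context, 1, 3) AS stop_codon, COUNT(*) AS n
    FROM cds
    GROUP BY species, substr(stop_context, 1, 3)
    ORDER BY species, n DESC
"""
results = conn.execute(query).fetchall()
for species, stop_codon, n in results:
    print(species, stop_codon, n)
```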

5.
The translational termination signal database. (Total citations: 12; self-citations: 5; citations by others: 7)
The Translational Termination Database (TransTerm) consists of the immediate sequence contexts around the natural termination codons from 45 organisms, together with summary tables. The influence of termination codon context on the effectiveness of stop signals has been widely documented. The SPECIES--TRI.DAT table shows trinucleotide stop codon usage in each organism and, for comparison, the occurrence of these sequences in the noncoding region. The SPECIES--TETRA.DAT table is a similar table of tetranucleotide stop signal usage. The database is available from EMBL.
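The trinucleotide and tetranucleotide tallies described above can be sketched as follows. The input format (pairs of a coding sequence ending in its stop codon and the sequence immediately downstream) and the function name are assumptions for illustration; the tetranucleotide is taken as the stop codon plus the next base, and the noncoding-region comparison is not shown.

```python
from collections import Counter

STOPS = {"TAA", "TAG", "TGA"}

def stop_signal_usage(records):
    """Tally stop codons and stop-codon-plus-next-base signals.

    records: iterable of (cds, downstream) strings, where cds ends with its stop codon.
    """
    tri, tetra = Counter(), Counter()
    for cds, downstream in records:
        stop = cds[-3:].upper()
        if stop not in STOPS or not downstream:
            continue  # skip entries without a proper stop codon or downstream base
        tri[stop] += 1
        tetra[stop + downstream[0].upper()] += 1
    return tri, tetra
```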

6.
7.
Transterm facilitates studies of messenger RNAs and translational control signals. Each messenger RNA (mRNA) from GenBank is extracted and broken into its functional components, its coding sequence, initiation context, termination context, flanking sequence representing its 5' UTR (untranslated region), 3' UTR and translational signals. In addition, numerical parameters characterising each coding region in Transterm, including codon and GC bias, are available. For each species in Transterm, the initiation and termination regions are aligned by their start or stop codons and presented as base frequency matrices and tables of the information content of the bases in the alignments. Users can obtain summaries of characteristics of the mRNAs for species of their choice and search for translational signals both in the Transterm database and in their own sequence. The current release contains data from over 10 000 species, including the complete genomes of 20 prokaryotes and three eukaryotes. Both flat-file and relational database forms of Transterm are accessible via the WWW at http://biochem.otago.ac.nz/Transterm/
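Base frequency matrices and per-position information content, as described above, can be computed from start- or stop-aligned sequences roughly as below. This sketch assumes equal-length sequences over A, C, G and T only, and uses the plain Shannon measure (2 bits minus the column entropy) without any small-sample correction; Transterm's exact formula may differ.

```python
import math

def frequency_matrix(aligned):
    """Per-position base frequencies for equal-length, codon-aligned sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must have equal length")
    matrix = []
    for pos in range(length):
        column = [s[pos].upper() for s in aligned]
        matrix.append({b: column.count(b) / len(column) for b in "ACGT"})
    return matrix

def information_content(matrix):
    """Bits per position: 2 minus the Shannon entropy of each column."""
    return [2.0 + sum(p * math.log2(p) for p in col.values() if p > 0) for col in matrix]

# Toy alignment on an ATG at positions 1-3: flanks vary, the codon is conserved.
print(information_content(frequency_matrix(["AATGG", "CATGA", "GATGC", "TATGT"])))
```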

8.
In the present study, we developed a method for detecting sequences whose similarity to a target sequence is statistically significant and we examined the distribution of these sequences in the E. coli K-12 genome. Target sequences examined are as follows: (i) short repeats: the Crossover hot-spot instigator (Chi) sequence, the replication termination (Ter) sequence, and the DnaA binding sequence (DnaA box); (ii) potential stem-loop structure repeats: palindromic unit (PU), boxC sequences, and intergenic repeat unit (IRU); (iii) potential RNA coding repeats: rRNAs, PAIR, TRIP, and QUAD; and (iv) potential protein coding repeats: insertion elements (ISs) and Long Direct Repeats (LDRs). We also examined the distribution of these sequences on leading and lagging strands. We obtained another four statistically significant LDR sequences with more than 187 bp matched to LDR-A near the LDR loci, suggesting that these regions might be used as high recombination hot spots for LDR. Adaptation of individual LDRs to the E. coli genome is also discussed on the basis of codon usage.

9.
CRITICA: coding region identification tool invoking comparative analysis. (Total citations: 34; self-citations: 0; citations by others: 34)
Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, so that CRITICA does not depend on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu, in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).
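The comparative signal described above (amino acid identity higher than the nucleotide identity would suggest) can be illustrated by computing both identities for a gapless, in-frame alignment. This is only the raw comparison: CRITICA's statistical test against the expected amino acid identity is not reproduced, and the helper names are hypothetical.

```python
# Standard genetic code (TCAG ordering).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(seq):
    """Translate an in-frame DNA sequence; unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))

def identities(a, b):
    """Fractional nucleotide and amino acid identity of a gapless, in-frame alignment."""
    if len(a) != len(b) or len(a) < 3:
        raise ValueError("need two aligned sequences of equal length (>= 3 nt)")
    nt = sum(x == y for x, y in zip(a.upper(), b.upper())) / len(a)
    pa, pb = translate(a), translate(b)
    aa = sum(x == y for x, y in zip(pa, pb)) / len(pa)
    return nt, aa

# Only synonymous differences: Leu/Leu, Ala/Ala, Lys/Lys.
nt, aa = identities("CTGGCTAAA", "TTAGCGAAG")
print(f"nucleotide identity {nt:.2f}, amino acid identity {aa:.2f}")
# prints: nucleotide identity 0.56, amino acid identity 1.00
```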

10.
Position dependence of synonymous codon usage (Total citations: 4; self-citations: 0; citations by others: 4)
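We studied synonymous codon usage at different positions in Escherichia coli coding regions and found that, for many amino acids, codon usage changes significantly in the translation initiation region, while only a few amino acids show weaker changes in the translated region. Because codon usage is closely related to gene expression, these results are consistent with the experimental finding that codon usage at the 5' end of the coding region is important for expression. Further results also suggest which codons, when used at particular positions, may affect gene expression.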
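A minimal way to look for the position dependence described above is to tally codons separately in the first part of each coding sequence and in the remainder, then compare the proportions of synonymous codons. The 25-codon cutoff (`head_codons`) and the helper names are assumptions for illustration, not the study's definitions.

```python
from collections import Counter

STOPS = {"TAA", "TAG", "TGA"}

def codon_usage_by_region(cdss, head_codons=25):
    """Tally codons in the first `head_codons` sense codons after ATG versus the rest."""
    head, tail = Counter(), Counter()
    for cds in cdss:
        seq = cds.upper().replace("U", "T")
        codons = [seq[i:i + 3] for i in range(3, len(seq) - 2, 3)]  # skip start codon
        if codons and codons[-1] in STOPS:
            codons = codons[:-1]  # drop the stop codon
        head.update(codons[:head_codons])
        tail.update(codons[head_codons:])
    return head, tail

def relative_frequency(counts, synonyms):
    """Share of each synonymous codon within one amino acid's codon family."""
    total = sum(counts[c] for c in synonyms)
    return {c: counts[c] / total if total else 0.0 for c in synonyms}
```

For example, calling `relative_frequency` on `head` and on `tail` with the six leucine codons shows whether leucine codon preferences shift near the start of genes.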

11.
The nucleotide sequences of Serratia marcescens trpG and the corresponding regions of Escherichia coli, Shigella dysenteriae and Salmonella typhimurium trpD have been determined. Analysis of the nucleotide sequence divergence suggests the following evolutionary relationships: Serratia-[Salmonella, (Escherichia, Shigella)]. Partial reconstruction of ancestral nucleotide sequences and subsequent analysis of nucleotide substitutions show that the majority of nucleotide substitutions in the evolution of trp(G)D are transitions that result in a reduction of G + C content. Since most of the nucleotide substitutions are in the third position of codons, bias in synonymous codon usage also reflects G + C content. The trpE-trp(G)D junction in the four organisms is characterized by overlapping translation termination and initiation codons. The relative positions of trpE and trp(G)D thus became fixed in evolution before the fusion of trpG and trpD. Nucleotide sequences representing the fusion of trpG and trpD in Escherichia, Shigella and Salmonella are neither more nor less divergent than other portions of the trp(G)D coding sequences.
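Classifying substitutions by type and codon position, as in the analysis above, can be sketched like this, assuming an ancestral and a descendant sequence that are already aligned, gapless and in frame (the ancestral reconstruction itself is not shown, and the function name is hypothetical). A negative `gc_change` corresponds to the reduction in G + C content reported above.

```python
PURINES, PYRIMIDINES = set("AG"), set("CT")

def classify_substitutions(ancestor, descendant):
    """Tally transitions/transversions per codon position (1-3) and the net G+C change."""
    tally = {pos: {"transition": 0, "transversion": 0} for pos in (1, 2, 3)}
    gc_change = 0
    for i, (x, y) in enumerate(zip(ancestor.upper(), descendant.upper())):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue  # identical or ambiguous site
        # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
        same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
        kind = "transition" if same_class else "transversion"
        tally[i % 3 + 1][kind] += 1
        gc_change += (y in "GC") - (x in "GC")  # +1 gain, -1 loss of G/C
    return tally, gc_change
```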

12.
Salim HM, Ring KL, Cavalcanti AR. Protist, 2008, 159(2):283-298
We used the recently sequenced genomes of the ciliates Tetrahymena thermophila and Paramecium tetraurelia to analyze codon usage patterns in both organisms, examining codon usage bias, Gln codon usage, GC content and the nucleotide contexts of initiation and termination codons in Tetrahymena and Paramecium. We also studied how these trends change along the length of the genes and in a subset of highly expressed genes. Our results corroborate some of the trends previously described in Tetrahymena but contradict some specific observations. In both genomes we found a strong bias toward codons with low GC content; however, in highly expressed genes this bias is smaller and codons ending in G or C tend to be more frequent. We also found that codon bias increases along gene segments and in highly expressed genes, and that the contexts surrounding initiation and termination codons are always AT-rich. Our results also suggest differences in the translation efficiency of the reassigned stop codons between the two species and between the reassigned codons. Finally, we discuss some possible causes of these differences in translational efficiency.

13.
MOTIVATION: Prediction of the coding potential of stretches of DNA is crucial in gene calling and genome annotation, where it is used to identify potential exons and to position their boundaries in conjunction with functional sites, such as splice sites and translation initiation sites. The ability to discriminate between coding and non-coding sequences relates to the structure of coding sequences, which are organized in codons, and to their biased codon usage. For statistical reasons, the longer the sequence, the easier it is to detect this codon bias. However, in many eukaryotic genomes, where genes harbour many introns, both introns and exons might be small and hard to distinguish based on coding potential. RESULTS: Here, we present novel approaches that specifically aim at better detection of coding potential in short sequences. The methods use complementary sequence features, combined with identification of which features are relevant in discriminating between coding and non-coding sequences. These newly developed methods are evaluated on different species, representative of four major eukaryotic kingdoms, and extensively compared to state-of-the-art Markov models, which are often used for predicting coding potential. The main conclusions drawn from our analyses are that (1) combining complementary sequence features clearly outperforms current Markov models for coding potential prediction in short sequence fragments, (2) coding potential prediction benefits from length-specific models, and these models are not necessarily the same for different sequence lengths and (3) comparing the results across several species indicates that, although our combined method consistently performs extremely well, there are important differences across genomes. SUPPLEMENTARY DATA: http://bioinformatics.psb.ugent.be/.
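For reference, a Markov-model baseline of the kind mentioned above scores a fragment by the log-likelihood ratio between a model trained on coding sequences and one trained on non-coding sequences. This sketch uses a homogeneous order-k model with additive pseudocounts; practical gene finders usually use frame-specific (three-periodic) models, and this is not the authors' combined-feature method. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_markov(seqs, k=2, pseudo=1.0):
    """Estimate P(base | preceding k bases); returns a log-probability function."""
    counts = defaultdict(Counter)
    for s in seqs:
        s = s.upper()
        for i in range(k, len(s)):
            counts[s[i - k:i]][s[i]] += 1

    def log_prob(context, base):
        # Unseen contexts fall back to a uniform distribution via the pseudocounts.
        c = counts.get(context, Counter())
        total = sum(c[b] for b in "ACGT") + 4 * pseudo
        return math.log((c[base] + pseudo) / total)

    return log_prob

def log_odds(seq, coding_model, noncoding_model, k=2):
    """Log-likelihood ratio over the fragment; positive values favour 'coding'."""
    seq = seq.upper()
    return sum(
        coding_model(seq[i - k:i], seq[i]) - noncoding_model(seq[i - k:i], seq[i])
        for i in range(k, len(seq))
    )
```

Because the score is a sum over positions, short fragments accumulate little evidence, which is the difficulty the abstract highlights.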

14.
The nucleotide sequence running from the genetic left end of bacteriophage T7 DNA to within the coding sequence of gene 4 is given, except for the internal coding sequence for the gene 1 protein, which has been determined elsewhere. The sequence presented contains nucleotides 1 to 3342 and 5654 to 12,100 of the approximately 40,000 base-pairs of T7 DNA. This sequence includes: the three strong early promoters and the termination site for Escherichia coli RNA polymerase; eight promoter sites for T7 RNA polymerase; six RNAase III cleavage sites; the primary origin of replication of T7 DNA; the complete coding sequences for 13 previously known T7 proteins, including the anti-restriction protein, protein kinase, DNA ligase, the gene 2 inhibitor of E. coli RNA polymerase, single-strand DNA binding protein, the gene 3 endonuclease, and lysozyme (which is actually an N-acetylmuramyl-L-alanine amidase); the complete coding sequences for eight potential new T7-coded proteins; and two apparently independent initiation sites that produce overlapping polypeptide chains of gene 4 primase. More than 86% of the first 12,100 base-pairs of T7 DNA appear to be devoted to specifying amino acid sequences for T7 proteins, and the arrangement of coding sequences and other genetic elements is very efficient. There is little overlap between coding sequences for different proteins, but junctions between adjacent coding sequences are typically close, the termination codon for one protein often overlapping the initiation codon for the next. For almost half of the potential T7 proteins, the sequence in the messenger RNA that can interact with 16S ribosomal RNA in initiation of protein synthesis is part of the coding sequence for the preceding protein. The longest non-coding region, about 900 base-pairs, is at the left end of the DNA. The right half of this region contains the strong early promoters for E. coli RNA polymerase and the first RNAase III cleavage site.
The left end contains the terminal repetition (nucleotides 1 to 160), followed by a striking array of repeated sequences (nucleotides 175 to 340) that might have some role in packaging the DNA into phage particles, and an A·T-rich region (nucleotides 356 to 492) that contains a promoter for T7 RNA polymerase and might function as a replication origin.
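Overlap between a termination codon and the next initiation codon, as described above, can be flagged by checking whether a stop codon and an ATG share bases; on one strand the possible patterns are ATGA (two shared bases) and TAATG or TGATG (one shared base). This sketch only scans for such patterns; it does not check that the codons belong to annotated, adjacent genes, and the function name is hypothetical.

```python
STOPS = ("TAA", "TAG", "TGA")

def stop_start_overlaps(seq):
    """Find (stop_pos, atg_pos, shared_nt) where a stop codon and an ATG share 1-2 bases."""
    seq = seq.upper()
    starts = {i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"}
    hits = []
    for s in range(len(seq) - 2):
        if seq[s:s + 3] not in STOPS:
            continue
        # Two codons overlap only if their start positions differ by 1 or 2.
        for a in (s - 2, s - 1, s + 1, s + 2):
            if a in starts:
                hits.append((s, a, 3 - abs(s - a)))
    return hits

print(stop_start_overlaps("TGATG"))  # [(0, 2, 1)]: TGA and ATG share one A
print(stop_start_overlaps("ATGA"))   # [(1, 0, 2)]: ATG and TGA share TG
```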

15.
16.
17.
A preliminary study of coding sequences lacking 3-base periodicity (Total citations: 4; self-citations: 0; citations by others: 4)
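Analysis of the Fourier spectra of 120 relatively short coding sequences (<1,200 bp) shows that 3-base periodicity is not universally present in short coding sequences. Statistical analysis suggests that whether a coding sequence shows 3-base periodicity is related to its base composition and base distribution, to the choice and order of amino acids in the encoded protein, and to synonymous codon usage. In general, non-period-3 sequences have a higher A+U content than G+C content, whereas the opposite holds for period-3 sequences; bases are distributed more evenly across the three codon positions in non-period-3 sequences than in period-3 sequences; and codon and amino acid usage bias is weaker in non-period-3 sequences than in period-3 sequences. These phenomena should be taken fully into account when Fourier analysis is used to predict genes and exons in DNA sequences.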
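The Fourier test for 3-base periodicity can be sketched with NumPy: the four base-indicator sequences are Fourier-transformed, their power spectra summed, and the power at frequency N/3 compared with the mean power. This signal-to-noise ratio is a common formulation assumed here rather than the study's stated procedure, and the cutoff for calling a sequence period-3 is left open.

```python
import numpy as np

def period3_snr(seq):
    """Spectral power at frequency N/3 divided by the mean power (k = 0 excluded)."""
    seq = seq.upper()
    n = len(seq)
    if n < 3:
        raise ValueError("sequence must be at least 3 nt long")
    spectrum = np.zeros(n)
    for base in "ACGT":
        # Binary indicator sequence: 1.0 where this base occurs, 0.0 elsewhere.
        indicator = np.array([1.0 if ch == base else 0.0 for ch in seq])
        spectrum += np.abs(np.fft.fft(indicator)) ** 2
    k = round(n / 3)  # DFT bin closest to the period-3 frequency
    background = spectrum[1:].mean()  # skip the DC term, which only reflects composition
    return float(spectrum[k] / background) if background > 0 else 0.0
```

A strongly periodic sequence such as `"GCA" * 100` gives a ratio far above 1, while a period-2 sequence such as `"AT" * 150` gives a ratio near 0.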

18.
Codon usage tables have been produced for E. coli, yeast, human, and mouse. The nonrandom use of codons allows probability values to be assigned to trinucleotides in any DNA sequence. These values represent the probability that a given trinucleotide is used as a codon in the organism from which the table is derived. For the graphical delineation of coding areas in DNA sequences, each trinucleotide is assigned a probability equal to its frequency in the codon table. Averaging and smoothing procedures then greatly enhance the detectability of areas of high average codon probability and better represent the mean codon probability. These manipulations increase graphical clarity without altering the overall magnitude of the probabilities. Averaging introduces an error of less than 0.5% between "raw" and smoothed data. This graphical delineation of coding sequences does not depend on the presence of punctuation, ribosomal binding sites, etc.; moreover, the delineation of introns and exons is also possible.
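The averaging idea above can be sketched as follows: each trinucleotide in each of the three frames is assigned its frequency from a codon usage table, and a moving average over a window of codons is taken. The `usage` table, the 20-codon window and the function name are hypothetical inputs; the paper's own smoothing procedure is not reproduced.

```python
def codon_probability_profile(seq, usage, window=20):
    """Moving average of codon-table frequencies along each of the three frames.

    usage:  dict mapping codon -> its frequency in the organism's codon table.
    window: number of consecutive codons averaged (an assumed value).
    """
    seq = seq.upper()
    profiles = {}
    for frame in range(3):
        # Unlisted trinucleotides (e.g. containing N) get probability 0.
        probs = [usage.get(seq[i:i + 3], 0.0) for i in range(frame, len(seq) - 2, 3)]
        profiles[frame] = [
            sum(probs[j:j + window]) / window for j in range(len(probs) - window + 1)
        ]
    return profiles
```

Plotting the three profiles along the sequence would show a sustained high curve in the frame that carries a coding region.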

19.
Entire genomes of hepatitis B virus (subtype adr) have been cloned. The nucleotide sequence data were compared with other sequences of the HBV genome, including: adw [Valenzuela et al. (1981) in Animal Virus Genetics. Fields et al. eds. Academic Press, Inc., NY. pp. 57-70], ayw [Galibert et al. (1979) Nature, 281, 646-650], and adyw [Pasek et al. (1979) Nature 282, 575-579]. Four open reading frames for polypeptides larger than 6,000 daltons were found to be conserved; they are highly compact, overlapping one another on one strand (the L strand). The sites of initiation of the S gene and termination of the P gene were not conserved. No conserved coding frame was found on the opposite strand (the S strand). The amino acid sequences of six surface antigen (HBsAg) peptides, including subtypes adr, adw, and ayw, are deduced from the DNA sequences, and amino acid substitutions consistent with the differences between subtypes are demonstrated.
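Open reading frames like those compared above can be located with a simple scan of all three frames on one strand. This sketch finds ATG-initiated ORFs; the function name and the `min_codons` default of 55 are assumptions (roughly 6,000 Da at an average of about 110 Da per residue). The circular HBV genome and the opposite strand would need extra handling, wrap-around and reverse complement, that is not shown.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=55):
    """ATG-initiated ORFs on the given strand in all three frames.

    Returns (start, end, frame) tuples; end is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop in this frame
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:  # sense codons, stop excluded
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs
```

Because each frame is scanned independently, ORFs in different frames can overlap, which is how the compact, overlapping HBV frames would show up.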

20.
The codon frequencies of each of the 257 468 complete protein coding sequences (CDSs) have been compiled from the taxonomic divisions of the GenBank DNA sequence database. The sum of the codons used by each of 8792 organisms has also been calculated. The data files can be obtained from the anonymous FTP sites of DDBJ, Kazusa and EBI. A list of the codon usage of genes and the sum of the codons used by each organism can be obtained through the web site http://www.kazusa.or.jp/codon/ . The present study also reports recent developments on the WWW site. The new web interface provides data in the CodonFrequency-compatible format as well as in the traditional table format. Use of the database is facilitated by keyword-based search and by the availability of codon usage tables for selected genes from each species. These new tools will allow users to further analyze variations in codon usage among different genomes.
