Similar articles
Found 20 similar articles (search time: 31 ms)
1.
Codon usage in 87 602 genes has been calculated using the nucleotide sequence data obtained from the GenBank Genetic Sequence Data Bank (Release 90.0; September 1995). The database is called the CUTG Database; the complete form of the database can be obtained by anonymous ftp from DDBJ and a part of the database, which lists the frequency of codon use in each organism, is made searchable through our World Wide Web server.
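The tabulation described in this abstract can be illustrated with a minimal sketch. It assumes each gene is supplied as an in-frame coding sequence string; the function names `count_codons` and `per_thousand` are hypothetical and are not taken from the CUTG database, and the per-thousand normalization is only one common way to report codon frequencies.

```python
from collections import Counter


def count_codons(cds: str) -> Counter:
    """Count codons in a coding sequence read in frame from position 0."""
    seq = cds.upper().replace("U", "T")  # accept RNA as well as DNA input
    usable = len(seq) - len(seq) % 3     # ignore a trailing partial codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def per_thousand(counts: Counter) -> dict:
    """Express codon counts as frequencies per 1000 codons."""
    total = sum(counts.values())
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


# Summing the per-gene Counters gives a per-organism total.
genes = ["ATGGCTGCTTAA", "ATGGCCTAA"]
organism_total = sum((count_codons(g) for g in genes), Counter())
```

Because `Counter` objects add element-wise, the per-organism sum is obtained by adding the per-gene tables.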

2.
Codon usage frequencies for each of the 206 526 complete protein-coding genes (CDSs) have been compiled from taxonomical divisions of the GenBank DNA sequence database. The sum of the codon use of 7434 organisms has also been calculated. These data files can be obtained from anonymous ftp sites of DDBJ, DISC and EBI. The list of the codon usage of genes in an organism, as well as the sum of the codon usage of the organism, was made searchable by organism name through a web site: http://www.dna.affrc.go.jp//CUTG.html

3.
The codon usage in individual protein genes has been calculated using the nucleotide sequences obtained from the GenBank Genetic Sequence Database. The sum of the codon use of each organism has also been calculated. The data files can be obtained from anonymous ftp sites of DDBJ, DISC and EBI. The list of codon usage of genes in organisms was made searchable by organism name through a web site. The compilation has been synchronized with a major release of GenBank.

4.
The frequencies of each of the 257 468 complete protein coding sequences (CDSs) have been compiled from the taxonomical divisions of the GenBank DNA sequence database. The sum of the codons used by 8792 organisms has also been calculated. The data files can be obtained from the anonymous ftp sites of DDBJ, Kazusa and EBI. A list of the codon usage of genes and the sum of the codons used by each organism can be obtained through the web site http://www.kazusa.or.jp/codon/ . The present study also reports recent developments on the WWW site. The new web interface provides data in the CodonFrequency-compatible format as well as in the traditional table format. The use of the database is facilitated by keyword-based search analysis and the availability of codon usage tables for selected genes from each species. These new tools will allow users to further analyze variations in codon usage among different genomes.

5.
The TransTerm database of termination codon contexts has been extended to include sense codon usage, and initiation codon contexts. The database was constructed from 23,721 coding sequences from 93 organisms. The database contains: a) the sequence around the termination codon (-10, +10); b) the sequence around the initiation codon (-20, +10); c) the length, 'G+C%' of the third position of codons (GC3), the 'codon adaptation index' (CAI) and the 'effective number of codons' statistic (Nc); d) summary tables for each organism including total codon usage, stop codon and tetranucleotide stop-signal usage, and matrices tallying base frequencies at each position around the initiation and termination codons. The data are arranged to facilitate investigation of the relationships between the three phases of protein synthesis. The database is available electronically from EMBL.
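Of the per-sequence parameters listed in this abstract, GC3 is the simplest to illustrate. The sketch below assumes an in-frame coding sequence and counts the third position of every codon, including the stop codon; the function name `gc3` is hypothetical, and related statistics such as GC3s restrict the count to synonymous sites, which this version does not do.

```python
def gc3(cds: str) -> float:
    """Return the G+C percentage at the third position of each codon."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    third = seq[2:usable:3]           # every third base, starting at index 2
    if not third:
        raise ValueError("sequence is shorter than one codon")
    gc = sum(base in "GC" for base in third)
    return 100.0 * gc / len(third)
```

For example, `gc3("ATGGCCAAATAA")` examines the third bases G, C, A and A.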

6.
ISSD Version 2.0: taxonomic range extended. Total citations: 7 (self-citations: 0; citations by others: 7)
Two more organisms from different taxonomic groups were added to a new version of the Integrated Sequence-Structure Database (ISSD). ISSD serves as an integrated source of sequence and structure information for the analysis of correlations between mRNA synonymous codon usage and three-dimensional structure of the encoded proteins. ISSD now holds 88 non-homologous Escherichia coli proteins and 25 yeast Saccharomyces cerevisiae proteins in addition to the expanded set of mammalian proteins, which includes 166 proteins (107 in ISSD Version 1.0). Comparison of ISSD sequences with organism-specific codon usage data derived from the CUTG database shows that it is a representative subset of the GenBank coding sequence data. Preliminary results of the statistical analysis confirm that sequence-structure correlations observed by us earlier are also present in the upgraded ISSD (Version 2.0), including bacterial and yeast proteins. The ISSD Version 2.0 release includes an improved Web-based data search and retrieval system and is accessible via URL http://www.protein.bio.msu.su/issd/. ISSD can also be accessed at ExPASy, URL http://www.expasy.ch/swissmod/swiss-model.html

7.
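The complete genome sequence of Escherichia coli K-12 strain MG1655 was obtained from GenBank, and several parameters related to codon usage bias (Nc, CAI, GC, GC3s) were calculated. The relationships between codon usage bias and both the length of the mRNA coding region and its propensity to form secondary structure were analyzed statistically. Although translational efficiency (including translation speed and translation accuracy) is the main factor constraining the codon usage bias of highly expressed E. coli genes, the length of the mRNA coding region and its propensity to form secondary structure are also non-negligible contributors to this bias, and they weaken it to some extent. The biological significance of the propensity of mRNA coding regions to form secondary structure is also discussed.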

8.
The translational termination signal database. Total citations: 12 (self-citations: 5; citations by others: 7)
The Translational Termination Database (TransTerm) consists of the immediate context sequences around the natural termination codons from 45 organisms, and summary tables. The influence of termination codon context on their effectiveness as stop signals has been widely documented. The SPECIES--TRI.DAT table shows trinucleotide stop codon usage in each organism and, for comparison, the occurrence of these sequences in the noncoding region. The SPECIES--TETRA.DAT table is a similar table of tetranucleotide stop signal usage. The database is available from EMBL.

9.
The Horizontal Gene Transfer DataBase (HGT-DB) is a genomic database that includes statistical parameters such as G+C content, codon and amino-acid usage, as well as information about which genes deviate in these parameters for prokaryotic complete genomes. Under the hypothesis that genes from distantly related species have different nucleotide compositions, these deviated genes may have been acquired by horizontal gene transfer. The current version of the database contains 88 bacterial and archaeal complete genomes, including multiple chromosomes and strains. For each genome, the database provides statistical parameters for all the genes, as well as averages and standard deviations of G+C content, codon usage, relative synonymous codon usage and amino-acid content. It also provides information about correspondence analyses of the codon usage, plus lists of extraneous groups of genes in terms of G+C content and lists of putatively acquired genes. With this information, researchers can explore the G+C content and codon usage of a gene when they find incongruities in sequence-based phylogenetic trees. A search engine that allows searches for gene names or keywords for a specific organism is also available. HGT-DB is freely accessible at http://www.fut.es/~debb/HGT.
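Relative synonymous codon usage (RSCU), one of the parameters named in this abstract, is commonly defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below follows that definition under the standard genetic code; the function name `rscu` is hypothetical and nothing here is taken from the HGT-DB implementation.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict:
    """Return RSCU values for every sense codon whose amino acid occurs."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    families = defaultdict(list)  # amino acid -> its synonymous codons
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if aa == "*" or total == 0:
            continue  # skip stop codons and amino acids that never occur
        for c in codons:
            # Observed count over the count expected under equal usage.
            values[c] = counts[c] * len(codons) / total
    return values
```

A value of 1.0 means a codon is used exactly as often as expected under equal usage; genes whose values deviate strongly from the genome average are the kind of outliers the abstract describes.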

10.
TransTerm-97 contains more than 97 500 non-redundant coding-sequence initiation and termination contexts compiled from GenBank, release 101 (15-June-1997). In addition, several coding sequence parameters are available: coding sequence length, Nc, GC3, and, when it is computable, codon adaptation index (CAI). Codon usage tables and summaries of start and stop codon contexts are also included. The information covers more than 325 species and organelles, including seven complete bacterial genomes and one complete eukaryotic genome. To promote research in translational control of protein synthesis, TransTerm has been converted into a relational database to ease the process of making queries. The relational database manager, Postgresql, gives access to the database using SQL (Structured Query Language). A World Wide Web interface using forms is being completed to allow the casual user access to the database. Extensions are planned to include the full 5'-UTR, full coding sequence and 3'-UTR. TransTerm-97 is available on the World Wide Web at: http://biochem.otago.ac.nz:800/Transterm/homepage.html
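The remark that CAI is reported only "when it is computable" can be illustrated with a simplified sketch. CAI is usually defined as the geometric mean of per-codon relative adaptiveness values derived from a reference set of highly expressed genes, so it cannot be computed without such a set. The version below is an assumption-laden illustration: the function names are hypothetical, codons absent from the reference set are simply skipped (published implementations typically handle zero counts differently, for example with a small pseudocount), and nothing here reproduces the TransTerm code.

```python
import math
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_of(seq: str) -> list:
    """Split a sequence into in-frame codons, dropping a partial last codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_cdss) -> dict:
    """Weight each codon by its count relative to the most used synonym."""
    counts = Counter(c for cds in reference_cdss for c in codons_of(cds))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    weights = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue  # stops, Met and Trp carry no synonymous choice
        best = max(counts[c] for c in family)
        for c in family:
            if best and counts[c]:
                weights[c] = counts[c] / best
    return weights


def cai(cds: str, weights: dict):
    """Geometric mean of weights; None when no codon can be scored."""
    logs = [math.log(weights[c]) for c in codons_of(cds) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else None
```

Returning `None` mirrors the "when it is computable" caveat: a sequence with no scorable codons, or an organism without a usable reference set, yields no value.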

11.
In the context of the international project aimed at sequencing the whole genome of Bacillus subtilis we have developed a non-redundant, fully annotated database of sequences from this organism. Starting from the B.subtilis sequences available in the EMBL, GenBank and DDBJ collections we have removed all encountered duplications and then added extra annotations to the sequences (e.g. accession numbers for the genes, locations on the genetic map, codon usage, etc.). We have also added cross-references to the EMBL, MEDLINE, SWISS-PROT and ENZYME data banks. The present system results from merging the NRSub and SubtiList databases, and the sequence contigs used in the two systems are identical. NRSub is distributed as a flatfile in EMBL format (which is supported by most sequence analysis software packages) and as an ACNUC database, while SubtiList is distributed as a relational database under 4th Dimension. It is possible to access the data through two dedicated World Wide Web servers located in France and Japan.

12.
We have constructed a non-homologous database, termed the Integrated Sequence-Structure Database (ISSD), which comprises the coding sequences of genes, amino acid sequences of the corresponding proteins, their secondary structure and straight phi, psi angle assignments, and polypeptide backbone coordinates. Each protein entry in the database holds the alignment of nucleotide sequence, amino acid sequence and the PDB three-dimensional structure data. The nucleotide and amino acid sequences for each entry are selected on the basis of exact matches of the source organism and cell environment. The current version 1.0 of ISSD is available on the WWW at http://www.protein.bio.msu.su/issd/ and includes 107 non-homologous mammalian proteins, of which 80 are human proteins. The database has been used by us for the analysis of synonymous codon usage patterns in mRNA sequences showing their correlation with the three-dimensional structure features in the encoded proteins. Possible ISSD applications include optimisation of protein expression, improvement of the protein structure prediction accuracy, and analysis of evolutionary aspects of the nucleotide sequence-protein structure relationship.

13.
Plant chitinase consensus sequences. Total citations: 6 (self-citations: 0; citations by others: 6)
Eighty-six plant chitinase sequences from 29 different species and one hybrid were obtained from the on-line GenBank nucleotide database. These sequences were grouped into five gene families based on previously published guidelines (Meins et al., 1994), and the amino-acid and nucleotide sequences of each gene family were aligned. Consensus amino-acid and nucleotide sequences were derived for each gene family based on the alignments. The consensus sequences were analyzed to determine their amino-acid composition, hydropathy profiles, and codon usage.

14.
Burkholderia pseudomallei is a recognized biothreat agent and the causative agent of melioidosis. Codon usage biases of all protein-coding genes (length greater than or equal to 300 bp) from the complete genome of B. pseudomallei K96243 have been analyzed. As B. pseudomallei is a GC-rich organism (68.5%), overall codon usage data analysis indicates that codons ending in G and/or C are indeed predominant in this organism. However, multivariate statistical analysis indicates that there is a single major trend in the codon usage variation among the genes in this organism, which has a strong positive correlation with the expressivities of the genes. The majority of the lowly expressed genes are scattered towards the negative end of the major axis whereas the highly expressed genes are clustered towards the positive end. At the same time, from the results that there were two significant correlations between axis 1 coordinates and the GC, GC3s content at silent sites of each sequence, and clearly significant negative correlations between the ‘Effective Number of Codons’ values and GC, GC3s content, we inferred that codon usage bias was also affected by gene nucleotide composition. In addition, some other factors such as the lengths of the genes as well as the hydrophobicity of genes also influence the codon usage variation among the genes in this organism in a minor way. Notably, 21 codons have been defined as ‘optimal codons’ of B. pseudomallei. In summary, our work has provided a basic understanding of the mechanisms for codon usage bias and some more useful information for improving the expression of target genes in vivo and in vitro. Sheng Zhao and Qin Zhang contributed equally to this work.

15.
Long Open Reading Frames (ORFs) in antisense DNA strands have been reported in the literature as being rare events. However, an extensive analysis of the GenBank database revealed that a substantial number of genes from several species contain an in-phase ORF in the antisense strand that entirely overlaps the coding sequence of the sense strand, or even extends beyond it. The findings described in this paper show that this is a frequent, non-random phenomenon, which is primarily dependent on codon usage, and to a lesser extent on gene size and GC content. Examination of the sequence database for several prokaryotic and eukaryotic organisms demonstrates that coding sequences with in-phase, 100% overlapping antisense ORFs are present in every genome studied so far.
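The dependence on codon usage described in this abstract can be made concrete with a small sketch. It assumes that "in-phase" means the antisense codons are the reverse complements of the sense codons, which is an interpretation and is not confirmed by the abstract; the function names are hypothetical. Under that assumption, an antisense stop codon (TAA, TAG or TGA) arises only where the sense strand uses TTA, CTA or TCA, so a gene that avoids those three codons has a fully open antisense frame.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def antisense_frame_is_open(cds: str) -> bool:
    """True if the in-phase antisense frame spans the CDS without a stop.

    Assumes the CDS length is a multiple of three, so that frame 0 of the
    reverse complement shares codon boundaries with the sense strand.
    """
    if len(cds) % 3:
        raise ValueError("CDS length must be a multiple of three")
    anti = reverse_complement(cds)
    return all(anti[i:i + 3] not in STOP_CODONS for i in range(0, len(anti), 3))
```

For instance, a sense-strand TTA (leucine) codon becomes TAA on the antisense strand and closes the frame, whereas the synonymous codon CTG becomes CAG and leaves it open.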

16.
A lambdaZAP Express cDNA library was constructed with mRNA obtained from immature miracidia within eggs, hatched miracidia, and sporocysts of Echinostoma paraensei. This cDNA library was amplified and 213 expressed sequence tag (EST) sequences (averaging 466 nucleotides in length) were obtained. The mean percentage of unresolved bases within the EST sequences was 0.4%, ranging from 0 to 4.6%. The 213 ESTs represent 151 unique messages. BLAST (version 2.0.8) analysis disclosed that 64 unique E. paraensei messages (42.4%) had significant similarities (BLAST score ≤ e-5), at deduced amino acid or nucleotide levels, with known sequences in the nonredundant GenBank databases or the dbEST database (NCBI). The remainder, 57.6% of the unique EST-encoded messages, scored nonsignificant hits. Most of the E. paraensei messages that could be assigned a cellular role based on sequence similarities were involved in gene/protein expression. Several ESTs scored highest similarities with sequences obtained from trematode species. A total of 22,560 nucleotides present in open reading frames from ESTs that aligned with known sequences was used to determine codon usage for E. paraensei. Analysis of a subset of eight ESTs that contained full-length open reading frames did not reveal a bias in codon usage. Also, EST sequences were found to contain 3' untranslated regions with an average length of 69.9 ± 88.4 nucleotides (n = 46). The EST sequences were submitted to GenBank/dbEST, adding to the 51 available Echinostoma-derived sequences, to provide reference information for both phylogenetic analysis and study of general trematode biology.

17.
In the context of the international project aiming at sequencing the whole genome of Bacillus subtilis we have developed NRSub, a non-redundant database of sequences from this organism. Starting from the B.subtilis sequences available in the repository collections we have removed all encountered duplications, then we have added extra annotations to the sequences (e.g. accession numbers for the genes, locations on the genetic map, codon usage index). We have also added cross-references with EMBL/GenBank/DDBJ, MEDLINE, SWISS-PROT and ENZYME databases. NRSub is distributed through anonymous FTP as a text file in EMBL format and as an ACNUC database. It is also possible to access the database through two dedicated World Wide Web servers located in France (http://acnuc.univ-lyon1.fr/nrsub/nrsub.html ) and in Japan (http://ddbjs4h.genes.nig.ac.jp/ ).

18.
An ab initio model for gene prediction in prokaryotic genomes is proposed based on physicochemical characteristics of codons calculated from molecular dynamics (MD) simulations. The model requires a specification of three calculated quantities for each codon: the double-helical trinucleotide base pairing energy, the base pair stacking energy, and an index of the propensity of a codon for protein-nucleic acid interactions. The base pairing and stacking energies for each codon are obtained from recently reported MD simulations on all unique tetranucleotide steps, and the third parameter is assigned based on the conjugate rule previously proposed to account for the wobble hypothesis with respect to degeneracies in the genetic code. The third interaction propensity parameter values correlate well with ab initio MD calculated solvation energies and flexibility of codon sequences as well as codon usage in genes and amino acid composition frequencies in ∼175,000 protein sequences in the Swissprot database. Assignment of these three parameters for each codon enables the calculation of the magnitude and orientation of a cumulative three-dimensional vector for a DNA sequence of any length in each of the six genomic reading frames. Analysis of 372 genomes comprising ∼350,000 genes shows that the orientations of the gene and nongene vectors are well differentiated and make a clear distinction feasible between genic and nongenic sequences at a level equivalent to or better than currently available knowledge-based models trained on the basis of empirical data, providing strong support for the possibility of a unique and useful physicochemical characterization of DNA sequences from codons to genomes.

19.
TransTerm: a database of translational signals. Total citations: 3 (self-citations: 0; citations by others: 3)
The TransTerm database of sequence contexts of stop and start codons has been expanded to include approximately 50% more species than last year's release. It now contains 148 organisms and >39 500 coding sequences; it is now available on the World Wide Web. The database includes: (i) initiation and termination sequence contexts organized by species; (ii) summary parameters about the individual sequences (sequence length, GC%, GC3, Nc, CAI) in addition to tables of base frequencies for each species' stop and start codon sequence context; (iii) species codon usage tables; and (iv) summary tables of stop signal frequency.

20.
Journal of Molecular Biology, 2019, 431(13): 2434-2441
Usage of sequential codon-pairs is non-random and unique to each species. Codon-pair bias is related to but clearly distinct from individual codon usage bias. Codon-pair bias is thought to affect translational fidelity and efficiency and is presumed to be under selective pressure. It was suggested that changes in codon-pair utilization may affect human disease more significantly than changes in single codons. Although recombinant gene technologies often take codon-pair usage bias into account, codon-pair usage data/tables are not readily available, thus potentially impeding research efforts. The present computational resource (https://hive.biochemistry.gwu.edu/review/codon2) systematically addresses this issue. Building on our recent HIVE-Codon Usage Tables, we constructed a new database to include genomic codon-pair and dinucleotide statistics of all organisms with sequenced genomes available in GenBank. We believe that the growing understanding of the importance of codon-pair usage will make this resource an invaluable tool to many researchers in academia and the pharmaceutical industry.
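The raw statistics this abstract refers to, codon-pair and dinucleotide counts, can be sketched as follows. The example assumes an in-frame coding sequence and counts overlapping adjacent pairs; the function names are hypothetical and the sketch does not reproduce the HIVE resource's own pipeline or any bias score derived from these counts.

```python
from collections import Counter


def codon_pair_counts(cds: str) -> Counter:
    """Count each pair of adjacent in-frame codons."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # zip pairs codon k with codon k + 1, so pairs overlap by one codon.
    return Counter(zip(codons, codons[1:]))


def dinucleotide_counts(seq: str) -> Counter:
    """Count overlapping dinucleotides along a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))
```

Comparing an observed pair count with the count expected from the individual codon frequencies is what distinguishes codon-pair bias from single-codon usage bias.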
