Similar literature
 Found 20 similar articles; search took 15 ms
1.
TransTerm-97 contains more than 97 500 non-redundant coding-sequence initiation and termination contexts compiled from GenBank, release 101 (15-June-1997). In addition, several coding sequence parameters are available: coding sequence length, Nc, GC3, and, when it is computable, codon adaptation index (CAI). Codon usage tables and summaries of start and stop codon contexts are also included. The information covers more than 325 species and organelles, including seven complete bacterial genomes and one complete eukaryotic genome. To promote research in translational control of protein synthesis, TransTerm has been converted into a relational database to ease the process of making queries. The relational database manager, PostgreSQL, gives access to the database using SQL (Structured Query Language). A World Wide Web interface using forms is being completed to allow the casual user access to the database. Extensions are planned to include the full 5'-UTR, full coding sequence and 3'-UTR. TransTerm-97 is available on the World Wide Web at: http://biochem.otago.ac.nz:800/Transterm/homepage.html

2.
TransTerm is a database of initiation and termination sequence contexts from more than 250 organisms listed in GenBank, including the four complete genomes: Haemophilus influenzae, Methanococcus jannaschii, Mycoplasma genitalium, and Saccharomyces cerevisiae. For the current release, more than 60 000 coding sequences were analysed. The tabulated data include initiation and termination contexts organised by species along with quantitative parameters about individual coding sequences (length, %GC, GC3, Nc and CAI). There are also tables of initiation- and termination-region nucleotide frequencies, codon usage tables and summaries of stop signal usage. TransTerm is available on the World Wide Web at: http://biochem.otago.ac.nz:800/Transterm/homepage.html

3.
TransTerm: a database of translational signals.   Total citations: 3, self-citations: 0, citations by others: 3
The TransTerm database of sequence contexts of stop and start codons has been expanded to include approximately 50% more species than last year's release. It now contains 148 organisms and >39 500 coding sequences; it is now available on the World Wide Web. The database includes: (i) initiation and termination sequence contexts organized by species; (ii) summary parameters about the individual sequences (sequence length, GC%, GC3, Nc, CAI) in addition to tables of base frequencies for each species' stop and start codon sequence context; (iii) species codon usage tables; and (iv) summary tables of stop signal frequency.

4.
The TransTerm database of termination codon contexts has been extended to include sense codon usage and initiation codon contexts. The database was constructed from 23,721 coding sequences from 93 organisms. The database contains: a) the sequence around the termination codon (-10, +10); b) the sequence around the initiation codon (-20, +10); c) the length, 'G+C%' of the third position of codons (GC3), the 'codon adaptation index' (CAI) and the 'effective number of codons' statistic (Nc); d) summary tables for each organism including total codon usage, stop codon and tetranucleotide stop-signal usage, and matrices tallying base frequencies at each position around the initiation and termination codons. The data are arranged to facilitate investigation of the relationships between the three phases of protein synthesis. The database is available electronically from EMBL.
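
Two of the quantities listed in this abstract, GC3 and the per-position base-frequency matrices, are simple to compute. The following is a minimal sketch only, not TransTerm's own code: the function names are hypothetical, the input is assumed to be an in-frame coding sequence, and whether the stop codon is counted in GC3 is an assumption the abstract does not settle (here every complete codon is counted).

```python
from collections import Counter


def gc3(cds: str) -> float:
    """Fraction of G or C at the third position of each complete codon.

    Assumes `cds` starts in frame; a trailing partial codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    thirds = cds[2:usable:3]  # every third base, starting at codon position 3
    if not thirds:
        raise ValueError("sequence has no complete codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def position_frequencies(contexts):
    """Tally base counts at each position across equal-length context strings."""
    if not contexts:
        raise ValueError("no context sequences given")
    length = len(contexts[0])
    if any(len(c) != length for c in contexts):
        raise ValueError("all context sequences must have the same length")
    return [Counter(c[i].upper() for c in contexts) for i in range(length)]
```

Calling `position_frequencies` on a list of aligned stop-codon contexts returns one `Counter` per position, which corresponds to one column of a base-frequency matrix.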

5.
MamMiBase, the mammalian mitochondrial genome database, is a relational database of complete mitochondrial genome sequences of mammalian species. The database is useful for phylogenetic analysis, since it allows ready retrieval of individual nucleotide and amino acid alignments of the 13 protein-coding mitochondrial genes in three formats (NEXUS for the PAUP program, and the formats of the MEGA and PHYLIP programs). Users may select sequences for download based on parameter values that MamMiBase also reports, such as sequence length, p-distances, base content, transition/transversion ratio, and gamma. A simple phylogenetic tree (a neighbor-joining tree with Jukes-Cantor distance) is also available for download, useful for parameter calculations and other simple tasks. AVAILABILITY: MamMiBase is available at http://www.mammibase.lncc.br

6.
The synthesis of complete genes is becoming an increasingly popular approach in heterologous gene expression. Reasons for this are the decreasing prices and the numerous advantages in comparison to classic molecular cloning methods. Two of these advantages are the possibility to adapt the codon usage to the host organism and the option to introduce restriction enzyme target sites of choice. C.U.R.R.F. (Codon Usage regarding Restriction Finder) is a free Java-based software program which is able to detect possible restriction sites in both coding and non-coding DNA sequences by introducing multiple silent or non-silent mutations, respectively. During the search for potential restriction sites in coding DNA and mRNA sequences as well as protein sequences, the program considers how far an alternative sequence containing a desired restriction motif deviates from the sequence with the optimal codon usage. C.U.R.R.F. is available at http://www.zvm.tu-dresden.de/die_tu_dresden/fakultaeten/fakultaet_mathematik_und_naturwissenschaften/fachrichtung_biologie/mikrobiologie/allgemeine_mikrobiologie/currf .

7.
By searching the current protein sequence databases using sequences from human and chicken histones H1/H5, H2A, H2B, H3 and H4, a database of aligned histone protein sequences with statistically significant sequence similarity to the search sequence was constructed. In addition, a nucleotide sequence database of the corresponding coding regions for these proteins has been assembled. The region of each of the core histones containing the histone fold motif is identified in the protein alignments. The database contains >1300 protein and nucleotide sequences. All sequences and alignments in this database are available through the World Wide Web at http://www.ncbi.nlm.nih.gov/Baxevani/HISTONES.

8.
The translational termination signal database.   Total citations: 12, self-citations: 5, citations by others: 7
The Translational Termination Database (TransTerm) consists of the immediate context sequences around the natural termination codons from 45 organisms, and summary tables. The influence of termination codon context on the effectiveness of stop signals has been widely documented. The SPECIES--TRI.DAT table shows trinucleotide stop codon usage in each organism and, for comparison, the occurrence of these sequences in the noncoding region. The SPECIES--TETRA.DAT table is a similar table of tetranucleotide stop signal usage. The database is available from EMBL.

9.
Simple sequence repeats (SSRs) are found in most organisms. They play a major role in studies of genetic diversity, and are useful as diagnostic markers for many diseases. The simple sequence repeats database (SSRD) for the human genome was created for easy access to such repeats, for analysis, and to be used to understand their biological significance. The data include the abundance and distribution of SSRs in the coding and non-coding regions of the genome, as well as their association with the UTRs of genes. The exact locations of repeats with respect to genomic regions (such as UTRs, exons, introns or intergenic regions) and their association with STS markers are also highlighted. The resource will facilitate repeat sequence analysis in the human genome and the understanding of the functional and evolutionary significance of simple sequence repeats. SSRD is available through two websites, http://www.ccmb.res.in/ssr and http://www.ingenovis.com/ssr.

10.
CRITICA: coding region identification tool invoking comparative analysis.   Total citations: 34, self-citations: 0, citations by others: 34
Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).
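
The comparative signal described in this abstract rests on two numbers for a pair of aligned sequences: nucleotide identity and the amino acid identity of their translations. The sketch below computes both for a gap-free, in-frame pair. It is an illustration under stated assumptions and not CRITICA's implementation: the function names are hypothetical, the standard genetic code is assumed, and the abstract does not give the model for the amino acid identity "expected" at a given nucleotide identity, so that comparison is left out.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str) -> str:
    """Translate an in-frame DNA string; unknown codons become 'X'."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def identities(seq_a: str, seq_b: str):
    """Return (nucleotide identity, amino acid identity) for a gap-free pair."""
    nt = identity(seq_a.upper(), seq_b.upper())
    aa = identity(translate(seq_a), translate(seq_b))
    return nt, aa
```

For example, `identities("ATGGCTAAA", "ATGGCCAAG")` differs at two third-codon positions, so nucleotide identity is 7/9 while the translations are identical. Synonymous divergence of this kind is the pattern the abstract treats as evidence for coding.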

11.
Zhou H, Hickford JG, Fang Q. Immunogenetics 2005, 57(6): 453-457
Genetic variation in immunoglobulin A, the most abundant immunoglobulin in mammalian cells, has not been reported in ruminants. In this study, variation in the immunoglobulin heavy alpha chain constant gene (IGHA) of sheep was investigated by amplification of a fragment that included the hinge coding sequence, followed by single-strand conformational polymorphism (SSCP) analysis and DNA sequencing. Three novel sequences, each characterized by unique SSCP banding patterns, were identified. One or two sequences were detected in individual sheep and all the sequences identified shared high homology with the published ovine and bovine IGHA sequences, suggesting that these sequences represent allelic variants of the IGHA gene in sheep. Sequence alignment showed that these sequences differed mainly in the 3′ end of exon 1 and in the coding sequence of the hinge region. There was either a deletion or an insertion of two codons in the hinge coding region in these allelic variants. Codon usage in the hinge coding region was quite different from that in the non-hinge coding regions of the gene, suggesting different evolution of the IGHA hinge sequence. Three novel amino acid sequences of ovine IGHA were also predicted, and variation in these sequences might affect not only antigen recognition but also susceptibility to cleavage by bacterial or parasitic proteases. Nucleotide sequence data reported in this paper have been submitted to the NCBI GenBank nucleotide sequence database and have been assigned the accession nos. AY956424–AY956426.

12.
Until now the most efficient solution to align nucleotide sequences containing open reading frames was to use indirect procedures that align amino acid translation before reporting the inferred gap positions at the codon level. There are two important pitfalls with this approach. Firstly, any premature stop codon impedes using such a strategy. Secondly, each sequence is translated with the same reading frame from beginning to end, so that the presence of a single additional nucleotide leads to both aberrant translation and alignment. We present an algorithm that has the same space and time complexity as the classical Needleman-Wunsch algorithm while accommodating sequencing errors and other biological deviations from the coding frame. The resulting pairwise coding sequence alignment method was extended to a multiple sequence alignment (MSA) algorithm implemented in a program called MACSE (Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons). MACSE is the first automatic solution to align protein-coding gene datasets containing non-functional sequences (pseudogenes) without disrupting the underlying codon structure. It has also proved useful in detecting undocumented frameshifts in public database sequences and in aligning next-generation sequencing reads/contigs against a reference coding sequence. MACSE is distributed as an open-source Java file executable with freely available source code and can be used via a web interface at: http://mbb.univ-montp2.fr/macse.
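
For reference, the classical Needleman-Wunsch recurrence that this abstract uses as its complexity baseline can be sketched as follows. This is the textbook global alignment score with a linear gap penalty, not the MACSE algorithm: it has no frameshift or stop-codon handling, and the scoring values are arbitrary assumptions. The score-only version below keeps two rows of the table; recovering the alignment itself would need the full table for traceback.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score of `a` and `b` (linear gap penalty).

    Runs in O(len(a) * len(b)) time and keeps only two rows of the DP table.
    """
    # Row 0: aligning the empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` against the empty string
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in `b`
            left = curr[j - 1] + gap  # gap inserted in `a`
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```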

13.
J L Weber. Gene 1987, 52(1): 103-109
The genome of the human malaria parasite Plasmodium falciparum has an A + T content of about 82%, higher than any other organism whose DNA has been characterized. Computer analysis of 36 kb of available nucleotide sequences from this species showed that the coding regions, with an A + T content of 69.0%, are flanked by more A + T-rich regions of 86.0% A + T. Within the coding sequences, the A/T ratio was 1.68 in the mRNA sense strand, and overall A + T content in the three codon positions increased in the order 1st-2nd-3rd position. Codons with T or especially A in the third position were strongly preferred. Codon usage among individual parasite genes was very similar compared to genes from other species. Dinucleotide frequencies for the parasite DNA were close to those expected for a random sequence with the known base composition, except that the CpG frequency in the coding sequences was low.
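
The dinucleotide comparison in this abstract, observed frequency against the frequency expected for a random sequence of the same base composition, can be sketched as an observed/expected ratio. This is an illustrative calculation only: the function name is hypothetical, and the expectation used here (the product of the two single-base frequencies) is an assumed reading of "expected for a random sequence", not necessarily the paper's exact method.

```python
def dinucleotide_obs_exp(seq: str, dinuc: str = "CG") -> float:
    """Observed/expected ratio for one dinucleotide on a single strand.

    Observed: fraction of overlapping dinucleotide positions equal to `dinuc`.
    Expected: product of the frequencies of its two bases in `seq`.
    """
    seq = seq.upper()
    dinuc = dinuc.upper()
    if len(dinuc) != 2:
        raise ValueError("dinuc must have exactly two bases")
    if len(seq) < 2:
        raise ValueError("sequence must have at least two bases")
    positions = len(seq) - 1
    observed = sum(seq[i:i + 2] == dinuc for i in range(positions)) / positions
    expected = (seq.count(dinuc[0]) / len(seq)) * (seq.count(dinuc[1]) / len(seq))
    if expected == 0:
        raise ValueError("expected frequency is zero for this sequence")
    return observed / expected
```

A ratio well below 1 for `"CG"` would correspond to the low CpG frequency the abstract reports for coding sequences.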

14.
A total of 10,154 5'-end expressed sequence tags (EST) were established from the normalized and size-selected cDNA libraries of a marine red alga, Porphyra yezoensis. Among the ESTs, 2140 were unique species, and the remaining 8014 were grouped into 1127 species. Database search of the 3267 non-redundant ESTs by BLAST algorithm showed that the sequences of 1080 species (33.1%) have similarity to those of registered genes from various organisms including higher plants, mammals, yeasts, and cyanobacteria, while 2187 (66.9%) are novel. Codon usage analysis in the coding regions of 101 non-redundant EST groups showing significant similarity to known genes indicated the higher GC contents at the third position of codons (79.4%) than the first (62.2%) and the second position (45.0%), suggesting that the genome has been exposed to high GC pressure during evolution. The sequence data of individual ESTs are available at the web site http://www.kazusa.or.jp/en/plant/porphyra/EST/.

15.
We have constructed a non-homologous database, termed the Integrated Sequence-Structure Database (ISSD) which comprises the coding sequences of genes, amino acid sequences of the corresponding proteins, their secondary structure and straight phi,psi angles assignments, and polypeptide backbone coordinates. Each protein entry in the database holds the alignment of nucleotide sequence, amino acid sequence and the PDB three-dimensional structure data. The nucleotide and amino acid sequences for each entry are selected on the basis of exact matches of the source organism and cell environment. The current version 1.0 of ISSD is available on the WWW at http://www.protein.bio.msu.su/issd/ and includes 107 non-homologous mammalian proteins, of which 80 are human proteins. The database has been used by us for the analysis of synonymous codon usage patterns in mRNA sequences showing their correlation with the three-dimensional structure features in the encoded proteins. Possible ISSD applications include optimisation of protein expression, improvement of the protein structure prediction accuracy, and analysis of evolutionary aspects of the nucleotide sequence-protein structure relationship.

16.
17.
18.
Finding microRNA targets in the coding region is difficult due to the overwhelming signal encoding the amino acid sequence. Here, we introduce an algorithm (called PACCMIT-CDS) that finds potential microRNA targets within coding sequences by searching for conserved motifs that are complementary to the microRNA seed region and also overrepresented in comparison with a background model preserving both codon usage and amino acid sequence. Precision and sensitivity of PACCMIT-CDS are evaluated using PAR-CLIP and proteomics data sets. Thanks to the properly constructed background, the new algorithm achieves a lower rate of false positives and better ranking of predictions than do currently available algorithms, which were designed to find microRNA targets within 3′ UTRs.

19.
20.
Most research concerning the evolution of introns has largely considered introns within coding sequences (CDSs), without regard for introns located within untranslated regions (UTRs) of genes. Here, we directly determined intron size, abundance, and distribution in UTRs of genes using full-length cDNA libraries and complete genome sequences for four species, Arabidopsis thaliana, Drosophila melanogaster, human, and mouse. Overall intron occupancy (introns/exon kbp) is lower in 5' UTRs than CDSs, but intron density (intron occupancy in regions containing introns) tends to be higher in 5' UTRs than in CDSs. Introns in 5' UTRs are roughly twice as large as introns in CDSs, and there is a sharp drop in intron size at the 5' UTR-CDS boundary. We propose a mechanistic explanation for the existence of selection for larger intron size in 5' UTRs, and outline several implications of this hypothesis. We found introns to be randomly distributed within 5' UTRs, so long as a minimum required exon size was assumed. Introns in 3' UTRs were much less abundant than in 5' UTRs. Though this was expected for human and mouse that have intron-dependent nonsense-mediated decay (NMD) pathways that discourage the presence of introns within the 3' UTR, it was also true for A. thaliana and D. melanogaster, which may lack intron-dependent NMD. Our findings have several implications for theories of intron evolution and genome evolution in general.
